WO2021258350A1 - Audio signal processing method and apparatus - Google Patents

Audio signal processing method and apparatus

Publication number
WO2021258350A1
Authority
WO
WIPO (PCT)
Prior art keywords
frequency
frequency domain
audio signal
signal
domain coefficients
Prior art date
Application number
PCT/CN2020/098183
Other languages
French (fr)
Chinese (zh)
Inventor
张立斌
袁庭球
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to PCT/CN2020/098183 (WO2021258350A1)
Priority to CN202080092744.4A (CN114945981A)
Publication of WO2021258350A1

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis

Definitions

  • This application relates to the field of multimedia processing technology, and in particular to an audio signal processing method and device.
  • An electronic device acting as the transmitting end can sample, quantize, and encode a collected audio signal, then compress it and transmit it to the electronic device at the receiving end.
  • Multiple applications on the receiving device may have different delay and quality requirements for the audio signal, and so require the transmitting device to compress and encode the audio signal in different ways.
  • FIG. 1 shows a possible application scenario, in which a mobile phone sends the collected audio signal to a smart headset.
  • Audio application 1 is a voice enhancement application: its delay requirements are high, while its requirements for audio transmission quality are moderate. Audio application 2 is a three-dimensional sound field collection application: it has high requirements for the transmission quality of the received audio signal, but its delay requirements are not high.
  • The mobile phone therefore has to apply different compression and encoding to the same audio signal and transmit multiple audio signals to the smart headset.
  • These audio signals differ in transmission delay and quality, but their content is the same audio signal collected by the mobile phone. The audio signal is thus transmitted repeatedly, which occupies and wastes bandwidth resources.
  • The present application provides an audio signal processing method and device. They address the repeated transmission and waste of bandwidth resources that arise in the prior art when audio signals are transmitted between multiple electronic devices and different audio applications have different compression and coding requirements.
  • In a first aspect, an audio signal processing method includes: a first device samples and quantizes an acquired first audio signal to obtain a second audio signal; encodes the second audio signal with a first encoding method, using a first duration as the unit, to obtain a basic frame; encodes the second audio signal with a second encoding method, using a second duration as the unit, to obtain an extended frame, where the second duration is greater than the first duration, and the first and second encoding methods encode different signals carried in the second audio signal and/or encode the second audio signal to different encoding degrees; and sends the basic frame and the extended frame to a second device.
  • In this technical solution, the sending end can encode and compress the same audio signal into two kinds of encoded frames with different frame lengths, a basic frame and an extended frame, where the extended frame can be derived by comparing the basic frame with the second audio signal. The receiving end can decode the basic frame to obtain one audio signal, and jointly decode the basic frame and the extended frame to obtain another. The two restored audio signals differ in delay and audio quality, so they can serve the needs of different audio applications. This avoids repeatedly transmitting the same audio signal after encoding and wasting bandwidth resources, and reduces system overhead.
  • The second duration is N times the first duration, where N is a natural number greater than or equal to 2.
  • The time interval between basic frames is the first duration, and the time interval between extended frames is N times the first duration; that is, one extended frame is encoded for every N basic frames. The encoding side thus obtains encoded frames with different delays, and the decoding side uses them to recover audio signals with different delays to meet the needs of different audio applications. This increases the encoding rate, avoids wasting bandwidth resources, and reduces system overhead.
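  • The frame timing described above can be illustrated with a minimal sketch that lists when basic and extended frames complete. The function name `frame_schedule`, the 5 ms first duration, and N = 4 are assumptions chosen for the example; the application does not give these values.

```python
def frame_schedule(total_ms, basic_ms, n):
    """Return the completion times (ms) of basic and extended frames.

    basic_ms plays the role of the first duration; the second duration
    is n * basic_ms. All numbers are illustrative assumptions.
    """
    if n < 2:
        raise ValueError("N must be a natural number >= 2")
    extended_ms = n * basic_ms
    basic = list(range(basic_ms, total_ms + 1, basic_ms))
    extended = list(range(extended_ms, total_ms + 1, extended_ms))
    return basic, extended


basic_times, extended_times = frame_schedule(total_ms=40, basic_ms=5, n=4)
print(len(basic_times), extended_times)
```

  • With these assumed values, a low-delay application could consume a frame every 5 ms, while a quality-oriented application would wait for each 20 ms extended frame.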
  • Encoding the second audio signal with the first encoding method, using the first duration as the unit, to obtain the basic frame specifically includes: down-sampling the second audio signal to obtain the low-frequency signal carried in the second audio signal; and encoding the low-frequency signal with a time-domain coding method to obtain multiple basic frames whose frame length is the first duration.
  • The encoding side may encode the low-frequency signal in the second audio signal with a time-domain encoding method to obtain the basic frame. Because time-domain encoding can encode an audio signal into a digital signal with low delay, it suits a basic frame that has low delay and contains only the low-frequency part of the original audio signal. From the basic frame, the decoding side can recover an audio signal with strong real-time performance and moderate audio quality for use by the corresponding audio application.
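  • As a rough sketch of the down-sampling and framing step, the example below halves the sample rate by averaging neighbouring samples and cuts the result into fixed-length frames. The pair-averaging filter, the factor of 2, and the helper names are assumptions for illustration; the application does not specify the down-sampling filter or the time-domain codec applied to each frame.

```python
def downsample_by_two(samples):
    """Crude low-pass plus decimation: average each pair of neighbouring samples.

    This is an assumed stand-in for a proper anti-aliasing filter.
    """
    return [(samples[i] + samples[i + 1]) / 2
            for i in range(0, len(samples) - 1, 2)]


def split_frames(samples, frame_len):
    """Cut samples into consecutive frames of frame_len samples (short tail dropped)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


signal = [float(i) for i in range(16)]   # hypothetical quantized input
low_band = downsample_by_two(signal)     # 8 samples at half the rate
basic_frames = split_frames(low_band, 4) # each frame would go to a time-domain codec
print(basic_frames)
```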
  • Encoding the second audio signal with the second encoding method, using the second duration as the unit, to obtain the extended frame specifically includes: performing a frequency domain transform on the second audio signal to obtain the frequency domain coefficients corresponding to the second audio signal; dividing the high-frequency part of those coefficients evenly into groups, in order from low frequency to high frequency, to obtain the group envelope values of multiple high-frequency groups, where a group envelope value is the average of the high-frequency frequency domain coefficients in a group; and encoding the group envelope values to obtain multiple extended frames whose frame length is the second duration.
  • The encoding side may also encode the high-frequency signal in the second audio signal with a frequency-domain encoding method to obtain the extended frame, which encodes the high-frequency part of the signal that the basic frame leaves uncoded. The decoding side can therefore decode the basic frame jointly with the extended frame to recover an audio signal that has lower real-time performance but includes both the low-frequency and high-frequency parts of the original audio signal and has better audio quality, for use by the corresponding audio application.
  • Through basic frame encoding and extended frame encoding, the foregoing embodiments can meet the requirements of multiple audio applications, increase the encoding rate, and avoid wasting bandwidth resources.
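  • The group envelope computation described above can be sketched as follows. The split index `cutoff`, the group size, and the function names are assumptions for the example; the averaging follows the text literally (a plain mean of the coefficients in each group).

```python
def split_bands(coeffs, cutoff):
    """Split coefficients (ordered low to high frequency) at an assumed cutoff index."""
    return coeffs[:cutoff], coeffs[cutoff:]


def group_envelopes(coeffs, group_size):
    """Divide coeffs evenly into groups and return the mean of each group."""
    if len(coeffs) % group_size != 0:
        raise ValueError("coefficients must divide evenly into groups")
    return [sum(coeffs[i:i + group_size]) / group_size
            for i in range(0, len(coeffs), group_size)]


# Hypothetical frequency domain coefficients, ordered from low to high frequency.
coeffs = [10.0, 8.0, 6.0, 4.0, 1.0, 3.0, 5.0, 7.0]
low, high = split_bands(coeffs, cutoff=4)
envelopes = group_envelopes(high, group_size=2)
print(envelopes)
```

  • Only the envelope values would then be encoded into the extended frame, so four high-frequency coefficients are represented by two numbers in this example.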
  • Encoding the second audio signal with the first encoding method, using the first duration as the unit, to obtain the basic frame specifically includes: performing a frequency domain transform on the second audio signal to obtain multiple frequency domain coefficients of the low-frequency signal and multiple frequency domain coefficients of the high-frequency signal corresponding to the second audio signal; dividing the frequency domain coefficients of the high-frequency signal evenly into groups, in order from low frequency to high frequency, to obtain the group envelope values of multiple high-frequency groups, where a group envelope value is the average of the high-frequency frequency domain coefficients in a group; and encoding the frequency domain coefficients of the low-frequency signal together with the group envelope values of the high-frequency signal to obtain multiple basic frames whose frame length is the first duration.
  • The encoding side may encode both the low-frequency and high-frequency signals in the second audio signal with a frequency-domain encoding method: for the low-frequency signal it encodes the multiple frequency domain coefficients, and for the high-frequency signal it encodes only the group envelope values. The basic frame thus encodes the low-frequency part at high quality and the high-frequency part at lower quality. From the basic frame, the decoding side can recover an audio signal with strong real-time performance and moderate audio quality for use by the corresponding audio application.
  • Encoding the second audio signal with the second encoding method, using the second duration as the unit, to obtain the extended frame specifically includes: taking the second duration as the unit, encoding the differences between the multiple frequency domain coefficients of the high-frequency signal and the corresponding group envelope values to obtain multiple extended frames whose frame length is the second duration.
  • Building on the basic frame, the encoding side can further encode, with the second encoding method, the high-frequency part that the basic frame encodes at lower quality; that is, it encodes the differences between the frequency domain coefficients of the high-frequency signal and the corresponding group envelope values. The extended frame thus encodes the high-frequency part at higher quality. The decoding side can therefore jointly decode the basic frame and the extended frame to recover an audio signal with moderate real-time performance and high audio quality for use by the corresponding audio application.
  • By using basic frame encoding and extended frame encoding, the foregoing embodiment obtains encoded frames with different delays and different encoding qualities, which increases the encoding rate and reduces system overhead.
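  • The split between envelope (basic frame) and difference values (extended frame) can be sketched for a single block of high-frequency coefficients. The helper names and the group size are assumptions, entropy coding and quantization are omitted, and the sketch ignores that one extended frame spans N basic frames.

```python
def group_envelopes(coeffs, group_size):
    """Mean of each evenly sized group of coefficients."""
    return [sum(coeffs[i:i + group_size]) / group_size
            for i in range(0, len(coeffs), group_size)]


def envelope_residuals(coeffs, envelopes, group_size):
    """Difference between each coefficient and the envelope of its group."""
    return [c - envelopes[i // group_size] for i, c in enumerate(coeffs)]


def decode_basic_high(envelopes, group_size):
    """Basic-frame-only decoding: every coefficient takes its group envelope."""
    return [e for e in envelopes for _ in range(group_size)]


def decode_joint_high(envelopes, residuals, group_size):
    """Joint decoding: envelope plus the difference carried by the extended frame."""
    return [envelopes[i // group_size] + r for i, r in enumerate(residuals)]


high = [1.0, 3.0, 5.0, 7.0]                           # hypothetical coefficients
envelopes = group_envelopes(high, 2)                  # carried in the basic frame
residuals = envelope_residuals(high, envelopes, 2)    # carried in the extended frame
coarse = decode_basic_high(envelopes, 2)
exact = decode_joint_high(envelopes, residuals, 2)
print(coarse, exact)
```

  • In this toy case the basic frame alone yields a coarse high band, and adding the extended frame restores the original coefficients exactly, because nothing is quantized.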
  • The encoding side can obtain the basic frame with the time-domain encoding of method 1 above, obtain a first extended frame with the extended-frame encoding of method 1 above, and then obtain a second extended frame with the extended-frame encoding of method 2 above.
  • In this way, a basic frame can be obtained that has strong real-time performance, contains only low-frequency signals, and has low coding quality; and a first extended frame can be obtained that has strong real-time performance and contains low-frequency and high-frequency signals, but encodes the high-frequency signals at low coding quality.
  • The encoded frames thus span more levels, and the decoding side can jointly decode the basic frame, the first extended frame, and the second extended frame to restore audio signals of different quality. This meets the needs of different audio applications, improves the flexibility and encoding rate of audio encoding, and reduces system overhead.
  • Encoding the second audio signal with the second encoding method, using the second duration as the unit, to obtain the extended frame specifically includes: performing a frequency domain transform on the second audio signal to obtain multiple frequency domain coefficients of the low-frequency signal and multiple frequency domain coefficients of the high-frequency signal corresponding to the second audio signal; dividing the frequency domain coefficients of the low-frequency signal and those of the high-frequency signal each evenly into groups, in order from low frequency to high frequency, to obtain the corresponding group envelope values, where a group envelope value is the average of the frequency domain coefficients in a group; and encoding the group envelope values to obtain multiple extended frames whose frame length is the second duration.
  • Using frequency-domain encoding, the encoding side may encode the group envelope values of the low-frequency frequency domain coefficients and the group envelope values of the high-frequency frequency domain coefficients corresponding to the second audio signal to obtain the extended frame. If the basic frame is lost, the decoding side can therefore still decode the extended frame alone to recover an audio signal, which improves the reliability of audio coding transmission and the user experience.
  • Performing the frequency domain transform on the second audio signal specifically includes: obtaining the MDCT frequency domain coefficients corresponding to the second audio signal with a modified discrete cosine transform (MDCT) algorithm.
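  • For readers unfamiliar with the MDCT, the following sketch implements the textbook transform and its inverse directly from their definitions and checks that 50%-overlapping blocks reconstruct the input by overlap-add. The block size, the rectangular window, and the direct O(N²) evaluation are simplifying assumptions; the application does not describe its MDCT implementation, and practical codecs use windowed, FFT-based versions.

```python
import math


def mdct(block):
    """MDCT of a block of 2N samples, returning N coefficients."""
    n = len(block) // 2
    return [sum(block[t] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
                for t in range(2 * n))
            for k in range(n)]


def imdct(coeffs):
    """Inverse MDCT of N coefficients, returning 2N time-aliased samples."""
    n = len(coeffs)
    return [sum(coeffs[k] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
                for k in range(n)) / n
            for t in range(2 * n)]


def mdct_roundtrip(signal, n):
    """Transform blocks of 2N samples with hop N and overlap-add the inverses."""
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n + 1, n):
        restored_block = imdct(mdct(signal[start:start + 2 * n]))
        for t, value in enumerate(restored_block):
            out[start + t] += value
    return out


signal = [math.sin(0.3 * i) + 0.1 * i for i in range(16)]  # hypothetical input
restored = mdct_roundtrip(signal, n=4)
# Samples covered by two overlapping blocks (indices 4..11) are reconstructed;
# the first and last half blocks remain aliased because they lack a neighbour.
max_error = max(abs(restored[i] - signal[i]) for i in range(4, 12))
print(max_error < 1e-9)
```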
  • In a second aspect, an audio signal processing method includes: a second device receives a basic frame and an extended frame sent by a first device, where the frame length of the extended frame is greater than the frame length of the basic frame, and the extended frame is obtained by re-encoding the audio signals corresponding to multiple basic frames; and decodes the basic frame to obtain a basic audio signal, or jointly decodes the basic frame and the extended frame to obtain an extended audio signal.
  • Decoding the basic frame to obtain the basic audio signal specifically includes: decoding the basic frame with a time-domain codec to obtain the basic audio signal.
  • Jointly decoding the basic frame and the extended frame to obtain the extended audio signal specifically includes: if the extended frame includes multiple group envelope values of the high-frequency signal, obtaining multiple frequency domain coefficients of the high-frequency signal from those group envelope values, where each frequency domain coefficient of the high-frequency signal is the group envelope value of its group; up-sampling the basic audio signal to obtain a third audio signal; performing a frequency domain transform on the third audio signal frame by frame to obtain multiple frequency domain coefficients of the low-frequency signal corresponding to the third audio signal; and performing an inverse frequency domain transform on the frequency domain coefficients of the high-frequency signal and of the low-frequency signal to obtain the extended audio signal.
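  • Two of the decoding steps above, up-sampling the basic audio signal and expanding group envelope values into per-coefficient values, can be sketched as follows. Up-sampling by sample repetition, the factor of 2, and all helper names are assumptions for illustration; the frequency domain transform and its inverse are left out of this sketch.

```python
def upsample_by_two(samples):
    """Naive up-sampling by repeating each sample (an assumed stand-in filter)."""
    return [s for s in samples for _ in range(2)]


def expand_envelopes(envelopes, group_size):
    """Set every coefficient in a group to that group's envelope value."""
    return [e for e in envelopes for _ in range(group_size)]


def assemble_spectrum(low_coeffs, high_envelopes, group_size):
    """Concatenate low-frequency coefficients with the expanded high band."""
    return list(low_coeffs) + expand_envelopes(high_envelopes, group_size)


third_signal = upsample_by_two([1.0, 2.0])           # hypothetical basic audio signal
spectrum = assemble_spectrum([10.0, 8.0, 6.0, 4.0],  # hypothetical low-band coefficients
                             [2.0, 6.0], group_size=2)
print(third_signal, spectrum)
```

  • The assembled spectrum would then be passed to the inverse frequency domain transform to produce the extended audio signal.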
  • Decoding the basic frame to obtain the basic audio signal specifically includes: if the basic frame includes multiple frequency domain coefficients of the low-frequency signal and multiple group envelope values of the high-frequency signal, obtaining from the basic frame multiple frequency domain coefficients of the low-frequency signal and multiple frequency domain coefficients of the high-frequency signal, where each frequency domain coefficient of the high-frequency signal is the group envelope value of its group; and performing an inverse frequency domain transform on the frequency domain coefficients of the low-frequency signal and of the high-frequency signal to obtain the basic audio signal.
  • Jointly decoding the basic frame and the extended frame to obtain the extended audio signal specifically includes: if the extended frame includes the differences between the multiple frequency domain coefficients of the high-frequency signal and the corresponding group envelope values, obtaining the multiple frequency domain coefficients of the high-frequency signal from the multiple group envelope values of the high-frequency signal and those differences; and performing an inverse frequency domain transform on the multiple frequency domain coefficients of the low-frequency signal and of the high-frequency signal to obtain the extended audio signal.
  • Jointly decoding the basic frame and the extended frame to obtain the extended audio signal specifically includes: if the extended frame includes multiple group envelope values of the low-frequency signal and multiple group envelope values of the high-frequency signal, obtaining multiple frequency domain coefficients of the low-frequency signal and obtaining multiple frequency domain coefficients of the high-frequency signal from the group envelope values of the high-frequency signal. The frequency domain coefficients of the low-frequency signal are determined either by a frequency domain transform of the basic audio signal obtained from the basic frame, or from the multiple group envelope values of the low-frequency signal in the extended frame, in which case each frequency domain coefficient of the low-frequency signal is the group envelope value of its group. An inverse frequency domain transform is then performed on the frequency domain coefficients of the low-frequency signal and of the high-frequency signal to obtain the extended audio signal.
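  • The choice between the two sources of low-frequency coefficients can be sketched as a simple fallback: use coefficients derived from the basic frame when it arrived, otherwise expand the low-frequency envelopes carried in the extended frame. The dictionary layout, the field names `low_env` and `high_env`, and the function names are assumptions for the example.

```python
def expand_envelopes(envelopes, group_size):
    """Set every coefficient in a group to that group's envelope value."""
    return [e for e in envelopes for _ in range(group_size)]


def decode_extended_spectrum(extended, group_size, basic_low=None):
    """Build the spectrum for the extended audio signal.

    basic_low holds low-frequency coefficients derived from the basic frame,
    or None if the basic frame was lost.
    """
    if basic_low is not None:
        low = list(basic_low)
    else:
        low = expand_envelopes(extended["low_env"], group_size)
    high = expand_envelopes(extended["high_env"], group_size)
    return low + high


extended = {"low_env": [9.0, 5.0], "high_env": [2.0, 6.0]}  # hypothetical frame
with_basic = decode_extended_spectrum(extended, 2, basic_low=[10.0, 8.0, 6.0, 4.0])
without_basic = decode_extended_spectrum(extended, 2)
print(with_basic, without_basic)
```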
  • Performing the inverse frequency domain transform on the frequency domain coefficients specifically includes: obtaining the audio analog signal corresponding to the frequency domain coefficients with an inverse modified discrete cosine transform (IMDCT) algorithm.
  • The group envelope value is the average of the frequency domain coefficients in each group, obtained by dividing the multiple frequency domain coefficients evenly into groups in order from low frequency to high frequency.
  • In a third aspect, an audio signal processing device includes: a preprocessing module, configured to sample and quantize an acquired first audio signal to obtain a second audio signal; an encoding module, configured to encode the second audio signal with a first encoding method, using a first duration as the unit, to obtain a basic frame, and to encode the second audio signal with a second encoding method, using a second duration as the unit, to obtain an extended frame, where the second duration is greater than the first duration, and the first and second encoding methods encode different signals carried in the second audio signal and/or encode the second audio signal to different encoding degrees; and a sending module, configured to send the basic frame and the extended frame to a second device.
  • The second duration is N times the first duration, where N is a natural number greater than or equal to 2.
  • The encoding module is specifically configured to: down-sample the second audio signal to obtain the low-frequency signal carried in the second audio signal; and encode the low-frequency signal with a time-domain encoding method to obtain multiple basic frames whose frame length is the first duration.
  • The encoding module is specifically configured to: perform a frequency domain transform on the second audio signal to obtain the frequency domain coefficients corresponding to the second audio signal; divide the high-frequency part of the frequency domain coefficients evenly into groups, in order from low frequency to high frequency, to obtain the group envelope values of multiple high-frequency groups, where a group envelope value is the average of the high-frequency frequency domain coefficients in a group; and encode the group envelope values to obtain multiple extended frames whose frame length is the second duration.
  • The encoding module is specifically configured to: perform a frequency domain transform on the second audio signal to obtain multiple frequency domain coefficients of the low-frequency signal and multiple frequency domain coefficients of the high-frequency signal corresponding to the second audio signal; divide the frequency domain coefficients of the high-frequency signal evenly into groups, in order from low frequency to high frequency, to obtain the group envelope values of multiple high-frequency groups, where a group envelope value is the average of the high-frequency frequency domain coefficients in a group; and encode the frequency domain coefficients of the low-frequency signal together with the group envelope values of the high-frequency signal to obtain multiple basic frames whose frame length is the first duration.
  • The encoding module is specifically configured to encode, with the second duration as the unit, the differences between the multiple frequency domain coefficients of the high-frequency signal and the corresponding group envelope values, to obtain multiple extended frames whose frame length is the second duration.
  • The encoding module is specifically configured to: perform a frequency domain transform on the second audio signal to obtain multiple frequency domain coefficients of the low-frequency signal and multiple frequency domain coefficients of the high-frequency signal corresponding to the second audio signal; divide the frequency domain coefficients of the low-frequency signal and those of the high-frequency signal each evenly into groups, in order from low frequency to high frequency, to obtain the corresponding group envelope values, where a group envelope value is the average of the frequency domain coefficients in a group; and encode the group envelope values to obtain multiple extended frames whose frame length is the second duration.
  • The frequency domain transform specifically includes a modified discrete cosine transform (MDCT) algorithm.
  • In a fourth aspect, an audio signal processing device includes: a receiving module, configured to receive a basic frame and an extended frame sent by a first device, where the frame length of the extended frame is greater than the frame length of the basic frame, and the extended frame is obtained by re-encoding the audio signals corresponding to multiple basic frames; and a decoding module, configured to decode the basic frame to obtain a basic audio signal, or to jointly decode the basic frame and the extended frame to obtain an extended audio signal.
  • The decoding module is specifically configured to decode the basic frame with a time-domain codec to obtain the basic audio signal.
  • The decoding module is specifically configured to: if the extended frame includes multiple group envelope values of the high-frequency signal, obtain multiple frequency domain coefficients of the high-frequency signal from those group envelope values, where each frequency domain coefficient of the high-frequency signal is the group envelope value of its group; up-sample the basic audio signal to obtain a third audio signal; perform a frequency domain transform on the third audio signal frame by frame to obtain multiple frequency domain coefficients of the low-frequency signal corresponding to the third audio signal; and perform an inverse frequency domain transform on the frequency domain coefficients of the high-frequency signal and of the low-frequency signal to obtain the extended audio signal.
  • The decoding module is specifically configured to: if the basic frame includes multiple frequency domain coefficients of the low-frequency signal and multiple group envelope values of the high-frequency signal, obtain from the basic frame multiple frequency domain coefficients of the low-frequency signal and multiple frequency domain coefficients of the high-frequency signal, where each frequency domain coefficient of the high-frequency signal is the group envelope value of its group; and perform an inverse frequency domain transform on the frequency domain coefficients of the low-frequency signal and of the high-frequency signal to obtain the basic audio signal.
  • The decoding module is specifically configured to: if the extended frame includes the differences between the multiple frequency domain coefficients of the high-frequency signal and the corresponding group envelope values, obtain the multiple frequency domain coefficients of the high-frequency signal from the multiple group envelope values of the high-frequency signal and those differences; and perform an inverse frequency domain transform on the multiple frequency domain coefficients of the low-frequency signal and of the high-frequency signal to obtain the extended audio signal.
  • The decoding module is specifically configured to: if the extended frame includes multiple group envelope values of the low-frequency signal and multiple group envelope values of the high-frequency signal, obtain multiple frequency domain coefficients of the low-frequency signal and obtain multiple frequency domain coefficients of the high-frequency signal from the group envelope values of the high-frequency signal. The frequency domain coefficients of the low-frequency signal are determined either by a frequency domain transform of the basic audio signal obtained from the basic frame, or from the multiple group envelope values of the low-frequency signal in the extended frame, in which case each frequency domain coefficient of the low-frequency signal is the group envelope value of its group. The decoding module then performs an inverse frequency domain transform on the frequency domain coefficients of the low-frequency signal and of the high-frequency signal to obtain the extended audio signal.
  • The inverse frequency domain transform specifically includes an inverse modified discrete cosine transform (IMDCT) algorithm.
  • The group envelope value is the average of the frequency domain coefficients in each group, obtained by dividing the multiple frequency domain coefficients evenly into groups in order from low frequency to high frequency.
  • In a fifth aspect, an electronic device includes: a processor and a transmission interface; and a memory for storing instructions executable by the processor; where the processor is configured to execute the instructions so that the electronic device implements the audio signal processing method according to the first aspect or any implementation of the first aspect.
  • In a sixth aspect, an electronic device includes: a processor and a transmission interface; and a memory for storing instructions executable by the processor; where the processor is configured to execute the instructions so that the electronic device implements the audio signal processing method according to the second aspect or any implementation of the second aspect.
  • In a seventh aspect, a computer-readable storage medium is provided. When instructions in the computer-readable storage medium are executed by a processor of an electronic device, the electronic device can execute the audio signal processing method according to the first aspect or any implementation of the first aspect.
  • An eighth aspect provides a computer program product which, when run on a computer, causes the computer to execute the audio signal processing method according to the first aspect or any implementation of the first aspect.
  • In a ninth aspect, a computer-readable storage medium is provided. When instructions in the computer-readable storage medium are executed by a processor of an electronic device, the electronic device can execute the audio signal processing method according to the second aspect or any implementation of the second aspect.
  • In a tenth aspect, a computer program product is provided. When the computer program product runs on a computer, the computer executes the audio signal processing method according to the second aspect or any implementation of the second aspect.
  • Any audio signal processing device, electronic device, computer-readable storage medium, or computer program product provided above can be used to execute the corresponding method provided above. For the beneficial effects they can achieve, refer to the beneficial effects of the corresponding method, which are not repeated here.
  • FIG. 1 is a schematic diagram of an application scenario of an audio signal processing method provided by an embodiment of this application.
  • FIG. 2 is a schematic flowchart of an audio signal processing method provided by an embodiment of this application.
  • FIG. 3 is a schematic diagram of the processing process of an audio signal processing method provided by an embodiment of the application.
  • FIG. 4 is a schematic diagram of an audio signal encoding frame provided by an embodiment of the application.
  • FIG. 5 is a schematic flowchart of another audio signal processing method provided by an embodiment of the application.
  • FIG. 6 is a schematic diagram of an audio signal processing device provided by an embodiment of the application.
  • FIG. 7 is a schematic diagram of another audio signal processing device provided by an embodiment of the application.
  • FIG. 8 is a schematic structural diagram of an electronic device provided by an embodiment of the application.
  • first and second are only used for descriptive purposes, and cannot be understood as indicating or implying relative importance or implicitly indicating the number of indicated technical features. Therefore, the features defined with “first” and “second” may explicitly or implicitly include one or more of these features. In the description of the present embodiment, unless otherwise specified, “plurality” means two or more.
  • The embodiments of the present application provide an audio signal processing method and device that can be applied to the transmission of audio signals between multiple electronic devices. To serve the different audio processing requirements of different applications, the audio signal is flexibly encoded and decoded based on basic frames and extended frames, so that audio processing with different delay requirements or different quality requirements can be met. This solves the prior-art problems of repeated transmission and wasted bandwidth that arise when the same audio signal is transmitted between multiple electronic devices for audio applications with different real-time and restoration-quality requirements.
  • The audio signal processing method provided by the embodiment of the present application can be applied to a system that includes at least two electronic devices with audio signal processing capability, between which data can be transmitted.
  • the audio signal can be transmitted through a wired network, a wireless local area network, Near Field Communication (NFC), or Bluetooth.
  • The electronic device may be a mobile phone, a smart speaker, a smart headset, a tablet computer, a desktop computer, a laptop, a handheld computer, a notebook computer, a vehicle-mounted device, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), or an augmented reality (AR) or virtual reality (VR) device.
  • the electronic device 1 may be a mobile phone
  • the electronic device 2 may be a smart headset.
  • the embodiment of the application provides an audio signal processing method, which is applied to a first device and a second device. As shown in Figure 2, the method may include:
  • the first device performs sampling and quantization processing on the acquired first audio signal to obtain a second audio signal.
  • The first audio signal may be an audio signal collected by the first device, an audio signal stored locally on the first device, or an audio signal received from another device.
  • the first audio signal needs to be sampled and quantized to obtain a digital signal to save transmission bandwidth.
  • The basic processing procedure is shown in FIG. 3. After the first audio signal is sampled and quantized, the second audio signal s(n) is obtained, where n indexes the audio sampling points in chronological order. If the audio signal is sampled at a frequency of 16 kHz, that is, 16 × 10^3 sampling points per second, the time interval between two adjacent sampling points is 0.0625 ms.
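  • The sampling interval quoted above follows directly from the sampling frequency. The following minimal sketch reproduces the arithmetic; the function name `sampling_interval_ms` is illustrative and does not come from the application.

```python
def sampling_interval_ms(sample_rate_hz: float) -> float:
    """Return the time between two adjacent sampling points, in milliseconds."""
    return 1000.0 / sample_rate_hz


# At 16 kHz there are 16 x 10^3 sampling points per second.
print(sampling_interval_ms(16_000))  # 0.0625
```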
  • the quantized value corresponding to the sampling point of the audio signal is encoded into a binary digital signal, which can be transmitted.
  • different quantization precisions can be used to represent the quantization value of the sampling point, for example, it can be represented by 16 bits, 24 bits, or 32 bits.
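  • As an illustration of quantization precision, the sketch below maps a sample to a signed integer code of 16, 24, or 32 bits and prints its binary form. It assumes a uniform quantizer and samples normalized to the range [-1.0, 1.0]; the application does not specify the quantizer, and the names `quantize` and `to_binary` are hypothetical.

```python
def quantize(sample: float, bits: int = 16) -> int:
    """Map a sample in [-1.0, 1.0] to a signed integer code of the given width.

    Assumption: uniform quantization with clipping of out-of-range samples.
    """
    max_code = 2 ** (bits - 1) - 1   # e.g. 32767 for 16 bits
    min_code = -(2 ** (bits - 1))    # e.g. -32768 for 16 bits
    code = round(sample * max_code)
    return max(min_code, min(max_code, code))


def to_binary(code: int, bits: int = 16) -> str:
    """Return the two's-complement bit string of a quantized code."""
    return format(code & (2 ** bits - 1), f"0{bits}b")


for width in (16, 24, 32):
    print(width, to_binary(quantize(0.25, width), width))
```

  • A wider code gives a finer quantization step at the cost of more bits per sampling point.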
  • The first device encodes the second audio signal frame by frame using the first encoding method, in units of the first duration, to obtain basic frames, and encodes the second audio signal frame by frame using the second encoding method, in units of the second duration, to obtain extended frames.
  • the second duration is greater than the first duration, and therefore, the frame length of the extended frame is greater than the frame length of the basic frame.
  • the second audio signal of a fixed duration can be used as an interval, and after each frame of the second audio signal is collected and quantized, the second audio signal of this frame can be compressed and encoded, and then sent after being encoded frame by frame.
  • the second audio signal is encoded according to different time intervals, that is, different frame lengths, to generate two or more encoded frames, including a basic frame and an extended frame.
  • Current audio coding technology can only approximate the original audio signal arbitrarily closely; that is, the encoding and decoding rules of the audio signal mean that digital encoding and decoding always distort the audio signal to some degree and cannot restore the original audio signal completely.
  • the encoding method involved in this application is a lossy encoding technology.
  • the basic frame or the extended frame in the embodiment of the present application can only encode a part of the first audio signal, but not all of it.
  • the extended frame may be obtained by re-encoding the second audio signal segments corresponding to multiple basic frames, and the extended frame may further encode audio signals in the basic frame that are not encoded or have insufficient encoding precision.
  • the first encoding method and the second encoding method may respectively encode different signals carried in the second audio signal.
  • the low-frequency signal part carried in the second audio signal is encoded according to the first encoding method to obtain a basic frame
  • the high-frequency signal part carried in the second audio signal is encoded according to the second encoding method to obtain an extended frame.
  • The first encoding method and the second encoding method can also encode the second audio signal at different encoding levels, yielding an encoded frame with lower encoding quality and an encoded frame with higher encoding quality, which are then transmitted to the decoding side for decoding. The decoding side can therefore recover different audio signals from the basic frame or from the extended frame. The audio signal recovered from the extended frame combined with the basic frame is less distorted relative to the original audio signal, so its encoding quality is better.
  • The encoding quality of the audio signal refers to how faithfully the audio signal recovered after decoding reproduces the original audio signal before encoding and compression. That is, the longer the frame length used to encode the second audio signal, the higher the reproduction fidelity and the lower the distortion of the decoded audio signal relative to the original audio signal.
  • the basic frame may be a lower delay and/or lower quality encoding of the current second audio signal
  • the first device may separately transmit the basic frame to the second device frame by frame.
  • the audio signal can be obtained by decoding according to a preset decoding mode, so as to be applied to audio applications that require low delay or relatively low audio quality.
  • the extended frame may perform higher delay and/or higher quality encoding on the current second audio signal.
  • the frame length of the extended frame is greater than the frame length of the basic frame, and the extended frame encoding transmits enhanced information for multiple basic frame audio signals, and further encodes data that is not included in the basic frame or incompletely encoded in the audio signal.
  • After receiving the extended frames frame by frame, the second device can decode them jointly with the basic frames to obtain an audio signal with higher audio quality, which suits audio applications that do not need strong real-time performance but need relatively high audio quality.
  • the first device may encode the second audio signal in a unit of a first duration to obtain a basic frame; the first device encodes the second audio signal in a unit of a second duration to obtain an extended frame.
  • the second duration may be N times the first duration, and N is a natural number greater than or equal to 2.
  • the first duration is the frame length of the basic frame, that is, the time interval between two basic frames
  • the second duration is the frame length of the extended frame, that is, the time interval between two extended frames.
  • t1, t2, t3, t4, t5, t6, t7, and t8 represent the basic frames of audio coding.
  • The algorithmic delay of the basic frame is about Δt, that is, the time interval between two basic frames is Δt.
  • T1 and T2 represent the extended frames of audio coding.
  • the extended frame compression is performed once every four basic frames as an example.
  • the basic frame or the extended frame contains the digitized audio sample data.
  • The time delay Δt may be 0.5 ms or 5 ms; Δt and ΔT depend on the design of the coding structure and on actual application requirements. For example, when the sampling frequency is 16 kHz and the frame length of the basic frame is 5 ms, each basic frame contains 80 audio sampling points.
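  • The frame arithmetic above can be sketched as follows. The sketch computes the number of sampling points per frame and shows which extended frame covers each basic frame when one extended frame is produced for every four basic frames. The function names and the 0-based indexing are assumptions made for illustration.

```python
def samples_per_frame(sample_rate_hz: int, frame_ms: float) -> int:
    """Number of audio sampling points contained in one frame."""
    return round(sample_rate_hz * frame_ms / 1000)


def extended_frame_index(basic_frame_index: int, n: int) -> int:
    """0-based index of the extended frame that covers a 0-based basic frame,
    assuming one extended frame per n consecutive basic frames."""
    return basic_frame_index // n


print(samples_per_frame(16_000, 5))    # 80
print(samples_per_frame(16_000, 0.5))  # 8

# Basic frames t1..t8 (indices 0..7) with N = 4: t1..t4 -> T1, t5..t8 -> T2.
print([extended_frame_index(i, 4) for i in range(8)])  # [0, 0, 0, 0, 1, 1, 1, 1]
```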
  • S203 The first device sends the basic frame and the extended frame to the second device.
  • the first device may transmit the basic frame to the second device frame by frame after encoding the basic frame, and the first device may transmit the extended frame to the second device frame by frame after encoding the extended frame. Therefore, after receiving the basic frame or the extended frame, the second device decodes the basic frame or the extended frame to recover the audio signal, which is used for different audio applications.
  • The second device receives the digital signal sent by the first device, which includes a basic frame or an extended frame, decodes it according to a preset encoding and decoding method, and restores the audio signal.
  • the specific process may include:
  • the second device receives the basic frame and the extended frame sent from the first device, where the frame length of the extended frame is greater than the frame length of the basic frame, and the extended frame is obtained by re-encoding audio signals corresponding to multiple basic frames.
  • S502 The second device decodes the basic frame to obtain the basic audio signal, or jointly decodes the basic frame and the extended frame to obtain the extended audio signal.
  • the second device decodes the received basic frame or extended frame according to the preset codec rules, that is, the second device decodes the digital signal to obtain an analog signal, so as to meet the audio signal requirements of different audio applications on the second device.
  • After receiving a basic frame, the second device decodes it to obtain the corresponding basic audio signal s_1(n). After receiving an extended frame, the second device decodes the extended frame jointly with the basic frames to obtain the corresponding extended audio signal s_2(n).
  • The basic audio signal s_1(n) and the extended audio signal s_2(n) carry the same audio content, but their transmission delay and audio quality differ.
  • The audio quality of the basic audio signal s_1(n) is slightly worse than that of the extended audio signal s_2(n), and the transmission delay of s_1(n) is lower than that of s_2(n).
  • audio applications with different delay requirements between the encoding side and the decoding side can be transmitted using the same set of encoding schemes, that is, the encoding side only obtains one audio signal, but it can meet different delay requirements.
  • the basic frame and the extended frame are respectively encoded, so that the decoding side can decode different audio signals according to the two encoded frames to meet the needs of different audio applications.
  • the audio signal decoded according to the basic frame has a low delay, but the audio signal quality is poor.
  • the audio signal decoded according to the extended frame combined with the basic frame has a longer time delay, but the audio signal quality is better, and the distortion of the original audio signal is small.
  • The decoding side can recover two or more audio signals from the basic frames and extended frames, while only one audio signal is encoded on the encoding side.
  • This encoding method reduces redundant information, avoids the repeated transmission and wasted bandwidth that result when the encoding side encodes the same audio signal separately for each application, and greatly reduces system overhead.
  • the first device may use a time-domain encoding method with a lower delay to obtain the basic frame, that is, only encode the low frequency part of the second audio signal.
  • the first device uses a higher time-delay frequency domain coding method to obtain the extended frame, and the extended frame only includes the high frequency part of the second audio signal.
  • For one application, the required audio signal has strong real-time requirements: the signal transmission delay must not exceed 1 ms. Its audio quality requirement is not high, so the audio signal may contain only low-frequency signals and no high-frequency signals.
  • the other is voice enhancement applications.
  • the required audio signal is not real-time, and the signal transmission delay does not exceed 6ms, but the audio quality is relatively high, and both high-frequency and low-frequency signals are required.
  • the encoding of the basic frame by the first device may specifically include:
  • the first device down-samples the second audio signal to obtain the low-frequency signal included in the second audio signal.
  • Down-sampling is a processing method in which a sample sequence is sampled once every several samples to obtain a new sequence.
  • the bandwidth of the second audio signal obtained by quantization may be half of the sampling rate, that is, the bandwidth may be 8 kHz.
  • The second audio signal covers the frequency band 0-8 kHz, in which the low-frequency signal s_L(n) occupies 0-4 kHz and the high-frequency signal s_H(n) occupies 4-8 kHz.
  • The second audio signal is down-sampled by a factor of 2 to obtain the low-frequency signal s_L(n), which covers 0-4 kHz.
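  • A minimal sketch of down-sampling by a factor of 2 is given below. The application does not describe the anti-aliasing filter, so the 3-tap smoothing filter used here is only a placeholder assumption, and `downsample_by_2` is a hypothetical name; a practical encoder would use a properly designed low-pass filter before discarding samples.

```python
def downsample_by_2(samples):
    """Crude 2x down-sampling: smooth with a 3-tap filter, keep every second sample.

    Assumption: the [0.25, 0.5, 0.25] filter stands in for a real anti-aliasing
    low-pass filter; edge samples are handled by repeating the boundary value.
    """
    smoothed = []
    for i in range(len(samples)):
        prev = samples[i - 1] if i > 0 else samples[i]
        nxt = samples[i + 1] if i + 1 < len(samples) else samples[i]
        smoothed.append(0.25 * prev + 0.5 * samples[i] + 0.25 * nxt)
    return smoothed[::2]


# 16 sampling points at 16 kHz become 8 sampling points at 8 kHz.
s = [float(i % 4) for i in range(16)]
s_low = downsample_by_2(s)
print(len(s), len(s_low))  # 16 8
```

  • The placeholder filter fully suppresses a component at half the original sampling rate, which is the component that would alias most severely after every second sample is dropped.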
  • the time domain coding is to encode the waveform of the audio signal.
  • Time-domain coding includes coding standards such as International Telecommunication Union (ITU) G.726, G.723.1, and G.728. These coding standards widely use code-excited linear prediction technology, which models the human speech production mechanism and uses the inherent characteristics of the human glottis and vocal tract to remove redundant information from audio signals, maintaining high audio quality while greatly reducing the bit rate required for audio coding.
  • The first device may use the G.726 encoding method to encode s_L(n) and assemble basic frames at intervals of the first duration, so that the frame length of the basic frames is the first duration.
  • For example, the first duration may be 0.5 ms: each 0.5 ms segment of s_L(n) is encoded in turn, and the resulting digital signal is a basic frame.
  • G.726 is a speech coding and decoding algorithm that can encode audio signals into digital signals with lower delay.
  • the encoding of the extended frame by the first device may specifically include:
  • The principle of frequency domain coding is to encode audio signals in the frequency domain by exploiting how the human ear perceives sound: the frequency bands that listeners attend to are coded with emphasis, while frequency bands that are masked by other bands or are not easily perceived are quantized coarsely or not quantized at all.
  • The advantage of frequency domain coding is that it removes a certain amount of redundancy according to the characteristics of the human ear. The coding effect is therefore almost equivalent across various kinds of audio signals, and for signals such as music in particular the coding quality is higher than that of time domain coding.
  • The frequency domain transform may be, for example, the Modified Discrete Cosine Transform (MDCT).
  • The MDCT is an algorithm that transforms a signal from the time domain to the frequency domain; the resulting coefficients represent the frequency domain component at each frequency point.
  • Applying the MDCT to the second audio signal s(n) yields the MDCT coefficients S(k), which are the frequency domain representation of the second audio signal.
  • For example, when the second duration is 5 ms, that is, the frame length for encoding the extended frame is 5 ms, and the sampling rate is 16 kHz, the MDCT is performed on each 5 ms segment of s(n) in turn to obtain the corresponding MDCT coefficients S(k), where k ranges from 0 to 79.
  • The frequency domain index k starts from 0 and runs from low frequency to high frequency. The low-frequency frequency domain coefficients, from low to high, are therefore S(0) to S(39), and the high-frequency frequency domain coefficients, from low to high, are S(40) to S(79).
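  • The transform step can be sketched as follows. The MDCT formula used by the application is not reproduced in this text, so the sketch uses the common textbook definition, in which a block of 2N time samples yields N coefficients; the sine window, the 2/N scaling of the inverse, and the names `sine_window`, `mdct`, `imdct`, and `split_bands` are assumptions chosen for illustration, not the application's own implementation.

```python
import math


def sine_window(n_coeffs):
    """Sine window of length 2N (satisfies the Princen-Bradley condition)."""
    length = 2 * n_coeffs
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(block, n_coeffs):
    """Direct O(N^2) MDCT: 2N windowed time samples -> N coefficients S(0)..S(N-1)."""
    window = sine_window(n_coeffs)
    n0 = 0.5 + n_coeffs / 2
    return [
        sum(
            window[n] * block[n]
            * math.cos(math.pi / n_coeffs * (n + n0) * (k + 0.5))
            for n in range(2 * n_coeffs)
        )
        for k in range(n_coeffs)
    ]


def imdct(coeffs):
    """Inverse MDCT: N coefficients -> 2N windowed samples for overlap-add."""
    n_coeffs = len(coeffs)
    window = sine_window(n_coeffs)
    n0 = 0.5 + n_coeffs / 2
    return [
        window[n] * (2.0 / n_coeffs) * sum(
            coeffs[k] * math.cos(math.pi / n_coeffs * (n + n0) * (k + 0.5))
            for k in range(n_coeffs)
        )
        for n in range(2 * n_coeffs)
    ]


def split_bands(coeffs):
    """Split S(k) into the low-frequency half and the high-frequency half."""
    half = len(coeffs) // 2
    return coeffs[:half], coeffs[half:]


# 80 coefficients per block at 16 kHz; each block spans 160 samples here.
block = [math.sin(2 * math.pi * 1000 * n / 16_000) for n in range(160)]
low, high = split_bands(mdct(block, 80))
print(len(low), len(high))  # 40 40
```

  • Under these assumptions consecutive blocks overlap by N samples, and adding the overlapping halves of successive `imdct` outputs cancels the time-domain aliasing, so the interior of the signal is reconstructed.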
  • The frequency domain coefficients in the high-frequency part of the second audio signal are divided evenly into groups in order from low frequency to high frequency, the group envelope value of each high-frequency group is obtained, and the group envelope values are then encoded using envelope encoding.
  • For example, the 40 high-frequency frequency domain coefficients S(40) to S(79) above are divided equally into 8 groups of five coefficients each, as follows:
  • Group 1 contains high frequency frequency domain coefficients: S(40) to S(44);
  • Group 2 contains high frequency frequency domain coefficients: S(45) to S(49);
  • Group 3 contains high frequency frequency domain coefficients: S(50) to S(54);
  • Group 4 contains high frequency frequency domain coefficients: S(55) to S(59);
  • Group 5 contains high frequency frequency domain coefficients: S(60) to S(64);
  • Group 6 contains high frequency frequency domain coefficients: S(65) to S(69);
  • Group 7 contains high frequency frequency domain coefficients: S(70) to S(74);
  • Group 8 contains high frequency frequency domain coefficients: S(75) to S(79).
  • the group envelope values of the multiple high-frequency groups are obtained, where the group envelope value is the average value of the multiple high-frequency frequency domain coefficients in each group.
  • the first device can obtain the group envelope value of each group of the high-frequency part of the second audio signal, and then encode according to the group envelope value to obtain multiple extended frames with the second duration as the frame length.
  • The group envelope value may specifically be calculated as follows, where $g = 0, \dots, 7$ indexes the groups and group $g + 1$ has the envelope value $S_{HE}(g)$:
  • $S_{HE}(g) = \frac{1}{5} \sum_{i=0}^{4} S(40 + 5g + i)$;
  • Group 2 envelope value: $S_{HE}(1) = [S(45)+S(46)+S(47)+S(48)+S(49)]/5$;
  • Group 8 envelope value: $S_{HE}(7) = [S(75)+S(76)+S(77)+S(78)+S(79)]/5$.
  • The first device may digitally encode the group envelope values of the multiple high-frequency groups obtained above and send them to the second device frame by frame. For example, every 5 ms, the first device assembles the encoded values S_HE(0) to S_HE(7) into an extended frame and sends it to the second device.
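  • The grouping and averaging described above can be sketched as follows. The function name `group_envelopes` and its parameters are hypothetical; the sketch assumes 80 coefficients per frame, a high band starting at index 40, and groups of five coefficients, as in the example in the text.

```python
def group_envelopes(coeffs, first_high=40, group_size=5):
    """Average each consecutive group of high-frequency coefficients.

    Returns [S_HE(0), S_HE(1), ...]; assumes the number of high-frequency
    coefficients is a multiple of group_size.
    """
    high = coeffs[first_high:]
    return [
        sum(high[i:i + group_size]) / group_size
        for i in range(0, len(high), group_size)
    ]


# Toy spectrum in which S(k) = k, so each envelope is the middle index of its group.
spectrum = [float(k) for k in range(80)]
print(group_envelopes(spectrum))
# [42.0, 47.0, 52.0, 57.0, 62.0, 67.0, 72.0, 77.0]
```

  • With these parameters each extended frame carries 8 envelope values instead of 40 high-frequency coefficients.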
  • Correspondingly, the second device receives a basic frame at regular intervals and decodes it using the time-domain decoding method to obtain the first audio signal, which contains only the low frequency part of the original audio signal on the encoding side.
  • the second device receives an extended frame at regular intervals, and the extended frame only contains the high frequency part of the original audio signal.
  • the second device combines the extended frame with the basic frame for comprehensive decoding to obtain the second audio signal.
  • the second audio signal includes not only a low frequency part, but also a high frequency part.
  • For example, the second device can receive a basic frame every 0.5 ms and decode it using the G.726 decoding mode to obtain the basic audio signal s_1(n).
  • The basic audio signal s_1(n) has only a low frequency part, but its time delay is as low as 0.5 ms. It can therefore be applied to audio applications that require low latency, such as equipment calibration and positioning applications.
  • When the extended frame received by the second device includes the group envelope values of multiple high-frequency groups, the second device obtains the high-frequency frequency domain coefficients of the high-frequency signal from these group envelope values; that is, each high-frequency frequency domain coefficient is set to the group envelope value of the group it belongs to.
  • The basic audio signal is up-sampled to obtain a third audio signal, and the third audio signal is transformed to the frequency domain frame by frame to obtain the low-frequency frequency domain coefficients of the low-frequency signal.
  • the audio signal recovered by the second device according to the multiple high-frequency frequency domain coefficients and the multiple low-frequency frequency domain coefficients is the extended audio signal.
  • For example, the second device may receive an extended frame every 5 ms and obtain the group envelope values S_HE(0) to S_HE(7) of the high frequency part of the audio signal from the extended frame.
  • The high-frequency frequency domain coefficients can be obtained from the group envelope values: each high-frequency frequency domain coefficient of the audio signal is set equal to the group envelope value of its group, namely $S(k) = S_{HE}(\lfloor (k - 40)/5 \rfloor)$ for $k = 40, \dots, 79$.
  • Up-sampling inserts one or more zero-valued points between every two adjacent points of the original signal.
  • Up-sampling the basic audio signal in this way yields the third audio signal s'_L(n), with a bandwidth of 8 kHz and a sampling rate of 16 kHz; the high frequency part of s'_L(n) is still 0.
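  • The zero-insertion step can be sketched as follows. The name `upsample_zero_insert` is hypothetical. The text describes only the insertion of zeros; in common practice an interpolation low-pass filter follows the insertion to suppress spectral images in the upper band, and that filter is omitted here.

```python
def upsample_zero_insert(samples, factor=2):
    """Insert (factor - 1) zero-valued points after every original sample."""
    out = []
    for value in samples:
        out.append(value)
        out.extend([0.0] * (factor - 1))
    return out


# An 8 kHz sequence becomes a 16 kHz sequence with twice as many points.
print(upsample_zero_insert([1.0, 2.0, 3.0]))  # [1.0, 0.0, 2.0, 0.0, 3.0, 0.0]
```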
  • The frequency domain coefficients S'_L(k) can be obtained by performing the MDCT on the low-frequency audio signal s'_L(n); the low-frequency coefficients of S(k) are then taken as follows:
  • $S(k) = S'_L(k), \quad k = 0, \dots, 39$.
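  • The assembly of the full spectrum on the decoding side can be sketched as follows: the low-frequency coefficients come from the transformed, up-sampled basic audio signal, and each high-frequency coefficient is set to the envelope value of its group. The names `expand_envelopes` and `assemble_spectrum` are hypothetical, and the sizes (40 low-frequency coefficients, 8 groups of five) follow the example in the text.

```python
def expand_envelopes(envelopes, group_size=5):
    """Repeat each group envelope value for every coefficient in its group."""
    return [value for value in envelopes for _ in range(group_size)]


def assemble_spectrum(low_coeffs, envelopes, group_size=5):
    """Concatenate S(0)..S(39) with S(40)..S(79) restored from the envelopes."""
    return list(low_coeffs) + expand_envelopes(envelopes, group_size)


low = [1.0] * 40                      # stands in for S'_L(0)..S'_L(39)
env = [float(g) for g in range(8)]    # stands in for S_HE(0)..S_HE(7)
spectrum = assemble_spectrum(low, env)
print(len(spectrum), spectrum[40:45], spectrum[75:])
# 80 [0.0, 0.0, 0.0, 0.0, 0.0] [7.0, 7.0, 7.0, 7.0, 7.0]
```

  • The resulting list is the input to the inverse transform that produces the extended audio signal.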
  • The inverse modified discrete cosine transform (inverse MDCT) is performed on S(k) to obtain the extended audio signal s_2(n), which includes both high-frequency and low-frequency components.
  • The specific formula of the inverse MDCT is as follows:
  • The audio signal s_1(n) decoded from the basic frame has only low-frequency components and lower decoding quality, but its delay is low, so it can be used for audio service applications that do not need high audio quality but need low audio delay.
  • The audio signal s_2(n) obtained by jointly decoding the extended frame and the basic frame has both high-frequency and low-frequency components and higher decoding quality, but its delay is longer, so it can be used for audio service applications that need higher audio quality but do not need strongly real-time audio transmission.
  • One audio signal is thus transmitted through a single codec scheme, and the different audio signals obtained by decoding can serve different audio applications, which avoids repeated encoding, decoding, and transmission, largely avoids wasting bandwidth resources, and reduces system overhead.
  • The device on the decoding side may also decode according to the extended frame alone. In that case the low-frequency frequency domain coefficients are set to 0 in the inverse frequency domain transform, and the audio signal is recovered by performing the inverse transform on the frequency domain coefficients of the high frequency part only; the recovered audio signal contains only the high frequency part.
  • the first device may adopt a time-domain encoding method with a lower delay to obtain the basic frame, that is, only encode the low-frequency part of the second audio signal.
  • the first device uses a higher time-delay frequency domain coding method to obtain the extended frame, and the extended frame only includes the high frequency part of the second audio signal.
  • the required audio signal requires strong real-time performance, the signal delay is low and does not exceed 6ms, and both high and low frequencies are required.
  • the other is a three-dimensional (3D) sound field acquisition application, which requires a higher audio signal quality and a longer signal delay.
  • the encoding of the basic frame by the first device may specifically include:
  • The first device performs the frequency domain transform on the second audio signal with the first duration as the frame length to obtain frequency domain coefficients, that is, the low-frequency frequency domain coefficients of the low-frequency signal and the high-frequency frequency domain coefficients of the high-frequency signal in the second audio signal.
  • The frequency domain coefficients of the high-frequency signal are divided evenly into groups in order from low frequency to high frequency, and the group envelope value of each high-frequency group is obtained, where the group envelope value is the average of the high-frequency frequency domain coefficients in that group.
  • For example, the first duration may be 5 ms and the sampling rate 16 kHz. The first device can perform the MDCT on the audio signal s(n) every 5 ms to obtain the MDCT coefficients S(k), where k ranges from 0 to 79.
  • The first device encodes the frequency domain coefficients S(0) to S(39) of the low-frequency signal and the group envelope values S_HE(0) to S_HE(7) of the high-frequency signal to obtain a basic frame.
  • the encoding of the extended frame by the first device may specifically include:
  • the first device uses the second duration as a unit to encode the difference between the multiple frequency domain coefficients of the high-frequency signal and the corresponding group envelope value to obtain multiple extended frames with the second duration as the frame length.
  • The differences can be calculated as follows:
  • $SD_{HE}(40) = S(40) - S_{HE}(0)$;
  • $SD_{HE}(41) = S(41) - S_{HE}(0)$;
  • $SD_{HE}(45) = S(45) - S_{HE}(1)$;
  • $SD_{HE}(79) = S(79) - S_{HE}(7)$.
  • The first device may assemble the group envelope coefficient differences SD_HE(40) to SD_HE(79) into an extended frame every 20 ms and transmit it to the second device.
  • The first device may encapsulate the group envelope coefficient differences SD_HE(40) to SD_HE(79) directly for transmission, or may encode them with differential quantization before transmission.
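  • The relationship between the basic frame and the extended frame in this mode can be sketched as follows: the encoding side subtracts the group envelope value from each high-frequency coefficient, and the decoding side adds it back. The function names are hypothetical, quantization is omitted, and the sizes follow the example in the text (high band starting at index 40, groups of five).

```python
def envelope_differences(coeffs, envelopes, first_high=40, group_size=5):
    """Encoder side: SD_HE(k) = S(k) - S_HE(g) for every high-frequency index k."""
    return {
        k: coeffs[k] - envelopes[(k - first_high) // group_size]
        for k in range(first_high, len(coeffs))
    }


def restore_high_band(differences, envelopes, first_high=40, group_size=5):
    """Decoder side: S(k) = S_HE(g) + SD_HE(k)."""
    return {
        k: envelopes[(k - first_high) // group_size] + diff
        for k, diff in differences.items()
    }


# Toy spectrum and its group envelope values (average of each group of five).
S = [float((k * k) % 17) for k in range(80)]
S_HE = [sum(S[40 + 5 * g:45 + 5 * g]) / 5 for g in range(8)]

SD_HE = envelope_differences(S, S_HE)
restored = restore_high_band(SD_HE, S_HE)
print(max(abs(restored[k] - S[k]) for k in range(40, 80)) < 1e-12)  # True
```

  • Because each envelope value is the mean of its group, the differences within a group sum to approximately zero, so the differences carry only the detail that the envelope values in the basic frame leave out.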
  • Correspondingly, the second device receives a basic frame every first duration. If the basic frame includes the frequency domain coefficients of the low-frequency signal and the group envelope values of the high-frequency signal, the second device obtains the frequency domain coefficients of the high-frequency signal from the group envelope values in the basic frame, and then performs the inverse frequency domain transform on the frequency domain coefficients of the low-frequency signal and of the high-frequency signal to obtain the first audio signal.
  • The second device receives an extended frame every second duration. If the extended frame includes the differences between the frequency domain coefficients of the high-frequency signal and the corresponding group envelope values, the second device combines these differences with the group envelope values of the high-frequency signal in the basic frame to obtain the frequency domain coefficients of the high-frequency signal, and then performs the inverse frequency domain transform on the frequency domain coefficients of the low-frequency signal and of the high-frequency signal to obtain the second audio signal.
  • the second audio signal includes not only a low frequency part, but also a high frequency part.
  • the second device can receive the basic frame every 5 ms.
  • the second device first obtains the frequency domain coefficients of the low frequency part of S(k) according to the basic frame, that is, S(0) to S(39).
  • the second device then obtains the high-frequency coefficients according to the high-frequency group envelope value in the basic frame, that is, each high-frequency frequency domain coefficient can be made equal to its corresponding group envelope value, namely:
  • The basic audio signal s_1(n) has a relatively low time delay and includes both the high frequency part and the low frequency part of the original audio signal.
  • However, the high frequency part is restored only from the group envelope values, so the coefficients within each group share the same value. The signal quality of the high frequency part is therefore slightly worse, which is equivalent to reducing the frequency domain resolution of the high frequency part.
  • For example, the second device can receive an extended frame every 20 ms and obtain from it the group envelope coefficient differences SD_HE(40) to SD_HE(79) of the high frequency part of the audio signal. From these differences it obtains the frequency domain coefficients of the high frequency part of each basic frame: each high-frequency frequency domain coefficient is the sum of its group envelope coefficient difference and the corresponding group envelope value, that is, $S(k) = S_{HE}(\lfloor (k - 40)/5 \rfloor) + SD_{HE}(k)$ for $k = 40, \dots, 79$.
  • The extended audio signal s_2(n) includes both the high frequency part and the low frequency part of the original audio signal, and its high frequency part is restored from the group envelope values combined with the group envelope coefficient differences. The extended audio signal s_2(n) therefore has higher restoration quality than the basic audio signal s_1(n) but a longer delay; in terms of real-time signal transmission, the basic audio signal s_1(n) is better than the extended audio signal s_2(n).
  • When the first device needs to meet three or more different audio application requirements on the second device, the first device may encode one basic frame and two or more extended frames.
  • the basic frame can be obtained by the first device using a time-domain coding method with a lower delay and low quality, that is, only the low frequency part of the second audio signal is encoded.
  • the first device obtains the first extended frame by adopting a frequency domain coding method with higher delay and low quality.
  • the first extended frame only encodes the envelope value of the frequency domain group of the high frequency part of the second audio signal.
  • the first device adopts a higher time delay and high-quality frequency domain coding method to obtain a second extended frame, and the second extended frame contains the high frequency part of the second audio signal.
  • For example, there are three different audio applications on the second device.
  • The first is an equipment calibration and positioning application, which needs strongly real-time audio processing: the signal transmission delay should not exceed 1 ms, and the audio signal may contain only low-frequency signals without high-frequency signals.
  • The second is a voice enhancement application, which needs fairly strong real-time audio processing, with a signal transmission delay of no more than 6 ms, and higher audio quality; both the high-frequency and the low-frequency parts of the audio signal are required.
  • The third is a 3D sound field acquisition application, which does not need strongly real-time audio processing but needs high audio quality.
  • the encoding of the basic frame by the first device may refer to the encoding manner of the basic frame in the foregoing manner 1, which may include:
  • The first device may use the G.726 encoding method to encode s_L(n) and assemble the result into a basic frame at intervals of the first duration.
  • The first duration may be 0.5 ms, which satisfies the requirement of the first audio application described above.
  • the encoding of the first extended frame by the first device may refer to the encoding process of the extended frame in the foregoing manner 1, including:
  • The multiple frequency domain coefficients in the high-frequency part of the frequency domain coefficients corresponding to the second audio signal are divided evenly into groups in order from low frequency to high frequency to obtain the group envelope values of multiple high-frequency groups, which are then encoded using the envelope coding method.
  • The first device can perform an MDCT on s(n) to obtain the MDCT frequency domain coefficients.
  • If the frame length is 5 ms and the sampling rate is 16 kHz, s(n) includes 80 sampling points per frame, and the corresponding frequency domain coefficients are S(0) to S(79).
  • The 40 high-frequency coefficients S(40) to S(79) are divided evenly into 8 groups of five coefficients each, and the group envelope values S_HE(0) to S_HE(7) of the high-frequency groups are obtained, where the group envelope value is the average value of the high-frequency frequency domain coefficients in each group.
  • The first device can digitally encode the high-frequency group envelope values S_HE(0) to S_HE(7) obtained above. Every 5 ms, the first device assembles the encoded S_HE(0) to S_HE(7) into an extended frame and sends it to the second device.
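  • The grouping and averaging step can be illustrated with a short sketch. The function name `group_envelopes` and the stand-in coefficient values are assumptions made for illustration; only the 80-coefficient frame, the 40-coefficient high band, and the 8 groups of five come from the example above.

```python
def group_envelopes(coeffs, num_groups):
    """Split coeffs evenly, in order, into num_groups groups and average each."""
    size, remainder = divmod(len(coeffs), num_groups)
    if size == 0 or remainder != 0:
        raise ValueError("coefficients must divide evenly into the groups")
    return [sum(coeffs[g * size:(g + 1) * size]) / size for g in range(num_groups)]


# Stand-in spectrum: 80 coefficients S(0)..S(79) with made-up values.
S = [float(k) for k in range(80)]
high_band = S[40:80]                  # S(40)..S(79)
S_HE = group_envelopes(high_band, 8)  # S_HE(0)..S_HE(7), five coefficients each
```

  • With this stand-in spectrum, the first group holds the values 40 to 44, so S_HE(0) is 42.0; the extended frame carries 8 values instead of 40 coefficients.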
  • the encoding of the second extended frame in the foregoing step S202 may refer to the encoding process of the extended frame in the second manner, including:
  • The first device encodes, in units of the third duration, the differences between the multiple frequency domain coefficients of the high-frequency signal and the corresponding group envelope values to obtain multiple extended frames with the third duration as the frame length.
  • Every 20 ms, the first device may calculate the difference between each high-frequency frequency domain coefficient and the group envelope value of its high-frequency group, as obtained when encoding the first extended frame. Specifically, the corresponding group envelope value can be subtracted from each high-frequency frequency domain coefficient to obtain the group envelope coefficient differences SD_HE(40) to SD_HE(79). The first device can then assemble these group envelope coefficient differences SD_HE(40) to SD_HE(79) into a second extended frame every 20 ms and transmit it to the second device.
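  • The difference computation on the encoding side can be sketched as below. The helper names `group_means` and `envelope_differences` and the made-up coefficient values are hypothetical, and the sign convention (coefficient minus group envelope value) is assumed so that the decoding side can restore a coefficient by addition.

```python
def group_means(coeffs, group_size):
    """Average each consecutive run of group_size coefficients."""
    return [sum(coeffs[i:i + group_size]) / group_size
            for i in range(0, len(coeffs), group_size)]


def envelope_differences(coeffs, group_env, group_size):
    """Return, per coefficient, the coefficient minus its group's envelope value."""
    return [c - group_env[i // group_size] for i, c in enumerate(coeffs)]


# Made-up high band S(40)..S(79): 40 coefficients in 8 groups of 5.
high_band = [float((7 * k) % 11) for k in range(40, 80)]
S_HE = group_means(high_band, 5)
SD_HE = envelope_differences(high_band, S_HE, 5)

# The decoding side can undo this by adding the envelope value back.
rebuilt = [S_HE[i // 5] + d for i, d in enumerate(SD_HE)]
```

  • Because each envelope value is the mean of its group, the differences within a group sum to zero (up to floating-point rounding).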
  • The second device receives a basic frame every first duration and decodes it according to the time-domain decoding method to obtain a basic audio signal, which includes only the low-frequency part of the original audio signal on the encoding side.
  • The second device receives a first extended frame every second duration. If the first extended frame includes the group envelope values of the high-frequency signal, the second device obtains the multiple frequency domain coefficients of the high-frequency signal from these group envelope values, where each frequency domain coefficient of the high-frequency signal is set to the group envelope value of its group. At the same time, the first audio signal obtained by decoding the basic frame is up-sampled to obtain the third audio signal, and frequency domain transformation is performed on the third audio signal frame by frame to obtain the multiple frequency domain coefficients of the low-frequency signal corresponding to the third audio signal. Inverse frequency domain transformation is then performed on the multiple frequency domain coefficients of the high-frequency signal and of the low-frequency signal to obtain the first extended audio signal.
  • The first extended audio signal includes a low-frequency signal and a high-frequency signal, but the high-frequency quality is slightly weaker and the delay is longer. It can therefore be used for the second audio application described above.
  • The second device receives a second extended frame every third duration. If the second extended frame includes the differences between the multiple frequency domain coefficients of the high-frequency signal and the corresponding group envelope values, the second device may combine these differences with the group envelope values of the high-frequency signal in the first extended frame to obtain the multiple frequency domain coefficients of the high-frequency signal, and then perform inverse frequency domain transformation on the multiple frequency domain coefficients of the low-frequency signal and of the high-frequency signal to obtain the second extended audio signal.
  • the second extended audio signal includes not only a low frequency part, but also a high frequency part.
  • The second device may receive a basic frame every 0.5 ms and decode it according to the G.726 decoding method to obtain the basic audio signal s_1(n).
  • The basic audio signal s_1(n) has only a low-frequency part, but its delay is as low as 0.5 ms. It can therefore be used by audio applications that require low delay, such as the device calibration and positioning application described above.
  • The second device can receive a first extended frame every 5 ms and obtain from it the group envelope values S_HE(0) to S_HE(7) of the high-frequency part of the audio signal. From these group envelope values, the second device can obtain the high-frequency frequency domain coefficients S(40) to S(79).
  • The second device up-samples the audio signal s_L(n), obtained by decoding the multiple basic frames received within 5 ms, to obtain the audio signal s′_L(n), and performs an MDCT on s′_L(n) to obtain the low-frequency frequency domain coefficients S(0) to S(39). An inverse MDCT is then performed on S(0) to S(79) to obtain the first extended audio signal s_2(n).
  • The first extended audio signal s_2(n) includes both the high-frequency part and the low-frequency part, but the quality of the high-frequency part is slightly weaker.
  • The second device may receive a second extended frame every 20 ms and obtain from it the group envelope coefficient differences SD_HE(40) to SD_HE(79) of the high-frequency part of the audio signal. Combining SD_HE(40) to SD_HE(79) with the group envelope values S_HE(0) to S_HE(7) obtained from the first extended frame gives the high-frequency frequency domain coefficients S(40) to S(79). An inverse MDCT is performed on S(0) to S(79) to obtain the second extended audio signal s_3(n) for the 20 ms period.
  • The second extended audio signal s_3(n) includes both the high-frequency part and the low-frequency part, and its high-frequency part has slightly better quality than that of the first extended audio signal s_2(n).
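  • The quality difference between the two extended signals can be made concrete by comparing the high-band error of the two reconstructions. The sketch below is illustrative only: the coefficient values are made up, the helper names are hypothetical, and the differences are left unquantized, which a real codec would not do.

```python
def group_means(coeffs, group_size):
    """Average each consecutive run of group_size coefficients."""
    return [sum(coeffs[i:i + group_size]) / group_size
            for i in range(0, len(coeffs), group_size)]


def mean_squared_error(reference, estimate):
    return sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)


GROUP = 5
# Made-up high band S(40)..S(79).
high_band = [float((3 * k * k) % 17 - 8) for k in range(40, 80)]

env = group_means(high_band, GROUP)
diffs = [c - env[i // GROUP] for i, c in enumerate(high_band)]

# First extended frame only: each coefficient becomes its group envelope value.
envelope_only = [env[i // GROUP] for i in range(len(high_band))]
# First and second extended frames: envelope value plus transmitted difference.
envelope_plus_diff = [env[i // GROUP] + d for i, d in enumerate(diffs)]

err_envelope_only = mean_squared_error(high_band, envelope_only)
err_envelope_plus_diff = mean_squared_error(high_band, envelope_plus_diff)
```

  • The envelope-only reconstruction discards the variation inside each group, so its error is nonzero, whereas adding the differences recovers the coefficients up to rounding.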
  • the present application provides more possible audio coding structures, which can be applied to three or more audio applications with different requirements, thereby saving transmission bandwidth and improving system performance.
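  • How an application on the second device might choose among the three decoded signals can be sketched as follows. The frame intervals (0.5 ms, 5 ms, 20 ms) and the delay limits (1 ms, 6 ms) are the example values from this description; the names `Stream`, `STREAMS`, and `pick_stream`, the quality ranking, and the use of the frame interval as the delay figure are assumptions for illustration.

```python
from collections import namedtuple

Stream = namedtuple("Stream", "name frame_interval_ms quality_rank")

# Frame intervals follow the example in the description; quality_rank is an
# illustrative ordering (higher means better high-frequency quality).
STREAMS = [
    Stream("basic", 0.5, 0),
    Stream("first_extended", 5.0, 1),
    Stream("second_extended", 20.0, 2),
]


def pick_stream(max_delay_ms):
    """Return the best-quality stream whose frame interval fits the delay budget."""
    usable = [s for s in STREAMS if s.frame_interval_ms <= max_delay_ms]
    if not usable:
        raise ValueError("no stream satisfies the delay budget")
    return max(usable, key=lambda s: s.quality_rank).name


calibration = pick_stream(1.0)           # delay must not exceed 1 ms
voice_enhancement = pick_stream(6.0)     # delay must not exceed 6 ms
sound_field = pick_stream(float("inf"))  # no strict real-time requirement
```

  • All three applications are served from one transmitted set of frames, which is the source of the bandwidth saving described above.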
  • the first device may use a time-domain coding method with a lower delay and low quality to obtain the basic frame, that is, only the low-frequency part of the second audio signal is encoded.
  • the first device can use a higher delay, low quality frequency domain encoding method to obtain the extended frame, and only encode the frequency domain group envelope value of the low frequency part and the frequency domain group envelope value of the high frequency part of the second audio signal .
  • In step S202, for the encoding of the basic frame by the first device, refer to the encoding method for the basic frame in manner 1 above, which may include:
  • The first device may use the G.726 encoding method to encode s_L(n) and assemble the result into a basic frame at intervals of the first duration; for example, the first duration may be 0.5 ms.
  • the encoding of the extended frame by the first device may refer to the encoding process of the extended frame in the foregoing manner 1, including:
  • The multiple frequency domain coefficients in the high-frequency part of the frequency domain coefficients corresponding to the second audio signal are divided evenly into groups in order from low frequency to high frequency to obtain the group envelope values of multiple high-frequency groups; the multiple frequency domain coefficients of the low-frequency part are likewise divided evenly into groups in order from low frequency to high frequency to obtain the group envelope values of multiple low-frequency groups. These group envelope values are then encoded using the envelope coding method.
  • The first device can perform an MDCT on s(n) to obtain the MDCT frequency domain coefficients.
  • If the frame length is 5 ms and the sampling rate is 16 kHz, s(n) includes 80 sampling points per frame, and the corresponding frequency domain coefficients are S(0) to S(79).
  • The 40 low-frequency component coefficients S(0) to S(39) are divided evenly into 8 groups, each low-frequency group has five low-frequency component coefficients, and the group envelope values S_LE(0) to S_LE(7) of the low-frequency groups are obtained.
  • The 40 high-frequency component coefficients S(40) to S(79) are divided evenly into 8 groups, each high-frequency group has five high-frequency component coefficients, and the group envelope values S_HE(0) to S_HE(7) of the high-frequency groups are obtained.
  • The first device can digitally encode the group envelope values S_LE(0) to S_LE(7) of the low-frequency groups and the group envelope values S_HE(0) to S_HE(7) of the high-frequency groups obtained above. Every 5 ms, the first device assembles the encoded S_LE(0) to S_LE(7) and S_HE(0) to S_HE(7) into an extended frame and sends it to the second device.
  • The second device receives a basic frame every first duration and decodes it according to the time-domain decoding method to obtain a basic audio signal, which includes only the low-frequency part of the original audio signal on the encoding side.
  • The second device receives an extended frame every second duration. If the extended frame includes multiple group envelope values of the low-frequency signal and multiple group envelope values of the high-frequency signal, the second device obtains multiple frequency domain coefficients of the low-frequency signal from the group envelope values of the low-frequency signal and multiple frequency domain coefficients of the high-frequency signal from the group envelope values of the high-frequency signal.
  • The multiple frequency domain coefficients of the low-frequency signal may be determined by performing frequency domain transformation on the first audio signal obtained from the basic frame.
  • Alternatively, the second device can determine the multiple frequency domain coefficients of the low-frequency signal from the multiple group envelope values of the low-frequency signal in the extended frame, where each frequency domain coefficient of the low-frequency signal is set to the group envelope value of its group.
  • the second device can perform frequency domain inverse transformation according to the multiple frequency domain coefficients of the low frequency signal and the multiple frequency domain coefficients of the high frequency signal to obtain the extended audio signal.
  • The second device can receive a basic frame every 0.5 ms and decode it according to the G.726 decoding method to obtain the basic audio signal s_1(n).
  • The basic audio signal s_1(n) has only a low-frequency part, but its delay is as low as 0.5 ms.
  • The second device can receive an extended frame every 5 ms and obtain from it the group envelope values S_HE(0) to S_HE(7) of the high-frequency part of the audio signal, from which the high-frequency frequency domain coefficients S(40) to S(79) can be obtained.
  • The second device up-samples the audio signal s_L(n), obtained by decoding the multiple basic frames received within 5 ms, to obtain the audio signal s′_L(n), and performs an MDCT on s′_L(n) to obtain the low-frequency frequency domain coefficients S(0) to S(39). An inverse MDCT is then performed on S(0) to S(79) to obtain the extended audio signal s_2(n).
  • The extended audio signal s_2(n) includes both the high-frequency part and the low-frequency part, but the quality of the high-frequency part is slightly weaker.
  • Alternatively, the second device obtains the low-frequency frequency domain coefficients S(0) to S(39) from the group envelope values S_LE(0) to S_LE(7) of the low-frequency part obtained by decoding the extended frame, where each low-frequency frequency domain coefficient is equal to the group envelope value of its low-frequency group.
  • The second device obtains the high-frequency frequency domain coefficients S(40) to S(79) from the group envelope values S_HE(0) to S_HE(7) of the high-frequency part obtained by decoding the extended frame, where each high-frequency frequency domain coefficient is equal to the group envelope value of its high-frequency group.
  • The second device performs an inverse MDCT on the S(0) to S(79) obtained by decoding the extended frame received within 5 ms to obtain the extended audio signal s_2(n).
  • The extended audio signal s_2(n) includes both the high-frequency part and the low-frequency part.
  • the device on the decoding side can still decode based on the extended frame to realize the restoration of the entire audio signal.
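  • The two ways of obtaining the low-frequency coefficients on the decoding side can be sketched as one function. The names `expand` and `extended_spectrum`, the `low_from_basic` parameter, and the rule "use the basic-frame coefficients when present, otherwise fall back to the low-frequency envelope values" are assumptions for illustration; the description only states that either source may be used.

```python
def expand(group_env, group_size):
    """Give every coefficient in a group the group's envelope value."""
    return [value for value in group_env for _ in range(group_size)]


def extended_spectrum(low_env, high_env, low_from_basic=None, group_size=5):
    """Assemble S(0)..S(79) as input for the inverse transform.

    low_from_basic -- low-frequency coefficients derived from decoded basic
                      frames, or None if unavailable (assumed fallback trigger).
    """
    if low_from_basic is not None:
        low = list(low_from_basic)
    else:
        low = expand(low_env, group_size)
    return low + expand(high_env, group_size)


S_LE = [float(g) for g in range(8)]       # made-up low-band envelope values
S_HE = [float(10 + g) for g in range(8)]  # made-up high-band envelope values
basic_low = [0.5] * 40                    # made-up coefficients from basic frames

with_basic = extended_spectrum(S_LE, S_HE, low_from_basic=basic_low)
without_basic = extended_spectrum(S_LE, S_HE)
```

  • Either way the result is a full set of 80 coefficients, so the inverse transform can run on the extended frame alone.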
  • The above implementations provided by this application can transmit a single audio signal through one set of codec schemes, and the different audio signals obtained by decoding the basic frame or the extended frame can be applied to different audio applications. This avoids repeated encoding, decoding, and transmission, which greatly reduces the waste of bandwidth resources and the system overhead.
  • the device on the decoding side can decode according to the extended frame, which further improves the reliability of audio transmission.
  • The encoding-side device may communicate with the decoding-side device in advance, according to the encoding requirements of the audio application for the transmitted audio signal, and negotiate a specific encoding and decoding mode. For example, when the first audio application on the second device requires a low-latency, low-quality audio signal, the audio signal request information that the second device sends to the first device carries configuration information indicating the encoding mode corresponding to the request. Alternatively, when the first device sends an encoded frame to the second device, the encoding mode of the frame can be indicated by agreed bits.
  • The first device sends the basic frame of the audio signal to the second device, and the basic frame includes two pre-configured bits.
  • 01 can indicate encoding mode two. The foregoing codec configuration is shown only as an example and is not limited to the foregoing two types, which is not specifically limited in the embodiments of the present application.
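  • A two-bit mode indicator of this kind could be carried as sketched below. Only the mapping of 01 to encoding mode two comes from the description; the mapping of 00 to mode one, the placement of the two bits in the low bits of a one-byte header, and the names `MODE_BITS`, `pack_frame`, and `parse_frame` are hypothetical.

```python
MODE_BITS = {
    0b01: "mode_two",  # stated in the description
    0b00: "mode_one",  # assumption: this value is not given in the description
}


def pack_frame(mode_bits, payload):
    """Prefix the payload with one header byte whose two low bits carry the mode."""
    if not 0 <= mode_bits <= 0b11:
        raise ValueError("mode must fit in two bits")
    return bytes([mode_bits]) + bytes(payload)


def parse_frame(frame):
    """Return (mode name, payload); unrecognised bit patterns map to 'unknown'."""
    mode_bits = frame[0] & 0b11
    return MODE_BITS.get(mode_bits, "unknown"), frame[1:]


frame = pack_frame(0b01, b"\x10\x20")
mode, payload = parse_frame(frame)
```

  • Two bits leave room for up to four modes, which is consistent with the statement that the configuration is not limited to two types.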
  • the present application also provides an audio processing device, as shown in FIG. 6, the device 600 may include a preprocessing module 601, an encoding module 602, and a sending module 603.
  • the preprocessing module 601 may be used to perform sampling and quantization processing on the acquired first audio signal to obtain the second audio signal.
  • The encoding module 602 may be configured to encode the second audio signal in a first encoding mode in units of a first duration to obtain a basic frame, and to encode the second audio signal in a second encoding mode in units of a second duration to obtain an extended frame, where the second duration is greater than the first duration, and the first encoding mode and the second encoding mode respectively encode different signals carried in the second audio signal and/or encode the second audio signal to different degrees.
  • the sending module 603 can be used to send the basic frame and the extended frame to the second device.
  • the second duration is N times the first duration, and N is a natural number greater than or equal to 2.
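  • The relationship between the two frame durations can be illustrated with a simple schedule. This sketch assumes that a frame is emitted at the end of the interval it covers; the function name `frame_schedule` and the event representation are hypothetical, while the values 0.5 ms and N = 10 (giving 5 ms) follow the example in this description.

```python
def frame_schedule(first_ms, n, total_ms):
    """List (time_ms, kind) events for total_ms of audio.

    A basic frame is emitted every first_ms, and an extended frame every
    n * first_ms, where n is a natural number greater than or equal to 2.
    """
    if not isinstance(n, int) or n < 2:
        raise ValueError("n must be a natural number >= 2")
    events = []
    count = int(round(total_ms / first_ms))
    for i in range(1, count + 1):
        t = i * first_ms
        events.append((t, "basic"))
        if i % n == 0:
            # Every n-th basic frame boundary also closes an extended frame.
            events.append((t, "extended"))
    return events


events = frame_schedule(0.5, 10, 10.0)
basic_count = sum(1 for _, kind in events if kind == "basic")
extended_times = [t for t, kind in events if kind == "extended"]
```

  • Over 10 ms this yields 20 basic frames and 2 extended frames, so each extended frame spans the audio of 10 basic frames.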
  • The encoding module 602 may be specifically used to: down-sample the second audio signal to obtain the low-frequency signal carried in the second audio signal; and encode the low-frequency signal according to the time-domain encoding method to obtain multiple basic frames with the first duration as the frame length.
  • The encoding module 602 may be specifically used to: perform frequency domain transformation on the second audio signal to obtain the frequency domain coefficients corresponding to the second audio signal; divide the multiple frequency domain coefficients of the high-frequency part evenly into groups in order from low frequency to high frequency to obtain the group envelope values of multiple high-frequency groups, where the group envelope value is the average value of the high-frequency frequency domain coefficients in each group; and perform encoding according to the group envelope values to obtain multiple extended frames with the second duration as the frame length.
  • The encoding module 602 may be specifically used to: perform frequency domain transformation on the second audio signal to obtain multiple frequency domain coefficients of the low-frequency signal and multiple frequency domain coefficients of the high-frequency signal corresponding to the second audio signal; divide the multiple frequency domain coefficients of the high-frequency signal evenly into groups in order from low frequency to high frequency to obtain the group envelope values of multiple high-frequency groups, where the group envelope value is the average value of the high-frequency frequency domain coefficients in each group; and perform encoding according to the multiple frequency domain coefficients of the low-frequency signal and the group envelope values of the high-frequency signal to obtain multiple basic frames with the first duration as the frame length.
  • The encoding module 602 may be specifically used to encode, in units of the second duration, the differences between the multiple frequency domain coefficients of the high-frequency signal and the corresponding group envelope values to obtain multiple extended frames with the second duration as the frame length.
  • The encoding module 602 may be specifically used to: perform frequency domain transformation on the second audio signal to obtain multiple frequency domain coefficients of the low-frequency signal and multiple frequency domain coefficients of the high-frequency signal corresponding to the second audio signal; group the multiple frequency domain coefficients of the low-frequency signal and of the high-frequency signal to obtain the corresponding group envelope values, where the group envelope value is the average value of the frequency domain coefficients in each group; and perform encoding according to the group envelope values to obtain multiple extended frames with the second duration as the frame length.
  • The frequency domain transform in the foregoing embodiment may specifically be a modified discrete cosine transform (MDCT) algorithm.
  • the device 700 includes a receiving module 701 and a decoding module 702.
  • The receiving module 701 may be used to receive the basic frame and the extended frame sent by the first device, where the frame length of the extended frame is greater than the frame length of the basic frame, and the extended frame is obtained by re-encoding the audio signals corresponding to multiple basic frames.
  • the decoding module 702 can be used to decode a basic frame to obtain a basic audio signal; or, to jointly decode a basic frame and an extended frame to obtain an extended audio signal.
  • the decoding module 702 may be specifically used to decode the basic frame according to the time-domain coding and decoding manner to obtain the basic audio signal.
  • The decoding module 702 may be specifically used to: if the extended frame includes the group envelope values of the high-frequency signal, obtain the multiple frequency domain coefficients of the high-frequency signal from the group envelope values, where each frequency domain coefficient of the high-frequency signal is set to the group envelope value of its group; up-sample the basic audio signal to obtain the third audio signal; perform frequency domain transformation on the third audio signal frame by frame to obtain the multiple frequency domain coefficients of the low-frequency signal corresponding to the third audio signal; and perform inverse frequency domain transformation on the multiple frequency domain coefficients of the high-frequency signal and of the low-frequency signal to obtain the extended audio signal.
  • The decoding module 702 may be specifically used to: if the basic frame includes multiple frequency domain coefficients of the low-frequency signal and multiple group envelope values of the high-frequency signal, obtain the multiple frequency domain coefficients of the low-frequency signal and the multiple frequency domain coefficients of the high-frequency signal from the basic frame, where each frequency domain coefficient of the high-frequency signal is set to the group envelope value of its group; and perform inverse frequency domain transformation on the multiple frequency domain coefficients of the low-frequency signal and of the high-frequency signal to obtain a basic audio signal.
  • The decoding module 702 may be specifically used to: if the extended frame includes the differences between the multiple frequency domain coefficients of the high-frequency signal and the corresponding group envelope values, obtain the multiple frequency domain coefficients of the high-frequency signal from the multiple group envelope values of the high-frequency signal and these differences; and perform inverse frequency domain transformation on the multiple frequency domain coefficients of the low-frequency signal and of the high-frequency signal to obtain an extended audio signal.
  • The decoding module 702 may be specifically used to: if the extended frame includes multiple group envelope values of the low-frequency signal and multiple group envelope values of the high-frequency signal, obtain multiple frequency domain coefficients of the low-frequency signal from the group envelope values of the low-frequency signal and multiple frequency domain coefficients of the high-frequency signal from the group envelope values of the high-frequency signal, where the multiple frequency domain coefficients of the low-frequency signal are determined by performing frequency domain transformation on the basic audio signal obtained from the basic frame, or are determined from the multiple group envelope values of the low-frequency signal in the extended frame, in which case each frequency domain coefficient of the low-frequency signal is set to the group envelope value of its group; and perform inverse frequency domain transformation on the multiple frequency domain coefficients of the low-frequency signal and of the high-frequency signal to obtain an extended audio signal.
  • The inverse frequency domain transform in the foregoing embodiment may specifically be an inverse modified discrete cosine transform (IMDCT) algorithm.
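  • For reference, an MDCT and inverse MDCT pair can be sketched in direct form as below. This is a textbook-style illustration rather than the transform used in this application: the sine window, the 2N-sample blocks with a hop of N samples, the 2/N scaling of the inverse, and all function names are assumptions, and the direct O(N²) evaluation is chosen for clarity rather than speed.

```python
import math


def sine_window(n2):
    """Sine window of length n2; it satisfies w[i]^2 + w[i + n2/2]^2 = 1."""
    return [math.sin(math.pi * (i + 0.5) / n2) for i in range(n2)]


def mdct(block, window):
    """Direct-form MDCT: 2N windowed samples in, N coefficients out."""
    n2 = len(block)
    n = n2 // 2
    x = [b * w for b, w in zip(block, window)]
    return [sum(x[i] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
                for i in range(n2))
            for k in range(n)]


def imdct(coeffs, window):
    """Direct-form inverse MDCT: N coefficients in, 2N windowed samples out."""
    n = len(coeffs)
    return [window[i] * (2.0 / n) *
            sum(coeffs[k] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
                for k in range(n))
            for i in range(2 * n)]


def round_trip(signal, n):
    """MDCT then inverse MDCT with 50% overlap-add over hops of n samples."""
    window = sine_window(2 * n)
    padded = [0.0] * n + list(signal) + [0.0] * n
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n + 1, n):
        y = imdct(mdct(padded[start:start + 2 * n], window), window)
        for i, v in enumerate(y):
            out[start + i] += v  # overlapping halves cancel the aliasing
    return out[n:n + len(signal)]


N = 80  # 5 ms at 16 kHz, as in the example in this description
signal = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(3 * N)]
rebuilt = round_trip(signal, N)
max_error = max(abs(a - b) for a, b in zip(signal, rebuilt))
```

  • Each hop of 80 new samples yields 80 coefficients, matching the S(0) to S(79) of the example; the overlap-add of adjacent blocks is what makes the transform invertible.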
  • The group envelope value is the average value of the frequency domain coefficients in each group, where the groups are obtained by dividing the multiple frequency domain coefficients evenly in order from low frequency to high frequency.
  • the sending module may be a transmitter, which may include an antenna and a radio frequency circuit, and the preprocessing module, encoding module, and decoding module may be processors, such as baseband chips.
  • When the audio signal processing device is a component having the function of the first device or the second device, the sending module may be a radio frequency unit, and the preprocessing module, encoding module, and decoding module may be processors.
  • The sending module may be the output interface of the chip system, and the preprocessing module, encoding module, and decoding module may be processors of the chip system, such as a central processing unit (CPU).
  • the audio signal processing device is presented in the form of dividing various functional modules in an integrated manner.
  • the "module” herein may refer to a specific circuit, a processor and memory that executes one or more software or firmware programs, an integrated logic circuit, and/or other devices that can provide the above-mentioned functions.
  • the audio signal processing device may adopt the form shown in FIG. 8 below.
  • FIG. 8 is a schematic structural diagram of an exemplary electronic device 800 shown in an embodiment of the application.
  • The electronic device 800 may be the first device or the second device in the foregoing embodiment, and is used to execute the audio signal processing method in the foregoing embodiment.
  • the electronic device 800 may include at least one processor 801, a communication line 802, and a memory 803.
  • the processor 801 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits.
  • the communication line 802 may include a path to transmit information between the above-mentioned components, and the communication line may be, for example, a bus.
  • The memory 803 can be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited to these.
  • the memory can exist independently, and is connected to the processor through a communication line 802.
  • the memory can also be integrated with the processor.
  • the memory provided by the embodiment of the present application is usually a non-volatile memory.
  • The memory 803 is used to store the computer program instructions for executing the solutions of the embodiments of the present application, and the execution is controlled by the processor 801.
  • the processor 801 is configured to execute computer program instructions stored in the memory 803, so as to implement the method provided in the embodiment of the present application.
  • the computer program instructions in the embodiments of the present application may also be referred to as application program codes, which are not specifically limited in the embodiments of the present application.
  • the processor 801 may include one or more CPUs, such as CPU0 and CPU1 in FIG. 8.
  • The electronic device 800 may include multiple processors, such as the processor 801 and the processor 807 in FIG. 8. Each of these processors can be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor.
  • the processor here may refer to one or more devices, circuits, and/or processing cores for processing data (for example, computer program instructions).
  • the electronic device 800 may further include a communication interface 804.
  • the electronic device can send and receive data through the communication interface 804, or communicate with other devices or a communication network.
  • The communication interface 804 can be, for example, an Ethernet interface, a radio access network (RAN) interface, a wireless local area network (WLAN) interface, or a USB interface.
  • the electronic device 800 may further include an output device 805 and an input device 806.
  • the output device 805 communicates with the processor 801 and can display information in a variety of ways.
  • The output device 805 may be a liquid crystal display (LCD), a light emitting diode (LED) display device, a cathode ray tube (CRT) display device, or a projector.
  • the input device 806 communicates with the processor 801, and can receive user input in a variety of ways.
  • the input device 806 may be a mouse, a keyboard, a touch screen device, a sensor device, or the like.
  • The electronic device 800 can be a desktop computer, a portable computer, a web server, a personal digital assistant (PDA), a mobile phone, a tablet computer, a wireless terminal device, an embedded device, a smart camera, or a device with a structure similar to that shown in FIG. 8.
  • the embodiment of the present application does not limit the type of the electronic device 800. If it is used to implement the method of the second device in the foregoing embodiment, the electronic device 800 needs to be equipped with a smart camera.
  • the processor 801 in FIG. 8 may invoke the computer program instructions stored in the memory 803 to cause the electronic device 800 to execute the method in the foregoing method embodiment.
  • each processing module in FIG. 6 or FIG. 7 may be implemented by the processor 801 in FIG. 8 calling computer program instructions stored in the memory 803.
  • the function/implementation process of the preprocessing module 601 and the encoding module 602 in FIG. 6 can be implemented by the processor 801 in FIG. 8 calling computer-executable instructions stored in the memory 803.
  • the function/implementation process of the receiving module 701 and the decoding module 702 in FIG. 7 can be implemented by the processor 801 in FIG. 8 calling computer-executable instructions stored in the memory 803.
  • a computer-readable storage medium including instructions is also provided.
  • the foregoing instructions can be executed by the processor 801 of the electronic device 800 to complete the smart camera testing method of the foregoing embodiment. For the technical effects that can be obtained, refer to the foregoing method embodiments; details are not repeated here.
  • the foregoing embodiments can be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
  • when implemented using a software program, they can be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions.
  • when the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present application are generated in whole or in part.
  • the computer can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.

Abstract

The present application relates to the technical field of multimedia processing, and provides an audio signal processing method and apparatus that solve the prior-art problem that, when an audio signal is transmitted among multiple electronic devices, transmission is repeated and bandwidth resources are wasted because different audio applications have different requirements for compressing and encoding the audio signal. The method comprises: a first apparatus samples and quantizes an obtained first audio signal to obtain a second audio signal; encodes, in units of a first duration, the second audio signal by means of a first encoding mode to obtain a basic frame; encodes, in units of a second duration, the second audio signal by means of a second encoding mode to obtain an extension frame, wherein the second duration is longer than the first duration, and the first encoding mode and the second encoding mode are used for respectively encoding different signals carried in the second audio signal, and/or for encoding the second audio signal to different degrees; and sends the basic frame and the extension frame to a second apparatus.

Description

Audio signal processing method and device
Technical field
This application relates to the field of multimedia processing technology, and in particular to an audio signal processing method and device.
Background
At present, as people use a growing number of electronic devices more and more frequently, collaborative processing of audio signals among multiple electronic devices is expected to become an important trend in audio signal processing. When an audio signal is transmitted between multiple electronic devices, the electronic device acting as the transmitting end can sample, quantize, and encode the collected audio signal, and then transmit the compressed signal to the electronic device at the receiving end. However, different applications on the receiving-end device may have different delay and quality requirements for the audio signal, and therefore require the transmitting-end device to compress and encode the audio signal differently.
FIG. 1 shows a possible application scenario. A mobile phone sends a collected audio signal to a smart headset that runs different audio applications. For example, audio application 1 is a voice enhancement application, which has a high real-time requirement for the received audio signal but only a moderate requirement for transmission quality; audio application 2 is a three-dimensional sound field collection application, which has a high requirement for the transmission quality of the received audio signal but a low delay requirement. In the prior art, the mobile phone needs to apply different compression encodings to the same audio signal and transmit multiple audio streams to the smart headset. These streams differ in transmission delay and quality, but their content is the same audio signal collected by the mobile phone. The audio signal is therefore transmitted repeatedly, which occupies and wastes bandwidth resources.
Summary of the invention
This application provides an audio signal processing method and device that solve the prior-art problem that, when an audio signal is transmitted between multiple electronic devices, the differing compression and encoding requirements of different audio applications lead to repeated transmission and wasted bandwidth resources.
In a first aspect, an audio signal processing method is provided. The method includes: a first device samples and quantizes an acquired first audio signal to obtain a second audio signal; encodes the second audio signal in units of a first duration using a first encoding method to obtain basic frames, and encodes the second audio signal in units of a second duration using a second encoding method to obtain extended frames, where the second duration is longer than the first duration, and the first encoding method and the second encoding method respectively encode different signals carried in the second audio signal, and/or encode the second audio signal to different degrees; and sends the basic frames and the extended frames to a second device.
In the foregoing technical solution, the transmitting end can encode and compress a single audio signal into encoded frames of two different frame lengths, namely basic frames and extended frames. An extended frame can be obtained by re-encoding the part of the second audio signal that the basic frames did not encode, or by re-encoding the part that the basic frames encoded too coarsely. The receiving end can therefore decode the basic frames alone to obtain one audio signal, and jointly decode the basic frames and the extended frames to obtain another. The two recovered audio signals differ in delay and audio quality, which meets the needs of the different audio applications described above, avoids repeated transmission of the same audio signal after encoding and the resulting waste of bandwidth resources, and reduces system overhead.
In a possible design, the second duration is N times the first duration, where N is a natural number greater than or equal to 2.
In this possible implementation, when the first device encodes the second audio signal, the time interval between basic frames is the first duration and the time interval between extended frames is N times the first duration; that is, one extended frame is encoded for every N basic frames. The encoding side thus obtains encoded frames with different delays, from which the decoding side can recover audio signals with different delays to meet the needs of different audio applications. This increases the encoding rate, avoids wasting bandwidth resources, and reduces system overhead.
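The relationship between the two frame lengths can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function name `frame_schedule`, the use of sample counts as the unit of duration, and the choice to ignore trailing samples that do not fill a frame are all assumptions.

```python
def frame_schedule(num_samples, basic_len, n):
    """Return (basic, extended) lists of (start, end) sample ranges.

    basic_len is the first duration expressed in samples (assumed unit).
    The extended frame length is n * basic_len, with n >= 2 as in the text.
    Trailing samples that do not fill a whole frame are ignored here.
    """
    if n < 2:
        raise ValueError("n must be a natural number >= 2")
    ext_len = n * basic_len
    basic = [(s, s + basic_len)
             for s in range(0, num_samples - basic_len + 1, basic_len)]
    extended = [(s, s + ext_len)
                for s in range(0, num_samples - ext_len + 1, ext_len)]
    return basic, extended


# Hypothetical numbers: 960 samples, 120-sample basic frames, N = 4.
basic, extended = frame_schedule(num_samples=960, basic_len=120, n=4)
```

With these hypothetical numbers, every extended frame spans exactly four basic frames, so one extended frame is produced for every N basic frames.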
In a possible design, encoding the second audio signal in units of the first duration using the first encoding method to obtain basic frames specifically includes: down-sampling the second audio signal to obtain the low-frequency signal carried in the second audio signal; and encoding the low-frequency signal using a time-domain encoding method to obtain multiple basic frames whose frame length is the first duration.
In possible implementation 1 above, the encoding side can encode the low-frequency signal in the second audio signal using time-domain encoding to obtain basic frames. Because time-domain encoding can encode an audio signal into a digital signal with low delay, it is suited to producing low-delay basic frames that contain only the low-frequency part of the original audio signal. The decoding side can then recover from the basic frames an audio signal with strong real-time performance and moderate audio quality for use in the corresponding audio application.
In a possible design, encoding the second audio signal in units of the second duration using the second encoding method to obtain extended frames specifically includes: performing a frequency domain transform on the second audio signal to obtain the frequency domain coefficients corresponding to the second audio signal; dividing the high-frequency coefficients among them evenly into groups in order from low frequency to high frequency to obtain a group envelope value for each high-frequency group, where the group envelope value is the average of the high-frequency frequency domain coefficients in the group; and encoding the group envelope values to obtain multiple extended frames whose frame length is the second duration.
In possible implementation 1 above, corresponding to the basic frames obtained by the foregoing encoding, the encoding side can also encode the high-frequency signal in the second audio signal using frequency-domain encoding to obtain extended frames, thereby encoding the high-frequency part that the basic frames left unencoded. The decoding side can then jointly decode the basic frames and the extended frames to recover an audio signal that has weaker real-time performance but better audio quality, containing both the low-frequency and high-frequency parts of the original audio signal, for use in the corresponding audio application. Through basic frame encoding and extended frame encoding, this implementation can meet the needs of multiple audio applications, increase the encoding rate, and avoid wasting bandwidth resources.
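The group envelope computation described above can be sketched as follows. The function name `group_envelopes`, the `split` index that marks where the high-frequency part begins, and the group size are assumptions for illustration; the text specifies only that the high-frequency coefficients are divided evenly into groups from low to high frequency and that each group is averaged.

```python
def group_envelopes(coeffs, split, group_size):
    """Average each consecutive group of high-band coefficients.

    coeffs     -- frequency domain coefficients, ordered low to high
    split      -- index where the high-frequency part starts (assumed)
    group_size -- coefficients per group (assumed); the high band must
                  divide evenly into groups
    """
    high = coeffs[split:]
    if len(high) % group_size != 0:
        raise ValueError("high band must divide evenly into groups")
    return [sum(high[i:i + group_size]) / group_size
            for i in range(0, len(high), group_size)]


# Hypothetical coefficients: the last four form the high band.
coeffs = [9.0, 7.0, 5.0, 4.0, 2.0, 4.0, 1.0, 3.0]
envelopes = group_envelopes(coeffs, split=4, group_size=2)
```

Only the envelope values, one per group, would then be encoded into the extended frame, which is why this layer costs far fewer bits than the coefficients themselves.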
In a possible design, encoding the second audio signal in units of the first duration using the first encoding method to obtain basic frames specifically includes: performing a frequency domain transform on the second audio signal to obtain multiple frequency domain coefficients of the low-frequency signal and multiple frequency domain coefficients of the high-frequency signal corresponding to the second audio signal; dividing the frequency domain coefficients of the high-frequency signal evenly into groups in order from low frequency to high frequency to obtain a group envelope value for each high-frequency group, where the group envelope value is the average of the high-frequency frequency domain coefficients in the group; and encoding the frequency domain coefficients of the low-frequency signal together with the group envelope values of the high-frequency signal to obtain multiple basic frames whose frame length is the first duration.
In possible implementation 2 above, the encoding side can encode both the low-frequency signal and the high-frequency signal in the second audio signal using frequency-domain encoding: the frequency domain coefficients of the low-frequency signal are encoded directly, whereas only the group envelope values of the high-frequency signal are encoded, to obtain basic frames. This basic frame encoding applies high-quality encoding to the low-frequency part and lower-quality encoding to the high-frequency part. The decoding side can recover from the basic frames an audio signal with strong real-time performance and moderate audio quality for use in the corresponding audio application.
In a possible design, encoding the second audio signal in units of the second duration using the second encoding method to obtain extended frames specifically includes: in units of the second duration, encoding the differences between the frequency domain coefficients of the high-frequency signal and their corresponding group envelope values to obtain multiple extended frames whose frame length is the second duration.
In possible implementation 2 above, building on the basic frames of implementation 2, the encoding side can further encode the high-frequency part that the basic frames encoded at lower quality; that is, it can encode the differences between the frequency domain coefficients of the high-frequency signal and their corresponding group envelope values. This extended encoding applies further, high-quality encoding to the high-frequency part, so the decoding side can jointly decode the basic frames and the extended frames to recover an audio signal with moderate real-time performance and high audio quality for use in the corresponding audio application. Through basic frame encoding and extended frame encoding, this implementation obtains encoded frames with different delays and different encoding qualities, which can increase the encoding rate and reduce system overhead.
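The split between envelope values (carried in the basic frame) and differences (carried in the extended frame) can be sketched as a round trip. The function names and the group size are assumptions, and quantization and entropy coding are deliberately left out, so the reconstruction here is exact.

```python
def encode_residuals(high, group_size):
    """Return (envelopes, residuals) for the high-band coefficients.

    envelopes -- per-group averages, as carried in the basic frame
    residuals -- coefficient minus its group's envelope, as carried
                 in the extended frame
    """
    envelopes = [sum(high[i:i + group_size]) / group_size
                 for i in range(0, len(high), group_size)]
    residuals = [c - envelopes[i // group_size]
                 for i, c in enumerate(high)]
    return envelopes, residuals


def decode_high_band(envelopes, residuals, group_size):
    """Joint decoding: add each residual back to its group envelope."""
    return [envelopes[i // group_size] + r
            for i, r in enumerate(residuals)]


# Hypothetical high-band coefficients, grouped in pairs.
high = [2.0, 4.0, 1.0, 3.0]
envelopes, residuals = encode_residuals(high, group_size=2)
restored = decode_high_band(envelopes, residuals, group_size=2)
```

A decoder that has only the basic frame would use the envelope values alone, while one that also has the extended frame recovers the finer high-band detail.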
In addition, there is a possible implementation 3: the encoding side can obtain basic frames using the time-domain encoding of implementation 1, obtain first extended frames using the extended frame encoding of implementation 1, and then obtain second extended frames using the extended frame encoding of implementation 2. This encoding yields basic frames with strong real-time performance that contain only the low-frequency signal and have lower encoding quality; first extended frames with relatively strong real-time performance that contain low-frequency and high-frequency signals but encode the high-frequency signal at lower quality; and second extended frames with weak real-time performance that contain low-frequency and high-frequency signals and encode the high-frequency signal at higher quality. The encoded frames thus form richer layers, and the decoding side can jointly decode the basic frames with the first extended frames or with the second extended frames to recover audio signals of different quality. This meets the needs of different audio applications, improves the flexibility of audio encoding and the encoding rate, and reduces system overhead.
In a possible design, encoding the second audio signal in units of the second duration using the second encoding method to obtain extended frames further specifically includes: performing a frequency domain transform on the second audio signal to obtain multiple frequency domain coefficients of the low-frequency signal and multiple frequency domain coefficients of the high-frequency signal corresponding to the second audio signal; dividing the frequency domain coefficients of the low-frequency signal and those of the high-frequency signal separately and evenly into groups in order from low frequency to high frequency to obtain the corresponding group envelope values, where the group envelope value is the average of the frequency domain coefficients in each group; and encoding the group envelope values to obtain multiple extended frames whose frame length is the second duration.
In possible implementation 4 above, corresponding to the basic frames obtained by the encoding in implementation 1, the encoding side can use frequency-domain encoding to encode the group envelope values of the low-frequency frequency domain coefficients and of the high-frequency frequency domain coefficients of the second audio signal to obtain extended frames. If basic frames are lost, the decoding side can still decode the extended frames to recover an audio signal, which improves the reliability of audio coding and transmission and improves the user experience.
In a possible design, performing the frequency domain transform on the second audio signal specifically includes: obtaining the MDCT frequency domain coefficients corresponding to the second audio signal using the modified discrete cosine transform (MDCT) algorithm.
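The MDCT named above can be illustrated with a direct, unoptimized sketch. The sine window, the 2/N scaling of the inverse transform, the zero padding at the signal edges, and all function names are assumptions chosen for illustration; the source names only the transform itself, and a practical codec would use a fast algorithm rather than the O(N²) sums shown here.

```python
import math


def sine_window(length):
    """Sine window; satisfies w[n]^2 + w[n + N]^2 = 1 for length 2N."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(block):
    """Forward MDCT: 2N windowed samples -> N coefficients."""
    n_half = len(block) // 2
    return [sum(block[n] * math.cos(math.pi / n_half
                                    * (n + 0.5 + n_half / 2) * (k + 0.5))
                for n in range(2 * n_half))
            for k in range(n_half)]


def imdct(coeffs):
    """Inverse MDCT: N coefficients -> 2N time-aliased samples."""
    n_half = len(coeffs)
    return [(2.0 / n_half)
            * sum(coeffs[k] * math.cos(math.pi / n_half
                                       * (n + 0.5 + n_half / 2) * (k + 0.5))
                  for k in range(n_half))
            for n in range(2 * n_half)]


def analysis_synthesis(signal, n_half):
    """Windowed MDCT, IMDCT and overlap-add over 50%-overlapping blocks.

    len(signal) must be a multiple of n_half. The signal is padded with
    n_half zeros on each side so every sample is covered by two blocks.
    """
    win = sine_window(2 * n_half)
    padded = [0.0] * n_half + list(signal) + [0.0] * n_half
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n_half + 1, n_half):
        block = [padded[start + i] * win[i] for i in range(2 * n_half)]
        rec = imdct(mdct(block))
        for i in range(2 * n_half):
            out[start + i] += rec[i] * win[i]
    return out[n_half:n_half + len(signal)]


# Hypothetical test signal: 16 samples, blocks of 2N = 8 samples.
signal = [math.sin(0.3 * i) + 0.1 * i for i in range(16)]
restored = analysis_synthesis(signal, n_half=4)
```

Each block of 2N samples yields only N coefficients, and the aliasing this introduces cancels when neighbouring blocks are overlap-added, so the round trip reproduces the input up to floating-point rounding.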
In a second aspect, an audio signal processing method is provided. The method includes: a second device receives basic frames and extended frames sent by a first device, where the frame length of an extended frame is greater than that of a basic frame, and an extended frame is obtained by re-encoding the audio signal corresponding to multiple basic frames; and decodes the basic frames to obtain a basic audio signal, or jointly decodes the basic frames and the extended frames to obtain an extended audio signal.
In a possible design, decoding the basic frames to obtain the basic audio signal specifically includes: decoding the basic frames using the time-domain codec method to obtain the basic audio signal.
In a possible design, jointly decoding the basic frames and the extended frames to obtain the extended audio signal specifically includes: if the extended frame includes multiple group envelope values of the high-frequency signal, obtaining multiple frequency domain coefficients of the high-frequency signal from those group envelope values, where each frequency domain coefficient of the high-frequency signal is set to the envelope value of its group; up-sampling the basic audio signal to obtain a third audio signal; performing a frequency domain transform on the third audio signal frame by frame to obtain multiple frequency domain coefficients of the low-frequency signal corresponding to the third audio signal; and performing an inverse frequency domain transform on the frequency domain coefficients of the high-frequency signal and of the low-frequency signal to obtain the extended audio signal.
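The step in which the decoder turns group envelope values back into high-frequency coefficients can be sketched as follows. The function names and the group size are assumptions; the up-sampling, the forward transform of the third audio signal, and the inverse transform are outside this sketch, so the low-band coefficients are simply passed in.

```python
def expand_envelopes(envelopes, group_size):
    """Give every coefficient in a group the group's envelope value."""
    return [env for env in envelopes for _ in range(group_size)]


def merge_bands(low_coeffs, high_envelopes, group_size):
    """Concatenate low-band coefficients with the expanded high band.

    low_coeffs stands for the coefficients obtained by transforming the
    up-sampled basic audio signal (computed elsewhere).
    """
    return list(low_coeffs) + expand_envelopes(high_envelopes, group_size)


# Hypothetical values: four low-band coefficients, two high-band groups.
spectrum = merge_bands([9.0, 7.0, 5.0, 4.0], [3.0, 2.0], group_size=2)
```

The merged spectrum is what would be handed to the inverse frequency domain transform to produce the extended audio signal.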
In a possible design, decoding the basic frames to obtain the basic audio signal specifically includes: if the basic frame includes multiple frequency domain coefficients of the low-frequency signal and multiple group envelope values of the high-frequency signal, obtaining from the basic frame the frequency domain coefficients of the low-frequency signal and the frequency domain coefficients of the high-frequency signal, where each frequency domain coefficient of the high-frequency signal is set to the envelope value of its group; and performing an inverse frequency domain transform on the frequency domain coefficients of the low-frequency signal and of the high-frequency signal to obtain the basic audio signal.
In a possible design, jointly decoding the basic frames and the extended frames to obtain the extended audio signal specifically includes: if the extended frame includes the differences between the frequency domain coefficients of the high-frequency signal and their corresponding group envelope values, obtaining the frequency domain coefficients of the high-frequency signal from the group envelope values of the high-frequency signal and those differences; and performing an inverse frequency domain transform on the frequency domain coefficients of the low-frequency signal and of the high-frequency signal to obtain the extended audio signal.
In a possible design, jointly decoding the basic frames and the extended frames to obtain the extended audio signal specifically includes: if the extended frame includes multiple group envelope values of the low-frequency signal and multiple group envelope values of the high-frequency signal, obtaining multiple frequency domain coefficients of the low-frequency signal from the group envelope values of the low-frequency signal, and obtaining multiple frequency domain coefficients of the high-frequency signal from the group envelope values of the high-frequency signal. The frequency domain coefficients of the low-frequency signal are determined either by performing a frequency domain transform on the basic audio signal obtained from the basic frames, or from the group envelope values of the low-frequency signal in the extended frame, in which case each frequency domain coefficient of the low-frequency signal is set to the envelope value of its group. An inverse frequency domain transform is then performed on the frequency domain coefficients of the low-frequency signal and of the high-frequency signal to obtain the extended audio signal.
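The two ways of obtaining the low-frequency coefficients can be sketched as a simple fallback. Treating a lost basic frame as `None` and preferring the basic-frame coefficients whenever they are available are assumptions for illustration, as are the function names; the text states only that either source may be used.

```python
def expand(envelopes, group_size):
    """Give every coefficient in a group the group's envelope value."""
    return [env for env in envelopes for _ in range(group_size)]


def decode_extended(basic_low_coeffs, low_envelopes, high_envelopes,
                    group_size):
    """Assemble the spectrum for the extended audio signal.

    basic_low_coeffs -- low-band coefficients derived from the basic
                        frame, or None if the basic frame was lost
                        (assumed convention)
    """
    if basic_low_coeffs is not None:
        low = list(basic_low_coeffs)
    else:
        # Fall back to the coarser low-band envelope in the extended frame.
        low = expand(low_envelopes, group_size)
    return low + expand(high_envelopes, group_size)


# Hypothetical values for both cases.
with_basic = decode_extended([9.0, 7.0, 5.0, 4.0], [8.0, 4.5], [3.0, 2.0], 2)
basic_lost = decode_extended(None, [8.0, 4.5], [3.0, 2.0], 2)
```

In the fallback case the low band is reconstructed only at envelope resolution, which trades quality for the ability to keep decoding when a basic frame is missing.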
In a possible design, performing the inverse frequency domain transform on the frequency domain coefficients specifically includes: obtaining the audio analog signal corresponding to the frequency domain coefficients using the inverse modified discrete cosine transform algorithm.
In a possible design, the group envelope value is the average of the frequency domain coefficients in each group, obtained after dividing the frequency domain coefficients evenly into groups in order from low frequency to high frequency.
In a third aspect, an audio signal processing device is provided. The device includes: a preprocessing module, configured to sample and quantize an acquired first audio signal to obtain a second audio signal; an encoding module, configured to encode the second audio signal in units of a first duration using a first encoding method to obtain basic frames, and to encode the second audio signal in units of a second duration using a second encoding method to obtain extended frames, where the second duration is longer than the first duration, and the first encoding method and the second encoding method respectively encode different signals carried in the second audio signal, and/or encode the second audio signal to different degrees; and a sending module, configured to send the basic frames and the extended frames to a second device.
In a possible design, the second duration is N times the first duration, where N is a natural number greater than or equal to 2.
In a possible design, the encoding module is specifically configured to: down-sample the second audio signal to obtain the low-frequency signal carried in the second audio signal; and encode the low-frequency signal using a time-domain encoding method to obtain multiple basic frames whose frame length is the first duration.
In a possible design, the encoding module is specifically configured to: perform a frequency domain transform on the second audio signal to obtain the frequency domain coefficients corresponding to the second audio signal; divide the high-frequency coefficients among them evenly into groups in order from low frequency to high frequency to obtain a group envelope value for each high-frequency group, where the group envelope value is the average of the high-frequency frequency domain coefficients in the group; and encode the group envelope values to obtain multiple extended frames whose frame length is the second duration.
In a possible design, the encoding module is specifically configured to: perform a frequency domain transform on the second audio signal to obtain multiple frequency domain coefficients of the low-frequency signal and multiple frequency domain coefficients of the high-frequency signal corresponding to the second audio signal; divide the frequency domain coefficients of the high-frequency signal evenly into groups in order from low frequency to high frequency to obtain a group envelope value for each high-frequency group, where the group envelope value is the average of the high-frequency frequency domain coefficients in the group; and encode the frequency domain coefficients of the low-frequency signal together with the group envelope values of the high-frequency signal to obtain multiple basic frames whose frame length is the first duration.
In a possible design, the encoding module is specifically configured to: in units of the second duration, encode the differences between the frequency domain coefficients of the high-frequency signal and their corresponding group envelope values to obtain multiple extended frames whose frame length is the second duration.
In a possible design, the encoding module is specifically configured to: perform a frequency domain transform on the second audio signal to obtain multiple frequency domain coefficients of the low-frequency signal and multiple frequency domain coefficients of the high-frequency signal corresponding to the second audio signal; divide the frequency domain coefficients of the low-frequency signal and those of the high-frequency signal separately and evenly into groups in order from low frequency to high frequency to obtain the corresponding group envelope values, where the group envelope value is the average of the frequency domain coefficients in each group; and encode the group envelope values to obtain multiple extended frames whose frame length is the second duration.
在一种可能的设计方式中,频域变换具体包括:改进离散余弦变换MDCT算法。In a possible design method, the frequency domain transform specifically includes: an improved discrete cosine transform MDCT algorithm.
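For readers unfamiliar with the MDCT, the sketch below uses the textbook definition: a block of 2N time samples yields N coefficients, and consecutive blocks overlap by 50%. It is a direct O(N²) evaluation with no window applied, chosen only to keep the example short; practical codecs typically apply a window that satisfies the Princen-Bradley condition and use a fast algorithm. The function names and the test signal are assumptions for illustration.

```python
import math


def mdct(block):
    """MDCT of 2N time samples -> N coefficients (no window applied)."""
    n_half = len(block) // 2
    return [
        sum(
            x * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for n, x in enumerate(block)
        )
        for k in range(n_half)
    ]


def imdct(coeffs):
    """Inverse MDCT of N coefficients -> 2N time-aliased samples (1/N scaling)."""
    n_half = len(coeffs)
    return [
        sum(
            c * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for k, c in enumerate(coeffs)
        ) / n_half
        for n in range(2 * n_half)
    ]


def overlap_add_roundtrip(signal, n_half):
    """Transform 50%-overlapped blocks, invert them, and overlap-add the results."""
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n_half + 1, n_half):
        block = imdct(mdct(signal[start:start + 2 * n_half]))
        for i, v in enumerate(block):
            out[start + i] += v
    return out


signal = [0.0, 1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0, 0.0, -1.0, -2.0, -3.0]
rebuilt = overlap_add_roundtrip(signal, n_half=4)
# Samples 4..7 are covered by two blocks, so the aliasing terms cancel there.
```

Only the samples covered by two overlapping blocks are reconstructed; the first and last half-blocks remain aliased because each has a single contributing block.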
In a fourth aspect, an audio signal processing device is provided. The device includes: a receiving module, configured to receive basic frames and extended frames sent by a first device, where the frame length of an extended frame is greater than the frame length of a basic frame, and an extended frame is obtained by re-encoding the audio signal corresponding to multiple basic frames; and a decoding module, configured to decode the basic frames to obtain a basic audio signal, or to jointly decode the basic frames and the extended frames to obtain an extended audio signal.
In a possible design, the decoding module is specifically configured to decode the basic frames using a time-domain coding and decoding method to obtain the basic audio signal.
In a possible design, the decoding module is specifically configured to: if the extended frame includes multiple group envelope values of the high-frequency signal, obtain the frequency-domain coefficients of the high-frequency signal from those group envelope values, where each frequency-domain coefficient of the high-frequency signal is set to the group envelope value of the group it belongs to; upsample the basic audio signal to obtain a third audio signal; perform a frequency-domain transform on the third audio signal frame by frame to obtain multiple frequency-domain coefficients of the low-frequency signal corresponding to the third audio signal; and perform an inverse frequency-domain transform on the frequency-domain coefficients of the high-frequency signal and of the low-frequency signal to obtain the extended audio signal.
In a possible design, the decoding module is specifically configured to: if the basic frame includes multiple frequency-domain coefficients of the low-frequency signal and multiple group envelope values of the high-frequency signal, obtain from the basic frame the frequency-domain coefficients of the low-frequency signal and the frequency-domain coefficients of the high-frequency signal, where each frequency-domain coefficient of the high-frequency signal is set to the group envelope value of the group it belongs to; and perform an inverse frequency-domain transform on the frequency-domain coefficients of the low-frequency signal and of the high-frequency signal to obtain the basic audio signal.
In a possible design, the decoding module is specifically configured to: if the extended frame includes the differences between the frequency-domain coefficients of the high-frequency signal and their corresponding group envelope values, obtain the frequency-domain coefficients of the high-frequency signal from the group envelope values of the high-frequency signal and those differences; and perform an inverse frequency-domain transform on the frequency-domain coefficients of the low-frequency signal and of the high-frequency signal to obtain the extended audio signal.
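The decoder-side designs above differ mainly in how the high-frequency coefficients are rebuilt: from envelope values alone, or from envelope values plus the transmitted differences. A minimal sketch, assuming equal-sized groups and hypothetical function names (dequantization and the inverse transform are omitted):

```python
def expand_envelopes(envelopes, group_size):
    """Use each group envelope value as the value of every coefficient in its group."""
    return [e for e in envelopes for _ in range(group_size)]


def restore_high_band(envelopes, group_size, residuals=None):
    """Rebuild high-frequency coefficients; add the differences when they are available."""
    approx = expand_envelopes(envelopes, group_size)
    if residuals is None:
        return approx  # envelope-only approximation
    if len(residuals) != len(approx):
        raise ValueError("one difference value per coefficient is expected")
    return [a + r for a, r in zip(approx, residuals)]


# Illustrative values only.
coarse = restore_high_band([5.0, 2.0], group_size=2)
refined = restore_high_band([5.0, 2.0], group_size=2, residuals=[-1.0, 1.0, -1.0, 1.0])
```

With envelope values only, every coefficient in a group takes the same value; adding the differences restores the individual coefficients, up to whatever precision the encoder kept.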
In a possible design, the decoding module is specifically configured to: if the extended frame includes multiple group envelope values of the low-frequency signal and multiple group envelope values of the high-frequency signal, obtain the frequency-domain coefficients of the low-frequency signal from the low-frequency group envelope values and obtain the frequency-domain coefficients of the high-frequency signal from the high-frequency group envelope values; where the frequency-domain coefficients of the low-frequency signal are either determined by applying a frequency-domain transform to the basic audio signal obtained from the basic frame, or determined from the low-frequency group envelope values in the extended frame, in which case each frequency-domain coefficient of the low-frequency signal is set to the group envelope value of the group it belongs to; and perform an inverse frequency-domain transform on the frequency-domain coefficients of the low-frequency signal and of the high-frequency signal to obtain the extended audio signal.
In a possible design, the inverse frequency-domain transform specifically includes the inverse modified discrete cosine transform (IMDCT) algorithm.
In a possible design, the group envelope value is the average of the frequency-domain coefficients in each group, obtained after dividing the frequency-domain coefficients into equal-sized groups in order from low frequency to high frequency.
In a fifth aspect, an electronic device is provided. The electronic device includes: a processor and a transmission interface; and a memory configured to store instructions executable by the processor; where the processor is configured to execute the instructions so that the electronic device implements the audio signal processing method according to the first aspect or any possible design of the first aspect.
In a sixth aspect, an electronic device is provided. The electronic device includes: a processor and a transmission interface; and a memory configured to store instructions executable by the processor; where the processor is configured to execute the instructions so that the electronic device implements the audio signal processing method according to the second aspect or any possible design of the second aspect.
In a seventh aspect, a computer-readable storage medium is provided. When instructions in the computer-readable storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform the audio signal processing method according to the first aspect or any possible design of the first aspect.
In an eighth aspect, a computer program product is provided. When the computer program product runs on a computer, the computer is caused to perform the audio signal processing method according to the first aspect or any possible design of the first aspect.
In a ninth aspect, a computer-readable storage medium is provided. When instructions in the computer-readable storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform the audio signal processing method according to the second aspect or any possible design of the second aspect.
In a tenth aspect, a computer program product is provided. When the computer program product runs on a computer, the computer is caused to perform the audio signal processing method according to the second aspect or any possible design of the second aspect.
It can be understood that any of the audio signal processing devices, electronic devices, computer-readable storage media, and computer program products provided above can be used to perform the corresponding method provided above. For their beneficial effects, refer to the beneficial effects of the corresponding method; they are not repeated here.
Description of the drawings
FIG. 1 is a schematic diagram of an application scenario of an audio signal processing method provided by an embodiment of this application;
FIG. 2 is a schematic flowchart of an audio signal processing method provided by an embodiment of this application;
FIG. 3 is a schematic diagram of the processing procedure of an audio signal processing method provided by an embodiment of this application;
FIG. 4 is a schematic diagram of audio signal encoded frames provided by an embodiment of this application;
FIG. 5 is a schematic flowchart of another audio signal processing method provided by an embodiment of this application;
FIG. 6 is a schematic diagram of an audio signal processing device provided by an embodiment of this application;
FIG. 7 is a schematic diagram of another audio signal processing device provided by an embodiment of this application;
FIG. 8 is a schematic structural diagram of an electronic device provided by an embodiment of this application.
Detailed description
Hereinafter, the terms "first" and "second" are used for descriptive purposes only and shall not be understood as indicating or implying relative importance or as implicitly indicating the number of the technical features referred to. Thus, a feature qualified by "first" or "second" may explicitly or implicitly include one or more such features. In the description of the embodiments, unless otherwise stated, "multiple" means two or more.
It should be noted that in this application, words such as "exemplary" or "for example" are used to indicate an example, illustration, or explanation. Any embodiment or design described as "exemplary" or "for example" in this application should not be construed as more preferred or more advantageous than other embodiments or designs. Rather, such words are intended to present the related concepts in a concrete manner.
The technical solutions in the embodiments of this application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application without creative effort fall within the protection scope of this application.
First, the implementation environment and application scenarios of the embodiments of this application are briefly introduced.
The embodiments of this application provide an audio signal processing method and device that can be applied when audio signals are transmitted between multiple electronic devices. Because different applications place different requirements on audio signal processing, the audio signal can be encoded and decoded flexibly on the basis of basic frames and extended frames, so that audio processing with different latency requirements or different quality requirements is supported. This addresses a problem in the prior art: when the same audio signal is transmitted between multiple electronic devices, different audio applications have different requirements on the real-time performance and the reproduction quality of the transmission, which leads to repeated transmission and wasted bandwidth.
As shown in FIG. 1, the audio signal processing method provided by the embodiments of this application can be applied to electronic devices that have audio signal processing capability. At least two electronic devices are involved, and data can be transmitted between them. For example, the audio signal can be transmitted over a wired network, a wireless local area network, near field communication (NFC), or Bluetooth.
Specifically, the electronic device may be a mobile phone, a smart speaker, a smart headset, a tablet computer, a desktop computer, a laptop computer, a handheld computer, a notebook computer, a vehicle-mounted device, an ultra-mobile personal computer (UMPC), a netbook, a cellular phone, a personal digital assistant (PDA), an augmented reality (AR) or virtual reality (VR) device, or the like. The embodiments of this application place no particular limitation on the specific form of the electronic device. For example, as shown in FIG. 1, electronic device 1 may be a mobile phone and electronic device 2 may be a smart headset.
An embodiment of this application provides an audio signal processing method applied to a first device and a second device. As shown in FIG. 2, the method may include:
S201: The first device samples and quantizes the acquired first audio signal to obtain a second audio signal.
The first audio signal may be an audio signal collected by the first device, an audio signal stored locally on the first device, or an audio signal from another device.
If the first device, in response to an audio request from the second device, needs to send the first audio signal to the second device, it samples and quantizes the first audio signal to obtain a digital signal, which saves transmission bandwidth. The basic processing is shown in FIG. 3: sampling and quantizing the first audio signal yields the second audio signal s(n), where n indexes the audio sampling points in chronological order. If the audio signal is sampled at 16 kHz, that is, 16×10^3 sampling points per second, the time interval between two adjacent sampling points is 0.0625 ms.
Next, the quantized value of each audio sampling point is encoded as a binary digital signal, which can then be transmitted. Different quantization precisions can be used to represent the quantized value of a sampling point, for example 16 bits, 24 bits, or 32 bits.
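The sampling interval and the quantization step can be checked with a few lines of code. This is only a sketch: the normalized input range of [-1.0, 1.0], the rounding rule, and the 1 kHz test tone are assumptions for illustration and are not specified in the application.

```python
import math


def quantize(sample, bits=16):
    """Map a sample in [-1.0, 1.0] to a signed integer code of `bits` bits."""
    max_code = 2 ** (bits - 1) - 1
    clipped = max(-1.0, min(1.0, sample))  # out-of-range input is clipped
    return round(clipped * max_code)


sample_rate_hz = 16_000
interval_ms = 1000 / sample_rate_hz  # time between two adjacent sampling points

# s(n): a 1 kHz tone sampled at 16 kHz and quantized to 16 bits (illustrative input).
s = [quantize(math.sin(2 * math.pi * 1000 * n / sample_rate_hz)) for n in range(16)]
```

Raising `bits` to 24 or 32 only changes the size of the integer range, so the same samples are represented with a finer step.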
S202: The first device encodes the second audio signal frame by frame using a first encoding method, with the first duration as the unit, to obtain basic frames, and encodes the second audio signal frame by frame using a second encoding method, with the second duration as the unit, to obtain extended frames.
The second duration is greater than the first duration, so the frame length of an extended frame is greater than the frame length of a basic frame.
For encoding and compression, the second audio signal can be divided at fixed-duration intervals: each time one frame of the second audio signal has been collected and quantized, that frame can be compressed and encoded, so that the signal is encoded and sent frame by frame. In this application, the second audio signal is encoded at different time intervals, that is, with different frame lengths, to generate two or more kinds of encoded frames, including basic frames and extended frames.
It should be noted that, given the encoding principle and sampling rate described above, current audio coding technology can only approximate the original audio signal as it exists in nature. In other words, the rules of audio encoding and decoding mean that digital coding always distorts the audio signal to some degree and cannot restore the original signal completely. The encoding methods involved in this application are lossy.
Therefore, a basic frame or an extended frame in the embodiments of this application may encode only part of the first audio signal rather than all of it. Specifically, an extended frame may be obtained by re-encoding the second audio signal segment corresponding to multiple basic frames, and it may further encode the parts of the audio signal that the basic frames did not encode or encoded with insufficient precision.
More specifically, the first encoding method and the second encoding method may encode different signals carried in the second audio signal. For example, the low-frequency part of the second audio signal is encoded using the first encoding method to obtain basic frames, and the high-frequency part of the second audio signal is encoded using the second encoding method to obtain extended frames.
Alternatively, the first encoding method and the second encoding method may encode the second audio signal to different degrees, producing encoded frames of lower encoding quality and encoded frames of higher encoding quality, which are then transmitted to the decoding side for decoding. The decoding side can therefore recover different audio signals from the basic frames or the extended frames. Relative to the original audio signal, the audio signal recovered from the extended frames together with the basic frames has less distortion, so its quality is better.
In general, the longer the frame length used to encode the second audio signal, the higher the compression ratio of the first audio signal and the higher the latency of sending the signal; at the same bit rate, the encoding quality of the audio signal is also better. Here, encoding quality refers to how faithfully the audio signal recovered after decoding reproduces the original audio signal before encoding and compression. In other words, the longer the frame length used for encoding, the more faithfully the decoded audio signal reproduces the original signal and the lower its distortion.
In the embodiments of this application, a basic frame may encode the current second audio signal with lower latency and/or lower quality, and the first device may transmit the basic frames to the second device individually, frame by frame. After receiving the basic frames frame by frame, the second device can decode them using a preset decoding method to obtain an audio signal for audio applications that need low latency or that have relatively low audio quality requirements.
An extended frame may encode the current second audio signal with higher latency and/or higher quality. The frame length of an extended frame is greater than that of a basic frame. An extended frame carries enhancement information for the audio signal of multiple basic frames: it further encodes data in the audio signal that the basic frames do not contain or encode incompletely. After receiving the extended frames frame by frame, the second device can decode them jointly with the basic frames to obtain an audio signal of higher quality, for audio applications that do not need strict real-time performance but have relatively high audio quality requirements.
In an implementation, the first device may encode the second audio signal with the first duration as the unit to obtain basic frames, and encode the second audio signal with the second duration as the unit to obtain extended frames. The second duration may be N times the first duration, where N is a natural number greater than or equal to 2. The first duration is the frame length of a basic frame, that is, the time interval between two basic frames; the second duration is the frame length of an extended frame, that is, the time interval between two extended frames.
Taking FIG. 4 as an example, t1, t2, t3, t4, t5, t6, t7, and t8 denote basic frames of the audio coding. The algorithmic delay of a basic frame is approximately Δt, that is, the time interval between two basic frames is Δt. T1 and T2 denote extended frames of the audio coding; FIG. 4 uses one extended frame for every four basic frames as an example. The algorithmic delay of an extended frame is ΔT, that is, the time interval between two extended frames is ΔT, where ΔT = 4×Δt, that is, N = 4. Both basic frames and extended frames contain digitized audio sample data.
For example, the delay Δt may be 0.5 ms or 5 ms; Δt and ΔT depend on the design of the coding structure and on the actual application requirements. With a sampling frequency of 16 kHz and a basic frame length of 5 ms, each basic frame contains 80 audio sampling points.
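The frame-length arithmetic above can be reproduced with a short sketch. The 16 kHz sampling rate, the 5 ms basic frame, and N = 4 come from the text; the helper names and the placeholder sample sequence are assumptions for illustration.

```python
def samples_per_frame(sample_rate_hz, frame_ms):
    """Number of sampling points covered by one frame."""
    return round(sample_rate_hz * frame_ms / 1000)


def split_frames(samples, frame_len):
    """Cut a sample sequence into consecutive full frames of `frame_len` samples."""
    return [
        samples[i:i + frame_len]
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


sample_rate_hz = 16_000
basic_ms = 5
n_ratio = 4                          # N: basic frames per extended frame
extended_ms = n_ratio * basic_ms     # ΔT = N × Δt

basic_len = samples_per_frame(sample_rate_hz, basic_ms)        # 80
extended_len = samples_per_frame(sample_rate_hz, extended_ms)  # 320

one_second = list(range(sample_rate_hz))  # placeholder for one second of s(n)
basic_frames = split_frames(one_second, basic_len)
extended_frames = split_frames(one_second, extended_len)
```

Each extended frame spans exactly the samples of four consecutive basic frames, which matches the relationship between T1 and t1 to t4 in FIG. 4.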
S203: The first device sends the basic frames and the extended frames to the second device.
The first device may send each basic frame to the second device as soon as it has been encoded, and likewise send each extended frame to the second device as soon as it has been encoded. After receiving a basic frame or an extended frame, the second device decodes it to recover the audio signal for use by different audio applications.
With the encoding described above, the second device receives the digital signal sent by the first device, which includes basic frames or extended frames, and can decode it using a preset coding and decoding method to recover the audio signal. As shown in FIG. 5, the process may include:
S501: The second device receives the basic frames and extended frames sent by the first device, where the frame length of an extended frame is greater than the frame length of a basic frame, and an extended frame is obtained by re-encoding the audio signal corresponding to multiple basic frames.
S502: The second device decodes the basic frames to obtain a basic audio signal, or jointly decodes the basic frames and the extended frames to obtain an extended audio signal.
The second device decodes the received basic frames or extended frames according to preset coding and decoding rules; that is, the second device decodes the digital signal to obtain an analog signal, so as to meet the needs of the different audio applications on the second device.
Further, after receiving a basic frame, the second device decodes it to obtain the corresponding basic audio signal s_1(n). After receiving an extended frame, the second device decodes it jointly with the basic frames to obtain the corresponding extended audio signal s_2(n).
The basic audio signal s_1(n) and the extended audio signal s_2(n) have the same audio content but differ in transmission latency and audio quality: the audio quality of s_1(n) is somewhat lower than that of s_2(n), and the transmission latency of s_1(n) is lower than that of s_2(n).
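The two decoding paths in S501 and S502 can be pictured as a receiver that returns s_1(n) as soon as each basic frame arrives and returns s_2(n) once the extended frame covering N basic frames arrives. The sketch below shows only this control flow; the class, its method names, and the stand-in decoders are hypothetical and do not come from the application.

```python
class Receiver:
    """Collects basic frames and pairs them with the extended frame that covers them."""

    def __init__(self, n_ratio, decode_basic, decode_joint):
        self.n_ratio = n_ratio            # N: basic frames per extended frame
        self.decode_basic = decode_basic  # hypothetical basic-frame decoder
        self.decode_joint = decode_joint  # hypothetical joint decoder
        self.pending = []                 # basic frames awaiting an extended frame

    def on_basic(self, frame):
        """Low-latency path: decode this basic frame immediately."""
        self.pending.append(frame)
        return self.decode_basic(frame)

    def on_extended(self, frame):
        """Higher-quality path: decode N basic frames together with one extended frame."""
        if len(self.pending) < self.n_ratio:
            raise ValueError("extended frame arrived before its basic frames")
        group = self.pending[:self.n_ratio]
        self.pending = self.pending[self.n_ratio:]
        return self.decode_joint(group, frame)


# Stand-in decoders that only label their inputs, to make the flow visible.
rx = Receiver(4, lambda b: f"s1[{b}]", lambda bs, e: f"s2[{'+'.join(bs)}|{e}]")
low_latency = [rx.on_basic(f"t{i}") for i in range(1, 5)]
high_quality = rx.on_extended("T1")
```

Passing the decoders in as parameters keeps the sketch independent of any particular codec; an application that only needs the low-latency signal would never call `on_extended`.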
With the above implementations, audio applications with different latency requirements can be served by a single encoding scheme between the encoding side and the decoding side. The encoding side acquires only one audio signal but encodes both basic frames and extended frames to suit the different latency requirements, and the decoding side can decode different audio signals from these two kinds of frames to meet the needs of different audio applications. The audio signal decoded from the basic frames has lower latency but lower quality. The audio signal decoded from the extended frames together with the basic frames has higher latency but better quality, with less distortion relative to the original audio signal. The decoding side can thus recover two or more audio signals from the basic frames and extended frames, while the encoding side encodes only one audio signal. This reduces redundant information, avoids the repeated transmission and wasted bandwidth that arise when the encoding side encodes the same audio signal several times, and substantially reduces system overhead.
Next, the encoding and decoding methods and processes in the above technical solutions are described in detail through several preferred implementations, namely Mode 1, Mode 2, Mode 3, and Mode 4. These are exemplary implementations only and do not cover all possible implementations of this application.
Mode 1
1. Encoding process on the encoding side:
In a possible implementation, the first device may use a lower-latency time-domain encoding method to obtain the basic frames, encoding only the low-frequency part of the second audio signal. The first device uses a higher-latency frequency-domain encoding method to obtain the extended frames, and the extended frames contain only the high-frequency part of the second audio signal.
As an example scenario, the second device runs two different audio applications. One is a device calibration and positioning application, which needs an audio signal with strong real-time performance, with a signal transmission latency of no more than 1 ms, but has modest audio quality requirements: the audio signal may contain only the low-frequency signal and no high-frequency signal. The other is a speech enhancement application, which does not need strong real-time performance, with a signal transmission latency of no more than 6 ms, but has higher audio quality requirements: both the high-frequency and the low-frequency parts of the signal are needed.
则上述的步骤S202中,第一装置对基本帧的编码具体可以包括:Then, in the above step S202, the encoding of the basic frame by the first device may specifically include:
(1)第一装置对第二音频信号进行下采样,得到第二音频信号包括的低频信号。(1) The first device down-samples the second audio signal to obtain the low-frequency signal included in the second audio signal.
其中,下采样表示对于一个样值序列间隔几个样值取样一次,这样得到新序列的处理方式。例如,对第一音频信号进行采样的采样率为16kHz,则量化得到的第二音频信号的频宽可以为采样率的一半,即频宽可以为8kHz。如第二音频信号包括0~8kHz的频段,其中,低频信号s L(n)为0~4kHz部分,高频信号s H(n)为4k~8kHz部分。则对第二音频信号进行一倍下采样处理,即可得到第二音频信号中包括的低频信号s L(n)为0~4kHz部分的音频信号。 Among them, down-sampling means to sample a sequence of samples at intervals of several samples, so as to obtain the processing mode of the new sequence. For example, if the sampling rate for sampling the first audio signal is 16 kHz, the bandwidth of the second audio signal obtained by quantization may be half of the sampling rate, that is, the bandwidth may be 8 kHz. For example, the second audio signal includes a frequency band of 0-8kHz, where the low-frequency signal s L (n) is a part of 0-4kHz, and the high-frequency signal s H (n) is a part of 4k-8kHz. Then, the second audio signal is subjected to double downsampling processing to obtain an audio signal whose low-frequency signal s L (n) included in the second audio signal is 0-4 kHz.
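The downsampling step described above can be sketched as plain decimation. This is a minimal illustration, not the implementation of this application: the function name `downsample` and the toy input are assumptions, and a practical encoder would normally apply a low-pass (anti-aliasing) filter before discarding samples, which is omitted here.

```python
def downsample(samples, factor=2):
    """Keep one sample out of every `factor` samples (plain decimation).

    No anti-aliasing filter is applied; this only illustrates the
    "take one sample every few samples" definition.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    return samples[::factor]


# Hypothetical input sampled at 16 kHz; decimating by 2 gives an 8 kHz
# sequence, whose representable band is 0~4 kHz.
input_rate_hz = 16000
signal = [0, 1, 2, 3, 4, 5, 6, 7]
low_rate_signal = downsample(signal, 2)
output_rate_hz = input_rate_hz // 2
```

With a factor of two, every second sample is kept, so the sequence length and the sampling rate are both halved.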
(2) Encode the low-frequency signal in units of the first duration using a time-domain encoding method to obtain multiple basic frames.
Time-domain coding encodes the waveform of the audio signal. Typical time-domain coding standards include G.726, G.723.1, and G.728 of the International Telecommunication Union (ITU). These coding standards widely use code-excited linear prediction, which in principle models the mechanism of human speech production and exploits the inherent characteristics of the human glottis and vocal tract to remove redundant information from the audio signal, thereby greatly reducing the bit rate required for audio coding while maintaining high audio quality.
For example, the first device may encode s_L(n) using G.726 and assemble basic frames at intervals of the first duration, so that the frame length of a basic frame is the first duration. For example, the first duration may be 0.5 ms: each 0.5 ms segment of s_L(n) is encoded in turn, and the resulting digital signal is a basic frame. G.726 is a speech codec algorithm that can encode an audio signal into a digital signal with low delay.
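As a rough sketch of the framing in the example above, the low-frequency signal can be cut into consecutive segments of the first duration before each segment is passed to the time-domain encoder. The helper names are hypothetical, the 8 kHz rate assumes the two-fold downsampling of the earlier example, and the G.726 encoding of each segment is not shown.

```python
def samples_per_frame(sample_rate_hz, frame_us):
    """Number of samples in one frame; the duration is given in microseconds."""
    return sample_rate_hz * frame_us // 1_000_000


def split_into_frames(samples, frame_len):
    """Split a sequence into consecutive full frames, dropping a trailing partial frame."""
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    usable = len(samples) - len(samples) % frame_len
    return [samples[i:i + frame_len] for i in range(0, usable, frame_len)]


# Assumption: s_L(n) is sampled at 8 kHz, and the first duration is 0.5 ms (500 us).
frame_len = samples_per_frame(8000, 500)
frames = split_into_frames(list(range(10)), frame_len)
```

Under these assumptions each basic frame covers only four samples, which is what keeps the delay of the basic audio signal low.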
Further, in the foregoing step S202, the encoding of the extended frames by the first device may specifically include:
(1) Perform a frequency-domain transform on the second audio signal in units of the second duration to obtain the frequency-domain coefficients corresponding to the second audio signal.
The principle of frequency-domain coding is to encode the audio signal in the frequency domain according to how the human ear perceives sound. The frequency bands that humans attend to are encoded with emphasis, while bands that are masked by other bands or are hard for humans to perceive are quantized coarsely or not at all. The advantage of frequency-domain coding is that it removes a certain amount of redundancy based on the characteristics of the human ear, so it performs almost equally well on all kinds of audio signals; for signals such as music in particular, its coding quality is higher than that of time-domain coding.
Specifically, a modified discrete cosine transform (MDCT) may be performed on the second audio signal to obtain the MDCT frequency-domain coefficients corresponding to the second audio signal. The MDCT is an algorithm that transforms a signal from the time domain to the frequency domain, and the resulting coefficients represent the frequency-domain components at the individual frequency points.
The formula for transforming the time-domain signal s(n) into the MDCT frequency-domain coefficients S(k) is as follows:
Figure PCTCN2020098183-appb-000001
The MDCT coefficients S(k) are obtained; S(k) is the frequency-domain representation of the second audio signal.
For example, if the second duration is 5 ms, that is, the frame length used for encoding the extended frames is 5 ms, and the sampling rate is 16 kHz, then s(n) contains 80 sampling points, that is, N = 80, and the sample index n ranges from 0 to 79. The MDCT is performed on each 5 ms segment of s(n) in turn to obtain the corresponding MDCT coefficients, where k may range from 0 to 79. The coefficient index k starts from 0 and runs from low frequency to high frequency. The low-frequency coefficients, from low to high, are therefore S(0)~S(39), and the high-frequency coefficients, from low to high, are S(40)~S(79).
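The split between low-frequency and high-frequency coefficients in this example can be checked with a little arithmetic. The sketch below assumes that the 80 coefficients partition the 0~8 kHz band uniformly, so that each coefficient covers about 100 Hz; the function name `bin_edges_hz` is hypothetical, and the exact centre frequencies depend on the transform definition, which is given only as a formula image in this document.

```python
def bin_edges_hz(k, sample_rate_hz=16000, num_coeffs=80):
    """Approximate (lower, upper) band edges in Hz covered by coefficient index k.

    Assumes the band 0 ~ sample_rate/2 is divided uniformly into num_coeffs bins.
    """
    lower = sample_rate_hz * k // (2 * num_coeffs)
    upper = sample_rate_hz * (k + 1) // (2 * num_coeffs)
    return lower, upper


# 5 ms at 16 kHz gives the 80 sampling points mentioned in the text.
frame_samples = 16000 * 5 // 1000

last_low_bin = bin_edges_hz(39)    # top of the low-frequency half
first_high_bin = bin_edges_hz(40)  # bottom of the high-frequency half
```

Under this assumption the boundary between S(39) and S(40) falls at 4 kHz, matching the 0~4 kHz and 4~8 kHz bands used earlier.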
(2) Divide the high-frequency coefficients among the frequency-domain coefficients of the second audio signal into equal groups in order from low frequency to high frequency, obtain the group envelope values of the multiple high-frequency groups, and encode them using envelope coding.
For example, the 40 high-frequency coefficients S(40)~S(79) are divided equally into 8 groups, each containing five high-frequency coefficients, as follows:
Group 1 contains the high-frequency coefficients S(40)~S(44);
Group 2 contains the high-frequency coefficients S(45)~S(49);
Group 3 contains the high-frequency coefficients S(50)~S(54);
Group 4 contains the high-frequency coefficients S(55)~S(59);
Group 5 contains the high-frequency coefficients S(60)~S(64);
Group 6 contains the high-frequency coefficients S(65)~S(69);
Group 7 contains the high-frequency coefficients S(70)~S(74);
Group 8 contains the high-frequency coefficients S(75)~S(79).
Next, the group envelope values of these high-frequency groups are obtained, where a group envelope value is the average of the high-frequency coefficients in the group. The first device obtains the group envelope value of each group of the high-frequency part of the second audio signal and then encodes the group envelope values to obtain multiple extended frames whose frame length is the second duration.
For example, the group envelope values may be calculated as follows:
Group 1 envelope value: S_HE(0) = [S(40)+S(41)+S(42)+S(43)+S(44)]/5;
Group 2 envelope value: S_HE(1) = [S(45)+S(46)+S(47)+S(48)+S(49)]/5;
Group 3 envelope value: S_HE(2) = [S(50)+S(51)+S(52)+S(53)+S(54)]/5;
Group 4 envelope value: S_HE(3) = [S(55)+S(56)+S(57)+S(58)+S(59)]/5;
Group 5 envelope value: S_HE(4) = [S(60)+S(61)+S(62)+S(63)+S(64)]/5;
Group 6 envelope value: S_HE(5) = [S(65)+S(66)+S(67)+S(68)+S(69)]/5;
Group 7 envelope value: S_HE(6) = [S(70)+S(71)+S(72)+S(73)+S(74)]/5;
Group 8 envelope value: S_HE(7) = [S(75)+S(76)+S(77)+S(78)+S(79)]/5.
With the second duration as the frame length, the first device may digitally encode the group envelope values of the high-frequency groups obtained above and send them to the second device frame by frame. For example, every 5 ms the first device encodes S_HE(0)~S_HE(7) obtained above, assembles them into an extended frame, and sends it to the second device.
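The grouping and averaging of the high-frequency coefficients can be sketched as follows. This is an illustrative sketch only: the function name `group_envelopes` and the toy coefficient values are assumptions, and the quantization and packing of the envelope values into an extended frame are not shown.

```python
def group_envelopes(coeffs, start=40, group_size=5):
    """Average each consecutive group of high-frequency coefficients.

    coeffs     -- the coefficients S(0)..S(N-1) of one frame
    start      -- index of the first high-frequency coefficient
    group_size -- number of coefficients per group
    """
    high = coeffs[start:]
    if len(high) % group_size != 0:
        raise ValueError("high-frequency part must split into equal groups")
    return [
        sum(high[i:i + group_size]) / group_size
        for i in range(0, len(high), group_size)
    ]


# Toy frame in which S(k) = k, so each group average is easy to verify by hand.
coeffs = [float(k) for k in range(80)]
envelopes = group_envelopes(coeffs)  # S_HE(0)..S_HE(7)
```

In this toy frame, group 1 holds the values 40 to 44, so its envelope value is 42; only 8 values per frame need to be carried instead of 40.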
2. Decoding process on the decoding side:
Based on the foregoing encoding method, the second device receives one basic frame at regular intervals and decodes it using time-domain decoding to obtain the first audio signal. Relative to the original audio signal on the encoding side, the first audio signal contains only the low-frequency part.
The second device receives one extended frame at regular intervals; the extended frame contains only the high-frequency part of the original audio signal. The second device decodes the extended frame jointly with the basic frames to obtain the second audio signal, which includes both the low-frequency part and the high-frequency part.
Taking the foregoing embodiment as an example, the second device receives one basic frame every 0.5 ms and decodes it using G.726 decoding to obtain the basic audio signal s_1(n). The basic audio signal s_1(n) has only the low-frequency part, but its delay is as low as 0.5 ms. It can therefore be used by audio applications that require low delay, such as device calibration and positioning.
If the extended frame received by the second device includes the group envelope values of the high-frequency signal, the high-frequency coefficients of the high-frequency signal are obtained from these group envelope values, that is, each high-frequency coefficient is set to the group envelope value of its group. In addition, the basic audio signal is upsampled to obtain the third audio signal, and a frequency-domain transform is performed on the third audio signal frame by frame to obtain the low-frequency coefficients of the low-frequency signal corresponding to the third audio signal. The audio signal that the second device recovers from the high-frequency coefficients and the low-frequency coefficients is the extended audio signal.
For example, the second device receives one extended frame every 5 ms and obtains from it the group envelope values S_HE(0)~S_HE(7) of the high-frequency part of the audio signal. The high-frequency coefficients are then obtained from the group envelope values by setting each high-frequency coefficient equal to the group envelope value of the group to which it belongs, namely:
S(40)=S(41)=S(42)=S(43)=S(44)=S_HE(0);
S(45)=S(46)=S(47)=S(48)=S(49)=S_HE(1);
S(50)=S(51)=S(52)=S(53)=S(54)=S_HE(2);
S(55)=S(56)=S(57)=S(58)=S(59)=S_HE(3);
S(60)=S(61)=S(62)=S(63)=S(64)=S_HE(4);
S(65)=S(66)=S(67)=S(68)=S(69)=S_HE(5);
S(70)=S(71)=S(72)=S(73)=S(74)=S_HE(6);
S(75)=S(76)=S(77)=S(78)=S(79)=S_HE(7), which gives S(40)~S(79).
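The assignments above amount to repeating each group envelope value once per coefficient in its group. A minimal sketch, with the hypothetical helper name `expand_envelopes` and placeholder envelope values:

```python
def expand_envelopes(envelopes, group_size=5):
    """Rebuild approximate high-frequency coefficients from group envelope values.

    Every coefficient in a group is set to that group's envelope value.
    """
    return [env for env in envelopes for _ in range(group_size)]


# Placeholder envelope values standing in for S_HE(0)..S_HE(7).
received_envelopes = [float(g) for g in range(8)]
high_coeffs = expand_envelopes(received_envelopes)  # approximates S(40)..S(79)
```

The result has 40 entries, and all five coefficients of a group carry the same value, which is why the high-frequency detail inside each group is lost.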
Take the audio signal recovered from the basic frames received within the second duration, for example the audio signal s_1(n) decoded from the multiple basic frames within the 5 ms above, and upsample it to obtain the third audio signal s'_L(n). Upsampling inserts one or more zeros between every two adjacent samples of the original signal. For example, upsampling the audio signal s_1(n) yields a third audio signal s'_L(n) with a bandwidth of 8 kHz and a sampling rate of 16 kHz, but the high-frequency part of s'_L(n) is still 0.
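Zero insertion as described above can be sketched in a few lines. The function name `upsample_zero_insert` is an assumption. Note also that zero insertion by itself leaves a mirrored copy of the low band in the upper half of the spectrum, so an interpolation low-pass filter, omitted here, would normally follow for the high-frequency part to be zero as the text states.

```python
def upsample_zero_insert(samples, factor=2):
    """Insert (factor - 1) zeros after every sample.

    Only the zero-insertion step is shown; no interpolation filter is applied.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    out = []
    for value in samples:
        out.append(value)
        out.extend([0] * (factor - 1))
    return out


# 40 samples at 8 kHz (5 ms) become 80 samples at 16 kHz (5 ms).
low_rate_frame = list(range(1, 41))
upsampled_frame = upsample_zero_insert(low_rate_frame, 2)
```

After this step the frame has the 80 samples that the transform in the next step expects.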
Perform the MDCT on the low-frequency audio signal s'_L(n); the frequency-domain coefficients S'_L(k) are obtained according to the following formula:
Figure PCTCN2020098183-appb-000002
Here, for a 5 ms delay, an audio segment sampled at 16 kHz has 80 sampling points, that is, N = 80 in the formula above. The low-frequency coefficients of S'_L(k) are combined with the high-frequency coefficients S(40)~S(79) obtained from the extended frame in the steps above to obtain the complete MDCT coefficients S(k) of the audio frame, where S(k) = S'_L(k) for k = 0~39.
Performing the inverse modified discrete cosine transform on S(k) yields the extended audio signal s_2(n), which includes both high-frequency and low-frequency components. The specific formula of the inverse modified discrete cosine transform is as follows:
Figure PCTCN2020098183-appb-000003
Among the audio signals decoded according to the foregoing embodiment, the audio signal s_1(n) decoded from the basic frames has only low-frequency components and lower decoding quality, but its delay is low, so it can be used by audio services that do not require high audio quality but require low audio delay. The audio signal s_2(n) obtained by jointly decoding the extended frames and the basic frames has both high-frequency and low-frequency components and higher decoding quality, but its delay is longer, so it can be used by audio services that require high audio quality but do not require strongly real-time audio transmission.
In the foregoing implementation of this application, one audio signal is transmitted through a single codec scheme, and the different audio signals obtained by decoding can be used by different audio applications. This avoids repeated encoding, decoding, and transmission, greatly avoids wasting bandwidth resources, and reduces system overhead.
Further, when encoding and decoding are performed according to the foregoing implementation and a basic frame is lost or not received by the decoding-side device, so that the audio signal cannot be recovered from the basic frame, the decoding-side device may decode from the extended frame alone. When performing the inverse frequency-domain transform, the low-frequency coefficients are set to 0 and the inverse transform is performed using only the high-frequency coefficients, which recovers an audio signal that contains only the high-frequency part.
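The way the decoder in Manner 1 assembles the full set of coefficients, including the fallback when the basic frames are missing, can be sketched as follows. The function name `assemble_spectrum` and the use of `None` to signal lost basic frames are assumptions made for illustration; the forward and inverse transforms themselves are not shown.

```python
def assemble_spectrum(low_coeffs, envelopes, num_low=40, group_size=5):
    """Build S(0)..S(79) for the inverse transform.

    low_coeffs -- coefficients S'_L(k) from the upsampled basic signal, or
                  None if the basic frames were lost (hypothetical convention)
    envelopes  -- group envelope values S_HE(0)..S_HE(7) from the extended frame
    """
    if low_coeffs is None:
        # Basic frames lost: the low-frequency coefficients are set to 0.
        low = [0.0] * num_low
    else:
        if len(low_coeffs) < num_low:
            raise ValueError("not enough low-frequency coefficients")
        low = list(low_coeffs[:num_low])  # S(k) = S'_L(k) for k = 0..39
    high = [env for env in envelopes for _ in range(group_size)]
    return low + high


full = assemble_spectrum([1.0] * 80, [2.0] * 8)
high_only = assemble_spectrum(None, [2.0] * 8)
```

In both cases the result has 80 coefficients, so the same inverse transform can be applied whether or not the basic frames arrived.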
Manner 2
1. Encoding process on the encoding side:
In a possible implementation, the first device may use a lower-delay time-domain encoding method to obtain the basic frames, that is, encode only the low-frequency part of the second audio signal. The first device uses a higher-delay frequency-domain encoding method to obtain the extended frames, and the extended frames contain only the high-frequency part of the second audio signal.
For example, there are two different audio applications on the second device. One is a voice enhancement application, which needs an audio signal with strong real-time performance (a low signal delay not exceeding 6 ms) and needs both the high-frequency and low-frequency parts. The other is a three-dimensional (3D) sound field acquisition application, which has higher audio quality requirements and can tolerate a longer signal delay.
In this case, in the foregoing step S202, the encoding of the basic frames by the first device may specifically include:
(1) The first device performs a frequency-domain transform on the second audio signal with the first duration as the frame length to obtain frequency-domain coefficients, that is, the low-frequency coefficients of the low-frequency signal and the high-frequency coefficients of the high-frequency signal corresponding to the second audio signal.
(2) Divide the frequency-domain coefficients of the high-frequency signal into equal groups in order from low frequency to high frequency and obtain the group envelope values of the multiple high-frequency groups, where a group envelope value is the average of the high-frequency coefficients in the group.
(3) Encode the frequency-domain coefficients of the low-frequency signal and the group envelope values of the high-frequency signal to obtain multiple basic frames whose frame length is the first duration.
For example, to meet the real-time requirement of the voice enhancement application on the second device, the first duration may be 5 ms. With a sampling rate of 16 kHz, the first device may perform the MDCT on each 5 ms segment of the audio signal s(n) to obtain the MDCT coefficients S(k), where k may range from 0 to 79. The high-frequency coefficients S(40)~S(79) are divided in order into 8 equal groups of 5 coefficients each, which gives the group envelope values S_HE(0)~S_HE(7) of the high-frequency groups. The first device encodes the low-frequency coefficients S(0)~S(39) and the group envelope values S_HE(0)~S_HE(7) of the high-frequency signal to obtain a basic frame.
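The content of a Manner 2 basic frame can be sketched as the 40 low-frequency coefficients plus the 8 group envelope values. The dictionary layout and the name `build_basic_frame_payload` are hypothetical; the actual quantization and bitstream format are not specified in this passage and are not modelled here.

```python
def build_basic_frame_payload(coeffs, num_low=40, group_size=5):
    """Collect the values carried by one basic frame in Manner 2.

    Returns the low-frequency coefficients S(0)..S(39) unchanged and the
    group envelope values S_HE(0)..S_HE(7) of the high-frequency part.
    """
    low = list(coeffs[:num_low])
    high = coeffs[num_low:]
    if len(high) % group_size != 0:
        raise ValueError("high-frequency part must split into equal groups")
    envelopes = [
        sum(high[i:i + group_size]) / group_size
        for i in range(0, len(high), group_size)
    ]
    return {"low_coeffs": low, "envelopes": envelopes}


# Toy frame in which S(k) = k.
payload = build_basic_frame_payload([float(k) for k in range(80)])
values_in_frame = len(payload["low_coeffs"]) + len(payload["envelopes"])
```

In this sketch a basic frame carries 48 values per 5 ms instead of the full 80 coefficients; the remaining high-frequency detail is left to the extended frames.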
Further, in the foregoing step S202, the encoding of the extended frames by the first device may specifically include:
With the second duration as the unit, the first device encodes the differences between the frequency-domain coefficients of the high-frequency signal and their corresponding group envelope values to obtain multiple extended frames whose frame length is the second duration.
For example, every 20 ms the first device may calculate, for the high-frequency part of the basic frames encoded above, the difference between each high-frequency coefficient and the group envelope value of its high-frequency group. Specifically, the group envelope value corresponding to each high-frequency coefficient is subtracted from that coefficient to obtain the group envelope coefficient difference SD_HE(k), where k = 40~79. The calculation may be as follows:
SD_HE(40)=S(40)-S_HE(0);
SD_HE(41)=S(41)-S_HE(0);
……
SD_HE(45)=S(45)-S_HE(1);
SD_HE(46)=S(46)-S_HE(1);
……
SD_HE(78)=S(78)-S_HE(7);
SD_HE(79)=S(79)-S_HE(7).
Every 20 ms the first device may assemble the group envelope coefficient differences SD_HE(40)~SD_HE(79) into an extended frame and transmit it to the second device. The first device may encapsulate these differences directly for transmission, or may encode them using differential quantization before transmission.
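The difference calculation above can be sketched for one frame of coefficients. The function name `envelope_residuals` and the toy values are assumptions; how the differences of several 5 ms basic frames are packed into one 20 ms extended frame, and any differential quantization, are not shown.

```python
def envelope_residuals(coeffs, envelopes, start=40, group_size=5):
    """Compute SD_HE(k) = S(k) - S_HE(g), where g is the group containing k."""
    end = start + len(envelopes) * group_size
    if len(coeffs) < end:
        raise ValueError("not enough coefficients for the given envelopes")
    return [
        coeffs[k] - envelopes[(k - start) // group_size]
        for k in range(start, end)
    ]


# Toy frame in which S(k) = k, so the envelope values are 42, 47, ..., 77.
coeffs = [float(k) for k in range(80)]
envelopes = [42.0 + 5 * g for g in range(8)]
residuals = envelope_residuals(coeffs, envelopes)  # SD_HE(40)..SD_HE(79)
```

Because each envelope value is the mean of its group, the differences within a group sum to zero and are typically smaller in magnitude than the coefficients themselves.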
2. Decoding process on the decoding side:
Based on the foregoing encoding method, the second device receives one basic frame every first duration. If the basic frame includes the frequency-domain coefficients of the low-frequency signal and the group envelope values of the high-frequency signal, the second device obtains the frequency-domain coefficients of the high-frequency signal from the group envelope values in the basic frame and then performs the inverse frequency-domain transform on the low-frequency and high-frequency coefficients to obtain the first audio signal.
The second device receives one extended frame every second duration. If the extended frame includes the differences between the frequency-domain coefficients of the high-frequency signal and their corresponding group envelope values, the second device combines them with the group envelope values of the high-frequency signal in the basic frame to obtain the frequency-domain coefficients of the high-frequency signal, and then performs the inverse frequency-domain transform on the low-frequency and high-frequency coefficients to obtain the second audio signal. The second audio signal includes both the low-frequency part and the high-frequency part.
Taking the foregoing embodiment as an example, the second device receives a basic frame every 5 ms. The second device first obtains the low-frequency coefficients of S(k), that is, S(0)~S(39), from the basic frame. It then obtains the high-frequency coefficients from the high-frequency group envelope values in the basic frame by setting each high-frequency coefficient equal to its corresponding group envelope value, namely:
S(40)=S(41)=S(42)=S(43)=S(44)=S_HE(0);
S(45)=S(46)=S(47)=S(48)=S(49)=S_HE(1);
S(50)=S(51)=S(52)=S(53)=S(54)=S_HE(2);
S(55)=S(56)=S(57)=S(58)=S(59)=S_HE(3);
S(60)=S(61)=S(62)=S(63)=S(64)=S_HE(4);
S(65)=S(66)=S(67)=S(68)=S(69)=S_HE(5);
S(70)=S(71)=S(72)=S(73)=S(74)=S_HE(6);
S(75)=S(76)=S(77)=S(78)=S(79)=S_HE(7), which gives S(40)~S(79).
The low-frequency coefficients S(0)~S(39) decoded from the basic frame are combined with the approximate high-frequency coefficients S(40)~S(79) obtained above, and the inverse MDCT is performed on the resulting S(0)~S(79) to obtain the basic audio signal s_1(n). The basic audio signal s_1(n) has a low delay and includes both the high-frequency part and the low-frequency part of the original audio signal. However, because the high-frequency part is restored only from the group envelope values, so that the coefficients within each group are identical, the quality of the high-frequency part is somewhat poorer, which is equivalent to reducing the frequency-domain resolution of the high-frequency part.
The second device receives an extended frame every 20 ms and obtains from it the group envelope coefficient differences SD_HE(40)~SD_HE(79) of the high-frequency part of the audio signal. It then obtains the high-frequency coefficients of each basic frame from SD_HE(40)~SD_HE(79): as shown below, each high-frequency coefficient is obtained by adding the group envelope coefficient difference to the corresponding group envelope value:
S(40)=SD_HE(40)+S_HE(0);
S(41)=SD_HE(41)+S_HE(0);
……
S(45)=SD_HE(45)+S_HE(1);
S(46)=SD_HE(46)+S_HE(1);
……
S(78)=SD_HE(78)+S_HE(7);
S(79)=SD_HE(79)+S_HE(7), which gives the complete high-frequency part S(40)~S(79) of the spectrum.
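The additions above can be sketched as the inverse of the encoder-side subtraction. The function name `restore_high_coeffs` and the toy values are assumptions, and the sketch assumes that the differences and the envelope values refer to the same basic frame.

```python
def restore_high_coeffs(residuals, envelopes, group_size=5):
    """Recover S(40)..S(79) as SD_HE(k) + S_HE(g), where g is the group of k."""
    if len(residuals) != len(envelopes) * group_size:
        raise ValueError("residual count must match the number of groups")
    return [
        residual + envelopes[i // group_size]
        for i, residual in enumerate(residuals)
    ]


# Toy values: envelope values 42, 47, ..., 77 (from a basic frame) and
# differences -2..2 in every group (from an extended frame).
envelopes = [42.0 + 5 * g for g in range(8)]
residuals = [-2.0, -1.0, 0.0, 1.0, 2.0] * 8
high_coeffs = restore_high_coeffs(residuals, envelopes)
```

With these toy values the restored coefficients are exactly 40 to 79; with quantized differences, the restored values would only approximate the original coefficients.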
Combined with the low-frequency coefficients S(0)~S(39) decoded from the basic frame, the inverse MDCT is performed on the resulting S(0)~S(79) to obtain the extended audio signal s_2(n). The extended audio signal s_2(n) includes both the high-frequency part and the low-frequency part of the original audio signal, and its high-frequency part is restored from the group envelope values together with the group envelope coefficient differences. The extended audio signal s_2(n) therefore has higher reconstruction quality than the basic audio signal s_1(n), but its delay is longer; in terms of real-time transmission, the basic audio signal s_1(n) is better than the extended audio signal s_2(n).
方式三、Way three
1、编码侧编码过程:1. Encoding process on the encoding side:
在一种可能的实施方式中,当第一装置需要满足第二装置上的三种以上的不同音频应用需求时,第一装置可以编码出一种基本帧和两种以上的扩展帧。In a possible implementation manner, when the first device needs to meet more than three different audio application requirements on the second device, the first device may encode one basic frame and two or more extended frames.
具体可以通过:第一装置采用较低时延、低质量的时域编码方式得到基本帧,即仅编码第二音频信号中的低频部分。第一装置采用较高时延、低质量的频域编码方式得到第一扩展帧,第一扩展帧仅是对第二音频信号中高频部分的频域组包络值进行编码。第一装置采用较高时延、高质量的频域编码方式得到第二扩展帧,第二扩展帧中包含第二音频信号中的高频部分。Specifically, the basic frame can be obtained by the first device using a time-domain coding method with a lower delay and low quality, that is, only the low frequency part of the second audio signal is encoded. The first device obtains the first extended frame by adopting a frequency domain coding method with higher delay and low quality. The first extended frame only encodes the envelope value of the frequency domain group of the high frequency part of the second audio signal. The first device adopts a higher time delay and high-quality frequency domain coding method to obtain a second extended frame, and the second extended frame contains the high frequency part of the second audio signal.
例如,第二装置上有三个不同的音频应用,一个是设备校准和定位应用,处理音频信号的要求是实时性强,需要信号发送时延间隔不超过1ms,音频信号可以只包含低频信号不包含高频信号;第二个是是语音增强应用,该应用处理音频信号的要求是实时性较强,信号发送时延不超过6ms,音频质量的要求较高,音频信号中的高频信号和低频信号部分都需要;第三个是为3D声场采集应用,该应用处理音频信号的实时性要求不高,但是对音频的质量要求较高。For example, there are three different audio applications on the second device. One is equipment calibration and positioning applications. The requirement for processing audio signals is real-time, and the signal transmission delay interval should not exceed 1ms. The audio signal can only contain low-frequency signals. High-frequency signal; the second is the application of voice enhancement, the application of audio signal processing requirements is strong real-time, the signal transmission delay does not exceed 6ms, the audio quality requirements are higher, the high-frequency signal and low-frequency in the audio signal The signal part is required; the third is for the 3D sound field acquisition application, which does not require high real-time processing of audio signals, but requires high audio quality.
In this case, in step S202 above, the first device may encode the basic frame as described for the basic frame in Way one, which may include:
(1) Downsample the second audio signal to obtain the low-frequency signal included in the second audio signal;
(2) Encode the low-frequency signal using a time-domain coding mode to obtain a plurality of basic frames whose frame length is the first duration.
For example, the first device may encode s_L(n) using G.726 and assemble the result into basic frames at intervals of the first duration. The first duration may be, for example, 0.5 ms, which meets the requirement of the first audio application above.
Further, in step S202 above, the first device may encode the first extended frame as described for the extended frame in Way one, including:
(1) Using the second duration as the frame length, perform a frequency-domain transform on the second audio signal to obtain the frequency-domain coefficients corresponding to the second audio signal;
(2) Divide the frequency-domain coefficients of the high-frequency part into equal-sized groups in order from low frequency to high frequency, obtain the group envelope value of each high-frequency group, and encode these values using an envelope coding mode.
For example, the first device may apply an MDCT to s(n) to obtain MDCT frequency-domain coefficients. With a frame length of 5 ms and a sampling rate of 16 kHz, s(n) contains 80 samples, yielding S(0) to S(79). The 40 high-frequency coefficients S(40) to S(79) are divided equally into 8 groups of five coefficients each, giving the group envelope values S_HE(0) to S_HE(7), where each group envelope value is the average of the high-frequency coefficients in that group. The first device may digitally encode S_HE(0) to S_HE(7) and, every 5 ms, assemble them into a first extended frame and send it to the second device.
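The grouping step in this example can be sketched as follows. This is a minimal illustration that assumes the 80 MDCT coefficients are already available as a list of floats; the function names `group_envelopes` and `encode_first_extended` are hypothetical and do not come from the application, and the quantization ("digital encoding") of the envelope values is left out.

```python
def group_envelopes(coeffs, n_groups):
    """Split coeffs into n_groups equal-sized groups (low to high frequency)
    and return the average of each group, as the text defines the envelope."""
    if n_groups <= 0 or len(coeffs) % n_groups != 0:
        raise ValueError("coefficients must divide evenly into groups")
    size = len(coeffs) // n_groups
    return [sum(coeffs[g * size:(g + 1) * size]) / size for g in range(n_groups)]


def encode_first_extended(spectrum):
    """Return S_HE(0)..S_HE(7) for one 5 ms frame of 80 coefficients S(0)..S(79)."""
    if len(spectrum) != 80:
        raise ValueError("expected 80 coefficients (5 ms at 16 kHz)")
    high = spectrum[40:80]          # S(40)..S(79), the high-frequency half
    return group_envelopes(high, 8)  # 8 groups of 5 coefficients each
```

Only 8 values per 5 ms frame are carried for the high band, instead of 40 coefficients, which is why this layer is described as low quality.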
Building on the above, in step S202 the second extended frame may be encoded as described for the extended frame in Way two, including:
Using the third duration as the unit, the first device encodes the differences between the frequency-domain coefficients of the high-frequency signal and their corresponding group envelope values, obtaining a plurality of second extended frames whose frame length is the third duration.
For example, every 20 ms the first device may compute, for the high-frequency part, the difference between each high-frequency frequency-domain coefficient and the group envelope value of its high-frequency group as encoded in the first extended frame. Specifically, the group envelope value corresponding to each high-frequency coefficient is subtracted from that coefficient, giving the group envelope coefficient differences SD_HE(40) to SD_HE(79). The first device may then assemble SD_HE(40) to SD_HE(79) into a second extended frame every 20 ms and transmit it to the second device.
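The relationship between the two extended layers can be written out explicitly. The formulas below restate the example's definitions (8 groups of five coefficients covering S(40) to S(79)); the group index g(k) is notation introduced here for illustration and is not used in the application.

```latex
\begin{aligned}
S_{HE}(g) &= \frac{1}{5}\sum_{i=0}^{4} S(40 + 5g + i), & g &= 0,\dots,7,\\
g(k) &= \left\lfloor \frac{k-40}{5} \right\rfloor, & k &= 40,\dots,79,\\
SD_{HE}(k) &= S(k) - S_{HE}\bigl(g(k)\bigr), & k &= 40,\dots,79.
\end{aligned}
```

Because each envelope is the mean of its group, the five differences within a group sum to zero, and a decoder holding both layers recovers S(k) = S_HE(g(k)) + SD_HE(k).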
2. Decoding process on the decoding side:
Based on the above encoding, the second device receives one basic frame every first duration and decodes it using the time-domain decoding mode to obtain the basic audio signal, which contains only the low-frequency part of the original audio signal on the encoding side.
The second device receives one first extended frame every second duration. If the first extended frame includes the group envelope values of the high-frequency signal, the second device derives the frequency-domain coefficients of the high-frequency signal from them, setting each high-frequency coefficient to the envelope value of its group. In parallel, the second device upsamples the basic audio signal decoded from the basic frames to obtain a third audio signal, and performs a frequency-domain transform on the third audio signal frame by frame to obtain the frequency-domain coefficients of the low-frequency signal. It then performs an inverse frequency-domain transform on the high-frequency and low-frequency coefficients to obtain the first extended audio signal. The first extended audio signal includes both low-frequency and high-frequency signals, although its high-frequency quality is somewhat lower and its delay is longer; it can therefore be used for the second audio application above.
The second device receives one second extended frame every third duration. If the second extended frame includes the differences between the frequency-domain coefficients of the high-frequency signal and their corresponding group envelope values, the second device combines them with the group envelope values from the first extended frame to obtain the frequency-domain coefficients of the high-frequency signal, and then performs an inverse frequency-domain transform on the low-frequency and high-frequency coefficients to obtain the second extended audio signal. The second extended audio signal includes both the low-frequency part and the high-frequency part.
For example, in line with the above embodiment, the second device may receive one basic frame every 0.5 ms and decode it using G.726 to obtain the basic audio signal s_1(n). The basic audio signal s_1(n) contains only the low-frequency part, but its delay is as low as 0.5 ms. It can therefore serve audio applications with low-delay requirements, such as the device calibration and positioning application above.
The second device may receive one first extended frame every 5 ms and obtain from it the group envelope values S_HE(0) to S_HE(7) of the high-frequency part of the audio signal, from which it derives the high-frequency coefficients S(40) to S(79). The second device upsamples the audio signal s_L(n) decoded from the basic frames received within the 5 ms to obtain s'_L(n), and applies an MDCT to s'_L(n) to obtain the low-frequency coefficients S(0) to S(39). Applying an inverse MDCT to S(0) to S(79) yields the first extended audio signal s_2(n), which includes both the high-frequency and low-frequency parts, with somewhat lower quality in the high-frequency part.
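How the decoder fills the high band from envelopes alone can be sketched as below. This is an illustration only: the names `expand_envelopes` and `assemble_spectrum` are hypothetical, the low-band coefficients are assumed to have been produced already (by the MDCT of the upsampled basic audio signal), and the inverse MDCT that follows is not shown.

```python
def expand_envelopes(envelopes, group_size=5):
    """Set every coefficient in a group to that group's envelope value."""
    return [value for value in envelopes for _ in range(group_size)]


def assemble_spectrum(low_coeffs, high_envelopes, group_size=5):
    """Build S(0)..S(79): low band S(0)..S(39) as given,
    high band S(40)..S(79) approximated by the repeated group envelopes."""
    if len(low_coeffs) != 40 or len(high_envelopes) * group_size != 40:
        raise ValueError("expected 40 low coefficients and 40 high-band slots")
    return list(low_coeffs) + expand_envelopes(high_envelopes, group_size)
```

The high band is therefore piecewise constant in groups of five, which is the reason the text describes the high-frequency quality of s_2(n) as somewhat lower.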
The second device may receive one second extended frame every 20 ms and obtain from it the group envelope coefficient differences SD_HE(40) to SD_HE(79) of the high-frequency part of the audio signal. Combining SD_HE(40) to SD_HE(79) with the group envelope values S_HE(0) to S_HE(7) obtained from the first extended frame gives the high-frequency coefficients S(40) to S(79). Applying an inverse MDCT to S(0) to S(79) yields the second extended audio signal s_3(n) for that 20 ms period. The signal s_3(n) includes both the high-frequency and low-frequency parts, and the quality of its high-frequency part is somewhat better than that of the first extended audio signal s_2(n).
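The encoder-side differences and the decoder-side recombination form a simple round trip, sketched below under the example's grouping (five coefficients per group). The helper names are hypothetical, and quantization of the envelopes and differences, which a real codec would apply, is omitted, so the reconstruction here is exact up to floating-point rounding.

```python
GROUP_SIZE = 5  # coefficients per high-frequency group, as in the example


def group_means(high, group_size=GROUP_SIZE):
    """Group envelope values: the average of each consecutive group."""
    return [sum(high[i:i + group_size]) / group_size
            for i in range(0, len(high), group_size)]


def envelope_residuals(high, envelopes, group_size=GROUP_SIZE):
    """Encoder side: difference between each coefficient and its group envelope."""
    return [c - envelopes[i // group_size] for i, c in enumerate(high)]


def reconstruct_high(envelopes, residuals, group_size=GROUP_SIZE):
    """Decoder side: envelope (first extended frame) plus
    difference (second extended frame) gives the coefficient back."""
    return [envelopes[i // group_size] + d for i, d in enumerate(residuals)]
```

In this sketch the list index i runs from 0 to 39 and corresponds to coefficient S(40 + i).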
Through the above implementation, this application provides further audio coding structures that can serve three or more audio applications with different requirements, thereby saving transmission bandwidth and improving system performance.
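The frame lengths used in this example (0.5 ms, 5 ms and 20 ms) nest inside one another, which the following sketch checks numerically. It assumes the 16 kHz input sampling rate from the MDCT example; the dictionary keys and function names are illustrative, and the sample counts refer to input samples spanned by a frame, not to the size of the encoded payload.

```python
from fractions import Fraction

SAMPLE_RATE_HZ = 16_000  # sampling rate used in the MDCT example
FRAME_MS = {
    "basic": Fraction(1, 2),     # first duration
    "extended_1": Fraction(5),   # second duration
    "extended_2": Fraction(20),  # third duration
}


def samples_in(frame_ms, rate_hz=SAMPLE_RATE_HZ):
    """Number of input samples spanned by one frame of the given length."""
    n = Fraction(frame_ms) * rate_hz / 1000
    if n.denominator != 1:
        raise ValueError("frame length is not a whole number of samples")
    return n.numerator


def frames_per(longer_ms, shorter_ms):
    """How many shorter frames fit in one longer frame (must be an integer >= 2)."""
    ratio = Fraction(longer_ms) / Fraction(shorter_ms)
    if ratio.denominator != 1 or ratio < 2:
        raise ValueError("longer frame must be an integer multiple (>= 2)")
    return ratio.numerator
```

With these values, one first extended frame spans ten basic frames and one second extended frame spans four first extended frames.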
Way four
1. Encoding process on the encoding side:
In a possible implementation, the first device may obtain the basic frame using a low-delay, low-quality time-domain coding mode, that is, encode only the low-frequency part of the second audio signal. The first device may obtain the extended frame using a higher-delay, low-quality frequency-domain coding mode that encodes only the frequency-domain group envelope values of the low-frequency part and of the high-frequency part of the second audio signal.
In this case, in step S202 above, the first device may encode the basic frame as described for the basic frame in Way one, which may include:
(1) Downsample the second audio signal to obtain the low-frequency signal included in the second audio signal.
(2) Encode the low-frequency signal using a time-domain coding mode to obtain a plurality of basic frames whose frame length is the first duration.
For example, the first device may encode s_L(n) using G.726 and assemble the result into basic frames at intervals of the first duration. The first duration may be, for example, 0.5 ms.
Further, in step S202 above, the first device may encode the extended frame as described for the extended frame in Way one, including:
(1) Using the second duration as the unit, perform a frequency-domain transform on the second audio signal to obtain the frequency-domain coefficients corresponding to the second audio signal.
(2) Divide the frequency-domain coefficients of the high-frequency part into equal-sized groups in order from low frequency to high frequency to obtain the group envelope values of the high-frequency groups; likewise divide the frequency-domain coefficients of the low-frequency part into equal-sized groups in order from low frequency to high frequency to obtain the group envelope values of the low-frequency groups; and encode both sets of values using an envelope coding mode.
For example, the first device may apply an MDCT to s(n) to obtain MDCT frequency-domain coefficients. With a frame length of 5 ms and a sampling rate of 16 kHz, s(n) contains 80 samples, yielding S(0) to S(79). The 40 low-frequency coefficients S(0) to S(39) are divided equally into 8 groups of five coefficients each, giving the group envelope values S_LE(0) to S_LE(7). Likewise, the 40 high-frequency coefficients S(40) to S(79) are divided equally into 8 groups of five coefficients each, giving the group envelope values S_HE(0) to S_HE(7). Each group envelope value is the average of the frequency-domain coefficients in that group. The first device may digitally encode S_LE(0) to S_LE(7) and S_HE(0) to S_HE(7) and, every 5 ms, assemble them into an extended frame and send it to the second device.
2. Decoding process on the decoding side:
Based on the above encoding, the second device receives one basic frame every first duration and decodes it using the time-domain decoding mode to obtain the basic audio signal, which contains only the low-frequency part of the original audio signal on the encoding side.
The second device receives one extended frame every second duration. If the extended frame includes the group envelope values of the low-frequency signal and the group envelope values of the high-frequency signal, the second device obtains the frequency-domain coefficients of the low-frequency signal and derives the frequency-domain coefficients of the high-frequency signal from the high-frequency group envelope values. If the second device has received the basic frames normally, the low-frequency coefficients may be determined by performing a frequency-domain transform on the basic audio signal obtained from the basic frames. If the second device has not received the basic frames normally, it may determine the low-frequency coefficients from the low-frequency group envelope values in the extended frame, setting each low-frequency coefficient to the envelope value of its group. The second device may then perform an inverse frequency-domain transform on the low-frequency and high-frequency coefficients to obtain the extended audio signal.
For example, if the second device receives the basic frames normally, it may receive one basic frame every 0.5 ms and decode it using G.726 to obtain the basic audio signal s_1(n). The basic audio signal s_1(n) contains only the low-frequency part, but its delay is as low as 0.5 ms.
The second device may receive one extended frame every 5 ms and obtain from it the group envelope values S_HE(0) to S_HE(7) of the high-frequency part of the audio signal, from which it derives the high-frequency coefficients S(40) to S(79). The second device upsamples the audio signal s_L(n) decoded from the basic frames received within the 5 ms to obtain s'_L(n), and applies an MDCT to s'_L(n) to obtain the low-frequency coefficients S(0) to S(39). Applying an inverse MDCT to S(0) to S(79) yields the extended audio signal s_2(n), which includes both the high-frequency and low-frequency parts, with somewhat lower quality in the high-frequency part.
For example, if the second device does not receive the basic frames normally, for instance because a basic frame is lost or a received basic frame is verified to contain errors, the second device derives the low-frequency coefficients S(0) to S(39) from the low-frequency group envelope values S_LE(0) to S_LE(7) decoded from the extended frame, with each low-frequency coefficient equal to the envelope value of its group. The second device derives the high-frequency coefficients S(40) to S(79) from the high-frequency group envelope values S_HE(0) to S_HE(7) decoded from the extended frame, with each high-frequency coefficient equal to the envelope value of its group. Applying an inverse MDCT to the coefficients S(0) to S(79) decoded from the extended frame received within the 5 ms yields the extended audio signal s_2(n), which includes both the high-frequency and low-frequency parts.
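The two decoding paths of Way four differ only in where the low-band coefficients come from. A minimal sketch, assuming the caller passes `None` when the basic frames for the current 5 ms were lost or failed verification; the function names and the `None` convention are illustrative and not taken from the application.

```python
def expand(envelopes, group_size=5):
    """Set every coefficient in a group to that group's envelope value."""
    return [value for value in envelopes for _ in range(group_size)]


def spectrum_way_four(low_from_basic, low_envelopes, high_envelopes):
    """Return S(0)..S(79) for one extended frame.

    low_from_basic: 40 low-band coefficients computed from the decoded basic
        frames, or None if the basic frames are unavailable.
    low_envelopes / high_envelopes: S_LE(0..7) and S_HE(0..7) from the extended frame.
    """
    if low_from_basic is not None:
        low = list(low_from_basic)   # normal path: better low-band detail
    else:
        low = expand(low_envelopes)  # fallback path: envelope-only low band
    return low + expand(high_envelopes)
```

The fallback trades low-band quality for robustness: the extended frame alone is enough to produce a full-band signal.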
With the above implementation, when the audio signal cannot be recovered by decoding the basic frame, the decoding-side device can still decode from the extended frame and recover the entire audio signal.
In summary, with the implementations provided in this application, one audio stream can be transmitted using a single codec scheme, and the different audio signals decoded from the basic frame or the extended frame can serve different audio applications. This avoids repeated encoding, decoding, and transmission, greatly reduces wasted bandwidth, and lowers system overhead. In addition, when a basic frame is lost on the decoding side and the audio signal cannot be recovered from it, the decoding-side device can decode from the extended frame, which further improves the reliability of audio transmission.
In another possible implementation, before the audio signal is encoded and transmitted, the encoding-side device may communicate with the decoding-side device in advance to negotiate a specific codec mode according to the audio application's coding requirements. For example, if the first audio application on the second device needs a low-delay, low-quality audio signal, the second device carries configuration information in the audio signal request it sends to the first device, indicating the coding mode that the request corresponds to. Alternatively, when the first device sends an encoded frame to the second device, agreed bits may indicate the frame's coding mode. For example, the basic frame that the first device sends to the second device includes two preconfigured bits, where 01 may indicate coding mode two. These configuration methods are only examples; the configuration is not limited to these two, and the embodiments of this application do not specifically limit it.
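The in-band indication can be sketched as two bits in a frame header. Only the pairing of 01 with coding mode two comes from the text; the position of the bits (here the two most significant bits of a first header byte) and the other three mappings are assumptions made purely for illustration.

```python
# Only 0b01 -> mode 2 is stated in the text; the other entries are hypothetical.
MODE_BY_BITS = {0b00: 1, 0b01: 2, 0b10: 3, 0b11: 4}
BITS_BY_MODE = {mode: bits for bits, mode in MODE_BY_BITS.items()}


def write_mode(header_byte, mode):
    """Store the coding mode in the top two bits of a header byte (assumed layout)."""
    return (header_byte & 0x3F) | (BITS_BY_MODE[mode] << 6)


def read_mode(header_byte):
    """Recover the coding mode from the top two bits of a header byte."""
    return MODE_BY_BITS[(header_byte >> 6) & 0b11]
```

A receiver would call `read_mode` on each incoming frame before choosing the decoding path, so no separate negotiation message is needed.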
This application further provides an audio processing apparatus. As shown in FIG. 6, the apparatus 600 may include a preprocessing module 601, an encoding module 602, and a sending module 603.
The preprocessing module 601 may be configured to sample and quantize an acquired first audio signal to obtain a second audio signal.
The encoding module 602 may be configured to encode the second audio signal in units of a first duration using a first coding mode to obtain basic frames, and to encode the second audio signal in units of a second duration using a second coding mode to obtain extended frames, where the second duration is greater than the first duration, and the first coding mode and the second coding mode encode different signals carried in the second audio signal and/or encode the second audio signal to different degrees.
The sending module 603 may be configured to send the basic frames and the extended frames to the second device.
In a possible design, the second duration is N times the first duration, where N is a natural number greater than or equal to 2.
In a possible design, the encoding module 602 may be specifically configured to: downsample the second audio signal to obtain the low-frequency signal carried in the second audio signal; and encode the low-frequency signal using a time-domain coding mode to obtain a plurality of basic frames whose frame length is the first duration.
In a possible design, the encoding module 602 may be specifically configured to: perform a frequency-domain transform on the second audio signal to obtain the frequency-domain coefficients corresponding to the second audio signal; divide the frequency-domain coefficients of the high-frequency part into equal-sized groups in order from low frequency to high frequency to obtain the group envelope values of the high-frequency groups, where each group envelope value is the average of the high-frequency coefficients in that group; and encode the group envelope values to obtain a plurality of extended frames whose frame length is the second duration.
In a possible design, the encoding module 602 may be specifically configured to: perform a frequency-domain transform on the second audio signal to obtain the frequency-domain coefficients of the low-frequency signal and the frequency-domain coefficients of the high-frequency signal corresponding to the second audio signal; divide the frequency-domain coefficients of the high-frequency signal into equal-sized groups in order from low frequency to high frequency to obtain the group envelope values of the high-frequency groups, where each group envelope value is the average of the high-frequency coefficients in that group; and encode the frequency-domain coefficients of the low-frequency signal and the group envelope values of the high-frequency signal to obtain a plurality of basic frames whose frame length is the first duration.
In a possible design, the encoding module 602 may be specifically configured to: in units of the second duration, encode the differences between the frequency-domain coefficients of the high-frequency signal and their corresponding group envelope values to obtain a plurality of extended frames whose frame length is the second duration.
In a possible design, the encoding module 602 may be specifically configured to: perform a frequency-domain transform on the second audio signal to obtain the frequency-domain coefficients of the low-frequency signal and the frequency-domain coefficients of the high-frequency signal corresponding to the second audio signal; group the frequency-domain coefficients of the low-frequency signal and those of the high-frequency signal separately to obtain the corresponding group envelope values, where each group envelope value is the average of the frequency-domain coefficients in that group; and encode the group envelope values to obtain a plurality of extended frames whose frame length is the second duration.
In a possible design, the frequency-domain transform in the above embodiments may specifically be the modified discrete cosine transform (MDCT).
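For readers unfamiliar with the MDCT, the sketch below gives a direct (O(N^2)) implementation of the standard transform pair and shows its overlap-add property. It is not taken from the application: in the standard MDCT each block of 2N time samples yields N coefficients and consecutive blocks overlap by N samples, so the example's 80 coefficients per 5 ms frame would correspond to a hop of N = 80 samples. No window is applied here, which is enough to demonstrate exact reconstruction but is not how a practical codec would run the transform.

```python
import math


def mdct(block):
    """Forward MDCT: 2N time samples -> N coefficients."""
    if len(block) % 2 != 0:
        raise ValueError("block length must be even (2N samples)")
    n = len(block) // 2
    return [
        sum(block[t] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
            for t in range(2 * n))
        for k in range(n)
    ]


def imdct(coeffs):
    """Inverse MDCT: N coefficients -> 2N time-aliased samples (1/N scaling)."""
    n = len(coeffs)
    return [
        sum(coeffs[k] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
            for k in range(n)) / n
        for t in range(2 * n)
    ]


def overlap_add_roundtrip(signal, n):
    """Transform 50%-overlapping blocks of 2N samples and overlap-add the inverses."""
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n + 1, n):
        for i, value in enumerate(imdct(mdct(signal[start:start + 2 * n]))):
            out[start + i] += value
    return out
```

Samples covered by two blocks come back exactly because the aliasing terms of adjacent blocks cancel; the first and last N samples of the signal are covered by only one block and are not reconstructed.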
This application further provides an audio signal processing apparatus. As shown in FIG. 7, the apparatus 700 includes a receiving module 701 and a decoding module 702.
The receiving module 701 may be configured to receive the basic frames and extended frames sent by the first device, where the frame length of an extended frame is greater than that of a basic frame, and an extended frame is obtained by re-encoding the audio signal corresponding to a plurality of basic frames.
The decoding module 702 may be configured to decode the basic frames to obtain a basic audio signal, or to jointly decode the basic frames and the extended frames to obtain an extended audio signal.
In a possible design, the decoding module 702 may be specifically configured to decode the basic frames using a time-domain codec mode to obtain the basic audio signal.
在一种可能的设计方式中,解码模块702具体可以用于:若扩展帧包括多个高频信号的组包络值,则根据多个高频信号的组包络值得到高频信号的多个频域系数,高频信号的频域系数为频域系数对应的组包络值;对基本音频信号进行上采样,得到第三音频信号;对第三音频信号逐帧进行频域变换,得到第三音频信号对应的低频信号 的多个频域系数;根据高频信号的多个频域系数和低频信号的多个频域系数进行频域反变换,得到扩展音频信号。In a possible design manner, the decoding module 702 can be specifically used to: if the extended frame includes the group envelope values of multiple high-frequency signals, obtain the multiple envelope values of the high-frequency signals according to the group envelope values of the multiple high-frequency signals. A frequency domain coefficient, the frequency domain coefficient of the high-frequency signal is the group envelope value corresponding to the frequency domain coefficient; the basic audio signal is up-sampled to obtain the third audio signal; the third audio signal is subjected to frequency domain transformation frame by frame to obtain The multiple frequency domain coefficients of the low frequency signal corresponding to the third audio signal; perform frequency domain inverse transformation according to the multiple frequency domain coefficients of the high frequency signal and the multiple frequency domain coefficients of the low frequency signal to obtain the extended audio signal.
在一种可能的设计方式中,解码模块702具体可以用于:若基本帧包括低频信号的多个频域系数和高频信号的多个组包络值,则根据基本帧得到低频信号的多个频域系数和高频信号的多个频域系数,其中,高频信号的多个频域系数为频域系数对应的组包络值;根据低频信号的多个频域系数和高频信号的多个频域系数进行频域反变换,得到基本音频信号。In a possible design manner, the decoding module 702 can be specifically used to: if the basic frame includes multiple frequency domain coefficients of the low-frequency signal and multiple envelope values of the high-frequency signal, obtain the multiple of the low-frequency signal according to the basic frame. Multiple frequency domain coefficients and multiple frequency domain coefficients of the high-frequency signal, where the multiple frequency domain coefficients of the high-frequency signal are the group envelope values corresponding to the frequency domain coefficients; according to the multiple frequency-domain coefficients of the low-frequency signal and the high-frequency signal Perform inverse frequency domain transformation on multiple frequency domain coefficients to obtain a basic audio signal.
In a possible design, the decoding module 702 may be specifically configured to: if the extended frame includes the differences between the frequency-domain coefficients of the high-frequency signal and their corresponding group envelope values, obtain the frequency-domain coefficients of the high-frequency signal from the group envelope values of the high-frequency signal together with those differences; and perform an inverse frequency-domain transform on the frequency-domain coefficients of the low-frequency signal and of the high-frequency signal to obtain the extended audio signal.
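As a hedged illustration of the difference-based design above, the following Python sketch restores the high-frequency coefficients by adding each transmitted difference back to the envelope value of its group. The function name `restore_high_band`, the sign convention (difference = coefficient minus envelope), and the fixed `group_size` are assumptions for illustration; the application does not specify them.

```python
def restore_high_band(envelopes, differences, group_size):
    """Rebuild high-frequency coefficients from group envelope values and
    per-coefficient differences (assumed to be coefficient minus envelope)."""
    if len(differences) != len(envelopes) * group_size:
        raise ValueError("expected group_size differences per envelope value")
    return [envelopes[i // group_size] + d for i, d in enumerate(differences)]


# Illustrative values only: two groups of two coefficients each.
restored = restore_high_band([2.0, -1.0], [0.5, -0.5, 0.0, 1.0], group_size=2)
```

Under this assumed convention the basic frame carries the coarse envelope and the extended frame carries only the refinement, so a receiver that ignores the extended frame still obtains a usable approximation of the high band.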
In a possible design, the decoding module 702 may be specifically configured to: if the extended frame includes a plurality of group envelope values of the low-frequency signal and a plurality of group envelope values of the high-frequency signal, obtain a plurality of frequency-domain coefficients of the low-frequency signal from the low-frequency group envelope values, and obtain a plurality of frequency-domain coefficients of the high-frequency signal from the high-frequency group envelope values. The frequency-domain coefficients of the low-frequency signal are determined either by performing a frequency-domain transform on the basic audio signal obtained from the basic frame, or from the group envelope values of the low-frequency signal in the extended frame, in which case each frequency-domain coefficient of the low-frequency signal takes the group envelope value of the group it belongs to. An inverse frequency-domain transform is then performed on the frequency-domain coefficients of the low-frequency signal and of the high-frequency signal to obtain the extended audio signal.
In a possible design, the inverse frequency-domain transform in the foregoing embodiments may specifically be the inverse modified discrete cosine transform (IMDCT) algorithm.
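For readers unfamiliar with the transform named above, the following is a minimal, unoptimized Python sketch of a windowed MDCT/IMDCT pair with 50% overlap-add. It is a textbook formulation, not the implementation of this application: the sine window, the 2/N scaling of the inverse transform, the zero padding at both ends, and all function names are assumptions chosen so that analysis followed by synthesis reproduces the input.

```python
import math


def sine_window(length):
    """Sine window of the given (even) length; satisfies w[n]^2 + w[n+N]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(block):
    """Direct O(N^2) MDCT: 2N time samples in, N coefficients out."""
    n = len(block) // 2
    return [sum(block[t] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
                for t in range(2 * n))
            for k in range(n)]


def imdct(coeffs):
    """Direct O(N^2) IMDCT: N coefficients in, 2N aliased time samples out.
    The 2/N scale matches a window applied at both analysis and synthesis."""
    n = len(coeffs)
    return [2.0 / n * sum(coeffs[k] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
                          for k in range(n))
            for t in range(2 * n)]


def analyze(signal, n):
    """Split a signal (length a multiple of n) into 50%-overlapped MDCT frames."""
    w = sine_window(2 * n)
    padded = [0.0] * n + list(signal) + [0.0] * n
    return [mdct([padded[s + t] * w[t] for t in range(2 * n)])
            for s in range(0, len(padded) - 2 * n + 1, n)]


def synthesize(frames, n):
    """Inverse-transform each frame, window it, and overlap-add the results."""
    w = sine_window(2 * n)
    out = [0.0] * (n * (len(frames) + 1))
    for i, coeffs in enumerate(frames):
        block = imdct(coeffs)
        for t in range(2 * n):
            out[i * n + t] += block[t] * w[t]
    return out[n:-n]  # drop the zero padding added by analyze()
```

Each frame on its own contains time-domain aliasing; the aliasing cancels only when adjacent frames are overlap-added, which is why `synthesize` needs the whole frame sequence. Production codecs compute the same transform with an FFT-based fast algorithm.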
In a possible design, the group envelope values are obtained by dividing a plurality of frequency-domain coefficients evenly into groups in order from low frequency to high frequency and taking the average of the frequency-domain coefficients in each group.
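The group envelope definition above can be sketched in a few lines of Python. The function name `group_envelopes` and the `group_size` parameter are hypothetical, and the sketch assumes that "even grouping" means consecutive groups of equal size taken in ascending frequency order.

```python
def group_envelopes(coeffs, group_size):
    """Split coefficients (ordered low to high frequency) into equal-sized
    consecutive groups and return the arithmetic mean of each group."""
    if group_size <= 0 or len(coeffs) % group_size != 0:
        raise ValueError("coefficient count must be a positive multiple of group_size")
    return [sum(coeffs[i:i + group_size]) / group_size
            for i in range(0, len(coeffs), group_size)]


# Illustrative values only: four coefficients in groups of two.
envelopes = group_envelopes([1.0, 3.0, 2.0, 6.0], group_size=2)
```

Sending one value per group instead of one per coefficient reduces the data volume by a factor of `group_size`, at the cost of flattening the spectral detail inside each group.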
It can be understood that, when the audio signal processing apparatus is an electronic device, the sending module may be a transmitter, which may include an antenna, a radio frequency circuit, and the like, and the preprocessing module, the encoding module, and the decoding module may be a processor, such as a baseband chip. When the audio signal processing apparatus is a component having the functions of the first apparatus or the second apparatus, the sending module may be a radio frequency unit, and the preprocessing module, the encoding module, and the decoding module may be a processor. When the audio signal processing apparatus is a chip system, the sending module may be an output interface of the chip system, and the preprocessing module, the encoding module, and the decoding module may be a processor of the chip system, such as a central processing unit (CPU).
It should be noted that, for the specific execution process and embodiments of the apparatus 600, reference may be made to the steps performed by the first apparatus in the foregoing method embodiments and the related descriptions; for the specific execution process and embodiments of the apparatus 700, reference may be made to the steps performed by the second apparatus in the foregoing method embodiments and the related descriptions. For the technical problems solved and the technical effects achieved, reference may also be made to the foregoing embodiments. Details are not repeated here.
In this embodiment, the audio signal processing apparatus is presented with its functional modules divided in an integrated manner. A "module" here may refer to a specific circuit, a processor and memory that execute one or more software or firmware programs, an integrated logic circuit, and/or another device that can provide the foregoing functions. In a simple embodiment, a person skilled in the art can conceive that the audio signal processing apparatus may take the form shown in FIG. 8.
FIG. 8 is a schematic structural diagram of an exemplary electronic device 800 according to an embodiment of this application. The electronic device 800 may be the first apparatus or the second apparatus in the foregoing embodiments, and is configured to perform the audio signal processing method in the foregoing embodiments. As shown in FIG. 8, the electronic device 800 may include at least one processor 801, a communication line 802, and a memory 803.
The processor 801 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits.
The communication line 802 may include a path for transferring information between the foregoing components; the communication line may be, for example, a bus.
The memory 803 may be a read-only memory (ROM) or another type of static storage device capable of storing static information and instructions; a random access memory (RAM) or another type of dynamic storage device capable of storing information and instructions; an electrically erasable programmable read-only memory (EEPROM); a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, and the like); a magnetic disk storage medium or another magnetic storage device; or any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory may exist independently and be connected to the processor through the communication line 802, or it may be integrated with the processor. The memory provided in the embodiments of this application is generally a non-volatile memory. The memory 803 is configured to store the computer program instructions for executing the solutions of the embodiments of this application, and their execution is controlled by the processor 801. The processor 801 is configured to execute the computer program instructions stored in the memory 803, thereby implementing the method provided in the embodiments of this application.
Optionally, the computer program instructions in the embodiments of this application may also be referred to as application program code, which is not specifically limited in the embodiments of this application.
In a specific implementation, as an embodiment, the processor 801 may include one or more CPUs, such as CPU0 and CPU1 in FIG. 8.
In a specific implementation, as an embodiment, the electronic device 800 may include a plurality of processors, such as the processor 801 and the processor 807 in FIG. 8. Each of these processors may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor. A processor here may refer to one or more devices, circuits, and/or processing cores for processing data (for example, computer program instructions).
In a specific implementation, as an embodiment, the electronic device 800 may further include a communication interface 804. The electronic device may send and receive data through the communication interface 804, or communicate with other devices or a communication network. The communication interface 804 may be, for example, an Ethernet interface, a radio access network (RAN) interface, a wireless local area network (WLAN) interface, or a USB interface.
In a specific implementation, as an embodiment, the electronic device 800 may further include an output device 805 and an input device 806. The output device 805 communicates with the processor 801 and can display information in a variety of ways. For example, the output device 805 may be a liquid crystal display (LCD), a light-emitting diode (LED) display device, a cathode ray tube (CRT) display device, or a projector. The input device 806 communicates with the processor 801 and can receive user input in a variety of ways. For example, the input device 806 may be a mouse, a keyboard, a touchscreen device, or a sensor device.
In a specific implementation, the electronic device 800 may be a desktop computer, a portable computer, a network server, a personal digital assistant (PDA), a mobile phone, a tablet computer, a wireless terminal device, an embedded device, a smart camera, or a device with a structure similar to that in FIG. 8. The embodiments of this application do not limit the type of the electronic device 800; if it is used to implement the method of the second apparatus in the foregoing embodiments, the electronic device 800 needs to be equipped with a smart camera.
In some embodiments, the processor 801 in FIG. 8 may invoke the computer program instructions stored in the memory 803 so that the electronic device 800 performs the method in the foregoing method embodiments.
For example, the functions and implementation processes of the processing modules in FIG. 6 or FIG. 7 may be implemented by the processor 801 in FIG. 8 invoking the computer program instructions stored in the memory 803. For example, the functions and implementation processes of the preprocessing module 601 and the encoding module 602 in FIG. 6 may be implemented by the processor 801 in FIG. 8 invoking the computer-executable instructions stored in the memory 803. The functions and implementation processes of the receiving module 701 and the decoding module 702 in FIG. 7 may likewise be implemented by the processor 801 in FIG. 8 invoking the computer-executable instructions stored in the memory 803.
In an exemplary embodiment, a computer-readable storage medium including instructions is further provided. The instructions can be executed by the processor 801 of the electronic device 800 to carry out the audio signal processing method of the foregoing embodiments. For the technical effects that can be obtained, reference may therefore be made to the foregoing method embodiments; details are not repeated here.
The foregoing embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented using a software program, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of this application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus.
Other embodiments of this application will readily occur to a person skilled in the art after considering the specification and practicing the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of this application that follow its general principles and include common general knowledge or customary technical means in the art that are not disclosed in this application.
Finally, it should be noted that the foregoing descriptions are merely specific implementations of this application, and the protection scope of this application is not limited thereto. Any variation or replacement within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (38)

  1. An audio signal processing method, characterized in that the method comprises:
    performing, by a first apparatus, sampling and quantization on an acquired first audio signal to obtain a second audio signal;
    encoding the second audio signal in a first encoding mode in units of a first duration to obtain basic frames;
    encoding the second audio signal in a second encoding mode in units of a second duration to obtain extended frames, wherein the second duration is greater than the first duration, and the first encoding mode and the second encoding mode respectively encode different signals carried in the second audio signal, and/or respectively encode the second audio signal to different encoding degrees; and
    sending the basic frames and the extended frames to a second apparatus.
  2. The method according to claim 1, characterized in that the second duration is N times the first duration, and N is a natural number greater than or equal to 2.
  3. The method according to claim 1 or 2, characterized in that the encoding the second audio signal in a first encoding mode in units of a first duration to obtain basic frames specifically comprises:
    down-sampling the second audio signal to obtain a low-frequency signal carried in the second audio signal; and
    encoding the low-frequency signal in a time-domain encoding mode to obtain a plurality of the basic frames whose frame length is the first duration.
  4. The method according to claim 3, characterized in that the encoding the second audio signal in a second encoding mode in units of a second duration to obtain extended frames specifically comprises:
    performing a frequency-domain transform on the second audio signal to obtain frequency-domain coefficients corresponding to the second audio signal;
    dividing a plurality of frequency-domain coefficients in the high-frequency part of the frequency-domain coefficients corresponding to the second audio signal evenly into groups in order from low frequency to high frequency, to obtain group envelope values of a plurality of high-frequency groups, wherein each group envelope value is the average of the plurality of high-frequency frequency-domain coefficients in the group; and
    encoding the group envelope values to obtain a plurality of the extended frames whose frame length is the second duration.
  5. The method according to claim 1 or 2, characterized in that the encoding the second audio signal in a first encoding mode in units of a first duration to obtain basic frames specifically comprises:
    performing a frequency-domain transform on the second audio signal to obtain a plurality of frequency-domain coefficients of a low-frequency signal and a plurality of frequency-domain coefficients of a high-frequency signal corresponding to the second audio signal;
    dividing the plurality of frequency-domain coefficients of the high-frequency signal evenly into groups in order from low frequency to high frequency, to obtain group envelope values of a plurality of high-frequency groups, wherein each group envelope value is the average of the plurality of high-frequency frequency-domain coefficients in the group; and
    encoding the plurality of frequency-domain coefficients of the low-frequency signal and the group envelope values of the high-frequency signal to obtain a plurality of the basic frames whose frame length is the first duration.
  6. The method according to claim 4 or 5, characterized in that the encoding the second audio signal in a second encoding mode in units of a second duration to obtain extended frames specifically comprises:
    encoding, in units of the second duration, the differences between the plurality of frequency-domain coefficients of the high-frequency signal and their corresponding group envelope values, to obtain a plurality of the extended frames whose frame length is the second duration.
  7. The method according to claim 3, characterized in that the encoding the second audio signal in a second encoding mode in units of a second duration to obtain extended frames further specifically comprises:
    performing a frequency-domain transform on the second audio signal to obtain a plurality of frequency-domain coefficients of a low-frequency signal and a plurality of frequency-domain coefficients of a high-frequency signal corresponding to the second audio signal;
    dividing the plurality of frequency-domain coefficients of the low-frequency signal and the plurality of frequency-domain coefficients of the high-frequency signal evenly into groups, respectively, in order from low frequency to high frequency, to obtain corresponding group envelope values, wherein each group envelope value is the average of the plurality of frequency-domain coefficients in the group; and
    encoding the group envelope values to obtain a plurality of the extended frames whose frame length is the second duration.
  8. The method according to any one of claims 4 to 7, characterized in that the performing a frequency-domain transform on the second audio signal specifically comprises:
    obtaining, according to the modified discrete cosine transform (MDCT) algorithm, MDCT frequency-domain component coefficients corresponding to the second audio signal.
  9. An audio signal processing method, characterized in that the method comprises:
    receiving, by a second apparatus, basic frames and extended frames sent by a first apparatus, wherein the frame length of the extended frames is greater than the frame length of the basic frames;
    decoding the basic frames to obtain a basic audio signal;
    or,
    jointly decoding the basic frames and the extended frames to obtain an extended audio signal.
  10. The method according to claim 9, characterized in that the decoding the basic frames to obtain a basic audio signal specifically comprises:
    decoding the basic frames in a time-domain encoding/decoding mode to obtain the basic audio signal.
  11. The method according to claim 9 or 10, characterized in that the jointly decoding the basic frames and the extended frames to obtain an extended audio signal specifically comprises:
    if the extended frame includes a plurality of group envelope values of a high-frequency signal, obtaining a plurality of frequency-domain coefficients of the high-frequency signal from the group envelope values, wherein each frequency-domain coefficient of the high-frequency signal takes the group envelope value of the group it belongs to;
    up-sampling the basic audio signal to obtain a third audio signal;
    performing a frequency-domain transform on the third audio signal frame by frame to obtain a plurality of frequency-domain coefficients of a low-frequency signal corresponding to the third audio signal; and
    performing an inverse frequency-domain transform on the plurality of frequency-domain coefficients of the high-frequency signal and the plurality of frequency-domain coefficients of the low-frequency signal to obtain the extended audio signal.
  12. The method according to claim 9, characterized in that the decoding the basic frames to obtain a basic audio signal specifically comprises:
    if the basic frame includes a plurality of frequency-domain coefficients of a low-frequency signal and a plurality of group envelope values of a high-frequency signal, obtaining the plurality of frequency-domain coefficients of the low-frequency signal and a plurality of frequency-domain coefficients of the high-frequency signal from the basic frame, wherein each frequency-domain coefficient of the high-frequency signal takes the group envelope value of the group it belongs to; and
    performing an inverse frequency-domain transform on the plurality of frequency-domain coefficients of the low-frequency signal and the plurality of frequency-domain coefficients of the high-frequency signal to obtain the basic audio signal.
  13. The method according to claim 11 or 12, characterized in that the jointly decoding the basic frames and the extended frames to obtain an extended audio signal specifically comprises:
    if the extended frame includes the differences between a plurality of frequency-domain coefficients of the high-frequency signal and their corresponding group envelope values, obtaining the plurality of frequency-domain coefficients of the high-frequency signal from the plurality of group envelope values of the high-frequency signal together with those differences; and
    performing an inverse frequency-domain transform on the plurality of frequency-domain coefficients of the low-frequency signal and the plurality of frequency-domain coefficients of the high-frequency signal to obtain the extended audio signal.
  14. The method according to claim 10, characterized in that the jointly decoding the basic frames and the extended frames to obtain an extended audio signal specifically comprises:
    if the extended frame includes a plurality of group envelope values of a low-frequency signal and a plurality of group envelope values of a high-frequency signal, obtaining a plurality of frequency-domain coefficients of the low-frequency signal from the plurality of group envelope values of the low-frequency signal, and obtaining a plurality of frequency-domain coefficients of the high-frequency signal from the plurality of group envelope values of the high-frequency signal;
    wherein the plurality of frequency-domain coefficients of the low-frequency signal are determined by performing a frequency-domain transform on the basic audio signal obtained from the basic frames, or are determined from the plurality of group envelope values of the low-frequency signal in the extended frame, in which case each frequency-domain coefficient of the low-frequency signal takes the group envelope value of the group it belongs to; and
    performing an inverse frequency-domain transform on the plurality of frequency-domain coefficients of the low-frequency signal and the plurality of frequency-domain coefficients of the high-frequency signal to obtain the extended audio signal.
  15. The method according to any one of claims 11 to 14, characterized in that the performing an inverse frequency-domain transform on the frequency-domain coefficients specifically comprises:
    obtaining, according to the inverse modified discrete cosine transform (IMDCT) algorithm, the audio analog signal corresponding to the frequency-domain coefficients.
  16. The method according to any one of claims 11 to 15, characterized in that the group envelope values comprise the average of the plurality of frequency-domain coefficients in each group, obtained after a plurality of frequency-domain coefficients are divided evenly into groups in order from low frequency to high frequency.
  17. An audio signal processing apparatus, characterized in that the apparatus comprises:
    a preprocessing module, configured to perform sampling and quantization on an acquired first audio signal to obtain a second audio signal;
    an encoding module, configured to encode the second audio signal in a first encoding mode in units of a first duration to obtain basic frames, and to encode the second audio signal in a second encoding mode in units of a second duration to obtain extended frames, wherein the second duration is greater than the first duration, and the first encoding mode and the second encoding mode respectively encode different signals carried in the second audio signal, and/or respectively encode the second audio signal to different encoding degrees; and
    a sending module, configured to send the basic frames and the extended frames to a second apparatus.
  18. The apparatus according to claim 17, characterized in that the second duration is N times the first duration, and N is a natural number greater than or equal to 2.
  19. The apparatus according to claim 18, characterized in that the encoding module is specifically configured to:
    down-sample the second audio signal to obtain a low-frequency signal carried in the second audio signal; and
    encode the low-frequency signal in a time-domain encoding mode to obtain a plurality of the basic frames whose frame length is the first duration.
  20. The apparatus according to claim 19, characterized in that the encoding module is specifically configured to:
    perform a frequency-domain transform on the second audio signal to obtain frequency-domain coefficients corresponding to the second audio signal;
    divide a plurality of frequency-domain coefficients in the high-frequency part of the frequency-domain coefficients corresponding to the second audio signal evenly into groups in order from low frequency to high frequency, to obtain group envelope values of a plurality of high-frequency groups, wherein each group envelope value is the average of the plurality of high-frequency frequency-domain coefficients in the group; and
    encode the group envelope values to obtain a plurality of the extended frames whose frame length is the second duration.
  21. The apparatus according to claim 18, characterized in that the encoding module is specifically configured to:
    perform a frequency-domain transform on the second audio signal to obtain a plurality of frequency-domain coefficients of a low-frequency signal and a plurality of frequency-domain coefficients of a high-frequency signal corresponding to the second audio signal;
    divide the plurality of frequency-domain coefficients of the high-frequency signal evenly into groups in order from low frequency to high frequency, to obtain group envelope values of a plurality of high-frequency groups, wherein each group envelope value is the average of the plurality of high-frequency frequency-domain coefficients in the group; and
    encode the plurality of frequency-domain coefficients of the low-frequency signal and the group envelope values of the high-frequency signal to obtain a plurality of the basic frames whose frame length is the first duration.
  22. The apparatus according to claim 20 or 21, wherein the encoding module is specifically configured to:
    in units of the second duration, encode the differences between the plurality of frequency-domain coefficients of the high-frequency signal and the corresponding group envelope values, to obtain a plurality of the extended frames, each having the second duration as its frame length.
  23. The apparatus according to claim 19, wherein the encoding module is specifically configured to:
    perform a frequency-domain transform on the second audio signal to obtain a plurality of frequency-domain coefficients of a low-frequency signal and a plurality of frequency-domain coefficients of a high-frequency signal corresponding to the second audio signal;
    divide the plurality of frequency-domain coefficients of the low-frequency signal and the plurality of frequency-domain coefficients of the high-frequency signal, respectively, into equal groups in order from low frequency to high frequency, to obtain corresponding group envelope values, wherein each group envelope value is the average of the plurality of frequency-domain coefficients in the corresponding group; and
    encode according to the group envelope values to obtain a plurality of the extended frames, each having the second duration as its frame length.
  24. The apparatus according to any one of claims 20 to 23, wherein the frequency-domain transform specifically comprises a modified discrete cosine transform (MDCT) algorithm.
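For illustration only (this is not part of the claims), the group envelope computation recited in claims 20 to 23 and the difference values of claim 22 can be sketched as follows. This is a minimal sketch that assumes the frequency-domain coefficients are already available as a list ordered from low to high frequency; the function names `group_envelopes` and `envelope_residuals`, the parameter `group_size`, and the sample values are hypothetical and do not appear in the application. The transform itself (for example MDCT) and the bitstream encoding are not shown.

```python
def group_envelopes(coeffs, group_size):
    """Split coefficients (ordered low to high frequency) into equal-sized
    groups and return the average of each group as its envelope value."""
    if group_size <= 0 or len(coeffs) % group_size != 0:
        raise ValueError("coefficient count must be a positive multiple of group_size")
    return [
        sum(coeffs[i:i + group_size]) / group_size
        for i in range(0, len(coeffs), group_size)
    ]


def envelope_residuals(coeffs, envelopes, group_size):
    """Return the difference between each coefficient and the envelope
    value of the group it belongs to (the data an extended frame may carry)."""
    return [c - envelopes[i // group_size] for i, c in enumerate(coeffs)]


# Hypothetical high-frequency coefficients, grouped in pairs.
high_band = [4.0, 2.0, 1.0, 3.0, 0.5, 1.5]
envelopes = group_envelopes(high_band, group_size=2)
residuals = envelope_residuals(high_band, envelopes, group_size=2)
print(envelopes)  # [3.0, 2.0, 1.0]
print(residuals)  # [1.0, -1.0, -1.0, 1.0, -0.5, 0.5]
```

Under these assumptions, the envelope values are a coarse description of the high band, while the residuals carry the detail that is lost by averaging.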
  25. An audio signal processing apparatus, wherein the apparatus comprises:
    a receiving module, configured to receive a basic frame and an extended frame sent by a first apparatus, wherein the frame length of the extended frame is greater than the frame length of the basic frame, and the extended frame is obtained by re-encoding the audio signal corresponding to a plurality of basic frames; and
    a decoding module, configured to decode the basic frame to obtain a basic audio signal, or to jointly decode the basic frame and the extended frame to obtain an extended audio signal.
  26. The apparatus according to claim 25, wherein the decoding module is specifically configured to:
    decode the basic frame using a time-domain encoding and decoding mode to obtain the basic audio signal.
  27. The apparatus according to claim 25 or 26, wherein the decoding module is specifically configured to:
    if the extended frame includes a plurality of group envelope values of a high-frequency signal, obtain a plurality of frequency-domain coefficients of the high-frequency signal according to the plurality of group envelope values, wherein each frequency-domain coefficient of the high-frequency signal is the group envelope value corresponding to that frequency-domain coefficient;
    up-sample the basic audio signal to obtain a third audio signal;
    perform a frequency-domain transform on the third audio signal frame by frame to obtain a plurality of frequency-domain coefficients of a low-frequency signal corresponding to the third audio signal; and
    perform an inverse frequency-domain transform according to the plurality of frequency-domain coefficients of the high-frequency signal and the plurality of frequency-domain coefficients of the low-frequency signal to obtain the extended audio signal.
  28. The apparatus according to claim 25, wherein the decoding module is specifically configured to:
    if the basic frame includes a plurality of frequency-domain coefficients of a low-frequency signal and a plurality of group envelope values of a high-frequency signal, obtain the plurality of frequency-domain coefficients of the low-frequency signal and a plurality of frequency-domain coefficients of the high-frequency signal according to the basic frame, wherein each frequency-domain coefficient of the high-frequency signal is the group envelope value corresponding to that frequency-domain coefficient; and
    perform an inverse frequency-domain transform according to the plurality of frequency-domain coefficients of the low-frequency signal and the plurality of frequency-domain coefficients of the high-frequency signal to obtain the basic audio signal.
  29. The apparatus according to claim 27 or 28, wherein the decoding module is specifically configured to:
    if the extended frame includes differences between a plurality of frequency-domain coefficients of the high-frequency signal and the corresponding group envelope values, obtain the plurality of frequency-domain coefficients of the high-frequency signal according to the plurality of group envelope values of the high-frequency signal and the differences between the plurality of frequency-domain coefficients of the high-frequency signal and the corresponding group envelope values; and
    perform an inverse frequency-domain transform according to the plurality of frequency-domain coefficients of the low-frequency signal and the plurality of frequency-domain coefficients of the high-frequency signal to obtain the extended audio signal.
  30. The apparatus according to claim 26, wherein the decoding module is specifically configured to:
    if the extended frame includes a plurality of group envelope values of a low-frequency signal and a plurality of group envelope values of a high-frequency signal, obtain a plurality of frequency-domain coefficients of the low-frequency signal according to the plurality of group envelope values of the low-frequency signal, and obtain a plurality of frequency-domain coefficients of the high-frequency signal according to the plurality of group envelope values of the high-frequency signal;
    wherein the plurality of frequency-domain coefficients of the low-frequency signal are determined by performing a frequency-domain transform on the basic audio signal obtained from the basic frame, or are determined according to the plurality of group envelope values of the low-frequency signal in the extended frame, in which case each frequency-domain coefficient of the low-frequency signal is the group envelope value corresponding to that frequency-domain coefficient; and
    perform an inverse frequency-domain transform according to the plurality of frequency-domain coefficients of the low-frequency signal and the plurality of frequency-domain coefficients of the high-frequency signal to obtain the extended audio signal.
  31. The apparatus according to any one of claims 27 to 30, wherein the inverse frequency-domain transform specifically comprises an inverse modified discrete cosine transform algorithm.
  32. The apparatus according to any one of claims 27 to 31, wherein the group envelope values comprise the average of the plurality of frequency-domain coefficients in each group, the groups being obtained by dividing a plurality of frequency-domain coefficients into equal groups in order from low frequency to high frequency.
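For illustration only (this is not part of the claims), the decoder-side reconstruction of high-frequency coefficients described in claims 27 and 29 can be sketched as follows. This is a minimal sketch assuming equal-sized groups; the names `expand_envelopes` and `reconstruct_high_band`, the parameter `group_size`, and the sample values are hypothetical. Up-sampling, the forward transform of the low band, and the inverse transform are not shown.

```python
def expand_envelopes(envelopes, group_size):
    """Claim 27 style: every coefficient in a group takes the value of
    that group's envelope."""
    return [env for env in envelopes for _ in range(group_size)]


def reconstruct_high_band(envelopes, group_size, residuals=None):
    """Rebuild high-frequency coefficients from group envelope values.

    Without residuals the result is the coarse envelope approximation.
    With residuals (claim 29 style) each difference value is added back
    to the envelope value of its group.
    """
    approx = expand_envelopes(envelopes, group_size)
    if residuals is None:
        return approx
    if len(residuals) != len(approx):
        raise ValueError("residual count must match the number of coefficients")
    return [a + r for a, r in zip(approx, residuals)]


# Hypothetical decoded values: three groups of two coefficients each.
coarse = reconstruct_high_band([3.0, 2.0, 1.0], group_size=2)
refined = reconstruct_high_band(
    [3.0, 2.0, 1.0], group_size=2,
    residuals=[1.0, -1.0, -1.0, 1.0, -0.5, 0.5],
)
print(coarse)   # [3.0, 3.0, 2.0, 2.0, 1.0, 1.0]
print(refined)  # [4.0, 2.0, 1.0, 3.0, 0.5, 1.5]
```

Under these assumptions, a receiver that only has the envelope values can still fill the high band, and the difference values refine that result when they are present.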
  33. An electronic device, wherein the electronic device comprises:
    a processor and a transmission interface; and
    a memory configured to store instructions executable by the processor;
    wherein the processor is configured to execute the instructions, so that the electronic device implements the audio signal processing method according to any one of claims 1 to 8.
  34. An electronic device, wherein the electronic device comprises:
    a processor and a transmission interface; and
    a memory configured to store instructions executable by the processor;
    wherein the processor is configured to execute the instructions, so that the electronic device implements the audio signal processing method according to any one of claims 9 to 16.
  35. A computer-readable storage medium, wherein when instructions in the computer-readable storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform the audio signal processing method according to any one of claims 1 to 8.
  36. A computer program product, wherein when the computer program product runs on a computer, the computer is caused to perform the audio signal processing method according to any one of claims 1 to 8.
  37. A computer-readable storage medium, wherein when instructions in the computer-readable storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform the audio signal processing method according to any one of claims 9 to 16.
  38. A computer program product, wherein when the computer program product runs on a computer, the computer is caused to perform the audio signal processing method according to any one of claims 9 to 16.
PCT/CN2020/098183 2020-06-24 2020-06-24 Audio signal processing method and apparatus WO2021258350A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2020/098183 WO2021258350A1 (en) 2020-06-24 2020-06-24 Audio signal processing method and apparatus
CN202080092744.4A CN114945981A (en) 2020-06-24 2020-06-24 Audio signal processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/098183 WO2021258350A1 (en) 2020-06-24 2020-06-24 Audio signal processing method and apparatus

Publications (1)

Publication Number Publication Date
WO2021258350A1 true WO2021258350A1 (en) 2021-12-30

Family

ID=79282732

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/098183 WO2021258350A1 (en) 2020-06-24 2020-06-24 Audio signal processing method and apparatus

Country Status (2)

Country Link
CN (1) CN114945981A (en)
WO (1) WO2021258350A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101425294A (en) * 2002-09-06 2009-05-06 松下电器产业株式会社 Sound encoding apparatus and sound encoding method
CN103035248A (en) * 2011-10-08 2013-04-10 华为技术有限公司 Encoding method and device for audio signals

Also Published As

Publication number Publication date
CN114945981A (en) 2022-08-26

Similar Documents

Publication Publication Date Title
US8442838B2 (en) Bitrate constrained variable bitrate audio encoding
RU2439718C1 (en) Method and device for sound signal processing
US10089997B2 (en) Method for predicting high frequency band signal, encoding device, and decoding device
US11289102B2 (en) Encoding method and apparatus
WO2019233362A1 (en) Deep learning-based speech quality enhancing method, device, and system
US20220180881A1 (en) Speech signal encoding and decoding methods and apparatuses, electronic device, and storage medium
WO2019233364A1 (en) Deep learning-based audio quality enhancement
EP2863388B1 (en) Bit allocation method and device for audio signal
US20100324914A1 (en) Adaptive Encoding of a Digital Signal with One or More Missing Values
JP2019529979A (en) Quantizer with index coding and bit scheduling
CN111768790B (en) Method and device for transmitting voice data
WO2021213128A1 (en) Audio signal encoding method and apparatus
WO2015151451A1 (en) Encoder, decoder, encoding method, decoding method, and program
WO2015165264A1 (en) Signal processing method and device
WO2021258350A1 (en) Audio signal processing method and apparatus
UA114233C2 (en) Systems and methods for determining an interpolation factor set
CN103503065B (en) For method and the demoder of the signal area of the low accuracy reconstruct that decays
CN113096670A (en) Audio data processing method, device, equipment and storage medium
WO2022237851A1 (en) Audio encoding method and apparatus, and audio decoding method and apparatus
WO2022258036A1 (en) Encoding method and apparatus, decoding method and apparatus, and device, storage medium and computer program
EP4354430A1 (en) Three-dimensional audio signal processing method and apparatus
WO2022242534A1 (en) Encoding method and apparatus, decoding method and apparatus, device, storage medium and computer program
WO2022252957A1 (en) Audio data encoding method and related apparatus, audio data decoding method and related apparatus, and computer-readable storage medium
WO2022267754A1 (en) Speech coding method and apparatus, speech decoding method and apparatus, computer device, and storage medium
CN111710342B (en) Encoding device, decoding device, encoding method, decoding method, and program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20942229; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20942229; Country of ref document: EP; Kind code of ref document: A1)