WO2021258350A1 - Audio signal processing method and apparatus - Google Patents

Audio signal processing method and apparatus

Info

Publication number
WO2021258350A1
Authority
WO
WIPO (PCT)
Prior art keywords
frequency
frequency domain
audio signal
signal
domain coefficients
Prior art date
Application number
PCT/CN2020/098183
Other languages
English (en)
Chinese (zh)
Inventor
张立斌
袁庭球
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Priority to PCT/CN2020/098183 priority Critical patent/WO2021258350A1/fr
Priority to CN202080092744.4A priority patent/CN114945981A/zh
Publication of WO2021258350A1 publication Critical patent/WO2021258350A1/fr


Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis

Definitions

  • This application relates to the field of multimedia processing technology, and in particular to an audio signal processing method and device.
  • the electronic device as the transmitting end can sample, quantize, and encode the collected audio signal and then compress and transmit it to the electronic device at the receiving end.
  • multiple applications on the electronic device as the receiving end may have different delay requirements and quality requirements for the audio signal, and they require the electronic device at the transmitting end to compress and encode the audio signal differently.
  • FIG. 1 shows a possible application scenario.
  • the mobile phone sends the collected audio signal to a smart headset.
  • audio application 1 is a voice enhancement application, which has high requirements on audio signal delay but only general requirements on audio signal transmission quality;
  • the audio application 2 is a three-dimensional sound field collection application, which has high requirements for the transmission quality of the received audio signals, but the audio signal delay requirements are not high.
  • the mobile phone needs to perform different compression and encoding processing on the same audio signal, and transmit multiple audio signals to the smart earphone.
  • the transmission delay and quality of the different audio signals are different, but their content is the same audio signal collected by the mobile phone. Therefore, this causes repeated transmission of audio signals, leading to occupation and waste of bandwidth resources.
  • the present application provides an audio signal processing method and device, which solve the problems of repeated transmission and waste of bandwidth resources caused by different audio applications having different audio signal compression and coding requirements when audio signals are transmitted between multiple electronic devices in the prior art.
  • an audio signal processing method includes: a first device performs sampling and quantization on an acquired first audio signal to obtain a second audio signal; encodes the second audio signal with a first encoding method, in units of a first duration, to obtain basic frames; encodes the second audio signal with a second encoding method, in units of a second duration, to obtain extended frames, where the second duration is greater than the first duration, and the first encoding method and the second encoding method encode different signals carried in the second audio signal and/or encode the second audio signal with different encoding degrees; and sends the basic frames and the extended frames to a second device.
  • the audio signal sending end can encode and compress the same audio signal to obtain two kinds of encoded frames with different frame lengths, a basic frame and an extended frame, where the extended frame can encode the part of the second audio signal that the basic frame does not cover or covers with less precision.
  • the receiving end can decode the basic frame to obtain an audio signal, and jointly decode the basic frame and the extended frame to obtain another audio signal.
  • the two restored audio signals have different delays and different audio quality, which can meet the needs of different audio applications, avoid the problems of repeated transmission and waste of bandwidth resources that arise when the same audio signal is encoded multiple times on the encoding side, and reduce system overhead.
  • the second duration is N times the first duration, and N is a natural number greater than or equal to 2.
  • the time interval between basic frames is the first duration, and the time interval between extended frames is N times the first duration; that is, one extended frame is encoded for every N basic frames. Therefore, the encoding side obtains encoded frames with different delays, and the decoding side uses them to recover audio signals with different delays, meeting the needs of different audio applications, improving coding efficiency, solving the problem of bandwidth resource waste, and reducing system overhead.
  • the second audio signal is encoded with the first encoding method, in units of the first duration, to obtain the basic frames, which specifically includes: down-sampling the second audio signal to obtain the low-frequency signal carried in the second audio signal; and encoding the low-frequency signal in a time-domain coding manner to obtain multiple basic frames whose frame length is the first duration.
  • the encoding side may encode the low-frequency signal included in the second audio signal in a time-domain encoding manner to obtain the basic frames. Since time-domain encoding can encode the audio signal into a digital signal with a lower delay, it is suitable for producing basic frames with low delay that include only the low-frequency part of the original audio signal, so that the decoding side can recover from the basic frames an audio signal with strong real-time performance and general audio quality for use in the corresponding audio application.
  • the second audio signal is encoded with the second encoding method, in units of the second duration, to obtain the extended frames, which specifically includes: performing frequency-domain transformation on the second audio signal to obtain the frequency-domain coefficients corresponding to the second audio signal; grouping the high-frequency part of those frequency-domain coefficients evenly, in order from low frequency to high frequency, to obtain the group envelope values of multiple high-frequency groups, where a group envelope value is the average of the multiple high-frequency frequency-domain coefficients in each group; and encoding according to the group envelope values to obtain multiple extended frames whose frame length is the second duration.
  • the encoding side may also encode the high-frequency signal included in the second audio signal in a frequency-domain encoding manner to obtain the extended frames, so that the high-frequency part of the signal, which is not coded in the basic frames, is coded in the extended frames. The decoding side can therefore jointly decode the extended frames with the basic frames to obtain an audio signal with lower real-time performance but including both the low-frequency and high-frequency parts of the original audio signal, and thus with better audio quality, for use in the corresponding audio application.
  • the foregoing embodiments can meet the requirements of multiple audio applications through basic-frame encoding and extended-frame encoding, improve coding efficiency, and solve the problem of bandwidth resource waste.
  • the basic frames are obtained by encoding the second audio signal with the first encoding method in units of the first duration, which specifically includes: performing frequency-domain transformation on the second audio signal to obtain multiple frequency-domain coefficients of the low-frequency signal and multiple frequency-domain coefficients of the high-frequency signal corresponding to the second audio signal; grouping the multiple frequency-domain coefficients of the high-frequency signal evenly, in order from low frequency to high frequency, to obtain group envelope values of multiple high-frequency groups, where a group envelope value is the average of the multiple high-frequency frequency-domain coefficients in each group; and encoding according to the multiple frequency-domain coefficients of the low-frequency signal and the group envelope values of the high-frequency signal to obtain multiple basic frames whose frame length is the first duration.
  • the encoding side may encode the low-frequency signal and the high-frequency signal included in the second audio signal in a frequency-domain encoding manner, wherein the multiple frequency-domain coefficients of the low-frequency signal are encoded, while for the high-frequency signal only its group envelope values are encoded, to obtain the basic frames.
  • the basic frame coding method is to perform high-quality coding on the low-frequency part, and perform lower-quality coding on the high-frequency part.
  • the decoding side can recover, from the basic frames, an audio signal with strong real-time performance and general audio quality, which can be applied to the corresponding audio application.
  • the second audio signal is encoded with the second encoding method, in units of the second duration, to obtain the extended frames, which specifically includes: encoding, in units of the second duration, the differences between the multiple frequency-domain coefficients of the high-frequency signal and the corresponding group envelope values to obtain multiple extended frames whose frame length is the second duration.
  • the encoding side can, on the basis of the basic frames, further encode the high-frequency part of the signal that was encoded with lower quality in the basic frames, that is, encode the differences between the multiple frequency-domain coefficients of the high-frequency signal and the corresponding group envelope values. The extended encoding thus performs further high-quality encoding of the high-frequency part, so the decoding side can jointly decode the above basic frames and extended frames to obtain an audio signal with general real-time performance but high audio quality, which can be applied to the corresponding audio application.
  • the foregoing embodiment uses basic-frame encoding and extended-frame encoding to obtain encoded frames with different time delays and different encoding qualities, so that coding efficiency can be improved and system overhead reduced.
  • the encoding side can obtain the basic frames according to the time-domain encoding of method 1 above, obtain first extended frames according to the extended-frame encoding of method 1 above, and then obtain second extended frames according to the extended-frame encoding of method 2 above.
  • in this way, basic frames with strong real-time performance, containing only the low-frequency signal and with lower coding quality, are obtained; first extended frames containing low-frequency and high-frequency signals, in which the high-frequency signal is coded with lower quality, are obtained; and second extended frames that further refine the high-frequency signal are obtained.
  • the levels of encoded frames are thus richer, and the decoding side can jointly decode the above basic frames, first extended frames and second extended frames to recover audio signals of different quality, meeting the needs of different audio applications, improving the flexibility and efficiency of audio coding, and reducing system overhead.
  • the second audio signal is encoded with the second encoding method, in units of the second duration, to obtain the extended frames, which specifically includes: performing frequency-domain transformation on the second audio signal to obtain multiple frequency-domain coefficients of the low-frequency signal and multiple frequency-domain coefficients of the high-frequency signal corresponding to the second audio signal; grouping the multiple frequency-domain coefficients of the low-frequency signal and the multiple frequency-domain coefficients of the high-frequency signal evenly, each in order from low frequency to high frequency, to obtain the corresponding group envelope values, where a group envelope value is the average of the multiple frequency-domain coefficients in each group; and encoding according to the group envelope values to obtain multiple extended frames whose frame length is the second duration.
  • the encoding side may, in a frequency-domain encoding manner, encode the group envelope values of the low-frequency frequency-domain coefficients and the group envelope values of the high-frequency frequency-domain coefficients corresponding to the second audio signal to obtain the extended frames. Therefore, even if a basic frame is lost, the decoding side can still decode the extended frame to recover the audio signal, which improves the reliability of audio coding transmission and improves the user experience.
  • performing frequency-domain transformation on the second audio signal specifically includes: obtaining the MDCT frequency-domain coefficients corresponding to the second audio signal according to a modified discrete cosine transform (MDCT) algorithm.
  • an audio signal processing method includes: a second device receives basic frames and extended frames sent from a first device, where the frame length of the extended frame is greater than the frame length of the basic frame, and the extended frame is obtained by re-encoding the audio signal corresponding to multiple basic frames; the basic frames are decoded to obtain a basic audio signal; or, the basic frames and the extended frames are jointly decoded to obtain an extended audio signal.
  • decoding the basic frame to obtain the basic audio signal specifically includes: decoding the basic frame according to the time-domain codec mode to obtain the basic audio signal.
  • the basic frames and the extended frame are jointly decoded to obtain the extended audio signal, which specifically includes: if the extended frame includes the group envelope values of multiple high-frequency groups, obtaining multiple frequency-domain coefficients of the high-frequency signal according to those group envelope values, where each frequency-domain coefficient of the high-frequency signal is the group envelope value corresponding to that coefficient; up-sampling the basic audio signal to obtain a third audio signal; performing frequency-domain transformation on the third audio signal frame by frame to obtain multiple frequency-domain coefficients of the low-frequency signal corresponding to the third audio signal; and performing frequency-domain inverse transformation according to the multiple frequency-domain coefficients of the high-frequency signal and the multiple frequency-domain coefficients of the low-frequency signal to obtain the extended audio signal.
  • the basic frame is decoded to obtain the basic audio signal, which specifically includes: if the basic frame includes multiple frequency-domain coefficients of the low-frequency signal and multiple group envelope values of the high-frequency signal, obtaining from the basic frame the multiple frequency-domain coefficients of the low-frequency signal and the multiple frequency-domain coefficients of the high-frequency signal, where each frequency-domain coefficient of the high-frequency signal is the group envelope value corresponding to that coefficient; and performing frequency-domain inverse transformation according to the multiple frequency-domain coefficients of the low-frequency signal and the multiple frequency-domain coefficients of the high-frequency signal to obtain the basic audio signal.
  • joint decoding of the basic frame and the extended frame to obtain the extended audio signal specifically includes: if the extended frame includes the differences between the multiple frequency-domain coefficients of the high-frequency signal and the corresponding group envelope values, obtaining the multiple frequency-domain coefficients of the high-frequency signal according to the multiple group envelope values of the high-frequency signal and those differences; and performing frequency-domain inverse transformation according to the multiple frequency-domain coefficients of the low-frequency signal and the multiple frequency-domain coefficients of the high-frequency signal to obtain the extended audio signal.
  • the basic frame and the extended frame are jointly decoded to obtain the extended audio signal, which specifically includes: if the extended frame includes multiple group envelope values of the low-frequency signal and multiple group envelope values of the high-frequency signal, obtaining multiple frequency-domain coefficients of the low-frequency signal according to the multiple group envelope values of the low-frequency signal, and obtaining multiple frequency-domain coefficients of the high-frequency signal according to the multiple group envelope values of the high-frequency signal; where the multiple frequency-domain coefficients of the low-frequency signal are determined by frequency-domain transformation of the basic audio signal obtained from the basic frame, or are determined from the multiple group envelope values of the low-frequency signal in the extended frame, in which case each frequency-domain coefficient is the group envelope value corresponding to that coefficient; and performing frequency-domain inverse transformation according to the multiple frequency-domain coefficients of the low-frequency signal and the multiple frequency-domain coefficients of the high-frequency signal to obtain the extended audio signal.
  • performing the frequency-domain inverse transformation according to the frequency-domain coefficients specifically includes: obtaining the audio signal corresponding to the frequency-domain coefficients according to an inverse modified discrete cosine transform algorithm.
  • the group envelope value is the average of the multiple frequency-domain coefficients in each group, the groups being obtained by evenly grouping the frequency-domain coefficients in order from low frequency to high frequency.
  • an audio signal processing device includes: a preprocessing module, configured to sample and quantize the acquired first audio signal to obtain a second audio signal; an encoding module, configured to encode the second audio signal with a first encoding method, in units of a first duration, to obtain basic frames, and to encode the second audio signal with a second encoding method, in units of a second duration, to obtain extended frames, where the second duration is greater than the first duration, and the first encoding method and the second encoding method encode different signals carried in the second audio signal and/or encode the second audio signal with different encoding degrees; and a sending module, configured to send the basic frames and the extended frames to the second device.
  • the second duration is N times the first duration, and N is a natural number greater than or equal to 2.
  • the encoding module is specifically configured to: down-sample the second audio signal to obtain the low-frequency signal carried in the second audio signal; and encode the low-frequency signal in a time-domain coding manner to obtain multiple basic frames whose frame length is the first duration.
  • the encoding module is specifically configured to: perform frequency-domain transformation on the second audio signal to obtain the frequency-domain coefficients corresponding to the second audio signal; group the multiple frequency-domain coefficients of the high-frequency part evenly, in order from low frequency to high frequency, to obtain the group envelope values of multiple high-frequency groups, where a group envelope value is the average of the multiple high-frequency frequency-domain coefficients in each group; and encode according to the group envelope values to obtain multiple extended frames whose frame length is the second duration.
  • the encoding module is specifically configured to: perform frequency-domain transformation on the second audio signal to obtain multiple frequency-domain coefficients of the low-frequency signal and multiple frequency-domain coefficients of the high-frequency signal corresponding to the second audio signal; group the multiple frequency-domain coefficients of the high-frequency signal evenly, in order from low frequency to high frequency, to obtain the group envelope values of multiple high-frequency groups, where a group envelope value is the average of the multiple high-frequency frequency-domain coefficients in each group; and encode according to the multiple frequency-domain coefficients of the low-frequency signal and the group envelope values of the high-frequency signal to obtain multiple basic frames whose frame length is the first duration.
  • the encoding module is specifically configured to encode, in units of the second duration, the differences between the multiple frequency-domain coefficients of the high-frequency signal and the corresponding group envelope values, to obtain multiple extended frames whose frame length is the second duration.
  • the encoding module is specifically configured to: perform frequency-domain transformation on the second audio signal to obtain multiple frequency-domain coefficients of the low-frequency signal and multiple frequency-domain coefficients of the high-frequency signal corresponding to the second audio signal; group the multiple frequency-domain coefficients of the low-frequency signal and the multiple frequency-domain coefficients of the high-frequency signal evenly, each in order from low frequency to high frequency, to obtain the corresponding group envelope values, where a group envelope value is the average of the multiple frequency-domain coefficients in each group; and encode according to the group envelope values to obtain multiple extended frames whose frame length is the second duration.
  • the frequency-domain transformation specifically includes: a modified discrete cosine transform (MDCT) algorithm.
  • an audio signal processing device includes: a receiving module, configured to receive basic frames and extended frames sent from a first device, where the frame length of the extended frame is greater than the frame length of the basic frame, and the extended frame is obtained by re-encoding the audio signal corresponding to multiple basic frames; and a decoding module, configured to decode the basic frames to obtain a basic audio signal, or to jointly decode the basic frames and the extended frames to obtain an extended audio signal.
  • the decoding module is specifically used to decode the basic frame according to the time-domain coding and decoding manner to obtain the basic audio signal.
  • the decoding module is specifically configured to: if the extended frame includes the group envelope values of multiple high-frequency groups, obtain multiple frequency-domain coefficients of the high-frequency signal according to those group envelope values, where each frequency-domain coefficient of the high-frequency signal is the group envelope value corresponding to that coefficient; up-sample the basic audio signal to obtain a third audio signal; perform frequency-domain transformation on the third audio signal frame by frame to obtain multiple frequency-domain coefficients of the low-frequency signal corresponding to the third audio signal; and perform frequency-domain inverse transformation according to the multiple frequency-domain coefficients of the high-frequency signal and the multiple frequency-domain coefficients of the low-frequency signal to obtain the extended audio signal.
  • the decoding module is specifically configured to: if the basic frame includes multiple frequency-domain coefficients of the low-frequency signal and multiple group envelope values of the high-frequency signal, obtain from the basic frame the multiple frequency-domain coefficients of the low-frequency signal and the multiple frequency-domain coefficients of the high-frequency signal, where each frequency-domain coefficient of the high-frequency signal is the group envelope value corresponding to that coefficient; and perform frequency-domain inverse transformation according to the multiple frequency-domain coefficients of the low-frequency signal and the multiple frequency-domain coefficients of the high-frequency signal to obtain the basic audio signal.
  • the decoding module is specifically configured to: if the extended frame includes the differences between the multiple frequency-domain coefficients of the high-frequency signal and the corresponding group envelope values, obtain the multiple frequency-domain coefficients of the high-frequency signal according to the multiple group envelope values of the high-frequency signal and those differences; and perform frequency-domain inverse transformation according to the multiple frequency-domain coefficients of the low-frequency signal and the multiple frequency-domain coefficients of the high-frequency signal to obtain the extended audio signal.
  • the decoding module is specifically configured to: if the extended frame includes multiple group envelope values of the low-frequency signal and multiple group envelope values of the high-frequency signal, obtain multiple frequency-domain coefficients of the low-frequency signal according to the multiple group envelope values of the low-frequency signal, and obtain multiple frequency-domain coefficients of the high-frequency signal according to the multiple group envelope values of the high-frequency signal; where the multiple frequency-domain coefficients of the low-frequency signal are determined by frequency-domain transformation of the basic audio signal obtained from the basic frame, or are determined from the multiple group envelope values of the low-frequency signal in the extended frame, in which case each frequency-domain coefficient of the low-frequency signal is the group envelope value corresponding to that coefficient; and perform frequency-domain inverse transformation according to the multiple frequency-domain coefficients of the low-frequency signal and the multiple frequency-domain coefficients of the high-frequency signal to obtain the extended audio signal.
  • the frequency-domain inverse transformation specifically includes: an inverse modified discrete cosine transform algorithm.
  • the group envelope value is the average of the multiple frequency-domain coefficients in each group, the groups being obtained by evenly grouping the frequency-domain coefficients in order from low frequency to high frequency.
  • an electronic device comprising: a processor and a transmission interface; and a memory for storing executable instructions of the processor; wherein the processor is configured to execute the instructions so that the electronic device implements the audio signal processing method according to any one of the first aspect and its possible implementations.
  • an electronic device comprising: a processor and a transmission interface; and a memory for storing executable instructions of the processor; wherein the processor is configured to execute the instructions so that the electronic device implements the audio signal processing method according to any one of the second aspect and its possible implementations.
  • a computer-readable storage medium is provided; when instructions in the computer-readable storage medium are executed by a processor of an electronic device, the electronic device can execute the audio signal processing method according to any one of the first aspect and its possible implementations.
  • An eighth aspect provides a computer program product which, when run on a computer, causes the computer to execute the audio signal processing method according to any one of the first aspect and its possible implementations.
  • a computer-readable storage medium is provided; when instructions in the computer-readable storage medium are executed by a processor of an electronic device, the electronic device can execute the audio signal processing method according to any one of the second aspect and its possible implementations.
  • a computer program product is provided; when the computer program product runs on a computer, the computer executes the audio signal processing method according to any one of the second aspect and its possible implementations.
  • any audio signal processing device, electronic device, computer-readable storage medium, and computer program product provided above can be used to execute the corresponding method provided above; therefore, for the beneficial effects they can achieve, refer to the beneficial effects of the corresponding method provided above, which are not repeated here.
  • FIG. 1 is a schematic diagram of an application scenario of an audio signal processing method provided by an embodiment of this application
  • FIG. 2 is a schematic flowchart of an audio signal processing method provided by an embodiment of this application.
  • FIG. 3 is a schematic diagram of the processing process of an audio signal processing method provided by an embodiment of the application.
  • FIG. 4 is a schematic diagram of an audio signal encoding frame provided by an embodiment of the application.
  • FIG. 5 is a schematic flowchart of another audio signal processing method provided by an embodiment of the application.
  • FIG. 6 is a schematic diagram of an audio signal processing device provided by an embodiment of the application.
  • FIG. 7 is a schematic diagram of another audio signal processing device provided by an embodiment of the application.
  • FIG. 8 is a schematic structural diagram of an electronic device provided by an embodiment of the application.
  • the terms "first" and "second" are used for descriptive purposes only, and cannot be understood as indicating or implying relative importance or implicitly indicating the number of indicated technical features. Therefore, features defined with "first" and "second" may explicitly or implicitly include one or more of those features. In the description of the present embodiment, unless otherwise specified, "plurality" means two or more.
  • the embodiments of the present application provide an audio signal processing method and device, which can be applied to the transmission of audio signals between multiple electronic devices; for the different audio signal processing requirements of different applications, audio signal encoding and decoding are performed flexibly based on basic frames and extended frames, so that audio processing with different delay requirements or different quality requirements can be met. This solves the problems of repeated transmission and waste of bandwidth resources caused by different audio applications' requirements on the real-time performance and restoration quality of audio signal transmission when the same channel of audio signal is transmitted between multiple electronic devices in the prior art.
  • the audio signal processing method provided by the embodiments of the present application can be applied to electronic devices with audio signal processing capability; the scenario includes at least two electronic devices, and data can be transmitted between the two electronic devices.
  • the audio signal can be transmitted through a wired network, a wireless local area network, Near Field Communication (NFC), or Bluetooth.
  • the electronic device can be a mobile phone, a smart speaker, a smart headset, a tablet computer, a desktop computer, a laptop, a handheld computer, a notebook computer, a vehicle-mounted device, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), or an augmented reality (AR)/virtual reality (VR) device.
  • the electronic device 1 may be a mobile phone
  • the electronic device 2 may be a smart headset.
  • the embodiment of the application provides an audio signal processing method, which is applied to a first device and a second device. As shown in Figure 2, the method may include:
  • the first device performs sampling and quantization processing on the acquired first audio signal to obtain a second audio signal.
  • the first audio signal may be an audio signal collected by the first device, or an audio signal stored locally by the first device or obtained from another device.
  • the first audio signal needs to be sampled and quantized to obtain a digital signal to save transmission bandwidth.
  • the basic processing procedure can be referred to in FIG. 3: after sampling and quantizing the first audio signal, the second audio signal s(n) is obtained, where n corresponds to different audio sampling points arranged in chronological order. If the audio signal is sampled at a frequency of 16 kHz, which means that 16×10³ sampling points are sampled per second, then the time interval between every two sampling points is 0.0625 ms.
  • the quantized value corresponding to the sampling point of the audio signal is encoded into a binary digital signal, which can be transmitted.
  • different quantization precisions can be used to represent the quantization value of the sampling point, for example, it can be represented by 16 bits, 24 bits, or 32 bits.
  • the first device encodes the second audio signal frame by frame with the first encoding method, in units of the first duration, to obtain basic frames, and encodes the second audio signal frame by frame with the second encoding method, in units of the second duration, to obtain extended frames.
  • the second duration is greater than the first duration, and therefore, the frame length of the extended frame is greater than the frame length of the basic frame.
  • the second audio signal of a fixed duration can be used as an interval, and after each frame of the second audio signal is collected and quantized, the second audio signal of this frame can be compressed and encoded, and then sent after being encoded frame by frame.
  • the second audio signal is encoded according to different time intervals, that is, different frame lengths, to generate two or more encoded frames, including a basic frame and an extended frame.
  • current audio coding technology can only approach the original audio signal infinitely closely; that is, the encoding and decoding rules of digital audio determine that coding and decoding always introduce a certain degree of distortion and cannot completely restore the original audio signal.
  • the encoding method involved in this application is a lossy encoding technology.
  • the basic frame or the extended frame in the embodiment of the present application can only encode a part of the first audio signal, but not all of it.
  • the extended frame may be obtained by re-encoding the second audio signal segments corresponding to multiple basic frames, and the extended frame may further encode audio signals in the basic frame that are not encoded or have insufficient encoding precision.
  • the first encoding method and the second encoding method may respectively encode different signals carried in the second audio signal.
  • the low-frequency signal part carried in the second audio signal is encoded according to the first encoding method to obtain a basic frame
  • the high-frequency signal part carried in the second audio signal is encoded according to the second encoding method to obtain an extended frame.
  • the first encoding method and the second encoding method may also encode the second audio signal at different encoding levels, producing an encoded frame with lower encoding quality and an encoded frame with higher encoding quality, which are then transmitted to the decoding side for decoding. Therefore, the decoding side can recover different audio signals from the basic frame or the extended frame respectively. Compared with the original audio signal, the audio signal recovered from the extended frame combined with the basic frame has less distortion, and so has better encoding quality.
  • the encoding quality of the audio signal refers to the degree of restoration of the decoded audio signal relative to the original audio signal before encoding and compression. That is, the longer the frame length used for encoding the second audio signal, the higher the fidelity and the lower the distortion of the decoded audio signal relative to the original audio signal.
  • the basic frame may be a lower delay and/or lower quality encoding of the current second audio signal
  • the first device may separately transmit the basic frame to the second device frame by frame.
  • the audio signal can be obtained by decoding according to a preset decoding mode, so as to be applied to audio applications that require low delay or have relatively low audio quality requirements.
  • the extended frame may perform higher delay and/or higher quality encoding on the current second audio signal.
  • the frame length of the extended frame is greater than the frame length of the basic frame, and the extended frame encoding transmits enhanced information for multiple basic frame audio signals, and further encodes data that is not included in the basic frame or incompletely encoded in the audio signal.
  • after receiving the extended frame frame by frame, the second device can jointly decode it with the basic frames to obtain an audio signal with higher audio quality, which can be applied to audio applications that do not require high real-time performance but require relatively high audio quality.
  • the first device may encode the second audio signal in a unit of a first duration to obtain a basic frame; the first device encodes the second audio signal in a unit of a second duration to obtain an extended frame.
  • the second duration may be N times the first duration, and N is a natural number greater than or equal to 2.
  • the first duration is the frame length of the basic frame, that is, the time interval between two basic frames
  • the second duration is the frame length of the extended frame, that is, the time interval between two extended frames.
  • t1, t2, t3, t4, t5, t6, t7, and t8 represent the basic frames of audio coding.
  • the algorithmic delay of the basic frame is about Δt, that is, the time interval between two basic frames is Δt.
  • T1 and T2 represent the extended frames of audio coding.
  • the extended frame compression is performed once every four basic frames as an example.
  • the basic frame or the extended frame contains the digitized audio sample data.
  • the delay Δt may be 0.5 ms or 5 ms, and the delays Δt and ΔT depend on the design of the coding structure and the actual application requirements. For example, when the sampling frequency is 16 kHz and the frame length of the basic frame is 5 ms, the number of audio sampling points contained in each basic frame is 80.
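  • as a rough illustration of these timing relationships (hypothetical values taken from the 16 kHz / 5 ms / N = 4 example above), the following sketch computes the number of samples per basic frame and per extended frame:

      # illustrative sketch only; the parameter values are the examples from the text above
      fs = 16_000          # sampling rate in Hz
      dt_basic = 0.005     # basic-frame length (delta t) in seconds, 5 ms example
      n_ratio = 4          # N: one extended frame per N basic frames (as in FIG. 4)

      samples_per_basic = int(round(fs * dt_basic))         # 80 samples per basic frame
      dt_extended = n_ratio * dt_basic                       # extended-frame length (delta T) = 20 ms
      samples_per_extended = int(round(fs * dt_extended))    # 320 samples per extended frame

      print(samples_per_basic, dt_extended, samples_per_extended)  # 80 0.02 320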
  • S203 The first device sends the basic frame and the extended frame to the second device.
  • the first device may transmit the basic frame to the second device frame by frame after encoding the basic frame, and the first device may transmit the extended frame to the second device frame by frame after encoding the extended frame. Therefore, after receiving the basic frame or the extended frame, the second device decodes the basic frame or the extended frame to recover the audio signal, which is used for different audio applications.
  • the second device receives the digital signal sent from the first device, where the digital signal includes a basic frame or an extended frame, and the second device can decode it according to a preset encoding and decoding method to restore the audio signal.
  • the specific process may include:
  • the second device receives the basic frame and the extended frame sent from the first device, where the frame length of the extended frame is greater than the frame length of the basic frame, and the extended frame is obtained by re-encoding audio signals corresponding to multiple basic frames.
  • S502 The second device decodes the basic frame to obtain the basic audio signal, or jointly decodes the basic frame and the extended frame to obtain the extended audio signal.
  • the second device decodes the received basic frame or extended frame according to the preset codec rules, that is, the second device decodes the digital signal to obtain an analog signal, so as to meet the audio signal requirements of different audio applications on the second device.
  • after receiving a basic frame, the second device decodes the basic frame to obtain the corresponding basic audio signal s_1(n). After receiving an extended frame, the second device performs comprehensive decoding according to the extended frame and the basic frames to obtain the corresponding extended audio signal s_2(n).
  • the audio content of the basic audio signal s_1(n) and the extended audio signal s_2(n) is the same, but their transmission delay and audio quality are different.
  • the audio quality of the basic audio signal s_1(n) is slightly worse than that of the extended audio signal s_2(n), while the transmission delay of the basic audio signal s_1(n) is lower than that of the extended audio signal s_2(n).
  • audio applications with different delay requirements between the encoding side and the decoding side can thus be served by the same set of encoding schemes; that is, the encoding side encodes only one audio signal, but different delay requirements can still be met.
  • the basic frame and the extended frame are respectively encoded, so that the decoding side can decode different audio signals according to the two encoded frames to meet the needs of different audio applications.
  • the audio signal decoded according to the basic frame has a low delay, but the audio signal quality is poor.
  • the audio signal decoded according to the extended frame combined with the basic frame has a longer time delay, but the audio signal quality is better, and the distortion of the original audio signal is small.
  • the decoding side can recover two or more audio signals according to different basic frames and extended frames, while only one audio signal is encoded on the encoding side.
  • this encoding method reduces redundant information, avoids the problems of repeated transmission and waste of bandwidth resources that arise when the encoding side encodes the same audio signal multiple times, and greatly reduces system overhead.
  • the first device may use a time-domain encoding method with a lower delay to obtain the basic frame, that is, only encode the low frequency part of the second audio signal.
  • the first device uses a higher time-delay frequency domain coding method to obtain the extended frame, and the extended frame only includes the high frequency part of the second audio signal.
  • one of the audio applications requires strong real-time performance: the signal transmission delay does not exceed 1 ms, but the audio quality requirement is not high, and the audio signal may contain only low-frequency signals without high-frequency signals.
  • the other is a voice enhancement application: the required audio signal does not need strong real-time performance, and the signal transmission delay only needs to stay within 6 ms, but the required audio quality is relatively high, and both high-frequency and low-frequency signals are needed.
  • the encoding of the basic frame by the first device may specifically include:
  • the first device down-samples the second audio signal to obtain the low-frequency signal included in the second audio signal.
  • down-sampling means taking one sample every several samples of a sample sequence, so as to obtain a new sequence.
  • the bandwidth of the second audio signal obtained by quantization may be half of the sampling rate, that is, the bandwidth may be 8 kHz.
  • the second audio signal includes the frequency band 0–8 kHz, where the low-frequency signal s_L(n) is the 0–4 kHz part and the high-frequency signal s_H(n) is the 4–8 kHz part.
  • the second audio signal is down-sampled by a factor of 2 to obtain the 0–4 kHz low-frequency signal s_L(n) carried in the second audio signal.
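  • a minimal sketch of this down-sampling step (assuming a 16 kHz input and a factor-2 decimation; the anti-aliasing filter here is SciPy's default rather than the filter any particular codec would use):

      import numpy as np
      from scipy import signal

      fs = 16_000                          # input sampling rate in Hz
      t = np.arange(fs) / fs               # one second of samples for illustration
      s = np.sin(2 * np.pi * 440 * t)      # s(n): a stand-in for the second audio signal

      # factor-2 decimation: low-pass filter, then keep every second sample,
      # giving the 0-4 kHz low-frequency signal s_L(n) at an 8 kHz sampling rate
      s_L = signal.decimate(s, 2)

      print(len(s), len(s_L))              # 16000 8000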
  • the time domain coding is to encode the waveform of the audio signal.
  • for time-domain coding, there are coding standards such as International Telecommunication Union (ITU) G.726, G.723.1 or G.728. These coding standards widely use code-excited linear prediction technology, which models the human speech production mechanism and uses the inherent characteristics of the human glottis and vocal tract to remove redundant information in audio signals, so as to maintain high audio quality while greatly reducing the bit rate required for audio coding.
  • the first device may use the G.726 encoding method to encode s_L(n), and assemble basic frames at intervals of the first duration, the frame length of the basic frames being the first duration.
  • the first duration may be 0.5 ms; the s_L(n) signal of each 0.5 ms segment is encoded in turn, and each resulting digital signal is one basic frame.
  • G.726 is a speech coding and decoding algorithm that can encode audio signals into digital signals with lower delay.
  • the encoding of the extended frame by the first device may specifically include:
  • the principle of frequency-domain coding is to encode the audio signal in the frequency domain by exploiting the way the human ear perceives sound: the frequency bands that humans are sensitive to are coded carefully, while frequency bands that are masked by other bands or not easily perceived are coarsely quantized or not quantized at all.
  • the advantage of frequency-domain coding is that, based on the characteristics of the human ear, a certain amount of redundancy is removed; the coding effect is therefore nearly equivalent for various kinds of audio signals, and especially for music and similar signals the coding quality is higher than that of time-domain coding.
  • the modified discrete cosine transform (MDCT) is an algorithm that transforms the signal from the time domain to the frequency domain, and the obtained coefficients represent the frequency-domain components at each frequency point.
  • the MDCT coefficient S(k) is obtained, and S(k) is the frequency domain part of the second audio signal.
  • the second duration is 5 ms, that is, the frame length for encoding the extended frame is 5 ms, and the sampling rate is 16 kHz.
  • the value range of k is 0~79.
  • the MDCT transform is performed on the s(n) signal of each 5 ms segment one by one to obtain the corresponding MDCT coefficients.
  • the value range of k can be 0~79.
  • the frequency-domain coefficient index k starts from 0 and runs from low frequency to high frequency; the low-frequency frequency-domain coefficients from low to high are S(0)~S(39), and the high-frequency frequency-domain coefficients from low to high are S(40)~S(79).
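  • a minimal numerical sketch of this transform step, using a direct (unoptimized) MDCT over a window of 2N = 160 samples to produce the N = 80 coefficients S(0)...S(79) and splitting them into the low-frequency and high-frequency halves; the windowing and overlap details of a real codec are omitted:

      import numpy as np

      def mdct(frame_2n):
          """Direct MDCT of a 2N-sample frame, returning N coefficients (standard form)."""
          n = len(frame_2n) // 2
          ns = np.arange(2 * n)
          ks = np.arange(n)
          # basis[n, k] = cos[(pi/N) * (n + 1/2 + N/2) * (k + 1/2)]
          basis = np.cos(np.pi / n * np.outer(ns + 0.5 + n / 2, ks + 0.5))
          return frame_2n @ basis

      fs = 16_000
      N = 80                                  # 5 ms at 16 kHz -> 80 coefficients per frame
      t = np.arange(2 * N) / fs               # one 2N-sample analysis window
      s = np.sin(2 * np.pi * 1000 * t) + 0.3 * np.sin(2 * np.pi * 6000 * t)

      S = mdct(s)                             # S(k), k = 0..79
      S_low, S_high = S[:40], S[40:]          # low-frequency and high-frequency halves
      print(S_low.shape, S_high.shape)        # (40,) (40,)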
  • the multiple frequency-domain coefficients in the high-frequency part of the frequency-domain coefficients corresponding to the second audio signal are grouped evenly, in order from low frequency to high frequency, to obtain the group envelope values of multiple high-frequency groups, which are then encoded in an envelope coding manner.
  • the above 40 high-frequency frequency-domain coefficients S(40) to S(79) are equally divided into 8 groups, each high-frequency group containing five high-frequency frequency-domain coefficients, specifically as follows:
  • Group 1 contains high-frequency frequency-domain coefficients: S(40)~S(44);
  • Group 2 contains high-frequency frequency-domain coefficients: S(45)~S(49);
  • Group 3 contains high-frequency frequency-domain coefficients: S(50)~S(54);
  • Group 4 contains high-frequency frequency-domain coefficients: S(55)~S(59);
  • Group 5 contains high-frequency frequency-domain coefficients: S(60)~S(64);
  • Group 6 contains high-frequency frequency-domain coefficients: S(65)~S(69);
  • Group 7 contains high-frequency frequency-domain coefficients: S(70)~S(74);
  • Group 8 contains high-frequency frequency-domain coefficients: S(75)~S(79).
  • the group envelope values of the multiple high-frequency groups are obtained, where the group envelope value is the average value of the multiple high-frequency frequency domain coefficients in each group.
  • the first device can obtain the group envelope value of each group of the high-frequency part of the second audio signal, and then encode according to the group envelope value to obtain multiple extended frames with the second duration as the frame length.
  • the calculation of the group envelope values may specifically be:
  • Group 2 envelope value: S_HE(1) = [S(45)+S(46)+S(47)+S(48)+S(49)]/5;
  • Group 8 envelope value: S_HE(7) = [S(75)+S(76)+S(77)+S(78)+S(79)]/5.
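  • a minimal sketch of the grouping and averaging just described (assuming the 40 high-frequency coefficients S(40)...S(79) are already available, for example from the MDCT sketch above):

      import numpy as np

      # S_high: the 40 high-frequency MDCT coefficients S(40)..S(79) (dummy values here)
      S_high = np.arange(40, 80, dtype=float)

      # split into 8 groups of 5 consecutive coefficients (low to high frequency)
      # and take the mean of each group: the group envelope values S_HE(0)..S_HE(7)
      S_HE = S_high.reshape(8, 5).mean(axis=1)

      print(S_HE[0])    # = (S(40) + S(41) + S(42) + S(43) + S(44)) / 5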
  • the first device may digitally encode the group envelope values of the multiple high-frequency groups obtained above, and send them to the second device frame by frame. For example, every 5 ms, the first device assembles the obtained S_HE(0) to S_HE(7) codes into an extended frame and sends it to the second device.
  • the second device receives a basic frame at regular intervals, and then decodes the basic frame according to the time-domain decoding method to obtain the first audio signal, which, relative to the original audio signal on the encoding side, contains only the low-frequency part.
  • the second device receives an extended frame at regular intervals, and the extended frame only contains the high frequency part of the original audio signal.
  • the second device combines the extended frame with the basic frame for comprehensive decoding to obtain the second audio signal.
  • the second audio signal includes not only a low frequency part, but also a high frequency part.
  • the second device can receive a basic frame every 0.5 ms, and then decode the basic frame according to the G.726 decoding mode to obtain the basic audio signal s_1(n).
  • the basic audio signal s_1(n) has only a low-frequency part, but its delay is as low as 0.5 ms. Therefore, this audio signal can be applied to audio applications with lower latency requirements, such as equipment calibration and positioning applications.
  • if the extended frame received by the second device includes the group envelope values of multiple high-frequency groups, the multiple high-frequency frequency-domain coefficients of the high-frequency signal are obtained from those group envelope values; that is, each high-frequency frequency-domain coefficient is set to the group envelope value corresponding to it. The basic audio signal is up-sampled to obtain the third audio signal, and the third audio signal is transformed to the frequency domain frame by frame to obtain the multiple low-frequency frequency-domain coefficients of the low-frequency signal corresponding to the third audio signal.
  • the audio signal recovered by the second device according to the multiple high-frequency frequency domain coefficients and the multiple low-frequency frequency domain coefficients is the extended audio signal.
  • the second device may receive an extended frame every 5 ms, and obtain from it the group envelope values S_HE(0) to S_HE(7) of the high-frequency part of the audio signal.
  • from the group envelope values, the multiple high-frequency frequency-domain coefficients can be obtained; that is, each high-frequency frequency-domain coefficient of the audio signal is set equal to the group envelope value of its group, namely S(40) = ... = S(44) = S_HE(0), and so on up to S(75) = ... = S(79) = S_HE(7).
  • the up-sampling process is to insert one or more zero points between every two adjacent points of the original signal.
  • after up-sampling, a signal with a bandwidth of 8 kHz and a sampling rate of 16 kHz is obtained, namely the third audio signal s′_L(n); however, the high-frequency part of the third audio signal s′_L(n) is still 0.
  • the frequency-domain coefficients S′_L(k) can be obtained by MDCT transformation of the low-frequency-part audio signal s′_L(n) according to the following formula:
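  • in its standard form, the MDCT of a frame of 2N time-domain samples x(n) can be written as:

      X(k) = \sum_{n=0}^{2N-1} x(n)\,\cos\!\left[\frac{\pi}{N}\left(n+\frac{1}{2}+\frac{N}{2}\right)\left(k+\frac{1}{2}\right)\right],\qquad k = 0,\ldots,N-1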
  • S(k) = S′_L(k), for k = 0~39.
  • the inverse modified discrete cosine transform is performed on S(k), and the extended audio signal s_2(n) can be obtained; the extended audio signal s_2(n) includes both high-frequency components and low-frequency components.
  • the specific formula of the inverse modified discrete cosine transform is as follows:
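  • in its standard form, the inverse MDCT is:

      y(n) = \frac{1}{N}\sum_{k=0}^{N-1} X(k)\,\cos\!\left[\frac{\pi}{N}\left(n+\frac{1}{2}+\frac{N}{2}\right)\left(k+\frac{1}{2}\right)\right],\qquad n = 0,\ldots,2N-1

  • in practice, consecutive inverse-transformed frames are windowed and overlap-added so that the time-domain aliasing introduced by the forward transform cancels.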
  • the audio signal s_1(n) decoded from the basic frames has only low-frequency components and lower decoding quality, but it has a low delay, so it can be used for audio service applications that do not require high audio quality but require low audio delay.
  • the audio signal s_2(n) obtained by jointly decoding the extended frame and the basic frames contains both high-frequency and low-frequency components; its decoding quality is higher but its delay is longer, so it can be used for audio service applications that require higher audio quality but do not require highly real-time audio transmission.
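  • putting the decoder-side steps of this example together, a minimal sketch (hypothetical helper names; the mdct/imdct helpers follow the standard forms given above, and scaling and overlap-add are ignored):

      import numpy as np

      def mdct(x):
          n = len(x) // 2
          ns, ks = np.arange(2 * n), np.arange(n)
          return x @ np.cos(np.pi / n * np.outer(ns + 0.5 + n / 2, ks + 0.5))

      def imdct(X):
          n = len(X)
          ns, ks = np.arange(2 * n), np.arange(n)
          return (1.0 / n) * np.cos(np.pi / n * np.outer(ns + 0.5 + n / 2, ks + 0.5)) @ X

      N = 80                        # 80 MDCT coefficients per 5 ms frame at 16 kHz
      S_HE = np.ones(8)             # group envelope values from one extended frame (dummy)
      s1 = np.zeros(N)              # decoded basic audio signal segment at 8 kHz (dummy)

      # 1) high-frequency coefficients: each coefficient takes its group envelope value
      S = np.zeros(N)
      S[40:] = np.repeat(S_HE, 5)   # S(40..44) = S_HE(0), ..., S(75..79) = S_HE(7)

      # 2) low-frequency coefficients: up-sample the basic signal to 16 kHz by inserting
      #    zeros, transform one 2N-sample window, and copy its low half into S(0..39)
      s3 = np.zeros(2 * len(s1))
      s3[::2] = s1                  # third audio signal; its high band is still empty
      S[:40] = mdct(s3)[:40]

      # 3) inverse MDCT gives (one window of) the extended audio signal s2(n)
      s2 = imdct(S)
      print(s2.shape)               # (160,)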
  • in this way, only one encoded audio stream is transmitted through the same set of codec solutions, and the different audio signals obtained by decoding can be applied to different audio applications respectively, thereby avoiding repeated encoding, decoding and transmission, greatly avoiding the waste of bandwidth resources, and reducing system overhead.
  • the device on the decoding side may also decode according to the extended frame alone: in the frequency-domain inverse transformation, the low-frequency frequency-domain coefficients are set to 0, and the audio signal is recovered by performing the inverse transformation only on the frequency-domain coefficients of the high-frequency part.
  • the recovered audio signal then contains only high-frequency parts.
  • the first device may adopt a time-domain encoding method with a lower delay to obtain the basic frame, that is, only encode the low-frequency part of the second audio signal.
  • the first device uses a higher time-delay frequency domain coding method to obtain the extended frame, and the extended frame only includes the high frequency part of the second audio signal.
  • For example, one audio application on the second device requires a strongly real-time audio signal: the signal delay is low, not exceeding 6 ms, and both the high-frequency and low-frequency parts are required; the other is a three-dimensional (3D) sound field acquisition application, which requires higher audio signal quality but tolerates a longer signal delay.
  • the encoding of the basic frame by the first device may specifically include:
  • The first device uses the first time length as the frame length to perform frequency domain transformation on the second audio signal to obtain frequency domain coefficients, that is, multiple low-frequency frequency domain coefficients of the low-frequency signal corresponding to the second audio signal and multiple high-frequency frequency domain coefficients of the high-frequency signal.
  • The multiple frequency domain coefficients of the high-frequency signal are evenly grouped in order from low frequency to high frequency to obtain the group envelope values of multiple high-frequency groups, where each group envelope value is the average of the multiple high-frequency frequency domain coefficients in that group.
  • the first duration may be 5 ms.
  • the sampling rate is 16kHz
  • the first device can perform MDCT transformation on the audio signal s(n) every 5ms to obtain the MDCT coefficient S(k), where the value range of k can be 0-79.
  • the first device encodes the multiple frequency domain coefficients S(0)-S(39) of the low-frequency signal and the group envelope values S HE (0)-S HE (7) of the high-frequency signal to obtain a basic frame.
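  • A minimal sketch of this basic-frame construction is given below (the function name, the use of NumPy, and the omission of quantization and bit packing are assumptions for illustration): the 80 MDCT coefficients of a 5 ms frame are split into the 40 low-frequency coefficients, kept directly, and 8 high-frequency group envelope values, each the average of 5 coefficients.

```python
import numpy as np

def encode_basic_frame_mode2(S, n_low=40, n_groups=8):
    """S: 80 MDCT coefficients of one 5 ms frame at 16 kHz.

    Returns the low-frequency coefficients S(0)..S(39) and the high-frequency
    group envelope values S_HE(0)..S_HE(7) that would be quantized and packed
    into one basic frame.
    """
    S = np.asarray(S, dtype=float)
    S_low = S[:n_low]                         # S(0)..S(39), kept as-is
    S_high = S[n_low:].reshape(n_groups, -1)  # 8 groups of 5 coefficients
    S_HE = S_high.mean(axis=1)                # group envelope values (averages)
    return S_low, S_HE
```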
  • the encoding of the extended frame by the first device may specifically include:
  • the first device uses the second duration as a unit to encode the difference between the multiple frequency domain coefficients of the high-frequency signal and the corresponding group envelope value to obtain multiple extended frames with the second duration as the frame length.
  • the calculation method can be as follows:
  • SD_HE(40) = S(40) - S_HE(0);
  • SD_HE(41) = S(41) - S_HE(0);
  • ...
  • SD_HE(45) = S(45) - S_HE(1);
  • ...
  • SD_HE(79) = S(79) - S_HE(7).
  • the first device may assemble these group envelope coefficient differences SD HE (40) to SD HE (79) into an extended frame every 20ms, and transmit it to the second device.
  • the first device may directly encapsulate these group envelope coefficient differences SD HE (40) to SD HE (79) for transmission, or may also use differential quantization for encoding and transmission.
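  • The difference computation above can be sketched as follows (names and data layout are assumptions for illustration; the actual quantization, e.g. differential quantization, is not shown):

```python
import numpy as np

def encode_extended_frame_mode2(S, S_HE, n_low=40, group_size=5):
    """Per-coefficient differences from the group envelope values, i.e.
    SD_HE(40) = S(40) - S_HE(0), ..., SD_HE(79) = S(79) - S_HE(7).

    The differences produced over the second duration (e.g. 20 ms) are then
    assembled into one extended frame.
    """
    S = np.asarray(S, dtype=float)
    SD_HE = [S[k] - S_HE[(k - n_low) // group_size] for k in range(n_low, len(S))]
    return SD_HE
```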
  • The second device receives a basic frame every first time length. If the basic frame includes multiple frequency domain coefficients of the low-frequency signal and multiple group envelope values of the high-frequency signal, the second device obtains multiple frequency domain coefficients of the high-frequency signal according to the group envelope values of the high-frequency signal in the basic frame, and then performs inverse frequency domain transformation according to the multiple frequency domain coefficients of the low-frequency signal and the multiple frequency domain coefficients of the high-frequency signal to obtain the first audio signal.
  • The second device receives an extended frame every second time length. If the extended frame includes the differences between the multiple frequency domain coefficients of the high-frequency signal and the corresponding group envelope values, the second device can combine these differences with the group envelope values of the high-frequency signal in the basic frame to obtain multiple frequency domain coefficients of the high-frequency signal, and then perform frequency domain inverse transformation according to the multiple frequency domain coefficients of the low-frequency signal and the multiple frequency domain coefficients of the high-frequency signal to obtain the second audio signal.
  • the second audio signal includes not only a low frequency part, but also a high frequency part.
  • the second device can receive the basic frame every 5 ms.
  • the second device first obtains the frequency domain coefficients of the low frequency part of S(k) according to the basic frame, that is, S(0) to S(39).
  • The second device then obtains the high-frequency coefficients according to the high-frequency group envelope values in the basic frame, that is, each high-frequency frequency domain coefficient is made equal to its corresponding group envelope value; performing the inverse MDCT on S(0) to S(79) then yields the basic audio signal s_1(n).
  • the basic audio signal s 1 (n) has a relatively low time delay and includes both the high frequency part and the low frequency part of the original audio signal.
  • However, because the high-frequency part is restored only from the group envelope values, that is, the coefficients within each group all take the same value, the signal quality of the high-frequency part is slightly worse, which is equivalent to reducing the frequency domain resolution of the high-frequency part.
  • The second device can receive an extended frame every 20 ms and obtain the group envelope coefficient differences SD_HE(40) to SD_HE(79) of the high-frequency part of the audio signal from the extended frame. Then, according to SD_HE(40) to SD_HE(79), the frequency domain coefficients of the high-frequency part of each basic frame are obtained; that is, each high-frequency frequency domain coefficient is obtained by adding the group envelope coefficient difference to the corresponding group envelope value.
  • The extended audio signal s_2(n) includes both the high-frequency part and the low-frequency part of the original audio signal, and its high-frequency part is the high-frequency signal restored from the group envelope values combined with the group envelope coefficient differences. Therefore, the extended audio signal s_2(n) has a higher quality than the basic audio signal s_1(n), but also a longer delay; in terms of real-time signal transmission, the basic audio signal s_1(n) is better than the extended audio signal s_2(n).
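  • The two decoding paths described above can be summarized by the following sketch (it reuses the illustrative imdct() from the earlier sketch; names and data layout are assumptions): basic-only decoding fills each high-frequency coefficient with its group envelope value, while joint decoding additionally adds back the per-coefficient differences from the extended frame.

```python
import numpy as np

def decode_mode2(S_low, S_HE, SD_HE=None, group_size=5):
    """S_low: S(0)..S(39); S_HE: S_HE(0)..S_HE(7); SD_HE: SD_HE(40)..SD_HE(79) or None."""
    S_high = np.repeat(S_HE, group_size)         # S(k) = S_HE(group(k))
    if SD_HE is not None:                        # joint decoding with the extended frame
        S_high = S_high + np.asarray(SD_HE)      # S(k) = S_HE(group(k)) + SD_HE(k)
    S = np.concatenate([np.asarray(S_low), S_high])
    return imdct(S)  # basic audio signal s1(n) or extended audio signal s2(n)
```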
  • When the first device needs to meet three or more different audio application requirements on the second device, the first device may encode one basic frame and two or more extended frames.
  • the basic frame can be obtained by the first device using a time-domain coding method with a lower delay and low quality, that is, only the low frequency part of the second audio signal is encoded.
  • the first device obtains the first extended frame by adopting a frequency domain coding method with higher delay and low quality.
  • the first extended frame only encodes the envelope value of the frequency domain group of the high frequency part of the second audio signal.
  • the first device adopts a higher time delay and high-quality frequency domain coding method to obtain a second extended frame, and the second extended frame contains the high frequency part of the second audio signal.
  • For example, there are three different audio applications on the second device.
  • One is equipment calibration and positioning applications.
  • This application requires real-time processing of audio signals, with a signal transmission delay not exceeding 1 ms; the audio signal may contain only the low-frequency part, without the high-frequency part. The second is a voice enhancement application, which requires strongly real-time processing of audio signals, with a signal transmission delay not exceeding 6 ms, and a higher audio quality, so both the high-frequency and low-frequency parts of the audio signal are required. The third is a 3D sound field acquisition application, which does not require highly real-time processing of audio signals but requires high audio quality.
  • the encoding of the basic frame by the first device may refer to the encoding manner of the basic frame in the foregoing manner 1, which may include:
  • the first device may use the G.726 encoding method to encode s L (n), and assemble it into a basic frame at the interval of the first time length.
  • For example, the first time length may be 0.5 ms, which satisfies the requirements of the above-mentioned first audio application.
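  • A minimal sketch of this basic-frame path is given below, assuming SciPy is available for the low-pass decimation; g726_encode() is only a placeholder name for an actual G.726 ADPCM encoder and is not implemented here.

```python
import numpy as np
from scipy.signal import decimate  # assumption: SciPy is available

def make_basic_frames_mode3(s, fs=16000, first_duration_ms=0.5):
    """Down-sample the 16 kHz input to the low-frequency signal s_L(n) at 8 kHz,
    then split s_L(n) into frames of the first duration (0.5 ms -> 4 samples),
    each of which would be passed to a G.726 encoder (placeholder).
    """
    s_L = decimate(np.asarray(s, dtype=float), 2)        # low-pass filter + decimate to 8 kHz
    frame_len = int(fs // 2 * first_duration_ms / 1000)  # 4 samples per 0.5 ms
    frames = [s_L[i:i + frame_len]
              for i in range(0, len(s_L) - frame_len + 1, frame_len)]
    return frames  # each frame: g726_encode(frame) -> one basic frame
```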
  • the encoding of the first extended frame by the first device may refer to the encoding process of the extended frame in the foregoing manner 1, including:
  • The multiple frequency domain coefficients in the high-frequency part of the frequency domain coefficients corresponding to the second audio signal are evenly grouped in order from low frequency to high frequency to obtain the group envelope values of multiple high-frequency groups, which are then encoded in the envelope coding manner.
  • the first device can perform MDCT transformation on s(n) to obtain MDCT frequency domain coefficients.
  • the frame length is 5ms
  • the sampling rate is 16kHz
  • s(n) includes 80 sampling points, and the corresponding MDCT coefficients are S(0) to S(79).
  • The 40 high-frequency coefficients S(40) to S(79) are evenly divided into 8 groups to obtain the group envelope values S_HE(0) to S_HE(7), where each group envelope value is the average of the multiple high-frequency frequency domain coefficients in that group.
  • The first device can digitally encode the multiple high-frequency group envelope values S_HE(0) to S_HE(7) obtained above, and every 5 ms assembles the encoded S_HE(0) to S_HE(7) into a first extended frame and sends it to the second device.
  • the encoding of the second extended frame in the foregoing step S202 may refer to the encoding process of the extended frame in the second manner, including:
  • The first device uses the third duration as a unit to encode the differences between the multiple frequency domain coefficients of the high-frequency signal and the corresponding group envelope values, to obtain multiple extended frames with the third duration as the frame length.
  • The first device may calculate, every 20 ms, the difference between each high-frequency frequency domain coefficient of the high-frequency part and the group envelope value of the corresponding high-frequency group obtained when encoding the first extended frame. Specifically, the group envelope value corresponding to each high-frequency frequency domain coefficient is subtracted from that coefficient to obtain the group envelope coefficient differences SD_HE(40) to SD_HE(79). Then, the first device can assemble these group envelope coefficient differences SD_HE(40) to SD_HE(79) into a second extended frame every 20 ms and transmit it to the second device.
  • The second device receives a basic frame every first time length and then decodes the basic frame according to the time-domain decoding method to obtain a basic audio signal, which, relative to the original audio signal on the encoding side, includes only the low-frequency part.
  • The second device receives a first extended frame every second time length. If the first extended frame includes the group envelope values of multiple high-frequency signals, the second device obtains the multiple frequency domain coefficients of the high-frequency signal according to these group envelope values, where each frequency domain coefficient of the high-frequency signal is the group envelope value corresponding to that coefficient. At the same time, the first audio signal obtained by decoding the basic frames is up-sampled to obtain the third audio signal, and frequency domain transformation is performed on the third audio signal frame by frame to obtain multiple frequency domain coefficients of the low-frequency signal corresponding to the third audio signal. Then, inverse frequency domain transformation is performed according to the multiple frequency domain coefficients of the high-frequency signal and the multiple frequency domain coefficients of the low-frequency signal to obtain the first extended audio signal.
  • the first extended audio signal includes a low-frequency signal and a high-frequency signal, but the high-frequency quality is slightly weaker, and the first extended audio signal has a longer time delay. Therefore, it can be used for the application of the second audio service described above.
  • The second device receives a second extended frame every third duration. If the second extended frame includes the differences between the multiple frequency domain coefficients of the high-frequency signal and the corresponding group envelope values, the second device may combine these differences with the group envelope values of the high-frequency signal in the first extended frame to obtain multiple frequency domain coefficients of the high-frequency signal, and then perform inverse frequency domain transformation according to the multiple frequency domain coefficients of the low-frequency signal and the multiple frequency domain coefficients of the high-frequency signal to obtain the second extended audio signal.
  • the second extended audio signal includes not only a low frequency part, but also a high frequency part.
  • the second device may receive a basic frame every 0.5 ms, and then decode the basic frame according to the G.726 decoding mode to obtain the basic audio signal s 1 (n).
  • the basic audio signal s 1 (n) has only a low frequency part, but the time delay is as low as 0.5 ms. Therefore, the audio signal can be applied to audio applications with lower delay requirements, such as the aforementioned equipment calibration and positioning applications.
  • The second device can receive a first extended frame every 5 ms and obtain the group envelope values S_HE(0) to S_HE(7) of the high-frequency part of the audio signal from the first extended frame; the second device can then obtain the multiple high-frequency frequency domain coefficients S(40) to S(79) from these group envelope values.
  • The second device performs up-sampling processing on the audio signal s_L(n) obtained by decoding the multiple basic frames received within 5 ms to obtain the audio signal s′_L(n), and performs MDCT transformation on s′_L(n) to obtain the low-frequency frequency domain coefficients S(0) to S(39). The inverse MDCT is then performed on S(0) to S(79) to obtain the first extended audio signal s_2(n).
  • The first extended audio signal s_2(n) includes both the high-frequency part and the low-frequency part, where the quality of the high-frequency part is slightly weaker.
  • The second device may receive a second extended frame every 20 ms and obtain the group envelope coefficient differences SD_HE(40) to SD_HE(79) of the high-frequency part of the audio signal from the second extended frame. Then, according to SD_HE(40) to SD_HE(79), combined with the group envelope values S_HE(0) to S_HE(7) of the high-frequency part obtained from the above-mentioned first extended frame, each high-frequency frequency domain coefficient S(40) to S(79) is obtained. The inverse MDCT is performed on S(0) to S(79) to obtain the second extended audio signal s_3(n) of the 20 ms time period.
  • The second extended audio signal s_3(n) includes both the high-frequency part and the low-frequency part, where its high-frequency part has slightly better quality than that of the first extended audio signal s_2(n).
  • the present application provides more possible audio coding structures, which can be applied to three or more audio applications with different requirements, thereby saving transmission bandwidth and improving system performance.
  • the first device may use a time-domain coding method with a lower delay and low quality to obtain the basic frame, that is, only the low-frequency part of the second audio signal is encoded.
  • the first device can use a higher delay, low quality frequency domain encoding method to obtain the extended frame, and only encode the frequency domain group envelope value of the low frequency part and the frequency domain group envelope value of the high frequency part of the second audio signal .
  • In step S202, for the encoding of the basic frame by the first device, refer to the encoding method for the basic frame in the above manner 1, which may include:
  • the first device may use the G.726 encoding method to encode s L (n), and assemble it into a basic frame at intervals of the first duration, for example, the first duration may be 0.5 ms.
  • the encoding of the extended frame by the first device may refer to the encoding process of the extended frame in the foregoing manner 1, including:
  • The multiple frequency domain coefficients in the high-frequency part of the frequency domain coefficients corresponding to the second audio signal are evenly grouped in order from low frequency to high frequency to obtain the group envelope values of multiple high-frequency groups, and the multiple frequency domain coefficients of the low-frequency part are evenly grouped in order from low frequency to high frequency to obtain the group envelope values of multiple low-frequency groups; both are encoded according to the envelope coding method.
  • the first device can perform MDCT transformation on s(n) to obtain MDCT frequency domain coefficients.
  • the frame length is 5ms
  • the sampling rate is 16kHz
  • s(n) includes 80 sampling points, and the corresponding MDCT coefficients are S(0) to S(79).
  • The 40 low-frequency component coefficients S(0) to S(39) are evenly divided into 8 groups, each low-frequency group has five low-frequency component coefficients, and the group envelope values S_LE(0) to S_LE(7) of the low-frequency groups are obtained.
  • The 40 high-frequency component coefficients S(40) to S(79) are evenly divided into 8 groups, each high-frequency group has five high-frequency component coefficients, and the group envelope values S_HE(0) to S_HE(7) of the high-frequency groups are obtained.
  • The first device can digitally encode the group envelope values S_LE(0) to S_LE(7) of the multiple low-frequency groups obtained above and the group envelope values S_HE(0) to S_HE(7) of the multiple high-frequency groups. Every 5 ms, the first device assembles the encoded S_LE(0) to S_LE(7) and S_HE(0) to S_HE(7) into an extended frame and sends it to the second device.
  • the second device receives a basic frame every first time period, and then decodes the basic frame according to the time-domain decoding method to obtain a basic audio signal.
  • This basic audio signal, relative to the original audio signal on the encoding side, contains only the low-frequency part.
  • The second device receives an extended frame every second time length. If the extended frame includes multiple group envelope values of the low-frequency signal and multiple group envelope values of the high-frequency signal, the second device obtains multiple frequency domain coefficients of the low-frequency signal according to the group envelope values of the low-frequency signal, and obtains multiple frequency domain coefficients of the high-frequency signal according to the group envelope values of the high-frequency signal.
  • the multiple frequency domain coefficients of the low-frequency signal may be determined by performing frequency domain transformation on the first audio signal obtained from the basic frame.
  • Alternatively, the second device can determine the multiple frequency domain coefficients of the low-frequency signal according to the multiple group envelope values of the low-frequency signal in the extended frame, where each low-frequency frequency domain coefficient is the group envelope value corresponding to that coefficient.
  • the second device can perform frequency domain inverse transformation according to the multiple frequency domain coefficients of the low frequency signal and the multiple frequency domain coefficients of the high frequency signal to obtain the extended audio signal.
  • the second device can receive a basic frame every 0.5ms, and then decode the basic frame according to the G.726 decoding method to obtain the basic audio signal s 1 (n) .
  • the basic audio signal s 1 (n) has only a low frequency part, but the time delay is as low as 0.5 ms.
  • The second device can receive an extended frame every 5 ms, and the group envelope values S_HE(0) to S_HE(7) of the high-frequency part of the audio signal are obtained from the extended frame; multiple high-frequency frequency domain coefficients can then be obtained according to these group envelope values.
  • The second device performs up-sampling processing on the audio signal s_L(n) obtained by decoding the multiple basic frames received within 5 ms to obtain the audio signal s′_L(n), and performs MDCT transformation on s′_L(n) to obtain the low-frequency frequency domain coefficients S(0) to S(39). The inverse MDCT is then performed on S(0) to S(79) to obtain the extended audio signal s_2(n).
  • The extended audio signal s_2(n) includes both the high-frequency part and the low-frequency part, where the quality of the high-frequency part is slightly weaker.
  • Alternatively, the second device obtains multiple low-frequency frequency domain coefficients S(0) to S(39) according to the group envelope values S_LE(0) to S_LE(7) of the low-frequency part obtained by decoding the extended frame, where each low-frequency frequency domain coefficient is equal to the group envelope value of its corresponding low-frequency group.
  • The second device obtains multiple high-frequency frequency domain coefficients S(40) to S(79) according to the group envelope values S_HE(0) to S_HE(7) of the high-frequency part obtained by decoding the extended frame, where each high-frequency frequency domain coefficient is equal to the group envelope value of its corresponding high-frequency group.
  • The second device performs the inverse MDCT on S(0) to S(79) obtained by decoding the multiple extended frames received within 5 ms, and the extended audio signal s_2(n) can then be obtained. The extended audio signal s_2(n) includes both the high-frequency part and the low-frequency part.
  • the device on the decoding side can still decode based on the extended frame to realize the restoration of the entire audio signal.
  • Through the same set of codec solutions, the above-mentioned implementations provided by this application allow one transmission to serve more than one audio application: the different audio signals obtained by decoding the basic frame or the extended frame can be applied to different audio applications, thereby avoiding repeated encoding, decoding and transmission, greatly reducing the waste of bandwidth resources and lowering system overhead.
  • the device on the decoding side can decode according to the extended frame, which further improves the reliability of audio transmission.
  • The encoding-side device may communicate with the decoding-side device in advance according to the encoding requirements of the audio applications for the transmission of the audio signal, and negotiate a specific encoding and decoding mode. For example, for a first audio application on the second device that requires a low-latency, low-quality audio signal, the second device sends audio signal request information carrying configuration information to the first device, which is used to indicate the encoding mode corresponding to the requested audio signal. Alternatively, when the first device sends the encoded frame to the second device, the encoding mode of the encoded frame can be indicated by agreed bits.
  • For example, the first device sends the basic frame of the audio signal to the second device, and the basic frame includes two pre-configured bits used to indicate the encoding mode; for example, 01 can indicate encoding mode two. It can be seen that the foregoing codec configurations are only shown as examples and are not limited to the foregoing two types, and the embodiment of the present application does not specifically limit this.
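  • As an illustration of indicating the codec mode with agreed bits, the following sketch carries two pre-configured bits in the first byte of each frame (the header layout, field names and mode numbering are assumptions, not defined by this application):

```python
MODE_NAMES = {0b00: "mode 1", 0b01: "mode 2", 0b10: "mode 3", 0b11: "mode 4"}

def pack_frame(mode_bits: int, payload: bytes) -> bytes:
    """Prepend one header byte whose two least-significant bits carry the mode."""
    return bytes([mode_bits & 0b11]) + payload

def parse_frame(frame: bytes):
    """Return the indicated codec mode and the remaining payload."""
    mode_bits = frame[0] & 0b11
    return MODE_NAMES[mode_bits], frame[1:]
```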
  • the present application also provides an audio processing device, as shown in FIG. 6, the device 600 may include a preprocessing module 601, an encoding module 602, and a sending module 603.
  • the preprocessing module 601 may be used to perform sampling and quantization processing on the acquired first audio signal to obtain the second audio signal.
  • The encoding module 602 may be configured to encode the second audio signal in a first encoding mode with the first duration as a unit to obtain a basic frame, and to encode the second audio signal in a second encoding mode with the second duration as a unit to obtain an extended frame, wherein the second duration is greater than the first duration, and the first encoding mode and the second encoding mode respectively encode different signals carried in the second audio signal and/or encode the second audio signal with different encoding degrees.
  • the sending module 603 can be used to send the basic frame and the extended frame to the second device.
  • the second duration is N times the first duration, and N is a natural number greater than or equal to 2.
  • the encoding module 602 may be specifically used to: down-sample the second audio signal to obtain the low-frequency signal carried in the second audio signal; and encode the low-frequency signal according to the time-domain encoding method to obtain Multiple basic frames with the first duration as the frame length.
  • The encoding module 602 can be specifically used to: perform frequency domain transformation on the second audio signal to obtain frequency domain coefficients corresponding to the second audio signal; evenly group the multiple frequency domain coefficients of the high-frequency part in order from low frequency to high frequency to obtain group envelope values of multiple high-frequency groups, where each group envelope value is the average of the multiple high-frequency frequency domain coefficients in that group; and perform encoding according to the group envelope values to obtain multiple extended frames with the second duration as the frame length.
  • The encoding module 602 may be specifically used to: perform frequency domain transformation on the second audio signal to obtain multiple frequency domain coefficients of the low-frequency signal and multiple frequency domain coefficients of the high-frequency signal corresponding to the second audio signal; evenly group the multiple frequency domain coefficients of the high-frequency signal in order from low frequency to high frequency to obtain group envelope values of multiple high-frequency groups, where each group envelope value is the average of the multiple high-frequency frequency domain coefficients in that group; and perform encoding according to the multiple frequency domain coefficients of the low-frequency signal and the group envelope values of the high-frequency signal to obtain multiple basic frames with the first duration as the frame length.
  • the encoding module 602 may be specifically used to encode the difference between the multiple frequency domain coefficients of the high-frequency signal and the corresponding group envelope value with the second duration as a unit, to obtain The second duration is multiple extended frames of the frame length.
  • The encoding module 602 may be specifically used to: perform frequency domain transformation on the second audio signal to obtain multiple frequency domain coefficients of the low-frequency signal and multiple frequency domain coefficients of the high-frequency signal corresponding to the second audio signal; group the multiple frequency domain coefficients of the low-frequency signal and the multiple frequency domain coefficients of the high-frequency signal to obtain the corresponding group envelope values, where each group envelope value is the average of the multiple frequency domain coefficients in that group; and perform encoding according to the group envelope values to obtain multiple extended frames with the second duration as the frame length.
  • the frequency domain transform in the foregoing embodiment may specifically be an improved discrete cosine transform MDCT algorithm.
  • the device 700 includes a receiving module 701 and a decoding module 702.
  • the receiving module 701 may be used to receive the basic frame and the extended frame sent from the first device, where the frame length of the extended frame is greater than the frame length of the basic frame, and the extended frame is obtained by re-encoding the audio signals corresponding to multiple basic frames of.
  • the decoding module 702 can be used to decode a basic frame to obtain a basic audio signal; or, to jointly decode a basic frame and an extended frame to obtain an extended audio signal.
  • the decoding module 702 may be specifically used to decode the basic frame according to the time-domain coding and decoding manner to obtain the basic audio signal.
  • The decoding module 702 can be specifically used to: if the extended frame includes the group envelope values of multiple high-frequency signals, obtain the multiple frequency domain coefficients of the high-frequency signal according to the group envelope values of the multiple high-frequency signals, where each frequency domain coefficient of the high-frequency signal is the group envelope value corresponding to that coefficient;
  • the basic audio signal is up-sampled to obtain the third audio signal;
  • the third audio signal is subjected to frequency domain transformation frame by frame to obtain the multiple frequency domain coefficients of the low-frequency signal corresponding to the third audio signal; frequency domain inverse transformation is performed according to the multiple frequency domain coefficients of the high-frequency signal and the multiple frequency domain coefficients of the low-frequency signal to obtain the extended audio signal.
  • The decoding module 702 can be specifically used to: if the basic frame includes multiple frequency domain coefficients of the low-frequency signal and multiple group envelope values of the high-frequency signal, obtain the multiple frequency domain coefficients of the low-frequency signal and the multiple frequency domain coefficients of the high-frequency signal according to the basic frame, where the multiple frequency domain coefficients of the high-frequency signal are the group envelope values corresponding to those coefficients; and perform inverse frequency domain transformation according to the multiple frequency domain coefficients of the low-frequency signal and the multiple frequency domain coefficients of the high-frequency signal to obtain the basic audio signal.
  • The decoding module 702 may be specifically used to: if the extended frame includes the differences between the multiple frequency domain coefficients of the high-frequency signal and the corresponding group envelope values, obtain the multiple frequency domain coefficients of the high-frequency signal according to the multiple group envelope values of the high-frequency signal and these differences; and perform frequency domain inverse transformation according to the multiple frequency domain coefficients of the low-frequency signal and the multiple frequency domain coefficients of the high-frequency signal to obtain the extended audio signal.
  • The decoding module 702 can be specifically used to: if the extended frame includes multiple group envelope values of the low-frequency signal and multiple group envelope values of the high-frequency signal, obtain multiple frequency domain coefficients of the low-frequency signal according to the multiple group envelope values of the low-frequency signal, and obtain multiple frequency domain coefficients of the high-frequency signal according to the multiple group envelope values of the high-frequency signal; wherein the multiple frequency domain coefficients of the low-frequency signal are determined by performing frequency domain transformation on the basic audio signal obtained from the basic frame, or are determined according to the multiple group envelope values of the low-frequency signal in the extended frame, in which case each low-frequency frequency domain coefficient is the group envelope value corresponding to that coefficient; and perform frequency domain inverse transformation according to the multiple frequency domain coefficients of the low-frequency signal and the multiple frequency domain coefficients of the high-frequency signal to obtain the extended audio signal.
  • The frequency domain inverse transform in the foregoing embodiments may specifically be an inverse improved discrete cosine transform (inverse MDCT) algorithm.
  • the group envelope value includes the average value of the multiple frequency domain coefficients in each group obtained by averaging the multiple frequency domain coefficients in the order from low frequency to high frequency.
  • the sending module may be a transmitter, which may include an antenna and a radio frequency circuit, and the preprocessing module, encoding module, and decoding module may be processors, such as baseband chips.
  • When the audio signal processing device is a component having the functions of the first device or the second device, the sending module may be a radio frequency unit, and the preprocessing module, encoding module, and decoding module may be processors.
  • When the audio signal processing device is a chip system, the sending module may be the output interface of the chip system, and the preprocessing module, encoding module, and decoding module may be processors of the chip system, such as a central processing unit (CPU).
  • the audio signal processing device is presented in the form of dividing various functional modules in an integrated manner.
  • the "module” herein may refer to a specific circuit, a processor and memory that executes one or more software or firmware programs, an integrated logic circuit, and/or other devices that can provide the above-mentioned functions.
  • the audio signal processing device may adopt the form shown in FIG. 8 below.
  • FIG. 8 is a schematic structural diagram of an exemplary electronic device 800 shown in an embodiment of the application.
  • The electronic device 800 may be the first device or the second device in the foregoing embodiments, and is used to execute the audio signal processing method in the foregoing embodiments.
  • the electronic device 800 may include at least one processor 801, a communication line 802, and a memory 803.
  • the processor 801 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits.
  • the communication line 802 may include a path to transmit information between the above-mentioned components, and the communication line may be, for example, a bus.
  • The memory 803 can be a read-only memory (ROM) or other types of static storage devices that can store static information and instructions, a random access memory (RAM) or other types of dynamic storage devices that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
  • the memory can exist independently, and is connected to the processor through a communication line 802.
  • the memory can also be integrated with the processor.
  • the memory provided by the embodiment of the present application is usually a non-volatile memory.
  • the memory 803 is used to store and execute computer program instructions involved in the solutions of the embodiments of the present application, and the processor 801 controls the execution.
  • the processor 801 is configured to execute computer program instructions stored in the memory 803, so as to implement the method provided in the embodiment of the present application.
  • the computer program instructions in the embodiments of the present application may also be referred to as application program codes, which are not specifically limited in the embodiments of the present application.
  • the processor 801 may include one or more CPUs, such as CPU0 and CPU1 in FIG. 8.
  • the electronic device 800 may include multiple processors, such as the processor 801 and the processor 807 in FIG. 8. These processors can be single-CPU (single-CPU) processors or multi-core (multi-CPU) processors.
  • the processor here may refer to one or more devices, circuits, and/or processing cores for processing data (for example, computer program instructions).
  • the electronic device 800 may further include a communication interface 804.
  • the electronic device can send and receive data through the communication interface 804, or communicate with other devices or a communication network.
  • The communication interface 804 can be, for example, an Ethernet interface, a radio access network (RAN) interface, a wireless local area network (WLAN) interface, or a USB interface, etc.
  • the electronic device 800 may further include an output device 805 and an input device 806.
  • the output device 805 communicates with the processor 801 and can display information in a variety of ways.
  • The output device 805 may be a liquid crystal display (LCD), a light emitting diode (LED) display device, a cathode ray tube (CRT) display device, a projector, or the like.
  • the input device 806 communicates with the processor 801, and can receive user input in a variety of ways.
  • the input device 806 may be a mouse, a keyboard, a touch screen device, a sensor device, or the like.
  • The electronic device 800 can be a desktop computer, a portable computer, a web server, a personal digital assistant (PDA), a mobile phone, a tablet computer, a wireless terminal device, an embedded device, or a device with a structure similar to that shown in FIG. 8.
  • The embodiment of the present application does not limit the type of the electronic device 800.
  • the processor 801 in FIG. 8 may invoke the computer program instructions stored in the memory 803 to cause the electronic device 800 to execute the method in the foregoing method embodiment.
  • each processing module in FIG. 6 or FIG. 7 may be implemented by the processor 801 in FIG. 8 calling computer program instructions stored in the memory 803.
  • The function/implementation process of the preprocessing module 601 and the encoding module 602 in FIG. 6 can be implemented by the processor 801 in FIG. 8 calling computer-executable instructions stored in the memory 803.
  • the function/implementation process of the receiving module 701 and the decoding module 702 in FIG. 7 can be implemented by the processor 801 in FIG. 8 calling a computer execution instruction stored in the memory 803.
  • a computer-readable storage medium including instructions is also provided.
  • The foregoing instructions can be executed by the processor 801 of the electronic device 800 to complete the audio signal processing method of the foregoing embodiments; for the technical effects that can be obtained, refer to the above-mentioned method embodiments, which will not be repeated here.
  • The foregoing embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented by a software program, they may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions.
  • the computer program instructions When the computer program instructions are loaded and executed on the computer, the processes or functions according to the embodiments of the present application are generated in whole or in part.
  • the computer can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The present invention relates to the technical field of multimedia processing, and provides an audio signal processing method and apparatus, solving the prior-art problems that, when an audio signal is transmitted between multiple electronic devices, transmission is repeated and bandwidth resources are wasted because different audio applications have different requirements for compressing and encoding the audio signal. The method comprises the following steps: a first apparatus samples and quantizes an obtained first audio signal to obtain a second audio signal; encodes, in units of a first duration, the second audio signal using a first encoding mode to obtain a basic frame; encodes, in units of a second duration, the second audio signal using a second encoding mode to obtain an extension frame, the second duration being longer than the first duration, and the first encoding mode and the second encoding mode being used to respectively encode different signals carried in the second audio signal and/or to respectively encode the second audio signal with different encoding degrees; and sends the basic frame and the extension frame to a second apparatus.
PCT/CN2020/098183 2020-06-24 2020-06-24 Procédé et appareil de traitement de signal audio WO2021258350A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2020/098183 WO2021258350A1 (fr) 2020-06-24 2020-06-24 Procédé et appareil de traitement de signal audio
CN202080092744.4A CN114945981A (zh) 2020-06-24 2020-06-24 一种音频信号处理方法和装置

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/098183 WO2021258350A1 (fr) 2020-06-24 2020-06-24 Procédé et appareil de traitement de signal audio

Publications (1)

Publication Number Publication Date
WO2021258350A1 true WO2021258350A1 (fr) 2021-12-30

Family

ID=79282732

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/098183 WO2021258350A1 (fr) 2020-06-24 2020-06-24 Procédé et appareil de traitement de signal audio

Country Status (2)

Country Link
CN (1) CN114945981A (fr)
WO (1) WO2021258350A1 (fr)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101425294A (zh) * 2002-09-06 2009-05-06 松下电器产业株式会社 声音编解码与发送接收设备及编码方法、通信终端和基站
CN103035248A (zh) * 2011-10-08 2013-04-10 华为技术有限公司 音频信号编码方法和装置

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101425294A (zh) * 2002-09-06 2009-05-06 松下电器产业株式会社 声音编解码与发送接收设备及编码方法、通信终端和基站
CN103035248A (zh) * 2011-10-08 2013-04-10 华为技术有限公司 音频信号编码方法和装置

Also Published As

Publication number Publication date
CN114945981A (zh) 2022-08-26

Similar Documents

Publication Publication Date Title
US8442838B2 (en) Bitrate constrained variable bitrate audio encoding
RU2439718C1 (ru) Способ и устройство для обработки звукового сигнала
US10089997B2 (en) Method for predicting high frequency band signal, encoding device, and decoding device
US11289102B2 (en) Encoding method and apparatus
WO2019233362A1 (fr) Procédé d'amélioration de la qualité de la parole basés sur un apprentissage profond, dispositif et système
US20220180881A1 (en) Speech signal encoding and decoding methods and apparatuses, electronic device, and storage medium
WO2019233364A1 (fr) Amélioration de la qualité audio basée sur un apprentissage profond
EP2863388B1 (fr) Procédé et dispositif d'attribution de bits pour signal audio
US20100324914A1 (en) Adaptive Encoding of a Digital Signal with One or More Missing Values
JP2019529979A (ja) インデックスコーディング及びビットスケジューリングを備えた量子化器
CN111710342B (zh) 编码装置、解码装置、编码方法、解码方法及程序
CN111768790B (zh) 用于传输语音数据的方法和装置
WO2021213128A1 (fr) Procédé et appareil de codage de signal audio
WO2015165264A1 (fr) Procédé et dispositif de traitement de signal
WO2021258350A1 (fr) Procédé et appareil de traitement de signal audio
UA114233C2 (uk) Системи та способи для визначення набору коефіцієнтів інтерполяції
CN103503065B (zh) 用于衰减低精确度重构的信号区域的方法和解码器
CN113096670A (zh) 音频数据的处理方法、装置、设备及存储介质
WO2022258036A1 (fr) Procédé et appareil d'encodage, procédé et appareil de décodage, dispositif, support de stockage et programme informatique
EP4354430A1 (fr) Procédé et appareil de traitement de signal audio tridimensionnel
WO2022242534A1 (fr) Procédé et appareil d'encodage, procédé et appareil de décodage, dispositif, support de stockage et programme informatique
WO2022252957A1 (fr) Procédé de codage de données audio et appareil associé, procédé de décodage de données audio et appareil associé, et support de stockage lisible par ordinateur
WO2023141034A1 (fr) Codage spatial d'ambisonie d'ordre supérieur pour un codec audio immersif à faible latence
TW202248995A (zh) 一種音訊編碼、解碼方法及裝置
CN115512711A (zh) 语音编码、语音解码方法、装置、计算机设备和存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20942229

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20942229

Country of ref document: EP

Kind code of ref document: A1