WO2021258350A1

WO2021258350A1 - Audio signal processing method and apparatus

Info

Publication number: WO2021258350A1
Application number: PCT/CN2020/098183
Authority: WO
Inventors: 张立斌; 袁庭球
Original assignee: 华为技术有限公司
Priority date: 2020-06-24
Filing date: 2020-06-24
Publication date: 2021-12-30
Also published as: CN114945981A

Abstract

The present application relates to the technical field of multimedia processing, and provides an audio signal processing method and apparatus. It addresses a problem in the prior art: when an audio signal is transmitted among multiple electronic devices, different audio applications place different requirements on how the audio signal is compressed and encoded, which leads to repeated transmission and wasted bandwidth resources. The method comprises: a first apparatus samples and quantizes an obtained first audio signal to obtain a second audio signal; encodes, in units of a first duration, the second audio signal by means of a first encoding mode to obtain a basic frame; encodes, in units of a second duration, the second audio signal by means of a second encoding mode to obtain an extension frame, wherein the second duration is longer than the first duration, and the first encoding mode and the second encoding mode are used for respectively encoding different signals carried in the second audio signal, and/or for encoding the second audio signal to different degrees; and sends the basic frame and the extension frame to a second apparatus.

Description

Audio signal processing method and device

Technical field

This application relates to the field of multimedia processing technology, and in particular to an audio signal processing method and device.

Background

At present, as people use a growing number of electronic devices with growing frequency, collaborative processing of audio signals across multiple electronic devices is becoming an important development trend in audio signal processing. When an audio signal is transmitted between multiple electronic devices, the electronic device acting as the transmitting end can sample, quantize, and encode the collected audio signal, and then transmit the compressed signal to the electronic device at the receiving end. However, multiple applications on the receiving-end electronic device may have different delay requirements and quality requirements for the audio signal, and therefore require the transmitting-end electronic device to compress and encode the audio signal differently.

Figure 1 shows a possible application scenario. The mobile phone sends the collected audio signal to a smart headset, and the smart headset runs different audio applications. For example, audio application 1 is a voice enhancement application, which has strict delay requirements but only moderate requirements for audio signal transmission quality; audio application 2 is a three-dimensional sound field collection application, which has high requirements for the transmission quality of the received audio signal but relaxed delay requirements. According to the processing method of the prior art, the mobile phone needs to perform different compression and encoding processing on the same audio signal, and transmit multiple audio signals to the smart headset. These audio signals differ in transmission delay and quality, but their content is the same audio signal collected by the mobile phone. The audio signal is therefore transmitted repeatedly, which occupies and wastes bandwidth resources.

Summary of the invention

The present application provides an audio signal processing method and device that solve a problem in the prior art: when audio signals are transmitted between multiple electronic devices, different audio applications have different compression and coding requirements for the audio signal, which causes repeated transmission and wasted bandwidth resources.

In a first aspect, an audio signal processing method is provided. The method includes: a first device performs sampling and quantization processing on an acquired first audio signal to obtain a second audio signal; encodes the second audio signal by a first encoding method, using a first duration as the unit, to obtain a basic frame; encodes the second audio signal by a second encoding method, using a second duration as the unit, to obtain an extended frame, where the second duration is greater than the first duration, and the first encoding method and the second encoding method respectively encode different signals carried in the second audio signal, and/or encode the second audio signal with different encoding degrees; and sends the basic frame and the extended frame to a second device.

In the above technical solution, the audio signal sending end can encode and compress the same audio signal into two kinds of encoded frames with different frame lengths: a basic frame and an extended frame. Relative to the basic frame, the extended frame can encode the part of the second audio signal that the basic frame did not encode, or re-encode the part that the basic frame encoded only coarsely. The receiving end can therefore decode the basic frame to obtain one audio signal, and jointly decode the basic frame and the extended frame to obtain another audio signal. The two restored audio signals have different delays and different audio quality, and can thus meet the needs of different audio applications. This avoids the repeated transmission and wasted bandwidth that result from encoding the same audio signal several times on the encoding side, and reduces system overhead.

In a possible design manner, the second duration is N times the first duration, and N is a natural number greater than or equal to 2.

In the foregoing possible implementation manner, when the first device encodes the second audio signal, the time interval between basic frames is the first duration, and the time interval between extended frames is N times the first duration; that is, one extended frame is encoded for every N basic frames. The encoding side therefore obtains encoded frames with different delays, and the decoding side uses them to recover audio signals with different delays to meet the needs of different audio applications, which increases the encoding rate, avoids wasting bandwidth resources, and reduces system overhead.
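The timing relationship between the two frame types can be sketched as follows. This is only an illustration: the function name `frame_schedule`, the 5 ms first duration, and N = 4 are assumptions chosen for the example and do not come from the application.

```python
def frame_schedule(total_ms, first_duration_ms, n):
    """Return (basic, extended) lists of (start_ms, end_ms) frame spans.

    Hypothetical helper: basic frames last first_duration_ms, extended
    frames last n * first_duration_ms, with n >= 2 as the text requires.
    """
    if n < 2:
        raise ValueError("N must be a natural number greater than or equal to 2")
    second_duration_ms = n * first_duration_ms
    basic = [(t, t + first_duration_ms)
             for t in range(0, total_ms - first_duration_ms + 1, first_duration_ms)]
    extended = [(t, t + second_duration_ms)
                for t in range(0, total_ms - second_duration_ms + 1, second_duration_ms)]
    return basic, extended


# Assumed values: 40 ms of audio, 5 ms basic frames, N = 4.
basic, extended = frame_schedule(total_ms=40, first_duration_ms=5, n=4)
print(len(basic), len(extended))  # one extended frame per N basic frames
```

With these assumed values, each extended frame spans the same audio as four consecutive basic frames.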

In a possible design method, encoding the second audio signal by the first encoding method, using the first duration as the unit, to obtain the basic frame specifically includes: down-sampling the second audio signal to obtain the low-frequency signal carried in the second audio signal; and encoding the low-frequency signal according to a time-domain coding method to obtain multiple basic frames with the first duration as the frame length.

In the foregoing possible implementation manner 1, the encoding side may encode the low-frequency signal included in the second audio signal according to a time-domain encoding manner to obtain a basic frame. Because time-domain encoding can encode the audio signal into a digital signal with lower delay, it is suitable for producing a basic frame that has low delay and contains only the low-frequency part of the original audio signal. The decoding side can then recover, from the basic frame, an audio signal with strong real-time performance and moderate audio quality for use by the corresponding audio application.
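A minimal sketch of the front end of this basic-frame path is given below. The application does not specify the down-sampling filter or the time-domain codec, so the two-sample averaging used here as a crude low-pass and decimate-by-2 step, and the helper names, are assumptions; the sketch stops at framing and leaves the actual time-domain coding out.

```python
def downsample_by_2(samples):
    """Crude low-pass and decimation: average each pair of samples (assumed filter)."""
    return [(samples[i] + samples[i + 1]) / 2
            for i in range(0, len(samples) - 1, 2)]


def split_frames(samples, frame_len):
    """Cut the signal into consecutive frames of frame_len samples each."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


second_audio_signal = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0, 13.0, 15.0]
low_frequency_signal = downsample_by_2(second_audio_signal)
# Each short frame would then go to a time-domain coder to form a basic frame.
basic_frame_inputs = split_frames(low_frequency_signal, frame_len=2)
```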

In a possible design method, encoding the second audio signal by the second encoding method, using the second duration as the unit, to obtain the extended frame specifically includes: performing frequency domain transformation on the second audio signal to obtain the frequency domain coefficients corresponding to the second audio signal; evenly grouping the high-frequency part of those frequency domain coefficients in order from low frequency to high frequency to obtain the group envelope values of multiple high-frequency groups, where the group envelope value is the average value of the multiple high-frequency frequency domain coefficients in each group; and encoding according to the group envelope values to obtain multiple extended frames with the second duration as the frame length.

In the foregoing possible implementation manner 1, corresponding to the basic frame obtained by the foregoing encoding, the encoding side may also encode the high-frequency signal included in the second audio signal in a frequency-domain encoding manner to obtain an extended frame, so that the high-frequency part of the signal, which the basic frame did not encode, is encoded. The decoding side can therefore jointly decode the basic frame and the extended frame to recover an audio signal that has weaker real-time performance but includes both the low-frequency and high-frequency parts of the original audio signal and has better audio quality, for use by the corresponding audio application. Through basic frame encoding and extended frame encoding, the foregoing embodiment can meet the requirements of multiple audio applications, increase the encoding rate, and avoid wasting bandwidth resources.
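The group envelope computation described above can be sketched as follows. The split index between low and high frequency, the group size, and the function name are illustrative assumptions; only the rule "envelope = average of the coefficients in each group, taken from low to high frequency" comes from the text.

```python
def high_band_group_envelopes(coeffs, split_index, group_size):
    """Average the high-frequency coefficients in equal-sized consecutive groups.

    coeffs is ordered from low to high frequency; split_index (assumed) marks
    where the high-frequency part begins.
    """
    high = coeffs[split_index:]
    if len(high) % group_size != 0:
        raise ValueError("high band must divide evenly into groups")
    return [sum(high[i:i + group_size]) / group_size
            for i in range(0, len(high), group_size)]


# Made-up coefficients: 4 low-frequency values, then 4 high-frequency values.
coeffs = [9.0, 8.0, 7.0, 6.0, 2.0, 4.0, 6.0, 8.0]
envelopes = high_band_group_envelopes(coeffs, split_index=4, group_size=2)
```

Here four high-frequency coefficients are represented by two envelope values, which is what makes the extended frame of manner 1 compact.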

In a possible design method, encoding the second audio signal by the first encoding method, using the first duration as the unit, to obtain the basic frame specifically includes: performing frequency domain transformation on the second audio signal to obtain multiple frequency domain coefficients of the low-frequency signal and multiple frequency domain coefficients of the high-frequency signal corresponding to the second audio signal; evenly grouping the multiple frequency domain coefficients of the high-frequency signal in order from low frequency to high frequency to obtain the group envelope values of multiple high-frequency groups, where the group envelope value is the average value of the multiple high-frequency frequency domain coefficients in each group; and encoding according to the multiple frequency domain coefficients of the low-frequency signal and the group envelope values of the high-frequency signal to obtain multiple basic frames with the first duration as the frame length.

In the second possible implementation manner, the encoding side may encode both the low-frequency signal and the high-frequency signal included in the second audio signal according to a frequency-domain encoding manner: for the low-frequency signal, the multiple frequency domain coefficients are encoded, while for the high-frequency signal only the group envelope values are encoded, to obtain the basic frame. This basic frame encoding performs high-quality coding on the low-frequency part and lower-quality coding on the high-frequency part. The decoding side can recover, from the basic frame, an audio signal with strong real-time performance and moderate audio quality for use by the corresponding audio application.

In a possible design method, encoding the second audio signal by the second encoding method, using the second duration as the unit, to obtain the extended frame specifically includes: encoding, with the second duration as the unit, the differences between the multiple frequency domain coefficients of the high-frequency signal and the corresponding group envelope values, to obtain multiple extended frames with the second duration as the frame length.

In the second possible implementation manner, the encoding side can further encode the high-frequency part of the signal, which the basic frame encoded with lower quality; that is, it encodes the differences between the multiple frequency domain coefficients of the high-frequency signal and the corresponding group envelope values. This extended frame encoding performs further, high-quality coding on the high-frequency part. The decoding side can therefore jointly decode the basic frame and the extended frame to recover an audio signal with moderate real-time performance and high audio quality for use by the corresponding audio application. The foregoing embodiment uses basic frame encoding and extended frame encoding to obtain encoded frames with different delays and different encoding qualities, so that the encoding rate can be increased and the system overhead can be reduced.
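The split of the high-frequency information between the two frame types in manner 2 can be sketched as below. The function name and the sample values are assumptions, and the sketch handles a single block of coefficients; it does not model how the residuals of several basic-frame periods are packed into one extended frame of the second duration.

```python
def encode_high_band(high_coeffs, group_size):
    """Return (envelopes, residuals) for one block of high-frequency coefficients.

    envelopes: per-group averages, which would be carried in the basic frame.
    residuals: coefficient minus its group's envelope, carried in the extended frame.
    """
    if len(high_coeffs) % group_size != 0:
        raise ValueError("high band must divide evenly into groups")
    envelopes = [sum(high_coeffs[i:i + group_size]) / group_size
                 for i in range(0, len(high_coeffs), group_size)]
    residuals = [c - envelopes[i // group_size]
                 for i, c in enumerate(high_coeffs)]
    return envelopes, residuals


envelopes, residuals = encode_high_band([2.0, 4.0, 6.0, 8.0], group_size=2)
```

Because the envelope is the group average, the residuals within each group sum to zero, so they carry only the fine detail that the basic frame left out.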

In addition, there is another possible implementation manner 3. The encoding side can obtain the basic frame according to the time-domain encoding method of manner 1 above, obtain a first extended frame according to the extended frame encoding method of manner 1 above, and then obtain a second extended frame according to the extended frame encoding method of manner 2 above. This encoding method yields a basic frame with strong real-time performance that contains only the low-frequency signal and has low coding quality; a first extended frame with strong real-time performance that covers the low-frequency and high-frequency signals but has low coding quality for the high-frequency signal; and a second extended frame with weak real-time performance that covers the low-frequency and high-frequency signals and has high coding quality for the high-frequency signal. As a result, the levels of encoded frames are richer, and the decoding side can jointly decode the basic frame, the first extended frame, and the second extended frame to restore audio signals of different quality, meeting the needs of different audio applications, improving the flexibility and the encoding rate of audio encoding, and reducing system overhead.

In a possible design manner, encoding the second audio signal by the second encoding method, using the second duration as the unit, to obtain the extended frame specifically includes: performing frequency domain transformation on the second audio signal to obtain multiple frequency domain coefficients of the low-frequency signal and multiple frequency domain coefficients of the high-frequency signal corresponding to the second audio signal; evenly grouping the multiple frequency domain coefficients of the low-frequency signal and the multiple frequency domain coefficients of the high-frequency signal, each in order from low frequency to high frequency, to obtain the corresponding group envelope values, where the group envelope value is the average value of the multiple frequency domain coefficients in each group; and encoding according to the group envelope values to obtain multiple extended frames with the second duration as the frame length.

In the fourth possible implementation manner above, corresponding to the basic frame obtained by encoding in manner 1 above, the encoding side may, according to the frequency-domain encoding method, encode the group envelope values of the low-frequency frequency domain coefficients and the group envelope values of the high-frequency frequency domain coefficients corresponding to the second audio signal to obtain the extended frame. Therefore, even if the basic frame is lost, the decoding side can still decode the extended frame to recover the audio signal, which improves the reliability of audio coding transmission and improves the user experience.

In a possible design manner, performing frequency domain transformation on the second audio signal specifically includes: obtaining the MDCT frequency domain coefficients corresponding to the second audio signal according to the modified discrete cosine transform (MDCT) algorithm.
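For reference, a direct (unoptimized) MDCT and its inverse can be sketched in plain Python as below. The sine window, the block size, and the test signal are assumptions chosen for the example; the application names the MDCT but does not give window or block-size details. The sketch uses the standard definition with 50% overlapping blocks, so samples covered by two blocks are recovered by overlap-add.

```python
import math


def sine_window(half):
    """Sine window of length 2*half; satisfies w[n]^2 + w[n+half]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * half)) for n in range(2 * half)]


def _basis(n, k, half):
    return math.cos(math.pi / half * (n + 0.5 + half / 2) * (k + 0.5))


def mdct(block):
    """Windowed MDCT: 2*half time samples -> half coefficients (half must be even)."""
    half = len(block) // 2
    w = sine_window(half)
    return [sum(w[n] * block[n] * _basis(n, k, half) for n in range(2 * half))
            for k in range(half)]


def imdct(coeffs):
    """Windowed inverse MDCT: half coefficients -> 2*half aliased time samples."""
    half = len(coeffs)
    w = sine_window(half)
    return [w[n] * (2.0 / half) * sum(coeffs[k] * _basis(n, k, half) for k in range(half))
            for n in range(2 * half)]


def round_trip(signal, half):
    """Transform overlapping blocks (hop = half) and overlap-add the inverses."""
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * half + 1, half):
        for n, value in enumerate(imdct(mdct(signal[start:start + 2 * half]))):
            out[start + n] += value
    return out


signal = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(16)]
rebuilt = round_trip(signal, half=4)
# Only samples covered by two overlapping blocks (indices 4..11) are fully rebuilt.
max_error = max(abs(a - b) for a, b in zip(signal[4:12], rebuilt[4:12]))
```

A practical codec would use a fast algorithm rather than this direct summation; the sketch is only meant to show where the frequency domain coefficients used in the grouping steps come from.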

In a second aspect, an audio signal processing method is provided. The method includes: a second device receives a basic frame and an extended frame sent from a first device, where the frame length of the extended frame is greater than the frame length of the basic frame, and the extended frame is obtained by re-encoding the audio signals corresponding to multiple basic frames; and the second device decodes the basic frame to obtain a basic audio signal, or jointly decodes the basic frame and the extended frame to obtain an extended audio signal.

In a possible design manner, decoding the basic frame to obtain the basic audio signal specifically includes: decoding the basic frame according to the time-domain codec mode to obtain the basic audio signal.

In a possible design method, jointly decoding the basic frame and the extended frame to obtain the extended audio signal specifically includes: if the extended frame includes the group envelope values of multiple high-frequency signals, obtaining multiple frequency domain coefficients of the high-frequency signal according to those group envelope values, where each frequency domain coefficient of the high-frequency signal is the group envelope value corresponding to that coefficient; up-sampling the basic audio signal to obtain a third audio signal; performing frequency domain transformation on the third audio signal frame by frame to obtain multiple frequency domain coefficients of the low-frequency signal corresponding to the third audio signal; and performing frequency domain inverse transformation according to the multiple frequency domain coefficients of the high-frequency signal and the multiple frequency domain coefficients of the low-frequency signal to obtain the extended audio signal.
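The spectrum assembly step of this joint decoding can be sketched as follows. The function names and values are assumptions; the low-frequency coefficients are taken as given here (in the text they come from transforming the up-sampled basic audio signal), and the final inverse transform is omitted.

```python
def expand_envelopes(envelopes, group_size):
    """Set every coefficient in a group to that group's envelope value."""
    return [env for env in envelopes for _ in range(group_size)]


def assemble_spectrum(low_coeffs, high_envelopes, group_size):
    """Concatenate low-frequency coefficients with the expanded high band."""
    return list(low_coeffs) + expand_envelopes(high_envelopes, group_size)


# Assumed inputs: low band recovered via the basic frame, envelopes from the extended frame.
low_coeffs = [9.0, 8.0, 7.0, 6.0]
spectrum = assemble_spectrum(low_coeffs, high_envelopes=[3.0, 7.0], group_size=2)
# spectrum would then be passed to the inverse frequency domain transform.
```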

In a possible design method, decoding the basic frame to obtain the basic audio signal specifically includes: if the basic frame includes multiple frequency domain coefficients of the low-frequency signal and multiple group envelope values of the high-frequency signal, obtaining multiple frequency domain coefficients of the low-frequency signal and multiple frequency domain coefficients of the high-frequency signal according to the basic frame, where each frequency domain coefficient of the high-frequency signal is the group envelope value corresponding to that coefficient; and performing frequency domain inverse transformation according to the multiple frequency domain coefficients of the low-frequency signal and the multiple frequency domain coefficients of the high-frequency signal to obtain the basic audio signal.

In a possible design method, jointly decoding the basic frame and the extended frame to obtain the extended audio signal specifically includes: if the extended frame includes the differences between the multiple frequency domain coefficients of the high-frequency signal and the corresponding group envelope values, obtaining the multiple frequency domain coefficients of the high-frequency signal according to the multiple group envelope values of the high-frequency signal and those differences; and performing frequency domain inverse transformation according to the multiple frequency domain coefficients of the low-frequency signal and the multiple frequency domain coefficients of the high-frequency signal to obtain the extended audio signal.
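On the decoding side, recovering the high-frequency coefficients from envelopes plus differences amounts to an addition per coefficient. The sketch below assumes one block of coefficients and a hypothetical function name; entropy decoding and dequantization of the two frames are not modeled.

```python
def decode_high_band(envelopes, residuals, group_size):
    """Add each difference back onto the envelope of the group it belongs to.

    envelopes come from the basic frame, residuals from the extended frame.
    """
    if len(residuals) != len(envelopes) * group_size:
        raise ValueError("residual count must equal groups * group_size")
    return [envelopes[i // group_size] + r for i, r in enumerate(residuals)]


high_coeffs = decode_high_band(envelopes=[3.0, 7.0],
                               residuals=[-1.0, 1.0, -1.0, 1.0],
                               group_size=2)
```

If the extended frame is not used, the decoder can fall back to treating every residual as zero, which reduces to the envelope-only reconstruction of the basic frame.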

In a possible design method, jointly decoding the basic frame and the extended frame to obtain the extended audio signal specifically includes: if the extended frame includes multiple group envelope values of the low-frequency signal and multiple group envelope values of the high-frequency signal, obtaining multiple frequency domain coefficients of the low-frequency signal, and obtaining multiple frequency domain coefficients of the high-frequency signal according to the multiple group envelope values of the high-frequency signal; where the multiple frequency domain coefficients of the low-frequency signal are determined by frequency domain transformation of the basic audio signal obtained from the basic frame, or are determined according to the multiple group envelope values of the low-frequency signal in the extended frame, in which case each frequency domain coefficient of the low-frequency signal is the group envelope value corresponding to that coefficient; and performing frequency domain inverse transformation according to the multiple frequency domain coefficients of the low-frequency signal and the multiple frequency domain coefficients of the high-frequency signal to obtain the extended audio signal.
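The choice between the two sources of low-frequency coefficients can be sketched as a simple fallback. Representing a lost basic frame as `None`, along with the function name, is an assumption made for this illustration.

```python
def low_band_coefficients(basic_low_coeffs, ext_low_envelopes, group_size):
    """Pick the low-frequency coefficients used for joint decoding.

    basic_low_coeffs: coefficients derived from the basic frame, or None if
    the basic frame was lost (assumed convention).
    ext_low_envelopes: low-frequency group envelope values from the extended frame.
    """
    if basic_low_coeffs is not None:
        return list(basic_low_coeffs)
    # Fallback: every coefficient takes the envelope value of its group.
    return [env for env in ext_low_envelopes for _ in range(group_size)]


with_basic = low_band_coefficients([9.0, 8.0, 7.0, 6.0], [8.5, 6.5], group_size=2)
without_basic = low_band_coefficients(None, [8.5, 6.5], group_size=2)
```

The fallback path gives a coarser low band, but it lets the decoder produce audio from the extended frame alone.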

In a possible design method, performing the frequency domain inverse transformation according to the frequency domain coefficients specifically includes: obtaining the audio analog signal corresponding to the frequency domain coefficients according to the inverse modified discrete cosine transform (IMDCT) algorithm.

In a possible design manner, the group envelope value is the average value of the multiple frequency domain coefficients in each group, where the groups are obtained by evenly grouping the multiple frequency domain coefficients in order from low frequency to high frequency.

In a third aspect, an audio signal processing device is provided. The device includes: a preprocessing module, used to sample and quantize an acquired first audio signal to obtain a second audio signal; an encoding module, used to encode the second audio signal by a first encoding method, using a first duration as the unit, to obtain a basic frame, and to encode the second audio signal by a second encoding method, using a second duration as the unit, to obtain an extended frame, where the second duration is greater than the first duration, and the first encoding method and the second encoding method respectively encode different signals carried in the second audio signal, and/or encode the second audio signal with different encoding degrees; and a sending module, used to send the basic frame and the extended frame to a second device.

In a possible design method, the encoding module is specifically used to: down-sample the second audio signal to obtain the low-frequency signal carried in the second audio signal; and encode the low-frequency signal according to the time-domain encoding method to obtain multiple basic frames with the first duration as the frame length.

In a possible design method, the encoding module is specifically used to: perform frequency domain transformation on the second audio signal to obtain the frequency domain coefficients corresponding to the second audio signal; evenly group the high-frequency part of the multiple frequency domain coefficients in order from low frequency to high frequency to obtain the group envelope values of multiple high-frequency groups, where the group envelope value is the average value of the multiple high-frequency frequency domain coefficients in each group; and encode according to the group envelope values to obtain multiple extended frames with the second duration as the frame length.

In a possible design method, the encoding module is specifically used to: perform frequency domain transformation on the second audio signal to obtain multiple frequency domain coefficients of the low-frequency signal and multiple frequency domain coefficients of the high-frequency signal corresponding to the second audio signal; evenly group the multiple frequency domain coefficients of the high-frequency signal in order from low frequency to high frequency to obtain the group envelope values of multiple high-frequency groups, where the group envelope value is the average value of the multiple high-frequency frequency domain coefficients in each group; and encode according to the multiple frequency domain coefficients of the low-frequency signal and the group envelope values of the high-frequency signal to obtain multiple basic frames with the first duration as the frame length.

In a possible design method, the encoding module is specifically used to encode, with the second duration as the unit, the differences between the multiple frequency domain coefficients of the high-frequency signal and the corresponding group envelope values, to obtain multiple extended frames with the second duration as the frame length.

In a possible design method, the encoding module is specifically used to: perform frequency domain transformation on the second audio signal to obtain multiple frequency domain coefficients of the low-frequency signal and multiple frequency domain coefficients of the high-frequency signal corresponding to the second audio signal; evenly group the multiple frequency domain coefficients of the low-frequency signal and the multiple frequency domain coefficients of the high-frequency signal, each in order from low frequency to high frequency, to obtain the corresponding group envelope values, where the group envelope value is the average value of the multiple frequency domain coefficients in each group; and encode according to the group envelope values to obtain multiple extended frames with the second duration as the frame length.

In a possible design method, the frequency domain transformation specifically includes: the modified discrete cosine transform (MDCT) algorithm.

In a fourth aspect, an audio signal processing device is provided. The device includes: a receiving module, used to receive a basic frame and an extended frame sent from a first device, where the frame length of the extended frame is greater than the frame length of the basic frame, and the extended frame is obtained by re-encoding the audio signals corresponding to multiple basic frames; and a decoding module, used to decode the basic frame to obtain a basic audio signal, or to jointly decode the basic frame and the extended frame to obtain an extended audio signal.

In a possible design manner, the decoding module is specifically used to decode the basic frame according to the time-domain coding and decoding manner to obtain the basic audio signal.

In a possible design method, the decoding module is specifically used to: if the extended frame includes the group envelope values of multiple high-frequency signals, obtain multiple frequency domain coefficients of the high-frequency signal according to those group envelope values, where each frequency domain coefficient of the high-frequency signal is the group envelope value corresponding to that coefficient; up-sample the basic audio signal to obtain a third audio signal; perform frequency domain transformation on the third audio signal frame by frame to obtain multiple frequency domain coefficients of the low-frequency signal corresponding to the third audio signal; and perform frequency domain inverse transformation according to the multiple frequency domain coefficients of the high-frequency signal and the multiple frequency domain coefficients of the low-frequency signal to obtain the extended audio signal.

In a possible design method, the decoding module is specifically used to: if the basic frame includes multiple frequency domain coefficients of the low-frequency signal and multiple group envelope values of the high-frequency signal, obtain multiple frequency domain coefficients of the low-frequency signal and multiple frequency domain coefficients of the high-frequency signal according to the basic frame, where each frequency domain coefficient of the high-frequency signal is the group envelope value corresponding to that coefficient; and perform frequency domain inverse transformation according to the multiple frequency domain coefficients of the low-frequency signal and the multiple frequency domain coefficients of the high-frequency signal to obtain the basic audio signal.

In a possible design method, the decoding module is specifically used to: if the extended frame includes the differences between the multiple frequency domain coefficients of the high-frequency signal and the corresponding group envelope values, obtain the multiple frequency domain coefficients of the high-frequency signal according to the multiple group envelope values of the high-frequency signal and those differences; and perform frequency domain inverse transformation according to the multiple frequency domain coefficients of the low-frequency signal and the multiple frequency domain coefficients of the high-frequency signal to obtain the extended audio signal.

In a possible design method, the decoding module is specifically used to: if the extended frame includes multiple group envelope values of the low-frequency signal and multiple group envelope values of the high-frequency signal, obtain multiple frequency domain coefficients of the low-frequency signal, and obtain multiple frequency domain coefficients of the high-frequency signal according to the multiple group envelope values of the high-frequency signal; where the multiple frequency domain coefficients of the low-frequency signal are determined by frequency domain transformation of the basic audio signal obtained from the basic frame, or are determined according to the multiple group envelope values of the low-frequency signal in the extended frame, in which case each frequency domain coefficient of the low-frequency signal is the group envelope value corresponding to that coefficient; and perform frequency domain inverse transformation according to the multiple frequency domain coefficients of the low-frequency signal and the multiple frequency domain coefficients of the high-frequency signal to obtain the extended audio signal.

In a possible design method, the frequency domain inverse transformation specifically includes: the inverse modified discrete cosine transform (IMDCT) algorithm.

In a fifth aspect, an electronic device is provided, the electronic device comprising: a processor and a transmission interface; and a memory for storing executable instructions of the processor; wherein the processor is configured to execute the instructions so that the electronic device implements the audio signal processing method according to the first aspect or any possible design of the first aspect.

In a sixth aspect, an electronic device is provided, the electronic device comprising: a processor and a transmission interface; and a memory for storing executable instructions of the processor; wherein the processor is configured to execute the instructions so that the electronic device implements the audio signal processing method according to the second aspect or any possible design of the second aspect.

In a seventh aspect, a computer-readable storage medium is provided. When instructions in the computer-readable storage medium are executed by a processor of an electronic device, the electronic device can execute the audio signal processing method according to the first aspect or any possible design of the first aspect.

In an eighth aspect, a computer program product is provided. When the computer program product runs on a computer, the computer executes the audio signal processing method according to the first aspect or any possible design of the first aspect.

In a ninth aspect, a computer-readable storage medium is provided. When instructions in the computer-readable storage medium are executed by a processor of an electronic device, the electronic device can execute the audio signal processing method according to the second aspect or any possible design of the second aspect.

In a tenth aspect, a computer program product is provided. When the computer program product runs on a computer, the computer executes the audio signal processing method according to any one of the second aspect and the second aspect.

It can be understood that any audio signal processing device, electronic device, computer-readable storage medium, and computer program product provided above can be used to execute the corresponding method provided above. Therefore, for the beneficial effects that can be achieved, refer to the beneficial effects of the corresponding method provided above, which are not repeated here.

Description of the drawings

FIG. 1 is a schematic diagram of an application scenario of an audio signal processing method provided by an embodiment of this application;

FIG. 2 is a schematic flowchart of an audio signal processing method provided by an embodiment of this application;

FIG. 3 is a schematic diagram of the processing process of an audio signal processing method provided by an embodiment of the application;

FIG. 4 is a schematic diagram of an audio signal encoding frame provided by an embodiment of the application;

FIG. 5 is a schematic flowchart of another audio signal processing method provided by an embodiment of the application;

FIG. 6 is a schematic diagram of an audio signal processing device provided by an embodiment of the application;

FIG. 7 is a schematic diagram of another audio signal processing device provided by an embodiment of the application;

FIG. 8 is a schematic structural diagram of an electronic device provided by an embodiment of the application.

Detailed description

Hereinafter, the terms "first" and "second" are only used for descriptive purposes, and cannot be understood as indicating or implying relative importance or implicitly indicating the number of indicated technical features. Therefore, the features defined with "first" and "second" may explicitly or implicitly include one or more of these features. In the description of the present embodiment, unless otherwise specified, "plurality" means two or more.

It should be noted that in this application, words such as "exemplary" or "for example" are used to present examples, illustrations, or explanations. Any embodiment or design solution described as "exemplary" or "for example" in this application should not be construed as being more preferable or advantageous than other embodiments or design solutions. Rather, words such as "exemplary" or "for example" are used to present related concepts in a specific manner.

The technical solutions in the embodiments of the present application will be clearly and completely described below in conjunction with the drawings in the embodiments of the present application. Obviously, the described embodiments are only a part of the embodiments of the present application, rather than all the embodiments. Based on the embodiments in this application, all other embodiments obtained by a person of ordinary skill in the art without creative work shall fall within the protection scope of this application.

First, the implementation environment and application scenarios of the embodiments of the present application are briefly introduced.

The embodiments of the present application provide an audio signal processing method and device, which can be applied to the transmission of audio signals between multiple electronic devices. To address the different audio signal processing requirements of different applications, audio signals are flexibly encoded and decoded based on basic frames and extended frames, so that audio processing with different delay requirements or different quality requirements can be satisfied. This solves the prior-art problems of repeated transmission and wasted bandwidth resources that arise when the same audio signal is transmitted between multiple electronic devices and different audio applications have different requirements for the real-time performance and restoration quality of the transmission.

As shown in FIG. 1, the audio signal processing method provided by the embodiment of the present application can be applied to a system that includes at least two electronic devices with audio signal processing capability, between which data can be transmitted. For example, the audio signal can be transmitted through a wired network, a wireless local area network, Near Field Communication (NFC), or Bluetooth.

Specifically, the electronic device can be a mobile phone, a smart speaker, a smart headset, a tablet computer, a desktop, a laptop, a handheld computer, a notebook computer, a vehicle-mounted device, an ultra-mobile personal computer (UMPC), a netbook, a cellular phone, a personal digital assistant (PDA), an augmented reality (AR)/virtual reality (VR) device, and so on. The embodiments of the present disclosure do not specifically limit the form of the electronic device. Exemplarily, as shown in FIG. 1, the electronic device 1 may be a mobile phone, and the electronic device 2 may be a smart headset.

The embodiment of the application provides an audio signal processing method, which is applied to a first device and a second device. As shown in Figure 2, the method may include:

S201: The first device performs sampling and quantization processing on the acquired first audio signal to obtain a second audio signal.

The first audio signal may be an audio signal collected by the first device, an audio signal stored locally by the first device, or an audio signal received from another device or apparatus.

If the first device needs to send the first audio signal to the second device in response to an audio request of the second device, the first audio signal needs to be sampled and quantized to obtain a digital signal, which saves transmission bandwidth. The basic processing procedure is shown in FIG. 3: after the first audio signal is sampled and quantized, the second audio signal s(n) is obtained, where n indexes the audio sampling points in chronological order. If the audio signal is sampled at a frequency of 16 kHz, that is, 16×10³ sampling points per second, then the time interval between every two adjacent sampling points is 0.0625 ms.

Next, the quantized value corresponding to each sampling point of the audio signal is encoded into a binary digital signal, which can then be transmitted. Different quantization precisions can be used to represent the quantized value of a sampling point; for example, it can be represented by 16 bits, 24 bits, or 32 bits.
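The sampling and quantization step can be illustrated with a minimal Python sketch. It reproduces the 0.0625 ms sampling interval from the 16 kHz example and quantizes a test tone to 16-bit codes. The helper names, the sine test tone, and the uniform quantizer with clipping are assumptions made for illustration; the application does not specify a particular quantizer.

```python
import math

SAMPLE_RATE_HZ = 16_000  # sampling rate used in the example above


def sample_interval_ms(sample_rate_hz: int) -> float:
    """Time between two adjacent sampling points, in milliseconds."""
    return 1000.0 / sample_rate_hz


def quantize(x: float, bits: int = 16) -> int:
    """Uniformly quantize x in [-1.0, 1.0] to a signed integer code.

    Assumption: a plain uniform quantizer with clipping; the source only
    says that 16, 24, or 32 bits may be used per sampling point.
    """
    max_code = 2 ** (bits - 1) - 1
    clipped = max(-1.0, min(1.0, x))
    return round(clipped * max_code)


def sample_and_quantize(tone_hz: float, duration_ms: float, bits: int = 16) -> list[int]:
    """Sample a unit-amplitude sine tone (hypothetical input) and quantize it."""
    count = round(SAMPLE_RATE_HZ * duration_ms / 1000)
    return [
        quantize(math.sin(2 * math.pi * tone_hz * n / SAMPLE_RATE_HZ), bits)
        for n in range(count)
    ]
```

For instance, `sample_and_quantize(1000.0, 5.0)` yields the quantized codes for 5 ms of a 1 kHz tone, one code per sampling point.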

S202: The first device encodes the second audio signal frame by frame using the first encoding method in units of the first duration to obtain a basic frame, and encodes the second audio signal frame by frame using the second encoding method in units of the second duration to obtain an extended frame.

Wherein, the second duration is greater than the first duration, and therefore, the frame length of the extended frame is greater than the frame length of the basic frame.

During encoding and compression, the second audio signal can be divided into frames of a fixed duration: each time one frame of the second audio signal has been collected and quantized, that frame is compressed and encoded, and the encoded frames are sent one by one. In this application, the second audio signal is encoded at different time intervals, that is, with different frame lengths, to generate two or more kinds of encoded frames, including a basic frame and an extended frame.

It should be noted that, given the above encoding principle and sampling rate, current audio coding technology can only approximate the original natural audio signal; that is, the encoding and decoding rules mean that digital encoding and decoding always introduce a certain degree of distortion and cannot completely restore the original audio signal. The encoding methods involved in this application are lossy encoding technologies.

Therefore, the basic frame or the extended frame in the embodiment of the present application can only encode a part of the first audio signal, but not all of it. Specifically, the extended frame may be obtained by re-encoding the second audio signal segments corresponding to multiple basic frames, and the extended frame may further encode audio signals in the basic frame that are not encoded or have insufficient encoding precision.

Specifically, the first encoding method and the second encoding method may respectively encode different signals carried in the second audio signal. For example, the low-frequency signal part carried in the second audio signal is encoded according to the first encoding method to obtain a basic frame, and the high-frequency signal part carried in the second audio signal is encoded according to the second encoding method to obtain an extended frame.

In addition, the first encoding method and the second encoding method can also encode the second audio signal to different degrees, obtaining an encoded frame with lower encoding quality and an encoded frame with higher encoding quality, which are then transmitted to the decoding side for decoding. The decoding side can therefore recover different audio signals from the basic frame alone or from the extended frame combined with the basic frame. The audio signal recovered from the extended frame combined with the basic frame has less distortion relative to the original audio signal, so its encoding quality is better.

It can be seen that, in general, the longer the frame length used to encode the second audio signal, the higher the compression rate of the first audio signal and the higher the delay of sending the signal; at the same bit rate, the encoding quality of the audio signal is also better. Here, the encoding quality of the audio signal refers to how faithfully the audio signal recovered after decoding reproduces the original audio signal before encoding and compression. That is, the longer the frame length used to encode the second audio signal, the higher the fidelity and the lower the distortion of the decoded audio signal relative to the original audio signal.

In the embodiment of the present application, the basic frame may be a lower delay and/or lower quality encoding of the current second audio signal, and the first device may separately transmit the basic frame to the second device frame by frame. In this way, after the second device receives the basic frame frame by frame, the audio signal can be obtained by decoding according to a preset decoding mode, so as to be applied to audio applications that require low delay or relatively low audio quality.

The extended frame may be a higher-delay and/or higher-quality encoding of the current second audio signal. The frame length of the extended frame is greater than that of the basic frame; the extended frame carries enhancement information for the audio signals of multiple basic frames and further encodes data in the audio signal that is not included, or is incompletely encoded, in the basic frames. In this way, after receiving the extended frames frame by frame, the second device can decode them jointly with the basic frames to obtain an audio signal with higher audio quality, which can be applied to audio applications that do not require high real-time performance but require relatively high audio quality.

In an implementation manner, the first device may encode the second audio signal in units of a first duration to obtain a basic frame, and encode the second audio signal in units of a second duration to obtain an extended frame. The second duration may be N times the first duration, where N is a natural number greater than or equal to 2. The first duration is the frame length of the basic frame, that is, the time interval between two adjacent basic frames, and the second duration is the frame length of the extended frame, that is, the time interval between two adjacent extended frames.

Taking Figure 4 as an example, t1, t2, t3, t4, t5, t6, t7, and t8 represent the basic frames of audio coding. The algorithmic delay of the basic frame is about Δt, that is, the time interval between two basic frames is Δt. T1 and T2 represent the extended frames of audio coding. In Figure 4, extended frame compression is performed once every four basic frames as an example. The algorithmic delay of the extended frame is ΔT, that is, the time interval between two extended frames is ΔT, where ΔT=4×Δt, that is, N=4. The basic frame or the extended frame contains the digitized audio sample data.

Exemplarily, the time delay Δt may be 0.5 ms or 5 ms, and the time delay Δt and ΔT depend on the design of the coding structure and actual application requirements. For example, when the sampling frequency is 16kHz and the frame length of the basic frame is 5ms, the number of audio sampling points contained in each basic frame is 80.
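The relationship between frame length, sample count, and the N basic frames spanned by one extended frame can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, frame indices are 0-based, and the mapping of an extended frame onto N consecutive basic frames is an assumption based on the ΔT=N×Δt relationship stated above.

```python
def samples_per_frame(sample_rate_hz: int, frame_ms: float) -> int:
    """Number of sampling points in one frame of the given duration."""
    return round(sample_rate_hz * frame_ms / 1000)


def split_into_frames(signal: list, frame_len: int) -> list:
    """Split a sample sequence into consecutive full frames.

    A trailing partial frame is dropped here for simplicity; a real encoder
    would typically buffer it until enough samples arrive.
    """
    usable = len(signal) - len(signal) % frame_len
    return [signal[i:i + frame_len] for i in range(0, usable, frame_len)]


def basic_frames_covered(extended_index: int, n_ratio: int) -> list:
    """0-based indices of the basic frames spanned by one extended frame,
    assuming each extended frame covers n_ratio consecutive basic frames."""
    start = extended_index * n_ratio
    return list(range(start, start + n_ratio))
```

With N=4 as in Figure 4, `basic_frames_covered(0, 4)` returns `[0, 1, 2, 3]`, which under the stated assumption corresponds to t1 to t4 for the extended frame T1.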

S203: The first device sends the basic frame and the extended frame to the second device.

The first device may transmit the basic frame to the second device frame by frame after encoding the basic frame, and the first device may transmit the extended frame to the second device frame by frame after encoding the extended frame. Therefore, after receiving the basic frame or the extended frame, the second device decodes the basic frame or the extended frame to recover the audio signal, which is used for different audio applications.

According to the above encoding method provided by the embodiment of the present application, the second device receives the digital signal sent from the first device, where the digital signal includes a basic frame or an extended frame, and the second device can decode it according to a preset encoding and decoding method to restore the audio signal. As shown in Figure 5, the specific process may include:

S501: The second device receives the basic frame and the extended frame sent from the first device, where the frame length of the extended frame is greater than the frame length of the basic frame, and the extended frame is obtained by re-encoding audio signals corresponding to multiple basic frames.

S502: The second device decodes the basic frame to obtain the basic audio signal, or jointly decodes the basic frame and the extended frame to obtain the extended audio signal.

The second device decodes the received basic frame or extended frame according to the preset codec rules, that is, the second device decodes the digital signal to obtain an analog signal, so as to meet the audio signal requirements of different audio applications on the second device.

Further, after receiving the basic frame, the second device decodes the basic frame to obtain the corresponding basic audio signal s₁(n). After receiving the extended frame, the second device performs joint decoding according to the extended frame and the basic frame to obtain the corresponding extended audio signal s₂(n).

The audio content of the basic audio signal s₁(n) and the extended audio signal s₂(n) is the same, but their transmission delay and audio quality differ. The audio quality of the basic audio signal s₁(n) is slightly worse than that of the extended audio signal s₂(n), and the transmission delay of the basic audio signal s₁(n) is lower than that of the extended audio signal s₂(n).

Through the above implementations of this application, audio applications with different delay requirements can be served between the encoding side and the decoding side using a single encoding scheme. That is, the encoding side obtains only one audio signal but encodes it into basic frames and extended frames to meet the different delay requirements, so that the decoding side can decode different audio signals from the two kinds of encoded frames to meet the needs of different audio applications. The audio signal decoded from the basic frame has a low delay but poorer quality. The audio signal decoded from the extended frame combined with the basic frame has a longer delay but better quality and less distortion relative to the original audio signal. Therefore, the decoding side can recover two or more audio signals from the basic frames and extended frames, while only one audio signal is encoded. This encoding method reduces redundant information, avoids the repeated transmission and wasted bandwidth resources that result when the encoding side encodes the same audio signal separately for each application, and greatly reduces system overhead.

Next, by enumerating several preferred encoding and decoding implementation manners, such as manner one, manner two, manner three, and manner four, the encoding and decoding manners and processes in the above-mentioned technical solutions of the present application will be described in detail. The following several implementations are not all possible implementations of this application, but are only exemplary implementations.

Manner one,

1. Encoding process on the encoding side:

In a possible implementation manner, the first device may use a time-domain encoding method with a lower delay to obtain the basic frame, that is, only encode the low frequency part of the second audio signal. The first device uses a higher time-delay frequency domain coding method to obtain the extended frame, and the extended frame only includes the high frequency part of the second audio signal.

As an example application scenario, there are two different audio applications on the second device. One is a device calibration and positioning application, which requires strong real-time performance (the signal transmission delay must not exceed 1 ms) but does not require high audio quality: the audio signal need not contain high-frequency signals, only low-frequency signals. The other is a voice enhancement application, which has looser real-time requirements (the signal transmission delay must not exceed 6 ms) but requires relatively high audio quality, with both high-frequency and low-frequency signals.

Then, in the above step S202, the encoding of the basic frame by the first device may specifically include:

(1) The first device down-samples the second audio signal to obtain the low-frequency signal included in the second audio signal.

Down-sampling means taking one sample out of every several samples of a sample sequence to obtain a new sequence. For example, if the sampling rate for sampling the first audio signal is 16 kHz, the bandwidth of the second audio signal obtained by quantization may be half of the sampling rate, that is, 8 kHz. For example, the second audio signal covers the frequency band 0-8 kHz, where the low-frequency signal s_L(n) is the 0-4 kHz part and the high-frequency signal s_H(n) is the 4-8 kHz part. The second audio signal is then down-sampled by a factor of two to obtain the low-frequency signal s_L(n) of the second audio signal, which covers 0-4 kHz.
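A minimal sketch of down-sampling by a factor of two is given below. The two-tap averaging used as a low-pass step before decimation is an assumption added for illustration; the application does not state which anti-aliasing filter, if any, is used, and a practical codec would normally use a sharper low-pass filter.

```python
def downsample_by_two(signal: list[float]) -> list[float]:
    """Halve the sampling rate of `signal` (for example 16 kHz to 8 kHz).

    Step 1 (assumption): a crude two-tap moving average attenuates content
    near the original Nyquist frequency before samples are discarded.
    Step 2: keep every second sample, as described in the text above.
    """
    if not signal:
        return []
    smoothed = [signal[0]] + [
        (signal[i - 1] + signal[i]) / 2 for i in range(1, len(signal))
    ]
    return smoothed[::2]
```

An 80-sample frame at 16 kHz becomes a 40-sample frame at 8 kHz, which can represent the 0-4 kHz band.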

(2) Encode the low-frequency signal with the first duration as a unit according to the time-domain encoding method to obtain multiple basic frames.

Time-domain coding encodes the waveform of the audio signal. Time-domain coding standards include International Telecommunication Union (ITU) G.726, G.723.1, and G.728. Many such coding standards use code-excited linear prediction technology, which is based on modeling the human speech production mechanism and uses the inherent characteristics of the human glottis and vocal tract to remove redundant information from audio signals, thereby maintaining high audio quality while greatly reducing the bit rate required for audio coding.

Exemplarily, the first device may use the G.726 encoding method to encode s_L(n) and assemble basic frames at intervals of the first duration, so that the frame length of the basic frames is the first duration. For example, the first duration may be 0.5 ms: each 0.5 ms segment of s_L(n) is encoded in turn, and the resulting digital signal is a basic frame. G.726 is a speech coding and decoding algorithm that can encode audio signals into digital signals with low delay.

Further, in the foregoing step S202, the encoding of the extended frame by the first device may specifically include:

(1) Perform frequency domain transformation on the second audio signal by using the second duration as a unit to obtain frequency domain coefficients corresponding to the second audio signal.

The principle of frequency domain coding is to encode audio signals in the frequency domain by exploiting how the human ear perceives sound: the frequency bands that listeners attend to are coded with emphasis, while frequency bands that are masked by other bands or are not easily perceived are coarsely quantized or not quantized at all. The advantage of frequency domain coding is that a certain amount of redundancy is removed according to the characteristics of the human ear. As a result, the coding quality is nearly the same across various kinds of audio signals, and for music and similar signals in particular it is higher than that of time domain coding.

Specifically, Modified Discrete Cosine Transform (MDCT) may be performed on the second audio signal to obtain MDCT frequency domain coefficients corresponding to the second audio signal. Among them, the MDCT transform is an algorithm that transforms the signal from the time domain to the frequency domain, and the obtained coefficients represent the frequency domain components of each frequency point.

The transformation formula for transforming the time domain signal s(n) to MDCT frequency domain coefficient S(k) is as follows:

The MDCT coefficient S(k) is obtained, and S(k) is the frequency domain part of the second audio signal.

Exemplarily, if the second duration is 5 ms, that is, the frame length for encoding the extended frame is 5 ms, and the sampling rate is 16 kHz, then s(n) includes 80 sampling points, that is, N=80, and the sampling point index n ranges from 0 to 79. The MDCT is performed on each 5 ms segment of s(n) in turn to obtain the corresponding MDCT coefficients. The value range of k can be 0-79. The coefficient index k starts from 0 and runs from low frequency to high frequency. The low-frequency frequency domain coefficients, from low to high, are therefore S(0)～S(39), and the high-frequency frequency domain coefficients, from low to high, are S(40)～S(79).

(2) The multiple frequency domain coefficients in the high-frequency part of the frequency domain coefficients corresponding to the second audio signal are divided evenly into groups in order from low frequency to high frequency, the group envelope values of the multiple high-frequency groups are obtained, and these values are encoded using envelope encoding.

Exemplarily, the above 40 high frequency frequency domain coefficients S(40) to S(79) are equally divided into 8 groups, and each group of high frequency groups includes five high frequency frequency domain coefficients, and the specific groups are as follows:

Group 1 contains high frequency frequency domain coefficients: S(40)～S(44);

Group 2 contains high frequency frequency domain coefficients: S(45)～S(49);

Group 3 contains high frequency frequency domain coefficients: S(50)～S(54);

Group 4 contains high frequency frequency domain coefficients: S(55)～S(59);

Group 5 contains high frequency frequency domain coefficients: S(60)～S(64);

Group 6 contains high frequency frequency domain coefficients: S(65)～S(69);

Group 7 contains high frequency frequency domain coefficients: S(70)～S(74);

Group 8 contains high frequency frequency domain coefficients: S(75)～S(79).

Next, the group envelope values of the multiple high-frequency groups are obtained, where the group envelope value is the average value of the multiple high-frequency frequency domain coefficients in each group. The first device can obtain the group envelope value of each group of the high-frequency part of the second audio signal, and then encode according to the group envelope value to obtain multiple extended frames with the second duration as the frame length.

Exemplarily, the calculation of the group envelope value may specifically be:

Group 1 envelope value: S_HE(0)=[S(40)+S(41)+S(42)+S(43)+S(44)]/5;

Group 2 envelope value: S_HE(1)=[S(45)+S(46)+S(47)+S(48)+S(49)]/5;

Group 3 envelope value: S_HE(2)=[S(50)+S(51)+S(52)+S(53)+S(54)]/5;

Group 4 envelope value: S_HE(3)=[S(55)+S(56)+S(57)+S(58)+S(59)]/5;

Group 5 envelope value: S_HE(4)=[S(60)+S(61)+S(62)+S(63)+S(64)]/5;

Group 6 envelope value: S_HE(5)=[S(65)+S(66)+S(67)+S(68)+S(69)]/5;

Group 7 envelope value: S_HE(6)=[S(70)+S(71)+S(72)+S(73)+S(74)]/5;

Group 8 envelope value: S_HE(7)=[S(75)+S(76)+S(77)+S(78)+S(79)]/5.
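The eight envelope formulas above follow one pattern, which can be sketched in Python as a loop over groups. The function name and default parameters are illustrative assumptions; the input is taken to be the list of 80 frequency domain coefficients S(0) to S(79) of one extended frame.

```python
def group_envelopes(coeffs: list[float], start: int = 40,
                    group_size: int = 5, groups: int = 8) -> list[float]:
    """Return the group envelope values S_HE(0)..S_HE(groups-1).

    Each envelope value is the arithmetic mean of `group_size` consecutive
    coefficients, beginning at index `start` (S(40) in the example above).
    """
    if len(coeffs) < start + group_size * groups:
        raise ValueError("not enough coefficients for the requested grouping")
    envelopes = []
    for g in range(groups):
        lo = start + g * group_size
        group = coeffs[lo:lo + group_size]
        envelopes.append(sum(group) / group_size)
    return envelopes
```

With the defaults, the 40 high-frequency coefficients of one frame are reduced to 8 values, so the extended frame carries one value per group rather than one per coefficient.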

Taking the second duration as the frame length, the first device may digitally encode the group envelope values of the multiple high-frequency groups obtained above, and send them to the second device frame by frame. For example, every 5 ms, the first device encodes the obtained S_HE(0) to S_HE(7) and assembles them into an extended frame, which it sends to the second device.

2. Decoding process on the decoding side:

Based on the above encoding method, the second device receives a basic frame at regular intervals and decodes it according to the time-domain decoding method to obtain the basic audio signal, which contains only the low-frequency part of the original audio signal on the encoding side.

The second device receives an extended frame at regular intervals, and the extended frame contains only the high-frequency part of the original audio signal. The second device combines the extended frame with the basic frames for joint decoding to obtain the extended audio signal. The extended audio signal includes not only a low-frequency part but also a high-frequency part.

Taking the foregoing embodiment as an example, the second device can receive a basic frame every 0.5 ms, and then decode the basic frame according to the G.726 decoding mode to obtain the basic audio signal s₁(n). The basic audio signal s₁(n) has only a low-frequency part, but its delay is as low as 0.5 ms. Therefore, the audio signal can be applied to audio applications that require low latency, such as device calibration and positioning applications.

If the extended frame received by the second device includes the group envelope values of multiple groups of the high-frequency signal, multiple high-frequency frequency domain coefficients of the high-frequency signal are obtained according to these group envelope values; that is, each frequency domain coefficient of the high-frequency signal is set to the group envelope value of the group to which it belongs. In addition, the basic audio signal is up-sampled to obtain a third audio signal, and the third audio signal is frequency-domain transformed frame by frame to obtain the multiple low-frequency frequency domain coefficients of the low-frequency signal corresponding to the third audio signal. The audio signal that the second device then recovers from the multiple high-frequency frequency domain coefficients and the multiple low-frequency frequency domain coefficients is the extended audio signal.

Exemplarily, the second device may receive an extended frame every 5 ms, and obtain the group envelope values S_HE(0) to S_HE(7) of the high-frequency part of the audio signal from the extended frame. Multiple high-frequency frequency domain coefficients can be obtained from the group envelope values; that is, each high-frequency frequency domain coefficient of the audio signal is set equal to the group envelope value of the group to which it belongs, namely:

S(40)=S(41)=S(42)=S(43)=S(44)=S_HE(0);

S(45)=S(46)=S(47)=S(48)=S(49)=S_HE(1);

S(50)=S(51)=S(52)=S(53)=S(54)=S_HE(2);

S(55)=S(56)=S(57)=S(58)=S(59)=S_HE(3);

S(60)=S(61)=S(62)=S(63)=S(64)=S_HE(4);

S(65)=S(66)=S(67)=S(68)=S(69)=S_HE(5);

S(70)=S(71)=S(72)=S(73)=S(74)=S_HE(6);

S(75)=S(76)=S(77)=S(78)=S(79)=S_HE(7), that is, S(40)～S(79) can be obtained.
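On the decoding side, the assignments above amount to repeating each received envelope value once per coefficient in its group. A minimal sketch, with a hypothetical function name and the group size of 5 taken from the example:

```python
def expand_envelopes(envelopes: list[float], group_size: int = 5) -> list[float]:
    """Rebuild high-frequency coefficients from group envelope values.

    Every coefficient in a group is set to that group's envelope value, so
    8 envelope values with group_size 5 yield the 40 values S(40)..S(79).
    """
    return [value for value in envelopes for _ in range(group_size)]
```

The reconstruction is approximate by design: the variation of the coefficients within a group is not transmitted, only their mean.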

Take the audio signal recovered from the basic frames received within the second duration, for example the audio signal s₁(n) obtained by decoding the multiple basic frames within 5 ms, and up-sample the audio signal s₁(n) to obtain the third audio signal s′_L(n). The up-sampling process inserts one or more zero-valued points between every two adjacent points of the original signal. Illustratively, up-sampling the above audio signal s₁(n) yields a third audio signal s′_L(n) with a bandwidth of 8 kHz and a sampling rate of 16 kHz, but the high-frequency part of the third audio signal s′_L(n) is still 0.
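The zero-insertion step of up-sampling can be sketched as follows. The function name is hypothetical, and only the zero insertion described in the text is shown; any interpolation filtering that a practical up-sampler might apply afterwards is outside this sketch and is not specified in the source.

```python
def upsample_zero_insert(signal: list[float], factor: int = 2) -> list[float]:
    """Raise the sampling rate by `factor` by inserting zero-valued points.

    After each original sample, (factor - 1) zeros are inserted, so a
    40-sample frame at 8 kHz becomes an 80-sample frame at 16 kHz.
    """
    if factor < 1:
        raise ValueError("factor must be at least 1")
    out = []
    for x in signal:
        out.append(x)
        out.extend([0.0] * (factor - 1))
    return out
```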

The frequency domain coefficients S′_L(k) can be obtained by performing the MDCT on the low-frequency audio signal s′_L(n) according to the following formula:

An audio signal segment with a sampling rate of 16 kHz and a duration of 5 ms has 80 sampling points, that is, N=80 in the above formula. The low-frequency coefficients of S′_L(k) are combined with the high-frequency coefficients S(40)-S(79) obtained from the extended frame in the above steps to obtain the complete MDCT coefficients S(k) of the audio frame, where S(k)=S′_L(k) for k=0-39.
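The merging of the two coefficient sets can be sketched as below. The function name and argument layout are assumptions: `low_band_transform` stands for the coefficients S′_L(k) computed from the up-sampled basic audio signal, and `high_envelopes` for the eight group envelope values taken from the extended frame. The transform itself is not implemented here.

```python
def merge_coefficients(low_band_transform: list[float],
                       high_envelopes: list[float],
                       split: int = 40, group_size: int = 5) -> list[float]:
    """Assemble the full coefficient vector S(0)..S(79) for one frame.

    S(k) = S'_L(k) for k below `split`; for k at or above `split`, S(k) is
    the envelope value of the group that coefficient k belongs to.
    """
    if len(low_band_transform) < split:
        raise ValueError("need at least `split` low-band coefficients")
    low = list(low_band_transform[:split])
    high = [value for value in high_envelopes for _ in range(group_size)]
    return low + high
```

The resulting list is what the frequency domain inverse transform would take as input to produce the extended audio signal.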

The inverse modified discrete cosine transform is performed on S(k) to obtain the extended audio signal s₂(n), which includes both high-frequency and low-frequency components. The specific formula of the inverse modified discrete cosine transform is as follows:

Among the audio signals obtained by decoding according to the above embodiment, the audio signal s₁(n) decoded from the basic frame has only low-frequency components and lower decoding quality, but it has a low delay, so it can be used for audio service applications that do not require high audio quality but require low audio delay. The audio signal s₂(n) obtained by joint decoding of the extended frame and the basic frame has both high-frequency and low-frequency components and higher decoding quality, but a longer delay. It can therefore be used for audio service applications that require higher audio quality but do not require highly real-time audio transmission.

In the above implementation of the present application, one audio signal is transmitted through a single codec solution, and the different audio signals obtained by decoding can be applied to different audio applications, thereby avoiding repeated encoding, decoding, and transmission, largely avoiding wasted bandwidth resources, and reducing system overhead.

Further, when encoding and decoding are performed according to the foregoing embodiment, if a basic frame is lost or not received by the device on the decoding side, so that the audio signal cannot be recovered by decoding the basic frame, the device on the decoding side may decode according to the extended frame alone: when performing the frequency domain inverse transform, the low-frequency frequency domain coefficients are set to 0, and the audio signal is recovered by performing the frequency domain inverse transform using only the frequency domain coefficients of the high-frequency part. In this case, the recovered audio signal contains only the high-frequency part.
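The fallback for a missing basic frame can be sketched as a small helper that builds the input of the inverse transform from the extended frame alone. The function name and the 40/5 split are assumptions taken from the 16 kHz, 5 ms example; the inverse transform itself is not shown.

```python
def coefficients_without_basic_frame(high_envelopes: list[float],
                                     split: int = 40,
                                     group_size: int = 5) -> list[float]:
    """Coefficient vector used when no basic frame is available.

    The low-frequency coefficients S(0)..S(split-1) are set to 0 and the
    high-frequency coefficients are rebuilt from the group envelope values.
    """
    low = [0.0] * split
    high = [value for value in high_envelopes for _ in range(group_size)]
    return low + high
```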

Method two

1. Encoding process on the encoding side:

In a possible implementation manner, the first device may use a time-domain encoding method with a lower delay to obtain the basic frame, that is, only the low-frequency part of the second audio signal is encoded. The first device uses a frequency domain coding method with a higher delay to obtain the extended frame, and the extended frame includes only the high-frequency part of the second audio signal.

For example, there are two different audio applications on the second device. One is a voice enhancement application, which requires the audio signal to be strongly real-time, with a signal delay not exceeding 6 ms, and requires both the high-frequency and low-frequency parts. The other is a three-dimensional (3D) sound field acquisition application, which requires a higher audio signal quality and tolerates a longer signal delay.

(1) The first device performs frequency domain transformation on the second audio signal, using the first duration as the frame length, to obtain frequency domain coefficients, that is, multiple low-frequency frequency domain coefficients of the low-frequency signal and multiple high-frequency frequency domain coefficients of the high-frequency signal corresponding to the second audio signal.

(2) The multiple frequency domain coefficients of the high-frequency signal are evenly grouped in order from low frequency to high frequency to obtain the group envelope values of multiple high-frequency groups, where a group envelope value is the average value of the high-frequency frequency domain coefficients in the group.

(3) Encoding is performed according to the multiple frequency domain coefficients of the low-frequency signal and the group envelope values of the high-frequency signal to obtain multiple basic frames with the first duration as the frame length.

Exemplarily, in order to meet the real-time requirement of the voice enhancement application on the second device, the first duration may be 5 ms. For example, with a sampling rate of 16 kHz, the first device can perform the MDCT on the audio signal s(n) every 5 ms to obtain the MDCT coefficients S(k), where k ranges from 0 to 79. The high-frequency frequency domain coefficients S(40) to S(79) are divided evenly, in order, into 8 groups of 5 coefficients each, and the group envelope values S_HE(0) to S_HE(7) of the high-frequency groups are obtained. The first device encodes the frequency domain coefficients S(0) to S(39) of the low-frequency signal and the group envelope values S_HE(0) to S_HE(7) of the high-frequency signal to obtain a basic frame.
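The grouping in this example can be sketched as follows. This is a minimal illustration, assuming that the 80 MDCT coefficients of one 5 ms frame are already available as a Python list; the function names `group_envelopes` and `basic_frame_payload`, the dictionary layout and the placeholder coefficient values are hypothetical, and quantization and bitstream packing are not shown.

```python
def group_envelopes(coefs, group_size=5):
    """Average consecutive groups of `group_size` coefficients."""
    if len(coefs) % group_size:
        raise ValueError("coefficient count must be a multiple of group_size")
    return [sum(coefs[i:i + group_size]) / group_size
            for i in range(0, len(coefs), group_size)]

def basic_frame_payload(S, n_low=40, group_size=5):
    """Values carried by one basic frame: S(0)..S(39) plus S_HE(0)..S_HE(7)."""
    return {
        "low": list(S[:n_low]),                               # low-frequency coefficients
        "high_env": group_envelopes(S[n_low:], group_size),   # one mean per high-frequency group
    }

S = [float(k) for k in range(80)]   # placeholder coefficients, not real audio
payload = basic_frame_payload(S)
```

In this sketch the basic frame carries 48 values (40 coefficients and 8 envelope values) instead of all 80 coefficients.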

The first device uses the second duration as a unit to encode the difference between the multiple frequency domain coefficients of the high-frequency signal and the corresponding group envelope value to obtain multiple extended frames with the second duration as the frame length.

Exemplarily, every 20 ms the first device may calculate the difference between each high-frequency frequency domain coefficient of the high-frequency part, after the basic frame encoding, and the group envelope value of the corresponding high-frequency group. Specifically, the group envelope value corresponding to each high-frequency frequency domain coefficient may be subtracted from that coefficient to obtain the group envelope coefficient difference SD_HE(k), where k = 40 to 79. The calculation can be as follows:

SD_HE(40) = S(40) - S_HE(0);

SD_HE(41) = S(41) - S_HE(0);

...

SD_HE(45) = S(45) - S_HE(1);

SD_HE(46) = S(46) - S_HE(1);

...

SD_HE(78) = S(78) - S_HE(7);

SD_HE(79) = S(79) - S_HE(7).

The first device may assemble these group envelope coefficient differences SD_HE(40) to SD_HE(79) into an extended frame every 20 ms and transmit it to the second device. The first device may directly encapsulate the group envelope coefficient differences SD_HE(40) to SD_HE(79) for transmission, or may encode them using differential quantization before transmission.
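The difference calculation above can be sketched as follows for the coefficients of a single 5 ms frame. The helper name `envelope_differences` and the placeholder values are assumptions made for illustration; how the differences of several 5 ms frames are packed into one 20 ms extended frame is not specified here and is not shown.

```python
def envelope_differences(high_coefs, envelopes, group_size=5):
    """SD_HE(k) = S(k) - S_HE(g), where g is the group that contains S(k)."""
    return [c - envelopes[i // group_size] for i, c in enumerate(high_coefs)]

high = [float(k) for k in range(40, 80)]                    # stands in for S(40)..S(79)
env = [sum(high[i:i + 5]) / 5 for i in range(0, 40, 5)]     # S_HE(0)..S_HE(7)
sd = envelope_differences(high, env)                        # SD_HE(40)..SD_HE(79)
```

Because each group envelope value is the mean of its group, the differences within each group sum to zero.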

2. Decoding process on the decoding side:

Based on the above encoding method, the second device receives a basic frame every first duration. If the basic frame includes multiple frequency domain coefficients of the low-frequency signal and multiple group envelope values of the high-frequency signal, the second device obtains multiple frequency domain coefficients of the high-frequency signal from the group envelope values of the high-frequency signal in the basic frame, and then performs the inverse frequency domain transform on the frequency domain coefficients of the low-frequency signal and of the high-frequency signal to obtain the first audio signal.

The second device receives an extended frame every second duration. If the extended frame includes the differences between the multiple frequency domain coefficients of the high-frequency signal and the corresponding group envelope values, the second device can combine these differences with the group envelope values of the high-frequency signal in the basic frame to obtain multiple frequency domain coefficients of the high-frequency signal, and then perform the inverse frequency domain transform on the frequency domain coefficients of the low-frequency signal and of the high-frequency signal to obtain the second audio signal. The second audio signal includes both a low-frequency part and a high-frequency part.

Taking the foregoing embodiment as an example, the second device can receive the basic frame every 5 ms. The second device first obtains the frequency domain coefficients of the low frequency part of S(k) according to the basic frame, that is, S(0) to S(39). The second device then obtains the high-frequency coefficients according to the high-frequency group envelope value in the basic frame, that is, each high-frequency frequency domain coefficient can be made equal to its corresponding group envelope value, namely:

S(40) = S(41) = S(42) = S(43) = S(44) = S_HE(0);

S(45) = S(46) = S(47) = S(48) = S(49) = S_HE(1);

S(50) = S(51) = S(52) = S(53) = S(54) = S_HE(2);

S(55) = S(56) = S(57) = S(58) = S(59) = S_HE(3);

S(60) = S(61) = S(62) = S(63) = S(64) = S_HE(4);

S(65) = S(66) = S(67) = S(68) = S(69) = S_HE(5);

S(70) = S(71) = S(72) = S(73) = S(74) = S_HE(6);

S(75) = S(76) = S(77) = S(78) = S(79) = S_HE(7), that is, S(40) to S(79) are obtained.

The low-frequency frequency domain coefficients S(0) to S(39) obtained by decoding the basic frame are combined with the coarse high-frequency frequency domain coefficients S(40) to S(79) obtained above, and the inverse MDCT is performed on the resulting S(0) to S(79) to obtain the basic audio signal s₁(n). The basic audio signal s₁(n) has a relatively low delay and includes both the high-frequency part and the low-frequency part of the original audio signal. However, because the high-frequency part is restored only from the group envelope values, so that all coefficients within a group have the same value, the signal quality of the high-frequency part is slightly worse, which is equivalent to reducing the frequency domain resolution of the high-frequency part.
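The spectrum seen by a decoder that has only the basic frame can be sketched as follows. This is an illustrative sketch; the function names `expand_envelopes` and `basic_frame_spectrum` and the sample values are hypothetical, and the inverse MDCT that would follow is not shown.

```python
def expand_envelopes(envelopes, group_size=5):
    """Repeat each group envelope value for every coefficient of its group."""
    return [e for e in envelopes for _ in range(group_size)]

def basic_frame_spectrum(low_coefs, high_env, group_size=5):
    """S(0)..S(79) as reconstructed from a basic frame alone."""
    return list(low_coefs) + expand_envelopes(high_env, group_size)

low = [1.0] * 40                                             # decoded S(0)..S(39)
high_env = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0]  # decoded S_HE(0)..S_HE(7)
spectrum = basic_frame_spectrum(low, high_env)
```

Every coefficient in a high-frequency group takes the same value, which is the reduced frequency domain resolution described above.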

The second device can receive the extended frame every 20 ms and obtains the group envelope coefficient differences SD_HE(40) to SD_HE(79) of the high-frequency part of the audio signal from the extended frame. The frequency domain coefficients of the high-frequency part of each basic frame are then obtained from SD_HE(40) to SD_HE(79), that is, each high-frequency frequency domain coefficient is obtained by adding the group envelope coefficient difference and the corresponding group envelope value, as shown below:

S(40) = SD_HE(40) + S_HE(0);

S(41) = SD_HE(41) + S_HE(0);

...

S(45) = SD_HE(45) + S_HE(1);

S(46) = SD_HE(46) + S_HE(1);

...

S(78) = SD_HE(78) + S_HE(7);

S(79) = SD_HE(79) + S_HE(7), that is, the complete high-frequency part S(40) to S(79) of the frequency spectrum can be obtained.

The frequency domain coefficients S(0) to S(39) of the low-frequency part obtained by decoding the basic frame are combined with S(40) to S(79), and the inverse MDCT is performed on the resulting S(0) to S(79) to obtain the extended audio signal s₂(n). The extended audio signal s₂(n) includes both the high-frequency part and the low-frequency part of the original audio signal, and its high-frequency part is restored from the group envelope values combined with the group envelope coefficient differences. The extended audio signal s₂(n) therefore has a higher restoration quality than the basic audio signal s₁(n), but a longer delay; in terms of real-time signal transmission, the basic audio signal s₁(n) is better than the extended audio signal s₂(n).
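The following sketch checks, on placeholder values, that adding the group envelope coefficient differences back to the group envelope values recovers the original high-frequency coefficients. The function name `restore_high` and the test data are assumptions for illustration, and quantization error (which a real codec would introduce) is ignored.

```python
def restore_high(differences, envelopes, group_size=5):
    """S(k) = SD_HE(k) + S_HE(g) for the group g that contains coefficient k."""
    return [d + envelopes[i // group_size] for i, d in enumerate(differences)]

# Placeholder high-band coefficients standing in for S(40)..S(79).
original = [((7 * k) % 11) - 5.0 for k in range(40, 80)]
env = [sum(original[i:i + 5]) / 5 for i in range(0, 40, 5)]   # encoder side: S_HE
diff = [c - env[i // 5] for i, c in enumerate(original)]      # encoder side: SD_HE
restored = restore_high(diff, env)                            # decoder side
```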

Method three

1. Encoding process on the encoding side:

In a possible implementation manner, when the first device needs to meet the requirements of three or more different audio applications on the second device, the first device may encode one basic frame and two or more extended frames.

Specifically, the first device may obtain the basic frame using a time-domain coding method with a lower delay and lower quality, that is, only the low-frequency part of the second audio signal is encoded. The first device obtains the first extended frame using a frequency domain coding method with a higher delay and lower quality; the first extended frame encodes only the frequency domain group envelope values of the high-frequency part of the second audio signal. The first device obtains the second extended frame using a frequency domain coding method with a higher delay and high quality; the second extended frame contains the high-frequency part of the second audio signal.

For example, there are three different audio applications on the second device. The first is a device calibration and positioning application, which requires strongly real-time audio signal processing, with a signal transmission delay interval not exceeding 1 ms; the audio signal may contain only the low-frequency signal, without the high-frequency signal. The second is a voice enhancement application, which requires strongly real-time audio signal processing, with a signal transmission delay not exceeding 6 ms, and has higher audio quality requirements: both the high-frequency part and the low-frequency part of the audio signal are required. The third is a 3D sound field acquisition application, which does not require highly real-time processing of the audio signal but requires high audio quality.

Then, in the foregoing step S202, for the encoding of the basic frame by the first device, reference may be made to the encoding of the basic frame in Method one, which may include:

(1) Down-sampling the second audio signal to obtain the low-frequency signal included in the second audio signal;

(2) Encode the low-frequency signal according to the time-domain encoding method to obtain a plurality of basic frames with the first duration as the frame length.

Exemplarily, the first device may use the G.726 encoding method to encode s_L(n) and assemble the result into a basic frame at intervals of the first duration. For example, the first duration may be 0.5 ms, which satisfies the requirements of the first audio application described above.

Further, in the foregoing step S202, for the encoding of the first extended frame by the first device, reference may be made to the encoding process of the extended frame in Method one, including:

(1) Using the second duration as the frame length, perform frequency domain transformation on the second audio signal to obtain frequency domain coefficients corresponding to the second audio signal;

Exemplarily, the first device can perform the MDCT on s(n) to obtain the MDCT frequency domain coefficients. For example, with a frame length of 5 ms and a sampling rate of 16 kHz, each frame of s(n) includes 80 sampling points, and the coefficients S(0) to S(79) are obtained. The 40 high-frequency component coefficients S(40) to S(79) are divided evenly into 8 high-frequency groups of five coefficients each, and the group envelope values S_HE(0) to S_HE(7) of the high-frequency groups are obtained, where a group envelope value is the average value of the high-frequency frequency domain coefficients in the group. The first device can digitally encode the high-frequency group envelope values S_HE(0) to S_HE(7) obtained above, and every 5 ms the first device assembles the encoded S_HE(0) to S_HE(7) into a first extended frame and sends it to the second device.

In combination with the above, for the encoding of the second extended frame in the foregoing step S202, reference may be made to the encoding process of the extended frame in Method two, including:

The first device encodes, in units of a third duration, the differences between the multiple frequency domain coefficients of the high-frequency signal and the corresponding group envelope values to obtain multiple second extended frames with the third duration as the frame length.

Exemplarily, every 20 ms the first device may calculate the difference between each high-frequency frequency domain coefficient of the high-frequency part, after the encoding of the first extended frame, and the group envelope value of the corresponding high-frequency group. Specifically, the group envelope value corresponding to each high-frequency frequency domain coefficient can be subtracted from that coefficient to obtain the group envelope coefficient differences SD_HE(40) to SD_HE(79). The first device can then assemble these group envelope coefficient differences SD_HE(40) to SD_HE(79) into a second extended frame every 20 ms and transmit it to the second device.
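The relationship between the three frame periods in this example (0.5 ms, 5 ms and 20 ms) can be sketched as a simple emission schedule. The names `frames_due`, `schedule`, `ext1` and `ext2` are hypothetical labels for this illustration, and the sketch assumes that all three frame types are aligned to a common start time, which is not stated here.

```python
BASIC_US = 500      # 0.5 ms basic frame
EXT1_US = 5_000     # 5 ms first extended frame
EXT2_US = 20_000    # 20 ms second extended frame

def frames_due(t_us):
    """Names of the frame types that complete at time t_us (microseconds)."""
    periods = (("basic", BASIC_US), ("ext1", EXT1_US), ("ext2", EXT2_US))
    return [name for name, period in periods if t_us % period == 0]

def schedule(total_us):
    """(time, frame types) for every basic-frame boundary up to total_us."""
    return [(t, frames_due(t)) for t in range(BASIC_US, total_us + 1, BASIC_US)]

plan = schedule(20_000)
counts = {name: sum(name in due for _, due in plan)
          for name in ("basic", "ext1", "ext2")}
```

Under these assumptions, one 20 ms period contains 40 basic frames, 4 first extended frames and 1 second extended frame, so each first extended frame spans 10 basic frames.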

2. Decoding process on the decoding side:

Based on the above encoding method, the second device receives a basic frame every first duration and decodes it according to the time-domain decoding method to obtain a basic audio signal, which contains only the low-frequency part of the original audio signal on the encoding side.

The second device receives a first extended frame every second duration. If the first extended frame includes the group envelope values of multiple high-frequency groups, the second device obtains multiple frequency domain coefficients of the high-frequency signal from these group envelope values, each frequency domain coefficient of the high-frequency signal being set to its corresponding group envelope value. At the same time, the first audio signal obtained by decoding the basic frame is up-sampled to obtain a third audio signal, and the frequency domain transform is performed on the third audio signal frame by frame to obtain multiple frequency domain coefficients of the low-frequency signal corresponding to the third audio signal. The inverse frequency domain transform is then performed on the frequency domain coefficients of the high-frequency signal and of the low-frequency signal to obtain the first extended audio signal. The first extended audio signal includes a low-frequency signal and a high-frequency signal, but its high-frequency quality is slightly lower and its delay is longer than that of the basic audio signal. It can therefore be used for the second audio application described above.

The second device receives a second extended frame every third duration. If the second extended frame includes the differences between the multiple frequency domain coefficients of the high-frequency signal and the corresponding group envelope values, the second device may combine these differences with the group envelope values of the high-frequency signal in the first extended frame to obtain multiple frequency domain coefficients of the high-frequency signal, and then perform the inverse frequency domain transform on the frequency domain coefficients of the low-frequency signal and of the high-frequency signal to obtain the second extended audio signal. The second extended audio signal includes both a low-frequency part and a high-frequency part.

Exemplarily, in combination with the foregoing embodiment, the second device may receive a basic frame every 0.5 ms and decode it according to the G.726 decoding method to obtain the basic audio signal s₁(n). The basic audio signal s₁(n) has only a low-frequency part, but its delay is as low as 0.5 ms. This audio signal can therefore be applied to audio applications that require a low delay, such as the aforementioned device calibration and positioning application.

The second device can receive a first extended frame every 5 ms and obtain the group envelope values S_HE(0) to S_HE(7) of the high-frequency part of the audio signal from the first extended frame; the second device can then obtain the high-frequency frequency domain coefficients S(40) to S(79) from the group envelope values. The second device up-samples the audio signal s_L(n), obtained by decoding the multiple basic frames received within 5 ms, to obtain the audio signal s′_L(n), and performs the MDCT on s′_L(n) to obtain the low-frequency frequency domain coefficients S(0) to S(39). The inverse MDCT is performed on S(0) to S(79) to obtain the first extended audio signal s₂(n). The first extended audio signal s₂(n) includes both the high-frequency part and the low-frequency part, but the quality of the high-frequency part is slightly lower.

The second device may receive a second extended frame every 20 ms and obtain the group envelope coefficient differences SD_HE(40) to SD_HE(79) of the high-frequency part of the audio signal from the second extended frame. Then, from SD_HE(40) to SD_HE(79), combined with the group envelope values S_HE(0) to S_HE(7) of the high-frequency part obtained from the first extended frame, the frequency domain coefficients S(40) to S(79) of the high-frequency part are obtained. The inverse MDCT is performed on S(0) to S(79) to obtain the second extended audio signal s₃(n) for the 20 ms period. The second extended audio signal s₃(n) includes both the high-frequency part and the low-frequency part, and its high-frequency part has a slightly better quality than that of the first extended audio signal s₂(n).
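The three decoding levels can be summarized by the following sketch, which selects the high-frequency coefficients according to which frames are available. The function name `high_band`, its parameters and the use of `None` to signal a missing layer are assumptions for illustration; the G.726 decoding, up-sampling and MDCT steps for the low-frequency part are not shown.

```python
def high_band(env=None, diff=None, group_size=5):
    """Return S(40)..S(79) for the best available layer.

    env  : S_HE(0)..S_HE(7) from a first extended frame, or None
    diff : SD_HE(40)..SD_HE(79) from a second extended frame, or None
    """
    if env is None:
        return None                                  # basic frames only: s1(n), low frequency only
    coarse = [e for e in env for _ in range(group_size)]
    if diff is None:
        return coarse                                # first extended frame: s2(n)
    return [c + d for c, d in zip(coarse, diff)]     # plus second extended frame: s3(n)

env = [1.0] * 8
diff = [0.5] * 40
```

For example, `high_band()` returns `None`, `high_band(env)` returns the coarse envelope-based coefficients, and `high_band(env, diff)` returns the refined coefficients.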

Through the foregoing implementation manners, the present application provides more possible audio coding structures, which can be applied to three or more audio applications with different requirements, thereby saving transmission bandwidth and improving system performance.

Method four

1. Encoding process on the encoding side:

In a possible implementation manner, the first device may obtain the basic frame using a time-domain coding method with a lower delay and lower quality, that is, only the low-frequency part of the second audio signal is encoded. The first device may obtain the extended frame using a frequency domain coding method with a higher delay and lower quality, encoding only the frequency domain group envelope values of the low-frequency part and the frequency domain group envelope values of the high-frequency part of the second audio signal.

Then, in the foregoing step S202, for the encoding of the basic frame by the first device, reference may be made to the encoding of the basic frame in Method one, which may include:

(1) Down-sampling the second audio signal to obtain the low-frequency signal included in the second audio signal.

Exemplarily, the first device may use the G.726 encoding method to encode s_L(n) and assemble the result into a basic frame at intervals of the first duration. For example, the first duration may be 0.5 ms.

Further, in the foregoing step S202, for the encoding of the extended frame by the first device, reference may be made to the encoding process of the extended frame in Method one, including:

(2) The multiple frequency domain coefficients in the high-frequency part of the frequency domain coefficients corresponding to the second audio signal are evenly grouped in order from low frequency to high frequency to obtain the group envelope values of multiple high-frequency groups; the multiple frequency domain coefficients of the low-frequency part are likewise evenly grouped in order from low frequency to high frequency to obtain the group envelope values of multiple low-frequency groups; the group envelope values are then encoded according to the envelope coding method.

Exemplarily, the first device can perform the MDCT on s(n) to obtain the MDCT frequency domain coefficients. For example, with a frame length of 5 ms and a sampling rate of 16 kHz, each frame of s(n) includes 80 sampling points, and the coefficients S(0) to S(79) are obtained. The 40 low-frequency component coefficients S(0) to S(39) are divided evenly into 8 low-frequency groups of five coefficients each, and the group envelope values S_LE(0) to S_LE(7) of the low-frequency groups are obtained. In addition, the 40 high-frequency component coefficients S(40) to S(79) are divided evenly into 8 high-frequency groups of five coefficients each, and the group envelope values S_HE(0) to S_HE(7) of the high-frequency groups are obtained, where a group envelope value is the average value of the frequency domain coefficients in the group. The first device can digitally encode the group envelope values S_LE(0) to S_LE(7) of the low-frequency groups and the group envelope values S_HE(0) to S_HE(7) of the high-frequency groups obtained above. Every 5 ms, the first device assembles the S_LE(0) to S_LE(7) and S_HE(0) to S_HE(7) obtained above into an extended frame and sends it to the second device.

2. Decoding process on the decoding side:

Based on the above encoding method, the second device receives a basic frame every first duration and decodes it according to the time-domain decoding method to obtain a basic audio signal (the first audio signal), which contains only the low-frequency part of the original audio signal on the encoding side.

The second device receives an extended frame every second duration. If the extended frame includes multiple group envelope values of the low-frequency signal and multiple group envelope values of the high-frequency signal, the second device obtains multiple frequency domain coefficients of the high-frequency signal from the group envelope values of the high-frequency signal, and determines the multiple frequency domain coefficients of the low-frequency signal as follows. If the second device receives the multiple basic frames normally, the frequency domain coefficients of the low-frequency signal may be determined by performing the frequency domain transform on the first audio signal obtained from the basic frames. If the second device does not receive the multiple basic frames normally, the second device can determine the frequency domain coefficients of the low-frequency signal from the group envelope values of the low-frequency signal in the extended frame, each frequency domain coefficient of the low-frequency signal being set to its corresponding group envelope value. The second device can then perform the inverse frequency domain transform on the frequency domain coefficients of the low-frequency signal and of the high-frequency signal to obtain the extended audio signal.

Exemplarily, if the second device receives the basic frames normally, for example, the second device can receive a basic frame every 0.5 ms and decode it according to the G.726 decoding method to obtain the basic audio signal s₁(n). The basic audio signal s₁(n) has only a low-frequency part, but its delay is as low as 0.5 ms.

The second device can receive an extended frame every 5 ms and obtain the group envelope values S_HE(0) to S_HE(7) of the high-frequency part of the audio signal from the extended frame; the high-frequency frequency domain coefficients S(40) to S(79) can then be obtained from the group envelope values. The second device up-samples the audio signal s_L(n), obtained by decoding the multiple basic frames received within 5 ms, to obtain the audio signal s′_L(n), and performs the MDCT on s′_L(n) to obtain the low-frequency frequency domain coefficients S(0) to S(39). The inverse MDCT is performed on S(0) to S(79) to obtain the extended audio signal s₂(n). The extended audio signal s₂(n) includes both the high-frequency part and the low-frequency part, but the quality of the high-frequency part is slightly lower.

Exemplarily, if the second device does not receive the basic frame normally, for example, the basic frame is lost or the received basic frame is verified to contain errors, the second device obtains the low-frequency frequency domain coefficients S(0) to S(39) from the group envelope values S_LE(0) to S_LE(7) of the low-frequency part obtained by decoding the extended frame, each low-frequency frequency domain coefficient being equal to the group envelope value of its low-frequency group. The second device obtains the high-frequency frequency domain coefficients S(40) to S(79) from the group envelope values S_HE(0) to S_HE(7) of the high-frequency part obtained by decoding the extended frame, each high-frequency frequency domain coefficient being equal to the group envelope value of its high-frequency group. The second device performs the inverse MDCT on S(0) to S(79) obtained by decoding the extended frame received within 5 ms to obtain the extended audio signal s₂(n), which includes both the high-frequency part and the low-frequency part.

According to the foregoing implementation manner, when the basic frame cannot be decoded to restore the audio signal normally, the device on the decoding side can still decode based on the extended frame to realize the restoration of the entire audio signal.
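The fallback described above can be sketched as follows. The function names `low_band` and `extended_spectrum`, and the convention of passing `None` when the basic frames were lost or failed verification, are assumptions made for this example; the inverse MDCT applied to the resulting spectrum is not shown.

```python
def low_band(basic_low_coefs, low_env, group_size=5):
    """Prefer S(0)..S(39) derived from basic frames; otherwise expand S_LE."""
    if basic_low_coefs is not None:
        return list(basic_low_coefs)
    return [e for e in low_env for _ in range(group_size)]

def extended_spectrum(basic_low_coefs, low_env, high_env, group_size=5):
    """S(0)..S(79) for the extended audio signal, with or without basic frames."""
    high = [e for e in high_env for _ in range(group_size)]
    return low_band(basic_low_coefs, low_env, group_size) + high

low_env = [float(g) for g in range(8)]          # decoded S_LE(0)..S_LE(7)
high_env = [float(10 + g) for g in range(8)]    # decoded S_HE(0)..S_HE(7)
precise_low = [0.25 * k for k in range(40)]     # from basic frames, when available

normal = extended_spectrum(precise_low, low_env, high_env)
fallback = extended_spectrum(None, low_env, high_env)
```

In the fallback case both halves of the spectrum have only group-level resolution, so the whole signal can still be restored, at reduced quality.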

In summary, with the above implementations provided by this application, one audio signal can be transmitted using a single set of codec schemes, and the different audio signals obtained by decoding the basic frame or the extended frame can be applied to different audio applications. This avoids repeated encoding, decoding and transmission, greatly reduces the waste of bandwidth resources, and reduces system overhead. In addition, when the basic frame is lost on the decoding side and the audio signal cannot be recovered by decoding the basic frame, the device on the decoding side can decode according to the extended frame, which further improves the reliability of audio transmission.

In another possible implementation manner, before the audio signal is encoded and decoded for transmission, the encoding side device may communicate with the decoding side device in advance, according to the encoding requirements of the audio application for the transmission of the audio signal, and negotiate a specific encoding and decoding mode. For example, for a first audio application on the second device that requires a low-delay, low-quality audio signal, the second device carries configuration information in the audio signal request information sent to the first device, and the configuration information indicates the encoding mode corresponding to the audio signal request. Alternatively, when the first device sends an encoded frame to the second device, the encoding mode of the encoded frame can be indicated by agreed bits. For example, the first device sends a basic frame of the audio signal to the second device, and the basic frame includes two pre-configured bits; for example, 01 can indicate encoding mode two. It should be understood that the foregoing codec configurations are shown only as examples and are not limited to the foregoing two types, and the embodiments of the present application do not specifically limit this.
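The agreed-bit indication can be sketched as follows. Only the mapping of 01 to encoding mode two comes from the example above; the other three bit patterns, the placement of the two bits in the most significant bits of a one-byte header, and the names `MODE_BITS`, `pack_frame` and `unpack_frame` are assumptions made purely for illustration.

```python
# 0b01 -> mode two is taken from the example; the other entries are assumed.
MODE_BITS = {1: 0b00, 2: 0b01, 3: 0b10, 4: 0b11}
BITS_TO_MODE = {bits: mode for mode, bits in MODE_BITS.items()}

def pack_frame(mode, payload: bytes) -> bytes:
    """Prefix the payload with a header byte whose top two bits carry the mode."""
    header = MODE_BITS[mode] << 6       # assumed layout: bits 7..6 of the first byte
    return bytes([header]) + payload

def unpack_frame(frame: bytes):
    """Return (mode, payload) from a frame produced by pack_frame."""
    if not frame:
        raise ValueError("empty frame")
    return BITS_TO_MODE[frame[0] >> 6], frame[1:]

frame = pack_frame(2, b"\x01\x02\x03")
mode, payload = unpack_frame(frame)
```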

The present application also provides an audio processing device, as shown in FIG. 6, the device 600 may include a preprocessing module 601, an encoding module 602, and a sending module 603.

The preprocessing module 601 may be used to perform sampling and quantization processing on the acquired first audio signal to obtain the second audio signal.

The encoding module 602 may be configured to encode the second audio signal in a first encoding mode, in units of a first duration, to obtain a basic frame, and to encode the second audio signal in a second encoding mode, in units of a second duration, to obtain an extended frame, where the second duration is greater than the first duration, and the first encoding mode and the second encoding mode respectively encode different signals carried in the second audio signal, and/or respectively encode the second audio signal to different encoding degrees.

The sending module 603 can be used to send the basic frame and the extended frame to the second device.

In a possible design manner, the encoding module 602 may be specifically used to: down-sample the second audio signal to obtain the low-frequency signal carried in the second audio signal; and encode the low-frequency signal according to the time-domain encoding method to obtain Multiple basic frames with the first duration as the frame length.

In a possible design manner, the encoding module 602 can be specifically used to: perform frequency domain transformation on the second audio signal to obtain frequency domain coefficients corresponding to the second audio signal; Part of the multiple frequency domain coefficients are averagely grouped in order from low frequency to high frequency to obtain group envelope values of multiple high frequency groups, where the group envelope value is the average value of multiple high frequency frequency domain coefficients in each group ; Perform encoding according to the group envelope value to obtain multiple extended frames with the second duration as the frame length.

In a possible design manner, the encoding module 602 may be specifically used to: perform frequency domain transformation on the second audio signal to obtain multiple frequency domain coefficients of the low frequency signal and multiple frequencies of the high frequency signal corresponding to the second audio signal. Domain coefficients; the multiple frequency domain coefficients of the high-frequency signal are averagely grouped in the order from low frequency to high frequency to obtain group envelope values of multiple high frequency groups. Among them, the group envelope value is multiple high frequencies in each group. The average value of the frequency domain coefficients; encoding according to the multiple frequency domain coefficients of the low frequency signal and the group envelope value of the high frequency signal to obtain multiple basic frames with the first time length as the frame length.

In a possible design manner, the encoding module 602 may be specifically used to encode, with the second duration as a unit, the difference between each of the multiple frequency domain coefficients of the high-frequency signal and the corresponding group envelope value, to obtain multiple extended frames with the second duration as the frame length.
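The difference that this design encodes can be sketched as follows. The values, the group size of 4, and the helper name `envelope_residuals` are assumptions for illustration; quantization and entropy coding of the differences are not shown.

```python
def envelope_residuals(coeffs, envelopes, group_size):
    """Subtract from each coefficient the envelope value of its own group."""
    if len(coeffs) != len(envelopes) * group_size:
        raise ValueError("expected group_size coefficients per envelope value")
    return [c - envelopes[i // group_size] for i, c in enumerate(coeffs)]


# Hypothetical high-frequency coefficients and their group averages.
high_band = [4.0, 2.0, 1.0, 3.0, 0.5, 1.5, 2.0, 0.0]
envelopes = [2.5, 1.0]
residuals = envelope_residuals(high_band, envelopes, 4)
```

Because each envelope value is the mean of its group, the differences within a group sum to zero. The extended frame therefore carries only the detail that the envelope values leave out.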

In a possible design manner, the encoding module 602 may be specifically used to: perform frequency domain transformation on the second audio signal to obtain multiple frequency domain coefficients of the low-frequency signal and multiple frequency domain coefficients of the high-frequency signal corresponding to the second audio signal; group the multiple frequency domain coefficients of the low-frequency signal and the multiple frequency domain coefficients of the high-frequency signal to obtain the corresponding group envelope values, where the group envelope value is the average value of the multiple frequency domain coefficients in each group; and perform encoding according to the group envelope values to obtain multiple extended frames with the second duration as the frame length.

In a possible design manner, the frequency domain transform in the foregoing embodiment may specifically be a modified discrete cosine transform (MDCT) algorithm.
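For reference, a direct evaluation of the textbook MDCT definition is sketched below. It is an assumption-laden illustration rather than the transform of the embodiment: no analysis window is applied, the block of four samples is made up, and the O(N²) summation is chosen for readability over speed.

```python
import math


def mdct(block):
    """Forward MDCT of one block of 2N samples, returning N coefficients.

    X[k] = sum_{n=0}^{2N-1} x[n] * cos(pi/N * (n + 0.5 + N/2) * (k + 0.5))
    No analysis window is applied (an assumption made for brevity).
    """
    if len(block) == 0 or len(block) % 2 != 0:
        raise ValueError("block must hold a positive, even number of samples")
    half = len(block) // 2  # N, the number of output coefficients
    return [
        sum(x * math.cos(math.pi / half * (n + 0.5 + half / 2) * (k + 0.5))
            for n, x in enumerate(block))
        for k in range(half)
    ]


# A unit impulse at n = 0 with N = 2 yields cos(3*pi/8) and cos(9*pi/8).
coeffs = mdct([1.0, 0.0, 0.0, 0.0])
```

Consecutive blocks overlap by N samples, so each block of 2N input samples contributes only N new coefficients. Practical codecs typically apply a window before the transform.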

This application also provides an audio signal processing device. As shown in FIG. 7, the device 700 includes a receiving module 701 and a decoding module 702.

The receiving module 701 may be used to receive the basic frame and the extended frame sent from the first device, where the frame length of the extended frame is greater than the frame length of the basic frame, and the extended frame is obtained by re-encoding the audio signals corresponding to multiple basic frames.
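The relation between one extended frame and the basic frames it covers can be sketched as follows, for the case where the second duration is N times the first duration. The 5 ms basic frame, N = 4, the helper name `basic_frames_for_extended`, and the assumption that extended frames are aligned to basic frame 0 are all hypothetical choices for illustration.

```python
def basic_frames_for_extended(ext_index, n):
    """Indices of the n basic frames spanned by extended frame ext_index.

    Assumption: extended frame 0 starts together with basic frame 0.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    return list(range(ext_index * n, (ext_index + 1) * n))


first_duration_ms = 5  # hypothetical basic frame length
n = 4                  # hypothetical ratio between the two durations
second_duration_ms = n * first_duration_ms
covered = basic_frames_for_extended(1, n)  # basic frames under extended frame 1
```

Under these assumptions, a receiver that needs low delay can play each basic frame as it arrives, while a receiver that needs higher quality waits for all N basic frames and the extended frame before joint decoding.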

The decoding module 702 can be used to decode a basic frame to obtain a basic audio signal; or, to jointly decode a basic frame and an extended frame to obtain an extended audio signal.

In a possible design manner, the decoding module 702 may be specifically used to decode the basic frame according to the time-domain coding and decoding manner to obtain the basic audio signal.

In a possible design manner, the decoding module 702 may be specifically used to: if the extended frame includes the group envelope values of multiple high-frequency signals, obtain multiple frequency domain coefficients of the high-frequency signal according to those group envelope values, where each frequency domain coefficient of the high-frequency signal is the group envelope value of the group to which that coefficient belongs; up-sample the basic audio signal to obtain a third audio signal; perform frequency domain transformation on the third audio signal frame by frame to obtain multiple frequency domain coefficients of the low-frequency signal corresponding to the third audio signal; and perform inverse frequency domain transformation according to the multiple frequency domain coefficients of the high-frequency signal and the multiple frequency domain coefficients of the low-frequency signal to obtain the extended audio signal.
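The step in which each high-frequency coefficient takes the envelope value of its group can be sketched as follows. The values, the group size of 2, and the helper name `expand_envelopes` are assumptions; the low-frequency coefficients are given directly here, whereas the design above obtains them by transforming the up-sampled basic audio signal.

```python
def expand_envelopes(envelopes, group_size):
    """Repeat each group envelope value once per coefficient in its group."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    return [env for env in envelopes for _ in range(group_size)]


# Hypothetical low-frequency coefficients from the up-sampled basic audio signal.
low_band = [5.0, 4.0, 3.0, 2.0]
# Hypothetical group envelope values carried in the extended frame.
high_band = expand_envelopes([2.5, 1.0], 2)
# Full spectrum, low to high, ready for the inverse frequency domain transform.
spectrum = low_band + high_band
```

The reconstructed high band is piecewise constant, so it restores the coarse spectral shape only. Finer detail requires the difference values used in the other design.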

In a possible design manner, the decoding module 702 may be specifically used to: if the basic frame includes multiple frequency domain coefficients of the low-frequency signal and multiple envelope values of the high-frequency signal, obtain the multiple frequency domain coefficients of the low-frequency signal and multiple frequency domain coefficients of the high-frequency signal according to the basic frame, where each of the multiple frequency domain coefficients of the high-frequency signal is the group envelope value corresponding to that frequency domain coefficient; and perform inverse frequency domain transformation according to the multiple frequency domain coefficients of the low-frequency signal and the multiple frequency domain coefficients of the high-frequency signal to obtain a basic audio signal.

In a possible design manner, the decoding module 702 may be specifically used to: if the extended frame includes the differences between the multiple frequency domain coefficients of the high-frequency signal and the corresponding group envelope values, obtain the multiple frequency domain coefficients of the high-frequency signal according to the multiple group envelope values of the high-frequency signal and those differences; and perform inverse frequency domain transformation according to the multiple frequency domain coefficients of the low-frequency signal and the multiple frequency domain coefficients of the high-frequency signal to obtain an extended audio signal.
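Recovering the high-frequency coefficients from the group envelope values and the differences can be sketched as follows. The values, the group size of 4, and the helper name `rebuild_high_band` are assumptions; dequantization of the received values is not shown.

```python
def rebuild_high_band(envelopes, residuals, group_size):
    """Add to each difference value the envelope value of its group."""
    if len(residuals) != len(envelopes) * group_size:
        raise ValueError("expected group_size difference values per envelope value")
    return [envelopes[i // group_size] + r for i, r in enumerate(residuals)]


# Hypothetical group envelope values already obtained by the decoder.
envelopes = [2.5, 1.0]
# Hypothetical difference values carried in the extended frame.
residuals = [1.5, -0.5, -1.5, 0.5, -0.5, 0.5, 1.0, -1.0]
high_band = rebuild_high_band(envelopes, residuals, 4)
```

In this lossless toy case the result equals the coefficients the encoder started from. With quantized differences the result would be an approximation.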

In a possible design manner, the decoding module 702 may be specifically used to: if the extended frame includes multiple group envelope values of the low-frequency signal and multiple group envelope values of the high-frequency signal, obtain multiple frequency domain coefficients of the low-frequency signal according to the multiple group envelope values of the low-frequency signal, and obtain multiple frequency domain coefficients of the high-frequency signal according to the multiple group envelope values of the high-frequency signal; where the multiple frequency domain coefficients of the low-frequency signal are determined by frequency domain transformation of the basic audio signal obtained from the basic frame, or are determined according to the multiple group envelope values of the low-frequency signal in the extended frame, in which case each frequency domain coefficient of the low-frequency signal is the group envelope value corresponding to that frequency domain coefficient; and perform inverse frequency domain transformation according to the multiple frequency domain coefficients of the low-frequency signal and the multiple frequency domain coefficients of the high-frequency signal to obtain an extended audio signal.

In a possible design manner, the inverse frequency domain transform in the foregoing embodiment may specifically be an inverse modified discrete cosine transform (IMDCT) algorithm.
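A minimal round trip through the textbook MDCT and IMDCT definitions is sketched below to show how overlapping blocks cancel the time-domain aliasing. The assumptions are: no window is applied (with a 1/N scale on the inverse, plain overlap-add then reconstructs the input), the block size of 4 samples and the test signal are made up, and the signal is zero-padded by one hop at each end so that every non-zero sample is covered by two blocks.

```python
import math


def _phase(n, k, half):
    """Cosine argument shared by the forward and inverse transforms."""
    return math.pi / half * (n + 0.5 + half / 2) * (k + 0.5)


def mdct(block):
    """2N samples -> N coefficients (unwindowed)."""
    half = len(block) // 2
    return [sum(x * math.cos(_phase(n, k, half)) for n, x in enumerate(block))
            for k in range(half)]


def imdct(coeffs):
    """N coefficients -> 2N time-aliased samples, scaled by 1/N."""
    half = len(coeffs)
    return [sum(c * math.cos(_phase(n, k, half)) for k, c in enumerate(coeffs)) / half
            for n in range(2 * half)]


def overlap_add(blocks, hop):
    """Sum consecutive output blocks, each shifted by hop samples."""
    out = [0.0] * (hop * (len(blocks) + 1))
    for b, block in enumerate(blocks):
        for n, value in enumerate(block):
            out[b * hop + n] += value
    return out


hop = 2  # N: hop size and number of coefficients per block
signal = [0.0, 0.0, 1.0, -2.0, 3.0, 0.5, -1.5, 2.0, 0.0, 0.0]
blocks = [signal[i:i + 2 * hop] for i in range(0, len(signal) - hop, hop)]
decoded = overlap_add([imdct(mdct(b)) for b in blocks], hop)
```

A decoder following the designs above would replace the `mdct` output with the coefficients rebuilt from the received frames before calling `imdct`.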

It is understandable that when the audio signal processing device is an electronic device, the sending module may be a transmitter, which may include an antenna and a radio frequency circuit, and the preprocessing module, encoding module, and decoding module may be processors, such as baseband chips. When the audio signal processing device is a component having the function of the first device or the second device, the sending module may be a radio frequency unit, and the preprocessing module, encoding module, and decoding module may be processors. When the above audio signal processing device is a chip system, the sending module may be the output interface of the chip system, and the preprocessing module, encoding module, and decoding module may be the processors of the chip system, such as a central processing unit (CPU).

It should be noted that, for the specific execution process and embodiments of the above-mentioned apparatus 600, reference may be made to the steps performed by the first device in the above method embodiment and the related descriptions; for the specific execution process and embodiments of the above-mentioned apparatus 700, reference may be made to the steps performed by the second device in the above method embodiment and the related descriptions. For the technical problems solved and the technical effects brought about, reference may also be made to the content described in the foregoing embodiments, which will not be repeated here.

In this embodiment, the audio signal processing device is presented in the form of dividing various functional modules in an integrated manner. The "module" herein may refer to a specific circuit, a processor and memory that executes one or more software or firmware programs, an integrated logic circuit, and/or other devices that can provide the above-mentioned functions. In a simple embodiment, those skilled in the art can imagine that the audio signal processing device may adopt the form shown in FIG. 8 below.

FIG. 8 is a schematic structural diagram of an exemplary electronic device 800 shown in an embodiment of the application. The electronic device 800 may be the first device or the second device in the foregoing embodiment, and is used to execute the audio signal processing method in the foregoing embodiment. As shown in FIG. 8, the electronic device 800 may include at least one processor 801, a communication line 802, and a memory 803.

The processor 801 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits.

The communication line 802 may include a path to transmit information between the above-mentioned components, and the communication line may be, for example, a bus.

The memory 803 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, or an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory may exist independently and be connected to the processor through the communication line 802. The memory may also be integrated with the processor. The memory provided by the embodiment of the present application is usually a non-volatile memory. The memory 803 is used to store the computer program instructions for executing the solutions of the embodiments of the present application, and the processor 801 controls the execution. The processor 801 is configured to execute the computer program instructions stored in the memory 803, so as to implement the method provided in the embodiment of the present application.

Optionally, the computer program instructions in the embodiments of the present application may also be referred to as application program codes, which are not specifically limited in the embodiments of the present application.

In a specific implementation, as an embodiment, the processor 801 may include one or more CPUs, such as CPU0 and CPU1 in FIG. 8.

In a specific implementation, as an embodiment, the electronic device 800 may include multiple processors, such as the processor 801 and the processor 807 in FIG. 8. Each of these processors may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor. The processor here may refer to one or more devices, circuits, and/or processing cores for processing data (for example, computer program instructions).

In a specific implementation, as an embodiment, the electronic device 800 may further include a communication interface 804. The electronic device can send and receive data through the communication interface 804, or communicate with other devices or a communication network. The communication interface 804 may be, for example, an Ethernet interface, a radio access network (RAN) interface, a wireless local area network (WLAN) interface, or a USB interface.

In a specific implementation, as an embodiment, the electronic device 800 may further include an output device 805 and an input device 806. The output device 805 communicates with the processor 801 and can display information in a variety of ways. For example, the output device 805 may be a liquid crystal display (LCD), a light emitting diode (LED) display device, a cathode ray tube (CRT) display device, or a projector. The input device 806 communicates with the processor 801, and can receive user input in a variety of ways. For example, the input device 806 may be a mouse, a keyboard, a touch screen device, a sensor device, or the like.

In a specific implementation, the electronic device 800 may be a desktop computer, a portable computer, a web server, a personal digital assistant (PDA), a mobile phone, a tablet computer, a wireless terminal device, an embedded device, a smart camera, or a device with a structure similar to that shown in FIG. 8. The embodiment of the present application does not limit the type of the electronic device 800. If it is used to implement the method of the second device in the foregoing embodiment, the electronic device 800 needs to be equipped with a smart camera.

In some embodiments, the processor 801 in FIG. 8 may invoke the computer program instructions stored in the memory 803 to cause the electronic device 800 to execute the method in the foregoing method embodiment.

Exemplarily, the function/implementation process of each processing module in FIG. 6 or FIG. 7 may be implemented by the processor 801 in FIG. 8 calling computer program instructions stored in the memory 803. For example, the function/implementation process of the preprocessing module 601 and the encoding module 602 in FIG. 6 can be implemented by the processor 801 in FIG. 8 calling computer-executable instructions stored in the memory 803. The function/implementation process of the receiving module 701 and the decoding module 702 in FIG. 7 can be implemented by the processor 801 in FIG. 8 calling computer-executable instructions stored in the memory 803.

In an exemplary embodiment, a computer-readable storage medium including instructions is also provided. The foregoing instructions can be executed by the processor 801 of the electronic device 800 to complete the audio signal processing method of the foregoing embodiment. For the technical effects that can be obtained, reference may be made to the above-mentioned method embodiments, which will not be repeated here.

The foregoing embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented using a software program, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device.

After considering the specification and practicing the invention disclosed herein, those skilled in the art will easily think of other embodiments of the present application. This application is intended to cover any variations, uses, or adaptive changes of this application. These variations, uses, or adaptive changes follow the general principles of this application and include common knowledge or customary technical means in this technical field that are not disclosed in this application.

Finally, it should be noted that the above are only specific implementations of this application, but the scope of protection of this application is not limited to this. Any change or replacement within the technical scope disclosed in this application shall be covered by the scope of protection of this application. Therefore, the protection scope of this application should be subject to the protection scope of the claims.

Claims

An audio signal processing method, characterized in that the method includes:

The first device performs sampling and quantization processing on the acquired first audio signal to obtain a second audio signal;

Encoding the second audio signal in a first encoding manner by using the first duration as a unit to obtain a basic frame;

Encoding the second audio signal in a second encoding manner with a second duration as a unit to obtain an extended frame, wherein the second duration is greater than the first duration, and the first encoding manner and the second encoding manner respectively encode different signals carried in the second audio signal, and/or encode the second audio signal to different degrees of encoding;

Sending the basic frame and the extended frame to a second device.
The method according to claim 1, wherein the second duration is N times the first duration, and N is a natural number greater than or equal to 2.
The method according to claim 1 or 2, wherein the encoding of the second audio signal in the first encoding manner, with the first duration as a unit, to obtain a basic frame specifically comprises:

Down-sampling the second audio signal to obtain the low-frequency signal carried in the second audio signal;

The low-frequency signal is encoded according to a time-domain encoding manner to obtain a plurality of the basic frames with the first time length as the frame length.
The method according to claim 3, wherein the encoding of the second audio signal in the second encoding manner, with the second duration as a unit, to obtain an extended frame specifically comprises:

Performing frequency domain transformation on the second audio signal to obtain frequency domain coefficients corresponding to the second audio signal;

Evenly grouping the multiple frequency domain coefficients of the high-frequency part of the frequency domain coefficients corresponding to the second audio signal, in order from low frequency to high frequency, to obtain group envelope values of multiple high-frequency groups, wherein the group envelope value is the average value of the multiple high-frequency frequency domain coefficients in each group;

Encoding according to the group envelope value to obtain a plurality of the extended frames with the second duration as the frame length.
The method according to claim 1 or 2, wherein the encoding of the second audio signal in the first encoding manner, with the first duration as a unit, to obtain a basic frame specifically comprises:

Performing frequency domain transformation on the second audio signal to obtain multiple frequency domain coefficients of the low frequency signal and multiple frequency domain coefficients of the high frequency signal corresponding to the second audio signal;

Evenly grouping the multiple frequency domain coefficients of the high-frequency signal, in order from low frequency to high frequency, to obtain group envelope values of multiple high-frequency groups, wherein the group envelope value is the average value of the multiple high-frequency frequency domain coefficients in each group;

Encoding is performed according to the multiple frequency domain coefficients of the low-frequency signal and the group envelope value of the high-frequency signal to obtain a plurality of the basic frames whose frame length is the first time length.
The method according to claim 4 or 5, wherein the encoding of the second audio signal in the second encoding manner, with the second duration as a unit, to obtain an extended frame specifically comprises:

Using the second duration as a unit, encoding the difference between each of the multiple frequency domain coefficients of the high-frequency signal and the corresponding group envelope value to obtain a plurality of the extended frames with the second duration as the frame length.
The method according to claim 3, wherein the encoding of the second audio signal in the second encoding manner, with the second duration as a unit, to obtain an extended frame further specifically comprises:

Performing frequency domain transformation on the second audio signal to obtain multiple frequency domain coefficients of the low frequency signal and multiple frequency domain coefficients of the high frequency signal corresponding to the second audio signal;

Evenly grouping the multiple frequency domain coefficients of the low-frequency signal and the multiple frequency domain coefficients of the high-frequency signal respectively, in order from low frequency to high frequency, to obtain corresponding group envelope values, wherein the group envelope value is the average value of the multiple frequency domain coefficients in each group;

Encoding according to the group envelope value to obtain a plurality of the extended frames with the second duration as the frame length.
The method according to any one of claims 4-7, wherein performing frequency domain transformation on the second audio signal specifically comprises:

According to the modified discrete cosine transform (MDCT) algorithm, the MDCT frequency domain coefficients corresponding to the second audio signal are obtained.
An audio signal processing method, characterized in that the method includes:

The second device receives the basic frame and the extended frame sent from the first device, wherein the frame length of the extended frame is greater than the frame length of the basic frame;

Decode the basic frame to obtain a basic audio signal;

or,

Jointly decoding the basic frame and the extended frame to obtain an extended audio signal.
The method according to claim 9, wherein said decoding said basic frame to obtain a basic audio signal specifically comprises:

The basic frame is decoded according to the time-domain coding and decoding mode to obtain the basic audio signal.
The method according to claim 9 or 10, wherein the joint decoding of the basic frame and the extended frame to obtain an extended audio signal specifically comprises:

If the extended frame includes group envelope values of multiple high-frequency signals, obtaining multiple frequency domain coefficients of the high-frequency signal according to the group envelope values of the multiple high-frequency signals, wherein each frequency domain coefficient of the high-frequency signal is the group envelope value corresponding to that frequency domain coefficient;

Up-sampling the basic audio signal to obtain a third audio signal;

Performing frequency domain transformation on the third audio signal frame by frame to obtain multiple frequency domain coefficients of the low frequency signal corresponding to the third audio signal;

Perform frequency domain inverse transformation according to the multiple frequency domain coefficients of the high frequency signal and the multiple frequency domain coefficients of the low frequency signal to obtain the extended audio signal.
The method according to claim 9, wherein said decoding said basic frame to obtain a basic audio signal specifically comprises:

If the basic frame includes multiple frequency domain coefficients of the low-frequency signal and multiple envelope values of the high-frequency signal, obtaining the multiple frequency domain coefficients of the low-frequency signal and multiple frequency domain coefficients of the high-frequency signal according to the basic frame, wherein each of the multiple frequency domain coefficients of the high-frequency signal is the group envelope value corresponding to that frequency domain coefficient;

Perform frequency domain inverse transformation according to the multiple frequency domain coefficients of the low-frequency signal and the multiple frequency domain coefficients of the high-frequency signal to obtain the basic audio signal.
The method according to claim 11 or 12, wherein the joint decoding of the basic frame and the extended frame to obtain an extended audio signal specifically comprises:

If the extended frame includes the differences between the multiple frequency domain coefficients of the high-frequency signal and the corresponding group envelope values, obtaining the multiple frequency domain coefficients of the high-frequency signal according to the multiple group envelope values of the high-frequency signal and the differences between the multiple frequency domain coefficients of the high-frequency signal and the corresponding group envelope values;

Perform frequency domain inverse transformation according to the multiple frequency domain coefficients of the low-frequency signal and the multiple frequency domain coefficients of the high-frequency signal to obtain the extended audio signal.
The method according to claim 10, wherein the joint decoding of the basic frame and the extended frame to obtain an extended audio signal specifically comprises:

If the extended frame includes multiple group envelope values of the low-frequency signal and multiple group envelope values of the high-frequency signal, obtaining multiple frequency domain coefficients of the low-frequency signal according to the multiple group envelope values of the low-frequency signal, and obtaining multiple frequency domain coefficients of the high-frequency signal according to the multiple group envelope values of the high-frequency signal;

Wherein, the multiple frequency domain coefficients of the low-frequency signal are determined by frequency domain transformation of the basic audio signal obtained from the basic frame, or the multiple frequency domain coefficients of the low-frequency signal are determined according to the multiple group envelope values of the low-frequency signal in the extended frame, in which case each frequency domain coefficient of the low-frequency signal is the group envelope value corresponding to that frequency domain coefficient;

Perform frequency domain inverse transformation according to the multiple frequency domain coefficients of the low-frequency signal and the multiple frequency domain coefficients of the high-frequency signal to obtain the extended audio signal.
The method according to any one of claims 11-14, wherein performing inverse frequency domain transformation according to frequency domain coefficients specifically comprises:

According to the inverse modified discrete cosine transform (IMDCT) algorithm, the audio analog signal corresponding to the frequency domain coefficients is obtained.
The method according to any one of claims 11-15, wherein the group envelope value comprises the average value of the frequency domain coefficients in each group obtained by evenly grouping a plurality of frequency domain coefficients in order from low frequency to high frequency.
An audio signal processing device, characterized in that the device includes:

The preprocessing module is used for sampling and quantizing the acquired first audio signal to obtain the second audio signal;

An encoding module, configured to encode the second audio signal in a first encoding manner with a first duration as a unit to obtain a basic frame, and to encode the second audio signal in a second encoding manner with a second duration as a unit to obtain an extended frame, wherein the second duration is greater than the first duration, and the first encoding manner and the second encoding manner respectively encode different signals carried in the second audio signal, and/or encode the second audio signal to different degrees of encoding;

The sending module is configured to send the basic frame and the extended frame to a second device.
The device according to claim 17, wherein the second duration is N times the first duration, and N is a natural number greater than or equal to 2.
The device according to claim 18, wherein the encoding module is specifically configured to:

Down-sampling the second audio signal to obtain the low-frequency signal carried in the second audio signal;

The low-frequency signal is encoded according to a time-domain encoding manner to obtain a plurality of the basic frames with the first time length as the frame length.
The device according to claim 19, wherein the encoding module is specifically configured to:

Performing frequency domain transformation on the second audio signal to obtain frequency domain coefficients corresponding to the second audio signal;

Evenly grouping the multiple frequency domain coefficients of the high-frequency part of the frequency domain coefficients corresponding to the second audio signal, in order from low frequency to high frequency, to obtain group envelope values of multiple high-frequency groups, wherein the group envelope value is the average value of the multiple high-frequency frequency domain coefficients in each group;

Encoding according to the group envelope value to obtain a plurality of the extended frames with the second duration as the frame length.
The device according to claim 18, wherein the encoding module is specifically configured to:

Performing frequency domain transformation on the second audio signal to obtain multiple frequency domain coefficients of the low frequency signal and multiple frequency domain coefficients of the high frequency signal corresponding to the second audio signal;

Evenly grouping the multiple frequency domain coefficients of the high-frequency signal, in order from low frequency to high frequency, to obtain group envelope values of multiple high-frequency groups, wherein the group envelope value is the average value of the multiple high-frequency frequency domain coefficients in each group;

Encoding is performed according to the multiple frequency domain coefficients of the low-frequency signal and the group envelope value of the high-frequency signal to obtain a plurality of the basic frames whose frame length is the first time length.
The device according to claim 20 or 21, wherein the encoding module is specifically configured to:

Using the second duration as a unit, encoding the difference between each of the multiple frequency domain coefficients of the high-frequency signal and the corresponding group envelope value to obtain a plurality of the extended frames with the second duration as the frame length.
The device according to claim 19, wherein the encoding module is specifically configured to:

Performing frequency domain transformation on the second audio signal to obtain multiple frequency domain coefficients of the low frequency signal and multiple frequency domain coefficients of the high frequency signal corresponding to the second audio signal;

Evenly grouping the multiple frequency domain coefficients of the low-frequency signal and the multiple frequency domain coefficients of the high-frequency signal respectively, in order from low frequency to high frequency, to obtain corresponding group envelope values, wherein the group envelope value is the average value of the multiple frequency domain coefficients in each group;

Encoding according to the group envelope value to obtain a plurality of the extended frames with the second duration as the frame length.
The device according to any one of claims 20-23, wherein the frequency domain transform specifically comprises: a modified discrete cosine transform (MDCT) algorithm.
An audio signal processing device, characterized in that the device includes:

The receiving module is configured to receive a basic frame and an extended frame sent from the first device, wherein the frame length of the extended frame is greater than the frame length of the basic frame, and the extended frame is obtained by re-encoding the audio signals corresponding to a plurality of basic frames;

The decoding module is configured to decode the basic frame to obtain a basic audio signal; or jointly decode the basic frame and the extended frame to obtain an extended audio signal.
The device according to claim 25, wherein the decoding module is specifically configured to:

The basic frame is decoded according to the time-domain coding and decoding mode to obtain the basic audio signal.
The device according to claim 25 or 26, wherein the decoding module is specifically configured to:

If the extended frame includes multiple group envelope values of the high-frequency signal, obtain multiple frequency domain coefficients of the high-frequency signal according to the multiple group envelope values of the high-frequency signal, wherein each frequency domain coefficient of the high-frequency signal is the group envelope value corresponding to that frequency domain coefficient;

Up-sample the basic audio signal to obtain a third audio signal;

Perform frequency domain transformation on the third audio signal frame by frame to obtain multiple frequency domain coefficients of the low-frequency signal corresponding to the third audio signal;

Perform frequency domain inverse transformation according to the multiple frequency domain coefficients of the high frequency signal and the multiple frequency domain coefficients of the low frequency signal to obtain the extended audio signal.
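For illustration only, the step in which each high-frequency coefficient takes the value of its group envelope can be sketched as follows. The names `expand_envelope` and `assemble_spectrum`, the `group_size` parameter, and the sample values are hypothetical; the sketch assumes the low-frequency coefficients have already been obtained from the up-sampled basic audio signal and omits the up-sampling and transform steps themselves.

```python
def expand_envelope(envelope, group_size):
    """Repeat each group envelope value once per coefficient in its group."""
    return [value for value in envelope for _ in range(group_size)]


def assemble_spectrum(low_coeffs, high_envelope, group_size):
    """Place the low-frequency coefficients first, followed by the
    high-frequency coefficients rebuilt from the group envelope values."""
    return list(low_coeffs) + expand_envelope(high_envelope, group_size)


# Hypothetical inputs: low-band coefficients from the basic audio signal,
# and high-band group envelope values carried in the extended frame.
low_coeffs = [4.0, 2.0, 1.0, 3.0]
high_envelope = [1.0, 0.5]

spectrum = assemble_spectrum(low_coeffs, high_envelope, 2)
# spectrum == [4.0, 2.0, 1.0, 3.0, 1.0, 1.0, 0.5, 0.5]
```

The assembled list is what would then be passed to the frequency domain inverse transformation to produce the extended audio signal.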
The device according to claim 25, wherein the decoding module is specifically configured to:

If the basic frame includes multiple frequency domain coefficients of the low-frequency signal and multiple group envelope values of the high-frequency signal, obtain the multiple frequency domain coefficients of the low-frequency signal and multiple frequency domain coefficients of the high-frequency signal according to the basic frame, wherein the multiple frequency domain coefficients of the high-frequency signal are the group envelope values corresponding to the frequency domain coefficients;

Perform frequency domain inverse transformation according to the multiple frequency domain coefficients of the low-frequency signal and the multiple frequency domain coefficients of the high-frequency signal to obtain the basic audio signal.
The device according to claim 27 or 28, wherein the decoding module is specifically configured to:

If the extended frame includes the differences between the multiple frequency domain coefficients of the high-frequency signal and the corresponding group envelope values, obtain the multiple frequency domain coefficients of the high-frequency signal according to the multiple group envelope values of the high-frequency signal and the differences between the multiple frequency domain coefficients of the high-frequency signal and the corresponding group envelope values;

Perform frequency domain inverse transformation according to the multiple frequency domain coefficients of the low-frequency signal and the multiple frequency domain coefficients of the high-frequency signal to obtain the extended audio signal.
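For illustration only, recovering the high-frequency coefficients from the group envelope values and the transmitted differences can be sketched as follows. The names `residuals` and `reconstruct`, the `group_size` parameter, and the sample values are hypothetical, and quantization of the differences is ignored, so the round trip here is lossless.

```python
def residuals(coeffs, envelope, group_size):
    """Encoder side: difference between each coefficient and the envelope
    value of the group that coefficient belongs to."""
    return [c - envelope[i // group_size] for i, c in enumerate(coeffs)]


def reconstruct(envelope, diffs, group_size):
    """Decoder side: add each difference back to its group envelope value."""
    return [envelope[i // group_size] + d for i, d in enumerate(diffs)]


# Hypothetical high-frequency coefficients and their group envelope values
# (each envelope value is the average of a group of two coefficients).
high_coeffs = [0.5, 1.5, 0.25, 0.75]
high_envelope = [1.0, 0.5]

diffs = residuals(high_coeffs, high_envelope, 2)   # [-0.5, 0.5, -0.25, 0.25]
recovered = reconstruct(high_envelope, diffs, 2)   # [0.5, 1.5, 0.25, 0.75]
```

In this arrangement the envelope values alone give a coarse high band, and the differences refine it to the original coefficients.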
The device according to claim 26, wherein the decoding module is specifically configured to:

If the extended frame includes multiple group envelope values of the low-frequency signal and multiple group envelope values of the high-frequency signal, obtain multiple frequency domain coefficients of the low-frequency signal according to the multiple group envelope values of the low-frequency signal, and obtain multiple frequency domain coefficients of the high-frequency signal according to the multiple group envelope values of the high-frequency signal;

Wherein the multiple frequency domain coefficients of the low-frequency signal are determined by frequency domain transformation of the basic audio signal obtained from the basic frame, or are determined according to the multiple group envelope values of the low-frequency signal in the extended frame, in which case the multiple frequency domain coefficients of the low-frequency signal are the group envelope values corresponding to the frequency domain coefficients;

Perform frequency domain inverse transformation according to the multiple frequency domain coefficients of the low-frequency signal and the multiple frequency domain coefficients of the high-frequency signal to obtain the extended audio signal.
The device according to any one of claims 27-30, wherein the frequency domain inverse transformation specifically comprises: an inverse modified discrete cosine transform (IMDCT) algorithm.
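For illustration only, the forward and inverse modified discrete cosine transforms named in the claims can be sketched in their textbook direct form. This is not taken from the application: the function names `mdct` and `imdct` and the sample signal are hypothetical, no window function is applied (practical codecs normally apply one), and the direct summation is quadratic in the block length.

```python
import math


def mdct(block):
    """Direct-form MDCT: 2N time samples -> N frequency domain coefficients."""
    if len(block) % 2 != 0:
        raise ValueError("block length must be even")
    n_coeffs = len(block) // 2
    return [
        sum(
            x * math.cos(math.pi / n_coeffs * (n + 0.5 + n_coeffs / 2) * (k + 0.5))
            for n, x in enumerate(block)
        )
        for k in range(n_coeffs)
    ]


def imdct(coeffs):
    """Direct-form inverse MDCT: N coefficients -> 2N time-aliased samples."""
    n_coeffs = len(coeffs)
    return [
        sum(
            c * math.cos(math.pi / n_coeffs * (n + 0.5 + n_coeffs / 2) * (k + 0.5))
            for k, c in enumerate(coeffs)
        ) / n_coeffs
        for n in range(2 * n_coeffs)
    ]


# Two blocks of 8 samples overlapping by 4; overlap-adding the inverse
# transforms cancels the time-domain aliasing in the shared 4 samples.
signal = [0.0, 1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0, 0.0, -1.0, -2.0, -3.0]
first = imdct(mdct(signal[0:8]))
second = imdct(mdct(signal[4:12]))
middle = [first[4 + i] + second[i] for i in range(4)]
```

Up to floating-point rounding, `middle` matches `signal[4:8]`, which is why consecutive frames must overlap by half a block for the inverse transformation to recover the audio signal.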
The device according to any one of claims 27-31, wherein the group envelope value comprises the average value of the frequency domain coefficients in each group obtained by evenly grouping multiple frequency domain coefficients in order from low frequency to high frequency.
An electronic device, characterized in that, the electronic device includes:

A processor and a transmission interface;

A memory for storing executable instructions of the processor;

Wherein, the processor is configured to execute the instructions, so that the electronic device implements the audio signal processing method according to any one of claims 1 to 8.
An electronic device, characterized in that, the electronic device includes:

A processor and a transmission interface;

A memory for storing executable instructions of the processor;

Wherein, the processor is configured to execute the instructions, so that the electronic device implements the audio signal processing method according to any one of claims 9 to 16.
A computer-readable storage medium, wherein when the instructions in the computer-readable storage medium are executed by a processor of an electronic device, the electronic device is enabled to execute the audio signal processing method according to any one of claims 1 to 8.
A computer program product which, when run on a computer, causes the computer to execute the audio signal processing method according to any one of claims 1 to 8.
A computer-readable storage medium, wherein when the instructions in the computer-readable storage medium are executed by a processor of an electronic device, the electronic device is enabled to execute the audio signal processing method according to any one of claims 9 to 16.
A computer program product which, when run on a computer, causes the computer to execute the audio signal processing method according to any one of claims 9 to 16.