CN114945981A - Audio signal processing method and device - Google Patents


Info

Publication number
CN114945981A
Authority
CN
China
Prior art keywords
frequency
frequency domain
audio signal
domain coefficients
signal
Prior art date
Legal status
Pending
Application number
CN202080092744.4A
Other languages
Chinese (zh)
Inventor
张立斌
袁庭球
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Publication of CN114945981A

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis

Abstract

The application provides an audio signal processing method and device, relates to the technical field of multimedia processing, and addresses the prior-art problems of repeated transmission and wasted bandwidth resources that arise because different audio applications impose different compression-coding requirements when audio signals are transmitted among multiple electronic devices. The method comprises the following steps: a first device samples and quantizes an acquired first audio signal to obtain a second audio signal; encodes the second audio signal in a first encoding manner in units of a first duration to obtain a basic frame; encodes the second audio signal in a second encoding manner in units of a second duration to obtain an extended frame, where the second duration is greater than the first duration, and the first encoding manner and the second encoding manner respectively encode different signals carried in the second audio signal and/or encode the second audio signal to different encoding degrees; and transmits the basic frame and the extended frame to a second device.

Description

Audio signal processing method and device
Technical Field
The present application relates to the field of multimedia processing technologies, and in particular, to an audio signal processing method and apparatus.
Background
At present, as people use more electronic devices more frequently, cooperative audio signal processing among multiple electronic devices is becoming an important development trend of future audio signal processing. When audio signals are transmitted among multiple electronic devices, the electronic device serving as the transmitting end samples, quantizes and encodes the acquired audio signal, and then transmits the compressed signal to the electronic device at the receiving end. Different applications on the receiving-end electronic device may have different delay and quality requirements for the audio signal, which requires the transmitting-end electronic device to apply different compression-coding processing to the audio signal.
Fig. 1 shows a possible application scenario in which a mobile phone sends a collected audio signal to a smart headset that runs different audio applications. For example, audio application 1 is a speech enhancement application with a high requirement on the real-time performance of the received audio signal but only a moderate requirement on its transmission quality; audio application 2 is a three-dimensional sound field acquisition application with a high requirement on the transmission quality of the received audio signal but a low requirement on its time delay. With the processing in the prior art, the mobile phone has to apply different compression-coding processing to the same audio signal and transmit multiple audio streams to the smart headset; the transmission delay and quality of the different streams differ, but their content is the same audio signal collected by the mobile phone. This causes repeated transmission of the audio signal, occupying and wasting bandwidth resources.
Disclosure of Invention
The application provides an audio signal processing method and an audio signal processing device, which solve the prior-art problems of repeated transmission and wasted bandwidth resources caused by the different compression-coding requirements of different audio applications when audio signals are transmitted among multiple electronic devices.
In a first aspect, an audio signal processing method is provided, the method comprising: the first device samples and quantizes the acquired first audio signal to obtain a second audio signal; coding a second audio signal in a first coding mode by taking a first time length as a unit to obtain a basic frame, and coding the second audio signal in a second coding mode by taking a second time length as a unit to obtain an extended frame, wherein the second time length is greater than the first time length, and the first coding mode and the second coding mode respectively code different signals carried in the second audio signal and/or respectively code the second audio signal in different coding degrees; and transmitting the basic frame and the extended frame to the second device.
In the above technical solution, the sending end may encode and compress the same audio signal into two kinds of encoded frames with different frame lengths, namely a basic frame and an extended frame, where the extended frame may be obtained by encoding the part of the second audio signal that is not encoded in the basic frame, or by re-encoding, at higher precision, the part that the basic frame encodes only coarsely. The receiving end can therefore decode the basic frame alone to obtain one audio signal, and jointly decode the basic frame and the extended frame to obtain another audio signal; the two recovered audio signals have different time delays and different audio qualities and can meet the requirements of different audio applications. This avoids the repeated transmission and bandwidth waste that would result from encoding the same audio signal multiple times at the encoding side, and reduces system overhead.
In a possible design, the second duration is N times the first duration, and N is a natural number greater than or equal to 2.
In the foregoing possible implementation manner, when the first apparatus encodes the second audio signal, the time interval between basic frames is the first duration and the time interval between extended frames is N times the first duration, that is, one extended frame is encoded for every N basic frames. The encoding side thus obtains encoded frames with different time delays, from which the decoding side can recover audio signals with different time delays to meet the requirements of different audio applications, improving coding efficiency, avoiding wasted bandwidth resources and reducing system overhead.
In a possible design manner, encoding the second audio signal by the first encoding manner with the first duration as a unit to obtain the basic frame specifically includes: down-sampling the second audio signal to obtain a low-frequency signal carried in the second audio signal; and coding the low-frequency signal according to a time domain coding mode to obtain a plurality of basic frames taking the first duration as the frame length.
In the first possible implementation manner, the encoding side may encode the low-frequency signal contained in the second audio signal in a time-domain encoding manner to obtain the basic frame. Time-domain coding can encode the audio signal into a digital signal with low delay, so it is well suited to producing a basic frame that has low delay and contains only the low-frequency part of the original audio signal; the decoding side can then recover from the basic frame an audio signal with strong real-time performance and moderate audio quality for the corresponding audio application.
In a possible design, encoding a second audio signal in a second encoding manner with a second duration as a unit to obtain an extended frame specifically includes: performing frequency domain transformation on the second audio signal to obtain a frequency domain coefficient corresponding to the second audio signal; averagely grouping a plurality of frequency domain coefficients of a high-frequency part in the frequency domain coefficients corresponding to the second audio signal according to a sequence from low frequency to high frequency to obtain a plurality of high-frequency grouped group envelope values, wherein the group envelope values are average values of a plurality of high-frequency domain coefficients in each group; and coding according to the group envelope value to obtain a plurality of extended frames taking the second duration as the frame length.
In the first possible implementation manner, corresponding to the basic frame obtained above, the encoding side may further encode the high-frequency signal contained in the second audio signal in a frequency-domain encoding manner to obtain an extended frame, so as to encode the high-frequency part that is not encoded in the basic frame. The decoding side can therefore jointly decode the basic frame and the extended frame to obtain an audio signal that has weaker real-time performance but contains both the low-frequency and high-frequency parts of the original audio signal with better audio quality, for the corresponding audio application. Through basic-frame and extended-frame coding, the above implementation can satisfy the requirements of various audio applications, improve coding efficiency and avoid wasted bandwidth resources.
In a possible design, encoding the second audio signal in a first encoding manner with a first duration as a unit to obtain a basic frame specifically includes: performing frequency domain transformation on the second audio signal to obtain a plurality of frequency domain coefficients of the low-frequency signal and a plurality of frequency domain coefficients of the high-frequency signal corresponding to the second audio signal; averagely grouping a plurality of frequency domain coefficients of the high-frequency signal according to the sequence from low frequency to high frequency to obtain a group envelope value of a plurality of high-frequency groups, wherein the group envelope value is the average value of the plurality of high-frequency domain coefficients in each group; and coding according to the plurality of frequency domain coefficients of the low-frequency signal and the group envelope value of the high-frequency signal to obtain a plurality of basic frames taking the first duration as the frame length.
In the second possible implementation manner, the encoding side may encode the low-frequency signal and the high-frequency signal included in the second audio signal according to a frequency-domain encoding manner, where a plurality of frequency-domain coefficients of the low-frequency signal are encoded, and only the group envelope value of the high-frequency signal is encoded for the high-frequency signal, so as to obtain the basic frame. The basic frame coding mode is to carry out high-quality coding on a low-frequency part and carry out lower-quality coding on a high-frequency part, and a decoding side can recover and obtain an audio signal with strong real-time performance and general audio quality according to the basic frame so as to be applied to corresponding audio application.
In a possible design manner, encoding a second audio signal in a second encoding manner with a second duration as a unit to obtain an extended frame specifically includes: and coding the difference values obtained by the plurality of frequency domain coefficients of the high-frequency signal and the corresponding group envelope values by taking the second time length as a unit to obtain a plurality of extension frames taking the second time length as a frame length.
In the second possible implementation manner, the encoding side may further encode the high-frequency partial signal with lower encoding quality in the basic frame according to the basic frame in the second possible implementation manner, that is, may encode the high-frequency partial signal according to the difference values obtained by the plurality of frequency domain coefficients of the high-frequency signal and the corresponding set of envelope values. The extension coding mode is to perform high-quality coding on a high-frequency part, so that a decoding side can recover and obtain an audio signal with general real-time performance and strong audio quality according to the basic frame and the extension frame, and the audio signal can be applied to corresponding audio application. The above embodiment obtains the encoded frames with different time delays and different encoding qualities by the basic frame encoding and the extended frame encoding, thereby improving the encoding rate and reducing the system overhead.
In another possible implementation manner, the encoding side may obtain the basic frame according to the time-domain coding manner of the first implementation, obtain a first extended frame according to the extended-frame coding manner of the first implementation, and obtain a second extended frame according to the extended-frame coding manner of the second implementation. In this way, a basic frame with strong real-time performance, containing only the low-frequency signal and with lower coding quality, can be obtained; a first extended frame with relatively strong real-time performance, containing low-frequency and high-frequency signals but with low coding quality, can be obtained; and a second extended frame with weak real-time performance, containing low-frequency and high-frequency signals and with high coding quality for the high-frequency signal, can be obtained. The layering of the encoded frames is therefore richer, and the decoding side can jointly decode the basic frame with the first extended frame or with the second extended frame to recover audio signals of different qualities, meeting the requirements of different audio applications, improving the flexibility and efficiency of audio coding and reducing system overhead.
In a possible design, encoding the second audio signal in a second encoding manner with a second duration as a unit to obtain an extended frame, specifically further includes: performing frequency domain transformation on the second audio signal to obtain a plurality of frequency domain coefficients of the low-frequency signal and a plurality of frequency domain coefficients of the high-frequency signal corresponding to the second audio signal; respectively carrying out average grouping on the plurality of frequency domain coefficients of the low-frequency signal and the plurality of frequency domain coefficients of the high-frequency signal according to the sequence from low frequency to high frequency to obtain corresponding group envelope values, wherein the group envelope values are the average values of the plurality of frequency domain coefficients in each group; and coding according to the group envelope value to obtain a plurality of extended frames taking the second duration as the frame length.
In a fourth possible implementation manner, corresponding to the basic frame obtained by encoding in the first manner, the encoding side may encode the set of envelope values of the low-frequency domain coefficients and the set of envelope values of the high-frequency domain coefficients corresponding to the second audio signal according to a frequency domain encoding manner to obtain an extended frame. Therefore, under the condition that the basic frame is lost, the decoding side can decode according to the extension frame to recover and obtain the audio signal, the reliability of audio coding transmission is improved, and the use experience of a user is improved.
In a possible design, the frequency domain transforming of the second audio signal specifically includes: obtaining the MDCT frequency domain coefficients corresponding to the second audio signal according to a modified discrete cosine transform (MDCT) algorithm.
In a second aspect, there is provided an audio signal processing method, the method comprising: the second device receives a basic frame and an extended frame sent by the first device, wherein the frame length of the extended frame is larger than that of the basic frame, and the extended frame is obtained by recoding audio signals corresponding to a plurality of basic frames; decoding the basic frame to obtain a basic audio signal; or jointly decoding the basic frame and the extended frame to obtain the extended audio signal.
In a possible design, decoding the basic frame to obtain a basic audio signal specifically includes: and decoding the basic frame according to a time domain coding and decoding mode to obtain a basic audio signal.
In a possible design, jointly decoding the basic frame and the extended frame to obtain an extended audio signal includes: if the extended frame comprises group envelope values of a plurality of high-frequency signals, obtaining a plurality of frequency domain coefficients of the high-frequency signals according to the group envelope values of the plurality of high-frequency signals, wherein the frequency domain coefficients of the high-frequency signals are the group envelope values corresponding to the frequency domain coefficients; up-sampling the basic audio signal to obtain a third audio signal; performing frequency domain transformation on the third audio signal frame by frame to obtain a plurality of frequency domain coefficients of the low-frequency signal corresponding to the third audio signal; and performing inverse frequency domain transformation according to the plurality of frequency domain coefficients of the high-frequency signal and the plurality of frequency domain coefficients of the low-frequency signal to obtain the expanded audio signal.
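As an illustration of this joint decoding flow, the following is a minimal sketch, not taken from the patent; the function names, the group size of 5 and the use of NumPy are assumptions based on the example given later in the description. It shows how the high-band frequency domain coefficients could be rebuilt from the group envelope values carried in the extended frame and combined with the low-band coefficients of the up-sampled basic audio signal before the inverse frequency domain transform:

```python
import numpy as np

def rebuild_high_band(envelopes, group_size=5):
    # Each high-frequency coefficient is set to the envelope value of its group.
    return np.repeat(np.asarray(envelopes, dtype=float), group_size)

def joint_decode_coeffs(low_band_coeffs, high_band_envelopes, group_size=5):
    # low_band_coeffs: coefficients obtained by frequency-transforming the
    # up-sampled basic audio signal; high_band_envelopes: values carried in
    # the extended frame.  The concatenated coefficients would then be passed
    # to the inverse frequency domain transform (e.g. an IMDCT) to obtain the
    # extended audio signal.
    high = rebuild_high_band(high_band_envelopes, group_size)
    return np.concatenate([np.asarray(low_band_coeffs, dtype=float), high])
```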
In a possible design, decoding the basic frame to obtain the basic audio signal specifically includes: if the basic frame comprises a plurality of frequency domain coefficients of the low-frequency signal and a plurality of group envelope values of the high-frequency signal, obtaining the plurality of frequency domain coefficients of the low-frequency signal and the plurality of frequency domain coefficients of the high-frequency signal according to the basic frame, wherein the plurality of frequency domain coefficients of the high-frequency signal are the group envelope values corresponding to the frequency domain coefficients; and performing inverse frequency domain transformation according to the plurality of frequency domain coefficients of the low-frequency signal and the plurality of frequency domain coefficients of the high-frequency signal to obtain a basic audio signal.
In a possible design, jointly decoding the basic frame and the extended frame to obtain an extended audio signal includes: if the extended frame comprises the difference values of the plurality of frequency domain coefficients of the high-frequency signal and the corresponding group envelope values, obtaining the plurality of frequency domain coefficients of the high-frequency signal according to the plurality of group envelope values of the high-frequency signal and the difference values of the plurality of frequency domain coefficients of the high-frequency signal and the corresponding group envelope values; and performing inverse frequency domain transformation according to the plurality of frequency domain coefficients of the low-frequency signal and the plurality of frequency domain coefficients of the high-frequency signal to obtain the expanded audio signal.
In a possible design, jointly decoding the basic frame and the extended frame to obtain an extended audio signal includes: if the extended frame comprises a plurality of group envelope values of the low-frequency signal and a plurality of group envelope values of the high-frequency signal, obtaining a plurality of frequency domain coefficients of the low-frequency signal according to the plurality of group envelope values of the low-frequency signal, and obtaining a plurality of frequency domain coefficients of the high-frequency signal according to the plurality of group envelope values of the high-frequency signal; the method comprises the steps that a plurality of frequency domain coefficients of a low-frequency signal are determined by performing frequency domain transformation on a basic audio signal obtained from a basic frame, or the frequency domain coefficients of the low-frequency signal are determined by a plurality of group envelope values of the low-frequency signal in an extended frame, and the plurality of frequency domain coefficients of the low-frequency signal are group envelope values corresponding to the frequency domain coefficients; and performing inverse frequency domain transformation according to the plurality of frequency domain coefficients of the low-frequency signal and the plurality of frequency domain coefficients of the high-frequency signal to obtain the expanded audio signal.
In one possible design, performing inverse frequency domain transformation according to the frequency domain coefficients specifically includes: obtaining the audio signal corresponding to the frequency domain coefficients according to an inverse modified discrete cosine transform (IMDCT) algorithm.
In one possible design, the set of envelope values includes an average value of the plurality of frequency domain coefficients in each set, which is obtained by averagely grouping the plurality of frequency domain coefficients in order from low frequency to high frequency.
In a third aspect, an audio signal processing apparatus is provided, the apparatus comprising: the preprocessing module is used for sampling and quantizing the acquired first audio signal to obtain a second audio signal; the encoding module is configured to encode the second audio signal in a first encoding manner by using a first time length as a unit to obtain a basic frame, and encode the second audio signal in a second encoding manner by using a second time length as a unit to obtain an extended frame, where the second time length is greater than the first time length, and the first encoding manner and the second encoding manner respectively encode different signals carried in the second audio signal and/or respectively encode the second audio signal in different encoding degrees; and the sending module is used for sending the basic frame and the extended frame to the second device.
In a possible design, the second duration is N times the first duration, and N is a natural number greater than or equal to 2.
In one possible design, the encoding module is specifically configured to: down-sampling the second audio signal to obtain a low-frequency signal carried in the second audio signal; and coding the low-frequency signal according to a time domain coding mode to obtain a plurality of basic frames taking the first duration as a frame length.
In one possible design, the encoding module is specifically configured to: performing frequency domain transformation on the second audio signal to obtain a frequency domain coefficient corresponding to the second audio signal; averagely grouping a plurality of frequency domain coefficients of a high-frequency part in the frequency domain coefficients corresponding to the second audio signal according to a sequence from low frequency to high frequency to obtain a plurality of high-frequency grouped group envelope values, wherein the group envelope values are average values of a plurality of high-frequency domain coefficients in each group; and coding according to the group envelope value to obtain a plurality of extended frames taking the second duration as the frame length.
In one possible design, the encoding module is specifically configured to: performing frequency domain transformation on the second audio signal to obtain a plurality of frequency domain coefficients of the low-frequency signal and a plurality of frequency domain coefficients of the high-frequency signal corresponding to the second audio signal; averagely grouping a plurality of frequency domain coefficients of the high-frequency signal according to the sequence from low frequency to high frequency to obtain a group envelope value of a plurality of high-frequency groups, wherein the group envelope value is the average value of the plurality of high-frequency domain coefficients in each group; and coding according to the plurality of frequency domain coefficients of the low-frequency signal and the group envelope value of the high-frequency signal to obtain a plurality of basic frames taking the first duration as the frame length.
In one possible design, the encoding module is specifically configured to: and coding the difference values obtained by the plurality of frequency domain coefficients of the high-frequency signal and the corresponding group envelope values by taking the second time length as a unit to obtain a plurality of extension frames taking the second time length as a frame length.
In a possible design, the encoding module is specifically configured to: performing frequency domain transformation on the second audio signal to obtain a plurality of frequency domain coefficients of the low-frequency signal and a plurality of frequency domain coefficients of the high-frequency signal corresponding to the second audio signal; respectively carrying out average grouping on the plurality of frequency domain coefficients of the low-frequency signal and the plurality of frequency domain coefficients of the high-frequency signal according to the sequence from low frequency to high frequency to obtain corresponding group envelope values, wherein the group envelope values are the average values of the plurality of frequency domain coefficients in each group; and coding according to the group envelope value to obtain a plurality of extended frames taking the second duration as the frame length.
In one possible design, the frequency domain transform specifically includes: a modified discrete cosine transform (MDCT) algorithm.
In a fourth aspect, there is provided an audio signal processing apparatus comprising: a receiving module, configured to receive a basic frame and an extended frame sent from a first apparatus, where a frame length of the extended frame is greater than a frame length of the basic frame, and the extended frame is obtained by recoding audio signals corresponding to multiple basic frames; the decoding module is used for decoding the basic frame to obtain a basic audio signal; or jointly decoding the basic frame and the extended frame to obtain the extended audio signal.
In one possible design, the decoding module is specifically configured to: and decoding the basic frame according to a time domain coding and decoding mode to obtain a basic audio signal.
In one possible design, the decoding module is specifically configured to: if the extended frame comprises group envelope values of a plurality of high-frequency signals, obtaining a plurality of frequency domain coefficients of the high-frequency signals according to the group envelope values of the plurality of high-frequency signals, wherein the frequency domain coefficients of the high-frequency signals are the group envelope values corresponding to the frequency domain coefficients; up-sampling the basic audio signal to obtain a third audio signal; performing frequency domain transformation on the third audio signal frame by frame to obtain a plurality of frequency domain coefficients of the low-frequency signal corresponding to the third audio signal; and performing inverse frequency domain transformation according to the plurality of frequency domain coefficients of the high-frequency signal and the plurality of frequency domain coefficients of the low-frequency signal to obtain the expanded audio signal.
In one possible design, the decoding module is specifically configured to: if the basic frame comprises a plurality of frequency domain coefficients of the low-frequency signal and a plurality of group envelope values of the high-frequency signal, obtaining the plurality of frequency domain coefficients of the low-frequency signal and the plurality of frequency domain coefficients of the high-frequency signal according to the basic frame, wherein the plurality of frequency domain coefficients of the high-frequency signal are the group envelope values corresponding to the frequency domain coefficients; and performing inverse frequency domain transformation according to the plurality of frequency domain coefficients of the low-frequency signal and the plurality of frequency domain coefficients of the high-frequency signal to obtain a basic audio signal.
In one possible design, the decoding module is specifically configured to: if the extended frame comprises the difference values of the plurality of frequency domain coefficients of the high-frequency signal and the corresponding group envelope values, obtaining a plurality of frequency domain coefficients of the high-frequency signal according to the plurality of group envelope values of the high-frequency signal and the difference values of the plurality of frequency domain coefficients of the high-frequency signal and the corresponding group envelope values; and performing inverse frequency domain transformation according to the plurality of frequency domain coefficients of the low-frequency signal and the plurality of frequency domain coefficients of the high-frequency signal to obtain the expanded audio signal.
In one possible design, the decoding module is specifically configured to: if the extended frame comprises a plurality of groups of envelope values of the low-frequency signal and a plurality of groups of envelope values of the high-frequency signal, obtaining a plurality of frequency domain coefficients of the low-frequency signal according to the plurality of groups of envelope values of the low-frequency signal, and obtaining a plurality of frequency domain coefficients of the high-frequency signal according to the plurality of groups of envelope values of the high-frequency signal; the method comprises the steps that a plurality of frequency domain coefficients of a low-frequency signal are determined by performing frequency domain transformation on a basic audio signal obtained from a basic frame, or the frequency domain coefficients of the low-frequency signal are determined by a plurality of group envelope values of the low-frequency signal in an extended frame, and the plurality of frequency domain coefficients of the low-frequency signal are group envelope values corresponding to the frequency domain coefficients; and performing inverse frequency domain transformation according to the plurality of frequency domain coefficients of the low-frequency signal and the plurality of frequency domain coefficients of the high-frequency signal to obtain the expanded audio signal.
In a possible design, the inverse frequency domain transformation specifically includes: an inverse modified discrete cosine transform (IMDCT) algorithm.
In one possible design, the set of envelope values includes an average value of the plurality of frequency domain coefficients in each set, which is obtained by averagely grouping the plurality of frequency domain coefficients in order from low frequency to high frequency.
In a fifth aspect, an electronic device is provided, which includes: a processor and a transmission interface; and a memory for storing instructions executable by the processor; wherein the processor is configured to execute the instructions to cause the electronic device to implement the audio signal processing method according to the first aspect or any possible design manner of the first aspect.
In a sixth aspect, an electronic device is provided, which includes: a processor and a transmission interface; and a memory for storing instructions executable by the processor; wherein the processor is configured to execute the instructions to cause the electronic device to implement the audio signal processing method according to the second aspect or any possible design manner of the second aspect.
In a seventh aspect, a computer-readable storage medium is provided, in which instructions, when executed by a processor of an electronic device, enable the electronic device to perform the audio signal processing method according to the first aspect or any possible design manner of the first aspect.
In an eighth aspect, a computer program product is provided which, when run on a computer, causes the computer to perform the audio signal processing method according to the first aspect or any possible design manner of the first aspect.
In a ninth aspect, a computer-readable storage medium is provided, in which instructions, when executed by a processor of an electronic device, enable the electronic device to perform the audio signal processing method according to the second aspect or any possible design manner of the second aspect.
In a tenth aspect, a computer program product is provided which, when run on a computer, causes the computer to perform the audio signal processing method according to the second aspect or any possible design manner of the second aspect.
It is understood that any one of the audio signal processing apparatus, the electronic device, the computer readable storage medium and the computer program product provided above can be used to execute the corresponding method provided above, and therefore, the beneficial effects achieved by the method can refer to the beneficial effects in the corresponding method provided above, and are not described herein again.
Drawings
Fig. 1 is a schematic view of an application scenario of an audio signal processing method according to an embodiment of the present application;
fig. 2 is a schematic flowchart of an audio signal processing method according to an embodiment of the present application;
fig. 3 is a schematic processing process diagram of an audio signal processing method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an encoded frame of an audio signal according to an embodiment of the present application;
FIG. 5 is a flowchart illustrating another audio signal processing method according to an embodiment of the present application;
fig. 6 is a schematic diagram of an audio signal processing apparatus according to an embodiment of the present application;
FIG. 7 is a schematic diagram of another audio signal processing apparatus according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In the following, the terms "first", "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present embodiment, "a plurality" means two or more unless otherwise specified.
It is noted that, in the present application, words such as "exemplary" or "for example" are used to mean exemplary, illustrative, or descriptive. Any embodiment or design described herein as "exemplary" or "e.g.," is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, use of the word "exemplary" or "such as" is intended to present concepts related in a concrete fashion.
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
First, a brief description is given of an implementation environment and an application scenario of the embodiment of the present application.
The embodiment of the application provides an audio signal processing method and an audio signal processing device, which can be applied to audio signal transmission among multiple electronic devices, can flexibly encode and decode audio signals based on basic frames and extended frames according to the different processing requirements of different applications, and can meet audio processing requirements with different time delays or different quality requirements. This solves the prior-art problems of repeated transmission and wasted bandwidth resources caused by the different real-time and restoration-quality requirements of different audio applications when the same audio signal is transmitted among multiple electronic devices.
As shown in fig. 1, the audio signal processing method provided in the embodiment of the present application may be applied to electronic devices with audio signal processing capability; the scenario involves at least two electronic devices between which data transmission can be performed. For example, the audio signal may be transmitted through a wired network, a wireless local area network (WLAN), near field communication (NFC), Bluetooth, or the like.
Specifically, the electronic device may be a mobile phone, a smart speaker, a smart headset, a tablet computer, a desktop computer, a laptop computer, a handheld computer, a notebook computer, a vehicle-mounted device, an ultra-mobile personal computer (UMPC), a netbook, a cellular phone, a personal digital assistant (PDA), an augmented reality (AR)/virtual reality (VR) device, or the like; the specific form of the electronic device is not particularly limited in the embodiments of the present application. Illustratively, as shown in fig. 1, the electronic device 1 may be a mobile phone, and the electronic device 2 may be a smart headset.
The embodiment of the application provides an audio signal processing method which is applied to a first device and a second device. As shown in fig. 2, the method may include:
s201: the first device samples and quantizes the acquired first audio signal to obtain a second audio signal.
The first audio signal may be an audio signal captured by the first apparatus, or may be an audio signal stored locally by the first apparatus or from another apparatus or device.
If the first device needs to send the first audio signal to the second device in response to an audio request of the second device, the first audio signal needs to be sampled and quantized into a digital signal so as to save transmission bandwidth. As shown in fig. 3, the basic processing procedure may be to sample and quantize the first audio signal to obtain a second audio signal s(n), where n indexes the audio sampling points arranged in time order. If the audio signal is sampled at a frequency of 16 kHz, i.e. 16 × 10³ sampling points per second, the time interval between two adjacent sampling points is 0.0625 ms.
The quantized values corresponding to the audio sampling points are then encoded into a binary digital signal and transmitted. The quantized values of the sampling points may be represented with different quantization precisions, such as 16 bits, 24 bits, or 32 bits.
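For illustration only (not part of the patent text), the arithmetic behind the figures above can be sketched as follows, assuming the 16 kHz sampling rate from the example and a 16-bit quantization precision:

```python
sample_rate_hz = 16_000                             # 16 kHz sampling, as in the example
sample_interval_ms = 1000 / sample_rate_hz          # 0.0625 ms between adjacent samples
bits_per_sample = 16                                # could also be 24 or 32 bits
raw_bitrate_bps = sample_rate_hz * bits_per_sample  # uncompressed rate before coding
print(sample_interval_ms, raw_bitrate_bps)          # 0.0625 256000
```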
S202: the first device encodes the second audio signal frame by frame in a first encoding mode by taking the first time length as a unit to obtain a basic frame, and encodes the second audio signal frame by frame in a second encoding mode by taking the second time length as a unit to obtain an extended frame.
The second duration is longer than the first duration, so the frame length of the extended frame is longer than that of the basic frame.
During coding and compression, the second audio signal may be processed in segments of fixed duration: each time one frame of the second audio signal has been collected and quantized, that frame can be compression-coded, and the frames are sent one by one after encoding. In the present application, the second audio signal is encoded at different time intervals, i.e. with different frame lengths, to generate two or more kinds of encoded frames, including basic frames and extended frames.
It should be noted that, given the above coding principle and sampling rate, current audio coding technology can only approximate the original audio signal found in nature; the coding and decoding rules of audio signals imply that digital encoding and decoding introduce a certain degree of distortion and cannot completely restore the original audio signal. The coding involved in the present application is a lossy coding technique.
Therefore, the basic frame or the extension frame in the embodiment of the present application may encode only a portion of the first audio signal, and not all of the first audio signal. Specifically, the extension frame may be obtained by re-encoding the second audio signal segments corresponding to the plurality of basic frames, and the extension frame may further encode the audio signal that is not encoded or has insufficient encoding precision in the basic frames.
Specifically, the first encoding mode and the second encoding mode may encode different signals carried in the second audio signal respectively. For example, a low-frequency signal portion carried in the second audio signal is encoded according to the first encoding method to obtain a basic frame, and a high-frequency signal portion carried in the second audio signal is encoded according to the second encoding method to obtain an extended frame.
In addition, the first encoding manner and the second encoding manner may encode the second audio signal to different encoding degrees, producing an encoded frame of lower encoding quality and an encoded frame of higher encoding quality, which are then transmitted to the decoding side for decoding. The decoding side can thus recover different audio signals from the basic frame and the extended frame respectively. The audio signal recovered from the extended frame together with the basic frame has smaller distortion relative to the original audio signal, and therefore better coding quality.
In general, the longer the frame length used to encode the second audio signal, the higher the compression rate of the first audio signal, the higher the transmission delay of the signal, and the better the coding quality of the audio signal at the same code rate. Here, the coding quality of an audio signal refers to how faithfully the audio signal restored after decoding reproduces the original audio signal before encoding and compression. That is, the longer the frame length used to encode the second audio signal, the higher the restoration degree of the decoded audio signal relative to the original audio signal, and the lower the distortion rate.
In embodiments of the application, the basic frame may be a lower latency, and/or lower quality encoding of the current second audio signal, and the first device may transmit the basic frame individually to the second device on a frame-by-frame basis. Thus, after receiving the basic frame by frame, the second device can decode the basic frame according to a preset decoding mode to obtain an audio signal, so as to be applied to audio application with low time delay requirement or relatively low audio quality requirement.
The extended frame may be a higher-latency and/or higher-quality encoding of the current second audio signal. The frame length of the extended frame is larger than that of the basic frame; the extended frame encodes and transmits enhancement information for the audio signal spanning several basic frames, i.e. it further encodes the data that the basic frames do not contain or encode only incompletely. Therefore, after receiving the extended frames frame by frame, the second device can jointly decode them with the basic frames to obtain an audio signal with higher audio quality, for audio applications with low real-time requirements and relatively high audio-quality requirements.
In one embodiment, the first apparatus may encode the second audio signal in units of a first duration to obtain a basic frame; and the first device encodes the second audio signal by taking the second time length as a unit to obtain an extended frame. The second duration may be N times the first duration, where N is a natural number greater than or equal to 2. The first duration is a frame length of the basic frame, that is, a time interval between two basic frames, and the second duration is a frame length of the extended frame, that is, a time interval between two extended frames.
Taking fig. 4 as an example, t1, t2, t3, t4, t5, t6, t7 and t8 represent basic frames of the audio coding, and the algorithmic delay of a basic frame is about Δt, i.e. the time interval between two basic frames is Δt. T1 and T2 represent extended frames of the audio coding; fig. 4 takes as an example compressing one extended frame for every four basic frames, and the algorithmic delay of an extended frame is ΔT, i.e. the time interval between two extended frames is ΔT, where ΔT = 4 × Δt, i.e. N = 4. Each basic frame or extended frame contains digitized audio sample data.
Illustratively, the time delay Δt may be 0.5 ms or 5 ms; the delays Δt and ΔT depend on the design of the coding structure and the actual application requirements. If the sampling frequency is 16 kHz and the frame length of the basic frame is 5 ms, each basic frame contains 80 audio sampling points.
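The frame timing of fig. 4 can be checked with a short calculation. This is an illustrative sketch; the 5 ms basic frame and N = 4 are the example values used above, not values fixed by the method:

```python
sample_rate_hz = 16_000
basic_frame_ms = 5                      # frame length of a basic frame (Δt) in this example
n = 4                                   # one extended frame per N basic frames
extended_frame_ms = n * basic_frame_ms  # ΔT = N × Δt = 20 ms here
samples_per_basic_frame = sample_rate_hz * basic_frame_ms // 1000        # 80
samples_per_extended_frame = sample_rate_hz * extended_frame_ms // 1000  # 320
```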
S203: the first device transmits the basic frame and the extended frame to the second device.
The first device may encode the basic frame and transmit the encoded basic frame to the second device frame by frame, and the first device encodes the extended frame and transmits the encoded extended frame to the second device frame by frame. Therefore, after the second device receives the basic frame or the extended frame, the second device decodes according to the basic frame or the extended frame to recover the audio signal for different audio applications.
According to the above coding method provided in the embodiment of the present application, the second device receives the digital signal sent from the first device, where the digital signal includes a basic frame or an extended frame, and the second device may perform decoding according to a preset coding and decoding method to recover the audio signal. As shown in fig. 5, the specific process may include:
s501: the second device receives the basic frame and the extended frame sent by the first device, wherein the frame length of the extended frame is larger than that of the basic frame, and the extended frame is obtained by recoding the audio signals corresponding to a plurality of basic frames.
S502: the second device decodes the basic frame to obtain a basic audio signal, or jointly decodes the basic frame and the extended frame to obtain an extended audio signal.
The second device decodes the received basic frame or the extended frame according to a preset coding and decoding rule, namely the second device decodes the received basic frame or the received extended frame according to the digital signal to obtain an analog signal so as to meet the requirements of different audio applications on the second device on the audio signal.
Further, after receiving the basic frames, the second device decodes them frame by frame to obtain the corresponding basic audio signal s1(n). After receiving an extended frame, the second device decodes jointly according to the extended frame and the basic frames to obtain the corresponding extended audio signal s2(n).
The basic audio signal s1(n) and the extended audio signal s2(n) carry the same audio content, but differ in transmission delay and audio quality: the audio quality of s1(n) is slightly inferior to that of s2(n), while the transmission delay of s1(n) is lower than that of s2(n).
Through the above embodiment of the present application, the same set of encoding scheme can be used for transmitting audio applications with different time delay requirements between the encoding side and the decoding side, that is, the encoding side only acquires one path of audio signals, but can encode the basic frame and the extended frame respectively according to different time delay requirements, so that the decoding side can decode different audio signals according to the two encoded frames to meet the requirements of different audio applications. Wherein, the audio signal decoded according to the basic frame has low time delay, but the audio signal quality is poor. The audio signal decoded by combining the extended frame and the basic frame has longer time delay, but the audio signal has better quality, and the distortion degree for restoring the original audio signal is smaller. Therefore, the decoding side can recover more than two audio signals according to different basic frames and extended frames, only one path of audio signal is coded during coding, redundant information is reduced by the coding mode, the problems of repeated transmission and bandwidth resource waste after the same path of audio signal is coded by the coding side are solved, and the system overhead is greatly reduced.
Next, the encoding and decoding methods and processes in the technical solution of the present application are described in detail by listing several preferred codec implementation modes, such as mode one, mode two, mode three, and mode four. The several embodiments described below are not all possible embodiments of the present application, but are merely exemplary embodiments.
Mode one
1. Encoding process at the encoding side:
in a possible embodiment, the first device may use a time-domain coding with a lower delay to obtain the basic frame, i.e. only encode the low frequency part of the second audio signal. The first device obtains the extension frame by adopting a frequency domain coding mode with higher time delay, and the extension frame only comprises a high-frequency part in the second audio signal.
For example, the second device has two different audio applications. One is a device calibration and positioning application: the audio signal it needs must have high real-time performance, with a signal transmission delay of no more than 1 ms, but its audio-quality requirement is not high, and the audio signal may contain only the low-frequency signal without the high-frequency signal. The other is a speech enhancement application: its real-time requirement on the audio signal is not strong, with a signal transmission delay of no more than 6 ms, but its audio-quality requirement is high, and both the high-frequency and low-frequency parts of the signal are needed.
In step S202, the encoding of the basic frame by the first apparatus may specifically include:
(1) The first device down-samples the second audio signal to obtain the low-frequency signal contained in the second audio signal.
Down-sampling means taking one sample out of every few samples of a sample sequence to obtain a new sequence. For example, if the first audio signal is sampled at 16 kHz, the bandwidth of the quantized second audio signal may be half of the sampling rate, i.e. 8 kHz. The second audio signal then covers the band 0 to 8 kHz, where the low-frequency signal s_L(n) is the 0 to 4 kHz part and the high-frequency signal s_H(n) is the 4 to 8 kHz part. Down-sampling the second audio signal once (by a factor of two) yields the low-frequency signal s_L(n) contained in the second audio signal, i.e. the 0 to 4 kHz audio signal.
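One way to realize this down-sampling step is sketched below; this is an assumption, not the patent's implementation. scipy's decimate applies an anti-aliasing low-pass filter before discarding every other sample, which keeps only the 0 to 4 kHz content when going from 16 kHz to 8 kHz:

```python
import numpy as np
from scipy.signal import decimate

def extract_low_band(s, factor=2):
    # Down-sample the quantized signal s(n) by `factor` (2 here: 16 kHz -> 8 kHz).
    # decimate() low-pass filters first, so the result approximates the
    # low-frequency signal s_L(n), i.e. the 0-4 kHz part of the second audio signal.
    return decimate(np.asarray(s, dtype=float), factor)
```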
(2) Encode the low-frequency signal in a time-domain coding manner in units of the first duration to obtain a plurality of basic frames.
Time-domain coding encodes the waveform of the audio signal. Typical time-domain coding standards, such as G.726, G.723.1 or G.728 of the International Telecommunication Union (ITU), widely adopt code-excited linear prediction, which models the human speech production mechanism and uses the inherent characteristics of the human glottis and vocal tract to remove redundant information from the audio signal, greatly reducing the bit rate required for audio coding while maintaining high audio quality.
Illustratively, the first device may encode s_L(n) using the G.726 coding mode and assemble basic frames at intervals of the first duration, so that the frame length of a basic frame is the first duration. For example, the first duration may be 0.5 ms: each 0.5 ms segment of s_L(n) is encoded, and the resulting digital signal is one basic frame. G.726 is a speech codec algorithm that can encode an audio signal into a digital signal with low delay.
Further, in step S202, the encoding of the extended frame by the first apparatus may specifically include:
(1) Perform frequency domain transformation on the second audio signal in units of the second duration to obtain the frequency domain coefficients corresponding to the second audio signal.
The principle of frequency domain coding is to encode the audio signal in the frequency domain by exploiting how the human ear perceives sound. The frequency bands to which human hearing is most sensitive are coded with priority, while bands that are masked by other bands or not easily perceived are coarsely quantized or not quantized at all. An advantage of frequency domain coding is that it removes a certain amount of redundancy according to the characteristics of the human ear, so its coding effect is roughly equivalent for various audio signals, and its coding quality, especially for signals such as music, is higher than that of time-domain coding.
Specifically, Modified Discrete Cosine Transform (MDCT) may be performed on the second audio signal to obtain an MDCT frequency domain coefficient corresponding to the second audio signal. The MDCT transform is an algorithm for transforming a signal from a time domain to a frequency domain, and the obtained coefficients represent frequency domain components of each frequency point.
The transformation formula for transforming the time-domain signal s(n) into the MDCT frequency-domain coefficients S(k) is as follows:

$$S(k)=\sum_{n=0}^{N-1} s(n)\cos\left[\frac{\pi}{N}\left(n+\frac{1}{2}\right)\left(k+\frac{1}{2}\right)\right],\qquad k=0,1,\ldots,N-1$$

The MDCT coefficients S(k) are thus obtained, and S(k) is the frequency-domain representation of the second audio signal.
Illustratively, if the second duration is 5 ms, that is, the frame length used to encode the extended frame is 5 ms, and the sampling rate is 16 kHz, then s(n) contains 80 sampling points, i.e. N is 80 and the sample index n ranges from 0 to 79. MDCT is performed on each 5 ms segment of s(n) to obtain the corresponding MDCT coefficients, where k ranges from 0 to 79. The index k starts from 0 and runs from low frequency to high frequency: the low-frequency-domain coefficients are S(0) to S(39) from low to high, and the high-frequency-domain coefficients are S(40) to S(79) from low to high.
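The frequency-domain transform of one 5 ms segment can be sketched as below. A DCT-IV-style cosine basis is used here as a simplified stand-in for the patent's MDCT formula (windowing and overlap are not reproduced), chosen so that 80 time samples yield the 80 coefficients S(0)~S(79) referred to in the text; all names are illustrative.

```python
import numpy as np

def mdct_like(s: np.ndarray) -> np.ndarray:
    """Transform N time samples into N frequency-domain coefficients
    using a DCT-IV-style cosine basis (a simplified stand-in for MDCT)."""
    N = len(s)
    n = np.arange(N)
    k = np.arange(N)[:, None]
    basis = np.cos(np.pi / N * (n + 0.5) * (k + 0.5))
    return basis @ s

s_n = np.random.randn(80)   # 5 ms at 16 kHz
S_k = mdct_like(s_n)        # S(0)..S(39) low band, S(40)..S(79) high band
```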
(2) And averagely grouping a plurality of frequency domain coefficients of a high-frequency part in the frequency domain coefficients corresponding to the second audio signal according to the sequence from low frequency to high frequency to obtain a group envelope value of a plurality of high-frequency groups, and encoding according to an envelope encoding mode.
Illustratively, the 40 high-frequency-domain coefficients S(40) to S(79) are evenly divided into 8 groups, each high-frequency group containing five high-frequency-domain coefficients, as follows:
Group 1 contains the high-frequency-domain coefficients S(40) to S(44);
Group 2 contains the high-frequency-domain coefficients S(45) to S(49);
Group 3 contains the high-frequency-domain coefficients S(50) to S(54);
Group 4 contains the high-frequency-domain coefficients S(55) to S(59);
Group 5 contains the high-frequency-domain coefficients S(60) to S(64);
Group 6 contains the high-frequency-domain coefficients S(65) to S(69);
Group 7 contains the high-frequency-domain coefficients S(70) to S(74);
Group 8 contains the high-frequency-domain coefficients S(75) to S(79).
Next, a group envelope value of the plurality of high frequency groups is obtained, wherein the group envelope value is an average value of a plurality of high frequency domain coefficients in each group. The first means may obtain a group envelope value for each group of the high frequency part of the second audio signal and then encode according to the group envelope values to obtain a plurality of extended frames with the second duration as a frame length.
Illustratively, the group envelope values may specifically be calculated as:
Group 1 envelope value: S_HE(0) = [S(40)+S(41)+S(42)+S(43)+S(44)]/5;
Group 2 envelope value: S_HE(1) = [S(45)+S(46)+S(47)+S(48)+S(49)]/5;
Group 3 envelope value: S_HE(2) = [S(50)+S(51)+S(52)+S(53)+S(54)]/5;
Group 4 envelope value: S_HE(3) = [S(55)+S(56)+S(57)+S(58)+S(59)]/5;
Group 5 envelope value: S_HE(4) = [S(60)+S(61)+S(62)+S(63)+S(64)]/5;
Group 6 envelope value: S_HE(5) = [S(65)+S(66)+S(67)+S(68)+S(69)]/5;
Group 7 envelope value: S_HE(6) = [S(70)+S(71)+S(72)+S(73)+S(74)]/5;
Group 8 envelope value: S_HE(7) = [S(75)+S(76)+S(77)+S(78)+S(79)]/5.
With the second duration as the frame length, the first device may digitally encode the group envelope values of the plurality of high-frequency groups obtained above and send them to the second device frame by frame. For example, every 5 ms the first device encodes the values S_HE(0)~S_HE(7) obtained above, assembles them into an extended frame, and sends the frame to the second device.
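Taken together, the grouping and averaging steps above reduce to a few lines, as in this sketch (assuming 80 coefficients per frame with the upper 40 grouped in fives; names are illustrative):

```python
import numpy as np

def high_band_envelopes(S: np.ndarray, low_count: int = 40, group_size: int = 5) -> np.ndarray:
    """Average each group of 5 high-frequency-domain coefficients to obtain
    the group envelope values S_HE(0)..S_HE(7)."""
    high = S[low_count:]
    return high.reshape(-1, group_size).mean(axis=1)

S = np.random.randn(80)        # MDCT coefficients of one 5 ms frame
S_HE = high_band_envelopes(S)  # eight envelope values, one per high-frequency group
```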
2. Decoding side decoding process:

Based on the above coding method, the second device receives a basic frame at regular intervals and decodes it in the time-domain decoding manner to obtain a basic audio signal; relative to the original audio signal at the coding side, the basic audio signal contains only the low-frequency part.

The second device also receives an extended frame at regular intervals; the extended frame contains only the high-frequency part of the original audio signal, and the second device jointly decodes the extended frame and the basic frame to obtain an extended audio signal, which contains not only the low-frequency part but also the high-frequency part.
Taking the above embodiment as an example, the second device may receive a basic frame every 0.5 ms and decode it in the G.726 decoding manner to obtain the basic audio signal s_1(n). The basic audio signal s_1(n) contains only the low-frequency part, but its delay is as low as 0.5 ms. It can therefore be used by audio applications with low latency requirements, such as the device calibration and positioning application.
If the extended frame received by the second device includes the group envelope values of a plurality of high-frequency groups, the second device obtains a plurality of high-frequency-domain coefficients of the high-frequency signal from these group envelope values, i.e. each high-frequency-domain coefficient is set to the group envelope value of its group. In addition, the basic audio signal is up-sampled to obtain a third audio signal, and frequency-domain transformation is performed on the third audio signal frame by frame to obtain a plurality of low-frequency-domain coefficients of the low-frequency signal corresponding to the third audio signal. The second device may then restore the extended audio signal from the plurality of high-frequency-domain coefficients and the plurality of low-frequency-domain coefficients.
Illustratively, the second device may receive an extended frame every 5 ms and obtain from it the group envelope values S_HE(0)~S_HE(7) of the high-frequency part of the audio signal. A plurality of high-frequency-domain coefficients may then be derived from the group envelope values, i.e. each high-frequency-domain coefficient of the audio signal is set equal to the group envelope value of its group of high-frequency-domain coefficients, that is:
S(40)=S(41)=S(42)=S(43)=S(44)=S_HE(0);
S(45)=S(46)=S(47)=S(48)=S(49)=S_HE(1);
S(50)=S(51)=S(52)=S(53)=S(54)=S_HE(2);
S(55)=S(56)=S(57)=S(58)=S(59)=S_HE(3);
S(60)=S(61)=S(62)=S(63)=S(64)=S_HE(4);
S(65)=S(66)=S(67)=S(68)=S(69)=S_HE(5);
S(70)=S(71)=S(72)=S(73)=S(74)=S_HE(6);
S(75)=S(76)=S(77)=S(78)=S(79)=S_HE(7).
In this way, S(40) to S(79) are obtained.
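On the decoding side this expansion is simply a repeat of each envelope value across its group, as in the following sketch (names illustrative):

```python
import numpy as np

def expand_envelopes(S_HE: np.ndarray, group_size: int = 5) -> np.ndarray:
    """Set every high-frequency-domain coefficient equal to the group
    envelope value of its group, recovering S(40)..S(79)."""
    return np.repeat(S_HE, group_size)

S_HE = np.random.randn(8)          # envelope values decoded from one extended frame
S_high = expand_envelopes(S_HE)    # 40 coefficients, constant within each group
```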
The second device also takes the audio signal recovered from the basic frames received within one second-duration interval, for example the audio signal s_1(n) decoded from the basic frames of the last 5 ms, and up-samples it to obtain a third audio signal s'_L(n). Up-sampling inserts one or more zeros between every two adjacent samples of the original signal. Illustratively, after up-sampling s_1(n), a third audio signal s'_L(n) with an 8 kHz bandwidth and a 16 kHz sampling rate is obtained, but the high-frequency part of s'_L(n) is still 0.
MDCT is performed on the low-frequency audio signal s'_L(n) to obtain the frequency-domain coefficients S'_L(k) according to the following formula:

$$S'_L(k)=\sum_{n=0}^{N-1} s'_L(n)\cos\left[\frac{\pi}{N}\left(n+\frac{1}{2}\right)\left(k+\frac{1}{2}\right)\right],\qquad k=0,1,\ldots,N-1$$
Here, corresponding to the 5 ms delay, the audio signal segment at a 16 kHz sampling rate has 80 sampling points, i.e. N in the above formula is 80. The low-frequency coefficients of S'_L(k) are then combined with the high-frequency coefficients S(40) to S(79) obtained from the extended frame in the above step to obtain the complete MDCT coefficients S(k) of the audio frame, where S(k) = S'_L(k) for k = 0~39.
The inverse modified discrete cosine transform is performed on S(k) to obtain the extended audio signal s_2(n); the extended audio signal s_2(n) contains both high-frequency and low-frequency components. The specific formula of the inverse modified discrete cosine transform is as follows:

$$s_2(n)=\frac{2}{N}\sum_{k=0}^{N-1} S(k)\cos\left[\frac{\pi}{N}\left(n+\frac{1}{2}\right)\left(k+\frac{1}{2}\right)\right],\qquad n=0,1,\ldots,N-1$$
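The complete joint-decoding path can be sketched as follows. The forward and inverse transforms again use the DCT-IV-style basis as a simplified stand-in for the patent's MDCT/inverse MDCT, and the zero-insertion up-sampling follows the description above; all names are illustrative.

```python
import numpy as np

def mdct_like(s):
    N = len(s)
    n = np.arange(N); k = np.arange(N)[:, None]
    return np.cos(np.pi / N * (n + 0.5) * (k + 0.5)) @ s

def imdct_like(S):
    N = len(S)
    n = np.arange(N)[:, None]; k = np.arange(N)
    return (2.0 / N) * (np.cos(np.pi / N * (n + 0.5) * (k + 0.5)) @ S)

def upsample_zero_insert(x):
    up = np.zeros(2 * len(x))
    up[::2] = x                  # insert one zero after every original sample
    return up

s1 = np.random.randn(40)                   # 5 ms of decoded basic audio at 8 kHz
S_high = np.repeat(np.random.randn(8), 5)  # S(40)..S(79) expanded from the extended frame

s_L_up = upsample_zero_insert(s1)          # third audio signal: 16 kHz, high band empty
S_low = mdct_like(s_L_up)[:40]             # low-frequency-domain coefficients S(0)..S(39)
S_full = np.concatenate([S_low, S_high])   # complete coefficients S(0)..S(79)
s2 = imdct_like(S_full)                    # extended audio signal with both bands
```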
of the audio signals obtained by decoding according to the above-described embodiments, the audio signal s decoded from the basic frame 1 And (n) only has low-frequency components, the decoding quality is low, but the time delay of the audio signal is low, and the method can be used for the application of audio services with low requirements on audio quality and low requirements on audio time delay. Audio signal s obtained by joint decoding of extension frame and basic frame 2 (n) both high and low frequency components are present, the decoding quality is higher, but the delay is longer,therefore, the method can be used for the application of audio services with high requirements on audio quality and low requirements on the real-time performance of audio transmission.
According to this embodiment of the application, one audio stream is transmitted with a single codec scheme, and the different audio signals obtained by decoding can be applied to different audio applications. This avoids repeated encoding, decoding and transmission, largely avoids wasting bandwidth resources, and reduces system overhead.
Further, when encoding and decoding are performed as in the above embodiment, if the decoding-side device loses a basic frame or does not receive it, so that the audio signal cannot be restored by decoding the basic frame, the decoding-side device may still decode from the extended frame: in the inverse frequency-domain transform, the low-frequency-domain coefficients are set to 0 and the audio signal is restored from the frequency-domain coefficients of the high-frequency part alone. The restored audio signal then contains only the high-frequency part.
The second way,
1. Encoding process at the encoding side:

In a possible embodiment, the first device may obtain the basic frame with a lower-delay frequency-domain coding that encodes the low-frequency part of the second audio signal precisely and the high-frequency part only coarsely, and may obtain the extended frame with a higher-delay frequency-domain coding that carries only refinement information for the high-frequency part of the second audio signal.
For example, the second device has two different audio applications. One is a speech enhancement application, whose audio signal has a strong real-time requirement, with a signal delay of no more than 6 ms, and which needs both the high-frequency and low-frequency parts. The other is a three-dimensional (3D) sound field acquisition application, which has a higher audio quality requirement but can tolerate a longer signal delay.
In step S202, the encoding of the basic frame by the first apparatus may specifically include:
(1) and the first device performs frequency domain transformation on the second audio signal by taking the first duration as the frame length to obtain frequency domain coefficients, namely a plurality of low-frequency domain coefficients of the low-frequency signal and a plurality of high-frequency domain coefficients of the high-frequency signal corresponding to the second audio signal.
(2) And averagely grouping the plurality of frequency domain coefficients of the high-frequency signal according to the sequence from low frequency to high frequency to obtain a group envelope value of a plurality of high-frequency groups, wherein the group envelope value is the average value of the plurality of high-frequency domain coefficients in each group.
(3) And coding according to the plurality of frequency domain coefficients of the low-frequency signal and the group envelope value of the high-frequency signal to obtain a plurality of basic frames taking the first duration as the frame length.
Illustratively, to meet the real-time requirement of the speech enhancement application on the second device, the first duration may be 5 ms. If the sampling rate is 16 kHz, the first device may perform MDCT on the audio signal s(n) every 5 ms to obtain the MDCT coefficients S(k), where k ranges from 0 to 79. The high-frequency-domain coefficients S(40)~S(79) are evenly divided into 8 groups in order, each group containing 5 high-frequency-domain coefficients, and the group envelope values S_HE(0)~S_HE(7) of the high-frequency groups are obtained. The first device then encodes the frequency-domain coefficients S(0)~S(39) of the low-frequency signal together with the group envelope values S_HE(0)~S_HE(7) of the high-frequency signal to obtain a basic frame.
Further, in step S202, the encoding of the extended frame by the first apparatus may specifically include:
and the first device encodes the difference values obtained by the plurality of frequency domain coefficients of the high-frequency signal and the corresponding group envelope values by taking the second time length as a unit to obtain a plurality of extension frames with the second time length as a frame length.
For example, every 20 ms the first device may calculate the difference between each high-frequency-domain coefficient of the encoded high-frequency part of the basic frames and the group envelope value of its high-frequency group. Specifically, the corresponding group envelope value is subtracted from each high-frequency-domain coefficient to obtain the group envelope coefficient differences SD_HE(k), where k = 40~79. The calculation may be as follows:
SD_HE(40) = S(40) - S_HE(0);
SD_HE(41) = S(41) - S_HE(0);
……
SD_HE(45) = S(45) - S_HE(1);
SD_HE(46) = S(46) - S_HE(1);
……
SD_HE(78) = S(78) - S_HE(7);
SD_HE(79) = S(79) - S_HE(7).
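The per-coefficient residual computation above can be sketched as follows (names illustrative; the actual packing and any differential quantization of SD_HE are not shown):

```python
import numpy as np

def envelope_residuals(S_high: np.ndarray, S_HE: np.ndarray, group_size: int = 5) -> np.ndarray:
    """SD_HE(k) = S(k) - S_HE(group of k) for the 40 high-frequency coefficients."""
    return S_high - np.repeat(S_HE, group_size)

S_high = np.random.randn(40)               # S(40)..S(79) of one frame
S_HE = S_high.reshape(-1, 5).mean(axis=1)  # group envelope values sent in the basic frame
SD_HE = envelope_residuals(S_high, S_HE)   # packed into the 20 ms extended frame
```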
the first means may apply these sets of envelope coefficient differences SD every 20ms HE (40)~SD HE (79) And assembled into an extended frame for transmission to a second device. Wherein the first means may group the sets of envelope coefficient difference values SD HE (40)~SD HE (79) The transmission is directly packaged, or the transmission can be coded in a differential quantization mode.
2. Decoding side decoding process:

Based on the above encoding method, the second device receives a basic frame every first duration. If the basic frame includes a plurality of frequency-domain coefficients of the low-frequency signal and a plurality of group envelope values of the high-frequency signal, the second device obtains a plurality of frequency-domain coefficients of the high-frequency signal from the group envelope values of the high-frequency signal in the basic frame, and then performs inverse frequency-domain transformation on the frequency-domain coefficients of the low-frequency signal and the frequency-domain coefficients of the high-frequency signal to obtain a basic audio signal.

The second device receives an extended frame every second duration. If the extended frame includes the differences between a plurality of frequency-domain coefficients of the high-frequency signal and the corresponding group envelope values, the second device can obtain the plurality of frequency-domain coefficients of the high-frequency signal by combining these differences with the group envelope values of the high-frequency signal in the basic frame, and then perform inverse frequency-domain transformation on the frequency-domain coefficients of the low-frequency signal and the frequency-domain coefficients of the high-frequency signal to obtain an extended audio signal. The extended audio signal includes not only the low-frequency part but also the high-frequency part.
Taking the above embodiment as an example, the second device may receive a basic frame every 5 ms. The second device first obtains the low-frequency-domain coefficients of S(k) from the basic frame, i.e. S(0) to S(39). It then obtains the high-frequency coefficients from the high-frequency group envelope values in the basic frame, i.e. each high-frequency-domain coefficient is set equal to the corresponding group envelope value, that is:
S(40)=S(41)=S(42)=S(43)=S(44)=S_HE(0);
S(45)=S(46)=S(47)=S(48)=S(49)=S_HE(1);
S(50)=S(51)=S(52)=S(53)=S(54)=S_HE(2);
S(55)=S(56)=S(57)=S(58)=S(59)=S_HE(3);
S(60)=S(61)=S(62)=S(63)=S(64)=S_HE(4);
S(65)=S(66)=S(67)=S(68)=S(69)=S_HE(5);
S(70)=S(71)=S(72)=S(73)=S(74)=S_HE(6);
S(75)=S(76)=S(77)=S(78)=S(79)=S_HE(7).
In this way, S(40) to S(79) are obtained.

The low-frequency coefficients S(0) to S(39) decoded from the basic frame are combined with these coarse high-frequency coefficients S(40) to S(79), and inverse MDCT is performed on the resulting S(0) to S(79) to obtain the basic audio signal s_1(n). The basic audio signal s_1(n) has a low delay and includes both the high-frequency and low-frequency parts of the original audio signal. However, since the high-frequency part is restored only from the group envelope values, i.e. all coefficients within a group take the same value, the signal quality of the high-frequency part is slightly lower, which amounts to a reduced frequency-domain resolution of the high-frequency part.
The second device receives an extended frame every 20 ms and derives from it the group envelope coefficient differences SD_HE(40)~SD_HE(79) of the high-frequency part of the audio signal. The frequency-domain coefficients of the high-frequency part in each basic frame are then obtained from SD_HE(40)~SD_HE(79) as shown below, i.e. each high-frequency-domain coefficient is obtained by adding its group envelope coefficient difference to the corresponding group envelope value:
S(40) = SD_HE(40) + S_HE(0);
S(41) = SD_HE(41) + S_HE(0);
……
S(45) = SD_HE(45) + S_HE(1);
S(46) = SD_HE(46) + S_HE(1);
……
S(78) = SD_HE(78) + S_HE(7);
S(79) = SD_HE(79) + S_HE(7).
In this way, the complete high-frequency spectrum S(40) to S(79) is obtained.
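This refinement step is the exact inverse of the residual computation on the encoding side, as in the following sketch (names illustrative):

```python
import numpy as np

def restore_high_band(SD_HE: np.ndarray, S_HE: np.ndarray, group_size: int = 5) -> np.ndarray:
    """S(k) = SD_HE(k) + S_HE(group of k), recovering the refined S(40)..S(79)."""
    return SD_HE + np.repeat(S_HE, group_size)

S_HE = np.random.randn(8)     # group envelope values decoded from the basic frames
SD_HE = np.random.randn(40)   # differences decoded from the 20 ms extended frame
S_high = restore_high_band(SD_HE, S_HE)
```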
Combining these with the low-frequency coefficients S(0) to S(39) obtained by decoding the basic frames, inverse MDCT is performed on the resulting S(0) to S(79) to obtain the extended audio signal s_2(n). The extended audio signal s_2(n) includes both the high-frequency and low-frequency parts of the original audio signal, the high-frequency part being restored by combining the group envelope values with the group envelope coefficient differences. Therefore, compared with the basic audio signal s_1(n), the extended audio signal s_2(n) is restored with higher quality, but its delay is longer; in terms of real-time transmission, the basic audio signal s_1(n) is better than the extended audio signal s_2(n).
The third method,
1. Encoding process at the encoding side:

In one possible embodiment, when the first device needs to meet the requirements of three or more different audio applications on the second device, the first device may encode one basic frame and two or more extended frames.
The method specifically comprises the following steps: the first device obtains the basic frame by adopting a time domain coding mode with lower time delay and low quality, namely, only the low-frequency part in the second audio signal is coded. The first device obtains a first extended frame by adopting a high-delay and low-quality frequency domain coding mode, and the first extended frame only codes the frequency domain group envelope value of the high-frequency part in the second audio signal. And the first device adopts a higher-time-delay and high-quality frequency domain coding mode to obtain a second extended frame, and the second extended frame comprises a high-frequency part in the second audio signal.
For example, the second device has three different audio applications. The first is a device calibration and positioning application, which requires strong real-time performance in processing the audio signal, with a signal transmission delay of no more than 1 ms; its audio signal may contain only a low-frequency signal and no high-frequency signal. The second is a speech enhancement application, which also requires strong real-time performance, with a signal transmission delay of no more than 6 ms, and which has a high audio quality requirement, needing both the high-frequency and low-frequency signals in the audio signal. The third is a 3D sound field acquisition application, which does not have a high real-time requirement for processing the audio signal but has a high audio quality requirement.
In step S202, the encoding of the basic frame by the first apparatus may specifically refer to the encoding manner of the basic frame in the above manner one, and may include:
(1) down-sampling the second audio signal to obtain a low-frequency signal included in the second audio signal;
(2) and coding the low-frequency signal according to a time domain coding mode to obtain a plurality of basic frames taking the first duration as the frame length.
Illustratively, the first device may encode s_L(n) in the G.726 coding scheme and assemble basic frames at intervals of the first duration; for example, the first duration may be 0.5 ms, which meets the requirement of the first audio application.
Further, in step S202, the encoding of the first extended frame by the first device specifically refers to the encoding process of the extended frame in the first mode, which includes:
(1) performing frequency domain transformation on the second audio signal by taking the second duration as a frame length to obtain a frequency domain coefficient corresponding to the second audio signal;
(2) and averagely grouping a plurality of frequency domain coefficients of a high-frequency part in the frequency domain coefficients corresponding to the second audio signal according to the sequence from low frequency to high frequency to obtain a group envelope value of a plurality of high-frequency groups, and encoding according to an envelope encoding mode.
For example, the first device may perform MDCT on s(n) with a frame length of 5 ms at a sampling rate of 16 kHz, so that s(n) contains 80 sampling points and the MDCT frequency-domain coefficients S(0) to S(79) are obtained. The 40 high-frequency-domain coefficients S(40)~S(79) are evenly divided into 8 groups of five coefficients each, and the group envelope value S_HE(0)~S_HE(7) of each high-frequency group is obtained, the group envelope value being the average of the high-frequency-domain coefficients in the group. The first device may digitally encode the group envelope values S_HE(0)~S_HE(7) of the high-frequency groups obtained above; every 5 ms it assembles the encoded S_HE(0)~S_HE(7) into a first extended frame and sends it to the second device.
In combination with the above, the encoding of the second extension frame in step S202 may specifically refer to the encoding process of the extension frame in the above manner two, which includes:
the first device encodes, in a unit of a third time, the difference values obtained by the plurality of frequency domain coefficients of the high-frequency signal and the corresponding set of envelope values, to obtain a plurality of extended frames having a frame length of the third time.
For example, every 20 ms the first device may calculate the difference between each high-frequency-domain coefficient of the encoded high-frequency part of the first extended frame and the group envelope value of its high-frequency group. Specifically, the corresponding group envelope value is subtracted from each high-frequency-domain coefficient to obtain the group envelope coefficient differences SD_HE(40)~SD_HE(79). Every 20 ms the first device may then assemble these differences SD_HE(40)~SD_HE(79) into a second extended frame and send it to the second device.
2. Decoding side decoding process:
based on the above coding method, the second device receives a frame of basic frame every first time length, and then decodes the basic frame according to the time domain decoding method to obtain a basic audio signal, where the basic audio signal only contains a low frequency part relative to the original audio signal at the coding side.
The second device receives a first extended frame every second duration. If the first extended frame includes the group envelope values of a plurality of high-frequency groups, the second device obtains a plurality of frequency-domain coefficients of the high-frequency signal from these group envelope values, each coefficient being set to the group envelope value of its group; meanwhile, the basic audio signal obtained by decoding the basic frames is up-sampled to obtain a third audio signal, and frequency-domain transformation is performed on the third audio signal frame by frame to obtain a plurality of frequency-domain coefficients of the low-frequency signal corresponding to the third audio signal. Inverse frequency-domain transformation is then performed on the frequency-domain coefficients of the high-frequency signal and the frequency-domain coefficients of the low-frequency signal to obtain a first extended audio signal. The first extended audio signal includes a low-frequency signal and a high-frequency signal, but the high-frequency signal is of lower quality and the first extended audio signal has a longer delay, so it can be used by the second audio application.
The second device receives a frame of second extended frame every third time, and if the second extended frame comprises the difference values of the multiple frequency domain coefficients of the high-frequency signal and the corresponding group envelope values, the second device can obtain the multiple frequency domain coefficients of the high-frequency signal by combining the group envelope values of the high-frequency signal in the first extended frame, and then perform inverse frequency domain transform according to the multiple frequency domain coefficients of the low-frequency signal and the multiple frequency domain coefficients of the high-frequency signal to obtain a second extended audio signal. The second extension audio signal includes not only a low frequency part but also a high frequency part.
Illustratively, in conjunction with the above embodiment, the second device may receive a basic frame every 0.5 ms and decode it in the G.726 decoding manner to obtain the basic audio signal s_1(n). The basic audio signal s_1(n) contains only the low-frequency part, but its delay is as low as 0.5 ms. It can therefore be used by audio applications with low latency requirements, such as the device calibration and positioning application described above.
The second device may receive a first extended frame every 5 ms and derive from it the group envelope values S_HE(0)~S_HE(7) of the high-frequency part of the audio signal, from which it may derive the high-frequency-domain coefficients S(40) to S(79). The second device up-samples the audio signal s_L(n) obtained by decoding the basic frames received within the 5 ms to obtain the audio signal s'_L(n), and performs MDCT on s'_L(n) to obtain the low-frequency-domain coefficients S(0) to S(39). Performing inverse MDCT on S(0) to S(79) then yields the first extended audio signal s_2(n), which includes both the high-frequency and low-frequency parts, the high-frequency part being of slightly lower quality.
The second device may receive a second extended frame every 20 ms and derive from it the group envelope coefficient differences SD_HE(40)~SD_HE(79) of the high-frequency part of the audio signal. Combining SD_HE(40)~SD_HE(79) with the group envelope values S_HE(0)~S_HE(7) obtained from the first extended frame yields the frequency-domain coefficients S(40) to S(79) of the high-frequency part. Performing inverse MDCT on S(0) to S(79) yields the second extended audio signal s_3(n) of the 20 ms segment, which includes both the high-frequency and low-frequency parts; the high-frequency part of the second extended audio signal s_3(n) is of slightly better quality than that of the first extended audio signal s_2(n).
Through the implementation mode, more possible audio coding structures are provided, and the method and the device can be suitable for audio application with three or more different requirements, so that transmission bandwidth is saved, and system performance is improved.
The fourth way,
1. Encoding process at the encoding side:
in one possible embodiment, the first device may use a lower-latency, low-quality time-domain coding to obtain the basic frame, i.e. to encode only the low-frequency part of the second audio signal. The first device may obtain the extended frame by using a higher-delay and low-quality frequency domain coding method, and only encode the frequency domain group envelope value of the low frequency part and the frequency domain group envelope value of the high frequency part in the second audio signal.
In step S202, the encoding of the basic frame by the first apparatus may specifically refer to the encoding mode of the basic frame in the first mode, and may include:
(1) and downsampling the second audio signal to obtain a low-frequency signal included in the second audio signal.
(2) And coding the low-frequency signal according to a time domain coding mode to obtain a plurality of basic frames taking the first duration as the frame length.
Illustratively, the first device may encode s_L(n) in the G.726 coding mode and assemble basic frames at intervals of the first duration, where the first duration may be, for example, 0.5 ms.
Further, in step S202, the encoding of the extended frame by the first apparatus specifically refers to the encoding process of the extended frame in the first mode, and includes:
(1) and performing frequency domain transformation on the second audio signal by taking the second time length as a unit to obtain a frequency domain coefficient corresponding to the second audio signal.
(2) And averagely grouping a plurality of frequency domain coefficients of a high-frequency part in the frequency domain coefficients corresponding to the second audio signal according to the sequence from low frequency to high frequency to obtain a plurality of high-frequency grouped group envelope values, averagely grouping a plurality of frequency domain coefficients of a low-frequency part according to the sequence from low frequency to high frequency to obtain a plurality of low-frequency grouped group envelope values, and encoding according to an envelope encoding mode.
For example, the first device may perform MDCT on s(n) with a frame length of 5 ms at a sampling rate of 16 kHz, so that s(n) contains 80 sampling points and the coefficients S(0) to S(79) are obtained. The 40 low-frequency-domain coefficients S(0)~S(39) are evenly divided into 8 groups of five coefficients each, giving the group envelope value S_LE(0)~S_LE(7) of each low-frequency group. The 40 high-frequency-domain coefficients S(40)~S(79) are likewise evenly divided into 8 groups of five coefficients each, giving the group envelope value S_HE(0)~S_HE(7) of each high-frequency group, a group envelope value being the average of the frequency-domain coefficients in the group. The first device may digitally encode the group envelope values S_LE(0)~S_LE(7) of the low-frequency groups and the group envelope values S_HE(0)~S_HE(7) of the high-frequency groups; every 5 ms it assembles the encoded S_LE(0)~S_LE(7) and S_HE(0)~S_HE(7) into an extended frame and sends it to the second device.
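In this fourth mode both halves of the spectrum are reduced to group envelopes before being packed into the extended frame; a minimal sketch, assuming 80 coefficients split 40/40 and grouped in fives (names illustrative):

```python
import numpy as np

def band_envelopes(S: np.ndarray, group_size: int = 5):
    """Group envelope values for the low band (S_LE) and the high band (S_HE)."""
    low, high = S[:40], S[40:]
    S_LE = low.reshape(-1, group_size).mean(axis=1)    # eight low-frequency envelopes
    S_HE = high.reshape(-1, group_size).mean(axis=1)   # eight high-frequency envelopes
    return S_LE, S_HE

S = np.random.randn(80)          # MDCT coefficients of one 5 ms frame
S_LE, S_HE = band_envelopes(S)   # packed together into the extended frame
```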
2. Decoding side decoding process:

Based on the above coding method, the second device receives a basic frame every first duration and decodes it in the time-domain decoding manner to obtain a basic audio signal; relative to the original audio signal at the coding side, the basic audio signal contains only the low-frequency part.
The second device receives an extended frame every second duration. If the extended frame includes a plurality of group envelope values of the low-frequency signal and a plurality of group envelope values of the high-frequency signal, the second device obtains a plurality of frequency-domain coefficients of the low-frequency signal from the group envelope values of the low-frequency signal and a plurality of frequency-domain coefficients of the high-frequency signal from the group envelope values of the high-frequency signal. If the second device receives the basic frames normally, the frequency-domain coefficients of the low-frequency signal may instead be determined by performing frequency-domain transformation on the audio signal obtained by decoding the basic frames. If the second device does not receive the basic frames normally, it may determine the frequency-domain coefficients of the low-frequency signal from the group envelope values of the low-frequency signal in the extended frame, each low-frequency-domain coefficient being set to the group envelope value of its group. The second device may then perform inverse frequency-domain transformation on the frequency-domain coefficients of the low-frequency signal and the frequency-domain coefficients of the high-frequency signal to obtain an extended audio signal.
For example, if the second device receives the basic frames normally, e.g. it receives a basic frame every 0.5 ms, it decodes each basic frame in the G.726 decoding manner to obtain the basic audio signal s_1(n). The basic audio signal s_1(n) contains only the low-frequency part, but its delay is as low as 0.5 ms.
The second device receives an extended frame every 5 ms and obtains from it the group envelope values S_HE(0)~S_HE(7) of the high-frequency part of the audio signal, from which the high-frequency-domain coefficients S(40) to S(79) can be derived. The second device up-samples the audio signal s_L(n) obtained by decoding the basic frames received within the 5 ms to obtain the audio signal s'_L(n), and performs MDCT on s'_L(n) to obtain the low-frequency-domain coefficients S(0) to S(39). Performing inverse MDCT on S(0) to S(79) then yields the extended audio signal s_2(n), which includes both the high-frequency and low-frequency parts, the high-frequency part being of slightly lower quality.
Illustratively, if the second device does not receive the basic frames normally, for example the basic frames are lost or the received basic frames fail verification, the second device derives the low-frequency-domain coefficients S(0) to S(39) from the group envelope values S_LE(0)~S_LE(7) of the low-frequency part decoded from the extended frame, each low-frequency-domain coefficient being set equal to the group envelope value of its low-frequency group. Likewise, the second device derives the high-frequency-domain coefficients S(40) to S(79) from the group envelope values S_HE(0)~S_HE(7) of the high-frequency part decoded from the extended frame, each high-frequency-domain coefficient being set equal to the group envelope value of its high-frequency group. The second device then performs inverse MDCT on the S(0) to S(79) obtained by decoding the extended frames received within the 5 ms, which yields the extended audio signal s_2(n) containing both the high-frequency and low-frequency parts.
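When no basic frame is usable, the whole spectrum is rebuilt from the envelopes alone, as in this sketch (names illustrative):

```python
import numpy as np

def rebuild_from_envelopes(S_LE: np.ndarray, S_HE: np.ndarray, group_size: int = 5) -> np.ndarray:
    """Reconstruct S(0)..S(79) from the extended frame only: every
    coefficient takes the envelope value of its group."""
    return np.concatenate([np.repeat(S_LE, group_size), np.repeat(S_HE, group_size)])

S_LE = np.random.randn(8)                    # low-band envelopes from the extended frame
S_HE = np.random.randn(8)                    # high-band envelopes from the extended frame
S_full = rebuild_from_envelopes(S_LE, S_HE)  # then fed to the inverse MDCT
```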
According to the above embodiment, the decoding-side device can still decode from the extended frame when the audio signal cannot be recovered by normally decoding the basic frames, and can thus recover the entire audio signal.
To sum up, the embodiments provided in this application can transmit one audio stream with a single codec scheme, and the different audio signals obtained by decoding the basic frames or the extended frames can be applied to different audio applications. This avoids repeated encoding, decoding and transmission, largely avoids wasting bandwidth resources, and reduces system overhead. In addition, when a basic frame is lost at the decoding side and the audio signal cannot be recovered by decoding the basic frame, the decoding-side device can decode from the extended frame, which further improves the reliability of audio transmission.
In another possible implementation, before codec transmission of the audio signal, the encoding side device may communicate with the decoding side device in advance according to the encoding requirement of the audio application on the transmitted audio signal, and negotiate a specific codec mode. For example, according to the requirement of a low-delay and low-quality audio signal for a first audio application on the second device, the second device sends the audio signal request information to the first device, where the audio signal request information carries the configuration information, so as to indicate a coding mode corresponding to the audio signal request. Alternatively, when the first apparatus transmits the encoded frame to the second apparatus, the encoding mode of the encoded frame may be indicated by an agreed bit, for example, the first apparatus transmits a basic frame of the audio signal to the second apparatus, the basic frame includes two bits configured in advance, and for example, 01 may indicate a second encoding mode. It should be understood that the above-mentioned configuration of the codec is only an exemplary illustration, and is not limited to the above-mentioned two configurations, and this is not specifically limited in this embodiment of the present application.
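One way to picture the agreed-bit indication is a small mode field at the start of each encoded frame. The layout below is purely hypothetical and is not specified by this application; it only illustrates how two pre-configured bits could carry the coding mode (e.g. 0b01 for a second coding mode).

```python
def pack_frame(mode_bits: int, payload: bytes) -> bytes:
    """Hypothetical frame layout: one header byte whose two low-order bits
    carry the coding mode of the frame."""
    assert 0 <= mode_bits <= 3
    return bytes([mode_bits & 0x03]) + payload

def parse_mode(frame: bytes) -> int:
    """Read the 2-bit coding-mode indicator back out of the header byte."""
    return frame[0] & 0x03

frame = pack_frame(0b01, b"\x12\x34")   # a frame flagged with coding mode 01
mode = parse_mode(frame)                # -> 1
```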
The present application also provides an audio processing apparatus, as shown in fig. 6, the apparatus 600 may include a preprocessing module 601, an encoding module 602, and a transmitting module 603.
The preprocessing module 601 may be configured to sample and quantize the acquired first audio signal to obtain a second audio signal.
The encoding module 602 may be configured to encode the second audio signal in a first encoding manner by using a first time length as a unit to obtain a basic frame, and encode the second audio signal in a second encoding manner by using a second time length as a unit to obtain an extended frame, where the second time length is greater than the first time length, and the first encoding manner and the second encoding manner respectively encode different signals carried in the second audio signal, and/or respectively encode the second audio signal in different encoding degrees.
The sending module 603 may be configured to send the basic frame and the extended frame to the second apparatus.
In a possible design manner, the second time duration is N times the first time duration, and N is a natural number greater than or equal to 2.
In one possible design, the encoding module 602 may specifically be configured to: down-sample the second audio signal to obtain the low-frequency signal carried in the second audio signal; and encode the low-frequency signal in a time-domain coding mode to obtain a plurality of basic frames taking the first duration as the frame length.
In one possible design, the encoding module 602 may specifically be configured to: performing frequency domain transformation on the second audio signal to obtain a frequency domain coefficient corresponding to the second audio signal; averagely grouping a plurality of frequency domain coefficients of a high-frequency part in the frequency domain coefficients corresponding to the second audio signal according to a sequence from low frequency to high frequency to obtain a group envelope value of a plurality of high-frequency groups, wherein the group envelope value is an average value of the plurality of high-frequency domain coefficients in each group; and coding according to the group envelope value to obtain a plurality of extended frames taking the second duration as the frame length.
In one possible design, the encoding module 602 may specifically be configured to: performing frequency domain transformation on the second audio signal to obtain a plurality of frequency domain coefficients of the low-frequency signal and a plurality of frequency domain coefficients of the high-frequency signal corresponding to the second audio signal; averagely grouping a plurality of frequency domain coefficients of the high-frequency signal according to the sequence from low frequency to high frequency to obtain a group envelope value of a plurality of high-frequency groups, wherein the group envelope value is the average value of the plurality of high-frequency domain coefficients in each group; and coding according to the plurality of frequency domain coefficients of the low-frequency signal and the group envelope value of the high-frequency signal to obtain a plurality of basic frames taking the first duration as the frame length.
In one possible design, the encoding module 602 may specifically be configured to: and coding the difference values obtained by the plurality of frequency domain coefficients of the high-frequency signal and the corresponding group envelope values by taking the second time length as a unit to obtain a plurality of extension frames taking the second time length as a frame length.
In one possible design, the encoding module 602 may specifically be configured to: performing frequency domain transformation on the second audio signal to obtain a plurality of frequency domain coefficients of the low-frequency signal and a plurality of frequency domain coefficients of the high-frequency signal corresponding to the second audio signal; grouping the frequency domain coefficients of the low-frequency signal and the frequency domain coefficients of the high-frequency signal respectively to obtain corresponding group envelope values, wherein the group envelope values are average values of the frequency domain coefficients in each group; and coding according to the group envelope value to obtain a plurality of extended frames taking the second duration as the frame length.
In a possible design, the frequency domain transform in the above embodiment may be specifically a modified discrete cosine transform MDCT algorithm.
The present application further provides an audio signal processing apparatus, as shown in fig. 7, the apparatus 700 includes a receiving module 701 and a decoding module 702.
The receiving module 701 may be configured to receive a basic frame and an extended frame sent from a first apparatus, where a frame length of the extended frame is greater than a frame length of the basic frame, and the extended frame is obtained by recoding an audio signal corresponding to a plurality of basic frames.
A decoding module 702, configured to decode the basic frame to obtain a basic audio signal; or jointly decoding the basic frame and the extended frame to obtain the extended audio signal.
In one possible design, the decoding module 702 may be specifically configured to: and decoding the basic frame according to a time domain coding and decoding mode to obtain a basic audio signal.
In one possible design, the decoding module 702 may be specifically configured to: if the extended frame comprises group envelope values of a plurality of high-frequency signals, obtaining a plurality of frequency domain coefficients of the high-frequency signals according to the group envelope values of the plurality of high-frequency signals, wherein the frequency domain coefficients of the high-frequency signals are the group envelope values corresponding to the frequency domain coefficients; up-sampling the basic audio signal to obtain a third audio signal; performing frequency domain transformation on the third audio signal frame by frame to obtain a plurality of frequency domain coefficients of the low-frequency signal corresponding to the third audio signal; and performing inverse frequency domain transformation according to the plurality of frequency domain coefficients of the high-frequency signal and the plurality of frequency domain coefficients of the low-frequency signal to obtain the expanded audio signal.
In one possible design, the decoding module 702 may be specifically configured to: if the basic frame comprises a plurality of frequency domain coefficients of the low-frequency signal and a plurality of group envelope values of the high-frequency signal, obtaining the plurality of frequency domain coefficients of the low-frequency signal and the plurality of frequency domain coefficients of the high-frequency signal according to the basic frame, wherein the plurality of frequency domain coefficients of the high-frequency signal are the group envelope values corresponding to the frequency domain coefficients; and performing inverse frequency domain transformation according to the plurality of frequency domain coefficients of the low-frequency signal and the plurality of frequency domain coefficients of the high-frequency signal to obtain a basic audio signal.
In one possible design, the decoding module 702 may be specifically configured to: if the extended frame comprises the difference values of the plurality of frequency domain coefficients of the high-frequency signal and the corresponding group envelope values, obtaining the plurality of frequency domain coefficients of the high-frequency signal according to the plurality of group envelope values of the high-frequency signal and the difference values of the plurality of frequency domain coefficients of the high-frequency signal and the corresponding group envelope values; and performing inverse frequency domain transformation according to the plurality of frequency domain coefficients of the low-frequency signal and the plurality of frequency domain coefficients of the high-frequency signal to obtain the extended audio signal.
In one possible design, the decoding module 702 may be specifically configured to: if the extended frame comprises a plurality of groups of envelope values of the low-frequency signal and a plurality of groups of envelope values of the high-frequency signal, obtaining a plurality of frequency domain coefficients of the low-frequency signal according to the plurality of groups of envelope values of the low-frequency signal, and obtaining a plurality of frequency domain coefficients of the high-frequency signal according to the plurality of groups of envelope values of the high-frequency signal; the method comprises the steps that a plurality of frequency domain coefficients of a low-frequency signal are determined by performing frequency domain transformation on a basic audio signal obtained from a basic frame, or the frequency domain coefficients of the low-frequency signal are determined by a plurality of group envelope values of the low-frequency signal in an extended frame, and the plurality of frequency domain coefficients of the low-frequency signal are group envelope values corresponding to the frequency domain coefficients; and performing inverse frequency domain transformation according to the plurality of frequency domain coefficients of the low-frequency signal and the plurality of frequency domain coefficients of the high-frequency signal to obtain the expanded audio signal.
In a possible design manner, the inverse frequency domain transform in the above embodiment may be specifically a modified inverse discrete cosine transform algorithm.
In one possible design, the set of envelope values includes an average value of the plurality of frequency domain coefficients in each set, which is obtained by averagely grouping the plurality of frequency domain coefficients in order from low frequency to high frequency.
It is to be understood that, when the audio signal processing apparatus is an electronic device, the transmitting module may be a transmitter, and may include an antenna, a radio frequency circuit, and the like, and the preprocessing module, the encoding module, and the decoding module may be a processor, such as a baseband chip, and the like. When the audio signal processing apparatus is a component having the functions of the first apparatus or the second apparatus, the transmitting module may be a radio frequency unit, and the preprocessing module, the encoding module, and the decoding module may be a processor. When the audio signal processing apparatus is a chip system, the sending module may be an output interface of the chip system, and the preprocessing module, the encoding module and the decoding module may be processors of the chip system, for example: a Central Processing Unit (CPU).
It should be noted that, for the specific implementation process and embodiment in the apparatus 600, reference may be made to the steps executed by the first apparatus in the foregoing method embodiment and the related descriptions, for the specific implementation process and embodiment in the apparatus 700, reference may be made to the steps executed by the second apparatus in the foregoing method embodiment and the related descriptions, and for the technical problem to be solved and the technical effect to be brought about, reference may also be made to the contents described in the foregoing embodiment, and details are not repeated here.
In the present embodiment, the audio signal processing apparatus is presented in a form in which the respective functional modules are divided in an integrated manner. "module" herein may refer to a specific circuit, a processor and memory that execute one or more software or firmware programs, an integrated logic circuit, and/or other devices that may provide the described functionality. In a simple embodiment, the audio signal processing means may take the form as shown in fig. 8 below, as will be appreciated by those skilled in the art.
Fig. 8 is a schematic structural diagram of an exemplary electronic device 800 shown in an embodiment of the present application. The electronic device 800 may be the first device or the second device in the foregoing embodiments and is configured to execute the audio signal processing method in the foregoing embodiments. As shown in fig. 8, the electronic device 800 may include at least one processor 801, a communication link 802, and a memory 803.
The processor 801 may be a general processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits.
Communication link 802, which may comprise a communication path such as a bus, may carry information between the aforementioned components.
The memory 803 may be, but is not limited to, a read-only memory (ROM) or other type of static storage device that can store static information and instructions, a Random Access Memory (RAM) or other type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disk read-only memory (CD-ROM) or other optical disk storage, optical disk storage (including compact disk, laser disk, optical disk, digital versatile disk, blu-ray disk, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory may be separate and coupled to the processor via a communication line 802. The memory may also be integrated with the processor. The memory provided by the embodiment of the application is generally a nonvolatile memory. The memory 803 is used for storing computer program instructions related to the implementation of the solution of the embodiment of the present application, and is controlled and executed by the processor 801. The processor 801 is configured to execute the computer program instructions stored in the memory 803, thereby implementing the methods provided by the embodiments of the present application.
Optionally, the computer program instructions in the embodiments of the present application may also be referred to as application program codes, which are not specifically limited in the embodiments of the present application.
In particular implementations, processor 801 may include one or more CPUs such as CPU0 and CPU1 in fig. 8, for example, as an example.
In particular implementations, electronic device 800 may include multiple processors, such as processor 801 and processor 807 in FIG. 8, for example, as an embodiment. These processors may be single-core (single-CPU) processors or multi-core (multi-CPU) processors. A processor herein may refer to one or more devices, circuits, and/or processing cores for processing data (e.g., computer program instructions).
In particular implementations, electronic device 800 may also include a communications interface 804, as one embodiment. The electronic device may receive and transmit data through a communication interface 804, or communicate with other devices or a communication network, where the communication interface 804 may be, for example, an ethernet interface, a Radio Access Network (RAN), a Wireless Local Area Network (WLAN) interface, or a USB interface.
In particular implementations, electronic device 800 may also include an output device 805 and an input device 806, as one embodiment. The output device 805 is in communication with the processor 801 and may display information in a variety of ways. For example, the output device 805 may be a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display device, a Cathode Ray Tube (CRT) display device, a projector (projector), or the like. The input device 806 is in communication with the processor 801 and may receive user input in a variety of ways. For example, the input device 806 may be a mouse, a keyboard, a touch screen device, or a sensing device, among others.
In a specific implementation, the electronic device 800 may be a desktop computer, a laptop computer, a web server, a Personal Digital Assistant (PDA), a mobile phone, a tablet, a wireless terminal device, an embedded device, or a device with a structure similar to that in fig. 8. The embodiment of the present application does not limit the type of the electronic device 800.
In some embodiments, the processor 801 in fig. 8 may cause the electronic device 800 to perform the methods in the above-described method embodiments by calling computer program instructions stored in the memory 803.
Illustratively, the functions/implementation processes of the processing modules in fig. 6 or fig. 7 may be implemented by the processor 801 in fig. 8 by invoking the computer program instructions stored in the memory 803. For example, the functions/implementation processes of the preprocessing module 601 and the encoding module 602 in fig. 6 may be implemented by the processor 801 in fig. 8 by calling the computer-executable instructions stored in the memory 803, and the functions/implementation processes of the receiving module 701 and the decoding module 702 in fig. 7 may be implemented by the processor 801 in fig. 8 by calling the computer-executable instructions stored in the memory 803.
In an exemplary embodiment, a computer-readable storage medium including instructions is further provided, where the instructions may be executed by the processor 801 of the electronic device 800 to perform the audio signal processing method of the above-described embodiments. For the technical effects that can be obtained, reference may be made to the foregoing method embodiments; details are not described herein again.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When a software program is used for implementation, the embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains.
Finally, it should be noted that: the above description is only an embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions within the technical scope of the present disclosure should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (38)

  1. A method of audio signal processing, the method comprising:
    the first device samples and quantizes the acquired first audio signal to obtain a second audio signal;
    encoding the second audio signal in a first encoding manner by taking a first duration as a unit to obtain a basic frame;
    encoding the second audio signal in a second encoding manner by taking a second duration as a unit to obtain an extended frame, wherein the second duration is greater than the first duration, and the first encoding manner and the second encoding manner respectively encode different signals carried in the second audio signal and/or respectively encode the second audio signal to different degrees;
    and sending the basic frame and the extended frame to a second device.
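For illustration only, outside the claims: a minimal sketch of the dual-granularity framing described in claim 1, assuming PCM samples are already available and that the hypothetical parameters base_len (samples per first duration) and n (the second duration as a multiple of the first) are chosen by the caller.

    # Illustrative sketch only; `base_len` and `n` are example parameters.
    def split_into_frames(samples, base_len, n):
        """Split a sample sequence into basic frames of `base_len` samples and
        extended frames covering `n` consecutive basic frames."""
        ext_len = n * base_len
        base_frames = [samples[i:i + base_len] for i in range(0, len(samples), base_len)]
        ext_frames = [samples[i:i + ext_len] for i in range(0, len(samples), ext_len)]
        return base_frames, ext_frames

Each extended frame thus covers the same samples as N consecutive basic frames, which is what allows a receiver to decode the basic frames alone or the basic and extended frames jointly, as described in claim 9.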
  2. The method of claim 1, wherein the second duration is N times the first duration, and wherein N is a natural number greater than or equal to 2.
  3. The method according to claim 1 or 2, wherein the encoding the second audio signal in the first encoding manner by taking the first duration as a unit to obtain the basic frame specifically comprises:
    down-sampling the second audio signal to obtain a low-frequency signal carried in the second audio signal;
    and coding the low-frequency signal according to a time domain coding mode to obtain a plurality of basic frames taking the first duration as a frame length.
  4. The method according to claim 3, wherein the encoding the second audio signal in the second encoding manner by using the second duration as a unit to obtain the extended frame specifically comprises:
    performing frequency domain transformation on the second audio signal to obtain a frequency domain coefficient corresponding to the second audio signal;
    evenly grouping a plurality of frequency domain coefficients of a high-frequency part in the frequency domain coefficients corresponding to the second audio signal in order from low frequency to high frequency to obtain group envelope values of a plurality of high-frequency groups, wherein each group envelope value is the average value of the plurality of frequency domain coefficients of the high-frequency part in the group;
    and encoding according to the group envelope values to obtain a plurality of extended frames with the second duration as a frame length.
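For illustration only, outside the claims: a minimal numpy sketch of the grouping described in claim 4, assuming the high-frequency coefficients are already ordered from low to high frequency and that a hypothetical group size group_size divides them evenly.

    import numpy as np

    def group_envelope_values(hf_coeffs, group_size):
        """Evenly group the high-frequency coefficients (low to high frequency)
        and return the average of each group as its group envelope value."""
        n_groups = len(hf_coeffs) // group_size
        groups = np.asarray(hf_coeffs, dtype=float)[:n_groups * group_size].reshape(n_groups, group_size)
        return groups.mean(axis=1)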
  5. The method according to claim 1 or 2, wherein the encoding the second audio signal in the first encoding manner by taking the first duration as a unit to obtain the basic frame specifically comprises:
    performing frequency domain transformation on the second audio signal to obtain a plurality of frequency domain coefficients of a low-frequency signal and a plurality of frequency domain coefficients of a high-frequency signal corresponding to the second audio signal;
    evenly grouping the plurality of frequency domain coefficients of the high-frequency signal in order from low frequency to high frequency to obtain group envelope values of a plurality of high-frequency groups, wherein each group envelope value is the average value of the plurality of frequency domain coefficients of the high-frequency signal in the group;
    and encoding according to the plurality of frequency domain coefficients of the low-frequency signal and the group envelope values of the high-frequency signal to obtain a plurality of basic frames with the first duration as a frame length.
  6. The method according to claim 4 or 5, wherein the encoding the second audio signal in the second encoding manner by taking the second duration as a unit to obtain the extended frame specifically comprises:
    and encoding, by taking the second duration as a unit, the differences between the plurality of frequency domain coefficients of the high-frequency signal and the corresponding group envelope values, to obtain a plurality of extended frames with the second duration as a frame length.
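For illustration only, outside the claims: the differences encoded in claim 6 can be sketched as follows, assuming envelopes was produced by the grouping sketch above and the same hypothetical group_size.

    import numpy as np

    def high_frequency_residuals(hf_coeffs, envelopes, group_size):
        """Expand each group envelope value so it aligns with the coefficients
        of its group, then take the per-coefficient difference."""
        expanded = np.repeat(np.asarray(envelopes, dtype=float), group_size)
        return np.asarray(hf_coeffs, dtype=float)[:len(expanded)] - expanded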
  7. The method according to claim 3, wherein the encoding the second audio signal in the second encoding manner by using the second duration as a unit to obtain the extended frame further comprises:
    performing frequency domain transformation on the second audio signal to obtain a plurality of frequency domain coefficients of a low-frequency signal and a plurality of frequency domain coefficients of a high-frequency signal corresponding to the second audio signal;
    evenly grouping the plurality of frequency domain coefficients of the low-frequency signal and the plurality of frequency domain coefficients of the high-frequency signal respectively in order from low frequency to high frequency to obtain corresponding group envelope values, wherein each group envelope value is the average value of the plurality of frequency domain coefficients in the group;
    and encoding according to the group envelope values to obtain a plurality of extended frames with the second duration as a frame length.
  8. The method according to any of claims 4-7, wherein frequency-domain transforming the second audio signal comprises:
    and obtaining MDCT frequency domain coefficients corresponding to the second audio signal according to a modified discrete cosine transform (MDCT) algorithm.
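For illustration only, outside the claims: a direct-form MDCT corresponding to the frequency domain transform named in claim 8. This textbook O(N^2) formulation is an assumption chosen for readability; practical encoders use a windowed, 50%-overlapped, FFT-based fast MDCT.

    import numpy as np

    def mdct(frame):
        """Direct-form MDCT: 2N time-domain samples in, N frequency domain
        coefficients out, using the standard basis
        cos(pi/N * (n + 0.5 + N/2) * (k + 0.5))."""
        two_n = len(frame)
        n = two_n // 2
        ks = np.arange(n)
        ns = np.arange(two_n)
        basis = np.cos(np.pi / n * (ns[None, :] + 0.5 + n / 2) * (ks[:, None] + 0.5))
        return basis @ np.asarray(frame, dtype=float)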
  9. A method of audio signal processing, the method comprising:
    receiving, by a second device, a basic frame and an extended frame sent by a first device, wherein a frame length of the extended frame is greater than a frame length of the basic frame;
    decoding the basic frame to obtain a basic audio signal;
    or,
    and jointly decoding the basic frame and the extended frame to obtain an extended audio signal.
  10. The method according to claim 9, wherein said decoding the basic frame to obtain a basic audio signal specifically comprises:
    and decoding the basic frame according to a time domain coding and decoding mode to obtain the basic audio signal.
  11. The method according to claim 9 or 10, wherein the jointly decoding the basic frame and the extended frame to obtain an extended audio signal comprises:
    if the extended frame comprises a plurality of group envelope values of the high-frequency signal, obtaining a plurality of frequency domain coefficients of the high-frequency signal according to the plurality of group envelope values of the high-frequency signal, wherein each of the plurality of frequency domain coefficients of the high-frequency signal is the group envelope value corresponding to that frequency domain coefficient;
    up-sampling the basic audio signal to obtain a third audio signal;
    performing frequency domain transformation on the third audio signal frame by frame to obtain a plurality of frequency domain coefficients of a low-frequency signal corresponding to the third audio signal;
    and performing inverse frequency domain transformation according to the plurality of frequency domain coefficients of the high-frequency signal and the plurality of frequency domain coefficients of the low-frequency signal to obtain the extended audio signal.
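For illustration only, outside the claims: a sketch of the joint decoding path of claim 11, assuming hypothetical helpers mdct and imdct (such as the sketches accompanying claims 8 and 15), a zero-order-hold up-sampler as a simple stand-in for the actual up-sampling, and the same hypothetical group_size; per-frame bookkeeping and windowing are omitted.

    import numpy as np

    def decode_extended(base_audio, hf_envelopes, group_size, upsample_factor, mdct, imdct):
        """Rebuild an extended audio signal from a decoded basic audio signal
        and the high-frequency group envelope values of an extended frame."""
        # Each high-frequency coefficient takes the value of its group envelope.
        hf_coeffs = np.repeat(np.asarray(hf_envelopes, dtype=float), group_size)
        # Up-sample the basic audio signal to obtain the third audio signal.
        third_audio = np.repeat(np.asarray(base_audio, dtype=float), upsample_factor)
        # Low-frequency coefficients from a frequency domain transform of the
        # up-sampled signal.
        lf_coeffs = mdct(third_audio)
        # Combine low- and high-frequency coefficients and invert the transform.
        return imdct(np.concatenate([lf_coeffs, hf_coeffs]))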
  12. The method according to claim 9, wherein said decoding the basic frame to obtain a basic audio signal specifically comprises:
    if the basic frame comprises a plurality of frequency domain coefficients of a low-frequency signal and a plurality of group envelope values of a high-frequency signal, obtaining the plurality of frequency domain coefficients of the low-frequency signal and a plurality of frequency domain coefficients of the high-frequency signal according to the basic frame, wherein each of the plurality of frequency domain coefficients of the high-frequency signal is the group envelope value corresponding to that frequency domain coefficient;
    and performing inverse frequency domain transform according to the plurality of frequency domain coefficients of the low-frequency signal and the plurality of frequency domain coefficients of the high-frequency signal to obtain the basic audio signal.
  13. The method according to claim 11 or 12, wherein the jointly decoding the basic frame and the extended frame to obtain an extended audio signal comprises:
    if the extended frame comprises the differences between the plurality of frequency domain coefficients of the high-frequency signal and the corresponding group envelope values, obtaining the plurality of frequency domain coefficients of the high-frequency signal according to the plurality of group envelope values of the high-frequency signal and the differences;
    and performing inverse frequency domain transformation according to the plurality of frequency domain coefficients of the low-frequency signal and the plurality of frequency domain coefficients of the high-frequency signal to obtain the extended audio signal.
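For illustration only, outside the claims: reconstructing the high-frequency coefficients of claim 13 from the group envelope values and the transmitted differences, assuming the same hypothetical group_size as in the encoding sketches.

    import numpy as np

    def reconstruct_hf_coeffs(envelopes, residuals, group_size):
        """Each coefficient is its group envelope value plus the per-coefficient
        difference carried in the extended frame."""
        expanded = np.repeat(np.asarray(envelopes, dtype=float), group_size)
        return expanded + np.asarray(residuals, dtype=float)[:len(expanded)]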
  14. The method according to claim 10, wherein the jointly decoding the basic frame and the extended frame to obtain an extended audio signal comprises:
    if the extended frame comprises a plurality of group envelope values of the low-frequency signal and a plurality of group envelope values of the high-frequency signal, obtaining a plurality of frequency domain coefficients of the low-frequency signal according to the plurality of group envelope values of the low-frequency signal, and obtaining a plurality of frequency domain coefficients of the high-frequency signal according to the plurality of group envelope values of the high-frequency signal;
    wherein the plurality of frequency domain coefficients of the low-frequency signal are determined by performing frequency domain transformation on the basic audio signal obtained from the basic frame, or the plurality of frequency domain coefficients of the low-frequency signal are determined by using the plurality of group envelope values of the low-frequency signal in the extended frame, in which case each frequency domain coefficient of the low-frequency signal is the group envelope value corresponding to that frequency domain coefficient;
    and performing inverse frequency domain transformation according to the plurality of frequency domain coefficients of the low-frequency signal and the plurality of frequency domain coefficients of the high-frequency signal to obtain the extended audio signal.
  15. The method according to any of claims 11-14, wherein the inverse frequency domain transformation is performed based on frequency domain coefficients, and specifically comprises:
    and obtaining the time-domain audio signal corresponding to the frequency domain coefficients according to an inverse modified discrete cosine transform (IMDCT) algorithm.
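For illustration only, outside the claims: a direct-form inverse MDCT matching the forward sketch given with claim 8; the 1/N normalization is one common convention (an assumption here), and a real decoder also applies a synthesis window with 50% overlap-add between frames.

    import numpy as np

    def imdct(coeffs):
        """Direct-form inverse MDCT: N frequency domain coefficients in,
        2N time-domain samples out."""
        coeffs = np.asarray(coeffs, dtype=float)
        n = len(coeffs)
        ks = np.arange(n)
        ns = np.arange(2 * n)
        basis = np.cos(np.pi / n * (ns[:, None] + 0.5 + n / 2) * (ks[None, :] + 0.5))
        return (basis @ coeffs) / n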
  16. The method according to any of claims 11-15, wherein the group envelope value is the average value of the plurality of frequency domain coefficients in each group, the groups being obtained by evenly grouping the plurality of frequency domain coefficients in order from low frequency to high frequency.
  17. An audio signal processing apparatus, characterized in that the apparatus comprises:
    the preprocessing module is used for sampling and quantizing the acquired first audio signal to obtain a second audio signal;
    the encoding module is configured to encode the second audio signal in a first encoding manner by taking a first duration as a unit to obtain a basic frame, and encode the second audio signal in a second encoding manner by taking a second duration as a unit to obtain an extended frame, wherein the second duration is greater than the first duration, and the first encoding manner and the second encoding manner respectively encode different signals carried in the second audio signal and/or respectively encode the second audio signal to different degrees;
    a sending module, configured to send the basic frame and the extended frame to a second device.
  18. The apparatus of claim 17, wherein the second duration is N times the first duration, and wherein N is a natural number greater than or equal to 2.
  19. The apparatus of claim 18, wherein the encoding module is specifically configured to:
    down-sampling the second audio signal to obtain a low-frequency signal carried in the second audio signal;
    and coding the low-frequency signal according to a time domain coding mode to obtain a plurality of basic frames taking the first duration as a frame length.
  20. The apparatus of claim 19, wherein the encoding module is specifically configured to:
    performing frequency domain transformation on the second audio signal to obtain a frequency domain coefficient corresponding to the second audio signal;
    evenly grouping a plurality of frequency domain coefficients of a high-frequency part in the frequency domain coefficients corresponding to the second audio signal in order from low frequency to high frequency to obtain group envelope values of a plurality of high-frequency groups, wherein each group envelope value is the average value of the plurality of frequency domain coefficients of the high-frequency part in the group;
    and encoding according to the group envelope values to obtain a plurality of extended frames with the second duration as a frame length.
  21. The apparatus of claim 18, wherein the encoding module is specifically configured to:
    performing frequency domain transformation on the second audio signal to obtain a plurality of frequency domain coefficients of a low-frequency signal and a plurality of frequency domain coefficients of a high-frequency signal corresponding to the second audio signal;
    evenly grouping the plurality of frequency domain coefficients of the high-frequency signal in order from low frequency to high frequency to obtain group envelope values of a plurality of high-frequency groups, wherein each group envelope value is the average value of the plurality of frequency domain coefficients of the high-frequency signal in the group;
    and encoding according to the plurality of frequency domain coefficients of the low-frequency signal and the group envelope values of the high-frequency signal to obtain a plurality of basic frames with the first duration as a frame length.
  22. The apparatus according to claim 20 or 21, wherein the encoding module is specifically configured to:
    and encoding, by taking the second duration as a unit, the differences between the plurality of frequency domain coefficients of the high-frequency signal and the corresponding group envelope values, to obtain a plurality of extended frames with the second duration as a frame length.
  23. The apparatus of claim 19, wherein the encoding module is specifically configured to:
    performing frequency domain transformation on the second audio signal to obtain a plurality of frequency domain coefficients of a low-frequency signal and a plurality of frequency domain coefficients of a high-frequency signal corresponding to the second audio signal;
    evenly grouping the plurality of frequency domain coefficients of the low-frequency signal and the plurality of frequency domain coefficients of the high-frequency signal respectively in order from low frequency to high frequency to obtain corresponding group envelope values, wherein each group envelope value is the average value of the plurality of frequency domain coefficients in the group;
    and encoding according to the group envelope values to obtain a plurality of extended frames with the second duration as a frame length.
  24. The apparatus according to any of claims 20-23, wherein the frequency domain transformation is specifically a modified discrete cosine transform (MDCT) algorithm.
  25. An audio signal processing apparatus, characterized in that the apparatus comprises:
    a receiving module, configured to receive a basic frame and an extended frame sent from a first apparatus, where a frame length of the extended frame is greater than a frame length of the basic frame, and the extended frame is obtained by recoding audio signals corresponding to multiple basic frames;
    the decoding module is used for decoding the basic frame to obtain a basic audio signal; or jointly decoding the basic frame and the extended frame to obtain an extended audio signal.
  26. The apparatus of claim 25, wherein the decoding module is specifically configured to:
    and decoding the basic frame according to a time domain coding and decoding mode to obtain the basic audio signal.
  27. The apparatus of claim 25 or 26, wherein the decoding module is specifically configured to:
    if the extended frame comprises a plurality of group envelope values of the high-frequency signal, obtaining a plurality of frequency domain coefficients of the high-frequency signal according to the plurality of group envelope values of the high-frequency signal, wherein each of the plurality of frequency domain coefficients of the high-frequency signal is the group envelope value corresponding to that frequency domain coefficient;
    up-sampling the basic audio signal to obtain a third audio signal;
    performing frequency domain transformation on the third audio signal frame by frame to obtain a plurality of frequency domain coefficients of a low-frequency signal corresponding to the third audio signal;
    and performing inverse frequency domain transformation according to the plurality of frequency domain coefficients of the high-frequency signal and the plurality of frequency domain coefficients of the low-frequency signal to obtain the extended audio signal.
  28. The apparatus of claim 25, wherein the decoding module is specifically configured to:
    if the basic frame comprises a plurality of frequency domain coefficients of a low-frequency signal and a plurality of group envelope values of a high-frequency signal, obtaining the plurality of frequency domain coefficients of the low-frequency signal and a plurality of frequency domain coefficients of the high-frequency signal according to the basic frame, wherein each of the plurality of frequency domain coefficients of the high-frequency signal is the group envelope value corresponding to that frequency domain coefficient;
    and performing inverse frequency domain transform according to the plurality of frequency domain coefficients of the low-frequency signal and the plurality of frequency domain coefficients of the high-frequency signal to obtain the basic audio signal.
  29. The apparatus of claim 27 or 28, wherein the decoding module is specifically configured to:
    if the extended frame comprises the differences between the plurality of frequency domain coefficients of the high-frequency signal and the corresponding group envelope values, obtaining a plurality of frequency domain coefficients of the high-frequency signal according to the plurality of group envelope values of the high-frequency signal and the differences;
    and performing inverse frequency domain transformation according to the plurality of frequency domain coefficients of the low-frequency signal and the plurality of frequency domain coefficients of the high-frequency signal to obtain the extended audio signal.
  30. The apparatus of claim 26, wherein the decoding module is specifically configured to:
    if the extended frame comprises a plurality of group envelope values of the low-frequency signal and a plurality of group envelope values of the high-frequency signal, obtaining a plurality of frequency domain coefficients of the low-frequency signal according to the plurality of group envelope values of the low-frequency signal, and obtaining a plurality of frequency domain coefficients of the high-frequency signal according to the plurality of group envelope values of the high-frequency signal;
    wherein the plurality of frequency domain coefficients of the low-frequency signal are determined by performing frequency domain transformation on the basic audio signal obtained from the basic frame, or the plurality of frequency domain coefficients of the low-frequency signal are determined by using the plurality of group envelope values of the low-frequency signal in the extended frame, in which case each frequency domain coefficient of the low-frequency signal is the group envelope value corresponding to that frequency domain coefficient;
    and performing inverse frequency domain transformation according to the plurality of frequency domain coefficients of the low-frequency signal and the plurality of frequency domain coefficients of the high-frequency signal to obtain the extended audio signal.
  31. The apparatus according to any of claims 27-30, wherein the inverse frequency domain transform is specifically an inverse modified discrete cosine transform (IMDCT) algorithm.
  32. The apparatus according to any of claims 27-31, wherein the group envelope value is the average value of the plurality of frequency domain coefficients in each group, the groups being obtained by evenly grouping the plurality of frequency domain coefficients in order from low frequency to high frequency.
  33. An electronic device, characterized in that the electronic device comprises:
    a processor and a transmission interface;
    a memory for storing the processor-executable instructions;
    wherein the processor is configured to execute the instructions to cause the electronic device to implement the audio signal processing method of any one of claims 1 to 8.
  34. An electronic device, characterized in that the electronic device comprises:
    a processor and a transmission interface;
    a memory for storing the processor-executable instructions;
    wherein the processor is configured to execute the instructions to cause the electronic device to implement the audio signal processing method of any of claims 9 to 16.
  35. A computer-readable storage medium whose instructions, when executed by a processor of an electronic device, enable the electronic device to perform the audio signal processing method of any one of claims 1 to 8.
  36. A computer program product which, when run on a computer, causes the computer to carry out the audio signal processing method of any one of claims 1 to 8.
  37. A computer-readable storage medium whose instructions, when executed by a processor of an electronic device, enable the electronic device to perform the audio signal processing method of any of claims 9 to 16.
  38. A computer program product which, when run on a computer, causes the computer to perform the audio signal processing method of any one of claims 9 to 16.
CN202080092744.4A 2020-06-24 2020-06-24 Audio signal processing method and device Pending CN114945981A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/098183 WO2021258350A1 (en) 2020-06-24 2020-06-24 Audio signal processing method and apparatus

Publications (1)

Publication Number Publication Date
CN114945981A true CN114945981A (en) 2022-08-26

Family

ID=79282732

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080092744.4A Pending CN114945981A (en) 2020-06-24 2020-06-24 Audio signal processing method and device

Country Status (2)

Country Link
CN (1) CN114945981A (en)
WO (1) WO2021258350A1 (en)

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3881943B2 (en) * 2002-09-06 2007-02-14 松下電器産業株式会社 Acoustic encoding apparatus and acoustic encoding method
CN103035248B (en) * 2011-10-08 2015-01-21 华为技术有限公司 Encoding method and device for audio signals

Also Published As

Publication number Publication date
WO2021258350A1 (en) 2021-12-30

Similar Documents

Publication Publication Date Title
US8509931B2 (en) Progressive encoding of audio
US10089997B2 (en) Method for predicting high frequency band signal, encoding device, and decoding device
US10909992B2 (en) Energy lossless coding method and apparatus, signal coding method and apparatus, energy lossless decoding method and apparatus, and signal decoding method and apparatus
RU2439718C1 (en) Method and device for sound signal processing
CN109147806B (en) Voice tone enhancement method, device and system based on deep learning
EP3975173B1 (en) A computer-readable storage medium and a computer software product
US9916837B2 (en) Methods and apparatuses for transmitting and receiving audio signals
EP2863388B1 (en) Bit allocation method and device for audio signal
US10121484B2 (en) Method and apparatus for decoding speech/audio bitstream
BR112014016153B1 (en) method for an encoder to process audio data, method for processing an audio signal, encoder and decoder
WO2015196837A1 (en) Audio coding method and apparatus
WO2014051964A1 (en) Apparatus and method for audio frame loss recovery
US9385750B2 (en) Split gain shape vector coding
US20110010167A1 (en) Method for generating background noise and noise processing apparatus
EP3128513A1 (en) Encoder, decoder, encoding method, decoding method, and program
WO2015165264A1 (en) Signal processing method and device
CN114945981A (en) Audio signal processing method and device
CN113096670A (en) Audio data processing method, device, equipment and storage medium
CN113903345A (en) Audio processing method and device and electronic device
US9354957B2 (en) Method and apparatus for concealing error in communication system
CN115985330A (en) System and method for audio encoding and decoding
CN116011556A (en) System and method for training audio codec
CN116863949A (en) Communication receiving method and device thereof
CN115512711A (en) Speech coding, speech decoding method, apparatus, computer device and storage medium
CN117119190A (en) Video processing method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination