CN114550732A - Coding and decoding method and related device for high-frequency audio signal - Google Patents


Info

Publication number
CN114550732A
Authority
CN
China
Prior art keywords
coding
frequency
audio signal
signal frame
error
Legal status
Granted
Application number
CN202210395889.2A
Other languages
Chinese (zh)
Other versions
CN114550732B (en)
Inventor
梁俊斌
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202210395889.2A (CN114550732B)
Publication of CN114550732A
Application granted
Publication of CN114550732B
Priority to PCT/CN2023/081461 (WO2023197809A1)
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis

Abstract

The application discloses a method for encoding and decoding a high-frequency audio signal and a related device, which can be applied to various scenarios such as cloud technology, artificial intelligence, intelligent transportation, assisted driving, and in-vehicle scenarios. The method comprises: obtaining a plurality of coding modes and obtaining an original high-frequency audio signal frame by decomposing an original audio signal frame, wherein each coding mode has a corresponding priority and the number of coding bits of the coding modes increases as the priority goes from high to low; determining, according to the priorities, a coding mode whose coding error falls within a preset error interval from the plurality of coding modes as a target coding mode; and transmitting to a receiving end the high-frequency code stream obtained by encoding the original high-frequency audio signal frame with the target coding mode. Whenever the coding quality allows, a coding mode with fewer coding bits is selected, so that both the number of coding bits and the coding quality are satisfactory, yielding high-quality audio at a lower number of coding bits.

Description

Coding and decoding method and related device for high-frequency audio signal
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and a related apparatus for encoding and decoding a high-frequency audio signal.
Background
Audio encoding and decoding play an important role in modern communication systems. Compressing an audio signal before network transmission reduces the bandwidth it requires, which lowers the cost of storing and transmitting the audio signal.
The high-frequency component of an audio signal (the high-frequency audio signal) carries rich information and strongly affects sound quality; losing it makes the sound muffled and reduces intelligibility and fidelity. At the same time, compared with the low-frequency component of the audio signal (the low-frequency audio signal), the high-frequency audio signal accounts for a small share of the energy, has few harmonic components, and is resolved less finely by the human ear, so it leaves considerable room for coding compression.
Existing high-frequency audio coding schemes either sacrifice coding quality to reduce the number of coding bits or spend more coding bits to improve coding quality, so it is difficult to obtain satisfactory results for both the number of coding bits and the coding quality.
Disclosure of Invention
To solve the above technical problem, the present application provides a method for encoding and decoding a high-frequency audio signal and a related apparatus, which select a coding mode with a small number of coding bits whenever the coding quality allows, so that both the number of coding bits and the coding quality are satisfactory and high-quality audio is obtained at a lower number of coding bits.
The embodiment of the application discloses the following technical scheme:
in one aspect, an embodiment of the present application provides a method for encoding a high-frequency audio signal, where the method includes:
acquiring a plurality of coding modes, and acquiring an original high-frequency audio signal frame obtained by decomposing an original audio signal frame;
acquiring the priority corresponding to each of the plurality of coding modes, wherein the number of coding bits of the coding modes increases as the priority goes from high to low;
determining, according to the priorities of the coding modes, a coding mode whose coding error falls within a preset error interval from the plurality of coding modes as a target coding mode, wherein the coding error of a coding mode is the error produced by encoding the original high-frequency audio signal frame with that coding mode;
and sending, to a receiving end, a high-frequency code stream obtained by encoding the original high-frequency audio signal frame with the target coding mode, wherein the high-frequency code stream carries a coding identifier indicating the coding mode used to encode the high-frequency code stream.
In one aspect, an embodiment of the present application provides a method for decoding a high-frequency audio signal, where the method includes:
receiving a high-frequency code stream sent by a sending end, wherein the high-frequency code stream carries a coding identifier indicating the coding mode used to encode the high-frequency code stream;
parsing the high-frequency code stream to obtain its coding identifier;
and decoding the high-frequency code stream with the decoding mode corresponding to the coding mode indicated by the coding identifier, to obtain a high-frequency audio signal frame.
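The decoding steps above amount to reading the coding identifier and dispatching to the matching decoder. The Python sketch below illustrates this under stated assumptions: the one-byte identifier at the front of each high-frequency frame and the identifier values in `CodingMode` are hypothetical (the text does not specify the bitstream layout), and the decoders are placeholders supplied by the caller.

```python
from enum import IntEnum
from typing import Callable, Dict, List


class CodingMode(IntEnum):
    # Hypothetical identifier values; the application does not specify them.
    SSR = 0
    SBR = 1
    CELP = 2


Decoder = Callable[[bytes], List[float]]


def pack_frame(mode: CodingMode, payload: bytes) -> bytes:
    """Sender side: prefix the high-frequency payload with a one-byte coding identifier."""
    return bytes([int(mode)]) + payload


def decode_frame(frame: bytes, decoders: Dict[CodingMode, Decoder]) -> List[float]:
    """Receiver side: parse the coding identifier, then call the matching decoder."""
    if not frame:
        raise ValueError("empty high-frequency frame")
    mode = CodingMode(frame[0])  # raises ValueError for an unknown identifier
    return decoders[mode](frame[1:])
```

Because the SSR mode transmits no coding parameters, an SSR frame in this sketch carries only the identifier byte.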
In one aspect, an embodiment of the present application provides an apparatus for encoding a high-frequency audio signal, where the apparatus includes an acquisition unit, a determining unit, and a transmitting unit:
the acquisition unit is configured to acquire a plurality of coding modes and to acquire an original high-frequency audio signal frame obtained by decomposing an original audio signal frame;
the acquisition unit is further configured to acquire the priority corresponding to each of the plurality of coding modes, wherein the number of coding bits of the coding modes increases as the priority goes from high to low;
the determining unit is configured to determine, according to the priorities of the coding modes, a coding mode whose coding error falls within a preset error interval from the plurality of coding modes as a target coding mode, wherein the coding error of a coding mode is the error produced by encoding the original high-frequency audio signal frame with that coding mode;
the transmitting unit is configured to transmit, to a receiving end, a high-frequency code stream obtained by encoding the original high-frequency audio signal frame with the target coding mode, wherein the high-frequency code stream carries a coding identifier indicating the coding mode used to encode the high-frequency code stream.
In one aspect, an embodiment of the present application provides an apparatus for decoding a high-frequency audio signal, where the apparatus includes a receiving unit, a parsing unit, and a decoding unit:
the receiving unit is configured to receive a high-frequency code stream sent by a sending end, wherein the high-frequency code stream carries a coding identifier indicating the coding mode used to encode the high-frequency code stream;
the parsing unit is configured to parse the high-frequency code stream to obtain its coding identifier;
and the decoding unit is configured to decode the high-frequency code stream with the decoding mode corresponding to the coding mode indicated by the coding identifier, to obtain a high-frequency audio signal frame.
In one aspect, an embodiment of the present application provides a computer device, where the computer device includes a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to perform the method of any of the preceding aspects in accordance with instructions in the program code.
In one aspect, the present application provides a computer-readable storage medium for storing program code for executing the method of any one of the preceding aspects.
In one aspect, the present application provides a computer program product comprising a computer program that, when executed by a processor, implements the method of any one of the preceding aspects.
It can be seen from the foregoing technical solutions that the present application provides, for the original high-frequency audio signal of an original audio signal, a high-frequency coding and decoding method that mixes multiple coding modes and selects among them by a coding error decision. Specifically, for each original audio signal frame of the original audio signal, multiple coding modes are obtained, and an original high-frequency audio signal frame is obtained by decomposing the original audio signal frame. Each coding mode has a priority that indicates the order in which the coding modes are tried; generally, to minimize the transmission bandwidth of the audio signal, the number of coding bits of the coding modes increases as the priority goes from high to low. Then, according to the priorities, a coding mode whose coding error falls within a preset error interval is determined from the multiple coding modes as the target coding mode, where the coding error of a coding mode is the error produced by encoding the original high-frequency audio signal frame with that mode. The target coding mode is thus chosen with the coding error as the decision criterion and the optimal number of coding bits as the goal. The high-frequency code stream obtained by encoding the original high-frequency audio signal frame with the target coding mode is transmitted to the receiving end, so that a coding mode with few coding bits is used whenever the coding quality allows, reducing the transmission bandwidth of the audio signal.
The high-frequency code stream carries a coding identifier indicating the coding mode used to encode it, so the decoding end can determine from the coding identifier how to decode the received high-frequency code stream. Therefore, the method and apparatus select a coding mode with few coding bits whenever the coding quality allows, achieve satisfactory results for both the number of coding bits and the coding quality, and deliver high-quality audio at a lower number of coding bits.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and for a person of ordinary skill in the art, other drawings can be obtained according to these drawings without inventive exercise.
Fig. 1 is an application scenario architecture diagram of a coding and decoding method for a high-frequency audio signal according to an embodiment of the present application;
Fig. 2 is a flowchart of a method for encoding a high-frequency audio signal according to an embodiment of the present application;
Fig. 3 is an encoding flowchart of the SBR mode according to an embodiment of the present application;
Fig. 4 is a decoding flowchart of the SBR mode according to an embodiment of the present application;
Fig. 5 is a diagram illustrating copying of a low-frequency audio signal and correction of the high-frequency replica signal according to an embodiment of the present application;
Fig. 6 is an encoding flowchart of the CELP coding mode according to an embodiment of the present application;
Fig. 7 is a decoding flowchart of the CELP coding mode according to an embodiment of the present application;
Fig. 8 is a flowchart of a method for determining a coding error according to an embodiment of the present application;
Fig. 9 is a graph of the acoustic equal-loudness contours standardized by the International Organization for Standardization (ISO) according to an embodiment of the present application;
Fig. 10 is a graph of calculated auditory perception weighting coefficients according to an embodiment of the present application;
Fig. 11 is a flowchart of a method for decoding a high-frequency audio signal according to an embodiment of the present application;
Fig. 12 is an overall implementation architecture diagram of a method for encoding and decoding a high-frequency audio signal according to an embodiment of the present application;
Fig. 13 is a block diagram of an apparatus for encoding a high-frequency audio signal according to an embodiment of the present application;
Fig. 14 is a block diagram of an apparatus for decoding a high-frequency audio signal according to an embodiment of the present application;
Fig. 15 is a block diagram of a terminal according to an embodiment of the present application;
Fig. 16 is a block diagram of a server according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described below with reference to the accompanying drawings.
Audio coding and decoding play an important role in modern communication systems. For example, in a voice call application, the audio signal is captured by a microphone, converted from analog to digital by an analog-to-digital conversion circuit, compressed by an encoder, and then packed and sent to the receiving end according to the transmission format and protocol of the communication network. The receiving end unpacks the data packets to recover the encoded code stream, which the decoder turns back into a digital audio signal that is finally played through a speaker. Audio coding and decoding effectively reduce the bandwidth needed to transmit audio signals, and play a decisive role in saving storage and transmission costs and in preserving the integrity of audio signals as they travel through the communication network.
The high-frequency audio signal carries rich information and strongly affects sound quality; compared with the low-frequency audio signal, it accounts for a small share of the energy, has few harmonic components, and is resolved less finely by the human ear, so it leaves considerable room for coding compression.
High-frequency coding schemes in the related art either sacrifice coding quality to reduce the number of coding bits (for example, blind bandwidth extension schemes) or increase the number of coding bits to improve coding quality (for example, the Code-Excited Linear Prediction (CELP) coding scheme), so it is difficult to obtain satisfactory results for both the number of coding bits and the coding quality.
To solve the above technical problem, an embodiment of the present application provides a method for encoding and decoding a high-frequency audio signal that mixes multiple coding modes and selects among them by a coding error decision. A coding mode with few coding bits is used whenever the coding quality allows, so that both the number of coding bits and the coding quality are satisfactory and high-quality audio is obtained at a lower number of coding bits. The audio signal may be speech, music, and the like.
Fig. 1 shows an application scenario architecture of the coding and decoding method for a high-frequency audio signal. The application scenario may include a sending end 101 and a receiving end 102. The sending end 101 and the receiving end 102 may both be terminals, or the sending end 101 may be a terminal and the receiving end 102 a server, and so on. The terminal may be, for example, a mobile phone, a computer, an intelligent voice interaction device, a smart home appliance, a vehicle-mounted terminal, or an aircraft, but is not limited thereto. The server may be, for example, a stand-alone server, a server in a cluster, or a cloud server. When the sending end 101 is a terminal and the receiving end 102 is a server, the terminal and the server may be connected in a wired or wireless manner. The embodiments of the present application take a voice call scenario as an example, in which the sending end 101 and the receiving end 102 are both terminals, for example mobile phones.
In the voice call scenario, the sending end 101 may collect an original audio signal through its microphone (in which case the audio signal may be the speech of the user at the sending end 101), and may encode the original audio signal before sending it to the receiving end 102.
For each original audio signal frame of the original audio signal, the sending end 101 may obtain multiple coding modes and obtain an original high-frequency audio signal frame decomposed from the original audio signal frame. Each coding mode has a priority that indicates the order in which the coding modes are tried; generally, to minimize the transmission bandwidth of the audio signal, the number of coding bits of the coding modes increases as the priority goes from high to low.
Then, according to the priorities, the sending end 101 determines, from the multiple coding modes, a coding mode whose coding error falls within a preset error interval as the target coding mode, where the coding error of a coding mode is the error produced by encoding the original high-frequency audio signal frame with that mode. The target coding mode is thus chosen with the coding error as the criterion and the optimal number of coding bits as the goal. The sending end 101 sends the high-frequency code stream obtained by encoding the original high-frequency audio signal frame with the target coding mode to the receiving end 102, so that a coding mode with few coding bits is used whenever the coding quality allows, reducing the transmission bandwidth. Because the high-frequency code stream carries a coding identifier indicating the coding mode used to encode it, the receiving end 102 can determine from the coding identifier how to decode the received high-frequency code stream.
The receiving end 102 determines, from the coding identifier, the decoding mode corresponding to the indicated coding mode, decodes the high-frequency code stream with that decoding mode to obtain a high-frequency audio signal frame, and plays the high-frequency audio signal frame through its speaker.
It should be noted that the embodiment of the present application can be applied to various scenarios, including but not limited to cloud technology, artificial intelligence, smart traffic, driving assistance, vehicle-mounted scenarios, and the like, and is particularly applied to voice calls, video conferences, human-computer interaction scenarios, and the like in these scenarios.
Next, a method for encoding a high-frequency audio signal according to an embodiment of the present application will be described in detail with reference to the accompanying drawings from the perspective of a transmitting end.
Referring to fig. 2, fig. 2 shows a flow chart of a method of encoding a high frequency audio signal, the method comprising:
S201, acquiring a plurality of coding modes, and acquiring an original high-frequency audio signal frame obtained by decomposing an original audio signal frame.
To effectively reduce the transmission bandwidth, the sending end may first encode the original audio signal after acquiring it and then transmit the encoded code stream to the receiving end. The original audio signal may include a high-frequency audio signal, which, given its characteristics, leaves more room for coding compression. For encoding the high-frequency audio signal, the embodiment of the present application provides a method that mixes multiple coding modes and selects among them by a coding error decision. Specifically, for each original audio signal frame of the original audio signal, the sending end may obtain multiple coding modes and obtain an original high-frequency audio signal frame decomposed from the original audio signal frame.
In this embodiment of the present application, the selectable coding modes may be any of the existing coding modes, for example a combination of two or more of a Super Resolution audio (SSR) mode, a Spectral Band Replication (SBR) mode, and a CELP coding mode. The SSR mode is a blind bandwidth extension mode, while the SBR mode and the CELP coding mode are non-blind. The coding modes may of course also include other coding modes, which is not limited in this embodiment. The embodiment mainly takes the SSR mode, the SBR mode, and the CELP coding mode as examples.
The coding and decoding principles of each coding mode are introduced in turn as follows:
the SSR mode is a blind bandwidth extension mode (blind extension for short). It transmits no coding parameters to the receiving end, so it occupies no coding bits. When decoding, the receiving end relies on the correlation between the low-frequency and high-frequency audio signals and maps the low-frequency audio signal to a high-frequency audio signal with a prediction method, such as a neural network model from deep learning. The goal is to reconstruct a high-resolution audio signal from a lower-resolution input. With the rapid development of deep neural networks, the SSR mode can use a neural network model to predict the feature information of the high-frequency audio signal from the feature information of the input low-frequency audio signal, and thereby form the high-frequency audio signal.
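To make the blind-extension idea concrete, the following Python sketch predicts high-band features from low-band features alone. It is a toy stand-in, not the SSR model: a least-squares linear map replaces the neural network, the features are assumed to be per-band log energies of a magnitude spectrum, and the function names are hypothetical. Nothing from the original high band is needed at prediction time, which is why this mode costs no coding bits.

```python
import numpy as np


def band_log_energies(spectrum: np.ndarray, n_bands: int) -> np.ndarray:
    """Split a magnitude spectrum into n_bands contiguous bands; return log10 band energies."""
    bands = np.array_split(np.asarray(spectrum, dtype=float), n_bands)
    return np.log10(np.array([np.sum(b ** 2) for b in bands]) + 1e-12)


def fit_linear_predictor(low_feats: np.ndarray, high_feats: np.ndarray) -> np.ndarray:
    """Offline training: least-squares W such that [low_feats, 1] @ W approximates high_feats."""
    x = np.hstack([low_feats, np.ones((low_feats.shape[0], 1))])
    w, *_ = np.linalg.lstsq(x, high_feats, rcond=None)
    return w


def predict_high(low_feat: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Receiver side: predict high-band features from the decoded low band only."""
    return np.append(low_feat, 1.0) @ w
```

A trained deep model would replace `fit_linear_predictor` and `predict_high`; the data flow (low-band features in, high-band features out, no side information) stays the same.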
The SBR mode is a non-blind extension mode: the high-frequency audio signal is reconstructed from a small number of coding parameters transmitted by the sending end. Because this side information guides the high-frequency reconstruction, its reconstruction quality is better. Fig. 3 is an encoding flowchart of the SBR mode and Fig. 4 is a decoding flowchart of the SBR mode; Fig. 3 shows Advanced Audio Coding (AAC) + SBR encoding and Fig. 4 the corresponding AAC + SBR decoding. In Fig. 3, the original audio signal is first split into a high-frequency audio signal and a low-frequency audio signal: for example, the high-frequency audio signal is obtained through a Quadrature Mirror Filter (QMF) filter bank and the low-frequency audio signal through a 2:1 down-sampler. The low-frequency audio signal is encoded by an AAC encoder to produce its coding parameters. The high-frequency audio signal is encoded by the SBR encoder: the low-frequency audio signal is copied to the high-frequency band to obtain a high-frequency replica signal, envelope extraction yields an envelope feature, the high-frequency replica signal is corrected with the envelope feature (as shown in Fig. 5), and the coding parameters are extracted for transmission to the receiving end.
In Fig. 5, panel (a) shows the high-frequency energy curve of the high-frequency replica signal 501 obtained by directly copying the low-frequency audio signal to the high-frequency band; this energy deviates somewhat from the actual high-frequency energy. Since the envelope feature reflects the high-frequency energy more accurately, the high-frequency replica signal is corrected with the envelope feature obtained by envelope extraction to yield a high-frequency reconstructed signal 502, whose high-frequency energy curve is shown in panel (b) of Fig. 5. The encoded high-frequency and low-frequency audio signals obtained above can be combined by a bit stream multiplexer into the corresponding encoded code stream.
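The copy-and-correct step of Fig. 5 can be sketched as follows. This is a simplified illustration, not the procedure of any SBR standard: it works on one frame's magnitude spectrum rather than on QMF subband samples, uses equal-width envelope bands, and does not quantize the envelope; the function name `sbr_reconstruct` is hypothetical. The returned per-band energies play the role of the envelope parameters sent to the receiving end.

```python
import numpy as np


def sbr_reconstruct(low_mag, orig_high_mag, n_bands: int):
    """Copy the low band up, then rescale each envelope band to the original energy.

    Returns (envelope, corrected): per-band energies of the original high band
    (the side information) and the envelope-corrected high-frequency replica.
    """
    low = np.asarray(low_mag, dtype=float)
    orig_high = np.asarray(orig_high_mag, dtype=float)
    n_high = orig_high.size
    # 1) High-frequency replica: tile the low-band magnitudes across the high band.
    reps = int(np.ceil(n_high / low.size))
    replica = np.tile(low, reps)[:n_high]
    # 2) Envelope extraction: energy of the original high band in each envelope band.
    edges = np.linspace(0, n_high, n_bands + 1).astype(int)
    bands = list(zip(edges[:-1], edges[1:]))
    envelope = np.array([np.sum(orig_high[a:b] ** 2) for a, b in bands])
    # 3) Envelope correction: scale each replica band so its energy matches the envelope.
    corrected = replica.copy()
    for (a, b), target in zip(bands, envelope):
        energy = np.sum(replica[a:b] ** 2)
        if energy > 0.0:
            corrected[a:b] *= np.sqrt(target / energy)
    return envelope, corrected
```

After correction, each band of the reconstructed signal has exactly the energy of the matching band of the original high band, while the fine structure still comes from the low band; this is the envelope-matching behavior of SBR.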
As the encoding process above shows, SBR only needs to transmit a limited set of parameters to the receiving end.
In the decoding process at the receiving end shown in Fig. 4, a code stream demultiplexer splits the encoded code stream into the encoded low-frequency audio signal and the encoded high-frequency audio signal. The encoded low-frequency audio signal is decoded first: an AAC decoder turns it into the low-frequency audio signal, which passes through a QMF analysis filter and then takes part in high-frequency reconstruction. In high-frequency reconstruction, the SBR decoder decodes the required coding parameters, and the low-frequency audio signal is copied to the high-frequency band to obtain a high-frequency replica signal. The high-frequency replica signal is corrected with the envelope features (extracted at the encoder and conveyed in the coding parameters) to generate a high-frequency reconstructed signal; after a delay that aligns the high-frequency and low-frequency signals, the two are combined into a full-band audio signal by a synthesis filter.
The CELP coding mode is an effective speech compression coding mode for medium and low bit rates that uses a codebook as its excitation source. It offers a low bit rate, high-quality synthesized speech, and strong noise robustness, is widely used at bit rates of 4.8 to 16 kbps, and underlies many traditional speech coders. Fig. 6 and Fig. 7 are the encoding and decoding flowcharts of the CELP coding mode, respectively. In Fig. 6, the original audio signal is first preprocessed (for example, high-pass filtered), and Linear Predictive Coding (LPC) analysis then yields a set of linear prediction filter coefficients of the signal; the LPC parameters (for example, the linear prediction filter coefficients) are converted into LSP parameters and quantized for transmission to the receiving end. The difference between the preprocessed original audio signal $s(n)$ and the LPC prediction filtering result $\hat{s}(n)$ passes through a perceptual weighting filter to give the filtered residual signal $e(n)$. Based on $e(n)$, the optimal fixed codebook and the optimal adaptive codebook are searched under the principle of minimum perceptually weighted error, and the fixed codebook gain (Gc) and the adaptive codebook gain (Ga) are calculated. The coding parameters obtained during encoding are packed and transmitted to the receiving end.
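The codebook search described above follows the analysis-by-synthesis principle: for each candidate codevector $y$, the optimal gain is the least-squares projection $g = \langle x, y\rangle / \langle y, y\rangle$ of the target $x$ onto $y$, and the candidate with the smallest remaining error $\lVert x - g\,y \rVert^2$ wins. The Python sketch below shows that rule for a single codebook. It assumes the codevectors have already been passed through the weighted synthesis filter, and it leaves out the sequential adaptive-then-fixed search and the gain quantization used in practical CELP coders; `search_codebook` is a hypothetical name.

```python
import numpy as np


def search_codebook(target: np.ndarray, filtered_codebook: np.ndarray):
    """Pick the codevector and gain that minimize the weighted squared error.

    target: the (perceptually weighted) signal the excitation should reproduce.
    filtered_codebook: one row per codevector, already filtered.
    Returns (best_index, best_gain, best_error).
    """
    best_index, best_gain, best_error = -1, 0.0, np.inf
    for k, y in enumerate(filtered_codebook):
        energy = float(np.dot(y, y))
        if energy == 0.0:
            continue  # an all-zero codevector cannot be scaled to fit the target
        gain = float(np.dot(target, y)) / energy          # least-squares optimal gain
        error = float(np.sum((target - gain * y) ** 2))   # remaining squared error
        if error < best_error:
            best_index, best_gain, best_error = k, gain, error
    return best_index, best_gain, best_error
```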
In the decoding process shown in Fig. 7, the receiving end uses the decoder to parse all coding parameters from the received data packet, generates a fixed codebook excitation signal from the fixed codebook and the fixed codebook gain, and generates an adaptive codebook excitation signal from the adaptive codebook and the adaptive codebook gain. The sum of the two excitations is filtered by the synthesis filter and post-processed to obtain the final audio signal. The filter coefficients of the synthesis filter are obtained by interpolating the LSP parameters.
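The decoder-side excitation and synthesis can be sketched as follows. Assumptions: the codebook vectors, gains, and LPC coefficients are already decoded; the synthesis filter is the all-pole filter $1/A(z)$ with prediction $\hat{s}(n)=\sum_{i\ge 1} a_i\, s(n-i)$; its memory starts at zero (a real decoder carries filter state across frames and interpolates coefficients per subframe); and post-processing is omitted. The function name `celp_synthesize` is hypothetical.

```python
from typing import List, Sequence


def celp_synthesize(adaptive: Sequence[float], fixed: Sequence[float],
                    ga: float, gc: float, lpc: Sequence[float]) -> List[float]:
    """Sum the two scaled excitations and pass them through the all-pole synthesis filter.

    lpc[i] is the coefficient a_(i+1), so s(n) = excitation(n) + sum_i lpc[i] * s(n-1-i).
    """
    # Total excitation: adaptive codebook contribution plus fixed codebook contribution.
    excitation = [ga * a + gc * f for a, f in zip(adaptive, fixed)]
    out: List[float] = []
    for n, e in enumerate(excitation):
        # Linear prediction from previously synthesized samples (zero initial state).
        pred = sum(c * out[n - 1 - i] for i, c in enumerate(lpc) if n - 1 - i >= 0)
        out.append(e + pred)
    return out
```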
S202, acquiring the priority corresponding to each of the multiple coding modes, wherein the number of coding bits of the coding modes increases as the priority goes from high to low.
In the embodiment of the present application, each coding mode has a corresponding priority that indicates the order in which the coding modes are tried. Generally, to minimize the transmission bandwidth of the audio signal, the number of coding bits of the coding modes increases as the priority goes from high to low; that is, lower-priority modes occupy more transmission bandwidth or compressed storage space.
When the multiple coding modes include the SSR mode, the SBR mode, and the CELP coding mode, their numbers of coding bits in ascending order are SSR mode < SBR mode < CELP coding mode, so their priorities from high to low are SSR mode > SBR mode > CELP coding mode.
S203, determining, according to the priorities of the coding modes, a coding mode whose coding error is within a preset error interval from the multiple coding modes as a target coding mode, where the coding error of a coding mode is generated by encoding the original high-frequency audio signal frame with that coding mode.
In the related art, blind bandwidth extension has the advantage of occupying no coding bits for high-frequency reconstruction: the high-frequency audio signal frame is predicted entirely from feature information of the low-frequency audio signal frame. The SBR mode only guarantees envelope matching and cannot further reduce the error, but it occupies few coding bits. High-frequency reconstruction based on the CELP coding mode uses the LSP parameters to ensure that the reconstructed high-frequency audio signal frame has the same envelope as the original high-frequency audio signal frame, and further reduces the error between them through codebook excitation, but it occupies more coding bits. Based on this analysis, the embodiments of the present application aim to achieve satisfactory results in both the number of coding bits and the coding quality (i.e., a small error). Therefore, a suitable coding mode is selected to encode the current original high-frequency audio signal frame based on a coding error decision and the priorities of the coding modes (the priority reflects the number of coding bits). Specifically, the sending end determines, according to the priorities of the coding modes, a coding mode whose coding error is within the preset error interval from the multiple coding modes as the target coding mode, where the coding error of a coding mode is generated by encoding the original high-frequency audio signal frame with that coding mode.
The coding error, generated by encoding the original high-frequency audio signal frame with a coding mode, may be the error between the reconstructed high-frequency audio signal frame (i.e., the high-frequency reconstructed signal frame) and the original high-frequency audio signal frame. It reflects the coding quality: the smaller the coding error, the higher the coding quality. By jointly considering priority and coding quality, a coding mode with fewer coding bits can be selected whenever the coding quality is acceptable, achieving satisfactory results in both respects and yielding high-quality audio at a lower number of coding bits. The preset error interval may be defined by an error threshold (Thrd): when the coding error is less than or equal to Thrd, the coding error is considered to be within the preset error interval; otherwise, it exceeds the preset error interval.
It should be noted that the embodiments of the present application provide multiple ways to implement S203. In one possible implementation, the coding error of each of the multiple coding modes is determined; if one or more coding modes have coding errors within the preset error interval, the one with the highest priority among them is determined as the target coding mode. Taking the SSR mode, the SBR mode, and the CELP coding mode as an example, the coding errors of the three modes are determined in descending order of priority. If the coding errors of both the SSR mode and the SBR mode are within the preset error interval, the SSR mode is used as the target coding mode because it has a higher priority than the SBR mode. Of course, if only the coding error of the SSR mode is within the preset error interval, the SSR mode is used directly as the target coding mode; likewise, if only the coding error of the SBR mode is within the preset error interval, the SBR mode is used directly as the target coding mode.
If no coding mode has a coding error within the preset error interval, the coding mode can be selected according to actual conditions. For example, in a scenario with high quality requirements, the coding mode with the smallest coding error is used as the target coding mode to ensure coding quality; in a scenario with stricter bandwidth requirements, the coding mode with the highest priority is used as the target coding mode. Continuing the example of the SSR mode, the SBR mode, and the CELP coding mode, suppose the coding errors of all three modes, determined in descending order of priority, exceed the preset error interval. In a scenario with high quality requirements, the CELP coding mode is used as the target coding mode because its coding error is the smallest and its coding quality the highest; in a scenario with stricter bandwidth requirements, the SSR mode is used as the target coding mode because it has the highest priority.
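The evaluate-all-modes strategy, including the two fallbacks for when no mode is acceptable, can be sketched as below. This is an illustrative sketch only: `select_by_full_evaluation`, the `coding_error` callback, and the `prefer_quality` flag (standing in for "scenario with high quality requirements") are hypothetical names, and the caller is assumed to pass the modes already sorted from highest to lowest priority.

```python
from typing import Callable, Dict, Sequence


def select_by_full_evaluation(modes: Sequence[str],
                              coding_error: Callable[[str], float],
                              thrd: float,
                              prefer_quality: bool) -> str:
    """Pick a target coding mode after evaluating every mode.

    modes must be ordered from highest to lowest priority (fewest to most bits).
    """
    errors: Dict[str, float] = {m: coding_error(m) for m in modes}
    # Modes whose coding error lies in the preset interval (error <= Thrd)
    within = [m for m in modes if errors[m] <= thrd]
    if within:
        return within[0]  # highest-priority acceptable mode
    if prefer_quality:
        # Quality-sensitive scenario: the smallest coding error wins
        return min(modes, key=lambda m: errors[m])
    # Bandwidth-sensitive scenario: highest priority (fewest coding bits) wins
    return modes[0]
```

With errors of 0.3, 0.1, and 0.05 for SSR, SBR, and CELP and Thrd = 0.2, this returns "SBR"; with Thrd = 0.01 it returns "CELP" when quality is preferred and "SSR" otherwise.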
In another possible implementation, the coding modes may be tried one at a time in descending order of priority, checking in turn whether each mode's coding error is within the preset error interval. As soon as the coding error of the currently selected coding mode is within the preset error interval, the trial stops and that coding mode is used as the target coding mode for encoding. Specifically, the sending end selects a pending coding mode from the multiple coding modes in descending order of priority and determines its coding error. If that coding error is within the preset error interval, the pending coding mode is determined as the target coding mode and no further pending coding modes are selected. If the pending coding mode is the last of the multiple coding modes (i.e., the one with the lowest priority), the coding errors of all previously tried coding modes have exceeded the preset error interval, so the last coding mode can be used directly as the target coding mode without determining its coding error.
Continuing the example of the SSR mode, the SBR mode, and the CELP coding mode, tried in that order from high to low priority: the SSR mode is first selected as the pending coding mode and its coding error is determined. If the coding error of the SSR mode is within the preset error interval, the SSR mode is determined as the target coding mode. Otherwise, the SBR mode is selected as the pending coding mode and its coding error is determined; if that error is within the preset error interval, the SBR mode is determined as the target coding mode. If the coding error of the SBR mode also exceeds the preset error interval, the CELP coding mode is used directly as the target coding mode.
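The sequential trial above stops at the first acceptable mode and never evaluates the last one. A minimal sketch, assuming a hypothetical `coding_error` callback that encodes, reconstructs, and measures the error for a given mode:

```python
from typing import Callable, Sequence


def select_by_sequential_trial(modes: Sequence[str],
                               coding_error: Callable[[str], float],
                               thrd: float) -> str:
    """Try modes from highest to lowest priority and stop at the first acceptable one."""
    for mode in modes[:-1]:
        if coding_error(mode) <= thrd:  # within the preset error interval
            return mode
    # Every earlier mode failed, so the lowest-priority mode is used
    # directly, without computing its coding error.
    return modes[-1]
```

Compared with evaluating every mode, this saves encoder computation whenever a cheap, high-priority mode is already good enough, which is presumably the common case for the SSR mode.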
S204, sending, to a receiving end, the high-frequency code stream obtained by encoding the original high-frequency audio signal frame with the target coding mode, where the high-frequency code stream carries a coding identifier indicating the coding mode used to encode it.
After the target coding mode is determined, the high-frequency code stream obtained by encoding the original high-frequency audio signal frame with the target coding mode can be sent to the receiving end. The high-frequency code stream carries a coding identifier indicating which coding mode was used to produce it, so that the receiving end knows which decoding mode to apply. The coding identifier may be any mark that uniquely identifies the coding mode and may take various forms, such as numbers, symbols, or letters.
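One simple way to attach the coding identifier is a fixed-size header in front of each frame's payload. The sketch below assumes a hypothetical 1-byte header and hypothetical identifier values; the text only requires that the identifier uniquely indicate the coding mode, not any particular layout.

```python
# Hypothetical identifier values; the source only requires that each
# coding mode be uniquely identified, not any particular numbering or layout.
CODING_IDS = {"SSR": 0, "SBR": 1, "CELP": 2}


def pack_high_freq_frame(mode: str, payload: bytes) -> bytes:
    """Prefix an encoded high-frequency payload with a 1-byte coding identifier."""
    if mode not in CODING_IDS:
        raise ValueError(f"unknown coding mode: {mode}")
    return bytes([CODING_IDS[mode]]) + payload
```

For example, `pack_high_freq_frame("SBR", b"\x10\x20")` yields `b"\x01\x10\x20"`. A real codec would more likely spend only 2 bits on three modes, but a whole byte keeps the sketch easy to read.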
It can be understood that a suitable coding mode is selected for each original high-frequency audio signal frame according to the priorities and coding errors of the coding modes. Different original high-frequency audio signal frames may therefore be encoded with different coding modes, which realizes mixed coding with multiple coding modes.
When selecting the coding mode for the original high-frequency audio signal, determining the coding error is a key step. The following describes how the coding error of any given coding mode is determined. Referring to fig. 8, the method includes:
S801, acquiring an original low-frequency audio signal frame obtained by decomposing the original audio signal frame.
S802, performing high-frequency reconstruction with the coding mode in question according to the original low-frequency audio signal frame, to obtain a high-frequency reconstructed signal frame.
It should be noted that, because different coding modes follow different encoding and decoding principles, the high-frequency reconstruction method may differ between coding modes. If the coding mode in question is the audio super-resolution mode, S802 may be implemented by acquiring the neural network model corresponding to the audio super-resolution mode and using it to predict the high-frequency reconstructed signal frame from the original low-frequency audio signal frame. The neural network model is trained on labeled training samples, namely labeled sample low-frequency audio signal frames. In the training stage, the low-frequency features of a sample low-frequency audio signal frame serve as the input of the neural network model and a high-frequency reconstructed signal frame serves as its output; training on large-scale samples yields a model that can predict a high-frequency reconstructed signal frame from a low-frequency audio signal frame. In use, the original low-frequency audio signal frame is fed to the neural network model, which extracts low-frequency features and then predicts the high-frequency reconstructed signal frame from them. In some cases, the low-frequency features of the original low-frequency audio signal frame may instead be extracted by other means and used as the input of the neural network model, which then outputs the high-frequency reconstructed signal frame. The neural network model may be a convolutional neural network (CNN), a long short-term memory network (LSTM), or the like, which is not limited in this embodiment of the present application.
If the coding mode in question is the spectral band replication mode, S802 may be implemented by copying the original low-frequency audio signal frame to the high frequency band to obtain a high-frequency copied signal frame. Because the low-frequency audio signal frame is copied directly to the high frequency band, the high-frequency energy of the resulting high-frequency copied signal frame differs somewhat from the actual high-frequency energy. The envelope feature reflects the high-frequency energy more accurately, so the envelope feature of the original high-frequency audio signal frame can be extracted and used to correct the high-frequency copied signal frame, yielding the high-frequency reconstructed signal frame.
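The copy-then-correct idea can be illustrated on a single frame of spectral magnitudes. This is a deliberately simplified sketch, not the SBR tool of any standard: it assumes the envelope feature is the energy of equal-width bands, works on plain real-valued coefficients rather than a QMF representation, and uses hypothetical function names.

```python
import math
from typing import List, Sequence


def band_energies(spec: Sequence[float], n_bands: int) -> List[float]:
    """Envelope feature (assumed): energy of each of n_bands equal-width bands."""
    width = len(spec) // n_bands
    return [sum(x * x for x in spec[b * width:(b + 1) * width]) for b in range(n_bands)]


def sbr_reconstruct(low_spec: Sequence[float],
                    orig_high_spec: Sequence[float],
                    n_bands: int) -> List[float]:
    """Copy the low band up, then rescale each band to the original high-band envelope."""
    if len(low_spec) < len(orig_high_spec):
        raise ValueError("low band must be at least as wide as the high band")
    # 1) High-frequency copied signal: low-band coefficients moved to the high band
    copied = list(low_spec[:len(orig_high_spec)])
    # 2) Envelope of the original high band (the data an encoder would transmit)
    target = band_energies(orig_high_spec, n_bands)
    current = band_energies(copied, n_bands)
    width = len(copied) // n_bands
    # 3) Correct each band's energy; bins beyond n_bands * width stay unscaled
    for b in range(n_bands):
        gain = math.sqrt(target[b] / current[b]) if current[b] > 0 else 0.0
        for k in range(b * width, (b + 1) * width):
            copied[k] *= gain
    return copied
```

Only the per-band envelope needs to be transmitted, which is why this mode costs few coding bits; the fine spectral structure is inherited from the low band and cannot be corrected, matching the limitation noted earlier.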
If the coding mode in question is the code-excited linear prediction mode, S802 may be implemented by acquiring the coding parameters from the high-frequency code stream, acquiring the pitch period (pitch) of the original low-frequency audio signal frame, and performing high-frequency reconstruction according to the coding parameters and the pitch period to obtain the high-frequency reconstructed signal frame. The coding parameters may include LSP parameters, codebook data (e.g., the fixed codebook and the adaptive codebook), and gain data (e.g., the fixed codebook gain and the adaptive codebook gain).
And S803, carrying out error analysis based on the high-frequency reconstruction signal frame and the original high-frequency audio signal frame to obtain a corresponding coding error.
After the high-frequency reconstructed signal frame is obtained, error analysis can be performed on the basis of the high-frequency reconstructed signal frame and the original high-frequency audio signal frame to obtain a corresponding coding error, and the coding error can reflect an error between the high-frequency reconstructed signal frame and the original high-frequency audio signal frame, so that the coding quality of the coding mode can be measured through the coding error.
Given this role of the coding error, in one possible implementation S803 may be implemented by calculating the difference signal between the high-frequency reconstructed signal frame and the original high-frequency audio signal frame, and then determining the coding error from the difference signal. If the high-frequency reconstructed signal frame is denoted S' and the original high-frequency audio signal frame is denoted S, the difference signal Err is obtained by subtracting S' from S, i.e., Err = S - S'.
Since the difference signal already represents the error between the high-frequency reconstructed signal frame and the original high-frequency audio signal frame, in one possible implementation the difference signal itself may be used as the coding error, directly representing the coding error of the coding mode.
In some cases, however, the difference signal represents only the error of the signal itself, whereas the signal is ultimately played to a user, and the error the user perceives may differ from the signal-level error. Therefore, in another possible implementation, psychoacoustic perception analysis may be applied to the error signal to quantify the error at the level of auditory perception. On this basis, when calculating the coding error, auditory perception weighted energy is calculated for the difference signal to obtain the difference energy and for the original high-frequency audio signal frame to obtain the original energy, and the ratio of the difference energy to the original energy is used as the coding error. Both the difference energy and the original energy are auditory perception weighted energies. If the difference energy is denoted EP_err(i) and the original energy is denoted EP_s(i), the coding error can be calculated as:
$$ w(i) = \frac{EP_{err}(i)}{EP_{s}(i)} \qquad (1) $$
where w(i) is the coding error, EP_err(i) is the difference energy, and EP_s(i) is the original energy. w(i) is compared with the threshold Thrd that defines the preset error interval: when w(i) > Thrd, the coding error exceeds the preset error interval; otherwise, it is within the preset error interval.
In this way, the coding error is measured in terms of auditory perception, so coding quality can be guaranteed at the level of auditory perception.
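Formula (1) and the threshold test can be written directly once a weighted-energy function is available. In this sketch the weighted-energy function is passed in as a parameter (the test uses plain unweighted energy as a stand-in), the function names are hypothetical, and the handling of a silent original frame is an assumed convention not stated in the source.

```python
from typing import Callable, Sequence


def coding_error_ratio(orig: Sequence[float],
                       recon: Sequence[float],
                       weighted_energy: Callable[[Sequence[float]], float]) -> float:
    """w(i) = EP_err(i) / EP_s(i), where Err = S - S'."""
    diff = [s - s_rec for s, s_rec in zip(orig, recon)]  # difference signal Err
    ep_err = weighted_energy(diff)   # difference energy EP_err(i)
    ep_s = weighted_energy(orig)     # original energy EP_s(i)
    if ep_s == 0.0:
        # Assumed convention for a silent original frame (not from the source)
        return 0.0 if ep_err == 0.0 else float("inf")
    return ep_err / ep_s


def within_preset_interval(w: float, thrd: float) -> bool:
    """The coding error is acceptable when w(i) <= Thrd."""
    return w <= thrd
```

Because w(i) is a ratio, it is independent of the frame's overall level, so a single threshold Thrd can serve both loud and quiet frames.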
The main basis of auditory perception is loudness. Loudness varies with the intensity of the audio signal but is also affected by frequency: audio signals of the same intensity but different frequencies are perceived differently by the human ear. Fig. 9 shows equal-loudness curves measured by an international acoustics standards organization according to an embodiment of the present application. An equal-loudness curve describes the relationship between sound pressure level and frequency at equal loudness and is one of the important auditory characteristics: it shows what sound pressure level an audio signal at each frequency must reach for the user to perceive the same loudness. As any equal-loudness curve in fig. 9 illustrates, for medium and low frequencies (below 1 kHz), the lower the frequency, the greater the sound pressure level (i.e., energy) required for equal loudness; put simply, more energy is needed for the user to have the same auditory sensation. For medium and high frequencies (above 1 kHz), different frequency bands have different auditory perception characteristics. Accordingly, the auditory perception weighted energy may be calculated as follows:
1) framing and windowing:
For an input audio signal (e.g., the difference signal or the original high-frequency audio signal frame in the embodiments of the present application), an analysis window of 20 ms per frame (consistent with the encoder's frame definition) is typically used, and the window function may be a Hanning window or a Hamming window.
2) Power spectrum calculation:
A Fourier transform is applied to the framed and windowed audio signal, and the energy of each frequency bin of the i-th frame is computed as
$$ E(i,k) = |X(i,k)|^2, \quad k = 0, 1, \ldots, K-1 $$
where X(i,k) is the Fourier coefficient of the k-th frequency bin of the i-th frame and K is the total number of frequency bins.
3) Computing auditory perception weighted energy:
The energy of each frequency bin k is multiplied by its auditory perception weighting coefficient, and the products are summed to obtain the auditory perception weighted energy of the frame:
$$ EP(i) = \sum_{k=0}^{K-1} cof(k)\, E(i,k) \qquad (2) $$
where EP(i) is the auditory perception weighted energy of the i-th frame, i is the frame index, k is the frequency bin index, E(i,k) is the energy of the k-th frequency bin of the i-th frame, K is the total number of frequency bins, and cof(k) is the auditory perception weighting coefficient of the k-th frequency bin.
Thus, when the ith frame audio signal is the current original high-frequency audio signal frame, the calculated EP (i) is represented as the original energy EP _ s (i); when the i-th frame audio signal is the corresponding difference signal, the calculated EP (i) is expressed as the difference energy EP _ err (i).
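Steps 1) to 3) can be sketched for one frame as follows. The sketch makes several assumptions not fixed by the text: a periodic Hann window, a direct DFT (an FFT would be used in practice), and K = N/2 + 1 non-negative frequency bins for a real frame of N samples. The `cof` list must be supplied by the caller, for example from formula (3).

```python
import cmath
import math
from typing import List, Sequence


def hann_window(n: int) -> List[float]:
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * m / n) for m in range(n)]


def frame_power_spectrum(frame: Sequence[float]) -> List[float]:
    """Window one frame and return E(i,k) = |X(i,k)|^2 for k = 0..K-1, K = N//2 + 1."""
    n = len(frame)
    windowed = [x * w for x, w in zip(frame, hann_window(n))]
    power = []
    for k in range(n // 2 + 1):  # non-negative frequency bins of a real frame
        x_k = sum(v * cmath.exp(-2j * math.pi * k * m / n) for m, v in enumerate(windowed))
        power.append(abs(x_k) ** 2)
    return power


def weighted_energy(frame: Sequence[float], cof: Sequence[float]) -> float:
    """EP(i) = sum_k cof(k) * E(i,k), as in formula (2)."""
    power = frame_power_spectrum(frame)
    if len(cof) != len(power):
        raise ValueError("need one weighting coefficient per frequency bin")
    return sum(c * e for c, e in zip(cof, power))
```

Calling `weighted_energy` once on the difference signal and once on the original high-frequency frame gives EP_err(i) and EP_s(i) for formula (1). With a 20 ms frame at a 16 kHz sampling rate (N = 320), the direct DFT is still fast enough for illustration.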
In the embodiments of the present application, the auditory perception weighting coefficients are calculated from the psychoacoustic equal-loudness curve data of the BS3383 standard, as follows:
cof(freq) =(10^loud/20)/1000 (3)
where freq denotes a frequency point, cof(freq) is the auditory perception weighting coefficient at that frequency point (i.e., cof(k) for the k-th frequency bin), and loud is the loudness value at frequency point freq.
It should be noted that the loudness value loud of the frequency point freq can be calculated by the following formula:
loud=4.2+afy*(dB-cfy)/(1+bfy*(dB-cfy)) (4)
afy=af(j-1)+(freq-ff(j-1))*(af(j)-af(j-1))/(ff(j)-ff(j-1)) (5)
bfy=bf(j-1)+(freq-ff(j-1))*(bf(j)-bf(j-1))/(ff(j)-ff(j-1)) (6)
cfy=cf(j-1)+(freq-ff(j-1))*(cf(j)-cf(j-1))/(ff(j)-ff(j-1)) (7)
where ff, af, bf, and cf are columns of the equal-loudness curve data table published in the BS3383 standard and can be obtained by looking up that table; j is the entry index in the table; freq is the frequency point whose loudness value loud is calculated; and loud is obtained by linearly interpolating the table data.
It will be appreciated that the freq used in the above formulas typically lies between the frequencies of table entries j-1 and j. The auditory perception weighting coefficients calculated with the above formulas are plotted in fig. 10, which shows the coefficients corresponding to different frequency points.
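Formulas (3) to (7) amount to a table lookup with linear interpolation followed by two closed-form expressions. The sketch below does not reproduce the BS3383 table: `ff`, `af`, `bf`, and `cf` are caller-supplied, and the numbers used in the test are placeholders, not standard data. It also assumes that `dB` in formula (4) is an input sound pressure level supplied by the caller and that formula (3) is grouped as 10^(loud/20) / 1000; both readings are assumptions, since the text does not define `dB` or parenthesize the exponent.

```python
import bisect
from typing import Sequence


def interp_column(freq: float, ff: Sequence[float], col: Sequence[float]) -> float:
    """Linear interpolation of one table column at freq, as in formulas (5) to (7)."""
    if not ff[0] <= freq <= ff[-1]:
        raise ValueError("freq lies outside the table's frequency range")
    j = max(1, bisect.bisect_left(ff, freq))  # ff[j-1] <= freq <= ff[j]
    return col[j - 1] + (freq - ff[j - 1]) * (col[j] - col[j - 1]) / (ff[j] - ff[j - 1])


def loudness(freq: float, db: float, ff: Sequence[float], af: Sequence[float],
             bf: Sequence[float], cf: Sequence[float]) -> float:
    """Formula (4): loud = 4.2 + afy*(dB - cfy) / (1 + bfy*(dB - cfy))."""
    afy = interp_column(freq, ff, af)
    bfy = interp_column(freq, ff, bf)
    cfy = interp_column(freq, ff, cf)
    return 4.2 + afy * (db - cfy) / (1 + bfy * (db - cfy))


def weighting_coefficient(loud: float) -> float:
    """Formula (3), read here as 10^(loud/20) / 1000 (assumed grouping)."""
    return 10 ** (loud / 20) / 1000
```

Since cof(k) depends only on the bin frequency, the coefficients can be computed once per bin at start-up and reused for every frame.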
An embodiment of the present invention further provides a method for decoding a high-frequency audio signal, which is introduced from the perspective of a receiving end, and with reference to fig. 11, the method includes:
S1101, receiving the high-frequency code stream sent by the sending end, where the high-frequency code stream carries a coding identifier indicating the coding mode used to encode it.
S1102, parsing the coding identifier corresponding to the high-frequency code stream.
S1103, decoding the high-frequency code stream with the decoding mode corresponding to the coding mode indicated by the coding identifier, to obtain a high-frequency audio signal frame.
After receiving the data, the receiving end parses it to obtain the high-frequency code stream and its coding identifier, and then decodes the high-frequency code stream with the decoding mode corresponding to the coding mode indicated by the coding identifier, obtaining a high-frequency audio signal frame.
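On the receiving side, S1102 and S1103 reduce to reading the identifier and dispatching to the matching decoder. A minimal sketch, assuming the same hypothetical 1-byte-header layout as on the sending side and caller-supplied decoder functions (the mapping and signatures are illustrative, not taken from the source):

```python
from typing import Callable, Dict, List

# Hypothetical decoder signature: payload bytes -> high-frequency samples.
Decoder = Callable[[bytes], List[float]]


def decode_high_freq_frame(frame: bytes, decoders: Dict[int, Decoder]) -> List[float]:
    """Read the 1-byte coding identifier and dispatch to the matching decoder."""
    if not frame:
        raise ValueError("empty high-frequency frame")
    coding_id, payload = frame[0], frame[1:]
    decoder = decoders.get(coding_id)
    if decoder is None:
        raise ValueError(f"unknown coding identifier: {coding_id}")
    return decoder(payload)
```

Because the identifier travels with every frame, the decoder needs no extra state to follow frame-by-frame switches between coding modes.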
As can be seen from the foregoing technical solutions, the present application provides, for the original high-frequency audio signal of an original audio signal, a high-frequency audio signal coding and decoding method that mixes multiple coding modes based on a coding error decision. Specifically, for each original audio signal frame in the original audio signal, the original high-frequency audio signal frame obtained by decomposing the original audio signal frame and multiple coding modes are acquired. Each coding mode has a corresponding priority indicating the order of preference for coding with it; generally, to minimize the transmission bandwidth of the audio signal, the number of coding bits increases from high priority to low priority. Then, according to the priorities, a coding mode whose coding error is within the preset error interval is determined from the multiple coding modes as the target coding mode, where the coding error of a coding mode is generated by encoding the original high-frequency audio signal frame with that mode. The target coding mode is thus determined with the coding error as the decision criterion and the fewest coding bits as the objective, and the high-frequency code stream obtained by encoding the original high-frequency audio signal frame with the target coding mode is sent to the receiving end. In this way, a coding mode with fewer coding bits is selected whenever the coding quality is acceptable, reducing the bandwidth needed for audio signal transmission.
Because the high-frequency code stream carries a coding identifier indicating the coding mode used to produce it, the decoding end can determine from the identifier how to decode the received high-frequency code stream. The method and apparatus can therefore select a coding mode with fewer coding bits whenever the coding quality is acceptable, achieving satisfactory results in both the number of coding bits and the coding quality and delivering high-quality audio at a lower number of coding bits.
The present application further provides a method for encoding and decoding a high-frequency audio signal, described from the perspective of the overall architecture of the sending end and the receiving end. In this embodiment, the multiple coding modes are the SSR mode, the SBR mode, and the CELP coding mode in descending order of priority, and the overall implementation framework for encoding and decoding the high-frequency audio signal may be as shown in fig. 12.
The inputs are an original high-frequency audio signal frame and an original low-frequency audio signal frame (see 1201 in fig. 12), obtained by high/low-frequency decomposition of the original audio signal frame (e.g., with a QMF filter bank); the original low-frequency audio signal frame is used for the subsequent high-frequency reconstruction.
In the high-frequency audio encoding stage, SSR encoding is tried first to obtain a high-frequency code stream (see 1202 in fig. 12), and high-frequency reconstruction is then performed (see 1203 in fig. 12). Based on the resulting high-frequency reconstructed signal frame and the original high-frequency audio signal frame, it is determined whether the coding error is within the preset error interval (see 1204 in fig. 12). If so, the high-frequency code stream is sent to the receiving end (see 1209 in fig. 12). If not, SBR encoding is tried to obtain a high-frequency code stream (see 1205 in fig. 12), high-frequency reconstruction is performed (see 1206 in fig. 12), and it is determined whether the coding error is within the preset error interval (see 1207 in fig. 12). If so, the high-frequency code stream is sent to the receiving end (see 1209 in fig. 12); if not, the CELP coding mode is used (see 1208 in fig. 12) and the resulting high-frequency code stream is sent to the receiving end (see 1209 in fig. 12). In every case, the high-frequency code stream carries the corresponding coding identifier. In the high-frequency audio decoding stage, the high-frequency code stream and the coding identifier are obtained by parsing (see 1210 in fig. 12), and the high-frequency code stream is decoded with the decoding mode corresponding to the coding mode indicated by the coding identifier (see 1211 in fig. 12), yielding the high-frequency audio signal frame (see 1212 in fig. 12).
It should be noted that, on the basis of the implementation manners provided by the above aspects, the present application may be further combined to provide further implementation manners.
Based on the method for encoding a high-frequency audio signal provided in the embodiment corresponding to fig. 2, an embodiment of the present application further provides an apparatus 1300 for encoding a high-frequency audio signal. Referring to fig. 13, the apparatus 1300 for encoding a high frequency audio signal includes an obtaining unit 1301, a determining unit 1302, and a transmitting unit 1303:
the obtaining unit 1301 is configured to obtain multiple encoding modes and obtain an original high-frequency audio signal frame decomposed from an original audio signal frame;
the obtaining unit 1301 is further configured to obtain the priorities corresponding to the multiple coding modes, where the number of coding bits of the coding modes increases from high priority to low priority;
the determining unit 1302 is configured to determine, according to the priority of the encoding mode, an encoding mode with an encoding error within an error preset interval from the multiple encoding modes as a target encoding mode, where the encoding error of the encoding mode is generated by encoding the original high-frequency audio signal frame by using the encoding mode;
the sending unit 1303 is configured to send a high-frequency code stream obtained by coding the original high-frequency audio signal frame in the target coding manner to a receiving end, where the high-frequency code stream has a coding identifier, and the coding identifier is used to indicate a coding manner used by the high-frequency code stream obtained by coding.
In a possible implementation manner, the determining unit 1302 is specifically configured to:
according to the sequence of the priority from high to low, selecting a pending coding mode from the multiple coding modes in sequence;
determining a coding error of the pending coding mode;
and if the coding error of the pending coding mode is within the preset error interval, determining the pending coding mode as the target coding mode and stopping the selection of further pending coding modes.
In a possible implementation manner, the determining unit 1302 is specifically configured to:
respectively determining the coding error of each coding mode in the multiple coding modes;
and determining, from the coding modes whose coding errors are within the preset error interval, the coding mode with the highest priority as the target coding mode.
In a possible implementation manner, for any one of the plurality of encoding manners, the apparatus further includes a reconstruction unit and an error analysis unit:
the obtaining unit 1301 is further configured to obtain an original low-frequency audio signal frame decomposed from the original audio signal frame;
the reconstruction unit is configured to perform high-frequency reconstruction with the coding mode in question according to the original low-frequency audio signal frame, to obtain a high-frequency reconstructed signal frame;
and the error analysis unit is used for carrying out error analysis on the basis of the high-frequency reconstruction signal frame and the original high-frequency audio signal frame to obtain a corresponding coding error.
In a possible implementation manner, the error analysis unit is specifically configured to:
calculating a difference signal between the high-frequency reconstruction signal frame and the original high-frequency audio signal frame;
determining the coding error using the difference signal.
In a possible implementation manner, the error analysis unit is specifically configured to:
taking the difference signal as the coding error;
or,
carrying out auditory perception weighted energy calculation on the difference signal to obtain difference energy, and carrying out auditory perception weighted energy calculation on the original high-frequency audio signal frame to obtain original energy;
and taking the ratio of the difference energy to the original energy as the coding error.
In a possible implementation manner, if any one of the encoding manners is an audio super-resolution manner, the reconstruction unit is specifically configured to:
acquiring a neural network model corresponding to the audio super-resolution mode;
extracting the characteristics of the original low-frequency audio signal frame to obtain low-frequency characteristics;
and predicting through the neural network model according to the low-frequency characteristics to obtain the high-frequency reconstruction signal frame.
In a possible implementation manner, if any one of the coding schemes is a spectral band replication scheme, the reconstructing unit is specifically configured to:
copying the original low-frequency audio signal frame to a high-frequency band to obtain a high-frequency copied signal frame;
extracting envelope characteristics of the original high-frequency audio signal frame;
and correcting the high-frequency copy signal frame by using the envelope characteristic to obtain the high-frequency reconstruction signal frame.
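A simplified, single-frame version of the copy, envelope, and correction steps is sketched below. It assumes both frames are given as spectral coefficients and that the envelope is the RMS value of equal-width bands; real spectral band replication systems typically work on a time-frequency grid of subband samples, so treat the band layout and function names here as hypothetical.

```python
import math
from typing import List, Sequence


def band_rms(coeffs: Sequence[float], num_bands: int) -> List[float]:
    """RMS value of each of `num_bands` equal-width bands: a simple spectral envelope."""
    band_len = len(coeffs) // num_bands
    return [
        math.sqrt(sum(c * c for c in coeffs[b * band_len:(b + 1) * band_len]) / band_len)
        for b in range(num_bands)
    ]


def spectral_band_replication(low: Sequence[float], high_original: Sequence[float],
                              num_bands: int) -> List[float]:
    """Copy the low band upward, then rescale each band to the original high-band envelope."""
    n = len(high_original)
    if not low or num_bands <= 0 or n < num_bands or n % num_bands != 0:
        raise ValueError("high band must split into equal, non-empty bands")
    # Step 1: copy (tile) the low-band coefficients into the high band.
    copied = [low[i % len(low)] for i in range(n)]
    # Step 2: envelope of the original high band (the side information an encoder would send).
    target = band_rms(high_original, num_bands)
    # Step 3: correct each band of the copy so its envelope matches the target.
    current = band_rms(copied, num_bands)
    band_len = n // num_bands
    reconstructed: List[float] = []
    for b in range(num_bands):
        gain = target[b] / current[b] if current[b] > 0.0 else 0.0
        reconstructed.extend(c * gain for c in copied[b * band_len:(b + 1) * band_len])
    return reconstructed
```

Only the per-band envelope needs to be transmitted for this mode, since the fine structure is borrowed from the low band that the decoder already has.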
In a possible implementation manner, if the coding mode in question is a code-excited linear prediction (CELP) coding mode, the reconstruction unit is specifically configured to:
acquiring coding parameters from the high-frequency code stream, and acquiring a pitch period of the original low-frequency audio signal frame;
and performing high-frequency reconstruction according to the coding parameters and the pitch period to obtain the high-frequency reconstructed signal frame.
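A heavily simplified sketch of reconstruction from coding parameters and a pitch period follows. It assumes, hypothetically, that the coding parameters are a gain and a set of linear prediction coefficients, and it replaces the adaptive and fixed codebooks of a real CELP coder with a plain pitch-pulse excitation. The pitch period (in samples) is the one taken from the original low-frequency audio signal frame.

```python
from typing import List, Sequence


def pitch_pulse_excitation(frame_len: int, pitch_period: int, gain: float) -> List[float]:
    """Impulse train with one pulse of height `gain` every `pitch_period` samples."""
    if pitch_period <= 0:
        raise ValueError("pitch period must be a positive number of samples")
    return [gain if n % pitch_period == 0 else 0.0 for n in range(frame_len)]


def lpc_synthesis(excitation: Sequence[float], lpc: Sequence[float]) -> List[float]:
    """All-pole filter 1/A(z), A(z) = 1 + sum_k lpc[k-1] * z^-k, with zero initial state."""
    y: List[float] = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                acc -= a * y[n - k]
        y.append(acc)
    return y


def celp_high_band_reconstruct(frame_len: int, pitch_period: int, gain: float,
                               lpc: Sequence[float]) -> List[float]:
    """Excite the synthesis filter with a pitch-periodic pulse train (simplified CELP)."""
    return lpc_synthesis(pitch_pulse_excitation(frame_len, pitch_period, gain), lpc)
```

With `lpc=[-0.5]`, a single unit pulse decays as 1.0, 0.5, 0.25, 0.125, which is the impulse response of 1/(1 - 0.5 z^-1).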
Based on the decoding method for the high-frequency audio signal provided by the embodiment corresponding to fig. 11, the embodiment of the present application further provides a decoding apparatus 1400 for the high-frequency audio signal. Referring to fig. 14, the decoding apparatus 1400 for a high frequency audio signal includes a receiving unit 1401, a parsing unit 1402, and a decoding unit 1403:
the receiving unit 1401 is configured to receive a high-frequency code stream sent by a sending end, where the high-frequency code stream carries a coding identifier indicating the coding mode used to encode the high-frequency code stream;
the parsing unit 1402 is configured to parse the high-frequency code stream to obtain its coding identifier;
the decoding unit 1403 is configured to decode the high-frequency code stream according to the decoding mode corresponding to the coding mode indicated by the coding identifier, so as to obtain a high-frequency audio signal frame.
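The receive, parse, and decode flow can be sketched as a dispatch on the coding identifier. The stream layout here (a one-byte identifier followed by the payload) and the identifier values are hypothetical assumptions; the text does not define how the identifier is encoded.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical identifier values; the source does not specify a numbering.
AUDIO_SUPER_RESOLUTION = 0
SPECTRAL_BAND_REPLICATION = 1
CELP = 2

Decoder = Callable[[bytes], Any]


def pack_high_frequency_stream(coding_id: int, payload: bytes) -> bytes:
    """Encoder side: prefix the payload with a one-byte coding identifier (assumed layout)."""
    if not 0 <= coding_id <= 255:
        raise ValueError("coding identifier must fit in one byte")
    return bytes([coding_id]) + payload


def parse_coding_identifier(stream: bytes) -> Tuple[int, bytes]:
    """Decoder side: split the stream into (coding identifier, payload)."""
    if not stream:
        raise ValueError("empty high-frequency code stream")
    return stream[0], stream[1:]


def decode_high_frequency_stream(stream: bytes, decoders: Dict[int, Decoder]) -> Any:
    """Pick the decoding mode that corresponds to the identifier and run it on the payload."""
    coding_id, payload = parse_coding_identifier(stream)
    decoder = decoders.get(coding_id)
    if decoder is None:
        raise ValueError(f"unknown coding identifier {coding_id}")
    return decoder(payload)
```

Keeping the identifier-to-decoder mapping in a dictionary means a new coding mode can be supported on the receiving end by registering one more entry.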
The embodiment of the application also provides computer equipment which can execute the coding and decoding method of the high-frequency audio signal. The computer device may be, for example, a terminal; a smartphone is used below as an example of the terminal:
Fig. 15 is a block diagram of part of the structure of a smartphone according to an embodiment of the present application. Referring to fig. 15, the smartphone includes: a radio frequency (RF) circuit 1510, a memory 1520, an input unit 1530, a display unit 1540, a sensor 1550, an audio circuit 1560, a wireless fidelity (WiFi) module 1570, a processor 1580, and a power supply 1590. The input unit 1530 may include a touch panel 1531 and other input devices 1532, the display unit 1540 may include a display panel 1541, and the audio circuit 1560 may include a speaker 1561 and a microphone 1562. It will be appreciated that the smartphone structure shown in fig. 15 does not constitute a limitation on the smartphone, which may include more or fewer components than shown, combine some components, or use a different arrangement of components.
The memory 1520 may be used to store software programs and modules, and the processor 1580 performs various functional applications and data processing of the smartphone by running the software programs and modules stored in the memory 1520. The memory 1520 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, an application program required by at least one function (such as a sound playback function or an image playback function), and the like, and the data storage area may store data (such as audio data or a phonebook) created according to use of the smartphone, and the like. In addition, the memory 1520 may include a high-speed random access memory, and may further include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
The processor 1580 is a control center of the smartphone, connects various parts of the entire smartphone using various interfaces and lines, and performs various functions of the smartphone and processes data by operating or executing software programs and/or modules stored in the memory 1520 and calling data stored in the memory 1520. Optionally, the processor 1580 may include one or more processing units; preferably, the processor 1580 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, and the like, and a modem processor, which mainly handles wireless communications. It is to be appreciated that the modem processor may not be integrated into the processor 1580.
In this embodiment, the processor 1580 in the smartphone may perform the following steps:
acquiring a plurality of coding modes and acquiring an original high-frequency audio signal frame obtained by decomposing the original audio signal frame;
acquiring a priority for each of the multiple coding modes, where the number of coding bits used by the coding modes increases as the priority goes from high to low;
determining, from the multiple coding modes and according to their priorities, a coding mode whose coding error is within a preset error interval as a target coding mode, where the coding error of a coding mode is the error produced by encoding the original high-frequency audio signal frame with that coding mode;
and sending a high-frequency code stream obtained by coding the original high-frequency audio signal frame by using the target coding mode to a receiving end, wherein the high-frequency code stream is provided with a coding identifier which is used for indicating a coding mode used for coding the high-frequency code stream.
Alternatively:
receiving a high-frequency code stream sent by a sending end, where the high-frequency code stream carries a coding identifier indicating the coding mode used to encode the high-frequency code stream;
parsing the high-frequency code stream to obtain its coding identifier;
and decoding the high-frequency code stream according to a decoding mode corresponding to the coding mode indicated by the coding identification to obtain a high-frequency audio signal frame.
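The priority-based choice of target coding mode can be sketched in the two forms the claims describe: a sequential search that stops at the first acceptable mode, and an exhaustive search that evaluates every mode first. In this minimal sketch the `CodingMode` class, the convention that a smaller `priority` number means higher priority, the closed error interval, and returning `None` when no mode qualifies are all hypothetical assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class CodingMode:
    name: str
    priority: int                      # hypothetical convention: 0 is the highest priority
    coding_error: Callable[[], float]  # encodes the frame with this mode and returns its error


def in_interval(error: float, lower: float, upper: float) -> bool:
    return lower <= error <= upper  # assumption: the preset error interval is closed


def select_sequential(modes: Iterable[CodingMode], lower: float,
                      upper: float) -> Optional[CodingMode]:
    """Try modes from highest to lowest priority and stop at the first acceptable one."""
    for mode in sorted(modes, key=lambda m: m.priority):
        if in_interval(mode.coding_error(), lower, upper):
            return mode  # lower-priority (more expensive) modes are never evaluated
    return None  # assumption: the text does not say what happens if no mode qualifies


def select_exhaustive(modes: Iterable[CodingMode], lower: float,
                      upper: float) -> Optional[CodingMode]:
    """Evaluate every mode, then keep the highest-priority one whose error is acceptable."""
    acceptable = [m for m in modes if in_interval(m.coding_error(), lower, upper)]
    return min(acceptable, key=lambda m: m.priority, default=None)
```

Both functions return the same mode. Because the number of coding bits grows as priority falls, the highest-priority acceptable mode is also the cheapest one that meets the error bound; the sequential form additionally avoids encoding with the more expensive modes.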
Referring to fig. 16, fig. 16 is a block diagram of a server 1600 according to an embodiment of the present application. The server 1600 may vary considerably depending on its configuration or performance, and may include one or more central processing units (CPUs) 1622 (for example, one or more processors), a memory 1632, and one or more storage media 1630 (for example, one or more mass storage devices) storing an application program 1642 or data 1644. The memory 1632 and the storage medium 1630 may be transient or persistent storage. The program stored on the storage medium 1630 may include one or more modules (not shown), each of which may include a series of instruction operations on the server. Further, the central processing unit 1622 may be configured to communicate with the storage medium 1630 and execute, on the server 1600, the series of instruction operations in the storage medium 1630.
The server 1600 may further include one or more power supplies 1626, one or more wired or wireless network interfaces 1650, one or more input/output interfaces 1658, and/or one or more operating systems 1641, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.
In this embodiment, the steps performed by the central processor 1622 in the server 1600 may be implemented based on the structure shown in fig. 16.
According to an aspect of the present application, there is provided a computer-readable storage medium for storing a program code for performing the method of encoding and decoding a high frequency audio signal according to the foregoing embodiments.
According to an aspect of the application, a computer program product or computer program is provided, comprising computer instructions, the computer instructions being stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method provided in the various alternative implementations of the embodiment.
The descriptions of the flows or structures corresponding to the above drawings each have their own emphasis; for any part not described in detail in one flow or structure, refer to the related descriptions of the other flows or structures.
The terms "first," "second," "third," "fourth," and the like in the description of the application and the above-described figures, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of the present application essentially, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a computer, a server, or a network device) to perform all or part of the steps of the methods described in the embodiments of the present application. The foregoing storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (15)

1. A method of encoding a high frequency audio signal, the method comprising:
acquiring a plurality of coding modes and acquiring an original high-frequency audio signal frame obtained by decomposing the original audio signal frame;
acquiring a priority for each of the multiple coding modes, where the number of coding bits used by the coding modes increases as the priority goes from high to low;
determining, from the multiple coding modes and according to their priorities, a coding mode whose coding error is within a preset error interval as a target coding mode, where the coding error of a coding mode is the error produced by encoding the original high-frequency audio signal frame with that coding mode;
and sending a high-frequency code stream obtained by coding the original high-frequency audio signal frame by using the target coding mode to a receiving end, wherein the high-frequency code stream is provided with a coding identifier which is used for indicating a coding mode used for coding the high-frequency code stream.
2. The method according to claim 1, wherein the determining, as the target coding scheme, a coding scheme with a coding error within a preset error interval from among the plurality of coding schemes according to the priority of the coding scheme, comprises:
selecting candidate coding modes from the multiple coding modes one by one in descending order of priority;
determining a coding error of the candidate coding mode;
and if the coding error of the candidate coding mode is within the preset error interval, determining the candidate coding mode as the target coding mode and stopping the selection of further candidate coding modes.
3. The method according to claim 1, wherein the determining, as the target coding scheme, a coding scheme with a coding error within a preset error interval from among the plurality of coding schemes according to the priority of the coding scheme, comprises:
respectively determining the coding error of each coding mode in the multiple coding modes;
and determining, from among the coding modes whose coding errors are within the preset error interval, the coding mode with the highest priority as the target coding mode.
4. The method according to claim 1, wherein, for any coding mode of the plurality of coding modes, the coding error of that coding mode is determined by:
acquiring an original low-frequency audio signal frame obtained by decomposing the original audio signal frame;
performing, according to the original low-frequency audio signal frame, high-frequency reconstruction using that coding mode to obtain a high-frequency reconstruction signal frame;
and carrying out error analysis based on the high-frequency reconstruction signal frame and the original high-frequency audio signal frame to obtain a corresponding coding error.
5. The method according to claim 4, wherein the performing an error analysis based on the high frequency reconstructed signal frame and the original high frequency audio signal frame to obtain the coding error comprises:
calculating a difference signal between the high-frequency reconstruction signal frame and the original high-frequency audio signal frame;
determining the coding error using the difference signal.
6. The method according to claim 5, wherein the determining the coding error using the difference signal comprises:
taking the difference signal as the coding error;
alternatively,
carrying out auditory perception weighted energy calculation on the difference signal to obtain difference energy, and carrying out auditory perception weighted energy calculation on the original high-frequency audio signal frame to obtain original energy;
and taking the ratio of the difference energy to the original energy as the coding error.
7. The method according to any one of claims 4 to 6, wherein, if the coding mode is an audio super-resolution mode, the performing high-frequency reconstruction using that coding mode according to the original low-frequency audio signal frame to obtain a high-frequency reconstruction signal frame comprises:
acquiring a neural network model corresponding to the audio super-resolution mode;
extracting the characteristics of the original low-frequency audio signal frame to obtain low-frequency characteristics;
and predicting through the neural network model according to the low-frequency characteristics to obtain the high-frequency reconstruction signal frame.
8. The method according to any one of claims 4 to 6, wherein, if the coding mode is a spectral band replication mode, the performing high-frequency reconstruction using that coding mode according to the original low-frequency audio signal frame to obtain a high-frequency reconstruction signal frame comprises:
copying the original low-frequency audio signal frame to a high-frequency band to obtain a high-frequency copied signal frame;
extracting envelope characteristics of the original high-frequency audio signal frame;
and correcting the high-frequency copy signal frame by using the envelope characteristic to obtain the high-frequency reconstruction signal frame.
9. The method according to any one of claims 4 to 6, wherein, if the coding mode is a code-excited linear prediction coding mode, the performing high-frequency reconstruction using that coding mode according to the original low-frequency audio signal frame to obtain a high-frequency reconstruction signal frame comprises:
acquiring coding parameters from the high-frequency code stream, and acquiring a pitch period of the original low-frequency audio signal frame;
and performing high-frequency reconstruction according to the coding parameters and the pitch period to obtain the high-frequency reconstructed signal frame.
10. A method of decoding a high frequency audio signal, the method comprising:
receiving a high-frequency code stream sent by a sending end, where the high-frequency code stream carries a coding identifier indicating the coding mode used to encode the high-frequency code stream;
parsing the high-frequency code stream to obtain its coding identifier;
and decoding the high-frequency code stream according to a decoding mode corresponding to the coding mode indicated by the coding identification to obtain a high-frequency audio signal frame.
11. An apparatus for encoding a high frequency audio signal, the apparatus comprising an acquisition unit, a determination unit, and a transmission unit:
the acquisition unit is used for acquiring a plurality of coding modes and acquiring an original high-frequency audio signal frame obtained by decomposing the original audio signal frame;
the acquisition unit is further configured to acquire a priority for each of the multiple coding modes, where the number of coding bits used by the coding modes increases as the priority goes from high to low;
the determination unit is configured to determine, from the multiple coding modes and according to their priorities, a coding mode whose coding error is within a preset error interval as a target coding mode, where the coding error of a coding mode is the error produced by encoding the original high-frequency audio signal frame with that coding mode;
the transmission unit is configured to transmit, to a receiving end, a high-frequency code stream obtained by encoding the original high-frequency audio signal frame using the target coding mode, where the high-frequency code stream carries a coding identifier indicating the coding mode used to encode the high-frequency code stream.
12. An apparatus for decoding a high frequency audio signal, the apparatus comprising a receiving unit, a parsing unit and a decoding unit:
the receiving unit is configured to receive a high-frequency code stream sent by a sending end, where the high-frequency code stream carries a coding identifier indicating the coding mode used to encode the high-frequency code stream;
the parsing unit is configured to parse the high-frequency code stream to obtain its coding identifier;
and the decoding unit is used for decoding the high-frequency code stream according to the decoding mode corresponding to the coding mode indicated by the coding identification to obtain a high-frequency audio signal frame.
13. A computer device, the computer device comprising a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to perform the method of any of claims 1-10 according to instructions in the program code.
14. A computer-readable storage medium for storing program code, which when executed by a processor causes the processor to perform the method of any one of claims 1-10.
15. A computer program product comprising a computer program, characterized in that the computer program realizes the method of any of claims 1-10 when executed by a processor.

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210395889.2A CN114550732B (en) 2022-04-15 2022-04-15 Coding and decoding method and related device for high-frequency audio signal
PCT/CN2023/081461 WO2023197809A1 (en) 2022-04-15 2023-03-14 High-frequency audio signal encoding and decoding method and related apparatuses

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210395889.2A CN114550732B (en) 2022-04-15 2022-04-15 Coding and decoding method and related device for high-frequency audio signal

Publications (2)

Publication Number Publication Date
CN114550732A true CN114550732A (en) 2022-05-27
CN114550732B CN114550732B (en) 2022-07-08

Family

ID=81666757

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210395889.2A Active CN114550732B (en) 2022-04-15 2022-04-15 Coding and decoding method and related device for high-frequency audio signal

Country Status (2)

Country Link
CN (1) CN114550732B (en)
WO (1) WO2023197809A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116348952A (en) * 2023-02-09 2023-06-27 Beijing Xiaomi Mobile Software Co., Ltd. Audio signal processing device, equipment and storage medium
WO2023197809A1 (en) * 2022-04-15 2023-10-19 Tencent Technology (Shenzhen) Co., Ltd. High-frequency audio signal encoding and decoding method and related apparatuses
WO2023241254A1 (en) * 2022-06-15 2023-12-21 Tencent Technology (Shenzhen) Co., Ltd. Audio encoding and decoding method and apparatus, electronic device, computer readable storage medium, and computer program product

Citations (16)

Publication number Priority date Publication date Assignee Title
CN101662288A (en) * 2008-08-28 2010-03-03 Huawei Technologies Co., Ltd. Method, device and system for encoding and decoding audios
CN101710489A (en) * 2009-11-09 2010-05-19 Tsinghua University Method and device capable of encoding and decoding audio by grade and encoding and decoding system
US20160111103A1 (en) * 2013-06-11 2016-04-21 Panasonic Intellectual Property Corporation Of America Device and method for bandwidth extension for audio signals
CN106409305A (en) * 2010-12-29 2017-02-15 Samsung Electronics Co., Ltd. Apparatus and method for encoding and decoding signal for high frequency bandwidth extension
JP2019049745A (en) * 2014-03-24 2019-03-28 Sony Corporation Decoder and method, and program
JP2019133184A (en) * 2019-04-05 2019-08-08 NTT Docomo, Inc. Voice decoding device, voice decoding method, and voice decoding program
CN111489758A (en) * 2014-03-24 2020-08-04 Sony Corporation Decoding device, decoding method, and storage medium
CN111933159A (en) * 2017-11-10 2020-11-13 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Audio encoder, audio decoder, methods and computer programs adapting encoding and decoding of least significant bits
US20210082448A1 (en) * 2019-09-12 2021-03-18 Immersion Networks, Inc. Systems and methods for processing high frequency audio signal
CN112530444A (en) * 2019-09-18 2021-03-19 Huawei Technologies Co., Ltd. Audio encoding method and apparatus
CN112767954A (en) * 2020-06-24 2021-05-07 Tencent Technology (Shenzhen) Co., Ltd. Audio encoding and decoding method, device, medium and electronic equipment
US20210151062A1 (en) * 2018-04-25 2021-05-20 Dolby International Ab Integration of high frequency reconstruction techniques with reduced post-processing delay
CN113593586A (en) * 2020-04-15 2021-11-02 Huawei Technologies Co., Ltd. Audio signal encoding method, decoding method, encoding apparatus, and decoding apparatus
CN113841197A (en) * 2019-03-14 2021-12-24 Boomcloud 360, Inc. Spatial-aware multiband compression system with priority
CN113963703A (en) * 2020-07-03 2022-01-21 Huawei Technologies Co., Ltd. Audio coding method and coding and decoding equipment
CN114333861A (en) * 2021-11-18 2022-04-12 Tencent Technology (Shenzhen) Co., Ltd. Audio processing method, device, storage medium, equipment and product

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
JPH11133999A (en) * 1997-10-29 1999-05-21 Ricoh Co Ltd Voice coding and decoding equipment
CN102074242B (en) * 2010-12-27 2012-03-28 Wuhan University Extraction system and method of core layer residual in speech audio hybrid scalable coding
CN113470667A (en) * 2020-03-11 2021-10-01 Tencent Technology (Shenzhen) Co., Ltd. Voice signal coding and decoding method and device, electronic equipment and storage medium
CN114550732B (en) * 2022-04-15 2022-07-08 Tencent Technology (Shenzhen) Co., Ltd. Coding and decoding method and related device for high-frequency audio signal

Non-Patent Citations (3)

Title
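Chenhao Hu et al.: "Spatial Audio Object Coding Based on Time-Frequency Shifting and Scheduling", 2021 ICME *
Chi-Min Liu et al.: "High Frequency Reconstruction for Band-Limited Audio Signals", Proc. of the 6th Int. Conference on Digital Audio Effects *
Jiang Lin: "Research on Audio Bandwidth Extension Coding Based on a Nonlinear Mapping Model", China Excellent Doctoral and Master's Dissertations Full-text Database (Doctoral), Information Science and Technology Series *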

Also Published As

Publication number Publication date
CN114550732B (en) 2022-07-08
WO2023197809A1 (en) 2023-10-19

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40070387

Country of ref document: HK