CN114582361B - High-resolution audio coding and decoding method and system based on a generative adversarial network

Publication number
CN114582361B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210463201.XA
Other languages
Chinese (zh)
Other versions
CN114582361A (en)
Inventor
李强
朱勇
王尧
叶东翔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Barrot Wireless Co Ltd
Original Assignee
Barrot Wireless Co Ltd
Application filed by Barrot Wireless Co Ltd
Priority to CN202210463201.XA
Publication of CN114582361A
Application granted
Publication of CN114582361B

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/26 Pre-filtering or post-filtering
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/032 Quantisation or dequantisation of spectral components
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16 Vocoder architecture
    • G10L19/18 Vocoders using multiple modes
    • G10L19/24 Variable rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/26 Pre-filtering or post-filtering
    • G10L19/265 Pre-filtering, e.g. high frequency emphasis prior to encoding
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Abstract

The application discloses a high-resolution audio coding and decoding method and system based on a generative adversarial network, belonging to the technical field of audio coding and decoding. The method comprises the following steps: filtering the audio to be encoded through a quadrature mirror analysis filter to obtain low-band audio data and high-band audio data; performing standard LC3 encoding on the low-band audio data to obtain a low-band code stream and a low-frequency spectral envelope; obtaining a high-band code stream from the low-frequency spectral envelope and the frequency-domain spectral coefficients corresponding to the high-band audio data; at the audio receiving end, performing standard LC3 decoding on the low-band code stream to obtain low-frequency spectral coefficients and decoded low-band data; processing the high-band code stream using a pre-trained generator network to obtain decoded high-band data; and combining the low-band data and the high-band data through a quadrature mirror synthesis filter to obtain the decoding result. The application achieves high-resolution audio coding and decoding at twice the standard sampling rate.

Description

High-resolution audio coding and decoding method and system based on a generative adversarial network
Technical Field
The present application relates to the field of audio encoding and decoding technologies, and in particular to a high-resolution audio encoding and decoding method and system based on a generative adversarial network.
Background
The mainstream Bluetooth audio codecs at present are as follows. SBC: mandatory under the A2DP protocol and the most widely used, since every Bluetooth audio device must support it, but its sound quality is mediocre. AAC-LC: good sound quality and wide adoption, supported by many mainstream mobile phones; compared with SBC, however, it occupies more memory and has higher computational complexity, which is a burden because many Bluetooth devices are built on embedded platforms with limited battery capacity, weak processors and limited memory, and its patent fees are high. aptX series: good sound quality but high bit rates (384 kbps for aptX and 576 kbps for aptX-HD); it is proprietary Qualcomm technology and relatively closed. LDAC: better sound quality, but its bit rates are also very high (330 kbps, 660 kbps and 990 kbps); because the wireless environment in which Bluetooth devices operate is particularly complex, sustaining such bit rates stably is difficult, and it is proprietary Sony technology that is likewise closed. LHDC: good sound quality, but its bit rates are also high, typically 400 kbps, 600 kbps and 900 kbps, which places high demands on the Bluetooth baseband and radio-frequency design.
For the above reasons, the Bluetooth Special Interest Group (Bluetooth SIG), together with numerous manufacturers, introduced LC3. It is intended mainly for Bluetooth Low Energy but can also be used with Classic Bluetooth. It offers low delay, high sound quality and coding gain, carries no patent fees within the Bluetooth field, and has therefore attracted wide attention from manufacturers.
The LC3 audio codec is positioned as a low-complexity codec and supports only sampling rates from 8 kHz to 48 kHz, so it cannot meet the sampling-rate requirement of High Resolution Audio. Prior-art methods for raising the sampling rate require high computational complexity and high power consumption, and cannot be applied to LC3 Bluetooth Low Energy devices.
Disclosure of Invention
The application provides a high-resolution audio coding and decoding method and system based on a generative adversarial network. It addresses the problem that prior-art high-resolution audio coding and decoding requires high computing power and high power consumption, whereas LC3 Bluetooth Low Energy devices have strict power-consumption requirements, so the prior art cannot be applied directly in the LC3 Bluetooth Low Energy field.
In a first aspect, the present application provides a high-resolution audio coding method based on a generative adversarial network, including: at an audio transmitting end, filtering the input audio to be encoded through a quadrature mirror analysis filter to obtain low-band audio data and high-band audio data; performing standard LC3 encoding on the low-band audio data at a standard sampling rate to obtain a low-band code stream, and at the same time obtaining a low-frequency spectral envelope; encoding the high-band audio data at the standard sampling rate according to the low-frequency spectral envelope and the frequency-domain spectral coefficients corresponding to the high-band audio data, to obtain a high-band code stream; at an audio receiving end, receiving the low-band code stream and the high-band code stream, and performing standard LC3 decoding on the low-band code stream to obtain low-frequency spectral coefficients and decoded low-band data; generating high-frequency spectral coefficients from the low-frequency spectral coefficients using a pre-trained generator network, decoding the high-band code stream to obtain a high-to-low-frequency spectral envelope ratio, correcting the high-frequency spectral coefficients with that ratio, and performing an inverse transform to obtain decoded high-band data; and combining the decoded low-band data and the decoded high-band data through a quadrature mirror synthesis filter to obtain the decoding result corresponding to the input audio.
Optionally, encoding the high-band audio data at the standard sampling rate according to the low-frequency spectral envelope and the frequency-domain spectral coefficients corresponding to the high-band audio data, to obtain the high-band code stream, includes: acquiring the frequency-domain spectral coefficients of the high-band audio data, and calculating from them the high-frequency spectral envelope corresponding to the high-band audio data; and calculating the high-to-low-frequency spectral envelope ratio from the high-frequency spectral envelope and the low-frequency spectral envelope, then quantizing the ratio and applying the standard LC3 encoding process to it to obtain the high-band code stream.
Optionally, generating the high-frequency spectral coefficients from the low-frequency spectral coefficients using the pre-trained generator network, decoding the high-band code stream to obtain the high-to-low-frequency spectral envelope ratio, correcting the high-frequency spectral coefficients with that ratio, and performing the inverse transform to obtain the high-band data, includes: processing the low-frequency spectral coefficients with the pre-trained generator network to obtain the corresponding high-frequency spectral coefficients; correcting the high-frequency spectral coefficients with the high-to-low-frequency spectral envelope ratio to obtain corrected high-frequency spectral coefficients; and performing a low-delay inverse modified discrete cosine transform on the corrected high-frequency spectral coefficients to obtain the high-band data.
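The correction step above can be illustrated with a minimal sketch. The text does not give the correction formula, so the following is only one plausible reading: the envelope is assumed to be a per-band mean energy, the target high-band energy in each band is assumed to be the transmitted ratio multiplied by the decoded low-band energy, and the generated coefficients are rescaled to reach that target. The function name `correct_high_band`, the `band_edges` layout and the `floor` guard are hypothetical.

```python
import math


def correct_high_band(generated, low_env, ratio, band_edges, floor=1e-12):
    """Rescale generator output so each band matches ratio[b] * low_env[b].

    generated  : high-frequency spectral coefficients from the generator
    low_env    : per-band mean energy of the decoded low band (assumed form)
    ratio      : decoded high-to-low envelope ratio, one value per band
    band_edges : hypothetical band boundaries, band b covers
                 coefficients band_edges[b] .. band_edges[b + 1] - 1
    """
    corrected = list(generated)
    bands = zip(band_edges, band_edges[1:])
    for b, (start, stop) in enumerate(bands):
        width = stop - start
        # Mean energy the generator actually produced in this band.
        current = sum(c * c for c in generated[start:stop]) / width
        # Energy the encoder measured, recovered from the transmitted ratio.
        target = ratio[b] * low_env[b]
        scale = math.sqrt(target / max(current, floor))
        for k in range(start, stop):
            corrected[k] = generated[k] * scale
    return corrected
```

Under these assumptions the generator supplies the fine structure of the spectrum while the transmitted ratio restores the coarse energy level of each band.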
Optionally, the pre-training process of the generator network includes: filtering an input audio signal through a quadrature mirror analysis filter to obtain a low-band signal and a high-band signal; performing a low-delay modified discrete cosine transform on the low-band signal to obtain a low-frequency spectral envelope, and inputting the result of the transform into the generator network to obtain predicted high-frequency spectral coefficients; performing a low-delay modified discrete cosine transform on the high-band signal to obtain a high-frequency spectral envelope and the original high-frequency spectral coefficients; adjusting the predicted high-frequency spectral coefficients using the low-frequency spectral envelope and the high-frequency spectral envelope to obtain updated predicted high-frequency spectral coefficients; and comparing the original high-frequency spectral coefficients with the updated predicted high-frequency spectral coefficients using a discriminator network, and optimizing the generator network according to the comparison result to obtain the pre-trained generator network.
In a second aspect, the present application provides a high-resolution audio coding system based on a generative adversarial network, comprising: a quadrature mirror analysis filter for filtering the input audio to be encoded at an audio transmitting end to obtain low-band audio data and high-band audio data; a low-band encoding module for performing standard LC3 encoding on the low-band audio data at a standard sampling rate to obtain a low-band code stream and, at the same time, a low-frequency spectral envelope; a high-band encoding module for encoding the high-band audio data at the standard sampling rate according to the low-frequency spectral envelope and the frequency-domain spectral coefficients corresponding to the high-band audio data, to obtain a high-band code stream; a low-band decoding module for performing standard LC3 decoding on the received low-band code stream at an audio receiving end to obtain low-frequency spectral coefficients and decoded low-band data; a high-band processing module for generating high-frequency spectral coefficients from the low-frequency spectral coefficients using a pre-trained generator network, decoding the high-band code stream to obtain a high-to-low-frequency spectral envelope ratio, correcting the high-frequency spectral coefficients with that ratio, and performing an inverse transform to obtain high-band data; and a quadrature mirror synthesis filter for combining the decoded low-band data and the decoded high-band data to obtain the decoding result corresponding to the input audio.
In a third aspect, the present application provides a computer-readable storage medium storing computer instructions, wherein the computer instructions are operable to perform the high-resolution audio coding method based on a generative adversarial network of the first aspect.
In a fourth aspect, the present application provides a computer device comprising a processor and a memory, the memory storing computer instructions, wherein the processor executes the computer instructions to perform the high-resolution audio coding method based on a generative adversarial network of the first aspect.
The beneficial effects of the present application are as follows. At the encoding end, the low-band audio data is encoded at the standard sampling rate, while the high-band audio data is encoded using only the spectral envelope and corresponding parameters, so that the audio data is encoded at twice the standard sampling rate. At the decoding end, the low-band code stream is decoded in the standard way and the decoding result for the high-band code stream is obtained through the generative adversarial network, which reduces computing power and power consumption. The high-band code stream only needs to carry a very small number of high-to-low-frequency spectral envelope ratio parameters, which saves bandwidth and computation and makes the method suitable for LC3 Bluetooth Low Energy devices. High-resolution audio coding and decoding is thereby realized in LC3 Bluetooth Low Energy, solving the prior-art problem that transmitting high-resolution audio requires a very high bit rate, such as the 990 kbps of LDAC, which easily causes audio stuttering in LE Audio.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present application, and those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a schematic flow chart of one embodiment of the high-resolution audio coding method based on a generative adversarial network according to the present application;
FIG. 2 is a flow chart of high-band code stream processing;
FIG. 3 is a diagram illustrating an example of the high-band code stream processing according to the present application;
FIG. 4 is a schematic diagram of the network training process of the present application;
FIG. 5 is a schematic diagram of an embodiment of the high-resolution audio coding system based on a generative adversarial network according to the present application.
With the above figures, there are shown specific embodiments of the present application, which will be described in more detail below. These drawings and written description are not intended to limit the scope of the inventive concepts in any manner, but rather to illustrate the inventive concepts to those skilled in the art by reference to specific embodiments.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims of the present application and in the above-described drawings (if any) are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Moreover, the terms "comprises" and "comprising," as well as any variations thereof, are intended to cover a non-exclusive inclusion, such that a product or device that comprises a list of steps or elements is not necessarily limited to those elements explicitly listed, but may include other elements not expressly listed or inherent to such product or device.
The method of the present application comprises the following steps: at an audio transmitting end, filtering the input audio to be encoded through a quadrature mirror analysis filter to obtain low-band audio data and high-band audio data; performing standard LC3 encoding on the low-band audio data at a standard sampling rate to obtain a low-band code stream, and at the same time obtaining a low-frequency spectral envelope; encoding the high-band audio data at the standard sampling rate according to the low-frequency spectral envelope and the frequency-domain spectral coefficients corresponding to the high-band audio data, to obtain a high-band code stream; at an audio receiving end, receiving the low-band code stream and the high-band code stream, and performing standard LC3 decoding on the low-band code stream to obtain low-frequency spectral coefficients and decoded low-band data; generating high-frequency spectral coefficients from the low-frequency spectral coefficients using a pre-trained generator network, decoding the high-band code stream to obtain a high-to-low-frequency spectral envelope ratio, correcting the high-frequency spectral coefficients with that ratio, and performing an inverse transform to obtain high-band data; and combining the decoded low-band data and the decoded high-band data through a quadrature mirror synthesis filter to obtain the decoding result corresponding to the input audio.
At the encoding end, the low-band audio data is encoded at the standard sampling rate, while the high-band audio data is encoded using only the spectral envelope and corresponding parameters, so that the audio data is encoded at twice the standard sampling rate. At the decoding end, the low-band code stream is decoded in the standard way and the decoding result for the high-band code stream is obtained through the generative adversarial network, which reduces computing power and power consumption. The high-band code stream only needs to carry a very small number of high-to-low-frequency spectral envelope ratio parameters, which saves bandwidth and computation and makes the method suitable for LC3 Bluetooth Low Energy devices. High-resolution audio coding and decoding is thereby realized in LC3 Bluetooth Low Energy, avoiding the prior-art problem that transmitting high-resolution audio requires a very high bit rate (such as the 990 kbps of LDAC), which easily causes audio stuttering in LE Audio.
The following describes the technical solutions of the present application and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
Fig. 1 shows a schematic flow chart of an embodiment of the high-resolution audio coding method based on a generative adversarial network of the present application.
In the embodiment shown in fig. 1, the high-resolution audio coding method based on a generative adversarial network of the present application includes a process S101 of filtering the input audio to be encoded through a quadrature mirror analysis filter at an audio transmitting end to obtain low-band audio data and high-band audio data.
In this embodiment, when audio data is encoded, it is first filtered by the quadrature mirror analysis filter to obtain low-band audio data and high-band audio data. The low-band audio data and the high-band audio data are then processed separately.
Specifically, when the effective bandwidth of the input audio signal is 32 kHz, passing it through the quadrature mirror filter yields low-band audio data covering 0-16 kHz and high-band audio data covering 16-32 kHz, and the two bands then undergo their respective subsequent processing.
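The band split can be illustrated with a minimal sketch. The text does not specify the filter prototype, so the two-tap (Haar) filter pair below is an assumption chosen only because it is the shortest quadrature mirror pair with exact reconstruction; a practical codec would use a much longer prototype for sharper band separation. The function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def qmf_analysis(x):
    """Split x into low-band and high-band signals at half the input rate.

    Uses the two-tap pair h0 = [1, 1] / sqrt(2), h1 = [1, -1] / sqrt(2),
    followed by decimation by two.
    """
    if len(x) % 2:
        raise ValueError("input length must be even")
    half = len(x) // 2
    low = [(x[2 * n] + x[2 * n + 1]) / SQRT2 for n in range(half)]
    high = [(x[2 * n] - x[2 * n + 1]) / SQRT2 for n in range(half)]
    return low, high


def qmf_synthesis(low, high):
    """Recombine the two half-rate bands into one full-rate signal."""
    x = []
    for lo, hi in zip(low, high):
        x.append((lo + hi) / SQRT2)
        x.append((lo - hi) / SQRT2)
    return x
```

With a 64 kHz input, each output band runs at 32 kHz, which is the rate handed to the LC3 stages in the example above; `qmf_synthesis(*qmf_analysis(x))` returns the input up to floating-point rounding.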
In the embodiment shown in fig. 1, the high-resolution audio coding method based on a generative adversarial network of the present application includes a process S102, in which standard LC3 encoding is performed on the low-band audio data at a standard sampling rate to obtain a low-band code stream, and a low-frequency spectral envelope is obtained at the same time.
In this embodiment, the LC3 encoder performs the standard LC3 encoding process on the low-band audio data at the standard sampling rate to obtain the low-band code stream corresponding to the low-band audio data. The LC3 encoder supports sampling rates from 8 kHz to 48 kHz, and the specific standard sampling rate is chosen according to the actual encoding requirements. For example, when the required sampling rate is about 64 kHz, which the LC3 codec alone cannot meet, the standard sampling rate is set to half of the required rate; that is, the standard LC3 encoding process is performed on the low-band audio data at a sampling rate of 32 kHz. While the low-band audio data is being encoded, the corresponding low-frequency spectral envelope is obtained as well.
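The rate selection described above amounts to a small piece of arithmetic, sketched here. The list of discrete rates is taken from the LC3 specification rather than from this text, which only states the 8-48 kHz range; the function name is hypothetical.

```python
# Sampling rates defined for LC3 (from the LC3 specification, not this text).
LC3_SAMPLE_RATES_HZ = (8000, 16000, 24000, 32000, 44100, 48000)


def subband_sample_rate(target_rate_hz):
    """Return the per-band LC3 rate for a two-band split of target_rate_hz."""
    if target_rate_hz % 2:
        raise ValueError("target rate must be divisible by two")
    half = target_rate_hz // 2
    if half not in LC3_SAMPLE_RATES_HZ:
        raise ValueError(f"{half} Hz is not an LC3 sampling rate")
    return half
```

For the 64 kHz example this gives 32000; a 96 kHz target would map to 48000, the highest rate LC3 defines.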
In the embodiment shown in fig. 1, the high-resolution audio coding method based on a generative adversarial network of the present application includes a process S103, in which the high-band audio data is encoded at the standard sampling rate according to the low-frequency spectral envelope and the frequency-domain spectral coefficients corresponding to the high-band audio data, so as to obtain a high-band code stream.
In this embodiment, the standard encoding process is not applied to the high-band audio data. Instead, high-frequency parameters are extracted from the low-frequency spectral envelope obtained while encoding the low-band audio data and from the high-frequency spectral coefficients obtained by analyzing the high-band audio data, and the high-band audio data is encoded on that basis to obtain the corresponding high-band code stream.
Optionally, encoding the high-band audio data at the standard sampling rate according to the low-frequency spectral envelope and the frequency-domain spectral coefficients corresponding to the high-band audio data, to obtain the high-band code stream, includes: acquiring the frequency-domain spectral coefficients of the high-band audio data, and calculating from them the high-frequency spectral envelope corresponding to the high-band audio data; and calculating the high-to-low-frequency spectral envelope ratio from the high-frequency spectral envelope and the low-frequency spectral envelope, then quantizing the ratio and applying the standard LC3 encoding process to it to obtain the high-band code stream.
In this optional embodiment, when the high-band audio data is processed, its frequency-domain spectral coefficients are obtained first, and the high-frequency spectral envelope corresponding to the high-band audio data is then derived from those coefficients. The high-band audio data is encoded through the high-to-low-frequency spectral envelope ratio calculated from the high-frequency spectral envelope and the low-frequency spectral envelope, giving the high-band code stream corresponding to the high-band audio data.
Specifically, fig. 2 shows a flow chart of the high-band code stream processing.
In the example shown in fig. 2, the high-frequency band audio data (for example, the audio data with a bandwidth of 16-32 kHz in the above example) is passed through the low-delay modified discrete cosine transform (LD-MDCT) of the LC3 encoder to obtain the frequency domain spectral coefficients corresponding to the high-frequency band audio data. The specific process is as follows:
[Two formula images not reproduced here: the low-delay modified discrete cosine transform.]
where the input symbol in those formulas (symbol image not reproduced) is the input time domain audio PCM signal, and x(k) is the frequency domain spectral coefficients obtained by the transform.
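As a non-limiting illustration, the low-delay MDCT of the LC3 specification is commonly written in the form below. This is a reconstruction from general knowledge of LC3 and is not taken from the formula images of this document, which are unavailable. The symbols N_F (number of spectral coefficients per frame), w(n) (the low-delay analysis window) and t(n) (the time buffer assembled from the current and previous input samples) are assumptions borrowed from that specification, and X(k) corresponds to the coefficients written x(k) in the text.

```latex
X(k) = \sqrt{\frac{2}{N_F}} \sum_{n=0}^{2N_F-1} w(n)\, t(n)
       \cos\!\left[\frac{\pi}{N_F}\left(n + \frac{1}{2} + \frac{N_F}{2}\right)\left(k + \frac{1}{2}\right)\right],
\qquad k = 0, \dots, N_F - 1
```

Each frame of 2N_F windowed time samples therefore yields N_F spectral coefficients, and consecutive frames overlap by N_F samples.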
After determining the frequency domain spectral coefficients of the high frequency band audio data, the calculation of the high frequency spectrum envelope is performed as follows:
[Formula image not reproduced here: calculation of the high-frequency spectrum envelope.]
where the band table symbol in that formula (symbol image not reproduced) is the table in the LC3 standard that divides the MDCT spectral coefficients into different frequency bands, and x(k) represents the frequency domain spectral coefficients obtained above.
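As a non-limiting illustration, a band-wise spectrum envelope can be sketched as follows. The sketch assumes that the envelope value of a band is the mean squared coefficient within that band, in the manner of a per-band energy estimate; the exact formula of this document is not available. The function name `band_energies`, the band table `example_edges` and the coefficient values are hypothetical and are not the LC3 band table.

```python
def band_energies(coeffs, band_edges):
    """Mean squared spectral coefficient in each band (assumed envelope definition).

    Band b covers coefficient indices band_edges[b] .. band_edges[b + 1] - 1.
    """
    energies = []
    for b in range(len(band_edges) - 1):
        start, stop = band_edges[b], band_edges[b + 1]
        width = stop - start
        energies.append(sum(c * c for c in coeffs[start:stop]) / width)
    return energies


# Hypothetical 8-coefficient frame split into three bands of width 2, 2 and 4.
example_coeffs = [1.0, -1.0, 2.0, 0.0, 0.5, 0.5, -0.5, 0.5]
example_edges = [0, 2, 4, 8]
example_envelope = band_energies(example_coeffs, example_edges)
```

The envelope has one value per band rather than one per coefficient, which is what keeps the high-frequency side information small.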
After the high-frequency spectrum envelope of the high-frequency band audio data is obtained through calculation, the high-frequency and low-frequency spectrum envelope ratio is calculated by using the obtained low-frequency spectrum envelope, specifically as follows:
Let the low-frequency spectrum envelope be denoted by [symbol image not reproduced] and the high-frequency spectrum envelope by [symbol image not reproduced].
Then the high-frequency low-frequency spectrum envelope ratio is:
[Formula image not reproduced here.]
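As a non-limiting illustration, the ratio can be sketched per band as below. The sketch assumes that the ratio is the high-frequency envelope value divided by the low-frequency envelope value of the same band index; the exact formula of this document is not available. The function name `envelope_ratio` and the small `floor` guard against division by zero are hypothetical additions.

```python
def envelope_ratio(high_env, low_env, floor=1e-12):
    """Per-band ratio of high-band to low-band envelope values (assumed definition)."""
    if len(high_env) != len(low_env):
        raise ValueError("envelopes must cover the same number of bands")
    # The floor only protects against a silent low band; it is not from the source.
    return [h / max(l, floor) for h, l in zip(high_env, low_env)]


# Hypothetical three-band envelopes.
example_ratio = envelope_ratio([0.5, 0.25, 0.0], [2.0, 1.0, 4.0])
```

Only these per-band ratios, after quantization and entropy coding, would need to be carried in the high-frequency band code stream.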
Then, the high-frequency low-frequency spectrum envelope ratio is quantized using the spectral coefficient quantization method of standard LC3 coding, after which arithmetic coding, code stream packing and the remaining steps are performed in sequence to finally obtain the high-frequency band code stream corresponding to the high-frequency band audio data.
In the embodiment shown in fig. 1, the high-resolution audio coding method based on the generative countermeasure network includes a process S104: an audio receiving end receives the low-frequency band code stream and the high-frequency band code stream, and performs the standard LC3 decoding process on the low-frequency band code stream to obtain low-frequency spectral coefficients and decoded low-frequency band data.
In this embodiment, after the encoding process at the transmitting end is completed, the low-band code stream and the high-band code stream obtained by encoding are subjected to corresponding decoding and other processing at the receiving end. Firstly, a standard LC3 decoding process is carried out on a low-frequency band code stream obtained through standard LC3 coding, decoded low-frequency band data are obtained, and meanwhile, low-frequency spectrum coefficients are obtained in the decoding process.
In the embodiment shown in fig. 1, the high resolution audio coding method based on the generative countermeasure network of the present application includes a process S105: generating high-frequency spectral coefficients from the low-frequency spectral coefficients by using a pre-trained generation network, decoding the high-frequency band code stream to obtain the high-frequency low-frequency spectrum envelope ratio, correcting the high-frequency spectral coefficients by using the high-frequency low-frequency spectrum envelope ratio, and performing an inverse transformation to obtain high-frequency band data.
In this embodiment, the high-frequency band code stream is processed with the help of a pre-trained generation network: the decoding result corresponding to the high-frequency band code stream is obtained from the low-frequency spectral coefficients produced while decoding the low-frequency band code stream, which yields the decoded high-frequency band data.
Optionally, generating a high-frequency spectral coefficient according to the low-frequency spectral coefficient by using a pre-trained generating network, decoding the high-frequency band code stream to obtain a high-frequency low-frequency spectral envelope ratio, correcting the high-frequency spectral coefficient by using the high-frequency low-frequency spectral envelope ratio, and performing inverse transformation to obtain high-frequency band data, where the method includes: processing the low-frequency spectral coefficient by using a pre-trained generating network to obtain a corresponding high-frequency spectral coefficient; correcting the high-frequency spectrum coefficient by using the high-frequency low-frequency spectrum envelope ratio to obtain a corrected high-frequency spectrum coefficient; and performing low-delay improved inverse discrete cosine transform on the corrected high-frequency spectrum coefficient to obtain high-frequency band data.
In this optional embodiment, the high-frequency band code stream is not put through the standard decoding process. Instead, a pre-trained generation network directly generates the corresponding high-frequency spectral coefficients from the low-frequency spectral coefficients obtained while decoding the low-frequency band code stream. The corresponding high-frequency low-frequency spectrum envelope ratio is then obtained by parsing the high-frequency band code stream. The high-frequency spectral coefficients produced by the generation network are corrected using the high-frequency low-frequency spectrum envelope ratio, and finally the low-delay improved inverse discrete cosine transform is performed on the corrected high-frequency spectral coefficients to obtain the decoded high-frequency band data.
Specifically, fig. 3 is a schematic diagram illustrating an example of the high-frequency band code stream processing process according to the present application.
In the example shown in fig. 3, standard LC3 decoding of the low-frequency band code stream outputs the low-frequency spectral coefficients [symbol images not reproduced]. The low-frequency spectral coefficients are input into the generation network, which outputs the high-frequency spectral coefficients [symbol image not reproduced]. Code stream parsing, arithmetic decoding and inverse quantization are performed on the high-frequency band code stream using the same standard LC3 decoding; specifically, the inverse quantization outputs the high-frequency low-frequency spectrum envelope ratio:
[Formula image not reproduced here.]
Then, the high-frequency low-frequency spectrum envelope ratio is used to correct the high-frequency spectral coefficients to obtain the corrected high-frequency spectral coefficients, expressed as follows:
[Formula image not reproduced here.]
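As a non-limiting illustration, one possible form of this correction is sketched below. It is an assumption and not the formula of this document, which is unavailable: each band of the generated coefficients is scaled by a gain chosen so that its mean energy equals the transmitted ratio multiplied by the low-frequency envelope value of that band. The function name `correct_high_band`, the band table and the `floor` guard are hypothetical.

```python
import math


def correct_high_band(generated, band_edges, ratio, low_env, floor=1e-12):
    """Scale generated coefficients band by band toward a target envelope.

    Assumed target: ratio[b] * low_env[b] is the desired mean energy of band b.
    """
    corrected = list(generated)
    for b in range(len(band_edges) - 1):
        start, stop = band_edges[b], band_edges[b + 1]
        band = generated[start:stop]
        current = sum(c * c for c in band) / (stop - start)
        target = ratio[b] * low_env[b]
        # Energy scales with the square of the gain, hence the square root.
        gain = math.sqrt(target / max(current, floor))
        for k in range(start, stop):
            corrected[k] = generated[k] * gain
    return corrected


# Hypothetical two-band example: both bands are steered to a mean energy of 4.0.
example_corrected = correct_high_band(
    [1.0, 1.0, 2.0, 2.0], [0, 2, 4], ratio=[0.5, 2.0], low_env=[8.0, 2.0]
)
```

Under this assumption the generation network supplies the fine structure of the high band, while the transmitted ratio fixes its level in each band.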
Finally, the low-delay inverse modified discrete cosine transform is performed on the corrected high-frequency spectral coefficients, and the decoded high-frequency band data corresponding to the high-frequency band code stream is output.
[Formula image not reproduced here.]
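As a non-limiting illustration, the forward and inverse transforms can be sketched with a plain MDCT and overlap-add. The sketch uses a sine window as a stand-in; the actual LC3 low-delay window is different and is not reproduced here. The function names and the frame size used in the example are hypothetical.

```python
import math


def sine_window(n_f):
    """Sine window of length 2 * n_f (stand-in, not the LC3 low-delay window)."""
    return [math.sin(math.pi * (n + 0.5) / (2 * n_f)) for n in range(2 * n_f)]


def mdct(block, window):
    """Forward MDCT: 2 * n_f windowed samples -> n_f coefficients."""
    n_f = len(block) // 2
    scale = math.sqrt(2.0 / n_f)
    return [
        scale * sum(
            window[n] * block[n]
            * math.cos(math.pi / n_f * (n + 0.5 + n_f / 2) * (k + 0.5))
            for n in range(2 * n_f)
        )
        for k in range(n_f)
    ]


def imdct(coeffs, window):
    """Inverse MDCT: n_f coefficients -> 2 * n_f windowed, aliased samples."""
    n_f = len(coeffs)
    scale = math.sqrt(2.0 / n_f)
    return [
        scale * window[n] * sum(
            coeffs[k] * math.cos(math.pi / n_f * (n + 0.5 + n_f / 2) * (k + 0.5))
            for k in range(n_f)
        )
        for n in range(2 * n_f)
    ]


def round_trip(signal, n_f):
    """Transform overlapping frames and overlap-add the inverse transforms.

    len(signal) must be a multiple of n_f.
    """
    window = sine_window(n_f)
    padded = [0.0] * n_f + list(signal) + [0.0] * n_f
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n_f + 1, n_f):
        frame = imdct(mdct(padded[start:start + 2 * n_f], window), window)
        for n, value in enumerate(frame):
            out[start + n] += value
    return out[n_f:n_f + len(signal)]
```

The time-domain aliasing of each frame cancels against its neighbours during overlap-add, so `round_trip` returns the input up to rounding error; in the decoder described above, the corrected high-frequency spectral coefficients would take the place of the `mdct` output.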
Optionally, the pre-training process for generating the network includes: filtering the input audio signal through a quadrature mirror image analysis filter to obtain a low-frequency band signal and a high-frequency band signal; carrying out low-delay improved discrete cosine transform on the low-frequency band signal to obtain a low-frequency spectrum envelope, and inputting the result of the low-delay improved discrete cosine transform into a generating network to obtain a predicted high-frequency spectrum coefficient; carrying out low-delay improved discrete cosine transform on the high-frequency band signal to obtain a high-frequency spectrum envelope and an original high-frequency spectrum coefficient; adjusting the predicted high-frequency spectrum coefficient by using the low-frequency spectrum envelope and the high-frequency spectrum envelope to obtain an updated predicted high-frequency spectrum coefficient; and comparing the original high-frequency spectrum coefficient with the updated prediction high-frequency spectrum coefficient by using the discrimination network, and optimizing the generated network according to the comparison result to obtain the pre-trained generated network.
Specifically, fig. 4 shows a schematic diagram of the network training process generated by the present application.
In the example shown in fig. 4, when the generation network is trained, the input audio is filtered by a quadrature mirror analysis filter (QMF) to obtain a low-frequency band signal and a high-frequency band signal. The low-frequency band signal undergoes the low-delay modified discrete cosine transform (LD-MDCT); on the one hand this yields the low-frequency spectrum envelope, and on the other hand the result of the transform is input into the generation network to obtain the predicted high-frequency spectral coefficients. The high-frequency band signal likewise undergoes the LD-MDCT, which yields the high-frequency spectrum envelope and the original high-frequency spectral coefficients. The predicted high-frequency spectral coefficients of the generation network are adjusted using the high-frequency spectrum envelope and the low-frequency spectrum envelope to obtain the updated predicted high-frequency spectral coefficients. Finally, the discrimination network judges the output of the generation network by comparing the updated predicted high-frequency spectral coefficients with the original high-frequency spectral coefficients of the high-frequency band signal, and the generation network is optimized according to the comparison result. Training of the generation network is complete when the discrimination network outputs "true", which indicates that the difference between the generated and the original high-frequency spectral coefficients is small enough for the two to be regarded as consistent.
The generation network used by the invention may be based on an autoencoder or on other neural network models; the application is not limited in this respect. Generation networks are a mature technology, and the structure is briefly described as follows: the network comprises an encoder and a decoder. The encoder comprises a plurality of convolution layers, which reduce the dimension of the low-frequency spectral coefficients and extract features of the low-frequency MDCT spectral coefficients. The decoder comprises a corresponding number of deconvolution layers, which increase the dimension of those features so that the high-frequency spectral coefficients output by the generation network have the same dimension as the input low-frequency spectral coefficients. Each convolution layer includes convolution, batch normalization and an activation function, and the deconvolution layers are structured similarly.
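As a non-limiting illustration, the dimension bookkeeping of such an encoder and decoder can be sketched with the usual output length formulas for strided one-dimensional convolution and transposed convolution. The kernel size, stride, padding, number of layers and the 320-coefficient input are hypothetical values chosen for the example and are not given in this document.

```python
def conv_out_len(length, kernel, stride, padding):
    """Output length of a 1-D convolution (dilation 1)."""
    return (length + 2 * padding - kernel) // stride + 1


def deconv_out_len(length, kernel, stride, padding):
    """Output length of a 1-D transposed convolution (dilation 1, no output padding)."""
    return (length - 1) * stride - 2 * padding + kernel


def trace_lengths(n_coeffs, n_layers, kernel=4, stride=2, padding=1):
    """Sequence of lengths through n_layers convolutions, then n_layers deconvolutions."""
    lengths = [n_coeffs]
    for _ in range(n_layers):
        lengths.append(conv_out_len(lengths[-1], kernel, stride, padding))
    for _ in range(n_layers):
        lengths.append(deconv_out_len(lengths[-1], kernel, stride, padding))
    return lengths


# Hypothetical frame of 320 low-frequency coefficients and three layers per side.
example_lengths = trace_lengths(320, 3)
```

With these hypothetical settings the length is halved by each convolution layer and doubled by each deconvolution layer, so the output has the same dimension as the input, as the description requires.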
The discrimination network used by the invention may be based on a deep neural network or on other neural network models; the application is not limited in this respect. The structure is briefly described as follows: taking an audio sampling rate of 32 kHz and a frame length of 10 ms as an example, the input layer has 640 nodes, the first hidden layer has 960 nodes, the second hidden layer has 960 nodes and the output layer has 1 node; the activation function of the input and hidden layers is tanh, and the activation function of the output layer is sigmoid.
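As a non-limiting illustration, a fully connected network with the layer sizes given above can be sketched as follows. The weight initialization, the zero biases and the choice to apply tanh to the outputs of the two hidden layers only are assumptions; the names `init_layers`, `discriminate` and `parameter_count` are hypothetical.

```python
import math
import random

LAYER_SIZES = [640, 960, 960, 1]  # input, two hidden layers, output (from the text)


def init_layers(sizes, seed=0):
    """Random weights and zero biases for each fully connected layer (assumed init)."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        bound = 1.0 / math.sqrt(n_in)
        weights = [[rng.uniform(-bound, bound) for _ in range(n_in)]
                   for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def discriminate(layers, features):
    """Forward pass: tanh on hidden layers, sigmoid on the single output node."""
    activations = list(features)
    for index, (weights, biases) in enumerate(layers):
        pre = [sum(w * a for w, a in zip(row, activations)) + b
               for row, b in zip(weights, biases)]
        if index == len(layers) - 1:
            return sigmoid(pre[0])
        activations = [math.tanh(v) for v in pre]


def parameter_count(sizes):
    """Number of weights and biases in the fully connected stack."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))
```

The sigmoid output lies between 0 and 1 and can be read as the probability that the input coefficients are "true"; `parameter_count(LAYER_SIZES)` gives the size of the model under these assumptions.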
A generative adversarial network (GAN), referred to in this application as a generative countermeasure network, is an unsupervised learning method in which two neural networks learn by playing a game against each other. A GAN mainly comprises a generation network (Generator Network) and a discrimination network (Discriminator Network). The generation network is mainly used to generate samples: its input may be noise data, and its output is the generated target samples. The discrimination network is mainly used to distinguish whether its input sample is a target sample produced by the generation network or a real sample. During training the two networks compete: the output of the generation network must imitate the real samples in the training set as closely as possible, while the discrimination network must tell its input samples apart as well as possible. The two networks continuously adjust their parameters against each other during training until a balance is reached, at which point the "fake data" samples produced by the generation network are close to real data and the discrimination network can no longer judge whether the output of the generation network is a real sample.
The training process is briefly described as follows. First, the generation network (Generator, G for short) is held fixed: when the input of the discrimination network (Discriminator, D for short) is real data, "true" is used as the supervision label to update the parameters of D, and when its input is fake data, "false" is used as the supervision label to update the parameters of D, thereby finding the currently optimal discrimination network. Then the discrimination network D is held fixed, and "true" is used as the supervision label to update the parameters of the generation network, thereby finding the currently optimal generation network.
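As a non-limiting illustration, the two alternating steps can be sketched on a toy problem with scalar samples. The generator G(z) = a*z + b, the logistic discriminator D(x) = sigmoid(w*x + c), the cross-entropy loss, the learning rate and the Gaussian "real" data are all hypothetical stand-ins for the networks and spectral coefficients described above.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def discriminator_step(d, g, real, noise, lr):
    """Update D = (w, c) with G = (a, b) held fixed.

    Real samples are supervised with label 1 ("true"), generated ones with 0 ("false").
    """
    w, c = d
    a, b = g
    samples = [(x, 1.0) for x in real] + [(a * z + b, 0.0) for z in noise]
    grad_w = grad_c = 0.0
    for x, label in samples:
        err = sigmoid(w * x + c) - label  # cross-entropy gradient w.r.t. the logit
        grad_w += err * x
        grad_c += err
    n = len(samples)
    return (w - lr * grad_w / n, c - lr * grad_c / n)


def generator_step(d, g, noise, lr):
    """Update G = (a, b) with D held fixed, supervising D's output with "true"."""
    w, c = d
    a, b = g
    grad_a = grad_b = 0.0
    for z in noise:
        err = sigmoid(w * (a * z + b) + c) - 1.0
        grad_a += err * w * z  # chain rule through x = a * z + b
        grad_b += err * w
    n = len(noise)
    return (a - lr * grad_a / n, b - lr * grad_b / n)


def train(steps=200, batch=32, lr=0.05, seed=0):
    """Alternate the two steps; returns the final (D, G) parameter pairs."""
    rng = random.Random(seed)
    d, g = (0.1, 0.0), (1.0, 0.0)
    for _ in range(steps):
        real = [rng.gauss(3.0, 0.5) for _ in range(batch)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        d = discriminator_step(d, g, real, noise, lr)
        noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        g = generator_step(d, g, noise, lr)
    return d, g
```

Each function receives both parameter pairs but returns only the pair it is allowed to change, which mirrors the "fix one network, update the other" description.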
In the embodiment shown in fig. 1, the high-resolution audio coding method based on the generative countermeasure network of the present application includes a process S106: synthesizing the decoded low-frequency band data and the decoded high-frequency band data by using a quadrature mirror synthesis filter to obtain the decoding result corresponding to the encoded audio.
In this embodiment, after the decoded low-frequency band data and the decoded high-frequency band data have been obtained, the two are synthesized by the quadrature mirror synthesis filter to obtain the final decoding result corresponding to the encoded audio. Standard LC3 encoding and decoding at the standard sampling rate is applied only to the low-frequency band data of the encoded audio. For the high-frequency band data, high-frequency parameter extraction is performed during encoding to obtain the corresponding encoding result, and during decoding the generation network generates the corresponding decoding result. Finally, the high-frequency band and low-frequency band data are synthesized, so that the encoded audio is encoded and decoded at twice the standard sampling rate. The standard sampling rate range of the known LC3 codec is 8-48 kHz, so the final sampling rate range can reach 16-96 kHz, and high-resolution audio coding and decoding can therefore be completed at a higher sampling rate.
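As a non-limiting illustration, the band split and the synthesis can be sketched with the shortest possible quadrature mirror filter pair, the two-tap Haar filters. These filters are an assumption made for brevity; the filters actually used by the method are not specified here and would be considerably longer. The function names are hypothetical.

```python
import math

_S = 1.0 / math.sqrt(2.0)


def qmf_analysis(signal):
    """Split an even-length signal into low and high bands at half the sample rate."""
    low = [(signal[i] + signal[i + 1]) * _S for i in range(0, len(signal) - 1, 2)]
    high = [(signal[i] - signal[i + 1]) * _S for i in range(0, len(signal) - 1, 2)]
    return low, high


def qmf_synthesis(low, high):
    """Recombine the two half-rate bands into one full-rate signal."""
    out = []
    for l, h in zip(low, high):
        out.append((l + h) * _S)
        out.append((l - h) * _S)
    return out


def band_rate(full_rate_hz):
    """Each band runs at half the full sampling rate."""
    return full_rate_hz // 2
```

Each band carries half as many samples as the input, so a 96 kHz input yields two bands at 48 kHz, the upper end of the LC3 range mentioned above, and `qmf_synthesis(*qmf_analysis(x))` returns `x` up to rounding error.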
At the encoding end, the low-frequency band audio data in the audio data is encoded at the standard sampling rate, while for the high-frequency band audio data only the spectrum envelope and the corresponding parameters are used to obtain its encoding result, so that the audio data is encoded at twice the standard sampling rate. At the decoding end, the low-frequency band code stream undergoes standard decoding, and the decoding result for the high-frequency band code stream is obtained by means of the generative countermeasure network, which reduces computation and power consumption. The high-frequency band code stream only needs to carry a very small number of high-frequency low-frequency spectrum envelope ratio parameters, which saves bandwidth and computing power consumption and makes the method suitable for LC3 Bluetooth Low Energy devices. High-resolution audio coding and decoding is thereby realized in LC3 Bluetooth Low Energy, avoiding the problem in the prior art that transmitting high-resolution audio requires a very high code rate (such as the 990 kbps of LDAC), which easily causes audio stuttering in LE Audio.
With the method of the invention, only a small amount of code stream information is needed to transmit the high-frequency information, which saves transmission bandwidth, preserves high-resolution sound quality and avoids audio stuttering. The invention can be applied at sampling rates not supported by the current LC3, such as 64 kHz or above, to support high sound quality. It can also be applied at sampling rates that the current LC3 does support in order to reduce the code rate: for example, audio at the 48 kHz sampling rate supported by the standard LC3 specification is divided into a high-frequency band and a low-frequency band, which are separately encoded, transmitted, decoded and then synthesized, achieving equivalent sound quality at a lower code rate. The invention is described using the Bluetooth field as an example, but it may also be used in other fields.
Fig. 5 shows an embodiment of the high resolution audio coding system based on the generative countermeasure network of the present application.
In the embodiment shown in fig. 5, the high resolution audio coding system based on the generative countermeasure network of the present application includes: a quadrature mirror analysis filter 501, which filters the input encoded audio at the audio transmitting end to obtain low-frequency band audio data and high-frequency band audio data; a low-band encoding module 502, which performs standard LC3 encoding on the low-frequency band audio data at the standard sampling rate to obtain a low-frequency band code stream and at the same time obtains a low-frequency spectrum envelope; a high-band encoding module 503, which encodes the high-frequency band audio data at the standard sampling rate according to the low-frequency spectrum envelope and the frequency domain spectral coefficients corresponding to the high-frequency band audio data to obtain a high-frequency band code stream; a low-band decoding module 504, which performs the standard LC3 decoding process on the received low-frequency band code stream at the audio receiving end to obtain low-frequency spectral coefficients and decoded low-frequency band data; a high-band processing module 505, which generates high-frequency spectral coefficients from the low-frequency spectral coefficients by using a pre-trained generation network, decodes the high-frequency band code stream to obtain the high-frequency low-frequency spectrum envelope ratio, corrects the high-frequency spectral coefficients by using that ratio, and performs an inverse transformation to obtain high-frequency band data; and a quadrature mirror synthesis filter 506, which synthesizes the decoded low-frequency band data and the decoded high-frequency band data to obtain the decoding result corresponding to the encoded audio.
Optionally, in the high-band encoding module 503, a frequency domain spectral coefficient of the high-band data is obtained, and a high-band spectral envelope corresponding to the high-band audio data is obtained through calculation according to the frequency domain spectral coefficient; and calculating according to the high-frequency spectrum envelope and the low-frequency spectrum envelope to obtain a high-frequency low-frequency spectrum envelope ratio, and performing quantization and standard LC3 encoding processes on the high-frequency low-frequency spectrum envelope ratio to obtain a high-frequency band code stream.
Optionally, in the high-band processing module 505, the low-frequency spectral coefficient is processed by using a pre-trained generation network to obtain a corresponding high-frequency spectral coefficient; correcting the high-frequency spectrum coefficient by using the high-frequency low-frequency spectrum envelope ratio to obtain a corrected high-frequency spectrum coefficient; and performing low-delay improved inverse discrete cosine transform on the corrected high-frequency spectrum coefficient to obtain high-frequency band data.
Optionally, the pre-training process for generating the network includes: filtering the input audio signal through a quadrature mirror image analysis filter to obtain a low-frequency band signal and a high-frequency band signal; carrying out low-delay improved discrete cosine transform on the low-frequency band signal to obtain a low-frequency spectrum envelope, and inputting the result of the low-delay improved discrete cosine transform into a generating network to obtain a predicted high-frequency spectrum coefficient; carrying out low-delay improved discrete cosine transform on the high-frequency band signal to obtain a high-frequency spectrum envelope and an original high-frequency spectrum coefficient; adjusting the predicted high-frequency spectral coefficient by using the low-frequency spectral envelope and the high-frequency spectral envelope to obtain an updated predicted high-frequency spectral coefficient; and comparing the original high-frequency spectrum coefficient with the updated predicted high-frequency spectrum coefficient by using a discrimination network, and optimizing the generated network according to a comparison result to obtain a pre-trained generated network.
At the encoding end, the low-frequency band audio data in the audio data is encoded at the standard sampling rate, while for the high-frequency band audio data only the spectrum envelope and the corresponding parameters are used to obtain its encoding result, so that the audio data is encoded at twice the standard sampling rate. At the decoding end, the low-frequency band code stream undergoes standard decoding, and the decoding result for the high-frequency band code stream is obtained by means of the generative countermeasure network, which reduces computation and power consumption. The high-frequency band code stream only needs to carry a very small number of high-frequency low-frequency spectrum envelope ratio parameters, which saves bandwidth and computing power consumption and makes the system suitable for LC3 Bluetooth Low Energy devices. High-resolution audio coding and decoding is thereby realized in LC3 Bluetooth Low Energy, avoiding the problem in the prior art that transmitting high-resolution audio requires a very high code rate (such as the 990 kbps of LDAC), which easily causes audio stuttering in LE Audio.
In a particular embodiment of the present application, a computer-readable storage medium stores computer instructions, wherein the computer instructions are operable to perform the high resolution audio coding method based on the generative countermeasure network described in any of the embodiments. The method may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two.
A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium.
The Processor may be a Central Processing Unit (CPU), other general-purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), other Programmable logic devices, discrete Gate or transistor logic, discrete hardware components, or any combination thereof. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.
In one embodiment of the present application, a computer device includes a processor and a memory, the memory storing computer instructions, wherein: the processor operates the computer instructions to perform the high resolution audio coding method based on the generative countermeasure network described in any of the embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a unit is merely a logical division, and an actual implementation may have another division, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
The above embodiments are merely examples, which are not intended to limit the scope of the present disclosure, and all equivalent structural changes made by using the contents of the specification and the drawings, or any other related technical fields, are also included in the scope of the present disclosure.

Claims (8)

1. A high-resolution audio coding and decoding method based on a generative countermeasure network is characterized by comprising the following steps:
filtering the input coded audio at an audio transmitting end through a quadrature mirror analysis filter to obtain low-frequency band audio data and high-frequency band audio data;
performing standard LC3 encoding on the low-frequency-band audio data by using a standard sampling rate to obtain a low-frequency-band code stream, and simultaneously acquiring a low-frequency spectrum envelope;
according to the low-frequency spectrum envelope and the frequency domain spectrum coefficient corresponding to the high-frequency band audio data, encoding the high-frequency band audio data by using the standard sampling rate to obtain a high-frequency band code stream;
an audio receiving end receives the low-frequency band code stream and the high-frequency band code stream, and performs a standard LC3 decoding process on the low-frequency band code stream to obtain a low-frequency spectrum coefficient and obtain decoded low-frequency band data;
generating a high-frequency spectrum coefficient according to the low-frequency spectrum coefficient by using a pre-trained generating network, decoding the high-frequency band code stream to obtain a high-frequency low-frequency spectrum envelope ratio, correcting the high-frequency spectrum coefficient by using the high-frequency low-frequency spectrum envelope ratio, and performing inverse transformation to obtain high-frequency band data;
synthesizing the decoded low-frequency band data and the decoded high-frequency band data through a quadrature mirror synthesis filter to obtain a decoding result corresponding to the coded audio,
wherein the encoding of the high-frequency band audio data by using the standard sampling rate according to the low-frequency spectrum envelope and the frequency domain spectrum coefficient corresponding to the high-frequency band audio data to obtain a high-frequency band code stream comprises:
acquiring frequency domain spectral coefficients of the high-frequency band audio data, and calculating to obtain a high-frequency spectrum envelope corresponding to the high-frequency band audio data according to the frequency domain spectral coefficients;
and calculating according to the high-frequency spectrum envelope and the low-frequency spectrum envelope to obtain a high-frequency low-frequency spectrum envelope ratio, and performing quantization and standard LC3 encoding processes on the high-frequency low-frequency spectrum envelope ratio to obtain the high-frequency band code stream.
2. The method as claimed in claim 1, wherein the generating of a high-frequency spectrum coefficient according to the low-frequency spectrum coefficient by using the pre-trained generating network, the decoding of the high-frequency band code stream to obtain a high-frequency low-frequency spectrum envelope ratio, the correcting of the high-frequency spectrum coefficient by using the high-frequency low-frequency spectrum envelope ratio, and the performing of inverse transformation to obtain high-frequency band data comprise:
processing the low-frequency spectral coefficient by using the pre-trained generation network to obtain a corresponding high-frequency spectral coefficient;
utilizing the high-frequency and low-frequency spectrum envelope ratio to correct the high-frequency spectrum coefficient to obtain a corrected high-frequency spectrum coefficient;
and carrying out low-delay improved inverse discrete cosine transform on the corrected high-frequency spectrum coefficient to obtain the high-frequency band data.
3. The generation countermeasure network-based high resolution audio codec method according to claim 1, wherein the pre-training process for the generation network includes:
filtering the input audio signal through a quadrature mirror image analysis filter to obtain a low-frequency band signal and a high-frequency band signal;
performing low-delay improved discrete cosine transform on the low-frequency band signal to obtain a low-frequency spectrum envelope, and inputting a result of the low-delay improved discrete cosine transform into a generating network to obtain a predicted high-frequency spectrum coefficient;
performing the low-delay improved discrete cosine transform on the high-frequency band signal to obtain a high-frequency spectrum envelope and an original high-frequency spectrum coefficient;
adjusting the predicted high-frequency spectrum coefficient by using the low-frequency spectrum envelope and the high-frequency spectrum envelope to obtain an updated predicted high-frequency spectrum coefficient;
and comparing the original high-frequency spectrum coefficient with the updated prediction high-frequency spectrum coefficient by using a discrimination network, and optimizing the generated network according to the comparison result to obtain the pre-trained generated network.
4. A high-resolution audio codec system based on a generative countermeasure network, comprising:
a quadrature mirror analysis filter for filtering the input encoded audio at an audio transmitting end to obtain low-band audio data and high-band audio data;
the low-frequency band coding module is used for carrying out standard LC3 coding on the low-frequency band audio data by using a standard sampling rate to obtain a low-frequency band code stream and simultaneously obtain a low-frequency spectrum envelope;
the high-frequency band encoding module is used for encoding the high-frequency band audio data by using the standard sampling rate according to the low-frequency spectrum envelope and the frequency domain spectral coefficient corresponding to the high-frequency band audio data to obtain a high-frequency band code stream;
the low-frequency band decoding module is used for performing a standard LC3 decoding process on the received low-frequency band code stream at an audio receiving end to obtain a low-frequency spectrum coefficient and obtain decoded low-frequency band data;
the high-frequency band processing module is used for generating a high-frequency spectrum coefficient according to the low-frequency spectrum coefficient by utilizing a pre-trained generating network, decoding the high-frequency band code stream to obtain a high-frequency low-frequency spectrum envelope ratio, correcting the high-frequency spectrum coefficient by utilizing the high-frequency low-frequency spectrum envelope ratio and performing inverse transformation to obtain high-frequency band data;
a quadrature mirror synthesis filter for synthesizing the decoded low-band data and the decoded high-band data to obtain a decoding result corresponding to the encoded audio, wherein
In the high-band encoding module, the encoding the high-band audio data by using the standard sampling rate according to the low-band spectral envelope and the frequency-domain spectral coefficient corresponding to the high-band audio data to obtain a high-band code stream, including:
acquiring frequency domain spectral coefficients of the high-frequency band audio data, and calculating to obtain a high-frequency spectrum envelope corresponding to the high-frequency band audio data according to the frequency domain spectral coefficients;
and calculating according to the high-frequency spectrum envelope and the low-frequency spectrum envelope to obtain a high-frequency low-frequency spectrum envelope ratio, and performing quantization and standard LC3 encoding processes on the high-frequency low-frequency spectrum envelope ratio to obtain the high-frequency band code stream.
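The last two steps of claim 4 can be sketched as follows. The claim does not define how the envelope is computed, in which direction the ratio is taken, or how it is quantized, so everything concrete here is an assumption: the envelope is taken as the RMS of equally sized coefficient bands, the ratio is high over low expressed in dB, and the quantizer is a uniform scalar quantizer with a hypothetical 1.5 dB step. The subsequent LC3 coding of the quantized ratio is omitted.

```python
import math


def band_envelope(coeffs, num_bands):
    """Per-band RMS of spectral coefficients (assumed envelope definition)."""
    if num_bands <= 0 or len(coeffs) % num_bands:
        raise ValueError("coefficient count must be a multiple of num_bands")
    size = len(coeffs) // num_bands
    envelope = []
    for b in range(num_bands):
        band = coeffs[b * size:(b + 1) * size]
        envelope.append(math.sqrt(sum(c * c for c in band) / size))
    return envelope


def envelope_ratio_db(high_env, low_env, floor=1e-9):
    """High-to-low envelope ratio per band, in dB (assumed direction and unit)."""
    return [
        20.0 * math.log10(max(h, floor) / max(l, floor))
        for h, l in zip(high_env, low_env)
    ]


def quantize_ratio(ratio_db, step_db=1.5):
    """Uniform scalar quantization to integer indices (hypothetical step size)."""
    return [round(r / step_db) for r in ratio_db]


def dequantize_ratio(indices, step_db=1.5):
    """Map integer indices back to dB values on the receiving side."""
    return [i * step_db for i in indices]
```

Only the integer indices would need to be transmitted for the high band in this sketch, which is why the high-band code stream can be much smaller than one carrying the high-frequency coefficients themselves.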
5. The high-resolution audio codec system based on a generative adversarial network of claim 4, wherein, in the high-frequency band processing module: the low-frequency spectral coefficients are processed by the pre-trained generator network to obtain corresponding high-frequency spectral coefficients; the high-frequency spectral coefficients are corrected by using the high-frequency to low-frequency spectral envelope ratio to obtain corrected high-frequency spectral coefficients; and a low-delay inverse modified discrete cosine transform is performed on the corrected high-frequency spectral coefficients to obtain the high-frequency band data.
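One plausible reading of the correction step in claim 5 is a per-band gain that makes the envelope of the generated coefficients match the envelope implied by the transmitted ratio. The sketch below follows that reading under stated assumptions: the envelope is per-band RMS, the ratio is high over low in dB, and the generator output is simply passed in as a list (`predicted`), since the network itself is not described in the claims. The function names are hypothetical.

```python
import math


def band_rms(coeffs, num_bands):
    """Per-band RMS of spectral coefficients (assumed envelope definition)."""
    if num_bands <= 0 or len(coeffs) % num_bands:
        raise ValueError("coefficient count must be a multiple of num_bands")
    size = len(coeffs) // num_bands
    return [
        math.sqrt(sum(c * c for c in coeffs[b * size:(b + 1) * size]) / size)
        for b in range(num_bands)
    ]


def correct_high_band(predicted, low_env, ratio_db, floor=1e-9):
    """Rescale generated high-band coefficients band by band.

    predicted: high-frequency coefficients produced by the generator (stub input)
    low_env:   low-frequency envelope recomputed at the decoder
    ratio_db:  dequantized high-to-low envelope ratio per band, in dB (assumed)
    """
    if len(low_env) != len(ratio_db):
        raise ValueError("low_env and ratio_db must have one entry per band")
    num_bands = len(low_env)
    size = len(predicted) // num_bands
    predicted_env = band_rms(predicted, num_bands)
    corrected = []
    for b in range(num_bands):
        # Envelope the high band should have according to the transmitted ratio.
        target = low_env[b] * 10.0 ** (ratio_db[b] / 20.0)
        gain = target / max(predicted_env[b], floor)
        corrected.extend(c * gain for c in predicted[b * size:(b + 1) * size])
    return corrected
```

The gain changes only the magnitude of each band; the sign and fine structure of the generated coefficients are left as the generator produced them. The corrected coefficients would then go through the inverse transform.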
6. The high-resolution audio codec system based on a generative adversarial network of claim 4, wherein the pre-training process of the generator network comprises:
filtering the input audio signal through a quadrature mirror analysis filter to obtain a low-frequency band signal and a high-frequency band signal;
performing a low-delay modified discrete cosine transform (LD-MDCT) on the low-frequency band signal to obtain a low-frequency spectral envelope, and inputting the result of the low-delay modified discrete cosine transform into a generator network to obtain predicted high-frequency spectral coefficients;
performing the low-delay modified discrete cosine transform on the high-frequency band signal to obtain a high-frequency spectral envelope and original high-frequency spectral coefficients;
adjusting the predicted high-frequency spectral coefficients by using the low-frequency spectral envelope and the high-frequency spectral envelope to obtain updated predicted high-frequency spectral coefficients;
and comparing the original high-frequency spectral coefficients with the updated predicted high-frequency spectral coefficients by using a discriminator network, and optimizing the generator network according to the comparison result to obtain the pre-trained generator network.
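The transform steps above rely on the low-delay MDCT used by LC3. The sketch below is not that transform: it is the textbook MDCT with a symmetric sine window, shown only to illustrate how a frame of 2N samples maps to N spectral coefficients and how overlap-add of inverse-transformed frames restores the signal. LC3 specifies its own low-delay window, which is not reproduced here, and the function names are hypothetical.

```python
import math


def sine_window(n_coef):
    """Symmetric sine window of length 2N (satisfies the Princen-Bradley condition)."""
    length = 2 * n_coef
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(frame, window):
    """Windowed MDCT: 2N time samples to N spectral coefficients."""
    n_coef = len(frame) // 2
    shift = 0.5 + n_coef / 2.0
    return [
        sum(
            window[n] * frame[n]
            * math.cos(math.pi / n_coef * (n + shift) * (k + 0.5))
            for n in range(2 * n_coef)
        )
        for k in range(n_coef)
    ]


def imdct(coeffs, window):
    """Windowed inverse MDCT: N coefficients to 2N time-aliased samples."""
    n_coef = len(coeffs)
    shift = 0.5 + n_coef / 2.0
    return [
        2.0 / n_coef * window[n] * sum(
            coeffs[k] * math.cos(math.pi / n_coef * (n + shift) * (k + 0.5))
            for k in range(n_coef)
        )
        for n in range(2 * n_coef)
    ]


def analysis_synthesis(signal, n_coef):
    """Transform 50%-overlapping frames and overlap-add the inverse transforms."""
    window = sine_window(n_coef)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n_coef + 1, n_coef):
        frame = signal[start:start + 2 * n_coef]
        rebuilt = imdct(mdct(frame, window), window)
        for n, value in enumerate(rebuilt):
            out[start + n] += value
    return out
```

Samples covered by two overlapping frames are reconstructed up to floating-point rounding, because the time-domain aliasing of adjacent frames cancels. The first and last N samples are covered by only one frame and are therefore not reconstructed in this sketch.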
7. A computer-readable storage medium storing computer instructions, wherein the computer instructions are operable to perform the high-resolution audio codec method based on a generative adversarial network of any one of claims 1 to 3.
8. A computer device comprising a processor and a memory, the memory storing computer instructions, wherein the processor executes the computer instructions to perform the high-resolution audio codec method based on a generative adversarial network of any one of claims 1 to 3.
CN202210463201.XA 2022-04-29 2022-04-29 High-resolution audio coding and decoding method and system based on generation countermeasure network Active CN114582361B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210463201.XA CN114582361B (en) 2022-04-29 2022-04-29 High-resolution audio coding and decoding method and system based on generation countermeasure network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210463201.XA CN114582361B (en) 2022-04-29 2022-04-29 High-resolution audio coding and decoding method and system based on generation countermeasure network

Publications (2)

Publication Number Publication Date
CN114582361A CN114582361A (en) 2022-06-03
CN114582361B CN114582361B (en) 2022-07-08

Family

ID=81784117

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210463201.XA Active CN114582361B (en) 2022-04-29 2022-04-29 High-resolution audio coding and decoding method and system based on generation countermeasure network

Country Status (1)

Country Link
CN (1) CN114582361B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114863940B (en) * 2022-07-05 2022-09-30 北京百瑞互联技术有限公司 Model training method for voice quality conversion, method, device and medium for improving voice quality

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101140759A (en) * 2006-09-08 2008-03-12 华为技术有限公司 Band-width spreading method and system for voice or audio signal
CN103971693A (en) * 2013-01-29 2014-08-06 华为技术有限公司 Forecasting method for high-frequency band signal, encoding device and decoding device
AU2014283196A1 (en) * 2013-06-21 2016-02-11 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for generating an adaptive spectral shape of comfort noise
CN107945811A (en) * 2017-10-23 2018-04-20 北京大学 A kind of production towards bandspreading resists network training method and audio coding, coding/decoding method
CN111429926A (en) * 2020-03-24 2020-07-17 北京百瑞互联技术有限公司 Method and device for optimizing audio coding speed
CN111768793A (en) * 2020-07-11 2020-10-13 北京百瑞互联技术有限公司 LC3 audio encoder coding optimization method, system and storage medium
CN112309408A (en) * 2020-11-10 2021-02-02 北京百瑞互联技术有限公司 Method, device and storage medium for expanding LC3 audio encoding and decoding bandwidth
CN112767954A (en) * 2020-06-24 2021-05-07 腾讯科技(深圳)有限公司 Audio encoding and decoding method, device, medium and electronic equipment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2830063A1 (en) * 2013-07-22 2015-01-28 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus, method and computer program for decoding an encoded audio signal
KR102002681B1 (en) * 2017-06-27 2019-07-23 한양대학교 산학협력단 Bandwidth extension based on generative adversarial networks

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
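A Parallel-Data-Free Speech Enhancement Method Using Multi-Objective Learning Cycle-Consistent Generative Adversarial Network; Yang Xiang, et al.; IEEE/ACM Transactions on Audio, Speech, and Language Processing; IEEE; 20200529; Vol. 28; full text *
Research on Audio Bandwidth Extension Coding Based on a Nonlinear Mapping Model; 姜林; China Doctoral Dissertations Full-text Database, Information Science and Technology; China Academic Journals (CD Edition) Electronic Publishing House; 20200115 (No. 1); full text *
Research on Key Technologies of High-Fidelity Low-Bit-Rate Audio Coding; 郭庆巍; China Master's Theses Full-text Database, Information Science and Technology; 20090115 (No. 1); full text *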

Also Published As

Publication number Publication date
CN114582361A (en) 2022-06-03

Similar Documents

Publication Publication Date Title
US8554551B2 (en) Systems, methods, and apparatus for context replacement by audio level
US7933770B2 (en) Method and device for coding audio data based on vector quantisation
MXPA06010825A (en) Coding of audio signals.
EP2038883B1 (en) Vocoder and associated method that transcodes between mixed excitation linear prediction (melp) vocoders with different speech frame rates
CN110827842B (en) High-band excitation signal generation
CN101006495A (en) Audio encoding apparatus, audio decoding apparatus, communication apparatus and audio encoding method
KR101770237B1 (en) Method, apparatus, and system for processing audio data
TWI332193B (en) Method and apparatus of processing time-varying signals coding and decoding and computer program product
RU2636685C2 (en) Decision on presence/absence of vocalization for speech processing
CN107787510A (en) High-frequency band signals produce
WO2011127832A1 (en) Time/frequency two dimension post-processing
CN107112027B (en) The bi-directional scaling of gain shape circuit
WO2015154397A1 (en) Noise signal processing and generation method, encoder/decoder and encoding/decoding system
CN107743644A (en) High-frequency band signals produce
CN114550732B (en) Coding and decoding method and related device for high-frequency audio signal
EP2831875A1 (en) Bandwidth extension of harmonic audio signal
CN114582361B (en) High-resolution audio coding and decoding method and system based on generation countermeasure network
CN111986685A (en) Audio coding and decoding method and system for realizing high sampling rate
JP2015537254A (en) Encoding method, decoding method, encoding device, and decoding device
US8473286B2 (en) Noise feedback coding system and method for providing generalized noise shaping within a simple filter structure
JP2018511086A (en) Audio encoder and method for encoding an audio signal
JP2000099095A (en) Device and method for filtering voice signal, handset and telephone communication system
CN114863942B (en) Model training method for voice quality conversion, method and device for improving voice quality
CN116110424A (en) Voice bandwidth expansion method and related device
CN116631418A (en) Speech coding method, speech decoding method, speech coding device, speech decoding device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: A1009, floor 9, block a, No. 9, Shangdi Third Street, Haidian District, Beijing 100085

Patentee after: Beijing Bairui Internet Technology Co.,Ltd.

Address before: A1009, floor 9, block a, No. 9, Shangdi Third Street, Haidian District, Beijing 100085

Patentee before: BARROT WIRELESS Co.,Ltd.
