CN112037803A - Audio encoding method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN112037803A
Authority
CN
China
Prior art keywords
audio segment
audio
active audio
granularity
active
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010383119.7A
Other languages
Chinese (zh)
Other versions
CN112037803B (en)
Inventor
闫玉凤
肖全之
黄荣均
方桂萍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhuhai Jieli Technology Co Ltd
Original Assignee
Zhuhai Jieli Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhuhai Jieli Technology Co Ltd
Priority to CN202010383119.7A
Publication of CN112037803A
Application granted
Publication of CN112037803B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16 Vocoder architecture
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals

Abstract

The invention provides an audio coding method and device, electronic equipment and a storage medium. The method comprises the following steps: performing voice endpoint detection processing on the audio data to be coded so as to segment the audio data into active audio segments and inactive audio segments; for each active audio segment, calculating the granularity average energy of the active audio segment from the energy value of each sub-band in each granularity; determining the coding rate of each active audio segment according to its granularity average energy, wherein the coding rate of an active audio segment is positively correlated with its granularity average energy; coding each active audio segment at its coding rate; and coding the inactive audio segments obtained by dividing the audio data to be coded, wherein the coding rate of each active audio segment is greater than that of each inactive audio segment. The invention helps to improve coding quality and to reduce audio distortion after coding.

Description

Audio encoding method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of audio coding technologies, and in particular, to an audio coding method and apparatus, an electronic device, and a storage medium.
Background
At present, to facilitate network transmission and storage of audio, an audio coding technique is usually used to convert original audio data into compressed data. Because the compressed data is smaller, it saves storage space and reduces the network bandwidth required for transmission.
Disclosure of Invention
Based on the above situation, it is a primary objective of the present invention to provide an audio encoding method and apparatus, an electronic device, and a storage medium, which are beneficial to reducing audio distortion after encoding.
In order to achieve the above object, an embodiment of the present invention provides an audio encoding method, including:
step S1: carrying out voice endpoint detection processing on audio data to be coded so as to segment active audio segments and inactive audio segments in the audio data to be coded to obtain a plurality of audio segments;
step S2: partitioning each active audio segment to obtain a plurality of granularities, performing sub-band decomposition on each granularity, calculating the energy value of each sub-band in each granularity, and calculating the granularity average energy of each active audio segment by using the energy value of each sub-band in each granularity;
step S3: determining the coding rate of each active audio segment according to the granularity average energy of each active audio segment, wherein the coding rate of the active audio segment is positively correlated with the granularity average energy of the active audio segment;
step S4: for each active audio segment, carrying out audio coding on the active audio segment according to the coding rate of the active audio segment;
step S5: and coding the inactive audio segments obtained by dividing the audio data to be coded, wherein the coding rate of each active audio segment is greater than that of each inactive audio segment.
Further, step S2 includes:
step S21: performing block processing on the kth active audio segment obtained by dividing the audio data to be coded to obtain a plurality of granularities, wherein k = 1, 2, 3, …, L, and L is the number of active audio segments obtained by dividing the audio data to be coded;
step S22: performing a subband decomposition operation on each granularity of the kth active audio segment, and then calculating an energy value of each subband of each granularity of the kth active audio segment;
[Equation image BDA0002482948320000021, which defines W_(k,i)[sb] in terms of SP_(k,i)[sb][j], Z and a, is not reproduced in this text]
wherein W_(k,i)[sb] is the energy value of the sb-th sub-band in the i-th granularity of the kth active audio segment, SP_(k,i)[sb][j] is the spectral value of the j-th frequency line of the sb-th sub-band in the i-th granularity of the kth active audio segment, sb denotes the sub-band number, sb = 1, 2, 3, …, N, N is the number of sub-bands in each granularity, j denotes the frequency line number, Z is the number of frequency lines in each sub-band, and a is a preset value greater than 1;
step S23: calculating an energy distribution value of the kth active audio segment on each sub-band;
$$D_k[sb] = \sum_{i=1}^{\mathit{grs}_k} W_{(k,i)}[sb]$$
wherein D_k[sb] is the energy distribution value of the kth active audio segment on the sb-th sub-band, and grs_k is the number of granularities obtained after the kth active audio segment is processed in blocks;
step S24: determining the granularity average energy EDS_k of the kth active audio segment:
$$EDS_k = \frac{1}{\mathit{grs}_k} \sum_{sb=1}^{N} D_k[sb]$$
Further, the determining a coding rate for each of the active audio segments based on the granular average energy of each of the active audio segments comprises:
acquiring the total target coding rate of the audio data to be coded;
and calculating the coding rate of each active audio segment according to the overall target coding rate of the audio data to be coded, the coding rate of each inactive audio segment and the granularity average energy of each active audio segment.
Further, the coding rate of the active audio segment is positively correlated with the granularity average energy of the active audio segment, including:
the ratio between the coding rates of the active audio segments is consistent with the ratio between the granularity average energy of the active audio segments.
Further, the coding rate of the inactive audio segment is the lowest coding rate supported by the coding format corresponding to the audio coding method.
In order to achieve the above object, an embodiment of the present invention further provides an audio encoding apparatus, including:
the voice endpoint detection processing module is used for carrying out voice endpoint detection processing on the audio data to be coded so as to divide an active audio segment and an inactive audio segment in the audio data to be coded to obtain a plurality of audio segments;
the calculating module is used for carrying out block processing on each active audio segment to obtain a plurality of granularities, carrying out sub-band decomposition on each granularity and calculating the energy value of each sub-band in each granularity, and then calculating the granularity average energy of each active audio segment by using the energy value of each sub-band in each granularity;
the code rate determining module is used for determining the coding rate of each active audio segment according to the granularity average energy of each active audio segment, wherein the coding rate of the active audio segment is positively correlated with the granularity average energy of the active audio segment;
the first coding processing module is used for carrying out audio coding on each active audio segment according to the coding rate of the active audio segment;
and the second coding processing module is used for coding the inactive audio segments obtained by dividing the audio data to be coded, and the coding rate of each active audio segment is greater than that of each inactive audio segment.
In order to achieve the above object, the present invention further provides an audio encoding apparatus, including a processor and a memory coupled to the processor, where the memory stores instructions for the processor to execute, and when the processor executes the instructions, the audio encoding method according to the above can be implemented.
In order to achieve the above object, the present invention further provides an electronic device, including the audio encoding apparatus.
Furthermore, the electronic device is a speaker, a voice recorder, a mobile phone, a tablet, a notebook computer, a desktop computer or an electronic toy.
To achieve the above object, the present invention further provides a computer-readable storage medium storing a computer program, which when executed by a processor implements the above audio encoding method.
The audio coding method provided by the invention performs voice endpoint detection processing on the audio data to be coded and divides it into active audio segments and inactive audio segments. An active audio segment with larger granularity average energy is given a larger coding rate, an active audio segment with smaller granularity average energy is given a smaller coding rate, and the coding rate of each active audio segment is greater than that of each inactive audio segment. As a result, the coding quality can be effectively improved and the audio distortion after coding is reduced.
Drawings
The above and other objects, features and advantages of the present invention will become more apparent from the following description of the embodiments of the present invention with reference to the accompanying drawings, in which:
fig. 1 is a flowchart of an audio encoding method according to an embodiment of the present invention.
Detailed Description
The present invention will be described below based on examples, but the present invention is not limited to only these examples. In the following detailed description of the present invention, certain specific details are set forth. To avoid obscuring the essence of the present invention, well-known methods, procedures, and components are not described in detail.
Further, those of ordinary skill in the art will appreciate that the drawings provided herein are for illustrative purposes and are not necessarily drawn to scale.
Unless the context clearly requires otherwise, throughout the description and the claims, the words "comprise", "comprising", and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is, what is meant is "including, but not limited to".
In the description of the present invention, it is to be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. In addition, in the description of the present invention, "a plurality" means two or more unless otherwise specified.
Referring to fig. 1, fig. 1 is a flowchart of an audio encoding method according to an embodiment of the present invention, where the audio encoding method includes:
step S1: carrying out voice endpoint detection processing on audio data to be coded so as to segment active audio segments and inactive audio segments in the audio data to be coded to obtain a plurality of audio segments;
the beginning and the end of speech in the audio data to be encoded can be detected through voice endpoint detection (VAD) processing. For example, in this step, a VAD algorithm based on a threshold, a statistical model or machine learning (such as a neural network) can be used to perform voice endpoint detection processing on the audio data to be encoded;
for example, an active audio segment may be an audio segment whose audio characteristics satisfy a preset condition, and an inactive audio segment may be one whose audio characteristics do not satisfy it. The preset condition may be set according to specific requirements, and the audio characteristics may include one or more of energy characteristics, spectral characteristics, harmonic characteristics, sub-band signal-to-noise ratio, and zero-crossing rate. That is, the VAD method analyzes the audio characteristics extracted from the audio data to be encoded in order to decide which segments are active and which are inactive;
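As an illustration of the threshold-based option mentioned above, the following is a minimal sketch of an energy-threshold decision. The function names, the 384-sample frame length and the -40 dB threshold are assumptions chosen for the example; the source does not prescribe a particular VAD algorithm or preset condition.

```python
import math

def frame_energy_db(frame):
    """Mean-square energy of one frame in dB (the floor avoids log of zero)."""
    mean_sq = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(max(mean_sq, 1e-12))

def segment_by_energy(samples, frame_len=384, threshold_db=-40.0):
    """Split samples into (start, end, is_active) runs of whole frames.

    frame_len and threshold_db are illustrative assumptions, not values
    from the source; trailing samples that do not fill a frame are dropped.
    """
    segments = []
    for f in range(len(samples) // frame_len):
        start = f * frame_len
        frame = samples[start:start + frame_len]
        active = frame_energy_db(frame) > threshold_db
        if segments and segments[-1][2] == active:
            # Same decision as the previous frame: extend the current run.
            segments[-1] = (segments[-1][0], start + frame_len, active)
        else:
            segments.append((start, start + frame_len, active))
    return segments
```

Because decisions are made on whole frames, every resulting segment length is a multiple of the frame length.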
step S2: partitioning each active audio segment to obtain a plurality of granularities, performing sub-band decomposition on each granularity, calculating the energy value of each sub-band in each granularity, and calculating the granularity average energy of each active audio segment by using the energy value of each sub-band in each granularity;
wherein, for each active audio segment, the granularity average energy is the average of the energy of each granularity;
step S3: determining the coding rate of each active audio segment according to the granularity average energy of each active audio segment, wherein the coding rate of the active audio segment is positively correlated with the granularity average energy of the active audio segment;
that is, for any two of the active audio segments, the coding rate of the active audio segment with larger granularity average energy is greater than the coding rate of the active audio segment with smaller granularity average energy;
step S4: for each active audio segment, carrying out audio coding on the active audio segment according to the coding rate of the active audio segment;
step S5: and coding the inactive audio segments obtained by dividing the audio data to be coded, wherein the coding rate of each active audio segment is greater than that of each inactive audio segment.
The audio coding method provided by the embodiment of the invention performs voice endpoint detection processing on the audio data to be coded and segments it into active audio segments and inactive audio segments. A larger coding rate is set for an active audio segment with larger granularity average energy, a smaller coding rate is set for an active audio segment with smaller granularity average energy, and the coding rate of each active audio segment is greater than that of each inactive audio segment. As a result, the coding quality can be effectively improved and the audio distortion after coding is reduced.
For example, in an embodiment, the step S2 may specifically include:
step S21: performing block processing on the kth active audio segment obtained by dividing the audio data to be coded to obtain a plurality of granularities, wherein k = 1, 2, 3, …, L, and L is the number of active audio segments obtained by dividing the audio data to be coded;
step S22: performing a subband decomposition operation on each granularity of the kth active audio segment, and then calculating an energy value of each subband of each granularity of the kth active audio segment;
[Equation image BDA0002482948320000051, which defines W_(k,i)[sb] in terms of SP_(k,i)[sb][j], Z and a, is not reproduced in this text]
wherein W_(k,i)[sb] is the energy value of the sb-th sub-band in the i-th granularity of the kth active audio segment, SP_(k,i)[sb][j] is the spectral value of the j-th frequency line of the sb-th sub-band in the i-th granularity of the kth active audio segment, sb denotes the sub-band number, sb = 1, 2, 3, …, N, N is the number of sub-bands in each granularity, j denotes the frequency line number, Z is the number of frequency lines in each sub-band, and a is a preset value greater than 1, e.g., a is 2, e or 10;
step S23: calculating an energy distribution value of the kth active audio segment on each sub-band;
$$D_k[sb] = \sum_{i=1}^{\mathit{grs}_k} W_{(k,i)}[sb]$$
wherein D_k[sb] is the energy distribution value of the kth active audio segment on the sb-th sub-band, and grs_k is the number of granularities obtained after the kth active audio segment is processed in blocks;
step S24: determining the granularity average energy EDS_k of the kth active audio segment:
$$EDS_k = \frac{1}{\mathit{grs}_k} \sum_{sb=1}^{N} D_k[sb]$$
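Steps S21 to S24 can be sketched as follows. Note the main assumption: the formula image for W is not available in this text, so the sketch assumes W is the base-a logarithm of the summed squared spectral values of a sub-band. That choice is only a guess consistent with "a is a preset value greater than 1" and should be replaced by the formula in the original drawings. The function names and the small floor inside the logarithm are also hypothetical.

```python
import math

def subband_energies(granule, a=10.0):
    """W[sb] for one granularity; granule[sb][j] is a spectral value.

    ASSUMPTION: W = log_a(sum of squared spectral values). The source's
    formula image is unavailable, so this form is illustrative only.
    """
    return [math.log(max(sum(v * v for v in lines), 1e-12), a)
            for lines in granule]

def segment_energy_stats(granules, a=10.0):
    """Return (D, EDF, EDS) for one active audio segment.

    D[sb]   = sum over granularities of W[sb]      (step S23)
    EDF[sb] = D[sb] / grs                          (per-sub-band expectation)
    EDS     = sum over sub-bands of EDF[sb]        (step S24)
    """
    grs = len(granules)
    n_subbands = len(granules[0])
    w = [subband_energies(g, a) for g in granules]
    d = [sum(w[i][sb] for i in range(grs)) for sb in range(n_subbands)]
    edf = [x / grs for x in d]
    eds = sum(edf)
    return d, edf, eds
```

For MP2, each element of `granules` would hold 32 sub-bands of 12 frequency lines each.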
The audio coding method in the embodiment of the present invention may be applied to an MP1 (MPEG-1/2/2.5 Audio Layer-1) coding process, an MP2 (MPEG-1/2/2.5 Audio Layer-2) coding process, an MP3 (MPEG-1/2/2.5 Audio Layer-3) coding process, or the coding process of other transform coding formats;
preferably, in an embodiment, the coding rate of the inactive audio segment may be set to the lowest coding rate that can be supported by the coding format corresponding to the audio coding method, for example, in an embodiment, the audio coding method is the coding method of the MP2 coding format, and if the sampling rate of the audio data to be coded is greater than or equal to 32kHz, the coding rate of each inactive audio segment is 32 kbps; and if the sampling rate of the audio data to be coded is less than 32kHz, the coding rate of each inactive audio segment is 8 kbps.
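The MP2 rule in this embodiment amounts to a single comparison. A minimal sketch follows; the function name is hypothetical and the two rates are the ones stated in the embodiment.

```python
def mp2_inactive_bitrate_kbps(sample_rate_hz):
    """Coding rate for inactive segments in the MP2 embodiment.

    32 kbps when the sampling rate is at least 32 kHz, otherwise 8 kbps,
    as stated in the text above.
    """
    return 32 if sample_rate_hz >= 32000 else 8
```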
In this embodiment, a larger coding rate is set for an active audio segment with larger granularity average energy, and a smaller coding rate is set for an active audio segment with smaller granularity average energy. That is, more bits are allocated to active audio segments with larger granularity average energy, fewer bits are allocated to active audio segments with smaller granularity average energy, and the coding rate of each active audio segment is made greater than that of each inactive audio segment. The coding quality can therefore be effectively improved at the same compression rate, and a higher compression rate can be obtained at the same coding quality.
For example, in an embodiment, determining the coding rate of each active audio segment according to the granularity average energy of each active audio segment may specifically include:
acquiring the total target coding rate of the audio data to be coded;
and calculating the coding rate of each active audio segment according to the overall target coding rate of the audio data to be coded, the coding rate of each inactive audio segment and the granularity average energy of each active audio segment.
For example, in one embodiment, the coding rate of each inactive audio segment is set to the lowest coding rate supported by the coding format in use. The coding rate of each active audio segment is then calculated from the overall target coding rate of the audio data to be encoded and the ratio between the granularity average energies of the active audio segments, subject to two conditions: the ratio between the coding rates of the active audio segments is consistent with the ratio between their granularity average energies, and the sum of the numbers of bits of all audio segments equals the number of bits of the audio data to be encoded (the product of the overall target coding rate and the audio duration of the audio data to be encoded).
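The two conditions in this embodiment (rates proportional to granularity average energy, and a fixed total bit budget) determine the active rates up to rounding. The sketch below solves them for a list of segments. The function name and the tuple layout are assumptions made for the example, and it does not round to the discrete rates a real format such as MP2 supports or cap rates at the format's maximum.

```python
def allocate_rates(total_rate_kbps, inactive_rate_kbps, segments):
    """Return one coding rate (kbps) per segment.

    segments: list of (duration_s, is_active, eds) tuples (hypothetical
    layout). Active rates are proportional to eds, and the bits of all
    segments add up to total_rate_kbps * total duration.
    """
    total_dur = sum(dur for dur, _, _ in segments)
    inactive_dur = sum(dur for dur, active, _ in segments if not active)
    # Bits (in kbit) left for the active segments after the inactive ones.
    budget = total_rate_kbps * total_dur - inactive_rate_kbps * inactive_dur
    weight = sum(dur * eds for dur, active, eds in segments if active)
    if budget <= 0 or weight <= 0:
        raise ValueError("no positive bit budget or energy for active segments")
    scale = budget / weight
    return [scale * eds if active else inactive_rate_kbps
            for _, active, eds in segments]
```

With a 128 kbps target, a 32 kbps inactive rate, and segments of 2 s (active, EDS 30), 1 s (inactive) and 1 s (active, EDS 60), the active segments receive 120 kbps and 240 kbps, which keeps the 1:2 energy ratio and uses exactly 512 kbit in total.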
For example, the audio encoding method provided by the embodiment of the present invention may be applied to encoding in an MP2 encoding format, and the specific process steps include the following steps:
step A: performing voice endpoint detection (VAD) processing on the audio data to be coded, and segmenting it into active audio segments and inactive audio segments (near-silent segments with low energy), wherein the signal length of each active audio segment is an integer multiple of 384 sampling points; for example, two active audio segments and one inactive audio segment are obtained after the audio data to be coded is segmented;
the 1 st active audio segment is subjected to block processing to obtain a plurality of granularities, each block of 384 sampling points is used as one granularity to carry out the sub-band decomposition operation of MP2 coding to obtain 32 sub-bands, and each sub-band comprises 12 frequency lines, namely N is 32, and Z is 12;
step B: performing a subband decomposition operation on each granularity of the 1st active audio segment, and then calculating the energy value of each subband of each granularity of the 1st active audio segment;
[Equation image BDA0002482948320000071, which defines W_(1,i)[sb] in terms of SP_(1,i)[sb][j], is not reproduced in this text]
wherein W_(1,i)[sb] is the energy value of the sb-th sub-band in the i-th granularity of the 1st active audio segment, and SP_(1,i)[sb][j] is the spectral value of the j-th frequency line of the sb-th sub-band in the i-th granularity of the 1st active audio segment;
step C: calculating an energy distribution value of the 1st active audio segment on each sub-band;
$$D_1[sb] = \sum_{i=1}^{\mathit{grs}_1} W_{(1,i)}[sb]$$
for example, if the 1st active audio segment has 36 granularities in total (i.e., grs_1 = 36), then D_1[1] is the sum of the energy values of the 1st sub-band over all granularities, W_(1,1)[1] + W_(1,2)[1] + … + W_(1,36)[1]; D_1[1], …, D_1[32] are calculated in the same way, giving 32 energy distribution values in total;
step D: calculating the expectation of the energy distribution value of the 1st active audio segment on each sub-band, and summing the expectations over the sub-bands;
EDF_1[sb] = D_1[sb] / grs_1;
$$EDS_1 = \sum_{sb=1}^{32} EDF_1[sb]$$
thereby obtaining the information quantity distribution table of the 1st active audio segment, recorded as EDF_1: EDF_1[sb], sb = 1 to 32, and the granularity average energy EDS_1 of the 1st active audio segment,
wherein EDF_k[sb] represents the energy distribution expectation of the kth active audio segment on the sb-th sub-band;
step E: processing the 2nd active audio segment by the same method to obtain the information quantity distribution table EDF_2 and the granularity average energy EDS_2 of the 2nd active audio segment;
setting the coding rate of the inactive audio segment to the lowest coding rate bitrate_min supported by MP2 coding, and then calculating the coding rate of the 1st active audio segment and the coding rate of the 2nd active audio segment according to the ratio of EDS_1 to EDS_2 and the overall target coding rate of the audio data to be coded;
in this way, the bits consumed by inactive audio segments are reduced, the compression ratio is improved, bits are distributed reasonably among the active audio segments, and the coding quality of the active audio segments is effectively improved;
step F: for each audio segment in the plurality of audio segments, performing MP2 audio coding on the audio segment according to the coding rate of the audio segment, for example, each audio segment may be coded by using a psychoacoustic model;
with the audio encoding method provided by this embodiment, the MP2-encoded audio has lower distortion at the same coding rate, and the audio signal can be recovered more faithfully.
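For the two-active-segment example in step E, the rate calculation can be written out explicitly. The symbols below are assumptions introduced for this illustration and do not appear in the source: T_1 and T_2 are the durations of the two active segments, T_0 is the duration of the inactive segment, R is the overall target coding rate, and R_min is bitrate_min.

```latex
% Assumed symbols: T_0 (inactive duration), T_1, T_2 (active durations),
% R (overall target coding rate), R_min (bitrate_min).
\begin{aligned}
R_1 T_1 + R_2 T_2 + R_{\min} T_0 &= R\,(T_0 + T_1 + T_2),
\qquad \frac{R_1}{R_2} = \frac{EDS_1}{EDS_2} \\
\Rightarrow\quad R_k &= EDS_k \cdot
\frac{R\,(T_0 + T_1 + T_2) - R_{\min} T_0}{EDS_1 T_1 + EDS_2 T_2},
\qquad k = 1, 2
\end{aligned}
```

The first line states the bit budget and the proportionality condition; the second solves them for the two active rates.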
In addition, a fixed psychoacoustic model usually allocates bits preferentially to low frequencies, whereas real sounds are of different types and the distribution of information across sub-bands differs between types. With a fixed psychoacoustic model, audio signals that are spectrally rich or contain much high-frequency content therefore tend to suffer severe high-frequency loss. Preferably, in an embodiment, the encoding process of the kth active audio segment includes:
step S41: obtaining a signal masking ratio of each subband in each granularity of the kth active audio segment, and then calculating a bit distribution weight value of each subband in each granularity of the kth active audio segment according to the signal masking ratio of each subband in each granularity of the kth active audio segment and an energy distribution value of the kth active audio segment on each subband;
P_(k,i)[sb] = C * SMR_(k,i)[sb] * D_k[sb];
wherein P_(k,i)[sb] is the bit allocation weight value of the sb-th sub-band in the i-th granularity of the kth active audio segment, SMR_(k,i)[sb] is the signal masking ratio of the sb-th sub-band in the i-th granularity of the kth active audio segment, and C is a preset positive coefficient;
for each granularity, analyzing and processing a frame corresponding to the granularity by adopting a psychoacoustic model to obtain a signal masking table, wherein the signal masking table comprises a signal masking ratio of each sub-band in the granularity;
step S42: for each granularity of the kth active audio segment, carrying out bit distribution on each sub-band according to the bit distribution weight value of each sub-band, wherein the number of bits distributed by the sub-band with the larger bit distribution weight value is larger than that distributed by the sub-band with the smaller bit distribution weight value in any two sub-bands in the same granularity;
for example, for each granularity of the kth active audio segment, the ratio between the numbers of bits allocated to the sub-bands is consistent with the ratio between the bit allocation weight values of the sub-bands.
Step S43: quantizing each sub-band in each granularity of the kth active audio segment according to the number of bits allocated to each sub-band, and performing bit stream encapsulation after quantization;
for example, in step S41, the information quantity distribution table of the kth active audio segment may be normalized as EDF_k[sb] / EDS_k, and then multiplied by N and by the corresponding signal masking ratio in turn:
P_(k,i)[sb] = SMR_(k,i)[sb] * D_k[sb] * N / (grs_k * EDS_k);
that is, for the kth active audio segment, C = N / (grs_k * EDS_k);
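Steps S41 and S42 with the normalized coefficient can be sketched as follows. The function names are hypothetical. The proportional split uses a largest-remainder rounding, which is one possible way to turn weights into whole bits and is not taken from the source; the sketch also assumes all weights are positive, which may not hold if the signal masking ratio is expressed in dB.

```python
def bit_allocation_weights(smr, d, grs, eds):
    """P[sb] = SMR[sb] * D[sb] * N / (grs * EDS) for one granularity.

    smr: per-sub-band signal masking ratios of this granularity,
    d: per-sub-band energy distribution values D_k of the segment.
    """
    c = len(d) / (grs * eds)  # C = N / (grs_k * EDS_k)
    return [c * s * x for s, x in zip(smr, d)]

def distribute_bits(weights, total_bits):
    """Split total_bits across sub-bands in proportion to positive weights.

    Rounding uses the largest-remainder rule (an assumption, not from the
    source) so that the allocated bits always sum to total_bits.
    """
    total_w = sum(weights)
    exact = [total_bits * w / total_w for w in weights]
    bits = [int(e) for e in exact]
    leftover = total_bits - sum(bits)
    order = sorted(range(len(weights)),
                   key=lambda i: exact[i] - bits[i], reverse=True)
    for i in order[:leftover]:
        bits[i] += 1
    return bits
```

A sub-band with a larger weight never receives fewer bits than one with a smaller weight in the same granularity, which matches the ordering required by step S42.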
For example, in an embodiment, each active audio segment obtained by dividing the audio data to be encoded is encoded by using the above steps S41-S43, that is, k is an integer between 1 and L.
In this way, different types of sound signals are taken into account, which helps signal recovery after coding for each type and helps to keep the quantization level stable and the spectrum tracking timely.
An embodiment of the present invention further provides an audio encoding apparatus, including:
the voice endpoint detection processing module is used for carrying out voice endpoint detection processing on the audio data to be coded so as to divide an active audio segment and an inactive audio segment in the audio data to be coded to obtain a plurality of audio segments;
the calculating module is used for carrying out block processing on each active audio segment to obtain a plurality of granularities, carrying out sub-band decomposition on each granularity and calculating the energy value of each sub-band in each granularity, and then calculating the granularity average energy of each active audio segment by using the energy value of each sub-band in each granularity;
the code rate determining module is used for determining the coding rate of each active audio segment according to the granularity average energy of each active audio segment, wherein the coding rate of the active audio segment is positively correlated with the granularity average energy of the active audio segment;
the first coding processing module is used for carrying out audio coding on each active audio segment according to the coding rate of the active audio segment;
and the second coding processing module is used for coding the inactive audio segments obtained by dividing the audio data to be coded, and the coding rate of each active audio segment is greater than that of each inactive audio segment.
An embodiment of the present invention further provides an audio encoding apparatus, including a processor and a memory coupled to the processor, where the memory stores instructions for the processor to execute, and when the processor executes the instructions, the audio encoding method can be implemented.
The audio coding device provided by the embodiment of the invention performs voice endpoint detection processing on the audio data to be coded and divides it into active audio segments and inactive audio segments. An active audio segment with larger granularity average energy is given a larger coding rate, an active audio segment with smaller granularity average energy is given a smaller coding rate, and the coding rate of each active audio segment is greater than that of each inactive audio segment. As a result, the coding quality can be effectively improved and the audio distortion after coding is reduced.
The embodiment of the invention also provides electronic equipment which comprises the audio coding device. For example, the electronic device may be a speaker, a voice recorder, a mobile phone, a tablet, a notebook computer, a desktop computer, or an electronic toy.
An embodiment of the present invention further provides a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the computer program implements the audio encoding method.
It will be appreciated by those skilled in the art that the above-described preferred embodiments may be freely combined and superimposed, provided they do not conflict.
It will be understood that the embodiments described above are illustrative only and not restrictive, and that various obvious and equivalent modifications and substitutions for details described herein may be made by those skilled in the art without departing from the basic principles of the invention.

Claims (10)

1. An audio encoding method, comprising:
step S1: carrying out voice endpoint detection processing on audio data to be coded so as to segment active audio segments and inactive audio segments in the audio data to be coded to obtain a plurality of audio segments;
step S2: partitioning each active audio segment to obtain a plurality of granularities, performing sub-band decomposition on each granularity, calculating the energy value of each sub-band in each granularity, and calculating the granularity average energy of each active audio segment by using the energy value of each sub-band in each granularity;
step S3: determining the coding rate of each active audio segment according to the granularity average energy of each active audio segment, wherein the coding rate of the active audio segment is positively correlated with the granularity average energy of the active audio segment;
step S4: for each active audio segment, carrying out audio coding on the active audio segment according to the coding rate of the active audio segment;
step S5: and coding the inactive audio segments obtained by dividing the audio data to be coded, wherein the coding rate of each active audio segment is greater than that of each inactive audio segment.
2. The method according to claim 1, wherein step S2 includes:
step S21: performing block processing on the k-th active audio segment obtained by dividing the audio data to be coded to obtain a plurality of granularities, wherein k = 1, 2, 3, …, L, and L is the number of active audio segments obtained by dividing the audio data to be coded;
step S22: performing a subband decomposition operation on each granularity of the kth active audio segment, and then calculating an energy value of each subband of each granularity of the kth active audio segment;
Figure FDA0002482948310000011
wherein W_(k,i)[sb] is the energy value of the sb-th sub-band in the i-th granularity of the k-th active audio segment; SP_(k,i)[sb][j] is the spectral value of the j-th frequency line of the sb-th sub-band in the i-th granularity of the k-th active audio segment; sb is the sub-band number, sb = 1, 2, 3, …, N, where N is the number of sub-bands in each granularity; j is the frequency line number; Z is the number of frequency lines in each sub-band; and a is a preset value greater than 1;
step S23: calculating an energy distribution value of the kth active audio segment on each sub-band;
Figure FDA0002482948310000012
wherein D_k[sb] is the energy distribution value of the k-th active audio segment on the sb-th sub-band, and grs_k is the number of granularities obtained after the k-th active audio segment is processed in blocks;
step S24: determining the granularity average energy EDS_k of the k-th active audio segment:
Figure FDA0002482948310000021
3. The method of claim 1, wherein determining the coding rate of each active audio segment according to the granularity average energy of each active audio segment comprises:
acquiring the overall target coding rate of the audio data to be coded;
and calculating the coding rate of each active audio segment according to the overall target coding rate of the audio data to be coded, the coding rate of each inactive audio segment and the granularity average energy of each active audio segment.
4. The method of claim 1, wherein the coding rate of the active audio segment being positively correlated with the granularity average energy of the active audio segment comprises:
the ratio between the coding rates of the active audio segments being consistent with the ratio between the granularity average energies of the active audio segments.
5. The method according to any of claims 1-4, wherein the coding rate of the inactive audio segment is the lowest coding rate supported by the coding format corresponding to the audio coding method.
6. An audio encoding apparatus, comprising:
the voice endpoint detection processing module is used for carrying out voice endpoint detection processing on the audio data to be coded so as to divide an active audio segment and an inactive audio segment in the audio data to be coded to obtain a plurality of audio segments;
the calculating module is used for carrying out block processing on each active audio segment to obtain a plurality of granularities, carrying out sub-band decomposition on each granularity and calculating the energy value of each sub-band in each granularity, and then calculating the granularity average energy of each active audio segment by using the energy value of each sub-band in each granularity;
the code rate determining module is used for determining the coding rate of each active audio segment according to the granularity average energy of each active audio segment, wherein the coding rate of the active audio segment is positively correlated with the granularity average energy of the active audio segment;
the first coding processing module is used for carrying out audio coding on each active audio segment according to the coding rate of the active audio segment;
and the second coding processing module is used for coding the inactive audio segments obtained by dividing the audio data to be coded, and the coding rate of each active audio segment is greater than that of each inactive audio segment.
7. An audio coding device, comprising a processor and a memory coupled to the processor, wherein the memory stores instructions for execution by the processor, and the instructions, when executed by the processor, implement the method according to any one of claims 1 to 5.
8. An electronic device, comprising the audio encoding apparatus as claimed in claim 6 or the audio coding device as claimed in claim 7.
9. The electronic device of claim 8, wherein the electronic device is a sound box, a voice pen, a mobile phone, a smart tablet, a laptop computer, a desktop computer, or an electronic toy.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 5.
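The granularity average energy computation recited in claim 2 can be sketched in Python as follows. The formulas of steps S22 to S24 appear only as figures that are not reproduced in this text, so the sketch does not restate them. It assumes, purely for illustration, that the sub-band energy is a base-a logarithm of one plus the sum of squared spectral values, that the energy distribution value is the mean of the sub-band energies over the granularities, and that the granularity average energy is the mean of the energy distribution values over the sub-bands. The function name `granularity_average_energy` and the parameter `log_base` (standing in for the preset value a) are likewise hypothetical.

```python
import math


def granularity_average_energy(spectra, log_base=10.0):
    """Return (EDS, D) for one active audio segment.

    spectra[i][sb][j] is the spectral value of frequency line j in sub-band sb
    of granularity i. The formulas below are assumptions for illustration.
    """
    if log_base <= 1:
        raise ValueError("log_base must be greater than 1")
    num_gran = len(spectra)
    num_sb = len(spectra[0])
    # Assumed step S22: W[i][sb] = log_a(1 + sum_j SP[i][sb][j]^2)
    w = [
        [math.log(1.0 + sum(v * v for v in band), log_base) for band in gran]
        for gran in spectra
    ]
    # Assumed step S23: D[sb] = mean of W[i][sb] over the granularities
    d = [sum(w[i][sb] for i in range(num_gran)) / num_gran for sb in range(num_sb)]
    # Assumed step S24: EDS = mean of D[sb] over the sub-bands
    eds = sum(d) / num_sb
    return eds, d


# Two granularities, two sub-bands, two frequency lines per sub-band.
quiet = [[[3.0, 4.0], [0.0, 0.0]], [[3.0, 4.0], [0.0, 0.0]]]
loud = [[[30.0, 40.0], [0.0, 0.0]], [[30.0, 40.0], [0.0, 0.0]]]
eds_quiet, d_quiet = granularity_average_energy(quiet, log_base=26.0)
eds_loud, _ = granularity_average_energy(loud, log_base=26.0)
```

Under these assumptions a segment with stronger spectral content yields a larger value, which is the property the rate determination in step S3 relies on.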
CN202010383119.7A 2020-05-08 2020-05-08 Audio encoding method and device, electronic equipment and storage medium Active CN112037803B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010383119.7A CN112037803B (en) 2020-05-08 2020-05-08 Audio encoding method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010383119.7A CN112037803B (en) 2020-05-08 2020-05-08 Audio encoding method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112037803A (en) 2020-12-04
CN112037803B (en) 2023-09-29

Family

ID=73579419

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010383119.7A Active CN112037803B (en) 2020-05-08 2020-05-08 Audio encoding method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112037803B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023005414A1 (en) * 2021-07-29 2023-02-02 华为技术有限公司 Audio signal encoding method and apparatus, and audio signal decoding method and apparatus

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101025919A (en) * 2006-02-22 2007-08-29 上海奇码数字信息有限公司 Synthetic sub-band filtering method for audio decoding and synthetic sub-band filter
CN101320563A (en) * 2007-06-05 2008-12-10 华为技术有限公司 Background noise encoding/decoding device, method and communication equipment
CN101335000A (en) * 2008-03-26 2008-12-31 华为技术有限公司 Method and apparatus for encoding and decoding
CN101552007A (en) * 2004-03-01 2009-10-07 杜比实验室特许公司 Multiple channel audio code
US20090299756A1 (en) * 2004-03-01 2009-12-03 Dolby Laboratories Licensing Corporation Ratio of speech to non-speech audio such as for elderly or hearing-impaired listeners
CN101800050A (en) * 2010-02-03 2010-08-11 武汉大学 Audio fine scalable coding method and system based on perception self-adaption bit allocation
CN102177543A (en) * 2008-10-08 2011-09-07 弗朗霍夫应用科学研究促进协会 Audio decoder, audio encoder, method for decoding an audio signal, method for encoding an audio signal, computer program and audio signal
CN102543090A (en) * 2011-12-31 2012-07-04 深圳市茂碧信息科技有限公司 Code rate automatic control system applicable to variable bit rate voice and audio coding
CN103247293A (en) * 2013-05-14 2013-08-14 中国科学院自动化研究所 Coding method and decoding method for voice data
US10043527B1 (en) * 2015-07-17 2018-08-07 Digimarc Corporation Human auditory system modeling with masking energy adaptation
CN109600700A (en) * 2018-11-16 2019-04-09 珠海市杰理科技股份有限公司 Audio data processing method, device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN112037803B (en) 2023-09-29

Similar Documents

Publication Publication Date Title
US10546592B2 (en) Audio signal coding and decoding method and device
TWI604437B (en) Bit allocating method, bit allocating apparatus and computer readable recording medium
US9972326B2 (en) Method and apparatus for allocating bits of audio signal
CN106941004B (en) Method and apparatus for bit allocation of audio signal
CN111968656B (en) Signal encoding method and device and signal decoding method and device
US10789964B2 (en) Dynamic bit allocation methods and devices for audio signal
CN106847297A (en) The Forecasting Methodology of high-frequency band signals, coding/decoding apparatus
CN112037803B (en) Audio encoding method and device, electronic equipment and storage medium
CN112037802B (en) Audio coding method and device based on voice endpoint detection, equipment and medium
CN102737636B (en) Audio coding method and device thereof
US20120143599A1 (en) Warped spectral and fine estimate audio encoding
CN116018642A (en) Maintaining invariance of perceptual dissonance and sound localization cues in an audio codec
CN115641857A (en) Audio processing method, device, electronic equipment, storage medium and program product
Ghahabi et al. Adaptive Variable Degree-k Zero-Trees for Re-Encoding of Perceptually Quantized Wavelet-Packet Transformed Audio and High Quality Speech

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 519075 No. 333, Kexing Road, Xiangzhou District, Zhuhai City, Guangdong Province

Applicant after: ZHUHAI JIELI TECHNOLOGY Co.,Ltd.

Address before: Floor 1-107, building 904, ShiJiHua Road, Zhuhai City, Guangdong Province

Applicant before: ZHUHAI JIELI TECHNOLOGY Co.,Ltd.

GR01 Patent grant