CN112037803B - Audio encoding method and device, electronic equipment and storage medium - Google Patents

Audio encoding method and device, electronic equipment and storage medium

Info

Publication number
CN112037803B
CN112037803B CN202010383119.7A
Authority
CN
China
Prior art keywords
audio segment
audio
coding
granularity
active audio
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010383119.7A
Other languages
Chinese (zh)
Other versions
CN112037803A (en)
Inventor
闫玉凤
肖全之
黄荣均
方桂萍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhuhai Jieli Technology Co Ltd
Original Assignee
Zhuhai Jieli Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhuhai Jieli Technology Co Ltd filed Critical Zhuhai Jieli Technology Co Ltd
Priority to CN202010383119.7A priority Critical patent/CN112037803B/en
Publication of CN112037803A publication Critical patent/CN112037803A/en
Application granted granted Critical
Publication of CN112037803B publication Critical patent/CN112037803B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16Vocoder architecture
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78Detection of presence or absence of voice signals

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention provides an audio coding method and apparatus, an electronic device and a storage medium. The method comprises the following steps: performing voice endpoint detection processing on the audio data to be encoded so as to divide the audio data to be encoded into active audio segments and inactive audio segments; for each active audio segment, calculating its granularity average energy by using the energy value of each sub-band in each granularity; determining the coding rate of each active audio segment according to its granularity average energy, wherein the coding rate of an active audio segment is positively correlated with its granularity average energy; for each active audio segment, performing audio coding on the active audio segment according to its coding rate; and coding the inactive audio segments obtained by dividing the audio data to be coded, wherein the coding rate of each active audio segment is larger than that of each inactive audio segment. The invention helps improve the coding quality and reduce audio distortion after coding.

Description

Audio encoding method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of audio encoding technologies, and in particular, to an audio encoding method and apparatus, an electronic device, and a storage medium.
Background
At present, in order to facilitate network transmission and storage of audio, audio encoding technology is generally required to convert original audio data into compressed data. The compressed data volume is smaller, which saves storage space and reduces the network bandwidth required for transmission; however, encoding in general easily introduces audio distortion.
Disclosure of Invention
Based on the above-mentioned situation, a main object of the present invention is to provide an audio encoding method and apparatus, an electronic device, and a storage medium, which are beneficial to reducing audio distortion after encoding.
In order to achieve the above object, the present invention provides an audio encoding method, including:
step S1: performing voice endpoint detection processing on audio data to be encoded so as to divide active audio segments and inactive audio segments in the audio data to be encoded to obtain a plurality of audio segments;
step S2: performing block processing on each active audio segment to obtain a plurality of granularities, performing sub-band decomposition on each granularity, calculating the energy value of each sub-band in each granularity, and calculating the granularity average energy of each active audio segment by using the energy value of each sub-band in each granularity;
step S3: determining the coding rate of each active audio segment according to the granularity average energy of each active audio segment, wherein the coding rate of the active audio segment is positively correlated with the granularity average energy of the active audio segment;
step S4: for each active audio segment, carrying out audio coding on the active audio segment according to the coding code rate;
step S5: and coding the inactive audio segments obtained by dividing the audio data to be coded, wherein the coding code rate of each active audio segment is larger than that of each inactive audio segment.
Further, step S2 includes:
step S21: the k-th active audio segment obtained by dividing the audio data to be encoded is subjected to block processing to obtain a plurality of granularities, wherein k=1, 2,3, …, L is the number of active audio segments obtained by dividing the audio data to be encoded;
step S22: sub-band decomposition is carried out on each granularity of the kth active audio segment, and then the energy value of each sub-band of each granularity of the kth active audio segment is calculated;
where W_(k,i)[sb] is the energy value of the sb-th sub-band in the i-th granularity of the k-th active audio segment, SP_(k,i)[sb][j] is the spectral value of the j-th frequency line of the sb-th sub-band in the i-th granularity of the k-th active audio segment, sb denotes the sub-band index, sb=1, 2, 3, …, N, N is the number of sub-bands in each granularity, j denotes the frequency line index, Z is the number of frequency lines in each sub-band, and a is a preset value greater than 1;
step S23: calculating an energy distribution value of the kth active audio segment on each sub-band;
wherein D_k[sb] is the energy distribution value of the k-th active audio segment on the sb-th sub-band, and grs_k is the number of granularities obtained after the block processing of the k-th active audio segment;
step S24: determining the granularity average energy EDS_k of the k-th active audio segment.
Further, the determining the coding rate of each active audio segment according to the granularity average energy of each active audio segment includes:
acquiring an overall target coding rate of the audio data to be coded;
and calculating the coding rate of each active audio segment according to the overall target coding rate of the audio data to be coded, the coding rate of each inactive audio segment and the granularity average energy of each active audio segment.
Further, the coding rate of the active audio segment is positively correlated with the granularity average energy of the active audio segment, and the method comprises the following steps:
the ratio between the coding rates of the active audio segments is consistent with the ratio between the granularity average energy of the active audio segments.
Further, the coding rate of the inactive audio segment is the lowest coding rate supported by the coding format corresponding to the audio coding method.
In order to achieve the above object, the present invention further provides an audio encoding apparatus, including:
the voice endpoint detection processing module is used for carrying out voice endpoint detection processing on the audio data to be encoded so as to divide the active audio segment and the inactive audio segment in the audio data to be encoded to obtain a plurality of audio segments;
the computing module is used for carrying out block processing on each active audio segment to obtain a plurality of granularities, carrying out sub-band decomposition on each granularity, computing the energy value of each sub-band in each granularity, and then computing the granularity average energy of each active audio segment by utilizing the energy value of each sub-band in each granularity;
the code rate determining module is used for determining the coding code rate of each active audio segment according to the granularity average energy of each active audio segment, wherein the coding code rate of the active audio segment is positively correlated with the granularity average energy of the active audio segment;
the first coding processing module is used for carrying out audio coding on each active audio segment according to the coding code rate of the active audio segment;
and the second coding processing module is used for coding the inactive audio segments obtained by dividing the audio data to be coded, and the coding rate of each active audio segment is larger than that of each inactive audio segment.
In order to achieve the above object, the present invention further provides an audio encoding apparatus, which includes a processor and a memory coupled to the processor, wherein instructions are stored in the memory for the processor to execute, and when the processor executes the instructions, the audio encoding method can be implemented.
In order to achieve the above object, the present invention further provides an electronic device, which includes the above audio encoding apparatus.
Further, the electronic device is a speaker, a recording pen, a mobile phone, a smart tablet, a notebook computer, a desktop computer, or an electronic toy.
To achieve the above object, the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above audio encoding method.
According to the audio coding method provided by the invention, voice endpoint detection processing is performed on the audio data to be coded to divide it into active audio segments and inactive audio segments; a larger coding rate is assigned to active audio segments with larger granularity average energy and a smaller coding rate to active audio segments with smaller granularity average energy, while the coding rate of each active audio segment is kept larger than the coding rate of each inactive audio segment. This effectively improves the coding quality and reduces audio distortion after coding.
Drawings
The above and other objects, features and advantages of the present invention will become more apparent from the following description of embodiments of the present invention with reference to the accompanying drawings, in which:
fig. 1 is a flowchart of an audio encoding method according to an embodiment of the present invention.
Detailed Description
The present invention is described below based on examples, but it is not limited to these examples. In the following detailed description of the present invention, certain specific details are set forth, while well-known methods, procedures, flows, and components are not described in detail in order to avoid obscuring the present invention.
Moreover, those of ordinary skill in the art will appreciate that the drawings are provided herein for illustrative purposes and that the drawings are not necessarily drawn to scale.
Unless the context clearly requires otherwise, throughout the description and the claims, the words "comprise", "comprising", and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is, in the sense of "including, but not limited to".
In the description of the present invention, it should be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Furthermore, in the description of the present invention, unless otherwise indicated, the meaning of "a plurality" is two or more.
Referring to fig. 1, fig. 1 is a flowchart of an audio encoding method according to an embodiment of the present invention, the audio encoding method includes:
step S1: performing voice endpoint detection processing on audio data to be encoded so as to divide active audio segments and inactive audio segments in the audio data to be encoded to obtain a plurality of audio segments;
the start and end of speech in the audio data to be encoded may be detected by a speech end-point detection (VAD) process, for example, in this step, the VAD algorithm based on threshold, statistical model or machine learning (e.g. neural network) may be used to perform the speech end-point detection process on the audio data to be encoded;
for example, the active audio segment may be an audio segment whose audio feature meets a preset condition, and the inactive audio segment is an audio segment whose audio feature does not meet the preset condition, where the preset condition may be set according to specific requirements, and the audio feature may include one or more of an energy feature, a spectrum feature, a harmonic feature, a subband signal-to-noise ratio, and a zero-crossing rate, that is, the audio feature extracted from the audio data to be encoded is analyzed and processed by using a VAD method, so as to implement a decision of the active audio segment and the inactive audio segment;
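For illustration only (not the patented VAD itself), a minimal threshold-based segmentation of the kind mentioned above might look like the following Python sketch; the frame length, the mean-square-energy criterion, and the threshold value are assumptions made for the sketch, not details taken from this disclosure.

import numpy as np

def split_active_inactive(samples, frame_len=384, energy_thresh=1e-4):
    """Minimal energy-threshold VAD sketch: mark each frame of frame_len
    samples as active when its mean-square energy exceeds energy_thresh,
    then merge consecutive frames with the same label into segments."""
    samples = np.asarray(samples, dtype=float)
    n_frames = len(samples) // frame_len
    flags = [float(np.mean(samples[i * frame_len:(i + 1) * frame_len] ** 2)) > energy_thresh
             for i in range(n_frames)]
    segments = []  # list of (is_active, start_sample, end_sample)
    start = 0
    for i in range(1, n_frames + 1):
        if i == n_frames or flags[i] != flags[start]:
            segments.append((flags[start], start * frame_len, i * frame_len))
            start = i
    return segments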
step S2: performing block processing on each active audio segment to obtain a plurality of granularities, performing sub-band decomposition on each granularity, calculating the energy value of each sub-band in each granularity, and calculating the granularity average energy of each active audio segment by using the energy value of each sub-band in each granularity;
wherein, for each active audio segment, the granularity average energy is the average of the energies of its granularities;
step S3: determining the coding rate of each active audio segment according to the granularity average energy of each active audio segment, wherein the coding rate of the active audio segment is positively correlated with the granularity average energy of the active audio segment;
that is, for any two active audio segments, the coding rate of the active audio segment with larger granularity average energy is larger than that of the active audio segment with smaller granularity average energy;
step S4: for each active audio segment, carrying out audio coding on the active audio segment according to the coding code rate;
step S5: and coding the inactive audio segments obtained by dividing the audio data to be coded, wherein the coding code rate of each active audio segment is larger than that of each inactive audio segment.
According to the audio coding method provided by the embodiment of the invention, voice endpoint detection processing is performed on the audio data to be coded to divide it into active audio segments and inactive audio segments; a larger coding rate is assigned to active audio segments with larger granularity average energy and a smaller coding rate to active audio segments with smaller granularity average energy, while the coding rate of each active audio segment is kept larger than the coding rate of each inactive audio segment. This effectively improves the coding quality and reduces audio distortion after coding.
For example, in an embodiment, the step S2 may specifically include:
step S21: the k-th active audio segment obtained by dividing the audio data to be encoded is subjected to block processing to obtain a plurality of granularities, wherein k=1, 2,3, …, L is the number of active audio segments obtained by dividing the audio data to be encoded;
step S22: sub-band decomposition is carried out on each granularity of the kth active audio segment, and then the energy value of each sub-band of each granularity of the kth active audio segment is calculated;
where W_(k,i)[sb] is the energy value of the sb-th sub-band in the i-th granularity of the k-th active audio segment, SP_(k,i)[sb][j] is the spectral value of the j-th frequency line of the sb-th sub-band in the i-th granularity of the k-th active audio segment, sb denotes the sub-band index, sb=1, 2, 3, …, N, N is the number of sub-bands in each granularity, j denotes the frequency line index, Z is the number of frequency lines in each sub-band, and a is a preset value greater than 1, for example 2, e, or 10;
step S23: calculating an energy distribution value of the kth active audio segment on each sub-band;
wherein D_k[sb] is the energy distribution value of the k-th active audio segment on the sb-th sub-band, and grs_k is the number of granularities obtained after the block processing of the k-th active audio segment;
step S24: determining the granularity average energy EDS_k of the k-th active audio segment.
The audio coding method in the embodiment of the invention can be applied to the MP1 (MPEG-1/2/2.5 Audio Layer-1), MP2 (MPEG-1/2/2.5 Audio Layer-2) or MP3 (MPEG-1/2/2.5 Audio Layer-3) coding process, and can also be applied to the coding process of other transform coding formats;
preferably, in an embodiment, the coding rate of the inactive audio segment may be set to be the lowest coding rate that can be supported by the coding format corresponding to the audio coding method, for example, in an embodiment, the audio coding method is an MP2 coding format coding method, and if the sampling rate of the audio data to be coded is greater than or equal to 32kHz, the coding rate of each inactive audio segment is 32kbps; if the sampling rate of the audio data to be encoded is smaller than 32kHz, the encoding code rate of each inactive audio segment is 8kbps.
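The MP2 example just given reduces to a one-line rule; a minimal Python sketch (the function name is illustrative):

def mp2_inactive_rate_kbps(sample_rate_hz):
    """Lowest MP2 rate for inactive segments per the example above:
    32 kbps when the sampling rate is at least 32 kHz, otherwise 8 kbps."""
    return 32 if sample_rate_hz >= 32000 else 8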
In this embodiment, by setting a larger coding rate for the active audio segments with larger granularity average energy, setting a smaller coding rate for the active audio segments with smaller granularity average energy, that is, allocating more bitstreams to the active audio segments with larger granularity average energy, allocating less bitstreams to the active audio segments with smaller granularity average energy, and making the coding rate of each active audio segment greater than the coding rate of each inactive audio segment, the coding quality can be effectively improved under the same compression rate, and a higher compression rate can be obtained under the same coding quality.
For example, in an embodiment, determining the coding rate of each active audio segment according to the granularity average energy of each active audio segment may specifically include:
acquiring an overall target coding rate of the audio data to be coded;
and calculating the coding rate of each active audio segment according to the overall target coding rate of the audio data to be coded, the coding rate of each inactive audio segment and the granularity average energy of each active audio segment.
For example, in one embodiment, the coding rate of the inactive audio segment is set to be the lowest coding rate that can be supported by the current coding format, and then the coding rate of each active audio segment is calculated according to the overall target coding rate of the audio data to be coded and the ratio between the granularity average energy of each active audio segment, where the ratio between the coding rates of each active audio segment is consistent with the ratio between the granularity average energy of each active audio segment, and the sum of the number of bits of each audio segment is equal to the number of bits of the audio data to be coded (the product of the overall target coding rate of the audio data to be coded and the audio duration of the audio data to be coded).
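As a sketch of this allocation, and only under the assumption that the active-segment rates are split in proportion to their granularity average energies after the inactive segments' bits have been reserved from the overall budget, the following Python function illustrates the bookkeeping; segment durations and the lowest inactive rate are treated as inputs, and all names are illustrative.

def allocate_bitrates(overall_rate_bps, segments, inactive_rate_bps):
    """Sketch of the rate allocation described above.

    segments is a list of dicts like {"active": bool, "duration_s": float,
    "eds": float}, where "eds" is the granularity average energy of an active
    segment (ignored for inactive segments). The total bit budget is the
    overall target rate times the total duration; inactive segments get the
    lowest supported rate, and the remaining bits are split among the active
    segments in proportion to their EDS values, so the rate ratios match the
    EDS ratios and the per-segment bit counts sum to the budget."""
    total_duration = sum(s["duration_s"] for s in segments)
    total_bits = overall_rate_bps * total_duration
    inactive_bits = sum(inactive_rate_bps * s["duration_s"]
                        for s in segments if not s["active"])
    active_bits = total_bits - inactive_bits
    eds_dur = sum(s["eds"] * s["duration_s"] for s in segments if s["active"])
    rates = []
    for s in segments:
        if s["active"]:
            # rate_k = active_bits * eds_k / sum(eds_j * dur_j), so that
            # sum(rate_k * dur_k) over active segments equals active_bits.
            rates.append(active_bits * s["eds"] / eds_dur)
        else:
            rates.append(inactive_rate_bps)
    return rates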
For example, the audio encoding method provided by the embodiment of the invention can be applied to encoding of an MP2 encoding format, and the specific flow steps include the following:
step A: performing voice endpoint detection (VAD) processing on the audio data to be encoded, and dividing an active audio segment and an inactive audio segment (approximate silence segment with lower energy) in the audio data to be encoded, wherein the signal length of each active audio segment is an integer multiple of 384 sampling points, for example, two active audio segments and one inactive audio segment are obtained after the audio data to be encoded is divided;
the 1 st active audio segment is subjected to block processing to obtain a plurality of granularities, 384 sampling points are used as a granularity to carry out MP2 coding sub-band decomposition operation, 32 sub-bands are obtained, and each sub-band comprises 12 frequency lines, namely N=32 and Z=12;
Step B: sub-band decomposition is performed on each granularity of the 1st active audio segment, and then the energy values of each sub-band of each granularity of the 1st active audio segment are calculated;
where W_(1,i)[sb] is the energy value of the sb-th sub-band in the i-th granularity of the 1st active audio segment, and SP_(1,i)[sb][j] is the spectral value of the j-th frequency line of the sb-th sub-band in the i-th granularity of the 1st active audio segment;
step C: calculating the energy distribution value of the 1 st active audio segment on each sub-band;
for example, if the 1st active audio segment has 36 granularities (i.e., grs_1 = 36), then D_1[1] is the sum of the energy value W_(1,1)[1] of the 1st sub-band of the 1st granularity, the energy value W_(1,2)[1] of the 1st sub-band of the 2nd granularity, …, and the energy value W_(1,36)[1] of the 1st sub-band of the 36th granularity; similarly, D_1[1], …, D_1[32] are calculated in turn, giving 32 energy distribution values;
Step D: calculating the expectation of, and summing, the energy distribution values of the sub-bands of the 1st active audio segment:
EDF_1[sb] = D_1[sb] / grs_1;
this yields the information quantity distribution table of the 1st active audio segment, denoted EDF_1: EDF_1[sb: 1~32], and the granularity average energy EDS_1 of the 1st active audio segment,
where EDF_k[sb] denotes the energy distribution expectation of the k-th active audio segment on the sb-th sub-band;
Step E: processing the 2nd active audio segment in the same way to obtain its information quantity distribution table EDF_2 and its granularity average energy EDS_2;
setting the coding rate of the inactive audio segment to the lowest coding rate bit_min supported by MP2 coding, and then calculating the coding rate of the 1st active audio segment and the coding rate of the 2nd active audio segment according to the ratio between EDS_1 and EDS_2 and the overall target coding rate of the audio data to be coded;
by the method, bit streams consumed by the inactive audio segments can be reduced, the compression ratio is improved, the bit streams are reasonably distributed among the active audio segments, and the coding quality of the active audio segments is effectively improved;
Step F: for each of the plurality of audio segments, MP2 audio encoding is performed according to its coding rate; for example, each audio segment may be encoded using a psychoacoustic model;
with this audio coding method, the MP2 distortion at the same coding rate is smaller, and recovery of the audio signal is well guaranteed.
Furthermore, a fixed psychoacoustic model preferentially allocates the bit stream to low frequencies, while actual sounds are of many kinds and different kinds of sound signals distribute their information differently across the sub-bands; if a fixed psychoacoustic model is used, audio signals with rich spectra or more high-frequency content easily suffer from serious high-frequency loss. Preferably, therefore, in an embodiment, the encoding process of the k-th active audio segment includes:
step S41: acquiring a signal masking ratio of each sub-band in each granularity of the kth active audio segment, and then calculating a bit allocation weight value of each sub-band in each granularity of the kth active audio segment according to the signal masking ratio of each sub-band in each granularity of the kth active audio segment and an energy distribution value of the kth active audio segment on each sub-band;
P_(k,i)[sb] = C * SMR_(k,i)[sb] * D_k[sb];
where P_(k,i)[sb] is the bit allocation weight value of the sb-th sub-band in the i-th granularity of the k-th active audio segment, SMR_(k,i)[sb] is the signal masking ratio of the sb-th sub-band in the i-th granularity of the k-th active audio segment, and C is a preset positive coefficient;
for each granularity, a psychoacoustic model can be adopted to analyze and process frames corresponding to the granularity to obtain a signal masking table, wherein the signal masking table comprises the signal masking ratio of each sub-band in the granularity;
step S42: for each granularity of the k-th active audio segment, carrying out bit allocation on each sub-band according to the bit allocation weight value of each sub-band, wherein, for any two sub-bands in the same granularity, the sub-band with the larger bit allocation weight value is allocated more bits than the sub-band with the smaller bit allocation weight value;
for example, for each granularity of the kth active audio segment, the ratio between the number of bits allocated for each sub-band is consistent with the ratio between the bit allocation weight values for each sub-band.
Step S43: for each sub-band in each granularity of the kth active audio segment, quantizing the frequency line according to the number of bits allocated to the sub-band, and performing bit stream encapsulation after quantization;
For example, in step S41, the information quantity distribution table of the k-th active audio segment may be normalized, i.e., EDF_k[sb] / EDS_k, and then multiplied in turn by N and by the corresponding signal masking ratio:
P_(k,i)[sb] = SMR_(k,i)[sb] * D_k[sb] * N / (grs_k * EDS_k);
i.e., for the k-th active audio segment: C = N / (grs_k * EDS_k);
For example, in one embodiment, each active audio segment obtained by dividing the audio data to be encoded is encoded in the steps S41-S43, i.e., k is each integer between 1 and L in turn.
In this way, different kinds of sound signals are taken into account, which facilitates recovery of the different kinds of sound signals after coding and also contributes to stable quantization levels and timely spectrum tracking.
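The following Python sketch illustrates the weighting of steps S41 to S43 using the normalized form C = N/(grs_k * EDS_k) given above; the SMR values are assumed to come from an external psychoacoustic model, and the proportional, floor-rounded bit split per granularity is an illustrative assumption rather than the exact MP2 quantization and bit-stream packing.

import numpy as np

def bit_allocation_weights(smr, D, grs_k, EDS_k):
    """P_(k,i)[sb] = C * SMR_(k,i)[sb] * D_k[sb] with C = N / (grs_k * EDS_k).

    smr has shape (grs_k, N): signal masking ratio per granularity and
    sub-band (e.g. from a psychoacoustic model); D has shape (N,): energy
    distribution values of the k-th active audio segment."""
    smr = np.asarray(smr, dtype=float)
    D = np.asarray(D, dtype=float)
    N = D.shape[0]
    C = N / (grs_k * EDS_k)
    return C * smr * D[np.newaxis, :]  # shape (grs_k, N)

def allocate_bits_per_granularity(weights, bits_per_granularity):
    """Distribute one granularity's bit budget across sub-bands in proportion
    to the bit allocation weights (the ratio-consistency example in the text);
    rounding down to whole bits is an implementation assumption."""
    weights = np.asarray(weights, dtype=float)
    share = weights / np.sum(weights)
    return np.floor(share * bits_per_granularity).astype(int)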
The embodiment of the invention also provides an audio coding device, which comprises:
the voice endpoint detection processing module is used for carrying out voice endpoint detection processing on the audio data to be encoded so as to divide the active audio segment and the inactive audio segment in the audio data to be encoded to obtain a plurality of audio segments;
the computing module is used for carrying out block processing on each active audio segment to obtain a plurality of granularities, carrying out sub-band decomposition on each granularity, computing the energy value of each sub-band in each granularity, and then computing the granularity average energy of each active audio segment by utilizing the energy value of each sub-band in each granularity;
the code rate determining module is used for determining the coding code rate of each active audio segment according to the granularity average energy of each active audio segment, wherein the coding code rate of the active audio segment is positively correlated with the granularity average energy of the active audio segment;
the first coding processing module is used for carrying out audio coding on each active audio segment according to the coding code rate of the active audio segment;
and the second coding processing module is used for coding the inactive audio segments obtained by dividing the audio data to be coded, and the coding rate of each active audio segment is larger than that of each inactive audio segment.
The embodiment of the invention also provides an audio coding device which comprises a processor and a memory coupled with the processor, wherein the memory stores instructions for the processor to execute, and when the processor executes the instructions, the audio coding method can be realized.
According to the audio coding device provided by the embodiment of the invention, the voice endpoint detection processing is carried out on the audio data to be coded, the active audio segments and the inactive audio segments are segmented, the active audio segments with larger granularity average energy are provided with larger coding code rates, the active audio segments with smaller granularity average energy are provided with smaller coding code rates, meanwhile, the code rate of each active audio segment is larger than the code rate of each inactive audio segment, the coding quality can be effectively improved, and the audio distortion after coding is reduced.
The embodiment of the invention also provides an electronic device comprising the above audio encoding apparatus. For example, the electronic device may be a speaker, a recording pen, a mobile phone, a smart tablet, a notebook computer, a desktop computer, or an electronic toy.
The embodiment of the invention also provides a computer readable storage medium storing a computer program which when executed by a processor implements the audio encoding method described above.
Those skilled in the art will appreciate that the above-described preferred embodiments can be freely combined and stacked without conflict.
It will be understood that the above-described embodiments are merely illustrative and not restrictive, and that all obvious or equivalent modifications and substitutions of the details given above that may be made by those skilled in the art without departing from the underlying principles of the invention are intended to be included within the scope of the appended claims.

Claims (10)

1. An audio encoding method, comprising:
step S1: performing voice endpoint detection processing on audio data to be encoded so as to divide active audio segments and inactive audio segments in the audio data to be encoded to obtain a plurality of audio segments;
step S2: performing block processing on each active audio segment to obtain a plurality of granularities, performing sub-band decomposition on each granularity, calculating the energy value of each sub-band in each granularity, and calculating the granularity average energy of each active audio segment by using the energy value of each sub-band in each granularity;
step S3: determining the coding rate of each active audio segment according to the granularity average energy of each active audio segment, wherein the coding rate of the active audio segment is positively correlated with the granularity average energy of the active audio segment;
step S4: for each active audio segment, carrying out audio coding on the active audio segment according to the coding code rate;
step S5: and coding the inactive audio segments obtained by dividing the audio data to be coded, wherein the coding code rate of each active audio segment is larger than that of each inactive audio segment.
2. The method according to claim 1, wherein step S2 comprises:
step S21: the k-th active audio segment obtained by dividing the audio data to be encoded is subjected to block processing to obtain a plurality of granularities, wherein k=1, 2,3, …, L is the number of active audio segments obtained by dividing the audio data to be encoded;
step S22: sub-band decomposition is carried out on each granularity of the kth active audio segment, and then the energy value of each sub-band of each granularity of the kth active audio segment is calculated;
wherein W_(k,i)[sb] is the energy value of the sb-th sub-band in the i-th granularity of the k-th active audio segment, SP_(k,i)[sb][j] is the spectral value of the j-th frequency line of the sb-th sub-band in the i-th granularity of the k-th active audio segment, sb denotes the sub-band index, sb=1, 2, 3, …, N, N is the number of sub-bands in each granularity, j denotes the frequency line index, Z is the number of frequency lines in each sub-band, and a is a preset value greater than 1;
step S23: calculating an energy distribution value of the kth active audio segment on each sub-band;
wherein D_k[sb] is the energy distribution value of the k-th active audio segment on the sb-th sub-band, and grs_k is the number of granularities obtained after the block processing of the k-th active audio segment;
step S24: determining the granularity average energy EDS_k of the k-th active audio segment.
3. The method of claim 1, wherein said determining the coding rate of each of said active audio segments based on the granularity average energy of each of said active audio segments comprises:
acquiring an overall target coding rate of the audio data to be coded;
and calculating the coding rate of each active audio segment according to the overall target coding rate of the audio data to be coded, the coding rate of each inactive audio segment and the granularity average energy of each active audio segment.
4. The method of claim 1, wherein the coding rate of the active audio segment is positively correlated with a granularity average energy of the active audio segment, comprising:
the ratio between the coding rates of the active audio segments is consistent with the ratio between the granularity average energy of the active audio segments.
5. The method according to any one of claims 1-4, wherein the coding rate of the inactive audio segment is a lowest coding rate supported by a coding format corresponding to the audio coding method.
6. An audio encoding apparatus, comprising:
the voice endpoint detection processing module is used for carrying out voice endpoint detection processing on the audio data to be encoded so as to divide the active audio segment and the inactive audio segment in the audio data to be encoded to obtain a plurality of audio segments;
the computing module is used for carrying out block processing on each active audio segment to obtain a plurality of granularities, carrying out sub-band decomposition on each granularity, computing the energy value of each sub-band in each granularity, and then computing the granularity average energy of each active audio segment by utilizing the energy value of each sub-band in each granularity;
the code rate determining module is used for determining the coding code rate of each active audio segment according to the granularity average energy of each active audio segment, wherein the coding code rate of the active audio segment is positively correlated with the granularity average energy of the active audio segment;
the first coding processing module is used for carrying out audio coding on each active audio segment according to the coding code rate of the active audio segment;
and the second coding processing module is used for coding the inactive audio segments obtained by dividing the audio data to be coded, and the coding rate of each active audio segment is larger than that of each inactive audio segment.
7. An audio encoding device comprising a processor and a memory coupled to the processor, wherein the memory has instructions stored therein for execution by the processor, which when executed by the processor, is capable of implementing the method according to any of claims 1-5.
8. An electronic device comprising the audio encoding apparatus according to claim 6 or the audio encoding apparatus according to claim 7.
9. The electronic device of claim 8, wherein the electronic device is a speaker, a recording pen, a mobile phone, a smart tablet, a notebook computer, a desktop computer, or an electronic toy.
10. A computer readable storage medium storing a computer program, which when executed by a processor implements the method of any one of claims 1-5.
CN202010383119.7A 2020-05-08 2020-05-08 Audio encoding method and device, electronic equipment and storage medium Active CN112037803B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010383119.7A CN112037803B (en) 2020-05-08 2020-05-08 Audio encoding method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010383119.7A CN112037803B (en) 2020-05-08 2020-05-08 Audio encoding method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112037803A CN112037803A (en) 2020-12-04
CN112037803B true CN112037803B (en) 2023-09-29

Family

ID=73579419

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010383119.7A Active CN112037803B (en) 2020-05-08 2020-05-08 Audio encoding method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112037803B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115691521A (en) * 2021-07-29 2023-02-03 华为技术有限公司 Audio signal coding and decoding method and device

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101025919A (en) * 2006-02-22 2007-08-29 上海奇码数字信息有限公司 Synthetic sub-band filtering method for audio decoding and synthetic sub-band filter
CN101320563A (en) * 2007-06-05 2008-12-10 华为技术有限公司 Background noise encoding/decoding device, method and communication equipment
CN101335000A (en) * 2008-03-26 2008-12-31 华为技术有限公司 Method and apparatus for encoding and decoding
CN101552007A (en) * 2004-03-01 2009-10-07 杜比实验室特许公司 Multiple channel audio code
CN101800050A (en) * 2010-02-03 2010-08-11 武汉大学 Audio fine scalable coding method and system based on perception self-adaption bit allocation
CN102177543A (en) * 2008-10-08 2011-09-07 弗朗霍夫应用科学研究促进协会 Audio decoder, audio encoder, method for decoding an audio signal, method for encoding an audio signal, computer program and audio signal
CN102543090A (en) * 2011-12-31 2012-07-04 深圳市茂碧信息科技有限公司 Code rate automatic control system applicable to variable bit rate voice and audio coding
CN103247293A (en) * 2013-05-14 2013-08-14 中国科学院自动化研究所 Coding method and decoding method for voice data
US10043527B1 (en) * 2015-07-17 2018-08-07 Digimarc Corporation Human auditory system modeling with masking energy adaptation
CN109600700A (en) * 2018-11-16 2019-04-09 珠海市杰理科技股份有限公司 Audio data processing method, device, computer equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090299756A1 (en) * 2004-03-01 2009-12-03 Dolby Laboratories Licensing Corporation Ratio of speech to non-speech audio such as for elderly or hearing-impaired listeners


Also Published As

Publication number Publication date
CN112037803A (en) 2020-12-04

Similar Documents

Publication Publication Date Title
US10546592B2 (en) Audio signal coding and decoding method and device
US9972326B2 (en) Method and apparatus for allocating bits of audio signal
CN103544957B (en) Method and device for bit distribution of sound signal
US10789964B2 (en) Dynamic bit allocation methods and devices for audio signal
JP2002196792A (en) Audio coding system, audio coding method, audio coder using the method, recording medium, and music distribution system
CN112037803B (en) Audio encoding method and device, electronic equipment and storage medium
US20130346088A1 (en) Audio coding method and apparatus
CN112037802B (en) Audio coding method and device based on voice endpoint detection, equipment and medium
US7650277B2 (en) System, method, and apparatus for fast quantization in perceptual audio coders
CN106409303A (en) Method and device for processing signal
JP3466507B2 (en) Audio coding method, audio coding device, and data recording medium
CN116018642A (en) Maintaining invariance of perceptual dissonance and sound localization cues in an audio codec
JPH10232695A (en) Method of encoding speech compression and device therefor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
Address after: 519075 No. 333, Kexing Road, Xiangzhou District, Zhuhai City, Guangdong Province
Applicant after: ZHUHAI JIELI TECHNOLOGY Co.,Ltd.
Address before: Floor 1-107, building 904, ShiJiHua Road, Zhuhai City, Guangdong Province
Applicant before: ZHUHAI JIELI TECHNOLOGY Co.,Ltd.
GR01 Patent grant