CN114333862B - Audio encoding method, decoding method, device, equipment, storage medium and product - Google Patents

Audio encoding method, decoding method, device, equipment, storage medium and product

Info

Publication number
CN114333862B
CN114333862B (granted publication of application CN202111327258.9A; earlier publication CN114333862A)
Authority
CN
China
Prior art keywords
parameter
code stream
parameters
frame
coding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111327258.9A
Other languages
Chinese (zh)
Other versions
CN114333862A (en)
Inventor
梁俊斌
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202111327258.9A
Publication of CN114333862A
Application granted
Publication of CN114333862B

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The application relates to an audio encoding method, an audio decoding method, and a corresponding apparatus, device, storage medium and product, in the technical field of network communication. The method comprises the following steps: acquiring a first audio signal frame; encoding the first audio signal frame to obtain encoding parameters of the first audio signal frame, the encoding parameters comprising whole-frame class parameters and subframe class parameters; grouping the subframe class parameters to obtain n parameter packets, each parameter packet comprising the subframe class parameters of at least one subframe corresponding to that packet, where n ≥ 2 and n is a positive integer; and combining the whole-frame class parameters with each of the n parameter packets to generate n description code streams corresponding to the n parameter packets. By this method a single-description encoding device is converted into a multiple-description encoding device, so that the encoding mode changes from single-description coding to multiple-description coding, which improves resistance to packet loss while improving the encoding effect on the audio signal.

Description

Audio encoding method, decoding method, device, equipment, storage medium and product
Technical Field
The embodiments of the application relate to the technical field of network communication, and in particular to an audio encoding method, an audio decoding method, and a corresponding apparatus, device, storage medium and product.
Background
With the development of communication technology, audio encoding and decoding schemes have become an important technical means of audio communication. In practice, however, factors such as network congestion, channel interference and noise make packet loss unavoidable in real-time audio communication, which greatly degrades call quality.
In the related art, the quality degradation caused by packet loss during transmission of encoded audio is mitigated by Multiple Description Coding (MDC). Multiple description coding decomposes the original signal into several sub-signals using different decomposition methods, and each sub-signal is then encoded as a separate description.
However, existing multiple description coding operates at the signal-source level: when such an encoder is deployed, the existing single-description encoder must be replaced entirely by a multiple-description encoder. This affects coding compression efficiency, speech quality, bandwidth occupation and so on, and thus degrades the encoding effect on the audio signal.
Disclosure of Invention
The embodiments of the application provide an audio encoding method, a decoding method, an apparatus, a device, a storage medium and a product, which can change the encoding mode of an encoding device from single-description coding to multiple-description coding, improve the packet-loss resistance of the audio codec system, and improve the encoding effect on audio signals. The technical scheme is as follows:
In one aspect, an audio encoding method is provided, the method comprising:
acquiring a first audio signal frame;
encoding the first audio signal frame to obtain encoding parameters of the first audio signal frame, the encoding parameters comprising whole-frame class parameters and subframe class parameters, where a whole-frame class parameter is an encoding parameter shared by the at least two subframes obtained by encoding the first audio signal frame, and the subframe class parameters are the encoding parameters corresponding to each of the at least two subframes respectively;
grouping the subframe class parameters to obtain n parameter packets, each parameter packet comprising the subframe class parameters of at least one subframe corresponding to that packet, where n ≥ 2 and n is a positive integer;
and combining the whole-frame class parameters with each of the n parameter packets to generate n description code streams corresponding to the n parameter packets.
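The encoding steps above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the parameter names and the round-robin grouping rule are assumptions for demonstration.

```python
# Sketch of the claimed encoding flow: subframe class parameters are
# grouped into n parameter packets, and the whole-frame class parameters
# are copied into every description code stream.

def make_description_streams(whole_frame_params, subframe_params, n):
    """whole_frame_params: dict of parameters shared by all subframes.
    subframe_params: list with one dict of parameters per subframe.
    Returns n description code streams (one per parameter packet)."""
    if n < 2 or n > len(subframe_params):
        raise ValueError("need 2 <= n <= number of subframes")
    # Group the subframe class parameters into n parameter packets
    # (here: subframe k goes to packet k % n -- an assumed rule).
    packets = [[] for _ in range(n)]
    for k, params in enumerate(subframe_params):
        packets[k % n].append((k, params))
    # Combine a copy of the whole-frame class parameters with each packet.
    return [{"whole_frame": dict(whole_frame_params), "packet": p}
            for p in packets]

streams = make_description_streams(
    {"lsp": [18, 7, 3, 2]},                  # whole-frame: LSP indices
    [{"acb_delay": 40}, {"acb_delay": 41}],  # per-subframe parameters
    n=2)
```

Losing any single stream here still leaves the receiver with the whole-frame parameters plus the subframe parameters of the surviving packet, which is exactly what makes the later parameter prediction feasible.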
In another aspect, an audio decoding method is provided, the method comprising:
receiving a valid code stream, the valid code stream being the successfully received code streams among n description code streams, where the n description code streams are obtained by grouping the encoding parameters of a first audio signal frame; the grouping refers to grouping the subframe class parameters among the encoding parameters into n parameter packets and then adding the whole-frame class parameters among the encoding parameters to each of the n parameter packets; a whole-frame class parameter is an encoding parameter shared by the at least two subframes obtained by encoding the first audio signal frame, and the subframe class parameters are the encoding parameters corresponding to each of the at least two subframes respectively;
processing the valid code stream to obtain decoding parameter values corresponding to the valid code stream;
and obtaining a second audio signal frame based on the decoding parameter values corresponding to the valid code stream.
In another aspect, an audio encoding apparatus is provided, the apparatus comprising:
a first acquisition module, configured to acquire a first audio signal frame;
an encoding module, configured to encode the first audio signal frame to obtain encoding parameters of the first audio signal frame, the encoding parameters comprising whole-frame class parameters and subframe class parameters; a whole-frame class parameter is an encoding parameter shared by the at least two subframes obtained by encoding the first audio signal frame, and the subframe class parameters are the encoding parameters corresponding to each of the at least two subframes respectively;
a grouping module, configured to group the subframe class parameters to obtain n parameter packets, each parameter packet comprising the subframe class parameters of at least one subframe corresponding to that packet, where n ≥ 2 and n is a positive integer;
and a combination module, configured to combine the whole-frame class parameters with each of the n parameter packets to generate n description code streams corresponding to the n parameter packets.
In one possible implementation, the grouping module includes:
a number determination sub-module, configured to determine the number of parameter packets based on the number of the at least two subframes, the upper limit on the number of parameter packets being equal to the number of the at least two subframes;
and a grouping sub-module, configured to divide the parameter information corresponding to each of the at least two subframes into n groups based on the number of parameter packets, to obtain the n parameter packets.
In one possible implementation, in response to the number of parameter packets being equal to the number of subframes, the grouping sub-module is configured to divide the parameter information corresponding to each of the at least two subframes into n groups in units of a single subframe, to obtain the n parameter packets.
In one possible implementation, in response to the number of parameter packets being smaller than the number of the at least two subframes, and the number m of the at least two subframes being greater than 2, the n parameter packets include at least one target parameter packet, the number of subframes corresponding to the target parameter packet being i, where 2 ≤ i < m, and i and m are both positive integers.
In another aspect, an audio decoding apparatus is provided, the apparatus comprising:
a valid code stream receiving module, configured to receive a valid code stream, the valid code stream being the successfully received code streams among n description code streams; the n description code streams are obtained by grouping the encoding parameters of a first audio signal frame; the grouping refers to grouping the subframe class parameters among the encoding parameters into n parameter packets and then adding the whole-frame class parameters among the encoding parameters to each of the n parameter packets; a whole-frame class parameter is an encoding parameter shared by the at least two subframes obtained by encoding the first audio signal frame, and the subframe class parameters are the encoding parameters corresponding to each of the at least two subframes respectively;
a code stream processing module, configured to process the valid code stream to obtain decoding parameter values corresponding to the valid code stream;
and a second acquisition module, configured to obtain a second audio signal frame based on the decoding parameter values corresponding to the valid code stream.
In one possible implementation, in response to the valid code stream being all of the n description code streams, the second acquisition module includes:
a first reassembly sub-module, configured to reassemble the decoding parameter values corresponding to each description code stream in subframe order to obtain first decoding parameters;
and a first decoding sub-module, configured to decode the first decoding parameters to obtain the second audio signal frame.
In one possible implementation, in response to the valid code stream being a subset of the n description code streams, the second acquisition module includes:
a parameter prediction sub-module, configured to perform parameter prediction on the failed code streams to obtain predicted parameter values corresponding to the failed code streams, a failed code stream being a code stream among the n description code streams that was not successfully received;
a second reassembly sub-module, configured to reassemble, in subframe order, the decoding parameter values corresponding to the valid code stream and the predicted parameter values corresponding to the failed code streams, to obtain second decoding parameters;
and a second decoding sub-module, configured to decode the second decoding parameters to obtain the second audio signal frame.
In one possible implementation, the parameter prediction sub-module includes:
an information acquisition unit, configured to acquire historical frame parameter information, the historical frame parameter information comprising historical parameter values over a target number of frames corresponding to the failed code stream;
and a parameter prediction unit, configured to perform parameter prediction based on the historical frame parameter information, to obtain the predicted parameter values corresponding to the failed code stream.
In one possible implementation, the parameter prediction unit is configured to perform parameter prediction on the subframe class parameters in the failed code stream based on the historical frame parameter information, to obtain predicted parameter values corresponding to the subframe class parameters in the failed code stream.
In one possible implementation, the parameter prediction unit is configured to input historical parameter information corresponding to a first target parameter into a first prediction network, and obtain the predicted parameter value corresponding to the first target parameter output by the first prediction network, the first target parameter belonging to the subframe class parameters;
the first prediction network is trained based on historical sample parameter values of the first target parameter and historical parameter labels of the first target parameter.
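As a concrete (hypothetical) illustration of history-based parameter prediction, the sketch below substitutes a simple weighted average over the most recent frames for the learned prediction network described above; the window length and weights are assumptions, chosen only to show the mechanism.

```python
# Hypothetical stand-in for the prediction network: predict a lost
# subframe class parameter from its values in recent frames, weighting
# newer frames more heavily.

def predict_parameter(history, weights=(0.5, 0.3, 0.2)):
    """history: parameter values of the most recent frames, newest first.
    Returns a predicted value for the lost frame's parameter."""
    if not history:
        raise ValueError("no historical frame parameter information")
    used = history[:len(weights)]        # target number of frames
    w = weights[:len(used)]
    # Normalized weighted average of the historical parameter values.
    return sum(v * wi for v, wi in zip(used, w)) / sum(w)

# e.g. an adaptive codebook delay observed over the last three frames:
pred = predict_parameter([40, 42, 44])
```

A trained network would replace this hand-set weighting with weights learned from historical sample parameter values and their labels, as the text describes.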
In another aspect, a computer device is provided, comprising a processor and a memory, the memory storing at least one computer program that is loaded and executed by the processor to implement the audio encoding method or the audio decoding method described above.
In another aspect, a computer-readable storage medium is provided, in which at least one computer program is stored, the computer program being loaded and executed by a processor to implement the audio encoding method or the audio decoding method described above.
In another aspect, a computer program product is provided, comprising at least one computer program that is loaded and executed by a processor to implement the audio encoding method or the audio decoding method provided in the various optional implementations described above.
The technical scheme provided by the application can bring the following beneficial effects:
The decoding device performs code stream analysis on the received valid code stream to obtain an analysis result, and obtains a second audio signal frame based on that result, where the valid code stream is all or part of the n description code streams received by the decoding device. When the n description code streams are generated, the encoding parameters of the first audio signal frame are divided into whole-frame class parameters and subframe class parameters; the subframe class parameters are distributed among different description code streams, and a copy of the whole-frame class parameters is added to each description code stream. In this way a single-description encoding device becomes a multiple-description encoding device, the encoding mode changes from single-description coding to multiple-description coding, and resistance to packet loss is improved. Moreover, because the method adds an adaptation layer on top of the original encoding device rather than replacing it globally, the impact on coding compression efficiency, bandwidth occupation, speech quality and so on that replacing the encoding device would cause is avoided, and the encoding effect on the audio signal is improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
Fig. 1 shows a schematic diagram of an audio codec system according to an exemplary embodiment of the application;
Fig. 2 shows a flowchart of an audio encoding method according to an exemplary embodiment of the application;
Fig. 3 shows a flowchart of an audio decoding method according to an exemplary embodiment of the application;
Fig. 4 shows a flowchart of an audio codec method according to an exemplary embodiment of the application;
Fig. 5 shows a schematic diagram of a decoding process according to an exemplary embodiment of the application;
Fig. 6 shows a schematic diagram of a parameter prediction process according to an exemplary embodiment of the application;
Fig. 7 shows a schematic diagram of a decoding process according to an exemplary embodiment of the application;
Fig. 8 shows a schematic diagram of an audio codec method according to an exemplary embodiment of the application;
Fig. 9 shows a block diagram of an audio encoding apparatus according to an exemplary embodiment of the application;
Fig. 10 shows a block diagram of an audio decoding apparatus according to an exemplary embodiment of the application;
Fig. 11 shows a block diagram of a computer device according to an exemplary embodiment.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the application; rather, they are merely examples of apparatus and methods consistent with aspects of the application as detailed in the appended claims.
Fig. 1 shows a schematic diagram of an audio codec system according to an exemplary embodiment of the present application, and as shown in fig. 1, the audio codec system 100 may include an encoding device 110 and a decoding device 120.
The encoding device 110 may include an encoder for encoding acquired audio signal frames to obtain corresponding encoding results; the encoder may be implemented as an application or a hardware component in the encoding device 110, or the encoding device 110 itself may be implemented as an encoder. Encoding refers to recording sampled and quantized digital data in a certain format. In an embodiment of the application, the encoding device may be built from a single-description coding model, which may include a CELP (Code-Excited Linear Prediction) model and CELP variant models; the CELP variants may include an AMR (Adaptive Multi-Rate) model, G.729, an EVRC (Enhanced Variable Rate Codec) model, a Speex model, and the like.
The decoding device 120 may include a decoder for decoding the received description code streams to obtain corresponding decoding results, i.e. the decoded audio signal frames; the decoder may be implemented as an application or a hardware component in the decoding device 120, or the decoding device 120 itself may be implemented as a decoder.
In the above process, because frames may be lost during data transmission, the description code streams output by the encoding device may not all be received by the decoding device; consequently, the decoded audio signal frame obtained by the decoding device may differ from the audio signal frame input to the encoding device.
The description code streams are transmitted between the encoding device 110 and the decoding device 120 via a transmission network, which may be implemented as a wired or wireless network.
Optionally, the wireless or wired network described above uses standard communication techniques and/or protocols. The network is typically the Internet, but may be any network, including but not limited to a local area network (LAN), metropolitan area network (MAN), wide area network (WAN), a mobile, wired or wireless network, a private network, or any combination of virtual private networks. In some embodiments, data exchanged over the network is represented using techniques and/or formats such as HyperText Markup Language (HTML) and Extensible Markup Language (XML). All or some of the links may also be encrypted using conventional encryption techniques such as Secure Sockets Layer (SSL), Transport Layer Security (TLS), Virtual Private Network (VPN) and Internet Protocol Security (IPsec). In other embodiments, custom and/or dedicated data communication techniques may be used in place of, or in addition to, those described above.
The following embodiments explain the audio encoding and decoding method provided by the application through the operations performed on the encoding device side and on the decoding device side respectively.
Fig. 2 shows a flow chart of an audio encoding method according to an exemplary embodiment of the present application, which may be performed by a computer device, which may be implemented as an encoding device in the audio codec system shown in fig. 1, and which may include the steps of:
step 210, a first audio signal frame is acquired.
The encoding device, which may be built based on a single-description coding model, acquires the first audio signal frame.
The first audio signal frame may be an audio signal frame in an audio transmission scene. Illustratively, it may be a frame transmitted during audio communication, such as a voice call through an instant messaging application, or a frame transmitted during audio sharing; the application does not limit the source of the first audio signal frame.
Step 220: encode the first audio signal frame to obtain encoding parameters of the first audio signal frame, the encoding parameters including whole-frame class parameters and subframe class parameters.
A whole-frame class parameter is an encoding parameter shared by the at least two subframes obtained by encoding the first audio signal frame; the subframe class parameters are the encoding parameters corresponding to each of the at least two subframes respectively.
When encoding the first audio signal frame, the encoding device divides the frame in the time domain, so that one audio signal frame is encoded as several subframes arranged in time order. A whole-frame class parameter influences every subframe of the first audio signal frame; illustratively, the G.729 model takes 10 ms as a frame and encodes part of the parameters over 2 subframes of 5 ms each. The subframe class parameters correspond to the individual subframes of the first audio signal frame; that is, during encoding, parameters that must be encoded per subframe are subframe class parameters, and parameters that need not be divided into subframes are whole-frame class parameters. Each subframe may carry several bits of a subframe class parameter, and the number of bits carried may differ between subframes. Taking the encoding result of the G.729 model as an example, the parameters output by the model include the line spectrum pairs (Linear Spectrum Pair, LSP), the adaptive codebook delay, the pitch delay parity parameter, the fixed codebook index, the fixed codebook sign, codebook gain 1 and codebook gain 2.
Table 1

| Parameter | Coding field | Subframe 1 bits | Subframe 2 bits | Total bits |
|---|---|---|---|---|
| Line spectrum pairs (LSP) | L0/L1/L2/L3 | — | — | 18 |
| Adaptive codebook delay | P1/P2 | 8 | 5 | 13 |
| Pitch delay parity | P0 | 1 | — | 1 |
| Fixed codebook index | C1/C2 | 13 | 13 | 26 |
| Fixed codebook sign | S1/S2 | 4 | 4 | 8 |
| Codebook gain 1 | GA1/GA2 | 3 | 3 | 6 |
| Codebook gain 2 | GB1/GB2 | 4 | 4 | 8 |
| Total | | | | 80 |
Table 1 shows the G.729 model's encoding result for an audio signal frame. As shown in Table 1, the G.729 model divides part of the parameters into 2 subframes for encoding; that is, the subframe class parameters include the adaptive codebook delay, the pitch delay parity parameter, the fixed codebook index, the fixed codebook sign, codebook gain 1 and codebook gain 2. The number of bits a parameter carries may be the same in each subframe (e.g. the fixed codebook index, fixed codebook sign, codebook gain 1 and codebook gain 2) or different (e.g. the adaptive codebook delay and the pitch delay parity parameter). Parameters that do not need to be encoded per subframe are whole-frame class parameters, such as the line spectrum pairs (LSP).
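The split in Table 1 can be represented programmatically. The sketch below encodes the bit allocation from the table and derives the whole-frame/subframe split from it; the field names are illustrative, not taken from any codec API.

```python
# Bit allocation of Table 1 (G.729): per-parameter bits in subframe 1,
# subframe 2, and a whole-frame portion. None means "not present".
G729_BITS = {
    # parameter:           (subframe 1, subframe 2, whole frame)
    "lsp":                 (None, None, 18),  # L0/L1/L2/L3
    "adaptive_cb_delay":   (8, 5, None),      # P1/P2
    "pitch_delay_parity":  (1, None, None),   # P0
    "fixed_cb_index":      (13, 13, None),    # C1/C2
    "fixed_cb_sign":       (4, 4, None),      # S1/S2
    "cb_gain_1":           (3, 3, None),      # GA1/GA2
    "cb_gain_2":           (4, 4, None),      # GB1/GB2
}

def whole_frame_class():
    """Parameters not split per subframe (whole-frame class)."""
    return [p for p, (s1, s2, wf) in G729_BITS.items() if wf is not None]

def total_bits():
    """Total bits per encoded frame, summed over all fields."""
    return sum((s1 or 0) + (s2 or 0) + (wf or 0)
               for s1, s2, wf in G729_BITS.values())
```

Summing the allocation reproduces the 80-bit total of Table 1, with only the LSP field falling into the whole-frame class.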
Step 230: group the subframe class parameters to obtain n parameter packets, each parameter packet comprising the subframe class parameters of at least one subframe corresponding to that packet, where n ≥ 2 and n is a positive integer.
In one possible implementation, the grouping rule used to obtain the n parameter packets may be preset by a user, or may be determined by the encoding device based on the subframe class parameters. Illustratively, if the user specifies that each subframe generates one description code stream, the encoding device places the subframe class parameters of each subframe into its own parameter packet, and the number of parameter packets equals the number of the at least two subframes. When the number of the at least two subframes is greater than 2, if the user specifies that every two subframes generate one description code stream, the encoding device treats every two subframes as one parameter packet, and the number of parameter packets is smaller than the number of the at least two subframes.
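The two grouping rules just described — one description per subframe versus one description per pair of subframes — can be sketched as follows; the list-based representation of the subframe parameters is an assumption for illustration.

```python
# Two possible grouping rules for subframe class parameters.

def group_per_subframe(subframe_params):
    """n equals the number of subframes: one packet per subframe."""
    return [[p] for p in subframe_params]

def group_in_pairs(subframe_params):
    """n is smaller than the subframe count: every two subframes form
    one parameter packet (a trailing odd subframe gets its own packet)."""
    return [subframe_params[i:i + 2]
            for i in range(0, len(subframe_params), 2)]

four = ["sf0", "sf1", "sf2", "sf3"]  # parameters of four subframes
```

With four subframes, the first rule yields four parameter packets (hence four description code streams), the second yields two.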
Step 240: combine the whole-frame class parameters with each of the n parameter packets to generate n description code streams corresponding to the n parameter packets.
That is, among the n description code streams generated by the encoding device, every description code stream carries the same whole-frame class parameters, each description code stream carries part of the information of each subframe class parameter, and the subframe information carried differs from one description code stream to another.
Optionally, after generating the n description code streams, the encoding device may transmit them to the decoding device through the transmission network.
In summary, in the audio encoding method provided by the embodiments of the application, the encoding parameters of the first audio signal frame are divided into whole-frame class parameters and subframe class parameters; the subframe class parameters are distributed among different description code streams, and a copy of the whole-frame class parameters is added to each description code stream, generating two or more description code streams. A single-description encoding device thus becomes a multiple-description encoding device, the encoding mode changes from single-description coding to multiple-description coding, and resistance to packet loss is improved.
At the same time, because the method adds an adaptation layer on top of the original encoding device, the original encoding device does not need to be replaced globally; the impact on coding compression efficiency, bandwidth occupation, speech quality and so on that replacing the encoding device would cause is avoided, and the encoding effect on the audio signal is improved.
Furthermore, because no global replacement of the original encoding device is required, the portability of the audio codec scheme is improved.
Fig. 3 shows a flow chart of an audio decoding method according to an exemplary embodiment of the present application, which may be performed by a computer device, which may be implemented as a decoding device in the audio codec system shown in fig. 1, and which may include the steps of:
In step 310, a valid code stream is received, the valid code stream being the successfully received code streams among the n description code streams.
The n description code streams are obtained by grouping the encoding parameters of the first audio signal frame; the grouping refers to grouping the subframe class parameters among the encoding parameters into n parameter packets and then adding the whole-frame class parameters among the encoding parameters to each of the n parameter packets. A whole-frame class parameter is an encoding parameter shared by the at least two subframes obtained by encoding the first audio signal frame; the subframe class parameters are the encoding parameters corresponding to each of the at least two subframes respectively.
The decoding device receives the n description code streams sent by the encoding device through the transmission network. Because packets may be lost in transit, the number of description code streams successfully received by the decoding device may be smaller than the number sent; the valid code stream is therefore all or part of the n description code streams. When no packet loss occurs during transmission, the valid code stream is all of the n description code streams; when packet loss does occur, the valid code stream is a subset of them.
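On the decoding side, the distinction between valid and failed code streams can be sketched like this; the integer stream indices and the dict representation of what was received are illustrative assumptions.

```python
# Classify the n description code streams into valid (received) and
# failed (lost) streams, as the decoding device must do before choosing
# between plain reassembly and parameter prediction.

def classify_streams(n, received):
    """received: dict mapping stream index -> code stream payload.
    Returns (valid indices, failed indices)."""
    valid = sorted(i for i in received if 0 <= i < n)
    failed = [i for i in range(n) if i not in received]
    return valid, failed

# Three descriptions were sent; stream 1 was lost in transit.
valid, failed = classify_streams(3, {0: b"...", 2: b"..."})
```

An empty `failed` list corresponds to the all-streams-received case (step 320 proceeds directly); a non-empty one triggers the parameter-prediction path described later.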
Step 320: process the valid code stream to obtain the decoding parameter values corresponding to the valid code stream.
A decoding parameter value is a parameter value with actual physical meaning; that is, decoding the decoding parameter values yields the corresponding audio signal.
Step 330: obtain a second audio signal frame based on the decoding parameter values corresponding to the valid code stream.
In summary, in the audio decoding method provided by the embodiment of the present application, the code stream analysis is performed on the received effective code stream to obtain the code stream analysis result corresponding to the effective code stream, so as to obtain the second audio signal frame based on the code stream analysis result, where the effective code stream is all the code streams and part of the code streams received by the decoding device in the n description coding code streams, when the n description coding code streams are generated, the coding parameters of the first audio signal frame are divided into the whole frame type parameters and the frame type parameters, the frame type parameters are allocated to different description coding code streams, and the whole frame type parameters are copied and added to each description coding code stream to generate the n description coding code streams, so that the single description coding device is changed into the multiple description coding device, and the anti-packet loss performance is improved.
Meanwhile, the method provided by the application adds an adaptation on top of the original coding device, avoiding the impact on coding compression efficiency, bandwidth usage, speech quality and so on that replacing the coding device would cause, and thereby improving the coding effect on the audio signal.
Furthermore, the method provided by the application does not need to globally replace the original coding equipment, so that the portability of the audio coding and decoding scheme is improved.
Fig. 4 shows a flowchart of an audio codec method according to an exemplary embodiment of the present application. The method may be performed interactively by an encoding device and a decoding device, which may be implemented as the encoding device and decoding device in the audio codec system shown in Fig. 1. As shown in Fig. 4, the audio codec method may include the following steps:
in step 401, the encoding device acquires a first audio signal frame.
Step 402: the encoding device encodes the first audio signal frame to obtain the coding parameters of the first audio signal frame, where the coding parameters include whole-frame parameters and subframe parameters. A whole-frame parameter is a coding parameter shared by the at least two subframes obtained by encoding the first audio signal frame; a subframe parameter is a coding parameter that takes a separate value for each of the at least two subframes.
In the embodiment of the application, encoding devices built from different single-description coding models encode differently; optionally, the way subframes are divided and the number of subframes obtained also differ between such devices. Assuming a whole frame of 20 ms, an encoding device built from a first single-description coding model that uses 10 ms subframes obtains 2 subframes, while an encoding device built from a second single-description coding model that uses 5 ms subframes obtains 4 subframes.
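The subframe arithmetic above reduces to a simple division (a trivial sketch, assuming equal-length subframes):

```python
def subframe_count(frame_ms: int, subframe_ms: int) -> int:
    """Number of subframes obtained when a whole frame is split into
    equal-length subframes."""
    if frame_ms % subframe_ms != 0:
        raise ValueError("subframe length must evenly divide the frame length")
    return frame_ms // subframe_ms

# 20 ms frame: 10 ms subframes -> 2 subframes; 5 ms subframes -> 4 subframes
print(subframe_count(20, 10), subframe_count(20, 5))
```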
Step 403: the encoding device groups the subframe parameters to obtain n parameter packets, where each parameter packet contains the subframe parameters of at least one subframe corresponding to that packet, n ≥ 2, and n is a positive integer.
In one possible implementation, the rule for grouping the subframe parameters into n parameter packets may be preset by the user, and the encoding device and decoding device encode or decode according to that rule; alternatively, the grouping may be determined by the encoding device based on the subframe parameters themselves.
When the grouping is determined by the encoding device based on the subframe parameters, the process includes:
the encoding device determines the number of parameter packets based on the number of the at least two subframes, where the upper limit on the number of parameter packets equals the number of subframes;
and divides the parameter information of the at least two subframes into n groups based on that number, obtaining the n parameter packets.
Here the parameter information of a subframe consists of the values that each subframe parameter takes for that subframe.
Once the encoding device has determined the maximum number of parameter packets, the at least two subframes may be grouped within that limit.
When the number of parameter packets equals the number of subframes, the parameter information of the subframes may be divided into n packets with a single subframe per packet; that is, each subframe maps to exactly one parameter packet.
Alternatively, several subframes may be placed in one parameter packet: when the number of parameter packets is smaller than the number of subframes and the number of subframes m is greater than 2, the n parameter packets include at least one target parameter packet whose number of subframes is i, with 2 ≤ i < m, where i and m are positive integers. In other words, the number of subframes per parameter packet may be the same or different across packets.
Because an audio signal has timing and inter-frame correlation, several consecutive subframes may be placed in one group in subframe order. For example, with 4 subframes, the first two subframes may form one group and the last two another; or the first three subframes may form one group and the last subframe another. Alternatively, the encoding device may group randomly: the first and third subframes in one group and the second and fourth in another, or the first and fourth in one group and the second and third in another.
Since the grouping described above is selective and may be random, the encoding device may, when the parameter packets are determined from the number of subframes, attach a subframe identifier to each subframe during grouping so that the decoding device can recover the audio signal with the correct timing. The subframe identifier describes the position of the subframe in the time sequence; when the description code streams generated from the parameter packets are sent to the decoding device, the corresponding subframe identifiers are sent along with them, allowing the decoding device to place the decoded information at the right timing position. Illustratively, with 4 subframes, suppose the encoding device puts the subframe parameters of the first and third subframes into one parameter packet and those of the second and fourth subframes into another, and generates a description code stream for each packet. So that the decoding device can determine the order of the subframe parameters in each packet on reception, a subframe identifier is set for each subframe during grouping: for example, a first-subframe identifier is attached to the parameter set of the first subframe to indicate that those parameters belong to the first subframe, and corresponding identifiers are attached to the parameter sets of the second, third and fourth subframes. This ensures that after receiving the audio code stream the decoding device can decode it accurately and recover its timing.
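The grouping with subframe identifiers can be sketched as follows (a minimal illustration; the dict-based parameter representation and the name `pitch` are assumptions, not the patent's actual bitstream format):

```python
def group_subframe_params(subframe_params, groups):
    """Split per-subframe parameters into parameter packets.

    subframe_params: list where index i holds the parameter dict of subframe i.
    groups: list of index tuples, e.g. [(0, 2), (1, 3)] for the
            "first+third / second+fourth" grouping described above.
    Each entry keeps its subframe index (the subframe identifier), so the
    decoder can restore the original timing regardless of the grouping.
    """
    return [[(i, subframe_params[i]) for i in idx_tuple] for idx_tuple in groups]

# 4 subframes with illustrative per-subframe values
params = [{"pitch": p} for p in (40, 41, 42, 43)]
pkts = group_subframe_params(params, [(0, 2), (1, 3)])
```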
In step 404, the encoding device combines the whole frame class parameters with the n parameter packets respectively, to generate n description code streams corresponding to the n parameter packets.
Taking the parameters output by the G.729 model shown in Table 1 as an example: since G.729 divides each frame into 2 subframes, two description code streams can be generated from the coding parameters produced by the G.729 model. The first description code stream includes the line spectrum pair (LSP), the first-subframe adaptive codebook delay, the periodic delay parity parameter, the first-subframe fixed codebook index, the first-subframe fixed codebook sign, the first-subframe codebook gain 1 and the first-subframe codebook gain 2. The second description code stream includes the LSP, the second-subframe adaptive codebook delay, the second-subframe fixed codebook index, the second-subframe codebook gain 1 and the second-subframe codebook gain 2. The types of subframe parameters in the two description code streams may differ, and so may the parameter information they carry.
The length of each description code stream is smaller than the original code stream length: if the original code stream is 80 bits, then after the division above the first description code stream is 51 bits and the second is 47 bits. By sending the coding parameters in several description code streams, a certain number of subframes can still be decoded even when packets are lost, which reduces the risk of losing all coding parameters and improves the packet-loss resistance of the audio codec system.
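Generating the n description code streams by copying the whole-frame parameters into each one can be sketched as follows (the parameter names `lsp` and `pitch_delay` and the dict layout are illustrative assumptions, not the actual G.729 bit packing):

```python
def build_description_streams(whole_frame_params, parameter_packets):
    """Copy the whole-frame parameters into every description stream and
    attach one parameter packet to each, yielding one stream per packet."""
    return [
        {"whole_frame": dict(whole_frame_params), "packet": pkt}
        for pkt in parameter_packets
    ]

# Hypothetical G.729-flavoured values: the LSP is a whole-frame parameter,
# the adaptive codebook (pitch) delay is a per-subframe parameter.
streams = build_description_streams(
    {"lsp": [1.2, 0.8]},
    [[(0, {"pitch_delay": 40})], [(1, {"pitch_delay": 42})]],
)
```

Every stream carries its own copy of the whole-frame parameters, so the decoder can recover them from any single successfully received stream.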
Step 405: the encoding device sends the n description code streams to the decoding device; accordingly, the decoding device receives the effective code stream, which consists of the successfully received code streams among the n description code streams.
The description code streams are transmitted between the encoding device and the decoding device over a transmission network. When the effective code stream is all of the n description code streams, no packet loss has occurred; when the effective code stream is only part of the n description code streams, packet loss has occurred.
In step 406, the decoding device performs code stream processing on the effective code stream to obtain a decoding parameter value corresponding to the effective code stream.
The code stream processing of the effective code stream may include:
performing code stream analysis on the effective code stream to obtain the parameter index values corresponding to it;
and, based on the parameter index values, obtaining the corresponding decoding parameter values by table lookup or by calculation.
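The index-to-value step can be illustrated with a toy dequantisation table (the table values are invented for illustration, not taken from any real codec):

```python
# Hypothetical dequantisation table: parameter index -> decoding parameter
# value with physical meaning (e.g. a codebook gain).
GAIN_TABLE = {0: 0.0, 1: 0.5, 2: 1.0, 3: 1.6}

def decode_parameter(index, table=GAIN_TABLE):
    """Map a parsed parameter index to its decoding parameter value
    by table lookup."""
    return table[index]
```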
In step 407, in response to the effective code stream being all of the n description code streams, the decoding device reassembles the decoding parameter values of the description code streams in subframe order to obtain the first decoding parameters.
When the effective code stream is all of the n description code streams, each description code stream is processed to obtain its parameter index values, and the decoding parameter values (the multiple-description parameters) of each description code stream are then obtained from those index values. After the decoding parameter values of all description code streams have been obtained, the multiple-description parameters are reassembled according to the subframe timing, yielding one frame of first decoding parameters corresponding to the coding parameters.
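The reassembly by subframe order can be sketched as follows (a minimal illustration; the `(subframe_index, value)` pair representation is an assumption, not the patent's actual format):

```python
def reassemble(decoded_packets):
    """Merge decoded parameter packets back into subframe order.

    decoded_packets: one list of (subframe_index, value) pairs per
    successfully received description stream.
    Returns the values of one whole frame sorted by subframe index.
    """
    flat = [pair for pkt in decoded_packets for pair in pkt]
    return [value for _, value in sorted(flat, key=lambda p: p[0])]

# Streams carrying subframes (0, 2) and (1, 3) interleave back correctly.
frame = reassemble([[(0, "a"), (2, "c")], [(1, "b"), (3, "d")]])
```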
In step 408, the decoding device decodes the first decoding parameter to obtain a second audio signal frame.
Fig. 5 is a schematic diagram of a decoding process according to an exemplary embodiment of the present application. As shown in Fig. 5, when the effective code stream is all of the n description code streams, the decoding device, after obtaining the effective code stream 510, analyzes each code stream in it to obtain the corresponding parameter index values, obtains the decoding parameter values of each code stream by table lookup or calculation, reassembles those values in subframe order into the first decoding parameters 520, and obtains the final audio signal frame, i.e., the second audio signal frame, after parameter decoding.
In this case, the second audio signal frame is identical to the first audio signal frame, or the difference between the two is small.
Step 409: in response to the effective code stream being only part of the n description code streams, the decoding device performs parameter prediction for the failed code streams to obtain the predicted parameter values corresponding to them.
A failed code stream is a code stream among the n description code streams that was not successfully received.
The decoding device may obtain the predicted parameter values for a failed code stream as follows: acquire historical frame parameter information, which includes the historical parameter values of a target number of frames corresponding to the failed code stream; then perform parameter prediction based on that historical frame parameter information to obtain the predicted parameter values for the failed code stream.
In one possible implementation, the historical frame parameter information includes historical parameter values of a target frame number received before the audio signal frame of the current frame, adjacent to the audio signal frame of the current frame.
The historical parameter value refers to a decoding parameter value obtained after code stream processing is performed on the received code stream in the audio signal frame obtaining process of the historical frame.
In the embodiment of the present application, since every description code stream contains the whole-frame parameters, the decoding device can recover the whole-frame parameters from any one description code stream it receives. Parameter prediction therefore only needs to be performed for the subframe parameters; that is, the prediction above can be implemented as predicting the subframe parameters in the failed code stream from the historical frame parameter information, obtaining the predicted parameter information for those subframe parameters.
To improve the accuracy of predicting the parameter information of a failed code stream, the embodiment of the present application may combine parameter prediction with deep learning. Because different kinds of subframe parameters follow different patterns, a separate parameter prediction network may be designed and trained for each parameter type and used to predict each type of subframe parameter that was not successfully received. Illustratively, for the subframe parameters shown in Table 1, prediction networks may be trained for the adaptive codebook delay, the periodic delay parity, the fixed codebook index, the fixed codebook sign, codebook gain 1 and codebook gain 2; feeding each network the historical parameter information for its parameter yields the predicted value of that parameter, thereby recovering the failed code stream. Taking one of the subframe parameters as an example, the process may be implemented as follows:
Inputting the historical parameter information corresponding to the first target parameter into a first prediction network to obtain a predicted parameter value corresponding to the first target parameter output by the first prediction network; the first target parameter belongs to a framing class parameter;
the first predictive network is trained based on historical sample parameter values of the first target parameter and actual parameter tags of the first target parameter.
The parameter prediction networks (including the first prediction network) may be obtained by training an LSTM (Long Short-Term Memory) network, a CNN (Convolutional Neural Network), an RNN (Recurrent Neural Network) or a GRU (Gated Recurrent Unit) network, among others; the GRU network is a variant of the LSTM network. The prediction networks for different kinds of subframe parameters may be trained on the same or different neural network types, and even when the network type is the same, their internal layer structures may be the same or different; the application is not limited in this respect.
Taking a first prediction network trained from a GRU network as an example, Fig. 6 is a schematic diagram of a parameter prediction process according to an exemplary embodiment of the present application. As shown in Fig. 6, the historical parameter values of the target number of frames for the first target parameter, for example the adaptive codebook delay of the previous N frames, are input into the first prediction network 610, which may be a GRU model composed of a fully connected layer, gate layers and a connection layer. Processing by the first prediction network yields the predicted value of the first target parameter (the adaptive codebook delay) in the failed code stream of the current frame. Note that the architecture shown in Fig. 6 is schematic; the application does not limit the composition of the first prediction network.
When the first prediction network is trained, a historical sample parameter value of the first target parameter is input into the network to obtain the predicted historical parameter it outputs; the network is then trained against the historical parameter label of the first target parameter. This process is repeated over different historical sample parameter values, iteratively training the first prediction network until the training completion condition is met, yielding the trained first prediction network, which is then used to predict the first target parameter in the failed code stream.
The historical sample parameter values of the first target parameter may be the historical parameter values of N consecutive frames, and the historical parameter label may be the value of the (N+1)-th frame following them. For example, when the sample is the adaptive codebook delay of 8 consecutive frames, the label is the adaptive codebook delay of the 9th frame.
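The construction of (sample, label) pairs described above can be sketched as a sliding window (a generic illustration, not the application's actual training pipeline):

```python
def make_training_pairs(history, window=8):
    """Build (sample, label) training pairs: each sample is `window`
    consecutive historical parameter values, and the label is the
    parameter value of the frame immediately after that window."""
    return [
        (history[i:i + window], history[i + window])
        for i in range(len(history) - window)
    ]

# 10 historical values with window 8 yield two pairs:
# frames 1-8 predict frame 9, frames 2-9 predict frame 10.
pairs = make_training_pairs(list(range(10)), window=8)
```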
In the process above, the predicted historical parameter may be compared with the historical parameter label of the first target parameter to compute the value of a loss function, and the first prediction network is trained based on that value. The loss function may be a cross-entropy function.
In one possible implementation, a dedicated prediction method may be chosen based on the numerical characteristics of certain subframe parameters. Schematically: the historical frame parameter information of a second target parameter is processed by a target prediction method to obtain the predicted value of the second target parameter, where the second target parameter is a subframe parameter with temporal continuity. The target prediction method may be an interpolation method; for example, the adaptive codebook delay in Table 1 is a continuous parameter, so the adaptive codebook delay in a failed code stream may be predicted by interpolation.
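A minimal sketch of interpolation-based prediction for a continuous parameter such as the adaptive codebook delay (the simple average and last-value hold are illustrative choices; the patent does not fix a particular interpolation formula):

```python
def interpolate_lost(prev_value, next_value):
    """Linear interpolation for a temporally continuous parameter when
    both the preceding and following values are available."""
    return (prev_value + next_value) / 2

def hold_last(history):
    """Fallback when only past values are available: hold the most
    recent known value."""
    return history[-1]
```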
In another possible implementation, the failed code stream of the current frame may be predicted from the parameter information of the subframes already obtained from the effective code stream of the current frame. Illustratively, suppose the encoding device sends the description code streams of 4 subframes, i.e., 4 description code streams, but the decoding device receives only 3 of them, namely those of the first, second and fourth subframes, while the description code stream of the third subframe is a failed code stream. The decoding parameter values of the first, second and fourth subframes can be obtained from the received description code streams, and parameter prediction for the third subframe can then be performed based on those values, yielding the predicted parameter values for the third subframe.
Predicting the failed code stream of the current frame from the subframe parameter information in its effective code stream may also be done with a deep learning network. The parameter prediction network for each subframe is trained on sample parameter values and parameter labels, where the historical parameter values of the other subframes within a historical frame serve as the sample values and the historical parameter value of the current subframe within the same frame serves as the label. The training process is as described above for the first prediction network and is not repeated here.
Alternatively, in another possible implementation, the second audio signal frame may be obtained using the subframe parameters of the subframes in the received effective code stream. Illustratively, the parameter values of the subframe parameters of a subframe adjacent, within the same frame, to the subframe of the failed code stream may be taken as the parameter values for the failed code stream. This yields complete parameter information for the current frame and preserves the integrity of the frame obtained by the decoding device without predicting the subframe parameters of the failed code stream. The adjacent subframe may precede or follow the subframe of the failed code stream in timing.
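The adjacent-subframe fallback can be sketched as follows (using `None` to mark the lost subframe is an illustrative convention, not the patent's representation):

```python
def fill_from_neighbour(subframe_values, lost_index):
    """Replace the missing subframe's parameter value with that of an
    adjacent subframe in the same frame: the preceding subframe when
    available, otherwise the following one."""
    filled = list(subframe_values)
    if lost_index > 0 and filled[lost_index - 1] is not None:
        filled[lost_index] = filled[lost_index - 1]
    else:
        filled[lost_index] = filled[lost_index + 1]
    return filled

# Subframe 1 of a 4-subframe frame was lost; copy from subframe 0.
restored = fill_from_neighbour([10, None, 30, 40], 1)
```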
In step 410, the decoding device reorganizes the decoding parameter values corresponding to the valid code stream and the prediction parameter values corresponding to the invalid code stream according to the subframe sequence, so as to obtain a second decoding parameter.
Since the predicted parameter values of the failed code stream are values with physical meaning, once they have been obtained they may, to ensure the completeness of the decoding parameters, be reassembled with the decoding parameter values of the effective code stream to obtain one frame of second decoding parameters corresponding to the coding parameters.
In step 411, the decoding device decodes the second decoding parameter to obtain a second audio signal frame.
Fig. 7 is a schematic diagram of a decoding process according to an exemplary embodiment of the present application. As shown in Fig. 7, when the effective code stream is only part of the n description code streams, the decoding device, after obtaining the effective code stream 710, analyzes each code stream in it to obtain the corresponding parameter index values and then the decoding parameter values by table lookup or calculation. Meanwhile, using the prediction network of each kind of subframe parameter and that parameter's historical parameter information, it predicts each kind of subframe parameter in the failed code stream to obtain the predicted parameter values. The decoding parameter values of the effective code stream 710 are then reassembled with the deep learning prediction result 720 to obtain the second decoding parameters 730, and the final audio signal frame, i.e., the second audio signal frame, is obtained after parameter decoding.
In summary, in the audio codec method provided by the embodiment of the present application, on the encoding device side the coding parameters of the first audio signal frame are divided into whole-frame parameters and subframe parameters; the subframe parameters are distributed across different description code streams and the whole-frame parameters are copied into each of them, generating two or more description code streams. A single-description coding device is thereby turned into a multiple-description coding device, the coding mode changes from single-description to multiple-description coding, and the packet-loss resistance of the audio codec system is improved.
Meanwhile, the method provided by the application adds an adaptation on top of the original coding device, so the original device need not be replaced wholesale; the impact on coding compression efficiency, bandwidth usage, speech quality and so on that such a replacement would cause is avoided, and the coding effect on the audio signal is improved.
Furthermore, because the method does not require globally replacing the original coding device, the portability of the audio codec scheme is improved.
On the decoding device side, the audio signal frame is restored from the received effective code stream. When packet loss occurs during transmission, the subframe parameters in the failed code stream are predicted by deep learning, recovering the lost subframe parameters so that the speech signal of the whole frame can be decoded. This further improves the packet-loss resistance of the audio codec system and the accuracy of audio signal transmission.
Taking an audio communication scenario as an example, Fig. 8 shows a schematic diagram of an audio codec method according to an exemplary embodiment of the present application, performed interactively by an encoding device and a decoding device. As shown in Fig. 8, the process may be implemented as follows. The encoding device 810 encodes the speech signal to obtain coding parameters, which divide into whole-frame parameters and subframe parameters; there may be one or more of each, and Fig. 8 illustrates the case of one whole-frame parameter and N subframe parameters, with a description code stream generated for each of the N subframe parameters such that every description code stream contains the whole-frame parameter. The encoding device 810 sends the N generated description code streams to the decoding device 830 through the transmission network 820, and the decoding device receives them. In a scenario with packet loss, after acquiring the effective code stream the decoding device 830 recovers the subframe parameters of the failed code stream through the deep learning network to obtain the corresponding predicted parameter values, reassembles them with the decoding parameter values obtained from the effective code stream, and decodes the reassembled parameters to obtain the final speech signal.
Fig. 9 shows a block diagram of an audio encoding apparatus according to an exemplary embodiment of the present application, as shown in fig. 9, including:
a first acquisition module 910, configured to acquire a first audio signal frame;
the encoding module 920 is configured to encode the first audio signal frame to obtain encoding parameters of the first audio signal, where the encoding parameters include an entire frame class parameter and a frame class parameter; the whole frame type parameter is a coding parameter shared by at least two subframes obtained by coding the first audio signal frame; the frame classification parameters are coding parameters corresponding to the at least two subframes respectively;
a grouping module 930, configured to group the frame-class parameters to obtain n parameter groups; the parameter group comprises framing class parameters of at least one subframe corresponding to the parameter group, n is more than or equal to 2, and n is a positive integer;
and a combining module 940, configured to combine the whole frame class parameter with n parameter packets, respectively, to generate n description code streams corresponding to the n parameter packets.
In one possible implementation, the grouping module 930 includes:
a number determination sub-module for determining the number of parameter packets based on the number of the at least two subframes; the upper limit of the number of the parameter packets is equal to the number of the at least two subframes;
and the grouping sub-module is used for dividing the parameter information corresponding to each of the at least two subframes into n groups respectively based on the number of the parameter groups so as to obtain n parameter groups.
In one possible implementation, in response to the number of the parameter packets being equal to the number of the at least two subframes, the grouping sub-module is configured to divide the parameter information corresponding to each of the at least two subframes into n groups by taking a single subframe as a unit, to obtain n parameter groups.
In one possible implementation manner, in response to the number of the parameter packets being smaller than the number of the at least two subframes, and the number m of the at least two subframes being greater than 2, the n parameter packets include at least one target parameter packet, where the number of subframes corresponding to the target parameter packet is i, 2 ≤ i < m, and i and m are both positive integers.
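The grouping performed by the grouping module 930 and the combination performed by the combining module 940 can be sketched in a few lines. This is a hypothetical illustration only: the round-robin assignment of subframes to parameter packets, the dictionary layout, and the names `whole_frame_params`, `subframe_params`, `lsf`, and `gain` are assumptions made for the sketch, not the codec's actual bitstream format.

```python
def make_description_streams(whole_frame_params, subframe_params, n):
    """Split per-subframe (frame class) parameters into n parameter packets
    and copy the shared whole frame class parameters into every description
    code stream, as modules 930 and 940 describe."""
    if not (2 <= n <= len(subframe_params)):
        raise ValueError("n must satisfy 2 <= n <= number of subframes")
    # Group the frame class parameters into n parameter packets (round-robin,
    # so packet i holds subframes i, i+n, i+2n, ...).
    packets = [subframe_params[i::n] for i in range(n)]
    # Combine the whole frame class parameters with each packet to form one
    # description code stream per packet.
    return [
        {"whole_frame": whole_frame_params, "subframes": pkt}
        for pkt in packets
    ]

streams = make_description_streams(
    whole_frame_params={"lsf": [0.1, 0.2]},               # shared by all subframes
    subframe_params=[{"gain": g} for g in (1, 2, 3, 4)],  # one entry per subframe
    n=2,
)
# Each of the 2 streams carries the shared parameters plus half of the subframes.
```

Losing any one stream then costs only the subframes assigned to it, while the whole frame class parameters survive in every other stream.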
In summary, in the audio encoding device provided by the embodiment of the application, the encoding parameters of the first audio signal frame are divided into whole frame class parameters and frame class parameters; the frame class parameters are distributed across different description coding code streams, the whole frame class parameters are copied and added to each description coding code stream, and two or more description coding code streams are generated. In this way, a single-description coding device is changed into a multi-description coding device, the coding mode of the coding device is changed from single-description coding to multi-description coding, and the packet loss resistance is improved.
Meanwhile, the method provided by the application adds an adaptation function on the basis of the original coding equipment, so the original coding equipment does not need to be replaced globally; this avoids the impact that replacing the coding equipment would have on coding compression efficiency, bandwidth occupation, voice quality and the like, and improves the coding effect on the audio signal.
Furthermore, the method provided by the application does not need to globally replace the original coding equipment, thereby improving the portability of the audio coding and decoding scheme.
Fig. 10 shows a block diagram of an audio decoding apparatus according to an exemplary embodiment of the present application, as shown in fig. 10, including:
An effective code stream receiving module 1010, configured to receive an effective code stream, where the effective code stream is a successfully received code stream among the n description coding code streams; the n description coding code streams are obtained after grouping processing is performed on the coding parameters of the first audio signal frame; the grouping processing refers to grouping the frame class parameters in the coding parameters to obtain n parameter groups, and then adding the whole frame class parameters in the coding parameters to each of the n parameter groups; the whole frame class parameter is a coding parameter shared by at least two subframes obtained by coding the first audio signal frame; the frame class parameters are the coding parameters corresponding to the at least two subframes respectively;
The code stream processing module 1020 is configured to perform code stream processing on an effective code stream to obtain a decoding parameter value corresponding to the effective code stream;
A second obtaining module 1030 is configured to obtain a second audio signal frame based on the decoding parameter value corresponding to the valid code stream.
In one possible implementation manner, in response to the valid code stream being all of the n description code streams, the second obtaining module 1030 includes:
The first reorganization submodule is used for reorganizing decoding parameter values corresponding to the descriptive coded streams according to the subframe sequence to obtain first decoding parameters;
And the first decoding submodule is used for decoding the first decoding parameter to obtain the second audio signal frame.
In one possible implementation manner, in response to the valid code stream being a partial code stream of the n description code streams, the second obtaining module 1030 includes:
The parameter prediction sub-module is used for carrying out parameter prediction on the failure code stream to obtain a predicted parameter value corresponding to the failure code stream; the failure code stream is the code stream which is not successfully received in the n description code streams;
A second reconstruction sub-module, configured to reconstruct, according to a sub-frame sequence, a decoding parameter value corresponding to the valid code stream and a prediction parameter value corresponding to the invalid code stream, to obtain a second decoding parameter;
And the second decoding submodule is used for decoding the second decoding parameter to obtain the second audio signal frame.
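The reorganization performed by the second reconstruction sub-module can be sketched as follows, assuming the encoder distributed subframes to the n description code streams round-robin. The data layout and the stand-in `predict` callback are illustrative assumptions; in the application, the predicted parameter values would come from the parameter prediction sub-module.

```python
def reassemble(streams, n, num_subframes, predict):
    """Rebuild per-subframe decoding parameters from whichever description
    code streams arrived; fill lost subframes with predicted values."""
    subframes = [None] * num_subframes
    for k, stream in enumerate(streams):
        if stream is None:          # this description code stream was lost
            continue
        # Stream k carried subframes k, k+n, k+2n, ... (subframe order).
        for j, params in zip(range(k, num_subframes, n), stream["subframes"]):
            subframes[j] = params
    # Recombine decoded values with predicted values for the failure stream.
    return [p if p is not None else predict(i) for i, p in enumerate(subframes)]

received = [
    {"whole_frame": {}, "subframes": [{"gain": 1}, {"gain": 3}]},
    None,  # second description code stream not successfully received
]
decoded = reassemble(received, n=2, num_subframes=4,
                     predict=lambda i: {"gain": 0})  # naive stand-in predictor
# decoded holds subframes in order, with lost subframes replaced by predictions.
```

Because every surviving stream also carries the whole frame class parameters, decoding can proceed with only the per-subframe values predicted.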
In one possible implementation, the parameter prediction submodule includes:
An information acquisition unit for acquiring historical frame parameter information; the historical frame parameter information comprises historical parameter values of target frame numbers corresponding to the failure code stream;
and the parameter prediction unit is used for carrying out parameter prediction based on the historical frame parameter information to obtain a prediction parameter value corresponding to the failure code stream.
In one possible implementation manner, the parameter prediction unit is configured to perform parameter prediction on the framing parameter in the failure code stream based on the historical frame parameter information, so as to obtain a predicted parameter value corresponding to the framing parameter in the failure code stream.
In one possible implementation manner, the parameter prediction unit is configured to input historical parameter information corresponding to a first target parameter into a first prediction network, and obtain a predicted parameter value corresponding to the first target parameter output by the first prediction network; the first target parameter belongs to the framing class parameter;
The first predictive network is trained based on historical sample parameter values of the first target parameter and historical parameter tags of the first target parameter.
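As a minimal stand-in for the trained first prediction network, parameter prediction from historical frame parameter information could look like a weighted average over the most recent frames. The window length and weights below are illustrative assumptions, not values from the application.

```python
def predict_from_history(history, weights=(0.5, 0.3, 0.2)):
    """Predict the next value of a frame class parameter from historical
    parameter values (history is ordered oldest to newest)."""
    recent = history[-len(weights):][::-1]   # newest value first
    # Pad with the oldest available value if history is shorter than the window.
    while len(recent) < len(weights):
        recent.append(recent[-1])
    return sum(w * v for w, v in zip(weights, recent))

# Gains observed for the target subframe over the last three frames:
predicted_gain = predict_from_history([0.8, 1.0, 1.2])
```

A learned prediction network, trained on historical sample parameter values and their labels as the text describes, would replace this fixed-weight average.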
In summary, in the audio decoding apparatus provided in the embodiment of the present application, code stream processing is performed on the received effective code streams to obtain the corresponding decoding parameter values, and the second audio signal frame is obtained on that basis, where the effective code streams are all or part of the n description coding code streams received by the decoding device. When the n description coding code streams are generated, the coding parameters of the first audio signal frame are divided into whole frame class parameters and frame class parameters; the frame class parameters are allocated to different description coding code streams, and the whole frame class parameters are copied and added to each description coding code stream. In this way, a single-description coding device is changed into a multi-description coding device, and the packet loss resistance is improved.
Meanwhile, the method provided by the application adds an adaptation function on the basis of the original coding equipment, so the original coding equipment does not need to be replaced globally; this avoids the impact that replacing the coding equipment would have on coding compression efficiency, bandwidth occupation, voice quality and the like, and improves the coding effect on the audio signal.
Furthermore, the method provided by the application does not need to globally replace the original coding equipment, so that the portability of the audio coding and decoding scheme is improved.
Fig. 11 illustrates a block diagram of a computer device 1100 in accordance with an exemplary embodiment of the present application. The computer device 1100 may be implemented as the encoding device described above, or as the decoding device, such as: a smart phone, a tablet computer, a notebook computer, or a desktop computer. The computer device 1100 may also be referred to by other names, such as user equipment, portable terminal, laptop terminal, or desktop terminal.
In general, the computer device 1100 includes: a processor 1101 and a memory 1102.
The processor 1101 may include one or more processing cores, such as a 4-core processor, an 11-core processor, and the like. The processor 1101 may be implemented in at least one hardware form of a DSP (Digital Signal Processor), an FPGA (Field-Programmable Gate Array), or a PLA (Programmable Logic Array). The processor 1101 may also include a main processor and a coprocessor; the main processor, also called a CPU (Central Processing Unit), is a processor for processing data in the awake state, and the coprocessor is a low-power processor for processing data in the standby state. In some embodiments, the processor 1101 may be integrated with a GPU (Graphics Processing Unit) for rendering and drawing the content to be displayed by the display screen. In some embodiments, the processor 1101 may also include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
Memory 1102 may include one or more computer-readable storage media, which may be non-transitory. Memory 1102 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 1102 is used to store at least one instruction for execution by processor 1101 to implement all or part of the steps in the audio encoding method shown in the method embodiments of the present application.
In some embodiments, the computer device 1100 may further optionally include: a peripheral interface 1103 and at least one peripheral. The processor 1101, memory 1102, and peripheral interface 1103 may be connected by a bus or signal lines. The individual peripheral devices may be connected to the peripheral device interface 1103 by buses, signal lines or circuit boards. Specifically, the peripheral device includes: at least one of radio frequency circuitry 1104, a display screen 1105, a camera assembly 1106, audio circuitry 1107, a positioning assembly 1108, and a power supply 1109.
The peripheral interface 1103 may be used to connect at least one I/O (Input/Output)-related peripheral device to the processor 1101 and the memory 1102. In some embodiments, the processor 1101, the memory 1102, and the peripheral interface 1103 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1101, the memory 1102, and the peripheral interface 1103 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
In some embodiments, the computer device 1100 also includes one or more sensors 1110. The one or more sensors 1110 include, but are not limited to: acceleration sensor 1111, gyroscope sensor 1112, pressure sensor 1113, fingerprint sensor 1114, optical sensor 1115, and proximity sensor 1116.
Those skilled in the art will appreciate that the architecture shown in fig. 11 is not limiting as to the computer device 1100, and may include more or fewer components than shown, or may combine certain components, or employ a different arrangement of components.
In an exemplary embodiment, a computer-readable storage medium is also provided for storing at least one computer program that is loaded and executed by a processor to implement all or part of the steps of the above-described audio encoding method and/or audio decoding method. For example, the computer-readable storage medium may be a read-only memory (ROM), a random access memory (RAM), a compact disc read-only memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an exemplary embodiment, a computer program product or a computer program is also provided; the computer program product comprises at least one computer program that is loaded and executed by a processor to implement all or part of the steps of the method shown in any of the embodiments of fig. 2, 3, or 4 described above.
Other embodiments of the application will be apparent to those skilled in the art from consideration of the specification and practice of the application disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It is to be understood that the application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (15)

1. A method of audio encoding, the method comprising:
Acquiring a first audio signal frame;
Encoding the first audio signal frame to obtain encoding parameters of the first audio signal, wherein the encoding parameters comprise whole frame type parameters and frame type parameters; the whole frame type parameter is a coding parameter shared by at least two subframes obtained by coding the first audio signal frame; the frame classification parameters are coding parameters corresponding to the at least two subframes respectively;
Grouping the frame grouping parameters to obtain n parameter groups; the parameter group comprises framing class parameters of at least one subframe corresponding to the parameter group, n is more than or equal to 2, and n is a positive integer; the upper limit of the number of the parameter packets is equal to the number of the at least two subframes;
And respectively combining the whole frame type parameters with n parameter groups to generate n description coding code streams corresponding to the n parameter groups.
2. The method of claim 1, wherein grouping the framing class parameters to obtain n parameter groups comprises:
determining a number of the parameter packets based on the number of the at least two subframes;
And dividing the parameter information corresponding to each of the at least two subframes into n groups based on the number of the parameter groups so as to obtain n parameter groups.
3. The method of claim 2, wherein, in response to the number of the parameter packets being equal to the number of the at least two subframes, dividing the parameter information corresponding to each of the at least two subframes into n packets based on the number of the parameter packets to obtain n parameter packets includes:
Dividing the parameter information corresponding to each of the at least two subframes into n groups respectively by taking a single subframe as a unit to obtain n parameter groups.
4. The method of claim 2, wherein, in response to the number of the parameter packets being less than the number of the at least two subframes and the number m of the at least two subframes satisfying m > 2, the n parameter packets include at least one target parameter packet, the number of subframes corresponding to the target parameter packet being i, where 2 ≤ i < m, and i and m are both positive integers.
5. A method of audio decoding, the method comprising:
Receiving an effective code stream, wherein the effective code stream is a successfully received code stream among n description coding code streams; the n description coding code streams are obtained after grouping processing is performed on the coding parameters of the first audio signal frame; the grouping processing refers to grouping the frame class parameters in the coding parameters to obtain n parameter groups, and then adding the whole frame class parameters in the coding parameters to each of the n parameter groups; the whole frame class parameter is a coding parameter shared by at least two subframes obtained by coding the first audio signal frame; the frame class parameters are the coding parameters corresponding to the at least two subframes respectively; the upper limit of the number of the parameter packets is equal to the number of the at least two subframes;
performing code stream processing on the effective code stream to obtain a decoding parameter value corresponding to the effective code stream;
And acquiring a second audio signal frame based on the decoding parameter value corresponding to the effective code stream.
6. The method of claim 5, wherein, in response to the valid code stream being all of the n descriptive code streams, the obtaining a second audio signal frame based on decoding parameter values corresponding to the valid code stream comprises:
Recombining decoding parameter values corresponding to the description coding code streams according to the subframe sequence to obtain a first decoding parameter;
and decoding the first decoding parameter to obtain the second audio signal frame.
7. The method of claim 5, wherein, in response to the valid code stream being a partial code stream of the n description coded code streams, the obtaining a second audio signal frame based on decoding parameter values corresponding to the valid code stream comprises:
parameter prediction is carried out on the failure code stream, and a prediction parameter value corresponding to the failure code stream is obtained; the failure code stream is the code stream which is not successfully received in the n description code streams;
According to the subframe sequence, recombining decoding parameter values corresponding to the effective code stream and prediction parameter values corresponding to the ineffective code stream to obtain a second decoding parameter;
And decoding the second decoding parameter to obtain the second audio signal frame.
8. The method of claim 7, wherein the performing parameter prediction on the failure code stream to obtain a predicted parameter value corresponding to the failure code stream comprises:
Acquiring historical frame parameter information; the historical frame parameter information comprises historical parameter values of target frame numbers corresponding to the failure code stream;
And carrying out parameter prediction based on the historical frame parameter information to obtain a prediction parameter value corresponding to the failure code stream.
9. The method according to claim 8, wherein the performing parameter prediction based on the historical frame parameter information to obtain the predicted parameter value corresponding to the failure code stream includes:
And carrying out parameter prediction on the framing parameters in the failure code stream based on the historical frame parameter information to obtain predicted parameter values corresponding to the framing parameters in the failure code stream.
10. The method according to claim 9, wherein the performing parameter prediction on the framing-type parameter in the failure code stream based on the historical frame parameter information to obtain a predicted parameter value corresponding to the framing-type parameter in the failure code stream includes:
inputting historical parameter information corresponding to a first target parameter into a first prediction network, and obtaining a prediction parameter value corresponding to the first target parameter output by the first prediction network; the first target parameter belongs to the framing class parameter;
The first predictive network is trained based on historical sample parameter values of the first target parameter and historical parameter tags of the first target parameter.
11. An audio encoding apparatus, the apparatus comprising:
the first acquisition module is used for acquiring a first audio signal frame;
The encoding module is used for encoding the first audio signal frame to obtain encoding parameters of the first audio signal, wherein the encoding parameters comprise whole frame type parameters and frame division type parameters; the whole frame type parameter is a coding parameter shared by at least two subframes obtained by coding the first audio signal frame; the frame classification parameters are coding parameters corresponding to the at least two subframes respectively;
The grouping module is used for grouping the framing parameters to obtain n parameter groups; the parameter group comprises framing class parameters of at least one subframe corresponding to the parameter group, n is more than or equal to 2, and n is a positive integer; the upper limit of the number of the parameter packets is equal to the number of the at least two subframes;
and the combination module is used for respectively combining the whole frame type parameters with n parameter groups to generate n description coding code streams corresponding to the n parameter groups.
12. An audio decoding apparatus, the apparatus comprising:
an effective code stream receiving module, configured to receive an effective code stream, where the effective code stream is a successfully received code stream among n description coding code streams; the n description coding code streams are obtained after grouping processing is performed on the coding parameters of the first audio signal frame; the grouping processing refers to grouping the frame class parameters in the coding parameters to obtain n parameter groups, and then adding the whole frame class parameters in the coding parameters to each of the n parameter groups; the whole frame class parameter is a coding parameter shared by at least two subframes obtained by coding the first audio signal frame; the frame class parameters are the coding parameters corresponding to the at least two subframes respectively; the upper limit of the number of the parameter packets is equal to the number of the at least two subframes;
the code stream processing module is used for carrying out code stream processing on the effective code stream to obtain a decoding parameter value corresponding to the effective code stream;
And the second acquisition module is used for acquiring a second audio signal frame based on the decoding parameter value corresponding to the effective code stream.
13. A computer device, characterized in that it comprises a processor and a memory, said memory storing at least one computer program, said at least one computer program being loaded and executed by said processor to implement the audio encoding method according to any of claims 1 to 4 or the audio decoding method according to any of claims 5 to 10.
14. A computer readable storage medium, characterized in that at least one computer program is stored in the computer readable storage medium, which computer program is loaded and executed by a processor to implement the audio encoding method of any one of claims 1 to 4 or the audio decoding method of any one of claims 5 to 10.
15. A computer program product, characterized in that the computer program product comprises at least one computer program, which is loaded and executed by a processor to implement the audio encoding method of any of claims 1 to 4 or the audio decoding method of any of claims 5 to 10.
CN202111327258.9A 2021-11-10 2021-11-10 Audio encoding method, decoding method, device, equipment, storage medium and product Active CN114333862B (en)


Publications (2)

Publication Number Publication Date
CN114333862A CN114333862A (en) 2022-04-12
CN114333862B true CN114333862B (en) 2024-05-03



Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5781881A (en) * 1995-10-19 1998-07-14 Deutsche Telekom Ag Variable-subframe-length speech-coding classes derived from wavelet-transform parameters
US6496798B1 (en) * 1999-09-30 2002-12-17 Motorola, Inc. Method and apparatus for encoding and decoding frames of voice model parameters into a low bit rate digital voice message
CN1852429A (en) * 2005-12-05 2006-10-25 华为技术有限公司 Video-code-flow grouped transmission method and system
CN101150564A (en) * 2006-09-19 2008-03-26 华为技术有限公司 Method for dividing frame structure, its realization system and a domain scheduling method
CN101253809A (en) * 2005-08-30 2008-08-27 Lg电子株式会社 Method and apparatus for encoding and decoding an audio signal
CN104123946A (en) * 2006-07-31 2014-10-29 高通股份有限公司 Systemand method for including identifier with packet associated with speech signal
CN109660825A (en) * 2017-10-10 2019-04-19 腾讯科技(深圳)有限公司 Video transcoding method, device, computer equipment and storage medium
CN111312264A (en) * 2020-02-20 2020-06-19 腾讯科技(深圳)有限公司 Voice transmission method, system, device, computer readable storage medium and equipment
CN113113032A (en) * 2020-01-10 2021-07-13 华为技术有限公司 Audio coding and decoding method and audio coding and decoding equipment
CN113192521A (en) * 2020-01-13 2021-07-30 华为技术有限公司 Audio coding and decoding method and audio coding and decoding equipment

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2869151B1 (en) * 2004-04-19 2007-01-26 Thales Sa METHOD OF QUANTIFYING A VERY LOW SPEECH ENCODER
CN101308655B (en) * 2007-05-16 2011-07-06 展讯通信(上海)有限公司 Audio coding and decoding method and layout design method of static discharge protective device and MOS component device
TWI390503B (en) * 2009-11-19 2013-03-21 Gemtek Technolog Co Ltd Dual channel voice transmission system, broadcast scheduling design module, packet coding and missing sound quality damage estimation algorithm
US9118807B2 (en) * 2013-03-15 2015-08-25 Cisco Technology, Inc. Split frame multistream encode


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A 600 b/s low-rate speech coding algorithm based on joint subframe coding; Chen Liang, Zhang Xiongwei; Journal of Electronics & Information Technology; 2003-03-15 (Issue 03); full text *


