CN113744744B - Audio coding method, device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN113744744B
CN113744744B (application CN202110890333.6A)
Authority
CN
China
Prior art keywords
audio frame
redundancy
frame
target audio
target
Prior art date
Legal status
Active
Application number
CN202110890333.6A
Other languages
Chinese (zh)
Other versions
CN113744744A
Inventor
崔承宗
阮良
陈功
Current Assignee
Hangzhou Netease Zhiqi Technology Co Ltd
Original Assignee
Hangzhou Netease Zhiqi Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Netease Zhiqi Technology Co Ltd
Priority to CN202110890333.6A
Publication of CN113744744A
Application granted
Publication of CN113744744B
Legal status: Active

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008: Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing


Abstract

The present disclosure provides an audio encoding method, an apparatus, an electronic device, and a storage medium. The method comprises: acquiring a target audio frame and encoding redundancy information that includes at least first redundancy indication information; performing a first encoding process on the target audio frame to obtain conventional code stream data of the target audio frame; and, in response to the first redundancy indication information indicating that a first audio frame is a redundant frame of the target audio frame, performing source redundancy processing on the conventional code stream data of the target audio frame and the redundant code stream data of the first audio frame to obtain encoded data of the target audio frame, and outputting the encoded data. The first audio frame has already been encoded and is separated from the target audio frame by at least one audio frame; its redundant code stream data is obtained by performing a second encoding process on it, and the audio compression degree of the second encoding process is higher than that of the first encoding process. The selection of the redundant frame for the target audio frame is therefore flexible, which improves the flexibility of audio encoding.

Description

Audio coding method, device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the technical field of audio processing, and in particular to an audio encoding method, an audio encoding apparatus, an electronic device, and a storage medium.
Background
WebRTC is an open-source framework for real-time audio and video that includes a variety of audio codecs, of which Opus is one of the most commonly used.
Opus offers wide dynamic bit-rate coverage, variable coding frame length, adjustable complexity, good resilience to network packet loss, coverage of all frequency bands from narrowband to fullband, and suitability for both speech and music scenarios. However, the coding flexibility of Opus still leaves room for improvement.
Disclosure of Invention
The embodiments of the present disclosure provide an audio encoding method, an audio encoding apparatus, an electronic device, and a storage medium, which are used to improve the flexibility of audio encoding.
In a first aspect, an embodiment of the present disclosure provides an audio encoding method, including:
acquiring a target audio frame and coding redundancy information, wherein the coding redundancy information at least comprises first redundancy indication information, the first redundancy indication information is used for indicating whether a first audio frame is a redundancy frame of the target audio frame, and the first audio frame is already coded and is separated from the target audio frame by at least one audio frame;
performing first coding processing on the target audio frame to obtain conventional code stream data of the target audio frame;
in response to first redundancy indication information that the first audio frame is a redundancy frame of the target audio frame, performing source redundancy processing on conventional code stream data of the target audio frame and redundancy code stream data of the first audio frame to obtain encoded data of the target audio frame, wherein the redundancy code stream data of the first audio frame is obtained by performing second encoding processing on the first audio frame, and the audio compression degree corresponding to the second encoding processing is higher than that corresponding to the first encoding processing;
and outputting the coded data of the target audio frame.
In some possible implementations, the first audio frame is separated from the target audio frame by one audio frame.
In some possible embodiments, the encoded redundancy information further includes second redundancy indication information for indicating whether a second audio frame is a redundant frame of the target audio frame, the second audio frame being located between the first audio frame and the target audio frame, and the method further comprises:
and responding to first redundancy indication information of the first audio frame which is the redundancy frame of the target audio frame and second redundancy indication information of the second audio frame which is the redundancy frame of the target audio frame, performing information source redundancy processing on conventional code stream data of the target audio frame, redundancy code stream data of the first audio frame and redundancy code stream data of the second audio frame to obtain encoded data of the target audio frame, wherein the redundancy code stream data of the second audio frame is obtained by performing second encoding processing on the second audio frame.
In some possible embodiments, performing source redundancy processing on the regular code stream data of the target audio frame, the redundant code stream data of the first audio frame, and the redundant code stream data of the second audio frame to obtain encoded data of the target audio frame, including:
and performing source redundancy processing on the conventional code stream data of the target audio frame, the redundant code stream data of the first audio frame, and the redundant code stream data of the second audio frame by using a range encoding algorithm, so as to obtain the encoded data of the target audio frame.
In some possible embodiments, the method further comprises:
responding to first redundancy indication information of the first audio frame being a redundancy frame of the target audio frame or second redundancy indication information of the second audio frame being a redundancy frame of the target audio frame, and performing second encoding processing on the target audio frame to obtain redundancy code stream data of the target audio frame;
and storing redundant code stream data of the target audio frame.
In a second aspect, embodiments of the present disclosure provide an audio encoding apparatus, including:
the acquisition module is configured to acquire a target audio frame and encoded redundancy information, wherein the encoded redundancy information includes at least first redundancy indication information, the first redundancy indication information is used for indicating whether a first audio frame is a redundant frame of the target audio frame, and the first audio frame has already been encoded and is separated from the target audio frame by at least one audio frame;
the encoding module is used for carrying out first encoding processing on the target audio frame to obtain conventional code stream data of the target audio frame;
the redundancy processing module is used for responding to first redundancy indication information of the first audio frame which is the redundancy frame of the target audio frame, performing information source redundancy processing on conventional code stream data of the target audio frame and redundancy code stream data of the first audio frame to obtain encoded data of the target audio frame, wherein the redundancy code stream data of the first audio frame is obtained by performing second encoding processing on the first audio frame, and the audio compression degree corresponding to the second encoding processing is higher than that corresponding to the first encoding processing;
and the output module is used for outputting the coded data of the target audio frame.
In some possible implementations, the first audio frame is separated from the target audio frame by one audio frame.
In some possible implementations, the encoded redundancy information further includes second redundancy indicating information for indicating whether a second audio frame is a redundant frame of the target audio frame, the second audio frame being located between the first audio frame and the target audio frame, the redundancy processing module further configured to:
and responding to first redundancy indication information of the first audio frame which is the redundancy frame of the target audio frame and second redundancy indication information of the second audio frame which is the redundancy frame of the target audio frame, performing information source redundancy processing on conventional code stream data of the target audio frame, redundancy code stream data of the first audio frame and redundancy code stream data of the second audio frame to obtain encoded data of the target audio frame, wherein the redundancy code stream data of the second audio frame is obtained by performing second encoding processing on the second audio frame.
In some possible embodiments, the redundancy processing module is specifically configured to:
and performing source redundancy processing on the conventional code stream data of the target audio frame, the redundant code stream data of the first audio frame, and the redundant code stream data of the second audio frame by using a range encoding algorithm, so as to obtain the encoded data of the target audio frame.
In some possible embodiments, the apparatus further comprises a storage module:
the encoding module is further configured to perform a second encoding process on the target audio frame in response to the first redundancy indication information that the first audio frame is a redundancy frame of the target audio frame or the second redundancy indication information that the second audio frame is a redundancy frame of the target audio frame, so as to obtain redundancy stream data of the target audio frame;
the storage module is used for storing redundant code stream data of the target audio frame.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: at least one processor, and a memory communicatively coupled to the at least one processor, wherein:
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the audio encoding method described above.
In a fourth aspect, embodiments of the present disclosure provide a storage medium storing instructions which, when executed by a processor of an electronic device, cause the electronic device to perform the above audio encoding method.
In the embodiments of the present disclosure, a target audio frame and encoded redundancy information including at least first redundancy indication information are acquired; a first encoding process is performed on the target audio frame to obtain conventional code stream data of the target audio frame; and, in response to the first redundancy indication information indicating that the first audio frame is a redundant frame of the target audio frame, source redundancy processing is performed on the conventional code stream data of the target audio frame and the redundant code stream data of the first audio frame, and the resulting encoded data of the target audio frame is output. The first audio frame has already been encoded and is separated from the target audio frame by at least one audio frame; its redundant code stream data is obtained by performing a second encoding process on it, and the audio compression degree of the second encoding process is higher than that of the first encoding process. Therefore, when the target audio frame is encoded, an already encoded first audio frame separated from it by at least one audio frame can serve as its redundant frame, making the selection of the redundant frame flexible and improving the flexibility of audio encoding; moreover, the redundant code stream data of the first audio frame carried in the source redundancy processing is more compressed than the conventional code stream data of the target audio frame, which saves bit rate.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure, illustrate embodiments of the present disclosure and, together with the description, serve to explain it. In the drawings:
Fig. 1 is a flowchart of an audio encoding method according to an embodiment of the present disclosure;
Fig. 2 is a flowchart of yet another audio encoding method according to an embodiment of the present disclosure;
Fig. 3 is a schematic diagram of the SILK layer structure of one frame of an Opus code stream according to an embodiment of the present disclosure;
Fig. 4 is a schematic diagram of an audio encoding process according to an embodiment of the present disclosure;
Fig. 5 is a schematic diagram of compression results of the relevant code stream data of each audio frame according to an embodiment of the present disclosure;
Fig. 6 is a schematic structural diagram of an audio encoding apparatus according to an embodiment of the present disclosure;
Fig. 7 is a schematic diagram of the hardware structure of an electronic device for implementing an audio encoding method according to an embodiment of the present disclosure.
Detailed Description
In order to improve flexibility of audio coding, embodiments of the present disclosure provide an audio coding method, an apparatus, an electronic device, and a storage medium.
The preferred embodiments of the present disclosure are described below with reference to the accompanying drawings. It should be understood that the preferred embodiments described here are for illustration and explanation only and are not intended to limit the present disclosure; the embodiments of the present disclosure and the features in the embodiments may be combined with each other as long as there is no conflict.
To facilitate understanding of the present disclosure, the technical terms involved are first introduced:
Opus, a lossy audio coding format developed by the Xiph.Org Foundation and later standardized by the Internet Engineering Task Force (IETF); the standard format is defined in RFC 6716.
SILK, a speech codec incorporated into Opus.
Range encoding, an entropy coding method that can effectively improve the data compression rate.
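To make the range-encoding idea concrete, the following is a minimal sketch using exact integer arithmetic: the coder narrows an interval according to symbol frequencies, and the final interval identifies the whole message. This is an illustrative model only; the coder actually used inside Opus is a renormalizing, finite-precision range coder, and all names here are hypothetical.

```python
from typing import Dict, List, Tuple

def build_table(freqs: Dict[str, int]) -> Tuple[Dict[str, Tuple[int, int]], int]:
    """Cumulative-frequency table: symbol -> (cum, freq), plus the total."""
    table, cum = {}, 0
    for sym, f in sorted(freqs.items()):
        table[sym] = (cum, f)
        cum += f
    return table, cum

def range_encode(symbols, table, total) -> Tuple[int, int]:
    """Narrow the interval [low/den, (low+span)/den) once per symbol."""
    low, span, den = 0, 1, 1
    for s in symbols:
        cum, freq = table[s]
        low = low * total + span * cum
        span = span * freq
        den = den * total
    return low, den  # any integer in [low, low+span) identifies the message

def range_decode(code: int, den_final: int, n: int, table, total) -> str:
    """Replay the interval narrowing to recover n symbols from the code."""
    low, span, den = 0, 1, 1
    out: List[str] = []
    for _ in range(n):
        # position of the code inside the current interval, scaled to `total`
        t = ((code * den - low * den_final) * total) // (span * den_final)
        for sym, (cum, freq) in table.items():
            if cum <= t < cum + freq:
                out.append(sym)
                low = low * total + span * cum
                span = span * freq
                den = den * total
                break
    return "".join(out)
```

More frequent symbols shrink the interval less, so they cost fewer bits in the final number, which is the sense in which range encoding "improves the data compression rate".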
Fig. 1 is a flowchart of an audio encoding method according to an embodiment of the present disclosure, the method including the following steps.
In step S101, a target audio frame and encoded redundancy information including at least first redundancy indication information are acquired.
Wherein the first redundancy indicating information is used for indicating whether the first audio frame is a redundant frame of the target audio frame, and the first audio frame is already encoded and is separated from the target audio frame by at least one audio frame.
In a specific implementation, the first redundancy indication information may be a flag bit A. For example, with values 0 and 1: a value of 0 indicates that the first audio frame is not a redundant frame of the target audio frame, and a value of 1 indicates that it is. Equivalently, with values FALSE and TRUE: FALSE indicates that the first audio frame is not a redundant frame of the target audio frame, and TRUE indicates that it is.
Assuming that the target audio frame is the T-th audio frame in the sequence of audio frames, the first audio frame may be the (T-2)-th audio frame, the (T-3)-th audio frame, the (T-4)-th audio frame, and so on.
In addition, considering that the farther an audio frame is from the target audio frame, the less necessary it becomes to carry its redundancy, the first audio frame may be separated from the target audio frame by exactly one audio frame; that is, when the target audio frame is the T-th audio frame in the sequence, the first audio frame may be the (T-2)-th audio frame. In this way, the target audio frame can be encoded flexibly while the redundant frame remains maximally useful.
In step S102, a first encoding process is performed on the target audio frame, so as to obtain conventional code stream data of the target audio frame.
In step S103, in response to the first redundancy indication information that the first audio frame is a redundancy frame of the target audio frame, source redundancy processing is performed on the regular code stream data of the target audio frame and the redundancy code stream data of the first audio frame, so as to obtain encoded data of the target audio frame.
The redundant code stream data of the first audio frame is obtained by performing second coding processing on the first audio frame, and the audio compression degree corresponding to the second coding processing is higher than that corresponding to the first coding processing.
In the above example, the first redundancy indication information indicating that the first audio frame is a redundant frame of the target audio frame is the flag bit A having the value TRUE or 1.
In addition, it should be noted that, for the same audio frame, the regular code stream data is obtained by performing the first encoding process on the frame, while the redundant code stream data is obtained by performing the second encoding process on it. Since the audio compression degree of the second encoding process is higher than that of the first, the data amount of the regular code stream data is greater than that of the redundant code stream data; the redundant code stream data is, in effect, a more heavily compressed version of the regular code stream data.
In step S104, encoded data of the target audio frame is output.
In the embodiment of the disclosure, when the target audio frame is encoded, the first audio frame which is spaced from the target audio frame by at least one audio frame and is encoded can be used as the redundant frame of the target audio frame, the selection of the redundant frame is flexible, the flexibility of audio encoding is improved, the redundant code stream data of the first audio frame which is subjected to the information source redundancy processing is more compressed than the conventional code stream data of the target audio frame, and the code rate is saved.
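The four steps of Fig. 1 can be sketched as follows. This is a hypothetical illustration: `encode_regular`, and `combine_source_redundancy` stand in for the first encoding process and the source redundancy processing (e.g. range encoding) respectively; they are not real Opus API calls, and the byte layout is invented for the sketch.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class EncodedFrame:
    regular: bytes    # conventional code stream data (S102)
    combined: bytes   # final encoded data of the target frame (S103/S104)

def encode_regular(frame: bytes) -> bytes:
    """Stand-in for the first encoding process (lower compression)."""
    return b"R:" + frame

def combine_source_redundancy(regular: bytes, redundant: bytes) -> bytes:
    """Stand-in for source redundancy processing: prepend the redundant
    stream of the earlier frame, length-prefixed so it can be split off."""
    return len(redundant).to_bytes(2, "big") + redundant + regular

def encode_target_frame(frame: bytes,
                        first_is_redundant: bool,
                        first_redundant_stream: Optional[bytes]) -> EncodedFrame:
    # S101: the frame and the redundancy indication are acquired by the caller
    regular = encode_regular(frame)                       # S102
    if first_is_redundant and first_redundant_stream:     # S103
        combined = combine_source_redundancy(regular, first_redundant_stream)
    else:
        combined = regular
    return EncodedFrame(regular=regular, combined=combined)  # S104: output
```

The key point the sketch preserves is that the redundant stream carried alongside the regular stream belongs to an *earlier, already encoded* frame, so no extra encoding of the target frame is needed at this point.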
Fig. 2 is a flowchart of yet another audio encoding method provided in an embodiment of the present disclosure, the method including the following steps.
In step S201, a target audio frame and encoded redundancy information including first redundancy indicating information and second redundancy indicating information are acquired.
Wherein the first redundancy indicating information is used for indicating whether the first audio frame is a redundancy frame of the target audio frame, and the first audio frame is already encoded and is separated from the target audio frame by at least one audio frame; the second redundancy indicating information is used for indicating whether the second audio frame is a redundancy frame of the target audio frame, and the second audio frame is located between the first audio frame and the target audio frame.
The representation of the first redundancy indication information has been described in step S101 and is not repeated here. Similarly, the second redundancy indication information may be a flag bit B: a value of 0 (or FALSE) indicates that the second audio frame is not a redundant frame of the target audio frame, and a value of 1 (or TRUE) indicates that it is.
Specifically, when the first audio frame is separated from the target audio frame by more than one audio frame, the second audio frame may be any audio frame located between them; when the first audio frame is separated from the target audio frame by exactly one audio frame, the second audio frame is the single audio frame located between them.
In step S202, a first encoding process is performed on the target audio frame, so as to obtain conventional code stream data of the target audio frame.
In step S203, in response to the first redundancy indication information of the first audio frame being the redundancy frame of the target audio frame and the second redundancy indication information of the second audio frame being the redundancy frame of the target audio frame, performing source redundancy processing on the regular code stream data of the target audio frame, the redundancy code stream data of the first audio frame, and the redundancy code stream data of the second audio frame, to obtain encoded data of the target audio frame.
The redundant code stream data of the first audio frame is obtained by performing second coding processing on the first audio frame, the redundant code stream data of the second audio frame is obtained by performing second coding processing on the second audio frame, and the audio compression degree corresponding to the second coding processing is higher than that corresponding to the first coding processing.
In the above example, the first redundancy indication information indicating that the first audio frame is a redundant frame of the target audio frame is the flag bit A having the value TRUE or 1, and the second redundancy indication information indicating that the second audio frame is a redundant frame of the target audio frame is the flag bit B having the value TRUE or 1.
In a specific implementation, a range encoding algorithm may be used to perform source redundancy processing on the conventional code stream data of the target audio frame, the redundant code stream data of the first audio frame, and the redundant code stream data of the second audio frame, so as to obtain the encoded data of the target audio frame. Because the range encoding algorithm has good data compression performance, this improves both the data compression rate and the audio encoding efficiency.
In step S204, encoded data of the target audio frame is output.
In the embodiments of the present disclosure, when the target audio frame is encoded, source redundancy processing can cover both the first audio frame, which has already been encoded and is separated from the target audio frame by at least one audio frame, and the second audio frame, which is located between them; a range encoding algorithm can be used to perform source redundancy processing on the corresponding code stream data of the three audio frames, giving better encoding flexibility and better data compression performance.
In addition, in any of the above embodiments, in response to the first redundancy indication information indicating that the first audio frame is a redundant frame of the target audio frame, or the second redundancy indication information indicating that the second audio frame is a redundant frame of the target audio frame, the second encoding process may be performed on the target audio frame to obtain its redundant code stream data, and that data may be stored. In this way, when the target audio frame later serves as the redundant frame of another audio frame, its redundant code stream data can be fetched directly and combined, through source redundancy processing, with the conventional code stream data of the corresponding audio frame, improving the encoding speed as much as possible.
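Storing the redundant code stream data for later reuse might be sketched as a small bounded cache keyed by frame index. The class and its interface are illustrative assumptions, not part of the disclosed implementation; since only the (T-1)-th and (T-2)-th frames can serve as redundant frames here, a capacity of two suffices.

```python
from collections import OrderedDict
from typing import Optional

class RedundantStreamCache:
    """Keeps the second-encoding (redundant) bitstreams of recent frames."""

    def __init__(self, capacity: int = 2):
        self.capacity = capacity
        self._store: "OrderedDict[int, bytes]" = OrderedDict()

    def put(self, frame_index: int, redundant_stream: bytes) -> None:
        """Store the redundant stream produced when frame_index was encoded."""
        self._store[frame_index] = redundant_stream
        while len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict the oldest frame

    def get(self, frame_index: int) -> Optional[bytes]:
        """Fetch a stored redundant stream, or None if already evicted."""
        return self._store.get(frame_index)
```

When frame T is encoded, `get(T - 1)` and `get(T - 2)` supply the redundant streams for source redundancy processing without re-encoding those frames, which is the speed benefit the paragraph above describes.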
The following describes the scheme of the embodiment of the present disclosure by taking Opus as an example.
Taking mono encoding as an example, fig. 3 is a schematic diagram of the SILK layer structure of one frame of an Opus code stream according to an embodiment of the present disclosure. It comprises: VAD flags, an LBRR (T-1) flag, an LBRR (T-2) flag, an LBRR (T-1) Frame, an LBRR (T-2) Frame, and a Regular SILK Frame. The VAD flags are used to indicate whether the target audio frame is a boundary of the audio frame sequence; the LBRR (T-1) flag indicates whether the (T-1)-th audio frame (i.e., the second audio frame) is a redundant frame of the T-th audio frame (i.e., the target audio frame); the LBRR (T-2) flag indicates whether the (T-2)-th audio frame (i.e., the first audio frame) is a redundant frame of the T-th audio frame; and the LBRR (T-1) Frame and the Regular SILK Frame may have a coding frame length of 20 ms.
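The flag portion of this layout can be modelled as simple bit packing. This is a deliberately simplified, hypothetical sketch: the real SILK layer entropy-codes these flags through its range coder rather than writing raw bits, and the bit positions below are invented for illustration.

```python
def pack_header_flags(vad: bool, lbrr_t1: bool, lbrr_t2: bool) -> int:
    """Pack the VAD flag and the two LBRR flags into the low three bits."""
    return (int(vad) << 2) | (int(lbrr_t1) << 1) | int(lbrr_t2)

def unpack_header_flags(byte: int):
    """Recover (vad, lbrr_t1, lbrr_t2) from the packed value."""
    return bool(byte & 0b100), bool(byte & 0b010), bool(byte & 0b001)
```

The decoder reads these flags first, which tells it whether an LBRR (T-1) Frame and/or an LBRR (T-2) Frame precede the Regular SILK Frame in the payload.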
Fig. 4 is a schematic diagram of an audio encoding process provided in an embodiment of the present disclosure, where for a sequence of audio frames to be sent to a client, audio encoding may be performed according to the flow shown in fig. 4, where the flow includes the following steps:
in step S401, a sequence of audio frames is acquired.
In step S402, it is determined whether the LBRR (T-2) flag is TRUE, and if not, the process proceeds to step S403; if yes, the process proceeds to step S409.
When the LBRR (T-2) flag is TRUE, it indicates that the (T-2)-th audio frame is a redundant frame of the T-th audio frame; when the LBRR (T-2) flag is FALSE, it indicates that the (T-2)-th audio frame is not a redundant frame of the T-th audio frame.
In step S403, it is determined whether the LBRR (T-1) flag is TRUE, and if not, the process proceeds to step S404; if yes, the process proceeds to step S406.
When the LBRR (T-1) flag is TRUE, it indicates that the (T-1)-th audio frame is a redundant frame of the T-th audio frame; when the LBRR (T-1) flag is FALSE, it indicates that the (T-1)-th audio frame is not a redundant frame of the T-th audio frame.
In step S404, the T-th audio frame is encoded to obtain its regular code stream data.
In step S405, the regular code stream data of the T-th audio frame is taken as the encoded data of the T-th audio frame.
In step S406, the T-th audio frame is encoded to obtain both its redundant code stream data and its regular code stream data.
In step S407, it is determined whether the T-th audio frame is the first audio frame, if yes, step S405 is entered; if not, the process advances to step S408.
In step S408, source redundancy processing is performed on the regular code stream data of the T-th audio frame and the redundant code stream data of the (T-1)-th audio frame to obtain the encoded data of the T-th audio frame.
In step S409, it is determined whether the LBRR (T-1) flag is TRUE; if yes, the process proceeds to step S410; if not, the process proceeds to step S415.
In step S410, the T-th audio frame is encoded to obtain both its redundant code stream data and its regular code stream data.
In step S411, it is determined whether the T-th audio frame is the first audio frame, if yes, step S405 is entered; if not, the process proceeds to step S412.
In step S412, it is determined whether the T-th audio frame is the second audio frame, if yes, step S413 is entered; if not, the process proceeds to step S414.
In step S413, source redundancy processing is performed on the regular code stream data of the T-th audio frame and the redundant code stream data of the (T-1)-th audio frame to obtain the encoded data of the T-th audio frame.
In step S414, source redundancy processing is performed on the regular code stream data of the T-th audio frame, the redundant code stream data of the (T-1)-th audio frame, and the redundant code stream data of the (T-2)-th audio frame to obtain the encoded data of the T-th audio frame.
In a specific implementation, the range encoding algorithm may be used to perform source redundancy processing on the regular code stream data of the T-th audio frame, the redundant code stream data of the (T-1)-th audio frame, and the redundant code stream data of the (T-2)-th audio frame. Fig. 5 is a schematic diagram of the compression results of the relevant code stream data of each audio frame according to an embodiment of the present disclosure, where 100 represents the regular code stream data of the T-th audio frame, 99 (T-1) represents the redundant code stream data of the (T-1)-th audio frame, and 98 (T-2) represents the redundant code stream data of the (T-2)-th audio frame.
In step S415, the T-th audio frame is encoded to obtain both its redundant code stream data and its regular code stream data.
In step S416, it is determined whether the T-th audio frame is the first audio frame or the second audio frame, if yes, step S405 is entered; if not, the process advances to step S417.
In step S417, source redundancy processing is performed on the regular code stream data of the T-th audio frame and the redundant code stream data of the (T-2)-th audio frame to obtain the encoded data of the T-th audio frame.
In step S418, the encoded data of the T-th audio frame is output.
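The source redundancy processing of steps S413/S414 combines up to three code streams into one output packet, from which a receiver can later recover each stream. The following is a simplified illustration only: real Opus in-band FEC entropy-codes the streams jointly with a range coder, whereas here the code streams are merely multiplexed with hypothetical 2-byte length prefixes to show the combine/split structure.

```python
import struct

def combine_streams(regular_t, redundant_t1, redundant_t2):
    """Multiplex the regular stream of frame T with the redundant
    streams of frames T-1 and T-2 into a single packet."""
    packet = b""
    for stream in (regular_t, redundant_t1, redundant_t2):
        # 2-byte big-endian length prefix, then the stream bytes
        packet += struct.pack(">H", len(stream)) + stream
    return packet

def split_streams(packet):
    """Recover the individual code streams from a combined packet."""
    streams, pos = [], 0
    while pos < len(packet):
        (n,) = struct.unpack_from(">H", packet, pos)
        streams.append(packet[pos + 2:pos + 2 + n])
        pos += 2 + n
    return streams
```

A decoder that receives the packet for frame T but has lost frames T-1 and T-2 can thus still decode them from the second and third recovered streams, at the reduced quality of the second encoding process.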
In the embodiments of the present disclosure, when the T-th audio frame is encoded, the in-band forward error correction redundancy can flexibly protect both the (T-1)-th and the (T-2)-th audio frames, providing a more flexible and more capable Opus in-band forward error correction redundancy strategy. Moreover, with the packet-loss resistance kept unchanged, the efficient range encoding technique in Opus can reduce the bit rate consumed by joint source-channel optimization, thereby saving bit rate.
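The per-frame branching of steps S411-S418 can be summarized in a short sketch. The encoders below are hypothetical stand-ins (`encode_regular` for the first encoding process, `encode_redundant` for the higher-compression second encoding process); only the selection of which redundant streams accompany the T-th frame follows the steps above.

```python
def encode_regular(frame):
    # stand-in for the first encoding process (regular code stream data)
    return b"R:" + frame

def encode_redundant(frame):
    # stand-in for the second, higher-compression encoding process
    return b"r:" + frame[:1]

def encode_sequence(frames, protect_both=True):
    """Per-frame encoded payloads. With protect_both=True the T-th payload
    carries redundant copies of frames T-1 and T-2 (steps S413/S414);
    otherwise only frame T-2 is protected (step S417)."""
    redundant = {}  # frame index -> redundant code stream data
    payloads = []
    for t, frame in enumerate(frames, start=1):
        parts = [encode_regular(frame)]           # regular encoding of frame T
        if protect_both and t >= 2:
            parts.append(redundant[t - 1])        # redundancy of frame T-1
        if t >= 3:
            parts.append(redundant[t - 2])        # redundancy of frame T-2
        redundant[t] = encode_redundant(frame)    # stored for later frames
        payloads.append(b"|".join(parts))
    return payloads
```

For the first frame only the regular stream is sent; from the third frame on, each payload can carry up to two redundant streams, matching the "flexible protection" described above.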
Based on the same technical concept, the embodiments of the present disclosure further provide an audio encoding device. Since the principle by which the device solves the problem is similar to that of the audio encoding method, the implementation of the device may refer to the implementation of the method, and repeated description is omitted. Fig. 6 is a schematic structural diagram of an audio encoding device according to an embodiment of the present disclosure, which includes an obtaining module 601, an encoding module 602, a redundancy processing module 603, and an output module 604.
An obtaining module 601, configured to obtain a target audio frame and encoded redundancy information, where the encoded redundancy information includes at least first redundancy indication information used to indicate whether a first audio frame is a redundant frame of the target audio frame, the first audio frame having been encoded and being separated from the target audio frame by at least one audio frame;
the encoding module 602 is configured to perform a first encoding process on the target audio frame to obtain conventional code stream data of the target audio frame;
the redundancy processing module 603 is configured to, in response to the first redundancy indication information indicating that the first audio frame is a redundant frame of the target audio frame, perform source redundancy processing on the regular code stream data of the target audio frame and the redundant code stream data of the first audio frame to obtain the encoded data of the target audio frame, where the redundant code stream data of the first audio frame is obtained by performing a second encoding process on the first audio frame, and the audio compression degree corresponding to the second encoding process is higher than that corresponding to the first encoding process;
an output module 604, configured to output the encoded data of the target audio frame.
In some possible implementations, the first audio frame is separated from the target audio frame by one audio frame.
In some possible implementations, the encoded redundancy information further includes second redundancy indication information used to indicate whether a second audio frame is a redundant frame of the target audio frame, the second audio frame being located between the first audio frame and the target audio frame, and the redundancy processing module 603 is further configured to:
and responding to first redundancy indication information of the first audio frame which is the redundancy frame of the target audio frame and second redundancy indication information of the second audio frame which is the redundancy frame of the target audio frame, performing information source redundancy processing on conventional code stream data of the target audio frame, redundancy code stream data of the first audio frame and redundancy code stream data of the second audio frame to obtain encoded data of the target audio frame, wherein the redundancy code stream data of the second audio frame is obtained by performing second encoding processing on the second audio frame.
In some possible embodiments, the redundancy processing module 603 is specifically configured to:
and performing information source redundancy processing on the conventional code stream data of the target audio frame, the redundant code stream data of the first audio frame and the redundant code stream data of the second audio frame by adopting a distance coding algorithm to obtain the coded data of the target audio frame.
In some possible implementations, the device further includes a storage module 605:
the encoding module 602 is further configured to, in response to the first redundancy indication information indicating that the first audio frame is a redundant frame of the target audio frame or the second redundancy indication information indicating that the second audio frame is a redundant frame of the target audio frame, perform the second encoding process on the target audio frame to obtain the redundant code stream data of the target audio frame;
the storage module 605 is configured to store the redundant code stream data of the target audio frame.
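As one possible sketch, the cooperation of modules 601-605 might look like the following. All names and payload formats here are hypothetical; the two encoding passes are trivial stand-ins, not the actual first/second encoding processes, and the indication flags stand in for the encoded redundancy information obtained by module 601.

```python
class AudioEncoderDevice:
    """Hypothetical sketch of the device of Fig. 6 (modules 601-605)."""

    def __init__(self):
        self.redundant_store = {}  # storage module 605

    # encoding module 602, first encoding process
    def first_encode(self, frame):
        return b"REG:" + frame                  # regular code stream data

    # encoding module 602, second (higher-compression) encoding process
    def second_encode(self, frame):
        return b"RED:" + frame[:1]              # smaller redundant payload

    def encode(self, index, frame, first_is_redundant, second_is_redundant):
        parts = [self.first_encode(frame)]
        # redundancy processing module 603: attach earlier frames' redundant data
        if first_is_redundant and (index - 2) in self.redundant_store:
            parts.append(self.redundant_store[index - 2])  # first audio frame (T-2)
        if second_is_redundant and (index - 1) in self.redundant_store:
            parts.append(self.redundant_store[index - 1])  # second audio frame (T-1)
        if first_is_redundant or second_is_redundant:
            # second encoding process; stored so later frames can carry it
            self.redundant_store[index] = self.second_encode(frame)
        return b"|".join(parts)                 # output module 604
```

In this sketch the redundant copy of the current frame is produced and stored whenever either indication flag is set, mirroring the behavior of the encoding and storage modules described above.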
The division of the modules in the embodiments of the present disclosure is schematic and is merely a division by logical function; other divisions are possible in actual implementation. In addition, the functional modules in the embodiments of the present disclosure may be integrated in one processor, may exist separately and physically, or two or more modules may be integrated in one module. The coupling between the modules may be achieved through interfaces, which are typically electrical communication interfaces, although mechanical or other forms of interfaces are not excluded. Thus, modules illustrated as separate components may or may not be physically separate, and may be located in one place or distributed in different locations on the same or different devices. The integrated modules may be implemented in the form of hardware or in the form of software functional modules.
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. The electronic device includes physical devices such as a transceiver 701 and a processor 702, where the processor 702 may be a central processing unit (Central Processing Unit, CPU), a microprocessor, an application-specific integrated circuit, a programmable logic circuit, a large-scale integrated circuit, or a digital processing unit. The transceiver 701 is used for data transmission and reception between the electronic device and other devices.
The electronic device may further include a memory 703 for storing software instructions executed by the processor 702, as well as other data required by the electronic device, such as identification information of the electronic device, encryption information of the electronic device, and user data. The memory 703 may be a volatile memory, such as a random-access memory (Random-Access Memory, RAM); the memory 703 may also be a non-volatile memory (Non-Volatile Memory), such as a read-only memory (Read-Only Memory, ROM), a flash memory (Flash Memory), a hard disk (HDD), or a solid-state drive (SSD), or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 703 may also be a combination of the above.
The specific connection medium between the processor 702, the memory 703, and the transceiver 701 is not limited in the embodiments of the present disclosure. Fig. 7 illustrates, by way of example only, the memory 703, the processor 702, and the transceiver 701 connected by a bus 704, which is shown in bold lines in fig. 7; the connections between other components are likewise illustrative and not limiting. The bus may be classified into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in fig. 7, but this does not mean that there is only one bus or one type of bus.
The processor 702 may be dedicated hardware or a processor running software. When the processor 702 runs software, the processor 702 reads the software instructions stored in the memory 703 and, driven by the software instructions, performs the audio encoding method referred to in the foregoing embodiments.
The embodiments of the present disclosure also provide a storage medium; when the instructions in the storage medium are executed by a processor of an electronic device, the electronic device is capable of performing the audio encoding method referred to in the foregoing embodiments.
In some possible implementations, aspects of the audio encoding method provided by the present disclosure may also be implemented in the form of a program product including program code for causing an electronic device to perform the audio encoding method as referred to in the foregoing embodiments, when the program product is run on the electronic device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a RAM, a ROM, an erasable programmable read-only memory (EPROM), a flash memory, an optical fiber, a compact disc read-only memory (Compact Disc Read-Only Memory, CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The program product for audio encoding in embodiments of the present disclosure may take the form of a CD-ROM and include program code that can run on a computing device. However, the program product of the present disclosure is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, radio Frequency (RF), etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" programming language or similar languages. The program code may execute entirely on the user's computing device, partly on the user's device as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In cases involving a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, such as a local area network (Local Area Network, LAN) or a wide area network (Wide Area Network, WAN), or may be connected to an external computing device (e.g., over the internet using an internet service provider).
It should be noted that although several units or sub-units of the apparatus are mentioned in the above detailed description, such a division is merely exemplary and not mandatory. Indeed, the features and functions of two or more of the units described above may be embodied in one unit in accordance with embodiments of the present disclosure. Conversely, the features and functions of one unit described above may be further divided into a plurality of units to be embodied.
Furthermore, although the operations of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in that particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be decomposed into multiple steps.
It will be apparent to those skilled in the art that embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The present disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the preferred embodiments of the present disclosure have been described, additional variations and modifications to those embodiments may occur to those skilled in the art once the basic inventive concept is known. It is therefore intended that the appended claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the disclosure.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present disclosure without departing from the spirit or scope of the disclosure. Thus, the present disclosure is intended to include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (8)

1. An audio encoding method, comprising:
acquiring a target audio frame and encoded redundancy information, wherein the encoded redundancy information comprises first redundancy indication information and second redundancy indication information, the first redundancy indication information is used for indicating whether a first audio frame is a redundant frame of the target audio frame, the first audio frame is already encoded and is separated from the target audio frame by at least one audio frame, the second redundancy indication information is used for indicating whether a second audio frame is a redundant frame of the target audio frame, and the second audio frame is positioned between the first audio frame and the target audio frame;
performing first coding processing on the target audio frame to obtain conventional code stream data of the target audio frame;
in response to the first redundancy indication information indicating that the first audio frame is the redundant frame of the target audio frame and the second redundancy indication information indicating that the second audio frame is the redundant frame of the target audio frame, performing source redundancy processing on the regular code stream data of the target audio frame, the redundant code stream data of the first audio frame, and the redundant code stream data of the second audio frame by using a range coding method to obtain encoded data of the target audio frame, wherein the redundant code stream data of the first audio frame is obtained by performing second encoding processing on the first audio frame, the redundant code stream data of the second audio frame is obtained by performing second encoding processing on the second audio frame, and the audio compression degree corresponding to the second encoding processing is higher than the audio compression degree corresponding to the first encoding processing;
and outputting the coded data of the target audio frame.
2. The method of claim 1, wherein the first audio frame is spaced one audio frame from the target audio frame.
3. The method as recited in claim 1, further comprising:
in response to the first redundancy indication information indicating that the first audio frame is a redundant frame of the target audio frame or the second redundancy indication information indicating that the second audio frame is a redundant frame of the target audio frame, performing second encoding processing on the target audio frame to obtain redundant code stream data of the target audio frame;
and storing redundant code stream data of the target audio frame.
4. An audio encoding apparatus, comprising:
an obtaining module, configured to obtain a target audio frame and encoded redundancy information, wherein the encoded redundancy information includes first redundancy indication information and second redundancy indication information, the first redundancy indication information is used to indicate whether a first audio frame is a redundant frame of the target audio frame, the first audio frame has been encoded and is separated from the target audio frame by at least one audio frame, the second redundancy indication information is used to indicate whether a second audio frame is a redundant frame of the target audio frame, and the second audio frame is located between the first audio frame and the target audio frame;
the encoding module is used for carrying out first encoding processing on the target audio frame to obtain conventional code stream data of the target audio frame;
a redundancy processing module, configured to, in response to the first redundancy indication information indicating that the first audio frame is the redundant frame of the target audio frame and the second redundancy indication information indicating that the second audio frame is the redundant frame of the target audio frame, perform source redundancy processing on the regular code stream data of the target audio frame, the redundant code stream data of the first audio frame, and the redundant code stream data of the second audio frame by using a range coding method to obtain encoded data of the target audio frame, wherein the redundant code stream data of the first audio frame is obtained by performing second encoding processing on the first audio frame, the redundant code stream data of the second audio frame is obtained by performing second encoding processing on the second audio frame, and the audio compression degree corresponding to the second encoding processing is higher than the audio compression degree corresponding to the first encoding processing;
and the output module is used for outputting the coded data of the target audio frame.
5. The apparatus of claim 4, wherein the first audio frame is spaced one audio frame from the target audio frame.
6. The apparatus of claim 4, further comprising a storage module:
the encoding module is further configured to, in response to the first redundancy indication information indicating that the first audio frame is a redundant frame of the target audio frame or the second redundancy indication information indicating that the second audio frame is a redundant frame of the target audio frame, perform second encoding processing on the target audio frame to obtain redundant code stream data of the target audio frame;
the storage module is used for storing redundant code stream data of the target audio frame.
7. An electronic device, comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein:
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-3.
8. A storage medium, wherein, when instructions in the storage medium are executed by a processor of an electronic device, the electronic device is capable of performing the method of any one of claims 1-3.
CN202110890333.6A 2021-08-04 2021-08-04 Audio coding method, device, electronic equipment and storage medium Active CN113744744B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110890333.6A CN113744744B (en) 2021-08-04 2021-08-04 Audio coding method, device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113744744A CN113744744A (en) 2021-12-03
CN113744744B true CN113744744B (en) 2023-10-03

Family

ID=78730152


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118230743A (en) * 2022-12-20 2024-06-21 北京字跳网络技术有限公司 Audio processing method, device and equipment

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109524015A (en) * 2017-09-18 2019-03-26 杭州海康威视数字技术股份有限公司 Audio coding method, coding/decoding method, device and audio coding and decoding system




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant