WO2022183841A1 - Decoding method, apparatus and computer-readable storage medium - Google Patents
Decoding method, apparatus and computer-readable storage medium
- Publication number
- WO2022183841A1 (PCT/CN2022/070088)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- stream
- audio
- header information
- segment
- decoding
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/16—Vocoder architecture
- G10L19/167—Audio streaming, i.e. formatting and decoding of an encoded audio signal representation into a data stream for transmission or storage purposes
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/04—Segmentation; Word boundary detection
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/225—Feedback of the input speech
Definitions
- the present disclosure relates to the field of computer technology, and in particular, to a decoding method, an apparatus, and a computer-readable storage medium.
- Real-time decoding of an audio stream requires the audio format, parameters, etc., and this information is generally carried in the header information.
- a decoding method comprising: buffering one or more stream segments of a received data stream, wherein the data stream includes an audio stream; parsing the buffered stream segments until header information is obtained by parsing; saving the header information; and decoding the stream segments of the audio stream in each received stream segment according to the header information, until the decoding of the audio stream is completed.
- parsing the cached stream segments until the header information is obtained by parsing includes: determining whether the total data length of all currently cached stream segments reaches a preset frame length; when the total data length reaches the preset frame length, parsing the cached data from the start data up to the data satisfying the preset frame length; determining whether the header information is successfully parsed; if the header information is not successfully parsed, increasing the preset frame length by a preset value and updating the preset frame length; and repeating the above steps until the header information is obtained by parsing.
- parsing the cached stream segments until the header information is obtained by parsing further includes: when the total data length of all currently cached stream segments does not reach the preset frame length, waiting until the next stream segment is received and buffered, and then re-executing the determination of whether the total data length of all buffered stream segments reaches the preset frame length.
- decoding the stream segments of the audio stream in the received stream segments according to the header information includes: determining the length of the audio frame according to the header information; and decoding the stream segments of the audio stream in each received stream segment by distinguishing different audio frames according to the length of the audio frame.
- decoding the stream segments of the audio stream in the received stream segments by distinguishing different audio frames includes: dividing the current stream segment of the audio stream in the order of the data encapsulation format according to the length of the audio frame, to obtain one or more complete audio frames; decoding the one or more complete audio frames; determining whether the tail data of the current stream segment of the audio stream belongs to an incomplete audio frame; if the tail data of the current stream segment of the audio stream belongs to an incomplete audio frame, buffering the incomplete audio frame; after the next stream segment of the audio stream is received, splicing the next stream segment with the incomplete audio frame to obtain a spliced stream segment; and taking the spliced stream segment as the current stream segment of the audio stream, and repeating the above steps until the decoding of the last stream segment of the audio stream is completed.
- decoding the stream segments of the audio stream among the received stream segments according to the header information until the decoding of the audio stream is completed includes: when decoding the current stream segment of the audio stream according to the header information fails, parsing the current stream segment, or the current stream segment and the stream segments following it, until new header information is obtained by parsing; and decoding the stream segments following the current stream segment according to the new header information, until the decoding of the audio stream is completed.
- parsing the cached stream segments until the header information is obtained by parsing includes: calling the Open avformat method in FFmpeg to parse the cached stream segments until the header information is obtained by parsing.
- decoding the stream segments of the audio stream among the received stream segments according to the header information includes: determining, according to the header information, whether the data stream includes other data streams than the audio stream; if the data stream includes other data streams than the audio stream, separating the other data streams from the audio stream; determining the format information of the audio stream according to the header information; transcoding each stream segment of the audio stream into an original audio stream according to the format information of the audio stream; and resampling the original audio stream according to a preset bit rate.
- a decoding apparatus comprising: a buffering module configured to buffer one or more stream segments of a received data stream, wherein the data stream includes an audio stream; a header information parsing module configured to parse the cached stream segments until header information is obtained by parsing; a header information saving module configured to save the header information; and a decoding module configured to decode the stream segments of the audio stream in each received stream segment according to the header information, until the decoding of the audio stream is completed.
- a decoding apparatus comprising: a processor; and a memory coupled to the processor and configured to store instructions which, when executed by the processor, cause the processor to execute the decoding method of any of the foregoing embodiments.
- a non-transitory computer-readable storage medium having a computer program stored thereon, wherein, when the program is executed by a processor, the decoding method of any of the foregoing embodiments is implemented.
- FIG. 1 shows a schematic flowchart of a decoding method according to some embodiments of the present disclosure.
- FIG. 2 shows a schematic structural diagram of an audio stream according to some embodiments of the present disclosure.
- FIG. 3 shows a schematic flowchart of decoding methods according to other embodiments of the present disclosure.
- FIG. 4 shows a schematic structural diagram of a decoding apparatus according to some embodiments of the present disclosure.
- FIG. 5 shows a schematic structural diagram of a decoding apparatus according to other embodiments of the present disclosure.
- FIG. 6 shows a schematic structural diagram of a decoding apparatus according to further embodiments of the present disclosure.
- a technical problem to be solved by the present disclosure is: how to realize real-time decoding of audio streams.
- the present disclosure provides a decoding method that can be used for real-time decoding of an audio stream in an artificial intelligence customer service scenario, which will be described below with reference to FIGS. 1 to 3 .
- FIG. 1 is a flowchart of some embodiments of the disclosed decoding method. As shown in FIG. 1 , the method of this embodiment includes steps S102 to S108.
- step S102 one or more stream segments of the received data stream are buffered.
- Data streams include audio streams, and can also include other data streams, for example, non-audio data streams such as video streams. When an audio stream is mixed with other data streams, the different streams need to be separated in subsequent steps, as described in the examples that follow.
- a data stream is divided into multiple stream segments during transmission, and each stream segment can be encapsulated into a data packet (Package) for transmission.
- After receiving a data packet, the decoding apparatus (the apparatus that executes the decoding method of the present disclosure) parses the data packet to obtain stream segments, and buffers the stream segments.
- the scheme of the present disclosure can be implemented based on the FFmpeg API.
- For example, two modules, avformat and avio context, can be initialized (Init avformat / Init avio context) for subsequent header information parsing and audio stream reading, respectively, and the Buffer stream method can be called to buffer stream segments.
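- As an illustration of the buffering step only (the following Python sketch is not the FFmpeg-based implementation referenced above; the class and method names are hypothetical), received stream segments can be cached as raw bytes until later stages consume them:

```python
class SegmentBuffer:
    """Caches raw stream segments of the received data stream (illustrative sketch)."""

    def __init__(self):
        self._segments = []  # one bytes object per received stream segment

    def push(self, segment: bytes) -> None:
        """Buffer one received stream segment."""
        self._segments.append(segment)

    def total_length(self) -> int:
        """Total data length of all currently cached stream segments."""
        return sum(len(s) for s in self._segments)

    def peek(self, n: int) -> bytes:
        """Return the first n bytes of the cached data without consuming them."""
        return b"".join(self._segments)[:n]
```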
- step S104 the cached stream segment is parsed until header information is obtained by parsing.
- The header information includes, for example, format information of the audio stream and at least one parameter.
- The at least one parameter includes, for example, at least one of sampling rate, bit depth, number of channels, compression ratio, etc., and is not limited to these examples. Since the division into stream segments is uncertain, a stream segment may contain the complete header information, or it may contain only part of the header information, in which case multiple stream segments are required to obtain the complete header information. In some embodiments, after each stream segment is cached, all previously cached stream segments are parsed to determine whether the header information is successfully parsed; if not, the next stream segment is cached and the above process is repeated until the header information is successfully parsed.
- The preset frame length may be obtained statistically from the length of the header information in historical audio streams. After each stream segment is cached, it can be determined whether the total data length of all currently cached stream segments reaches the preset frame length. If it does not, the apparatus waits for the next stream segment to be buffered and then re-executes the step of determining whether the total data length of all buffered stream segments reaches the preset frame length. Once the total data length of all currently buffered stream segments reaches the preset frame length, the data from the start data up to the data satisfying the preset frame length in the currently cached stream segments is parsed.
- For example, assume the preset frame length is 200 bytes.
- The data starting from the first byte of the first buffered stream segment up to a length of 200 bytes is taken as the data to be parsed, and this data is parsed to determine whether the header information is successfully obtained. If the header information is successfully parsed, the header parsing process stops. If parsing the header information fails, the preset frame length is increased by a preset value and updated, for example, from 200 bytes to 300 bytes. Afterwards, the step of determining whether the total data length of all currently buffered stream segments reaches the preset frame length is executed again.
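- The retry logic described above can be sketched as follows; `try_parse_header` and `wait_for_next_segment` are hypothetical placeholders for the actual header parser (for example, the avformat-based parsing mentioned earlier) and for receiving the next segment, and `buffer` is the `SegmentBuffer` sketched above:

```python
def obtain_header(buffer, try_parse_header, wait_for_next_segment,
                  preset_len=200, step=100):
    """Parse cached stream segments until header information is obtained (illustrative sketch).

    try_parse_header(data) -> header dict on success, or None
    wait_for_next_segment() -> bytes (the next received stream segment)
    """
    while True:
        # Buffer segments until the total cached length reaches the preset frame length.
        while buffer.total_length() < preset_len:
            buffer.push(wait_for_next_segment())
        header = try_parse_header(buffer.peek(preset_len))
        if header is not None:
            return header      # header information successfully parsed; caller saves it
        preset_len += step     # parse failed: grow the preset frame length (e.g. 200 -> 300 bytes)
```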
- step S106 the header information is saved.
- step S108 the stream segments of the audio stream among the received stream segments are decoded according to the header information, until the decoding of the audio stream is completed.
- each received stream segment is directly decoded using the header information.
- the data stream contains an audio stream and other data streams (non-audio data streams).
- a stream separation operation is required.
- whether the data stream includes other data streams other than the audio stream is determined according to the header information; if the data stream includes other data streams other than the audio stream, the other data streams are separated from the audio stream. For example, call the Separate stream method in FFmpeg to separate other data streams from the audio stream.
- After the stream segments of the audio stream are separated from each received stream segment, they are decoded using the header information.
- The format information of the audio stream is determined according to the header information; each stream segment of the audio stream is transcoded into an original audio stream according to the format information of the audio stream; and the original audio stream is resampled according to a preset bit rate. The resampled bit rate matches the bit rate of the playback device, which facilitates playback.
- For example, the Parse format method in FFmpeg can be called to perform this format parsing.
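- The per-segment processing described above (separate, transcode, resample) can be summarized in the following sketch; `separate_audio`, `transcode`, and `resample` are hypothetical placeholders for the FFmpeg-based operations named in the text, and the header keys are assumptions:

```python
def process_segment(segment: bytes, header: dict,
                    separate_audio, transcode, resample,
                    target_rate: int) -> bytes:
    """Decode one stream segment of the audio stream (illustrative sketch)."""
    if header.get("has_other_streams"):
        # The data stream carries non-audio streams as well: keep only the audio stream.
        segment = separate_audio(segment)
    raw_audio = transcode(segment, header["format"])  # decode to the original (raw) audio
    return resample(raw_audio, target_rate)           # match the playback device's rate
```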
- the saved header information can be used to correctly decode the entire audio stream.
- However, the header information of different audio files may differ, in which case the decoding process fails.
- In some embodiments, if decoding the current stream segment of the audio stream according to the header information fails, the current stream segment, or the current stream segment together with the stream segments following it, is parsed until new header information is obtained by parsing; the stream segments following the current stream segment are then decoded according to the new header information, until the decoding of the audio stream is completed.
- After the new header information is saved, the originally saved header information can be deleted, and the new header information is used to decode the subsequently received stream segments, until the decoding of the audio stream is completed.
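- This fallback control flow can be sketched as follows; `decode_one` and `reparse_header` are hypothetical placeholders for the decoding and header-parsing operations described above, and decoding failure is modeled here as a raised exception:

```python
def decode_with_header_fallback(segments: list, header: dict,
                                decode_one, reparse_header):
    """Decode successive segments, re-parsing header information when decoding fails (sketch)."""
    for i, segment in enumerate(segments):
        try:
            yield decode_one(segment, header)
        except ValueError:
            # Decoding with the saved header failed, e.g. a different audio file started:
            # parse the current segment (and, if needed, the following ones) for a new header.
            header = reparse_header(segments[i:])
            yield decode_one(segment, header)  # retry the current segment with the new header
```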
- The method of the above embodiment first caches one or more stream segments of the received data stream, continuously parses the cached stream segments until the header information is obtained by parsing, saves the header information, and uses the header information for the subsequent processing of the data stream.
- The stream segments of the audio stream in the received stream segments are decoded according to the saved header information, until the decoding of the audio stream is completed.
- the method of the above embodiment can realize the real-time decoding of the audio stream, and meet the requirement of real-time decoding of the real-time audio stream in the artificial intelligence customer service scenario.
- Specifically, the method of the above embodiment caches the stream segments in an audio stream buffer, parses and saves the header information (including the format information and parameters of the audio stream, etc.), and obtains from the header information the format information and at least one parameter of the audio stream.
- The decoder type can be obtained from the format information of the audio stream, so that for stream segments of the audio stream received later, the previously cached decoder type is used to link the corresponding decoder engine, and the subsequent stream segments are decoded according to the at least one parameter of the audio stream.
- In this way, real-time decoding can be achieved, which solves the problem that most stream segments do not contain header information and therefore cannot be decoded by the FFmpeg tool.
- Because the transmitted stream segments are not necessarily divided at integer multiples of the audio frame length, there may be a problem of incomplete audio frames.
- stream segment 1 of the audio stream contains audio frame (Frame) 1, audio frame 2 and a part of audio frame 3, while stream segment 2 contains another part of audio frame 3.
- If the decoder is used to decode stream segments 1 and 2 directly according to the header information, errors are reported.
- the present disclosure also provides a solution.
- In some embodiments, the length of the audio frame is determined according to the header information; according to the length of the audio frame, the stream segments of the audio stream in each received stream segment are decoded by distinguishing different audio frames.
- The length of the audio frame may be determined from the parameters included in the header information, for example from the sampling rate, bit depth, number of channels, etc.; reference may be made to related art, and details are not repeated here.
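- For uncompressed PCM audio, for example, the audio frame length in bytes follows directly from these parameters; the sketch below is a simplified illustration (compressed formats define their frame sizes differently, and the 20 ms frame duration is an assumption):

```python
def pcm_frame_length(sample_rate: int, bit_depth: int, channels: int,
                     frame_duration_ms: int = 20) -> int:
    """Bytes per audio frame for uncompressed PCM audio (illustrative calculation)."""
    samples_per_frame = sample_rate * frame_duration_ms // 1000
    return samples_per_frame * channels * (bit_depth // 8)

# e.g. 16 kHz, 16-bit, mono, 20 ms frames -> 640 bytes per frame
assert pcm_frame_length(16000, 16, 1) == 640
```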
- In some embodiments, as shown in FIG. 3, decoding the stream segments of the audio stream in the received stream segments according to the header information includes steps S302 to S316.
- step S302 the length of the audio frame is determined according to the header information.
- step S304 if the stream segment where the header information is located also contains audio data, the stream segment is regarded as the current stream segment of the audio stream.
- step S306 for the current stream segment, the current stream segment is divided according to the length of the audio frame and in the order of the data encapsulation format, to obtain one or more complete audio frames.
- data is arranged in a left-to-right or front-to-back order in a stream segment.
- In the example above, the tail data of stream segment 1 belongs to the incomplete audio frame 3.
- step S308 the one or more complete audio frames are decoded.
- step S310 it is determined whether the current stream segment is the last stream segment, if so, stop, otherwise step S312 is performed.
- step S312 it is determined whether the tail data of the current stream segment of the audio stream belongs to an incomplete audio frame. If yes, go to step S314, otherwise go to step S313.
- step S313 after waiting for the next stream segment of the audio stream to be received, the next stream segment is regarded as the current stream segment, and the process returns to step S306 to restart the execution.
- step S314 the incomplete audio frame is buffered.
- step S316 after the next stream segment of the audio stream is received, the next stream segment is spliced with the incomplete audio frame to obtain a spliced stream segment, which is taken as the current stream segment, and the process returns to step S306 to start execution again.
- For example, the part of audio frame 3 in stream segment 1 and the part of audio frame 3 in stream segment 2 are spliced to form a complete audio frame.
- In the method of the above embodiment, incomplete frame information is buffered until the next stream segment is received and then spliced, which solves the problem that a stream segment containing an incomplete audio frame cannot be decoded correctly.
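- Taken together, the splitting-and-splicing loop of steps S302 to S316 can be sketched as follows; `decode_frame` is a hypothetical placeholder for the actual frame decoder:

```python
def decode_audio_segments(segments, frame_len: int, decode_frame):
    """Split segments into complete audio frames, splicing any incomplete tail (sketch).

    segments: iterable of bytes, the stream segments of the audio stream in order
    frame_len: audio frame length determined from the header information
    """
    leftover = b""                                   # cached incomplete audio frame, if any
    for segment in segments:
        current = leftover + segment                 # splice with the previous incomplete tail
        n_complete = len(current) // frame_len
        for i in range(n_complete):                  # decode each complete audio frame in order
            yield decode_frame(current[i * frame_len:(i + 1) * frame_len])
        leftover = current[n_complete * frame_len:]  # tail data of an incomplete frame
```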
- The present disclosure also provides a decoding apparatus, which will be described below in conjunction with FIG. 4.
- FIG. 4 is a structural diagram of some embodiments of the disclosed decoding apparatus.
- The apparatus 40 of this embodiment includes: a buffering module 410, a header information parsing module 420, a header information saving module 430, and a decoding module 440.
- the buffering module 410 is configured to buffer the stream segments of the received data stream, wherein the data stream includes an audio stream.
- the header information parsing module 420 is configured to parse the cached one or more stream segments until the header information is obtained by parsing.
- The header information parsing module 420 is configured to determine whether the total data length of all currently buffered stream segments reaches a preset frame length; when the total data length of all currently buffered stream segments reaches the preset frame length, parse the buffered data from the start data up to the data satisfying the preset frame length; determine whether the header information is successfully parsed; if the header information is not successfully parsed, increase the preset frame length by a preset value and update the preset frame length; and repeat the above steps until the header information is obtained by parsing.
- The header information parsing module 420 is configured to, when the total data length of all currently buffered stream segments does not reach the preset frame length, wait for the next stream segment to be buffered and then re-execute the determination of whether the total data length of all buffered stream segments reaches the preset frame length.
- the header information parsing module 420 is configured to call the Open avformat method in FFmpeg to parse the cached stream segment until the parsing obtains the header information.
- the header information saving module 430 is used to save the header information.
- the decoding module 440 is configured to decode the stream segments of the audio stream among the received stream segments according to the header information, until the decoding of the audio stream is completed.
- the decoding module 440 is configured to determine the length of the audio frame according to the header information; according to the length of the audio frame, the stream segments of the audio stream in each received stream segment are decoded by distinguishing different audio frames.
- The decoding module 440 is configured to divide the current stream segment of the audio stream in the order of the data encapsulation format according to the length of the audio frame to obtain one or more complete audio frames; decode the one or more complete audio frames; determine whether the tail data of the current stream segment of the audio stream belongs to an incomplete audio frame; if the tail data of the current stream segment of the audio stream belongs to an incomplete audio frame, buffer the incomplete audio frame; after the next stream segment of the audio stream is received, splice the next stream segment with the incomplete audio frame to obtain a spliced stream segment; take the spliced stream segment as the current stream segment of the audio stream; and repeat the above steps until the decoding of the last stream segment of the audio stream is completed.
- The decoding module 440 is configured to, if decoding the current stream segment of the audio stream according to the header information fails, parse the current stream segment, or the current stream segment and the stream segments following it, until new header information is obtained by parsing, and decode the stream segments following the current stream segment according to the new header information, until the decoding of the audio stream is completed.
- The decoding module 440 is configured to determine, according to the header information, whether the data stream includes other data streams than the audio stream; if the data stream includes other data streams than the audio stream, separate the other data streams from the audio stream; determine the format information of the audio stream according to the header information; transcode each stream segment of the audio stream into an original audio stream according to the format information of the audio stream; and resample the original audio stream according to a preset bit rate.
- The decoding module 440 is configured to call the Separate stream method in FFmpeg to separate the other data streams from the audio stream, and to call the Parse format method in FFmpeg to determine the format information of the audio stream according to the header information, transcode each stream segment of the audio stream into an original audio stream according to the format information of the audio stream, and resample the original audio stream according to a preset bit rate.
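- As a usage note only, the sketches given earlier can be chained to mirror the module structure of the apparatus (buffer, parse header, decode); all helper names remain hypothetical and come from the earlier sketches:

```python
def run_pipeline(wait_for_next_segment, try_parse_header, decode_frame):
    """End-to-end illustration tying the earlier sketches together (not the actual apparatus)."""
    buffer = SegmentBuffer()                                    # buffering module
    header = obtain_header(buffer, try_parse_header,            # header information parsing module
                           wait_for_next_segment)
    frame_len = pcm_frame_length(header["sample_rate"],         # saved header information in use
                                 header["bit_depth"], header["channels"])
    segments = iter(wait_for_next_segment, b"")                 # remaining segments until empty
    # (for simplicity, data already consumed while parsing the header is not re-decoded here)
    yield from decode_audio_segments(segments, frame_len, decode_frame)  # decoding module
```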
- the decoding apparatuses in the embodiments of the present disclosure may each be implemented by various computing devices or computer systems, which will be described below with reference to FIG. 5 and FIG. 6 .
- FIG. 5 is a structural diagram of some embodiments of the disclosed decoding apparatus.
- The apparatus 50 of this embodiment includes a memory 510 and a processor 520 coupled to the memory 510, the processor 520 being configured to execute, based on instructions stored in the memory 510, the decoding method in any of the embodiments of the present disclosure.
- the memory 510 may include, for example, a system memory, a fixed non-volatile storage medium, and the like.
- the system memory stores, for example, an operating system, an application program, a boot loader (Boot Loader), a database, and other programs.
- FIG. 6 is a structural diagram of other embodiments of the disclosed decoding apparatus.
- The apparatus 60 of this embodiment includes: a memory 610 and a processor 620, which are similar to the memory 510 and the processor 520, respectively. The apparatus 60 may also include an input-output interface 630, a network interface 640, a storage interface 650, and the like. These interfaces 630, 640, 650, the memory 610 and the processor 620 can be connected, for example, through a bus 660.
- the input and output interface 630 provides a connection interface for input and output devices such as a display, a mouse, a keyboard, and a touch screen.
- the network interface 640 provides a connection interface for various networked devices, for example, it can be connected to a database server or a cloud storage server.
- The storage interface 650 provides a connection interface for external storage devices such as SD cards and USB flash drives.
- embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable non-transitory storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein .
- These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction means, the instruction means implementing the functions specified in the flow or flows of the flowcharts and/or the block or blocks of the block diagrams.
- These computer program instructions can also be loaded onto a computer or other programmable data processing device to cause a series of operational steps to be performed on the computer or other programmable device to produce a computer-implemented process, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in a flow or flows of the flowcharts and/or a block or blocks of the block diagrams.
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Claims (12)
- A decoding method, comprising: buffering one or more stream segments of a received data stream, wherein the data stream comprises an audio stream; parsing the buffered stream segments until header information is obtained by parsing; saving the header information; and decoding the stream segments of the audio stream in each received stream segment according to the header information, until the decoding of the audio stream is completed.
- The decoding method according to claim 1, wherein parsing the buffered stream segments until the header information is obtained by parsing comprises: determining whether the total data length of all currently buffered stream segments reaches a preset frame length; when the total data length of all currently buffered stream segments reaches the preset frame length, parsing the data in the buffered stream segments from the start data up to the data satisfying the preset frame length; determining whether the header information is successfully parsed; if the header information is not successfully parsed, increasing the preset frame length by a preset value and updating the preset frame length; and repeating the above steps until the header information is obtained by parsing.
- The decoding method according to claim 2, wherein parsing the buffered stream segments until the header information is obtained by parsing further comprises: when the total data length of all currently buffered stream segments does not reach the preset frame length, waiting until the next stream segment is received and buffered, and then re-executing the determination of whether the total data length of all buffered stream segments reaches the preset frame length.
- The decoding method according to claim 1, wherein decoding the stream segments of the audio stream in each received stream segment according to the header information comprises: determining the length of an audio frame according to the header information; and decoding the stream segments of the audio stream in each received stream segment by distinguishing different audio frames according to the length of the audio frame.
- The decoding method according to claim 4, wherein decoding the stream segments of the audio stream in each received stream segment by distinguishing different audio frames according to the length of the audio frame comprises: dividing the current stream segment of the audio stream in the order of the data encapsulation format according to the length of the audio frame, to obtain one or more complete audio frames; decoding the one or more complete audio frames; determining whether the tail data of the current stream segment of the audio stream belongs to an incomplete audio frame; when the tail data of the current stream segment of the audio stream belongs to an incomplete audio frame, buffering the incomplete audio frame; after the next stream segment of the audio stream is received, splicing the next stream segment with the incomplete audio frame to obtain a spliced stream segment; and taking the spliced stream segment as the current stream segment of the audio stream, and repeating the above steps until the decoding of the last stream segment of the audio stream is completed.
- The decoding method according to claim 1, wherein decoding the stream segments of the audio stream in each received stream segment according to the header information until the decoding of the audio stream is completed comprises: when decoding the current stream segment of the audio stream according to the header information fails, parsing the current stream segment, or the current stream segment and the stream segments following the current stream segment, until new header information is obtained by parsing; and decoding the stream segments following the current stream segment according to the new header information, until the decoding of the audio stream is completed.
- The decoding method according to claim 1, wherein parsing the buffered stream segments until the header information is obtained by parsing comprises: calling the Open avformat method in FFmpeg to parse the buffered stream segments until the header information is obtained by parsing.
- The decoding method according to claim 1, wherein decoding the stream segments of the audio stream in each received stream segment according to the header information comprises: determining, according to the header information, whether the data stream comprises other data streams than the audio stream; when the data stream comprises other data streams than the audio stream, separating the other data streams from the audio stream; determining the format information of the audio stream according to the header information; transcoding each stream segment of the audio stream into an original audio stream according to the format information of the audio stream; and resampling the original audio stream according to a preset bit rate.
- The decoding method according to claim 8, wherein the Separate stream method in FFmpeg is called to separate the other data streams from the audio stream; and the Parse format method in FFmpeg is called to determine the format information of the audio stream according to the header information, transcode each stream segment of the audio stream into the original audio stream according to the format information of the audio stream, and resample the original audio stream according to the preset bit rate.
- A decoding apparatus, comprising: a buffering module configured to buffer one or more stream segments of a received data stream, wherein the data stream comprises an audio stream; a header information parsing module configured to parse the buffered stream segments until header information is obtained by parsing; a header information saving module configured to save the header information; and a decoding module configured to decode the stream segments of the audio stream in each received stream segment according to the header information, until the decoding of the audio stream is completed.
- A decoding apparatus, comprising: a processor; and a memory coupled to the processor and configured to store instructions which, when executed by the processor, cause the processor to execute the decoding method according to any one of claims 1-9.
- A non-transitory computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the steps of the method according to any one of claims 1-9.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2023553356A JP2024509833A (ja) | 2021-03-02 | 2022-01-04 | 復号化方法および装置、ならびにコンピュータ可読記憶媒体 |
US18/546,387 US20240233740A9 (en) | 2021-03-02 | 2022-01-04 | Decoding method and apparatus, and computer readable storage medium |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110229441.9A CN114093375A (zh) | 2021-03-02 | 2021-03-02 | 解码方法、装置和计算机可读存储介质 |
CN202110229441.9 | 2021-03-02 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022183841A1 true WO2022183841A1 (zh) | 2022-09-09 |
Family
ID=80295963
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2022/070088 WO2022183841A1 (zh) | 2021-03-02 | 2022-01-04 | 解码方法、装置和计算机可读存储介质 |
Country Status (4)
Country | Link |
---|---|
US (1) | US20240233740A9 (zh) |
JP (1) | JP2024509833A (zh) |
CN (1) | CN114093375A (zh) |
WO (1) | WO2022183841A1 (zh) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050018775A1 (en) * | 2003-07-23 | 2005-01-27 | Mk Subramanian | System and method for audio/video synchronization |
CN1909657A (zh) * | 2005-08-05 | 2007-02-07 | 乐金电子(惠州)有限公司 | Mpeg音频解码方法 |
CN102254560A (zh) * | 2010-05-19 | 2011-11-23 | 安凯(广州)微电子技术有限公司 | 一种移动数字电视录像中的音频处理方法 |
CN104113777A (zh) * | 2014-08-01 | 2014-10-22 | 广州金山网络科技有限公司 | 一种音频流解码方法及装置 |
CN104202656A (zh) * | 2014-09-16 | 2014-12-10 | 国家计算机网络与信息安全管理中心 | 网络音频mp3流乱序分段解码方法 |
CN104780422A (zh) * | 2014-01-13 | 2015-07-15 | 北京兆维电子(集团)有限责任公司 | 流媒体播放方法及流媒体播放器 |
CN108122558A (zh) * | 2017-12-22 | 2018-06-05 | 深圳国微技术有限公司 | 一种latm aac音频流的实时转容实现方法及装置 |
CN108389582A (zh) * | 2016-12-12 | 2018-08-10 | 中国航空工业集团公司西安航空计算技术研究所 | Mpeg-2/4aac音频解码错误检测及处理方法 |
-
2021
- 2021-03-02 CN CN202110229441.9A patent/CN114093375A/zh active Pending
-
2022
- 2022-01-04 WO PCT/CN2022/070088 patent/WO2022183841A1/zh active Application Filing
- 2022-01-04 US US18/546,387 patent/US20240233740A9/en active Pending
- 2022-01-04 JP JP2023553356A patent/JP2024509833A/ja active Pending
Also Published As
Publication number | Publication date |
---|---|
US20240233740A9 (en) | 2024-07-11 |
US20240135942A1 (en) | 2024-04-25 |
JP2024509833A (ja) | 2024-03-05 |
CN114093375A (zh) | 2022-02-25 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 22762310 Country of ref document: EP Kind code of ref document: A1 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 18546387 Country of ref document: US |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2023553356 Country of ref document: JP |
|
WWE | Wipo information: entry into national phase |
Ref document number: 11202305980W Country of ref document: SG |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM1205A DATED 16.01.2024) |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 22762310 Country of ref document: EP Kind code of ref document: A1 |