WO2022183841A1 - Decoding method, apparatus and computer-readable storage medium - Google Patents

Decoding method, apparatus and computer-readable storage medium

Info

Publication number
WO2022183841A1
Authority
WO
WIPO (PCT)
Prior art keywords
stream
audio
header information
segment
decoding
Prior art date
Application number
PCT/CN2022/070088
Other languages
English (en)
French (fr)
Inventor
崔午阳
吴俊仪
蔡玉玉
全刚
杨帆
丁国宏
Original Assignee
北京沃东天骏信息技术有限公司
北京京东世纪贸易有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京沃东天骏信息技术有限公司, 北京京东世纪贸易有限公司
Priority to JP2023553356A priority Critical patent/JP2024509833A/ja
Priority to US18/546,387 priority patent/US20240233740A9/en
Publication of WO2022183841A1 publication Critical patent/WO2022183841A1/zh

Links

Images

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 — Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 — Speech or audio signals analysis-synthesis techniques using predictive techniques
    • G10L19/16 — Vocoder architecture
    • G10L19/167 — Audio streaming, i.e. formatting and decoding of an encoded audio signal representation into a data stream for transmission or storage purposes
    • G10L15/00 — Speech recognition
    • G10L15/04 — Segmentation; Word boundary detection
    • G10L15/22 — Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/225 — Feedback of the input speech

Definitions

  • the present disclosure relates to the field of computer technology, and in particular, to a decoding method, an apparatus, and a computer-readable storage medium.
  • Real-time decoding of an audio stream requires obtaining the audio format, parameters, and so on, and this information is generally contained in the header information.
  • a decoding method comprising: buffering one or more stream segments of a received data stream, wherein the data stream includes an audio stream; parsing the buffered stream segments until header information is obtained by parsing; saving the header information; and decoding the stream segments of the audio stream among the received stream segments according to the header information, until the decoding of the audio stream is completed.
  • parsing the buffered stream segments until the header information is obtained by parsing includes: determining whether the total data length of all currently buffered stream segments reaches a preset frame length; if it does, parsing the data in the buffered stream segments from the start data up to the preset frame length; determining whether the header information is successfully parsed; if not, increasing the preset frame length by a preset value and updating it; and repeating the above steps until the header information is obtained by parsing.
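As an illustration of this growing-window parse, the following Python sketch buffers segments until the preset frame length is reached, attempts a parse, and enlarges the window on failure. The toy `HDR` header format and all function names are assumptions made for demonstration only; a real implementation would probe with a demuxer such as FFmpeg's avformat instead.

```python
# Illustrative sketch of the growing-window header parse (not the patent's
# implementation). `try_parse_header` is a toy probe standing in for a real
# demuxer call.

def try_parse_header(data):
    """Toy header format: b'HDR' + 1 length byte + payload. Returns the
    payload if the window contains a complete header, else None."""
    if len(data) >= 4 and data[:3] == b"HDR":
        n = data[3]
        if len(data) >= 4 + n:
            return data[4:4 + n]
    return None

def parse_header(segments, frame_len=200, step=100):
    """Buffer stream segments until a header parses out of the first
    `frame_len` bytes; on failure, enlarge the preset frame length by
    `step` and retry. Returns (header, all_buffered_bytes)."""
    buf = bytearray()
    it = iter(segments)
    while True:
        # Wait until the total buffered length reaches the preset frame length.
        while len(buf) < frame_len:
            buf.extend(next(it))
        header = try_parse_header(bytes(buf[:frame_len]))
        if header is not None:
            return header, bytes(buf)
        frame_len += step  # increase the preset frame length and try again
```

The buffered bytes are returned alongside the header so that audio data arriving in the same segments is not lost.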
  • parsing the buffered stream segments until the header information is obtained further includes: if the total data length of all currently buffered stream segments does not reach the preset frame length, waiting until the next stream segment is received and buffered, and then re-determining whether the total data length of all buffered stream segments reaches the preset frame length.
  • decoding the stream segments of the audio stream among the received stream segments according to the header information includes: determining the length of an audio frame according to the header information; and, according to the audio frame length, decoding the stream segments of the audio stream among the received stream segments while distinguishing different audio frames.
  • decoding the stream segments of the audio stream among the received stream segments while distinguishing different audio frames includes: dividing the current stream segment of the audio stream in the order of the data encapsulation format according to the audio frame length to obtain one or more complete audio frames; decoding the one or more complete audio frames; determining whether the tail data of the current stream segment of the audio stream belongs to an incomplete audio frame; if so, buffering the incomplete audio frame; after the next stream segment of the audio stream is received, splicing the next stream segment with the incomplete audio frame to obtain a spliced stream segment; and taking the spliced stream segment as the current stream segment of the audio stream, repeating the above steps until the decoding of the last stream segment of the audio stream is completed.
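The frame division and tail-splicing logic above can be sketched as follows. This is a simplified model assuming fixed-length frames; `decode_frame` is a hypothetical per-frame decoder supplied by the caller.

```python
def decode_segments(segments, frame_len, decode_frame):
    """Divide each stream segment into complete frames of `frame_len` bytes,
    buffer any incomplete tail frame, and splice it onto the next segment."""
    pending = b""   # buffered tail data of an incomplete audio frame
    decoded = []
    for seg in segments:
        cur = pending + seg                       # spliced stream segment
        n_complete = len(cur) // frame_len
        for i in range(n_complete):               # decode the complete frames
            decoded.append(decode_frame(cur[i * frame_len:(i + 1) * frame_len]))
        pending = cur[n_complete * frame_len:]    # carry the incomplete tail
    return decoded, pending
```

Note that the remainder is always carried forward, so a frame split across two transmitted segments is reassembled before it is decoded.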
  • decoding the stream segments of the audio stream among the received stream segments according to the header information until the decoding of the audio stream is completed includes: if decoding the current stream segment of the audio stream according to the header information fails, parsing the current stream segment, or the current stream segment together with the stream segments following it, until new header information is obtained by parsing; and decoding the stream segments following the current stream segment according to the new header information, until the decoding of the audio stream is completed.
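The failure-recovery behaviour in this bullet can be sketched as follows. `parse_header` and `decode` are hypothetical toy callables (the "header" here is just a one-byte tag, and the first segment is assumed to yield the initial header), not the patent's actual decoder.

```python
def decode_with_recovery(segments, parse_header, decode):
    """When `decode` raises ValueError, re-parse new header information
    from the failing segment and resume decoding with it."""
    header = parse_header(segments[0])
    out = []
    for seg in segments[1:]:
        try:
            out.append(decode(header, seg))
        except ValueError:
            header = parse_header(seg)     # parse new header information
            out.append(decode(header, seg))
    return out

# Toy stand-ins: the "header" is the segment's first byte, and a segment
# decodes only if it starts with the current header byte.
def toy_parse(seg):
    return seg[0]

def toy_decode(header, seg):
    if seg[0] != header:
        raise ValueError("header mismatch")
    return seg[1:]
```

This mirrors the case where a new audio file with different header information appears mid-stream: the old header is discarded and decoding continues under the new one.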
  • parsing the buffered stream segments until the header information is obtained by parsing includes: calling the Open avformat method in FFmpeg to parse the buffered stream segments until the header information is obtained.
  • decoding the stream segments of the audio stream among the received stream segments according to the header information includes: determining, according to the header information, whether the data stream includes data streams other than the audio stream; if so, separating the other data streams from the audio stream; determining the format information of the audio stream according to the header information; transcoding each stream segment of the audio stream into an original audio stream according to the format information; and resampling the original audio stream at a preset bit rate.
  • a decoding apparatus comprising: a buffering module configured to buffer one or more stream segments of a received data stream, wherein the data stream includes an audio stream; a header information parsing module configured to parse the buffered stream segments until header information is obtained by parsing; a header information saving module configured to save the header information; and a decoding module configured to decode the stream segments of the audio stream among the received stream segments according to the header information, until the decoding of the audio stream is completed.
  • a decoding apparatus comprising: a processor; and a memory coupled to the processor and configured to store instructions which, when executed by the processor, cause the processor to execute the decoding method of any of the foregoing embodiments.
  • a non-transitory computer-readable storage medium having a computer program stored thereon, wherein, when the program is executed by a processor, the decoding method of any of the foregoing embodiments is implemented.
  • FIG. 1 shows a schematic flowchart of a decoding method according to some embodiments of the present disclosure.
  • FIG. 2 shows a schematic structural diagram of an audio stream according to some embodiments of the present disclosure.
  • FIG. 3 shows a schematic flowchart of decoding methods according to other embodiments of the present disclosure.
  • FIG. 4 shows a schematic structural diagram of a decoding apparatus according to some embodiments of the present disclosure.
  • FIG. 5 shows a schematic structural diagram of a decoding apparatus according to other embodiments of the present disclosure.
  • FIG. 6 shows a schematic structural diagram of a decoding apparatus according to further embodiments of the present disclosure.
  • a technical problem to be solved by the present disclosure is: how to realize real-time decoding of audio streams.
  • the present disclosure provides a decoding method that can be used for real-time decoding of an audio stream in an artificial intelligence customer service scenario, which will be described below with reference to FIGS. 1 to 3 .
  • FIG. 1 is a flowchart of some embodiments of the disclosed decoding method. As shown in FIG. 1 , the method of this embodiment includes steps S102 to S108.
  • step S102 one or more stream segments of the received data stream are buffered.
  • The data stream includes an audio stream and may also include other, non-audio data streams, for example video streams. When the audio stream is mixed with other data streams, the different streams need to be separated in subsequent steps, as described in later embodiments.
  • a data stream is divided into multiple stream segments during transmission, and each stream segment can be encapsulated into a data packet (Package) for transmission.
  • After receiving a data packet, the decoding apparatus (the apparatus that executes the decoding method of the present disclosure) parses the packet to obtain a stream segment and buffers it.
  • the scheme of the present disclosure can be implemented based on the FFmpeg API.
  • First, the two modules avformat and avio context can be initialized (Init avformat / Init avio context) for the subsequent header information parsing and the reading of the audio stream respectively; the Buffer stream method can be called to buffer stream segments.
  • step S104 the cached stream segment is parsed until header information is obtained by parsing.
  • the header information includes, for example, format information of the audio stream and at least one parameter; the at least one parameter includes, for example, at least one of the sampling rate, bit depth, number of channels, compression ratio, etc., and is not limited to these examples. Since the division into stream segments is uncertain, one stream segment may contain the complete header information, or a stream segment may contain only part of it, so that several stream segments are needed to obtain the complete header information. In some embodiments, after each stream segment is buffered, an attempt is made to parse all previously buffered stream segments to determine whether the header information is successfully parsed; if not, the next stream segment is buffered and the above process is repeated until the header information is successfully parsed.
  • the preset frame length may be obtained statistically from the length of the header information in historical audio streams. After each stream segment is buffered, it can be determined whether the total data length of all currently buffered stream segments reaches the preset frame length. If it does not, the method waits until the next stream segment is received and buffered and then re-executes the step of determining whether the total data length of all buffered stream segments reaches the preset frame length. Once the total data length of all currently buffered stream segments reaches the preset frame length, an attempt is made to parse the data in the currently buffered stream segments from the start data up to the preset frame length.
  • assuming the preset frame length is 200 bytes, the data from the first byte of the first buffered stream segment up to a length of 200 bytes is taken as the data to be parsed, and it is determined whether the header information is successfully parsed from it. If so, the header information parsing process stops. If parsing fails, the preset frame length is increased by a preset value and updated, for example from 200 bytes to 300 bytes; the step of determining whether the total data length of all currently buffered stream segments reaches the preset frame length is then executed again.
  • step S106 the header information is saved.
  • step S108 the stream segments of the audio stream among the received stream segments are decoded according to the header information, until the decoding of the audio stream is completed.
  • each received stream segment is directly decoded using the header information.
  • in some cases the data stream contains both an audio stream and other (non-audio) data streams, and a stream separation operation is required.
  • whether the data stream includes data streams other than the audio stream is determined according to the header information; if it does, the other data streams are separated from the audio stream. For example, the Separate stream method in FFmpeg can be called to separate the other data streams from the audio stream.
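The separation step can be pictured with a toy demultiplexer. This is only a sketch: the real separation is delegated to FFmpeg, while the `(stream_id, payload)` tuples and the function name here are assumed simplifications.

```python
def separate_streams(packets, audio_stream_id):
    """Toy demultiplexer: each packet is a (stream_id, payload) pair;
    payloads are routed to the audio stream or to the other streams."""
    audio, other = [], []
    for stream_id, payload in packets:
        (audio if stream_id == audio_stream_id else other).append(payload)
    return b"".join(audio), b"".join(other)
```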
  • after the stream segments of the audio stream are separated from each received stream segment, the stream segments of the audio stream are decoded using the header information.
  • the format information of the audio stream is determined according to the header information; each stream segment of the audio stream is transcoded into an original audio stream according to the format information; and the original audio stream is resampled at a preset bit rate. The resampled bit rate matches the bit rate of the playback device, which facilitates playback.
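The resampling step can be illustrated with a minimal pure-Python sketch. A production system would use FFmpeg's resampler; the linear interpolation below is only a simplified stand-in for mono PCM samples, and the function name is an assumption.

```python
def resample(samples, src_rate, dst_rate):
    """Linear-interpolation resampling of mono PCM samples: map each output
    index back to a fractional position in the source and interpolate."""
    if src_rate == dst_rate:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate          # position in the source signal
        j = int(pos)
        frac = pos - j
        nxt = samples[min(j + 1, len(samples) - 1)]
        out.append(samples[j] + (nxt - samples[j]) * frac)
    return out
```

For example, doubling the rate interleaves interpolated midpoints between the original samples.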
  • the saved header information can be used to correctly decode the entire audio stream.
  • the header information of different audio files may differ, so the decoding process may fail. If decoding the current stream segment of the audio stream according to the saved header information fails, the current stream segment, or the current stream segment together with the stream segments following it, is parsed until new header information is obtained; the stream segments following the current stream segment are then decoded according to the new header information, until the decoding of the audio stream is completed.
  • when saving the new header information, the originally saved header information can be deleted, and the new header information is used to decode the subsequently received stream segments until the decoding of the audio stream is completed.
  • the method of the above embodiments first buffers one or more stream segments of the received data stream, continuously parses the buffered stream segments until the header information is obtained, saves the header information, and then uses it to decode the stream segments of the audio stream among the subsequently received stream segments, until the decoding of the audio stream is completed.
  • the method of the above embodiments can thus realize real-time decoding of the audio stream and meet the requirement of real-time decoding of real-time audio streams in artificial-intelligence customer-service scenarios.
  • the method of the above embodiments buffers stream segments in an audio stream buffer, parses out and saves the header information (including the format information and parameters of the audio stream, etc.), and decodes according to it.
  • the format information and at least one parameter of the audio stream are obtained; the decoder type can be derived from the format information, so for stream segments of the audio stream received later, the previously cached decoder type is used to link the corresponding decoder engine, and the subsequent stream segments are decoded according to the at least one parameter of the audio stream.
  • real-time decoding can thus be achieved, which solves the problem that the FFmpeg tool cannot decode most stream segments because they contain no header information.
  • since the transmitted stream segments are not necessarily divided at integer multiples of the audio frame length, incomplete audio frames may occur.
  • stream segment 1 of the audio stream contains audio frame (Frame) 1, audio frame 2 and a part of audio frame 3, while stream segment 2 contains another part of audio frame 3.
  • in this case, errors are reported when the decoder decodes stream segments 1 and 2 according to the header information.
  • the present disclosure also provides a solution.
  • the length of the audio frame is determined according to the header information; according to the audio frame length, the stream segments of the audio stream among the received stream segments are decoded while distinguishing different audio frames.
  • the length of the audio frame may be determined from the parameters included in the header information, for example from the sampling rate, bit depth, number of channels, etc.; reference may be made to the prior art, and details are not repeated here.
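As a concrete example of deriving the frame length from header parameters, for uncompressed PCM the byte length of a frame follows directly from the bit depth, channel count, and samples per frame. The function and the fixed samples-per-frame value are illustrative assumptions; compressed formats compute the frame length differently.

```python
def audio_frame_length(bit_depth, channels, samples_per_frame):
    """Bytes per audio frame for uncompressed PCM:
    samples_per_frame x channels x (bit_depth / 8)."""
    return samples_per_frame * channels * bit_depth // 8
```

For instance, a 16-bit mono frame of 1024 samples occupies 2048 bytes; the same frame in stereo occupies 4096 bytes.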
  • decoding the stream segments of the audio stream in the received stream segments according to the header information includes steps S302 to S316 .
  • step S302 the length of the audio frame is determined according to the header information.
  • in step S304, if the stream segment in which the header information is located also contains audio data, that stream segment is taken as the current stream segment of the audio stream.
  • in step S306, for the current stream segment, audio frames are divided in the order of the data encapsulation format according to the audio frame length; that is, the current stream segment is divided in the order of the data encapsulation format according to the audio frame length to obtain one or more complete audio frames.
  • data is arranged in a left-to-right or front-to-back order in a stream segment.
  • the tail data belongs to the incomplete audio frame 3.
  • step S308 the one or more complete audio frames are decoded.
  • step S310 it is determined whether the current stream segment is the last stream segment, if so, stop, otherwise step S312 is performed.
  • step S312 it is determined whether the tail data of the current stream segment of the audio stream belongs to an incomplete audio frame. If yes, go to step S314, otherwise go to step S313.
  • step S313 after waiting for the next stream segment of the audio stream to be received, the next stream segment is regarded as the current stream segment, and the process returns to step S306 to restart the execution.
  • step S314 the incomplete audio frame is buffered.
  • in step S316, after the next stream segment of the audio stream is received, the next stream segment is spliced with the incomplete audio frame, the spliced stream segment is taken as the current stream segment, and the process returns to step S306 to be executed again.
  • the part of audio frame 3 at the tail of stream segment 1 is spliced with stream segment 2, so that a complete frame is formed.
  • the method of the above embodiments buffers incomplete frame information until the next stream segment is received and then splices them, which solves the problem that a stream segment containing incomplete audio frames cannot be decoded correctly.
  • the present disclosure also provides a decoding apparatus, which will be described below in conjunction with FIG. 4.
  • FIG. 4 is a structural diagram of some embodiments of the disclosed decoding apparatus.
  • the apparatus 40 in this embodiment includes: a cache module 410 , a header information parsing module 420 , a header information saving module 430 , and a decoding module 440 .
  • the buffering module 410 is configured to buffer the stream segments of the received data stream, wherein the data stream includes an audio stream.
  • the header information parsing module 420 is configured to parse the cached one or more stream segments until the header information is obtained by parsing.
  • the header information parsing module 420 is configured to determine whether the total data length of all currently buffered stream segments reaches a preset frame length; if it does, parse the data in the buffered stream segments from the start data up to the preset frame length; determine whether the header information is successfully parsed; if not, increase the preset frame length by a preset value and update it; and repeat the above steps until the header information is obtained by parsing.
  • the header information parsing module 420 is configured to, if the total data length of all currently buffered stream segments does not reach the preset frame length, wait until the next stream segment is buffered and then re-determine whether the total data length of all buffered stream segments reaches the preset frame length.
  • the header information parsing module 420 is configured to call the Open avformat method in FFmpeg to parse the cached stream segment until the parsing obtains the header information.
  • the header information saving module 430 is used to save the header information.
  • the decoding module 440 is configured to decode the stream segments of the audio stream among the received stream segments according to the header information, until the decoding of the audio stream is completed.
  • the decoding module 440 is configured to determine the length of the audio frame according to the header information; according to the length of the audio frame, the stream segments of the audio stream in each received stream segment are decoded by distinguishing different audio frames.
  • the decoding module 440 is configured to divide the current stream segment of the audio stream in the order of the data encapsulation format according to the audio frame length to obtain one or more complete audio frames; decode the one or more complete audio frames; determine whether the tail data of the current stream segment of the audio stream belongs to an incomplete audio frame; if so, buffer the incomplete audio frame; after the next stream segment of the audio stream is received, splice the next stream segment with the incomplete audio frame to obtain a spliced stream segment; and take the spliced stream segment as the current stream segment of the audio stream, repeating the above steps until the decoding of the last stream segment of the audio stream is completed.
  • the decoding module 440 is configured to, if decoding the current stream segment of the audio stream according to the header information fails, parse the current stream segment, or the current stream segment together with the stream segments following it, until new header information is obtained by parsing; and decode the stream segments following the current stream segment according to the new header information, until the decoding of the audio stream is completed.
  • the decoding module 440 is configured to determine, according to the header information, whether the data stream includes data streams other than the audio stream; if so, separate the other data streams from the audio stream; determine the format information of the audio stream according to the header information; transcode each stream segment of the audio stream into an original audio stream according to the format information; and resample the original audio stream at a preset bit rate.
  • the decoding module 440 is configured to call the Separate stream method in FFmpeg to separate other data streams from the audio stream; call the Parse format method in FFmpeg to determine the format information of the audio stream according to the header information, according to The format information of the audio stream transcodes each stream segment of the audio stream into an original audio stream, and resamples the original audio stream according to a preset bit rate.
  • the decoding apparatuses in the embodiments of the present disclosure may each be implemented by various computing devices or computer systems, which will be described below with reference to FIG. 5 and FIG. 6 .
  • FIG. 5 is a structural diagram of some embodiments of the disclosed decoding apparatus.
  • the apparatus 50 of this embodiment includes a memory 510 and a processor 520 coupled to the memory 510, the processor 520 being configured to execute, based on instructions stored in the memory 510, the decoding method in any of the embodiments of the present disclosure.
  • the memory 510 may include, for example, a system memory, a fixed non-volatile storage medium, and the like.
  • the system memory stores, for example, an operating system, an application program, a boot loader (Boot Loader), a database, and other programs.
  • FIG. 6 is a structural diagram of other embodiments of the disclosed decoding apparatus.
  • the apparatus 60 of this embodiment includes: a memory 610 and a processor 620 , which are similar to the memory 510 and the processor 520 respectively. It may also include an input-output interface 630, a network interface 640, a storage interface 650, and the like. These interfaces 630 , 640 , 650 and the memory 610 and the processor 620 can be connected, for example, through a bus 660 .
  • the input and output interface 630 provides a connection interface for input and output devices such as a display, a mouse, a keyboard, and a touch screen.
  • the network interface 640 provides a connection interface for various networked devices, for example, it can be connected to a database server or a cloud storage server.
  • the storage interface 650 provides a connection interface for external storage devices such as SD cards and U disks.
  • embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable non-transitory storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing device to cause a series of operational steps to be performed on the computer or other programmable device to produce a computer-implemented process, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A decoding method, an apparatus, and a computer-readable storage medium, relating to the field of computer technology. The method includes: buffering one or more stream segments of a received data stream (S102), wherein the data stream includes an audio stream; parsing the buffered stream segments until header information is obtained by parsing (S104); saving the header information (S106); and decoding the stream segments of the audio stream among the received stream segments according to the header information, until the decoding of the audio stream is completed (S108).

Description

Decoding method, apparatus and computer-readable storage medium
CROSS-REFERENCE TO RELATED APPLICATION
This application is based on and claims priority to CN application No. 202110229441.9, filed on March 2, 2021, the disclosure of which is incorporated into this application in its entirety.
TECHNICAL FIELD
The present disclosure relates to the field of computer technology, and in particular to a decoding method, an apparatus, and a computer-readable storage medium.
BACKGROUND
With the rapid development of artificial intelligence, artificial-intelligence customer-service robots are applied more and more widely. They involve speech recognition, and speech recognition in turn depends on the input of a real-time audio stream as its precondition. In the AI customer-service field, what a user says to the robot usually needs to be recognized; the user's speech is fed into the system in real time as an audio stream, so real-time decoding of the audio stream becomes a problem to be solved.
Real-time decoding of an audio stream requires obtaining the audio format, parameters, and so on, and this information is generally contained in the header information.
SUMMARY
According to some embodiments of the present disclosure, a decoding method is provided, including: buffering one or more stream segments of a received data stream, wherein the data stream includes an audio stream; parsing the buffered stream segments until header information is obtained by parsing; saving the header information; and decoding the stream segments of the audio stream among the received stream segments according to the header information, until the decoding of the audio stream is completed.
In some embodiments, parsing the buffered stream segments until header information is obtained includes: determining whether the total data length of all currently buffered stream segments reaches a preset frame length; if it does, parsing the data in the buffered stream segments from the start data up to the preset frame length; determining whether the header information is successfully parsed; if not, increasing the preset frame length by a preset value and updating it; and repeating the above steps until the header information is obtained by parsing.
In some embodiments, parsing the buffered stream segments until header information is obtained further includes: if the total data length of all currently buffered stream segments does not reach the preset frame length, waiting until the next stream segment is received and buffered, and then re-determining whether the total data length of all buffered stream segments reaches the preset frame length.
In some embodiments, decoding the stream segments of the audio stream among the received stream segments according to the header information includes: determining the length of an audio frame according to the header information; and, according to the audio frame length, decoding the stream segments of the audio stream among the received stream segments while distinguishing different audio frames.
In some embodiments, decoding the stream segments of the audio stream among the received stream segments while distinguishing different audio frames includes: dividing the current stream segment of the audio stream in the order of the data encapsulation format according to the audio frame length to obtain one or more complete audio frames; decoding the one or more complete audio frames; determining whether the tail data of the current stream segment belongs to an incomplete audio frame; if so, buffering the incomplete audio frame; after the next stream segment of the audio stream is received, splicing the next stream segment with the incomplete audio frame to obtain a spliced stream segment; and taking the spliced stream segment as the current stream segment of the audio stream, repeating the above steps until the decoding of the last stream segment of the audio stream is completed.
In some embodiments, decoding the stream segments of the audio stream among the received stream segments according to the header information until the decoding of the audio stream is completed includes: if decoding the current stream segment of the audio stream according to the header information fails, parsing the current stream segment, or the current stream segment together with the stream segments following it, until new header information is obtained by parsing; and decoding the stream segments following the current stream segment according to the new header information, until the decoding of the audio stream is completed.
In some embodiments, parsing the buffered stream segments until header information is obtained includes: calling the Open avformat method in FFmpeg to parse the buffered stream segments until the header information is obtained by parsing.
In some embodiments, decoding the stream segments of the audio stream among the received stream segments according to the header information includes: determining, according to the header information, whether the data stream includes data streams other than the audio stream; if so, separating the other data streams from the audio stream; determining the format information of the audio stream according to the header information; transcoding each stream segment of the audio stream into an original audio stream according to the format information; and resampling the original audio stream at a preset bit rate.
In some embodiments, the Separate stream method in FFmpeg is called to separate the other data streams from the audio stream; and the Parse format method in FFmpeg is called to determine the format information of the audio stream according to the header information, transcode each stream segment of the audio stream into an original audio stream according to the format information, and resample the original audio stream at a preset bit rate.
According to other embodiments of the present disclosure, a decoding apparatus is provided, including: a buffering module configured to buffer one or more stream segments of a received data stream, wherein the data stream includes an audio stream; a header information parsing module configured to parse the buffered stream segments until header information is obtained by parsing; a header information saving module configured to save the header information; and a decoding module configured to decode the stream segments of the audio stream among the received stream segments according to the header information, until the decoding of the audio stream is completed.
According to still other embodiments of the present disclosure, a decoding apparatus is provided, including: a processor; and a memory coupled to the processor and configured to store instructions which, when executed by the processor, cause the processor to execute the decoding method of any of the foregoing embodiments.
According to further embodiments of the present disclosure, a non-transitory computer-readable storage medium is provided, on which a computer program is stored, wherein the program, when executed by a processor, implements the decoding method of any of the foregoing embodiments.
Other features and advantages of the present disclosure will become apparent from the following detailed description of exemplary embodiments of the present disclosure with reference to the accompanying drawings.
Brief Description of the Drawings
The drawings described herein are intended to provide a further understanding of the present disclosure and constitute a part of this application. The illustrative embodiments of the present disclosure and their descriptions are intended to explain the present disclosure and do not constitute an improper limitation thereof.
Fig. 1 shows a schematic flowchart of a decoding method according to some embodiments of the present disclosure.
Fig. 2 shows a schematic structural diagram of an audio stream according to some embodiments of the present disclosure.
Fig. 3 shows a schematic flowchart of a decoding method according to other embodiments of the present disclosure.
Fig. 4 shows a schematic structural diagram of a decoding apparatus according to some embodiments of the present disclosure.
Fig. 5 shows a schematic structural diagram of a decoding apparatus according to other embodiments of the present disclosure.
Fig. 6 shows a schematic structural diagram of a decoding apparatus according to still other embodiments of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure will be described clearly and completely below with reference to the drawings in the embodiments of the present disclosure. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present disclosure. The following description of at least one exemplary embodiment is in fact merely illustrative and in no way serves as any limitation on the present disclosure or its application or use. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present disclosure without creative effort fall within the protection scope of the present disclosure.
The inventors have found that in actual AI customer-service telephone scenarios, audio needs to be processed as a stream, i.e., an audio file is divided into audio stream segments for transmission. In this case the first stream segment, or the first few stream segments, contain the header information produced during audio encoding, and the subsequent stream segments do not. In particular, when the FFmpeg tool is used to decode the individual stream segments, most stream segments cannot be decoded for lack of header information and an error is returned, which fails to meet the demand for real-time decoding of real-time audio streams in AI customer-service scenarios.
A technical problem to be solved by the present disclosure is: how to realize real-time decoding of an audio stream.
The present disclosure provides a decoding method that can be used for real-time decoding of an audio stream in AI customer-service scenarios, which is described below with reference to Figs. 1 to 3.
Fig. 1 is a flowchart of some embodiments of the decoding method of the present disclosure. As shown in Fig. 1, the method of this embodiment comprises steps S102 to S108.
In step S102, one or more stream segments of a received data stream are buffered.
The data stream comprises an audio stream and may further comprise other data streams besides the audio stream, for example non-audio data streams such as a video stream. When the audio stream is mixed with other data streams, the different streams need to be separated in subsequent steps, as described in later embodiments. During transmission the data stream is divided into multiple stream segments, and each stream segment can be encapsulated into a data package for transmission. After receiving a package, the decoding apparatus (the apparatus performing the decoding method of the present disclosure) parses the package to obtain the stream segment and buffers it.
The solution of the present disclosure can be implemented based on the FFmpeg API. First, the avformat and avio context modules can be initialized (Init avformat / Init avio context), which are used for the subsequent header-information parsing and audio-stream reading respectively; buffering a stream segment can invoke the Buffer stream method.
In step S104, the buffered stream segments are parsed until header information is obtained.
The header information comprises, for example, format information of the audio stream and at least one parameter, the at least one parameter comprising, for example, at least one of a sample rate, a bit depth, a number of channels, a compression ratio and the like, without being limited to these examples. Since the division into stream segments is indeterminate, a single stream segment may already contain the complete header information, or it may contain only part of it, so that multiple stream segments are needed to obtain the complete header information. In some embodiments, after each stream segment is buffered, an attempt is made to parse all the previously buffered stream segments to determine whether the header information is successfully obtained; if not, the next stream segment continues to be buffered, and the above process is repeated until the header information is successfully obtained by parsing.
In other embodiments, it is determined whether the total data length of all currently buffered stream segments reaches a preset frame length; in the case that it does, the data in the currently buffered stream segments from the starting data up to the preset frame length is parsed; it is determined whether the header information is successfully obtained by parsing; in the case that it is not, the preset frame length is increased by a preset value to update the preset frame length; and the above steps are repeated until the header information is obtained by parsing.
The preset frame length can be obtained statistically from the lengths of header information in historical audio streams. After each stream segment is buffered, it can be judged whether the total data length of all currently buffered stream segments reaches the preset frame length. If it does not, the method waits until the next stream segment is received and buffered, and then re-executes the step of determining whether the total data length of all buffered stream segments reaches the preset frame length, until the total data length of all currently buffered stream segments reaches the preset frame length, whereupon an attempt is made to parse the data in the currently buffered stream segments from the starting data up to the preset frame length.
For example, if the preset frame length is 200 bytes, the data from the first byte of the earliest buffered stream segment up to a length of 200 bytes serves as the data to be parsed, and it is parsed to determine whether the header information is successfully obtained. If so, the header-parsing process stops. If parsing the header information fails, the preset frame length is increased by the preset value to update it, for example from 200 bytes to 300 bytes, after which execution restarts from the step of determining whether the total data length of all currently buffered stream segments reaches the preset frame length.
The Open avformat method in FFmpeg can be invoked to parse the buffered stream segments until the header information is obtained. By continually attempting header parsing on the buffered stream segments, the above method avoids the problem that parsing cannot succeed when the header information is split across different stream segments. The judgment and correction of the preset frame length reduce the number of parsing attempts and improve efficiency.
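The growing-window search described above can be sketched as follows. This is not FFmpeg's Open avformat; `parse_header`, the toy `b"HDR"` container format, and the initial window and step sizes are illustrative assumptions standing in for a real container probe:

```python
def parse_header(data: bytes):
    """Toy probe: a header is b'HDR' + one length byte + that many payload bytes."""
    if len(data) < 4 or not data.startswith(b"HDR"):
        return None
    n = data[3]
    if len(data) < 4 + n:
        return None           # header continues in a later stream segment
    return data[4:4 + n]      # the "format information and parameters"

def find_header(segments, preset_len=4, step=4):
    """Buffer segments one by one; parse from the start up to preset_len,
    enlarging preset_len by `step` on each failed attempt (steps S102/S104)."""
    buffered = b""
    for seg in segments:
        buffered += seg
        while len(buffered) >= preset_len:        # total length reached preset?
            hdr = parse_header(buffered[:preset_len])
            if hdr is not None:
                return hdr                        # step S106: save the header
            preset_len += step                    # enlarge the window, retry
    return None                                   # stream ended without a header
```

With a header split across two segments, e.g. `[b"HD", b"R\x05abc", b"de", ...]`, the loop keeps widening the window until the full header is visible.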
In step S106, the header information is saved.
In step S108, the stream segments of the audio stream among the received stream segments are decoded according to the header information until decoding of the audio stream is completed.
In the case that the data stream contains only the audio stream, each received stream segment is decoded directly using the header information. In the case that the data stream contains the audio stream together with other (non-audio) data streams, a stream separation operation is needed. In some embodiments, it is determined according to the header information whether the data stream comprises other data streams besides the audio stream; in the case that it does, the other data streams are separated from the audio stream, for example by invoking the Separate stream method in FFmpeg.
After the stream segments of the audio stream are separated out of each received stream segment, the stream segments of the audio stream are decoded using the header information. In some embodiments, format information of the audio stream is determined according to the header information; the stream segments of the audio stream are transcoded into an original audio stream according to the format information of the audio stream; and the original audio stream is resampled at a preset bit rate. The resampling rate matches the bit rate of the playback device, facilitating playback. For example, the Parse format method in FFmpeg is invoked to determine the format information of the audio stream according to the header information, transcode the stream segments of the audio stream into an original audio stream according to the format information, and resample the original audio stream at the preset bit rate.
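The final resampling step can be illustrated with a toy linear-interpolation resampler on raw PCM sample values. A real implementation would delegate to FFmpeg's resampler rather than this sketch, which stands in only to show what converting to a preset rate means:

```python
def resample(samples, src_rate, dst_rate):
    """Linearly interpolate `samples` (a sequence of PCM values)
    from src_rate to dst_rate."""
    if src_rate == dst_rate:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate            # position in the source signal
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)       # clamp at the last sample
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out
```

Upsampling four samples from 8 kHz to 16 kHz, for instance, yields eight samples with the interpolated values in between.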
In the case that the audio stream contains only one complete audio file, the saved header information suffices to decode the entire audio stream correctly. In the case that the audio stream contains multiple complete audio files, the header information of different audio files may differ and decoding may fail partway through. To address this problem, in some embodiments, in the case that decoding the current stream segment of the audio stream according to the header information fails, the current stream segment, or the current stream segment together with the stream segments following it, is parsed until new header information is obtained; the stream segments following the current stream segment are then decoded according to the new header information until decoding of the audio stream is completed.
The new header information can be obtained by parsing with reference to the header-parsing method in the foregoing embodiments. The new header information is saved, the previously saved header information can be deleted, and the subsequently received stream segments are decoded using the new header information until decoding of the audio stream is completed.
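The failure-triggered header refresh can be sketched as a small control loop. Everything concrete here is an illustrative stand-in, not the disclosed decoder: the toy header is `b"H"` plus a key byte, "decoding" is an XOR with that key, and a decode failure marks the start of a new file:

```python
def toy_parse_header(buf):
    """Return (key, remaining body) if buf holds a complete toy header, else None."""
    if len(buf) >= 2 and buf.startswith(b"H"):
        return buf[1], buf[2:]
    return None

def toy_decode(key, seg):
    """'Decode' by XOR; return None (failure) when a new file header appears."""
    if seg.startswith(b"H"):
        return None
    return bytes(b ^ key for b in seg)

def decode_with_refresh(segments, parse_header, decode):
    header, buffered, output = None, b"", []
    for seg in segments:
        if header is None:                 # still searching for a header
            buffered += seg
            parsed = parse_header(buffered)
            if parsed is None:
                continue                   # header incomplete, keep buffering
            header, seg = parsed
            buffered = b""
        frame = decode(header, seg)
        if frame is None:                  # decode failed: a new file began here,
            header, buffered = None, seg   # so search for the new header
            parsed = parse_header(buffered)
            if parsed is not None:
                header, body = parsed
                buffered = b""
                output.append(decode(header, body))
        else:
            output.append(frame)
    return output
```

When a second file starts mid-stream, the old header is discarded and decoding resumes under the new one, matching the behaviour described above.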
The method of the above embodiments first buffers one or more stream segments of the received data stream, keeps parsing the buffered stream segments until header information is obtained, saves the header information, and then decodes the stream segments of the audio stream among the subsequently received stream segments using the header information until decoding of the audio stream is completed. The method of the above embodiments can realize real-time decoding of the audio stream and meets the demand for real-time decoding of real-time audio streams in AI customer-service scenarios.
Especially in scenarios where audio decoding is implemented with the FFmpeg tool, the method of the above embodiments buffers the stream segments in an audio-stream buffer, extracts and parses the header information (containing the format information and parameters of the audio stream) and saves it. From the header information, the format information and at least one parameter of the audio stream can be parsed out; from the format information the decoder type can be obtained, and for the subsequently received stream segments of the audio stream, the previously cached decoder type is used to link the corresponding decoder engine, and decoding of the subsequent stream segments is attempted according to the at least one parameter of the audio stream. In this way, real-time decoding can be achieved for audio streams generated by standard audio encoders, which solves the problem that the FFmpeg tool cannot decode because most stream segments contain no header information.
During transmission of the audio stream, if the transmitted stream segments are not split at integer multiples of the audio frame length, incomplete audio frames may occur. As shown in Fig. 2, stream segment 1 of the audio stream contains audio frame (Frame) 1, audio frame 2 and part of audio frame 3, while stream segment 2 contains the other part of audio frame 3; in this case the decoder reports an error when decoding stream segments 1 and 2 according to the header information. The present disclosure further provides a solution to this problem. In some embodiments, the length of an audio frame is determined according to the header information; according to the length of the audio frame, different audio frames are distinguished in the stream segments of the audio stream among the received stream segments for decoding. The audio frame length can be determined from the parameters contained in the header information, for example from the sample rate, the bit depth and the number of channels; reference can be made to the prior art and details are not repeated here.
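For uncompressed PCM, the frame length in bytes follows directly from the header parameters just mentioned. A minimal sketch, assuming a 20 ms frame duration (the disclosure does not fix a particular duration):

```python
def frame_bytes(sample_rate, bit_depth, channels, frame_ms=20):
    """Byte length of one PCM audio frame derived from header parameters."""
    samples_per_frame = sample_rate * frame_ms // 1000
    return samples_per_frame * channels * (bit_depth // 8)
```

For 8 kHz, 16-bit mono telephony audio this gives 160 samples, i.e. 320 bytes per 20 ms frame.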
Further, as shown in Fig. 3, decoding the stream segments of the audio stream among the received stream segments according to the header information comprises steps S302 to S316.
In step S302, the length of an audio frame is determined according to the header information.
In step S304, if the stream segment in which the header information is located also contains audio data, that stream segment is taken as the current stream segment of the audio stream.
In step S306, the current stream segment is divided into audio frames according to the audio frame length and in the order of the data encapsulation format, i.e., the current stream segment is divided according to the audio frame length in the order of the data encapsulation format to obtain one or more complete audio frames.
For example, the data are arranged in the stream segment from left to right or from front to back. As shown in Fig. 2, after stream segment 1 is divided into audio frames, its tail data belongs to the incomplete audio frame 3.
In step S308, the one or more complete audio frames are decoded.
In step S310, it is determined whether the current stream segment is the last stream segment; if so, the process stops, otherwise step S312 is executed.
In step S312, it is determined whether the tail data of the current stream segment of the audio stream belongs to an incomplete audio frame. If so, step S314 is executed, otherwise step S313 is executed.
In step S313, after the next stream segment of the audio stream is received, the next stream segment is taken as the current stream segment and execution returns to step S306.
In step S314, the incomplete audio frame is buffered.
In step S316, after the next stream segment of the audio stream is received, the next stream segment is spliced with the incomplete audio frame to obtain a spliced stream segment, which is taken as the current stream segment, and execution returns to step S306.
As shown in Fig. 2, stream segment 2 is spliced with the first half of audio frame 3 in stream segment 1 to form a complete frame.
The method of the above embodiments buffers the incomplete frame information until the next stream segment is received and then performs splicing, which solves the problem that decoding fails when a stream segment contains an incomplete audio frame.
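Steps S306 to S316 above amount to a carry-over loop: cut each segment into whole frames and splice any tail remainder onto the next segment. A minimal sketch, with the fixed frame length and the bytes-per-segment interface as illustrative assumptions:

```python
def split_frames(segments, frame_len):
    """Yield complete audio frames across stream-segment boundaries."""
    carry = b""                              # buffered incomplete audio frame
    for seg in segments:
        data = carry + seg                   # S316: splice remainder onto next segment
        n_whole = len(data) // frame_len
        for i in range(n_whole):             # S306/S308: divide and hand frames off
            yield data[i * frame_len:(i + 1) * frame_len]
        carry = data[n_whole * frame_len:]   # S312/S314: keep the tail remainder
```

With frame length 2 and segments `[b"AAB", b"BCC", b"D"]`, the tail byte of each segment is carried over, so the complete frames come out as `AA`, `BB`, `CC`.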
The present disclosure further provides a decoding apparatus, described below with reference to Fig. 4.
Fig. 4 is a structural diagram of some embodiments of the decoding apparatus of the present disclosure. As shown in Fig. 4, the apparatus 40 of this embodiment comprises: a buffering module 410, a header information parsing module 420, a header information saving module 430 and a decoding module 440.
The buffering module 410 is configured to buffer the stream segments of a received data stream, wherein the data stream comprises an audio stream.
The header information parsing module 420 is configured to parse the buffered one or more stream segments until header information is obtained.
In some embodiments, the header information parsing module 420 is configured to determine whether the total data length of all currently buffered stream segments reaches a preset frame length; in the case that it does, parse the data in the stream segments from the starting data up to the preset frame length; determine whether the header information is successfully obtained by parsing; in the case that it is not, increase the preset frame length by a preset value to update the preset frame length; and repeat the above steps until the header information is obtained by parsing.
In some embodiments, the header information parsing module 420 is configured to, in the case that the total data length of all currently buffered stream segments does not reach the preset frame length, wait until the next stream segment is received and buffered and then re-execute the determination of whether the total data length of all buffered stream segments reaches the preset frame length.
In some embodiments, the header information parsing module 420 is configured to invoke the Open avformat method in FFmpeg to parse the buffered stream segments until the header information is obtained.
The header information saving module 430 is configured to save the header information.
The decoding module 440 is configured to decode, according to the header information, the stream segments of the audio stream among the received stream segments until decoding of the audio stream is completed.
In some embodiments, the decoding module 440 is configured to determine the length of an audio frame according to the header information, and, according to the length of the audio frame, distinguish different audio frames in the stream segments of the audio stream among the received stream segments for decoding.
In some embodiments, the decoding module 440 is configured to divide the current stream segment of the audio stream according to the audio frame length in the order of the data encapsulation format to obtain one or more complete audio frames; decode the one or more complete audio frames; determine whether the tail data of the current stream segment of the audio stream belongs to an incomplete audio frame; in the case that it does, buffer the incomplete audio frame; after the next stream segment of the audio stream is received, splice the next stream segment with the incomplete audio frame to obtain a spliced stream segment; and take the spliced stream segment as the current stream segment of the audio stream and repeat the above steps until decoding of the last stream segment of the audio stream is completed.
In some embodiments, the decoding module 440 is configured to, in the case that decoding the current stream segment of the audio stream according to the header information fails, parse the current stream segment, or the current stream segment together with the stream segments following it, until new header information is obtained, and decode the stream segments following the current stream segment according to the new header information until decoding of the audio stream is completed.
In some embodiments, the decoding module 440 is configured to determine, according to the header information, whether the data stream comprises other data streams besides the audio stream; in the case that it does, separate the other data streams from the audio stream; determine format information of the audio stream according to the header information; transcode the stream segments of the audio stream into an original audio stream according to the format information of the audio stream; and resample the original audio stream at a preset bit rate.
In some embodiments, the decoding module 440 is configured to invoke the Separate stream method in FFmpeg to separate the other data streams from the audio stream, and to invoke the Parse format method in FFmpeg to determine the format information of the audio stream according to the header information, transcode the stream segments of the audio stream into an original audio stream according to the format information, and resample the original audio stream at the preset bit rate.
The decoding apparatuses in the embodiments of the present disclosure can each be implemented by various computing devices or computer systems, described below with reference to Figs. 5 and 6.
Fig. 5 is a structural diagram of some embodiments of the decoding apparatus of the present disclosure. As shown in Fig. 5, the apparatus 50 of this embodiment comprises: a memory 510 and a processor 520 coupled to the memory 510, the processor 520 being configured to perform the decoding method in any of the embodiments of the present disclosure based on instructions stored in the memory 510.
The memory 510 may comprise, for example, a system memory, a fixed non-volatile storage medium and the like. The system memory stores, for example, an operating system, application programs, a boot loader, a database and other programs.
Fig. 6 is a structural diagram of other embodiments of the decoding apparatus of the present disclosure. As shown in Fig. 6, the apparatus 60 of this embodiment comprises: a memory 610 and a processor 620, similar to the memory 510 and the processor 520 respectively. It may further comprise an input/output interface 630, a network interface 640, a storage interface 650 and the like. These interfaces 630, 640, 650 and the memory 610 and the processor 620 can be connected, for example, via a bus 660. The input/output interface 630 provides a connection interface for input/output devices such as a display, a mouse, a keyboard and a touch screen. The network interface 640 provides a connection interface for various networked devices; for example, it can connect to a database server or a cloud storage server. The storage interface 650 provides a connection interface for external storage devices such as an SD card and a USB flash drive.
Those skilled in the art should understand that the embodiments of the present disclosure may be provided as a method, a system, or a computer program product. Therefore, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present disclosure may take the form of a computer program product implemented on one or more computer-usable non-transitory storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The present disclosure is described with reference to flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to embodiments of the present disclosure. It should be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus configured to implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions can also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising an instruction apparatus which implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions can also be loaded onto a computer or other programmable data processing device, such that a series of operation steps are executed on the computer or other programmable device to produce computer-implemented processing, so that the instructions executed on the computer or other programmable device provide steps configured to implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The above are merely preferred embodiments of the present disclosure and are not intended to limit it; any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present disclosure shall be included in its protection scope.

Claims (12)

  1. A decoding method, comprising:
    buffering one or more stream segments of a received data stream, wherein the data stream comprises an audio stream;
    parsing the buffered stream segments until header information is obtained;
    saving the header information; and
    decoding, according to the header information, the stream segments of the audio stream among the received stream segments until decoding of the audio stream is completed.
  2. The decoding method according to claim 1, wherein parsing the buffered stream segments until header information is obtained comprises:
    determining whether a total data length of all currently buffered stream segments reaches a preset frame length;
    in the case that the total data length of all currently buffered stream segments reaches the preset frame length, parsing data in the stream segments from starting data up to the preset frame length;
    determining whether the header information is successfully obtained by parsing;
    in the case that the header information is not successfully obtained by parsing, increasing the preset frame length by a preset value to update the preset frame length; and
    repeating the above steps until the header information is obtained by parsing.
  3. The decoding method according to claim 2, wherein parsing the buffered stream segments until header information is obtained further comprises:
    in the case that the total data length of all currently buffered stream segments does not reach the preset frame length, waiting until a next stream segment is received and buffered, and then re-executing the determination of whether the total data length of all buffered stream segments reaches the preset frame length.
  4. The decoding method according to claim 1, wherein decoding, according to the header information, the stream segments of the audio stream among the received stream segments comprises:
    determining a length of an audio frame according to the header information; and
    according to the length of the audio frame, distinguishing different audio frames in the stream segments of the audio stream among the received stream segments for decoding.
  5. The decoding method according to claim 4, wherein distinguishing different audio frames in the stream segments of the audio stream among the received stream segments for decoding according to the length of the audio frame comprises:
    according to the length of the audio frame, dividing a current stream segment of the audio stream in an order of a data encapsulation format to obtain one or more complete audio frames;
    decoding the one or more complete audio frames;
    determining whether tail data of the current stream segment of the audio stream belongs to an incomplete audio frame;
    in the case that the tail data of the current stream segment of the audio stream belongs to an incomplete audio frame, buffering the incomplete audio frame;
    after a next stream segment of the audio stream is received, splicing the next stream segment with the incomplete audio frame to obtain a spliced stream segment; and
    taking the spliced stream segment as the current stream segment of the audio stream and repeating the above steps until decoding of a last stream segment of the audio stream is completed.
  6. The decoding method according to claim 1, wherein decoding, according to the header information, the stream segments of the audio stream among the received stream segments until decoding of the audio stream is completed comprises:
    in the case that decoding a current stream segment of the audio stream according to the header information fails, parsing the current stream segment, or the current stream segment together with stream segments following the current stream segment, until new header information is obtained by parsing; and
    decoding the stream segments following the current stream segment according to the new header information until decoding of the audio stream is completed.
  7. The decoding method according to claim 1, wherein parsing the buffered stream segments until header information is obtained comprises:
    invoking the Open avformat method in FFmpeg to parse the buffered stream segments until the header information is obtained.
  8. The decoding method according to claim 1, wherein decoding, according to the header information, the stream segments of the audio stream among the received stream segments comprises:
    determining, according to the header information, whether the data stream comprises other data streams in addition to the audio stream;
    in the case that the data stream comprises other data streams in addition to the audio stream, separating the other data streams from the audio stream;
    determining format information of the audio stream according to the header information;
    transcoding the stream segments of the audio stream into an original audio stream according to the format information of the audio stream; and
    resampling the original audio stream at a preset bit rate.
  9. The decoding method according to claim 8, wherein:
    the Separate stream method in FFmpeg is invoked to separate the other data streams from the audio stream; and
    the Parse format method in FFmpeg is invoked to determine the format information of the audio stream according to the header information, transcode the stream segments of the audio stream into an original audio stream according to the format information of the audio stream, and resample the original audio stream at the preset bit rate.
  10. A decoding apparatus, comprising:
    a buffering module configured to buffer one or more stream segments of a received data stream, wherein the data stream comprises an audio stream;
    a header information parsing module configured to parse the buffered stream segments until header information is obtained;
    a header information saving module configured to save the header information; and
    a decoding module configured to decode, according to the header information, the stream segments of the audio stream among the received stream segments until decoding of the audio stream is completed.
  11. A decoding apparatus, comprising:
    a processor; and
    a memory coupled to the processor and configured to store instructions which, when executed by the processor, cause the processor to perform the decoding method according to any one of claims 1 to 9.
  12. A non-transitory computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 9.
PCT/CN2022/070088 2021-03-02 2022-01-04 Decoding method and apparatus, and computer-readable storage medium WO2022183841A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2023553356A JP2024509833A (ja) 2021-03-02 2022-01-04 Decoding method and apparatus, and computer-readable storage medium
US18/546,387 US20240233740A9 (en) 2021-03-02 2022-01-04 Decoding method and apparatus, and computer readable storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110229441.9A 2021-03-02 Decoding method and apparatus, and computer-readable storage medium
CN202110229441.9 2021-03-02

Publications (1)

Publication Number Publication Date
WO2022183841A1 true WO2022183841A1 (zh) 2022-09-09

Family

ID=80295963

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/070088 WO2022183841A1 (zh) 2021-03-02 2022-01-04 解码方法、装置和计算机可读存储介质

Country Status (4)

Country Link
US (1) US20240233740A9 (zh)
JP (1) JP2024509833A (zh)
CN (1) CN114093375A (zh)
WO (1) WO2022183841A1 (zh)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050018775A1 (en) * 2003-07-23 2005-01-27 Mk Subramanian System and method for audio/video synchronization
CN1909657A (zh) * 2005-08-05 2007-02-07 乐金电子(惠州)有限公司 Mpeg音频解码方法
CN102254560A (zh) * 2010-05-19 2011-11-23 安凯(广州)微电子技术有限公司 一种移动数字电视录像中的音频处理方法
CN104113777A (zh) * 2014-08-01 2014-10-22 广州金山网络科技有限公司 一种音频流解码方法及装置
CN104202656A (zh) * 2014-09-16 2014-12-10 国家计算机网络与信息安全管理中心 网络音频mp3流乱序分段解码方法
CN104780422A (zh) * 2014-01-13 2015-07-15 北京兆维电子(集团)有限责任公司 流媒体播放方法及流媒体播放器
CN108122558A (zh) * 2017-12-22 2018-06-05 深圳国微技术有限公司 一种latm aac音频流的实时转容实现方法及装置
CN108389582A (zh) * 2016-12-12 2018-08-10 中国航空工业集团公司西安航空计算技术研究所 Mpeg-2/4aac音频解码错误检测及处理方法


Also Published As

Publication number Publication date
US20240233740A9 (en) 2024-07-11
US20240135942A1 (en) 2024-04-25
JP2024509833A (ja) 2024-03-05
CN114093375A (zh) 2022-02-25

Similar Documents

Publication Publication Date Title
WO2020078165A1 Video processing method and apparatus, electronic device, and computer-readable medium
CN110996160B Video processing method and apparatus, electronic device, and computer-readable storage medium
WO2022021852A1 FPGA-based FAST protocol decoding method, apparatus, and device
US10177958B2 Method for synchronously taking audio and video in order to proceed one-to-multi multimedia stream
US10476928B2 Network video playback method and apparatus
WO2020155964A1 Audio and video switching method and apparatus, computer device, and readable storage medium
CN103179431A Method for separating audio/video redirection transcoding in a VDI environment
CN115243074B Video stream processing method and apparatus, storage medium, and electronic device
US20080033978A1 Program, data processing method, and system of same
CN113079386B Online video playback method and apparatus, electronic device, and storage medium
US20070239780A1 Simultaneous capture and analysis of media content
WO2022183841A1 Decoding method and apparatus, and computer-readable storage medium
CN113382278A Video pushing method and apparatus, electronic device, and readable storage medium
CN110868610B Streaming media transmission method and apparatus, server, and storage medium
EP3352077A1 Method for synchronously taking audio and video in order to proceed one-to-multi multimedia stream
US11831430B2 Methods and apparatuses for encoding and decoding signal frame
CN108124183B Method for synchronously capturing audio and video for one-to-many audio/video streaming
US7664373B2 Program, data processing method, and system of same
US20100076944A1 Multiprocessor systems for processing multimedia data and methods thereof
CN113784094A Video data processing method, gateway, terminal device, and storage medium
WO2016107174A1 Method and system for processing multimedia file data, player, and client
CN111126003A Call record data processing method and apparatus
CN114025196B Encoding method, decoding method, codec apparatus, and medium
CN116033113B Video conference auxiliary information transmission method and system
CN111757168B Audio decoding method and apparatus, storage medium, and device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22762310

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 18546387

Country of ref document: US

WWE Wipo information: entry into national phase

Ref document number: 2023553356

Country of ref document: JP

WWE Wipo information: entry into national phase

Ref document number: 11202305980W

Country of ref document: SG

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM1205A DATED 16.01.2024)

122 Ep: pct application non-entry in european phase

Ref document number: 22762310

Country of ref document: EP

Kind code of ref document: A1