CN114073098B - Streaming media synchronization method and display device


Info

Publication number
CN114073098B
Authority
CN
China
Prior art keywords
audio
stream data
video
buffer queue
file
Prior art date
Legal status
Active
Application number
CN202080000658.6A
Other languages
Chinese (zh)
Other versions
CN114073098A (en)
Inventor
朱宗花
王云刚
康健民
Current Assignee
Vidaa Netherlands International Holdings BV
Original Assignee
Vidaa Netherlands International Holdings BV
Priority date
Filing date
Publication date
Application filed by Vidaa Netherlands International Holdings BV
Publication of CN114073098A
Application granted
Publication of CN114073098B

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43: Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The application discloses a streaming media synchronization method comprising the following steps: adding parsed audio and video stream data to a first audio buffer queue and a first video buffer queue, respectively; extracting the audio and video stream data from the first buffer queues and adding them to a second audio buffer queue and a second video buffer queue, respectively; destroying the decapsulation module and the multi-buffer queues in the current player pipeline; and extracting the audio and video stream data from the second buffer queues and synchronously injecting them into a decoder.

Description

Streaming media synchronization method and display device
Technical Field
The present disclosure relates to the field of streaming media playing technologies, and in particular, to a streaming media synchronization method and a display device supporting a variable slice format.
Background
Streaming media is a widely used multimedia transmission scheme on today's networks; unlike traditional media, streaming media files can be sliced before transmission. A slice file typically contains one or more of a video stream, an audio stream, and a subtitle stream, which may be multiplexed together for transmission or transmitted separately. When the video, audio, and subtitle streams are transmitted separately, the audio and subtitle streams are much smaller than the video stream, so under the same network conditions the download progress of the different slice file types differs greatly, as do their slice durations. As a result, the timestamps of the code streams injected into the decoding module may differ significantly.
A streaming media player pipeline is a pipeline formed by the functional modules used to play streaming media data. If the container type or coding type of a slice file changes during transmission, or the audio/video timestamps jump abruptly, all of the old slice files already in the playing modules usually must be processed before a new player pipeline can be created to handle the changed slice files, so the new player pipeline cannot be rebuilt in time.
Because the code stream timestamps and slice durations of the slice files differ greatly, the elementary stream injection module waits on the track whose timestamp lags furthest behind while the slice files of the other tracks have already been injected. When playback of the current video slice file completes, the subtitles or audio are still playing, which causes problems such as frozen pictures, prolonged buffering, and abnormal playback exits that severely degrade the playback experience.
Disclosure of Invention
In view of this, the present application provides a streaming media synchronization method and a display device supporting a variable slice format, which are used to keep streaming media playing continuously and normally and to improve the user experience.
Specifically, the application is realized by the following embodiments:
in a first aspect, the present application provides a display device, comprising:
a display;
a network module for browsing and/or downloading service content from a server;
a decoder for decoding elementary stream data acquired from the service contents;
a slice download module that downloads the current video slice file and the current audio slice file, respectively;
a decapsulation module that parses the downloaded video slice file and audio slice file frame by frame to obtain video elementary stream data and audio elementary stream data;
a multi-buffer queue module that adds each decapsulated frame of video elementary stream data to a first video buffer queue and each decapsulated frame of audio elementary stream data to a first audio buffer queue;
an elementary stream synchronous injection module comprising an elementary stream buffer sub-module and an elementary stream injection sub-module, wherein
the elementary stream buffer sub-module continuously extracts video elementary stream data from the first video buffer queue and adds it to a second video buffer queue, and continuously extracts audio elementary stream data from the first audio buffer queue and adds it to a second audio buffer queue, until all of the video elementary stream data corresponding to the current video slice file and all of the audio elementary stream data corresponding to the current audio slice file have been extracted, and then destroys the decapsulation module and the multi-buffer queue module in the current streaming media player pipeline; and
the elementary stream injection sub-module extracts the video elementary stream data and the audio elementary stream data from the second video buffer queue and the second audio buffer queue, respectively, according to the timestamps of the video elementary stream data and the audio elementary stream data, and synchronously injects them into the decoder.
In a second aspect, the present application provides a display device, comprising:
a display;
a network module for browsing and/or downloading service content from a server;
a decoder for decoding elementary stream data acquired from the service contents;
a slice download module that downloads the current video slice file and the current audio slice file, respectively;
a slice buffer module that continuously directs the slice download module to download video slice files, and directs it to download the next audio slice file based on the target time of the most recently downloaded video slice file;
a decapsulation module that parses the downloaded video slice file and audio slice file frame by frame to obtain video elementary stream data and audio elementary stream data;
a multi-buffer queue module that adds each decapsulated frame of video elementary stream data to a first video buffer queue and each decapsulated frame of audio elementary stream data to a first audio buffer queue;
an elementary stream synchronous injection module comprising an elementary stream buffer sub-module and an elementary stream injection sub-module, wherein
the elementary stream buffer sub-module continuously extracts video elementary stream data from the first video buffer queue and adds it to a second video buffer queue, and continuously extracts audio elementary stream data from the first audio buffer queue and adds it to a second audio buffer queue, until all of the video elementary stream data corresponding to the current video slice file and all of the audio elementary stream data corresponding to the current audio slice file have been extracted, and then destroys the decapsulation module and the multi-buffer queue module in the current streaming media player pipeline; and
the elementary stream injection sub-module extracts the video elementary stream data and the audio elementary stream data from the second video buffer queue and the second audio buffer queue, respectively, according to the timestamps of the video elementary stream data and the audio elementary stream data, and synchronously injects them into the decoder.
In a third aspect, the present application provides a streaming media synchronization method supporting a variable slice format, the method including:
downloading a current video slice file and a current audio slice file, respectively;
parsing the downloaded video slice file and audio slice file frame by frame to obtain video elementary stream data and audio elementary stream data;
adding each decapsulated frame of video elementary stream data to a first video buffer queue, and adding each decapsulated frame of audio elementary stream data to a first audio buffer queue;
continuously extracting video elementary stream data from the first video buffer queue and adding it to a second video buffer queue, and continuously extracting audio elementary stream data from the first audio buffer queue and adding it to a second audio buffer queue, until all of the video elementary stream data corresponding to the current video slice file and all of the audio elementary stream data corresponding to the current audio slice file have been extracted; and
extracting video elementary stream data and audio elementary stream data from the second video buffer queue and the second audio buffer queue, respectively, according to the timestamps of the video elementary stream data and the audio elementary stream data, and synchronously injecting them into the decoder.
In a fourth aspect, the present application provides a streaming media synchronization method supporting a variable slice format, the method including:
downloading a current video slice file and a current audio slice file, respectively;
continuously downloading video slice files, and downloading the next audio slice file based on the target time of the most recently downloaded video slice file;
parsing the downloaded video slice file and audio slice file frame by frame to obtain video elementary stream data and audio elementary stream data;
adding each decapsulated frame of video elementary stream data to a first video buffer queue, and adding each decapsulated frame of audio elementary stream data to a first audio buffer queue;
continuously extracting video elementary stream data from the first video buffer queue and adding it to a second video buffer queue, and continuously extracting audio elementary stream data from the first audio buffer queue and adding it to a second audio buffer queue, until all of the video elementary stream data corresponding to the current video slice file and all of the audio elementary stream data corresponding to the current audio slice file have been extracted; and
extracting video elementary stream data and audio elementary stream data from the second video buffer queue and the second audio buffer queue, respectively, according to the timestamps of the video elementary stream data and the audio elementary stream data, and synchronously injecting them into the decoder.
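To make the sequence of steps in the third and fourth aspects concrete, the following is a minimal single-threaded Python sketch of the two-stage buffering and synchronized injection. It is illustrative only: frames are reduced to (timestamp, payload) tuples, all names are invented, and a real pipeline would run these stages concurrently.

```python
from collections import deque

def run_once(video_frames, audio_frames, sync_threshold=2.0):
    # Parsed frames land in the first (multi-buffer) queues
    first_video, first_audio = deque(video_frames), deque(audio_frames)

    # Drain the first queues completely into the second queues; afterwards
    # the decapsulation and multi-buffer queue modules could be destroyed
    second_video, second_audio = deque(first_video), deque(first_audio)
    first_video.clear()
    first_audio.clear()

    # Inject by timestamp, keeping audio within the threshold of the
    # newest injected video timestamp
    injected, max_video_ts = [], float("-inf")
    while second_video:
        ts, _ = second_video.popleft()
        max_video_ts = max(max_video_ts, ts)
        injected.append(("video", ts))
        while second_audio and second_audio[0][0] <= max_video_ts + sync_threshold:
            injected.append(("audio", second_audio.popleft()[0]))
    # Leftover audio waits in the buffer, not in the decoder path
    return injected, list(second_audio)

video = [(float(t), b"v") for t in range(3)]  # 3 s video slice, 1 frame/s
audio = [(float(t), b"a") for t in range(9)]  # 9 s audio slice
done, leftover = run_once(video, audio)
print(len(done), len(leftover))  # 8 injected frames, 4 audio frames buffered
```

Because the leftover audio sits in the second queues rather than in the decoder path, the modules upstream of the second queues can be torn down and rebuilt at any time, which is the point of the two-stage design.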
Drawings
A schematic diagram of an operation scenario between the display apparatus 200 and the control device 100 is exemplarily shown in fig. 1A;
a block diagram of the configuration of the control apparatus 100 in fig. 1A is exemplarily shown in fig. 1B;
a block diagram of the configuration of the display device 200 in fig. 1A is exemplarily shown in fig. 1C;
an architectural configuration block diagram of the operating system in the memory of the display device 200 is exemplarily shown in fig. 1D;
a schematic structural diagram of a player pipeline is exemplarily shown in fig. 2;
another schematic structural diagram of a player pipeline is exemplarily shown in fig. 3;
a process flow diagram for controlling slice file download is exemplarily shown in fig. 4;
an interaction schematic diagram of streaming media synchronization supporting a variable slice format is exemplarily shown in fig. 5;
a process flow diagram of a streaming media synchronization method supporting a variable slice format is exemplarily shown in fig. 6;
another process flow diagram of a streaming media synchronization method supporting a variable slice format is exemplarily shown in fig. 7.
Detailed Description
Based on the exemplary embodiments shown in the present application, all other embodiments obtainable by one of ordinary skill in the art without undue burden are intended to fall within the scope of the present application. Furthermore, while the disclosure is presented in the context of one or more exemplary embodiments, it should be appreciated that individual aspects of the disclosure may separately constitute a complete embodiment.
The terms "comprises," "comprising," and "having," and any variations thereof, as used in this application, are intended to cover a non-exclusive inclusion, such that a product or apparatus that comprises a list of elements is not necessarily limited to those elements expressly listed but may include other elements not expressly listed or inherent to such product or apparatus.
The term "module" as used in this application refers to any known or later developed hardware, software, firmware, artificial intelligence, fuzzy logic, or combination of hardware and/or software code that is capable of performing the function associated with that element.
The term "gesture" as used herein refers to a user behavior by which a user expresses an intended idea, action, purpose, and/or result through a change in hand shape or movement of a hand, etc.
A schematic diagram of an operation scenario between the display apparatus 200 and the control device 100 is exemplarily shown in fig. 1A. As shown in fig. 1A, communication between the control apparatus 100 and the display device 200 may be performed in a wired or wireless manner.
The control apparatus 100 is configured to control the display device 200: it receives operation instructions input by the user, converts them into instructions that the display device 200 can recognize and respond to, and mediates the interaction between the user and the display device 200. For example, the display device 200 responds to a channel up/down operation when the user operates the channel up/down keys on the control apparatus 100.
The control device 100 may be a remote control 100A, which communicates with the display apparatus 200 through infrared protocol communication, Bluetooth protocol communication, or other short-range communication modes, and controls the display apparatus 200 wirelessly or by other wired modes. The user may control the display device 200 by inputting user instructions through keys on the remote control, voice input, control panel input, etc. For example, the user can input corresponding control instructions through the volume up/down keys, channel control keys, up/down/left/right movement keys, voice input keys, menu key, and power key on the remote control to implement control of the display device 200.
The control device 100 may also be an intelligent device, such as a mobile terminal 100B, a tablet computer, a notebook computer, or the like. For example, the display device 200 is controlled using an application running on a smart device. The application program, by configuration, can provide various controls to the user through an intuitive User Interface (UI) on a screen associated with the smart device.
In some embodiments, the mobile terminal 100B and the display device 200 may each install a matching software application, so that connection and communication between them can be realized through a network communication protocol, achieving one-to-one control operation and data communication. For example, a control instruction protocol can be established between the mobile terminal 100B and the display device 200, so that the functions of the physical keys arranged on the remote control 100A can be realized by operating the various function keys or virtual buttons of the user interface provided on the mobile terminal 100B. The audio/video content displayed on the mobile terminal 100B may also be transmitted to the display device 200 to realize a synchronized display function.
The display device 200 may be implemented as a television, and may provide a broadcast-receiving television function as well as a smart network television function with computer support. Examples of display devices include digital televisions, web televisions, smart televisions, Internet Protocol Televisions (IPTV), and the like.
The display device 200 may be a liquid crystal display, an organic light emitting display, a projection display device. The specific display device type, size, resolution, etc. are not limited.
The display device 200 is also in data communication with the server 300 via multiple communication means. The display device 200 may be permitted to communicate via a Local Area Network (LAN), a Wireless Local Area Network (WLAN), and other networks. The server 300 may provide various contents and interactions to the display device 200. By way of example, the display device 200 may send and receive information, such as receiving Electronic Program Guide (EPG) data, receiving software program updates, or accessing a remotely stored digital media library. The server 300 may be a single group or multiple groups of servers, and of one or more types. Other web service content such as video on demand and advertising services is provided through the server 300.
A block diagram of the configuration of the control apparatus 100 is exemplarily shown in fig. 1B. As shown in fig. 1B, the control device 100 includes a controller 110, a memory 120, a communicator 130, a user input interface 140, an output interface 150, and a power supply 160.
The controller 110 includes a Random Access Memory (RAM) 111, a Read Only Memory (ROM) 112, a processor 113, a communication interface, and a communication bus. The controller 110 controls the operation of the control device 100, the communication and collaboration among its internal components, and the external and internal data processing functions.
In some embodiments, when an interaction in which a user presses a key disposed on the remote controller 100A or an interaction in which a touch panel disposed on the remote controller 100A is touched is detected, the controller 110 may control to generate a signal corresponding to the detected interaction and transmit the signal to the display device 200.
The memory 120 stores various operation programs, data, and applications for driving and controlling the control device 100 under the control of the controller 110. The memory 120 may store various control signal instructions input by a user.
The communicator 130 performs communication of control signals and data signals with the display device 200 under the control of the controller 110. For example, the control apparatus 100 transmits a control signal (e.g., a touch signal or a button signal) to the display device 200 via the communicator 130, and the control apparatus 100 may receive signals transmitted by the display device 200 via the communicator 130. The communicator 130 may include an infrared signal interface 131 and a radio frequency signal interface 132. For example, when the infrared signal interface is used, a user input instruction is converted into an infrared control signal according to the infrared control protocol and sent to the display device 200 through the infrared sending module. As another example, when the radio frequency signal interface is used, the user input instruction is converted into a digital signal, modulated according to the radio frequency control signal modulation protocol, and then transmitted to the display device 200 through the radio frequency transmitting terminal.
The user input interface 140 may include at least one of a microphone 141, a touch pad 142, a sensor 143, keys 144, etc., so that a user may input user instructions regarding controlling the display apparatus 200 to the control device 100 through voice, touch, gesture, press, etc.
The output interface 150 outputs a user instruction received by the user input interface 140 to the display device 200 or outputs an image or voice signal received by the display device 200. Here, the output interface 150 may include an LED interface 151, a vibration interface 152 generating vibrations, a sound output interface 153 outputting sound, a display 154 outputting an image, and the like. For example, the remote controller 100A may receive an output signal of audio, video, or data from the output interface 150, and display the output signal as an image form on the display 154, as an audio form at the sound output interface 153, or as a vibration form at the vibration interface 152.
The power supply 160 provides operating power support for the various elements of the control device 100 under the control of the controller 110, and may take the form of a battery and associated control circuitry.
A hardware configuration block diagram of the display device 200 is exemplarily shown in fig. 1C. As shown in fig. 1C, the display apparatus 200 may further include therein a modem 210, a communicator 220, a detector 230, an external device interface 240, a controller 250, a memory 260, a user interface 265, a video processor 270, a display 275, an audio processor 280, an audio input interface 285, a power supply 290.
The modem 210 receives broadcast television signals in a wired or wireless manner, and may perform modulation and demodulation processing such as amplification, mixing, and resonance to demodulate, from among the many broadcast television signals, the audio/video signal carried on the frequency of the television channel selected by the user, together with additional information (e.g., EPG data).
The modem 210 is responsive, under the control of the controller 250, to the frequency of the television channel selected by the user and to the television signal carried on that frequency.
The modem 210 can receive signals in various ways according to the broadcasting system of the television signal, such as terrestrial broadcasting, cable broadcasting, satellite broadcasting, or internet broadcasting; according to the modulation type, a digital or analog modulation mode may be used; and it can demodulate both analog and digital signals according to the kind of television signal received.
In other exemplary embodiments, the modem 210 may also be in an external device, such as an external set-top box or the like. In this way, the set-top box outputs a television signal after modulation and demodulation, and inputs the television signal to the display apparatus 200 through the external device interface 240.
The communicator 220 is a component for communicating with an external device or an external server according to various communication protocol types. For example, the display device 200 may transmit content data to an external device connected via the communicator 220, or browse and download content data from an external device connected via the communicator 220. The communicator 220 may include network communication protocol modules or near-field communication protocol modules such as a WIFI module 221, a Bluetooth communication protocol module 222, and a wired Ethernet communication protocol module 223, so that the communicator 220 may, under the control of the controller 250, receive the control signal of the control device 100 in the form of a WIFI signal, a Bluetooth signal, a radio frequency signal, etc.
The detector 230 is a component the display device 200 uses to collect signals from the external environment or from interaction with the outside. The detector 230 may include an image collector 231, such as a camera or webcam, which may be used to collect the external environment scene to adaptively change the display parameters of the display device 200, and to collect user attributes or gestures so as to realize interaction between the display device and the user. It may also include a light receiver 232 for collecting ambient light intensity so as to adapt the display parameters of the display device 200, and so on.
In other exemplary embodiments, the detector 230 may further include a temperature sensor; by sensing the ambient temperature, the display device 200 may adaptively adjust the display color temperature of the image. For example, when the ambient temperature is high, the display device 200 may be adjusted to display images in a cooler color tone; when the ambient temperature is low, in a warmer color tone.
In other exemplary embodiments, the detector 230 may further include a sound collector, such as a microphone, which may be used to receive the user's sound, for example a voice signal carrying a control instruction for the display device 200; alternatively, ambient sound may be collected to identify the type of ambient scene, so that the display device 200 can adapt to ambient noise.
The external device interface 240 is a component that enables the controller 250 to control data transmission between the display apparatus 200 and external devices. The external device interface 240 may be connected to external devices such as a set-top box, a game device, or a notebook computer in a wired/wireless manner, and may receive data from the external device such as a video signal (e.g., a moving image), an audio signal (e.g., music), or additional information (e.g., an EPG).
The external device interface 240 may include: any one or more of a High Definition Multimedia Interface (HDMI) terminal 241, a Composite Video Blanking Sync (CVBS) terminal 242, an analog or digital Component terminal 243, a Universal Serial Bus (USB) terminal 244, a Component terminal (not shown), a Red Green Blue (RGB) terminal (not shown), and the like.
The controller 250 controls the operation of the display device 200 and responds to the user's operations by running various software control programs (e.g., an operating system and various application programs) stored on the memory 260.
As shown in fig. 1C, the controller 250 includes a Random Access Memory (RAM) 251, a Read Only Memory (ROM) 252, a graphics processor 253, a CPU processor 254, a communication interface 255, and a communication bus 256. The RAM251, the ROM252, the graphics processor 253, and the CPU 254 are connected to each other via a communication bus 256.
The ROM 252 stores various system boot instructions. When the display apparatus 200 is powered on upon receiving a power-on signal, the CPU processor 254 runs the system boot instructions in the ROM 252 and copies the operating system stored in the memory 260 into the RAM 251 to start running it. After the operating system has started, the CPU processor 254 copies the various applications in the memory 260 into the RAM 251 and then starts running them.
The graphics processor 253 generates screen images of various graphic objects, such as icons, images, and operation menus. It may include an arithmetic unit, which performs operations on the various interactive instructions input by the user so as to display the various objects according to their display attributes, and a renderer, which generates the various objects based on the arithmetic unit and displays the rendered result on the display 275.
The CPU processor 254 executes the operating system and application program instructions stored in the memory 260, and processes the various applications, data, and content according to received user input instructions, so as to finally display and play the various audio/video content.
In some exemplary embodiments, the CPU processor 254 may comprise a plurality of processors, including one main processor and one or more sub-processors. The main processor performs some initialization operations of the display device 200 in the display device preloading mode and/or the operation of displaying pictures in normal mode. The sub-processor(s) perform operations while the display device is in standby mode and similar states.
Communication interface 255 may include a first interface through an nth interface. These interfaces may be network interfaces that are connected to external devices via a network.
The controller 250 may control the overall operation of the display apparatus 200. For example: in response to receiving a user input command for selecting a GUI object displayed on the display 275, the controller 250 may perform an operation related to the object selected by the user input command.
Wherein the object may be any one of selectable objects, such as a hyperlink or an icon. The operation related to the selected object, for example, an operation of displaying a link to a hyperlink page, a document, an image, or the like, or an operation of executing a program corresponding to an icon. The user input command for selecting the GUI object may be a command input through various input means (e.g., mouse, keyboard, touch pad, etc.) connected to the display apparatus 200 or a voice command corresponding to a user speaking voice.
The memory 260 stores various types of data, software programs, and applications that drive and control the operation of the display device 200. The memory 260 may include volatile and/or nonvolatile memory. The term "memory" includes the memory 260, the RAM 251 and ROM 252 of the controller 250, and a memory card in the display device 200.
In some embodiments, the memory 260 is specifically configured to store an operating program that drives the controller 250 in the display device 200; various application programs built in the display device 200 and downloaded from an external device by a user are stored; data for configuring various GUIs provided by the display 275, various objects related to the GUIs, visual effect images of selectors for selecting GUI objects, and the like are stored.
In some embodiments, the memory 260 is specifically configured to store drivers and related data for the modem 210, the communicator 220, the detector 230, the external device interface 240, the video processor 270, the display 275, the audio processor 280, etc., external data (e.g., audio-visual data) received from the external device interface, or user data (e.g., key information, voice information, touch information, etc.) received from the user interface.
In some embodiments, memory 260 specifically stores software and/or programs for representing an Operating System (OS), which may include, for example: a kernel, middleware, an Application Programming Interface (API), and/or an application program. In some embodiments, the kernel may control or manage system resources, as well as other program implemented functions (e.g., the middleware, APIs, or application programs); at the same time, the kernel may provide an interface to allow middleware, APIs, or applications to access the controller to implement control or management of system resources.
An architectural configuration block diagram of the operating system in the memory of the display device 200 is illustrated in fig. 1D. From top to bottom, the operating system architecture comprises an application layer, a middleware layer, and a kernel layer.
The application layer: applications built into the system and non-system-level applications belong to the application layer, which is responsible for direct interaction with the user. The application layer may include a plurality of applications, such as a NETFLIX application, a settings application, a media center application, and the like. These applications may be implemented as Web applications that execute on a WebKit engine, and in particular may be developed and executed based on HTML, Cascading Style Sheets (CSS), and JavaScript.
HTML, in full HyperText Markup Language, is the standard markup language for creating web pages. Web pages are described by markup tags, which describe text, graphics, animations, sounds, tables, links, etc.; a browser reads an HTML document, interprets the tags within it, and displays the content in the form of a web page.
CSS, in full Cascading Style Sheets, is a computer language used to express the style of HTML files, and may be used to define style structures such as fonts, colors, and positions. A CSS style can be stored directly in an HTML web page or in a separate style file, realizing control over the styles in the web page.
JavaScript is a language for Web page programming that can be inserted into HTML pages and interpreted by the browser. The interaction logic of a Web application is implemented in JavaScript; by encapsulating a JavaScript extension interface through the browser, JavaScript can also communicate with the kernel layer.
The middleware layer may provide standardized interfaces to support the operation of various environments and systems. For example, the middleware layer may be implemented as MHEG (Multimedia and Hypermedia information coding Experts Group) middleware related to data broadcasting, as DLNA middleware related to communication with external devices, or as middleware providing the browser environment in which the applications in the display device run, and the like.
The kernel layer provides core system services, such as file management, memory management, process management, network management, and system security authority management. The kernel layer may be implemented as a kernel based on various operating systems, for example a kernel based on the Linux operating system.
The kernel layer also provides communication between system software and hardware, providing device driver services for the various hardware, such as: a display driver for the display, a camera driver for the camera, a key driver for the remote control, a WIFI driver for the WIFI module, an audio driver for the audio output interface, a power management driver for the Power Management (PM) module, and the like.
The user interface 265 receives various user interactions. Specifically, it transmits the input signal of the user to the controller 250, or transmits the output signal from the controller 250 to the user. In some embodiments, the remote control 100A may send input signals such as a power switch signal, a channel selection signal, or a volume adjustment signal input by the user to the user interface 265, which forwards them to the controller 250; alternatively, output signals such as audio, video, or data processed by the controller 250 may be output from the user interface 265 to the remote control 100A, which displays the received output signal or outputs it in audio or vibration form.
In some embodiments, a user may input a user command through a Graphical User Interface (GUI) displayed on the display 275, and the user interface 265 receives the user input command through the GUI. In particular, the user interface 265 may receive user input commands for controlling the position of a selector in a GUI to select different objects or items.
Alternatively, the user may enter a user command by entering a particular sound or gesture, and the user interface 265 recognizes the sound or gesture through the sensor to receive the user input command.
The video processor 270 is configured to receive an external video signal, and perform video data processing such as decompression, decoding, scaling, noise reduction, frame rate conversion, resolution conversion, and image composition according to a standard codec protocol of an input signal, so as to obtain a video signal that is directly displayed or played on the display 275.
By way of example, video processor 270 includes a demultiplexing module, a video decoding module, an image compositing module, a frame rate conversion module, a display formatting module, and the like.
The demultiplexing module demultiplexes the input audio/video data stream, for example an input MPEG-2 stream (based on the compression standard for digital storage media moving images and audio), into video signals, audio signals, and the like.
The video decoding module processes the demultiplexed video signal, including decoding, scaling, and the like.
The image synthesis module, such as an image synthesizer, superimposes and mixes the GUI signal (input by the user or generated by the graphics generator) with the scaled video image to generate an image signal for display.
The frame rate conversion module converts the frame rate of the input video, for example converting a 60 Hz input video to a frame rate of 120 Hz or 240 Hz, commonly implemented by frame interpolation.
And a display formatting module for converting the signal output by the frame rate conversion module into a signal conforming to a display format such as a display, for example, format converting the signal output by the frame rate conversion module to output an RGB data signal.
The display 275 receives the image signals from the video processor 270 and displays video, images, and the menu manipulation interface. For example, the display may present video from a broadcast signal received by the modem 210, video input from the communicator 220 or the external device interface 240, or images stored in the memory 260. The display 275 also presents the user manipulation interface (UI) generated in the display device 200 and used to control the display device 200.
The display 275 may include a display screen assembly for presenting pictures and a drive assembly for driving the display of images. Alternatively, if the display 275 is a projection display, it may include a projection device and a projection screen.
The audio processor 280 is configured to receive an external audio signal, decompress and decode according to a standard codec of an input signal, and perform audio data processing such as noise reduction, digital-to-analog conversion, and amplification, so as to obtain an audio signal that can be played in the speaker 286.
In some embodiments, audio processor 280 may support various audio formats. Such as MPEG-2, MPEG-4, advanced Audio Coding (AAC), high efficiency AAC (HE-AAC), etc.
The audio output interface 285 receives the audio signal output by the audio processor 280. For example, the audio output interface may output audio in a broadcast signal received via the modem 210, audio input via the communicator 220 or the external device interface 240, or audio stored in the memory 260. The audio output interface 285 may include a speaker 286, or an external audio output terminal 287 such as a headphone output terminal that outputs to a sound-producing device of an external apparatus.
In other exemplary embodiments, the video processor 270 may comprise one or more chip components, and the audio processor 280 may likewise comprise one or more chip components.
In still other exemplary embodiments, the video processor 270 and the audio processor 280 may be separate chips, or may be integrated with the controller 250 in one or more chips.
The power supply 290 supplies the display device 200 with power from an external power source, under the control of the controller 250. The power supply 290 may be a built-in power supply circuit mounted inside the display device 200, or a power supply mounted outside the display device 200.
In some embodiments, the player pipeline structure for streaming media may be as shown in fig. 2. The streaming media player pipeline may specifically be implemented in the middleware layer of the operating system in fig. 1D, and may include:
the video downloading module is used for downloading media link addresses transmitted by the application, for streaming media, index file addresses, which are m3u8, manifest and mpd format files, and outputting the index file addresses to the video format detection module.
The video format detection module is configured to identify a protocol type of the streaming media based on the index file, for example HLS, MSS, dash, and output the protocol type, the index file address, and the index file data downloaded by the video download module to the streaming media protocol analysis module.
The streaming media protocol analysis module comprises a plurality of fragment downloading modules and a fragment buffering module, wherein the fragment downloading modules are used for analyzing fragment file addresses based on index file data, acquiring fragment file addresses of streaming media and downloading the fragment file addresses, and the fragment file downloading of video, audio and subtitles is respectively executed by different fragment downloading modules; the slicing buffer module is used for buffering the slicing files of the video, the audio and the caption downloaded by the different slicing download modules respectively, and continuously sending the slicing files of the video, the audio and the caption in the buffer to the decapsulation module.
The decapsulation module decapsulates the video, audio, and subtitle slice files to obtain the corresponding video, audio, and subtitle elementary stream data, for example parsing a downloaded mp4-format video slice file frame by frame into a multi-frame H264-format video elementary stream.
The multi-buffer queue module adds each decapsulated frame of the video, audio, and subtitle elementary streams to the corresponding video, audio, and subtitle buffer queues for buffering. This ensures that the connection to the elementary stream injection module is established only after the video and audio slice files in the pipeline have both been parsed into elementary streams, preventing the decapsulation module from being unable to establish a connection with the elementary stream injection module.
The elementary stream injection module extracts the video, audio, and subtitle elementary stream data frame by frame from the respective buffer queues and injects it into the decoder downstream of the player pipeline; the video, audio, and subtitle elementary stream injection modules in fig. 2 are unaware of one another.
The decoder decodes and correspondingly processes the video, audio, and subtitle elementary stream data, and finally outputs the processed data to the corresponding video display window or audio output interface to play the video, audio, and subtitles.
Since most decoders are implemented by the schemes of hardware chip vendors, the elementary stream injection module in the player pipeline connects the decapsulation side of the pipeline with the decoder. Because the decoder depends entirely on the data handed to it by the decapsulation module, that data must normally be kept synchronized: when elementary streams of different tracks are injected into the decoder, the timestamps of the different slice files are usually kept within a certain threshold range, such as 2 s.
The streaming media player pipeline scheme shown in fig. 2 handles normal streaming playback of audio and video without problems. However, the most notable feature of streaming media is that the media content adapts to the network: if the encapsulation format or coding format changes across a switch, the player pipeline triggers a dynamic switching mechanism and the decapsulation module shown in fig. 2 must be replaced, for example when the streaming media format changes from the ts format to the mp4 format. At this point, because of the synchronization constraint of the elementary stream injection module, the audio elementary stream data and subtitle elementary stream data in the multi-buffer queues of the old player pipeline cannot be fully consumed, so the pipeline cannot be switched over to receive the new media content, and the picture freezes.
For example, assume that the player pipeline sets a threshold of 2 s for the video/audio synchronization mechanism, the duration of one video slice file is 3 s, and the duration of one audio slice file is 9 s. If the format of the next video slice file changes from ts to mp4, the player pipeline must wait until the 9 s audio slice has been consumed before the next video slice file can be requested from the server. Once the video elementary stream data corresponding to the 3 s video slice file has been injected into the decoder, at most 5 s of the audio elementary stream data corresponding to the audio slice file can be injected under the synchronization threshold; after that, the decoder's synchronization mechanism sees no further video elementary stream data and stops requesting audio from the elementary stream injection module. The remaining 4 s of audio elementary stream data therefore stay in the elementary stream injection module and are never injected into the decoder, the current player pipeline cannot finish, and a new player pipeline cannot be created, so the picture on the display freezes.
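The arithmetic of this stall can be written out directly (a back-of-the-envelope check using the hypothetical numbers from the example above):

```python
video_slice_s = 3.0     # duration of the current video slice file
audio_slice_s = 9.0     # duration of the current audio slice file
sync_threshold_s = 2.0  # decoder video/audio synchronization threshold

# Audio may only be injected up to (last video timestamp + threshold)
injectable_audio_s = min(audio_slice_s, video_slice_s + sync_threshold_s)
stuck_audio_s = audio_slice_s - injectable_audio_s
print(injectable_audio_s, stuck_audio_s)  # 5.0 4.0 -> 4 s of audio block the old pipeline
```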
In view of the above problems, the present application provides a streaming media synchronization method supporting a variable slice format and a display device, so that when the slice file format changes in the display device, the dynamic rebuild of the player pipeline and the synchronized injection of slice files are not affected, and the video continues to play normally. Specific embodiments are set forth below.
In a first embodiment, a schematic structural diagram of another player pipeline provided in the present application is shown in fig. 3. The structure includes: a video download module, a video format detection module, a streaming media protocol parsing module (comprising the slice download modules and the slice buffer module), a decapsulation module, a multi-buffer queue module, and an elementary stream synchronous injection module (comprising an elementary stream buffer sub-module and an elementary stream injection sub-module). The input end of the player pipeline is the streaming media server, and the output end is the decoder, such as a decoding chip. The video download module and the video format detection module have the same functions as in fig. 2 and are not described again here.
In combination with the display device in fig. 1C, the controller in the display device of the present application may specifically control the video download module, the video format detection module, the streaming media protocol parsing module (comprising the slice download modules and the slice buffer module), the decapsulation module, the multi-buffer queue module, the elementary stream buffer sub-module, and the elementary stream injection sub-module in the player pipeline to perform the following operations.
The controller controls the slice download module, the decapsulation module, the multi-buffer queue module, the elementary stream buffer sub-module, and the elementary stream injection sub-module in the current streaming media player pipeline before the format of the next video slice file changes, wherein:
The slice downloading module downloads the current video slice file and the current audio slice file respectively; specifically, the slice downloading module may parse the content of the index file, obtain the slice file address of the streaming media, and download the slice file of the streaming media, where the slice file downloads of video, audio, and subtitles are executed by different slice downloading modules respectively;
the slicing buffer module is used for buffering the slicing files of the video, the audio and the caption downloaded by the different slicing download modules respectively, and continuously sending the slicing files of the video, the audio and the caption in the buffer to the decapsulation module. The slice buffer module is mainly used for buffering the downloaded slice files, the size of the slice buffer module is generally controlled by the slice file downloading time, and the slice buffer module is generally designed to be 10-30 s, but at least 1 slice length needs to be ensured.
The decapsulation module parses the downloaded video slice file and audio slice file frame by frame to obtain video elementary stream data and audio elementary stream data. Specifically, the decapsulation module may decapsulate the video, audio, and subtitle slice files to obtain the corresponding video, audio, and subtitle elementary stream data, for example parsing a downloaded mp4-format video slice file frame by frame into a multi-frame H264-format video elementary stream.
The multi-buffer queue module adds each decapsulated frame of video elementary stream data to the first video buffer queue, and adds each decapsulated frame of audio elementary stream data to the first audio buffer queue. The multi-buffer queue module mainly ensures that all the streaming media slice files are completely parsed before the player finishes loading, so this module can be designed quite small, for example 2 MB.
The elementary stream buffer sub-module continuously extracts video elementary stream data from the first video buffer queue and adds it to the second video buffer queue, and continuously extracts audio elementary stream data from the first audio buffer queue and adds it to the second audio buffer queue, until all of the video elementary stream data corresponding to the current video slice file and all of the audio elementary stream data corresponding to the current audio slice file have been extracted. Once extraction completes, the multi-buffer queue module holds no unprocessed data, so the decapsulation module and the multi-buffer queue module can be removed from the current player pipeline and destroyed, and a new decapsulation module and multi-buffer queue module suited to the changed format can be created. The pipeline can therefore be rebuilt as soon as the elementary streams have been buffered into the elementary stream buffer sub-module, without waiting for them to be injected into the decoder, which ensures that video still plays normally after the video format changes.
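A minimal sketch of this drain-then-rebuild behavior, assuming in-memory deques as the buffer queues; the class and method names are invented stand-ins for the pipeline modules:

```python
from collections import deque

class PipelineModules:
    """Hypothetical stand-in for the decapsulation, multi-buffer queue,
    and elementary stream buffer modules of the player pipeline."""
    def __init__(self):
        self.first_video_q, self.first_audio_q = deque(), deque()
        self.second_video_q, self.second_audio_q = deque(), deque()

    def drain_all(self):
        # Move every already-parsed frame out of the first (multi-buffer)
        # queues into the second (elementary stream buffer) queues.
        while self.first_video_q:
            self.second_video_q.append(self.first_video_q.popleft())
        while self.first_audio_q:
            self.second_audio_q.append(self.first_audio_q.popleft())

    def rebuild_for_new_format(self):
        # With the first queues empty, the old decapsulation module and
        # multi-buffer queue module can be destroyed and recreated for the
        # changed slice format, while injection from the second queues
        # continues independently.
        self.first_video_q, self.first_audio_q = deque(), deque()

p = PipelineModules()
p.first_video_q.extend(("v", t) for t in range(3))   # 3 s video slice
p.first_audio_q.extend(("a", t) for t in range(9))   # 9 s audio slice
p.drain_all()
p.rebuild_for_new_format()
print(len(p.second_video_q), len(p.second_audio_q))  # 3 9
```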
It should be noted that when the elementary stream buffer sub-module extracts elementary stream data from the multi-buffer queue module, a threshold must be set for the buffer queue size that guarantees the data of at least one slice file can be buffered in full. The size of the elementary stream buffer sub-module is therefore designed as the sum of the buffer size of the multi-buffer queue module, the buffer size of the slice buffer module, and an appropriate threshold (e.g., 1 MB).
The elementary stream injection sub-module extracts the video elementary stream data and the audio elementary stream data from the second video buffer queue and the second audio buffer queue, respectively, according to the timestamps of the video elementary stream data and the audio elementary stream data, and synchronously injects them into the decoder.
As an example, the elementary stream injection sub-modules in this embodiment are aware of each other's timestamps when injecting the video and audio elementary stream data, which facilitates synchronized injection.
The elementary stream injection sub-module synchronously injects the video elementary stream data and the audio elementary stream data into the decoder specifically as follows: before the audio elementary stream data of the current frame is injected, the difference between the timestamp of the current frame's audio elementary stream data and the maximum timestamp of the video elementary stream data already injected is calculated; if the difference is within a preset threshold range, the audio elementary stream data of the current frame is injected; if not, injection of the current frame's audio elementary stream data is paused. Note that subtitle elementary stream injection follows the same flow as audio elementary stream injection and is not repeated here.
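A sketch of this decision rule, assuming the absolute timestamp difference is what is compared (the real threshold handling is decoder-specific, and the names are hypothetical):

```python
def may_inject_audio(audio_ts: float, max_injected_video_ts: float,
                     threshold: float = 2.0) -> bool:
    """Return True if the current audio frame may be injected, i.e. its
    timestamp stays within `threshold` seconds of the newest video
    timestamp already injected; otherwise audio injection pauses."""
    return abs(audio_ts - max_injected_video_ts) <= threshold

# Example: video injected up to t = 3.0 s, threshold 2 s
print(may_inject_audio(4.5, 3.0))  # True  -> inject this audio frame
print(may_inject_audio(5.5, 3.0))  # False -> pause audio injection
```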
Considering that streaming media format changes are typically handled in the decapsulation module, the elementary stream buffer module can take over the elementary stream data held in the multi-buffer queue module of the current (old) player pipeline.
For example, assume that the player pipeline sets a synchronization mechanism threshold of 2s. When the duration of one video slice file is 3s and the duration of one audio slice file is 9s, if the format of the next video slice file changes, the elementary stream buffer module will first extract the video elementary stream data 3s and the audio elementary stream data 9s from the multi-buffer queue module. The elementary stream injection module synchronously injects the video elementary stream data and the audio elementary stream data into the decoder according to the synchronization mechanism. When the 3s video elementary stream data has been injected into the decoder processing is completed, the audio elementary stream data can be injected for 5s at most according to the threshold 2s of the synchronization mechanism, and then the decoder returns an acknowledgement.
The remaining audio elementary stream data stays in the elementary stream buffer module. Since all audio elementary stream data in the multi-buffer queue module has been extracted into the elementary stream buffer module, the multi-buffer queue module considers that the current player pipeline holds no unprocessed elementary stream data and notifies the pipeline that all of its data has been processed. The protocol decapsulation module can then disconnect from the old decapsulation module and create a new decapsulation module suited to the changed video slice format, avoiding picture freezes.
By adding the elementary stream buffer module in front of the elementary stream injection module, elementary stream data can be received from the multi-buffer queue until it has been buffered completely. The player pipeline then regards the pipeline switch as finished as far as the old pipeline is concerned, so the old pipeline can be closed promptly, while the elementary stream injection submodule uses a new thread to inject elementary stream data into the decoder without blocking the pipeline rebuild.
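To make the hand-off concrete, the following Python sketch drains the first-stage queue (the multi-buffer queue module) into the second-stage queue (the elementary stream buffer submodule) and then lets the caller rebuild the format-dependent stages. All names are illustrative; destroy_old_stages and create_new_stages stand in for whatever the real player pipeline provides:

    import queue

    def drain_and_rebuild(first_q: queue.Queue, second_q: queue.Queue,
                          destroy_old_stages, create_new_stages) -> None:
        """Drain the old multi-buffer queue, then rebuild the old pipeline stages.

        destroy_old_stages / create_new_stages are caller-supplied callbacks
        (hypothetical here) that remove and recreate the decapsulation module
        and the multi-buffer queue module for the changed slice format.
        """
        # Stage 1: move every buffered frame into the elementary stream buffer,
        # so the old pipeline no longer holds any unprocessed data.
        while True:
            try:
                frame = first_q.get_nowait()
            except queue.Empty:
                break
            second_q.put(frame)

        # Stage 2: destroy and recreate the format-dependent stages, while a
        # separate injection thread keeps feeding the decoder from second_q.
        destroy_old_stages()
        create_new_stages()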
In a second embodiment, with reference to the display device in fig. 1B, the controller in the display device of the present application may further control the video downloading module, the video format detection module, the streaming media protocol parsing module (including the slice downloading module and the slice buffering module), the decapsulation module, the multi-buffer queue module, the elementary stream buffering module, and the elementary stream injection module in the player pipeline to perform the following operations:
The controller is configured to control, before the format of the next video slice file changes, the slice downloading module, the slice buffering module, the decapsulation module, the multi-buffer queue module, and the elementary stream synchronous injection module in the current streaming media player pipeline; wherein:
The slice downloading module downloads the current video slice file and the current audio slice file respectively. Specifically, the slice downloading module may acquire the slice file addresses of the streaming media from the content of the index file and download the slice files, with the video, audio and subtitle slice files each downloaded by a separate slice downloading module.
The slice buffering module continuously controls the slice downloading module to download video slice files, and controls the slice downloading module to download the next audio slice file based on the target time of the most recently downloaded video slice file.
The decapsulation module parses the downloaded video slice files and audio slice files frame by frame to obtain video elementary stream data and audio elementary stream data. Specifically, the decapsulation module may decapsulate the video slice file, the audio slice file and the subtitle slice file to obtain the corresponding video, audio and subtitle elementary stream data; for example, a downloaded mp4-format video slice file is parsed frame by frame to obtain a multi-frame H264-format video elementary stream.
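As a rough illustration of this step (not the patent's own implementation), the third-party PyAV library can demultiplex an mp4 file into its encoded video packets frame by frame; this sketch assumes PyAV is installed and that a local file sample.mp4 exists:

    import av  # PyAV: Python bindings for FFmpeg

    # Open an mp4 slice file and pull out the video elementary stream packet by
    # packet; each packet corresponds to one encoded (e.g. H264) video frame.
    container = av.open("sample.mp4")
    for packet in container.demux(video=0):
        if packet.pts is None:  # trailing flush packets carry no payload
            continue
        ts = float(packet.pts * packet.time_base)  # time stamp in seconds
        print(f"video ES frame: {packet.size} bytes at {ts:.3f}s")
    container.close()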
The multi-buffer queue module adds each frame of decapsulated video elementary stream data to the first video buffer queue, and each frame of decapsulated audio elementary stream data to the first audio buffer queue.
The elementary stream synchronous injection module comprises an elementary stream buffer submodule and an elementary stream injection submodule, wherein:
the elementary stream buffer submodule continuously extracts video elementary stream data from the first video buffer queue and adds it to the second video buffer queue, and continuously extracts audio elementary stream data from the first audio buffer queue and adds it to the second audio buffer queue, until all of the video elementary stream data corresponding to the current video slice file and the audio elementary stream data corresponding to the current audio slice file have been extracted, whereupon the decapsulation module and the multi-buffer queue module in the current streaming media player pipeline are destroyed.
The elementary stream injection submodule extracts video elementary stream data and audio elementary stream data from the second video buffer queue and the second audio buffer queue respectively, according to the time stamp of the video elementary stream data and the time stamp of the audio elementary stream data, and synchronously injects them into the decoder.
Unlike the previous embodiment, the slice buffering module in this embodiment controls the slice downloading module to download the next audio slice file based on the target time of the most recently downloaded video slice file. Specifically: the video slice file is set as the primary format file and the audio slice file as a secondary format file; before the next secondary format file is downloaded, the slice downloading module is controlled to judge whether the difference between the target time of the currently downloaded primary format file and the target time of the secondary format file is less than or equal to a preset threshold; if so, the next secondary format file is downloaded; if not, downloading of the next secondary format file is suspended.
In addition, after the primary format file has been completely downloaded, the slice buffering module may promote the next secondary format file to be the primary format file and continue downloading it.
By staggering downloads in this way, the slice buffering module reduces the time stamp gap between different types of streaming media files during pipeline switching, prevents an excessive backlog of secondary format files, and reduces the memory footprint of the elementary stream buffer module.
As an example, refer to the processing flow for controlling slice file downloading shown in fig. 4 (a condensed code sketch follows the listed steps), which includes:
step 401, parsing the streaming media content, and starting video, audio and subtitle threads respectively to download the first slice files;
for example, if the streaming media contains audio, video and subtitles, 3 threads are started; if it contains only audio and video, only 2 threads are started.
step 402, updating the target time of the currently downloaded slice of the primary format file;
the target time is the sum of the start time of the current slice and the slice duration, i.e., the time at which the slice ends.
step 403, judging whether the current slice of the primary format file has finished downloading; if so, go to step 404; if not, go to step 407;
step 404, judging whether all slices of the primary format file have been downloaded; if so, go to step 405; if not, go to step 409;
step 405, judging whether all slices of the secondary format files have been downloaded; if so, the downloading ends; if not, go to step 406;
step 406, selecting one of the secondary format files as the new primary format file; go to step 402;
step 407, judging whether any secondary format file exists; if so, go to step 410; if not, go to step 408;
step 408, waiting for the primary format file download to finish; go to step 409;
step 409, continuing to download the next primary format file; go to step 402;
step 410, judging whether the current slice of the secondary format file has finished downloading; if so, go to step 411; if not, go to step 415;
step 411, judging whether the secondary format file has been downloaded in its entirety; if so, suspending the secondary format download thread; if not, go to step 412;
step 412, judging whether the difference between the target time of the next slice of the secondary format file and the target time of the primary format file is within the threshold range; if so, go to step 413; if not, go to step 414;
step 413, resuming the secondary format download thread to download the next slice of the secondary format file; go to step 410;
step 414, suspending the secondary format download thread, to be triggered again after the primary format file download completes;
step 415, continuing to download the current slice of the secondary format file; go to step 410.
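The pacing decision in steps 412 to 414 reduces to a single comparison per slice. The condensed, single-threaded Python sketch below (referenced above) replays the core of this flow for one primary and one secondary track; the Track class and all numbers are illustrative assumptions, and the real module runs one download thread per stream:

    from dataclasses import dataclass

    @dataclass
    class Track:
        name: str
        slice_duration: float     # duration of one slice file, in seconds
        target_time: float = 0.0  # end time of the most recently downloaded slice

    def may_download_next_secondary(primary: Track, secondary: Track,
                                    threshold: float) -> bool:
        """Steps 412-414: allow the secondary track's next slice only while its
        next target time stays within `threshold` of the primary target time."""
        next_target = secondary.target_time + secondary.slice_duration
        return next_target - primary.target_time <= threshold

    video = Track("video", slice_duration=3.0)  # primary format file
    audio = Track("audio", slice_duration=9.0)  # secondary format file
    # Threshold rule described below: max(slice duration, injection threshold).
    threshold = max(audio.slice_duration, 2.0)

    while video.target_time < 27.0:                    # primary keeps downloading
        video.target_time += video.slice_duration      # step 402: update target time
        if may_download_next_secondary(video, audio, threshold):
            audio.target_time += audio.slice_duration  # step 413: resume secondary
        # else: step 414 - the secondary thread stays paused for this round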
In this embodiment, primary and secondary roles are assigned to the streaming media files, so that the secondary streaming media files can adjust their own download pace based on the current download progress of the primary file. The assignment is based mainly on download rate: the file with the lowest download rate is the primary format file and the rest are secondary format files. For example, if the streaming media contains video, the video is set as the primary format file because its download rate is the lowest; if it contains only audio and subtitles, the audio is set as the primary format file; and if there is only a single track, that track is set as the primary format file.
The primary format file's thread downloads continuously unless it reaches the limit of its download queue. Whenever the secondary format file is about to download, its target time is compared with that of the primary format file: if the difference is within the threshold range, downloading of the secondary format file resumes; otherwise the gap between the two time stamps is too large, and the secondary format file is suspended until the current primary format file has finished downloading. The threshold is the maximum of the current format's slice duration and the threshold of the elementary stream injection module, which ensures the secondary format files are downloaded in time while reducing the memory footprint of the elementary stream buffer module.
Fig. 5 exemplarily shows an interaction diagram of streaming media synchronization supporting a variable slice format. Assume the duration of one video slice file is 3s, that of one audio slice file is 9s, and that of one subtitle slice file is 20s. If the format of the next video slice file changes, the interaction flow of the player pipeline is as follows:
Under the control of the slice buffering module, the slice downloading module downloads a 3s video slice file, a 9s audio slice file and a 20s subtitle slice file from the server according to the parsed streaming media slice file addresses.
The slice buffering module buffers the video, audio and subtitle slice files downloaded by their respective slice downloading modules and continuously feeds them to the decapsulation module. Since the format of the next video slice file changes, the slice buffering module controls the slice downloading module to download only the current video slice file, the current audio slice file and the current subtitle slice file.
The decapsulation module decapsulates the video, audio and subtitle slice files to obtain the corresponding video, audio and subtitle elementary stream data. For example, if the current video slice file is in mp4 format, the decapsulation module parses it frame by frame to obtain H264-format video elementary stream frames; audio and subtitles are processed in the same way.
The multi-buffer queue module adds the decapsulated video, audio and subtitle elementary stream data to their corresponding first buffer queues.
The elementary stream buffer module extracts the video, audio and subtitle elementary stream data from their first buffer queues and adds them to the corresponding second buffer queues, until the current video, audio and subtitle elementary stream data have all been extracted.
The elementary stream injection module extracts the video, audio and subtitle elementary stream data from the second buffer queues according to their respective time stamps and synchronously injects them into the decoder.
For example, assume that the threshold of the synchronization mechanism is 2s.
When the first frames of video, audio and subtitle elementary stream data are injected, their time stamps are identical, so the first frame of each stream can be injected.
After the 3s of video elementary stream data has been injected, injection of audio and subtitle elementary stream data continues. Since the maximum time stamp of the injected video elementary stream data is 3s and the synchronization threshold is 2s, audio and subtitle elementary stream data can be injected up to a time stamp of 3+2=5s; data beyond 5s is no longer injected, at which point the decoder considers injection complete and returns an acknowledgement. Since all streaming media data in the current player pipeline has been buffered into the elementary stream buffer submodule, the player considers that the current pipeline holds no unprocessed data and closes the current streaming media player pipeline. This triggers the creation of a new player pipeline suited to the changed video slice format and avoids picture freezes.
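The arithmetic in this example can be replayed with the same gating rule sketched earlier; this is a toy check with assumed variable names, not part of the design:

    threshold = 2.0              # synchronization mechanism threshold, in seconds
    max_injected_video_ts = 3.0  # all 3s of video elementary stream data injected

    for ts in (4.0, 5.0, 6.0):   # candidate audio/subtitle frame time stamps
        ok = ts - max_injected_video_ts <= threshold
        print(f"frame at {ts:.0f}s -> {'inject' if ok else 'hold'}")
    # frame at 4s -> inject
    # frame at 5s -> inject (the 3+2=5s bound)
    # frame at 6s -> hold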
In a third embodiment, as shown in fig. 6, a method for performing streaming media synchronization supporting a variable slice format is provided in a display device of the present application, the method comprising the steps of:
step 601, respectively downloading a current video slice file and a current audio slice file before the format of the next video slice file changes;
step 602, parsing the downloaded video slice file and audio slice file frame by frame to obtain video elementary stream data and audio elementary stream data;
step 603, adding each frame of video elementary stream data after decapsulation to a first video buffer queue, and adding each frame of audio elementary stream data after decapsulation to a first audio buffer queue;
step 604, continuously extracting video elementary stream data from the first video buffer queue and adding the video elementary stream data to the second video buffer queue, continuously extracting audio elementary stream data from the first audio buffer queue and adding the audio elementary stream data to the second audio buffer queue until all the video elementary stream data corresponding to the current video slice file and the audio elementary stream data corresponding to the current audio slice file are extracted;
step 605, extracting video elementary stream data and audio elementary stream data from the second video buffer queue and the second audio buffer queue respectively according to the time stamp of the video elementary stream data and the time stamp of the audio elementary stream data, and synchronously injecting the extracted video elementary stream data and audio elementary stream data into the decoder.
In a fourth embodiment, as shown in fig. 7, another method for synchronizing streaming media supporting variable slice formats is provided in a display device of the present application, the method comprising the steps of:
step 701, respectively downloading a current video slice file and a current audio slice file before the format of the next video slice file changes;
step 702, continuously controlling the downloading of video slice files, and controlling the slice downloading module to download the next audio slice file based on the target time of the most recently downloaded video slice file;
step 703, analyzing the downloaded video slice file and audio slice file according to frames to obtain video elementary stream data and audio elementary stream data;
step 704, adding each frame of video elementary stream data after decapsulation to a first video buffer queue, and adding each frame of audio elementary stream data after decapsulation to a first audio buffer queue;
step 705, continuously extracting video elementary stream data from the first video buffer queue and adding the video elementary stream data to the second video buffer queue, continuously extracting audio elementary stream data from the first audio buffer queue and adding the audio elementary stream data to the second audio buffer queue until all the video elementary stream data corresponding to the current video slice file and the audio elementary stream data corresponding to the current audio slice file are extracted;
step 706, extracting video elementary stream data and audio elementary stream data from the second video buffer queue and the second audio buffer queue respectively according to the time stamp of the video elementary stream data and the time stamp of the audio elementary stream data, and synchronously injecting the extracted video elementary stream data and audio elementary stream data into the decoder.
Based on the above embodiments, the display device adds the buffer queue of the elementary stream injection module on top of the multi-buffer queue, so that when the format of the next slice file changes, the data in the multi-buffer queue of the old player pipeline can be drained promptly and a new player pipeline created. Dynamic rebuilding of the player pipeline and synchronous injection of slice files are thus unaffected, and the video continues to play normally. In addition, the slice buffering module controls the slice downloading module's download progress for audio slice files, optimizing the memory occupied by video and audio slice files so that the multiple streams can still be injected into the decoder synchronously when the slice format changes.
The foregoing embodiments have been provided for purposes of illustration and description and are not intended to be exhaustive or to limit the application. Individual elements or features of a particular embodiment are generally not limited to that embodiment and, where applicable, may be used in or interchanged with selected embodiments even if not specifically shown or described. Many modifications are equally possible; they are not to be regarded as a departure from the application, and all such modifications are intended to be covered by the scope of the appended claims.

Claims (10)

1. A display device, characterized by comprising:
a display;
a network module for browsing and/or downloading service content from a server;
a decoder for decoding elementary stream data acquired from the service contents;
the slice downloading module is used for respectively downloading the current video slice file and the current audio slice file;
the decapsulation module is used for parsing the downloaded video slice file and the audio slice file according to frames to obtain video elementary stream data and audio elementary stream data;
the multi-buffer queue module is used for adding each frame of video elementary stream data after decapsulation to the first video buffer queue and adding each frame of audio elementary stream data after decapsulation to the first audio buffer queue;
the elementary stream synchronous injection module comprises an elementary stream buffer submodule and an elementary stream injection submodule, wherein,
the elementary stream buffer submodule continuously extracts video elementary stream data from the first video buffer queue and adds it to a second video buffer queue, and continuously extracts audio elementary stream data from the first audio buffer queue and adds it to a second audio buffer queue, until all of the video elementary stream data corresponding to the current video slice file and the audio elementary stream data corresponding to the current audio slice file have been extracted, and destroys the decapsulation module and the multi-buffer queue module in the current streaming media player pipeline;
the elementary stream injection submodule extracts video elementary stream data and audio elementary stream data from the second video buffer queue and the second audio buffer queue respectively, according to the time stamp of the video elementary stream data and the time stamp of the audio elementary stream data, and synchronously injects them into the decoder.
2. The display device according to claim 1, wherein the elementary stream injection submodule extracting video elementary stream data and audio elementary stream data from the second video buffer queue and the second audio buffer queue respectively according to the time stamp of the video elementary stream data and the time stamp of the audio elementary stream data and synchronously injecting them into the decoder specifically comprises:
before audio elementary stream data of a current frame is injected, calculating a difference value between a time stamp of the audio elementary stream data of the current frame and a maximum time stamp of the injected video elementary stream data;
judging whether the difference value is within a preset threshold range, if so, injecting audio elementary stream data of the current frame; if not, stopping injecting the audio elementary stream data of the current frame.
3. A display device, characterized by comprising:
a display;
a network module for browsing and/or downloading service content from a server;
a decoder for decoding elementary stream data acquired from the service contents;
the slice downloading module is used for respectively downloading the current video slice file and the current audio slice file;
the slice buffering module is used for continuously controlling the slice downloading module to download video slice files, and for controlling the slice downloading module to download the next audio slice file based on the target time of the most recently downloaded video slice file;
the decapsulation module is used for parsing the downloaded video slice file and the audio slice file according to frames to obtain video elementary stream data and audio elementary stream data;
the multi-buffer queue module is used for adding each frame of video elementary stream data after decapsulation to the first video buffer queue and adding each frame of audio elementary stream data after decapsulation to the first audio buffer queue;
the elementary stream synchronous injection module comprises an elementary stream buffer submodule and an elementary stream injection submodule, wherein,
the elementary stream buffer submodule continuously extracts video elementary stream data from the first video buffer queue and adds it to a second video buffer queue, and continuously extracts audio elementary stream data from the first audio buffer queue and adds it to a second audio buffer queue, until all of the video elementary stream data corresponding to the current video slice file and the audio elementary stream data corresponding to the current audio slice file have been extracted, and destroys the decapsulation module and the multi-buffer queue module in the current streaming media player pipeline;
the elementary stream injection submodule extracts video elementary stream data and audio elementary stream data from the second video buffer queue and the second audio buffer queue respectively, according to the time stamp of the video elementary stream data and the time stamp of the audio elementary stream data, and synchronously injects them into the decoder.
4. The display device of claim 3, wherein the slice buffering module controlling the slice downloading module to download the next audio slice file based on the target time of the most recently downloaded video slice file specifically comprises:
setting the video slice file as the primary format file and the audio slice file as a secondary format file; before downloading the next secondary format file, controlling the slice downloading module to judge whether the difference between the target time of the currently downloaded primary format file and the target time of the secondary format file is less than or equal to a preset threshold; if so, downloading the next secondary format file; if not, suspending the downloading of the next secondary format file.
5. The display device of claim 4, wherein the slice buffering module is further configured to, after the primary format file has been completely downloaded, change the next secondary format file into the primary format file and continue to download it.
6. A streaming media synchronization method supporting a variable slice format, the method comprising:
respectively downloading a current video slice file and a current audio slice file;
parsing the downloaded video slice file and audio slice file frame by frame to obtain video elementary stream data and audio elementary stream data;
adding each frame of video elementary stream data after decapsulation to a first video buffer queue, and adding each frame of audio elementary stream data after decapsulation to a first audio buffer queue;
continuously extracting video elementary stream data from the first video buffer queue and adding it to a second video buffer queue, and continuously extracting audio elementary stream data from the first audio buffer queue and adding it to the second audio buffer queue, until all the video elementary stream data corresponding to the current video slice file and the audio elementary stream data corresponding to the current audio slice file have been extracted;
extracting video elementary stream data and audio elementary stream data from the second video buffer queue and the second audio buffer queue respectively according to the time stamp of the video elementary stream data and the time stamp of the audio elementary stream data, and synchronously injecting them into a decoder.
7. The method according to claim 6, wherein the extracting of video elementary stream data and audio elementary stream data from the second video buffer queue and the second audio buffer queue respectively according to the time stamp of the video elementary stream data and the time stamp of the audio elementary stream data, and the synchronous injection thereof into the decoder, specifically comprises:
before audio elementary stream data of a current frame is injected, calculating a difference value between a time stamp of the audio elementary stream data of the current frame and a maximum time stamp of the injected video elementary stream data;
judging whether the difference value is within a preset threshold range, if so, injecting audio elementary stream data of the current frame; if not, stopping injecting the audio elementary stream data of the current frame.
8. A streaming media synchronization method supporting a variable slice format, the method comprising:
respectively downloading a current video slice file and a current audio slice file;
continuously controlling the downloading of video slice files, and controlling the slice downloading module to download the next audio slice file based on the target time of the most recently downloaded video slice file;
parsing the downloaded video slice file and audio slice file frame by frame to obtain video elementary stream data and audio elementary stream data;
adding each frame of video elementary stream data after decapsulation to a first video buffer queue, and adding each frame of audio elementary stream data after decapsulation to a first audio buffer queue;
continuously extracting video elementary stream data from the first video buffer queue and adding it to a second video buffer queue, and continuously extracting audio elementary stream data from the first audio buffer queue and adding it to the second audio buffer queue, until all the video elementary stream data corresponding to the current video slice file and the audio elementary stream data corresponding to the current audio slice file have been extracted;
extracting video elementary stream data and audio elementary stream data from the second video buffer queue and the second audio buffer queue respectively according to the time stamp of the video elementary stream data and the time stamp of the audio elementary stream data, and synchronously injecting them into a decoder.
9. The method according to claim 8, wherein the controlling the slice downloading module to download the next audio slice file based on the target time of the most recently downloaded video slice file specifically comprises:
setting the video slice file as the primary format file and the audio slice file as a secondary format file; before downloading the next secondary format file, controlling the slice downloading module to judge whether the difference between the target time of the currently downloaded primary format file and the target time of the secondary format file is less than or equal to a preset threshold; if so, downloading the next secondary format file; if not, suspending the downloading of the next secondary format file.
10. The method of claim 9, wherein the method further comprises:
when the primary format file has been completely downloaded, changing the next secondary format file into the primary format file, and continuing to download the primary format file.
CN202080000658.6A 2020-04-28 2020-04-28 Streaming media synchronization method and display device Active CN114073098B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/087548 WO2021217435A1 (en) 2020-04-28 2020-04-28 Streaming media synchronization method and display device

Publications (2)

Publication Number Publication Date
CN114073098A CN114073098A (en) 2022-02-18
CN114073098B true CN114073098B (en) 2023-04-25

Family

ID=78331562

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080000658.6A Active CN114073098B (en) 2020-04-28 2020-04-28 Streaming media synchronization method and display device

Country Status (2)

Country Link
CN (1) CN114073098B (en)
WO (1) WO2021217435A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114237940B (en) * 2021-12-20 2023-06-16 南京首印铭都信息技术科技有限公司 Information management system and method
CN114245180A (en) * 2022-01-04 2022-03-25 海信视像科技股份有限公司 Display device, video data transmission method, and storage medium
CN114512139B (en) * 2022-04-18 2022-09-20 杭州星犀科技有限公司 Processing method and system for multi-channel audio mixing, mixing processor and storage medium
CN115242735B (en) * 2022-09-22 2022-12-16 中邮消费金融有限公司 Real-time voice stream slice analysis method, system and computer equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103237255A (en) * 2013-04-24 2013-08-07 南京龙渊微电子科技有限公司 Multi-thread audio and video synchronization control method and system
US9674255B1 (en) * 2014-03-26 2017-06-06 Amazon Technologies, Inc. Systems, devices and methods for presenting content
CN109274696A (en) * 2018-09-20 2019-01-25 青岛海信电器股份有限公司 Flow media playing method and device based on DASH agreement

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102630058B (en) * 2012-03-23 2016-02-17 深圳创维数字技术有限公司 The management method of Media Stream and device in a kind of buffering area
US9161039B2 (en) * 2012-09-24 2015-10-13 Qualcomm Incorporated Bitstream properties in video coding
CN103152611B (en) * 2013-02-18 2018-04-27 中兴通讯股份有限公司 A kind of control method and device of Streaming Media pipeline business
CN108174242A (en) * 2018-01-09 2018-06-15 武汉斗鱼网络科技有限公司 Live data multiplexing method, device, storage medium and equipment


Also Published As

Publication number Publication date
WO2021217435A1 (en) 2021-11-04
CN114073098A (en) 2022-02-18


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20221024

Address after: 83 Intekte Street, Devon, Netherlands

Applicant after: VIDAA (Netherlands) International Holdings Ltd.

Address before: 266061 Songling Road, Laoshan District, Qingdao, Shandong Province, No. 399

Applicant before: QINGDAO HISENSE MEDIA NETWORKS Ltd.

GR01 Patent grant