CN114073098A - Streaming media synchronization method and display device - Google Patents

Streaming media synchronization method and display device

Info

Publication number
CN114073098A
CN114073098A
Authority
CN
China
Prior art keywords
audio
video
stream data
file
module
Prior art date
Legal status
Granted
Application number
CN202080000658.6A
Other languages
Chinese (zh)
Other versions
CN114073098B (en)
Inventor
朱宗花
王云刚
康健民
Current Assignee
Vidaa Netherlands International Holdings BV
Original Assignee
Qingdao Hisense Media Network Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Qingdao Hisense Media Network Technology Co Ltd filed Critical Qingdao Hisense Media Network Technology Co Ltd
Publication of CN114073098A
Application granted
Publication of CN114073098B
Legal status: Active
Anticipated expiration

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; operations thereof
    • H04N21/43: Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; client middleware

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The application discloses a streaming media synchronization method comprising the following steps: adding parsed audio/video stream data to respective first audio/video buffer queues; extracting the audio/video stream data from the first audio/video buffer queues and adding it to second audio/video buffer queues, after which the decapsulation module and the multi-buffer queues in the current player pipeline are destroyed; and extracting the audio/video stream data from the second audio/video buffer queues and synchronously injecting it into a decoder.

Description

Streaming media synchronization method and display device
Technical Field
The present application relates to the field of streaming media playing technologies, and in particular, to a streaming media synchronization method and a display device supporting a variable slice format.
Background
Streaming media is a widely used multimedia transmission scheme on current networks. Unlike traditional media, a streaming media file can be divided into slices and transmitted slice by slice. A slice file typically contains one or more of a video stream, an audio stream, and a subtitle stream, which may be transmitted together or separately. When the video stream, audio stream, and subtitle stream are transmitted separately, the audio and subtitle content is much smaller than the video content, so the download progress of the different slice file types diverges greatly under the same network conditions, and the slice durations of the slice files also differ greatly. As a result, the timestamps of the streams injected into the decoding module can differ substantially.
A streaming media player pipeline is a pipeline formed from the functional modules used to play streaming media data. For a streaming media source with separate audio, video, and subtitle streams, if the container type or coding type of one slice path changes, or the audio/video timestamps jump suddenly during transmission, the old player pipeline in the playing module usually must finish processing all old slice files from before the change before a new player pipeline can be created to process the new slice files, so the new player pipeline cannot be rebuilt in time.
Because the stream timestamps and slice durations of the slice files differ greatly, the elementary stream injection module finishes injecting the other slice paths while still waiting to inject the path with the largest timestamp. When playback of the current video slice file ends, subtitles or audio are therefore still playing, causing problems such as picture stalling, long buffering, and abnormal playback exit, which severely degrade the playing experience.
Disclosure of Invention
In view of this, the present application provides a streaming media synchronization method and a display device supporting a variable slice format, so as to implement continuous normal playing of streaming media and improve user experience.
Specifically, the present application is realized by the following embodiments:
in a first aspect, the present application provides a display device comprising:
a display;
the network module is used for browsing and/or downloading service contents from the server;
a decoder for decoding elementary stream data acquired from service contents;
the slice downloading module downloads a current video slice file and a current audio slice file, respectively;
the decapsulation module parses the downloaded video slice file and audio slice file frame by frame to obtain video elementary stream data and audio elementary stream data;
the multi-buffer queue module adds each decapsulated frame of video elementary stream data to a first video buffer queue and each decapsulated frame of audio elementary stream data to a first audio buffer queue;
the elementary stream synchronous injection module comprises an elementary stream buffering submodule and an elementary stream injection submodule, wherein,
the elementary stream buffering submodule continuously extracts video elementary stream data from the first video buffer queue and adds it to a second video buffer queue, and continuously extracts audio elementary stream data from the first audio buffer queue and adds it to a second audio buffer queue, until all the video elementary stream data corresponding to the current video slice file and all the audio elementary stream data corresponding to the current audio slice file have been extracted, whereupon the decapsulation module and the multi-buffer queue module in the current streaming media player pipeline are destroyed;
and the elementary stream injection submodule extracts video elementary stream data and audio elementary stream data from the second video buffer queue and the second audio buffer queue, respectively, according to the timestamps of the video and audio elementary stream data, and synchronously injects them into the decoder.
In a second aspect, the present application provides a display device comprising:
a display;
the network module is used for browsing and/or downloading service contents from the server;
a decoder for decoding elementary stream data acquired from service contents;
the slice downloading module downloads a current video slice file and a current audio slice file, respectively;
the slice buffering module continuously controls the slice downloading module to download video slice files, and controls it to download the next audio slice file based on the target time of the most recently downloaded video slice file;
the decapsulation module parses the downloaded video slice file and audio slice file frame by frame to obtain video elementary stream data and audio elementary stream data;
the multi-buffer queue module adds each decapsulated frame of video elementary stream data to a first video buffer queue and each decapsulated frame of audio elementary stream data to a first audio buffer queue;
the elementary stream synchronous injection module comprises an elementary stream buffering submodule and an elementary stream injection submodule, wherein,
the elementary stream buffering submodule continuously extracts video elementary stream data from the first video buffer queue and adds it to a second video buffer queue, and continuously extracts audio elementary stream data from the first audio buffer queue and adds it to a second audio buffer queue, until all the video elementary stream data corresponding to the current video slice file and all the audio elementary stream data corresponding to the current audio slice file have been extracted, whereupon the decapsulation module and the multi-buffer queue module in the current streaming media player pipeline are destroyed;
and the elementary stream injection submodule extracts video elementary stream data and audio elementary stream data from the second video buffer queue and the second audio buffer queue, respectively, according to the timestamps of the video and audio elementary stream data, and synchronously injects them into the decoder.
In a third aspect, the present application provides a method for streaming media synchronization supporting a variable slice format, where the method includes:
respectively downloading a current video slice file and a current audio slice file;
parsing the downloaded video slice file and audio slice file frame by frame to obtain video elementary stream data and audio elementary stream data;
adding each decapsulated frame of video elementary stream data to a first video buffer queue, and adding each decapsulated frame of audio elementary stream data to a first audio buffer queue;
continuously extracting video elementary stream data from the first video buffer queue and adding it to a second video buffer queue, and continuously extracting audio elementary stream data from the first audio buffer queue and adding it to a second audio buffer queue, until all the video elementary stream data corresponding to the current video slice file and all the audio elementary stream data corresponding to the current audio slice file have been extracted;
and extracting video elementary stream data and audio elementary stream data from the second video buffer queue and the second audio buffer queue, respectively, according to the timestamps of the video and audio elementary stream data, and synchronously injecting them into the decoder.
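As a minimal sketch of the hand-off at the heart of this method, in Python (illustrative names and data structures, not the patent's implementation; frames are modeled as (timestamp, payload) pairs):

```python
from collections import deque

def drain_current_slices(first_video_q: deque, first_audio_q: deque,
                         second_video_q: deque, second_audio_q: deque) -> None:
    """Move all parsed frames of the current video/audio slice files from
    the first buffer queues into the second buffer queues."""
    while first_video_q:
        second_video_q.append(first_video_q.popleft())
    while first_audio_q:
        second_audio_q.append(first_audio_q.popleft())
    # Both first queues are now empty: the old decapsulation module and
    # multi-buffer queue module hold no unprocessed data, so they can be
    # destroyed and rebuilt for the changed slice format while injection
    # continues from the second queues.
```

Because injection reads only from the second queues, tearing down and rebuilding the demux side of the pipeline need not interrupt playback.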
In a fourth aspect, the present application provides a method for streaming media synchronization supporting a variable slice format, the method comprising:
respectively downloading a current video slice file and a current audio slice file;
continuously controlling the downloading of video slice files, and controlling the downloading of the next audio slice file based on the target time of the most recently downloaded video slice file;
parsing the downloaded video slice file and audio slice file frame by frame to obtain video elementary stream data and audio elementary stream data;
adding each decapsulated frame of video elementary stream data to a first video buffer queue, and adding each decapsulated frame of audio elementary stream data to a first audio buffer queue;
continuously extracting video elementary stream data from the first video buffer queue and adding it to a second video buffer queue, and continuously extracting audio elementary stream data from the first audio buffer queue and adding it to a second audio buffer queue, until all the video elementary stream data corresponding to the current video slice file and all the audio elementary stream data corresponding to the current audio slice file have been extracted;
and extracting video elementary stream data and audio elementary stream data from the second video buffer queue and the second audio buffer queue, respectively, according to the timestamps of the video and audio elementary stream data, and synchronously injecting them into the decoder.
Drawings
Fig. 1A schematically illustrates an operation scenario between the display device 200 and the control apparatus 100;
fig. 1B is a block diagram schematically illustrating a configuration of the control apparatus 100 in fig. 1A;
fig. 1C is a block diagram schematically illustrating a configuration of the display device 200 in fig. 1A;
a block diagram of the architectural configuration of the operating system in the memory of the display device 200 is illustrated in fig. 1D.
Fig. 2 is a schematic diagram illustrating a structure of a player pipeline;
FIG. 3 is a schematic diagram illustrating another player pipeline configuration;
fig. 4 is a flowchart illustrating a process for controlling the download of a slice file;
fig. 5 is an interaction diagram for supporting streaming media synchronization in a variable slice format;
fig. 6 is a process flow diagram illustrating a method of streaming media synchronization in support of variable slice formats;
another process flow diagram of a method of streaming media synchronization that supports variable slice formats is illustrated in fig. 7.
Detailed Description
All other embodiments that a person skilled in the art can derive from the exemplary embodiments shown in the present application without inventive effort shall fall within the scope of protection of the present application. In addition, while the disclosure herein is presented in terms of one or more exemplary examples, it should be appreciated that each aspect of the disclosure may also constitute a complete embodiment on its own.
The terms "comprises" and "comprising," and any variations thereof, as used herein, are intended to cover a non-exclusive inclusion, such that a product or device that comprises a list of elements is not necessarily limited to those elements explicitly listed, but may include other elements not expressly listed or inherent to such product or device.
The term "module," as used herein, refers to any known or later developed hardware, software, firmware, artificial intelligence, fuzzy logic, or combination of hardware and/or software code that is capable of performing the functionality associated with that element.
The term "gesture" as used in this application refers to a user's behavior through a change in hand shape or an action such as hand motion to convey a desired idea, action, purpose, or result.
Fig. 1A is a schematic diagram illustrating an operation scenario between the display device 200 and the control apparatus 100. As shown in fig. 1A, the control apparatus 100 and the display device 200 may communicate with each other in a wired or wireless manner.
The control apparatus 100 is configured to control the display device 200: it receives operation instructions input by the user and converts them into instructions that the display device 200 can recognize and respond to, serving as an intermediary between the user and the display device 200. For example, the user operates the channel up/down keys on the control apparatus 100, and the display device 200 responds to the channel up/down operation.
The control apparatus 100 may be a remote controller 100A, which controls the display device 200 wirelessly or by other wired means using infrared protocol communication, Bluetooth protocol communication, or other short-distance communication methods. The user may input user instructions through keys on the remote controller, voice input, control panel input, and the like, to control the display device 200. For example, the user can input corresponding control commands through the volume up/down keys, channel control keys, up/down/left/right movement keys, voice input key, menu key, and power key on the remote controller to control the functions of the display device 200.
The control apparatus 100 may also be an intelligent device, such as a mobile terminal 100B, a tablet computer, or a notebook computer. For example, the display device 200 may be controlled by an application program running on the smart device. Through configuration, the application program may provide the user with various controls on an intuitive user interface (UI) on a screen associated with the smart device.
In some embodiments, the mobile terminal 100B and the display device 200 may each install a software application, so as to implement connection and communication through network communication protocols, for the purpose of one-to-one control operation and data communication. For example, the mobile terminal 100B may establish a control instruction protocol with the display device 200, so that operating the various function keys or virtual buttons of the user interface provided on the mobile terminal 100B implements the functions of the physical keys arranged on the remote control 100A. The audio and video content displayed on the mobile terminal 100B may also be transmitted to the display device 200 to implement a synchronous display function.
The display device 200 may be implemented as a television, and may provide a smart network television function with a broadcast receiving function as well as computer support functions. Examples of the display device include a digital television, a web television, a smart television, an Internet Protocol Television (IPTV), and the like.
The display device 200 may be a liquid crystal display, an organic light-emitting display, or a projection display device. The specific display device type, size, resolution, and so on are not limited.
The display device 200 also performs data communication with the server 300 through various communication means. It may be communicatively connected through a local area network (LAN), a wireless local area network (WLAN), or other networks. The server 300 may provide various contents and interactions to the display device 200. By way of example, the display device 200 may send and receive information, such as receiving Electronic Program Guide (EPG) data, receiving software program updates, or accessing a remotely stored digital media library. The server 300 may be one or more groups of servers and of one or more types. Other web service contents, such as video on demand and advertisement services, are provided through the server 300.
Fig. 1B is a block diagram illustrating the configuration of the control device 100. As shown in fig. 1B, the control device 100 includes a controller 110, a memory 120, a communicator 130, a user input interface 140, an output interface 150, and a power supply 160.
The controller 110 includes a random access memory (RAM) 111, a read-only memory (ROM) 112, a processor 113, a communication interface, and a communication bus. The controller 110 controls the operation of the control apparatus 100, the communication and cooperation among its internal components, and external and internal data processing functions.
In some embodiments, when an interaction in which a user presses a key disposed on the remote controller 100A or an interaction in which a touch panel disposed on the remote controller 100A is touched is detected, the controller 110 may control to generate a signal corresponding to the detected interaction and transmit the signal to the display device 200.
And a memory 120 for storing various operation programs, data and applications for driving and controlling the control apparatus 100 under the control of the controller 110. The memory 120 may store various control signal commands input by a user.
The communicator 130 enables communication of control signals and data signals with the display device 200 under the control of the controller 110. For example, the control apparatus 100 transmits a control signal (e.g., a touch signal or a button signal) to the display device 200 via the communicator 130, and the control apparatus 100 may receive signals transmitted by the display device 200 via the communicator 130. The communicator 130 may include an infrared signal interface 131 and a radio frequency signal interface 132. For example, when the infrared signal interface is used, a user input instruction is converted into an infrared control signal according to an infrared control protocol and sent to the display device 200 through the infrared sending module. As another example, when the radio frequency signal interface is used, a user input command is converted into a digital signal, modulated according to the radio frequency control signal modulation protocol, and then transmitted to the display device 200 through the radio frequency transmitting terminal.
The user input interface 140 may include at least one of a microphone 141, a touch pad 142, a sensor 143, a key 144, and the like, so that a user can input a user instruction regarding controlling the display apparatus 200 to the control apparatus 100 through voice, touch, gesture, press, and the like.
The output interface 150 outputs a user instruction received by the user input interface 140 to the display apparatus 200, or outputs an image or voice signal received by the display apparatus 200. Here, the output interface 150 may include an LED interface 151, a vibration interface 152 generating vibration, a sound output interface 153 outputting sound, a display 154 outputting an image, and the like. For example, the remote controller 100A may receive an output signal such as audio, video, or data from the output interface 150, and display the output signal in the form of an image on the display 154, in the form of audio on the sound output interface 153, or in the form of vibration on the vibration interface 152.
The power supply 160 provides operational power support for the elements of the control apparatus 100 under the control of the controller 110, and may take the form of a battery and associated control circuitry.
A hardware configuration block diagram of the display device 200 is exemplarily illustrated in fig. 1C. As shown in fig. 1C, the display apparatus 200 may further include a tuner demodulator 210, a communicator 220, a detector 230, an external device interface 240, a controller 250, a memory 260, a user interface 265, a video processor 270, a display 275, an audio processor 280, an audio input interface 285, and a power supply 290.
The tuner demodulator 210 receives the broadcast television signal in a wired or wireless manner, and may perform modulation and demodulation processing such as amplification, mixing, resonance, and the like, so as to demodulate, from a plurality of wireless or wired broadcast television signals, an audio/video signal carried in a frequency of a television channel selected by a user, and additional information (e.g., EPG data).
The tuner demodulator 210 responds to the television channel frequency selected by the user and the television signal carried by that frequency, under the control of the controller 250.
The tuner demodulator 210 can receive a television signal in various ways according to the broadcasting system of the television signal, such as: terrestrial broadcasting, cable broadcasting, satellite broadcasting, internet broadcasting, or the like; and according to different modulation types, a digital modulation mode or an analog modulation mode can be adopted; and can demodulate the analog signal and the digital signal according to the different kinds of the received television signals.
In other exemplary embodiments, the tuner demodulator 210 may also be in an external device, such as an external set-top box. In this way, the set-top box outputs a television signal after modulation and demodulation and inputs it into the display device 200 through the external device interface 240.
The communicator 220 is a component for communicating with an external device or an external server according to various communication protocol types. For example, the display apparatus 200 may transmit content data to an external apparatus connected via the communicator 220, or browse and download content data from an external apparatus connected via the communicator 220. The communicator 220 may include a network communication protocol module or a near field communication protocol module, such as a WIFI module 221, a bluetooth communication protocol module 222, and a wired ethernet communication protocol module 223, so that the communicator 220 may receive a control signal of the control device 100 according to the control of the controller 250 and implement the control signal as a WIFI signal, a bluetooth signal, a radio frequency signal, and the like.
The detector 230 is a component of the display device 200 for collecting signals of the external environment or of interaction with the outside. The detector 230 may include an image collector 231, such as a camera or video camera, which may be used to collect external environment scenes so as to adaptively change the display parameters of the display device 200, and to capture user attributes or user interaction gestures so as to realize interaction between the display device and the user. A light receiver 232 may also be included to collect ambient light intensity so as to adapt changes in the display parameters of the display device 200, and the like.
In some other exemplary embodiments, the detector 230 may further include a temperature sensor; by sensing the ambient temperature, the display device 200 may adaptively adjust the display color temperature of the image. In some embodiments, when the temperature is higher, the display device 200 may be adjusted to display images with a cooler color temperature; when the temperature is lower, the display device 200 may be adjusted to display images with a warmer color temperature.
In some other exemplary embodiments, the detector 230 may further include a sound collector, such as a microphone, which may be configured to receive the user's voice, for example a voice signal carrying a control instruction for controlling the display device 200; alternatively, it may collect ambient sounds that identify the type of ambient scene, enabling the display device 200 to adapt to ambient noise.
The external device interface 240 is a component that enables data transmission between the display device 200 and external devices under the control of the controller 250. The external device interface 240 may be connected to external apparatuses such as a set-top box, a game device, or a notebook computer in a wired/wireless manner, and may receive data such as a video signal (e.g., moving images), an audio signal (e.g., music), and additional information (e.g., EPG) from the external apparatus.
The external device interface 240 may include: a High Definition Multimedia Interface (HDMI) terminal 241, a Composite Video Blanking Sync (CVBS) terminal 242, an analog or digital Component terminal 243, a Universal Serial Bus (USB) terminal 244, a Component terminal (not shown), a red, green, blue (RGB) terminal (not shown), and the like.
The controller 250 controls the operation of the display device 200 and responds to the operation of the user by running various software control programs (such as an operating system and various application programs) stored on the memory 260.
As shown in fig. 1C, the controller 250 includes a random access memory (RAM) 251, a read-only memory (ROM) 252, a graphics processor 253, a CPU processor 254, a communication interface 255, and a communication bus 256. The RAM 251, ROM 252, graphics processor 253, and CPU processor 254 are connected to one another via the communication interface 255 and the communication bus 256.
The ROM 252 stores various system boot instructions. When the display device 200 is powered on upon receiving a power-on signal, the CPU processor 254 executes the system boot instructions in the ROM 252, copies the operating system stored in the memory 260 to the RAM 251, and starts running the operating system. After the operating system has started, the CPU processor 254 copies the various application programs in the memory 260 to the RAM 251 and then starts running them.
The graphics processor 253 generates screen images of various graphic objects, such as icons, images, and operation menus. It may include an arithmetic unit, which performs operations on the various interactive instructions input by the user so as to display the various objects according to their display attributes, and a renderer, which generates the various objects produced by the arithmetic unit and displays the rendered result on the display 275.
The CPU processor 254 executes the operating system and application program instructions stored in the memory 260, and, according to received user input instructions, executes the processing of various application programs, data, and content, so as to finally display and play various audio/video content.
In some example embodiments, the CPU processor 254 may comprise a plurality of processors, including one main processor and one or more sub-processors. The main processor performs some initialization operations of the display device 200 in the preload mode and/or the operations of displaying a picture in the normal mode; the sub-processor(s) perform operations in states such as the standby mode of the display device.
The communication interface 255 may include a first interface to an nth interface. These interfaces may be network interfaces that are connected to external devices via a network.
The controller 250 may control the overall operation of the display apparatus 200. For example: in response to receiving a user input command for selecting a GUI object displayed on the display 275, the controller 250 may perform an operation related to the object selected by the user input command.
Here, the object may be any one of the selectable objects, such as a hyperlink or an icon. The operation related to the selected object is, for example, displaying the linked hyperlink page, document, or image, or executing the program corresponding to the icon. The user input command for selecting the GUI object may be a command input through various input means connected to the display device 200 (e.g., a mouse, a keyboard, or a touch pad) or a voice command corresponding to speech uttered by the user.
A memory 260 for storing various types of data, software programs, or applications for driving and controlling the operation of the display device 200. The memory 260 may include volatile and/or nonvolatile memory. And the term "memory" includes the memory 260, the RAM251 and the ROM252 of the controller 250, or a memory card in the display device 200.
In some embodiments, the memory 260 is specifically used for storing an operating program for driving the controller 250 of the display device 200; storing various application programs built in the display apparatus 200 and downloaded by a user from an external apparatus; data such as visual effect images for configuring various GUIs provided by the display 275, various objects related to the GUIs, and selectors for selecting GUI objects are stored.
In some embodiments, the memory 260 is specifically configured to store drivers and related data for the tuner demodulator 210, the communicator 220, the detector 230, the external device interface 240, the video processor 270, the display 275, the audio processor 280, and the like, external data (e.g., audio-visual data) received from the external device interface, or user data (e.g., key information, voice information, touch information, and the like) received from the user interface.
In some embodiments, memory 260 specifically stores software and/or programs representing an Operating System (OS), which may include, for example: a kernel, middleware, an Application Programming Interface (API), and/or an application program. In some embodiments, the kernel may control or manage system resources, as well as functions implemented by other programs (e.g., the middleware, APIs, or applications); at the same time, the kernel may provide an interface to allow middleware, APIs, or applications to access the controller to enable control or management of system resources.
A block diagram of the architectural configuration of the operating system in the memory of the display device 200 is illustrated in fig. 1D. The operating system architecture comprises an application layer, a middleware layer and a kernel layer from top to bottom.
Application layer: the applications built into the system and non-system-level applications belong to the application layer and are responsible for direct interaction with the user. The application layer may include a plurality of applications, such as a NETFLIX application, a settings application, and a media center application. These applications may be implemented as Web applications that execute on a WebKit engine, and in particular may be developed and executed based on HTML, Cascading Style Sheets (CSS), and JavaScript.
HTML (HyperText Markup Language) is the standard markup language for creating web pages; it describes web pages with markup tags that mark up characters, graphics, animation, sound, tables, links, and so on. A browser reads an HTML document, interprets the content of the tags in the document, and displays it in the form of a web page.
CSS (Cascading Style Sheets) is a computer language used to express the style of HTML documents, and may be used to define style structures such as fonts, colors, and positions. A CSS style can be stored directly in the HTML web page or in a separate style file, enabling control over the styles in the web page.
JavaScript is a language used in web page programming; it can be inserted into an HTML page and is interpreted and executed by the browser. The interaction logic of a Web application is implemented in JavaScript. JavaScript can wrap a JavaScript extension interface through the browser to communicate with the kernel layer.
the middleware layer may provide some standardized interfaces to support the operation of various environments and systems. For example, the middleware layer may be implemented as multimedia and hypermedia information coding experts group (MHEG) middleware related to data broadcasting, DLNA middleware which is middleware related to communication with an external device, middleware which provides a browser environment in which each application program in the display device operates, and the like.
The kernel layer provides core system services, such as: file management, memory management, process management, network management, system security authority management and the like. The kernel layer may be implemented as a kernel based on various operating systems, for example, a kernel based on the Linux operating system.
The kernel layer also provides communication between system software and hardware, and provides device driver services for various hardware, such as: a display driver for the display, a camera driver for the camera, a key driver for the remote controller, a Wi-Fi driver for the WIFI module, an audio driver for the audio output interface, and a power management driver for the power management (PM) module.
A user interface 265 receives various user interactions. Specifically, it is used to transmit an input signal of a user to the controller 250 or transmit an output signal from the controller 250 to the user. In some embodiments, the remote control 100A may send an input signal, such as a power switch signal, a channel selection signal, a volume adjustment signal, etc., input by the user to the user interface 265, and then the input signal is forwarded to the controller 250 by the user interface 265; alternatively, the remote controller 100A may receive an output signal such as audio, video, or data output from the user interface 265 via the controller 250, and display the received output signal or output the received output signal in audio or vibration form.
In some embodiments, a user may enter user commands on a Graphical User Interface (GUI) displayed on the display 275, and the user interface 265 receives the user input commands through the GUI. Specifically, the user interface 265 may receive user input commands for controlling the position of a selector in the GUI to select different objects or items.
Alternatively, the user may input a user command by inputting a specific sound or gesture, and the user interface 265 receives the user input command by recognizing the sound or gesture through the sensor.
The video processor 270 is configured to receive an external video signal, and perform video data processing such as decompression, decoding, scaling, noise reduction, frame rate conversion, resolution conversion, and image synthesis according to a standard codec protocol of the input signal, so as to obtain a video signal that is directly displayed or played on the display 275.
Illustratively, the video processor 270 includes a demultiplexing module, a video decoding module, an image synthesizing module, a frame rate conversion module, a display formatting module, and the like.
The demultiplexing module demultiplexes the input audio/video data stream; for example, for an input MPEG-2 stream (a compression standard for digital storage media moving images and audio), the demultiplexing module demultiplexes it into a video signal and an audio signal.
And the video decoding module is used for processing the video signal after demultiplexing, including decoding, scaling and the like.
The image synthesis module superimposes and mixes the GUI signal, generated by the graphics generator according to user input, with the scaled video image, to produce an image signal for display.
The frame rate conversion module converts the frame rate of the input video, for example converting 60 Hz input video to a frame rate of 120 Hz or 240 Hz, commonly implemented by, for example, frame interpolation.
And a display formatting module for converting the signal output by the frame rate conversion module into a signal conforming to a display format of a display, such as converting the format of the signal output by the frame rate conversion module to output an RGB data signal.
The display 275 receives the image signal output by the video processor 270 and displays video, images, and the menu manipulation interface. For example, the display may display video from a broadcast signal received by the tuner demodulator 210, video input from the communicator 220 or the external device interface 240, and images stored in the memory 260. The display 275 also displays the user manipulation interface (UI) generated in the display device 200 and used to control the display device 200.
And, the display 275 may include a display screen assembly for presenting a picture and a driving assembly for driving the display of an image. Alternatively, a projection device and projection screen may be included, provided display 275 is a projection display.
The audio processor 280 is configured to receive an external audio signal, decompress and decode the received audio signal according to a standard codec protocol of the input signal, and perform audio data processing such as noise reduction, digital-to-analog conversion, and amplification processing to obtain an audio signal that can be played by the speaker 286.
In some embodiments, audio processor 280 may support various audio formats. Such as MPEG-2, MPEG-4, Advanced Audio Coding (AAC), high efficiency AAC (HE-AAC), and the like.
The audio output interface 285 receives the audio signal output by the audio processor 280. For example, the audio output interface may output audio from a broadcast signal received via the tuner demodulator 210, audio input via the communicator 220 or the external device interface 240, and audio stored in the memory 260. The audio output interface 285 may include a speaker 286, or an external audio output terminal 287, such as an earphone output terminal, that outputs to the sound-generating device of an external device.
In other exemplary embodiments, video processor 270 may comprise one or more chips. Audio processor 280 may also comprise one or more chips.
And, in other exemplary embodiments, the video processor 270 and the audio processor 280 may be separate chips or may be integrated with the controller 250 in one or more chips.
The power supply 290 supplies power to the display device 200 from power input from an external power source, under the control of the controller 250. The power supply 290 may be a built-in power supply circuit installed inside the display device 200, or a power supply installed outside the display device 200.
In some embodiments, the player pipeline structure of the streaming media may be as shown in fig. 2; the streaming media player pipeline may be implemented in the middleware layer of the operating system in fig. 1D, and may include:
The video downloading module receives the media link address passed in by the application, which for streaming media is an index file address, usually a file in m3u8, manifest, or mpd format, downloads the index file, and outputs the index file address to the video format detection module.
The video format detection module identifies the protocol type of the streaming media based on the index file, such as HLS, MSS, or DASH, and outputs the protocol type, the index file address, and the index file data downloaded by the video downloading module to the streaming media protocol parsing module.
The streaming media protocol parsing module comprises several slice downloading modules and a slice buffering module. The slice downloading modules parse the slice file addresses from the index file data and download the slice files of the streaming media; the downloading of video, audio, and subtitle slice files is performed by different slice downloading modules. The slice buffering module buffers the video, audio, and subtitle slice files downloaded by the different slice downloading modules and continuously sends them to the decapsulation module.
The decapsulation module decapsulates the video, audio, and subtitle slice files to obtain the corresponding video, audio, and subtitle elementary stream data; for example, it parses a downloaded video slice file in mp4 format frame by frame to obtain a multi-frame video elementary stream in H264 format.
The multi-buffer queue module adds each decapsulated frame of video, audio, and subtitle elementary stream to the respective video, audio, and subtitle buffer queues. This guarantees that the video and audio slice files in the pipeline are parsed into elementary streams before being connected to the elementary stream injection module, and prevents the decapsulation module from being unable to connect to the elementary stream injection module.
The elementary stream injection module extracts the video, audio, and subtitle elementary stream data frame by frame from the video, audio, and subtitle buffer queues and injects it into the decoder downstream of the player pipeline; the video, audio, and subtitle elementary stream injection modules in fig. 2 are not aware of one another.
The decoder decodes and correspondingly processes the video, audio, and subtitle elementary stream data, and finally outputs the processed data to the corresponding video display window or audio output interface, realizing the playing of video, audio, and subtitles.
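To make the fig. 2 data flow concrete, the following Python sketch models the stages under assumed names (`parse_frames`, `decoder.feed`, and the queue layout are illustrative, not part of the patent):

```python
from collections import deque

def parse_frames(slice_file):
    """Stand-in for decapsulation: yields (timestamp, payload) frames.
    Here a slice file is assumed to already be a list of such frames."""
    yield from slice_file

class Fig2Pipeline:
    """Toy model: slice download/buffer -> decapsulation -> per-type
    buffer queues -> independent elementary stream injection -> decoder."""

    def __init__(self, decoder):
        self.decoder = decoder
        self.queues = {"video": deque(), "audio": deque(), "subtitle": deque()}

    def on_slice_downloaded(self, kind, slice_file):
        # Decapsulate the slice file frame by frame into its buffer queue.
        for frame in parse_frames(slice_file):
            self.queues[kind].append(frame)

    def inject(self, kind):
        # Each injector drains only its own queue and never consults the
        # timestamps of the other paths (the root of the stall problem below).
        while self.queues[kind]:
            self.decoder.feed(kind, self.queues[kind].popleft())
```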
Since most decoders are implemented by hardware chip-vendor schemes, the elementary stream injection module in the player pipeline connects the decapsulation module and the decoder. Because the decoder depends entirely on the data provided by the decapsulation module, that data must usually be kept synchronous: when elementary streams of different paths are injected into the decoder, the timestamps of the different slice files are usually kept within a certain threshold range, for example 2 s.
The streaming media player pipeline scheme shown in fig. 2 handles normal audio/video streaming playback without problems. However, the defining feature of streaming media is that the media content adapts to the network. If the streaming media encapsulation format or coding format changes across a switch, the player pipeline triggers a dynamic switching mechanism and the decapsulation module shown in fig. 2 changes, for example when the streaming media format changes from ts to mp4. At that point, because of the synchronization constraint of the elementary stream injection module, the pipeline cannot switch to a new pipeline to receive the new media content until the audio elementary stream data and subtitle elementary stream data in the multi-buffer queues of the old player pipeline have been consumed, which causes the picture to stall.
For example, assume the threshold of the audio/video synchronization mechanism set by the player pipeline is 2 s, the duration of one video slice file is 3 s, and the duration of one audio slice file is 9 s. If the format of the next video slice file changes from ts to mp4, the player pipeline must wait until the 9 s audio slice is consumed before it can request the server for the next video slice file. Once the video elementary stream data corresponding to the 3 s video slice file has been injected into the decoder and processed, at most 5 s of the audio elementary stream data corresponding to the audio slice file can be injected under the synchronization threshold. The decoder's synchronization mechanism then recognizes that there is no more video elementary stream data, so it does not request the elementary stream injection module to inject the remaining 4 s of audio elementary stream data. The remaining audio elementary stream data stays in the elementary stream injection module and cannot be injected into the decoder; the current player pipeline cannot finish, a new player pipeline cannot be created, and the display may therefore stall.
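The arithmetic behind this stall can be checked in a few lines of Python (values from the example above; the assumption is that audio may lead the last injected video timestamp by at most the threshold):

```python
video_slice_s = 3.0      # duration of the current video slice file
audio_slice_s = 9.0      # duration of the current audio slice file
sync_threshold_s = 2.0   # A/V synchronization threshold of the decoder

# Audio can be injected only up to the last video timestamp plus the
# threshold, so once the 3 s of video is consumed:
injectable_audio_s = min(audio_slice_s, video_slice_s + sync_threshold_s)
stuck_audio_s = audio_slice_s - injectable_audio_s
print(injectable_audio_s, stuck_audio_s)  # 5.0 4.0 -> 4 s of audio is stranded
```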
In view of the above problems, the present application provides a streaming media synchronization method and a display device supporting a variable slice format, so that a change in slice file format on the display device does not affect the dynamic reconstruction of the player pipeline or the synchronous injection of slice files, and video can continue to play normally. Specific implementations are described in the following embodiments.
In a first embodiment, a schematic structural diagram of another player pipeline provided by the present application is shown in fig. 3. It sequentially comprises: a video downloading module, a video format detection module, a streaming media protocol parsing module (comprising a slice downloading module and a slice buffering module), a decapsulation module, a multi-buffer queue module, and an elementary stream synchronous injection module (comprising an elementary stream buffering submodule and an elementary stream injection submodule). The input end of the player pipeline is the streaming media server, and the output end is a decoder, such as a decoding chip. The video downloading module and the video format detection module have the same functions as those shown in fig. 2 and are not described again here.
With reference to the display device in fig. 1C, the controller in the display device of the present application may specifically control the video downloading module, the video format detection module, the streaming media protocol parsing module (including the slice downloading module and the slice buffering module), the decapsulation module, the multi-buffer queue module, the elementary stream buffering submodule, and the elementary stream injection submodule in the player pipeline, and perform the following operations:
The controller is configured to control the slice downloading module, the decapsulation module, the multi-buffer queue module, the elementary stream buffering submodule, and the elementary stream injection submodule in the current streaming media player pipeline before the format of the next video slice file changes; wherein,
the slice downloading module downloads a current video slice file and a current audio slice file, respectively. Specifically, the slice downloading module parses the content of the index file, obtains the slice file addresses of the streaming media, and downloads the slice files, the downloading of video, audio, and subtitle slice files being performed by different slice downloading modules;
the slice buffering module buffers the video, audio, and subtitle slice files downloaded by the different slice downloading modules and continuously sends them to the decapsulation module. Its main role is to buffer downloaded slice files; its size is usually governed by the slice download duration, is typically designed for 10-30 s of content, and must hold at least one slice length.
The decapsulation module parses the downloaded video slice file and audio slice file frame by frame to obtain video elementary stream data and audio elementary stream data. Specifically, the decapsulation module may decapsulate the video, audio, and subtitle slice files to obtain the corresponding video, audio, and subtitle elementary stream data; for example, it may parse a downloaded mp4-format video slice file frame by frame to obtain a multi-frame H264-format video elementary stream.
The multi-buffer queue module adds each decapsulated frame of video elementary stream data to a first video buffer queue and each decapsulated frame of audio elementary stream data to a first audio buffer queue. The multi-buffer queues mainly ensure that the loading of the player completes after all streaming media slice files have been parsed, so this module can be designed to be quite small, for example 2 MB.
The elementary stream buffering submodule continuously extracts video elementary stream data from the first video buffer queue and adds it to a second video buffer queue, and continuously extracts audio elementary stream data from the first audio buffer queue and adds it to a second audio buffer queue, until all the video elementary stream data corresponding to the current video slice file and all the audio elementary stream data corresponding to the current audio slice file have been extracted. After extraction completes, the multi-buffer queue module holds no unprocessed data, so the decapsulation module and the multi-buffer queue module can be removed from the current player pipeline and destroyed, and a new decapsulation module and multi-buffer queue module suitable for the changed format can be created. The pipeline can therefore be rebuilt as soon as the elementary streams are buffered in the elementary stream buffering submodule, without waiting for all elementary streams to be injected into the decoder, so video still plays normally after the video format changes.
Note that when the elementary stream buffering submodule extracts elementary stream data from the multi-buffer queue module, a threshold must be set for the size of its buffer queues that guarantees the data of at least one slice file can be fully buffered. The size of the elementary stream buffering submodule is designed as the sum of the buffer size of the multi-buffer queue module, the buffer size of the slice buffering module, and an appropriate margin (for example, 1 MB).
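Expressed as a sizing rule in Python (the 2 MB queue and 1 MB margin follow the examples in this description; the slice buffer capacity is an assumed illustrative value):

```python
MiB = 1024 * 1024

multi_buffer_queue_bytes = 2 * MiB   # multi-buffer queue module (example above)
slice_buffer_bytes = 16 * MiB        # assumed capacity of the slice buffering module
margin_bytes = 1 * MiB               # "appropriate margin" from the text

# The elementary stream buffering submodule must absorb everything the
# upstream stages can hold, plus at least one complete slice file.
es_buffer_bytes = multi_buffer_queue_bytes + slice_buffer_bytes + margin_bytes
```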
The elementary stream injection submodule extracts video elementary stream data and audio elementary stream data from the second video buffer queue and the second audio buffer queue, respectively, according to the timestamps of the video and audio elementary stream data, and synchronously injects them into the decoder.
As an example, when the elementary stream injection submodule in this embodiment injects video elementary stream data and audio elementary stream data separately, the two paths are aware of each other's elementary stream timestamps, which facilitates synchronous injection.
The elementary stream injection submodule synchronously injects the video and audio elementary stream data into the decoder as follows: before injecting the current frame of audio elementary stream data, compute the difference between the timestamp of the current audio frame and the maximum timestamp of the video elementary stream data already injected; if the difference is within a preset threshold range, inject the current audio frame; otherwise, hold off injecting the current audio frame. The injection of the subtitle elementary stream works the same way as that of the audio elementary stream and is not described again here.
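A minimal Python sketch of that per-frame check (simplified to two paths and a single loop; as noted below, a real injector runs on its own thread and treats subtitles like audio):

```python
from collections import deque

def inject_synchronously(second_video_q: deque, second_audio_q: deque,
                         decoder, threshold_s: float = 2.0) -> None:
    """Inject video freely; inject an audio frame only while its timestamp
    does not run more than threshold_s ahead of the largest video timestamp
    already injected. Frames are (timestamp, payload) pairs."""
    max_video_ts = float("-inf")
    while second_video_q or second_audio_q:
        progressed = False
        if second_video_q:
            ts, frame = second_video_q.popleft()
            decoder.feed("video", frame)
            max_video_ts = max(max_video_ts, ts)
            progressed = True
        # Drain audio only while it stays inside the threshold window.
        while second_audio_q and (second_audio_q[0][0] - max_video_ts) <= threshold_s:
            _, frame = second_audio_q.popleft()
            decoder.feed("audio", frame)
            progressed = True
        if not progressed:
            break  # remaining audio waits for more video (or a pipeline rebuild)
```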
Considering that the format change of the streaming media is usually reflected in the decapsulation module, the elementary stream buffer module only needs to store the elementary stream data in the multi-buffer queue module in the current old player pipeline.
For example, assume that the threshold for the synchronization mechanism set by the player pipeline is 2 s. When the duration of one video slice file is 3s and the duration of one audio slice file is 9s, if the format of the next video slice file is changed, the elementary stream buffer module will extract the video elementary stream data 3s and the audio elementary stream data 9s from the multi-buffer queue module, respectively. The elementary stream injection module synchronously injects the video elementary stream data and the audio elementary stream data into the decoder according to the synchronization mechanism. When the 3s video elementary stream data has been injected into the decoder and the processing is completed, the audio elementary stream data can be injected for 5s at most according to the threshold 2s of the synchronization mechanism, and then the decoder returns an acknowledgement.
Because the remaining audio elementary stream data is stored in the elementary stream buffering module, and all the audio elementary stream data in the multi-buffer queue module has been extracted into it, the multi-buffer queue module considers that the current player pipeline has no unprocessed elementary stream data. It therefore notifies the pipeline that all data in the current pipeline has been processed, so the protocol decapsulation module can disconnect from the old decapsulation module and create a new decapsulation module suited to the changed video slice format, thereby avoiding picture stalling.
The elementary stream injection submodule uses a separate thread to inject data into the decoder, so that pipeline reconstruction is not blocked.
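A minimal sketch of this threading arrangement (the names and the None-sentinel shutdown protocol are assumptions; the decoder call is omitted as platform-specific):

```python
import queue
import threading

second_video_q = queue.Queue()

def injection_loop(frames: queue.Queue) -> None:
    """Pull frames from a second buffer queue and hand them to the decoder.

    A None sentinel ends the loop.
    """
    while True:
        frame = frames.get()
        if frame is None:
            break
        # decoder.inject(frame) would go here

# Injection runs on its own thread, so the main thread can tear down and
# re-create the demuxer without ever being blocked by a busy decoder.
injector = threading.Thread(target=injection_loop, args=(second_video_q,),
                            daemon=True)
injector.start()
second_video_q.put(None)  # example shutdown
injector.join()
```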
In a second embodiment, with reference to the display device in fig. 1B, the controller in the display device of the present application may further control the video downloading module, the video format detection module, the streaming media protocol parsing module (including the slice downloading module and the slice buffering module), the decapsulation module, the multi-buffer queue module, the elementary stream buffering module, and the elementary stream injection module in the player pipeline, and perform the following operations:
the controller is configured to control, before the format of the next video slice file changes, the slice downloading module, the slice buffering module, the decapsulation module, the multi-buffer queue module, and the elementary stream synchronous injection module in the current streaming media player pipeline; wherein,
the slice downloading module downloads the current video slice file and the current audio slice file, respectively; specifically, the slice downloading module may obtain the slice file address of the streaming media and download the slice file based on the content of the index file, where the video, audio, and subtitle slice files are downloaded by different slice downloading modules;
the slice buffering module continuously controls the slice downloading module to download video slice files, and controls the slice downloading module to download the next audio slice file based on the target time of the most recently downloaded video slice file.
The decapsulation module parses the downloaded video slice file and audio slice file frame by frame to obtain video elementary stream data and audio elementary stream data; specifically, the decapsulation module may decapsulate the video slice file, the audio slice file, and the subtitle slice file to obtain the corresponding video, audio, and subtitle elementary stream data; for example, it may parse a downloaded mp4-format video slice file frame by frame into multiple frames of an H264-format video elementary stream.
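As a sketch of such a demultiplexing step, assuming the third-party PyAV library (which is not named in the patent) and an illustrative file name:

```python
import queue

import av  # PyAV - an assumed demuxing library, not part of the patent

first_video_queue = queue.Queue()

# Parse an mp4 slice frame by frame into H.264 elementary stream packets,
# each tagged with its presentation timestamp in seconds.
with av.open("video_slice_000.mp4") as container:  # file name is illustrative
    video = container.streams.video[0]
    for packet in container.demux(video):
        if packet.pts is None:  # skip flushing packets
            continue
        pts_seconds = float(packet.pts * video.time_base)
        first_video_queue.put((pts_seconds, bytes(packet)))
```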
The multi-buffer queue module adds each decapsulated frame of video elementary stream data to a first video buffer queue and each decapsulated frame of audio elementary stream data to a first audio buffer queue.
The elementary stream synchronous injection module comprises an elementary stream buffering submodule and an elementary stream injection submodule, wherein,
the elementary stream buffering submodule continuously extracts video elementary stream data from the first video buffer queue and adds it to a second video buffer queue, and continuously extracts audio elementary stream data from the first audio buffer queue and adds it to the second audio buffer queue, until the video elementary stream data corresponding to the current video slice file and the audio elementary stream data corresponding to the current audio slice file have been completely extracted, whereupon the decapsulation module and the multi-buffer queue module in the current streaming media player pipeline are destroyed.
The elementary stream injection submodule extracts video elementary stream data and audio elementary stream data from the second video buffer queue and the second audio buffer queue, respectively, according to their timestamps, and injects them into the decoder synchronously.
Unlike the foregoing embodiment, the slice buffering module in this embodiment controls the slice downloading module to download the next audio slice file based on the target time of the most recently downloaded video slice file, specifically: the video slice file is set as the primary format file and the audio slice file as the secondary format file; before the next secondary format file is downloaded, the slice downloading module is controlled to determine whether the difference between the target time of the currently downloaded primary format file and that of the secondary format file is less than or equal to a preset threshold; if so, the next secondary format file is downloaded; if not, downloading of the next secondary format file is suspended.
In addition, after the primary format file has been downloaded completely, the slice buffering module can designate the next secondary format file as the primary format file and continue downloading it as such.
Having the slice buffering module control slice downloading in this way reduces the timestamp difference between different types of streaming media files during pipeline switchover, avoids an excessive buffered amount of secondary format files, and reduces the memory occupied by the elementary stream buffering module.
As an example, refer to the flowchart of the slice file download control process shown in fig. 4 (a code sketch of this control loop follows the step list below), which includes:
Step 401, parse the streaming media content and start video, audio, and subtitle threads, respectively, to download the first slice file;
for example, if the streaming media contains audio, video, and subtitles, three threads are started; if it contains only audio and video, only two threads are started.
Step 402, update the target time of the current slice of the primary format file being downloaded;
the target time is the sum of the start time and the slice duration of the file's current slice, i.e., the time at which the slice ends.
Step 403, determine whether the current slice of the primary format file has finished downloading; if yes, go to step 404; if not, go to step 407;
step 404, determine whether all slices of the primary format file have been downloaded; if yes, go to step 405; if not, go to step 409;
step 405, determine whether all slices of the secondary format files have been downloaded; if yes, end the download; if not, go to step 406;
step 406, select one of the secondary format files as the primary format file, and go to step 402;
step 407, determine whether secondary format files are present; if yes, go to step 410; if not, go to step 408;
step 408, wait for the primary format file to finish downloading, and go to step 409;
step 409, continue downloading the next slice of the primary format file, and go to step 402;
step 410, determine whether the current slice of the secondary format file has finished downloading; if yes, go to step 411; if not, go to step 415;
step 411, determine whether the secondary format file has been completely downloaded; if yes, pause the secondary format download thread; if not, go to step 412;
step 412, determine whether the difference between the target time of the next slice of the secondary format file and the target time of the primary format file is within the threshold range; if yes, go to step 413; if not, go to step 414;
step 413, resume the secondary format download thread, download the next slice of the secondary format file, and go to step 410;
step 414, suspend the secondary format download thread, and trigger downloading again after the current primary format slice has finished downloading;
step 415, continue downloading the current slice of the secondary format file, and go to step 410.
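The following self-contained Python sketch condenses this control loop (the SliceTrack class, the 2.0 s threshold, and the simulated downloads are assumptions for the example; real downloads run on the per-track threads of step 401):

```python
class SliceTrack:
    """Download state of one media track made up of slice files."""

    def __init__(self, slice_durations):
        self.slice_durations = list(slice_durations)  # seconds per slice
        self.next_index = 0
        self.target_time = 0.0  # end time of the most recently downloaded slice

    def finished(self):
        return self.next_index >= len(self.slice_durations)

    def download_next_slice(self):
        # Target time = start time of the slice plus its duration, i.e. the
        # time at which the slice ends (step 402).
        self.target_time += self.slice_durations[self.next_index]
        self.next_index += 1  # the actual network download is omitted here


def download_with_gating(primary, secondary, threshold_s=2.0):
    """Download secondary slices only while their target time stays within
    threshold_s of the primary track's target time (steps 410-415)."""
    while not secondary.finished():
        next_target = (secondary.target_time
                       + secondary.slice_durations[secondary.next_index])
        if next_target - primary.target_time <= threshold_s:
            secondary.download_next_slice()   # step 413: within threshold
        elif primary.finished():
            secondary.download_next_slice()   # primary done: drain the rest
        else:
            primary.download_next_slice()     # step 414: let primary catch up


# Example: 3 s video slices (primary) against 9 s audio slices (secondary).
video = SliceTrack([3.0] * 6)   # 18 s of video
audio = SliceTrack([9.0] * 2)   # 18 s of audio
download_with_gating(video, audio)
print(video.target_time, audio.target_time)  # both reach 18.0
```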
In this embodiment, primary and secondary roles are assigned to the downloaded streaming media files, so that the download speed of the secondary format files can be adjusted based on the current download progress of the primary format file. The roles are assigned mainly by file download rate: the file with the lowest download rate is the primary format file, and the rest are secondary format files. For example, if the streaming media contains video, the video is set as the primary format file because its download rate is the lowest; if the media contains only audio and subtitles, the audio is set as the primary format file; and if the media contains only one stream, that stream is the primary format file.
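A one-line selection rule captures this assignment (a hedged sketch; the rate figures are made up for the example):

```python
def pick_primary_format(download_rates: dict) -> str:
    """Choose the track with the lowest download rate as the primary format."""
    return min(download_rates, key=download_rates.get)

# Video typically downloads slowest, so it becomes the primary format file:
print(pick_primary_format({"video": 1.0, "audio": 8.0, "subtitles": 40.0}))  # video
```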
The thread of the primary format file downloads continuously, while the secondary format files are downloaded based on the target time of the primary format file unless the limit of the download queue is reached: if the difference between the two is within a certain threshold range, downloading of the secondary format file resumes; otherwise the timestamp gap between the two files is too large, downloading of the secondary format file is suspended, and the check is repeated after the current primary format slice has finished downloading.
Fig. 5 illustrates an interaction diagram of streaming media synchronization supporting a variable slice format. Assume the duration of one video slice file is 3 s, that of one audio slice file is 9 s, and that of one subtitle slice file is 20 s. If the format of the next video slice file changes, the interaction within the player pipeline proceeds as follows:
and the fragment downloading module downloads a video fragment file 3s, an audio fragment file 9s and a subtitle fragment file 20s from the server respectively according to the address of the analyzed streaming media fragment file based on the control of the fragment buffering module.
The slice buffering module buffers the video, audio, and subtitle slice files downloaded by the different slice downloading modules and continuously feeds them to the decapsulation module. If the format of the next video slice file changes, the slice buffering module controls the slice downloading module to download only the video slice file, the audio slice file, and the subtitle slice file of the current slice.
The decapsulation module decapsulates the video, audio, and subtitle slice files to obtain the corresponding video, audio, and subtitle elementary stream data. For example, if the current video slice file is in mp4 format, the decapsulation module parses it frame by frame into H264-format video elementary stream data; audio and subtitles are processed in the same way.
The multi-buffer queue module adds the decapsulated video, audio, and subtitle elementary stream data to the corresponding first buffer queues;
the elementary stream buffering module extracts the video, audio, and subtitle elementary stream data from the first buffer queues and adds them to the second buffer queues, until the current video, audio, and subtitle elementary stream data have been completely extracted.
The elementary stream injection module extracts the video, audio, and subtitle elementary stream data from the second buffer queues according to their respective timestamps and injects them into the decoder synchronously.
For example, assume that the threshold of the synchronization mechanism is 2 s. When the first frames of video, audio, and subtitle elementary stream data are injected, their timestamps are identical, so the first frame of each can be injected. After the 3 s of video elementary stream data have been injected and audio and subtitle data are to be injected next, the maximum injected video timestamp is 3 s; given the 2 s threshold of the synchronization mechanism, the maximum timestamp up to which audio and subtitle elementary stream data can be injected is 3 + 2 = 5 s, audio and subtitle data beyond 5 s are not injected, and the decoder then considers all injection complete and returns an acknowledgement. Since all streaming media data in the current player pipeline has already been cached in the elementary stream buffering submodule, the player considers that the current pipeline modules hold no unprocessed data, and the current streaming media player pipeline is closed. Creation of a new player pipeline suited to the changed video slice format can then be triggered, avoiding picture stalling.
In a third embodiment, as shown in fig. 6, a display device of the present application provides a streaming media synchronization method supporting a variable slice format (a consolidated code sketch of these steps follows the list), the method comprising the steps of:
step 601, before the format of the next video slice file changes, download the current video slice file and the current audio slice file, respectively;
step 602, parse the downloaded video slice file and audio slice file frame by frame to obtain video elementary stream data and audio elementary stream data;
step 603, add each decapsulated frame of video elementary stream data to a first video buffer queue, and each decapsulated frame of audio elementary stream data to a first audio buffer queue;
step 604, continuously extract video elementary stream data from the first video buffer queue into a second video buffer queue, and audio elementary stream data from the first audio buffer queue into a second audio buffer queue, until the video elementary stream data corresponding to the current video slice file and the audio elementary stream data corresponding to the current audio slice file have been completely extracted;
step 605, extract the video elementary stream data and the audio elementary stream data from the second video buffer queue and the second audio buffer queue, respectively, according to their timestamps, and inject them into the decoder synchronously.
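To tie steps 601-605 together, here is a self-contained toy run (synthetic frames, invented frame rates, and an assumed 2 s threshold; nothing here is the patent's actual implementation):

```python
import queue

THRESHOLD_S = 2.0  # assumed synchronization threshold

def make_frames(slice_duration_s: float, fps: float):
    """Step 602 stand-in: (timestamp, payload) pairs for one slice file."""
    n = int(slice_duration_s * fps)
    return [(i / fps, b"frame") for i in range(n)]

# Steps 601-603: "download", "parse", and enqueue into the first queues.
first_video_q, first_audio_q = queue.Queue(), queue.Queue()
for f in make_frames(3.0, fps=25):   # one 3 s video slice
    first_video_q.put(f)
for f in make_frames(9.0, fps=50):   # one 9 s audio slice
    first_audio_q.put(f)

# Step 604: drain the first queues completely into the second queues.
second_video_q, second_audio_q = queue.Queue(), queue.Queue()
while not first_video_q.empty():
    second_video_q.put(first_video_q.get())
while not first_audio_q.empty():
    second_audio_q.put(first_audio_q.get())

# Step 605: inject synchronously - audio only up to the maximum injected
# video timestamp plus the threshold.
max_video_ts = 0.0
while not second_video_q.empty():
    ts, _ = second_video_q.get()
    max_video_ts = max(max_video_ts, ts)     # "inject" video frame
injected_audio = 0
while not second_audio_q.empty():
    ts, _ = second_audio_q.get()
    if ts - max_video_ts > THRESHOLD_S:
        break                                # suspend beyond the threshold
    injected_audio += 1                      # "inject" audio frame
print(f"video injected up to {max_video_ts:.2f} s, "
      f"audio frames injected: {injected_audio}")
```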
In a fourth embodiment, as shown in fig. 7, a display device of the present application provides another streaming media synchronization method supporting a variable slice format, the method comprising the steps of:
step 701, before the format of the next video slice file changes, download the current video slice file and the current audio slice file, respectively;
step 702, continuously control downloading of video slice files, and control the slice downloading module to download the next audio slice file based on the target time of the most recently downloaded video slice file;
step 703, parse the downloaded video slice file and audio slice file frame by frame to obtain video elementary stream data and audio elementary stream data;
step 704, add each decapsulated frame of video elementary stream data to a first video buffer queue, and each decapsulated frame of audio elementary stream data to a first audio buffer queue;
step 705, continuously extract video elementary stream data from the first video buffer queue into a second video buffer queue, and audio elementary stream data from the first audio buffer queue into a second audio buffer queue, until the video elementary stream data corresponding to the current video slice file and the audio elementary stream data corresponding to the current audio slice file have been completely extracted;
step 706, extract the video elementary stream data and the audio elementary stream data from the second video buffer queue and the second audio buffer queue, respectively, according to their timestamps, and inject them into the decoder synchronously.
From the above embodiments it can be seen that the display device adds the buffer queues of the elementary stream injection module on top of the multi-buffer queues, so that when the next slice file changes format, the data in the multi-buffer queues of the old player pipeline can be drained in time and a new player pipeline created, without affecting the dynamic reconstruction of the player pipeline or the synchronous injection of slice files, so the video continues to play normally. In addition, by adding the slice buffering module to control the slice downloading module's download progress for audio slice files, the memory occupied by video and audio slice files is optimized, and when a slice file changes, the multiple data paths can still be injected into the decoder synchronously.
The foregoing embodiments have been presented for purposes of illustration and description and are not intended to be exhaustive or to limit the application. Individual elements or features of a particular embodiment are generally not limited to that embodiment but, where applicable, may be used or interchanged in selected embodiments even if not specifically shown or described. Likewise, many modifications may be made without departing from the scope of the appended claims, and all such modifications are intended to be included within that scope.

Claims (10)

  1. A display device, comprising:
    a display;
    a network module configured to browse and/or download service content from a server;
    a decoder configured to decode elementary stream data obtained from the service content;
    a slice downloading module configured to download the current video slice file and the current audio slice file, respectively;
    a decapsulation module configured to parse the downloaded video slice file and audio slice file frame by frame to obtain video elementary stream data and audio elementary stream data;
    a multi-buffer queue module configured to add each decapsulated frame of video elementary stream data to a first video buffer queue and each decapsulated frame of audio elementary stream data to a first audio buffer queue;
    an elementary stream synchronous injection module comprising an elementary stream buffering submodule and an elementary stream injection submodule, wherein
    the elementary stream buffering submodule continuously extracts the video elementary stream data from the first video buffer queue and adds it to a second video buffer queue, and continuously extracts the audio elementary stream data from the first audio buffer queue and adds it to the second audio buffer queue, until the video elementary stream data corresponding to the current video slice file and the audio elementary stream data corresponding to the current audio slice file have been completely extracted, whereupon the decapsulation module and the multi-buffer queue module in the current streaming media player pipeline are destroyed; and
    the elementary stream injection submodule extracts the video elementary stream data and the audio elementary stream data from the second video buffer queue and the second audio buffer queue, respectively, according to the timestamps of the video elementary stream data and the audio elementary stream data, and injects them into the decoder synchronously.
  2. The display device according to claim 1, wherein the elementary stream injection submodule extracts video elementary stream data and audio elementary stream data from the second video buffer queue and the second audio buffer queue, respectively, and injects them into the decoder synchronously, specifically by:
    before injecting the audio elementary stream data of the current frame, calculating the difference between the timestamp of the current frame's audio elementary stream data and the maximum timestamp of the video elementary stream data already injected; and
    determining whether the difference is within a preset threshold range; if so, injecting the audio elementary stream data of the current frame; if not, suspending injection of the current frame's audio elementary stream data.
  3. A display device, comprising:
    a display;
    a network module configured to browse and/or download service content from a server;
    a decoder configured to decode elementary stream data obtained from the service content;
    a slice downloading module configured to download the current video slice file and the current audio slice file, respectively;
    a slice buffering module configured to continuously control the slice downloading module to download video slice files, and to control the slice downloading module to download the next audio slice file based on the target time of the most recently downloaded video slice file;
    a decapsulation module configured to parse the downloaded video slice file and audio slice file frame by frame to obtain video elementary stream data and audio elementary stream data;
    a multi-buffer queue module configured to add each decapsulated frame of video elementary stream data to a first video buffer queue and each decapsulated frame of audio elementary stream data to a first audio buffer queue;
    an elementary stream synchronous injection module comprising an elementary stream buffering submodule and an elementary stream injection submodule, wherein
    the elementary stream buffering submodule continuously extracts the video elementary stream data from the first video buffer queue and adds it to a second video buffer queue, and continuously extracts the audio elementary stream data from the first audio buffer queue and adds it to the second audio buffer queue, until the video elementary stream data corresponding to the current video slice file and the audio elementary stream data corresponding to the current audio slice file have been completely extracted, whereupon the decapsulation module and the multi-buffer queue module in the current streaming media player pipeline are destroyed; and
    the elementary stream injection submodule extracts the video elementary stream data and the audio elementary stream data from the second video buffer queue and the second audio buffer queue, respectively, according to the timestamps of the video elementary stream data and the audio elementary stream data, and injects them into the decoder synchronously.
  4. The display device according to claim 3, wherein the slice buffering module controls the slice downloading module to download the next audio slice file based on the target time of the most recently downloaded video slice file, specifically by:
    setting the video slice file as the primary format file and the audio slice file as the secondary format file, and, before the next secondary format file is downloaded, controlling the slice downloading module to determine whether the difference between the target time of the currently downloaded primary format file and the target time of the secondary format file is less than or equal to a preset threshold; if so, downloading the next secondary format file; if not, suspending downloading of the next secondary format file.
  5. The display device according to claim 4, wherein the slice buffering module is further configured to designate the next secondary format file as the primary format file after the primary format file has been completely downloaded, and to continue downloading it as the primary format file.
  6. A method for streaming media synchronization supporting a variable slice format, the method comprising:
    downloading a current video slice file and a current audio slice file, respectively;
    parsing the downloaded video slice file and audio slice file frame by frame to obtain video elementary stream data and audio elementary stream data;
    adding each decapsulated frame of video elementary stream data to a first video buffer queue, and each decapsulated frame of audio elementary stream data to a first audio buffer queue;
    continuously extracting video elementary stream data from the first video buffer queue into a second video buffer queue, and audio elementary stream data from the first audio buffer queue into a second audio buffer queue, until the video elementary stream data corresponding to the current video slice file and the audio elementary stream data corresponding to the current audio slice file have been completely extracted; and
    extracting the video elementary stream data and the audio elementary stream data from the second video buffer queue and the second audio buffer queue, respectively, according to their timestamps, and injecting them into the decoder synchronously.
  7. The method according to claim 6, wherein said extracting video elementary stream data and audio elementary stream data from the second video buffer queue and the second audio buffer queue, respectively, according to their timestamps, and injecting them into the decoder synchronously, specifically comprises:
    before injecting the audio elementary stream data of the current frame, calculating the difference between the timestamp of the current frame's audio elementary stream data and the maximum timestamp of the video elementary stream data already injected; and
    determining whether the difference is within a preset threshold range; if so, injecting the audio elementary stream data of the current frame; if not, suspending injection of the current frame's audio elementary stream data.
  8. A method for streaming media synchronization supporting a variable slice format, the method comprising:
    downloading a current video slice file and a current audio slice file, respectively;
    continuously controlling downloading of video slice files, and controlling the slice downloading module to download the next audio slice file based on the target time of the most recently downloaded video slice file;
    parsing the downloaded video slice file and audio slice file frame by frame to obtain video elementary stream data and audio elementary stream data;
    adding each decapsulated frame of video elementary stream data to a first video buffer queue, and each decapsulated frame of audio elementary stream data to a first audio buffer queue;
    continuously extracting video elementary stream data from the first video buffer queue into a second video buffer queue, and audio elementary stream data from the first audio buffer queue into a second audio buffer queue, until the video elementary stream data corresponding to the current video slice file and the audio elementary stream data corresponding to the current audio slice file have been completely extracted; and
    extracting the video elementary stream data and the audio elementary stream data from the second video buffer queue and the second audio buffer queue, respectively, according to their timestamps, and injecting them into the decoder synchronously.
  9. The method according to claim 8, wherein the controlling the slice downloading module to download the next audio slice file based on the target time of the most recently downloaded video slice file comprises:
    setting the video slice file as the primary format file and the audio slice file as the secondary format file, and, before the next secondary format file is downloaded, controlling the slice downloading module to determine whether the difference between the target time of the currently downloaded primary format file and the target time of the secondary format file is less than or equal to a preset threshold; if so, downloading the next secondary format file; if not, suspending downloading of the next secondary format file.
  10. The method according to claim 9, further comprising:
    after the primary format file has been completely downloaded, designating the next secondary format file as the primary format file and continuing to download it as the primary format file.
CN202080000658.6A 2020-04-28 2020-04-28 Streaming media synchronization method and display device Active CN114073098B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/087548 WO2021217435A1 (en) 2020-04-28 2020-04-28 Streaming media synchronization method and display device

Publications (2)

Publication Number Publication Date
CN114073098A (en) 2022-02-18
CN114073098B (en) 2023-04-25

Family

ID=78331562

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080000658.6A Active CN114073098B (en) 2020-04-28 2020-04-28 Streaming media synchronization method and display device

Country Status (2)

Country Link
CN (1) CN114073098B (en)
WO (1) WO2021217435A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114237940B (en) * 2021-12-20 2023-06-16 南京首印铭都信息技术科技有限公司 Information management system and method
CN114245180A (en) * 2022-01-04 2022-03-25 海信视像科技股份有限公司 Display device, video data transmission method, and storage medium
CN114512139B (en) * 2022-04-18 2022-09-20 杭州星犀科技有限公司 Processing method and system for multi-channel audio mixing, mixing processor and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102630058B (en) * 2012-03-23 2016-02-17 深圳创维数字技术有限公司 The management method of Media Stream and device in a kind of buffering area
US9161039B2 (en) * 2012-09-24 2015-10-13 Qualcomm Incorporated Bitstream properties in video coding
CN103152611B (en) * 2013-02-18 2018-04-27 中兴通讯股份有限公司 A kind of control method and device of Streaming Media pipeline business
CN108174242A (en) * 2018-01-09 2018-06-15 武汉斗鱼网络科技有限公司 Live data multiplexing method, device, storage medium and equipment

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103237255A (en) * 2013-04-24 2013-08-07 南京龙渊微电子科技有限公司 Multi-thread audio and video synchronization control method and system
US9674255B1 (en) * 2014-03-26 2017-06-06 Amazon Technologies, Inc. Systems, devices and methods for presenting content
CN109274696A (en) * 2018-09-20 2019-01-25 青岛海信电器股份有限公司 Flow media playing method and device based on DASH agreement

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115242735A (en) * 2022-09-22 2022-10-25 中邮消费金融有限公司 Real-time voice stream slice analysis method, system and computer equipment
CN115242735B (en) * 2022-09-22 2022-12-16 中邮消费金融有限公司 Real-time voice stream slice analysis method, system and computer equipment

Also Published As

Publication number Publication date
CN114073098B (en) 2023-04-25
WO2021217435A1 (en) 2021-11-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20221024

Address after: 83 Intekte Street, Devon, Netherlands

Applicant after: VIDAA (Netherlands) International Holdings Ltd.

Address before: 266061 Songling Road, Laoshan District, Qingdao, Shandong Province, No. 399

Applicant before: QINGDAO HISENSE MEDIA NETWORKS Ltd.

GR01 Patent grant
GR01 Patent grant