CN115134644A - Live broadcast data processing method and device - Google Patents

Live broadcast data processing method and device

Info

Publication number: CN115134644A
Application number: CN202110318360.6A (filed by Hisense Visual Technology Co Ltd)
Authority: CN (China)
Legal status: Pending (assumed; not a legal conclusion)
Other languages: Chinese (zh)
Inventors: 朱宗花 (Zhu Zonghua), 王云刚 (Wang Yungang)
Current Assignee: Hisense Visual Technology Co Ltd
Original Assignee: Hisense Visual Technology Co Ltd

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43: Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/4302: Content synchronisation processes, e.g. decoder synchronisation
    • H04N21/4307: Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
    • H04N21/439: Processing of audio elementary streams
    • H04N21/4392: Processing of audio elementary streams involving audio buffer management
    • H04N21/4398: Processing of audio elementary streams involving reformatting operations of audio signals
    • H04N21/44: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/4402: Processing of video elementary streams involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440218: Processing of video elementary streams involving reformatting by transcoding between formats or standards, e.g. from MPEG-2 to MPEG-4

Abstract

The application provides a live broadcast data processing method and device, which are used to solve the problem that audio and video are not played synchronously when live video playback starts. The method comprises the following steps: after a user triggers live video, a first PMT and a second PMT are received in sequence, the first PMT being the PMT received first after the user triggers live video; the first PMT indicates that audio data, but no video data, was encoded in the encoding period corresponding to the first PMT, and the second PMT indicates that video data was encoded in the encoding period corresponding to the second PMT; the second PMT is received within a set time length of the time at which the first PMT was received; and when the second PMT is received, live video playback is started so that the video data and the audio data are played synchronously.

Description

Live broadcast data processing method and device
Technical Field
The present application relates to the field of live video technologies, and in particular, to a live data processing method and apparatus.
Background
At present, in a live video scene, when live video starts playing, the sending end encodes audio and video data and sends the encoded data to the receiving end; the receiving end decapsulates and decodes the received data, outputs the decoded audio data to an audio device, and renders the decoded video data to a display device. Because viewers increasingly expect high-definition live pictures, the code rate of the video data is high, so video encoding at the sending end becomes slow and the video data is not transmitted in time. The code rate of the audio data, by contrast, is low, so the audio data can be transmitted in time, and audio and video transmission become unsynchronized. Therefore, at the stage when live video starts playing, a situation may occur in which only sound can be heard and no live picture can be seen, which degrades the user's viewing experience.
Disclosure of Invention
The embodiments of the present application provide a live data processing method and apparatus, which are used to solve the problem that audio and video output is not synchronized during the start-up stage of live video.
In a first aspect, embodiments of the present application provide a display device, including an input interface, a controller, and a display screen, wherein,
the input interface is used to receive the first PMT and the second PMT in sequence after a user triggers live video;
the controller is used to start live video playback, so as to render the video data to the display screen and output the audio data, when it determines that the second PMT has been received;
the display screen is used to display the video data;
the first PMT is the PMT received first after the user triggers live video and indicates that audio data, but no video data, was encoded in the encoding period corresponding to the first PMT; the second PMT indicates that video data was encoded in the encoding period corresponding to the second PMT; and the second PMT is received within the set time length of the time at which the first PMT was received.
Based on this scheme, after a user triggers live video, the first program map table PMT indicates that only audio data was encoded in its corresponding encoding period; if it is determined that a second PMT, indicating that video data was encoded in its corresponding encoding period, is received within the set time length, the audio data and the video data are played synchronously. This ensures that the audio and video data can be played synchronously, and avoids the situation in which playback starts as soon as audio data is received, so that the user can only hear sound and see no picture during the start-up stage of the live broadcast.
In some embodiments, the controller is further configured to:
before receiving a first PMT, count the number of data packets carrying audio data and the number of data packets carrying non-audio data received by the input interface;
when it is determined that the number of data packets carrying audio data is greater than the number of data packets carrying non-audio data, take the first time length as the set time length; or,
when it is determined that the number of data packets carrying audio data is smaller than the number of data packets carrying non-audio data, take the second time length as the set time length;
the second time length is greater than the first time length.
Based on the above scheme, the application proposes that the set time length can be determined by comparing the number of packets carrying audio data with the number of packets carrying non-audio data received before the first PMT is received. If the number of data packets carrying audio data is greater than the number carrying non-audio data, the live program is likely a pure audio program, so the shorter first time length is used as the set time length. Conversely, if the number of data packets carrying audio data is smaller than the number carrying non-audio data, the live program likely contains both audio and video, so the longer second time length is used as the set time length. Determining the set time length in this way adapts to the actual conditions of different programs and avoids wasting resources.
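As an illustration only, the set-time-length selection described above might be sketched in C as follows; the concrete duration values and names are assumptions for illustration and are not specified by this application.

#include <stdint.h>

/* Illustrative values only; this application does not fix the durations. */
#define FIRST_DURATION_MS   200u   /* shorter wait: likely a pure audio program */
#define SECOND_DURATION_MS 1000u   /* longer wait: program likely has video too */

static uint32_t choose_set_duration_ms(uint32_t audio_pkts, uint32_t non_audio_pkts)
{
    /* More audio than non-audio packets before the first PMT suggests a
     * pure audio program, so there is little point waiting long for video. */
    if (audio_pkts > non_audio_pkts)
        return FIRST_DURATION_MS;
    /* Otherwise the program probably carries video as well, so allow the
     * encoder more time to deliver the second PMT. */
    return SECOND_DURATION_MS;
}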
In a possible implementation manner, the first PMT includes a correspondence between a type identifier indicating that the data type is audio and a first packet identifier;
the controller is specifically configured to determine the number of first packet identifiers included in the data packets received before the first PMT is received as the number of data packets carrying audio data, and to determine the difference between the total number of received data packets and the number of data packets carrying audio data as the number of data packets carrying non-audio data.
Based on the above scheme, the first packet identifier, which identifies the data packets carrying audio data, is determined from the correspondence in the first PMT between the type identifier indicating that the data type is audio and the first packet identifier. The number of data packets received before the first PMT that carry the first packet identifier is then taken as the number of audio-carrying data packets, and the number of data packets carrying non-audio data is obtained by subtracting the number of audio-carrying data packets from the total number of received data packets. This method ensures that the determined numbers of audio and non-audio data packets are accurate.
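A minimal C sketch of the counting rule just described, assuming a 13-bit MPEG-TS packet identifier space (8192 entries); the table layout and names are illustrative assumptions, not part of this application.

#include <stdint.h>

/* pid_counts is the recorded per-identifier packet table ("Table 1" of the
 * detailed description below); audio_pid is the first packet identifier
 * taken from the first PMT. */
static void count_audio_and_non_audio(const uint32_t pid_counts[8192],
                                      uint16_t audio_pid,
                                      uint32_t total_packets,
                                      uint32_t *audio_pkts,
                                      uint32_t *non_audio_pkts)
{
    /* Packets whose identifier matches the first packet identifier carry
     * audio data; all other packets received before the first PMT do not. */
    *audio_pkts     = pid_counts[audio_pid & 0x1fff];
    *non_audio_pkts = total_packets - *audio_pkts;
}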
In some embodiments, the controller is further configured to record, before receiving the first PMT, the number of packets received by the input interface for each different packet identifier;
when determining the number of first packet identifiers included in the data packets received before the first PMT is received as the number of data packets carrying audio data, the controller is specifically configured to look up the number corresponding to the first packet identifier among the recorded numbers for the different data packet identifiers.
Based on this scheme, before the first PMT is received, the number of received data packets is recorded for each different data packet identifier they include. With this method, the number of data packets carrying audio data and the number of data packets carrying non-audio data can be determined more quickly, improving efficiency.
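By way of illustration, the per-identifier recording could look like the following C sketch; because an MPEG-TS packet identifier is 13 bits wide, a flat array can play the role of the hash table, an implementation choice assumed here rather than prescribed by this application.

#include <stdint.h>

#define TS_MAX_PID 8192u        /* a TS packet identifier is 13 bits wide */

/* Per-identifier packet counters standing in for the recorded table: with
 * a 13-bit key space, a flat array is the simplest possible hash table. */
static uint32_t g_pid_counts[TS_MAX_PID];
static uint32_t g_total_packets;

static void record_packet_pid(uint16_t pid)
{
    if (pid < TS_MAX_PID) {
        g_pid_counts[pid]++;    /* one more packet seen with this identifier */
        g_total_packets++;
    }
}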
In some embodiments, the controller is further configured to store, in a buffer, the data packets carrying audio data received after the first PMT is received and before it is determined that the second PMT is received;
when starting live video playback to render the video data to the display screen and output the audio data, the controller is specifically configured to start live video playback, output the audio data held in the buffer, and render the video data received after the second PMT to the display screen.
Based on this scheme, the data packets carrying audio data received between the first PMT and the second PMT are stored in the buffer, and after it is determined that the second PMT has been received, the video data and the buffered audio data are played synchronously. This preserves the integrity of the played data and reduces the chance that the video watched by the user is incomplete because data was lost.
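A sketch of the audio buffering between the first PMT and the second PMT, assuming a fixed-capacity FIFO of 188-byte TS packets; the capacity and the drop-when-full policy are illustrative assumptions.

#include <stdint.h>
#include <string.h>

#define TS_PACKET_SIZE  188u
#define AUDIO_BUF_SLOTS 4096u   /* illustrative capacity only */

/* FIFO for audio TS packets received between the first and second PMT,
 * kept so they can be played in sync once video is available. */
static uint8_t  g_audio_buf[AUDIO_BUF_SLOTS][TS_PACKET_SIZE];
static uint32_t g_head, g_tail;

static int buffer_audio_packet(const uint8_t pkt[TS_PACKET_SIZE])
{
    uint32_t next = (g_tail + 1u) % AUDIO_BUF_SLOTS;
    if (next == g_head)
        return -1;                          /* buffer full */
    memcpy(g_audio_buf[g_tail], pkt, TS_PACKET_SIZE);
    g_tail = next;
    return 0;
}

static const uint8_t *dequeue_audio_packet(void)
{
    if (g_head == g_tail)
        return NULL;                        /* buffer empty */
    const uint8_t *pkt = g_audio_buf[g_head];
    g_head = (g_head + 1u) % AUDIO_BUF_SLOTS;
    return pkt;
}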
In a second aspect, an embodiment of the present application provides a live data processing method, including:
after a user triggers live video, a first PMT and a second PMT are received in sequence;
the first PMT is the PMT received first after the user triggers live video, the first PMT indicates that audio data, but no video data, was encoded in the encoding period corresponding to the first PMT, and the second PMT indicates that video data was encoded in the encoding period corresponding to the second PMT; the second PMT is received within the set time length of the time at which the first PMT was received;
and when the second PMT is received, live video playback is started so that the video data and the audio data are played synchronously.
In some embodiments, the method further comprises:
before receiving the first PMT, counting the number of received data packets carrying audio data and the number of data packets carrying non-audio data;
when the number of data packets carrying audio data is greater than the number of data packets carrying non-audio data, taking the first time length as the set time length; or,
when the number of data packets carrying audio data is smaller than the number of data packets carrying non-audio data, taking the second time length as the set time length;
the second time length is greater than the first time length.
In some embodiments, the first PMT includes a correspondence between a type identifier indicating that the data type is audio and a first packet identifier;
counting, before receiving the first PMT, the number of received data packets carrying audio data and the number of data packets carrying non-audio data includes:
determining the number of first packet identifiers included in the data packets received before the first PMT is received as the number of data packets carrying audio data, and determining the difference between the total number of received data packets and the number of data packets carrying audio data as the number of data packets carrying non-audio data.
In some embodiments, the method further comprises:
before receiving the first PMT, recording the number of received data packets for each different data packet identifier they include;
determining the number of first packet identifiers included in the data packets received before the first PMT is received as the number of data packets carrying audio data includes:
determining the number corresponding to the first data packet identifier among the recorded numbers for the different data packet identifiers.
In some embodiments, before determining that the second PMT is received, the method further includes:
storing, in a buffer, the data packets carrying audio data received after the first PMT is received;
starting live video playback to render the video data to a display screen and output the audio data includes:
starting live video playback and synchronously playing the audio data held in the buffer and the video data received after the second PMT is received.
In a third aspect, an embodiment of the present application provides a live data processing apparatus, including:
the communication unit is used to receive the first PMT and the second PMT in sequence after a user triggers live video;
the processing unit is used to start live video playback, so as to render the video data to a display unit and output the audio data, when it determines that the second PMT has been received;
the display unit is used to display the video data;
the audio output unit is used to play the audio data;
the first PMT is the PMT received first after the user triggers live video and indicates that audio data, but no video data, was encoded in the encoding period corresponding to the first PMT; the second PMT indicates that video data was encoded in the encoding period corresponding to the second PMT; and the second PMT is received within the set time length of the time at which the first PMT was received.
In some embodiments, the processing unit is further to:
before receiving a first PMT, count the number of data packets carrying audio data and the number of data packets carrying non-audio data received by the communication unit;
when it is determined that the number of data packets carrying audio data is greater than the number of data packets carrying non-audio data, take the first time length as the set time length; or,
when it is determined that the number of data packets carrying audio data is smaller than the number of data packets carrying non-audio data, take the second time length as the set time length;
the second time length is greater than the first time length.
In some embodiments, the first PMT includes a correspondence between a type identifier indicating that the data type is audio and a first packet identifier;
the processing unit is specifically configured to determine the number of first packet identifiers included in the data packets received before the first PMT is received as the number of data packets carrying audio data, and to determine the difference between the total number of received data packets and the number of data packets carrying audio data as the number of data packets carrying non-audio data.
In some embodiments, the processing unit is further configured to record, before receiving the first PMT, the number of packets received by the communication unit for each different packet identifier;
when determining the number of first packet identifiers included in the data packets received before the first PMT is received as the number of data packets carrying audio data, the processing unit is specifically configured to look up the number corresponding to the first packet identifier among the recorded numbers for the different data packet identifiers.
In some embodiments, the processing unit is further configured to store, in a buffer, the data packets carrying audio data received after the first PMT is received and before it is determined that the second PMT is received;
when starting live video playback to render the video data to the display unit and send the audio data to the audio output unit, the processing unit is specifically configured to start live video playback, output the audio data held in the buffer to the audio output unit, and render the video data received after the second PMT to the display unit.
In a fourth aspect, the present application further provides a computer storage medium in which computer program instructions are stored; when the instructions are run on a computer, they cause the computer to execute the live data processing method described in the second aspect.
For the technical effects brought by any implementation of the second to fourth aspects, reference may be made to the technical effects of the corresponding implementation of the first aspect, and details are not repeated here.
Drawings
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1A is a schematic structural diagram of a UDP packet according to an embodiment of the present application;
fig. 1B is a schematic structural diagram of a TS packet according to an embodiment of the present application;
fig. 1C is a schematic structural diagram of a PMT table according to an embodiment of the present disclosure;
fig. 2A is a schematic view of a scene of a receiving end according to an embodiment of the present disclosure;
fig. 2B is a schematic view of another receiving end scenario provided in the embodiment of the present application;
fig. 3A is a block diagram of a hardware configuration of a terminal device according to an embodiment of the present disclosure;
fig. 3B is a block diagram of a configuration of the control device 100 according to an embodiment of the present application;
fig. 3C is a block diagram of a software structure of a terminal according to an embodiment of the present disclosure;
fig. 4 is a schematic architecture diagram of a live data processing system according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of Table 1 provided in an embodiment of the present application;
fig. 6 is a flowchart of a live data processing method according to an embodiment of the present application;
fig. 7 is a flowchart of playing a new live program according to an embodiment of the present application;
fig. 8 is a flowchart of another live data processing method according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of a display device according to an embodiment of the present application;
fig. 10 is a schematic view of another live data processing apparatus according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein.
Before describing the embodiments of the present application, technical terms related to the embodiments of the present application will be described first:
(1) The User Datagram Protocol (UDP) is a simple connectionless transport-layer network protocol; it has the advantages of small delay and high data transmission efficiency, and is widely used for real-time transmission of audio and video data. For convenience of understanding, in the embodiments of the present application a packet transmitted under UDP is referred to as a UDP packet. A UDP packet generally includes an 8-byte (Byte) header and a payload. For example, as shown in fig. 1A, the header includes a source port (source port), a destination port (destination port), a length (length), and a checksum (checksum).
Source port: this field occupies the first 16 bits of the UDP header and typically contains the UDP port used by the application sending the datagram; the application at the receiving end uses the value of this field as the destination of its response. Destination port: the port used by the receiving application. Length: the length of the UDP packet, covering the UDP header and the UDP data. Checksum: used to verify that the data was not corrupted during transmission. The payload contains the data to be transmitted.
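For illustration, the 8-byte UDP header described above can be laid out as the following C struct; the field names are chosen for readability, and on the wire all fields are big-endian (network byte order).

#include <stdint.h>

typedef struct {
    uint16_t source_port;   /* port used by the sending application; the
                               receiver replies to this port              */
    uint16_t dest_port;     /* port used by the receiving application     */
    uint16_t length;        /* UDP header plus UDP data length, in bytes  */
    uint16_t checksum;      /* verifies the data was not corrupted        */
} udp_header_t;             /* the payload with the transmitted data follows */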
(2) A Transport Stream (TS) is a single data stream, composed of one or more packets having a common time reference or independent time references, used for data transmission. For convenience of description, the data packets transmitted in a TS stream are referred to below as TS packets. The size of a TS packet is typically 188 bytes. A TS packet includes a header and a payload. For example, as shown in fig. 1B, the header contains the PID of the TS packet, a sync byte, a transport error indicator, a payload unit start indicator, a transport priority, transport scrambling control, adaptation field control, and a continuity counter.
The payload unit start indicator marks the start of a complete payload unit. The transport priority represents the priority of transmission, 0 being low priority and 1 being high priority; it is typically 0. The packet identifier PID is a unique identifier carried by each TS packet. The transport scrambling control indicates whether the payload is encrypted; for example, 00 indicates no encryption. The adaptation field control indicates whether an adaptation field is included; for example, "00" is reserved, "01" generally indicates no adaptation field and payload only, "10" generally indicates an adaptation field only and no payload, and "11" generally indicates both an adaptation field and a payload. The continuity counter typically runs from 0 to 0xf; its starting value need not be 0, but it must be continuous. The payload of a TS packet may be audio data, video data, a PMT table, or a PAT table. That is, the TS packets referred to in the embodiments of the present application are of four types: TS packets carrying audio data, TS packets carrying video data, TS packets carrying PAT tables, and TS packets carrying PMT tables. For convenience of description, TS packets carrying audio data are referred to below as audio TS packets, and TS packets carrying video data as video TS packets.
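A minimal C sketch of pulling the header fields listed above out of a 188-byte TS packet, following the standard MPEG-TS bit layout (see fig. 1B).

#include <stdint.h>

#define TS_PACKET_SIZE 188

typedef struct {
    uint8_t  transport_error;   /* 1 = packet arrived corrupted               */
    uint8_t  payload_start;     /* 1 = a payload unit starts in this packet   */
    uint8_t  priority;          /* transport priority, usually 0              */
    uint16_t pid;               /* 13-bit packet identifier                   */
    uint8_t  scrambling;        /* 2 bits, 00 = not encrypted                 */
    uint8_t  adaptation;        /* 2-bit adaptation field control             */
    uint8_t  continuity;        /* 4-bit continuity counter, 0x0 to 0xf       */
} ts_header_t;

/* Returns 0 on success, -1 if the sync byte is not 0x47. */
static int parse_ts_header(const uint8_t pkt[TS_PACKET_SIZE], ts_header_t *h)
{
    if (pkt[0] != 0x47)                     /* sync byte */
        return -1;
    h->transport_error = (pkt[1] >> 7) & 0x1;
    h->payload_start   = (pkt[1] >> 6) & 0x1;
    h->priority        = (pkt[1] >> 5) & 0x1;
    h->pid             = (uint16_t)(((pkt[1] & 0x1f) << 8) | pkt[2]);
    h->scrambling      = (pkt[3] >> 6) & 0x3;
    h->adaptation      = (pkt[3] >> 4) & 0x3;
    h->continuity      =  pkt[3] & 0x0f;
    return 0;
}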
(3) Elementary Stream (ES) is an original Stream output by an encoder for transmitting an audio/video signal. An ES contains a data stream of only one type of content, such as only video or only audio.
(4) A Packetized Elementary Stream (PES) is a packetized ES stream, a data structure used to convey the ES stream. A PES stream is formed by packetizing an ES stream; in this process, operations such as splitting the ES stream and adding header information are completed. The basic unit of a PES stream is the PES packet; for transmission, each PES packet is in turn split across TS packets, so a TS packet carries a fragment of a PES packet.
(5) Packet Identifier (PID): the PID is the unique identifier of the data packets transmitted in a TS stream; the video data of a live program corresponds to exactly one PID, and the audio data of a live program corresponds to exactly one PID. The PID is located in the header of the TS packet.
(6) A Program Map Table (PMT) records the types of data encoded in an encoding period and the PIDs of the TS packets generated in that encoding period. The header of the TS packet carrying the PMT table contains the PID of the PMT table. The PMT table generally contains fixed PMT content, an optional description table, the types of all packets contained in the current program, a reserved field, the PIDs of the packets, the lengths of the information in the packets, an information description table, and a CRC check. The types of data packets included in the current program are usually audio, video, and other associated data types; they indicate whether a TS packet is of the audio type or the video type, that is, whether the data carried by the TS packet is audio data or video data. The reserved field is used for byte alignment. Length of information in packet: describes the length of the information description table. Information description table: differs from one TS packet to another. The CRC check records the Cyclic Redundancy Check (CRC) output by the encoder. As an example, the structure of the PMT table may follow the structure shown in fig. 1C, which takes as an example N data types encoded in each encoding period.
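As an illustration, the following C sketch walks the elementary-stream loop of a PMT section and lists the stream type and PID recorded for each elementary stream; descriptor contents and the CRC check are omitted, and the stream_type values treated as audio are common MPEG assignments assumed for the example, not values fixed by this application.

#include <stdint.h>
#include <stdio.h>

static int is_audio_stream_type(uint8_t t)
{
    return t == 0x03 || t == 0x04 || t == 0x0f;  /* MPEG-1/2 audio, AAC */
}

static void list_pmt_streams(const uint8_t *sec, int len)
{
    int section_length = ((sec[1] & 0x0f) << 8) | sec[2];
    int prog_info_len  = ((sec[10] & 0x0f) << 8) | sec[11];
    int pos = 12 + prog_info_len;           /* first entry of the ES loop */
    int end = 3 + section_length - 4;       /* stop before the 4-byte CRC */

    while (pos + 5 <= end && pos + 5 <= len) {
        uint8_t  stream_type = sec[pos];
        uint16_t es_pid  = (uint16_t)(((sec[pos + 1] & 0x1f) << 8) | sec[pos + 2]);
        int      es_info = ((sec[pos + 3] & 0x0f) << 8) | sec[pos + 4];
        printf("stream_type=0x%02x pid=0x%04x (%s)\n", stream_type, es_pid,
               is_audio_stream_type(stream_type) ? "audio" : "non-audio");
        pos += 5 + es_info;                 /* skip this entry's descriptors */
    }
}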
(7) A Program Association Table (PAT) defines the programs in the TS; the PID of the TS packets carrying the PAT table is 0. The PAT table mainly contains the PID of the PMT.
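Correspondingly, a minimal sketch of extracting the PMT's PID from a PAT section (carried in TS packets whose PID is 0); the CRC check is again omitted for brevity.

#include <stdint.h>

/* Returns the PID of the first real program's PMT found in the PAT
 * section, or -1 if none; program_number 0 denotes the network PID. */
static int pat_find_pmt_pid(const uint8_t *sec, int len)
{
    int section_length = ((sec[1] & 0x0f) << 8) | sec[2];
    int end = 3 + section_length - 4;       /* stop before the 4-byte CRC */

    for (int pos = 8; pos + 4 <= end && pos + 4 <= len; pos += 4) {
        int program_number = (sec[pos] << 8) | sec[pos + 1];
        int pid = ((sec[pos + 2] & 0x1f) << 8) | sec[pos + 3];
        if (program_number != 0)
            return pid;                     /* this program's PMT PID */
    }
    return -1;
}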
In a live video scene, the sending end encodes and encapsulates the collected audio and video data into UDP packets and sends them to the receiving end; the receiving end decapsulates and decodes the received data packets and outputs the decoded data to the player. However, because the code rates of audio and video differ, and the code rate of audio is generally lower than that of video, audio data is encoded quickly while video data is encoded slowly; this can make the transmission of audio data and video data asynchronous, with audio data transmitted normally while video data is delayed. In the start-up stage, if the packet types indicated in the first PMT received by the receiving end do not include video data, the player will construct only the decapsulation, decoding, and playback pipeline for audio data and will output only audio to the speaker. Video data received later cannot be decapsulated, decoded, or rendered, so a situation arises in which there is only sound and no picture. For example, in the scenario shown in fig. 2A, the display screen at the receiving end cannot display images and the user can only hear sound, which degrades the user's viewing experience.
Based on this, the present application provides a live data processing method and apparatus: when the packet types indicated by the first PMT received by the receiving end do not include video data, the receiving end does not directly start playing the audio data; instead, it detects within a set time length whether subsequent data packets include video data and, if so, plays the audio and video data synchronously. This solves the problem of sound without picture during the start-up stage, so that the receiving end can output audio and video data synchronously, for example as in the scene shown in fig. 2B.
In the following, to facilitate understanding of the scheme of the present application, a plurality of embodiments in different scenarios will be described. The scheme provided by the application can be applied to terminal equipment, and the terminal equipment related to the application has a display function, such as a computer, a television, a smart phone or a tablet computer. In the present application, a terminal device having a display function may also be referred to as a display device.
As an example, the configuration of the terminal device according to the present application is described in detail below. Referring to fig. 3A, a schematic diagram of a possible hardware configuration of the terminal device 200 is shown. In some embodiments, the terminal device includes at least one of a tuner demodulator 210, a communicator 220, a detector 230, an external device interface 240, a processor 250, a display unit 260, an audio output interface 270, a memory, a power supply, and a user interface 280.
In some embodiments, the display unit 260 includes a display screen component for displaying images and a driving component for driving image display; it receives image signals output from the processor and displays video content, image content, menu manipulation interfaces, user interface (UI) elements, and the like.
In some embodiments, the display unit 260 may be at least one of a liquid crystal display, an organic light-emitting diode (OLED) display, and a projection display, or a projection apparatus with a projection screen.
In some embodiments, the tuner demodulator 210 receives broadcast television signals via wired or wireless reception, and demodulates audio/video signals from a plurality of wireless or wired broadcast television signals.
In some embodiments, the communicator 220 is a component for communicating with external devices or servers according to various communication protocol types. For example, the communicator may include at least one of a Wireless Fidelity (Wi-Fi) module, a Bluetooth module, a wired Ethernet module or other network communication protocol chip or near field communication protocol chip, and an infrared receiver. The terminal device 200 may perform data transmission with the target health detection device or the peer device 300 through the communicator 220.
In some embodiments, the detector 230 is used to collect signals of the external environment or interaction with the outside. For example, detector 230 includes a light receiver, a sensor for collecting ambient light intensity; alternatively, the detector 230 includes an image collector, such as a camera, which can be used to collect external environment scenes, attributes of the user, or user interaction gestures, or the detector 230 includes a sound collector, such as a microphone, which is used to receive external sounds.
In some embodiments, the external device interface 240 may include, but is not limited to, the following: any one or more of a High Definition Multimedia Interface (HDMI), analog or data high definition component input interface (component), composite video input interface (CVBS), USB input interface (USB), RGB port, camera interface, and the like. The interface may be a composite input/output interface formed by the plurality of interfaces.
In some embodiments, the processor 250 and the tuner demodulator 210 may be located in different separate devices; that is, the tuner demodulator 210 may also be located in a device external to the main device in which the processor 250 is located, such as an external set-top box.
In some embodiments, the processor 250 includes at least one of a Central Processing Unit (CPU), a video processor, an audio processor, a Graphics Processing Unit (GPU), a RAM (Random Access Memory), a ROM (Read-Only Memory), first through n-th input/output interfaces, and a communication bus (Bus).
In some embodiments, the CPU is configured to execute the operating system and application program instructions stored in the memory, and to execute various applications, data, and content in accordance with the various interactive instructions received from external input, so as ultimately to display and play various audio-visual content. The CPU may include a plurality of processors, for example a main processor and one or more sub-processors.
In some embodiments, the graphics processor is used to generate various graphical objects, such as at least one of icons, operation menus, and graphics displayed in response to user input instructions. The graphics processor includes an arithmetic unit, which performs operations on the various interactive instructions input by the user and displays the various objects according to their display attributes, and a renderer, which renders the various objects obtained from the arithmetic unit for display on the display component.
In some embodiments, the video processor is configured to receive an external video signal, and perform at least one of video processing such as decompression, decoding, scaling, noise reduction, frame rate conversion, resolution conversion, and image synthesis according to a standard codec protocol of the input signal, so as to obtain a signal that can be directly displayed or played on the terminal 200.
In some embodiments, the video processor includes at least one of a demultiplexing module, a video decoding module, an image composition module, a frame rate conversion module, a display formatting module, and the like. The demultiplexing module is used for demultiplexing the input audio and video data stream. And the video decoding module is used for processing the demultiplexed video signal, including decoding, scaling and the like. And the image synthesis module is used for carrying out superposition mixing processing on the GUI signal input by the user or generated by the user and the video image after the zooming processing by the graphic generator so as to generate an image signal for display. And the frame rate conversion module is used for converting the frame rate of the input video. And the display formatting module is used for converting the received video output signal after the frame rate conversion, and changing the signal to be in accordance with the signal of the display format, such as an output RGB data signal.
In some embodiments, the audio processor is configured to receive an external audio signal, perform at least one of decompression and decoding, and denoising, digital-to-analog conversion, and amplification processing according to a standard codec protocol of the input signal, and obtain a sound signal that can be played in the speaker.
In some embodiments, the user may input a user command on a Graphical User Interface (GUI) displayed on the display part 260, and the user input interface receives the user input command through the Graphical User Interface (GUI). Alternatively, the user may input the user command by inputting a specific sound or gesture, and the user input interface receives the user input command by recognizing the sound or gesture through the sensor.
In some embodiments, a "display interface" is a media interface for interaction and information exchange between an application or operating system and a user that enables conversion between an internal form of information and a form that is acceptable to the user. A commonly used presentation form of the User Interface is a Graphical User Interface (GUI), which refers to a User Interface related to computer operations and displayed in a graphical manner. It may be an interface element such as an icon, a window, a control, etc. displayed in the display screen of the terminal, where the control may include at least one of a visual interface element such as an icon, a button, a menu, a tab, a text box, a dialog box, a status bar, a navigation bar, a Widget, etc.
In some embodiments, the user interface 280 is an interface that can be used to receive control inputs (for example, physical buttons on the body of the terminal device, or the like).
In some embodiments, the system of the terminal device may include a kernel (Kernel), a command parser (shell), a file system, and applications. The kernel, shell, and file system together form the basic operating system structure that allows users to manage files, run programs, and use the system. After power-on, the kernel is started, kernel space is activated, hardware is abstracted, hardware parameters are initialized, and virtual memory, the scheduler, signals, and inter-process communication (IPC) are run and maintained. After the kernel starts, the shell and user applications are loaded. An application is compiled into machine code after being started, forming a process.
In some embodiments, the solution of the present application may further include a control apparatus 100, configured to control the terminal device 200 shown in fig. 3A; it may receive operation instructions input by the user, convert them into instructions that the terminal device 200 can recognize and respond to, and act as an intermediary between the user and the terminal device 200. For example: the user operates a channel up/down key on the control device 100, and the terminal device 200 responds to the channel up/down operation.
The control device 100 may be a remote controller 100A, which communicates with the terminal device 200 via infrared protocol communication, Bluetooth protocol communication, or other short-distance communication methods, and controls the terminal device 200 wirelessly or by other wired means. The user can input user instructions through keys on the remote controller, voice input, control panel input, and the like, to control the terminal device 200. For example, the user can input corresponding control instructions through the volume up/down keys, channel control keys, up/down/left/right movement keys, voice input key, menu key, and power on/off key on the remote controller, thereby controlling the terminal device 200.
The control device 100 may also be an intelligent device, such as a mobile terminal 100B, a tablet computer, a notebook computer, and the like. The terminal device 200 is controlled using, for example, an application program running on the smart device. The application program may provide various controls to a user through an intuitive User Interface (UI) on a screen associated with the smart device through configuration.
Fig. 3B is a block diagram illustrating the configuration of the control apparatus 100. As shown in fig. 3B, the control device 100 may include a controller 110, a memory 120, a communicator 130, a user input interface 140, an output interface 150, and a power supply 160. It should be understood that fig. 3B is merely an example, and the control device 100 may include more or better components than those in fig. 3B, and the present application is not limited thereto.
The controller 110 includes a Random Access Memory (RAM) 111, a Read-Only Memory (ROM) 112, a processor 113, a communication interface, and a communication bus. The controller 110 is used to control the operation of the control device 100, the cooperation of its internal components, and external and internal data processing functions.
Illustratively, when an interaction in which the user presses a key on the remote controller 100A, or touches a touch panel on the remote controller 100A, is detected, the controller 110 may generate a signal corresponding to the detected interaction and send the signal to the terminal device 200.
The memory 120 stores various operating programs, data, and applications for driving and controlling the control device 100 under the control of the controller 110. The memory 120 may store various control signal commands input by the user.
The communicator 130 enables the communication of control signals and data signals with the terminal device 200 under the control of the controller 110. For example, the control apparatus 100 sends a control signal (e.g., a touch signal or a button signal) to the terminal device 200 via the communicator 130, and the control apparatus 100 may receive signals sent by the terminal device 200 via the communicator 130. The communicator 130 may include an infrared signal interface 131 and a radio frequency signal interface 132. For example, when the infrared signal interface is used, a user input instruction needs to be converted into an infrared control signal according to the infrared control protocol and sent to the terminal device 200 through the infrared sending module. For another example, when the radio frequency signal interface is used, a user input command needs to be converted into a digital signal, modulated according to the radio-frequency control signal modulation protocol, and then sent to the terminal device 200 through the radio-frequency sending terminal.
The user input interface 140 may include at least one of a microphone 141, a touch pad 142, a sensor 143, a key 144, and the like, so that a user can input a user instruction regarding the control of the terminal device 200 to the control apparatus 100 through voice, touch, gesture, press, and the like.
The output interface 150 outputs a user instruction received by the user input interface 140 to the terminal device 200, or outputs an image or voice signal received by the terminal device 200. Here, the output interface 150 may include an LED interface 151, a vibration interface 152 generating vibration, a sound output interface 153 outputting sound, a display 154 outputting images, and the like. For example, the remote controller 100A may receive an output signal such as audio, video, or data from the output interface 150, and display the output signal in the form of an image on the display 154, in the form of audio on the sound output interface 153, or in the form of vibration on the vibration interface 152.
The power supply 160 provides operational power support for the elements of the control device 100 under the control of the controller 110, and may take the form of a battery and associated control circuitry.
Referring to fig. 3C, a block diagram of an architecture configuration of the operating system of the terminal device is exemplarily shown. The operating system architecture comprises an application layer, a middleware layer and a kernel layer from top to bottom.
Application layer: applications built into the system and non-system-level applications both belong to the application layer, which is responsible for direct interaction with the user. The application layer may include a plurality of applications, such as a settings application, a media center application, and the like. These applications may be implemented as Web applications executing on a WebKit engine, and in particular may be developed and executed based on HTML5, Cascading Style Sheets (CSS), JavaScript, and the like.
The middleware layer may provide some standardized interfaces to support the operation of various environments and systems. For example, the middleware layer may be implemented as multimedia and hypermedia information coding experts group (MHEG) middleware related to data broadcasting, DLNA middleware which is middleware related to communication with an external device, middleware which provides a browser environment in which each application program in the terminal operates, and the like.
The kernel layer provides core system services such as: file management, memory management, process management, network management, system security authority management and the like. The kernel layer may be implemented as a kernel based on various operating systems, for example, a kernel based on the Linux operating system.
The kernel layer also provides communication between system software and hardware, and provides device driver services for various hardware, such as: providing a display driver for a display component, a camera driver for a camera, a WiFi driver for a WiFi module, an audio driver for an audio output interface, a power management driver for a Power Management (PM) module, etc.
The hardware configuration and the software structure of different terminals may differ, so fig. 3A to 3C are exemplary illustrations.
In order to facilitate understanding of the embodiment of the present application, a processing system architecture diagram of live data shown in fig. 4 will be specifically described as an example.
The architecture shown in fig. 4 includes a sending end device 401, a sending end server 402, a receiving end server 403, and a receiving end device 404.
The sending end device (referred to as the sending end for short) 401 may be a terminal device (which may also be called a display device) with shooting and recording functions, such as a Mobile Phone, a tablet computer, a notebook computer, a palm computer, a Mobile Internet Device (MID), or a wearable device with shooting and recording functions. The wearable device may include, for example, a smart watch or a smart bracelet. It may also be a television with camera and sound-recording functions, or a device capable of capturing images, such as a camera that can connect to a network. For example, the sending end device 401 may be the terminal device used by the live initiator (e.g., the anchor) in a live scene. The sending end device 401 may directly upload the collected video and audio data to the sending end server 402, or may first encode the collected video and audio data and then send the encoded audio and video data to the sending end server 402. The structure of the sending end device 401 may adopt the structure of the terminal device shown in fig. 3A described above. It should be noted that fig. 4 is only an example, and the number of sending end devices is not limited in the embodiments of the present application. The live data processing system may also include an encoder connected to the sending end device; for example, when the sending end device is a camera, it may be connected to an encoder, and the camera transmits the data generated in real time to the encoder, which encodes it. The parts relating to the encoder are not shown in fig. 4.
The sending end server 402 according to the present application may be the client server corresponding to the sending end device used by the user initiating the live broadcast in a live scene. The sending end server 402 may be configured to receive encoded data and send it to the receiving end. Alternatively, the sending end server 402 may receive unencoded audio and video data, encode and encapsulate it, and send the encoded and encapsulated data to the receiving end device 404. It should be noted that fig. 4 is only an example; the functions of the sending end server 402 may be implemented by one server or by a server cluster, which is not specifically limited in this application.
The receiving end server 403 according to the present application may be the client server corresponding to the receiving end device used by a user watching the live broadcast in a live scene. The receiving end server 403 may be configured to receive encoded audio and video data from the sending end server 402. The receiving end server 403 may also be configured to decode the encoded audio and video data and send the decoded audio and video data to the receiving end device 404. Alternatively, the receiving end server 403 may directly send the encoded audio and video data to the receiving end device 404. It should be noted that fig. 4 is only an example; in an actual scenario, the functions of the receiving end server 403 may also be implemented by a server cluster. In some embodiments, the functions of the sending end server 402 and the receiving end server 403 may be implemented by one server.
The receiving end device (referred to as the receiving end for short) 404 may be a terminal device (which may also be called a display device) that has an audio/video playback function and includes a display screen, such as a mobile phone, tablet computer, notebook computer, palm computer, mobile internet device, or television with an audio/video playback function. The structure of the receiving end device 404 may adopt the structure of the terminal device shown in fig. 3A described above. In the embodiments of the present application, the receiving end device 404 may be a device used by a user watching the live broadcast; it receives the decoded audio and video data sent by the server, outputs the decoded audio data to an audio device, for example a speaker provided in the receiving end device 404 itself or a device such as a sound box connected to the receiving end device 404, and renders the video data to the display screen. The receiving end device 404 may also have decapsulation and decoding functions: it may receive undecoded audio and video data from the receiving end server, decode the audio and video data using its own decapsulation and decoding functions, output the decoded audio data to the audio device, and render the decoded video data to the display screen.
It should be noted that the architecture diagram of the live data processing system shown in fig. 4 is only an example, and the application does not specifically limit the number of the devices and the servers in the system. It should be understood that other terminal devices besides the sending end device and the receiving end device may also be included in the live data processing system.
Next, the live data processing method proposed in the embodiments of the present application is described in detail with reference to the system architecture diagram shown in fig. 4. Live data is the data generated during a live broadcast and includes audio data and video data. In a live video scene, a number of transport protocols may be used to transmit the audio and video data generated during the live broadcast, for example the Transmission Control Protocol (TCP) or the User Datagram Protocol (UDP).
In a live scene, the sending end can collect video data through a camera and audio data through a sound-receiving device such as a microphone. In some embodiments, after collecting the audio data and the video data, the sending end may encode them to obtain the ES streams used to transmit the audio and video data. The sending end can further split and packetize the audio ES stream and the video ES stream to obtain audio PES packets and video PES packets. In some embodiments, the sending end may further packetize the audio PES packets and the video PES packets to obtain TS packets. As an example, the structure of a TS packet may follow the structure shown in fig. 1B. In some optional implementations, when the sending end encodes the audio and video data (audio data and video data), a PAT table and a PMT table are generated in each encoding period, and the PAT table and the PMT table may also be carried in TS packets for transmission. Illustratively, the PID in the header of the TS packets carrying the PAT table may be 0; the PIDs of all audio TS packets corresponding to a live program are the same, the PIDs of all video TS packets corresponding to a live program are the same, and the PIDs of the audio TS packets differ from those of the video TS packets. The PAT table generated in each encoding period mainly records the PID of the PMT table generated in that encoding period, and the PMT table generated in each encoding period mainly records the media types of the data encoded in that encoding period, that is, whether the encoded data is audio data or video data, together with the PIDs of the TS packets generated in that encoding period to carry the audio or video data.
It should be noted that the operations described above for the sending end, namely encoding and packetizing the collected audio stream and video stream and generating a PAT table and a PMT table in each encoding period, may be performed by the sending end itself. Alternatively, the sending end may only upload the collected audio stream and video stream to the sending end server, and the sending end server performs the encoding, packetizing, and related operations.
The embodiments of the present application provide live data processing methods for two scenarios. In the first scenario, at the moment the user triggers the start of live video, the TS packet types indicated by the first PMT received by the receiving end do not include the video type. In the second scenario, the PMT received by the receiving end changes while the live video is playing. These two scenarios are described separately below.
(I) Scenario in which, at the moment the user triggers the start of live video, the TS packet types indicated by the first PMT received by the receiving end do not include the video type.
After encoding and packaging the collected audio stream and video stream, the sending end sends a TS packet carrying the audio data, the video data, the PAT table and the PMT table to the receiving end.
In this scenario, when the receiving end first receives audio TS packets or video TS packets, rather than the TS packets carrying the PAT table and the PMT table, the receiving end regards these TS packets as unknown data packets because their PIDs have not yet appeared in any received PAT table or PMT table, discards them, and records the PID of each discarded TS packet together with the number of TS packets received for each distinct PID. In some embodiments, a counter may be configured in the receiving end to record the number of received TS packets per PID. The recorded PIDs and packet counts may be stored in a table format, for example, a hash table. For ease of description, this table is referred to hereinafter as table 1. As an example, table 1 may refer to the table shown in fig. 5.
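A minimal sketch of the counter behind table 1, assuming a hash table keyed by PID; the names pid_counts and discard_unknown are illustrative assumptions.

    from collections import Counter

    # Table 1 as a hash table: each PID maps to the number of TS packets
    # received with that PID. Unknown packets are counted and then dropped.
    pid_counts = Counter()

    def discard_unknown(packet: bytes) -> None:
        pid = ((packet[1] & 0x1F) << 8) | packet[2]  # 13-bit PID from the header
        pid_counts[pid] += 1
        # the packet itself is not buffered; it is simply discarded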
After discarding the unknown data packets, when the receiving end receives a TS packet carrying the PAT table for the first time, it may first determine that the TS packet carries the PAT table because the PID of the TS packet is 0, and then parse the TS packet to obtain the PID of the PMT table recorded in the PAT table. The receiving end then identifies the first received TS packet carrying the PMT table according to the PMT PID recorded in the PAT table; for convenience of description, this first received PMT table is referred to as the first PMT. After receiving the first PMT, the receiving end determines whether the TS packet types indicated by the first PMT include the video type.
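For illustration, the following sketch recovers the PMT PIDs from a PAT section according to the standard ISO/IEC 13818-1 layout, assuming the section has already been reassembled from TS payloads (pointer_field skipped) and omitting the CRC check; the function name is an assumption.

    def pmt_pids_from_pat(section: bytes) -> dict:
        # Map program_number -> PMT PID from a reassembled PAT section.
        section_length = ((section[1] & 0x0F) << 8) | section[2]
        entries = section[8 : 3 + section_length - 4]  # skip the 8-byte header, drop CRC_32
        programs = {}
        for i in range(0, len(entries), 4):
            program_number = (entries[i] << 8) | entries[i + 1]
            pid = ((entries[i + 2] & 0x1F) << 8) | entries[i + 3]
            if program_number != 0:  # program_number 0 points at the NIT, not a PMT
                programs[program_number] = pid
        return programs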
In some embodiments, if the TS packet type indicated by the first PMT includes an audio type and a video type, the receiving end may perform decapsulation and decoding operations on the audio TS packet and the video TS packet indicated by the first PMT, and output the decoded audio data and video data synchronously.
In other embodiments, the TS packet types indicated by the first PMT do not include the video type. In this case, the audio data can be output normally, but the video data may lag behind and fall out of synchronization with the audio. A possible cause is that the bit rate of the video data is higher, so the encoding speed of the video data temporarily falls below its output speed, while the audio data does not suffer from this problem. The receiving end determines the PID of the audio TS packets according to the correspondence, recorded in the first PMT, between the type identifier indicating the audio type and the TS packet PID, stores the audio TS packets in a buffer (for example, a queue), and then detects whether a TS packet of the video type is received within a set duration after the first PMT is received, for example, by checking whether a subsequently received PMT table indicates the video type. For convenience of description, the first PMT table received within the set duration whose indicated TS packet types include the video type is referred to hereinafter as the second PMT. In some embodiments, the set duration may be chosen according to the following logic: determine, from the PID of the audio TS packets recorded in the first PMT, the number of audio TS packets and the number of non-audio data packets stored in table 1; if the number of audio TS packets in table 1 is greater than the number of non-audio TS packets, take the first duration as the set duration; if the number of audio TS packets in table 1 is smaller than the number of non-audio TS packets, take the second duration as the set duration, where the second duration is greater than the first duration. The reason for this logic is as follows: audio data is generally smaller than video data, that is, for the same live program the number of audio TS packets is generally smaller than the number of video TS packets. Therefore, when the number of audio TS packets in table 1 is smaller than the number of non-audio TS packets, the non-audio TS packets are likely to include video TS packets, which indicates that the live program likely contains both audio and video, that is, video TS packets are likely to appear among the subsequently received TS packets, so the longer second duration is used as the set duration. Otherwise, the first duration is used as the set duration.
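A sketch of this duration-selection logic, reusing the pid_counts hash table from the earlier sketch; the two numeric values are illustrative placeholders, since the patent does not specify concrete durations.

    FIRST_DURATION = 0.5   # seconds; hypothetical value
    SECOND_DURATION = 2.0  # seconds; hypothetical value, greater than FIRST_DURATION

    def pick_set_duration(audio_pid: int) -> float:
        audio = pid_counts.get(audio_pid, 0)
        non_audio = sum(pid_counts.values()) - audio
        # Fewer audio than non-audio packets suggests video packets exist,
        # so wait longer for a second PMT that declares the video type.
        return SECOND_DURATION if audio < non_audio else FIRST_DURATION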
In some possible cases, when the receiving end receives the second PMT within the set duration, it may construct decapsulation, decoding, and playing channels for the audio TS packets and the video TS packets, decapsulate and decode the video TS packets indicated in the second PMT and the audio TS packets stored in the buffer queue, and synchronously output the decoded audio and video data to the player, which outputs the audio data to the speaker and renders the video data to the display screen. In some embodiments, the receiving end may have regarded video TS packets received before the second PMT as unknown packets and deleted them, so the received video TS packets may be missing relative to the received audio TS packets. Therefore, when synchronously outputting the decoded audio and video data, the timestamp of the video data may be used as the reference timestamp, and audio data whose timestamp is smaller than that of the video data may be discarded, so that the audio data and the video data can be output synchronously.
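A sketch of this alignment rule; the frame objects and their pts (timestamp) attribute are assumptions for illustration.

    def align_audio(buffered_audio, first_video_pts):
        # Use the first video frame's timestamp as the reference and drop
        # buffered audio frames that are older, so playback starts in step.
        return [frame for frame in buffered_audio if frame.pts >= first_video_pts]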
In other possible cases, when the receiving end does not receive a video TS packet within the set duration, indicating that the current live program is an audio-only program, it may construct a decapsulation, decoding, and playing channel for the audio TS packets, decapsulate and decode the audio TS packets stored in the buffer queue, and output the decoded audio data to the speaker.
Next, referring to fig. 6, a method for processing live data in this scenario will be described in a specific flow.
601, the sending end sends P TS packets to the receiving end. The P TS packets do not include TS packets carrying the PAT table or the PMT table.
602, the receiving end discards the received P TS packets and records the PIDs of the P TS packets and the number of TS packets per distinct PID.
The PIDs of the P TS packets and the per-PID packet counts recorded by the receiving end may be stored in the form of the table shown in fig. 5 in the above embodiment.
603, the sending end sends the TS packet carrying the PAT table and the TS packet carrying the first PMT table to the receiving end.
For example, the receiving end may determine that the TS packet carries the PAT table according to the PID of the TS packet being 0, and further determine the first PMT according to the PID of the PMT table included in the PAT. For the description of the first PMT, reference may be made to the above-described embodiments.
604, the receiving end determines that the TS packet types indicated by the first PMT do not include the video type.
605, the sending end continues to send TS packets to the receiving end.
606, the receiving end stores the audio TS packets in the buffer queue, and determines that a TS packet carrying the second PMT is received within the set duration.
The receiving end may identify the audio TS packets among the subsequently received TS packets according to the PID of the audio TS packets recorded in the first PMT and the PIDs included in the subsequently received TS packets, and store the audio TS packets in the buffer queue. For the description of the second PMT and the set duration, reference may be made to the above embodiments.
607, the receiving end de-encapsulates and decodes the video TS packet indicated in the second PMT and the audio TS packet in the buffer queue, and synchronously outputs the decoded audio and video data.
It should be noted that the P TS packets in 601, the TS packets carrying the PAT table and the first PMT in 603, and the TS packets in 605 are sent continuously by the sending end device; fig. 6 is merely an example and does not limit the time intervals.
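Putting the steps of fig. 6 together, the following is a minimal sketch of the receiving-end behavior in this scenario; all names are illustrative assumptions, discard_unknown is the counter sketch given earlier, and the PMT objects are assumed to expose a types set and an audio_pid field.

    class ScenarioOneReceiver:
        def __init__(self):
            self.first_pmt = None
            self.audio_queue = []
            self.playing = False

        def on_packet(self, pid, packet, pmt=None):
            if self.first_pmt is None:
                if pmt is not None:                  # 603/604: first PMT arrives
                    self.first_pmt = pmt
                else:                                # 601/602: unknown packet, count and drop
                    discard_unknown(packet)
            elif pmt is not None and "video" in pmt.types:
                self.play_synchronized(pmt)          # 606: second PMT arrives in time
            elif pid == self.first_pmt.audio_pid:
                self.audio_queue.append(packet)      # 606: buffer the audio packets

        def play_synchronized(self, second_pmt):
            # 607: decapsulate and decode the buffered audio packets and the
            # video packets indicated by second_pmt, then output in sync;
            # decoding itself is outside the scope of this sketch.
            self.playing = True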
(II) Scenario in which the PMT received by the receiving end changes during the live broadcast.
During live video playback, the receiving end decapsulates and decodes the audio TS packets and the video TS packets indicated in the received PMT, outputs the decoded audio data to the loudspeaker, and renders the decoded video data to the display screen. However, during this process, the currently received PMT table may differ from the first received PMT table that includes both audio data and video data.
The currently received PMT table and the first received PMT table that includes both audio data and video data belong to the same program. For convenience of description, the currently received PMT table is referred to as the current PMT table, and the first received PMT table that includes both audio data and video data is referred to as the original PMT table. Compared with the original PMT table, the current PMT table may change in two ways:
One change is that the TS packets recorded in the PMT table change. For example, the types of TS packets recorded in the current PMT table differ from those in the original PMT table: the original PMT table records audio TS packets and video TS packets, while the current PMT table records only audio TS packets.
The other change is that the PIDs of the TS packets recorded in the current PMT table differ from those in the original PMT table.
The first variation may be caused by the encoding speed of the sending end being temporarily too slow to keep up with live playback, in which case the PMT tables received after the encoding speed recovers may return to normal on their own. For such a situation, the receiving end may ignore the change; the original decapsulation, decoding, and playing channels do not need to be changed and may remain as they are. For example, suppose the TS packets recorded in the original PMT table include both audio TS packets and video TS packets, while the current PMT table records only audio TS packets. In this case, if the receiving end plays with the time axis of the audio data as the reference time axis, it can keep the current setup and continue to output audio data on the audio time axis. If the receiving end plays with the time axis of the video data as the reference time axis, then, upon determining that no video TS packet has been received for a period of time, it decapsulates and decodes the received audio TS packets and, when outputting the decoded audio data, switches the reference time axis to the system time axis so that audio output continues. For example, if the receiving end receives no video TS packet within 1 s, it may switch the reference time axis to the system time axis and continue to output audio data. Here, the system time axis refers to the default system time axis of the terminal device at the receiving end. The reference time axis may also be called the play time axis. Taking the time axis of the video data as the reference time axis as an example, the reference time axis indicates the times associated with the video data, such as the total duration, start time, end time, and current playing position of the video data. That is, the reference time axis is the timeline of video playback and increases linearly.
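A sketch of this fallback, using the 1 s example above as the threshold; the class and attribute names are illustrative assumptions.

    import time

    NO_VIDEO_TIMEOUT = 1.0  # seconds, matching the 1 s example above

    class ReferenceClock:
        def __init__(self):
            self.last_video = time.monotonic()
            self.use_system_axis = False

        def on_video_packet(self):
            self.last_video = time.monotonic()
            self.use_system_axis = False     # video is back; the video axis resumes

        def tick(self):
            if time.monotonic() - self.last_video > NO_VIDEO_TIMEOUT:
                self.use_system_axis = True  # keep outputting audio on the system time axis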
The second variation may mean that the current live program has finished playing: the PIDs recorded in the current PMT table differ from those recorded in the original PMT table, so the current PMT table may belong to a new live program. In this case, the receiving end may notify the player that the current live program has finished. After receiving the notification, the player waits for the audio and video output of the current program to complete, deletes the old decoding channel, creates a new decoding channel, decodes the TS packets of the new live program, and plays the new program. As an example, the flow of playing a new live program at the receiving end may refer to the flow shown in fig. 7.
Next, referring to fig. 8, the live data processing method in this scenario is described as a specific flow. For convenience of description, fig. 8 still refers to the currently received PMT table as the current PMT table and to the first received PMT table that includes both audio data and video data as the original PMT table.
801, the receiving end receives the current PMT table from the sending end.
802, the receiving end judges whether the PIDs in the current PMT table are the same as the PIDs in the original PMT table.
Whether the PIDs in the current PMT table and the original PMT table are the same may be determined as described for the second variation in the above embodiment.
If they are the same, go to step 803.
If they are different, go to step 805.
803, the receiving end judges whether the types of the TS packets recorded in the current PMT table are included in the original PMT table.
Whether the types of the TS packets recorded in the current PMT table are included in the original PMT table may be determined as described for the first variation in the above embodiment. If the type of the TS packets recorded in the current PMT table is the audio type, and the types recorded in the original PMT table include both the audio type and the video type, the types recorded in the current PMT table are included in the original PMT table. Conversely, if the types recorded in the current PMT table include the audio type and the video type, while the original PMT table records only the audio type, the types recorded in the current PMT table are not included in the original PMT table.
If so, go to step 804.
If not, go to step 805.
804, changes to the current PMT table are ignored.
The receiving end continues to use the previous decapsulation, decoding and playing channels without any change.
805, the receiving end informs the player that the current live program has finished playing.
After the player finishes outputting the current audio and video data, it reconstructs a new decoding channel and plays the new live program. See the flow shown in fig. 7.
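The decision flow of fig. 8 can be summarized in the following sketch, assuming each PMT table has been reduced to a set of PIDs and a set of stream types; the attribute names and the player interface are assumptions for illustration.

    def on_current_pmt(current, original, player):
        if current.pids != original.pids:
            # PIDs changed: a new live program, so the player finishes the
            # current program and rebuilds the decoding channel (step 805).
            player.finish_current_program()
        elif set(current.types) <= set(original.types):
            # Types shrank or stayed the same, e.g. video temporarily
            # dropped out: ignore the change (step 804).
            pass
        else:
            # A type absent from the original PMT table appeared (step 805).
            player.finish_current_program()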
Based on the same concept as the above method, as shown in fig. 9, a display apparatus 900 is provided. The display device 900 is capable of performing the various steps of the above-described method and will not be described in detail herein to avoid repetition. The display device 900 includes: input interface 901, controller 902, display 903.
An input interface 901, configured to receive a first PMT and a second PMT sequentially after a user triggers live video;
the controller 902 is configured to, when determining that the second PMT is received, start live video to render the video data to the display screen 903 and output the audio data;
the display screen 903 is used for displaying the video data;
the first PMT is the PMT received for the first time after the user triggers live video, and indicates that audio data is encoded and video data is not encoded in the encoding period corresponding to the first PMT; the second PMT indicates that video data is encoded in the encoding period corresponding to the second PMT; and the time at which the second PMT is received is within the set duration of the time at which the first PMT is received.
In some embodiments, the controller 902 is further configured to:
before receiving the first PMT, counting the number of packets carrying audio data and the number of packets carrying non-audio data received by the input interface 901;
when it is determined that the number of data packets carrying audio data is greater than the number of data packets carrying non-audio data, taking the first duration as the set duration; or,
when it is determined that the number of data packets carrying audio data is smaller than the number of data packets carrying non-audio data, taking the second duration as the set duration;
the second duration is greater than the first duration.
In some embodiments, the first PMT includes a correspondence between a type identifier for indicating that the data type is audio and a first packet identifier;
the controller 902 is specifically configured to determine the number of data packets that include the first packet identifier, among the data packets received before the first PMT is received, as the number of data packets carrying audio data, and to determine the difference between the total number of received data packets and the number of data packets carrying audio data as the number of data packets carrying non-audio data.
In some embodiments, the controller 902 is further configured to record, before receiving the first PMT, the number of different packet identifiers included in the packet received by the input interface 901;
when determining the number of data packets that include the first packet identifier among the data packets received before the first PMT is received, that is, the number of data packets carrying audio data, the controller 902 is specifically configured to look up, among the recorded counts for the different packet identifiers, the count corresponding to the first packet identifier.
In some embodiments, the controller 902 is further configured to store, in a buffer, the data packets carrying the audio data received after receiving the first PMT, before determining that the second PMT is received;
and, when starting live video to render the video data to the display screen 903 and output the audio data, the controller 902 is specifically configured to start the live video, output the audio data included in the buffer, and render the video data received after the second PMT is received to the display screen 903.
Based on the same concept as the above method, as shown in fig. 10, a live data processing apparatus 1000 is provided. The device 1000 is capable of performing the steps of the above method, and will not be described in detail herein to avoid redundancy. The apparatus 1000 comprises: a communication unit 1001, a processing unit 1002, a display unit 1003, and an audio output unit 1004.
A communication unit 1001, configured to receive the first PMT and the second PMT sequentially after a user triggers live video broadcast;
the processing unit 1002 is configured to, when determining that the second PMT is received, start live video to render the video data to a display screen and output the audio data;
a display unit 1003 for displaying the video data;
an audio output unit 1004 for playing the audio data.
Embodiments of the present application also provide a computer-readable medium, on which a computer program is stored, which when executed by a processor implements the steps of any of the methods described above.
Those of ordinary skill in the art will understand that: all or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
While specific embodiments of the present application have been described above, it will be appreciated by those skilled in the art that these are by way of example only, and that the scope of the present application is defined by the appended claims. Various changes and modifications to these embodiments may be made by those skilled in the art without departing from the spirit and principles of this application, and these changes and modifications are intended to be included within the scope of this application. While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including the preferred embodiment and all changes and modifications that fall within the scope of the present application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. A display device comprising an input interface, a controller and a display screen, wherein,
the input interface is used for receiving the first PMT and the second PMT in sequence after a user triggers the live video; the controller is used for starting video live broadcast to render the video data to a display screen and output the audio data when determining that the second PMT is received;
the display screen is used for displaying the video data;
the first PMT is a PMT received for the first time after the user triggers live video, and indicates that audio data is encoded and video data is not encoded in the encoding period corresponding to the first PMT; the second PMT indicates that video data is encoded in the encoding period corresponding to the second PMT; and the time at which the second PMT is received is within the set duration of the time at which the first PMT is received.
2. The display device of claim 1, wherein the controller is further to:
before receiving a first PMT, counting the number of data packets carrying audio data and the number of data packets carrying non-audio data, which are received by an input interface;
when it is determined that the number of data packets carrying audio data is greater than the number of data packets carrying non-audio data, taking the first duration as the set duration; or,
when it is determined that the number of data packets carrying audio data is smaller than the number of data packets carrying non-audio data, taking the second duration as the set duration;
the second duration is greater than the first duration.
3. The display device according to claim 2, wherein the first PMT includes a correspondence between a type identifier for indicating that a data type is audio and a first packet identifier;
the controller is specifically configured to determine the number of data packets that include the first packet identifier, among the data packets received before the first PMT is received, as the number of data packets carrying audio data, and to determine the difference between the total number of received data packets and the number of data packets carrying audio data as the number of data packets carrying non-audio data.
4. The display device of claim 3, wherein the controller is further configured to record a number of different respective packet identifications included in the packets received by the input interface prior to receiving the first PMT;
when determining the number of data packets that include the first packet identifier among the data packets received before the first PMT is received, that is, the number of data packets carrying audio data, the controller is specifically configured to determine, among the recorded counts for the different packet identifiers, the count corresponding to the first packet identifier.
5. The display device of any of claims 1-4, wherein the controller is further configured to store data packets carrying audio data received after receiving the first PMT in a buffer prior to determining that the second PMT is received;
and, when starting live video to render the video data to the display screen and output the audio data, the controller is specifically configured to start the live video, output the audio data included in the cache, and render the video data received after the second PMT is received to the display screen.
6. A method for processing live broadcast data, comprising:
after a user triggers live video, a first PMT and a second PMT are received in sequence;
the first PMT is a PMT received for the first time after the user triggers live video, the first PMT indicates that audio data is encoded and video data is not encoded in the encoding period corresponding to the first PMT, and the second PMT indicates that video data is encoded in the encoding period corresponding to the second PMT; the time at which the second PMT is received is within the set duration of the time at which the first PMT is received;
and when a second PMT is received, starting live video to synchronously play the video data and the audio data.
7. The method of claim 6, further comprising:
before receiving the first PMT, counting the number of received data packets carrying audio data and the number of data packets carrying non-audio data;
when the number of data packets carrying audio data is greater than the number of data packets carrying non-audio data, taking the first duration as the set duration; or,
when the number of data packets carrying audio data is smaller than the number of data packets carrying non-audio data, taking the second duration as the set duration;
the second duration is greater than the first duration.
8. The method of claim 7, wherein the first PMT includes a correspondence between a type identifier indicating that a data type is audio and a first packet identifier;
wherein counting, before receiving the first PMT, the number of received data packets carrying audio data and the number of data packets carrying non-audio data comprises:
determining the number of first data packet identifiers included in the data packets received before the first PMT is received as the number of data packets carrying audio data, and determining the difference between the total number of the received data packets and the number of the data packets carrying the audio data as the number of data packets carrying non-audio data.
9. The method of claim 8, wherein the method further comprises:
before receiving the first PMT, recording the number of different data packet identifications included in the received data packet;
determining the number of first packet identifiers included in the packet received before the first PMT is received as the number of packets carrying audio data, including:
and determining the number corresponding to the first data packet identifier in the recorded number of different data packet identifiers.
10. A computer-readable storage medium having stored thereon computer-executable instructions which, when invoked by a computer, cause the computer to perform the method of any one of claims 6 to 9.
CN202110318360.6A 2021-03-25 2021-03-25 Live broadcast data processing method and device Pending CN115134644A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110318360.6A CN115134644A (en) 2021-03-25 2021-03-25 Live broadcast data processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110318360.6A CN115134644A (en) 2021-03-25 2021-03-25 Live broadcast data processing method and device

Publications (1)

Publication Number Publication Date
CN115134644A (en) 2022-09-30

Family

ID=83373880

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110318360.6A Pending CN115134644A (en) 2021-03-25 2021-03-25 Live broadcast data processing method and device

Country Status (1)

Country Link
CN (1) CN115134644A (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1276325A2 (en) * 2001-07-11 2003-01-15 Matsushita Electric Industrial Co., Ltd. Mpeg encoding apparatus, mpeg decoding apparatus, and encoding program
US20030035486A1 (en) * 2001-07-11 2003-02-20 Naoe Kato MPEG encoding apparatus, MPEG decoding apparatus, and encoding program
EP1501222A2 (en) * 2003-07-25 2005-01-26 Lg Electronics Inc. Apparatus and method for transmitting/receiving information in a Digital Multimedia Broadcast Service
CN1578422A (en) * 2003-07-25 2005-02-09 Lg电子有限公司 Apparatus and method for transmitting/receiving information in a digital multimedia broadcast service
US20090147954A1 (en) * 2003-11-21 2009-06-11 Howard Pinder Partial Dual-Encryption Using Program Map Tables
CN111601135A (en) * 2020-05-09 2020-08-28 青岛海信传媒网络技术有限公司 Method for synchronously injecting audio and video elementary streams and display equipment

Similar Documents

Publication Publication Date Title
WO2019001347A1 (en) Screen projection method for mobile device, storage medium, terminal and screen projection system
CN112533014B (en) Method, device and equipment for processing and displaying target object information in live video broadcast
WO2015176648A1 (en) Method and device for transmitting data in intelligent terminal to television terminal
CN110996160B (en) Video processing method and device, electronic equipment and computer readable storage medium
CN109756744B (en) Data processing method, electronic device and computer storage medium
KR20220144857A (en) Multimedia data publishing method and apparatus, device and recording medium
CN110996122B (en) Video frame transmission method, device, computer equipment and storage medium
US20240107087A1 (en) Server, terminal and non-transitory computer-readable medium
CN115278323A (en) Display device, intelligent device and data processing method
WO2022116822A1 (en) Data processing method and apparatus for immersive media, and computer-readable storage medium
CN115134644A (en) Live broadcast data processing method and device
CN115623275A (en) Subtitle display method and display equipment
CN113542765B (en) Media data jump continuous playing method and display device
CN114554277B (en) Multimedia processing method, device, server and computer readable storage medium
CN115396621A (en) Network push flow control method, device, equipment and storage medium based on RK628D
CN111629250A (en) Display device and video playing method
CN113596546A (en) Multi-stream program playing method and display equipment
CN114979736B (en) Display device and audio and video synchronization method
CN114173157B (en) Video stream transmission method and device, electronic equipment and storage medium
CN116744051A (en) Display device and subtitle generation method
US11758108B2 (en) Image transmission method, image display device, image processing device, image transmission system, and image transmission system with high-transmission efficiency
CN115174991B (en) Display equipment and video playing method
CN115209208A (en) Processing method and device for video circular playing
CN115119030A (en) Subtitle processing method and device
CN117119234A (en) Display equipment and media asset playing method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination