CN114827679A - Display device and sound picture synchronization method - Google Patents

Display device and sound picture synchronization method

Info

Publication number
CN114827679A
CN114827679A (application CN202210412666.2A)
Authority
CN
China
Prior art keywords
audio
data
video
video data
display
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210412666.2A
Other languages
Chinese (zh)
Inventor
王云刚
吕显浩
李斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hisense Visual Technology Co Ltd
Original Assignee
Hisense Visual Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hisense Visual Technology Co Ltd filed Critical Hisense Visual Technology Co Ltd
Priority to CN202210412666.2A priority Critical patent/CN114827679A/en
Publication of CN114827679A publication Critical patent/CN114827679A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/242Synchronization processes, e.g. processing of PCR [Program Clock References]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/4302Content synchronisation processes, e.g. decoder synchronisation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/4302Content synchronisation processes, e.g. decoder synchronisation
    • H04N21/4307Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85Assembly of content; Generation of multimedia applications
    • H04N21/854Content authoring
    • H04N21/8547Content authoring involving timestamps for synchronizing content

Abstract

Some embodiments of the present application provide a display device and a sound-picture synchronization method. The display device may acquire and decode video data and audio data. The display device may determine the timestamp of the first frame and the timestamp of the last frame of the decoded video data, as well as the timestamp of the first frame of the decoded audio data, and perform sound-picture synchronization processing on the decoded video data and audio data according to these three timestamps before playing them. Rather than simply discarding video data to achieve synchronization, the playing time of the video data is adjusted according to the data timestamps, so that sound-picture synchronization is realized and the user's viewing experience is improved.

Description

Display device and sound picture synchronization method
Technical Field
The application relates to the technical field of display equipment, in particular to display equipment and a sound and picture synchronization method.
Background
A display device is a terminal device capable of outputting a specific display picture. With the rapid development of display devices, their functions have become increasingly rich and their performance increasingly powerful. They can realize bidirectional human-computer interaction and integrate various functions such as audio-video, entertainment, and data, so as to meet the diversified and personalized needs of users.
A user can view various media assets on the network using a player installed in the display device. The display device downloads the corresponding media asset data from the network and separates it into video data and audio data. The video data is shown on the display, and the audio data is played through the loudspeaker, so that the media asset is played for the user to watch. However, during playback, the processing time of image data is generally longer than that of audio data, so video playback may lag behind the audio, causing the audio and video to fall out of sync.
To realize sound-picture synchronization, the related art discards a number of earlier video frames and directly plays the subsequent video data, so that the later frames are displayed earlier and stay synchronized with the audio. However, if a large amount of video data is discarded, the difference between successive pictures on the display is large, and the user perceives an obvious picture jump and unsmooth playback, which seriously affects the viewing experience.
Disclosure of Invention
The application provides a display device and a sound-picture synchronization method, solving the problem in the related art that a large difference between successive pictures on the display seriously affects the user's viewing experience.
In a first aspect, the present application provides a display device comprising a display, an audio output interface, and a controller. Wherein the audio output interface is configured to connect to an audio device; the controller is configured to perform the steps of:
acquiring video data and audio data, and decoding the video data and the audio data;
performing sound-picture synchronization processing on the decoded video data and audio data according to a first video timestamp, a second video timestamp, and a first audio timestamp, and playing them; the first video timestamp is the timestamp of the first frame of the decoded video data, the second video timestamp is the timestamp of the last frame of the decoded video data, and the first audio timestamp is the timestamp of the first frame of the decoded audio data.
In a second aspect, the present application provides a sound and picture synchronization method, applied to a display device, the method including:
acquiring video data and audio data, and decoding the video data and the audio data;
performing sound-picture synchronization processing on the decoded video data and audio data according to a first video timestamp, a second video timestamp, and a first audio timestamp, and playing them; the first video timestamp is the timestamp of the first frame of the decoded video data, the second video timestamp is the timestamp of the last frame of the decoded video data, and the first audio timestamp is the timestamp of the first frame of the decoded audio data.
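The decision logic of these two aspects can be sketched as follows. This is one plausible illustrative reading, not the patent's actual implementation: it assumes millisecond timestamps and that synchronization is achieved by delaying whichever stream starts earlier, rather than by dropping frames.

```python
def sync_decision(first_video_ts, last_video_ts, first_audio_ts):
    """Align decoded streams using the three timestamps (milliseconds).

    Returns ("refill",) when the buffered video lies entirely behind the
    audio; otherwise ("delay_video" | "delay_audio", hold_ms): which
    stream to hold back, and for how long, so both start together.
    """
    if first_audio_ts > last_video_ts:
        # Every buffered frame is older than the audio: re-buffer video
        # instead of discarding frames one by one.
        return ("refill",)
    if first_audio_ts >= first_video_ts:
        # Audio starts inside the buffered video span: hold the video
        # until the audio timeline catches up.
        return ("delay_video", first_audio_ts - first_video_ts)
    # Video starts later than audio: hold the audio instead.
    return ("delay_audio", first_video_ts - first_audio_ts)
```

For example, with video buffered for timestamps 0–400 ms and audio starting at 120 ms, the video is simply held 120 ms, and no picture jump occurs.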
According to the technical solutions above, some embodiments of the application provide a display device and a sound-picture synchronization method. The display device may acquire and decode video data and audio data, determine the timestamps of the first and last frames of the decoded video data and the timestamp of the first frame of the decoded audio data, and perform sound-picture synchronization processing on the decoded video data and audio data according to these three timestamps before playing them. Rather than simply discarding video data to achieve synchronization, the playing time of the video data is adjusted according to the data timestamps, so that sound-picture synchronization is realized and the user's viewing experience is improved.
Drawings
To explain the technical solution of the present application more clearly, the drawings needed in the embodiments are briefly described below. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 illustrates a usage scenario of a display device according to some embodiments;
fig. 2 illustrates a hardware configuration block diagram of the control apparatus 100 according to some embodiments;
fig. 3 illustrates a hardware configuration block diagram of the display apparatus 200 according to some embodiments;
FIG. 4 illustrates a software configuration diagram in the display device 200 according to some embodiments;
FIG. 5 shows a schematic diagram of a user interface in some embodiments;
FIG. 6 is a schematic diagram illustrating a display device displaying a "movie interface" in some embodiments;
FIG. 7 shows a schematic diagram of an application panel in some embodiments;
FIG. 8 illustrates a player configuration diagram for a display device in some embodiments;
FIG. 9 illustrates an interaction flow diagram for components of a display device in some embodiments;
FIG. 10 is a diagram illustrating the display of a sound-picture synchronization mode confirmation message in the display in some embodiments;
FIG. 11 is a flow diagram illustrating detection of the sound-picture synchronization state in some embodiments;
FIG. 12 is a diagram illustrating expanding the upper memory limit of an audio buffer region in some embodiments;
FIG. 13 illustrates a schematic diagram of expanding the upper memory limit of an audio buffer region in some embodiments;
FIG. 14 is a schematic diagram illustrating the display of load prompt information in a display in some embodiments;
FIG. 15 is a schematic diagram that illustrates a load prompt interface displayed in a display in some embodiments;
FIG. 16 is a schematic diagram that illustrates time stamping of video data and audio data in some embodiments;
FIG. 17 is a schematic diagram that illustrates time stamping of video data and audio data in some embodiments;
FIG. 18 shows a flow diagram of one embodiment of a sound-picture synchronization method.
Detailed Description
To make the objects, embodiments, and advantages of the present application clearer, exemplary embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. It should be understood that the described exemplary embodiments are only a part of the embodiments of the present application, not all of them.
All other embodiments obtained by a person skilled in the art from the exemplary embodiments described herein without inventive step are intended to fall within the scope of the appended claims. In addition, while this disclosure is presented in terms of one or more exemplary examples, it should be appreciated that each aspect of the disclosure may also constitute a complete embodiment on its own. It should be noted that the brief descriptions of terms in this application are only for convenience in understanding the embodiments described below and are not intended to limit the embodiments of the present application. Unless otherwise indicated, these terms should be understood according to their ordinary and customary meaning.
The terms "first," "second," "third," and the like in the description and claims of this application and in the above-described drawings are used for distinguishing between similar or analogous objects or entities and not necessarily for describing a particular sequential or chronological order, unless otherwise indicated. It is to be understood that the terms so used are interchangeable under appropriate circumstances.
The terms "comprises" and "comprising," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a product or apparatus that comprises a list of elements is not necessarily limited to all elements expressly listed, but may include other elements not expressly listed or inherent to such product or apparatus.
The term "module" refers to any known or later developed hardware, software, firmware, artificial intelligence, fuzzy logic, or combination of hardware and/or software code that is capable of performing the functionality associated with that element.
Fig. 1 illustrates a usage scenario of a display device according to some embodiments. As shown in fig. 1, a user may operate the display device 200 through a mobile terminal 300 and a control apparatus 100. The control apparatus 100 may be a remote controller, which communicates with the display device through infrared protocol communication, Bluetooth protocol communication, or other wireless or wired methods to control the display device 200. The user may input a user command through a key on the remote controller, voice input, control panel input, etc. to control the display device 200. In some embodiments, mobile terminals, tablets, computers, laptops, and other smart devices may also be used to control the display device 200.
In some embodiments, a software application may be installed on the mobile terminal 300 to implement connection and communication with the display device 200 through a network communication protocol, for the purpose of one-to-one control operation and data communication. Audio and video content displayed on the mobile terminal 300 can also be transmitted to the display device 200 for synchronous display. The display device 200 can also perform data communication with the server 400 through multiple communication modes, and may be communicatively connected through a Local Area Network (LAN), a Wireless Local Area Network (WLAN), or other networks. The server 400 may provide various contents and interactions to the display device 200. The display device 200 may be a liquid crystal display, an OLED display, or a projection display device. In addition to the broadcast receiving television function, the display device 200 may additionally provide a smart network television function offering computer support.
Fig. 2 illustrates a block diagram of a hardware configuration of the control apparatus 100 according to some embodiments. As shown in fig. 2, the control device 100 includes a controller 110, a communication interface 130, a user input/output interface 140, a memory, and a power supply. The control apparatus 100 may receive an input operation instruction from a user and convert the operation instruction into an instruction recognizable and responsive by the display device 200, serving as an interaction intermediary between the user and the display device 200. The communication interface 130 is used for communicating with the outside, and includes at least one of a WIFI chip, a bluetooth module, NFC, or an alternative module. The user input/output interface 140 includes at least one of a microphone, a touch pad, a sensor, a key, or an alternative module.
Fig. 3 illustrates a hardware configuration block diagram of a display device 200 according to some embodiments. As shown in fig. 3, the display device 200 includes at least one of a tuner demodulator 210, a communicator 220, a detector 230, an external device interface 240, a controller 250, a display 260, an audio output interface 270, a memory, a power supply, and a user interface 280. The controller includes a central processor, a video processor, an audio processor, a graphics processor, a RAM, a ROM, and first to nth interfaces for input/output. The display 260 may be at least one of a liquid crystal display, an OLED display, a touch display, and a projection display, and may also be a projection device and a projection screen. The display is used for displaying a user interface. The user interface may show a specific target image, such as various media assets acquired from a network signal source, including video, pictures, and other content, or a UI interface of the display device; the user may view content such as media assets on the display. The tuner demodulator 210 receives broadcast television signals in a wired or wireless manner and demodulates audio/video signals, as well as additional signals such as EPG data, from among the plurality of wireless or wired broadcast television signals. The detector 230 is used to collect signals from the external environment or signals of interaction with the outside. The controller 250 and the tuner demodulator 210 may be located in different separate devices; that is, the tuner demodulator 210 may also be located in a device external to the main device where the controller 250 is located, such as an external set-top box.
In some embodiments, the controller 250 controls the operation of the display device and responds to user operations through various software control programs stored in memory. The controller 250 controls the overall operation of the display apparatus 200. A user may input a user command on a Graphical User Interface (GUI) displayed on the display 260, and the user input interface receives the user input command through the Graphical User Interface (GUI). Alternatively, the user may input the user command by inputting a specific sound or gesture, and the user input interface receives the user input command by recognizing the sound or gesture through the sensor.
In some embodiments, a "user interface" is a media interface for interaction and exchange of information between an application or operating system and a user that enables conversion between an internal form of information and a form that the user can receive. A commonly used presentation form of the User Interface is a Graphical User Interface (GUI), which refers to a User Interface related to computer operations and displayed in a graphical manner. It may be an interface element such as an icon, a window, a control, etc. displayed in the display screen of the electronic device, where the control may include at least one of an icon, a button, a menu, a tab, a text box, a dialog box, a status bar, a navigation bar, a Widget, etc. visual interface elements.
Fig. 4 illustrates a software configuration diagram in the display device 200 according to some embodiments. As shown in fig. 4, the system is divided into four layers, which are, from top to bottom, an Application layer (abbreviated as "application layer"), an Application Framework layer (abbreviated as "framework layer"), an Android runtime and system library layer (abbreviated as "system runtime layer"), and a kernel layer. The kernel layer comprises at least one of the following drivers: audio driver, display driver, Bluetooth driver, camera driver, Wi-Fi driver, USB driver, HDMI driver, sensor driver (e.g., fingerprint sensor, temperature sensor, pressure sensor), power driver, and the like.
A user may utilize a display device to view various media assets on the network. The display device downloads the corresponding media asset data from the network and separates it into video data and audio data; the video data is shown on the display, and the audio data is played through the loudspeaker, so that the media asset is played for the user to watch. However, during playback, the processing time of image data is generally longer than that of audio data, so video playback may lag behind the audio, causing the audio and video to fall out of sync. The display device therefore provides a sound-picture synchronization function that plays the video data and audio data synchronously to improve the user experience.
In the related art, when performing sound-picture synchronization, the unplayed video data and audio data are compared, target video data whose timestamp is close to that of the audio data is determined, and all video data before the target video data is deleted, so as to bring the two streams into alignment. However, if a large amount of video data is discarded, the difference between successive pictures on the display is large, and the user perceives an obvious picture jump and unsmooth playback, which seriously affects the viewing experience. Meanwhile, there are many possible causes of sound-picture desynchronization, such as a problem with the film source or with the display device system. This processing mode does not determine the cause, and merely deleting video data cannot adapt to different playback scenarios, so the accuracy of sound-picture synchronization is poor and the user experience suffers.
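The related-art frame-dropping approach described above can be sketched as follows. This is an illustrative reconstruction only; the 40 ms "closeness" threshold is a hypothetical value, not taken from the patent.

```python
def drop_to_sync(video_pts_list, audio_pts, threshold_ms=40):
    """Related-art sync: discard every frame whose presentation
    timestamp is too far behind the audio clock, then play on from
    the first sufficiently close (target) frame."""
    for i, pts in enumerate(video_pts_list):
        if pts >= audio_pts - threshold_ms:
            return video_pts_list[i:]   # frames before index i are deleted
    return []                           # all buffered frames were stale
```

When many frames fall behind the audio clock, everything before the target frame is deleted at once, which is exactly the source of the visible picture jump criticized here.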
Therefore, the display device provided by the embodiment of the application can accurately perform sound and picture synchronization processing, so that the use experience of a user is improved.
In some embodiments, the controller may control the display to display the user interface when the user powers on the display device. FIG. 5 illustrates a schematic diagram of a user interface in some embodiments. As shown in fig. 5, the user interface includes a first navigation bar 500, a second navigation bar 510, a function bar 520, and a content display area 530; the function bar 520 includes a plurality of function controls such as "view records", "my favorites", and "my applications". The content displayed in the content display area 530 changes according to the controls selected in the first navigation bar 500 and the second navigation bar 510. To open the application panel page, the user can click the "my applications" control to input a display instruction for the application panel page, triggering entry into the corresponding application panel. It should be noted that the user may also input a selection operation on the function control in other manners to trigger entry into the application panel, for example by using a voice control function or a search function.
In some embodiments, the user may select a control in the second navigation bar 510 to enter the interface of the corresponding category and view network media assets. For example, the user may directly select the "movie" entry option, controlling the focus cursor to move to it through the control device or the terminal device, thereby triggering the display device to display the "movie interface". FIG. 6 illustrates a display device displaying the "movie interface" in some embodiments. As shown in fig. 6, the "movie interface" may include a plurality of movie classification entry options; the user may click any entry option, for example clicking the "movie" entry option triggers the display device to display the "movie" entry interface, where a specific movie asset can be selected for playing.
In some embodiments, for the user interface shown in FIG. 5, the user may select the "my applications" control, thereby triggering the display device to display an application panel. Through the application panel, the user can view the applications that the display device has installed, i.e., the functions supported by the display device. FIG. 7 shows a schematic diagram of an application panel in some embodiments. As shown in fig. 7, the application panel includes three controls: "network media asset", "cable tv", and "video chat". By touching the "network media asset" control, the user can watch media assets through the display device. At this time, the controller may control the display device to open the player, and the user may select an asset in the player for viewing.
Fig. 8 illustrates a player configuration diagram for a display device in some embodiments. As shown in fig. 8, the player may include a network protocol parsing and downloading module, a demultiplexing module, a decoding module, and an output module. The network protocol analyzing and downloading module can determine a transmission protocol between the display device and the server, so that the media asset data can be downloaded in the server. The demultiplexing module may demultiplex the downloaded media asset data to obtain video data (video) and audio data (audio) contained in the media asset data, and may further contain subtitle data and the like. The decoding module comprises an audio decoding unit and a video decoding unit. The audio decoding unit is used for decoding audio data, and the video decoding unit is used for decoding video data. The output module comprises an audio output unit and a video output unit. The audio output unit transmits the decoded audio data to the audio equipment for playing, and the video output unit transmits the decoded video data to the display for displaying.
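The four-module chain of FIG. 8 can be summarized in a structural sketch. The stub bodies below merely stand in for the real modules; the class and method names are illustrative, not the player's actual interfaces.

```python
class Player:
    """Structural sketch of the four-module pipeline of FIG. 8."""

    def download(self, url):
        # Network protocol parsing & downloading module (stubbed data).
        return {"video": [b"v0", b"v1"], "audio": [b"a0"], "subs": []}

    def demux(self, asset_data):
        # Demultiplexing module: split the container into streams.
        return asset_data["video"], asset_data["audio"], asset_data["subs"]

    def decode_video(self, pkt):
        # Video decoding unit of the decoding module.
        return ("frame", pkt)

    def decode_audio(self, pkt):
        # Audio decoding unit of the decoding module.
        return ("pcm", pkt)

    def play(self, url):
        video_es, audio_es, _subs = self.demux(self.download(url))
        frames = [self.decode_video(p) for p in video_es]
        samples = [self.decode_audio(p) for p in audio_es]
        # The output module would send frames to the display and
        # samples to the audio device; here they are simply returned.
        return frames, samples
```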
FIG. 9 illustrates an interaction flow diagram for components of a display device in some embodiments.
As shown in fig. 9, when the user selects a specific asset and controls the display device to play, it is determined that the user has sent an asset play instruction to the display device. In response to the media asset playing instruction, the display device may determine the target media asset selected by the user, and start to acquire the media asset data of the target media asset, so as to play the target media asset in real time. The controller may acquire the asset data in real time.
Specifically, the network protocol parsing and downloading module in the display device is configured to parse the network protocol and download the media asset data, which may be code stream data of the target media asset. The media asset data can be streaming media data: streaming media technology compresses a series of media data and then transmits it in segments as a stream over the network, so that video and audio can be delivered over the network in real time for viewing.
The controller can control the network protocol parsing and downloading module to establish a communication connection with the server providing the target media asset. The network protocol parsing and downloading module can determine the network transport protocol between the two parties and send a media asset acquisition request to the server accordingly. The network transport protocol may be the Real-time Transport Protocol (RTP). RTP can carry TS data streams and ensures efficient, real-time data transmission, so it is widely applied in communication and entertainment services involving streaming media, such as telephony, video conferencing, television, and push-to-talk. RTP describes a standard packet format for delivering audio and video over the Internet, i.e., the format of the RTP packets herein.
After determining the network transport protocol, the network protocol parsing and downloading module may send an RTSP (Real Time Streaming Protocol) request to the server to obtain the media asset data. RTSP is used to control the transmission of real-time data and can control multiple data transmission sessions. During an RTSP session, multiple reliable transport connections to the server may be opened or closed to issue RTSP requests.
After receiving the RTSP request, the server may first determine the target media asset and then generate its media asset data. Specifically, since the media asset data generally includes video data and audio data, the server may composite the audio data and the video data into the corresponding media asset data, for example by multiplexing them into a TS (Transport Stream) data stream and then adding an RTP protocol header to obtain the media asset data in the form of RTP packets. At this point an RTSP session is established between the network protocol parsing and downloading module and the server, through which the server can send the media asset data in the form of RTP packets to the network protocol parsing and downloading module at a certain rate. The display device can then play the target media asset according to the media asset data.
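The packaging described above — TS packets wrapped in RTP — can be unpacked with a few lines of parsing. The sketch below reads only the 12-byte fixed RTP header defined in RFC 3550 and slices the payload into 188-byte TS packets; error handling is minimal and header extensions are ignored.

```python
import struct

def parse_rtp(packet: bytes):
    """Parse the 12-byte fixed RTP header and slice out TS packets."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    version = b0 >> 6                 # should be 2
    cc = b0 & 0x0F                    # CSRC count
    payload_type = b1 & 0x7F          # 33 = MP2T (TS over RTP)
    header_len = 12 + 4 * cc          # skip any CSRC identifiers
    payload = packet[header_len:]
    # A TS-over-RTP payload is a whole number of 188-byte TS packets,
    # each beginning with the sync byte 0x47.
    usable = len(payload) - len(payload) % 188
    ts_packets = [payload[i:i + 188] for i in range(0, usable, 188)]
    return {"version": version, "pt": payload_type, "seq": seq,
            "timestamp": ts, "ssrc": ssrc, "ts_packets": ts_packets}
```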
In some embodiments, a block of storage area may be preset in the network protocol parsing and downloading module. After the media asset data is acquired, the media asset data can be stored in the preset storage area for subsequent processing.
In consideration of the fact that the video data and the audio data are fused in the acquired media asset data, in order to normally play the target media asset, the two data need to be separated and processed. The controller may acquire the video data and the audio data based on the media asset data, and may perform demultiplexing on the media asset data to obtain two types of data.
The display device can be provided with a demultiplexing module which can demultiplex the media asset data. The controller can control the network protocol analysis and download module to send the media asset data to the demultiplexing module, and specifically can send the media asset data in the storage area to the demultiplexing module.
In some embodiments, the controller may control the demultiplexing module to demultiplex the media asset data, decapsulate the media asset data to obtain video data and audio data, and may further include subtitle data and the like.
After the video data and the audio data are obtained, each can be decoded to produce decoded data that the display device can present to the user and play directly. It should be noted that the speed of demultiplexing the media asset stream data and the speed of decoding the audio and video data cannot be perfectly matched, so the two processes cannot run in lockstep in real time. Therefore, after the media asset data is demultiplexed, the resulting video data and audio data can be stored first, so that subsequent decoding can proceed continuously.
Specifically, a video buffer area and an audio buffer area may be preset, and are respectively used for storing video data and audio data. The display device can be separately provided with a cache module, and the cache module has a certain storage space, wherein a video cache region and an audio cache region are arranged in the cache module. The cache module may also be directly disposed in a storage area of the network protocol parsing and downloading module, which is not limited in the embodiment of the present application.
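The separate video and audio buffer areas described above can be sketched as follows. This is a minimal illustration only, not the patented implementation; the class and field names (`MediaBuffers`, `video_limit`, and so on) are hypothetical.

```python
from collections import deque

class MediaBuffers:
    """Hypothetical sketch of the video/audio buffer areas that hold
    demultiplexed frames while they await decoding."""

    def __init__(self, video_limit=5, audio_limit=5):
        self.video_limit = video_limit
        self.audio_limit = audio_limit
        self.video = deque()
        self.audio = deque()

    def push(self, kind, frame):
        """Store one frame; return False when the area is full, which
        models the demultiplexer having to pause delivery."""
        queue, limit = ((self.video, self.video_limit) if kind == "video"
                        else (self.audio, self.audio_limit))
        if len(queue) >= limit:
            return False  # buffer full: demultiplexing must wait
        queue.append(frame)
        return True

buffers = MediaBuffers()
accepted = [buffers.push("audio", {"pts": i}) for i in range(6)]
# the sixth audio frame is rejected once the 5-slot audio area is full
```

A real player would of course store encoded packets and signal the demultiplexing thread; the fixed slot counts here only stand in for the "storage upper limit" referred to later in the text.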
In some embodiments, after decoding the video data and the audio data, the controller may further render the two data, and control the output module to output the rendered video data to the display and the rendered audio data to the audio device.
In some embodiments, the display device displays the picture, that is, the rendered video data, through the display. The display device may also be provided with an audio output interface for connecting audio equipment, such as an external Bluetooth speaker or USB speaker, which plays the sound, that is, the rendered audio data. The display device may likewise have a built-in audio device such as a loudspeaker. Whether built in or external, the audio device is used to play the rendered audio data; the embodiments of the present application do not limit the audio device.
In some embodiments, considering user experience, the display device is further provided with a sound-picture synchronization function. When the user feels that the picture and sound played by the display device are out of step, a sound-picture synchronization instruction may be sent to the display device. The display device may then enter the sound-picture synchronization mode and perform sound-picture synchronization processing on the video data and the audio data.
In some embodiments, the user can send the sound-picture synchronization instruction to the display device by operating a designated key of the remote control. A correspondence between the sound-picture synchronization instruction and one or more remote control keys can be bound in advance; when the user presses a bound key, the remote control sends out the sound-picture synchronization instruction.
In some embodiments, the user may send the sound-picture synchronization instruction to the display device by means of voice input or by a preset gesture or action.
In some embodiments, when the user uses the smart device to control the display device, for example, when using a mobile phone, the user can select whether to enter the sound and picture synchronization mode through a preset control in the mobile phone, so as to send a sound and picture synchronization instruction to the display device.
In some embodiments, a sound and picture synchronization mode option may be set in a UI interface of the display device, and when a user clicks the option, the display device may be controlled to enter or exit the sound and picture synchronization mode.
In some embodiments, to prevent the user from triggering the sound and picture synchronization mode by mistake, when the controller receives the sound and picture synchronization instruction, the controller may control the display to display the sound and picture synchronization mode confirmation information, so that the user performs secondary confirmation to determine whether to control the display device to enter the sound and picture synchronization mode. Fig. 10 is a diagram showing a display of the sound-picture synchronization pattern confirmation information in the display in some embodiments.
In some embodiments, a sound and picture synchronization module can be arranged in the player. When the display device enters a sound and picture synchronization mode, the controller controls the sound and picture synchronization module to start working. And the sound and picture synchronization module is used for carrying out sound and picture synchronization processing on the decoded video data and audio data. The controller can respectively send the video data in the video buffer area and the audio data in the audio buffer area to the sound and picture synchronization module, so that the two data are synchronized in time. The controller can further render the data after the audio and video synchronization, and control the output module to output the rendered data to the display and the audio equipment respectively, so as to realize the playing of the media assets.
In some embodiments, sound-picture synchronization processing may be performed on the decoded video data and audio data so that the two are aligned in time, improving the user's viewing experience. The synchronization may proceed frame by frame: the first frame of video data and the first frame of audio data among the decoded data are synchronized first. In the embodiments of the present application, the first frame of video data/audio data refers to the decoded but not yet played video data/audio data that is earliest in the playing order.
The playing order of the video data/audio data may be determined by timestamps. A timestamp is parameter information carried by the data itself that represents its playing time; each frame of data carries one. Taking video data as an example, the timestamp is the display timestamp of the current frame, representing its display time on the display. The smaller the timestamp, the earlier the frame plays and the earlier it falls in the playing order; the larger the timestamp, the later it plays and the later its order.
Generally, when media asset data is downloaded, it is downloaded in ascending order of timestamps so that it can be played in time order. The download order can therefore represent the playing order.
In some embodiments, the decoded video data and audio data may be stored in a buffer queue and input to the sound-picture synchronization module in ascending order of timestamps. To ensure the accuracy of the synchronization, only one frame of video data and one frame of audio data may exist in the module at a time. That is, while a frame of video data is still present in the sound-picture synchronization module, no further video data is fed from the buffer queue; once that frame leaves the module, for example when its rendering and playing start, the video data with the smallest timestamp in the buffer queue can be fed in. Audio data is input in the same way.
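The ascending-timestamp feeding order can be sketched as below. The generator form loosely models the one-frame-at-a-time rule: the next frame is only produced after the caller has consumed the previous one. The function name and `pts` field are illustrative, not from the patent.

```python
import heapq

def frames_in_play_order(decoded_frames):
    """Yield decoded frames one at a time in ascending timestamp order,
    modelling the rule that the synchronization module holds at most one
    frame at any moment."""
    # (pts, index, frame): the index breaks ties so dicts are never compared
    heap = [(frame["pts"], i, frame) for i, frame in enumerate(decoded_frames)]
    heapq.heapify(heap)
    while heap:
        _, _, frame = heapq.heappop(heap)
        yield frame  # the next frame waits until this one is consumed

order = [f["pts"] for f in frames_in_play_order(
    [{"pts": 120}, {"pts": 40}, {"pts": 80}])]
# order == [40, 80, 120]: the smallest timestamp enters the module first
```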
In some embodiments, during sound-picture synchronization processing, one type of data can be allowed to play normally while the other is synchronized against it as the reference. For example, the audio data may play normally, each frame according to its own timestamp, without specific synchronization processing by the controller; the audio data then serves as the reference, keeping the system clock of the display device consistent with the audio clock. The controller performs specific synchronization processing on each frame of video data so that the video data and the audio data achieve sound-picture synchronization.
In the embodiments of the present application, the case where the audio data plays normally and the video data receives the specific synchronization processing is taken as an example. Both the audio data and the video data may be input into the sound-picture synchronization module, with the module leaving the audio data untouched so that it plays normally by its timestamps while the video data is specifically processed to bring the two into step. Alternatively, only the video data is input into the module, and the audio data skips it and is rendered and played directly according to its timestamps.
In some embodiments, when sound-picture synchronization processing is performed, the current playing state of the display device may take several forms: the device may already be in a sound-picture-synchronized state, the video display may lead the audio playing, or the video display may lag the audio playing. In the embodiments of the present application, these cases are collectively referred to as the sound-picture synchronization condition.
Therefore, the controller can first detect the sound-picture synchronization condition of the display device, using the one frame of video data and the one frame of audio data input into the sound-picture synchronization module.
In some embodiments, the sound-picture synchronization condition of the display device may be detected based on the timestamps of the video data and the audio data.
The controller may first determine a first video timestamp and a first audio timestamp. In the embodiments of the present application, the first video timestamp is the timestamp of the first frame of video data in the decoded video data, and the first audio timestamp is the timestamp of the first frame of audio data in the decoded audio data. The first frame of video data/audio data is the decoded but not yet rendered and played frame whose timestamp is the smallest among the unplayed data, that is, the frame input to the sound-picture synchronization module. The first video timestamp and the first audio timestamp can thus be determined directly.
It should be noted that, since no specific synchronization processing is performed on the audio data, the first audio timestamp is the actual playing time of the first frame of audio data. The video data, however, does undergo specific synchronization processing, so the first video timestamp may not be the actual playing time of the first frame of video data; the actual playing time is determined during the synchronization process.
The controller can determine the sound and picture synchronization condition of the display device according to the first video time stamp and the first audio time stamp, and different processing can be performed according to different sound and picture synchronization conditions.
In some embodiments, three audio-visual synchronization conditions may be preset, and the three conditions correspond to the audio-visual synchronized condition, the video display leading the audio playing condition, and the video display lagging the audio playing condition, respectively. In the embodiment of the present application, the first sound-picture synchronization condition is set as follows: the video display lags behind the audio playing, and the second sound-picture synchronization condition is as follows: the video display precedes the audio playing, and the third audio-video synchronization condition is as follows: the sound and picture are synchronized. The controller determines a specific processing procedure by judging which sound-picture synchronization condition the video data and the audio data conform to.
In some embodiments, for a frame of video data and a frame of audio data, if timestamps of two frames of data are the same, the two frames of data may be considered to be in a synchronized state, that is, the display device is already in a sound-picture synchronization state, and meets a third sound-picture synchronization condition, and at this time, the two frames of data may be played normally without performing other processing.
If the timestamps of the two frames differ, the frames are considered out of step, that is, the display device is in a sound-picture-asynchronous state. This state covers two cases: either the video display leads the audio playing, in which case the timestamp of the video data is greater than that of the audio data, or the video display lags the audio playing, in which case the timestamp of the video data is less than that of the audio data. In both cases the video data may be adjusted to bring the two into synchronization.
By comparing the magnitudes of the timestamps of the audio data and the video data, the sound-picture synchronization condition can be determined.
In some embodiments, because of defects in the film source itself, the timestamps of corresponding video data and audio data may not be exactly equal but carry a small deviation; likewise, subsequent processing of the video data and audio data, such as rendering, may take longer for video than for audio, so the playback times are not exactly the same. An error range can therefore be set, for example 10 milliseconds: when the difference between the timestamps of a frame of video data and a frame of audio data falls within this range, the two frames can be considered synchronized, that is, the display device is in the sound-picture synchronization state.
Specifically, the controller may set three audio-video synchronization conditions as follows, in this embodiment, Tv represents a time stamp of the video data, Ta represents a time stamp of the audio data, and s represents a preset error threshold.
The first sound-picture synchronization condition is: |Tv − Ta| > s and Tv < Ta. That is, the timestamp of the video data is smaller than the timestamp of the audio data, and the absolute value of the difference between the two timestamps is greater than the preset error threshold. In this case the video display lags the audio playing.
The second sound-picture synchronization condition is: |Tv − Ta| > s and Tv > Ta. That is, the timestamp of the video data is greater than the timestamp of the audio data, and the absolute value of the difference between the two timestamps is greater than the preset error threshold. In this case the video display leads the audio playing.
The third sound-picture synchronization condition is: |Tv − Ta| ≤ s. That is, the absolute value of the difference between the two timestamps is no greater than the preset error threshold. In this case the display device is considered to already be in the sound-picture synchronization state.
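The three conditions above can be sketched as a small classifier. This is an illustrative sketch, not the patented implementation; the function name, the millisecond unit, and the default threshold of 10 ms (taken from the example earlier in the text) are assumptions.

```python
def sync_condition(tv, ta, s=10):
    """Classify the sound-picture relationship of one video frame and one
    audio frame from their timestamps (milliseconds).

    tv: timestamp of the video frame (Tv)
    ta: timestamp of the audio frame (Ta)
    s:  preset error threshold; 10 ms follows the example in the text
    """
    if abs(tv - ta) <= s:
        return "in sync"             # third condition: |Tv - Ta| <= s
    if tv > ta:
        return "video leads audio"   # second condition: gap > s, Tv > Ta
    return "video lags audio"        # first condition: gap > s, Tv < Ta

state = sync_condition(1000, 1005)
# state == "in sync": a 5 ms gap falls within the 10 ms threshold
```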
In some embodiments, the sound-picture synchronization condition may be determined frame by frame for the decoded video data and audio data. Specifically, the difference between the first video timestamp and the first audio timestamp, referred to in the embodiments of the present application as the first difference, may be obtained first. The controller may then detect which sound-picture synchronization condition the first difference satisfies and process accordingly.
FIG. 11 is a flow diagram illustrating detection of the sound-picture synchronization condition in some embodiments. As shown in FIG. 11, the following steps may be included:
step S1101: a first video timestamp and a first audio timestamp are determined. The first video timestamp is a timestamp of a first frame of video data in the decoded video data, and the first audio timestamp is a timestamp of a first frame of audio data in the decoded audio data.
Step S1102: a first difference of the first video timestamp and the first audio timestamp is obtained.
Step S1103: detecting a condition satisfied by the first difference. Specifically, three conditions may be included: the first sound-picture synchronization condition, the second sound-picture synchronization condition and the third sound-picture synchronization condition.
Step S11041: if the first difference value is detected to meet the first sound-picture synchronization condition, namely: the first video timestamp is less than the first audio timestamp, and the absolute value of the first difference is greater than the error threshold, then the video presentation is deemed to be behind the audio presentation.
Step S11041: if the first difference value is detected to meet the second sound-picture synchronization condition, namely: the first video timestamp is greater than the first audio timestamp, and the absolute value of the first difference is greater than the error threshold, then the video presentation is deemed to be ahead of the audio playback.
Step S11041: if the first difference value is detected to meet the third sound picture synchronization condition, namely: and if the absolute value of the first difference is less than the error threshold value, the display device is considered to be in the sound-picture synchronization state.
In some embodiments, if the first difference is detected to satisfy the third sound-picture synchronization condition, the display device is considered to already be in the sound-picture synchronization state. In this case the first frame of video data and the first frame of audio data need no further processing and can be played according to their respective timestamps: the controller may detect the time in real time, control the display to display the first frame of video data when the time reaches the first video timestamp, and control the audio device to play the first frame of audio data when the time reaches the first audio timestamp.
In some embodiments, if the first difference satisfies the third sound-picture synchronization condition, the first audio timestamp and the first video timestamp are close to each other, and the first audio timestamp may also serve as a common playing time for both frames: when the time reaches the first audio timestamp, the audio device is controlled to play the first frame of audio data while the display is simultaneously controlled to display the first frame of video data.
In some embodiments, if the first difference is detected to satisfy the second sound-picture synchronization condition, the display device is currently out of sound-picture synchronization and the video display leads the audio playing, so the two frames need to be synchronized. Since the audio data should remain in its normally playing state, the controller may slow the video display, that is, adjust the playing time of the video data so that it aligns with the audio data. Specifically, the controller may detect the time in real time and, when the time reaches the first audio timestamp, control the audio device to play the first frame of audio data while controlling the display to display the first frame of video data.
Generally, the first frame of video data would be displayed at the first video timestamp. Instead, the controller may keep the display showing the previous frame of video data, similar to the effect of a brief buffering pause, and when the time reaches the first audio timestamp, play the first frame of video data and the audio data together, achieving sound-picture synchronization.
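The video-leads handling described above can be sketched as a scheduling decision: hold the previous frame on screen, then release the pending video frame together with the audio frame. This follows the behavior as stated in the text; the function and key names are illustrative, and timestamps are assumed to be milliseconds.

```python
def schedule_leading_video(first_video_pts, first_audio_pts):
    """Sketch of the video-leads case: keep showing the previous frame
    and display the pending video frame at the same moment the first
    audio frame plays, so picture and sound come out together."""
    if first_video_pts > first_audio_pts:
        return {"hold_previous_frame": True,
                "display_at": first_audio_pts}   # show with the audio
    return {"hold_previous_frame": False,
            "display_at": first_video_pts}       # already on schedule

plan = schedule_leading_video(first_video_pts=1200, first_audio_pts=1000)
# the pending frame is held and released with the audio frame
```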
In some embodiments, once the first frame of video data has been synchronized, the display may be controlled to display it at the scheduled time, and synchronization can continue with the next frame. The controller updates the first frame of video data and the first frame of audio data, taking in each case the decoded but not yet played data that is earliest in the playing order, that is, with the smallest timestamp. The first video timestamp and the first audio timestamp are updated accordingly to the timestamps of the new first frames, and the controller continues the synchronization of the first frame of video data and the first frame of audio data.
In some embodiments, if the first difference is detected to satisfy the first sound-picture synchronization condition, the display device is currently out of sound-picture synchronization and the video display lags the audio playing. Generally, the timestamps of corresponding video data and audio data should be close; when the video display lags, the current time has already passed the timestamp of the video data, meaning later video data ought to be playing now, so the video data should catch up with the audio data as soon as possible.
In the related art, some video data is simply discarded so that later video data can be displayed sooner. However, the related art does not determine the cause of the sound-picture asynchrony and therefore cannot choose the most suitable handling; it only discards video data. If much data is discarded, the user sees obvious picture jumps and unsmooth playback; and since the amount of decoded video data is small, discarding it all leaves the same picture on screen for a long time, so the picture freezes. Either way, the user's viewing experience is poor.
In the embodiments of the present application, when the video display is detected to lag the audio playing, that is, the first difference satisfies the first sound-picture synchronization condition, the cause of the asynchrony may be determined first. For example, some film sources are flawed: the audio and video data in the media asset data are unevenly distributed, and the buffer queue can cache only a limited amount of data after demultiplexing. In extreme cases the audio buffer area fills to capacity while the video buffer area still has space, leaving markedly less video data than audio data; demultiplexing then cannot continue, no new video data can be acquired, and the video plays slowly and lags the audio. In addition, decoding high-resolution video data is resource-intensive; if the system load of the display device is high, there is not enough computing power for decoding, the decoding speed of the video data cannot meet the display requirement, and the video again lags the audio.
Therefore, the cause of the sound-picture asynchrony can be detected first, and the corresponding synchronization processing carried out accordingly.
In some embodiments, when the video display is detected to lag the audio playing, it can first be checked whether the lag is caused by the film source itself; in this special case, sound-picture synchronization processing cannot proceed normally.
Specifically, the controller may examine the current video buffer area and audio buffer area. If the audio data in the audio buffer area has reached the storage upper limit while the amount of video data in the video buffer area is below a preset buffer threshold, the audio and video distribution of the media asset data is unbalanced: there is too much audio data and too little video data among the currently buffered data. Normally, the lack of video data would pause playback and trigger the player's buffering action so that more video data could be fetched. However, because the audio buffer area has reached its upper limit, the delivery thread of the demultiplexing module is blocked, neither video data nor audio data can continue to be delivered, and the demultiplexing module stalls. At this point, even discarding some video data cannot obtain the subsequent video data, so the picture freezes while the audio keeps playing normally.
For this particular case, the controller may obtain more video data by expanding the upper storage limit of the audio buffer region.
The space of the buffer module can be enlarged directly, raising the storage upper limits of both the video buffer area and the audio buffer area so that the demultiplexing module can resume delivering video data for normal playback. FIG. 12 illustrates enlarging the upper storage limits of the buffer areas in some embodiments. As shown in FIG. 12, the upper storage limit of both the video buffer area and the audio buffer area is initially 5; after the buffer module is enlarged, both become 8.
In some embodiments, the buffer module consists of the video buffer area and the audio buffer area, whose storage upper limits are generally the same. Considering that there is currently too much audio data and too little video data, the space ratio of the video buffer area to the audio buffer area can be adjusted, for example from 1:1 to 3:7, increasing the storage upper limit of the audio buffer area so that the demultiplexing module can work normally and the video data can play normally. FIG. 13 illustrates expanding the upper storage limit of the audio buffer area in some embodiments. As shown in FIG. 13, the space of the buffer module is 10, and the storage upper limits of the video buffer area and the audio buffer area are both 5 initially. After the adjustment, the upper limit of the video buffer area becomes 3 and that of the audio buffer area becomes 7, while the total space of the buffer module remains 10.
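The 5/5-to-3/7 re-split described above can be sketched as follows. The trigger condition (audio area full, video area nearly empty), the threshold of 2 frames, and the 0.3 target ratio are illustrative assumptions; only the example numbers come from the text.

```python
def maybe_rebalance(total_slots, video_count, audio_count,
                    video_limit, audio_limit,
                    video_threshold=2, new_video_ratio=0.3):
    """If the audio area is full while the video area is nearly empty
    (the uneven-source case described in the text), re-split the same
    total space in the new ratio so demultiplexing can resume;
    otherwise keep the current limits."""
    if audio_count >= audio_limit and video_count < video_threshold:
        new_video_limit = int(total_slots * new_video_ratio)
        return new_video_limit, total_slots - new_video_limit
    return video_limit, audio_limit

# 10 slots initially split 5/5; the audio area is full, one video frame buffered
limits = maybe_rebalance(10, 1, 5, 5, 5)
# limits == (3, 7): total space is unchanged, the audio area grows
```

Note the design point from the text: the total cache size stays at 10 slots; only the split between the two areas changes.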
When the video and audio data of the film source are unevenly distributed, appropriately expanding the audio buffer area not only reduces the chance of the picture freezing but also obtains suitable video data as soon as possible for the subsequent sound-picture synchronization processing.
In some embodiments, when the video display is detected to lag the audio playing, it may also be checked whether the lag is a problem of the display device's system. Specifically, the controller may obtain the current system load rate of the display device and determine whether it exceeds a preset threshold.
If the system load of the display device is too high, the CPU computing power allocated to the decoding module is insufficient, the decoding speed of the video data cannot meet the requirement, and the video lags the audio, again causing sound-picture asynchrony. In this case the video cannot catch up with the audio in principle: even if video data is discarded, no further video data can be obtained to catch up with the audio's progress.
In this case, the controller may control the display to display load prompt information indicating that the system load rate of the display device has exceeded the preset threshold, so that the user can reduce the system load. As shown in FIG. 14, the load prompt information may read: "System load rate is too high; please close some applications!"
Meanwhile, the controller can pause playback of the media asset and resume it after the system load rate falls back below the preset threshold. Occasional pauses for buffering are more acceptable to users than sound-picture asynchrony.
In some embodiments, the controller may control the display to display the load prompt interface when the system load rate is detected to be higher than the preset threshold. As shown in fig. 15, the load prompt interface includes load prompt information and a load list. The load prompt information is used for representing that the system load rate of the display device exceeds a preset threshold, and the load list comprises a plurality of application programs running in the display device. The user may choose to shut down some of the applications to reduce the system load rate.
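The load check and prompt flow above can be sketched as below. The threshold value of 0.85, the function name, and the returned dictionary shape are all assumptions for illustration; the prompt text paraphrases the example in the text.

```python
def handle_high_load(load_rate, running_apps, threshold=0.85):
    """Sketch of the load check: when the system load rate exceeds the
    threshold, pause playback and build the prompt interface contents
    (prompt text plus the list of closable applications); otherwise
    keep playing."""
    if load_rate > threshold:
        return {"action": "pause playback",
                "prompt": "System load rate is too high; "
                          "please close some applications!",
                "load_list": list(running_apps)}
    return {"action": "keep playing"}

result = handle_high_load(0.92, ["GameApp", "BrowserApp"])
# playback pauses and the prompt interface lists the running applications
```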
In some embodiments, if the sound-picture asynchrony is caused neither by the film source itself nor by the display device system, these special causes can be ruled out. The asynchrony can then be attributed to the processing of the audio data and video data, for example different processing speeds during rendering, and is treated as conventional asynchrony. The controller may then proceed with the synchronization processing.
In some embodiments, the controller may determine the last frame of video data in the decoded video data, that is, the decoded but not yet played video data that is latest in the playing order and carries the largest timestamp. In the embodiments of the present application, the timestamp of the last frame of video data is referred to as the second video timestamp. The controller may perform sound-picture synchronization on the video data and the audio data according to the first video timestamp, the second video timestamp, and the first audio timestamp; since the audio data plays normally throughout, it is the video data that is adjusted.
In some embodiments, the controller may first obtain the difference between the second video timestamp and the first audio timestamp, referred to in the embodiments of the present application as the second difference. The second difference represents the playing relationship between the last frame of video data and the first frame of audio data, from which the controller may determine the specific synchronization method.
Specifically, the controller may directly detect whether the second difference satisfies the first sound-picture synchronization condition.
In some embodiments, if the second difference satisfies the first sound-picture synchronization condition, the timestamp of the last frame of video data is smaller than the first audio timestamp, that is, under normal conditions even the last decoded video frame should have been played before the first frame of audio data. All the currently decoded video data should therefore already have been played, yet none of it has been, and some later frame ought to be playing at the current time. Merely adjusting the playing time of the video data cannot achieve synchronization, but discarding all the video data would make playback unsmooth. The controller can therefore play the video data at an increased speed so that it catches up with the audio data as soon as possible. FIG. 16 illustrates the timestamps of the video data and audio data in some embodiments: the timestamps increase from left to right, which also represents the normal playing order. The first audio timestamp is P1 and the first video timestamp is V1. Four frames of video data are currently decoded, and the timestamp of the last frame is V4. Since even V4 is smaller than the first audio timestamp, all four frames should be played before the first frame of audio data, and the video data must be played as quickly as possible.
Specifically, the controller may first obtain the average duration for which one frame of video data is normally displayed. Video data is played at a certain frame rate; for example, at a playing frame rate of 25 fps, 25 frames of video data are displayed within one second, so the average duration of playing one frame of video data is 40 milliseconds.
The controller may calculate the ratio of the second difference to the first difference, and multiply this ratio by the average duration of one frame of video data. The resulting product may be used as the display duration of the first frame of video data.
The display duration of the first frame of video data may be obtained according to formula (1), which is as follows:
Tadp=Tdur*(T1-T2)/(T1-T3) (1)
wherein Tadp represents a display duration of the first frame video data; tdur represents the average duration of playing a frame of video data; t1 denotes a first audio time stamp, T2 denotes a second video time stamp, and T3 denotes a first video time stamp.
Because the video data needs to catch up with the audio data as soon as possible, and the system time may already exceed the first video timestamp, the controller may directly control the display to display the first frame of video data, with the calculated Tadp as its display duration.
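As a non-authoritative illustration, the display-duration computation of formula (1) can be sketched as follows; the function name, argument names, and the millisecond units are assumptions for the example, not taken from the disclosure:

```python
def first_frame_display_duration(frame_rate, first_audio_ts, second_video_ts, first_video_ts):
    """Sketch of formula (1): shorten the display duration of the first
    video frame so that video playback can catch up with the audio.

    Timestamps are assumed to be in milliseconds; frame_rate in frames
    per second. Returns Tadp = Tdur * (T1 - T2) / (T1 - T3).
    """
    t_dur = 1000.0 / frame_rate                       # average duration of one frame, e.g. 40 ms at 25 fps
    second_diff = first_audio_ts - second_video_ts    # T1 - T2 (second difference)
    first_diff = first_audio_ts - first_video_ts      # T1 - T3 (first difference)
    return t_dur * second_diff / first_diff
```

For instance, at 25 fps with V1 = 0 ms, V4 = 120 ms and P1 = 160 ms, the first frame would be shown for 40 * 40 / 160 = 10 ms instead of the usual 40 ms.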
For the remaining frames, up to the last frame of video data corresponding to the second video timestamp, the controller may continue to obtain display durations in the same way; each of these display durations may be less than the average duration. The controller then controls the display to show the video data sequentially according to the obtained display durations. By shortening the display duration of each frame of video data, more video data is displayed in the same amount of time, the gap between the video data and the audio data gradually shrinks, and the video data continuously catches up with the audio data until synchronization is finally achieved.
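A back-of-envelope check of why shortened display durations close the gap: if several frames are each shown for a shortened duration rather than the average duration, the video advances on the audio by the accumulated savings. This helper is purely illustrative (the disclosure does not give a formula for the per-frame savings):

```python
def catch_up_time(n_frames, t_dur, t_adp):
    """Illustrative sketch: displaying n_frames for t_adp milliseconds each,
    instead of the average duration t_dur, advances the video relative to
    the audio by the returned number of milliseconds.
    """
    return n_frames * (t_dur - t_adp)
```

For example, four frames shown for 10 ms instead of 40 ms gain 4 * 30 = 120 ms on the audio.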
When the last frame of video data starts to be displayed, the controller may update the first frame of video data, the first frame of audio data and the last frame of video data, and continue the synchronization process with the current first frame of video data.
In some embodiments, the second difference may not satisfy the first sound-picture synchronization condition, but instead satisfy the second or the third sound-picture synchronization condition.
This indicates that the timestamp of the last frame of video data is greater than or equal to the first audio timestamp, that is, under normal conditions the last frame of video data should be played together with, or later than, the first frame of audio data. Accordingly, part of the currently decoded video data should already have been played while part of it should not, and some frame among these video data is the one that should be playing at the current time. FIG. 17 illustrates a timestamp diagram for video data and audio data in some embodiments. The first audio timestamp is P1 and the first video timestamp is V1. Four frames of video data have currently been decoded, and the timestamp of the last frame is V4. As can be seen, the timestamps of the third and last frames are both greater than the first audio timestamp; therefore the first two frames of video data should be played before the first frame of audio data, and the last two frames after it. In this case some video data may be deleted, for example the first two frames, thereby achieving sound-picture synchronization.
Specifically, the controller may traverse the timestamps of the video data to find the timestamp closest to the first audio timestamp, and take the corresponding frame as the target frame video data. The target frame video data is the video data that the display device should be playing under normal conditions, and is either the last frame of video data or a frame before it. The controller may then delete all video data preceding the target frame video data, leaving the target frame video data and the first frame of audio data in a synchronized state. Accordingly, when the time reaches the first audio timestamp, the display can be controlled to display the target frame video data and the audio device to play the first frame of audio data. Allowing for errors, the target frame video data may also be played at the time of its own timestamp.
In some embodiments, after traversing the timestamps of the video data, the controller may use the first audio timestamp as a threshold, retain the video data corresponding to all timestamps greater than the threshold, and take the first frame among the retained video data as the target frame video data.
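The threshold-based frame-dropping variant can be sketched as follows; this is a simplified illustration operating on bare timestamps (the actual module would operate on frame buffers), and the function name is an assumption:

```python
def drop_late_frames(video_timestamps, first_audio_ts):
    """Sketch of the second sound-picture synchronization processing:
    discard every decoded video frame whose timestamp lies before the
    first audio timestamp and return the remaining timestamps. The first
    retained timestamp corresponds to the target frame video data, which
    is displayed when the time reaches the first audio timestamp.
    """
    return [ts for ts in video_timestamps if ts >= first_audio_ts]
```

With frames at 0, 40, 80 and 120 ms and the first audio frame at 70 ms, the first two frames are dropped and the target frame is the one at 80 ms.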
In some embodiments, certain factors, such as a slow decoding rate of the decoding module, may result in little video data being decoded, perhaps even only one frame. Performing sound-picture synchronization on a single frame of video data may have little value, so in order to let the user continue watching, the video data may simply be played normally.
The controller may first detect whether the number of decoded video frames is greater than 1.
If it is greater than 1, sound-picture synchronization can proceed normally: the controller may pass the data into the sound-picture synchronization module, so that the decoded video data and audio data undergo synchronization processing. If the number is 1, the video data can be played normally, that frame being the first frame of video data: when the time reaches the first video timestamp, the display is controlled to display the first frame of video data, and when the time reaches the first audio timestamp, the audio device is controlled to play the first frame of audio data.
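The pre-check on the number of decoded frames reduces to a simple predicate; the name is illustrative:

```python
def needs_av_sync(decoded_video_frames):
    """Sketch of the pre-check described above: sound-picture
    synchronization is only performed when more than one video frame has
    been decoded; a single decoded frame is simply played normally at its
    own timestamp.
    """
    return len(decoded_video_frames) > 1
```

One decoded frame means "play normally"; several mean "enter the synchronization module".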
In some embodiments, when it is detected that the first difference satisfies the first sound-picture synchronization condition, it may first be determined whether the number of decoded video frames is greater than 1, so as to decide whether to perform sound-picture synchronization processing.
In some embodiments, the media asset data may also contain subtitle data. Therefore, the subtitle data can likewise be synchronized with the audio data.
In terms of human viewing habits, subtitles are displayed on the uppermost layer, so when subtitles are present a portion of the viewer's attention is allocated to them; synchronizing subtitles with the sound can therefore be regarded as synchronizing part of the video picture with the audio. Moreover, some users may give higher priority to subtitle-audio synchronization than to video-audio synchronization. Processing the subtitle data during sound-picture synchronization can thus reduce, to a certain extent, the user's attention to any video-audio mismatch and alleviate the impact of the picture being out of sync.
Specifically, in the related art, subtitles are synchronized on the basis of a video clock. In this embodiment, the controller may instead determine the system clock based on the first audio timestamp, i.e. take the audio clock as the system clock, and synchronize the decoded subtitle data with the audio data according to this system clock.
The synchronization of the subtitle data then uses the audio clock as its reference, achieving subtitle-audio synchronization and improving the user's viewing experience.
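A minimal sketch of driving subtitle display from the audio clock rather than a video clock; the helper name, the sorted-list representation, and the millisecond units are assumptions for the example:

```python
import bisect

def due_subtitles(subtitle_timestamps, audio_clock_ms):
    """Sketch: with the audio timestamp used as the system clock, every
    decoded subtitle whose timestamp has been reached by the audio clock
    is due for display. `subtitle_timestamps` is assumed sorted ascending.
    """
    # bisect_right finds how many subtitle timestamps the clock has passed
    i = bisect.bisect_right(subtitle_timestamps, audio_clock_ms)
    return subtitle_timestamps[:i]
```

For example, with subtitles at 0, 500 and 1000 ms and the audio clock at 600 ms, the first two subtitles are due.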
An embodiment of the present application further provides a media asset playing method, as shown in fig. 18, the method includes:
Step 1801, acquiring the media asset data.
Step 1802, video data and audio data are obtained based on the media asset data, and the video data and the audio data are decoded.
Step 1803, performing audio-video synchronization processing on the decoded video data and audio data according to the first video timestamp, the second video timestamp and the first audio timestamp, and playing. The first video timestamp is a timestamp of a first frame of video data in the decoded video data, the second video timestamp is a timestamp of a last frame of video data in the decoded video data, and the first audio timestamp is a timestamp of a first frame of audio data in the decoded audio data.
With the above method, sound-picture synchronization is not achieved by simply discarding video data; instead, the playing time of the video data is adjusted according to the data timestamps, for example by double-speed playing or by dropping frames, thereby realizing sound-picture synchronization and improving the user's viewing experience.
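The overall decision described by the embodiments can be sketched as a dispatch on the timestamp comparisons. The concrete threshold tests and strategy names here are assumptions standing in for the first, second and third sound-picture synchronization conditions:

```python
def choose_sync_strategy(first_video_ts, second_video_ts, first_audio_ts):
    """Illustrative dispatch over the synchronization conditions:
    - even the last video frame is earlier than the first audio frame
      -> speed up playback (shorten display durations);
    - only some video frames are earlier than the first audio frame
      -> drop the late frames and start from the target frame;
    - otherwise -> play normally at the scheduled timestamps.
    """
    if second_video_ts < first_audio_ts:   # all decoded video is behind the audio
        return "speed_up"
    if first_video_ts < first_audio_ts:    # video straddles the first audio timestamp
        return "drop_frames"
    return "play_normally"
```

With the FIG. 16 example (V1 = 0, V4 = 120, P1 = 160) this selects speed-up; with the FIG. 17 example (V1 = 0, V4 = 120, P1 = 70) it selects frame dropping.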
The same and similar parts in the embodiments in this specification may be referred to one another, and are not described herein again.
Those skilled in the art will readily appreciate that the techniques of the embodiments of the present invention may be implemented as software plus a required general purpose hardware platform. Based on such understanding, the technical solutions in the embodiments of the present invention may be essentially or partially implemented in the form of software products, which may be stored in a storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and include instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method in the embodiments or some parts of the embodiments of the present invention.
Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.
The foregoing description, for purposes of explanation, has been presented in conjunction with specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the embodiments to the precise forms disclosed above. Many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles and the practical application, to thereby enable others skilled in the art to best utilize the embodiments and various embodiments with various modifications as are suited to the particular use contemplated.

Claims (10)

1. A display device, comprising:
a display;
an audio output interface configured to connect to an audio device;
a controller configured to:
acquiring video data and audio data, and decoding the video data and the audio data;
performing sound-picture synchronization processing on the decoded video data and audio data according to the first video time stamp, the second video time stamp and the first audio time stamp, and playing; the first video timestamp is a timestamp of a first frame of video data in the decoded video data, the second video timestamp is a timestamp of a last frame of video data in the decoded video data, and the first audio timestamp is a timestamp of a first frame of audio data in the decoded audio data.
2. The display device according to claim 1, wherein the controller is further configured to:
in the step of performing sound-picture synchronization processing on the decoded video data and audio data, and performing playback,
obtaining a first difference value of the first video timestamp and the first audio timestamp;
acquiring a second difference value of the second video time stamp and the first audio time stamp based on the fact that the first difference value meets a preset first sound-picture synchronization condition;
detecting whether the second difference value meets the first sound-picture synchronization condition or not;
if yes, performing first sound and picture synchronization processing on the decoded video data and audio data; and if not, performing second sound-picture synchronization processing on the decoded video data and audio data.
3. The display device of claim 2, wherein the controller is further configured to:
after performing the step of obtaining a first difference of the first video timestamp and the first audio timestamp,
based on the fact that the first difference value meets a preset second sound-picture synchronization condition, when the time reaches the first audio time stamp, controlling a display to display the first frame of video data, and controlling an audio device to play the first frame of audio data;
based on the fact that the first difference value meets a preset third audio-video synchronization condition, when the time reaches the first video time stamp, controlling a display to display the first frame of video data, and when the time reaches the first audio time stamp, controlling an audio device to play the first frame of audio data;
and updating the first video time stamp, the second video time stamp and the first audio time stamp, and performing audio-video synchronization processing on the decoded video data and audio data.
4. The display device of claim 2, wherein the controller is further configured to:
in the step of performing the first sound-picture synchronization process on the decoded video data and audio data,
acquiring the average time length for playing a frame of video data, and calculating the ratio of the second difference value to the first difference value;
calculating the product of the ratio and the average duration, and taking the product as the display duration of the first frame of video data;
and controlling a display to display the first frame of video data based on the display duration.
5. The display device according to claim 2, wherein the controller is configured to:
in the step of performing the second sound-picture synchronization process on the decoded video data and audio data,
acquiring target frame video data with a timestamp closest to the first audio timestamp, and deleting all video data before the target frame video data;
when the time reaches the first audio time stamp, controlling a display to display the target frame video data and controlling an audio device to play the first frame audio data;
and updating the first video time stamp, the second video time stamp and the first audio time stamp, and performing audio-video synchronization processing on the decoded video data and audio data.
6. The display device of claim 1, wherein the controller is further configured to:
after performing the step of decoding the video data and the audio data,
judging whether the number of the decoded video data is more than 1;
if yes, executing the step of performing sound and picture synchronization processing on the decoded video data and audio data;
if not, when the time reaches the first video time stamp, controlling a display to display the first frame of video data, and when the time reaches the first audio time stamp, controlling an audio device to play the first frame of audio data.
7. The display device of claim 2, wherein the controller is further configured to:
in performing the step of acquiring video data and audio data,
acquiring media asset data;
carrying out demultiplexing processing on the media asset data to obtain video data and audio data;
storing the audio data in a preset audio cache region, and storing the video data in a preset video cache region;
prior to performing the step of obtaining a second difference of the second video timestamp and the first audio timestamp,
detecting whether the audio data in the audio buffer area reaches an upper storage limit;
if so, expanding the upper storage limit of the audio cache region;
and if not, executing the step of acquiring a second difference value of the second video time stamp and the first audio time stamp.
8. The display device according to claim 2, wherein the controller is configured to:
prior to performing the step of obtaining a second difference of the second video timestamp and the first audio timestamp,
acquiring the current system load rate of the display equipment, and judging whether the system load rate is higher than a preset threshold value or not;
if so, controlling a display to display a load prompt interface; the load prompt interface comprises load prompt information and a load list, wherein the load prompt information is used for representing that the system load rate of the display equipment exceeds a preset threshold value, and the load list comprises a plurality of application programs running in the display equipment;
if not, a step of obtaining a second difference value of the second video time stamp and the first audio time stamp is executed.
9. The display device of claim 1, wherein the controller is further configured to:
acquiring subtitle data and decoding the subtitle data;
and determining a system clock based on the first audio time stamp, and performing synchronous processing on the decoded subtitle data and audio data according to the system clock.
10. A sound and picture synchronization method is applied to display equipment and is characterized by comprising the following steps:
acquiring video data and audio data, and decoding the video data and the audio data;
performing sound-picture synchronization processing on the decoded video data and audio data according to the first video time stamp, the second video time stamp and the first audio time stamp, and playing; the first video timestamp is a timestamp of a first frame of video data in the decoded video data, the second video timestamp is a timestamp of a last frame of video data in the decoded video data, and the first audio timestamp is a timestamp of a first frame of audio data in the decoded audio data.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210412666.2A CN114827679A (en) 2022-04-19 2022-04-19 Display device and sound picture synchronization method


Publications (1)

Publication Number Publication Date
CN114827679A true CN114827679A (en) 2022-07-29

Family

ID=82505434


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115474082A (en) * 2022-10-13 2022-12-13 闪耀现实(无锡)科技有限公司 Method and apparatus for playing media data, system, vehicle, device and medium
CN116261008A (en) * 2022-12-14 2023-06-13 海信视像科技股份有限公司 Audio processing method and audio processing device

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005102192A (en) * 2003-09-02 2005-04-14 Sony Corp Content receiving apparatus, video/audio output timing control method, and content providing system
CN101036389A (en) * 2004-09-02 2007-09-12 索尼株式会社 Content receiving apparatus, video/audio output timing control method, and content providing system
CN110955786A (en) * 2019-11-29 2020-04-03 网易(杭州)网络有限公司 Dance action data generation method and device
CN111355975A (en) * 2020-03-26 2020-06-30 郑州信大捷安信息技术股份有限公司 Live client audio and video delay adjustment playing system and method
CN111641858A (en) * 2020-04-29 2020-09-08 上海推乐信息技术服务有限公司 Audio and video synchronization method and system
CN111918093A (en) * 2020-08-13 2020-11-10 腾讯科技(深圳)有限公司 Live broadcast data processing method and device, computer equipment and storage medium
CN111968202A (en) * 2020-08-21 2020-11-20 北京中科深智科技有限公司 Real-time dance action generation method and system based on music rhythm
CN113225598A (en) * 2021-05-07 2021-08-06 上海一谈网络科技有限公司 Method, device and equipment for synchronizing audio and video of mobile terminal and storage medium




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination