CN113038193B - Method for automatically repairing asynchronous audio and video and display equipment


Info

Publication number
CN113038193B
CN113038193B (application CN202110312267.4A)
Authority
CN
China
Prior art keywords
audio
time
data
video
injection
Prior art date
Legal status
Active
Application number
CN202110312267.4A
Other languages
Chinese (zh)
Other versions
CN113038193A (en)
Inventor
汤小娜
杨依灿
Current Assignee
Vidaa Netherlands International Holdings BV
Vidaa USA Inc
Original Assignee
Vidaa Netherlands International Holdings BV
Vidaa USA Inc
Priority date
Filing date
Publication date
Application filed by Vidaa Netherlands International Holdings BV, Vidaa USA Inc filed Critical Vidaa Netherlands International Holdings BV
Priority to CN202110312267.4A
Publication of CN113038193A
Application granted
Publication of CN113038193B


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/4302Content synchronisation processes, e.g. decoder synchronisation
    • H04N21/4305Synchronising client clock from received content stream, e.g. locking decoder clock with encoder clock, extraction of the PCR packets
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/239Interfacing the upstream path of the transmission network, e.g. prioritizing client content requests
    • H04N21/2393Interfacing the upstream path of the transmission network, e.g. prioritizing client content requests involving handling client requests
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/4302Content synchronisation processes, e.g. decoder synchronisation
    • H04N21/4307Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/4402Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/472End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47202End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for requesting content on demand, e.g. video on demand

Abstract

The application discloses a method for automatically repairing an audio/video out-of-sync state, and a display device, which mitigate audio/video desynchronization and return the audio and video to a synchronized state. The method comprises: receiving an instruction to play audio and video data, and sending an audio data request and a video data request; determining the audio injection data time from the fed-back audio data; determining the video injection data time from the fed-back video data; decoding the audio data and the video data, and determining the audio decoding data time and the video decoding data time; when the audio decoding data time is greater than the video decoding data time, calculating the audio injection limit time from the video decoding data time and the highest water level threshold; and when the audio injection data time is equal to the audio injection limit time, suspending the sending of audio data requests until the difference between the audio injection limit time and the audio decoding data time is smaller than the lowest water level threshold.

Description

Method for automatically repairing asynchronous audio and video and display equipment
Technical Field
The application relates to the technical field of audio and video synchronization, and in particular to a method for automatically repairing an audio/video out-of-sync state and a display device.
Background
In the related art, a user can watch audio and video on a display device, for example, films and TV series. Ideally, the audio and the video are completely synchronized. In the actual audio/video playing process of the display device, however, the audio output is linear while the video output may be nonlinear, so decoding and rendering the audio and the video take different amounts of time; each output frame may therefore drift by a slight gap, and as these gaps accumulate over time the audio/video desynchronization becomes more and more obvious.
Disclosure of Invention
The embodiments of the present application provide a method for automatically repairing an audio/video out-of-sync state, and a display device, which can improve the user experience.
In a first aspect, there is provided a display device including:
a display for displaying a user interface;
a user interface for receiving an input signal;
a controller coupled to the display and the user interface, respectively, for performing:
receiving an instruction for playing audio and video data, and sending an audio data request and a video data request;
receiving the fed-back audio data and video data, and determining the audio injection data time and the video injection data time;
decoding the audio data and the video data, and determining audio decoding data time and video decoding data time;
when the audio decoding data time is greater than the video decoding data time, calculating the audio injection limit time according to the video decoding data time and the highest water level threshold; and when the audio injection data time is equal to the audio injection limit time, suspending the sending of audio data requests until the difference between the audio injection limit time and the audio decoding data time is smaller than the lowest water level threshold.
In some embodiments, the controller is further configured to perform: when the audio decoding data time is not greater than the video decoding data time, the steps of transmitting the audio data request and the video data request are repeatedly performed.
In some embodiments, the controller is further configured to perform: when the audio injection data time is less than the audio injection limit time, the steps of transmitting the audio data request and the video data request are repeatedly performed.
In some embodiments, the audio injection limit time is calculated from the video decoding data time and the highest water level threshold according to the following formula:
audio injection limit time = audio decoding data time + (highest water level threshold - (audio decoding data time - video decoding data time)).
In some embodiments, the highest water level threshold is 2s and the lowest water level threshold is 0.5s.
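To make the control rule above concrete, the following is a minimal C++ sketch of the threshold logic; every identifier is a hypothetical name introduced here for illustration (not taken from the patent's implementation), and all times are media timestamps in seconds.

// Hypothetical names; times are media timestamps in seconds.
constexpr double kHighestWaterLevel = 2.0;  // highest water level threshold (2 s)
constexpr double kLowestWaterLevel  = 0.5;  // lowest water level threshold (0.5 s)

// Audio injection limit time, computed when the audio decoding data time
// exceeds the video decoding data time (the formula above).
double AudioInjectionLimit(double audioDecoded, double videoDecoded) {
    return audioDecoded + (kHighestWaterLevel - (audioDecoded - videoDecoded));
}

// Suspend audio data requests once audio injection reaches the limit ...
bool ShouldSuspendAudioRequests(double audioInjected, double limit) {
    return audioInjected >= limit;
}

// ... and resume once the limit leads the audio decoding data time by less
// than the lowest water level, i.e. the injected-but-undecoded audio has
// nearly drained.
bool ShouldResumeAudioRequests(double audioDecoded, double limit) {
    return limit - audioDecoded < kLowestWaterLevel;
}

Note that the limit formula algebraically reduces to videoDecoded + kHighestWaterLevel: audio may be injected at most one highest-water-level interval ahead of the video decoding progress.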
In a second aspect, a method for automatically repairing an audio/video asynchronous state is provided, including:
receiving an instruction for playing audio and video data, and sending an audio data request and a video data request; determining the time of audio injection data according to the fed-back audio data; determining the time of video injection data according to the fed-back video data;
decoding the audio data and the video data, and determining audio decoding data time and video decoding data time;
when the audio decoding data time is greater than the video decoding data time, calculating the audio injection limit time according to the video decoding data time and the highest water level threshold; and when the audio injection data time is equal to the audio injection limit time, suspending the sending of audio data requests, sending only the video data request, and repeatedly performing the step of determining the video injection data time according to the fed-back video data.
In some embodiments, the method further comprises: when the audio decoding data time is not greater than the video decoding data time, the steps of transmitting the audio data request and the video data request are repeatedly performed.
In some embodiments, the method further comprises: when the audio injection data time is less than the audio injection limit time, the steps of transmitting the audio data request and the video data request are repeatedly performed.
In some embodiments, the audio injection limit time is calculated from the video decoding data time and the highest water level threshold according to the following formula:
audio injection limit time = audio decoding data time + (highest water level threshold - (audio decoding data time - video decoding data time)).
In some embodiments, the highest water level threshold is 2s and the lowest water level threshold is 0.5s.
In the above embodiments, the method and display device for automatically repairing an audio/video out-of-sync state mitigate audio/video desynchronization and return the audio and video to a synchronized state. The method comprises: receiving an instruction to play audio and video data, and sending an audio data request and a video data request; determining the audio injection data time from the fed-back audio data; determining the video injection data time from the fed-back video data; decoding the audio data and the video data, and determining the audio decoding data time and the video decoding data time; when the audio decoding data time is greater than the video decoding data time, calculating the audio injection limit time from the video decoding data time and the highest water level threshold; and when the audio injection data time is equal to the audio injection limit time, suspending the sending of audio data requests until the difference between the audio injection limit time and the audio decoding data time is smaller than the lowest water level threshold.
Drawings
FIG. 1 illustrates a usage scenario of a display device according to some embodiments;
FIG. 2 illustrates a hardware configuration block diagram of the control apparatus 100 according to some embodiments;
FIG. 3 illustrates a hardware configuration block diagram of the display device 200 according to some embodiments;
FIG. 4 illustrates a software configuration diagram in the display device 200 according to some embodiments;
FIG. 5 schematically illustrates a flowchart of a method for automatically repairing an audio/video out-of-sync state according to some embodiments;
FIG. 6 schematically illustrates an audio and video timeline according to some embodiments.
Detailed Description
For the purposes of making the objects and embodiments of the present application more apparent, exemplary embodiments of the present application will be described in detail below with reference to the accompanying drawings in which they are illustrated. It should be apparent that the described exemplary embodiments are only some, not all, of the embodiments of the present application.
It should be noted that the brief description of the terminology in the present application is for the purpose of facilitating understanding of the embodiments described below only and is not intended to limit the embodiments of the present application. Unless otherwise indicated, these terms should be construed in their ordinary and customary meaning.
The terms "first," second, "" third and the like in the description and in the claims and in the above drawings are used for distinguishing between similar or similar objects or entities and not necessarily for describing a particular sequential or chronological order, unless otherwise indicated. It is to be understood that the terms so used are interchangeable under appropriate circumstances.
The terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a product or apparatus that comprises a list of elements is not necessarily limited to all elements explicitly listed, but may include other elements not expressly listed or inherent to such product or apparatus.
The term "module" refers to any known or later developed hardware, software, firmware, artificial intelligence, fuzzy logic, or combination of hardware or/and software code that is capable of performing the function associated with that element.
Fig. 1 is a schematic diagram of an operation scenario between a display device and a control apparatus according to an embodiment. As shown in fig. 1, a user may operate the display device 200 through the smart device 300 or the control apparatus 100.
In some embodiments, the control apparatus 100 may be a remote controller, and the communication between the remote controller and the display device includes infrared protocol communication or bluetooth protocol communication, and other short-range communication modes, and the display device 200 is controlled by a wireless or wired mode. The user may control the display device 200 by inputting user instructions through keys on a remote control, voice input, control panel input, etc.
In some embodiments, a smart device 300 (e.g., mobile terminal, tablet, computer, notebook, etc.) may also be used to control the display device 200. For example, the display device 200 is controlled using an application running on a smart device.
In some embodiments, the display device 200 may also be controlled in ways other than through the control apparatus 100 and the smart device 300. For example, the user's voice commands may be received directly through a module for acquiring voice commands configured inside the display device 200, or through a voice control device configured outside the display device 200.
In some embodiments, the display device 200 is also in data communication with a server 400. The display device 200 may communicate via a local area network (LAN), a wireless local area network (WLAN), or other networks. The server 400 may provide various contents and interactions to the display device 200. The server 400 may be one cluster or multiple clusters, and may include one or more types of servers.
Fig. 2 exemplarily shows a block diagram of a configuration of the control apparatus 100 in accordance with an exemplary embodiment. As shown in fig. 2, the control device 100 includes a controller 110, a communication interface 130, a user input/output interface 140, a memory, and a power supply. The control apparatus 100 may receive an input operation instruction of a user and convert the operation instruction into an instruction recognizable and responsive to the display device 200, and function as an interaction between the user and the display device 200.
Fig. 3 shows a hardware configuration block diagram of the display device 200 in accordance with an exemplary embodiment.
In some embodiments, display apparatus 200 includes at least one of a modem 210, a communicator 220, a detector 230, an external device interface 240, a controller 250, a display 260, an audio output interface 270, memory, a power supply, a user interface.
In some embodiments, the controller includes a processor, a video processor, an audio processor, a graphics processor, RAM, ROM, and first through nth input/output interfaces.
In some embodiments, the display 260 includes a display screen component for presenting pictures and a driving component for driving image display, and is used to receive image signals output by the controller and display video content, image content, and menu manipulation interfaces, as well as the UI interface manipulated by the user.
In some embodiments, the display 260 may be a liquid crystal display, an OLED display, or a projection device with a projection screen.
In some embodiments, the communicator 220 is a component for communicating with external devices or servers according to various communication protocol types. For example, the communicator may include at least one of a Wi-Fi module, a Bluetooth module, a wired Ethernet module, another network communication protocol chip or near-field communication protocol chip, and an infrared receiver. The display device 200 may establish transmission and reception of control signals and data signals with the external control apparatus 100 or the server 400 through the communicator 220.
In some embodiments, the user interface may be configured to receive control signals from the control device 100 (e.g., an infrared remote control, etc.).
In some embodiments, the detector 230 is used to collect signals of the external environment or interaction with the outside. For example, detector 230 includes a light receiver, a sensor for capturing the intensity of ambient light; alternatively, the detector 230 includes an image collector such as a camera, which may be used to collect external environmental scenes, user attributes, or user interaction gestures, or alternatively, the detector 230 includes a sound collector such as a microphone, or the like, which is used to receive external sounds.
In some embodiments, the external device interface 240 may include, but is not limited to, the following: high Definition Multimedia Interface (HDMI), analog or data high definition component input interface (component), composite video input interface (CVBS), USB input interface (USB), RGB port, or the like. The input/output interface may be a composite input/output interface formed by a plurality of interfaces.
In some embodiments, the modem 210 receives broadcast television signals by wired or wireless reception and demodulates audio/video signals, as well as additional data such as EPG data, from the broadcast television signals.
In some embodiments, the controller 250 and the modem 210 may be located in separate devices, i.e., the modem 210 may also be located in an external device to the main device in which the controller 250 is located, such as an external set-top box or the like.
In some embodiments, the controller 250 controls the operation of the display device and responds to user operations through various software control programs stored on the memory. The controller 250 controls the overall operation of the display apparatus 200. For example: in response to receiving a user command to select a UI object to be displayed on the display 260, the controller 250 may perform an operation related to the object selected by the user command.
In some embodiments, the object may be any one of selectable objects, such as a hyperlink, an icon, or other operable control. The operations related to the selected object are: displaying an operation of connecting to a hyperlink page, a document, an image, or the like, or executing an operation of a program corresponding to the icon.
In some embodiments, the controller includes at least one of a central processing unit (CPU), a video processor, an audio processor, a graphics processor (GPU), RAM (random access memory), ROM (read-only memory), first through nth input/output interfaces, and a communication bus (Bus).
The CPU processor executes operating system and application program instructions stored in the memory, and executes various applications, data, and content according to the various interactive instructions received from the outside, so as to finally display and play various audio and video content. The CPU processor may include a plurality of processors, for example one main processor and one or more sub-processors.
In some embodiments, a graphics processor is used to generate various graphical objects, such as icons, operation menus, and graphics displayed in response to user input instructions. The graphics processor includes an arithmetic unit, which receives the various interactive instructions input by the user, performs operations, and displays various objects according to their display attributes, and a renderer, which renders the objects produced by the arithmetic unit so that they can be displayed on the display.
In some embodiments, the video processor is configured to receive an external video signal and perform video processing such as decompression, decoding, scaling, noise reduction, frame rate conversion, resolution conversion, and image composition according to the standard codec protocol of the input signal, so as to obtain a signal that can be directly displayed or played on the display device 200.
In some embodiments, the video processor includes a demultiplexing module, a video decoding module, an image synthesis module, a frame rate conversion module, a display formatting module, and the like. The demultiplexing module demultiplexes the input audio/video data stream. The video decoding module processes the demultiplexed video signal, including decoding and scaling. The image synthesis module, such as an image synthesizer, superimposes the GUI signal input by the user or generated by the graphic generator onto the scaled video image, so as to generate an image signal for display. The frame rate conversion module converts the frame rate of the input video. The display formatting module changes the frame-rate-converted video output signal to conform to the display format, for example outputting RGB data signals.
In some embodiments, the audio processor is configured to receive an external audio signal, decompress and decode the audio signal according to a standard codec protocol of an input signal, and perform noise reduction, digital-to-analog conversion, and amplification processing to obtain a sound signal that can be played in a speaker.
In some embodiments, a user may input a user command through a Graphical User Interface (GUI) displayed on the display 260, and the user input interface receives the user input command through the Graphical User Interface (GUI). Alternatively, the user may input the user command by inputting a specific sound or gesture, and the user input interface recognizes the sound or gesture through the sensor to receive the user input command.
In some embodiments, a "user interface" is a media interface for interaction and exchange of information between an application or operating system and a user that enables conversion between an internal form of information and a form acceptable to the user. A commonly used presentation form of the user interface is a graphical user interface (Graphic User Interface, GUI), which refers to a user interface related to computer operations that is displayed in a graphical manner. It may be an interface element such as an icon, a window, a control, etc. displayed in a display screen of the electronic device, where the control may include a visual interface element such as an icon, a button, a menu, a tab, a text box, a dialog box, a status bar, a navigation bar, a Widget, etc.
As shown in fig. 4, a system of the display device may include a kernel (Kernel), a command parser (shell), a file system, and application programs. The kernel, shell, and file system together form the basic operating system architecture that allows users to manage files, run programs, and use the system. After power-up, the kernel is started, the kernel space is activated, hardware is abstracted, hardware parameters are initialized, and virtual memory, the scheduler, signals, and inter-process communication (IPC) are operated and maintained. After the kernel is started, the shell and user application programs are loaded. An application program is compiled into machine code after being started, forming a process.
As shown in fig. 4, the system of the display device is divided into three layers, an application layer, a middleware layer, and a hardware layer, from top to bottom.
The application layer mainly comprises the common applications on the television and an application framework (Application Framework), where the common applications are mainly browser-based applications, such as HTML5 apps, and native applications (Native APPs);
The application framework (Application Framework) is a complete program model with the basic functions required by standard application software, such as file access, data exchange …, and the interfaces for using these functions (toolbar, status bar, menu, dialog box).
Native applications (Native APPs) may support online or offline use, message pushing, and local resource access.
The middleware layer includes middleware such as various television protocols, multimedia protocols, and system components. The middleware can use basic services (functions) provided by the system software to connect various parts of the application system or different applications on the network, so that the purposes of resource sharing and function sharing can be achieved.
The hardware layer mainly comprises the HAL interface, the hardware, and the drivers. The HAL interface is a unified interface against which all television chips dock, with the specific logic implemented by each chip. The drivers mainly include: audio driver, display driver, Bluetooth driver, camera driver, Wi-Fi driver, USB driver, HDMI driver, sensor drivers (e.g., fingerprint sensor, temperature sensor, pressure sensor), power-supply driver, and the like.
In the related art, a user can watch audio and video on a display device, for example, films and TV series. Ideally, the audio and the video are completely synchronized. In the actual audio/video playing process of the display device, however, the audio output is linear while the video output may be nonlinear, so decoding and rendering the audio and the video take different amounts of time; each output frame may therefore drift by a slight gap, and as these gaps accumulate over time the audio/video desynchronization becomes more and more obvious.
To solve the above technical problem, an embodiment of the present application provides a method for automatically repairing an audio/video out-of-sync state. As shown in fig. 5, the method includes:
s100, receiving an instruction for playing audio and video data, and sending an audio data request and a video data request. In the embodiment of the application, the instruction of playing the audio and video data can be completed by pressing the confirmation key on the control device by the user, and the selection and communication control is displayed on the display interface by way of example, the user moves the selector to the selection and communication control by the control device and presses the confirmation key on the control device, so that the instruction of playing the audio and video data is generated.
The display device may be equipped with a video application, for example the YouTube video application, and may play video through the YouTube video application. The YouTube video application needs a browser as its carrier; for example, the browser may be the Cobalt browser. In the embodiment of the application, the playing architecture in the display device system comprises a browser layer, a middle layer, and a player layer. The browser layer is the source of the audio and video data. The middle layer processes the audio and video data, including sending audio and video data requests to the browser layer, injecting data into the player layer, and so on. The player layer decrypts, decodes, synchronizes, and plays the audio and video data.
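As a hedged illustration of this division of labor, the following C++ interface sketch names the three layers; all type and method names are assumptions made here for illustration, not an actual browser or player API.

// Illustrative sketch of the three-layer playing architecture; the names are
// hypothetical, not taken from a real browser or player implementation.
struct MediaChunk {
    double timestamp;          // media time of this chunk, in seconds
    // encoded payload omitted
};

struct BrowserLayer {          // source of the audio and video data
    virtual MediaChunk OnAudioRequest() = 0;   // answers an audio data request
    virtual MediaChunk OnVideoRequest() = 0;   // answers a video data request
    virtual ~BrowserLayer() = default;
};

struct PlayerLayer {           // decrypts, decodes, synchronizes and plays
    virtual void InjectAudio(const MediaChunk& chunk) = 0;
    virtual void InjectVideo(const MediaChunk& chunk) = 0;
    virtual double AudioDecodedTime() const = 0;  // audio decoding data time
    virtual double VideoDecodedTime() const = 0;  // video decoding data time
    virtual ~PlayerLayer() = default;
};

// The middle layer sits between the two: it sends data requests upstream to
// the browser layer and injects the returned chunks downstream into the
// player layer, applying the water-level control described below.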
In the embodiment of the application, upon receiving the instruction to play audio and video data, the middle layer sends an audio data request and a video data request to the browser layer. Ideally, the audio and video remain synchronized, and the audio data request and the video data request can always be sent synchronously. In actual playback, however, decoding and rendering the video and the audio take different amounts of time, and eventually the audio and video play out of sync, affecting the user's viewing. Therefore, the embodiment of the application mitigates this desynchronization by controlling how the audio data request and the video data request are sent.
S200, receiving the fed-back audio data and video data, and determining the audio injection data time and the video injection data time. In the embodiment of the application, after sending the audio data request and the video data request, the middle layer receives the fed-back audio data and video data and determines the audio injection data time and the video injection data time.
The audio injection data time refers to the time corresponding to the received audio data. The audio data is sent from the browser layer to the middle layer, and the middle layer can parse out the time corresponding to the audio data. Illustratively, when the first episode of a series is being played and the audio data sent to the middle layer this time is the audio data at the 12 s mark of that episode, the audio injection data time is 12 s. Similarly, the video injection data time can be obtained by the middle layer parsing the video data.
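A minimal sketch of how such bookkeeping might look in the middle layer, assuming a hypothetical per-chunk callback (all names are invented for illustration):

// Hypothetical middle-layer bookkeeping: the injection data time is simply
// the media timestamp parsed from the most recently received chunk.
double g_audioInjectionTime = 0.0;  // seconds; e.g. 12.0 after the 12 s chunk
double g_videoInjectionTime = 0.0;  // seconds

void OnAudioDataReceived(double parsedTimestamp) {
    g_audioInjectionTime = parsedTimestamp;
}

void OnVideoDataReceived(double parsedTimestamp) {
    g_videoInjectionTime = parsedTimestamp;
}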
S300, decoding the audio data and the video data, and determining the audio decoding data time and the video decoding data time. In some embodiments, the audio data and the video data are decoded by the player layer; the timestamp corresponding to the currently decoded audio data is taken as the audio decoding data time, and the timestamp corresponding to the currently decoded video data is taken as the video decoding data time. It should be explained that, since the audio and video data sent by the browser layer to the middle layer need to be decoded piece by piece, the audio decoding data time and the audio injection data time are not the same. For example, the audio injection data time may be 12 s while the audio is not yet completely decoded, so the audio decoding data time may be 10 s. Similarly, the video decoding data time and the video injection data time differ.
In the embodiment of the application, the audio data and the video data are played after being decoded, so the audio decoding data time can be understood as the current audio playing progress, and the video decoding data time as the current video playing progress. In the embodiment of the application, the video output is nonlinear and the audio output is linear, so video decoding is slow and audio decoding is relatively fast; the video playing progress is therefore taken as the content shown on the progress bar displayed on the display.
S400, comparing the audio decoding data time with the video decoding data time.
S500, when the audio decoding data time is greater than the video decoding data time, calculating the audio injection limit time according to the video decoding data time and the highest water level threshold. In the embodiment of the application, when the audio decoding data time is greater than the video decoding data time, the audio and video are playing out of sync: audio decoding is fast while video decoding is slow, and without corresponding processing the desynchronization would worsen. The embodiment of the application therefore uses the audio injection limit time to control the sending of audio data requests.
To keep the audio/video desynchronization from worsening, the embodiment of the application limits the audio injection data time and reduces the difference between the audio injection data time and the video injection data time, thereby repairing the out-of-sync state.
In some embodiments, as shown in fig. 6, fig. 6 shows a timeline for the video and a timeline for the audio. In fig. 6, the video decoding data time is smaller than the audio decoding data time, so the audio and video are out of sync. Even accounting for the highest water level threshold, the audio injection data time would still run far ahead of the video injection data time if left unlimited, which would prevent the out-of-sync state from improving. The embodiment of the application therefore sets the audio injection limit time and caps the audio injection data time at it, so that the difference between the video injection data time and the audio injection data time is reduced and the out-of-sync state is improved.
The audio injection limit time is calculated from the video decoding data time and the highest water level threshold according to the following formula: audio injection limit time = audio decoding data time + (highest water level threshold - (audio decoding data time - video decoding data time)). Note that this simplifies to the video decoding data time plus the highest water level threshold. In the embodiment of the present application, the highest water level threshold is the maximum allowed time difference between the audio injection data time and the audio decoding data time, and likewise between the video injection data time and the video decoding data time. In some embodiments, the highest water level threshold may be 2 s and the lowest water level threshold may be 0.5 s. Illustratively, when the video decoding data time is 10 s, the audio injection limit time is 12 s.
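Written out in LaTeX as a worked check, using the document's own 10 s and 2 s example values (here t_{a,dec}, t_{v,dec}, and T_high are shorthand introduced for the audio decoding data time, the video decoding data time, and the highest water level threshold):

\begin{aligned}
t_{\mathrm{limit}} &= t_{a,\mathrm{dec}} + \bigl(T_{\mathrm{high}} - (t_{a,\mathrm{dec}} - t_{v,\mathrm{dec}})\bigr) \\
                   &= t_{v,\mathrm{dec}} + T_{\mathrm{high}} \\
                   &= 10\,\mathrm{s} + 2\,\mathrm{s} = 12\,\mathrm{s}.
\end{aligned}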
S500, comparing the audio injection data time with the audio injection limit time.
S600, when the audio injection data time is equal to the audio injection limit time, suspending the sending of audio data requests until the difference between the audio injection limit time and the audio decoding data time is smaller than the lowest water level threshold. In the embodiment of the application, when the audio injection data time reaches the audio injection limit time, no more audio data requests are sent; continuing to send them would only worsen the audio/video desynchronization. Therefore, the embodiment of the application sends only the video data request, so that the difference between the video injection data time and the audio injection data time keeps shrinking, and the player synchronizes the audio and video after decoding them, thereby repairing the out-of-sync state.
In some embodiments, the method further comprises: S700, when the audio decoding data time is not greater than the video decoding data time, repeating the steps of sending the audio data request and the video data request. In the embodiment of the application, when the audio decoding data time is not greater than the video decoding data time, the audio playback and the video playback are still out of sync, but video decoding is slow while audio decoding is fast, and the application takes the video decoding data time as the playback progress; the data requests therefore need not be limited, and after an appropriate time the audio playing progress catches up with the video playing progress by itself, so it suffices to repeat the steps of sending the audio data request and the video data request.
In some embodiments, the method further comprises: S800, when the audio injection data time is less than the audio injection limit time, repeating the steps of sending the audio data request and the video data request. In the embodiment of the application, an audio injection data time smaller than the audio injection limit time means that the current audio injection data time is acceptable, so the audio data request need not be limited.
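Putting S100-S800 together, the following is a self-contained, compilable C++ sketch of the whole control loop driven by a toy decoder model; all names, rates, and the simulation itself are illustrative assumptions, with only the 2 s and 0.5 s water levels and the limit formula taken from the document.

#include <algorithm>
#include <cstdio>

int main() {
    const double kHighest = 2.0, kLowest = 0.5;  // water level thresholds (s)
    const double kChunk = 0.1;                   // media seconds per data request
    double aInj = 0, vInj = 0;                   // injection data times
    double aDec = 0, vDec = 0;                   // decoding data times
    double limit = 0;                            // audio injection limit time
    bool suspended = false;                      // audio requests suspended?

    for (int tick = 0; tick < 400; ++tick) {
        // S100/S200: send data requests; while suspended (S600), only the
        // video data request is sent.
        if (!suspended) aInj += kChunk;
        vInj += kChunk;

        // S300: decoding progresses; video decodes slower than audio here to
        // mimic linear audio output vs. slower, nonlinear video output.
        aDec = std::min(aInj, aDec + 0.10);
        vDec = std::min(vInj, vDec + 0.08);

        if (!suspended) {
            if (aDec > vDec) {                   // S400: audio runs ahead
                // S500: audio injection limit time per the formula above.
                limit = aDec + (kHighest - (aDec - vDec));
                if (aInj >= limit)               // S600: pause audio requests
                    suspended = true;
                // else S800: aInj < limit, keep sending both requests.
            }
            // else S700: audio not ahead, keep sending both requests.
        } else if (limit - aDec < kLowest) {
            suspended = false;                   // resume below lowest level
        }
    }
    // With the throttle, the decode gap stays bounded near kHighest instead
    // of growing by 0.02 s per tick (8 s over this run) without it.
    std::printf("audio-video decode gap: %+.2f s\n", aDec - vDec);
    return 0;
}

In this toy model audio decoding keeps pace with injection, so each suspension lasts only until the next check; with a real decoder buffer, the suspension would persist until the injected-but-undecoded audio drains below the lowest water level, and the player layer performs the final audio/video synchronization after decoding, as described above.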
In the above embodiments, the method and display device for automatically repairing an audio/video out-of-sync state mitigate audio/video desynchronization and return the audio and video to a synchronized state. The method comprises: receiving an instruction to play audio and video data, and sending an audio data request and a video data request; determining the audio injection data time from the fed-back audio data; determining the video injection data time from the fed-back video data; decoding the audio data and the video data, and determining the audio decoding data time and the video decoding data time; when the audio decoding data time is greater than the video decoding data time, calculating the audio injection limit time from the video decoding data time and the highest water level threshold; and when the audio injection data time is equal to the audio injection limit time, suspending the sending of audio data requests until the difference between the audio injection limit time and the audio decoding data time is smaller than the lowest water level threshold.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solution of the present application, not to limit it. Although the application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced by equivalents, and such modifications and substitutions do not depart from the spirit of the application.
The foregoing description, for purposes of explanation, has been presented in conjunction with specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the embodiments to the precise forms disclosed above. Many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles and the practical application, to thereby enable others skilled in the art to best utilize the embodiments and various embodiments with various modifications as are suited to the particular use contemplated.

Claims (8)

1. A display device, characterized by comprising:
a display for displaying a user interface;
a user interface for receiving an input signal;
a controller coupled to the display and the user interface, respectively, for performing:
receiving an instruction for playing audio and video data, and sending an audio data request and a video data request;
receiving the fed-back audio data and video data, and determining an audio injection data time and a video injection data time, wherein the audio injection data time refers to the time corresponding to the received audio data, and the video injection data time refers to the time corresponding to the received video data;
decoding the audio data and the video data, and determining an audio decoding data time and a video decoding data time, wherein the audio decoding data time refers to the timestamp corresponding to the decoded audio data, and the video decoding data time refers to the timestamp corresponding to the decoded video data;
when the audio decoding data time is greater than the video decoding data time, calculating an audio injection limit time according to the video decoding data time and a highest water level threshold, wherein the highest water level threshold is a preset highest time difference between the audio injection data time and the audio decoding data time, and the audio injection limit time is calculated by the following formula: audio injection limit time = audio decoding data time + (highest water level threshold - (audio decoding data time - video decoding data time));
when the audio injection data time is equal to the audio injection limit time, suspending the sending of audio data requests until the difference between the audio injection limit time and the audio decoding data time is smaller than a lowest water level threshold, wherein the lowest water level threshold is a preset minimum acceptable time difference between the audio injection data time and the audio decoding data time.
2. The display device of claim 1, wherein the controller is further configured to perform: when the audio decoding data time is not greater than the video decoding data time, the steps of transmitting the audio data request and the video data request are repeatedly performed.
3. The display device of claim 1, wherein the controller is further configured to perform: when the audio injection data time is less than the audio injection limit time, the steps of transmitting the audio data request and the video data request are repeatedly performed.
4. The display device of claim 1, wherein the highest water level threshold is 2s and the lowest water level threshold is 0.5s.
5. A method for automatically repairing an audio/video out-of-sync state, comprising:
receiving an instruction for playing audio and video data, and sending an audio data request and a video data request; determining an audio injection data time according to the fed-back audio data; and determining a video injection data time according to the fed-back video data, wherein the audio injection data time refers to the time corresponding to the received audio data, and the video injection data time refers to the time corresponding to the received video data;
decoding the audio data and the video data, and determining an audio decoding data time and a video decoding data time, wherein the audio decoding data time refers to the timestamp corresponding to the decoded audio data, and the video decoding data time refers to the timestamp corresponding to the decoded video data;
when the audio decoding data time is greater than the video decoding data time, calculating an audio injection limit time according to the video decoding data time and a highest water level threshold, wherein the highest water level threshold is a preset highest time difference between the audio injection data time and the audio decoding data time, and the audio injection limit time is calculated by the following formula: audio injection limit time = audio decoding data time + (highest water level threshold - (audio decoding data time - video decoding data time));
when the audio injection data time is equal to the audio injection limit time, suspending the sending of audio data requests until the difference between the audio injection limit time and the audio decoding data time is smaller than a lowest water level threshold, wherein the lowest water level threshold is a preset minimum acceptable time difference between the audio injection data time and the audio decoding data time.
6. The method of claim 5, wherein the method further comprises: when the audio decoding data time is not greater than the video decoding data time, the steps of transmitting the audio data request and the video data request are repeatedly performed.
7. The method of claim 5, wherein the method further comprises: when the audio injection data time is less than the audio injection limit time, the steps of transmitting the audio data request and the video data request are repeatedly performed.
8. The method of claim 5, wherein the highest water level threshold is 2s and the lowest water level threshold is 0.5s.
CN202110312267.4A 2021-03-24 2021-03-24 Method for automatically repairing asynchronous audio and video and display equipment Active CN113038193B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110312267.4A CN113038193B (en) 2021-03-24 2021-03-24 Method for automatically repairing asynchronous audio and video and display equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110312267.4A CN113038193B (en) 2021-03-24 2021-03-24 Method for automatically repairing asynchronous audio and video and display equipment

Publications (2)

Publication Number Publication Date
CN113038193A CN113038193A (en) 2021-06-25
CN113038193B true CN113038193B (en) 2023-08-11

Family

ID=76473130

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110312267.4A Active CN113038193B (en) 2021-03-24 2021-03-24 Method for automatically repairing asynchronous audio and video and display equipment

Country Status (1)

Country Link
CN (1) CN113038193B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102868939A (en) * 2012-09-10 2013-01-09 杭州电子科技大学 Method for synchronizing audio/video data in real-time video monitoring system
CN104902316A (en) * 2015-05-14 2015-09-09 广东欧珀移动通信有限公司 Method and device for synchronous playing of time, intelligent sound box, and mobile terminal
CN105979347A (en) * 2015-12-03 2016-09-28 乐视致新电子科技(天津)有限公司 Video play method and device
WO2020024950A1 (en) * 2018-08-01 2020-02-06 北京微播视界科技有限公司 Video recording method and device
WO2020155964A1 (en) * 2019-01-30 2020-08-06 上海哔哩哔哩科技有限公司 Audio/video switching method and apparatus, and computer device and readable storage medium
CN111601135A (en) * 2020-05-09 2020-08-28 青岛海信传媒网络技术有限公司 Method for synchronously injecting audio and video elementary streams and display equipment
CN112153447A (en) * 2020-09-27 2020-12-29 海信视像科技股份有限公司 Display device and sound and picture synchronous control method
CN112153446A (en) * 2020-09-27 2020-12-29 海信视像科技股份有限公司 Display equipment and streaming media video audio-video synchronization method
CN112533056A (en) * 2019-09-17 2021-03-19 海信视像科技股份有限公司 Display device and sound reproduction method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011066571A (en) * 2009-09-16 2011-03-31 Toshiba Corp Video-audio playback apparatus
US11288033B2 (en) * 2019-04-09 2022-03-29 Hisense Visual Technology Co., Ltd. Method for outputting audio data of applications and display device

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102868939A (en) * 2012-09-10 2013-01-09 杭州电子科技大学 Method for synchronizing audio/video data in real-time video monitoring system
CN104902316A (en) * 2015-05-14 2015-09-09 广东欧珀移动通信有限公司 Method and device for synchronous playing of time, intelligent sound box, and mobile terminal
CN105979347A (en) * 2015-12-03 2016-09-28 乐视致新电子科技(天津)有限公司 Video play method and device
WO2020024950A1 (en) * 2018-08-01 2020-02-06 北京微播视界科技有限公司 Video recording method and device
WO2020155964A1 (en) * 2019-01-30 2020-08-06 上海哔哩哔哩科技有限公司 Audio/video switching method and apparatus, and computer device and readable storage medium
CN112533056A (en) * 2019-09-17 2021-03-19 海信视像科技股份有限公司 Display device and sound reproduction method
CN111601135A (en) * 2020-05-09 2020-08-28 青岛海信传媒网络技术有限公司 Method for synchronously injecting audio and video elementary streams and display equipment
CN112153447A (en) * 2020-09-27 2020-12-29 海信视像科技股份有限公司 Display device and sound and picture synchronous control method
CN112153446A (en) * 2020-09-27 2020-12-29 海信视像科技股份有限公司 Display equipment and streaming media video audio-video synchronization method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on Client-based Streaming Media Synchronization Control in Web-based Teaching; Gao Yali et al.; Journal of Hunan Industry Polytechnic; 2010-12-28 (No. 06); full text *

Also Published As

Publication number Publication date
CN113038193A (en) 2021-06-25

Similar Documents

Publication Publication Date Title
CN113630649B (en) Display equipment and video playing progress adjusting method
CN112672195A (en) Remote controller key setting method and display equipment
CN112653906B (en) Video hot spot playing method on display equipment and display equipment
CN112887778A (en) Switching method of video resource playing modes on display equipment and display equipment
CN113163258A (en) Channel switching method and display device
CN113490024A (en) Control device key setting method and display equipment
CN112799576A (en) Virtual mouse moving method and display device
CN113111214A (en) Display method and display equipment for playing records
CN112911371B (en) Dual-channel video resource playing method and display equipment
CN113014977B (en) Display device and volume display method
CN113038193B (en) Method for automatically repairing asynchronous audio and video and display equipment
CN113784203A (en) Display device and channel switching method
CN112732396A (en) Media asset data display method and display device
CN113490030A (en) Display device and channel information display method
CN113573112A (en) Display device and remote controller
CN112882780A (en) Setting page display method and display device
CN113064691A (en) Display method and display equipment for starting user interface
CN113490041B (en) Voice function switching method and display device
CN113784222B (en) Interaction method of application and digital television program and display equipment
CN113490013B (en) Server and data request method
CN112883302B (en) Method for displaying page corresponding to hyperlink address and display equipment
CN113676782B (en) Display equipment and interaction method for coexisting multiple applications
CN113766164B (en) Display equipment and signal source interface display method
CN113190202B (en) Data display method and display equipment
CN113038221B (en) Double-channel video playing method and display equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant