CN115243089B - Audio and video synchronous rendering method and device and electronic equipment

Audio and video synchronous rendering method and device and electronic equipment

Info

Publication number
CN115243089B
CN115243089B
Authority
CN
China
Prior art keywords
audio
video
time stamp
player
played
Prior art date
Legal status
Active
Application number
CN202210912502.6A
Other languages
Chinese (zh)
Other versions
CN115243089A (en)
Inventor
徐洋
陈金
张平
齐铁鹏
蔡熙
Current Assignee
Tianyi Cloud Technology Co Ltd
Original Assignee
Tianyi Cloud Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Tianyi Cloud Technology Co Ltd
Priority to CN202210912502.6A
Publication of CN115243089A
Application granted
Publication of CN115243089B
Status: Active


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/4302Content synchronisation processes, e.g. decoder synchronisation
    • H04N21/4307Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
    • H04N21/43072Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen of multiple content streams on the same device
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439Processing of audio elementary streams
    • H04N21/4398Processing of audio elementary streams involving reformatting operations of audio signals
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440218Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by transcoding between formats or standards, e.g. from MPEG-2 to MPEG-4
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85Assembly of content; Generation of multimedia applications
    • H04N21/854Content authoring
    • H04N21/8547Content authoring involving timestamps for synchronizing content

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Signal Processing For Digital Recording And Reproducing (AREA)

Abstract

The application discloses an audio and video synchronous rendering method and device and electronic equipment, wherein the method comprises the following steps: decoding a video time stamp and video image data from the video data to be decoded through a decoding sub-thread of the player, and performing format conversion on the audio data to be played through an encapsulation sub-thread of the player; playing, by the main thread of the player, the format-converted audio data using a playing component; monitoring, by the main thread of the player, the audio time stamp of the currently played audio data through a timeupdate event; and rendering, by the main thread of the player, the video image data according to the video time stamp and the audio time stamp. The smoothness of synchronous audio and video rendering is improved, playback is smoother, and the stutter rate is reduced.

Description

Audio and video synchronous rendering method and device and electronic equipment
Technical Field
The invention relates to the technical field of video processing, in particular to an audio and video synchronous rendering method and device and electronic equipment.
Background
High Efficiency Video Coding (High Efficiency Video Coding, HEVC), also known as H.265, is the successor to the ITU-T H.264/MPEG-4 AVC standard. Compared with H.264, HEVC achieves a higher compression ratio, which means that at the same code rate the image quality of H.265 is clearer, and the higher compression ratio allows lower storage and transmission costs. The code rate is also called the bit rate; the higher the bit rate, the more data is transmitted per second, and the clearer the image quality.
Currently, most existing Web-side HEVC players on the market adopt a video-to-audio alignment mode and perform synchronous rendering and playback with audio as the reference time axis. How to render video synchronously and accurately while the audio plays normally therefore becomes a problem to be solved.
Disclosure of Invention
The purpose of the application is to provide an audio and video synchronous rendering method and device and electronic equipment, which are used for solving the problem of rendering video synchronously and without stuttering while the audio plays normally.
In a first aspect, an embodiment of the present application provides an audio and video synchronous rendering method, where the method includes:
decoding the video time stamp and the video image data from the video data to be decoded through a decoding sub-thread of the player, and monitoring the audio time stamp of the currently played audio data through a main thread of the player;
and rendering the video image data according to the video time stamp and the audio time stamp through a main thread of the player.
In some possible embodiments, the method further comprises: performing format conversion on the audio data to be played through an encapsulation sub-thread of the player;
the main thread of the player plays the audio data to be played after format conversion by using a playing component;
monitoring, by a main thread of the player, an audio timestamp of currently played audio data, including:
the main thread of the player monitors the audio time stamp of the currently played audio data through a time update event.
In some possible embodiments, rendering, by a main thread of a player, the video image data according to the video timestamp and the audio timestamp, includes:
and directly rendering the video image data when the main thread of the player determines that the audio time stamp of the current audio data to be played is the same as the audio time stamp of the audio data played last time.
In some possible embodiments, rendering, by a main thread of a player, the video image data according to the video timestamp and the audio timestamp, includes:
determining, by a main thread of a player, that an audio time stamp of current audio data to be played is different from an audio time stamp of last played audio data;
and if the video time stamp is smaller than or equal to the audio time stamp, rendering the video image data.
In some possible embodiments, rendering, by a main thread of a player, the video image data according to the video timestamp and the audio timestamp, includes:
determining, by a main thread of the player, that an audio time stamp of the audio data to be played currently is different from an audio time stamp of the audio data played last,
and if the video time stamp is larger than the audio time stamp, delaying the rendering of the video image data, and when the video time stamp is smaller than or equal to the audio time stamp currently acquired by the sub-thread, rendering the video image data.
In a second aspect, an embodiment of the present application provides an audio and video synchronous rendering device, where the device includes:
the decoding module is used for decoding the video time stamp and the video image data from the video data to be decoded through a decoding sub-thread of the player, and monitoring the audio time stamp of the audio data currently played through a main thread of the player;
and the rendering module is used for rendering the video image data according to the video time stamp and the audio time stamp through a main thread of the player.
In a third aspect, embodiments of the present application provide an electronic device comprising at least one processor and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the audio and video synchronous rendering method provided in the first aspect.
In a fourth aspect, embodiments of the present application provide a computer storage medium storing a computer program for causing a computer to execute the audio/video synchronous rendering method provided in the first aspect.
In order to solve the problem of accurately rendering video synchronously while the audio plays normally, the embodiments of the application exploit the performance advantage of the audio timeupdate event callback to replace the conventionally used setTimeout timed-polling approach for audio and video synchronous rendering. The advantage of the audio timeupdate event callback is that its triggering frequency is guaranteed to be 4 to 66 times per second, and the audio time stamp is acquired whenever it is updated, so the progression of the audio time stamp is more uniform and fine-grained, the audio time stamp and the video time stamp are obtained more robustly, and synchronization is performed as fast as possible without being affected by whether the main thread of the player is idle. Video frames therefore do not accumulate, and the stuttering caused by video lagging behind the audio in a live broadcast scene is avoided.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the application. The objectives and other advantages of the application will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments of the present application will be briefly described below, and it is obvious that the drawings that are described below are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flow chart of an audio/video synchronous rendering method according to an embodiment of the present application;
FIG. 2 is a relationship diagram of the decoding sub-thread, the encapsulation sub-thread, and the player main thread in an audio/video synchronous rendering method according to an embodiment of the present application;
FIG. 3 is a detailed flow chart of an audio/video synchronous rendering method according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of an audio/video synchronous rendering device according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings. In the description of the embodiments of the present application, unless otherwise indicated, "/" means "or"; for example, A/B may represent A or B. The text "and/or" merely describes an association relation between associated objects and indicates that three relations may exist; for example, A and/or B may indicate the three cases where A exists alone, A and B exist together, or B exists alone. In addition, in the description of the embodiments of the present application, "plural" means two or more.
In the description of the embodiments of the present application, unless otherwise indicated, the term "plurality" refers to two or more. The preferred embodiments described herein are only for illustrating and explaining the present application and are not intended to limit it; the embodiments of the present application and the features in the embodiments may be combined with each other without conflict.
In order to further explain the technical solutions provided in the embodiments of the present application, details are described below with reference to the accompanying drawings and the specific embodiments. Although the embodiments of the present application provide the method operation steps shown in the following embodiments or figures, the methods may include more or fewer steps based on routine or non-inventive labor. For steps with no logically necessary causal relationship, the execution order is not limited to that provided by the embodiments of the present application. When executed in an actual process or by a control device, the methods may be performed sequentially or in parallel according to the order shown in the embodiments or drawings.
In view of the problem in the related art of how to accurately render video synchronously while the audio plays normally, the application provides an audio and video synchronous rendering method and device and electronic equipment, which do not cause video frame accumulation and avoid the stuttering caused by video lagging behind the audio in a live broadcast scene.
The following describes the audio and video synchronous rendering method in the embodiments of the application in detail with reference to the accompanying drawings. The method is suitable for audio and video synchronous rendering in Web-side players for coding standards such as HEVC, Versatile Video Coding (VVC, also called H.266), the Audio Video coding Standard (AVS), the 2nd-generation Audio Video coding Standard (AVS2), and the 3rd-generation Audio Video coding Standard (AVS3).
The existing Web-side audio and video synchronous rendering technology uses a javascript timer. A javascript timer generally takes two parameters: the first is the function expected to be executed after the expiration time (the delay in milliseconds); the second is the number of milliseconds of delay (one second equals 1000 milliseconds), after which the function is invoked. The timer executes a function or a specified piece of code when it expires. Whatever delay is set, the actual delay may be longer than the expected value, for a variety of reasons. As a result, the synchronization mechanism may only be triggered after a long interval (for example, more than 1 s; generally, two video images separated by more than 200 ms are regarded as a stutter), so the playback picture freezes. A large number of decoded video pictures pile up, but because the audio keeps advancing, a large amount of expired video image data has to be discarded to keep the audio and video synchronized; the stutter rate thus gradually increases and the user's viewing experience suffers.
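For illustration only, a minimal TypeScript sketch of this conventional timer-driven loop might look as follows; the frame queue, the renderFrame helper, and the DecodedFrame shape are names assumed for this sketch, not code from the patent:

// Hypothetical sketch of the conventional setTimeout-based polling sync.
interface DecodedFrame { pts: number; yuv: Uint8Array; }

const frameQueue: DecodedFrame[] = [];           // filled by the decoder
const audio = document.querySelector('audio')!;  // the playing component

function renderFrame(frame: DecodedFrame): void {
  // draw the YUV data, e.g. by uploading it to a WebGL texture (omitted)
}

function syncBySetTimeout(): void {
  const frame = frameQueue[0];                   // head-of-queue video frame
  if (frame && frame.pts <= audio.currentTime) {
    frameQueue.shift();
    renderFrame(frame);
  }
  // Re-arm the timer. The browser only guarantees a *minimum* delay:
  // under load the callback can fire far later than 10 ms, so decoded
  // frames accumulate and playback stutters.
  setTimeout(syncBySetTimeout, 10);
}
syncBySetTimeout();

This is exactly the failure mode the embodiments below replace: the loop's cadence depends on the timer being serviced promptly by the main thread.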
Fig. 1 shows a flowchart of an audio and video synchronous rendering method according to an embodiment of the present application, including:
step 101: the method comprises the steps of decoding a video time stamp and video image data from video data to be decoded through a decoding sub-thread of a player, and monitoring an audio time stamp of audio data played currently through a main thread of the player.
The player is provided with a main thread and a plurality of sub-threads. The main thread and the sub-threads are established when the player is initialized; the sub-threads serve the main thread, the data processed by the sub-threads are sent to the main thread, and audio and video synchronization is performed by the main thread.
When the method is applied to audio and video synchronous rendering of a Web-side HEVC player, audio and video are played synchronously with the audio time axis formed by the audio time stamps as the main axis, i.e., the video is synchronized to the audio.
The decoding sub-thread first downloads the video stream and demultiplexes the audio and video; demultiplexing yields basic information about the audio and video, such as the width and height, the video format, and the video duration. The demultiplexed H.265 video data then needs to be decoded to obtain the video image data and the video time stamp of each video frame.
The decoding sub-thread decodes the video data to be decoded and obtains the video time stamp and the video image data through calculation. The video image data refers to the YUV data of a video image; the decoding sub-thread decodes the YUV data of the video image together with the video time stamp, and the video time stamp and the YUV data are stored in a queue waiting for the main thread to use.
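As an illustration only, in a browser the decoding sub-thread would typically be a Web Worker that posts each decoded frame to the main thread; the worker file name decoder.worker.js and the message shape below are assumptions for this sketch, not details from the patent:

// Main-thread side: a hypothetical sketch of queuing decoded frames
// received from the decoding sub-thread (a Web Worker).
interface DecodedFrame { pts: number; yuv: Uint8Array; }

const frameQueue: DecodedFrame[] = [];
const decoderWorker = new Worker('decoder.worker.js'); // assumed file name

decoderWorker.onmessage = (e: MessageEvent<DecodedFrame>) => {
  // Each message carries one frame's video time stamp (pts) and YUV data;
  // frames wait in this queue until the main thread renders them.
  frameQueue.push(e.data);
};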
As an alternative embodiment, the method further comprises: performing format conversion on the audio data to be played through an encapsulation sub-thread of the player; the main thread of the player plays the format-converted audio data using a playing component; and monitoring, by the main thread of the player, the audio time stamp of the currently played audio data comprises: the main thread of the player monitors the audio time stamp of the currently played audio data through a timeupdate event.
The encapsulation sub-thread of the player obtains the audio data currently to be played, encapsulates it into mp4a-format audio data that the audio element (the element tag used for playing audio in a browser) can recognize, and passes it to the audio element; the main thread of the player then plays the format-converted audio data using the playing component, where the playing component is the audio element. A timeupdate event is triggered whenever audio.currentTime is updated, and the audio time stamp is obtained from the audio.currentTime attribute, each call to which returns the current audio time stamp. The main thread of the player thus monitors the audio time stamp of the currently played audio data through the timeupdate event of the audio element.
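A minimal sketch of this listener, assuming the hypothetical onAudioClock handler named here (it is not part of the patent), could be:

// Hypothetical sketch: reading the audio clock from the timeupdate event.
declare function onAudioClock(audioTs: number, lastAudioTs: number): void; // sketched below

const audio = document.querySelector('audio')!;
let lastAudioTs = -1;

audio.addEventListener('timeupdate', () => {
  const audioTs = audio.currentTime; // the audio time stamp, in seconds
  // timeupdate fires roughly 4 to 66 times per second, so the audio clock
  // advances uniformly without any setTimeout polling on the main thread.
  onAudioClock(audioTs, lastAudioTs);
  lastAudioTs = audioTs;
});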
Step 102: and rendering the video image data according to the video time stamp and the audio time stamp through a main thread of the player.
Specifically, after the decoding sub-thread finishes processing the video data to be decoded to obtain the video time stamp and the video image data, and the main thread of the player has monitored the audio time stamp of the currently played audio data, the main thread of the player renders the video image data.
As an alternative embodiment, rendering, by the main thread of the player, the video image data according to the video time stamp and the audio time stamp includes: directly rendering the video image data when the main thread of the player determines that the audio time stamp of the audio data currently to be played is the same as the audio time stamp of the audio data played last.
Specifically, if the audio time stamp of the audio data currently to be played is the same as the audio time stamp of the audio data played last, i.e., the audio time stamp has not changed, the video image data continues to be rendered directly.
As an alternative embodiment, rendering, by the main thread of the player, the video image data according to the video time stamp and the audio time stamp includes: determining, by the main thread of the player, that the audio time stamp of the audio data currently to be played is different from the audio time stamp of the audio data played last;
rendering the video image data if the video time stamp is less than or equal to the audio time stamp;
and if the video time stamp is larger than the audio time stamp, delaying the rendering of the video image data, and when the video time stamp is smaller than or equal to the audio time stamp currently acquired by the sub-thread, rendering the video image data.
Specifically, if the audio time stamp of the audio data currently to be played is different from the audio time stamp of the audio data played last, the video time stamp and the audio time stamp need to be compared.
Audio and video synchronous rendering adopts the strategy of synchronizing video to audio, i.e., the audio plays normally while the video frame at the head of the to-be-rendered queue is continuously checked: the video time stamp of that frame is compared with the audio time stamp currently being played, and if the video time stamp is smaller than or equal to the audio time stamp, the video image data corresponding to the frame at the head of the to-be-rendered queue is rendered. Otherwise, the audio timeupdate event callback handler is re-entered to acquire the audio time stamp in real time, and the video image data corresponding to that frame is rendered once the condition is met (the video time stamp is smaller than or equal to the audio time stamp currently acquired by the sub-thread). If the to-be-rendered queue is empty, the player waits a fixed time (e.g., 10 ms) and then resumes the previous operation.
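Putting the pieces together, a hedged sketch of this decision logic inside the timeupdate callback might read as follows; it reuses the hypothetical frameQueue, renderFrame, and audio names from the earlier sketches:

// Hypothetical sketch of the video-to-audio decision logic described above.
interface DecodedFrame { pts: number; yuv: Uint8Array; }
declare const frameQueue: DecodedFrame[];
declare const audio: HTMLAudioElement;
declare function renderFrame(frame: DecodedFrame): void;

function onAudioClock(audioTs: number, lastAudioTs: number): void {
  if (frameQueue.length === 0) {
    // The to-be-rendered queue is empty: wait a fixed time, then retry.
    setTimeout(() => onAudioClock(audio.currentTime, lastAudioTs), 10);
    return;
  }
  const frame = frameQueue[0];      // video frame at the head of the queue
  if (audioTs === lastAudioTs || frame.pts <= audioTs) {
    // Audio time stamp unchanged, or video is at/behind the audio clock:
    // render the head-of-queue frame directly.
    frameQueue.shift();
    renderFrame(frame);
  }
  // Otherwise the video time stamp is ahead of the audio time stamp, and
  // rendering is delayed until a later timeupdate satisfies pts <= audioTs.
}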
Referring to fig. 2, which shows the relationship among the decoding sub-thread, the encapsulation sub-thread, and the player main thread in the audio/video synchronous rendering method:
the decoding sub-thread is used for decoding the video time stamp and YUV data of the video image from the video data to be decoded;
and the encapsulation sub-thread is used for performing format conversion on the audio data to be played if that audio data is in a format the audio element does not support;
the main thread of the player is used for monitoring the audio time stamp of the currently played audio data through the timeupdate event and rendering the video image data according to the video time stamp and the audio time stamp.
Referring to fig. 3, a detailed flowchart of an audio/video synchronous rendering method is shown.
Step 301, decoding a video time stamp and video image data from video data to be decoded by a decoding sub-thread of a player;
step 302, performing format conversion on the audio data to be played through an encapsulation sub-thread of the player;
step 303, the main thread of the player uses a playing component to play the audio data to be played after format conversion;
step 304, monitoring, by the main thread of the player, the audio time stamp of the currently played audio data through a timeupdate event;
step 305, judging, by the main thread of the player, whether the audio time stamp of the audio data currently to be played is the same as the audio time stamp of the audio data played last; if so, executing step 308; if not, executing step 306;
step 306, judging whether the video time stamp is smaller than or equal to the audio time stamp; if not, executing step 307; if so, executing step 308;
step 307, delaying the rendering of the video image data, and returning to step 301;
step 308, rendering the video image data.
In the method, the performance advantage of the audio timeupdate event callback is exploited to replace the conventionally used setTimeout timed-polling approach for audio and video synchronous rendering. The triggering frequency of the audio timeupdate event is guaranteed to be 4 to 66 times per second, and the audio time stamp is acquired whenever it is updated, so the progression of the audio time stamp is more uniform and fine-grained, the audio time stamp and the video time stamp are obtained more robustly, and synchronization is performed as fast as possible without being affected by whether the main thread of the player is idle. Video frame accumulation is avoided, the smoothness of audio and video synchronous rendering is further enhanced, and the stuttering caused by video lagging behind the audio in a live broadcast scene is avoided; playback is smoother, the stutter rate is reduced, and the user's viewing experience is better.
Example 2
Based on the same inventive concept, the present application further provides an audio/video synchronous rendering device, as shown in fig. 4, including:
the decoding module 401 is configured to decode, by using a decoding sub-thread of the player, a video timestamp and video image data from video data to be decoded, and monitor, by using a main thread of the player, an audio timestamp of audio data currently being played;
and a rendering module 402, configured to render, by a main thread of the player, the video image data according to the video timestamp and the audio timestamp.
Optionally, the apparatus further comprises:
a format conversion module 403, configured to perform format conversion on the audio data to be played through an encapsulation sub-thread of the player;
a playing module 404, configured to play the audio data to be played after format conversion by using a playing component by using a main thread of the player;
the decoding module 401 is specifically configured to monitor an audio timestamp of the currently played audio data by the main thread of the player through a timeupdate event.
Optionally, the rendering module 402 is specifically configured to: and directly rendering the video image data when the main thread of the player determines that the audio time stamp of the current audio data to be played is the same as the audio time stamp of the audio data played last time.
Optionally, the rendering module 402 is specifically configured to: determining, by a main thread of a player, that an audio time stamp of current audio data to be played is different from an audio time stamp of last played audio data;
and if the video time stamp is smaller than or equal to the audio time stamp, rendering the video image data.
Optionally, the rendering module 402 is specifically configured to: determining, by a main thread of the player, that an audio time stamp of the audio data to be played currently is different from an audio time stamp of the audio data played last,
and if the video time stamp is larger than the audio time stamp, delaying the rendering of the video image data, monitoring the audio time stamp in real time, and rendering the video image data when the video time stamp is smaller than or equal to the audio time stamp currently acquired by the sub-thread.
Having described the audio and video synchronous rendering method and apparatus of an exemplary embodiment of the present application, next, an electronic device according to another exemplary embodiment of the present application is described.
Those skilled in the art will appreciate that the various aspects of the present application may be implemented as a system, method, or program product. Accordingly, aspects of the present application may be embodied in the following forms: an entirely hardware embodiment, an entirely software embodiment (including firmware, micro-code, etc.), or an embodiment combining hardware and software aspects, which may be referred to herein as a "circuit," "module," or "system."
In some possible implementations, an electronic device according to the present application may include at least one processor, and at least one memory. The memory stores program code that, when executed by the processor, causes the processor to perform the steps in the audio-video synchronous rendering method according to various exemplary embodiments of the present application described above in the present specification.
An electronic device 130 according to this embodiment of the present application, i.e., the above-described audio-video synchronous rendering device, is described below with reference to fig. 5. The electronic device 130 shown in fig. 5 is only an example and should not be construed as limiting the functionality and scope of use of embodiments of the present application.
As shown in fig. 5, the electronic device 130 is in the form of a general-purpose electronic device. Components of electronic device 130 may include, but are not limited to: the at least one processor 131, the at least one memory 132, and a bus 133 connecting the various system components, including the memory 132 and the processor 131.
Bus 133 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, a processor, and a local bus using any of a variety of bus architectures.
Memory 132 may include readable media in the form of volatile memory such as Random Access Memory (RAM) 1321 and/or cache memory 1322, and may further include Read Only Memory (ROM) 1323.
Memory 132 may also include a program/utility 1325 having a set (at least one) of program modules 1324, such program modules 1324 including, but not limited to: an operating system, one or more application programs, other program modules, and program data; each or some combination of these examples may include an implementation of a network environment.
The electronic device 130 may also communicate with one or more external devices 134 (e.g., a keyboard, a pointing device, etc.), with one or more devices that enable a user to interact with the electronic device 130, and/or with any device (e.g., a router, a modem, etc.) that enables the electronic device 130 to communicate with one or more other electronic devices. Such communication may occur through an input/output (I/O) interface 135. Also, the electronic device 130 may communicate with one or more networks such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet through a network adapter 136. As shown, the network adapter 136 communicates with the other modules of the electronic device 130 over the bus 133. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with the electronic device 130, including, but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
In some possible embodiments, aspects of an audio-video synchronous rendering method provided herein may also be implemented in the form of a program product comprising program code for causing a computer device to carry out the steps of an audio-video synchronous rendering method according to various exemplary embodiments of the present application as described herein above, when the program product is run on a computer device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include the following: an electrical connection having one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The program product of the embodiments of the present application may employ a portable compact disc read-only memory (CD-ROM), include program code, and run on an electronic device. However, the program product of the present application is not limited thereto; in this document, a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++ as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the consumer electronic device, partly on the consumer electronic device as a stand-alone software package, partly on the consumer electronic device and partly on a remote electronic device, or entirely on the remote electronic device or server. In the case of a remote electronic device, the remote electronic device may be connected to the consumer electronic device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external electronic device (e.g., through the Internet using an Internet service provider).
It should be noted that although several units or sub-units of the apparatus are mentioned in the above detailed description, such a division is merely exemplary and not mandatory. Indeed, the features and functions of two or more of the elements described above may be embodied in one element in accordance with embodiments of the present application. Conversely, the features and functions of one unit described above may be further divided into a plurality of units to be embodied.
Furthermore, although the operations of the methods of the present application are depicted in the drawings in a particular order, this does not require or suggest that these operations must be performed in that particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be decomposed into multiple steps.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flowchart and/or block of the flowchart and block diagrams, and combinations of flowcharts and block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and block diagram block or blocks.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various modifications and variations can be made in the present application without departing from the spirit or scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims and the equivalents thereof, the present application is intended to cover such modifications and variations.

Claims (10)

1. An audio and video synchronous rendering optimization method is characterized by comprising the following steps:
decoding the video time stamp and the video image data from the video data to be decoded through a decoding sub-thread of the player;
monitoring an audio time stamp of currently played audio data based on a timeupdate event through a main thread of the player;
and rendering the video image data according to an audio time axis formed by the audio time stamps through a main thread of the player.
2. The method of claim 1, wherein the method further comprises:
performing format conversion on audio data to be played through a packaging sub-thread of the player;
and the main thread of the player plays the audio data to be played after format conversion by using a playing component.
3. The method of claim 1, wherein rendering, by a main thread of the player, the video image data according to an audio timeline constituted by the audio timestamps, comprises:
and directly rendering the video image data when the main thread of the player determines that the audio time stamp of the current audio data to be played is the same as the audio time stamp of the audio data played last.
4. The method of claim 1, wherein rendering, by a main thread of the player, the video image data according to an audio timeline constituted by the audio timestamps, comprises:
determining, by the main thread of the player, that an audio time stamp of the audio data to be played currently is different from an audio time stamp of the audio data played last;
and if the video time stamp is smaller than or equal to the audio time stamp, rendering the video image data.
5. The method of claim 1, wherein rendering, by a main thread of the player, the video image data according to an audio timeline constituted by the audio timestamps, comprises:
determining, by the main thread of the player, that an audio time stamp of the audio data to be played currently is different from an audio time stamp of the audio data played last,
and if the video time stamp is larger than the audio time stamp, delaying the rendering of the video image data, and when the video time stamp is smaller than or equal to the audio time stamp currently acquired by the sub-thread, rendering the video image data.
6. An audio and video synchronous rendering device, characterized in that the device comprises:
the decoding module is used for decoding the video time stamp and the video image data from the video data to be decoded through a decoding sub-thread of the player; monitoring an audio time stamp of currently played audio data based on a timeupdate event through a main thread of the player;
and the rendering module is used for rendering the video image data according to an audio time axis formed by the audio time stamp through a main thread of the player.
7. The apparatus of claim 6, wherein the apparatus further comprises:
the format conversion module is used for carrying out format conversion on the audio data to be played through a packaging sub-thread of the player;
and the playing module is used for playing the audio data to be played after format conversion by using the playing component by the main thread of the player.
8. The apparatus of claim 6, wherein the rendering module is specifically configured to:
and directly rendering the video image data when the main thread of the player determines that the audio time stamp of the current audio data to be played is the same as the audio time stamp of the audio data played last.
9. An electronic device comprising at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
10. A computer storage medium, characterized in that the computer storage medium stores a computer program for causing a computer to perform the method according to any one of claims 1-5.
CN202210912502.6A 2022-07-30 2022-07-30 Audio and video synchronous rendering method and device and electronic equipment Active CN115243089B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210912502.6A CN115243089B (en) 2022-07-30 2022-07-30 Audio and video synchronous rendering method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210912502.6A CN115243089B (en) 2022-07-30 2022-07-30 Audio and video synchronous rendering method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN115243089A CN115243089A (en) 2022-10-25
CN115243089B (en) 2024-01-02

Family

ID=83677477

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210912502.6A Active CN115243089B (en) 2022-07-30 2022-07-30 Audio and video synchronous rendering method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN115243089B (en)


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8339456B2 (en) * 2008-05-15 2012-12-25 Sri International Apparatus for intelligent and autonomous video content generation and streaming

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2334077A1 (en) * 2009-11-03 2011-06-15 Research in Motion Limited System and method for dynamic post-processing on a mobile device
CN102868939A (en) * 2012-09-10 2013-01-09 杭州电子科技大学 Method for synchronizing audio/video data in real-time video monitoring system
CN108377409A (en) * 2018-03-05 2018-08-07 广东欧珀移动通信有限公司 A kind of seamless play method, terminal device and the storage medium of multimedia file
CN109327899A (en) * 2018-10-11 2019-02-12 恒大法拉第未来智能汽车(广东)有限公司 A kind of method, apparatus and system synchronization time of mobile unit synchronization time
CN110533969A (en) * 2019-08-05 2019-12-03 深圳市编玩边学教育科技有限公司 A kind of teaching programming end and system
CN111641838A (en) * 2020-05-13 2020-09-08 深圳市商汤科技有限公司 Browser video playing method and device and computer storage medium
CN113225598A (en) * 2021-05-07 2021-08-06 上海一谈网络科技有限公司 Method, device and equipment for synchronizing audio and video of mobile terminal and storage medium
CN114449162A (en) * 2021-12-22 2022-05-06 天翼云科技有限公司 Method and device for playing panoramic video, computer equipment and storage medium
CN114339415A (en) * 2021-12-23 2022-04-12 天翼云科技有限公司 Client video playing method and device, electronic equipment and readable medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A method for synchronization of audio and video signals; Toni Livaja et al.; 2018 IEEE 8th International Conference on Consumer Electronics - Berlin (ICCE-Berlin); pp. 1-5 *
Audio and video synchronization optimization design for an asymmetric wireless transmission channel; Ma Honglei et al.; Computer Simulation; Vol. 33, No. 2; pp. 221-226, 479 *

Also Published As

Publication number Publication date
CN115243089A (en) 2022-10-25

Similar Documents

Publication Publication Date Title
CN111641838A (en) Browser video playing method and device and computer storage medium
US20100074340A1 (en) Methods and apparatus for video stream splicing
CN103002353A (en) Method and device for packaging multimedia documents
US11997314B2 (en) Video stream processing method and apparatus, and electronic device and computer-readable medium
CN115243074B (en) Video stream processing method and device, storage medium and electronic equipment
WO2012036658A1 (en) Method for semantics based trick mode play in video system
CN113382278B (en) Video pushing method and device, electronic equipment and readable storage medium
CN113079386B (en) Video online playing method and device, electronic equipment and storage medium
CN115243089B (en) Audio and video synchronous rendering method and device and electronic equipment
CN110855645B (en) Streaming media data playing method and device
CN115665485B (en) Video picture optimization method and device, storage medium and video terminal
CN1826806A (en) Exploitation of discontinuity indicator for trick mode operation
CN117641061A (en) Browser audio and video playing method based on webassembly
US20100076944A1 (en) Multiprocessor systems for processing multimedia data and methods thereof
CN111836071B (en) Multimedia processing method and device based on cloud conference and storage medium
CN114079534B (en) Encoding method, decoding method, apparatus, medium, and electronic device
CN114079535B (en) Transcoding method, device, medium and electronic equipment
JP4373283B2 (en) Video / audio decoding method, video / audio decoding apparatus, video / audio decoding program, and computer-readable recording medium recording the program
CN115150675B (en) Reverse order playing method, system and reverse order slicing method
CN113824715B (en) Method and device for playing real-time video stream
CN118488254A (en) Video playing method, device, equipment and readable storage medium
US11973820B2 (en) Method and apparatus for mpeg dash to support preroll and midroll content during media playback
US20160117796A1 (en) Content Adaptive Decoder Quality Management
CN113141521B (en) Audio and video data encoding method and device, electronic equipment and storage medium
CN115776567A (en) Video data stream decoding method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant