CN118301398A - Video rendering method, device, electronic equipment, storage medium and program product

Info

Publication number
CN118301398A
Authority
CN
China
Prior art keywords
audio
rendering
frame
frames
image
Prior art date
Legal status
Pending
Application number
CN202310004117.6A
Other languages
Chinese (zh)
Inventor
刘阿海 (Liu Ahai)
Current Assignee
Tencent Technology (Shenzhen) Co., Ltd.
Original Assignee
Tencent Technology (Shenzhen) Co., Ltd.
Application filed by Tencent Technology (Shenzhen) Co., Ltd.
Priority to CN202310004117.6A
Publication of CN118301398A

Landscapes

  • Television Signal Processing For Recording (AREA)

Abstract

The application provides a video rendering method, a video rendering apparatus, an electronic device, a storage medium and a program product. The method includes: acquiring audio frames of a video file to be rendered from an audio buffer used for buffering the audio frames; acquiring image frames of the video file to be rendered from the video file itself; rendering the audio frames and the image frames respectively, and synchronously recording the audio while the audio frames are being rendered to obtain recorded audio frames; matching the recorded audio frames against the audio frames in the audio buffer to obtain a target audio frame; and adjusting the rendering progress of the image frames of the video file based on the target audio frame, so that the rendering of the audio frames and the image frames of the video file to be rendered is synchronized. The application effectively improves the accuracy of synchronous rendering of audio frames and image frames.

Description

Video rendering method, device, electronic equipment, storage medium and program product
Technical Field
The present application relates to the field of computer technologies, and in particular, to a video rendering method, apparatus, electronic device, storage medium, and program product.
Background
With the development of Internet technology and the rise of short video, video applications (e.g., short-video applications) have become increasingly common. When such an application shoots a video, it controls the camera device of the mobile phone (e.g., a camera) to capture image data while simultaneously controlling the sound pickup device of the mobile phone (e.g., a microphone) to record audio data; the captured image data and the recorded audio data are then rendered, and the user can subsequently play the rendered video.
Because the image data and the audio data are acquired by different devices, if the video application stalls or encounters similar problems while controlling the camera device to acquire image data or while controlling the pickup device to acquire audio data, the finally rendered video may suffer from audio-picture desynchronization.
Disclosure of Invention
The embodiment of the application provides a video rendering method, a video rendering device, electronic equipment, a computer readable storage medium and a computer program product, which can effectively improve the accuracy of synchronous rendering of audio frames and image frames.
The technical scheme of the embodiment of the application is realized as follows:
the embodiment of the application provides a video rendering method, which comprises the following steps:
Acquiring an audio frame of a video file to be rendered from an audio buffer area for buffering the audio frame; acquiring an image frame of the video file to be rendered from the video file to be rendered;
Respectively rendering the audio frames and the image frames, and synchronously recording the audio in the process of rendering the audio frames to obtain recorded audio frames;
Matching the recorded audio frame with the audio frame in the audio buffer area to obtain a target audio frame;
and based on the target audio frame, adjusting the rendering progress of the image frame of the video file to be rendered so as to synchronize the rendering of the audio frame and the image frame of the video file to be rendered.
An embodiment of the present application provides a video rendering apparatus, including:
The acquisition module is used for acquiring the audio frames of the video file to be rendered from the audio buffer area for buffering the audio frames; acquiring an image frame of the video file to be rendered from the video file to be rendered;
The rendering module is used for respectively rendering the audio frames and the image frames, and synchronously recording the audio in the process of rendering the audio frames to obtain recorded audio frames;
the matching module is used for matching the recorded audio frames with the audio frames in the audio buffer area to obtain target audio frames;
And the adjusting module is used for adjusting the rendering progress of the image frames of the video file to be rendered based on the target audio frames so as to synchronize the rendering of the audio frames and the image frames of the video file to be rendered.
In some embodiments, the video rendering apparatus further includes: a storage module, configured to acquire the video file to be rendered and an audio buffer, where the video file to be rendered includes audio frames and image frames corresponding to each playing moment, and the maximum number of audio frames that the audio buffer can store is smaller than the total number of audio frames in the video file to be rendered; and further configured to, when the audio buffer has not reached its maximum storage load, sequentially store the audio frames corresponding to the playing moments into the audio buffer in the order of the playing moments; and a deleting module, configured to delete the acquired audio frames from the audio buffer.
In some embodiments, the obtaining module is further configured to sequentially obtain the audio frames from the audio buffer for buffering the audio frames according to the sequence of the playing moments when the audio buffer reaches the maximum storage load. The acquiring module is further configured to sequentially acquire the image frames from the video file to be rendered according to the sequence of the playing moments.
In some embodiments, the above-mentioned rendering module is further configured to obtain an audio rendering speed of the audio frame and an image rendering speed of the image frame, where the audio rendering speed and the image rendering speed are different; based on the audio rendering speed, sequentially rendering each audio frame according to the sequence of the playing time; and based on the image rendering speed, sequentially rendering each image frame according to the sequence of the playing time.
In some embodiments, the matching module is further configured to obtain a recording time of the recorded audio frame; determining each audio frame stored in the audio buffer at the recording time as a candidate audio frame, wherein the audio frames stored in the audio buffer at different times are different; selecting a plurality of audio frames to be compared, wherein the audio frames to be compared meet preset conditions at playing moments, from the candidate audio frames; and comparing the recorded audio frames with the audio frames to be compared respectively, and determining the target audio frame from a plurality of audio frames to be compared based on a comparison result.
In some embodiments, the matching module is further configured to obtain a playing time of each candidate audio frame, and a playing time threshold, where the playing time threshold is between a latest playing time and an earliest playing time of the candidate audio frame; and determining each candidate audio frame with the playing time later than the playing time threshold as the audio frame to be compared.
In some embodiments, the adjusting module is further configured to determine an audio rendering delay based on the target audio frame; the audio rendering delay is used for representing the difference value between playing moments corresponding to the audio frame currently rendered and the image frame currently rendered respectively; and adjusting the rendering progress of the image frame based on the audio rendering delay so as to synchronize the rendering of the audio frame and the image frame.
In some embodiments, the adjusting module is further configured to obtain a target playing time corresponding to the target audio frame; acquiring playing time corresponding to each audio frame in the audio buffer zone, and determining the latest playing time from the acquired playing time; and determining the difference between the target playing time and the latest playing time as the audio rendering time delay.
In some embodiments, the adjusting module is further configured to, when the value of the audio rendering delay is greater than zero, delay the rendering progress of the image frame based on the value of the audio rendering delay, so as to synchronize the rendering of the audio frame and the image frame, where the value of the audio rendering delay is positively correlated with the delay degree of the rendering progress; and when the value of the audio rendering delay is smaller than zero, accelerating the rendering progress of the image frame based on the value of the audio rendering delay so as to synchronize the rendering of the audio frame and the image frame, wherein the value of the audio rendering delay is positively correlated with the accelerating degree of the rendering progress.
In some embodiments, the adjusting module is further configured to determine a pause duration based on a value of the audio rendering delay, where the pause duration is positively related to the value of the audio rendering delay; suspending rendering of the image frame according to the suspension duration; or acquiring the image rendering speed of the image frame, and determining a first rendering speed based on the numerical value of the audio rendering delay; wherein the first rendering speed is inversely related to the magnitude of the value of the audio rendering delay, and the first rendering speed is smaller than the audio rendering speed of the audio frame; and adjusting the image rendering speed to the first rendering speed.
In some embodiments, the adjusting module is further configured to obtain an image rendering speed of the image frame, and determine a second rendering speed based on a value of the audio rendering delay; wherein the second rendering speed is positively correlated with the magnitude of the audio rendering delay, the second rendering speed being greater than the audio rendering speed of the audio frame; and adjusting the image rendering speed to the second rendering speed.
An embodiment of the present application provides an electronic device, including:
A memory for storing computer executable instructions or computer programs;
And the processor is used for realizing the video rendering method provided by the embodiment of the application when executing the computer executable instructions or the computer programs stored in the memory.
The embodiment of the application provides a computer readable storage medium, which stores computer executable instructions for causing a processor to execute the video rendering method provided by the embodiment of the application.
Embodiments of the present application provide a computer program product comprising a computer program or computer-executable instructions stored in a computer-readable storage medium. The processor of the electronic device reads the computer-executable instructions from the computer-readable storage medium, and the processor executes the computer-executable instructions, so that the electronic device executes the video rendering method according to the embodiment of the application.
The embodiment of the application has the following beneficial effects:
The audio frames and image frames of the video file to be rendered are acquired and rendered, the audio is synchronously recorded while the audio frames are rendered to obtain recorded audio frames, the recorded audio frames are matched against the audio frames in the audio buffer to obtain a target audio frame, and the rendering progress of the image frames is adjusted based on the target audio frame. Matching the recorded audio frames against the buffered audio frames yields an accurate target audio frame, so that the rendering progress of the image frames, adjusted on the basis of that target audio frame, stays synchronized with the rendering progress of the audio frames. Because the recorded audio frames are recorded synchronously during rendering, they represent the latest rendering progress, and matching them against the audio frames in the audio buffer can accurately characterize the gap between the rendering progress of the audio frames and that of the image frames. Adjusting the rendering progress of the image frames based on the target audio frame therefore effectively improves the accuracy of synchronous rendering of audio frames and image frames.
Drawings
FIG. 1 is a schematic diagram of a video rendering system according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a structure for a video rendering electronic device according to an embodiment of the present application;
fig. 3 is a schematic flow chart of a video rendering method according to an embodiment of the present application;
fig. 4 is a schematic diagram of a video rendering method according to an embodiment of the present application;
fig. 5 to 8 are schematic flow diagrams of a video rendering method according to an embodiment of the present application;
fig. 9 is a schematic diagram of a video rendering method according to an embodiment of the present application.
Detailed Description
The present application will be described in further detail below with reference to the accompanying drawings, in order to make the objects, technical solutions and advantages of the present application more apparent. The described embodiments should not be construed as limiting the present application, and all other embodiments obtained by those skilled in the art without making any inventive effort fall within the scope of the present application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is to be understood that "some embodiments" can be the same subset or different subsets of all possible embodiments and can be combined with one another without conflict.
In the following description, the terms "first", "second", "third" and the like are merely used to distinguish similar objects and do not represent a specific ordering of the objects, it being understood that the "first", "second", "third" may be interchanged with a specific order or sequence, as permitted, to enable embodiments of the application described herein to be practiced otherwise than as illustrated or described herein.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the application only and is not intended to be limiting of the application.
Before the embodiments of the present application are described in further detail, the terms involved in the embodiments of the present application are explained; the explanations below apply to these terms throughout.
1) Computer Vision technology (CV): computer vision is a science that studies how to make a machine "see"; more specifically, it uses cameras and computers instead of human eyes to identify, track and measure targets, and further performs graphics processing so that the result becomes an image more suitable for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies related theories and technologies in an attempt to build artificial intelligence systems that can acquire information from images or multidimensional data. Computer vision techniques typically include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D techniques, virtual reality, augmented reality, and simultaneous localization and mapping, as well as common biometric recognition techniques such as face recognition and fingerprint recognition.
2) Audio-picture synchronization: an audio-video relationship in film, meaning that an audio frame and an image frame rendered at the same moment have the same playing time, so that the sound and the picture match when the video is played.
3) Audio: the term audio is generally used to describe devices related to sound within the audible range and their roles. All sounds that human beings can hear are called audio, which may include noise and the like. Once sound has been recorded, speech, singing and musical instruments can all be processed by digital music software. Audio is simply sound stored in a computer. If a computer is equipped with a corresponding audio card (commonly called a sound card), all such sounds can be recorded, and their acoustic characteristics, such as loudness, can be stored as files on the computer's hard disk.
4) Video: video (Video) generally refers to various techniques for capturing, recording, processing, storing, transmitting, and reproducing a series of still images as electrical signals. When the continuous image changes more than 24 frames (frames) per second, according to the persistence of vision principle, the human eyes cannot distinguish a single static picture; it appears as a smooth continuous visual effect, such that successive pictures are called videos. Video technology was originally developed for television systems, but has now evolved into a variety of different formats to facilitate video recording by consumers. The development of networking technology has also prompted recorded segments of video to exist as streaming media over the internet and to be received and played by computers. Video is a different technology than film, which uses photography to capture dynamic images as a series of still pictures.
In the implementation of the embodiments of the present application, the applicant found that the related art has the following problems:
In the related art, an audio renderer (Audio Track) can acquire, through an interface, a timestamp of the most recently rendered audio, from which the total duration of the audio frames that have actually been rendered is calculated. The application layer accumulates the written audio frame duration each time it writes an audio frame through the write interface of the Audio Track. The total duration of the audio frames written by the application layer, minus the total duration of the actually rendered audio frames, gives the audio rendering delay. The synchronizer then takes the audio rendering delay and uses it in the synchronization calculation. The weakness of this scheme is that the value obtained through the Audio Track interface call is inaccurate on some device models, so the calculated audio rendering delay is also inaccurate, and the audio-video synchronization is therefore inaccurate.
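To make the related-art calculation concrete, the following minimal Java sketch reproduces the written-minus-rendered delay estimate described above. All class and variable names are illustrative assumptions, and renderedDurationMs stands for the duration derived from the renderer's timestamp interface, which is exactly the value that is unreliable on some device models.

```java
/**
 * A minimal sketch of the related-art delay estimate described above.
 * All names are illustrative; renderedDurationMs stands for the duration
 * derived from the audio renderer's timestamp interface.
 */
final class RelatedArtDelayEstimator {
    private long writtenDurationMs; // accumulated by the application layer

    /** Called each time the application writes one audio frame to the renderer. */
    void onFrameWritten(long frameDurationMs) {
        writtenDurationMs += frameDurationMs;
    }

    /** Written total minus actually rendered total = audio rendering delay. */
    long estimateDelayMs(long renderedDurationMs) {
        return writtenDurationMs - renderedDurationMs;
    }
}
```

If the timestamp behind renderedDurationMs is wrong, every quantity derived from it, including the delay, inherits the error; this is the weakness the present application addresses.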
Embodiments of the present application provide a video rendering method, apparatus, electronic device, computer readable storage medium, and computer program product, which effectively improve accuracy of synchronous rendering of audio frames and image frames, and the following describes an exemplary application of the video rendering system provided by the embodiments of the present application.
Referring to fig. 1, fig. 1 is a schematic architecture diagram of a video rendering system 100 according to an embodiment of the present application, where a terminal (a terminal 400 is shown in an exemplary manner) is connected to a server 200 through a network 300, and the network 300 may be a wide area network or a local area network, or a combination of the two.
The terminal 400 is configured to display the rendered video on a graphical interface 410-1 (the graphical interface 410-1 is shown as an example) using a client 410 for a user. The terminal 400 and the server 200 are connected to each other through a wired or wireless network.
In some embodiments, the server 200 may be a stand-alone physical server, a server cluster or a distributed system formed by a plurality of physical servers, or may be a cloud server that provides cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a content delivery network (Content Delivery Network, CDN), and basic cloud computing services such as big data and artificial intelligence platforms. The terminal 400 may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart television, a smart watch, a car terminal, etc. The electronic device provided by the embodiment of the application can be implemented as a terminal or a server. The terminal and the server may be directly or indirectly connected through wired or wireless communication, which is not limited in the embodiment of the present application.
In some embodiments, the server 200 obtains from the terminal 400 an audio frame of a video file to be rendered; acquiring an image frame of a video file to be rendered from the video file to be rendered, rendering an audio frame and the image frame, synchronously recording audio in the process of rendering the audio frame to obtain a recorded audio frame, and matching the recorded audio frame with the audio frame in the audio buffer zone to obtain a target audio frame; and based on the target audio frame, adjusting the rendering progress of the image frame of the video file to be rendered so as to synchronize the rendering of the audio frame and the image frame of the video file to be rendered.
In other embodiments, the terminal 400 obtains the audio frames of the video file to be rendered from the server 200; acquiring an image frame of a video file to be rendered from the video file to be rendered, rendering an audio frame and the image frame, synchronously recording audio in the process of rendering the audio frame to obtain a recorded audio frame, and matching the recorded audio frame with the audio frame in the audio buffer zone to obtain a target audio frame; and based on the target audio frame, adjusting the rendering progress of the image frame of the video file to be rendered so as to synchronize the rendering of the audio frame and the image frame of the video file to be rendered.
In other embodiments, the embodiments of the present application may be implemented by means of cloud technology (Cloud Technology), which refers to a hosting technology that unifies a series of resources such as hardware, software, and networks in a wide area network or a local area network to implement the computing, storage, processing, and sharing of data.
Cloud technology is a general term for network technology, information technology, integration technology, management platform technology, application technology and the like based on the cloud computing business model; it can form a resource pool that is used on demand in a flexible and convenient manner. Cloud computing technology will become an important support, because the background services of technical network systems require a large amount of computing and storage resources.
Referring to fig. 2, fig. 2 is a schematic structural diagram of an electronic device 500 for video rendering according to an embodiment of the present application, where the electronic device 500 shown in fig. 2 may be the server 200 or the terminal 400 in fig. 1, and the electronic device 500 shown in fig. 2 includes: at least one processor 410, a memory 450, at least one network interface 420. The various components in electronic device 500 are coupled together by bus system 440. It is understood that the bus system 440 is used to enable connected communication between these components. The bus system 440 includes a power bus, a control bus, and a status signal bus in addition to the data bus. But for clarity of illustration the various buses are labeled in fig. 2 as bus system 440.
The processor 410 may be an integrated circuit chip having signal processing capability, such as a general-purpose processor (e.g., a microprocessor or any conventional processor), a Digital Signal Processor (DSP), another programmable logic device, a discrete gate or transistor logic device, discrete hardware components, or the like.
Memory 450 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid state memory, hard drives, optical drives, and the like. Memory 450 optionally includes one or more storage devices physically remote from processor 410.
Memory 450 includes volatile memory or nonvolatile memory, and may also include both volatile and nonvolatile memory. The nonvolatile memory may be a Read-Only Memory (ROM), and the volatile memory may be a Random Access Memory (RAM). The memory 450 described in embodiments of the present application is intended to comprise any suitable type of memory.
In some embodiments, memory 450 is capable of storing data to support various operations, examples of which include programs, modules and data structures, or subsets or supersets thereof, as exemplified below.
An operating system 451, including system programs such as a framework layer, a core library layer, and a driver layer, for handling various basic system services and performing hardware-related tasks;
A network communication module 452 for accessing other electronic devices via one or more (wired or wireless) network interfaces 420, the exemplary network interface 420 comprising: Bluetooth, Wireless Fidelity (WiFi), Universal Serial Bus (USB), and the like.
In some embodiments, the video rendering device provided by the embodiments of the present application may be implemented in software. Fig. 2 shows the video rendering device 455 stored in the memory 450, which may be software in the form of a program, a plug-in, or the like, including the following software modules: the acquisition module 4551, the rendering module 4552, the matching module 4553, and the adjustment module 4554. These modules are logical, and thus may be arbitrarily combined or further split according to the functions implemented. The functions of the respective modules will be described below.
In other embodiments, the video rendering apparatus provided in the embodiments of the present application may be implemented in hardware. As an example, the video rendering apparatus may be a processor in the form of a hardware decoding processor programmed to perform the video rendering method provided in the embodiments of the present application; for example, the processor in the form of a hardware decoding processor may use one or more Application-Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field-Programmable Gate Arrays (FPGAs), or other electronic components.
In some embodiments, the terminal or the server may implement the video rendering method provided by the embodiments of the present application by running a computer program or computer executable instructions. For example, the computer program may be a native program (e.g., a dedicated video rendering program) or a software module in an operating system, e.g., a video rendering module that may be embedded in any program (e.g., a video client, an instant messaging client, an album program, an electronic map client, a navigation client); for example, a Native Application (APP) may be used, i.e. a program that needs to be installed in an operating system to be run. In general, the computer programs described above may be any form of application, module or plug-in.
The video rendering method provided by the embodiment of the application will be described in conjunction with exemplary applications and implementations of the server or the terminal provided by the embodiment of the application.
Referring to fig. 3, fig. 3 is a schematic flow chart of a video rendering method according to an embodiment of the present application, which will be described with reference to steps 101 to 106 shown in fig. 3, where the video rendering method according to an embodiment of the present application may be implemented by a server or a terminal alone or by a server and a terminal cooperatively, and will be described below by taking a terminal alone as an example.
In step 101, an audio frame of a video file to be rendered is acquired from an audio buffer for buffering the audio frame.
In some embodiments, prior to step 101 above, the audio frames may be stored by: the method comprises the steps of obtaining a video file to be rendered and an audio buffer zone, wherein the video file to be rendered comprises audio frames and image frames corresponding to each playing time, and the maximum number of the audio frames which can be stored in the audio buffer zone is smaller than the total number of the audio frames in the video file to be rendered; when the audio buffer area does not reach the maximum storage load, the audio frames corresponding to the playing moments are sequentially stored in the audio buffer area according to the sequence of the playing moments.
In some embodiments, taking a local playback scene as an example, a third-party player reads the video file to be rendered from a local storage location of the terminal; the video file to be rendered includes audio frames and image frames corresponding to each playing time, and the third-party player acquires the audio frames of the video file to be rendered from a data buffer used for buffering the audio frames. For example, at least one name of a video file to be rendered is displayed in the third-party player interface; when a click operation of the user on the name of video file A is received, the video file named video file A is determined as the video file to be rendered, the third-party player reads video file A from the local storage location of the terminal, and the third-party player acquires the audio frames of video file A from the data buffer used for buffering the audio frames. That is, the package format of a video file stored locally in the terminal may be a non-streaming-media format, i.e., a format that can be decoded and played only after the file has been completely downloaded, such as a Windows Media Video (WMV) file or a Matroska (MKV) file.
In other embodiments, taking a network playback scene as an example, the third-party player sends a video playing request to the server in response to a user-triggered video playing operation (e.g., receiving a click operation on a "start play" button displayed in the third-party player interface). The server, in response to the video playing request sent by the third-party player, sends a playlist of the video file to be played to the third-party player, and the third-party player then sequentially downloads a plurality of video clips of the video file from the server according to the playlist, where each video clip carries the information required for decoding and can therefore be decoded independently of the other video clips. That is, for a video file to be rendered in a network playback scene, the package format of the video file may be a streaming-media format, i.e., a format that can be decoded and played without being completely downloaded and without additional transcoding, such as a Flash Video (FLV) file.
As an example, the maximum number of audio frames that the audio buffer can store (e.g., 10 frames) is smaller than the total number of audio frames in the video file to be rendered (e.g., 1000 frames).
In some embodiments, the maximum storage load of the audio buffer is the maximum number of audio frames that the audio buffer can store, i.e., when the audio buffer reaches the maximum storage load, it is characterized that the maximum number of audio frames has been stored in the audio buffer.
As an example, referring to fig. 4, fig. 4 is a schematic diagram of a video rendering method provided by an embodiment of the present application. A video file to be rendered and an audio buffer are acquired, where the video file to be rendered includes audio frames and image frames corresponding to each playing time, and the audio buffer can store at most 500 milliseconds (ms) of audio frames. When the audio buffer has not reached its maximum storage load, the audio frames corresponding to the playing times are sequentially stored into the audio buffer in the order of the playing times, for example, in order from 0 ms to 500 ms.
In some embodiments, following the above step 101, the following processing may also be performed: the acquired audio frames are deleted from the audio buffer.
In some embodiments, after the audio frames of the video file to be rendered are acquired from the audio buffer for buffering the audio frames, the acquired audio frames may be deleted from the audio buffer in order to free up storage space in the audio buffer for storing the audio frames at a subsequent play time.
As an example, after an audio frame having a play time of 0ms of a video file to be rendered is acquired from an audio buffer for buffering the audio frame, the audio frame having the play time of 0ms is deleted from the audio buffer.
In some embodiments, the step 101 may be implemented as follows: when the audio buffer area reaches the maximum storage load, the audio frames are sequentially acquired from the audio buffer area for caching the audio frames according to the sequence of playing time.
In some embodiments, step 101 may be triggered when the audio buffer reaches its maximum storage load. At that point the audio buffer cannot store any further audio frames, so the audio frames are sequentially acquired from the audio buffer in the order of the playing times, and each acquired audio frame is deleted from the audio buffer to release storage space. After the acquired audio frames are deleted, the audio buffer is below its maximum storage load again, and the audio frames corresponding to subsequent playing times are sequentially stored into the audio buffer in the order of the playing times.
As an example, referring to fig. 4, when the audio buffer reaches its maximum storage load (the audio frames from 0 ms to 500 ms are already stored in it), the buffer cannot store any further audio frames. The audio frames are therefore sequentially acquired from the buffer in the order of the playing times (e.g., the audio frame at 0 ms is acquired first), and the acquired audio frame (the 0 ms frame) is deleted from the buffer to release storage space. After the deletion, the buffer is below its maximum storage load, and the audio frame corresponding to the next playing time (the 501 ms frame) is stored into the buffer in the order of the playing times.
In this way, when the audio buffer reaches its maximum storage load and can no longer store audio frames, the audio frames are sequentially acquired from the buffer in the order of the playing times, and each acquired audio frame is deleted from the buffer to release storage space. Once the acquired audio frames have been deleted, the buffer is below its maximum storage load, and the audio frames corresponding to subsequent playing times are sequentially stored into it in the order of the playing times. This effectively preserves the buffer's role of temporarily storing audio frames, ensures that all audio frames of the video file to be rendered pass through the buffer for rendering in order and are played through the loudspeaker, and thus effectively guarantees the temporal order of audio rendering.
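As a concrete illustration of this buffer behavior, the following Java sketch implements a bounded first-in-first-out audio buffer. The class, record, and field names are assumptions made for this example, not part of the application.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Illustrative frame record: PCM samples plus their playing time in milliseconds. */
record AudioFrame(long playTimeMs, short[] samples) {}

/** A minimal sketch of the bounded audio buffer described above. */
final class AudioFrameBuffer {
    private final int maxFrames; // maximum storage load
    private final Deque<AudioFrame> frames = new ArrayDeque<>();

    AudioFrameBuffer(int maxFrames) {
        this.maxFrames = maxFrames;
    }

    /** Stores a frame in playing-time order while the buffer is below its maximum load. */
    synchronized boolean offer(AudioFrame frame) {
        if (frames.size() >= maxFrames) {
            return false; // maximum storage load reached: the caller stores it later
        }
        frames.addLast(frame);
        return true;
    }

    /** Acquires and deletes the earliest buffered frame, freeing space for later frames. */
    synchronized AudioFrame poll() {
        return frames.pollFirst();
    }
}
```

Acquiring through poll() both returns the earliest frame and deletes it, mirroring the acquire-then-delete behavior described above.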
In step 102, an image frame of a video file to be rendered is acquired from the video file to be rendered.
In some embodiments, the video file to be rendered includes an audio frame and an image frame corresponding to each playing time.
In some embodiments, taking a local playback scene as an example, a third-party player reads the video file to be rendered from a local storage location of the terminal; the video file to be rendered includes audio frames and image frames corresponding to each playing time, and the third-party player acquires the image frames of the video file to be rendered from the video file itself. For example, at least one name of a video file to be rendered is displayed in the third-party player interface; when a click operation of the user on the name of video file A is received, the video file named video file A is determined as the video file to be rendered, the third-party player reads video file A from the local storage location of the terminal, and the third-party player acquires the image frames from the video file to be rendered.
In some embodiments, the step 102 may be implemented as follows: and acquiring the image frames of the video file to be rendered from the video file to be rendered in sequence according to the sequence of the playing time.
As an example, image frames with playing moments of 0ms, 1ms, 2ms and 3ms are sequentially acquired from the video file to be rendered according to the sequence of the playing moments.
In this way, the image frames of the video file to be rendered are acquired from the video file in the order of the playing times, which makes it convenient to render the acquired image frames in order, ensures that all image frames of the video file can be rendered in sequence, and effectively guarantees the temporal order of image-frame rendering.
In step 103, the audio frame and the image frame are rendered, respectively.
In some embodiments, the audio frames and the image frames are rendered separately: the audio frames may be rendered by an audio renderer, and the image frames may be rendered by an image renderer.
In some embodiments, referring to fig. 5, fig. 5 is a flowchart of a video rendering method according to an embodiment of the present application, and step 103 shown in fig. 3 may be implemented by executing steps 1031 to 1033 shown in fig. 5.
In step 1031, an audio rendering speed of the audio frames and an image rendering speed of the image frames are acquired.
In some embodiments, the audio rendering speed and the image rendering speed are different. The audio rendering speed of the audio frames is determined by the performance of the audio renderer used to render them and is proportional to that performance: the higher the performance of the audio renderer, the greater the audio rendering speed.
In some embodiments, the image rendering speed of the image frames is determined by the performance of the image renderer and is proportional to that performance: the higher the performance of the image renderer, the greater the image rendering speed.
In step 1032, each audio frame is sequentially rendered according to the sequence of the playing time based on the audio rendering speed.
As an example, when the audio rendering speed is 10 frames/sec, the audio frames are sequentially rendered in the order of the playing time at the audio rendering speed of 10 frames/sec.
In step 1033, each image frame is sequentially rendered according to the sequence of the playing time based on the image rendering speed.
As an example, when the image rendering speed is 5 frames/second, the image frames are sequentially rendered in the order of the playing time according to the image rendering speed of 5 frames/second.
In this way, all the image frames and audio frames of the video file to be rendered can be rendered in order, and the temporal order of image-frame and audio-frame rendering is effectively ensured.
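The following sketch shows why the two rendering progresses can drift apart: the renderers run on independent schedules. The rates (10 audio frames and 5 image frames per second, matching the examples above) and the callback parameters are illustrative assumptions.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Sketch: audio and image frames rendered on independent schedules. */
final class DualRateRenderer {
    private final ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(2);

    void start(Runnable renderNextAudioFrame, Runnable renderNextImageFrame) {
        // 10 audio frames per second: one frame every 100 ms.
        scheduler.scheduleAtFixedRate(renderNextAudioFrame, 0, 100, TimeUnit.MILLISECONDS);
        // 5 image frames per second: one frame every 200 ms.
        scheduler.scheduleAtFixedRate(renderNextImageFrame, 0, 200, TimeUnit.MILLISECONDS);
    }
}
```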
In step 104, audio recording is performed synchronously during the process of rendering the audio frames, so as to obtain recorded audio frames.
In some embodiments, during the rendering of the audio frames, each rendered audio frame is synchronously recorded, yielding one recorded audio frame per rendered audio frame. This facilitates the subsequent determination of the audio rendering delay from the recorded audio frames, so that the rendering progress can be controlled and synchronous rendering of the audio frames and the image frames achieved.
In some embodiments, audio frames are recorded for controlling the rendering progress of the audio frames to achieve synchronous rendering of the audio frames and the image frames.
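As a sketch of this synchronous recording step, the snippet below captures one recorded audio frame with Android's AudioRecord API, assuming an Android environment (the related art above already refers to Android's Audio Track). Capturing the device's own rendered output is platform dependent; the microphone source here is purely illustrative, and the RECORD_AUDIO permission is assumed to be granted.

```java
import android.media.AudioFormat;
import android.media.AudioRecord;
import android.media.MediaRecorder;

/** Sketch: record one audio frame while the audio renderer is playing. */
final class SyncRecorder {
    short[] recordOneFrame() {
        int sampleRate = 44100;
        int bufferSize = AudioRecord.getMinBufferSize(sampleRate,
                AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT);
        AudioRecord recorder = new AudioRecord(MediaRecorder.AudioSource.MIC,
                sampleRate, AudioFormat.CHANNEL_IN_MONO,
                AudioFormat.ENCODING_PCM_16BIT, bufferSize);
        recorder.startRecording();
        short[] frame = new short[bufferSize / 2]; // 16-bit samples
        recorder.read(frame, 0, frame.length);     // one recorded audio frame
        recorder.stop();
        recorder.release();
        return frame;
    }
}
```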
In step 105, the recorded audio frame is matched with the audio frame in the audio buffer to obtain a target audio frame.
In some embodiments, each recorded audio frame corresponds to a target audio frame, the target audio frame and the recorded audio frame are in a one-to-one correspondence, the audio frames stored in the audio buffer corresponding to each recorded audio frame are different, and the target audio frame corresponding to the recorded audio frame is an audio frame in the audio buffer corresponding to the recorded audio frame.
In some embodiments, referring to fig. 6, fig. 6 is a flowchart illustrating a video rendering method according to an embodiment of the present application, and step 105 shown in fig. 3 may be implemented by performing steps 1051 to 1054 shown in fig. 6 for each recorded audio frame.
In step 1051, the recording time of the recorded audio frame is acquired.
In some embodiments, the recording time of the recorded audio frame is a playing time corresponding to the recorded audio frame, and the recording time corresponding to the different recorded audio frames is different.
In step 1052, each audio frame stored in the audio buffer at the recording time is determined as a candidate audio frame.
In some embodiments, the audio frames stored in the audio buffer at different moments are different, each audio frame stored in the audio buffer at the moment of recording is determined to be a candidate audio frame, so that the audio frame to be compared can be conveniently selected from the candidate audio frames later, and then the target audio frame is determined based on the audio frame to be compared.
In step 1053, a plurality of audio frames to be compared whose playing time satisfies a preset condition are selected from the candidate audio frames.
In some embodiments, each candidate audio frame corresponds to a playing time, and the candidate audio frame whose playing time meets the preset condition is determined as the audio frame to be compared by judging whether the playing time corresponding to each candidate audio frame meets the preset condition.
In some embodiments, step 1053 described above may be implemented as follows: acquiring playing time of each candidate audio frame and a playing time threshold, wherein the playing time threshold is between the latest playing time and the earliest playing time of the candidate audio frame; and determining each candidate audio frame with the playing time later than the playing time threshold as an audio frame to be compared.
In some embodiments, the latest play time of the candidate audio frames refers to the latest play time of the play times corresponding to the candidate audio frames, and the earliest play time of the candidate audio frames refers to the earliest play time of the play times corresponding to the candidate audio frames.
In some embodiments, the playing time threshold is between the latest playing time and the earliest playing time, and the playing time threshold may be set based on the latest playing time and the earliest playing time, and the specific value of the playing time threshold does not form a limitation to the embodiment of the present application.
In this way, the candidate audio frames whose playing time is later than the playing time threshold are selected as the audio frames to be compared, so that only a subset of the frames in the audio buffer needs to be compared with the recorded audio frame, which effectively reduces the comparison time and improves the comparison efficiency.
In step 1054, the recorded audio frames are compared with each audio frame to be compared, and a target audio frame is determined from the plurality of audio frames to be compared based on the comparison result.
In some embodiments, step 1054 above may be implemented as follows: comparing the recorded audio frames with each audio frame to be compared to obtain a comparison result corresponding to each audio frame to be compared, wherein the comparison result represents the matching degree between the audio frames to be compared and the recorded audio frames; and determining the audio frame to be compared with the largest matching degree in the plurality of audio frames to be compared as a target audio frame.
In some embodiments, the matching degree between the audio frame to be compared and the recorded audio frame refers to a similarity degree between the audio frame to be compared and the recorded audio frame, and the similarity degree may be determined by an amplitude difference between audio signals corresponding to the audio frame to be compared and the recorded audio frame respectively.
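A minimal Java sketch of steps 1053 and 1054 follows, reusing the illustrative AudioFrame record from the buffer sketch above. The play-time threshold filter and the mean absolute amplitude difference are stand-ins for the preset condition and the comparison described in the text.

```java
import java.util.List;

/** Sketch of selecting frames to compare (step 1053) and matching (step 1054). */
final class FrameMatcher {

    /** Step 1053: keep the candidates whose playing time is later than the threshold. */
    static List<AudioFrame> selectFramesToCompare(List<AudioFrame> candidates, long thresholdMs) {
        return candidates.stream()
                .filter(f -> f.playTimeMs() > thresholdMs)
                .toList();
    }

    /** Step 1054: the frame with the smallest amplitude difference is the target frame. */
    static AudioFrame findTargetFrame(AudioFrame recorded, List<AudioFrame> toCompare) {
        AudioFrame best = null;
        double bestDiff = Double.MAX_VALUE;
        for (AudioFrame candidate : toCompare) {
            int n = Math.min(recorded.samples().length, candidate.samples().length);
            if (n == 0) {
                continue;
            }
            double diff = 0;
            for (int i = 0; i < n; i++) {
                diff += Math.abs(recorded.samples()[i] - candidate.samples()[i]);
            }
            diff /= n; // mean absolute amplitude difference
            if (diff < bestDiff) {
                bestDiff = diff;
                best = candidate;
            }
        }
        return best;
    }
}
```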
In step 106, the rendering schedule of the image frames of the video file to be rendered is adjusted based on the target audio frames to synchronize the rendering of the audio frames and the image frames of the video file to be rendered.
In some embodiments, the rendering schedule of the image frames of the video file to be rendered characterizes a ratio of the number of image frames that have been rendered to the total number of image frames of the video file to be rendered.
In some embodiments, referring to fig. 7, fig. 7 is a flowchart of a video rendering method according to an embodiment of the present application, and step 106 shown in fig. 3 may be implemented by executing steps 1061 to 1062 shown in fig. 7.
In step 1061, an audio rendering delay is determined based on the target audio frame.
In some embodiments, the audio rendering delay is used to characterize a difference between playing moments corresponding to the current rendered audio frame and the current rendered image frame, respectively.
In some embodiments, because the audio rendering speed and the image rendering speed are different, the rendering progress of the audio frame and the rendering progress of the image frame are different, so that a difference exists between the currently rendered audio frame and the currently rendered image frame, and the audio rendering delay can be determined through the target playing time corresponding to the target audio frame, so that the subsequent audio-video synchronization can be realized based on the audio rendering delay.
In some embodiments, step 1061 above may be implemented as follows: acquiring a target playing time corresponding to a target audio frame; acquiring playing time corresponding to each audio frame in an audio buffer zone, and determining the latest playing time from the acquired playing time; and determining the difference between the target playing time and the latest playing time as the audio rendering time delay.
As an example, referring to fig. 9, fig. 9 is a schematic diagram of a video rendering method according to an embodiment of the present application, in which recorded audio frames are compared with audio frames to be compared to obtain comparison results corresponding to the audio frames to be compared, where the comparison results represent a matching degree between the audio frames to be compared and the recorded audio frames; and determining the audio frame to be compared with the largest matching degree in the plurality of audio frames to be compared as a target audio frame, and determining the audio rendering time delay based on the target audio frame.
In some embodiments, the latest play time refers to the latest play time among play times corresponding to audio frames in the audio buffer. The target playing time refers to the playing time corresponding to the target audio frame.
Therefore, the target audio frame is accurately determined, the audio rendering time delay is determined based on the target audio frame, the subsequent audio-video synchronization based on the audio rendering time delay is facilitated, and timeliness and accuracy of the audio synchronization are effectively achieved.
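The delay computation of step 1061 can be sketched as follows, again using the illustrative AudioFrame record. The sign convention (which playing time is subtracted from which) follows the description above and is an assumption of this sketch.

```java
import java.util.Collection;

/** Sketch of step 1061: audio rendering delay from the target frame and the buffer. */
final class DelayCalculator {
    static long audioRenderingDelayMs(AudioFrame target, Collection<AudioFrame> buffered) {
        long latestPlayTimeMs = buffered.stream()
                .mapToLong(AudioFrame::playTimeMs)
                .max()
                .orElse(target.playTimeMs());
        // Difference between the target playing time and the latest playing time.
        return target.playTimeMs() - latestPlayTimeMs;
    }
}
```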
In step 1062, the rendering schedule of the image frames is adjusted based on the audio rendering delay to synchronize the rendering of the audio frames and the image frames.
In some embodiments, the rendering schedule of the image frames characterizes the rendering schedule of the image frames of the video file to be rendered, characterizing the ratio of the number of image frames that have been rendered to the total number of image frames of the video file to be rendered. The adjustment of the rendering progress of the image frames can be achieved by adjusting the rendering speed of the image frames, suspending the rendering of the image frames, and the like.
In some embodiments, step 1062 above may be implemented as follows: when the value of the audio rendering delay is larger than zero, delaying the rendering progress of the image frame based on the value of the audio rendering delay so as to synchronize the rendering of the audio frame and the image frame, wherein the value of the audio rendering delay is positively correlated with the delaying degree of the rendering progress; when the value of the audio rendering delay is smaller than zero, the rendering progress of the image frames is quickened based on the value of the audio rendering delay, so that the rendering of the audio frames and the image frames is synchronous, and the value of the audio rendering delay is positively correlated with the quickening degree of the rendering progress.
In some embodiments, when the value of the audio rendering delay is greater than zero, it indicates that the rendering progress of the audio frames lags behind that of the image frames; by delaying the rendering progress of the image frames, the audio frames can catch up with the image frames as soon as possible, so that the rendering of the audio frames and the image frames is synchronized.
In some embodiments, when the value of the audio rendering delay is less than zero, it indicates that the rendering progress of the audio frames is ahead of that of the image frames; by accelerating the rendering progress of the image frames, the image frames can catch up with the audio frames as soon as possible, so that the rendering of the audio frames and the image frames is synchronized.
In some embodiments, because the image rendering speed of the image frames differs from the audio rendering speed of the audio frames, the rendering progress of the audio frames and that of the image frames cannot always stay consistent, so the value of the audio rendering delay is usually not equal to zero. When the value of the audio rendering delay is equal to zero, the rendering progress of the audio frames and the image frames is consistent, i.e., the rendering of the audio frames and the image frames of the video file to be rendered is synchronized. Audio-video synchronization can therefore be achieved, to the greatest extent possible, by driving the value of the audio rendering delay toward zero. After the rendering progress of the image frames has been adjusted, the audio rendering delay equals zero and the two rendering progresses are consistent; however, because the image rendering speed still differs from the audio rendering speed during subsequent rendering, the progresses diverge again. The audio rendering delay is therefore calculated continuously, and the rendering progress of the image frames is adjusted continuously throughout the rendering process, so that the rendering of the audio frames and the image frames of the video file to be rendered is kept synchronized.
In some embodiments, since the image rendering speed differs from the audio rendering speed, the rendering of the audio frames and the image frames becomes asynchronous again after each synchronization achieved through the audio rendering delay. The audio rendering delay is therefore continuously recalculated, and the rendering progress of the image frames is continuously adjusted during rendering. Because each episode of desynchronization lasts only a short time, continuously re-synchronizing in this way yields rendering synchronization (i.e., audio-picture synchronization) whose residual error is imperceptible to the human eye, effectively improving the viewing experience and the video rendering quality.
In some embodiments, the foregoing delaying of the rendering progress of the image frames based on the value of the audio rendering delay may be implemented in either of the following ways: determining a pause duration based on the value of the audio rendering delay, where the pause duration is positively correlated with the value of the audio rendering delay, and suspending the rendering of the image frames for that pause duration; or acquiring the image rendering speed of the image frames and determining a first rendering speed based on the value of the audio rendering delay, where the first rendering speed is inversely correlated with the value of the audio rendering delay and is smaller than the audio rendering speed of the audio frames, and adjusting the image rendering speed to the first rendering speed.
In some embodiments, determining the pause duration based on the value of the audio rendering delay may be implemented as follows: querying, among a plurality of duration index entries in a database, a target index entry that includes the value of the audio rendering delay, and determining the duration in the target index entry as the pause duration.
In some embodiments, a duration index entry in the database characterizes a mapping between a value of the audio rendering delay and a duration.
In some embodiments, determining the first rendering speed based on the value of the audio rendering delay may be implemented as follows: querying, among a plurality of speed index entries in the database, a target index entry that includes the value of the audio rendering delay, and determining the rendering speed in the target index entry as the first rendering speed.
In some embodiments, the first rendering speed is inversely correlated with the value of the audio rendering delay: the larger the audio rendering delay, the larger the gap between the rendering progress of the audio frames and that of the image frames, and the smaller the corresponding first rendering speed. Adjusting the image rendering speed to the first rendering speed effectively slows the rendering progress of the image frames and ensures that the audio frames catch up with the rendering progress of the image frames as soon as possible, thereby achieving sound-picture synchronization.
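As a minimal sketch of the duration-index lookup described above, assuming a small in-memory table in place of the database of duration index entries (the bucket boundaries and pause durations are illustrative assumptions, not values from this disclosure):

```python
# Hypothetical duration index entries: (delay upper bound in ms, pause in s).
DURATION_INDEX = [
    (50, 0.02),
    (200, 0.10),
    (500, 0.30),
]

def pause_duration_for(delay_ms: float) -> float:
    """Return a pause duration that grows with the audio rendering delay
    (positive correlation), mirroring the duration index entries."""
    for upper_bound_ms, pause_s in DURATION_INDEX:
        if delay_ms <= upper_bound_ms:
            return pause_s
    return 0.5  # cap for very large delays

# Example: a 120 ms audio rendering delay maps to pausing image
# rendering for 0.10 s.
print(pause_duration_for(120))  # -> 0.1
```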
In some embodiments, the foregoing accelerating of the rendering progress of the image frames based on the value of the audio rendering delay may be implemented as follows: acquiring the image rendering speed of the image frames, and determining a second rendering speed based on the value of the audio rendering delay, where the second rendering speed is positively correlated with the value of the audio rendering delay and is greater than the audio rendering speed of the audio frames; and adjusting the image rendering speed to the second rendering speed.
In some embodiments, determining the second rendering speed based on the value of the audio rendering delay may be implemented as follows: querying, among a plurality of speed index entries in the database, a target index entry that includes the value of the audio rendering delay, and determining the rendering speed in the target index entry as the second rendering speed.
In some embodiments, the second rendering speed is positively correlated with the value of the audio rendering delay: the larger the audio rendering delay, the larger the gap between the rendering progress of the audio frames and that of the image frames, and the larger the corresponding second rendering speed. Adjusting the image rendering speed to the second rendering speed effectively accelerates the rendering progress of the image frames and ensures that the image frames catch up with the rendering progress of the audio frames as soon as possible, thereby achieving sound-picture synchronization.
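Since the first and second rendering speeds are symmetric, one hypothetical helper can illustrate both adjustments; the linear scale factors below are illustrative assumptions only:

```python
def adjusted_image_speed(delay_ms: float, audio_speed: float) -> float:
    """Return a new image rendering speed from the audio rendering delay.

    delay_ms > 0 (audio lags): a first rendering speed, inversely
    correlated with the delay and below the audio rendering speed.
    delay_ms < 0 (audio leads): a second rendering speed, positively
    correlated with |delay| and above the audio rendering speed.
    """
    if delay_ms > 0:
        return audio_speed * max(0.1, 1.0 - delay_ms / 1000.0)  # slow down
    if delay_ms < 0:
        return audio_speed * (1.0 + (-delay_ms) / 1000.0)       # speed up
    return audio_speed  # delay is zero: already synchronized

# With audio at 1.0x: a 300 ms audio lag slows the images to 0.7x,
# while a 300 ms audio lead speeds them up to 1.3x.
print(adjusted_image_speed(300, 1.0), adjusted_image_speed(-300, 1.0))
```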
In this way, the audio frames and the image frames of the video file to be rendered are acquired and rendered; during the rendering of the audio frames, audio is recorded synchronously to obtain recorded audio frames; the recorded audio frames are matched against the audio frames in the audio buffer to obtain a target audio frame; and the rendering progress of the image frames is adjusted based on the target audio frame. Because the recorded audio frames are captured synchronously during rendering, they characterize the latest rendering progress, so matching them against the audio frames in the audio buffer yields an accurate target audio frame that precisely characterizes the gap between the rendering progress of the audio frames and that of the image frames. Adjusting the rendering progress of the image frames based on this target audio frame keeps it synchronized with the rendering progress of the audio frames, effectively improving the accuracy of synchronous rendering of the audio frames and the image frames.
In the following, an exemplary application of an embodiment of the present application in an actual video rendering scenario is described.
During playback of a video source, from the user's perspective the viewing experience is comfortable only if the sound and the image are rendered synchronously; the human body is sensitive to desynchronized rendering of sound and image, and once the difference between the image and sound rendering timestamps exceeds roughly 100 milliseconds, the desynchronization begins to be perceptible. The ability to synchronize audio and video is therefore a minimum requirement for player software.

The rendering of images resembles a fast slide show. In current mainstream video sources the video frame rate is 25 fps, i.e., 25 image frames must be rendered per second, roughly one every 40 ms; images with rendering timestamps of approximately 0 ms, 40 ms, 80 ms, 120 ms, and so on are drawn to the screen and are perceived as continuous video. The rendering of sound differs somewhat from that of video, because the human ear is extremely sensitive to sound: the sampling rate of current mainstream audio reaches 48000 Hz, i.e., the sound signal is sampled 48000 times per second, so 48000 samples per second must also be reconstructed during playback; audio is divided into frames of tens of milliseconds each, i.e., the audio renderer takes tens of milliseconds to render one frame of audio.

A mobile audio renderer has its own buffer: before a frame of audio is rendered, the audio data must be fed into the buffer, and the renderer then fetches data from the buffer for rendering, so audio rendering carries a rendering delay. For example, if the system buffer of the audio renderer can hold 1 second of audio data and 500 milliseconds of data in the buffer are currently unconsumed, the audio rendering delay is 500 milliseconds. This delay must be factored into audio-video synchronization; if it is ignored, the desynchronization may reach 500 milliseconds, which a human can perceive. On mobile terminals, because of the large number of device models, many devices compute the audio rendering delay inaccurately; in particular, when Bluetooth headphones are worn, the audio delay reported by the system interface can differ from the actual audio delay by up to 1 second. The present application provides a solution: using the recording capability of the device, the recorded audio is analyzed and compared against a backup of the audio data fed into the buffer to obtain the timestamp of the audio frame actually being rendered; the image rendering timestamp is then compared against this timestamp to achieve sound-picture synchronization.
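As a worked example of the buffer arithmetic above, this sketch converts the unconsumed bytes in an audio renderer buffer into a rendering delay in milliseconds, assuming 48000 Hz, 16-bit, stereo PCM (assumed parameters for illustration):

```python
def buffer_delay_ms(unconsumed_bytes: int,
                    sample_rate_hz: int = 48000,
                    channels: int = 2,
                    bytes_per_sample: int = 2) -> float:
    """Unconsumed buffer bytes -> audio rendering delay in milliseconds."""
    bytes_per_second = sample_rate_hz * channels * bytes_per_sample
    return 1000.0 * unconsumed_bytes / bytes_per_second

# 500 ms of 48 kHz, 16-bit, stereo PCM occupies 96000 bytes:
print(buffer_delay_ms(96000))  # -> 500.0
```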
In some embodiments, referring to fig. 8, fig. 8 is a flowchart of a video rendering method according to an embodiment of the present application, where the video rendering method according to the embodiment of the present application may be implemented by steps 201 to 207 shown in fig. 8.
In step 201, the protocol layer of the network data in the stream input is removed.
In some embodiments, the network data is parsed according to a network protocol, such as the HTTP protocol: the HTTP protocol data is separated from the data transmitted by the application layer, so as to obtain streaming media data, all of which is data in a container encapsulation format, such as the MP4 format.
In step 202, the de-protocolized media container data is unpacked to obtain audio frames and video frames.
In some embodiments, the data is fed into a media data parser, which removes the container encapsulation and outputs the resulting audio and video frames (the video frames being the image frames described above), together with the metadata carried by the frames, including the frame size, the display timestamp, and the like.
In step 203, the video frame is decoded.
In step 204, the audio frame is decoded.
In some embodiments, audio and video decoders are of different types, and different decoders are used to decode the compressed audio frames and video frames into renderable data.
In step 205, audio-video synchronization is performed based on the PCM data and the frame images.
In some embodiments, because different decoders are used to decode the audio and the video, the decoding speeds and rendering speeds are inconsistent, so audio-video synchronization is required; otherwise the audio and video appear out of sync.
In step 206, audio is rendered.
In step 207, the video is rendered.
In some embodiments, the decoded and synchronized renderable frames are fed into the audio and video renderers, so that the sound can be heard and the image seen at the same time.
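Steps 201 to 207 form a conventional playback pipeline; the following skeleton strings the stages together, with every stage passed in as a callable because the concrete implementations are device- and format-specific (all names are hypothetical placeholders, not APIs from this disclosure):

```python
def play(stream: bytes, deprotocolize, demux, decode_audio, decode_video,
         synchronize, render_audio, render_video) -> None:
    """Hypothetical pipeline mirroring steps 201-207; each stage is passed
    in as a callable so the skeleton stays self-contained."""
    container_data = deprotocolize(stream)              # step 201: strip protocol
    audio_frames, video_frames = demux(container_data)  # step 202: unpack container
    images = [decode_video(f) for f in video_frames]    # step 203: video decoder
    pcm = [decode_audio(f) for f in audio_frames]       # step 204: audio decoder
    # Decoding eagerly for brevity; a real player decodes incrementally.
    for pcm_frame, image in synchronize(pcm, images):   # step 205: A/V sync
        render_audio(pcm_frame)                         # step 206
        render_video(image)                             # step 207
```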
The embodiment of the present application uses an audio delay calculation method that is controllable at the application layer to obtain a reliable buffer delay value; it is therefore applicable to all device models and solves the sound-picture desynchronization caused by inaccurate audio software buffer delays obtained from the system side.
Referring to fig. 4, fig. 4 shows the basic principle of audio-video synchronization. Both audio frames and video frames must pass through a synchronizer before rendering, and the synchronizer generally uses the audio as the master clock for synchronization. Suppose the audio software-layer buffer holds 500 ms of data: after the audio frame with a display timestamp of 500 ms has been fed in, the audio renderer begins to render the frame at the head of the audio buffer queue, i.e., the 0 ms audio frame. The audio timestamp cached by the synchronizer is therefore 500 ms, while the real audio rendering timestamp is 0 ms; for the result to appear synchronized when the video is rendered, the video frame fed out should be the 0 ms one, the time from the synchronizer to the screen being essentially negligible. This yields the formula:
video_pts_delta = video_pts - audio_pts + audio_latency_time    (1)
wherein video_pts represents the playing instant of the image frame, audio_pts represents the playing instant of the audio frame, audio_latency_time represents the audio rendering delay, and video_pts_delta represents the rendering error.
In some embodiments, synchronization is at its most accurate only when the rendering error video_pts_delta is 0. The present solution mainly starts audio recording while the audio is rendered: the most recently recorded audio frames of the system's built-in sound are analyzed and matched against the sound data fed to the audio renderer, which yields the display timestamp most recently rendered by the audio software layer; the display timestamp of the frame most recently fed into the audio renderer is cached when that frame is written. Thus audio_latency_time = (playing instant of the audio frame most recently written into the audio renderer) - (playing instant of the most recently rendered frame obtained by analysis).
The audio frames fed to the audio renderer are copied into an audio delay analyzer together with their audio frame timestamps, and an eviction mechanism may be adopted: since analysis of data across device models shows the audio rendering delay to be under 2 seconds, audio frames more than 2 seconds older than the most recently rendered frame are evicted. Audio recording is started at the same time, with the recording parameters set identical to the audio parameters of the audio renderer. Finally, by matching the recorded audio data against the data fed into the audio renderer, the audio frame with the highest matching degree is obtained and its display timestamp taken; this timestamp is subtracted from the display timestamp of the audio frame most recently written into the audio renderer, and the resulting difference is output as the audio rendering delay.
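A minimal sketch of such an audio delay analyzer, assuming frames are matched by exact byte equality for simplicity (a real matcher would tolerate recording noise, e.g., via cross-correlation); the 2-second eviction window follows the description above, while all names and the frame representation are illustrative assumptions:

```python
from collections import deque
from typing import Deque, Optional, Tuple

class AudioDelayAnalyzer:
    """Holds copies of the audio frames fed to the audio renderer and
    matches recorded audio against them to estimate audio_latency_time."""

    def __init__(self, max_age_ms: int = 2000):
        self.frames: Deque[Tuple[int, bytes]] = deque()  # (pts_ms, pcm) copies
        self.newest_written_pts = 0
        self.max_age_ms = max_age_ms  # observed delays stay under 2 s

    def on_frame_written(self, pts_ms: int, pcm: bytes) -> None:
        """Copy each frame, with its timestamp, as it is fed to the renderer."""
        self.frames.append((pts_ms, pcm))
        self.newest_written_pts = pts_ms
        while self.frames and pts_ms - self.frames[0][0] > self.max_age_ms:
            self.frames.popleft()  # evict copies older than the window

    def delay_ms(self, recorded_pcm: bytes) -> Optional[int]:
        """audio_latency_time = pts of the newest written frame minus pts of
        the frame matched against the recording (the frame actually rendered)."""
        matched_pts = next(
            (pts for pts, pcm in self.frames if pcm == recorded_pcm), None)
        if matched_pts is None:
            return None
        return self.newest_written_pts - matched_pts
```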
Thus, with the video rendering method provided by the embodiment of the present application, once this technical solution is integrated into a player, users watching video in an APP that uses the player will not encounter sound-picture desynchronization.
It will be appreciated that in the embodiments of the present application, related data such as video files to be rendered is involved, when the embodiments of the present application are applied to specific products or technologies, user permissions or consents need to be obtained, and the collection, use and processing of related data need to comply with related laws and regulations and standards of related countries and regions.
Continuing, the following describes an exemplary architecture of the video rendering device 455 provided by embodiments of the present application and implemented as software modules. In some embodiments, as shown in fig. 2, the software modules stored in the video rendering device 455 of the memory 450 may include: an obtaining module 4551, configured to obtain an audio frame of a video file to be rendered from an audio buffer for buffering the audio frame, and to obtain an image frame of the video file to be rendered from the video file to be rendered; a rendering module 4552, configured to render the audio frame and the image frame respectively, and to perform audio recording synchronously in the process of rendering the audio frame to obtain a recorded audio frame; a matching module 4553, configured to match the recorded audio frame with the audio frames in the audio buffer to obtain a target audio frame; and an adjusting module 4554, configured to adjust the rendering progress of the image frame of the video file to be rendered based on the target audio frame, so as to synchronize the rendering of the audio frame and the image frame of the video file to be rendered.
In some embodiments, the video rendering device 455 further includes: a storage module, configured to acquire the video file to be rendered and the audio buffer, where the video file to be rendered includes audio frames and image frames corresponding to each playing time, and the maximum number of audio frames the audio buffer can store is smaller than the total number of audio frames in the video file to be rendered; and, when the audio buffer has not reached the maximum storage load, to sequentially store the audio frames corresponding to the playing moments into the audio buffer in order of playing moment; and a deleting module, configured to delete the acquired audio frames from the audio buffer.
In some embodiments, the obtaining module 4551 is further configured to, when the audio buffer reaches the maximum storage load, sequentially obtain the audio frames from the audio buffer for buffering the audio frames according to the sequence of the playing time. The obtaining module is further configured to sequentially obtain the image frames of the video file to be rendered from the video file to be rendered according to the sequence of the playing time.
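A minimal sketch of the bounded buffer behavior these modules describe, assuming frames are tagged with their playing moments in milliseconds (the capacity and frame representation are illustrative assumptions):

```python
from collections import deque

class AudioBuffer:
    """Bounded FIFO of (play_ms, frame) pairs: frames are stored in order
    of playing moment while capacity remains, and are consumed (then
    deleted) from the head once acquired for rendering."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.queue = deque()

    def store(self, play_ms: int, frame: bytes) -> bool:
        if len(self.queue) >= self.capacity:  # maximum storage load reached
            return False                      # producer must wait for space
        self.queue.append((play_ms, frame))
        return True

    def acquire(self):
        """Fetch the earliest frame for rendering and delete it from the
        buffer, freeing space for frames at later playing moments."""
        return self.queue.popleft() if self.queue else None
```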
In some embodiments, the rendering module 4552 is further configured to obtain an audio rendering speed of the audio frame and an image rendering speed of the image frame, where the audio rendering speed and the image rendering speed are different; based on the audio rendering speed, sequentially rendering each audio frame according to the sequence of the playing time; and based on the image rendering speed, sequentially rendering each image frame according to the sequence of the playing time.
In some embodiments, the matching module 4553 is further configured to obtain the recording time of the recorded audio frame; determine each audio frame stored in the audio buffer at the recording time as a candidate audio frame, where the audio frames stored in the audio buffer at different times are different; select, from the candidate audio frames, a plurality of audio frames to be compared whose playing times satisfy a preset condition; and compare the recorded audio frame with each audio frame to be compared respectively, determining the target audio frame from the plurality of audio frames to be compared based on the comparison results.
In some embodiments, the matching module 4553 is further configured to obtain a playing time of each candidate audio frame, and a playing time threshold, where the playing time threshold is between a latest playing time and an earliest playing time of the candidate audio frame; and determining each candidate audio frame with the playing time later than the playing time threshold as an audio frame to be compared.
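A small sketch of this candidate selection and comparison, assuming playing moments in milliseconds and a hypothetical similarity score that counts equal bytes (the preset condition and the scoring are illustrative assumptions, not the disclosure's matching algorithm):

```python
from typing import List, Optional, Tuple

Frame = Tuple[int, bytes]  # (playing moment in ms, PCM data)

def select_to_compare(candidates: List[Frame], threshold_ms: int) -> List[Frame]:
    """Preset condition: keep only candidates whose playing moment is
    later than the playing-time threshold, shrinking the comparison set."""
    return [(p, d) for p, d in candidates if p > threshold_ms]

def best_match(recorded: bytes, to_compare: List[Frame]) -> Optional[Frame]:
    """Pick the frame most similar to the recorded audio; similarity here
    is an illustrative count of equal bytes at equal offsets."""
    def score(pcm: bytes) -> int:
        return sum(a == b for a, b in zip(recorded, pcm))
    return max(to_compare, key=lambda f: score(f[1]), default=None)
```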
In some embodiments, the adjusting module 4554 is further configured to determine an audio rendering delay based on the target audio frame; the audio rendering delay is used for representing the difference value between playing moments corresponding to the current rendered audio frame and the current rendered image frame respectively; based on the audio rendering delay, the rendering progress of the image frames is adjusted to synchronize the rendering of the audio frames and the image frames.
In some embodiments, the adjusting module 4554 is further configured to obtain a target playing time corresponding to the target audio frame; acquiring playing time corresponding to each audio frame in an audio buffer zone, and determining the latest playing time from the acquired playing time; and determining the difference between the target playing time and the latest playing time as the audio rendering time delay.
In some embodiments, the adjusting module 4554 is further configured to, when the value of the audio rendering delay is greater than zero, delay the rendering progress of the image frame based on the value of the audio rendering delay, so as to synchronize the rendering of the audio frame and the image frame, where the value of the audio rendering delay is positively related to the delay degree of the rendering progress; when the value of the audio rendering delay is smaller than zero, the rendering progress of the image frames is quickened based on the value of the audio rendering delay, so that the rendering of the audio frames and the image frames is synchronous, and the value of the audio rendering delay is positively correlated with the quickening degree of the rendering progress.
In some embodiments, the adjusting module 4554 is further configured to determine a pause duration based on the value of the audio rendering delay, where the pause duration is positively correlated with the value of the audio rendering delay, and to suspend the rendering of the image frames for that pause duration; or to acquire the image rendering speed of the image frames and determine a first rendering speed based on the value of the audio rendering delay, where the first rendering speed is inversely correlated with the value of the audio rendering delay and is smaller than the audio rendering speed of the audio frames, and to adjust the image rendering speed to the first rendering speed.
In some embodiments, the adjusting module 4554 is further configured to obtain an image rendering speed of the image frame, and determine a second rendering speed based on a value of the audio rendering delay; the second rendering speed is positively related to the value of the audio rendering delay, and the second rendering speed is larger than the audio rendering speed of the audio frame; the image rendering speed is adjusted to a second rendering speed.
Embodiments of the present application provide a computer program product comprising a computer program or computer-executable instructions stored in a computer-readable storage medium. The processor of the electronic device reads the computer-executable instructions from the computer-readable storage medium, and the processor executes the computer-executable instructions, so that the electronic device executes the video rendering method according to the embodiment of the application.
Embodiments of the present application provide a computer-readable storage medium having stored therein computer-executable instructions that, when executed by a processor, cause the processor to perform a video rendering method provided by embodiments of the present application, for example, a video rendering method as shown in fig. 3.
In some embodiments, the computer-readable storage medium may be an FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface memory, optical disc, or CD-ROM; it may also be any of various electronic devices including one or any combination of the above memories.
In some embodiments, computer-executable instructions may be written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages, in the form of programs, software modules, scripts, and they may be deployed in any form, including as stand-alone programs or as modules, components, subroutines, or other units suitable for use in a computing environment.
As an example, computer-executable instructions may, but need not, correspond to files in a file system, may be stored as part of a file that holds other programs or data, such as in one or more scripts in a hypertext markup language (HTML, hyper Text Markup Language) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, subroutines).
As an example, computer-executable instructions may be deployed to be executed on one electronic device or on multiple electronic devices located at one site or distributed across multiple sites and interconnected by a communication network.
In summary, the embodiments of the present application have the following beneficial effects:
(1) The audio frames and the image frames of the video file to be rendered are acquired and rendered; during the rendering of the audio frames, audio recording is performed synchronously to obtain recorded audio frames; the recorded audio frames are matched against the audio frames in the audio buffer to obtain a target audio frame; and the rendering progress of the image frames is adjusted based on the target audio frame. Because the recorded audio frames are recorded synchronously during rendering, they characterize the latest rendering progress, so matching them against the audio frames in the audio buffer yields an accurate target audio frame that precisely characterizes the gap between the rendering progress of the audio frames and that of the image frames; adjusting the rendering progress of the image frames based on this target audio frame keeps it synchronized with the rendering progress of the audio frames, effectively improving the accuracy of synchronous rendering of the audio frames and the image frames.
(2) When the audio buffer reaches its maximum storage load, it cannot store further audio frames. The audio frames are therefore acquired from the audio buffer sequentially in order of playing time, and the acquired audio frames are deleted from the audio buffer to release its storage space; once deletion brings the buffer below its maximum storage load, the audio frames corresponding to subsequent playing times are stored into the buffer sequentially in order of playing time. This effectively preserves the buffer's role of temporarily storing audio frames, ensures that all audio frames of the video file to be rendered can pass through the buffer in order for rendering and playback through a loudspeaker, and thereby effectively guarantees the temporal order of audio rendering.
(3) The image frames of the video file to be rendered are acquired from the video file sequentially in order of playing time, which facilitates rendering the acquired image frames, ensures that all image frames of the video file to be rendered can be rendered in order, and effectively guarantees the temporal order of image frame rendering.
(4) By rendering each audio frame and each image frame sequentially in order of playing time, at the audio rendering speed and the image rendering speed respectively, all image frames and audio frames of the video file to be rendered can be rendered in order, effectively guaranteeing the temporal order of rendering of the image frames and the audio frames.
(5) By means of the playing time threshold, candidate audio frames whose playing time is later than the playing time threshold are selected as the audio frames to be compared, ensuring that the selected audio frames to be compared are those with relatively earlier playing times in the audio buffer, which effectively reduces the subsequent audio frame comparison time and improves comparison efficiency.
(6) The target audio frame is accurately determined and the audio rendering delay is determined based on it, which facilitates subsequent sound-picture synchronization based on the audio rendering delay and effectively achieves timely and accurate audio-video synchronization.
(7) The second rendering speed is positively correlated with the value of the audio rendering delay: the larger the audio rendering delay, the larger the gap between the rendering progress of the audio frames and that of the image frames, and the larger the corresponding second rendering speed. Adjusting the image rendering speed to the second rendering speed effectively accelerates the rendering progress of the image frames and ensures that the image frames catch up with the rendering progress of the audio frames as soon as possible, thereby achieving sound-picture synchronization.
(8) Because the image rendering speed of the image frames differs from the audio rendering speed of the audio frames, the rendering of the two becomes asynchronous again after each synchronization achieved through the audio rendering delay. The audio rendering delay is therefore continuously recalculated and the rendering progress of the image frames continuously adjusted during rendering, continuously re-establishing rendering synchronization of the audio frames and the image frames. Because each asynchronous interval is relatively short, it cannot be perceived by viewers, so this continuous re-synchronization achieves imperceptible rendering synchronization of the audio frames and the image frames (i.e., sound-picture synchronization), effectively improving the viewing experience of the viewer and the video rendering quality.
The foregoing describes merely exemplary embodiments of the present application and is not intended to limit the scope of protection of the present application. Any modification, equivalent replacement, improvement, and the like made within the spirit and scope of the present application shall fall within the protection scope of the present application.

Claims (15)

1. A method of video rendering, the method comprising:
acquiring an audio frame of a video file to be rendered from an audio buffer area for buffering the audio frame;
acquiring an image frame of the video file to be rendered from the video file to be rendered;
respectively rendering the audio frames and the image frames, and synchronously recording the audio in the process of rendering the audio frames to obtain recorded audio frames;
matching the recorded audio frame with the audio frame in the audio buffer area to obtain a target audio frame;
and based on the target audio frame, adjusting the rendering progress of the image frame of the video file to be rendered so as to synchronize the rendering of the audio frame and the image frame of the video file to be rendered.
2. The method of claim 1, wherein prior to obtaining the audio frames of the video file to be rendered from the audio buffer for buffering the audio frames, the method further comprises:
acquiring the video file to be rendered and the audio buffer area, wherein the video file to be rendered comprises audio frames and image frames corresponding to each playing time, and the maximum number of the audio frames which can be stored in the audio buffer area is smaller than the total number of the audio frames in the video file to be rendered;
when the audio buffer area does not reach the maximum storage load, sequentially storing the audio frames corresponding to the playing moments into the audio buffer area according to the sequence of the playing moments;
after the audio frames of the video file to be rendered are acquired from the audio buffer area for buffering the audio frames, the method further comprises:
deleting the acquired audio frames from the audio buffer area.
3. The method of claim 2, wherein the obtaining the audio frames of the video file to be rendered from the audio buffer for buffering the audio frames comprises:
When the audio buffer area reaches the maximum storage load, sequentially acquiring the audio frames from the audio buffer area for caching the audio frames according to the sequence of the playing time;
the obtaining the image frame of the video file to be rendered from the video file to be rendered includes:
and acquiring the image frames from the video file to be rendered in sequence according to the sequence of the playing time.
4. The method of claim 2, wherein rendering the audio frame and the image frame, respectively, comprises:
acquiring an audio rendering speed of the audio frame and an image rendering speed of the image frame, wherein the audio rendering speed and the image rendering speed are different;
based on the audio rendering speed, sequentially rendering each audio frame according to the sequence of the playing time; and
based on the image rendering speed, sequentially rendering each image frame according to the sequence of the playing time.
5. The method of claim 1, wherein said matching the recorded audio frames to the audio frames in the audio buffer results in target audio frames, comprising:
acquiring the recording time of the recorded audio frame;
determining each audio frame stored in the audio buffer area at the recording time as a candidate audio frame, wherein the audio frames stored in the audio buffer area at different times are different;
selecting, from the candidate audio frames, a plurality of audio frames to be compared whose playing moments satisfy a preset condition; and
comparing the recorded audio frame with each of the audio frames to be compared respectively, and determining the target audio frame from the plurality of audio frames to be compared based on a comparison result.
6. The method according to claim 5, wherein selecting a plurality of audio frames to be compared whose playing time satisfies a preset condition from the candidate audio frames comprises:
acquiring a playing time of each candidate audio frame and a playing time threshold, wherein the playing time threshold is between the latest playing time and the earliest playing time of the candidate audio frames;
and determining each candidate audio frame with the playing time later than the playing time threshold as the audio frame to be compared.
7. The method of claim 1, wherein adjusting the rendering progress of the image frames of the video file to be rendered based on the target audio frame comprises:
determining an audio rendering delay based on the target audio frame;
the audio rendering delay is used for representing the difference value between playing moments corresponding to the audio frame currently rendered and the image frame currently rendered respectively; and
adjusting the rendering progress of the image frame based on the audio rendering delay so as to synchronize the rendering of the audio frame and the image frame.
8. The method of claim 7, wherein the determining an audio rendering delay based on the target audio frame comprises:
acquiring a target playing time corresponding to the target audio frame;
acquiring the playing time corresponding to each audio frame in the audio buffer area, and determining the latest playing time from the acquired playing times;
and determining the difference between the target playing time and the latest playing time as the audio rendering time delay.
9. The method of claim 7, wherein adjusting the rendering progress of the image frame based on the audio rendering delay to synchronize the rendering of the audio frame and the image frame comprises:
when the value of the audio rendering delay is larger than zero, delaying the rendering progress of the image frame based on the value of the audio rendering delay so as to synchronize the rendering of the audio frame and the image frame, wherein the value of the audio rendering delay is positively correlated with the delaying degree of the rendering progress; and
when the value of the audio rendering delay is smaller than zero, accelerating the rendering progress of the image frame based on the value of the audio rendering delay so as to synchronize the rendering of the audio frame and the image frame, wherein the value of the audio rendering delay is positively correlated with the accelerating degree of the rendering progress.
10. The method of claim 9, wherein the delaying the rendering progress of the image frame based on the value of the audio rendering delay comprises:
determining a pause duration based on the value of the audio rendering delay, wherein the pause duration is positively correlated with the value of the audio rendering delay, and suspending the rendering of the image frame according to the pause duration; or
acquiring an image rendering speed of the image frame, and determining a first rendering speed based on the value of the audio rendering delay;
wherein the first rendering speed is inversely correlated with the value of the audio rendering delay, and the first rendering speed is smaller than the audio rendering speed of the audio frame; and
adjusting the image rendering speed to the first rendering speed.
11. The method of claim 9, wherein the accelerating the rendering progress of the image frame based on the value of the audio rendering delay comprises:
acquiring an image rendering speed of the image frame, and determining a second rendering speed based on the value of the audio rendering delay;
wherein the second rendering speed is positively correlated with the value of the audio rendering delay, and the second rendering speed is greater than the audio rendering speed of the audio frame; and
adjusting the image rendering speed to the second rendering speed.
12. A video rendering device, the device comprising:
The acquisition module is used for acquiring the audio frames of the video file to be rendered from the audio buffer area for buffering the audio frames; acquiring an image frame of the video file to be rendered from the video file to be rendered;
The rendering module is used for respectively rendering the audio frames and the image frames, and synchronously recording the audio in the process of rendering the audio frames to obtain recorded audio frames;
the matching module is used for matching the recorded audio frames with the audio frames in the audio buffer area to obtain target audio frames;
And the adjusting module is used for adjusting the rendering progress of the image frames of the video file to be rendered based on the target audio frames so as to synchronize the rendering of the audio frames and the image frames of the video file to be rendered.
13. An electronic device, the electronic device comprising:
A memory for storing computer executable instructions or computer programs;
A processor for implementing the video rendering method of any one of claims 1 to 11 when executing computer-executable instructions or computer programs stored in the memory.
14. A computer readable storage medium storing computer executable instructions which when executed by a processor implement the video rendering method of any one of claims 1 to 11.
15. A computer program product comprising a computer program or computer-executable instructions which, when executed by a processor, implement the video rendering method of any one of claims 1 to 11.