WO2022193141A1

WO2022193141A1 - Multimedia file playing method and related apparatus

Info

Publication number: WO2022193141A1
Application number: PCT/CN2021/081127
Authority: WO
Inventors: 刘秦涛
Original assignee: 华为技术有限公司
Priority date: 2021-03-16
Filing date: 2021-03-16
Publication date: 2022-09-22
Also published as: CN116965038A

Abstract

Embodiments of the present application provide a multimedia file playing method and a related apparatus. The method may comprise: loading a multimedia file into a memory of a player; parsing the multimedia file to obtain a plurality of code stream packets, wherein the code stream packets comprise audio data packets and video data packets; and continuously loading a first number of audio data packets into an audio decoder, and continuously loading a second number of video data packets into a video decoder, wherein the first number is an integer greater than 1, and the second number is an integer greater than 1. According to the embodiments of the present application, playback stuttering caused by frequently jumping to load the corresponding code stream packets can be reduced while ensuring that audio data packets and video data packets can be played synchronously.

Description

Multimedia file playback method and related device

Technical field

The present invention relates to the technical field of multimedia, and in particular, to a method and a related device for playing multimedia files.

Background

With the development of computer technology, it has become increasingly common to record and process multimedia files (such as short videos) on electronic devices such as mobile phones. Generally, a multimedia file contains two parts: description information (also called a description block) and a plurality of code stream packets. The code stream packets may include processed audio data packets and video data packets, which are normally stored sequentially in a uniformly interleaved manner.

However, owing to differences between devices or software, the storage locations of the code stream packets in a recorded or processed multimedia file may change, so that the audio data packets and the video data packets are no longer uniformly interleaved in sequence. In some scenarios this causes stuttering when the multimedia file is played, degrading the user experience.

SUMMARY OF THE INVENTION

The embodiments of the present application disclose a method and a related device for playing a multimedia file, which can ensure that audio data packets and video data packets are played synchronously while reducing the playback stuttering caused by frequently jumping to load the corresponding code stream packets.

A first aspect of the embodiments of the present application provides a method for playing a multimedia file. The method may include: loading the multimedia file into the memory of a player; parsing the multimedia file to obtain a plurality of code stream packets, the code stream packets including audio data packets and video data packets; and then continuously loading a first number of the audio data packets into an audio decoder and continuously loading a second number of the video data packets into a video decoder, wherein the first number is an integer greater than 1 and the second number is an integer greater than 1.

In the embodiments of the present application, more than one audio data packet is continuously loaded into the audio decoder each time, and more than one video data packet is continuously loaded into the video decoder each time. Compared with the prior-art approach of loading one audio data packet into the audio decoder and one video data packet into the video decoder at a time, this reduces the number of jumps needed to load code stream packets. While still ensuring that audio data packets and video data packets can be played synchronously, it increases the speed at which code stream packets are read and avoids playback freezes caused by data packet underrun.
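The effect on the number of jumps can be illustrated with a small sketch. The file layout (all audio packets stored ahead of all video packets, each packet one unit long) and the helper names `interleave` and `count_jumps` are assumptions made for illustration only; they do not come from the application.

```python
from typing import List


def interleave(audio: List[int], video: List[int], batch: int) -> List[int]:
    """Return the order in which storage positions are read when `batch`
    audio packets and then `batch` video packets are loaded alternately."""
    order: List[int] = []
    for i in range(0, max(len(audio), len(video)), batch):
        order += audio[i:i + batch] + video[i:i + batch]
    return order


def count_jumps(positions: List[int], packet_len: int = 1) -> int:
    """Count reads that do not start where the previous packet ended."""
    return sum(
        1 for prev, cur in zip(positions, positions[1:])
        if cur != prev + packet_len
    )


# Hypothetical unevenly interleaved file: audio at offsets 0..3, video at 4..7.
audio_pos = [0, 1, 2, 3]
video_pos = [4, 5, 6, 7]

per_packet = interleave(audio_pos, video_pos, batch=1)  # one packet at a time
batched = interleave(audio_pos, video_pos, batch=2)     # two packets at a time

print(count_jumps(per_packet), count_jumps(batched))  # prints "7 3"
```

In this toy layout, loading two packets per decoder at a time needs three jumps instead of seven; real files and players will behave differently.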

According to the first aspect, in a possible implementation manner, the first number of audio data packets corresponds to a first playback time period, the second number of video data packets corresponds to a second playback time period, and the deviation between the first playback time period and the second playback time period is less than a preset time difference threshold.

According to the first aspect, in a possible implementation manner, the first playback time period corresponding to the first number of audio data packets has a duration of at least 1 second, and the second playback time period corresponding to the second number of video data packets has a duration of at least 1 second.

It can be seen that the first number of audio data packets covers a certain playback duration on the player, and the second number of video data packets likewise covers a certain playback duration. The above way of loading code stream packets therefore no longer synchronizes and aligns on the display timestamp of each single code stream packet, as in the prior art, but aligns roughly in units of a preset time period, which avoids the performance overhead caused by frequently jumping to load code stream packets.
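Rough alignment in units of a time period can be sketched as follows. This is a minimal illustration, not the application's implementation: the `Packet` type, the millisecond timestamps, the 1000 ms period and the function names are all assumptions. A trailing batch may contain a single packet if only one remains.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass(frozen=True)
class Packet:
    kind: str    # "audio" or "video"
    pts_ms: int  # display timestamp, assumed to be in milliseconds


def take_batch(queue: List[Packet], start: int, period_ms: int) -> int:
    """Return the exclusive end index of a batch that starts at `start`,
    holds at least two packets when available, and spans about `period_ms`."""
    end = start
    while end < len(queue) and (
        end - start < 2
        or queue[end].pts_ms - queue[start].pts_ms < period_ms
    ):
        end += 1
    return end


def batch_schedule(
    audio: List[Packet], video: List[Packet], period_ms: int = 1000
) -> Iterator[Tuple[str, List[Packet]]]:
    """Yield (decoder, batch) pairs, alternating audio and video batches."""
    a = v = 0
    while a < len(audio) or v < len(video):
        if a < len(audio):
            a_end = take_batch(audio, a, period_ms)
            yield "audio", audio[a:a_end]
            a = a_end
        if v < len(video):
            v_end = take_batch(video, v, period_ms)
            yield "video", video[v:v_end]
            v = v_end


audio = [Packet("audio", t) for t in (0, 500, 1000, 1500)]
video = [Packet("video", t) for t in (0, 250, 500, 750, 1000, 1250)]
schedule = [(kind, len(batch)) for kind, batch in batch_schedule(audio, video)]
print(schedule)
```

Each audio batch and the video batch that follows it cover the same one-second window, so the two decoders stay roughly aligned without per-packet timestamp matching.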

According to the first aspect, in a possible implementation manner, the multimedia file includes description information. After parsing the multimedia file to obtain the plurality of code stream packets, and before continuously loading the first number of audio data packets into the audio decoder and continuously loading the second number of video data packets into the video decoder, the method further includes: determining, according to the description information, whether the interleaving of the plurality of code stream packets is uniform. The description information stores the relevant information of each code stream packet, that is, of each audio data packet and each video data packet, and this information can be used to determine whether the interleaving of the plurality of code stream packets is uniform.

According to the first aspect, in a possible implementation manner, determining whether the interleaving of the plurality of code stream packets is uniform according to the description information may include: counting, according to the description information, the amount of stored data between the audio data packets and the video data packets that have a corresponding display timestamp relationship in the plurality of code stream packets; then counting the target number, that is, the number of cases in which the amount of stored data is greater than or equal to a target distance threshold; and finally determining, according to the target number, whether the interleaving of the plurality of code stream packets is uniform.

It can be seen that the target number is determined by taking every audio data packet and video data packet into account, so judging whether the interleaving of the plurality of code stream packets is uniform according to the target number improves the accuracy, and hence the credibility, of the judgment.
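The counting step can be sketched as below. The sketch assumes that the "amount of stored data between" two packets can be approximated by the absolute difference of their byte offsets, and that corresponding packets have already been paired; both are assumptions, as are the function name and the sample offsets.

```python
from typing import Iterable, Tuple


def count_target_number(
    pairs: Iterable[Tuple[int, int]], target_distance_threshold: int
) -> int:
    """Count (audio_pos, video_pos) pairs whose storage distance reaches
    the threshold. Positions are byte offsets in the file (assumed)."""
    return sum(
        1 for audio_pos, video_pos in pairs
        if abs(video_pos - audio_pos) >= target_distance_threshold
    )


# Hypothetical offsets of audio/video packets with corresponding timestamps.
pairs = [(0, 4_096), (8_192, 2_000_000), (16_384, 4_000_000)]
target_number = count_target_number(pairs, target_distance_threshold=1_000_000)
print(target_number)  # prints 2
```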

According to the first aspect, in a possible implementation manner, the target distance threshold is the larger of a first distance and a preset distance, and the first distance is determined according to at least one of the width, height, storage ratio or compression ratio of the multimedia file carried in the description information; wherein the multimedia file includes video frames, and the width and height of the multimedia file correspond to the width and height of a video frame.

It can be seen that the first distance is a parameter related to the multimedia file, so the target distance threshold is also a parameter related to the multimedia file. Using the target distance threshold therefore improves the accuracy, and hence the reliability, of determining whether the interleaving of the plurality of code stream packets in the multimedia file is uniform.
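The text states only that the first distance depends on at least one of width, height, storage ratio or compression ratio; it gives no formula. The sketch below therefore uses a purely hypothetical formula (the size of one raw YUV 4:2:0 frame divided by the compression ratio) just to show how the larger of the two distances would be taken.

```python
def first_distance(width: int, height: int, compression_ratio: int) -> int:
    """Hypothetical first distance: approximate size of one compressed frame.
    Assumes 1.5 bytes per pixel (YUV 4:2:0); not taken from the application."""
    raw_frame_bytes = width * height * 3 // 2
    return raw_frame_bytes // compression_ratio


def target_distance_threshold(
    width: int, height: int, compression_ratio: int, preset_distance: int
) -> int:
    """The target distance threshold is the larger of the two distances."""
    return max(first_distance(width, height, compression_ratio), preset_distance)


print(target_distance_threshold(1920, 1080, 100, preset_distance=1_000_000))
print(target_distance_threshold(1920, 1080, 100, preset_distance=10_000))
```

With these assumed inputs the preset distance wins in the first call and the file-dependent first distance wins in the second.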

According to the first aspect, in a possible implementation manner, counting, according to the description information, the amount of stored data between the audio data packets and the video data packets that have a corresponding display timestamp relationship in the plurality of code stream packets includes: determining, according to the description information, the display timestamp and storage location of each code stream packet in a preset number of code stream packets, where the preset number of code stream packets are the first N code stream packets of the plurality of code stream packets and N is a positive integer; determining, in order of increasing display timestamp, the respective storage locations of the video data packets and the audio data packets that have a corresponding display timestamp relationship in the preset number of code stream packets; and determining, from the storage locations of the video data packets and the storage locations of the audio data packets, the amount of stored data between the video data packets and the audio data packets that have the corresponding display timestamp relationship.

It can be seen that when a multimedia file contains a large number of code stream packets, the interleaving of only the first N code stream packets can be examined first, so that determining the interleaving does not take so long that it delays the start of playback.
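The steps above can be sketched as follows, under the assumptions that a "corresponding display timestamp relationship" means equal timestamps and that the stored data amount is the absolute difference of byte offsets. The `Entry` type and the sample table are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Entry:
    kind: str  # "audio" or "video"
    pts: int   # display timestamp
    pos: int   # storage location (byte offset, assumed)


def stored_data_between_pairs(table: List[Entry], n: int) -> List[int]:
    """Pair video and audio packets with equal timestamps among the first
    `n` table entries and return their storage distances in PTS order."""
    head = table[:n]
    audio_pos_by_pts = {e.pts: e.pos for e in head if e.kind == "audio"}
    videos = sorted((e for e in head if e.kind == "video"), key=lambda e: e.pts)
    return [
        abs(v.pos - audio_pos_by_pts[v.pts])
        for v in videos
        if v.pts in audio_pos_by_pts
    ]


table = [
    Entry("audio", 1, 0), Entry("audio", 2, 100),
    Entry("video", 1, 200), Entry("video", 2, 5_000),
    Entry("audio", 3, 9_000), Entry("video", 3, 9_100),
]
print(stored_data_between_pairs(table, n=4))  # prints [200, 4900]
```

Limiting the scan to the first `n` entries keeps the check cheap for long files, at the cost of not seeing interleaving changes later in the file.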

According to the first aspect, in a possible implementation manner, determining whether the interleaving of the plurality of code stream packets is uniform according to the target number includes: calculating the ratio of the target number to the preset number, and if the ratio is greater than or equal to a second preset threshold, determining that the interleaving of the plurality of code stream packets is uneven; or, if the target number is greater than or equal to a third preset threshold, determining that the interleaving of the plurality of code stream packets is uneven.

It can be seen that when the amount of stored data between an audio data packet and the video data packet with the corresponding display timestamp is greater than or equal to the target distance threshold, the prior-art approach of loading one audio data packet into the audio decoder and one video data packet into the video decoder at a time may involve a jump: after an audio data packet is loaded into the audio decoder, the player has to jump to the storage location of the corresponding video data packet to load that video data packet into the video decoder. Therefore, when the ratio of the target number to the preset number is relatively large, that is, greater than or equal to the second preset threshold, or when the target number is greater than or equal to the third preset threshold, loading code stream packets into the corresponding decoders in the prior-art way may involve frequent jumps, and the interleaving of the plurality of code stream packets can be determined to be uneven. Using the calculated ratio improves the accuracy, and hence the reliability, of that determination. When the interleaving is uneven, continuously loading more than one audio data packet into the audio decoder each time and more than one video data packet into the video decoder each time reduces the number of jumps needed to load the code stream packets.

Because the number of code stream packets continuously loaded into the corresponding decoder each time is larger than in the prior art, this solution needs fewer jumps than the prior art to finish loading the code stream packets when their interleaving is uneven. It should be noted that the second preset threshold and the third preset threshold may each be a reference value set manually according to experience, or a reference value obtained by training (or learning) on multiple historical values.
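The decision rule can be sketched as below. The text presents the ratio test and the count test as alternatives; combining them in one function with an optional count threshold is an assumption of this sketch, and the threshold values are made up.

```python
from typing import Optional


def is_unevenly_interleaved(
    target_number: int,
    preset_number: int,
    second_threshold: float,
    third_threshold: Optional[int] = None,
) -> bool:
    """Return True if the interleaving is judged uneven.

    Ratio test: target_number / preset_number >= second_threshold.
    Count test (optional): target_number >= third_threshold.
    """
    if preset_number > 0 and target_number / preset_number >= second_threshold:
        return True
    return third_threshold is not None and target_number >= third_threshold


print(is_unevenly_interleaved(30, 100, second_threshold=0.25))
print(is_unevenly_interleaved(10, 100, second_threshold=0.25))
print(is_unevenly_interleaved(10, 100, second_threshold=0.25, third_threshold=8))
```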

According to the first aspect, in a possible implementation manner, the code stream packets further include subtitle data packets, and after parsing the multimedia file to obtain the description information and the plurality of code stream packets, the method may further include: continuously loading a third number of subtitle data packets into a subtitle decoder, wherein the first number of audio data packets, the second number of video data packets and the third number of subtitle data packets are loaded alternately; the third number is an integer greater than 1, the third number of subtitle data packets corresponds to a third playback time period, and the deviation between the third playback time period and the first playback time period, or between the third playback time period and the second playback time period, is less than the preset time difference threshold.

It can be seen that when the code stream packets also include subtitle data packets, more than one audio data packet, more than one video data packet and more than one subtitle data packet are loaded into the corresponding decoders each time. Compared with the prior-art approach of loading one audio data packet, one video data packet and one subtitle data packet into the corresponding decoders at a time, this reduces the number of jumps needed to load the code stream packets. While ensuring that the audio data packets, video data packets and subtitle data packets remain synchronized, it increases the speed at which code stream packets are read and avoids playback freezes caused by data packet underrun.

According to the first aspect, in a possible implementation manner, after parsing the multimedia file to obtain the description information and the plurality of code stream packets, the method further includes: if it is determined according to the description information that the interleaving of the plurality of code stream packets is uniform, loading the audio data packet with the smallest display timestamp into the audio decoder, and loading the video data packet with the smallest display timestamp into the video decoder.

It can be seen that while code stream packets are being read and loaded, their interleaving can be judged dynamically in real time. When the interleaving is uniform, the loading method can be adjusted dynamically: the not-yet-loaded audio data packets and video data packets are loaded in order of display timestamp, smallest first, and played in the player.
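For the uniformly interleaved case, loading the packet with the smallest display timestamp each time amounts to merging the two packet queues by timestamp. A minimal sketch, assuming each queue is already sorted by timestamp and representing packets as hypothetical `(pts, kind)` tuples:

```python
import heapq
from typing import List, Tuple

Packet = Tuple[int, str]  # (display timestamp, "audio" or "video"), assumed shape


def single_packet_order(audio: List[Packet], video: List[Packet]) -> List[Packet]:
    """Merge two PTS-sorted queues so the smallest timestamp is loaded first.
    On equal timestamps the audio packet comes first (merge is stable)."""
    return list(heapq.merge(audio, video, key=lambda p: p[0]))


audio = [(1, "audio"), (2, "audio"), (3, "audio")]
video = [(1, "video"), (2, "video"), (3, "video")]
order = single_packet_order(audio, video)
print(order)
```

The resulting order alternates audio and video packet by packet, which matches the storage order of a uniformly interleaved file and therefore needs no jumps.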

According to the first aspect, in a possible implementation manner, the multimedia file includes description information, and continuously loading the first number of audio data packets into the audio decoder and continuously loading the second number of video data packets into the video decoder includes: if the information of the first code stream packet stored in the description information is information of an audio data packet, continuously loading the first number of audio data packets into the audio decoder first, and then continuously loading the second number of video data packets into the video decoder.

It can be seen that the way of loading data packets can be determined according to the description information of the multimedia file: if the first code stream packet is an audio data packet, audio data packets can be loaded into the audio decoder first. Determining the loading order of audio and video code stream packets from the description information is more reliable.

According to the first aspect, in a possible implementation manner, the multimedia file includes description information, and continuously loading the first number of audio data packets into the audio decoder and continuously loading the second number of video data packets into the video decoder includes: if the information of the first code stream packet stored in the description information is information of a video data packet, continuously loading the second number of video data packets into the video decoder first, and then continuously loading the first number of audio data packets into the audio decoder.

It can be seen that the way of loading data packets can be determined according to the description information of the multimedia file: if the first code stream packet is a video data packet, video data packets can be loaded into the video decoder first. Determining the loading order of audio and video code stream packets from the description information is more reliable.

A second aspect of the embodiments of the present application provides a device for playing multimedia files, and the device may include:

a first loading unit, configured to load the multimedia file into the memory of the player;

a parsing unit, configured to parse the multimedia file to obtain a plurality of code stream packets, the code stream packets including audio data packets and video data packets; and

a second loading unit, configured to continuously load a first number of audio data packets into the audio decoder and continuously load a second number of video data packets into the video decoder, wherein the first number is an integer greater than 1 and the second number is an integer greater than 1.

According to the second aspect, in a possible implementation manner, the first number of audio data packets corresponds to a first playback time period, the second number of video data packets corresponds to a second playback time period, and the deviation between the first playback time period and the second playback time period is less than a preset time difference threshold.

According to the second aspect, in a possible implementation manner, the multimedia file may include description information, and the above apparatus further includes a determination unit, configured to determine whether the interleaving of multiple code stream packets is uniform according to the description information.

According to the second aspect, in a possible implementation manner, the determining unit is specifically configured to: count, according to the description information, the amount of stored data between the audio data packets and the video data packets that have a corresponding display timestamp relationship in the plurality of code stream packets; count the target number, that is, the number of cases in which the amount of stored data is greater than or equal to the target distance threshold; and determine, according to the target number, whether the interleaving of the plurality of code stream packets is uniform.

According to the second aspect, in a possible implementation manner, the target distance threshold is the larger of a first distance and a preset distance, and the first distance is determined according to at least one of the width, height, storage ratio or compression ratio of the multimedia file carried in the description information; wherein the multimedia file includes video frames, and the width and height of the multimedia file correspond to the width and height of a video frame.

According to the second aspect, in a possible implementation manner, the determining unit is specifically configured to: determine, according to the description information, the display timestamp and storage location of each code stream packet in a preset number of code stream packets, where the preset number of code stream packets are the first N code stream packets of the plurality of code stream packets and N is a positive integer; determine, in order of increasing display timestamp, the respective storage locations of the video data packets and the audio data packets that have a corresponding display timestamp relationship in the preset number of code stream packets; and determine, from the storage locations of the video data packets and the storage locations of the audio data packets, the amount of stored data between the video data packets and the audio data packets that have the corresponding display timestamp relationship.

According to the second aspect, in a possible implementation manner, the determining unit is specifically configured to: calculate the ratio of the target number to the preset number, and if the ratio is greater than or equal to the second preset threshold, determine that the interleaving of the plurality of code stream packets is uneven; or, if the target number is greater than or equal to the third preset threshold, determine that the interleaving of the plurality of code stream packets is uneven.

According to the second aspect, in a possible implementation manner, the code stream packets further include subtitle data packets, and the loading unit is further configured to: continuously load a third number of subtitle data packets into the subtitle decoder, wherein the first number of audio data packets, the second number of video data packets and the third number of subtitle data packets are loaded alternately; the third number is an integer greater than 1, the third number of subtitle data packets corresponds to a third playback time period, and the deviation between the third playback time period and the first playback time period, or between the third playback time period and the second playback time period, is less than the preset time difference threshold.

According to the second aspect, in a possible implementation manner, the loading unit is further configured to: if it is determined according to the description information that the interleaving of the plurality of code stream packets is uniform, load the audio data packet with the smallest display timestamp into the audio decoder, and load the video data packet with the smallest display timestamp into the video decoder.

Regarding the technical effects brought about by the second aspect or possible implementation manners, reference may be made to the introduction to the technical effects of the first aspect or corresponding implementation manners.

A third aspect of the embodiments of the present application provides an electronic device. The electronic device includes at least one processor and a transmission interface, and the at least one processor receives or sends signals through the transmission interface; the at least one processor is configured to call a computer program stored in a memory, to cause the electronic device to perform the method described in the first aspect or any one of the possible implementations of the first aspect.

A fourth aspect of the embodiments of the present application discloses a computer-readable storage medium in which program instructions are stored; when the program instructions are run on a computer or a processor, the method described in the first aspect or any one of the possible implementations of the first aspect is performed.

A fifth aspect of the embodiments of the present application discloses a computer program product. The computer program product includes program instructions; when the program instructions are run on a computer or a processor, the method described in the first aspect or any one of the possible implementations of the first aspect is performed.

Description of drawings

FIG. 1A is a schematic diagram of a method for playing a uniformly interleaved multimedia file provided by an embodiment of the present application;

FIG. 1B is a schematic diagram of a method for playing an unevenly interleaved multimedia file provided by an embodiment of the present application;

FIG. 1C is a schematic diagram of a method for playing an unevenly interleaved multimedia file provided by the prior art;

FIG. 2A is a schematic diagram of a playback environment of a multimedia file provided by an embodiment of the present application;

FIG. 2B is a schematic diagram of another multimedia file playback environment provided by an embodiment of the present application;

FIG. 3A is a schematic structural diagram of an electronic device provided by an embodiment of the present application;

FIG. 3B is a software structural block diagram of an electronic device provided by an embodiment of the present application;

FIG. 4 is a schematic diagram of software module interaction of an electronic device provided by an embodiment of the present application;

FIG. 5 shows a method for playing a multimedia file provided by an embodiment of the present application;

FIG. 6 shows another method for playing a multimedia file provided by an embodiment of the present application;

FIG. 7 is a schematic flowchart of a method for judging whether the interleaving of a multimedia file is uniform according to an embodiment of the present application;

FIG. 8A is a schematic flowchart of a playback method that loads code stream packets in batches, provided by an embodiment of the present application;

FIG. 8B is a schematic diagram of a playback method that loads code stream packets in batches, provided by an embodiment of the present application;

FIG. 9 shows another method for playing a multimedia file provided by an embodiment of the present application;

FIG. 10 is a schematic structural diagram of an apparatus for playing a multimedia file provided by an embodiment of the present application.

Detailed description

The embodiments of the present application will be described below with reference to the accompanying drawings in the embodiments of the present application. It should be noted that, in this application, words such as "exemplary" or "for example" are used to represent examples, illustrations or explanations. Any embodiment or design described in this application as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments or designs; such words are intended to present the related concept in a concrete way.

The following briefly introduces related technologies and professional terms involved in the present application to facilitate understanding.

1. Multimedia container

The compressed audio data packets, video data packets and/or subtitle data packets are stored in the multimedia container, and the container format is also called the encapsulation format. Common encapsulation formats include one or more of the following: MPEG-4 Part 14 (MPEG-4 Part14, MP4), Audio Video Interleaved (AVI), Transport Stream (MPEG2-TS, TS) and so on. Wherein, different container formats store audio data packets, video data packets and/or subtitle data packets in different ways, and are respectively applied in different fields. For example: TS is a stream encapsulation form, commonly used in broadcast TV and streaming media protocols; MP4 is a frame encapsulation form, commonly used in the field of local video and network video.

2. Descriptive information

The description information can also be called an information block. It describes the code stream packets (for example, multiple video data packets, multiple audio data packets and/or multiple subtitle data packets) contained in the multimedia file, and may include one or more of the following: a file identifier; the playback duration of the multimedia file; the video width, height, frame rate, bit rate and resolution; and the audio sampling rate and number of channels. In addition, the description information includes a storage information table, which records, for each video data packet, audio data packet and/or subtitle data packet, its storage location (Position, POS), its length, its display timestamp (Presentation Time Stamp, PTS), and so on.
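One way to picture the description information and its storage information table is as a pair of record types. The field names, units and the helper `first_packet_kind` below are illustrative assumptions; real container formats lay this information out differently.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class StorageEntry:
    kind: str    # "audio", "video" or "subtitle"
    pos: int     # storage location (POS), assumed to be a byte offset
    length: int  # packet length in bytes
    pts: int     # display timestamp (PTS)


@dataclass
class DescriptionInfo:
    duration_ms: int
    width: int
    height: int
    frame_rate: float
    sample_rate_hz: int
    channels: int
    storage_table: List[StorageEntry] = field(default_factory=list)

    def first_packet_kind(self) -> str:
        """Kind of the packet stored first in the file (smallest offset)."""
        return min(self.storage_table, key=lambda e: e.pos).kind


info = DescriptionInfo(
    duration_ms=60_000, width=1920, height=1080, frame_rate=30.0,
    sample_rate_hz=48_000, channels=2,
    storage_table=[
        StorageEntry("video", pos=4_096, length=30_000, pts=1),
        StorageEntry("audio", pos=0, length=4_096, pts=1),
    ],
)
print(info.first_packet_kind())  # prints "audio"
```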

It should be noted that the description information is usually located at the head or tail of the multimedia file. Wherein, the file identifier may be "Chinese subtitle", "Chinese audio", "English audio" and so on.

3. Code stream (data rate)

The code stream, or data rate, refers to the amount of data per unit time after the video data has been encoded and compressed. Generally speaking, at the same resolution, the larger the code stream of the video data, the smaller the compression ratio and the higher the picture quality.

4. Resolution

Video is composed of consecutive images; each image is called a frame, and each image is composed of pixels. The number of pixels in an image is called the resolution of the image. For example, a 1920*1080 image is 1920 pixels wide and 1080 pixels high. The resolution of a video is therefore the resolution of each of its frames.

5. Frame rate

A frame is a still picture, and consecutive frames form a moving picture, such as a movie. The frame rate is the number of frames transmitted per second, usually expressed in frames per second (FPS). Displaying frames in rapid succession creates the illusion of motion. The higher the frame rate, the smoother and more realistic the displayed motion.

6. Bit Rate

Bit rate refers to the number of bits transmitted per second. The unit is bps (bits per second). The higher the bit rate, the more data is transmitted per unit time.

The bit rate indicates how many bits are needed to represent each second of encoded (compressed) audio or video data; a bit is the smallest unit in binary, either 0 or 1. Put simply, the higher the bit rate, the better the audio and video quality but the larger the encoded file; the lower the bit rate, the opposite holds.
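The relationship between bit rate and encoded file size is simple arithmetic: size in bytes is bit rate times duration divided by 8. The function name and sample values below are illustrative, and container overhead is ignored.

```python
def encoded_size_bytes(bit_rate_bps: int, duration_s: int) -> int:
    """Approximate encoded size: bits per second * seconds / 8 bits per byte."""
    return bit_rate_bps * duration_s // 8


# One minute of video at 2 Mbps versus 8 Mbps.
print(encoded_size_bytes(2_000_000, 60))  # prints 15000000
print(encoded_size_bytes(8_000_000, 60))  # prints 60000000
```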

7. Sample Rate

Sampling rate (also called sampling speed or sampling frequency) defines the number of times that audio data is taken per second, and is expressed in Hertz (Hz).

The sampling rate refers to the sampling frequency used when converting an analog signal into a digital signal, that is, how many sample points are taken per unit time. Each sample point is in turn represented by a certain number of bits.

8. Number of channels

The number of channels is the number of sound channels. The sound is played by the speaker after the audio data is decoded. The channels are often divided into monophonic and stereophonic.

9. Decapsulation

Decapsulation splits the multimedia file according to its encapsulation format, separating out the audio data packets, video data packets and/or subtitle data packets in the multimedia file. The parameters of the multimedia file, such as encoding format, file size, playback duration, resolution, audio sampling rate and number of channels, can be obtained through decapsulation.

In order to facilitate understanding of the embodiments of the present application, the following first analyzes and proposes the specific technical problems to be solved by the present application. Please refer to FIG. 1A. FIG. 1A is a schematic diagram of a method for playing a uniformly interleaved multimedia file provided by an embodiment of the present application. As can be seen from FIG. 1A, the multimedia file 100A includes description information and a plurality of code stream packets, and the code stream packets include audio data packets and video data packets. The description information carries, for each code stream packet, its attribute information (which indicates whether the packet is an audio data packet or a video data packet), display time stamp, storage location, memory size and other description information. The audio data packets and the video data packets of the multimedia file 100A are evenly interleaved and stored in sequence according to adjacent display time stamps. For example, the audio data packet with the display time stamp "1" and the video data packet with the display time stamp "1" are stored adjacent to each other, and the video data packet with the display time stamp "1" and the audio data packet with the display time stamp "2" are stored adjacent to each other. It should be noted that a display time stamp of "1" can indicate that the corresponding code stream packet is displayed in the "1st position" after decoding, and a display time stamp of "2" can indicate that the corresponding code stream packet is displayed in the "2nd position" after decoding, that is, after the packet whose display time stamp is "1". Alternatively, a display time stamp of "1" can indicate that the corresponding code stream packet is displayed at a "preset time point" after decoding, and the "preset time point" can be determined according to actual needs, which is not restricted in the embodiments of the present application.

The electronic device loads the multimedia file 100A into the memory of the player 200, and then parses the multimedia file 100A to obtain the description information and the plurality of code stream packets. According to the description of each code stream packet contained in the description information, it then loads the audio data packets into the audio decoder and the video data packets into the video decoder in sequence, synchronizes the decoded audio data packets and video data packets, and sends them to the speaker and the display, respectively, so that sound and picture are played in synchronization (lip sync).

For example, it can be seen from FIG. 1A that, starting from the audio data packet with the display time stamp "1", the electronic device loads the audio data packet with the display time stamp "1", the video data packet with the display time stamp "1", the audio data packet with the display time stamp "2", and the video data packet with the display time stamp "2" into the memory of the player 200 in sequence. Next, the electronic device loads the audio data packet with the display time stamp "1" into the audio decoder and the video data packet with the display time stamp "1" into the video decoder, and then in sequence loads the audio data packet with the display time stamp "2" into the audio decoder and the video data packet with the display time stamp "2" into the video decoder.

It should be noted that each stream packet has a display time stamp, and the display time stamp indicates when the stream packet is played after being decoded: a packet with an earlier display time stamp is played first, and a packet with a later display time stamp is played later. Stream packets with corresponding display time stamps need to be played at the same time after being decoded. For example, an audio data packet with a display time stamp of "1" and a video data packet with a display time stamp of "1" are stream packets with corresponding display time stamps. Therefore, when loading the code stream packets, the player 200 can load them into the decoders in increasing order of display time stamp for decoding and playback.
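The ordering rule described above can be sketched as follows. This is a simplified model in which a packet is a `(kind, display_time_stamp)` tuple; the `playback_schedule` function is a hypothetical helper introduced for illustration, not part of the player 200.

```python
def playback_schedule(packets):
    """Group packets by display time stamp, in increasing time stamp order.

    Packets that share a display time stamp are presented together.
    """
    by_pts = {}
    for kind, pts in packets:
        by_pts.setdefault(pts, []).append(kind)
    return [(pts, sorted(by_pts[pts])) for pts in sorted(by_pts)]


# Storage order of a uniformly interleaved file such as the one in FIG. 1A.
file_100a = [("audio", 1), ("video", 1), ("audio", 2), ("video", 2)]
schedule = playback_schedule(file_100a)
print(schedule)
```

For a uniformly interleaved file the storage order already matches this schedule, which is why the packets can simply be loaded in sequence.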

Please refer to FIG. 1B. FIG. 1B is a schematic diagram of a method for playing an unevenly interleaved multimedia file provided by an embodiment of the present application. As can be seen from FIG. 1B, the audio data packets and video data packets of the multimedia file 100B are not evenly interleaved and stored in order of adjacent display time stamps. To keep the playback of the audio data packets and the video data packets consistent, the player must repeatedly jump within the multimedia file 100B to load the corresponding audio data packets and video data packets. Then, after the decoded audio data packets and video data packets are synchronized, they are sent to the speaker and the display respectively, so that sound and picture are played in synchronization (lip sync).

For example, it can be seen from FIG. 1B that, starting from the audio data packet with the display time stamp "1", the electronic device loads the first code stream block in storage order (including the description information of the multimedia file and the audio data packets with the display time stamps "1", "2", "3", "4" and "5") into the memory of the player 200. The electronic device can then first load the audio data packet with the display time stamp "1" into the audio decoder according to the order of the display time stamps. Next, because the audio data packets and the video data packets need to be played synchronously, the video data packet with the display time stamp "1" needs to be sent to the video decoder. However, the electronic device finds that the first code stream block loaded into the memory of the player 200 does not contain a video data packet with a display time stamp of "1", so it needs to jump to the storage location of the video data packet with the display time stamp "1" in the multimedia file 100B and load the second code stream block (including one or more code stream packets) into the memory. Therefore, the player 200 loads the second code stream block in sequence starting from the video data packet with the display time stamp "1" (including the video data packet with the display time stamp "1", the audio data packets with the display time stamps "10", "11" and "12", and the video data packet with the display time stamp "2" of the multimedia file 100B) into the memory. It can be understood that the audio data packets with display time stamps "2" to "5" previously loaded into the memory are deleted from the memory by the player 200 because they were not used. Then, the electronic device loads the video data packet whose display time stamp is "1" into the video decoder of the player 200. Next, the electronic device needs to load the audio data packet with the display time stamp "2" into the audio decoder of the player 200, but the second code stream block loaded into the memory does not contain an audio data packet with the display time stamp "2". Therefore, the electronic device needs to jump to the storage location of the audio data packet with the display time stamp "2" in the multimedia file 100B and load the third code stream block (including one or more code stream packets) into the memory. It can be understood that the audio data packets with display time stamps "10" to "12" and the video data packet with the display time stamp "2" previously loaded into the memory are deleted from the memory by the player 200 because they were not used.

It can be seen that when the audio data packets and the video data packets of the multimedia file 100B are unevenly interleaved, the player must repeatedly jump within the file to load the corresponding stream packets into the decoders, which degrades loading performance. In a network scenario, when an unevenly interleaved multimedia file 100B is played online, the cost of repeated jumps may be even higher, and may cause problems such as playback freezes due to data underload.
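The cost of repeated jumping can be illustrated with a small simulation. This is a simplified model built on assumptions that are not part of the present application: the memory holds a single block of `block_size` consecutive packets, packets are requested strictly in the order audio 1, video 1, audio 2, video 2 and so on, and every request that misses the current block triggers a new block load starting at the requested packet. The function name and both layouts are hypothetical.

```python
def count_block_loads(layout, block_size):
    """Count block loads needed to feed the decoders in time stamp order.

    layout is a list of (kind, pts) tuples in storage order.
    """
    position = {packet: i for i, packet in enumerate(layout)}
    last_pts = max(pts for _, pts in layout)
    requests = [(kind, pts)
                for pts in range(1, last_pts + 1)
                for kind in ("audio", "video")
                if (kind, pts) in position]
    loads, start = 0, None
    for packet in requests:
        i = position[packet]
        if start is None or not (start <= i < start + block_size):
            start = i      # jump to the packet and load a new block from there
            loads += 1
    return loads


# Uniformly interleaved layout: A1 V1 A2 V2 ... A6 V6.
even = [(kind, pts) for pts in range(1, 7) for kind in ("audio", "video")]
# Extreme uneven layout: all audio packets first, then all video packets.
uneven = [("audio", pts) for pts in range(1, 7)] + [("video", pts) for pts in range(1, 7)]

even_loads = count_block_loads(even, block_size=4)
uneven_loads = count_block_loads(uneven, block_size=4)
print(even_loads, uneven_loads)
```

Under this model the uniformly interleaved layout needs 3 block loads for 12 packets, while the uneven layout needs a new block load for every single packet.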

Please refer to FIG. 1C, which is a schematic diagram of a method for playing an unevenly interleaved multimedia file provided in the prior art. First, the player 200 can determine the first file start position of the audio data packets of the target multimedia file and the second file start position of the video data packets. When the amount of data that can be stored between the first file start position and the second file start position is greater than or equal to a preset threshold, the player 200 can read the audio data packets and the video data packets separately over dual input/output (I/O) channels, sort the read audio data packets and video data packets according to the display time stamp or decoding time stamp (it should be noted that for online multimedia files, the decoding time stamp is equal to the display time stamp), and play the target multimedia file based on the sorted audio data packets and video data packets.

For example, it can be seen from FIG. 1C that the amount of storable data is determined based on the storage location of the audio data packet with the display time stamp "1" and the storage location of the video data packet with the display time stamp "1". When this amount of data exceeds the threshold, it means that the code stream packets of the multimedia file 100C are unevenly interleaved. Over the first I/O, the player 200 sequentially loads the first code stream block (including the audio data packets with the display time stamps "1" to "9", the video data packet with the display time stamp "1", and the audio data packet with the display time stamp "10"), then deletes the video data packet with the display time stamp "1" so that only the audio data packets are kept in the first I/O, and then sorts the retained audio data packets according to the display time stamp. Over the second I/O, the player 200 sequentially loads the second code stream block (including the video data packet with the display time stamp "1", the audio data packets with the display time stamps "10" to "12", the video data packet with the display time stamp "2", the audio data packets with the display time stamps "13" to "15", and the video data packets with the display time stamps "3" to "6"), then deletes the audio data packets with the display time stamps "10" to "12" and "13" to "15" so that only the video data packets are kept in the second I/O, and then sorts the retained video data packets according to the display time stamp.
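The prior-art flow just described can be sketched roughly as follows. This is a simplified model rather than the actual prior-art implementation: packets are `(kind, display_time_stamp)` tuples, the offsets and threshold are made-up numbers, and the function names are hypothetical.

```python
def exceeds_threshold(first_audio_offset, first_video_offset, threshold):
    """Prior-art interleaving check: distance between the first audio
    packet and the first video packet compared with a preset threshold."""
    return abs(first_video_offset - first_audio_offset) >= threshold


def keep_and_sort(block, kind):
    """Keep only packets of one kind from a loaded block, sorted by time stamp."""
    return sorted((p for p in block if p[0] == kind), key=lambda p: p[1])


# Blocks as described for FIG. 1C.
block_1 = [("audio", t) for t in range(1, 10)] + [("video", 1), ("audio", 10)]
block_2 = ([("video", 1)]
           + [("audio", t) for t in range(10, 13)]
           + [("video", 2)]
           + [("audio", t) for t in range(13, 16)]
           + [("video", t) for t in range(3, 7)])

use_dual_io = exceeds_threshold(0, 9_000, threshold=4_096)  # hypothetical byte offsets
audio_queue = keep_and_sort(block_1, "audio")   # first I/O keeps audio only
video_queue = keep_and_sort(block_2, "video")   # second I/O keeps video only
discarded = len(block_1) + len(block_2) - len(audio_queue) - len(video_queue)
print(use_dual_io, len(audio_queue), len(video_queue), discarded)
```

In this example 7 of the 23 loaded packets are read and then thrown away, which illustrates why dual I/O loading costs extra bandwidth.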

It can be seen that the following problems exist:

1. It is inaccurate to judge the interleaving situation of multiple stream packets of the multimedia file based on the storage distance (storable data amount) between the first audio data packet and the first video data packet.

2. Compared with loading a multimedia file over a single I/O channel, loading it over dual I/O channels may double the bandwidth usage, resulting in lower download efficiency. For example, when playing multimedia files with 4K or 8K resolution, underload may occur due to network speed limitations.

3. To load multimedia files based on dual I/O channels, two threads need to be established. The two threads coordinate the download position with each other, which may increase the memory consumption of the player and waste system resources.

4. Sorting audio data packets and video data packets based on the display time stamp will take time and reduce playback efficiency.

It should be noted that the above-mentioned multimedia file 100A, multimedia file 100B and multimedia file 100C may be files downloaded from the network or files loaded locally.

In order to solve the above technical problem, first, please refer to FIG. 2A , which is a schematic diagram of a playback environment of a multimedia file provided by an embodiment of the present application. As can be seen from FIG. 2A , the playback environment 001 may include a first electronic device 200A and a server 201 . A communication connection relationship may be established between the first electronic device 200A and the server 201 to perform information transmission. The communication between the first electronic device 200A and the server 201 may be based on any wired network and/or wireless network, including but not limited to the Internet, a wide area network, a metropolitan area network, a virtual private network, a wireless communication network, and the like.

Wherein, the first electronic device 200A is installed with application software for playing multimedia files, such as a player provided by some service providers and the like. The first electronic device 200A may include, but is not limited to, terminal devices such as smart phones, desktop computers, tablet computers, notebook computers, digital assistants, smart wearable devices, and the like.

The server 201 may be an independently running server, or a distributed server, or a server cluster composed of multiple servers. The server 201 may store multimedia files to be played, and further, the server 201 may be a background server to which the multimedia files belong. For example, if the first electronic device 200A plays a multimedia file provided by service provider A, the server 201 may be a background server of service provider A.

When the user needs to play a multimedia file on the first electronic device 200A, the first electronic device 200A sends a request to the server 201, and the server 201 sends the multimedia file to the first electronic device 200A; that is, the first electronic device 200A downloads the multimedia file from the server 201. After receiving the multimedia file, the first electronic device 200A loads the multimedia file into the memory of the player, and then parses the multimedia file to obtain multiple stream packets, wherein the stream packets include audio data packets and video data packets. Next, the first electronic device 200A may continuously load the first number of audio data packets into the audio decoder, and continuously load the second number of video data packets into the video decoder. The first number is an integer greater than 1, and the second number is an integer greater than 1. The first number of audio data packets corresponds to a first playback time period, the second number of video data packets corresponds to a second playback time period, and the deviation between the first playback time period and the second playback time period is less than a preset time difference threshold. That is, the first number of audio data packets and the second number of video data packets can be played in synchronization for a period of time after being decoded.
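The batch loading idea can be sketched as follows. The text above does not state how the first number and the second number are chosen, so the selection rule used here (take the smallest run of packets whose total duration reaches an assumed target length) is purely an assumption for illustration, as are the packet durations, the target length, the threshold value and all names.

```python
def batch_count(durations_ms, target_ms):
    """Return (n, total): the smallest number n of leading packets whose
    summed duration reaches target_ms, and that summed duration."""
    total = 0
    for n, duration in enumerate(durations_ms, start=1):
        total += duration
        if total >= target_ms:
            return n, total
    return len(durations_ms), total  # fewer packets than the target needs


# Hypothetical packet durations in milliseconds.
audio_durations = [23] * 50
video_durations = [40] * 25
TARGET_MS = 200          # assumed batch length
MAX_DEVIATION_MS = 40    # assumed preset time difference threshold

first_number, audio_period = batch_count(audio_durations, TARGET_MS)
second_number, video_period = batch_count(video_durations, TARGET_MS)
deviation = abs(audio_period - video_period)

audio_decoder, video_decoder = [], []
if first_number > 1 and second_number > 1 and deviation < MAX_DEVIATION_MS:
    # Load one continuous run of audio packets, then one run of video packets.
    audio_decoder.extend(range(1, first_number + 1))
    video_decoder.extend(range(1, second_number + 1))
print(first_number, second_number, deviation)
```

With these made-up durations, 9 audio packets (207 ms) and 5 video packets (200 ms) are loaded as two continuous runs, so the two playback periods differ by 7 ms instead of the loader alternating after every packet.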

Please refer to FIG. 2B. FIG. 2B is a schematic diagram of another multimedia file playback environment provided by an embodiment of the present application. As can be seen from FIG. 2B, the playback environment 002 may include a first electronic device 200A, a second electronic device 200B, and a router 202. The first electronic device 200A can be connected to the router 202, and the second electronic device 200B can also be connected to the router 202. The router 202 can ensure that the first electronic device 200A and the second electronic device 200B are in the same local area network.

Wherein, the first electronic device 200A is installed with application software for recording and/or playing multimedia files, such as short video shooting software provided by some service providers. The first electronic device 200A may be a terminal device supporting the Digital Living Network Alliance (DLNA) protocol, such as a smart phone, a desktop computer, a tablet computer, a notebook computer, a digital assistant, a smart wearable device, and the like.

The second electronic device 200B may be a device that has a playback function and also supports DLNA, such as a smart TV, a desktop computer, a notebook computer, and the like.

After the user records or processes a certain multimedia file (which may be a short video) through the first electronic device 200A, the storage locations of the audio data packets and the video data packets in the recorded or processed multimedia file may vary owing to differences in the device or the software used. When the multimedia file recorded by the first electronic device 200A is played on the second electronic device 200B through DLNA, the second electronic device 200B loads the multimedia file into the memory of the player and can then parse the multimedia file to obtain multiple code stream packets, wherein the code stream packets include audio data packets and video data packets. Next, the second electronic device 200B can continuously load the first number of the audio data packets into the audio decoder, and continuously load the second number of the video data packets into the video decoder, wherein the first number is an integer greater than 1, and the second number is an integer greater than 1. The first number of audio data packets corresponds to a first playback time period, the second number of video data packets corresponds to a second playback time period, and the deviation between the first playback time period and the second playback time period is less than a preset time difference threshold. That is, the first number of audio data packets and the second number of video data packets can be played in synchronization for a period of time after being decoded.

Please refer to FIG. 3A, which is a schematic structural diagram of an electronic device provided by an embodiment of the present application. Specifically, FIG. 3A may be a schematic structural diagram of the first electronic device 200A in FIG. 2A, or of the second electronic device 200B in FIG. 2B. As shown in FIG. 3A, the electronic device 300 may include a processor 110, a memory 120, a sensor module 130, a display device 140, a mobile communication module 150, a wireless communication module 160, an audio module 170, a camera 180, an input device 190, and the like.

It can be understood that the structures illustrated in the embodiments of the present application do not constitute a specific limitation on the electronic device 300. In other embodiments of the present application, the electronic device 300 may include more or fewer components than shown, or combine some components, or separate some components, or arrange the components differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.

The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc. Wherein, different processing units may be independent devices, or may be integrated in one or more processors.

The memory 120 stores computer programs, and the computer programs include operating system programs, application programs, and the like, wherein the application programs include browser programs. The processor 110 is configured to read the computer program in the memory 120, and then execute the method defined by the computer program, for example, the processor 110 reads the operating system program to run the operating system on the electronic device 300 and implement various functions of the operating system, Or read one or more application programs to run the applications on the electronic device 300 , for example, read a browser program to run a browser.

In addition, the memory 120 also stores data other than the computer programs. Such data may include data generated after the operating system or the application programs are executed, including system data (such as configuration parameters of the operating system) and user data; for example, payment information for business products can be regarded as user data.

The memory 120 generally includes an internal memory and an external memory. The internal memory can store computer-executable program code, and can be random access memory (RAM), read-only memory (ROM), cache memory (CACHE), and the like. The processor 110 executes various functional applications and data processing of the electronic device 300 by executing the instructions stored in the internal memory. The internal memory may include a program storage area and a data storage area. The program storage area can store an operating system, an application program required for at least one function (such as a sound playback function, an image playback function, etc.), and the like. The data storage area can store data (such as audio data packets, video data packets, subtitle data packets, etc.) created during the use of the electronic device, and the like. In addition, the internal memory may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, universal flash storage (UFS), and the like.

The external memory can be used to connect an external storage medium, such as a hard disk, an optical disk, a USB disk, a floppy disk, or a tape drive, etc., so as to expand the storage capacity of the electronic device 300. The external storage medium communicates with the processor 110 through the external memory interface to realize the data storage function, for example, to save files such as music and videos on the external storage medium.

The sensor module 130 includes a pressure sensor, a fingerprint sensor, a touch sensor, and the like. Among them, the pressure sensor is used to sense a pressure signal and can convert the pressure signal into an electrical signal. In some embodiments, the pressure sensor may be provided on the display screen. The fingerprint sensor is used to collect fingerprints, and the electronic device can use the collected fingerprint features to implement fingerprint unlocking, access to application locks, fingerprint-triggered photographing, fingerprint call answering, fingerprint payment, and the like. The touch sensor is used to detect touch operations on or near it.

The display device 140 is used to display images, videos, and the like. It includes a display screen for displaying information input by the user or information provided to the user, various menu interfaces of the electronic device 300, and the like. In the embodiment of the present application, the electronic device displays the decoded video data packets of the multimedia file through the display screen. The display screen of the display device 140 can be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), etc. In some embodiments, the electronic device 300 may include 1 or N display screens, where N is a positive integer greater than 1.

The wireless communication function of the electronic device 300 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modulation and demodulation processor, the baseband processor, and the like. Antenna 1 and Antenna 2 are used to transmit and receive electromagnetic wave signals. Each antenna in electronic device 300 may be used to cover a single or multiple communication frequency bands. Different antennas can also be reused to improve antenna utilization. For example, the antenna 1 can be multiplexed as a diversity antenna of the wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.

The mobile communication module 150 may provide a wireless communication solution including 2G/3G/4G/5G, etc. applied on the electronic device 300 . The mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (LNA) and the like. The mobile communication module 150 can receive electromagnetic waves from the antenna 1, filter and amplify the received electromagnetic waves, and transmit them to the modulation and demodulation processor for demodulation. The mobile communication module 150 can also amplify the signal modulated by the modulation and demodulation processor, and then turn it into an electromagnetic wave for radiation through the antenna 1 . In some embodiments, at least part of the functional modules of the mobile communication module 150 may be provided in the processor 110 . In some embodiments, at least part of the functional modules of the mobile communication module 150 may be provided in the same device as at least part of the modules of the processor 110 . The electronic device 300 can directly establish a communication connection with the server 201 through the mobile communication module 150, receive instructions and data (such as multimedia files) transmitted by the server 201, and also transmit instructions and data to the cloud server.

The wireless communication module 160 can provide wireless communication solutions applied on the electronic device 300, including wireless local area network (WLAN) (such as a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR) technology, and the like. The wireless communication module 160 may be one or more devices integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, frequency-modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110. The wireless communication module 160 can also receive the signal to be sent from the processor 110, perform frequency modulation on it, amplify it, and convert it into electromagnetic waves for radiation through the antenna 2.

In some embodiments, the antenna 1 of the electronic device 300 is coupled with the mobile communication module 150, and the antenna 2 is coupled with the wireless communication module 160, so that the electronic device 300 can communicate with the network and other devices through wireless communication technology. The wireless communication technologies may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology, etc. The GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the Beidou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS) and/or satellite based augmentation systems (SBAS).

The electronic device may implement audio functions, such as audio playback and recording, through the audio module 170, the processor 110 and the like. The audio module 170 is used for converting digital audio information into an analog audio signal for output, and also for converting an analog audio input into a digital audio signal. The audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be provided in the processor 110, or some functional modules of the audio module 170 may be provided in the processor 110.

In some embodiments, the electronic device may also implement a shooting function through an ISP, a camera, a video codec, a GPU, a display device 140, a processor 110, and the like.

The ISP is used to process the data fed back by the camera 180 . For example, when taking a photo, the shutter is opened, the light is transmitted to the camera photosensitive element through the lens, the light signal is converted into an electrical signal, and the camera photosensitive element transmits the electrical signal to the ISP for processing, and converts it into an image visible to the naked eye. ISP can also perform algorithm optimization on image noise, brightness, and skin tone. ISP can also optimize the exposure, color temperature and other parameters of the shooting scene. In some embodiments, the ISP may be provided in the camera 180 .

The camera 180 is used to capture still images or video. The object is projected through the lens to generate an optical image onto the photosensitive element. The photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, and then transmits the electrical signal to the ISP to convert it into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. DSP converts digital image signals into standard RGB, YUV and other formats of image signals. In some embodiments, the electronic device 300 may include 1 or N cameras 180 , where N is a positive integer greater than 1.

A digital signal processor is used to process digital signals, in addition to processing digital image signals, it can also process other digital signals. For example, when the electronic device 300 selects a frequency point, the digital signal processor is used to perform Fourier transform on the frequency point energy, and the like.

Video codecs are used to compress or decompress digital video. The electronic device 300 may support one or more video codecs. In this way, the electronic device 300 can play or record videos in various encoding formats, such as: moving picture experts group (moving picture experts group, MPEG) 1, MPEG2, MPEG3, MPEG4 and so on.

The input device 190 is used for receiving input digital information, character information or contact touch operation/non-contact gesture, and generating signal input related to user settings and function control of the electronic device 300 .

Referring to FIG. 3B, FIG. 3B is a software structural block diagram of an electronic device provided by an embodiment of the present application. It can be understood that the software structural block diagram of the electronic device shown in FIG. 3B may specifically be the software structural block diagram of the first electronic device 200A shown in FIG. 2A, or the software structural block diagram of the second electronic device 200B shown in FIG. 2B. The software system of the electronic device includes, but is not limited to, Linux, Huawei's Hongmeng system, or other operating systems. The embodiments of the present application take an Android system with a layered architecture as an example to exemplarily describe the software structure of an electronic device. As can be seen from FIG. 3B, the layered architecture divides the software into several layers, and each layer has a clear role and division of labor. Layers communicate with each other through software interfaces. In some embodiments, the Android system is divided into four layers, which are, from top to bottom, an application layer, an application framework layer, a system runtime layer, and a kernel layer.

The application layer can include a series of application packages. As shown in Figure 3B, the application package may include applications such as camera, gallery, calendar, call, map, navigation, WLAN, Bluetooth, music, video, etc.

The application framework layer provides an application programming interface (API) and a programming framework for applications in the application layer. The application framework layer includes some predefined functions. As shown in FIG. 3B, the application framework layer may include a window manager, a content provider, a view system, a telephony manager, a resource manager, a notification manager, and the like.

A window manager is used to manage window programs. The window manager can get the size of the display screen, determine whether there is a status bar, lock the screen, take screenshots, etc.

Content providers are used to store and retrieve data and make these data accessible to applications. The data may include video, images, audio, calls made and received, browsing history and bookmarks, phone book, etc.

The view system includes visual controls, such as controls for displaying text, controls for displaying pictures, and so on. View systems can be used to build applications. A display interface can consist of one or more views. For example, the display interface including the short message notification icon may include a view for displaying text and a view for displaying pictures.

The telephony manager is used to provide the communication function of the electronic device, for example, management of the call status (including connecting, hanging up, and so on).

The resource manager provides various resources for the application, such as localization strings, icons, pictures, layout files, video files and so on.

The notification manager enables applications to display notification information in the status bar, which can be used to convey notification-type messages, and can disappear automatically after a brief pause without user interaction. For example, the notification manager is used to notify download completion, message reminders, etc. The notification manager can also display notifications in the status bar at the top of the system in the form of graphs or scroll bar text, such as notifications of applications running in the background, and notifications on the screen in the form of dialog windows. For example, text information is prompted in the status bar, a prompt sound is issued, the electronic device vibrates, and the indicator light flashes.

The system runtime layer includes system libraries and Android runtime. The system library is the support of the application framework; the Android runtime is responsible for the scheduling and management of the Android system, and is divided into two parts: the core library and the virtual machine.

The core library consists of two parts: one is the functions that the Java language needs to call, and the other is the core library of Android.

The application layer and the application framework layer run in virtual machines. The virtual machine executes the Java files of the application layer and the application framework layer as binary files. The virtual machine is used to perform functions such as object lifecycle management, stack management, thread management, security and exception management, and garbage collection.

A system library can include multiple functional modules, for example: a surface manager, media libraries, a 3D graphics processing library (e.g., OpenGL ES), a 2D graphics engine (e.g., SGL), and so on.

The Surface Manager is used to manage the display subsystem and provides a fusion of 2D and 3D layers for multiple applications.

The media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files. The media library can support a variety of audio and video encoding formats, such as: Moving Picture Experts Group (MPEG), Audio Video Interleaved (AVI), Transport Stream (MPEG2-TS, TS), Moving Picture Experts Group Audio Layer III (MP3), Advanced Audio Coding (AAC), Portable Network Graphics (PNG), etc.

The 3D graphics processing library is used to realize 3D graphics drawing, image rendering, compositing and layer processing, etc.

The 2D graphics engine is a drawing engine for 2D drawing.

The kernel layer is the layer between hardware and software. The kernel layer contains at least display drivers, camera drivers, audio drivers, and sensor drivers.

The software and hardware workflows of the electronic device 300 are exemplarily described below with reference to the playback scene of the multimedia file.

When the touch sensor receives a touch operation, a corresponding hardware interrupt is sent to the kernel layer. The kernel layer processes the touch operation into a raw input event (including the touch coordinates, the timestamp of the touch operation, and so on). Raw input events are stored at the kernel layer. The application framework layer obtains the raw input event from the kernel layer and identifies the control corresponding to the input event. Taking as an example the case where the touch operation is a touch coordinate operation and the corresponding control is the icon of the video application: the video application calls the interface of the application framework layer to start the video application, and then starts the audio driver and the display driver by calling the kernel layer, so as to continuously load the first number of audio data packets into the audio decoder and continuously load the second number of video data packets into the video decoder. Synchronous playback of the video data packets and the audio data packets is then realized through the display device 140 and the audio module 170 shown in FIG. 3A. The first number of audio data packets corresponds to the first playback time period, the second number of video data packets corresponds to the second playback time period, and the difference between the first playback time period and the second playback time period is less than a preset time difference threshold. The first number is an integer greater than 1, and the second number is an integer greater than 1.

Please refer to FIG. 4. FIG. 4 is a schematic diagram of software module interaction of an electronic device provided by an embodiment of the present application. It can be understood that the schematic diagram shown in FIG. 4 may specifically be the software module interaction diagram of the first electronic device 200A shown in FIG. 2A or that of the second electronic device 200B shown in FIG. 2B. The software modules may include a file loading module 401, a parsing module 402, a judgment module 403, a code stream packet loading module 404, a decoding module 405, and a synchronization module 406, where:

The file loading module 401 includes one or more file protocols, such as the File Transfer Protocol (FTP), the Hypertext Transfer Protocol (HTTP), and the Real Time Streaming Protocol (RTSP). The file loading module 401 can load or download the multimedia file into the memory of the player through these protocols.

The parsing module 402 is configured to use a corresponding file protocol to parse the multimedia file according to the encapsulation form of the multimedia file to obtain description information and a plurality of code stream packets, wherein the code stream packets include audio data packets and video data packets.

The judgment module 403 is used to determine, according to the description information, the amount of stored data between each audio data packet and the video data packet that has a corresponding display timestamp relationship with it among the multiple code stream packets; then to count the target number, that is, the number of stored data amounts that are greater than or equal to the target distance threshold; and finally to determine, according to the target number, whether the interleaving of the multiple code stream packets is uniform. Optionally, the target distance threshold is the larger of the first distance and the preset distance threshold, where the first distance is determined according to at least one of the width, height, storage ratio, or compression ratio of the multimedia file carried in the description information. The multimedia file includes video frames, and the width and height of the multimedia file correspond to the width and height of a video frame. A video frame may be an image displayed on a display after the video data packets in the multimedia file are decoded.

The code stream packet loading module 404 continuously loads the first number of audio data packets into the audio decoder, and continuously loads the second number of video data packets into the video decoder, where the first number is an integer greater than 1 and the second number is an integer greater than 1. Further, the first number of audio data packets corresponds to the first playback time period, the second number of video data packets corresponds to the second playback time period, and the deviation between the first playback time period and the second playback time period is less than the preset time difference threshold. The first playback time period and the second playback time period may be set according to actual needs. The preset time difference threshold may be a reference value set manually based on experience, or a reference value obtained by training (or learning) on multiple historical values.

The decoding module 405 includes an audio decoder and a video decoder. The audio decoder is used for decoding the audio data packets in the compressed and encoded form into uncompressed audio raw data; the video decoder is used for decoding the video data packets in the compressed and encoded form into the uncompressed video original data.

The synchronization module 406 is used to synchronize the first number of audio data packets and the second number of video data packets obtained by decoding, according to the description information obtained by parsing, and then to send the first number of audio data packets to the sound card and the second number of video data packets to the graphics card.

Please refer to FIG. 5. FIG. 5 is a method for playing a multimedia file provided by an embodiment of the present application. The method includes but is not limited to the following steps:

Step S501: Load the multimedia file into the memory of the player.

Specifically, the electronic device can load or download multimedia files into the memory of the player according to a series of file protocols, such as one or more of the FTP protocol, the HTTP protocol and the RTSP protocol.

Step S502: Parse the multimedia file to obtain multiple stream packets.

Specifically, the electronic device can parse the multimedia file according to its specific encapsulation form by using a corresponding encapsulation protocol to obtain a plurality of code stream packets, where the code stream packets include audio data packets and video data packets. The audio data packets may be data packets in compressed, encoded audio stream form, and the video data packets may be data packets in compressed, encoded video stream form.

Step S503: Continuously load the first number of audio data packets into the audio decoder, and continuously load the second number of video data packets into the video decoder.

Specifically, the first number of audio data packets corresponds to the first playback time period, the second number of video data packets corresponds to the second playback time period, and the deviation between the first playback time period and the second playback time period is less than the preset time difference threshold. That is, the first number and the second number may be equal or unequal, but when the electronic device loads the first number of audio data packets and the second number of video data packets to the player for playback, their playback durations need to be substantially the same. "Substantially the same" means that, from the user's perspective, the audio data packets and the video data packets are played synchronously. The first playback time period and the second playback time period can be determined according to the size of the buffer in the corresponding decoder (the first playback time period corresponds to the audio decoder, and the second playback time period corresponds to the video decoder) and the frequency of reading code stream packets. It is understandable that if too many code stream packets are read at one time, the buffer may be blocked; if too few are read at one time, frequent jumps may occur and cause read underload. Therefore, a reasonable playback time range needs to be set so that the number of code stream packets read each time is appropriate. Further, the playback time range may be such that the first number of audio data packets and the second number of video data packets each play in the player for at least 1 second.
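The relationship between the two packet counts and the playback time periods can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: it assumes that every packet of a stream has the same playback duration, and the names `packets_for_duration` and `in_sync`, the 23 ms and 40 ms packet durations, and the 50 ms deviation threshold are all hypothetical.

```python
import math


def packets_for_duration(packet_ms: float, target_ms: float) -> int:
    """Smallest packet count whose total playback time reaches target_ms.

    The result is clamped to at least 2, because the text requires both
    the first number and the second number to be integers greater than 1.
    """
    return max(2, math.ceil(target_ms / packet_ms))


def in_sync(first_number: int, audio_packet_ms: float,
            second_number: int, video_packet_ms: float,
            max_deviation_ms: float) -> bool:
    """True if the two playback time periods differ by less than the threshold."""
    first_period = first_number * audio_packet_ms     # audio batch duration
    second_period = second_number * video_packet_ms   # video batch duration
    return abs(first_period - second_period) < max_deviation_ms


# Hypothetical streams: 23 ms per audio packet, 40 ms per video packet (25 fps),
# and a playback time range of 1 second per batch.
first_number = packets_for_duration(23, 1000)
second_number = packets_for_duration(40, 1000)
print(first_number, second_number)
print(in_sync(first_number, 23, second_number, 40, max_deviation_ms=50))
```

In this sketch the two counts differ (the audio batch is larger), yet the batches cover nearly the same playback time, which is the property the step relies on.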

The audio decoder is used to decode the first number of audio data packets in compressed, encoded form into the first number of uncompressed raw audio data packets; the video decoder is used to decode the second number of video data packets in compressed, encoded form into the second number of uncompressed raw video data packets. The audio decoder sends the first number of uncompressed raw audio data packets to the sound card, and the video decoder sends the second number of uncompressed raw video data packets to the graphics card. Lip-synchronized audio and video playback is then achieved through the sound card and the display.

In a possible implementation manner, the multimedia file may include description information. If the information of the first code stream packet stored in the description information is the information of an audio data packet, the electronic device can first continuously load the first number of audio data packets into the audio decoder to obtain audio frames, and then continuously load the second number of video data packets into the video decoder to obtain video frames. After the audio frames and the video frames are synchronized, the audio frames are played by the speaker and the video frames are displayed by the display.

In a possible implementation manner, if the information of the first stream packet stored in the description information is the information of the video data packet, the electronic device can first continuously load the second number of video data packets into the video decoder to obtain the video frame, and then continuously load the first number of audio data packets into the audio decoder to obtain audio frames. After synchronizing the video frame and the audio frame, the audio frame is played by the speaker, and the video frame is displayed by the display.

Please refer to FIG. 6. FIG. 6 is another method for playing a multimedia file provided by an embodiment of the present application. The method includes but is not limited to the following steps:

Step S601: Load the multimedia file into the memory of the player.

Specifically, for a detailed description, reference may be made to step S501, which will not be repeated here.

Step S602: Parse the multimedia file to obtain multiple stream packets.

Specifically, the multimedia file may include description information, and the description information and multiple code stream packets may be obtained by parsing the multimedia file.

Step S603: Determine whether the interleaving of multiple code stream packets is uniform according to the description information.

Specifically, the description information may be located at the head or the tail of the multimedia file. Like a book, the multimedia file contains two parts: a table of contents (the description information) and the content (the code stream packets). The description information describes the file information and the attribute information, storage location, and size of each code stream packet. For example: the playback time, storage ratio, and compression ratio of the multimedia file; the width, height, frame rate, bit rate, and other information of the video in the multimedia file; the sampling rate and number of channels of the audio in the multimedia file; and the storage position (Position, POS), length, and presentation timestamp (Presentation Time Stamp, PTS, referred to here as the display timestamp) of each code stream packet. Whether the interleaving of the multiple code stream packets is uniform can be determined according to the storage location and display timestamp of each code stream packet in the description information.

In a possible implementation manner, the description information stores the display timestamp and storage location of each code stream packet, and the electronic device may first determine, according to the description information, the amount of stored data between each audio data packet and the video data packet that has a corresponding display timestamp relationship with it among the multiple code stream packets. That is, the electronic device can find the video data packet with the corresponding display timestamp according to the display timestamp of the audio data packet, and then, according to the storage locations, determine the amount of data stored between that audio data packet and that video data packet. It can be understood that the amount of stored data indicates how much data from other code stream packets is stored between the storage location of the audio data packet and the storage location of the video data packet with the corresponding display timestamp. Then, the electronic device counts the target number, that is, the number of stored data amounts that are greater than or equal to the target distance threshold, and finally determines whether the interleaving of the multiple code stream packets is uniform according to the target number.
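The pairing step above can be sketched as follows. The sketch makes two simplifying assumptions that the text does not state: a "corresponding display timestamp relationship" is taken to mean equal PTS values, and the amount of stored data is taken to be the absolute difference between the two byte offsets. The `PacketInfo` record and the sample index are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PacketInfo:
    kind: str  # "audio" or "video"
    pts: int   # display (presentation) timestamp
    pos: int   # storage position (byte offset in the file)
    size: int  # packet length in bytes


def stored_amounts(index: List[PacketInfo]) -> List[int]:
    """For each audio packet, in increasing PTS order, return the number of
    bytes between it and the video packet with the same PTS.

    Audio packets without a matching video packet are skipped.
    """
    video_by_pts: Dict[int, PacketInfo] = {}
    for packet in index:
        if packet.kind == "video":
            video_by_pts.setdefault(packet.pts, packet)

    amounts = []
    audio = sorted((p for p in index if p.kind == "audio"), key=lambda p: p.pts)
    for a in audio:
        v = video_by_pts.get(a.pts)
        if v is not None:
            amounts.append(abs(v.pos - a.pos))
    return amounts


# Hypothetical index: the first pair is adjacent, the later pairs drift apart.
index = [
    PacketInfo("audio", 0, 0, 400),
    PacketInfo("video", 0, 400, 9000),
    PacketInfo("audio", 40, 9400, 400),
    PacketInfo("audio", 80, 9800, 400),
    PacketInfo("video", 40, 10200, 8000),
    PacketInfo("video", 80, 18200, 8000),
]
print(stored_amounts(index))
```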

In a possible implementation manner, the target distance threshold is the larger of the first distance and the preset distance threshold. The first distance is determined by the electronic device according to at least one of the width, height, storage ratio, or compression ratio of the multimedia file carried in the description information, where the multimedia file includes video frames, and the width and height of the multimedia file correspond to the width and height of a video frame. For example, the first distance L=width*height*store_rate*compression_rate. The electronic device then determines whether the interleaving of the multiple code stream packets is uniform according to the target distance threshold. It should be noted that the preset distance threshold is a preset parameter used to measure whether the interleaving of data packets is uniform; it can be a reference value set manually based on experience, or a reference value obtained by training (or learning) on multiple historical values. For example, assume the preset distance threshold P=2*1024*1024 Byte=2097152 Byte=2MB. When the first distance L is greater than the preset distance threshold P, the target distance threshold D takes the value of the first distance L; when the first distance L is less than the preset distance threshold P, the target distance threshold D takes the value of the preset distance threshold P. It can be understood that when the first distance L is equal to the preset distance threshold P, the target distance threshold D may take either value.
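The threshold selection can be written out directly from the formula L=width*height*store_rate*compression_rate and the 2MB example value of P. The function names and the sample values of `store_rate` and `compression_rate` below are hypothetical; the text does not specify how those two ratios are defined or what values they take.

```python
PRESET_DISTANCE_THRESHOLD = 2 * 1024 * 1024  # P = 2097152 bytes, the example value in the text


def first_distance(width: int, height: int,
                   store_rate: float, compression_rate: float) -> float:
    """First distance L = width * height * store_rate * compression_rate."""
    return width * height * store_rate * compression_rate


def target_distance_threshold(l: float,
                              p: float = PRESET_DISTANCE_THRESHOLD) -> float:
    """Target distance threshold D: the larger of L and P."""
    return max(l, p)


# Hypothetical 1080p file: L stays below P, so D falls back to P.
small = first_distance(1920, 1080, store_rate=1.5, compression_rate=0.1)
print(target_distance_threshold(small))

# Hypothetical 4K file with light compression: L exceeds P, so D takes L.
large = first_distance(3840, 2160, store_rate=1.5, compression_rate=0.5)
print(target_distance_threshold(large))
```

Taking the maximum means that high-resolution files, whose individual packets are large, are not flagged as unevenly interleaved merely because one video packet already spans more than the fixed preset distance.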

In a possible implementation manner, the electronic device may determine, according to the description information, the display timestamp and storage location of each code stream packet in a preset number of code stream packets, where the preset number of code stream packets are the first N code stream packets among the multiple code stream packets, and N is a positive integer. Then, in order of increasing display timestamp, the electronic device determines the respective storage locations of the video data packets and the audio data packets that have a corresponding display timestamp relationship within the preset number of code stream packets, and finally, from the storage location of each video data packet and the storage location of the corresponding audio data packet, determines the amount of stored data between them. It should be noted that the preset number may be a reference value set manually based on experience, or a reference value obtained by training (or learning) on multiple historical values. The preset number N can be set to 2000. If it is determined according to the description information that the number of code stream packets of the multimedia file is less than the preset number, the electronic device may determine the amount of stored data between the audio data packets and the video data packets that have a corresponding display timestamp relationship across all the code stream packets.

For example, if it is determined according to the description information that the multimedia file to be played includes 4000 code stream packets, then because 4000 is greater than the preset number N, the electronic device can determine the display timestamp and storage location of each of the first N code stream packets. If the information of the first code stream packet stored in the description information is the information of an audio data packet, the storage location of the video data packet with the corresponding display timestamp can be determined according to the display timestamp of the first audio data packet, and the amount of stored data between the storage location of that audio data packet and the storage location of that video data packet can be determined. By analogy, in order of increasing display timestamp, the storage locations of the second audio data packet, the third audio data packet, ..., the Nth audio data packet and of the video data packets with the corresponding display timestamps are determined. Finally, the amount of stored data between the storage location of each audio data packet and the storage location of the video data packet with the corresponding display timestamp is obtained.

Because there are multiple code stream packets, there may also be multiple stored data amounts (one between each audio data packet and the video data packet whose display timestamp corresponds to it). From these stored data amounts, the electronic device counts the target number, that is, the number of stored data amounts that are greater than or equal to the above target distance threshold. For example, if the target distance threshold D is 3MB and the stored data amounts are 10MB, 2MB, 3.7MB, 1.9MB, 6.9MB, and 11.6MB, then the number of stored data amounts greater than or equal to the target distance threshold D is 4. Finally, the electronic device can determine whether the interleaving of the multiple code stream packets is uniform according to the target number. Further, the electronic device can calculate the ratio, rate, of the target number abnormal_cnt to the preset number N, and if rate is greater than or equal to the second preset threshold, it can be determined that the interleaving of the multiple code stream packets is uneven. It can be understood that the second preset threshold may be a reference value set manually based on experience, or a reference value obtained by training (or learning) on multiple historical values. The second preset threshold may take a value of 0.2. For example, if rate=abnormal_cnt/N=0.3, then since 0.3 is greater than the second preset threshold 0.2, it can be determined that the code stream interleaving is not uniform.
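The two examples in this paragraph can be reproduced with a short sketch. The function names are hypothetical; the amounts, the 3MB threshold, N=2000, and the second preset threshold of 0.2 come from the text, while the abnormal counts 600 and 100 are illustrative values chosen so that the first yields the rate of 0.3 used above.

```python
from typing import Iterable


def count_abnormal(amounts_mb: Iterable[float], target_threshold_mb: float) -> int:
    """Target number: how many stored data amounts reach the target distance threshold."""
    return sum(1 for amount in amounts_mb if amount >= target_threshold_mb)


def is_uneven(abnormal_cnt: int, preset_number: int,
              second_threshold: float = 0.2) -> bool:
    """Interleaving is judged uneven when abnormal_cnt / N reaches the second preset threshold."""
    rate = abnormal_cnt / preset_number
    return rate >= second_threshold


# Stored data amounts from the example, with D = 3MB.
amounts = [10, 2, 3.7, 1.9, 6.9, 11.6]
print(count_abnormal(amounts, 3))

# With N = 2000, 600 abnormal pairs give rate = 0.3, which is judged uneven;
# 100 abnormal pairs give rate = 0.05, which is judged uniform.
print(is_uneven(600, 2000))
print(is_uneven(100, 2000))
```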

Step S604: Continuously load the first number of audio data packets into the audio decoder, and continuously load the second number of video data packets into the video decoder.

Specifically, for a detailed description, reference may be made to step S503, which will not be repeated here.

Please refer to FIG. 7. FIG. 7 is a schematic flowchart of a method for judging whether the interleaving of multimedia files is uniform according to an embodiment of the present application. As can be seen from FIG. 7, after the electronic device loads the multimedia file, description information and/or multiple code stream packets can be obtained by parsing the multimedia file. The code stream packets include audio data packets and video data packets. The description information describes the file information and the attribute information, storage location, and size of each code stream packet. For example: the playback time, storage ratio, and compression ratio of the multimedia file; the width, height, frame rate, bit rate, and other information of the video in the multimedia file; the sampling rate and number of channels of the audio in the multimedia file; and the storage position POS, length, and PTS of each code stream packet.

Then, the electronic device may determine the target distance threshold according to the description information of the multimedia file. Specifically, the electronic device may determine the first distance according to one or more of the width, height, storage ratio store_rate, and compression ratio compression_rate of the multimedia file carried in the description information. If the first distance is greater than the preset distance threshold, the target distance threshold is the first distance; if the first distance is less than the preset distance threshold, the target distance threshold is the preset distance threshold; if the first distance is equal to the preset distance threshold, the target distance threshold may be either. That is, the target distance threshold is the larger of the first distance and the preset distance threshold. It should be noted that the preset distance threshold is a preset parameter used to measure whether the interleaving of data packets is uniform; it can be a reference value set manually based on experience, or a reference value obtained by training (or learning) on multiple historical values.

Next, the electronic device needs to analyze whether the code stream packets are evenly interleaved according to the storage locations and display timestamps of the preset number of code stream packets. If the number of code stream packets of the multimedia file is less than the preset number, the electronic device analyzes all code stream packets of the multimedia file, that is, the preset number becomes the total number of code stream packets; if the number of code stream packets of the multimedia file is greater than or equal to the preset number, the electronic device analyzes the first N code stream packets, that is, the preset number of code stream packets are the first N code stream packets among the plurality of code stream packets. After the code stream packets to be analyzed are determined, they are analyzed and judged in order of increasing timestamp, based on the storage location and display timestamp of each code stream packet stored in the description information. Specifically, in each round of analysis, the first code stream packet, which has the smallest timestamp among the unprocessed code stream packets, is selected. It should be noted that if the information of the first code stream packet stored in the description information is the information of an audio data packet, then the first code stream packet is an audio data packet and the second code stream packet is a video data packet; if the information of the first code stream packet stored in the description information is the information of a video data packet, then the first code stream packet is a video data packet and the second code stream packet is an audio data packet. The second code stream packet is the code stream packet that has a corresponding display timestamp relationship with the first code stream packet.

Then, the electronic device needs to determine whether the amount of stored data between the storage location of the first code stream packet and the storage location of the second code stream packet is greater than or equal to the target distance threshold. If it is, the amount of stored data between the two code stream packets is considered abnormal, and a large jump may occur in actual playback. The number of times the amount of stored data reaches or exceeds the target distance threshold is therefore recorded by updating the target number. According to the method shown in FIG. 7, the interleaving of each code stream packet in the preset number is analyzed in this way. When the number of code stream packets analyzed is greater than or equal to the preset number, the electronic device has finished the analysis. The electronic device may then calculate the ratio of the target number to the preset number, that is, ratio=target number/preset number. If the ratio is greater than or equal to the second preset threshold, it can be determined that the interleaving of the multiple code stream packets is uneven; if the ratio is less than the second preset threshold, it can be determined that the interleaving of the multiple code stream packets is uniform. It can be understood that the second preset threshold may be a reference value set manually based on experience, or a reference value obtained by training (or learning) on multiple historical values.

It should be noted that, when analyzing the interleaving of the code stream packets, if the preset number of code stream packets does not contain a first code stream packet and a second code stream packet whose display timestamps correspond, those code stream packets do not need to be analyzed and processed.

Please refer to FIG. 8A. FIG. 8A is a schematic flowchart of a playback method for batch loading code stream packets provided by an embodiment of the present application. As can be seen from FIG. 8A, the electronic device parses the multimedia file according to the schematic flowchart shown in FIG. 6 and determines the interleaving situation of the code stream packets. If the interleaving of the code stream packets is uniform, the not-yet-loaded code stream packet with the smallest display timestamp is selected each time and loaded into the corresponding decoder, until all the code stream packets are loaded into the decoders for decoding and playback. For a detailed description, reference may be made to FIG. 1A, which will not be repeated here.
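For the uniformly interleaved case, selecting the unloaded packet with the smallest display timestamp each time amounts to merging the audio and video packet sequences by PTS. The following is a minimal sketch that assumes each stream's PTS list is already in increasing order; the function name and sample timestamps are hypothetical, and the rule that audio goes first on equal timestamps is an arbitrary choice of this sketch.

```python
import heapq
from typing import Iterator, List, Tuple


def load_order_uniform(audio_pts: List[int],
                       video_pts: List[int]) -> Iterator[Tuple[str, int]]:
    """Yield (decoder, pts) in the order packets would be loaded:
    always the not-yet-loaded packet with the smallest PTS.

    Both inputs must be sorted in increasing PTS order. On equal PTS the
    audio packet comes first, because "audio" sorts before "video".
    """
    audio = ((pts, "audio") for pts in audio_pts)
    video = ((pts, "video") for pts in video_pts)
    for pts, kind in heapq.merge(audio, video):
        yield kind, pts


for kind, pts in load_order_uniform([0, 23, 46, 69], [0, 40, 80]):
    print(f"load pts={pts} into {kind} decoder")
```

When the file is evenly interleaved, consecutive packets in this order are also close together in storage, so loading one packet at a time does not cause large jumps.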

If the interleaving of the code stream packets is uneven, the first number of audio data packets are continuously loaded into the audio decoder, and the second number of video data packets are continuously loaded into the video decoder. Specifically, please refer to FIG. 8B. FIG. 8B is an example provided by an embodiment of the present application. As can be seen from FIG. 8B, if the information of the first code stream packet stored in the description information is the information of an audio data packet, the electronic device sequentially reads and loads the first number of audio data packets into the audio decoder, and the playback duration of the first number of audio data packets in the player needs to correspond to the first playback period. The electronic device then jumps to the position of the video data packet corresponding to the display timestamp of the audio data packets, and sequentially reads and loads the second number of video data packets into the video decoder. The playback duration of the second number of video data packets in the player also needs to correspond to the second playback period, and the deviation between the first playback period and the second playback period is less than a preset time difference threshold. The above loading process is then repeated until all the code stream packets have been loaded into the corresponding decoders for decoding and playback. Optionally, if the information of the first code stream packet stored in the description information is the information of a video data packet, the electronic device first sequentially reads and loads the second number of video data packets into the video decoder, and then sequentially reads and loads the first number of audio data packets into the audio decoder. It should be noted that "sequentially" above may mean in ascending order of display timestamp.
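The alternating batch loading described above can be illustrated with a short sketch. The function name, the use of plain timestamp lists for each stream, and the `first_kind` parameter (standing in for the type of the first code stream packet recorded in the description information) are assumptions for illustration only.

```python
def batch_load_order(audio, video, first_number, second_number, first_kind="audio"):
    """Return the batches in the order they would be loaded.

    audio, video: display timestamps of each stream, sorted ascending.
    first_number / second_number: packets per audio / video batch (both > 1
    in the embodiment; not enforced in this sketch).
    """
    streams = {"audio": (audio, first_number), "video": (video, second_number)}
    kinds = ["audio", "video"] if first_kind == "audio" else ["video", "audio"]
    position = {"audio": 0, "video": 0}
    batches = []
    # Repeat the audio-batch / video-batch cycle until both streams are exhausted.
    while any(position[k] < len(streams[k][0]) for k in kinds):
        for kind in kinds:
            packets, count = streams[kind]
            batch = packets[position[kind]:position[kind] + count]
            if batch:
                batches.append((kind, batch))
                position[kind] += len(batch)
    return batches
```

Compared with selecting one packet at a time, each jump between the audio and video storage regions now serves a whole batch, which is the reduction in seeking that the embodiment aims for.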

It can be understood that the first quantity and the second quantity may or may not be equal, but when the electronic device loads the first quantity of audio data packets and the second quantity of video data packets into the corresponding decoders for decoding and playing, their playing time periods need to be substantially the same. "Substantially the same" means that, from the user's point of view, the audio data packets and the video data packets are played synchronously. The preset time range can be determined according to the size of the buffer in the decoder and the frequency of reading code stream packets. It is understandable that if the number of code stream packets read at one time is too large, the buffer may be blocked; if the number of code stream packets read at one time is too small, frequent jumps may occur and cause read underload. Therefore, a reasonable preset time range needs to be set to ensure that the number of code stream packets read each time is appropriate. Further, the preset time range may be such that the first number of audio data packets and the second number of video data packets each play in the player for a duration of at least 1 second. The electronic device can load audio data packets and video data packets into the player in a roughly aligned manner within a preset time range (for example, 1 second), reducing the performance overhead of frequent jumps caused by aligning the code stream packets.
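As a rough illustration of how the two quantities could be derived from a preset time range, the sketch below assumes that every packet in a stream has the same fixed duration. That assumption, the function names, and the example durations are not taken from the embodiment.

```python
import math


def packets_per_window(packet_duration_ms, window_ms=1000):
    """Number of packets needed to cover the window, never fewer than 2."""
    return max(2, math.ceil(window_ms / packet_duration_ms))


def batch_deviation_ms(audio_packet_ms, video_packet_ms, window_ms=1000):
    """Return (first_number, second_number, deviation between playback periods)."""
    first_number = packets_per_window(audio_packet_ms, window_ms)
    second_number = packets_per_window(video_packet_ms, window_ms)
    first_period = first_number * audio_packet_ms
    second_period = second_number * video_packet_ms
    return first_number, second_number, abs(first_period - second_period)
```

For example, with hypothetical 23 ms audio packets and 40 ms video packets, a 1 second window gives 44 audio packets and 25 video packets, whose playback periods differ by 12 ms. A caller would compare that deviation against the preset time difference threshold.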

In a possible implementation manner, in the process of reading and loading the code stream packets, the electronic device can dynamically determine the interleaving of the code stream packets according to the schematic flowchart shown in FIG. 6, and adjust the loading mode of the code stream packets in real time. For example, suppose the multimedia file to be played includes 8,000 code stream packets. When analysis of the first N code stream packets among the 8,000 shows that the interleaving is uneven, the electronic device can continuously load the first number of audio data packets and the second number of video data packets in turn into the player for playback. Here, N can be a preset number, and N is less than 8,000. After the electronic device has loaded 1,000 code stream packets into the player for decoding and playback, 7,000 code stream packets in the multimedia file remain unloaded. At this point, the electronic device can analyze the current interleaving according to the first N code stream packets among the remaining 7,000. If the interleaving is now found to be uniform, the electronic device selects, each time, the unloaded code stream packet with the smallest display timestamp and loads it into the corresponding decoder, until all the code stream packets have been loaded into the player for decoding and playback.
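The dynamic re-evaluation could be organized along the lines of the sketch below. The `is_uneven` callback stands in for the analysis of FIG. 6 and FIG. 7, and the fixed re-check interval `chunk` is an assumption; the embodiment only gives the 1,000-packet figure as an example.

```python
def plan_loading(packets, n, chunk, is_uneven):
    """Return a list of (start, end, mode) segments covering all packets.

    packets: sequence of code stream packets in storage order.
    n: how many leading unloaded packets are analyzed before each segment.
    chunk: how many packets are loaded before the analysis is repeated
           (hypothetical parameter).
    is_uneven: callable taking the analyzed packets and returning True
               when their interleaving is uneven (hypothetical hook).
    """
    plan = []
    start = 0
    while start < len(packets):
        window = packets[start:start + n]
        mode = "batch" if is_uneven(window) else "smallest_pts"
        end = min(start + chunk, len(packets))
        plan.append((start, end, mode))
        start = end
    return plan
```

Each segment is loaded with the mode chosen for it, so a file whose interleaving improves partway through falls back to per-packet loading for the remainder.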

Please refer to FIG. 9. FIG. 9 is another method for playing a multimedia file provided by an embodiment of the present application. The method includes but is not limited to the following steps:

Step S901: Load the multimedia file into the memory of the player.

Specifically, refer to the detailed description of step S501; details are not repeated here.

Step S902: Parse the multimedia file to obtain multiple stream packets.

Specifically, after reading the multimedia file, the electronic device parses it and obtains code stream packets that include video data packets, audio data packets and subtitle data packets. The electronic device can display video data packets and subtitle data packets through a display, and play audio data packets through a speaker. It can be understood that, when the audio data packets of some multimedia files do not have embedded subtitles, the code stream packets of the multimedia files may include subtitle data packets.

Step S903: Continuously load a first number of audio data packets into the audio decoder, continuously load a second number of video data packets into the video decoder, and continuously load a third number of subtitle data packets into the subtitle decoder.

Specifically, the electronic device sequentially reads and loads a first number of audio data packets into an audio decoder for decoding to obtain audio frames, loads a second number of video data packets into a video decoder for decoding to obtain video frames, and loads a third number of subtitle data packets into the subtitle decoder for decoding to obtain subtitle frames. Then, after the electronic device performs synchronization processing on the audio frames, the video frames and the subtitle frames, the video frames and the subtitle frames are displayed by the display, and the audio frames are played by the speaker. In this way, audio data packets, video data packets and subtitle data packets are loaded alternately until all the code stream packets of the multimedia file to be played have been loaded. The third number is an integer greater than 1, the third number of subtitle data packets corresponds to the third playback period, and the deviation between the third playback period and the first playback period, or between the third playback period and the second playback period, is smaller than the preset time difference threshold.
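The alignment condition on the three playback periods can be written as a small check. The sketch assumes each playback period is a `(start_ms, end_ms)` pair and that "deviation" means the larger of the start and end differences; the text does not define the deviation precisely, so both are assumptions. The sketch also takes the stricter reading in which the subtitle period must be close to both the audio and the video period.

```python
def deviation_ms(period_a, period_b):
    """Largest difference between the start times or the end times (assumed metric)."""
    return max(abs(period_a[0] - period_b[0]), abs(period_a[1] - period_b[1]))


def subtitle_batch_aligned(audio_period, video_period, subtitle_period, threshold_ms):
    """True when the subtitle batch stays within the threshold of both other batches."""
    return (
        deviation_ms(subtitle_period, audio_period) < threshold_ms
        and deviation_ms(subtitle_period, video_period) < threshold_ms
    )
```

A loader could use such a check when choosing the third number, shrinking or growing the subtitle batch until it returns true.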

In a possible implementation manner, if the electronic device determines according to the schematic flowchart shown in FIG. 7 that the interleaving of multiple code stream packets is uneven, the electronic device continuously loads the first number of audio data packets into the audio decoder, continuously loads the second number of video data packets into the video decoder, and continuously loads the third number of subtitle data packets into the subtitle decoder.

It should be noted that the loading order of audio data packets, video data packets and subtitle data packets can be determined according to the description information obtained by parsing. If the information of the first code stream packet stored in the description information is the information of a video data packet, the electronic device may first load the second quantity of video data packets, then load the first quantity of audio data packets, and then load the third quantity of subtitle data packets to be played in the player. This embodiment of the present application does not impose any restrictions on the order of loading the code stream packets.

Please refer to FIG. 10. FIG. 10 is a schematic structural diagram of an apparatus for playing a multimedia file provided by an embodiment of the present application. The apparatus 100 for playing a multimedia file may be a node, or may be a device in a node, such as a chip or an integrated circuit. The apparatus 100 for playing a multimedia file may include a first loading unit 1001, a parsing unit 1002 and a second loading unit 1003. The detailed description of each unit is as follows.

The first loading unit 1001 is used for loading multimedia files into the memory of the player;

The parsing unit 1002 is configured to parse the multimedia file to obtain a plurality of code stream packets, where the code stream packets include audio data packets and video data packets;

The second loading unit 1003 is configured to continuously load a first number of the audio data packets into an audio decoder, and continuously load a second number of the video data packets into a video decoder, wherein the first number is an integer greater than 1, and the second number is an integer greater than 1.

In a possible implementation manner, the first number of the audio data packets corresponds to a first playback period, the second number of the video data packets corresponds to a second playback period, and the deviation between the first playback period and the second playback period is less than a preset time difference threshold.

In a possible implementation manner, the multimedia file includes description information, and the apparatus further includes a determination unit 1004, where the determination unit 1004 is configured to: determine whether the interleaving of the multiple code stream packets is uniform according to the description information.

In a possible implementation manner, the determining unit 1004 is specifically configured to: count, according to the description information, the amount of stored data between the audio data packets and the video data packets that have a corresponding display timestamp relationship in the plurality of code stream packets; count a target quantity, that is, the number of times the amount of stored data is greater than or equal to the target distance threshold; and determine whether the interleaving of the multiple code stream packets is uniform according to the target quantity.

In a possible implementation manner, the target distance threshold is the larger of a first distance and a preset distance, and the first distance is determined according to at least one of the width, height, storage ratio or compression ratio of the multimedia file carried in the description information; wherein the multimedia file includes a video frame, and the width and height of the multimedia file correspond to the width and height of the video frame.
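Taking the larger of the two distances is a one-line operation; the harder part is the first distance, for which this passage gives no formula. The sketch below uses a purely hypothetical estimate (uncompressed frame size divided by the compression ratio, with an assumed 1.5 bytes per pixel) only to make the `max` step concrete.

```python
def target_distance_threshold(width, height, compression_ratio, preset_distance,
                              bytes_per_pixel=1.5):
    """Return the larger of a first distance and the preset distance, in bytes.

    The first-distance formula here is an assumption for illustration:
    an estimate of one compressed video frame's size.
    """
    first_distance = width * height * bytes_per_pixel / compression_ratio
    return max(first_distance, preset_distance)
```

The preset distance acts as a floor, so small or heavily compressed videos do not end up with a threshold low enough to flag ordinary interleaving as abnormal.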

In a possible implementation manner, the determining unit 1004 is specifically configured to: determine, according to the description information, a display timestamp and a storage location of each code stream packet in a preset number of code stream packets, where the preset number of code stream packets are the first N code stream packets in the plurality of code stream packets, and N is a positive integer; determine, in order of increasing display timestamp, the respective storage locations of the video data packets and the audio data packets that have a corresponding display timestamp relationship in the preset number of code stream packets; and determine, according to the storage locations of the video data packets and the storage locations of the audio data packets, the amount of stored data between the video data packets and the audio data packets that have the corresponding display timestamp relationship.
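The three determinations above can be sketched as follows. The tuple layout `(kind, pts, offset)` is a hypothetical stand-in for the description information, and "corresponding display timestamp relationship" is simplified here to "equal display timestamps", which is an assumption.

```python
def stored_data_amounts(packets, n):
    """Return the storage distance, in bytes, of each matched audio/video pair.

    packets: list of (kind, pts, offset) in storage order, where kind is
             "audio" or "video" (hypothetical representation).
    n: number of leading code stream packets to analyze.
    """
    # Walk the first N packets in order of increasing display timestamp.
    first_n = sorted(packets[:n], key=lambda packet: packet[1])
    audio_offsets = {pts: offset for kind, pts, offset in first_n if kind == "audio"}
    amounts = []
    for kind, pts, offset in first_n:
        if kind == "video" and pts in audio_offsets:
            amounts.append(abs(offset - audio_offsets[pts]))
    return amounts
```

Video packets without a matching audio packet are skipped, in line with the note that unmatched code stream packets need not be analyzed.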

In a possible implementation manner, the determining unit 1004 is specifically configured to: calculate the ratio of the target quantity to the preset quantity; if the ratio is greater than or equal to a second preset threshold, determine that the interleaving of the multiple code stream packets is uneven; or, if the target quantity is greater than or equal to a third preset threshold, determine that the interleaving of the multiple code stream packets is uneven.

In a possible implementation manner, the code stream packets further include subtitle data packets, and the second loading unit 1003 is further configured to: continuously load a third number of the subtitle data packets into the subtitle decoder, wherein the first number of the audio data packets, the second number of the video data packets and the third number of the subtitle data packets are loaded alternately; wherein the third number is an integer greater than 1, the third number of the subtitle data packets corresponds to a third playback period, and the deviation between the third playback period and the first playback period, or between the third playback period and the second playback period, is smaller than the preset time difference threshold.

In a possible implementation manner, the second loading unit 1003 is further configured to: if it is determined according to the description information that the interleaving of the plurality of code stream packets is uniform, load the audio data packet with the smallest display timestamp into the audio decoder, and load the video data packet with the smallest display timestamp into the video decoder.

It should be noted that, the implementation of each unit may also refer to the corresponding description of the method embodiment shown in FIG. 5 , FIG. 6 or FIG. 9 .

Embodiments of the present application further provide a computer-readable storage medium, where program instructions are stored in the computer-readable storage medium. When the program instructions are executed on a computer or a processor, the method flow shown in FIG. 5, FIG. 6 or FIG. 9 is implemented.

Embodiments of the present application further provide a computer program product, the computer program product includes program instructions, and when the program instructions are run on a computer or a processor, the method flow shown in FIG. 5 , FIG. 6 or FIG. 9 is implemented.

Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, and when the program is executed, it may include the processes of the foregoing method embodiments. The aforementioned storage medium includes read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disk, and other media that can store program code.

Claims

A method for playing multimedia files, characterized in that the method comprises:

loading a multimedia file into a memory of a player;

parsing the multimedia file to obtain a plurality of code stream packets, wherein the code stream packets include audio data packets and video data packets;

continuously loading a first number of the audio data packets into an audio decoder, and continuously loading a second number of the video data packets into a video decoder, wherein the first number is an integer greater than 1, and the second number is an integer greater than 1.
The method according to claim 1, wherein the first number of the audio data packets corresponds to a first playback time period, the second number of the video data packets corresponds to a second playback time period, and the deviation between the first playback time period and the second playback time period is less than a preset time difference threshold.
The method according to claim 1 or 2, wherein the multimedia file includes description information, and after the parsing of the multimedia file to obtain a plurality of code stream packets, and before the continuously loading of the first number of the audio data packets into the audio decoder and the continuously loading of the second number of the video data packets into the video decoder, the method further comprises:

Whether the interleaving of the plurality of code stream packets is uniform is determined according to the description information.
The method according to claim 3, wherein the determining whether the interleaving of the multiple code stream packets is uniform according to the description information comprises:

According to the description information, count the amount of stored data between the audio data packets and the video data packets that have a corresponding display timestamp relationship in the plurality of stream packets;

Count a target quantity, that is, the number of times the amount of stored data is greater than or equal to the target distance threshold;

Whether the interleaving of the plurality of code stream packets is uniform is determined according to the target number.
The method according to claim 4, wherein the target distance threshold is the larger of a first distance and a preset distance, and the first distance is determined according to at least one of the width, height, storage ratio or compression ratio of the multimedia file carried in the description information; wherein the multimedia file includes a video frame, and the width and height of the multimedia file correspond to the width and height of the video frame.
The method according to claim 4 or 5, wherein the counting, according to the description information, of the amount of stored data between the audio data packets and the video data packets that have a corresponding display timestamp relationship in the plurality of code stream packets comprises:

The display timestamp and storage location of each code stream packet in a preset number of code stream packets are determined according to the description information, the preset number of code stream packets are the first N code stream packets in the plurality of code stream packets, and N is a positive integer;

Determine the respective storage locations of the video data packets and the audio data packets that have a corresponding display time stamp relationship in the preset number of stream packets in a manner of increasing the display time stamps;

According to the storage position of the video data packet and the storage position of the audio data packet, the storage data amount between the video data packet and the audio data packet with the corresponding display time stamp relationship is determined.
The method according to claim 6, wherein the determining whether the interleaving of the plurality of code stream packets is uniform according to the target number comprises:

calculating the proportion of the target quantity to the preset quantity;

If the ratio is greater than or equal to the second preset threshold, it is determined that the interleaving of the plurality of code stream packets is uneven; or

If the target number is greater than or equal to the third preset threshold, it is determined that the interleaving of the plurality of code stream packets is uneven.
The method according to any one of claims 1 to 7, wherein the code stream packets further include subtitle data packets, and after the multimedia file is parsed to obtain description information and a plurality of code stream packets, the method further comprises:

Continuously loading a third number of the subtitle data packets into the subtitle decoder, wherein the first number of the audio data packets, the second number of the video data packets and the third number of the subtitle data packets are loaded alternately;

Wherein, the third quantity is an integer greater than 1, the third quantity of the subtitle data packets corresponds to a third playback time period, and the deviation between the third playback time period and the first playback time period , or the deviation from the second playback time period is smaller than the preset time difference threshold.
The method according to any one of claims 1 to 8, wherein after the parsing the multimedia file to obtain the description information and a plurality of code stream packets, the method further comprises:

If it is determined according to the description information that the multiple code stream packets are evenly interleaved, then

Loading the audio data packet with the smallest display time stamp into the audio decoder, and loading the video data packet with the smallest display time stamp into the video decoder.
A device for playing multimedia files, characterized in that the device comprises:

The first loading unit is used to load the multimedia file into the memory of the player;

a parsing unit for parsing the multimedia file to obtain a plurality of code stream packets, the code stream packets including audio data packets and video data packets;

A second loading unit, configured to continuously load a first number of the audio data packets into the audio decoder, and continuously load a second number of the video data packets into the video decoder, wherein the first number is an integer greater than 1, and the second number is an integer greater than 1.
The apparatus according to claim 10, wherein the first number of the audio data packets corresponds to a first playback time period, the second number of the video data packets corresponds to a second playback time period, and the deviation between the first playback time period and the second playback time period is less than a preset time difference threshold.
The apparatus according to claim 10 or 11, wherein the multimedia file includes description information, and the apparatus further comprises a determining unit, configured to:

Whether the interleaving of the plurality of code stream packets is uniform is determined according to the description information.
The device according to claim 12, wherein the determining unit is specifically configured to:

According to the description information, count the amount of stored data between the audio data packets and the video data packets that have a corresponding display timestamp relationship in the plurality of stream packets;

Count a target quantity, that is, the number of times the amount of stored data is greater than or equal to the target distance threshold;

Whether the interleaving of the plurality of code stream packets is uniform is determined according to the target number.
The device according to claim 13, wherein the target distance threshold is the larger of a first distance and a preset distance, and the first distance is determined according to at least one of the width, height, storage ratio or compression ratio of the multimedia file carried in the description information; wherein the multimedia file includes a video frame, and the width and height of the multimedia file correspond to the width and height of the video frame.
The device according to claim 13 or 14, wherein the determining unit is specifically configured to:

The display timestamp and storage location of each code stream packet in a preset number of code stream packets are determined according to the description information, the preset number of code stream packets are the first N code stream packets in the plurality of code stream packets, and N is a positive integer;

Determine the respective storage locations of the video data packets and the audio data packets that have a corresponding display time stamp relationship in the preset number of stream packets in a manner of increasing the display time stamps;

According to the statistics of the storage positions of the video data packets and the storage positions of the audio data packets, determine the amount of stored data between the video data packets and the audio data packets that have a corresponding display time stamp relationship.
The device according to claim 15, wherein the determining unit is specifically configured to:

calculating the proportion of the target quantity to the preset quantity;

If the ratio is greater than or equal to the second preset threshold, it is determined that the interleaving of the plurality of code stream packets is uneven; or

If the target number is greater than or equal to the third preset threshold, it is determined that the interleaving of the plurality of code stream packets is uneven.
The apparatus according to any one of claims 10 to 16, wherein the code stream packets further comprise subtitle data packets, and the second loading unit is further configured to: continuously load a third number of the subtitle data packets into a subtitle decoder, wherein the first number of the audio data packets, the second number of the video data packets, and the third number of the subtitle data packets are alternately loaded;

Wherein, the third quantity is an integer greater than 1, the third quantity of the subtitle data packets corresponds to a third playback time period, and the deviation between the third playback time period and the first playback time period , or the deviation from the second playback time period is smaller than the preset time difference threshold.
The device according to any one of claims 10 to 17, wherein the second loading unit is further configured to:

If it is determined according to the description information that the interleaving of the plurality of code stream packets is uniform, then

Loading the audio data packet with the smallest display time stamp into the audio decoder, and loading the video data packet with the smallest display time stamp into the video decoder.
An electronic device, characterized in that the electronic device comprises at least one processor and a transmission interface, the at least one processor receives or sends a signal through the transmission interface; the at least one processor is configured to invoke a computer program stored in a memory, to cause the electronic device to perform the method of any of claims 1-10.
A computer-readable storage medium, wherein program instructions are stored in the computer-readable storage medium, and when the program instructions are executed on a processor, the method of any one of claims 1-10 is implemented.
A computer program product, characterized in that the computer program product includes program instructions, and when the program instructions are executed on a computer or a processor, the method according to any one of claims 1-10 is implemented.