WO2022193141A1 - Multimedia file playing method and related apparatus - Google Patents

Multimedia file playing method and related apparatus

Info

Publication number
WO2022193141A1
Authority
WO
WIPO (PCT)
Prior art keywords
packets
data packets
code stream
video
multimedia file
Prior art date
Application number
PCT/CN2021/081127
Other languages
French (fr)
Chinese (zh)
Inventor
刘秦涛
Original Assignee
华为技术有限公司
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to CN202180095561.2A (CN116965038A)
Priority to PCT/CN2021/081127 (WO2022193141A1)
Publication of WO2022193141A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/236 Assembling of a multiplex stream, e.g. transport stream, by combining a video stream with other content or additional data, e.g. inserting a URL [Uniform Resource Locator] into a video stream, multiplexing software data into a video stream; Remultiplexing of multiplex streams; Insertion of stuffing bits into the multiplex stream, e.g. to obtain a constant bit-rate; Assembling of a packetised elementary stream
    • H04N21/2368 Multiplexing of audio and video streams

Definitions

  • the present invention relates to the technical field of multimedia, and in particular, to a method and a related device for playing multimedia files.
  • a multimedia file contains two parts: description information (also called a description block) and multiple code stream packets.
  • the code stream packet may include processed audio data packets and video data packets.
  • the audio data packets and the video data packets are sequentially and uniformly interleaved and stored.
  • the storage location of the code stream packets in a recorded or processed multimedia file may change, so that the audio data packets and the video data packets are no longer evenly interleaved and stored in sequence. In some scenarios this causes playback to stutter, which degrades the user experience.
  • the embodiments of the present application disclose a method and a related device for playing a multimedia file, which can ensure that audio data packets and video data packets are played synchronously and reduce the playback stuttering caused by frequently jumping to load the corresponding code stream packets.
  • a first aspect of the embodiments of the present application provides a method for playing a multimedia file. The method may include: loading the multimedia file into the memory of the player; parsing the multimedia file to obtain a plurality of code stream packets, the code stream packets including audio data packets and video data packets; and then continuously loading a first quantity of the audio data packets into the audio decoder and continuously loading a second quantity of the video data packets into the video decoder, wherein the first quantity is an integer greater than 1 and the second quantity is an integer greater than 1.
  • compared with the prior-art loading method of loading 1 audio data packet into the audio decoder at a time and 1 video data packet into the video decoder at a time, the loading method of continuously loading more than 1 audio data packet into the audio decoder each time and more than 1 video data packet into the video decoder each time reduces the number of jumps needed to load code stream packets. While audio data packets and video data packets can still be played synchronously, the speed of reading code stream packets is improved, which avoids playback freezes caused by an insufficient supply of data packets.
  • the first number of video data packets corresponds to the first playback period
  • the second number of audio data packets corresponds to the second playback period
  • the deviation between the first playback period and the second playback period is less than the preset time difference threshold
  • the first playback time period corresponding to the first number of video data packets is at least 1 second; the second playback time period corresponding to the second number of video data packets is at least 1 second.
  • the first quantity covers a certain length of playback time on the player, and the second quantity likewise covers the length of time needed to play the first quantity on the player. Therefore, this method of loading code stream packets no longer synchronizes and aligns on the display timestamp of a single code stream packet, as in the prior art, but aligns roughly in units of preset time periods, which avoids the performance overhead caused by frequent jumps to load code stream packets.
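  • the choice of the two quantities can be sketched as follows. This is a minimal illustration, assuming that each packet carries a known playback duration and that the first quantity applies to audio and the second to video; the names `Packet`, `batch_size` and `playback_period`, and the 0.25 s and 0.5 s durations, are hypothetical and do not come from the application.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Packet:
    kind: str        # "audio" or "video"
    pts: float       # display timestamp, in seconds (assumed unit)
    duration: float  # playback time covered by this packet, in seconds


def batch_size(packets: List[Packet], min_period: float = 1.0) -> int:
    """Count the leading packets needed to cover min_period seconds.

    The count is kept above 1 whenever at least two packets are available.
    """
    covered = 0.0
    for count, pkt in enumerate(packets, start=1):
        covered += pkt.duration
        if covered >= min_period and count > 1:
            return count
    return len(packets)  # fewer packets than needed: take what is left


def playback_period(batch: List[Packet]) -> float:
    """Total playback time covered by a batch of packets."""
    return sum(pkt.duration for pkt in batch)


# Hypothetical streams: audio packets of 0.25 s, video packets of 0.5 s.
audio = [Packet("audio", i * 0.25, 0.25) for i in range(8)]
video = [Packet("video", i * 0.5, 0.5) for i in range(4)]
n_audio = batch_size(audio)
n_video = batch_size(video)
deviation = abs(playback_period(audio[:n_audio]) - playback_period(video[:n_video]))
```

    In this sketch both batches cover the same playback period, so the deviation stays below any positive time difference threshold.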
  • the multimedia file includes description information, and before the first quantity of audio data packets are continuously loaded into the audio decoder, the method further includes: determining, according to the description information, whether the interleaving of the plurality of code stream packets is uniform.
  • the description information stores the relevant information of each code stream packet, that is, the relevant information of each audio data packet and each video data packet. According to the relevant information of the audio data packet and the video data packet, it can be determined whether the interleaving of the multiple code stream packets is uniform.
  • determining whether the interleaving of the multiple code stream packets is uniform according to the description information may include: counting, according to the description information, the amount of stored data between the audio data packets and the video data packets that have a corresponding display timestamp relationship among the multiple code stream packets; then counting the target quantity, that is, the number of those stored data amounts that are greater than or equal to a target distance threshold; and finally determining whether the interleaving of the multiple code stream packets is uniform according to the target quantity.
  • the target quantity is determined by taking every audio data packet and video data packet into account, so judging by the target quantity improves the accuracy of determining whether the interleaving of the multiple code stream packets is uniform and makes the result more credible.
  • the target distance threshold is the larger one of the first distance and the preset distance
  • the first distance is determined according to at least one of the width, height, storage ratio or compression ratio of the multimedia file carried in the description information
  • the multimedia file includes a video frame
  • the width and height of the multimedia file correspond to the width and height of the video frame.
  • the target distance threshold is also a parameter related to the multimedia file. Therefore, the target distance threshold can improve the accuracy of determining whether the interleaving of the multiple code stream packets in the multimedia file is uniform, so that the judgment result has higher reliability.
  • counting the amount of stored data may include: determining, according to the description information, the display time stamp and storage location of each code stream packet in a preset number of code stream packets, where the preset number of code stream packets are the first N code stream packets among the multiple code stream packets and N is a positive integer.
  • the interleaving of the first N code stream packets among the multiple code stream packets can be determined first, so that determining the interleaving does not take so long that it delays the start of playback.
  • determining whether the interleaving of the plurality of code stream packets is uniform according to the target quantity includes: calculating the ratio of the target quantity to the preset quantity; if the ratio is greater than or equal to a second preset threshold, determining that the interleaving of the multiple code stream packets is uneven; or, if the target quantity is greater than or equal to a third preset threshold, determining that the interleaving of the multiple code stream packets is uneven.
  • in the prior art, one audio data packet is loaded into the audio decoder each time, and one video data packet is loaded into the video decoder each time. When the ratio of the target quantity to the preset quantity is relatively large, that is, greater than or equal to the second preset threshold, or when the target quantity is greater than or equal to the third preset threshold, loading code stream packets into the corresponding decoders in the prior-art manner may cause frequent jumps, so the interleaving of the multiple code stream packets can be determined to be uneven.
  • the ratio obtained by calculation can improve the accuracy of judging that the interleaving of multiple code stream packets belongs to uneven interleaving, so that the judgment result of uneven interleaving has higher reliability.
  • when the multiple code stream packets are unevenly interleaved, more than 1 audio data packet is continuously loaded into the audio decoder each time and more than 1 video data packet is continuously loaded into the video decoder each time, which reduces the number of jumps needed to load code stream packets.
  • both the second preset threshold and the third preset threshold may be reference values set manually based on experience, or reference values obtained by training (or learning) on multiple historical values.
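  • the uniformity check described above can be sketched as follows. This is a minimal illustration under stated assumptions: packets with equal display timestamps are treated as having a "corresponding display timestamp relationship", the stored data amount is approximated by the difference of storage offsets, and the names `Entry` and `is_unevenly_interleaved` as well as the default ratio of 0.3 are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    kind: str  # "audio" or "video"
    pts: int   # display timestamp
    pos: int   # storage offset of the packet in the file, in bytes


def is_unevenly_interleaved(entries: List[Entry], preset_count: int,
                            distance_threshold: int,
                            ratio_threshold: float = 0.3,
                            count_threshold: Optional[int] = None) -> bool:
    """Judge interleaving from the first preset_count entries only."""
    head = entries[:preset_count]
    if not head:
        return False
    audio_pos = {e.pts: e.pos for e in head if e.kind == "audio"}
    video_pos = {e.pts: e.pos for e in head if e.kind == "video"}
    # Target quantity: matching audio/video pairs stored far apart.
    target_count = sum(
        1 for pts in audio_pos.keys() & video_pos.keys()
        if abs(audio_pos[pts] - video_pos[pts]) >= distance_threshold
    )
    if target_count / len(head) >= ratio_threshold:
        return True
    return count_threshold is not None and target_count >= count_threshold


# Hypothetical layouts: A1 V1 A2 V2 ... versus A1..A4 followed by V1..V4.
uniform = []
for i in range(4):
    uniform.append(Entry("audio", i + 1, i * 200))
    uniform.append(Entry("video", i + 1, i * 200 + 100))
uneven = [Entry("audio", i + 1, i * 100) for i in range(4)]
uneven += [Entry("video", i + 1, 5000 + i * 1000) for i in range(4)]

uniform_result = is_unevenly_interleaved(uniform, 8, distance_threshold=1000)
uneven_result = is_unevenly_interleaved(uneven, 8, distance_threshold=1000)
```

    Limiting the scan to the first N entries mirrors the point made above that the check should not delay the start of playback.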
  • the code stream packets further include subtitle data packets, and after parsing the multimedia file to obtain the description information and the multiple code stream packets, the method may further include: continuously loading a third quantity of subtitle data packets into the subtitle decoder, wherein the first quantity of audio data packets, the second quantity of video data packets and the third quantity of subtitle data packets are loaded alternately, and the third quantity is an integer greater than 1.
  • the third number of subtitle data packets corresponds to the third playback time period, and the deviation between the third playback time period and the first playback time period or the deviation from the second playback time period is smaller than the preset time difference threshold.
  • when the code stream packets also include subtitle data packets, more than 1 audio data packet, more than 1 video data packet and more than 1 subtitle data packet are loaded into the corresponding decoders each time. Compared with the prior-art method of loading 1 audio data packet, 1 video data packet and 1 subtitle data packet into the corresponding decoders at a time, this reduces the number of jumps needed to load code stream packets. While the audio data packets, video data packets and subtitle data packets remain synchronized, the speed of reading code stream packets is increased, which avoids playback freezes caused by an insufficient supply of data packets.
  • the method further includes: if it is determined according to the description information that the interleaving of the multiple code stream packets is uniform, loading the audio data packet with the smallest display time stamp into the audio decoder, and loading the video data packet with the smallest display time stamp into the video decoder.
  • the interleaving of the code stream packets can be judged dynamically in real time, and the loading method of the code stream packets can be adjusted dynamically accordingly. Among the packets not yet loaded, the audio data packet and the video data packet with the smallest display time stamps are loaded in order and played in the player.
  • the multimedia file includes description information, and continuously loading a first quantity of audio data packets into the audio decoder and continuously loading a second quantity of video data packets into the video decoder includes: if the information of the first code stream packet stored in the description information is information of an audio data packet, continuously loading the first quantity of audio data packets into the audio decoder, and then continuously loading the second quantity of video data packets into the video decoder.
  • the method of loading the data packet can be determined according to the description information of the multimedia file. If the first stream packet is an audio data packet, the audio data packet can be loaded to the audio decoder first. Determining the order of loading audio code stream packets and video code stream packets through the description information has better reliability.
  • the multimedia file includes description information, and continuously loading a first quantity of audio data packets into the audio decoder and continuously loading a second quantity of video data packets into the video decoder includes: if the information of the first code stream packet stored in the description information is information of a video data packet, continuously loading the second quantity of video data packets into the video decoder, and then continuously loading the first quantity of audio data packets into the audio decoder.
  • the method of loading the data packet can be determined according to the description information of the multimedia file. If the first stream packet is a video data packet, the video data packet can be loaded to the video decoder first. Determining the order of loading audio code stream packets and video code stream packets through the description information has better reliability.
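  • the ordering rule above, combined with alternating batch loading, can be sketched as follows. This is an illustrative sketch only: the function name `batch_schedule` is hypothetical, and packets are represented simply by their display timestamps.

```python
from typing import Iterator, List, Tuple


def batch_schedule(first_kind: str, audio: List[int], video: List[int],
                   n_audio: int, n_video: int) -> Iterator[Tuple[str, List[int]]]:
    """Yield (kind, batch) pairs alternately, starting with first_kind.

    first_kind is the type of the first code stream packet recorded in the
    description information; n_audio and n_video must both exceed 1.
    """
    if first_kind not in ("audio", "video"):
        raise ValueError("first_kind must be 'audio' or 'video'")
    if n_audio <= 1 or n_video <= 1:
        raise ValueError("batch quantities must be integers greater than 1")
    queues = {"audio": list(audio), "video": list(video)}
    sizes = {"audio": n_audio, "video": n_video}
    order = ["audio", "video"] if first_kind == "audio" else ["video", "audio"]
    while queues["audio"] or queues["video"]:
        for kind in order:
            batch = queues[kind][:sizes[kind]]
            queues[kind] = queues[kind][sizes[kind]:]
            if batch:
                yield kind, batch


# Hypothetical file whose first stored packet is a video packet.
schedule = list(batch_schedule("video", [1, 2, 3, 4], [1, 2, 3, 4], 2, 2))
```

    A third queue for subtitle data packets could be added to `queues`, `sizes` and `order` in the same way.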
  • a second aspect of the embodiments of the present application provides a device for playing multimedia files, and the device may include:
  • the first loading unit is used to load the multimedia file into the memory of the player
  • a parsing unit used for parsing the multimedia file to obtain a plurality of code stream packets, the code stream packets including audio data packets and video data packets;
  • a second loading unit configured to continuously load a first number of audio data packets into the audio decoder, and continuously load a second number of video data packets into the video decoder, wherein the first number is an integer greater than 1, The second number is an integer greater than one.
  • the first number of video data packets corresponds to the first playback period
  • the second number of audio data packets corresponds to the second playback period
  • the first playback period and the second The deviation between the playback time periods is less than the preset time difference threshold
  • the multimedia file may include description information
  • the above apparatus further includes a determination unit, configured to determine whether the interleaving of multiple code stream packets is uniform according to the description information.
  • the determining unit is specifically configured to: count, according to the description information, the amount of stored data between the audio data packets and the video data packets that have a corresponding display timestamp relationship among the plurality of code stream packets; count the target quantity, that is, the number of stored data amounts greater than or equal to the target distance threshold; and determine whether the interleaving of the multiple code stream packets is uniform according to the target quantity.
  • the target distance threshold is the larger one of the first distance and the preset distance
  • the first distance is determined according to at least one of the width, height, storage ratio or compression ratio of the multimedia file carried in the description information
  • the multimedia file includes a video frame
  • the width and height of the multimedia file correspond to the width and height of the video frame.
  • the determining unit is specifically configured to: determine, according to the description information, the display time stamp and storage location of each code stream packet in a preset number of code stream packets, where the preset number of code stream packets are the first N code stream packets among the multiple code stream packets and N is a positive integer; determine, in order of increasing display time stamp, the respective storage locations of the video data packets and the audio data packets that have a corresponding display timestamp relationship among the preset number of code stream packets; and determine, from the storage locations of the video data packets and the storage locations of the audio data packets, the amount of stored data between the video data packets and the audio data packets that have a corresponding display timestamp relationship.
  • the determining unit is specifically configured to: calculate the ratio of the target quantity to the preset quantity; if the ratio is greater than or equal to the second preset threshold, determine that the interleaving of the multiple code stream packets is uneven; or, if the target quantity is greater than or equal to the third preset threshold, determine that the interleaving of the multiple code stream packets is uneven.
  • the code stream package further includes a subtitle data package
  • the loading unit is further configured to: continuously load a third quantity of subtitle data packets into the subtitle decoder, where the first quantity of audio data packets, the second quantity of video data packets and the third quantity of subtitle data packets are loaded alternately; the third quantity is an integer greater than 1, the third quantity of subtitle data packets corresponds to a third playback time period, and the deviation between the third playback time period and the first playback time period, or between the third playback time period and the second playback time period, is smaller than the preset time difference threshold.
  • the loading unit is further configured to: if it is determined according to the description information that the interleaving of the multiple code stream packets is uniform, load the audio data packet with the smallest display time stamp into the audio decoder, and load the video data packet with the smallest display time stamp into the video decoder.
  • a third aspect of the embodiments of the present application provides an electronic device. The electronic device includes at least one processor and a transmission interface, and the at least one processor receives or sends signals through the transmission interface; the at least one processor is used to call a computer program stored in a memory, to cause the electronic device to perform the method described in the first aspect or any one of the possible implementations of the first aspect.
  • a fourth aspect of the embodiments of the present application discloses a computer-readable storage medium, where program instructions are stored in the computer-readable storage medium, and when the program instructions are run on a computer or a processor, the method described in the first aspect or any one of the possible implementations of the first aspect is executed.
  • a fifth aspect of the embodiments of the present application discloses a computer program product.
  • the computer program product includes program instructions. When the program instructions are run on a computer or a processor, the method described in the first aspect or any possible implementation of the first aspect is executed.
  • FIG. 1A is a schematic diagram of a method for playing a uniformly interleaved multimedia file provided by an embodiment of the present application
  • FIG. 1B is a schematic diagram of a method for playing a multimedia file with uneven interleaving provided by an embodiment of the present application
  • FIG. 1C is a schematic diagram of a method for playing a multimedia file with uneven interleaving provided by the prior art
  • FIG. 2A is a schematic diagram of a playback environment of a multimedia file provided by an embodiment of the present application.
  • FIG. 2B is a schematic diagram of another multimedia file playback environment provided by an embodiment of the present application.
  • FIG. 3A is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 3B is a software structural block diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of software module interaction of an electronic device provided by an embodiment of the present application.
  • FIG. 7 is a schematic flowchart of a method for judging whether the interleaving of multimedia files is uniform according to an embodiment of the present application
  • FIG. 8A is a schematic flowchart of a playback method that loads code stream packets in batches, provided by an embodiment of the present application
  • FIG. 8B is a schematic diagram of a playback method that loads code stream packets in batches, provided by an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of an apparatus for playing a multimedia file provided by an embodiment of the present application.
  • the compressed audio data packets, video data packets and/or subtitle data packets are stored in the multimedia container, and the container format is also called the encapsulation format.
  • Common encapsulation formats include one or more of the following: MPEG-4 Part 14 (MP4), Audio Video Interleaved (AVI), MPEG-2 Transport Stream (MPEG2-TS, TS), and so on.
  • different container formats store audio data packets, video data packets and/or subtitle data packets in different ways, and are respectively applied in different fields.
  • TS is a stream encapsulation form, commonly used in broadcast TV and streaming media protocols
  • MP4 is a frame encapsulation form, commonly used in the field of local video and network video.
  • the description information can also be called an information block.
  • the description information describes the code stream packets (for example, multiple video data packets, multiple audio data packets and/or multiple subtitle data packets) contained in the multimedia file, and may include one or more of the following: the file identifier; the playback duration of the multimedia file; the video width, height, frame rate, bit rate and resolution; and the audio sampling rate and number of channels.
  • the description information also includes a storage information table, which describes, for each video data packet, audio data packet and/or subtitle data packet, the storage location (Position, POS), the length of the packet, the display timestamp (Presentation Time Stamp, PTS) of the packet, and so on.
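  • a storage information table of this kind can be modelled as follows. This is a minimal sketch: the class name `StorageEntry`, the helper `build_index` and the sample offsets are hypothetical, and real container formats encode this information in their own structures.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class StorageEntry:
    kind: str    # "audio", "video" or "subtitle"
    pos: int     # storage location (POS): byte offset in the file
    length: int  # length of the packet, in bytes
    pts: int     # display timestamp (PTS)


def build_index(table: List[StorageEntry]) -> Dict[Tuple[str, int], StorageEntry]:
    """Map (kind, pts) to its entry so a player can locate a packet directly."""
    return {(entry.kind, entry.pts): entry for entry in table}


# Hypothetical table for two audio/video pairs stored back to back.
table = [
    StorageEntry("audio", 1024, 512, 1),
    StorageEntry("video", 1536, 4096, 1),
    StorageEntry("audio", 5632, 512, 2),
    StorageEntry("video", 6144, 4096, 2),
]
index = build_index(table)
video_1 = index[("video", 1)]
```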
  • the description information is usually located at the head or tail of the multimedia file.
  • the file identifier may be "Chinese subtitle”, “Chinese audio”, “English audio” and so on.
  • the code stream refers to the amount of data per unit time after the video data is encoded and compressed. Generally speaking, at the same resolution, the larger the code stream of the video data, the smaller the compression ratio and the higher the picture quality.
  • Video is composed of consecutive images, each image is called a frame (Frame), and the image is composed of pixels (Pixel).
  • the number of pixels in an image is called the resolution of the image.
  • an image of 1920*1080 is composed of 1920 pixels horizontally by 1080 pixels vertically. The resolution of a video is therefore the resolution of each of its frames.
  • a frame is a still picture, and continuous frames form an animation, such as a movie.
  • the frame rate usually refers to the number of frames of pictures transmitted per second, usually expressed in frames per second (Frames Per Second, FPS).
  • Each frame is a still image, and displaying frames in rapid succession creates the illusion of motion, restoring the state of the object at that time.
  • Higher frame rates result in smoother, more realistic animations.
  • the more frames per second (FPS) the smoother the displayed motion will be.
  • Bit rate refers to the number of bits (bits) transmitted per second.
  • the unit is bps (bits per second). The higher the bit rate, the more data is transmitted.
  • in other words, the bit rate indicates how many bits are needed to represent one second of encoded (compressed) audio and video data; a bit is the smallest unit in binary, either 0 or 1.
  • the relationship between bit rate and audio and video compression is simply that the higher the bit rate, the better the quality of audio and video, but the larger the encoded file; if the bit rate is smaller, the situation is just the opposite.
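  • the relationship between bit rate and encoded size can be checked with simple arithmetic. This sketch assumes a constant bit rate and ignores container overhead; the function name and the sample figures are hypothetical.

```python
def encoded_size_bytes(bit_rate_bps: int, duration_s: int) -> int:
    """Approximate encoded size: bits per second times seconds, divided by 8."""
    return bit_rate_bps * duration_s // 8


# One minute of video at 8 Mbps versus 2 Mbps.
size_high = encoded_size_bytes(8_000_000, 60)
size_low = encoded_size_bytes(2_000_000, 60)
```

    The higher bit rate yields a file four times as large for the same duration, which is the trade-off described above.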
  • Sampling rate (also called sampling speed or sampling frequency) defines the number of times that audio data is taken per second, and is expressed in Hertz (Hz).
  • the sampling rate refers to the sampling frequency when converting an analog signal into a digital signal, that is, how many points are sampled per unit time. A related quantity is how many bits the data of one sample point contains.
  • the number of channels is the number of sound channels.
  • the sound is played by the speaker after the audio data is decoded.
  • the channels are often divided into monophonic and stereophonic.
  • Decapsulation is to split the multimedia file according to the corresponding encapsulation format, and split the audio data packet, video data packet and/or subtitle data packet in the multimedia file.
  • the parameters of the multimedia file can be obtained through decapsulation, such as encoding format, file size, playback duration, resolution, audio sampling rate, number of channels, and so on.
  • FIG. 1A is a schematic diagram of a method for playing a uniformly interleaved multimedia file provided by an embodiment of the present application.
  • the multimedia file 100A includes description information and a plurality of code stream packets. The code stream packets include audio data packets and video data packets, and the description information carries, for each code stream packet, attribute information (indicating whether the packet is an audio data packet or a video data packet), the display time stamp, the storage location, the memory size and other description information.
  • the audio data packets and the video data packets of the multimedia file 100A are evenly interleaved and stored in sequence according to the adjacent display time stamps.
  • the audio data packet with the display time stamp "1" and the video data packet with the display time stamp "1" are stored adjacent to each other, and the video data packet with the display time stamp "1" and the audio data packet with the display time stamp "2" are stored adjacent to each other.
  • it should be noted that a display time stamp of "1" can indicate that the corresponding code stream packet is displayed in the 1st position after decoding, and a display time stamp of "2" can indicate that the corresponding code stream packet is displayed in the 2nd position after decoding, that is, after the packet whose display time stamp is "1".
  • a display time stamp of "1" can also indicate that the corresponding code stream packet is displayed at a preset time point after decoding; the preset time point can be determined according to actual needs, which is not limited in the embodiments of this application.
  • the electronic device loads the multimedia file 100A into the memory of the player 200, then parses the multimedia file 100A to obtain the description information and a plurality of code stream packets, then loads the audio data packets into the audio decoder and the video data packets into the video decoder in sequence, according to the information about the plurality of code stream packets contained in the description information, and then synchronizes the decoded audio data packets and video data packets and sends them to the speaker and the display respectively, achieving playback in which sound and lip movement are synchronized.
  • the electronic device loads, in storage order starting from the audio data packet with a display time stamp of "1", the audio data packet with the display time stamp "1", the video data packet with the display time stamp "1", the audio data packet with the display time stamp "2", and the video data packet with the display time stamp "2" into the memory of the player 200.
  • the electronic device loads the audio data packet with the display time stamp "1" into the audio decoder, then loads the video data packet with the display time stamp "1" into the video decoder, and then, in sequence, loads the audio data packet with the display time stamp "2" into the audio decoder and the video data packet with the display time stamp "2" into the video decoder.
  • each stream packet has a display timestamp
  • the display timestamp indicates when the stream packet is played after decoding, that is, a packet with a smaller display timestamp is played earlier and a packet with a larger display timestamp is played later.
  • the stream packets with the corresponding display timestamps need to be played at the same time after being decoded.
  • an audio data packet with a display time stamp of "1” and a video data packet with a display time stamp of "1" are stream packets with a corresponding display time stamp relationship. Therefore, when loading the code stream package, the player 200 can sequentially load the code stream package into the decoder according to the increasing order of the display time stamp, perform decoding and playback.
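  • the one-packet-at-a-time loading order for a uniformly interleaved file can be sketched as follows. This is an illustrative sketch: packets are represented as hypothetical `(kind, pts)` tuples, and placing audio before video at equal timestamps follows the order described for FIG. 1A.

```python
from typing import List, Tuple

Packet = Tuple[str, int]  # (kind, display timestamp)


def load_one_by_one(packets: List[Packet]) -> List[Packet]:
    """Order packets by increasing display timestamp, audio first on ties."""
    return sorted(packets, key=lambda p: (p[1], 0 if p[0] == "audio" else 1))


# Packets as found in memory, in arbitrary order.
in_memory = [("video", 2), ("audio", 1), ("video", 1), ("audio", 2)]
dispatch_order = load_one_by_one(in_memory)
```

    Each packet in `dispatch_order` would then be handed to the decoder that matches its kind.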
  • FIG. 1B is a schematic diagram of a method for playing a multimedia file with uneven interleaving provided by an embodiment of the present application.
  • the audio data packets and video data packets of the multimedia file 100B are not evenly interleaved and stored in order of display time stamp.
  • to keep the playback of the audio data packets and the video data packets consistent, it is necessary to repeatedly jump within the multimedia file 100B to load the corresponding audio data packets and video data packets.
  • after the decoded audio data packets and video data packets are synchronized, they are sent to the speaker and the display respectively, so that playback in which sound and lip movement are synchronized can be realized.
  • starting from the audio data packet with the display timestamp "1", the electronic device loads the first code stream block in storage order (including the description information of the multimedia file and the audio data packets with display timestamps "1", "2", "3", "4", and "5") into the memory of the player 200.
  • the electronic device can first load the audio data packet with the display timestamp "1" into the audio decoder according to the order of the display timestamps; then, because audio data packets and video data packets need to be played synchronously, the video data packet with the display timestamp "1" needs to be sent to the video decoder.
  • the electronic device finds that the first stream block loaded into the memory of the player 200 does not contain a video data packet with a display timestamp of "1", so it needs to jump to the storage location of the video data packet with the display timestamp "1" in the multimedia file 100B and load the second code stream block (including one or more code stream packets) into the memory.
  • the player 200 loads the second stream block in sequence, starting from the video data packet with the display timestamp "1" (including the video data packet with the display timestamp "1" of the multimedia file 100B, the audio data packets with display timestamps "10", "11", and "12", and the video data packet with the display timestamp "2"), into the memory. It can be understood that the audio data packets with display timestamps "2" to "5" previously loaded into the memory will be deleted from the memory by the player 200 because they were not used. Then, the electronic device loads the video data packet whose display timestamp is "1" into the video decoder of the player 200.
  • the electronic device next needs to load the audio data packet with the display timestamp "2" into the audio decoder of the player 200, but the second stream block loaded into the memory does not contain an audio data packet with the display timestamp "2". Therefore, the electronic device needs to jump to the storage location of the audio data packet with the display timestamp "2" in the multimedia file 100B and load the third stream block (including one or more stream packets) into the memory. It can be understood that the audio data packets with display timestamps "10" to "12" and the video data packet with the display timestamp "2" previously loaded into the memory will be deleted from the memory by the player 200 because they were not used.
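  • The cost of this repeated jumping can be made concrete with a small simulation. It is only a sketch under simplifying assumptions that are not stated in the application: a block always holds a fixed number of packets (5 here), the memory holds exactly one block at a time, and the storage layout of the file is reduced to the packets named in the walkthrough above.

```python
def count_block_loads(layout, play_order, block_size):
    """Count how many block loads (jumps) are needed to play packets in order.

    layout: packets in storage order, e.g. ("A", 1) for audio with timestamp 1.
    play_order: packets in the order the decoders need them.
    """
    index_of = {pkt: i for i, pkt in enumerate(layout)}
    window = range(0)  # nothing is loaded into memory yet
    loads = 0
    for pkt in play_order:
        i = index_of[pkt]
        if i not in window:
            # The needed packet is not in memory: jump to its storage
            # location and load a new block, discarding the old one.
            window = range(i, min(i + block_size, len(layout)))
            loads += 1
    return loads


# Hypothetical layout modelled on the walkthrough: A1..A9, V1, A10..A12, V2.
uneven = ([("A", n) for n in range(1, 10)] + [("V", 1)]
          + [("A", n) for n in range(10, 13)] + [("V", 2)])
even = [("A", 1), ("V", 1), ("A", 2), ("V", 2)]
play = [("A", 1), ("V", 1), ("A", 2), ("V", 2)]

uneven_loads = count_block_loads(uneven, play, block_size=5)
even_loads = count_block_loads(even, play, block_size=5)
```

  • Under these assumptions, playing the first four packets of the unevenly interleaved layout takes one block load per packet, whereas the evenly interleaved layout needs a single load.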
  • FIG. 1C is a schematic diagram of a method for playing a multimedia file with uneven interleaving provided in the prior art.
  • the player 200 can determine a first file start position of the audio data packets of the target multimedia file and a second file start position of the video data packets. When the amount of data that can be stored between the first file start position and the second file start position is greater than or equal to a preset threshold, the player 200 can read the audio data packets and the video data packets separately based on dual input/output (I/O), then sort the read audio data packets and video data packets according to the display timestamp or decoding timestamp (it should be noted that for online multimedia files, the decoding timestamp is equal to the display timestamp), and play the target multimedia file based on the sorted audio data packets and video data packets.
  • the amount of storable data is determined based on the storage location of the audio data packet with the display time stamp "1" and the storage location of the video data packet with the display time stamp "1".
  • if the amount of data exceeds the threshold, it means that the code stream packets of the multimedia file 100C are unevenly interleaved.
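  • A minimal sketch of this check follows. The packet sizes, the threshold value, and all function names are hypothetical; the comparison uses "greater than or equal to", as in the description of the preset threshold above.

```python
def storage_offsets(layout):
    """Map (kind, pts) to the byte offset at which that packet is stored.

    layout: list of (kind, pts, size_in_bytes) in storage order.
    """
    offsets, position = {}, 0
    for kind, pts, size in layout:
        offsets[(kind, pts)] = position
        position += size
    return offsets


def is_unevenly_interleaved(layout, threshold_bytes, first_pts=1):
    """Compare the data stored between the first audio and first video packet."""
    offsets = storage_offsets(layout)
    gap = abs(offsets[("video", first_pts)] - offsets[("audio", first_pts)])
    return gap >= threshold_bytes


# Hypothetical file: nine 100-byte audio packets stored before the first video packet.
file_100c = [("audio", n, 100) for n in range(1, 10)] + [("video", 1, 400)]
uneven_flag = is_unevenly_interleaved(file_100c, threshold_bytes=500)
```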
  • the player 200 sequentially loads the first stream block (including the audio data packets with display timestamps "1" to "9", the video data packet with the display timestamp "1", and the audio data packet with the display timestamp "10"), then deletes the video data packet with the display timestamp "1" so that only the audio data packets are kept in the first I/O, and then sorts the retained audio data packets according to the display timestamp.
  • the player 200 sequentially loads the second stream block (including the video data packet with the display timestamp "1", the audio data packets with display timestamps "10" to "12", the video data packet with the display timestamp "2", the audio data packets with display timestamps "13" to "15", and the video data packets with display timestamps "3" to "6"), then deletes the audio data packets with display timestamps "10" to "12" and "13" to "15" so that only the video data packets are kept in the second I/O, and then sorts the retained video data packets according to the display timestamp.
  • loading multimedia files with dual I/O channels may double the bandwidth consumption, resulting in lower download efficiency.
  • Sorting audio data packets and video data packets based on the display time stamp will take time and reduce playback efficiency.
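  • The two drawbacks can be seen in a simplified model of the dual I/O scheme. The sketch assumes, as a worst case not stated in the source, that each I/O channel transfers the same span of the file and discards the packets of the other type; packet sizes and names are hypothetical.

```python
def dual_io_read(layout):
    """Model two I/O channels that each scan the file and keep one packet type.

    layout: list of (kind, pts, size_in_bytes) in storage order.
    Returns the retained packets per type and the total bytes transferred.
    """
    bytes_read = 0
    kept = {"audio": [], "video": []}
    for channel_kind in ("audio", "video"):   # one pass per I/O channel
        for kind, pts, size in layout:
            bytes_read += size                # every packet is transferred
            if kind == channel_kind:          # packets of the other type are dropped
                kept[kind].append((pts, size))
        kept[channel_kind].sort()             # extra sorting step by display timestamp
    return kept, bytes_read


layout = [
    ("audio", 1, 100), ("audio", 2, 100), ("video", 1, 400),
    ("audio", 3, 100), ("video", 2, 400),
]
kept, bytes_read = dual_io_read(layout)
file_size = sum(size for _, _, size in layout)
```

  • In this model the bytes transferred are twice the file size, and each channel still has to sort what it kept before playback can start.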
  • multimedia file 100A, multimedia file 100B and multimedia file 100C may be files downloaded from the network or files loaded locally.
  • FIG. 2A is a schematic diagram of a playback environment of a multimedia file provided by an embodiment of the present application.
  • the playback environment 001 may include a first electronic device 200A and a server 201 .
  • a communication connection relationship may be established between the first electronic device 200A and the server 201 to perform information transmission.
  • the communication between the first electronic device 200A and the server 201 may be based on any wired network and/or wireless network, including but not limited to the Internet, a wide area network, a metropolitan area network, a virtual private network, a wireless communication network, and the like.
  • the first electronic device 200A is installed with application software for playing multimedia files, such as a player provided by some service providers and the like.
  • the first electronic device 200A may include, but is not limited to, terminal devices such as smart phones, desktop computers, tablet computers, notebook computers, digital assistants, smart wearable devices, and the like.
  • the server 201 may be an independently running server, or a distributed server, or a server cluster composed of multiple servers.
  • the server 201 may store multimedia files to be played, and further, the server 201 may be a background server to which the multimedia files belong. For example, if the first electronic device 200A plays a multimedia file provided by service provider A, the server 201 may be a background server of service provider A.
  • When the user needs to play a multimedia file on the first electronic device 200A, the first electronic device 200A sends a request to the server 201, and the server 201 sends the multimedia file to the first electronic device 200A; that is, the first electronic device 200A downloads the multimedia file from the server 201.
  • the first electronic device 200A loads the multimedia file into the memory of the player, and then parses the multimedia file to obtain multiple stream packets, wherein the stream packets include audio data packets and video data packets.
  • the first electronic device 200A may continuously load the first number of audio data packets into the audio decoder, and continuously load the second number of video data packets into the video decoder.
  • the first number is an integer greater than 1
  • the second number is an integer greater than 1.
  • the first number of video data packets corresponds to the first playback time period
  • the second number of the audio data packets corresponds to the second playback time period
  • the deviation between the first playback time period and the second playback time period is less than a preset time difference threshold. That is, the first number of video data packets and the second number of audio data packets can be played synchronously for a period of time after being decoded.
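  • The condition on the two batches can be sketched as follows. The per-packet durations, the threshold value, and the function name are assumptions made for illustration; the parameters are named by packet type rather than "first" and "second" to stay neutral on which count belongs to which stream.

```python
def batches_are_synchronizable(audio_durations_ms, video_durations_ms,
                               n_audio, n_video, max_deviation_ms):
    """Check whether n_audio audio packets and n_video video packets cover
    playback time periods whose deviation is below the preset threshold."""
    if n_audio <= 1 or n_video <= 1:
        raise ValueError("both batch sizes must be integers greater than 1")
    audio_period = sum(audio_durations_ms[:n_audio])
    video_period = sum(video_durations_ms[:n_video])
    return abs(audio_period - video_period) < max_deviation_ms


# Hypothetical streams: 20 ms per audio packet, 40 ms per video packet.
audio = [20] * 10
video = [40] * 5
ok = batches_are_synchronizable(audio, video, n_audio=10, n_video=5, max_deviation_ms=10)
too_far = batches_are_synchronizable(audio, video, n_audio=10, n_video=2, max_deviation_ms=10)
```

  • With these values, ten audio packets and five video packets both cover 200 ms, so they can be loaded as consecutive batches and still be played in sync.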
  • FIG. 2B is a schematic diagram of another multimedia file playback environment provided by an embodiment of the present application.
  • the playback environment 002 may include a first electronic device 200A, a second electronic device 200B, and a router 202 .
  • the first electronic device 200A can be connected to the router 202, and the second electronic device 200B can also be connected to the router 202.
  • the router 202 can ensure that the first electronic device 200A and the second electronic device 200B are in the same local area network.
  • the first electronic device 200A is installed with application software for recording and/or playing multimedia files, such as short video shooting software provided by some service providers.
  • the first electronic device may be a terminal device supporting Digital Living Network Alliance (DLNA), such as a smart phone, a desktop computer, a tablet computer, a notebook computer, a digital assistant, a smart wearable device, and the like.
  • the second electronic device 200B may be a device that has a playback function and also supports DLNA, such as a smart TV, a desktop computer, a notebook computer, and the like.
  • affected by differences between devices or application software, the storage locations of the audio data packets and the video data packets in the recorded or processed multimedia file may vary.
  • When the multimedia file recorded by the first electronic device 200A is played on the second electronic device 200B through DLNA, after the second electronic device 200B loads the multimedia file into the memory of the player, it can parse the multimedia file to obtain multiple code stream packets, wherein the code stream packets include audio data packets and video data packets.
  • the second electronic device 200B can continuously load the first number of the audio data packets into the audio decoder, and continuously load the second number of the video data packets into the video decoder, wherein the first number is an integer greater than 1, and the second number is an integer greater than 1.
  • the first number of video data packets corresponds to the first playback time period
  • the second number of the audio data packets corresponds to the second playback time period
  • the deviation between the first playback time period and the second playback time period is less than a preset time difference threshold. That is, the first number of video data packets and the second number of audio data packets can be played synchronously for a period of time after being decoded.
  • FIG. 3A is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • the electronic device 300 shown in FIG. 3A may specifically be a schematic structural diagram of the first electronic device 200A in FIG. 2A, or a schematic structural diagram of the second electronic device 200B in FIG. 2B.
  • the electronic device 300 may include a processor 110, a memory 120, a sensor module 130, a display device 140, a mobile communication module 150, a wireless communication module 160, an audio module 170, a camera 180, an input device 190, and the like.
  • the structures illustrated in the embodiments of the present application do not constitute a specific limitation on the electronic device 300 .
  • the electronic device 300 may include more or fewer components than shown, or combine some components, or separate some components, or arrange the components differently.
  • the illustrated components may be implemented in hardware, software, or a combination of software and hardware.
  • the processor 110 may include one or more processing units; for example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc. Different processing units may be independent devices, or may be integrated in one or more processors.
  • the memory 120 stores computer programs, and the computer programs include operating system programs, application programs, and the like, wherein the application programs include browser programs.
  • the processor 110 is configured to read the computer program in the memory 120 and then execute the method defined by the computer program. For example, the processor 110 reads the operating system program to run the operating system on the electronic device 300 and implement various functions of the operating system, or reads one or more application programs to run the applications on the electronic device 300, for example, reads a browser program to run a browser.
  • the memory 120 also stores data other than the computer program. This data may include data generated after the operating system or the application program is executed, and includes system data (such as configuration parameters of the operating system) and user data. For example, payment information for business products can be regarded as user data.
  • Memory 120 generally includes internal memory and external memory.
  • the internal memory can store executable program code of the computer, and can be random access memory (RAM), read only memory (ROM), or cache memory (CACHE).
  • the processor 110 executes various functional applications and data processing of the electronic device 300 by executing the instructions stored in the internal memory.
  • the internal memory may include a program storage area and a data storage area.
  • the program storage area can store an operating system, an application program required for at least one function (such as a sound playback function, an image playback function, etc.), and the like.
  • the data storage area can store data (such as audio data packets, video data packets, subtitle data packets, etc.) created during the use of the electronic device, and the like.
  • the internal memory may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, universal flash storage (UFS), and the like.
  • the external memory can be used to connect an external memory card, such as a hard disk, an optical disk, a USB disk, a floppy disk, or a tape drive, etc., so as to expand the storage capacity of the electronic device 300 .
  • the external memory card communicates with the processor 110 through the external memory interface to realize the data storage function. For example, files such as music and videos can be saved in the external memory card.
  • the sensor module 130 includes a pressure sensor, a fingerprint sensor, a touch sensor, and the like.
  • the pressure sensor is used to sense the pressure signal and can convert the pressure signal into an electrical signal.
  • the pressure sensor may be provided on the display screen.
  • the fingerprint sensor is used to collect fingerprints, and the electronic device can use the collected fingerprint features to implement fingerprint unlocking, access application locks, take photos, answer calls, and make payments with fingerprints.
  • a touch sensor is used to detect touch operations on or near it.
  • the display device 140 is used to display images, videos, and the like. It includes a display screen for displaying information input by the user or information provided to the user, various menu interfaces of the electronic device 300, and the like. In the embodiment of the present application, the electronic device displays the video data packets in the multimedia file through the display screen.
  • the display screen of the display device 140 can be a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or an active-matrix organic light-emitting diode (AMOLED) display.
  • the electronic device 300 may include 1 or N display screens, where N is a positive integer greater than 1.
  • the wireless communication function of the electronic device 300 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modulation and demodulation processor, the baseband processor, and the like.
  • Antenna 1 and Antenna 2 are used to transmit and receive electromagnetic wave signals.
  • Each antenna in electronic device 300 may be used to cover a single or multiple communication frequency bands. Different antennas can also be reused to improve antenna utilization.
  • the antenna 1 can be multiplexed as a diversity antenna of the wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
  • the mobile communication module 150 may provide a wireless communication solution including 2G/3G/4G/5G, etc. applied on the electronic device 300 .
  • the mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (LNA) and the like.
  • the mobile communication module 150 can receive electromagnetic waves from the antenna 1, filter and amplify the received electromagnetic waves, and transmit them to the modulation and demodulation processor for demodulation.
  • the mobile communication module 150 can also amplify the signal modulated by the modulation and demodulation processor, and then turn it into an electromagnetic wave for radiation through the antenna 1 .
  • at least part of the functional modules of the mobile communication module 150 may be provided in the processor 110 .
  • At least part of the functional modules of the mobile communication module 150 may be provided in the same device as at least part of the modules of the processor 110 .
  • the electronic device 300 can directly establish a communication connection with the server 201 through the mobile communication module 150, receive instructions and data (such as multimedia files) transmitted by the server 201, and also transmit instructions and data to the cloud server.
  • the wireless communication module 160 can provide wireless communication solutions applied on the electronic device 300, including wireless local area networks (WLAN) (such as wireless fidelity (Wi-Fi) networks), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR) technology, and the like.
  • the wireless communication module 160 may be one or more devices integrating at least one communication processing module.
  • the wireless communication module 160 receives electromagnetic waves via the antenna 2 , frequency modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110 .
  • the wireless communication module 160 can also receive the signal to be sent from the processor 110 , perform frequency modulation on it, amplify it, and convert it into electromagnetic waves for radiation through the antenna 2 .
  • the antenna 1 of the electronic device 300 is coupled with the mobile communication module 150, and the antenna 2 is coupled with the wireless communication module 160, so that the electronic device 300 can communicate with the network and other devices through wireless communication technology.
  • the wireless communication technologies may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology, etc.
  • the GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the Beidou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or satellite based augmentation systems (SBAS).
  • the electronic device may implement audio functions through the audio module 170 and the processor 110 and the like. For example, audio playback, recording, etc.
  • the audio module 170 is used for converting digital audio information into analog audio signal output, and also for converting analog audio input into digital audio signal. Audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be provided in the processor 110 , or some functional modules of the audio module 170 may be provided in the processor 110 .
  • the electronic device may also implement a shooting function through an ISP, a camera, a video codec, a GPU, a display device 140, a processor 110, and the like.
  • the ISP is used to process the data fed back by the camera 180 .
  • the shutter is opened, the light is transmitted to the camera photosensitive element through the lens, the light signal is converted into an electrical signal, and the camera photosensitive element transmits the electrical signal to the ISP for processing, and converts it into an image visible to the naked eye.
  • ISP can also perform algorithm optimization on image noise, brightness, and skin tone. ISP can also optimize the exposure, color temperature and other parameters of the shooting scene.
  • the ISP may be provided in the camera 180 .
  • the camera 180 is used to capture still images or video.
  • an optical image of the object is generated through the lens and projected onto the photosensitive element.
  • the photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the optical signal into an electrical signal, and then transmits the electrical signal to the ISP to convert it into a digital image signal.
  • the ISP outputs the digital image signal to the DSP for processing.
  • DSP converts digital image signals into standard RGB, YUV and other formats of image signals.
  • the electronic device 300 may include 1 or N cameras 180 , where N is a positive integer greater than 1.
  • a digital signal processor is used to process digital signals, in addition to processing digital image signals, it can also process other digital signals. For example, when the electronic device 300 selects a frequency point, the digital signal processor is used to perform Fourier transform on the frequency point energy, and the like.
  • Video codecs are used to compress or decompress digital video.
  • the electronic device 300 may support one or more video codecs.
  • the electronic device 300 can play or record videos in various encoding formats, such as Moving Picture Experts Group (MPEG) 1, MPEG2, MPEG3, MPEG4, and so on.
  • the input device 190 is used for receiving input digital information, character information or contact touch operation/non-contact gesture, and generating signal input related to user settings and function control of the electronic device 300 .
  • FIG. 3B is a software structural block diagram of an electronic device provided by an embodiment of the present application.
  • the software structural block diagram of the electronic device shown in FIG. 3B may specifically be the software structural block diagram of the first electronic device 200A shown in FIG. 2A , or the software structural block diagram of the second electronic device 200B shown in FIG. 2B .
  • The software systems of electronic devices include, but are not limited to, Linux and other operating systems, for example Huawei's Hongmeng system.
  • the embodiments of the present application take an Android system with a layered architecture as an example to exemplarily describe the software structure of an electronic device.
  • the layered architecture divides the software into several layers, and each layer has a clear role and division of labor. Layers communicate with each other through software interfaces.
  • the Android system is divided into four layers, which are, from top to bottom, an application layer, an application framework layer, a system runtime layer, and a kernel layer.
  • the application layer can include a series of application packages.
  • the application package may include applications such as camera, gallery, calendar, call, map, navigation, WLAN, Bluetooth, music, video, etc.
  • the application framework layer provides an application programming interface (API) and a programming framework for applications in the application layer.
  • the application framework layer includes some predefined functions. As shown in Figure 3B, the application framework layer may include a window manager, a content provider, a view system, a telephony manager, a resource manager, a notification manager, and the like.
  • a window manager is used to manage window programs.
  • the window manager can get the size of the display screen, determine whether there is a status bar, lock the screen, take screenshots, etc.
  • Content providers are used to store and retrieve data and make these data accessible to applications.
  • the data may include video, images, audio, calls made and received, browsing history and bookmarks, phone book, etc.
  • the view system includes visual controls, such as controls for displaying text, controls for displaying pictures, and so on. View systems can be used to build applications.
  • a display interface can consist of one or more views.
  • the display interface including the short message notification icon may include a view for displaying text and a view for displaying pictures.
  • the telephony manager is used to provide the communication function of the electronic device, for example, the management of call status (including connecting, hanging up, etc.).
  • the resource manager provides various resources for the application, such as localization strings, icons, pictures, layout files, video files and so on.
  • the notification manager enables applications to display notification information in the status bar, which can be used to convey notification-type messages, and can disappear automatically after a brief pause without user interaction. For example, the notification manager is used to notify download completion, message reminders, etc.
  • the notification manager can also display notifications in the status bar at the top of the system in the form of graphs or scroll bar text, such as notifications of applications running in the background, and notifications on the screen in the form of dialog windows. For example, text information is prompted in the status bar, a prompt sound is issued, the electronic device vibrates, and the indicator light flashes.
  • the system runtime layer includes system libraries and Android runtime.
  • the system library is the support of the application framework; the Android runtime is responsible for the scheduling and management of the Android system, and is divided into two parts: the core library and the virtual machine.
  • the core library consists of two parts: one is the functions that the Java language needs to call, and the other is the core library of Android.
  • the application layer and the application framework layer run in virtual machines.
  • the virtual machine executes the Java files of the application layer and the application framework layer as binary files.
  • the virtual machine is used to perform functions such as object lifecycle management, stack management, thread management, safety and exception management, and garbage collection.
  • a system library can include multiple functional modules, for example: a surface manager, media libraries, a 3D graphics processing library (e.g., OpenGL ES), a 2D graphics engine (e.g., SGL), etc.
  • the Surface Manager is used to manage the display subsystem and provides a fusion of 2D and 3D layers for multiple applications.
  • the media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files.
  • the media library can support a variety of audio and video encoding formats, such as Moving Picture Experts Group (MPEG), Audio Video Interleaved (AVI), Transport Stream (MPEG2-TS, TS), Moving Picture Experts Group Audio Layer III (MP3), Advanced Audio Coding (AAC), Portable Network Graphics (PNG), etc.
  • the 3D graphics processing library is used to realize 3D graphics drawing, image rendering, compositing and layer processing, etc.
  • 2D graphics engine is a drawing engine for 2D drawing.
  • the kernel layer is the layer between hardware and software.
  • the kernel layer contains at least display drivers, camera drivers, audio drivers, and sensor drivers.
  • the software and hardware workflows of the electronic device 300 are exemplarily described below with reference to the playback scene of the multimedia file.
  • when a touch operation is received, the corresponding hardware interrupt is sent to the kernel layer.
  • the kernel layer processes touch operations into raw input events (including touch coordinates, timestamps of touch operations, etc.). Raw input events are stored at the kernel layer.
  • the application framework layer obtains the raw input event from the kernel layer and identifies the control corresponding to the input event. Taking as an example the case where the touch operation is a touch coordinate operation and the control corresponding to the touch operation is the icon of the video application, the video application calls the interface of the application framework layer to start the video application, and then starts the audio driver and the display driver by calling the kernel layer.
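  • The flow from raw input event to started drivers can be illustrated with a toy model. Everything here (class names, the control name "video_app_icon", the returned action strings) is hypothetical and is not Android framework code; it only mirrors the steps described above.

```python
from dataclasses import dataclass


@dataclass
class RawInputEvent:
    """What the kernel layer stores for a touch: coordinates and a timestamp."""
    x: int
    y: int
    timestamp_ms: int


@dataclass
class Control:
    name: str
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, event):
        return self.left <= event.x < self.right and self.top <= event.y < self.bottom


def identify_control(event, controls):
    """Framework-layer step: find the control under the touch coordinates."""
    for control in controls:
        if control.contains(event):
            return control
    return None


def handle_touch(event, controls):
    """Return the actions triggered by the touch, in order."""
    control = identify_control(event, controls)
    if control is not None and control.name == "video_app_icon":
        return ["start video application", "start audio driver", "start display driver"]
    return []


controls = [Control("video_app_icon", 0, 0, 100, 100), Control("gallery_icon", 100, 0, 200, 100)]
actions = handle_touch(RawInputEvent(x=40, y=60, timestamp_ms=1000), controls)
```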
  • the first number of video data packets corresponds to the first playback time period
  • the second number of audio data packets corresponds to the second playback time period
  • the difference between the first playback time period and the second playback time period is less than a preset time difference threshold
  • the first quantity is an integer greater than 1
  • the second quantity is an integer greater than 1.
  • FIG. 4 is a schematic diagram of software module interaction of an electronic device provided by an embodiment of the present application. It can be understood that the schematic diagram shown in FIG. 4 may specifically be the schematic diagram of the interaction of software modules of the first electronic device 200A shown in FIG. 2A, or the schematic diagram of the interaction of software modules of the second electronic device 200B shown in FIG. 2B.
  • the software modules may include: a file loading module 401, a parsing module 402, a judgment module 403, a code stream package loading module 404, a decoding module 405 and a synchronization module 406, wherein:
  • the file loading module 401 supports one or more file protocols, such as the File Transfer Protocol (FTP), the Hypertext Transfer Protocol (HTTP), the Real Time Streaming Protocol (RTSP), etc.
  • the file loading module 401 can load or download the multimedia file into the memory of the player through the above-mentioned protocols.
  • the parsing module 402 is configured to use a corresponding file protocol to parse the multimedia file according to the encapsulation form of the multimedia file to obtain description information and a plurality of code stream packets, wherein the code stream packets include audio data packets and video data packets.
  • the judgment module 403 is used to count, according to the description information, the amount of stored data between the audio data packets and the video data packets that have a corresponding display timestamp relationship in the multiple code stream packets, then count the target number, that is, the number of stored data amounts that are greater than or equal to the target distance threshold, and finally determine, according to the target number, whether the interleaving of the multiple code stream packets is uniform.
  • the target distance threshold is the larger one of the first distance and the preset distance
  • the first distance is determined according to at least one of the width, height, storage ratio or compression ratio of the multimedia file carried in the description information; wherein, the multimedia file includes a video frame, and the width and height of the multimedia file correspond to the width and height of the video frame.
  • a video frame may be an image displayed on a display after the video data packets in the multimedia file are decoded.
  • the code stream package loading module 404 continuously loads the first number of audio data packets into the audio decoder, and continuously loads the second number of video data packets into the video decoder.
  • the first number is an integer greater than 1
  • the second number is an integer greater than 1.
  • the first number of video data packets corresponds to the first playback time period
  • the second number of the audio data packets corresponds to the second playback time period
  • the deviation between the first playback time period and the second playback time period is less than the preset time difference threshold; the first playback time period and the second playback time period may be time periods set according to actual needs
  • the preset time difference threshold may be a reference value set manually based on experience, or a reference value obtained by training (or learning) on multiple historical values.
  • the decoding module 405 includes an audio decoder and a video decoder.
  • the audio decoder is used for decoding the audio data packets in the compressed and encoded form into uncompressed audio raw data;
  • the video decoder is used for decoding the video data packets in the compressed and encoded form into the uncompressed video original data.
  • the synchronization module 406 is used to perform synchronization processing, according to the description information obtained by parsing, on the first quantity of audio data packets and the second quantity of video data packets obtained by decoding, send the first quantity of audio data packets to the sound card, and send the second quantity of video data packets to the graphics card.
  • FIG. 5 is a method for playing a multimedia file provided by an embodiment of the present application. The method includes but is not limited to the following steps:
  • Step S501 Load the multimedia file into the memory of the player.
  • the electronic device can load or download multimedia files into the memory of the player according to a series of file protocols, such as one or more of the FTP protocol, the HTTP protocol and the RTSP protocol.
  • Step S502 Parse the multimedia file to obtain multiple stream packets.
  • the electronic device can parse the multimedia file according to the specific encapsulation form of the multimedia file by using a corresponding encapsulation protocol to obtain a plurality of code stream packets, wherein the code stream packets include audio data packets and video data packets.
  • the audio data packets may be data packets in the form of audio stream compression
  • the video data packets may be data packets in the form of video stream compression encoding.
  • Step S503 Continuously load the first number of audio data packets into the audio decoder, and continuously load the second number of video data packets into the video decoder.
  • the first number of video data packets corresponds to the first playback time period
  • the second number of audio data packets corresponds to the second playback time period
  • the deviation between the first playback time period and the second playback time period is less than the preset time difference threshold. That is, the first number and the second number may be equal or unequal, but when the electronic device loads the first number of audio data packets and the second number of video data packets to the player for playback, their playback durations need to be substantially the same. "Substantially the same" means that, from the user's point of view, the audio data packets and the video data packets are played synchronously.
  • the first playback time period and the second playback time period can be determined according to the size of the buffer in the decoder (the first playback time period corresponds to the video decoder, and the second playback time period corresponds to the audio decoder) and the frequency of reading stream packets. It is understandable that if the number of code stream packets read at one time is too large, the buffer may be blocked; if the number of code stream packets read at one time is too small, frequent jumps may occur and cause read underload. Therefore, it is necessary to set a reasonable playback time range to ensure that the number of stream packets read each time is appropriate. Further, the preset time range can be such that the first number of audio data packets and the second number of video data packets each play in the player for at least 1 second.
  • the audio decoder is used for decoding the first quantity of audio data packets in compressed and encoded form into a first quantity of uncompressed audio raw data; the video decoder is used for decoding the second quantity of video data packets in compressed and encoded form into a second quantity of uncompressed video raw data.
  • the audio decoder sends the first quantity of uncompressed audio raw data packets to the sound card, and the video decoder sends the second quantity of uncompressed video raw data packets to the graphics card. Lip-synchronized playback of sound and picture can then be achieved through the sound card and the display.
  • the multimedia file may include description information, and if the information of the first stream packet stored in the description information is the information of the audio data packet, the electronic device can continuously load the first number of audio data packets into the audio decoder to obtain the audio frame, and then continuously load the second number of video data packets into the video decoder to obtain the video frame. After synchronizing the audio frame and the video frame, the audio frame is played by the speaker, and the video frame is displayed by the display.
  • if the information of the first stream packet stored in the description information is the information of the video data packet, the electronic device can first continuously load the second number of video data packets into the video decoder to obtain the video frame, and then continuously load the first number of audio data packets into the audio decoder to obtain audio frames. After synchronizing the video frame and the audio frame, the audio frame is played by the speaker, and the video frame is displayed by the display.
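  • as a minimal illustrative sketch of the batch loading in step S503 (not the implementation of this application), the Python code below collects consecutive packets of one type until their accumulated playback duration reaches a target period, and then alternates between audio batches and video batches. The names Packet, take_batch and load_in_batches, the per-packet duration field and the 1-second target are assumptions introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    kind: str        # "audio" or "video"
    pts: float       # display timestamp, in seconds (assumed unit)
    duration: float  # playback duration of this packet, in seconds


def take_batch(packets, start, target_duration):
    """Return (batch, next_index): consecutive packets starting at `start`
    whose accumulated playback duration first reaches `target_duration`."""
    batch, total, i = [], 0.0, start
    while i < len(packets) and total < target_duration:
        batch.append(packets[i])
        total += packets[i].duration
        i += 1
    return batch, i


def load_in_batches(audio, video, target_duration=1.0, audio_first=True):
    """Yield alternating audio and video batches that cover roughly the
    same playback period, so the reader does not jump per packet."""
    if target_duration <= 0:
        raise ValueError("target_duration must be positive")
    ai = vi = 0
    while ai < len(audio) or vi < len(video):
        a_batch, ai = take_batch(audio, ai, target_duration)
        v_batch, vi = take_batch(video, vi, target_duration)
        order = (a_batch, v_batch) if audio_first else (v_batch, a_batch)
        for batch in order:
            if batch:
                yield batch


# Hypothetical input: 0.25 s audio packets and 0.5 s video packets.
audio = [Packet("audio", i * 0.25, 0.25) for i in range(8)]
video = [Packet("video", i * 0.5, 0.5) for i in range(4)]
for batch in load_in_batches(audio, video):
    print(batch[0].kind, len(batch))
```

  • with these hypothetical inputs, each audio batch holds 4 packets and each video batch holds 2 packets, so the first number and the second number differ while the two playback periods are the same.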
  • FIG. 6 is another method for playing a multimedia file provided by an embodiment of the present application.
  • the method includes but is not limited to the following steps:
  • Step S601 Load the multimedia file into the memory of the player.
  • for details, refer to the description of step S501, which will not be repeated here.
  • Step S602 Parse the multimedia file to obtain multiple stream packets.
  • the multimedia file may include description information, and the description information and multiple code stream packets may be obtained by parsing the multimedia file.
  • Step S603 Determine whether the interleaving of multiple code stream packets is uniform according to the description information.
  • the description information can be the information located at the head or the tail of the multimedia file. Like a book, it contains two parts, the directory and the content.
  • the description information describes the file information and the attribute information, storage location and size of each code stream packet. For example: the playback time, storage ratio and compression ratio of the multimedia file; the width, height, frame rate, bit rate and other information of the video in the multimedia file; the sampling rate and number of channels of the audio in the multimedia file; and the position (Position, POS), the length and the presentation timestamp (Presentation Time Stamp, PTS) of each code stream packet. Whether the interleaving of multiple code stream packets is uniform can be determined according to the storage location and display timestamp of each code stream packet in the description information.
  • the description information stores the display timestamp and storage location of each stream packet. The electronic device may first count, according to the description information, the amount of stored data between the audio data packets and the video data packets that have a corresponding display timestamp relationship in the multiple stream packets. That is, the electronic device can find the video data packet with the corresponding display timestamp according to the display timestamp of the audio data packet, and then determine, according to the storage locations, the amount of stored data between the audio data packet and the video data packet with the corresponding display timestamp. It can be understood that the amount of stored data indicates how much data of other code stream packets is stored between the storage location of the audio data packet and the storage location of the video data packet with the corresponding display timestamp. Then, the electronic device counts the target number, that is, the number of stored data amounts that are greater than or equal to the target distance threshold, and finally determines whether the interleaving of multiple code stream packets is uniform according to the target number.
  • the target distance threshold is the larger one of the first distance and the preset distance
  • the first distance is determined by the electronic device according to at least one of the width, height, storage ratio or compression ratio of the multimedia file carried in the description information; wherein the multimedia file includes a video frame, and the width and height of the multimedia file correspond to the width and height of the video frame.
  • for example, the first distance L = width*height*store_rate*compression_rate.
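  • the threshold selection described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the sample values of width, height, store_rate and compression_rate, the 3 MB preset distance and the use of bytes as the unit are all hypothetical and are not specified by this application.

```python
def first_distance(width, height, store_rate, compression_rate):
    """First distance L = width * height * store_rate * compression_rate."""
    return width * height * store_rate * compression_rate


def target_distance_threshold(first, preset):
    """The target distance threshold is the larger of the two distances."""
    return max(first, preset)


# Hypothetical values, for illustration only.
L = first_distance(width=1920, height=1080, store_rate=1.5, compression_rate=0.5)
D = target_distance_threshold(L, preset=3 * 1024 * 1024)
print(L, D)
```

  • in this example the first distance (1555200.0) is smaller than the preset distance (3145728), so the preset distance is used as the target distance threshold.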
  • the electronic device may determine, according to the description information, the display timestamp and storage location of each code stream packet in a preset number of code stream packets, where the preset number of code stream packets are the first N code stream packets in the multiple code stream packets, and N is a positive integer. Then the electronic device determines, in order of increasing display timestamp, the respective storage locations of the video data packets and the audio data packets that have a corresponding display timestamp relationship in the preset number of stream packets, and finally determines, according to these storage locations, the amount of stored data between each video data packet and the audio data packet with the corresponding display timestamp.
  • the preset number may be a value for reference comparison artificially set according to experience, or a value for reference comparison obtained by training (or learning) according to a plurality of historical values.
  • for example, the preset number N can be set to 2000. If it is determined according to the description information that the number of code stream packets of the multimedia file is less than the preset number, the electronic device may count the amount of stored data between the audio data packets and the video data packets with the corresponding display timestamp relationship in all the code stream packets.
  • the electronic device can determine the display timestamp and storage location of each stream packet in the preset number N of stream packets. If the information of the first stream packet stored in the description information is the information of an audio data packet, the storage location of the video data packet with the corresponding display timestamp can be determined according to the display timestamp of the first audio data packet, and the amount of stored data between the storage location of that audio data packet and the storage location of the corresponding video data packet is counted. By analogy, the storage locations of the second audio data packet, the third audio data packet, ..., the Nth audio data packet and of the video data packets corresponding to their respective display timestamps are determined in order of increasing display timestamp. Finally, the amount of stored data between the storage location of each audio data packet and the storage location of the video data packet with the corresponding display timestamp is counted.
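  • a minimal Python sketch of this pairing step is given below. It assumes, purely for illustration, that "corresponding display timestamp" means the video data packet whose PTS is closest to that of the audio data packet, and that the amount of stored data can be approximated by the absolute difference of the two storage positions; the names PacketInfo and stored_amounts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PacketInfo:
    kind: str  # "audio" or "video"
    pos: int   # storage position (byte offset in the file, assumed unit)
    pts: int   # display timestamp in milliseconds (assumed unit)


def stored_amounts(infos, n=2000):
    """For the first n packet entries, pair each audio packet (in order of
    increasing PTS) with the video packet of closest PTS and return the
    byte distance between their storage positions."""
    window = infos[:n]
    audio = sorted((p for p in window if p.kind == "audio"), key=lambda p: p.pts)
    video = [p for p in window if p.kind == "video"]
    if not video:
        return []
    amounts = []
    for a in audio:
        v = min(video, key=lambda p: abs(p.pts - a.pts))  # assumed matching rule
        amounts.append(abs(v.pos - a.pos))
    return amounts


# Hypothetical description information for four packets.
infos = [
    PacketInfo("audio", 0, 0),
    PacketInfo("video", 4096, 0),
    PacketInfo("video", 8192, 40),
    PacketInfo("audio", 5_000_000, 40),
]
print(stored_amounts(infos))
```

  • here the second audio data packet is stored far away from the video data packet with the same display timestamp, which is the kind of pair that later counts toward the target number.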
  • there may be multiple stored data amounts (one for each audio data packet and the video data packet whose display timestamp corresponds to it).
  • the electronic device counts the target number, that is, the number of stored data amounts that are greater than or equal to the above target distance threshold. For example, if the target distance threshold D is 3MB, and the multiple stored data amounts are 10MB, 2MB, 3.7MB, 1.9MB, 6.9MB and 11.6MB, then the number of stored data amounts greater than or equal to the target distance threshold D is 4.
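  • the counting in the example above can be reproduced with a few lines of Python. The function name count_abnormal is an assumption for illustration; the values are the ones given in the example, expressed in MB.

```python
def count_abnormal(amounts, threshold):
    """Number of stored data amounts that are >= the target distance threshold."""
    return sum(1 for amount in amounts if amount >= threshold)


amounts_mb = [10, 2, 3.7, 1.9, 6.9, 11.6]  # stored data amounts, in MB
abnormal_cnt = count_abnormal(amounts_mb, threshold=3)  # D = 3 MB
print(abnormal_cnt)  # 4
```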
  • the electronic device can determine whether the interleaving of the multiple code stream packets is uniform according to the target quantity.
  • the electronic device can calculate the ratio rate of the target number abnormal_cnt to the preset number N, and if the ratio rate is greater than or equal to the second preset threshold, it can be determined that the interleaving of multiple code stream packets belongs to uneven interleaving.
  • the second preset threshold may be a value artificially set according to experience for reference comparison, or a value obtained by training (or learning) according to a plurality of historical values for reference comparison.
  • the second preset threshold may take a value of 0.2.
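  • the ratio test can be sketched as below. The function name is hypothetical; abnormal_cnt, N and the value 0.2 follow the description above, while the sample counts are made up for illustration.

```python
def is_interleaving_uneven(abnormal_cnt, n, ratio_threshold=0.2):
    """Interleaving is treated as uneven when abnormal_cnt / n reaches
    the second preset threshold."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    return abnormal_cnt / n >= ratio_threshold


print(is_interleaving_uneven(500, 2000))  # ratio 0.25
print(is_interleaving_uneven(100, 2000))  # ratio 0.05
```

  • with N = 2000, 500 abnormal pairs give a ratio of 0.25 and the file is treated as unevenly interleaved, whereas 100 abnormal pairs give a ratio of 0.05 and it is not.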
  • Step S604 Continuously load the first number of audio data packets into the audio decoder, and continuously load the second number of video data packets into the video decoder.
  • for details, refer to the description of step S503, which will not be repeated here.
  • FIG. 7 is a schematic flowchart of a method for judging whether the interleaving of multimedia files is uniform according to an embodiment of the present application.
  • description information and/or multiple code stream packets can be obtained by parsing the multimedia file.
  • the code stream packet includes audio data packets and video data packets.
  • the description information describes the file information and the attribute information, storage location and size of each code stream packet. For example: the playback time, storage ratio and compression ratio of the multimedia file; the width, height, frame rate, bit rate and other information of the video in the multimedia file; the sampling rate and number of channels of the audio in the multimedia file; and the position POS, the length and the PTS of each code stream packet.
  • the electronic device may determine the target distance threshold according to the description information of the multimedia file. Specifically, the electronic device may determine the first distance according to one or more of the width, height, storage ratio store_rate, and compression ratio compression_rate of the multimedia file carried by the description information. If the first distance is greater than the preset distance threshold, the target threshold is the first distance; if the first distance is less than the preset distance threshold, the target threshold is the preset distance threshold; if the first distance is equal to the preset distance threshold, the target threshold is the first distance or a preset distance threshold. That is, the target distance threshold is the larger one of the first distance and the preset distance threshold.
  • the preset distance threshold is a preset parameter used to measure whether the interleaving of data packets is uniform. It can be a reference value set manually based on experience, or a reference value obtained by training (or learning) on multiple historical values.
  • the electronic device needs to analyze whether the code stream packets are evenly interleaved according to the storage location of the preset number of code stream packets and the display time stamp. If the number of code stream packets of the multimedia file is less than the preset number, the electronic device needs to analyze whether the code stream packets are evenly interleaved according to the storage locations and display time stamps of all code stream packets of the multimedia file, that is, the preset number is the code stream The total number of packets; if the number of code stream packets of the multimedia file is greater than or equal to the above preset number, the electronic device needs to analyze whether the code stream packets are evenly interleaved according to the storage location and display time stamp of the first N code stream packets.
  • the preset number is the first N code stream packets among the plurality of code stream packets.
  • the electronic device analyzes the preset number of code stream packets in order of increasing timestamp, that is, first selecting the code stream packet with a smaller timestamp and then the code stream packet with a larger timestamp, based on the storage location and display timestamp of each code stream packet stored in the description information. Specifically, in each round of analysis, the first code stream packet with the smallest timestamp is selected among the unprocessed code stream packets.
  • if the information of the first code stream packet stored in the description information is audio data packet information, the first code stream packet is an audio data packet and the second code stream packet is a video data packet, where the second code stream packet is the code stream packet that has a corresponding display timestamp relationship with the first code stream packet.
  • the electronic device needs to determine whether the amount of stored data between the storage location of the first stream packet and the storage location of the second stream packet is greater than or equal to the target threshold. If it is, the amount of stored data between the two stream packets is considered abnormal, and a large jump may occur in actual playback. Therefore, the number of times the amount of stored data reaches or exceeds the target threshold is recorded by updating the target number.
  • the interleaving situation of each code stream packet in the preset number is analyzed and processed in this way. When the number of code stream packets analyzed and processed is greater than or equal to the preset number, the electronic device has finished the analysis.
  • the second preset threshold value may be a value for reference comparison artificially set according to experience, or a value for reference comparison obtained by training (or learning) according to a plurality of historical values.
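  • the overall flow of FIG. 7 can be summarized in one function, shown here as a hedged Python sketch rather than as the implementation of this application. It assumes that the matched audio/video pairs of the analysis window have already been extracted from the description information as (pts, first_pos, second_pos) tuples; the function name, the tuple layout and all sample values are hypothetical.

```python
def interleaving_is_uniform(pairs, total_packets, first_distance,
                            preset_distance, preset_n=2000,
                            ratio_threshold=0.2):
    """pairs: (pts, first_pos, second_pos) for each matched pair of code
    stream packets in the analysis window (assumed input format)."""
    if total_packets <= 0:
        raise ValueError("total_packets must be positive")
    # Target threshold: the larger of the first distance and the preset one.
    target_threshold = max(first_distance, preset_distance)
    # Analyze all packets if the file has fewer than preset_n of them.
    n = min(preset_n, total_packets)
    abnormal_cnt = 0
    # Process pairs from the smallest timestamp to the largest.
    for _pts, first_pos, second_pos in sorted(pairs):
        if abs(second_pos - first_pos) >= target_threshold:
            abnormal_cnt += 1  # a large jump would occur in playback
    return abnormal_cnt / n < ratio_threshold


# Hypothetical file with 6 code stream packets forming 3 matched pairs.
pairs = [(0, 0, 4096), (40, 5_000_000, 8192), (80, 9000, 6_000_000)]
print(interleaving_is_uniform(pairs, total_packets=6,
                              first_distance=1_555_200,
                              preset_distance=3_145_728))
```

  • in this example two of the three pairs are farther apart than the target threshold, so the ratio 2/6 exceeds 0.2 and the interleaving is judged to be uneven.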
  • FIG. 8A is a schematic flowchart of a playback method for batch loading stream packets provided by an embodiment of the present application.
  • the electronic device parses the multimedia file according to the schematic flowchart shown in FIG. 6 and determines the interleaving situation of the code stream packets. If the interleaving of the code stream packets is uniform, each time the unloaded code stream packet with the smallest display timestamp is selected and loaded into the corresponding decoder, until all the code stream packets are loaded into the decoders for decoding and playback.
  • for details, refer to the related description of FIG. 1A, which will not be repeated here.
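  • for the uniformly interleaved case, always selecting the unloaded packet with the smallest display timestamp amounts to merging the audio and video packet sequences by timestamp. The Python sketch below illustrates this with the standard library function heapq.merge; representing a packet as a (kind, pts) tuple and the sample timestamps are assumptions for illustration.

```python
import heapq


def load_order_uniform(audio, video):
    """Merge two PTS-sorted packet lists so that the packet with the
    smallest display timestamp is always loaded next."""
    return list(heapq.merge(audio, video, key=lambda packet: packet[1]))


# Hypothetical packets as (kind, pts in milliseconds), each list sorted by PTS.
audio = [("audio", 0), ("audio", 23), ("audio", 46)]
video = [("video", 10), ("video", 50)]
for kind, pts in load_order_uniform(audio, video):
    print(kind, pts)
```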
  • FIG. 8B is an example provided by an embodiment of the present application. As can be seen from FIG.
  • the electronic device jumps to the position of the video stream packet corresponding to the display timestamp of the audio data packet, and reads and loads the second number of video data packets into the video decoder in sequence. The playback duration of the second number of video data packets in the player also needs to correspond to the second playback time period, and the deviation between the first playback time period and the second playback time period is less than a preset time difference threshold.
  • alternatively, the electronic device first reads and loads the second number of video data packets into the video decoder in sequence, and then reads and loads the first number of audio data packets into the audio decoder in sequence.
  • the above-mentioned “sequential" may be in the order of displaying timestamps from small to large.
  • the first quantity and the second quantity may or may not be equal, but when the electronic device loads the first quantity of audio data packets and the second quantity of video data packets into the corresponding decoders for decoding and playing, their playback time periods need to be substantially the same. "Substantially the same" means that, from the user's point of view, the audio data packets and the video data packets are played synchronously.
  • the preset time range can be determined according to the size of the buffer in the decoder and the frequency of reading stream packets. It is understandable that if the number of code stream packets read at one time is too large, the buffer may be blocked; if the number of code stream packets read at one time is too small, it may cause frequent jumps and cause read underload.
  • the preset time range can be such that the first number of audio data packets and the second number of video data packets each play in the player for a duration of at least 1 second.
  • the electronic device can load audio data packets and video data packets into the player in a roughly aligned manner within a preset time range (for example, 1 second), reducing the performance overhead of the frequent jumps caused by aligning the code stream packets.
  • in the process of reading and loading the code stream packets, the electronic device can dynamically determine the interleaving situation of the code stream packets according to the schematic flowchart shown in FIG. 6, and adjust the way the code stream packets are loaded in real time. For example, if the multimedia file to be played includes 8,000 stream packets, and the analysis of the first N of the 8,000 stream packets finds that the interleaving of the stream packets is uneven, the electronic device can continuously load the first number of audio data packets and continuously load the second number of video data packets into the player for playback, where N can be the preset number and N is less than 8000.
  • after the electronic device has loaded 1,000 code stream packets into the player for decoding and playback, 7,000 code stream packets in the multimedia file remain unloaded. At this time, the electronic device can analyze the current interleaving situation according to the first N of the 7,000 code stream packets. If the interleaving is found to be uniform, the electronic device each time selects the unloaded code stream packet with the smallest display timestamp and loads it into the corresponding decoder, until all the code stream packets are loaded into the player for decoding and playback.
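  • the dynamic adjustment described above can be sketched as a simple planning loop in Python. The sketch assumes a fixed re-analysis interval of 1,000 packets and takes the interleaving check as a caller-supplied function; the names plan_loading and is_uneven_from, the mode labels and the sample check are all hypothetical.

```python
def plan_loading(total_packets, chunk, is_uneven_from):
    """Yield (start, end, mode) for each chunk of packets. Before each
    chunk, the interleaving of the remaining packets is re-checked by
    calling is_uneven_from(start)."""
    if chunk <= 0:
        raise ValueError("chunk must be positive")
    start = 0
    while start < total_packets:
        end = min(start + chunk, total_packets)
        mode = "batch" if is_uneven_from(start) else "smallest_pts_first"
        yield (start, end, mode)
        start = end


# Hypothetical file: only the first 1,000 of 8,000 packets are unevenly interleaved.
plan = list(plan_loading(8000, 1000, lambda start: start < 1000))
for entry in plan:
    print(entry)
```

  • in this hypothetical case the first 1,000 packets are loaded in batches, and the remaining 7,000 packets are loaded one at a time in order of display timestamp.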
  • FIG. 9 is another method for playing a multimedia file provided by an embodiment of the present application.
  • the method includes but is not limited to the following steps:
  • Step S901 Load the multimedia file into the memory of the player.
  • Step S902 Parse the multimedia file to obtain multiple stream packets.
  • the electronic device parses the multimedia file and obtains that the code stream package includes a video data package, an audio data package and a subtitle data package.
  • the electronic device can display video data packets and subtitle data packets through a display, and play audio data packets through a speaker. It can be understood that, when the audio data packets of some multimedia files do not have embedded subtitles, the code stream packets of the multimedia files may include subtitle data packets.
  • Step S903 Continuously load a first number of audio data packets into the audio decoder, continuously load a second number of video data packets into the video decoder, and continuously load a third number of subtitle data packets into the subtitle decoder.
  • the electronic device sequentially reads and loads a first number of audio data packets into an audio decoder for decoding to obtain audio frames, loads a second number of video data packets into a video decoder for decoding to obtain video frames, and loads the third number of subtitle data packets into the subtitle decoder for decoding to obtain subtitle frames. Then, after the electronic device performs synchronization processing on the audio frame, the video frame and the subtitle frame, the video frame and the subtitle frame are displayed by the display, and the audio frame is played by the speaker. In this way, audio data packets, video data packets and subtitle data packets are loaded alternately until all the code stream packets of the multimedia file to be played are loaded.
  • the third number is an integer greater than 1, the third number of subtitle data packets corresponds to the third playback period, and the deviation between the third playback period and the first playback period and the deviation between the third playback period and the second playback period are both smaller than the preset time difference threshold.
  • the electronic device continuously loads the first number of audio data packets into the audio decoder, continuously loads a second number of video data packets into the video decoder, and continuously loads a third number of subtitle data packets into the subtitle decoder.
  • the loading order of audio data packets, video data packets and subtitle data packets can be determined according to the description information obtained by parsing. If the information of the first stream packet stored in the description information is the information of the video data packet, the electronic device may first load the second quantity of video data packets, then load the first quantity of audio data packets, and then load the third quantity of subtitle data packets to be played in the player. This embodiment of the present application does not impose any restrictions on the order of loading the code stream packets.
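  • the alternating loading of three packet types can be illustrated with the following Python sketch. It assumes that the batches of each type have already been formed and that the type of the first stream packet in the description information decides which type is loaded first; the function names and batch labels are hypothetical, and other loading orders are equally possible.

```python
def loading_order(first_kind, kinds=("audio", "video", "subtitle")):
    """Put the kind of the first stream packet first and keep the
    remaining kinds in their given order."""
    if first_kind not in kinds:
        raise ValueError(f"unknown packet kind: {first_kind}")
    return [first_kind] + [k for k in kinds if k != first_kind]


def alternate_batches(batches_by_kind, first_kind):
    """Yield (kind, batch) pairs, cycling through the kinds until every
    batch has been handed to its decoder."""
    order = loading_order(first_kind)
    queues = {k: list(batches_by_kind.get(k, [])) for k in order}
    while any(queues.values()):
        for kind in order:
            if queues[kind]:
                yield kind, queues[kind].pop(0)


# Hypothetical batches; each label stands for a group of packets.
batches = {"audio": ["A1", "A2"], "video": ["V1", "V2"], "subtitle": ["S1"]}
sequence = list(alternate_batches(batches, first_kind="video"))
print(sequence)
```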
  • FIG. 10 is a schematic structural diagram of an apparatus for playing a multimedia file provided by an embodiment of the present application.
  • the apparatus 100 for playing a multimedia file may be a node, or may be a device in a node, such as a chip or an integrated circuit. The apparatus 100 for playing a multimedia file may include a first loading unit 1001, a parsing unit 1002 and a second loading unit 1003, wherein the detailed description of each unit is as follows.
  • the first loading unit 1001 is used for loading multimedia files into the memory of the player
  • parsing unit 1002 configured to parse the multimedia file to obtain a plurality of code stream packets, the code stream packets include audio data packets and video data packets;
  • the second loading unit 1003 is configured to continuously load a first number of the audio data packets into an audio decoder, and continuously load a second number of the video data packets into a video decoder, wherein the first The number is an integer greater than 1, and the second number is an integer greater than 1.
  • the first number of the video data packets corresponds to a first playback period
  • the second number of the audio data packets corresponds to a second playback period
  • the deviation between the first playback period and the second playback period is less than a preset time difference threshold.
  • the multimedia file includes description information
  • the apparatus further includes a determination unit 1004, where the determination unit 1004 is configured to: determine whether the interleaving of the multiple code stream packets is uniform according to the description information.
  • the determining unit 1004 is specifically configured to: count, according to the description information, the amount of stored data between the audio data packets and the video data packets that have a corresponding display timestamp relationship in the plurality of stream packets; count the target number, that is, the number of stored data amounts that are greater than or equal to the target distance threshold; and determine whether the interleaving of the multiple code stream packets is uniform according to the target number.
  • the target distance threshold is a larger one of a first distance and a preset distance
  • the first distance is determined according to at least one of the width, height, storage ratio or compression ratio of the multimedia file carried in the description information
  • the multimedia file includes a video frame
  • the width and height of the multimedia file correspond to the width and height of the video frame.
  • the determining unit 1004 is specifically configured to: determine, according to the description information, a display timestamp and a storage location of each code stream packet in a preset number of code stream packets, where the preset number of code stream packets are the first N code stream packets of the plurality of code stream packets and N is a positive integer; determine, in order of increasing display timestamp, the respective storage locations of the video data packets and the audio data packets that have a corresponding display timestamp relationship among the preset number of code stream packets; and determine, according to those storage locations, the amount of data stored between the video data packets and the audio data packets that have a corresponding display timestamp relationship.
  • the determining unit 1004 is specifically configured to: calculate the ratio of the target number to the preset number; if the ratio is greater than or equal to a second preset threshold, determine that the interleaving of the plurality of code stream packets is not uniform; or, if the target number is greater than or equal to a third preset threshold, determine that the interleaving of the plurality of code stream packets is not uniform.
  • the code stream packets further include subtitle data packets
  • the second loading unit 1003 is further configured to: continuously load a third number of the subtitle data packets into a subtitle decoder, wherein the first number of the audio data packets, the second number of the video data packets and the third number of the subtitle data packets are loaded alternately; wherein the third number is an integer greater than 1, the third number of the subtitle data packets corresponds to a third playback period, and the deviation between the third playback period and the first playback period and the deviation between the third playback period and the second playback period are both less than the preset time difference threshold.
  • the second loading unit 1003 is further configured to: if it is determined according to the description information that the interleaving of the plurality of code stream packets is uniform, load the audio data packet with the smallest display timestamp into the audio decoder, and load the video data packet with the smallest display timestamp into the video decoder.
  • for the implementation of each unit, reference may also be made to the corresponding description of the method embodiment shown in FIG. 5, FIG. 6 or FIG. 9.
  • Embodiments of the present application further provide a computer-readable storage medium, where a program is stored in the computer-readable storage medium.
  • Embodiments of the present application further provide a computer program product, the computer program product includes program instructions, and when the program instructions are run on a computer or a processor, the method flow shown in FIG. 5 , FIG. 6 or FIG. 9 is implemented.
  • the processes of the foregoing method embodiments can be completed by a computer program instructing the relevant hardware, and the program can be stored in a computer-readable storage medium.
  • when the program is executed, it may include the processes of the foregoing method embodiments.
  • the aforementioned storage medium includes read-only memory (ROM), random access memory (RAM), magnetic disks, optical disks and other media that can store program code.

Abstract

Embodiments of the present application provide a multimedia file playing method and a related apparatus. The method may comprise: loading a multimedia file into a memory of a player; parsing the multimedia file to obtain a plurality of code stream packets, wherein the code stream packets comprise audio data packets and video data packets; and continuously loading a first number of audio data packets into an audio decoder, and continuously loading a second number of video data packets into a video decoder, wherein the first number is an integer greater than 1, and the second number is an integer greater than 1. According to the embodiments of the present application, playback stuttering caused by frequently jumping to load the corresponding code stream packets can be reduced while ensuring that audio data packets and video data packets can be played synchronously.

Description

Multimedia file playing method and related apparatus
Technical field
The present invention relates to the field of multimedia technologies, and in particular to a multimedia file playing method and a related apparatus.
Background
With the development of computer technology, it has become increasingly common to record multimedia files (for example, short videos) on electronic devices such as mobile phones, and to process such files. Generally, a multimedia file contains two parts: description information (also called a description block) and a plurality of code stream packets. The code stream packets may include processed audio data packets and video data packets. Generally, the audio data packets and the video data packets are stored sequentially and uniformly interleaved.
However, differences between devices or software may change the storage locations of the code stream packets in a recorded or processed multimedia file, so that the audio data packets and the video data packets are no longer stored sequentially and uniformly interleaved. In some scenarios this may cause stuttering when the multimedia file is played, degrading the user's viewing experience.
Summary
Embodiments of the present application disclose a multimedia file playing method and a related apparatus that can reduce playback stuttering caused by frequently jumping to load the corresponding code stream packets, while ensuring that audio data packets and video data packets can be played synchronously.
A first aspect of the embodiments of the present application provides a multimedia file playing method. The method may include: loading a multimedia file into a memory of a player; parsing the multimedia file to obtain a plurality of code stream packets, where the code stream packets include audio data packets and video data packets; and then continuously loading a first number of the audio data packets into an audio decoder and continuously loading a second number of the video data packets into a video decoder, where the first number is an integer greater than 1 and the second number is an integer greater than 1.
In the embodiments of the present application, more than one audio data packet is continuously loaded into the audio decoder each time, and more than one video data packet is continuously loaded into the video decoder each time. Compared with the prior-art approach of loading one audio data packet into the audio decoder and one video data packet into the video decoder each time, this reduces the number of jumps needed to load code stream packets. While ensuring that audio data packets and video data packets can be played synchronously, it increases the speed of reading code stream packets and avoids playback stuttering caused by packet underrun.
According to the first aspect, in a possible implementation, the first number of video data packets corresponds to a first playback period, the second number of audio data packets corresponds to a second playback period, and the deviation between the first playback period and the second playback period is less than a preset time difference threshold.
According to the first aspect, in a possible implementation, the first playback period corresponding to the first number of video data packets has a length of at least 1 second, and the second playback period corresponding to the second number of video data packets has a length of at least 1 second.
It can be seen that the first number is enough for the video data packets to play in the player for a certain length of time, and the second number is likewise enough to play for the length of time covered by the first number. The code stream packets are therefore no longer aligned by the display timestamp of each individual packet, as in the prior art, but are roughly aligned in units of a preset period, which avoids the performance overhead of frequently jumping to load code stream packets.
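The period-based loading described above can be sketched as follows. This is a minimal illustration and not the implementation of the application: the `Packet` type, the `plan_batches` function and the 1000 ms default window are hypothetical names and values chosen for the example. Each window produces one run of consecutive audio packets followed by one run of consecutive video packets, so the two streams stay roughly aligned per window rather than per packet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    kind: str    # "audio" or "video" (hypothetical labels)
    pts_ms: int  # display timestamp in milliseconds
    offset: int  # storage position in the file, in bytes


def plan_batches(packets, window_ms=1000):
    """Group packets into alternating audio/video batches, one pair per window."""
    if window_ms <= 0:
        raise ValueError("window_ms must be positive")
    audio = sorted((p for p in packets if p.kind == "audio"), key=lambda p: p.pts_ms)
    video = sorted((p for p in packets if p.kind == "video"), key=lambda p: p.pts_ms)
    batches = []
    ai = vi = 0
    window_end = min((p.pts_ms for p in packets), default=0) + window_ms
    while ai < len(audio) or vi < len(video):
        a_start, v_start = ai, vi
        # Take every audio packet of the current window in one consecutive run.
        while ai < len(audio) and audio[ai].pts_ms < window_end:
            ai += 1
        # Then take every video packet of the same window in one consecutive run.
        while vi < len(video) and video[vi].pts_ms < window_end:
            vi += 1
        if ai > a_start:
            batches.append(("audio", audio[a_start:ai]))
        if vi > v_start:
            batches.append(("video", video[v_start:vi]))
        window_end += window_ms
    return batches
```

With a one-second window, a reader jumps between the audio and video regions of the file about twice per second of content instead of once per packet.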
According to the first aspect, in a possible implementation, the multimedia file includes description information. After the multimedia file is parsed to obtain the plurality of code stream packets, and before the first number of the audio data packets are continuously loaded into the audio decoder and the second number of the video data packets are continuously loaded into the video decoder, the method further includes: determining, according to the description information, whether the interleaving of the plurality of code stream packets is uniform. The description information stores information about each code stream packet, that is, about each audio data packet and each video data packet, and this information can be used to determine whether the interleaving of the plurality of code stream packets is uniform.
According to the first aspect, in a possible implementation, determining according to the description information whether the interleaving of the plurality of code stream packets is uniform may include: counting, according to the description information, the amount of stored data between the audio data packets and the video data packets that have a corresponding display timestamp relationship among the plurality of code stream packets; then counting a target number, that is, the number of cases in which the amount of stored data is greater than or equal to a target distance threshold; and finally determining, according to the target number, whether the interleaving of the plurality of code stream packets is uniform.
It can be seen that the target number is determined by taking every audio data packet and every video data packet into account, so using the target number improves the accuracy, and therefore the credibility, of the determination of whether the interleaving of the plurality of code stream packets is uniform.
According to the first aspect, in a possible implementation, the target distance threshold is the larger of a first distance and a preset distance, where the first distance is determined according to at least one of the width, height, storage ratio or compression ratio of the multimedia file carried in the description information; the multimedia file includes video frames, and the width and height of the multimedia file correspond to the width and height of the video frames.
It can be seen that the first distance is a parameter related to the multimedia file, so the target distance threshold is also related to the multimedia file. Using the target distance threshold therefore improves the accuracy, and the credibility, of determining whether the interleaving of the plurality of code stream packets in the multimedia file is uniform.
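The text states only that the target distance threshold is the larger of the first distance and a preset distance; it does not give a formula for the first distance. The sketch below therefore uses an assumed formula, the estimated size of one compressed frame (width × height × bytes per pixel ÷ compression ratio), purely for illustration. The function names, the 1.5 bytes-per-pixel default and the 1 MiB preset distance are all hypothetical.

```python
def first_distance(width, height, compression_ratio, bytes_per_pixel=1.5):
    """Assumed estimate: byte size of one compressed video frame."""
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return int(width * height * bytes_per_pixel / compression_ratio)


def target_distance_threshold(width, height, compression_ratio,
                              preset_distance=1 << 20):
    """Take the larger of the file-derived distance and the preset distance."""
    return max(first_distance(width, height, compression_ratio), preset_distance)
```

Under these assumptions a 1080p file with a 100:1 compression ratio falls back to the preset distance, while a lightly compressed 4K file raises the threshold above it.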
According to the first aspect, in a possible implementation, counting according to the description information the amount of stored data between the audio data packets and the video data packets that have a corresponding display timestamp relationship among the plurality of code stream packets includes: determining, according to the description information, a display timestamp and a storage location of each code stream packet in a preset number of code stream packets, where the preset number of code stream packets are the first N code stream packets of the plurality of code stream packets and N is a positive integer; determining, in order of increasing display timestamp, the respective storage locations of the video data packets and the audio data packets that have a corresponding display timestamp relationship among the preset number of code stream packets; and determining, according to the storage locations of the video data packets and the audio data packets, the amount of stored data between the video data packets and the audio data packets that have a corresponding display timestamp relationship.
It can be seen that when the multimedia file contains a large number of code stream packets, the interleaving of only the first N code stream packets can be determined first, so that the time spent determining the interleaving does not delay the start of playback.
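The counting steps above can be sketched as follows. The sketch makes two assumptions that the text does not fix: a video packet "corresponds" to the audio packet whose display timestamp is nearest, and the amount of stored data between two packets is the absolute difference of their byte offsets. `IndexEntry`, `pair_distances` and `count_far_pairs` are hypothetical names.

```python
from bisect import bisect_left
from typing import NamedTuple


class IndexEntry(NamedTuple):
    kind: str    # "audio" or "video"
    pts_ms: int  # display timestamp from the description information
    offset: int  # storage position in bytes


def pair_distances(entries, n):
    """Byte distance between each video packet and its nearest-PTS audio packet,
    looking only at the first n entries of the description information."""
    head = entries[:n]
    audio = sorted((e for e in head if e.kind == "audio"), key=lambda e: e.pts_ms)
    video = sorted((e for e in head if e.kind == "video"), key=lambda e: e.pts_ms)
    if not audio:
        return []
    audio_pts = [a.pts_ms for a in audio]
    distances = []
    for v in video:
        i = bisect_left(audio_pts, v.pts_ms)
        # Only the neighbours around the insertion point can be nearest.
        candidates = audio[max(i - 1, 0):i + 1]
        nearest = min(candidates, key=lambda a: abs(a.pts_ms - v.pts_ms))
        distances.append(abs(nearest.offset - v.offset))
    return distances


def count_far_pairs(entries, n, distance_threshold):
    """Target number: pairs stored at least distance_threshold bytes apart."""
    return sum(d >= distance_threshold for d in pair_distances(entries, n))
```

A file that stores all video before all audio yields a large count, whereas a file in which each audio packet sits next to its video packet yields zero.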
According to the first aspect, in a possible implementation, determining according to the target number whether the interleaving of the plurality of code stream packets is uniform includes: calculating the ratio of the target number to the preset number; and if the ratio is greater than or equal to a second preset threshold, determining that the interleaving of the plurality of code stream packets is uneven; or, if the target number is greater than or equal to a third preset threshold, determining that the interleaving of the plurality of code stream packets is uneven.
It can be seen that, because the amount of stored data between the audio data packet and the video data packet is greater than or equal to the target distance threshold, the prior-art approach of loading one audio data packet into the audio decoder and one video data packet into the video decoder each time may require a jump: after one audio data packet is loaded into the audio decoder, the player has to jump to the storage location of the corresponding video data packet to load it into the video decoder. Therefore, when the target number (the number of audio and video data packets with a corresponding display timestamp relationship whose amount of stored data between them is greater than or equal to the target distance threshold) makes up a large share of the preset number, that is, when the ratio of the target number to the preset number is greater than or equal to the second preset threshold, or when the target number is greater than or equal to the third preset threshold, loading code stream packets into the corresponding decoders in the prior-art manner may involve frequent jumps, so it can be determined that the interleaving of the plurality of code stream packets is uneven. The calculated ratio improves the accuracy, and therefore the credibility, of the determination that the interleaving is uneven. When the interleaving of the plurality of code stream packets is uneven, continuously loading more than one audio data packet into the audio decoder each time and more than one video data packet into the video decoder each time reduces the number of jumps needed to load code stream packets.
Because more code stream packets are continuously loaded into the corresponding decoder each time than in the prior art, when the interleaving of the plurality of code stream packets is uneven, the prior art needs frequent jumps to finish loading the code stream packets, whereas this solution reduces the number of such jumps. It should be noted that the second preset threshold and the third preset threshold may each be a reference value set manually based on experience, or a reference value obtained by training (or learning) on multiple historical values.
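The two alternative tests described above (ratio against the second preset threshold, or count against the third preset threshold) can be sketched in one function. The function name and the default threshold values are assumptions for illustration; the text leaves the actual values to experience or training.

```python
def is_interleaving_uneven(target_number, preset_number,
                           ratio_threshold=0.5, count_threshold=None):
    """Return True when the sampled packets indicate uneven interleaving.

    ratio_threshold stands in for the second preset threshold and
    count_threshold for the third; both defaults are illustrative only.
    """
    if preset_number <= 0:
        raise ValueError("preset_number must be positive")
    # Alternative test: absolute count against the third preset threshold.
    if count_threshold is not None and target_number >= count_threshold:
        return True
    # Ratio test: share of far-apart pairs among the sampled packets.
    return target_number / preset_number >= ratio_threshold
```

A player could call this once on the sampled packets and then choose between batch loading (uneven) and per-packet loading (uniform).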
According to the first aspect, in a possible implementation, the code stream packets further include subtitle data packets. After the multimedia file is parsed to obtain the description information and the plurality of code stream packets, the method may further include: continuously loading a third number of the subtitle data packets into a subtitle decoder, where the first number of the audio data packets, the second number of the video data packets and the third number of the subtitle data packets are loaded alternately; the third number is an integer greater than 1, the third number of subtitle data packets corresponds to a third playback period, and the deviation between the third playback period and the first playback period and the deviation between the third playback period and the second playback period are both less than the preset time difference threshold.
It can be seen that when the code stream packets also include subtitle data packets, loading more than one audio data packet, more than one video data packet and more than one subtitle data packet into the corresponding decoders each time, compared with the prior-art approach of loading one audio data packet, one video data packet and one subtitle data packet into the corresponding decoders each time, reduces the number of jumps needed to load code stream packets. While keeping the audio data packets, video data packets and subtitle data packets synchronized, it increases the speed of reading code stream packets and avoids playback stuttering caused by packet underrun.
According to the first aspect, in a possible implementation, after the multimedia file is parsed to obtain the description information and the plurality of code stream packets, the method further includes: if it is determined according to the description information that the interleaving of the plurality of code stream packets is uniform, loading the audio data packet with the smallest display timestamp into the audio decoder, and loading the video data packet with the smallest display timestamp into the video decoder.
It can be seen that the interleaving of the code stream packets can be judged dynamically, in real time, while the code stream packets are being read and loaded. When the interleaving is uniform, the loading mode can be adjusted dynamically: the not-yet-loaded audio data packet and video data packet with the smallest display timestamps are loaded, in order of increasing display timestamp, for playback in the player.
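For the uniform case, loading in order of increasing display timestamp can be sketched as a merge of the two per-stream packet lists. This is one possible reading of the paragraph above, not the application's implementation; the function name and the `(kind, pts_ms)` tuple layout are hypothetical, and both input lists are assumed to be already sorted by display timestamp.

```python
import heapq


def load_order_uniform(audio, video):
    """Merge two PTS-sorted packet lists into one smallest-PTS-first order.

    Each packet is a (kind, pts_ms) tuple; index 1 is the display timestamp.
    """
    return list(heapq.merge(audio, video, key=lambda packet: packet[1]))
```

Each step of the merged sequence hands the not-yet-loaded packet with the smallest display timestamp to the decoder that matches its kind.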
According to the first aspect, in a possible implementation, the multimedia file includes description information, and continuously loading the first number of audio data packets into the audio decoder and continuously loading the second number of video data packets into the video decoder includes: if the information of the first code stream packet stored in the description information is information of an audio data packet, continuously loading the first number of audio data packets into the audio decoder, and then continuously loading the second number of video data packets into the video decoder.
It can be seen that the way the data packets are loaded can be determined from the description information of the multimedia file: if the first code stream packet is an audio data packet, the audio data packets can be loaded into the audio decoder first. Determining the loading order of the audio and video code stream packets from the description information is more reliable.
According to the first aspect, in a possible implementation, the multimedia file includes description information, and continuously loading the first number of audio data packets into the audio decoder and continuously loading the second number of video data packets into the video decoder includes: if the information of the first code stream packet stored in the description information is information of a video data packet, continuously loading the second number of video data packets into the video decoder, and then continuously loading the first number of audio data packets into the audio decoder.
It can be seen that the way the data packets are loaded can be determined from the description information of the multimedia file: if the first code stream packet is a video data packet, the video data packets can be loaded into the video decoder first. Determining the loading order of the audio and video code stream packets from the description information is more reliable.
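The two ordering rules above reduce to a single check on the first entry of the description information. A minimal sketch, assuming each entry is a tuple whose first field is the packet kind; `batch_order` is a hypothetical name.

```python
def batch_order(index_entries):
    """Pick which stream's batch is loaded first, from the first index entry."""
    if not index_entries:
        raise ValueError("description information lists no code stream packets")
    first_kind = index_entries[0][0]
    if first_kind == "audio":
        return ("audio", "video")
    if first_kind == "video":
        return ("video", "audio")
    raise ValueError(f"unexpected first packet kind: {first_kind!r}")
```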
A second aspect of the embodiments of the present application provides an apparatus for playing a multimedia file. The apparatus may include:
a first loading unit, configured to load a multimedia file into a memory of a player;
a parsing unit, configured to parse the multimedia file to obtain a plurality of code stream packets, where the code stream packets include audio data packets and video data packets; and
a second loading unit, configured to continuously load a first number of the audio data packets into an audio decoder and continuously load a second number of the video data packets into a video decoder, where the first number is an integer greater than 1 and the second number is an integer greater than 1.
According to the second aspect, in a possible implementation, the first number of video data packets corresponds to a first playback period, the second number of audio data packets corresponds to a second playback period, and the deviation between the first playback period and the second playback period is less than a preset time difference threshold.
According to the second aspect, in a possible implementation, the multimedia file may include description information, and the apparatus further includes a determining unit, configured to determine, according to the description information, whether the interleaving of the plurality of code stream packets is uniform.
According to the second aspect, in a possible implementation, the determining unit is specifically configured to: count, according to the description information, the amount of stored data between the audio data packets and the video data packets that have a corresponding display timestamp relationship among the plurality of code stream packets; count a target number, that is, the number of cases in which the amount of stored data is greater than or equal to a target distance threshold; and determine, according to the target number, whether the interleaving of the plurality of code stream packets is uniform.
According to the second aspect, in a possible implementation, the target distance threshold is the larger of a first distance and a preset distance, where the first distance is determined according to at least one of the width, height, storage ratio or compression ratio of the multimedia file carried in the description information; the multimedia file includes video frames, and the width and height of the multimedia file correspond to the width and height of the video frames.
According to the second aspect, in a possible implementation, the determining unit is specifically configured to: determine, according to the description information, a display timestamp and a storage location of each code stream packet in a preset number of code stream packets, where the preset number of code stream packets are the first N code stream packets of the plurality of code stream packets and N is a positive integer; determine, in order of increasing display timestamp, the respective storage locations of the video data packets and the audio data packets that have a corresponding display timestamp relationship among the preset number of code stream packets; and determine, according to the storage locations of the video data packets and the audio data packets, the amount of stored data between the video data packets and the audio data packets that have a corresponding display timestamp relationship.
According to the second aspect, in a possible implementation, the determining unit is specifically configured to: calculate the ratio of the target number to the preset number; and if the ratio is greater than or equal to a second preset threshold, determine that the interleaving of the plurality of code stream packets is uneven; or, if the target number is greater than or equal to a third preset threshold, determine that the interleaving of the plurality of code stream packets is uneven.
According to the second aspect, in a possible implementation, the code stream packets further include subtitle data packets, and the loading unit is further configured to: continuously load a third number of the subtitle data packets into a subtitle decoder, where the first number of audio data packets, the second number of video data packets and the third number of subtitle data packets are loaded alternately; the third number is an integer greater than 1, the third number of subtitle data packets corresponds to a third playback period, and the deviation between the third playback period and the first playback period and the deviation between the third playback period and the second playback period are both less than the preset time difference threshold.
According to the second aspect, in a possible implementation, the loading unit is further configured to: if it is determined according to the description information that the interleaving of the plurality of code stream packets is uniform, load the audio data packet with the smallest display timestamp into the audio decoder, and load the video data packet with the smallest display timestamp into the video decoder.
For the technical effects of the second aspect or its possible implementations, refer to the description of the technical effects of the first aspect or the corresponding implementations.
A third aspect of the embodiments of the present application provides an electronic device. The electronic device includes at least one processor and a transmission interface, and the at least one processor receives or sends signals through the transmission interface. The at least one processor is configured to call a computer program stored in a memory, so that the electronic device performs the method described in the first aspect or any possible implementation of the first aspect.
A fourth aspect of the embodiments of the present application discloses a computer-readable storage medium. The computer-readable storage medium stores program instructions, and when the program instructions are run on a computer or a processor, the method described in the first aspect or any possible implementation of the first aspect is performed.
A fifth aspect of the embodiments of the present application discloses a computer program product. The computer program product includes program instructions, and when the program instructions are run on a computer or a processor, the method described in the first aspect or any possible implementation of the first aspect is performed.
Brief Description of the Drawings
FIG. 1A is a schematic diagram of a method for playing a uniformly interleaved multimedia file according to an embodiment of this application;
FIG. 1B is a schematic diagram of a method for playing an unevenly interleaved multimedia file according to an embodiment of this application;
FIG. 1C is a schematic diagram of a prior-art method for playing an unevenly interleaved multimedia file;
FIG. 2A is a schematic diagram of a playback environment for a multimedia file according to an embodiment of this application;
FIG. 2B is a schematic diagram of another playback environment for a multimedia file according to an embodiment of this application;
FIG. 3A is a schematic structural diagram of an electronic device according to an embodiment of this application;
FIG. 3B is a block diagram of the software structure of an electronic device according to an embodiment of this application;
FIG. 4 is a schematic diagram of the interaction between software modules of an electronic device according to an embodiment of this application;
FIG. 5 shows a method for playing a multimedia file according to an embodiment of this application;
FIG. 6 shows another method for playing a multimedia file according to an embodiment of this application;
FIG. 7 is a schematic flowchart of a method for determining whether a multimedia file is uniformly interleaved according to an embodiment of this application;
FIG. 8A is a schematic flowchart of a playback method that loads code stream packets in batches according to an embodiment of this application;
FIG. 8B is a schematic diagram of a playback method that loads code stream packets in batches according to an embodiment of this application;
FIG. 9 shows another method for playing a multimedia file according to an embodiment of this application;
FIG. 10 is a schematic structural diagram of an apparatus for playing a multimedia file according to an embodiment of this application.
Detailed Description
The embodiments of this application are described below with reference to the accompanying drawings. Note that in this application, words such as "exemplary" or "for example" are used to indicate an example, illustration, or explanation. Any embodiment or design described as "exemplary" or "for example" in this application should not be construed as preferred or advantageous over other embodiments or designs. Such words are intended to present the relevant concepts in a concrete manner.
To aid understanding, the related technologies and terms involved in this application are first introduced briefly.
1. Multimedia container
A multimedia container stores compressed audio data packets, video data packets, and/or subtitle data packets. The container format is also called the encapsulation format. Common encapsulation formats include one or more of the following: MPEG-4 Part 14 (MP4), Audio Video Interleaved (AVI), the MPEG-2 transport stream format (MPEG2-TS, TS), and so on. Different container formats store audio, video, and/or subtitle data packets in different ways and are used in different fields. For example, TS is a stream-based encapsulation commonly used in broadcast television and streaming media protocols, while MP4 is a frame-based encapsulation commonly used for local video and network video.
2. Description information
Description information may also be called an information block. It describes the code stream packets contained in the multimedia file (for example, multiple video data packets, multiple audio data packets, and/or multiple subtitle data packets) and may include one or more of the following: the multimedia file identifier; the playback duration of the multimedia file; the width, height, frame rate, bit rate, and resolution of the video; and the sampling rate and number of channels of the audio. In addition, the description information includes a storage information table that records, for each video data packet, audio data packet, and/or subtitle data packet, its storage location (position, POS), its length, and its display timestamp (presentation time stamp, PTS).
Note that the description information is usually located at the head or the tail of the multimedia file. The file identifier may be "Chinese subtitles", "Chinese audio", "English audio", and so on.
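The storage information table described above can be pictured as one record per packet. The following is a minimal sketch only: the class name `PacketEntry`, its field names, and the sample values are hypothetical and are not taken from this application or from any container specification.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PacketEntry:
    """One row of a hypothetical storage information table."""
    kind: str    # "audio", "video", or "subtitle"
    pos: int     # storage location (byte offset) of the packet in the file
    length: int  # packet length in bytes
    pts: int     # display timestamp (presentation time stamp)


def in_storage_order(table: List[PacketEntry]) -> List[PacketEntry]:
    """Rows ordered by where the packets sit in the file."""
    return sorted(table, key=lambda e: e.pos)


def in_play_order(table: List[PacketEntry], kind: str) -> List[PacketEntry]:
    """Rows of one stream ordered by display timestamp."""
    return sorted((e for e in table if e.kind == kind), key=lambda e: e.pts)


# Illustrative values only.
table = [
    PacketEntry("video", pos=300, length=200, pts=1),
    PacketEntry("audio", pos=100, length=100, pts=1),
    PacketEntry("audio", pos=200, length=100, pts=2),
]
```

Keeping both the position and the display timestamp in each row is what lets a player compare storage order with playback order without reading the packet payloads.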
3. Code stream (data rate)
The code stream, or data rate, is the amount of data per unit time after the video data has been encoded and compressed. Generally, at the same resolution, the larger the data rate of the video data, the lower the compression ratio and the higher the picture quality.
4. Resolution
A video consists of consecutive images. Each image is called a frame, and each image consists of pixels. The number of pixels in an image is its resolution. For example, a 1920×1080 image is made up of 1920 pixels horizontally by 1080 pixels vertically. The resolution of a video is therefore the resolution of each of its frames.
5. Frame rate
A frame is a single still picture, and consecutive frames form moving pictures, as in a film. The frame rate is the number of frames transmitted in one second, usually expressed in frames per second (FPS). Each frame is a still image; displaying frames in rapid succession creates the illusion of motion and reproduces the state of the object at the time. A higher frame rate yields smoother and more lifelike motion: the more frames per second, the smoother the displayed action.
6. Bit rate
Bit rate is the number of bits transmitted per second. The unit is bps (bits per second); the higher the bit rate, the more data is transmitted.
Bit rate expresses how many bits are needed per second to represent the encoded (compressed) audio or video data, where a bit is the smallest unit in binary and is either 0 or 1. Put simply, the higher the bit rate, the better the audio or video quality but the larger the encoded file; the lower the bit rate, the reverse.
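The relationship between bit rate and encoded file size can be illustrated with simple arithmetic. The sketch below assumes a constant bit rate and ignores container overhead; the function name and the sample figures are hypothetical.

```python
def encoded_size_bytes(bit_rate_bps: int, duration_s: float) -> float:
    """Approximate stream size: bits per second times seconds, divided by 8 bits per byte."""
    return bit_rate_bps * duration_s / 8


# A hypothetical 60-second clip encoded at 4 Mbps and at 8 Mbps.
size_low = encoded_size_bytes(4_000_000, 60)
size_high = encoded_size_bytes(8_000_000, 60)
```

Doubling the bit rate doubles the amount of data for the same duration, which is the trade-off between quality and file size described above.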
7. Sampling rate
Sampling rate (also called sampling speed or sampling frequency) defines the number of times per second that the audio data is sampled, and is expressed in hertz (Hz).
Sampling rate is the sampling frequency used when converting an analog signal into a digital signal, that is, how many points are sampled per unit time. Each sample point is in turn represented by a certain number of bits.
8. Number of channels
The number of channels is the number of sound channels. The sound is what the speaker plays after the audio data is decoded; channels are commonly either mono or stereo.
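Sampling rate, bits per sample point, and number of channels together determine the data rate of uncompressed audio. The following sketch shows that arithmetic; the function name and the sample values (44,100 Hz, 16 bits, 2 channels) are illustrative assumptions and do not come from this application.

```python
def pcm_bit_rate_bps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Uncompressed audio data rate: samples per second x bits per sample x channels."""
    return sample_rate_hz * bits_per_sample * channels


# Hypothetical stereo audio sampled at 44,100 Hz with 16 bits per sample point.
stereo_rate = pcm_bit_rate_bps(44_100, 16, 2)
mono_rate = pcm_bit_rate_bps(44_100, 16, 1)
```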
9. Decapsulation
Decapsulation splits a multimedia file according to its encapsulation format, separating out the audio data packets, video data packets, and/or subtitle data packets in the file. Decapsulation also yields the parameters of the multimedia file, such as encoding format, file size, playback duration, resolution, audio sampling rate, and number of channels.
To facilitate understanding of the embodiments of this application, the technical problem that this application specifically aims to solve is first analyzed and presented below. Refer to FIG. 1A, which is a schematic diagram of a method for playing a uniformly interleaved multimedia file according to an embodiment of this application. As shown in FIG. 1A, the multimedia file 100A includes description information and a plurality of code stream packets. The code stream packets include audio data packets and video data packets, and the description information carries, for each code stream packet, attribute information (which indicates whether the packet is an audio data packet or a video data packet), a display timestamp, a storage location, a size, and other descriptive information. The audio data packets and video data packets of the multimedia file 100A are stored uniformly interleaved, adjacent to one another in display timestamp order. For example, the audio data packet with display timestamp "1" is stored adjacent to the video data packet with display timestamp "1", and the video data packet with display timestamp "1" is stored adjacent to the audio data packet with display timestamp "2". Note that a display timestamp of "1" may indicate that the corresponding code stream packet is presented in the "first position" after decoding, and a display timestamp of "2" may indicate that the corresponding packet is presented in the "second position", that is, after the packet with display timestamp "1". Alternatively, a display timestamp of "1" may indicate that the corresponding packet is presented at a "preset time point" after decoding; the preset time point can be determined according to actual needs and is not limited in the embodiments of this application. The electronic device loads the multimedia file 100A into the memory of the player 200, parses it to obtain the description information and the code stream packets, and then, according to the per-packet descriptions contained in the description information, loads the audio data packets into the audio decoder and the video data packets into the video decoder in order. The decoded audio data packets and video data packets are then synchronized and sent to the speaker and the display respectively, achieving playback in which the sound and the lip movements are synchronized.
For example, as shown in FIG. 1A, starting from the audio data packet with display timestamp "1", the electronic device first loads, in order, the audio data packet with display timestamp "1", the video data packet with display timestamp "1", the audio data packet with display timestamp "2", and the video data packet with display timestamp "2" into the memory of the player 200. Next, the electronic device loads the audio data packet with display timestamp "1" into the audio decoder and the video data packet with display timestamp "1" into the video decoder, and then, in order, loads the audio data packet with display timestamp "2" into the audio decoder and the video data packet with display timestamp "2" into the video decoder.
Note that every code stream packet has a display timestamp, which indicates when the packet is played after being decoded: packets with smaller display timestamps are played first, packets with larger display timestamps are played later, and packets whose display timestamps correspond to each other must be played at the same time after decoding. For example, the audio data packet with display timestamp "1" and the video data packet with display timestamp "1" are packets with corresponding display timestamps. Therefore, when loading code stream packets, the player 200 can load them into the decoders in increasing order of display timestamp for decoding and playback.
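Loading packets into the decoders in increasing display timestamp order can be sketched as follows. This is an illustration under simplifying assumptions: the decoders are modeled as plain lists, and the names `Packet` and `dispatch_in_pts_order` are hypothetical.

```python
from collections import namedtuple

# A code stream packet reduced to its kind ("audio" or "video") and display timestamp.
Packet = namedtuple("Packet", ["kind", "pts"])


def dispatch_in_pts_order(packets):
    """Return (audio_queue, video_queue): the display timestamps in the order
    in which packets would be handed to each decoder."""
    queues = {"audio": [], "video": []}
    # sorted() is stable, so packets with equal timestamps keep their storage order.
    for packet in sorted(packets, key=lambda p: p.pts):
        queues[packet.kind].append(packet.pts)
    return queues["audio"], queues["video"]


# Storage order of a uniformly interleaved file like the one in FIG. 1A.
file_100a = [Packet("audio", 1), Packet("video", 1),
             Packet("audio", 2), Packet("video", 2)]
audio_queue, video_queue = dispatch_in_pts_order(file_100a)
```

For a uniformly interleaved file the storage order already matches the timestamp order, so the sort leaves the sequence unchanged and each decoder receives its packets as they are read.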
Refer to FIG. 1B, which is a schematic diagram of a method for playing an unevenly interleaved multimedia file according to an embodiment of this application. As shown in FIG. 1B, the audio data packets and video data packets of the multimedia file 100B are not stored uniformly in display timestamp order. To keep the sound (decoded audio data packets) consistent with the lip movements (decoded video data packets), the player 200 has to jump repeatedly within the multimedia file 100B to load the corresponding audio data packets and video data packets. The decoded audio data packets and video data packets are then synchronized and sent to the speaker and the display respectively, achieving playback in which the sound and the lip movements are synchronized.
For example, as shown in FIG. 1B, starting from the audio data packet with display timestamp "1", the electronic device first loads, in storage order, a first code stream block (including the description information of the multimedia file and the audio data packets with display timestamps "1", "2", "3", "4", and "5") into the memory of the player 200. Following the order of the display timestamps, the electronic device first loads the audio data packet with display timestamp "1" into the audio decoder. Because audio and video data packets must be played in sync, it next needs to load the video data packet with display timestamp "1" into the video decoder. However, the electronic device finds that the first code stream block loaded into the memory of the player 200 does not contain the video data packet with display timestamp "1", so it must jump to the storage location of that packet in the multimedia file 100B and load a second code stream block (including one or more code stream packets) into memory. The player 200 therefore loads, in order starting from the video data packet with display timestamp "1", the second code stream block (including the video data packet with display timestamp "1", the audio data packets with display timestamps "10", "11", and "12", and the video data packet with display timestamp "2" of the multimedia file 100B) into memory. The audio data packets with display timestamps "2" to "5" that were previously loaded into memory are deleted from memory by the player 200 because they were not used. The electronic device then loads the video data packet with display timestamp "1" into the video decoder of the player 200. Next in order, the electronic device needs to load the audio data packet with display timestamp "2" into the audio decoder of the player 200, but the second code stream block in memory does not contain it. The electronic device must therefore jump to the storage location of the audio data packet with display timestamp "2" in the multimedia file 100B and load a third code stream block (including one or more code stream packets) into memory. The audio data packets with display timestamps "10" to "12" and the video data packet with display timestamp "2" that were previously loaded into memory are likewise deleted by the player 200 because they were not used.
As can be seen, when the audio data packets and video data packets of the multimedia file 100B are unevenly interleaved, the player must jump repeatedly to load the corresponding code stream packets into the decoders, which degrades loading performance. In a network scenario, when the unevenly interleaved multimedia file 100B is played online, the effect of the repeated jumps may be more severe and can cause problems such as playback stalls triggered by data underrun.
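The cost of uneven interleaving can be made concrete with a small simulation that counts jumps. This is a simplified model, not the method of this application: it assumes the player holds one block of `block_size` consecutive packets in memory, reads the next block sequentially at no jump cost, and jumps whenever the packet it needs next lies anywhere else. The layouts and the function name are hypothetical, loosely modeled on the packet order described for FIG. 1B and FIG. 1C.

```python
def count_jumps(layout, block_size):
    """Count non-sequential jumps needed to feed packets in display timestamp order.

    layout: list of (kind, pts) tuples in storage order.
    """
    position = {packet: i for i, packet in enumerate(layout)}
    # Playback order: by timestamp, audio before video at equal timestamps.
    play_order = sorted(layout, key=lambda p: (p[1], p[0]))
    window_start = 0
    jumps = 0
    for packet in play_order:
        i = position[packet]
        if window_start <= i < window_start + block_size:
            continue  # already in memory
        if i != window_start + block_size:
            jumps += 1  # not the next packet on disk, so the player must jump
        window_start = i
    return jumps


uniform = [(kind, t) for t in range(1, 7) for kind in ("audio", "video")]
uneven = ([("audio", t) for t in range(1, 10)] + [("video", 1)]
          + [("audio", t) for t in range(10, 13)] + [("video", 2)]
          + [("audio", t) for t in range(13, 16)]
          + [("video", t) for t in range(3, 7)])

uniform_jumps = count_jumps(uniform, block_size=5)
uneven_jumps = count_jumps(uneven, block_size=5)
```

In this model the uniformly interleaved layout is read strictly front to back, while the uneven layout forces a jump for almost every packet until the video stream is exhausted.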
Refer to FIG. 1C, which is a schematic diagram of a prior-art method for playing an unevenly interleaved multimedia file. First, the player 200 determines a first file start position of the audio data packets of the target multimedia file and a second file start position of the video data packets. When the amount of data that can be stored between the first file start position and the second file start position is greater than or equal to a preset threshold, the player 200 reads the audio data packets and the video data packets separately over dual input/output (I/O) channels, sorts the packets read according to their display timestamps or decoding timestamps (note that for a multimedia file played online, the decoding timestamp equals the display timestamp), and plays the target multimedia file based on the sorted audio and video data packets.
For example, as shown in FIG. 1C, the amount of storable data is determined from the storage location of the audio data packet with display timestamp "1" and the storage location of the video data packet with display timestamp "1". When this amount exceeds the threshold, the code stream packets of the multimedia file 100C are considered unevenly interleaved. Through the first I/O channel, the player 200 loads, in order starting from the audio data packet with display timestamp "1", a first code stream block (including the audio data packets with display timestamps "1" to "9", the video data packet with display timestamp "1", and the audio data packet with display timestamp "10"). It then deletes the video data packet with display timestamp "1", keeps only the audio data packets in the first I/O channel, and sorts the retained audio data packets by display timestamp. Through the second I/O channel, the player 200 loads, in order starting from the video data packet with display timestamp "1", a second code stream block (including the video data packet with display timestamp "1", the audio data packets with display timestamps "10" to "12", the video data packet with display timestamp "2", the audio data packets with display timestamps "13" to "15", and the video data packets with display timestamps "3" to "6"). It then deletes the audio data packets with display timestamps "10" to "12" and "13" to "15", keeps only the video data packets in the second I/O channel, and sorts the retained video data packets by display timestamp.
This approach has the following problems:
1. Judging the interleaving of the multimedia file's code stream packets from the storage distance (the amount of storable data) between the first audio data packet and the first video data packet is not very accurate.
2. Compared with loading a multimedia file over a single I/O channel, loading it over dual I/O channels may double the bandwidth usage and reduce download efficiency. For example, when playing a multimedia file at 4K or 8K resolution, underrun may occur because of network speed limits.
3. Loading a multimedia file over dual I/O channels requires two threads that coordinate their download positions with each other, which may increase the player's memory consumption and waste system resources.
4. Sorting the audio data packets and video data packets by display timestamp takes time and reduces playback efficiency.
Note that the multimedia file 100A, the multimedia file 100B, and the multimedia file 100C mentioned above may be files downloaded from a network or files loaded locally.
To solve the foregoing technical problem, refer first to FIG. 2A, which is a schematic diagram of a playback environment for a multimedia file according to an embodiment of this application. As shown in FIG. 2A, the playback environment 001 may include a first electronic device 200A and a server 201. A communication connection may be established between the first electronic device 200A and the server 201 for information transmission. The communication between the first electronic device 200A and the server 201 may be based on any wired and/or wireless network, including but not limited to the Internet, a wide area network, a metropolitan area network, a virtual private network, and a wireless communication network.
Application software for playing multimedia files, such as a player provided by a service provider, is installed on the first electronic device 200A. The first electronic device 200A may include, but is not limited to, terminal devices such as a smartphone, a desktop computer, a tablet computer, a notebook computer, a digital assistant, and a smart wearable device.
The server 201 may be a standalone server, a distributed server, or a server cluster composed of multiple servers. The server 201 may store the multimedia files to be played; further, the server 201 may be the back-end server to which the multimedia files belong. For example, if the first electronic device 200A plays a multimedia file provided by service provider A, the server 201 may be the back-end server of service provider A.
When a user wants to play a multimedia file on the first electronic device 200A, the first electronic device 200A sends a request to the server 201, and the server 201 sends the multimedia file to the first electronic device 200A; that is, the first electronic device 200A downloads the multimedia file from the server 201. After receiving the multimedia file, the first electronic device 200A loads it into the memory of the player and parses it to obtain a plurality of code stream packets, where the code stream packets include audio data packets and video data packets. Next, the first electronic device 200A may consecutively load a first number of audio data packets into the audio decoder and consecutively load a second number of video data packets into the video decoder. The first number is an integer greater than 1, and the second number is an integer greater than 1. The first number of audio data packets corresponds to a first playback time period, the second number of video data packets corresponds to a second playback time period, and the deviation between the first playback time period and the second playback time period is less than a preset time difference threshold. That is, after being decoded, the first number of audio data packets and the second number of video data packets can be played in sync for a period of time.
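The batch loading described above can be sketched as follows. The passage does not say how the first number and the second number are chosen, so this sketch assumes one possible rule: take every packet of each stream whose display timestamp falls in a common time window, then check that the two covered periods deviate by less than the threshold. The function names, the way the deviation is measured, and all values are hypothetical.

```python
def take_batch(pts_list, start, window):
    """Display timestamps that fall in [start, start + window), in ascending order."""
    return sorted(t for t in pts_list if start <= t < start + window)


def covered_period(batch):
    """Playback time period covered by a batch, as (first PTS, last PTS)."""
    return batch[0], batch[-1]


def period_deviation(period_a, period_b):
    """Assumed measure: the larger of the start-point and end-point differences."""
    return max(abs(period_a[0] - period_b[0]), abs(period_a[1] - period_b[1]))


# Hypothetical display timestamps in milliseconds.
audio_pts = [0, 20, 40, 60, 80, 100, 120]
video_pts = [0, 40, 80, 120]
THRESHOLD_MS = 50

audio_batch = take_batch(audio_pts, start=0, window=100)  # loaded consecutively into the audio decoder
video_batch = take_batch(video_pts, start=0, window=100)  # loaded consecutively into the video decoder
deviation = period_deviation(covered_period(audio_batch), covered_period(video_batch))
```

Here the two batches contain different numbers of packets but cover the same stretch of playback time, so the decoded audio and video can be presented together for that stretch.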
Refer to FIG. 2B, which is a schematic diagram of another playback environment for a multimedia file according to an embodiment of this application. As shown in FIG. 2B, the playback environment 002 may include a first electronic device 200A, a second electronic device 200B, and a router 202. Both the first electronic device 200A and the second electronic device 200B can connect to the router 202, which ensures that the first electronic device 200A and the second electronic device 200B are in the same local area network.
Application software for recording and/or playing multimedia files, such as short-video shooting software provided by a service provider, is installed on the first electronic device 200A. The first electronic device may be a terminal device that supports the Digital Living Network Alliance (DLNA) standard, such as a smartphone, a desktop computer, a tablet computer, a notebook computer, a digital assistant, or a smart wearable device.
The second electronic device 200B may be a device that has a playback function and also supports DLNA, such as a smart TV, a desktop computer, or a notebook computer.
After a user records or processes a multimedia file (which may be a short video) on the first electronic device 200A, the storage locations of the audio data packets and video data packets in the recorded or processed file may change because of differences between devices or software. When a multimedia file recorded or shot on the first electronic device 200A is played on the second electronic device 200B through DLNA, the second electronic device 200B loads the multimedia file into the memory of the player and can then parse it to obtain a plurality of code stream packets, where the code stream packets include audio data packets and video data packets. Next, the second electronic device 200B may consecutively load a first number of the audio data packets into the audio decoder and consecutively load a second number of the video data packets into the video decoder, where the first number is an integer greater than 1 and the second number is an integer greater than 1. The first number of audio data packets corresponds to a first playback time period, the second number of video data packets corresponds to a second playback time period, and the deviation between the first playback time period and the second playback time period is less than a preset time difference threshold. That is, after being decoded, the first number of audio data packets and the second number of video data packets can be played in sync for a period of time.
Please refer to FIG. 3A, which is a schematic structural diagram of an electronic device provided by an embodiment of the present application. The electronic device 300 shown in FIG. 3A may specifically be the first electronic device 200A or the second electronic device 200B in FIG. 2A. As shown in FIG. 3A, the electronic device 300 may include a processor 110, a memory 120, a sensor module 130, a display device 140, a mobile communication module 150, a wireless communication module 160, an audio module 170, a camera 180, an input device 190, and the like.
It can be understood that the structure illustrated in the embodiments of the present application does not constitute a specific limitation on the electronic device 300. In other embodiments of the present application, the electronic device 300 may include more or fewer components than shown, combine some components, split some components, or arrange the components differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc. The different processing units may be independent devices or may be integrated in one or more processors.
The memory 120 stores computer programs, including operating system programs and application programs, where the application programs include a browser program. The processor 110 is configured to read a computer program from the memory 120 and then execute the method defined by that program. For example, the processor 110 reads the operating system program to run the operating system on the electronic device 300 and implement its various functions, or reads one or more application programs to run applications on the electronic device 300, such as reading the browser program to run a browser.
In addition, the memory 120 stores data other than computer programs. This data may include data generated when the operating system or an application program runs, and comprises system data (such as configuration parameters of the operating system) and user data. For example, payment information for business products can be regarded as user data.
The memory 120 generally includes an internal memory and an external memory. The internal memory can store computer-executable program code and may be a random access memory (RAM), a read-only memory (ROM), a cache, or the like. The processor 110 executes the various functional applications and data processing of the electronic device 300 by running the instructions stored in the internal memory. The internal memory may include a program storage area and a data storage area. The program storage area can store the operating system and the application programs required for at least one function (such as a sound playback function or an image playback function). The data storage area can store data created during use of the electronic device (such as audio data packets, video data packets, and subtitle data packets). In addition, the internal memory may include a high-speed random access memory and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or universal flash storage (UFS).
The external memory can be used to connect external storage, such as a hard disk, an optical disc, a USB drive, a floppy disk, or a tape drive, to expand the storage capacity of the electronic device 300. The external storage communicates with the processor 110 through an external memory interface to implement the data storage function, for example, saving files such as music and videos on the external storage.
The sensor module 130 includes a pressure sensor, a fingerprint sensor, a touch sensor, and the like. The pressure sensor senses a pressure signal and can convert it into an electrical signal. In some embodiments, the pressure sensor may be provided on the display screen. The fingerprint sensor collects fingerprints, and the electronic device can use the collected fingerprint features to implement fingerprint unlocking, fingerprint access to an application lock, fingerprint-triggered photographing, fingerprint call answering, fingerprint payment, and the like. The touch sensor detects touch operations on or near it.
The display device 140 is used to display images, videos, and the like. It includes a display screen for displaying information input by the user or information provided to the user, as well as the various menu interfaces of the electronic device 300. In the embodiments of the present application, the electronic device displays the video data packets of the multimedia file through the display screen. The display screen of the display device 140 may be a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, an active-matrix organic light-emitting diode (AMOLED) display, a flexible light-emitting diode (FLED) display, a Mini-LED, Micro-LED, or Micro-OLED display, a quantum dot light-emitting diode (QLED) display, or the like. In some embodiments, the electronic device 300 may include 1 or N display screens, where N is a positive integer greater than 1.
The wireless communication function of the electronic device 300 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, the baseband processor, and the like. The antenna 1 and the antenna 2 are used to transmit and receive electromagnetic wave signals. Each antenna in the electronic device 300 may be used to cover one or more communication frequency bands. Different antennas can also be multiplexed to improve antenna utilization. For example, the antenna 1 can be multiplexed as a diversity antenna of the wireless local area network. In other embodiments, an antenna may be used in conjunction with a tuning switch.
The mobile communication module 150 may provide wireless communication solutions, including 2G/3G/4G/5G, applied on the electronic device 300. The mobile communication module 150 may include at least one filter, a switch, a power amplifier, a low noise amplifier (LNA), and the like. The mobile communication module 150 can receive electromagnetic waves through the antenna 1, filter and amplify the received electromagnetic waves, and pass them to the modem processor for demodulation. The mobile communication module 150 can also amplify the signal modulated by the modem processor and radiate it as electromagnetic waves through the antenna 1. In some embodiments, at least some functional modules of the mobile communication module 150 may be provided in the processor 110. In some embodiments, at least some functional modules of the mobile communication module 150 may be provided in the same device as at least some modules of the processor 110. Through the mobile communication module 150, the electronic device 300 can establish a direct communication connection with the server 201, receive instructions and data (such as multimedia files) transmitted by the server 201, and also transmit instructions and data to a cloud server.
The wireless communication module 160 can provide wireless communication solutions applied on the electronic device 300, including wireless local area network (WLAN) (such as a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR) technology. The wireless communication module 160 may be one or more devices integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, frequency-modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110. The wireless communication module 160 can also receive a signal to be sent from the processor 110, frequency-modulate and amplify it, and radiate it as electromagnetic waves through the antenna 2.
In some embodiments, the antenna 1 of the electronic device 300 is coupled with the mobile communication module 150, and the antenna 2 is coupled with the wireless communication module 160, so that the electronic device 300 can communicate with networks and other devices through wireless communication technologies. The wireless communication technologies may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology. The GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the BeiDou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or satellite based augmentation systems (SBAS).
The electronic device may implement audio functions, such as audio playback and recording, through the audio module 170, the processor 110, and the like. The audio module 170 converts digital audio information into an analog audio signal for output and converts analog audio input into a digital audio signal. The audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be provided in the processor 110, or some functional modules of the audio module 170 may be provided in the processor 110.
In some embodiments, the electronic device may also implement a shooting function through the ISP, the camera, the video codec, the GPU, the display device 140, the processor 110, and the like.
The ISP processes the data fed back by the camera 180. For example, when a photo is taken, the shutter opens, light passes through the lens to the photosensitive element of the camera, and the optical signal is converted into an electrical signal. The photosensitive element passes the electrical signal to the ISP, which converts it into an image visible to the naked eye. The ISP can also algorithmically optimize the noise, brightness, and skin tone of the image, as well as parameters of the shooting scene such as exposure and color temperature. In some embodiments, the ISP may be provided in the camera 180.
The camera 180 captures still images or video. An object forms an optical image through the lens, and the image is projected onto the photosensitive element. The photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal and passes it to the ISP, which converts it into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard format such as RGB or YUV. In some embodiments, the electronic device 300 may include 1 or N cameras 180, where N is a positive integer greater than 1.
The digital signal processor processes digital signals. In addition to digital image signals, it can process other digital signals. For example, when the electronic device 300 selects a frequency point, the digital signal processor performs a Fourier transform or similar operation on the frequency point energy.
The video codec compresses or decompresses digital video. The electronic device 300 may support one or more video codecs, so that it can play or record video in multiple encoding formats, for example, Moving Picture Experts Group (MPEG) 1, MPEG2, MPEG3, and MPEG4.
The input device 190 receives input numeric information, character information, contact touch operations, or non-contact gestures, and generates signal input related to user settings and function control of the electronic device 300.
Please refer to FIG. 3B, which is a block diagram of the software structure of an electronic device provided by an embodiment of the present application. It can be understood that the software structure shown in FIG. 3B may specifically be that of the first electronic device 200A shown in FIG. 2A or that of the second electronic device 200B shown in FIG. 2B. The software system of the electronic device includes but is not limited to [Figure PCTCN2021081127-appb-000001], Linux, or other operating systems, where [Figure PCTCN2021081127-appb-000002] is Huawei's HarmonyOS (Hongmeng) system. The embodiments of the present application take an Android system with a layered architecture as an example to describe the software structure of the electronic device. As can be seen from FIG. 3B, the layered architecture divides the software into several layers, each with a clear role and division of labor. The layers communicate with each other through software interfaces. In some embodiments, the Android system is divided into four layers, which are, from top to bottom, the application layer, the application framework layer, the system runtime layer, and the kernel layer.
The application layer can include a series of application packages. As shown in FIG. 3B, the application packages may include applications such as camera, gallery, calendar, call, map, navigation, WLAN, Bluetooth, music, and video.
The application framework layer provides an application programming interface (API) and a programming framework for the applications in the application layer. The application framework layer includes some predefined functions. As shown in FIG. 3B, the application framework layer may include a window manager, a content provider, a view system, a telephony manager, a resource manager, a notification manager, and the like.
The window manager manages window programs. It can obtain the size of the display screen, determine whether there is a status bar, lock the screen, capture the screen, and so on.
The content provider stores and retrieves data and makes the data accessible to applications. The data may include video, images, audio, calls made and received, browsing history and bookmarks, the phone book, and the like.
The view system includes visual controls, such as controls for displaying text and controls for displaying pictures. The view system can be used to build applications. A display interface can consist of one or more views. For example, a display interface that includes a short message notification icon may include a view for displaying text and a view for displaying pictures.
The telephony manager provides the communication functions of the electronic device, for example, management of the call state (including connected, hung up, and so on).
The resource manager provides applications with various resources, such as localized strings, icons, pictures, layout files, and video files.
The notification manager enables applications to display notification information in the status bar. It can be used to convey informational messages that disappear automatically after a short stay, without user interaction. For example, the notification manager is used to announce a completed download or to show a message reminder. A notification may also appear in the status bar at the top of the system in the form of a chart or scroll-bar text, such as a notification from an application running in the background, or appear on the screen in the form of a dialog window. Examples include prompting text information in the status bar, playing a prompt sound, vibrating the electronic device, and flashing the indicator light.
The system runtime layer includes the system libraries and the Android runtime. The system libraries support the application framework. The Android runtime is responsible for the scheduling and management of the Android system and is divided into two parts: the core library and the virtual machine.
The core library consists of two parts: the functions that the Java language needs to call, and the core library of Android.
The application layer and the application framework layer run in the virtual machine. The virtual machine executes the Java files of the application layer and the application framework layer as binary files. The virtual machine performs functions such as object lifecycle management, stack management, thread management, security and exception management, and garbage collection.
The system libraries can include multiple functional modules, for example, a surface manager, media libraries, a three-dimensional graphics processing library (for example, OpenGL ES), and a 2D graphics engine (for example, SGL).
The surface manager manages the display subsystem and provides the fusion of 2D and 3D layers for multiple applications.
The media libraries support playback and recording of a variety of commonly used audio and video formats, as well as still image files. The media libraries can support a variety of audio and video encoding formats, for example, Moving Picture Experts Group (MPEG), Audio Video Interleaved (AVI), the transport stream format (MPEG2-TS, TS), MPEG Audio Layer III (MP3), Advanced Audio Coding (AAC), and Portable Network Graphics (PNG).
The three-dimensional graphics processing library implements three-dimensional graphics drawing, image rendering, compositing, layer processing, and the like.
The 2D graphics engine is a drawing engine for 2D drawing.
The kernel layer is the layer between hardware and software. The kernel layer contains at least a display driver, a camera driver, an audio driver, and a sensor driver.
The following describes, by way of example, the workflow of the software and hardware of the electronic device 300 in a scenario in which a multimedia file is played.
When the touch sensor receives a touch operation, a corresponding hardware interrupt is sent to the kernel layer. The kernel layer processes the touch operation into a raw input event (including the touch coordinates, the timestamp of the touch operation, and other information). The raw input event is stored at the kernel layer. The application framework layer obtains the raw input event from the kernel layer and identifies the control corresponding to the input event. Taking as an example the case where the touch operation is a touch coordinate operation and the corresponding control is the icon of the video application, the video application calls an interface of the application framework layer to start the video application, and then starts the audio driver and the display driver by calling the kernel layer, so as to continuously load the first number of audio data packets into the audio decoder and continuously load the second number of video data packets into the video decoder. The video data packets and the audio data packets can then be played in synchronization through the display device 140 and the audio module 170 shown in FIG. 3A. The first number of audio data packets corresponds to a first playback time period, the second number of video data packets corresponds to a second playback time period, and the deviation between the first playback time period and the second playback time period is less than a preset time difference threshold. The first number and the second number are each an integer greater than 1.
Please refer to FIG. 4, which is a schematic diagram of the interaction between software modules of an electronic device provided by an embodiment of the present application. It can be understood that the software module interaction shown in FIG. 4 may specifically be that of the first electronic device 200A shown in FIG. 2A or that of the second electronic device 200B shown in FIG. 2B. The software modules may include a file loading module 401, a parsing module 402, a judgment module 403, a code stream packet loading module 404, a decoding module 405, and a synchronization module 406, where:
The file loading module 401 includes one or more file protocols, such as the File Transfer Protocol (FTP), the Hypertext Transfer Protocol (HTTP), and the Real Time Streaming Protocol (RTSP). The file loading module 401 can load or download a multimedia file into the memory of the player through these protocols.
The parsing module 402 is configured to parse the multimedia file using the file protocol that corresponds to the encapsulation form of the multimedia file, to obtain description information and a plurality of code stream packets, where the code stream packets include audio data packets and video data packets.
The judgment module 403 is configured to use the description information to measure, for the audio data packets and video data packets in the plurality of code stream packets that have corresponding presentation timestamps, the amount of data stored between them; then to count the target number, that is, the number of cases in which this amount of stored data is greater than or equal to a target distance threshold; and finally to determine from the target number whether the interleaving of the code stream packets is uniform. In an optional case, the target distance threshold is the larger of a first distance and a preset distance, where the first distance is determined from at least one of the width, height, storage ratio, or compression ratio of the multimedia file carried in the description information. The multimedia file includes video frames, and the width and height of the multimedia file correspond to the width and height of the video frames. A video frame may be the image displayed on the display after a video data packet in the multimedia file is decoded.
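The check performed by the judgment module can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the application: the `Packet` fields, the formula used for the first distance (one compressed YUV 4:2:0 frame), the PTS matching tolerance, and the rule "uniform when at most `max_far_pairs` pairs are far apart" are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    kind: str     # "audio" or "video"
    pts_ms: int   # presentation timestamp in milliseconds
    offset: int   # byte offset of the packet inside the file


def target_distance_threshold(width, height, compression_ratio, preset_distance):
    # Assumed "first distance": size of one compressed frame,
    # taking 1.5 bytes per pixel (YUV 4:2:0) before compression.
    first_distance = int(width * height * 1.5 / compression_ratio)
    # The target threshold is the larger of the two distances.
    return max(first_distance, preset_distance)


def count_far_pairs(packets, threshold, pts_tolerance_ms=20):
    """Count audio/video pairs with matching timestamps stored far apart."""
    audio = [p for p in packets if p.kind == "audio"]
    video = [p for p in packets if p.kind == "video"]
    if not video:
        return 0
    far = 0
    for a in audio:
        # Treat the video packet with the nearest PTS as the corresponding one.
        v = min(video, key=lambda p: abs(p.pts_ms - a.pts_ms))
        if abs(v.pts_ms - a.pts_ms) > pts_tolerance_ms:
            continue  # no video packet corresponds to this audio packet
        # Approximate the stored data between the two packets by offset gap.
        if abs(v.offset - a.offset) >= threshold:
            far += 1
    return far


def is_interleaving_uniform(packets, threshold, max_far_pairs=0):
    # Assumed decision rule: uniform if few enough pairs are far apart.
    return count_far_pairs(packets, threshold) <= max_far_pairs
```

With a file in which audio and video packets alternate every kilobyte, no pair exceeds the threshold and the interleaving is reported as uniform. If all video packets are stored first and the audio follows much later in the file, every pair is counted and the result is non-uniform.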
The code stream packet loading module 404 continuously loads a first number of audio data packets into the audio decoder and continuously loads a second number of video data packets into the video decoder, where the first number and the second number are each an integer greater than 1. Further, the first number of audio data packets corresponds to a first playback time period, the second number of video data packets corresponds to a second playback time period, and the deviation between the first playback time period and the second playback time period is less than a preset time difference threshold. The first playback time period and the second playback time period may be set according to actual needs. The preset time difference threshold may be a reference value set manually based on experience, or a reference value obtained by training (or learning) on multiple historical values.
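One way to choose the two batch sizes is sketched below. It assumes that each packet has a known playback duration and that the "deviation" between the two playback time periods is the difference between the total durations of the two batches; the function names and the fixed `window_ms` target are hypothetical and are not taken from the application.

```python
def take_batch(durations_ms, window_ms):
    """Return (count, covered_ms) for consecutive packets covering window_ms.

    At least two packets are taken when available, because both batch
    sizes are required to be greater than 1.
    """
    total = 0
    count = 0
    for d in durations_ms:
        total += d
        count += 1
        if total >= window_ms and count > 1:
            break
    return count, total


def plan_batches(audio_durations_ms, video_durations_ms, window_ms, max_skew_ms):
    # First number: audio packets loaded back to back into the audio decoder.
    first_number, audio_span = take_batch(audio_durations_ms, window_ms)
    # Second number: video packets loaded back to back into the video decoder.
    second_number, video_span = take_batch(video_durations_ms, window_ms)
    # The two batches are acceptable if their playback spans nearly match.
    within_threshold = abs(audio_span - video_span) < max_skew_ms
    return first_number, second_number, within_threshold
```

For example, with 20 ms audio packets, 40 ms video packets, and a 200 ms window, the sketch picks 10 audio packets and 5 video packets, which cover the same 200 ms of playback.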
The decoding module 405 includes an audio decoder and a video decoder. The audio decoder decodes audio data packets in compressed, encoded form into uncompressed raw audio data. The video decoder decodes video data packets in compressed, encoded form into uncompressed raw video data.
同步模块405,用于将根据解析得到的描述信息对解码得到的第一数量的音频数据包和第二数量的视频数据包进行同步处理,并将上述第一数量的音频数据包发送至声卡处,将上述第二数量的视频数据包发送至显卡处。The synchronization module 405 is used to perform synchronization processing on the first quantity of audio data packets and the second quantity of video data packets obtained by decoding according to the description information obtained by analysis, and send the above-mentioned first quantity of audio data packets to the sound card. , and send the second number of video data packets to the graphics card.
Referring to FIG. 5, FIG. 5 shows a method for playing a multimedia file according to an embodiment of the present application. The method includes but is not limited to the following steps:
Step S501: Load a multimedia file into the memory of the player.
Specifically, the electronic device may load or download the multimedia file into the memory of the player according to one or more file protocols, such as the FTP protocol, the HTTP protocol, and the RTSP protocol.
Step S502: Parse the multimedia file to obtain multiple code stream packets.
Specifically, the electronic device may parse the multimedia file using the encapsulation protocol that corresponds to the file's encapsulation format, to obtain multiple code stream packets, where the code stream packets include audio data packets and video data packets. An audio data packet may be a data packet of the audio stream in compressed form, and a video data packet may be a data packet of the video stream in compressed, encoded form.
Step S503: Continuously load a first number of audio data packets into the audio decoder, and continuously load a second number of video data packets into the video decoder.
Specifically, the first number of video data packets corresponds to a first playback time period, the second number of audio data packets corresponds to a second playback time period, and the deviation between the first playback time period and the second playback time period is less than a preset time difference threshold. That is, the first number and the second number may or may not be equal, but when the electronic device loads the first number of audio data packets and the second number of video data packets into the player for playback, their playback durations need to remain substantially the same. "Substantially the same" means that, from the user's perspective, the audio data packets and the video data packets are played in sync. The first playback time period and the second playback time period may be determined according to the size of the buffer in the decoder (the first playback time corresponds to the video decoder, and the second playback time corresponds to the audio decoder) and the frequency at which code stream packets are read. It can be understood that reading too many code stream packets at a time may block the buffer, while reading too few at a time may cause frequent jumps and read underruns. A reasonable playback time range therefore needs to be set so that an appropriate number of code stream packets is read each time. Further, the preset time range may be such that the first number of audio data packets and the second number of video data packets play in the player for at least 1 second.
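The relationship between the two packet counts and the playback duration can be illustrated with a minimal sketch. It assumes that every packet in a stream has a fixed, known playback duration (for example, derived from the frame rate or the sampling rate); the function name `batch_count` and the sample durations are hypothetical and do not come from the application.

```python
def batch_count(target_us: int, packet_duration_us: int) -> int:
    """Smallest packet count (at least 2) whose total duration reaches target_us.

    Durations are integer microseconds to avoid floating-point rounding.
    The lower bound of 2 reflects that both counts are integers greater than 1.
    """
    if packet_duration_us <= 0:
        raise ValueError("packet duration must be positive")
    count = -(-target_us // packet_duration_us)  # ceiling division
    return max(count, 2)


# Hypothetical stream parameters: 25 fps video (40 ms per packet) and
# audio packets of about 21.333 ms each.
TARGET_US = 1_000_000          # aim for at least 1 second per batch
VIDEO_PACKET_US = 40_000
AUDIO_PACKET_US = 21_333

audio_count = batch_count(TARGET_US, AUDIO_PACKET_US)
video_count = batch_count(TARGET_US, VIDEO_PACKET_US)

# Deviation between the playback durations of the two batches.
deviation_us = abs(audio_count * AUDIO_PACKET_US - video_count * VIDEO_PACKET_US)
```

With these assumed durations the two counts differ, yet the two batches cover almost the same playback time, which is the property the step relies on.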
The audio decoder is configured to decode the first number of audio data packets in compressed, encoded form into a first number of raw audio data items in uncompressed form; the video decoder is configured to decode the second number of video data packets in compressed, encoded form into a second number of raw video data items in uncompressed form. The audio decoder sends the first number of raw audio data packets in uncompressed form to the sound card, and the video decoder sends the second number of raw video data packets in uncompressed form to the graphics card. The sound card and the display can then play the voice and the lip movements in sync.
In a possible implementation, the multimedia file may include description information. If the information of the first code stream packet stored in the description information is that of an audio data packet, the electronic device may first continuously load the first number of audio data packets into the audio decoder to obtain audio frames, and then continuously load the second number of video data packets into the video decoder to obtain video frames. After the audio frames and the video frames are synchronized, the speaker plays the audio frames and the display shows the video frames.
In a possible implementation, if the information of the first code stream packet stored in the description information is that of a video data packet, the electronic device may first continuously load the second number of video data packets into the video decoder to obtain video frames, and then continuously load the first number of audio data packets into the audio decoder to obtain audio frames. After the video frames and the audio frames are synchronized, the speaker plays the audio frames and the display shows the video frames.
Referring to FIG. 6, FIG. 6 shows another method for playing a multimedia file according to an embodiment of the present application. The method includes but is not limited to the following steps:
Step S601: Load a multimedia file into the memory of the player.
Specifically, for a detailed description, refer to step S501; details are not repeated here.
Step S602: Parse the multimedia file to obtain multiple code stream packets.
Specifically, the multimedia file may include description information, and parsing the multimedia file yields the description information and multiple code stream packets.
Step S603: Determine, according to the description information, whether the multiple code stream packets are evenly interleaved.
Specifically, the description information may be located at the head or the tail of the multimedia file. Much as a book consists of a table of contents and the content itself, the description information describes the file as a whole as well as the attributes, storage position, and size of each code stream packet in the file. Examples include the playback duration, storage ratio, and compression ratio of the multimedia file; the width, height, frame rate, and bit rate of the video in the multimedia file; the sampling rate and number of channels of the audio in the multimedia file; and the storage position (Position, POS), length, and presentation timestamp (Presentation Time Stamp, PTS) of each code stream packet. Whether the multiple code stream packets are evenly interleaved can be determined from the storage position and presentation timestamp of each code stream packet in the description information.
In a possible implementation, the description information stores the presentation timestamp and storage position of each code stream packet. The electronic device may first count, according to the description information, the amount of stored data between audio data packets and video data packets that have corresponding presentation timestamps among the multiple code stream packets. That is, the electronic device may use the presentation timestamp of an audio data packet to find the video data packet whose presentation timestamp corresponds to it, and then use the storage positions to count the amount of stored data between the two. It can be understood that the amount of stored data indicates that other code stream packets are stored between the storage position of the audio data packet and the storage position of the video data packet with the corresponding presentation timestamp. The electronic device then counts the target number of stored data amounts that are greater than or equal to the target distance threshold, and finally determines, according to the target number, whether the multiple code stream packets are evenly interleaved.
In a possible implementation, the target distance threshold is the larger of the first distance and the preset distance threshold, where the first distance is determined by the electronic device according to at least one of the width, height, storage ratio, or compression ratio of the multimedia file carried in the description information. The multimedia file includes video frames, and the width and height of the multimedia file correspond to the width and height of the video frames. For example, the first distance L = width * height * store_rate * compression_rate. The electronic device then determines, according to the target distance threshold, whether the multiple code stream packets are evenly interleaved. It should be noted that the preset distance threshold is a preset parameter for measuring whether data packets are evenly interleaved; it may be an empirically chosen reference value, or a reference value obtained by training (or learning) on multiple historical values. For example, assume the preset distance threshold P = 2 * 1024 * 1024 bytes = 2097152 bytes = 2 MB. When the first distance L is greater than the preset distance threshold P, the target distance threshold D takes the value of the first distance L; when the first distance L is less than the preset distance threshold P, the target distance threshold D takes the value of the preset distance threshold P. It can be understood that when the first distance L equals the preset distance threshold P, the target distance threshold D may take the value of either.
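The selection of the target distance threshold described above can be sketched as follows. This is a minimal illustration, assuming all four fields are available and used in the product form given in the example; the function names and the sample values for `store_rate` and `compression_rate` are hypothetical, since the application does not give concrete values for them.

```python
PRESET_DISTANCE_BYTES = 2 * 1024 * 1024  # preset distance threshold P = 2 MB


def first_distance(width: int, height: int,
                   store_rate: float, compression_rate: float) -> float:
    """First distance L = width * height * store_rate * compression_rate."""
    return width * height * store_rate * compression_rate


def target_distance_threshold(width: int, height: int,
                              store_rate: float, compression_rate: float,
                              preset: int = PRESET_DISTANCE_BYTES) -> float:
    """Target distance threshold D is the larger of L and P."""
    return max(first_distance(width, height, store_rate, compression_rate), preset)


# Hypothetical inputs: a 1920x1080 file with a small first distance falls
# back to the preset threshold, while a 3840x2160 file with a large first
# distance uses L itself.
d_small = target_distance_threshold(1920, 1080, 1.5, 0.1)
d_large = target_distance_threshold(3840, 2160, 1.5, 1.0)
```

Using `max` covers the equal case as well, where either value may be chosen.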
In a possible implementation, the electronic device may determine, according to the description information, the presentation timestamp and storage position of each code stream packet in a preset number of code stream packets, where the preset number of code stream packets are the first N code stream packets among the multiple code stream packets, and N is a positive integer. The electronic device then collects, in order of increasing presentation timestamp, the storage positions of the video data packets and audio data packets that have corresponding presentation timestamps within the preset number of code stream packets, and finally determines, from those storage positions, the amount of stored data between each video data packet and the audio data packet with the corresponding presentation timestamp. It should be noted that the preset number may be an empirically chosen reference value, or a reference value obtained by training (or learning) on multiple historical values. The preset number N may be set to 2000. If it is determined from the description information that the multimedia file contains fewer code stream packets than the preset number, the electronic device may count the amount of stored data between the audio data packets and video data packets that have corresponding presentation timestamps across all of the code stream packets.
For example, if it is determined from the description information that the multimedia file to be played includes 4000 code stream packets, then because 4000 is greater than the preset number N, the electronic device may determine the presentation timestamp and storage position of each of the first N code stream packets. If the information of the first code stream packet stored in the description information is that of an audio data packet, the electronic device may use the presentation timestamp of the first audio data packet to determine the storage position of the video data packet whose presentation timestamp corresponds to it, and count the amount of stored data between the storage position of that audio data packet and the storage position of that video data packet. By analogy, in order of increasing presentation timestamp, the electronic device determines the storage positions of the video data packets that correspond to the second audio data packet, the third audio data packet, ..., and the Nth audio data packet, and finally counts, for each audio data packet, the amount of stored data between its storage position and the storage position of the video data packet with the corresponding presentation timestamp.
Because there are multiple code stream packets, there may also be multiple stored data amounts (one for each audio data packet and the video data packet with the corresponding presentation timestamp). From these stored data amounts, the electronic device may count the target number of stored data amounts that are greater than or equal to the target distance threshold. For example, if the target distance threshold D is 3 MB and the stored data amounts are 10 MB, 2 MB, 3.7 MB, 1.9 MB, 6.9 MB, and 11.6 MB, then the target number of stored data amounts greater than or equal to D is 4. Finally, the electronic device may determine, according to the target number, whether the multiple code stream packets are evenly interleaved. Further, the electronic device may calculate the ratio rate of the target number abnormal_cnt to the preset number N; if rate is greater than or equal to a second preset threshold, the multiple code stream packets are determined to be unevenly interleaved. It can be understood that the second preset threshold may be an empirically chosen reference value, or a reference value obtained by training (or learning) on multiple historical values. The second preset threshold may be 0.2. For example, if rate = abnormal_cnt / N = 0.3, then because 0.3 is greater than the second preset threshold 0.2, the code stream is determined to be unevenly interleaved.
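The counting and ratio test above can be checked with a short sketch that reuses the numbers from the example. The function names are hypothetical; the default of 0.2 is the example value given for the second preset threshold.

```python
def count_abnormal(stored_amounts_mb: list[float], threshold_mb: float) -> int:
    """Number of stored data amounts greater than or equal to the threshold."""
    return sum(1 for amount in stored_amounts_mb if amount >= threshold_mb)


def is_unevenly_interleaved(abnormal_cnt: int, preset_number: int,
                            ratio_threshold: float = 0.2) -> bool:
    """True when abnormal_cnt / preset_number reaches the second preset threshold."""
    return abnormal_cnt / preset_number >= ratio_threshold


# Stored data amounts from the example, with target distance threshold D = 3 MB.
amounts_mb = [10, 2, 3.7, 1.9, 6.9, 11.6]
abnormal_cnt = count_abnormal(amounts_mb, 3)

# A ratio of 0.3 (for instance 600 out of N = 2000) is judged uneven,
# while a ratio of 0.05 (100 out of 2000) is judged even.
uneven = is_unevenly_interleaved(600, 2000)
even = not is_unevenly_interleaved(100, 2000)
```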
Step S604: Continuously load a first number of audio data packets into the audio decoder, and continuously load a second number of video data packets into the video decoder.
Specifically, for a detailed description, refer to step S503; details are not repeated here.
Referring to FIG. 7, FIG. 7 is a schematic flowchart of a method for determining whether a multimedia file is evenly interleaved according to an embodiment of the present application. As can be seen from FIG. 7, after the electronic device loads the multimedia file, parsing the multimedia file yields the description information and/or multiple code stream packets. The code stream packets include audio data packets and video data packets. The description information describes the file as a whole as well as the attributes, storage position, and size of each code stream packet. Examples include the playback duration, storage ratio, and compression ratio of the multimedia file; the width, height, frame rate, and bit rate of the video in the multimedia file; the sampling rate and number of channels of the audio in the multimedia file; and the storage position POS, length, and PTS of each code stream packet.
The electronic device may then determine the target distance threshold according to the description information of the multimedia file. Specifically, the electronic device may determine the first distance according to one or more of the width (width), height (height), storage ratio (store_rate), and compression ratio (compression_rate) of the multimedia file carried in the description information. If the first distance is greater than the preset distance threshold, the target distance threshold is the first distance; if the first distance is less than the preset distance threshold, the target distance threshold is the preset distance threshold; if the first distance equals the preset distance threshold, the target distance threshold may be either. That is, the target distance threshold is the larger of the first distance and the preset distance threshold. It should be noted that the preset distance threshold is a preset parameter for measuring whether data packets are evenly interleaved; it may be an empirically chosen reference value, or a reference value obtained by training (or learning) on multiple historical values.
Next, the electronic device needs to analyze whether the code stream packets are evenly interleaved according to the storage positions and presentation timestamps of a preset number of code stream packets. If the multimedia file contains fewer code stream packets than the preset number, the electronic device analyzes the storage positions and presentation timestamps of all code stream packets in the multimedia file, that is, the preset number becomes the total number of code stream packets. If the number of code stream packets in the multimedia file is greater than or equal to the preset number, the electronic device analyzes the storage positions and presentation timestamps of the first N code stream packets, that is, the preset number of code stream packets are the first N of the multiple code stream packets. Once the number of code stream packets to analyze has been determined, the electronic device analyzes them based on the storage position and presentation timestamp of each code stream packet stored in the description information, processing code stream packets with smaller presentation timestamps before those with larger ones. Specifically, each round of analysis selects, among the unprocessed code stream packets, the first code stream packet, which is the one with the smallest presentation timestamp. It should be noted that if the information of the first code stream packet stored in the description information is that of an audio data packet, the first code stream packet is an audio data packet and the second code stream packet is a video data packet; if the information of the first code stream packet stored in the description information is that of a video data packet, the first code stream packet is a video data packet and the second code stream packet is an audio data packet. The second code stream packet is the code stream packet whose presentation timestamp corresponds to that of the first code stream packet.
The electronic device then needs to determine whether the amount of stored data between the storage position of the first code stream packet and the storage position of the second code stream packet is greater than or equal to the target distance threshold. If it is, the amount of stored data between these two code stream packets is considered abnormal, and a large jump may occur during actual playback. The target number is therefore updated to record how many times the amount of stored data reaches or exceeds the target distance threshold. The interleaving of each code stream packet in the preset number is analyzed according to the method shown in FIG. 7; when the number of analyzed code stream packets is greater than or equal to the preset number, the electronic device has finished the analysis. The electronic device may then calculate the ratio of the target number to the preset number, that is, ratio = target number / preset number. If the ratio is greater than or equal to the second preset threshold, the multiple code stream packets are determined to be unevenly interleaved; if the ratio is less than the second preset threshold, the multiple code stream packets are determined to be evenly interleaved. It can be understood that the second preset threshold may be an empirically chosen reference value, or a reference value obtained by training (or learning) on multiple historical values.
It should be noted that if, during the analysis of the interleaving, the preset number of code stream packets does not contain a first code stream packet and a second code stream packet with corresponding presentation timestamps, the analysis may be skipped for those two code stream packets.
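The flow of FIG. 7 can be approximated with the following sketch. Several points are assumptions rather than details from the application: a "corresponding" presentation timestamp is taken to mean the nearest PTS of the other stream within a hypothetical tolerance `max_pts_gap`, the amount of stored data is approximated by the absolute difference of the two storage positions, and the `Packet` type and all names are illustrative.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    kind: str  # "audio" or "video"
    pts: int   # presentation timestamp, in milliseconds
    pos: int   # storage position (byte offset) in the file


def is_unevenly_interleaved(packets, distance_threshold, preset_count=2000,
                            ratio_threshold=0.2, max_pts_gap=50):
    """Return True when the share of abnormal packet pairs reaches ratio_threshold."""
    window = packets[:preset_count]  # all packets if fewer than preset_count
    if not window:
        return False
    first_kind = window[0].kind      # kind of the first packet in the description
    firsts = sorted((p for p in window if p.kind == first_kind), key=lambda p: p.pts)
    seconds = sorted((p for p in window if p.kind != first_kind), key=lambda p: p.pts)
    second_pts = [p.pts for p in seconds]

    abnormal_cnt = 0
    for first in firsts:             # increasing PTS order
        i = bisect_left(second_pts, first.pts)
        candidates = seconds[max(i - 1, 0): i + 1]
        if not candidates:
            continue                 # no packet of the other kind at all
        match = min(candidates, key=lambda p: abs(p.pts - first.pts))
        if abs(match.pts - first.pts) > max_pts_gap:
            continue                 # no corresponding packet: skip this pair
        if abs(match.pos - first.pos) >= distance_threshold:
            abnormal_cnt += 1
    return abnormal_cnt / len(window) >= ratio_threshold


THRESHOLD = 2 * 1024 * 1024
# Alternating audio/video packets stored next to each other.
even_file = []
for i in range(10):
    even_file.append(Packet("audio", i * 40, 2 * i * 1000))
    even_file.append(Packet("video", i * 40, (2 * i + 1) * 1000))
# All audio packets stored first, with the video packets far away in the file.
uneven_file = ([Packet("audio", i * 40, i * 1000) for i in range(10)]
               + [Packet("video", i * 40, 10_000_000 + i * 100_000) for i in range(10)])
```

In the second layout every audio packet is about 10 MB away from its matching video packet, so the file is classified as unevenly interleaved, whereas the alternating layout is not.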
Referring to FIG. 8A, FIG. 8A is a schematic flowchart of a playback method that loads code stream packets in batches according to an embodiment of the present application. As can be seen from FIG. 8A, the electronic device parses the multimedia file according to the flow shown in FIG. 6 and determines how the code stream packets are interleaved. If the code stream packets are evenly interleaved, the electronic device repeatedly selects the not-yet-loaded code stream packet with the smallest presentation timestamp and loads it into the corresponding decoder, until all code stream packets have been loaded into the decoders for decoding and playback. For a detailed description, refer to FIG. 1A; details are not repeated here.
If the code stream packets are unevenly interleaved, the electronic device continuously loads the first number of audio data packets into the audio decoder and continuously loads the second number of video data packets into the video decoder. Specifically, refer to FIG. 8B, which is provided by an embodiment of the present application. As can be seen from FIG. 8B, if the information of the first code stream packet stored in the description information is that of an audio data packet, the electronic device first sequentially reads and loads the first number of audio data packets into the audio decoder, where the playback duration of the first number of video data packets in the player needs to correspond to the first playback time period. The electronic device then jumps to the position of the video code stream packet that corresponds to the presentation timestamp of the audio data packets, and sequentially reads and loads the second number of video data packets into the video decoder, where the playback duration of the second number of video data packets in the player likewise needs to correspond to the second playback time period, and the deviation between the first playback time period and the second playback time period is less than the preset time difference threshold. This loading process is repeated until all code stream packets have been loaded into the corresponding decoders for decoding and playback. Optionally, if the information of the first code stream packet stored in the description information is that of a video data packet, the electronic device first sequentially reads and loads the second number of video data packets into the video decoder, and then sequentially reads and loads the first number of audio data packets into the audio decoder. It should be noted that "sequentially" here may mean in order of increasing presentation timestamp.
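The alternating batch loading described above can be sketched as a generator that yields one batch at a time. This is only an illustration of the loading order: it assumes each stream is already available as a list in increasing PTS order, it does not model the file seek or the decoders, and the name `batched_load_order` is hypothetical.

```python
from itertools import islice


def batched_load_order(audio, video, first_number, second_number, audio_first=True):
    """Yield (kind, batch) pairs, alternating between the two streams.

    audio and video are sequences of packets in increasing PTS order.
    audio_first mirrors whether the first packet in the description
    information is an audio data packet.
    """
    streams = [("audio", iter(audio), first_number),
               ("video", iter(video), second_number)]
    if not audio_first:
        streams.reverse()
    while True:
        loaded_any = False
        for kind, packets, count in streams:
            batch = list(islice(packets, count))  # up to `count` consecutive packets
            if batch:
                loaded_any = True
                yield kind, batch
        if not loaded_any:
            return  # both streams are exhausted


audio_packets = ["a0", "a1", "a2", "a3", "a4", "a5"]
video_packets = ["v0", "v1", "v2", "v3"]
order = list(batched_load_order(audio_packets, video_packets, 3, 2))
```

Each yielded batch stands for one run of consecutive reads from a single stream, so the number of jumps between the audio and video regions of the file is governed by the batch sizes rather than by the number of packets.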
It can be understood that the first number and the second number may or may not be equal, but when the electronic device loads the first number of audio data packets and the second number of video data packets into the corresponding decoders for decoding and playback, their playback time periods need to remain substantially the same. "Substantially the same" means that, from the user's perspective, the audio data packets and the video data packets are played in sync. The preset time range may be determined according to the size of the buffer in the decoder and the frequency at which code stream packets are read. It can be understood that reading too many code stream packets at a time may block the buffer, while reading too few at a time may cause frequent jumps and read underruns. A reasonable preset time range therefore needs to be set so that an appropriate number of code stream packets is read each time. Further, the preset time range may be such that the first number of audio data packets and the second number of video data packets play in the player for at least 1 second. The electronic device may load audio data packets and video data packets into the player in units of the preset time range (for example, 1 second), aligning them only approximately, which reduces the performance overhead of the frequent jumps caused by aligning individual code stream packets.
In a possible implementation, while reading and loading code stream packets, the electronic device may dynamically determine how the code stream packets are interleaved according to the flow shown in FIG. 6, and adjust the loading mode in real time. For example, suppose the multimedia file to be played includes 8000 code stream packets. When analysis of the first N of the 8000 code stream packets shows that they are unevenly interleaved, the electronic device may continuously load the first number of audio data packets and sequentially load the second number of video data packets into the player for playback, where N may be the preset number and N is less than 8000. After the electronic device has loaded 1000 code stream packets into the player for decoding and playback, 7000 code stream packets in the multimedia file remain unloaded. The electronic device may then analyze the interleaving again based on the first N of those 7000 code stream packets. If the analysis shows that the code stream packets are now evenly interleaved, the electronic device may repeatedly select the not-yet-loaded code stream packet with the smallest presentation timestamp and load it into the corresponding decoder, until all code stream packets have been loaded into the player for decoding and playback.
Refer to FIG. 9. FIG. 9 shows another method for playing a multimedia file according to an embodiment of this application. The method includes, but is not limited to, the following steps:
Step S901: Load a multimedia file into the memory of a player.
Specifically, for a detailed description, refer to step S501; details are not repeated here.
Step S902: Parse the multimedia file to obtain a plurality of code stream packets.
Specifically, after reading the multimedia file, the electronic device parses it to obtain code stream packets, which include video data packets, audio data packets, and subtitle data packets. The electronic device may display the video data packets and subtitle data packets on a display and play the audio data packets through a speaker. It can be understood that when the audio data packets of a multimedia file contain no embedded subtitles, the code stream packets of the multimedia file may include subtitle data packets.
Step S903: Consecutively load a first number of audio data packets into an audio decoder, consecutively load a second number of video data packets into a video decoder, and consecutively load a third number of subtitle data packets into a subtitle decoder.
Specifically, the electronic device sequentially reads and loads the first number of audio data packets into the audio decoder for decoding to obtain audio frames, loads the second number of video data packets into the video decoder for decoding to obtain video frames, and loads the third number of subtitle data packets into the subtitle decoder for decoding to obtain subtitle frames. After the electronic device synchronizes the audio frames, video frames, and subtitle frames, the display shows the video frames and subtitle frames and the speaker plays the audio frames. Audio data packets, video data packets, and subtitle data packets are thus loaded alternately until all code stream packets of the multimedia file to be played have been loaded. The third number is an integer greater than 1. The third number of subtitle data packets corresponds to a third playback time period, and the deviation between the third playback time period and the first playback time period, or between the third playback time period and the second playback time period, is in each case less than the preset time difference threshold.
In a possible implementation, if the electronic device determines, according to the flowchart shown in FIG. 7, that the plurality of code stream packets are unevenly interleaved, the electronic device consecutively loads the first number of audio data packets into the audio decoder, consecutively loads the second number of video data packets into the video decoder, and consecutively loads the third number of subtitle data packets into the subtitle decoder.
It should be noted that the order in which audio data packets, video data packets, and subtitle data packets are loaded may be determined according to the description information obtained by parsing. If the information of the first code stream packet stored in the description information is that of a video data packet, the electronic device may first load the second number of video data packets, then the first number of audio data packets, and then the third number of subtitle data packets into the player for playback. This embodiment of this application does not limit the order in which code stream packets are loaded.
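The alternating batch loading of step S903 can be sketched as follows. This is a minimal illustration under several assumptions: packets are represented as hypothetical `(kind, duration_ms)` tuples in storage order, each batch is sized to cover roughly one preset window, and the stream whose packet appears first in storage is loaded first.

```python
from collections import deque


def batched_load_order(packets, window_ms=1000):
    """Group packets into alternating per-stream batches.

    `packets` is a list of (kind, duration_ms) tuples in storage order
    (a simplified stand-in for the parsed description information).
    Streams are visited in order of first appearance, and each batch takes
    consecutive packets of one stream until about `window_ms` is covered.
    Returns a list of (kind, packet_count) batches in loading order.
    """
    queues = {}
    order = []  # stream kinds, in order of first appearance in storage
    for kind, duration in packets:
        if kind not in queues:
            queues[kind] = deque()
            order.append(kind)
        queues[kind].append(duration)

    batches = []
    while any(queues[kind] for kind in order):
        for kind in order:
            queue = queues[kind]
            covered = 0
            count = 0
            while queue and covered < window_ms:
                covered += queue.popleft()
                count += 1
            if count:
                batches.append((kind, count))
    return batches


# Hypothetical file: all video stored first, then audio, then subtitles.
stored = ([("video", 40)] * 50) + ([("audio", 20)] * 100) + ([("subtitle", 500)] * 4)
batches = batched_load_order(stored)
```

Each round of the loop produces one batch per stream, so the three decoders receive data covering about the same playback period even though the packet counts per batch differ.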
Refer to FIG. 10. FIG. 10 is a schematic structural diagram of an apparatus for playing a multimedia file according to an embodiment of this application. The apparatus 100 for playing a multimedia file may be a node, or a component in a node, such as a chip or an integrated circuit. The apparatus 100 may include a first loading unit 1001, a parsing unit 1002, and a second loading unit 1003. The units are described in detail as follows.
The first loading unit 1001 is configured to load a multimedia file into the memory of a player.
The parsing unit 1002 is configured to parse the multimedia file to obtain a plurality of code stream packets, where the code stream packets include audio data packets and video data packets.
The second loading unit 1003 is configured to consecutively load a first number of the audio data packets into an audio decoder and consecutively load a second number of the video data packets into a video decoder, where the first number is an integer greater than 1 and the second number is an integer greater than 1.
In a possible implementation, the first number of the video data packets corresponds to a first playback time period, the second number of the audio data packets corresponds to a second playback time period, and the deviation between the first playback time period and the second playback time period is less than a preset time difference threshold.
In a possible implementation, the multimedia file includes description information, and the apparatus further includes a determining unit 1004. The determining unit 1004 is configured to determine, according to the description information, whether the plurality of code stream packets are evenly interleaved.
In a possible implementation, the determining unit 1004 is specifically configured to: collect, according to the description information, the amount of stored data between the audio data packets and the video data packets that have a corresponding display timestamp relationship among the plurality of code stream packets; count a target number, which is the number of cases in which the amount of stored data is greater than or equal to a target distance threshold; and determine, according to the target number, whether the plurality of code stream packets are evenly interleaved.
In a possible implementation, the target distance threshold is the larger of a first distance and a preset distance, where the first distance is determined according to at least one of the width, height, storage ratio, or compression ratio of the multimedia file carried in the description information. The multimedia file includes video frames, and the width and height of the multimedia file correspond to the width and height of the video frames.
In a possible implementation, the determining unit 1004 is specifically configured to: determine, according to the description information, the display timestamp and storage location of each code stream packet in a preset number of code stream packets, where the preset number of code stream packets are the first N code stream packets of the plurality of code stream packets and N is a positive integer; determine, in order of increasing display timestamp, the respective storage locations of the video data packets and the audio data packets that have a corresponding display timestamp relationship among the preset number of code stream packets; and determine, according to the collected storage locations of the video data packets and of the audio data packets, the amount of stored data between the video data packets and the audio data packets that have a corresponding display timestamp relationship.
In a possible implementation, the determining unit 1004 is specifically configured to: calculate the ratio of the target number to the preset number; and if the ratio is greater than or equal to a second preset threshold, determine that the plurality of code stream packets are unevenly interleaved; or if the target number is greater than or equal to a third preset threshold, determine that the plurality of code stream packets are unevenly interleaved.
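The determination performed by the determining unit can be sketched as below. This is a hedged illustration rather than the method of the application: packets are modelled as hypothetical `(kind, pts, offset)` tuples, "corresponding display timestamp relationship" is interpreted as pairing each video packet with the audio packet whose display timestamp is nearest, and the distance threshold is passed in by the caller because the text does not give a formula for the first distance.

```python
import bisect


def interleaving_is_uneven(packets, n, distance_threshold, ratio_threshold):
    """Check whether the first `n` packets look unevenly interleaved.

    `packets` holds (kind, pts, offset) tuples, where `offset` is the storage
    position in bytes (assumed representation). A video packet counts toward
    the target number when the audio packet nearest to it in display time is
    stored at least `distance_threshold` bytes away. The result is True when
    target_number / sample_size >= `ratio_threshold`.
    """
    sample = packets[:n]
    audio = sorted((pts, off) for kind, pts, off in sample if kind == "audio")
    video = sorted((pts, off) for kind, pts, off in sample if kind == "video")
    if not audio or not video:
        return False  # nothing to pair, so no evidence of uneven interleaving

    audio_pts = [pts for pts, _ in audio]
    target_number = 0
    for pts, video_off in video:  # visited in increasing display timestamp
        i = bisect.bisect_left(audio_pts, pts)
        neighbours = [j for j in (i - 1, i) if 0 <= j < len(audio)]
        nearest = min(neighbours, key=lambda j: abs(audio_pts[j] - pts))
        if abs(audio[nearest][1] - video_off) >= distance_threshold:
            target_number += 1
    return target_number / len(sample) >= ratio_threshold


# Threshold as the larger of a first distance and a preset distance
# (both values here are made up for the example).
threshold = max(600_000, 1_000_000)
```

The alternative criterion in the text, comparing the target number directly against a third preset threshold, would replace only the final comparison.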
In a possible implementation, the code stream packets further include subtitle data packets, and the second loading unit 1003 is further configured to consecutively load a third number of the subtitle data packets into a subtitle decoder, where the first number of the audio data packets, the second number of the video data packets, and the third number of the subtitle data packets are loaded alternately. The third number is an integer greater than 1. The third number of the subtitle data packets corresponds to a third playback time period, and the deviation between the third playback time period and the first playback time period, or between the third playback time period and the second playback time period, is in each case less than the preset time difference threshold.
In a possible implementation, the second loading unit 1003 is further configured to: if it is determined according to the description information that the plurality of code stream packets are evenly interleaved, load the audio data packet with the smallest display timestamp into the audio decoder, and load the video data packet with the smallest display timestamp into the video decoder.
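For the evenly interleaved case, always picking the unloaded packet with the smallest display timestamp amounts to a timestamp-ordered merge of the per-stream queues. The sketch below uses Python's standard `heapq.merge` for this; the `(kind, pts)` tuple layout and the function name `min_pts_order` are assumptions for illustration, and each input list is assumed to be already sorted by display timestamp.

```python
import heapq


def min_pts_order(audio_packets, video_packets):
    """Return packets in the order they would be handed to their decoders.

    Each argument is a list of (kind, pts) tuples sorted by pts (assumed).
    At every step the packet with the smallest display timestamp among the
    heads of the two queues is taken; on a tie the audio packet goes first
    because heapq.merge is stable across its input iterables.
    """
    return list(heapq.merge(audio_packets, video_packets,
                            key=lambda packet: packet[1]))


# Hypothetical timestamps in milliseconds.
audio = [("audio", 0), ("audio", 20), ("audio", 40)]
video = [("video", 0), ("video", 40)]
order = min_pts_order(audio, video)
```

Because only the head of each queue is inspected, the loader never needs to look far ahead in the file, which is why this mode suits files whose audio and video packets are stored close together.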
It should be noted that, for the implementation of each unit, reference may also be made to the corresponding description of the method embodiments shown in FIG. 5, FIG. 6, or FIG. 9.
An embodiment of this application further provides a computer-readable storage medium. The computer-readable storage medium stores a program, and when the program instructions are run on a computer or a processor, the method procedure shown in FIG. 5, FIG. 6, or FIG. 9 is implemented.
An embodiment of this application further provides a computer program product. The computer program product includes program instructions, and when the program instructions are run on a computer or a processor, the method procedure shown in FIG. 5, FIG. 6, or FIG. 9 is implemented.
A person of ordinary skill in the art can understand that all or some of the procedures in the methods of the foregoing embodiments may be completed by a computer program instructing relevant hardware. The program may be stored in a computer-readable storage medium, and when executed, the program may include the procedures of the foregoing method embodiments. The storage medium includes any medium that can store program code, such as a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Claims (21)

  1. A method for playing a multimedia file, wherein the method comprises:
    loading a multimedia file into a memory of a player;
    parsing the multimedia file to obtain a plurality of code stream packets, wherein the code stream packets comprise audio data packets and video data packets; and
    consecutively loading a first number of the audio data packets into an audio decoder, and consecutively loading a second number of the video data packets into a video decoder, wherein the first number is an integer greater than 1, and the second number is an integer greater than 1.
  2. The method according to claim 1, wherein the first number of the video data packets corresponds to a first playback time period, the second number of the audio data packets corresponds to a second playback time period, and a deviation between the first playback time period and the second playback time period is less than a preset time difference threshold.
  3. The method according to claim 1 or 2, wherein the multimedia file comprises description information, and after the parsing the multimedia file to obtain a plurality of code stream packets and before the consecutively loading a first number of the audio data packets into an audio decoder and consecutively loading a second number of the video data packets into a video decoder, the method further comprises:
    determining, according to the description information, whether the plurality of code stream packets are evenly interleaved.
  4. The method according to claim 3, wherein the determining, according to the description information, whether the plurality of code stream packets are evenly interleaved comprises:
    collecting, according to the description information, the amount of stored data between the audio data packets and the video data packets that have a corresponding display timestamp relationship among the plurality of code stream packets;
    counting a target number, which is the number of cases in which the amount of stored data is greater than or equal to the target distance threshold; and
    determining, according to the target number, whether the plurality of code stream packets are evenly interleaved.
  5. The method according to claim 4, wherein the target distance threshold is the larger of a first distance and a preset distance, and the first distance is determined according to at least one of a width, a height, a storage ratio, or a compression ratio of the multimedia file carried in the description information; wherein the multimedia file comprises video frames, and the width and height of the multimedia file correspond to the width and height of the video frames.
  6. The method according to claim 4 or 5, wherein the collecting, according to the description information, the amount of stored data between the audio data packets and the video data packets that have a corresponding display timestamp relationship among the plurality of code stream packets comprises:
    determining, according to the description information, a display timestamp and a storage location of each code stream packet in a preset number of code stream packets, wherein the preset number of code stream packets are the first N code stream packets of the plurality of code stream packets, and N is a positive integer;
    determining, in order of increasing display timestamp, the respective storage locations of the video data packets and the audio data packets that have a corresponding display timestamp relationship among the preset number of code stream packets; and
    determining, according to the collected storage locations of the video data packets and of the audio data packets, the amount of stored data between the video data packets and the audio data packets that have a corresponding display timestamp relationship.
  7. The method according to claim 6, wherein the determining, according to the target number, whether the plurality of code stream packets are evenly interleaved comprises:
    calculating a ratio of the target number to the preset number; and
    if the ratio is greater than or equal to a second preset threshold, determining that the plurality of code stream packets are unevenly interleaved; or
    if the target number is greater than or equal to a third preset threshold, determining that the plurality of code stream packets are unevenly interleaved.
  8. The method according to any one of claims 1 to 7, wherein the code stream packets further comprise subtitle data packets, and after the parsing the multimedia file to obtain description information and a plurality of code stream packets, the method further comprises:
    consecutively loading a third number of the subtitle data packets into a subtitle decoder, wherein the first number of the audio data packets, the second number of the video data packets, and the third number of the subtitle data packets are loaded alternately;
    wherein the third number is an integer greater than 1, the third number of the subtitle data packets corresponds to a third playback time period, and a deviation between the third playback time period and the first playback time period, or between the third playback time period and the second playback time period, is in each case less than the preset time difference threshold.
  9. The method according to any one of claims 1 to 8, wherein after the parsing the multimedia file to obtain description information and a plurality of code stream packets, the method further comprises:
    if it is determined according to the description information that the plurality of code stream packets are evenly interleaved,
    loading the audio data packet with the smallest display timestamp into the audio decoder, and loading the video data packet with the smallest display timestamp into the video decoder.
  10. An apparatus for playing a multimedia file, wherein the apparatus comprises:
    a first loading unit, configured to load a multimedia file into a memory of a player;
    a parsing unit, configured to parse the multimedia file to obtain a plurality of code stream packets, wherein the code stream packets comprise audio data packets and video data packets; and
    a second loading unit, configured to consecutively load a first number of the audio data packets into an audio decoder and consecutively load a second number of the video data packets into a video decoder, wherein the first number is an integer greater than 1, and the second number is an integer greater than 1.
  11. The apparatus according to claim 10, wherein the first number of the video data packets corresponds to a first playback time period, the second number of the audio data packets corresponds to a second playback time period, and a deviation between the first playback time period and the second playback time period is less than a preset time difference threshold.
  12. The apparatus according to claim 10 or 11, wherein the multimedia file comprises description information, and the apparatus further comprises a determining unit, configured to:
    determine, according to the description information, whether the plurality of code stream packets are evenly interleaved.
  13. The apparatus according to claim 12, wherein the determining unit is specifically configured to:
    collect, according to the description information, the amount of stored data between the audio data packets and the video data packets that have a corresponding display timestamp relationship among the plurality of code stream packets;
    count a target number, which is the number of cases in which the amount of stored data is greater than or equal to the target distance threshold; and
    determine, according to the target number, whether the plurality of code stream packets are evenly interleaved.
  14. The apparatus according to claim 13, wherein the target distance threshold is the larger of a first distance and a preset distance, and the first distance is determined according to at least one of a width, a height, a storage ratio, or a compression ratio of the multimedia file carried in the description information; wherein the multimedia file comprises video frames, and the width and height of the multimedia file correspond to the width and height of the video frames.
  15. The apparatus according to claim 13 or 14, wherein the determining unit is specifically configured to:
    determine, according to the description information, a display timestamp and a storage location of each code stream packet in a preset number of code stream packets, wherein the preset number of code stream packets are the first N code stream packets of the plurality of code stream packets, and N is a positive integer;
    determine, in order of increasing display timestamp, the respective storage locations of the video data packets and the audio data packets that have a corresponding display timestamp relationship among the preset number of code stream packets; and
    determine, according to the collected storage locations of the video data packets and of the audio data packets, the amount of stored data between the video data packets and the audio data packets that have a corresponding display timestamp relationship.
  16. The apparatus according to claim 15, wherein the determining unit is specifically configured to:
    calculate a ratio of the target number to the preset number; and
    if the ratio is greater than or equal to a second preset threshold, determine that the plurality of code stream packets are unevenly interleaved; or
    if the target number is greater than or equal to a third preset threshold, determine that the plurality of code stream packets are unevenly interleaved.
  17. The apparatus according to any one of claims 10 to 16, wherein the code stream packets further comprise subtitle data packets, and the second loading unit is further configured to: consecutively load a third number of the subtitle data packets into a subtitle decoder, wherein the first number of the audio data packets, the second number of the video data packets, and the third number of the subtitle data packets are loaded alternately;
    wherein the third number is an integer greater than 1, the third number of the subtitle data packets corresponds to a third playback time period, and a deviation between the third playback time period and the first playback time period, or between the third playback time period and the second playback time period, is in each case less than the preset time difference threshold.
  18. The apparatus according to any one of claims 10 to 17, wherein the second loading unit is further configured to:
    if it is determined according to the description information that the plurality of code stream packets are evenly interleaved,
    load the audio data packet with the smallest display timestamp into the audio decoder, and load the video data packet with the smallest display timestamp into the video decoder.
  19. An electronic device, wherein the electronic device comprises at least one processor and a transmission interface, and the at least one processor receives or sends signals through the transmission interface; the at least one processor is configured to invoke a computer program stored in a memory, so that the apparatus performs the method according to any one of claims 1-10.
  20. A computer-readable storage medium, wherein the computer-readable storage medium stores program instructions, and when the program instructions are run on a processor, the method according to any one of claims 1-10 is implemented.
  21. A computer program product, wherein the computer program product comprises program instructions, and when the program instructions are run on a computer or a processor, the method according to any one of claims 1-10 is implemented.
PCT/CN2021/081127 2021-03-16 2021-03-16 Multimedia file playing method and related apparatus WO2022193141A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202180095561.2A CN116965038A (en) 2021-03-16 2021-03-16 Multimedia file playing method and related device
PCT/CN2021/081127 WO2022193141A1 (en) 2021-03-16 2021-03-16 Multimedia file playing method and related apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/081127 WO2022193141A1 (en) 2021-03-16 2021-03-16 Multimedia file playing method and related apparatus

Publications (1)

Publication Number Publication Date
WO2022193141A1 true WO2022193141A1 (en) 2022-09-22

Family

ID=83321806

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/081127 WO2022193141A1 (en) 2021-03-16 2021-03-16 Multimedia file playing method and related apparatus

Country Status (2)

Country Link
CN (1) CN116965038A (en)
WO (1) WO2022193141A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115272536A (en) * 2022-09-26 2022-11-01 深圳乐娱游网络科技有限公司 Animation playing method and device and electronic equipment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1463334A2 (en) * 1995-11-22 2004-09-29 General Instrument Corporation Acquisition and error recovery of audio carried in a packetized data stream
CN102740064A (en) * 2012-06-15 2012-10-17 福建星网视易信息系统有限公司 Packing method for streaming media transmission in intercom system
CN102868939A (en) * 2012-09-10 2013-01-09 杭州电子科技大学 Method for synchronizing audio/video data in real-time video monitoring system
CN102970615A (en) * 2012-11-21 2013-03-13 联想中望系统服务有限公司 System for efficient transmission and coding/encoding of high-definition videos
CN103051928A (en) * 2013-01-25 2013-04-17 上海德思普微电子技术有限公司 Method and device for wireless audio and video data transmission
CN103414957A (en) * 2013-07-30 2013-11-27 广东工业大学 Method and device for synchronization of audio data and video data
CN107637084A (en) * 2015-05-20 2018-01-26 Nxt解决方案公司 IPTV in managed network


Also Published As

Publication number Publication date
CN116965038A (en) 2023-10-27


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21930737

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 202180095561.2

Country of ref document: CN

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21930737

Country of ref document: EP

Kind code of ref document: A1