WO2023010402A1 - Media file playing method, device, computer equipment and storage medium - Google Patents

Media file playing method, device, computer equipment and storage medium

Info

Publication number
WO2023010402A1
Authority
WO
WIPO (PCT)
Prior art keywords
timestamp
input
time stamp
output
media file
Prior art date
Application number
PCT/CN2021/110823
Other languages
English (en)
French (fr)
Inventor
张金梁
Original Assignee
深圳Tcl新技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳Tcl新技术有限公司 filed Critical 深圳Tcl新技术有限公司
Priority to PCT/CN2021/110823 priority Critical patent/WO2023010402A1/zh
Publication of WO2023010402A1 publication Critical patent/WO2023010402A1/zh

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80: Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85: Assembly of content; Generation of multimedia applications
    • H04N21/854: Content authoring
    • H04N21/8547: Content authoring involving timestamps for synchronizing content

Definitions

  • the present application belongs to the technical field of multimedia processing, and in particular relates to a media file playing method, device, computer equipment and storage medium.
  • TLV: Type-Length-Value
  • TLV packets generally transmit 4K ultra-high-definition programs, and the data volume of ultra-high-definition programs is relatively large.
  • TLV packets generally contain data such as key table information data, audio and video data, CC subtitle data, etc.
  • TLV packets do not contain NTP (Network Time Protocol) packets, which are used to synchronize audio and video time with UTC (Coordinated Universal Time); therefore, the audio and video time of digital TV is not synchronized with UTC.
  • NTP: Network Time Protocol
  • CC: closed caption
  • Embodiments of the present application provide a media file playing method, device, computer equipment, and storage medium, which can accurately play audio and video frames and text.
  • a method for playing a media file comprising:
  • the input timestamp and the output timestamp are fused to obtain the target timestamp
  • the audio and video frame and the text are played.
  • an embodiment of the present application provides a media file playing device, including:
  • the response unit is used to respond to the playback instruction for the media file, obtain the difference information between the input time stamp and the output time stamp of the audio and video frame in the media file, and obtain the display time stamp of the text in the media file;
  • a fusion unit is configured to perform fusion processing on the input timestamp and the output timestamp according to the difference information to obtain a target timestamp;
  • the playback unit is configured to play audio and video frames and text if the difference between the target timestamp and the display timestamp is smaller than a first preset threshold.
  • the response unit can also be used to compare the input timestamp with the output timestamp; if the input timestamp matches the output timestamp, filter out from the input timestamp a first character string that satisfies the first preset number of digits; and determine the first character string as the difference information.
  • the response unit may also be specifically configured to filter out from the input timestamp a second character string that satisfies a second preset number of digits; compare the second character string with the output timestamp; and, if the second character string is the same as the output timestamp, determine that the input timestamp matches the output timestamp.
  • the response unit can also be specifically configured to cut the input timestamp in order from high-order to low-order digits; select from the cut input timestamp the high-order character string that satisfies the first preset number of digits; and determine that high-order character string as the first character string.
  • the response unit can also be used to obtain the input timestamp according to the time when the audio and video frames in the media file are input to the receiving hardware; obtain the output timestamp according to the time when the audio and video frames in the media file are decoded by the output hardware; and obtain the difference information between the input timestamp and the output timestamp from these two timestamps.
  • the response unit can also be used to obtain a standard timestamp, which is a timestamp of Coordinated Universal Time (UTC); compare the standard timestamp with the input timestamp; if the difference between the input timestamp and the standard timestamp is less than the second preset threshold, determine that the input timestamp is a valid timestamp; compare the standard timestamp with the output timestamp; if the difference between the output timestamp and the standard timestamp is less than the third preset threshold, determine that the output timestamp is a valid timestamp; and, if both the input timestamp and the output timestamp are valid timestamps, compare the input timestamp with the output timestamp to obtain the difference information.
  • the media file playing device further includes a storage unit, which can be specifically used to store the input timestamp in the data set; the media file playing device also includes a deletion unit, which can be specifically used to delete the input timestamp from the data set.
  • the fusion unit can also be specifically configured to filter out, according to the difference information, a target string satisfying preset conditions from the input timestamp, and to splice the target string with the output timestamp to obtain the target timestamp.
  • the fusion unit can also be specifically configured to compare the difference information with the target character string and, if the difference information is the same as the target character string, splice the target character string with the output timestamp to obtain the target timestamp.
  • the fusion unit can also be specifically used to filter out, according to the difference information, a target string satisfying preset conditions from the input timestamp; splice the target string with the output timestamp to obtain a spliced timestamp; filter out from the spliced timestamp, in order from low to high, a third character string with the same number of digits as the display timestamp; and determine the third character string as the target timestamp.
  • the media file playback device also includes a traversal unit, which can be used to traverse the display timestamps based on the target timestamp; the playback unit can also be specifically used to, if the difference between the target timestamp and a display timestamp is less than the first preset threshold, screen out from the display timestamps the target display timestamp corresponding to the target timestamp, determine the text corresponding to the target display timestamp, and play the audio and video frame together with the text corresponding to the target display timestamp.
  • the playing unit may also be specifically configured to play the audio and video frame if the difference between the target time stamp and the display time stamp is greater than or equal to a first preset threshold.
  • the response unit may also be specifically configured to receive a text acquisition instruction when the text is not activated; based on the text acquisition instruction, acquire the display time stamp of the text in the media file.
  • the embodiment of the present application also provides a computer device, including a memory and a processor; the memory stores a computer program, and the processor is used to run the computer program in the memory to execute any one of the media file playing methods provided in the embodiment of the present application .
  • an embodiment of the present application further provides a storage medium, the storage medium stores a computer program, and the computer program is suitable for being loaded by a processor, so as to execute any one of the media file playing methods provided in the embodiments of the present application.
  • the difference information between the input time stamp and the output time stamp of the audio and video frame in the media file may be obtained, and the display time stamp of the text in the media file may be obtained;
  • the input timestamp and the output timestamp are fused to obtain the target timestamp; if the difference between the target timestamp and the display timestamp is less than the preset threshold, the audio and video frames and text are played.
  • the embodiment of the present application can use the difference information between the input timestamp and the output timestamp of the audio and video frame to determine the target timestamp, compare the target timestamp with the display timestamp of the text, and then judge the difference between the target timestamp and the display timestamp, so that audio and video frames and text can be played accurately.
  • FIG. 1 is a schematic diagram of a scene of a method for playing a media file provided by an embodiment of the present application.
  • FIG. 2 is a flow diagram of a method for playing a media file provided in an embodiment of the present application.
  • FIG. 3 is a schematic flow chart of obtaining difference information in an embodiment of the present application.
  • Fig. 4 is a schematic flow diagram of selecting a first character string satisfying a first preset number of digits from an input timestamp according to an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a process flow of fusion processing of input time and output time according to an embodiment of the present application.
  • FIG. 6 is a second schematic diagram of a process flow for fusion processing of input time and output time according to an embodiment of the present application.
  • FIG. 7 is a second schematic diagram of the flow of the media file playing method provided by the embodiment of the present application.
  • FIG. 8 is a third schematic flow diagram of a method for playing a media file provided in an embodiment of the present application.
  • Fig. 9 is a schematic structural diagram of a media file playing device provided by an embodiment of the present application.
  • Fig. 10 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • Embodiments of the present application provide a media file playing method, device, computer equipment, and storage medium.
  • the media file playing apparatus may be integrated in a computer device, and the computer device may be a server, or a terminal or other equipment.
  • the server can be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery network (CDN) services, and big data and artificial intelligence platforms.
  • the terminal may be a smart phone, a tablet computer, a laptop computer, a desktop computer, a smart speaker, a smart watch, a smart TV, etc., but is not limited thereto.
  • the terminal and the server may be connected directly or indirectly through wired or wireless communication, which is not limited in this application.
  • the computer device can respond to a playback instruction for the media file, obtain the difference information between the input timestamp and the output timestamp of the audio and video frames in the media file, and obtain the display timestamp of the text in the media file. Then, according to the difference information, the input timestamp and the output timestamp are fused to obtain the target timestamp, and the target timestamp is compared with the display timestamp. If the difference between the target timestamp and the display timestamp is less than the first preset threshold, the audio and video frame and the text are played, so that the audio and video frame and the text are played accurately.
  • the input timestamp can be cut, and the cut input timestamp can then be spliced with the output timestamp; for example, after the input timestamp is cut, the cut input timestamp and the output timestamp are spliced to obtain a spliced timestamp, and the spliced timestamp is then cut to obtain the target timestamp.
  • the play instruction is triggered by the user performing a target operation on the computer device.
  • the computer device is a smart TV, and the user triggers the play instruction by pressing a play button on a remote control that matches the smart TV.
  • the computer device is a computer, and the user triggers the playback instruction through voice control.
  • the text may be subtitles, text information explaining the content of audio and video frames, or bullet-screen comments (barrage).
  • the difference information between the input time stamp and the output time stamp of the audio and video frame in the media file is obtained, as shown in Figure 3, specifically as follows:
  • A1. Obtain the input timestamp of the audio and video frame in the media file, and obtain the output timestamp of the audio and video frame in the media file.
  • the input timestamp can be obtained according to the time when the audio and video frames in the media file are input to the receiving hardware.
  • the output timestamp can be obtained according to the time when the audio and video frames in the media file are decoded by the output hardware.
  • computer equipment includes receiving hardware, input hardware and output hardware.
  • the computer device receives the audio and video through the signal receiving end of the receiving hardware and generates the input timestamp of the audio and video frame according to the time at which the audio and video are received. The audio and video and the input timestamp are then transmitted to the input hardware; the number of bits of the audio and video and of the input timestamp is the same as the number of bits of the input hardware.
  • the audio and video are processed by the input hardware, and the processed audio and video are transmitted to the output hardware and decoded there; the number of bits of the decoded audio and video is the same as the number of bits of the output hardware.
  • the output hardware displays the decoded audio and video on the display interface of the computer device; after decoding the processed audio and video, the output hardware generates the output timestamp of the corresponding decoded audio and video frame.
  • the input hardware has an input terminal, which is also referred to as a TA terminal.
  • the computer device system kernel can also implement information interaction with the input hardware through the input terminal, so that the computer device can obtain the input time stamp from the input terminal.
  • the output hardware has an output terminal, which is also called a firmware terminal, and the computer device system kernel can also realize information interaction with the output hardware through the output terminal, so that the computer device can obtain the output time stamp through the output terminal.
  • the number of bits of the input hardware differs from that of the output hardware; generally the input hardware has more bits than the output hardware. Therefore, the input timestamp obtained from the input end of the input hardware (the TA end) has a different number of bits from the output timestamp obtained from the output end of the output hardware (the firmware end).
  • input hardware generally uses 64 or 128 bits to match the number of bits of the audio and video, the input hardware and the receiving hardware generally have the same number of bits, and output hardware generally uses 32 bits.
  • the input hardware can be OPTEE (Open Portable Trusted Execution Environment) hardware, the output hardware can be hardware that integrates an audio hardware decoder and a video hardware decoder, and the receiving hardware can be a coordinator or a demodulator.
  • the system kernel of the computer device exchanges information with both the input hardware and the output hardware. The input hardware has its own kernel, called the first sub-kernel here, and the output hardware has its own kernel, called the second sub-kernel. The system kernel of the computer device therefore interacts with the input hardware through shared memory (ShareMemory) and with the output hardware through cross-process communication (RPC).
  • the embodiment of the present application can also obtain the standard timestamp, which is the timestamp of Coordinated Universal Time (UTC), and compare the standard timestamp with the input timestamp; if the difference between the input timestamp and the standard timestamp is less than the second preset threshold, the input timestamp is determined to be a valid timestamp; if the difference is greater than or equal to the second preset threshold, the input timestamp is determined not to be a valid timestamp.
  • the second preset threshold may be 0.1 ms, and the second preset threshold may be set according to actual requirements.
  • the standard timestamp can also be compared with the output timestamp; if the difference between the output timestamp and the standard timestamp is less than the third preset threshold, the output timestamp is determined to be a valid timestamp; if both the input timestamp and the output timestamp are valid timestamps, the input timestamp is compared with the output timestamp to obtain the difference information.
  • the third preset threshold may be 0.1 ms, and the third preset threshold may be set according to actual requirements.
  • the validity of the input timestamp is ensured by comparing the standard timestamp with the input timestamp, and the validity of the output timestamp is ensured by comparing the standard timestamp with the output timestamp, thereby ensuring the accuracy of both the input timestamp and the output timestamp.
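The validity check described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function names are hypothetical, all three timestamps are assumed to be expressed in milliseconds on a comparable scale, and "difference" is assumed to mean the absolute difference.

```python
def is_valid_timestamp(timestamp_ms: float, standard_ms: float,
                       threshold_ms: float = 0.1) -> bool:
    """Return True when the timestamp lies within threshold_ms of the standard (UTC) timestamp.

    Assumption: the difference is taken as an absolute value.
    """
    return abs(timestamp_ms - standard_ms) < threshold_ms


def both_timestamps_valid(input_ms: float, output_ms: float, standard_ms: float,
                          second_threshold_ms: float = 0.1,
                          third_threshold_ms: float = 0.1) -> bool:
    """Check the input timestamp against the second preset threshold and the
    output timestamp against the third preset threshold."""
    return (is_valid_timestamp(input_ms, standard_ms, second_threshold_ms)
            and is_valid_timestamp(output_ms, standard_ms, third_threshold_ms))
```

Only when `both_timestamps_valid` returns True would the two timestamps go on to be compared for difference information.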
  • the input timestamp is stored in the data set. Since there is a delay between the input of the audio and video frames in the media file from the signal receiving end of the receiving hardware to the transmission to the input hardware, the embodiment of the present application stores the input timestamp in the data set, which can fully buffer the input timestamp.
  • the number of digits of the input timestamp differs from that of the output timestamp; for example, the input timestamp is 64 bits and the output timestamp is 32 bits, or the input timestamp is 128 bits and the output timestamp is 32 bits.
  • the number of digits of the input timestamp and the number of digits of the output timestamp are determined by the hardware and chip of the computer device.
  • a second character string that satisfies the second preset number of digits is filtered out from the input timestamp and compared with the output timestamp; if the second character string is the same as the output timestamp, it is determined that the input timestamp matches the output timestamp; if the second character string is different from the output timestamp, it is determined that the input timestamp does not match the output timestamp.
  • the second preset number of digits may be consistent with the number of digits of the output timestamp.
  • the second preset number of digits may be the last 32 bits in the input timestamp, that is, the lower 32 bits; the second preset number of digits may be the last 64 bits in the input timestamp, that is, the lower 64 bits.
  • for example, when the output timestamp is 32 bits, the second preset number of digits is the lower 32 bits of the input timestamp; when the output timestamp is 64 bits, the second preset number of digits is the lower 64 bits of the input timestamp.
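The matching step can be illustrated with integer bit operations. This is a sketch under stated assumptions: timestamps are held as unsigned Python integers rather than character strings, and the function name `timestamps_match` is hypothetical.

```python
def timestamps_match(input_ts: int, output_ts: int, output_bits: int = 32) -> bool:
    """Compare the low-order bits of the input timestamp with the output timestamp.

    output_bits plays the role of the second preset number of digits,
    i.e. the width of the output timestamp.
    """
    low_mask = (1 << output_bits) - 1          # e.g. 0xFFFFFFFF for 32 bits
    second_string = input_ts & low_mask        # lower bits of the input timestamp
    return second_string == output_ts
```

For a 64-bit input timestamp `0x0000ABCD00001234` and a 32-bit output timestamp `0x00001234`, the lower 32 bits agree, so the timestamps are treated as matching.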
  • the first preset number of digits can be the first 96 bits of the input timestamp (the upper 96 bits), the first 32 bits (the upper 32 bits), or the first 64 bits (the upper 64 bits). It is determined by the number of digits of the input timestamp and of the output timestamp: the first preset number of digits can be the difference between the number of digits of the input timestamp and the number of digits of the output timestamp.
  • for example, when the input timestamp is 128 bits and the output timestamp is 32 bits, the first preset number of digits is the upper 96 bits of the input timestamp; when the input timestamp is 64 bits and the output timestamp is 32 bits, it is the upper 32 bits of the input timestamp; when the input timestamp is 128 bits and the output timestamp is 64 bits, it is the upper 64 bits of the input timestamp.
  • the first character string that satisfies the first preset number of digits is screened out from the input timestamp, as shown in FIG. 4 , which may be specifically as follows:
  • for example, when the first preset number of digits is the upper 32 bits of the input timestamp and the input timestamp is 64 bits, the input timestamp is cut in order from high to low into a high 32-bit string and a low 32-bit string.
  • for example, when the first preset number of digits is the upper 96 bits of the input timestamp and the input timestamp is 128 bits, the input timestamp is cut in order from high to low into a high 96-bit string and a low 32-bit string.
  • for example, when the first preset number of digits is the upper 32 bits of the input timestamp, the upper 32-bit string is filtered out from the cut input timestamp; when the first preset number of digits is the upper 96 bits of the input timestamp, the upper 96-bit string is filtered out from the cut input timestamp.
  • the input timestamp can also be cut in order from low to high, for example, by cutting the characters in the input timestamp one by one. Then, from the cut input timestamp, the high-order characters satisfying the first preset number of digits are filtered out one by one in order from high to low.
  • for example, when the first preset number of digits is the upper 32 bits of the input timestamp, the characters of the upper 32-bit string are filtered out one by one in order from the highest digit to the lowest.
  • A4. Determine the first character string as difference information.
  • the first character string may be a 32-bit, 64-bit, or 96-bit character string. More specifically, it can be the first (upper) 32 bits, the first (upper) 64 bits, or the first (upper) 96 bits of the input timestamp.
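Obtaining the difference information, from the match check through to the high-order string, can be sketched as below. The function name and the use of unsigned integers in place of character strings are assumptions made for illustration; the first preset number of digits is taken to be the difference between the input and output bit widths, as the text suggests.

```python
from typing import Optional


def extract_difference_info(input_ts: int, output_ts: int,
                            input_bits: int = 64,
                            output_bits: int = 32) -> Optional[int]:
    """Return the high-order part of input_ts as difference information.

    Returns None when the low-order bits of input_ts do not equal output_ts,
    i.e. when the two timestamps do not match.
    """
    low_mask = (1 << output_bits) - 1
    if (input_ts & low_mask) != output_ts:
        return None
    first_preset_bits = input_bits - output_bits      # e.g. 64 - 32 = 32
    high_mask = (1 << first_preset_bits) - 1
    # Cut from high to low: drop the low part and keep the high-order string.
    return (input_ts >> output_bits) & high_mask
```

With a 64-bit input timestamp whose upper half is `0xABCD` and whose lower half equals the output timestamp `0x1234`, the difference information is `0xABCD`.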
  • the media file includes audio and video frames and text, and the computer device caches the media file locally within a preset time.
  • the preset time can be 1 second or 0.5 second, but not limited to 1 second or 0.5 second. It can be set according to actual requirements.
  • the computer device can directly extract the display time stamp of the text from the locally cached media file.
  • the input time stamp and output time stamp of the audio and video frames in the media file can also be extracted from the locally cached media file.
  • the text matching the audio and video frame is obtained, and based on the text, the display time stamp corresponding to the text is obtained. That is, the display time stamp of the text matching the audio and video frame in the media file is acquired.
  • a text acquisition instruction is received; based on the text acquisition instruction, a display time stamp of the text in the media file is acquired.
  • the text acquisition instruction may be triggered by a user's control operation on the computer device.
  • the input timestamp and the output timestamp are fused to obtain the target timestamp.
  • the first way, as shown in Figure 5, can be specifically as follows:
  • the preset condition is determined by the number of digits of the output timestamp and of the input timestamp. For example, when the output timestamp is 32 bits and the input timestamp is 64 bits, the preset condition is the upper 32-bit string of the input timestamp; when the output timestamp is 32 bits and the input timestamp is 128 bits, the preset condition is the upper 96-bit string of the input timestamp.
  • the preset condition can also be the same as the difference information, for example, the difference information is a character string in the input timestamp, the difference information can be filtered out from the input timestamp, and the difference information can be used as the target character string.
  • the target string and the output timestamp may be spliced in order from high to low, that is, the output timestamp is spliced after the target string.
  • the difference information can be compared with the target character string; if the difference information is the same as the target character string, the target character string is spliced with the output timestamp to obtain the target timestamp.
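The first fusion approach can be sketched as follows, again treating timestamps as unsigned integers. The function name `fuse_timestamps` is hypothetical, and the output timestamp is assumed to fit within `output_bits`.

```python
from typing import Optional


def fuse_timestamps(input_ts: int, output_ts: int, difference_info: int,
                    output_bits: int = 32) -> Optional[int]:
    """Splice the high-order target string of input_ts in front of output_ts.

    Returns None when the target string differs from the difference information.
    Assumption: output_ts already fits in output_bits.
    """
    target_string = input_ts >> output_bits        # high-order part of the input timestamp
    if target_string != difference_info:
        return None
    # High-to-low splice: the output timestamp follows the target string.
    return (target_string << output_bits) | output_ts
```

For example, a target string of `0xABCD` spliced with a 32-bit output timestamp `0x1300` yields the 64-bit target timestamp `0x0000ABCD00001300`.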
  • the second way, as shown in Figure 6, can be specifically as follows:
  • for example, when the input timestamp is 128 bits and the output timestamp is 32 bits, the target string filtered from the input timestamp is the high 96-bit string of the input timestamp; this high 96-bit string is spliced with the output timestamp to obtain the spliced timestamp, which is a 128-bit string.
  • for example, when the spliced timestamp is a 128-bit character string and the display timestamp is a 64-bit character string, the lower 64-bit character string, that is, the third character string, is extracted from the spliced timestamp.
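The second fusion approach adds a truncation step so that the target timestamp has the same width as the display timestamp. The sketch below uses unsigned integers and a hypothetical function name; the default widths follow the 128-bit, 32-bit, and 64-bit example in the text.

```python
def fuse_and_truncate(input_ts: int, output_ts: int,
                      output_bits: int = 32,
                      display_bits: int = 64) -> int:
    """Splice the high-order part of input_ts with output_ts, then keep the
    low-order display_bits of the result as the target timestamp."""
    target_string = input_ts >> output_bits                 # high 96 bits of a 128-bit input
    spliced = (target_string << output_bits) | output_ts    # 128-bit spliced timestamp
    display_mask = (1 << display_bits) - 1
    return spliced & display_mask                           # third character string
```

Because the result is masked to `display_bits`, it can be compared directly with a 64-bit display timestamp.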
  • the input timestamp and the output timestamp are fused according to the difference information, and after the target timestamp is obtained, the input timestamp can be deleted from the data set, so that the space of the data set can be saved.
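One way to picture the data set is as a small buffer that holds input timestamps until fusion has finished. The class below is a hypothetical sketch; the source does not specify the data structure, and a double-ended queue is chosen here only for illustration.

```python
from collections import deque
from typing import Deque


class InputTimestampBuffer:
    """Holds input timestamps between reception and fusion."""

    def __init__(self) -> None:
        self._pending: Deque[int] = deque()

    def store(self, input_ts: int) -> None:
        """Buffer an input timestamp when the frame arrives at the receiving hardware."""
        self._pending.append(input_ts)

    def delete(self, input_ts: int) -> bool:
        """Remove a timestamp once its target timestamp has been obtained.

        Returns False if the timestamp was not in the buffer.
        """
        try:
            self._pending.remove(input_ts)
        except ValueError:
            return False
        return True

    def __len__(self) -> int:
        return len(self._pending)
```

Deleting each entry after fusion keeps the buffer from growing for the duration of playback.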
  • the display timestamps may also be traversed based on the target timestamp.
  • the first preset threshold may be 0.1 ms, but not limited to 0.1 ms, which may be determined according to actual requirements.
  • after the display timestamps are traversed, specifically, if the difference between the target timestamp and a display timestamp is less than the first preset threshold, the target display timestamp corresponding to the target timestamp is screened out from the display timestamps, and the text corresponding to the target display timestamp is determined; the audio and video frame and the text corresponding to the target display timestamp are played.
  • if the difference between the target timestamp and the display timestamp is greater than or equal to the first preset threshold, the audio and video frame is played.
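The playback decision can be sketched as a traversal over the display timestamps. The names below are hypothetical, timestamps are assumed to be in milliseconds, the difference is assumed to be absolute, and the text entries are assumed to be keyed by their display timestamp.

```python
from typing import Dict, Optional


def select_text(target_ts: float, texts_by_display_ts: Dict[float, str],
                first_threshold: float = 0.1) -> Optional[str]:
    """Traverse the display timestamps and return the text whose display
    timestamp lies within first_threshold of the target timestamp.

    Returns None when no display timestamp is close enough, in which case
    only the audio and video frame would be played.
    """
    for display_ts, text in texts_by_display_ts.items():
        if abs(target_ts - display_ts) < first_threshold:
            return text
    return None
```

A caller would play the frame together with the returned text when one is found, and play the frame alone otherwise.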
  • the embodiment of the present application adopts the method of setting the number of digits of the target time stamp of the audio and video frame to be the same as the number of digits of the display time stamp of the text, so that the audio and video frame and the text can be played synchronously.
  • the difference information between the input timestamp and the output timestamp of the audio and video frame in the media file may be obtained, and the display timestamp of the text in the media file may be obtained; according to the difference information, the input timestamp and the output timestamp are fused to obtain the target timestamp; if the difference between the target timestamp and the display timestamp is less than the preset threshold, the audio and video frame and the text are played.
  • the embodiment of the present application can use the difference information between the input timestamp and the output timestamp of the audio and video frame to determine the target timestamp, compare the target timestamp with the display timestamp of the text, and then judge the difference between the target timestamp and the display timestamp, so that audio and video frames and text can be played accurately and in synchronization.
  • the following takes as an example the case where the media file playing device is integrated in a computer device, the computer device is a smart TV, and the method is applied to a live broadcast scene.
  • the smart TV receives a playback instruction for the media file.
  • the play instruction can be triggered by the user by operating a control button on the remote controller.
  • the remote controller is paired with the smart TV.
  • the smart TV acquires the input time stamp and the output time stamp of the audio and video frames in the media file.
  • after the smart TV obtains the input timestamp, it can:
  • obtain the standard time stamp, which is the time stamp of UTC; compare the standard time stamp with the input time stamp; if the difference between the input time stamp and the standard time stamp is less than the second preset threshold, determine that the input time stamp is a valid time stamp.
  • the smart TV can also store the input time stamp in the data set.
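  • after the smart TV obtains the output timestamp, it can also compare the standard time stamp with the output time stamp; if the difference between the output time stamp and the standard time stamp is less than the third preset threshold, determine that the output time stamp is a valid time stamp; if both the input time stamp and the output time stamp are valid, compare the input time stamp with the output time stamp to obtain the difference information.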
  • the third preset threshold may be 0.1 ms.
  • the smart TV compares the input timestamp with the output timestamp to obtain difference information.
  • the smart TV compares the input timestamp with the output timestamp to obtain difference information, which can be:
  • the smart TV compares the input time stamp with the output time stamp; if the input time stamp matches the output time stamp, it selects from the input time stamp the first character string satisfying the first preset number of digits, and determines the first character string as the difference information.
  • the smart TV compares the input timestamp with the output timestamp, which can be:
  • for example, when the output timestamp is 32 bits, the second preset number of digits is the lower 32 bits of the input timestamp, and the second character string is the lower 32 bits of the input timestamp. The lower 32 bits of the input timestamp are compared with the output timestamp, and if they are the same, it is determined that the input timestamp matches the output timestamp.
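  • the smart TV selects from the input timestamp the first character string that satisfies the first preset number of digits, which may specifically be: cutting the input timestamp in order from high bits to low bits; selecting from the cut input timestamp the high-order character string satisfying the first preset number of digits; and determining that high-order character string as the first character string.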
  • the smart TV gets the display timestamp of the text in the media file.
  • the smart TV obtains the display time stamp of the text in the media file, and may:
  • a text acquisition instruction is received; based on the text acquisition instruction, a display time stamp of the text in the media file is acquired.
  • according to the difference information, the smart TV selects from the input time stamp the target character string satisfying the preset condition.
  • the smart TV splices the target character string and the output timestamp to obtain the target timestamp.
  • for example, the input time stamp is 64 bits, the output time stamp is 32 bits, and the preset condition is the upper 32-bit character string of the input time stamp, so the target character string obtained after selection is the upper 32-bit character string of the input time stamp. The upper 32-bit character string of the input timestamp is concatenated with the output timestamp to obtain the target timestamp, which is a 64-bit timestamp.
  • the difference information when the difference information is a character string, the difference information can be compared with the target character string; if the difference information is the same as the target character string, the target character string is concatenated with the output timestamp to obtain the target timestamp.
  • optionally, the smart TV selects from the input time stamp, according to the difference information, the target character string satisfying the preset condition; splices the target character string and the output time stamp to obtain the spliced time stamp; selects from the spliced time stamp, in order from low bits to high bits, the third character string with the same number of digits as the display time stamp; and determines the third character string as the target time stamp.
  • for example, the input timestamp is 128 bits, the output timestamp is 32 bits, the display timestamp is 64 bits, and the preset condition is the upper 96-bit character string of the input timestamp, so the target character string obtained after selection is the upper 96-bit character string of the input timestamp. The upper 96-bit character string of the input timestamp is spliced with the output timestamp to obtain the spliced timestamp, that is, a 128-bit timestamp. The lower 64-bit character string, which has the same number of digits as the display timestamp, is then selected from the spliced timestamp; this character string is the third character string, that is, the target timestamp.
  • the input time stamp may be deleted from the data set.
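  • if the difference between the target time stamp and the display time stamp is less than the first preset threshold, the smart TV plays the audio and video frames and the text.
  • specifically, if the difference between the target time stamp and the display time stamp is less than the first preset threshold, the target display time stamp corresponding to the target time stamp is selected from the display time stamps, and the text corresponding to the target display time stamp is determined; the audio and video frame and the text corresponding to the target display time stamp are played.
  • if the difference between the target time stamp and the display time stamp is greater than or equal to the first preset threshold, the smart TV plays the audio and video frames.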
  • the first preset threshold may be 0.1 ms.
  • step 204 may be after step 202 and before or after any step before step 207 .
  • the following example, shown in FIG. 8, assumes that the media file playing device is integrated in a smart TV, the text is CC subtitles, the input time stamp of the audio and video frames is 64 bits, the output time stamp of the audio and video frames is 32 bits, and the display time stamp of the CC subtitles is 64 bits.
  • the dataset is used to store input timestamps.
  • the memory can store 2 seconds of input time stamps. For example, 60 Hz audio and video has 120 audio and video frames within 2 seconds, and each frame corresponds to one set of input time stamps, so the memory can store 120 sets of input time stamps. This memory serves as the storage for the data set of input time stamps.
  • the input timestamp can be stored in the data set in the form of an array.
  • the input end is the input end of the smart TV input hardware.
  • Input timestamps can be obtained by checking the time at the input.
  • the specific process of judging whether the 64-bit input timestamp is valid can be: obtain the standard timestamp, which is the timestamp of UTC; compare the standard timestamp with the input timestamp; if the difference between the input timestamp and the standard timestamp is smaller than the second preset threshold, determine that the input timestamp is a valid timestamp.
  • the second preset threshold is 0.1 ms.
  • the output end is an output end of the smart TV output hardware.
  • the output timestamp can be obtained by detecting the time of the output terminal.
  • the specific process of judging whether the 32-bit output time stamp is valid can be: comparing the standard time stamp with the output time stamp; if the difference between the output time stamp and the standard time stamp is less than the third preset threshold, determining that the output time stamp is a valid time stamp.
  • the third preset threshold is 0.1 ms.
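  • if the lower 32 bits of the 64-bit input timestamp are the same as the 32-bit output timestamp, step S8 is executed; if not, step S7 is executed.
  • if the difference between the target time stamp and the display time stamp is less than the first preset threshold, step S11 is executed.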
  • the first preset threshold is 0.1 ms.
  • in response to a play instruction for the media file, the difference information between the input time stamp and the output time stamp of the audio and video frame in the media file may be obtained, and the display time stamp of the text in the media file may be obtained; the input timestamp and the output timestamp are fused according to the difference information to obtain the target timestamp; if the difference between the target timestamp and the display timestamp is less than the preset threshold, the audio and video frames and text are played.
  • because the embodiment of the present application can use the difference information between the input time stamp and the output time stamp of the audio and video frame to determine the target time stamp, compare the target time stamp with the display time stamp of the text, and then evaluate the difference between the two, audio and video frames and text can be played accurately and in sync.
  • to better implement the above method, the embodiment of the present application also provides a media file playing device, which can be integrated into computer equipment such as a server or a terminal; the terminal can include a smart TV, a tablet computer, a laptop and/or a personal computer, etc.
  • the media file playback device may include a response unit 301, a fusion unit 302, and a playback unit 303, as follows:
  • the response unit 301 is configured to, in response to a play instruction for the media file, obtain the difference information between the input time stamp and the output time stamp of the audio and video frames in the media file, and obtain the display time stamp of the text in the media file.
  • the response unit 301 can specifically be configured to compare the input timestamp with the output timestamp; if the input timestamp matches the output timestamp, select from the input timestamp the first character string that satisfies the first preset number of digits; and determine the first character string as the difference information.
  • the response unit 301 may be specifically configured to select from the input time stamp a second character string that satisfies a second preset number of digits; compare the second character string with the output time stamp; and, if the second character string is the same as the output time stamp, determine that the input time stamp matches the output time stamp.
  • the response unit 301 can specifically be configured to cut the input time stamp in order from high bits to low bits; select from the cut input time stamp the high-order character string satisfying the first preset number of digits; and determine that high-order character string as the first character string.
  • the response unit 301 can be specifically configured to obtain an input timestamp according to the time when the audio and video frames in the media file are input to the receiving hardware; obtain an output timestamp according to the time when the audio and video frames in the media file are decoded by the output hardware; According to the input timestamp and the output timestamp, get the difference information between the input timestamp and the output timestamp.
  • the response unit 301 can specifically be used to obtain a standard time stamp, which is a time stamp of UTC; compare the standard time stamp with the input time stamp; if the difference between the input time stamp and the standard time stamp is less than the second preset threshold, determine that the input time stamp is a valid time stamp; compare the standard time stamp with the output time stamp; if the difference between the output time stamp and the standard time stamp is less than the third preset threshold, determine that the output time stamp is a valid time stamp; and, if both the input time stamp and the output time stamp are valid time stamps, compare the input time stamp with the output time stamp to obtain the difference information.
  • the response unit 301 may specifically be configured to receive a text acquisition instruction when the text is not activated; based on the text acquisition instruction, acquire the display time stamp of the text in the media file.
  • the fusion unit 302 is configured to perform fusion processing on the input timestamp and the output timestamp according to the difference information to obtain a target timestamp.
  • the fusion unit 302 may specifically be configured to filter out target character strings satisfying preset conditions from input time stamps according to difference information; and splicing target character strings and output time stamps to obtain target time stamps.
  • the fusion unit 302 can specifically be configured to compare the difference information with the target character string; if the difference information is the same as the target character string, splicing the target character string and the output time stamp to obtain the target time stamp.
  • the fusion unit 302 can be specifically configured to select from the input timestamp, according to the difference information, a target character string satisfying a preset condition; splice the target character string and the output timestamp to obtain a spliced timestamp; select from the spliced timestamp, in order from low bits to high bits, a third character string with the same number of digits as the display time stamp; and determine the third character string as the target time stamp.
  • the playing unit 303 is configured to play audio and video frames and text if the difference between the target time stamp and the display time stamp is less than a first preset threshold.
  • the playing unit 303 may specifically be configured to play the audio and video frame if the difference between the target time stamp and the display time stamp is greater than or equal to a first preset threshold.
  • the media file playing device also includes a storage unit 304, and the storage unit 304 is used for storing the input time stamp in the data set.
  • the media file playing device also includes a deletion unit 305, and the deletion unit 305 is used for deleting the input time stamp from the data set.
  • the media file playback device also includes a traversal unit 306, which can be used to traverse the display time stamps based on the target time stamp; the playing unit 303 can then be used to, if the difference between the target time stamp and the display time stamp is less than the first preset threshold, select from the display time stamps the target display time stamp corresponding to the target time stamp, determine the text corresponding to the target display time stamp, and play the audio and video frame and the text corresponding to the target display time stamp.
  • the response unit 301 of the embodiment of the present application can be used to, in response to the play instruction for the media file, obtain the difference information between the input time stamp and the output time stamp of the audio and video frame in the media file, and obtain the display time stamp of the text in the media file; the fusion unit 302 can be used to fuse the input timestamp and the output timestamp according to the difference information to obtain the target timestamp; the playback unit 303 can be used to play the audio and video frames and text if the difference between the target timestamp and the display timestamp is less than the preset threshold.
  • because the embodiment of the present application can use the difference information between the input time stamp and the output time stamp of the audio and video frame to determine the target time stamp, compare the target time stamp with the display time stamp of the text, and then evaluate the difference between the two, audio and video frames and text can be played accurately and in sync.
  • FIG. 10 shows a schematic structural diagram of the computer device involved in the embodiment of the present application, specifically:
  • the computer device may include a processor 401 of one or more processing cores, a memory 402 of one or more computer-readable storage media, a power supply 403, an input unit 404 and other components.
  • the computer device structure shown in FIG. 10 does not constitute a limitation on the computer device, which may include more or fewer components than shown in the figure, combine some components, or arrange the components differently, wherein:
  • the processor 401 is the control center of the computer device. It uses various interfaces and lines to connect the parts of the entire computer device and, by running or executing software programs and/or modules stored in the memory 402 and calling data stored in the memory 402, performs the various functions of the computer device and processes data, thereby monitoring the computer device as a whole.
  • the processor 401 may include one or more processing cores; preferably, the processor 401 may integrate an application processor and a modem processor, wherein the application processor mainly handles the operating system, user interfaces, computer programs and so on, and the modem processor mainly handles wireless communications. It can be understood that the foregoing modem processor may also not be integrated into the processor 401.
  • the memory 402 can be used to store software programs and modules, and the processor 401 executes various functional applications and data processing by running the software programs and modules stored in the memory 402 .
  • the memory 402 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and the computer programs required by at least one function (such as a sound playback function, an image playback function, etc.), and the data storage area may store data created through use of the computer device, etc.
  • the memory 402 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage devices.
  • the memory 402 may further include a memory controller to provide the processor 401 with access to the memory 402 .
  • the computer device also includes a power supply 403 for supplying power to each component.
  • the power supply 403 can be logically connected to the processor 401 through a power management system, so that functions such as charging, discharging, and power consumption management can be realized through the power management system.
  • the power supply 403 may also include one or more DC or AC power supplies, recharging systems, power failure detection circuits, power converters or inverters, power status indicators and other arbitrary components.
  • the computer device may also include an input unit 404, which may be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical or trackball signal input related to user settings and function control.
  • the computer device may also include a display unit, etc., which will not be repeated here.
  • the processor 401 in the computer device loads the executable files corresponding to the processes of one or more computer programs into the memory 402 according to the following instructions, and the processor 401 runs the computer programs stored in the memory 402, thereby realizing various functions, as follows:
  • in response to the play instruction for the media file, obtain the difference information between the input time stamp and the output time stamp of the audio and video frame in the media file, and obtain the display time stamp of the text in the media file; fuse the input time stamp and the output time stamp according to the difference information to obtain the target time stamp; if the difference between the target time stamp and the display time stamp is less than the first preset threshold, play the audio and video frame and the text.
  • an embodiment of the present application provides a storage medium, in which a computer program is stored, and the computer program can be loaded by a processor to execute any one of the media file playing methods provided in the embodiments of the present application.
  • the storage medium may include: read-only memory (ROM, Read Only Memory), random access memory (RAM, Random Access Memory), a magnetic disk or an optical disc, etc.
  • a computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer instruction from the computer-readable storage medium, and the processor executes the computer instruction, so that the computer device executes the methods provided in various optional implementation manners provided in the foregoing embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

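The present application discloses a media file playing method, device, computer equipment and storage medium. The input timestamp and the output timestamp of an audio and video frame in a media file can be fused to obtain a target timestamp; if the difference between the target timestamp and the display timestamp of the text in the media file is less than a preset threshold, the audio and video frame and the text are played, which improves the accuracy of synchronized playback of audio and video frames and text.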

Description

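Media file playing method, device, computer equipment and storage medium
Technical Field
The present application belongs to the technical field of multimedia processing, and in particular relates to a media file playing method, device, computer equipment and storage medium.
Background
At present, digital television under the ISDB-S3 standard transmits digital television data in TLV (Type Length Value) packets. TLV packets generally carry 4K ultra-high-definition programs, whose content involves a relatively large amount of data. A TLV packet generally contains key table information data, audio and video data, CC subtitle data and other data, but it does not contain an NTP packet. An NTP (Network Time Protocol) packet is used to synchronize the audio and video time with UTC (Coordinated Universal Time); therefore, the audio and video time of the digital television is not synchronized with UTC. The time of CC (Closed Caption) subtitles, however, is synchronized with UTC. As a result, the audio and video cannot be played in sync with the CC subtitles, that is, there is a certain delay between the playback progress of the audio and video and that of the CC subtitles.
Technical Problem
The audio and video cannot be played in sync with the CC subtitles, that is, there is a certain delay between the playback progress of the audio and video and that of the CC subtitles.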
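Technical Solution
Embodiments of the present application provide a media file playing method, device, computer equipment and storage medium that enable audio and video frames and text to be played accurately.
A media file playing method, comprising:
in response to a play instruction for a media file, obtaining difference information between the input timestamp and the output timestamp of an audio and video frame in the media file, and obtaining the display timestamp of text in the media file;
fusing the input timestamp and the output timestamp according to the difference information to obtain a target timestamp;
if the difference between the target timestamp and the display timestamp is less than a first preset threshold, playing the audio and video frame and the text.
Correspondingly, an embodiment of the present application provides a media file playing device, comprising:
a response unit, configured to, in response to a play instruction for a media file, obtain difference information between the input timestamp and the output timestamp of an audio and video frame in the media file, and obtain the display timestamp of text in the media file;
a fusion unit, configured to fuse the input timestamp and the output timestamp according to the difference information to obtain a target timestamp;
a playing unit, configured to play the audio and video frame and the text if the difference between the target timestamp and the display timestamp is less than a first preset threshold.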
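In some embodiments, the response unit may further be configured to compare the input timestamp with the output timestamp; if the input timestamp matches the output timestamp, select from the input timestamp a first character string satisfying a first preset number of digits; and determine the first character string as the difference information.
In some embodiments, the response unit may further be configured to select from the input timestamp a second character string satisfying a second preset number of digits; compare the second character string with the output timestamp; and, if the second character string is the same as the output timestamp, determine that the input timestamp matches the output timestamp.
In some embodiments, the response unit may further be configured to cut the input timestamp in order from high bits to low bits; select from the cut input timestamp the high-order character string satisfying the first preset number of digits; and determine that high-order character string as the first character string.
In some embodiments, the response unit may further be configured to obtain the input timestamp according to the time at which the audio and video frame in the media file enters the receiving hardware; obtain the output timestamp according to the time at which the audio and video frame in the media file finishes decoding in the output hardware; and obtain the difference information between the input timestamp and the output timestamp according to the two timestamps.
In some embodiments, the response unit may further be configured to obtain a standard timestamp, which is a UTC timestamp; compare the standard timestamp with the input timestamp; if the difference between the input timestamp and the standard timestamp is less than a second preset threshold, determine that the input timestamp is a valid timestamp; compare the standard timestamp with the output timestamp; if the difference between the output timestamp and the standard timestamp is less than a third preset threshold, determine that the output timestamp is a valid timestamp; and, if both the input timestamp and the output timestamp are valid timestamps, compare the input timestamp with the output timestamp to obtain the difference information.
In some embodiments, the media file playing device further includes a storage unit, which may be configured to store the input timestamp in a data set; the media file playing device further includes a deletion unit, which may be configured to delete the input timestamp from the data set.
In some embodiments, the fusion unit may further be configured to select from the input timestamp, according to the difference information, a target character string satisfying a preset condition; and splice the target character string with the output timestamp to obtain the target timestamp.
In some embodiments, the fusion unit may further be configured to compare the difference information with the target character string; and, if the difference information is the same as the target character string, splice the target character string with the output timestamp to obtain the target timestamp.
In some embodiments, the fusion unit may further be configured to select from the input timestamp, according to the difference information, a target character string satisfying a preset condition; splice the target character string with the output timestamp to obtain a spliced timestamp; select from the spliced timestamp, in order from low bits to high bits, a third character string with the same number of digits as the display timestamp; and determine the third character string as the target timestamp.
In some embodiments, the media file playing device further includes a traversal unit, which may be configured to traverse the display timestamps based on the target timestamp; the playing unit may further be configured to, if the difference between the target timestamp and the display timestamp is less than the first preset threshold, select from the display timestamps the target display timestamp corresponding to the target timestamp, determine the text corresponding to the target display timestamp, and play the audio and video frame and the text corresponding to the target display timestamp.
In some embodiments, the playing unit may further be configured to play the audio and video frame if the difference between the target timestamp and the display timestamp is greater than or equal to the first preset threshold.
In some embodiments, the response unit may further be configured to receive a text acquisition instruction when the text is not activated, and obtain the display timestamp of the text in the media file based on the text acquisition instruction.
In addition, an embodiment of the present application further provides a computer device, including a memory and a processor; the memory stores a computer program, and the processor is configured to run the computer program in the memory to execute any of the media file playing methods provided by the embodiments of the present application.
In addition, an embodiment of the present application further provides a storage medium storing a computer program, the computer program being suitable for loading by a processor to execute any of the media file playing methods provided by the embodiments of the present application.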
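Beneficial Effects
In response to a play instruction for a media file, the embodiments of the present application can obtain the difference information between the input timestamp and the output timestamp of an audio and video frame in the media file, and obtain the display timestamp of the text in the media file; fuse the input timestamp and the output timestamp according to the difference information to obtain a target timestamp; and, if the difference between the target timestamp and the display timestamp is less than a preset threshold, play the audio and video frame and the text. Because the embodiments of the present application can use the difference information between the input timestamp and the output timestamp of the audio and video frame to determine the target timestamp, compare the target timestamp with the display timestamp of the text, and then evaluate the difference between the two, audio and video frames and text can be played accurately.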
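Description of the Drawings
The technical solutions and beneficial effects of the present invention will become apparent from the following detailed description of specific embodiments of the present invention, taken together with the accompanying drawings.
FIG. 1 is a schematic diagram of a scenario of the media file playing method provided by an embodiment of the present application.
FIG. 2 is a first schematic flowchart of the media file playing method provided by an embodiment of the present application.
FIG. 3 is a schematic flowchart of obtaining the difference information in an embodiment of the present application.
FIG. 4 is a schematic flowchart of selecting, from the input timestamp, the first character string satisfying the first preset number of digits in an embodiment of the present application.
FIG. 5 is a first schematic flowchart of fusing the input timestamp and the output timestamp in an embodiment of the present application.
FIG. 6 is a second schematic flowchart of fusing the input timestamp and the output timestamp in an embodiment of the present application.
FIG. 7 is a second schematic flowchart of the media file playing method provided by an embodiment of the present application.
FIG. 8 is a third schematic flowchart of the media file playing method provided by an embodiment of the present application.
FIG. 9 is a schematic structural diagram of the media file playing device provided by an embodiment of the present application.
FIG. 10 is a schematic structural diagram of the computer device provided by an embodiment of the present application.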
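Embodiments of the Invention
Please refer to the drawings, in which the same reference symbols denote the same components. The principles of the embodiments of the present application are illustrated by implementation in a suitable computing environment. The following description is based on the illustrated specific embodiments of the present application and should not be regarded as limiting other specific embodiments not detailed here.
Embodiments of the present application provide a media file playing method, device, computer equipment and storage medium. The media file playing device may be integrated in a computer device, which may be a server, a terminal or another device.
The server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery network (CDN) services, and big data and artificial intelligence platforms. The terminal may be a smartphone, tablet computer, laptop, desktop computer, smart speaker, smart watch, smart TV and so on, but is not limited to these. The terminal and the server may be connected directly or indirectly through wired or wireless communication, which is not limited in the present application.
For example, referring to FIG. 1 and taking the case where the media file playing device is integrated in a computer device, the computer device can, in response to a play instruction for a media file, obtain the difference information between the input timestamp and the output timestamp of an audio and video frame in the media file and obtain the display timestamp of the text in the media file; then fuse the input timestamp and the output timestamp according to the difference information to obtain a target timestamp; and then compare the target timestamp with the display timestamp. If the difference between the target timestamp and the display timestamp is less than the first preset threshold, the audio and video frame and the text are played, so that the audio and video frame and the text are played accurately.
The play instruction may be triggered by the user performing a target operation on the computer device. For example, when the computer device is a smart TV, the user triggers the play instruction by pressing the play button on the remote controller, which is paired with the smart TV.
There are several ways to fuse the input timestamp and the output timestamp. For example, the input timestamp may be cut and the cut input timestamp then spliced with the output timestamp. As another example, the input timestamp may be cut, the cut input timestamp spliced with the output timestamp to obtain a spliced timestamp, and the spliced timestamp then cut again to obtain the target timestamp.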
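As shown in FIG. 2, the specific flow of the media file playing method is as follows:
101. In response to a play instruction for a media file, obtain the difference information between the input timestamp and the output timestamp of an audio and video frame in the media file, and obtain the display timestamp of the text in the media file.
The play instruction is triggered by the user performing a target operation on the computer device. For example, when the computer device is a smart TV, the user triggers the play instruction by pressing the play button on the remote controller, which is paired with the smart TV. As another example, when the computer device is a computer, the user triggers the play instruction by voice control.
The text may be subtitles, textual information explaining the content of the audio and video frames, or bullet-screen comments.
Specifically, obtaining the difference information between the input timestamp and the output timestamp of the audio and video frame in the media file, as shown in FIG. 3, may be as follows:
A1. Obtain the input timestamp of the audio and video frame in the media file, and obtain the output timestamp of the audio and video frame in the media file.
Specifically, the input timestamp may be obtained according to the time at which the audio and video frame in the media file enters the receiving hardware.
Specifically, the output timestamp may be obtained according to the time at which the audio and video frame in the media file finishes decoding in the output hardware.
The computer device includes receiving hardware, input hardware and output hardware. When audio and video enters the computer device from outside, the computer device receives it through the signal receiving end of the receiving hardware and generates the input timestamp of the audio and video frame according to the time of reception. The audio and video and the input timestamp are then transmitted to the input hardware; the bit width of the audio and video and of the input timestamp is the same as that of the input hardware. Next, the audio and video is processed by the input hardware, and the processed audio and video is transmitted to the output hardware and decoded there; the bit width of the decoded audio and video is the same as that of the output hardware. The output hardware displays the decoded audio and video on the display interface of the computer device. After decoding the processed audio and video, the output hardware generates, from the decoded audio and video, the output timestamp corresponding to the decoded audio and video frame.
The input hardware has an input end, also called the TA end, through which the system kernel of the computer device can exchange information with the input hardware, so that the computer device can obtain the input timestamp from the input end. The output hardware has an output end, also called the firmware end, through which the system kernel of the computer device can exchange information with the output hardware, so that the computer device can obtain the output timestamp through the output end.
Because the bit width of the input hardware differs from that of the output hardware (the input hardware is generally wider), the input timestamp obtained from the input end (TA end) of the input hardware and the output timestamp obtained from the output end (firmware end) of the output hardware have different numbers of bits. Depending on the bit width of the audio and video before it enters the computer device, the input hardware generally uses a 64-bit or 128-bit width to match the audio and video; the input hardware and the receiving hardware generally have the same bit width; and the output hardware generally uses a 32-bit width. The input hardware may be OPTEE (Open Portable Trusted Execution Environment) hardware, the output hardware may be hardware integrating an audio hardware decoder and a video hardware decoder, and the receiving hardware may be a tuner or a demodulator.
In this process, the system kernel of the computer device exchanges information with the output hardware and with the input hardware. The input hardware has its own kernel, referred to here as the first sub-kernel, and the output hardware has its own kernel, referred to here as the second sub-kernel. The system kernel of the computer device therefore exchanges information with the input hardware through shared memory (ShareMemory) and with the output hardware through inter-process communication (RPC).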
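In addition, specifically, after obtaining the input timestamp, the embodiment of the present application may also obtain a standard timestamp, which is a UTC timestamp, and compare the standard timestamp with the input timestamp. If the difference between the input timestamp and the standard timestamp is less than the second preset threshold, the input timestamp is determined to be a valid timestamp; if the difference is greater than or equal to the second preset threshold, the input timestamp is determined to be an invalid timestamp.
The second preset threshold may be 0.1 ms and can be set according to actual requirements.
After obtaining the output timestamp, the embodiment of the present application may also compare the standard timestamp with the output timestamp. If the difference between the output timestamp and the standard timestamp is less than the third preset threshold, the output timestamp is determined to be a valid timestamp; if both the input timestamp and the output timestamp are valid timestamps, the input timestamp is compared with the output timestamp to obtain the difference information.
The third preset threshold may be 0.1 ms and can be set according to actual requirements.
By comparing the standard timestamp with the input timestamp, the embodiment of the present application ensures the validity of the input timestamp; by comparing the standard timestamp with the output timestamp, it ensures the validity of the output timestamp. This ensures the accuracy of both timestamps.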
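The validity check described above can be sketched as follows. This is a minimal illustration rather than the applicant's implementation: it assumes timestamps are integers in microseconds (so 0.1 ms becomes 100), takes the absolute difference, and uses the hypothetical names `is_valid_timestamp` and `both_valid`, none of which appear in the application.

```python
def is_valid_timestamp(ts_us: int, standard_us: int, threshold_us: int = 100) -> bool:
    """Return True when ts_us is within threshold_us of the standard (UTC) timestamp.

    Assumption: all values are integer microseconds; 100 us corresponds to 0.1 ms.
    A difference greater than or equal to the threshold counts as invalid.
    """
    return abs(ts_us - standard_us) < threshold_us


def both_valid(input_us: int, output_us: int, standard_us: int,
               second_threshold_us: int = 100, third_threshold_us: int = 100) -> bool:
    """The input/output comparison only proceeds when both timestamps are valid."""
    return (is_valid_timestamp(input_us, standard_us, second_threshold_us)
            and is_valid_timestamp(output_us, standard_us, third_threshold_us))
```

The two thresholds are kept as separate parameters because the text treats the second and third preset thresholds as independently configurable.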
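In addition, specifically, after obtaining the input timestamp, the embodiment of the present application stores the input timestamp in a data set. Because there is a delay between the audio and video frame of the media file entering at the signal receiving end of the receiving hardware and its transmission to the input hardware, storing the input timestamp in the data set allows the input timestamps to be adequately buffered.
A2. Compare the input timestamp with the output timestamp.
The input timestamp and the output timestamp have different numbers of bits. For example, the input timestamp is 64 bits and the output timestamp is 32 bits; or the input timestamp is 128 bits and the output timestamp is 32 bits. The numbers of bits of the input timestamp and the output timestamp are determined by the hardware and chips of the computer device.
Specifically, a second character string satisfying a second preset number of digits is selected from the input timestamp and compared with the output timestamp. If the second character string is the same as the output timestamp, the input timestamp is determined to match the output timestamp; if the second character string differs from the output timestamp, the input timestamp is determined not to match the output timestamp.
The second preset number of digits may be consistent with the number of bits of the output timestamp. It may be the last 32 bits of the input timestamp, that is, the lower 32 bits, or the last 64 bits, that is, the lower 64 bits. For example, when the output timestamp is 32 bits, the second preset number of digits is the lower 32 bits of the input timestamp; when the output timestamp is 64 bits, it is the lower 64 bits of the input timestamp.
A3. If the input timestamp matches the output timestamp, select from the input timestamp the first character string satisfying the first preset number of digits.
The first preset number of digits may be the first 96 bits of the input timestamp (the upper 96 bits), the first 32 bits (the upper 32 bits), or the first 64 bits (the upper 64 bits). It can be determined from the numbers of bits of the input timestamp and the output timestamp, and may equal the difference between the two. For example, when the input timestamp is 128 bits and the output timestamp is 32 bits, the first preset number of digits is the upper 96 bits of the input timestamp; when the input timestamp is 64 bits and the output timestamp is 32 bits, it is the upper 32 bits; when the input timestamp is 128 bits and the output timestamp is 64 bits, it is the upper 64 bits.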
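Specifically, selecting from the input timestamp the first character string satisfying the first preset number of digits, as shown in FIG. 4, may be as follows:
B1. Cut the input timestamp in order from high bits to low bits.
For example, when the first preset number of digits is the upper 32 bits of the input timestamp and the input timestamp is 64 bits, the input timestamp is cut in order from high bits to low bits into an upper 32-bit character string and a lower 32-bit character string.
For example, when the first preset number of digits is the upper 96 bits of the input timestamp and the input timestamp is 128 bits, the input timestamp is cut in order from high bits to low bits into an upper 96-bit character string and a lower 32-bit character string.
B2. Select from the cut input timestamp the high-order character string satisfying the first preset number of digits.
For example, when the first preset number of digits is the upper 32 bits of the input timestamp, the upper 32-bit character string of the input timestamp is selected from the cut input timestamp; when the first preset number of digits is the upper 96 bits of the input timestamp, the upper 96-bit character string of the input timestamp is selected from the cut input timestamp.
B3. Determine the high-order character string of the first preset number of digits as the first character string.
Of course, besides the cutting method above, the input timestamp may also be cut in order from low bits to high bits, for example by cutting the character string in the input timestamp one character at a time. The high-order character string satisfying the first preset number of digits is then selected from the cut input timestamp one character at a time, in order from high bits to low bits. For example, when the first preset number of digits is the upper 32 bits of the input timestamp, the upper 32-bit character string of the input timestamp is selected one character at a time in order from high bits to low bits.
A4. Determine the first character string as the difference information.
The first character string may be a 32-bit, 96-bit or 64-bit character string. More specifically, it may be the first 32 bits of the input timestamp (the upper 32-bit character string), the first 96 bits (the upper 96-bit character string), or the first 64 bits (the upper 64-bit character string).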
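Steps A2 to A4 can be illustrated with a short sketch. It assumes, for illustration only, that the "character strings" are bit fields of unsigned integers, so that cutting and selecting become masks and shifts; the function names `low_bits` and `difference_info` are hypothetical.

```python
from typing import Optional


def low_bits(value: int, n: int) -> int:
    """Return the lowest n bits of value."""
    return value & ((1 << n) - 1)


def difference_info(input_ts: int, output_ts: int,
                    input_bits: int = 64, output_bits: int = 32) -> Optional[int]:
    """Return the high (input_bits - output_bits) bits of input_ts, or None.

    A2: the second character string is the low output_bits bits of the input
        timestamp; the timestamps match when it equals the output timestamp.
    A3/A4: on a match, the first character string (the remaining high bits)
        is taken as the difference information.
    """
    if not 0 <= input_ts < (1 << input_bits):
        raise ValueError("input_ts does not fit in input_bits")
    if low_bits(input_ts, output_bits) != output_ts:
        return None  # no match, so no difference information
    return input_ts >> output_bits
```

For a 64-bit input timestamp `0x00000001ABCD1234` and a 32-bit output timestamp `0xABCD1234`, the low 32 bits match and the difference information is the upper 32 bits, `0x00000001`.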
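Specifically, there are several ways to obtain the display timestamp of the text in the media file.
For example, because the media file includes audio and video frames and text, the computer device caches the media file locally for a preset time after receiving it. The preset time may be 1 second or 0.5 seconds, but is not limited to these values and can be set according to actual requirements. The computer device can then extract the display timestamp of the text directly from the locally cached media file. Similarly, the input timestamp and the output timestamp of the audio and video frames can also be extracted from the locally cached media file.
For example, the text matching an audio and video frame is obtained according to that frame, and the display timestamp corresponding to the text is obtained based on the text. That is, the display timestamp of the text in the media file that matches the audio and video frame is obtained.
As a further example, when the text is not activated, a text acquisition instruction is received, and the display timestamp of the text in the media file is obtained based on the text acquisition instruction.
The text acquisition instruction may be triggered by a control operation performed by the user on the computer device.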
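102. Fuse the input timestamp and the output timestamp according to the difference information to obtain the target timestamp.
There are several ways to fuse the input timestamp and the output timestamp, which may be as follows:
The first way, as shown in FIG. 5, may be as follows:
C1. According to the difference information, select from the input timestamp the target character string satisfying the preset condition.
The preset condition is determined by the number of bits of the output timestamp and the number of bits of the input timestamp. For example, when the output timestamp is 32 bits and the input timestamp is 64 bits, the preset condition is the first 32 bits of the input timestamp, that is, its upper 32-bit character string. When the output timestamp is 32 bits and the input timestamp is 128 bits, the preset condition is the first 96 bits of the input timestamp, that is, its upper 96-bit character string.
Of course, the preset condition may also be equality with the difference information. For example, when the difference information is a character string within the input timestamp, the difference information can be selected from the input timestamp and used as the target character string.
C2. Splice the target character string with the output timestamp to obtain the target timestamp.
Specifically, the target character string and the output timestamp may be spliced in order from high bits to low bits, that is, the output timestamp is appended after the target character string.
Specifically, the difference information may be compared with the target character string; if the difference information is the same as the target character string, the target character string is spliced with the output timestamp to obtain the target timestamp.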
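The first fusion method (C1 and C2) can be sketched as below, again under the assumption that timestamps are unsigned integers and that splicing means placing the output timestamp in the low bits beneath the target string. The name `fuse_by_splicing` is hypothetical.

```python
from typing import Optional


def fuse_by_splicing(input_ts: int, output_ts: int, diff: int,
                     output_bits: int = 32) -> Optional[int]:
    """Fuse a wide input timestamp with a narrower output timestamp.

    C1: the target string is the high part of input_ts above output_bits.
    C2: if the difference information equals the target string, the output
        timestamp is appended after it (high bits first, low bits last).
    Returns None when the difference information does not match.
    """
    target_high = input_ts >> output_bits
    if diff != target_high:
        return None
    return (target_high << output_bits) | output_ts
```

With a 64-bit input timestamp and a 32-bit output timestamp the result is a 64-bit target timestamp, which has the same width as the 64-bit display timestamp it is later compared against.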
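The second way, as shown in FIG. 6, may be as follows:
D1. According to the difference information, select from the input timestamp the target character string satisfying the preset condition.
D2. Splice the target character string with the output timestamp to obtain the spliced timestamp.
For example, when the input timestamp is 128 bits and the output timestamp is 32 bits, the target character string selected from the input timestamp is the upper 96-bit character string of the input timestamp. The upper 96-bit character string of the input timestamp is spliced with the output timestamp to obtain the spliced timestamp, which is a 128-bit character string.
D3. In order from low bits to high bits, select from the spliced timestamp the third character string with the same number of digits as the display timestamp.
For example, when the spliced timestamp is a 128-bit character string and the display timestamp is 64 bits, the lower 64-bit character string, that is, the third character string, is extracted from the spliced timestamp.
In addition, specifically, after the input timestamp and the output timestamp are fused according to the difference information to obtain the target timestamp, the input timestamp may be deleted from the data set, which saves space in the data set.
D4. Determine the third character string as the target timestamp.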
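The second fusion method (D1 to D4) adds a truncation step so that the target timestamp has the same width as the display timestamp. A minimal sketch, assuming integer timestamps and the hypothetical name `fuse_and_truncate`:

```python
def fuse_and_truncate(input_ts: int, output_ts: int,
                      output_bits: int = 32, display_bits: int = 64) -> int:
    """Splice, then keep only as many low bits as the display timestamp has.

    D1/D2: take the high part of input_ts (for example the upper 96 bits of a
           128-bit value) and append the output timestamp to it.
    D3/D4: from the low end, keep display_bits bits as the target timestamp.
    """
    target_high = input_ts >> output_bits
    spliced = (target_high << output_bits) | output_ts
    return spliced & ((1 << display_bits) - 1)
```

For a 128-bit input timestamp, a 32-bit output timestamp and a 64-bit display timestamp, the spliced value is 128 bits wide and the returned target timestamp is its lower 64 bits.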
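In addition, after the target timestamp is determined, the display timestamps may be traversed based on the target timestamp.
103. If the difference between the target timestamp and the display timestamp is less than the first preset threshold, play the audio and video frame and the text.
The first preset threshold may be 0.1 ms, but is not limited to 0.1 ms; it can be determined according to actual requirements.
After the display timestamps are traversed based on the target timestamp, specifically, if the difference between the target timestamp and a display timestamp is less than the first preset threshold, the target display timestamp corresponding to the target timestamp is selected from the display timestamps, and the text corresponding to the target display timestamp is determined; the audio and video frame and the text corresponding to the target display timestamp are played.
In addition, specifically, if the difference between the target timestamp and the display timestamp is greater than or equal to the first preset threshold, the audio and video frame is played.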
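The traversal and playback decision in step 103 can be sketched as follows. The sketch assumes integer timestamps in microseconds (a threshold of 100 standing in for 0.1 ms), takes the absolute difference, and represents captions as `(display_ts, text)` pairs; `select_caption` and `play` are hypothetical names, and `play` simply returns what would be rendered.

```python
from typing import List, Optional, Tuple


def select_caption(target_ts: int, captions: List[Tuple[int, str]],
                   threshold: int = 100) -> Optional[str]:
    """Traverse display timestamps and return the text of the first one whose
    difference from target_ts is below the first preset threshold."""
    for display_ts, text in captions:
        if abs(target_ts - display_ts) < threshold:
            return text
    return None


def play(frame: str, target_ts: int, captions: List[Tuple[int, str]],
         threshold: int = 100) -> Tuple[str, Optional[str]]:
    """Return (frame, text) when a caption matches, otherwise (frame, None),
    which corresponds to playing the audio and video frame without text."""
    return frame, select_caption(target_ts, captions, threshold)
```

A frame whose target timestamp is 50 units away from a caption's display timestamp is paired with that caption, while a frame 100 or more units away from every caption is played on its own.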
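Because the timestamps of audio and video frames used by digital television under the ISDB-S3 standard have a different number of bits from the display timestamps of the text, the text cannot be played directly in sync with the audio and video frames. The embodiment of the present application therefore makes the number of bits of the target timestamp of the audio and video frame the same as that of the display timestamp of the text, so that the audio and video frame and the text can be played in sync.
Specifically, in response to a play instruction for a media file, the embodiment of the present application can obtain the difference information between the input timestamp and the output timestamp of an audio and video frame in the media file, and obtain the display timestamp of the text in the media file; fuse the input timestamp and the output timestamp according to the difference information to obtain a target timestamp; and, if the difference between the target timestamp and the display timestamp is less than the preset threshold, play the audio and video frame and the text. Because the embodiment of the present application can use the difference information between the input timestamp and the output timestamp of the audio and video frame to determine the target timestamp, compare the target timestamp with the display timestamp of the text, and then evaluate the difference between the two, audio and video frames and text can be played accurately and in sync.
The method described in the above embodiments is explained in further detail below by way of example.
In this embodiment, the description takes as an example the case where the media file playing device is integrated in a computer device, the computer device is a smart TV, and the method is applied to a live broadcast scenario.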
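As shown in FIG. 7, the specific flow of a media file playing method is as follows:
The smart TV receives a play instruction for a media file.
The play instruction may be triggered by the user operating a control button on the remote controller. The remote controller is paired with the smart TV.
In response to the play instruction, the smart TV obtains the input timestamp and the output timestamp of the audio and video frames in the media file.
Specifically, after obtaining the input timestamp, the smart TV may:
obtain a standard timestamp, which is a UTC timestamp; compare the standard timestamp with the input timestamp; and, if the difference between the input timestamp and the standard timestamp is less than the second preset threshold, determine that the input timestamp is a valid timestamp.
In addition, the smart TV may also store the input timestamp in the data set.
After obtaining the output timestamp, the smart TV may also:
compare the standard timestamp with the output timestamp; if the difference between the output timestamp and the standard timestamp is less than the third preset threshold, determine that the output timestamp is a valid timestamp; and, if both the input timestamp and the output timestamp are valid timestamps, compare the input timestamp with the output timestamp to obtain the difference information. The third preset threshold may be 0.1 ms.
The smart TV compares the input timestamp with the output timestamp to obtain the difference information.
Specifically, the smart TV comparing the input timestamp with the output timestamp to obtain the difference information may be:
the smart TV compares the input timestamp with the output timestamp; if the input timestamp matches the output timestamp, it selects from the input timestamp the first character string satisfying the first preset number of digits, and determines the first character string as the difference information.
Specifically, the smart TV comparing the input timestamp with the output timestamp may be:
selecting from the input timestamp the second character string satisfying the second preset number of digits; comparing the second character string with the output timestamp; and, if the second character string is the same as the output timestamp, determining that the input timestamp matches the output timestamp.
For example, when the output timestamp is 32 bits, the second preset number of digits is the lower 32 bits of the input timestamp, and the second character string is the lower 32 bits of the input timestamp. The lower 32 bits of the input timestamp are compared with the output timestamp; if they are the same, the input timestamp is determined to match the output timestamp.
Specifically, the smart TV selecting from the input timestamp the first character string satisfying the first preset number of digits may be:
cutting the input timestamp in order from high bits to low bits; selecting from the cut input timestamp the high-order character string satisfying the first preset number of digits; and determining that high-order character string as the first character string.
The smart TV obtains the display timestamp of the text in the media file.
Specifically, to obtain the display timestamp of the text in the media file, the smart TV may:
when the text is not activated, receive a text acquisition instruction, and obtain the display timestamp of the text in the media file based on the text acquisition instruction.
The text may have multiple display timestamps, and all display timestamps within the preset time can be obtained from the local cache of the computer device.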
智能电视机根据差异信息,在输入时间戳中筛选出满足预设条件的目标字符串。
智能电视机将目标字符串与输出时间戳进行拼接,得到目标时间戳。
例如,输入时间戳为64位,输入时间戳为32位,预设条件为输入时间戳的高32位字符串,经过筛选得到目标字符串为输入时间戳的高32位字符串。将输入时间戳的高32位字符串与输出时间戳进行拼接,得到目标时间戳,得到的目标时间戳为64位的时间戳。
具体地,当差异信息为字符串时,可以将差异信息与目标字符串进行对比;若差异信息与目标字符串相同,则将目标字符串与输出时间戳进行拼接,得到目标时间戳。
可选地,智能电视机根据差异信息,在输入时间戳中筛选出满足预设条件的目标字符串;将目标字符串与输出时间戳进行拼接,得到拼接后的时间戳;按照从低位到高位的顺序,从拼接后的时间戳中筛选出与显示时间戳位数相同的第三字符串;将第三字符串确定为目标时间戳。
例如,输入时间戳为128位,输入时间戳为32位,显示时间戳为64位,预设条件为输入时间戳的高96位字符串,经过筛选得到目标字符串为输入时间戳的高96位字符串。将输入时间戳的高96位字符串与输出时间戳进行拼接,得到拼接后的时间戳,即得到128位的时间戳。从拼接后的时间戳中筛选出低32位字符串,该低32位字符串即为第三字符串,也即为目标时间戳。
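In addition, after obtaining the target timestamp, the smart TV may delete the input timestamp from the data set.
If the difference between the target timestamp and the display timestamp is less than a first preset threshold, the smart TV plays the audio/video frame and the text.
Specifically, if the difference between the target timestamp and the display timestamp is less than the first preset threshold, the target display timestamp corresponding to the target timestamp is selected from the display timestamps, and the text corresponding to the target display timestamp is determined; the audio/video frame and the text corresponding to the target display timestamp are then played.
In addition, if the difference between the target timestamp and the display timestamp is greater than or equal to the first preset threshold, the smart TV plays the audio/video frame. The first preset threshold may be 0.1 ms.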
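The playback decision can be sketched as a search over the cached display timestamps. The sketch assumes timestamps expressed as floating-point milliseconds and captions held as `(display_ts_ms, text)` pairs; the names `select_text` and `play` are hypothetical, and returning a tuple stands in for the actual rendering.

```python
from typing import List, Optional, Tuple


def select_text(target_ms: float, captions: List[Tuple[float, str]],
                first_threshold_ms: float = 0.1) -> Optional[str]:
    """Return the text whose display timestamp is within the threshold, else None."""
    for display_ms, text in captions:
        if abs(target_ms - display_ms) < first_threshold_ms:
            return text
    return None


def play(frame: str, target_ms: float, captions: List[Tuple[float, str]]) -> Tuple[str, Optional[str]]:
    """Play the frame together with its text when one matches, otherwise the frame alone."""
    return frame, select_text(target_ms, captions)
```

When no display timestamp falls within the threshold, `play` returns the frame with `None` in place of text, which corresponds to playing the audio/video frame on its own.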
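Step 204 may be performed after step 202, and before or after any step that precedes step 207.
For the specific implementation of each of the above operations, refer to the preceding embodiments; details are not repeated here.
The following description takes as an example the case where the media file playing apparatus is integrated in a smart TV, the text is CC subtitles, the input timestamp of the audio/video frame is 64 bits, the output timestamp of the audio/video frame is 32 bits, and the display timestamp of the CC subtitles is 64 bits, as shown in FIG. 8.
S1. Construct the data set.
The data set is used to store input timestamps. Specifically, new memory is installed on the smart TV. This memory can hold 2 seconds of input timestamps; for example, for 60 Hz audio/video there are 120 audio/video frames in 2 seconds, and each audio/video frame corresponds to one set of input timestamps, so the memory can hold 120 sets of input timestamps. This memory serves as the storage for the data set of input timestamps.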
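The sizing of the data set follows directly from the frame rate: 60 frames per second over 2 seconds gives 120 entries. A minimal sketch using a bounded queue is shown below; the helper names are hypothetical, and evicting the oldest entry when the buffer is full is an assumption, since the embodiment does not say what happens on overflow.

```python
from collections import deque

FRAME_RATE_HZ = 60
WINDOW_SECONDS = 2
CAPACITY = FRAME_RATE_HZ * WINDOW_SECONDS  # 120 input timestamps


def make_data_set(capacity: int = CAPACITY) -> deque:
    """Create a buffer that holds at most `capacity` input timestamps."""
    return deque(maxlen=capacity)


def store(data_set: deque, input_ts: int) -> None:
    """Store a validated input timestamp (assumed: the oldest entry is dropped when full)."""
    data_set.append(input_ts)


def delete(data_set: deque, input_ts: int) -> None:
    """Remove an input timestamp once it has been matched and consumed."""
    if input_ts in data_set:
        data_set.remove(input_ts)
```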
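S2. Obtain the 64-bit input timestamp at the input end.
The input timestamp may be stored in the data set in the form of an array. The input end is the input end of the smart TV's input hardware. The input timestamp can be obtained by detecting the time at the input end.
S3. Determine whether the 64-bit input timestamp is valid; if it is valid, perform step S4; if it is invalid, perform step S2.
The specific process of determining whether the 64-bit input timestamp is valid may be: obtain a standard timestamp, the standard timestamp being a timestamp of Coordinated Universal Time (UTC); compare the standard timestamp with the input timestamp; and, if the difference between the input timestamp and the standard timestamp is less than the second preset threshold, determine that the input timestamp is a valid timestamp. The second preset threshold is 0.1 ms.
S4. Store the 64-bit input timestamp in the data set.
S5. Obtain the 32-bit output timestamp at the output end.
The output end is the output end of the smart TV's output hardware. The embodiments of the present application can obtain the output timestamp by detecting the time at the output end.
S6. Determine whether the 32-bit output timestamp is valid; if it is valid, perform step S7; if it is invalid, perform step S2.
The specific process of determining whether the 32-bit output timestamp is valid may be: compare the standard timestamp with the output timestamp; and, if the difference between the output timestamp and the standard timestamp is less than the third preset threshold, determine that the output timestamp is a valid timestamp. The third preset threshold is 0.1 ms.
S7. Compare the low-order 32 bits of the 64-bit input timestamp with the 32-bit output timestamp, one by one.
If the low-order 32 bits of the 64-bit input timestamp are identical to the 32-bit output timestamp, perform step S8; if they differ, perform step S7.
S8. Select the high-order 32 bits of the input timestamp, and delete the input timestamp from the data set.
S9. Splice the high-order 32 bits of the input timestamp with the 32-bit output timestamp to obtain the target timestamp.
S10. Compare the target timestamp with the display timestamp.
If the difference between the target timestamp and the display timestamp is less than the first preset threshold, perform step S11. The first preset threshold is 0.1 ms.
S11. Play the audio/video frame and the CC subtitles.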
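Steps S7 to S11 can be pulled together in one short sketch. It assumes integer timestamps measured in some common tick unit, a threshold expressed in the same ticks, and captions held as `(display_ts, text)` pairs; the validity checks of S3 and S6 are omitted, and the function name `process_output` is hypothetical.

```python
from collections import deque
from typing import List, Optional, Tuple

MASK_32 = 0xFFFFFFFF


def process_output(pending: deque, output_ts: int,
                   captions: List[Tuple[int, str]],
                   threshold_ticks: int) -> Tuple[Optional[int], Optional[str]]:
    """Match a 32-bit output timestamp against stored 64-bit input timestamps."""
    for input_ts in list(pending):
        # S7: compare the low-order 32 bits with the output timestamp.
        if (input_ts & MASK_32) == output_ts:
            # S8: take the high-order 32 bits and drop the entry from the data set.
            high = input_ts >> 32
            pending.remove(input_ts)
            # S9: splice into a 64-bit target timestamp.
            target = (high << 32) | output_ts
            # S10/S11: pick the caption whose display timestamp is close enough.
            for display_ts, text in captions:
                if abs(target - display_ts) < threshold_ticks:
                    return target, text
            return target, None
    # No stored input timestamp matched this output timestamp.
    return None, None
```

A caller would play the frame with the returned text when one is found, and play the frame alone otherwise.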
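In response to a play instruction for a media file, the embodiments of the present application can obtain difference information between the input timestamp and the output timestamp of an audio/video frame in the media file, and obtain the display timestamp of text in the media file; perform fusion processing on the input timestamp and the output timestamp according to the difference information to obtain a target timestamp; and, if the difference between the target timestamp and the display timestamp is less than a preset threshold, play the audio/video frame and the text. Because the embodiments determine the target timestamp from the difference information between the input timestamp and the output timestamp of the audio/video frame, compare the target timestamp with the display timestamp of the text, and then evaluate the difference between the two, the audio/video frame and the text can be played accurately and in synchronization.
To better implement the above method, the embodiments of the present application further provide a media file playing apparatus. The media file playing apparatus may be integrated in a computer device such as a server or a terminal, and the terminal may include a smart TV, a tablet computer, a notebook computer, and/or a personal computer.
For example, as shown in FIG. 9, the media file playing apparatus may include a response unit 301, a fusion unit 302, and a playing unit 303, as follows:
Response unit 301;
The response unit 301 is configured to, in response to a play instruction for a media file, obtain difference information between the input timestamp and the output timestamp of an audio/video frame in the media file, and obtain the display timestamp of text in the media file.
Optionally, the response unit 301 may specifically be configured to compare the input timestamp with the output timestamp; if the input timestamp matches the output timestamp, select from the input timestamp a first character string that satisfies a first preset number of bits; and determine the first character string to be the difference information.
Optionally, the response unit 301 may specifically be configured to select from the input timestamp a second character string that satisfies a second preset number of bits; compare the second character string with the output timestamp; and, if the second character string is identical to the output timestamp, determine that the input timestamp matches the output timestamp.
Optionally, the response unit 301 may specifically be configured to split the input timestamp in order from the high-order bits to the low-order bits; select from the split input timestamp a high-order character string that satisfies the first preset number of bits; and determine that high-order character string to be the first character string.
Optionally, the response unit 301 may specifically be configured to obtain the input timestamp according to the time at which the audio/video frame in the media file enters the receiving hardware; obtain the output timestamp according to the time at which decoding of the audio/video frame in the media file is completed in the output hardware; and obtain the difference information between the input timestamp and the output timestamp according to the input timestamp and the output timestamp.
Optionally, the response unit 301 may specifically be configured to obtain a standard timestamp, the standard timestamp being a timestamp of Coordinated Universal Time (UTC); compare the standard timestamp with the input timestamp; if the difference between the input timestamp and the standard timestamp is less than a second preset threshold, determine that the input timestamp is a valid timestamp; compare the standard timestamp with the output timestamp; if the difference between the output timestamp and the standard timestamp is less than a third preset threshold, determine that the output timestamp is a valid timestamp; and, if both the input timestamp and the output timestamp are valid timestamps, compare the input timestamp with the output timestamp to obtain the difference information.
Optionally, the response unit 301 may specifically be configured to receive a text acquisition instruction when the text is not activated; and, based on the text acquisition instruction, obtain the display timestamp of the text in the media file.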
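Fusion unit 302;
The fusion unit 302 is configured to perform fusion processing on the input timestamp and the output timestamp according to the difference information to obtain a target timestamp.
Optionally, the fusion unit 302 may specifically be configured to select from the input timestamp, according to the difference information, a target character string that satisfies a preset condition; and splice the target character string with the output timestamp to obtain the target timestamp.
Optionally, the fusion unit 302 may specifically be configured to compare the difference information with the target character string; and, if the difference information is identical to the target character string, splice the target character string with the output timestamp to obtain the target timestamp.
Optionally, the fusion unit 302 may specifically be configured to select from the input timestamp, according to the difference information, a target character string that satisfies a preset condition; splice the target character string with the output timestamp to obtain a spliced timestamp; select from the spliced timestamp, in order from the low-order bits to the high-order bits, a third character string having the same number of bits as the display timestamp; and determine the third character string to be the target timestamp.
Playing unit 303;
The playing unit 303 is configured to play the audio/video frame and the text if the difference between the target timestamp and the display timestamp is less than a first preset threshold.
Optionally, the playing unit 303 may specifically be configured to play the audio/video frame if the difference between the target timestamp and the display timestamp is greater than or equal to the first preset threshold.
In addition, the media file playing apparatus further includes a storage unit 304, the storage unit 304 being configured to:
Store the input timestamp in the data set.
In addition, the media file playing apparatus further includes a deletion unit 305, the deletion unit 305 being configured to:
Delete the input timestamp from the data set.
In addition, the media file playing apparatus further includes a traversal unit 306, which may be configured to traverse the display timestamps based on the target timestamp; the playing unit 303 may further be configured to, if the difference between the target timestamp and the display timestamp is less than the first preset threshold, select from the display timestamps the target display timestamp corresponding to the target timestamp, determine the text corresponding to the target display timestamp, and play the audio/video frame and the text corresponding to the target display timestamp.
In the embodiments of the present application, the response unit 301 may be configured to, in response to a play instruction for a media file, obtain difference information between the input timestamp and the output timestamp of an audio/video frame in the media file, and obtain the display timestamp of text in the media file; the fusion unit 302 may be configured to perform fusion processing on the input timestamp and the output timestamp according to the difference information to obtain a target timestamp; and the playing unit 303 may be configured to play the audio/video frame and the text if the difference between the target timestamp and the display timestamp is less than a preset threshold. Because the embodiments determine the target timestamp from the difference information between the input timestamp and the output timestamp of the audio/video frame, compare the target timestamp with the display timestamp of the text, and then evaluate the difference between the two, the audio/video frame and the text can be played accurately and in synchronization.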
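The embodiments of the present application further provide a computer device. FIG. 10 shows a schematic structural diagram of the computer device involved in the embodiments of the present application. Specifically:
The computer device may include components such as a processor 401 with one or more processing cores, a memory 402 with one or more computer-readable storage media, a power supply 403, and an input unit 404. Those skilled in the art will understand that the computer device structure shown in FIG. 10 does not limit the computer device, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components. Among these:
The processor 401 is the control center of the computer device. It connects the various parts of the entire computer device through various interfaces and lines, and performs the various functions of the computer device and processes data by running or executing the software programs and/or modules stored in the memory 402 and calling the data stored in the memory 402, thereby monitoring the computer device as a whole. Optionally, the processor 401 may include one or more processing cores; preferably, the processor 401 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, computer programs, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 401.
The memory 402 may be used to store software programs and modules, and the processor 401 executes various functional applications and data processing by running the software programs and modules stored in the memory 402. The memory 402 may mainly include a program storage area and a data storage area, where the program storage area may store the operating system, the computer program required by at least one function (such as a sound playing function or an image playing function), and the like; the data storage area may store data created through use of the computer device, and the like. In addition, the memory 402 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device. Accordingly, the memory 402 may further include a memory controller to provide the processor 401 with access to the memory 402.
The computer device further includes a power supply 403 that supplies power to the various components. Preferably, the power supply 403 may be logically connected to the processor 401 through a power management system, so that functions such as charging management, discharging management, and power consumption management are implemented through the power management system. The power supply 403 may further include any components such as one or more DC or AC power sources, a recharging system, a power failure detection circuit, a power converter or inverter, and a power status indicator.
The computer device may further include an input unit 404, which may be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical, or trackball signal input related to user settings and function control.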
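Although not shown, the computer device may further include a display unit and the like, which are not described here. Specifically, in this embodiment, the processor 401 in the computer device loads the executable files corresponding to the processes of one or more computer programs into the memory 402 according to the following instructions, and the processor 401 runs the computer programs stored in the memory 402 to implement various functions, as follows:
In response to a play instruction for a media file, obtain difference information between the input timestamp and the output timestamp of an audio/video frame in the media file, and obtain the display timestamp of text in the media file; perform fusion processing on the input timestamp and the output timestamp according to the difference information to obtain a target timestamp; and, if the difference between the target timestamp and the display timestamp is less than a first preset threshold, play the audio/video frame and the text.
For the specific implementation of each of the above operations, refer to the preceding embodiments; details are not repeated here.
Those of ordinary skill in the art will understand that all or some of the steps in the various methods of the above embodiments may be completed by a computer program, or by a computer program controlling the relevant hardware; the computer program may be stored in a computer-readable storage medium and loaded and executed by a processor.
To this end, the embodiments of the present application provide a storage medium storing a computer program that can be loaded by a processor to perform any of the media file playing methods provided by the embodiments of the present application.
For the specific implementation of each of the above operations, refer to the preceding embodiments; details are not repeated here.
The storage medium may include read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, or the like.
Because the instructions stored in the storage medium can perform the steps of any of the media file playing methods provided by the embodiments of the present application, they can achieve the beneficial effects achievable by any of those methods; see the preceding embodiments for details, which are not repeated here.
According to one aspect of the present application, a computer program product or computer program is provided. The computer program product or computer program includes computer instructions stored in a computer-readable storage medium. The processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the methods provided in the various optional implementations of the above embodiments.
The media file playing method, apparatus, computer device, and storage medium provided by the embodiments of the present application have been described in detail above. Specific examples have been used herein to explain the principles and implementations of the present application, and the description of the above embodiments is intended only to help in understanding the method of the present application and its core idea. At the same time, those skilled in the art may, following the idea of the present application, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.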

Claims (20)

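  1. A media file playing method, comprising:
    in response to a play instruction for a media file, obtaining difference information between an input timestamp and an output timestamp of an audio/video frame in the media file, and obtaining a display timestamp of text in the media file;
    performing fusion processing on the input timestamp and the output timestamp according to the difference information to obtain a target timestamp;
    if the difference between the target timestamp and the display timestamp is less than a first preset threshold, playing the audio/video frame and the text.
  2. The media file playing method according to claim 1, wherein the obtaining difference information between an input timestamp and an output timestamp of an audio/video frame in the media file comprises:
    comparing the input timestamp with the output timestamp;
    if the input timestamp matches the output timestamp, selecting from the input timestamp a first character string that satisfies a first preset number of bits;
    determining the first character string to be the difference information.
  3. The media file playing method according to claim 2, wherein the comparing the input timestamp with the output timestamp comprises:
    selecting from the input timestamp a second character string that satisfies a second preset number of bits;
    comparing the second character string with the output timestamp;
    if the second character string is identical to the output timestamp, determining that the input timestamp matches the output timestamp.
  4. The media file playing method according to claim 2, wherein the selecting from the input timestamp a first character string that satisfies a first preset number of bits comprises:
    splitting the input timestamp in order from the high-order bits to the low-order bits;
    selecting from the split input timestamp a high-order character string that satisfies the first preset number of bits;
    determining the high-order character string of the first preset number of bits to be the first character string.
  5. The media file playing method according to claim 1, wherein the obtaining difference information between an input timestamp and an output timestamp of an audio/video frame in the media file comprises:
    obtaining the input timestamp according to the time at which the audio/video frame in the media file enters the receiving hardware;
    obtaining the output timestamp according to the time at which decoding of the audio/video frame in the media file is completed in the output hardware;
    obtaining the difference information between the input timestamp and the output timestamp according to the input timestamp and the output timestamp.
  6. The media file playing method according to claim 5, wherein, after the obtaining the input timestamp according to the time at which the audio/video frame in the media file enters the receiving hardware, the method comprises:
    obtaining a standard timestamp, the standard timestamp being a timestamp of Coordinated Universal Time (UTC);
    comparing the standard timestamp with the input timestamp;
    if the difference between the input timestamp and the standard timestamp is less than a second preset threshold, determining that the input timestamp is a valid timestamp;
    and, after the obtaining the output timestamp according to the time at which decoding of the audio/video frame in the media file is completed in the output hardware, the method comprises:
    comparing the standard timestamp with the output timestamp;
    if the difference between the output timestamp and the standard timestamp is less than a third preset threshold, determining that the output timestamp is a valid timestamp;
    if both the input timestamp and the output timestamp are valid timestamps, comparing the input timestamp with the output timestamp to obtain the difference information.
  7. The media file playing method according to claim 5, wherein, after the obtaining the input timestamp according to the time at which the audio/video frame in the media file enters the receiving hardware, the method comprises:
    storing the input timestamp in a data set;
    and, after the performing fusion processing on the input timestamp and the output timestamp according to the difference information to obtain a target timestamp, the method comprises:
    deleting the input timestamp from the data set.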
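  8. The media file playing method according to claim 1, wherein the performing fusion processing on the input timestamp and the output timestamp according to the difference information to obtain a target timestamp comprises:
    selecting from the input timestamp, according to the difference information, a target character string that satisfies a preset condition;
    splicing the target character string with the output timestamp to obtain the target timestamp.
  9. The media file playing method according to claim 8, wherein the difference information is a character string, and the splicing the target character string with the output timestamp to obtain the target timestamp comprises:
    comparing the difference information with the target character string;
    if the difference information is identical to the target character string, splicing the target character string with the output timestamp to obtain the target timestamp.
  10. The media file playing method according to claim 1, wherein the performing fusion processing on the input timestamp and the output timestamp according to the difference information to obtain a target timestamp comprises:
    selecting from the input timestamp, according to the difference information, a target character string that satisfies a preset condition;
    splicing the target character string with the output timestamp to obtain a spliced timestamp;
    selecting from the spliced timestamp, in order from the low-order bits to the high-order bits, a third character string having the same number of bits as the display timestamp;
    determining the third character string to be the target timestamp.
  11. The media file playing method according to claim 1, wherein, after the performing fusion processing on the input timestamp and the output timestamp according to the difference information to obtain a target timestamp, the method comprises:
    traversing the display timestamps based on the target timestamp;
    and the playing the audio/video frame and the text if the difference between the target timestamp and the display timestamp is less than a first preset threshold comprises:
    if the difference between the target timestamp and the display timestamp is less than the first preset threshold, selecting from the display timestamps a target display timestamp corresponding to the target timestamp, and determining the text corresponding to the target display timestamp;
    playing the audio/video frame and the text corresponding to the target display timestamp.
  12. The media file playing method according to claim 1, wherein, after the performing fusion processing on the input timestamp and the output timestamp according to the difference information to obtain a target timestamp, the method comprises:
    if the difference between the target timestamp and the display timestamp is greater than or equal to the first preset threshold, playing the audio/video frame.
  13. The media file playing method according to claim 1, wherein the obtaining a display timestamp of text in the media file comprises:
    receiving a text acquisition instruction when the text is not activated;
    obtaining the display timestamp of the text in the media file based on the text acquisition instruction.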
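  14. A media file playing apparatus, comprising:
    a response unit, configured to, in response to a play instruction for a media file, obtain difference information between an input timestamp and an output timestamp of an audio/video frame in the media file, and obtain a display timestamp of text in the media file;
    a fusion unit, configured to perform fusion processing on the input timestamp and the output timestamp according to the difference information to obtain a target timestamp;
    a playing unit, configured to play the audio/video frame and the text if the difference between the target timestamp and the display timestamp is less than a first preset threshold.
  15. The media file playing apparatus according to claim 14, wherein the response unit is further configured to:
    compare the input timestamp with the output timestamp;
    if the input timestamp matches the output timestamp, select from the input timestamp a first character string that satisfies a first preset number of bits;
    determine the first character string to be the difference information.
  16. The media file playing apparatus according to claim 15, wherein the response unit is further configured to:
    select from the input timestamp a second character string that satisfies a second preset number of bits;
    compare the second character string with the output timestamp;
    if the second character string is identical to the output timestamp, determine that the input timestamp matches the output timestamp.
  17. The media file playing apparatus according to claim 15, wherein the response unit is further configured to:
    split the input timestamp in order from the high-order bits to the low-order bits;
    select from the split input timestamp a high-order character string that satisfies the first preset number of bits;
    determine the high-order character string of the first preset number of bits to be the first character string.
  18. The media file playing apparatus according to claim 14, wherein the response unit is further configured to:
    obtain the input timestamp according to the time at which the audio/video frame in the media file enters the receiving hardware;
    obtain the output timestamp according to the time at which decoding of the audio/video frame in the media file is completed in the output hardware;
    obtain the difference information between the input timestamp and the output timestamp according to the input timestamp and the output timestamp.
  19. A computer device, comprising a memory and a processor, wherein the memory stores a computer program, and the processor is configured to run the computer program in the memory to perform the media file playing method according to any one of claims 1 to 13.
  20. A storage medium, wherein the storage medium stores a computer program, and the computer program is adapted to be loaded by a processor to perform the media file playing method according to any one of claims 1 to 13.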
PCT/CN2021/110823 2021-08-05 2021-08-05 Media file playing method and apparatus, computer device, and storage medium WO2023010402A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/110823 WO2023010402A1 (zh) 2021-08-05 2021-08-05 Media file playing method and apparatus, computer device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/110823 WO2023010402A1 (zh) 2021-08-05 2021-08-05 Media file playing method and apparatus, computer device, and storage medium

Publications (1)

Publication Number Publication Date
WO2023010402A1 (zh) 2023-02-09

Family

ID=85154965

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/110823 WO2023010402A1 (zh) 2021-08-05 2021-08-05 Media file playing method and apparatus, computer device, and storage medium

Country Status (1)

Country Link
WO (1) WO2023010402A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
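CN107105352A (zh) * 2017-05-16 2017-08-29 青岛海信电器股份有限公司 Subtitle synchronization method and apparatus
CN107786876A (zh) * 2017-09-21 2018-03-09 北京达佳互联信息技术有限公司 Method and apparatus for synchronizing music and video, and mobile terminal
CN109246472A (zh) * 2018-08-01 2019-01-18 平安科技(深圳)有限公司 Video playing method and apparatus, terminal device, and storage medium
CN110933449A (zh) * 2019-12-20 2020-03-27 北京奇艺世纪科技有限公司 Method, system, and apparatus for synchronizing external data with video pictures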
EP3839953A1 (en) * 2019-12-18 2021-06-23 Institut Mines Telecom Automatic caption synchronization and positioning

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
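CN117319738A (zh) * 2023-12-01 2023-12-29 飞狐信息技术(天津)有限公司 Subtitle delay method and apparatus, electronic device, and storage medium
CN117319738B (zh) * 2023-12-01 2024-03-08 飞狐信息技术(天津)有限公司 Subtitle delay method and apparatus, electronic device, and storage medium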

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21952297

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21952297

Country of ref document: EP

Kind code of ref document: A1