CN117692681A - Video stream playing method, device, equipment and readable storage medium - Google Patents

Video stream playing method, device, equipment and readable storage medium

Info

Publication number
CN117692681A
CN117692681A
Authority
CN
China
Prior art keywords
media
video
browser
stream
function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311776479.3A
Other languages
Chinese (zh)
Inventor
付坤阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Kaidelian Software Technology Co ltd
Guangzhou Kaidelian Intelligent Technology Co ltd
Original Assignee
Guangzhou Kaidelian Software Technology Co ltd
Guangzhou Kaidelian Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Kaidelian Software Technology Co ltd, Guangzhou Kaidelian Intelligent Technology Co ltd filed Critical Guangzhou Kaidelian Software Technology Co ltd
Priority to CN202311776479.3A priority Critical patent/CN117692681A/en
Publication of CN117692681A publication Critical patent/CN117692681A/en
Pending legal-status Critical Current

Abstract

An embodiment of the present application provides a video stream playing method, apparatus, device, and readable storage medium. A terminal device acquires a video stream to be played and decapsulates the video stream to obtain media blocks. The terminal device formats the media blocks to obtain encoded media blocks that can be read by the browser bottom layer, decodes the encoded media blocks using the hardware acceleration function of a media block decoder to obtain a media stream, and plays the media stream through the browser. With this scheme, hardware acceleration technology is used to decode the video stream in the browser, which greatly reduces the requirement on CPU performance, increases the playing speed of the video stream, improves its playing quality, and reduces the heat generated by the terminal device. In addition, the media stream obtained by decoding the encoded media blocks with the hardware acceleration function of the media block decoder can be played directly through the browser, which reduces the load on the CPU (Central Processing Unit) to a certain extent and shortens the loading wait time.

Description

Video stream playing method, device, equipment and readable storage medium
Technical Field
The embodiments of the present application relate to the technical field of video playing, and in particular to a video stream playing method, apparatus, device, and readable storage medium.
Background
With the rapid development of internet technology, the demand for viewing video streams on clients keeps increasing. As the cornerstone of the browser/server (B/S) mode, the browser plays an important role in video stream playback.
In general, the video stream a browser receives from a server is encoded and compressed data, such as a video stream in the Flash Video (FLV) format. Therefore, the video stream must be decoded and restored before it can be rendered and played. Software decoding is a decoding algorithm executed by the central processing unit (Central Processing Unit, CPU). During video stream playback, the browser decapsulates the video stream to obtain the data to be decoded, software-decodes that data to obtain decoded data, and then repackages the decoded data into a data format supported by the browser before playing it.
However, the software decoding and repackaging described above occupy a large amount of CPU, resulting in slow video stream playback and considerable CPU heat.
Disclosure of Invention
The embodiments of the present application provide a video stream playing method, apparatus, device, and readable storage medium that use hardware acceleration technology to decode video streams in the browser, thereby greatly reducing the requirement on CPU performance, increasing the playing speed of the video stream, improving its playing quality, and reducing the heat generated by the terminal device.
In a first aspect, an embodiment of the present application provides a video stream playing method, including:
acquiring a video stream to be played;
decapsulating the video stream to obtain media blocks;
formatting the media blocks to obtain encoded media blocks readable by the browser bottom layer;
decoding the encoded media blocks in the browser using hardware acceleration functionality provided by a media block decoder to obtain a media stream;
and playing the media stream through the browser.
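In a current browser, the steps of the first aspect correspond closely to what the WebCodecs API offers (this mapping is an assumption; the patent itself does not name an API). A minimal sketch of the claimed pipeline, with every stage injected so the flow can be followed outside a browser — all helper names (`demux`, `toEncodedChunks`, `createChunkDecoder`, `render`) are illustrative:

```javascript
// Hedged sketch of the five claimed steps; every dependency is injected and
// purely illustrative, not defined by the patent.
function playVideoStream(rawStream, { demux, toEncodedChunks, createChunkDecoder, render }) {
  // Step 2: decapsulate the video stream into a media configuration + media blocks.
  const { config, blocks } = demux(rawStream);
  // Step 3: format the media blocks into encoded blocks the browser bottom layer can read.
  const chunks = toEncodedChunks(blocks);
  // Step 4: decode with the (hardware-accelerated) media block decoder.
  const frames = [];
  const decoder = createChunkDecoder(config, (frame) => frames.push(frame));
  for (const c of chunks) decoder.decode(c);
  // Step 5: hand decoded frames to the renderer (a <video> element or canvas in a browser).
  for (const f of frames) render(f);
  return frames.length;
}
```

In a real deployment, `createChunkDecoder` would wrap the browser's hardware-accelerated decoder and `render` would present frames in the page; here both stay abstract.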
In a second aspect, an embodiment of the present application provides a video stream playing device, including:
the acquisition module is used for acquiring a video stream to be played;
the decapsulation module is used for decapsulating the video stream to obtain media blocks;
the formatting module is used for formatting the media blocks to obtain coded media blocks which can be read by a browser bottom layer;
a processing module, configured to decode the encoded media block in the browser using a hardware acceleration function provided by a media block decoder to obtain a media stream;
and the playing module is used for playing the media stream through the browser.
In a third aspect, an embodiment of the present application provides an electronic device, including: a processor, a memory, and a computer program stored in the memory and executable on the processor. When executing the computer program, the processor causes the electronic device to implement the method described in the first aspect or in the various possible implementations of the first aspect.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium having stored therein computer instructions which, when executed by a processor, are adapted to carry out the method according to the first aspect or the various possible implementations of the first aspect.
In a fifth aspect, embodiments of the present application provide a computer program product comprising a computer program which, when executed by a processor, implements the method as described above in the first aspect or in the various possible implementations of the first aspect.
According to the video stream playing method, apparatus, device, and readable storage medium of the embodiments, the terminal device acquires a video stream to be played and decapsulates the video stream to obtain media blocks. The terminal device formats the media blocks to obtain encoded media blocks that can be read by the browser bottom layer, decodes the encoded media blocks using the hardware acceleration function of the media block decoder to obtain a media stream, and plays the media stream through the browser. With this scheme, hardware acceleration technology is used to decode the video stream in the browser, which greatly reduces the requirement on CPU performance, increases the playing speed of the video stream, improves its playing quality, and reduces the heat generated by the terminal device. In addition, the media stream obtained by decoding the encoded media blocks with the hardware acceleration function of the media block decoder can be played directly through the browser, without packaging browser-supported mp4 fragments to be played via MSE technology. This reduces the CPU load to a certain extent, shortens the loading wait time, avoids stuttering for the user, and improves the viewing experience.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a conventional Flash video stream played in a browser;
fig. 2 is an environmental schematic diagram of an implementation of a video stream playing method according to an embodiment of the present application;
fig. 3 is a flowchart of a video stream playing method provided in an embodiment of the present application;
fig. 4 is another flowchart of a video stream playing method provided in an embodiment of the present application;
fig. 5 is a schematic diagram of a video stream playing device according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
At present, when video streams such as Flash Video are played in a browser, software decoding is mainly used. Fig. 1 is a flowchart of the conventional process of playing a Flash video stream in a browser, which includes the following steps:
101. Software decapsulation.
The terminal device acquires a video stream from a server or the like and decapsulates the acquired video stream. The purpose of de-protocoling is to remove signaling data and the like from the video stream, retaining only the audio-video data in its encapsulation format.
The purpose of decapsulation is to separate the audio-video data into compression-encoded video information and audio information. The compression-encoded video information is also referred to as the data to be decoded.
102. Software decoding.
Since the browser does not provide direct access to the underlying decoding capabilities for the video stream, the terminal device decodes the compression-encoded video information using software decoding.
However, software decoding requires computation over a large amount of video information and places high demands on CPU performance. Especially when processing high-definition, high-bit-rate video, the huge computational load leads to problems such as a low conversion rate and high heat generation. The purpose of software decoding is to convert the data to be decoded into decoded video information.
In addition, software decoding cannot make full use of hardware acceleration capabilities, so video playing performance and quality are low, causing problems such as picture stuttering, color distortion, and audio delay.
103. Software encapsulation into mp4 fragments.
The browser does not support direct playback of an FLV-format video stream; that is, the browser cannot directly play the decoded data produced by the software decoding of step 102. Therefore, in this step, the terminal device repackages the decoded data to obtain mp4 fragments and the like supported by the browser. Packaging the decoded data into the browser-supported mp4 fragment format again requires computation over the decoded video information; the CPU usage increases the wait time and makes video playback less smooth, e.g. causing stuttering.
104. Play the mp4 fragments using MSE technology.
In this step, the terminal device loads the mp4 fragments into the browser and plays them by means of Media Source Extensions (MSE).
In fig. 1, the software decoding of step 102 and the repackaging of step 103 both require a large number of operations performed by the CPU; the high demands on the CPU result in low efficiency, high heat generation, and long waits.
Based on this, the embodiments of the present application provide a video stream playing method, apparatus, device, and readable storage medium that use hardware acceleration technology to decode the video stream in hardware in the browser, thereby greatly reducing the requirement on CPU performance, increasing the playing speed of the video stream, improving its playing quality, and reducing the heat generated by the terminal device.
Fig. 2 is an implementation environment schematic diagram of a video stream playing method according to an embodiment of the present application. Referring to fig. 2, the implementation environment includes a terminal device 21 and a server 22. A network connection is established between the terminal device 21 and the server 22.
The terminal device 21 is, for example, a viewer-side terminal device, including but not limited to a mobile phone, tablet computer, personal computer, e-book reader, laptop, or desktop computer running an Android, Microsoft Windows, Symbian, Linux, or Apple iOS operating system. The terminal device 21 has a browser installed and supports hardware decoding. The terminal device 21 has a display screen, speakers, and the like for acquiring the video stream from the server 22 and playing it. Through the terminal device 21, the user watches movies, videos, live broadcasts, and video conferences, or roams in a three-dimensional digital world. The three-dimensional digital world is, for example, a digital world that can present video playback effects to the user through video fusion technology.
The server 22 has storage capability and the like, and may be hardware or software. When the server 22 is hardware, it is a single server or a distributed cluster of multiple servers. When the server 22 is software, it may be a plurality of software modules or a single software module, etc.; the embodiments of the present application are not limited in this respect.
In an on-demand scenario, recorded video streams of television shows, movies, etc. are stored on the server 22. When the user wants to watch on demand, the user opens the browser on the terminal device 21 and enters a uniform resource locator (Uniform Resource Locator, URL) by mouse, keyboard, voice, or the like. The terminal device 21 acquires the video stream from the server 22 according to the URL entered by the user and plays it.
In a live scenario, the video streams are real-time audio-video data uploaded by the broadcaster, and the server 22 stores the live video streams. The user opens the browser on the terminal device 21 and enters a URL by mouse, keyboard, voice, or the like. The terminal device 21 requests the video stream from the server 22 according to the URL entered by the user and plays it.
In a conference scenario, the video stream is real-time audio-video data from a remote conference room, and the server stores the conference video streams. The user opens the browser on the terminal device 21 and enters a conference link; the terminal device acquires the video stream from the server 22 and plays it.
In the digital-world roaming scenario, the user enters the address of the digital world in the browser, i.e., opens the digital world's web page and roams within it. For example, a user who wants an online tour of a university enters the university's URL and enters its digital world. During roaming, the terminal device 21 acquires video streams from the server 22 and fuses them into the three-dimensional scene. For example, a camera at the school gate photographs vehicles, pedestrians, etc. at the gate and uploads the captured video stream to the server 22. While the user roams the digital world, the terminal device 21 acquires the camera's video stream in real time from the server 22 and fuses it into the three-dimensional digital world, so that the user sees vehicles entering and leaving the gate and pedestrians passing by while roaming.
It should be understood that the numbers of terminal devices 21 and servers 22 in fig. 2 are merely illustrative. In practice, any number of terminal devices 21 and servers 22 may be deployed according to actual requirements.
Next, based on the implementation environment shown in fig. 2, the video stream playing method of the embodiments of the present application is described in detail. Referring to fig. 3, fig. 3 is a flowchart of a video stream playing method provided in an embodiment of the present application. The embodiment includes the following steps:
301. Obtain the video stream to be played.
In the embodiment of the present application, a browser is installed on the terminal device, and the browser sends a request to the server through methods such as XMLHttpRequest (XHR) and Fetch to acquire the video stream. Fetch is an application programming interface (Application Programming Interface, API) provided by the web platform for acquiring resources asynchronously.
The video stream may be a live stream, an on-demand stream, a conference video stream, a video stream fused in a three-dimensional digital world, etc. The live stream is, for example, an FLV live stream, etc., and embodiments of the present application are not limited.
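The acquisition in step 301 can be sketched as below: the code reads a `fetch()` response body incrementally, which suits live streams as well as on-demand streams. The URL in the comment is a placeholder, and error handling is omitted for brevity.

```javascript
// Sketch: read a video stream incrementally from a ReadableStream of bytes
// (e.g. the body of a fetch() response) and concatenate it for demuxing.
async function readStreamBytes(body /* ReadableStream<Uint8Array> */) {
  const reader = body.getReader();
  const parts = [];
  let total = 0;
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    parts.push(value);
    total += value.length;
  }
  // Concatenate the received chunks into one buffer.
  const out = new Uint8Array(total);
  let off = 0;
  for (const p of parts) { out.set(p, off); off += p.length; }
  return out;
}
// In a browser (URL is a placeholder):
//   const resp = await fetch('https://example.com/live.flv');
//   const bytes = await readStreamBytes(resp.body);
```

A real live player would demux each chunk as it arrives rather than buffering the whole stream; buffering keeps the sketch short.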
302. Decapsulate the video stream to obtain media blocks.
In the embodiment of the present application, the purpose of decapsulation is to separate the encapsulated video stream into compression-encoded media blocks and a media configuration. The terminal device decapsulates the video stream into a media configuration and a number of media blocks through software decapsulation. The media blocks include video blocks and audio blocks; the media configuration includes a video configuration and an audio configuration. The video configuration includes, but is not limited to, the video coding format (codec), coded width, coded height, frame rate, and the like. The audio configuration includes, but is not limited to, the audio coding format (codec), sampling bits, sampling rate, channel count, and the like.
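As a concrete illustration of the decapsulation step for an FLV stream, the 9-byte FLV file header already tells the demuxer whether audio and video tags are present (field layout per the FLV container format; the function name is ours):

```javascript
// Illustrative first step of decapsulating an FLV stream: parse the 9-byte
// FLV file header to learn whether audio/video tags follow.
function parseFlvHeader(bytes) {
  if (bytes.length < 9 || bytes[0] !== 0x46 || bytes[1] !== 0x4c || bytes[2] !== 0x56) {
    throw new Error('not an FLV stream'); // signature must be 'F','L','V'
  }
  const flags = bytes[4];
  return {
    version: bytes[3],
    hasAudio: (flags & 0x04) !== 0,
    hasVideo: (flags & 0x01) !== 0,
    // Offset to the first tag (big-endian 32-bit), normally 9.
    dataOffset: (bytes[5] << 24) | (bytes[6] << 16) | (bytes[7] << 8) | bytes[8],
  };
}
```

A full demuxer would go on to walk the FLV tags after `dataOffset`, splitting them into video blocks, audio blocks, and the media configuration described above.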
303. Format the media blocks to obtain encoded media blocks that can be read by the browser bottom layer.
In this step, the terminal device formats the media blocks into encoded media blocks, where the format of the encoded media blocks is agreed upon with the browser and can be read by the browser bottom layer. That the encoded media blocks can be read by the browser bottom layer does not mean that they can be played directly by the browser.
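If the browser bottom layer is the WebCodecs API (an assumption; the patent does not name it), the agreed encoded-media-block format corresponds to an `EncodedVideoChunk`. The pure part of the formatting step — building the chunk's init dictionary from a demuxed block — might look like this; the input block's field names (`isKeyFrame`, `ptsUs`, `durationUs`, `payload`) are illustrative:

```javascript
// Sketch of the "formatting" step: turn a demuxed media block into the init
// dictionary of a WebCodecs EncodedVideoChunk. Output field names follow
// WebCodecs; the input block's shape is assumed.
function toEncodedChunkInit(block) {
  return {
    type: block.isKeyFrame ? 'key' : 'delta', // key frames must be marked
    timestamp: block.ptsUs,                   // WebCodecs uses microseconds
    duration: block.durationUs,
    data: block.payload,                      // BufferSource of encoded bytes
  };
}
// In a browser: const chunk = new EncodedVideoChunk(toEncodedChunkInit(block));
```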
304. Decode the encoded media blocks in the browser using the hardware acceleration function provided by a media block decoder to obtain a media stream.
To ensure data isolation, the terminal device creates a media block decoder using the system's underlying media block decoding technology. The media block decoder provides a hardware acceleration function, and the specific decoding functions are provided by this scheme.
It can be understood that the terminal device has a media block decoder prototype, and all media block decoder instances share the same prototype. Creating a media block decoder in this step can be understood as creating a media block decoder instance for the current playback. For example, when the user watches a live ball game through the terminal device, the terminal device creates a media block decoder instance for that live broadcast. When the user later switches to a teaching live stream, the terminal device creates a media block decoder instance for the teaching live stream.
In the embodiment of the present application, the media block decoder is an underlying media block decoding technology that supports GPU acceleration. The terminal device decodes the encoded media blocks obtained in step 303 using the hardware acceleration function of the media block decoder (i.e. the underlying media block decoding technology) to obtain a media stream. The media stream is in the system's underlying media content stream format and can be played in the browser.
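Assuming the media block decoder is WebCodecs' `VideoDecoder` (the patent does not name it), step 304 could be sketched as follows. The decoder constructor is injected so the flow can be exercised with a stub outside a browser; in a browser you would pass the global `VideoDecoder`:

```javascript
// Sketch of step 304 against a WebCodecs-style decoder. DecoderCtor is
// injected (pass the global VideoDecoder in a browser, or a stub in tests).
function decodeEncodedChunks(DecoderCtor, config, chunks, onFrame) {
  const decoder = new DecoderCtor({
    output: onFrame,                               // called per decoded frame
    error: (e) => console.error('decode error', e),
  });
  // 'prefer-hardware' requests the GPU-accelerated path.
  decoder.configure({ ...config, hardwareAcceleration: 'prefer-hardware' });
  for (const c of chunks) decoder.decode(c);
  return decoder;
}
```

With the real `VideoDecoder`, `output` receives `VideoFrame` objects that must be closed after presentation; the stub used for testing only mimics the call shape.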
305. Play the media stream through the browser.
In this embodiment, the terminal device does not need to encapsulate the media stream obtained in step 304 into browser-supported mp4 fragments and then play them using MSE technology. Instead, it generates and plays a media stream that can be played directly in the browser; for example, the browser on the terminal device plays the media stream using the <video> element. This reduces the CPU load to a certain extent, shortens the loading wait time, avoids stuttering for the user, and improves the viewing experience.
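Playing the decoded stream directly typically means presenting each decoded frame at its timestamp, e.g. by drawing it to a canvas. The pacing helper below is pure logic (timestamps in microseconds, as WebCodecs uses); the drawing call is shown as a comment because it needs a browser:

```javascript
// Presentation pacing for step 305: given a frame's timestamp (microseconds)
// and the wall-clock time (ms) the first frame was shown, return how many
// milliseconds to wait before presenting this frame.
function msUntilPresentation(frameTimestampUs, firstTimestampUs, startedAtMs, nowMs) {
  const dueAtMs = startedAtMs + (frameTimestampUs - firstTimestampUs) / 1000;
  return Math.max(0, dueAtMs - nowMs); // never schedule in the past
}
// In a browser, per decoded frame:
//   setTimeout(() => { ctx.drawImage(frame, 0, 0); frame.close(); }, waitMs);
```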
According to the video stream playing method described above, the terminal device acquires the video stream to be played and decapsulates it to obtain media blocks. The terminal device formats the media blocks to obtain encoded media blocks that can be read by the browser bottom layer, decodes the encoded media blocks using the hardware acceleration function of the media block decoder to obtain a media stream, and plays the media stream through the browser. With this scheme, hardware acceleration technology is used to decode the video stream in the browser, which greatly reduces the requirement on CPU performance, increases the playing speed of the video stream, improves its playing quality, and reduces the heat generated by the terminal device. In addition, the media stream obtained by decoding the encoded media blocks with the hardware acceleration function of the media block decoder can be played directly through the browser, without packaging browser-supported mp4 fragments to be played via MSE technology. This reduces the CPU load to a certain extent, shortens the loading wait time, avoids stuttering for the user, and improves the viewing experience.
Optionally, in the foregoing embodiment, in the process of decoding the encoded media blocks in the browser using the hardware acceleration function provided by a media block decoder to obtain the media stream, the terminal device first creates the media block decoder. The terminal device then provides the media configuration to the media block decoder, so that the media block decoder determines, from the functions it supports, the first function indicated by the media configuration, and decodes the encoded media blocks in the browser using the first function and the hardware acceleration function provided by the media block decoder to obtain the media stream.
Illustratively, the media block decoder itself provides some decoding functions, such as decoding according to the encoding format. In addition, a function set may be provided, where the functions in the set are functions extended for the media block decoder; that is, the media block decoder itself does not support the functions in the function set but can be extended with them. For clarity, a function the media block decoder determines from its own supported functions according to the media configuration is called a first function, and a function the media block decoder determines from the function set according to the user's settings or the like is called a second function.
In the embodiment of the present application, a browser is deployed on the terminal device, and a media block decoder prototype is built into the browser. Each time a video stream is played, the terminal device acquires the video stream, decapsulates it into a media configuration and a number of media blocks, creates a media block decoder, and provides the media configuration obtained by decapsulation to the media block decoder, so that the media block decoder determines the first function from its own supported functions according to the media configuration. For example, the terminal device provides the video configuration to the video block decoder so that the video block decoder determines a first function from its own supported functions according to the video configuration, and provides the audio configuration to the audio block decoder so that the audio block decoder determines a first function from its own supported functions according to the audio configuration.
With this scheme, the terminal device creates the media block decoder and provides the media configuration obtained by decapsulating the video stream to the media block decoder, so that the media block decoder determines which first functions to call from the functions it supports, and then decodes the encoded media blocks using the first functions and the hardware acceleration function provided by the media block decoder. This greatly reduces the requirement on CPU performance, makes full use of the hardware acceleration capability, improves the playing performance and quality of the video stream, and reduces the heat generated by the terminal device.
Optionally, in the above embodiment, the terminal device writes the function set in advance and presets the function set in the web application to be run in the browser.
In this embodiment of the present application, the above media configuration includes a video configuration and an audio configuration, and the function set includes a video function set and an audio function set.
The terminal device writes a video function set in advance and presets it in the web application to be run in the browser. Table 1 is a schematic diagram of a video configuration set in the video stream playing method provided in the embodiment of the present application; Table 2 is a schematic diagram of a video function set in the video stream playing method provided in the embodiment of the present application. The configuration items in the video configuration set are used to determine the first function from the functions supported by the terminal device.
TABLE 1
TABLE 2
Referring to Tables 1 and 2, each configuration item in the video configuration set in Table 1 has a corresponding function of the video block decoder. When a configuration item in Table 1 performs better than the video block decoder's own function, the video block decoder replaces its own function with the configuration item in the video configuration set. In addition, the terminal device provides the video configuration to the media block decoder, which determines the first function from its own supported functions according to the video configuration, i.e., which functions to call, how to use them, and so on.
It can be appreciated that the video function set shown in Table 2 illustrates only some of the function items; in practice the video function set may include further function items. The terminal device presets the video function set in the web application to be run in the browser. After receiving the video stream to be played, the terminal device determines the first function from the functions it supports according to the video configuration, determines the second function required for the current playback from the video function set according to the settings file and the like, and decodes the encoded video blocks using the first function, the second function, and the hardware acceleration function provided by the video block decoder to obtain video frames. For example, the terminal device decapsulates the video stream to obtain video blocks, audio blocks, a video configuration, and an audio configuration, where the video configuration is: encoding format H.264, video width 4090, video height 2160, frame rate 24. The terminal device provides the video configuration to the video block decoder for the video block decoder to determine the first function from its own supported functions.
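Continuing the example, and again assuming a WebCodecs-style bottom layer, the demuxed video configuration maps onto a decoder configuration like this; the helper and its field names are illustrative, and the H.264 codec string in the comment is one possible value:

```javascript
// Map the demuxed video configuration onto a WebCodecs VideoDecoderConfig.
// Input field names (codec, width, height) are assumed demuxer output.
function buildVideoDecoderConfig(demuxed) {
  return {
    codec: demuxed.codec,        // e.g. 'avc1.64001f' for an H.264 stream
    codedWidth: demuxed.width,
    codedHeight: demuxed.height,
    // Frame rate is not part of VideoDecoderConfig; it drives presentation
    // pacing instead, so it is not forwarded here.
  };
}
// In a browser, check support before configuring:
//   const { supported } = await VideoDecoder.isConfigSupported(config);
```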
Similarly, in the embodiment of the present application, the terminal device writes an audio function set in advance and presets it in the web application to be run in the browser. Table 3 is a schematic diagram of an audio configuration set in the video stream playing method provided in the embodiment of the present application; Table 4 is a schematic diagram of an audio function set in the video stream playing method provided in the embodiment of the present application. The configuration items in the audio configuration set are used to determine the first function from the functions supported by the terminal device.
TABLE 3
TABLE 4
Referring to Tables 3 and 4, each configuration item in the audio configuration set in Table 3 has a corresponding function of the audio block decoder. When a configuration item in Table 3 performs better than the audio block decoder's own function, the audio block decoder replaces its own function with the configuration item in the audio configuration set. In addition, the terminal device provides the audio configuration to the media block decoder, which determines the first function from its own supported functions according to the audio configuration, i.e., which functions to call, how to use them, and so on.
It can be appreciated that the audio function set shown in Table 4 illustrates only some of the function items; in practice the audio function set may include further function items. The terminal device presets the audio function set in the web application to be run in the browser. After receiving the video stream to be played, the terminal device determines the first function from the functions it supports according to the audio configuration, determines the second function required for the current playback from the audio function set according to the settings file and the like, and decodes the encoded audio blocks using the first function, the second function, and the hardware acceleration function provided by the audio block decoder to obtain audio data.
In the embodiment of the present application, the function set is written in advance and preset in the web application to be run in the browser. When the terminal device later receives a video stream to be played through the browser, it determines the second function required for the current playback from the function set; the functions do not need to be written each time a video stream is played, which serves the purpose of increasing the playing speed of the video stream.
Optionally, after receiving the video stream to be played through the browser, the terminal device further obtains a settings file of the web application to be run in the browser, where the settings file indicates a second function, and the second function is a function that extends the media block decoder. The terminal device then provides the settings file to the media block decoder, so that the media block decoder determines the second function from the function set. Finally, the terminal device decodes the encoded media blocks in the browser using the first function, the second function, and the hardware acceleration function provided by the media block decoder to obtain the media stream.
Illustratively, the browser opens a setting window in which users set the color space (color space), the hardware acceleration mode (Hardware Acceleration), latency optimization (Optimize For Latency), specific byte sequences, and the like at their own discretion. The color space covers, but is not limited to, the color gamut, the transfer characteristics, the matrix coefficients, and so on; the hardware acceleration mode includes, but is not limited to, accepting both hardware and software decoding, or software decoding only. When the hardware acceleration mode accepts both hardware and software decoding, the user has no preference (no-preference).
When the terminal device receives a video stream that needs to be played through the browser, on the one hand, it decapsulates the video stream to obtain media blocks and a media configuration, and determines a first function from the functions supported by the terminal device according to the media configuration; on the other hand, the terminal device acquires the setting file of the web application that needs to run in the browser and provides the setting file to the media block decoder, so that the media block decoder determines the second function from the function set.
With this scheme, the user can set the second function required for playing the video stream simply by opening the setting window, which is both flexible and simple.
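The mapping from the setting window to decoder options can be sketched as follows. The option names on the output side (`hardwareAcceleration`, `optimizeForLatency`, `colorSpace`) follow the dictionary accepted by the browser's WebCodecs `VideoDecoder.configure()`, which matches the decoder described here; the input settings-file field names are assumptions.

```javascript
// Sketch: translate the settings-window choices into the option names used by
// the browser-bottom-layer decoder's configure() call.
function settingsToDecoderOptions(settings) {
  const options = {};
  // 'no-preference' | 'prefer-hardware' | 'prefer-software'
  if (settings.hardwareAccelerationMode) {
    options.hardwareAcceleration = settings.hardwareAccelerationMode;
  }
  if (settings.optimizeForLatency !== undefined) {
    options.optimizeForLatency = settings.optimizeForLatency;
  }
  if (settings.colorSpace) {
    // primaries / transfer / matrix mirror the color gamut, transfer
    // characteristics and matrix coefficients mentioned above
    options.colorSpace = {
      primaries: settings.colorSpace.primaries,
      transfer:  settings.colorSpace.transfer,
      matrix:    settings.colorSpace.matrix,
    };
  }
  return options;
}
```

In a browser, the returned object would be merged into the configuration passed to the video block decoder.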
Optionally, in the above embodiment, the media blocks include video blocks and audio blocks, the media configuration includes a video configuration and an audio configuration, the decoding functions include a video decoding function and an audio decoding function, the encoded media blocks include encoded video blocks obtained by formatting the video blocks and encoded audio blocks obtained by formatting the audio blocks, and the media block decoder includes a video block decoder (Video Decoder) and an audio block decoder (Audio Decoder). When the terminal device decodes the encoded media blocks using the decoding function and the hardware acceleration function provided by the media block decoder to obtain the media stream, it decodes the encoded video blocks using the video decoding function and the hardware acceleration function provided by the video block decoder to obtain video frames; meanwhile, it decodes the encoded audio blocks using the audio decoding function and the hardware acceleration function provided by the audio block decoder to obtain audio data. The terminal device then generates the media stream from the video frames and the audio data.
Illustratively, the terminal device decapsulates the video stream to obtain a video configuration, an audio configuration, a number of video blocks, and a number of audio blocks, and creates a video block decoder and an audio block decoder. Thereafter, the video configuration is provided to the video block decoder so that it determines a first function from the functions it supports, and the setting file of the web application running in the browser is provided to the video block decoder so that it determines a second function. Similarly, the audio configuration is provided to the audio block decoder so that it determines a first function from the functions it supports, and the setting file of the web application running in the browser is provided to the audio block decoder so that it determines a second function.
For the video blocks and audio blocks, the terminal device formats the video blocks to obtain encoded video blocks that can be read by the browser bottom layer, and formats the audio blocks to obtain encoded audio blocks that can be read by the browser bottom layer. Through formatting, encoded video blocks readable by the browser bottom layer are obtained even though the browser does not provide direct bottom-layer access to the video blocks.
Then, the terminal device decodes the encoded video blocks by using the video decoding function and the hardware acceleration function of the video block decoder, thereby obtaining video frames. The hardware decoding of the encoded video blocks is implemented in the browser by utilizing hardware acceleration techniques, i.e., the encoded video blocks are processed by utilizing the hardware decoding function of the graphics card. Similarly, the terminal device decodes the encoded audio blocks by using the audio decoding function and the hardware acceleration function of the audio block decoder, thereby obtaining audio data.
Finally, the terminal device generates a media stream from the video frames and the audio data, the media stream being playable by the browser.
With this scheme, the terminal device formats the video blocks and audio blocks into encoded video blocks and encoded audio blocks that can be read by the browser bottom layer, then decodes the encoded video blocks using the video decoding function and the video block decoder, and decodes the encoded audio blocks using the audio decoding function and the audio block decoder. The encoded video blocks and encoded audio blocks are thus decoded in the browser using hardware acceleration technology, which greatly reduces the requirement on CPU performance and reduces the heat generated by the terminal device.
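A minimal sketch of creating the two decoders, assuming the browser's WebCodecs `VideoDecoder`/`AudioDecoder` interface as the video block decoder and audio block decoder named above. The codec strings are examples only, and the browser-only part is guarded so the sketch is inert elsewhere.

```javascript
// Build the callback pair every decoder needs: where decoded output goes
// (a VideoFrame or AudioData per call) and where errors are collected.
function makeDecoderCallbacks(sink) {
  return {
    output: (frameOrData) => sink.push(frameOrData),
    error:  (e) => sink.errors.push(e),
  };
}

if (typeof VideoDecoder !== 'undefined') {
  const videoFrames = Object.assign([], { errors: [] });
  const audioData   = Object.assign([], { errors: [] });

  const videoDecoder = new VideoDecoder(makeDecoderCallbacks(videoFrames));
  videoDecoder.configure({
    codec: 'avc1.42E01E',                    // example: H.264 Baseline
    hardwareAcceleration: 'prefer-hardware', // use the graphics card's decoder
    optimizeForLatency: true,
  });

  const audioDecoder = new AudioDecoder(makeDecoderCallbacks(audioData));
  audioDecoder.configure({
    codec: 'mp4a.40.2',                      // example: AAC-LC
    sampleRate: 48000,
    numberOfChannels: 2,
  });
}
```

With `hardwareAcceleration: 'prefer-hardware'`, decoding is pushed onto the GPU where available, which is the load reduction described above.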
Optionally, in the foregoing embodiment, in generating the media stream by the terminal device according to the video frame and the audio data, the terminal device first creates a video stream track and an audio stream track. And the terminal equipment writes the video frames into the video stream track and writes the audio data into the audio stream track. Finally, the terminal device synthesizes the media stream using the video stream track and the audio stream track.
Illustratively, the video stream track prototype and the audio stream track prototype are built in the browser, and each time the video stream is played, the terminal device creates the video stream track and the audio stream track, which means that a video stream track instance and an audio stream track instance are created. For convenience of description, the video stream track instance and the audio stream track instance will be referred to as a video stream track and an audio stream track, respectively.
After creating the video stream track and the audio stream track, the terminal device writes the video frames into the video stream track and the audio data into the audio stream track through the media stream track technology of the system bottom layer. Then, the terminal equipment synthesizes the video stream track and the audio stream track, thereby obtaining a complete media stream, and the media stream can be directly played in a browser.
With this scheme, the terminal device synthesizes the video stream track and the audio stream track to obtain a complete media stream, so the video stream does not need to be packaged into MP4 fragments for playback. This reduces the CPU load to a certain extent, shortens the loading wait time, avoids perceptible stuttering, and improves the viewing experience.
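The write-and-synthesize step can be sketched as below. `MediaStreamTrackGenerator` is assumed as the bottom-layer stream-track technology (it is a Chromium-only API); the `writeAll` helper itself is plain web-streams code.

```javascript
// Write decoded items (VideoFrame or AudioData in the real flow) into a
// stream track's WritableStream, in order, then close the writer.
async function writeAll(items, writable) {
  const writer = writable.getWriter();
  for (const item of items) {
    await writer.write(item);
  }
  await writer.close();
}

if (typeof MediaStreamTrackGenerator !== 'undefined') {
  // Create one track instance per media type from the built-in prototypes.
  const videoTrack = new MediaStreamTrackGenerator({ kind: 'video' });
  const audioTrack = new MediaStreamTrackGenerator({ kind: 'audio' });

  // decodedVideoFrames / decodedAudioData would come from the decoder output
  // callbacks (hypothetical names for this sketch):
  //   writeAll(decodedVideoFrames, videoTrack.writable);
  //   writeAll(decodedAudioData, audioTrack.writable);

  // Synthesize the complete media stream and play it directly - no MP4
  // fragment packaging step.
  const mediaStream = new MediaStream([videoTrack, audioTrack]);
  document.querySelector('video').srcObject = mediaStream;
}
```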
Optionally, in the foregoing embodiment, when the terminal device formats the media blocks to obtain encoded media blocks that can be read by the browser bottom layer, it first identifies the key frames, non-key frames, timestamps, and binary data in the media blocks; the terminal device then formats the media blocks according to the key frames, non-key frames, timestamps, and binary data to obtain encoded media blocks that can be read by the browser bottom layer.
For example, for the video blocks, the terminal device identifies the key frames (key frame), non-key frames (delta frame), timestamps (timestamp), and binary data (data) in the plurality of video blocks. Thereafter, the video blocks are formatted according to the key frames, non-key frames, timestamps, and binary data to obtain encoded video blocks that can be read by the browser bottom layer.
Similarly, for the audio blocks, the terminal device identifies the key frames (key frame), non-key frames (delta frame), timestamps (timestamp), and binary data (data) in the plurality of audio blocks. Thereafter, the audio blocks are formatted according to the key frames, non-key frames, timestamps, and binary data to obtain encoded audio blocks that can be read by the browser bottom layer.
With this scheme, the terminal device formats the media blocks after identifying the key frames, non-key frames, timestamps, and binary data, thereby achieving rapid formatting of the media blocks.
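The formatting step reduces to mapping the identified fields onto a chunk-init object. The sketch assumes the WebCodecs `EncodedVideoChunk`/`EncodedAudioChunk` constructors as the browser-bottom-layer representation, and the demuxed-block field names are assumptions.

```javascript
// Sketch of the "formatting" step: map a demuxed media block onto the init
// object expected by EncodedVideoChunk / EncodedAudioChunk.
function toChunkInit(block) {
  return {
    type: block.isKeyFrame ? 'key' : 'delta', // key frame vs non-key (delta) frame
    timestamp: block.timestampUs,             // timestamp, in microseconds
    data: block.data,                         // the block's binary payload
  };
}

// In a browser the result feeds straight into the decoder, e.g.:
//   videoDecoder.decode(new EncodedVideoChunk(toChunkInit(block)));
```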
Optionally, in the foregoing embodiment, in a process that the terminal device plays the media stream through the browser, the media stream is fused into a three-dimensional digital world, where the three-dimensional digital world is a web application currently running in the browser. And then playing the three-dimensional digital world fused with the media stream through a browser.
In this embodiment, the digital world refers to a world created in the virtual world based on digital twin technology and recreated at 1:1 scale from the real physical world in order to restore it. Each physical entity in the physical world has a corresponding twin in the digital world. The digital world is effectively a three-dimensional map that can be zoomed in, zoomed out, and rotated. When roaming in the digital world, a user can overlook the whole digital world or closely examine a particular twin within it.
The user enters the web address of the digital world in the browser, i.e., opens the web site of the digital world, and roams in the digital world. For example, a user who wants to roam through a university online enters the university's URL and enters the digital world. During roaming, the terminal device acquires a video stream from the server and fuses it into the three-dimensional scene. For example, an intersection has a camera for capturing vehicles, pedestrians, and the like, and the camera uploads the captured video stream to a server. When the user roams in the digital world, the terminal device acquires the video stream uploaded by the camera in real time from the server and fuses it into the three-dimensional digital world, so that the user can see the vehicles and pedestrians passing through the intersection while roaming.
Obviously, when a digital twin system is deployed, terminal devices with relatively poor configurations suffer from stuttering, high heat generation, fan overload, and the like under the traditional video stream playing mode. With this scheme, the video stream is hardware-decoded in the browser using hardware acceleration technology, which greatly reduces the requirement on CPU performance.
Fig. 4 is another flowchart of a video stream playing method according to an embodiment of the present application. The embodiment comprises the following steps:
401. The software decapsulates the video stream to obtain video blocks, audio blocks, a video configuration, and an audio configuration.
402. Two threads are started.
Illustratively, one thread is used for video block decoding and another thread is used for audio block decoding. Thereafter, for the video block, steps 403 to 408 and steps 415 and 416 are performed; for audio blocks, steps 409-414 and steps 415 and 416 are performed.
403. A video block decoder is created.
The video block decoder is created based on the video block decoder technology of the bottom layer of the system, and supports GPU acceleration.
404. The video configuration is provided to the video block decoder such that the video block decoder determines the first function from among its supported functions.
The terminal device provides the video configuration and the setting file to the video block decoder, so that the video block decoder determines a first function for video decoding from among the functions supported by itself. In addition, the video block decoder determines a second function based on the profile. The first function includes, but is not limited to, video encoding format, size of video, number of frames per second, etc.; the second function includes, but is not limited to, specifying byte sequences, hardware acceleration patterns, and the like.
405. The video blocks are formatted to obtain encoded video blocks that can be read by the browser bottom layer.
In the formatting process, the terminal device determines a key frame, a time stamp, binary data and the like from the video block so as to format the video block.
406. The aforementioned encoded video blocks are decoded to obtain video frames using the first function, the second function, and a hardware acceleration function provided by the video block decoder.
407. A video stream track is created.
The video stream track at the bottom layer of the system can directly play the video stream without packaging MP4 fragments.
408. Video frames are written to the video stream track.
409. An audio block decoder is created.
410. The audio configuration is provided to the audio block decoder such that the audio block decoder determines the first function from among its supported functions.
The terminal device provides the audio configuration and the setting file to the audio block decoder, so that the audio block decoder determines a first function for audio decoding from among the functions supported by itself. In addition, the audio block decoder determines a second function from the audio function set based on the setting file or the like. The first function includes, but is not limited to, audio encoding format, number of audio channels, number of frames per second, etc.; the second function includes, but is not limited to, specifying a byte sequence, etc.
411. The audio blocks are formatted to obtain encoded audio blocks that can be read by the browser bottom layer.
In the formatting process, the terminal device determines a key frame, a time stamp, binary data and the like from the audio block so as to format the audio block.
412. The aforementioned encoded audio blocks are decoded to obtain audio data using an audio decoding function and a hardware acceleration function provided by an audio block decoder.
413. An audio stream track is created.
414. Audio data is written to the audio stream track.
415. The media stream is synthesized using the video stream track written to the video frames and the audio stream track written to the audio data.
416. The media stream is played using the Video element.
It will be appreciated that: steps 403 to 406 are the hardware decoding process of the video block, and steps 409 to 412 are the hardware decoding process of the audio block. Step 407, step 408, and steps 413 to 415 are play processes.
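Steps 401 to 416 can be followed end to end in the sketch below, with the browser-specific pieces injected as parameters so the control flow is visible on its own. All names are illustrative assumptions; in a browser, `createVideoDecoder` would wrap `new VideoDecoder(...)` and the writables would come from the two stream tracks.

```javascript
// Orchestrate the two decode paths (402) and the final synthesis (415-416).
async function playVideoStream(demuxed, deps) {
  const run = async (blocks, config, createDecoder, writable) => {
    const writer = writable.getWriter();
    const decoder = createDecoder({ output: (f) => writer.write(f) }); // 403 / 409
    decoder.configure(config);                                        // 404 / 410
    for (const block of blocks) {
      decoder.decode({                                                // 405-406 / 411-412
        type: block.isKeyFrame ? 'key' : 'delta',
        timestamp: block.timestampUs,
        data: block.data,
      });
    }
    await decoder.flush();
    await writer.close();                                             // 408 / 414
  };
  await Promise.all([
    run(demuxed.videoBlocks, demuxed.videoConfig, deps.createVideoDecoder, deps.videoTrackWritable),
    run(demuxed.audioBlocks, demuxed.audioConfig, deps.createAudioDecoder, deps.audioTrackWritable),
  ]);
  return deps.composeMediaStream();                                   // 415-416
}
```

Because the decoders and track writables are injected, the same control flow can be exercised with stubs outside a browser.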
The following are device embodiments of the present application, which may be used to perform method embodiments of the present application. For details not disclosed in the device embodiments of the present application, please refer to the method embodiments of the present application.
Fig. 5 is a schematic diagram of a video stream playing device according to an embodiment of the present application. The video stream playing device 500 includes: an acquisition module 51, a decapsulation module 52, a formatting module 53, a processing module 54 and a playing module 55.
An obtaining module 51, configured to obtain a video stream to be played;
a decapsulation module 52, configured to decapsulate the video stream to obtain media blocks;
a formatting module 53, configured to format the media blocks to obtain encoded media blocks that can be read by the browser bottom layer;
a processing module 54 for decoding the encoded media blocks in the browser using hardware acceleration functionality provided by a media block decoder to obtain a media stream;
and the playing module 55 is used for playing the media stream through the browser.
In a possible implementation, the processing module 54 is configured to decode the encoded video block in the browser using a hardware acceleration function provided by a video block decoder to obtain a video frame; decoding encoded audio blocks in the browser using a hardware acceleration function provided by an audio block decoder to obtain audio data, and generating the media stream from the video frames and the audio data. Wherein the media blocks comprise video blocks and audio blocks, the media configurations comprise video configurations and audio configurations, the encoded media blocks comprise the encoded video blocks obtained by formatting the video blocks, and the encoded audio blocks obtained by formatting the audio blocks, and the media block decoder comprises the video block decoder and the audio block decoder.
In a possible implementation, the processing module 54 is configured to create a video stream track and an audio stream track when generating the media stream from the video frames and the audio data; writing the video frames to the video stream track, and writing the audio data to the audio stream track; and synthesizing the media stream by utilizing the video stream track written in the video frame and the audio stream track written in the audio data.
In a possible implementation, the processing module 54 is configured to create the media block decoder; providing a media configuration to the media block decoder such that the media block decoder determines a first function of the media configuration indication from among its supported functions; decoding the encoded media block in the browser using the first function and a hardware acceleration function provided by the media block decoder to obtain a media stream.
In a possible implementation manner, the processing module 54 is configured to obtain a setting file of a web application running in the browser, where the setting file indicates a second function, and the second function is a function extended for the media block decoder; providing the profile to the media block decoder to cause the media block decoder to determine a second function from a set of functions; decoding the encoded media block in the browser using the first function, the second function, and a hardware acceleration function provided by the media block decoder to obtain a media stream.
In a possible implementation, the processing module 54 is further configured to write the function set before providing the setting file to the media block decoder; and presetting the function set in a webpage application which needs to run in the browser.
In a possible implementation, the formatting module 53 is configured to identify key frames, non-key frames, timestamps, and binary numbers in the media blocks; and format the media blocks according to the key frames, non-key frames, timestamps, and binary numbers to obtain encoded media blocks that can be read by the browser bottom layer.
In a possible implementation, the processing module 54 is further configured to fuse the media stream into a three-dimensional digital world, where the three-dimensional digital world is a web application currently running in the browser;
the playing module 55 is configured to play, by using the browser, the three-dimensional digital world fused with the media stream.
The video stream playing device provided in the embodiment of the present application may perform the actions of the terminal device in the above embodiment, and its implementation principle and technical effects are similar, and are not described herein again.
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application. Referring to fig. 6, an electronic device 600 according to an embodiment of the present application includes: at least one processor 61, at least one communication bus 62, a user interface 63, at least one network interface 64, and a memory 65.
Wherein the communication bus 62 is used to enable connected communication between these components.
The user interface 63 may include a Display screen (Display) and a Camera (Camera), and the optional user interface 63 may further include a standard wired interface and a standard wireless interface.
The network interface 64 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface), among others.
Wherein the processor 61 may comprise one or more processing cores. The processor 61 uses various interfaces and lines to connect the various portions of the overall electronic device 600, and performs the various functions of the electronic device 600 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 65 and by invoking data stored in the memory 65. Alternatively, the processor 61 may be implemented in hardware in at least one of digital signal processing (Digital Signal Processing, DSP), field programmable gate array (Field-Programmable Gate Array, FPGA), and programmable logic array (Programmable Logic Array, PLA). The processor 61 may integrate one or a combination of a central processing unit (Central Processing Unit, CPU), a graphics processor (Graphics Processing Unit, GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs, and the like; the GPU is used for rendering and drawing the content to be displayed by the display screen; the modem is used to handle wireless communications. It will be appreciated that the modem may also not be integrated into the processor 61 and may instead be implemented by a single chip.
The Memory 65 may include a random access Memory (Random Access Memory, RAM) or a Read-Only Memory (Read-Only Memory). Optionally, the memory 65 includes a non-transitory computer readable medium (non-transitory computer-readable storage medium). Memory 65 may be used to store instructions, programs, code, a set of codes, or a set of instructions. The memory 65 may include a stored program area and a stored data area, wherein the stored program area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the above-described respective method embodiments, etc.; the storage data area may store data or the like referred to in the above respective method embodiments. The memory 65 may also optionally be at least one storage device located remotely from the aforementioned processor 61. As shown in fig. 6, an operating system, a network communication module, a user interface module, and an operating application of the electronic device may be included in the memory 65, which is one type of computer storage medium.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, random Access Memory (RAM) and/or nonvolatile memory, etc., such as Read Only Memory (ROM) or flash RAM. Memory is an example of a computer-readable medium.
Computer readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and changes may be made to the present application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc. which are within the spirit and principles of the present application are intended to be included within the scope of the claims of the present application.

Claims (10)

1. A video stream playing method, comprising:
acquiring a video stream to be played;
decapsulating the video stream to obtain media blocks;
formatting the media blocks to obtain encoded media blocks that can be read by a browser bottom layer;
decoding the encoded media blocks in the browser using a hardware acceleration function provided by a media block decoder to obtain a media stream;
and playing the media stream through the browser.
2. The method of claim 1, wherein decoding the encoded media blocks in the browser using hardware acceleration functionality provided by a media block decoder to obtain a media stream comprises:
decoding the encoded video blocks in the browser using a hardware acceleration function provided by a video block decoder to obtain video frames;
decoding an encoded audio block in the browser using a hardware acceleration function provided by an audio block decoder to obtain audio data, the media block including a video block and an audio block, the encoded media block including the encoded video block obtained by formatting the video block and the encoded audio block obtained by formatting the audio block, the media block decoder including the video block decoder and the audio block decoder;
and generating the media stream according to the video frame and the audio data.
3. The method of claim 2, wherein the generating the media stream from the video frames and the audio data comprises:
creating a video stream track and an audio stream track;
writing the video frames to the video stream track, and writing the audio data to the audio stream track;
and synthesizing the media stream by utilizing the video stream track written in the video frame and the audio stream track written in the audio data.
4. A method according to any one of claims 1 to 3, wherein decoding the encoded media blocks in the browser using hardware acceleration functionality provided by a media block decoder to obtain a media stream, comprises:
creating the media block decoder;
providing a media configuration to the media block decoder such that the media block decoder determines a first function of the media configuration indication from among its supported functions;
decoding the encoded media block in the browser using the first function and a hardware acceleration function provided by the media block decoder to obtain a media stream.
5. The method of claim 4, wherein decoding the encoded media block in the browser using the first function and a hardware acceleration function provided by the media block decoder to obtain a media stream, comprises:
acquiring a setting file of a web application running in the browser, wherein the setting file indicates a second function, and the second function is a function that extends the media block decoder;
providing the profile to the media block decoder to cause the media block decoder to determine a second function from a set of functions;
decoding the encoded media block in the browser using the first function, the second function, and a hardware acceleration function provided by the media block decoder to obtain a media stream.
6. The method of claim 5, wherein prior to providing the settings file to the media block decoder, further comprising:
writing the function collection;
and presetting the function set in a webpage application which needs to run in the browser.
7. A method according to any one of claims 1 to 3, wherein said formatting said media blocks to obtain encoded media blocks that can be read by a browser bottom layer comprises:
identifying key frames, non-key frames, timestamps, and binary numbers in the media blocks;
the media block is formatted according to the key frame, the non-key frame, the timestamp, and the binary number to obtain an encoded media block that can be read by a browser bottom layer.
8. A method according to any one of claims 1 to 3, wherein said playing said media stream through said browser comprises:
fusing the media stream into a three-dimensional digital world, wherein the three-dimensional digital world is a web application currently running in the browser;
and playing the three-dimensional digital world fused with the media stream through the browser.
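The fusion step in claim 8 can be pictured as attaching the decoded media stream to a surface inside the running three-dimensional scene. A real implementation might build something like a three.js `VideoTexture` over a playing `<video>` element; the scene below is a plain object so the shape of the operation stays visible, and all names are illustrative.

```typescript
// Minimal stand-ins for a scene graph and its objects.
interface SceneObject {
  kind: string;
  source: unknown;
}

interface Scene {
  objects: SceneObject[];
}

// Fuse a media stream into the scene by adding a video surface whose
// texture source is the decoded stream.
function fuseStreamIntoScene(scene: Scene, mediaStream: unknown): SceneObject {
  const videoSurface: SceneObject = {
    kind: "video-surface",
    source: mediaStream, // the decoded stream drives the surface's texture
  };
  scene.objects.push(videoSurface);
  return videoSurface;
}

const world: Scene = { objects: [] };
const surface = fuseStreamIntoScene(world, { id: "stream-1" });
```

Once fused, rendering the scene renders the video too, which is why the claim then speaks of playing the three-dimensional digital world rather than the stream in isolation.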
9. A video stream playing device, comprising:
an acquisition module, configured to acquire a video stream to be played;
a decapsulation module, configured to decapsulate the video stream to obtain media blocks;
a formatting module, configured to format the media blocks to obtain encoded media blocks readable by a browser bottom layer;
a processing module, configured to decode the encoded media blocks in the browser using a hardware acceleration function provided by a media block decoder to obtain a media stream;
and a playing module, configured to play the media stream through the browser.
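The five modules of claim 9 form a linear pipeline: each module's output feeds the next. The sketch below wires them together with trivial stand-in steps to show the data flow; the step bodies are illustrative stubs, where a real device would back them with network I/O, a demuxer, and a hardware-accelerated decoder.

```typescript
// One pipeline step: takes the previous module's output, returns its own.
type Step<I, O> = (input: I) => O;

class VideoStreamPlayer {
  constructor(
    private acquire: Step<string, string[]>,       // acquisition module
    private decapsulate: Step<string[], string[]>, // decapsulation module
    private format: Step<string[], string[]>,      // formatting module
    private decode: Step<string[], string[]>,      // processing module
    private play: Step<string[], number>           // playing module
  ) {}

  run(url: string): number {
    return this.play(
      this.decode(this.format(this.decapsulate(this.acquire(url))))
    );
  }
}

// Wiring with trivial stand-in steps that just relabel the data.
const player = new VideoStreamPlayer(
  (url) => [`packet:${url}`],
  (packets) => packets.map((p) => p.replace("packet:", "block:")),
  (blocks) => blocks.map((b) => b.replace("block:", "chunk:")),
  (chunks) => chunks.map((c) => c.replace("chunk:", "frame:")),
  (frames) => frames.length
);

const framesPlayed = player.run("rtsp://example/stream");
```

Structuring the device as independent modules matches the claim language and lets each stage (for example, the processing module's decoder) be swapped without touching the others.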
10. An electronic device comprising a processor, a memory and a computer program stored on the memory and executable on the processor, wherein execution of the computer program by the processor causes the electronic device to implement the method of any one of claims 1 to 8.
CN202311776479.3A 2023-12-21 2023-12-21 Video stream playing method, device, equipment and readable storage medium Pending CN117692681A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311776479.3A CN117692681A (en) 2023-12-21 2023-12-21 Video stream playing method, device, equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN117692681A true CN117692681A (en) 2024-03-12

Family

ID=90130002

Country Status (1)

Country Link
CN (1) CN117692681A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination