WO2024087208A1

WO2024087208A1 - Video playback method and system, and storage medium

Info

Publication number: WO2024087208A1
Application number: PCT/CN2022/128383
Authority: WO
Inventors: 谭红平; 刘文泽; 吴杰
Original assignee: 深圳市锐明技术股份有限公司
Priority date: 2022-10-28
Filing date: 2022-10-28
Publication date: 2024-05-02

Abstract

The present application relates to the field of videos, and provides a video playback method and system, and a storage medium. The method comprises: receiving a video playback request of a browser end, and sending, to a device end for capturing a video, signaling for opening the video; receiving first video data uploaded by the device end, encrypting an audio frame in the first video data by means of predetermined first encryption information, expanding a video frame structure in the first video data, and adding first decryption information to the expanded video frame structure; and sending an expanded video frame and the encrypted audio frame to the browser end, so that the browser end performs decryption and playback according to the first decryption information in the expanded video frame structure. According to the present application, a video frame structure is expanded, thereby obviating the need to separately call an interface of a service end for querying first decryption information, and facilitating the reduction of access pressure on the service end and the reduction of a video playback delay.

Description

Video playback method, system and storage medium

Technical Field

The present application relates to the field of video, and in particular to a video playback method, system and storage medium.

Background Art

In recent years, with the rapid development of technologies such as computers, big data, image analysis, and network transmission, video surveillance, as an important part of the public security system, has been widely used in fields such as transportation, finance, public security, electricity, water conservancy, and hotels, and its market size is growing year by year. However, the videos collected by a video surveillance system contain a large amount of personal biometric information and confidential information of organizations. If a video is leaked, personal privacy and commercial secrets may be exposed, with adverse social impact. As network security and privacy protection are increasingly valued, the security of video access is extremely important.

To ensure the security of video access, the general solution in the industry is that the server encrypts the audio and video data, and the client decrypts and plays it (if the client is a browser, a plug-in such as an ActiveX plug-in must also be installed in the browser). The main process is as follows:

1. The user starts the client for playing the video;

2. The client queries the server for decryption information of the played video;

3. The server verifies the legitimacy of the client's access and returns the decryption information to the client after the verification passes;

4. The client pulls the video stream from the server, decrypts the encrypted audio and video data through the decryption information, and then decodes and plays the audio and video data.

Although this method improves the security of audio and video data to a certain extent, the browser must install a plug-in before it can play the video, and the client must query the server for decryption information each time a video is played. When a large number of users play videos, these queries put heavy access pressure on the server and limit the concurrency of the platform. In addition, the query round trip takes time, which delays the display of the video picture and degrades the user experience in low-latency video playback.

Technical Problem

In view of this, the embodiments of the present application provide a video playback method, system and storage medium to solve the problem in the prior art that when playing a video, the client needs to query the server for decryption information, which causes great access pressure on the server, affects the concurrency of the platform, causes delays in the video display time, and brings a bad user experience for low-latency video playback.

Technical Solutions

A first aspect of an embodiment of the present application provides a video playback method, which is applied to a server, and the method includes:

receiving a video playback request from a browser end, and sending signaling for opening the video to a device end used to capture the video;

receiving first video data uploaded by a device, encrypting an audio frame in the first video data by using predetermined first encryption information, extending a video frame structure in the first video data, and adding first decryption information to the extended video frame structure, wherein the first decryption information corresponds to the first encryption information;

The expanded video frame and the encrypted audio frame are encapsulated into second video data, and the second video data is sent to the browser. The second video data is used by the browser to decrypt and play the second video data according to the first decryption information in the expanded video frame structure.

In combination with the first aspect, in a first possible implementation manner of the first aspect, before sending a signaling for opening the video to the device end for capturing the video, the method further includes:

Establishing a communication link with the device end, and receiving registration information of the device end through the communication link;

The device end is authenticated according to the registration information, and the online status of the device after the authentication is passed is determined.

In combination with the first aspect, in a second possible implementation manner of the first aspect, the first video data uploaded by the device is encrypted video data, and before receiving the first video data uploaded by the device, the method further includes:

The predetermined first encryption information is sent to the device end, where the first encryption information is used by the device end to encrypt the third video data collected by the device end.

In combination with the first aspect, in a third possible implementation manner of the first aspect, adding first decryption information to the expanded video frame structure includes:

Encrypting the first decryption information according to predetermined second encryption information;

The encrypted first decryption information is added to the network abstraction layer unit in the expanded video frame structure.

In a second aspect, an embodiment of the present application provides a video playback method, which is applied to a browser side, and the method includes:

Sending a video playback request to the server, wherein the video playback request includes device information of the video requested to be played;

Receiving second video data returned by the server, where the second video data includes video frames and audio frames from the first video data sent by the device end, the video frames are video frames to which predetermined first decryption information has been added after structural expansion, and the audio frames are audio frames encrypted by first encryption information corresponding to the first decryption information;

The encrypted audio frame is decrypted according to the first decryption information in the second video data, and the video is played according to the video frame and the decrypted audio frame.

In combination with the second aspect, in a first possible implementation manner of the second aspect, decrypting the encrypted audio frame according to the first decryption information in the second video data includes:

Decapsulating the second video data through a decryption library compiled by means of WebAssembly into Low Level Virtual Machine (LLVM) bytecode format, to obtain the video frame with the extended video frame structure and the encrypted audio frame included in the second video data;

Parsing the video frame with the extended video frame structure to obtain first decryption information included in the video frame;

The encrypted audio frame is decrypted according to the first decryption information.

In combination with the first possible implementation manner of the second aspect, in a second possible implementation manner of the second aspect, after obtaining the first decryption information included in the video frame, the method further includes:

Detecting the encryption status of the video frame;

When the video frame is in an encrypted state, the video frame is decrypted using the first decryption information.

In combination with the first possible implementation manner of the second aspect, in a third possible implementation manner of the second aspect, parsing the video frame having the extended video frame structure to obtain the first decryption information included in the video frame includes:

The video frame having the extended video frame structure is parsed, and the extended information of the video frame is decrypted by using predetermined second decryption information to obtain the first decryption information included in the video frame.

A third aspect of an embodiment of the present application provides a server, including a memory, a processor, a communication unit, and a computer program stored in the memory and executable on the processor, wherein:

The communication unit is used to send data or instructions to, and receive data or instructions from, the browser end or the device end;

The processor is used to execute the video playback method as described in any one of the first aspects.

A fourth aspect of an embodiment of the present application provides a browser end, comprising a memory, a processor, a communication unit, and a computer program stored in the memory and executable on the processor, wherein:

The communication unit is used to send data or instructions to, and receive data or instructions from, the server;

The processor is used to execute the video playback method as described in any one of the second aspects.

A fifth aspect of the embodiment of the present application provides a video playback system, the system comprising a browser side, a server side and a device side, wherein:

The server is used to execute the video playback method as described in any one of the first aspects;

The browser end is used to execute the video playback method as described in any one of the second aspects.

A sixth aspect of an embodiment of the present application provides a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the steps of the method described in any one of the first aspect or the second aspect are implemented.

Beneficial Effects

The beneficial effects of the embodiments of the present application compared with the prior art are as follows: in the video playback method described in the embodiments of the present application, when the server receives a video playback request from the browser, it sends a signal to the device to open the video, receives the first video data uploaded by the device, expands the video frame structure of the video frame in the first video data, adds the first decryption information to the expanded video frame structure, encrypts the audio frame in the first video data by the first encryption information, encapsulates the expanded video frame and the encrypted audio frame as the second video data and sends it to the browser for playback, so that when the browser plays the video, there is no need to separately call the server's interface to query the first decryption information, which fundamentally reduces one business interaction and helps to reduce the access pressure on the server. In addition, the first decryption information is transmitted through the expanded video frame structure, and no additional access time is consumed, thereby reducing the delay of video playback.

BRIEF DESCRIPTION OF THE DRAWINGS

In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings required for the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application. For those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.

FIG. 1 is a schematic diagram of a video playback system for an implementation scenario of a video playback method provided by an embodiment of the present application;

FIG. 2 is a schematic diagram of constructing a video playback library on a browser side provided in an embodiment of the present application;

FIG. 3 is a schematic diagram of an implementation flow of a video playback method provided in an embodiment of the present application;

FIG. 4 is a schematic diagram of an extended video frame structure provided in an embodiment of the present application;

FIG. 5 is a schematic diagram of a process of implementing decryption and playback on a browser side provided in an embodiment of the present application;

FIG. 6 is a schematic diagram of an interactive process of video playback provided in an embodiment of the present application;

FIG. 7 is a schematic diagram of a video playback device provided in an embodiment of the present application;

FIG. 8 is a schematic diagram of an electronic device provided in an embodiment of the present application.

Embodiments of the present invention

In the following description, specific details such as specific system structures, technologies, etc. are provided for the purpose of illustration rather than limitation, so as to provide a thorough understanding of the embodiments of the present application. However, it should be clear to those skilled in the art that the present application may also be implemented in other embodiments without these specific details. In other cases, detailed descriptions of well-known systems, devices, circuits, and methods are omitted to prevent unnecessary details from obstructing the description of the present application.

In order to illustrate the technical solution described in this application, a specific embodiment is provided below for illustration.

FIG. 1 is a schematic diagram of a video playback system of an implementation scenario of a video playback method provided in an embodiment of the present application. As shown in FIG. 1, the video playback system includes a browser end, a server end, and a device end.

The device end is the producer of video content. It may include one or more network cameras or other video acquisition devices for collecting third video data. After the device end is started, it can automatically connect to the server end to establish a communication link between the two. The device end then initiates an authentication request to the server end. After the server end authenticates the device end, it can set the authenticated device end to an online state. While the device end is online, the browser end can play its video through the server end.

When the device is started normally, the third video data is collected in real time through the microphone and the image sensor, and the collected third video data is encoded and compressed through a predetermined encoding method, such as H264 or H265 encoding method, and then encrypted through a predetermined first encryption algorithm, such as AES (Advanced Encryption Standard) or RSA (Rivest-Shamir-Adleman). The encrypted video data can be stored in the memory of the device. When the user requests to play the video through the browser, the encrypted third video data can be uploaded to the server in real time for playback, or the unencrypted third video data can be uploaded to the server in real time for playback.

The server is used to coordinate data or signaling between the browser and the device so that the browser can play the third video data collected by the device. When coordinating with the device, the server manages the online and offline status of the device. When the device is online, the latest first encryption information, such as a key, is sent to the online device. When coordinating with the browser, if the server receives a video playback request sent by the user through the browser, it notifies the device in real time to upload the first video data, expands the video frame structure in the uploaded first video data, and adds extended information to the extended video frame structure, including the first decryption information corresponding to the first encryption information. The extended first video data is sent to the browser.

The browser side includes a terminal with a browser installed, and can request playback of the video collected by the device side through the installed browser. The browser side is the video playback client. When the user plays a video, it is responsible for pulling the encrypted second video data from the server, parsing the second video data to obtain the first decryption information, and decrypting the second video data, thereby realizing video playback.

In order to achieve plug-in-free audio and video playback on the browser side, the browser side implements a video playback library (videoSdk), which can be called directly by the web program without installing any plug-ins. As shown in FIG. 2, the video playback library includes two sub-libraries: the decryption library (videoplayer.wasm) in wasm (WebAssembly) format, and the playback library (videoplayer.js) in js format.

For the decryption library in wasm format, WebAssembly can be used to compile programs such as the video data processing module, the multimedia processing tool FFmpeg, and decryption algorithms (including AES, RSA, etc.) into a wasm-format sub-library, which implements core business logic such as video data decapsulation, decryption, FFmpeg soft decoding, and FMP4 (fragmented MP4) file encapsulation. The WebAssembly method compiles C/C++ programs into LLVM (Low Level Virtual Machine) bytecode, a binary format that is intended for machines rather than human readers and that offers high security and fast execution.

For the playback library in js format, JavaScript language can be used to implement the video streaming module, MSE (Media Source Extensions) playback module, and WebGL (Web Graphics Library) playback module. When a user requests a video, the video streaming module pulls video data from the server and inputs it into videoPlayer.wasm for data processing. The H.264 video frame is directly encapsulated in FMP4 mode and input into the MSE playback module for decoding and rendering. For H.265 video frames, FFmpeg soft decoding is required in videoPlayer.wasm, and then the decoded video data is input into the WebGL playback module for rendering. Finally, the browser can decode and play the audio and video data normally, and the user can hear the sound and see the video screen.
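The codec-dependent routing described above can be sketched as follows. This is only an illustration written in Python rather than in the JavaScript of the playback library; the `Codec` enum, the function name `choose_playback_path`, and the stage labels are hypothetical names introduced here and are not identifiers from videoSdk.

```python
from enum import Enum


class Codec(Enum):
    H264 = "h264"
    H265 = "h265"


def choose_playback_path(codec: Codec) -> list[str]:
    """Return the ordered processing stages for a video frame of the given codec."""
    if codec is Codec.H264:
        # H.264: remux into fragmented MP4 and let the browser's MSE decode and render.
        return ["decapsulate", "decrypt", "fmp4_mux", "mse_decode_render"]
    if codec is Codec.H265:
        # H.265: soft-decode with FFmpeg inside the wasm module, then render via WebGL.
        return ["decapsulate", "decrypt", "ffmpeg_soft_decode", "webgl_render"]
    raise ValueError(f"unsupported codec: {codec}")
```

Both paths share the decapsulation and decryption stages handled by the wasm sub-library; only the decode and render stages differ.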

FIG. 3 is a schematic diagram of an implementation flow of a video playback method provided in an embodiment of the present application. As shown in FIG. 3, the method includes:

In S301, a video playback request from a browser is received, and a signaling for opening the video is sent to a device for capturing the video.

In a possible implementation, before the server receives the video playback request from the browser, the method may further include a step in which the device registers with the server and goes online.

Staff or users can enter device information, including the device serial number, on the server in advance. When the device is started, it establishes a communication link with the server using the server access information preset on the device, such as a TLS (Transport Layer Security) communication link, which secures the transmission process through transmission encryption and identity authentication.

After the communication link is established, the device sends a registration package to the server, and the server authenticates the device based on the registration package. After the authentication is passed, the server sets the status of the device to the online state.
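A minimal sketch of the registration and online-state handling is given below. The application does not specify the contents of the registration package, so the sketch assumes, purely for illustration, that it carries the device serial number, a nonce, and an HMAC-SHA256 signature computed with a secret entered on the server in advance. The class `DeviceRegistry` and its fields are hypothetical.

```python
import hashlib
import hmac
from dataclasses import dataclass, field


@dataclass
class DeviceRegistry:
    # Serial number -> shared secret entered on the server in advance (assumed scheme).
    known_devices: dict[str, bytes]
    # Serial numbers of devices currently in the online state.
    online: set[str] = field(default_factory=set)

    def register(self, serial: str, nonce: bytes, signature: bytes) -> bool:
        """Authenticate a registration package and mark the device online on success."""
        secret = self.known_devices.get(serial)
        if secret is None:
            return False  # the device was never entered on the server
        expected = hmac.new(secret, serial.encode() + nonce, hashlib.sha256).digest()
        if not hmac.compare_digest(expected, signature):
            return False  # authentication failed, the device stays offline
        self.online.add(serial)
        return True
```

A playback request would then be accepted only for serial numbers present in `online`.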

In order to ensure the security of the first video data transmitted between the server and the device, after establishing the communication link, the server may send first encryption information to the device, including key information of encryption algorithms such as AES and RSA. When the encryption algorithm is a symmetric encryption algorithm, the first encryption information may be encryption parameters. When the encryption algorithm is an asymmetric encryption algorithm, the first encryption information may be public key information.

In a possible implementation, the same device may include multiple video channels. To further ensure the security of the video content, different first encryption information may be configured for different video channels, including different encryption algorithms or encryption parameters. After receiving the first encryption information, the device may encrypt the corresponding video channel and respond to the server with the result information of whether the setting is successful.
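The per-channel configuration can be illustrated with the sketch below, which generates independent first encryption information for each video channel. The `EncryptionInfo` type, the algorithm label "AES-128-CBC", and the 16-byte key and IV sizes are assumptions made for the example; the application only states that algorithms or parameters may differ between channels.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class EncryptionInfo:
    algorithm: str  # label format is an assumption, e.g. "AES-128-CBC"
    key: bytes      # symmetric key sent to the device for this channel
    iv: bytes       # initialization vector for this channel


def make_channel_keys(channels: list[int]) -> dict[int, EncryptionInfo]:
    """Create independent first encryption information for every video channel."""
    return {
        ch: EncryptionInfo("AES-128-CBC", secrets.token_bytes(16), secrets.token_bytes(16))
        for ch in channels
    }
```

Because every channel receives its own randomly generated key, obtaining the key of one channel does not expose the content of the others.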

After the device is online, the user can send a video playback request to the server through the browser. The video playback request may include device information or video channel information. The media protocol between the browser and the server can use the standard HTTP-FLV (Flash Video over HTTP) transmission protocol.

After receiving the video playback request from the browser, the server will send a video opening signal through the communication link established between the device and the server (such as a TLS communication link). The signal can carry information such as the channel information of the video to be opened, the server IP address, and the media port.
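The signaling for opening the video might be encoded as in the sketch below. The application lists the carried information (channel information, server IP address, media port) but not a wire format, so the JSON encoding and the field names `cmd`, `channel`, `server_ip`, and `media_port` are hypothetical.

```python
import json


def build_open_video_signal(channel: int, server_ip: str, media_port: int) -> bytes:
    """Serialize the open-video signaling sent by the server to the device."""
    if not 0 < media_port < 65536:
        raise ValueError("media_port out of range")
    msg = {
        "cmd": "open_video",
        "channel": channel,
        "server_ip": server_ip,
        "media_port": media_port,
    }
    return json.dumps(msg, separators=(",", ":")).encode("utf-8")


def parse_open_video_signal(raw: bytes) -> tuple[str, int, int]:
    """Device side: recover (server_ip, media_port, channel) from the signaling."""
    msg = json.loads(raw.decode("utf-8"))
    if msg.get("cmd") != "open_video":
        raise ValueError("not an open-video signal")
    return msg["server_ip"], msg["media_port"], msg["channel"]
```

The device would use the returned address and port to establish the media link and the channel number to select which camera stream to upload.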

After the device receives the video opening signaling sent by the server, it establishes a media link (such as a TLS media link) between the device and the server through the server IP address and media port information carried in the signaling. At the same time, the device also collects the third video data in real time through the microphone and camera, compresses the video data with H264 or H265 encoding, and can also use the first encryption information sent by the server to encrypt the video frames and audio frames in the video data. After completing the encryption process, the device can upload the encrypted first video data through the established TLS media link.

In S302, first video data uploaded by the device is received, an audio frame in the first video data is encrypted by using predetermined first encryption information, the video frame structure in the first video data is expanded, and first decryption information is added to the expanded video frame structure, wherein the first decryption information corresponds to the first encryption information.

After receiving the first video data, the media port of the server may expand the video frame structure of the first video data and add the first decryption information to the video frame structure.

When the first video data is encrypted first video data, the first video data may be unpacked to restore the H264/H265 video frame and the original audio frame.

For an I frame in the video, the video frame structure can be extended by adding, in front of the I frame of the original video data, a NALU (Network Abstraction Layer Unit, which encapsulates the data provided by the video coding layer for network transmission) that carries SEI (Supplemental Enhancement Information, a feature of the H264/H265 video compression standards that provides a way to add additional information to the video bitstream), and the first decryption information is added to the body of that NALU. This method does not destroy the original frame structure of the H264/H265 video, so standard streaming media transmission protocols such as HTTP-FLV, HLS (HTTP Live Streaming), and RTMP (Real Time Messaging Protocol, an open protocol for audio, video and data transmission between Flash players and servers) can be used to distribute the audio and video data. For an audio frame in the first video data, the audio frame can be directly encrypted by the first encryption information.

FIG. 4 is a schematic diagram of an extended video frame structure provided by an embodiment of the present application. As shown in FIG. 4, each NALU includes a NALU header and a NALU body. Before extension, the NALU bodies in the video data carry the SPS (Sequence Parameter Set, which holds the information that applies to an entire image sequence), the PPS (Picture Parameter Set, which holds the information that applies to all slices of an image), and the basic image information. The NALU body of the added NALU carries the supplemental enhancement information SEI, i.e., the first decryption information.

In order to ensure the security of the extended information, the extended information, i.e., the first decryption information, can be re-encrypted by the predetermined second encryption information. When the browser receives the second video data, the re-encrypted first decryption information can be decrypted based on the predetermined second decryption information to obtain the first decryption information.

In S303, the expanded video frame and the encrypted audio frame are encapsulated into second video data, and the second video data is sent to the browser side. The second video data is used by the browser side to decrypt and play the second video data according to the first decryption information in the extended video frame structure.

In a possible implementation, the unpacked video frame may be a video frame in H264 or H265 format. After the video frame structure is expanded, the expanded video frame may consist of the extended information together with the original video frame encrypted by the first encryption information, or it may consist of the extended information together with the unencrypted original video frame. In either case, the expanded video frame is encapsulated with the encrypted audio frame as the second video data.

Alternatively, the encryption state of the video frame in the second video data may be switched according to the duration of the second video data sent to the browser. For example, within the first predetermined duration of the start of the transmission of the second video data, the video frame in the transmitted second video data may be an unencrypted video frame, and the first decryption information may be obtained by parsing based on the second decryption information. Alternatively, the first decryption information in the extended information may be directly obtained.

After the first predetermined time period, the video frames in the second video data are encrypted video frames. The browser can decrypt the video frames and audio frames in the second video data based on the first decryption information obtained within the first predetermined time period to obtain video frames and audio frames for playback.
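The time-based switching of the encryption state can be sketched as a small decision function. The names `FramePlan` and `plan_frame` are hypothetical, and treating the boundary instant as already encrypted is an assumption, since the application does not say which side of the boundary the exact end of the first predetermined duration falls on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FramePlan:
    encrypt_video: bool  # whether video frames are encrypted with the first encryption information
    encrypt_audio: bool  # audio frames are always encrypted in this scheme


def plan_frame(elapsed_s: float, first_period_s: float) -> FramePlan:
    """Decide the encryption state of frames sent elapsed_s seconds after playback starts."""
    if elapsed_s < 0 or first_period_s < 0:
        raise ValueError("durations must be non-negative")
    # Within the first predetermined duration video frames go out unencrypted, so the
    # browser can start rendering while it extracts the first decryption information.
    return FramePlan(encrypt_video=elapsed_s >= first_period_s, encrypt_audio=True)
```

With a first predetermined duration of 2 seconds, for example, a frame sent at 0.5 seconds would carry unencrypted video, while a frame sent at 3 seconds would carry encrypted video.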

The server can distribute the encrypted audio frames and video frames with extended SEI to the browser through the TLS media link.

In a possible implementation, as shown in the schematic diagram of the video processing flow on the browser side in FIG. 5, the playback library of the JS layer on the browser side pulls the audio and video streams, decapsulates the audio and video data through the integrated video playback library (videoSdk), and parses the SEI extension information of the I frame to obtain the decryption algorithm and parameters, including the encrypted first decryption information. The encrypted first decryption information can then be decrypted according to the preset second decryption information to obtain the first decryption information, which includes information such as the encryption algorithm type and the decryption parameters for decrypting the audio frame, or both the audio frame and the video frame.

The video playback library (videoSdk) of the WASM layer can use the decryption algorithm type and decryption parameters in the parsed first decryption information to decrypt the audio frame and video frame to obtain the original audio frame data and video frame data. This includes determining whether the video frame is encrypted. If it is encrypted, decrypting it through the first decryption information to obtain a video frame that can be used for encapsulation and playback. If the video frame is an unencrypted video frame, the unencrypted video frame can be used.
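The parsing step on the browser side can be sketched as follows, again in Python rather than in the C/C++ compiled to wasm. The sketch assumes H.264 NALUs without start codes and assumes that the extended information is stored as a user_data_unregistered SEI message tagged with the hypothetical identifier `EXT_UUID`; neither detail is specified by the application. Malformed input is not handled.

```python
from typing import Optional

NAL_SEI = 6                      # H.264 nal_unit_type of an SEI NALU
SEI_USER_DATA_UNREGISTERED = 5   # SEI payload type for free-form user data
EXT_UUID = bytes(range(0x10, 0x20))  # hypothetical identifier, must match the server


def _strip_emulation_prevention(data: bytes) -> bytes:
    """Remove the 0x03 escape byte that follows two consecutive zero bytes."""
    out = bytearray()
    zeros = 0
    for b in data:
        if zeros >= 2 and b == 3:
            zeros = 0
            continue  # drop the escape byte itself
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)


def _read_ff_coded(buf: bytes, pos: int) -> tuple[int, int]:
    """Read an SEI type or size value; return (value, next position)."""
    value = 0
    while buf[pos] == 0xFF:
        value += 255
        pos += 1
    return value + buf[pos], pos + 1


def extract_extended_info(nalus: list[bytes]) -> Optional[bytes]:
    """Return the extended information from the first matching SEI message, if any."""
    for nalu in nalus:
        if not nalu or (nalu[0] & 0x1F) != NAL_SEI:
            continue
        rbsp = _strip_emulation_prevention(nalu[1:])
        pos = 0
        # The last byte holds the RBSP trailing bits, so stop before it.
        while pos + 1 < len(rbsp):
            ptype, pos = _read_ff_coded(rbsp, pos)
            psize, pos = _read_ff_coded(rbsp, pos)
            payload = rbsp[pos:pos + psize]
            pos += psize
            if ptype == SEI_USER_DATA_UNREGISTERED and payload[:16] == EXT_UUID:
                return payload[16:]
    return None
```

The returned bytes would then be decrypted with the second decryption information to obtain the algorithm type and parameters used for the audio frames and, where applicable, the video frames.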

For H.264 video frames, the video playback library (videoSdk) will encapsulate the original audio and video frames in FMP4 format, and decode and play them through the browser's MSE mode. For H.265 video frames, the video playback library will use WebAssembly soft decoding to decode H.265 into YUV data, and then call the browser's WebGL interface for rendering, so that the browser can display the video screen and play the sound normally.

FIG. 6 is a schematic diagram of an interactive process of video playback provided by an embodiment of the present application, which is described in detail as follows:

Step 1: The user enters the device information on the server. When the device is turned on, the device establishes a TLS communication link with the server and sends a registration package to the server. The server will authenticate the device information and put the device online after the authentication is passed.

Step 2: The server sends the first encryption information (such as the key of the AES or RSA encryption algorithm) for encrypting the audio and video frames to the device. When a device supports multiple camera channels, different encryption algorithms or encryption parameters can be configured for different channels to further ensure the security of the video content. After receiving the new first encryption information sent, the device will use the new first encryption information to encrypt the audio and video frames in the third video data, and respond to the server with the result information of whether the setting is successful.

Step 3: After the device is online, the user requests the server to play the video through the browser, which can carry parameters such as device information and channel information. The media protocol between the browser and the server can use the standard HTTP-FLV protocol.

Step 4: After the server receives the video playback request sent by the user through the browser, it will send a signal to open the video through the TLS communication link established between the device and the server. The signal will carry information such as channel information, server IP and media port.

Step 5: After the device receives the signaling from the server to open the audio and video, it establishes a TLS media link between the device and the server through the server IP and media port information carried in the signaling. At the same time, the device also collects the third video data in real time through the microphone or camera, compresses the third video data with H264 or H265 encoding, and then encrypts the third video data using the first encryption information sent by the server.

Step 6: The device uploads the encrypted third video data over the established TLS media link; the data received by the server in this way is the first video data.

Step 7: After receiving the first video data, the media port of the server unpacks the audio frames and video frames to restore the H264/H265 video frames and the original audio frames. To extend the video frame structure, an SEI NAL unit can be inserted in front of the original video data of each I frame as extended information, and the encrypted key information (the first decryption information encrypted with the second encryption information) can be placed in the body of that NAL unit. The advantage of this method is that it does not break the original H264/H265 frame structure, so standard streaming media transmission protocols such as HTTP-FLV, HLS and RTMP can still be used to distribute the audio and video data. For each audio frame, the entire audio data can be encrypted directly.
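The SEI extension in Step 7 can be sketched for H.264 as follows. This is a minimal sketch under several assumptions not stated in the embodiment: the stream is in Annex B byte-stream form, the key information is carried as a user_data_unregistered SEI message (payload type 5), and APP_UUID is a made-up identifier. H.265 would need a two-byte NAL header instead, and the encryption of the key information itself is out of scope here.

```python
import uuid

START_CODE = b"\x00\x00\x00\x01"
SEI_NAL_HEADER_H264 = 0x06        # nal_ref_idc = 0, nal_unit_type = 6 (SEI)
SEI_USER_DATA_UNREGISTERED = 5    # SEI payload type for user_data_unregistered
# Hypothetical UUID marking this application's SEI payload.
APP_UUID = uuid.UUID("00112233-4455-6677-8899-aabbccddeeff").bytes


def add_emulation_prevention(rbsp: bytes) -> bytes:
    """Insert 0x03 after two zero bytes whenever the next byte is 0x00..0x03."""
    out = bytearray()
    zeros = 0
    for b in rbsp:
        if zeros >= 2 and b <= 3:
            out.append(3)
            zeros = 0
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)


def build_sei_nalu(user_data: bytes) -> bytes:
    """Wrap user_data in a single-message H.264 SEI NAL unit."""
    payload = APP_UUID + user_data
    rbsp = bytearray([SEI_USER_DATA_UNREGISTERED])
    size = len(payload)
    while size >= 255:            # payload size is coded as 0xFF bytes plus a remainder
        rbsp.append(0xFF)
        size -= 255
    rbsp.append(size)
    rbsp += payload
    rbsp.append(0x80)             # rbsp_trailing_bits
    return START_CODE + bytes([SEI_NAL_HEADER_H264]) + add_emulation_prevention(bytes(rbsp))


def extend_i_frame(i_frame_annexb: bytes, encrypted_key_info: bytes) -> bytes:
    """Prepend the SEI NAL unit; the original I-frame bytes are left untouched."""
    return build_sei_nalu(encrypted_key_info) + i_frame_annexb
```

Since the original NAL units are not modified, a decoder that does not recognize the UUID can simply skip the SEI message, which is why the frame structure remains compatible with standard transports.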

Step 8: The server distributes the encrypted audio frame and the SEI-extended video frame, that is, the second video data, to the browser through the TLS media link.

Step 9: The browser decapsulates the second video data through the integrated video playback library videoSdk, obtains the encrypted first decryption information from the SEI extended data of the I frame, and then decrypts the encrypted first decryption information to obtain the first decryption information, that is, the original encryption algorithm type and decryption parameters.
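The extraction part of Step 9 can be sketched as below, assuming the extended data was written as an H.264 user_data_unregistered SEI message (payload type 5) in an Annex B byte stream and tagged with an application UUID. Those assumptions and the function names are illustrative; the sketch omits error handling for malformed streams and stops before the decryption of the extracted bytes.

```python
import re

_START_CODE = re.compile(b"\x00\x00\x01")


def split_nal_units(stream: bytes):
    """Return the NAL units (without start codes) of an Annex B byte stream."""
    matches = list(_START_CODE.finditer(stream))
    units = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(stream)
        # Drop the zero byte that belongs to a following 4-byte start code.
        units.append(stream[m.end():end].rstrip(b"\x00"))
    return units


def remove_emulation_prevention(data: bytes) -> bytes:
    """Drop each 0x03 that follows two consecutive zero bytes."""
    out = bytearray()
    zeros = 0
    for b in data:
        if zeros >= 2 and b == 3:
            zeros = 0
            continue
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)


def extract_sei_user_data(stream: bytes, app_uuid: bytes):
    """Return the user data of the first matching SEI message, or None."""
    for nal in split_nal_units(stream):
        if not nal or nal[0] & 0x1F != 6:      # 6 = SEI in H.264
            continue
        rbsp = remove_emulation_prevention(nal[1:])
        pos = 0
        while pos < len(rbsp) - 1:             # last byte holds the trailing bits
            ptype = 0
            while rbsp[pos] == 0xFF:
                ptype += 255
                pos += 1
            ptype += rbsp[pos]
            pos += 1
            psize = 0
            while rbsp[pos] == 0xFF:
                psize += 255
                pos += 1
            psize += rbsp[pos]
            pos += 1
            payload = rbsp[pos:pos + psize]
            pos += psize
            if ptype == 5 and payload[:16] == app_uuid:
                return payload[16:]
    return None
```

The returned bytes would still be the encrypted first decryption information; the playback library would decrypt them with the second decryption information before use.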

Step 10: The video playback library videoSdk uses the algorithm type and decryption parameters in the parsed first decryption information to decrypt the audio frames or video frames of the second video data, obtaining the original audio frame data and video frame data.

Step 11: For H.264 video frames, the video playback library (videoSdk) encapsulates the original audio and video frames in fMP4 format and has the browser decode and play them through Media Source Extensions (MSE). For H.265 video frames, the video playback library uses WebAssembly software decoding to decode H.265 into YUV data, and then calls the browser's WebGL interface for rendering, so that the browser can display the video and play the sound normally.
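The codec-dependent routing in Step 11 amounts to a simple dispatch, sketched here. The function name and the descriptive strings in the returned dictionary are assumptions for illustration; in the embodiment this logic runs inside the browser-side playback library rather than in Python.

```python
def select_playback_path(codec: str) -> dict:
    """Pick the playback pipeline for a decrypted stream based on its codec."""
    normalized = codec.upper().replace(".", "")
    if normalized == "H264":
        # Remux to fMP4 and let the browser decode natively via MSE.
        return {"container": "fMP4", "decoder": "browser MSE", "renderer": "video element"}
    if normalized == "H265":
        # Software-decode to YUV in WebAssembly, then draw with WebGL.
        return {"container": None, "decoder": "WebAssembly software decoder", "renderer": "WebGL (YUV)"}
    raise ValueError(f"unsupported codec: {codec}")
```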

The embodiments of the present application can achieve the following effects:

(1) Reduces the access pressure on the server in high-concurrency scenarios.

The decryption algorithm type and decryption parameters of the first decryption information are transmitted through the extended video frame structure, for example through an SEI extension of the I frame. The browser does not need to call a separate server interface to query them, which eliminates one round of service interaction. In addition, encrypting the video data does not place additional access pressure on the server.

(2) Reduces the delay in first-screen video playback.

The decryption algorithm type and decryption parameters of the first decryption information are transmitted through the extended video frame structure, for example through an SEI extension of the video I frame. The browser does not need to call a separate server interface to query them, which eliminates one round of service interaction and avoids the access time that such a query would consume.

At the same time, the video playback library (videoSdk) based on the WebAssembly method has the characteristics of fast execution speed and does not cause additional performance loss, so it will not affect the playback experience.

(3) High security.

In terms of data transmission, TLS can be used between the device and the server for communication and uploading of audio and video data, and HTTPS can be used between the browser and the server for communication and video data pulling, making network transmission safe and reliable.

In terms of data content, the first video data uploaded by the device to the server is encrypted video data. The second video data forwarded by the server to the browser is also encrypted video data. Without data decryption, the video data cannot be played even if it is captured.

In terms of video playback, the video playback library (videoSdk) is delivered in an underlying virtual machine bytecode format. For example, the decryption library videoPlayer.wasm restores the first decryption information and performs video decryption and playback. The key information in the first decryption information and the video content therefore cannot be obtained simply by modifying the web page source code.

(4) No plug-ins are required on the browser side, which helps reduce operation and maintenance costs.

To play videos in the browser, only the video playback library (videoSdk) needs to be integrated, and it can be published together with the web program. No other plug-ins need to be installed separately. When the library needs to be updated, republishing the web program is sufficient, so no additional effort is incurred for subsequent operation and maintenance.

(5) It does not break the standardization of the streaming media transmission protocol and has good scalability.

The first decryption information is transmitted by adding an SEI NAL unit to the I frame in the video frame structure, which does not damage the frame structure. The browser and the server can still use standard streaming protocols such as HTTP-FLV, HLS and RTMP, which gives good versatility and scalability.

It should be understood that the size of the serial numbers of the steps in the above embodiments does not mean the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.

FIG. 7 is a schematic diagram of a video playback device applied to a server provided in an embodiment of the present application. As shown in FIG. 7 , the device includes:

The signaling sending unit 701 is used to receive a video playback request from a browser and send a signaling for opening the video to a device for capturing the video.

The expansion unit 702 is used to receive the first video data uploaded by the device end, encrypt the audio frame in the first video data by using the predetermined first encryption information, expand the video frame structure in the first video data, and add the first decryption information to the expanded video frame structure, wherein the first decryption information corresponds to the first encryption information.

The playback unit 703 is used to encapsulate the extended video frame and the encrypted audio frame into second video data, and send the second video data to the browser side. The second video data is used by the browser side to decrypt and play the second video data according to the first decryption information in the extended video frame structure.

The video playing device shown in FIG. 7 corresponds to the video playing method shown in FIG. 3 .

In a possible implementation, the video playback device may also include a browser-based video playback device, including:

A request sending unit, used to send a video play request to the server, wherein the video play request includes device information of the video requested to be played;

A second video data receiving unit is used to receive second video data returned by the server, wherein the second video data includes video frames and audio frames in the first video data sent by the device, and the video frames are video frames with predetermined first decryption information added after structural expansion, and the audio frames are audio frames encrypted by first encryption information corresponding to the first decryption information;

A decryption unit is used to decrypt the encrypted audio frame according to the first decryption information in the second video data, and play the video according to the video frame and the decrypted audio frame.

The video playback device based on the browser side corresponds to the video playback device based on the device side.

FIG. 8 is a schematic diagram of an electronic device provided in an embodiment of the present application. The electronic device can be a browser side or a server side. The communication unit is used to send and receive data and signaling, including the sending and receiving of data and signaling between the browser side and the server side, or between the server side and the device side. As shown in FIG. 8, the electronic device 8 of this embodiment includes: a processor 80, a communication unit and a memory 81, and a computer program 82 stored in the memory 81 and executable on the processor 80, such as a video playback program. When the processor 80 executes the computer program 82, the steps in each of the above-mentioned video playback method embodiments are implemented. Alternatively, when the processor 80 executes the computer program 82, the functions of each module/unit in the above-mentioned device embodiments are implemented.

Exemplarily, the computer program 82 may be divided into one or more modules/units, which are stored in the memory 81 and executed by the processor 80 to complete the present application. The one or more modules/units may be a series of computer program instruction segments capable of completing specific functions, which are used to describe the execution process of the computer program 82 in the electronic device 8.

The electronic device may include, but is not limited to, a processor 80 and a memory 81. Those skilled in the art will appreciate that FIG. 8 is merely an example of the electronic device 8 and does not limit the electronic device 8. The electronic device may include more or fewer components than shown in the figure, may combine certain components, or may use different components. For example, the electronic device may also include an input/output device, a network access device, a bus, etc.

The processor 80 may be a central processing unit (CPU), other general-purpose processors, digital signal processors (DSP), application-specific integrated circuits (ASIC), field-programmable gate arrays (FPGA), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. The general-purpose processor may be a microprocessor or any conventional processor, etc.

The memory 81 may be an internal storage unit of the electronic device 8, such as a hard disk or memory of the electronic device 8. The memory 81 may also be an external storage device of the electronic device 8, such as a plug-in hard disk, a smart memory card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, a flash card (Flash Card), etc. equipped on the electronic device 8. Further, the memory 81 may also include both an internal storage unit of the electronic device 8 and an external storage device. The memory 81 is used to store the computer program and other programs and data required by the electronic device. The memory 81 may also be used to temporarily store data that has been output or is to be output.

Those skilled in the art can clearly understand that, for convenience and brevity of description, only the division of the above-mentioned functional units and modules is used as an example for illustration. In practical applications, the above-mentioned functions can be assigned to different functional units and modules as needed, that is, the internal structure of the device can be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiment can be integrated in one processing unit, or each unit can exist physically separately, or two or more units can be integrated in one unit. The above-mentioned integrated unit can be implemented in the form of hardware or in the form of software functional units. In addition, the specific names of the functional units and modules are only for the convenience of distinguishing them from each other, and are not used to limit the scope of protection of this application. For the specific working process of the units and modules in the above-mentioned system, reference can be made to the corresponding process in the aforementioned method embodiment, which will not be repeated here.

In the above embodiments, the description of each embodiment has its own emphasis. For parts that are not described or recorded in detail in a certain embodiment, reference can be made to the relevant descriptions of other embodiments.

Those of ordinary skill in the art will appreciate that the units and algorithm steps of each example described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Professional and technical personnel can use different methods to implement the described functions for each specific application, but such implementation should not be considered to be beyond the scope of this application.

In the embodiments provided in the present application, it should be understood that the disclosed devices/terminal equipment and methods can be implemented in other ways. For example, the device/terminal equipment embodiments described above are only schematic. The division of the modules or units is only a logical function division, and there may be other division methods in actual implementation; for example, multiple units or components can be combined or integrated into another system, or some features can be ignored or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed can be an indirect coupling or communication connection through some interfaces, devices or units, and can be electrical, mechanical or in other forms.

The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit. The above-mentioned integrated unit may be implemented in the form of hardware or in the form of software functional units.

If the integrated module/unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the above method embodiments of the present application may also be completed by a computer program instructing related hardware. The computer program can be stored in a computer-readable storage medium. When the computer program is executed by the processor, the steps of the above-mentioned method embodiments can be implemented. The computer program includes computer program code, and the computer program code can be in source code form, object code form, an executable file or some intermediate form. The computer-readable medium may include: any entity or device that can carry the computer program code, a recording medium, a USB flash drive, a mobile hard disk, a magnetic disk, an optical disk, computer memory, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), an electrical carrier signal, a telecommunication signal and a software distribution medium. It should be noted that the content contained in the computer-readable medium can be appropriately increased or decreased according to the requirements of legislation and patent practice in the jurisdiction. For example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunication signals.

The embodiments described above are only used to illustrate the technical solutions of the present application, rather than to limit them. Although the present application has been described in detail with reference to the aforementioned embodiments, a person skilled in the art should understand that the technical solutions described in the aforementioned embodiments may still be modified, or some of the technical features may be replaced by equivalents. Such modifications or replacements do not deviate the essence of the corresponding technical solutions from the spirit and scope of the technical solutions of the embodiments of the present application, and should all be included in the protection scope of the present application.

Claims

1. A video playback method, characterized in that the method is applied to a server, and the method comprises:

receiving a video playback request from a browser end, and sending signaling for opening the video to a device end used to capture the video;

receiving first video data uploaded by the device end, encrypting an audio frame in the first video data by using predetermined first encryption information, extending a video frame structure in the first video data, and adding first decryption information to the extended video frame structure, wherein the first decryption information corresponds to the first encryption information;

encapsulating the extended video frame and the encrypted audio frame into second video data, and sending the second video data to the browser end, wherein the second video data is used by the browser end to decrypt and play the second video data according to the first decryption information in the extended video frame structure.

2. The method according to claim 1, characterized in that before sending the signaling for opening the video to the device end used to capture the video, the method further comprises:

establishing a communication link with the device end, and receiving registration information of the device end through the communication link;

authenticating the device end according to the registration information, and determining the online status of the device end after the authentication is passed.

3. The method according to claim 1, characterized in that the first video data uploaded by the device end is encrypted video data, and before receiving the first video data uploaded by the device end, the method further comprises:

sending the predetermined first encryption information to the device end, wherein the first encryption information is used by the device end to encrypt third video data collected by the device end.

4. The method according to claim 1, characterized in that adding the first decryption information to the extended video frame structure comprises:

encrypting the first decryption information according to predetermined second encryption information;

adding the encrypted first decryption information to a network abstraction layer unit in the extended video frame structure.

5. A video playback method, characterized in that the method is applied to a browser end, and the method comprises:

sending a video playback request to a server, wherein the video playback request includes device information of the video requested to be played;

receiving second video data returned by the server, wherein the second video data includes video frames and audio frames in first video data sent by a device end, the video frames are video frames to which predetermined first decryption information is added after structural extension, and the audio frames are audio frames encrypted by first encryption information corresponding to the first decryption information;

decrypting the encrypted audio frame according to the first decryption information in the second video data, and playing the video according to the video frame and the decrypted audio frame.

6. The method according to claim 5, characterized in that decrypting the encrypted audio frame according to the first decryption information in the second video data comprises:

decapsulating the second video data through a decryption library compiled into an underlying virtual machine bytecode format in a WebAssembly encoding manner, to obtain the video frame with the extended video frame structure and the encrypted audio frame included in the second video data;

parsing the video frame with the extended video frame structure to obtain the first decryption information included in the video frame;

decrypting the encrypted audio frame according to the first decryption information.

7. The method according to claim 6, characterized in that after obtaining the first decryption information included in the video frame, the method further comprises:

detecting the encryption status of the video frame;

when the video frame is in an encrypted state, decrypting the video frame using the first decryption information.

8. The method according to claim 6, characterized in that parsing the video frame with the extended video frame structure to obtain the first decryption information included in the video frame comprises:

parsing the video frame with the extended video frame structure, and decrypting the extended information of the video frame by using predetermined second decryption information to obtain the first decryption information included in the video frame.

9. A server, comprising a memory, a processor, a communication unit, and a computer program stored in the memory and executable on the processor, wherein:

the communication unit is used to exchange data or instructions with a browser end or a device end;

the processor is used to execute the video playback method according to any one of claims 1 to 4.

10. A browser end, comprising a memory, a processor, a communication unit, and a computer program stored in the memory and executable on the processor, wherein:

the communication unit is used to exchange data or instructions with a server;

the processor is used to execute the video playback method according to any one of claims 5 to 8.

11. A video playback system, characterized in that the system includes a browser end, a server and a device end, wherein:

the server is used to execute the video playback method according to any one of claims 1 to 4;

the browser end is used to execute the video playback method according to any one of claims 5 to 8.

12. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 4 or any one of claims 5 to 8.