CN113038289A - Method for sending and receiving video data, terminal equipment and server - Google Patents


Info

Publication number
CN113038289A
Authority
CN
China
Prior art keywords
video data
data stream
information
multicast
layer video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110292268.7A
Other languages
Chinese (zh)
Inventor
杨振新
姜春苗
胡波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Samsung China Semiconductor Co Ltd
Samsung Electronics Co Ltd
Original Assignee
Samsung China Semiconductor Co Ltd
Samsung Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Samsung China Semiconductor Co Ltd, Samsung Electronics Co Ltd filed Critical Samsung China Semiconductor Co Ltd
Priority to CN202110292268.7A priority Critical patent/CN113038289A/en
Priority to US17/222,205 priority patent/US20220303620A1/en
Publication of CN113038289A publication Critical patent/CN113038289A/en
Priority to KR1020210093144A priority patent/KR20220130553A/en

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/2187 - Live feed
    • H04N21/234327 - Reformatting of video signals by decomposing into layers, e.g. base layer and one or more enhancement layers
    • H04N21/234363 - Reformatting of video signals by altering the spatial resolution, e.g. for clients with a lower screen resolution
    • H04N21/23605 - Creation or processing of packetized elementary streams [PES]
    • H04N21/23614 - Multiplexing of additional data and video streams
    • H04N21/23655 - Statistical multiplexing, e.g. by controlling the encoder to alter its bitrate to optimize the bandwidth utilization
    • H04N21/440227 - Reformatting of video signals at the client by decomposing into layers, e.g. base layer and one or more enhancement layers
    • H04N21/631 - Multimode transmission, e.g. transmitting basic layers and enhancement layers of the content over different transmission paths
    • H04N21/6405 - Multicasting
    • H04N21/84 - Generation or processing of descriptive data, e.g. content descriptors

Abstract

The server layers original video data into a plurality of video data streams, embeds extension information containing the characteristic information of the streams into designated video data streams, and transmits each stream over its corresponding channel. A multicast prediction model in the terminal device outputs a multicast access policy based on the characteristic information and on quality-of-experience information for the currently played video; the terminal then adjusts the currently joined multicast combination according to that policy, arriving at a better multicast combination under the current network transmission conditions and receiving the corresponding number and quality of video data streams. The methods executed by the server and the terminal device achieve network congestion control without increasing bandwidth consumption.

Description

Method for sending and receiving video data, terminal equipment and server
Technical Field
The present invention relates to the field of video transmission technology, and in particular to methods for transmitting and receiving video data, a terminal device, and a server.
Background
Multicast is a network technology that allows one or more senders (multicast sources) to deliver a single data packet to multiple receivers, and it is an effective means of saving network bandwidth and reducing network load. A multicast source (e.g., a server) sends a packet to a particular multicast group, and only receivers (e.g., terminal devices) that have joined that multicast group's address can receive it.
Existing multicast technologies include single-stream multicast, repeated multi-stream multicast, and layered video multicast. With single-stream multicast, every receiver (e.g., terminal device) obtains video of the same quality (e.g., the same resolution), so the choice of video quality is limited. With repeated multi-stream multicast, the same original video is transmitted on different channels as sources of different quality (e.g., different resolutions); the same video therefore occupies limited network bandwidth repeatedly, inflating transmission bandwidth and traffic, and the redundant processing also wastes computing resources. With layered video multicast, receivers (e.g., terminal devices) must periodically join and leave multicast groups to adapt to changing network conditions, which can overload multicast routing and receiver rate adaptation and destabilize the overall received video quality. Moreover, existing layered video multicast cannot cover enough application scenarios and responds slowly to network congestion (especially transient congestion inside the network).
Disclosure of Invention
To address the shortcomings of the prior art, the present invention provides methods for transmitting and receiving video data, a terminal device, and a server.
According to one aspect of the present invention, there is provided a method of transmitting video data, including: layering an original video into a plurality of video data streams; embedding extension information in at least one data packet of at least one of the plurality of video data streams, the extension information including preset characteristic information of the video data streams; and transmitting each of the plurality of video data streams over its corresponding channel.
As described above, the original video is layered into a plurality of video data streams that can be transmitted to multicast addresses over their corresponding channels, with the data of different streams independent of one another. In addition, embedding the extension information (characteristic information) makes it convenient for the receiving end to analyze it and predict a multicast access policy.
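The three transmitting steps described above (layer, embed, dispatch per channel) can be sketched as follows. This Python sketch is illustrative only: the LayerStream and ExtensionInfo structures, the multicast addresses, and the round-robin layering are assumptions for demonstration, not the format claimed by the patent.

```python
from dataclasses import dataclass, field

@dataclass
class ExtensionInfo:
    # Characteristic information for one video data stream (assumed fields).
    send_rate_kbps: int
    ratio_to_base: float  # data amount relative to the base layer

@dataclass
class LayerStream:
    name: str         # e.g. "base", "enh1"
    channel: tuple    # (multicast group, UDP port) this layer is sent on
    packets: list = field(default_factory=list)

def layer_video(raw_frames, n_enhancement=2):
    """Split raw frames into a base layer plus enhancement layers (toy round-robin split)."""
    layers = [LayerStream("base", ("239.0.0.1", 5000))]
    for i in range(1, n_enhancement + 1):
        layers.append(LayerStream(f"enh{i}", ("239.0.0.1", 5000 + i)))
    for idx, frame in enumerate(raw_frames):
        layers[idx % len(layers)].packets.append(frame)
    return layers

def embed_extension_info(layers):
    """Attach extension info for every stream to the first base-layer packet."""
    info = [ExtensionInfo(send_rate_kbps=1000 * (i + 1),
                          ratio_to_base=1.0 if i == 0 else 0.5)
            for i in range(len(layers))]
    if layers and layers[0].packets:
        layers[0].packets[0] = (info, layers[0].packets[0])
    return layers

# Layer six dummy frames and embed the extension info; each
# LayerStream.packets would then be sent to its LayerStream.channel.
layers = embed_extension_info(layer_video([b"f%d" % i for i in range(6)]))
```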
In one embodiment of the present invention, the step of layering the original video into a plurality of video data streams includes layering the original video into a base layer video data stream and one or more enhancement layer video data streams, and the embedding of extension information in at least one data packet of at least one of the plurality of video data streams includes embedding extension information in at least one data packet of at least the base layer video data stream.
As described above, the base layer video data stream can provide basic video quality and can be decoded independently, while the enhancement layer video data stream needs to be decoded together with the base layer video data stream to achieve enhancement of video quality. Based on this, the extended information is embedded into the base layer video data stream, so that the receiving end can analyze and acquire the characteristic information even when only the base layer video data stream is received.
In one embodiment of the present invention, the step of embedding the extension information in at least one packet of the base layer video data stream comprises: embedding extension information in at least one data packet of the base layer video data stream, wherein the extension information comprises characteristic information of the base layer video data stream and characteristic information of at least one enhancement layer video data stream.
As described above, the characteristic information of the enhancement layer video data stream is embedded in the base layer video data stream, so that the receiving end can acquire the characteristic information of the enhancement layer video data stream only by decoding the base layer video data stream, thereby enabling the acquisition of the characteristic information to be convenient and efficient.
In one embodiment of the present invention, the step of embedding the extension information in at least one packet of the base layer video data stream comprises: embedding extension information in at least one data packet of the base layer video data stream, wherein the extension information comprises characteristic information of the base layer video data stream and characteristic information of an enhancement layer video data stream adjacent to the base layer video data stream; and embedding extension information in at least one data packet of each enhancement layer video data stream, the extension information comprising: characteristic information of the enhancement layer video data stream itself, and characteristic information of a video data stream adjacent to the enhancement layer video data stream.
As described above, the extension information is embedded in both the base layer video data stream and each enhancement layer video data stream, so that the feature information can be obtained by decoding the base layer video data stream alone, or the enhancement layer video data stream and the base layer video data stream together to obtain the feature information, thereby providing a new choice for the receiving end to obtain the feature information.
In one embodiment of the present invention, the characteristic information of each video data stream includes at least one of the following types: the sending rate of the video data stream, the ratio of the amount of data of the video data stream to the amount of data of the base layer video data stream, and the ratio of the amount of data of the video data stream to the sum of the amounts of data of the remaining video data streams.
As described above, the characteristic information of the video data stream may be used as training data for training a multicast prediction model in advance; and the multicast prediction model can also be sent to a receiving end in the implementation stage, so that the receiving end predicts the multicast strategy based on the multicast prediction model.
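As a concrete illustration of the three characteristic-information types listed above, the following sketch computes them from per-layer byte counts over a measurement window; the input layout and field names are assumptions.

```python
def characteristic_info(layer_bytes, duration_s):
    """Per-stream characteristic info from byte counts in one window.

    layer_bytes: dict mapping stream name -> bytes sent; the key "base"
    must denote the base layer video data stream.
    """
    total = sum(layer_bytes.values())
    base = layer_bytes["base"]
    info = {}
    for name, nbytes in layer_bytes.items():
        rest = total - nbytes  # sum over all remaining video data streams
        info[name] = {
            "send_rate_bps": nbytes * 8 / duration_s,
            "ratio_to_base": nbytes / base,
            "ratio_to_rest": nbytes / rest if rest else float("inf"),
        }
    return info

info = characteristic_info({"base": 4000, "enh1": 2000, "enh2": 2000}, 1.0)
```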
In one embodiment of the invention, the extension information further includes at least one of the following: a first flag indicating that the data packet has extension information embedded in it, the number of video data streams whose characteristic information is contained in the extension information, the number of types of characteristic information in the extension information, and the embedding mode of the extension information.
As described above, the extension information is an extension of the original Layered Coding Transport (LCT) protocol, and it enables the receiving end to decode and obtain the characteristic information.
This method of transmitting video data can run on the server side, layering the original video into a base layer video data stream and a plurality of enhancement layer video data streams. The layered video streams, with characteristic information embedded, are sent to the corresponding multicast groups over different channels, providing the receiving end with usable prediction data so that it can conveniently derive a multicast access policy from the multicast prediction model.
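Since the extension information is described as an extension of the LCT (Layered Coding Transport, RFC 5651) header, it can be pictured as a small binary header extension. The byte layout below (a 1-byte flag, stream count, type count, and embedding mode, followed by one 32-bit sending rate per stream) is an assumption for illustration; the patent does not fix exact field widths.

```python
import struct

def pack_extension(rates_bps, mode=0):
    """Pack a flag byte, stream count, type count, mode, then one rate per stream."""
    header = struct.pack("!BBBB", 1, len(rates_bps), 1, mode)
    return header + struct.pack(f"!{len(rates_bps)}I", *rates_bps)

def unpack_extension(buf):
    flag, n_streams, n_types, mode = struct.unpack_from("!BBBB", buf, 0)
    rates = struct.unpack_from(f"!{n_streams}I", buf, 4)
    return {"flag": flag, "n_types": n_types, "mode": mode,
            "rates_bps": list(rates)}

blob = pack_extension([1_000_000, 500_000])
parsed = unpack_extension(blob)
```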
According to another aspect of the present invention, there is provided a method of receiving video data, including: receiving the video data streams corresponding to the currently joined multicast combination, wherein at least one data packet of at least one of the corresponding video data streams has extension information embedded in it, the extension information including preset characteristic information of the video data streams; extracting the characteristic information from the extension information; obtaining quality-of-experience information for the currently played video; obtaining a multicast access policy from a multicast prediction model based on the extracted characteristic information and the quality-of-experience information; and adjusting the currently joined multicast combination based on the multicast access policy.
As described above, this method of receiving video data may be executed at the receiving end (e.g., a terminal). It obtains characteristic information from the received video data streams and, combining it with quality-of-experience information for the video, obtains a multicast access policy from a multicast prediction model. Join/leave actions can therefore be guided more accurately, and joining and leaving multicast groups proceeds closer to optimally, greatly reducing unnecessary trial-and-error join/leave actions; this saves the terminal's computing resources and reduces the load that each join/leave places on upstream routing. Receiver-driven layered congestion control is thus achieved without increasing bandwidth consumption, improving the user experience. Furthermore, owing to the accuracy and robustness of the multicast prediction model, the approach avoids, at the level of its technical principle, the prior-art problem that trial-and-error decision logic covers branches too imprecisely for layered video multicast to cover more application scenarios.
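The receive-side decision loop described above can be sketched as follows. The threshold-based predict() function is a toy stand-in for the trained multicast prediction model (whose internals the text does not specify), and the feature and QoE field names are assumptions.

```python
def predict(features, qoe):
    """Toy stand-in for the trained multicast prediction model."""
    headroom = features["available_bps"] - features["next_layer_bps"]
    if qoe["jitter_ms"] > 100 or headroom < 0:
        return "leave"   # congested: drop the top enhancement layer
    if headroom > features["next_layer_bps"] * 0.2:
        return "join"    # spare capacity: join the next enhancement layer
    return "hold"        # keep the current multicast combination

def adjust(current_layers, action, max_layers=3):
    """Apply the predicted multicast access policy to the joined combination."""
    if action == "join" and len(current_layers) < max_layers:
        return current_layers + [f"enh{len(current_layers)}"]
    if action == "leave" and len(current_layers) > 1:
        return current_layers[:-1]
    return current_layers

new_layers = adjust(
    ["base", "enh1"],
    predict({"available_bps": 2_000_000, "next_layer_bps": 500_000},
            {"jitter_ms": 20}))
```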
In one embodiment of the present invention, the corresponding video data streams include a base layer video data stream; or the corresponding video data streams include a base layer video data stream and one or more enhancement layer video data streams; in either case, at least one data packet of the base layer video data stream has extension information embedded in it.
As described above, the base layer video data stream can provide basic video quality and can be decoded independently, while the enhancement layer video data stream needs to be decoded together with the base layer video data stream to achieve enhancement of video quality. Based on this, the extended information is embedded into the base layer video data stream, so that the receiving end can analyze and acquire the characteristic information even when only the base layer video data stream is received.
In one embodiment of the present invention, at least one packet of the base layer video data stream has extension information embedded therein, and the extension information includes characteristic information of the base layer video data stream and characteristic information of at least one enhancement layer video data stream.
As described above, the characteristic information of the enhancement layer video data stream is embedded in the base layer video data stream, so that the receiving end can acquire the characteristic information of the enhancement layer video data stream only by decoding the base layer video data stream, thereby enabling the acquisition of the characteristic information to be convenient and efficient.
In one embodiment of the present invention, at least one packet of the base layer video data stream has extension information embedded therein, and the extension information includes: characteristic information of the base layer video data stream and characteristic information of an enhancement layer video data stream adjacent to the base layer video data stream; and at least one data packet of each of the one or more enhancement layer video streams has embedded therein extension information, the extension information including: characteristic information of the enhancement layer video data stream itself, and characteristic information of a video data stream adjacent to the enhancement layer video data stream.
As described above, the extension information is embedded in both the base layer video data stream and each enhancement layer video data stream, so that the feature information can be obtained by decoding the base layer video data stream alone, or the enhancement layer video data stream and the base layer video data stream together to obtain the feature information, thereby providing a new choice for the receiving end to obtain the feature information.
In one embodiment of the present invention, the characteristic information extracted from the extension information includes at least one of the following types: the sending rate of the video data stream, the ratio of the amount of data of the video data stream to the amount of data of the base layer video data stream, and the ratio of the amount of data of the video data stream to the sum of the amounts of data of the remaining video data streams.
As described above, the characteristic information of the video data stream may be used for the receiving end to predict the multicast policy based on the multicast prediction model.
In one embodiment of the invention, the extension information further includes at least one of the following: a first flag indicating that the data packet has extension information embedded in it, the number of video data streams whose characteristic information is contained in the extension information, the number of types of characteristic information in the extension information, and the embedding mode of the extension information.
As described above, any of the extension information is an extension of the original LCT protocol, and the receiving end can decode the extension information to obtain the feature information.
In an embodiment of the present invention, the step of adjusting the multicast combination joined by the terminal device includes any one of the following: newly joining at least one multicast group outside the currently joined multicast combination; leaving at least one multicast group within the currently joined multicast combination; or keeping the currently joined multicast combination unchanged.
As described above, the adjustment of the current multicast combination at the receiving end can be realized by the above adjustment steps based on the multicast access policy predicted by the multicast prediction model.
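On IP networks, the join and leave actions above correspond to IGMP group-membership changes, which a receiver controls through standard socket options. The sketch below shows only this standard mechanism (it is not patent-specific); the group address is an example.

```python
import socket
import struct

def group_mreq(group, iface="0.0.0.0"):
    """Build the ip_mreq structure used by the membership socket options."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(iface))

def join_multicast(sock, group):
    """Join one multicast group on an already-bound UDP socket."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, group_mreq(group))

def leave_multicast(sock, group):
    """Leave a multicast group previously joined on this socket."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_DROP_MEMBERSHIP, group_mreq(group))

mreq = group_mreq("239.0.0.1")  # 8 bytes: group address + local interface
```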
In one embodiment of the invention, the quality of experience information includes at least one of the following types: jitter duration, average codec bit rate, and frame-rate deviation.
As described above, feedback based on these metrics yields quantifiable quality-of-experience information that helps the multicast prediction model make its prediction.
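As an illustration, the three QoE metric types named above can be derived from playback statistics roughly as follows; the input record layout (frame presentation times and per-segment bitrates) is an assumption.

```python
def qoe_metrics(frame_times_ms, bitrates_kbps, target_fps):
    """Derive jitter duration, average codec bit rate, and frame-rate deviation."""
    interval = 1000.0 / target_fps          # nominal inter-frame gap
    gaps = [b - a for a, b in zip(frame_times_ms, frame_times_ms[1:])]
    # Jitter duration: total time by which gaps exceed the nominal interval.
    jitter_ms = sum(max(0.0, g - interval) for g in gaps)
    avg_bitrate = sum(bitrates_kbps) / len(bitrates_kbps)
    achieved_fps = 1000.0 * len(gaps) / (frame_times_ms[-1] - frame_times_ms[0])
    return {"jitter_ms": jitter_ms,
            "avg_bitrate_kbps": avg_bitrate,
            "fps_deviation": target_fps - achieved_fps}

m = qoe_metrics([0, 40, 120, 160], [800, 800, 1200], 25)
```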
In one embodiment of the invention, the multicast prediction model is retrained, and thereby updated, based on the extracted characteristic information, the quality-of-experience information, and the multicast access policy.
As described above, once a sufficient amount of data (e.g., one cycle of data) has been received, a dynamic model-update strategy based on a feedback mechanism can make the multicast prediction model more accurate.
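A minimal sketch of such a feedback-driven update follows. The running-average adjustment of a jitter threshold is a placeholder for real retraining, which the patent does not detail; all class and field names are assumptions.

```python
class MulticastPredictor:
    """Toy predictor with a feedback-driven update of its congestion threshold."""

    def __init__(self, jitter_threshold_ms=100.0):
        self.jitter_threshold_ms = jitter_threshold_ms
        self.samples = []  # one cycle of (jitter_ms, action_was_leave) feedback

    def record(self, jitter_ms, action_was_leave):
        self.samples.append((jitter_ms, action_was_leave))

    def retrain(self):
        # Move the threshold halfway toward the mean jitter seen at leave actions.
        leaves = [j for j, left in self.samples if left]
        if leaves:
            mean_leave = sum(leaves) / len(leaves)
            self.jitter_threshold_ms = 0.5 * (self.jitter_threshold_ms + mean_leave)
        self.samples.clear()

p = MulticastPredictor()
p.record(150.0, True)
p.record(170.0, True)
p.record(20.0, False)
p.retrain()
```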
According to another aspect of the present invention, there is provided an apparatus for transmitting video data, including a video layering module, an information embedding module, and a video transmission module. The video layering module is configured to layer an original video into a plurality of video data streams. The information embedding module is configured to embed extension information in at least one data packet of at least one of the plurality of video data streams, the extension information including preset characteristic information of the video data streams. The video transmission module is configured to transmit each of the plurality of video data streams over its corresponding channel.
In one embodiment of the invention, the video layering module is configured to layer an original video into a base layer video data stream and one or more enhancement layer video data streams, and the information embedding module is configured to embed extension information in at least one data packet of at least the base layer video data stream.
In one embodiment of the invention, the information embedding module is configured to: embedding extension information in at least one data packet of the base layer video data stream, wherein the extension information comprises characteristic information of the base layer video data stream and characteristic information of at least one enhancement layer video data stream.
In one embodiment of the invention, the information embedding module is configured to: embedding extension information in at least one data packet of the base layer video data stream, wherein the extension information comprises characteristic information of the base layer video data stream and characteristic information of an enhancement layer video data stream adjacent to the base layer video data stream; and embedding extension information in at least one data packet of each enhancement layer video data stream, the extension information including: characteristic information of the enhancement layer video data stream itself, and characteristic information of a video data stream adjacent to the enhancement layer video data stream.
In one embodiment of the present invention, the characteristic information of each video data stream includes at least one of the following types: the sending rate of the video data stream, the ratio of the amount of data of the video data stream to the amount of data of the base layer video data stream, and the ratio of the amount of data of the video data stream to the sum of the amounts of data of the remaining video data streams.
In one embodiment of the invention, the extension information further includes at least one of the following: a first flag indicating that the data packet has extension information embedded in it, the number of video data streams whose characteristic information is contained in the extension information, the number of types of characteristic information in the extension information, and the embedding mode of the extension information.
According to another aspect of the present invention, there is provided a receiving apparatus of video data, including: a video receiving module configured to: receiving a video data stream corresponding to a currently accessed multicast combination, wherein at least one data packet of at least one video data stream in the corresponding video data stream is embedded with extension information, and the extension information comprises preset characteristic information of the video data stream; a first extraction module configured to: extracting characteristic information from the extension information; a second extraction module configured to: acquiring experience quality information of a video based on the currently played video; a policy output module configured to: acquiring a multicast access strategy by utilizing a multicast prediction model based on the extracted characteristic information and experience quality information; a multicast adjustment module configured to: and adjusting the currently accessed multicast combination based on the multicast access strategy.
In one embodiment of the invention, the corresponding video data streams comprise a base layer video data stream; or the corresponding video data streams comprise a base layer video data stream and one or more enhancement layer video data streams; wherein extension information is embedded in at least one data packet of the base layer video data stream.
In one embodiment of the present invention, at least one data packet of the base layer video data stream has embedded therein extension information including characteristic information of the base layer video data stream and characteristic information of at least one enhancement layer video data stream.
In one embodiment of the present invention, the characteristic information of the base layer video data stream and the characteristic information of the enhancement layer video data stream adjacent to the base layer video data stream are embedded in at least one data packet of the base layer video data stream; and at least one data packet of each of the one or more enhancement layer video data streams has embedded therein extension information including characteristic information of the enhancement layer video data stream itself and characteristic information of a video data stream adjacent to that enhancement layer video data stream.
In one embodiment of the present invention, the characteristic information extracted from the extension information includes at least one of the following types: the sending rate of the video data stream, the ratio of the data amount of the video data stream to the data amount of the base layer video data stream, and the ratio of the data amount of the video data stream to the sum of the data amounts of the remaining video data streams.
In one embodiment of the invention, the extension information further comprises at least one of: a first flag indicating that the data packet has the extension information embedded therein, the number of video data streams whose characteristic information is contained in the extension information, the number of types of characteristic information in the extension information, and the embedding mode of the extension information.
In one embodiment of the invention, the multicast adjustment module is configured to perform any one of the following: newly accessing at least one multicast outside the multicast combination currently accessed by the terminal device; exiting at least one multicast in the multicast combination currently accessed by the terminal device; and keeping the multicast combination currently accessed by the terminal device unchanged.
In one embodiment of the invention, the quality of experience information comprises at least one of the following types: jitter duration, average codec bit rate, frame rate deviation.
In one embodiment of the invention, the apparatus further comprises a model update module configured to retrain the multicast prediction model based on the extracted characteristic information, the quality-of-experience information, and the multicast access policy, so as to update the model.
According to another aspect of the present invention, there is provided a server comprising at least one processor and at least one memory storing instructions that, when executed by the at least one processor, cause the at least one processor to perform the method of transmitting video data described above.
According to another aspect of the present invention, there is provided a terminal device comprising at least one processor and at least one memory storing instructions, wherein the instructions, when executed by the at least one processor, cause the at least one processor to perform the above-mentioned method of receiving video data.
According to another aspect of the present invention, there is provided a computer-readable storage medium storing instructions that, when executed by at least one processor of a server, cause the at least one processor to perform the above-described method of transmitting video data.
According to another aspect of the present invention, there is provided a computer-readable storage medium storing instructions that, when executed by at least one processor, cause the at least one processor to perform the method of receiving video data described above.
Methods and systems for transmitting and receiving video data according to exemplary embodiments of the present invention provide a multicast prediction model. The server layers the original video data into a plurality of video data streams, embeds extension information containing characteristic information of the video data streams into designated video data streams, and transmits the plurality of video data streams to the corresponding multicasts respectively. The multicast prediction model in the terminal device can output a multicast access policy based on the characteristic information and the user-experience information of the currently played video; the terminal device then adjusts the currently accessed multicast combination based on that policy to obtain a better multicast combination under the current network transmission environment, so as to receive video data streams of corresponding quantity and quality. The methods executed by the server and the terminal device can thereby control network congestion without increasing bandwidth consumption.
Moreover, the multicast prediction model has good accuracy and robustness: it can accurately output the corresponding multicast access policy in different application environments (such as different network transmission environments, different video contents, or different user-experience information), and it reduces meaningless trial-and-error accesses to and exits from multicasts, thereby saving the computing resources of the terminal device and reducing the pressure that accessing or exiting a multicast places on upper-layer routing.
In addition, terminal devices can access different multicast combinations to obtain videos of different qualities, broadening the range of selectable video qualities.
Drawings
The above and other objects, features and advantages of the present invention will become more apparent from the following detailed description taken in conjunction with the accompanying drawings, in which:
fig. 1 is an application scene diagram of a method for transmitting video data and a method for receiving video data according to an exemplary embodiment of the present invention.
Fig. 2 illustrates a flowchart of a method of transmitting video data according to an exemplary embodiment of the present invention.
Fig. 3 is a diagram illustrating extended information in a data packet according to an exemplary embodiment of the present invention.
Fig. 4 shows a flowchart of a method of receiving video data according to an exemplary embodiment of the present invention.
Fig. 5 is a schematic diagram illustrating a method for a terminal device to perform multicast combining adjustment based on key frame data packet triggering according to an exemplary embodiment of the present invention.
Fig. 6 is a schematic diagram illustrating a correspondence relationship between an information group and tag information for training a multicast prediction model according to an exemplary embodiment of the present invention.
Fig. 7 is a block diagram illustrating a transmitting apparatus of video data according to an exemplary embodiment of the present invention.
Fig. 8 is a block diagram illustrating a receiving apparatus of video data according to an exemplary embodiment of the present invention.
Detailed Description
Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings, in which like reference numerals refer to like parts throughout.
Exemplary embodiments of the present invention provide a method of transmitting video data, which may be performed by a server, and a method of receiving video data, which may be performed by a terminal device. The methods can be applied to video services, such as the Evolved Multimedia Broadcast Multicast Service (eMBMS).
Fig. 1 is an application scene diagram of a method for transmitting video data and a method for receiving video data according to an exemplary embodiment of the present invention.
Referring to fig. 1, a server forms original video data into a plurality of video data streams in a layered manner, embeds extension information including video data stream characteristic information in a designated video data stream, and then respectively transmits the plurality of video data streams to corresponding multicasts. The multicast prediction model in the terminal equipment can output a multicast access strategy based on the characteristic information of the video stream and the user experience information of the currently played video, and then adjust the multicast combination accessed by the current terminal equipment based on the multicast access strategy to obtain a better multicast combination under the current network transmission environment so as to receive the video data streams with corresponding quantity and quality. The method executed by the server and the terminal equipment can realize the control of network congestion without increasing bandwidth consumption.
The multicast access policy can be understood as a scheme, made for the terminal device based on the characteristic information of the video streams and the user-experience information of the currently played video, for optimally selecting among the multicasts included in the multicast combination, where the selection includes accessing, exiting, or maintaining. The goal of this selection is to give the terminal device good video data stream receiving capability, thereby improving user experience. Specific details of the multicast access policy are described below in conjunction with fig. 5.
The multicast prediction model is a trained machine learning model. Here, the machine learning model may be obtained by training from any available initial model, where the initial model may include, but is not limited to, a supervised multi-classification model, a support vector machine, an artificial neural network model, or a random forest model. The model can run in a terminal device and can be trained on a training data set, where the training data may include characteristic information of video streams, user-experience information of videos, and the corresponding multicast access policies. A specific training process will be described below. In addition, the multicast prediction model has good accuracy and robustness: it can accurately output the corresponding multicast access policy in different application environments (such as different network transmission environments, different video contents, or different user-experience information), and it reduces meaningless trial-and-error accesses to and exits from multicasts, thereby saving the computing resources of the terminal device and reducing the pressure that accessing or exiting a multicast places on upper-layer routing.
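Since the text deliberately leaves the model architecture open (multi-class classifier, SVM, neural network, or random forest), the interface it describes — stream characteristics and quality-of-experience measurements in, one of three multicast actions out — can be sketched with a pure-Python stand-in. The function name, feature choice, and threshold values below are invented for illustration only; they are not the patent's trained model.

```python
# Illustrative stand-in for the trained multicast prediction model.
# A real implementation would load a trained classifier; the threshold
# rules here are invented solely to show the input/output interface.

JOIN, LEAVE, KEEP = "join_next_multicast", "leave_top_multicast", "keep_current"

def predict_policy(next_layer_rate, available_bandwidth, jitter_ms):
    """Map stream characteristics + QoE to one of the three actions.

    next_layer_rate     -- sending rate of the next (not yet joined) layer
    available_bandwidth -- bandwidth currently measured by the terminal
    jitter_ms           -- observed jitter duration (a QoE metric)
    """
    if jitter_ms > 200:                       # QoE already degraded: shed load
        return LEAVE
    if next_layer_rate < 0.8 * available_bandwidth and jitter_ms < 50:
        return JOIN                           # headroom for one more layer
    return KEEP
```

Retraining, as described for the model update module, would amount to appending each (characteristics, QoE, chosen action) triple to the training set and refitting.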
In addition, terminal devices can access different multicast combinations to obtain videos of different qualities, broadening the range of selectable video qualities.
The following describes specific steps of a method for transmitting video data according to an exemplary embodiment of the present invention.
Fig. 2 illustrates a flowchart of a method of transmitting video data according to an exemplary embodiment of the present invention.
Referring to fig. 2, the server layers an original video into a plurality of video data streams at step S110.
In step S110, the server may layer the original video into a plurality of video data streams based on a related video layering technique (e.g., a layered video multicast technique), and the number of video data streams formed after the original video is layered may be determined according to actual needs.
It should be noted here that the plurality of video data streams are independent of each other, and the sum of the bandwidths they occupy is the maximum rate obtainable by the terminal device downstream of the path. The terminal device may receive at least one of the plurality of video data streams, and when the number of received video data streams changes, the quality of the played video changes accordingly: for example, when the number of video data streams received by the terminal device increases, the resolution of the played video becomes higher; when that number decreases, the resolution becomes lower.
In one embodiment of the present invention, step S110 may include: the server layers the original video into a base layer video data stream and one or more enhancement layer video data streams.
The base layer video data stream can be decoded independently for providing basic video quality, the enhancement layer video data stream needs to be decoded together with the base layer video data stream, and the enhancement layer video data stream can provide higher video quality. It should be noted that the video data stream received by the terminal device comprises at least a base layer video data stream.
When the terminal equipment only receives the basic layer video data stream, the played video has basic quality; when the terminal device receives the base layer video data stream and the at least one enhancement layer video data stream, the played video can have higher quality, and the quality of the played video can be improved along with the increase of the number of the enhancement layer video data streams received by the terminal device.
Taking the resolution of the video as an example, in an embodiment of the present invention, as shown in fig. 1, the server layers the original video into a base layer video data stream 0, an enhancement layer video data stream 1, an enhancement layer video data stream 2, and an enhancement layer video data stream 3, and the resolution of the video can be divided into 360P, 480P, 720P, 1080P, and so on.
Base layer video data stream 0 alone may provide video quality with a resolution of 360P; base layer video data stream 0 and enhancement layer video data stream 1 together provide 480P; base layer video data stream 0, enhancement layer video data stream 1, and enhancement layer video data stream 2 together provide 720P; and base layer video data stream 0 together with enhancement layer video data streams 1, 2, and 3 provides video quality with a resolution of 1080P.
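The layer-to-resolution mapping just described can be sketched as follows. This is an illustrative reading of the four-layer example (the function name and resolution table are not from the patent): an enhancement layer contributes only if every lower layer is also received.

```python
# Illustrative sketch: resolution playable from a set of received layers
# in the four-layer example (layer 0 = base, layers 1-3 = enhancement).

RESOLUTION_BY_TOP_LAYER = {0: "360P", 1: "480P", 2: "720P", 3: "1080P"}

def playable_resolution(received_layers):
    """Return the playable resolution for a set of layer indices.

    Enhancement layer k can only be decoded together with layers 0..k-1,
    so only the contiguous run of layers starting at the base layer counts.
    """
    if 0 not in received_layers:
        return None  # base layer missing: nothing can be decoded
    top = 0
    while top + 1 in received_layers:  # climb contiguous enhancement layers
        top += 1
    return RESOLUTION_BY_TOP_LAYER[top]
```

For instance, receiving layers {0, 1, 3} yields only 480P, since enhancement layer 3 is unusable without layer 2.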
In step S120, the server embeds extension information in at least one data packet of at least one of the plurality of video data streams. Here, the extension information includes characteristic information of a predetermined video data stream.
It is understood that the video data streams are transmitted in the form of data packets, each video data stream may include a plurality of data packets, and the extension information may be embedded in any data packet of any video data stream. Optionally, the number of video data streams and the number of data packets used for embedding the extension information may be determined according to actual needs; for example, the extension information is embedded in one or more data packets of one specific video data stream, or in one or more data packets of each of several specific video data streams. For instance, the extension information may be embedded only in the 5th packet of a certain video stream, or in the 1st, 4th, 7th, and 10th packets of a certain video stream.
In one embodiment of the present invention, step S120 may include: embedding extension information at least in at least one data packet of said base layer video data stream.
As described above, the video data stream received by the terminal device at least includes the base layer video data stream, so that the extension information is embedded in the data packet of the base layer video data stream, and it is ensured that each terminal device can receive the extension information.
In one embodiment of the present invention, step S120 may include: the server embeds extension information in at least one data packet of a base layer video data stream, the extension information comprising characteristic information of the base layer video data stream and characteristic information of at least one enhancement layer video data stream.
The extension information embedded in the data packets of the base layer video data stream may include the characteristic information of the base layer video data stream and the characteristic information of some of the enhancement layer video data streams; or it may include the characteristic information of the base layer video data stream and the characteristic information of all of the enhancement layer video data streams.
For example, the extension information embedded in a packet of base layer video data stream 0 may include the characteristic information of base layer video data stream 0 and enhancement layer video data stream 1; alternatively, it may include the characteristic information of base layer video data stream 0 and enhancement layer video data streams 1 to 3.
In one embodiment of the present invention, step S120 may include: the extension information embedded in at least one packet of the base layer video data stream includes: characteristic information of the base layer video data stream and characteristic information of the enhancement layer video data stream adjacent to the base layer video data stream; and the extension information embedded in at least one packet of each enhancement layer video data stream includes: characteristic information of the enhancement layer video data stream itself, and characteristic information of a video data stream adjacent to the enhancement layer video data stream. It should be explained that the "adjacent" refers to adjacent in logical order of layering the original video. For example, a piece of original video is layered into 4 layers, specifically, including a base layer 0, an enhancement layer 1, an enhancement layer 2, and an enhancement layer 3; where the enhancement layer adjacent to base layer 0 may be enhancement layer 1 and the enhancement layers adjacent to enhancement layer 2 may be enhancement layers 1 and 3.
As an example, base layer video data stream 0, enhancement layer video data stream 1, enhancement layer video data stream 2, and enhancement layer video data stream 3 are sequentially adjacent. The extension information embedded in the data packet of the base layer video data stream 0 may include the characteristic information of the base layer video data stream 0 and the characteristic information of the enhancement layer video data stream 1; the extension information embedded in the data packet of the enhancement layer video data stream 1 may include the characteristic information of the enhancement layer video data stream 1, and the characteristic information of the base layer video data stream 0 and/or the characteristic information of the enhancement layer video data stream 2; the extension information embedded in the data packet of the enhancement layer video data stream 2 may include the characteristic information of the enhancement layer video data stream 2, the characteristic information of the enhancement layer video data stream 1 and/or the enhancement layer video data stream 3, and so on.
In one embodiment of the present invention, the characteristic information of each video data stream includes at least one of the following types: the sending rate of the video data stream, the ratio of the data amount of the video data stream to the data amount of the base layer video data stream, and the ratio of the data amount of the video data stream to the sum of the data amounts of the remaining video data streams.
Referring to fig. 1, taking base layer video data stream 0 as an example, its characteristic information may include the transmission rate of base layer video data stream 0, the ratio of its data amount to the data amount of the base layer video data stream (which is understood to be 1), and the ratio of its data amount to the sum of the data amounts of the remaining video data streams (i.e., enhancement layer video data stream 1 to enhancement layer video data stream 3).
Taking enhancement layer video data stream 1 as an example, its characteristic information may include the transmission rate of enhancement layer video data stream 1, the ratio of its data amount to the data amount of base layer video data stream 0, and the ratio of its data amount to the sum of the data amounts of the remaining video data streams (i.e., base layer video data stream 0, enhancement layer video data stream 2, and enhancement layer video data stream 3).
It is understood that the property information of one video data stream may include one or more of the above 3 types of property information in the packet in which the extension information is embedded.
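The three characteristic types can be computed directly from per-stream sending rates and data amounts. The following is a minimal sketch (function and field names are illustrative, not the patent's wire format), assuming streams are indexed with the base layer at index 0:

```python
def characteristic_info(rates, amounts, i):
    """Compute the three characteristic types for stream i.

    rates[i]   -- sending rate of stream i (e.g. bits per second)
    amounts[i] -- data amount of stream i (e.g. bytes)
    Stream 0 is the base layer.
    """
    others = sum(amounts) - amounts[i]  # data amount of all remaining streams
    return {
        "send_rate": rates[i],
        "ratio_to_base": amounts[i] / amounts[0],
        "ratio_to_rest": amounts[i] / others if others else None,
    }
```

For the base layer itself, `ratio_to_base` is 1, matching the remark above.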
In one embodiment of the invention, the extension information further comprises at least one of: a first flag indicating that the data packet has the extension information embedded therein, the number of video data streams whose characteristic information is contained in the extension information, the number of types of characteristic information in the extension information, and the embedding mode of the extension information.
It should be noted that, in the embodiment of the present invention, the extension information may be embedded in the header of the data packet, and the first flag of the extension information is a preset value. When the header of a data packet carries the first flag, it indicates that the extension information is embedded in that data packet.
In the embodiment of the present invention, the extension information in one data packet may contain at least one characteristic information of the video data stream. As an example, when the extension information in one packet contains the characteristic information of 1 video data stream, the number of video data streams corresponding to the characteristic information is 1; when the extension information in one data packet contains the characteristic information of 2 video data streams, the number of the video data streams corresponding to the characteristic information is 2; by analogy, when the extension information in one data packet includes the characteristic information of n video data streams, the number of the video data streams corresponding to the characteristic information is n, and n is a positive integer.
In the extension information in one packet, the types and the number of characteristic information are the same for each video data stream. As described above, the types of characteristic information of a video data stream include: the sending rate of the video data stream, the ratio of the data amount of the video data stream to the data amount of the base layer video data stream, and the ratio of the data amount of the video data stream to the sum of the data amounts of the remaining video data streams. Therefore, in the embodiment of the present invention, the number of types of characteristic information in the extension information may be 1, 2, or 3.
Fig. 3 is a diagram illustrating extended information in a data packet according to an exemplary embodiment of the present invention.
The following takes fig. 3 as an example to describe the extended information.
Referring to fig. 3, a Flag value of 1 serves as the first flag; when the Flag value is 0, the other fields in fig. 3 are all empty, indicating that no extension information is embedded in the packet.
The value of Type indicates the embedding mode of the extension information. For example, a Type value of 1 may represent the first embedding mode, in which the extension information is embedded both in the data packets of the base layer video data stream and in the data packets of each enhancement layer video data stream; a Type value of 2 may represent the second embedding mode, in which the extension information is embedded only in the data packets of the base layer video data stream.
The value of Layer Count indicates the number of video data streams corresponding to the characteristic information included in the extension information.
The value of Feature Count indicates the number of types of property information in the extension information.
Li is used to distinguish different video data streams corresponding to the characteristic information contained in the extension information. As shown in fig. 3, if the Layer Count has a value of 3, the number of video data streams corresponding to the property information included in the extension information is 3, and L1, L2, and L3 represent 3 video data streams, respectively.
The value of Li indicates the relationship between the video data stream represented by Li and the video data stream in which the extension information is embedded. When Li is 0, the video data stream represented by Li is the video data stream in which the extension information is embedded; when Li is -1, it is the previous video data stream; when Li is 1, it is the next video data stream.
It can be understood that when Li is -n, the video data stream represented by Li is the video data stream n positions before the one in which the extension information is embedded; when Li is n, it is the video data stream n positions after, where n is a positive integer.
Referring to fig. 3, L2 has a value of 0, indicating that the enhancement layer video data stream 2 is the video data stream where the extension information is located; l1 has a value of-1 indicating that enhancement layer video stream 1 is the previous video stream of enhancement layer video stream 2; l3 has a value of 1 indicating that enhancement layer video data stream 3 is the next video data stream of enhancement layer video data stream 2.
The value of Lenij indicates the byte length of the j-th characteristic information of the video data stream represented by Li, where i and j are both positive integers.
Referring to fig. 3, the value of Len11 is, for example, the byte length of the transmission-rate information of enhancement layer video data stream 1.
The value of Fij indicates the value of the j-th characteristic information of the video data stream represented by Li, where i and j are both positive integers.
Referring to fig. 3, for example, the value of F11 is the value of the transmission rate of the enhancement layer video data stream 1.
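The fig. 3 fields can be serialized and parsed as a small header. The patent does not fix byte widths or encodings, so the layout below is an assumption for illustration: one byte each for Flag, Type, Layer Count and Feature Count; a signed byte for each Li; one byte for each Len; and big-endian 32-bit floats for each F value.

```python
import struct

def pack_extension(ext_type, layers):
    """layers: list of (li, [feature_value, ...]), one feature list per stream.

    Writes Flag=1, Type, Layer Count, Feature Count, then for each stream
    its Li followed by (Len, F) pairs.  Byte widths are assumed, not from
    the patent text.
    """
    feature_count = len(layers[0][1])
    out = struct.pack("!BBBB", 1, ext_type, len(layers), feature_count)
    for li, features in layers:
        out += struct.pack("!b", li)
        for value in features:
            payload = struct.pack("!f", value)          # Fij as float32
            out += struct.pack("!B", len(payload)) + payload  # Lenij, Fij
    return out

def parse_extension(data):
    flag, ext_type, layer_count, feature_count = struct.unpack_from("!BBBB", data, 0)
    if flag != 1:
        return None  # Flag 0: no extension information embedded
    offset, layers = 4, []
    for _ in range(layer_count):
        (li,) = struct.unpack_from("!b", data, offset)
        offset += 1
        features = []
        for _ in range(feature_count):
            (length,) = struct.unpack_from("!B", data, offset)
            offset += 1
            (value,) = struct.unpack_from("!f", data, offset)
            offset += length
            features.append(value)
        layers.append((li, features))
    return ext_type, layers
```

With Layer Count 3 and Feature Count 1 as in fig. 3, a packet for enhancement layer stream 2 would carry Li values -1, 0, and 1 for streams 1, 2, and 3 respectively.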
In step S130, the server transmits a plurality of video data streams to corresponding channels respectively for transmission.
It can be understood that the video data streams correspond to the multicast one-to-one, and the server transmits each video data stream to the corresponding multicast through the corresponding channel according to a preset communication protocol, where the preset communication protocol may include a FLUTE protocol, an LCT protocol, and the like.
Taking fig. 1 as an example, the server transmits the base layer video data stream 0 to the multicast 0 through the channel 0, the server transmits the enhancement layer video data stream 1 to the multicast 1 through the channel 1, the server transmits the enhancement layer video data stream 2 to the multicast 2 through the channel 2, and the server transmits the enhancement layer video data stream 3 to the multicast 3 through the channel 3.
The terminal device may access a corresponding multicast to receive a corresponding video data stream, for example, the terminal device may receive the base layer video data stream 0 when accessing the multicast 0, and may receive the base layer video data stream 0 and the enhancement layer video data stream 1 when accessing the multicast 0 and the multicast 1.
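Adjusting which streams are received thus reduces to changing multicast memberships. A minimal sketch, assuming multicasts are identified by integer indices as in fig. 1 (a real terminal device would additionally issue the corresponding IGMP join/leave operations for each computed change):

```python
def adjust_multicasts(current, target):
    """Derive the join/leave operations that move the terminal device from
    the currently accessed multicast combination to the target combination.
    """
    joins = sorted(target - current)   # multicasts to newly access
    leaves = sorted(current - target)  # multicasts to exit
    return joins, leaves
```

For example, moving from {multicast 0, multicast 1} to {multicast 0, multicast 1, multicast 2} requires a single join of multicast 2 and no leaves.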
As described above, the server layers the original video into a plurality of video data streams and transmits the data to the multicast addresses layer by layer through the corresponding channels, with the data of different video data streams independent of each other. In addition, the embedded extension information (characteristic information) makes it convenient for the receiving end to analyze and predict the multicast access policy.
The following describes specific steps of a method for receiving video data according to an exemplary embodiment of the present invention.
Fig. 4 shows a flowchart of a method of receiving video data according to an exemplary embodiment of the present invention.
Referring to fig. 4, in step S210, the terminal device receives a video data stream corresponding to a currently accessed multicast combination.
Here, at least one data packet of at least one of the corresponding video data streams is embedded with extension information, and the extension information includes preset characteristic information of the video data stream.
It should be noted here that one multicast combination may include at least one multicast. As mentioned above, the terminal device may receive at least one of the plurality of video data streams, and when the number of received video data streams changes, the quality of the played video changes accordingly. Therefore, by accessing different multicast combinations, the terminal device can receive different video data streams and thus obtain different video qualities.
Optionally, in step S210, the multicast combination currently accessed by the terminal device may be a default multicast combination, or may be a multicast combination determined based on the selection setting of the video quality by the user. In step S210, the multicast combinations accessed by different terminal devices may be the same or different.
Taking fig. 1 as an example, the multicast combination currently accessed by the terminal device 1 includes multicast 0, multicast 1 and multicast 2, and can receive the base layer video data stream 0, the enhancement layer video data stream 1 and the enhancement layer video data stream 2; the multicast combination currently accessed by the terminal device 2 comprises multicast 0 and multicast 1, and can receive the video data stream 0 of the base layer and the video data stream 1 of the enhancement layer; the multicast combination currently accessed by the terminal device 3 includes multicast 0, and can receive the base layer video data stream 0.
Optionally, the corresponding video data streams comprise a base layer video data stream; or the corresponding video data streams comprise a base layer video data stream and one or more enhancement layer video data streams, wherein extension information is embedded in at least one data packet of the base layer video data stream. In step S210, the multicast combination accessed by the terminal device at least includes the multicast corresponding to the base layer video data stream, so as to ensure that every terminal device can receive the extension information.
In step S220, the terminal device extracts the characteristic information from the extension information.
It can be understood that the extension information is embedded in data packets of the video data stream, which is transmitted in the form of data packets. When the terminal device, receiving the video data streams corresponding to its currently accessed multicast combination, receives a data packet embedded with extension information, it extracts the characteristic information from the extension information of that data packet.
As described above, in the embodiment of the present invention, the extension information may be embedded in two ways.
The first embedding method: extension information is embedded in data packets of both the base layer video data stream and the at least one enhancement layer video data stream. That is, at least one data packet of the base layer video data stream is embedded with extension information including the characteristic information of the base layer video data stream and the characteristic information of the enhancement layer video data stream adjacent to it;
and at least one data packet of each of the one or more enhancement layer video data streams is embedded with extension information including: the characteristic information of that enhancement layer video data stream itself, and the characteristic information of the video data stream adjacent to that enhancement layer video data stream.
The second embedding method: only the data packets of the base layer video data stream have extension information embedded therein. That is, embedding extension information in at least one data packet of at least the base layer video data stream includes: embedding, in at least one data packet of the base layer video data stream, extension information that includes the characteristic information of the base layer video data stream and the characteristic information of the at least one enhancement layer video data stream.
For a data packet embedded with extension information obtained by any one of the embedding methods, when receiving the data packet, the terminal device may extract the characteristic information from the extension information of the data packet.
Optionally, the characteristic information extracted from the extension information by the terminal device includes at least one of the following types: the sending rate of the video data stream, the ratio of the amount of data of the video data stream to the amount of data of the base layer video data stream, and the ratio of the amount of data of the video data stream to the sum of the amounts of data of the remaining video data streams.
In step S230, the terminal device obtains the experience quality information of the video based on the currently played video.
Quality of experience information generally refers to Quality of Experience (QoE), i.e., the user's overall subjective perception of the quality and performance (including availability and usability) of a device, network, system, application, or service. In the embodiment of the present invention, the quality of experience information is information that can be extracted and quantized from the currently played video.
The quality of experience information includes at least one of the following types: jitter duration, average codec bit rate, frame rate deviation.
Each type of quality of experience information is explained below.
Jitter occurs when the absolute difference between the actual play time and the expected play time is greater than a predefined value (for example, 100 milliseconds), and the total time during which jitter occurs is the jitter duration. The average codec bit rate is the ratio of the size of a segment of a video file to the time taken to play that segment. The frame rate deviation represents the time difference between the actual play time of a frame in the video and the expected play time of that frame.
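The three metric definitions above can be sketched as follows; the function names, the millisecond units, and the fixed per-frame duration are assumptions made for illustration:

```python
# Illustrative computation of the three quality-of-experience metrics described
# above. The 100 ms threshold follows the description; everything else is assumed.

JITTER_THRESHOLD_MS = 100  # predefined value from the description

def jitter_duration_ms(actual_ms, expected_ms, frame_ms):
    """Total time during which |actual - expected| play time exceeds the threshold."""
    return sum(frame_ms for a, e in zip(actual_ms, expected_ms)
               if abs(a - e) > JITTER_THRESHOLD_MS)

def average_codec_bitrate(segment_bytes, play_seconds):
    """Size of a video segment divided by the time taken to play it (bits/s)."""
    return segment_bytes * 8 / play_seconds

def frame_rate_deviation_ms(actual_ms, expected_ms):
    """Per-frame difference between actual and expected play times."""
    return [a - e for a, e in zip(actual_ms, expected_ms)]

# Frames 2 and 3 play 117 ms and 254 ms late -> two jittery 33 ms frames.
assert jitter_duration_ms([0, 150, 320], [0, 33, 66], frame_ms=33) == 66
assert average_codec_bitrate(1_000_000, 10) == 800_000.0
```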
It should be noted that the quality of experience information extracted in step S230 includes at least one of the above three types of quality of experience information.
In step S240, the terminal device obtains a multicast access policy by using a multicast prediction model based on the extracted characteristic information and the quality of experience information.
It will be appreciated that the multicast prediction model is a trained machine learning model that can run in the terminal device. In step S240, the extracted characteristic information and quality of experience information are used as the model's input, and the multicast prediction model outputs a multicast access policy, which indicates how the multicast combination should be adjusted.
It should be noted that the multicast prediction model may be preset in the terminal device, or may be downloaded by the terminal device from a designated device. For example, the server shown in fig. 1 stores a multicast prediction model, and when a terminal device connects to the server for the first time, it downloads the multicast prediction model.
In step S250, the terminal device adjusts the currently accessed multicast combination based on the multicast access policy.
In an embodiment of the present invention, step S250 may be any one of the following steps:
step (a 1): newly accessing at least one multicast other than those in the multicast combination currently accessed by the terminal device;
for example, the multicast combination currently accessed by the terminal device 1 includes multicast 0, multicast 1, and multicast 2, and the terminal device 1 may newly access multicast 3 based on the currently accessed multicast combination.
Step (a 2): quitting at least one multicast in the multicast combination currently accessed by the terminal device;
for example, the multicast combination currently accessed by the terminal device 1 includes multicast 0, multicast 1 and multicast 2, and the terminal device 1 may exit multicast 2 based on the currently accessed multicast combination.
Step (a 3): maintaining the multicast combination currently accessed by the terminal device unchanged.
For example, the multicast combination currently accessed by the terminal device 1 includes multicast 0, multicast 1, and multicast 2, and the terminal device 1 maintains the multicast combination currently accessed.
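The three adjustment actions (a1)-(a3) can be sketched on a multicast combination represented as a set of multicast ids; the policy tuple format is an invented convention for illustration:

```python
# Minimal sketch of the three adjustment actions (a1)-(a3) applied to a
# currently accessed multicast combination, represented as a set of multicast ids.

def adjust(combination, policy):
    """Apply a multicast access policy: ('join', ids), ('leave', ids) or ('keep', None)."""
    action, ids = policy
    if action == "join":
        return combination | set(ids)      # step (a1): newly access multicasts
    if action == "leave":
        return combination - set(ids)      # step (a2): quit multicasts
    return set(combination)                # step (a3): keep unchanged

current = {0, 1, 2}
assert adjust(current, ("join", [3])) == {0, 1, 2, 3}
assert adjust(current, ("leave", [2])) == {0, 1}
assert adjust(current, ("keep", None)) == {0, 1, 2}
```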
In the embodiment of the present invention, a data packet embedded with extension information may be referred to as a key frame data packet. It can be understood that the terminal device may perform steps S220 to S250 once each time it receives a key frame data packet; that is, the terminal device adjusts its currently accessed multicast combination once per received key frame data packet.
As described above, the terminal device obtains the characteristic information from the received video data stream and, combining it with the quality of experience information of the video, obtains a multicast access policy using the multicast prediction model. This guides the execution of join/leave actions more accurately, optimizes joining and leaving multicasts, and greatly reduces meaningless join/leave trial-and-error actions, which both saves the terminal's computing resources and reduces the pressure that each join/leave places on upstream routing. Receiver-driven hierarchical congestion control is thus achieved without increasing bandwidth consumption, and user experience is improved. Moreover, owing to the accuracy and robustness of the multicast prediction model, the prior-art problem that trial-and-error action logic judges branch coverage too inaccurately for the layered video multicast technique to cover more application scenarios is solved at the level of the technical implementation principle.
Fig. 5 is a schematic diagram illustrating a method for a terminal device to perform multicast combining adjustment based on key frame data packet triggering according to an exemplary embodiment of the present invention.
Referring to fig. 5, the server sequentially transmits a key frame data packet 1, a key frame data packet 2, and a key frame data packet 3 at different time points.
The terminal device 1 initially accesses multicast 0 and multicast 1. When it receives the key frame data packet 1, it newly accesses multicast 2, multicast 3 and multicast 4; when it receives the key frame data packet 2, it maintains the currently accessed multicast combination unchanged; when it receives the key frame data packet 3, it again maintains the currently accessed multicast combination unchanged.
The terminal device 2 initially accesses multicast 0 and multicast 1. When it receives the key frame data packet 1, it newly accesses multicast 2 and multicast 3; when it receives the key frame data packet 2, it quits multicast 3; when it receives the key frame data packet 3, it maintains the currently accessed multicast combination unchanged.
The terminal device 3 initially accesses multicast 0 and multicast 1. When it receives the key frame data packet 1, it newly accesses multicast 2, multicast 3 and multicast 4; when it receives the key frame data packet 2, it quits multicast 3 and multicast 4; when it receives the key frame data packet 3, it newly accesses multicast 3.
In one embodiment of the invention, the multicast prediction model may be retrained, and thereby updated, based on the extracted characteristic information, the quality of experience information, and the multicast access policy.
The following describes the training process of the multicast prediction model:
Step (b 1): obtaining multiple characteristic information combinations based on a plurality of original videos with different content sizes, setting multiple quality of experience information combinations, and pairing each characteristic information combination with each quality of experience information combination to form a plurality of information groups.
Fig. 6 is a schematic diagram illustrating a correspondence relationship between an information group and tag information for training a multicast prediction model according to an exemplary embodiment of the present invention.
As shown in fig. 6, each information group includes at least one characteristic information and at least one quality of experience information. Each line in fig. 6 represents an information group, and the end of each line is Label information (Label) of the information group.
As shown in fig. 6, the types of characteristic information include the sending rate of the video data stream (denoted Send bit rate in fig. 6), the ratio of the data amount of the video data stream to the data amount of the base layer video data stream (denoted Relative probability in fig. 6), and the ratio of the data amount of the video data stream to the sum of the data amounts of the remaining video data streams (denoted Absolute probability in fig. 6). The types of quality of experience information include jitter duration (denoted Jitter duration in fig. 6), average codec bit rate (denoted Codec bit rate in fig. 6), and frame rate deviation (not shown in the figure).
Step (b 2): labeling each information group with label information.
The label information indicates the multicast access policy corresponding to the information group. As shown in fig. 6, the label information may be, for example: "Join two consecutive video layer streams", "Join one layer stream", "remaining unchanged", "Exit one layer stream", "Exit two consecutive video layer streams".
Step (b 3): taking each information group and its label information as training data to train the initial multicast prediction model.
The multicast prediction model may be a supervised-learning multi-classification model, and may be trained using machine learning algorithms such as support vector machines, artificial neural networks or random forests.
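As a dependency-free stand-in for the supervised multi-class setup described above, the sketch below memorizes labelled information groups and classifies by nearest neighbour; a production model would be an SVM, artificial neural network, or random forest, and all feature values here are invented:

```python
# Toy stand-in for the multicast prediction model: each training sample is an
# information group (characteristic info + QoE info) paired with a label
# (a multicast access policy). A 1-nearest-neighbour classifier keeps the
# sketch self-contained; it is NOT the patent's actual model.

def train(samples):
    """'Training' for 1-NN is just memorising the labelled information groups."""
    return list(samples)

def predict(model, features):
    dist = lambda s: sum((a - b) ** 2 for a, b in zip(s[0], features))
    return min(model, key=dist)[1]

# (send_rate, relative_ratio, jitter_ms) -> policy label; values invented
groups = [
    ((5.0, 1.0,   0.0), "Join one layer stream"),
    ((1.0, 0.2, 400.0), "Exit one layer stream"),
    ((3.0, 0.5,  50.0), "remaining unchanged"),
]
model = train(groups)
assert predict(model, (4.8, 0.9, 10.0)) == "Join one layer stream"
```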
A procedure for transmitting and receiving video data will be described below by taking the terminal device 1 in fig. 1 as an example.
Step (d 1): the server layers the original video into a base layer video data stream 0, an enhancement layer video data stream 1, an enhancement layer video data stream 2, and an enhancement layer video data stream 3.
Step (d 2): the server embeds extension information in at least one data packet of each of the base layer video data stream 0, the enhancement layer video data stream 1, the enhancement layer video data stream 2 and the enhancement layer video data stream 3.
The extension information embedded in the data packets of the base layer video data stream 0 includes the characteristic information of the base layer video data stream 0 and of the enhancement layer video data stream 1; the extension information embedded in the data packets of the enhancement layer video data stream 1 includes the characteristic information of the enhancement layer video data stream 1 and of the enhancement layer video data stream 2; the extension information embedded in the data packets of the enhancement layer video data stream 2 includes the characteristic information of the enhancement layer video data stream 2 and of the enhancement layer video data streams 1 and 3; the extension information embedded in the data packets of the enhancement layer video data stream 3 includes the characteristic information of the enhancement layer video data stream 3 and of the enhancement layer video data stream 2.
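One reading of the adjacency rule above (each stream's packets embed its own characteristic information plus that of its neighbouring streams) can be sketched as follows; note that in the worked example the enhancement layer video data stream 1 carries only the next layer's information, so the symmetric rule below is an assumption, not the only possible layout:

```python
# Sketch of an adjacency-based extension-information layout: stream 0 is the
# base layer, streams 1..n-1 are enhancement layers, and each stream embeds
# its own characteristic info plus that of both neighbours where they exist.

def extension_info_layout(num_streams):
    """Return, per stream index, the stream indices whose characteristic
    information is embedded in that stream's key frame data packets."""
    layout = {}
    for i in range(num_streams):
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < num_streams]
        layout[i] = sorted([i] + neighbours)
    return layout

# Four streams as in the example: base layer 0 and enhancement layers 1-3.
assert extension_info_layout(4) == {0: [0, 1], 1: [0, 1, 2], 2: [1, 2, 3], 3: [2, 3]}
```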
Step (d 3): the server transmits the base layer video data stream 0 to the multicast 0 through the channel 0, transmits the enhancement layer video data stream 1 to the multicast 1 through the channel 1, transmits the enhancement layer video data stream 2 to the multicast 2 through the channel 2, and transmits the enhancement layer video data stream 3 to the multicast 3 through the channel 3.
Step (d 4): the terminal device 1 receives a video data stream corresponding to the currently accessed multicast combination.
For example, the multicast combination currently accessed by the terminal device 1 includes multicast 0, multicast 1 and multicast 2, so it can receive the base layer video data stream 0, the enhancement layer video data stream 1 and the enhancement layer video data stream 2.
Step (d 5): the terminal device 1 receives the data packet embedded with the extension information in the enhancement layer video data stream 2, and the extension information embedded in the data packet may include the characteristic information of the enhancement layer video data stream 2 and the characteristic information of the enhancement layer video data stream 1 and/or the enhancement layer video data stream 3.
Step (d 6): the terminal device 1 extracts quality of experience information based on the currently played video.
Step (d 7): the terminal device 1 obtains a multicast access policy by using a multicast prediction model based on the extracted characteristic information and experience quality information.
Step (d 8): the terminal device 1 adjusts the currently accessed multicast combination based on the multicast access policy.
Specifically, the currently accessed multicast combination of the terminal device 1 includes multicast 0, multicast 1, and multicast 2, and the terminal device 1 newly accesses multicast 3 based on the currently accessed multicast combination. The adjusted multicast combinations include multicast 0, multicast 1, multicast 2, and multicast 3.
In one embodiment of the invention, the multicast prediction model may be retrained, and thereby updated, based on the extracted characteristic information, the quality of experience information, and the multicast access policy.
It can be understood that the characteristic information extracted in step S220, the quality of experience information extracted in step S230, and the multicast access policy obtained in step S240 may be used as new training data to retrain the current multicast prediction model, further improving its accuracy.
Optionally, the multicast prediction model may be retrained with the following steps:
Step (c 1): the terminal device sends the extracted characteristic information, quality of experience information and multicast access policy to a preset device.
Step (c 2): the preset device takes the received characteristic information, quality of experience information and multicast access policy as training data and retrains the multicast prediction model. The multicast prediction model in the preset device is the same as the multicast prediction model in the terminal device.
Step (c 3): the preset device sends the parameter information of the retrained multicast prediction model to the terminal device.
Step (c 4): the terminal device updates its multicast prediction model based on the received parameter information.
It should be noted that the preset device may be the server shown in fig. 1, or may be another server or computer device.
Fig. 7 shows a block diagram of a video data transmitting apparatus according to an exemplary embodiment of the present invention. Wherein the functional elements of the transmitting apparatus of video data may be implemented by hardware, software, or a combination of hardware and software implementing the principles of the present invention. It will be appreciated by those skilled in the art that the functional units described in fig. 7 may be combined or divided into sub-units to implement the principles of the invention described above. Thus, the description herein may support any possible combination, or division, or further definition of the functional units described herein.
In the following, the functional units that the apparatus for transmitting video data may have and the operations that each functional unit may perform are briefly described; for related details, reference may be made to the foregoing description, which is not repeated here.
Referring to fig. 7, the apparatus for transmitting video data according to the exemplary embodiment of the present invention includes a video layering module 310, an information embedding module 320, and a video transmission module 330.
The video layering module 310 is configured to: the original video is layered into a plurality of video data streams.
The information embedding module 320 is configured to: embed extension information in at least one data packet of at least one video data stream in the plurality of video data streams, the extension information including preset characteristic information of the video data stream.
The video transmission module 330 is configured to: and respectively transmitting the plurality of video data streams into corresponding channels for transmission.
In one embodiment of the invention, the video layering module 310 is configured to: layering an original video into a base layer video data stream and one or more enhancement layer video data streams, the information embedding module 320 configured to: embedding extension information at least in at least one data packet of said base layer video data stream.
In one embodiment of the present invention, the information embedding module 320 is configured to: embedding extension information in at least one data packet of the base layer video data stream, the extension information including characteristic information of the base layer video data stream and characteristic information of at least one enhancement layer video data stream.
In one embodiment of the present invention, the information embedding module 320 is configured to: embedding extension information in at least one data packet of the base layer video data stream, wherein the extension information comprises characteristic information of the base layer video data stream and characteristic information of an enhancement layer video data stream adjacent to the base layer video data stream; and embedding extension information in at least one data packet of each enhancement layer video data stream, the extension information including: characteristic information of the enhancement layer video data stream itself, and characteristic information of a video data stream adjacent to the enhancement layer video data stream.
In one embodiment of the present invention, the characteristic information of each video data stream includes at least one of the following types: the sending rate of the video data stream, the ratio of the data amount of the video data stream to the data amount of the base layer video data stream, and the ratio of the data amount of the video data stream to the sum of the data amounts of the remaining video data streams.
In one embodiment of the invention, the extension information further includes at least one of: a first identifier indicating that the data packet is embedded with the extension information, the number of video data streams corresponding to the characteristic information contained in the extension information, the number of types of characteristic information in the extension information, and the embedding mode of the extension information.
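These optional fields suggest a small packet header; the byte layout below, the field order, and the flag value are hypothetical, since the patent specifies no wire format:

```python
# Hypothetical binary layout for the optional extension-information fields:
# a one-byte first identifier marking the packet as carrying extension
# information, the number of covered streams, the number of characteristic-
# information types, and the embedding mode. Widths and order are assumptions.

import struct

EXT_FLAG = 0xA5  # invented "first identifier" value

def pack_ext_header(num_streams, num_types, mode):
    return struct.pack("!BBBB", EXT_FLAG, num_streams, num_types, mode)

def unpack_ext_header(data):
    flag, num_streams, num_types, mode = struct.unpack("!BBBB", data[:4])
    if flag != EXT_FLAG:
        return None  # packet does not carry extension information
    return {"streams": num_streams, "types": num_types, "mode": mode}

hdr = pack_ext_header(num_streams=2, num_types=3, mode=1)
assert unpack_ext_header(hdr) == {"streams": 2, "types": 3, "mode": 1}
```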
Fig. 8 shows a block diagram of a receiving apparatus of video data according to an exemplary embodiment of the present invention. Wherein the functional elements of the receiving means of the video data may be implemented by hardware, software or a combination of hardware and software implementing the principles of the present invention. It will be appreciated by those skilled in the art that the functional units described in fig. 8 may be combined or divided into sub-units to implement the principles of the invention described above. Thus, the description herein may support any possible combination, or division, or further definition of the functional units described herein.
In the following, the functional units that the apparatus for receiving video data may have and the operations that each functional unit may perform are briefly described; for related details, reference may be made to the foregoing related description, which is not repeated here.
Referring to fig. 8, the apparatus for receiving video data according to the exemplary embodiment of the present invention includes a video receiving module 410, a first extraction module 420, a second extraction module 430, a policy output module 440, and a multicast adjustment module 450.
The video receiving module 410 is configured to: receiving a video data stream corresponding to a currently accessed multicast combination, wherein at least one data packet of at least one video data stream in the corresponding video data stream is embedded with extension information, and the extension information comprises preset characteristic information of the video data stream.
The first extraction module 420 is configured to: extract the characteristic information from the extension information.
The second extraction module 430 is configured to: and acquiring experience quality information of the video based on the currently played video.
The policy output module 440 is configured to: and acquiring a multicast access strategy by utilizing a multicast prediction model based on the extracted characteristic information and experience quality information.
The multicast adjustment module 450 is configured to: adjust the currently accessed multicast combination based on the multicast access policy.
In one embodiment of the present invention, the corresponding video data stream comprises a base layer video data stream; or the corresponding video data streams comprise a base layer video stream and one or more enhancement layer video data streams; wherein at least one data packet of the base layer video data stream has embedded therein extension information.
In one embodiment of the present invention, at least one data packet of the base layer video data stream has extension information embedded therein, the extension information including characteristic information of the base layer video data stream and characteristic information of at least one enhancement layer video data stream.
In one embodiment of the present invention, at least one packet of the base layer video data stream has extension information embedded therein, where the extension information includes characteristic information of the base layer video data stream and characteristic information of an enhancement layer video data stream adjacent to the base layer video data stream; and at least one data packet of each of the one or more enhancement layer video streams has embedded therein extension information, the extension information including: characteristic information of the enhancement layer video data stream itself, and characteristic information of a video data stream adjacent to the enhancement layer video data stream.
In one embodiment of the present invention, the characteristic information extracted from the extension information includes at least one of the following types: the sending rate of the video data stream, the ratio of the data amount of the video data stream to the data amount of the base layer video data stream, and the ratio of the data amount of the video data stream to the sum of the data amounts of the remaining video data streams.
In one embodiment of the invention, the extension information further includes at least one of: a first identifier indicating that the data packet is embedded with the extension information, the number of video data streams corresponding to the characteristic information contained in the extension information, the number of types of characteristic information in the extension information, and the embedding mode of the extension information.
In one embodiment of the present invention, the multicast adjustment module 450 is configured to perform any one of the following steps: newly accessing at least one multicast other than those in the multicast combination currently accessed by the terminal device; quitting at least one multicast in the multicast combination currently accessed by the terminal device; and maintaining the multicast combination currently accessed by the terminal device unchanged.
In one embodiment of the invention, the quality of experience information comprises at least one of the following types: jitter duration, average codec bit rate, frame rate deviation.
In one embodiment of the present invention, the receiving apparatus further comprises a model update module 460, the model update module 460 being configured to: retrain, and thereby update, the multicast prediction model based on the extracted characteristic information and quality of experience information and the multicast access policy.
Exemplary embodiments of the present invention also provide a server including at least one processor and at least one memory storing instructions, wherein the instructions, when executed by the at least one processor, cause the at least one processor to perform the above-described method of transmitting video data.
Exemplary embodiments of the present invention also provide a terminal device including at least one processor and at least one memory storing instructions, wherein the instructions, when executed by the at least one processor, cause the at least one processor to perform the above-described method of receiving video data.
The processor may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various illustrative logical blocks, modules, and circuits described in connection with this disclosure. A processor may also be a combination of computing components, e.g., a combination of one or more microprocessors, or of a DSP and a microprocessor, or the like.
The memory may be a ROM (Read-Only Memory) or other type of static storage device capable of storing static information and instructions, a RAM (Random Access Memory) or other type of dynamic storage device capable of storing information and instructions, an EEPROM (Electrically Erasable Programmable Read-Only Memory), a CD-ROM (Compact Disc Read-Only Memory) or other optical disc storage (including compact disc, laser disc, optical disc, digital versatile disc, Blu-ray disc, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited to these.
Exemplary embodiments of the present invention also provide a computer-readable storage medium storing instructions that, when executed by at least one processor of a server, cause the at least one processor to perform the above-described method of transmitting video data.
Exemplary embodiments of the present invention also provide a computer-readable storage medium storing instructions that, when executed by at least one processor, cause the at least one processor to perform the above-described method of receiving video data.
The computer-readable recording medium described above is any data storage device that can store data read by a computer system. Examples of the computer-readable recording medium include: read-only memory, random access memory, read-only optical disk, magnetic tape, floppy disk, optical data storage, and carrier waves (such as data transmission through the internet via a wired or wireless transmission path).
While the invention has been shown and described with reference to certain exemplary embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims and their equivalents.

Claims (19)

1. A method of transmitting video data, comprising:
layering an original video into a plurality of video data streams;
embedding extension information in at least one data packet of at least one video data stream in the plurality of video data streams, wherein the extension information comprises preset characteristic information of the video data stream;
and respectively transmitting the plurality of video data streams into corresponding channels for transmission.
2. The method of claim 1, wherein the step of layering the original video into a plurality of video data streams comprises:
the original video is layered into a base layer video data stream and one or more enhancement layer video data streams,
said step of embedding extension information in at least one data packet of at least one of said plurality of video data streams comprises:
embedding extension information at least in at least one data packet of said base layer video data stream.
3. The method of claim 2, wherein the step of embedding the extension information in at least one packet of at least the base layer video data stream comprises:
embedding extension information in at least one data packet of the base layer video data stream, wherein the extension information comprises characteristic information of the base layer video data stream and characteristic information of at least one enhancement layer video data stream.
4. The method of claim 2, wherein the step of embedding the extension information in at least one packet of at least the base layer video data stream comprises:
embedding extension information in at least one data packet of the base layer video data stream, wherein the extension information comprises characteristic information of the base layer video data stream and characteristic information of an enhancement layer video data stream adjacent to the base layer video data stream; and
embedding extension information in at least one data packet of each enhancement layer video data stream, the extension information comprising: characteristic information of the enhancement layer video data stream itself, and characteristic information of a video data stream adjacent to the enhancement layer video data stream.
5. The method of claim 1, wherein the characteristic information of each of the video data streams includes at least one of the following types: the sending rate of the video data stream, the ratio of the data amount of the video data stream to the data amount of the base layer video data stream, and the ratio of the data amount of the video data stream to the sum of the data amounts of the remaining video data streams.
6. The method of claim 1, wherein the extension information further comprises at least one of:
a first identifier indicating that the data packet has the extension information embedded therein, the number of video data streams corresponding to the characteristic information contained in the extension information, the number of types of characteristic information in the extension information, and the embedding mode of the extension information.
7. A method of receiving video data, comprising:
receiving video data streams corresponding to a currently accessed multicast combination, wherein extension information is embedded in at least one data packet of at least one of the corresponding video data streams, the extension information comprising preset characteristic information of the video data stream;
extracting the characteristic information from the extension information;
acquiring quality of experience information of a video based on the currently played video;
obtaining a multicast access strategy by using a multicast prediction model based on the extracted characteristic information and the quality of experience information;
and adjusting the currently accessed multicast combination based on the multicast access strategy.
8. The method of claim 7, wherein the corresponding video data stream comprises a base layer video data stream; or
The corresponding video data streams include a base layer video data stream and one or more enhancement layer video data streams;
wherein at least one data packet of the base layer video data stream has embedded therein extension information.
9. The method of claim 8, wherein,
at least one data packet of the base layer video data stream has extension information embedded therein, and the extension information comprises characteristic information of the base layer video data stream and characteristic information of at least one enhancement layer video data stream.
10. The method of claim 8, wherein at least one packet of the base layer video data stream has extension information embedded therein, the extension information including characteristic information of the base layer video data stream and characteristic information of an enhancement layer video data stream adjacent to the base layer video data stream; and
at least one data packet of each of the one or more enhancement layer video data streams has embedded therein extension information, the extension information including:
characteristic information of the enhancement layer video data stream itself, and characteristic information of a video data stream adjacent to the enhancement layer video data stream.
11. The method of claim 7, wherein the characteristic information extracted from the extended information includes at least one of the following types: the sending rate of the video data stream, the ratio of the amount of data of the video data stream to the amount of data of the base layer video data stream, and the ratio of the amount of data of the video data stream to the sum of the amounts of data of the remaining video data streams.
12. The method of claim 7, wherein the extension information further comprises at least one of:
a first identifier indicating that the data packet has the extension information embedded therein, the number of video data streams corresponding to the characteristic information contained in the extension information, the number of types of characteristic information in the extension information, and the embedding mode of the extension information.
13. The method of claim 7, wherein the quality of experience information comprises at least one of the following types: jitter duration, average codec bit rate, and frame rate deviation.
14. A transmission apparatus of video data, comprising:
a video layering module configured to layer an original video into a plurality of video data streams;
an information embedding module configured to embed extension information in at least one data packet of at least one of the plurality of video data streams, wherein the extension information comprises preset characteristic information of the video data stream;
and a video transmission module configured to transmit the plurality of video data streams over their respective corresponding channels.
15. A receiving apparatus of video data, comprising:
a video receiving module configured to receive video data streams corresponding to a currently accessed multicast combination, wherein extension information is embedded in at least one data packet of at least one of the corresponding video data streams, the extension information comprising preset characteristic information of the video data stream;
a first extraction module configured to extract the characteristic information from the extension information;
a second extraction module configured to acquire quality of experience information of a video based on the currently played video;
a policy output module configured to obtain a multicast access strategy by using a multicast prediction model based on the extracted characteristic information and the quality of experience information;
and a multicast adjustment module configured to adjust the currently accessed multicast combination based on the multicast access strategy.
16. A server comprising at least one processor and at least one memory storing instructions that, when executed by the at least one processor, cause the at least one processor to perform a method of transmitting video data according to any one of claims 1 to 6.
17. A terminal device comprising at least one processor and at least one memory storing instructions that, when executed by the at least one processor, cause the at least one processor to perform the method of receiving video data according to any one of claims 7 to 13.
18. A computer-readable storage medium storing instructions that, when executed by at least one processor of a server, cause the at least one processor to perform the method of transmitting video data according to any one of claims 1 to 6.
19. A computer-readable storage medium storing instructions that, when executed by at least one processor of a terminal device, cause the at least one processor to perform the method of receiving video data according to any one of claims 7 to 13.
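The embedding scheme recited in claims 1 and 6 (a first identifier, a count of streams, a count of characteristic-information types, an embedding mode, followed by per-stream characteristic values) can be illustrated with a minimal Python sketch. All names, the byte layout, and the feature encoding below are assumptions made for illustration only; the patent does not specify an on-wire format.

```python
import struct

# Hypothetical layout (illustration only, not the patent's actual format):
#   magic (2B) | stream_count (1B) | type_count (1B) | mode (1B)
#   followed by stream_count * type_count big-endian float32 values.
MAGIC = 0x5E01  # plays the role of the "first identifier"

def embed_extension_info(payload: bytes, features, mode=0) -> bytes:
    """Prepend extension information (per-stream characteristic values) to a packet payload."""
    stream_count = len(features)
    type_count = len(features[0]) if features else 0
    header = struct.pack("!HBBB", MAGIC, stream_count, type_count, mode)
    body = b"".join(struct.pack("!f", v) for row in features for v in row)
    return header + body + payload

def extract_extension_info(packet: bytes):
    """Return (features, remaining_payload); features is None when no extension info is present."""
    magic, stream_count, type_count, _mode = struct.unpack_from("!HBBB", packet, 0)
    if magic != MAGIC:
        return None, packet
    offset = 5
    features = []
    for _ in range(stream_count):
        row = [struct.unpack_from("!f", packet, offset + 4 * i)[0] for i in range(type_count)]
        features.append(row)
        offset += 4 * type_count
    return features, packet[offset:]

# Example: base layer sending at 2 Mbit/s (data-amount ratio 1.0 to itself), one
# enhancement layer at 4 Mbit/s whose data amount is 2x that of the base layer.
pkt = embed_extension_info(b"video-data", [[2.0e6, 1.0], [4.0e6, 2.0]])
feats, payload = extract_extension_info(pkt)
```

A receiver, as in claim 7, would feed the extracted characteristic values together with its measured quality of experience values into the multicast prediction model before joining or leaving multicast groups.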
CN202110292268.7A 2021-03-18 2021-03-18 Method for sending and receiving video data, terminal equipment and server Pending CN113038289A (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202110292268.7A CN113038289A (en) 2021-03-18 2021-03-18 Method for sending and receiving video data, terminal equipment and server
US17/222,205 US20220303620A1 (en) 2021-03-18 2021-04-05 Methods for transmitting and receiving video data, terminal device and server
KR1020210093144A KR20220130553A (en) 2021-03-18 2021-07-15 Terminal device for transmitting and receiving video data, server, and operating method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110292268.7A CN113038289A (en) 2021-03-18 2021-03-18 Method for sending and receiving video data, terminal equipment and server

Publications (1)

Publication Number Publication Date
CN113038289A (en) 2021-06-25

Family

ID=76471542

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110292268.7A Pending CN113038289A (en) 2021-03-18 2021-03-18 Method for sending and receiving video data, terminal equipment and server

Country Status (3)

Country Link
US (1) US20220303620A1 (en)
KR (1) KR20220130553A (en)
CN (1) CN113038289A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101330436A (en) * 2007-06-19 2008-12-24 上海贝尔阿尔卡特股份有限公司 Method and device for transmitting adjustable multicast multimedia service data
US20090268806A1 (en) * 2008-04-07 2009-10-29 Jin Pil Kim Method of transmitting and receiving broadcasting signal and apparatus for receiving broadcasting signal
US20110013538A1 (en) * 2009-07-20 2011-01-20 Canon Kabushiki Kaisha Methods and Devices for Estimating a Level of Use of a Communication Network and for Adapting a Level of Subscription to Multicast Sessions
US20110299601A1 (en) * 2010-06-08 2011-12-08 Cisco Technology, Inc. Scalable video multicast framework for heterogeneous endpoints
US20140211681A1 (en) * 2013-01-25 2014-07-31 Cisco Technology, Inc. System and method for video delivery over heterogeneous networks with scalable video coding for multiple subscriber tiers
US20150020131A1 (en) * 2012-01-20 2015-01-15 Korea Electronics Technology Institute Method for transmitting and receiving program configuration information for scalable ultra high definition video service in hybrid transmission environment, and method and apparatus for effectively transmitting scalar layer information
US20210297681A1 (en) * 2018-07-15 2021-09-23 V-Nova International Limited Low complexity enhancement video coding

Also Published As

Publication number Publication date
KR20220130553A (en) 2022-09-27
US20220303620A1 (en) 2022-09-22

Similar Documents

Publication Publication Date Title
RU2718170C2 (en) Multimedia media delivery events locations for multimedia transportation
RU2601442C2 (en) Method and apparatus for transmitting/receiving media contents in multimedia system
CN1859579B (en) Apparatus and method for transmitting a multimedia data stream
EP2750405B1 (en) Information stream management
US8239558B2 (en) Transport mechanisms for dynamic rich media scenes
US8571215B2 (en) Combined storage and transmission of scalable media
CN105075276B (en) The technology of client device and server apparatus is operated in broadcast communication network
US8661155B2 (en) Service layer assisted change of multimedia stream access delivery
JP5147950B2 (en) Apparatus and method for performing simulcast over variable bandwidth channel
WO2018203336A1 (en) Device, system, and method of pre-processing and data delivery for multi-link communications and for media content
CN105594219A (en) Apparatus and method for transmitting/receiving processes of a broadcast signal
CN105308916A (en) Method and apparatus for controlling media delivery in multimedia transport network
CN102130886B (en) Network video streaming media system, transmission processing method, transmitting end
CN102067551B (en) Media stream processing
CN110636346A (en) Code rate self-adaptive switching method and device, electronic equipment and storage medium
US20170063488A1 (en) Rate adaptation method using bit error rate for multimedia service and apparatus therefor
US20150110168A1 (en) Video data transmission method and apparatus
CN107079013B (en) Managing concurrent streaming of media streams
US20200359088A1 (en) Method and apparatus of transmitting media data related information in multimedia transmission system
CN105684390A (en) Method and device for reserving bandwidth for an adaptive streaming client
US9692801B2 (en) Method and apparatus for controlling traffic using adaptive streaming in multi-media content transmission system
JP2004524775A (en) System and method for inserting video and audio packets into a video transport stream
CN101741752B (en) The methods, devices and systems of video streaming
CN113038289A (en) Method for sending and receiving video data, terminal equipment and server
CN104025605A (en) System and method for multiplexed streaming of multimedia content

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination