CN114466224B - Video data encoding and decoding method and device, storage medium and electronic equipment - Google Patents

Video data encoding and decoding method and device, storage medium and electronic equipment

Info

Publication number
CN114466224B
CN114466224B (application CN202210095199.5A)
Authority
CN
China
Prior art keywords
key
data
video frames
video
frame
Prior art date
Legal status
Active
Application number
CN202210095199.5A
Other languages
Chinese (zh)
Other versions
CN114466224A
Inventor
周济
蔡海军
Current Assignee
Guangzhou Fanxing Huyu IT Co Ltd
Original Assignee
Guangzhou Fanxing Huyu IT Co Ltd
Priority date
Filing date
Publication date
Application filed by Guangzhou Fanxing Huyu IT Co Ltd filed Critical Guangzhou Fanxing Huyu IT Co Ltd
Priority to CN202210095199.5A
Publication of CN114466224A
Application granted
Publication of CN114466224B

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/2343Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/238Interfacing the downstream path of the transmission network, e.g. adapting the transmission rate of a video stream to network bandwidth; Processing of multiplex streams
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/436Interfacing a local distribution network, e.g. communicating with another STB or one or more peripheral devices inside the home
    • H04N21/4363Adapting the video stream to a specific local network, e.g. a Bluetooth® network
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention discloses a video data encoding and decoding method and apparatus, a storage medium and an electronic device. The method comprises the following steps: when it is detected that the current network condition does not reach a preset network condition, determining a key video frame in a current video stream to be encoded; acquiring multi-frame associated video frames associated with the key video frame in the video stream to be encoded and the key point data of each of the associated video frames; and adding the groups of key point coding data corresponding to the associated video frames to a preset field of the key coding data to construct transmission data, wherein the key coding data is obtained by data-encoding the key video frame, and the transmission data is sent to a decoding end so that the decoding end obtains, based on the transmission data, the key video frame and the associated video frames associated with it. The invention solves the technical problem of video picture loss caused by a mismatch between the video data volume and the network conditions.

Description

Video data encoding and decoding method and device, storage medium and electronic equipment
Technical Field
The present invention relates to the field of data processing, and in particular, to a method and apparatus for encoding and decoding video data, a storage medium, and an electronic device.
Background
At present, to keep video playback smooth under poor network conditions, the bit rate of the video picture is usually reduced so that the video adapts better to the network conditions, but lowering the bit rate inevitably degrades the picture. Moreover, the amount of video data that can be saved by lowering the bit rate is limited, so uniformly lowering the bit rate still leads to lost video pictures and seriously harms the viewing experience.
In view of the above problems, no effective solution has been proposed at present.
Disclosure of Invention
The embodiments of the invention provide a video data encoding and decoding method and apparatus, a storage medium and an electronic device, which at least solve the technical problem of video picture loss caused by a mismatch between the video data volume and the network conditions.
According to an aspect of an embodiment of the present invention, there is provided a video data encoding and decoding method, including: when it is detected that the current network condition does not reach a preset network condition, determining a key video frame in a current video stream to be encoded; acquiring multi-frame associated video frames associated with the key video frame in the video stream to be encoded and the key point data of each of the associated video frames; and adding the groups of key point coding data corresponding to the associated video frames to a preset field of the key coding data to construct transmission data, wherein the key coding data is data obtained by data-encoding the key video frame, the key point coding data comprises the coded key point data of the associated video frames, and the transmission data is sent to a decoding end so that the decoding end obtains, based on the transmission data, the key video frame and the associated video frames associated with it.
According to another aspect of an embodiment of the present invention, there is provided a video data encoding and decoding method, including: when transmission data sent by an encoding end is received, determining the key coding data contained in the transmission data and the groups of key point coding data located in a preset field of the key coding data; decoding the key coding data and the groups of key point coding data to obtain a key video frame and the key point data of the multi-frame associated video frames associated with the key video frame, wherein the key point data is the data corresponding to the target key points of the key video frame in an associated video frame; and inputting the key point data and the key video frame into a generative adversarial network (GAN) and obtaining the simulated associated video frames output by the generative adversarial network, wherein the generative adversarial network is trained to perform image restoration generation on the key point data by using the key video frame.
According to another aspect of the embodiments of the present invention, there is also provided a video data encoding and decoding apparatus, including: a detection unit, configured to determine a key video frame in a current video stream to be encoded when the encoding end detects that the current network condition does not reach a preset network condition; an acquisition unit, configured to acquire multi-frame associated video frames associated with the key video frame in the video stream to be encoded and the key point data of each of the associated video frames; and an adding unit, configured to add the groups of key point coding data corresponding to the associated video frames to a preset field of the key coding data to construct transmission data, wherein the key coding data is data obtained by data-encoding the key video frame, the key point coding data comprises the coded key point data of the associated video frames, and the transmission data is sent to a decoding end so that the decoding end obtains, based on the transmission data, the key video frame and the associated video frames associated with it.
According to another aspect of the embodiments of the present invention, there is also provided a video data encoding and decoding apparatus, including: a determination unit, configured to determine, when transmission data sent by an encoding end is received, the key coding data contained in the transmission data and the groups of key point coding data located in a preset field of the key coding data; a decoding unit, configured to decode the key coding data and the groups of key point coding data to obtain a key video frame and the key point data of the multi-frame associated video frames associated with the key video frame, wherein the key point data is the data corresponding to the target key points of the key video frame in an associated video frame; and a generation unit, configured to input the key point data and the key video frame into a generative adversarial network and obtain the simulated associated video frames output by the generative adversarial network, wherein the generative adversarial network is trained to perform image restoration generation on the key point data by using the key video frame.
According to still another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium having a computer program stored therein, wherein the computer program is configured to perform the above-described encoding and decoding method of video data when run.
According to still another aspect of the embodiments of the present invention, there is also provided an electronic device including a memory in which a computer program is stored, and a processor configured to execute the above-described encoding and decoding method of video data by the computer program.
In the embodiments of the invention, when the encoding end detects that the current network condition does not reach the preset network condition, a key video frame in the current video stream to be encoded is determined, and the multi-frame associated video frames associated with the key video frame in the video stream to be encoded, together with the key point data of each of those associated video frames, are acquired. The groups of key point coding data corresponding to the associated video frames are added to a preset field of the key coding data obtained by data-encoding the key video frame, so as to construct transmission data, and the transmission data is sent to the decoding end, so that on receiving the transmission data the decoding end decodes it to obtain the key video frame and the associated video frames associated with it. Because only the key video frames determined from the video stream to be encoded are fully encoded, while for the associated video frames only the key point data is encoded and added to the preset field of the key coding data, the number of fully encoded frames and the amount of coded data to be transmitted are both reduced. This achieves the technical effect of encoding and decoding the video stream with a small data volume, and solves the technical problem of video picture loss caused by a mismatch between the video data volume and the network conditions.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
fig. 1 is a schematic view of an application environment of an alternative video data encoding and decoding method according to an embodiment of the present invention;
FIG. 2 is a flow chart of an alternative method of encoding and decoding video data according to an embodiment of the present invention;
FIG. 3 is a flow chart of an alternative method of encoding and decoding video data according to an embodiment of the present invention;
FIG. 4 is a flow chart of an alternative method of encoding and decoding video data according to an embodiment of the present invention;
FIG. 5 is a flow chart of an alternative method of encoding and decoding video data according to an embodiment of the present invention;
fig. 6 is a schematic structural view of an alternative video data encoding and decoding apparatus according to an embodiment of the present invention;
Fig. 7 is a schematic diagram of an alternative video data codec according to an embodiment of the present invention;
Fig. 8 is a schematic structural view of an alternative electronic device according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without making any inventive effort shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
According to an aspect of an embodiment of the present invention, a video data encoding and decoding method is provided. Optionally, the method may be applied to, but is not limited to, the environment shown in fig. 1. The encoding end 100 may, but is not limited to, exchange data with the decoding end 120 through the network 110. The encoding end 100 may be used to encode a video stream, so that the encoded transmission data is sent through the network 110 to the decoding end 120, and the decoding end 120 decodes the transmission data to obtain the video stream for playback.
The video data encoding and decoding method may be implemented, but is not limited to, by performing S102 to S110 in sequence. S102, determining a key video frame: when the encoding end detects that the current network condition does not reach the preset network condition, a key video frame in the current video stream to be encoded is determined. S104, acquiring the key point data of the associated video frames: the multi-frame associated video frames associated with the key video frame in the video stream to be encoded and the key point data of each of those associated video frames are acquired. S106, constructing transmission data: the groups of key point coding data corresponding to the associated video frames are added to a preset field of the key coding data to construct transmission data, where the key coding data is data obtained by data-encoding the key video frame. S108, decoding the transmission data: when the transmission data sent by the encoding end is received, the key coding data contained in the transmission data and the groups of key point coding data located in the preset field of the key coding data are determined, and the key coding data and the groups of key point coding data are decoded to obtain the key video frame and the key point data of the multi-frame associated video frames associated with it, the key point data being the data corresponding to the target key points of the key video frame in an associated video frame. S110, obtaining the simulated associated video frames: the key point data and the key video frame are input into a generative adversarial network (GAN), the simulated associated video frames output by the generative adversarial network are obtained, and the generative adversarial network is trained to perform image restoration generation on the key point data by using the key video frame.
Optionally, in this embodiment, the encoding end 100 and the decoding end 120 may be the same terminal device, different terminal devices, or servers. The terminal device may be a terminal device configured with a target client and may include, but is not limited to, at least one of: a mobile phone (such as an Android phone or an iOS phone), a notebook computer, a tablet computer, a palmtop computer, a mobile internet device (MID), a PAD, a desktop computer, a smart television, and the like. The target client may be a client having a video encoding and decoding function, such as, but not limited to, an audio client, a video client, an instant messaging client, a browser client, or an educational client. The network 110 may include, but is not limited to, a wired network and a wireless network, where the wired network includes a local area network, a metropolitan area network and a wide area network, and the wireless network includes Bluetooth, WiFi and other networks that enable wireless communication. The server may be a single server, a server cluster composed of a plurality of servers, or a cloud server. The above is merely an example, and this embodiment is not limited in this respect.
As an alternative embodiment, as shown in fig. 2, the video data encoding and decoding method includes:
S202, determining a key video frame in the current video stream to be encoded when the encoding end detects that the current network condition does not reach the preset network condition;
S204, acquiring a multi-frame associated video frame associated with a key video frame in a video stream to be encoded and key point data of each frame associated video frame in the multi-frame associated video frame;
S206, adding the groups of key point coding data corresponding to the multi-frame associated video frames to a preset field of the key coding data, and constructing transmission data.
In S206, the key coding data is data obtained by data-encoding the key video frame, the key point coding data comprises the coded key point data of the associated video frames, and the transmission data is sent to the decoding end, so that the decoding end obtains, based on the transmission data, the key video frame and the multi-frame associated video frames associated with it.
The encoding end may be used to data-encode the video stream to be encoded, so that the transmission data obtained by the encoding is transmitted to the decoding end through the network, and the decoding end obtains the video stream by decoding the transmission data, thereby realizing video playback. The encoding end may be a terminal device having a data encoding function, for example a terminal that captures the video, or may be a server that processes the video stream. The video stream comprises multiple video frames, and playback of the video stream is realized by displaying those video frames in sequence.
The video stream is transmitted from the encoding end to the decoding end through the network, and the integrity and efficiency of the data transmission are related to the network condition. The network condition may include, but is not limited to, the uplink transmission rate, the downlink transmission rate, the packet loss rate, and the like. During video data transmission, the video encoding and decoding mode may be adjusted in real time according to the network condition, and encoding and decoding in different modes controls the amount of coded data produced so that it matches the network condition. Specifically, when the network condition is good, the video stream may be data-encoded with an encoding mode that yields a larger amount of coded data; when the network condition is poor, the video stream may be data-encoded with an encoding mode that yields a smaller amount of coded data.
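As an illustration only, the following Python sketch shows one way the encoding end might compare the measured network condition against a preset network condition and switch to the key-frame-plus-key-point encoding scheme. The patent does not specify concrete metrics or thresholds, so the NetworkCondition fields and the PRESET values below are assumptions.

```python
# Illustrative sketch: metrics and thresholds are assumptions, not the patent's values.
from dataclasses import dataclass

@dataclass
class NetworkCondition:
    uplink_kbps: float      # measured uplink transmission rate
    downlink_kbps: float    # measured downlink transmission rate
    packet_loss: float      # packet loss ratio, 0.0-1.0

# Hypothetical "preset network condition" the encoding end compares against.
PRESET = NetworkCondition(uplink_kbps=2000.0, downlink_kbps=4000.0, packet_loss=0.02)

def reaches_preset(current: NetworkCondition, preset: NetworkCondition = PRESET) -> bool:
    """True if the current network condition reaches the preset network condition."""
    return (current.uplink_kbps >= preset.uplink_kbps
            and current.downlink_kbps >= preset.downlink_kbps
            and current.packet_loss <= preset.packet_loss)

def choose_encoding_mode(current: NetworkCondition) -> str:
    # Good network: conventional full-frame encoding; poor network: the
    # key-frame-plus-key-point scheme described in this embodiment.
    return "full_frame" if reaches_preset(current) else "keyframe_plus_keypoints"

print(choose_encoding_mode(NetworkCondition(800.0, 1500.0, 0.08)))  # keyframe_plus_keypoints
```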
The multi-frame associated video frames associated with the key video frame in the video stream to be encoded may be, but are not limited to, the video frames that precede and/or follow the key video frame in time. The video stream to be encoded may contain multiple key video frames, and the remaining video frames are associated, as associated video frames, with the nearest key video frame, so that the video frames in the video stream to be encoded are divided into key video frames and the associated video frames associated with them.
Data-encoding a key video frame may, but is not limited to, encode all of the data of the key video frame, while data-encoding an associated video frame may, but is not limited to, encode only the key point data of the associated video frame. The key point coding data obtained by encoding the key point data of the associated video frames is added to a preset field of the key coding data obtained by encoding the key video frame, so that the key video frame and the multi-frame associated video frames are all carried by the key coding data. Coded data with a small data volume is thereby obtained, which better suits the requirements of video stream transmission when the network condition does not reach the preset network condition.
In the embodiments of the application, when the encoding end detects that the current network condition does not reach the preset network condition, a key video frame in the current video stream to be encoded is determined, and the multi-frame associated video frames associated with the key video frame in the video stream to be encoded, together with the key point data of each of those associated video frames, are acquired. The groups of key point coding data corresponding to the associated video frames are added to a preset field of the key coding data obtained by data-encoding the key video frame, so as to construct transmission data, and the transmission data is sent to the decoding end, so that on receiving the transmission data the decoding end decodes it to obtain the key video frame and the associated video frames associated with it. Because only the key video frames determined from the video stream to be encoded are fully encoded, while for the associated video frames only the key point data is encoded and added to the preset field of the key coding data, the number of fully encoded frames and the amount of coded data to be transmitted are both reduced. This achieves the technical effect of encoding and decoding the video stream with a small data volume, and solves the technical problem of video picture loss caused by a mismatch between the video data volume and the network conditions.
As an alternative embodiment, determining the key video frame in the current video stream to be encoded includes: determining the video frame currently located at the first position to be encoded in the video stream to be encoded as the key video frame; or taking a video frame that includes all of the preset feature key points in the video stream to be encoded as the key video frame.
The key video frame may be determined from the video stream to be encoded according to the timing of the video frames, or according to the number of feature key points contained in each video frame. When determined by timing, the video frame located at the first position to be encoded in the video stream may be determined as the key video frame, and the multi-frame video frames located after it are determined as the associated video frames associated with it. Alternatively, the video frame located at a target sequence position to be encoded may be determined as the key video frame, so that the video frames located in the N frames before and the M frames after the key video frame are determined as the associated video frames, where N and M are positive integers whose values may be the same or different.
When determined by the number of feature key points contained in the video frames, the video frame containing the most feature key points among the multi-frame video frames may be taken as the key video frame, the feature key points being the feature key points of a target object in the video stream. The video frame containing the most feature key points may be, but is not limited to, a video frame that includes all of the preset feature key points. When there are many video frames that include all of the preset feature key points, or their sequence interval is small, the key video frame may be determined among them according to the timing of the video frames.
For a video stream, the number of key video frames may be determined according to the number of video frames the video stream contains, so that while the amount of transmission data obtained by encoding the video stream is controlled, the decoding end can still decode and restore the video stream with adequate quality. The number of associated video frames associated with each key video frame may be the same or different, and may be limited to a range defined by a preset association range in order to guarantee the quality of the decoded video stream.
In the embodiments of the application, the key video frame is first determined from the video stream, and then the multi-frame associated video frames associated with it are determined; all of the data of the key video frame is encoded, while only the key point data of the associated video frames is encoded, which greatly reduces the amount of transmission data obtained after encoding and allows the transmission data to adapt well to changes in the network condition. An illustrative sketch of the key-point-based selection follows below.
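The following Python sketch illustrates the second selection strategy described above: pick the earliest frame whose detected feature key points cover every preset feature key point (the first strategy is simply taking the frame first in encoding order). The key point names used in the example are assumptions.

```python
# Illustrative sketch of key-frame selection by preset feature key points.
from typing import List, Set

def select_key_frame_by_keypoints(frame_keypoints: List[Set[str]],
                                  preset_keypoints: Set[str]) -> int:
    """Return the index of the earliest frame containing all preset key points."""
    for idx, kps in enumerate(frame_keypoints):
        if preset_keypoints.issubset(kps):
            return idx
    return 0  # fall back to the first frame if no frame contains every preset key point

preset = {"left_eye", "right_eye", "nose", "mouth_left", "mouth_right"}
detected = [
    {"left_eye", "nose"},                                            # frame 0: partial
    {"left_eye", "right_eye", "nose", "mouth_left", "mouth_right"},  # frame 1: complete
    {"left_eye", "right_eye", "nose"},                               # frame 2: partial
]
print(select_key_frame_by_keypoints(detected, preset))  # -> 1
```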
As an optional implementation manner, acquiring the multi-frame associated video frames associated with the key video frame in the video stream to be encoded and the key point data of each of the associated video frames includes:
S204-1, performing feature key point identification on each associated video frame in turn by using an identification algorithm, wherein the feature key points are used to identify a target object in the video stream to be encoded;
S204-2, taking the data of the identified feature key points as the key point data of the associated video frame.
The target object is an object that appears in the video stream; for a live video stream, for example, the target object may be the anchor in the live stream or a pet in the live stream. When there is more than one target object, feature key point identification may be performed on each target object in turn. The number of feature key points may be determined based on the number of target objects and is used to unambiguously identify the target objects in a video frame.
Taking the anchor as the target object as an example, the identification algorithm may be, but is not limited to, a face alignment algorithm such as the Deep Alignment Network (DAN). Face key point detection is performed on each video frame using the DAN algorithm, and the key points may identify the key feature positions of the anchor. The key point data of each associated video frame is thus acquired through the DAN algorithm.
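A minimal Python sketch of step S204 follows, assuming a landmark detector with a simple frame-to-key-points interface; detect_landmarks stands in for a real face-alignment model such as DAN, whose actual API is not specified here.

```python
# Sketch: run a landmark detector over each associated frame and keep the detected
# feature key points as that frame's key point data. The detector is an assumption.
from typing import Callable, List, Tuple

import numpy as np

Keypoints = List[Tuple[float, float]]   # (x, y) image coordinates

def extract_keypoint_data(associated_frames: List[np.ndarray],
                          detect_landmarks: Callable[[np.ndarray], Keypoints]
                          ) -> List[Keypoints]:
    """Return the key point data for each associated video frame, in frame order."""
    keypoint_data = []
    for frame in associated_frames:
        keypoints = detect_landmarks(frame)   # e.g. facial landmarks of the anchor
        keypoint_data.append(keypoints)
    return keypoint_data

# Toy stand-in detector; a real system would load a trained DAN-style model here.
def fake_detector(frame: np.ndarray) -> Keypoints:
    h, w = frame.shape[:2]
    return [(w * 0.3, h * 0.4), (w * 0.7, h * 0.4), (w * 0.5, h * 0.7)]

frames = [np.zeros((720, 1280, 3), dtype=np.uint8) for _ in range(3)]
print(len(extract_keypoint_data(frames, fake_detector)))  # -> 3
```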
As an alternative embodiment, as shown in fig. 3, before the groups of key point coding data corresponding to the multi-frame associated video frames are added to the preset field of the key coding data, the method further includes:
S302, sequentially determining the frame sequence relation between each frame of associated video frame and key video frame;
S304, carrying out data coding on the frame sequence relation and the key point data to obtain key point coding data;
S306, carrying out data coding on the key video frames to obtain key coding data.
Once the key video frame and the multi-frame associated video frames associated with it have been determined, the frame sequence relation between each associated video frame and the key video frame is determined. The frame sequence relation indicates the relative frame positions of the associated video frame and the key video frame in the video stream, and is used to mark which associated video frame, relative to the key video frame, a group of key point data belongs to.
After the frame sequence relation of an associated video frame relative to the key video frame is determined, the frame sequence relation and the key point data may each be data-encoded, and the coded frame sequence relation together with the coded key point data is taken as the key point coding data corresponding to that associated video frame.
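The sketch below illustrates steps S302-S304 under the assumption that the key point coding data simply serialises the frame sequence relation together with the key point coordinates; the JSON byte layout is an assumption, not the patent's format.

```python
# Sketch: pair each associated frame's key point data with its frame sequence
# relation (offset relative to the key frame) and serialise the pair.
import json
from typing import List, Tuple

Keypoints = List[Tuple[float, float]]

def encode_keypoint_data(frame_offset: int, keypoints: Keypoints) -> bytes:
    """Key point coding data = frame sequence relation + encoded key point coordinates."""
    payload = {
        "offset": frame_offset,   # e.g. +1 = first frame after the key video frame
        "keypoints": [[round(x, 2), round(y, 2)] for x, y in keypoints],
    }
    return json.dumps(payload).encode("utf-8")

def decode_keypoint_data(blob: bytes) -> Tuple[int, Keypoints]:
    payload = json.loads(blob.decode("utf-8"))
    return payload["offset"], [tuple(p) for p in payload["keypoints"]]

blob = encode_keypoint_data(2, [(384.0, 288.5), (896.0, 288.5)])
print(decode_keypoint_data(blob))  # -> (2, [(384.0, 288.5), (896.0, 288.5)])
```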
As an optional implementation manner, adding the groups of key point coding data corresponding to the multi-frame associated video frames to the preset field of the key coding data and constructing the transmission data includes:
S206-1, sequentially adding the key point coding data of each associated video frame to the preset field of the key coding data;
S206-2, taking the key coding data to which the key point coding data of all associated video frames has been added as the transmission data.
After the key video frame has been data-encoded to obtain the key coding data, the key point coding data corresponding to the associated video frames is added to the preset field of the key coding data, and the transmission data for network transmission is constructed.
The preset field may be, but is not limited to, a reserved field of the key coding data used for extension coding. The key point coding data corresponding to all associated video frames associated with the key video frame is added to the preset field of the key coding data, so that the key video frame and the multi-frame associated video frames are all encoded into the key coding data, which serves as the transmission data sent to the decoding end.
In the embodiments of the application, adding the key point coding data of the multi-frame associated video frames into the key coding data yields transmission data with a small data volume as the coded form of the video stream, so that the transmission data is adapted to the currently degraded network condition.
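A hedged sketch of step S206 follows. The length-prefixed container below is an assumption; a real encoder might instead carry the key point coding data in an extension or reserved field of the codec bitstream (for example an SEI-like unit), which the patent leaves unspecified.

```python
# Sketch: append every associated frame's key point coding data to a reserved
# ("preset") field carried alongside the key coding data.
import struct
from typing import List

def build_transmission_data(key_coding_data: bytes,
                            keypoint_coded_blobs: List[bytes]) -> bytes:
    parts = [struct.pack(">I", len(key_coding_data)), key_coding_data,
             struct.pack(">H", len(keypoint_coded_blobs))]
    for blob in keypoint_coded_blobs:          # one blob per associated video frame
        parts.append(struct.pack(">I", len(blob)))
        parts.append(blob)
    return b"".join(parts)

tx = build_transmission_data(b"<encoded key frame>", [b"kp-frame-1", b"kp-frame-2"])
print(len(tx))
```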
According to an aspect of an embodiment of the present invention, there is provided a method for encoding and decoding video data, and optionally, the method for encoding and decoding video data may be applied to, but not limited to, a decoding end as shown in fig. 1. As an alternative embodiment, as shown in fig. 4, the above-mentioned method for encoding and decoding video data includes:
S402, determining, when transmission data sent by the encoding end is received, the key coding data contained in the transmission data and the groups of key point coding data located in a preset field of the key coding data;
S404, decoding the key coding data and the groups of key point coding data to obtain the key video frame and the key point data of the multi-frame associated video frames associated with the key video frame;
In S404, the key point data is the data corresponding to the target key points of the key video frame in an associated video frame.
S406, inputting the key point data and the key video frame into a generative adversarial network, and obtaining the simulated associated video frames output by the generative adversarial network;
In S406 described above, the generative adversarial network is trained to perform image restoration generation on the key point data by using the key video frame.
When the decoding end receives the transmission data, it determines the key coding data in the transmission data and the groups of key point coding data located in the preset field of the key coding data. The key coding data is decoded to obtain the key video frame, and the key point coding data is decoded to obtain the groups of key point data associated with the key video frame and the frame sequence relation corresponding to each group of key point data.
The key point data and the key video frame are input into the generative adversarial network, the simulated associated video frames output by the generative adversarial network are taken as the associated video frames associated with the key video frame, and the video is constructed from the simulated associated video frames and the key video frame and played.
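The following sketch mirrors the encoder-side container sketch above and illustrates steps S402-S404; the layout and the decoder callables are assumptions, since the patent does not fix a particular codec.

```python
# Sketch: split the received transmission data back into the key coding data and
# the key point coding blobs stored in its preset field.
import struct
from typing import Callable, List, Tuple

def parse_transmission_data(data: bytes) -> Tuple[bytes, List[bytes]]:
    pos = 0
    (key_len,) = struct.unpack_from(">I", data, pos)
    pos += 4
    key_coding_data = data[pos:pos + key_len]
    pos += key_len
    (count,) = struct.unpack_from(">H", data, pos)
    pos += 2
    blobs = []
    for _ in range(count):
        (blob_len,) = struct.unpack_from(">I", data, pos)
        pos += 4
        blobs.append(data[pos:pos + blob_len])
        pos += blob_len
    return key_coding_data, blobs

def decode_transmission_data(data: bytes,
                             decode_key_frame: Callable[[bytes], object],
                             decode_keypoints: Callable[[bytes], tuple]):
    key_coding_data, blobs = parse_transmission_data(data)
    key_frame = decode_key_frame(key_coding_data)          # full key video frame
    keypoint_sets = [decode_keypoints(b) for b in blobs]   # one group per associated frame
    return key_frame, keypoint_sets

# Toy usage with stand-in decoders.
tx = (struct.pack(">I", 5) + b"KEYFR" + struct.pack(">H", 1)
      + struct.pack(">I", 3) + b"kp1")
print(decode_transmission_data(tx, lambda b: b, lambda b: (1, b)))
# -> (b'KEYFR', [(1, b'kp1')])
```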
As an alternative embodiment, before the key point data and the key video frame are input into the generative adversarial network, the method further includes: calculating the motion offset parameters between the key point data and the corresponding target key point data in the key video frame.
Inputting the key point data and the key video frame into the generative adversarial network includes: sequentially inputting the key video frame, the key point data of each associated video frame and the motion offset parameters into the generative adversarial network to obtain the simulated associated video frames output by the generative adversarial network.
Calculating the motion offset parameters between the key point data and the corresponding target key point data in the key video frame may be, but is not limited to, calculating the motion vectors between the key point data and the target key point data, that is, the offsets between the target key point data in the key video frame and the key point data of the associated video frame.
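A minimal sketch of the motion offset computation, assuming the key points of the key video frame and of the associated video frame are listed in the same, fixed order:

```python
# Sketch: per-key-point (dx, dy) displacement between the associated frame's key
# point data and the corresponding target key points in the key video frame.
from typing import List, Tuple

Keypoints = List[Tuple[float, float]]

def motion_offsets(key_frame_kps: Keypoints, associated_kps: Keypoints
                   ) -> List[Tuple[float, float]]:
    """Motion offset parameters (motion vectors) for one associated video frame."""
    return [(ax - kx, ay - ky)
            for (kx, ky), (ax, ay) in zip(key_frame_kps, associated_kps)]

key_kps = [(400.0, 300.0), (880.0, 300.0)]
assoc_kps = [(405.5, 298.0), (884.0, 301.5)]
print(motion_offsets(key_kps, assoc_kps))  # -> [(5.5, -2.0), (4.0, 1.5)]
```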
Before the generative adversarial network is used, an initial generative adversarial network is trained with sample key frames, sample associated frames and the sample offset parameters of the sample associated frames, so as to obtain a generative adversarial network that reaches the model convergence condition. The model convergence condition may be, but is not limited to, that the similarity between a generated simulated associated frame and the corresponding sample associated frame is higher than a preset similarity threshold.
The key video frame, the key point data of the associated video frames and the motion offset parameters are input into the generative adversarial network, and the simulated associated video frames output by the generative adversarial network, whose similarity to the original associated video frames is higher than the similarity threshold, are obtained, so that the simulated associated video frames generated by the generative adversarial network are played at the decoding end in place of the associated video frames.
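The sketch below shows how the decoding end might feed the key video frame, each group of key point data and the motion offset parameters to the generator of an already-trained generative adversarial network. The generator interface is an assumption, since the patent does not fix a network architecture.

```python
# Sketch: reconstruct each associated video frame with an assumed GAN generator.
import numpy as np
from typing import Callable, List, Sequence, Tuple

Keypoints = List[Tuple[float, float]]
Offsets = List[Tuple[float, float]]
Generator = Callable[[np.ndarray, Keypoints, Offsets], np.ndarray]

def reconstruct_associated_frames(key_frame: np.ndarray,
                                  keypoint_sets: Sequence[Tuple[int, Keypoints]],
                                  key_frame_kps: Keypoints,
                                  generator: Generator) -> List[Tuple[int, np.ndarray]]:
    """Return (frame sequence offset, simulated associated video frame) pairs."""
    results = []
    for frame_offset, kps in keypoint_sets:
        # Motion offset parameters: per-key-point displacement from the key frame.
        offsets = [(ax - kx, ay - ky)
                   for (kx, ky), (ax, ay) in zip(key_frame_kps, kps)]
        simulated = generator(key_frame, kps, offsets)   # GAN image restoration generation
        results.append((frame_offset, simulated))
    return results

def identity_generator(key_frame: np.ndarray, kps: Keypoints, offs: Offsets) -> np.ndarray:
    return key_frame.copy()   # stand-in for the trained GAN generator

key = np.zeros((4, 4, 3), dtype=np.uint8)
out = reconstruct_associated_frames(key, [(1, [(1.0, 1.0)])], [(0.0, 0.0)], identity_generator)
print(out[0][0], out[0][1].shape)   # -> 1 (4, 4, 3)
```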
As an alternative embodiment, as shown in fig. 5, after the simulated associated video frames output by the generative adversarial network are obtained, the method further includes:
S502, acquiring the frame sequence relation corresponding to each simulated associated video frame, wherein the frame sequence relation indicates the relative frame order of the simulated associated video frame and the key video frame;
S504, determining the play sequence of the multi-frame simulated associated video frames and the key video frame according to the frame sequence relations;
S506, playing the key video frame and the multi-frame simulated associated video frames in sequence according to the play sequence.
After the decoding end has generated, through the generative adversarial network, the simulated associated video frame corresponding to each group of key point data, the multi-frame simulated associated video frames and the key video frame are sorted for playback according to the frame sequence relations corresponding to the key point data, so that the play sequence of the multi-frame simulated associated video frames and the key video frame is determined, and the key video frame and the multi-frame simulated associated video frames are played in sequence according to the play sequence.
After the decoding end has decoded and determined the play sequence of the multi-frame video frames carried by the key coding data, it plays the multi-frame video frames according to the play sequence, thereby completing the transmission and playback of the video stream.
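A short sketch of steps S502-S506 follows, assuming each simulated associated video frame carries the signed frame offset (its frame sequence relation) relative to the key video frame; the player interface is an assumption.

```python
# Sketch: order the key video frame and the simulated associated video frames by
# their frame sequence relation and hand them to the player in that order.
import numpy as np
from typing import Callable, List, Tuple

def play_in_order(key_frame: np.ndarray,
                  simulated: List[Tuple[int, np.ndarray]],
                  show: Callable[[np.ndarray], None]) -> None:
    # Offset 0 stands for the key video frame itself; negative offsets precede it,
    # positive offsets follow it.
    ordered = sorted(simulated + [(0, key_frame)], key=lambda item: item[0])
    for _offset, frame in ordered:
        show(frame)

frames = [(1, np.ones((2, 2))), (-1, np.zeros((2, 2)))]
play_in_order(np.full((2, 2), 0.5), frames, show=lambda f: print(f.mean()))
# prints 0.0, 0.5, 1.0
```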
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present invention is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present invention. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required for the present invention.
According to another aspect of the embodiments of the present invention, there is also provided a video data codec device for implementing the above-mentioned video data codec method. As shown in fig. 6, the apparatus includes:
a detection unit 602, configured to determine a key video frame in a current video stream to be encoded when it is detected that the current network condition does not reach the preset network condition;
an obtaining unit 604, configured to obtain a multi-frame associated video frame associated with a key video frame in a video stream to be encoded and key point data of each frame associated video frame in the multi-frame associated video frame;
The adding unit 606 is configured to add the groups of key point coding data corresponding to the multi-frame associated video frames to a preset field of the key coding data and construct transmission data, where the key coding data is data obtained by data-encoding the key video frame, the key point coding data comprises the coded key point data of the associated video frames, and the transmission data is sent to a decoding end so that the decoding end obtains, based on the transmission data, the key video frame and the multi-frame associated video frames associated with it.
Optionally, the detecting unit 602 is further configured to determine a video frame currently located at the first position to be encoded in the video stream to be encoded as a key video frame; or taking the video frames including all preset characteristic key points in the video stream to be coded as key video frames.
Optionally, the obtaining unit 604 is further configured to perform feature key point identification on each frame of associated video frame in turn by using an identification algorithm, where the feature key point is used to identify a target object in the video stream to be encoded; and taking the data of the identified characteristic key points as the key point data of the associated video frames.
Optionally, the video data encoding and decoding apparatus further includes an encoding unit, configured to sequentially determine the frame sequence relation between each associated video frame and the key video frame before the groups of key point coding data corresponding to the multi-frame associated video frames are added to the preset field of the key coding data; to data-encode the frame sequence relation and the key point data to obtain the key point coding data; and to data-encode the key video frame to obtain the key coding data.
Optionally, the adding unit 606 is further configured to sequentially add the key point coding data of each associated video frame to the preset field of the key coding data, and to take the key coding data to which the key point coding data of all associated video frames has been added as the transmission data.
According to another aspect of the embodiments of the present invention, there is also provided a video data codec device for implementing the above-mentioned video data codec method. As shown in fig. 7, the apparatus includes:
A determining unit 702, configured to determine, when receiving transmission data sent by an encoding end, key encoding data included in the transmission data and a plurality of groups of key point encoding data located in a preset field of the key encoding data;
The decoding unit 704 is configured to decode the key coding data and the groups of key point coding data to obtain a key video frame and the key point data of the multi-frame associated video frames associated with the key video frame, wherein the key point data is the data corresponding to the target key points of the key video frame in an associated video frame;
The generation unit 706 is configured to input the key point data and the key video frame into a generative adversarial network and obtain the simulated associated video frames output by the generative adversarial network, wherein the generative adversarial network is trained to perform image restoration generation on the key point data by using the key video frame.
Optionally, the video data encoding and decoding apparatus further includes a calculation unit, configured to calculate the motion offset parameters between the key point data and the corresponding target key point data in the key video frame before the key point data and the key video frame are input into the generative adversarial network;
the generation unit 706 is further configured to sequentially input the key video frame, the key point data of each associated video frame and the motion offset parameters into the generative adversarial network, so as to obtain the simulated associated video frames output by the generative adversarial network.
Optionally, the video data encoding and decoding apparatus further includes a playing unit, configured to acquire, after the simulated associated video frames output by the generative adversarial network are obtained, the frame sequence relation corresponding to each simulated associated video frame, the frame sequence relation indicating the relative frame order of the simulated associated video frame and the key video frame; to determine the play sequence of the multi-frame simulated associated video frames and the key video frame according to the frame sequence relations; and to play the key video frame and the multi-frame simulated associated video frames in sequence according to the play sequence.
In the embodiments of the application, when the encoding end detects that the current network condition does not reach the preset network condition, a key video frame in the current video stream to be encoded is determined, and the multi-frame associated video frames associated with the key video frame in the video stream to be encoded, together with the key point data of each of those associated video frames, are acquired. The groups of key point coding data corresponding to the associated video frames are added to a preset field of the key coding data obtained by data-encoding the key video frame, so as to construct transmission data, and the transmission data is sent to the decoding end, so that on receiving the transmission data the decoding end decodes it to obtain the key video frame and the associated video frames associated with it. Because only the key video frames determined from the video stream to be encoded are fully encoded, while for the associated video frames only the key point data is encoded and added to the preset field of the key coding data, the number of fully encoded frames and the amount of coded data to be transmitted are both reduced. This achieves the technical effect of encoding and decoding the video stream with a small data volume, and solves the technical problem of video picture loss caused by a mismatch between the video data volume and the network conditions.
According to still another aspect of the embodiment of the present invention, there is also provided an electronic device for implementing the above-mentioned encoding and decoding method of video data, which may be a terminal device or a server as shown in fig. 1. The present embodiment is described taking the electronic device as an encoding end and a decoding end as an example. As shown in fig. 8, the electronic device comprises a memory 802 and a processor 804, the memory 802 having stored therein a computer program, the processor 804 being arranged to perform the steps of any of the method embodiments described above by means of the computer program.
Alternatively, in this embodiment, the electronic device may be located in at least one network device of a plurality of network devices of the computer network.
Alternatively, in this embodiment, the above processor may be configured to execute the following steps through the computer program:
S1, determining a key video frame in a current video stream to be encoded when it is detected that the current network condition does not reach a preset network condition;
S2, acquiring multi-frame associated video frames associated with the key video frame in the video stream to be encoded and the key point data of each of the associated video frames;
S3, adding the groups of key point coding data corresponding to the multi-frame associated video frames to a preset field of the key coding data and constructing transmission data, wherein the key coding data is data obtained by data-encoding the key video frame, the key point coding data comprises the coded key point data of the associated video frames, and the transmission data is sent to a decoding end so that the decoding end obtains, based on the transmission data, the key video frame and the multi-frame associated video frames associated with it.
Alternatively, in this embodiment, the above processor may be configured to execute the following steps through the computer program:
S1, determining, when transmission data sent by an encoding end is received, the key coding data contained in the transmission data and the groups of key point coding data located in a preset field of the key coding data;
S2, decoding the key coding data and the groups of key point coding data to obtain a key video frame and the key point data of the multi-frame associated video frames associated with the key video frame, wherein the key point data is the data corresponding to the target key points of the key video frame in an associated video frame;
S3, inputting the key point data and the key video frame into a generative adversarial network, and obtaining the simulated associated video frames output by the generative adversarial network, wherein the generative adversarial network is trained to perform image restoration generation on the key point data by using the key video frame.
Alternatively, it will be understood by those skilled in the art that the structure shown in fig. 8 is only schematic, and the electronic device may also be a terminal device such as a smart phone (e.g. an Android phone or an iOS phone), a tablet computer, a palmtop computer, a mobile internet device (MID), a PAD, and the like. Fig. 8 does not limit the structure of the above electronic device. For example, the electronic device may also include more or fewer components (e.g. a network interface) than shown in fig. 8, or have a different configuration from that shown in fig. 8.
The memory 802 may be used to store software programs and modules, such as program instructions/modules corresponding to the method and apparatus for encoding and decoding video data in the embodiment of the present invention, and the processor 804 executes the software programs and modules stored in the memory 802, thereby executing various functional applications and data processing, that is, implementing the method for encoding and decoding video data described above. Memory 802 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, memory 802 may further include memory remotely located relative to processor 804, which may be connected to the terminal via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof. The memory 802 may be used to store, but is not limited to, information such as video streams, key video frames, key point data, etc. As an example, as shown in fig. 8, the memory 802 may include, but is not limited to, a detection unit 602, an acquisition unit 604, and an addition unit 606 in a codec device including the video data. In addition, other module units in the above-mentioned encoding and decoding device of video data may be included, but are not limited to, and are not described in detail in this example.
Optionally, the transmission device 806 is used to receive or transmit data via a network. Specific examples of the network described above may include wired networks and wireless networks. In one example, the transmission means 806 includes a network adapter (Network Interface Controller, NIC) that can connect to other network devices and routers via a network cable to communicate with the internet or a local area network. In one example, the transmission device 806 is a Radio Frequency (RF) module for communicating wirelessly with the internet.
In addition, the electronic device further includes: a display 808 for displaying the video stream; and a connection bus 810 for connecting the respective module parts in the above-described electronic device.
In other embodiments, the terminal device or the server may be a node in a distributed system, where the distributed system may be a blockchain system, and the blockchain system may be a distributed system formed by connecting the plurality of nodes through a network communication. Among them, the nodes may form a Peer-To-Peer (P2P) network, and any type of computing device, such as a server, a terminal, etc., may become a node in the blockchain system by joining the Peer-To-Peer network.
According to one aspect of the present application, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The computer instructions are read from the computer-readable storage medium by a processor of a computer device, and executed by the processor, cause the computer device to perform the methods provided in various alternative implementations of the above-described encoding and decoding aspects of video data. Wherein the computer program is arranged to perform the steps of any of the method embodiments described above when run.
Alternatively, in the present embodiment, the above-described computer-readable storage medium may be configured to store a computer program for executing the steps of:
S1, determining a key video frame in a current video stream to be encoded under the condition that an encoding end detects that a current network condition does not reach a preset network condition;
S2, acquiring multi-frame associated video frames associated with the key video frame in the video stream to be encoded and key point data of each frame of associated video frame in the multi-frame associated video frames;
S3, adding a plurality of groups of key point coding data corresponding to the multi-frame associated video frames into preset fields of the key coding data respectively, and constructing transmission data, wherein the key coding data are data obtained by carrying out data coding on the key video frames, the key point coding data comprise coding data of key point data of the associated video frames, and the transmission data are used for being sent to a decoding end, so that the decoding end obtains the key video frames and the multi-frame associated video frames associated with the key video frames based on the transmission data.
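As an illustration of the encoding-side steps S1 to S3 above, the following is a minimal Python sketch of the flow. It is only an assumed outline: the helpers detect_keypoints and encode_frame are hypothetical stand-ins for a feature key point detector and a conventional intra-frame encoder, and the transmission data is simplified to a plain dictionary whose "keypoint_encoded_data" entry plays the role of the preset field of the key encoded data.

```python
import json
import zlib

def detect_keypoints(frame):
    """Hypothetical feature key point detector (e.g. landmarks of the target object).
    Returns a list of (x, y) coordinates."""
    raise NotImplementedError  # assumed to be provided by an existing model

def encode_frame(frame):
    """Hypothetical intra-frame encoder for the key video frame."""
    raise NotImplementedError

def build_transmission_data(frames, association_range=8):
    """Sketch of encoding steps S1-S3 under a poor network condition."""
    # S1: key video frame = the frame with the largest number of feature key points
    keypoints_per_frame = [detect_keypoints(f) for f in frames]
    key_index = max(range(len(frames)), key=lambda i: len(keypoints_per_frame[i]))
    key_encoded = encode_frame(frames[key_index])  # key encoded data

    # S2: associated frames are limited to a preset association range around the key frame
    keypoint_field = []
    for idx, kps in enumerate(keypoints_per_frame):
        if idx == key_index or abs(idx - key_index) > association_range:
            continue
        keypoint_field.append({
            "frame_order": idx - key_index,  # frame sequence relation to the key frame
            "keypoints": kps,                # key point data of the associated frame
        })

    # S3: place the encoded key point data in a preset field of the key encoded data
    return {
        "key_encoded_data": key_encoded,
        "keypoint_encoded_data": zlib.compress(json.dumps(keypoint_field).encode()),
    }
```

Under this sketch, only the key video frame is encoded as full image data, while every associated frame contributes just a small set of coordinates, which is what keeps the transmission data small when the network condition is poor.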
Alternatively, in the present embodiment, the above-described computer-readable storage medium may be configured to store a computer program for executing the steps of:
S1, under the condition that transmission data sent by an encoding end are received, key encoding data contained in the transmission data and a plurality of groups of key point encoding data positioned in a preset field of the key encoding data are determined;
S2, decoding key coding data and a plurality of groups of key point coding data to obtain key point data of a key video frame and a plurality of frames of associated video frames associated with the key video frame, wherein the key point data is data corresponding to a target key point of the key video frame in the associated video frame;
S3, inputting the key point data and the key video frames into a generative adversarial network, and obtaining simulated associated video frames output by the generative adversarial network, wherein the generative adversarial network is trained to perform image restoration generation on the key point data by utilizing the key video frames.
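Correspondingly, the decoding-side steps S1 to S3 can be sketched as follows, again only as an assumed outline rather than the exact implementation. The decode_frame helper is the hypothetical inverse of the encoder above, and the generative adversarial network is represented by a generator callable that takes the decoded key video frame together with one set of key point data and returns a reconstructed (simulated) associated frame.

```python
import json
import zlib

def decode_frame(key_encoded_data):
    """Hypothetical intra-frame decoder for the key encoded data."""
    raise NotImplementedError

def reconstruct_from_transmission(transmission_data, generator):
    """Sketch of decoding steps S1-S3."""
    # S1: split the transmission data into key encoded data and the key point
    #     encoded data held in its preset field
    key_encoded = transmission_data["key_encoded_data"]
    keypoint_field = json.loads(
        zlib.decompress(transmission_data["keypoint_encoded_data"]).decode()
    )

    # S2: decode the key video frame and the key point data of each associated frame
    key_frame = decode_frame(key_encoded)

    # S3: image restoration generation with the trained GAN generator
    simulated = []
    for entry in keypoint_field:
        frame = generator(key_frame, entry["keypoints"])
        simulated.append((entry["frame_order"], frame))

    # order the key frame (relation 0) and the simulated associated frames for playback
    playback = sorted(simulated + [(0, key_frame)], key=lambda item: item[0])
    return [frame for _, frame in playback]
```

A caller would feed the returned frame list straight to the player, so the video to be played is rebuilt from one fully encoded key frame plus the per-frame key point data.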
Alternatively, in this embodiment, it will be understood by those skilled in the art that all or part of the steps in the methods of the above embodiments may be performed by a program instructing the relevant hardware of a terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: a flash disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or the like.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
If the integrated units in the above embodiments are implemented in the form of software functional units and sold or used as independent products, they may be stored in the above computer-readable storage medium. Based on such understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, which includes several instructions for causing one or more computer devices (which may be personal computers, servers, network devices, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention.
In the foregoing embodiments of the present invention, each embodiment is described with its own emphasis; for any part not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided by the present application, it should be understood that the disclosed client may be implemented in other manners. The apparatus embodiments described above are merely exemplary; for example, the division of the units is merely a logical function division, and there may be other division manners in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling or direct coupling or communication connection shown or discussed between components may be through some interfaces, units, or modules, and may be in electrical or other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The foregoing is merely a preferred embodiment of the present invention, and it should be noted that various modifications and improvements may be made by those skilled in the art without departing from the principles of the present invention, and such modifications and improvements shall also fall within the scope of the present invention.

Claims (12)

1. A method of encoding and decoding video data, comprising:
Under the condition that the current network condition does not reach the preset network condition, determining a first number of key video frames from the current video stream to be encoded, wherein the key video frames are the video frames with the largest number of characteristic key points in the current video stream to be encoded, the characteristic key points are used for identifying target objects in the video stream to be encoded, and the first number is positively correlated with the number of video frames included in the current video stream to be encoded;
acquiring a plurality of frames of associated video frames associated with the key video frames in the video stream to be encoded and key point data of each frame of associated video frame in the plurality of frames of associated video frames, wherein each key video frame is associated with a second number of associated video frames, and the second number is obtained by limiting a number range according to a preset associated range;
and adding a plurality of groups of key point coding data corresponding to the multi-frame associated video frames into preset fields of key coding data respectively to construct transmission data, wherein the key coding data are data obtained by data coding the key video frames, the key point coding data comprise coding data of the key point data of the associated video frames, and the transmission data are used for being sent to a decoding end so that the decoding end obtains the key video frames and the multi-frame associated video frames associated with the key video frames based on the transmission data.
2. The method according to claim 1, wherein after detecting that the current network condition does not reach the preset network condition, the method further comprises:
determining a video frame currently positioned at the first position to be encoded in the video stream to be encoded as the key video frame; or
taking the video frames including all preset characteristic key points in the video stream to be coded as the key video frames.
3. The method of claim 2, wherein the obtaining the multi-frame associated video frames associated with the key video frames in the video stream to be encoded and the key point data for each of the multi-frame associated video frames comprises:
Performing characteristic key point identification on each frame of the associated video frames in turn by using an identification algorithm, wherein the characteristic key points are used for identifying the target object in the video stream to be encoded;
and taking the identified characteristic key point data as the key point data of the associated video frame.
4. The method of claim 1, further comprising, before adding the plurality of sets of key point encoded data corresponding to the respective multi-frame associated video frames to a preset field of key encoded data:
sequentially determining the frame sequence relation between each frame of associated video frame and the key video frame;
Performing data coding on the frame sequence relation and the key point data to obtain key point coding data;
And carrying out data coding on the key video frames to obtain the key coding data.
5. The method of claim 4, wherein adding the plurality of sets of key point encoded data corresponding to the respective multi-frame associated video frames to the preset field of the key encoded data to construct the transmission data comprises:
Sequentially adding the key point coding data of each frame of the associated video frame into a preset field of the key coding data;
And taking the key coded data added with the key point coded data of all the associated video frames as the transmission data.
6. A method of encoding and decoding video data, comprising:
Under the condition that transmission data sent by an encoding end are received, key encoding data included in the transmission data and a plurality of groups of key point encoding data positioned in a preset field of the key encoding data are determined;
Decoding the key encoding data and the plurality of groups of key point encoding data to obtain a first number of key video frames and key point data of multi-frame associated video frames associated with the key video frames, wherein the key point data are data corresponding to target key points of the key video frames in the associated video frames, the key video frames are video frames with the largest number of characteristic key points in the transmission data, the characteristic key points are used for identifying target objects in the transmission data, the first number is positively correlated with the number of video frames included in the transmission data, each key video frame is associated with a second number of associated video frames, and the second number is obtained by limiting a number range according to a preset association range;
Inputting the key point data and the key video frames into a generative adversarial network, and acquiring simulated associated video frames which are output by the generative adversarial network and correspond to the associated video frames, wherein the generative adversarial network is trained to perform image restoration generation on the key point data by utilizing the key video frames, and the simulated associated video frames and the key video frames are used for constructing a video to be played.
7. The method according to claim 6, wherein:
Before inputting the key point data and the key video frames into the generative adversarial network, the method further comprises: calculating a motion offset parameter between the key point data and corresponding target key point data in the key video frames;
The inputting the key point data and the key video frames into the generative adversarial network comprises: inputting the key video frames, the key point data of the associated video frames, and the motion offset parameters into the generative adversarial network in turn, so as to acquire the simulated associated video frames output by the generative adversarial network.
8. The method of claim 6, further comprising, after obtaining the simulated associated video frames output by the generative adversarial network:
acquiring a frame sequence relation corresponding to the simulated associated video frames, wherein the frame sequence relation is used for indicating the relative frame order of the simulated associated video frames and the key video frames;
determining the playing sequence of the multi-frame simulated associated video frames and the key video frames according to the frame sequence relation;
and sequentially playing the key video frames and the multi-frame simulated associated video frames according to the playing sequence.
9. A video data encoding and decoding apparatus applied to an encoding end, comprising:
the detection unit is used for determining a first number of key video frames from a current video stream to be encoded under the condition that the current network condition does not reach the preset network condition, wherein the key video frames are the video frames with the largest number of characteristic key points in the current video stream to be encoded, the characteristic key points are used for identifying target objects in the video stream to be encoded, and the first number is positively correlated with the number of video frames included in the current video stream to be encoded;
The obtaining unit is used for obtaining multi-frame associated video frames associated with the key video frames in the video stream to be encoded and key point data of each frame of associated video frame in the multi-frame associated video frames, wherein each key video frame is associated with a second number of associated video frames, and the second number is obtained by limiting a number range according to a preset associated range;
The adding unit is used for adding a plurality of groups of key point coding data corresponding to the multi-frame associated video frames into preset fields of key coding data respectively to construct transmission data, wherein the key coding data are data obtained by data coding the key video frames, the key point coding data comprise coding data of the key point data of the associated video frames, and the transmission data are used for being sent to a decoding end so that the decoding end can obtain the key video frames and the multi-frame associated video frames associated with the key video frames based on the transmission data.
10. A video data encoding and decoding apparatus applied to a decoding end, comprising:
The determining unit is used for determining key coding data included in the transmission data and a plurality of groups of key point coding data positioned in a preset field of the key coding data under the condition that the transmission data sent by the coding end are received;
The decoding unit is used for decoding the key coding data and the plurality of groups of key point coding data to obtain a first number of key video frames and key point data of multi-frame associated video frames associated with the key video frames, wherein the key point data are data corresponding to target key points of the key video frames in the associated video frames, the key video frames are video frames with the largest number of characteristic key points in the transmission data, the characteristic key points are used for identifying target objects in the transmission data, the first number is positively correlated with the number of video frames included in the transmission data, each key video frame is associated with a second number of associated video frames, and the second number is obtained by limiting a number range according to a preset association range;
the generating unit is used for inputting the key point data and the key video frames into a generative adversarial network, and acquiring simulated associated video frames which are output by the generative adversarial network and correspond to the associated video frames, wherein the generative adversarial network is trained to perform image restoration generation on the key point data by utilizing the key video frames, and the simulated associated video frames and the key video frames are used for constructing a video to be played.
11. A computer readable storage medium, characterized in that the computer readable storage medium comprises a stored program, wherein the program when run performs the method of any one of claims 1 to 8.
12. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to execute the method according to any of the claims 1 to 8 by means of the computer program.
CN202210095199.5A 2022-01-26 2022-01-26 Video data encoding and decoding method and device, storage medium and electronic equipment Active CN114466224B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210095199.5A CN114466224B (en) 2022-01-26 2022-01-26 Video data encoding and decoding method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210095199.5A CN114466224B (en) 2022-01-26 2022-01-26 Video data encoding and decoding method and device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN114466224A CN114466224A (en) 2022-05-10
CN114466224B (en) 2024-04-16

Family

ID=81412536

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210095199.5A Active CN114466224B (en) 2022-01-26 2022-01-26 Video data encoding and decoding method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN114466224B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115134629B (en) * 2022-05-23 2023-10-31 阿里巴巴(中国)有限公司 Video transmission method, system, equipment and storage medium

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005079075A1 (en) * 2004-02-16 2005-08-25 Picaso Info Communication Co., Ltd. A method for encoding of decoding video data and an apparatus thereof
CN103428483A (en) * 2012-05-16 2013-12-04 华为技术有限公司 Media data processing method and device
CN108377399A (en) * 2018-03-07 2018-08-07 广州图普网络科技有限公司 Live video stream code-transferring method, device and computer readable storage medium
CN110324708A (en) * 2019-07-16 2019-10-11 浙江大华技术股份有限公司 Method for processing video frequency, terminal device and computer storage medium
CN110366048A (en) * 2019-07-19 2019-10-22 Oppo广东移动通信有限公司 Video transmission method, device, electronic equipment and computer readable storage medium
CN110830819A (en) * 2019-11-19 2020-02-21 聚好看科技股份有限公司 Encoding method, decoding method, encoding end and decoding end
CN110868600A (en) * 2019-11-11 2020-03-06 腾讯云计算(北京)有限责任公司 Target tracking video plug-flow method, display method, device and storage medium
CN111147880A (en) * 2019-12-30 2020-05-12 广州华多网络科技有限公司 Interaction method, device and system for live video, electronic equipment and storage medium
CN111212288A (en) * 2020-01-09 2020-05-29 广州虎牙科技有限公司 Video data encoding and decoding method and device, computer equipment and storage medium
CN111479112A (en) * 2020-06-23 2020-07-31 腾讯科技(深圳)有限公司 Video coding method, device, equipment and storage medium
CN112165653A (en) * 2020-09-28 2021-01-01 中国建设银行股份有限公司 Video playing method, device and equipment
CN113066497A (en) * 2021-03-18 2021-07-02 Oppo广东移动通信有限公司 Data processing method, device, system, electronic equipment and readable storage medium
US11089334B1 (en) * 2020-08-26 2021-08-10 Tata Consultancy Services Limited Methods and systems for maintaining quality of experience in real-time live video streaming
CN113347421A (en) * 2021-06-02 2021-09-03 黑芝麻智能科技(上海)有限公司 Video encoding and decoding method, device and computer equipment
CN113365066A (en) * 2021-06-29 2021-09-07 北京二六三企业通信有限公司 Video data transmission method and device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103327361B (en) * 2012-11-22 2018-09-04 南京中兴软件有限责任公司 Acquisition methods, the apparatus and system of real-time video communication playback data stream
WO2018072675A1 (en) * 2016-10-18 2018-04-26 Zhejiang Dahua Technology Co., Ltd. Methods and systems for video processing

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Content-aware progressive streaming transmission for video streaming media; Liang Yongsheng; Chen Xu; Liu Wei; Zhang Jihong; Computer Engineering; 2010-12-20 (24); full text *

Also Published As

Publication number Publication date
CN114466224A (en) 2022-05-10

Similar Documents

Publication Publication Date Title
CN113038287B (en) Method and device for realizing multi-user video live broadcast service and computer equipment
CN101273635A (en) Apparatus and method for encoding and decoding multi-view picture using camera parameter, and recording medium storing program for executing the method
CN111479162B (en) Live data transmission method and device and computer readable storage medium
US11968379B2 (en) Video decoding method and apparatus, video encoding method and apparatus, storage medium, and electronic device
CN113098946B (en) Cloud desktop scene identification method and device, storage medium and electronic device
CN110636294A (en) Video decoding method and device, and video encoding method and device
CN110740380A (en) Video processing method and device, storage medium and electronic device
CN110519607B (en) Video decoding method and device, and video encoding method and device
CN114466224B (en) Video data encoding and decoding method and device, storage medium and electronic equipment
CN112468816B (en) Method for establishing fixed code rate coefficient prediction model and video coding
CN105245514B (en) Plug-in recognition methods, apparatus and system
CN106937127B (en) Display method and system for intelligent search preparation
CN114866827B (en) Audio and video synchronization detection method and device, storage medium and electronic equipment
CN108848106A (en) Customized data method, device and readable storage medium storing program for executing are transmitted by audio stream
CN104469400A (en) Image data compression method based on RFB protocol
CN110545431B (en) Video decoding method and device, video encoding method and device
CN110677692B (en) Video decoding method and device and video encoding method and device
CN112351276B (en) Video encoding method and device and video decoding method and device
CN112650596B (en) Cross-process sharing method, device and equipment for target data and storage medium
US11281422B2 (en) Video data display method and device
US20210058616A1 (en) Systems and Methods for Selective Transmission of Media Content
CN116170115B (en) Digital fountain coding and decoding method, device and system based on codebook
CN106534137B (en) Media stream transmission method and device
CN112351277B (en) Video encoding method and device and video decoding method and device
CN114501015B (en) Video coding rate processing method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant