CN117082249A - Video encoding method, video decoding method, encoder, decoder, and medium - Google Patents

Video encoding method, video decoding method, encoder, decoder, and medium

Info

Publication number
CN117082249A
Authority
CN
China
Prior art keywords
video
knowledge image
frame
frames
code stream
Prior art date
Legal status
Pending
Application number
CN202310884518.5A
Other languages
Chinese (zh)
Inventor
张雪
林聚财
江东
方诚
殷俊
Current Assignee
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202310884518.5A
Publication of CN117082249A
Legal status: Pending


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/172Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51Motion estimation or motion compensation
    • H04N19/513Processing of motion vectors
    • H04N19/517Processing of motion vectors by encoding
    • H04N19/52Processing of motion vectors by encoding by predictive encoding

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The application provides a knowledge image-based video encoding method and video decoding method, a video encoder, a video decoder, and a computer storage medium. The video encoding method includes: encoding a plurality of video frames and the knowledge image frames referenced by those video frames to obtain a first video code stream, where the first video code stream includes a video coding layer and a network abstraction layer; acquiring all slices of each knowledge image frame based on the network abstraction layer; placing all slices of each knowledge image frame at the same position in the first video code stream; and generating a second video code stream from the plurality of video frames and the repositioned knowledge image frames, and storing the second video code stream. By collecting the knowledge image slice streams together, the video encoding method enables fast local playback and fast play, reduces parsing cost, and lowers delay.

Description

Video encoding method, video decoding method, encoder, decoder, and medium
Technical Field
The present application relates to the field of video encoding and decoding technologies, and in particular to a knowledge image-based video encoding method and video decoding method, a video encoder, a video decoder, and a computer storage medium.
Background
In video encoding and decoding, in order to increase the compression rate and reduce the code words to be transmitted, an encoder does not encode and transmit pixel values directly. Instead, it uses intra-frame or inter-frame prediction, predicting the pixel values of the current block from the reconstructed pixels of already-encoded blocks of the current frame or of a reference frame. A pixel value predicted with a given prediction mode is called a predicted pixel value, and the difference between the predicted pixel value and the original pixel value is called the residual. The encoder only needs to encode the chosen prediction mode and the residual it produces, and the decoding end can recover the corresponding pixel values from the code stream information. This greatly reduces the code words required for encoding.
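As a minimal numeric sketch of this idea (illustrative sample values only, not tied to the SVAC3 syntax): the encoder codes only the residual, and the decoder reconstructs the pixels by re-running the same prediction and adding the residual back.

```python
# Minimal sketch of predictive coding: only the residual is transmitted.
# Values are illustrative 8-bit luma samples, not from any real codec.

original  = [52, 55, 61, 66]   # original pixel values of the current block
predicted = [50, 54, 60, 68]   # predicted values from a reference/neighboring block

# Encoder side: residual = original - predicted (this is what gets coded)
residual = [o - p for o, p in zip(original, predicted)]
print(residual)                 # [2, 1, 1, -2] -- small values, cheap to entropy-code

# Decoder side: reconstruct by re-running the same prediction and adding the residual
reconstructed = [p + r for p, r in zip(predicted, residual)]
assert reconstructed == original
```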
The existing SVAC3 video codec standard introduces the concept of the knowledge image (library picture); in what follows, L frames denote knowledge images. A knowledge image is a long-term reference frame encoded as an I frame; it is used only as a reference frame and is never displayed. A knowledge image is identified by its knowledge image index IDX rather than by the POC or DOI used by the other frames in the code stream.
However, in the prior art the knowledge image stream is stored directly in the form in which it was encoded, which causes the following problem for local play services: in scenarios such as playback and fast play, all knowledge image slices can only be gathered by parsing the code stream multiple times before decoding can start; the parsing cost grows with the number of knowledge image slices, and playback delay suffers accordingly.
Disclosure of Invention
To solve the above technical problem, the application provides a knowledge image-based video encoding method, a knowledge image-based video decoding method, a video encoder, a video decoder, and a computer storage medium.
To solve the above technical problem, the application provides a knowledge image-based video encoding method that includes the following steps:
encoding a plurality of video frames and the knowledge image frames referenced by those video frames to obtain a first video code stream, where the first video code stream includes a video coding layer and a network abstraction layer;
acquiring all slices of each knowledge image frame based on the network abstraction layer;
placing all slices of each knowledge image frame at the same position in the first video code stream;
and generating a second video code stream from the plurality of video frames and the repositioned knowledge image frames, and storing the second video code stream.
Wherein placing all slices of each knowledge image frame at the same position in the first video code stream includes:
placing all slices of each knowledge image frame at the same position in the first video code stream, before any video frame that references the knowledge image frame.
Wherein placing all slices of each knowledge image frame at the same position in the first video code stream includes:
placing all slices of each knowledge image frame at the position of the original first slice in the first video code stream.
Wherein placing all slices of each knowledge image frame at the same position in the first video code stream includes:
placing all slices of each knowledge image frame at the position of the original last slice in the first video code stream.
Wherein the video encoding method further includes:
in response to a remote pull-stream playing request, converting the second video code stream into a third video code stream, where the third video code stream is either the same as the complete first video code stream or different from it;
and transmitting the third video code stream to a user terminal over a network.
Wherein the third video code stream consists of the target video frames corresponding to the remote pull-stream playing request and the knowledge image frames those target video frames reference in the first video code stream.
Wherein acquiring all slices of each knowledge image frame based on the network abstraction layer includes:
acquiring the slice network abstraction layer unit of each knowledge image frame slice;
acquiring, based on the slice network abstraction layer units, the target knowledge image frame slice that is a knowledge image end slice;
and determining all slices of each knowledge image frame from the target knowledge image frame slice.
Wherein acquiring all slices of each knowledge image frame based on the network abstraction layer includes:
acquiring the slice network abstraction layer unit of each knowledge image frame slice;
acquiring, based on the slice network abstraction layer units, the target knowledge image frame slices that are knowledge image coding delimiters;
determining all slices of each knowledge image frame from the target knowledge image frame slices;
where a knowledge image coding delimiter is the first knowledge image slice or the last knowledge image slice.
Wherein acquiring all slices of each knowledge image frame based on the network abstraction layer includes:
acquiring the picture parameter set shared by all slices of each knowledge image frame;
acquiring, from the picture parameter set, the syntax element that marks the number of slices of the knowledge image frame;
and acquiring all slices of each knowledge image frame according to the number of slices in the syntax element.
Wherein acquiring all slices of each knowledge image frame according to the number of slices in the syntax element includes:
calculating the number of slices of each knowledge image frame as the sum of the number of slices in the syntax element and a preset number;
and determining all slices of each knowledge image frame according to the number of slices of each knowledge image frame.
Wherein acquiring all slices of each knowledge image frame based on the network abstraction layer includes:
acquiring the picture parameter set shared by all slices of each knowledge image frame;
acquiring, from the picture parameter set, the syntax element that marks the total code stream length of the knowledge image frame;
and acquiring all slices of each knowledge image frame according to the total code stream length in the syntax element.
To solve the above technical problem, the application provides a knowledge image-based video decoding method that includes the following steps:
obtaining a video code stream, where the video code stream includes a video coding layer and a network abstraction layer;
acquiring all slices of each knowledge image frame based on the network abstraction layer, where all slices of each knowledge image frame are stored at the same position;
decoding the knowledge image frames in the video code stream according to all the slices of each knowledge image frame;
and decoding the other video frames in the video code stream according to the knowledge image frames.
In order to solve the technical problem, the application also provides a video encoder, which comprises a memory and a processor coupled with the memory;
wherein the memory is configured to store program data and the processor is configured to execute the program data to implement a video encoding method as described above.
In order to solve the above technical problems, the present application further provides a video decoder, which includes a memory and a processor coupled to the memory;
wherein the memory is configured to store program data and the processor is configured to execute the program data to implement a video decoding method as described above.
To solve the above technical problem, the present application further proposes a computer storage medium for storing program data which, when executed by a computer, implements the above video encoding method and/or video decoding method.
Compared with the prior art, the application has the following beneficial effects: a video encoder encodes a plurality of video frames and the knowledge image frames referenced by those video frames to obtain a first video code stream, where the first video code stream includes a video coding layer and a network abstraction layer; acquires all slices of each knowledge image frame based on the network abstraction layer; places all slices of each knowledge image frame at the same position in the first video code stream; and generates a second video code stream from the plurality of video frames and the repositioned knowledge image frames, and stores the second video code stream. By collecting the knowledge image slice streams together, this video encoding method enables fast local playback and fast play, reduces parsing cost, and lowers delay.
Drawings
In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings required for describing the embodiments are briefly introduced below. The drawings described below are obviously only some embodiments of the present application; a person of ordinary skill in the art may derive other drawings from them without inventive effort.
Wherein:
FIG. 1 is a diagram of an embodiment of frame references in a code stream comprising I frames and P frames according to the present application;
FIG. 2 is a schematic diagram of an embodiment of frame references in a code stream comprising I frames, P frames, and B frames according to the present application;
FIG. 3 is a schematic diagram of another embodiment of a frame reference relationship in a code stream comprising I frames and P frames provided by the present application;
FIG. 4 is a diagram of an embodiment of frame reference relationships and knowledge image positions in a code stream under an IPPP configuration in the prior art;
FIG. 5 is a diagram of another embodiment of frame reference relationships and knowledge image positions in a code stream under an IPPP configuration in the prior art;
FIG. 6 is a schematic diagram of a video codec storage related service scenario provided by the present application;
FIG. 7 is a flowchart of an embodiment of a video encoding method according to the present application;
FIG. 8 is a schematic flow chart of a specific service scenario of the video encoding method of the present application;
FIG. 9 is a diagram of an embodiment of frame reference relationships and knowledge image positions in a code stream according to the present application;
fig. 10 is a schematic diagram of an update process of an LDPB in an encoding process or a decoding process provided by the present application;
FIG. 11 is a schematic diagram of one embodiment of a knowledge image reorganization process at a storage location, in accordance with the present application;
FIG. 12 is a schematic diagram of another embodiment of a knowledge image reorganization process at a storage location, provided by the present application;
FIG. 13 is a schematic diagram of an embodiment of a knowledge image reorganization process for remote streaming playback provided by the present application;
FIG. 14 is a flowchart of an embodiment of a video decoding method according to the present application;
FIG. 15 is a schematic diagram of a video encoder according to an embodiment of the present application;
FIG. 16 is a schematic diagram of a video decoder according to an embodiment of the present application;
fig. 17 is a schematic structural diagram of an embodiment of a computer storage medium according to the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are obviously only some, not all, of the embodiments of the application. All other embodiments obtained by those skilled in the art based on the embodiments of the application without inventive effort fall within the scope of the application.
The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented, for example, in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Video image data volume is large, so video pixel data (RGB, YUV, etc.) usually needs to be compressed; the compressed data is called a video code stream, which is transmitted to the user terminal over a wired or wireless network and then decoded for viewing. The overall video encoding flow includes prediction, transform, quantization, entropy coding, and other processes.
In video coding, the most commonly used color encodings include YUV and RGB; the color encoding adopted in the present application is YUV. Y represents luminance, that is, the gray value of the image; U and V (i.e., Cb and Cr) represent chrominance and describe the color and saturation of the image. Each Y luminance block corresponds to one Cb and one Cr chrominance block, and each chrominance block corresponds to exactly one luminance block.
The video code stream is formed by continuous frames, and each frame is decoded and played in turn to form a video picture. Common frame types in existing video codec standards are I-frames, P-frames, and B-frames.
An I frame is an intra-coded frame: an independent frame that carries all the information needed for encoding and decoding and can be coded without reference to any other frame. An I frame must encode the complete content of the frame image, so it generally produces a larger code stream and has a lower compression rate.
A P frame is an inter-predicted frame that must reference a frame earlier in display order as its reference picture for encoding and decoding.
A B frame is a bidirectionally inter-predicted frame that must reference frames both earlier and later in display order as its reference frames for encoding and decoding.
FIG. 1 is a schematic diagram of frame reference relationships in a code stream containing I frames and P frames, and FIG. 2 is a schematic diagram of frame reference relationships in a code stream containing I frames, P frames, and B frames. In FIGS. 1 and 2, POC (pic_order_cnt) is the play order of the video frames and DOI (decode order index) is their encoding/decoding order. As FIG. 2 shows, when B frames are present in the bitstream, the encoding/decoding order and the play order of the frames may differ.
It should be noted that, there may be various free combinations of frame reference relationships of the video sequence, and fig. 1 and fig. 2 are only illustrative of one common reference relationship.
Random access is often required when playing video. For example, when a frame is lost during live streaming, the decoder cannot reconstruct the reference relationships between frames and cannot decode normally; it then looks for the next random access frame to restart decoding. Likewise, when video-on-demand playback is to start from a given moment, the decoder generally begins decoding from the random access frame preceding the frame at that moment.
To meet the random access requirement, the encoder periodically inserts a random access frame into the code stream. In the prior art, random access frames are always I frames. However, I frames produce a larger code stream and hit the bandwidth harder during transmission, so they should not be inserted too frequently; on the other hand, the larger the I frame insertion period, the longer the wait during random access. The encoder must trade these off and choose a suitable I frame period. One way to ease this trade-off is to insert, between random access frames, P frames or B frames that reference only the previous random access frame.
For ease of understanding, the present disclosure may refer to the frames with POC 0, 1, 2, ... as frames 0, 1, 2, ... in the following description. In FIG. 3, frame 0 is an I frame and frames 1, 2, 3 and 4 are P frames, with frame 3 directly referencing the I frame. If frame 1 or frame 2 is lost, frames 3 and 4 can still be decoded normally as long as the I frame is still in the decoder.
During on-demand playback, if playback is to start randomly at frame 3, only frames 0 and 3 need to be read and decoded. Under the reference pattern of FIG. 1, by contrast, frames 0, 1 and 2 would all have to be read and decoded before frame 3 could be decoded to obtain the image.
It can be seen that inserting P frames or B frames that reference only the previous random access frame reduces the reading and decoding overhead required for random access.
Specifically, an IDR frame is a random access frame. An IDR frame is a special I frame: besides providing random access, when an IDR frame is encountered during encoding or decoding, all buffered frames in the buffer are cleared. Frames encoded or decoded after the IDR therefore never reference frames from before the IDR.
The existing SVAC3 video codec standard introduces the concept of the knowledge image (library picture); in what follows, L frames denote knowledge images. It also introduces the concept of the RL (reference library) frame: an RL frame is a P frame or B frame that references only knowledge images. A knowledge image is a long-term reference frame encoded as an I frame; it is used only as a reference frame and is never displayed. A knowledge image is identified by its knowledge image index IDX rather than by the POC or DOI used by the other frames in the code stream.
On the structure of knowledge images in the code stream: in the prior art, a knowledge image is encoded as an I frame. Because an encoded knowledge image generally uses a smaller QP, encodes more slowly, and produces a higher code rate, inserting an entire knowledge image frame into the code stream at once causes a large code rate spike and jitter during decoding. The knowledge image is therefore split into multiple patches using the patch mechanism of the existing SVAC3 standard and encoded interleaved with multiple display images: only one patch is coded into the stream at a time, finally yielding an encoded output bitstream in which the knowledge base patch bitstream and the display image bitstream are interleaved. FIGS. 4 and 5 below give prior art examples of frame reference relationships and knowledge images in the code stream under an IPPP configuration.
Management of knowledge image buffer frames: in the prior art, when an image starts to reference a new knowledge image, or a new knowledge image starts to be encoded/decoded, the previous knowledge image is replaced. That is, all frames of the current sequence can reference at most one knowledge image, and it must be the most recently encoded/decoded one.
On knowledge image indexes: in the prior art, knowledge images support indexes 0-511, i.e. at most 512 knowledge image frames can be referenced by the frames of the whole sequence.
On the configuration of knowledge image frames (L frames) and RL frames: in the prior art, knowledge images are generated at the I frame interval, and all original I frames are encoded as RL frames in the form of P frames or B frames.
The reference picture list configuration set (RPL, Reference Picture Lists) records the reference relationships of each frame and is used to update the reference picture buffer, cleaning out frames that will no longer be referenced. In SVAC3, the syntax of the RPL is expressed as follows:
An RPL includes RPL0 and RPL1, recording the preceding and following reference frames respectively. The syntax below is that of one RPL0/RPL1. As can be seen, the RPL records the number of reference frames, num_of_ref_pic; marks, for each reference frame, whether it is a knowledge image via library_index_flag; records the index reference_library_picture_index for knowledge image references; and records abs_delta_doi for the other, non-knowledge-image reference frames, from which their index is derived. The following table gives the syntactic definition of the RPL and its meaning:
Each frame has a corresponding RPL, which contains all reference frames that the current frame and the frames after it will use; the reference frames actually used by the current frame are the first num_ref_default_active_minus1 + 1 entries, and num_ref_default_active_minus1, like the RPL itself, is signalled in the syntax. That is, the RPL of one frame includes:
(1) The number of reference frames that the current frame and the frames after it will use.
(2) A flag for each reference frame marking whether it is a knowledge image frame.
(3) The reference indexes of all reference frames.
Meanwhile, from the syntax element num_ref_default_active_minus1, each frame can determine that the first several entries among all the reference frames of the RPL are the frames the current frame itself references. A sketch of these structures follows.
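The following sketch models the RPL fields named above as plain data structures. The syntax element names are taken from the text; the classes and the helper function are hypothetical illustrations, not the normative SVAC3 parsing code.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class RplEntry:
    """One reference frame entry in RPL0/RPL1 (field names follow the text)."""
    library_index_flag: bool                   # True if this reference is a knowledge image
    reference_library_picture_index: int = 0   # IDX, used when library_index_flag is True
    abs_delta_doi: int = 0                     # DOI delta, used for non-knowledge references

@dataclass
class Rpl:
    entries: List[RplEntry] = field(default_factory=list)

    @property
    def num_of_ref_pic(self) -> int:
        return len(self.entries)

def active_references(rpl: Rpl, num_ref_default_active_minus1: int) -> List[RplEntry]:
    """The first num_ref_default_active_minus1 + 1 entries are the ones the
    current frame itself references; the rest are kept for later frames."""
    return rpl.entries[: num_ref_default_active_minus1 + 1]

rpl = Rpl([RplEntry(True, reference_library_picture_index=0),
           RplEntry(False, abs_delta_doi=1)])
print(rpl.num_of_ref_pic, len(active_references(rpl, 0)))  # 2 1
```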
A NAL unit in SVAC3 starts with a 3-byte (24-bit) prefix 0x000001, followed by the NAL header syntax: 1 bit forbidden_zero_bit, 1 bit nal_ref_idc, 4 bits nal_unit_type, 1 bit encryption_idc, and 1 bit authentication_idc. The NAL header syntax thus totals 1 byte (8 bits), so the start code plus NAL header totals 4 bytes. The NAL header syntax is as follows:
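A sketch of parsing this header layout, assuming an MSB-first bit order within the header byte; the bit order and the helper name are illustrative assumptions, not the normative SVAC3 definition.

```python
def parse_nal_header(data: bytes) -> dict:
    """Parse the 3-byte start code 0x000001 plus one NAL header byte,
    following the field widths listed above (bit order assumed MSB-first)."""
    if data[:3] != b"\x00\x00\x01":
        raise ValueError("missing NAL start code")
    b = data[3]
    return {
        "forbidden_zero_bit": (b >> 7) & 0x1,
        "nal_ref_idc":        (b >> 6) & 0x1,
        "nal_unit_type":      (b >> 2) & 0xF,   # 4 bits -> 16 possible values
        "encryption_idc":     (b >> 1) & 0x1,
        "authentication_idc":  b       & 0x1,
    }

# Example: header byte 0b0_1_0001_0_0 -> nal_ref_idc=1, nal_unit_type=1
print(parse_nal_header(b"\x00\x00\x01\x44"))
```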
the 4bit nal_unit_type may have 16 values, and the purposes of the 16 values at present are as follows:
the application provides a knowledge image stream reorganization method for the change of a knowledge image code stream organization form during transmission and storage, which particularly comprises a code stream specific form for reorganizing the knowledge image stream during storage and a reorganization method. A specific practical application scenario of the knowledge image stream reorganization method of the application is shown in the following figure 6, the image is collected and coded at the collection end, the coded code stream is transmitted and stored at the storage end through the network, and two service forms of local play and far-end stream pulling play are stored at the storage end.
Referring to fig. 7 and fig. 8, fig. 7 is a schematic flow chart of an embodiment of a video encoding method according to the present application, and fig. 8 is a schematic flow chart of a specific service scenario of the video encoding method according to the present application.
As shown in FIG. 7, the specific steps are as follows:
step S11: and encoding a plurality of frames of video frames and knowledge image frames referenced by the frames of video frames to obtain a first video code stream, wherein the first video code stream comprises a video encoding layer and a network abstraction layer.
In the embodiment of the application, a video encoder encodes a plurality of video frames and the knowledge image frames those video frames reference to obtain a first video code stream. The encoding process for the current video frame is as follows:
When the video encoder encodes the current video frame, it extracts the RPL information of the current video frame, i.e. a first reference picture list configuration set. The first reference picture list configuration set contains the reference frame indexes of the current video frame, which include knowledge image frame indexes and/or non-knowledge-image frame indexes.
The video encoder obtains the reference frame indexes of the current video frame from the first reference picture list configuration set.
In the embodiment of the present application, the RPL information describes the knowledge image frames and non-knowledge-image frames referenced by the current video frame. Referring to FIG. 9, the RPL information of P frame 1 in FIG. 9 records the index of RL frame 0 and the index of L frame 0. L frame 1 is generated from original frame 3 of the original sequence.
In the prior art, only one knowledge image frame can exist in the buffer at a time, and a video frame to be encoded can only reference the most recent knowledge image frame. Taking P frame 9 in FIG. 9 as an example, the prior art could only reference RL frame 8 and L frame 1, because only L frame 1 would still be buffered; L frame 0 would have been deleted.
In the application, an LDPB is used to manage multiple L frames, so several L frames can be buffered at the same time. As shown in FIG. 9, the reference frame indexes of P frame 9 include RL frame 8 and L frame 0.
The video encoder fetches the referenced knowledge image frames from the buffer according to the reference frame indexes and encodes the current video frame against them, obtaining the coded stream of the current video frame.
In the embodiment of the application, the video encoder fetches the referenced knowledge image frame from the buffer via the reference frame index and can encode the current video frame with that knowledge image frame as a reference frame, obtaining the video code stream of the current video frame.
In addition, if the reference frame indexes also include indexes of non-knowledge-image frames, i.e. of other P frames, B frames, or RL frames, the video encoder also fetches those referenced non-knowledge-image frames and encodes the current video frame using the indexed non-knowledge-image reference frames together with the knowledge image reference frames, obtaining the video code stream of the current video frame.
The L frame management mechanism mentioned above, called the LDPB, can manage multiple L frames: several L frames may exist at once during sequence encoding/decoding, to be selected and used by subsequently coded frames. The mechanism is implemented as follows:
First, a syntax element is added to specify the length of the LDPB, i.e. the maximum number of L frames that may exist in the buffer at the same time.
Second, each L frame in the LDPB carries a state mark, either "referenced" or "not referenced", indicating whether the frame is still referenced from the current encoded/decoded frame onward.
Referring specifically to fig. 10, fig. 10 is a schematic diagram illustrating an LDPB update process in an encoding process or a decoding process according to the present application. The method comprises the following steps:
1. Before encoding or decoding, the referenced state of all L frames in the LDPB is marked according to the RPL information: if a frame currently in the LDPB is not contained in the RPL of the current frame, it is marked "not referenced"; otherwise its state remains "referenced".
2. A frame marked "not referenced" in the LDPB will not be referenced by any subsequent frame, which means it can leave the LDPB. "Not referenced" frames are therefore cleared from the LDPB, while "referenced" frames are retained.
3. If the current frame is an L frame, then after this L frame is encoded or decoded, the reconstructed frame it produces is added to the LDPB and marked "referenced". A sketch of this update loop is given below.
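A minimal sketch of the three-step update above; the class and method names are hypothetical, and only the marking/eviction logic follows the text.

```python
class LDPB:
    def __init__(self, max_size: int):
        self.max_size = max_size          # signalled via the added syntax element
        self.frames = {}                  # idx -> {"frame": ..., "referenced": bool}

    def update_before_coding(self, rpl_library_indices: set):
        # Step 1: mark each buffered L frame according to the current frame's RPL
        for idx, entry in self.frames.items():
            entry["referenced"] = idx in rpl_library_indices
        # Step 2: evict frames that no subsequent frame will reference
        self.frames = {i: e for i, e in self.frames.items() if e["referenced"]}

    def add_after_coding(self, idx: int, reconstructed_frame):
        # Step 3: a newly coded L frame enters the buffer marked "referenced"
        assert len(self.frames) < self.max_size, "LDPB full"
        self.frames[idx] = {"frame": reconstructed_frame, "referenced": True}

# Usage: the current frame's RPL lists only knowledge image IDX 0,
# so L frame 0 is kept while the unreferenced L frame 1 is evicted.
ldpb = LDPB(max_size=4)
ldpb.add_after_coding(0, "L0-recon")
ldpb.add_after_coding(1, "L1-recon")
ldpb.update_before_coding({0})
print(sorted(ldpb.frames))           # [0]
```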
The following describes the LDPB management mechanism by way of a specific embodiment:
take the structure of the code stream in SVAC3 as an example. In the existing monitoring scene, when the dome camera is used for shooting images at multiple angles at fixed points, for example, a certain device shoots 2 fixed scenes in a circulating way, under the scene, the generated knowledge image of each scene for the first time can be always kept in a buffer area, and the subsequent repeated transmission of knowledge images of the same scene for multiple times is reduced. With continued reference to fig. 9, fig. 9 is a configuration in which the latter frame references the former knowledge image, e.g., the dome camera captures a first point location and generates knowledge image 0 at frame 0, captures a second point location and generates knowledge image 1 at frame 3, and switches back to the first point location at frame 8. At this time, the 0 th frame knowledge image may be directly referred to without retransmitting one knowledge image.
After encoding, the first video code stream includes a video coding layer and a network abstraction layer. The video coding layer is responsible for efficiently representing the video data content; the network abstraction layer is responsible for formatting that data and providing header information, ensuring it is suitable for transmission over various channels and storage media. By adding syntax, or an end mark, at the network abstraction layer, the video encoding method of the application can quickly signal the end of the slice-collection process and thus speed up the knowledge image reorganization process.
Step S12: all slices of each knowledge image frame are acquired based on the network abstraction layer.
In the embodiment of the present application, the methods a video encoder may use to quickly signal the end of the slice-collection process include, but are not limited to:
(1) Add a NAL type for an end slice, marking the current slice as the last knowledge image slice.
(2) Add a NAL type for a CRR slice, marking the current slice as the first or the last knowledge image slice.
(3) Add a syntax element in the picture header marking the total number of knowledge image slices.
(4) Add a syntax element in the picture header marking the total length of the knowledge image stream.
If a NAL type is added, i.e. methods (1) and (2) above, the specific NAL organization forms include, but are not limited to:
(1) Increase the total number of bytes of the NAL unit to N (N > 1) and assign a new value of the NAL type in the NAL unit to the type above. If the bits of all syntax elements in the enlarged NAL unit do not fill the N bytes, the remaining bits are set as reserved bits, whose possible uses include, but are not limited to: temporal scalability, spatial scalability, quality scalability, encryption/authentication, and the like.
(2) Use a reserved value among the existing NAL types and assign it to the type above.
(3) Reuse the value of one of the existing NAL types, reassigning it to the type above.
The four ending methods above are each described below through specific embodiments:
(1) Add a NAL type for an end slice, marking the current slice as the last knowledge image slice.
The following is a specific example based on the SVAC3 standard.
First, the total number of bytes of the NAL unit is extended to 2, and nal_unit_type in the NAL unit is extended to 5 bits, i.e. the value range of nal_unit_type becomes 0-31.
The specific values of nal_unit_type are as follows:
nal_unit_type | Content of the RBSP syntax structure in the NAL unit
0             | Reserved
1             | Coded patch of a non-IDR picture, patch_data_rbsp()
2             | Coded patch of an IDR picture, patch_data_rbsp()
3             | SVC enhancement layer coded patch of a non-IDR picture, patch_data_rbsp()
4             | SVC enhancement layer coded patch of an IDR picture, patch_data_rbsp()
5             | Surveillance extension data unit, surveillance_extension_rbsp()
6             | Supplemental enhancement information, sei_rbsp()
7             | Sequence parameter set, seq_parameter_set_rbsp()
8             | Picture parameter set, pic_parameter_set_rbsp()
9             | Security parameter set, sec_parameter_set_rbsp()
10            | Authentication data, authentication_data_rbsp()
11            | End of stream
12            | Knowledge image frame, patch_data_rbsp()
13            | Reserved for SVAC audio
14            | Knowledge image DRAP access point, RL frame, patch_data_rbsp()
15            | SVC enhancement layer picture parameter set, pic_parameter_set_rbsp()
16            | Knowledge image end slice, end_patch_data_rbsp()
17-31         | Reserved
Specifically, ending method (1) adds the syntax definition of the knowledge image end slice, end_patch_data_rbsp(), to nal_unit_type (value 16 above), thereby marking the current slice as the last knowledge image slice.
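Under this ending method, a decoder or the storage-side reorganizer can finish collecting in a single pass; a sketch, assuming the nal_unit_type values 12 and 16 from the table above and a hypothetical (type, payload) representation of the parsed NAL units:

```python
KNOWLEDGE_PATCH = 12       # knowledge image slice, patch_data_rbsp()
KNOWLEDGE_END_PATCH = 16   # end slice, end_patch_data_rbsp(), added above

def collect_knowledge_slices(nal_units):
    """Collect one knowledge image's slices from an interleaved NAL sequence,
    stopping as soon as the end-slice type is seen (hypothetical helper)."""
    slices = []
    for nal_type, payload in nal_units:
        if nal_type == KNOWLEDGE_PATCH:
            slices.append(payload)
        elif nal_type == KNOWLEDGE_END_PATCH:
            slices.append(payload)
            return slices          # single pass: no need to re-scan the stream
    raise ValueError("stream ended before the knowledge image end slice")

stream = [(1, b"P0"), (12, b"L-slice0"), (1, b"P1"), (16, b"L-slice1-last")]
print(collect_knowledge_slices(stream))  # [b'L-slice0', b'L-slice1-last']
```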
(2) Add a NAL type for a CRR end slice, marking the current slice as the first or the last knowledge image slice.
The following is a specific example based on the SVAC3 standard.
First, the total number of bytes of the NAL unit is extended to 2, and nal_unit_type in the NAL unit is extended to 5 bits, i.e. the value range of nal_unit_type becomes 0-31.
The specific values of nal_unit_type are as follows:
specifically, a syntax definition of a knowledge image coding delimiter slice_data_delimiter_rbsp () is added to the nal_unit_type of the ending method (2), thereby marking the current slice as the first knowledge image slice or the last knowledge image slice.
For example, for the first knowledge image slice and the last knowledge image slice, the nal_unit_type is patch_data_localizer_rbsp (), and for the other knowledge image slices, the nal_unit_type is patch_data_rbsp ().
(3) Add a num syntax element in the picture header, marking the total number of knowledge image slices.
The following is a specific example based on the SVAC3 standard:
Here library_patch_num_minus1 represents the number of knowledge image frame slices minus one, with a value range of 0-255.
In other embodiments, the knowledge image frame slice count may also be represented as library_patch_num_minusN, where N is a nonzero integer. The minus-N convention effectively reduces the amount of data stored in the syntax: for example, if a knowledge image frame has 10 slices, the value stored for library_patch_num_minus1 is 9. When decoding, the video decoder only needs to add 1 to the stored value to obtain the slice count of the knowledge image frame.
(4) Add a size syntax element in the picture header, marking the total length of the knowledge image stream.
The following is a specific example based on the SVAC3 standard:
Here library_picture_size represents the total length of the knowledge image stream, with a value range of 0 to 2^32 - 1.
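Ending methods (3) and (4) let the collector know in advance exactly when it is done; a sketch of both, with hypothetical helper names and the two syntax element names taken from the text:

```python
def collect_by_count(slices, library_patch_num_minus1):
    """Ending method (3): the picture header gives the slice count minus one."""
    total = library_patch_num_minus1 + 1
    collected = []
    for s in slices:
        collected.append(s)
        if len(collected) == total:
            return collected
    raise ValueError("fewer slices than the header announced")

def collect_by_length(slices, library_picture_size):
    """Ending method (4): the picture header gives the total stream length in bytes."""
    collected, accumulated = [], 0
    for s in slices:
        collected.append(s)
        accumulated += len(s)
        if accumulated >= library_picture_size:
            return collected
    raise ValueError("stream shorter than the header announced")

knowledge_slices = [b"\x10" * 100, b"\x20" * 80, b"\x30" * 60]
assert collect_by_count(knowledge_slices, library_patch_num_minus1=2) == knowledge_slices
assert collect_by_length(knowledge_slices, library_picture_size=240) == knowledge_slices
```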
Step S13: place all slices of each knowledge image frame at the same position in the first video code stream.
In the embodiment of the application, the video encoder adds a knowledge image reorganization process at the storage end, adjusting the order of the code stream so that the knowledge image slice streams are collected and stored together.
In the knowledge image reorganization method at the storage end of the application, the organization of the knowledge image code stream changes across the whole service scenario as follows: after image acquisition and encoding, the knowledge image stream is organized in the existing manner, keeping the code rate stable during transmission; before the storage process, a knowledge image reorganization step collects the knowledge image slices together; and for a remote user's pull-stream playing requirement, where the code stream must be transmitted to the user side over the network, the stored stream is reorganized again before transmission so that the code rate remains stable.
The knowledge image reorganization modes include, but are not limited to, the following (a sketch of mode (1) is given after this list):
(1) Place the collected knowledge image stream at the position of the original first slice in the code stream, which enables fast decoding and playing during on-demand playback.
(2) Place the collected knowledge image stream at any position before the stream of the first frame that references it.
(3) Place the collected knowledge image stream at the position of the original last slice.
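A sketch of reorganization mode (1), under an assumed simplified stream model in which each unit is tagged as a knowledge image slice or a display-frame unit; the tags and the function are hypothetical:

```python
def reorganize_for_storage(units):
    """Gather every knowledge image slice scattered through the transmitted
    stream and move them, as one contiguous run, to the position the first
    of them originally occupied. `units` is a list of (kind, payload) pairs,
    where kind is "L" for a knowledge image slice and "V" for a display unit."""
    knowledge = [u for u in units if u[0] == "L"]
    if not knowledge:
        return list(units)
    first_pos = next(i for i, u in enumerate(units) if u[0] == "L")
    others = [u for u in units if u[0] != "L"]
    # Everything before the first L slice is non-L, so this index is valid here
    return others[:first_pos] + knowledge + others[first_pos:]

transmitted = [("L", "s0"), ("V", "P0"), ("L", "s1"), ("V", "P1"), ("L", "s2")]
print(reorganize_for_storage(transmitted))
# [('L','s0'), ('L','s1'), ('L','s2'), ('V','P0'), ('V','P1')]
```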
Specifically, FIG. 11 is a schematic diagram of one example of the knowledge image reorganization process at the storage end, illustrating how the organization of the knowledge image code stream changes across the whole service scenario.
As shown in FIG. 11, after image acquisition and encoding, the knowledge image stream is organized in the existing manner, keeping the code rate stable during transmission.
A knowledge image reorganization process is added before the storage process: the knowledge image slices are collected together, and the collected knowledge image stream is placed at the position of the original first slice in the code stream, enabling fast decoding and playing during on-demand playback.
For a remote user's pull-stream playing requirement, the code stream must be transmitted to the user side over the network; when transmission reaches the knowledge image stream, its first slice is transmitted first, and the remaining slices are transmitted interleaved among the subsequent display frames in the original manner.
Placing the slices at the position of the original first slice avoids the following problems:
(1) In the pull-stream scenario, if the knowledge image is not placed at the original first slice position, the position at which it should be transmitted must be computed in advance, increasing delay.
(2) In the pull-stream scenario, if a knowledge image slice that should be transmitted together with the current display frame is placed after that frame, the knowledge image stream must be searched for further along in the stream, increasing delay.
(3) If the display frames are transmitted quickly while several knowledge image slices remain untransmitted, the code rate spikes and the delay increases.
Specifically, FIG. 12 is a schematic diagram of another example of the knowledge image reorganization process at the storage end, illustrating how the organization of the knowledge image code stream changes across the whole service scenario.
As shown in FIG. 12, after image acquisition and encoding, the knowledge image stream is organized in the existing manner, keeping the code rate stable during transmission.
A knowledge image reorganization process is added before the storage process: the knowledge image slices are collected together, and the collected knowledge image stream is placed at the position of the last slice, before the stream of the first frame that references it, enabling fast decoding and playing during on-demand playback.
For a remote user's pull-stream playing requirement, the code stream must be transmitted to the user side over the network, and the earliest position at which each knowledge image must start being transmitted is calculated, for example after POC 4 in this example. When transmission reaches that frame, the knowledge image stream is located later in the stored stream and one of its slices is transmitted; a display frame and a knowledge image slice are then transmitted in turn, the knowledge image slice stream being interleaved among the subsequent display frames in the original manner.
Step S14: generate a second video code stream from the plurality of video frames and the repositioned knowledge image frames, and store the second video code stream.
Further, as shown in FIG. 6 and FIG. 8, in the remote pull-stream playing scenario, the video encoder reorganizes the knowledge image code stream for network transmission to meet the remote user's pull-stream playing requirement: that is, the second video code stream is reorganized back into the form of the first video code stream, which is transmitted over the network and decoded for remote viewing by the user.
In addition, the video encoder may reorganize the code stream format according to specific playing requirements, referring to fig. 13 specifically, fig. 13 is a schematic diagram of an embodiment of a knowledge image reorganization process of remote pull stream playing provided in the present application.
As shown in FIG. 13, the playing requirement in this embodiment is random access from the RL frame at POC 8. In this case, only the RL frame and the L frame it references need to be transmitted: the L frame referenced by the RL frame is searched for among the preceding frames, and after the L frame has been transmitted, either as a whole frame or in slices, the RL frame at POC 8 and the subsequent display frames are transmitted. By analyzing the specific remote pull-stream playing requirement, the video encoder can transmit only the video frames the user needs to play and their reference frames, rather than the complete video each time, effectively improving coding and transmission efficiency.
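A sketch of this selection for the FIG. 13 scenario, under an assumed simplified model of the stored stream; the dict keys and the function are hypothetical:

```python
def units_for_random_access(stored_units, target_poc):
    """To start playback from an RL frame, send only the knowledge image it
    references, then the RL frame and the display frames after it.
    `stored_units` is a list of dicts with keys "kind" ("L"/"RL"/"V"),
    "poc" and, for RL frames, "ref_idx"."""
    target = next(u for u in stored_units
                  if u["kind"] == "RL" and u["poc"] == target_poc)
    # Search among the preceding frames for the referenced L frame,
    # then take everything from the target onward
    referenced_l = [u for u in stored_units
                    if u["kind"] == "L" and u["idx"] == target["ref_idx"]]
    following = [u for u in stored_units
                 if u["kind"] != "L" and u["poc"] >= target_poc]
    return referenced_l + following

stored = [
    {"kind": "L",  "idx": 0, "poc": None},
    {"kind": "V",  "poc": 7},
    {"kind": "RL", "poc": 8, "ref_idx": 0},
    {"kind": "V",  "poc": 9},
]
print([u.get("poc") for u in units_for_random_access(stored, target_poc=8)])
# [None, 8, 9] -> the L frame first, then the RL frame and subsequent frames
```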
In the embodiment of the application, a video encoder encodes a plurality of video frames and the knowledge image frames referenced by those video frames to obtain a first video code stream, where the first video code stream includes a video coding layer and a network abstraction layer; acquires all slices of each knowledge image frame based on the network abstraction layer; places all slices of each knowledge image frame at the same position in the first video code stream; and generates a second video code stream from the plurality of video frames and the repositioned knowledge image frames, and stores the second video code stream. By collecting the knowledge image slice streams together, this video encoding method enables fast local playback and fast play, reduces parsing cost, and lowers delay.
The application proposes a knowledge image reorganization method added to the storage application: the knowledge image slice streams are collected and placed together, enabling fast local playback and fast play, reducing parsing cost and lowering delay. The application further proposes quickly realizing knowledge image reorganization by adding knowledge image slice NAL types, or by adding syntax in the picture header that indicates the end of the slice-collection process, so that all knowledge image slices can be collected at low parsing cost and delay is reduced.
With continued reference to fig. 14, fig. 14 is a flowchart illustrating an embodiment of a video decoding method according to the present application.
As shown in fig. 14, the specific steps are as follows:
step S21: and obtaining a video code stream, wherein the video code stream comprises a video coding layer and a network abstraction layer.
Step S22: and acquiring all fragments of each frame of knowledge image frame based on the network abstraction layer, wherein all fragments of each frame of knowledge image frame are stored in the same position.
Step S23: the knowledge image frames in the video bitstream are decoded in accordance with all slices of each knowledge image frame.
Step S24: other video frames in the video bitstream are decoded according to the knowledge image frames.
The video decoding method above is applied to the local playing scenario handled by a video decoder. By decoding all slices of each knowledge image frame stored at the same position, it realizes fast local playback and fast play, reducing parsing cost and thus delay. A decoder-side sketch follows.
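On the decoder side, the contiguous storage makes one linear pass sufficient; a sketch of steps S21-S24, where the stream model and the two decode callbacks are hypothetical stand-ins for a real SVAC3 decoder:

```python
def decode_stream(units, decode_l_frame, decode_display_frame):
    """Because all slices of a knowledge image frame are stored contiguously,
    one linear pass collects and decodes each L frame before the frames that
    reference it; no re-parsing of the stream is needed."""
    ldpb = {}                                   # idx -> reconstructed L frame
    pending_slices = []
    for unit in units:
        if unit["kind"] == "L-slice":
            pending_slices.append(unit)
            if unit["last"]:                    # contiguous storage: run ends here
                ldpb[unit["idx"]] = decode_l_frame(pending_slices)
                pending_slices = []
        else:
            ref = ldpb.get(unit.get("ref_idx")) # may be None for plain I/P frames
            decode_display_frame(unit, ref)

frames = []
decode_stream(
    [{"kind": "L-slice", "idx": 0, "last": False},
     {"kind": "L-slice", "idx": 0, "last": True},
     {"kind": "RL", "poc": 0, "ref_idx": 0}],
    decode_l_frame=lambda s: f"L-recon({len(s)} slices)",
    decode_display_frame=lambda u, ref: frames.append((u["poc"], ref)),
)
print(frames)   # [(0, 'L-recon(2 slices)')]
```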
It will be appreciated by those skilled in the art that in the above-described method of the specific embodiments, the written order of steps is not meant to imply a strict order of execution but rather should be construed according to the function and possibly inherent logic of the steps.
In order to implement the video encoding method, the present application further provides a video encoder, and referring to fig. 15, fig. 15 is a schematic structural diagram of an embodiment of the video encoder provided by the present application.
The video encoder 400 of the present embodiment includes a processor 41, a memory 42, an input-output device 43, and a bus 44.
The processor 41, the memory 42 and the input/output device 43 are respectively connected to the bus 44, and the memory 42 stores program data, and the processor 41 is configured to execute the program data to implement the video encoding method according to the above embodiment.
In an embodiment of the present application, the processor 41 may also be referred to as a CPU (Central Processing Unit). The processor 41 may be an integrated circuit chip with signal processing capabilities. The processor 41 may also be a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. The general purpose processor may be a microprocessor, or the processor 41 may be any conventional processor or the like.
In order to implement the above video decoding method, the present application further provides a video decoder, and referring to fig. 16 specifically, fig. 16 is a schematic structural diagram of an embodiment of the video decoder provided by the present application.
The video decoder 500 of the present embodiment includes a processor 51, a memory 52, an input-output device 53, and a bus 54.
The processor 51, the memory 52, and the input/output device 53 are respectively connected to the bus 54, and the memory 52 stores program data, and the processor 51 is configured to execute the program data to implement the video decoding method according to the above embodiment.
In an embodiment of the present application, the processor 51 may also be referred to as a CPU (Central Processing Unit). The processor 51 may be an integrated circuit chip with signal processing capabilities. The processor 51 may also be a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. The general purpose processor may be a microprocessor, or the processor 51 may be any conventional processor or the like.
The present application further provides a computer storage medium, and referring to fig. 17, fig. 17 is a schematic structural diagram of an embodiment of the computer storage medium provided by the present application, in which a computer program 61 is stored in the computer storage medium 600, and the computer program 61 is used to implement the video encoding method and/or the video decoding method according to the above embodiments when executed by a processor.
The embodiments of the present application, when implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The foregoing description is only of embodiments of the present application, and is not intended to limit the scope of the application, and all equivalent structures or equivalent processes using the descriptions and the drawings of the present application or directly or indirectly applied to other related technical fields are included in the scope of the present application.

Claims (15)

1. A knowledge image-based video encoding method, the method comprising:
encoding a plurality of video frames and the knowledge image frames referenced by those video frames to obtain a first video code stream, wherein the first video code stream comprises a video coding layer and a network abstraction layer;
acquiring all slices of each knowledge image frame based on the network abstraction layer;
placing all slices of each knowledge image frame at the same position in the first video code stream;
and generating a second video code stream from the plurality of video frames and the repositioned knowledge image frames, and storing the second video code stream.
2. The video coding method of claim 1, wherein,
the placing all slices of each knowledge image frame at the same position in the first video code stream comprises:
placing all slices of each knowledge image frame at the same position in the first video code stream, before any video frame that references the knowledge image frame.
3. The video encoding method according to claim 1 or 2, wherein
placing all slices of each knowledge image frame at the same position in the first video code stream comprises:
placing all slices of each knowledge image frame at the position of that frame's original first slice in the first video code stream.
4. The video encoding method according to claim 1 or 2, wherein
placing all slices of each knowledge image frame at the same position in the first video code stream comprises:
placing all slices of each knowledge image frame at the position of that frame's original last slice in the first video code stream.
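For the claim-4 variant, only the anchor changes: the group is emitted at the index of the frame's original last slice. A sketch of computing that index, under the same assumed NAL-unit representation as the sketch after claim 1:

```python
def last_slice_positions(nal_units):
    """Index of each knowledge image frame's last original slice, where the
    claim-4 variant would emit the frame's whole slice group."""
    last_pos = {}
    for i, nal in enumerate(nal_units):
        if nal["is_knowledge_slice"]:
            last_pos[nal["frame_id"]] = i  # later slices overwrite earlier ones
    return last_pos
```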
5. The video encoding method of claim 1, wherein
the method further comprises:
in response to a remote stream-pulling playback request, converting the second video code stream into a third video code stream, wherein the third video code stream is the same as or different from the complete first video code stream;
and transmitting the third video code stream to a user terminal over a network.
6. The video encoding method of claim 5, wherein
the third video code stream is composed of the target video frames corresponding to the remote stream-pulling playback request and the knowledge image frames in the first video code stream that are referenced by the target video frames.
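Claims 5 and 6 amount to selecting the requested frames plus the knowledge frames they reference. A minimal sketch under the same assumed representation, with an illustrative `ref_knowledge_id` key (an assumption, not patent terminology) linking a frame to the knowledge frame it references:

```python
def build_pull_stream(nal_units, wanted_frame_ids):
    """Assemble a third code stream: the frames a remote client asked for
    plus every knowledge image frame those frames reference."""
    needed = set(wanted_frame_ids)
    for nal in nal_units:
        if nal["frame_id"] in needed and nal.get("ref_knowledge_id") is not None:
            needed.add(nal["ref_knowledge_id"])
    return [nal for nal in nal_units if nal["frame_id"] in needed]
```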
7. The video encoding method of claim 1, wherein
acquiring all slices of each knowledge image frame based on the network abstraction layer comprises:
obtaining the slice network abstraction layer unit of each knowledge image slice;
acquiring, based on the slice network abstraction layer units, the target knowledge image slice that is marked as a knowledge image end slice;
and determining all slices of each knowledge image frame according to the target knowledge image slice.
8. The video encoding method of claim 1, wherein
acquiring all slices of each knowledge image frame based on the network abstraction layer comprises:
obtaining the slice network abstraction layer unit of each knowledge image slice;
acquiring, based on the slice network abstraction layer units, the target knowledge image slices that serve as knowledge image coding delimiters;
determining all slices of each knowledge image frame according to the target knowledge image slices;
wherein a knowledge image coding delimiter is the first knowledge image slice or the last knowledge image slice of a frame.
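Claims 7 and 8 both delimit a knowledge image frame by a specially marked slice. A non-normative sketch using an end-slice flag (an illustrative field; the patent only requires that the slice network abstraction layer identify the delimiter), assuming the input contains only knowledge image slices in code-stream order:

```python
def group_by_end_slice(knowledge_slices):
    """Collect each knowledge image frame's slices by scanning until a slice
    flagged as the frame's end slice (the delimiter) is seen."""
    frames, current = [], []
    for s in knowledge_slices:
        current.append(s)
        if s["is_end_slice"]:   # delimiter closes the current frame
            frames.append(current)
            current = []
    return frames
```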
9. The video encoding method of claim 1, wherein
acquiring all slices of each knowledge image frame based on the network abstraction layer comprises:
acquiring an image parameter set shared by all slices of each knowledge image frame;
acquiring, from the image parameter set, a syntax element that marks the number of slices of the knowledge image frame;
and acquiring all slices of each knowledge image frame according to the number of slices indicated by the syntax element.
10. The video encoding method of claim 9, wherein
acquiring all slices of each knowledge image frame according to the number of slices indicated by the syntax element comprises:
calculating the number of slices of each knowledge image frame as the sum of the value of the syntax element and a preset number;
and determining all slices of each knowledge image frame according to that number of slices.
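The preset-number convention of claim 10 resembles the familiar "minus1" coding of counts. A minimal sketch, assuming an image parameter set dict with a hypothetical `num_slices_minus1` field and a preset number of 1 (both illustrative):

```python
def read_knowledge_slices(pps, slice_iter):
    """Read exactly the signalled number of slices for one knowledge image
    frame; assumes slice_iter yields at least that many slices."""
    preset = 1
    num_slices = pps["num_slices_minus1"] + preset
    return [next(slice_iter) for _ in range(num_slices)]
```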
11. The video encoding method of claim 1, wherein
acquiring all slices of each knowledge image frame based on the network abstraction layer comprises:
acquiring an image parameter set shared by all slices of each knowledge image frame;
acquiring, from the image parameter set, a syntax element that marks the total code stream length of the knowledge image frame;
and acquiring all slices of each knowledge image frame according to the total code stream length indicated by the syntax element.
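Claim 11's length-based variant can be sketched the same way: accumulate slice payload sizes until the total signalled in the image parameter set is reached. The `knowledge_frame_length_bytes` and `payload` names are illustrative assumptions:

```python
def read_slices_by_length(pps, slices):
    """Take slices until their accumulated payload size reaches the
    knowledge image frame's total code-stream length."""
    total = pps["knowledge_frame_length_bytes"]
    taken, consumed = [], 0
    for s in slices:
        if consumed >= total:
            break
        taken.append(s)
        consumed += len(s["payload"])
    return taken
```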
12. A knowledge-image-based video decoding method, the method comprising:
obtaining a video code stream, wherein the video code stream comprises a video coding layer and a network abstraction layer;
acquiring all slices of each knowledge image frame based on the network abstraction layer, wherein all slices of each knowledge image frame are stored at the same position;
decoding the knowledge image frames in the video code stream according to all the slices of each knowledge image frame;
and decoding other video frames in the video code stream according to the knowledge image frames.
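The decoding order of claim 12 (knowledge image frames first, dependent frames afterwards) might look like the following sketch. `decode_frame(slices, reference)` stands in for an actual codec and is purely an assumption, as is the `ref_knowledge_id` key:

```python
def decode_stream(nal_units, decode_frame):
    """Decode every knowledge image frame from its co-located slices, then
    decode the remaining frames against the knowledge frames they reference."""
    knowledge_slices, others = {}, []
    for nal in nal_units:
        if nal["is_knowledge_slice"]:
            knowledge_slices.setdefault(nal["frame_id"], []).append(nal)
        else:
            others.append(nal)
    decoded = {fid: decode_frame(slices, None)           # knowledge frames first
               for fid, slices in knowledge_slices.items()}
    return [decode_frame([nal], decoded.get(nal.get("ref_knowledge_id")))
            for nal in others]                           # then dependent frames
```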
13. A video encoder comprising a memory and a processor coupled to the memory;
wherein the memory is configured to store program data, and the processor is configured to execute the program data to implement the video encoding method of any one of claims 1 to 11.
14. A video decoder comprising a memory and a processor coupled to the memory;
wherein the memory is configured to store program data, and the processor is configured to execute the program data to implement the video decoding method of claim 12.
15. A computer storage medium storing program data which, when executed by a computer, implements the video encoding method of any one of claims 1 to 11 and/or the video decoding method of claim 12.
CN202310884518.5A 2023-07-18 2023-07-18 Video encoding method, video decoding method, encoder, decoder, and medium Pending CN117082249A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310884518.5A CN117082249A (en) 2023-07-18 2023-07-18 Video encoding method, video decoding method, encoder, decoder, and medium


Publications (1)

Publication Number Publication Date
CN117082249A true CN117082249A (en) 2023-11-17

Family

ID=88716127

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310884518.5A Pending CN117082249A (en) 2023-07-18 2023-07-18 Video encoding method, video decoding method, encoder, decoder, and medium

Country Status (1)

Country Link
CN (1) CN117082249A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination