CN114157863A - Video coding method, system and storage medium based on digital retina - Google Patents

Video coding method, system and storage medium based on digital retina

Info

Publication number
CN114157863A
Authority
CN
China
Prior art keywords: prediction, coding, frame, data, inter
Legal status
Granted
Application number
CN202210116127.4A
Other languages
Chinese (zh)
Other versions
CN114157863B (en)
Inventor
张羿
牛梅梅
向国庆
滕波
洪一帆
焦立欣
陆嘉瑶
Current Assignee
Zhejiang Smart Video Security Innovation Center Co Ltd
Original Assignee
Zhejiang Smart Video Security Innovation Center Co Ltd
Application filed by Zhejiang Smart Video Security Innovation Center Co Ltd filed Critical Zhejiang Smart Video Security Innovation Center Co Ltd
Priority to CN202210116127.4A
Publication of CN114157863A
Application granted
Publication of CN114157863B
Active legal status
Anticipated expiration of legal term

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102: Adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103: Selection of coding mode or of prediction mode
    • H04N19/134: Adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146: Data rate or code amount at the encoder output

Abstract

The application provides a digital-retina-based video coding method, system and storage medium. An input image is acquired and block-based inter/intra prediction is performed to obtain first coding prediction data; inter/intra prediction is also performed on the input image with a depth prediction model to obtain second coding prediction data; when the amount of the second coding prediction data is smaller than that of the first coding prediction data, encoding is performed based on the second coding prediction data. By adaptively selecting the coding compression mode and exploiting the feature extraction capability of the depth model, redundant information in the image is further compressed, coding performance is greatly improved, and a higher compression ratio is achieved.

Description

Video coding method, system and storage medium based on digital retina
Technical Field
The present application relates to the field of digital signal processing technologies, and in particular, to a method, a system, and a storage medium for video encoding based on a digital retina.
Background
The digital retina concept has attracted much attention in fields such as video encoding/decoding and video surveillance. In traditional image processing, video compression and video analysis belong to two separate fields; inspired by the biological function of the human retina, digital retina technology was the first to propose an intelligent image sensor that integrates both. Specifically, a digital retina obtains video compression data and video feature data simultaneously and transmits both to the cloud as data streams, facilitating later playback and retrieval. To obtain the feature stream of an image, digital retina technology introduces the concept of a model stream: the image acquisition front end can apply different feature extraction models as needed, and these models can be stored in the cloud and sent back to the front end.
In video compression, the purpose of video coding is to eliminate redundant information in the video signal. Increasingly optimized video coding standards further improve the compression efficiency of video images. Block-based video compression codecs are very mature, offering moderate computational complexity, high compression ratios, and high reconstruction quality. Mainstream codec technologies such as H.264/H.265/H.266 and MPEG-2/MPEG-4 are mainly block-based. New generations of coding standards raise the compression ratio by trading computation for bitrate: for example, the evolution from H.264 to H.265 improved the compression ratio by about 50% but also increased the computational requirements, because more flexible coding units allow motion-compensation-based compression to exploit more of the compression potential.
However, video coding is still based on redundancy compression at the pixel level, so the coding unit remains a relatively small block of data; H.265, for example, uses coding tree units of at most 64x64. Coding units at this scale cannot efficiently extract and compress the semantic content of an image.
Disclosure of Invention
The invention provides a digital-retina-based video coding method, system and storage medium, aiming to solve the problem of low compression performance in prior-art video compression coding.
According to a first aspect of embodiments of the present application, there is provided a digital-retina-based video coding method, comprising the steps of:
acquiring an input image and performing block-based inter/intra prediction to obtain first coding prediction data;
performing inter/intra prediction on the input image with a depth prediction model to obtain second coding prediction data;
when the amount of the second coding prediction data is smaller than that of the first coding prediction data, encoding based on the second coding prediction data.
In some embodiments of the present application, the method further includes selecting a coding region after the image is input; coding the coding region specifically includes:
performing block-based inter/intra prediction to obtain first coding prediction data of the coding region;
performing inter/intra prediction based on the depth prediction model to obtain second coding prediction data of the coding region;
when the amount of the second coding prediction data of the coding region is smaller than that of the first coding prediction data, encoding the second coding prediction data as the coded data of the coding region;
the coded data further includes a coding region number, a coding mode, and the corresponding decoding model.
In some embodiments of the present application, the method further comprises:
when the amount of the second coding prediction data of the coding region is greater than or equal to that of the first coding prediction data, encoding the first coding prediction data as the coded data of the coding region;
the coded data further includes a coding region number and a coding mode.
In some embodiments of the present application, acquiring an input image and performing block-based inter/intra prediction to obtain first coding prediction data specifically includes:
inputting the input image into an inter/intra prediction module to obtain a first inter/intra prediction value;
obtaining a first residual value from the first inter/intra prediction value and the motion compensation value;
passing the first residual value through a transformer and a quantizer to obtain the first coding prediction data.
In some embodiments of the present application, the first inter/intra prediction value is further input to the depth prediction model; accordingly, performing inter/intra prediction on the input image with the depth prediction model to obtain the second coding prediction data specifically includes:
performing inter/intra prediction based on the depth prediction model from the input image and the first inter/intra prediction value to obtain the second coding prediction data.
In some embodiments of the present application, performing inter/intra prediction on the input image based on the depth prediction model to obtain the second coding prediction data specifically includes:
inputting the image into the depth prediction model to obtain a second inter/intra prediction value;
obtaining a second residual value from the second inter/intra prediction value;
passing the second residual value through the quantizer to obtain the second coding prediction data.
In some embodiments of the present application, the second residual value is passed through a transformer and a quantizer to obtain the second coding prediction data.
In some embodiments of the present application, the depth prediction model includes a plurality of models, and performing inter/intra prediction on the input image based on the depth prediction model to obtain the second coding prediction data further includes:
simultaneously inputting the same data frame of the image into the plurality of models of the depth prediction model to obtain a plurality of second inter/intra prediction values;
obtaining a plurality of second residual values from the plurality of second inter/intra prediction values;
selecting the smallest of the plurality of second residual values and the corresponding model.
According to a second aspect of the embodiments of the present application, there is provided a digital-retina-based video coding system, specifically including:
a first coding prediction unit, configured to acquire an input image and perform block-based inter/intra prediction to obtain first coding prediction data;
a second coding prediction unit, configured to perform inter/intra prediction on the input image based on the depth prediction model to obtain second coding prediction data;
and a coding mode decider, configured to encode based on the second coding prediction data when the amount of the second coding prediction data is smaller than that of the first coding prediction data.
According to a third aspect of embodiments of the present application, there is provided a computer-readable storage medium having a computer program stored thereon; when executed by a processor, the computer program implements the video coding method.
With the digital-retina-based video coding method, system and storage medium of the embodiments of the present application, an input image is acquired and block-based inter/intra prediction is performed to obtain first coding prediction data; inter/intra prediction is also performed on the input image with a depth prediction model to obtain second coding prediction data; when the amount of the second coding prediction data is smaller than that of the first coding prediction data, encoding is performed based on the second coding prediction data. By adaptively selecting the coding compression mode and exploiting the feature extraction capability of the depth model, redundant information in the image is further compressed, coding performance is greatly improved, and a higher compression ratio is achieved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
Fig. 1 is a flow diagram of conventional video compression based on motion prediction and compensation in the prior art;
Fig. 2 is a schematic diagram of existing video frame encoding;
Fig. 3 is a schematic diagram illustrating a building that remains static within video frames;
Fig. 4 is a schematic diagram of the digital retina principle;
Fig. 5 is a flowchart of the steps of a digital-retina-based video coding method according to an embodiment of the present application;
Fig. 6 is a schematic diagram of a video coding principle according to an embodiment of the present application;
Fig. 7 is a schematic diagram of a video coding principle according to another embodiment of the present application;
Fig. 8 is a schematic diagram of actual image coding according to an embodiment of the present application;
Fig. 9 is a schematic diagram of the depth-prediction-model-based approach according to an embodiment of the present application;
Fig. 10 is a schematic diagram of a video coding principle according to yet another embodiment of the present application;
Fig. 11 is a schematic diagram of an actual image processing process according to the video coding method of fig. 10;
Fig. 12 is a schematic diagram of the decoding principle corresponding to the video coding of the present application;
Fig. 13 is a schematic diagram of a digital-retina-based video coding system according to an embodiment of the present application;
Fig. 14 is a schematic structural diagram of a video coding apparatus according to an embodiment of the present application.
Detailed Description
In implementing the present application, beyond observing the low compression performance of existing video coding, the inventors also found that the depth model used by the digital retina can progressively abstract an image through a deep neural network and obtain semantic features at different scales. Furthermore, with another depth model, the image can also be reconstructed from those semantic features. A depth model therefore enables feature-based image compression and achieves higher compression efficiency, although such compression implies higher computational complexity than state-of-the-art video coding techniques.
A conventional motion prediction and compensation based video compression flow diagram in the prior art is shown in fig. 1.
As shown in fig. 1, e is the residual value after motion compensation. The residual value is transmitted or stored after passing through a transformer, a quantizer, and an entropy coder. A common transformer is the discrete cosine transform (DCT).
A schematic diagram of existing video frame encoding is shown in fig. 2. A schematic diagram of a building remaining static across video frames is shown in fig. 3.
As shown in fig. 2, a picture is divided into slices that can be coded and decoded independently. Each slice is partitioned into multiple coding tree units (CTUs), which may in turn be further divided into multiple coding units (CUs). Each CU may be residual-coded with reference to at least one co-located CU of the current frame or another frame, and a slice contains multiple coding units that can be coded and decoded independently. In general, a meaningful target will span different slices; or, viewed the other way around, as shown in fig. 3, the pixels within one slice often do not constitute a visually meaningful target for humans.
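As a concrete illustration, the slice-to-CTU partitioning just described can be sketched as follows; the function name and the 1920x1080 picture size are illustrative assumptions, not taken from the patent.

```python
def partition_slice(width, height, ctu=64):
    """Split a slice of the given size into a grid of coding tree units
    (CTUs) of at most ctu x ctu pixels; a real codec would further split
    each CTU into coding units (CUs)."""
    ctus = []
    for y in range(0, height, ctu):
        for x in range(0, width, ctu):
            # edge CTUs are clipped to the picture boundary
            ctus.append((x, y, min(ctu, width - x), min(ctu, height - y)))
    return ctus

# a 1920x1080 picture gives a 30x17 grid; the bottom row is only 56 pixels high
grid = partition_slice(1920, 1080)
```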
Assume, as shown in fig. 3, that a building remains static between two frames. There is then a large amount of redundant information between the frames, which is also present in the coding units inside the slices. The encoder of fig. 1 can compress much of this redundancy through inter prediction and the transformer, but its algorithm can only identify correlations between pixels within a coding unit; it cannot recognize that the current target is a building in the background, i.e. a semantic feature of the image, whose corresponding image features are temporally correlated.
A feature extraction model based on a depth model, however, can extract the semantic information of an image, and a reconstruction model can restore the image from it. There is a great deal of mature prior art in this respect; for example, the widely studied autoencoder uses exactly this principle, which means a depth model can extract deep features from an image and reconstruct the image from those features. By the same token, if a depth model can achieve inter-frame prediction through the deep semantics of the image, a higher compression ratio can be achieved over a larger range.
A schematic diagram of the digital retina principle is shown in fig. 4.
As shown in fig. 4, the front-end device carries both video compression and depth models for video feature extraction. Since the back end can deploy different models to the front end through the transmission channel, the front-end device can be understood to have the ability to adaptively acquire any depth model.
On this basis, the present application compresses a specific region in the image with a depth model and reconstructs the compressed data with a depth reconstruction model.
The depth prediction model referred to in this application is a model that, at the encoding end, performs coding prediction on the basis of an existing depth model as described above. Correspondingly, a depth reconstruction model for decoding, based on the same existing depth model, exists at the decoding end. The network structures of the depth prediction model and the depth reconstruction model are consistent with the depth model.
For a region, when a higher effective compression ratio is obtained than with the block- and motion-compensation-based coding method, the present application codes that region with the depth model and records the region number and mode indication information in the coded data, the mode indication information comprising the depth-model-based coding mode and the corresponding decoding model used for the current region. Otherwise, the region is coded with the block- and motion-compensation-based approach.
On the decoding side, the mode indication information identifies whether a specific region is decoded based on motion compensation or based on the depth reconstruction model, and decoding proceeds according to that indication.
The coding region of the present application, i.e. the selected specific region, is typically larger than the maximum coding tree unit size.
Specifically, according to the video coding scheme of the present application, an input image is acquired and block-based inter/intra prediction is performed to obtain first coding prediction data; inter/intra prediction is also performed on the input image with a depth prediction model to obtain second coding prediction data; when the amount of the second coding prediction data is smaller than that of the first coding prediction data, encoding is performed based on the second coding prediction data. By adaptively selecting the coding compression mode and exploiting the feature extraction capability of the depth model, redundant information in the image is further compressed, coding performance is greatly improved, and a higher compression ratio is achieved.
In order to make the technical solutions and advantages of the embodiments of the present application clearer, exemplary embodiments of the present application are described in further detail below with reference to the accompanying drawings. Clearly, the described embodiments are only some of the embodiments of the present application, not an exhaustive list of all of them. It should be noted that, in case of no conflict, the embodiments of the present application and the features therein may be combined with each other.
Example 1
A flowchart illustrating the steps of a method for digital retina-based video encoding according to an embodiment of the present application is shown in fig. 5.
As shown in fig. 5, the video encoding method of the present embodiment specifically includes the following steps:
s101: an input image is acquired, and inter/intra prediction is performed on the basis of blocks to obtain first encoding prediction data.
Specifically, the method further comprises the step of selecting the coding area after the input image is acquired. Therefore, inter/intra prediction is performed on a block basis, resulting in first encoded prediction data for the encoded region.
S102: and according to the input image, performing inter-frame/intra-frame prediction based on the depth prediction model to obtain second coding prediction data.
Specifically, inter/intra prediction is performed based on the depth prediction model over the input image and the selected coding region, yielding the second coding prediction data of the coding region.
The depth prediction model is a model that performs coding prediction on the basis of an existing depth model.
S103: when the amount of the second coding prediction data is smaller than that of the first coding prediction data, encoding is performed based on the second coding prediction data.
Specifically, when the amount of the second coding prediction data of the selected coding region is smaller than that of the first coding prediction data, the second coding prediction data is encoded as the coded data of the coding region; the coded data further includes the coding region number, the coding mode, and the corresponding decoding model.
Correspondingly, when the amount of the second coding prediction data of the coding region is greater than or equal to that of the first coding prediction data, the first coding prediction data is encoded as the coded data of the coding region; in this case the coded data further includes the coding region number and the coding mode.
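A minimal sketch of the S101-S103 decision, assuming the quantized outputs of the two branches are available as byte strings; the names `block_coded` and `depth_coded` and the returned fields are hypothetical, not from the patent.

```python
def choose_mode(block_coded: bytes, depth_coded: bytes) -> dict:
    """Pick the branch whose coding prediction data is smaller.

    block_coded: first coding prediction data (block-based branch)
    depth_coded: second coding prediction data (depth-model branch)
    """
    if len(depth_coded) < len(block_coded):
        # depth-model mode: the stream also carries the region number,
        # the mode flag, and the id of the corresponding decoding model
        return {"mode": "depth", "data": depth_coded}
    # ties and larger depth data fall back to block mode, which only
    # needs the region number and the mode flag
    return {"mode": "block", "data": block_coded}
```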
A schematic diagram of a video coding principle according to an embodiment of the present application is shown in fig. 6.
As shown in fig. 6, the coding mode decider selects the coding mode according to the amount of data the current coding region produces under the two methods S101 and S102. If the block coding path (the upper half of fig. 6) produces the smaller amount of data after entropy coding, the signal at port 1 is fed to the entropy coder; if the depth-prediction-model path produces the smaller amount, the signal at port 2 is fed to the entropy coder. Since the amount of quantized data is representative of the data amount after entropy compression, the decider can make the decision directly from its input signals.
As shown in fig. 6, S101 specifically includes the following steps:
first, the input image is fed into the inter/intra prediction module to obtain a first inter/intra prediction value;
second, a first residual value is obtained from the first inter/intra prediction value and the motion compensation value;
finally, the first residual value passes through the transformer and the quantizer to yield the first coding prediction data, i.e. the data fed to the coding mode decider.
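The three steps above can be sketched as follows, with a naive orthonormal DCT-II standing in for the transformer and uniform rounding for the quantizer; the `qstep` value is an assumption, and the patent does not fix the transform or quantizer details.

```python
import numpy as np

def dct2(block):
    """Naive 2-D orthonormal DCT-II, the transform commonly used in block codecs."""
    n = block.shape[0]
    k = np.arange(n)
    # basis matrix: entry (f, j) = sqrt(2/n) * cos(pi * (2j+1) * f / (2n))
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)  # DC row scaling for orthonormality
    return c @ block @ c.T

def block_branch(block, prediction, qstep=8.0):
    """S101 sketch: residual -> transform -> quantize."""
    residual = block.astype(float) - prediction.astype(float)
    coeffs = dct2(residual)
    return np.round(coeffs / qstep).astype(int)  # first coding prediction data
```

For a flat 8x8 block with a flat prediction, the residual is constant, so the DCT concentrates all energy in the DC coefficient and the quantized output has a single nonzero entry.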
As shown in fig. 6, S102 specifically includes the following steps:
first, the image is fed into the depth prediction model to obtain a second inter/intra prediction value;
then, the residual calculation module obtains a second residual value from the second inter/intra prediction value;
finally, the second residual value passes through the quantizer to yield the second coding prediction data.
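Under the same assumptions as the block-branch sketch, the depth-model branch of fig. 6 differs only in where the prediction comes from and in skipping the transformer; `depth_prediction` stands for the depth model's output and is a placeholder, not the patent's model.

```python
import numpy as np

def depth_branch(block, depth_prediction, qstep=8.0):
    """S102 sketch: the depth model's prediction replaces the block-wise
    inter/intra prediction; the residual goes straight to the quantizer
    (the fig. 7 variant would insert a transform before it)."""
    residual = block.astype(float) - depth_prediction.astype(float)
    return np.round(residual / qstep).astype(int)  # second coding prediction data
```

When the depth model predicts the region exactly, the quantized residual is an all-zero array, which is the case exploited in the fig. 8 example below.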
A schematic diagram of a video coding principle according to another embodiment of the present application is shown in fig. 7.
As shown in fig. 7, in this embodiment the residual values from the depth prediction model are also fed through a transformer before the quantizer. The transformer may operate at the maximum CTU size of 64x64 and use a DCT.
Correspondingly, the second residual value is transformed by the transformer and quantized by the quantizer to obtain the second coding prediction data.
An actual image coding schematic diagram according to the embodiment of the application is shown in fig. 8.
As shown in fig. 8, in an example of actual image coding, assume that the current frame is a bi-directionally predicted "B" frame and the image remains static within the current GOP. In conventional coding, a slice is divided internally into CTUs, and each CTU data unit includes two motion vectors (MV1, MV2), residual data R, and two reference frames (Ref1, Ref2).
Because the depth prediction model recognizes the target in the target region and predicts it accurately, only the quantized residual data remains, and since the prediction is accurate the residual data is an all-zero sequence. In the conventional block-based approach, by contrast, even when the residual data within a CTU is all zeros, the MV and Ref data are not, so the entropy-coded data amount of block coding is far larger than that of the depth-prediction-model-based method.
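The size argument of fig. 8 can be illustrated roughly as follows, with `zlib` standing in for the entropy coder and invented MV/Ref values; the exact byte counts are not from the patent, only the ordering matters.

```python
import zlib

import numpy as np

ctus_per_slice = 16  # hypothetical slice of 16 CTUs in a static region

# block mode: each CTU still carries MV1, MV2, Ref1, Ref2 plus an
# all-zero 64x64 residual R (MV/Ref values invented for the example)
block_payload = b"".join(
    np.int16([3 + i, -2 - i]).tobytes()  # MV1, MV2
    + bytes([0, 1])                      # Ref1, Ref2 frame indices
    + bytes(64 * 64)                     # all-zero residual R
    for i in range(ctus_per_slice)
)

# depth-model mode: the accurate prediction leaves only all-zero residuals
depth_payload = bytes(64 * 64) * ctus_per_slice

block_size = len(zlib.compress(block_payload))
depth_size = len(zlib.compress(depth_payload))
assert depth_size < block_size  # depth-model mode wins for the static region
```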
In this case, according to the present application, the coding mode decision module selects the depth-prediction-model-based coding mode, so the overall image compression ratio is higher than with the conventional block-only coding mode.
A schematic diagram based on a depth prediction model according to an embodiment of the present application is shown in fig. 9.
As shown in fig. 9, multiple image frames are input, and a predicted frame is output after the depth model runs. Training on sample data gradually reduces the error between the predicted frame and the actually sampled image. The model can be trained in many ways, for example end-to-end or with optical-flow-based prediction.
A more critical issue here, however, is the temporal relationship between the input image sequence and the output image sequence.
In prior-art frame interpolation algorithms based on a depth model, the output image lies temporally between the input images: for example, with inputs at T1 and T2, the output falls at some fixed time node within (T1, T2). In the present application, however, the purpose of the prediction model is not to output an image frame at a fixed time node, but to generate the predicted image with the smallest error relative to the frame currently being encoded, and the best input frames for that prediction cannot be known in advance.
Therefore, in another embodiment of the present application, the depth prediction model takes as input an "I" frame obtained by block-based inter/intra prediction, and outputs the image frames near at least one "I" frame as the prediction values of the corresponding frames.
Block-based video coding techniques limit the prediction length: for example, with a prediction length of at most 5, any data frame is predicted only from data frames no more than 5 positions away. The depth prediction model therefore only needs prediction capability within the same prediction distance; it need not predict the current frame from image frames at arbitrary distances. In another embodiment, the inputs of the depth prediction model are the images corresponding to the "I" and "P" frames, and the model outputs only the images corresponding to the bi-directionally predicted "B" frames.
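The input/output frame selection of this embodiment can be sketched as follows; the GOP notation and the function name are illustrative only.

```python
def schedule_depth_prediction(gop_structure):
    """Pick the depth model's input and output frames from a GOP:
    the I and P frames serve as inputs, and the model is asked to
    predict only the B frames between them."""
    inputs = [i for i, t in enumerate(gop_structure) if t in ("I", "P")]
    outputs = [i for i, t in enumerate(gop_structure) if t == "B"]
    return inputs, outputs

# e.g. a classic IBBP group of pictures
ins, outs = schedule_depth_prediction(["I", "B", "B", "P"])
```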
A schematic diagram of a video coding principle according to another embodiment of the present application is shown in fig. 10.
As shown in fig. 10, the first inter/intra prediction value is additionally input to the depth prediction model; accordingly, performing inter/intra prediction on the input image based on the depth prediction model to obtain the second coding prediction data specifically includes:
performing inter/intra prediction based on the depth prediction model from the input image and the first inter/intra prediction value to obtain the second coding prediction data.
Fig. 11 is a schematic diagram showing an actual image processing process according to the video encoding method of fig. 10.
As shown in fig. 11, the intra/inter prediction module's decision for the current frame is input to the depth prediction model so that the model can determine its input and output frames.
In other embodiments, because the accuracy of a depth prediction model depends on its training samples, different models give different prediction results on different image frames. The depth prediction model therefore comprises a plurality of models. For the same data frame, the residual calculation module computes the residual values of the multiple models, selects the model with the smallest residual value as the prediction model, and indicates the number of the selected model in the data stream.
Specifically:
the same data frame of the image is input simultaneously into the plurality of models of the depth prediction model to obtain a plurality of second inter/intra prediction values; a plurality of second residual values are obtained from these prediction values; and the smallest of the second residual values and its corresponding model are selected.
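A hedged sketch of this multi-model selection: `models` is a list of callables mapping a frame to a predicted frame, and the cost proxy (sum of absolute quantized residuals) is an assumption not specified by the patent.

```python
import numpy as np

def select_depth_model(frame, models, qstep=8.0):
    """Run every candidate depth prediction model on the same data frame,
    keep the one whose quantized residual is smallest, and return its
    index so it can be signalled in the data stream."""
    best = None
    for model_id, model in enumerate(models):
        residual_q = np.round((frame.astype(float) - model(frame)) / qstep).astype(int)
        cost = np.abs(residual_q).sum()  # proxy for the residual's coded size
        if best is None or cost < best[0]:
            best = (cost, model_id, residual_q)
    _, model_id, residual_q = best
    return model_id, residual_q  # model_id is written into the data stream
```

With a perfect model among the candidates, the selected residual is all zeros, matching the theoretical optimum discussed below.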
This embodiment spends further computation to obtain a higher compression ratio. Theoretically, for any image frame there exists an optimal prediction model that predicts a result identical to the current frame from other data frames, i.e. whose residual value is zero. If that model is also stored at the decoding end, only the few bits indicating the selected model are needed to encode and decode the entire image frame.
A schematic diagram of the decoding principle corresponding to the video coding of the present application is shown in fig. 12.
As shown in fig. 12, the decoding side mirrors the flow of the encoding side. After the entropy decoder, the decoding mode module determines the decoding mode used by the current frame. Specifically, the encoding/decoding method of the current frame is determined from the information recorded in the coded data, such as the region number, the depth-model-based coding method, and the corresponding decoding model.
If the current frame is determined to be block-coded, the residual value, after passing through the inverse quantizer and the inverse transformer, is added to the intra-frame or inter-frame prediction value to obtain the decoded output. If the decoding mode module determines that the current frame uses the depth prediction model, the residual value, after the inverse quantizer and the inverse transformer, is added to the prediction frame output by the depth model to obtain the decoded output.
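A minimal sketch of this decoder-side dispatch; the field names and the stand-in callables for the inverse quantizer, inverse transformer, and prediction modules are all assumptions for illustration:

```python
def decode_frame(coded, dequantize, inverse_transform, block_predict, depth_models):
    """Reconstruct one frame.  The mode (and, for the depth path, the
    model index) recorded in the coded data selects the prediction
    source; the residual always passes through the inverse quantizer
    and the inverse transformer first."""
    residual = inverse_transform(dequantize(coded["residual"]))
    if coded["mode"] == "block":
        # Block-coded frame: add the intra/inter prediction value.
        prediction = block_predict(coded)
    else:
        # Depth-model frame: the signalled index picks the decoder-side model.
        prediction = depth_models[coded["model_index"]](coded["references"])
    return prediction + residual
```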
With the digital-retina-based video coding method of this embodiment of the application, an input image is obtained and block-based inter/intra prediction yields the first coding prediction data; inter/intra prediction based on the depth prediction model is performed according to the input image to obtain the second coding prediction data; and when the second coding prediction data amount is smaller than the first coding prediction data amount, encoding is performed according to the second coding prediction data. By adaptively selecting the coding compression mode while exploiting the feature extraction capability of the depth model, the redundant information in the image is further compressed, greatly improving coding performance and achieving a higher compression ratio.
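The adaptive selection just summarized can be illustrated with a small sketch; the dictionary field names and the byte-length comparison are illustrative assumptions, not the application's actual bitstream syntax:

```python
def choose_encoding(first_data: bytes, second_data: bytes, model_index: int) -> dict:
    """Adaptive mode selection: take the depth-model path only when its
    coding prediction data amount is strictly smaller than the
    block-based path's."""
    if len(second_data) < len(first_data):
        # Depth-model path wins: record the selected model for the decoder.
        return {"mode": "depth_model", "model": model_index, "payload": second_data}
    # Otherwise fall back to conventional block-based coding.
    return {"mode": "block", "payload": first_data}
```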
Example 2
For details not disclosed in the video coding system of this embodiment, please refer to the detailed implementation of the video coding method in the other embodiments.
Fig. 13 is a schematic diagram illustrating a video coding system according to an embodiment of the present application.
As shown in fig. 13, the video coding system provided in this embodiment specifically includes a first coding prediction unit 10, a second coding prediction unit 20, and a coding mode decision unit 30.
Specifically:
First coding prediction unit 10: used for acquiring an input image and performing block-based inter/intra prediction to obtain the first coding prediction data.
Specifically, a coding region is further selected after the input image is acquired. Inter/intra prediction is then performed on a block basis, resulting in the first coding prediction data of the coding region.
Second coding prediction unit 20: used for performing inter/intra prediction based on the depth prediction model according to the input image to obtain the second coding prediction data.
Specifically, inter/intra prediction is performed based on the depth prediction model according to the input image and the selected coding region, so as to obtain second coding prediction data of the coding region.
Coding mode decision unit 30: used for encoding according to the second coding prediction data when the second coding prediction data amount is smaller than the first coding prediction data amount.
Specifically, when the second coding prediction data amount of the selected coding region is smaller than the first coding prediction data amount, the second coding prediction data is encoded as the coding data of that region; the coded data further includes the coding region number, the coding method, and the corresponding decoding model.
Correspondingly, when the second coding prediction data amount of the coding region is greater than or equal to the first coding prediction data amount, the first coding prediction data is encoded as the coding data of that region; the coded data then further includes only the coding region number and the coding method.
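A sketch of the coded data produced for one coding region under the rule above; the field names and the byte-length comparison are assumptions for illustration only:

```python
def region_coded_data(region_number, first_data, second_data, decoding_model_id):
    """Build the coded data for one coding region.  The decoding-model
    identifier is recorded only when the depth-model path is chosen,
    matching the header contents described in the text."""
    if len(second_data) < len(first_data):
        return {"region_number": region_number,
                "coding_mode": "depth_model",
                "decoding_model": decoding_model_id,
                "payload": second_data}
    # Block-based fallback: region number and coding mode only.
    return {"region_number": region_number,
            "coding_mode": "block",
            "payload": first_data}
```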
With the digital-retina-based video coding system of this embodiment of the application, an input image is obtained; the first coding prediction unit 10 performs block-based inter/intra prediction to obtain the first coding prediction data; the second coding prediction unit 20 performs inter/intra prediction based on the depth prediction model according to the input image to obtain the second coding prediction data; and the coding mode decision unit 30 performs encoding according to the second coding prediction data when the second coding prediction data amount is smaller than the first coding prediction data amount. By adaptively selecting the coding compression mode while exploiting the feature extraction capability of the depth model, the redundant information in the image is further compressed, greatly improving coding performance and achieving a higher compression ratio.
Example 3
For details not disclosed in the video encoding apparatus of this embodiment, please refer to the specific implementation of the video encoding method or system in the other embodiments.
A schematic structural diagram of a video encoding apparatus 400 according to an embodiment of the present application is shown in fig. 14.
As shown in fig. 14, the video encoding apparatus 400 includes:
the memory 402: for storing executable instructions; and
a processor 401, coupled to the memory 402, for executing the executable instructions so as to perform the video encoding method described in the other embodiments.
Those skilled in the art will appreciate that fig. 14 is merely an example of the video encoding apparatus 400 and does not constitute a limitation on it; the apparatus may include more or fewer components than shown, combine certain components, or use different components. For example, the video encoding apparatus 400 may further include input/output devices, a network access device, a bus, and so on.
The processor 401 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like; the general-purpose processor may be a microprocessor or any conventional processor. The processor 401 is the control center of the video encoding apparatus 400 and connects the various parts of the entire apparatus through various interfaces and lines.
The memory 402 may be used to store computer readable instructions, and the processor 401 implements the various functions of the video encoding apparatus 400 by running or executing the computer readable instructions or modules stored in the memory 402 and by invoking the data stored in it. The memory 402 may mainly include a program storage area and a data storage area: the program storage area may store the operating system and the application programs required by at least one function (such as a sound playing function or an image playing function); the data storage area may store data created according to the use of the video encoding apparatus 400. In addition, the memory 402 may include a hard disk, a memory, a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, a read-only memory (ROM), a random access memory (RAM), or other non-volatile/volatile storage devices.
If the modules integrated in the video encoding apparatus 400 are implemented as software functional modules and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the methods of the above embodiments may also be completed by computer readable instructions controlling the relevant hardware; the computer readable instructions may be stored in a computer-readable storage medium and, when executed by a processor, implement the steps of the method embodiments.
Example 4
The present embodiment provides a computer-readable storage medium having stored thereon a computer program; the computer program is executed by a processor to implement the video encoding method in other embodiments.
With the video coding system, video coding apparatus, and computer storage medium of the embodiments of the application, an input image is obtained and block-based inter/intra prediction yields the first coding prediction data; inter/intra prediction based on the depth prediction model is performed according to the input image to obtain the second coding prediction data; and when the second coding prediction data amount is smaller than the first coding prediction data amount, encoding is performed according to the second coding prediction data. By adaptively selecting the coding compression mode while exploiting the feature extraction capability of the depth model, the redundant information in the image is further compressed, greatly improving coding performance and achieving a higher compression ratio.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, the information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, without departing from the scope of the present invention, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "when" or "upon" or "in response to determining".
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. A method for video coding based on a digital retina, comprising the steps of:
acquiring an input image, and performing inter-frame/intra-frame prediction based on a block to obtain first coding prediction data;
performing inter-frame/intra-frame prediction based on a depth prediction model according to the input image to obtain second coding prediction data;
and when the second coding prediction data amount is smaller than the first coding prediction data amount, encoding according to the second coding prediction data.
2. The video coding method according to claim 1, wherein a coding region is selected after the input image is acquired, and the video coding method for the coding region specifically comprises:
inter/intra prediction is carried out on the basis of the blocks, and first coding prediction data of the coding area are obtained;
inter-frame/intra-frame prediction is carried out based on a depth prediction model, and second coding prediction data of the coding region are obtained;
when the second coding prediction data amount of the coding region is smaller than the first coding prediction data amount, encoding the second coding prediction data as the coding data of the coding region;
the encoded data further includes the encoding region number, an encoding method, and a corresponding decoding model.
3. The video coding method of claim 2, further comprising:
when the second coding prediction data amount of the coding region is greater than or equal to the first coding prediction data amount, encoding the first coding prediction data as the coding data of the coding region;
wherein the encoded data further includes the encoding region number and an encoding method.
4. The video coding method according to claim 1, wherein the obtaining an input image and performing inter/intra prediction based on a block to obtain first coded prediction data specifically comprises:
inputting the input image into an inter-frame/intra-frame prediction module to obtain a first inter-frame/intra-frame prediction value;
obtaining a first residual value according to the first inter-frame/intra-frame prediction value and the motion compensation value;
and the first residual value passes through a transformer and a quantizer to obtain first coding prediction data.
5. The video coding method according to claim 4, wherein the first inter/intra prediction value is further input to the depth prediction model, and accordingly, the inter/intra prediction is performed based on the depth prediction model according to the input image to obtain second coded prediction data, specifically comprising:
and performing inter/intra prediction based on a depth prediction model according to the input image and the first inter/intra prediction value to obtain second coding prediction data.
6. The video coding method according to claim 1, wherein the performing inter-frame/intra-frame prediction based on the depth prediction model according to the input image to obtain second coding prediction data specifically comprises:
inputting the image into a depth prediction model to obtain a second inter-frame/intra-frame prediction value;
obtaining a second residual value according to the second inter/intra prediction value;
and obtaining second coding prediction data after the second residual error value passes through a quantizer.
7. The video coding method of claim 6, wherein the second residual value is transformed and quantized to obtain the second coding prediction data.
8. The video coding method according to claim 6, wherein the depth prediction model includes a plurality of models, and the inter/intra prediction is performed based on the depth prediction model according to the input image to obtain second coded prediction data, further comprising:
simultaneously inputting the same data frame of the image into a plurality of models of a depth prediction model to obtain a plurality of second inter-frame/intra-frame prediction values;
obtaining a plurality of second residual values according to the plurality of second inter/intra prediction values;
and selecting the minimum second residual value and the corresponding model in the plurality of second residual values.
9. A digital-retina-based video coding system, characterized by specifically comprising:
a first coding prediction unit, used for acquiring an input image and performing inter-frame/intra-frame prediction based on a block to obtain first coding prediction data;
a second coding prediction unit, used for performing inter-frame/intra-frame prediction based on a depth prediction model according to the input image to obtain second coding prediction data;
and a coding mode decision unit, used for encoding according to the second coding prediction data when the second coding prediction data amount is smaller than the first coding prediction data amount.
10. A computer-readable storage medium, having stored thereon a computer program; the computer program is executed by a processor to implement the digital retina-based video encoding method of any one of claims 1-8.
CN202210116127.4A 2022-02-07 2022-02-07 Video coding method, system and storage medium based on digital retina Active CN114157863B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210116127.4A CN114157863B (en) 2022-02-07 2022-02-07 Video coding method, system and storage medium based on digital retina

Publications (2)

Publication Number Publication Date
CN114157863A true CN114157863A (en) 2022-03-08
CN114157863B CN114157863B (en) 2022-07-22

Family

ID=80450376

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210116127.4A Active CN114157863B (en) 2022-02-07 2022-02-07 Video coding method, system and storage medium based on digital retina

Country Status (1)

Country Link
CN (1) CN114157863B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114422805A (en) * 2022-03-30 2022-04-29 浙江智慧视频安防创新中心有限公司 Video coding and decoding method, device and equipment
CN116708793A (en) * 2023-08-09 2023-09-05 腾讯科技(深圳)有限公司 Video transmission method, device, equipment and storage medium

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140002594A1 (en) * 2012-06-29 2014-01-02 Hong Kong Applied Science and Technology Research Institute Company Limited Hybrid skip mode for depth map coding and decoding
CN103686165A (en) * 2012-09-05 2014-03-26 乐金电子(中国)研究开发中心有限公司 Depth image intra-frame coding and decoding method, video encoder and video decoder
CN103974063A (en) * 2013-01-28 2014-08-06 乐金电子(中国)研究开发中心有限公司 Encoding and decoding method of depth model and video coder decoder
CN104010196A (en) * 2014-03-14 2014-08-27 北方工业大学 3D quality scalable video coding method based on HEVC
WO2015000168A1 (en) * 2013-07-05 2015-01-08 Mediatek Singapore Pte. Ltd. A simplified dc prediction method in intra prediction
CN105052147A (en) * 2013-03-27 2015-11-11 高通股份有限公司 Depth coding modes signaling of depth data for 3D-HEVC
CN107396124A (en) * 2017-08-29 2017-11-24 南京大学 Video-frequency compression method based on deep neural network
CN108737841A (en) * 2017-04-21 2018-11-02 腾讯科技(深圳)有限公司 Coding unit depth determination method and device
CN110139102A (en) * 2019-05-23 2019-08-16 北京百度网讯科技有限公司 Prediction technique, device, equipment and the storage medium of video encoding complexity
CN110881125A (en) * 2019-12-10 2020-03-13 中国科学院深圳先进技术研究院 Intra-frame prediction method, video coding method, video decoding method and related equipment
CN111163318A (en) * 2020-01-09 2020-05-15 北京大学 Human-machine vision coding method and device based on feedback optimization
CN111757110A (en) * 2020-07-02 2020-10-09 中实燃气发展(西安)有限公司 Video coding method, coding tree unit dividing method, system, device and readable storage medium
CN113382235A (en) * 2021-08-11 2021-09-10 浙江智慧视频安防创新中心有限公司 Digital retina video processing method and device, electronic equipment and storage medium
CN113840147A (en) * 2021-11-26 2021-12-24 浙江智慧视频安防创新中心有限公司 Video processing method and device based on intelligent digital retina

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHU, TAO: "Depth video coding method based on joint intra-frame and inter-frame prediction", Information Technology *


Also Published As

Publication number Publication date
CN114157863B (en) 2022-07-22

Similar Documents

Publication Publication Date Title
CN114157863B (en) Video coding method, system and storage medium based on digital retina
CN102835111B (en) The motion vector of previous block is used as the motion vector of current block, image to be carried out to the method and apparatus of coding/decoding
CN101540926B (en) Stereo video coding-decoding method based on H.264
CN108989802B (en) HEVC video stream quality estimation method and system by utilizing inter-frame relation
CN105684409A (en) Representing blocks with hash values in video and image coding and decoding
CN103096056B (en) Matrix coder method and apparatus and coding/decoding method and device
US20130034151A1 (en) Flexible codec switching
CN101009839A (en) Method for video encoding or decoding based on orthogonal transform and vector quantization, and apparatus thereof
EP2018070A1 (en) Method for processing images and the corresponding electronic device
CN105474642A (en) Re-encoding image sets using frequency-domain differences
CN102577382B (en) For video encoder and the method and apparatus of effective adaptive-filtering of decoder
CN111131825A (en) Video processing method and related device
WO2022184031A1 (en) Video encoding and decoding method and apparatus
KR20130139285A (en) Methods and apparatus for video encoding and decoding using motion matrix
US20110317912A1 (en) Method, apparatus and computer-readable medium coding and decoding depth image using color image
WO2001069938A1 (en) Coding of digital video with high motion content
CN116582685A (en) AI-based grading residual error coding method, device, equipment and storage medium
JP2022508522A (en) Exponential division method and system
CN102196253A (en) Video coding method and device based on frame type self-adaption selection
CN113422959A (en) Video encoding and decoding method and device, electronic equipment and storage medium
CN102026002A (en) Downsampling transcoding method and device of frame rate, vector reconstruction method and device thereof
CN117459733A (en) Video encoding method, apparatus, device, readable storage medium, and program product
CN101389032A (en) Intra-frame predictive encoding method based on image value interposing
CN104053009A (en) Encoding method of monitoring video and device
KR101396632B1 (en) Residual coding in compliance with a video standard using non-standardized vector quantization coder

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant