CN111212278A - Method and system for predicting displacement frame - Google Patents

Method and system for predicting displacement frame

Info

Publication number
CN111212278A
CN111212278A (application CN202010015402.4A)
Authority
CN
China
Prior art keywords
frame
channel
frames
real
predicted
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010015402.4A
Other languages
Chinese (zh)
Other versions
CN111212278B (en)
Inventor
黄志奇
陈东义
杨雁杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN202010015402.4A priority Critical patent/CN111212278B/en
Publication of CN111212278A publication Critical patent/CN111212278A/en
Application granted
Publication of CN111212278B publication Critical patent/CN111212278B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/30Image reproducers
    • H04N13/398Synchronisation thereof; Control thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Abstract

The invention discloses a method and a system for predicting displacement frames, which solve the frame shifting method's problem of halved effective information. The invention includes a method of predicting displaced frames and a system implementing an improved frame shifting method. At any moment, one of the left and right channels displays a real frame while the other displays a predicted frame. Because the data used to generate a predicted frame comes from real frames, yet a prediction differs from reality, a real frame and a slightly different predicted frame are genuinely displayed at the same time, truly meeting the display requirement of 3D video. The 3D effect generated in the channel data is controllable and adjustable.

Description

Method and system for predicting displacement frame
Technical Field
The invention relates to the field of information processing, in particular to a method and a system for predicting a displacement frame.
Background
Any screen builds its image from three signals: a row scan signal, a column scan signal, and a pixel data signal. The row and column scan signals move across the screen pixel by pixel, much like typing; each time they reach a new pixel, the pixel data signal takes a new value representing that pixel's color. As the scan signals keep moving, the whole screen is eventually colored. When the scan is fast enough (a complete scan takes only 1/25 s, i.e., 25 images can be displayed per second), the displayed video appears dynamic. A single picture in a video is called a frame; at least 25 frames must be displayed per second for the human eye to perceive the video as smooth.
The frame displacement (frame shifting) method converts ordinary 2D video into 3D video. The key principle of 3D video is that a person's left and right eyes see slightly different images, so displaying 3D requires presenting each eye independently with nearly identical images that differ slightly. The frame shifting method distributes the frames of an input 2D video in turn to the displays for the left and right eyes (e.g., VR glasses). Suppose the 2D video contains 60 frames per second: the first frame goes to the left-eye display, the second to the right-eye display, the third to the left eye, and so on. As long as the input video is dynamic, consecutive frames differ (the method fails for static video), so the images shown to the two eyes exhibit the required slight difference and the 2D video is converted into 3D video. However, each single-eye display now shows only 30 frames/second instead of the original 60 frames/second.
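As a concrete illustration of the alternating distribution just described, here is a minimal sketch in Python (the function name and frame labels are ours, not from the patent):

```python
# Sketch of the basic frame shifting method: alternately route the frames of
# a single 2D stream to the left-eye and right-eye displays.
# With 1-based numbering, odd frames go to the left eye, even frames to the right.

def split_frames(frames):
    """Distribute a 2D frame sequence to left/right channels (frame shifting)."""
    left = frames[0::2]   # frames 1, 3, 5, ...
    right = frames[1::2]  # frames 2, 4, 6, ...
    return left, right

frames = [f"frame{i}" for i in range(1, 61)]  # 60 frames within one second
left, right = split_frames(frames)
# Each eye now receives only 30 frames/s -- the halving the invention targets.
```

The halving of the per-eye frame rate falls directly out of the slicing: 60 input frames yield two 30-frame streams.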
The frame shift method has the following defects:
it enhances the 3D effect by reducing the amount of information on each single-eye display (each channel drops from 60 frames/s to 30 frames/s);
because the odd and even frames of the same video are distributed to the left and right displays, the two displays actually update alternately: there is no moment at which the left and right eyes receive images simultaneously. The images merely update fast enough that the eyes cannot respond, so the left and right images are perceived as displayed simultaneously.
Disclosure of Invention
The invention provides a method and a system for predicting displacement frames, solving the frame shifting method's problem of halved effective information.
The invention is realized by the following technical scheme:
a method of predicting displaced frames, comprising the processing steps of, for a plurality of image frames in a dynamic video:
step 1: obtaining a plurality of image frames on a single channel of a dynamic video and allocating them as real frames to a channel A and a channel B: numbering the real frames sequentially as real frame 1, real frame 2, real frame 3, …, real frame N, where N is the number of image frames, pre-allocating the odd-numbered real frames in order to channel A and the even-numbered real frames in order to channel B;
step 2: inserting predicted frames at the temporally blank positions of channel A, numbered predicted frame A1, predicted frame A2, predicted frame A3, …, and inserting predicted frames at the temporally blank positions of channel B, numbered predicted frame B1, predicted frame B2, predicted frame B3, …; the complete time-ordered frame sequence of channel A then being: real frame 1, predicted frame A1, real frame 3, predicted frame A2, real frame 5, predicted frame A3, …, and that of channel B being: real frame 2, predicted frame B1, real frame 4, predicted frame B2, real frame 6, predicted frame B3, …;
step 3: according to the inter-frame prediction method based on image-block segmentation, predicting from each pair of consecutive real frames in a channel the first predicted frame following the second of the two real frames, thereby obtaining in sequence all predicted frames except predicted frame A1 and predicted frame B1;
step 4: obtaining all frame data in the respective channels and outputting it downstream.
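Steps 1 and 2 can be sketched as follows (a hedged illustration; the function and frame names are ours). Real frames alternate between the channels while each temporal gap is filled by a numbered predicted frame:

```python
# Build the per-channel timelines of steps 1-2: odd real frames go to channel A,
# even real frames to channel B, and each blank position is filled by a
# predicted frame (predicted frames A1/B1 are later left undisplayed per the text).

def build_channels(n_frames):
    a, b = [], []
    for i in range(1, n_frames + 1):
        if i % 2 == 1:                            # odd real frame -> channel A
            a.append(f"real{i}")
            if i > 1:
                b.append(f"predB{(i - 1) // 2}")  # channel B fills its gap
        else:                                     # even real frame -> channel B
            b.append(f"real{i}")
            a.append(f"predA{i // 2}")            # channel A fills its gap
    return a, b

a, b = build_channels(7)
# a == ['real1', 'predA1', 'real3', 'predA2', 'real5', 'predA3', 'real7']
# b == ['real2', 'predB1', 'real4', 'predB2', 'real6', 'predB3']
```

The resulting sequences match the numbering given in step 2 of the text.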
Further, the channel is a data channel of the 3D device.
Further, the channel a is a left eye channel, and the channel B is a right eye channel.
Further, the channel a is a right eye channel, and the channel B is a left eye channel.
Further, in step 3, the image-block division used by the inter-frame prediction method based on image-block segmentation is detailed as follows:
following the row-column scan display principle, each whole image of resolution x × y is divided into image blocks. A row scan count signal hcnt and a column scan count signal vcnt are set up as nested loops: hcnt cyclically increments from 1 to x, clears to 1 after reaching x, and repeats; each time hcnt completes a count to x, vcnt increments by 1, with vcnt ranging from 1 to y. The result is two scan signals that continuously sweep across the screen;
at the same time, for a division into n blocks per row and per column, the x pixels of a row and the y pixels of a column are each divided equally into n parts, giving n² rectangular blocks of (x/n) × (y/n) pixels; the division is made by explicitly designating pixel boundaries;
the image block to which the current scan count position belongs is determined, and the pixel data of all points within the same block is processed uniformly, the processing differing between blocks. Any block contains m = (x/n) × (y/n) points, each with corresponding pixel information consisting of two numbers in the range 0–255; a numerical operation over the m pieces of pixel information yields the block's feature vector, which has two dimensions: one representing luminance and the other representing color.
Further, in step 3, the prediction procedure of the inter-frame prediction method based on image-block segmentation is detailed as follows:
preprocessing: the images are scanned in sequence to obtain the data of all n² feature vectors, which is stored. In step 3, the first of the two time-ordered real frames in each channel is scanned completely, then the second real frame is scanned; both real-frame images are divided into n² image blocks by the segmentation method above, all blocks are stored, and each position yields a pair of co-located image blocks in the two real-frame images;
the prediction algorithm processing process: the feature vectors of the two groups of image blocks are input into the prediction algorithm. The first real frame is the reference frame; the feature vector of a block in the reference frame is S1 = (S1_liang, S1_se), with luminance component S1_liang and color component S1_se. The second real frame is the current frame; the feature vector of the co-located block in the current frame is S2 = (S2_liang, S2_se), with luminance component S2_liang and color component S2_se. The change vector between the two frames of video is then S = S2 - S1. Applying the change vector S to the current-frame feature vector S2 yields S3 = S2 + S, the feature vector of the corresponding block of the first predicted frame after the second real frame (the frame to be predicted) in step 3. The feature vector S3 is used as the luminance and color data of the entire block to generate a whole new image block; generating all n² blocks in sequence yields a whole new image;
recursion: the same prediction algorithm processing is applied repeatedly, continuously generating new predicted frames and inserting them into the channel in sequence, and so on.
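The change-vector step can be sketched as below. This is our reading of "applying the change vector" as linear extrapolation (S3 = S2 + S); the function and variable names are illustrative:

```python
# Per-block prediction: the change vector S = S2 - S1 between the reference
# frame and the current frame is applied to S2, giving the predicted block's
# feature vector S3 = S2 + S (linear extrapolation of the inter-frame change).

def predict_block(s1, s2):
    """s1, s2: (liang, se) feature vectors of co-located blocks in two real frames."""
    s = (s2[0] - s1[0], s2[1] - s1[1])   # change vector S = S2 - S1
    return (s2[0] + s[0], s2[1] + s[1])  # predicted feature vector S3

s1 = (100.0, 50.0)  # block in the reference frame (first real frame)
s2 = (110.0, 55.0)  # same block in the current frame (second real frame)
s3 = predict_block(s1, s2)
# s3 == (120.0, 60.0): the brightness/color trend continues one step further.
```

For the recursion, the predicted frame's vectors would then serve together with the next real frame as the next (S1, S2) pair.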
Further, the numerical operation is an averaging operation, a median operation, a mode operation, or the like; the choice of operation is determined by the actual effect in the application scene.
Further, predicted frame A1 and predicted frame B1 are not displayed in their channels; likewise, nothing is displayed in channel B at the position corresponding to real frame 1 in channel A. This has no effect on the practical result: when a 2D video data stream is converted into the two channels of 3D data, only the initial three frames are lost, and since there are many frames within 1 s, the video observed by the human eye remains smooth as long as at least 25 frames are displayed per second.
Further, the specific difference between the real frames and predicted frames of the two channels depends on the prediction algorithm used and can be controlled through factors such as the choice of feature vectors and the block size; the prediction processing is therefore controllable, and the optimal scheme can be tuned to the actual situation of the application scene.
A system for implementing an improved frame shifting method comprises a scan counting module, a positioning module, a predicted frame generation module, a channel switching module, and output channels. The scan counting module scans and numbers the dynamic video into a plurality of real frames and sends them to the positioning module. The positioning module processes and stores the image data and outputs it to the predicted frame generation module: single-channel video source data is input, the positioning module performs image segmentation, and the feature vectors of the image blocks are extracted. The predicted frame generation module processes the feature vectors extracted by the positioning module with the prediction algorithm, then generates and numbers a plurality of predicted frames. The channel switching module switches the predicted frames to the two channels by time slot and inserts them into the corresponding time slots; it allocates odd-numbered real frames to channel A and even-numbered real frames to channel B.
Further, the system also includes a left-eye video output channel and a right-eye video output channel, with two cases. Case 1: channel A is connected to the left-eye video output channel and channel B to the right-eye video output channel. Case 2: channel A is connected to the right-eye video output channel and channel B to the left-eye video output channel.
The invention has the following advantages and beneficial effects:
the invention fills the problem of halving effective information of a frame displacement method by inserting 'prediction frames'.
In the invention, at any moment one of the left and right channels displays a real frame while the other displays a predicted frame. Because the data used to generate a predicted frame comes from real frames, yet a prediction differs from reality, a real frame and a slightly different predicted frame are genuinely displayed at the same time, truly meeting the display requirement of 3D video.
The 3D effect generated in the channel data is controllable and adjustable.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:
FIG. 1 is a schematic flow chart of the method of the present invention.
FIG. 2 is a system diagram of the present invention.
Detailed Description
Before any embodiments of the invention are explained in detail, it is to be understood that the invention is not limited in its application to the details of construction and the arrangements of components set forth in the following description or illustrated in the drawings. The invention is capable of other embodiments and of being practiced or of being carried out in various ways. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any inventive changes, are within the scope of the present invention.
A method for predicting displaced frames, as shown in fig. 1, includes the following processing steps for a plurality of image frames in a dynamic video:
step 1: obtaining a plurality of image frames on a single channel of a dynamic video and allocating them as real frames to a channel A and a channel B: numbering the real frames sequentially as real frame 1, real frame 2, real frame 3, …, real frame N, where N is the number of image frames, pre-allocating the odd-numbered real frames in order to channel A and the even-numbered real frames in order to channel B;
step 2: inserting predicted frames at the temporally blank positions of channel A, numbered predicted frame A1, predicted frame A2, predicted frame A3, …, and inserting predicted frames at the temporally blank positions of channel B, numbered predicted frame B1, predicted frame B2, predicted frame B3, …; the complete time-ordered frame sequence of channel A then being: real frame 1, predicted frame A1, real frame 3, predicted frame A2, real frame 5, predicted frame A3, …, and that of channel B being: real frame 2, predicted frame B1, real frame 4, predicted frame B2, real frame 6, predicted frame B3, …;
step 3: according to the inter-frame prediction method based on image-block segmentation, predicting from each pair of consecutive real frames in a channel the first predicted frame following the second of the two real frames, thereby obtaining in sequence all predicted frames except predicted frame A1 and predicted frame B1;
step 4: obtaining all frame data in the respective channels and outputting it downstream.
Further, the channel is a data channel of the 3D device.
Further, the channel a is a left eye channel, and the channel B is a right eye channel.
Further, the channel a is a right eye channel, and the channel B is a left eye channel.
Further, in step 3, the image-block division used by the inter-frame prediction method based on image-block segmentation is detailed as follows:
following the row-column scan display principle, each whole image of resolution x × y is divided into image blocks. A row scan count signal hcnt and a column scan count signal vcnt are set up as nested loops: hcnt cyclically increments from 1 to x, clears to 1 after reaching x, and repeats; each time hcnt completes a count to x, vcnt increments by 1, with vcnt ranging from 1 to y. The result is two scan signals that continuously sweep across the screen;
the implementation is visible to code segment 1.
Figure BDA0002358686090000051
Figure BDA0002358686090000061
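Since code segment 1 survives only as image placeholders, here is a behavioral sketch of the nested scan counters in Python rather than the original Verilog (names are ours):

```python
# Behavioral model of the row/column scan counters: hcnt cycles 1..x and, each
# time it wraps, vcnt increments by 1 over the range 1..y, so the pair sweeps
# every pixel position in row-scan order.

def scan_positions(x, y):
    """Yield (hcnt, vcnt) pairs in the order the nested counters produce them."""
    for vcnt in range(1, y + 1):
        for hcnt in range(1, x + 1):
            yield hcnt, vcnt

pos = list(scan_positions(3, 2))
# pos == [(1, 1), (2, 1), (3, 1), (1, 2), (2, 2), (3, 2)]
```

In hardware, the same behavior would come from two counters clocked by the pixel clock, with vcnt enabled on hcnt's wrap-around.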
At the same time, for a division into n blocks per row and per column, the x pixels of a row and the y pixels of a column are each divided equally into n parts, giving n² rectangular blocks of (x/n) × (y/n) pixels; the division is made by explicitly designating pixel boundaries. See code segment 2 for details.
wire block1_1=((vcnt>=`V_SYNC+`V_BACK+`V_DISP/n*0&&vcnt<`V_SYNC+`V_BACK+`V_DISP/n*1)&&(hcnt>=`H_SYNC+`H_BACK+`H_DISP/n*0&&hcnt<`H_SYNC+`H_BACK+`H_DISP/n*1))?1'b1:1'b0;
wire block1_2=((vcnt>=`V_SYNC+`V_BACK+`V_DISP/n*0&&vcnt<`V_SYNC+`V_BACK+`V_DISP/n*1)&&(hcnt>=`H_SYNC+`H_BACK+`H_DISP/n*1&&hcnt<`H_SYNC+`H_BACK+`H_DISP/n*2))?1'b1:1'b0;
wire block1_3=((vcnt>=`V_SYNC+`V_BACK+`V_DISP/n*0&&vcnt<`V_SYNC+`V_BACK+`V_DISP/n*1)&&(hcnt>=`H_SYNC+`H_BACK+`H_DISP/n*2&&hcnt<`H_SYNC+`H_BACK+`H_DISP/n*3))?1'b1:1'b0;
wire block1_4=((vcnt>=`V_SYNC+`V_BACK+`V_DISP/n*0&&vcnt<`V_SYNC+`V_BACK+`V_DISP/n*1)&&(hcnt>=`H_SYNC+`H_BACK+`H_DISP/n*3&&hcnt<`H_SYNC+`H_BACK+`H_DISP/n*4))?1'b1:1'b0;
// ... n² such wire declarations in total; the final rows below mix literal
// indices with n, which is consistent only for the example value n = 20 ...
wire blockn_18=((vcnt>=`V_SYNC+`V_BACK+`V_DISP/n*19&&vcnt<`V_SYNC+`V_BACK+`V_DISP/n*n)&&(hcnt>=`H_SYNC+`H_BACK+`H_DISP/n*17&&hcnt<`H_SYNC+`H_BACK+`H_DISP/n*(n-2)))?1'b1:1'b0;
wire blockn_19=((vcnt>=`V_SYNC+`V_BACK+`V_DISP/n*19&&vcnt<`V_SYNC+`V_BACK+`V_DISP/n*n)&&(hcnt>=`H_SYNC+`H_BACK+`H_DISP/n*18&&hcnt<`H_SYNC+`H_BACK+`H_DISP/n*(n-1)))?1'b1:1'b0;
wire blockn_n=((vcnt>=`V_SYNC+`V_BACK+`V_DISP/n*19&&vcnt<`V_SYNC+`V_BACK+`V_DISP/n*n)&&(hcnt>=`H_SYNC+`H_BACK+`H_DISP/n*19&&hcnt<`H_SYNC+`H_BACK+`H_DISP/n*n))?1'b1:1'b0;
The image block to which the current scan count position belongs is determined, and the pixel data of all points within the same block is processed uniformly, the processing differing between blocks. Any block contains m = (x/n) × (y/n) points, each with corresponding pixel information consisting of two numbers in the range 0–255; a numerical operation over the m pieces of pixel information yields the block's feature vector, which has two dimensions: one representing luminance and the other representing color.
Further, in step 3, the prediction procedure of the inter-frame prediction method based on image-block segmentation is detailed as follows:
preprocessing: the images are scanned in sequence to obtain the data of all n² feature vectors, which is stored. In step 3, the first of the two time-ordered real frames in each channel is scanned completely, then the second real frame is scanned; both real-frame images are divided into n² image blocks by the segmentation method above, all blocks are stored, and each position yields a pair of co-located image blocks in the two real-frame images;
the prediction algorithm processing process: the feature vectors of the two groups of image blocks are input into the prediction algorithm. The first real frame is the reference frame; the feature vector of a block in the reference frame is S1 = (S1_liang, S1_se), with luminance component S1_liang and color component S1_se. The second real frame is the current frame; the feature vector of the co-located block in the current frame is S2 = (S2_liang, S2_se), with luminance component S2_liang and color component S2_se. The change vector between the two frames of video is then S = S2 - S1. Applying the change vector S to the current-frame feature vector S2 yields S3 = S2 + S, the feature vector of the corresponding block of the first predicted frame after the second real frame (the frame to be predicted) in step 3. The feature vector S3 is used as the luminance and color data of the entire block to generate a whole new image block; generating all n² blocks in sequence yields a whole new image;
recursion: the same prediction algorithm processing is applied repeatedly, continuously generating new predicted frames and inserting them into the channel in sequence, and so on.
Further, the numerical operation is an averaging operation, a median operation, a mode operation, or the like; the choice of operation is determined by the actual effect in the application scene.
Further, the specific difference between the real frames and predicted frames of the two channels depends on the prediction algorithm used and can be controlled through factors such as the choice of feature vectors and the block size; the prediction processing is therefore controllable, and the optimal scheme can be tuned to the actual situation of the application scene.
In one embodiment, the above method is applied to the left and right channels and a left-right channel switching module, similar to a shunt switch, is added to alternately distribute the input single-channel data to the left and right channels, with channel A as channel L and channel B as channel R. The final result is as follows: at the first moment, the left eye displays real frame 1 and the right eye displays nothing; at the second moment, the left eye displays nothing and the right eye displays real frame 2; at the third moment, the left eye displays real frame 3 and the right eye displays nothing; at the fourth moment, the left eye displays predicted frame L2 and the right eye displays real frame 4; at the fifth moment, the left eye displays real frame 5 and the right eye displays predicted frame R2; at the sixth moment, the left eye displays predicted frame L3 and the right eye displays real frame 6; at the seventh moment, the left eye displays real frame 7 and the right eye displays predicted frame R3; and so on.
The channel switching module code appears in the source only as an image (Figure BDA0002358686090000081).
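Because the channel switching module's code is only an image placeholder in this extraction, the per-moment display schedule of the embodiment above can be sketched behaviorally instead (a hedged model; `display_at` and the frame labels are ours):

```python
# Per-time-slot display schedule of the embodiment: odd slots show a real frame
# on the left eye, even slots on the right eye; the opposite eye shows a
# predicted frame, except that predL1/predR1 (the first predictions) are
# never displayed, leaving the earliest slots half-blank.

def display_at(t):
    """Return (left, right) content for 1-based time slot t; None = blank."""
    if t % 2 == 1:                      # odd slot: left eye gets real frame t
        k = (t - 1) // 2
        return f"real{t}", (f"predR{k}" if k >= 2 else None)
    else:                               # even slot: right eye gets real frame t
        k = t // 2
        return (f"predL{k}" if k >= 2 else None), f"real{t}"

schedule = [display_at(t) for t in range(1, 8)]
# e.g. slot 1 -> ('real1', None), slot 4 -> ('predL2', 'real4'),
#      slot 7 -> ('real7', 'predR3')
```

From slot 4 onward both eyes always receive an image, one real and one predicted, which is the behavior the embodiment describes.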
Based on another embodiment of the foregoing, a system for implementing an improved frame shifting method comprises, as shown in fig. 2, a scan counting module, a positioning module, a predicted frame generation module, a channel switching module, and output channels. The scan counting module scans and numbers the dynamic video into a plurality of real frames and sends them to the positioning module. The positioning module processes and stores the image data and outputs it to the predicted frame generation module: single-channel video source data is input, the positioning module performs image segmentation, and the feature vectors of the image blocks are extracted. The predicted frame generation module processes the feature vectors extracted by the positioning module with the prediction algorithm, then generates and numbers a plurality of predicted frames. The channel switching module switches the predicted frames to the two channels by time slot and inserts them into the corresponding time slots; it allocates odd-numbered real frames to channel A and even-numbered real frames to channel B.
Preferably, the system also includes a left-eye video output channel and a right-eye video output channel, with two cases. Case 1: channel A is connected to the left-eye video output channel and channel B to the right-eye video output channel. Case 2: channel A is connected to the right-eye video output channel and channel B to the left-eye video output channel.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (9)

1. A method for predicting displaced frames, comprising the steps of:
step 1: obtaining a plurality of image frames on a single channel of a dynamic video and allocating them as real frames to a channel A and a channel B: numbering the real frames sequentially as real frame 1, real frame 2, real frame 3, …, real frame N, where N is the number of image frames, pre-allocating the odd-numbered real frames in order to channel A and the even-numbered real frames in order to channel B;
step 2: inserting predicted frames at the temporally blank positions of channel A, numbered predicted frame A1, predicted frame A2, predicted frame A3, …, and inserting predicted frames at the temporally blank positions of channel B, numbered predicted frame B1, predicted frame B2, predicted frame B3, …; the complete time-ordered frame sequence of channel A then being: real frame 1, predicted frame A1, real frame 3, predicted frame A2, real frame 5, predicted frame A3, …, and that of channel B being: real frame 2, predicted frame B1, real frame 4, predicted frame B2, real frame 6, predicted frame B3, …;
step 3: according to the inter-frame prediction method based on image-block segmentation, predicting from each pair of consecutive real frames in a channel the first predicted frame following the second of the two real frames, thereby obtaining in sequence all predicted frames except predicted frame A1 and predicted frame B1;
step 4: obtaining all frame data in the respective channels and outputting it downstream.
2. The method of claim 1, wherein the channel is a data channel of a 3D device.
3. The method of claim 2, wherein the channel A is a left-eye channel and the channel B is a right-eye channel.
4. The method of claim 2, wherein the channel A is a right-eye channel and the channel B is a left-eye channel.
5. The method of claim 1, wherein in step 3, a tile division method in the inter-frame prediction method based on tile division is detailed as follows:
dividing a whole image into a plurality of image blocks according to the image resolution x y according to the row-column scanning display principle, dividing each image frame into a plurality of image blocks, setting a row scanning counting signal hcnt and a column scanning counting signal vcnt, nesting and circulating the hcnt signal and the vcnt signal, circularly increasing the hcnt from 1 to x, clearing to 1 after x, repeating the operation, increasing the vcnt by 1 after the hcnt counts to x, wherein the vcnt ranges from 1 to y, and obtaining two scanning signals which continuously shift on a screen;
at the same time, according to the division of the row and column n blocks, equally dividing x pixels of the row and y pixels of the column into n parts respectively to obtain n2The method comprises the following steps that (x/n) × (y/n) rectangular blocks are divided by artificially designating pixels;
the tile to which the current scan-count signal belongs is determined, and the pixel data of all points in the same tile are processed together; any tile contains m = (x/n) × (y/n) points, each with corresponding pixel information consisting of two numbers in the range 0 to 255; a numerical operation over the m pixel values yields the tile's feature vector, which has two dimensions, one representing luminance and the other representing color.
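The scan counters and feature-vector extraction of claim 5 can be sketched in Python (an illustrative reading, not the patented implementation; the `frame[row][col] = (luminance, color)` pixel layout is an assumption, and the mean is used as the numerical operation, one of the options named in claim 7):

```python
def scan_positions(x, y):
    """Row-column scan counters: hcnt cycles 1..x and, each time it
    wraps back to 1, vcnt advances by 1 through 1..y, sweeping the
    whole frame once."""
    for vcnt in range(1, y + 1):
        for hcnt in range(1, x + 1):
            yield hcnt, vcnt

def tile_feature_vectors(frame, n):
    """Divide an x*y frame into n*n tiles of (x/n)*(y/n) pixels and
    reduce each tile's m pixel values to a two-dimensional feature
    vector (luminance, color) by averaging.  frame[row][col] is
    assumed to be a (luminance, color) pair, both in 0..255."""
    y, x = len(frame), len(frame[0])
    bh, bw = y // n, x // n          # tile height and width in pixels
    sums = [[0.0, 0.0] for _ in range(n * n)]
    for hcnt, vcnt in scan_positions(x, y):
        # tile to which the current scan-count signal belongs
        tile = ((vcnt - 1) // bh) * n + (hcnt - 1) // bw
        liang, se = frame[vcnt - 1][hcnt - 1]
        sums[tile][0] += liang
        sums[tile][1] += se
    m = bh * bw                      # m = (x/n) * (y/n) points per tile
    return [(s[0] / m, s[1] / m) for s in sums]
```

For a frame whose pixels all equal (100, 50) and n = 2, every one of the four tiles reduces to the feature vector (100.0, 50.0).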
6. The method of claim 5, wherein in step 3 the prediction part of the inter-frame prediction method based on tile segmentation is as follows:
preprocessing: the images are scanned in sequence and the feature-vector data of all n² tiles are stored; in step 3, the first of the two consecutive real-frame images in each channel is scanned completely, then the second; both real-frame images are divided into n² tiles by the tile-segmentation method and all tiles are stored, pairing the two tiles at the same position in the two real-frame images;
the prediction algorithm process: the feature vectors of the two groups of tiles are input into the prediction algorithm; the first real frame is the reference frame, and the feature vector of a tile in the reference frame is S1 = (S1_liang, S1_se), where S1_liang is the luminance component and S1_se the color component of S1; the second real frame is the current frame, and the feature vector of the corresponding tile in the current frame is S2 = (S2_liang, S2_se), where S2_liang is the luminance component and S2_se the color component of S2; the change vector between the two successive video frames is then S = S2 - S1; applying the change vector S to the feature vector S2 of the tile in the current frame yields the feature vector S3 of the corresponding tile of the frame to be predicted, namely the first predicted frame after the second real frame in step 3; the feature vector S3 is used as the luminance and color data of the whole tile to generate a new tile, and generating all n² new tiles in sequence yields a complete new image;
recursion: by the same prediction algorithm process, new predicted frames are generated continuously and inserted into the channel in sequence.
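The change-vector step of claim 6 amounts to per-component linear extrapolation. A minimal sketch, reading "applying the change vector S to S2" as S3 = S2 + S (an interpretation of the claim's wording, not its literal text):

```python
def predict_frame(s1_tiles, s2_tiles):
    """Given the per-tile feature vectors of the reference frame (S1)
    and the current frame (S2), compute the change vector S = S2 - S1
    for each tile and the predicted tile S3 = S2 + S = 2*S2 - S1,
    separately for the luminance (liang) and color (se) components."""
    s3_tiles = []
    for (s1_liang, s1_se), (s2_liang, s2_se) in zip(s1_tiles, s2_tiles):
        s_liang = s2_liang - s1_liang   # luminance change vector
        s_se = s2_se - s1_se            # color change vector
        s3_tiles.append((s2_liang + s_liang, s2_se + s_se))
    return s3_tiles
```

Under the claim's recursion, this process repeats for each successive pair of real frames in a channel, each call yielding the tile feature vectors of the next predicted frame to insert.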
7. The method of claim 5, wherein the numerical operation is an averaging operation, a median operation, a mode operation, or the like.
8. A system for implementing the displacement-frame prediction method of claim 5, comprising a scan counting module, a positioning module, a predicted-frame generation module, a channel switching module and output channels. The scan counting module scans the dynamic video, numbers it into a sequence of real frames, and sends them to the positioning module; the positioning module processes and stores the image data and outputs it to the predicted-frame generation module: single-channel video source data is input, and the positioning module performs tile segmentation and extracts the tiles' feature vectors; the predicted-frame generation module processes the feature vectors extracted by the positioning module with the prediction algorithm to generate and number a sequence of predicted frames; the channel switching module time-slot-switches the predicted frames onto the two channels and inserts them into the corresponding slots, assigning odd-numbered real frames to channel A and even-numbered real frames to channel B.
9. The system of claim 8, further comprising a left-eye video output channel and a right-eye video output channel, connected in one of two ways. Case 1: channel A is connected to the left-eye video output channel and channel B to the right-eye video output channel; case 2: channel A is connected to the right-eye video output channel and channel B to the left-eye video output channel.
CN202010015402.4A 2020-01-07 2020-01-07 Method and system for predicting displacement frame Active CN111212278B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010015402.4A CN111212278B (en) 2020-01-07 2020-01-07 Method and system for predicting displacement frame


Publications (2)

Publication Number Publication Date
CN111212278A true CN111212278A (en) 2020-05-29
CN111212278B CN111212278B (en) 2021-08-03

Family

ID=70787144

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010015402.4A Active CN111212278B (en) 2020-01-07 2020-01-07 Method and system for predicting displacement frame

Country Status (1)

Country Link
CN (1) CN111212278B (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006078249A1 (en) * 2005-01-21 2006-07-27 In-Three, Inc. Method for minimizing visual artifacts converting two-dimensional motion pictures into three-dimensional motion pictures
CN102215416A (en) * 2010-04-09 2011-10-12 汤姆森特许公司 Method for processing stereoscopic images and corresponding device
CN102469340A (en) * 2010-11-17 2012-05-23 三星电子株式会社 Display apparatus and method of driving the same
CN102598683A (en) * 2010-09-17 2012-07-18 松下电器产业株式会社 Stereoscopic video creation device and stereoscopic video creation method
CN103548343A (en) * 2011-05-17 2014-01-29 三星电子株式会社 Apparatus and method for converting 2d content into 3d content, and computer-readable storage medium thereof


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Sang Kun: "Research on 2D-to-3D Conversion Algorithms for Video Images", China Master's Theses Full-text Database (Electronic Journal), Information Science and Technology Section *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116664630A (en) * 2023-08-01 2023-08-29 荣耀终端有限公司 Image processing method and electronic equipment
CN116664630B (en) * 2023-08-01 2023-11-14 荣耀终端有限公司 Image processing method and electronic equipment

Also Published As

Publication number Publication date
CN111212278B (en) 2021-08-03

Similar Documents

Publication Publication Date Title
JP4994622B2 (en) Method for generating stereoscopic video signal and scaling method suitable for the method
JP4843753B2 (en) Three-dimensional image generation method and apparatus
CN105681777A (en) Glasses-free 3D display method and system
WO2005120085A1 (en) Interlacing apparatus, deinterlacing apparatus, display, image compressor and image decompressor
US7668241B2 (en) Apparatus and method for generating 3D image signal using space-division method
JP2010081330A (en) Signal processing method and apparatus in three-dimensional image display
WO2011099267A1 (en) Video processing device and video processing method
US8412000B2 (en) System and method for reducing motion artifacts by displaying partial-resolution images
WO2011061973A1 (en) Three-dimensional image display device and method of deriving motion vector
JP2010226243A (en) Image signal processing device, three-dimensional image display device, three-dimensional image transmission/display system, and image signal processing method
CN111212278B (en) Method and system for predicting displacement frame
CN101626517B (en) Method for synthesizing stereo image from parallax image in a real-time manner
TWI499279B (en) Image processing apparatus and method thereof
Kim et al. Object-based stereoscopic conversion of MPEG-4 encoded data
JP5700998B2 (en) 3D image display apparatus and control method thereof
JPH0865713A (en) Stereoscopic image signal conversion device
JP2000244946A (en) Converter for stereoscopic video signal
JP3157449B2 (en) Image display device
JPH099293A (en) Three dimensional picture display method and stereoscopic picture data generating method
JP2846843B2 (en) How to convert 2D video to 3D video
KR102492367B1 (en) Image data generating method and stereoscopic image display system using the same
CN115442577A (en) Multi-view arrangement method for 3D light field display
JPH09116929A (en) Method for converting two-dimensional video into three-dimensional video
JP2012212984A (en) Image processing device and image processing method
JPH07274210A (en) Method and device for converting two-dimension image into three-dimension image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant