CN117041597B - Video encoding and decoding methods and devices, electronic equipment and storage medium - Google Patents


Info

Publication number
CN117041597B
CN117041597B (application CN202311295212.2A)
Authority
CN
China
Prior art keywords
frame
motion
data
image
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311295212.2A
Other languages
Chinese (zh)
Other versions
CN117041597A (en)
Inventor
张健
刘建阳
文博
潘建东
Current Assignee
China Securities Co Ltd
Original Assignee
China Securities Co Ltd
Priority date
Filing date
Publication date
Application filed by China Securities Co Ltd filed Critical China Securities Co Ltd
Priority to CN202311295212.2A
Publication of CN117041597A
Application granted
Publication of CN117041597B
Legal status: Active
Anticipated expiration

Classifications

    • H04N19/52 — Processing of motion vectors by predictive encoding (under H: electricity; H04: electric communication technique; H04N: pictorial communication, e.g. television; H04N19/00: coding/decoding of digital video signals; H04N19/50: predictive coding; H04N19/503: temporal prediction; H04N19/51: motion estimation or motion compensation; H04N19/513 and H04N19/517: processing of motion vectors by encoding)
    • H04N19/176 — Adaptive coding in which the coding unit is a block, e.g. a macroblock (under H04N19/10: adaptive coding; H04N19/169: characterised by the coding unit; H04N19/17: the unit being an image region, e.g. an object)

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The embodiment of the invention provides a video encoding method, a video decoding method, corresponding devices, an electronic device and a storage medium, relating to the technical field of video communication. The video encoding method comprises the following steps: judging whether the image content of a first I frame is similar to the image content of a second I frame; if so, determining the motion region in the first I frame that generates motion relative to the second I frame; performing motion vector estimation on an encoding block in the motion region based on the second I frame to obtain the reference block in the second I frame that corresponds to the encoding block; encoding the encoding block based on the reference block to obtain encoded data of the encoding block; and generating encoded data of the first I frame that comprises an encoding mode identifier, the number of motion regions, identification information of the motion regions, the length of the obtained encoded data, and the obtained encoded data itself. By applying the scheme provided by the embodiment of the invention, the video bit rate can be reduced, and the requirement on network bandwidth during video communication can be lowered.

Description

Video encoding and decoding methods and devices, electronic equipment and storage medium
Technical Field
The present invention relates to the field of video communication technologies, and in particular, to a video encoding method, a video decoding method, a video encoding device, a video decoding device, an electronic device, and a storage medium.
Background
With the development of communication technology, video communication has been widely used in various fields. The video to be communicated may be video collected by a camera, video collected by software that can share a desktop, or the like. When video communication is performed, video frames need to be encoded to obtain encoded data, and then the video communication is realized by transmitting the encoded data.
In the prior art, when video frames are encoded, they are typically divided into three frame types: I frames, P frames and B frames. An I frame is generally encoded in an intra-frame encoding mode, so the information considered during encoding is mainly the information inside the I frame itself, and relatively little reference information is available. As a result, the compression rate of I frames is low and the amount of encoded data for an I frame is large, which easily inflates the bit rate of the encoded video and imposes a high network bandwidth requirement during video communication.
Disclosure of Invention
The embodiment of the invention aims to provide a video coding and decoding method, a video coding and decoding device, electronic equipment and a storage medium, so as to reduce video code rate and reduce network bandwidth requirements in the video communication process. The specific technical scheme is as follows:
According to a first aspect of an embodiment of the present invention, there is provided a video encoding method, the method including:
judging whether the image content of a first I frame is similar to the image content of a second I frame, wherein the second I frame is a reconstructed frame of a previous I frame of the first I frame;
if the image content of the first I frame is similar to the image content of the second I frame, determining a motion area generating motion relative to the second I frame in the first I frame;
performing motion vector estimation on the coding block in the motion area based on the second I frame to obtain a corresponding reference block of the coding block in the second I frame;
encoding the encoding block based on the obtained reference block to obtain encoding data of the encoding block;
generating encoded data of the first I frame that comprises an encoding mode identifier, the number of motion regions, identification information of the motion regions, the length of the obtained encoded data, and the obtained encoded data, wherein the encoding mode identifier indicates that the first I frame is encoded based on a non-intra-frame encoding mode.
Optionally, the determining a motion region in the first I-frame that generates motion relative to the second I-frame includes:
obtaining a first luminance image of the first I frame and a second luminance image of the second I frame;
Obtaining an offset image of the first I frame relative to the second I frame based on the difference between the brightness value of each pixel point in the first brightness image and the brightness value of each pixel point in the second brightness image;
and determining a motion area generating motion relative to a second I frame in the first I frame according to an area formed by connecting pixel points with brightness values different from 0 in the offset image.
Optionally, the determining, according to the area formed by connecting the pixels with luminance values other than 0 in the offset image, a motion area in the first I frame that generates motion relative to the second I frame includes:
determining an offset region based on a region formed by pixel points with brightness values not smaller than a brightness threshold in the offset image;
and determining the region in the first I frame that corresponds to the offset region as the motion region in the first I frame that generates motion relative to the second I frame.
Optionally, the determining the offset area based on the area formed by the pixels with the brightness value not less than the brightness threshold in the offset image includes:
sequentially performing an erosion operation, a dilation operation and an erosion operation on the region formed by connecting the pixels whose luminance values are not less than a luminance threshold in the offset image, based on a preset erosion kernel and a preset dilation kernel;
and determining the region formed by the pixels whose luminance values are not less than the luminance threshold in the processed offset image as the offset region.
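A minimal sketch of the erosion–dilation–erosion cleanup described above, using hand-rolled 3×3 binary morphology in NumPy; the kernel size, luminance threshold and array contents are illustrative assumptions, not values taken from the patent:

```python
import numpy as np

def erode(mask: np.ndarray) -> np.ndarray:
    """Binary erosion with a 3x3 all-ones kernel (zero padding)."""
    p = np.pad(mask, 1, constant_values=0)
    out = np.ones_like(mask)
    for di in range(3):
        for dj in range(3):
            out &= p[di:di + mask.shape[0], dj:dj + mask.shape[1]]
    return out

def dilate(mask: np.ndarray) -> np.ndarray:
    """Binary dilation with a 3x3 all-ones kernel (zero padding)."""
    p = np.pad(mask, 1, constant_values=0)
    out = np.zeros_like(mask)
    for di in range(3):
        for dj in range(3):
            out |= p[di:di + mask.shape[0], dj:dj + mask.shape[1]]
    return out

# Threshold the offset image, then apply erosion -> dilation -> erosion
# in the order the text describes.
offset = np.zeros((8, 8), dtype=np.uint8)
offset[2:6, 2:6] = 40          # a 4x4 moving patch
offset[0, 7] = 40              # an isolated noisy pixel
mask = (offset >= 30).astype(np.uint8)   # luminance threshold 30 is illustrative
cleaned = erode(dilate(erode(mask)))     # noisy pixel is gone; patch core survives
```

The first erosion removes isolated noise pixels, the dilation restores the bulk of genuine regions, and the final erosion tightens their boundary again.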
Optionally, the determining whether the image content of the first I frame is similar to the image content of the second I frame includes:
calculating the similarity between the image content of the first I frame and the image content of the second I frame according to the following relation:

RMSE = \sqrt{\dfrac{1}{h\,w}\sum_{i=1}^{h}\sum_{j=1}^{w}\bigl(f_1(i,j)-f_2(i,j)\bigr)^2}

wherein RMSE is the calculated similarity, h is the image height of the first I frame and the second I frame, w is the image width of the first I frame and the second I frame, (i, j) represents the coordinates of a pixel in the image, f_1(i, j) represents the gray value of the pixel with coordinates (i, j) in the first I frame, and f_2(i, j) represents the gray value of the pixel with coordinates (i, j) in the second I frame;
if the similarity is not greater than a similarity threshold, the image content of the first I frame is similar to the image content of the second I frame;
and if the similarity is greater than the similarity threshold, the image content of the first I frame is dissimilar to the image content of the second I frame.
Optionally, each data in the encoded data of the first I frame is arranged in the following order:
the coding mode identification, the number of the motion areas, the length of the obtained coding data and the area data of each motion area, wherein the area data of each motion area comprises: the data start identifier of the motion region, the encoded data of the encoded block in the motion region, and the data end identifier of the motion region.
According to a second aspect of an embodiment of the present invention, there is provided a video decoding method, the method including:
obtaining video data to be decoded;
if the coding mode identification carried in the video data to be decoded indicates that the video frame to be decoded is an I frame and is coded in a non-intra-frame coding mode, decoding the coding data carried in the video data to be decoded based on the number of the motion areas carried in the video data to be decoded, the length of the coding data and the identification information of the motion areas, so as to obtain the motion vector and residual data of the coding blocks in the motion areas;
obtaining a reconstructed block of the encoded block in the motion region based on the obtained motion vector and residual data and a buffered third I-frame, wherein the third I-frame is: a reconstructed frame of a previous I frame of the video frame to be decoded;
and obtaining a decoding result of the video frame to be decoded based on the obtained reconstructed block and the third I frame.
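A hedged sketch of the reconstruction step above: the decoder fetches the reference block that the motion vector points to in the cached third I frame and adds the decoded residual. Function and variable names are illustrative, and a real decoder would also inverse-quantize and inverse-transform the residual before this addition:

```python
import numpy as np

def reconstruct_block(third_i_frame, block_pos, motion_vector, residual):
    """Rebuild one encoded block: locate the reference block in the cached
    reconstructed previous I frame using MV = block_pos - ref_pos, then add
    the decoded residual (illustrative additive model)."""
    bh, bw = residual.shape
    y, x = block_pos
    dy, dx = motion_vector
    ref = third_i_frame[y - dy:y - dy + bh, x - dx:x - dx + bw]
    return ref + residual

frame = np.arange(64, dtype=np.int32).reshape(8, 8)   # stand-in third I frame
residual = np.full((2, 2), 5, dtype=np.int32)
block = reconstruct_block(frame, (4, 4), (1, 1), residual)
```

The reconstructed blocks are then pasted back over the corresponding motion regions of the third I frame to yield the decoded frame.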
According to a third aspect of embodiments of the present invention, there is provided a video encoding apparatus, the apparatus comprising:
the image content judging module is used for judging whether the image content of the first I frame is similar to the image content of a second I frame, wherein the second I frame is a reconstructed frame of the previous I frame of the first I frame;
A motion region determining module, configured to determine a motion region in the first I frame that generates motion with respect to the second I frame, in a case where the image content of the first I frame is similar to the image content of the second I frame;
the motion vector estimation module is used for carrying out motion vector estimation on the coding blocks in the motion area based on the second I frame to obtain corresponding reference blocks of the coding blocks in the second I frame;
the coded data obtaining module is used for coding the coded block based on the obtained reference block to obtain coded data of the coded block;
the encoded data generation module is used for generating encoded data of the first I frame that comprises an encoding mode identifier, the number of motion regions, identification information of the motion regions, the length of the obtained encoded data and the obtained encoded data, wherein the encoding mode identifier indicates that the first I frame is encoded based on a non-intra-frame encoding mode.
Optionally, the motion area determining module includes:
a luminance image obtaining sub-module, configured to obtain a first luminance image of the first I frame and a second luminance image of the second I frame;
an offset image obtaining sub-module, configured to obtain an offset image of the first I frame relative to the second I frame based on a difference between a luminance value of each pixel point in the first luminance image and a luminance value of each pixel point in the second luminance image;
And the motion region determining submodule is used for determining a motion region which generates motion relative to the second I frame in the first I frame according to the region formed by connecting the pixel points with the brightness value of not 0 in the offset image.
Optionally, the motion region determination submodule includes:
an offset region determining unit, configured to determine an offset region based on a region formed by connecting pixel points whose luminance values are not less than a luminance threshold in the offset image;
and the motion region determining unit is used for determining the region in the first I frame that corresponds to the offset region as the motion region in the first I frame that generates motion relative to the second I frame.
Optionally, the offset region determining unit is specifically configured to: sequentially perform an erosion operation, a dilation operation and an erosion operation on the region formed by connecting the pixels whose luminance values are not less than a luminance threshold in the offset image, based on a preset erosion kernel and a preset dilation kernel; and determine the region formed by the pixels whose luminance values are not less than the luminance threshold in the processed offset image as the offset region.
Optionally, the image content judging module is specifically configured to: calculate the similarity between the image content of the first I frame and the image content of the second I frame according to the following relation:

RMSE = \sqrt{\dfrac{1}{h\,w}\sum_{i=1}^{h}\sum_{j=1}^{w}\bigl(f_1(i,j)-f_2(i,j)\bigr)^2}

wherein RMSE is the calculated similarity, h is the image height of the first I frame and the second I frame, w is the image width of the first I frame and the second I frame, (i, j) represents the coordinates of a pixel in the image, f_1(i, j) represents the gray value of the pixel with coordinates (i, j) in the first I frame, and f_2(i, j) represents the gray value of the pixel with coordinates (i, j) in the second I frame;
if the similarity is not greater than a similarity threshold, the image content of the first I frame is similar to the image content of the second I frame; and if the similarity is greater than the similarity threshold, the image content of the first I frame is dissimilar to the image content of the second I frame.
Optionally, each data in the encoded data of the first I frame is arranged in the following order:
the coding mode identification, the number of the motion areas, the length of the obtained coding data and the area data of each motion area, wherein the area data of each motion area comprises: the data start identifier of the motion region, the encoded data of the encoded block in the motion region, and the data end identifier of the motion region.
According to a fourth aspect of an embodiment of the present invention, there is provided a video decoding apparatus including:
the data acquisition module is used for acquiring video data to be decoded;
The data decoding module is used for decoding the coded data carried in the video data to be decoded based on the number of the motion areas carried in the video data to be decoded, the length of the coded data and the identification information of the motion areas under the condition that the coding mode identification carried in the video data to be decoded indicates that the video frame to be decoded is an I frame and is coded in a non-intra-frame coding mode, so as to obtain the motion vectors and residual data of the coding blocks in the motion areas;
a reconstructed block obtaining module, configured to obtain a reconstructed block of the encoded block in the motion region based on the obtained motion vector, residual data and a buffered third I frame, where the third I frame is: a reconstructed frame of a previous I frame of the video frame to be decoded;
and the decoding result obtaining module is used for obtaining the decoding result of the video frame to be decoded based on the obtained reconstruction block and the third I frame.
According to a fifth aspect of an embodiment of the present invention, there is provided an electronic device including a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory complete communication with each other through the communication bus;
A memory for storing a computer program;
and a processor, configured to implement the video encoding method according to the first aspect or the video decoding method according to the second aspect when executing the program stored in the memory.
According to a sixth aspect of embodiments of the present invention, there is provided a computer readable storage medium having stored therein a computer program which, when executed by a processor, implements the video encoding method of the first aspect or the video decoding method of the second aspect.
The embodiment of the invention has the beneficial effects that:
the video encoding method provided by the embodiment of the invention can determine, when the image content of the first I frame is similar to that of the second I frame, the motion region in the first I frame that moves relative to the second I frame, i.e., the region where the first I frame differs from the second I frame. Motion vector estimation is performed on the encoding blocks in the motion region based on the second I frame to obtain reference blocks in the second I frame, and the encoding blocks in the first I frame are encoded based on the reference blocks to generate the encoded data of the first I frame. Since the encoding blocks lie within the motion region, the data of the first I frame generated by encoding them is in effect the encoded data of the motion region within the first I frame. Moreover, since the image content of the first I frame is similar to that of the second I frame, the difference between the two frames is small; that is, the region where a difference exists, namely the motion region, is small. Encoding only the encoding blocks of this small motion region therefore yields encoded data with few code words. Encoding the video in this way produces a small video bit rate, so the network bandwidth requirement during video communication is low. Therefore, by applying the scheme provided by the embodiment of the invention, the video bit rate can be reduced and the network bandwidth requirement during video communication can be lowered.
Of course, it is not necessary for any one product or method of practicing the invention to achieve all of the advantages set forth above at the same time.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below illustrate only some embodiments of the invention, and those skilled in the art may obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a first video encoding method according to an embodiment of the present invention;
fig. 2 is a flowchart of a second video encoding method according to an embodiment of the present invention;
fig. 3 is a schematic flow chart of a video encoding process according to an embodiment of the present invention;
fig. 4 is a schematic flow chart of a video decoding method according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a video encoding device according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a video decoding device according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
The execution body of the embodiment of the present invention is an electronic device, such as a desktop computer or a server.
The video encoding method provided by the embodiment of the invention is described below through a specific embodiment.
Referring to fig. 1, a flow chart of a first video encoding method is provided, the method comprising the following steps S101-S105.
Step S101: it is determined whether the image content of the first I-frame is similar to the image content of the second I-frame.
If the image content of the first I frame is similar to the image content of the second I frame, step S102 is performed.
Wherein the second I frame is a reconstructed frame of an I frame previous to the first I frame.
The above-described manner of determining whether the image content of the first I frame is similar to the image content of the second I frame is described below.
The electronic device may calculate a similarity between the image content of the first I-frame and the image content of the second I-frame and compare the calculated similarity to a similarity threshold. If the similarity is not greater than the similarity threshold, the image content of the first I frame is considered to be similar to the image content of the second I frame; if the similarity is greater than the similarity threshold, the image content of the first I frame is considered dissimilar to the image content of the second I frame.
In one case, the electronic device may calculate the similarity between the image content of the first I frame and the image content of the second I frame according to the following relation:

RMSE = \sqrt{\dfrac{1}{h\,w}\sum_{i=1}^{h}\sum_{j=1}^{w}\bigl(f_1(i,j)-f_2(i,j)\bigr)^2}

wherein RMSE is the calculated similarity, h is the image height of the first I frame and the second I frame, w is the image width of the first I frame and the second I frame, (i, j) represents the coordinates of a pixel in the image, f_1(i, j) represents the gray value of the pixel with coordinates (i, j) in the first I frame, and f_2(i, j) represents the gray value of the pixel with coordinates (i, j) in the second I frame.
Thus, the similarity is calculated based on the gray value of each pixel point in the first I frame and the gray value of each pixel point in the second I frame, and the obtained similarity can reflect the similarity degree between the image content of the first I frame and the image content of the second I frame. By comparing the magnitude relation between the similarity and the similarity threshold, whether the image content of the first I frame is similar to the image content of the second I frame can be accurately judged.
In addition, the electronic device may also use other similarity calculation algorithms to calculate the similarity between the image content of the first I frame and the image content of the second I frame, for example, a cosine similarity algorithm, a hash similarity algorithm, a histogram algorithm, etc., which is not limited in the embodiment of the present invention.
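The RMSE relation above can be sketched in a few lines of NumPy; the similarity threshold value used here is an illustrative assumption, since the patent leaves it as a configurable parameter:

```python
import numpy as np

def frame_similarity(frame1: np.ndarray, frame2: np.ndarray) -> float:
    """RMSE over the gray values of two equally sized frames, per the
    relation above; a lower value means more similar content."""
    diff = frame1.astype(np.float64) - frame2.astype(np.float64)
    h, w = frame1.shape
    return float(np.sqrt((diff ** 2).sum() / (h * w)))

a = np.zeros((4, 4), dtype=np.uint8)
b = np.full((4, 4), 3, dtype=np.uint8)
rmse = frame_similarity(a, b)   # every pixel differs by 3, so RMSE is 3
similar = rmse <= 5             # threshold 5 is an illustrative choice
```

Note that a smaller RMSE means higher similarity, which is why the text compares with "not greater than" the threshold.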
Step S102: a motion region in the first I-frame that produces motion relative to the second I-frame is determined.
The manner of determining the movement region is described below.
In one implementation, the motion region may be determined based on an offset image obtained from the first I frame relative to the second I frame based on the luminance value of each pixel in the first I frame and the luminance value of each pixel in the second I frame. The specific determination will be further described in the embodiment corresponding to fig. 2, which will not be described in detail here.
In another implementation, the motion region in the first I frame that generates motion relative to the second I frame may be determined using an optical flow method.
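A sketch of the first implementation above, computing the offset image as the per-pixel absolute luminance difference between the current I frame and the reconstructed previous I frame; array shapes and names are illustrative, and the thresholding/morphology refinements are covered separately:

```python
import numpy as np

def offset_image(luma_cur: np.ndarray, luma_prev: np.ndarray) -> np.ndarray:
    """Absolute per-pixel luminance difference of the first I frame
    against the second (reconstructed previous) I frame."""
    diff = luma_cur.astype(np.int16) - luma_prev.astype(np.int16)
    return np.abs(diff).astype(np.uint8)

prev = np.zeros((6, 6), dtype=np.uint8)
curr = prev.copy()
curr[2:4, 2:4] = 200                  # a small patch that moved/changed
off = offset_image(curr, prev)
moving = np.argwhere(off != 0)        # candidate motion pixels
```

Pixels with a nonzero (or above-threshold) offset value form the connected regions from which the motion region is derived.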
Step S103: and carrying out motion vector estimation on the coding block in the motion area based on the second I frame to obtain a reference block corresponding to the coding block in the second I frame.
The above-mentioned encoding block is an image region within the motion region of the first I frame, and its size is a preset size. For example, assuming that the size of an encoding block is 4×4 and the size of the motion region is 8×8, the motion region will contain 4 encoding blocks; assuming that the size of an encoding block is 4×4 and the size of the motion region is 16×16, the motion region will contain 16 encoding blocks.
The reference block is an image region in the second I frame, and the size of the reference block is identical to the size of the encoded block.
The manner in which the reference block is obtained is described below.
The electronic device may determine a reference block corresponding to the encoded block in the second I-frame according to a motion search algorithm. The motion search algorithm may be a full search algorithm, a diamond search algorithm, or the like, which is not limited in this embodiment of the present invention.
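A hedged sketch of the full search algorithm mentioned above: every candidate reference block within a search window is compared against the current encoding block, keeping the one with the smallest sum of absolute differences (SAD). The SAD cost, window size and array contents are illustrative choices, not mandated by the patent:

```python
import numpy as np

def full_search(cur_block, ref_frame, block_pos, search_range=4):
    """Exhaustive motion search: test every candidate position within
    +/- search_range pixels of block_pos and return the best match."""
    bh, bw = cur_block.shape
    y0, x0 = block_pos
    best_sad, best_pos = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = y0 + dy, x0 + dx
            if y < 0 or x < 0 or y + bh > ref_frame.shape[0] or x + bw > ref_frame.shape[1]:
                continue  # candidate block would fall outside the frame
            cand = ref_frame[y:y + bh, x:x + bw]
            sad = int(np.abs(cur_block.astype(np.int32) - cand.astype(np.int32)).sum())
            if best_sad is None or sad < best_sad:
                best_sad, best_pos = sad, (y, x)
    return best_pos, best_sad

ref = np.zeros((16, 16), dtype=np.uint8)
ref[5:9, 6:10] = 100                              # object in the second I frame
cur_block = np.full((4, 4), 100, dtype=np.uint8)  # same object in the first I frame
ref_pos, sad = full_search(cur_block, ref, block_pos=(7, 8))
mv = (7 - ref_pos[0], 8 - ref_pos[1])             # MV = block position - reference position
```

A diamond search visits far fewer candidates by following a shrinking diamond pattern, trading some accuracy for speed; the SAD comparison itself is the same.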
Motion vector estimation is described below.
The electronic device may determine a motion vector between the encoded block and the corresponding reference block based on the position of the encoded block in the first I frame and the position of the reference block corresponding to the encoded block in the second I frame. Specifically, the motion vector between the encoded block and the corresponding reference block may be determined by the following relation:
MV = P_cur - P_ref

wherein MV is the determined motion vector, P_cur is the position of the encoding block in the first I frame, and P_ref is the position of the reference block in the second I frame.
In addition, the electronic device may further obtain a residual matrix between the encoded block and the reference block as a motion vector estimation result by the following relation:
Res = B_cur - B_ref

wherein Res is the calculated residual matrix, B_cur is the image data contained in the encoding block, and B_ref is the image data contained in the reference block.
Step S104: and encoding the encoding block based on the obtained reference block to obtain encoding data of the encoding block.
The encoding of the encoded block is described below.
As explained for step S103, the electronic device obtains a residual matrix between the encoded block and the reference block. On this basis, the electronic device may perform a DCT (Discrete Cosine Transform ) transformation on the obtained residual matrix according to the following relation:
F = T \cdot Res \cdot T^{\mathsf{T}}

wherein F is the coefficient matrix obtained after the DCT transformation, u and v are the indexes of a frequency-domain position in F, and T is the DCT transform matrix; in particular, T may take the standard orthonormal DCT-II form T(u,v) = c(u)\cos\frac{(2v+1)u\pi}{2N}, where N is the block size, c(0)=\sqrt{1/N}, and c(u)=\sqrt{2/N} for u>0.
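The DCT step F = T·Res·Tᵀ can be sketched directly; the 4×4 block size, the flat residual, and the standard orthonormal DCT-II matrix used here are illustrative assumptions, since the patent text does not reproduce its transform matrix:

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Standard orthonormal DCT-II matrix: row 0 scaled by sqrt(1/n),
    other rows by sqrt(2/n)."""
    T = np.zeros((n, n))
    for u in range(n):
        c = np.sqrt(1.0 / n) if u == 0 else np.sqrt(2.0 / n)
        for v in range(n):
            T[u, v] = c * np.cos((2 * v + 1) * u * np.pi / (2 * n))
    return T

T = dct_matrix(4)
residual = np.full((4, 4), 8.0)   # a flat residual block
coeffs = T @ residual @ T.T       # F = T Res T^T
```

For a flat block all the energy lands in the DC coefficient coeffs[0, 0], which is why DCT compacts typical residuals into a few significant values.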
in one implementation, the coefficient matrix may be entropy encoded to obtain encoded data of the encoded block.
In another implementation manner, the coefficient matrix may be quantized, and entropy encoding is performed on the quantized coefficient matrix to obtain encoded data of the encoded block.
Specifically, the coefficient matrix may be quantized according to the following relation:
F_Q = Quant(F, QP)

wherein F_Q is the coefficient matrix after the quantization processing, QP is a preset quantization parameter, and Quant(·) is the quantization processing function.
In this case, the obtained encoded data of the encoding block includes not only the quantized coefficient matrix described above, but also the quantization parameter and the motion vector between the encoding block and the corresponding reference block.
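One plausible reading of Quant(F, QP) — the patent does not spell out the function — is uniform scalar quantization: divide each coefficient by the quantization step and round. This sketch also shows the decoder-side inverse, where the rounding loss remains:

```python
import numpy as np

def quantize(coeffs: np.ndarray, qp: float) -> np.ndarray:
    """Illustrative Quant(F, QP): uniform scalar quantization by
    rounding each coefficient divided by the quantization step."""
    return np.round(coeffs / qp).astype(np.int32)

def dequantize(qcoeffs: np.ndarray, qp: float) -> np.ndarray:
    """Decoder-side inverse mapping; quantization error is not recovered."""
    return qcoeffs.astype(np.float64) * qp

coeffs = np.array([[32.0, 1.2], [-0.7, 10.4]])
q = quantize(coeffs, qp=4.0)     # small coefficients collapse to 0
rec = dequantize(q, qp=4.0)
```

A larger QP zeroes out more small coefficients, shrinking the entropy-coded payload at the cost of reconstruction fidelity.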
Step S105: generating encoded data of the first I frame that comprises the encoding mode identifier, the number of motion regions, the identification information of the motion regions, the length of the obtained encoded data, and the obtained encoded data.
The encoding mode identifier indicates that the first I frame is encoded based on a non-intra-frame encoding mode, and the identification information of the motion regions is used to represent the start position and the end position of the encoded data of each motion region.
Specifically, each data in the encoded data of the first I frame is arranged in the following order:
the coding mode identification, the number of the motion areas, the length of the obtained coding data and the area data of each motion area, wherein the area data of each motion area comprises: the data start identifier of the motion region, the encoded data of the encoded block in the motion region, and the data end identifier of the motion region.
For example, the encoded data of the first I frame may be in the form of ABC [ DEF, DGF, … … ], where a is the coding mode identification, B is the number of motion regions, C is the length of the obtained encoded data, D is the data start identification of the motion regions, E is the encoded data of the encoded blocks in one motion region, F is the data end identification of the motion regions, and G is the encoded data of the encoded blocks in another motion region.
The encoded data of the first I frame is arranged in this way, and the data format of the encoded data of the first I frame is determined, so that the decoding end can decode the first I frame according to the determined data format.
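The arrangement described above can be sketched as a small packing routine. The marker byte values, field widths, and byte order below are assumptions for illustration only; the patent does not fix concrete values for them.

```python
import struct

# Assumed one-byte marker values; the patent does not specify them.
MODE_NON_INTRA = 0x01   # coding mode identification (A)
REGION_START   = 0xAA   # data start identifier of a motion region (D)
REGION_END     = 0xBB   # data end identifier of a motion region (F)

def pack_i_frame(region_payloads):
    """Pack encoded data of a first I frame as: coding mode identification,
    number of motion areas, total length of encoded data, then
    [start marker, encoded block data, end marker] per motion area."""
    body = b"".join(
        bytes([REGION_START]) + p + bytes([REGION_END]) for p in region_payloads
    )
    total_len = sum(len(p) for p in region_payloads)
    # Big-endian: 1-byte mode id, 2-byte region count, 4-byte length (assumed widths).
    header = struct.pack(">BHI", MODE_NON_INTRA, len(region_payloads), total_len)
    return header + body
```

With a fixed layout like this, the decoding end can walk the header fields and the start/end markers to recover each motion area's encoded blocks.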
By applying the scheme provided by the embodiment of the present invention, when the image content of a first I frame is similar to that of a second I frame, the motion area of the first I frame relative to the second I frame can be determined based on the image content of the two frames; the obtained motion area is the area in which the first I frame moves relative to the second I frame, that is, the area in which the first I frame differs from the second I frame. Motion vector estimation is then performed on the encoded blocks in the motion area based on the second I frame to obtain reference blocks, and the encoded blocks in the first I frame are encoded based on the reference blocks to generate the encoded data of the first I frame. Since the encoded blocks lie within the motion area, the data generated by encoding them is in effect the encoded data of the motion area within the first I frame. Moreover, because the image content of the first I frame is similar to that of the second I frame, the difference between the two frames is small; that is, the area in which a difference exists, the motion area, is small. Encoding only the encoded blocks of the motion area therefore yields encoded data of the first I frame that contains few codewords. As a result, the code rate of the encoded video is small, and the demand on network bandwidth during video communication is low. Therefore, by applying the scheme provided by the embodiment of the present invention, the video code rate can be reduced, and the requirement on network bandwidth in the video communication process can be reduced.
Specifically, the coding scheme provided by the embodiment of the present invention first obtains the I frames in the video stream based on an existing coding scheme. An I frame that is not similar to the previous I frame is still encoded with the existing scheme; only an I frame that is similar to the previous I frame is encoded with the scheme provided by the embodiment of the present invention. That is, the coding scheme provided by the embodiment of the present invention is performed on top of an existing coding scheme and follows it in execution order. Therefore, the coding scheme provided by the embodiment of the present invention is compatible with existing coding schemes, that is, with the encoding flow of existing mainstream encoders, as well as with general video communication network protocols.
In one embodiment of the present invention, referring to fig. 2, a flow chart of a second video encoding method is provided.
In this embodiment, as described above for step S102, the motion area may be determined based on the luminance value of each pixel in the first I frame and the luminance value of each pixel in the second I frame. Specifically, the above step S102 may be completed by the following steps S102A to S102C.
Step S102A: a first luminance image of a first I frame and a second luminance image of a second I frame are obtained.
Specifically, the brightness value of each pixel point in the first I frame may be calculated by a relation of the form:
Y1 = f(R1, G1, B1)
where Y1 represents the luminance value of one pixel point in the first I frame, and R1, G1, and B1 represent the values of the red, green, and blue channels of that pixel point, respectively.
Thus, based on the calculated luminance value of each pixel point in the first I frame, a first luminance image of the first I frame can be obtained.
Similarly, the luminance value of each pixel point in the second I frame may be calculated by a relation of the same form:
Y2 = f(R2, G2, B2)
where Y2 represents the luminance value of one pixel point in the second I frame, and R2, G2, and B2 represent the values of the red, green, and blue channels of that pixel point, respectively.
In this way, based on the calculated luminance value of each pixel point in the second I frame, a second luminance image of the second I frame can be obtained.
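Step S102A can be sketched as follows. The concrete luminance weights are not reproduced in this text, so the sketch assumes the common BT.601 luma coefficients (0.299, 0.587, 0.114) as a stand-in.

```python
import numpy as np

def luminance_image(rgb: np.ndarray) -> np.ndarray:
    """Map an H x W x 3 RGB frame to an H x W luminance image.
    BT.601 luma weights are assumed here for illustration."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return 0.299 * r + 0.587 * g + 0.114 * b
```

Applying this to the first and second I frames yields the first and second luminance images used in the following steps.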
Step S102B: and obtaining an offset image of the first I frame relative to the second I frame based on the difference between the brightness value of each pixel point in the first brightness image and the brightness value of each pixel point in the second brightness image.
The brightness value of each pixel point in the offset image is the absolute value of the difference between the brightness value of the corresponding pixel point in the first luminance image and the brightness value of the corresponding pixel point in the second luminance image.
Step S102C: and determining a motion area generating motion relative to the second I frame in the first I frame according to the area formed by connecting the pixel points with the brightness value not being 0 in the offset image.
The manner in which the movement region is determined is described below.
In one implementation, a region corresponding to a region connected by pixels with brightness values other than 0 in the offset image in the first I frame may be determined as a motion region in the first I frame that generates motion relative to the second I frame.
In another implementation, the motion region in the first I frame that generates motion relative to the second I frame may be determined based on a region in the offset image where pixels having luminance values not less than a luminance threshold are connected. In particular, this implementation will be further described later, and will not be described in detail here.
Thus, an offset image is obtained based on the differences between the brightness values of corresponding pixel points in the first I frame and the second I frame, and the motion area is determined according to the brightness values of the pixel points in the offset image. The obtained offset image accurately reflects the pixel points whose brightness values differ between the first I frame and the second I frame, and the area formed by those pixel points represents the area in which the first I frame differs from the second I frame, that is, the motion area in the first I frame. Therefore, by applying the scheme provided by the embodiment of the present invention, the motion area in the first I frame can be accurately determined.
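A minimal sketch of steps S102B and S102C follows. For illustration, the connected-region analysis is simplified to the bounding box of all nonzero offset pixels; this simplification is an assumption, the patent itself works with the region formed by connected nonzero pixel points.

```python
import numpy as np

def offset_image(lum1: np.ndarray, lum2: np.ndarray) -> np.ndarray:
    """Absolute per-pixel luminance difference of the first I frame
    relative to the second I frame (step S102B)."""
    return np.abs(lum1 - lum2)

def motion_bounding_box(offset: np.ndarray):
    """Simplified stand-in for step S102C: return the bounding box
    (top, left, bottom, right) of all nonzero offset pixels,
    or None if the two frames are identical."""
    ys, xs = np.nonzero(offset)
    if ys.size == 0:
        return None
    return int(ys.min()), int(xs.min()), int(ys.max()), int(xs.max())
```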
In one embodiment of the present invention, as described above with respect to step S102C, the motion area of the first I frame that generates motion with respect to the second I frame may be determined based on the area where the pixels with luminance values not less than the luminance threshold value are connected in the offset image. Specifically, the step S102C may be completed through the following steps a to B.
Step A: and determining an offset region based on a region formed by pixel points with brightness values not smaller than a brightness threshold value in the offset image.
In this case, the area where the pixel points whose luminance value is not less than the luminance threshold value in the offset image are connected may be determined as the offset area.
In another case, the region connected to the pixels having the luminance value not less than the luminance threshold in the offset image may be subjected to the erosion operation processing and the dilation operation processing, and the region connected to the pixels having the luminance value not less than the luminance threshold in the processed offset image may be the offset region. Specifically, the step A can be completed through the following steps A1 to A2.
Step A1: based on a preset erosion kernel and dilation kernel, sequentially perform erosion operation processing, dilation operation processing, and erosion operation processing on the region formed by connecting the pixel points whose brightness values are not smaller than the brightness threshold in the offset image.
The preset erosion kernel and dilation kernel may take any suitable form; embodiments of the invention are not limited in this regard.
Step A2: and determining the area connected by the pixel points with the brightness value not smaller than the brightness threshold value in the processed offset image as an offset area.
The area formed by connecting the pixel points with the brightness value not smaller than the brightness threshold value in the offset image is processed, so that the noise area and the burr area in the offset image can be eliminated, and the obtained offset area can better reflect the area with the difference between the first I frame and the second I frame on the whole.
And (B) step (B): an area within the first I-frame corresponding to the offset area is determined as a motion area in the first I-frame that produces motion relative to the second I-frame.
Determining the motion area in this way removes from the offset area the pixel points whose brightness values are smaller than the brightness threshold, that is, removes the corresponding pixel points of the first I frame from the motion area. As a result, the determined motion area exhibits obvious motion relative to the second I frame, the influence of burr or noise regions in the two frames is removed, the obtained offset area better reflects the area in which the first I frame differs from the second I frame as a whole, the obtained motion area is more accurate, and the quality of video coding based on the motion area is high. In addition, removing burr or noise regions reduces the extent of the motion area, which reduces the amount of encoded data produced when the motion area is encoded and thus reduces the code rate of the encoded video.
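The erosion-dilation-erosion sequence of step A1 can be sketched on a binary mask as below. The 3x3 all-ones kernel is an assumption for illustration, since the patent leaves the kernel form unspecified.

```python
import numpy as np

def _binary_op(mask: np.ndarray, kernel: np.ndarray, erode: bool) -> np.ndarray:
    """3x3 binary erosion/dilation via a zero-padded neighborhood scan."""
    h, w = mask.shape
    padded = np.pad(mask, 1, constant_values=0)
    out = np.zeros_like(mask)
    for y in range(h):
        for x in range(w):
            window = padded[y:y + 3, x:x + 3][kernel.astype(bool)]
            # Erosion keeps a pixel only if all kernel neighbors are set;
            # dilation sets it if any neighbor is set.
            out[y, x] = window.all() if erode else window.any()
    return out

def denoise_offset_mask(mask: np.ndarray, kernel=None) -> np.ndarray:
    """Erosion, dilation, erosion in sequence (step A1) to remove
    noise and burr regions from the thresholded offset mask."""
    if kernel is None:
        kernel = np.ones((3, 3), dtype=np.uint8)  # assumed kernel shape
    m = _binary_op(mask, kernel, erode=True)
    m = _binary_op(m, kernel, erode=False)
    return _binary_op(m, kernel, erode=True)
```

An isolated one-pixel speck (noise) is eliminated by the first erosion, while a solid region survives the sequence with its interior intact, matching the intent of step A1.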
The video encoding method provided by the embodiment of the present invention will be described in the following with a specific example.
Referring to fig. 3, a flow chart of a video encoding process is provided. This embodiment includes step S301-step S308.
Step S301: the image data of the second I frame is buffered.
Specifically, the electronic device may release the previously buffered data and buffer the image data of the second I frame.
Step S302: image data of a first I frame is acquired.
Step S303: the similarity is calculated based on the image data of the first I frame and the image data of the second I frame.
If the similarity is not greater than the similarity threshold, step S304 is executed; if the similarity is greater than the similarity threshold, step S306 is performed.
Specifically, the above process of calculating the similarity is described in detail in step S101 in the embodiment shown in fig. 1, and is not repeated here.
Step S304: the first I frame is predicted based on the second I frame.
Specifically, the above-mentioned process of predicting the first I frame based on the second I frame is the process performed from step S102 to step S103 in the embodiment shown in fig. 1, and is not repeated here.
Step S305: the first I frame is encoded based on the predicted data.
Specifically, the above manner of encoding the first I frame is described at step S104 in the embodiment shown in fig. 1, and is not repeated here.
Step S306: the first I frame is encoded based on image data of the first I frame.
Encoding the first I frame based on the image data of the first I frame means performing intra-frame encoding on the first I frame based on its own image data.
Step S307: and transmitting the coded data according to a preset format.
For the encoded data obtained in step S305, the above predetermined format is the format of the arrangement order of the data in the encoded data of the first I frame introduced in step S105 in the embodiment shown in fig. 1, and will not be repeated here.
For the encoded data obtained in step S306, the above preset format is a general intra-frame encoded data format, for example, the H.264 format or the M-JPEG (Motion-Joint Photographic Experts Group, a motion still-image compression technique) format.
Step S308: the second I frame is updated with the reconstructed frame of the first I frame.
In accordance with one embodiment of the present invention, referring to fig. 4, a flow chart of a video decoding method is provided. The video decoding method includes steps S401 to S404.
Step S401: video data to be decoded is obtained.
Step S402: if the coding mode identification carried in the video data to be decoded indicates that the video frame to be decoded is an I frame and is coded in a non-intra-frame coding mode, decoding the coded data carried in the video data to be decoded based on the number of the moving areas carried in the video data to be decoded, the length of the coded data and the identification information of the moving areas, and obtaining a motion vector and residual error result of a coding block in the moving areas.
Specifically, the electronic device may parse the encoded data carried in the video data to be decoded to obtain a motion vector and a residual result of the encoded block in the motion area based on the number of the motion areas carried in the video data to be decoded, the length of the encoded data, and the identification information of the motion areas.
Step S403: and obtaining a reconstruction block of the coding block in the motion area based on the obtained motion vector and residual result and the cached third I frame.
Wherein the third I frame is: reconstructed frames of a previous I-frame of the video frame to be decoded.
Specifically, the electronic device may perform inverse DCT on encoded data carried in video data to be decoded, and reconstruct data obtained by inverse DCT based on the obtained motion vector and residual result and the buffered third I frame, to obtain a reconstructed block of the encoded block in the motion region.
Step S404: and obtaining a decoding result of the video frame to be decoded based on the obtained reconstructed block and the third I frame.
Since the encoded data carried in the video data to be decoded is the encoded data of the image content corresponding to the motion area, and the motion area is the area in which there is motion between the video frame to be decoded and the third I frame, the non-motion areas exhibit no motion between the video frame to be decoded and the third I frame, that is, no difference in image content.
Therefore, the electronic device can replace the image content of the encoded block in the third I frame corresponding to the reconstructed block with the image content of the obtained reconstructed block, and the replaced image is used as the decoding result of the video frame to be decoded.
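The replacement in step S404 can be sketched as follows; the `(top, left, pixels)` block layout is a hypothetical representation chosen for illustration.

```python
import numpy as np

def apply_reconstructed_blocks(third_i_frame: np.ndarray, blocks):
    """Replace the image content of the third I frame with each
    reconstructed block; non-motion areas are left untouched, and the
    replaced image is the decoding result of the video frame to be decoded.
    `blocks` is a list of (top, left, pixel_array) tuples (assumed layout)."""
    decoded = third_i_frame.copy()
    for top, left, pixels in blocks:
        h, w = pixels.shape[:2]
        decoded[top:top + h, left:left + w] = pixels
    return decoded
```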
By applying the scheme provided by the embodiment of the present invention, the encoded data carried in the video data can be decoded based on the third I frame to obtain the decoding result of the video frame to be decoded. The data obtained by decoding is the data of the reconstructed blocks of the encoded blocks in the motion area; that is, the video data to be decoded carries only the data of the encoded blocks in the motion area, which is the area in which the video frame to be decoded moves relative to the third I frame. The video data to be decoded therefore does not need to include the data of all image content of the video frame to be decoded, only the data of the motion area, so its data size is small. Consequently, the code rate of the video to be decoded is small, and the demand on network bandwidth during video communication is low. Therefore, by applying the scheme provided by the embodiment of the present invention, the video code rate can be reduced, and the requirement on network bandwidth in the video communication process can be reduced.
Corresponding to the coding scheme described above, when performing video decoding, the receiving end first acquires the encoded data according to the existing decoding scheme, decodes I frames that were not encoded with the coding scheme provided by the embodiment of the present invention according to the existing decoding scheme, decodes only the I frames that were encoded with the coding scheme provided by the embodiment of the present invention with the decoding scheme provided by the embodiment of the present invention, and decodes subsequent P frames, B frames, and so on according to the data obtained by decoding. That is, the decoding scheme provided by the embodiment of the present invention is performed on the basis of the existing decoding scheme and is executed before it. Therefore, the decoding scheme provided by the embodiment of the present invention is compatible with existing decoding schemes, that is, with the decoding flow of existing mainstream decoders, as well as with general video communication network protocols.
Corresponding to the video coding method, the embodiment of the invention also provides a video coding device.
Referring to fig. 5, there is provided a schematic structural diagram of a video encoding apparatus, the apparatus comprising:
the image content determining module 501 is configured to determine whether the image content of a first I frame is similar to the image content of a second I frame, where the second I frame is a reconstructed frame of a previous I frame of the first I frame;
A motion region determining module 502, configured to determine a motion region in the first I frame that generates motion with respect to the second I frame, in a case where the image content of the first I frame is similar to the image content of the second I frame;
a motion vector estimation module 503, configured to perform motion vector estimation on the encoded block in the motion area based on the second I frame, to obtain a reference block corresponding to the encoded block in the second I frame;
an encoded data obtaining module 504, configured to encode the encoded block based on the obtained reference block, to obtain encoded data of the encoded block;
the encoded data generating module 505 is configured to generate encoded data of the first I frame, which comprises the coding mode identification, the number of the motion areas, the identification information of the motion areas, the length of the obtained encoded data, and the obtained encoded data, where the coding mode identification indicates that the first I frame is encoded based on a non-intra-frame coding mode.
By applying the scheme provided by the embodiment of the present invention, when the image content of a first I frame is similar to that of a second I frame, the motion area of the first I frame relative to the second I frame can be determined based on the image content of the two frames; the obtained motion area is the area in which the first I frame moves relative to the second I frame, that is, the area in which the first I frame differs from the second I frame. Motion vector estimation is then performed on the encoded blocks in the motion area based on the second I frame to obtain reference blocks, and the encoded blocks in the first I frame are encoded based on the reference blocks to generate the encoded data of the first I frame. Since the encoded blocks lie within the motion area, the data generated by encoding them is in effect the encoded data of the motion area within the first I frame. Moreover, because the image content of the first I frame is similar to that of the second I frame, the difference between the two frames is small; that is, the area in which a difference exists, the motion area, is small. Encoding only the encoded blocks of the motion area therefore yields encoded data of the first I frame that contains few codewords. As a result, the code rate of the encoded video is small, and the demand on network bandwidth during video communication is low. Therefore, by applying the scheme provided by the embodiment of the present invention, the video code rate can be reduced, and the requirement on network bandwidth in the video communication process can be reduced.
In one embodiment of the present invention, the motion area determining module 502 includes:
a luminance image obtaining sub-module, configured to obtain a first luminance image of the first I frame and a second luminance image of the second I frame;
an offset image obtaining sub-module, configured to obtain an offset image of the first I frame relative to the second I frame based on a difference between a luminance value of each pixel point in the first luminance image and a luminance value of each pixel point in the second luminance image;
and the motion region determining submodule is used for determining a motion region which generates motion relative to the second I frame in the first I frame according to the region formed by connecting the pixel points with the brightness value of not 0 in the offset image.
Thus, an offset image is obtained based on the differences between the brightness values of corresponding pixel points in the first I frame and the second I frame, and the motion area is determined according to the brightness values of the pixel points in the offset image. The obtained offset image accurately reflects the pixel points whose brightness values differ between the first I frame and the second I frame, and the area formed by those pixel points represents the area in which the first I frame differs from the second I frame, that is, the motion area in the first I frame. Therefore, by applying the scheme provided by the embodiment of the present invention, the motion area in the first I frame can be accurately determined.
In one embodiment of the invention, the motion region determination submodule includes:
an offset region determining unit, configured to determine an offset region based on a region formed by connecting pixel points whose luminance values are not less than a luminance threshold in the offset image;
and the motion region determining unit is used for determining the region corresponding to the offset region in the first I frame as the motion region that generates motion relative to the second I frame in the first I frame.
Determining the motion area in this way removes from the offset area the pixel points whose brightness values are smaller than the brightness threshold, that is, removes the corresponding pixel points of the first I frame from the motion area. As a result, the determined motion area exhibits obvious motion relative to the second I frame, the influence of burr or noise regions in the two frames is removed, the obtained offset area better reflects the area in which the first I frame differs from the second I frame as a whole, the obtained motion area is more accurate, and the quality of video coding based on the motion area is high. In addition, removing burr or noise regions reduces the extent of the motion area, which reduces the amount of encoded data produced when the motion area is encoded and thus reduces the code rate of the encoded video.
In one embodiment of the present invention, the offset area determining unit is specifically configured to: based on a preset erosion kernel and dilation kernel, sequentially perform erosion operation processing, dilation operation processing, and erosion operation processing on the region formed by connecting the pixel points whose brightness values are not smaller than the brightness threshold in the offset image; and determine the region formed by the pixel points whose brightness values are not smaller than the brightness threshold in the processed offset image as the offset region.
The area formed by connecting the pixel points with the brightness value not smaller than the brightness threshold value in the offset image is processed, so that the noise area and the burr area in the offset image can be eliminated, and the obtained offset area can better reflect the area with the difference between the first I frame and the second I frame on the whole.
In one embodiment of the present invention, the image content determining module 501 is specifically configured to: calculating the similarity between the image content of the first I frame and the image content of the second I frame according to the following relation:
RMSE = sqrt( (1 / (h·w)) · Σ_{i=1}^{h} Σ_{j=1}^{w} ( X1(i, j) − X2(i, j) )² )

wherein RMSE is the calculated similarity, h is the image height of the first I frame and the second I frame, w is the image width of the first I frame and the second I frame, (i, j) represents the coordinates of a pixel point in the image, X1(i, j) represents the gray value of the pixel point with coordinates (i, j) in the first I frame, and X2(i, j) represents the gray value of the pixel point with coordinates (i, j) in the second I frame;
if the similarity is not greater than a similarity threshold, the image content of the first I frame is similar to the image content of the second I frame; and if the similarity is greater than the similarity threshold, the image content of the first I frame is dissimilar to the image content of the second I frame.
Thus, the similarity is calculated based on the gray value of each pixel point in the first I frame and the gray value of each pixel point in the second I frame, and the obtained similarity can reflect the similarity degree between the image content of the first I frame and the image content of the second I frame. By comparing the magnitude relation between the similarity and the similarity threshold, whether the image content of the first I frame is similar to the image content of the second I frame can be accurately judged.
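The RMSE-based similarity check described above can be sketched directly:

```python
import numpy as np

def rmse_similarity(gray1: np.ndarray, gray2: np.ndarray) -> float:
    """Root-mean-square error between the gray values of two equally
    sized I frames; a lower value means more similar image content."""
    diff = gray1.astype(np.float64) - gray2.astype(np.float64)
    return float(np.sqrt(np.mean(diff ** 2)))

def frames_similar(gray1, gray2, threshold: float) -> bool:
    """Image contents are judged similar when RMSE is not greater
    than the similarity threshold."""
    return rmse_similarity(gray1, gray2) <= threshold
```

Note the convention: because RMSE measures error, frames are similar when the value is *not greater* than the threshold, matching the comparison in the text.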
In one embodiment of the present invention, each data in the encoded data of the first I frame is arranged in the following order:
the coding mode identification, the number of the motion areas, the length of the obtained coding data and the area data of each motion area, wherein the area data of each motion area comprises: the data start identifier of the motion region, the encoded data of the encoded block in the motion region, and the data end identifier of the motion region.
The encoded data of the first I frame is arranged in this way, and the data format of the encoded data of the first I frame is determined, so that the decoding end can decode the first I frame according to the determined data format.
Corresponding to the video decoding method, the embodiment of the invention also provides a video decoding device.
Referring to fig. 6, there is provided a schematic structural diagram of a video decoding apparatus, the apparatus comprising:
a data obtaining module 601, configured to obtain video data to be decoded;
the data decoding module 602 is configured to, when the coding mode identifier carried in the video data to be decoded indicates that the video frame to be decoded is an I frame and is coded in a non-intra-frame coding mode, decode the coded data carried in the video data to be decoded based on the number of motion areas carried in the video data to be decoded, the length of the coded data, and the identification information of the motion areas, to obtain a motion vector and residual data of a coding block in the motion areas;
a reconstructed block obtaining module 603, configured to obtain a reconstructed block of the encoded block in the motion region based on the obtained motion vector and residual data and a buffered third I frame, where the third I frame is: a reconstructed frame of a previous I frame of the video frame to be decoded;
A decoding result obtaining module 604, configured to obtain a decoding result of the video frame to be decoded based on the obtained reconstructed block and the third I frame.
By applying the scheme provided by the embodiment of the present invention, the encoded data carried in the video data can be decoded based on the third I frame to obtain the decoding result of the video frame to be decoded. The data obtained by decoding is the data of the reconstructed blocks of the encoded blocks in the motion area; that is, the video data to be decoded carries only the data of the encoded blocks in the motion area, which is the area in which the video frame to be decoded moves relative to the third I frame. The video data to be decoded therefore does not need to include the data of all image content of the video frame to be decoded, only the data of the motion area, so its data size is small. Consequently, the code rate of the video to be decoded is small, and the demand on network bandwidth during video communication is low. Therefore, by applying the scheme provided by the embodiment of the present invention, the video code rate can be reduced, and the requirement on network bandwidth in the video communication process can be reduced.
The embodiment of the present invention further provides an electronic device, as shown in fig. 7, including a processor 701, a communication interface 702, a memory 703 and a communication bus 704, where the processor 701, the communication interface 702, and the memory 703 perform communication with each other through the communication bus 704,
A memory 703 for storing a computer program;
the processor 701 is configured to implement the video encoding method or the video decoding method according to the foregoing method embodiment when executing the program stored in the memory 703.
The communication bus mentioned above for the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in the figure, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic device and other devices.
The Memory may include random access Memory (Random Access Memory, RAM) or may include Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), etc.; it may also be a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In yet another embodiment of the present invention, a computer readable storage medium is provided, where a computer program is stored, where the computer program, when executed by a processor, implements the video encoding method or the video decoding method according to the foregoing method embodiment.
In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, produces a flow or function in accordance with embodiments of the present invention, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in or transmitted from one computer-readable storage medium to another, for example, by wired (e.g., coaxial cable, optical fiber, digital Subscriber Line (DSL)), or wireless (e.g., infrared, wireless, microwave, etc.). The computer readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that contains an integration of one or more available media. The usable medium may be a magnetic medium (e.g., floppy Disk, hard Disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid State Disk (SSD)), etc.
It is noted that relational terms such as "first" and "second" are used solely to distinguish one entity or action from another, and do not necessarily require or imply any actual such relationship or order between those entities or actions. Moreover, the terms "comprises," "comprising," and any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a" does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
In this specification, the embodiments are described in a related manner; identical or similar parts of the embodiments may be cross-referenced, and each embodiment focuses on its differences from the others. In particular, since the apparatus, electronic device, and storage medium embodiments are substantially similar to the method embodiments, their description is relatively brief; for relevant details, refer to the description of the method embodiments.
The foregoing description covers only preferred embodiments of the present invention and is not intended to limit its scope. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention falls within the protection scope of the present invention.

Claims (10)

1. A method of video encoding, the method comprising:
determining whether the image content of a first I frame is similar to the image content of a second I frame, wherein the second I frame is a reconstructed frame of the I frame preceding the first I frame;
if the image content of the first I frame is similar to the image content of the second I frame, determining a motion region in the first I frame that moves relative to the second I frame;
performing motion vector estimation on a coding block in the motion region based on the second I frame, to obtain a reference block corresponding to the coding block in the second I frame;
encoding the coding block based on the obtained reference block, to obtain coded data of the coding block; and
generating coded data of the first I frame comprising a coding mode identifier, the number of motion regions, identification information of the motion regions, the length of the obtained coded data, and the obtained coded data, wherein the coding mode identifier indicates that the first I frame is coded in a non-intra-frame coding mode, and the identification information of the motion regions indicates the start position and the end position of the coded data of each motion region;
wherein determining the motion region in the first I frame that moves relative to the second I frame comprises:
obtaining a first luminance image of the first I frame and a second luminance image of the second I frame;
obtaining an offset image of the first I frame relative to the second I frame based on the differences between the luminance values of the pixels in the first luminance image and those of the corresponding pixels in the second luminance image; and
determining the motion region in the first I frame that moves relative to the second I frame according to the connected region formed by pixels whose luminance values are not 0 in the offset image.
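The offset-image step of claim 1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name, the use of NumPy arrays for the luminance (Y) planes, and the absolute-difference convention are all assumptions.

```python
import numpy as np

def motion_mask(first_luma, second_luma):
    """Sketch of claim 1's motion-region step: difference the luminance
    plane of the current I frame against that of the reconstructed
    previous I frame, then mark pixels whose difference is non-zero."""
    # Offset image: per-pixel absolute luminance difference.
    offset = np.abs(first_luma.astype(np.int16) -
                    second_luma.astype(np.int16)).astype(np.uint8)
    # Pixels with a luminance value other than 0 are motion candidates.
    mask = offset > 0
    return offset, mask
```

Connected-region extraction over `mask` (e.g., connected-component labeling) would then yield the motion regions the claim refers to.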
2. The method of claim 1, wherein determining the motion region in the first I frame that moves relative to the second I frame according to the connected region formed by pixels whose luminance values are not 0 in the offset image comprises:
determining an offset region based on the connected region formed by pixels whose luminance values are not smaller than a luminance threshold in the offset image; and
determining the region in the first I frame corresponding to the offset region as the motion region in the first I frame that moves relative to the second I frame.
3. The method of claim 2, wherein determining the offset region based on the connected region formed by pixels whose luminance values are not smaller than the luminance threshold in the offset image comprises:
sequentially performing an erosion operation, a dilation operation, and an erosion operation on the connected region formed by pixels whose luminance values are not smaller than the luminance threshold in the offset image, based on a preset erosion kernel and a preset dilation kernel; and
determining the region formed by pixels whose luminance values are not smaller than the luminance threshold in the processed offset image as the offset region.
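Claim 3's erosion → dilation → erosion sequence can be sketched with plain-NumPy binary morphology. The square structuring element, the kernel size, and the helper names are illustrative assumptions; the patent's preset erosion and dilation kernels are not specified here.

```python
import numpy as np

def binary_erode(mask, k=3):
    """Erosion with a k x k square structuring element; pixels outside
    the image are treated as background (False)."""
    r = k // 2
    padded = np.pad(mask, r, mode='constant', constant_values=False)
    out = np.ones_like(mask)
    for dy in range(k):
        for dx in range(k):
            # A pixel survives only if its whole k x k neighborhood is set.
            out &= padded[dy:dy + mask.shape[0], dx:dx + mask.shape[1]]
    return out

def binary_dilate(mask, k=3):
    """Dilation with a k x k square structuring element."""
    r = k // 2
    padded = np.pad(mask, r, mode='constant', constant_values=False)
    out = np.zeros_like(mask)
    for dy in range(k):
        for dx in range(k):
            # A pixel is set if any pixel in its k x k neighborhood is set.
            out |= padded[dy:dy + mask.shape[0], dx:dx + mask.shape[1]]
    return out

def clean_offset_mask(mask, k=3):
    """Claim 3's sequence: erosion, then dilation, then erosion, which
    removes isolated noise pixels while keeping solid motion blobs."""
    m = binary_erode(mask, k)
    m = binary_dilate(m, k)
    return binary_erode(m, k)
```

The first erosion removes isolated noise pixels, the dilation restores the surviving blobs, and the final erosion tightens their boundaries.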
4. The method of claim 1, wherein determining whether the image content of the first I frame is similar to the image content of the second I frame comprises:
calculating the similarity between the image content of the first I frame and the image content of the second I frame according to the following relation:

$$\mathrm{RMSE} = \sqrt{\frac{1}{h \cdot w} \sum_{i=1}^{h} \sum_{j=1}^{w} \bigl( I_1(i,j) - I_2(i,j) \bigr)^2}$$

wherein RMSE is the calculated similarity, h is the image height of the first I frame and the second I frame, w is their image width, (i, j) denotes the coordinates of a pixel in the image, $I_1(i,j)$ is the gray value of the pixel at coordinates (i, j) in the first I frame, and $I_2(i,j)$ is the gray value of the pixel at coordinates (i, j) in the second I frame;
if the similarity is not greater than a similarity threshold, the image content of the first I frame is similar to the image content of the second I frame; and
if the similarity is greater than the similarity threshold, the image content of the first I frame is dissimilar to the image content of the second I frame.
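The RMSE comparison of claim 4 can be sketched as below. The function name and the threshold value are illustrative assumptions; the patent does not fix a particular similarity threshold.

```python
import numpy as np

def i_frames_similar(first_gray, second_gray, threshold=8.0):
    """Claim 4's similarity test: root-mean-square error over the gray
    values of the two I frames; the frames count as 'similar' when the
    RMSE does not exceed the threshold."""
    diff = first_gray.astype(np.float64) - second_gray.astype(np.float64)
    rmse = np.sqrt(np.mean(diff ** 2))  # averages over h * w pixels
    return rmse, rmse <= threshold
```

Note that a *lower* RMSE means *more* similar content, which is why the claim compares against the threshold with "not greater than".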
5. The method of any one of claims 1-4, wherein the coded data of the first I frame are arranged in the following order:
the coding mode identifier, the number of motion regions, the lengths of the obtained coded data, and the region data of each motion region, wherein the region data of each motion region comprise: a data start identifier of the motion region, the coded data of the coding blocks in the motion region, and a data end identifier of the motion region.
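One possible serialization matching the claim-5 ordering might look like this. All field widths and the start/end marker bytes are hypothetical, since the claim fixes only the ordering, not the encoding of the fields.

```python
import struct

def pack_i_frame(mode_id, regions):
    """Hedged sketch of the claim-5 layout: coding-mode identifier,
    motion-region count, per-region coded-data lengths, then each
    region's coded data wrapped in start/end identifiers."""
    START = b'\x00\x00\x01\xB8'  # hypothetical data start identifier
    END = b'\x00\x00\x01\xB9'    # hypothetical data end identifier
    # Header: 1-byte mode id, 2-byte big-endian region count (assumed widths).
    out = struct.pack('>BH', mode_id, len(regions))
    for data in regions:
        out += struct.pack('>I', len(data))  # coded-data length per region
    for data in regions:
        out += START + data + END            # region data with markers
    return out
```

The explicit lengths plus the start/end identifiers are what let a decoder locate each motion region's coded data, as required by claim 6.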
6. A method of video decoding, the method comprising:
obtaining video data to be decoded;
if a coding mode identifier carried in the video data to be decoded indicates that the video frame to be decoded is an I frame coded in a non-intra-frame coding mode, decoding the coded data carried in the video data to be decoded based on the number of motion regions, the length of the coded data, and the identification information of the motion regions carried in the video data to be decoded, to obtain motion vectors and residual data of the coding blocks in the motion regions, wherein the identification information of the motion regions indicates the start position and the end position of the coded data of each motion region;
obtaining a reconstructed block of each coding block in the motion regions based on the obtained motion vectors and residual data and a buffered third I frame, wherein the third I frame is a reconstructed frame of the I frame preceding the video frame to be decoded; and
obtaining a decoding result of the video frame to be decoded based on the obtained reconstructed blocks and the third I frame.
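The per-block reconstruction step of claim 6 — fetch the reference block from the cached third I frame at the motion-vector offset, then add the residual — can be sketched as below. The block geometry, array conventions, and clipping to 8-bit range are assumptions for illustration.

```python
import numpy as np

def reconstruct_block(third_i_frame, block_pos, block_size, mv, residual):
    """Sketch of claim 6's reconstruction: reference block taken from
    the buffered reconstructed previous I frame at the motion-vector
    offset, plus the decoded residual, clipped to the 8-bit pixel range."""
    y, x = block_pos
    dy, dx = mv
    # Motion-compensated reference block from the cached third I frame.
    ref = third_i_frame[y + dy : y + dy + block_size,
                        x + dx : x + dx + block_size]
    # Reconstructed block = reference + residual, clipped to [0, 255].
    return np.clip(ref.astype(np.int16) + residual, 0, 255).astype(np.uint8)
```

Blocks outside the motion regions are simply copied from the third I frame, which is how the final decoding result of the frame is assembled.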
7. A video encoding device, the device comprising:
an image content judging module, configured to determine whether the image content of a first I frame is similar to the image content of a second I frame, wherein the second I frame is a reconstructed frame of the I frame preceding the first I frame;
a motion region determination module, configured to determine a motion region in the first I frame that moves relative to the second I frame when the image content of the first I frame is similar to the image content of the second I frame;
a motion vector estimation module, configured to perform motion vector estimation on a coding block in the motion region based on the second I frame, to obtain a reference block corresponding to the coding block in the second I frame;
a coded data obtaining module, configured to encode the coding block based on the obtained reference block, to obtain coded data of the coding block; and
a coded data generation module, configured to generate coded data of the first I frame comprising a coding mode identifier, the number of motion regions, identification information of the motion regions, the length of the obtained coded data, and the obtained coded data, wherein the coding mode identifier indicates that the first I frame is coded in a non-intra-frame coding mode, and the identification information of the motion regions indicates the start position and the end position of the coded data of each motion region;
wherein the motion region determination module comprises:
a luminance image obtaining sub-module, configured to obtain a first luminance image of the first I frame and a second luminance image of the second I frame;
an offset image obtaining sub-module, configured to obtain an offset image of the first I frame relative to the second I frame based on the differences between the luminance values of the pixels in the first luminance image and those of the corresponding pixels in the second luminance image; and
a motion region determination sub-module, configured to determine the motion region in the first I frame that moves relative to the second I frame according to the connected region formed by pixels whose luminance values are not 0 in the offset image.
8. A video decoding device, the device comprising:
a data acquisition module, configured to obtain video data to be decoded;
a data decoding module, configured to, when a coding mode identifier carried in the video data to be decoded indicates that the video frame to be decoded is an I frame coded in a non-intra-frame coding mode, decode the coded data carried in the video data to be decoded based on the number of motion regions, the length of the coded data, and the identification information of the motion regions carried in the video data to be decoded, to obtain motion vectors and residual data of the coding blocks in the motion regions, wherein the identification information of the motion regions indicates the start position and the end position of the coded data of each motion region;
a reconstructed block obtaining module, configured to obtain a reconstructed block of each coding block in the motion regions based on the obtained motion vectors and residual data and a buffered third I frame, wherein the third I frame is a reconstructed frame of the I frame preceding the video frame to be decoded; and
a decoding result obtaining module, configured to obtain a decoding result of the video frame to be decoded based on the obtained reconstructed blocks and the third I frame.
9. An electronic device, comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another via the communication bus;
the memory is configured to store a computer program; and
the processor is configured to implement the method steps of any one of claims 1-5 or the method steps of claim 6 when executing the program stored in the memory.
10. A computer-readable storage medium, wherein a computer program is stored in the computer-readable storage medium, and the computer program, when executed by a processor, implements the method steps of any one of claims 1-5 or the method steps of claim 6.
CN202311295212.2A 2023-10-09 2023-10-09 Video encoding and decoding methods and devices, electronic equipment and storage medium Active CN117041597B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311295212.2A CN117041597B (en) 2023-10-09 2023-10-09 Video encoding and decoding methods and devices, electronic equipment and storage medium


Publications (2)

Publication Number Publication Date
CN117041597A CN117041597A (en) 2023-11-10
CN117041597B true CN117041597B (en) 2024-01-19

Family

ID=88645313

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311295212.2A Active CN117041597B (en) 2023-10-09 2023-10-09 Video encoding and decoding methods and devices, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117041597B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005086981A2 (en) * 2004-03-10 2005-09-22 Sindhara Supermedia, Inc. Methods and apparatuses for compressing digital image data with motion prediction
CN101283586A (en) * 2005-10-31 2008-10-08 Sony United Kingdom Limited Video motion detection
CN101573982A (en) * 2006-11-03 2009-11-04 三星电子株式会社 Method and apparatus for encoding/decoding image using motion vector tracking
CN101720044A (en) * 2009-12-10 2010-06-02 四川长虹电器股份有限公司 Adaptive frame structure-based AVS coding method
JP2011055229A (en) * 2009-09-01 2011-03-17 Fujitsu Ltd Moving image coding apparatus, moving image coding method, computer program for coding moving image, and video transmitter
CN103475879A (en) * 2013-09-10 2013-12-25 南京邮电大学 Side information generation method in distribution type video encoding
CN110351560A (en) * 2019-07-17 2019-10-18 深圳市网心科技有限公司 A kind of coding method, system and electronic equipment and storage medium
CN114422792A (en) * 2021-12-28 2022-04-29 北京华夏电通科技股份有限公司 Video image compression method, device, equipment and storage medium



Similar Documents

Publication Publication Date Title
CN115623200B (en) Neural network driven codec
US9414086B2 (en) Partial frame utilization in video codecs
JP6708374B2 (en) Method, device, program, and readable medium for determining a set of modifiable elements within a set of images
CN110248189B (en) Video quality prediction method, device, medium and electronic equipment
CN109688407B (en) Reference block selection method and device for coding unit, electronic equipment and storage medium
TW202218428A (en) Image encoding method, image decoding method, and related apparatuses
CN115134629B (en) Video transmission method, system, equipment and storage medium
CN110418138B (en) Video processing method and device, electronic equipment and storage medium
WO2022022622A1 (en) Image coding method, image decoding method, and related apparatus
WO2022116246A1 (en) Inter-frame prediction method, video encoding and decoding method, apparatus, and medium
WO2021056225A1 (en) Inter-frame prediction method and apparatus, device and storage medium
CN117041597B (en) Video encoding and decoding methods and devices, electronic equipment and storage medium
CN108683915B (en) Method and device for writing dQP value and electronic equipment
US9812095B2 (en) Video processing method including managing a reference picture list and video system therefore
US10979704B2 (en) Methods and apparatus for optical blur modeling for improved video encoding
KR20130006578A (en) Residual coding in compliance with a video standard using non-standardized vector quantization coder
WO2021263251A1 (en) State transition for dependent quantization in video coding
CN109561315B (en) Motion estimation method and device, electronic equipment and storage medium
CN109544591B (en) Motion estimation method and device, electronic equipment and storage medium
CN108810533B (en) Method and device for marking reference frame and electronic equipment
WO2023185305A1 (en) Encoding method and apparatus, storage medium and computer program product
CN108668134B (en) Encoding and decoding method and device and electronic equipment
US20220337866A1 (en) Inter-frame prediction method, encoder, decoder and storage medium
US20210152832A1 (en) Reconstructing transformed domain information in encoded video streams
WO2020140216A1 (en) Video processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant