Video coding and decoding method and system
Technical Field
The invention belongs to the technical field of video coding and decoding, and particularly relates to a video coding and decoding method and system.
Background
Because the amount of video data is huge and network bandwidth is limited, video data generally needs to be compressed before network transmission to reduce the amount of data transferred. Existing network video compression methods are unidirectional and do not adapt the compression to the state of the user watching the video.
The human visual system is sharp only within roughly 2 degrees of the central fixation point; images outside this angle become increasingly blurred as the angle grows. The blurred regions are largely filled in with detail from memory, so that the brain reconstructs the whole image. The amount of data in a video frame at any moment is therefore far higher than what the user can actually observe.
One current method is as follows: the line of sight of a person watching the video (hereinafter referred to as a user) is tracked by an instrument to find the focus point of the sight line on the video, and the unfocused image regions are then blurred by an algorithm so that the pixel values in those regions become similar, which reduces the data volume of the compressed video. The purpose of this method, however, is mainly biological research rather than reducing the amount of video data.
Another similar approach is as follows: the area around the user's focus point is shot with a camera that has high resolution but a small viewing angle, and this is superposed on an image shot with a camera that has low resolution but a large viewing angle, forming an image in which the part around the user's focus point is clear and the rest is blurred.
In summary, the prior art requires an instrument or a dedicated algorithm to process the video image, and is inefficient because it demands either considerable manpower or substantial computing power.
Disclosure of Invention
The embodiment of the invention provides a video coding and decoding method and system, aiming at solving the problem of low processing efficiency of video images in the prior art.
In one aspect, a video coding and decoding method is provided, and the method includes:
detecting a focus point of eyeballs of a user on a video picture;
dividing the collected video image into a preset number of image areas by taking the focus point as a center, wherein the image areas at least comprise a first image area and a bottom image area;
splitting the video image into at least two independent video images according to the frame of the image area;
performing transparentization processing on the split independent video image;
reducing the independent video images one by one according to a preset reduction scale, wherein the reduction scale corresponding to each independent video image is different, and the closer the distance from the focus point is, the larger the reduction scale value corresponding to the independent video image is;
carrying out video coding on the reduced independent video images one by one to generate coded video images;
performing video decoding on the encoded video image to obtain a reduced independent video image corresponding to the independent video image;
restoring the reduced independent video images to the size before reduction one by one according to the reduction proportion;
processing the enlarged reduced independent video image according to a transparency value used when performing transparentization processing on the independent video image;
and combining the reduced independent video images with the transparency values set by taking the focus point as a center to generate a complete video image.
Further, the shape and size of the frame of the bottom image area are the same as those of the frame of the video image, and the areas of the image areas other than the bottom image area are set to increase gradually as the distance from their frames to the focus point increases;
the areas of other image areas except the bottom image area are preset; or,
the image areas are adjusted according to the distance between the user and the display screen, and the closer the user is to the display screen, the smaller the area of each image area.
Further, the performing the transparentization processing on the split independent video image specifically includes:
setting an area without image information in the split independent video image to a transparent value; and/or,
and setting repeated image information between the split independent video images as a transparent value.
Further, after the reducing the plurality of independent video images one by one according to a preset reduction scale, the method further includes:
and adjusting the color resolution of the plurality of independent video images one by one according to a preset color resolution, wherein the color resolution corresponding to each independent video image is different, and the closer the independent video image is to the focus point, the higher its color resolution.
Further, after the reducing the plurality of reduced independent video images one by one to the size before reduction according to the reduction scale, the method further includes:
and smoothing the reduced independent video image restored to the size before reduction.
On the other hand, a video coding and decoding system is provided, which comprises a user terminal and a video acquisition terminal;
the user side includes:
the device comprises a focus point acquisition unit, a video acquisition unit and a video acquisition unit, wherein the focus point acquisition unit is used for detecting a focus point of eyeballs of a user on a video picture and sending the focus point to the video acquisition end;
the video acquisition end comprises:
the dividing unit is used for dividing the collected video image into a preset number of image areas by taking the focus point as a center, wherein the image areas at least comprise a first image area and a bottom image area;
the splitting unit is used for splitting the video image into at least two independent video images according to the borders of the plurality of image areas;
the first transparentizing processing unit is used for performing transparentizing processing on the split independent video image;
the reducing unit is used for reducing the independent video images one by one according to a preset reduction scale, wherein the reduction scale corresponding to each independent video image is different, and the closer the independent video image is to the focus point, the larger its reduction scale value;
a coding unit, configured to perform video coding on the reduced independent video images one by one independently, generate a coded video image, and send the coded video image, the reduced scale value, and a transparency value used when performing transparency processing on the independent video image to the user side;
the user side further comprises:
a decoding unit, configured to perform video decoding on the encoded video image to obtain a reduced independent video image corresponding to the independent video image;
the amplifying unit is used for restoring the reduced independent video images to the size before reduction one by one according to the reduction proportion sent by the video acquisition end;
the second transparentization processing unit is used for processing the amplified reduced independent video image according to the transparent value sent by the video acquisition end;
and the image merging unit is used for merging the reduced independent video images with the transparency values set by taking the focus point sent by the video acquisition end as a center to generate a complete video image.
Further, the user terminal further comprises:
a smoothing unit, configured to smooth the reduced independent video images restored to the pre-reduction size.
Further, the video capture terminal further comprises:
and the resolution adjusting unit is used for adjusting the color resolutions of the independent video images one by one according to a preset color resolution, wherein the color resolution corresponding to each independent video image is different, and the closer the independent video image is to the focus point, the higher its color resolution.
Further, the shape and size of the frame of the bottom image area are the same as those of the frame of the video image, and the areas of the image areas other than the bottom image area are set to increase gradually as the distance from their frames to the focus point increases;
the areas of other image areas except the bottom image area are preset; or,
the image areas are adjusted according to the distance between the user and the display screen, and the closer the user is to the display screen, the smaller the area of each image area.
Further, the first transparentizing processing unit includes:
the first transparentizing processing module is used for setting an area without image information in the split independent video image to a transparent value; and/or,
and the second transparentizing processing module is used for setting repeated image information among the split independent video images as a transparent value.
In the embodiment of the invention, before video coding, the focus point of the user's eyeballs on the video picture is obtained, and the video image is divided into regions according to the focus point. The video image is then split into at least two independent video images according to the frame of each divided image area, the independent video images are reduced one by one according to their respective scales, and finally each reduced independent video image is video-coded; the system can subsequently execute the corresponding decoding processing according to the processing performed before video coding. Since the closer an independent video image is to the focus point, the larger its reduction scale value, the sharpness of the video image near the focus point of the user's eyeballs is effectively maintained while the data volume of the video image that is not being watched is reduced. In addition, the embodiment of the invention does not need to run complex image-processing algorithms, places low demands on the computing capacity of the processor, is simple to implement, and is efficient.
Drawings
Fig. 1 is a flowchart illustrating an implementation of a video encoding and decoding method according to an embodiment of the present invention;
fig. 2 is a schematic diagram illustrating a video image after being subjected to region segmentation according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a plurality of separated video images according to an embodiment of the present invention;
FIG. 4 is a schematic diagram illustrating a process of scaling down a plurality of independent video images according to an embodiment of the present invention;
fig. 5 is a block diagram of a video codec system according to a second embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In the embodiment of the invention, before video coding, the focus point of the user's eyeballs on the video picture is obtained, and the video image is divided into regions according to the focus point. The video image is then split into at least two independent video images according to the frame of each divided image area, the independent video images are reduced one by one according to their respective scales, and finally each reduced independent video image is video-coded; the system can subsequently execute the corresponding decoding processing according to the processing performed before video coding.
The following detailed description of the implementation of the present invention is made with reference to specific embodiments:
example one
Fig. 1 shows a flowchart of an implementation of a video encoding and decoding method according to the first embodiment of the present invention. For convenience of description, only the parts relevant to the embodiment of the present invention are shown, detailed as follows:
in step S101, a focus point of the eyeball of the user on the video screen is detected.
In the embodiment of the invention, the user side shoots the user by using the stereo camera, acquires the three-dimensional position of the eyeball of the user relative to the display screen, then judges the focus point of the user on the video picture by detecting the posture angle of the eyeball of the user, and finally sends the information of the focus point to the video acquisition end. It should be noted that the display screen is a display screen of the user side, and the video picture is displayed on the display screen.
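As an illustration of how the detected eyeball position and posture angle can be turned into a focus point, the following sketch intersects the gaze ray with the screen plane. It is a minimal model under assumed conventions (screen plane at z = 0, eye position in metres, yaw and pitch of zero meaning a gaze perpendicular to the screen); the function name and units are illustrative, not part of the method.

```python
import math

def focus_point(eye_pos, yaw_deg, pitch_deg):
    """Project a gaze ray from the eye position onto the screen plane z = 0.

    eye_pos: (x, y, z) of the eyeball relative to the screen, z > 0 (metres).
    yaw_deg/pitch_deg: gaze angles; (0, 0) means looking straight at the screen.
    Returns the (x, y) focus point on the screen plane.
    """
    ex, ey, ez = eye_pos
    # Direction vector of the gaze ray, pointing toward the screen (-z).
    dx = math.tan(math.radians(yaw_deg))
    dy = math.tan(math.radians(pitch_deg))
    # Intersect eye_pos + t * (dx, dy, -1) with z = 0: t = ez.
    t = ez
    return (ex + t * dx, ey + t * dy)

# Looking straight ahead from 0.5 m away lands directly in front of the eye.
print(focus_point((0.1, 0.2, 0.5), 0.0, 0.0))  # → (0.1, 0.2)
```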
In step S102, the collected video image is divided into a preset number of image areas centered on the focus point, where the image areas at least include a first image area and a bottom image area, the shape and size of the frame of the bottom image area are the same as those of the frame of the video image, and the areas of the other image areas are set to increase gradually with their distance from the focus point.
In the embodiment of the present invention, after detecting the focus point of the user's eyeball on the screen, the user terminal sends the focus point to the video capture terminal, and the video capture terminal divides the captured video image into a plurality of image areas centered on the focus point, as shown in fig. 2. In fig. 2, the video image is divided into a first image area, a second image area, a third image area and a bottom image area centered on the focus point. The frame of an image area may be a circle as shown in fig. 2, or may take other shapes. The shape and size of the frame of the bottom image area are the same as those of the frame of the video image, so that the whole video image is completely and accurately covered. The areas of the image regions other than the bottom image region are set to increase gradually with distance from the focus point: the first image area is smaller than the second image area, the second image area is smaller than the third image area, and so on. These areas may be preset or adjusted according to the distance between the user and the display screen, that is, the closer the user is to the display screen, the smaller the image areas become.
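The division into concentric areas can be sketched as a simple distance test. The circular frames match fig. 2, but the radius values below are assumptions for illustration, not values fixed by the method:

```python
def region_index(pixel, focus, radii):
    """Return which image area a pixel falls in: 0 = first (innermost) area,
    len(radii) = bottom image area. radii are increasing circle radii
    around the focus point."""
    dx = pixel[0] - focus[0]
    dy = pixel[1] - focus[1]
    dist = (dx * dx + dy * dy) ** 0.5
    for i, r in enumerate(radii):
        if dist <= r:
            return i
    return len(radii)  # outside all circles → bottom image area

radii = [50, 120, 240]  # first/second/third area boundaries in pixels (assumed)
print(region_index((310, 200), (300, 200), radii))  # close to focus → area 0
```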
In step S103, the video image is split into at least two independent video images according to the frame of the image area.
In the embodiment of the present invention, after partitioning the video image, the video image is split into a plurality of independent video images according to the borders of the image areas shown in fig. 2. As shown in fig. 3, the video image is split into 4 independent video images: the first image, the second image, the third image and the bottom image. Each independent video image is generally rectangular, so that it completely covers the frame of its image area.
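The rectangular split image that covers a circular frame can be derived as a clipped bounding box, as in this sketch (function and argument names are illustrative):

```python
def crop_box(focus, radius, width, height):
    """Axis-aligned box that fully covers the circular frame of the given
    radius around the focus point, clipped to the video frame."""
    x0 = max(0, focus[0] - radius)
    y0 = max(0, focus[1] - radius)
    x1 = min(width, focus[0] + radius)
    y1 = min(height, focus[1] + radius)
    return (x0, y0, x1, y1)

# A 120-pixel circle around (300, 200) inside a 640x360 frame:
print(crop_box((300, 200), 120, 640, 360))  # → (180, 80, 420, 320)
```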
In step S104, a transparentization process is performed on the split independent video image.
In this embodiment, an area of the split independent video image that carries no image information can be set to a transparent value; repeated image information between the split independent video images can likewise be set to a transparent value. For example, a region of an independent video image without image information may be filled with a single color tone, such as black, or assigned a transparency value; in fig. 3, such regions are shown in black.
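The transparentization step can be sketched as a per-pixel alpha mask: each split image keeps only its own ring opaque, so areas with no image information, and information repeated in inner layers, receive a transparent value. The binary alpha and the circular-ring model are simplifying assumptions:

```python
def alpha_mask(width, height, focus, inner_r, outer_r):
    """Per-pixel alpha for one split image: 255 (opaque) inside its ring
    (inner_r < d <= outer_r), 0 (transparent) elsewhere."""
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            d = ((x - focus[0]) ** 2 + (y - focus[1]) ** 2) ** 0.5
            row.append(255 if inner_r < d <= outer_r else 0)
        mask.append(row)
    return mask

# A 3x1 strip with the focus at its left edge: only the middle pixel,
# at distance 1, lies inside the ring (0.5, 1.5].
print(alpha_mask(3, 1, (0, 0), 0.5, 1.5))  # → [[0, 255, 0]]
```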
In step S105, the independent video images are reduced one by one according to a preset reduction ratio, the reduction ratio corresponding to each independent video image is different, and the smaller the distance from the focus point, the larger the reduction ratio value corresponding to the independent video image.
In this embodiment, as shown in fig. 4, the split independent video images are reduced according to set scales, and the reduction scale of each independent video image is different. One effective method is to set the reduction scales in a decreasing manner, i.e., the closer an independent video image is to the focus point, the larger its reduction scale value. For example, the reduction ratio of the first image is 100%, the reduction ratio of the second image is 80%, and so on. This method effectively keeps the definition of the video image near the focus point of the user's eyeball while reducing the data volume of the video image that is not being watched.
In addition, the color resolution of the independent video images may also be set in a decreasing manner, that is, the closer an independent video image is to the focus point, the higher its color resolution, so as to further reduce the amount of image data.
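The decreasing reduction scales can be captured in a small lookup; the 100% and 80% values follow the example above, while the remaining values are assumptions for illustration:

```python
# Reduction scale per image area, innermost (first image) to bottom image.
# The first two values come from the example in the text; the rest are assumed.
REDUCTION = [1.00, 0.80, 0.60, 0.40]

def reduced_size(size, area_index):
    """New (w, h) after scaling one independent video image; the area nearest
    the focus point keeps the largest scale, i.e. the most detail."""
    s = REDUCTION[area_index]
    return (max(1, round(size[0] * s)), max(1, round(size[1] * s)))

print(reduced_size((400, 300), 1))  # second image at 80% → (320, 240)
```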
In step S106, video encoding is performed on the reduced independent video images one by one to generate encoded video images.
In this embodiment, the scaled independent video image is independently encoded, such as h.264, and transmitted to the user end via the network. In addition, the transparency value, the zoom-out ratio and the focus point detected in step S101 of each independent video image are simultaneously transmitted to the user end in the form of data.
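The data that travels over the network per layer, namely the encoded bitstream plus the scale, the transparency value and the focus point, might be grouped as below. The field names are illustrative, not specified by the method:

```python
def make_payload(encoded_bytes, area_index, scale, alpha, focus):
    """Bundle one encoded layer with the side data the user end needs
    to decode, enlarge, transparentize and merge it (field names assumed)."""
    return {
        "layer": area_index,        # which image area this stream carries
        "bitstream": encoded_bytes, # e.g. an H.264 elementary stream
        "scale": scale,             # reduction ratio used before encoding
        "alpha": alpha,             # transparency value applied to the layer
        "focus": focus,             # focus point, for recentring at the client
    }

p = make_payload(b"\x00\x01", 2, 0.6, 128, (300, 200))
print(p["layer"], p["scale"])  # → 2 0.6
```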
In step S107, video decoding is performed on the encoded video images to obtain the reduced independent video images corresponding to the independent video images.
In this embodiment, after receiving the data and the images transmitted from the video capturing end, the user end first decodes the independent encoded video images by a corresponding decoding method (e.g., h.264) to obtain a plurality of reduced independent video images.
In step S108, the reduced independent video images are restored to the size before reduction one by one according to the reduction ratio.
In this embodiment, the user side restores each decoded reduced independent video image to its original size according to the reduction ratio sent by the video acquisition side. An independent video image that has been reduced and re-enlarged becomes blurred, but the blurred image areas are precisely the areas the user is not focusing on.
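The enlargement back to the pre-reduction size can be sketched with a nearest-neighbour scaler. A real decoder would typically use smoother interpolation (hence the later smoothing step), so this is only a stand-in:

```python
def restore(image, scale):
    """Nearest-neighbour enlargement of a 2-D pixel grid back to its
    pre-reduction size, given the reduction scale used at the encoder."""
    h = len(image)
    w = len(image[0])
    H = round(h / scale)
    W = round(w / scale)
    return [[image[min(h - 1, int(y * scale))][min(w - 1, int(x * scale))]
             for x in range(W)] for y in range(H)]

# A 2x2 image reduced at 50% restores to 4x4, each pixel duplicated.
print(restore([[1, 2], [3, 4]], 0.5))
```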
In step S109, the enlarged reduced independent video image is processed based on the transparency value used when performing the transparentization processing on the independent video image.
In this embodiment, the restored independent video images may also be smoothed to reduce jagged edges. The transparency value applied to each independent video image before video coding is then applied to each enlarged independent image.
In step S110, the reduced independent video images with the transparency values set are combined with the focus point as the center, so as to generate a complete video image.
In this embodiment, the processed independent video images are finally overlapped on the bottom layer image with the focus point as the center, so as to form a complete video image, and the complete video image is finally transmitted to the display screen for displaying.
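The final merge can be sketched as layer-by-layer overwriting onto the bottom image, using binary alpha (a simplifying assumption; pixels are modelled as (value, alpha) pairs for illustration):

```python
def composite(layers):
    """Overlay (value, alpha) pixel layers onto the first (bottom) layer:
    an opaque pixel in an upper layer replaces the pixel below it."""
    out = [row[:] for row in layers[0]]  # start from the bottom image
    for layer in layers[1:]:
        for y, row in enumerate(layer):
            for x, (value, alpha) in enumerate(row):
                if alpha:  # opaque → overwrite the pixel underneath
                    out[y][x] = (value, alpha)
    return out

bottom = [[(1, 255), (1, 255)]]
top = [[(9, 0), (9, 255)]]  # left pixel transparent, right pixel opaque
print(composite([bottom, top]))  # → [[(1, 255), (9, 255)]]
```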
According to the embodiment of the invention, before video coding, the focus point of the user's eyeballs on the video picture is obtained, and the video image is divided into regions according to the focus point. The video image is then split into at least two independent video images according to the frame of each divided image area, the independent video images are reduced in proportion, and finally each reduced independent video image is video-coded; the system subsequently executes the corresponding decoding processing according to the processing performed before video coding. Since the closer an independent video image is to the focus point, the larger its reduction scale value, the sharpness of the video image near the focus point of the user's eyeballs is effectively maintained while the data volume of the video image that is not being watched is reduced. In addition, the embodiment of the invention does not need to run complex image-processing algorithms, places low demands on the computing capacity of the processor, is simple to implement, and is efficient.
Example two
Fig. 5 shows a block diagram of a video codec system according to a second embodiment of the present invention, and for convenience of description, only the relevant parts of the video codec system according to the second embodiment of the present invention are shown, where the video codec system includes: a user terminal 51 and a video capture terminal 52.
Wherein, the user terminal 51 includes: a focus point acquisition unit 511, a decoding unit 512, an enlarging unit 513, a second transparentizing processing unit 514, and an image merging unit 515.
The focus point obtaining unit 511 is configured to detect a focus point of an eyeball of a user on a video image, and send the focus point to the video capturing end.
The video capture terminal 52 includes: a dividing unit 521, a splitting unit 522, a first transparentization processing unit 523, a reducing unit 524, and an encoding unit 525.
The dividing unit 521 is configured to divide the acquired video image into a preset number of image areas centered on the focus point, where the image areas at least include a first image area and a bottom image area. The shape and size of the frame of the bottom image area are the same as those of the frame of the video image, while the frames of the other image areas differ in size: the closer an image area is to the focus point, the smaller its frame. The frame size of each image area may be preset, or may be adjusted according to the distance between the user and the display screen, the frames becoming smaller the closer the user is to the display screen;
a splitting unit 522, configured to split the video image into at least two independent video images according to a frame of the image area;
a first transparentizing processing unit 523 configured to perform transparentizing processing on the split independent video image;
a reducing unit 524, configured to reduce the independent video images one by one according to a preset reduction scale, where the reduction scale corresponding to each independent video image is different, and the smaller the distance from the focus point, the larger the reduction scale value corresponding to the independent video image is;
an encoding unit 525, configured to perform video encoding on the reduced independent video images one by one, generate an encoded video image, and send the encoded video image, the reduced scale value, and a transparency value used when performing transparency processing on the independent video image to the user side;
a decoding unit 512, configured to perform video decoding on the encoded video image sent by the video acquisition end 52, so as to obtain a reduced independent video image corresponding to the independent video image;
an enlarging unit 513, configured to restore the reduced independent video images to the size before reduction one by one according to the reduction ratio sent by the video capturing end 52;
a second transparentization processing unit 514, configured to process the enlarged reduced independent video image according to the transparency value sent by the video capturing end 52;
the image merging unit 515 is configured to merge the reduced independent video images with the transparency values set by taking the focus point sent by the video capture end 52 as a center, so as to generate a complete video image.
Further, as a preferred embodiment of the present invention, the user terminal 51 further includes: a smoothing unit for smoothing the plurality of reduced independent video images restored to the pre-reduction size.
Further, as a preferred embodiment of the present invention, the video capturing end 52 further includes: a resolution adjusting unit, configured to adjust the color resolution of each independent video image one by one according to a preset color resolution, where the color resolution corresponding to each independent video image is different, and the closer an independent video image is to the focus point, the higher its color resolution.
Further, the first transparentizing processing unit 523 includes: the first transparentizing processing module is used for setting an area without image information in the split independent video image as a transparent value; and the second transparentizing processing module is used for setting repeated image information among the split independent video images as a transparent value.
It should be noted that, in the above system embodiment, each included unit is only divided according to functional logic, but is not limited to the above division as long as the corresponding function can be implemented; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.
In addition, it can be understood by those skilled in the art that all or part of the steps in the method for implementing the embodiments described above can be implemented by instructing the relevant hardware through a program, and the corresponding program can be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, or the like.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.