Video image processing method and device
Technical Field
Embodiments of the invention relate to the field of video technology, and in particular to a video image processing method and device.
Background
In video surveillance and video conference scenarios, a single image contains a large number of static areas and relatively few motion areas. Current coding schemes generally use the reference picture closest in time to the picture to be coded as the reference. Because current coding standards and the coding schemes actually in use are lossy, the quality of the reference pictures that follow a high-quality reference picture degrades over time and becomes much worse than that of the high-quality reference picture. When a poor-quality reference picture is used as the reference, the reconstruction of the subsequently coded pictures is also poor, and the coding bit rate is high.
Disclosure of Invention
Embodiments of the invention provide a video image processing method and device to address the problem in the prior art that, when a poor-quality reference image is used as the reference, the reconstruction of subsequently coded images is also poor and the coding bit rate is high. The method and device improve the coding quality of the video image and reduce its coding bit rate, which in turn reduces the transmission bandwidth required by the coded bitstream and the storage space required to store it.
The embodiment of the invention provides a video image processing method, which comprises the following steps:
dividing each image of a video image into areas to distinguish a static area and a motion area of each image;
generating a long-term reference image based on the image of the static area;
generating a short-term reference image only for the motion area;
and generating the image to be coded, with the static area of the image to be coded taking the long-term reference image as a reference and the motion area of the image to be coded taking the short-term reference image as a reference.
An embodiment of the present invention provides a video image processing apparatus, including:
an image area dividing unit, configured to divide each image of the video image into areas to distinguish a static area and a motion area of each image;
a long-term reference image generation unit, configured to generate a long-term reference image based on the image of the static area;
a short-term reference image generation unit, configured to generate a short-term reference image for the motion area;
and an image to be coded generating unit, configured to generate the image to be coded, wherein the static area of the image to be coded takes the long-term reference image as a reference and the motion area of the image to be coded takes the short-term reference image as a reference.
The video image processing method and device provided by the embodiments of the invention divide each image of a video image into areas to distinguish a static area and a motion area of each image; generate a long-term reference image based on the image of the static area; generate a short-term reference image only for the motion area; and generate the image to be coded, with the static area of the image to be coded taking the long-term reference image as a reference and the motion area taking the short-term reference image as a reference. Compared with the prior art, the method and device do not need to encode the whole image to be coded: the complete image to be coded can be generated with the motion area referring to the short-term reference image and the static area referring to the long-term reference image. The method and device can therefore improve the coding quality of the video image and reduce its coding bit rate, which in turn reduces the transmission bandwidth required by the coded bitstream and the storage space required to store it.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings described below show some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart illustrating a general procedure of a video image processing method according to a first embodiment and a second embodiment of the present invention;
Fig. 2 is a schematic diagram of a general signal flow relationship of video image processing apparatuses according to a third embodiment and a fourth embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort fall within the protection scope of the present invention.
Example one
Referring to fig. 1, a video image processing method according to an embodiment of the present invention includes:
step 11: dividing each image of the video image into areas to distinguish a static area and a motion area of each image;
step 12: generating a long-term reference image based on the image of the static area;
step 13: generating a short-term reference image only for the motion area;
step 14: generating the image to be coded, with the static area of the image to be coded taking the long-term reference image as a reference and the motion area of the image to be coded taking the short-term reference image as a reference.
Through steps 11 to 14, the video image processing method provided by the first embodiment of the invention distinguishes the static area and the motion area of each image, generates a long-term reference image based on the image of the static area, generates a short-term reference image only for the motion area, and generates the image to be coded with the static area referring to the long-term reference image and the motion area referring to the short-term reference image. Compared with the prior art, the method does not need to encode the whole image to be coded: the complete image to be coded can be generated with the motion area referring to the short-term reference image and the static area referring to the long-term reference image. The method can therefore improve the coding quality of the video image and reduce its coding bit rate, which in turn reduces the transmission bandwidth required by the coded bitstream and the storage space required to store it.
The application environments of the video image processing method provided by the first embodiment of the invention include video surveillance environments and/or video conference environments. Because the long-term reference image used by the method is based on the image of the static area of the video image, the real-time video must be shot against a single background from a single angle; that is, both the camera and the scene being shot must be fixed. Video surveillance and video conference environments fully meet this requirement. In television and film shooting, by contrast, the shooting angle and focal length change in real time, so it is difficult to find a static area in each image of the video. The video image processing method provided by the first embodiment is therefore not suitable for processing television and film video images.
When the video image processing method provided by the first embodiment of the present invention divides each image of a video image into areas, historical images are used as the basis: a static area is an area that remains unchanged across 120 or more images, and a motion area is an area of an image in which a person or object moves. For example, in a video conference environment where the installation position and direction of the camera do not change, fixed background such as walls, ceiling lamps and desks usually stays fixed in the real-time video image, while the people in the scene may be moving. In this case, the image formed by the fixed background such as the walls, ceiling lamps and desks is taken as the static area, and the people are taken as the motion area. 120 or more images are used as the basis so that the static area is selected more accurately; if too few images are used, the determination of the static area is affected. Conventional video playback generally uses a frame rate of 24 images per second, at which 120 images take 5 s to play. In this embodiment, an area of the video picture that remains unchanged for 5 s or more is therefore regarded as a static area, which avoids a selection time so short that a person or object in the motion area that has not yet moved is mistakenly assigned to the static area.
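The history-based selection of the static area can be illustrated with a minimal sketch. It is not the patented implementation: the pixel-wise comparison against the first historical image, the `tolerance` parameter, and the names `static_mask` and `history_seconds` are assumptions introduced only for illustration; the text specifies only that 120 or more images are compared and that 120 images at 24 images per second correspond to 5 s.

```python
from typing import List

Frame = List[List[int]]  # grayscale image stored as rows of pixel values

MIN_HISTORY = 120  # number of historical images named in the text
FRAME_RATE = 24    # images per second, as in the text's playback example


def static_mask(history: List[Frame], tolerance: int = 0) -> List[List[bool]]:
    """Return True for every pixel that stays unchanged over the whole history.

    Assumption: "unchanged" means each later image differs from the first
    historical image by at most `tolerance` at that pixel.
    """
    if len(history) < MIN_HISTORY:
        raise ValueError("need at least 120 images to judge static areas")
    first = history[0]
    height, width = len(first), len(first[0])
    mask = [[True] * width for _ in range(height)]
    for frame in history[1:]:
        for y in range(height):
            for x in range(width):
                if abs(frame[y][x] - first[y][x]) > tolerance:
                    mask[y][x] = False  # pixel changed, so it is a motion pixel
    return mask


def history_seconds(num_images: int = MIN_HISTORY,
                    frame_rate: int = FRAME_RATE) -> float:
    """Playback time covered by the history: 120 images / 24 per second."""
    return num_images / frame_rate
```

Pixels whose mask value is False would form the motion area; a real encoder would more likely work on blocks than on single pixels and would need a noise tolerance greater than zero.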
The method for generating the image to be coded comprises the following steps:
step 1411: removing a moving area in an image to be coded to obtain an image of a static area of the image to be coded;
step 1412: generating an image of a motion area of an image to be coded by taking a short-term reference image as a basis;
step 1413: splicing the image of the static area of the image to be coded with the image of the motion area of the image to be coded together to generate a complete image to be coded.
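Steps 1411 to 1413 can be sketched as follows. This is only an illustration under stated assumptions: images are small grayscale arrays, the motion area is given as a boolean mask, and step 1412 is reduced to copying pixels from the short-term reference image as a stand-in for real prediction. The names `cut_motion_area`, `motion_area_image` and `splice` are hypothetical.

```python
from typing import List, Optional

Frame = List[List[int]]
Mask = List[List[bool]]              # True marks a motion-area pixel
Partial = List[List[Optional[int]]]  # None marks a pixel outside this part


def cut_motion_area(image: Frame, motion: Mask) -> Partial:
    """Step 1411: remove motion-area pixels, keeping the static-area image."""
    return [[None if m else p for p, m in zip(row, mrow)]
            for row, mrow in zip(image, motion)]


def motion_area_image(short_term_ref: Frame, motion: Mask) -> Partial:
    """Step 1412 (stand-in): take motion-area pixels from the short-term reference."""
    return [[p if m else None for p, m in zip(row, mrow)]
            for row, mrow in zip(short_term_ref, motion)]


def splice(static_part: Partial, motion_part: Partial) -> Frame:
    """Step 1413: merge the two parts into one complete image."""
    out = []
    for srow, mrow in zip(static_part, motion_part):
        row = []
        for s, m in zip(srow, mrow):
            # Every pixel must come from exactly one of the two parts.
            if (s is None) == (m is None):
                raise ValueError("each pixel must belong to exactly one part")
            row.append(s if s is not None else m)
        out.append(row)
    return out
```

Because the two parts are built from the same mask, they are complementary and the splice covers every pixel exactly once.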
In the video image processing method provided by the first embodiment of the present invention, the motion area is separated from the static area by cutting it out in advance. A sufficiently large area of the image must therefore be cut out to ensure that the motion area is removed completely; if the cut-out area is too small to contain all of the moving people or objects in the video image, the video image is seriously distorted. For this reason, the video image processing method provided by the first embodiment is not well suited to scenes in which the static areas are too scattered.
The video image processing method further comprises updating the long-term reference image at a set period, which comprises: storing each updated long-term reference image in a reference image buffer, with a generation time tag attached to each long-term reference image; and retrieving, as the reference for generating the image to be coded, the long-term reference image whose generation time tag is closest to the image to be coded.
Although the video image processing method provided by the first embodiment of the present invention designates a static area in the image, the image of the static area may still change over long-term use. For example, if the scene is renovated, the image of the wall may change, and if a desk is moved, the image of the desk may also change. The long-term reference image therefore needs to be updated at the set period to avoid distortion of the video image as far as possible.
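The time-tagged reference image buffer can be sketched as follows. This is a minimal illustration, not the patented implementation: the class names `LongTermReference` and `ReferenceBuffer`, the use of seconds as the time unit, and the reading of "closest" as the smallest absolute time difference are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class LongTermReference:
    generated_at: float  # generation time tag, in seconds (assumed unit)
    image: Any           # the long-term reference image itself


@dataclass
class ReferenceBuffer:
    entries: List[LongTermReference] = field(default_factory=list)

    def store(self, image: Any, generated_at: float) -> None:
        """Keep an updated long-term reference image together with its time tag."""
        self.entries.append(LongTermReference(generated_at, image))

    def closest(self, coding_time: float) -> LongTermReference:
        """Return the stored reference whose time tag is nearest to coding_time.

        Assumption: "closest" means the smallest absolute time difference.
        """
        if not self.entries:
            raise LookupError("no long-term reference image stored")
        return min(self.entries, key=lambda e: abs(e.generated_at - coding_time))
```

A practical encoder would probably restrict the search to references that the decoder already holds and would bound the buffer size; neither point is specified in the text.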
Example two
Unlike in the video image processing method provided by the first embodiment of the present invention, in the video image processing method provided by the second embodiment of the present invention the method for generating the image to be coded comprises:
step 1421: dividing each image of the video image into a plurality of grid areas;
step 1422: comparing the image to be coded with the long-term reference image; selecting each grid area of the image to be coded that is completely consistent with the corresponding area of the long-term reference image as a static area of the image to be coded, and each grid area that is inconsistent with the corresponding area of the long-term reference image as a motion area of the image to be coded;
step 1423: applying the corresponding grid areas of the long-term reference image in full to generate the image of the static area of the image to be coded;
step 1424: generating an image of a motion area of the image to be coded by taking a short-term reference image of the image to be coded as a basis;
step 1425: splicing the image of the static area of the image to be coded with the image of the motion area of the image to be coded together to generate a complete image to be coded.
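Steps 1421 to 1425 can be sketched as follows. This is a minimal illustration under stated assumptions: images are small grayscale arrays, the grid uses square cells of a chosen `size`, "completely consistent" is read as exact pixel equality, and the motion area is filled by copying from the short-term reference image as a stand-in for real prediction. The names `grid_cells`, `cell_matches` and `build_image` are hypothetical.

```python
from typing import Iterator, List, Tuple

Frame = List[List[int]]
Cell = Tuple[int, int, int, int]  # top, left, bottom, right (bottom/right exclusive)


def grid_cells(height: int, width: int, size: int) -> Iterator[Cell]:
    """Step 1421: divide an image of the given dimensions into grid areas."""
    for top in range(0, height, size):
        for left in range(0, width, size):
            yield top, left, min(top + size, height), min(left + size, width)


def cell_matches(a: Frame, b: Frame, cell: Cell) -> bool:
    """Step 1422: True if the grid area is identical in both images."""
    top, left, bottom, right = cell
    return all(a[y][left:right] == b[y][left:right] for y in range(top, bottom))


def build_image(current: Frame, long_ref: Frame, short_ref: Frame,
                size: int) -> Tuple[Frame, List[Cell]]:
    """Steps 1423 to 1425: fill static cells from the long-term reference,
    motion cells from the short-term reference, and splice the result."""
    height, width = len(current), len(current[0])
    out = [[0] * width for _ in range(height)]
    motion_cells: List[Cell] = []
    for cell in grid_cells(height, width, size):
        top, left, bottom, right = cell
        is_static = cell_matches(current, long_ref, cell)
        if not is_static:
            motion_cells.append(cell)
        source = long_ref if is_static else short_ref
        for y in range(top, bottom):
            out[y][left:right] = source[y][left:right]
    return out, motion_cells
```

Since every cell is classified independently, scattered static areas are handled cell by cell, which matches the advantage the second embodiment claims over the cut-out approach.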
The advantage of the video image processing method provided by the second embodiment of the invention is that, because the video image is divided into a grid, the method adapts well even when the static areas are very scattered.
Example three
Referring to fig. 2, as a specific implementation of the video image processing method according to the first embodiment of the present invention, a video image processing apparatus according to a third embodiment of the present invention includes:
an image area dividing unit 31 for dividing an area of each image of the video image to distinguish a still area and a moving area of each image of the video image;
a long-term reference image generating unit 32 for generating a long-term reference image based on the image of the still region;
a short-term reference image generation unit 33 for generating a short-term reference image for the motion region;
and an image to be encoded generating unit 34 configured to generate an image to be encoded, wherein a still region in the image to be encoded refers to the long-term reference image, and a moving region in the image to be encoded refers to the short-term reference image.
The video image processing device provided by the third embodiment of the invention divides each image of the video image into areas through the image area dividing unit 31 to distinguish the static area and the motion area of each image; generates a long-term reference image based on the image of the static area through the long-term reference image generating unit 32; generates a short-term reference image only for the motion area through the short-term reference image generating unit 33; and generates the image to be coded through the image to be encoded generating unit 34, with the static area of the image to be coded taking the long-term reference image as a reference and the motion area taking the short-term reference image as a reference. Compared with the prior art, the device does not need to encode the whole image to be coded: the complete image to be coded can be generated with the motion area referring to the short-term reference image and the static area referring to the long-term reference image. The device can therefore improve the coding quality of the video image and reduce its coding bit rate, which in turn reduces the transmission bandwidth required by the coded bitstream and the storage space required to store it.
The application environments of the video image processing device provided by the third embodiment of the present invention include video surveillance environments and/or video conference environments. Because the long-term reference image used by the device is based on the image of the static area of the video image, the real-time video must be shot against a single background from a single angle; that is, both the camera and the scene being shot must be fixed. Video surveillance and video conference environments fully meet this requirement. In television and film shooting, by contrast, the shooting angle and focal length change in real time, so it is difficult to find a static area in each image of the video. The video image processing device provided by the third embodiment is therefore not suitable for processing television and film video images.
The video image processing device provided by the third embodiment of the present invention further includes:
a static area selection unit, configured to compare 120 or more images and select the areas that remain unchanged across those images as static areas;
and the motion area selection unit is used for selecting an area where people or objects move in each image as a motion area.
For example, in a video conference environment where the installation position and direction of the camera do not change, fixed background such as walls, ceiling lamps and desks usually stays fixed in the real-time video image, while the people in the scene may be moving. In this case, the image formed by the fixed background such as the walls, ceiling lamps and desks is taken as the static area, and the people are taken as the motion area. 120 or more images are used as the basis so that the static area is selected more accurately; if too few images are used, the determination of the static area is affected. Conventional video playback generally uses a frame rate of 24 images per second, at which 120 images take 5 s to play. In this embodiment, an area of the video picture that remains unchanged for 5 s or more is therefore regarded as a static area, which avoids a selection time so short that a person or object in the motion area that has not yet moved is mistakenly assigned to the static area.
The video processing device provided by the third embodiment of the present invention further includes:
the image cutting unit is used for cutting off the motion area in the image to be coded to obtain the image of the static area of the image to be coded;
the motion region image generating unit is used for generating an image of a motion region of the image to be coded according to the short-term reference image;
and the image splicing unit is used for splicing the image of the static area of the image to be coded and the image of the motion area of the image to be coded together to generate a complete image to be coded.
In the video image processing device provided by the third embodiment of the present invention, the motion area is separated from the static area by cutting it out in advance. A sufficiently large area of the image must therefore be cut out to ensure that the motion area is removed completely; if the cut-out area is too small to contain all of the moving people or objects in the video image, the video image is seriously distorted. For this reason, the video image processing device provided by the third embodiment is not well suited to scenes in which the static areas are too scattered.
The video processing device provided by the third embodiment of the present invention further includes:
a long-term reference image storage unit, configured to store each updated long-term reference image in a reference image buffer, with a generation time tag attached to each long-term reference image;
and a long-term reference image retrieval unit, configured to retrieve, as the reference, the long-term reference image whose generation time tag is closest to the image to be coded.
Although the video image processing device provided by the third embodiment of the present invention designates a static area in the image, the image of the static area may still change over long-term use. For example, if the scene is renovated, the image of the wall may change, and if a desk is moved, the image of the desk may also change. The long-term reference image therefore needs to be updated at a set period to avoid distortion of the video image as far as possible.
Example four
Unlike the video image processing device provided by the third embodiment of the present invention, the video image processing device provided by the fourth embodiment of the present invention further includes:
the grid dividing unit is used for dividing each image of the video image to form a plurality of grid areas;
an image comparison unit, configured to compare the image to be coded with the long-term reference image, and to select each grid area of the image to be coded that is completely consistent with the corresponding area of the long-term reference image as a static area of the image to be coded, and each grid area that is inconsistent with the corresponding area of the long-term reference image as a motion area of the image to be coded;
a static area image generating unit, configured to apply the corresponding grid areas of the long-term reference image in full to generate the image of the static area of the image to be coded;
the motion region image generating unit is used for generating an image of a motion region of the image to be coded by taking a short-term reference image of the image to be coded as a basis;
and the image to be coded generating unit is used for splicing the image of the static area and the image of the motion area together to generate a complete image to be coded.
The advantage of the video image processing device provided by the fourth embodiment of the invention is that, because the video image is divided into a grid, the device adapts well even when the static areas are very scattered.
In the embodiments of the present invention, the relevant functional modules may be implemented by a hardware processor.
The above-described device embodiments are merely illustrative, wherein the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.