CN117376571A - Image processing method, electronic device, and computer storage medium

Info

Publication number: CN117376571A
Application number: CN202210761067.1A
Authority: CN
Inventors: 游晶; 陈杰; 孔德辉; 徐科
Original assignee: Sanechips Technology Co Ltd
Current assignee: Sanechips Technology Co Ltd
Priority date: 2022-06-30
Filing date: 2022-06-30
Publication date: 2024-01-09
Also published as: WO2024001345A1

Abstract

The present disclosure provides an image processing method, the method including: dividing a static area and a suspected motion area from an image frame to be processed; determining motion vector information of each pixel in the suspected motion area, and classifying each pixel as a motion pixel or a static pixel according to its motion vector information; marking the static pixels and all pixels in the static area with a static state, and marking the motion pixels with a motion state and the corresponding motion vector information; and performing video encoding and decoding processing on the marked image frame to be processed. The method shortens the motion estimation time, improves image processing efficiency, saves bandwidth resources, and relieves video transmission pressure. The disclosure also provides an electronic device and a computer storage medium.

Description

Image processing method, electronic device, and computer storage medium

Technical Field

The present invention relates to the field of image processing technologies, and in particular, to an image processing method, an electronic device, and a computer storage medium.

Background

Motion estimation is a technique widely used in video coding and video processing (e.g., deinterlacing). In conventional video encoding and decoding, motion estimation is usually performed on partitioned prediction units (PUs). Because the PUs are usually partitioned directly according to position information, the motion estimated for a PU is often inaccurate. In addition, conventional video encoding and decoding generally uses global motion estimation, which is time-consuming and requires considerable bandwidth; as video quality and resolution continue to improve, the bandwidth requirement grows further.

Disclosure of Invention

The present disclosure addresses the above-described deficiencies of the prior art by providing an image processing method, an electronic device, and a computer storage medium.

In a first aspect, an embodiment of the present disclosure provides an image processing method, including:

dividing a static area and a suspected motion area from an image frame to be processed;

determining motion vector information of each pixel in the suspected motion area, and dividing each pixel into a motion pixel and a static pixel according to the motion vector information of each pixel;

marking the static pixels and all pixels in the static area with a static state, and marking the motion pixels with a motion state and the corresponding motion vector information;

and carrying out video encoding and decoding processing on the marked image frames to be processed.

In some embodiments, the determining the motion vector information for each pixel in the suspected motion region includes:

dividing the suspected motion area into a plurality of macro blocks which are not overlapped with each other;

for each macro block, determining a matching block of the current macro block from a reference frame corresponding to the current macro block;

and determining the motion vector information of all pixels in each macro block according to each macro block and the matching block of each macro block.

In some embodiments, the determining the motion vector information of all pixels in each of the macro blocks according to each of the macro blocks and the matching blocks of each of the macro blocks includes:

and for each macro block, determining a geometric coordinate difference value between the center point of the matching block of the current macro block and the center point of the current macro block, and taking the geometric coordinate difference value as motion vector information of all pixels in the current macro block.

In some embodiments, dividing the pixels into motion pixels and still pixels according to the motion vector information of the pixels comprises:

determining pixels satisfying a preset condition among the pixels as the static pixels, and determining pixels except the static pixels in the pixels as the motion pixels, wherein the preset condition comprises: the motion vector information is zero, and the frame difference between the image frame to be processed where the pixel is located and the reference frame is smaller than a preset threshold value.

In some embodiments, the stationary region comprises a background region and a stationary target region, and the suspected motion region comprises a moving target region; the dividing the static area and the suspected motion area from the image frame to be processed comprises the following steps:

dividing the image frame to be processed into a foreground region and a background region;

identifying objects in each of the foreground regions;

and dividing each foreground region into a static target region and a moving target region according to the targets in each foreground region.

In some embodiments, the dividing each of the foreground regions into a stationary target region and a moving target region according to the targets in each of the foreground regions comprises:

for any one of the targets in any one of the foreground regions, when detecting that motion exists in the current target, determining a preset range region taking the current target as a center in the current foreground region as the motion target region;

and determining the regions except the moving target region in each foreground region as the static target region.

In some embodiments, the number of image frames to be processed is a plurality; the dividing the static area and the suspected motion area from the image frame to be processed comprises the following steps:

determining a frame difference between each of the image frames to be processed and a corresponding reference frame;

and dividing each image frame to be processed into the static area and the suspected motion area according to each frame difference.

In some embodiments, the dividing each of the image frames to be processed into the stationary region and the suspected motion region according to each of the frame differences includes:

and determining the image frames to be processed with the frame difference larger than or equal to a preset dynamic and static judgment threshold value as the suspected motion area, and determining the image frames to be processed with the frame difference smaller than the preset dynamic and static judgment threshold value as the static area.

In the embodiments of the present disclosure, a static area and a suspected motion area are divided from the image frame to be processed, and local motion estimation is performed only on the suspected motion area to determine the motion vector information of each pixel in that area. Each pixel is classified as a motion pixel or a static pixel according to its motion vector information; the static pixels and all pixels in the static area are marked with a static state, and the motion pixels are marked with a motion state and the corresponding motion vector information; video encoding and decoding processing is then performed on the marked image frame. Because global motion estimation of the image frame is not required, the motion estimation time is shortened and image processing efficiency is improved. Moreover, the identified static pixels (including the pixels in the static area) require less bandwidth during video encoding and decoding, so processing static pixels and motion pixels separately during video encoding and decoding also saves bandwidth resources and relieves the pressure of video transmission.

Drawings

Fig. 1 is a flowchart of an image processing method according to an embodiment of the present disclosure;

Fig. 2 is a second flowchart of an image processing method according to an embodiment of the present disclosure;

Fig. 3 is a block matching schematic diagram according to an embodiment of the present disclosure;

Fig. 4 is a third flowchart of an image processing method according to an embodiment of the present disclosure;

Fig. 5 is a flowchart of an image processing method according to an embodiment of the present disclosure;

Fig. 6 is a flowchart of an image processing method according to an embodiment of the present disclosure.

Detailed Description

Example embodiments will be described more fully hereinafter with reference to the accompanying drawings, but may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used herein, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

Embodiments described herein may be described with reference to plan and/or cross-sectional views with the aid of idealized schematic diagrams of the present disclosure. Accordingly, the example illustrations may be modified in accordance with manufacturing techniques and/or tolerances. Thus, the embodiments are not limited to the embodiments shown in the drawings, but include modifications of the configuration formed based on the manufacturing process. Thus, the regions illustrated in the figures have schematic properties and the shapes of the regions illustrated in the figures illustrate the particular shapes of the regions of the elements, but are not intended to be limiting.

Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and the present disclosure, and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.

In conventional video encoding and decoding, motion estimation is generally performed on partitioned PUs and global motion estimation is applied to image frames; as a result, accuracy is low, processing time is long, and the bandwidth requirement is high.

As shown in fig. 1, an embodiment of the present disclosure provides an image processing method, which may include the steps of:

in step S11, a stationary region and a suspected motion region are divided from an image frame to be processed;

in step S12, motion vector information of each pixel in the suspected motion area is determined, and each pixel is divided into a motion pixel and a static pixel according to the motion vector information of each pixel;

in step S13, the static pixels and all pixels in the static region are marked with a static state, and the motion pixels are marked with a motion state and the corresponding motion vector information;

in step S14, video encoding and decoding are performed on the marked image frames to be processed.

The static area refers to an area in which the pixels do not move, and the suspected motion area refers to an area in which the pixels are suspected to move.

Dividing the static region and the suspected motion region from the image frame to be processed is a motion detection process. It may be performed by any conventional image processing operation or deep learning neural network capable of motion detection, for example an image segmentation network, an MSE (Mean Square Error) operation, an MAE (Mean Absolute Error) operation, an SAD (Sum of Absolute Differences) operation, or a frame difference calculation.
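As an illustration only, the three difference measures named above can be sketched for two equally sized pixel blocks. The function names and the representation of a block as a list of rows of intensity values are assumptions made for this sketch and do not come from the disclosure.

```python
def _abs_diffs(block_a, block_b):
    """Yield |a - b| for every pair of same-position pixels in two equal-sized blocks."""
    for row_a, row_b in zip(block_a, block_b):
        for a, b in zip(row_a, row_b):
            yield abs(a - b)


def sad(block_a, block_b):
    """Sum of Absolute Differences."""
    return sum(_abs_diffs(block_a, block_b))


def mae(block_a, block_b):
    """Mean Absolute Error: SAD divided by the number of pixels."""
    diffs = list(_abs_diffs(block_a, block_b))
    return sum(diffs) / len(diffs)


def mse(block_a, block_b):
    """Mean Square Error: mean of the squared per-pixel differences."""
    diffs = list(_abs_diffs(block_a, block_b))
    return sum(d * d for d in diffs) / len(diffs)
```

A small value of any of these measures suggests that the two blocks are similar, which is the property a motion detector would threshold on.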

The process of determining the motion vector information of each pixel in the suspected motion area and dividing each pixel into a motion pixel and a static pixel according to the motion vector information of each pixel is a local motion estimation process, which can be performed through any conventional image processing operation or deep learning neural network capable of performing motion estimation, for example, through an image block matching method, an optical flow network and the like.

As can be seen from steps S11 to S14, in the image processing method provided by the embodiments of the present disclosure, a static region and a suspected motion region are divided from the image frame to be processed, and local motion estimation is performed only on the suspected motion region to determine the motion vector information of each pixel in that region. Each pixel is classified as a motion pixel or a static pixel according to its motion vector information; the static pixels and all pixels in the static region are marked with a static state, and the motion pixels are marked with a motion state and the corresponding motion vector information; video encoding and decoding processing is then performed on the marked image frame. Because global motion estimation of the image frame is not required, the duration of motion estimation is shortened and image processing efficiency is improved. Moreover, the identified static pixels (including the pixels in the static region) require less bandwidth during video encoding and decoding, whereas the motion pixels require more; processing static pixels and motion pixels separately during video encoding and decoding therefore also saves bandwidth resources and relieves the pressure of video transmission.
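The marking of step S13 can be pictured as a per-pixel map. The following is a minimal sketch under stated assumptions: the disclosure does not specify how marks are stored, so the `STATIC`/`MOTION` labels, the `mark_frame` helper, and the dictionary of motion vectors are all hypothetical.

```python
STATIC = "static"
MOTION = "motion"


def mark_frame(width, height, motion_vectors):
    """Build a per-pixel mark map for one image frame.

    motion_vectors maps (x, y) -> (dx, dy) for pixels classified as motion
    pixels. Every other pixel (static pixels of the suspected motion region
    and all pixels of the static region) is marked static with no vector.
    """
    marks = [[(STATIC, None) for _ in range(width)] for _ in range(height)]
    for (x, y), mv in motion_vectors.items():
        marks[y][x] = (MOTION, mv)
    return marks
```

Only the motion pixels carry a motion vector in this layout, which reflects the idea that static pixels need less information to be transmitted.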

When motion estimation is performed only on the suspected motion area, an image block matching method, an optical flow network, or the like may be used; the image block matching method is convenient, fast, and relatively accurate. Accordingly, in some embodiments, as shown in fig. 2, the determining the motion vector information of each pixel in the suspected motion region (i.e. the step S12) may include the following steps:

in step S121, the suspected motion area is divided into a plurality of macro blocks that do not overlap each other;

in step S122, for each macroblock, determining a matching block of the current macroblock from a reference frame corresponding to the current macroblock;

in step S123, motion vector information of all pixels in each of the macro blocks is determined according to each of the macro blocks and the matching block of each of the macro blocks.

A macroblock generally consists of one luminance pixel block and two additional chrominance pixel blocks. The reference frame corresponding to a macroblock is the reference frame of the image frame in which the macroblock is located. In this field, the type and number of reference frames depend on the type of the current frame: for example, when the current frame is a P frame, the reference frame is an I frame or P frame preceding the current frame; when the current frame is a B frame, the reference frames are I frames and/or P frames preceding and/or following the current frame.
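The relationship between frame type and candidate reference frames can be sketched as follows. This is a simplified illustration, assuming frames are given as a list of type letters in display order; the helper name is hypothetical, and a real codec restricts the candidates much further (for example to nearby, already decoded frames).

```python
def candidate_reference_indices(frame_types, index):
    """Return indices of frames that may serve as reference for frame `index`.

    frame_types is a list such as ["I", "B", "B", "P"] in display order.
    P frame: I or P frames before it.
    B frame: I or P frames before and/or after it.
    I frame: intra-coded, so no reference frame is used.
    """
    kind = frame_types[index]
    anchors = [i for i, t in enumerate(frame_types) if t in ("I", "P")]
    if kind == "P":
        return [i for i in anchors if i < index]
    if kind == "B":
        return anchors
    return []
```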

First, the suspected motion area is divided into a plurality of non-overlapping macroblocks, and all pixels in a macroblock are considered to have the same motion vector information. Next, for each macroblock, the block most similar to it is searched for in the reference frame; this block is called the matching block of the macroblock. The SAD algorithm, which is simple and fast, may be used to compute the similarity and determine the most similar block. Finally, for each macroblock, the motion vector information corresponding to the macroblock, that is, the motion vector information of all pixels in the macroblock, can be determined from the macroblock and its matching block.
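A minimal sketch of an exhaustive SAD search for the matching block is given below. It assumes single-channel frames stored as lists of rows, square blocks addressed by their top-left corner, and a symmetric search window of `search` pixels; these choices and all names are illustrative assumptions, since the disclosure does not fix a search strategy.

```python
def block_at(frame, x0, y0, size):
    """Extract the size x size block whose top-left corner is (x0, y0)."""
    return [row[x0:x0 + size] for row in frame[y0:y0 + size]]


def sad(block_a, block_b):
    """Sum of absolute differences between two equal-sized blocks."""
    return sum(abs(p - q)
               for row_a, row_b in zip(block_a, block_b)
               for p, q in zip(row_a, row_b))


def find_matching_block(cur_frame, ref_frame, x0, y0, size, search):
    """Search ref_frame around (x0, y0) for the block most similar to the
    current block; return (x, y, cost) of the best candidate's top-left corner.
    Ties are resolved in favour of the first candidate in scan order."""
    height, width = len(ref_frame), len(ref_frame[0])
    current = block_at(cur_frame, x0, y0, size)
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            cx, cy = x0 + dx, y0 + dy
            # Skip candidates that fall partly outside the reference frame.
            if cx < 0 or cy < 0 or cx + size > width or cy + size > height:
                continue
            cost = sad(current, block_at(ref_frame, cx, cy, size))
            if best is None or cost < best[2]:
                best = (cx, cy, cost)
    return best
```

The cost of this search grows with the number of macroblocks, which is why restricting it to the suspected motion area reduces the work compared with searching the whole frame.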

For example, fig. 3 is a block matching schematic diagram provided by an embodiment of the present disclosure. Taking a certain macroblock of the suspected motion area (referred to as the current block) as an example, the center point of the current block (the point (x, y) shown in the figure) is located in the reference frame, and the matching block most similar to the current block is searched for within a search range (Search Region) around that point. If the center point of the matching block is (x1, y1), the geometric coordinate difference between the center point of the current block and the center point of the matching block may be used as the motion vector (Motion Vector) from the current block to the matching block, and hence as the motion vector of all pixels in the current block.

Accordingly, in some embodiments, the determining the motion vector information of all pixels in each of the macro blocks according to each of the macro blocks and the matching blocks of each of the macro blocks (i.e. step S123) may include the following steps: and for each macro block, determining a geometric coordinate difference value between the center point of the matching block of the current macro block and the center point of the current macro block, and taking the geometric coordinate difference value as motion vector information of all pixels in the current macro block.

For example, if the center point of the matching block of the current macroblock is (x1, y1) and the center point of the current macroblock is (x, y), the geometric coordinate difference between (x1, y1) and (x, y) is calculated as mv = (x1 - x, y1 - y), and mv can be used as the motion vector information of all pixels in the current macroblock.
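The center-point difference can be sketched as below. The sign convention (matching block minus current block) follows the description of the vector as pointing from the current block to the matching block; the helper names and the top-left addressing of blocks are assumptions for illustration.

```python
def block_center(x0, y0, size):
    """Center point of a size x size block whose top-left corner is (x0, y0)."""
    return (x0 + size / 2, y0 + size / 2)


def motion_vector(current_center, matching_center):
    """Geometric coordinate difference from the current block to its matching block."""
    x, y = current_center
    x1, y1 = matching_center
    return (x1 - x, y1 - y)


def assign_to_pixels(x0, y0, size, mv):
    """Give every pixel of the macroblock the same motion vector."""
    return {(x, y): mv
            for y in range(y0, y0 + size)
            for x in range(x0, x0 + size)}
```

For two blocks of equal size, the difference between the center points equals the difference between the top-left corners, so either can be used in practice.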

When the motion vector is non-zero, the pixel is necessarily in motion. When the motion vector is zero, however, it cannot be concluded that the pixel is necessarily stationary; a further judgment is needed in combination with the frame difference between the image frame to be processed in which the pixel is located and its reference frame. Accordingly, in some embodiments, dividing the pixels into motion pixels and static pixels according to the motion vector information of the pixels (i.e., as described in step S12) includes: determining pixels satisfying a preset condition among the pixels as the static pixels, and determining the pixels other than the static pixels as the motion pixels, wherein the preset condition comprises: the motion vector information is zero, and the frame difference between the image frame to be processed in which the pixel is located and the reference frame is smaller than a preset threshold value.

That is, among the pixels in the suspected motion region, a pixel whose motion vector information is zero and whose corresponding frame difference is smaller than the preset threshold is determined to be a static pixel. A pixel whose motion vector information is zero but whose corresponding frame difference is greater than or equal to the preset threshold, and a pixel whose motion vector is non-zero (regardless of the corresponding frame difference), are determined to be motion pixels.

The frame difference between the image frame to be processed in which the pixel is located and its reference frame refers to the average of the differences between the pixels of that image frame and the pixels of the reference frame, i.e., the average pixel difference. When the motion vector information is zero and this frame difference is smaller than the preset threshold value, the pixel can reasonably be considered to have no motion and is a static pixel.
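The preset condition can be sketched as a small predicate. The names below and the representation of per-pixel motion vectors as a dictionary are assumptions for illustration; the frame difference is passed in as a precomputed value for the frame in which the pixels are located.

```python
def is_static_pixel(motion_vector, frame_diff, threshold):
    """Preset condition: zero motion vector AND frame difference below threshold."""
    return motion_vector == (0, 0) and frame_diff < threshold


def classify_pixels(motion_vectors, frame_diff, threshold):
    """Split the pixels of the suspected motion region into static and motion pixels.

    motion_vectors maps (x, y) -> (dx, dy). Returns two lists of positions,
    (static_pixels, motion_pixels), each in sorted order.
    """
    static_pixels, motion_pixels = [], []
    for position, mv in sorted(motion_vectors.items()):
        if is_static_pixel(mv, frame_diff, threshold):
            static_pixels.append(position)
        else:
            motion_pixels.append(position)
    return static_pixels, motion_pixels
```

Note that a zero motion vector alone is not enough: with a frame difference at or above the threshold, every pixel in the example dictionary would be classified as a motion pixel.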

To divide the static region and the suspected motion region from the image frames to be processed, an image segmentation algorithm may be used to divide a static region and a suspected motion region within each image frame; alternatively, motion pre-detection may be performed to classify a plurality of image frames directly, so that each image frame as a whole is determined to be a static region or a suspected motion region.

Accordingly, in some embodiments, the stationary region includes a background region and a stationary target region, and the suspected motion region includes a moving target region; as shown in fig. 4, the step of dividing the still region and the suspected motion region from the image frame to be processed (i.e., step S11) may include the steps of:

in step S111, the image frame to be processed is divided into a foreground region and a background region;

in step S112, a target in each of the foreground regions is identified;

in step S113, each of the foreground regions is divided into a stationary target region and a moving target region according to the target in each of the foreground regions.

Dividing the image frame to be processed into the foreground region and the background region, and identifying the targets in each foreground region, may be performed by any conventional image processing operation or deep learning neural network capable of image segmentation, for example FCN (Fully Convolutional Network), SegNet (a segmentation network), or U-Net (a U-shaped network). In the art, a foreground region generally refers to a region containing local motion, and a target generally refers to a subject in an image, such as a person, an animal, or a plant; this is not described in further detail here.

After each foreground region is divided into a static target region and a moving target region, the static target region and the background region are directly used as the static regions, and pixels in the regions are considered to have no motion, and motion estimation is not needed. The motion target area is used as a suspected motion area, and motion estimation is needed to further judge whether each pixel in the motion target area has motion.

After identifying the objects in each foreground region, it may be further detected whether there is motion of the objects. Accordingly, in some embodiments, as shown in fig. 5, the dividing each foreground region into a stationary target region and a moving target region according to the target in each foreground region (i.e., step S113) may include the following steps:

in step S1131, for any one of the targets in any one of the foreground regions, when motion of the current target is detected, determining a preset range region with the current target as a center in the current foreground region as the motion target region;

in step S1132, the regions other than the moving target region in each of the foreground regions are determined as the stationary target region.

Whether a target moves may be detected by a simple image processing method, for example by comparing the geometric position of the target in the previous frame and the next frame, where the previous frame and the next frame are the frames immediately before and after the image frame to be processed in which the target is located. The preset range area centered on a target must contain at least the whole target. For each target detected to be moving, a preset range area centered on that target is taken as a moving target region; the moving target regions may intersect one another. After all moving target regions have been determined, the areas other than the moving target regions are taken as the static target region.
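One way to picture steps S1131 and S1132 is with pixel sets, as in the sketch below. It assumes each target is described by hypothetical bounding boxes `(x_min, y_min, x_max, y_max)` in the previous, current, and next frames, that a changed box position counts as motion, and that the preset range area is the current box grown by a fixed `margin`; none of these specifics come from the disclosure.

```python
def has_moved(prev_bbox, next_bbox):
    """Treat any change of geometric position between the two frames as motion."""
    return prev_bbox != next_bbox


def preset_range_region(bbox, margin, foreground):
    """Foreground pixels inside the target's box grown by `margin` on every side,
    so the region is centered on the target and contains all of it."""
    x_min, y_min, x_max, y_max = bbox
    return {(x, y) for (x, y) in foreground
            if x_min - margin <= x <= x_max + margin
            and y_min - margin <= y <= y_max + margin}


def split_foreground(foreground, targets, margin):
    """Return (static_target_region, moving_target_region) as sets of pixels.

    targets is a list of (prev_bbox, cur_bbox, next_bbox) tuples. The static
    region is computed only after every moving region has been collected.
    """
    moving = set()
    for prev_bbox, cur_bbox, next_bbox in targets:
        if has_moved(prev_bbox, next_bbox):
            moving |= preset_range_region(cur_bbox, margin, foreground)
    static = foreground - moving
    return static, moving
```

Taking the set difference at the end means overlapping moving target regions are handled naturally, and a pixel can never be in both regions.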

Note that when no motion of a target is detected, the preset range area centered on that target is not immediately taken as a static target region; instead, the static target region is determined only after all moving target regions have been determined. If the targets were examined one by one, with the preset range area around each moving target taken as a moving target region and the preset range area around each non-moving target taken as a static target region, a static target region determined later could well cover a moving target region determined earlier; that is, part of a moving target region would be wrongly identified as a static target region. To reduce this risk of misidentification and improve identification accuracy, the areas other than the moving target regions are taken as the static target region only after all moving target regions have been determined.

In addition to the possibility of using an image segmentation algorithm to segment the stationary and suspected motion regions from each image frame, the multiple image frames may also be classified directly by performing motion pre-detection. Motion pre-detection may employ conventional image processing operations such as calculating frame differences. Accordingly, in some embodiments, the number of image frames to be processed is a plurality; as shown in fig. 6, the step of dividing the still region and the suspected motion region from the image frame to be processed (i.e., step S11) may include the steps of:

in step S111', determining a frame difference between each of the image frames to be processed and a corresponding reference frame;

in step S112', each of the image frames to be processed is divided into the stationary region and the suspected motion region according to each of the frame differences.

The frame difference between the image frame to be processed and the corresponding reference frame refers to the average of the differences between the pixels in the image frame to be processed and the pixels in the reference frame, that is, the average pixel difference. The difference between the current image frame to be processed and the reference frame can be expressed as: frame_diff = |frame(t) - frame(t-1)|, where frame(t) represents the current image frame to be processed, frame(t-1) represents the reference frame of the current image frame to be processed, and frame_diff represents the frame difference.

Under the condition that the frame difference is small enough, the difference between the current image frame to be processed and the reference frame is small, and the current image frame to be processed can be reasonably considered to belong to a static area, namely, no motion exists in pixels in the current image frame to be processed. Accordingly, in some embodiments, the dividing each of the image frames to be processed into the stationary region and the suspected motion region according to each of the frame differences (i.e., step S112') may include the steps of: and determining the image frames to be processed with the frame difference larger than or equal to a preset dynamic and static judgment threshold value as the suspected motion area, and determining the image frames to be processed with the frame difference smaller than the preset dynamic and static judgment threshold value as the static area.

Denoting the preset dynamic and static discrimination threshold as threshold, a frame difference greater than or equal to the threshold may be expressed as: frame_diff >= threshold, and a frame difference smaller than the threshold may be expressed as: frame_diff < threshold.
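The frame-level pre-detection of steps S111' and S112' can be sketched as follows, assuming single-channel frames stored as lists of rows and the frame difference taken as the mean absolute difference of same-position pixels. The function names and the string labels are assumptions for illustration.

```python
def frame_diff(frame, reference):
    """Average absolute pixel difference between a frame and its reference frame."""
    total = 0
    count = 0
    for row, ref_row in zip(frame, reference):
        for pixel, ref_pixel in zip(row, ref_row):
            total += abs(pixel - ref_pixel)
            count += 1
    return total / count


def classify_frames(frames, references, threshold):
    """Label each frame: frame_diff >= threshold means suspected motion,
    frame_diff < threshold means static."""
    return ["suspected_motion" if frame_diff(f, r) >= threshold else "static"
            for f, r in zip(frames, references)]
```

Frames labelled static can skip motion estimation entirely, while frames labelled suspected motion proceed to the per-pixel motion estimation of step S12.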

In addition, the embodiment of the disclosure further provides an electronic device, including:

one or more processors;

a storage device having one or more programs stored thereon;

the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the image processing methods as described above.

Further, the embodiment of the present disclosure also provides a computer storage medium having a computer program stored thereon, wherein the program when executed implements the image processing method as described above.

Those of ordinary skill in the art will appreciate that all or some of the steps of the methods, functional modules/units in the apparatus disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed cooperatively by several physical components. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as known to those skilled in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. 
Furthermore, as is well known to those of ordinary skill in the art, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media.

Example embodiments have been disclosed herein, and although specific terms are employed, they are used and should be interpreted in a generic and descriptive sense only and not for purpose of limitation. In some instances, it will be apparent to one skilled in the art that features, characteristics, and/or elements described in connection with a particular embodiment may be used alone or in combination with other embodiments unless explicitly stated otherwise. It will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the disclosure as set forth in the appended claims.

Claims

1. An image processing method, the method comprising:

2. The method of claim 1, wherein determining motion vector information for each pixel in the suspected motion region comprises:

3. The method of claim 2, wherein determining motion vector information for all pixels in each of the macroblocks based on each of the macroblocks and the matching blocks for each of the macroblocks comprises:

4. A method according to claim 3, wherein dividing the pixels into motion pixels and still pixels based on the motion vector information of the pixels comprises:

5. The method of any one of claims 1-4, wherein the stationary region comprises a background region and a stationary target region, and the suspected motion region comprises a motion target region; the dividing the static area and the suspected motion area from the image frame to be processed comprises the following steps:

identifying objects in each of the foreground regions;

6. The method of claim 5, wherein dividing each of the foreground regions into a stationary target region and a moving target region based on the targets in each of the foreground regions comprises:

7. The method according to any one of claims 1-4, wherein the number of image frames to be processed is a plurality; the dividing the static area and the suspected motion area from the image frame to be processed comprises the following steps:

8. The method of claim 7, wherein the dividing each of the image frames to be processed into the stationary region and the suspected motion region according to each of the frame differences comprises:

9. An electronic device, comprising:

one or more processors;

a storage device having one or more programs stored thereon;

the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the image processing method of any of claims 1-8.

10. A computer storage medium having stored thereon a computer program, wherein the program when executed implements the image processing method according to any of claims 1-8.