WO2024001345A1 - Image processing method, electronic device, and computer storage medium - Google Patents

Image processing method, electronic device, and computer storage medium Download PDF

Info

Publication number
WO2024001345A1
WO2024001345A1 · PCT/CN2023/084039
Authority
WO
WIPO (PCT)
Prior art keywords
area
motion
pixels
processed
pixel
Prior art date
Application number
PCT/CN2023/084039
Other languages
French (fr)
Chinese (zh)
Inventor
游晶
陈杰
孔德辉
徐科
Original Assignee
深圳市中兴微电子技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市中兴微电子技术有限公司
Publication of WO2024001345A1

Links

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103 Selection of coding mode or of prediction mode
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136 Incoming video signal characteristics or properties
    • H04N19/137 Motion inside a coding unit, e.g. average field, frame or block difference
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/142 Detection of scene cut or scene change
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146 Data rate or code amount at the encoder output
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being an image region, e.g. an object
    • H04N19/176 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the region being a block, e.g. a macroblock
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation

Definitions

  • the present disclosure relates to, but is not limited to, the technical field of image processing.
  • Motion estimation is a technology widely used in video encoding/decoding and video processing (such as deinterlacing).
  • In conventional video codecs, motion estimation is usually performed on prediction units (PUs), and PUs are usually partitioned crudely, directly from position information; motion estimation at the PU level therefore inevitably suffers from low accuracy.
  • Moreover, conventional video codecs usually rely on global motion estimation, which is not only time-consuming but also requires considerable bandwidth support; as video quality and resolution keep improving, the bandwidth demand grows even larger.
  • the present disclosure provides an image processing method, an electronic device, and a computer storage medium.
  • the present disclosure provides an image processing method, which includes: dividing an image frame to be processed into a static area and a suspected motion area; determining the motion vector information of each pixel in the suspected motion area, and dividing the pixels into moving pixels and stationary pixels according to their motion vector information; marking the stationary pixels and all pixels in the static area with a static state, and marking the moving pixels with a motion state and the corresponding motion vector information; and performing video encoding/decoding processing on the marked image frame to be processed.
  • the present disclosure provides an electronic device, including: one or more processors; and a storage device on which one or more programs are stored; when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement any image processing method described herein.
  • the present disclosure provides a computer storage medium on which a computer program is stored, wherein when the computer program is executed by a processor, it causes the processor to implement any of the image processing methods described herein.
  • Figure 1 is a schematic flowchart of the image processing method provided by the present disclosure;
  • Figure 2 is a schematic flowchart of the image processing method provided by the present disclosure;
  • Figure 3 is a schematic diagram of block matching provided by the present disclosure;
  • Figure 4 is a schematic flowchart of the image processing method provided by the present disclosure;
  • Figure 5 is a schematic flowchart of the image processing method provided by the present disclosure;
  • Figure 6 is a schematic flowchart of the image processing method provided by the present disclosure.
  • the embodiments of the present disclosure observe that some local-motion scenes (such as live-broadcast scenes) share a common feature: most of the frame is actually static, and only a small part is in motion.
  • Preliminary detection can therefore be performed first to identify the static area and the suspected motion area; further motion detection and local motion estimation are then performed only in the suspected motion area to determine the motion vector information of its pixels, the suspected motion area is further divided into stationary pixels and moving pixels, the motion states of both kinds of pixels are marked, and video encoding/decoding is performed directly based on those motion-state marks.
  • the present disclosure provides an image processing method, which may include the following steps S11 to S14.
  • step S11 a static area and a suspected motion area are divided from the image frame to be processed.
  • step S12 the motion vector information of each pixel in the suspected motion area is determined, and each pixel is divided into a moving pixel and a stationary pixel according to the motion vector information of each pixel.
  • step S13 the stationary pixels and all pixels in the static area are marked with a static state, and the moving pixels are marked with a motion state and the corresponding motion vector information.
  • step S14 video encoding and decoding processing is performed on the marked image frame to be processed.
  • both the static area and the suspected motion area include multiple pixels.
  • the static area refers to the area where the pixels do not move
  • the suspected motion area refers to the area where the pixels are suspected of moving.
  • Dividing the image frame to be processed into a static area and a suspected motion area is a motion detection process, which can be carried out by any traditional image processing operation or any deep learning neural network capable of motion detection, for example an image segmentation network, an MSE (Mean Square Error) operation, an MAE (Mean Absolute Error) operation, an SAD (Sum of Absolute Differences) operation, or a frame difference calculation.
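As a concrete illustration of the frame-difference option above, the following sketch splits a frame into static and suspected-motion blocks by the per-block mean absolute difference (MAE) against a reference frame. The function name, block size, and threshold are illustrative assumptions, not values fixed by the disclosure.

```python
import numpy as np

def split_static_and_suspected(frame, ref_frame, block=16, threshold=8.0):
    """Divide a grayscale frame into static / suspected-motion areas by
    per-block mean absolute frame difference (MAE) against a reference frame.
    Returns a boolean mask per block: True = suspected motion, False = static.
    Block size and threshold are illustrative assumptions."""
    h, w = frame.shape
    bh, bw = h // block, w // block
    mask = np.zeros((bh, bw), dtype=bool)
    for by in range(bh):
        for bx in range(bw):
            cur = frame[by*block:(by+1)*block, bx*block:(bx+1)*block].astype(np.int32)
            ref = ref_frame[by*block:(by+1)*block, bx*block:(bx+1)*block].astype(np.int32)
            # a block whose MAE reaches the threshold is "suspected" of motion
            mask[by, bx] = np.abs(cur - ref).mean() >= threshold
    return mask
```

MSE or SAD could be substituted for MAE on the same loop structure; only the per-block statistic changes.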
  • Determining the motion vector information of each pixel in the suspected motion area, and dividing the pixels into moving pixels and stationary pixels based on that information, is a local motion estimation process, which can be performed by any traditional image processing operation or deep learning neural network capable of motion estimation, such as an image block matching method, an optical flow method, or an optical flow network.
  • The image processing method provided by the embodiments of the present disclosure divides the image frame to be processed into a static area and a suspected motion area and performs local motion estimation only on the suspected motion area, so as to determine the motion vector information of each pixel there and divide those pixels into moving pixels and stationary pixels.
  • The stationary pixels and all pixels in the static area are marked with a static state, the moving pixels are marked with a motion state and the corresponding motion vector information, and video encoding/decoding is then performed on the marked image frame to be processed, without performing global motion estimation on the whole frame. This shortens the duration of motion estimation and improves the efficiency of image processing. Furthermore, the identified stationary pixels (including the pixels in the static area) require less bandwidth during video encoding/decoding, while the moving pixels require more; processing the two kinds of pixels in this targeted way also saves bandwidth resources and relieves the pressure of video transmission.
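The marking of step S13 could be represented, for example, as a simple per-pixel map. The sketch below is an assumed representation for illustration only (the disclosure does not fix a data structure): stationary pixels get a static mark with a zero motion vector, moving pixels get a motion mark plus their motion vector.

```python
import numpy as np

def mark_pixels(static_mask, motion_vectors):
    """Build per-pixel marks as in step S13 (representation is an assumption).
    `static_mask` flags the stationary pixels plus all pixels of the static
    area; `motion_vectors` maps each moving pixel (y, x) to its motion vector."""
    marks = {}
    # static pixels: static state, zero motion vector
    for y, x in zip(*np.nonzero(static_mask)):
        marks[(int(y), int(x))] = ("static", (0, 0))
    # moving pixels: motion state plus the corresponding motion vector
    for (y, x), mv in motion_vectors.items():
        marks[(y, x)] = ("motion", mv)
    return marks
```

The codec stage would then read these marks instead of re-estimating motion globally.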
  • determining the motion vector information of each pixel in the suspected motion area may include the following steps S121 to S123.
  • step S121 the suspected motion area is divided into a plurality of non-overlapping macroblocks.
  • step S122 for each macroblock, the matching block of the current macroblock is determined from the reference frame corresponding to the current macroblock.
  • step S123 the motion vector information of all pixels in each macro block is determined based on each macro block and the matching block of each macro block.
  • a macroblock usually consists of one luminance pixel block and two additional chrominance pixel blocks.
  • the reference frame corresponding to the macroblock refers to the reference frame of the image frame where the macroblock is located.
  • The type and number of reference frames are related to the type of the current frame. For example, when the current frame is a P frame, the reference frame is an I frame or P frame preceding it; when the current frame is a B frame, the reference frames are the preceding and/or following I and/or P frames. This will not be described further in the embodiments of this disclosure.
  • the suspected motion area is divided into multiple non-overlapping macro blocks, and it is considered that all pixels in each macro block have the same motion vector information.
  • the block most similar to the macroblock is searched for in the reference frame; this block is called the matching block of the macroblock.
  • the SAD algorithm can be used to calculate the similarity and determine the most similar block.
  • the algorithm is simple and fast.
  • the motion vector information corresponding to the macroblock can be determined based on the macroblock and the matching blocks of the macroblock, that is, the motion vector information of all pixels in the macroblock.
  • Figure 3, a block matching schematic provided by the present disclosure, takes a certain macroblock in the suspected motion area (called the current block) as an example.
  • The center point of the current block is the point (x, y) shown in the figure; within the search range (Search Region), the matching block most similar to the current block is searched for, and the center point of that matching block is (x1, y1).
  • The geometric coordinate difference between the center point of the current block and the center point of the matching block can be used as the motion vector (Motion Vector) from the current block to the matching block, and can also be used as the motion vector of all pixels in the current block.
  • Determining the motion vector information of all pixels in each macroblock based on the macroblock and its matching block may include the following step: for each macroblock, the geometric coordinate difference between the center point of its matching block and its own center point is determined as the motion vector information of all pixels in the current macroblock.
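A minimal sketch of this block matching step, using SAD as the similarity measure mentioned earlier. The function name, the exhaustive full-search strategy, and the search-window size are assumptions; the disclosure does not fix a search strategy.

```python
import numpy as np

def match_block_sad(ref, block, top, left, search=4):
    """Full search over a +/-`search` window in the reference frame for the
    candidate with minimum SAD. Returns (dy, dx): the coordinate difference
    between the matching block's position and the current block's position,
    which equals the difference between their center points."""
    bh, bw = block.shape
    best_sad, best_mv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            # skip candidates falling outside the reference frame
            if y < 0 or x < 0 or y + bh > ref.shape[0] or x + bw > ref.shape[1]:
                continue
            cand = ref[y:y+bh, x:x+bw].astype(np.int32)
            sad = np.abs(cand - block.astype(np.int32)).sum()
            if best_sad is None or sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv
```

In practice encoders use faster search patterns (three-step, diamond) over the same SAD criterion; the full search here just keeps the sketch short and unambiguous.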
  • When the motion vector is not zero, the pixel can be said to have certainly moved. When the motion vector is zero, however, this alone is not sufficient to show that the pixel is not moving; the frame difference between the image frame to be processed in which the pixel is located and its reference frame must also be considered for further judgment.
  • Dividing the pixels into moving pixels and stationary pixels according to their motion vector information includes: determining the pixels that satisfy a preset condition as stationary pixels, and determining the remaining pixels as moving pixels, wherein the preset condition is that the motion vector information is zero and the frame difference between the image frame to be processed in which the pixel is located and its reference frame is less than a preset threshold.
  • In other words, the pixels in the suspected motion area whose motion vector information is zero and whose corresponding frame difference is less than the preset threshold are determined to be stationary pixels; the pixels whose motion vector information is zero but whose frame difference is greater than or equal to the threshold, and the pixels whose motion vector is not zero (regardless of the frame difference), are determined to be moving pixels.
  • the frame difference between the image frame to be processed where the pixel is located and its reference frame refers to the average difference between each pixel in the image frame to be processed where the pixel is located and each pixel in the reference frame, that is, the average pixel difference value.
  • When the motion vector information is zero and the frame difference between the image frame to be processed in which the pixel is located and its reference frame is less than the preset threshold, it can reasonably be considered that the pixel has not moved and is a stationary pixel.
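The preset condition above reduces to a one-line predicate. In the sketch below, the function name and the threshold value are illustrative assumptions; the logic is the condition stated in the text: static only when the motion vector is zero and the frame difference is below the threshold.

```python
def classify_pixel(motion_vector, frame_diff, threshold=2.0):
    """Return 'static' only if the motion vector is (0, 0) AND the frame
    difference between the frame to be processed and its reference frame is
    below the threshold; otherwise 'moving'. Threshold value is assumed."""
    dy, dx = motion_vector
    if dy == 0 and dx == 0 and frame_diff < threshold:
        return "static"
    return "moving"
```

Note that a nonzero motion vector makes the pixel moving regardless of the frame difference, matching the text.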
  • The static area and the suspected motion area can be divided in two ways: an image segmentation algorithm can be used to segment the static area and the suspected motion area from each image frame, or multiple image frames can be classified directly through motion pre-detection, with each image frame determined as a whole to be a still area or a suspected motion area.
  • the static area includes a background area and a static target area
  • the suspected motion area includes a moving target area; as shown in Figure 4, dividing the static area and the suspected motion area from the image frame to be processed (i.e., step S11) may include the following steps S111 to S113.
  • step S111 the image frame to be processed is divided into a foreground area and a background area.
  • step S112 targets in each foreground area are identified.
  • step S113 each foreground area is divided into a stationary target area and a moving target area according to the targets in each foreground area.
  • Segmenting the image frame to be processed into a foreground area and a background area and identifying targets in each foreground area can be performed by any traditional image processing operation or deep learning neural network capable of image segmentation, for example an FCN (Fully Convolutional Network), SegNet, or U-Net.
  • the foreground area usually refers to the area containing local motion
  • the target usually refers to the subject in the image, such as a person, animal, or plant; this will not be elaborated further in the embodiments of the present disclosure.
  • each foreground area is divided into a stationary target area and a moving target area
  • the stationary target area and the background area are directly used as static areas; the pixels in these areas are considered to have no motion, so no motion estimation is required for them.
  • the moving target area is a suspected moving area and requires motion estimation to further determine whether there is motion in each pixel in the moving target area.
  • Dividing each foreground area into a stationary target area and a moving target area according to the targets in each foreground area (i.e., step S113) may include the following steps S1131 and S1132.
  • step S1131 for any target in any foreground area, when the current target is detected to be in motion, a preset-range area within the current foreground area centered on the current target is determined as a moving target area.
  • step S1132 all areas in each foreground area except the moving target area are determined as the stationary target areas.
  • Detecting whether a target is moving can be carried out with some simple image processing methods, for example by comparing the geometric position change of the target between the previous and next frames, i.e., the frames immediately preceding and following the image frame to be processed in which the target is located.
  • The preset-range area centered on a target must contain at least the entire target. For each target detected to be moving, such a preset-range area becomes a moving target area; moving target areas may intersect. Only after all moving target areas have been determined are the remaining areas treated as stationary target areas.
  • The reason the stationary target areas are determined only after all moving target areas, rather than marking a preset-range area around each non-moving target as stationary while the targets are processed in sequence, is that a stationary target area determined later could otherwise cover a moving target area determined earlier, causing the moving target area to be misrecognized as stationary. Determining all moving target areas first and only then treating everything else as stationary avoids this misrecognition, reduces the risk of identification errors, and improves identification accuracy.
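The two-pass order described above can be sketched as follows. All names and the `pad` half-size of the preset range are assumptions; the point of the sketch is the ordering: moving-target regions are collected first, and only afterwards is everything outside them treated as stationary, so a stationary label can never overwrite a moving-target region.

```python
def label_foreground_areas(targets, moved_flags, pad=8):
    """Two-pass labelling: `targets` are (cy, cx) centers, `moved_flags` says
    which targets were detected as moving. Returns the moving-target regions
    (as top/left/bottom/right boxes) and a predicate for the second pass."""
    moving_regions = []
    # pass 1: gather ALL moving-target regions before labelling anything else
    for (cy, cx), moved in zip(targets, moved_flags):
        if moved:
            moving_regions.append((cy - pad, cx - pad, cy + pad, cx + pad))

    # pass 2: any point not inside some moving-target region is stationary
    def is_moving(y, x):
        return any(t <= y <= b and l <= x <= r
                   for (t, l, b, r) in moving_regions)

    return moving_regions, is_moving
```

A non-moving target whose preset range overlaps a moving target's range is thus still labelled moving where the ranges intersect, which is exactly the misrecognition the two-pass order prevents.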
  • multiple image frames can also be directly classified by performing motion pre-detection.
  • Motion pre-detection can use traditional image processing operations such as calculating frame differences.
  • the number of image frames to be processed is more than one; as shown in Figure 6, dividing the still areas and suspected motion areas from the image frames to be processed (i.e., step S11) may include the following steps S111' and S112'.
  • step S111' the frame difference between each image frame to be processed and the corresponding reference frame is determined.
  • step S112' each of the image frames to be processed is divided into the still area and the suspected motion area according to each of the frame differences.
  • the frame difference between the image frame to be processed and the corresponding reference frame refers to the average value of the difference between each pixel in the image frame to be processed and each pixel in the reference frame, that is, the average pixel difference value.
  • Dividing each image frame to be processed into the still area or the suspected motion area according to the frame differences may include: determining the image frames whose frame difference is greater than or equal to a preset motion/stillness discrimination threshold as suspected motion areas, and determining the image frames whose frame difference is less than that threshold as still areas.
  • A frame difference smaller than the preset motion/stillness discrimination threshold can be expressed as: frame_diff < threshold.
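This per-frame classification follows directly from the `frame_diff < threshold` criterion. The function name and threshold value below are illustrative assumptions; the frame difference is computed as the text defines it, the mean absolute pixel difference against the reference frame.

```python
import numpy as np

def classify_frames(frames, refs, threshold=4.0):
    """Label each whole frame 'still' or 'suspected_motion' by comparing its
    mean absolute pixel difference against the reference frame with a
    motion/stillness discrimination threshold. Threshold value is assumed."""
    labels = []
    for cur, ref in zip(frames, refs):
        # frame difference = average per-pixel absolute difference
        frame_diff = np.abs(cur.astype(np.int32) - ref.astype(np.int32)).mean()
        labels.append("still" if frame_diff < threshold else "suspected_motion")
    return labels
```

Frames labelled "suspected_motion" would then go on to the per-pixel motion estimation of step S12, while "still" frames skip it entirely.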
  • the present disclosure also provides an electronic device, including: one or more processors; a storage device on which one or more programs are stored; when the one or more programs are processed by the one or more processors, When executed, the one or more processors are caused to implement the image processing method as described above.
  • the present disclosure also provides a computer storage medium on which a computer program is stored, wherein when the program is executed by a processor, it causes the processor to implement the image processing method as described above.
  • Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media).
  • computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data.
  • Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disk (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer.
  • Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
  • Example embodiments have been disclosed herein, and although specific terms are employed, they are used and should be interpreted in a generic and descriptive sense only and not for purposes of limitation. In some instances, as will be apparent to those skilled in the art, features, characteristics and/or elements described in connection with a particular embodiment may be used alone, or in combination with features, characteristics and/or elements described in connection with other embodiments, unless expressly stated otherwise. Accordingly, it will be understood by those skilled in the art that various changes in form and details may be made without departing from the scope of the present disclosure as set forth in the appended claims.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present application provides an image processing method. The method comprises: dividing a static area and a suspected motion area from an image frame to be processed; determining motion vector information of pixels in the suspected motion area, and dividing the pixels into motion pixels and static pixels according to the motion vector information of the pixels; marking the static states of the static pixels and all pixels in the static area, and marking the motion states and corresponding motion vector information of the motion pixels; and performing video encoding and decoding processing on the marked image frame to be processed. The present application also provides an electronic device and a computer storage medium.

Description

图像处理方法、电子设备及计算机存储介质Image processing methods, electronic equipment and computer storage media
相关申请的交叉引用Cross-references to related applications
本申请要求2022年6月30日提交给中国专利局的第202210761067.1号专利申请的优先权,其全部内容通过引用合并于此。This application claims priority from Patent Application No. 202210761067.1 filed with the China Patent Office on June 30, 2022, the entire content of which is incorporated herein by reference.
技术领域Technical field
本公开涉及但不限于图像处理技术领域。The present disclosure relates to, but is not limited to, the technical field of image processing.
背景技术Background technique
运动估计(Motion Estimation)是视频编解码和视频处理(例如去交织)中广泛使用的一种技术。在传统的视频编解码技术中,运动估计通常是基于划分预测单元(PU)进行的,而划分PU通常又是直接根据位置信息进行粗暴的分割,因此在进行运动估计时,不可避免地会出现PU的运动估计准确性较低的问题。并且,传统的视频编解码技术通常采用的是全局运动估计,全局运动估计不仅耗时长,而且还需要较大的带宽支持,再加之视频质量、视频分辨率的不断提升,对带宽的要求更大。Motion estimation (Motion Estimation) is a technology widely used in video encoding and decoding and video processing (such as deinterleaving). In traditional video coding and decoding technology, motion estimation is usually based on dividing prediction units (PUs), and dividing PUs is usually crudely segmented directly based on position information. Therefore, when performing motion estimation, inevitable problems will occur. The problem of low motion estimation accuracy of PU. Moreover, traditional video encoding and decoding technology usually uses global motion estimation. Global motion estimation is not only time-consuming, but also requires larger bandwidth support. In addition, with the continuous improvement of video quality and video resolution, the requirements for bandwidth are even greater. .
发明内容Contents of the invention
本公开提供一种图像处理方法、一种电子设备及一种计算机存储介质。The present disclosure provides an image processing method, an electronic device, and a computer storage medium.
第一方面,本公开提供一种图像处理方法,所述方法包括:从待处理图像帧中划分出静止区域和疑似运动区域;确定出所述疑似运动区域中各像素的运动矢量信息,并根据所述各像素的运动矢量信息将所述各像素划分为运动像素和静止像素;对所述静止像素以及所述静止区域中的所有像素进行静止状态的标记,对所述运动像素进行运动状态以及相应的所述运动矢量信息的标记;对标记后的所述待处理图像帧进行视频编解码处理。 In a first aspect, the present disclosure provides an image processing method, which method includes: dividing a static area and a suspected moving area from an image frame to be processed; determining the motion vector information of each pixel in the suspected moving area, and based on The motion vector information of each pixel divides each pixel into a moving pixel and a static pixel; the static pixel and all pixels in the static area are marked as static, and the moving pixel is marked as moving. Mark the corresponding motion vector information; perform video encoding and decoding processing on the marked image frame to be processed.
第二方面,本公开提供一种电子设备,包括:一个或多个处理器;存储装置,其上存储有一个或多个程序;当所述一个或多个程序被所述一个或多个处理器执行时,使得所述一个或多个处理器实现本文所述的任一图像处理方法。In a second aspect, the present disclosure provides an electronic device, including: one or more processors; a storage device having one or more programs stored thereon; when the one or more programs are processed by the one or more When the processor is executed, the one or more processors are caused to implement any image processing method described herein.
第三方面,本公开提供一种计算机存储介质,其上存储有计算机程序,其中,所述计算机程序被处理器执行时,使得所述处理器实现本文所述的任一图像处理方法。In a third aspect, the present disclosure provides a computer storage medium on which a computer program is stored, wherein when the computer program is executed by a processor, it causes the processor to implement any of the image processing methods described herein.
附图说明Description of drawings
图1是本公开提供的图像处理方法的流程示意图;Figure 1 is a schematic flowchart of the image processing method provided by the present disclosure;
图2是本公开提供的图像处理方法的流程示意图;Figure 2 is a schematic flowchart of the image processing method provided by the present disclosure;
图3是本公开提供的块匹配示意图;Figure 3 is a schematic diagram of block matching provided by the present disclosure;
图4是本公开提供的图像处理方法的流程示意图;Figure 4 is a schematic flowchart of the image processing method provided by the present disclosure;
图5是本公开提供的图像处理方法的流程示意图;Figure 5 is a schematic flowchart of the image processing method provided by the present disclosure;
图6是本公开提供的图像处理方法的流程示意图。Figure 6 is a schematic flowchart of the image processing method provided by the present disclosure.
具体实施方式Detailed ways
Example embodiments will be described more fully below with reference to the accompanying drawings; they may, however, be embodied in different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the disclosure. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the terms "comprising" and/or "made of", when used in this specification, specify the presence of the stated features, integers, steps, operations, elements and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof.
The embodiments described herein may be described with reference to plan views and/or cross-sectional views by way of idealized schematic illustrations of the present disclosure. Accordingly, the example illustrations may be modified according to manufacturing techniques and/or tolerances. Therefore, the embodiments are not limited to those shown in the drawings, but include modifications of configurations formed on the basis of manufacturing processes. Accordingly, the regions illustrated in the figures are schematic in nature, and the shapes of the regions shown in the figures illustrate the specific shapes of regions of elements, but are not intended to be limiting.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art. It will also be understood that terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the relevant art and the present disclosure, and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
In conventional video coding and decoding techniques, motion estimation is usually performed on the basis of partitioned prediction units (PUs), and global motion estimation is performed on the whole image frame, which has low accuracy, is time-consuming, and places high demands on bandwidth. In view of this, the embodiments of the present disclosure observe that some local-motion scenes (for example, live-streaming scenes) share a common characteristic: most of the area is actually static, and only a small part is in motion. Therefore, the static area and the suspected motion area can first be roughly detected; further motion detection is then performed only within the suspected motion area, applying local motion estimation to it so as to determine the motion vector information of the pixels in the suspected motion area and to further divide the suspected motion area into stationary pixels and moving pixels. The stationary pixels and the moving pixels are then given corresponding motion-state marks, and video coding and decoding can be performed directly according to the motion-state marks of the pixels.
As shown in Figure 1, the present disclosure provides an image processing method, which may include the following steps S11 to S14.
In step S11, a static area and a suspected motion area are divided from an image frame to be processed.
In step S12, motion vector information of each pixel in the suspected motion area is determined, and each pixel is divided into moving pixels and stationary pixels according to the motion vector information of each pixel.
In step S13, the stationary pixels and all pixels in the static area are marked as being in a static state, and the moving pixels are marked as being in a motion state together with the corresponding motion vector information.
In step S14, video coding and decoding processing is performed on the marked image frame to be processed.
Both the static area and the suspected motion area include a plurality of pixels. The static area refers to an area in which the pixels do not move, and the suspected motion area refers to an area in which the pixels are suspected of having moved.
Dividing the image frame to be processed into a static area and a suspected motion area is a motion detection process, which can be carried out by any conventional image processing operation or deep-learning neural network capable of motion detection, for example an image segmentation network, an MSE (Mean Square Error) operation, an MAE (Mean Absolute Error) operation, an SAD (Sum of Absolute Differences) operation, frame-difference calculation, and so on.
Determining the motion vector information of each pixel in the suspected motion area, and dividing the pixels into moving pixels and stationary pixels according to that motion vector information, is a local motion estimation process, which can be carried out by any conventional image processing operation or deep-learning neural network capable of motion estimation, for example a block matching method, an optical flow method, an optical flow network, and so on.
As can be seen from steps S11-S14 above, the image processing method provided by the embodiments of the present disclosure divides the image frame to be processed into a static area and a suspected motion area and performs local motion estimation only on the suspected motion area, so as to determine the motion vector information of each pixel in the suspected motion area and divide the pixels into moving pixels and stationary pixels according to that information; the stationary pixels and all pixels in the static area are marked as static, the moving pixels are marked as moving together with the corresponding motion vector information, and video coding and decoding is performed on the marked image frame to be processed. Since no global motion estimation of the whole frame is needed, the duration of motion estimation is shortened and the efficiency of image processing is improved. Moreover, the identified stationary pixels (including the pixels in the static area) have a small bandwidth requirement during video coding and decoding, whereas the moving pixels have a larger one; processing stationary and moving pixels separately and in a targeted manner can therefore also save bandwidth resources and relieve the pressure of video transmission.
Motion estimation restricted to the suspected motion area may use a block matching method, an optical flow method, an optical flow network, and so on; among these, the block matching method is convenient, fast, and highly accurate. Correspondingly, in some embodiments, as shown in Figure 2, determining the motion vector information of each pixel in the suspected motion area (i.e., step S12) may include the following steps S121 to S123.
In step S121, the suspected motion area is divided into a plurality of non-overlapping macroblocks.
In step S122, for each macroblock, a matching block of the current macroblock is determined from the reference frame corresponding to the current macroblock.
In step S123, the motion vector information of all pixels in each macroblock is determined according to each macroblock and its matching block.
A macroblock usually consists of one luminance pixel block and two additional chrominance pixel blocks. The reference frame corresponding to a macroblock is the reference frame of the image frame in which the macroblock is located. In this field, the type and number of reference frames depend on the type of the current frame: for example, when the current frame is a P frame, the reference frame is an I frame or P frame preceding it; when the current frame is a B frame, the reference frames are I frames and/or P frames preceding and/or following it. This is not described further in the embodiments of the present disclosure.
First, the suspected motion area is divided into a plurality of non-overlapping macroblocks, and all pixels within each macroblock are considered to share the same motion vector information. Then, for each macroblock, the block most similar to it is searched for in the reference frame; this block is called the matching block of the macroblock. The similarity can be computed, and the most similar block determined, with the SAD algorithm, which is simple and fast. Finally, for each macroblock, the motion vector information corresponding to the macroblock, i.e., the motion vector information of all pixels within it, can be determined from the macroblock and its matching block.
For example, Figure 3 is a schematic diagram of block matching provided by the present disclosure. Taking a certain macroblock of the suspected motion area (called the current block) as an example, the search in the reference frame is centered on the center point of the current block (i.e., the point (x, y) shown in the figure). Within a search region around this center point, the matching block most similar to the current block is searched for; its center point is (x1, y1). The geometric coordinate difference between the center point of the current block and the center point of the matching block can be taken as the motion vector from the current block to the matching block, and also as the motion vector of all pixels in the current block.
Correspondingly, in some embodiments, determining the motion vector information of all pixels in each macroblock according to each macroblock and its matching block (i.e., step S123) may include the following step: for each macroblock, determining the geometric coordinate difference between the center point of the matching block of the current macroblock and the center point of the current macroblock as the motion vector information of all pixels in the current macroblock.
For example, if the center point of the matching block of the current macroblock is (x1, y1) and the center point of the current macroblock is (x, y), then the geometric coordinate difference mv between (x1, y1) and (x, y) can be used as the motion vector information of all pixels in the current macroblock.
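The block matching and motion-vector computation described above can be sketched as follows. This is a minimal illustrative sketch, not the patent's implementation: it assumes grayscale frames stored as 2D lists, uses top-left block coordinates rather than center points (the resulting offset is the same geometric coordinate difference), and performs an exhaustive SAD full search; the function names `get_block`, `sad`, and `block_match` are chosen for illustration.

```python
def get_block(frame, top, left, size):
    # extract a size x size sub-block with top-left corner (top, left)
    return [row[left:left + size] for row in frame[top:top + size]]

def sad(block_a, block_b):
    # Sum of Absolute Differences between two equally sized blocks
    return sum(abs(p - q)
               for row_a, row_b in zip(block_a, block_b)
               for p, q in zip(row_a, row_b))

def block_match(cur, ref, top, left, size, radius):
    # Exhaustive full search: try every offset (dy, dx) within +/- radius
    # and keep the reference block with the lowest SAD cost. Returns the
    # motion vector as the coordinate difference (dy, dx) from the
    # current block to its matching block.
    src = get_block(cur, top, left, size)
    h, w = len(ref), len(ref[0])
    best_cost, best_mv = None, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            t, l = top + dy, left + dx
            if 0 <= t and t + size <= h and 0 <= l and l + size <= w:
                cost = sad(src, get_block(ref, t, l, size))
                if best_cost is None or cost < best_cost:
                    best_cost, best_mv = cost, (dy, dx)
    return best_mv
```

In a real codec the search radius and block size would follow the encoder configuration, and faster search patterns (three-step, diamond) would typically replace the full search; the full search is used here only because it is the simplest correct baseline.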
When the motion vector is non-zero, the pixel must have moved; however, a zero motion vector is not by itself sufficient to show that the pixel has not moved, and the frame difference between the image frame to be processed in which the pixel is located and its reference frame must also be taken into account. Correspondingly, in some embodiments, dividing the pixels into moving pixels and stationary pixels according to their motion vector information (i.e., as described in step S12) includes: determining the pixels that satisfy a preset condition as the stationary pixels, and determining all other pixels as the moving pixels, where the preset condition includes: the motion vector information is zero, and the frame difference between the image frame to be processed in which the pixel is located and its reference frame is less than a preset threshold.
In other words, among the pixels of the suspected motion area, those whose motion vector information is zero and whose corresponding frame difference is less than the preset threshold are determined to be stationary pixels, while those whose motion vector information is zero but whose corresponding frame difference is greater than or equal to the preset threshold, as well as those whose motion vector is non-zero (regardless of the corresponding frame difference), are determined to be moving pixels.
The frame difference between the image frame to be processed in which a pixel is located and its reference frame refers to the average of the differences between the pixels of the image frame to be processed and the corresponding pixels of the reference frame, i.e., the average pixel difference. When the motion vector information is zero and this frame difference is less than the preset threshold, it is reasonable to consider that the pixel has not moved and is a stationary pixel.
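The stationary/moving classification rule above can be sketched as follows, assuming the per-pixel motion vectors and the per-frame average frame difference have already been computed; the function name `classify_pixels` and the data layout (a dict mapping pixel coordinates to motion vectors) are illustrative assumptions, not part of the disclosure.

```python
def classify_pixels(motion_vectors, frame_diff, threshold):
    # motion_vectors: {(row, col): (dy, dx)} for every pixel of the
    # suspected motion area; frame_diff: average absolute pixel difference
    # between the frame to be processed and its reference frame.
    stationary, moving = [], []
    for pixel, mv in motion_vectors.items():
        # stationary only if the vector is zero AND the frame difference
        # is below the preset threshold; everything else is moving
        if mv == (0, 0) and frame_diff < threshold:
            stationary.append(pixel)
        else:
            moving.append(pixel)
    return stationary, moving
```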
The static area and the suspected motion area may be divided from the image frame to be processed either by using an image segmentation algorithm to segment the static area and the suspected motion area out of each image frame, or by directly classifying multiple image frames through motion pre-detection, determining each image frame as a whole to be a static area or a suspected motion area.
Correspondingly, in some embodiments, the static area includes a background area and a stationary target area, and the suspected motion area includes a moving target area. As shown in Figure 4, dividing the static area and the suspected motion area from the image frame to be processed (i.e., step S11) may include the following steps S111 to S113.
In step S111, the image frame to be processed is segmented into a foreground area and a background area.
In step S112, targets in each foreground area are identified.
In step S113, each foreground area is divided into a stationary target area and a moving target area according to the targets in that foreground area.
Segmenting the image frame to be processed into a foreground area and a background area, and identifying the targets in each foreground area, can be carried out by any conventional image processing operation or deep-learning neural network capable of image segmentation, for example an FCN (Fully Connected Network), SegNet (a segmentation network), U-Net (a U-shaped network), and so on. In this field, the foreground area usually refers to the area containing local motion, and a target usually refers to a subject in the image such as a person, animal, or plant; this is not described further in the embodiments of the present disclosure.
After each foreground area is divided into a stationary target area and a moving target area, both the stationary target area and the background area are directly treated as static areas: the pixels within them are considered not to move, and no motion estimation is needed. The moving target area is treated as a suspected motion area and requires motion estimation to further determine whether each pixel in it is actually moving.
After the targets in each foreground area are identified, whether each target is moving can be further detected. Correspondingly, in some embodiments, as shown in Figure 5, dividing each foreground area into a stationary target area and a moving target area according to its targets (i.e., step S113) may include the following steps S1131 and S1132.
In step S1131, for any target in any foreground area, when it is detected that the current target is moving, a preset-range region of the current foreground area centered on the current target is determined as a moving target area.
In step S1132, all regions of each foreground area other than the moving target areas are determined as the stationary target areas.
Whether a target is moving can be detected by some simple image processing methods, for example by comparing the change in the target's geometric position between the preceding and following frames, i.e., the frames immediately before and after the image frame to be processed in which the target is located. The preset-range region centered on the target must at least fully contain the target. For each target detected as moving, the preset-range region centered on it is taken as a moving target area; the moving target areas may overlap one another. Only after all moving target areas have been determined are the remaining regions taken as stationary target areas.
The reason why the regions other than the moving target areas are designated as stationary target areas only after all moving target areas have been determined, rather than taking the preset-range region centered on a target as a stationary target area as soon as that target is detected as not moving, is the following: if targets were processed one by one, with the preset-range region around each moving target marked as a moving target area and the preset-range region around each non-moving target marked as a stationary target area, a stationary target area determined later could very well cover a moving target area determined earlier, causing the moving target area to be misidentified as stationary. Therefore, to avoid such misidentification, reduce the risk of misrecognition, and improve recognition accuracy, the regions other than the moving target areas are designated as stationary target areas only after all moving target areas have been determined.
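The two-pass order described above (mark all moving target areas first, and only then treat everything else as stationary) can be sketched as follows. The target representation (center coordinates plus a motion flag) and the square preset-range region of half-width `pad` are simplifying assumptions for illustration only.

```python
def mark_moving_regions(height, width, targets, pad):
    # targets: list of (center_row, center_col, is_moving) tuples.
    # First pass: mark the preset-range region around every moving
    # target. No stationary box is ever written explicitly, so a
    # stationary target processed later can never overwrite a
    # previously marked moving region.
    moving = [[False] * width for _ in range(height)]
    for cy, cx, is_moving in targets:
        if is_moving:
            for r in range(max(0, cy - pad), min(height, cy + pad + 1)):
                for c in range(max(0, cx - pad), min(width, cx + pad + 1)):
                    moving[r][c] = True
    # Second pass is implicit: every pixel left False is stationary.
    return moving
```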
Besides using an image segmentation algorithm to segment the static area and the suspected motion area out of each image frame, multiple image frames can also be classified directly through motion pre-detection. Motion pre-detection can use conventional image processing operations such as frame-difference calculation. Correspondingly, in some embodiments, there are multiple image frames to be processed; as shown in Figure 6, dividing the static area and the suspected motion area from the image frames to be processed (i.e., step S11) may include the following steps S111' and S112'.
In step S111', the frame difference between each image frame to be processed and its corresponding reference frame is determined.
In step S112', each image frame to be processed is classified as the static area or the suspected motion area according to its frame difference.
The frame difference between an image frame to be processed and its corresponding reference frame refers to the average of the differences between the pixels of the image frame to be processed and the corresponding pixels of the reference frame, i.e., the average pixel difference. The difference between the current image frame to be processed and the reference frame can be expressed as frame_diff = |frame(t) - frame(t-1)|, where frame(t) denotes the current image frame to be processed, frame(t-1) denotes its reference frame, and frame_diff denotes the frame difference.
When the frame difference is sufficiently small, the difference between the current image frame to be processed and the reference frame is small, and it is reasonable to consider that the current image frame to be processed is a static area, i.e., that its pixels do not move. Correspondingly, in some embodiments, classifying each image frame to be processed as the static area or the suspected motion area according to its frame difference (i.e., step S112') may include the following step: determining the image frames to be processed whose frame difference is greater than or equal to a preset motion/stillness discrimination threshold as the suspected motion area, and determining the image frames to be processed whose frame difference is less than that threshold as the static area.
Denoting the preset motion/stillness discrimination threshold as threshold, a frame difference greater than or equal to the threshold can be expressed as frame_diff >= threshold, and a frame difference less than the threshold can be expressed as frame_diff < threshold.
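The frame-level motion pre-detection above can be sketched as follows, assuming grayscale frames stored as 2D lists; `frame_diff` implements the average absolute pixel difference frame_diff = |frame(t) - frame(t-1)|, and `classify_frame` applies the frame_diff >= threshold comparison. The function names and the string labels are illustrative assumptions.

```python
def frame_diff(cur, ref):
    # average absolute pixel difference between the frame to be
    # processed and its reference frame: mean(|frame(t) - frame(t-1)|)
    total = sum(abs(a - b)
                for row_c, row_r in zip(cur, ref)
                for a, b in zip(row_c, row_r))
    return total / (len(cur) * len(cur[0]))

def classify_frame(cur, ref, threshold):
    # frame_diff >= threshold -> the whole frame is a suspected motion
    # area; frame_diff < threshold -> the whole frame is a static area
    if frame_diff(cur, ref) >= threshold:
        return "suspected_motion"
    return "static"
```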
In addition, the present disclosure further provides an electronic device, including: one or more processors; and a storage device having one or more programs stored thereon, where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the image processing method described above.
In addition, the present disclosure further provides a computer storage medium having a computer program stored thereon, where the program, when executed by a processor, causes the processor to implement the image processing method described above.
Those of ordinary skill in the art will understand that all or some of the steps in the methods disclosed above, and the functional modules/units in the devices, may be implemented as software, firmware, hardware, and appropriate combinations thereof. In a hardware implementation, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the physical components may be implemented as software executed by a processor such as a central processing unit, a digital signal processor, or a microprocessor, or as hardware, or as an integrated circuit such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information (such as computer-readable instructions, data structures, program modules, or other data).
Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer. In addition, it is well known to those of ordinary skill in the art that communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
Example embodiments have been disclosed herein, and although specific terms are employed, they are used and are to be interpreted in a generic and descriptive sense only and not for purposes of limitation. In some instances, as would be apparent to one skilled in the art, features, characteristics, and/or elements described in connection with a particular embodiment may be used singly or in combination with features, characteristics, and/or elements described in connection with other embodiments, unless otherwise expressly indicated. Accordingly, it will be understood by those skilled in the art that various changes in form and details may be made without departing from the scope of the present disclosure as set forth in the appended claims.

Claims (10)

  1. An image processing method, comprising:
    dividing a static area and a suspected motion area from an image frame to be processed;
    determining motion vector information of each pixel in the suspected motion area, and dividing each pixel into moving pixels and stationary pixels according to the motion vector information of each pixel;
    marking the stationary pixels and all pixels in the static area as being in a static state, and marking the moving pixels as being in a motion state together with the corresponding motion vector information; and
    performing video coding and decoding processing on the marked image frame to be processed.
  2. The method according to claim 1, wherein determining the motion vector information of each pixel in the suspected motion area comprises:
    dividing the suspected motion area into a plurality of non-overlapping macroblocks;
    for each macroblock, determining a matching block of the current macroblock from a reference frame corresponding to the current macroblock; and
    determining the motion vector information of all pixels in each macroblock according to each macroblock and the matching block of each macroblock.
  3. The method according to claim 2, wherein determining the motion vector information of all pixels in each macroblock according to each macroblock and the matching block of each macroblock comprises:
    for each macroblock, determining a geometric coordinate difference between a center point of the matching block of the current macroblock and a center point of the current macroblock as the motion vector information of all pixels in the current macroblock.
  4. The method according to claim 3, wherein classifying each of the pixels as a moving pixel or a stationary pixel according to its motion vector information comprises:
    determining the pixels that satisfy a preset condition as the stationary pixels, and determining all other pixels as the moving pixels, wherein the preset condition comprises: the motion vector information is zero, and the frame difference between the image frame to be processed in which the pixel is located and its reference frame is less than a preset threshold.
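The preset condition of claim 4 is a conjunction: a zero motion vector alone is not enough (it could come from a failed match), so the frame difference must also be small. A sketch of that two-part test, with the threshold value `8` as an arbitrary illustrative assumption:

```python
def is_stationary(mv, frame_diff, thresh=8):
    """Claim 4's preset condition, sketched: a pixel counts as stationary
    only if its motion vector is zero AND the frame difference between the
    frame to be processed and its reference frame is below the threshold.
    Every pixel failing either part is classified as moving."""
    return mv == (0, 0) and frame_diff < thresh
```

For example, a pixel with a zero vector but a large frame difference is still classified as moving, which guards against false matches in flat or noisy regions.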
  5. The method according to any one of claims 1-4, wherein the static area comprises a background area and a stationary target area, and the suspected motion area comprises a moving target area; and dividing the image frame to be processed into the static area and the suspected motion area comprises:
    segmenting the image frame to be processed into a foreground area and a background area;
    identifying the targets in each foreground area; and
    dividing each foreground area into a stationary target area and a moving target area according to the targets in that foreground area.
  6. The method according to claim 5, wherein dividing each foreground area into a stationary target area and a moving target area according to the targets in that foreground area comprises:
    for any target in any foreground area, when motion of the current target is detected, determining a preset-range area centered on the current target in the current foreground area as the moving target area; and
    determining the areas in each foreground area other than the moving target area as the stationary target area.
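The "preset range area centered on the current target" of claim 6 amounts to taking a fixed-radius box around each detected moving target and clipping it to the frame. A minimal sketch under that reading; the square window shape and the name `target_window` are assumptions, since the claim does not fix the geometry of the preset range:

```python
def target_window(center, radius, height, width):
    """Return the (top, left, bottom, right) bounds of a box of the given
    radius centered on a detected moving target, clipped to the frame,
    standing in for claim 6's preset-range moving-target area."""
    cy, cx = center
    top = max(0, cy - radius)
    left = max(0, cx - radius)
    bottom = min(height, cy + radius + 1)
    right = min(width, cx + radius + 1)
    return top, left, bottom, right

# A target near the frame corner yields a clipped window.
corner = target_window((2, 2), radius=4, height=10, width=10)
interior = target_window((5, 5), radius=2, height=10, width=10)
```

Everything in the foreground outside these windows would then be labeled as the stationary target area.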
  7. The method according to any one of claims 1-4, wherein there are a plurality of image frames to be processed, and dividing the image frames to be processed into the static area and the suspected motion area comprises:
    determining the frame difference between each image frame to be processed and its corresponding reference frame; and
    dividing each image frame to be processed into the static area and the suspected motion area according to the respective frame difference.
  8. The method according to claim 7, wherein dividing each image frame to be processed into the static area and the suspected motion area according to the respective frame difference comprises:
    determining image frames to be processed whose frame difference is greater than or equal to a preset motion/stillness discrimination threshold as the suspected motion area, and determining image frames to be processed whose frame difference is less than the preset motion/stillness discrimination threshold as the static area.
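Claims 7 and 8 reduce to a simple threshold test on the frame difference. A sketch using mean absolute pixel difference as the frame-difference measure; the claims do not specify the metric, so both the metric and the threshold value here are illustrative assumptions:

```python
import numpy as np

def classify_frames(frames, refs, thresh=5.0):
    """Sketch of claims 7-8: compute a frame difference (here, mean absolute
    pixel difference) between each frame to be processed and its reference,
    then classify frames at or above the motion/stillness discrimination
    threshold as suspected motion, and the rest as static."""
    out = []
    for f, r in zip(frames, refs):
        diff = np.abs(f.astype(np.int16) - r.astype(np.int16)).mean()
        out.append("suspected_motion" if diff >= thresh else "static")
    return out

# One unchanged frame and one uniformly brightened frame.
still = np.zeros((4, 4), dtype=np.uint8)
moved = np.full((4, 4), 20, dtype=np.uint8)
result = classify_frames([still, moved], [still.copy(), still.copy()], thresh=5.0)
```

Frames labeled static can then skip the per-pixel motion estimation of claim 2 entirely.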
  9. An electronic device, comprising:
    one or more processors; and
    a storage device having one or more programs stored thereon,
    wherein, when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the image processing method according to any one of claims 1-8.
  10. A computer storage medium having a computer program stored thereon, wherein, when the computer program is executed by a processor, the processor is caused to implement the image processing method according to any one of claims 1-8.
PCT/CN2023/084039 2022-06-30 2023-03-27 Image processing method, electronic device, and computer storage medium WO2024001345A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210761067.1 2022-06-30
CN202210761067.1A CN117376571A (en) 2022-06-30 2022-06-30 Image processing method, electronic device, and computer storage medium

Publications (1)

Publication Number Publication Date
WO2024001345A1 true WO2024001345A1 (en) 2024-01-04

Family

ID=89382689

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/084039 WO2024001345A1 (en) 2022-06-30 2023-03-27 Image processing method, electronic device, and computer storage medium

Country Status (2)

Country Link
CN (1) CN117376571A (en)
WO (1) WO2024001345A1 (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005295215A (en) * 2004-03-31 2005-10-20 Victor Co Of Japan Ltd Moving image encoding device
CN101699856A (en) * 2009-10-30 2010-04-28 北京中科大洋科技发展股份有限公司 De-interlacing method with self-adapting motion
CN102693537A (en) * 2011-01-17 2012-09-26 三星泰科威株式会社 Image surveillance system and method of detecting whether object is left behind or taken away
US20130336387A1 (en) * 2011-03-09 2013-12-19 Nippon Telegraph And Telephone Corporation Video encoding device, video encoding method and video encoding program
CN106878674A (en) * 2017-01-10 2017-06-20 哈尔滨工业大学深圳研究生院 A kind of parking detection method and device based on monitor video
CN106993187A (en) * 2017-04-07 2017-07-28 珠海全志科技股份有限公司 A kind of coding method of variable frame rate and device
US20200128252A1 (en) * 2017-04-07 2020-04-23 Allwinner Technology Co., Ltd. Variable Frame Rate Encoding Method and Device, Computer Device and Computer Readable Storage Medium

Also Published As

Publication number Publication date
CN117376571A (en) 2024-01-09


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23829536

Country of ref document: EP

Kind code of ref document: A1