CN109255797B - Image processing device and method, and electronic device - Google Patents


Info

Publication number
CN109255797B
Authority
CN
China
Prior art keywords
current frame
frame image
foreground
point
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710574945.8A
Other languages
Chinese (zh)
Other versions
CN109255797A (en)
Inventor
石路
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujitsu Ltd filed Critical Fujitsu Ltd
Priority to CN201710574945.8A: CN109255797B/en
Priority to JP2018064594A: JP7014005B2/en
Publication of CN109255797A/en
Application granted
Publication of CN109255797B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/194 Segmentation; Edge detection involving foreground-background segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence

Abstract

Embodiments of the present invention provide an image processing apparatus and method, and an electronic device. By calculating the difference between the jitter vector of the entire current frame image and each local jitter vector, and performing offset matching on the detected foreground points according to the difference, the inaccuracy of optical-flow matching detection and the influence of local jitter on the foreground detection result can be eliminated, and background points erroneously detected as foreground points can be removed, thereby obtaining an accurate foreground detection result.

Description

Image processing device and method, and electronic device
Technical Field
The present invention relates to the field of information technologies, and in particular, to an image processing apparatus and method, and an electronic device.
Background
In recent years, foreground detection has been widely used in object tracking, object detection, road monitoring, and many other fields. However, due to various unstable factors during shooting, jitter in the captured video may cause many background regions to be erroneously detected as foreground regions when foreground detection is performed with existing algorithms. For example, the visual background extraction (ViBe) algorithm assumes that neighboring pixels share a similar spatial distribution of pixel values and randomly selects the pixel value of a neighborhood pixel as a background-model sample value; this can only absorb jitter of about one pixel. In practice, the distortion and local object motion that accompany jitter cannot be eliminated this way. Such false detections may occur in any region of the video image, and some false foreground regions can strongly affect the detection result.
Existing video stabilization methods, such as image block matching or feature extraction for image stabilization, estimate global motion by computing local motion. These methods can reduce the range of jitter and remove relatively large jitter.
It should be noted that the above background description is only for the sake of clarity and complete description of the technical solutions of the present invention and for the understanding of those skilled in the art. Such solutions are not considered to be known to the person skilled in the art merely because they have been set forth in the background section of the invention.
Disclosure of Invention
The inventors have found that when a video is processed by the above conventional video stabilization method, the distance of the local motion is not equal to the distance of the global motion, and image distortion often occurs. In addition, even with compensation, there is still some unresolved local jitter in the video image frames, e.g., leaf wobble. Even small jitter can cause the edges of the background area to be erroneously detected as foreground areas.
Embodiments of the present invention provide an image processing apparatus and method, and an electronic device. By calculating the difference between the jitter vector of the entire current frame image and each local jitter vector, and performing offset matching on the detected foreground points according to the difference, the inaccuracy of optical-flow matching detection and the influence of local jitter on the foreground detection result can be eliminated, background points erroneously detected as foreground points can be removed, and an accurate foreground detection result can be obtained.
According to a first aspect of embodiments of the present invention, there is provided an image processing apparatus, the apparatus comprising: the first foreground detection unit is used for extracting a foreground mask of a current frame image of an input video according to a background model and determining a first foreground area according to the foreground mask; a shake detection unit for detecting a shake vector of the entire current frame image and shake vectors of respective parts in the current frame image based on optical flow matching; a difference calculation unit for calculating a difference between a shake vector of the entire current frame image and each local shake vector; the jitter compensation unit is used for carrying out global jitter compensation on the current frame image according to the jitter vector of the whole current frame image; the offset matching unit is used for performing offset matching on the pixel points in the first foreground region according to the difference value; and the second foreground detection unit is used for updating the foreground mask according to the offset matching result and determining a second foreground area according to the updated foreground mask.
According to a second aspect of embodiments of the present invention, there is provided an electronic device comprising the apparatus according to the first aspect of embodiments of the present invention.
According to a third aspect of embodiments of the present invention, there is provided an image processing method, the method including: extracting a foreground mask of a current frame image of an input video according to a background model, and determining a first foreground region according to the foreground mask; detecting a shaking vector of the whole current frame image and each local shaking vector in the current frame image based on optical flow matching; calculating the difference value between the jitter vector of the whole current frame image and each local jitter vector; carrying out global jitter compensation on the current frame image according to the jitter vector of the whole current frame image; performing offset matching on the pixel points in the first foreground region according to the difference value; and updating the foreground mask according to the offset matching result, and determining a second foreground region according to the updated foreground mask.
The invention has the beneficial effects that: by calculating the difference value between the jitter vector of the whole current frame image and each local jitter vector and performing offset matching on the detected foreground point according to the difference value, the inaccuracy of optical flow matching detection and the influence of local jitter on a foreground detection result can be eliminated, background points which are falsely detected as foreground points are removed, and an accurate foreground detection result can be obtained.
Specific embodiments of the present invention are disclosed in detail with reference to the following description and drawings, indicating the manner in which the principles of the invention may be employed. It should be understood that the embodiments of the invention are not so limited in scope. The embodiments of the invention include many variations, modifications and equivalents within the spirit and scope of the appended claims.
Features that are described and/or illustrated with respect to one embodiment may be used in the same way or in a similar way in one or more other embodiments, in combination with or instead of the features of the other embodiments.
It should be emphasized that the term "comprises/comprising" when used herein, is taken to specify the presence of stated features, integers, steps or components but does not preclude the presence or addition of one or more other features, integers, steps or components.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention. It is obvious that the drawings in the following description are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort. In the drawings:
fig. 1 is a schematic diagram of an image processing apparatus of embodiment 1 of the present invention;
fig. 2 is a schematic diagram of a current frame image of embodiment 1 of the present invention;
fig. 3 is a schematic diagram of a foreground mask obtained by the first foreground detecting unit of embodiment 1 of the present invention;
fig. 4 is a schematic diagram of a shake detection unit of embodiment 1 of the invention;
fig. 5 is a schematic diagram of the determining unit 402 of embodiment 1 of the present invention;
fig. 6 is a schematic diagram of the first determination unit 502 of embodiment 1 of the present invention;
fig. 7 is a schematic diagram of a foreground mask obtained by the second foreground detecting unit of embodiment 1 of the present invention;
fig. 8 is a schematic diagram of the update unit 107 of embodiment 1 of the present invention;
fig. 9 is a schematic view of an electronic apparatus of embodiment 2 of the present invention;
fig. 10 is a schematic block diagram of a system configuration of an electronic apparatus according to embodiment 2 of the present invention;
fig. 11 is a schematic diagram of an image processing method of embodiment 3 of the present invention.
Detailed Description
The foregoing and other features of the invention will become apparent from the following description taken in conjunction with the accompanying drawings. In the description and drawings, particular embodiments of the invention have been disclosed in detail as being indicative of some of the embodiments in which the principles of the invention may be employed, it being understood that the invention is not limited to the embodiments described, but, on the contrary, is intended to cover all modifications, variations, and equivalents falling within the scope of the appended claims.
Example 1
An image processing apparatus according to an embodiment of the present invention is provided, and fig. 1 is a schematic diagram of an image processing apparatus according to embodiment 1 of the present invention. As shown in fig. 1, the image processing apparatus 100 includes:
a first foreground detection unit 101, configured to extract a foreground mask of a current frame image of an input video according to a background model, and determine a first foreground region according to the foreground mask;
a shake detection unit 102 for detecting shake vectors of the entire current frame image and shake vectors of respective local parts in the current frame image based on optical flow matching;
a difference value calculating unit 103 for calculating a difference value between the shake vector of the entire current frame image and each local shake vector;
a shake compensation unit 104, configured to perform global shake compensation on the current frame image according to a detection result of a shake vector of the entire current frame image;
an offset matching unit 105, configured to perform offset matching on the pixel points in the first foreground region according to the difference;
and a second foreground detecting unit 106, configured to update the foreground mask according to the result of the offset matching, and determine a second foreground region according to the updated foreground mask.
According to the embodiment, the difference value between the jitter vector of the whole current frame image and each local jitter vector is calculated, and the detected foreground point is subjected to offset matching according to the difference value, so that the background point which is mistakenly detected as the foreground point due to jitter can be effectively removed, the influence of the jitter on foreground detection can be eliminated, and an accurate foreground detection result can be obtained.
In the present embodiment, the input video is a video that requires foreground detection, for example, the input video is a video obtained by a monitoring camera.
In the present embodiment, the input video may have a plurality of frame images arranged frame by frame in chronological order.
In this embodiment, the first foreground detecting unit 101 extracts a foreground mask of the current frame image of the input video according to the background model, and determines the first foreground region according to the foreground mask.
In this embodiment, the specific method for extracting the foreground mask according to the background model may follow existing methods. For example, a background image of the current frame image is obtained by matching against the current background model; by comparing the background image with the current frame image, the pixel values of significantly different pixels are set to 1 and those of the remaining pixels to 0. Pixels with value 1 are foreground points, and this processing yields the binarized foreground mask.
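For illustration, a minimal OpenCV C++ sketch of this binarization step is given below. It assumes the background image has already been produced by the current background model; the function name extractForegroundMask and the threshold diffThresh are illustrative assumptions, not part of the embodiment.

    #include <opencv2/opencv.hpp>

    // Sketch: build a binary foreground mask by comparing the current frame
    // against a background image from the background model. The threshold
    // value below is an assumed parameter, not taken from the embodiment.
    cv::Mat extractForegroundMask(const cv::Mat& background,
                                  const cv::Mat& currentFrame,
                                  int diffThresh = 25) {
        cv::Mat diff, mask;
        cv::absdiff(background, currentFrame, diff);   // per-pixel |background - current|
        if (diff.channels() > 1)
            cv::cvtColor(diff, diff, cv::COLOR_BGR2GRAY);
        // Significantly different pixels become 1 (foreground), the rest 0.
        cv::threshold(diff, mask, diffThresh, 1, cv::THRESH_BINARY);
        return mask;                                   // binarized foreground mask (CV_8U, 0/1)
    }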
Fig. 2 is a schematic diagram of a current frame image in embodiment 1 of the present invention, and fig. 3 is a schematic diagram of a foreground mask obtained by the first foreground detecting unit in embodiment 1 of the present invention. After the current frame image shown in fig. 2 is processed by the first foreground detecting unit 101, the foreground mask shown in fig. 3 is obtained, which is a binarized image, and a region (white region in fig. 3) composed of pixels with pixel values of 1 is the first foreground region.
In this embodiment, the type of the background model may be an existing background model type, for example, the background model may be a model created by using an average background method or a gaussian mixture model.
In the present embodiment, the shake detection unit 102 detects a shake vector of the entire current frame image and shake vectors of respective local portions in the current frame image based on optical flow matching. A specific method of detecting a dither vector based on optical flow matching may refer to the related art. The structure of the shake detection unit 102 of the present embodiment and a method of detecting a shake vector are exemplarily described below.
Fig. 4 is a schematic diagram of a shake detection unit according to embodiment 1 of the present invention. As shown in fig. 4, the shake detection unit 102 includes:
an extraction unit 401 for extracting a plurality of feature points from the current frame image;
a determining unit 402, configured to determine a weight of each feature point;
and a clustering unit 403, configured to perform clustering according to the weight of each feature point based on optical flow matching, to obtain a shake vector of the entire current frame image and a shake vector of each local part in the current frame image.
In the present embodiment, the extraction unit 401 can obtain the feature points by an existing method. For example, the image is first divided according to the resolution of the current frame image: for a 1920 × 1080 frame, the image may be divided into 10 × 10 image blocks, and 3 to 5 Harris corner points are searched in each image block as feature points.
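As an illustration of this sampling scheme, the following OpenCV C++ sketch divides the frame into a grid and searches a few Harris corners per block. The grid size and per-block corner count follow the example above, while the function name and the detector parameters (qualityLevel, minDistance) are assumptions.

    #include <opencv2/opencv.hpp>
    #include <vector>

    // Sketch: split the frame into a grid and pick a few Harris corners per
    // block so that feature points are spread evenly over the image.
    std::vector<cv::Point2f> extractFeaturePoints(const cv::Mat& gray,
                                                  int gridRows = 10, int gridCols = 10,
                                                  int cornersPerBlock = 5) {
        std::vector<cv::Point2f> features;
        const int bh = gray.rows / gridRows, bw = gray.cols / gridCols;
        for (int r = 0; r < gridRows; ++r) {
            for (int c = 0; c < gridCols; ++c) {
                cv::Rect roi(c * bw, r * bh, bw, bh);
                std::vector<cv::Point2f> corners;
                // useHarrisDetector=true selects the Harris corner response.
                cv::goodFeaturesToTrack(gray(roi), corners, cornersPerBlock,
                                        /*qualityLevel=*/0.01, /*minDistance=*/10,
                                        cv::noArray(), /*blockSize=*/3,
                                        /*useHarrisDetector=*/true);
                for (const auto& p : corners)            // map back to full-image coordinates
                    features.emplace_back(p.x + roi.x, p.y + roi.y);
            }
        }
        return features;
    }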
The structure of the determination unit 402 and a method of determining the weight are exemplarily described below.
Fig. 5 is a schematic diagram of the determination unit 402 according to embodiment 1 of the present invention. As shown in fig. 5, the determination unit 402 includes:
a counting unit 501, configured to count, over a predetermined number of frame images preceding the current frame image, the ratios at which the pixel point at each feature point's position is detected as a foreground point, a background point, or a matching error;
a first determining unit 502, configured to determine a weight of each feature point in the current frame image according to the ratio.
Therefore, the weight of each feature point is determined by counting foreground detection results before the pixel point of the position of each feature point, and the accuracy of jitter detection can be effectively improved.
In this embodiment, the number of frame images to be counted may be determined according to actual needs. For example, the predetermined number is 50 to 100.
In this embodiment, an initial background model may be created when processing the first frame of the predetermined number of frame images: for example, 18-23 image matrices of the same size are separated from the grayscale image of the first frame, or 54-69 from its three RGB channels, and the initial grayscale image or the separated channel grayscales of the first frame are stored to create an initial single-channel or three-channel background model.
In this embodiment, how to determine the weight value according to the ratio can be set according to actual needs, and an exemplary description is provided below.
Fig. 6 is a schematic diagram of the first determination unit 502 of embodiment 1 of the present invention. As shown in fig. 6, the first determination unit 502 includes:
a second determining unit 601, configured to determine a weight of the feature point as 3 when a pixel point at a position where the feature point is located is detected as a foreground point in a predetermined number of frame images at a ratio of 30% to 70% and is detected as a background point at a ratio of 30% to 70%;
a third determining unit 602, configured to determine, when a pixel point at a position where the feature point is located is detected as a foreground point in a predetermined number of frame images in a proportion of 70% to 100%, and is detected as a background point in a proportion of 0% to 30%, a weight of the feature point is 2;
a fourth determining unit 603, configured to determine a weight of the feature point as 1 when a pixel point at a position where the feature point is located is detected as a foreground point in a predetermined number of frame images at a ratio of 0% to 30% and detected as a background point at a ratio of 70% to 100%;
a fifth determining unit 604, configured to determine the weight of the feature point as 0 when the number of times that the pixel point at the position of the feature point is detected as a matching error in a predetermined number of frame images is greater than 2.
In this embodiment, for example, a matching error means that a certain feature point cannot be matched or a calculated displacement after matching exceeds a preset threshold.
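A compact sketch of the weighting rule above is shown below. It assumes the foreground and background detection ratios are complementary, so only the foreground ratio and the matching-error count are passed in; the function and parameter names are illustrative.

    // Sketch of the weighting rule above. fgRatio is the fraction of the
    // previous N frames in which the pixel at the feature point's position
    // was detected as foreground; mismatchCount counts matching errors.
    int featureWeight(double fgRatio, int mismatchCount) {
        if (mismatchCount > 2) return 0;                 // too many matching errors
        if (fgRatio >= 0.3 && fgRatio <= 0.7) return 3;  // mixed history: 30%-70% foreground
        if (fgRatio > 0.7) return 2;                     // mostly foreground: 70%-100%
        return 1;                                        // mostly background: 0%-30% foreground
    }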
The structure of the determining unit 402 and the method for determining the weight of each feature point are described above. After the weights are determined, the clustering unit 403 performs clustering based on optical flow matching and according to the weight of each feature point, obtaining the jitter vector of the whole current frame image and the jitter vector of each local part in the current frame image. For example, an existing clustering method can be used, with the weight serving as the multiplicity of the feature point in the clustering.
For example, the shake vector can be calculated according to the following equations (1) and (2):
X = (1/N) Σ_{i=1}^{k} W_i (D_i - D_center)^2  (1)

Y = (1/N) Σ_{i=1}^{k} W_i (A_i - A_center)^2  (2)

where X denotes the variance of the jitter-vector distances, Y the variance of the jitter-vector angles, N the number of detected weighted optical-flow matching vectors, D_i the length of the vector of the i-th feature point, D_center the mean length of the global vector, A_i the direction of the vector of the i-th feature point, A_center the mean direction of the global vector, and W_i the weight of the i-th feature point; k is the number of feature points, and k and i are positive integers.
In this embodiment, an existing clustering method, for example, a K-means clustering method, may be used.
For example, the optical-flow matching vectors of three randomly selected feature points serve as the cluster centers; the smallest cluster is screened out, and the lengths and directions of the vectors in the remaining two clusters are averaged to obtain the jitter vector.
In this embodiment, the clustering unit 403 clusters the entire current frame image to obtain the jitter vector of the entire current frame image, and clusters each local portion to obtain the jitter vector of each local portion.
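For illustration, the following OpenCV C++ sketch estimates one jitter vector from weighted optical-flow displacement vectors by k-means with three centers, discarding the smallest cluster and averaging the remaining two as described above. Replicating each sample by its weight is one way to read "the weight as the multiplicity of the feature point in clustering"; this and the function name are assumptions. Applied to the vectors of the whole frame it yields the global jitter vector; applied to the vectors of each part, the local ones.

    #include <opencv2/opencv.hpp>
    #include <algorithm>
    #include <vector>

    // Sketch: weighted k-means (K = 3) over displacement vectors; the
    // smallest cluster is dropped as outliers and the two remaining cluster
    // centers are averaged. Assumes enough input vectors for three clusters.
    cv::Point2f estimateJitterVector(const std::vector<cv::Point2f>& flowVecs,
                                     const std::vector<int>& weights) {
        cv::Mat samples;
        for (size_t i = 0; i < flowVecs.size(); ++i)
            for (int w = 0; w < weights[i]; ++w)          // replicate by weight
                samples.push_back(cv::Mat(flowVecs[i]).reshape(1, 1));

        cv::Mat labels, centers;
        cv::kmeans(samples, /*K=*/3, labels,
                   cv::TermCriteria(cv::TermCriteria::EPS + cv::TermCriteria::COUNT, 10, 1.0),
                   /*attempts=*/3, cv::KMEANS_PP_CENTERS, centers);

        std::vector<int> count(3, 0);                     // cluster sizes
        for (int i = 0; i < labels.rows; ++i) ++count[labels.at<int>(i)];
        int smallest = (int)(std::min_element(count.begin(), count.end()) - count.begin());

        cv::Point2f jitter(0.f, 0.f);                     // average of the two kept centers
        for (int c = 0; c < 3; ++c)
            if (c != smallest)
                jitter += cv::Point2f(centers.at<float>(c, 0), centers.at<float>(c, 1)) * 0.5f;
        return jitter;
    }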
In this embodiment, the parts of the frame image may be divided according to actual needs. For example, the entire current frame image may be divided into upper, middle, and lower three local regions.
In the present embodiment, the difference value calculation unit 103 calculates the difference value of the shake vector of the entire current frame image and the shake vector of each local section.
In the present embodiment, the shake compensation unit 104 performs global shake compensation on the current frame image according to the shake vector of the entire current frame image. The compensation may use an existing method, for example translating the image according to the shake vector. Performing global compensation after uniform feature-point sampling, weight matching, and vector clustering yields a more stable global image-stabilization effect than an ordinary optical-flow matching method, and thus a more stable image.
For example, the translation may be performed according to the following equation (3):
x_i = x + M cos θ
y_i = y + M sin θ  (3)

where x and y denote the coordinates of a pixel point before translation, x_i and y_i its coordinates after translation, M the distance of the shake vector, and θ the angle of the shake vector.
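A minimal sketch of this translation in OpenCV C++ follows. Whether the frame is shifted along or against the jitter vector depends on the vector's sign convention, which is taken as-is here as an assumption.

    #include <opencv2/opencv.hpp>
    #include <cmath>

    // Sketch of the global compensation in equation (3): translate the whole
    // frame by the jitter vector (distance M, angle theta).
    cv::Mat compensateGlobalJitter(const cv::Mat& frame, float M, float theta) {
        float tx = M * std::cos(theta);                  // horizontal shift
        float ty = M * std::sin(theta);                  // vertical shift
        cv::Mat T = (cv::Mat_<double>(2, 3) << 1, 0, tx,
                                               0, 1, ty);
        cv::Mat stabilized;
        cv::warpAffine(frame, stabilized, T, frame.size());
        return stabilized;
    }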
In this embodiment, the offset matching unit 105 performs offset matching on the pixel points in the first foreground region according to the difference calculated by the difference calculation unit 103. For example, after the pixel points in the first foreground region are shifted according to the difference, the current background model is used again for matching.
In the present embodiment, the difference value includes a difference value of distances (i.e., offset distances) and a difference value of angles (i.e., offset angles) of the shake vector of the entire current frame image and the respective local shake vectors.
In this embodiment, the difference used in offset matching is chosen according to the local part containing the pixel to be matched. For example, with the image divided into upper, middle, and lower parts: a pixel of the first foreground region lying in the upper part is offset-matched using the difference between the jitter vector of the whole current frame image and the jitter vector of the upper part; a pixel lying in the middle part uses the difference for the middle part; and a pixel lying in the lower part uses the difference for the lower part.
For example, offset matching may be performed according to the following equation (4):
dist=abs(reference_frame.at<uchar>(i,j)-current_frame.at<uchar>(i+round(D*cosθ),j+round(D*sinθ))); (4)
where dist denotes the difference between the pixel point (i, j) in the current frame and the pixel point to be matched in the background model, reference_frame.at<uchar> denotes the pixel point to be matched in the background model, current_frame.at<uchar> denotes the pixel point being matched in the current frame, D denotes the offset distance, round() denotes the rounding operation, θ denotes the offset angle, and abs() denotes the absolute-value operation.
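For illustration, a sketch applying equation (4) to every first-pass foreground pixel is given below, assuming grayscale images. The threshold matchThresh is an assumed parameter; pixels whose offset difference falls below it are re-classified as background.

    #include <opencv2/opencv.hpp>
    #include <cmath>
    #include <cstdlib>

    // Sketch: re-test each first-pass foreground pixel against the background
    // reference at the position shifted by the local offset (distance D,
    // angle theta), following equation (4).
    void offsetMatch(const cv::Mat& referenceFrame,  // background model image (CV_8U)
                     const cv::Mat& currentFrame,    // current frame (CV_8U)
                     cv::Mat& mask,                  // first-pass mask (0/1), updated in place
                     float D, float theta, int matchThresh = 20) {
        int offI = (int)std::lround(D * std::cos(theta));
        int offJ = (int)std::lround(D * std::sin(theta));
        for (int i = 0; i < mask.rows; ++i) {
            for (int j = 0; j < mask.cols; ++j) {
                if (mask.at<uchar>(i, j) != 1) continue;          // foreground points only
                int ii = i + offI, jj = j + offJ;
                if (ii < 0 || ii >= mask.rows || jj < 0 || jj >= mask.cols) continue;
                int dist = std::abs(referenceFrame.at<uchar>(i, j) -
                                    currentFrame.at<uchar>(ii, jj));
                if (dist < matchThresh)
                    mask.at<uchar>(i, j) = 0;  // matches background: remove foreground flag
            }
        }
    }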
In this embodiment, after the offset matching unit 105 performs offset matching, the second foreground detecting unit 106 updates the foreground mask according to the result of the offset matching, and determines the second foreground region according to the updated foreground mask. For example, the foreground points offset-matched as background points in the first foreground region are removed to obtain an updated foreground mask, and then a region of the updated foreground mask consisting of the remaining foreground points is the second foreground region.
Fig. 7 is a schematic diagram of a foreground mask obtained by the second foreground detecting unit in embodiment 1 of the present invention. As shown in fig. 7, compared to the foreground mask shown in fig. 3, the background points erroneously detected as foreground points have been removed, and the accuracy of the detection result is greatly improved.
In this embodiment, the apparatus 100 may further include:
an updating unit 107 for updating the background model according to the result of the offset matching for processing the next frame image of the current frame.
Fig. 8 is a schematic diagram of the update unit 107 according to embodiment 1 of the present invention, and as shown in fig. 8, the update unit 107 includes:
a first updating unit 801, configured to update, with a first probability, the background model of each background point to a background model of a position where the background point of the current frame image is located or a background model of a position where the background point is located after being offset;
a second updating unit 802, configured to update, with a second probability, a background model of a pixel point included in the first foreground region but not included in the second foreground region, that is, a background model of a background point erroneously detected as a foreground point, to a background model of a position of the current frame image after the pixel point is shifted.
In this way, by updating the background model according to the result of the offset matching, the background model can be further optimized, thereby further improving the accuracy of the foreground detection result for video images with jitter.
For example, the background model may be updated according to equations (5) and (6) below:
B(rng.uniform(0, i))_j = R(x, y)  (5)

R(x, y) = (int)rng.uniform(0, 1) · P(x, y) + (int)(1 - rng.uniform(0, 1)) · P(x_d, y_d)  (6)

where B(rng.uniform(0, i))_j = R(x, y) means that the j-th background model is replaced by R(x, y) with probability 1/i. If a background point falsely detected as a foreground point is selected for updating, the update randomly uses either the pixel value of the pixel point P(x, y) at the same position or that of the pixel point P(x_d, y_d) at the offset position; rng.uniform(0, 1) denotes a random value between 0 and 1 and expresses that the original position or the offset position is adopted at random.
In this embodiment, the first probability and the second probability may be the same or different, and the specific value may be set according to actual needs. For example, the first probability and the second probability are 1/90-1/120.
For example, the background model of each background point is updated with probability 1/90-1/120 to the background model at that background point's position in the current frame image or to the background model at its offset position, the two choices each being taken with probability 1/2. The background model of a background point falsely detected as a foreground point is updated with probability 1/90-1/120 to the background model at the offset position in the current frame image.
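The following sketch illustrates this probabilistic update in OpenCV C++, assuming the background model is held as a set of per-pixel sample images; the container layout, function name, and updateRate parameter are assumptions.

    #include <opencv2/opencv.hpp>
    #include <vector>

    // Sketch of the probabilistic update in equations (5) and (6). "models"
    // holds the per-pixel background samples (one cv::Mat per sample, same
    // size as the frame). With probability 1/updateRate, a randomly chosen
    // sample at (x, y) is replaced by the pixel value taken from the original
    // position or from the offset position (xd, yd) with equal chance;
    // updateRate = 100 lies in the 90-120 range given in the text.
    void updateBackgroundModel(std::vector<cv::Mat>& models, const cv::Mat& frame,
                               int x, int y, int xd, int yd,
                               cv::RNG& rng, int updateRate = 100) {
        if (rng.uniform(0, updateRate) != 0) return;   // update with probability 1/updateRate
        int j = rng.uniform(0, (int)models.size());    // random sample index, as in eq. (5)
        bool useOriginal = rng.uniform(0, 2) == 1;     // original vs. offset position, eq. (6)
        uchar value = useOriginal ? frame.at<uchar>(y, x)
                                  : frame.at<uchar>(yd, xd);
        models[j].at<uchar>(y, x) = value;
    }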
According to the embodiment, the difference value between the jitter vector of the whole current frame image and each local jitter vector is calculated, the detected foreground point is subjected to offset matching according to the difference value, the inaccuracy of optical flow matching detection and the influence of local jitter on the foreground detection result can be eliminated, the background point which is falsely detected as the foreground point is removed, and the accurate foreground detection result can be obtained.
Example 2
An embodiment of the present invention further provides an electronic device, and fig. 9 is a schematic diagram of an electronic device in embodiment 2 of the present invention. As shown in fig. 9, the electronic device 900 includes an image processing apparatus 901, and the structure and function of the image processing apparatus 901 are the same as those described in embodiment 1, and are not described here again.
Fig. 10 is a schematic block diagram of a system configuration of an electronic apparatus according to embodiment 2 of the present invention. As shown in fig. 10, the electronic device 1000 may include a central processing unit 1001 and a memory 1002; the memory 1002 is coupled to the cpu 1001. The figure is exemplary; other types of structures may also be used in addition to or in place of the structure to implement telecommunications or other functions.
As shown in fig. 10, the electronic device 1000 may further include: an input unit 1003, a display 1004, and a power supply 1005.
In one embodiment, the functions of the image processing apparatus described in embodiment 1 may be integrated into the cpu 1001. Among them, the cpu 1001 may be configured to: extracting a foreground mask of a current frame image of an input video according to a background model, and determining a first foreground region according to the foreground mask; detecting a shaking vector of the whole current frame image and each local shaking vector in the current frame image based on optical flow matching; calculating the difference value between the jitter vector of the whole current frame image and each local jitter vector; carrying out global jitter compensation on the current frame image according to the jitter vector of the whole current frame image; performing offset matching on the pixel points in the first foreground region according to the difference value; and updating the foreground mask according to the offset matching result, and determining a second foreground region according to the updated foreground mask.
For example, the detecting a shake vector of the entire current frame image based on optical flow matching includes: extracting a plurality of feature points from the current frame image; determining the weight of each feature point; and clustering based on optical flow matching and according to the weight of each feature point to obtain the jitter vector of the whole current frame image and the jitter vector of each local part in the current frame image.
For example, the determining the weight of each feature point includes: counting the proportion of pixel points at the positions of the characteristic points, which are detected as foreground points or background points or matching errors in a preset number of frame images before the current frame image; and determining the weight of each feature point in the current frame image according to the proportion.
For example, the determining the weight of each feature point in the current frame image according to the ratio includes: when the proportion of the pixel points at the positions of the feature points detected as foreground points in the predetermined number of frame images is 30% -70% and the proportion of the pixel points detected as background points is 30% -70%, determining the weight of the feature points as 3; when the proportion of the pixel points at the positions of the feature points detected as foreground points in the predetermined number of frame images is 70% -100%, and the proportion of the pixel points detected as background points is 0% -30%, determining the weight of the feature points as 2; when the proportion of the pixel points at the positions of the feature points detected as foreground points in the predetermined number of frame images is 0% -30% and the proportion detected as background points is 70% -100%, determining the weight of the feature points as 1; and when the number of times that the pixel points at the positions of the feature points are detected as matching errors in the preset number of frame images is more than 2, determining the weight of the feature points as 0.
For example, the cpu 1001 may be further configured to: and updating the background model according to the offset matching result.
For example, the updating the background model according to the result of the offset matching includes: for the background model of each background point, updating it, with a first probability, to the background model at the position of that background point in the current frame image or at its offset position; and for the background models of pixel points included in the first foreground region but not included in the second foreground region, updating them, with a second probability, to the background model at the offset position of the pixel point in the current frame image.
In another embodiment, the image processing apparatus described in embodiment 1 may be configured separately from the cpu 1001, and for example, the image processing apparatus may be configured as a chip connected to the cpu 1001, and the function of the image processing apparatus is realized by the control of the cpu 1001.
It is not necessary for the electronic device 1000 to include all of the components shown in fig. 10 in this embodiment.
As shown in fig. 10, the central processing unit 1001, sometimes referred to as a controller or operation control, may include a microprocessor or other processor device and/or logic device, and the central processing unit 1001 receives inputs and controls the operation of the various components of the electronic device 1000.
The memory 1002, for example, may be one or more of a buffer, a flash memory, a hard drive, a removable media, a volatile memory, a non-volatile memory, or other suitable device. And the cpu 1001 can execute the program stored in the memory 1002 to realize information storage or processing, or the like. The functions of other parts are similar to the prior art and are not described in detail here. The various components of the electronic device 1000 may be implemented in dedicated hardware, firmware, software, or combinations thereof, without departing from the scope of the invention.
According to the embodiment, the difference value between the jitter vector of the whole current frame image and each local jitter vector is calculated, the detected foreground point is subjected to offset matching according to the difference value, the inaccuracy of optical flow matching detection and the influence of local jitter on the foreground detection result can be eliminated, the background point which is falsely detected as the foreground point is removed, and the accurate foreground detection result can be obtained.
Example 3
The embodiment of the invention also provides an image processing method which corresponds to the image processing device in the embodiment 1. Fig. 11 is a schematic diagram of an image processing method of embodiment 3 of the present invention. As shown in fig. 11, the method includes:
step 1101: extracting a foreground mask of a current frame image of an input video according to a background model, and determining a first foreground region according to the foreground mask;
step 1102: detecting a shaking vector of the whole current frame image and each local shaking vector in the current frame image based on optical flow matching;
step 1103: calculating the difference value between the jitter vector of the whole current frame image and each local jitter vector;
step 1104: carrying out global jitter compensation on the current frame image according to the jitter vector of the whole current frame image;
step 1105: performing offset matching on the pixel points in the first foreground region according to the difference value;
step 1106: updating the foreground mask according to the result of the offset matching, and determining a second foreground region according to the updated foreground mask;
step 1107: and updating the background model according to the result of the offset matching.
In this embodiment, step 1103 and step 1104 may be executed simultaneously or sequentially, and the execution order of step 1103 and step 1104 is not limited in the embodiment of the present invention.
In this embodiment, the specific implementation method of the above steps is the same as that described in embodiment 1, and is not repeated here.
According to the embodiment, the difference value between the jitter vector of the whole current frame image and each local jitter vector is calculated, the detected foreground point is subjected to offset matching according to the difference value, the inaccuracy of optical flow matching detection and the influence of local jitter on the foreground detection result can be eliminated, the background point which is falsely detected as the foreground point is removed, and the accurate foreground detection result can be obtained.
An embodiment of the present invention also provides a computer-readable program, where when the program is executed in an image processing apparatus or an electronic device, the program causes a computer to execute the image processing method described in embodiment 3 in the image processing apparatus or the electronic device.
An embodiment of the present invention further provides a storage medium storing a computer-readable program, where the computer-readable program enables a computer to execute the image processing method according to embodiment 3 in an image processing apparatus or an electronic device.
The image processing method performed in the image processing apparatus or the electronic device described in connection with the embodiments of the present invention may be directly embodied as hardware, a software module executed by a processor, or a combination of the two. For example, one or more of the functional block diagrams and/or one or more combinations of the functional block diagrams illustrated in fig. 1 may correspond to individual software modules of a computer program flow or may correspond to individual hardware modules. These software modules may correspond to the steps shown in fig. 11, respectively. These hardware modules may be implemented, for example, by solidifying these software modules using a Field Programmable Gate Array (FPGA).
A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. A storage medium may be coupled to the processor such that the processor can read information from, and write information to, the storage medium; or the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The software module may be stored in the memory of the mobile terminal or in a memory card that is insertable into the mobile terminal. For example, if the apparatus (e.g., mobile terminal) employs a relatively large capacity MEGA-SIM card or a large capacity flash memory device, the software module may be stored in the MEGA-SIM card or the large capacity flash memory device.
One or more of the functional block diagrams and/or one or more combinations of the functional block diagrams described with respect to fig. 1 may be implemented as a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any suitable combination thereof designed to perform the functions described herein. One or more of the functional block diagrams and/or one or more combinations of the functional block diagrams described with respect to fig. 1 may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in communication with a DSP, or any other such configuration.
While the invention has been described with reference to specific embodiments, it will be apparent to those skilled in the art that these descriptions are illustrative and not intended to limit the scope of the invention. Various modifications and alterations of this invention will become apparent to those skilled in the art based upon the spirit and principles of this invention, and such modifications and alterations are also within the scope of this invention.
With respect to the embodiments including the above embodiments, the following remarks are also disclosed:
supplementary note 1, an image processing apparatus, the apparatus comprising:
the first foreground detection unit is used for extracting a foreground mask of a current frame image of an input video according to a background model and determining a first foreground area according to the foreground mask;
a shake detection unit for detecting a shake vector of the entire current frame image and shake vectors of respective parts in the current frame image based on optical flow matching;
a difference calculation unit for calculating a difference between a shake vector of the entire current frame image and each local shake vector;
the jitter compensation unit is used for carrying out global jitter compensation on the current frame image according to the jitter vector of the whole current frame image;
the offset matching unit is used for performing offset matching on the pixel points in the first foreground region according to the difference value;
and the second foreground detection unit is used for updating the foreground mask according to the offset matching result and determining a second foreground area according to the updated foreground mask.
Supplementary note 2, the apparatus according to supplementary note 1, wherein the shake detection unit includes:
an extracting unit configured to extract a plurality of feature points from the current frame image;
a determining unit, configured to determine a weight of each feature point;
and the clustering unit is used for carrying out clustering according to the weight of each feature point based on optical flow matching to obtain the jitter vector of the whole current frame image and the jitter vector of each local part in the current frame image.
Supplementary note 3, the apparatus according to supplementary note 2, wherein the determining unit includes:
the statistical unit is used for counting the proportion that pixel points at the positions of the characteristic points are detected as foreground points or background points or matching errors in a preset number of frame images before the current frame image;
and the first determining unit is used for determining the weight of each feature point in the current frame image according to the proportion.
Supplementary note 4, the apparatus according to supplementary note 3, wherein the first determination unit includes:
a second determining unit, configured to determine, when a proportion of a pixel point at a position where the feature point is located in the predetermined number of frame images detected as a foreground point is 30% to 70%, and a proportion of the pixel point detected as a background point is 30% to 70%, a weight of the feature point is 3;
a third determining unit, configured to determine, when a proportion of a pixel point at a position where the feature point is located in the predetermined number of frame images detected as a foreground point is 70% to 100%, and a proportion of the pixel point detected as a background point is 0% to 30%, a weight of the feature point to be 2;
a fourth determining unit, configured to determine a weight of the feature point as 1 when a pixel point at a position where the feature point is located is detected as a foreground point in the predetermined number of frame images at a ratio of 0% to 30% and detected as a background point in the predetermined number of frame images at a ratio of 70% to 100%;
and the fifth determining unit is used for determining the weight of the feature point as 0 when the number of times that the pixel point at the position of the feature point is detected as a matching error in the predetermined number of frame images is more than 2.
Supplementary note 5, the apparatus according to supplementary note 1, wherein, the apparatus further includes:
an updating unit for updating the background model according to a result of the offset matching.
Supplementary note 6, the apparatus according to supplementary note 5, wherein the updating unit includes:
a first updating unit, configured to update, with a first probability, the background model of each background point to the background model of the position where the background point of the current frame image is located or the background model of the position where the background point is located after being offset;
and a second updating unit, configured to update, with a second probability, a background model of a pixel point included in the first foreground region but not included in the second foreground region to a background model of a position of the current frame image after the pixel point is shifted.
Supplementary note 7, an electronic device comprising the apparatus according to any one of supplementary notes 1-6.
Supplementary note 8, an image processing method, the method comprising:
extracting a foreground mask of a current frame image of an input video according to a background model, and determining a first foreground region according to the foreground mask;
detecting a shaking vector of the whole current frame image and each local shaking vector in the current frame image based on optical flow matching;
calculating the difference value between the jitter vector of the whole current frame image and each local jitter vector;
carrying out global jitter compensation on the current frame image according to the jitter vector of the whole current frame image;
performing offset matching on the pixel points in the first foreground region according to the difference value;
and updating the foreground mask according to the offset matching result, and determining a second foreground region according to the updated foreground mask.
Supplementary note 9, the method according to supplementary note 8, wherein the detecting a dither vector of the entire current frame image based on optical flow matching includes:
extracting a plurality of feature points from the current frame image;
determining the weight of each feature point;
and clustering based on optical flow matching and according to the weight of each feature point to obtain the jitter vector of the whole current frame image and the jitter vector of each local part in the current frame image.
Supplementary note 10, the method according to supplementary note 9, wherein the determining the weight of each feature point includes:
counting the proportion of pixel points at the positions of the characteristic points, which are detected as foreground points or background points or matching errors in a preset number of frame images before the current frame image;
and determining the weight of each feature point in the current frame image according to the proportion.
Supplementary note 11, the method according to supplementary note 10, wherein the determining the weight of each feature point in the current frame image according to the ratio includes:
when the proportion of the pixel points at the positions of the feature points detected as foreground points in the predetermined number of frame images is 30% -70% and the proportion of the pixel points detected as background points is 30% -70%, determining the weight of the feature points as 3;
when the proportion of the pixel points at the positions of the feature points detected as foreground points in the predetermined number of frame images is 70% -100%, and the proportion of the pixel points detected as background points is 0% -30%, determining the weight of the feature points as 2;
when the proportion of the pixel points at the positions of the feature points detected as foreground points in the predetermined number of frame images is 0% -30% and the proportion detected as background points is 70% -100%, determining the weight of the feature points as 1;
and when the number of times that the pixel points at the positions of the feature points are detected as matching errors in the preset number of frame images is more than 2, determining the weight of the feature points as 0.
Supplementary note 12, the method according to supplementary note 8, wherein the method further comprises:
and updating the background model according to the offset matching result.
Supplementary note 13, the method according to supplementary note 12, wherein said updating the background model according to the result of the offset matching comprises:
for the background model of each background point, updating the background model into a background model of the position of the background point of the current frame image and/or a background model of the position of the background point after the background point is shifted according to a first probability;
and updating, with a second probability, the background models of foreground pixel points that are included in the first foreground region but not included in the second foreground region to the background model at the offset position of the pixel point in the current frame image.

Claims (9)

1. An image processing apparatus, the apparatus comprising:
the first foreground detection unit is used for extracting a foreground mask of a current frame image of an input video according to a background model and determining a first foreground area according to the foreground mask;
a shake detection unit for detecting a shake vector of the entire current frame image and shake vectors of respective parts in the current frame image based on optical flow matching;
a difference calculation unit for calculating a difference between a shake vector of the entire current frame image and each local shake vector;
the jitter compensation unit is used for carrying out global jitter compensation on the current frame image according to the jitter vector of the whole current frame image;
the offset matching unit is used for performing offset matching on the pixel points in the first foreground region according to the difference value;
the second foreground detection unit is used for updating the foreground mask according to the result of the offset matching and determining a second foreground area according to the updated foreground mask;
an updating unit for updating the background model according to a result of the offset matching.
2. The apparatus of claim 1, wherein the jitter detection unit comprises:
an extraction unit configured to extract a plurality of feature points from the current frame image;
a determining unit, configured to determine a weight of each feature point;
and the clustering unit is used for carrying out clustering according to the weight of each feature point based on optical flow matching to obtain the jitter vector of the whole current frame image and the jitter vector of each local part in the current frame image.
3. The apparatus of claim 2, wherein the determining unit comprises:
the statistical unit is used for counting the proportion that pixel points at the positions of the characteristic points are detected as foreground points or background points or matching errors in a preset number of frame images before the current frame image;
and the first determining unit is used for determining the weight of each feature point in the current frame image according to the proportion.
4. The apparatus of claim 3, wherein the first determining unit comprises:
a second determining unit, configured to determine, when a proportion of a pixel point at a position where the feature point is located in the predetermined number of frame images detected as a foreground point is 30% to 70%, and a proportion of the pixel point detected as a background point is 30% to 70%, a weight of the feature point is 3;
a third determining unit, configured to determine, when a proportion of a pixel point at a position where the feature point is located in the predetermined number of frame images detected as a foreground point is 70% to 100%, and a proportion of the pixel point detected as a background point is 0% to 30%, a weight of the feature point to be 2;
a fourth determining unit, configured to determine a weight of the feature point as 1 when a pixel point at a position where the feature point is located is detected as a foreground point in the predetermined number of frame images at a ratio of 0% to 30% and detected as a background point in the predetermined number of frame images at a ratio of 70% to 100%;
and the fifth determining unit is used for determining the weight of the feature point as 0 when the number of times that the pixel point at the position of the feature point is detected as a matching error in the predetermined number of frame images is more than 2.
5. The apparatus of claim 1, wherein the updating unit comprises:
a first updating unit for updating, with a first probability, the background model of each background point using the current frame image at the position of that background point or at its offset position;
and a second updating unit for updating, with a second probability, the background model of each pixel point that is included in the first foreground region but not in the second foreground region, using the current frame image at the offset position of that pixel point.
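A minimal sketch of this probabilistic update, assuming a single-sample grayscale background model (a VIBE-style model would keep several samples per pixel); the probabilities `p1`, `p2` and the `offset_of` helper are illustrative stand-ins:

```python
import random

def update_background_model(bg_model, frame, background_pts, false_fg_pts,
                            offset_of, p1=1/16, p2=1/2):
    """Probabilistic background-model update. background_pts are detected
    background points; false_fg_pts are pixels in the first foreground region
    but not in the second. offset_of maps (y, x) to its offset position."""
    for y, x in background_pts:
        if random.random() < p1:  # first probability
            # sample either the original or the offset position
            sy, sx = random.choice([(y, x), offset_of(y, x)])
            bg_model[y, x] = frame[sy, sx]
    for y, x in false_fg_pts:
        if random.random() < p2:  # second probability
            oy, ox = offset_of(y, x)
            bg_model[y, x] = frame[oy, ox]  # absorb the false foreground point
    return bg_model
```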
6. An electronic device comprising the apparatus of any one of claims 1-5.
7. A method of image processing, the method comprising:
extracting a foreground mask of a current frame image of an input video according to a background model, and determining a first foreground region according to the foreground mask;
detecting, based on optical flow matching, a jitter vector of the entire current frame image and local jitter vectors in the current frame image;
calculating the difference between the jitter vector of the entire current frame image and each local jitter vector;
performing global jitter compensation on the current frame image according to the jitter vector of the entire current frame image;
performing offset matching on the pixel points in the first foreground region according to the difference value;
updating the foreground mask according to the result of the offset matching, and determining a second foreground region according to the updated foreground mask;
and updating the background model according to the result of the offset matching.
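For the global compensation step of this method, a translation-only warp is the simplest concrete realization. The sketch below assumes a pure-translation jitter model, though the claims do not restrict the motion model:

```python
import cv2
import numpy as np

def compensate_global_jitter(frame, global_vec):
    """Shift the current frame back by the whole-image jitter vector so the
    compensated frame aligns with the background model. global_vec is in
    (dx, dy) image coordinates."""
    dx, dy = -np.asarray(global_vec, dtype=float)  # shift opposite to the jitter
    m = np.float32([[1, 0, dx], [0, 1, dy]])       # 2x3 affine translation matrix
    return cv2.warpAffine(frame, m, (frame.shape[1], frame.shape[0]))
```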
8. The method of claim 7, wherein said detecting a jitter vector of the entire current frame image based on optical flow matching comprises:
extracting a plurality of feature points from the current frame image;
determining the weight of each feature point;
and performing clustering based on optical flow matching and according to the weight of each feature point, to obtain the jitter vector of the entire current frame image and the local jitter vectors in the current frame image.
9. The method according to claim 8, wherein said determining the weight of each feature point comprises:
counting the proportion in which the pixel point at the position of each feature point is detected as a foreground point, a background point, or a matching error in a predetermined number of frame images preceding the current frame image;
and determining the weight of each feature point in the current frame image according to the proportions.
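The counting step amounts to keeping a rolling classification history per feature-point position. Below is a small illustrative container (the window size and label encoding are assumptions) that produces the ratios consumed by a weighting rule such as the one in claim 4:

```python
from collections import deque

class PixelHistory:
    """Rolling record of how each feature-point position was classified over
    the last n frames; ratios() yields the inputs for the weighting rules."""
    FG, BG, ERR = 0, 1, 2  # foreground, background, matching error

    def __init__(self, n=50):
        self.frames = deque(maxlen=n)  # each entry maps (y, x) -> label

    def record(self, labels):
        self.frames.append(labels)

    def ratios(self, pos):
        labels = [f[pos] for f in self.frames if pos in f]
        total = max(len(labels), 1)
        return (labels.count(self.FG) / total,  # foreground proportion
                labels.count(self.BG) / total,  # background proportion
                labels.count(self.ERR))         # matching-error count
```

With `fg, bg, err = history.ratios((y, x))`, the tuple feeds straight into `feature_weight(fg, bg, err)` from the earlier sketch.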
CN201710574945.8A 2017-07-14 2017-07-14 Image processing device and method, and electronic device Active CN109255797B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201710574945.8A CN109255797B (en) 2017-07-14 2017-07-14 Image processing device and method, and electronic device
JP2018064594A JP7014005B2 (en) 2017-07-14 2018-03-29 Image processing equipment and methods, electronic devices

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710574945.8A CN109255797B (en) 2017-07-14 2017-07-14 Image processing device and method, and electronic device

Publications (2)

Publication Number Publication Date
CN109255797A CN109255797A (en) 2019-01-22
CN109255797B true CN109255797B (en) 2021-08-06

Family

ID=65051786

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710574945.8A Active CN109255797B (en) 2017-07-14 2017-07-14 Image processing device and method, and electronic device

Country Status (2)

Country Link
JP (1) JP7014005B2 (en)
CN (1) CN109255797B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112652021B (en) * 2020-12-30 2024-04-02 深圳云天励飞技术股份有限公司 Camera offset detection method, device, electronic equipment and storage medium
CN114339395A (en) * 2021-12-14 2022-04-12 浙江大华技术股份有限公司 Video jitter detection method, detection device, electronic equipment and readable storage medium
CN116259012B (en) * 2023-05-16 2023-07-28 新疆克拉玛依市荣昌有限责任公司 Monitoring system and method for embedded supercharged diesel tank

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070047834A1 (en) * 2005-08-31 2007-03-01 International Business Machines Corporation Method and apparatus for visual background subtraction with one or more preprocessing modules
JP4274233B2 (en) * 2006-11-30 2009-06-03 ソニー株式会社 Imaging apparatus, image processing apparatus, image processing method therefor, and program causing computer to execute the method
US7813528B2 (en) * 2007-04-05 2010-10-12 Mitsubishi Electric Research Laboratories, Inc. Method for detecting objects left-behind in a scene
CN101854465B (en) * 2010-02-01 2012-06-27 杭州海康威视软件有限公司 Image processing method and device based on optical flow algorithm
JP5470529B2 (en) * 2011-03-22 2014-04-16 株式会社モルフォ Motion detection device, motion detection method, and motion detection program
JP2015111746A (en) * 2012-04-09 2015-06-18 ソニー株式会社 Image processing apparatus, image processing method, and program
JP2016177388A (en) * 2015-03-18 2016-10-06 株式会社リコー Mobile object position/attitude measuring apparatus
CN105872370B (en) * 2016-03-31 2019-01-15 深圳力维智联技术有限公司 Video stabilization method and device

Also Published As

Publication number Publication date
JP7014005B2 (en) 2022-02-01
CN109255797A (en) 2019-01-22
JP2019021297A (en) 2019-02-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant