CN109949335B - Image processing method and device - Google Patents


Publication number
CN109949335B
Authority
CN
China
Prior art keywords: foreground, area, learning rate, parameter learning, target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711385876.2A
Other languages
Chinese (zh)
Other versions
CN109949335A (en)
Inventor
刘雷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Institute of Computing Technology of CAS
Original Assignee
Huawei Technologies Co Ltd
Institute of Computing Technology of CAS
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd, Institute of Computing Technology of CAS filed Critical Huawei Technologies Co Ltd
Priority to CN201711385876.2A priority Critical patent/CN109949335B/en
Publication of CN109949335A publication Critical patent/CN109949335A/en
Application granted granted Critical
Publication of CN109949335B publication Critical patent/CN109949335B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Abstract

The embodiment of the invention discloses an image processing method and device. The image processing method comprises the following steps: acquiring a first image frame of a surveillance video; computing the image data of the first image frame to obtain a target contained in the first image frame, where the target comprises a foreground and/or a background; looking up a first parameter learning rate corresponding to the target, where the first parameter learning rate represents the rate at which a newly added model associated with the target is updated into the background model, and is determined according to attribute information of the target; and updating the background model of the pixel points in the region corresponding to the target in the surveillance video using the first parameter learning rate. By adopting the embodiment of the invention, the accuracy of the established background model can be improved.

Description

Image processing method and device
Technical Field
The embodiment of the invention relates to the technical field of multimedia, in particular to an image processing method and device.
Background
With the increasing demands of smart cities and public safety, the number of surveillance cameras in cities keeps growing, and manual video monitoring alone can no longer meet these demands; intelligent video surveillance technology has therefore received wide attention and research and become a current research hotspot. In a surveillance video, people usually care about the foreground of the image. In practice, a background model is therefore established to eliminate the uninteresting, complex background from the surveillance video and extract the foreground, so that subsequent processing tasks can concentrate on the foreground. This reduces the computation of subsequent tasks, saves system resources, and at the same time improves task accuracy. Establishing an accurate background model is thus particularly important for extracting the foreground accurately and effectively.
The classical process of building a background model in a surveillance scene is as follows: perform statistical estimation on each pixel point of the surveillance video along the time axis, learn a background model for each pixel point, then use the background model to identify the foreground of the surveillance video, and pass the foreground to subsequent tasks for processing. This way of building the background model is not target-specific: the foreground is easily learned as the background, or the background is learned as the foreground, so the accuracy is not high.
Disclosure of Invention
The embodiment of the invention provides an image processing method and device, which can improve the accuracy of an established background model.
In a first aspect, an embodiment of the present invention provides an image processing method. The method includes acquiring a first image frame of a surveillance video, where the first image frame may be any image frame in the surveillance video. A visual task module may be used to compute the image data of the first image frame to obtain the target contained in the first image frame, where the target may be a foreground and/or a background in the first image frame, and the first image frame may include at least one foreground; for example, the first image frame may contain pedestrians and vehicles as foregrounds.
The first parameter learning rate corresponding to the target is then looked up. Different targets may correspond to different first parameter learning rates; for example, the first parameter learning rate corresponding to a vehicle differs from that corresponding to a pedestrian. The first parameter learning rate represents the rate at which a newly added model appearing on a pixel point in the region corresponding to the target is updated to become the background model of that pixel point. The first parameter learning rate may be determined based on attribute information of the target, such as its movement speed and contour size.
The background model of the pixel points in the region corresponding to the target in the surveillance video is then updated using the looked-up first parameter learning rate. By updating with a target-specific first parameter learning rate, the situation in which the foreground of the surveillance video is learned as background, or the background is learned as foreground, can be avoided, improving the accuracy of the established background model.
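As an illustration of this step, here is a minimal sketch assuming a simple running-average background model; the model form, the category names, and all numeric learning rates are assumptions, not values from the patent:

```python
# Illustrative per-category first parameter learning rates (assumed values;
# the patent stores these in a target database but gives no numbers).
LEARNING_RATES = {"background": 0.20, "vehicle": 0.05, "pedestrian": 0.01}

def update_pixel(bg_value, new_value, category):
    """Running-average sketch: blend a pixel's current background model value
    with the new frame's value at the learning rate of the target category
    detected at that pixel."""
    alpha = LEARNING_RATES[category]
    return (1 - alpha) * bg_value + alpha * new_value

# A background pixel adapts fastest; a pedestrian pixel barely moves, so a
# slow pedestrian is not absorbed into the background.
bg_after_background = update_pixel(0.0, 100.0, "background")   # 20.0
bg_after_pedestrian = update_pixel(0.0, 100.0, "pedestrian")   # 1.0
```

The larger the rate, the faster new observations dominate the background estimate, which is why the foreground's rate is kept small.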
In one possible design, the target includes a foreground and/or a background, and if the target includes the foreground, the first parameter learning rate corresponding to the foreground is found, and if the target includes the background, the first parameter learning rate corresponding to the background is found. The first parameter learning rate corresponding to the foreground is smaller than the first parameter learning rate corresponding to the background. Alternatively, the foreground of different attribute categories may correspond to different first parameter learning rates, for example, the first parameter learning rate corresponding to a pedestrian is different from the first parameter learning rate corresponding to a vehicle.
In one possible design, the background model in the surveillance video may be further corrected after it has been updated with the first parameter learning rate. For example, a second image frame of the surveillance video is acquired, where the second image frame follows the first image frame; the two frames may be separated by n image frames, with n greater than 1. Foreground extraction is performed on the second image frame according to the updated background model to determine the predicted area where the foreground in the second image frame is located; meanwhile, the visual task module computes the image data of the second image frame to determine the actual area where the foreground is located.
If the error between the predicted area and the actual area meets a misjudgment condition, a misjudged area is determined from the predicted area and the actual area; optionally, the misjudgment condition may be that the error area between the predicted area and the actual area is greater than a certain threshold. The background model of the pixel points in the misjudged area is then further updated.
Computing the misjudged area in this way further improves the accuracy of the background model established for the surveillance video.
In one possible design, the misjudged area includes a first area and/or a second area. The first area consists of pixels covered by the actual area but not covered by the predicted area, i.e., where the foreground was learned as background; the second area consists of pixels not covered by the actual area but covered by the predicted area, i.e., where the background was learned as foreground. When updating the background model of the pixel points in the misjudged area: if the misjudged area includes a first area, the number of models contained in the background model of the pixel points in the first area is reduced; if the misjudged area includes a second area, the first parameter learning rate corresponding to the foreground in the second area is increased, and the background model of the pixel points in the second area is updated with the increased learning rate.
In this way, pixel points where the foreground was mistakenly learned as background can be relearned as foreground as soon as possible, and pixel points where the background was mistakenly learned as foreground can be relearned as background as soon as possible.
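The two sub-areas above can be sketched as set differences between the predicted and actual foreground areas (the pixel-set representation is an assumption for illustration):

```python
def misjudged_areas(predicted, actual):
    """Split the misjudged area into the two cases the method handles,
    with predicted/actual foreground areas given as sets of (y, x) pixels."""
    first = actual - predicted    # foreground learned as background
    second = predicted - actual   # background learned as foreground
    return first, second

predicted = {(0, 0), (0, 1), (1, 1)}
actual    = {(0, 1), (1, 1), (2, 2)}
first, second = misjudged_areas(predicted, actual)
# first  -> {(2, 2)}: reduce the model count of these pixels' background models
# second -> {(0, 0)}: relearn these pixels with the larger second learning rate
```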
In one possible design, the first parameter learning rate corresponding to the foreground in the second area may be increased by obtaining a second parameter learning rate corresponding to that foreground: second parameter learning rates for foregrounds of different attribute categories are stored in advance, and the one corresponding to the foreground is looked up directly. The second parameter learning rate indicates the rate at which a newly added model appearing on pixel points in the foreground's corresponding area is updated to become the background model of those pixel points. In general, so that an area wrongly learned as foreground can be relearned as background as soon as possible, the second parameter learning rate corresponding to a foreground is greater than its first parameter learning rate. The background model of the pixel points in the second area of the surveillance video is then updated with the second parameter learning rate.
By using second parameter learning rates that correspond to the different foregrounds, a region wrongly learned as foreground can be relearned as background as soon as possible, improving the accuracy of the background model.
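A minimal sketch of this look-up, with hypothetical category names and rate values; the patent only requires that each foreground's second parameter learning rate exceed its first:

```python
# Assumed pre-stored tables; the invariant SECOND > FIRST per category is
# the only constraint stated in the text, the numbers are invented.
FIRST_RATE  = {"pedestrian": 0.01, "vehicle": 0.05}
SECOND_RATE = {"pedestrian": 0.10, "vehicle": 0.30}

def learning_rate(category, in_misjudged_second_area):
    """Use the larger second parameter learning rate only for pixels of an
    area that was wrongly learned as foreground; otherwise use the first."""
    table = SECOND_RATE if in_misjudged_second_area else FIRST_RATE
    return table[category]

# Sanity check of the stated invariant.
assert all(SECOND_RATE[c] > FIRST_RATE[c] for c in FIRST_RATE)
```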
In one possible design, a third image frame of the surveillance video is acquired, where the third image frame follows the first image frame. Foreground extraction is performed on the third image frame according to the updated background model to obtain a foreground image, i.e., an image formed by removing the background of the third image frame and retaining its foreground.
Morphological processing is then applied to the foreground image with several different morphological parameters, yielding a result image for each parameter. The result image with the best image quality is selected, and the morphological parameter that produced it is determined to be the target morphological parameter. The parameters of morphological processing are set to this target morphological parameter, which can then be used when morphologically processing subsequent image frames.
By morphologically processing image frames with different morphological parameters and finding the target morphological parameter that yields the result image with the best image quality, the image quality of subsequently obtained result images is improved.
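The parameter search above can be sketched as follows, assuming a binary foreground mask, a square structuring element whose size is the morphological parameter, and a match-against-reference score standing in for the unspecified image-quality criterion:

```python
def morph(img, k, pick):
    """k-by-k square-element erosion (pick=min) or dilation (pick=max) on a
    binary image given as a list of rows, with zero padding at the borders."""
    pad = k // 2
    h, w = len(img), len(img[0])
    get = lambda y, x: img[y][x] if 0 <= y < h and 0 <= x < w else 0
    return [[pick(get(y + dy, x + dx)
                  for dy in range(-pad, pad + 1)
                  for dx in range(-pad, pad + 1))
             for x in range(w)] for y in range(h)]

def opening(img, k):
    """Erosion followed by dilation: removes speckle noise smaller than k."""
    return morph(morph(img, k, min), k, max)

def best_kernel(mask, reference, sizes=(1, 3, 5)):
    """Try each morphological parameter and keep the one whose result image
    best matches a reference (a hypothetical quality criterion)."""
    score = lambda k: sum(a == b
                          for ra, rb in zip(opening(mask, k), reference)
                          for a, b in zip(ra, rb))
    return max(sizes, key=score)
```

A k of 1 leaves noise untouched, while an oversized k erodes the true foreground away; the score picks the middle ground.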
In a second aspect, an embodiment of the present invention provides an image processing apparatus configured to implement the method and functions of the first aspect. It is implemented by hardware and/or software, which includes modules corresponding to the functions described above.
In a third aspect, an embodiment of the present application provides an image processing apparatus, including: the image processing device comprises a processor, a memory and a communication bus, wherein the communication bus is used for realizing connection communication between the processor and the memory, and the processor executes a program stored in the memory for realizing the steps in the image processing method provided in the first aspect.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium having instructions stored therein which, when run on a computer, cause the computer to perform the method of the first aspect described above.
In a fifth aspect, the application provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of the first aspect described above.
In the embodiments of the present application, the background model of the pixel points in the region corresponding to the target in the surveillance video is updated with the first parameter learning rate corresponding to the target in the first image frame. The first parameter learning rate is determined according to the attribute information of the target, so different targets can use different first parameter learning rates. For example, if the target is a foreground, its first parameter learning rate is made slightly smaller to avoid learning the foreground as background; if the target is a background, its first parameter learning rate is made slightly larger so that a new background model is learned as soon as possible. In this way the foreground of the surveillance video is prevented from being learned as background, and the embodiments of the present application improve the accuracy of the established background model.
Drawings
In order to more clearly describe the technical solution of the embodiments of the present invention, the following description will explain the drawings required to be used by the embodiments of the present invention.
FIG. 1 is a schematic view of a component structure according to an embodiment of the present invention;
fig. 2 is a schematic flow chart of an image processing method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a target database provided by an embodiment of the present invention;
FIG. 4 is a schematic view of a component structure according to an embodiment of the present invention;
FIG. 5 is a schematic view of another component structure according to an embodiment of the present invention;
FIG. 6 is a schematic view of another component structure provided in an embodiment of the present invention;
FIG. 7 is a schematic view of another component structure provided in an embodiment of the present invention;
fig. 8 is a schematic structural view of an image processing apparatus according to an embodiment of the present invention;
fig. 9 is a schematic structural diagram of another image processing apparatus according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention will be described below with reference to the accompanying drawings in the embodiments of the present invention.
The foreground in the embodiments of the present invention may be a person or an object of interest to the monitoring system in an image frame; the foreground may be dynamic or static.
The background in the embodiment of the invention can be an image formed by a part except the foreground in the image frame.
It should be noted that background and foreground are relative concepts: when the person or object of interest changes, the foreground and background of an image frame change accordingly. For example, if the objects of interest are cars on an expressway, the cars are the foreground while the road surface and surrounding environment (including pedestrians) are the background; if pedestrians on the expressway are of interest, the pedestrians are the foreground while the cars, road surface, and surrounding environment are the background.
The model in the embodiment of the invention refers to a mathematical model which is obtained by carrying out statistical observation on pixel points in an image frame and is used for representing the characteristics of the pixel points.
The background model in the embodiments of the present invention is obtained by observing a surveillance video for a certain time and statistically estimating each pixel point of the image frames along the time axis, forming a background model for each pixel point. The background model of a pixel point represents the background characteristics of that pixel point and is used to identify the foreground in an image frame. Typical background models include the average background model, the frame-difference model, and the Gaussian mixture model.
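As a toy illustration of the average background model mentioned above (all values invented), a pixel's background estimate is its temporal mean over observed frames, and a large deviation from it flags foreground:

```python
# Observed intensities of one pixel over time; 255.0 is a passing object.
observations = [100.0, 102.0, 98.0, 255.0, 100.0]

# Average background model: the temporal mean is the background estimate.
background_estimate = sum(observations) / len(observations)   # 131.0

FOREGROUND_THRESHOLD = 50.0   # assumed decision threshold

def is_foreground(value):
    """Flag a pixel value as foreground if it deviates strongly from the
    estimated background."""
    return abs(value - background_estimate) > FOREGROUND_THRESHOLD
```

The single bright outlier biases the mean upward, which is exactly the kind of error that motivates the target-aware learning rates of this method.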
The morphological parameters in the embodiments of the present invention are as follows: after background modeling, an image frame of the surveillance video yields a preliminary foreground image, but that foreground image may contain noise and holes in its regions, so post-processing operations such as morphological dilation and erosion often need to be applied after background modeling. The parameters set in these post-processing operations are the morphological parameters.
The parameter learning rate (including the first parameter learning rate and the second parameter learning rate) in the embodiment of the present invention is used to represent the rate at which the newly added model appearing on the pixel points in the corresponding region of the target is updated to become the background model of the pixel points in the corresponding region of the target, where the target may be a foreground and/or a background in the image frame.
The first parameter learning rate is used for avoiding learning the background as the foreground and avoiding learning the foreground as the background when the background model is learned, and controlling the rate of updating the newly-added model into the background model. For example, for a target that is background, the first parameter learning rate may be slightly larger so that the background is learned as a background model as soon as possible; for a foreground, the first parameter learning rate may be slightly smaller to avoid the foreground being learned as a background, and the value of the specific first parameter learning rate may be determined according to attribute information of the foreground, where the attribute information may include a contour size, a behavior pattern, a movement speed, and the like of the foreground. For example, the first parameter learning rate of the vehicle is greater than the first parameter learning rate of the pedestrian, and the first parameter learning rate of the cat is greater than the first parameter learning rate of the elephant.
The second parameter learning rate is used for correction when a misjudgment occurs during foreground detection with the background model (i.e., when the background has been mistakenly learned as foreground), so that the wrongly learned foreground is quickly relearned as background. The second parameter learning rate is therefore greater than the first parameter learning rate.
The newly added model is a model that newly appears at a pixel point; it may be associated with the target corresponding to that pixel point, and it is learned into the background model of the pixel point at a first parameter learning rate of a certain magnitude. For example, suppose the background model of a pixel point is A. When a pedestrian passes through the pixel point, the background model learned at that pixel point includes a new model B for the pedestrian; the background model of the pixel point then contains both A and B, but A has the larger weight and remains dominant, while B, being newly added, has a smaller weight.
Alternatively, the parameter learning rate may be the rate at which the weight occupied by the new model grows; if the new model stops appearing at the pixel point from some image frame onward, its weight eventually decays away. For example, the weight of the newly added model B may grow at a rate of 0.1 (0.1, 0.2, 0.3, and so on, each weight being the result computed for one image frame), or at a rate of 0.2 (0.2, 0.4, 0.6, and so on). Thus, the larger the parameter learning rate, the faster the newly added model is learned as (i.e., becomes dominant in) the background model. Note that different application scenarios require different parameter learning rates for different targets. For example, when learning the background model, the first parameter learning rate of the background is larger than that of the foreground, and foregrounds with different attribute information have different first parameter learning rates. In the misjudgment scenario, i.e., when the background has been learned as foreground, the wrongly learned foreground must be quickly relearned as background, so the second parameter learning rate is set relatively large; that is, the second parameter learning rate of a given foreground is larger than its first parameter learning rate. Optionally, the second parameter learning rates of all foregrounds may be the same or different.
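The weight-growth example above can be sketched directly (the cap at 1 and the exact growth rule are assumptions; the text only specifies the per-frame increments):

```python
def new_model_weights(rate, frames):
    """Weight of a newly added model growing by `rate` per observed frame,
    capped at 1; the old background model keeps the complementary weight.
    Rounding absorbs floating-point drift in the running sum."""
    w, seq = 0.0, []
    for _ in range(frames):
        w = min(1.0, round(w + rate, 10))
        seq.append(w)
    return seq

slow = new_model_weights(0.1, 6)   # [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
fast = new_model_weights(0.2, 6)   # [0.2, 0.4, 0.6, 0.8, 1.0, 1.0]
# The faster rate makes the new model dominant (weight > 0.5) after 3 frames
# instead of 6, matching the text's claim that a larger learning rate means
# the newly added model is learned as the background model sooner.
```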
The application scene of the embodiment of the invention can be in an intelligent monitoring system for preprocessing video by using background modeling, and the intelligent monitoring system processes the monitoring video acquired by the camera. As shown in fig. 1, the intelligent monitoring system includes a video image module 2001, a background modeling module 2002, a post-processing module 2003, a visual task module 2004, and a result image module 2005, which constitute a basic video processing system. The video image module 2001 is an input portion of the scene, being a succession of image frames captured by the camera. The background modeling module 2002 models an input image frame and extracts a foreground based on the established background model. The background modeling module 2002 generally includes two parts, a foreground extraction module 2021 and a model update module 2022, the foreground extraction module 2021 being used to separate the foreground and background of the image frame, the model update module 2022 being used to learn parameters of the background model for the image frame. The post-processing module 2003 is a further processing of the results of the background modeling to eliminate noise present in the extracted foreground images, and the erosion 2031 and dilation 2032 in the post-processing module 2003 are two common morphological post-processing operations. The visual task module 2004 is a core task execution part in the intelligent monitoring system, and mainly includes tasks such as foreground detection (2041), foreground segmentation (2042), and foreground tracking (2043). The result image module 2005 is an output part of the intelligent monitoring system, and the area of the monitoring foreground is divided and marked in the image.
According to the embodiment of the invention, the visual task module 2006 is newly added on the basis of the basic video processing system, and the visual task module 2006 can be used for calculating the image frames to obtain the foreground and/or the background contained in the image frames. The first parameter learning rate corresponding to the foreground is further searched from the target database 2009, the first parameter learning rate corresponding to the background is searched, and the background model of the model updating module 2022 is updated by using the searched first parameter learning rate. That is, the parameter learning rate of the background model of the background corresponding region pixel point is updated to the first parameter learning rate corresponding to the background, and the parameter learning rate of the background model of the foreground corresponding region pixel point is updated to the first parameter learning rate corresponding to the corresponding foreground.
Further optionally, the original image frame is processed by the visual task module 2006 to obtain a result image containing the foreground; the area where the foreground is located in this result image is the actual area. The result image 2005 with the foreground obtained by background-modeling the original image frame gives the predicted area, i.e., the area where the foreground is located in that result image. The misjudged area is obtained by comparing the foreground information of the actual area and the predicted area, and includes an area where the foreground was learned as background and/or an area where the background was learned as foreground.
If the background has been learned as foreground, the second parameter learning rate corresponding to that foreground is looked up from the target database 2009; the second parameter learning rate is generally larger, which speeds up relearning the wrongly learned foreground as background. If the foreground has been learned as background, the number of models in the background model is reduced and the background model is relearned, forming a more accurate background model.
Referring to fig. 2 in conjunction with the application scenario shown in fig. 1, an embodiment of the present application provides a flowchart of an image processing method, where the method includes:
s10, acquiring a first image frame of a monitoring video;
In the embodiment of the present application, the surveillance video may be video collected by a camera with a fixed viewing angle, or a captured video segment. The first image frame may be any image frame in the surveillance video.
S11, calculating the image data of the first image frame to obtain a target contained in the first image frame, wherein the target comprises a foreground and/or a background;
In the embodiment of the present application, the visual task module may be used to compute the image data of the first image frame, obtaining the target contained in the first image frame and its position. The target of interest may include only the at least one foreground contained in the first image frame, or it may include both the at least one foreground and the background.
Alternatively, as shown in fig. 4, the original first image frame 4201 is processed directly by the foreground detection, foreground segmentation, and foreground tracking of the visual task module 4202 to obtain a result image 4203. The foreground contained in the first image frame can be obtained from the result image 4203, that is, the foreground information 4204 of the first image frame can be collected, and the area where the background is located can also be obtained from the result image.
Specifically, the visual task module stores features of the foregrounds of interest, such as traditional image features like the scale-invariant feature transform (SIFT) and the histogram of oriented gradients (HOG), or depth features extracted by a deep neural network. After receiving the first image frame, the visual task module scans and segments it according to the specific type of visual task, divides it into several candidate areas, and matches each candidate area against the stored features of the foregrounds of interest. If the matching succeeds, the candidate area is marked as an area where a foreground is located. After all candidate areas are processed, the marked areas of the first image frame are the areas where its foregrounds are located, and the unmarked areas are where the background is located. Meanwhile, the type of a foreground can be obtained from its features, or the type is stored in the visual task module; the type of foreground represents its classification, e.g., whether the foreground is a car or a person.
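A minimal sketch of this matching step, with invented two-dimensional feature vectors standing in for SIFT/HOG/deep features and an assumed distance threshold:

```python
import math

# Hypothetical stored features of the foregrounds of interest.
STORED_FEATURES = {"pedestrian": (0.9, 0.1), "vehicle": (0.1, 0.9)}

def label_candidate(feature, threshold=0.3):
    """Match a candidate area's feature vector against the stored foreground
    features; below-threshold nearest match marks the area as that foreground,
    otherwise the area stays unmarked, i.e. background."""
    best, dist = min(((name, math.hypot(feature[0] - f[0], feature[1] - f[1]))
                      for name, f in STORED_FEATURES.items()),
                     key=lambda item: item[1])
    return best if dist < threshold else "background"
```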
S12, searching a first parameter learning rate corresponding to the target, wherein the first parameter learning rate is used for representing the rate of updating a newly-added model appearing on a pixel point in a corresponding region of the target into a background model of the pixel point in the corresponding region of the target, and the first parameter learning rate is determined according to attribute information of the target;
In the embodiment of the present invention, the first parameter learning rate corresponding to the target is looked up in the target database, which stores first parameter learning rates for various targets; for example, foregrounds with different attribute information correspond to different first parameter learning rates, and the background corresponds to one first parameter learning rate. The first parameter learning rate represents the rate at which a newly added model appearing on a pixel point in the region corresponding to the target is updated to become the background model of that pixel point. Take the background model of one pixel point as an example: model A is a normal-distribution curve model centered at 101, and model B is a normal-distribution curve model centered at 104; model A is the background model of the pixel point. When a new object or person appears at the pixel point (it may be a foreground, or a changed background), a new model exists, here the B model. To update the background model quickly, the newly added model B is learned into the background model of the pixel point at a certain weight-growth rate, and that rate is the first parameter learning rate. For example, if the first parameter learning rate corresponding to the target is 0.1, the background model of the pixel point is updated as: 0.9A + 0.1B, 0.8A + 0.2B, 0.7A + 0.3B, and so on. The rate at which the new model is updated into the background model is determined by the magnitude of the first parameter learning rate; note that updating the new model into the background model means the new model takes the dominant role in the background model, i.e., its weight becomes relatively large.
In an actual scene, the first parameter learning rate needs to be determined according to the attribute information of the target. For example, the first parameter learning rate corresponding to the background can be slightly larger, so that a new background is learned as soon as possible and the accuracy of foreground detection is improved. The first parameter learning rate corresponding to a foreground with a relatively high motion speed is larger than that corresponding to a foreground with a relatively low motion speed, and the first parameter learning rate corresponding to a foreground with a relatively large contour and a relatively low motion speed needs to be very small; otherwise, the foreground is easily learned as the background during background model learning. The above ways of determining the first parameter learning rate according to the attribute information of the target are merely examples; the embodiment of the present invention is not limited thereto, and other attribute information may be used.
The target database records information such as the optimal parameter learning rate (i.e. the first parameter learning rate) and the number of models of each target. In the target database, some categories of targets can also include sub-category information, so that the categories of targets are divided more finely according to the precision requirement of the task. Fig. 3 is a schematic diagram of the information collected in a target database according to an embodiment of the present invention. As shown in the figure, the categories of targets may be divided into: background, pedestrian, motor vehicle, bicycle, etc., and each category of target corresponds to a first parameter learning rate and a number of models. Optionally, the categories of targets may be further divided into sub-categories, and the correspondence between the first parameter learning rate and the number of models of each sub-category is recorded in the target database. As shown, motor vehicles may be divided into cars, trucks, buses, etc., and each sub-category may correspond to a different first parameter learning rate and number of models. For example, the target database may be described in the following manner:
Database = {Object_info | Object ∈ AllObjects}
Object_info = (learning_rate, max_model, subObject, ...)
where Database is the target database; AllObjects is the set of all targets of interest to the task, which contains one specific target: the background; Object_info is the information of a certain target; learning_rate records the currently optimal first parameter learning rate; max_model records the currently optimal number of models; and subObject records information of the target's sub-categories.
The search for the first parameter learning rate corresponding to the target can be expressed in the following form:
Obj_t(i,j) = {Object_info | Object = Task_t(i,j)}
where Task_t(i,j) is the visual-task calculation result of pixel point (i,j) at time t, which is typically marked as a specific target, such as the type of the target; Obj_t(i,j) is the target information of pixel point (i,j) at time t obtained by querying the target database, and the target information may include the first parameter learning rate and the number of models corresponding to the target.
And S13, updating a background model of the pixel points in the corresponding area of the target in the monitoring video by adopting the first parameter learning rate.
In the embodiment of the invention, after the first parameter learning rate corresponding to the target is queried, the background model of the pixel points in the area corresponding to the target in the monitoring video can be updated using that first parameter learning rate. It should be noted that when the first image frame of the monitoring video contains multiple foregrounds, different foregrounds may correspond to different first parameter learning rates; when the background model is updated, the background model of the pixel points in the area corresponding to each foreground is updated with the first parameter learning rate corresponding to that foreground. For example, suppose the first image frame contains a foreground 1 and a foreground 2, where foreground 1 is in area 1 of the monitoring video and foreground 2 is in area 2, the first parameter learning rate corresponding to foreground 1 is A1 and that corresponding to foreground 2 is A2; then the background model of the pixel points of area 1 is updated with A1, and the background model of the pixel points of area 2 is updated with A2. The background model of the pixel points of the area where the background is located is updated with the first parameter learning rate corresponding to the background. The update of the background model can be expressed in the following form:
Update(Model_t(i,j), Obj_t(i,j))
where Model_t(i,j) is the background model of pixel point (i,j) at time t, and the Update(Model_t(i,j), Obj_t(i,j)) function updates the background model of pixel point (i,j) according to the information in Obj_t(i,j).
By adopting the above method, the background models of the pixel points of the corresponding areas are updated with different first parameter learning rates for different targets, which reduces the cases in which the background is learned as the foreground or the foreground is learned as the background. It should be noted that in step S11 a visual task is used to calculate the first image frame to obtain the targets contained in it, and this calculation process is complex; to improve processing efficiency, step S11 may generally be performed at a relatively large period rather than for every image frame, and after the corresponding first parameter learning rate is determined, that rate may be used to learn the background model for a period of time.
Further optionally, the image processing method according to the embodiment of the present invention may further include steps S14 to S18;
s14, acquiring a second image frame of the monitoring video;
s15, carrying out foreground extraction processing on the second image frame according to the updated background model, and determining a prediction area where the foreground in the second image frame is located;
In the embodiment of the present invention, the second image frame may be an image frame subsequent to the first image frame, and a certain number of image frames may be spaced between the second image frame and the first image frame. And carrying out foreground extraction processing on the second image frame by adopting the updated background model to obtain a result image containing the foreground, and determining a prediction area where the foreground in the second image frame is located according to the result image.
Specifically, as shown in fig. 5, after the second image frame 4303 is subjected to the foreground extraction processing of the Gaussian mixture model 4302, a foreground image 4301 is obtained. The foreground image is input into the visual task module 4304 to perform foreground detection, foreground segmentation, foreground tracking and other processing, so as to obtain a result image 4305 containing the foreground, and the prediction area where the foreground is located can be determined from the result image 4305. As shown in fig. 5, the dashed box in the result image 4305 is the prediction area where the foreground is located. Before the visual task 4304 processes the foreground image, a post-processing module may perform erosion and dilation processing on it.
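As an illustrative reconstruction (not the patented implementation), the prediction area — the dashed box around the detected foreground — can be derived from a binary foreground mask as a bounding box:

```python
def foreground_bbox(mask):
    """Prediction area of the foreground: the bounding box (min_row,
    min_col, max_row, max_col) of all pixels marked foreground (255)
    in a binary mask, or None if no foreground is present."""
    pts = [(i, j) for i, row in enumerate(mask)
                  for j, v in enumerate(row) if v == 255]
    if not pts:
        return None
    rows = [p[0] for p in pts]
    cols = [p[1] for p in pts]
    return (min(rows), min(cols), max(rows), max(cols))

mask = [[0,   0,   0, 0],
        [0, 255, 255, 0],
        [0, 255,   0, 0],
        [0,   0,   0, 0]]
print(foreground_bbox(mask))  # (1, 1, 2, 2)
```

In practice the mask would come from the updated background model after erosion/dilation cleanup, and one box would be produced per tracked foreground.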
S16, calculating the image data of the second image frame, and determining an actual area where the foreground in the second image frame is located;
In the embodiment of the present invention, step S15 is to process the second image frame through the established background model, and step S16 is to directly perform visual task calculation processing on the original second image frame, so as to determine the actual area where the foreground in the second image frame is located.
Specifically, as shown in fig. 6, the original second image frame 4303 is directly processed by the visual task module 4304 through foreground detection, foreground segmentation, and foreground tracking, so as to obtain a result image 4306 including the foreground. From this resulting image 4306, the actual area in which the foreground is located can be determined.
It should be noted that the calculation process for the second image frame in step S16 may refer to step S11, and repeated details are not described here again.
S17, if the error between the prediction area and the actual area meets the erroneous judgment condition, determining an erroneous judgment area according to the prediction area and the actual area, wherein the erroneous judgment area comprises a first area and/or a second area, the first area is composed of pixel points that are covered by the actual area but not covered by the prediction area, and the second area is composed of pixel points that are not covered by the actual area but are covered by the prediction area;
In the embodiment of the invention, whether an erroneous judgment exists can be determined according to the prediction area and the actual area. The erroneous judgment condition may be a threshold on the error between the prediction area and the actual area; that is, the condition is met when the error between the prediction area and the actual area is greater than the threshold. The erroneous judgment area may include the first area and/or the second area. The first area is composed of pixel points covered by the actual area of the foreground but not covered by the prediction area, i.e. an area where the foreground has been learned as the background. The second area is composed of pixel points not covered by the actual area of the foreground but covered by the prediction area, i.e. an area where the background has been learned as the foreground.
Alternatively, the erroneous determination region may be expressed in the following form:
MFSet_t = {(i,j) | Mask_t(i,j) = 0 and Obj_t(i,j) ≠ Background}
MbSet_t = {(i,j) | Mask_t(i,j) = 255 and Obj_t(i,j) = Background}
where BGS_t(i,j) is the background modeling result of pixel point (i,j) at time t, i.e. the result determined by the background model; Mask_t(i,j) is the binarized result of pixel point (i,j) at time t; Obj_t(i,j) is the visual-task calculation result of pixel point (i,j) at time t; MFSet_t records all pixel points where the foreground is misjudged as background at time t; and MbSet_t records all pixel points where the background is misjudged as foreground at time t.
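These two set definitions map directly onto set comprehensions; the sketch below represents the mask and the visual-task result as plain dicts rather than image arrays, and the labels and coordinates are made up:

```python
def misjudged_sets(mask, obj):
    """MFSet_t: foreground misjudged as background (mask says 0 but the
    visual task says non-background). MbSet_t: background misjudged as
    foreground (mask says 255 but the visual task says background)."""
    mf = {p for p, v in mask.items() if v == 0 and obj[p] != "background"}
    mb = {p for p, v in mask.items() if v == 255 and obj[p] == "background"}
    return mf, mb

mask = {(0, 0): 0,            (0, 1): 255,
        (1, 0): 0,            (1, 1): 255}
obj  = {(0, 0): "pedestrian", (0, 1): "background",
        (1, 0): "background", (1, 1): "car"}
mf, mb = misjudged_sets(mask, obj)
print(mf, mb)  # {(0, 0)} {(0, 1)}
```

Pixel (0,0) is the first area (a pedestrian that the model treated as background), pixel (0,1) is the second area (background that the model kept as foreground); (1,0) and (1,1) agree with the visual task and fall into neither set.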
And S18, updating a background model of the pixel points in the misjudgment area.
In the embodiment of the invention, after the erroneous judgment area is determined, the background model of the pixel points of the erroneous judgment area needs to be further optimized. Optionally, if the erroneous judgment area includes the first area, i.e. an area where the foreground has been learned as the background, that area needs to be relearned as the foreground; for example, the background model of the first area is relearned, or the number of models contained in the background model of the pixel points in the first area is reduced, so that the foreground is relearned as soon as possible and the background model is corrected. If the erroneous judgment area includes the second area, i.e. an area where the background has been learned as the foreground, that area needs to be learned as the background as soon as possible; for example, the first parameter learning rate corresponding to the foreground in the second area is increased, and the background model of the pixel points in the second area is updated with the increased first parameter learning rate. Optionally, the degree of increase of the first parameter learning rate may be flexibly configured, for example, uniformly set to the same larger parameter learning rate; or, different foregrounds correspond to different second parameter learning rates, and the second parameter learning rate is used to update the background model of the pixel points of the second area in the monitoring video, where the second parameter learning rate corresponding to a foreground is larger than the first parameter learning rate corresponding to the same foreground. Optionally, the correspondence between the different foregrounds and their second parameter learning rates may be stored in the target database.
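A sketch of the two correction strategies of step S18, assuming a toy per-pixel model record; the field names (`max_model`, `rate`) and all values are hypothetical:

```python
def correct_models(pixel_models, first_area, second_area, second_rate):
    """Correct the background model in the misjudgment areas."""
    # First area (foreground learned as background): shrink the number of
    # models so the foreground is relearned as soon as possible.
    for p in first_area:
        pixel_models[p]["max_model"] = max(1, pixel_models[p]["max_model"] - 1)
    # Second area (background learned as foreground): switch to the larger
    # second parameter learning rate so the background is learned quickly.
    for p in second_area:
        pixel_models[p]["rate"] = second_rate
    return pixel_models

models = {(0, 0): {"max_model": 3, "rate": 0.02},
          (0, 1): {"max_model": 3, "rate": 0.02}}
correct_models(models, first_area={(0, 0)}, second_area={(0, 1)}, second_rate=0.1)
print(models[(0, 0)]["max_model"], models[(0, 1)]["rate"])  # 2 0.1
```

Note the asymmetry: the first area is corrected structurally (fewer models), while the second area is corrected by speeding up learning, matching the two cases described above.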
As shown in fig. 6, the erroneous judgment area 4403 is calculated by comparing the result image 4401 and the result image 4402, and the model updating module 4442 in the Gaussian mixture model 4404 then updates the model according to the erroneous judgment area and the second parameter learning rate stored in the target database.
Further optionally, the image processing method according to the embodiment of the present application may further include steps S19 to S23;
s19, acquiring a third image frame of the monitoring video;
s20, carrying out foreground extraction processing on the third image frame according to the updated background model to obtain a foreground image;
s21, respectively carrying out morphological processing on the foreground images by adopting a plurality of different morphological parameters to obtain a result image corresponding to each morphological parameter;
s22, obtaining a result image with the best image quality in the result images corresponding to each morphological parameter, and taking the morphological parameter corresponding to the result image with the best image quality as a target morphological parameter;
s23, setting the parameters of the morphological treatment as the target morphological parameters.
In the embodiment of the application, an initial foreground image can be obtained after background modeling is performed on an image frame of the monitoring video, but the foreground image may contain some noise data and regional holes, so a post-processing operation needs to be performed on the foreground image after background modeling to further improve the background modeling result. The post-processing operation includes morphological dilation, erosion and other operations, and morphological parameters need to be set during the post-processing operation; poor morphological parameters generally lead to worse background modeling results, so the optimal target morphological parameters need to be obtained in the embodiment of the application.
And acquiring a third image frame in the monitoring video, wherein the third image frame can be an image frame after the first image frame, and carrying out foreground extraction processing on the third image frame by adopting the updated background model to acquire a foreground image. And respectively carrying out morphological processing on the foreground image by adopting a plurality of preset different morphological parameters, thereby obtaining a result image corresponding to each morphological parameter. Comparing the result images corresponding to the morphological parameters, determining the morphological parameters corresponding to the result image with the best image quality as the best target morphological parameters, and processing the result images by adopting the target morphological parameters.
The above operation can be expressed in the following form:
Mask_t(i,j) = Morph(Mask_t(i,j), kernel_t)
kernel_{t+1} = argmax_{kernel_t ∈ Lib} Score(Mask_t, kernel_t)
where Morph(Mask_t(i,j), kernel_t) performs a morphological operation on Mask_t(i,j) with the parameter kernel_t; the Score function is used to evaluate the background modeling result, i.e. the quality of the foreground image, and a larger value represents better image quality; kernel_t represents the morphological parameter at time t; kernel_{t+1} represents the morphological parameter at time t+1, i.e. the target morphological parameter; and Lib records all optional morphological parameters.
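Selecting the target morphological parameter amounts to scoring the result of each candidate in Lib and keeping the best one. In this sketch, `toy_morph` and `toy_score` are stand-ins invented for illustration, not the embodiment's actual Morph and Score operators:

```python
def select_target_kernel(mask, lib, morph, score):
    """Apply each candidate morphological parameter, evaluate the result
    image with the score function, and return the parameter whose result
    scores best (larger is better)."""
    best_kernel, best_score = None, float("-inf")
    for kernel in lib:
        s = score(morph(mask, kernel))
        if s > best_score:
            best_kernel, best_score = kernel, s
    return best_kernel

# Toy stand-ins: "morphology" removes isolated noise when kernel > 1, and
# the score penalizes masks that still contain noise.
def toy_morph(mask, kernel):
    return [v for v in mask if kernel == 1 or v != "noise"]

def toy_score(result):
    return -result.count("noise")

print(select_target_kernel(["fg", "noise", "fg"], [1, 3, 5], toy_morph, toy_score))  # 3
```

Since candidates 3 and 5 score equally here, the search keeps the first best candidate; a real Score would also penalize over-aggressive kernels that erode true foreground.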
As shown in fig. 7, after the third image frame is processed by the Gaussian mixture model 4501, a foreground image is obtained. The foreground image is then input to the post-processing operation module 4502 for morphological erosion and dilation processing; the post-processing operation module 4502 processes the foreground image with multiple morphological parameters and evaluates the result of each, so as to select the optimal target morphological parameter. The morphological parameters in the post-processing operation module 4502 are then updated with the target morphological parameter.
Referring to fig. 8, a schematic structural diagram of an image processing apparatus according to an embodiment of the present invention is shown, where the image processing apparatus includes an obtaining module 701, a calculating module 702, a searching module 703, and an updating module 704; wherein, the detailed description of each module is as follows:
an acquiring module 701, configured to acquire a first image frame of a surveillance video;
a calculating module 702, configured to calculate image data of the first image frame, and obtain a target included in the first image frame;
a searching module 703, configured to search a first parameter learning rate corresponding to the target, where the first parameter learning rate is used to represent a rate at which a newly added model appearing on a pixel in the target corresponding area is updated to be a background model of the pixel in the target corresponding area, and the first parameter learning rate is determined according to attribute information of the target;
and the updating module 704 is configured to update a background model of the pixel point in the target corresponding area in the monitoring video by using the first parameter learning rate.
Optionally, the target comprises a foreground and/or a background;
if the target includes the foreground;
the searching module is specifically used for searching a first parameter learning rate corresponding to the foreground;
If the target includes the background;
the searching module is specifically used for searching a first parameter learning rate corresponding to the background;
the first parameter learning rate corresponding to the foreground is smaller than the first parameter learning rate corresponding to the background.
Optionally, the object includes the foreground, and the image processing apparatus according to the embodiment of the present invention may further include a first determining module;
the acquiring module 701 is further configured to acquire a second image frame of the surveillance video;
the first determining module is used for extracting the foreground of the second image frame according to the updated background model and determining a prediction area where the foreground in the second image frame is located;
the first determining module is further configured to calculate image data of the second image frame, and determine an actual area where a foreground in the second image frame is located;
the first determining module is further configured to determine a misjudgment area according to the prediction area and the actual area if the error between the prediction area and the actual area meets a misjudgment condition;
the updating module 704 is further configured to update a background model of the pixel points in the erroneous determination area.
Optionally, the erroneous judgment area includes a first area and/or a second area, where the first area is composed of pixel points that are covered by the actual area but not covered by the prediction area, and the second area is composed of pixel points that are not covered by the actual area but are covered by the prediction area; the updating module updating the background model of the pixel points in the erroneous judgment area comprises the following steps:
if the misjudgment area comprises the first area, reducing the number of models contained in the background model of the pixel point in the first area; if the erroneous judgment area comprises the second area, increasing a first parameter learning rate corresponding to the foreground in the second area, and updating a background model of the pixel point in the second area by adopting the increased first parameter learning rate.
Optionally, the updating module increases a first parameter learning rate corresponding to the foreground in the second area, and updates a background model of the pixel point in the second area by using the increased first parameter learning rate, including:
acquiring a second parameter learning rate corresponding to the foreground, wherein the second parameter learning rate is used for indicating the rate of updating a newly-added model appearing on a pixel point in a region corresponding to the foreground into a background model of the pixel point in the region corresponding to the foreground, and the second parameter learning rate corresponding to the foreground is larger than the first parameter learning rate corresponding to the foreground;
And updating a background model of the pixel points in the second area in the monitoring video by adopting the second parameter learning rate.
Optionally, the image processing apparatus according to the embodiment of the present invention may further include a foreground extraction module, a morphological processing module, a second determination module, and a setting module;
the acquiring module 701 is further configured to acquire a third image frame of the surveillance video;
the foreground extraction module is used for carrying out foreground extraction processing on the third image frame according to the updated background model to obtain a foreground image;
the morphological processing module is used for respectively carrying out morphological processing on the foreground images by adopting a plurality of different morphological parameters to obtain a result image corresponding to each morphological parameter;
the second determining module is configured to obtain a result image with the best image quality in the result images corresponding to each morphological parameter, and determine the morphological parameter corresponding to the result image with the best image quality as a target morphological parameter;
the setting module is used for setting the parameters of the morphological processing as the target morphological parameters.
It should be noted that the implementation of each module may also correspond to the corresponding description of the method embodiment shown in fig. 2, and perform the method and the function performed in the foregoing embodiment.
With continued reference to fig. 9, fig. 9 is a schematic diagram of an image processing apparatus according to an embodiment of the application. As shown, the image processing apparatus may include: at least one processor 801, at least one communication interface 802, at least one memory 803, and at least one communication bus 804.
The processor 801 may be a central processing unit, a general purpose processor, a digital signal processor, an application specific integrated circuit, a field programmable gate array or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or perform the various exemplary logic blocks, modules and circuits described in connection with this disclosure. The processor may also be a combination that performs computing functions, e.g., a combination of one or more microprocessors, or a combination of a digital signal processor and a microprocessor, and so on. The communication bus 804 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in fig. 9, but this does not mean there is only one bus or one type of bus. The communication bus 804 is used to implement connection and communication between these components. The communication interface 802 of the device in the embodiment of the present application is used for signaling or data communication with other node devices. The memory 803 may include volatile memory, and may also include nonvolatile memory such as nonvolatile random access memory (NVRAM), phase-change random access memory (PRAM), magnetoresistive random access memory (MRAM), at least one magnetic disk storage device, electrically erasable programmable read-only memory (EEPROM), flash memory devices such as NOR flash memory or NAND flash memory, or semiconductor devices such as solid state disks (SSD). The memory 803 may optionally be at least one storage device located remotely from the processor 801.
The memory 803 stores a set of program codes, and the processor 801 executes the program codes stored in the memory 803 to implement the following operations:
Acquiring a first image frame of a monitoring video;
calculating the image data of the first image frame to obtain a target contained in the first image frame;
searching a first parameter learning rate corresponding to the target, wherein the first parameter learning rate is used for representing the rate of updating a newly-added model appearing on a pixel point in a corresponding region of the target into a background model of the pixel point in the corresponding region of the target, and the first parameter learning rate is determined according to attribute information of the target;
and updating a background model of the pixel point in the area corresponding to the target in the monitoring video by adopting the first parameter learning rate.
Optionally, the target comprises a foreground and/or a background;
if the target includes the foreground;
the searching for the first parameter learning rate corresponding to the target comprises the following steps:
searching a first parameter learning rate corresponding to the foreground;
if the target includes the background;
the searching for the first parameter learning rate corresponding to the target comprises the following steps:
searching a first parameter learning rate corresponding to the background;
the first parameter learning rate corresponding to the foreground is smaller than the first parameter learning rate corresponding to the background.
Optionally, the processor 801 is further configured to perform the following operations:
acquiring a second image frame of the monitoring video;
performing foreground extraction processing on the second image frame according to the updated background model, and determining a prediction area where the foreground in the second image frame is located;
calculating the image data of the second image frame, and determining an actual area where the foreground in the second image frame is located;
if the error between the predicted area and the actual area meets the erroneous judgment condition, determining an erroneous judgment area according to the predicted area and the actual area;
and updating the background model of the pixel points in the misjudgment area.
Optionally, the erroneous judgment area includes a first area and/or a second area, the first area is composed of pixel points that are covered by the actual area but not covered by the prediction area, and the second area is composed of pixel points that are not covered by the actual area but are covered by the prediction area.
Optionally, the processor 801 is further configured to perform the following operations:
if the misjudgment area comprises the first area, reducing the number of models contained in the background model of the pixel point in the first area;
If the erroneous judgment area comprises the second area, increasing a first parameter learning rate corresponding to the foreground in the second area, and updating a background model of the pixel point in the second area by adopting the increased first parameter learning rate.
Optionally, the processor 801 is further configured to perform the following operations:
acquiring a second parameter learning rate corresponding to the foreground, wherein the second parameter learning rate is used for indicating the rate of updating a newly-added model appearing on a pixel point in a region corresponding to the foreground into a background model of the pixel point in the region corresponding to the foreground, and the second parameter learning rate corresponding to the foreground is larger than the first parameter learning rate corresponding to the foreground;
and updating a background model of the pixel points in the second area in the monitoring video by adopting the second parameter learning rate.
Optionally, the processor 801 is further configured to perform the following operations:
acquiring a third image frame of the monitoring video;
performing foreground extraction processing on the third image frame according to the updated background model to obtain a foreground image;
respectively carrying out morphological processing on the foreground images by adopting a plurality of different morphological parameters to obtain a result image corresponding to each morphological parameter;
Obtaining a result image with the best image quality in the result images corresponding to each morphological parameter, and determining the morphological parameter corresponding to the result image with the best image quality as a target morphological parameter;
setting parameters of the morphological processing as the target morphological parameters.
Those skilled in the art will appreciate that in one or more of the examples described above, the functions described in the present invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, these functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media includes both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. A storage media may be any available media that can be accessed by a general purpose or special purpose computer.
The foregoing embodiments have been provided for the purpose of illustrating the general principles of the present invention in further detail, and are not to be construed as limiting the scope of the invention, but are merely intended to cover any modifications, equivalents, improvements, etc. based on the teachings of the invention.

Claims (12)

1. An image processing method, comprising:
acquiring a first image frame of a monitoring video;
calculating the image data of the first image frame to obtain an object contained in the first image frame, wherein the object comprises a foreground and/or a background;
searching a first parameter learning rate corresponding to the target, wherein the first parameter learning rate is used for representing the rate of updating a newly-added model appearing on a pixel point in a corresponding region of the target into a background model of the pixel point in the corresponding region of the target, and the first parameter learning rate corresponding to the foreground is determined by attribute information of the foreground;
learning the newly added model into the background model of the pixel points in the region corresponding to the target in the monitoring video at the first parameter learning rate;
if the target includes the foreground, the searching for the first parameter learning rate corresponding to the target comprises:
searching for a first parameter learning rate corresponding to the foreground;
if the target includes the background, the searching for the first parameter learning rate corresponding to the target comprises:
searching for a first parameter learning rate corresponding to the background;
the first parameter learning rate corresponding to the foreground is smaller than the first parameter learning rate corresponding to the background.
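Claim 1's two-rate update can be sketched as follows. This is a minimal illustration, assuming a simple exponential weight-blending rule and concrete rate values; the claim specifies only that the foreground rate is smaller than the background rate.

```python
# Sketch of the claimed per-region update. ALPHA_FG < ALPHA_BG, so a newly
# added model in a foreground region is learned into the background model
# more slowly than one in a background region. The rates and the blending
# rule are illustrative assumptions, not the patented algorithm itself.

ALPHA_FG = 0.005  # first parameter learning rate for the foreground (assumed value)
ALPHA_BG = 0.05   # first parameter learning rate for the background (assumed value)

def update_weight(weight, matched, alpha):
    """Blend a match observation into a model weight at learning rate alpha."""
    return (1.0 - alpha) * weight + alpha * (1.0 if matched else 0.0)

def update_region(weights, is_foreground):
    """Look up the rate from the region's label and update every pixel model."""
    alpha = ALPHA_FG if is_foreground else ALPHA_BG
    return [update_weight(w, True, alpha) for w in weights]

# A new model (weight 0) observed in 20 consecutive frames gains background
# status much faster in a background region than in a foreground region.
fg, bg = [0.0], [0.0]
for _ in range(20):
    fg = update_region(fg, is_foreground=True)
    bg = update_region(bg, is_foreground=False)
```

With these assumed rates, the foreground-region weight stays far below the background-region weight after the same number of frames, which is the behaviour the claim targets: a slowly moving foreground object is not prematurely absorbed into the background.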
2. The method of claim 1, wherein the target comprises the foreground, and wherein, after the learning of the newly added model into the background model of the pixel points in the region corresponding to the target in the monitoring video at the first parameter learning rate, the method further comprises:
acquiring a second image frame of the monitoring video;
performing foreground extraction processing on the second image frame according to the updated background model, and determining a prediction area where the foreground in the second image frame is located;
calculating the image data of the second image frame, and determining an actual area where the foreground in the second image frame is located;
if an error between the prediction area and the actual area meets a misjudgment condition, determining a misjudgment area according to the prediction area and the actual area;
and updating the background model of the pixel points in the misjudgment area.
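The comparison of the prediction area with the actual area in claim 2 can be sketched with masks represented as coordinate sets. The IoU-based misjudgment condition and its threshold are assumptions; the claim only requires that the error between the two areas meet some condition.

```python
def misjudgment_area(prediction, actual, min_iou=0.8):
    """Compare the prediction area with the actual area, both given as sets
    of (row, col) pixel coordinates. If their overlap is too small -- the
    concrete misjudgment condition is an assumed IoU threshold here -- return
    the misjudged pixels split into:
      first:  pixels covered by the actual area but missed by the prediction
      second: pixels covered by the prediction but outside the actual area
    Returns None when the error does not meet the misjudgment condition."""
    union = prediction | actual
    iou = len(prediction & actual) / len(union) if union else 1.0
    if iou >= min_iou:
        return None
    return actual - prediction, prediction - actual

actual = {(0, 0), (0, 1), (1, 0), (1, 1)}      # where the foreground really is
prediction = {(1, 1), (1, 2)}                  # where the model said it was
result = misjudgment_area(prediction, actual)  # IoU = 1/5, below threshold
```

The two returned pixel sets correspond to the first and second areas defined in claim 3, which then receive different corrective updates.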
3. The method according to claim 2, wherein the misjudgment area comprises a first area and/or a second area, the first area being an area composed of pixels that are covered by the actual area but not covered by the prediction area, and the second area being an area composed of pixels that are not covered by the actual area but are covered by the prediction area, and wherein the updating of the background model of the pixel points in the misjudgment area comprises:
if the misjudgment area comprises the first area, reducing the number of models contained in the background model of the pixel points in the first area;
if the misjudgment area comprises the second area, increasing the first parameter learning rate corresponding to the foreground in the second area, and updating the background model of the pixel points in the second area using the increased first parameter learning rate.
4. The method of claim 3, wherein the increasing of the first parameter learning rate corresponding to the foreground in the second area and the updating of the background model of the pixel points in the second area using the increased first parameter learning rate comprise:
acquiring a second parameter learning rate corresponding to the foreground, wherein the second parameter learning rate indicates the rate at which a newly added model appearing at a pixel point in the region corresponding to the foreground is learned into the background model of that pixel point, and the second parameter learning rate corresponding to the foreground is larger than the first parameter learning rate corresponding to the foreground;
and updating a background model of the pixel points in the second area in the monitoring video by adopting the second parameter learning rate.
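The two corrective actions of claims 3 and 4 might look like the sketch below; the model representation (weight dicts) and the numeric rate are illustrative assumptions.

```python
ALPHA_FG_SECOND = 0.05  # second parameter learning rate, larger than the first (assumed value)

def correct_first_area(models):
    """First area (foreground missed by the prediction): shrink the pixel's
    background model by dropping its lowest-weight component, so background
    that wrongly absorbed the foreground is forgotten."""
    if len(models) > 1:
        return sorted(models, key=lambda m: m["weight"])[1:]
    return models

def correct_second_area(weight, matched=True):
    """Second area (background misread as foreground): re-learn the newly
    added model at the larger second parameter learning rate so it is
    absorbed into the background model quickly."""
    return (1.0 - ALPHA_FG_SECOND) * weight + ALPHA_FG_SECOND * (1.0 if matched else 0.0)

trimmed = correct_first_area([{"weight": 0.1}, {"weight": 0.7}])
boosted = correct_second_area(0.0)
```

Dropping a component versus raising the learning rate are opposite corrections: the first makes the background model forget, the second makes it learn faster.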
5. The method of claim 1, wherein the target comprises the foreground, the method further comprising:
acquiring a third image frame of the monitoring video;
performing foreground extraction processing on the third image frame according to the updated background model to obtain a foreground image;
performing morphological processing on the foreground image with a plurality of different morphological parameters, respectively, to obtain a result image corresponding to each morphological parameter;
obtaining, from the result images corresponding to the morphological parameters, the result image with the best image quality, and determining the morphological parameter corresponding to that result image as a target morphological parameter;
setting the parameter of the morphological processing to the target morphological parameter.
6. An image processing apparatus, comprising:
the acquisition module is used for acquiring a first image frame of the monitoring video;
the computing module is used for computing the image data of the first image frame to obtain a target contained in the first image frame, wherein the target comprises a foreground and/or a background;
the searching module is used for searching for a first parameter learning rate corresponding to the target, wherein the first parameter learning rate indicates the rate at which a newly added model appearing at a pixel point in the region corresponding to the target is learned into the background model of that pixel point, and the first parameter learning rate corresponding to the foreground is determined by attribute information of the foreground;
The updating module is used for learning the newly added model into the background model of the pixel points in the region corresponding to the target in the monitoring video at the first parameter learning rate;
if the target includes the foreground, the searching module is specifically used for searching for a first parameter learning rate corresponding to the foreground;
if the target includes the background, the searching module is specifically used for searching for a first parameter learning rate corresponding to the background;
the first parameter learning rate corresponding to the foreground is smaller than the first parameter learning rate corresponding to the background.
7. The apparatus of claim 6, wherein the target comprises the foreground, the apparatus further comprising a first determination module;
the acquisition module is also used for acquiring a second image frame of the monitoring video;
the first determining module is used for extracting the foreground of the second image frame according to the updated background model and determining a prediction area where the foreground in the second image frame is located;
the first determining module is further configured to calculate image data of the second image frame, and determine an actual area where a foreground in the second image frame is located;
The first determining module is further configured to determine a misjudgment area according to the prediction area and the actual area if the error between the prediction area and the actual area meets a misjudgment condition;
the updating module is also used for updating the background model of the pixel points in the misjudgment area.
8. The apparatus of claim 7, wherein the misjudgment area comprises a first area and/or a second area, the first area being an area composed of pixels that are covered by the actual area but not covered by the prediction area, and the second area being an area composed of pixels that are not covered by the actual area but are covered by the prediction area; and the updating, by the updating module, of the background model of the pixel points in the misjudgment area comprises:
if the misjudgment area comprises the first area, reducing the number of models contained in the background model of the pixel points in the first area; if the misjudgment area comprises the second area, increasing the first parameter learning rate corresponding to the foreground in the second area, and updating the background model of the pixel points in the second area using the increased first parameter learning rate.
9. The apparatus of claim 8, wherein the increasing, by the updating module, of the first parameter learning rate corresponding to the foreground in the second area and the updating of the background model of the pixel points in the second area using the increased first parameter learning rate comprise:
acquiring a second parameter learning rate corresponding to the foreground, wherein the second parameter learning rate indicates the rate at which a newly added model appearing at a pixel point in the region corresponding to the foreground is learned into the background model of that pixel point, and the second parameter learning rate corresponding to the foreground is larger than the first parameter learning rate corresponding to the foreground;
and updating a background model of the pixel points in the second area in the monitoring video by adopting the second parameter learning rate.
10. The apparatus of claim 6, further comprising a foreground extraction module, a morphology processing module, a second determination module, and a setup module;
the acquisition module is also used for acquiring a third image frame of the monitoring video;
the foreground extraction module is used for carrying out foreground extraction processing on the third image frame according to the updated background model to obtain a foreground image;
The morphological processing module is used for performing morphological processing on the foreground image with a plurality of different morphological parameters, respectively, to obtain a result image corresponding to each morphological parameter;
the second determining module is configured to obtain, from the result images corresponding to the morphological parameters, the result image with the best image quality, and to determine the morphological parameter corresponding to that result image as a target morphological parameter;
the setting module is used for setting the parameter of the morphological processing to the target morphological parameter.
11. An image processing apparatus, comprising: a memory, a communication bus, and a processor, wherein the memory is adapted to store program code, and the processor is adapted to invoke the program code to perform the method according to any one of claims 1-5.
12. A computer-readable storage medium having instructions stored therein which, when run on a computer, cause the computer to perform the method according to any one of claims 1-5.
CN201711385876.2A 2017-12-20 2017-12-20 Image processing method and device Active CN109949335B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711385876.2A CN109949335B (en) 2017-12-20 2017-12-20 Image processing method and device


Publications (2)

Publication Number Publication Date
CN109949335A CN109949335A (en) 2019-06-28
CN109949335B true CN109949335B (en) 2023-12-08

Family

ID=67005392

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711385876.2A Active CN109949335B (en) 2017-12-20 2017-12-20 Image processing method and device

Country Status (1)

Country Link
CN (1) CN109949335B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114638856A (en) * 2022-03-09 2022-06-17 广州小鹏自动驾驶科技有限公司 Image processing method, image processing device, electronic equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102469302A * 2010-10-28 2012-05-23 QNAP Systems, Inc. Background model learning system for lighting change adaptation utilized for video surveillance
CN102737251A * 2011-03-31 2012-10-17 Sony Corporation Image processing apparatus, image processing method, program, and recording medium
CN102930719A * 2012-10-09 2013-02-13 Beihang University Video image foreground detection method for traffic intersection scene and based on network physical system
CN102938152A * 2012-10-15 2013-02-20 Shandong University Background modeling method in video monitoring
CN103679690A * 2012-09-24 2014-03-26 The 207th Institute of the Second Academy of China Aerospace Science and Industry Corp. Object detection method based on segmentation background learning
KR101542206B1 * 2014-04-24 2015-08-12 DVR C&C Co., Ltd. Method and system for tracking with extraction object using coarse to fine techniques

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8599255B2 (en) * 2010-12-07 2013-12-03 Qnap Systems, Inc. Video surveillance system based on Gaussian mixture modeling with two-type learning rate control scheme
US10269119B2 (en) * 2015-12-07 2019-04-23 Avigilon Analytics Corporation System and method for background and foreground segmentation


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Statistical modeling of complex backgrounds for foreground object detection; Li Liyuan et al.; IEEE Transactions on Image Processing; full text *
A feedback-information-based background modeling method for visual images; Sun Zhiwei et al.; Journal of Shandong University of Technology (Natural Science Edition) (02); full text *
Passage blockage detection based on locally adaptive background updating; Zheng Bei et al.; Electronic Measurement Technology (17); full text *

Also Published As

Publication number Publication date
CN109949335A (en) 2019-06-28

Similar Documents

Publication Publication Date Title
CN111027493B (en) Pedestrian detection method based on deep learning multi-network soft fusion
CN107944450B (en) License plate recognition method and device
JP6897335B2 (en) Learning program, learning method and object detector
CN111402264A (en) Image region segmentation method and device, model training method thereof and computer equipment
CN111639653B (en) False detection image determining method, device, equipment and medium
Wang et al. Detection and classification of moving vehicle from video using multiple spatio-temporal features
Bedruz et al. Real-time vehicle detection and tracking using a mean-shift based blob analysis and tracking approach
CN111027535A (en) License plate recognition method and related equipment
CN112926531A (en) Feature information extraction method, model training method and device and electronic equipment
CN110866428B (en) Target tracking method, device, electronic equipment and storage medium
CN110599516A (en) Moving target detection method and device, storage medium and terminal equipment
CN112036462A (en) Method and device for model training and target detection
Meus et al. Embedded vision system for pedestrian detection based on HOG+ SVM and use of motion information implemented in Zynq heterogeneous device
CN112784750A (en) Fast video object segmentation method and device based on pixel and region feature matching
CN112699711A (en) Lane line detection method, lane line detection device, storage medium, and electronic apparatus
CN109949335B (en) Image processing method and device
CN113297939A (en) Obstacle detection method, system, terminal device and storage medium
CN110991421B (en) Bayonet snap image vehicle detection method, computer storage medium and electronic equipment
CN110728229B (en) Image processing method, device, equipment and storage medium
CN112365513A (en) Model training method and device
CN114898306B (en) Method and device for detecting target orientation and electronic equipment
CN116977979A (en) Traffic sign recognition method, system, equipment and storage medium
Bumanis et al. Multi-object Tracking for Urban and Multilane Traffic: Building Blocks for Real-World Application.
CN113903014B (en) Lane line prediction method and device, electronic device and computer-readable storage medium
Tourani et al. Challenges of video-based vehicle detection and tracking in intelligent transportation systems

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant