Disclosure of Invention
To improve the accuracy of target pedestrian image segmentation, the invention provides a cascaded superpixel-based target pedestrian segmentation method, which provides more accurate preprocessing information for the subsequent work of a computer vision system by establishing a target pedestrian segmentation model.
The invention achieves the above aim by the following technical scheme. The cascaded superpixel-based target pedestrian segmentation method comprises the following steps:
step 1: send the source image to the instance segmentation channel, output the instance-segmented image, then split the instance-segmented image and extract each single-target region and segmentation result:
(M,R,S0)=MASKRCNN(I0) (1)
where MASKRCNN is the instance segmentation function, I0 is the input source image, R is a single-target region split and extracted from the instance-segmented image, S0 is a segmentation result split and extracted from the instance-segmented image, and M is the instance-segmented image;
wherein:
the instance-segmented image M is the unprocessed image obtained after instance segmentation of the source image;
the single-target region R is a single-target region image split and extracted from the instance-segmented image M, and its extent is larger than that of the target detection box; formula (2) gives the number of single-target regions R:
B = A ± X (A, B, X ∈ ℕ⁺) (2)
where the source image contains A target objects, B is the number of single-target regions R after instance segmentation, and X is the number of falsely detected targets of instance segmentation;
the segmentation result S0 is a contour image split and extracted from the instance-segmented image M; formula (3) gives the number of segmentation results S0:
N = J ± X (N, J, X ∈ ℕ⁺) (3)
where the instance-segmented image M contains J target objects, N is the number of segmentation results S0 after splitting and extraction, and X is the number of falsely detected targets of instance segmentation;
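As an illustration of step 1, the following minimal sketch uses torchvision's pre-trained Mask R-CNN as a stand-in for the MASKRCNN function of formula (1); the function name extract_instances, the score and mask thresholds, and the fixed padding that makes R larger than the detection box are assumptions of the sketch, not details fixed by the invention.

import torch
import torchvision
from torchvision.transforms.functional import to_tensor

# Pre-trained Mask R-CNN as a stand-in for the instance segmentation channel.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def extract_instances(image, score_thresh=0.5, person_label=1, pad=16):
    """Sketch of formula (1): split the raw instance segmentation M into
    single-target regions R and per-instance segmentation results S0."""
    with torch.no_grad():
        M = model([to_tensor(image)])[0]          # raw instance segmentation M
    w, h = image.size                             # PIL image: (width, height)
    regions, results = [], []
    for box, label, score, mask in zip(M["boxes"], M["labels"],
                                       M["scores"], M["masks"]):
        if label.item() != person_label or score.item() < score_thresh:
            continue                              # keep pedestrian detections only
        x1, y1, x2, y2 = (int(v) for v in box)
        # R is taken larger than the detection box; a fixed pad is an assumption.
        regions.append((max(0, x1 - pad), max(0, y1 - pad),
                        min(w, x2 + pad), min(h, y2 + pad)))
        results.append((mask[0] > 0.5).cpu().numpy())   # segmentation result S0
    return M, regions, results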
step 2: send the single-target region R to the superpixel segmentation channel and output a labelled superpixel-segmented image:
QGK=SLIC(R) (4)
where SLIC is the superpixel segmentation function, R is a single-target region extracted from the instance-segmented image, and QGK is the superpixel-segmented image containing K labelled superpixels;
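A corresponding sketch of step 2 (formula (4)), assuming scikit-image's SLIC as the superpixel segmentation function; the default parameter values merely echo the ranges used in the embodiments below (150 to 225 segmentation blocks).

import numpy as np
from skimage.segmentation import slic

def superpixel_channel(region_image: np.ndarray,
                       n_segments: int = 225,
                       compactness: float = 10.0) -> np.ndarray:
    """Sketch of formula (4): return the labelled superpixel image Q_GK of a
    single-target region R as an integer label map with about K = n_segments
    superpixels (labels start at 1)."""
    return slic(region_image, n_segments=n_segments,
                compactness=compactness, start_label=1)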
step 3: merge superpixel blocks with similar features among adjacent superpixel blocks in the superpixel-segmented image QGK, so that the K superpixel blocks in the superpixel-segmented image are replaced by N superpixel coloring information blocks, and finally reconstruct a more accurate target object contour:
PN=Cslic(QGK) (5)
where Cslic is the SLIC merging function, QGK is the superpixel-segmented image with K labelled superpixels, and PN is the reconstructed target object contour;
step 4: fuse the segmentation result S0 and the target object contour PN, and reconstruct the cascaded segmentation fused image Ei:
Ei=NSST(PN,S0) (6)
where NSST is the non-subsampled shearlet transform multi-scale analysis function, PN is the target object contour, S0 is the segmentation result split and extracted from the instance-segmented image, and Ei is the reconstructed final fused image;
The image fusion uses an energy-filtering high/low-frequency fusion rule: the registered segmentation result S0 and target object contour PN are pre-fused with the energy-filtering high/low-frequency fusion rule. In the low-frequency information fusion, the low-frequency coefficients are fused with a fusion rule based on an image guided filter, giving the low-frequency fusion coefficients. In the high-frequency information fusion, guided by the superpixel labels of QGK, coefficients sharing the same label are gathered into a super-coefficient block, and the spatial frequency of each super-coefficient block is computed to obtain the high-frequency fusion coefficients. Finally, the inverse NSST is applied to the high- and low-frequency fusion coefficients to reconstruct the final fused image Ei.
Further, the superpixel block feature merging steps are as follows:
1) set and order the superpixel blocks, and compute the feature difference in color and spatial distance between adjacent superpixel blocks in the image by the following formulas (a code sketch follows this list):
DLAB(Ri) = √((li − lj)² + (ai − aj)² + (bi − bj)²)
DXY(Ri) = √((xi − xj)² + (yi − yj)²)
D(Ri) = δ·DLAB(Ri) + (1 − δ)·DXY(Ri) (9)
In formula (9), the LAB vectors adopt the CIELAB color space model; DLAB(Ri) is the color-space distance between superpixel blocks; R' denotes the non-target region; li and lj are the pixel lightness components; ai, aj, bi, bj are the color components; DXY(Ri) is the position-space distance; xi, xj, yi, yj are the spatial coordinate values of the pixels; D(Ri) is the superpixel distance; δ is the distance weight coefficient, δ ∈ (0, 1);
2) compare the computed result with a preset threshold: if the feature difference is smaller than the threshold, merge the target superpixel block with its adjacent superpixel block; if it is larger than the threshold, skip the target superpixel block and continue the feature check on the next superpixel block;
the superpixel region correlation is determined from the superpixel distance by the following formula:
C(Ri)=1-exp(-D(Ri)) (10)
In formula (10), C(Ri) represents the superpixel region correlation and D(Ri) is the superpixel distance; the superpixel distance is inversely related to the region correlation. Whether superpixel blocks carry the feature information of the same target is determined from this correlation;
after the region correlation of all superpixels is computed, a region correlation threshold is obtained with the maximum between-class variance (Otsu) method, and all superpixel blocks meeting the correlation threshold are extracted as target superpixels; the formula is as follows:
R* = { Ri | C(Ri) ≥ ε·θ } (11)
In formula (11), R* represents the finally acquired set of target superpixels, Ri is the target superpixel at index i, C(Ri) represents the superpixel region correlation, θ is the region correlation threshold obtained by the maximum between-class variance method, and ε is the correlation threshold coefficient, with ε = 0.5. When ε = 0.5 the feature information is well divided into distinct pixel sets: each resulting subset forms a region corresponding to the real scene, the interior of each region has consistent attributes, and adjacent regions do not share those attributes;
3) iterate the above steps until all superpixel blocks in the image have completed one round of feature comparison, at which point a first merging result image is generated;
4) before the second merging pass, refresh the superpixel blocks' feature information and re-order them, then take the first merging result as the object of the merging operation, until the superpixel blocks in the first merging result have completed feature comparison and a second merging result image is generated.
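The merging and target extraction procedure above can be sketched as follows under the reconstructed formulas (9) to (11). Mean CIELAB color and centroid as the per-block features, distances measured against a single reference block rather than over the full adjacency graph, and scikit-image's rgb2lab and threshold_otsu are all assumptions of this sketch.

import numpy as np
from skimage.color import rgb2lab
from skimage.filters import threshold_otsu

def superpixel_features(region_image, labels):
    """Mean CIELAB vector and centroid for every labelled superpixel block."""
    lab = rgb2lab(region_image)
    feats = {}
    for k in np.unique(labels):
        ys, xs = np.nonzero(labels == k)
        feats[k] = (lab[ys, xs].mean(axis=0),             # (l, a, b)
                    np.array([xs.mean(), ys.mean()]))     # (x, y)
    return feats

def superpixel_distance(fi, fj, delta=0.5):
    """Formula (9): weighted color-space plus position-space distance."""
    d_lab = np.linalg.norm(fi[0] - fj[0])
    d_xy = np.linalg.norm(fi[1] - fj[1])
    return delta * d_lab + (1.0 - delta) * d_xy

def target_superpixels(region_image, labels, delta=0.5, eps=0.5):
    """Formulas (10)-(11): C(Ri) = 1 - exp(-D(Ri)), then a threshold from the
    maximum between-class variance (Otsu) method scaled by eps selects R*."""
    feats = superpixel_features(region_image, labels)
    keys = sorted(feats)
    ref = feats[keys[0]]                      # reference block: an assumption
    corr = np.array([1.0 - np.exp(-superpixel_distance(feats[k], ref, delta))
                     for k in keys])
    theta = threshold_otsu(corr)              # region correlation threshold
    return [k for k, c in zip(keys, corr) if c >= eps * theta]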
Through the above technical scheme, the invention has the following beneficial effects: existing image segmentation methods essentially segment the source image directly, so the extracted target feature results are not accurate enough; in particular, the edge contour of the segmented target is unsatisfactory. The method performs cascaded segmentation of the source image with a cascaded superpixel segmentation scheme and finally uses the energy-filtering high/low-frequency fusion rule to achieve a sparse representation of the image in every direction and at every scale, overcoming the pseudo-Gibbs effect, ultimately improving the segmentation precision of image preprocessing and providing a useful segmentation basis for subsequent recognition and tracking.
Detailed Description
The invention is further described below with reference to the accompanying drawings and specific embodiments:
A logic block diagram of the cascaded superpixel-based target pedestrian segmentation method is shown in FIG. 1; the specific implementation steps are as follows:
Step 1: send the source image to the instance segmentation channel, output the instance-segmented image, and split and extract the single-target region R and the segmentation result S0 on the basis of the instance-segmented image;
Step 2: send the single-target region R to the superpixel segmentation channel and output the labelled superpixel-segmented image QGK;
Step 3: merge superpixel blocks with similar features among adjacent superpixel blocks in the superpixel-segmented image QGK, and merge and reconstruct them into a more accurate target object contour PN;
Step 4: fuse the segmentation result S0 and the target object contour PN, and reconstruct the final cascaded segmentation fused image Ei.
The specific scheme is as follows:
Unlike existing target pedestrian segmentation algorithms, the invention provides a cascaded superpixel-based target pedestrian segmentation method, which provides more accurate preprocessing information for the subsequent work of a computer vision system by establishing a target pedestrian segmentation model.
The invention realizes the above aim by the following technical scheme:
Step 1: send the source image to the instance segmentation channel, output the instance-segmented image, and split and extract each single-target region and segmentation result on the basis of the instance-segmented image.
(M,R,S0)=MASKRCNN(I0) (1)
where MASKRCNN is the instance segmentation function, I0 is the input source image (whose length and width are multiples of 2^6), R is a single-target region extracted from the instance-segmented image, S0 is the segmentation result split and extracted in instance segmentation, and M is the instance-segmented image.
Definition 1: the instance-segmented image M is the unprocessed image obtained after instance segmentation of the source image.
Definition 2: the single-target region R is an image split and extracted from the instance-segmented image M, and its extent is necessarily larger than that of the target detection box. Formula (2) gives the number of single-target regions R:
B = A ± X (A, B, X ∈ ℕ⁺) (2)
where the source image contains A target objects, B is the number of single-target regions R after instance segmentation, and X represents the number of falsely detected targets of instance segmentation.
Definition 3: the segmentation result S0 is a contour image split and extracted from the instance-segmented image M; formula (3) gives the number of segmentation results S0:
N = J ± X (N, J, X ∈ ℕ⁺) (3)
Here the instance-segmented image M contains J target objects; after splitting and extraction there are N segmentation results S0, where X represents the number of falsely detected targets of instance segmentation.
Step 2: send the single-target region R to the superpixel segmentation channel and output a labelled superpixel-segmented image.
QGK=SLIC(R) (4)
where SLIC is the superpixel segmentation function, R is a single-target region extracted from the instance-segmented image, and QGK is the superpixel-segmented image containing K labelled superpixels.
Step 3: merge superpixel blocks with similar features among adjacent superpixel blocks in the superpixel-segmented image QGK, so that the K superpixel blocks in the superpixel-segmented image are replaced by N superpixel coloring information blocks, and finally reconstruct a more accurate target object contour.
PN=Cslic(QGK) (5)
where Cslic is the SLIC merging function, QGK is the superpixel-segmented image with K labelled superpixels, and PN is the reconstructed target object contour.
Step 4: fuse the segmentation result S0 and the target object contour PN, and reconstruct the final cascaded segmentation fused image Ei.
Ei=NSST(PN,S0) (6)
where NSST is the non-subsampled shearlet transform multi-scale analysis function, PN is the target object contour, S0 is the segmentation result split and extracted in instance segmentation, and Ei is the reconstructed final fused image.
Energy-filtering high/low-frequency fusion rule: the registered segmentation result S0 and target object contour PN are pre-fused with the energy-filtering high/low-frequency fusion rule. In the low-frequency information fusion, the low-frequency coefficients are fused with a fusion rule based on an image guided filter, giving the low-frequency fusion coefficients. In the high-frequency information fusion, guided by the superpixel labels of QGK, coefficients sharing the same label are gathered into a super-coefficient block, and the spatial frequency of each super-coefficient block is computed to obtain the high-frequency fusion coefficients. Finally, the inverse NSST is applied to the high- and low-frequency fusion coefficients to reconstruct the final fused image Ei.
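The fusion rule can be sketched as follows. No standard NSST implementation is assumed here: PyWavelets' single-level Haar DWT stands in for the NSST decomposition, and box-filtered energy weights stand in for the image-guided-filter low-frequency rule, so this is only a structural illustration of the label-wise super-coefficient blocks and the spatial-frequency selection.

import numpy as np
import pywt
from scipy.ndimage import uniform_filter

def spatial_frequency(coeffs: np.ndarray) -> float:
    """Simplified spatial frequency of a super-coefficient block, computed
    over the flattened coefficients (an assumption of this sketch)."""
    c = coeffs.ravel()
    if c.size < 2:
        return 0.0
    return float(np.sqrt(np.mean(np.diff(c) ** 2)))

def fuse(P_N: np.ndarray, S0: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Sketch of formula (6): fuse the contour P_N and segmentation result S0,
    with a Haar DWT standing in for the NSST (inputs assumed even-sized,
    consistent with the 2^6-multiple source dimensions above)."""
    lA, (lH, lV, lD) = pywt.dwt2(P_N.astype(float), "haar")
    rA, (rH, rV, rD) = pywt.dwt2(S0.astype(float), "haar")

    # Low-frequency rule (guided-filter stand-in): keep the coefficient with
    # the larger locally filtered energy.
    fA = np.where(uniform_filter(lA ** 2, 5) >= uniform_filter(rA ** 2, 5),
                  lA, rA)

    # High-frequency rule: gather coefficients sharing a superpixel label into
    # a super-coefficient block and keep the block with the larger spatial
    # frequency. Haar sub-bands are half-size, so the label map is downsampled.
    lab = labels[::2, ::2]
    fused_high = []
    for a, b in ((lH, rH), (lV, rV), (lD, rD)):
        out = np.empty_like(a)
        for k in np.unique(lab):
            m = lab == k
            out[m] = a[m] if spatial_frequency(a[m]) >= spatial_frequency(b[m]) else b[m]
        fused_high.append(out)

    # Inverse transform reconstructs the fused image Ei.
    return pywt.idwt2((fA, tuple(fused_high)), "haar")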
The superpixel block feature merging steps are as follows:
1) set and order the superpixel blocks, and compute the feature difference in color and spatial distance between adjacent superpixel blocks in the image by the following formulas:
DLAB(Ri) = √((li − lj)² + (ai − aj)² + (bi − bj)²)
DXY(Ri) = √((xi − xj)² + (yi − yj)²)
D(Ri) = δ·DLAB(Ri) + (1 − δ)·DXY(Ri) (9)
In formula (9), the LAB vectors adopt the CIELAB color space model; DLAB(Ri) is the color-space distance between superpixel blocks; R' denotes the non-target region; li and lj are the pixel lightness components; ai, aj, bi, bj are the color components; DXY(Ri) is the position-space distance; xi, xj, yi, yj are the spatial coordinate values of the pixels; D(Ri) is the superpixel distance; δ is the distance weight coefficient, δ ∈ (0, 1);
2) compare the computed result with a preset threshold: if the feature difference is smaller than the threshold, merge the target superpixel block with its adjacent superpixel block; if it is larger than the threshold, skip the target superpixel block and continue the feature check on the next superpixel block;
the superpixel region correlation is determined from the superpixel distance by the following formula:
C(Ri)=1-exp(-D(Ri)) (10)
In formula (10), C(Ri) represents the superpixel region correlation and D(Ri) is the superpixel distance; the superpixel distance is inversely related to the region correlation. Whether superpixel blocks carry the feature information of the same target is determined from this correlation;
after the region correlation of all superpixels is computed, a region correlation threshold is obtained with the maximum between-class variance (Otsu) method, and all superpixel blocks meeting the correlation threshold are extracted as target superpixels; the formula is as follows:
R* = { Ri | C(Ri) ≥ ε·θ } (11)
In formula (11), R* represents the finally acquired set of target superpixels, Ri is the target superpixel at index i, C(Ri) represents the superpixel region correlation, θ is the region correlation threshold obtained by the maximum between-class variance method, and ε is the correlation threshold coefficient, with ε = 0.5. When ε = 0.5 the feature information is well divided into distinct pixel sets: each resulting subset forms a region corresponding to the real scene, the interior of each region has consistent attributes, and adjacent regions do not share those attributes;
3) iterate the above steps until all superpixel blocks in the image have completed one round of feature comparison, at which point a first merging result image is generated;
4) before the second merging pass, refresh the superpixel blocks' feature information and re-order them, then take the first merging result as the object of the merging operation, until the superpixel blocks in the first merging result have completed feature comparison and a second merging result image is generated.
Existing image segmentation methods essentially segment the source image directly, so the extracted target feature results are not accurate enough; in particular, the edge contour of the segmented target is unsatisfactory. The method performs cascaded segmentation of the source image with a cascaded superpixel segmentation scheme and finally uses the energy-filtering high/low-frequency fusion rule to achieve a sparse representation of the image in every direction and at every scale, overcoming the pseudo-Gibbs effect, ultimately improving the segmentation precision of image preprocessing and providing a useful segmentation basis for subsequent recognition and tracking. That is to say, using Mask-RCNN alone suffers from false detection of small targets, missed detections, and inaccurate segmentation of overlapping parts, so the method instead segments the image by establishing a cascaded superpixel segmentation system. First, instance segmentation is performed with Mask-RCNN, and the single-target region R and segmentation result S0 are split and extracted after segmentation; then the single-target region R is superpixel-segmented to obtain the superpixel-segmented image QGK; finally, the corresponding fusion rules are formulated to fuse the superpixel segmentation result with the segmentation result S0, reconstructing the final fused image Ei. By performing superpixel single-target segmentation on the result of Mask-RCNN instance segmentation, the method improves segmentation precision and provides more accurate preprocessing information for the subsequent work of a computer vision system.
Example 1:
Single-target pedestrian segmentation from a vehicle-mounted viewing angle
A vehicle-mounted camera model is established from the geometric relations: the height of the target in the image plane is h, the height of the target in the real world is 168 cm, the focal length of the camera is 12.25 cm, and the actual distance between the target and the camera is 145 cm. The pedestrian target moves in the video at about 1.5 m/s and keeps moving in a straight line without changing speed. Through instance segmentation it can be observed that the segmentation of the target person's contour in FIG. 4 is not accurate enough. To raise the accuracy on this basis, the result is sent to the superpixel input channel with the spatial distance weight set to 65, the number of segmentation blocks set to 225 and the initial step size set to 5, and superpixel segmentation is performed. After segmentation, the registered single-target region R and segmentation result S0 are fused under the energy-filtering high/low-frequency fusion rule to reconstruct the final fused image Ei; the contour accuracy of the cascaded segmentation and fusion is markedly improved.
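A hypothetical end-to-end run of this embodiment with the sketches above; the file name and the mapping of the stated parameters onto skimage arguments (spatial distance weight 65 as compactness, 225 segmentation blocks; skimage derives the initial step size internally) are assumptions.

import numpy as np
from PIL import Image

I0 = Image.open("onboard_frame.png").convert("RGB")   # hypothetical video frame
M, regions, results = extract_instances(I0)

for (x1, y1, x2, y2), S0 in zip(regions, results):
    R = np.asarray(I0)[y1:y2, x1:x2]                  # single-target region
    Q_GK = superpixel_channel(R, n_segments=225, compactness=65.0)
    P_N = np.isin(Q_GK, target_superpixels(R, Q_GK)).astype(float)
    Ei = fuse(P_N, S0[y1:y2, x1:x2].astype(float), Q_GK)  # fused image Ei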
Example 2:
Dual-target pedestrian segmentation from a vehicle-mounted viewing angle
A vehicle-mounted camera model is established from the geometric relations: the height of the target in the image plane is h, the real-world height of target A is 168 cm and of target B is 165 cm, the focal length of the camera is 11.25 cm, the actual distance between target A and the camera is 120 cm, and between target B and the camera 195 cm. The video contains the two pedestrian targets moving towards each other at about 1.4 m/s, both keeping straight-line motion without changing speed. Through instance segmentation it can be observed that the segmentation of the target persons' contours in FIG. 8 is not accurate enough. To raise the accuracy on this basis, the result is sent to the superpixel input channel with the spatial distance weight set to 75, the number of segmentation blocks set to 150 and the initial step size set to 5, and superpixel segmentation is performed. After segmentation, the registered single-target region R and segmentation result S0 are fused under the energy-filtering high/low-frequency fusion rule to reconstruct the final fused image Ei; the contour accuracy of the cascaded segmentation and fusion is markedly improved.
Example 3:
Dual-target pedestrian segmentation in a dim environment
A camera model is established from the geometric relations: the height of the target in the image plane is h, the real-world height of target A is 175 cm and of target B is 165 cm, the focal length of the camera is 12.45 cm, the actual distance between target A and the camera is 115 cm, and between target B and the camera 105 cm. The two pedestrian targets A and B move towards each other at about 0.5 m/s, both keeping straight-line motion without changing speed. Through instance segmentation it can be observed that the segmentation of the target persons' contours in FIG. 11 is not accurate enough. To raise the accuracy on this basis, the result is sent to the superpixel input channel with the spatial distance weight set to 80, the number of segmentation blocks set to 200 and the initial step size set to 6, and superpixel segmentation is performed. After segmentation, the registered single-target region R and segmentation result S0 are fused under the energy-filtering high/low-frequency fusion rule to reconstruct the final fused image Ei; the contour accuracy of the cascaded segmentation and fusion is markedly improved.