CN110472653B - Semantic segmentation method based on maximized region mutual information - Google Patents


Info

Publication number
CN110472653B
CN110472653B (application CN201910585061.1A)
Authority
CN
China
Prior art keywords
picture
mutual information
dimensional
segmentation
lower bound
Prior art date
Legal status
Active
Application number
CN201910585061.1A
Other languages
Chinese (zh)
Other versions
CN110472653A (en)
Inventor
赵帅
蔡登
王阳
Current Assignee
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority claimed from CN201910585061.1A
Publication of CN110472653A
Application granted
Publication of CN110472653B

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 — Pattern recognition
    • G06F18/20 — Analysing
    • G06F18/21 — Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 — Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/22 — Matching criteria, e.g. proximity measures

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a semantic segmentation method based on maximized region mutual information, which comprises the following steps: (1) inputting a real scene picture into a segmentation model to obtain a predicted segmentation picture; (2) constructing high-dimensional distributions of the predicted picture and the label picture; (3) calculating an approximation of the posterior variance of the high-dimensional distribution of the label picture, given the high-dimensional distribution of the predicted picture; (4) calculating a lower bound of the mutual information between the two high-dimensional distributions; (5) updating the weight parameters of the segmentation model according to the obtained lower bound, maximizing the mutual information of the two distributions and thereby the similarity between the predicted picture and the label picture; (6) repeating steps (1) to (5), finishing training after a preset number of training iterations, and applying the trained model to semantic segmentation. By maximizing the region mutual information between the model's segmentation output and the label picture, the invention enhances the segmentation quality of the segmentation model.

Description

Semantic segmentation method based on maximized region mutual information
Technical Field
The invention belongs to the field of image semantic segmentation in computer vision, and particularly relates to a semantic segmentation method based on maximized region mutual information.
Background
Image semantic segmentation is a basic problem in the field of computer vision, and aims to assign corresponding semantic labels to each pixel in an image, wherein the semantic labels represent object categories to which pixel points belong, such as sky, people, vehicles, buildings and the like. Semantic segmentation has wide application scenes in the fields of automatic driving, medical image analysis, robot vision and the like; the semantic segmentation research also has great heuristic significance to other computer vision problems. In practice, image semantic segmentation is usually treated as a pixel-by-pixel multi-classification problem. In recent years, the problem of semantic segmentation has gained dramatic progress with the development of convolutional neural networks and the introduction of various deep learning models that are well-designed for the task of semantic segmentation. In general, training and optimization of the segmentation model are accomplished by minimizing the average classification loss of pixels. Among these, the most commonly used semantic segmentation loss function is the softmax cross-entropy loss function:
$$\mathcal{L}_{\mathrm{CE}}(y,p)=-\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} y_{i,c}\,\log p_{i,c}$$
where p ∈ [0, 1] is the classification probability predicted by the model, y ∈ {0, 1} is the real object class label, N is the number of pixels in the picture, and C is the number of object classes. Minimizing the cross-entropy loss between y and p is equivalent to minimizing the relative entropy, i.e. the Kullback-Leibler (KL) divergence, between them.
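As an illustration, the pixel-wise loss above can be sketched in a few lines of NumPy. This is only a minimal sketch for clarity, not the patented implementation; the helper name `pixelwise_cross_entropy` is invented for this example.

```python
import numpy as np

def pixelwise_cross_entropy(p, y, eps=1e-12):
    """Mean cross-entropy over N pixels: -(1/N) * sum_i sum_c y_ic * log(p_ic)."""
    return float(-np.mean(np.sum(y * np.log(p + eps), axis=1)))

# Two pixels, three classes: the loss depends only on the probability
# assigned to the true class of each pixel.
p = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1]])
y = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
loss = pixelwise_cross_entropy(p, y)   # -(log 0.7 + log 0.8) / 2
```

Note that each pixel contributes independently, which is exactly the property the invention's region loss is designed to complement.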
As the above formula shows, the cross-entropy loss is computed pixel by pixel and therefore ignores the relationships between pixels. However, strong dependencies exist between the pixels of a picture, and the structural information of objects is implied in these dependencies. Because a pixel-wise loss function (such as the cross-entropy loss above) ignores these relationships, a semantic segmentation model supervised and trained with it cannot reliably identify the pixels of objects with weak foreground cues or small spatial structures, and the segmentation quality of the model suffers.
Some previous methods also exploit the correlations between pixels in the picture to enhance segmentation. The article "Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials" in the proceedings of the Conference on Neural Information Processing Systems, and the article "DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs" in IEEE Transactions on Pattern Analysis and Machine Intelligence, volume 40, pages 834-848, 2018, use a Conditional Random Field (CRF) to fit the relationships between the pixels in the picture. However, the CRF typically requires a time-consuming iterative inference process and is sensitive to the appearance of objects in the image. The article "Learning Affinity via Spatial Propagation Networks" at the 31st Conference on Neural Information Processing Systems in 2017 and the article "Adaptive Affinity Fields for Semantic Segmentation" at the European Conference on Computer Vision in 2018 enhance segmentation using the similarity relationships between points, but obtaining these pixel similarities usually requires additional network branches, and the resulting similarity matrix requires additional memory to store.
Disclosure of Invention
The invention provides a semantic segmentation method based on maximized region mutual information, which is used for enhancing the segmentation effect of a segmentation model by maximizing the region mutual information between a model segmentation picture and a label picture.
A semantic segmentation method based on maximized region mutual information comprises the following steps:
(1) inputting a real scene picture into a segmentation model to obtain a prediction segmentation picture;
(2) constructing high-dimensional distribution of a prediction picture and a label picture;
(3) calculating an approximate value of the posterior variance of the high-dimensional distribution of the label pictures under the condition of the high-dimensional distribution of the given prediction pictures;
(4) calculating the lower bound of mutual information of two high-dimensional distributions of the prediction picture and the label picture according to the obtained approximate value of the posterior variance;
(5) updating the weight parameters of the segmentation model according to the lower bound of the obtained mutual information, and maximizing the mutual information of two high-dimensional distributions so as to maximize the similarity of the predicted picture and the label picture;
(6) repeating steps (1) to (5), finishing training after the preset number of training iterations is reached, and applying the trained model to semantic segmentation.
In step (2), the random variables corresponding to the high-dimensional distributions of the predicted picture and the label picture are:

$$P=[p_1,p_2,\dots,p_d]^T,\qquad Y=[y_1,y_2,\dots,y_d]^T$$

where P ∈ R^d is the high-dimensional variable of the predicted picture and Y ∈ R^d that of the label picture; each p_i lies in the interval [0, 1], each y_i is 0 or 1, and d is the dimension of these vectors, which is also the area of the square region used to construct them.
The probability density functions (PDFs) of the random variables P and Y are f(p) and f(y), respectively, and the PDF of their joint distribution is f(y, p). The distribution of P can also be regarded as the joint distribution of the variables p_1, p_2, ..., p_d, i.e. f(p) = f(p_1, p_2, ..., p_d). The invention maximizes the similarity between P and Y by maximizing their mutual information, so that the predicted picture and the real label picture achieve a higher-level consistency than with a pixel-wise loss alone. The mutual information I(Y; P) of P and Y is defined as follows:
$$I(Y;P)=\int_{\mathcal{Y}}\int_{\mathcal{P}} f(y,p)\,\log\frac{f(y,p)}{f(y)\,f(p)}\;\mathrm{d}p\,\mathrm{d}y$$

where $\mathcal{Y}$ and $\mathcal{P}$ are the support sets of Y and P, respectively. The goal is now to maximize the mutual information I(Y; P) between Y and P so as to maximize the similarity between them, and thus achieve high consistency between the predicted pictures and the real labels.
To calculate I(Y; P), one straightforward way is to estimate the PDFs mentioned above. However, the random variables p_1, p_2, ..., p_d are correlated, which makes their joint probability density f(p) difficult to analyze. The article "Image Similarity Using Mutual Information of Regions" at the 2004 European Conference on Computer Vision demonstrated that, for grayscale images, the distributions of Y and P tend to be Gaussian when d is large enough. In the segmentation setting, however, Y and P only tend toward a Gaussian distribution once the dimension d is 900 or more. This implies a significant consumption of computing resources, which is unacceptable in most cases. This method of constructing a Gaussian distribution is therefore theoretically feasible but impractical.
Since the high-dimensional distributions of Y and P are unknown and directly computing the exact value of their mutual information is infeasible, the invention derives a lower bound of the mutual information and maximizes the true mutual information I(Y; P) by maximizing this lower bound.
By the properties of mutual information, I(Y; P) = H(Y) − H(Y|P), where H(Y) is the entropy of the random variable Y and H(Y|P) is the conditional entropy of the random variable pair (Y, P). Meanwhile, among all distributions with a given covariance, the Gaussian distribution has the largest entropy, and a Gaussian with covariance matrix Σ ∈ R^{d×d} has entropy

$$H = \frac{1}{2}\log\big((2\pi e)^d \det(\Sigma)\big)$$

where e is Euler's number and det(·) denotes the determinant of a matrix. Thus a lower bound of the mutual information I(Y; P) is obtained:

$$I(Y;P) \ge H(Y) - \frac{1}{2}\log\big((2\pi e)^d \det(\Sigma_{Y|P})\big)$$

Here Σ_{Y|P} is the posterior covariance matrix of Y given P, a symmetric positive semi-definite matrix, and d is the dimension of the random vectors Y and P. Imitating the common cross-entropy loss, the constant part of the objective that does not depend on the model parameters is omitted, yielding a simplified mutual-information lower bound I_l(Y; P) for optimization:

$$I_l(Y;P) = -\frac{1}{2}\log\det(\Sigma_{Y|P})$$
In step (3), since the specific distributions of Y and P are unknown, the posterior covariance matrix cannot be computed directly. The invention therefore computes an approximation of the posterior variance to obtain an approximation of the mutual-information lower bound.
For random variables Y and P, E(Y) is the mean of Y (also denoted μ_Y), Var(Y) is the variance of Y (also denoted Σ_Y), and Cov(Y, P) is the covariance of Y and P. The symbol Y ⊥₂ P denotes that Y and P are second-order independent, meaning that for any value p of P, E(Y|P = p) = E(Y) and Var(Y|P = p) = Var(Y). Second-order independence is a weaker constraint than strict mutual independence. Furthermore, the regression matrix of Y on P is

$$A_{yp} = \mathrm{Cov}(Y,P)\,\Sigma_P^{-1}$$

By computing the correlation coefficient between Y − A_{yp}P and P, it is easily seen that the two are uncorrelated. To obtain an approximation of the posterior variance, assume that

$$(Y - A_{yp}P)\ \perp_2\ P$$

This assumption implies that the posterior variance Var(Y − A_{yp}P | P = p) is independent of the value p of P. By the properties of the covariance matrix and the definition of second-order independence, the approximate formula for the posterior variance is:

$$\mathrm{Var}(Y\,|\,P=p) \approx \Sigma_Y - \mathrm{Cov}(Y,P)\,\Sigma_P^{-1}\,\mathrm{Cov}(Y,P)^T$$
where P ∈ R^d and Y ∈ R^d are the constructed high-dimensional variables of the predicted picture and the label picture, respectively; Σ_Y is the variance of Y and Σ_P^{-1} is the inverse of the variance of P; Cov(Y, P) is the covariance matrix between Y and P and Cov(Y, P)^T its transpose; Var(Y|P = p) is the posterior variance of Y given P; and A_{yp} = Cov(Y, P) Σ_P^{-1} is the regression matrix of Y on P.
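The approximation above can be sketched directly from sample matrices with NumPy. This is a minimal illustration under the stated formula, not the patented implementation; the helper name `posterior_var_approx` is invented here.

```python
import numpy as np

def posterior_var_approx(Y, P):
    """Sigma_Y - Cov(Y,P) Sigma_P^{-1} Cov(Y,P)^T from sample matrices.

    Y, P : (d, n) arrays holding n sampled d-dimensional points each.
    """
    d = Y.shape[0]
    joint = np.cov(np.vstack([Y, P]))          # (2d, 2d) joint covariance
    sigma_y = joint[:d, :d]
    sigma_p = joint[d:, d:]
    cov_yp = joint[:d, d:]
    return sigma_y - cov_yp @ np.linalg.inv(sigma_p) @ cov_yp.T

rng = np.random.default_rng(0)
P = rng.standard_normal((3, 50))
Y = 2.0 * P + 1.0                  # Y fully determined by P ...
V = posterior_var_approx(Y, P)     # ... so the posterior variance vanishes
```

The sanity check reflects the intuition: when Y is a linear function of P, knowing P leaves no residual variance in Y.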
In step (4), the lower bound of the mutual information of the two high-dimensional distributions is calculated as:

$$I_l(Y;P) = -\frac{1}{2}\log\det\!\big(\Sigma_Y - \mathrm{Cov}(Y,P)\,\Sigma_P^{-1}\,\mathrm{Cov}(Y,P)^T\big)$$

where I_l(Y; P) denotes the lower bound of the mutual information between the random variables Y and P; Σ_Y is the variance of Y and Σ_P^{-1} the inverse of the variance of P; Cov(Y, P) is the covariance matrix between Y and P and Cov(Y, P)^T its transpose; det(·) denotes the determinant of the matrix in brackets; and log(·) denotes the logarithm with base the natural number e.
For brevity, define

$$\mathcal{M} = \Sigma_Y - \mathrm{Cov}(Y,P)\,\Sigma_P^{-1}\,\mathrm{Cov}(Y,P)^T$$

i.e. the approximation of the posterior variance from step (3). When the lower bound is computed by the formula above, one finds

$$I_l(Y;P) = -\frac{1}{2}\log\det(\mathcal{M}) = -\frac{1}{2}\sum_{i=1}^{d}\log\lambda_i$$

where the λ_i are the eigenvalues of the matrix M. This shows that the magnitude of I_l(Y; P) scales with the dimension of the matrix. The value of I_l(Y; P) is therefore further divided by the dimension d to eliminate the influence of the matrix dimension on the value of the lower bound:

$$I_l(Y;P) = -\frac{1}{2d}\log\det(\mathcal{M})$$

where d is the dimension of the random variables Y and P.
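The normalized bound can be sketched with NumPy's stable log-determinant. A minimal sketch for illustration only; the helper name `mi_lower_bound` is invented here.

```python
import numpy as np

def mi_lower_bound(M):
    """Normalized lower bound -(1/2d) * log det(M) for a d x d
    posterior-variance approximation M (symmetric positive definite)."""
    d = M.shape[0]
    sign, logdet = np.linalg.slogdet(M)   # numerically stable log-determinant
    assert sign > 0, "M must be positive definite"
    return -0.5 * logdet / d

M = np.e * np.eye(3)          # log det(M) = 3, d = 3
bound = mi_lower_bound(M)     # -0.5 * 3 / 3
```

Using `slogdet` instead of `det` avoids the underflow that the patent later addresses with double precision and a Cholesky factorization.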
In step (5), after the lower bound of the mutual information is obtained, the total loss function for optimizing the segmentation model is:

$$\mathcal{L}_{\mathrm{total}} = \mathcal{L}_{\mathrm{CE}}(y,p) - \lambda\,\frac{1}{BC}\sum_{b=1}^{B}\sum_{c=1}^{C} I_l^{(b,c)}(Y;P)$$

where L_total is the total loss for training the model; L_CE(y, p) is the cross-entropy loss between the label picture y and the predicted picture p; B is the number of training pictures in a random training batch; C is the number of channels of the predicted picture, i.e. the number of object classes in the real picture; I_l^{(b,c)}(Y; P) is the mutual-information lower bound of channel c of the b-th training picture; and λ is a weighting factor, set to 0.5 by default. The maximization of mutual information is thus converted into the minimization of its negative.
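The combination of the two terms can be sketched as follows. This is a hedged sketch of the weighting described above, not the patented training code; the helper name `total_loss` is invented here.

```python
import numpy as np

def total_loss(ce_loss, mi_bounds, lam=0.5):
    """Combine pixel-wise CE with the region term: L = CE - lam * mean(I_l).

    mi_bounds : (B, C) array of lower bounds, one per picture and channel;
    maximizing mutual information becomes minimizing its negative.
    """
    return float(ce_loss - lam * np.mean(mi_bounds))

bounds = np.array([[0.2, 0.4],
                   [0.6, 0.8]])      # B = 2 pictures, C = 2 channels
loss = total_loss(1.0, bounds)       # 1.0 - 0.5 * mean = 1.0 - 0.5 * 0.5
```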
Compared with the prior art, the invention has the following beneficial effects:
1. The region mutual information loss function proposed by the invention provides an intuitive way to fit the relationships between pixels in a picture and to measure the structural similarity between two pictures; it requires only a small amount of computing resources and is easy to use.
2. With the proposed method the model is easy to train, and no additional inference step or network structure is needed. Extensive experiments show that segmentation models trained with the proposed method outperform the baseline algorithm and other methods of the same type.
Drawings
FIG. 1 is a general framework and flow diagram of the present invention;
FIG. 2 is a diagram illustrating the construction of high-dimensional distribution of predicted pictures and labeled pictures according to the present invention;
fig. 3 shows the result of qualitative segmentation on the PASCAL VOC 2012 validation set according to an embodiment of the present invention.
Detailed Description
The invention will be described in further detail below with reference to the drawings and examples, which are intended to facilitate the understanding of the invention without limiting it in any way.
As shown in fig. 1, a semantic segmentation method based on maximized regional mutual information constructs high-dimensional distributions of a predicted picture and a real tag picture after obtaining the predicted picture output by a segmentation model, and calculates a posterior variance matrix of the real tag distribution given the high-dimensional distribution of the predicted picture; and then the lower bound of the mutual information of the two high-dimensional distributions can be calculated, and the segmentation model can maximize the value of the real mutual information between the segmentation picture and the real label picture by maximizing the lower bound of the mutual information, so that the similarity between the segmentation picture and the real label picture is maximized. The segmentation model trained in the way has better segmentation performance than the model trained by only using the cross entropy loss of each pixel.
As shown in fig. 2, for a predicted picture or a real label picture, each pixel is represented by itself together with several surrounding pixels, so that the pixel becomes a high-dimensional point; sliding this construction from pixel to pixel over the picture yields many high-dimensional points. A picture can then be represented by this cloud of high-dimensionally distributed points. By maximizing the similarity of these two high-dimensional distributions, the predicted picture and the label picture achieve a higher consistency than with the pixel-wise cross-entropy loss alone. In other words, the segmentation quality of the model improves.
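The sliding construction above can be sketched with NumPy's windowing utility. A minimal sketch for illustration, not the patented implementation; the helper name `region_points` is invented here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def region_points(img, r):
    """Collect every r-by-r neighbourhood of a 2-D map as one d-dimensional
    point (d = r * r), giving a cloud of points that represents the picture."""
    windows = sliding_window_view(img, (r, r))   # (H-r+1, W-r+1, r, r)
    return windows.reshape(-1, r * r)            # one point per position

img = np.arange(16, dtype=float).reshape(4, 4)
pts = region_points(img, 3)          # four 9-dimensional points from a 4x4 map
```

Applying this to the prediction and the label yields the two point clouds whose mutual information the loss maximizes.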
Constructing the high-dimensional distribution in this way may incur excessive memory overhead and increase the number of floating-point operations. For example, if 9-dimensional points are constructed, the storage required for the high-dimensional distribution is 9 times that of the original picture. In current large-scale deep learning this means a memory overhead of several GB or even tens of GB, and the accompanying increase in floating-point computation is likewise unaffordable. To reduce the consumption of computing resources, the invention first down-samples the predicted picture and the label picture before constructing the high-dimensional distribution, which keeps the computing overhead within an acceptable range. The down-sampling has a certain negative effect on model performance, but it gives the proposed loss function for maximizing the region mutual information real practical applicability.
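The down-sampling step can be sketched as simple average pooling (the variant Table 2 later reports as best). A minimal sketch under that assumption; the helper name `avg_pool` is invented here.

```python
import numpy as np

def avg_pool(img, k):
    """k x k average pooling, i.e. down-sampling by factor k
    (assumes H and W are divisible by k)."""
    H, W = img.shape
    return img.reshape(H // k, k, W // k, k).mean(axis=(1, 3))

img = np.arange(16, dtype=float).reshape(4, 4)
small = avg_pool(img, 2)      # 4x4 -> 2x2, each entry the mean of a 2x2 block
```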
When calculating the lower bound of the mutual information, the determinant of the covariance matrix M must be evaluated, which can cause numerical underflow. This is because the probabilities given by the segmentation model come from softmax or sigmoid operations and can be very small, while the number of pixels in one picture can be very large. For example, a common training image size is 513 × 513, which means approximately 263000 points per picture. If the covariance matrix is computed by the formula Cov(Y, Y) = E((Y − μ_Y)(Y − μ_Y)^T), the values of some of its elements are very likely to be tiny, so in practical application the lower-bound formula is rewritten as:

$$I_l(Y;P) = -\frac{1}{2d}\,\mathrm{Tr}\big(\log(\mathcal{M})\big),\qquad \mathcal{M} = \Sigma_Y - \mathrm{Cov}(Y,P)\,\Sigma_P^{-1}\,\mathrm{Cov}(Y,P)^T$$

Here Tr(·) indicates the trace of the matrix and log(M) the matrix logarithm; since log det(M) = Tr(log M), the determinant never has to be formed explicitly.
in addition, since M is a symmetric semi-positive definite matrix, in practical application, a small positive number is added to the element on the diagonal of M, and M + ξ I is formed. The influence on the optimal solution of the system is extremely small, but the operation speed of the matrix operation can be accelerated by carrying out the Kariski decomposition on the new symmetric positive definite matrix M. In practice ξ -1 e-6 is set. In order to ensure the accuracy of calculation, the invention uses double-precision floating point number in the operation related to the lower bound of the calculation mutual information. It is also noted that log (det (M)) is a concave function with respect to matrix M when matrix M is a positive definite matrix. This property makes the system easy to optimize.
The overall objective function for optimizing the segmentation model in practice is:

$$\mathcal{L}_{\mathrm{total}} = \mathcal{L}_{\mathrm{CE}}(y,p) - \lambda\,\frac{1}{BC}\sum_{b=1}^{B}\sum_{c=1}^{C} I_l^{(b,c)}(Y;P)$$

where λ is a weighting factor, L_CE(y, p) is the cross-entropy loss between y and p, and B is the number of training pictures in a random training batch; the problem of maximizing the lower bound of the mutual information is converted into a minimization problem. In practice λ is set to 0.5. The conventional cross-entropy loss measures the similarity between the pixel intensities of the two pictures, while the region mutual information measures the structural similarity between them.
In practice, when the model outputs the final predicted probability values, the invention uses the sigmoid operation instead of the common softmax operation. This is because the lower bound of the region mutual information is computed separately for each channel, whereas the softmax operation explicitly introduces a very strong correlation between the channels, which may cause unpredictable results. Experiments show that models trained with softmax cross-entropy loss and with sigmoid cross-entropy loss perform approximately the same, so the choice between these two operations has a negligible impact on model performance.
In fig. 3, the segmentation effect of the segmentation model trained with the algorithm of the present invention and with the conventional method is shown. It can be seen that the segmentation model trained by the algorithm of the invention has better performance on segmentation details, such as segmentation effects of animal legs and human limbs, compared with the segmentation model trained by the conventional method; the overall visual effect of the segmented picture is greatly improved. This qualitatively demonstrates the effectiveness of the proposed algorithm.
In order to embody the technical effects and advantages of the present invention, the method proposed by the present invention is applied to practical examples, and compared with other methods of the same type.
The segmentation models adopted by the invention are the DeepLabv3 and DeepLabv3+ semantic segmentation models at the current leading edge, and the performance of the segmentation models can be compared by using the method provided by the invention and other methods.
The invention was tested on two public data sets, PASCAL VOC 2012 and CamVid. The PASCAL VOC 2012 data set is divided into three parts: training, validation and test sets with 1464, 1449 and 1456 pictures, respectively. Training used the augmented PASCAL VOC 2012 data set, which contains 10582 pictures. The CamVid data set is a street-scene data set whose training, validation and test sets contain 367, 101 and 233 pictures, respectively. The segmentation model is trained on the training and validation sets of CamVid and its effect is evaluated on the test set.
The evaluation index used is the mean intersection-over-union (mIoU) score, i.e. the ratio of the intersection to the union of objects in the predicted and ground-truth segmentation pictures. The effectiveness of the algorithm is first verified on the PASCAL VOC 2012 validation set; the results are shown in Table 1.
TABLE 1
(Table 1 is reproduced as an image in the original document: mIoU and inference-time comparison of CE, BCE, CRF, Affinity and the proposed method with DeepLabv3/DeepLabv3+ on the PASCAL VOC 2012 validation set.)
In Table 1, CE and BCE denote the conventional softmax and sigmoid cross-entropy losses, respectively, and the CE row reports the numbers published in the DeepLabv3 and DeepLabv3+ papers.
CRF and Affinity are algorithms of the same type as the proposed method, aiming to fit the relationships between pixels in the picture. CRF-X means that X iterative inference steps are used with the CRF. The inference time is the time the method needs to output a segmentation picture for one input picture; the table shows that the CRF has a time-consuming iterative inference process. It is also clear that the segmentation model trained with the proposed algorithm outperforms the conventional method and several methods of the same type.
Furthermore, the invention performs control variable experiments on the PASCAL VOC 2012 validation set to test the influence of different elements in the image semantic segmentation system proposed by the invention on the final segmentation result, and the results are shown in table 2.
TABLE 2
(Table 2 is reproduced as an image in the original document: ablation results on the PASCAL VOC 2012 validation set for different down-sampling modes, down-sampling factors and square-region side lengths.)
In Table 2, the down-sampling modes include interpolation (Int.), maximum pooling (Max.) and mean pooling (Avg.). The down-sampling factor is the ratio of the original picture size to the size of the down-sampled picture. The side length of the square region determines the dimension of the constructed high-dimensional distributions of the predicted picture and the label picture. As Table 2 shows, average pooling, smaller down-sampling factors and larger square regions (higher-dimensional distributions) lead to relatively better performance.
Furthermore, the validity of the proposed algorithm is verified on the CamVid data set.
TABLE 3
(Table 3 is reproduced as an image in the original document: per-class and overall segmentation results on the CamVid test set.)
As shown in Table 3, when comparing both the per-class and the overall segmentation results on the CamVid data set, the proposed algorithm still outperforms the baseline algorithm and several methods of the same type. This demonstrates the generality and superiority of the proposed loss function and image semantic segmentation system based on maximized region mutual information.
The embodiments described above are intended to illustrate the technical solutions and advantages of the present invention, and it should be understood that the above-mentioned embodiments are only specific embodiments of the present invention, and are not intended to limit the present invention, and any modifications, additions and equivalents made within the scope of the principles of the present invention should be included in the scope of the present invention.

Claims (4)

1. A semantic segmentation method based on maximized regional mutual information is characterized by comprising the following steps:
(1) inputting a real scene picture into a segmentation model to obtain a prediction picture;
(2) constructing high-dimensional distribution of a prediction picture and a label picture;
(3) calculating an approximation of the posterior variance of the high-dimensional distribution of the label picture given the high-dimensional distribution of the predicted picture, the approximate formula of the posterior variance being:

$$\mathrm{Var}(Y\,|\,P=p) \approx \Sigma_Y - \mathrm{Cov}(Y,P)\,\Sigma_P^{-1}\,\mathrm{Cov}(Y,P)^T$$

where P ∈ R^d and Y ∈ R^d are the constructed high-dimensional variables of the predicted picture and the label picture, respectively; Σ_Y is the variance of Y and Σ_P^{-1} is the inverse of the variance of P; Cov(Y, P) is the covariance matrix between Y and P and Cov(Y, P)^T its transpose; Var(Y|P = p) is the approximation of the posterior variance of Y given P; and A_{yp} = Cov(Y, P) Σ_P^{-1} is the regression matrix of Y on P;
(4) calculating the lower bound of the mutual information of the two high-dimensional distributions of the predicted picture and the label picture from the obtained approximation of the posterior variance, the lower bound of the mutual information being:

$$I_l(Y;P) = -\frac{1}{2}\log\det\!\big(\Sigma_Y - \mathrm{Cov}(Y,P)\,\Sigma_P^{-1}\,\mathrm{Cov}(Y,P)^T\big)$$

where I_l(Y; P) denotes the lower bound of the mutual information between the random variables Y and P; Σ_Y is the variance of Y and Σ_P^{-1} the inverse of the variance of P; Cov(Y, P) is the covariance matrix between Y and P and Cov(Y, P)^T its transpose; det(·) denotes the determinant of the matrix in brackets; and log(·) denotes the logarithm with base the natural number e;
(5) updating the weight parameters of the segmentation model according to the obtained lower bound of the mutual information, maximizing the mutual information of the two high-dimensional distributions and thereby maximizing the similarity between the predicted picture and the label picture;
after the lower bound of the mutual information is obtained through calculation, the total loss function used to optimize the segmentation model is:

L_total = L_CE(y, p) − λ · (1 / (B·C)) · Σ_{b=1..B} Σ_{c=1..C} I_l^(b,c)(Y; P)

wherein L_total is the total loss used to train the model; L_CE(y, p) is the conventional cross-entropy loss between the label picture y and the predicted picture p; B is the number of training pictures in a random training batch; C is the number of channels of the predicted picture, i.e. the number of object classes in the real picture; I_l^(b,c)(Y; P) is the lower bound of the mutual information for the c-th channel of the b-th training picture; and λ is a weight factor;
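A sketch of combining the two loss terms (illustrative only; binary cross-entropy is used here as a stand-in for the "conventional cross-entropy loss", and the function name and λ default of 0.5, following claim 4, are choices of this sketch):

```python
import numpy as np

def total_loss(y, p, mi_lower_bounds, lam=0.5):
    """Total loss = cross-entropy(y, p) - lam * mean of the MI lower bounds.

    y, p: flat arrays of ground-truth labels (0/1) and predicted
          probabilities in (0, 1) for one batch.
    mi_lower_bounds: array of shape (B, C), one lower bound per
          training picture b and channel c.
    """
    eps = 1e-7
    p = np.clip(p, eps, 1.0 - eps)
    # Per-pixel binary cross-entropy, averaged over the batch
    ce = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    # Subtracting the mean lower bound means that minimizing the total
    # loss maximizes the region mutual information
    return ce - lam * np.mean(mi_lower_bounds)
```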
(6) repeating the steps (1) to (5); training is finished after the preset number of training iterations is reached, and the trained model is applied to semantic segmentation.
2. The semantic segmentation method based on maximized region mutual information as claimed in claim 1, wherein in the step (2) the random variables corresponding to the high-dimensional distributions of the predicted picture and the label picture are:

P = [p_1, p_2, ..., p_d]^T
Y = [y_1, y_2, ..., y_d]^T

wherein P ∈ R^d is the high-dimensional variable of the predicted picture and Y ∈ R^d is the high-dimensional variable of the label picture; each p_i lies in the interval [0, 1] and each y_i is 0 or 1; d is the dimension of these high-dimensional vectors and is also the area of the region used to construct them.
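A sketch of building such d-dimensional vectors from image regions (illustrative only; the 3×3 region size, the overlapping sliding window, and the name `region_vectors` are assumptions of this sketch, not specified by the claim):

```python
import numpy as np

def region_vectors(img, size=3):
    """Unfold an (H, W) map into vectors of all size x size regions.

    Returns an array of shape (num_regions, d) with d = size * size,
    one row per overlapping square region, so that d equals the area
    of the region used to build each high-dimensional vector.
    """
    h, w = img.shape
    rows = []
    for i in range(h - size + 1):
        for j in range(w - size + 1):
            rows.append(img[i:i + size, j:j + size].ravel())
    return np.array(rows)
```

Applied to a prediction map this yields samples of P, and applied to the label map it yields the matching samples of Y.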
3. The semantic segmentation method based on maximized region mutual information as claimed in claim 1, wherein, when the lower bound of the mutual information of the two high-dimensional distributions is calculated, the influence of the matrix dimension on the magnitude of the lower bound is eliminated by normalizing with the dimension:

I_l(Y; P) = −(1/(2d)) log det( Σ_Y − Cov(Y, P) Σ_P^(-1) Cov(Y, P)^T )

where d is the dimension of the random variables Y and P.
4. The semantic segmentation method based on maximized region mutual information as claimed in claim 1, wherein the value of λ is set to 0.5.
CN201910585061.1A 2019-07-01 2019-07-01 Semantic segmentation method based on maximized region mutual information Active CN110472653B (en)

Publications (2)

Publication Number | Publication Date
CN110472653A | 2019-11-19
CN110472653B | 2021-09-21

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant