WO2023206944A1 - Semantic segmentation method and apparatus, computer device, and storage medium - Google Patents

Semantic segmentation method and apparatus, computer device, and storage medium

Info

Publication number
WO2023206944A1
Authority
WO
WIPO (PCT)
Prior art keywords
classifier
target
probability distribution
image
weighting coefficient
Prior art date
Application number
PCT/CN2022/120480
Other languages
English (en)
French (fr)
Inventor
田倬韬
陈亦新
赖昕
蒋理
刘枢
吕江波
沈小勇
Original Assignee
深圳思谋信息科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳思谋信息科技有限公司
Publication of WO2023206944A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate

Definitions

  • the present application relates to the field of artificial intelligence technology, and in particular to a semantic segmentation method, device, computer equipment and storage medium.
  • Image recognition is increasingly widely used. For example, for images that need to be segmented, using an image semantic segmentation model greatly improves segmentation efficiency.
  • Images can be input into the feature extractor and classifier of an image semantic segmentation model to obtain the category corresponding to each pixel.
  • However, the general classifier used by current image semantic segmentation models cannot describe the feature distribution of different samples well, which results in low segmentation accuracy.
  • this application provides an image semantic segmentation method.
  • The method includes: acquiring a target image; performing feature extraction on the target image to obtain image extraction features corresponding to the target image; processing the image extraction features based on a first classifier to obtain an intermediate probability distribution, where the intermediate probability distribution includes the probability that each pixel in the target image belongs to each pixel category; fusing the intermediate probability distribution with the image extraction features to obtain a target classifier corresponding to the image extraction features; processing the image extraction features based on the target classifier and the first classifier to obtain a target probability distribution corresponding to each pixel category, where the target probability distribution includes the probability that each pixel in the target image belongs to each pixel category; and obtaining, based on the target probability distribution, the target pixel category corresponding to each pixel in the target image.
  • Processing the image extraction features based on the target classifier and the first classifier to obtain the target probability distribution corresponding to each pixel category includes: obtaining a first weighting coefficient corresponding to the target classifier and a second weighting coefficient corresponding to the general classifier; weighting the target classifier based on the first weighting coefficient to obtain a weighted target classifier; weighting the general classifier based on the second weighting coefficient to obtain a weighted general classifier; and combining the weighted general classifier and the weighted target classifier to obtain a comprehensive classifier, which is used to process the image extraction features to obtain the target probability distribution corresponding to each pixel category.
  • Obtaining the first weighting coefficient corresponding to the target classifier and the second weighting coefficient corresponding to the general classifier includes: obtaining the classifier similarity between the general classifier and the target classifier; obtaining the first weighting coefficient based on the classifier similarity, where the first weighting coefficient is positively correlated with the classifier similarity; and obtaining the second weighting coefficient based on the first weighting coefficient, where the second weighting coefficient is negatively correlated with the first weighting coefficient.
  • Obtaining the first weighting coefficient based on the classifier similarity includes: obtaining a sensitivity coefficient; and multiplying the sensitivity coefficient by the classifier similarity to obtain the first weighting coefficient.
  • Processing the image extraction features based on the first classifier to obtain an intermediate probability distribution includes: processing the image extraction features based on the general classifier in the image semantic segmentation model to obtain an initial probability distribution, which includes an initial probability matrix corresponding to each pixel category; shifting the elements in the initial probability matrix to the same row or column to obtain a shift probability matrix; and splicing the shift probability matrices corresponding to each pixel category to obtain the intermediate probability distribution.
  • The calculation method corresponding to the target classifier includes: performing a matrix multiplication operation on the intermediate probability distribution and the image extraction features to obtain the target classifier. The intermediate probability distribution is obtained by shifting and splicing the initial probability distribution and applying an activation function to the spliced result, and the initial probability distribution is obtained by processing the image extraction features.
  • Fusing the intermediate probability distribution with the image extraction features to obtain a target classifier corresponding to the image extraction features includes: obtaining a confidence threshold; comparing each probability in the intermediate probability distribution with the confidence threshold, and obtaining a mask matrix based on the size relationship between the probability and the confidence threshold; screening the intermediate probability distribution based on the mask matrix to obtain a denoising probability distribution; and multiplying the denoising probability distribution by the image extraction features to obtain the target classifier corresponding to the image extraction features.
  • Screening the intermediate probability distribution based on the mask matrix to obtain a denoising probability distribution includes: multiplying each value in the mask matrix by the probability at the corresponding position in the intermediate probability distribution to obtain the denoising probability distribution.
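  • The mask-based screening can be illustrated with a minimal sketch. The text does not say which side of the threshold is kept, so the rule used here (keep probabilities at or above the threshold) is an assumption, as are the function name `denoise` and the toy values.

```python
def denoise(prob, threshold):
    """Screen a probability matrix with a 0/1 mask built from a confidence threshold."""
    # Assumption: a probability is kept when it is >= threshold; the source
    # only says the mask depends on the size relationship with the threshold.
    mask = [[1 if p >= threshold else 0 for p in row] for row in prob]
    # Element-wise product of the mask and the intermediate probability distribution.
    denoised = [[m * p for m, p in zip(mrow, prow)]
                for mrow, prow in zip(mask, prob)]
    return mask, denoised


# Toy intermediate distribution: 2 categories x 3 pixel positions.
intermediate = [[0.7, 0.2, 0.1],
                [0.05, 0.9, 0.05]]
mask, denoised = denoise(intermediate, threshold=0.15)
```

  • Low-confidence entries become zero, so they contribute nothing when the denoising probability distribution is later multiplied by the image extraction features.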
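  • Processing the image extraction features based on the first classifier to obtain an intermediate probability distribution includes: performing dimensionality reduction on the image extraction features based on the first classifier to obtain an initial probability distribution; and shifting and normalizing the probability values in the initial probability distribution to obtain the intermediate probability distribution.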
  • This application also provides an image semantic segmentation device, including: a target image acquisition module, used to acquire the target image; an image extraction feature extraction module, used to perform feature extraction on the target image to obtain image extraction features corresponding to the target image; an intermediate probability distribution obtaining module, used to process the image extraction features based on the first classifier to obtain an intermediate probability distribution, where the intermediate probability distribution includes the probability that each pixel in the target image belongs to each pixel category; a target classifier acquisition module, used to fuse the intermediate probability distribution with the image extraction features to obtain a target classifier corresponding to the image extraction features; a target probability distribution acquisition module, used to process the image extraction features based on the target classifier and the first classifier to obtain a target probability distribution corresponding to each pixel category, where the target probability distribution includes the probability that each pixel in the target image belongs to each pixel category; and a target pixel category obtaining module, used to obtain, based on the target probability distribution, the target pixel category corresponding to each pixel in the target image.
  • A target classifier obtaining module is configured to: obtain the first weighting coefficient corresponding to the target classifier and the second weighting coefficient corresponding to the general classifier; weight the target classifier based on the first weighting coefficient to obtain a weighted target classifier; weight the general classifier based on the second weighting coefficient to obtain a weighted general classifier; and combine the weighted general classifier and the weighted target classifier to obtain a comprehensive classifier, which is used to process the image extraction features to obtain the target probability distribution corresponding to each pixel category.
  • A target classifier obtaining module is configured to: obtain the classifier similarity between the general classifier and the target classifier; obtain the first weighting coefficient based on the classifier similarity, where the first weighting coefficient is positively correlated with the classifier similarity; and obtain the second weighting coefficient based on the first weighting coefficient, where the second weighting coefficient is negatively correlated with the first weighting coefficient.
  • the target classifier acquisition module is configured to: obtain a sensitivity coefficient; and multiply the sensitivity coefficient by the classifier similarity to obtain the first weighting coefficient.
  • An intermediate probability distribution obtaining module is configured to: process the image extraction features based on the general classifier in the image semantic segmentation model to obtain an initial probability distribution, where the initial probability distribution includes an initial probability matrix corresponding to each pixel category; shift the elements in the initial probability matrix to the same row or column to obtain a shift probability matrix; and splice the shift probability matrices corresponding to each pixel category to obtain the intermediate probability distribution.
  • The calculation method corresponding to the target classifier includes: shifting and splicing the initial probability distribution obtained by processing the image extraction features, and applying an activation function to the spliced initial probability distribution to obtain the intermediate probability distribution; and performing a matrix multiplication operation on the intermediate probability distribution and the image extraction features to obtain the target classifier.
  • A target classifier obtaining module is configured to: obtain a confidence threshold; compare each probability in the intermediate probability distribution with the confidence threshold, and obtain a mask matrix based on the size relationship between the probability and the confidence threshold; screen the intermediate probability distribution based on the mask matrix to obtain a denoising probability distribution; and multiply the denoising probability distribution by the image extraction features to obtain the target classifier.
  • The target classifier acquisition module is configured to multiply each value in the mask matrix by the probability at the corresponding position in the intermediate probability distribution to obtain a denoising probability distribution.
  • The intermediate probability distribution obtaining module is used to: perform dimensionality reduction on the image extraction features based on the first classifier to obtain an initial probability distribution; and shift and normalize the probability values in the initial probability distribution to obtain the intermediate probability distribution.
  • this application also provides a computer device.
  • the computer device includes a memory and a processor.
  • the memory stores a computer program.
  • When the processor executes the computer program, the steps in the above image semantic segmentation method are implemented.
  • this application also provides a computer-readable storage medium.
  • the computer-readable storage medium has a computer program stored thereon, and when the computer program is executed by a processor, the steps in the above-mentioned image semantic segmentation method are implemented.
  • this application also provides a computer program product.
  • the computer program product includes a computer program that implements the steps in the above image semantic segmentation method when executed by a processor.
  • The above image semantic segmentation method, device, computer equipment, storage medium and computer program product acquire a target image; perform feature extraction on the target image to obtain image extraction features corresponding to the target image; process the image extraction features based on the first classifier to obtain an intermediate probability distribution corresponding to each pixel category, where the intermediate probability distribution includes the probability that each pixel in the target image belongs to each pixel category; fuse the intermediate probability distribution with the image extraction features to obtain a target classifier corresponding to the image extraction features; process the image extraction features based on the target classifier and the first classifier to obtain a target probability distribution corresponding to each pixel category, where the target probability distribution includes the probability that each pixel in the target image belongs to each pixel category; and obtain, based on the target probability distribution, the target pixel category corresponding to each pixel in the target image.
  • The feature extractor extracts the feature vectors of the target image to obtain the image extraction features, and a general classifier then processes the image extraction features to obtain the intermediate probability distribution used to build the target classifier. The intermediate probability distribution is then fused with the image extraction features to obtain the target classifier, so the target classifier changes adaptively with the image extraction features of the target image. Processing with both the general classifier and the target classifier enhances the classifier's ability to perceive different image content and improves the accuracy of the target pixel category obtained for each pixel, thereby improving the accuracy of image semantic segmentation.
  • Figure 1 is an application environment diagram of the image semantic segmentation method in one embodiment
  • Figure 2 is a schematic flowchart of an image semantic segmentation method in one embodiment
  • Figure 3 is a schematic diagram comparing PANet model simulation results in one embodiment
  • Figure 4 is a schematic diagram comparing simulation results of the PFENet model in one embodiment
  • Figure 5 is a schematic flowchart of image semantic segmentation steps in one embodiment
  • Figure 6 is a schematic flowchart of image semantic segmentation steps in another embodiment
  • Figure 7 is a schematic flowchart of image semantic segmentation steps in yet another embodiment
  • Figure 8 is a schematic flowchart of image semantic segmentation steps in yet another embodiment
  • Figure 9 is a schematic flowchart of image semantic segmentation steps in yet another embodiment
  • Figure 10 is a structural block diagram of an image semantic segmentation device in one embodiment
  • Figure 11 is an internal structure diagram of a computer device in one embodiment.
  • the image semantic segmentation method provided by the embodiment of the present application can be applied in the application environment as shown in Figure 1.
  • the terminal 102 communicates with the server 104 through the network.
  • the data storage system may store data that server 104 needs to process.
  • the data storage system can be integrated on the server 104, or placed on the cloud or other network servers.
  • The terminal 102 sends an instruction for image semantic segmentation to the server 104, and the server 104 acquires the target image; performs feature extraction on the target image to obtain image extraction features corresponding to the target image; processes the image extraction features based on the first classifier to obtain an intermediate probability distribution corresponding to each pixel category, where the intermediate probability distribution includes the probability that each pixel in the target image belongs to each pixel category; fuses the intermediate probability distribution with the image extraction features to obtain a target classifier corresponding to the image extraction features; processes the image extraction features based on the target classifier and the first classifier to obtain a target probability distribution corresponding to each pixel category, where the target probability distribution includes the probability that each pixel in the target image belongs to each pixel category; and obtains, based on the target probability distribution, the target pixel category corresponding to each pixel in the target image.
  • the terminal 102 can be, but is not limited to, various personal computers, laptops, smart phones, tablets, Internet of Things devices and portable wearable devices.
  • the Internet of Things devices can be smart speakers, smart TVs, smart air conditioners, smart vehicle-mounted devices, etc.
  • Portable wearable devices can be smart watches, smart bracelets, head-mounted devices, etc.
  • the server 104 can be implemented as an independent server or a server cluster composed of multiple servers.
  • an image semantic segmentation method is provided.
  • The method is described using its application to the server in Figure 1 as an example, and includes the following steps:
  • Step S202 Obtain the target image.
  • The target image may be an image to be semantically segmented by the artificial intelligence model.
  • The artificial intelligence model is used to perform semantic segmentation of the image, and the parameters of the artificial intelligence model can be adaptively changed.
  • The server can obtain one or more images that need to be segmented using the image semantic segmentation model.
  • For example, a camera takes a photo of a billboard that needs to be semantically segmented using the image semantic segmentation model, and the server obtains the photo from the camera terminal.
  • Step S204 Perform feature extraction on the target image to obtain image extraction features corresponding to the target image.
  • the feature extractor in the image semantic segmentation model can be used to extract features from the target image.
  • The image semantic segmentation model can classify an image at the pixel level: it is aware of the surroundings and content of each pixel, and uses them to determine the category to which each individual pixel belongs. The feature extractor can be a function that extracts features from the image, with the extracted features represented as feature vectors. The image extraction features can be the vectors, obtained by passing the target image through the feature extractor, that represent the features of the image.
  • The feature extractor can be based on Convolutional Neural Networks (CNN), a type of feed-forward neural network that includes convolutional calculations and has a deep structure.
  • the image semantic segmentation model includes a feature extractor and a general classifier.
  • the feature extractor can extract feature vectors in the image.
  • the general classifier classifies the feature vectors and obtains the probability of each pixel in each category.
  • the feature extractor will perform feature extraction on the target image to obtain image extraction features used to represent the features of the image.
  • the target image contains cats, dogs and background.
  • the feature extractor extracts features of cats, dogs and background. After extraction, the features of cats, dogs and background are represented in the form of vectors.
  • Step S206 Process the image extraction features based on the first classifier to obtain an intermediate probability distribution corresponding to each pixel category.
  • the intermediate probability distribution includes the probability that each pixel in the target image belongs to each pixel category.
  • The image extraction features may be processed based on the first classifier. The general classifier is trained on many training images, so it is common to all images, and its classification result gives the probability of each pixel in the image for each category.
  • [h, w, d] denotes the height, width and number of feature channels of the image extraction features, and n denotes the number of categories to be predicted. The initial probability distribution p therefore has size [h, w, n] and represents the probability that each pixel belongs to each of the n categories. The general classifier has size [n, d], that is, each of the n categories is represented by a d-dimensional vector.
  • The initial probability distribution p, originally of size [h, w, n], is adjusted to a matrix of size [hw, n] by a reshape operation, and then to a matrix of size [n, hw] by exchanging rows and columns.
  • Softmax is a function that yields normalized classification probabilities: its input is a vector whose elements are arbitrary real numbers, and its output is a vector of probabilities that sum to 1.
  • Softmax is applied to the reshaped initial probability distribution to obtain the intermediate probability distribution.
  • For example, the target image contains a dog. The general classifier in the image semantic segmentation model produces the probabilities for each pixel of the dog; these are reshaped into a matrix of size [hw, n], and exchanging its rows and columns gives a matrix of size [n, hw].
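  • The reshape, row/column exchange and softmax described in step S206 can be sketched in plain Python as follows. This is only an illustration: the function name `intermediate_distribution` and the toy input are hypothetical, and applying softmax over the hw positions of each category row is an assumption about which axis is normalized.

```python
import math


def intermediate_distribution(p):
    """p has shape [h][w][n]; the result has shape [n][h*w]."""
    h, w, n = len(p), len(p[0]), len(p[0][0])
    # Reshape [h, w, n] -> [hw, n] by flattening the two spatial axes.
    flat = [p[i][j] for i in range(h) for j in range(w)]
    # Exchange rows and columns: [hw, n] -> [n, hw].
    transposed = [[flat[k][c] for k in range(h * w)] for c in range(n)]
    # Assumption: softmax is taken over the hw positions of each category row.
    result = []
    for row in transposed:
        m = max(row)  # subtract the maximum for numerical stability
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        result.append([e / total for e in exps])
    return result


# Toy initial probability distribution with h=2, w=2, n=2.
p = [[[0.9, 0.1], [0.6, 0.4]],
     [[0.2, 0.8], [0.5, 0.5]]]
q = intermediate_distribution(p)
```

  • Each row of `q` then acts as a set of spatial weights for one category, summing to 1 across the hw positions.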
  • Step S208 fuse the intermediate probability distribution with the image extraction features to obtain a target classifier corresponding to the image extraction features.
  • The target classifier may be a function obtained by multiplying the intermediate probability distribution with the image extraction features. The general classifier and the target classifier are relative concepts: the general classifier is trained on many different training images and is shared by all images, whereas the target classifier is derived from the image extraction features of a single image and the corresponding intermediate probability distribution, and is therefore specific to the target image.
  • x is the image extraction feature, and G_c(x) represents the category information extracted from the image feature x, which is the target classifier; its size is the same as that of the general classifier.
  • For example, the probabilities of each pixel of the dog image are reshaped: the [h, w, n] array is flattened to a matrix of size [hw, n], whose rows and columns are exchanged to obtain a matrix of size [n, hw]. Softmax is then applied along the second dimension, turning it into a weighting matrix over the n categories (for each category, the weights over the hw spatial positions sum to 1). Finally, this matrix is matrix-multiplied with the image extraction features of the dog image.
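  • The fusion in step S208 amounts to a matrix product of the [n, hw] weighting matrix with the image extraction features. The sketch below assumes the features have been flattened to [hw, d] so that the product has the same [n, d] size as the general classifier; the function name `target_classifier` and the numbers are hypothetical.

```python
def target_classifier(q, x):
    """q: [n][hw] category weights; x: [hw][d] flattened features; returns [n][d]."""
    hw, d = len(x), len(x[0])
    # Each category vector is a weighted sum of the pixel features.
    return [[sum(row[k] * x[k][j] for k in range(hw)) for j in range(d)]
            for row in q]


# n=2 categories, hw=3 positions, d=2 feature channels.
q = [[0.5, 0.5, 0.0],
     [0.0, 0.0, 1.0]]
x = [[1.0, 0.0],
     [3.0, 2.0],
     [4.0, 4.0]]
g = target_classifier(q, x)
```

  • Because the weights come from the image's own predictions, the resulting category vectors change from image to image, which is what makes the target classifier image-specific.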
  • Step S210 Process the image extraction features based on the target classifier and the first classifier to obtain a target probability distribution corresponding to each pixel category.
  • the target probability distribution includes the probability that each pixel in the target image belongs to each pixel category.
  • The target probability distribution may be a probability distribution vector, corresponding to each pixel category, obtained by classifying the target image with the target classifier and the general classifier in the first classifier. There can be many pixel categories.
  • The classifier composed of the target classifier and the general classifier is a comprehensive classifier.
  • This comprehensive classifier can automatically change its own parameters according to changes in the image extraction features, making the model perform better.
  • The image extraction features are input into the target classifier and the general classifier. These two functions each classify every pixel of the image features, and together give the target probability distribution corresponding to each pixel category.
  • The comprehensive classifier C_x is generated depending on the content of the data itself. Its generation can be expressed as the following formula:
  • C_x = α · G_c(x) + (1 - α) · C
  • α is a weighting coefficient that adjusts the balance between the original classifier and the newly introduced classifier features, completing the construction of the adaptive classifier.
  • The classifier composed of the target classifier G_c(x) and the general classifier C is the comprehensive classifier C_x. The feature vectors of the target image are input into C_x, which processes them to obtain the probability of each pixel of the target image for each category.
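  • A minimal sketch of step S210 follows, assuming the comprehensive classifier is a weighted sum of the two [n, d] classifiers and that a pixel is scored by the dot product of its feature with each category vector followed by softmax. The names `combine`, `classify_pixel` and `alpha`, the softmax over categories, and the fixed value 0.5 are illustrative assumptions rather than details given in the text.

```python
import math


def combine(general, target, alpha):
    """Weighted sum alpha * target + (1 - alpha) * general on [n][d] matrices."""
    return [[alpha * g + (1 - alpha) * c for g, c in zip(grow, crow)]
            for grow, crow in zip(target, general)]


def classify_pixel(feature, classifier):
    """Score one d-dimensional pixel feature against every category vector."""
    logits = [sum(f * w for f, w in zip(feature, row)) for row in classifier]
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


C = [[1.0, 0.0], [0.0, 1.0]]    # general classifier, n=2, d=2
G = [[2.0, 0.0], [0.0, 4.0]]    # target classifier for this image
C_x = combine(C, G, alpha=0.5)  # alpha chosen arbitrarily for illustration
probs = classify_pixel([1.0, 0.0], C_x)
```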
  • Step S212 Obtain the target pixel category corresponding to each pixel in the target image based on the target probability distribution.
  • The target pixel category is the category assigned to each pixel in the target image. An image can include pixels of different categories, and image segmentation groups pixels of the same category into the same area.
  • Each pixel has a probability for each category. The category with the highest probability is selected as the target pixel category of that pixel, and segmentation is then performed to obtain multiple areas.
  • For example, the probability that pixel B belongs to category 1 is 0.07, to category 2 is 0.1, and to category 3 is 0.83. Pixel B has the highest probability for category 3, so the category output for pixel B is category 3.
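  • The selection rule in step S212 is an argmax over the per-category probabilities. The sketch below reuses the pixel B values from the example; the function name `target_category` is hypothetical and categories are numbered from 1 to match the text.

```python
def target_category(probs):
    """Return the 1-based index of the most probable category."""
    best = max(range(len(probs)), key=lambda c: probs[c])
    return best + 1


pixel_b = [0.07, 0.1, 0.83]
category = target_category(pixel_b)
```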
  • In the above image semantic segmentation method, a target image is acquired; feature extraction is performed on the target image to obtain image extraction features corresponding to the target image; the image extraction features are processed based on the first classifier to obtain an intermediate probability distribution corresponding to each pixel category, where the intermediate probability distribution includes the probability that each pixel in the target image belongs to each pixel category; the intermediate probability distribution is fused with the image extraction features to obtain the target classifier corresponding to the image extraction features; the image extraction features are processed based on the target classifier and the first classifier to obtain the target probability distribution corresponding to each pixel category, where the target probability distribution includes the probability that each pixel in the target image belongs to each pixel category; and the target pixel category corresponding to each pixel in the target image is obtained based on the target probability distribution.
  • The target classifier changes adaptively with the image extraction features of the target image. Processing with both the general classifier and the target classifier enhances the classifier's ability to perceive different image content and improves the accuracy of the target pixel category obtained for each pixel, thereby improving the accuracy of image semantic segmentation. That is, for different pictures, the target classifier changes adaptively according to the content of the picture, which better characterizes the feature distribution of different images and improves the accuracy of pixel-level image classification.
  • The first classifier includes a general classifier in the image semantic segmentation model. Processing the image extraction features based on the target classifier and the first classifier to obtain the target probability distribution corresponding to each pixel category includes:
  • Step S302 Obtain the first weighting coefficient corresponding to the target classifier and the second weighting coefficient corresponding to the general classifier.
  • The first weighting coefficient can be the weight of the target classifier in the comprehensive classifier, and the second weighting coefficient can be the weight of the general classifier in the comprehensive classifier. The two coefficients can be preset fixed values, or can be obtained adaptively based on the target classifier.
  • The server obtains the first weighting coefficient corresponding to the target classifier and the second weighting coefficient corresponding to the general classifier. The two coefficients are negatively correlated and sum to 1.
  • For example, if the server obtains a first weighting coefficient of 0.7, then because the two coefficients sum to 1, the second weighting coefficient is 0.3.
  • Step S304 Weight the target classifier based on the first weighting coefficient to obtain a weighted target classifier.
  • the weighted target classifier may be a target classifier with a weighted effect obtained by multiplying the target classifier and the first weighting coefficient.
  • The first weighting coefficient is a weight and the target classifier is a classifier, which includes the model parameters used for classification. Multiplying the two gives the weighted target classifier, expressed in the formula as α · G_c(x), where α denotes the first weighting coefficient.
  • For example, if the first weighting coefficient is 0.7 and the target classifier is G_c(x), the weighted target classifier obtained by multiplying the two is 0.7 · G_c(x).
  • Step S306 Weight the universal classifier based on the second weighting coefficient to obtain a weighted universal classifier.
  • the weighted universal classifier may be a universal classifier with a weighted effect obtained by multiplying the universal classifier and the second weighting coefficient.
  • the second weighting coefficient is a weight
  • the universal classifier is a classifier. After multiplying the two, a weighted universal classifier is obtained, which is expressed by the formula $(1-\gamma) \cdot C$.
  • the second weighting coefficient is 0.3
  • the universal classifier is C
  • the weighted universal classifier obtained by multiplying the two is 0.3C.
  • Step S308 combine the weighted general classifier and the weighted target classifier to obtain a comprehensive classifier, use the comprehensive classifier to process the image extraction features, and obtain the target probability distribution corresponding to each pixel category.
  • the comprehensive classifier can be a classifier obtained by adding the weighted general classifier and the weighted target classifier.
  • the image extraction features are processed through a comprehensive classifier that combines a general classifier and a target classifier to obtain target probability distributions for different pixel categories.
  • the specific expression is as follows:
  • $C_x = (1-\gamma) \cdot C + \gamma \cdot G_c(x)$
  • $\gamma$ is the first weighting coefficient
  • $1-\gamma$ is the second weighting coefficient
  • $C_x$ is the comprehensive classifier, where the size of $\gamma$ can be determined according to the cosine similarity between $G_c(x)$ and $C$.
  • the cosine similarity is positively correlated with $\gamma$. For example, the higher the cosine similarity $\cos(C, G_c(x))$ between $G_c(x)$ and $C$, the more trustworthy $G_c(x)$ is, and the larger the share of $C$ it can replace in order to accurately predict the current sample feature $x$.
  • the extracted features of an image containing cats, dogs and background are processed through the general classifier and the target classifier; for a pixel belonging to the dog, the resulting probability of pixel category 1 is 0.02, the probability of pixel category 2 is 0.04, and the probability of pixel category 3 is 0.94.
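The weighting and combination in steps S302 to S308 can be sketched in a few lines of Python. This is an illustrative sketch only, not the application's implementation: it assumes each classifier is stored as an n × d nested list (one d-dimensional vector per pixel category), and the names `combine_classifiers` and `score_pixels` are hypothetical. Scoring a pixel by a dot product with each category vector is likewise an assumption about how the comprehensive classifier is applied to the features.

```python
def combine_classifiers(general, target, gamma):
    """Return C_x = (1 - gamma) * C + gamma * G_c(x), element by element.

    general, target: n x d nested lists (assumed layout).
    gamma: first weighting coefficient; 1 - gamma is the second.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    return [
        [(1.0 - gamma) * c + gamma * g for c, g in zip(c_row, g_row)]
        for c_row, g_row in zip(general, target)
    ]


def score_pixels(features, classifier):
    """Dot each pixel feature (hw x d) with each category vector (n x d).

    Returns an hw x n list of raw scores; turning scores into
    probabilities is left out of this sketch.
    """
    return [
        [sum(f * w for f, w in zip(pixel, category)) for category in classifier]
        for pixel in features
    ]
```

With a first weighting coefficient of 0.7, the general classifier contributes 30% and the target classifier 70% of every parameter of the comprehensive classifier.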
  • obtaining the first weighting coefficient corresponding to the target classifier and the second weighting coefficient corresponding to the general classifier includes:
  • Step S402 Obtain the classifier similarity between the general classifier and the target classifier, and obtain the first weighting coefficient based on the classifier similarity.
  • the first weighting coefficient is positively correlated with the classifier similarity.
  • the classifier similarity can be the similarity between the general classifier and the target classifier. The higher the similarity, the more reliable the target classifier is, and the larger the share of the general classifier's content it can replace.
  • the similarity is calculated between the general classifier and the target classifier, and the obtained result is the classifier similarity.
  • the first weighting coefficient can be obtained through the classifier similarity.
  • the classifier similarity may be positively correlated with the first weighting coefficient, for example, the classifier similarity may be directly used as the first weighting coefficient.
  • the universal classifier and the target classifier use cosine similarity to represent their degree of similarity. Through calculation, their cosine similarity is 0.8, and the first weighting coefficient is 0.8.
  • Step S404 Obtain a second weighting coefficient based on the first weighting coefficient, and the second weighting coefficient has a negative correlation with the first weighting coefficient.
  • the first weighting coefficient and the second weighting coefficient are negatively correlated.
  • the sum of the two is one. Therefore, after the first weighting coefficient is obtained, the second weighting coefficient can be obtained.
  • the first weighting coefficient is 0.8, and the sum of the first weighting coefficient and the second weighting coefficient is one, so that the second weighting coefficient is 0.2.
  • deriving the first weighting coefficient from the classifier similarity gives its adjustment a firmer basis and makes it easier to obtain a good weighting coefficient.
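Steps S402 and S404 can be illustrated with the short Python sketch below. It rests on several assumptions that the text does not fix: the two classifiers are flattened into single vectors before the cosine similarity is taken, the similarity is used directly as the first weighting coefficient, and the result is clamped to [0, 1] so that both coefficients stay non-negative. The function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two classifiers, each flattened to one vector."""
    flat_a = [v for row in a for v in row]
    flat_b = [v for row in b for v in row]
    dot = sum(x * y for x, y in zip(flat_a, flat_b))
    norm = math.sqrt(sum(x * x for x in flat_a)) * math.sqrt(sum(y * y for y in flat_b))
    return dot / norm if norm else 0.0


def weighting_coefficients(general, target):
    """Return (first, second) weighting coefficients that sum to one."""
    similarity = cosine_similarity(general, target)
    first = min(max(similarity, 0.0), 1.0)  # clamping is an assumption
    return first, 1.0 - first
```

For classifiers whose cosine similarity is 0.8, this returns a first coefficient of about 0.8 and a second of about 0.2, matching the worked example above.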
  • obtaining the first weighting coefficient based on classifier similarity includes:
  • Step S502 Obtain the sensitivity coefficient.
  • the sensitivity coefficient may be the sensitivity of the comprehensive classifier to the classifier similarity, and adjusting the sensitivity coefficient may change the size of the first weighting coefficient and the second weighting coefficient.
  • the sensitivity coefficient is preset in the server, and the server obtains the sensitivity coefficient.
  • the sensitivity coefficient can be 1.
  • 0.75 is input to the server as the sensitivity coefficient of the comprehensive classifier.
  • Step S504 Multiply the sensitivity coefficient and the classifier similarity to obtain the first weighting coefficient.
  • the classifier similarity is a value that reflects the similarity.
  • the sensitivity coefficient and the classifier similarity are multiplied to obtain the first weighting coefficient.
  • the sensitivity coefficient is 0.75 and the classifier similarity is 0.8. The two are multiplied to obtain a first weighting coefficient of 0.6.
  • by setting the sensitivity coefficient, the weight of the target classifier can be adjusted as needed. For example, a correspondence between the similarity and the sensitivity coefficient can be set.
  • when the similarity is less than a preset threshold, the value of the sensitivity coefficient is increased.
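A minimal Python sketch of steps S502 and S504 follows. The multiplication is taken from the text; the rule for choosing the sensitivity coefficient is only one possible reading of the threshold behaviour described above, and the threshold and the two candidate values (`base`, `raised`) are hypothetical numbers chosen for illustration.

```python
def first_weighting_coefficient(similarity, sensitivity=1.0):
    """Step S504: multiply the sensitivity coefficient by the classifier similarity."""
    return sensitivity * similarity


def choose_sensitivity(similarity, threshold=0.5, base=0.75, raised=1.0):
    """Pick a larger sensitivity coefficient when the similarity is below the threshold.

    threshold, base and raised are illustrative values, not taken from the text.
    """
    return raised if similarity < threshold else base
```

With a sensitivity coefficient of 0.75 and a classifier similarity of 0.8, `first_weighting_coefficient` gives approximately 0.6, as in the example above.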
  • processing the image extraction features based on the first classifier to obtain an intermediate probability distribution includes:
  • Step S602 Process the image extraction features based on the first classifier to obtain an initial probability distribution.
  • the initial probability distribution includes an initial probability matrix corresponding to each pixel category.
  • the initial probability distribution comprises one initial probability matrix per pixel category, obtained by processing the image extraction features; each initial probability matrix contains, for every pixel of the target image, the probability that the pixel belongs to that category. For example, with 6 categories and a target image of 200*300 pixels, there are 6 initial probability matrices, each with 200 rows and 300 columns. Each initial probability matrix gives the probability that each of the 200*300 pixels belongs to one of the six categories.
  • the image extraction features are classified by a general classifier, and the initial probability distribution corresponding to the target image is obtained, and the initial probability distribution contains the probability of each pixel in each pixel category.
  • the extracted features of a target image containing cats, dogs, and background are classified by the universal classifier. Because the target image has several pixel categories and each category covers many pixels, the result is an initial probability distribution for the target image, which contains the probability of each pixel for each pixel category of the target image.
  • Step S604 Shift the elements in the initial probability matrix to the same row or column to obtain a shift probability matrix.
  • the shift probability matrix can be a matrix obtained by shifting the elements in the initial probability matrix and moving all elements to the same row or column.
  • the initial probability matrix is a multi-dimensional matrix. By shifting, the elements in the initial probability matrix are moved to the same row or column, and the shifted probability matrix is obtained.
  • the initial probability matrix is a 200*300 matrix, which is shifted to obtain a 1*60000 matrix.
  • Step S606 Concatenate the shift probability matrices corresponding to each pixel category to obtain an intermediate probability distribution.
  • the intermediate probability distribution can be obtained by splicing different shifted probability matrices to obtain a multi-dimensional matrix, and the multi-dimensional matrix is the intermediate probability distribution.
  • each pixel category has a corresponding shift probability matrix.
  • the shift probability matrices of all pixel categories are spliced, and the result of the splicing is an intermediate probability distribution.
  • for example, with 6 pixel categories and 200*300 pixels per category, there are a total of 6 shift probability matrices of size 1*60000. Splicing these six matrices yields an intermediate probability distribution of size 6*60000. In general, the intermediate probability distribution has n rows and h*w columns. Each row represents the probability that each pixel in the target image belongs to the pixel category corresponding to that row.
  • the intermediate probability distribution is obtained by processing the image extraction features with the universal classifier and then shifting and splicing the result.
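The shifting and splicing of steps S604 and S606 amount to flattening each per-category matrix into one row and stacking the rows. The Python sketch below shows this on plain nested lists; the function names are hypothetical, and row-major flattening order is an assumption, since the text does not say in which order elements are moved.

```python
def shift_to_row(matrix):
    """Step S604: move all elements of an h x w matrix into a single row."""
    return [value for row in matrix for value in row]


def splice(initial_matrices):
    """Step S606: stack one flattened row per pixel category (n x h*w)."""
    return [shift_to_row(matrix) for matrix in initial_matrices]
```

For n categories and an h × w image, `splice` returns n rows of h*w probabilities each, which is the [n, hw] layout of the intermediate probability distribution.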
  • the calculation method corresponding to the target classifier includes: performing a matrix multiplication operation on the intermediate probability distribution and the image extraction features to obtain the target classifier; the intermediate probability distribution is obtained by shifting and splicing the initial probability distribution and applying an activation function to the spliced result.
  • the initial probability distribution is obtained by processing the extracted features of the image.
  • the calculation formula corresponding to the target classifier can be expressed as follows:
  • $G_c(x) = \mathrm{Softmax}(\mathrm{Reshape}(p)) \times x$
  • $G_c(x)$ is the target classifier
  • $p$ is the initial probability distribution obtained by processing the image extraction features with the general classifier
  • $\mathrm{Reshape}(p)$ is the shifting and splicing of $p$
  • $x$ is the image extraction feature. Since $p$ gives, for each of the n pixel categories, the probability that each pixel belongs to that category, $\mathrm{Softmax}(\mathrm{Reshape}(p)) \times x$ is equivalent to extracting key information for each of the n categories from $x$, and using it to adjust the original general classifier $C$ so that it can adapt to different input contents.
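The computation of the target classifier can be sketched as a weighted average of pixel features per category. The Python below is a rough illustration under stated assumptions: the activation function is taken to be a softmax applied along the hw positions of each category row, the reshaped distribution is an n × hw nested list, and the features are an hw × d nested list. The function names are hypothetical.

```python
import math


def softmax(row):
    """Normalise one row so that its weights are positive and sum to one."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]  # subtract the max for stability
    total = sum(exps)
    return [e / total for e in exps]


def target_classifier(reshaped_p, features):
    """Multiply softmax(reshaped_p) (n x hw) by features (hw x d) to get n x d."""
    weights = [softmax(row) for row in reshaped_p]
    d = len(features[0])
    return [
        [sum(w * pixel[j] for w, pixel in zip(w_row, features)) for j in range(d)]
        for w_row in weights
    ]
```

Each output row is a d-dimensional vector for one category, so the result has the same [n, d] shape as a classifier with one vector per category.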
  • fusing the intermediate probability distribution with the image extraction features to obtain a target classifier corresponding to the image extraction features includes:
  • Step S702 Obtain the confidence threshold.
  • the confidence threshold can be a value used to filter out the contribution of feature values with low confidence to avoid introducing noise.
  • the server obtains the confidence threshold.
  • the target classifier needs to filter out probabilities in the intermediate probability distribution that are below 0.2, so a confidence threshold of 0.2 is preset in the server.
  • Step S704 Compare each probability in the intermediate probability distribution with the confidence threshold, and obtain a mask matrix based on the relationship between the probability and the confidence threshold.
  • the mask matrix may be a matrix used to remove the probabilities in the intermediate probability distribution that are smaller than the confidence threshold.
  • the size of the mask matrix is the same as that of the intermediate probability distribution.
  • the value of a position in the mask matrix is determined according to the relationship between the probability at the corresponding position in the intermediate probability distribution and the threshold. When the probability is greater than or equal to the threshold, the value of that position in the mask matrix is 1, otherwise it is 0.
  • the confidence threshold is set, and each pixel of each category in the intermediate probability distribution is compared with the confidence threshold to obtain a mask matrix used to remove noise.
  • the confidence threshold of the intermediate probability distribution of the target image is set to 0.5, then each probability in the intermediate probability distribution of the target image will be compared with the confidence threshold, and a mask matrix is generated.
  • Step S706 Screen the intermediate probability distribution based on the mask matrix to obtain a denoising probability distribution.
  • the denoised probability distribution may be a probability distribution obtained by removing noise from the intermediate probability distribution through a mask matrix. For example, each value in the mask matrix can be multiplied by the probability of the corresponding position in the intermediate probability distribution to obtain the denoising probability distribution.
  • the intermediate probability distribution is filtered through the mask matrix.
  • the probabilities that are less than the confidence threshold are removed, and the probabilities that are greater than or equal to the confidence threshold are retained.
  • the probability distribution composed of the retained probabilities is the denoising probability distribution.
  • the mask matrix formed by setting the confidence threshold to 0.5 is used to filter the intermediate probability distribution.
  • the pixels of the intermediate probability distribution have the following probabilities: 0.3, 0.4, 0.5, 0.6, 0.7, and 0.8. The two probabilities 0.3 and 0.4 are removed through the mask matrix, and the corresponding pixels are not included in the denoising probability distribution.
  • Step S708 Multiply the denoising probability distribution and the image extraction features to obtain a target classifier.
  • the denoised probability distribution and image extraction features are multiplied, and the result of the multiplication is the target classifier.
  • the specific expression is as follows:
  • $M$ is the denoising probability distribution, of size [n, hw], the same size as the intermediate probability distribution.
  • sum(-1) is the sum over the hw values in $M$.
  • $h$ and $w$ are the height and width of the image feature, respectively, $x$ is the image extraction feature, and $G_c(x)$ is the target classifier.
  • probabilities with insufficient confidence are removed, which reduces the noise in image semantic segmentation, improves the performance of the image semantic segmentation model, and raises the success rate of image semantic segmentation.
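Steps S702 to S708 can be sketched in Python as follows. Several points are assumptions rather than statements from the text: the mask keeps probabilities that are greater than or equal to the threshold (so that 0.5 survives a 0.5 threshold, as in the worked example), and each row of the product is divided by the row sum of M, which is one plausible reading of the role of sum(-1). All function names are hypothetical.

```python
def build_mask(intermediate, threshold):
    """Step S704: 1 where the probability reaches the threshold, otherwise 0."""
    return [[1 if p >= threshold else 0 for p in row] for row in intermediate]


def denoise(intermediate, mask):
    """Step S706: element-wise product of the mask and the intermediate distribution."""
    return [
        [m * p for m, p in zip(m_row, p_row)]
        for m_row, p_row in zip(mask, intermediate)
    ]


def masked_target_classifier(denoised, features):
    """Step S708: multiply M (n x hw) by x (hw x d), one row of M at a time.

    Dividing by the row sum of M is an assumption about sum(-1); rows whose
    probabilities were all masked out are left as zeros.
    """
    d = len(features[0])
    result = []
    for row in denoised:
        total = sum(row)
        weighted = [sum(w * f[j] for w, f in zip(row, features)) for j in range(d)]
        result.append([v / total for v in weighted] if total else weighted)
    return result
```

Applied to the probabilities 0.3, 0.4, 0.5, 0.6, 0.7 and 0.8 with a threshold of 0.5, the mask zeroes the first two entries and keeps the remaining four.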
  • the classifier of this semantic segmentation model is adaptively changed based on the image extraction features of the target image.
  • the processing based on the general classifier and the target classifier can enhance the classifier's adaptive perception ability of different picture contents.
  • the accuracy of obtaining the target pixel category corresponding to the target pixel is improved, thereby improving the accuracy of image semantic segmentation.
  • embodiments of the present application also provide an image semantic segmentation device for implementing the above-mentioned image semantic segmentation method.
  • the solution to the problem provided by this device is similar to the solution recorded in the above method. Therefore, for the specific limitations in the one or more image semantic segmentation device embodiments provided below, please refer to the limitations of the image semantic segmentation method described above, which will not be repeated here.
  • an image semantic segmentation device including: a target image acquisition module, an image extraction feature extraction module, an intermediate probability distribution obtaining module, a target classifier obtaining module, a target probability distribution obtaining module, and a target pixel category obtaining module, where:
  • the target image acquisition module 802 is used to acquire the target image.
  • the image extraction feature extraction module 804 is used to perform feature extraction on the target image to obtain image extraction features corresponding to the target image.
  • the intermediate probability distribution obtaining module 806 is used to process the image extraction features based on the first classifier to obtain an intermediate probability distribution.
  • the intermediate probability distribution includes the probability that each pixel in the target image belongs to each pixel category.
  • the target classifier obtaining module 808 is used to fuse the intermediate probability distribution with the image extraction features to obtain a target classifier corresponding to the image extraction features.
  • the target probability distribution obtaining module 810 is used to process the image extraction features based on the target classifier and the first classifier to obtain the target probability distribution corresponding to each pixel category.
  • the target probability distribution includes the probability that each pixel in the target image belongs to each pixel category.
  • the target pixel category obtaining module 812 is used to obtain the target pixel category corresponding to each pixel point in the target image based on the target probability distribution.
  • the target classifier obtaining module is configured to: obtain the first weighting coefficient corresponding to the target classifier and the second weighting coefficient corresponding to the general classifier; weight the target classifier based on the first weighting coefficient to obtain the weighted target classifier; weight the general classifier based on the second weighting coefficient to obtain the weighted general classifier; combine the weighted general classifier and the weighted target classifier to obtain the comprehensive classifier, and use the comprehensive classifier to process the image extraction features to obtain the target probability distribution corresponding to each pixel category.
  • the target classifier obtaining module is configured to: obtain the classifier similarity between the general classifier and the target classifier, and obtain the first weighting coefficient based on the classifier similarity, the first weighting coefficient being positively correlated with the classifier similarity; and obtain the second weighting coefficient based on the first weighting coefficient, the second weighting coefficient being negatively correlated with the first weighting coefficient.
  • the target classifier obtaining module is configured to: obtain a sensitivity coefficient; and multiply the sensitivity coefficient by the classifier similarity to obtain the first weighting coefficient.
  • the intermediate probability distribution obtaining module is configured to: process the image extraction features based on the first classifier to obtain an initial probability distribution, where the initial probability distribution includes an initial probability matrix corresponding to each pixel category; shift the elements in the initial probability matrix to the same row or column to obtain the shift probability matrix; and splice the shift probability matrices corresponding to each pixel category to obtain the intermediate probability distribution.
  • the calculation method corresponding to the target classifier is: shifting and splicing the initial probability distribution obtained by processing the image extraction features, and using an activation function to activate the spliced initial probability distribution to obtain the intermediate probability distribution; and performing a matrix multiplication operation on the intermediate probability distribution and the image extraction features to obtain the target classifier.
  • the target classifier obtaining module is configured to: obtain a confidence threshold; compare each probability in the intermediate probability distribution with the confidence threshold, and obtain a mask matrix based on the relationship between the probability and the confidence threshold; screen the intermediate probability distribution based on the mask matrix to obtain the denoising probability distribution; and multiply the denoising probability distribution by the image extraction features to obtain the target classifier.
  • Each module in the above-mentioned image semantic segmentation device can be implemented in whole or in part by software, hardware, and combinations thereof.
  • Each of the above modules may be embedded in or independent of the processor of the computer device in the form of hardware, or may be stored in the memory of the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure diagram may be as shown in Figure 11.
  • the computer device includes a processor, memory, and network interfaces connected through a system bus. Wherein, the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes non-volatile storage media and internal memory.
  • the non-volatile storage medium stores operating systems, computer programs and databases. This internal memory provides an environment for the execution of operating systems and computer programs in non-volatile storage media.
  • the computer device's database is used to store server data.
  • the network interface of the computer device is used to communicate with external terminals through a network connection.
  • the computer program implements an image semantic segmentation method when executed by a processor.
  • Figure 11 is only a block diagram of the part of the structure related to the solution of the present application, and does not constitute a limitation on the computer equipment to which the solution of the present application is applied.
  • specific computer equipment may include more or fewer parts than shown, combine certain parts, or have a different arrangement of parts.
  • a computer device including a memory and a processor.
  • a computer program is stored in the memory.
  • when the processor executes the computer program, it implements the steps in the above method embodiments.
  • a computer-readable storage medium which stores a computer program.
  • when the computer program is executed by a processor, the steps in the above method embodiments are implemented.
  • a computer program product or computer program includes computer instructions stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device performs the steps in the above method embodiments.
  • the user information including but not limited to user equipment information, user personal information, etc.
  • data including but not limited to data used for analysis, stored data, displayed data, etc.
  • the computer program can be stored in a non-volatile computer-readable storage medium.
  • the computer program when executed, may include the processes of the above method embodiments.
  • Any reference to memory, database or other media used in the embodiments provided in this application may include at least one of non-volatile and volatile memory.
  • Non-volatile memory can include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, etc.
  • Volatile memory may include random access memory (Random Access Memory, RAM) or external cache memory, etc.
  • the databases involved in the various embodiments provided in this application may include at least one of a relational database and a non-relational database.
  • Non-relational databases may include blockchain-based distributed databases, etc., but are not limited thereto.
  • the processors involved in the various embodiments provided in this application may be general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, quantum computing-based data processing logic devices, etc., and are not limited to this.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

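This application relates to an image semantic segmentation method, device, computer device and storage medium. The method includes: acquiring a target image; performing feature extraction on the target image to obtain image extraction features corresponding to the target image; processing the image extraction features based on a first classifier to obtain an intermediate probability distribution; fusing the intermediate probability distribution with the image extraction features to obtain a target classifier corresponding to the image extraction features; processing the image extraction features based on the target classifier and the first classifier to obtain a target probability distribution corresponding to each pixel category; and obtaining, based on the target probability distribution, the target pixel category corresponding to each pixel in the target image. The method can improve the accuracy of image semantic segmentation.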

Description

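Semantic segmentation method, device, computer device and storage medium
This application claims priority to Chinese patent application No. 202210438646.2, filed on April 25, 2022 and entitled "Semantic segmentation method, device, computer device and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of artificial intelligence technology, and in particular to a semantic segmentation method, device, computer device and storage medium.
Background
With the development of artificial intelligence technology, image recognition has become more and more widely used. For example, for images that need to be segmented, using an image semantic segmentation model greatly improves segmentation efficiency.
In the related art, an image can be input into the feature extractor and the classifier of an image semantic segmentation model for segmentation, yielding the category corresponding to each pixel. However, current image semantic segmentation models use one general classifier for all samples, which cannot characterize the feature distributions of different samples well and results in low segmentation accuracy.
Summary
In view of the above technical problem, it is necessary to provide a semantic segmentation method, device, computer device and storage medium that can improve semantic segmentation accuracy.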
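In a first aspect, this application provides an image semantic segmentation method. The method includes: acquiring a target image; performing feature extraction on the target image to obtain image extraction features corresponding to the target image; processing the image extraction features based on a first classifier to obtain an intermediate probability distribution, the intermediate probability distribution including the probability that each pixel in the target image belongs to each pixel category; fusing the intermediate probability distribution with the image extraction features to obtain a target classifier corresponding to the image extraction features; processing the image extraction features based on the target classifier and the first classifier to obtain a target probability distribution corresponding to each pixel category, the target probability distribution including the probability that each pixel in the target image belongs to each pixel category; and obtaining, based on the target probability distribution, the target pixel category corresponding to each pixel in the target image.
In some embodiments, processing the image extraction features based on the target classifier and the first classifier to obtain the target probability distribution corresponding to each pixel category includes: obtaining a first weighting coefficient corresponding to the target classifier and a second weighting coefficient corresponding to the general classifier; weighting the target classifier based on the first weighting coefficient to obtain a weighted target classifier; weighting the general classifier based on the second weighting coefficient to obtain a weighted general classifier; and combining the weighted general classifier and the weighted target classifier to obtain a comprehensive classifier, and processing the image extraction features with the comprehensive classifier to obtain the target probability distribution corresponding to each pixel category.
In some embodiments, obtaining the first weighting coefficient corresponding to the target classifier and the second weighting coefficient corresponding to the general classifier includes: obtaining the classifier similarity between the general classifier and the target classifier, and obtaining the first weighting coefficient based on the classifier similarity, the first weighting coefficient being positively correlated with the classifier similarity; and obtaining the second weighting coefficient based on the first weighting coefficient, the second weighting coefficient being negatively correlated with the first weighting coefficient.
In some embodiments, obtaining the first weighting coefficient based on the classifier similarity includes: obtaining a sensitivity coefficient; and multiplying the sensitivity coefficient by the classifier similarity to obtain the first weighting coefficient.
In some embodiments, processing the image extraction features based on the first classifier to obtain the intermediate probability distribution includes: processing the image extraction features based on the general classifier in the image semantic segmentation model to obtain an initial probability distribution, the initial probability distribution including an initial probability matrix corresponding to each pixel category; shifting the elements in the initial probability matrix to the same row or the same column to obtain a shift probability matrix; and splicing the shift probability matrices corresponding to each pixel category to obtain the intermediate probability distribution.
In some embodiments, the calculation method corresponding to the target classifier includes: performing a matrix multiplication operation on the intermediate probability distribution and the image extraction features to obtain the target classifier; the intermediate probability distribution is obtained by shifting and splicing the initial probability distribution and applying an activation function to the spliced initial probability distribution, and the initial probability distribution is obtained by processing the image extraction features.
In some embodiments, fusing the intermediate probability distribution with the image extraction features to obtain the target classifier corresponding to the image extraction features includes: obtaining a confidence threshold; comparing each probability in the intermediate probability distribution with the confidence threshold, and obtaining a mask matrix based on the relationship between the probability and the confidence threshold; screening the intermediate probability distribution based on the mask matrix to obtain a denoising probability distribution; and multiplying the denoising probability distribution by the image extraction features to obtain the target classifier corresponding to the image extraction features.
In some embodiments, screening the intermediate probability distribution based on the mask matrix to obtain the denoising probability distribution includes:
multiplying each value in the mask matrix by the probability at the corresponding position in the intermediate probability distribution to obtain the denoising probability distribution.
In some embodiments, processing the image extraction features based on the first classifier to obtain the intermediate probability distribution includes:
performing dimensionality reduction on the image extraction features based on the first classifier to obtain an initial probability distribution;
shifting and normalizing the probability values in the initial probability distribution to obtain the intermediate probability distribution.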
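In a second aspect, this application further provides an image semantic segmentation device, including: a target image acquisition module, configured to acquire a target image; an image extraction feature extraction module, configured to perform feature extraction on the target image to obtain image extraction features corresponding to the target image; an intermediate probability distribution obtaining module, configured to process the image extraction features based on a first classifier to obtain an intermediate probability distribution, the intermediate probability distribution including the probability that each pixel in the target image belongs to each pixel category; a target classifier obtaining module, configured to fuse the intermediate probability distribution with the image extraction features to obtain a target classifier corresponding to the image extraction features; a target probability distribution obtaining module, configured to process the image extraction features based on the target classifier and the first classifier to obtain a target probability distribution corresponding to each pixel category, the target probability distribution including the probability that each pixel in the target image belongs to each pixel category; and a target pixel category obtaining module, configured to obtain, based on the target probability distribution, the target pixel category corresponding to each pixel in the target image.
In some embodiments, the target classifier obtaining module is configured to: obtain a first weighting coefficient corresponding to the target classifier and a second weighting coefficient corresponding to the general classifier; weight the target classifier based on the first weighting coefficient to obtain a weighted target classifier; weight the general classifier based on the second weighting coefficient to obtain a weighted general classifier; and combine the weighted general classifier and the weighted target classifier to obtain a comprehensive classifier, and process the image extraction features with the comprehensive classifier to obtain the target probability distribution corresponding to each pixel category.
In some embodiments, the target classifier obtaining module is configured to: obtain the classifier similarity between the general classifier and the target classifier, and obtain the first weighting coefficient based on the classifier similarity, the first weighting coefficient being positively correlated with the classifier similarity; and obtain the second weighting coefficient based on the first weighting coefficient, the second weighting coefficient being negatively correlated with the first weighting coefficient.
In some embodiments, the target classifier obtaining module is configured to: obtain a sensitivity coefficient; and multiply the sensitivity coefficient by the classifier similarity to obtain the first weighting coefficient.
In some embodiments, the intermediate probability distribution obtaining module is configured to: process the image extraction features based on the general classifier in the image semantic segmentation model to obtain an initial probability distribution, the initial probability distribution including an initial probability matrix corresponding to each pixel category; shift the elements in the initial probability matrix to the same row or the same column to obtain a shift probability matrix; and splice the shift probability matrices corresponding to each pixel category to obtain the intermediate probability distribution.
In some embodiments, the calculation method corresponding to the target classifier includes: shifting and splicing the initial probability distribution obtained by processing the image extraction features, and applying an activation function to the spliced initial probability distribution to obtain the intermediate probability distribution; and performing a matrix multiplication operation on the intermediate probability distribution and the image extraction features to obtain the target classifier.
In some embodiments, the target classifier obtaining module is configured to: obtain a confidence threshold; compare each probability in the intermediate probability distribution with the confidence threshold, and obtain a mask matrix based on the relationship between the probability and the confidence threshold; screen the intermediate probability distribution based on the mask matrix to obtain a denoising probability distribution; and multiply the denoising probability distribution by the image extraction features to obtain the target classifier.
In some embodiments, the target classifier obtaining module is configured to multiply each value in the mask matrix by the probability at the corresponding position in the intermediate probability distribution to obtain the denoising probability distribution.
In some embodiments, the intermediate probability distribution obtaining module is configured to perform dimensionality reduction on the image extraction features based on the first classifier to obtain an initial probability distribution, and to shift and normalize the probability values in the initial probability distribution to obtain the intermediate probability distribution.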
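In a third aspect, this application further provides a computer device. The computer device includes a memory and a processor, the memory stores a computer program, and the processor implements the steps of the above image semantic segmentation method when executing the computer program.
In a fourth aspect, this application further provides a computer-readable storage medium. The computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps of the above image semantic segmentation method.
In a fifth aspect, this application further provides a computer program product. The computer program product includes a computer program which, when executed by a processor, implements the steps of the above image semantic segmentation method.
The above image semantic segmentation method, device, computer device, storage medium and computer program product acquire a target image; perform feature extraction on the target image to obtain image extraction features corresponding to the target image; process the image extraction features based on a first classifier to obtain an intermediate probability distribution corresponding to each pixel category, the intermediate probability distribution including the probability that each pixel in the target image belongs to each pixel category; fuse the intermediate probability distribution with the image extraction features to obtain a target classifier corresponding to the image extraction features; process the image extraction features based on the target classifier and the first classifier to obtain a target probability distribution corresponding to each pixel category, the target probability distribution including the probability that each pixel in the target image belongs to each pixel category; and obtain, based on the target probability distribution, the target pixel category corresponding to each pixel in the target image. A feature extractor extracts the feature vectors of the target image to obtain the image extraction features; the general classifier then processes the image extraction features to produce the intermediate probability distribution used to build the target classifier; the intermediate probability distribution is then fused with the image extraction features to yield the target classifier. The target classifier therefore changes adaptively with the image extraction features of the target image. Processing based on both the general classifier and the target classifier strengthens the classifier's adaptive awareness of different image content and improves the accuracy of the target pixel category obtained for each pixel, thereby improving the accuracy of image semantic segmentation.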
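Brief Description of the Drawings
Figure 1 is an application environment diagram of the image semantic segmentation method in one embodiment;
Figure 2 is a schematic flowchart of the image semantic segmentation method in one embodiment;
Figure 3 is a schematic comparison of simulation results for the PANet model in one embodiment;
Figure 4 is a schematic comparison of simulation results for the PFENet model in one embodiment;
Figure 5 is a schematic flowchart of image semantic segmentation steps in one embodiment;
Figure 6 is a schematic flowchart of image semantic segmentation steps in another embodiment;
Figure 7 is a schematic flowchart of image semantic segmentation steps in yet another embodiment;
Figure 8 is a schematic flowchart of image semantic segmentation steps in still another embodiment;
Figure 9 is a schematic flowchart of image semantic segmentation steps in a further embodiment;
Figure 10 is a structural block diagram of the image semantic segmentation device in one embodiment;
Figure 11 is an internal structure diagram of the computer device in one embodiment.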
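Detailed Description
To make the purpose, technical solutions and advantages of this application clearer, the application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain this application and are not intended to limit it.
The image semantic segmentation method provided by the embodiments of this application can be applied in the application environment shown in Figure 1, in which a terminal 102 communicates with a server 104 over a network. A data storage system can store the data that the server 104 needs to process; it can be integrated on the server 104, or placed in the cloud or on another network server. In response to a receiving operation, the terminal 102 sends an image semantic segmentation instruction to the server, and the server 104 acquires a target image; performs feature extraction on the target image to obtain image extraction features corresponding to the target image; processes the image extraction features based on a first classifier to obtain an intermediate probability distribution corresponding to each pixel category, the intermediate probability distribution including the probability that each pixel in the target image belongs to each pixel category; fuses the intermediate probability distribution with the image extraction features to obtain a target classifier corresponding to the image extraction features; processes the image extraction features based on the target classifier and the first classifier to obtain a target probability distribution corresponding to each pixel category, the target probability distribution including the probability that each pixel in the target image belongs to each pixel category; and obtains, based on the target probability distribution, the target pixel category corresponding to each pixel in the target image. The terminal 102 may be, but is not limited to, a personal computer, laptop, smartphone, tablet, Internet of Things device or portable wearable device. The Internet of Things device may be a smart speaker, smart TV, smart air conditioner, smart in-vehicle device, etc. The portable wearable device may be a smart watch, smart bracelet, head-mounted device, etc. The server 104 may be implemented as a standalone server or as a server cluster composed of multiple servers.
In some embodiments, as shown in Figure 2, an image semantic segmentation method is provided. Taking its application to the server in Figure 1 as an example, the method includes the following steps:
Step S202: acquire a target image.
The target image may be an image on which an artificial intelligence model performs image semantic segmentation; the model segments the image semantically while its parameters change adaptively.
Specifically, the server may acquire one or more images that need to be segmented with the image semantic segmentation model.
In some embodiments, a camera takes a photo of a billboard, and the photo needs to be semantically segmented with the image semantic segmentation model; the server then acquires the photo from the camera terminal.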
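Step S204: perform feature extraction on the target image to obtain image extraction features corresponding to the target image.
Feature extraction may be performed on the target image by the feature extractor in the image semantic segmentation model. The image semantic segmentation model classifies an image at the pixel level; it must be aware of the surroundings and content of each pixel in order to decide which category a single pixel belongs to. The feature extractor may be a function that extracts features from an image and represents them as feature vectors. The image extraction features may be the vectors, obtained by passing the target image through the feature extractor, that represent the features of the image. The feature extractor may be based on a convolutional neural network (Convolutional Neural Networks, CNN), a class of feedforward neural networks that involve convolution and have a deep structure. The image semantic segmentation model contains a feature extractor and a general classifier: the feature extractor extracts the feature vectors of the image, and the general classifier classifies the feature vectors to obtain the probability of each pixel for each category.
Specifically, after obtaining the target image, the feature extractor performs feature extraction on it to obtain the image extraction features that represent the features of the image.
In some embodiments, the target image contains a cat, a dog and background. After obtaining the target image, the feature extractor extracts features for the cat, the dog and the background, and represents them as vectors.
Step S206: process the image extraction features based on the first classifier to obtain an intermediate probability distribution corresponding to each pixel category, the intermediate probability distribution including the probability that each pixel in the target image belongs to each pixel category.
The image extraction features may be processed by the first classifier. The general classifier may be trained on many training images, so it is common to every image, and its classification result is the probability that each pixel of the image belongs to each category. The intermediate probability distribution may be a vector obtained by transforming the image extraction features with the general classifier.
Specifically, let [h,w,d] denote the height, width and number of feature channels of the extracted image extraction features, and let n denote the number of categories to be predicted. The initial probability distribution p then has size [h,w,n], giving the membership probability of each pixel for the n categories, and the general classifier has size [n,d], that is, each of the n categories is represented by a d-dimensional vector. The initial probability distribution p, originally of size [h,w,n], is adjusted by a reshape operation to size [hw,n], and further adjusted to size [n,hw] by exchanging rows and columns. After the reshape adjustment, softmax is applied. Softmax is a function that produces normalized classification probabilities: given a vector whose elements are arbitrary real numbers, it outputs probability values. Here softmax acts on the second dimension (of size hw), so that the result becomes a weighting matrix over x for the n categories (the weights over the hw spatial positions sum to 1), which gives the intermediate probability distribution.
In some embodiments, the target image contains a dog. The general classifier in the image semantic segmentation model reshapes the probabilities of each pixel of the dog into a matrix of size [hw,n], and then, by exchanging its elements, obtains the corresponding matrix of size [n,hw].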
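Step S208: fuse the intermediate probability distribution with the image extraction features to obtain a target classifier corresponding to the image extraction features.
The target classifier may be the function obtained by multiplying the intermediate probability distribution with the image extraction features. The general classifier and the target classifier are defined relative to each other: the general classifier is trained on different training images and is shared across images, whereas the target classifier is obtained from the image extraction features of one image and the corresponding intermediate probability distribution, and is specific to that target image.
Specifically, a matrix multiplication is performed on the intermediate probability distribution and the extracted image features (of size [hw,d]) to obtain the target classifier $G_c(x)$ (of size [n,d]). The reshape, softmax and matrix multiplication steps are expressed as follows:
$G_c(x) = \mathrm{Softmax}(\mathrm{Reshape}(p)) \times x$
Here $x$ is the image extraction feature, and $G_c(x)$ denotes the category information extracted from the image feature $x$, that is, the target classifier, whose size is the same as that of the general classifier.
In some embodiments, the probabilities of each pixel of the dog image pass through Reshape: [h,w,n] is reduced in dimension to the matrix [hw,n], which is shifted to obtain [n,hw]. Softmax is applied to the second dimension of the resulting [n,hw] matrix, so that it becomes a weighting matrix over x for the n categories (the weights over the hw spatial positions sum to 1). A matrix multiplication is then performed on this matrix and the image extraction features of the dog.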
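Step S210: process the extracted image features with the target classifier and the first classifier to obtain a target probability distribution for each pixel category, the target probability distribution including the probability that each pixel in the target image belongs to each pixel category.
The target probability distribution may be the per-category probability distribution obtained by classifying the target image with the target classifier and the general classifier in the first classifier; there may be many pixel categories.
Specifically, the target classifier and the general classifier together form a composite classifier, which automatically changes its parameters as the extracted image features change, improving model performance. The extracted image features are input to the target classifier and the general classifier, which classify the features at each pixel, and the target probability distribution for each pixel category is output. The composite classifier C_x depends on the content of the data itself and can be written as:
$$C_x = (1-\gamma)\,C + \gamma\,G_c(x)$$
where γ is a weighting coefficient that balances the original classifier against the features newly introduced into the classifier, thereby constructing an adaptive classifier.
In some embodiments, the target classifier G_c(x) and the general classifier C form the composite classifier C_x. The feature vectors of the target image are input to C_x, and processing by C_x yields the probability of each pixel of the target image for each category.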
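A minimal sketch of the blend C_x = (1-γ)*C + γ*G_c(x) with a fixed γ is shown below. The text does not spell out how a classifier of size [n, d] is applied to a pixel feature, so the scoring rule in `classify_pixels` (dot product per category followed by a softmax over categories) is an assumption, as are both function names.

```python
import math


def blend_classifiers(general, target, gamma):
    """C_x = (1 - gamma) * C + gamma * G_c(x), element-wise on [n][d] lists."""
    return [
        [(1.0 - gamma) * c + gamma * g for c, g in zip(c_row, g_row)]
        for c_row, g_row in zip(general, target)
    ]


def classify_pixels(features, classifier):
    """Assumed scoring rule: per-category dot product, then softmax over categories.

    features: [hw][d] pixel feature vectors; classifier: [n][d].
    Returns per-pixel probabilities of shape [hw][n].
    """
    probs = []
    for feat in features:
        scores = [sum(f * c for f, c in zip(feat, row)) for row in classifier]
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        probs.append([e / total for e in exps])
    return probs


if __name__ == "__main__":
    general = [[1.0, 0.0], [0.0, 1.0]]
    target = [[0.0, 1.0], [1.0, 0.0]]
    composite = blend_classifiers(general, target, gamma=0.25)
    print(classify_pixels([[2.0, 0.0]], composite))
```

With γ = 0 the composite classifier reduces to the general classifier C, and with γ = 1 it is replaced entirely by the image-specific G_c(x).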
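Step S212: obtain, from the target probability distribution, the target pixel category of each pixel in the target image.
The target pixel category may be the category into which a pixel of the target image is classified. An image may contain pixels of different categories, and image segmentation groups pixels of the same category into the same region.
Specifically, after processing by the target classifier and the general classifier, each pixel has a probability for each category. The category with the highest probability is selected as the pixel's target pixel category; once the categories are obtained, the image is segmented into multiple regions.
In some embodiments, pixel B has probability 0.07 of belonging to category 1, 0.1 of belonging to category 2, and 0.83 of belonging to category 3. Category 3 has the highest probability, so pixel B is output as category 3.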
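Selecting the most probable category per pixel, as in the pixel B example, amounts to an argmax over each pixel's probabilities. The sketch below uses zero-based category indices and the hypothetical helper names `pixel_classes` and `label_map`; the row-major pixel order is an assumption.

```python
def pixel_classes(target_probs):
    """Return, for each pixel, the zero-based index of its most probable category."""
    return [max(range(len(row)), key=row.__getitem__) for row in target_probs]


def label_map(classes, height, width):
    """Arrange a flat list of per-pixel categories into a [height][width] map."""
    if len(classes) != height * width:
        raise ValueError("number of pixels does not match height * width")
    return [classes[r * width:(r + 1) * width] for r in range(height)]


if __name__ == "__main__":
    pixel_b = [0.07, 0.1, 0.83]          # probabilities for categories 1, 2, 3
    print(pixel_classes([pixel_b]))      # index 2 corresponds to category 3
```

Pixels that share a label in the resulting map form the segmented regions.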
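In some embodiments, for generic semantic segmentation, applying this method to a ResNet-50-based PSPNet segmentation model improves performance over the existing technique. For few-shot segmentation, applying it to the PANet and PFENet models yields clear improvements across the dataset splits. As shown in FIG. 3 and FIG. 4, Fold-0 to Fold-3 are dataset splits and Mean is the average over them; Methods lists the models, where PANet and PFENet are existing models and PANet+Ours and PFENet+Ours are models that apply the embodiments of this application on top of PANet and PFENet; 1-shot denotes one-example learning and 5-shot denotes five-example learning. The numbers in the figures are the ratio of the intersection to the union of the ground-truth set and the predicted set, reported as mIoU.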
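In the above image semantic segmentation method, a target image is obtained; feature extraction is performed on the target image to obtain the corresponding extracted image features; the extracted image features are processed with the first classifier to obtain an intermediate probability distribution for each pixel category, the intermediate probability distribution including the probability that each pixel in the target image belongs to each pixel category; the intermediate probability distribution is fused with the extracted image features to obtain the target classifier corresponding to the extracted image features; the extracted image features are processed with the target classifier and the first classifier to obtain a target probability distribution for each pixel category, the target probability distribution including the probability that each pixel in the target image belongs to each pixel category; and the target pixel category of each pixel in the target image is obtained from the target probability distribution. The feature extractor extracts the feature vectors of the target image to give the extracted image features; the general classifier then processes these features to give the intermediate probability distribution used to construct the target classifier; the intermediate probability distribution is then fused with the extracted image features to give the target classifier. The target classifier therefore changes adaptively with the extracted image features of the target image. Processing with both the general classifier and the target classifier strengthens the classifier's adaptive awareness of different image content and improves the accuracy of the target pixel category obtained for each pixel, thereby improving the accuracy of image semantic segmentation. In other words, for different images the target classifier adapts to the image content, so it can better characterize the feature distribution of each image and improve pixel-level classification accuracy.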
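In some embodiments, as shown in FIG. 5, the first classifier includes the general classifier in the image semantic segmentation model, and processing the extracted image features with the target classifier and the first classifier to obtain the target probability distribution for each pixel category includes:
Step S302: obtain a first weighting coefficient corresponding to the target classifier and a second weighting coefficient corresponding to the general classifier.
The first weighting coefficient may be the weight of the target classifier within the composite classifier, and the second weighting coefficient may be the weight of the general classifier within the composite classifier. The two coefficients may be preset fixed values or may be obtained adaptively based on the target classifier.
Specifically, the server obtains the first weighting coefficient for the target classifier and the second weighting coefficient for the general classifier. The two coefficients are negatively correlated and sum to 1.
In some embodiments, the server obtains a first weighting coefficient of 0.7; because the two coefficients sum to 1, the server obtains a second weighting coefficient of 0.3.
Step S304: weight the target classifier by the first weighting coefficient to obtain a weighted target classifier.
The weighted target classifier may be the target classifier after multiplication by the first weighting coefficient.
Specifically, the first weighting coefficient is a weight and the target classifier is a classifier containing the model parameters used for classification; multiplying the two gives the weighted target classifier, expressed as γ × G_c(x).
In some embodiments, the first weighting coefficient is 0.7 and the target classifier is G_c(x); multiplying them gives the weighted target classifier 0.7 G_c(x).
Step S306: weight the general classifier by the second weighting coefficient to obtain a weighted general classifier.
The weighted general classifier may be the general classifier after multiplication by the second weighting coefficient.
Specifically, the second weighting coefficient is a weight and the general classifier is a classifier; multiplying the two gives the weighted general classifier, expressed by the formula (1-γ) × C.
In some embodiments, the second weighting coefficient is 0.3 and the general classifier is C; multiplying them gives the weighted general classifier 0.3 C.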
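Step S308: combine the weighted general classifier and the weighted target classifier to obtain the composite classifier, and process the extracted image features with the composite classifier to obtain the target probability distribution for each pixel category.
The composite classifier may be the sum of the weighted general classifier and the weighted target classifier.
Specifically, the extracted image features are processed by the composite classifier formed from the general classifier and the target classifier to obtain the target probability distribution over the pixel categories, as follows:
$$C_x = (1-\gamma)\,C + \gamma\,G_c(x) = \bigl(1 - \alpha\,\mathrm{Cos}(C, G_c(x))\bigr)\,C + \alpha\,\mathrm{Cos}(C, G_c(x))\,G_c(x)$$
where γ is the first weighting coefficient, 1-γ is the second weighting coefficient, C_x is the composite classifier, and α is the sensitivity coefficient. The value of γ can be determined from the cosine similarity between G_c(x) and C, with which it is positively correlated. For example, if the cosine similarity Cos(C, G_c(x)) is high, G_c(x) is more trustworthy and can replace a larger share of C so as to predict the current sample features x accurately. Conversely, if the cosine similarity is low, the category information introduced from the current sample differs considerably from the original classifier, and introducing it may harm the model's original generalization ability, so the weight of G_c(x) should be reduced accordingly.
In some embodiments, the extracted features of an image containing a cat, a dog, and background are processed by the general classifier and the target classifier, giving for a dog pixel a probability of 0.02 for pixel category 1, 0.04 for pixel category 2, and 0.94 for pixel category 3.
In this embodiment, processing the extracted image features with both the general classifier and the target classifier gives a target probability distribution that has been processed twice, so the pixels of the target image can be classified more accurately.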
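The similarity-weighted blend can be sketched as below. The text does not say whether Cos(C, G_c(x)) is computed once over the whole classifier or separately for each category; this sketch assumes one cosine similarity per category row, and the names `cosine` and `adaptive_blend` are hypothetical. A negative similarity would give a negative γ, which the sketch leaves unclamped because the text does not address that case.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def adaptive_blend(general, target, alpha=1.0):
    """C_x = (1 - gamma) * C + gamma * G_c(x), with gamma = alpha * Cos(C, G_c(x)).

    Assumption: gamma is computed independently for each category row.
    general, target: [n][d] nested lists; returns [n][d].
    """
    blended = []
    for c_row, g_row in zip(general, target):
        gamma = alpha * cosine(c_row, g_row)
        blended.append(
            [(1.0 - gamma) * c + gamma * g for c, g in zip(c_row, g_row)]
        )
    return blended


if __name__ == "__main__":
    general = [[1.0, 0.0], [1.0, 0.0]]
    target = [[2.0, 0.0], [0.0, 1.0]]
    print(adaptive_blend(general, target, alpha=1.0))
```

In the demo, the first category's target row points in the same direction as the general row, so it is adopted fully; the second is orthogonal, so the general row is kept unchanged.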
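In some embodiments, as shown in FIG. 6, obtaining the first weighting coefficient corresponding to the target classifier and the second weighting coefficient corresponding to the general classifier includes:
Step S402: obtain the classifier similarity between the general classifier and the target classifier, and obtain the first weighting coefficient from the classifier similarity, the first weighting coefficient being positively correlated with the classifier similarity.
The classifier similarity may be the similarity between the general classifier and the target classifier. The higher the similarity, the more trustworthy the target classifier, and the larger the share of the general classifier that it can replace.
Specifically, a similarity is computed between the general classifier and the target classifier; the result is the classifier similarity, from which the first weighting coefficient can be obtained. The first weighting coefficient may be positively correlated with the classifier similarity; for example, the classifier similarity may be used directly as the first weighting coefficient.
In some embodiments, cosine similarity is used to express how similar the general classifier and the target classifier are; the computed cosine similarity is 0.8, so the first weighting coefficient is 0.8.
Step S404: obtain the second weighting coefficient from the first weighting coefficient, the second weighting coefficient being negatively correlated with the first weighting coefficient.
Specifically, the first and second weighting coefficients are negatively correlated, for example summing to 1, so once the first weighting coefficient is obtained the second follows.
In some embodiments, the first weighting coefficient is 0.8 and the two coefficients sum to 1, so the second weighting coefficient is 0.2.
In this embodiment, introducing the classifier similarity and relating it to the first weighting coefficient gives the adjustment of the first weighting coefficient a firmer basis and makes it easier to arrive at a better weighting coefficient.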
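In some embodiments, as shown in FIG. 7, obtaining the first weighting coefficient from the classifier similarity includes:
Step S502: obtain a sensitivity coefficient.
The sensitivity coefficient may be the sensitivity of the composite classifier to the classifier similarity; adjusting the sensitivity coefficient changes the values of the first and second weighting coefficients.
Specifically, a sensitivity coefficient is preset on the server, and the server obtains it; for example, the sensitivity coefficient may be 1.
In some embodiments, 0.75 is input to the server as the sensitivity coefficient of the composite classifier.
Step S504: multiply the sensitivity coefficient by the classifier similarity to obtain the first weighting coefficient.
Specifically, the classifier similarity is a value that expresses similarity; multiplying it by the sensitivity coefficient gives the first weighting coefficient.
In some embodiments, the sensitivity coefficient is 0.75 and the classifier similarity is 0.8; multiplying them gives a first weighting coefficient of 0.6.
In this embodiment, the first weighting coefficient is obtained by multiplying the sensitivity coefficient by the classifier similarity, which adjusts the share taken by the first weighting coefficient; when the classifier similarity is relatively high, the weight of the target classifier can be adjusted as needed. For example, a correspondence between similarity and sensitivity coefficient can be set so that when the similarity is below a preset threshold, the value of the sensitivity coefficient is increased.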
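The arithmetic of steps S502 to S504, together with the complementary second coefficient from step S404, can be written as a short helper. The function name `weighting_coefficients` is hypothetical, and the sketch assumes the two coefficients sum to 1, as in the examples above.

```python
def weighting_coefficients(similarity, sensitivity=1.0):
    """Return (first, second) weighting coefficients.

    first  = sensitivity * similarity   (weight of the target classifier)
    second = 1 - first                  (weight of the general classifier)
    """
    first = sensitivity * similarity
    second = 1.0 - first
    return first, second


if __name__ == "__main__":
    # Sensitivity 0.75 and similarity 0.8, as in the example for step S504.
    first, second = weighting_coefficients(0.8, sensitivity=0.75)
    print(round(first, 6), round(second, 6))
```

With the default sensitivity of 1, the similarity itself is used as the first weighting coefficient, which matches the 0.8 and 0.2 example given for steps S402 and S404.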
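In some embodiments, as shown in FIG. 8, processing the extracted image features with the first classifier to obtain the intermediate probability distribution includes:
Step S602: process the extracted image features with the first classifier to obtain an initial probability distribution, the initial probability distribution including an initial probability matrix for each pixel category.
The initial probability distribution may be the initial probability matrices, one per pixel category, obtained by processing the extracted image features; each initial probability matrix contains, for one pixel category, the probability of each pixel of the target image. For example, with 6 categories and a target image of 200*300 pixels, 6 initial probability matrices of 200 rows by 300 columns are obtained. Each initial probability matrix represents the probability that each of the 200*300 pixels belongs to one of the 6 categories.
Specifically, the extracted image features are classified by the general classifier to give the initial probability distribution of the target image, which contains the probability of each pixel for each pixel category.
In some embodiments, the extracted features of a target image containing a cat, a dog, and background are classified by the general classifier. Because the target image has several pixel categories and many pixels, an initial probability distribution of the target image is obtained that contains the probability of each pixel for each pixel category.
Step S604: shift the elements of the initial probability matrix into the same row or the same column to obtain a shifted probability matrix.
The shifted probability matrix may be the matrix obtained by moving all elements of the initial probability matrix into a single row or a single column.
Specifically, the initial probability matrix is a multi-dimensional matrix; shifting moves its elements into the same row or the same column, giving the shifted probability matrix.
In some embodiments, the initial probability matrix is a 200*300 matrix; after shifting, a 1*60000 matrix is obtained.
Step S606: concatenate the shifted probability matrices of all pixel categories to obtain the intermediate probability distribution.
The intermediate probability distribution may be the multi-dimensional matrix obtained by concatenating the shifted probability matrices.
Specifically, each pixel category has a corresponding shifted probability matrix; concatenating the shifted probability matrices of all pixel categories gives the intermediate probability distribution.
In some embodiments, the pixels of the target image fall into 6 categories, each category's matrix contains 200*300 pixels, and after reshaping there are 6 shifted probability matrices of size 1*60000; concatenating these 6 matrices gives an intermediate probability distribution of size 6*60000. The intermediate probability distribution has n rows and h*w columns, and each row gives, for every pixel in the target image, the probability that the pixel belongs to the pixel category of that row.
In this embodiment, shifting and concatenating the output that the general classifier produces from the extracted image features gives the intermediate probability distribution.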
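The shifting and concatenation of steps S604 and S606 amount to flattening each category's h*w matrix into one row and stacking the rows. The sketch below assumes row-major flattening and uses the hypothetical name `shift_and_splice`; the softmax activation mentioned elsewhere in the text is not applied here.

```python
def shift_and_splice(initial_matrices):
    """Flatten each [h][w] initial probability matrix into one row and stack the rows.

    initial_matrices: list of n matrices, one per pixel category, each [h][w].
    Returns a nested list of shape [n][h*w].
    """
    return [
        [value for row in matrix for value in row]
        for matrix in initial_matrices
    ]


if __name__ == "__main__":
    # Six categories on a 200 x 300 image give a 6 x 60000 result.
    matrices = [[[0.0] * 300 for _ in range(200)] for _ in range(6)]
    spliced = shift_and_splice(matrices)
    print(len(spliced), len(spliced[0]))
```

For an image of 200*300 pixels each flattened row holds 200 × 300 = 60000 probabilities, one per pixel.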
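In some embodiments, the target classifier is computed as follows: the intermediate probability distribution is matrix-multiplied with the extracted image features to obtain the target classifier; the intermediate probability distribution is obtained by shifting and concatenating the initial probability distribution and applying an activation function to the concatenated result; and the initial probability distribution is obtained by processing the extracted image features. For example, the target classifier can be computed as:
$$G_c(x) = \mathrm{Softmax}(\mathrm{Reshape}(p)) \times x$$
G_c(x) is the target classifier, p is the initial probability distribution obtained when the general classifier processes the extracted image features, Reshape(p) shifts and concatenates p, and x is the extracted image features. Because p, which is derived from x, gives for each of the n pixel categories the probability that each pixel belongs to that category, the formula extracts from x the key information for each of the n categories. G_c(x) is thus the per-category key information extracted from x, and it is used to adjust the original general classifier C so that the classifier can adapt to different input content.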
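In some embodiments, as shown in FIG. 9, fusing the intermediate probability distribution with the extracted image features to obtain the target classifier corresponding to the extracted image features includes:
Step S702: obtain a confidence threshold.
The confidence threshold may be a value used to filter out the contribution of low-confidence feature values so as to avoid introducing noise.
Specifically, a confidence threshold is given, and the server obtains it.
In some embodiments, probabilities in the intermediate probability distribution that are below 0.2 need to be filtered out when building the target classifier, so a confidence threshold of 0.2 is preset on the server.
Step S704: compare each probability in the intermediate probability distribution with the confidence threshold, and obtain a mask matrix from the relationship between each probability and the confidence threshold.
The mask matrix may be a matrix used to remove the probabilities in the intermediate probability distribution that are below the confidence threshold. The mask matrix has the same size as the intermediate probability distribution. The value at each position of the mask matrix is determined by comparing the probability at that position of the intermediate probability distribution with the threshold: the value is 1 when the probability is greater than or equal to the threshold and 0 otherwise.
Specifically, with the confidence threshold set, the probability of each pixel for each category in the intermediate probability distribution is compared with the confidence threshold to give a mask matrix used to remove noise.
In some embodiments, the confidence threshold for the intermediate probability distribution of the target image is set to 0.5; each probability in the intermediate probability distribution is then compared with the confidence threshold, and a mask matrix is generated.
Step S706: filter the intermediate probability distribution with the mask matrix to obtain a denoised probability distribution.
The denoised probability distribution may be the probability distribution obtained after the mask matrix removes the noise from the intermediate probability distribution. For example, each value in the mask matrix may be multiplied by the probability at the corresponding position of the intermediate probability distribution to give the denoised probability distribution.
Specifically, the intermediate probability distribution is filtered with the mask matrix: probabilities below the confidence threshold are removed and probabilities greater than or equal to the confidence threshold are retained. The retained probabilities form the denoised probability distribution.
In some embodiments, a mask matrix is formed with a confidence threshold of 0.5 and used to filter the intermediate probability distribution. The pixels in this intermediate probability distribution take the probability values 0.3, 0.4, 0.5, 0.6, 0.7, and 0.8; the mask matrix removes the probabilities 0.3 and 0.4, and the corresponding pixels are not included in the denoised probability distribution.
Step S708: multiply the denoised probability distribution by the extracted image features to obtain the target classifier.
Specifically, the denoised probability distribution is multiplied by the extracted image features, and the result is the target classifier. The expression is as follows:
[Formula image: Figure PCTCN2022120480-appb-000001]
where M is the denoised probability distribution, of size [n, hw], the same as the intermediate probability distribution; sum(-1) sums the values of M over all hw positions; h and w are the height and width of the extracted image features; x is the extracted image features; and G_c(x) is the target classifier.
In this embodiment, setting a confidence threshold removes probabilities with insufficient confidence, which reduces noise in image semantic segmentation, improves the performance of the image semantic segmentation model, and raises the success rate of image semantic segmentation.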
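Steps S704 to S708 can be sketched as below. The mask and the element-wise filtering follow the text (probabilities greater than or equal to the threshold are kept). The formula for the final step is only available as an image, so the normalization in `masked_target_classifier`, which divides each category's weighted feature sum by that category's sum of M over the hw positions, is an assumption based on the description of sum(-1), and all function names are hypothetical.

```python
def confidence_mask(intermediate, threshold):
    """Mask matrix: 1 where the probability is >= threshold, else 0."""
    return [[1 if p >= threshold else 0 for p in row] for row in intermediate]


def denoise(intermediate, threshold):
    """Multiply each probability by its mask value to drop low-confidence entries."""
    mask = confidence_mask(intermediate, threshold)
    return [
        [m * p for m, p in zip(m_row, p_row)]
        for m_row, p_row in zip(mask, intermediate)
    ]


def masked_target_classifier(intermediate, features, threshold):
    """Assumed form: G_c(x) = (M x x) / M.sum(-1), computed per category.

    intermediate: [n][hw] probabilities; features: [hw][d]; returns [n][d].
    """
    denoised = denoise(intermediate, threshold)
    d = len(features[0])
    classifier = []
    for row in denoised:
        total = sum(row)  # sum over the hw positions
        if total == 0:
            # No confident pixel for this category: fall back to a zero vector.
            classifier.append([0.0] * d)
            continue
        classifier.append(
            [sum(wt * feat[j] for wt, feat in zip(row, features)) / total
             for j in range(d)]
        )
    return classifier


if __name__ == "__main__":
    probs = [[0.3, 0.4, 0.5, 0.6, 0.7, 0.8]]
    print(confidence_mask(probs, 0.5))
    print(denoise(probs, 0.5))
```

With the threshold of 0.5 from the example, the entries 0.3 and 0.4 are zeroed and contribute nothing to the resulting classifier.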
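It should be understood that the classifier of this semantic segmentation model changes adaptively with the extracted image features of the target image. Processing with both the general classifier and the target classifier strengthens the classifier's adaptive awareness of different image content and improves the accuracy of the target pixel category obtained for each pixel, thereby improving the accuracy of image semantic segmentation.
It should be understood that although the steps in the flowcharts of the above embodiments are shown in the order indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, there is no strict restriction on the order of execution, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different times; nor are they necessarily executed sequentially, but may be executed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
Based on the same inventive concept, the embodiments of this application also provide an image semantic segmentation apparatus for implementing the image semantic segmentation method described above. The solution provided by the apparatus is similar to that described for the method, so for the specific limitations in the one or more apparatus embodiments below, refer to the limitations on the image semantic segmentation method above; they are not repeated here.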
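In some embodiments, as shown in FIG. 10, an image semantic segmentation apparatus is provided, including a target image obtaining module, an image feature extraction module, an intermediate probability distribution obtaining module, a target classifier obtaining module, a target probability distribution obtaining module, and a target pixel category obtaining module, wherein:
The target image obtaining module 802 is configured to obtain a target image.
The image feature extraction module 804 is configured to perform feature extraction on the target image to obtain the extracted image features corresponding to the target image.
The intermediate probability distribution obtaining module 806 is configured to process the extracted image features with a first classifier to obtain an intermediate probability distribution, the intermediate probability distribution including the probability that each pixel in the target image belongs to each pixel category.
The target classifier obtaining module 808 is configured to fuse the intermediate probability distribution with the extracted image features to obtain the target classifier corresponding to the extracted image features.
The target probability distribution obtaining module 810 is configured to process the extracted image features with the target classifier and the first classifier to obtain a target probability distribution for each pixel category, the target probability distribution including the probability that each pixel in the target image belongs to each pixel category.
The target pixel category obtaining module 812 is configured to obtain, from the target probability distribution, the target pixel category of each pixel in the target image.
In some embodiments, the target classifier obtaining module is configured to: obtain a first weighting coefficient corresponding to the target classifier and a second weighting coefficient corresponding to the general classifier; weight the target classifier by the first weighting coefficient to obtain a weighted target classifier; weight the general classifier by the second weighting coefficient to obtain a weighted general classifier; and combine the weighted general classifier and the weighted target classifier to obtain a composite classifier, and process the extracted image features with the composite classifier to obtain the target probability distribution for each pixel category.
In some embodiments, the target classifier obtaining module is configured to: obtain the classifier similarity between the general classifier and the target classifier, and obtain the first weighting coefficient from the classifier similarity, the first weighting coefficient being positively correlated with the classifier similarity; and obtain the second weighting coefficient from the first weighting coefficient, the second weighting coefficient being negatively correlated with the first weighting coefficient.
In some embodiments, the target classifier obtaining module is configured to: obtain a sensitivity coefficient; and multiply the sensitivity coefficient by the classifier similarity to obtain the first weighting coefficient.
In some embodiments, the intermediate probability distribution obtaining module is configured to: process the extracted image features with the first classifier to obtain an initial probability distribution, the initial probability distribution including an initial probability matrix for each pixel category; shift the elements of the initial probability matrix into the same row or the same column to obtain a shifted probability matrix; and concatenate the shifted probability matrices of all pixel categories to obtain the intermediate probability distribution.
In some embodiments, the target classifier is computed as follows: the initial probability distribution obtained by processing the extracted image features is shifted and concatenated, and an activation function is applied to the concatenated initial probability distribution to obtain the intermediate probability distribution; the intermediate probability distribution is then matrix-multiplied with the extracted image features to obtain the target classifier.
In some embodiments, the target classifier obtaining module is configured to: obtain a confidence threshold; compare each probability in the intermediate probability distribution with the confidence threshold, and obtain a mask matrix from the relationship between each probability and the confidence threshold; filter the intermediate probability distribution with the mask matrix to obtain a denoised probability distribution; and multiply the denoised probability distribution by the extracted image features to obtain the target classifier.
Each module of the above image semantic segmentation apparatus may be implemented wholly or partly in software, hardware, or a combination of the two. The modules may be embedded in, or independent of, a processor of a computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can call and execute the operations corresponding to each module.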
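In some embodiments, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 11. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores server data. The network interface of the computer device communicates with external terminals over a network connection. When executed by the processor, the computer program implements an image semantic segmentation method.
Those skilled in the art will understand that the structure shown in FIG. 11 is only a block diagram of the part of the structure relevant to the solution of this application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown, combine certain components, or arrange the components differently.
In some embodiments, a computer device is also provided, including a memory and a processor, the memory storing a computer program, and the processor implementing the steps of the above method embodiments when executing the computer program.
In some embodiments, a computer-readable storage medium is provided, storing a computer program that, when executed by a processor, implements the steps of the above method embodiments.
In some embodiments, a computer program product or computer program is provided, including computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the steps of the above method embodiments.
It should be noted that the user information (including but not limited to user device information and user personal information) and data (including but not limited to data used for analysis, stored data, and displayed data) involved in this application are information and data authorized by the user or fully authorized by all parties.
A person of ordinary skill in the art will understand that all or part of the processes in the above method embodiments can be completed by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, a database, or another medium used in the embodiments provided in this application may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (Read-Only Memory, ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (Magnetoresistive Random Access Memory, MRAM), ferroelectric random access memory (Ferroelectric Random Access Memory, FRAM), phase change memory (Phase Change Memory, PCM), graphene memory, and the like. Volatile memory may include random access memory (Random Access Memory, RAM), external cache memory, and the like. By way of illustration and not limitation, RAM may take many forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM). The databases involved in the embodiments provided in this application may include at least one of a relational database and a non-relational database. Non-relational databases may include, without limitation, blockchain-based distributed databases. The processors involved in the embodiments provided in this application may be, without limitation, general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, data processing logic devices based on quantum computing, and the like.
The technical features of the above embodiments may be combined in any way. For brevity, not every possible combination of these technical features has been described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of this application. Their description is relatively specific and detailed, but it should not be understood as limiting the patent scope of this application. It should be noted that a person of ordinary skill in the art can make several variations and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. The protection scope of this application shall therefore be determined by the appended claims.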

Claims (20)

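  1. An image semantic segmentation method, characterized by comprising:
    obtaining a target image;
    performing feature extraction on the target image to obtain extracted image features corresponding to the target image;
    processing the extracted image features based on a first classifier to obtain an intermediate probability distribution, the intermediate probability distribution including the probability that each pixel in the target image belongs to each pixel category;
    fusing the intermediate probability distribution with the extracted image features to obtain a target classifier corresponding to the extracted image features;
    processing the extracted image features based on the target classifier and the first classifier to obtain a target probability distribution for each pixel category, the target probability distribution including the probability that each pixel in the target image belongs to each pixel category;
    obtaining, based on the target probability distribution, the target pixel category corresponding to each pixel in the target image.
  2. The method according to claim 1, characterized in that the first classifier includes a general classifier in an image semantic segmentation model, and the processing of the extracted image features based on the target classifier and the first classifier to obtain a target probability distribution for each pixel category comprises:
    obtaining a first weighting coefficient corresponding to the target classifier and a second weighting coefficient corresponding to the general classifier;
    weighting the target classifier based on the first weighting coefficient to obtain a weighted target classifier;
    weighting the general classifier based on the second weighting coefficient to obtain a weighted general classifier;
    combining the weighted general classifier and the weighted target classifier to obtain a composite classifier, and processing the extracted image features with the composite classifier to obtain the target probability distribution for each pixel category.
  3. The method according to claim 2, characterized in that the obtaining of the first weighting coefficient corresponding to the target classifier and the second weighting coefficient corresponding to the general classifier comprises:
    obtaining a classifier similarity between the general classifier and the target classifier, and obtaining the first weighting coefficient based on the classifier similarity, the first weighting coefficient being positively correlated with the classifier similarity;
    obtaining the second weighting coefficient based on the first weighting coefficient, the second weighting coefficient being negatively correlated with the first weighting coefficient.
  4. The method according to claim 3, characterized in that the obtaining of the first weighting coefficient based on the classifier similarity comprises:
    obtaining a sensitivity coefficient;
    multiplying the sensitivity coefficient by the classifier similarity to obtain the first weighting coefficient.
  5. The method according to claim 1, characterized in that the first classifier includes a general classifier in an image semantic segmentation model, and the processing of the extracted image features based on the first classifier to obtain the intermediate probability distribution comprises:
    processing the extracted image features based on the general classifier in the image semantic segmentation model to obtain an initial probability distribution, the initial probability distribution including an initial probability matrix corresponding to each pixel category;
    shifting the elements of the initial probability matrix into the same row or the same column to obtain a shifted probability matrix;
    concatenating the shifted probability matrices corresponding to the pixel categories to obtain the intermediate probability distribution.
  6. The method according to claim 5, characterized in that the target classifier is computed in a manner comprising:
    performing a matrix multiplication of the intermediate probability distribution and the extracted image features to obtain the target classifier; the intermediate probability distribution being obtained by shifting and concatenating the initial probability distribution and applying an activation function to the concatenated initial probability distribution, and the initial probability distribution being obtained by processing the extracted image features.
  7. The method according to claim 1, characterized in that the fusing of the intermediate probability distribution with the extracted image features to obtain the target classifier corresponding to the extracted image features comprises:
    obtaining a confidence threshold;
    comparing each probability in the intermediate probability distribution with the confidence threshold, and obtaining a mask matrix based on the relationship between the probability and the confidence threshold;
    filtering the intermediate probability distribution based on the mask matrix to obtain a denoised probability distribution;
    multiplying the denoised probability distribution by the extracted image features to obtain the target classifier corresponding to the extracted image features.
  8. The method according to claim 7, characterized in that the filtering of the intermediate probability distribution based on the mask matrix to obtain the denoised probability distribution comprises:
    multiplying each value in the mask matrix by the probability at the corresponding position in the intermediate probability distribution to obtain the denoised probability distribution.
  9. The method according to claim 1, characterized in that the processing of the extracted image features based on the first classifier to obtain the intermediate probability distribution comprises:
    performing dimensionality reduction on the extracted image features based on the first classifier to obtain an initial probability distribution;
    shifting and normalizing the probability values in the initial probability distribution to obtain the intermediate probability distribution.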
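  10. An image semantic segmentation apparatus, characterized by comprising:
    a target image obtaining module, configured to obtain a target image;
    an image feature extraction module, configured to perform feature extraction on the target image to obtain extracted image features corresponding to the target image;
    an intermediate probability distribution obtaining module, configured to process the extracted image features based on a first classifier to obtain an intermediate probability distribution, the intermediate probability distribution including the probability that each pixel in the target image belongs to each pixel category;
    a target classifier obtaining module, configured to fuse the intermediate probability distribution with the extracted image features to obtain a target classifier corresponding to the extracted image features;
    a target probability distribution obtaining module, configured to process the extracted image features based on the target classifier and the first classifier to obtain a target probability distribution for each pixel category, the target probability distribution including the probability that each pixel in the target image belongs to each pixel category;
    a target pixel category obtaining module, configured to obtain, based on the target probability distribution, the target pixel category corresponding to each pixel in the target image.
  11. The apparatus according to claim 10, characterized in that the first classifier includes a general classifier in an image semantic segmentation model, and the target classifier obtaining module is configured to obtain a first weighting coefficient corresponding to the target classifier and a second weighting coefficient corresponding to the general classifier; weight the target classifier based on the first weighting coefficient to obtain a weighted target classifier; weight the general classifier based on the second weighting coefficient to obtain a weighted general classifier; and combine the weighted general classifier and the weighted target classifier to obtain a composite classifier, and process the extracted image features with the composite classifier to obtain the target probability distribution for each pixel category.
  12. The apparatus according to claim 11, characterized in that the target classifier obtaining module is configured to obtain a classifier similarity between the general classifier and the target classifier, and obtain the first weighting coefficient based on the classifier similarity, the first weighting coefficient being positively correlated with the classifier similarity; and obtain the second weighting coefficient based on the first weighting coefficient, the second weighting coefficient being negatively correlated with the first weighting coefficient.
  13. The apparatus according to claim 12, characterized in that the target classifier obtaining module is configured to obtain a sensitivity coefficient, and multiply the sensitivity coefficient by the classifier similarity to obtain the first weighting coefficient.
  14. The apparatus according to claim 10, characterized in that the first classifier includes a general classifier in an image semantic segmentation model, and the intermediate probability distribution obtaining module is configured to: process the extracted image features based on the general classifier in the image semantic segmentation model to obtain an initial probability distribution, the initial probability distribution including an initial probability matrix corresponding to each pixel category; shift the elements of the initial probability matrix into the same row or the same column to obtain a shifted probability matrix; and concatenate the shifted probability matrices corresponding to the pixel categories to obtain the intermediate probability distribution.
  15. The apparatus according to claim 14, characterized in that the target classifier is computed in a manner comprising:
    performing a matrix multiplication of the intermediate probability distribution and the extracted image features to obtain the target classifier; the intermediate probability distribution being obtained by shifting and concatenating the initial probability distribution and applying an activation function to the concatenated initial probability distribution, and the initial probability distribution being obtained by processing the extracted image features.
  16. The apparatus according to claim 10, characterized in that the target classifier obtaining module is configured to obtain a confidence threshold; compare each probability in the intermediate probability distribution with the confidence threshold, and obtain a mask matrix based on the relationship between the probability and the confidence threshold; filter the intermediate probability distribution based on the mask matrix to obtain a denoised probability distribution; and multiply the denoised probability distribution by the extracted image features to obtain the target classifier corresponding to the extracted image features.
  17. The apparatus according to claim 16, characterized in that the target classifier obtaining module is configured to multiply each value in the mask matrix by the probability at the corresponding position in the intermediate probability distribution to obtain the denoised probability distribution.
  18. The apparatus according to claim 10, characterized in that the intermediate probability distribution obtaining module is configured to perform dimensionality reduction on the extracted image features based on the first classifier to obtain an initial probability distribution, and shift and normalize the probability values in the initial probability distribution to obtain the intermediate probability distribution.
  19. A computer device, comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method according to any one of claims 1 to 9 when executing the computer program.
  20. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program implements the steps of the method according to any one of claims 1 to 9 when executed by a processor.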
PCT/CN2022/120480 2022-04-25 2022-09-22 Semantic segmentation method, device, computer equipment and storage medium WO2023206944A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210438646.2 2022-04-25
CN202210438646.2A CN114549913B (zh) 2022-04-25 2022-04-25 Semantic segmentation method, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2023206944A1 true WO2023206944A1 (zh) 2023-11-02

Family

ID=81667042

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/120480 WO2023206944A1 (zh) 2022-04-25 2022-09-22 Semantic segmentation method, device, computer equipment and storage medium

Country Status (2)

Country Link
CN (1) CN114549913B (zh)
WO (1) WO2023206944A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117636026A (zh) * 2023-11-17 2024-03-01 上海凡顺实业股份有限公司 一种集装箱锁销类别图片识别方法

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114549913B (zh) * 2022-04-25 2022-07-19 深圳思谋信息科技有限公司 一种语义分割方法、装置、计算机设备和存储介质
CN115345895B (zh) * 2022-10-19 2023-01-06 深圳市壹倍科技有限公司 用于视觉检测的图像分割方法、装置、计算机设备及介质
CN115620013B (zh) * 2022-12-14 2023-03-14 深圳思谋信息科技有限公司 语义分割方法、装置、计算机设备及计算机可读存储介质
CN115761239B (zh) * 2023-01-09 2023-04-28 深圳思谋信息科技有限公司 一种语义分割方法及相关装置

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109872374A (zh) * 2019-02-19 2019-06-11 江苏通佑视觉科技有限公司 一种图像语义分割的优化方法、装置、存储介质及终端
CN110335277A (zh) * 2019-05-07 2019-10-15 腾讯科技(深圳)有限公司 图像处理方法、装置、计算机可读存储介质和计算机设备
CN113902913A (zh) * 2021-08-31 2022-01-07 际络科技(上海)有限公司 图片语义分割方法及装置
CN114187311A (zh) * 2021-12-14 2022-03-15 京东鲲鹏(江苏)科技有限公司 一种图像语义分割方法、装置、设备及存储介质
CN114549913A (zh) * 2022-04-25 2022-05-27 深圳思谋信息科技有限公司 一种语义分割方法、装置、计算机设备和存储介质

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109461157B (zh) * 2018-10-19 2021-07-09 苏州大学 基于多级特征融合及高斯条件随机场的图像语义分割方法

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117636026A (zh) * 2023-11-17 2024-03-01 上海凡顺实业股份有限公司 一种集装箱锁销类别图片识别方法
CN117636026B (zh) * 2023-11-17 2024-06-11 上海凡顺实业股份有限公司 一种集装箱锁销类别图片识别方法

Also Published As

Publication number Publication date
CN114549913A (zh) 2022-05-27
CN114549913B (zh) 2022-07-19

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22939767

Country of ref document: EP

Kind code of ref document: A1