WO2019196633A1 - Training method and server for an image semantic segmentation model - Google Patents
Training method and server for an image semantic segmentation model
- Publication number
- WO2019196633A1 (PCT/CN2019/079404)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- network model
- object region
- semantic segmentation
- magnification
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/217—Validation; Performance evaluation; Active pattern learning techniques
- G06F18/2178—Validation; Performance evaluation; Active pattern learning techniques based on feedback of a supervisor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
- G06V10/449—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
- G06V10/451—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/7715—Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/778—Active pattern-learning, e.g. online learning of image or video features
- G06V10/7784—Active pattern-learning, e.g. online learning of image or video features based on feedback from supervisors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/70—Labelling scene content, e.g. deriving syntactic or semantic representations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Definitions
- the present application relates to the field of computer technology, and in particular to training of an image semantic segmentation model.
- Image semantic segmentation is the basis of image understanding, and is very important in autonomous driving, drone applications, and wearable devices. An image is composed of many pixels, and semantic segmentation groups these pixels according to the semantics they express in the image.
- in the related art, a conventional deep convolutional neural network is generally trained as the image semantic segmentation network: the network is first trained for full-image classification of the input image; the object regions corresponding to the full-image classification labels are then located by the network; and these regions are used as the supervision information for image semantic segmentation, with which the image semantic segmentation network is trained.
- the embodiments of the present application provide a training method and a server for an image semantic segmentation model, which are used to locate all object regions in an original image and improve the segmentation quality of image semantic segmentation.
- the embodiment of the present application provides the following technical solutions:
- the embodiment of the present application provides a training method for an image semantic segmentation model, which is applied to a server, and the method includes:
- acquiring an original image used for model training; performing full-image classification and labeling on the original image at different dilation rates by using a multi-rate dilated convolutional neural network model, to obtain global object localization maps of the original image at different degrees of dispersion, where any one degree of dispersion indicates the distribution of the object regions that the multi-rate dilated convolutional neural network model locates on the target object at the dilation rate corresponding to that degree of dispersion; and
- using the global object localization map as supervision information for the image semantic segmentation network model, and training the image semantic segmentation network model with the supervision information.
- the embodiment of the present application further provides a server, including:
- an image acquisition module, configured to acquire an original image used for model training;
- a global object positioning module, configured to perform full-image classification and labeling on the original image at different dilation rates by using a multi-rate dilated convolutional neural network model, to obtain global object localization maps of the original image at different degrees of dispersion, where any one degree of dispersion indicates the distribution of the object regions that the multi-rate dilated convolutional neural network model locates on the target object at the dilation rate corresponding to that degree of dispersion; and
- a model training module, configured to use the global object localization map as supervision information for the image semantic segmentation network model, and to train the image semantic segmentation network model with the supervision information.
- the constituent modules of the server may further perform the steps described in the first aspect above and its various possible implementations; see the foregoing description of the first aspect and its various possible implementations for details.
- an embodiment of the present application provides a server, including a processor and a memory, where the memory is configured to store instructions, and the processor is configured to execute the instructions in the memory, so that the server performs the method of any one of the foregoing first aspects.
- embodiments of the present application provide a computer readable storage medium having instructions stored therein that, when executed on a computer, cause the computer to perform the methods described in the above aspects.
- embodiments of the present application provide a computer program product comprising instructions that, when run on a computer, cause the computer to perform the methods described in the various aspects above.
- the embodiments of the present application have the following advantages:
- in the embodiments of the present application, the original image for model training is first obtained, and the multi-rate dilated convolutional neural network model is then used to perform full-image classification on the original image at different dilation rates, obtaining global object localization maps of the original image at different degrees of dispersion.
- any one of the degrees of dispersion indicates the distribution of the object regions that the multi-rate dilated convolutional neural network model locates on the target object at the dilation rate corresponding to that degree of dispersion.
- the global object localization map is then used as supervision information for the image semantic segmentation network model, and the image semantic segmentation network model is trained with this supervision information.
- because the multi-rate dilated convolutional neural network model performs full-image classification and labeling on the original image, its multi-rate dilated convolutions can locate object regions at different degrees of dispersion from the original image.
- the resulting global object localization map covers all regions of the target object; the multi-rate dilated convolutional neural network model therefore accurately locates all the object regions corresponding to the full-image classification labels in the original image, improving the segmentation quality of image semantic segmentation.
- FIG. 1 is a schematic block diagram of a training method of an image semantic segmentation model according to an embodiment of the present application;
- FIG. 2 is a schematic structural diagram of a multi-rate dilated convolutional network model according to an embodiment of the present application;
- FIG. 3 is a schematic diagram of a process of obtaining object regions in an image by a deep convolutional neural network model at one dilation rate according to an embodiment of the present application;
- FIG. 4 is a schematic diagram of dilated convolutions at different dilation rates and the corresponding object regions located in an image according to an embodiment of the present application;
- FIG. 5 is a schematic diagram of network segmentation results of weakly supervised full-image classification and labeling training according to an embodiment of the present disclosure;
- FIG. 6-a is a schematic structural diagram of a server according to an embodiment of the present application;
- FIG. 6-b is a schematic structural diagram of a global object positioning module according to an embodiment of the present disclosure;
- FIG. 6-c is a schematic structural diagram of a pooling processing unit according to an embodiment of the present application;
- FIG. 6-d is a schematic structural diagram of a dilated convolution unit according to an embodiment of the present application;
- FIG. 6-e is a schematic structural diagram of a model training module according to an embodiment of the present application;
- FIG. 7 is a schematic structural diagram of a server to which the training method of an image semantic segmentation model is applied according to an embodiment of the present application.
- the embodiments of the present application provide a training method and a server for an image semantic segmentation model, which are used to locate all object regions in an original image and improve the segmentation quality of image semantic segmentation.
- the method can be applied to a server; the server can be a service device with data processing capability located on the network side.
- the method may also be applied to the terminal device when the terminal device has sufficient data processing capability.
- the terminal device may be a computing device such as a PC (personal computer) or a smart terminal with data processing capability located on the user side.
- the training method for the image semantic segmentation model provided by the embodiments of the present application can be applied to full-image classification labeling of images based on a dilated convolutional neural network model.
- the method adopts weakly supervised image semantic segmentation, which suits situations lacking fine pixel-level segmentation annotations, and relies only on full-image classification labels to achieve high-accuracy image segmentation.
- first, the global object localization map corresponding to the full-image classification labels in the image is obtained by using the dilated convolutional neural network model.
- the multi-rate dilated convolutional neural network model is first trained to perform full-image classification.
- the multi-rate dilated convolutions are then used to accurately locate the global object localization map corresponding to the full-image classification labels in the original image; this global object localization map is taken as the supervision information for segmentation, and the image segmentation network model is trained with it to perform image segmentation.
- the training method of the image semantic segmentation model provided by the embodiments of the present application can automatically crawl images with user-created tags from websites with massive user data, thereby training a weakly supervised image semantic segmentation network to perform image semantic segmentation.
- the semantic segmentation result can be used for image search based on image content of the website, personalized recommendation based on image content analysis, and the like.
- a training method for an image semantic segmentation model provided by an embodiment of the present application may include the following steps: step 101, acquiring an original image used for model training; step 102, performing full-image classification and labeling on the original image at different dilation rates by using a multi-rate dilated convolutional neural network model, to obtain global object localization maps at different degrees of dispersion; and step 103, using the global object localization map as supervision information to train the image semantic segmentation network model.
- the training sample image library stores training sample images that can be used for model training; these images are referred to as original images for model training, or simply original images in the subsequent embodiments.
- the original image includes one or more target objects, and a target object may be an object of various shapes, for example, a tool, an animal, or a person, which is not limited herein.
- the original image may be stored in multiple manners: for example, the server receives the original image from a client and stores it in the server's database, or the server reads the original image in real time from memory to improve model training efficiency.
- a convolutional neural network model is used to perform full-image classification; the convolutional neural network used in the embodiments of the present application adopts dilated convolutions with a plurality of dilation rates, so the model may also be referred to as a "multi-rate dilated convolutional neural network model".
- the multi-rate dilated convolutional neural network model may first be trained to perform full-image classification; after the network model is obtained, the multi-rate dilated convolutions locate the global object localization map corresponding to the full-image classification labels in the training image.
- this improves on prior-art convolutional neural network models, which can locate only the most discriminative partial region of an object, whereas the multi-rate dilated convolutional neural network model provided by the embodiments of the present application can locate global object localization maps at different degrees of dispersion in the original image.
- by performing full-image classification and labeling on the original image at different dilation rates, the multi-rate dilated convolutional neural network model can obtain global object localization maps at different degrees of dispersion.
- each full-image classification and labeling pass at one dilation rate yields a global object localization map at one degree of dispersion; that is, the dilation rates of the multi-rate dilated convolutional neural network model correspond to the degrees of dispersion of the global object localization maps.
- the degree of dispersion indicates the distribution of the object regions that the multi-rate dilated convolutional neural network model locates on the target object.
- for the object regions located by the multi-rate dilated convolutional neural network model, the high-response object regions corresponding to the full-image classification labels in the original image can be obtained through the class activation map (CAM).
- the multi-rate dilated convolutional neural network model obtains the global object localization maps at different degrees of dispersion in the original image by performing full-image classification and labeling on the original image at different dilation rates, so the global object
- localization map not only locates the most discriminative part of the target object, but also locates other, less discriminative areas, thereby locating all areas of the target object.
- step 102, in which the multi-rate dilated convolutional neural network model performs full-image classification and labeling on the original image to obtain global object localization maps at different degrees of dispersion in the original image, includes:
- extracting the feature map of the target object from the original image by using the first N-1 convolutional layers of the multi-rate dilated convolutional neural network model, where the model includes N convolutional layers, the Nth convolutional layer is a multi-rate dilated convolutional layer, and N is a positive integer;
- performing dilated convolution on the feature map of the target object at multiple dilation rates d by using the multi-rate dilated convolutional layer, to obtain object regions at different degrees of dispersion, where d is a positive integer; and
- performing global average pooling on the object regions at different degrees of dispersion, to obtain the global object localization maps at different degrees of dispersion in the original image.
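As a rough illustration of the multi-rate dilated ("cavity") convolution step above, the following NumPy sketch shows how increasing the dilation rate widens the receptive field of the same 3x3 kernel without adding parameters. The feature map, kernel, and rates are made-up values; note that here `rate=1` denotes ordinary convolution (the patent's d=0 case), matching the convention of common frameworks:

```python
import numpy as np

def dilated_conv2d(feature, kernel, rate):
    """Valid-mode 2-D correlation with dilation `rate` (rate=1 = no dilation).

    Minimal single-channel sketch: a real model would use a deep-learning
    framework; this only illustrates how holes inserted between kernel taps
    widen the receptive field.
    """
    kh, kw = kernel.shape
    # Effective kernel span once holes are inserted between taps.
    eh = (kh - 1) * rate + 1
    ew = (kw - 1) * rate + 1
    H, W = feature.shape
    out = np.zeros((H - eh + 1, W - ew + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            # Sample the feature map with stride `rate` inside the span.
            patch = feature[y:y + eh:rate, x:x + ew:rate]
            out[y, x] = np.sum(patch * kernel)
    return out

feature = np.arange(49, dtype=float).reshape(7, 7)  # hypothetical feature map
kernel = np.ones((3, 3))
plain = dilated_conv2d(feature, kernel, rate=1)    # 5x5 output, 3x3 context
dilated = dilated_conv2d(feature, kernel, rate=2)  # 3x3 output, 5x5 context
```

The dilated pass aggregates evidence from a wider neighborhood, which is why larger rates produce object regions with a higher degree of dispersion.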
- the multi-rate dilated convolutional neural network model used in the embodiments of the present application has N convolutional layers in total, where the first N-1 convolutional layers extract the feature map of the target object from the original image, and the Nth convolutional layer is the last convolutional layer.
- the Nth convolutional layer is a multi-rate dilated convolutional layer, that is, it uses dilated convolutions with multiple dilation rates; the number N of convolutional layers of the multi-rate dilated convolutional neural network model can be determined according to the specific scenario.
- object localization with dilated convolution can spread beyond the most discriminative part; accordingly, a multi-rate dilated convolutional neural network model is adopted, in which multi-rate dilated convolution is introduced after the last layer of a conventional convolutional neural network model.
- global average pooling (GAP) then yields global object localization maps with different degrees of dispersion, and the fused global object localization map covers the entire area of the target object.
- the foregoing global average pooling of the object regions at different degrees of dispersion, to obtain the global object localization maps at different degrees of dispersion in the original image, includes:
- fusing the first object region and the second object regions at their different degrees of dispersion according to a first weight and a second weight, to obtain the global object localization map.
- the multi-rate dilated convolutional layer performs dilated convolution on the feature map of the target object at multiple dilation rates d.
- the object region obtained when the dilation rate d equals 0 (i.e., ordinary convolution without dilation) is called the first object region;
- the object regions obtained when d is greater than 0 are called the second object regions;
- the object regions are then fused with different weights.
- the weight of the object region at dilation rate 0 is set higher than that of the object regions at other dilation rates, because dilated convolution
- may locate wrong object regions: only when at least two object regions cover the same area can that area be regarded as a correct, effective object area.
- a high weight can thus be fixed on the non-dilated object region, outside the averaging over the dilated object regions, so that the correct and most discriminative object region located by the non-dilated convolution is not filtered out by the averaging of the dilated object regions.
- in this way, high-accuracy object localization regions can be obtained and used as high-quality segmentation supervision information for training the subsequent image semantic segmentation network.
- the foregoing fusion to obtain the global object localization map includes:
- determining the first object region H 0 when d equals 0, and the second object regions (H 1 , H 2 , ..., H k ) when d is greater than 0 and not greater than k, where k is the maximum dilation rate;
- fusing the first object region H 0 and the second object regions (H 1 , H 2 , ..., H k ) at their different degrees of dispersion to obtain the global object localization map H, as H = H 0 + (1/k)(H 1 + H 2 + ... + H k );
- that is, the first weight is 1 and the second weight is 1/k, where H i denotes the i-th object region among the second object regions (H 1 , H 2 , ..., H k ).
- the multi-rate dilated convolutional neural network model can thus obtain object regions at different dilation rates, and the second object regions comprise the object regions (H 1 , H 2 , ..., H k ) generated by the dilated convolution at each dilation rate.
- the first object region H 0 corresponds to the convolution without dilation, and the final global object localization map H is the fusion of all object regions under the convolutions at different dilation rates.
- in this embodiment, the first weight is 1 and the second weight is 1/k.
- in other scenarios, the weights corresponding to the first object region and the second object regions may also be set separately.
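The fusion rule described above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the weights stated in this embodiment (1 for the non-dilated region H_0, 1/k for each dilated region H_i); the array values are made up:

```python
import numpy as np

def fuse_localization_maps(h0, dilated_maps):
    """Fuse object regions into a global object localization map.

    h0: object region from the non-dilated convolution (weight 1).
    dilated_maps: list of k object regions from dilated convolutions,
    averaged (weight 1/k each), so a region must be supported by several
    dilation rates to survive the average.
    """
    k = len(dilated_maps)
    return h0 + sum(dilated_maps) / k

# Hypothetical 2x2 localization maps: h0 fires on the most discriminative
# part; h1 and h2 spread onto less discriminative parts of the object.
h0 = np.array([[0.9, 0.1], [0.0, 0.0]])
h1 = np.array([[0.8, 0.6], [0.1, 0.0]])
h2 = np.array([[0.7, 0.5], [0.9, 0.0]])
h = fuse_localization_maps(h0, [h1, h2])
```

Here the top-left cell keeps a strong response (supported by all maps), while the bottom-left cell, located by only one dilated map, is attenuated by the average rather than promoted to full strength.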
- the foregoing dilated convolution of the feature map of the target object at multiple dilation rates d by the multi-rate dilated convolutional layer, to obtain object regions at different degrees of dispersion, includes:
- using the multi-rate dilated convolutional layer together with the class response localization map to obtain the high-response object regions corresponding to the full-image classification labels in the original image.
- let f k (x, y) be the value at coordinate (x, y) of the k-th feature map of the last convolutional layer, and let w k c be the classification weight connecting the k-th feature map to the c-th category; the object region M c corresponding to the c-th category can then be calculated as M c (x, y) = Σ k w k c · f k (x, y).
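The class-activation-map computation above can be sketched as follows. Shapes and values are hypothetical; `w` stands for the classification weights from the global-average-pooled features to the categories:

```python
import numpy as np

def class_activation_map(feature_maps, weights, c):
    """CAM: M_c(x, y) = sum_k w_k^c * f_k(x, y).

    feature_maps: (K, H, W) array of last-conv-layer feature maps f_k.
    weights:      (K, C) array of classifier weights; column c selects w_k^c.
    Returns the (H, W) response map for category c.
    """
    # Contract the K axis: weighted sum of the feature maps.
    return np.tensordot(weights[:, c], feature_maps, axes=1)

K, H, W, C = 4, 3, 3, 2        # toy sizes for illustration
rng = np.random.default_rng(0)
f = rng.random((K, H, W))
w = rng.random((K, C))
cam = class_activation_map(f, w, c=1)  # (H, W) heat map for category 1
```

High values in `cam` mark the high-response object region for that category, which is what the localization step thresholds and fuses across dilation rates.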
- the above global object localization map is used as segmentation supervision information to train an image semantic segmentation network model, which then performs image segmentation.
- the image semantic segmentation network model provided by the embodiment of the present application may specifically be a deep convolutional neural network model.
- the deep convolutional neural network model is trained to realize image segmentation.
- the convolution kernel size and the pooling kernel sizes used may be selected according to the specific scenario, which is not limited herein.
- the global object localization map contains high-accuracy object localization regions, and can be used as high-quality segmentation supervision information for training the image semantic segmentation network model.
- the semantic segmentation result can be used for image search based on image content of the website, personalized recommendation based on image content analysis, and the like.
- step 103, in which the global object localization map is used as supervision information for the image semantic segmentation network model and the model is trained with the supervision information, includes:
- inputting the original image into the image semantic segmentation network model, and obtaining an image classification result through the image semantic segmentation network model;
- calculating a loss result for the image classification result, under the supervision information, by using a cross-entropy loss function; and
- backpropagating the loss result into all layers of the image semantic segmentation network model to continue training the image semantic segmentation network model.
- specifically, the image classification result output by the image semantic segmentation network model is compared with the supervision information of the full-image classification labels through a cross-entropy loss function, which may be a sigmoid-based function; the loss result is then propagated back into all layers of the image semantic segmentation network model to train the network parameters. After the training of the image semantic segmentation network model is completed, all its layers can be used to output the image semantics of an input image.
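A minimal sketch of the sigmoid cross-entropy loss and its gradient with respect to the logits, assuming per-class binary labels for the full-image classification annotation (framework-level backpropagation through the network layers is not reproduced here; the logits and labels are made-up values):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def multilabel_bce(logits, labels):
    """Per-class sigmoid cross-entropy for full-image classification labels.

    Each class is treated as an independent binary decision, so a sigmoid
    cross-entropy (rather than a softmax) fits multi-label full-image
    annotation. The gradient w.r.t. each logit takes the simple form
    (sigmoid(z) - y) / n, which is what backpropagation pushes into the
    preceding layers.
    """
    p = sigmoid(logits)
    eps = 1e-12  # guard the logs against p == 0 or p == 1
    loss = -np.mean(labels * np.log(p + eps)
                    + (1 - labels) * np.log(1 - p + eps))
    grad = (p - labels) / labels.size  # d(loss)/d(logits)
    return loss, grad

logits = np.array([2.0, -1.0, 0.0])  # raw class scores from the network
labels = np.array([1.0, 0.0, 1.0])   # full-image classification labels
loss, grad = multilabel_bce(logits, labels)
```

The sign of each gradient entry shows the update direction: positive-label classes with low scores get pushed up, negative-label classes with high scores get pushed down.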
- the original image for model training is first obtained, and the multi-rate dilated convolutional neural network model then performs full-image classification annotation on it, yielding global object localization maps of the original image at different degrees of dispersion;
- the dispersion indicates the distribution, over the target object, of the object regions located by the multi-rate dilated convolutional neural network model;
- the global object localization map is used as supervision information of the image semantic segmentation network model, which is trained with that supervision information.
- the multi-rate dilated convolutional neural network model is used to perform the full-image classification annotation on the original image;
- the multi-rate dilated convolutions of the model can therefore locate, from the original image, global object localization maps at different degrees of dispersion;
- the global object localization map covers all regions of the target object, so the multi-rate dilated convolutional neural network model accurately locates all object regions corresponding to the full-image classification annotation in the original image, improving the segmentation quality of image semantic segmentation.
- the original images used for model training in the embodiments of the present application may come from many sources; for example, images with user-created tags may be crawled automatically from websites with massive user data and used to train a weakly supervised image semantic segmentation network;
- the semantic segmentation results can then be used for content-based image search on those websites, personalized recommendation based on image content analysis, and the like.
- embodiments of the present application use a multi-rate dilated convolutional neural network model to achieve full-image classification, remedying the shortcoming of conventional convolutional neural networks, which can locate only the most discriminative part of an object;
- the multi-rate dilated convolutional neural network model locates not only the most discriminative part of the object but also the other, less discriminative areas, and thereby all object regions. The located whole object region is then used as supervision information to train an image semantic segmentation network model for image segmentation.
- FIG. 2 is a schematic structural diagram of the multi-rate dilated convolution network model provided by an embodiment of the present application.
- the multi-rate dilated convolution layer applies dilated convolutions at several dilation rates (d = r_1, r_2, …, r_k) in parallel, learning object regions at different degrees of dispersion, and then performs global average pooling to obtain global object features at different degrees of dispersion. These features are fused into the final image classification result, on which a cross-entropy loss is computed given the supervision information of the full-image classification annotation; the loss is then propagated back through all layers of the network to train the network parameters.
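The effect of the dilation rate on a single convolution can be illustrated directly. The numpy sketch below (an illustration, not the patent's code; note that deep-learning libraries usually write d = 1 for "no dilation", whereas this text writes d = 0) implements a single-channel "valid" dilated convolution by sampling the input with stride d under each kernel tap:

```python
import numpy as np

def dilated_conv2d(x, w, d):
    """Single-channel 'valid' dilated convolution with dilation rate d >= 1.

    Equivalent to convolving with w after inserting d-1 zeros between its
    taps, so a 3x3 kernel covers a (2d+1)x(2d+1) window of the input.
    """
    kh, kw = w.shape
    span_h, span_w = (kh - 1) * d + 1, (kw - 1) * d + 1  # dilated kernel span
    H, W = x.shape
    out = np.zeros((H - span_h + 1, W - span_w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # sample the input at stride d under the kernel window
            out[i, j] = np.sum(x[i:i + span_h:d, j:j + span_w:d] * w)
    return out

x = np.arange(49, dtype=float).reshape(7, 7)
w = np.ones((3, 3))
y1 = dilated_conv2d(x, w, d=1)   # ordinary 3x3 convolution, 5x5 output
y2 = dilated_conv2d(x, w, d=2)   # same 9 weights see a 5x5 window, 3x3 output
```

The same nine weights cover a wider window as d grows, which is why the located object regions become more dispersed at higher rates.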
- the multi-rate dilated convolution network model generates an object localization map, as shown in FIG. 2, for the dilated convolution at each dilation rate (H_1, H_2, …, H_k);
- when d = 0, the localization map H_0 corresponding to the convolution without dilation is generated;
- the final global object localization map H is the fusion of all localization maps under the dilated convolutions at the different rates;
- H_0 is given a fixed high weight so that the correct, most discriminative regions it locates are not filtered out by the averaging of the dilated-convolution localization maps.
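The text does not give the exact fusion weights, so the sketch below only illustrates one plausible choice consistent with the description: the undilated map H_0 enters at full weight while the dilated maps (H_1, …, H_k) are averaged, so the averaging cannot erase what H_0 found:

```python
import numpy as np

def fuse_localization_maps(H0, H_list):
    """Fuse per-rate localization maps into a global map H.

    Assumed weighting (not stated explicitly in the text): H0 at full
    weight plus the mean of the dilated maps H1..Hk, so a region must be
    supported by H0 or by the dilated maps on average to survive.
    """
    return H0 + np.mean(np.stack(H_list), axis=0)

H0 = np.array([[0.9, 0.1], [0.0, 0.0]])   # d = 0: most discriminative part
H1 = np.array([[0.8, 0.6], [0.2, 0.0]])   # d = r1: more dispersed region
H2 = np.array([[0.7, 0.5], [0.6, 0.4]])   # d = r2: even more dispersed
H = fuse_localization_maps(H0, [H1, H2])
```

Note how the pixel found only by one dilated map (bottom-right) enters H with a small value, while the region confirmed by H_0 stays dominant.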
- FIG. 3 is a schematic diagram of the process of obtaining the object regions in an image on the deep convolutional neural network model at a single dilation rate, according to an embodiment of the present application.
- the deep convolutional neural network model works with the class activation map to obtain the high-response object regions corresponding to the full-image classification annotation in the original image.
- f_k(x, y) is the value at coordinate (x, y) on the k-th feature map of the last convolutional layer, and w_k^c is the weight connecting the k-th feature map to the c-th category;
- the object response map (class activation map, CAM) corresponding to the c-th category can then be obtained as the w_k^c-weighted sum of the feature maps.
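The CAM computation just described reduces to a weighted sum of feature maps, CAM_c(x, y) = Σ_k w_k^c · f_k(x, y). A small numpy sketch (illustrative shapes and random values, not data from the patent):

```python
import numpy as np

def class_activation_map(features, weights, c):
    """CAM for category c: CAM_c(x, y) = sum_k weights[k, c] * features[k, x, y].

    features: (K, H, W) feature maps of the last convolutional layer;
    weights:  (K, C) weights connecting feature map k to category c.
    """
    return np.tensordot(weights[:, c], features, axes=1)

K, H, W, C = 4, 6, 6, 3                      # hypothetical sizes
rng = np.random.default_rng(0)
features = rng.random((K, H, W))
weights = rng.random((K, C))
cam = class_activation_map(features, weights, c=1)   # (6, 6) response map
```

Thresholding `cam` then yields the high-response object region for the chosen category.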
- FIG. 4 shows the dilated convolutions at different dilation rates provided in an embodiment of the present application, and the corresponding object regions located in the image.
- a convolution without dilation usually has a small receptive field, so the object regions it locates are concentrated in the most discriminative part; a dilated convolution has a larger receptive field, so the object regions it locates are more dispersed, and the larger d is, the more dispersed the regions become.
- FIG. 5 is a schematic diagram of the segmentation results of the network trained with weakly supervised image classification annotation, provided by an embodiment of the present application.
- the massive user-tagged image data on the Internet can be used to train a fine image semantic segmentation network, effectively exploiting large amounts of previously unusable image data and correspondingly reducing the cost of manual annotation for image segmentation, which has potential economic value for image semantic segmentation and its applications;
- the image segmentation results obtained with this technique are shown in FIG. 5; segmentation quality close to that of fully supervised annotation can be obtained with weakly supervised annotation alone.
- the embodiments of the present application also apply to other multi-scale convolution networks, including convolutions with multiple convolution kernel sizes and pooling with multiple pooling kernel sizes.
- a server 600 provided by an embodiment of the present disclosure may include: an image obtaining module 601, a global object positioning module 602, and a model training module 603, where
- the global object localization module 602 is configured to perform full-image classification annotation on the original image by using the multi-rate dilated convolutional neural network model, to obtain global object localization maps of the original image at different degrees of dispersion, where the dispersion indicates the distribution, over the target object, of the object regions located by the model;
- the model training module 603 is configured to use the global object location map as the supervision information of the image semantic segmentation network model, and train the image semantic segmentation network model by using the supervision information.
- the global object positioning module 602 includes:
- a feature map extraction unit 6021, configured to extract a feature map of the target object from the original image by using the first N-1 convolutional layers of the multi-rate dilated convolutional neural network model, the model including N convolutional layers, where the N-th convolutional layer is a multi-rate dilated convolution layer and N is a positive integer;
- a dilated convolution unit 6022, configured to apply, through the multi-rate dilated convolution layer, dilated convolution to the feature map of the target object at a plurality of dilation rates d, to obtain object regions at different degrees of dispersion, where d is a positive integer;
- the pooling processing unit 6023 is configured to perform global average pooling on the object regions at different degrees of dispersion, to obtain global object localization maps of the original image at different degrees of dispersion.
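Global average pooling itself is a one-line reduction; the sketch below (illustrative, not the patent's code) collapses each per-rate object-region map to a single score:

```python
import numpy as np

def global_average_pooling(region_maps):
    """Reduce each (H, W) object-region map to one scalar: its spatial mean.

    region_maps: (M, H, W) stack of maps, one per dilation rate / class.
    Returns an (M,) vector of pooled scores.
    """
    return region_maps.mean(axis=(-2, -1))

maps = np.arange(32, dtype=float).reshape(2, 4, 4)  # two hypothetical 4x4 maps
gap = global_average_pooling(maps)
```

These pooled scores are the per-rate global object features that the model then fuses into the final classification result.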
- the pooling processing unit 6023 includes:
- an object region acquisition subunit 60231, configured to acquire a first object region when the dilation rate is 0, and a second object region when the dilation rate is greater than 0;
- a weight acquisition subunit 60232, configured to acquire a first weight corresponding to the first object region and a second weight corresponding to the second object region, where the value of the first weight is greater than that of the second weight;
- a fusion subunit 60233, configured to fuse the first object region and the second object region at different degrees of dispersion according to the first weight and the second weight, to obtain the global object localization map.
- the fusion subunit 60233 is specifically configured to determine a first object region H_0 when d equals 0, and second object regions (H_1, H_2, …, H_k) when d is greater than 0 and less than or equal to k, where k is the maximum dilation rate; and to fuse the first object region H_0 and the second object regions (H_1, H_2, …, H_k) at different degrees of dispersion to obtain the global object localization map H;
- H_i denotes the i-th object region of the second object regions (H_1, H_2, …, H_k).
- the dilated convolution unit 6022 includes:
- a pixel feature point acquisition subunit 60221, configured to acquire the pixel feature point f_t(x, y) at coordinate (x, y) on the t-th feature map of the multi-rate dilated convolution layer, where t is a positive integer;
- a category weight acquisition subunit 60222, configured to acquire the weight connecting the t-th feature map to the c-th category at dilation rate d, where c is a positive integer;
- an object region calculation subunit 60223, configured to calculate the object region H_d^c corresponding to the c-th category at dilation rate d.
- the model training module 603 includes:
- a model output unit 6031 configured to input the original image into the image semantic segmentation network model, and obtain an image classification result by using the image semantic segmentation network model;
- a loss function calculation unit 6032 configured to calculate a cross entropy loss function according to the image classification result and the global object location map, to obtain a loss result
- the back propagation unit 6033 is configured to backpropagate the loss result to all layers of the image semantic segmentation network model to continue training the image semantic segmentation network model.
- the image semantic segmentation network model is specifically a deep convolutional neural network model.
- the original image for model training is first obtained, and the multi-rate dilated convolutional neural network model then performs full-image classification annotation on it, yielding global object localization maps at different degrees of dispersion;
- the dispersion indicates the distribution, over the target object, of the object regions located by the multi-rate dilated convolutional neural network model;
- the global object localization map is used as supervision information of the image semantic segmentation network model, which is trained with that supervision information;
- the multi-rate dilated convolutional neural network model is used to perform the full-image classification annotation on the original image;
- the multi-rate dilated convolutions of the model can therefore locate, from the original image, global object localization maps at different degrees of dispersion;
- the global object localization map covers all regions of the target object, so the model accurately locates all object regions corresponding to the full-image classification annotation in the original image, improving the segmentation quality of image semantic segmentation.
- FIG. 7 is a schematic structural diagram of a server provided by an embodiment of the present application.
- the server 1100 may vary greatly in configuration or performance, and may include one or more central processing units (CPUs) 1122 (for example, one or more processors), memory 1132, and one or more storage media 1130 (for example, one or more mass storage devices) storing application programs 1142 or data 1144.
- the memory 1132 and the storage medium 1130 may be short-term storage or persistent storage.
- the program stored on storage medium 1130 may include one or more modules (not shown), each of which may include a series of instruction operations in the server.
- central processor 1122 can be configured to communicate with storage medium 1130, executing a series of instruction operations in storage medium 1130 on server 1100.
- Server 1100 may also include one or more power sources 1126, one or more wired or wireless network interfaces 1150, one or more input and output interfaces 1158, and/or one or more operating systems 1141, such as Windows ServerTM, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM and more.
- the steps of the training method of the image semantic segmentation model performed by the server in the above embodiments may be based on the server structure shown in FIG. 7.
- the embodiment of the present application further provides a storage medium for storing program code, and the program code is used to execute the method provided by the foregoing embodiment.
- the embodiment of the present application further provides a computer program product comprising instructions, which when executed on a server, causes the server to execute the method provided by the above embodiments.
- the device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units;
- they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solutions of the embodiments.
- the connection relationship between the modules indicates that there is a communication connection between them, and specifically may be implemented as one or more communication buses or signal lines.
- the technical solution may be embodied as a software product stored in a readable storage medium, such as a floppy disk, USB flash drive, removable hard disk, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), magnetic disk, or optical disc, including a number of instructions for causing a computer device (which may be a personal computer, a server, or a network device) to perform the methods described in the various embodiments of the present application.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Multimedia (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- Medical Informatics (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Mathematical Physics (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Biodiversity & Conservation Biology (AREA)
- Image Analysis (AREA)
Abstract
The embodiments of this application disclose a training method for an image semantic segmentation model, and a server, used to locate all object regions from an original image and improve the segmentation quality of image semantic segmentation. The training method provided by the embodiments of this application includes: obtaining an original image for model training; performing full-image classification annotation on the original image at different dilation rates by using a multi-rate dilated convolutional neural network model, to obtain global object localization maps of the original image at different degrees of dispersion, where any given degree of dispersion indicates the distribution, over a target object, of the object regions located by the multi-rate dilated convolutional neural network model at the dilation rate corresponding to that degree of dispersion; and using the global object localization map as supervision information of an image semantic segmentation network model, and training the image semantic segmentation network model with the supervision information.
Description
This application claims priority to Chinese Patent Application No. 201810317672.3, entitled "Training method for an image semantic segmentation model, and server", filed with the China Patent Office on April 10, 2018, the entire contents of which are incorporated herein by reference.
This application relates to the field of computer technology, and in particular to the training of image semantic segmentation models.
Image semantic segmentation is the foundation of image understanding and is very important in autonomous driving, drone applications, and wearable devices. An image is composed of many pixels, and semantic segmentation groups those pixels according to the different semantic meanings they express in the image.
In the prior art, a conventional deep convolutional neural network is usually trained as the image semantic segmentation network. The input image is first classified at the full-image level; the network then locates the object regions in the image corresponding to the full-image classification annotation; these regions are used as supervision information for image semantic segmentation, and the segmentation network is trained with this supervision information.
Prior-art approaches use conventional convolution operations to locate the object regions corresponding to the full-image classification annotation, and can usually locate only the one or more most discriminative parts of the whole object rather than the entire object region. Image semantic segmentation in the prior art therefore suffers from the inability to locate all object regions.
SUMMARY
The embodiments of this application provide a training method for an image semantic segmentation model, and a server, used to locate all object regions from an original image and improve the segmentation quality of image semantic segmentation.
To solve the above technical problem, the embodiments of this application provide the following technical solutions:
In one aspect, an embodiment of this application provides a training method for an image semantic segmentation model, applied to a server, the method including:
obtaining an original image for model training;
performing full-image classification annotation on the original image at different dilation rates by using a multi-rate dilated convolutional neural network model, to obtain global object localization maps of the original image at different degrees of dispersion, where any given degree of dispersion indicates the distribution, over a target object, of the object regions located by the model at the dilation rate corresponding to that degree of dispersion;
using the global object localization map as supervision information of an image semantic segmentation network model, and training the image semantic segmentation network model with the supervision information.
In one aspect, an embodiment of this application further provides a server, including:
an image obtaining module, configured to obtain an original image for model training;
a global object localization module, configured to perform full-image classification annotation on the original image at different dilation rates by using a multi-rate dilated convolutional neural network model, to obtain global object localization maps of the original image at different degrees of dispersion, where any given degree of dispersion indicates the distribution, over a target object, of the object regions located by the model at the dilation rate corresponding to that degree of dispersion;
a model training module, configured to use the global object localization map as supervision information of an image semantic segmentation network model, and to train the image semantic segmentation network model with the supervision information.
In the second aspect, the constituent modules of the server may further perform the steps described in the foregoing first aspect and its various possible implementations; see the foregoing description of the first aspect and its various possible implementations for details.
In one aspect, an embodiment of this application provides a server including a processor and a memory, the memory being configured to store instructions and the processor being configured to execute the instructions in the memory, so that the server performs the method of any one of the foregoing first aspect.
In one aspect, an embodiment of this application provides a computer-readable storage medium storing instructions which, when run on a computer, cause the computer to perform the methods described in the above aspects.
In one aspect, an embodiment of this application provides a computer program product containing instructions which, when run on a computer, causes the computer to perform the methods described in the above aspects.
As can be seen from the above technical solutions, the embodiments of this application have the following advantages:
In the embodiments of this application, an original image for model training is first obtained; a multi-rate dilated convolutional neural network model then performs full-image classification annotation on the original image at different dilation rates, yielding global object localization maps of the original image at different degrees of dispersion, where any given degree of dispersion indicates the distribution, over the target object, of the object regions located by the model at the corresponding dilation rate. Finally, the global object localization map is used as supervision information of an image semantic segmentation network model, which is trained with that supervision information. Because the multi-rate dilated convolutional neural network model performs the full-image classification annotation, its multi-rate dilated convolutions can locate, from the original image, global object localization maps at different degrees of dispersion that cover the entire region of the target object. The embodiments therefore accurately locate all object regions corresponding to the full-image classification annotation in the original image and improve the segmentation quality of image semantic segmentation.
To describe the technical solutions in the embodiments of this application more clearly, the accompanying drawings needed in the description of the embodiments are briefly introduced below. The drawings described below are clearly only some embodiments of this application, and those skilled in the art may derive other drawings from them.
FIG. 1 is a schematic flow-block diagram of a training method for an image semantic segmentation model provided by an embodiment of this application;
FIG. 2 is a schematic structural diagram of the multi-rate dilated convolution network model provided by an embodiment of this application;
FIG. 3 is a schematic diagram of the process of obtaining the object regions in an image on a deep convolutional neural network model at a single dilation rate, provided by an embodiment of this application;
FIG. 4 is a schematic diagram of dilated convolutions at different dilation rates and the corresponding object regions located in the image, provided by an embodiment of this application;
FIG. 5 is a schematic diagram of the segmentation results of the network trained with weakly supervised image classification annotation, provided by an embodiment of this application;
FIG. 6-a is a schematic structural diagram of a server provided by an embodiment of this application;
FIG. 6-b is a schematic structural diagram of a global object localization module provided by an embodiment of this application;
FIG. 6-c is a schematic structural diagram of a pooling processing unit provided by an embodiment of this application;
FIG. 6-d is a schematic structural diagram of a dilated convolution unit provided by an embodiment of this application;
FIG. 6-e is a schematic structural diagram of a model training module provided by an embodiment of this application;
FIG. 7 is a schematic structural diagram of a server to which the training method for an image semantic segmentation model provided by an embodiment of this application is applied.
The embodiments of this application provide a training method for an image semantic segmentation model, and a server, used to locate all object regions from an original image and improve the segmentation quality of image semantic segmentation. The method may be applied to a server, which may be a service device with data processing capability on the network side. When a terminal device has sufficient data processing capability, the method may also be applied to the terminal device, which may be a computing device with data processing capability on the user side, such as a PC (personal computer) or a smart terminal.
To make the objectives, features, and advantages of this application clearer and easier to understand, the technical solutions in the embodiments of this application are described clearly and completely below with reference to the accompanying drawings. The embodiments described below are clearly only some, not all, of the embodiments of this application. All other embodiments obtained by those skilled in the art based on the embodiments of this application fall within the protection scope of this application.
The terms "include" and "have" in the specification, claims, and drawings of this application, and any variants thereof, are intended to cover non-exclusive inclusion, so that a process, method, system, product, or device including a series of units is not necessarily limited to those units, and may include other units not explicitly listed or inherent to such a process, method, product, or device.
Detailed descriptions are given below.
An embodiment of the training method for an image semantic segmentation model provided by this application may specifically be applied to full-image classification annotation of images based on a dilated convolutional neural network model. The method uses weakly supervised image semantic segmentation and can be applied where finely annotated pixel-level segmentation data is lacking, achieving high-accuracy image segmentation by relying only on full-image classification annotation. The embodiments mainly use the dilated convolutional neural network model, relying on the full-image classification annotation, to obtain the global object localization map corresponding to the full-image classification annotation in the image. Specifically, the multi-rate dilated convolutional neural network model is first trained to classify the full image; once obtained, the model relies on multi-rate dilated convolution to accurately locate the global object localization map corresponding to the full-image classification annotation in the original image. The located global object localization map is then used as supervision information for segmentation to train an image semantic segmentation network model for image segmentation. The method can automatically crawl images with user-created tags from websites with massive user data, and use them to train a weakly supervised image semantic segmentation network; the semantic segmentation results can be used for content-based image search on those websites, personalized recommendation based on image content analysis, and the like.
Referring to FIG. 1, the training method for an image semantic segmentation model provided by an embodiment of this application may include the following steps:
101. Obtain an original image for model training.
In this embodiment, a training-sample image library may store training sample images used for model training; these are referred to in subsequent embodiments simply as original images. An original image contains one or more target objects, which may be objects of many shapes, for example a tool, an animal, or a person, which is not limited here. It should be noted that the original images may be stored in several ways: for example, the server receives the original image from a client and stores it in the server's database, or the server reads the original image into memory in real time to improve model-training efficiency.
102. Use a multi-rate dilated convolutional neural network model to perform full-image classification annotation on the original image at different dilation rates, to obtain global object localization maps of the original image at different degrees of dispersion, where any given degree of dispersion indicates the distribution, over the target object, of the object regions located by the model at the dilation rate corresponding to that degree of dispersion.
In this embodiment, a convolutional neural network model is used for full-image classification, and because the network uses dilated convolutions at multiple dilation rates, it is also called a "multi-rate dilated convolutional neural network model". Specifically, the multi-rate dilated convolutional neural network model may first be trained to perform full-image classification; once obtained, the model relies on its multi-rate dilated convolutions to accurately locate the global object localization map corresponding to the full-image classification annotation in the training image. The model achieves the goal of full-image classification while remedying the shortcoming of prior-art convolutional neural networks, which can locate only the most discriminative parts of an object; it can therefore locate global object localization maps of the original image at different degrees of dispersion.
It should be noted that, in the above embodiment, performing full-image classification annotation on the original image at different dilation rates yields global object localization maps at different degrees of dispersion, and performing it at a single dilation rate yields a global object localization map at one degree of dispersion. In other words, the dilation rates of the multi-rate dilated convolutional neural network model correspond to the degrees of dispersion of the global object localization maps.
The dispersion indicates the distribution, over the target object, of the object regions located by the multi-rate dilated convolutional neural network model. The located object regions may specifically be the high-response object regions corresponding to the full-image classification annotation, obtained through a class activation map (CAM). Dispersion refers to how the located high-response object regions are distributed over the real object: if the high-response regions are concentrated in one small part of the target object, the dispersion is low; if they are distributed over the entire target object, the dispersion is high. By performing full-image classification annotation at different dilation rates, the model obtains global object localization maps of the original image at different degrees of dispersion; the maps therefore locate not only the most discriminative parts of the target object but also the other, less discriminative regions, thereby locating the entire region of the target object.
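The text defines dispersion only qualitatively. As a toy illustration (our own measure, not a formula from the application), one can score how widely a located region spreads over the true object by the fraction of object pixels it covers:

```python
import numpy as np

def dispersion(located, obj):
    """Toy dispersion score: fraction of true-object pixels covered by the
    located region. Near 1.0 = spread over the whole object; near 0 = only
    a small (most discriminative) part. Illustrative, not from the patent.
    """
    located, obj = located.astype(bool), obj.astype(bool)
    return (located & obj).sum() / obj.sum()

obj = np.zeros((4, 4), dtype=int); obj[1:4, 1:4] = 1   # 9-pixel object
head = np.zeros((4, 4), dtype=int); head[1, 1] = 1     # only the discriminative part
full = obj.copy()                                      # whole object located
low, high = dispersion(head, obj), dispersion(full, obj)
```

The d = 0 convolution behaves like `head` (low score), while larger dilation rates push the located region toward `full` (high score).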
In some embodiments of this application, step 102 of using the multi-rate dilated convolutional neural network model to perform full-image classification annotation on the original image to obtain global object localization maps at different degrees of dispersion includes:
extracting a feature map of the target object from the original image by using the first N-1 convolutional layers of the multi-rate dilated convolutional neural network model, the model including N convolutional layers, where the N-th convolutional layer is a multi-rate dilated convolution layer and N is a positive integer;
applying, through the multi-rate dilated convolution layer, dilated convolution to the feature map of the target object at a plurality of dilation rates d to obtain object regions at different degrees of dispersion, where d is a positive integer;
performing global average pooling on the object regions at different degrees of dispersion to obtain global object localization maps of the original image at different degrees of dispersion.
The multi-rate dilated convolutional neural network model used in this embodiment has N convolutional layers in total. The first N-1 convolutional layers extract the feature map of the target object from the original image; the N-th, last convolutional layer is a multi-rate dilated convolution layer, and the number of layers N may be chosen according to the specific scenario. Exploiting the fact that dilated-convolution object localization can move beyond the most discriminative part, this embodiment adopts a multi-rate dilated convolutional neural network model that introduces a multi-rate dilated convolution layer after the last layer of a conventional convolutional neural network model. The multi-rate dilated convolution layer applies dilated convolutions at several dilation rates (d = r_1, r_2, …, r_k) in parallel to learn object regions at different degrees of dispersion, and then applies global average pooling (GAP) to obtain global object localization maps at different degrees of dispersion, which cover the entire region of the target object.
Further, in some embodiments of this application, performing global average pooling on the object regions at different degrees of dispersion to obtain the global object localization maps of the original image at different degrees of dispersion includes:
acquiring a first object region when the dilation rate is 0, and a second object region when the dilation rate is greater than 0;
acquiring a first weight corresponding to the first object region and a second weight corresponding to the second object region, the value of the first weight being greater than that of the second weight;
fusing the first object region and the second object region at different degrees of dispersion according to the first weight and the second weight, to obtain the global object localization map.
Dilated convolution is applied to the feature map of the target object at a plurality of dilation rates d; the object region obtained when the dilation rate is 0 is called the first object region, and the object regions obtained when the dilation rate is greater than 0 are called the second object regions. The object regions are then fused with different weights, with the d = 0 region weighted higher than the regions at other rates, because dilated convolution may also locate erroneous object regions: only when at least two object regions agree on the same region can that region be considered a correct, valid object region. Fixing a high weight, outside the dilated-convolution regions, for the region of the convolution without dilation therefore prevents the correct, most discriminative object regions it locates from being filtered out by the averaging of the dilated-convolution regions. This fusion yields high-accuracy object localization regions that can serve as high-quality image segmentation supervision information for training the subsequent image semantic segmentation network.
The following illustrates how the different object regions are fused by weight in the embodiments of this application. Further, fusing the first object region and the second object regions at different degrees of dispersion according to the first weight and the second weight to obtain the global object localization map includes:
determining the first object region H_0 when d equals 0, and the second object regions (H_1, H_2, …, H_k) when d is greater than 0 and less than or equal to k, where k is the maximum dilation rate; and fusing the first object region H_0 and the second object regions (H_1, H_2, …, H_k) at different degrees of dispersion to obtain the global object localization map H.
Using the multi-rate dilated convolutional neural network model, object regions at the different dilation rates can be obtained; the second object regions include an object region generated by the dilated convolution at each dilation rate (H_1, H_2, …, H_k). When d = 0, the first object region H_0 corresponding to the convolution without dilation can be used, and the final global object localization map H is the fusion of all object regions under the dilated convolutions at the different rates.
In this embodiment, applying dilated convolution to the feature map of the target object at a plurality of dilation rates d to obtain object regions at different degrees of dispersion includes: acquiring the pixel feature point at coordinate (x, y) on the t-th feature map of the multi-rate dilated convolution layer, where t is a positive integer. At each dilation rate, the multi-rate dilated convolution layer obtains, with the help of a class activation map, the high-response object regions corresponding to the full-image classification annotation in the original image. Suppose f_k(x, y) is the value at coordinate (x, y) on the k-th feature map of the last convolutional layer, and w_k^c is the weight connecting the k-th feature map to the c-th category; the object region corresponding to the c-th category can then be computed by the foregoing formula.
103. Use the global object localization map as supervision information of the image semantic segmentation network model, and train the image semantic segmentation network model with the supervision information.
In this embodiment, after the global object localization maps of the original image at different degrees of dispersion are obtained through the multi-rate dilated convolutional neural network model, the located global object localization map is used as supervision information for segmentation, and an image semantic segmentation network model is trained to perform image segmentation. For example, the image semantic segmentation network model may specifically be a deep convolutional neural network model, trained with the located global object localization map as supervision information. During training of the image semantic segmentation network model, the convolution kernel sizes and pooling kernel sizes may be chosen to suit the specific scenario, which is not limited here. Because the global object localization map contains high-accuracy object localization regions, it can serve as high-quality image segmentation supervision information for training the image semantic segmentation network model; the semantic segmentation results can then be used for content-based image search on websites, personalized recommendation based on image content analysis, and the like.
In some embodiments of this application, step 103 of using the global object localization map as supervision information of the image semantic segmentation network model and training the model with it includes:
inputting the original image into the image semantic segmentation network model and obtaining an image classification result through the model;
computing a cross-entropy loss function from the image classification result and the global object localization map to obtain a loss result;
back-propagating the loss result into all layers of the image semantic segmentation network model to continue training the model.
The image classification result obtained through the image semantic segmentation network model is used to compute a cross-entropy loss under the given supervision information of the full-image classification annotation; the cross-entropy loss function may specifically be a sigmoid function. The loss result is then propagated back into all layers of the model to train the network parameters. Once training is complete, all layers of the image semantic segmentation network model can be used to output image semantics for an input image.
As the above embodiments show, an original image for model training is first obtained; the multi-rate dilated convolutional neural network model then performs full-image classification annotation on it, yielding global object localization maps at different degrees of dispersion, where the dispersion indicates the distribution, over the target object, of the located object regions. Finally, the global object localization map is used as supervision information to train the image semantic segmentation network model. Because the multi-rate dilated convolutions can locate, from the original image, global object localization maps at different degrees of dispersion that cover the entire region of the target object, the embodiments accurately locate all object regions corresponding to the full-image classification annotation in the original image and improve the segmentation quality of image semantic segmentation.
To facilitate understanding and implementation of the above solutions of the embodiments, corresponding application scenarios are described below by way of example.
The original images used for model training in the embodiments may come from many sources; for example, images with user-created tags may be crawled automatically from websites with massive user data and used to train a weakly supervised image semantic segmentation network. The semantic segmentation results can then be used for content-based image search on those websites, personalized recommendation based on image content analysis, and the like.
The embodiments use a multi-rate dilated convolutional neural network model to achieve full-image classification. This network model remedies the shortcoming of conventional convolutional neural networks, which can locate only the most discriminative parts of an object: the multi-rate dilated model locates not only the most discriminative part but also the other, less discriminative regions, and thereby the entire object region. The located whole object region is then used as supervision information to train an image semantic segmentation network model for image segmentation.
In the embodiments, the multi-rate dilated convolutional neural network can locate all object regions, that is, generate a global object localization map. FIG. 2 is a schematic structural diagram of the multi-rate dilated convolution network model provided by an embodiment of this application. Exploiting the fact that dilated-convolution object localization can move beyond the most discriminative part, this embodiment proposes a multi-rate dilated convolutional neural network that introduces a multi-rate dilated convolution layer after the last layer of the convolutional neural network model. The layer applies dilated convolutions at several dilation rates (d = r_1, r_2, …, r_k) in parallel to learn object regions at different degrees of dispersion, then applies global average pooling to obtain global object features at the different degrees of dispersion. These features are fused into the final image classification result, on which a cross-entropy loss is computed given the supervision information of the full-image classification annotation; the loss is then propagated back through all layers of the network to train the network parameters.
In some embodiments of this application, the multi-rate dilated convolution network model generates, for the dilated convolution at each dilation rate, an object localization map as shown in FIG. 2 (H_1, H_2, …, H_k). When d = 0, the localization map H_0 corresponding to the convolution without dilation can be generated. The final global object localization map H is the fusion of all localization maps under the dilated convolutions at the different rates.
It should be noted that the above fusion gives the localization map of the convolution without dilation (d = 0) a higher weight than the maps at other rates, because dilated convolution may also locate erroneous object regions. Only when at least two localization maps locate the same object region can that region be considered a correct, valid object region. Fixing a high weight, outside the multi-rate dilated localization maps, for the map of the convolution without dilation therefore prevents the correct, most discriminative parts it locates from being filtered out by the averaging of the dilated-convolution localization maps. This fusion yields high-accuracy object localization regions that can serve as high-quality image segmentation supervision information for training the subsequent image semantic segmentation network.
FIG. 3 is a schematic diagram of the process of obtaining the object regions in an image on the deep convolutional neural network model at a single dilation rate, provided by an embodiment of this application. The deep convolutional neural network model works with the class activation map to obtain the high-response object regions corresponding to the full-image classification annotation in the original image. Suppose f_k(x, y) is the value at coordinate (x, y) on the k-th feature map of the last convolutional layer, and w_k^c is the weight connecting the k-th feature map to the c-th category; the object response map (CAM) corresponding to the c-th category can then be obtained.
The difference between convolution with and without dilation is shown in FIG. 4, a schematic diagram of dilated convolutions at different dilation rates and the corresponding object regions located in the image, provided by an embodiment of this application. A convolution without dilation can be regarded as a dilated convolution with d = 0. As FIG. 4 shows, a convolution without dilation usually has a small receptive field, so the object regions it locates are concentrated in the most discriminative part, whereas a dilated convolution, with its larger receptive field, locates more dispersed object regions; the larger d is, the more dispersed the regions. d = 0 means no dilation, i.e., a conventional convolutional neural network; d > 0 means dilation, i.e., a dilated convolutional neural network.
FIG. 5 is a schematic diagram of the segmentation results of the network trained with weakly supervised image classification annotation, provided by an embodiment of this application. The embodiments make it possible to train a fine image semantic segmentation network with the massive user-tagged image data on the Internet, effectively exploiting large amounts of previously unusable image data and correspondingly reducing the cost of manual annotation for image segmentation, which has potential economic value for image semantic segmentation and its applications. The image segmentation results obtained with this technique are shown in FIG. 5: segmentation quality close to that of fully supervised annotation can be obtained with weakly supervised annotation alone.
It should be noted that, besides the multi-rate dilated convolution network described above, the embodiments also apply to other multi-scale convolution networks, including convolutions with multiple convolution kernel sizes and pooling with multiple pooling kernel sizes.
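The receptive-field growth behind FIG. 4 can be quantified. Using the text's convention that d = 0 denotes an ordinary (undilated) convolution, i.e., d zeros are inserted between adjacent kernel taps, a k×k kernel covers a span of k + (k-1)·d pixels per side:

```python
def effective_span(kernel_size, d):
    """Pixels covered per side by a kernel at dilation rate d, under the
    text's convention that d = 0 means ordinary convolution (d zeros are
    inserted between adjacent taps)."""
    return kernel_size + (kernel_size - 1) * d

# a 3x3 kernel at the rates d = 0..3
spans = {d: effective_span(3, d) for d in (0, 1, 2, 3)}
```

So the same nine weights cover a 3-, 5-, 7-, then 9-pixel-wide window as d grows, which is why the located object regions spread from the most discriminative part toward the whole object.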
It should be noted that, for simplicity, the foregoing method embodiments are described as series of action combinations; those skilled in the art should understand, however, that this application is not limited by the described order of actions, because according to this application some steps may be performed in other orders or simultaneously. Those skilled in the art should also understand that the embodiments described in this specification are preferred embodiments, and the actions and modules involved are not necessarily required by this application.
To facilitate implementation of the above solutions of the embodiments, related apparatuses for implementing them are provided below.
Referring to FIG. 6-a, a server 600 provided by an embodiment of this application may include an image obtaining module 601, a global object localization module 602, and a model training module 603, where:
the image obtaining module 601 is configured to obtain an original image for model training;
the global object localization module 602 is configured to perform full-image classification annotation on the original image by using a multi-rate dilated convolutional neural network model, to obtain global object localization maps of the original image at different degrees of dispersion, where the dispersion indicates the distribution, over the target object, of the object regions located by the model;
the model training module 603 is configured to use the global object localization map as supervision information of an image semantic segmentation network model, and to train the image semantic segmentation network model with the supervision information.
In some embodiments of this application, referring to FIG. 6-b, the global object localization module 602 includes:
a feature map extraction unit 6021, configured to extract a feature map of the target object from the original image by using the first N-1 convolutional layers of the multi-rate dilated convolutional neural network model, the model including N convolutional layers, where the N-th convolutional layer is a multi-rate dilated convolution layer and N is a positive integer;
a dilated convolution unit 6022, configured to apply, through the multi-rate dilated convolution layer, dilated convolution to the feature map of the target object at a plurality of dilation rates d, to obtain object regions at different degrees of dispersion, where d is a positive integer;
a pooling processing unit 6023, configured to perform global average pooling on the object regions at different degrees of dispersion, to obtain global object localization maps of the original image at different degrees of dispersion.
In some embodiments of this application, referring to FIG. 6-c, the pooling processing unit 6023 includes:
an object region acquisition subunit 60231, configured to acquire a first object region when the dilation rate is 0, and a second object region when the dilation rate is greater than 0;
a weight acquisition subunit 60232, configured to acquire a first weight corresponding to the first object region and a second weight corresponding to the second object region, the value of the first weight being greater than that of the second weight;
a fusion subunit 60233, configured to fuse the first object region and the second object region at different degrees of dispersion according to the first weight and the second weight, to obtain the global object localization map.
In some embodiments of this application, the fusion subunit 60233 is specifically configured to determine the first object region H_0 when d equals 0, and the second object regions (H_1, H_2, …, H_k) when d is greater than 0 and less than or equal to k, where k is the maximum dilation rate; and to fuse the first object region H_0 and the second object regions (H_1, H_2, …, H_k) at different degrees of dispersion to obtain the global object localization map H.
In some embodiments of this application, referring to FIG. 6-d, the dilated convolution unit 6022 includes: a pixel feature point acquisition subunit 60221, configured to acquire the pixel feature point f_t(x, y) at coordinate (x, y) on the t-th feature map of the multi-rate dilated convolution layer, where t is a positive integer; and an object region calculation subunit 60223, configured to calculate the object region H_d^c corresponding to the c-th category at dilation rate d.
In some embodiments of this application, referring to FIG. 6-e, the model training module 603 includes:
a model output unit 6031, configured to input the original image into the image semantic segmentation network model and obtain an image classification result through the model;
a loss function calculation unit 6032, configured to compute a cross-entropy loss function from the image classification result and the global object localization map, to obtain a loss result;
a back-propagation unit 6033, configured to back-propagate the loss result into all layers of the image semantic segmentation network model, to continue training the model.
In some embodiments of this application, the image semantic segmentation network model is specifically a deep convolutional neural network model.
As the above description of the embodiments shows, an original image for model training is first obtained; the multi-rate dilated convolutional neural network model then performs full-image classification annotation on it, yielding global object localization maps at different degrees of dispersion, the dispersion indicating the distribution, over the target object, of the located object regions. Finally, the global object localization map is used as supervision information to train the image semantic segmentation network model. Because the multi-rate dilated convolutions can locate, from the original image, global object localization maps at different degrees of dispersion that cover the entire region of the target object, the embodiments accurately locate all object regions corresponding to the full-image classification annotation in the original image and improve the segmentation quality of image semantic segmentation.
FIG. 7 is a schematic structural diagram of a server provided by an embodiment of this application. The server 1100 may vary greatly in configuration or performance, and may include one or more central processing units (CPUs) 1122 (for example, one or more processors), memory 1132, and one or more storage media 1130 (for example, one or more mass storage devices) storing application programs 1142 or data 1144. The memory 1132 and the storage media 1130 may provide transient or persistent storage. A program stored on a storage medium 1130 may include one or more modules (not shown), each of which may include a series of instruction operations on the server. Further, the central processing unit 1122 may be configured to communicate with the storage medium 1130 and to execute, on the server 1100, the series of instruction operations in the storage medium 1130.
The server 1100 may also include one or more power supplies 1126, one or more wired or wireless network interfaces 1150, one or more input/output interfaces 1158, and/or one or more operating systems 1141, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.
The steps of the training method for an image semantic segmentation model performed by the server in the above embodiments may be based on the server structure shown in FIG. 7.
In addition, an embodiment of this application further provides a storage medium configured to store program code, the program code being used to perform the methods provided by the above embodiments.
An embodiment of this application further provides a computer program product containing instructions which, when run on a server, causes the server to perform the methods provided by the above embodiments.
It should also be noted that the device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purposes of the solutions of the embodiments. In the accompanying drawings of the device embodiments provided by this application, the connection relationships between modules indicate communication connections between them, which may specifically be implemented as one or more communication buses or signal lines. Those of ordinary skill in the art can understand and implement the embodiments without creative effort.
From the above description of the implementations, those skilled in the art can clearly understand that this application may be implemented by software plus the necessary general-purpose hardware, or of course by dedicated hardware including application-specific integrated circuits, dedicated CPUs, dedicated memory, dedicated components, and the like. In general, any function performed by a computer program can easily be implemented with corresponding hardware, and the specific hardware structures used to implement the same function can be diverse, for example analog circuits, digital circuits, or dedicated circuits. For this application, however, a software implementation is in most cases the better choice. Based on this understanding, the technical solutions of this application, in essence or as the part contributing over the prior art, may be embodied as a software product stored in a readable storage medium, such as a computer floppy disk, USB flash drive, removable hard disk, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), magnetic disk, or optical disc, including a number of instructions for causing a computer device (which may be a personal computer, a server, or a network device) to perform the methods described in the various embodiments of this application.
In summary, the above embodiments are intended only to illustrate, not to limit, the technical solutions of this application. Although this application has been described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions recorded in the above embodiments, or make equivalent replacements of some of the technical features therein, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application.
Claims (17)
- A training method for an image semantic segmentation model, applied to a server, the method comprising: obtaining an original image for model training; performing full-image classification annotation on the original image at different dilation rates by using a multi-rate dilated convolutional neural network model, to obtain global object localization maps of the original image at different degrees of dispersion, wherein any given degree of dispersion indicates the distribution, over a target object, of the object regions located by the multi-rate dilated convolutional neural network model at the dilation rate corresponding to that degree of dispersion; and using the global object localization map as supervision information of an image semantic segmentation network model, and training the image semantic segmentation network model with the supervision information.
- The method according to claim 1, wherein performing full-image classification annotation on the original image at different dilation rates by using the multi-rate dilated convolutional neural network model to obtain global object localization maps of the original image at different degrees of dispersion comprises: extracting a feature map of the target object from the original image by using the first N-1 convolutional layers of the multi-rate dilated convolutional neural network model, the model comprising N convolutional layers, wherein the N-th convolutional layer is a multi-rate dilated convolution layer and N is a positive integer; applying, through the multi-rate dilated convolution layer, dilated convolution to the feature map of the target object at a plurality of dilation rates d to obtain object regions at different degrees of dispersion, wherein d is a positive integer; and performing global average pooling on the object regions at different degrees of dispersion to obtain the global object localization maps of the original image at different degrees of dispersion.
- The method according to claim 2, wherein performing global average pooling on the object regions at different degrees of dispersion to obtain the global object localization maps of the original image at different degrees of dispersion comprises: acquiring a first object region when the dilation rate is 0, and a second object region when the dilation rate is greater than 0; acquiring a first weight corresponding to the first object region and a second weight corresponding to the second object region, the value of the first weight being greater than that of the second weight; and fusing the first object region and the second object region at different degrees of dispersion according to the first weight and the second weight, to obtain the global object localization map.
- The method according to any one of claims 1 to 5, wherein using the global object localization map as supervision information of the image semantic segmentation network model and training the image semantic segmentation network model with the supervision information comprises: inputting the original image into the image semantic segmentation network model and obtaining an image classification result through the model; computing a cross-entropy loss function from the image classification result and the global object localization map to obtain a loss result; and back-propagating the loss result into all layers of the image semantic segmentation network model to continue training the model.
- The method according to any one of claims 1 to 5, wherein the image semantic segmentation network model comprises a deep convolutional neural network model.
- A server, comprising: an image obtaining module, configured to obtain an original image for model training; a global object localization module, configured to perform full-image classification annotation on the original image at different dilation rates by using a multi-rate dilated convolutional neural network model, to obtain global object localization maps of the original image at different degrees of dispersion, wherein any given degree of dispersion indicates the distribution, over a target object, of the object regions located by the model at the dilation rate corresponding to that degree of dispersion; and a model training module, configured to use the global object localization map as supervision information of an image semantic segmentation network model, and to train the image semantic segmentation network model with the supervision information.
- The server according to claim 8, wherein the global object localization module comprises: a feature map extraction unit, configured to extract a feature map of the target object from the original image by using the first N-1 convolutional layers of the multi-rate dilated convolutional neural network model, the model comprising N convolutional layers, wherein the N-th convolutional layer is a multi-rate dilated convolution layer and N is a positive integer; a dilated convolution unit, configured to apply, through the multi-rate dilated convolution layer, dilated convolution to the feature map of the target object at a plurality of dilation rates d, to obtain object regions at different degrees of dispersion, wherein d is a positive integer; and a pooling processing unit, configured to perform global average pooling on the object regions at different degrees of dispersion, to obtain the global object localization maps of the original image at different degrees of dispersion.
- The server according to claim 9, wherein the pooling processing unit comprises: an object region acquisition subunit, configured to acquire a first object region when the dilation rate is 0, and a second object region when the dilation rate is greater than 0; a weight acquisition subunit, configured to acquire a first weight corresponding to the first object region and a second weight corresponding to the second object region, the value of the first weight being greater than that of the second weight; and a fusion subunit, configured to fuse the first object region and the second object region at different degrees of dispersion according to the first weight and the second weight, to obtain the global object localization map.
- The server according to any one of claims 8 to 12, wherein the model training module comprises: a model output unit, configured to input the original image into the image semantic segmentation network model and obtain an image classification result through the model; a loss function calculation unit, configured to compute a cross-entropy loss function from the image classification result and the global object localization map, to obtain a loss result; and a back-propagation unit, configured to back-propagate the loss result into all layers of the image semantic segmentation network model, to continue training the model.
- The server according to any one of claims 8 to 12, wherein the image semantic segmentation network model comprises a deep convolutional neural network model.
- A computer-readable storage medium comprising instructions which, when run on a computer, cause the computer to perform the method of any one of claims 1 to 7.
- A server, comprising: a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus; the communication interface is an interface of a communication module; the memory is configured to store program code and transmit the program code to the processor; and the processor is configured to invoke the instructions of the program code in the memory to perform the method of any one of claims 1 to 7.
- A computer program product containing instructions which, when run on a computer, causes the computer to perform the method of any one of claims 1 to 7.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP19784497.0A EP3779774B1 (en) | 2018-04-10 | 2019-03-25 | Training method for image semantic segmentation model and server |
US16/929,444 US11348249B2 (en) | 2018-04-10 | 2020-07-15 | Training method for image semantic segmentation model and server |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810317672.3 | 2018-04-10 | ||
CN201810317672.3A CN110363210B (zh) | 2018-04-10 | 2018-04-10 | 一种图像语义分割模型的训练方法和服务器 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/929,444 Continuation US11348249B2 (en) | 2018-04-10 | 2020-07-15 | Training method for image semantic segmentation model and server |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019196633A1 true WO2019196633A1 (zh) | 2019-10-17 |
Family
ID=68163086
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/079404 WO2019196633A1 (zh) | 2018-04-10 | 2019-03-25 | 一种图像语义分割模型的训练方法和服务器 |
Country Status (4)
Country | Link |
---|---|
US (1) | US11348249B2 (zh) |
EP (1) | EP3779774B1 (zh) |
CN (1) | CN110363210B (zh) |
WO (1) | WO2019196633A1 (zh) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111028154A (zh) * | 2019-11-18 | 2020-04-17 | Harbin Engineering University | Side-scan sonar image matching and stitching method for rugged seabed terrain |
CN112631947A (zh) * | 2021-01-15 | 2021-04-09 | Beijing ByteDance Network Technology Co., Ltd. | Test control method and apparatus for an application program, electronic device, and storage medium |
CN113239815A (zh) * | 2021-05-17 | 2021-08-10 | Guangdong University of Technology | Remote sensing image classification method, apparatus, and device based on true-semantic full-network learning |
CN113496228A (zh) * | 2021-07-30 | 2021-10-12 | Dalian Maritime University | Human body semantic segmentation method based on Res2Net, TransUNet, and collaborative attention |
CN113674300A (zh) * | 2021-08-24 | 2021-11-19 | Suzhou Tianzhun Software Co., Ltd. | Model training method, measurement method, system, device, and medium for automatic CNC measurement |
CN114049269A (zh) * | 2021-11-05 | 2022-02-15 | Guangdong Oppo Mobile Telecommunications Corp., Ltd. | Image correction method and apparatus, and electronic device |
WO2022105125A1 (zh) * | 2020-11-17 | 2022-05-27 | Ping An Technology (Shenzhen) Co., Ltd. | Image segmentation method and apparatus, computer device, and storage medium |
CN114596435A (zh) * | 2022-01-06 | 2022-06-07 | Tencent Technology (Shenzhen) Co., Ltd. | Method, apparatus, device, and storage medium for generating semantic segmentation labels |
Families Citing this family (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020048359A1 (en) * | 2018-09-06 | 2020-03-12 | Guangdong Oppo Mobile Telecommunications Corp., Ltd. | Method, system, and computer-readable medium for improving quality of low-light images |
CN111666960B (zh) * | 2019-03-06 | 2024-01-19 | Nanjing Horizon Robotics Technology Co., Ltd. | Image recognition method and apparatus, electronic device, and readable storage medium |
CN110136809B (zh) * | 2019-05-22 | 2022-12-27 | Tencent Technology (Shenzhen) Co., Ltd. | Medical image processing method and apparatus, electronic medical device, and storage medium |
CN111046921B (zh) * | 2019-11-25 | 2022-02-15 | Tianjin University | Brain tumor segmentation method based on U-Net and multi-view fusion |
CA3157994A1 (en) * | 2019-11-27 | 2021-06-03 | Pavel SINHA | Systems and methods for performing direct conversion of image sensor data to image analytics |
CN111159542B (zh) * | 2019-12-12 | 2023-05-05 | Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences | Cross-domain sequential recommendation method based on an adaptive fine-tuning strategy |
CN111598838B (zh) * | 2020-04-22 | 2023-04-07 | South-Central University for Nationalities | Automatic cardiac MR image segmentation method and apparatus, electronic device, and storage medium |
CN111860827B (zh) * | 2020-06-04 | 2023-04-07 | Xidian University | Multi-target localization method and apparatus for direction-finding systems based on a neural network model |
CN111951274A (zh) * | 2020-07-24 | 2020-11-17 | Shanghai United Imaging Intelligence Co., Ltd. | Image segmentation method, system, readable storage medium, and device |
CN112861708B (zh) * | 2021-02-05 | 2023-04-07 | Advanced Technology Research Institute of Beijing Institute of Technology | Semantic segmentation method, device, and storage medium for radar images |
JP2022145001A (ja) * | 2021-03-19 | 2022-10-03 | Canon Inc. | Image processing apparatus and image processing method |
CN113160246A (zh) * | 2021-04-14 | 2021-07-23 | Institute of Optics and Electronics, Chinese Academy of Sciences | Image semantic segmentation method based on deep supervision |
CN113344857B (zh) * | 2021-05-13 | 2022-05-03 | Shenzhen Huahan Weiye Technology Co., Ltd. | Training method for defect detection network, defect detection method, and storage medium |
CN113312993B (zh) * | 2021-05-17 | 2022-07-26 | Peking University | Land cover classification method for remote sensing data based on PSPNet |
CN113610807B (zh) * | 2021-08-09 | 2024-02-09 | Xidian University | COVID-19 lesion segmentation method based on weakly supervised multi-task learning |
CN113808055B (zh) * | 2021-08-17 | 2023-11-24 | South-Central University for Nationalities | Plant recognition method, apparatus, and storage medium based on hybrid dilated convolution |
CN114861771A (zh) * | 2022-04-15 | 2022-08-05 | Xi'an Jiaotong University | Industrial CT image defect classification method based on feature extraction and deep learning |
CN115063704A (zh) * | 2022-06-28 | 2022-09-16 | Nanjing University of Posts and Telecommunications | UAV surveillance target classification method using stereo-feature-fusion semantic segmentation |
CN115205300B (zh) * | 2022-09-19 | 2022-12-09 | East China Jiaotong University | Fundus vessel image segmentation method and system based on dilated convolution and semantic fusion |
CN118072358B (zh) * | 2024-04-17 | 2024-07-19 | Nanchang Institute of Technology | AI-based automatic massage regulation system |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107145845A (zh) * | 2017-04-26 | 2017-09-08 | Sun Yat-sen University | Pedestrian detection method based on deep learning and multi-feature-point fusion |
CN107316015A (zh) * | 2017-06-19 | 2017-11-03 | Nanjing University of Posts and Telecommunications | High-accuracy facial expression recognition method based on deep spatiotemporal features |
Family Cites Families (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2384185A2 (en) * | 2009-01-28 | 2011-11-09 | The Procter & Gamble Company | Methods for improving skin quality using rinse-off personal care compositions with variable amounts of hydrophobic benefit agents |
EP2524224A1 (en) * | 2010-01-17 | 2012-11-21 | The Procter & Gamble Company | Biomarker-based methods for formulating compositions that improve skin quality and reduce the visible signs of aging in skin for individuals in a selected population |
CN102651127A (zh) * | 2012-04-01 | 2012-08-29 | Shenzhen Wondershare Software Co., Ltd. | Image processing method and system for super-resolution reconstruction |
CN104077808A (zh) * | 2014-07-20 | 2014-10-01 | Zhan Shu | Real-time 3D face modeling method based on depth information for computer graphics and image processing |
JP6539901B2 (ja) * | 2015-03-09 | 2019-07-10 | Hosei University | Plant disease diagnosis system, plant disease diagnosis method, and program |
US10360477B2 (en) * | 2016-01-11 | 2019-07-23 | Kla-Tencor Corp. | Accelerating semiconductor-related computations using learning based models |
CN107784654B (zh) * | 2016-08-26 | 2020-09-25 | Hangzhou Hikvision Digital Technology Co., Ltd. | Image segmentation method and apparatus, and fully convolutional network system |
CN106504190B (zh) * | 2016-12-29 | 2019-09-13 | Zhejiang Gongshang University | Stereoscopic video generation method based on 3D convolutional neural networks |
CN106875415B (zh) * | 2016-12-29 | 2020-06-02 | Beijing Institute of Technology Leike Electronic Information Technology Co., Ltd. | Continuous and stable tracking method for small, dim moving targets against dynamic backgrounds |
US10671873B2 (en) * | 2017-03-10 | 2020-06-02 | Tusimple, Inc. | System and method for vehicle wheel detection |
CN107403430B (zh) * | 2017-06-15 | 2020-08-07 | Sun Yat-sen University | RGBD image semantic segmentation method |
US10477148B2 (en) * | 2017-06-23 | 2019-11-12 | Cisco Technology, Inc. | Speaker anticipation |
CN107563383A (zh) * | 2017-08-24 | 2018-01-09 | Hangzhou Jianpei Technology Co., Ltd. | Medical image aided diagnosis and semi-supervised sample generation system |
CN107480726A (zh) * | 2017-08-25 | 2017-12-15 | University of Electronic Science and Technology of China | Scene semantic segmentation method based on full convolution and long short-term memory units |
CN107679477B (zh) * | 2017-09-27 | 2021-02-02 | Shenzhen Institute of Future Media Technology | Face depth and surface normal prediction method based on dilated convolutional neural networks |
CN107665491B (zh) * | 2017-10-10 | 2021-04-09 | Tsinghua University | Pathological image recognition method and system |
CN107766820A (zh) * | 2017-10-20 | 2018-03-06 | Beijing Xiaomi Mobile Software Co., Ltd. | Image classification method and apparatus |
CN107767384B (zh) * | 2017-11-03 | 2021-12-03 | University of Electronic Science and Technology of China | Image semantic segmentation method based on adversarial training |
CN107871142A (zh) * | 2017-11-14 | 2018-04-03 | South China University of Technology | Dilated convolution method based on deep convolutional adversarial network models |
CN107767380A (zh) * | 2017-12-06 | 2018-03-06 | University of Electronic Science and Technology of China | High-resolution composite-field-of-view dermoscopy image segmentation method based on global dilated convolution |
-
2018
- 2018-04-10 CN CN201810317672.3A patent/CN110363210B/zh active Active
-
2019
- 2019-03-25 WO PCT/CN2019/079404 patent/WO2019196633A1/zh unknown
- 2019-03-25 EP EP19784497.0A patent/EP3779774B1/en active Active
-
2020
- 2020-07-15 US US16/929,444 patent/US11348249B2/en active Active
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107145845A (zh) * | 2017-04-26 | 2017-09-08 | Sun Yat-sen University | Pedestrian detection method based on deep learning and multi-feature-point fusion |
CN107316015A (zh) * | 2017-06-19 | 2017-11-03 | Nanjing University of Posts and Telecommunications | High-accuracy facial expression recognition method based on deep spatiotemporal features |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111028154A (zh) * | 2019-11-18 | 2020-04-17 | Harbin Engineering University | Side-scan sonar image matching and stitching method for rugged seabed terrain |
CN111028154B (zh) * | 2019-11-18 | 2023-05-09 | Harbin Engineering University | Side-scan sonar image matching and stitching method for rugged seabed terrain |
WO2022105125A1 (zh) * | 2020-11-17 | 2022-05-27 | Ping An Technology (Shenzhen) Co., Ltd. | Image segmentation method and apparatus, computer device, and storage medium |
CN112631947A (zh) * | 2021-01-15 | 2021-04-09 | Beijing ByteDance Network Technology Co., Ltd. | Test control method and apparatus for an application program, electronic device, and storage medium |
CN112631947B (zh) * | 2021-01-15 | 2023-04-25 | Douyin Vision Co., Ltd. | Test control method and apparatus for an application program, electronic device, and storage medium |
CN113239815B (zh) * | 2021-05-17 | 2022-09-06 | Guangdong University of Technology | Remote sensing image classification method, apparatus, and device based on true-semantic full-network learning |
CN113239815A (zh) * | 2021-05-17 | 2021-08-10 | Guangdong University of Technology | Remote sensing image classification method, apparatus, and device based on true-semantic full-network learning |
CN113496228A (zh) * | 2021-07-30 | 2021-10-12 | Dalian Maritime University | Human body semantic segmentation method based on Res2Net, TransUNet, and collaborative attention |
CN113496228B (zh) * | 2021-07-30 | 2024-03-26 | Dalian Maritime University | Human body semantic segmentation method based on Res2Net, TransUNet, and collaborative attention |
CN113674300B (zh) * | 2021-08-24 | 2022-10-28 | Suzhou Tianzhun Software Co., Ltd. | Model training method, measurement method, system, device, and medium for automatic CNC measurement |
CN113674300A (zh) * | 2021-08-24 | 2021-11-19 | Suzhou Tianzhun Software Co., Ltd. | Model training method, measurement method, system, device, and medium for automatic CNC measurement |
CN114049269A (zh) * | 2021-11-05 | 2022-02-15 | Guangdong Oppo Mobile Telecommunications Corp., Ltd. | Image correction method and apparatus, and electronic device |
CN114596435A (zh) * | 2022-01-06 | 2022-06-07 | Tencent Technology (Shenzhen) Co., Ltd. | Method, apparatus, device, and storage medium for generating semantic segmentation labels |
Also Published As
Publication number | Publication date |
---|---|
US11348249B2 (en) | 2022-05-31 |
US20210035304A1 (en) | 2021-02-04 |
CN110363210A (zh) | 2019-10-22 |
EP3779774A1 (en) | 2021-02-17 |
EP3779774B1 (en) | 2024-05-08 |
CN110363210B (zh) | 2023-05-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2019196633A1 (zh) | Training method for image semantic segmentation model and server | |
US20210012198A1 (en) | Method for training deep neural network and apparatus | |
US10586350B2 (en) | Optimizations for dynamic object instance detection, segmentation, and structure mapping | |
CN108205655B (zh) | Keypoint prediction method and apparatus, electronic device, and storage medium | |
WO2021227726A1 (zh) | Methods, apparatuses, and devices for training facial detection and image detection neural networks | |
WO2019228317A1 (zh) | Face recognition method and apparatus, and computer-readable medium | |
WO2020107847A1 (zh) | Skeleton-point-based fall detection method and fall detection apparatus | |
US20190172224A1 (en) | Optimizations for Structure Mapping and Up-sampling | |
CN108288051B (zh) | Pedestrian re-identification model training method and apparatus, electronic device, and storage medium | |
EP4002161A1 (en) | Image retrieval method and apparatus, storage medium, and device | |
EP3493105A1 (en) | Optimizations for dynamic object instance detection, segmentation, and structure mapping | |
WO2023185785A1 (zh) | Image processing method, model training method, and related apparatus | |
CN112396106B (zh) | Content recognition method, content recognition model training method, and storage medium | |
EP3493106B1 (en) | Optimizations for dynamic object instance detection, segmentation, and structure mapping | |
WO2019232772A1 (en) | Systems and methods for content identification | |
WO2020238353A1 (zh) | Data processing method and apparatus, storage medium, and electronic apparatus | |
CN111680550B (zh) | Emotion information recognition method and apparatus, storage medium, and computer device | |
CN116310318B (zh) | Interactive image segmentation method and apparatus, computer device, and storage medium | |
WO2019108250A1 (en) | Optimizations for dynamic object instance detection, segmentation, and structure mapping | |
WO2021127916A1 (zh) | Facial emotion recognition method, smart device, and computer-readable storage medium | |
WO2024001806A1 (zh) | Federated-learning-based data value evaluation method and related device | |
CN113705596A (zh) | Image recognition method and apparatus, computer device, and storage medium | |
CN113641811A (zh) | Session-based recommendation method, system, device, and storage medium for promoting purchase behavior | |
CN117726884A (zh) | Training method for object category recognition model, object category recognition method, and apparatus | |
CN114064973B (zh) | Video news classification model building method, classification method, apparatus, and device | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 19784497 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
ENP | Entry into the national phase |
Ref document number: 2019784497 Country of ref document: EP Effective date: 20201110 |