WO2019024808A1 - Training method and apparatus for a semantic segmentation model, electronic device, and storage medium - Google Patents

Training method and apparatus for a semantic segmentation model, electronic device, and storage medium

Info

Publication number
WO2019024808A1
WO2019024808A1 (application PCT/CN2018/097549)
Authority
WO
WIPO (PCT)
Prior art keywords
image
sub
images
semantic segmentation
neural network
Prior art date
Application number
PCT/CN2018/097549
Other languages
English (en)
French (fr)
Inventor
詹晓航
刘子纬
罗平
吕健勤
汤晓鸥
Original Assignee
北京市商汤科技开发有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京市商汤科技开发有限公司
Priority to KR1020197038767A (KR102358554B1)
Priority to JP2019571272A (JP6807471B2)
Priority to SG11201913365WA
Publication of WO2019024808A1
Priority to US16/726,880 (US11301719B2)


Classifications

    • G06N 3/084: Backpropagation, e.g. using gradient descent
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/217: Validation; Performance evaluation; Active pattern learning techniques
    • G06N 3/045: Combinations of networks
    • G06N 3/0464: Convolutional networks [CNN, ConvNet]
    • G06N 3/08: Learning methods
    • G06V 10/23: Image preprocessing by selection of a specific region containing or referencing a pattern, based on positionally close patterns or neighbourhood relationships
    • G06V 10/26: Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/267: Segmentation of patterns by performing operations on regions, e.g. growing, shrinking or watersheds
    • G06V 10/454: Integrating biologically inspired filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G06V 10/82: Image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V 30/19147: Obtaining sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V 30/19173: Classification techniques
    • G06V 30/274: Post-processing using syntactic or semantic context, e.g. balancing

Definitions

  • the embodiments of the present application relate to computer vision technologies, and in particular, to a training method and apparatus for a semantic segmentation model, an electronic device, and a storage medium.
  • Image semantic segmentation assigns a corresponding judgment label to each pixel of the input image, indicating which object or category the pixel is most likely to belong to. It is an important task in the field of computer vision, and its applications include machine scene understanding, video analysis and so on.
  • the embodiment of the present application provides a training technique for a semantic segmentation model.
  • the semantic segmentation model is trained based on the categories of at least two sub-images and the feature distance between the at least two sub-images.
  • a training apparatus for a semantic segmentation model which includes:
  • a segmentation unit configured to perform image semantic segmentation on at least one unlabeled image by using a semantic segmentation model to obtain a preliminary semantic segmentation result as a category of the unlabeled image
  • a sub-image extracting unit configured to obtain, based on the category of the at least one unlabeled image and the category of the at least one labeled image, sub-images corresponding to at least two images and, by means of the convolutional neural network, the features corresponding to those sub-images,
  • wherein the at least two images include at least one unlabeled image and at least one labeled image, and each of the at least two sub-images carries the category of its corresponding image;
  • a training unit configured to train the semantic segmentation model based on a category of at least two sub-images and a feature distance between the at least two sub-images.
  • an electronic device including a processor, the processor including a training device of a semantic segmentation model as described above.
  • an electronic device includes: a memory, configured to store executable instructions;
  • a processor for communicating with the memory to execute the executable instructions to perform the operations of the training method of the semantic segmentation model as described above.
  • a computer storage medium for storing computer readable instructions that, when executed, perform the operations of the training method of the semantic segmentation model as described above.
  • a computer program comprising computer readable code which, when run on a device, causes a processor in the device to execute instructions implementing each step of the training method of the semantic segmentation model described in any of the embodiments of the present application.
  • the semantic segmentation model is used to perform image semantic segmentation on the unlabeled image, so that a noisy category is obtained for the unlabeled image.
  • based on the category of the unlabeled image and the category of the labeled image, sub-images corresponding to at least two images are obtained, so that both labeled and unlabeled images are used in training, realizing self-supervised training; feature extraction is performed on the sub-images through the convolutional neural network, and the semantic segmentation model is trained based on the categories of at least two sub-images and the feature distance between them, so that a self-supervised model with strong semantic discrimination capability is obtained.
  • the trained semantic segmentation model can achieve higher accuracy in semantic segmentation.
  • FIG. 1 is a flow chart of an embodiment of a training method for a semantic segmentation model of the present application.
  • FIG. 2 is a schematic diagram showing an example of establishing a patch map of the training method of the semantic segmentation model of the present application.
  • FIG. 3 is another schematic diagram of establishing a patch map of the training method of the semantic segmentation model of the present application.
  • FIG. 4 is a schematic structural diagram of an embodiment of a training apparatus for a semantic segmentation model of the present application.
  • FIG. 5 is a schematic structural diagram of an embodiment of an electronic device according to the present application.
  • Embodiments of the present application can be applied to computer systems/servers that can operate with numerous other general purpose or special purpose computing system environments or configurations.
  • Examples of well-known computing systems, environments, and/or configurations suitable for use with computer systems/servers include, but are not limited to, personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, small computer systems, mainframe computer systems, and distributed cloud computing technology environments including any of the above, and the like.
  • the computer system/server can be described in the general context of computer system executable instructions (such as program modules) being executed by a computer system.
  • program modules may include routines, programs, target programs, components, logic, data structures, and the like that perform particular tasks or implement particular abstract data types.
  • the computer system/server can be implemented in a distributed cloud computing environment where tasks are performed by remote processing devices that are linked through a communication network.
  • program modules may be located on a local or remote computing system storage medium including storage devices.
  • FIG. 1 is a flow chart of an embodiment of a training method for a semantic segmentation model of the present application. As shown in Figure 1, the method of this embodiment includes:
  • Step 101 Perform semantic segmentation on at least one unlabeled image by using a semantic segmentation model to obtain a preliminary semantic segmentation result as a category of the unlabeled image.
  • the unlabeled image means that the category (for example, the semantic category) of some or all of the pixels in the image is uncertain.
  • image semantic segmentation may be performed on the unlabeled image by a known semantic segmentation model to obtain a semantic segmentation result with noise.
  • the step 101 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a segmentation unit 41 that is executed by the processor.
  • Step 102: Based on the category of the at least one unlabeled image and the category of the at least one labeled image, obtain sub-images corresponding to at least two images and, by means of the convolutional neural network, the features corresponding to those sub-images.
  • the at least two images include at least one unlabeled image and at least one labeled image, and at least two of the sub-images carry a category of the corresponding image.
  • a selection box of a set size is moved across the image; according to the categories of the pixels in the image, it is determined whether the pixels in the selection box belong to the same category. If the proportion of pixels in the selection box that belong to one category exceeds a set ratio, the contents of the selection box are output as a sub-image.
  • the step 102 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a sub-image extraction unit 42 that is executed by the processor.
  • Step 103 Train the semantic segmentation model based on the categories of the at least two sub-images and the feature distance between the at least two sub-images.
  • the step 103 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a training unit 43 executed by the processor.
  • Based on the training method of the semantic segmentation model provided by the above embodiments of the present application, the semantic segmentation model performs image semantic segmentation on the unlabeled image, so that the unlabeled image obtains a noisy category. Based on the category of the unlabeled image and the category of the labeled image, at least two sub-images corresponding to the images are obtained, so that both labeled and unlabeled images are used in training, realizing self-supervised training. Feature extraction is performed on the sub-images through the convolutional neural network, and the semantic segmentation model is trained; the resulting self-supervised model has strong semantic discrimination ability and can achieve higher accuracy in semantic segmentation.
  • Self-supervised learning is performed by using the unlabeled image itself to obtain an image descriptor.
  • the image descriptor is a high-dimensional vector that can be used to describe the semantic information of the image; these image descriptors are then used for semantic segmentation training.
  • step 103 includes:
  • the patch graph includes nodes and edges, the nodes include sub-images, and the edges include feature distances between any two sub-images;
  • the semantic segmentation model is trained such that, in the patch graph, the feature distance between two sub-images with the same category is smaller than the first preset value, and the feature distance between two sub-images with different categories is greater than the second preset value.
  • FIG. 2 is a schematic diagram of an example of establishing a patch map of the training method of the semantic segmentation model of the present application.
  • in FIG. 2, at least one sub-image is selected from the known-category image 21 by the selection box 211 and used as a node 221; the feature distance between sub-images having a connection relationship is taken as the edge 222 (the features are those selected by the selection box from the middle-layer feature map in FIG. 2).
  • the feature of a sub-image is the feature selected by the corresponding selection box in the feature map output by the output layer of the convolutional neural network; optionally, the output layer is any layer in the middle or deep layers of the convolutional neural network. Shallow features of an image generally represent edges, corners, and the like of objects in the image; middle-layer features generally characterize parts of objects (e.g., the wheel of a vehicle, the nose of a face); deep features generally represent category information of the image as a whole (e.g., person, car, horse). In order to build a graph from the sub-images and optimize the parameters, one of the middle or deep layers is selected as the output layer for both the labeled and unlabeled images.
  • the first preset value and the second preset value are preset, and the second preset value is generally greater than the first preset value; they are used so that the feature distance between sub-images of the same category is small and the feature distance between sub-images of different categories is large.
  • FIG. 3 is another schematic diagram of establishing a patch map of the training method of the semantic segmentation model of the present application.
  • the method of the embodiment comprises: based on a convolutional neural network (CNN in FIG. 3), and based on the category of at least one unlabeled image (which can be obtained from a known semantic segmentation model) and the category of at least one labeled image, obtaining sub-images corresponding to at least two images and the features corresponding to those sub-images (the features at the corresponding sub-image positions in the middle-layer feature map in FIG. 3); and establishing a patch graph according to the category relationships between the sub-images.
  • the patch graph includes nodes and edges (the circles in Fig. 3 represent nodes, and the lines connecting two circles represent edges); the nodes include sub-images, and the edges include feature distances between any two sub-images.
  • the patch map is established according to the class relationship between the sub-images, including:
  • a sub-image of the same category as the reference node is used as a positive correlation node, and a sub-image of a different category from the reference node is used as a negative correlation node; a positive correlation connection is established between the reference node and at least one positive correlation node, and a negative correlation connection is established between the reference node and at least one negative correlation node;
  • a map of sparse connections is formed by at least one reference node, a positive correlation node of the reference node, a negative correlation node of the reference node, a positive correlation connection, and a negative correlation connection.
  • the process of creating a patch graph is to randomly select a plurality of sub-images from the at least two sub-images as anchors; for each anchor, randomly select a sub-image of the same semantic category as a positive and a sub-image of a different semantic category as a negative; and then establish two connections per anchor: anchor-positive and anchor-negative. Based on these connections, a sparsely connected patch graph is established.
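The anchor/positive/negative sampling described above can be sketched as follows. This is a minimal illustration rather than the patent's implementation; `build_patch_graph` and its argument names are hypothetical, and sub-images are abstracted to (id, category) pairs.

```python
import random

def build_patch_graph(sub_images, num_anchors, seed=0):
    """Build a sparse patch graph as anchor-positive / anchor-negative edges.

    sub_images: list of (sub_image_id, category) pairs.
    Returns a list of (anchor_id, positive_id, negative_id) triplets.
    """
    rng = random.Random(seed)
    by_cat = {}
    for sid, cat in sub_images:
        by_cat.setdefault(cat, []).append(sid)
    triplets = []
    for aid, acat in rng.sample(sub_images, num_anchors):
        # positive: a different sub-image with the same semantic category
        pos_pool = [s for s in by_cat[acat] if s != aid]
        # negative: any sub-image with a different semantic category
        neg_pool = [s for c, ids in by_cat.items() if c != acat for s in ids]
        if not pos_pool or not neg_pool:
            continue  # this anchor cannot form both connections; skip it
        triplets.append((aid, rng.choice(pos_pool), rng.choice(neg_pool)))
    return triplets
```

Because each anchor contributes only two connections, the resulting graph is sparse rather than fully connected, which is what makes training on it tractable.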
  • the semantic segmentation model is trained, including:
  • the semantic segmentation model is trained by a gradient back propagation algorithm to minimize the error of the convolutional neural network, where the error is a triplet loss over the features of the corresponding sub-images obtained from the convolutional neural network.
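The triplet loss named here is the standard formulation; a minimal sketch over plain feature vectors, using Euclidean distance and an assumed `margin` parameter (the patent does not give the exact form, so this is illustrative):

```python
def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: pull same-category sub-image features together
    and push different-category features at least `margin` further apart."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)
```

The loss is zero exactly when the anchor-negative distance already exceeds the anchor-positive distance by the margin, which matches the first/second preset-value constraint described for the patch graph.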
  • the gradient back propagation algorithm is used to reduce the error of the convolutional neural network, so that the parameters from the first layer up to the output layer are optimized. The back propagation (BP) algorithm is a supervised learning algorithm suitable for multi-layer neural networks, and it is based on the gradient descent method.
  • the input-output relationship of a BP network is essentially a mapping: the function performed by a BP neural network with n inputs and m outputs is a continuous mapping from n-dimensional Euclidean space to a bounded region of m-dimensional Euclidean space, and this mapping is highly nonlinear.
  • the learning process of the BP algorithm consists of a forward propagation process and a back propagation process.
  • the input information is passed in through the input layer, processed layer by layer through the hidden layers, and transmitted to the output layer. If the desired output value is not obtained at the output layer, the sum of squared errors between the actual output and the expected output is taken as the objective function, and back propagation is performed: the partial derivative of the objective function with respect to each neuron weight is computed layer by layer, forming the gradient of the objective function with respect to the weight vector, which serves as the basis for modifying the weights.
  • the learning of the network is completed in this process of weight modification; when the error reaches the desired value, network learning ends.
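As a single-neuron numeric illustration of this forward/backward process (an assumption for clarity, not the network in the patent), consider a linear neuron y = w·x with squared error:

```python
def backprop_step(w, x, target, lr):
    """One forward + backward pass for a single linear neuron y = w * x
    trained with squared error (y - target) ** 2."""
    y = w * x                      # forward propagation
    error = (y - target) ** 2      # objective function (squared error)
    grad = 2 * (y - target) * x    # partial derivative d(error)/dw
    return w - lr * grad, error    # weight corrected along the negative gradient
```

Repeating this step shrinks the error toward the desired value, which is exactly the stopping condition described above.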
  • the edges in the patch graph are obtained from the feature distances between the sub-images output by the output layer, where the output layer is a layer selected from the middle or deep layers. Therefore, it is not the parameters of all layers of the convolutional neural network that are optimized, but the parameters from the first layer to the output layer; accordingly, in the error calculation, the errors from the output layer back down to the first layer are calculated.
  • the semantic segmentation model is trained by a gradient back propagation algorithm, including:
  • the error is calculated according to the distances between the features of the sub-images output by the convolutional neural network after the parameters are optimized, and this error is taken as the maximum error;
  • Iteratively: back-propagate the maximum error through the gradient to calculate the error of at least one layer in the convolutional neural network; calculate the gradient of the parameters of at least one layer according to the error of that layer, and correct the parameters of the corresponding layer in the convolutional neural network according to the gradient, until the maximum error is less than or equal to the preset value.
  • a loss function is first defined, and the convolutional neural network optimizes the network parameters by minimizing the loss function, as shown in equation (1):
  • the gradient backpropagation algorithm can optimize the parameters of each layer in the convolutional neural network.
  • the process of training the semantic segmentation model may include:
  • the parameters in the semantic segmentation model are initialized based on the parameters of the obtained convolutional neural network.
  • the parameters of the trained convolutional neural network have strong semantic class discrimination, and higher accuracy can be obtained in semantic segmentation.
  • the parameters of the convolutional neural network replace the parameters in the original semantic segmentation model, and the trained semantic segmentation model is obtained.
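Replacing the segmentation model's parameters with the trained network's parameters can be sketched as a name-matched assignment (the dict-of-parameters representation and the function name `init_from` are assumptions for illustration):

```python
def init_from(source_params, target_params):
    """Initialize target model parameters from source model parameters
    for every layer name the two models share; layers only present in
    the target keep their existing values."""
    updated = dict(target_params)
    for name, value in source_params.items():
        if name in updated:
            updated[name] = value
    return updated
```

The same sketch covers the reverse direction used before step 102, where the convolutional neural network is initialized from the semantic segmentation model's parameters.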
  • step 102 may include:
  • a selection box of a preset size is moved over the at least two images, and the pixels in the selection box are examined;
  • when the proportion of pixels of the same semantic category among the pixels in the selection box is greater than or equal to a preset value, the image in the selection box is output as a sub-image, and the sub-image is marked with that category;
  • the features corresponding to the sub-images are obtained by convolutional neural networks.
  • At least two images are segmented by a selection box of variable size, where the at least two images include an unlabeled image and a labeled image. When the pixels in the selection box belong to one category (e.g., one semantic category),
  • the selection box can be assigned that category, and the pixels in the selection box are output as a sub-image; the size of the selection box is adjustable.
  • the size of the selection box can be adjusted and the image re-segmented until a certain number of sub-images are obtained.
  • the step 102 may further include: discarding the selection box when the proportion of the pixels of the same category in the pixels in the selection box is less than a preset value.
  • If the pixel proportion corresponding to each of multiple categories is less than the preset value, the category of the selection box cannot be determined. In this case, the selection box is moved to the next position, and the judgment continues there. When a selection box of a set size does not yield any sub-image in an image, the size of the selection box is adjusted and the image is re-selected.
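The selection-box procedure described above (slide a box of a set size, keep it when one category's pixel proportion reaches the threshold, otherwise discard it and move on) can be sketched on a 2-D label map; `extract_sub_images` and its parameters are illustrative names, not from the patent:

```python
def extract_sub_images(labels, box, stride, ratio):
    """Slide a box-by-box selection box over a 2-D label map; when the
    dominant category covers at least `ratio` of the box, emit
    (row, col, category) for that box position; otherwise discard it."""
    h, w = len(labels), len(labels[0])
    out = []
    for r in range(0, h - box + 1, stride):
        for c in range(0, w - box + 1, stride):
            window = [labels[r + i][c + j] for i in range(box) for j in range(box)]
            top = max(set(window), key=window.count)  # dominant category
            if window.count(top) / len(window) >= ratio:
                out.append((r, c, top))
    return out
```

If a given box size yields no sub-images, the caller would re-run the function with a different `box` value, matching the re-selection step described above.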
  • the feature corresponding to the sub-image obtained by the convolutional neural network includes:
  • the features in the corresponding selection frame are obtained from the corresponding feature map, and the features corresponding to the sub-image are determined.
  • the feature of a sub-image is selected by a selection box of the same position and size in the feature map output by the output layer of the convolutional neural network; the feature distance between any two sub-images is then obtained from the features of the sub-images.
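Selecting the feature by "a selection box of the same position and size" on the output-layer feature map amounts to scaling the box coordinates by the layer's downsampling factor; a sketch under the assumption of an integer stride (the helper name and coordinate convention are illustrative):

```python
def box_to_feature_coords(box, feat_stride):
    """Map a selection box (r0, c0, r1, c1) in image coordinates onto the
    same-position box in the output-layer feature map, assuming that layer
    downsamples the image by an integer factor `feat_stride`.  The result
    is clamped so the mapped box covers at least one feature cell."""
    r0, c0, r1, c1 = box
    return (r0 // feat_stride, c0 // feat_stride,
            max(r0 // feat_stride + 1, r1 // feat_stride),
            max(c0 // feat_stride + 1, c1 // feat_stride))
```

The features inside the mapped box would then be pooled or flattened into the sub-image's descriptor before computing pairwise distances.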
  • the training method of the semantic segmentation model of the present application may further include, prior to step 102, initializing parameters of the convolutional neural network based on parameters of the semantic segmentation model.
  • parameters of the convolutional neural network are initialized using parameters of the semantic segmentation model.
  • the method may further include:
  • the semantic segmentation model is trained using the stochastic gradient descent method until the preset convergence condition is met.
  • the fine-tuning process may include: 1. Use a semantic segmentation model with a VGG-16 network structure. 2. Set the initial learning rate of the semantic segmentation model to 0.01, and reduce it by a factor of 10 every 30,000 iterations. 3. Use the stochastic gradient descent algorithm to fine-tune and optimize the semantic segmentation task, distributed over 8 GPUs. 4. Stochastic gradient descent algorithm: randomly select a batch of data (16 images in this example), input it into the network, run a forward pass, calculate the error between the result and the labeled result, and use back propagation to obtain the error of at least one layer.
  • The gradient of at least one layer of parameters is calculated according to the error of that layer, and the parameter values are corrected according to the gradient; the model converges during this process of continual correction. 5. Iterate until the model converges, around the 60,000th iteration. 6. Use this semantic segmentation model to test on existing public datasets.
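The learning-rate recipe in steps 2 and 5 (start at 0.01, divide by 10 every 30,000 iterations) corresponds to a step schedule, sketched as:

```python
def learning_rate(iteration, base_lr=0.01, drop_every=30000, factor=0.1):
    """Step learning-rate schedule from the fine-tuning recipe above:
    start at base_lr and multiply by `factor` every `drop_every` iterations."""
    return base_lr * factor ** (iteration // drop_every)
```

Under this schedule the rate is 0.01 for iterations 0-29,999, 0.001 up to 59,999, and 0.0001 at the ~60,000-iteration convergence point mentioned above.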
  • the method may further include:
  • the convolutional neural network is trained using a stochastic gradient descent method until a preset convergence condition is met.
  • the fine-tuning process may include: 1. Use a convolutional neural network with a VGG-16 network structure. 2. Set the initial learning rate of the convolutional neural network to 0.01, and reduce it by a factor of 10 every 30,000 iterations. 3. Use the stochastic gradient descent algorithm to fine-tune and optimize the semantic segmentation task, distributed over 8 GPUs. 4. Stochastic gradient descent algorithm: randomly select a batch of data (16 images in this example), input it into the network, run a forward pass, calculate the error between the result and the labeled result, and use back propagation to obtain the error of at least one layer.
  • The gradient of at least one layer of parameters is calculated based on the error of that layer, and the parameter values are corrected according to the gradient; the network converges during this process of continual correction. 5. Iterate until the network converges, around the 60,000th iteration. 6. Use this convolutional neural network to test on existing public data sets.
  • the foregoing program may be stored in a computer readable storage medium; when the program is executed, it performs the steps of the foregoing method embodiments. The foregoing storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disk.
  • FIG. 4 is a schematic structural diagram of an embodiment of a training apparatus for a semantic segmentation model of the present application.
  • the apparatus of this embodiment can be used to implement the various method embodiments described above. As shown in FIG. 4, the apparatus of this embodiment includes:
  • the segmentation unit 41 is configured to perform image semantic segmentation on at least one unlabeled image by using a semantic segmentation model to obtain a preliminary semantic segmentation result as a category of the unlabeled image.
  • the sub-image extracting unit 42 is configured to obtain, based on the category of the at least one unlabeled image and the category of the at least one labeled image, sub-images corresponding to at least two images and, by means of the convolutional neural network, the features corresponding to those sub-images,
  • wherein the at least two images include at least one unlabeled image and at least one labeled image, and each of the at least two sub-images carries the category of its corresponding image.
  • the training unit 43 is configured to train the semantic segmentation model based on the categories of the at least two sub-images and the feature distance between the at least two sub-images.
  • the semantic segmentation model performs image semantic segmentation on the unlabeled images, so that each unlabeled image obtains a (noisy) category; based on the categories of the unlabeled images and the categories of the labeled images, the sub-images corresponding to at least two images are obtained, so that both labeled and unlabeled images are applied to training, realizing self-supervised training; feature extraction is performed on the sub-images through the convolutional neural network.
  • the semantic segmentation model is trained on these features, and the training yields a self-supervised semantic segmentation model with strong semantic discrimination ability, which can achieve higher accuracy in semantic segmentation.
  • the training unit 43 includes:
  • a patch map establishing module configured to establish a patch map according to the category relationships between the sub-images, the patch map including nodes and edges, where a node includes a sub-image and an edge includes the feature distance between any two sub-images;
  • a model training module configured to train the semantic segmentation model so that, in the patch map, the feature distance between two sub-images of the same category is smaller than a first preset value, and the feature distance between two sub-images of different categories is greater than a second preset value.
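As a rough illustration of the patch map and the training target above, the following sketch builds the graph and checks the two distance thresholds; the feature vectors, categories, and threshold values are illustrative assumptions, not the patent's implementation:

```python
import itertools
import numpy as np

def build_patch_graph(features):
    """Build a patch map: each node is a sub-image feature vector, and
    each edge stores the feature distance between a pair of sub-images."""
    nodes = list(range(len(features)))
    edges = {}
    for i, j in itertools.combinations(nodes, 2):
        edges[(i, j)] = float(np.linalg.norm(features[i] - features[j]))
    return nodes, edges

def graph_satisfied(edges, labels, t_same=0.5, t_diff=1.0):
    """Check the training target: same-category pairs must be closer than
    the first preset value, different-category pairs farther than the
    second preset value."""
    for (i, j), dist in edges.items():
        if labels[i] == labels[j] and dist >= t_same:
            return False
        if labels[i] != labels[j] and dist <= t_diff:
            return False
    return True
```

During training the network parameters are adjusted until `graph_satisfied` would hold for the extracted features.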
  • the sub-image is used as a node, and the feature distance between sub-images having a connection relationship is used as an edge, where the connection relationship between sub-images is determined according to the categories corresponding to the sub-images;
  • the feature of a sub-image is the feature selected by the corresponding selection box from the feature map output by the output layer of the convolutional neural network; optionally, the output layer is any one of the middle or deep layers of the convolutional neural network. Shallow features of an image generally characterize information such as edges and corners of objects in the image, middle-layer features generally characterize part information of objects (for example, the wheels of a vehicle or the nose of a face), and deep features generally characterize the category information of the whole image (for example, person, car, horse);
  • in order to build a graph from the sub-images and optimize the parameters, one of the middle or deep layers is selected as the output layer for the labeled and unlabeled images, and repeated practice has shown that the optimization effect of middle-layer features is better than that of deep features;
  • the first preset value and the second preset value are set in advance, and usually the second preset value is greater than the first preset value; through the first and second preset values, the feature distance between two sub-images of the same category is made smaller, and the feature distance between two sub-images of different categories is made larger.
  • the patch map creation module includes:
  • a reference selection module configured to select at least one sub-image as a reference node
  • a connection relationship establishing module configured to, for each of at least one reference node: take sub-images of the same category as the reference node as positively correlated nodes, and sub-images of a different category from the reference node as negatively correlated nodes; establish a positive-correlation connection between the reference node and at least one of the positively correlated nodes, and establish a negative-correlation connection between the reference node and at least one of the negatively correlated nodes;
  • a connection graph establishing module configured to form a sparsely connected graph from at least one reference node, the positively correlated nodes of the reference node, the negatively correlated nodes of the reference node, the positive-correlation connections, and the negative-correlation connections.
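The anchor/positive/negative sampling that yields this sparsely connected graph can be sketched as follows; the function name, parameters, and random seed are illustrative assumptions:

```python
import random

def build_sparse_connections(labels, num_anchors=2, seed=0):
    """For each randomly chosen reference (anchor) node, connect it to one
    sub-image of the same category (positive) and one of a different
    category (negative), yielding a sparsely connected graph."""
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    connections = []
    for a in rng.sample(indices, num_anchors):
        positives = [i for i in indices if i != a and labels[i] == labels[a]]
        negatives = [i for i in indices if labels[i] != labels[a]]
        if not positives or not negatives:
            continue  # this anchor cannot form a complete triplet
        connections.append((a, rng.choice(positives), 'positive'))
        connections.append((a, rng.choice(negatives), 'negative'))
    return connections
```

Each anchor thus contributes exactly two edges (anchor-positive and anchor-negative), keeping the graph sparse regardless of the number of sub-images.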
  • the model training module includes:
  • the network training module is used to train the semantic segmentation model through the gradient back-propagation algorithm so as to minimize the error of the convolutional neural network, the error being the triplet loss of the features of the corresponding sub-images obtained from the convolutional neural network.
  • the network training module is specifically configured to:
  • the error is calculated from the distances between the sub-image features output by the convolutional neural network after the parameters are optimized, and this error is taken as the maximum error;
  • the following is executed iteratively: back-propagate the maximum error through the gradient to compute the error of at least one layer of the convolutional neural network; compute the gradient of at least one layer of parameters from the error of that layer, and correct the parameters of the corresponding layer of the convolutional neural network according to the gradient, until the maximum error is less than or equal to a preset value.
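The correct-until-below-threshold loop described above can be sketched generically; the objective function, gradient, learning rate, and tolerance below are illustrative assumptions standing in for the network's triplet-loss error and parameter updates:

```python
import numpy as np

def minimize_until(f, grad, x0, lr=0.1, tol=1e-3, max_iter=1000):
    """Repeatedly turn the current (maximum) error into a gradient-based
    parameter correction, stopping once the error is at or below the
    preset value `tol`."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        err = f(x)
        if err <= tol:
            break
        x = x - lr * grad(x)  # correct parameters along the gradient
    return x, f(x)
```

In the patent's setting, `f` corresponds to the triplet loss over the patch map and `x` to the parameters from the first layer up to the chosen output layer.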
  • the model training module further includes:
  • the segmentation model training module is configured to obtain parameters of the convolutional neural network based on the training result of the convolutional neural network; and initialize the parameters in the semantic segmentation model based on the obtained parameters of the convolutional neural network.
  • the sub-image extraction unit is configured to, in response to a selection box of a preset size moving over the at least two images, judge the pixels within the selection box; when the proportion of pixels of the same category among the pixels in the selection box is greater than or equal to a preset value,
  • the image in the selection box is output as a sub-image, and the sub-image is labeled with that category;
  • the features corresponding to the sub-images are then obtained through the convolutional neural network.
  • at least two images are segmented by a selection box of variable size, the at least two images including an unlabeled image and a labeled image; when the proportion of pixels in the selection box that belong to one category (e.g., a semantic category)
  • is greater than or equal to a preset value, the selection box may be assigned to that category, and the pixels in the selection box are output as one sub-image; the size of the selection box is adjustable:
  • when no sub-image is obtained from an image with a selection box of one size, the size of the selection box can be adjusted and the segmentation repeated until a certain number of sub-images are obtained.
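The sliding selection box described above can be sketched as follows, operating on a per-pixel category map; the box size, stride, and proportion threshold are illustrative assumptions:

```python
import numpy as np

def extract_sub_images(label_map, box_size, stride, min_ratio=0.75):
    """Slide a fixed-size selection box over a per-pixel category map;
    when at least `min_ratio` of the pixels inside the box share one
    category, emit (top, left, size, category) as a sub-image.
    Boxes without a dominant category are discarded."""
    h, w = label_map.shape
    patches = []
    for top in range(0, h - box_size + 1, stride):
        for left in range(0, w - box_size + 1, stride):
            window = label_map[top:top + box_size, left:left + box_size]
            values, counts = np.unique(window, return_counts=True)
            best = counts.argmax()
            if counts[best] / window.size >= min_ratio:
                patches.append((top, left, box_size, int(values[best])))
    return patches
```

If the returned list is empty for a given `box_size`, the caller can adjust the box size and re-run, matching the resegmentation behavior described above.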
  • the sub-image extraction unit is further configured to discard the selection box when the proportion of pixels of the same category among the pixels in the selection box is less than a preset value.
  • the sub-image extraction unit, when obtaining the features corresponding to the sub-images through the convolutional neural network, is configured to perform feature extraction on the unlabeled
  • images and the labeled images respectively through the convolutional neural network to obtain the feature maps corresponding to the unlabeled and labeled images; and, based on the position and size of the selection box corresponding to a sub-image, obtain the features within the corresponding selection box from the corresponding feature map, thereby determining the features corresponding to that sub-image.
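Cropping a sub-image's feature from the feature map by the selection box's position and size can be sketched as follows; the downsampling stride and the average pooling into a single descriptor are illustrative assumptions:

```python
import numpy as np

def crop_patch_feature(feature_map, box, stride=8):
    """Map an image-space selection box onto the (downsampled) feature map
    and pool the features inside it into one descriptor.
    feature_map: (C, H', W') array; box: (top, left, size) in image
    pixels; stride: downsampling factor between image and feature map."""
    top, left, size = box
    t, l = top // stride, left // stride
    s = max(1, size // stride)
    region = feature_map[:, t:t + s, l:l + s]
    return region.mean(axis=(1, 2))  # average-pool to a C-dim feature
```

The feature distance between two sub-images is then simply the distance between the descriptors returned here.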
  • the device of this embodiment further includes: a model fine-tuning unit configured to train the semantic segmentation model using the stochastic gradient descent method until a preset convergence condition is met.
  • the fine-tuning process may include: 1. Use a semantic segmentation model with the VGG-16 network structure. 2. Set the initial learning rate of the semantic segmentation model to 0.01, and divide it by 10 every 30,000 iterations. 3. Use the stochastic gradient descent algorithm to fine-tune and optimize the semantic segmentation task; this process uses 8 GPUs for distributed computation. 4. Stochastic gradient descent algorithm: randomly select a batch of data (16 images in this case), input it into the network, forward-propagate to obtain a result, compute the error between this result and the labeled result, and use back propagation to obtain the error of at least one layer.
  • the gradient of at least one layer of parameters is calculated based on the error of that layer, and the parameter values are corrected according to the gradient; the model converges through this continual correction. 5. Iterate until the model converges, around the 60,000th iteration. 6. Test the resulting semantic segmentation model on existing public data sets.
  • the device of this embodiment further includes: a network fine-tuning unit configured to train the convolutional neural network using the stochastic gradient descent method until a preset convergence condition is met.
  • the fine-tuning process may include: 1. Use a convolutional neural network with the VGG-16 network structure. 2. Set the initial learning rate of the convolutional neural network to 0.01, and divide it by 10 every 30,000 iterations. 3. Use the stochastic gradient descent algorithm to fine-tune and optimize the semantic segmentation task; this process uses 8 GPUs for distributed computation. 4. Stochastic gradient descent algorithm: randomly select a batch of data (16 images in this case), input it into the network, forward-propagate to obtain a result, compute the error between this result and the labeled result, and use back propagation to obtain the error of at least one layer.
  • the gradient of at least one layer of parameters is calculated based on the error of that layer, and the parameter values are corrected according to the gradient; the network converges through this continual correction. 5. Iterate until the network converges, around the 60,000th iteration. 6. Test the resulting convolutional neural network on existing public data sets.
  • an electronic device includes a processor, and the processor includes the training apparatus of the semantic segmentation model according to any of the embodiments of the present application.
  • an electronic device includes: a memory, configured to store executable instructions;
  • a processor for communicating with the memory to execute executable instructions to perform the operations of any of the embodiments of the training method of the semantic segmentation model of the present application.
  • a computer storage medium is provided for storing computer-readable instructions, where the instructions, when executed, perform the operations of any of the embodiments of the training method of the semantic segmentation model of the present application.
  • the embodiment of the present application further provides a computer program comprising computer-readable code; when the computer-readable code is run on a device, the processor in the device executes instructions for implementing the steps of the training method of the semantic segmentation model according to any embodiment of the present application.
  • the embodiment of the present application further provides an electronic device, which may be, for example, a mobile terminal, a personal computer (PC), a tablet computer, a server, or the like.
  • FIG. 5 a schematic structural diagram of an electronic device 500 suitable for implementing a terminal device or a server of an embodiment of the present application is shown.
  • the electronic device 500 includes one or more processors and a communication unit.
  • the one or more processors are, for example, one or more central processing units (CPUs) 501 and/or one or more graphics processing units (GPUs) 513, etc.; the processors may perform various appropriate actions and processes according to executable instructions stored in a read-only memory (ROM) 502 or executable instructions loaded into a random access memory (RAM) 503 from a storage portion 508.
  • the communication part 512 can include, but is not limited to, a network card, which can include, but is not limited to, an IB (Infiniband) network card.
  • the processor can communicate with the read-only memory 502 and/or the random access memory 503 to execute executable instructions, connect to the communication unit 512 via the bus 504, and communicate with other target devices via the communication unit 512, thereby completing the embodiments of the present application.
  • thereby completing operations corresponding to any of the methods provided by the embodiments of the present application, for example: performing image semantic segmentation on at least one unlabeled image through the semantic segmentation model to obtain a preliminary semantic segmentation result as the category of the unlabeled image; obtaining, through the convolutional neural network, the sub-images respectively corresponding to at least two images and the features corresponding to the sub-images, based on the category of at least one unlabeled image and the category of at least one labeled image, the at least two images including at least one unlabeled image and at least one labeled image, and the at least two sub-images carrying the categories of their corresponding images; and training the semantic segmentation model based on the categories of the at least two sub-images and the feature distances between the at least two sub-images.
  • in addition, various programs and data required for the operation of the device can be stored in the RAM 503.
  • the CPU 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504.
  • ROM 502 is an optional module.
  • the RAM 503 stores executable instructions, or writes executable instructions to the ROM 502 at runtime, and the executable instructions cause the central processing unit 501 to perform operations corresponding to the above-described communication methods.
  • An input/output (I/O) interface 505 is also coupled to bus 504.
  • the communication unit 512 may be integrated or may be provided with a plurality of sub-modules (for example, a plurality of IB network cards) and on the bus link.
  • the following components are connected to the I/O interface 505: an input portion 506 including a keyboard, a mouse, and the like; an output portion 507 including, for example, a cathode ray tube (CRT), a liquid crystal display (LCD), and a speaker; a storage portion 508 including a hard disk and the like; and a communication portion 509 including a network interface card such as a LAN card or a modem. The communication portion 509 performs communication processing via a network such as the Internet.
  • Driver 510 is also coupled to I/O interface 505 as needed.
  • a removable medium 511 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory or the like is mounted on the drive 510 as needed so that a computer program read therefrom is installed into the storage portion 508 as needed.
  • FIG. 5 is only an optional implementation manner.
  • the number and type of the components in FIG. 5 may be selected, deleted, added, or replaced according to actual needs; different functional components may also be implemented in separate or integrated configurations.
  • the GPU 513 and the CPU 501 may be separately configured or the GPU 513 may be integrated on the CPU 501.
  • the communication unit may be separately configured, or may be integrated on the CPU 501 or the GPU 513, and so on. These alternative implementations all fall within the scope of protection disclosed by the present application.
  • an embodiment of the present application includes a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program containing program code for executing the method illustrated in the flowchart, the program code including instructions corresponding to the method steps provided by the embodiments of the present application, for example: performing image semantic segmentation on at least one unlabeled image through a semantic segmentation model to obtain a preliminary semantic segmentation result as the category of the unlabeled image; obtaining, through a convolutional neural network, the sub-images respectively corresponding to at least two images and the features corresponding to the sub-images, based on the category of at least one unlabeled image and the category of at least one labeled image, the at least two images including at least one unlabeled image and at least one labeled image, and the at least two sub-images carrying the categories of their corresponding images; and training the semantic segmentation model based on the categories of the at least two sub-images and the feature distances between the at least two sub-images.
  • the computer program can be downloaded and installed from the network via the communication portion 509, and/or installed from the removable medium 511.
  • when the computer program is executed by the central processing unit (CPU) 501, the above-described functions defined in the method of the present application are performed.
  • the methods and apparatus of the present application may be implemented in a number of ways.
  • the methods and apparatus of the present application can be implemented in software, hardware, firmware, or any combination of software, hardware, and firmware.
  • the above-described sequence of steps for the method is for illustrative purposes only, and the steps of the method of the present application are not limited to the order specifically described above unless otherwise specifically stated.
  • the present application can also be implemented as a program recorded in a recording medium, the programs including machine readable instructions for implementing the method according to the present application.
  • the present application also covers a recording medium storing a program for executing the method according to the present application.


Abstract

Embodiments of the present application disclose a training method and apparatus for a semantic segmentation model, an electronic device, and a storage medium. The method includes: performing image semantic segmentation on at least one unlabeled image through a semantic segmentation model to obtain a preliminary semantic segmentation result as the category of the unlabeled image; obtaining, through a convolutional neural network, sub-images respectively corresponding to at least two images and features corresponding to the sub-images, based on the category of at least one of the unlabeled images and the category of at least one labeled image, the at least two images including at least one of the unlabeled images and at least one of the labeled images, the at least two sub-images carrying the categories of their corresponding images; and training the semantic segmentation model based on the categories of the at least two sub-images and the feature distances between the at least two sub-images. The semantic segmentation model trained according to the above embodiments of the present application can achieve high accuracy in semantic segmentation.

Description

Training method and apparatus for semantic segmentation model, electronic device, storage medium
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on August 1, 2017, with application number CN201710648545.7 and invention title "Training method and apparatus for semantic segmentation model, electronic device, storage medium", the entire contents of which are incorporated into this application by reference.
Technical Field
Embodiments of the present application relate to computer vision technology, and in particular to a training method and apparatus for a semantic segmentation model, an electronic device, and a storage medium.
Background
Image semantic segmentation assigns, for every pixel of an input image, a corresponding judgment label at the output, indicating what object or category the pixel most likely belongs to. It is an important task in the field of computer vision, with applications including machine scene understanding and video analysis.
Summary
Embodiments of the present application provide a training technique for a semantic segmentation model.
A training method for a semantic segmentation model provided by an embodiment of the present application includes:
performing image semantic segmentation on at least one unlabeled image through a semantic segmentation model to obtain a preliminary semantic segmentation result as the category of the unlabeled image;
obtaining, through a convolutional neural network, sub-images respectively corresponding to at least two images and features corresponding to the sub-images, based on the category of at least one of the unlabeled images and the category of at least one labeled image, the at least two images including at least one of the unlabeled images and at least one of the labeled images, the at least two sub-images carrying the categories of their corresponding images;
training the semantic segmentation model based on the categories of the at least two sub-images and the feature distances between the at least two sub-images.
According to another aspect of the embodiments of the present application, a training apparatus for a semantic segmentation model is provided, including:
a segmentation unit configured to perform image semantic segmentation on at least one unlabeled image through a semantic segmentation model to obtain a preliminary semantic segmentation result as the category of the unlabeled image;
a sub-image extraction unit configured to obtain, through a convolutional neural network, sub-images respectively corresponding to at least two images and features corresponding to the sub-images, based on the category of at least one of the unlabeled images and the category of at least one labeled image, the at least two images including at least one of the unlabeled images and at least one of the labeled images, the at least two sub-images carrying the categories of their corresponding images;
a training unit configured to train the semantic segmentation model based on the categories of the at least two sub-images and the feature distances between the at least two sub-images.
According to yet another aspect of the embodiments of the present application, an electronic device is provided, including a processor, where the processor includes the training apparatus of the semantic segmentation model as described above.
According to still another aspect of the embodiments of the present application, an electronic device is provided, including: a memory configured to store executable instructions;
and a processor configured to communicate with the memory to execute the executable instructions so as to complete the operations of the training method of the semantic segmentation model as described above.
According to a further aspect of the embodiments of the present application, a computer storage medium is provided for storing computer-readable instructions, where the instructions, when executed, perform the operations of the training method of the semantic segmentation model as described above.
According to a further aspect of the embodiments of the present application, a computer program is provided, including computer-readable code; when the computer-readable code is run on a device, the processor in the device executes instructions for implementing the steps of the training method of the semantic segmentation model according to any embodiment of the present application.
Based on the training method and apparatus for a semantic segmentation model, the electronic device, and the storage medium provided by the above embodiments of the present application, image semantic segmentation is performed on unlabeled images through the semantic segmentation model, so that each unlabeled image obtains a noisy category; based on the categories of the unlabeled images and the categories of the labeled images, sub-images respectively corresponding to at least two images are obtained, so that both labeled and unlabeled images are applied to training, realizing self-supervised training. Feature extraction is performed on the sub-images through the convolutional neural network, and the semantic segmentation model is trained based on the categories of the at least two sub-images and the feature distances between them, yielding a self-supervised semantic segmentation model with strong semantic discrimination ability, which can achieve higher accuracy in semantic segmentation.
The technical solution of the present application is described in further detail below with reference to the accompanying drawings and embodiments.
Brief Description of the Drawings
The accompanying drawings, which constitute a part of the specification, describe embodiments of the present application and, together with the description, serve to explain the principles of the present application.
With reference to the accompanying drawings, the present application can be understood more clearly from the following detailed description, in which:
FIG. 1 is a flowchart of an embodiment of the training method of the semantic segmentation model of the present application.
FIG. 2 is a schematic diagram of one example of establishing a patch graph in the training method of the semantic segmentation model of the present application.
FIG. 3 is a schematic diagram of another example of establishing a patch graph in the training method of the semantic segmentation model of the present application.
FIG. 4 is a schematic structural diagram of an embodiment of the training apparatus of the semantic segmentation model of the present application.
FIG. 5 is a schematic structural diagram of an embodiment of the electronic device of the present application.
Detailed Description
Various exemplary embodiments of the present application will now be described in detail with reference to the accompanying drawings. It should be noted that, unless otherwise specifically stated, the relative arrangement of components and steps, numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present application.
At the same time, it should be understood that, for ease of description, the dimensions of the various parts shown in the drawings are not drawn according to actual proportional relationships.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the present application or its application or use.
Techniques, methods, and devices known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate, such techniques, methods, and devices should be regarded as part of the specification.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further discussed in subsequent drawings.
Embodiments of the present application can be applied to a computer system/server, which can operate together with numerous other general-purpose or special-purpose computing system environments or configurations. Examples of well-known computing systems, environments, and/or configurations suitable for use with the computer system/server include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network personal computers, minicomputer systems, mainframe computer systems, distributed cloud computing technology environments including any of the above systems, and so on.
The computer system/server can be described in the general context of computer-system-executable instructions (such as program modules) executed by a computer system. Generally, program modules may include routines, programs, target programs, components, logic, data structures, and so on, which perform specific tasks or implement specific abstract data types. The computer system/server can be implemented in a distributed cloud computing environment, where tasks are performed by remote processing devices linked through a communication network. In a distributed cloud computing environment, program modules may be located on local or remote computing system storage media including storage devices.
FIG. 1 is a flowchart of an embodiment of the training method of the semantic segmentation model of the present application. As shown in FIG. 1, the method of this embodiment includes:
Step 101: performing image semantic segmentation on at least one unlabeled image through a semantic segmentation model to obtain a preliminary semantic segmentation result as the category of the unlabeled image.
Here, an unlabeled image means that the categories (for example, semantic categories) of some or all of the pixels in the image are undetermined. In this embodiment, by way of example, image semantic segmentation may be performed on the unlabeled image through a known semantic segmentation model to obtain a noisy semantic segmentation result.
In an optional example, step 101 may be executed by a processor invoking corresponding instructions stored in a memory, or may be executed by the segmentation unit 41 run by the processor.
Step 102: obtaining, through a convolutional neural network, sub-images respectively corresponding to at least two images and features corresponding to the sub-images, based on the category of at least one unlabeled image and the category of at least one labeled image.
Here, the at least two images include at least one unlabeled image and at least one labeled image, and the at least two sub-images carry the categories of their corresponding images. Optionally, a selection box of settable size is moved over an image, and whether the pixels in the selection box belong to the same category is judged according to the categories of the pixels in the image; when the pixels in a selection box exceeding a set proportion all belong to the same category, the selection box can be output as one sub-image.
In an optional example, step 102 may be executed by a processor invoking corresponding instructions stored in a memory, or may be executed by the sub-image extraction unit 42 run by the processor.
Step 103: training the semantic segmentation model based on the categories of the at least two sub-images and the feature distances between the at least two sub-images.
In an optional example, step 103 may be executed by a processor invoking corresponding instructions stored in a memory, or may be executed by the training unit 43 run by the processor.
Based on the training method for a semantic segmentation model provided by the above embodiment of the present application, image semantic segmentation is performed on unlabeled images through the semantic segmentation model, so that each unlabeled image obtains a noisy category; based on the categories of the unlabeled images and the categories of the labeled images, sub-images respectively corresponding to at least two images are obtained, so that both labeled and unlabeled images are applied to training, realizing self-supervised training. Feature extraction is performed on the sub-images through the convolutional neural network, and the semantic segmentation model is trained based on the categories of the at least two sub-images and the feature distances between them, yielding a self-supervised semantic segmentation model with strong semantic discrimination ability, which can achieve higher accuracy in semantic segmentation.
Self-supervised learning trains on the unlabeled images themselves to obtain image descriptors, which are high-dimensional vectors that can be used to describe the semantic information of an image; these image descriptors are then used for semantic segmentation training.
In another embodiment of the training method of the semantic segmentation model of the present application, on the basis of the above embodiment, step 103 includes:
establishing a patch graph according to the category relationships between the sub-images, the patch graph including nodes and edges, where a node includes a sub-image and an edge includes the feature distance between any two sub-images;
training the semantic segmentation model so that, in the patch graph, the feature distance between two sub-images of the same category is smaller than a first preset value, and the feature distance between two sub-images of different categories is greater than a second preset value.
In this embodiment, FIG. 2 is a schematic diagram of one example of establishing a patch graph in the training method of the semantic segmentation model of the present application. As shown in FIG. 2, in order to establish a patch graph 22, the nodes 221 must first be determined. In this embodiment, the sub-images serve as the nodes 221: in any image 21 of known category, at least one sub-image is selected through a selection box 211, and the feature distance between sub-images that have a connection relationship serves as an edge 222 (in FIG. 2, the features in the selection boxes chosen from the middle-layer features are the features of the sub-images), where the connection relationship between sub-images is determined according to the categories corresponding to the sub-images. The feature of a sub-image is the feature selected by the corresponding selection box from the feature map output by the output layer of the convolutional neural network; optionally, the output layer is any one of the middle or deep layers of the convolutional neural network. One of the middle or deep layers is selected as the output layer: shallow features of an image generally characterize information such as edges and corners of objects in the image, middle-layer features generally characterize part information of objects (for example, the wheels of a vehicle or the nose of a face), and deep features generally characterize the category information of the whole image (for example, person, car, horse). In order to build a graph from the sub-images and optimize the parameters, one of the middle or deep layers is selected as the output layer for the labeled and unlabeled images, and repeated practice has shown that the optimization effect of middle-layer features is better than that of deep features. The first preset value and the second preset value are set in advance, and usually the second preset value is greater than the first preset value; through the first and second preset values, the feature distance between two sub-images of the same category is made smaller, and the feature distance between two sub-images of different categories is made larger.
FIG. 3 is a schematic diagram of another example of establishing a patch graph in the training method of the semantic segmentation model of the present application. The method of this example includes: obtaining, through a convolutional neural network (CNN in FIG. 3), sub-images respectively corresponding to at least two images and the features corresponding to the sub-images (in FIG. 3, the features at the positions of the corresponding sub-images in the middle-layer features), based on the category of at least one unlabeled image (which may be obtained through a known semantic segmentation model) and the category of at least one labeled image; and establishing a patch graph according to the category relationships between the sub-images, the patch graph including nodes and edges (in the patch graph of FIG. 3, circles represent nodes and the lines connecting two circles represent edges), where a node includes a sub-image and an edge includes the feature distance between any two sub-images.
In an optional example of the above embodiments of the training method of the semantic segmentation model of the present application, establishing the patch graph according to the category relationships between the sub-images includes:
selecting at least one sub-image as a reference node, and for each of the at least one reference node:
taking sub-images of the same category as the reference node as positively correlated nodes and sub-images of a different category from the reference node as negatively correlated nodes, establishing a positive-correlation connection between the reference node and at least one positively correlated node, and establishing a negative-correlation connection between the reference node and at least one negatively correlated node;
forming a sparsely connected graph from the at least one reference node, the positively correlated nodes of the reference node, the negatively correlated nodes of the reference node, the positive-correlation connections, and the negative-correlation connections.
In this embodiment, the process of establishing the patch graph is to randomly select several sub-images from the at least two sub-images and take each randomly selected sub-image as an anchor; based on semantic category, one sub-image of the same category as the anchor is randomly selected as a positive, and one sub-image of a different semantic category from the anchor is randomly selected as a negative. Two connections are thus established on the basis of one sub-image: anchor-positive and anchor-negative; based on these connections, a sparsely connected patch graph is established.
In an optional example of the above embodiments of the training method of the semantic segmentation model of the present application, training the semantic segmentation model includes:
training the semantic segmentation model through the gradient back-propagation algorithm so as to minimize the error of the convolutional neural network, the error being the triplet loss of the features of the corresponding sub-images obtained from the convolutional neural network.
In this embodiment, the error in the convolutional neural network is reduced through the gradient back-propagation algorithm, so that the parameters of at least one of the layers from the first layer to the output layer of the convolutional neural network are optimized. The gradient back-propagation algorithm (BP, Back Propagation) is a supervised learning algorithm suitable for multi-layer neural networks, built on the basis of gradient descent. The input-output relationship of a BP network is essentially a mapping: the function completed by a BP neural network with n inputs and m outputs is a continuous mapping from n-dimensional Euclidean space to a finite domain in m-dimensional Euclidean space, and this mapping is highly nonlinear. The learning process of the BP algorithm consists of a forward-propagation process and a back-propagation process. In forward propagation, the input information passes through the input layer and the hidden layers, is processed layer by layer, and is transmitted to the output layer. If the expected output value is not obtained at the output layer, the sum of the squares of the errors between the output and the expectation is taken as the objective function, and the process switches to back propagation, computing layer by layer the partial derivatives of the objective function with respect to the weights of each neuron to form the gradient of the objective function with respect to the weight vector, which serves as the basis for modifying the weights; the learning of the network is completed in the weight-modification process. When the error reaches the expected value, network learning ends.
The edges in the patch graph are obtained from the feature distances between the sub-images output by the output layer, where the output layer is a layer selected from the middle or deep layers. Therefore, what is optimized is not the parameters of all the layers of the convolutional neural network but the parameters from the first layer to this output layer; accordingly, in the error computation, the error of at least one of the layers from the output layer back to the first layer is likewise computed.
In an optional example of the above embodiments of the training method of the semantic segmentation model of the present application, training the semantic segmentation model through the gradient back-propagation algorithm includes:
computing the maximum error through a loss function according to the distances between the features of the sub-images in the established patch graph;
back-propagating the maximum error through the gradient to compute the error of at least one layer of the convolutional neural network;
computing the gradient of at least one layer of parameters from the error of that layer, and correcting the parameters of the corresponding layer of the convolutional neural network according to the gradient;
computing the error from the distances between the sub-images output by the convolutional neural network after the parameters have been optimized, and taking this error as the maximum error;
iteratively executing the steps of back-propagating the maximum error through the gradient to compute the error of at least one layer of the convolutional neural network, computing the gradient of at least one layer of parameters from the error of that layer, and correcting the parameters of the corresponding layer of the convolutional neural network according to the gradient, until the maximum error is less than or equal to a preset value.
In this embodiment, a loss function is first defined, and the convolutional neural network optimizes the network parameters by minimizing this loss function. The loss function is given by formula (1):

L = Σ max(0, ||f_a − f_p||² − ||f_a − f_n||² + m)    (1)

where ||f_a − f_p||² denotes the distance between the anchor and the positive in the patch graph established from the sub-images, ||f_a − f_n||² denotes the distance between the anchor and the negative in the patch graph established from the sub-images, and m denotes a constant. This formula is obtained on the basis of the triplet loss formula in the prior art; with the computed error, the optimization of the parameters of each layer of the convolutional neural network can be realized in combination with the gradient back-propagation algorithm.
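This triplet loss can be checked numerically with a minimal sketch; the feature values and the margin m below are illustrative assumptions:

```python
import numpy as np

def triplet_loss(f_a, f_p, f_n, m=0.2):
    """max(0, ||f_a - f_p||^2 - ||f_a - f_n||^2 + m): the loss is zero
    once the anchor-negative distance exceeds the anchor-positive
    distance by at least the margin m."""
    d_ap = np.sum((f_a - f_p) ** 2)
    d_an = np.sum((f_a - f_n) ** 2)
    return float(max(0.0, d_ap - d_an + m))
```

Minimizing this quantity pulls same-category sub-image features together and pushes different-category features apart, which is exactly the patch-graph training target.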
In an optional example of the above embodiments of the training method of the semantic segmentation model of the present application, the process of training the semantic segmentation model may include:
obtaining the parameters of the convolutional neural network based on the training result of the convolutional neural network;
initializing the parameters in the semantic segmentation model based on the obtained parameters of the convolutional neural network.
In this embodiment, since the semantic segmentation model is also a convolutional neural network, the parameters of the trained convolutional neural network have strong semantic-category discrimination and can achieve higher accuracy in semantic segmentation; replacing the parameters of the original semantic segmentation model with the parameters of this convolutional neural network yields the trained semantic segmentation model.
In yet another embodiment of the training method of the semantic segmentation model of the present application, on the basis of the above embodiments, step 102 may include:
in response to a selection box of a preset size moving over the at least two images, judging the pixels within the selection box; when the proportion of pixels of the same semantic category among the pixels in the selection box is greater than or equal to a preset value, outputting the image in the selection box as one sub-image and labeling the sub-image with that category;
obtaining the features corresponding to the sub-images through the convolutional neural network.
In this embodiment, at least two images are segmented through a selection box of variable size, where the at least two images include unlabeled images and labeled images. When the proportion of pixels in the selection box belonging to one category (for example, a semantic category) is greater than or equal to the preset value, the selection box may be assigned to that category, and the pixels in the selection box are output as one sub-image. The size of the selection box is adjustable: when no sub-image is obtained from an image with a selection box of one size, the size of the selection box can be adjusted and the segmentation repeated until a certain number of sub-images are obtained.
In an optional example of the above embodiments of the training method of the semantic segmentation model of the present application, step 102 may further include: discarding the selection box when the proportion of pixels of the same category among the pixels in the selection box is less than the preset value.
In this example, a selection box of a set size needs to move pixel by pixel over an image to avoid missing candidate sub-images. When multiple categories exist in a selection box but the pixel proportions corresponding to these categories are all less than the preset value, the category of the selection box cannot be determined; the selection box then needs to be moved to the next position, where the judgment continues. When no sub-image is obtained from an image with a selection box of a set size, the size of the selection box needs to be adjusted and the image re-selected.
In an optional example of the above embodiments of the training method of the semantic segmentation model of the present application, obtaining the features corresponding to the sub-images through the convolutional neural network includes:
performing feature extraction on the unlabeled images and the labeled images respectively through the convolutional neural network to obtain feature maps corresponding to the unlabeled and labeled images;
based on the position and size of the selection box corresponding to a sub-image, obtaining the features within the corresponding selection box from the corresponding feature map, thereby determining the features corresponding to that sub-image.
In this embodiment, by obtaining the position and size of a sub-image's selection box, the features of the corresponding sub-image are selected from the feature map of the output layer of the convolutional neural network through a selection box of the same position and size, and the feature distance between any two sub-images is then obtained from the features of the sub-images.
In an optional example of the above embodiments of the training method of the semantic segmentation model of the present application, the method may further include, before step 102, initializing the parameters of the convolutional neural network based on the parameters of the semantic segmentation model.
Illustratively, in order to obtain more accurate features, the parameters of the convolutional neural network are initialized with the parameters of the semantic segmentation model.
In still another embodiment of the training method of the semantic segmentation model of the present application, on the basis of the above embodiments, before step 101, the method may further include:
training the semantic segmentation model using the stochastic gradient descent method until a preset convergence condition is met.
This embodiment implements fine-tuning of the semantic segmentation model. Optionally, the fine-tuning process may include: 1. Use a semantic segmentation model with the VGG-16 network structure. 2. Set the initial learning rate of the semantic segmentation model to 0.01, and divide it by 10 every 30,000 iterations. 3. Use the stochastic gradient descent algorithm to fine-tune and optimize the semantic segmentation task; this process uses 8 GPUs for distributed computation. 4. Stochastic gradient descent algorithm: randomly select a batch of data (16 images in this case), input it into the network, forward-propagate to obtain a result, compute the error between this result and the labeled result, and use back propagation to obtain the error of at least one layer. Compute the gradient of at least one layer of parameters from the error of that layer, and correct the parameter values according to the gradient; the model converges through this continual correction. 5. Iterate until the model converges, around the 60,000th iteration. 6. Test this semantic segmentation model on existing public data sets.
In a further embodiment of the training method of the semantic segmentation model of the present application, on the basis of the above embodiments, before step 102, the method may further include:
training the convolutional neural network using the stochastic gradient descent method until a preset convergence condition is met.
This embodiment implements fine-tuning of the convolutional neural network. Optionally, the fine-tuning process may include: 1. Use a convolutional neural network with the VGG-16 network structure. 2. Set the initial learning rate of the convolutional neural network to 0.01, and divide it by 10 every 30,000 iterations. 3. Use the stochastic gradient descent algorithm to fine-tune and optimize the semantic segmentation task; this process uses 8 GPUs for distributed computation. 4. Stochastic gradient descent algorithm: randomly select a batch of data (16 images in this case), input it into the network, forward-propagate to obtain a result, compute the error between this result and the labeled result, and use back propagation to obtain the error of at least one layer. Compute the gradient of at least one layer of parameters from the error of that layer, and correct the parameter values according to the gradient; the network converges through this continual correction. 5. Iterate until the network converges, around the 60,000th iteration. 6. Test this convolutional neural network on existing public data sets.
Those of ordinary skill in the art can understand that all or part of the steps for implementing the above method embodiments can be completed by hardware related to program instructions; the foregoing program may be stored in a computer-readable storage medium, and when the program is executed, the steps of the above method embodiments are performed; the foregoing storage medium includes various media that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disk.
FIG. 4 is a schematic structural diagram of an embodiment of the training apparatus of the semantic segmentation model of the present application. The apparatus of this embodiment can be used to implement the above method embodiments of the present application. As shown in FIG. 4, the apparatus of this embodiment includes:
a segmentation unit 41, configured to perform image semantic segmentation on at least one unlabeled image through a semantic segmentation model to obtain a preliminary semantic segmentation result as the category of the unlabeled image;
a sub-image extraction unit 42, configured to obtain, through a convolutional neural network, sub-images respectively corresponding to at least two images and features corresponding to the sub-images, based on the category of at least one unlabeled image and the category of at least one labeled image;
where the at least two images include at least one unlabeled image and at least one labeled image, and the at least two sub-images carry the categories of their corresponding images;
a training unit 43, configured to train the semantic segmentation model based on the categories of the at least two sub-images and the feature distances between the at least two sub-images.
Based on the training apparatus for a semantic segmentation model provided by the above embodiment of the present application, image semantic segmentation is performed on unlabeled images through the semantic segmentation model, so that each unlabeled image obtains a noisy category; based on the categories of the unlabeled images and the categories of the labeled images, sub-images respectively corresponding to at least two images are obtained, so that both labeled and unlabeled images are applied to training, realizing self-supervised training. Feature extraction is performed on the sub-images through the convolutional neural network, and the semantic segmentation model is trained based on the categories of the at least two sub-images and the feature distances between them, yielding a self-supervised semantic segmentation model with strong semantic discrimination ability, which can achieve higher accuracy in semantic segmentation.
In another embodiment of the training apparatus of the semantic segmentation model of the present application, on the basis of the above embodiment, the training unit 43 includes:
a patch graph establishing module, configured to establish a patch graph according to the category relationships between the sub-images, the patch graph including nodes and edges, where a node includes a sub-image and an edge includes the feature distance between any two sub-images;
a model training module, configured to train the semantic segmentation model so that, in the patch graph, the feature distance between two sub-images of the same category is smaller than a first preset value, and the feature distance between two sub-images of different categories is greater than a second preset value.
In this embodiment, in order to establish the patch graph, the nodes must first be determined. In this embodiment, the sub-images serve as the nodes, and the feature distance between sub-images that have a connection relationship serves as an edge, where the connection relationship between sub-images is determined according to the categories corresponding to the sub-images. The feature of a sub-image is the feature selected by the corresponding selection box from the feature map output by the output layer of the convolutional neural network; optionally, the output layer is any one of the middle or deep layers of the convolutional neural network. One of the middle or deep layers is selected as the output layer: shallow features of an image generally characterize information such as edges and corners of objects in the image, middle-layer features generally characterize part information of objects (for example, the wheels of a vehicle or the nose of a face), and deep features generally characterize the category information of the whole image (for example, person, car, horse). In order to build a graph from the sub-images and optimize the parameters, one of the middle or deep layers is selected as the output layer for the labeled and unlabeled images, and repeated practice has shown that the optimization effect of middle-layer features is better than that of deep features. The first preset value and the second preset value are set in advance, and usually the second preset value is greater than the first preset value; through the first and second preset values, the feature distance between two sub-images of the same category is made smaller, and the feature distance between two sub-images of different categories is made larger.
In an optional example of the above embodiments of the training apparatus of the semantic segmentation model of the present application, the patch graph establishing module includes:
a reference selection module, configured to select at least one sub-image as a reference node;
a connection relationship establishing module, configured to, for each of the at least one reference node: take sub-images of the same category as the reference node as positively correlated nodes and sub-images of a different category from the reference node as negatively correlated nodes, establish a positive-correlation connection between the reference node and at least one positively correlated node, and establish a negative-correlation connection between the reference node and at least one negatively correlated node;
a connection graph establishing module, configured to form a sparsely connected graph from the at least one reference node, the positively correlated nodes of the reference node, the negatively correlated nodes of the reference node, the positive-correlation connections, and the negative-correlation connections.
In an optional example of the above embodiments of the training apparatus of the semantic segmentation model of the present application, the model training module includes:
a network training module, configured to train the semantic segmentation model through the gradient back-propagation algorithm so as to minimize the error of the convolutional neural network, the error being the triplet loss of the features of the corresponding sub-images obtained from the convolutional neural network.
In an optional example of the above embodiments of the training apparatus of the semantic segmentation model of the present application, the network training module is specifically configured to:
compute the maximum error through a loss function according to the feature distances between the sub-images in the established patch graph;
back-propagate the maximum error through the gradient to compute the error of at least one layer of the convolutional neural network;
compute the gradient of at least one layer of parameters from the error of that layer, and correct the parameters of the corresponding layer of the convolutional neural network according to the gradient;
compute the error from the distances between the sub-images output by the convolutional neural network after the parameters have been optimized, and take this error as the maximum error;
iteratively execute the steps of back-propagating the maximum error through the gradient to compute the error of at least one layer of the convolutional neural network, computing the gradient of at least one layer of parameters from the error of that layer, and correcting the parameters of the corresponding layer of the convolutional neural network according to the gradient, until the maximum error is less than or equal to a preset value.
In an optional example of the above embodiments of the training apparatus of the semantic segmentation model of the present application, the model training module further includes:
a segmentation model training module, configured to obtain the parameters of the convolutional neural network based on the training result of the convolutional neural network, and to initialize the parameters in the semantic segmentation model based on the obtained parameters of the convolutional neural network.
In yet another embodiment of the training apparatus of the semantic segmentation model of the present application, on the basis of the above embodiments, the sub-image extraction unit is configured to, in response to a selection box of a preset size moving over the at least two images, judge the pixels within the selection box; when the proportion of pixels of the same category among the pixels in the selection box is greater than or equal to a preset value, output the image in the selection box as one sub-image and label the sub-image with that category; and obtain the features corresponding to the sub-images through the convolutional neural network.
In this embodiment, at least two images are segmented through a selection box of variable size, where the at least two images include unlabeled images and labeled images. When the proportion of pixels in the selection box belonging to one category (for example, a semantic category) is greater than or equal to the preset value, the selection box may be assigned to that category, and the pixels in the selection box are output as one sub-image. The size of the selection box is adjustable: when no sub-image is obtained from an image with a selection box of one size, the size of the selection box can be adjusted and the segmentation repeated until a certain number of sub-images are obtained.
In an optional example of the above embodiments of the training apparatus of the semantic segmentation model of the present application, the sub-image extraction unit is further configured to discard the selection box when the proportion of pixels of the same category among the pixels in the selection box is less than the preset value.
In an optional example of the above embodiments of the training apparatus of the semantic segmentation model of the present application, the sub-image extraction unit, when obtaining the features corresponding to the sub-images through the convolutional neural network, is configured to perform feature extraction on the unlabeled images and the labeled images respectively through the convolutional neural network to obtain feature maps corresponding to the unlabeled and labeled images; and, based on the position and size of the selection box corresponding to a sub-image, obtain the features within the corresponding selection box from the feature map corresponding to the labeled image, thereby determining the features corresponding to that sub-image.
In still another embodiment of the training apparatus of the semantic segmentation model of the present application, on the basis of the above embodiments, the apparatus of this embodiment further includes: a model fine-tuning unit, configured to train the semantic segmentation model using the stochastic gradient descent method until a preset convergence condition is met.
This embodiment implements fine-tuning of the semantic segmentation model. Optionally, the fine-tuning process may include: 1. Use a semantic segmentation model with the VGG-16 network structure. 2. Set the initial learning rate of the semantic segmentation model to 0.01, and divide it by 10 every 30,000 iterations. 3. Use the stochastic gradient descent algorithm to fine-tune and optimize the semantic segmentation task; this process uses 8 GPUs for distributed computation. 4. Stochastic gradient descent algorithm: randomly select a batch of data (16 images in this case), input it into the network, forward-propagate to obtain a result, compute the error between this result and the labeled result, and use back propagation to obtain the error of at least one layer. Compute the gradient of at least one layer of parameters from the error of that layer, and correct the parameter values according to the gradient; the model converges through this continual correction. 5. Iterate until the model converges, around the 60,000th iteration. 6. Test this semantic segmentation model on existing public data sets.
In a further embodiment of the training apparatus of the semantic segmentation model of the present application, on the basis of the above embodiments, the apparatus of this embodiment further includes: a network fine-tuning unit, configured to train the convolutional neural network using the stochastic gradient descent method until a preset convergence condition is met.
This embodiment implements fine-tuning of the convolutional neural network. Optionally, the fine-tuning process may include: 1. Use a convolutional neural network with the VGG-16 network structure. 2. Set the initial learning rate of the convolutional neural network to 0.01, and divide it by 10 every 30,000 iterations. 3. Use the stochastic gradient descent algorithm to fine-tune and optimize the semantic segmentation task; this process uses 8 GPUs for distributed computation. 4. Stochastic gradient descent algorithm: randomly select a batch of data (16 images in this case), input it into the network, forward-propagate to obtain a result, compute the error between this result and the labeled result, and use back propagation to obtain the error of at least one layer. Compute the gradient of at least one layer of parameters from the error of that layer, and correct the parameter values according to the gradient; the network converges through this continual correction. 5. Iterate until the network converges, around the 60,000th iteration. 6. Test this convolutional neural network on existing public data sets.
根据本申请实施例的一个方面,提供的一种电子设备,包括处理器,处理器包括本申请语义分割模型的训练装置各实施例中的任意一项。
根据本申请实施例的一个方面,提供的一种电子设备,包括:存储器,用于存储可执行指令;
以及处理器,用于与存储器通信以执行可执行指令从而完成本申请语义分割模型的训练方法各实施例中的任意一项的操作。
根据本申请实施例的一个方面,提供的一种计算机存储介质,用于存储计算机可读取的指令,其特征在于,所述指令被执行时执行本申请语义分割模型的训练方法各实施例中的任意一项的操作。
本申请实施例还提供了一种计算机程序,包括计算机可读代码,当所述计算机可读代码在设备上运行时,所述设备中的处理器执行用于实现本申请任一实施例所述的语义分割模型的训练方法中各步骤的指令。
本申请实施例还提供了一种电子设备,例如可以是移动终端、个人计算机(PC)、平板电脑、服务器等。下面参考图5,其示出了适于用来实现本申请实施例的终端设备或服务器的电子设备500的结构示意图:如图5所示,电子设备500包括一个或多个处理器、通信部等,所述一个或多个处理器例如:一个或多个中央处理单元(CPU)501,和/或一个或多个图像处理器(GPU)513等,处理器可以根据存储在只读存储器(ROM)502中的可执行指令或者从存储部分508加载到随机访问存储器(RAM)503中的可执行指令而执行各种适当的动作和处理。通信部512可包括但不限于网卡,所述网卡可包括但不限于IB(Infiniband)网卡,
处理器可与只读存储器502和/或随机访问存储器503中通信以执行可执行指令,通过总线504与通信部512相连、并经通信部512与其他目标设备通信,从而完成本申请实 施例提供的任一项方法对应的操作,例如,通过语义分割模型,对至少一个未标注图像进行图像语义分割,得到初步语义分割结果,作为所述未标注图像的类别;通过卷积神经网络,基于至少一个未标注图像的类别,及至少一个已标注图像的类别,得到至少两个图像分别对应的子图像及子图像对应的特征,至少两个图像包括至少一个未标注图像及至少一个已标注图像,至少两个子图像携带有对应图像的类别;基于至少两个子图像的类别,及至少两个子图像之间的特征距离,训练语义分割模型。
此外,在RAM 503中,还可存储有装置操作所需的各种程序和数据。CPU501、ROM502以及RAM503通过总线504彼此相连。在有RAM503的情况下,ROM502为可选模块。RAM503存储可执行指令,或在运行时向ROM502中写入可执行指令,可执行指令使中央处理单元501执行上述通信方法对应的操作。输入/输出(I/O)接口505也连接至总线504。通信部512可以集成设置,也可以设置为具有多个子模块(例如多个IB网卡),并在总线链接上。
以下部件连接至I/O接口505:包括键盘、鼠标等的输入部分506;包括诸如阴极射线管(CRT)、液晶显示器(LCD)等以及扬声器等的输出部分507;包括硬盘等的存储部分508;以及包括诸如LAN卡、调制解调器等的网络接口卡的通信部分509。通信部分509经由诸如因特网的网络执行通信处理。驱动器510也根据需要连接至I/O接口505。可拆卸介质511,诸如磁盘、光盘、磁光盘、半导体存储器等等,根据需要安装在驱动器510上,以便于从其上读出的计算机程序根据需要被安装入存储部分508。
It should be noted that the architecture shown in FIG. 5 is merely an optional implementation. In specific practice, the number and types of the components in FIG. 5 may be selected, reduced, increased, or replaced according to actual requirements. Separate or integrated arrangements may also be adopted for different functional components; for example, the GPU 513 and the CPU 501 may be arranged separately, or the GPU 513 may be integrated on the CPU 501, and the communication part may be arranged separately or integrated on the CPU 501 or the GPU 513, and so on. These alternative implementations all fall within the protection scope disclosed by the present application.
In particular, according to the embodiments of the present application, the process described above with reference to the flowchart may be implemented as a computer software program. For example, an embodiment of the present application includes a computer program product, which includes a computer program tangibly embodied on a machine-readable medium; the computer program includes program code for performing the method shown in the flowchart, and the program code may include instructions corresponding to the method steps provided by the embodiments of the present application, for example: performing image semantic segmentation on at least one unlabeled image by means of a semantic segmentation model to obtain a preliminary semantic segmentation result as the category of the unlabeled image; obtaining, by means of a convolutional neural network and based on the category of the at least one unlabeled image and the category of at least one labeled image, sub-images respectively corresponding to at least two images and features corresponding to the sub-images, where the at least two images include the at least one unlabeled image and the at least one labeled image, and the at least two sub-images carry the categories of the corresponding images; and training the semantic segmentation model based on the categories of the at least two sub-images and the feature distances between the at least two sub-images. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 509 and/or installed from the removable medium 511. When the computer program is executed by the central processing unit (CPU) 501, the functions defined in the method of the present application are performed.
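The sub-image extraction step referred to above (a selection box of a preset size moving over an image, with the box kept as a sub-image only when one category accounts for a sufficient proportion of its pixels) can be sketched as follows. This is a toy illustration under assumed values; the window size, stride, and purity threshold are not the patented parameters.

```python
import numpy as np

# Sketch of sub-image extraction: slide a fixed-size selection box over
# a per-pixel category map and keep a window as a sub-image only if a
# single category covers enough of its pixels; otherwise discard the box.

def extract_subimages(category_map, box=4, stride=4, purity=0.75):
    """Return (row, col, category) for each kept selection box."""
    kept = []
    h, w = category_map.shape
    for r in range(0, h - box + 1, stride):
        for c in range(0, w - box + 1, stride):
            window = category_map[r:r + box, c:c + box]
            cats, counts = np.unique(window, return_counts=True)
            top = counts.argmax()
            # Keep the box only if the dominant category is pure enough.
            if counts[top] / window.size >= purity:
                kept.append((r, c, int(cats[top])))
    return kept

# Toy 8x8 category map: left half is category 0, right half is category 1.
cmap = np.zeros((8, 8), dtype=int)
cmap[:, 4:] = 1
print(extract_subimages(cmap))  # four pure boxes, labeled with their category
```

Each kept triple records the box position together with the category label the sub-image carries into the later feature-distance training.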
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts among the embodiments, reference may be made to one another. Since the system embodiments basically correspond to the method embodiments, their description is relatively brief, and for relevant parts, reference may be made to the description of the method embodiments.
The methods and apparatuses of the present application may be implemented in many ways, for example, by software, hardware, firmware, or any combination of software, hardware, and firmware. The above order of the steps of the methods is merely for illustration, and the steps of the methods of the present application are not limited to the order specifically described above, unless otherwise specifically stated. In addition, in some embodiments, the present application may also be implemented as programs recorded on a recording medium, the programs including machine-readable instructions for implementing the methods according to the present application. Thus, the present application also covers recording media storing programs for executing the methods according to the present application.
The description of the present application is given for the sake of illustration and description, and is not exhaustive or intended to limit the present application to the forms disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiments were chosen and described in order to better explain the principles and practical applications of the present application, and to enable those of ordinary skill in the art to understand the present application and thereby design various embodiments, with various modifications, suited to particular uses.
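The core training objective described in this application — making the feature distance between same-category sub-images small and the distance between different-category sub-images large — can be illustrated with a minimal triplet-loss computation. The margin value and the toy embeddings below are assumptions for illustration, not the patented configuration.

```python
import numpy as np

# Minimal sketch of the feature-distance objective: for a reference
# sub-image, a same-category sub-image, and a different-category
# sub-image, the triplet loss is zero once the positive pair is closer
# than the negative pair by at least the margin.

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Distance-based triplet loss over sub-image feature vectors."""
    d_pos = np.linalg.norm(anchor - positive)   # same-category distance
    d_neg = np.linalg.norm(anchor - negative)   # different-category distance
    return max(0.0, d_pos - d_neg + margin)

# Toy 3-D features of three sub-images.
a = np.array([0.0, 0.0, 0.0])   # reference node
p = np.array([0.1, 0.0, 0.0])   # positively correlated node (same category)
n = np.array([2.0, 0.0, 0.0])   # negatively correlated node (different category)

print(triplet_loss(a, p, n))  # 0.0: positive pair already satisfies the margin
```

Minimizing this loss over the positive and negative connections of each reference node in the patch graph drives same-category features together and different-category features apart, which is the effect the training step above describes.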

Claims (26)

  1. A training method for a semantic segmentation model, comprising:
    performing image semantic segmentation on at least one unlabeled image by means of a semantic segmentation model to obtain a preliminary semantic segmentation result as the category of the unlabeled image;
    obtaining, by means of a convolutional neural network and based on the category of the at least one unlabeled image and the category of at least one labeled image, sub-images respectively corresponding to at least two images and features corresponding to the sub-images, wherein the at least two images comprise the at least one unlabeled image and the at least one labeled image, and the at least two sub-images carry the categories of the corresponding images;
    training the semantic segmentation model based on the categories of the at least two sub-images and the feature distances between the at least two sub-images.
  2. The method according to claim 1, wherein training the semantic segmentation model based on the categories of the at least two sub-images and the feature distances between the at least two sub-images comprises:
    establishing a patch graph according to the category relationships between the sub-images, the patch graph comprising nodes and edges, wherein the nodes comprise the sub-images and the edges comprise the feature distance between any two of the sub-images;
    training the semantic segmentation model so that, in the patch graph, the feature distance between two sub-images of the same category is less than a first preset value, and the feature distance between two sub-images of different categories is greater than a second preset value.
  3. The method according to claim 2, wherein establishing a patch graph according to the category relationships between the sub-images comprises:
    selecting at least one sub-image as a reference node, and for each of the at least one reference node:
    taking sub-images of the same category as the reference node as positively correlated nodes and sub-images of a category different from the reference node as negatively correlated nodes, establishing positive correlation connections between the reference node and at least one of the positively correlated nodes respectively, and establishing negative correlation connections between the reference node and at least one of the negatively correlated nodes respectively;
    forming the sparsely connected patch graph from the at least one reference node, the positively correlated nodes of the reference node, the negatively correlated nodes of the reference node, the positive correlation connections, and the negative correlation connections.
  4. The method according to claim 2 or 3, wherein training the semantic segmentation model comprises:
    training the semantic segmentation model by means of a gradient back-propagation algorithm so as to minimize the error of the convolutional neural network, the error being a triplet loss over the features of the corresponding sub-images obtained on the basis of the convolutional neural network.
  5. The method according to claim 4, wherein training the semantic segmentation model by means of a gradient back-propagation algorithm comprises:
    calculating a maximum error through a loss function according to the feature distances between the sub-images in the established patch graph; back-propagating the maximum error through gradients to calculate the error of at least one layer in the convolutional neural network;
    calculating the gradients of the parameters of the at least one layer according to the error of the at least one layer, and correcting the parameters of the corresponding layer in the convolutional neural network according to the gradients;
    calculating an error according to the distances between the sub-images output by the convolutional neural network with the optimized parameters, and taking this error as the maximum error;
    iteratively performing the back-propagation of the maximum error through gradients to calculate the error of at least one layer in the convolutional neural network;
    and calculating the gradients of the parameters of the at least one layer according to the error of the at least one layer and correcting the parameters of the corresponding layer in the convolutional neural network according to the gradients, until the maximum error is less than or equal to a preset value.
  6. The method according to any one of claims 4-5, wherein training the semantic segmentation model comprises:
    obtaining the parameters of the convolutional neural network based on a training result of the convolutional neural network;
    initializing the parameters in the semantic segmentation model based on the obtained parameters of the convolutional neural network.
  7. The method according to any one of claims 1-6, wherein obtaining, by means of a convolutional neural network and based on the category of the at least one unlabeled image and the category of at least one labeled image, sub-images respectively corresponding to at least two images and features corresponding to the sub-images comprises:
    in response to a selection box of a preset size moving over the at least two images, judging the pixels within the selection box, and when the proportion of pixels of the same category among the pixels within the selection box is greater than or equal to a preset value, outputting the image within the selection box as a sub-image and labeling the sub-image with the category;
    obtaining the features corresponding to the sub-image by means of the convolutional neural network.
  8. The method according to claim 7, further comprising: when the proportion of pixels of the same category among the pixels within the selection box is less than the preset value, discarding the selection box.
  9. The method according to claim 7 or 8, wherein obtaining the features corresponding to the sub-image by means of the convolutional neural network comprises:
    performing feature extraction on the unlabeled image and the labeled image respectively by means of the convolutional neural network to obtain feature maps corresponding to the unlabeled image and the labeled image;
    obtaining, based on the position and size of the selection box corresponding to the sub-image, the features within the corresponding selection box from the corresponding feature map, and determining the features corresponding to the sub-image.
  10. The method according to any one of claims 1-9, before performing image semantic segmentation on at least one unlabeled image by means of the semantic segmentation model, further comprising:
    training the semantic segmentation model using stochastic gradient descent until a preset convergence condition is satisfied.
  11. The method according to any one of claims 1-10, before obtaining, by means of the convolutional neural network and based on the category of the at least one unlabeled image and the category of at least one labeled image, the sub-images respectively corresponding to the at least two images and the features corresponding to the sub-images, further comprising:
    training the convolutional neural network using stochastic gradient descent until a preset convergence condition is satisfied.
  12. A training apparatus for a semantic segmentation model, comprising:
    a segmentation unit configured to perform image semantic segmentation on at least one unlabeled image by means of a semantic segmentation model to obtain a preliminary semantic segmentation result as the category of the unlabeled image;
    a sub-image extraction unit configured to obtain, by means of a convolutional neural network and based on the category of the at least one unlabeled image and the category of at least one labeled image, sub-images respectively corresponding to at least two images and features corresponding to the sub-images, wherein the at least two images comprise the at least one unlabeled image and the at least one labeled image, and the at least two sub-images carry the categories of the corresponding images;
    a training unit configured to train the semantic segmentation model based on the categories of the at least two sub-images and the feature distances between the at least two sub-images.
  13. The apparatus according to claim 12, wherein the training unit comprises:
    a patch graph establishing module configured to establish a patch graph according to the category relationships between the sub-images, the patch graph comprising nodes and edges, wherein the nodes comprise the sub-images and the edges comprise the feature distance between any two of the sub-images;
    a model training module configured to train the semantic segmentation model so that, in the patch graph, the feature distance between two sub-images of the same category is less than a first preset value, and the feature distance between two sub-images of different categories is greater than a second preset value.
  14. The apparatus according to claim 13, wherein the patch graph establishing module comprises:
    a reference selection module configured to select at least one sub-image as a reference node;
    a connection relationship establishing module configured to, for each of the at least one reference node: take sub-images of the same category as the reference node as positively correlated nodes and sub-images of a category different from the reference node as negatively correlated nodes, establish positive correlation connections between the reference node and at least one of the positively correlated nodes respectively, and establish negative correlation connections between the reference node and at least one of the negatively correlated nodes respectively;
    a connection graph establishing module configured to form the sparsely connected patch graph from the at least one reference node, the positively correlated nodes of the reference node, the negatively correlated nodes of the reference node, the positive correlation connections, and the negative correlation connections.
  15. The apparatus according to any one of claims 13-14, wherein the model training module comprises:
    a network training module configured to train the semantic segmentation model by means of a gradient back-propagation algorithm so as to minimize the error of the convolutional neural network, the error being a triplet loss over the features of the corresponding sub-images obtained on the basis of the convolutional neural network.
  16. The apparatus according to claim 15, wherein the network training module is specifically configured to:
    calculate a maximum error through a loss function according to the feature distances between the sub-images in the established patch graph;
    back-propagate the maximum error through gradients to calculate the error of at least one layer in the convolutional neural network;
    calculate the gradients of the parameters of the at least one layer according to the error of the at least one layer, and correct the parameters of the corresponding layer in the convolutional neural network according to the gradients;
    calculate an error according to the distances between the sub-images output by the convolutional neural network with the optimized parameters, and take this error as the maximum error;
    iteratively perform the back-propagation of the maximum error through gradients to calculate the error of at least one layer in the convolutional neural network, calculate the gradients of the parameters of the at least one layer according to the error of the at least one layer, and correct the parameters of the corresponding layer in the convolutional neural network according to the gradients, until the maximum error is less than or equal to a preset value.
  17. The apparatus according to any one of claims 15-16, wherein the model training module further comprises:
    a segmentation model training module configured to obtain the parameters of the convolutional neural network based on a training result of the convolutional neural network, and to initialize the parameters in the semantic segmentation model based on the obtained parameters of the convolutional neural network.
  18. The apparatus according to any one of claims 12-17, wherein the sub-image extraction unit is configured to: in response to a selection box of a preset size moving over the at least two images, judge the pixels within the selection box; when the proportion of pixels of the same category among the pixels within the selection box is greater than or equal to a preset value, output the image within the selection box as a sub-image and label the sub-image with the category; and obtain the features corresponding to the sub-image by means of the convolutional neural network.
  19. The apparatus according to claim 18, wherein the sub-image extraction unit is further configured to discard the selection box when the proportion of pixels of the same category among the pixels within the selection box is less than the preset value.
  20. The apparatus according to claim 18 or 19, wherein, when obtaining the features corresponding to the sub-image by means of the convolutional neural network, the sub-image extraction unit is configured to perform feature extraction on the unlabeled image and the labeled image respectively by means of the convolutional neural network to obtain feature maps corresponding to the unlabeled image and the labeled image; and to obtain, based on the position and size of the selection box corresponding to the sub-image, the features within the corresponding selection box from the feature map corresponding to the labeled image, and determine the features corresponding to the sub-image.
  21. The apparatus according to any one of claims 12-20, further comprising: a model fine-tuning unit configured to train the semantic segmentation model using stochastic gradient descent until a preset convergence condition is satisfied.
  22. The apparatus according to any one of claims 12-21, further comprising: a network fine-tuning unit configured to train the convolutional neural network using stochastic gradient descent until a preset convergence condition is satisfied.
  23. An electronic device, comprising a processor, wherein the processor comprises the training apparatus for a semantic segmentation model according to any one of claims 12 to 22.
  24. An electronic device, comprising: a memory configured to store executable instructions;
    and a processor configured to communicate with the memory to execute the executable instructions so as to perform the operations of the training method for a semantic segmentation model according to any one of claims 1 to 11.
  25. A computer storage medium for storing computer-readable instructions, wherein the instructions, when executed, perform the operations of the training method for a semantic segmentation model according to any one of claims 1 to 11.
  26. A computer program, comprising computer-readable code, wherein when the computer-readable code runs on a device, a processor in the device executes instructions for implementing the steps of the training method for a semantic segmentation model according to any one of claims 1 to 11.
PCT/CN2018/097549 2017-08-01 2018-07-27 Training method and apparatus for semantic segmentation model, electronic device, and storage medium WO2019024808A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
KR1020197038767A KR102358554B1 (ko) 2017-08-01 2018-07-27 시맨틱 분할 모델을 위한 훈련 방법 및 장치, 전자 기기, 저장 매체
JP2019571272A JP6807471B2 (ja) 2017-08-01 2018-07-27 セマンティックセグメンテーションモデルの訓練方法および装置、電子機器、ならびに記憶媒体
SG11201913365WA SG11201913365WA (en) 2017-08-01 2018-07-27 Semantic segmentation model trainingmethods and apparatuses, electronic devices, and storage media
US16/726,880 US11301719B2 (en) 2017-08-01 2019-12-25 Semantic segmentation model training methods and apparatuses, electronic devices, and storage media

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710648545.7 2017-08-01
CN201710648545.7A CN108229479B (zh) 2017-08-01 2017-08-01 语义分割模型的训练方法和装置、电子设备、存储介质

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/726,880 Continuation US11301719B2 (en) 2017-08-01 2019-12-25 Semantic segmentation model training methods and apparatuses, electronic devices, and storage media

Publications (1)

Publication Number Publication Date
WO2019024808A1 true WO2019024808A1 (zh) 2019-02-07

Family

ID=62654687

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/097549 WO2019024808A1 (zh) 2017-08-01 2018-07-27 语义分割模型的训练方法和装置、电子设备、存储介质

Country Status (6)

Country Link
US (1) US11301719B2 (zh)
JP (1) JP6807471B2 (zh)
KR (1) KR102358554B1 (zh)
CN (1) CN108229479B (zh)
SG (1) SG11201913365WA (zh)
WO (1) WO2019024808A1 (zh)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110781895A (zh) * 2019-10-10 2020-02-11 湖北工业大学 一种基于卷积神经网络的图像语义分割方法
CN111062252A (zh) * 2019-11-15 2020-04-24 浙江大华技术股份有限公司 一种实时危险物品语义分割方法、装置及存储装置
CN111553362A (zh) * 2019-04-01 2020-08-18 上海卫莎网络科技有限公司 一种视频处理方法、电子设备和计算机可读存储介质
CN111612802A (zh) * 2020-04-29 2020-09-01 杭州电子科技大学 一种基于现有图像语义分割模型的再优化训练方法及应用
CN111783779A (zh) * 2019-09-17 2020-10-16 北京沃东天骏信息技术有限公司 图像处理方法、装置和计算机可读存储介质
CN111814805A (zh) * 2020-06-18 2020-10-23 浙江大华技术股份有限公司 特征提取网络训练方法以及相关方法和装置
CN111833291A (zh) * 2019-04-22 2020-10-27 上海汽车集团股份有限公司 一种语义分割训练集人工标注评价方法及装置
CN113159057A (zh) * 2021-04-01 2021-07-23 湖北工业大学 一种图像语义分割方法和计算机设备
CN113450311A (zh) * 2021-06-01 2021-09-28 国网河南省电力公司漯河供电公司 基于语义分割和空间关系的带销螺丝缺陷检测方法及系统
CN113792742A (zh) * 2021-09-17 2021-12-14 北京百度网讯科技有限公司 遥感图像的语义分割方法和语义分割模型的训练方法
CN114693934A (zh) * 2022-04-13 2022-07-01 北京百度网讯科技有限公司 语义分割模型的训练方法、视频语义分割方法及装置
CN116883673A (zh) * 2023-09-08 2023-10-13 腾讯科技(深圳)有限公司 语义分割模型训练方法、装置、设备及存储介质

Families Citing this family (56)

Publication number Priority date Publication date Assignee Title
CN108229479B (zh) * 2017-08-01 2019-12-31 北京市商汤科技开发有限公司 语义分割模型的训练方法和装置、电子设备、存储介质
US10755142B2 (en) * 2017-09-05 2020-08-25 Cognizant Technology Solutions U.S. Corporation Automated and unsupervised generation of real-world training data
CN110012210B (zh) * 2018-01-05 2020-09-22 Oppo广东移动通信有限公司 拍照方法、装置、存储介质及电子设备
CN110622213B (zh) * 2018-02-09 2022-11-15 百度时代网络技术(北京)有限公司 利用3d语义地图进行深度定位和分段的系统和方法
CN109101878B (zh) * 2018-07-01 2020-09-29 浙江工业大学 一种用于秸秆燃值估计的图像分析系统及图像分析方法
CN109084955A (zh) * 2018-07-02 2018-12-25 北京百度网讯科技有限公司 显示屏质量检测方法、装置、电子设备及存储介质
CN109190631A (zh) * 2018-08-31 2019-01-11 阿里巴巴集团控股有限公司 图片的目标对象标注方法及装置
CN109087708B (zh) * 2018-09-20 2021-08-31 深圳先进技术研究院 用于斑块分割的模型训练方法、装置、设备及存储介质
JP6695947B2 (ja) * 2018-09-21 2020-05-20 ソニーセミコンダクタソリューションズ株式会社 固体撮像システム、画像処理方法及びプログラム
CN109241951A (zh) * 2018-10-26 2019-01-18 北京陌上花科技有限公司 色情图片识别方法、识别模型构建方法及识别模型和计算机可读存储介质
CN109583328B (zh) * 2018-11-13 2021-09-03 东南大学 一种嵌入稀疏连接的深度卷积神经网络字符识别方法
CN109859209B (zh) * 2019-01-08 2023-10-17 平安科技(深圳)有限公司 遥感影像分割方法、装置及存储介质、服务器
CN109886272B (zh) * 2019-02-25 2020-10-30 腾讯科技(深圳)有限公司 点云分割方法、装置、计算机可读存储介质和计算机设备
CN111626313B (zh) * 2019-02-28 2023-06-02 银河水滴科技(北京)有限公司 一种特征提取模型训练方法、图像处理方法及装置
US11580673B1 (en) * 2019-06-04 2023-02-14 Duke University Methods, systems, and computer readable media for mask embedding for realistic high-resolution image synthesis
US11023783B2 (en) * 2019-09-11 2021-06-01 International Business Machines Corporation Network architecture search with global optimization
US10943353B1 (en) 2019-09-11 2021-03-09 International Business Machines Corporation Handling untrainable conditions in a network architecture search
KR20210061839A (ko) * 2019-11-20 2021-05-28 삼성전자주식회사 전자 장치 및 그 제어 방법
US11080833B2 (en) * 2019-11-22 2021-08-03 Adobe Inc. Image manipulation using deep learning techniques in a patch matching operation
KR102198480B1 (ko) * 2020-02-28 2021-01-05 연세대학교 산학협력단 재귀 그래프 모델링을 통한 비디오 요약 생성 장치 및 방법
CN113496277A (zh) 2020-04-03 2021-10-12 三星电子株式会社 用于检索图像的神经网络装置及其操作方法
CN111401474B (zh) * 2020-04-13 2023-09-08 Oppo广东移动通信有限公司 视频分类模型的训练方法、装置、设备及存储介质
CN111489366B (zh) * 2020-04-15 2024-06-11 上海商汤临港智能科技有限公司 神经网络的训练、图像语义分割方法及装置
CN111652285A (zh) * 2020-05-09 2020-09-11 济南浪潮高新科技投资发展有限公司 一种茶饼类别识别方法、设备及介质
CN111611420B (zh) * 2020-05-26 2024-01-23 北京字节跳动网络技术有限公司 用于生成图像描述信息的方法和装置
CN111710009B (zh) * 2020-05-29 2023-06-23 北京百度网讯科技有限公司 人流密度的生成方法、装置、电子设备以及存储介质
CN111667483B (zh) * 2020-07-03 2022-08-30 腾讯科技(深圳)有限公司 多模态图像的分割模型的训练方法、图像处理方法和装置
CN111898696B (zh) * 2020-08-10 2023-10-27 腾讯云计算(长沙)有限责任公司 伪标签及标签预测模型的生成方法、装置、介质及设备
CN111931782B (zh) * 2020-08-12 2024-03-01 中国科学院上海微系统与信息技术研究所 语义分割方法、系统、介质及装置
CN112016599B (zh) * 2020-08-13 2023-09-15 驭势科技(浙江)有限公司 用于图像检索的神经网络训练方法、装置及电子设备
CN112085739B (zh) * 2020-08-20 2024-05-24 深圳力维智联技术有限公司 基于弱监督的语义分割模型的训练方法、装置及设备
US11694301B2 (en) 2020-09-30 2023-07-04 Alibaba Group Holding Limited Learning model architecture for image data semantic segmentation
US20220147761A1 (en) * 2020-11-10 2022-05-12 Nec Laboratories America, Inc. Video domain adaptation via contrastive learning
CN112613515A (zh) * 2020-11-23 2021-04-06 上海眼控科技股份有限公司 语义分割方法、装置、计算机设备和存储介质
CN112559552B (zh) * 2020-12-03 2023-07-25 北京百度网讯科技有限公司 数据对生成方法、装置、电子设备及存储介质
CN112668509B (zh) * 2020-12-31 2024-04-02 深圳云天励飞技术股份有限公司 社交关系识别模型的训练方法、识别方法及相关设备
CN112861911B (zh) * 2021-01-10 2024-05-28 西北工业大学 一种基于深度特征选择融合的rgb-d语义分割方法
CN112862792B (zh) * 2021-02-21 2024-04-05 北京工业大学 一种用于小样本图像数据集的小麦白粉病孢子分割方法
CN112686898B (zh) * 2021-03-15 2021-08-13 四川大学 一种基于自监督学习的放疗靶区自动分割方法
CN113011430B (zh) * 2021-03-23 2023-01-20 中国科学院自动化研究所 大规模点云语义分割方法及系统
CN113177926B (zh) * 2021-05-11 2023-11-14 泰康保险集团股份有限公司 一种图像检测方法和装置
KR102638075B1 (ko) * 2021-05-14 2024-02-19 (주)로보티즈 3차원 지도 정보를 이용한 의미론적 분할 방법 및 시스템
US20230004760A1 (en) * 2021-06-28 2023-01-05 Nvidia Corporation Training object detection systems with generated images
CN113627568A (zh) * 2021-08-27 2021-11-09 广州文远知行科技有限公司 一种补标方法、装置、设备及可读存储介质
CN113806573A (zh) * 2021-09-15 2021-12-17 上海商汤科技开发有限公司 标注方法、装置、电子设备、服务器及存储介质
CN113837192B (zh) * 2021-09-22 2024-04-19 推想医疗科技股份有限公司 图像分割方法及装置,神经网络的训练方法及装置
WO2023063950A1 (en) * 2021-10-14 2023-04-20 Hewlett-Packard Development Company, L.P. Training models for object detection
CN113642566B (zh) * 2021-10-15 2021-12-21 南通宝田包装科技有限公司 基于人工智能和大数据的药品包装设计方法
CN113642262B (zh) * 2021-10-15 2021-12-21 南通宝田包装科技有限公司 基于人工智能的牙膏包装外观辅助设计方法
US11941884B2 (en) * 2021-11-12 2024-03-26 Adobe Inc. Multi-source panoptic feature pyramid network
CN113936141B (zh) * 2021-12-17 2022-02-22 深圳佑驾创新科技有限公司 图像语义分割方法及计算机可读存储介质
CN114372537B (zh) * 2022-01-17 2022-10-21 浙江大学 一种面向图像描述系统的通用对抗补丁生成方法及系统
CN114663662B (zh) * 2022-05-23 2022-09-09 深圳思谋信息科技有限公司 超参数搜索方法、装置、计算机设备和存储介质
CN115086503B (zh) * 2022-05-25 2023-09-22 清华大学深圳国际研究生院 信息隐藏方法、装置、设备及存储介质
CN114677567B (zh) * 2022-05-27 2022-10-14 成都数联云算科技有限公司 模型训练方法、装置、存储介质及电子设备
CN117274579A (zh) * 2022-06-15 2023-12-22 北京三星通信技术研究有限公司 图像处理方法及相关设备

Citations (4)

Publication number Priority date Publication date Assignee Title
CN105787482A (zh) * 2016-02-26 2016-07-20 华北电力大学 一种基于深度卷积神经网络的特定目标轮廓图像分割方法
CN106022221A (zh) * 2016-05-09 2016-10-12 腾讯科技(深圳)有限公司 一种图像处理方法及处理系统
WO2017091833A1 (en) * 2015-11-29 2017-06-01 Arterys Inc. Automated cardiac volume segmentation
CN108229479A (zh) * 2017-08-01 2018-06-29 北京市商汤科技开发有限公司 语义分割模型的训练方法和装置、电子设备、存储介质

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
US9317908B2 (en) * 2012-06-29 2016-04-19 Behavioral Recognition System, Inc. Automatic gain control filter in a video analysis system
US9558268B2 (en) * 2014-08-20 2017-01-31 Mitsubishi Electric Research Laboratories, Inc. Method for semantically labeling an image of a scene using recursive context propagation
US9836641B2 (en) * 2014-12-17 2017-12-05 Google Inc. Generating numeric embeddings of images
US9704257B1 (en) * 2016-03-25 2017-07-11 Mitsubishi Electric Research Laboratories, Inc. System and method for semantic segmentation using Gaussian random field network
JP2018097807A (ja) * 2016-12-16 2018-06-21 株式会社デンソーアイティーラボラトリ 学習装置
JP7203844B2 (ja) * 2017-07-25 2023-01-13 達闥機器人股▲分▼有限公司 トレーニングデータの生成方法、生成装置及びその画像のセマンティックセグメンテーション方法

Patent Citations (4)

Publication number Priority date Publication date Assignee Title
WO2017091833A1 (en) * 2015-11-29 2017-06-01 Arterys Inc. Automated cardiac volume segmentation
CN105787482A (zh) * 2016-02-26 2016-07-20 华北电力大学 一种基于深度卷积神经网络的特定目标轮廓图像分割方法
CN106022221A (zh) * 2016-05-09 2016-10-12 腾讯科技(深圳)有限公司 一种图像处理方法及处理系统
CN108229479A (zh) * 2017-08-01 2018-06-29 北京市商汤科技开发有限公司 语义分割模型的训练方法和装置、电子设备、存储介质

Cited By (20)

Publication number Priority date Publication date Assignee Title
CN111553362B (zh) * 2019-04-01 2023-05-05 上海卫莎网络科技有限公司 一种视频处理方法、电子设备和计算机可读存储介质
CN111553362A (zh) * 2019-04-01 2020-08-18 上海卫莎网络科技有限公司 一种视频处理方法、电子设备和计算机可读存储介质
CN111833291B (zh) * 2019-04-22 2023-11-03 上海汽车集团股份有限公司 一种语义分割训练集人工标注评价方法及装置
CN111833291A (zh) * 2019-04-22 2020-10-27 上海汽车集团股份有限公司 一种语义分割训练集人工标注评价方法及装置
CN111783779A (zh) * 2019-09-17 2020-10-16 北京沃东天骏信息技术有限公司 图像处理方法、装置和计算机可读存储介质
CN111783779B (zh) * 2019-09-17 2023-12-05 北京沃东天骏信息技术有限公司 图像处理方法、装置和计算机可读存储介质
CN110781895A (zh) * 2019-10-10 2020-02-11 湖北工业大学 一种基于卷积神经网络的图像语义分割方法
CN110781895B (zh) * 2019-10-10 2023-06-20 湖北工业大学 一种基于卷积神经网络的图像语义分割方法
CN111062252A (zh) * 2019-11-15 2020-04-24 浙江大华技术股份有限公司 一种实时危险物品语义分割方法、装置及存储装置
CN111062252B (zh) * 2019-11-15 2023-11-10 浙江大华技术股份有限公司 一种实时危险物品语义分割方法、装置及存储装置
CN111612802A (zh) * 2020-04-29 2020-09-01 杭州电子科技大学 一种基于现有图像语义分割模型的再优化训练方法及应用
CN111814805A (zh) * 2020-06-18 2020-10-23 浙江大华技术股份有限公司 特征提取网络训练方法以及相关方法和装置
CN113159057A (zh) * 2021-04-01 2021-07-23 湖北工业大学 一种图像语义分割方法和计算机设备
CN113159057B (zh) * 2021-04-01 2022-09-02 湖北工业大学 一种图像语义分割方法和计算机设备
CN113450311A (zh) * 2021-06-01 2021-09-28 国网河南省电力公司漯河供电公司 基于语义分割和空间关系的带销螺丝缺陷检测方法及系统
CN113792742A (zh) * 2021-09-17 2021-12-14 北京百度网讯科技有限公司 遥感图像的语义分割方法和语义分割模型的训练方法
CN114693934A (zh) * 2022-04-13 2022-07-01 北京百度网讯科技有限公司 语义分割模型的训练方法、视频语义分割方法及装置
CN114693934B (zh) * 2022-04-13 2023-09-01 北京百度网讯科技有限公司 语义分割模型的训练方法、视频语义分割方法及装置
CN116883673A (zh) * 2023-09-08 2023-10-13 腾讯科技(深圳)有限公司 语义分割模型训练方法、装置、设备及存储介质
CN116883673B (zh) * 2023-09-08 2023-12-26 腾讯科技(深圳)有限公司 语义分割模型训练方法、装置、设备及存储介质

Also Published As

Publication number Publication date
CN108229479B (zh) 2019-12-31
KR20200015611A (ko) 2020-02-12
US11301719B2 (en) 2022-04-12
JP2020524861A (ja) 2020-08-20
US20200134375A1 (en) 2020-04-30
JP6807471B2 (ja) 2021-01-06
SG11201913365WA (en) 2020-01-30
CN108229479A (zh) 2018-06-29
KR102358554B1 (ko) 2022-02-04

Similar Documents

Publication Publication Date Title
WO2019024808A1 (zh) 语义分割模型的训练方法和装置、电子设备、存储介质
TWI721510B (zh) 雙目圖像的深度估計方法、設備及儲存介質
CN108229296B (zh) 人脸皮肤属性识别方法和装置、电子设备、存储介质
WO2020006961A1 (zh) 用于提取图像的方法和装置
JP7373554B2 (ja) クロスドメイン画像変換
CN108399383B (zh) 表情迁移方法、装置存储介质及程序
WO2018099473A1 (zh) 场景分析方法和系统、电子设备
US20210342643A1 (en) Method, apparatus, and electronic device for training place recognition model
WO2019011249A1 (zh) 一种图像中物体姿态的确定方法、装置、设备及存储介质
WO2018054329A1 (zh) 物体检测方法和装置、电子设备、计算机程序和存储介质
WO2019034129A1 (zh) 神经网络结构的生成方法和装置、电子设备、存储介质
WO2019238072A1 (zh) 深度神经网络的归一化方法和装置、设备、存储介质
CN108154222B (zh) 深度神经网络训练方法和系统、电子设备
CN108229287B (zh) 图像识别方法和装置、电子设备和计算机存储介质
WO2019214344A1 (zh) 系统增强学习方法和装置、电子设备、计算机存储介质
WO2023040510A1 (zh) 图像异常检测模型训练方法、图像异常检测方法和装置
US10521919B2 (en) Information processing device and information processing method for applying an optimization model
US9697614B2 (en) Method for segmenting and tracking content in videos using low-dimensional subspaces and sparse vectors
CN108229313B (zh) 人脸识别方法和装置、电子设备和计算机程序及存储介质
CN113688907B (zh) 模型训练、视频处理方法,装置,设备以及存储介质
CN114746898A (zh) 用于生成图像抠图的三分图的方法和系统
CN112862877A (zh) 用于训练图像处理网络和图像处理的方法和装置
CN114511041B (zh) 模型训练方法、图像处理方法、装置、设备和存储介质
CN111868786B (zh) 跨设备监控计算机视觉系统
US20160071287A1 (en) System and method of tracking an object

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18840825

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2019571272

Country of ref document: JP

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 20197038767

Country of ref document: KR

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 02/07/2020)

122 Ep: pct application non-entry in european phase

Ref document number: 18840825

Country of ref document: EP

Kind code of ref document: A1