WO2019024808A1 - Semantic segmentation model training method and apparatus, electronic device, and storage medium - Google Patents
Semantic segmentation model training method and apparatus, electronic device, and storage medium
- Publication number
- WO2019024808A1 (PCT/CN2018/097549)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- sub
- images
- semantic segmentation
- neural network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/217—Validation; Performance evaluation; Active pattern learning techniques
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/0895—Weakly supervised learning, e.g. semi-supervised or self-supervised learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/19—Recognition using electronic means
- G06V30/191—Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G06V30/19147—Obtaining sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/19—Recognition using electronic means
- G06V30/191—Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G06V30/19173—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/26—Techniques for post-processing, e.g. correcting the recognition result
- G06V30/262—Techniques for post-processing, e.g. correcting the recognition result using context analysis, e.g. lexical, syntactic or semantic context
- G06V30/274—Syntactic or semantic context, e.g. balancing
Definitions
- the embodiments of the present application relate to computer vision technologies, and in particular, to a training method and apparatus for a semantic segmentation model, an electronic device, and a storage medium.
- Image semantic segmentation assigns a label to each pixel of an input image, indicating the object or category to which that pixel most likely belongs. It is an important task in computer vision, with applications such as machine scene understanding and video analysis.
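- As a minimal illustration of this per-pixel labeling (the class count and tensor shapes below are assumptions for the example, not values from this application):

```python
# Per-pixel semantic segmentation: each pixel of an H x W image receives
# the label of its most likely category. Shapes here are illustrative.
import numpy as np

num_classes, H, W = 21, 4, 4
logits = np.random.randn(num_classes, H, W)  # per-pixel class scores
label_map = logits.argmax(axis=0)            # (H, W): one category per pixel
print(label_map.shape)                       # -> (4, 4)
```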
- the embodiment of the present application provides a training technique for a semantic segmentation model.
- the semantic segmentation model is trained based on the categories of at least two sub-images and the feature distance between the at least two sub-images.
- a training apparatus for a semantic segmentation model which includes:
- a segmentation unit configured to perform image semantic segmentation on at least one unlabeled image by using a semantic segmentation model to obtain a preliminary semantic segmentation result as a category of the unlabeled image
- a sub-image extracting unit configured to obtain, via a convolutional neural network, sub-images corresponding to at least two images and features corresponding to the sub-images, based on the category of the at least one unlabeled image and the category of at least one labeled image,
- where the at least two images include at least one of the unlabeled images and at least one of the labeled images, and the at least two sub-images each carry the category of the corresponding image;
- a training unit configured to train the semantic segmentation model based on a category of at least two sub-images and a feature distance between the at least two sub-images.
- an electronic device including a processor, where the processor includes the training apparatus for a semantic segmentation model as described above.
- an electronic device includes: a memory, configured to store executable instructions;
- a processor for communicating with the memory to execute the executable instructions to perform the operations of the training method of the semantic segmentation model as described above.
- a computer storage medium for storing computer readable instructions that, when executed, perform the operations of the training method of the semantic segmentation model as described above.
- a computer program including computer readable code, where, when the computer readable code is run on a device, a processor in the device executes instructions for implementing the steps of the semantic segmentation model training method described in any embodiment of the present application.
- the semantic segmentation model is used to perform image semantic segmentation on the unlabeled image, so that the unlabeled image can obtain a noisy category.
- based on the category of the unlabeled image and the category of the labeled image, sub-images corresponding to at least two images are obtained, so that both labeled and unlabeled images are applied to training and self-supervised training is realized; feature extraction is performed on the sub-images through the convolutional neural network, and the semantic segmentation model is trained based on the categories of at least two sub-images and the feature distance between them.
- the self-supervised semantic segmentation model obtained through training has strong semantic discrimination capability and can achieve higher accuracy in semantic segmentation.
- FIG. 1 is a flow chart of an embodiment of a training method for a semantic segmentation model of the present application.
- FIG. 2 is a schematic diagram showing an example of establishing a patch map of the training method of the semantic segmentation model of the present application.
- FIG. 3 is another schematic diagram of establishing a patch map of the training method of the semantic segmentation model of the present application.
- FIG. 4 is a schematic structural diagram of an embodiment of a training apparatus for a semantic segmentation model of the present application.
- FIG. 5 is a schematic structural diagram of an embodiment of an electronic device according to the present application.
- Embodiments of the present application can be applied to computer systems/servers that can operate with numerous other general purpose or special purpose computing system environments or configurations.
- Examples of well-known computing systems, environments, and/or configurations suitable for use with computer systems/servers include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, minicomputer systems, mainframe computer systems, distributed cloud computing environments including any of the above, and the like.
- the computer system/server can be described in the general context of computer system executable instructions (such as program modules) being executed by a computer system.
- program modules may include routines, programs, target programs, components, logic, data structures, and the like that perform particular tasks or implement particular abstract data types.
- the computer system/server can be implemented in a distributed cloud computing environment where tasks are performed by remote processing devices that are linked through a communication network.
- program modules may be located on a local or remote computing system storage medium including storage devices.
- FIG. 1 is a flow chart of an embodiment of a training method for a semantic segmentation model of the present application. As shown in Figure 1, the method of this embodiment includes:
- Step 101 Perform semantic segmentation on at least one unlabeled image by using a semantic segmentation model to obtain a preliminary semantic segmentation result as a category of the unlabeled image.
- an unlabeled image is an image in which the category (for example, the semantic category) of some or all pixels is undetermined.
- image semantic segmentation can be performed on the unlabeled image using a known semantic segmentation model to obtain a noisy semantic segmentation result.
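- A minimal sketch of this step, assuming an off-the-shelf pretrained segmentation network (torchvision's FCN is used here only as a stand-in; the application does not name a base model):

```python
# Generate noisy pseudo-labels for unlabeled images with an existing
# segmentation model; its predictions serve as the images' categories.
import torch
from torchvision.models.segmentation import fcn_resnet50

model = fcn_resnet50(weights="DEFAULT").eval()

@torch.no_grad()
def pseudo_label(image_batch: torch.Tensor) -> torch.Tensor:
    """image_batch: (N, 3, H, W) normalized floats -> (N, H, W) category ids."""
    logits = model(image_batch)["out"]  # (N, C, H, W) per-pixel class scores
    return logits.argmax(dim=1)         # noisy per-pixel categories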
- the step 101 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a segmentation unit 41 that is executed by the processor.
- Step 102 Obtain, by using the convolutional neural network, sub-images corresponding to at least two images and features corresponding to the sub-images, based on the category of the at least one unlabeled image and the category of the at least one labeled image.
- the at least two images include at least one unlabeled image and at least one labeled image, and at least two of the sub-images carry a category of the corresponding image.
- a selection box of a set size is moved across the image; according to the categories of the pixels in the image, it is determined whether the pixels within the selection box belong to the same category. If the proportion of pixels in the selection box belonging to one category exceeds a set ratio, the content of the selection box is output as a sub-image.
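- A minimal sketch of this selection-box scan; the box size, stride, and purity ratio below are illustrative assumptions, not values fixed by this application:

```python
# Slide a fixed-size selection box over a (possibly noisy) label map and
# output the box as a sub-image whenever one category dominates it.
import numpy as np

def extract_sub_images(label_map: np.ndarray, box: int = 64,
                       stride: int = 32, ratio: float = 0.9):
    """label_map: (H, W) per-pixel category ids; returns (box, category) pairs."""
    subs = []
    h, w = label_map.shape
    for y in range(0, h - box + 1, stride):
        for x in range(0, w - box + 1, stride):
            window = label_map[y:y + box, x:x + box]
            cats, counts = np.unique(window, return_counts=True)
            if counts.max() / window.size >= ratio:  # box is pure enough
                subs.append(((y, x, box), int(cats[counts.argmax()])))
    return subs
```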
- the step 102 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a sub-image extraction unit 42 that is executed by the processor.
- Step 103 Train the semantic segmentation model based on the categories of the at least two sub-images and the feature distance between the at least two sub-images.
- the step 103 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a training unit 43 executed by the processor.
- based on the training method provided by the above embodiments of the present application, the semantic segmentation model performs image semantic segmentation on the unlabeled image so that the unlabeled image obtains a noisy category; based on the category of the unlabeled image and the category of the labeled image, sub-images corresponding to at least two images are obtained, applying both labeled and unlabeled images to training and realizing self-supervised training; feature extraction is performed on the sub-images through the convolutional neural network.
- the semantic segmentation model is trained on the categories of at least two sub-images and the feature distance between them, and the self-supervised semantic segmentation model obtained through training has strong semantic discrimination ability and can achieve higher accuracy in semantic segmentation.
- Self-supervised learning is performed by using the unlabeled image itself to obtain an image descriptor.
- the image descriptor is a high-dimensional vector that can be used to describe the semantic information of the image; these image descriptors are then used for semantic segmentation training.
- step 103 includes:
- the patch graph includes nodes and edges, the nodes include sub-images, and the edges include feature distances between any two sub-images;
- the semantic segmentation model is trained such that, in the patch graph, the feature distance between two sub-images of the same category is smaller than a first preset value, and the feature distance between two sub-images of different categories is greater than a second preset value.
- FIG. 2 is a schematic diagram of an example of establishing a patch map of the training method of the semantic segmentation model of the present application.
- a sub-image, selected by the selection box 211 from at least one image 21 of known category, is used as a node 221, and the feature distance between sub-images having a connection relationship is used as an edge 222 (in FIG. 2, the feature is the portion of the middle-layer feature map selected by the corresponding selection box).
- the feature of a sub-image is the portion selected by the corresponding selection box in the feature map output by the output layer of the convolutional neural network; optionally, the output layer is any middle or deep layer of the network. Shallow features of an image generally characterize edges, corners, and the like of objects in the image; middle-layer features generally characterize object parts (for example, the wheel of a vehicle or the nose of a face); deep features generally characterize the category information of the image as a whole (for example, person, car, horse). To build the patch graph from sub-images and optimize the parameters, one middle or deep layer is selected as the output layer for both the labeled and unlabeled images.
- the first preset value and the second preset value are preset, and the second preset value is generally greater than the first preset value; they are used to make the feature distance between two sub-images of the same category smaller and the feature distance between two sub-images of different categories larger.
- FIG. 3 is another schematic diagram of establishing a patch map of the training method of the semantic segmentation model of the present application.
- the method of the embodiment includes: based on a convolutional neural network (CNN in FIG. 3), the category of at least one unlabeled image (obtainable from a known semantic segmentation model), and the category of at least one labeled image, obtaining the sub-images corresponding to at least two images and the features corresponding to the sub-images (in FIG. 3, the features at the corresponding sub-image positions in the middle-layer feature map); and establishing a patch graph according to the category relationship between the sub-images.
- the patch map includes nodes and edges (the circles in Fig. 3 represent nodes, and the lines connecting the two circles represent edges), the nodes include sub-images, and the edges include feature distances between any two sub-images.
- the patch map is established according to the class relationship between the sub-images, including:
- a sub-image of the same category as the reference node is used as a positively correlated node, and a sub-image of a different category from the reference node is used as a negatively correlated node; a positive-correlation connection is established between the reference node and at least one positively correlated node, and a negative-correlation connection is established between the reference node and at least one negatively correlated node;
- a sparsely connected graph is formed by at least one reference node, the positively correlated nodes of the reference node, the negatively correlated nodes of the reference node, the positive-correlation connections, and the negative-correlation connections.
- the process of creating the patch graph is to randomly select a number of sub-images from the at least two sub-images as anchors; for each anchor, a sub-image of the same semantic category is randomly selected as a positive, and a sub-image of a different semantic category is randomly selected as a negative.
- two connections are then established per anchor: anchor-positive and anchor-negative; based on these connections, a sparsely connected patch graph is established, as sketched below.
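- A minimal sketch of this anchor/positive/negative sampling, assuming each sub-image is represented as an (index, category) pair from the extraction step; the anchor count is an illustrative assumption:

```python
# Build the sparse patch graph as anchor-positive / anchor-negative
# triplets sampled by semantic category.
import random
from collections import defaultdict

def build_patch_graph(sub_images, n_anchors=128, seed=0):
    """sub_images: list of (index, category); returns (a, p, n) index triplets."""
    rng = random.Random(seed)
    by_cat = defaultdict(list)
    for idx, cat in sub_images:
        by_cat[cat].append(idx)
    triplets = []
    for a_idx, a_cat in rng.sample(sub_images, min(n_anchors, len(sub_images))):
        pos_pool = [i for i in by_cat[a_cat] if i != a_idx]
        neg_cats = [c for c in by_cat if c != a_cat]
        if not pos_pool or not neg_cats:
            continue  # no same-category partner or no other category available
        triplets.append((a_idx, rng.choice(pos_pool),
                         rng.choice(by_cat[rng.choice(neg_cats)])))
    return triplets
```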
- the semantic segmentation model is trained, including:
- the semantic segmentation model is trained by a gradient backpropagation algorithm to minimize the error of the convolutional neural network, where the error is a triplet loss over the features of the corresponding sub-images obtained from the convolutional neural network.
- the gradient backpropagation algorithm is used to reduce the error in the convolutional neural network so that the parameters of at least one layer, from the first layer to the output layer, are optimized.
- the backpropagation (BP) algorithm is a supervised learning algorithm for multi-layer neural networks, based on the gradient descent method.
- the input-output relationship of a BP network is essentially a mapping: a BP neural network with n inputs and m outputs performs a continuous mapping from n-dimensional Euclidean space to a finite region of m-dimensional Euclidean space, and this mapping is highly nonlinear.
- the learning process of the BP algorithm consists of a forward propagation process and a back propagation process.
- the input information is propagated from the input layer through the hidden layers, processed layer by layer, and transmitted to the output layer. If the desired output is not obtained at the output layer, the sum of squared errors between the actual and expected outputs is taken as the objective function, and back propagation is performed: the partial derivatives of the objective function with respect to the neuron weights are computed layer by layer, and the gradient of the objective function with respect to the weight vector is used as the basis for modifying the weights.
- the learning of the network is completed in this weight-modification process; when the error reaches the desired value, network learning ends.
- the edges in the patch graph are obtained from the feature distances between the sub-images output by the output layer, which is a layer selected from the middle or deep layers. Therefore, not all layers of the convolutional neural network are optimized, only the parameters from the first layer to the output layer; accordingly, in the error calculation, the error from the output layer back to at least one layer, down to the first layer, is computed.
- the semantic segmentation model is trained by a gradient back propagation algorithm, including:
- the error is calculated according to the distances between the sub-image features output by the convolutional neural network after parameter optimization, and this error is taken as the maximum error;
- iteratively: the maximum error is propagated back through the gradients to compute the error of at least one layer of the convolutional neural network; the gradient of at least one layer's parameters is calculated from that layer's error, and the parameters of the corresponding layer are corrected according to the gradient, until the maximum error is less than or equal to a preset value.
- a loss function is first defined, and the convolutional neural network optimizes the network parameters by minimizing the loss function, as shown in equation (1):
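- The body of equation (1) is not reproduced in this text; based on the surrounding description, a standard triplet loss over anchor, positive, and negative sub-image features is the plausible form (an assumption, not this application's verbatim formula):

$$\mathcal{L} = \sum_{(a,p,n)} \max\left(0,\ \lVert f_a - f_p \rVert_2^2 - \lVert f_a - f_n \rVert_2^2 + \alpha\right) \qquad (1)$$

- where $f_a$, $f_p$, and $f_n$ are the features of the anchor, positive, and negative sub-images, and $\alpha$ is a margin enforcing that same-category feature distances stay below different-category ones.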
- the gradient backpropagation algorithm can optimize the parameters of each layer in the convolutional neural network.
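- A minimal training-loop sketch of this optimization; the tiny stand-in network, batch shapes, and hyperparameters are illustrative assumptions (the application's CNN is not specified here):

```python
# Minimize a triplet loss over sub-image features by gradient
# backpropagation, correcting the parameters of each layer.
import torch
import torch.nn as nn

cnn = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1), nn.Flatten())
criterion = nn.TripletMarginLoss(margin=1.0, p=2)
optimizer = torch.optim.SGD(cnn.parameters(), lr=0.01, momentum=0.9)

for _ in range(10):  # placeholder batches; real ones come from the patch graph
    anchor, positive, negative = (torch.randn(8, 3, 64, 64) for _ in range(3))
    loss = criterion(cnn(anchor), cnn(positive), cnn(negative))
    optimizer.zero_grad()
    loss.backward()   # backpropagate the triplet-loss error
    optimizer.step()  # correct the parameters of each layer
```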
- the process of training the semantic segmentation model may include:
- the parameters in the semantic segmentation model are initialized based on the parameters of the obtained convolutional neural network.
- the parameters of the trained convolutional neural network have strong semantic class discrimination ability, so a higher accuracy can be obtained in semantic segmentation.
- the parameters of the convolutional neural network replace the parameters in the original semantic segmentation model, yielding the trained semantic segmentation model.
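- A minimal sketch of this parameter transfer, assuming `trained_cnn` and `seg_model` exist from the steps above and share module names where their architectures overlap:

```python
# Initialize the semantic segmentation model from the trained CNN:
# copy every parameter whose name and shape match.
seg_state = seg_model.state_dict()
for name, tensor in trained_cnn.state_dict().items():
    if name in seg_state and tensor.shape == seg_state[name].shape:
        seg_state[name] = tensor.clone()
seg_model.load_state_dict(seg_state)
```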
- step 102 may include:
- a selection box of a preset size is moved over the at least two images, and the pixels within the selection box are judged;
- when the proportion of pixels of the same semantic category among the pixels in the selection box is greater than or equal to a preset value, the image within the selection box is output as a sub-image, and the sub-image is marked with that category;
- the features corresponding to the sub-images are obtained by convolutional neural networks.
- At least two images, including an unlabeled image and a labeled image, are segmented by a selection box of variable size; when the pixels in the selection box belong to one category (for example, one semantic category),
- the pixels in the selection box are output as a sub-image of that category, and the size of the selection box is adjustable.
- the size of the selection box can be adjusted to re-segment until a certain number of sub-images are obtained.
- the step 102 may further include: discarding the selection box when the proportion of the pixels of the same category in the pixels in the selection box is less than a preset value.
- if, for each of multiple categories, the proportion of corresponding pixels is less than the preset value, the category of the selection box cannot be determined; in this case, the selection box is moved to the next position, and the judgment continues there. When a selection box of a set size yields no sub-image anywhere in an image, the size of the selection box is adjusted and the image is re-scanned.
- obtaining the features corresponding to the sub-images through the convolutional neural network includes:
- performing feature extraction on the images with the convolutional neural network to obtain the corresponding feature maps; based on the position and size of the selection box corresponding to a sub-image, the features within the corresponding selection box are taken from the corresponding feature map, determining the features of that sub-image.
- that is, the feature of a sub-image is selected by a selection box of the same position and size in the feature map output by the output layer of the convolutional neural network; the feature distance between any two sub-images is then obtained from the features of the sub-images, as sketched below.
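- A minimal sketch of this feature lookup and distance computation; the stride-based coordinate scaling and mean pooling are illustrative assumptions:

```python
# Take a sub-image's feature from the output-layer feature map using a
# selection box of matching position and size, then compare descriptors.
import torch

def box_feature(feature_map: torch.Tensor, box, stride: int = 8) -> torch.Tensor:
    """feature_map: (C, H', W') output-layer map; box: (y, x, size) in pixels."""
    y, x, s = (v // stride for v in box)
    return feature_map[:, y:y + s, x:x + s].mean(dim=(1, 2))

def feature_distance(f1: torch.Tensor, f2: torch.Tensor) -> torch.Tensor:
    return torch.norm(f1 - f2, p=2)  # Euclidean distance between descriptors
```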
- the training method of the semantic segmentation model of the present application may further include, prior to step 102, initializing parameters of the convolutional neural network based on parameters of the semantic segmentation model.
- parameters of the convolutional neural network are initialized using parameters of the semantic segmentation model.
- the method may further include:
- the semantic segmentation model is trained using the stochastic gradient descent method until the preset convergence condition is met.
- the fine-tuning process may include: 1. Use a semantic segmentation model with a VGG-16 network structure. 2. Set the initial learning rate of the semantic segmentation model to 0.01, reduced by a factor of 10 every 30,000 iterations. 3. Fine-tune and optimize the semantic segmentation task using the stochastic gradient descent algorithm, computed in a distributed manner over 8 GPUs. 4. Stochastic gradient descent: randomly select a batch of data (16 images in this case), input it into the network, forward it to obtain the result, calculate the error between the result and the annotation, and use back propagation to obtain the error of at least one layer.
- the gradient of at least one layer's parameters is calculated from that layer's error, and the parameter values are corrected according to the gradient; the model converges through this continual correction. 5. Iterate until the model converges, around the 60,000th round. 6. Test this semantic segmentation model on existing public datasets.
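- A minimal sketch of this schedule; `seg_model`, `seg_loss`, and `loader` are assumed from context, and only the stated learning-rate, batch, and iteration values are taken from the text (the momentum is an added assumption):

```python
# SGD fine-tuning: initial learning rate 0.01, divided by 10 every
# 30,000 iterations, batches of 16 images, stopping near iteration 60,000.
import torch

optimizer = torch.optim.SGD(seg_model.parameters(), lr=0.01, momentum=0.9)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30000, gamma=0.1)

for step, (images, labels) in enumerate(loader):  # batches of 16 images
    loss = seg_loss(seg_model(images), labels)    # error vs. the annotations
    optimizer.zero_grad()
    loss.backward()                               # backpropagate the error
    optimizer.step()                              # correct parameter values
    scheduler.step()                              # lr: 0.01 -> 0.001 -> 0.0001
    if step >= 60000:                             # convergence around here
        break
```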
- the method may further include:
- the convolutional neural network is trained using the stochastic gradient descent method until the preset convergence condition is met.
- the fine-tuning process may include: 1. Use a convolutional neural network with a VGG-16 network structure. 2. Set the initial learning rate of the convolutional neural network to 0.01, reduced by a factor of 10 every 30,000 iterations. 3. Fine-tune and optimize the semantic segmentation task using the stochastic gradient descent algorithm, computed in a distributed manner over 8 GPUs. 4. Stochastic gradient descent: randomly select a batch of data (16 images in this case), input it into the network, forward it to obtain the result, calculate the error between the result and the annotation, and use back propagation to obtain the error of at least one layer.
- the gradient of at least one layer's parameters is calculated from that layer's error, and the parameter values are corrected according to the gradient; the network converges through this continual correction. 5. Iterate until the network converges, around the 60,000th round. 6. Test this convolutional neural network on existing public datasets.
- the foregoing program may be stored in a computer readable storage medium; when executed, the program performs the steps of the foregoing method embodiments.
- the foregoing storage medium includes media that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disk.
- FIG. 4 is a schematic structural diagram of an embodiment of a training apparatus for a semantic segmentation model of the present application.
- the apparatus of this embodiment can be used to implement the various method embodiments described above. As shown in FIG. 4, the apparatus of this embodiment includes:
- the segmentation unit 41 is configured to perform image semantic segmentation on at least one unlabeled image by using a semantic segmentation model to obtain a preliminary semantic segmentation result as a category of the unlabeled image.
- the sub-image extracting unit 42 is configured to obtain, via the convolutional neural network, sub-images corresponding to at least two images and features corresponding to the sub-images, based on the category of the at least one unlabeled image and the category of the at least one labeled image.
- the at least two images include at least one unlabeled image and at least one labeled image, and at least two of the sub-images carry a category of the corresponding image.
- the training unit 43 is configured to train the semantic segmentation model based on the categories of the at least two sub-images and the feature distance between the at least two sub-images.
- the semantic segmentation model performs image semantic segmentation on the unlabeled image so that the unlabeled image obtains a noisy category; based on the category of the unlabeled image and the category of the labeled image, sub-images corresponding to at least two images are obtained, applying both labeled and unlabeled images to training and realizing self-supervised training; feature extraction is performed on the sub-images through the convolutional neural network.
- the semantic segmentation model is trained accordingly, and the self-supervised semantic segmentation model obtained through training has strong semantic discrimination ability and can achieve higher accuracy in semantic segmentation.
- the training unit 43 includes:
- a patch map establishing module configured to establish a patch map according to a category relationship between the sub-images, the patch map includes a node and an edge, the node includes a sub-image, and the edge includes a feature distance between any two sub-images;
- the model training module is configured to train the semantic segmentation model so that, in the patch graph, the feature distance between two sub-images of the same category is smaller than the first preset value, and the feature distance between two sub-images of different categories is greater than the second preset value.
- the sub-images are used as nodes, and the feature distance between sub-images having a connection relationship is used as an edge, where the connection relationship between sub-images is determined according to the categories corresponding to the sub-images;
- the feature of a sub-image is the portion selected by the corresponding selection box in the feature map output by the output layer of the convolutional neural network; optionally, the output layer is any middle or deep layer of the convolutional neural network. Shallow features of an image generally characterize edges, corners, and the like of the objects in the image; middle-layer features generally characterize object part information (for example, the wheel of a vehicle or the nose of a face); and deep features generally characterize the category information of the image as a whole (for example, person, car, horse);
- to build the graph from the sub-images and optimize the parameters, one middle or deep layer is selected as the output layer for the labeled and unlabeled images, and repeated experiments show that middle-layer features optimize better than deep-layer features. The first preset value and the second preset value are preset, the second generally greater than the first; they make the feature distance between two sub-images of the same category smaller and the feature distance between two sub-images of different categories larger.
- the patch map creation module includes:
- a reference selection module configured to select at least one sub-image as a reference node
- a connection relationship establishing module configured to, for at least one reference node, use a sub-image of the same category as the reference node as a positively correlated node and a sub-image of a different category as a negatively correlated node, establish a positive-correlation connection between the reference node and at least one positively correlated node, and establish a negative-correlation connection between the reference node and at least one negatively correlated node;
- a connection graph establishing module configured to form a sparsely connected graph from at least one reference node, the positively correlated nodes of the reference node, the negatively correlated nodes of the reference node, the positive-correlation connections, and the negative-correlation connections.
- the model training module includes:
- the network training module is used to train the semantic segmentation model by the gradient backpropagation algorithm to minimize the error of the convolutional neural network, where the error is the triplet loss over the features of the corresponding sub-images obtained from the convolutional neural network.
- the network training module is specifically configured to:
- the error is calculated according to the distances between the sub-image features output by the convolutional neural network after parameter optimization, and this error is taken as the maximum error;
- iteratively: the maximum error is propagated back through the gradients to compute the error of at least one layer of the convolutional neural network; the gradient of at least one layer's parameters is calculated from that layer's error, and the parameters of the corresponding layer are corrected according to the gradient, until the maximum error is less than or equal to a preset value.
- the model training module further includes:
- the segmentation model training module is configured to obtain parameters of the convolutional neural network based on the training result of the convolutional neural network; and initialize the parameters in the semantic segmentation model based on the obtained parameters of the convolutional neural network.
- the sub-image extraction unit is configured to move a selection box of a preset size over the at least two images and judge the pixels within the selection box;
- when the proportion of pixels of the same semantic category in the selection box is greater than or equal to a preset value, the image within the selection box is output as a sub-image and marked with that category;
- the features corresponding to the sub-images are then obtained through the convolutional neural network.
- At least two images, including an unlabeled image and a labeled image, are segmented by a selection box of variable size; when the pixels in the selection box belong to one category (for example, one semantic category),
- the pixels in the selection box are output as a sub-image of that category, and the size of the selection box is adjustable.
- the size of the selection box can be adjusted to re-segment until a certain number of sub-images are obtained.
- the sub-image extraction unit is further configured to discard the selection box when the proportion of pixels of the same category among the pixels in the selection box is less than the preset value.
- when obtaining the features corresponding to the sub-images through the convolutional neural network, the sub-image extraction unit is configured to perform feature extraction on the unlabeled image and the labeled image with the convolutional neural network to obtain their corresponding feature maps, and, based on the position and size of the selection box corresponding to a sub-image, take the features within the corresponding selection box from the corresponding feature map, determining the features of that sub-image.
- the device of the embodiment further includes a model fine-tuning unit configured to train the semantic segmentation model using the stochastic gradient descent method until the preset convergence condition is met.
- the fine-tuning process may include: 1. Use a semantic segmentation model with a VGG-16 network structure. 2. Set the initial learning rate of the semantic segmentation model to 0.01, reduced by a factor of 10 every 30,000 iterations. 3. Fine-tune and optimize the semantic segmentation task using the stochastic gradient descent algorithm, computed in a distributed manner over 8 GPUs. 4. Stochastic gradient descent: randomly select a batch of data (16 images in this case), input it into the network, forward it to obtain the result, calculate the error between the result and the annotation, and use back propagation to obtain the error of at least one layer.
- the gradient of at least one layer's parameters is calculated from that layer's error, and the parameter values are corrected according to the gradient; the model converges through this continual correction. 5. Iterate until the model converges, around the 60,000th round. 6. Test this semantic segmentation model on existing public datasets.
- the device of the embodiment further includes a network fine-tuning unit configured to train the convolutional neural network using the stochastic gradient descent method until the preset convergence condition is met.
- the fine-tuning process may include: 1. Use a convolutional neural network with a VGG-16 network structure. 2. Set the initial learning rate of the convolutional neural network to 0.01, reduced by a factor of 10 every 30,000 iterations. 3. Fine-tune and optimize the semantic segmentation task using the stochastic gradient descent algorithm, computed in a distributed manner over 8 GPUs. 4. Stochastic gradient descent: randomly select a batch of data (16 images in this case), input it into the network, forward it to obtain the result, calculate the error between the result and the annotation, and use back propagation to obtain the error of at least one layer.
- the gradient of at least one layer's parameters is calculated from that layer's error, and the parameter values are corrected according to the gradient; the network converges through this continual correction. 5. Iterate until the network converges, around the 60,000th round. 6. Test this convolutional neural network on existing public datasets.
- an electronic device including a processor, which includes any one of the embodiments of the training device of the semantic segmentation model of the present application.
- an electronic device includes: a memory, configured to store executable instructions;
- a processor for communicating with the memory to execute executable instructions to perform the operations of any of the embodiments of the training method of the semantic segmentation model of the present application.
- a computer storage medium for storing computer readable instructions, where the instructions, when executed, perform the operations of any embodiment of the semantic segmentation model training method of the present application.
- the embodiment of the present application further provides a computer program, including computer readable code; when the computer readable code is run on a device, the processor in the device executes instructions for implementing any embodiment of the present application.
- the embodiment of the present application further provides an electronic device, such as a mobile terminal, a personal computer (PC), a tablet computer, a server, and the like.
- referring to FIG. 5, a schematic structural diagram of an electronic device 500 suitable for implementing a terminal device or a server of an embodiment of the present application is shown.
- the electronic device 500 includes one or more processors and a communication unit.
- the one or more processors are, for example, one or more central processing units (CPUs) 501 and/or one or more graphics processing units (GPUs) 513; the processors can perform various appropriate actions and processes according to executable instructions stored in a read-only memory (ROM) 502 or loaded from the storage section 508 into a random access memory (RAM) 503.
- the communication part 512 can include, but is not limited to, a network card, which can include, but is not limited to, an IB (Infiniband) network card.
- the processor can communicate with the read-only memory 502 and/or the random access memory 503 to execute executable instructions, connect to the communication unit 512 via the bus 504, and communicate with other target devices via the communication unit 512, thereby completing operations corresponding to any of the methods provided by the embodiments of the present application, for example: performing image semantic segmentation on at least one unlabeled image by the semantic segmentation model to obtain a preliminary semantic segmentation result as the category of the unlabeled image; obtaining, by the convolutional neural network, based on the category of the at least one unlabeled image and the category of the at least one labeled image, the sub-images corresponding to at least two images and the features corresponding to the sub-images, the at least two images including at least one unlabeled image and at least one labeled image, the at least two sub-images carrying the categories of the corresponding images; and training the semantic segmentation model based on the categories of the at least two sub-images and the feature distance between the at least two sub-images.
- in the RAM 503, various programs and data required for the operation of the device can be stored.
- the CPU 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504.
- ROM 502 is an optional module.
- the RAM 503 stores executable instructions, or writes executable instructions to the ROM 502 at runtime, and the executable instructions cause the central processing unit 501 to perform operations corresponding to the above-described communication methods.
- An input/output (I/O) interface 505 is also coupled to bus 504.
- the communication unit 512 may be integrated, or may be provided with a plurality of sub-modules (for example, a plurality of IB network cards) linked on the bus.
- the following components are connected to the I/O interface 505: an input section 506 including a keyboard, a mouse, and the like; an output section 507 including a cathode ray tube (CRT), a liquid crystal display (LCD), and the like; a storage section 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card or a modem. The communication section 509 performs communication processing via a network such as the Internet.
- Driver 510 is also coupled to I/O interface 505 as needed.
- a removable medium 511 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory or the like is mounted on the drive 510 as needed so that a computer program read therefrom is installed into the storage portion 508 as needed.
- FIG. 5 is only an optional implementation manner.
- the number and type of components in FIG. 5 may be selected, deleted, added, or replaced according to actual needs; different functional components may also be implemented in separate or integrated settings.
- the GPU 513 and the CPU 501 may be separately configured or the GPU 513 may be integrated on the CPU 501.
- the communication unit may be separately configured, or may be integrated on the CPU 501 or the GPU 513, and so on.
- an embodiment of the present application includes a computer program product including a computer program tangibly embodied on a machine readable medium; the computer program includes program code for executing the method illustrated in the flowchart, and the program code may include instructions corresponding to the method steps provided by the embodiments of the present application, for example: performing image semantic segmentation on at least one unlabeled image by the semantic segmentation model to obtain a preliminary semantic segmentation result as the category of the unlabeled image; obtaining, by the convolutional neural network, based on the category of the at least one unlabeled image and the category of the at least one labeled image, the sub-images corresponding to at least two images and the features corresponding to the sub-images, the at least two images including at least one unlabeled image and at least one labeled image, the at least two sub-images carrying the categories of the corresponding images; and training the semantic segmentation model based on the categories of the sub-images and the feature distances between them.
- the computer program can be downloaded and installed from the network via the communication portion 509, and/or installed from the removable medium 511.
- when the computer program is executed by the central processing unit (CPU) 501, the above-described functions defined in the method of the present application are performed.
- the methods and apparatus of the present application may be implemented in a number of ways.
- the methods and apparatus of the present application can be implemented in software, hardware, firmware, or any combination of software, hardware, and firmware.
- the above-described sequence of steps for the method is for illustrative purposes only, and the steps of the method of the present application are not limited to the order specifically described above unless otherwise specifically stated.
- the present application can also be implemented as a program recorded in a recording medium, the programs including machine readable instructions for implementing the method according to the present application.
- the present application also covers a recording medium storing a program for executing the method according to the present application.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Mathematical Physics (AREA)
- Biomedical Technology (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Medical Informatics (AREA)
- Databases & Information Systems (AREA)
- Image Analysis (AREA)
Priority Applications (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| SG11201913365WA SG11201913365WA (en) | 2017-08-01 | 2018-07-27 | Semantic segmentation model training methods and apparatuses, electronic devices, and storage media |
| KR1020197038767A KR102358554B1 (ko) | 2017-08-01 | 2018-07-27 | 시맨틱 분할 모델을 위한 훈련 방법 및 장치, 전자 기기, 저장 매체 |
| JP2019571272A JP6807471B2 (ja) | 2017-08-01 | 2018-07-27 | セマンティックセグメンテーションモデルの訓練方法および装置、電子機器、ならびに記憶媒体 |
| US16/726,880 US11301719B2 (en) | 2017-08-01 | 2019-12-25 | Semantic segmentation model training methods and apparatuses, electronic devices, and storage media |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201710648545.7A CN108229479B (zh) | 2017-08-01 | 2017-08-01 | 语义分割模型的训练方法和装置、电子设备、存储介质 |
| CN201710648545.7 | 2017-08-01 |
Related Child Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US16/726,880 Continuation US11301719B2 (en) | 2017-08-01 | 2019-12-25 | Semantic segmentation model training methods and apparatuses, electronic devices, and storage media |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2019024808A1 true WO2019024808A1 (zh) | 2019-02-07 |
Family
ID=62654687
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2018/097549 Ceased WO2019024808A1 (zh) | 2017-08-01 | 2018-07-27 | 语义分割模型的训练方法和装置、电子设备、存储介质 |
Country Status (6)
| Country | Link |
|---|---|
| US (1) | US11301719B2 (en) |
| JP (1) | JP6807471B2 (en) |
| KR (1) | KR102358554B1 (en) |
| CN (1) | CN108229479B (en) |
| SG (1) | SG11201913365WA (en) |
| WO (1) | WO2019024808A1 (en) |
Cited By (21)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110781895A (zh) * | 2019-10-10 | 2020-02-11 | 湖北工业大学 | 一种基于卷积神经网络的图像语义分割方法 |
| CN111062252A (zh) * | 2019-11-15 | 2020-04-24 | 浙江大华技术股份有限公司 | 一种实时危险物品语义分割方法、装置及存储装置 |
| CN111553362A (zh) * | 2019-04-01 | 2020-08-18 | 上海卫莎网络科技有限公司 | 一种视频处理方法、电子设备和计算机可读存储介质 |
| CN111612802A (zh) * | 2020-04-29 | 2020-09-01 | 杭州电子科技大学 | 一种基于现有图像语义分割模型的再优化训练方法及应用 |
| CN111783779A (zh) * | 2019-09-17 | 2020-10-16 | 北京沃东天骏信息技术有限公司 | 图像处理方法、装置和计算机可读存储介质 |
| CN111814805A (zh) * | 2020-06-18 | 2020-10-23 | 浙江大华技术股份有限公司 | 特征提取网络训练方法以及相关方法和装置 |
| CN111833291A (zh) * | 2019-04-22 | 2020-10-27 | 上海汽车集团股份有限公司 | 一种语义分割训练集人工标注评价方法及装置 |
| CN113159057A (zh) * | 2021-04-01 | 2021-07-23 | 湖北工业大学 | 一种图像语义分割方法和计算机设备 |
| CN113450311A (zh) * | 2021-06-01 | 2021-09-28 | 国网河南省电力公司漯河供电公司 | 基于语义分割和空间关系的带销螺丝缺陷检测方法及系统 |
| CN113792742A (zh) * | 2021-09-17 | 2021-12-14 | 北京百度网讯科技有限公司 | 遥感图像的语义分割方法和语义分割模型的训练方法 |
| CN114266881A (zh) * | 2021-11-18 | 2022-04-01 | 武汉科技大学 | 一种基于改进型语义分割网络的指针式仪表自动读数方法 |
| CN114549405A (zh) * | 2022-01-10 | 2022-05-27 | 中国地质大学(武汉) | 一种基于监督自注意力网络的高分遥感图像语义分割方法 |
| CN114693934A (zh) * | 2022-04-13 | 2022-07-01 | 北京百度网讯科技有限公司 | 语义分割模型的训练方法、视频语义分割方法及装置 |
| CN114691912A (zh) * | 2020-12-25 | 2022-07-01 | 日本电气株式会社 | 图像处理的方法、设备和计算机可读存储介质 |
| CN114821058A (zh) * | 2022-04-28 | 2022-07-29 | 济南博观智能科技有限公司 | 一种图像语义分割方法、装置、电子设备及存储介质 |
| CN114997302A (zh) * | 2022-05-27 | 2022-09-02 | 阿里巴巴(中国)有限公司 | 负载特征确定方法、语义模型训练方法、装置及设备 |
| CN115272668A (zh) * | 2022-06-23 | 2022-11-01 | 重庆金美通信有限责任公司 | 基于结构相似性度量的图像分割方法、装置及终端设备 |
| CN116258852A (zh) * | 2023-01-03 | 2023-06-13 | 重庆长安汽车股份有限公司 | 一种道路场景图像的语义分割方法、装置、设备及介质 |
| CN116343216A (zh) * | 2023-03-30 | 2023-06-27 | 北京百度网讯科技有限公司 | 图像矫正模型的获取方法、处理方法、装置、设备与介质 |
| CN116883673A (zh) * | 2023-09-08 | 2023-10-13 | 腾讯科技(深圳)有限公司 | 语义分割模型训练方法、装置、设备及存储介质 |
| CN120599267A (zh) * | 2025-07-15 | 2025-09-05 | 北京城建智控科技股份有限公司 | 异物检测方法及装置 |
Families Citing this family (77)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN108229479B (zh) * | 2017-08-01 | 2019-12-31 | 北京市商汤科技开发有限公司 | 语义分割模型的训练方法和装置、电子设备、存储介质 |
| US10755142B2 (en) * | 2017-09-05 | 2020-08-25 | Cognizant Technology Solutions U.S. Corporation | Automated and unsupervised generation of real-world training data |
| CN110012210B (zh) * | 2018-01-05 | 2020-09-22 | Oppo广东移动通信有限公司 | 拍照方法、装置、存储介质及电子设备 |
| CN110622213B (zh) * | 2018-02-09 | 2022-11-15 | 百度时代网络技术(北京)有限公司 | 利用3d语义地图进行深度定位和分段的系统和方法 |
| CN109101878B (zh) * | 2018-07-01 | 2020-09-29 | 浙江工业大学 | 一种用于秸秆燃值估计的图像分析系统及图像分析方法 |
| CN109084955A (zh) * | 2018-07-02 | 2018-12-25 | 北京百度网讯科技有限公司 | 显示屏质量检测方法、装置、电子设备及存储介质 |
| CN109190631A (zh) * | 2018-08-31 | 2019-01-11 | 阿里巴巴集团控股有限公司 | 图片的目标对象标注方法及装置 |
| CN109087708B (zh) * | 2018-09-20 | 2021-08-31 | 深圳先进技术研究院 | 用于斑块分割的模型训练方法、装置、设备及存储介质 |
| JP6695947B2 (ja) | 2018-09-21 | 2020-05-20 | ソニーセミコンダクタソリューションズ株式会社 | 固体撮像システム、画像処理方法及びプログラム |
| CN109241951A (zh) * | 2018-10-26 | 2019-01-18 | 北京陌上花科技有限公司 | 色情图片识别方法、识别模型构建方法及识别模型和计算机可读存储介质 |
| CN109583328B (zh) * | 2018-11-13 | 2021-09-03 | 东南大学 | 一种嵌入稀疏连接的深度卷积神经网络字符识别方法 |
| CN109859209B (zh) * | 2019-01-08 | 2023-10-17 | 平安科技(深圳)有限公司 | 遥感影像分割方法、装置及存储介质、服务器 |
| CN109886272B (zh) * | 2019-02-25 | 2020-10-30 | 腾讯科技(深圳)有限公司 | 点云分割方法、装置、计算机可读存储介质和计算机设备 |
| CN111626313B (zh) * | 2019-02-28 | 2023-06-02 | 银河水滴科技(北京)有限公司 | 一种特征提取模型训练方法、图像处理方法及装置 |
| CN111767760A (zh) * | 2019-04-01 | 2020-10-13 | 北京市商汤科技开发有限公司 | 活体检测方法和装置、电子设备及存储介质 |
| US11580673B1 (en) * | 2019-06-04 | 2023-02-14 | Duke University | Methods, systems, and computer readable media for mask embedding for realistic high-resolution image synthesis |
| US10943353B1 (en) | 2019-09-11 | 2021-03-09 | International Business Machines Corporation | Handling untrainable conditions in a network architecture search |
| US11023783B2 (en) * | 2019-09-11 | 2021-06-01 | International Business Machines Corporation | Network architecture search with global optimization |
| US20210089924A1 (en) * | 2019-09-24 | 2021-03-25 | Nec Laboratories America, Inc | Learning weighted-average neighbor embeddings |
| KR20210061839A (ko) * | 2019-11-20 | 2021-05-28 | 삼성전자주식회사 | 전자 장치 및 그 제어 방법 |
| US11080833B2 (en) * | 2019-11-22 | 2021-08-03 | Adobe Inc. | Image manipulation using deep learning techniques in a patch matching operation |
| KR102198480B1 (ko) * | 2020-02-28 | 2021-01-05 | 연세대학교 산학협력단 | 재귀 그래프 모델링을 통한 비디오 요약 생성 장치 및 방법 |
| US11449717B2 (en) * | 2020-03-12 | 2022-09-20 | Fujifilm Business Innovation Corp. | System and method for identification and localization of images using triplet loss and predicted regions |
| US12182721B2 (en) * | 2020-03-25 | 2024-12-31 | Yissum Research Development Company Of The Hebrew University Of Jerusalem Ltd. | Deep learning-based anomaly detection in images |
| CN113496277A (zh) | 2020-04-03 | 2021-10-12 | 三星电子株式会社 | 用于检索图像的神经网络装置及其操作方法 |
| CN111401474B (zh) * | 2020-04-13 | 2023-09-08 | Oppo广东移动通信有限公司 | 视频分类模型的训练方法、装置、设备及存储介质 |
| CN111489366B (zh) * | 2020-04-15 | 2024-06-11 | 上海商汤临港智能科技有限公司 | 神经网络的训练、图像语义分割方法及装置 |
| CN111652285A (zh) * | 2020-05-09 | 2020-09-11 | 济南浪潮高新科技投资发展有限公司 | 一种茶饼类别识别方法、设备及介质 |
| CN111797893B (zh) * | 2020-05-26 | 2021-09-14 | 华为技术有限公司 | 一种神经网络的训练方法、图像分类系统及相关设备 |
| CN111611420B (zh) * | 2020-05-26 | 2024-01-23 | 北京字节跳动网络技术有限公司 | 用于生成图像描述信息的方法和装置 |
| CN111724441B (zh) * | 2020-05-28 | 2025-02-18 | 上海商汤智能科技有限公司 | 图像标注方法及装置、电子设备及存储介质 |
| CN111710009B (zh) * | 2020-05-29 | 2023-06-23 | 北京百度网讯科技有限公司 | 人流密度的生成方法、装置、电子设备以及存储介质 |
| CN111667483B (zh) * | 2020-07-03 | 2022-08-30 | 腾讯科技(深圳)有限公司 | 多模态图像的分割模型的训练方法、图像处理方法和装置 |
| CN111898696B (zh) * | 2020-08-10 | 2023-10-27 | 腾讯云计算(长沙)有限责任公司 | 伪标签及标签预测模型的生成方法、装置、介质及设备 |
| CN111931782B (zh) * | 2020-08-12 | 2024-03-01 | 中国科学院上海微系统与信息技术研究所 | 语义分割方法、系统、介质及装置 |
| CN112016599B (zh) * | 2020-08-13 | 2023-09-15 | 驭势科技(浙江)有限公司 | 用于图像检索的神经网络训练方法、装置及电子设备 |
| CN112085739B (zh) * | 2020-08-20 | 2024-05-24 | 深圳力维智联技术有限公司 | 基于弱监督的语义分割模型的训练方法、装置及设备 |
| US11694301B2 (en) | 2020-09-30 | 2023-07-04 | Alibaba Group Holding Limited | Learning model architecture for image data semantic segmentation |
| US12437514B2 (en) * | 2020-11-10 | 2025-10-07 | Nec Corporation | Video domain adaptation via contrastive learning for decision making |
| CN112613515B (zh) * | 2020-11-23 | 2024-09-20 | 上海眼控科技股份有限公司 | Semantic segmentation method and apparatus, computer device, and storage medium |
| CN114565773A (zh) * | 2020-11-27 | 2022-05-31 | 安徽寒武纪信息科技有限公司 | Method and apparatus for semantically segmenting images, electronic device, and storage medium |
| CN112559552B (zh) * | 2020-12-03 | 2023-07-25 | 北京百度网讯科技有限公司 | Data pair generation method and apparatus, electronic device, and storage medium |
| CN112668509B (zh) * | 2020-12-31 | 2024-04-02 | 深圳云天励飞技术股份有限公司 | Training method for social relationship recognition models, recognition method, and related devices |
| CN113781383B (zh) * | 2021-01-06 | 2024-06-21 | 北京沃东天骏信息技术有限公司 | Image processing method, apparatus, device, and computer-readable medium |
| CN112861911B (zh) * | 2021-01-10 | 2024-05-28 | 西北工业大学 | RGB-D semantic segmentation method based on deep feature selection and fusion |
| CN112862792B (zh) * | 2021-02-21 | 2024-04-05 | 北京工业大学 | Wheat powdery mildew spore segmentation method for small-sample image datasets |
| CN112686898B (zh) * | 2021-03-15 | 2021-08-13 | 四川大学 | Automatic radiotherapy target segmentation method based on self-supervised learning |
| CN113011430B (zh) * | 2021-03-23 | 2023-01-20 | 中国科学院自动化研究所 | Large-scale point cloud semantic segmentation method and system |
| CN113283434B (zh) * | 2021-04-13 | 2024-06-21 | 北京工业大学 | Image semantic segmentation method and system based on segmentation network optimization |
| BR112023022541A2 (pt) * | 2021-04-30 | 2024-01-02 | Ohio State Innovation Foundation | Apparatus and methods for identifying subsurface hydrogen accumulation |
| CN113177926B (zh) * | 2021-05-11 | 2023-11-14 | 泰康保险集团股份有限公司 | Image detection method and apparatus |
| KR102638075B1 (ko) * | 2021-05-14 | 2024-02-19 | (주)로보티즈 | Semantic segmentation method and system using three-dimensional map information |
| CN113822282B (zh) * | 2021-06-15 | 2025-11-28 | 腾讯科技(深圳)有限公司 | Image semantic segmentation method and apparatus, computer device, and storage medium |
| US20230004760A1 (en) * | 2021-06-28 | 2023-01-05 | Nvidia Corporation | Training object detection systems with generated images |
| CN113627568B (zh) * | 2021-08-27 | 2024-07-02 | 广州文远知行科技有限公司 | Supplementary annotation method, apparatus, device, and readable storage medium |
| CN113806573A (zh) * | 2021-09-15 | 2021-12-17 | 上海商汤科技开发有限公司 | Annotation method, apparatus, electronic device, server, and storage medium |
| CN113837192B (zh) * | 2021-09-22 | 2024-04-19 | 推想医疗科技股份有限公司 | Image segmentation method and apparatus, and neural network training method and apparatus |
| EP4388507A4 (en) | 2021-10-14 | 2024-10-23 | Hewlett-Packard Development Company, L.P. | Training models for object detection |
| CN113642262B (zh) * | 2021-10-15 | 2021-12-21 | 南通宝田包装科技有限公司 | Artificial-intelligence-based auxiliary design method for toothpaste packaging appearance |
| CN113642566B (zh) * | 2021-10-15 | 2021-12-21 | 南通宝田包装科技有限公司 | Pharmaceutical packaging design method based on artificial intelligence and big data |
| US12462525B2 (en) * | 2021-10-21 | 2025-11-04 | The Toronto-Dominion Bank | Co-learning object and relationship detection with density aware loss |
| US11941884B2 (en) * | 2021-11-12 | 2024-03-26 | Adobe Inc. | Multi-source panoptic feature pyramid network |
| CN114067081B (zh) * | 2021-11-26 | 2025-05-06 | 南京理工大学 | 3D tooth model segmentation method based on a bidirectional enhancement network |
| CN114187211A (zh) * | 2021-12-14 | 2022-03-15 | 深圳致星科技有限公司 | Image processing method and apparatus for optimizing image semantic segmentation results |
| CN113936141B (zh) * | 2021-12-17 | 2022-02-22 | 深圳佑驾创新科技有限公司 | Image semantic segmentation method and computer-readable storage medium |
| CN114372537B (zh) * | 2022-01-17 | 2022-10-21 | 浙江大学 | Universal adversarial patch generation method and system for image captioning systems |
| US12183058B2 (en) * | 2022-02-16 | 2024-12-31 | Shopify Inc. | Systems and methods for training and using a machine learning model for matching objects |
| CN114663662B (zh) * | 2022-05-23 | 2022-09-09 | 深圳思谋信息科技有限公司 | Hyperparameter search method and apparatus, computer device, and storage medium |
| CN115086503B (zh) * | 2022-05-25 | 2023-09-22 | 清华大学深圳国际研究生院 | Information hiding method, apparatus, device, and storage medium |
| CN114677567B (zh) * | 2022-05-27 | 2022-10-14 | 成都数联云算科技有限公司 | Model training method and apparatus, storage medium, and electronic device |
| CN117274579A (zh) * | 2022-06-15 | 2023-12-22 | 北京三星通信技术研究有限公司 | Image processing method and related device |
| US12190520B2 (en) * | 2022-07-05 | 2025-01-07 | Alibaba (China) Co., Ltd. | Pyramid architecture for multi-scale processing in point cloud segmentation |
| US12288371B2 (en) * | 2022-08-10 | 2025-04-29 | Avid Technology, Inc. | Finding the semantic region of interest in images |
| CN115471661A (zh) * | 2022-09-30 | 2022-12-13 | 中国农业银行股份有限公司 | Image segmentation model training, image segmentation method, apparatus, and electronic device |
| CN115564959B (zh) * | 2022-11-08 | 2024-11-12 | 长春理工大学 | Real-time semantic segmentation method based on asymmetric spatial feature convolution |
| CN115953778A (zh) * | 2022-12-30 | 2023-04-11 | 东软睿驰汽车技术(沈阳)有限公司 | Scene semantic segmentation model training method, apparatus, and electronic device |
| CN118397282B (zh) * | 2024-06-27 | 2024-08-30 | 中国民用航空飞行学院 | Robust 3D point cloud part segmentation method based on the semantic SAM large model |
Family Cites Families (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US9317908B2 (en) * | 2012-06-29 | 2016-04-19 | Behavioral Recognition System, Inc. | Automatic gain control filter in a video analysis system |
| US9558268B2 (en) * | 2014-08-20 | 2017-01-31 | Mitsubishi Electric Research Laboratories, Inc. | Method for semantically labeling an image of a scene using recursive context propagation |
| US9836641B2 (en) * | 2014-12-17 | 2017-12-05 | Google Inc. | Generating numeric embeddings of images |
| US9704257B1 (en) * | 2016-03-25 | 2017-07-11 | Mitsubishi Electric Research Laboratories, Inc. | System and method for semantic segmentation using Gaussian random field network |
| JP2018097807A (ja) * | 2016-12-16 | 2018-06-21 | 株式会社デンソーアイティーラボラトリ | Learning device |
| JP7203844B2 (ja) * | 2017-07-25 | 2023-01-13 | 達闥機器人股份有限公司 | Training data generation method, generation apparatus, and image semantic segmentation method therefor |
2017
- 2017-08-01 CN CN201710648545.7A patent/CN108229479B/zh active Active

2018
- 2018-07-27 SG SG11201913365WA patent/SG11201913365WA/en unknown
- 2018-07-27 WO PCT/CN2018/097549 patent/WO2019024808A1/zh not_active Ceased
- 2018-07-27 JP JP2019571272A patent/JP6807471B2/ja active Active
- 2018-07-27 KR KR1020197038767A patent/KR102358554B1/ko active Active

2019
- 2019-12-25 US US16/726,880 patent/US11301719B2/en active Active
Patent Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2017091833A1 (en) * | 2015-11-29 | 2017-06-01 | Arterys Inc. | Automated cardiac volume segmentation |
| CN105787482A (zh) * | 2016-02-26 | 2016-07-20 | 华北电力大学 | Specific target contour image segmentation method based on a deep convolutional neural network |
| CN106022221A (zh) * | 2016-05-09 | 2016-10-12 | 腾讯科技(深圳)有限公司 | Image processing method and processing system |
| CN108229479A (zh) * | 2017-08-01 | 2018-06-29 | 北京市商汤科技开发有限公司 | Training method and apparatus for semantic segmentation models, electronic device, and storage medium |
Cited By (30)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111553362A (zh) * | 2019-04-01 | 2020-08-18 | 上海卫莎网络科技有限公司 | Video processing method, electronic device, and computer-readable storage medium |
| CN111553362B (zh) * | 2019-04-01 | 2023-05-05 | 上海卫莎网络科技有限公司 | Video processing method, electronic device, and computer-readable storage medium |
| CN111833291A (zh) * | 2019-04-22 | 2020-10-27 | 上海汽车集团股份有限公司 | Method and apparatus for evaluating manual annotation of semantic segmentation training sets |
| CN111833291B (zh) * | 2019-04-22 | 2023-11-03 | 上海汽车集团股份有限公司 | Method and apparatus for evaluating manual annotation of semantic segmentation training sets |
| CN111783779A (zh) * | 2019-09-17 | 2020-10-16 | 北京沃东天骏信息技术有限公司 | Image processing method, apparatus, and computer-readable storage medium |
| CN111783779B (zh) * | 2019-09-17 | 2023-12-05 | 北京沃东天骏信息技术有限公司 | Image processing method, apparatus, and computer-readable storage medium |
| CN110781895B (zh) * | 2019-10-10 | 2023-06-20 | 湖北工业大学 | Image semantic segmentation method based on a convolutional neural network |
| CN110781895A (zh) * | 2019-10-10 | 2020-02-11 | 湖北工业大学 | Image semantic segmentation method based on a convolutional neural network |
| CN111062252A (zh) * | 2019-11-15 | 2020-04-24 | 浙江大华技术股份有限公司 | Real-time hazardous article semantic segmentation method, apparatus, and storage device |
| CN111062252B (zh) * | 2019-11-15 | 2023-11-10 | 浙江大华技术股份有限公司 | Real-time hazardous article semantic segmentation method, apparatus, and storage device |
| CN111612802A (zh) * | 2020-04-29 | 2020-09-01 | 杭州电子科技大学 | Re-optimization training method based on existing image semantic segmentation models and its application |
| CN111814805A (zh) * | 2020-06-18 | 2020-10-23 | 浙江大华技术股份有限公司 | Feature extraction network training method and related methods and apparatus |
| CN114691912A (zh) * | 2020-12-25 | 2022-07-01 | 日本电气株式会社 | Image processing method, device, and computer-readable storage medium |
| CN113159057B (zh) * | 2021-04-01 | 2022-09-02 | 湖北工业大学 | Image semantic segmentation method and computer device |
| CN113159057A (zh) * | 2021-04-01 | 2021-07-23 | 湖北工业大学 | Image semantic segmentation method and computer device |
| CN113450311A (zh) * | 2021-06-01 | 2021-09-28 | 国网河南省电力公司漯河供电公司 | Defect detection method and system for pinned screws based on semantic segmentation and spatial relationships |
| CN113792742A (zh) * | 2021-09-17 | 2021-12-14 | 北京百度网讯科技有限公司 | Semantic segmentation method for remote sensing images and training method for semantic segmentation models |
| CN114266881A (zh) * | 2021-11-18 | 2022-04-01 | 武汉科技大学 | Automatic reading method for pointer meters based on an improved semantic segmentation network |
| CN114549405A (zh) * | 2022-01-10 | 2022-05-27 | 中国地质大学(武汉) | Semantic segmentation method for high-resolution remote sensing images based on a supervised self-attention network |
| CN114693934A (zh) * | 2022-04-13 | 2022-07-01 | 北京百度网讯科技有限公司 | Training method for semantic segmentation models, video semantic segmentation method, and apparatus |
| CN114693934B (zh) * | 2022-04-13 | 2023-09-01 | 北京百度网讯科技有限公司 | Training method for semantic segmentation models, video semantic segmentation method, and apparatus |
| CN114821058A (zh) * | 2022-04-28 | 2022-07-29 | 济南博观智能科技有限公司 | Image semantic segmentation method and apparatus, electronic device, and storage medium |
| CN114997302A (zh) * | 2022-05-27 | 2022-09-02 | 阿里巴巴(中国)有限公司 | Load feature determination method, semantic model training method, apparatus, and device |
| CN115272668A (zh) * | 2022-06-23 | 2022-11-01 | 重庆金美通信有限责任公司 | Image segmentation method, apparatus, and terminal device based on structural similarity measurement |
| CN116258852A (zh) * | 2023-01-03 | 2023-06-13 | 重庆长安汽车股份有限公司 | Semantic segmentation method, apparatus, device, and medium for road scene images |
| CN116343216A (zh) * | 2023-03-30 | 2023-06-27 | 北京百度网讯科技有限公司 | Acquisition method, processing method, apparatus, device, and medium for image correction models |
| CN116343216B (zh) * | 2023-03-30 | 2025-08-29 | 北京百度网讯科技有限公司 | Acquisition method, processing method, apparatus, device, and medium for image correction models |
| CN116883673A (zh) * | 2023-09-08 | 2023-10-13 | 腾讯科技(深圳)有限公司 | Semantic segmentation model training method, apparatus, device, and storage medium |
| CN116883673B (zh) * | 2023-09-08 | 2023-12-26 | 腾讯科技(深圳)有限公司 | Semantic segmentation model training method, apparatus, device, and storage medium |
| CN120599267A (zh) * | 2025-07-15 | 2025-09-05 | 北京城建智控科技股份有限公司 | Foreign object detection method and apparatus |
Also Published As
| Publication number | Publication date |
|---|---|
| US20200134375A1 (en) | 2020-04-30 |
| SG11201913365WA (en) | 2020-01-30 |
| CN108229479A (zh) | 2018-06-29 |
| JP6807471B2 (ja) | 2021-01-06 |
| US11301719B2 (en) | 2022-04-12 |
| JP2020524861A (ja) | 2020-08-20 |
| KR102358554B1 (ko) | 2022-02-04 |
| CN108229479B (zh) | 2019-12-31 |
| KR20200015611A (ko) | 2020-02-12 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| WO2019024808A1 (zh) | Training method and apparatus for semantic segmentation models, electronic device, and storage medium | |
| TWI721510B (zh) | Depth estimation method for binocular images, device, and storage medium | |
| CN108229296B (zh) | Facial skin attribute recognition method and apparatus, electronic device, and storage medium | |
| CN114746898B (zh) | Method and system for generating a trimap for image matting | |
| CN108399383B (zh) | Expression transfer method, apparatus, storage medium, and program | |
| WO2020006961A1 (zh) | Method and apparatus for extracting images | |
| WO2018099473A1 (zh) | Scene analysis method and system, and electronic device | |
| WO2019011249A1 (zh) | Method, apparatus, device, and storage medium for determining the pose of an object in an image | |
| CN113688907B (zh) | Model training and video processing method, apparatus, device, and storage medium | |
| CN108229287B (zh) | Image recognition method and apparatus, electronic device, and computer storage medium | |
| CN108229313B (zh) | Face recognition method and apparatus, electronic device, computer program, and storage medium | |
| WO2018054329A1 (zh) | Object detection method and apparatus, electronic device, computer program, and storage medium | |
| CN112862877A (zh) | Method and apparatus for training image processing networks and for image processing | |
| WO2019214344A1 (zh) | System reinforcement learning method and apparatus, electronic device, and computer storage medium | |
| CN108154222A (zh) | Deep neural network training method and system, and electronic device | |
| US12493976B2 (en) | Method for training depth estimation model, training apparatus, and electronic device applying the method | |
| CN119131265B (zh) | 3D panoramic scene understanding method and apparatus based on multi-view consistency | |
| CN114511041B (zh) | Model training method, image processing method, apparatus, device, and storage medium | |
| CN114677565A (zh) | Training method for feature extraction networks, image processing method, and apparatus | |
| CN114792355A (zh) | Virtual avatar generation method and apparatus, electronic device, and storage medium | |
| CN108154153A (zh) | Scene analysis method and system, and electronic device | |
| CN113766117B (zh) | Video stabilization method and apparatus | |
| CN111311712B (zh) | Video frame processing method and apparatus | |
| CN114638754A (zh) | Virtual fitting video generation method, apparatus, device, and medium | |
| CN116129228B (zh) | Training method for image matching models, image matching method, and apparatus therefor | |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 18840825; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 2019571272; Country of ref document: JP; Kind code of ref document: A |
| | ENP | Entry into the national phase | Ref document number: 20197038767; Country of ref document: KR; Kind code of ref document: A |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 32PN | EP: public notification in the EP bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 02/07/2020) |
| | 122 | EP: PCT application non-entry in European phase | Ref document number: 18840825; Country of ref document: EP; Kind code of ref document: A1 |