WO2020192471A1 - Method for training an image classification model, image processing method, and apparatus - Google Patents
Method for training an image classification model, image processing method, and apparatus
- Publication number
- WO2020192471A1 (PCT/CN2020/079496)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- trained
- network
- model
- classification
- Prior art date
- 2019-03-26
Classifications
- G06N3/088: Non-supervised learning, e.g. competitive learning
- G06N3/08: Learning methods
- G06N3/045: Combinations of networks
- G06N3/0464: Convolutional networks [CNN, ConvNet]
- G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/2415: Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
- G06F18/2431: Classification techniques relating to the number of classes; multiple classes
- G06V10/26: Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
- G06V10/454: Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
- G06V10/764: Image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
- G06V10/774: Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
- G06V10/82: Image or video recognition or understanding using neural networks
Definitions
- This application relates to the field of artificial intelligence, and in particular to an image classification model training method, image processing method and device.
- the determining module is further configured to use the classification loss function to determine the fifth model parameter corresponding to the offset network to be trained based on the image content category information and the fourth predicted category label information acquired by the acquiring module;
- the training module is specifically configured to train the image semantic segmentation network model to be trained according to the second model parameter, the third model parameter, the fourth model parameter, and the fifth model parameter determined by the determining module 302, to obtain the image semantic segmentation network model.
- the training module is specifically configured to train the offset network to be trained N times using the second model parameter and the third model parameter, and to determine, according to the offset variable from each training of the offset network to be trained, the image content area corresponding to the image to be trained, where N is an integer greater than or equal to 1;
- the image semantic segmentation network model is generated.
- the objective loss function can be expressed as:

  L_seg = -Σ_i Σ_j Σ_{c=1}^{N} I(c = k) · log(P^c_{i,j})

  where L_seg represents the target loss function, N represents the total number of categories, c represents the c-th category, k is greater than or equal to 1 and less than or equal to N, I(·) represents the Dirac function, P^c_{i,j} represents the predicted probability value of the c-th category at the pixel, i represents the abscissa position of the pixel in the image to be trained, and j represents the ordinate position of the pixel in the image to be trained.
- a fourth aspect of the present application provides an image processing device, which is used in computer equipment and includes:
- the acquisition module is used to acquire the image to be processed
- the acquisition module is further configured to acquire the semantic segmentation result of the image to be processed through the image semantic segmentation network model, wherein the image semantic segmentation network model is obtained by alternately training the image classification network to be trained and the offset network to be trained; the offset network to be trained is used to classify images according to offset variables, and the image classification network to be trained is used to classify image content in the images;
- the processing module is configured to process the image to be processed according to the semantic segmentation result acquired by the acquiring module.
- the processor is used to execute the program in the memory and includes the following steps:
- the first prediction category label information of the image to be trained is obtained through the image classification network to be trained, wherein the offset network to be trained is used to classify images according to offset variables, and the image classification network to be trained is used to classify image content in the images;
- the bus system is used to connect the memory and the processor, so that the memory and the processor communicate.
- a sixth aspect of the present application provides a server, including: a memory, a transceiver, a processor, and a bus system;
- the memory is used to store programs
- the semantic segmentation result of the image to be processed is obtained through the image semantic segmentation network model, wherein the image semantic segmentation network model is obtained by alternately training the image classification network to be trained and the offset network to be trained;
- the offset network to be trained is used to classify images according to offset variables, and the image classification network to be trained is used to classify image content in the images;
- the bus system is used to connect the memory and the processor, so that the memory and the processor communicate.
- the seventh aspect of the present application provides a computer-readable storage medium having instructions stored in the computer-readable storage medium, which when run on a computer, cause the computer to execute the methods described in the above aspects.
- a method for training an image classification model is provided.
- the image to be trained is first obtained, and when the first model parameter of the offset network to be trained is fixed, the first prediction category label information of the image to be trained is obtained through the image classification network to be trained.
- in this way, the offset network and the image classification network can be used to train images that are annotated only at the image level.
- pixel-level annotation is not required, thereby reducing the cost of manual labeling and improving the efficiency of model training.
- FIG. 2 is a schematic diagram of a process framework of an image semantic segmentation network model in an embodiment of the application
- FIG. 4 is a schematic structural diagram of an offset network and an image classification network in an embodiment of this application.
- Fig. 5 is a schematic structural diagram of a deformable convolutional neural network in an embodiment of the application.
- FIG. 7 is a schematic diagram of an image processing flow based on a deformable convolutional neural network in an embodiment of the application.
- Fig. 8 is a schematic diagram of an embodiment of a model training device in an embodiment of the application.
- FIG. 9 is a schematic diagram of an embodiment of an image processing device in an embodiment of the application.
- FIG. 10 is a schematic structural diagram of a server in an embodiment of the application.
- FIG. 11 is a schematic diagram of a structure of a terminal device in an embodiment of this application.
- the embodiments of the present application provide a method for training an image classification model and a method and device for image processing, which can train on images annotated only at the image level; while the performance of the image semantic segmentation network model is guaranteed, manual pixel-level labeling is not required, which reduces the cost of manual labeling and improves the efficiency of model training.
- this application proposes a method for training an image semantic segmentation network model and a method for image processing using the image semantic segmentation network model. This method can reduce the manual labeling in the model training process of image semantic segmentation through artificial intelligence, and improve the efficiency of model training.
- Artificial Intelligence (AI) is a theory, method, technology, and application system that uses digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
- artificial intelligence is a comprehensive technology of computer science, which attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a similar way to human intelligence.
- Artificial intelligence is to study the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
- Artificial intelligence technology is a comprehensive discipline, covering a wide range of fields, including both hardware-level technology and software-level technology.
- Basic artificial intelligence technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, and mechatronics.
- Artificial intelligence software technology mainly includes computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
- Computer Vision is a science that studies how to make machines "see"; more specifically, it refers to using cameras and computers instead of human eyes to identify, track, and measure objects, and to further process the resulting graphics so that they become images more suitable for human eyes to observe or for transmission to instruments for detection.
- Computer vision technology usually includes image processing, image recognition, image semantic understanding, image retrieval, Optical Character Recognition (OCR), video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, and simultaneous localization and mapping, and also includes common biometric recognition technologies such as facial recognition and fingerprint recognition.
- Machine Learning is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other subjects. It specializes in studying how computers simulate or realize human learning behaviors in order to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve their own performance.
- Machine learning is the core of artificial intelligence, the fundamental way to make computers intelligent, and its applications cover all areas of artificial intelligence.
- Machine learning and deep learning usually include artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and teaching learning techniques.
- the method provided in this application is mainly used in the field of computer vision in the field of artificial intelligence.
- the problems of segmentation, detection, recognition, and tracking are closely connected.
- image semantic segmentation is to understand the image from the pixel level, and it is necessary to determine the corresponding target category of each pixel in the image.
- determining the category of each pixel places very high requirements on the accuracy of the algorithm.
- the computer's understanding of image content can start with a semantic label for the entire image (image classification), and then progress to locating the position of the image content that appears in the picture.
- it is necessary for the computer to understand the semantic information of each pixel in the image so that the computer can see the image like a person; this is image semantic segmentation.
- the goal of image semantic segmentation is to assign a label to each pixel in the image. Simply understood, semantic segmentation is a very important field in computer vision: it refers to recognizing the image at the pixel level, that is, marking the object category to which each pixel in the image belongs. The image processing method provided in this application can be applied to an autonomous driving scenario, that is, adding the necessary perception to the vehicle so that it understands the environment it is located in and can drive safely. It can also be used for medical image diagnosis: the machine can augment the analysis performed by radiologists and greatly reduce the time required to run diagnostic tests; for example, a chest X-ray can be segmented to obtain the heart area and the lung area.
- the client can download the image semantic segmentation network model from the server, and then input the image to be processed into the image semantic segmentation network model, and output the semantic segmentation result of the image to be processed through the image semantic segmentation network model.
- the client may upload the image to be processed to the server, and the server uses the image semantic segmentation network model to process the image to be processed, thereby obtaining the semantic segmentation result and returning the semantic segmentation result to the client.
- the server can also directly use the image semantic segmentation network model to process the image to be processed in the background to obtain the semantic segmentation result.
- Terminal devices include but are not limited to unmanned vehicles, robots, tablets, laptops, handheld computers, mobile phones, voice interaction devices, and personal computers (PCs), and are not limited here.
- Figure 2 is a schematic diagram of a process framework of an image semantic segmentation network model in an embodiment of this application. As shown in the figure, training images 21 and image-level category annotation information 22 are first obtained, and the training images 21 and category annotation information 22 are used for training to obtain a weakly supervised image semantic segmentation network model 23. Next, an unknown test image 24 is obtained and input to the image semantic segmentation network model 23, and the image semantic segmentation network model 23 segments the unknown test image 24, thereby predicting the semantic segmentation result 25 of the test image.
- the model training device first needs to obtain the image to be trained, where the image to be trained has category label information.
- the category labeling information is used to indicate the category information of the image content existing in the image to be trained; for example, image content category information such as "person", "horse", "TV", and "sofa" is marked in the image to be trained.
- the image content category information may not only refer to category information corresponding to objects in the image, but also category information corresponding to scenes such as sky, clouds, lawn, and sea.
- the image to be trained may be downloaded from a database and then annotated manually, thereby obtaining the category label information of the image to be trained. Images to be trained that carry category annotation information can also be automatically crawled from websites with massive user data.
- the first prediction category label information of the image to be trained is obtained through the image classification network to be trained, where the offset network to be trained is used to classify images according to the offset variable, and the image classification network to be trained is used to classify the image content in the image;
- the offset network 42 to be trained is used to provide input point positions that contribute more weakly to classification.
- according to the changed offset variable 44, the image content areas with weaker discriminability can be located.
- the image classification network 41 to be trained is used to classify the image content area in the overall image.
- according to the image content category information and the first prediction category label information, a classification loss function is used to determine the second model parameter corresponding to the image classification network to be trained;
- the model training device uses a classification loss function to train the image classification network to be trained.
- the classification loss function is used to estimate the degree of inconsistency between the model predicted value and the true value.
- the image content category information of the image to be trained belongs to the true value.
- the first prediction category labeling information of the image to be trained belongs to the predicted value. The smaller the classification loss function, the better the robustness of the image classification network. Therefore, the second model parameter corresponding to the image classification network to be trained can be obtained according to the classification loss function.
- the weight value of the image classification network to be trained needs to be fixed at this time, that is, the image classification network to be trained is fixed.
- the image to be trained is input to the offset network to be trained, and the offset network to be trained outputs the second prediction category label information of the image to be trained.
- the image semantic segmentation network model to be trained is trained to obtain the image semantic segmentation network model, where the image semantic segmentation network model is used to determine the semantic segmentation result of the image to be processed.
- the model training device trains the image semantic segmentation network model to be trained based on the model parameters obtained in each round of training (including the second model parameter and the third model parameter obtained through training).
- the offset variables predicted by the offset network during the training process are fused into an image content area, and finally the obtained image content area is used as pixel-level segmentation supervision information to train the image semantic segmentation network model to be trained, thereby obtaining the image semantic segmentation network model.
- the image semantic segmentation network model outputs the corresponding semantic segmentation result.
- the offset network and the image classification network can be used to train images that are annotated only at the image level. While the performance of the image semantic segmentation network model is guaranteed, manual pixel-level labeling is not required, which reduces the cost of manual labeling and improves the efficiency of model training.
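- for illustration only, the following is a minimal PyTorch sketch of how offset variables collected across training rounds could be fused into a pixel-level pseudo-mask; the function names and the voting-and-thresholding heuristic are assumptions, not steps specified by this application.

```python
# Hypothetical sketch: fuse the sampling locations visited by the offset network
# across training rounds into a pixel-level pseudo-mask (assumed heuristic).
import torch

def fuse_offsets_to_mask(offset_maps, image_size, threshold=0.5):
    """offset_maps: list of (H_out, W_out, K, 2) tensors, one per training round,
    holding the (dy, dx) offsets of the K kernel sampling points at each output cell.
    Assumes the output grid is at the same scale as the image."""
    h, w = image_size
    heat = torch.zeros(h, w)
    for offsets in offset_maps:
        h_out, w_out, k, _ = offsets.shape
        ys, xs = torch.meshgrid(torch.arange(h_out), torch.arange(w_out), indexing="ij")
        base = torch.stack([ys, xs], dim=-1).unsqueeze(2).float()  # (H_out, W_out, 1, 2)
        sampled = (base + offsets).round().long()                  # absolute sampling points
        sampled[..., 0].clamp_(0, h - 1)
        sampled[..., 1].clamp_(0, w - 1)
        flat = sampled.reshape(-1, 2)
        # Accumulate one vote per visited pixel (duplicate indices accumulate correctly).
        heat.index_put_((flat[:, 0], flat[:, 1]), torch.ones(flat.shape[0]), accumulate=True)
    heat = heat / heat.max().clamp(min=1e-8)
    return heat > threshold  # boolean pseudo-mask used as segmentation supervision
```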
- according to the image content category information and the first prediction category label information, using the classification loss function to determine the second model parameters corresponding to the image classification network to be trained includes:
- the second model parameter corresponding to the image classification network to be trained is determined.
- a method for determining the second model parameter is introduced. First, according to the true value (that is, the image content category information of the image to be trained) and the predicted value (that is, the first prediction category label information of the image to be trained), the predicted probability value corresponding to each category is determined. Suppose there are five categories, namely "person", "horse", "refrigerator", "TV", and "sofa".
- if the first prediction category label information includes "person", "refrigerator", "TV", and "sofa", the predicted probability values can be obtained: the predicted probability value of "person" is 0.93, that of "refrigerator" is 0.88, that of "horse" is 0, that of "TV" is 0.5, and that of "sofa" is 0.65.
- the classification loss of the classification loss function is determined according to the predicted probability value corresponding to each category.
- when the classification loss reaches its minimum value, the corresponding model parameter of the image classification network to be trained can be obtained; this model parameter is the second model parameter.
- the classification loss of the classification loss function in this application may refer to the cross-entropy classification loss.
- Using the classification loss function to determine the third model parameter corresponding to the offset network to be trained includes:
- the third model parameter corresponding to the offset network to be trained is determined.
- a method for determining the third model parameter is introduced. First, according to the true value (i.e., the image content category information of the image to be trained) and the predicted value (i.e., the second prediction category label information of the image to be trained), the predicted probability value corresponding to each category is determined, where the second prediction category label information is obtained after processing by a deformable convolutional neural network. Suppose there are five categories, namely "person", "horse", "refrigerator", "TV", and "sofa".
- if the second prediction category label information includes "person", "horse", "refrigerator", "TV", and "sofa", the predicted probability values can be obtained: the predicted probability value of "person" is 0.75, that of "horse" is 0.19, that of "refrigerator" is 0.66, that of "TV" is 0.43, and that of "sofa" is 0.78.
- the classification loss of the classification loss function is determined according to the predicted probability value corresponding to each category.
- when the classification loss reaches its maximum value, the corresponding model parameter of the offset network to be trained can be obtained; this model parameter is the third model parameter. It is understandable that the classification loss of the classification loss function in this application may refer to the cross-entropy classification loss.
- the offset network can also provide the positions of input points that contribute more weakly to classification; according to the changed offset variable, the image content areas with weaker discriminability can be located.
- the classification loss function can be expressed as:

  L = -Σ_{c=1}^{N} I(c = k) · log(P_c)

  where L represents the classification loss function, I(·) represents the Dirac function, N represents the total number of categories, c represents the c-th category, k is greater than or equal to 1 and less than or equal to N, and P_c represents the predicted probability value corresponding to the c-th category.
- this classification loss function is defined for training both the image classification network and the offset network.
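- as a concrete illustration, a minimal PyTorch sketch of this loss follows; the function and variable names are assumptions, and the single-label cross-entropy form follows the formula above rather than any implementation disclosed in this application.

```python
# Sketch of the classification loss L = -sum_c I(c = k) * log(P_c).
import torch
import torch.nn.functional as F

def classification_loss(logits, label_index):
    """logits: (batch, N) raw category scores; label_index: (batch,) ground-truth k."""
    log_probs = F.log_softmax(logits, dim=1)  # log P_c for every category c
    # The Dirac term I(c = k) keeps only the ground-truth category's log-probability.
    return -log_probs.gather(1, label_index.unsqueeze(1)).mean()

# Equivalent to F.cross_entropy(logits, label_index). The image classification
# network is trained to minimize this loss; the offset network to maximize it.
```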
- before the second prediction category label information of the image to be trained is obtained through the offset network to be trained, the method can also include:
- obtaining, through a deformable convolutional neural network, the feature image to be trained corresponding to the image to be trained; the second prediction category label information corresponding to the feature image to be trained is then obtained through the offset network to be trained.
- a method of generating the second prediction category label information using a deformable convolutional neural network is introduced.
- the image to be trained is first input to a deformable convolutional neural network (deformable convolution), and a predicted offset variable is output through the deformable convolutional neural network.
- the offset variable is the position offset of the input pixel corresponding to each weight value in a convolution kernel, and the offset variable is used to change the actual input features of the convolution operation.
- Figure 5 is a structural diagram of a deformable convolutional neural network in an embodiment of this application.
- the traditional convolution window only needs to train the pixel weight value of each convolution window.
- the deformable convolutional network needs additional parameters to train the shape of the convolution window.
- the offset area 51 in Figure 5 is the additional parameter to be trained that the deformable convolution introduces.
- the size of the parameter to be trained is the same as the size of the image 52 to be trained.
- the convolution window slides on the offset area 51 to realize the effect of convolution pixel offset, achieving sampling point optimization, and finally the feature image 53 to be trained is output.
- the feature image to be trained is input to the offset network to be trained, and the offset network to be trained outputs the second prediction category labeling information.
- the position offset variable of the input pixel corresponding to each weight in a convolution kernel can be predicted to change the actual input features of the convolution operation, thereby learning the most effective transformation; in this way, a mode of adversarial training can be realized.
- a deformable convolutional neural network is used to obtain the feature image to be trained corresponding to the image to be trained, which can be expressed as:

  y(p_0) = Σ_{p_n ∈ R} w(p_n) · x(p_0 + p_n + Δp_n)

  where y(p_0) represents the feature image to be trained, p_0 represents a pixel position in the feature image to be trained, R represents the sampling grid of the convolution kernel, p_n represents the position of a sampling point in the convolution kernel, Δp_n represents the offset variable, w(p_n) represents the weight value with which the convolution kernel performs the convolution operation at the corresponding position of the image to be trained, and x(p_0 + p_n + Δp_n) represents the pixel value at the corresponding position of the image to be trained.
- after the classification loss function is used to determine the third model parameter corresponding to the offset network to be trained, the method may also include:
- when the third model parameter corresponding to the offset network to be trained is fixed, the third prediction category label information of the image to be trained is obtained through the image classification network to be trained;
- according to the image content category information and the third prediction category label information, the classification loss function is used to determine the fourth model parameter corresponding to the image classification network to be trained;
- when the fourth model parameter of the image classification network to be trained is fixed, the fourth prediction category label information of the image to be trained is obtained through the offset network to be trained;
- according to the image content category information and the fourth prediction category label information, the classification loss function is used to determine the fifth model parameter corresponding to the offset network to be trained;
- training the image semantic segmentation network model to be trained to obtain the image semantic segmentation network model then includes: training the image semantic segmentation network model to be trained according to the second model parameter, the third model parameter, the fourth model parameter, and the fifth model parameter to obtain the image semantic segmentation network model.
- the process of another round of model alternate training is introduced.
- after the model training device completes one alternate training, the next round of alternating training can be started.
- the model training device uses the classification loss function to train the image classification network to be trained.
- the classification loss function is used to estimate the degree of inconsistency between the predicted value of the model and the true value.
- the image content category information of the image to be trained belongs to the true value, and the third prediction category label information of the image to be trained belongs to the predicted value.
- the smaller the classification loss function, the better the robustness of the image classification network. Therefore, the fourth model parameter corresponding to the image classification network to be trained can be obtained according to the classification loss function.
- the weight value of the image classification network to be trained needs to be fixed at this time, that is, the fourth model parameter of the image classification network to be trained is fixed; the image to be trained is then input into the offset network to be trained, and the offset network to be trained outputs the fourth prediction category label information of the image to be trained.
- the model training device uses the same classification loss function to train the offset network to be trained.
- the classification loss function is used to estimate the degree of inconsistency between the predicted value of the model and the true value.
- the image content category information of the image to be trained belongs to the true value, and the fourth prediction category label information of the image to be trained belongs to the predicted value. Therefore, the fifth model parameter corresponding to the offset network can be obtained according to the classification loss function.
- after multiple rounds of alternating training, the model training device trains the image semantic segmentation network model according to the model parameters obtained in each round of training (including the second model parameter, the third model parameter, the fourth model parameter, and the fifth model parameter obtained through training).
- the offset variables predicted by the offset network during the training process are fused into a relatively complete image content area, and finally the obtained image content area is used as pixel-level segmentation supervision information to train the image semantic segmentation network model to be trained, thereby obtaining the image semantic segmentation network model.
- the image semantic segmentation network model outputs the corresponding semantic segmentation result.
- through the above strategy of fixing one branch and training the other branch, the image classification network and the offset network can continuously conduct adversarial learning.
- regions of the image with weaker discriminative information are continuously fed in to enhance the training of the classifier.
- the offset network branch can also continuously locate regions with weaker discriminability.
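- a minimal sketch of this alternating scheme follows; the module objects, optimizers, and the use of loss negation for the maximizing branch are illustrative assumptions rather than the exact procedure of this application.

```python
# One round of "fix one branch, train the other" adversarial training (sketch).
import torch.nn.functional as F

def alternate_round(classifier, offset_net, cls_opt, off_opt, images, labels):
    # Step 1: fix the offset network, update the image classification network.
    for p in offset_net.parameters():
        p.requires_grad_(False)
    cls_loss = F.cross_entropy(classifier(offset_net(images)), labels)  # minimize
    cls_opt.zero_grad()
    cls_loss.backward()
    cls_opt.step()

    # Step 2: fix the classifier, update the offset network to maximize the loss.
    for p in offset_net.parameters():
        p.requires_grad_(True)
    for p in classifier.parameters():
        p.requires_grad_(False)
    off_loss = -F.cross_entropy(classifier(offset_net(images)), labels)  # maximize
    off_opt.zero_grad()
    off_loss.backward()
    off_opt.step()
    for p in classifier.parameters():
        p.requires_grad_(True)
    return cls_loss.item(), -off_loss.item()
```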
- in a seventh optional embodiment of the method for training an image classification model provided in the embodiments of this application, training the image semantic segmentation network model to be trained according to the second model parameter and the third model parameter to obtain the image semantic segmentation network model may include:
- when the offset network to be trained has been trained N times using the second model parameter and the third model parameter, the image content area corresponding to the image to be trained is determined according to the offset variable from each training of the offset network to be trained, where N is an integer greater than or equal to 1;
- according to the image content area, a target loss function is used to train the image semantic segmentation network model to be trained, where the target loss function can be expressed as:

  L_seg = -Σ_i Σ_j Σ_{c=1}^{N} I(c = k) · log(P^c_{i,j})

  where L_seg represents the target loss function, N represents the total number of categories, c represents the c-th category, k is greater than or equal to 1 and less than or equal to N, I(·) represents the Dirac function, P^c_{i,j} represents the predicted probability value of the c-th category at the pixel, i represents the abscissa position of the pixel in the image to be trained, and j represents the ordinate position of the pixel in the image to be trained.
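- for illustration, the reconstructed target loss above corresponds to a per-pixel cross-entropy; a minimal sketch (names assumed) using the fused image content area as pseudo ground truth:

```python
# L_seg = -sum_{i,j} sum_c I(c = k) * log(P^c_{i,j})  (per-pixel cross-entropy sketch)
import torch
import torch.nn.functional as F

def target_loss(seg_logits, pseudo_mask):
    """seg_logits: (batch, N, H, W) per-pixel category scores;
    pseudo_mask: (batch, H, W) integer category index k at each pixel (i, j)."""
    log_probs = F.log_softmax(seg_logits, dim=1)            # log P^c_{i,j}
    picked = log_probs.gather(1, pseudo_mask.unsqueeze(1))  # I(c = k) selects k's term
    return -picked.mean()  # equivalent to F.cross_entropy(seg_logits, pseudo_mask)
```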
- the pixel-level image is used as the training object, and the resulting image semantic segmentation network model can predict the category of each feature point in the image.
- the image to be processed includes but is not limited to the following formats: BMP, PCX, TIF, GIF, JPEG, EXIF, SVG, DXF, EPS, PNG, HDRI, and WMF.
- the image processing device inputs the image to be processed into the image semantic segmentation network model, and the image semantic segmentation network model outputs the corresponding semantic segmentation result.
- the image semantic segmentation network model is obtained through alternate training of the image classification network to be trained and the offset network to be trained.
- the offset network to be trained is used to classify images according to the offset variable, and the image classification network to be trained is used to classify the image content in the image. It can be understood that the training process of the image semantic segmentation network model is as described in the first to eighth embodiments corresponding to FIG. 3, so it is not repeated here.
- the image semantic segmentation network model can be trained based on Fully Convolutional Networks (FCN), Conditional Random Fields (CRF), or Markov Random Fields (MRF).
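- as an illustrative sketch of the inference flow (the checkpoint name, preprocessing, and saving convention are assumptions, not specified by this application):

```python
# Load a trained semantic segmentation model and predict a per-pixel category map.
import torch
import torchvision.transforms as T
from PIL import Image

model = torch.load("image_semantic_segmentation_model.pt")  # hypothetical checkpoint
model.eval()
preprocess = T.Compose([T.Resize((512, 512)), T.ToTensor()])
image = preprocess(Image.open("to_be_processed.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    seg_logits = model(image)               # (1, N, H, W) per-pixel category scores
semantic_result = seg_logits.argmax(dim=1)  # the semantic segmentation result
```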
- the image processing device processes the image to be processed according to the semantic segmentation result.
- the semantic segmentation result can be used in a website to search for images, that is, to search for other images related to the image to be processed. It can also be a personalized recommendation based on image content analysis.
- Semantic segmentation results usually have the following characteristics: first, the different regions obtained by segmentation are internally smooth, with similar textures and gray levels; second, adjacent semantic segmentation regions differ obviously in the properties on which the segmentation is based; third, after segmentation, the boundaries of different semantic regions are clear and regular.
- weakly supervised image semantic segmentation can thus be realized, which can be applied when fine pixel-level segmentation annotation data is lacking, relying only on whole-image classification annotation to achieve high-accuracy image segmentation.
- the region of interest of the input image can be pooled first to obtain a 3×3 feature map 74, which then passes through the fully connected layer 75 to output the offset variable 76 corresponding to each area; after another fully connected layer 77, a semantic segmentation result (including classification information 78 and positioning information 79) is obtained.
- FIG. 8 is a schematic diagram of an embodiment of a model training device in an embodiment of the application.
- the model training device 30 includes:
- the obtaining module 301 is configured to obtain an image to be trained, wherein the image to be trained has category label information, and the category label information is used to indicate the category information of the image content existing in the image to be trained;
- the acquiring module 301 is also configured to acquire the first prediction category label information of the image to be trained through the image classification network to be trained when the first model parameter of the offset network to be trained is fixed, wherein the offset network to be trained is used to classify the image according to the offset variable, and the image classification network to be trained is used to classify the image content in the image;
- the determining module 302 is configured to use a classification loss function to determine the second model parameter corresponding to the image classification network to be trained according to the image content category information and the first prediction category label information acquired by the acquiring module 301;
- the obtaining module 301 is further configured to obtain the second prediction category label information of the image to be trained through the offset network to be trained when the second model parameter of the image classification network to be trained is fixed;
- the determining module 302 is further configured to use the classification loss function to determine the third model parameter corresponding to the offset network to be trained based on the image content category information and the second predicted category label information acquired by the acquiring module 301;
- the training module 303 is configured to train the image semantic segmentation network model to be trained according to the second model parameter and the third model parameter determined by the determining module 302 to obtain an image semantic segmentation network model, wherein the image semantic segmentation network model is used to determine the semantic segmentation result of the image to be processed.
- the determining module 302 is specifically configured to determine the predicted probability value corresponding to each category according to the image content category information and the first predicted category labeling information;
- the second model parameter corresponding to the image classification network to be trained is determined.
- the determining module 302 is specifically configured to determine the predicted probability value corresponding to each category according to the image content category information and the second predicted category labeling information;
- the offset network can also provide the positions of input points that contribute more weakly to classification; according to the changed offset variable, the image content areas with weaker discriminability can be located.
- the determining module 302 is further configured to use the classification loss function to determine the fourth model parameter corresponding to the image classification network to be trained according to the image content category information and the third prediction category label information acquired by the acquisition module 301;
- the obtaining module 301 is further configured to obtain the fourth prediction category label information of the image to be trained through the offset network to be trained when the fourth model parameter of the image classification network to be trained is fixed;
- through the above strategy of fixing one branch and training the other branch, the image classification network and the offset network can continuously conduct adversarial learning.
- regions of the image with weaker discriminative information are continuously fed in to enhance the training of the classifier.
- the offset network branch can also continuously locate regions with weaker discriminability.
- the training module 303 is specifically configured to train the offset network to be trained N times using the second model parameter and the third model parameter, and to determine, according to the offset variable from each training of the offset network to be trained, the image content area corresponding to the image to be trained, wherein N is an integer greater than or equal to 1;
- the image semantic segmentation network model is generated.
- exemplary content of the target loss function is provided.
- the obtaining module 401 is used to obtain an image to be processed
- weakly supervised image semantic segmentation can thus be realized, which can be applied when fine pixel-level segmentation annotation data is lacking, relying only on whole-image classification annotation to achieve high-accuracy image segmentation.
- FIG. 10 is a schematic diagram of a server structure provided by an embodiment of the present application.
- the server 500 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPU) 522 (for example, one or more processors), memory 532, and one or more storage media 530 (for example, one or more mass storage devices) storing application programs 542 or data 544.
- the memory 532 and the storage medium 530 may be short-term storage or persistent storage.
- the program stored in the storage medium 530 may include one or more modules (not shown in the figure), and each module may include a series of command operations on the server.
- the central processing unit 522 may be configured to communicate with the storage medium 530 and execute a series of instruction operations in the storage medium 530 on the server 500.
- the server 500 may also include one or more power supplies 526, one or more wired or wireless network interfaces 550, one or more input and output interfaces 558, and/or one or more operating systems 541, such as Windows ServerTM, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM, etc.
- the steps performed by the server in the foregoing embodiment may be based on the server structure shown in FIG. 10.
- the CPU 522 included in the server may also be used to execute all or part of the steps in the embodiment shown in FIG. 3 or FIG. 6.
- the embodiment of the present application also provides another image processing device, as shown in FIG. 11.
- the terminal device can be any terminal device including a mobile phone, a tablet computer, a personal digital assistant (PDA), a point of sales (POS), a vehicle-mounted computer, etc. Take the terminal device as a mobile phone as an example:
- FIG. 11 shows a block diagram of a part of the structure of a mobile phone related to a terminal device provided in an embodiment of the present application.
- the mobile phone includes: a radio frequency (RF) circuit 610, a memory 620, an input unit 630, a display unit 640, a sensor 650, an audio circuit 660, a wireless fidelity (WiFi) module 670, a processor 680, a power supply 690, and other components.
- the RF circuit 610 can be used for receiving and sending signals during the process of sending and receiving information or during a call.
- in particular, after receiving downlink information from the base station, the RF circuit 610 delivers it to the processor 680 for processing; in addition, it sends designed uplink data to the base station.
- the memory 620 may be used to store software programs and modules.
- the processor 680 executes various functional applications and data processing of the mobile phone by running the software programs and modules stored in the memory 620.
- the input unit 630 may be used to receive inputted digital or character information, and generate key signal input related to user settings and function control of the mobile phone.
- the input unit 630 may include a touch panel 631 and other input devices 632.
- the input unit 630 may also include other input devices 632.
- the other input device 632 may include, but is not limited to, one or more of a physical keyboard, function keys (such as volume control buttons, switch buttons, etc.), trackball, mouse, and joystick.
- the display unit 640 may be used to display information input by the user or information provided to the user and various menus of the mobile phone.
- the display unit 640 may include a display panel 641.
- the display panel 641 may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED), etc.
- the touch panel 631 can cover the display panel 641.
- the touch panel 631 and the display panel 641 can be used as two independent components to realize the input and output functions of the mobile phone, but in some embodiments, the touch panel 631 and the display panel 641 can be integrated to realize the input and output functions of the mobile phone.
- the mobile phone may also include at least one sensor 650, such as a light sensor, a motion sensor, and other sensors.
- the audio circuit 660, the speaker 661, and the microphone 662 can provide an audio interface between the user and the mobile phone.
- the processor 680 is the control center of the mobile phone; it uses various interfaces and lines to connect the various parts of the entire mobile phone. By running or executing the software programs and/or modules stored in the memory 620 and calling the data stored in the memory 620, it executes the various functions of the mobile phone and processes data, thereby monitoring the mobile phone as a whole.
- the disclosed system, device, and method may be implemented in other ways.
- the device embodiments described above are only illustrative.
- the division of the units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components can be combined or integrated into another system, or some features can be ignored or not implemented.
- the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
- the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
- each unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
- the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
- the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer readable storage medium.
- the technical solution of this application, in essence, or the part that contributes to the existing technology, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions to make a computer device (which can be a personal computer, a server, or a network device, etc.) execute all or part of the steps of the methods described in the embodiments of the present application.
- the aforementioned storage media include: a USB flash drive, a mobile hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or other media that can store program code.
Claims (15)
- A method for training an image classification model, characterized in that the method is executed by a computer device and comprises: obtaining an image to be trained, wherein the image to be trained has category label information, and the category label information is used to indicate image content category information existing in the image to be trained; when a first model parameter of an offset network to be trained is fixed, obtaining first prediction category label information of the image to be trained through an image classification network to be trained, wherein the offset network to be trained is used to classify images according to offset variables, and the image classification network to be trained is used to classify image content in images; determining, by using a classification loss function according to the image content category information and the first prediction category label information, a second model parameter corresponding to the image classification network to be trained; when the second model parameter of the image classification network to be trained is fixed, obtaining second prediction category label information of the image to be trained through the offset network to be trained; determining, by using the classification loss function according to the image content category information and the second prediction category label information, a third model parameter corresponding to the offset network to be trained; and training an image semantic segmentation network model to be trained according to the second model parameter and the third model parameter to obtain an image semantic segmentation network model, wherein the image semantic segmentation network model is used to determine a semantic segmentation result of an image to be processed.
- The method according to claim 1, characterized in that determining, by using a classification loss function according to the image content category information and the first prediction category label information, the second model parameter corresponding to the image classification network to be trained comprises: determining, according to the image content category information and the first prediction category label information, a predicted probability value corresponding to each category; determining a classification loss of the classification loss function according to the predicted probability value corresponding to each category; and when the classification loss of the classification loss function is at its minimum value, determining the second model parameter corresponding to the image classification network to be trained.
- The method according to claim 1, characterized in that determining, by using the classification loss function according to the image content category information and the second prediction category label information, the third model parameter corresponding to the offset network to be trained comprises: determining, according to the image content category information and the second prediction category label information, a predicted probability value corresponding to each category; determining a classification loss of the classification loss function according to the predicted probability value corresponding to each category; and when the classification loss of the classification loss function is at its maximum value, determining the third model parameter corresponding to the offset network to be trained.
- The method according to claim 1, characterized in that before obtaining the second prediction category label information of the image to be trained through the offset network to be trained, the method further comprises: obtaining, through a deformable convolutional neural network, a feature image to be trained corresponding to the image to be trained, wherein the deformable convolutional neural network is used to predict an offset variable of the image to be trained; and obtaining the second prediction category label information of the image to be trained through the offset network to be trained comprises: obtaining, through the offset network to be trained, the second prediction category label information corresponding to the feature image to be trained.
- The method according to claim 1, characterized in that after determining, by using the classification loss function according to the image content category information and the second prediction category label information, the third model parameter corresponding to the offset network to be trained, the method further comprises: when the third model parameter corresponding to the offset network to be trained is fixed, obtaining third prediction category label information of the image to be trained through the image classification network to be trained; determining, by using the classification loss function according to the image content category information and the third prediction category label information, a fourth model parameter corresponding to the image classification network to be trained; when the fourth model parameter of the image classification network to be trained is fixed, obtaining fourth prediction category label information of the image to be trained through the offset network to be trained; and determining, by using the classification loss function according to the image content category information and the fourth prediction category label information, a fifth model parameter corresponding to the offset network to be trained; and training the image semantic segmentation network model to be trained according to the second model parameter and the third model parameter to obtain the image semantic segmentation network model comprises: training the image semantic segmentation network model to be trained according to the second model parameter, the third model parameter, the fourth model parameter, and the fifth model parameter to obtain the image semantic segmentation network model.
- The method according to claim 1, characterized in that training the image semantic segmentation network model to be trained according to the second model parameter and the third model parameter to obtain the image semantic segmentation network model comprises: when the offset network to be trained has been trained N times using the second model parameter and the third model parameter, determining an image content area corresponding to the image to be trained according to an offset variable from each training of the offset network to be trained, wherein N is an integer greater than or equal to 1; training the image semantic segmentation network model to be trained by using a target loss function according to the image content area; and when a loss result of the target loss function is at its minimum value, generating the image semantic segmentation network model.
- An image processing method, wherein the method is performed by a computer device and comprises: obtaining a to-be-processed image; obtaining a semantic segmentation result of the to-be-processed image through an image semantic segmentation network model, wherein the image semantic segmentation network model is obtained by alternately training a to-be-trained image classification network and a to-be-trained offset network, the to-be-trained offset network is configured to classify images according to offset variables, and the to-be-trained image classification network is configured to classify image content in images; and processing the to-be-processed image according to the semantic segmentation result.
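On the deployment side (claim 7), inference reduces to a forward pass followed by a per-pixel argmax. A minimal sketch assuming a trained `seg_model` and a fixed input size; the preprocessing choices are illustrative:

```python
import torch
from PIL import Image
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize((512, 512)),
    transforms.ToTensor(),
])

@torch.no_grad()
def segment(seg_model, image_path):
    """Return the semantic segmentation result (per-pixel class map)
    for a to-be-processed image; downstream processing consumes it."""
    x = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    logits = seg_model(x)                    # [1, num_classes, H, W]
    return logits.argmax(dim=1).squeeze(0)   # [H, W] class indices
```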
- A model training apparatus, wherein the apparatus is used in a computer device and comprises: an obtaining module, configured to obtain a to-be-trained image, the to-be-trained image having category labeling information, the category labeling information indicating image content category information present in the to-be-trained image; the obtaining module being further configured to obtain, when first model parameters of a to-be-trained offset network are fixed, first predicted category labeling information of the to-be-trained image through a to-be-trained image classification network, wherein the to-be-trained offset network is configured to classify images according to offset variables, and the to-be-trained image classification network is configured to classify image content in images; a determining module, configured to determine, according to the image content category information and the first predicted category labeling information obtained by the obtaining module, second model parameters corresponding to the to-be-trained image classification network by using a classification loss function; the obtaining module being further configured to obtain, when the second model parameters of the to-be-trained image classification network are fixed, second predicted category labeling information of the to-be-trained image through the to-be-trained offset network; the determining module being further configured to determine, according to the image content category information and the second predicted category labeling information obtained by the obtaining module, third model parameters corresponding to the to-be-trained offset network by using the classification loss function; and a training module, configured to train a to-be-trained image semantic segmentation network model according to the second model parameters and the third model parameters determined by the determining module, to obtain an image semantic segmentation network model, wherein the image semantic segmentation network model is configured to determine a semantic segmentation result of a to-be-processed image.
- An image processing apparatus, wherein the apparatus is used in a computer device and comprises: an obtaining module, configured to obtain a to-be-processed image; the obtaining module being further configured to obtain a semantic segmentation result of the to-be-processed image through an image semantic segmentation network model, wherein the image semantic segmentation network model is obtained by alternately training a to-be-trained image classification network and a to-be-trained offset network, the to-be-trained offset network is configured to classify images according to offset variables, and the to-be-trained image classification network is configured to classify image content in images; and a processing module, configured to process the to-be-processed image according to the semantic segmentation result obtained by the obtaining module.
- A server, comprising: a memory, a transceiver, a processor, and a bus system, wherein the memory is configured to store a program; the processor is configured to execute the program in the memory, including performing the following steps: obtaining a to-be-trained image, the to-be-trained image having category labeling information, the category labeling information indicating image content category information present in the to-be-trained image; obtaining, when first model parameters of a to-be-trained offset network are fixed, first predicted category labeling information of the to-be-trained image through a to-be-trained image classification network, wherein the to-be-trained offset network is configured to classify images according to offset variables, and the to-be-trained image classification network is configured to classify image content in images; determining, according to the image content category information and the first predicted category labeling information, second model parameters corresponding to the to-be-trained image classification network by using a classification loss function; obtaining, when the second model parameters of the to-be-trained image classification network are fixed, second predicted category labeling information of the to-be-trained image through the to-be-trained offset network; determining, according to the image content category information and the second predicted category labeling information, third model parameters corresponding to the to-be-trained offset network by using the classification loss function; and training a to-be-trained image semantic segmentation network model according to the second model parameters and the third model parameters to obtain an image semantic segmentation network model, wherein the image semantic segmentation network model is configured to determine a semantic segmentation result of a to-be-processed image; and the bus system is configured to connect the memory and the processor so that the memory and the processor communicate with each other.
- A terminal device, comprising: a memory, a transceiver, a processor, and a bus system, wherein the memory is configured to store a program; the processor is configured to execute the program in the memory, including performing the following steps: obtaining a to-be-processed image; obtaining a semantic segmentation result of the to-be-processed image through an image semantic segmentation network model, wherein the image semantic segmentation network model is obtained by alternately training a to-be-trained image classification network and a to-be-trained offset network, the to-be-trained offset network is configured to classify images according to offset variables, and the to-be-trained image classification network is configured to classify image content in images; and processing the to-be-processed image according to the semantic segmentation result; and the bus system is configured to connect the memory and the processor so that the memory and the processor communicate with each other.
- A computer-readable storage medium comprising instructions that, when run on a computer, cause the computer to perform the method according to any one of claims 1 to 9, or to perform the method according to claim 10.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP20777689.9A EP3951654A4 (en) | 2019-03-26 | 2020-03-16 | METHOD FOR TRAINING AN IMAGE CLASSIFICATION MODEL AND METHOD AND APPARATUS FOR IMAGE PROCESSING |
JP2021522436A JP7185039B2 (ja) | 2019-03-26 | 2020-03-16 | 画像分類モデルの訓練方法、画像処理方法及びその装置、並びにコンピュータプログラム |
KR1020217013575A KR102698958B1 (ko) | 2019-03-26 | 2020-03-16 | 이미지 분류 모델 훈련 방법, 및 이미지 처리 방법 및 디바이스 |
US17/238,634 US20210241109A1 (en) | 2019-03-26 | 2021-04-23 | Method for training image classification model, image processing method, and apparatuses |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910233985.5A CN109784424B (zh) | 2019-03-26 | 2019-03-26 | 一种图像分类模型训练的方法、图像处理的方法及装置 |
CN201910233985.5 | 2019-03-26 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/238,634 Continuation US20210241109A1 (en) | 2019-03-26 | 2021-04-23 | Method for training image classification model, image processing method, and apparatuses |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020192471A1 true WO2020192471A1 (zh) | 2020-10-01 |
Family
ID=66490551
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/079496 WO2020192471A1 (zh) | 2019-03-26 | 2020-03-16 | 一种图像分类模型训练的方法、图像处理的方法及装置 |
Country Status (6)
Country | Link |
---|---|
US (1) | US20210241109A1 (zh) |
EP (1) | EP3951654A4 (zh) |
JP (1) | JP7185039B2 (zh) |
KR (1) | KR102698958B1 (zh) |
CN (1) | CN109784424B (zh) |
WO (1) | WO2020192471A1 (zh) |
Families Citing this family (48)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111161274B (zh) * | 2018-11-08 | 2023-07-07 | 上海市第六人民医院 | 腹部图像分割方法、计算机设备 |
CN109784424B (zh) * | 2019-03-26 | 2021-02-09 | 腾讯科技(深圳)有限公司 | 一种图像分类模型训练的方法、图像处理的方法及装置 |
CN110210544B (zh) * | 2019-05-24 | 2021-11-23 | 上海联影智能医疗科技有限公司 | 图像分类方法、计算机设备和存储介质 |
CN110223230A (zh) * | 2019-05-30 | 2019-09-10 | 华南理工大学 | 一种多前端深度图像超分辨率系统及其数据处理方法 |
CN111047130B (zh) * | 2019-06-11 | 2021-03-02 | 北京嘀嘀无限科技发展有限公司 | 用于交通分析和管理的方法和系统 |
CN110363709A (zh) * | 2019-07-23 | 2019-10-22 | 腾讯科技(深圳)有限公司 | 一种图像处理方法、图像展示方法、模型训练方法及装置 |
CN110458218B (zh) * | 2019-07-31 | 2022-09-27 | 北京市商汤科技开发有限公司 | 图像分类方法及装置、分类网络训练方法及装置 |
CN110490239B (zh) * | 2019-08-06 | 2024-02-27 | 腾讯医疗健康(深圳)有限公司 | 图像质控网络的训练方法、质量分类方法、装置及设备 |
CN110807760B (zh) * | 2019-09-16 | 2022-04-08 | 北京农业信息技术研究中心 | 一种烟叶分级方法及系统 |
CN110705460B (zh) * | 2019-09-29 | 2023-06-20 | 北京百度网讯科技有限公司 | 图像类别识别方法及装置 |
CN110737783B (zh) * | 2019-10-08 | 2023-01-17 | 腾讯科技(深圳)有限公司 | 一种推荐多媒体内容的方法、装置及计算设备 |
CN110826596A (zh) * | 2019-10-09 | 2020-02-21 | 天津大学 | 一种基于多尺度可变形卷积的语义分割方法 |
CN110704661B (zh) * | 2019-10-12 | 2021-04-13 | 腾讯科技(深圳)有限公司 | 一种图像分类方法和装置 |
CN110930417B (zh) * | 2019-11-26 | 2023-08-08 | 腾讯科技(深圳)有限公司 | 图像分割模型的训练方法和装置、图像分割方法和装置 |
CN110956214B (zh) * | 2019-12-03 | 2023-10-13 | 北京车和家信息技术有限公司 | 一种自动驾驶视觉定位模型的训练方法及装置 |
CN112750128B (zh) * | 2019-12-13 | 2023-08-01 | 腾讯科技(深圳)有限公司 | 图像语义分割方法、装置、终端及可读存储介质 |
CN113053332B (zh) * | 2019-12-28 | 2022-04-22 | Oppo广东移动通信有限公司 | 背光亮度调节方法、装置、电子设备及可读存储介质 |
CN111259904B (zh) * | 2020-01-16 | 2022-12-27 | 西南科技大学 | 一种基于深度学习和聚类的语义图像分割方法及系统 |
CN111369564B (zh) * | 2020-03-04 | 2022-08-09 | 腾讯科技(深圳)有限公司 | 一种图像处理的方法、模型训练的方法及装置 |
CN111523548B (zh) * | 2020-04-24 | 2023-11-28 | 北京市商汤科技开发有限公司 | 一种图像语义分割、智能行驶控制方法及装置 |
CN113673668A (zh) * | 2020-05-13 | 2021-11-19 | 北京君正集成电路股份有限公司 | 一种车辆检测训练中二级损失函数的计算方法 |
CN111723813B (zh) | 2020-06-05 | 2021-07-06 | 中国科学院自动化研究所 | 基于类内判别器的弱监督图像语义分割方法、系统、装置 |
CN111814833B (zh) * | 2020-06-11 | 2024-06-07 | 浙江大华技术股份有限公司 | 票据处理模型的训练方法及图像处理方法、图像处理设备 |
CN111783635A (zh) * | 2020-06-30 | 2020-10-16 | 北京百度网讯科技有限公司 | 图像标注方法、装置、设备以及存储介质 |
CN111784673B (zh) * | 2020-06-30 | 2023-04-18 | 创新奇智(上海)科技有限公司 | 缺陷检测模型训练和缺陷检测方法、设备及存储介质 |
CN112132841B (zh) * | 2020-09-22 | 2024-04-09 | 上海交通大学 | 医疗图像切割方法及装置 |
CN112333402B (zh) * | 2020-10-20 | 2021-10-22 | 浙江大学 | 一种基于声波的图像对抗样本生成方法及系统 |
CN112487479B (zh) * | 2020-12-10 | 2023-10-13 | 支付宝(杭州)信息技术有限公司 | 一种训练隐私保护模型的方法、隐私保护方法及装置 |
CN112232355B (zh) * | 2020-12-11 | 2021-04-02 | 腾讯科技(深圳)有限公司 | 图像分割网络处理、图像分割方法、装置和计算机设备 |
CN112819008B (zh) * | 2021-01-11 | 2022-10-28 | 腾讯科技(深圳)有限公司 | 实例检测网络的优化方法、装置、介质及电子设备 |
CN112767420B (zh) * | 2021-02-26 | 2021-11-23 | 中国人民解放军总医院 | 基于人工智能的核磁影像分割方法、装置、设备和介质 |
CN113033549B (zh) * | 2021-03-09 | 2022-09-20 | 北京百度网讯科技有限公司 | 定位图获取模型的训练方法和装置 |
CN113505800A (zh) * | 2021-06-30 | 2021-10-15 | 深圳市慧鲤科技有限公司 | 图像处理方法及其模型的训练方法和装置、设备、介质 |
CN113822901B (zh) * | 2021-07-21 | 2023-12-12 | 南京旭锐软件科技有限公司 | 图像分割方法、装置、存储介质及电子设备 |
CN113610807B (zh) * | 2021-08-09 | 2024-02-09 | 西安电子科技大学 | 基于弱监督多任务学习的新冠肺炎分割方法 |
CN113673607A (zh) * | 2021-08-24 | 2021-11-19 | 支付宝(杭州)信息技术有限公司 | 图像标注模型的训练及图像标注的方法及装置 |
CN114004854B (zh) * | 2021-09-16 | 2024-06-07 | 清华大学 | 一种显微镜下的切片图像实时处理显示系统和方法 |
KR102430989B1 (ko) | 2021-10-19 | 2022-08-11 | 주식회사 노티플러스 | 인공지능 기반 콘텐츠 카테고리 예측 방법, 장치 및 시스템 |
CN113723378B (zh) * | 2021-11-02 | 2022-02-08 | 腾讯科技(深圳)有限公司 | 一种模型训练的方法、装置、计算机设备和存储介质 |
CN114049516A (zh) * | 2021-11-09 | 2022-02-15 | 北京百度网讯科技有限公司 | 训练方法、图像处理方法、装置、电子设备以及存储介质 |
CN113780249B (zh) * | 2021-11-10 | 2022-02-15 | 腾讯科技(深圳)有限公司 | 表情识别模型的处理方法、装置、设备、介质和程序产品 |
TWI806392B (zh) * | 2022-01-27 | 2023-06-21 | 國立高雄師範大學 | 表格文本的表格辨識方法 |
CN114792398B (zh) * | 2022-06-23 | 2022-09-27 | 阿里巴巴(中国)有限公司 | 图像分类的方法、存储介质、处理器及系统 |
CN115170809B (zh) * | 2022-09-06 | 2023-01-03 | 浙江大华技术股份有限公司 | 图像分割模型训练、图像分割方法、装置、设备及介质 |
CN116363374B (zh) * | 2023-06-02 | 2023-08-29 | 中国科学技术大学 | 图像语义分割网络持续学习方法、系统、设备及存储介质 |
CN117218686B (zh) * | 2023-10-20 | 2024-03-29 | 广州脉泽科技有限公司 | 一种开放场景下的掌静脉roi提取方法及系统 |
CN117333493B (zh) * | 2023-12-01 | 2024-03-15 | 深圳市志达精密科技有限公司 | 一种基于机器视觉的显示器底座生产用检测系统以及方法 |
CN117911501B (zh) * | 2024-03-20 | 2024-06-04 | 陕西中铁华博实业发展有限公司 | 一种金属加工钻孔高精度定位方法 |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10019657B2 (en) * | 2015-05-28 | 2018-07-10 | Adobe Systems Incorporated | Joint depth estimation and semantic segmentation from a single image |
EP3617991A4 (en) * | 2017-04-26 | 2020-12-09 | Sony Interactive Entertainment Inc. | LEARNING DEVICE, IMAGE RECOGNITION DEVICE, LEARNING PROCEDURE AND PROGRAM |
CN108764164B (zh) * | 2018-05-30 | 2020-12-08 | 华中科技大学 | 一种基于可变形卷积网络的人脸检测方法及系统 |
CN109101897A (zh) * | 2018-07-20 | 2018-12-28 | 中国科学院自动化研究所 | 水下机器人的目标检测方法、系统及相关设备 |
- 2019
  - 2019-03-26 CN CN201910233985.5A patent/CN109784424B/zh active Active
- 2020
  - 2020-03-16 JP JP2021522436A patent/JP7185039B2/ja active Active
  - 2020-03-16 WO PCT/CN2020/079496 patent/WO2020192471A1/zh unknown
  - 2020-03-16 KR KR1020217013575A patent/KR102698958B1/ko active IP Right Grant
  - 2020-03-16 EP EP20777689.9A patent/EP3951654A4/en active Pending
- 2021
  - 2021-04-23 US US17/238,634 patent/US20210241109A1/en active Pending
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102436583A (zh) * | 2011-09-26 | 2012-05-02 | 哈尔滨工程大学 | 基于对标注图像学习的图像分割方法 |
CN107871117A (zh) * | 2016-09-23 | 2018-04-03 | 三星电子株式会社 | 用于检测对象的设备和方法 |
US20190015059A1 (en) * | 2017-07-17 | 2019-01-17 | Siemens Healthcare Gmbh | Semantic segmentation for cancer detection in digital breast tomosynthesis |
CN109493330A (zh) * | 2018-11-06 | 2019-03-19 | 电子科技大学 | 一种基于多任务学习的细胞核实例分割方法 |
CN109784424A (zh) * | 2019-03-26 | 2019-05-21 | 腾讯科技(深圳)有限公司 | 一种图像分类模型训练的方法、图像处理的方法及装置 |
Non-Patent Citations (1)
Title |
---|
See also references of EP3951654A4 * |
Cited By (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112257727B (zh) * | 2020-11-03 | 2023-10-27 | 西南石油大学 | 一种基于深度学习自适应可变形卷积的特征图像提取方法 |
CN112257727A (zh) * | 2020-11-03 | 2021-01-22 | 西南石油大学 | 一种基于深度学习自适应可变形卷积的特征图像提取方法 |
CN112418232A (zh) * | 2020-11-18 | 2021-02-26 | 北京有竹居网络技术有限公司 | 图像分割方法、装置、可读介质及电子设备 |
CN112950639B (zh) * | 2020-12-31 | 2024-05-10 | 山西三友和智慧信息技术股份有限公司 | 一种基于SA-Net的MRI医学图像分割方法 |
CN112950639A (zh) * | 2020-12-31 | 2021-06-11 | 山西三友和智慧信息技术股份有限公司 | 一种基于SA-Net的MRI医学图像分割方法 |
CN113033436A (zh) * | 2021-03-29 | 2021-06-25 | 京东鲲鹏(江苏)科技有限公司 | 障碍物识别模型训练方法及装置、电子设备、存储介质 |
CN113033436B (zh) * | 2021-03-29 | 2024-04-16 | 京东鲲鹏(江苏)科技有限公司 | 障碍物识别模型训练方法及装置、电子设备、存储介质 |
CN113139618B (zh) * | 2021-05-12 | 2022-10-14 | 电子科技大学 | 一种基于集成防御的鲁棒性增强的分类方法及装置 |
CN113139618A (zh) * | 2021-05-12 | 2021-07-20 | 电子科技大学 | 一种基于集成防御的鲁棒性增强的分类方法及装置 |
CN113642581A (zh) * | 2021-08-12 | 2021-11-12 | 福州大学 | 基于编码多路径语义交叉网络的图像语义分割方法及系统 |
CN113642581B (zh) * | 2021-08-12 | 2023-09-22 | 福州大学 | 基于编码多路径语义交叉网络的图像语义分割方法及系统 |
CN113887662A (zh) * | 2021-10-26 | 2022-01-04 | 北京理工大学重庆创新中心 | 一种基于残差网络的图像分类方法、装置、设备及介质 |
WO2023082870A1 (zh) * | 2021-11-10 | 2023-05-19 | 腾讯科技(深圳)有限公司 | 图像分割模型的训练方法、图像分割方法、装置及设备 |
CN113963220A (zh) * | 2021-12-22 | 2022-01-21 | 熵基科技股份有限公司 | 安检图像分类模型训练方法、安检图像分类方法及装置 |
CN114612663A (zh) * | 2022-03-11 | 2022-06-10 | 浙江工商大学 | 基于弱监督学习的域自适应实例分割方法及装置 |
CN115019038A (zh) * | 2022-05-23 | 2022-09-06 | 杭州缦图摄影有限公司 | 一种相似图像像素级语义匹配方法 |
CN115019038B (zh) * | 2022-05-23 | 2024-04-30 | 杭州海马体摄影有限公司 | 一种相似图像像素级语义匹配方法 |
CN114677677B (zh) * | 2022-05-30 | 2022-08-19 | 南京友一智能科技有限公司 | 一种质子交换膜燃料电池气体扩散层材料比例预测方法 |
CN114677677A (zh) * | 2022-05-30 | 2022-06-28 | 南京友一智能科技有限公司 | 一种质子交换膜燃料电池气体扩散层材料比例预测方法 |
CN116503686A (zh) * | 2023-03-28 | 2023-07-28 | 北京百度网讯科技有限公司 | 图像矫正模型的训练方法、图像矫正方法、装置及介质 |
CN116403163B (zh) * | 2023-04-20 | 2023-10-27 | 慧铁科技有限公司 | 一种截断塞门手把开合状态的识别方法和装置 |
CN116403163A (zh) * | 2023-04-20 | 2023-07-07 | 慧铁科技有限公司 | 一种截断塞门手把开合状态的识别方法和装置 |
Also Published As
Publication number | Publication date |
---|---|
US20210241109A1 (en) | 2021-08-05 |
KR102698958B1 (ko) | 2024-08-27 |
CN109784424B (zh) | 2021-02-09 |
EP3951654A4 (en) | 2022-05-25 |
KR20210072051A (ko) | 2021-06-16 |
EP3951654A1 (en) | 2022-02-09 |
JP7185039B2 (ja) | 2022-12-06 |
CN109784424A (zh) | 2019-05-21 |
JP2022505775A (ja) | 2022-01-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020192471A1 (zh) | 一种图像分类模型训练的方法、图像处理的方法及装置 | |
EP3940638B1 (en) | Image region positioning method, model training method, and related apparatus | |
JP7238139B2 (ja) | 人工知能による画像領域の認識方法、モデルのトレーニング方法、画像処理機器、端末機器、サーバー、コンピュータ機器及びコンピュータプログラム | |
CN112232425B (zh) | 图像处理方法、装置、存储介质及电子设备 | |
US12100192B2 (en) | Method, apparatus, and electronic device for training place recognition model | |
CN110555481B (zh) | 一种人像风格识别方法、装置和计算机可读存储介质 | |
WO2020182121A1 (zh) | 表情识别方法及相关装置 | |
US11468571B2 (en) | Apparatus and method for generating image | |
CN112419326B (zh) | 图像分割数据处理方法、装置、设备及存储介质 | |
CN113807399A (zh) | 一种神经网络训练方法、检测方法以及装置 | |
CN111709398A (zh) | 一种图像识别的方法、图像识别模型的训练方法及装置 | |
CN114722937B (zh) | 一种异常数据检测方法、装置、电子设备和存储介质 | |
CN116935188B (zh) | 模型训练方法、图像识别方法、装置、设备及介质 | |
WO2022042120A1 (zh) | 目标图像提取方法、神经网络训练方法及装置 | |
CN113723378B (zh) | 一种模型训练的方法、装置、计算机设备和存储介质 | |
CN113822427A (zh) | 一种模型训练的方法、图像匹配的方法、装置及存储介质 | |
CN117854156B (zh) | 一种特征提取模型的训练方法和相关装置 | |
Zhong | A convolutional neural network based online teaching method using edge-cloud computing platform | |
CN117351192A (zh) | 一种对象检索模型训练、对象检索方法、装置及电子设备 | |
WO2023207531A1 (zh) | 一种图像处理方法及相关设备 | |
Osuna-Coutiño et al. | Structure extraction in urbanized aerial images from a single view using a CNN-based approach | |
Rawat et al. | Indian sign language recognition system for interrogative words using deep learning | |
CN114283290B (zh) | 图像处理模型的训练、图像处理方法、装置、设备及介质 | |
CN111742345B (zh) | 通过着色的视觉跟踪 | |
Rahman et al. | A Smartphone Based Real-Time Object Recognition System for Visually Impaired People |
Legal Events
Date | Code | Title | Description
---|---|---|---
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20777689; Country of ref document: EP; Kind code of ref document: A1
| ENP | Entry into the national phase | Ref document number: 2021522436; Country of ref document: JP; Kind code of ref document: A
| ENP | Entry into the national phase | Ref document number: 20217013575; Country of ref document: KR; Kind code of ref document: A
| NENP | Non-entry into the national phase | Ref country code: DE
| ENP | Entry into the national phase | Ref document number: 2020777689; Country of ref document: EP; Effective date: 20211026