EP3951654A1 - Verfahren zum trainieren eines bildklassifizierungsmodells sowie verfahren und vorrichtung zur bildverarbeitung - Google Patents

Verfahren zum trainieren eines bildklassifizierungsmodells sowie verfahren und vorrichtung zur bildverarbeitung Download PDF

Info

Publication number
EP3951654A1
EP3951654A1 (application EP20777689.9A)
Authority
EP
European Patent Office
Prior art keywords
image
trained
network
classification
semantic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
EP20777689.9A
Other languages
English (en)
French (fr)
Other versions
EP3951654A4 (de)
Inventor
Zequn JIE
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Publication of EP3951654A1
Publication of EP3951654A4

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/2431 Multiple classes
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components, by matching or filtering
    • G06V10/449 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters, with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Definitions

  • the present disclosure relates to the field of artificial intelligence (AI), and in particular, to a method for training an image classification model, an image processing method, and apparatuses.
  • Semantic image segmentation is the cornerstone technology for image understanding, and plays an important role in automated driving systems (for example, street view recognition and understanding), unmanned aerial vehicle applications (for example, landing point determination), and wearable device applications.
  • An image is formed by many pixels, and semantic segmentation is the segmentation of the pixels based on different semantic meanings expressed in the image, to enable a machine to automatically segment and recognize content in the image.
  • a deep convolutional neural network is usually trained to implement full-image classification.
  • a corresponding image content region in a to-be-trained image is then located based on the deep convolutional neural network.
  • These image content regions annotated through full-image classification are then used as segmented supervised information.
  • training is performed to obtain a semantic image segmentation network model.
  • Embodiments of the present disclosure provide a method for training an image classification model, an image processing method, and apparatuses.
  • Training may be performed by using to-be-trained images annotated only on an image level, so that while the performance of a semantic image segmentation network model is ensured, manual pixel-level annotation is not required, to reduce the costs of manual annotation, thereby improving the efficiency of model training.
  • a method for training an image classification model is provided.
  • the method is performed by a computer device, and includes:
  • an image processing method is provided.
  • the method is performed by a computer device, and includes:
  • a model training apparatus is provided.
  • the apparatus is applicable to a computer device, and includes:
  • an image processing apparatus is provided.
  • the apparatus is applicable to a computer device, and includes:
  • a server including a memory, a transceiver, a processor, and a bus system;
  • a server including a memory, a transceiver, a processor, and a bus system;
  • a computer-readable storage medium stores instructions.
  • The instructions, when run on a computer, cause the computer to perform the method in the foregoing aspects.
  • a method for training an image classification model includes: first, obtaining a to-be-trained image; obtaining first prediction class annotation information of the to-be-trained image by using a to-be-trained image classification network when a first model parameter of a to-be-trained offset network is fixed; next, determining a second model parameter corresponding to the to-be-trained image classification network by using a classification loss function based on image content class information and the first prediction class annotation information; obtaining second prediction class annotation information of the to-be-trained image by using the to-be-trained offset network when the second model parameter of the to-be-trained image classification network is fixed; next, determining a third model parameter corresponding to the to-be-trained offset network by using the classification loss function based on the image content class information and the second prediction class annotation information; and finally, training a to-be-trained semantic image segmentation network model based on the second model parameter and the third model parameter, to obtain a semantic image segmentation network model.
  • Training may be performed on to-be-trained images annotated on an image level by using an offset network and an image classification network, so that while the performance of a semantic image segmentation network model is ensured, manual pixel-level annotation is not required, to reduce the costs of manual annotation, thereby improving the efficiency of model training.
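For illustration, the alternate training procedure summarized above can be sketched in a few lines of PyTorch-style code. This is a minimal sketch under stated assumptions, not the patent's implementation: the tiny placeholder networks, the BCEWithLogitsLoss multi-label loss, the SGD optimizers, and the crude stand-in for deformable sampling are all assumptions introduced only to show the fix-one-branch/update-the-other pattern and the sign flip on the shared classification loss.

```python
import torch
import torch.nn as nn

num_classes = 5
# Placeholder image classification network (backbone) and offset network (branch).
backbone = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                         nn.Linear(16, num_classes))
offset_branch = nn.Conv2d(3, 18, 3, padding=1)   # 2 offsets per position of a 3x3 kernel

criterion = nn.BCEWithLogitsLoss()               # image-level multi-label classification loss
opt_cls = torch.optim.SGD(backbone.parameters(), lr=1e-3)
opt_off = torch.optim.SGD(offset_branch.parameters(), lr=1e-3)

def forward(image):
    # Crude stand-in for deformable sampling: in the patent, the predicted offsets
    # change the sampling positions of the backbone's convolution.
    offsets = offset_branch(image)
    perturbed = image + 0.1 * offsets.mean(dim=1, keepdim=True)
    return backbone(perturbed)

def train_round(image, labels):
    """image: (N, 3, H, W); labels: (N, num_classes) multi-hot image-level annotation."""
    # Step 1: fix the offset network (first model parameter) and train the classifier
    # by minimizing the classification loss -> "second model parameter".
    for p in offset_branch.parameters():
        p.requires_grad_(False)
    loss = criterion(forward(image), labels)      # first prediction class annotation info
    opt_cls.zero_grad(); loss.backward(); opt_cls.step()

    # Step 2: fix the classifier (second model parameter) and train the offset network
    # by maximizing the same loss (gradient ascent) -> "third model parameter".
    for p in offset_branch.parameters():
        p.requires_grad_(True)
    for p in backbone.parameters():
        p.requires_grad_(False)
    loss = criterion(forward(image), labels)      # second prediction class annotation info
    opt_off.zero_grad(); (-loss).backward(); opt_off.step()
    for p in backbone.parameters():
        p.requires_grad_(True)
```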
  • Embodiments of the present disclosure provide a method for training an image classification model, an image processing method, and apparatuses.
  • Training may be performed by using to-be-trained images annotated only on an image level, so that while the performance of a semantic image segmentation network model is ensured, manual pixel-level annotation is not required, to reduce the costs of manual annotation, thereby improving the efficiency of model training.
  • the present disclosure provides a method for training a semantic image segmentation network model and an image processing method using the semantic image segmentation network model.
  • the method may use AI to reduce manual annotation in a model training process for semantic image segmentation, thereby improving the efficiency of model training.
  • AI is a theory, method, technology, and application system that uses a digital computer or a machine controlled by the digital computer to simulate, extend, and expand human intelligence, perceive an environment, obtain knowledge, and use knowledge to obtain an optimal result.
  • AI is a comprehensive technology in computer science and attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence.
  • AI is to study the design principles and implementation methods of various intelligent machines, to enable the machines to have the functions of perception, reasoning, and decision-making.
  • the AI technology is a comprehensive discipline, and relates to a wide range of fields including a hardware-level technology and a software-level technology.
  • the basic AI technology generally includes technologies such as a sensor, a dedicated AI chip, cloud computing, distributed storage, a big data processing technology, an operating/interaction system, and electromechanical integration.
  • AI software technologies mainly include several major directions such as a computer vision (CV) technology, an audio processing technology, a natural language processing technology, and machine learning (ML)/deep learning.
  • the CV is a science that studies how to use a machine to "see”, and furthermore, that uses a camera and a computer to replace human eyes to perform machine vision such as recognition, tracking, and measurement on a target, and further perform graphic processing, so that the computer processes the target into an image more suitable for human eyes to observe, or an image transmitted to an instrument for detection.
  • CV studies related theories and technologies and attempts to establish an AI system that can obtain information from images or multidimensional data.
  • the CV technologies usually include technologies such as image processing, image recognition, image semantic understanding, image retrieval, optical character recognition (OCR), video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, a 3D technology, virtual reality, augmented reality, synchronous positioning, or map construction, and further include biological feature recognition technologies such as common face recognition and fingerprint recognition.
  • technologies such as image processing, image recognition, image semantic understanding, image retrieval, optical character recognition (OCR), video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, a 3D technology, virtual reality, augmented reality, synchronous positioning, or map construction, and further include biological feature recognition technologies such as common face recognition and fingerprint recognition.
  • ML is a multi-disciplinary subject involving a plurality of disciplines such as probability theory, statistics, approximation theory, convex analysis, and algorithm complexity theory.
  • the ML specializes in studying how a computer simulates or implements a human learning behavior to obtain new knowledge or skills, and reorganize an existing knowledge structure, so as to keep improving its performance.
  • the ML is a core of the AI, is a basic way to make the computer intelligent, and is applied to various fields of the AI.
  • the ML and deep learning generally include technologies such as an artificial neural network, a belief network, reinforcement learning, transfer learning, inductive learning, and learning from demonstrations.
  • the method provided in the present disclosure is mainly applied to the field of CV in the field of AI.
  • segmentation, detection, recognition, and tracking are closely related.
  • semantic image segmentation is to understand an image on a pixel level, and a target class corresponding to each pixel in an image needs to be determined.
  • These tasks place increasingly strict requirements on algorithm precision.
  • the understanding of image content by a computer may develop from giving a semantic label (image class) to an entire image to drawing the position of image content that appears in an image. Furthermore, it is necessary to enable the computer to understand semantic information of each pixel point in the image, so that the computer sees an image like a person, that is, semantic image segmentation.
  • an objective of semantic image segmentation is to annotate each pixel point in an image with one label.
  • semantic segmentation is a very important field in CV and is to recognize an image on a pixel level, that is, to annotate an object class to which each pixel in an image belongs.
  • the image processing method provided based on the present disclosure may be applied to an automated driving scenario. That is, necessary perception capabilities need to be added to a vehicle, so that the vehicle can understand the environment in which it is located and travel safely.
  • the method may also be applied to medical image diagnosis.
  • a machine may enhance the analysis by a radiologist, thereby greatly reducing the time required for running a diagnosis test. For example, a heart region and a lung region may be obtained by segmenting a chest X-ray image.
  • FIG. 1 is a schematic architectural diagram of an image processing system according to an embodiment of the present disclosure.
  • a model training apparatus provided in the present disclosure may be deployed on a server.
  • the image processing apparatus may be deployed on a client.
  • the image processing apparatus may be alternatively deployed on a server.
  • An example in which the image processing apparatus is deployed on a client is used for description herein.
  • a server trains a deformable convolutional neural network to implement full-image classification.
  • the server uses an adversarial learning strategy to enable a backbone network (that is, an image classification network) and a branch network (that is, an offset network) to perform alternate training.
  • the server updates the branch network by using a gradient generated by increasing a classification loss function to enable the branch network to gradually find a region that makes relatively weak contribution to full-image classification, to obtain an image content region of target image content.
  • the located image content region is used as segmented supervised information. Therefore, one semantic image segmentation network model is obtained through training, to implement image segmentation.
  • the client may download the semantic image segmentation network model from the server, to further input a to-be-processed image into the semantic image segmentation network model.
  • the semantic image segmentation network model is used to output a semantic segmentation result of the to-be-processed image.
  • the client may upload a to-be-processed image to the server, and the server processes the to-be-processed image by using the semantic image segmentation network model, to obtain a semantic segmentation result and return the semantic segmentation result to the client.
  • the server may directly process a to-be-processed image at the backend by using the semantic image segmentation network model, to obtain a semantic segmentation result.
  • the client may be deployed on a terminal device.
  • the terminal device includes, but is not limited to, an uncrewed vehicle, a robot, a tablet computer, a notebook computer, a palmtop computer, a mobile phone, a voice interaction device, and a personal computer (PC), and is not limited herein.
  • FIG. 2 is a schematic diagram of a procedural framework of a semantic image segmentation network model according to an embodiment of the present disclosure.
  • a training image 21 and class annotation information 22 on an image level are first obtained.
  • a weakly-supervised semantic image segmentation network model 23 is then obtained by training the training image 21 and the class annotation information 22.
  • an unknown test image 24 is obtained, and the test image 24 is inputted into the semantic image segmentation network model 23.
  • the semantic image segmentation network model 23 segments the unknown test image 24, to predict a semantic segmentation result 25 of the test image.
  • an embodiment of a method for training an image classification model in this embodiment of the present disclosure includes the following steps:
  • the model training apparatus first needs to obtain a to-be-trained image.
  • the to-be-trained image has class annotation information.
  • the class annotation information represents image content class information of an image content that is included in the to-be-trained image.
  • image content class information such as "person", "horse", "television", and "couch" is annotated in the to-be-trained image.
  • the image content class information may be class information corresponding to a scene such as sky, cloud, lawn, and sea in the image.
  • a to-be-trained image may be downloaded from a database, and the to-be-trained image is then annotated in a manual annotation manner, to obtain class annotation information of the to-be-trained image.
  • a website having massive user data may be automatically crawled to obtain a to-be-trained image with class annotation information.
  • the to-be-trained image includes, but is not only limited to, the following formats: a bitmap (BMP) format, a PiCture eXchange (PCX) format, a Tagged Image File Format (TIF), a Graphics Interchange Format (GIF), a Joint Photographic Experts Group (JPEG) format, an exchangeable image file format (EXIF), a Scalable Vector Graphics (SVG) format, a Drawing Exchange Format (DXF), an Encapsulated PostScript (EPS) format, a Portable Network Graphics (PNG) format, a High Dynamic Range Imaging (HDRI) format, and a Windows Metafile (WMF) format.
  • the to-be-trained image may exist in a format such as a HyperText Markup Language (HTML) format, a picture format, a document (Doc) format, a multimedia format, a dynamic web page format, or a Portable Document Format (PDF).
  • FIG. 4 is a schematic structural diagram of an offset network and an image classification network according to an embodiment of the present disclosure.
  • a weight value of the to-be-trained offset network 42 needs to be fixed first. That is, a first model parameter of the to-be-trained offset network 42 is fixed.
  • a to-be-trained image 43 is then inputted into the to-be-trained image classification network 41.
  • the to-be-trained image classification network 41 outputs first prediction class annotation information of the to-be-trained image 43.
  • the to-be-trained offset network 42 is configured to provide an input point position that has relatively weak contribution to classification. Based on a changed offset variable 44, an objective of locating an image content region with relatively low discriminativeness can be achieved.
  • the to-be-trained image classification network 41 is configured to classify an image content region in an entire image.
  • the model training apparatus trains the to-be-trained image classification network by using a classification loss function.
  • the classification loss function is used for estimating a degree of inconsistency between a model prediction value and an actual value.
  • the image content class information of the to-be-trained image is an actual value.
  • the first prediction class annotation information of the to-be-trained image is a predicted value.
  • A smaller value of the classification loss function represents that the image classification network is more robust.
  • the second model parameter corresponding to the to-be-trained image classification network can be obtained according to the classification loss function.
  • After obtaining the second model parameter of the to-be-trained image classification network through training, the model training apparatus performs model-based alternate training.
  • a weight value of the to-be-trained image classification network needs to be fixed. That is, the second model parameter of the to-be-trained image classification network is fixed.
  • the to-be-trained image is then inputted into the to-be-trained offset network.
  • the to-be-trained offset network outputs the second prediction class annotation information of the to-be-trained image.
  • the model parameter of the to-be-trained offset network may be fixed first.
  • the to-be-trained image classification network is then trained.
  • a model parameter of the to-be-trained image classification network may be fixed first.
  • the to-be-trained offset network is then trained.
  • an example in which the model parameter of the to-be-trained offset network is fixed first and the to-be-trained image classification network is then trained is used for description. However, this is not to be understood as a limitation to the present disclosure.
  • the model training apparatus trains the to-be-trained offset network by using the same classification loss function.
  • the classification loss function is used for estimating a degree of inconsistency between a model prediction value and an actual value.
  • the image content class information of the to-be-trained image is an actual value.
  • the second prediction class annotation information of the to-be-trained image is a predicted value.
  • the third model parameter corresponding to the offset network can be obtained based on the classification loss function.
  • training a to-be-trained semantic image segmentation network model based on the second model parameter and the third model parameter, to obtain a semantic image segmentation network model, the semantic image segmentation network model being configured to determine a semantic segmentation result of a to-be-processed image.
  • the model training apparatus trains the to-be-trained semantic image segmentation network model based on model parameters (including the second model parameter and the third model parameter obtained through training) obtained in each round of training.
  • offset variables predicted in a training process of the offset network are fused into one image content region.
  • the obtained image content region is used as pixel-level segmented supervised information.
  • the to-be-trained semantic image segmentation network model is trained by using the supervised information, to obtain the semantic image segmentation network model.
  • the semantic image segmentation network model outputs a corresponding semantic segmentation result.
  • Supervised learning is mainly used for resolving two types of problems, namely, regression and classification.
  • the regression corresponds to a quantitative output
  • the classification corresponds to a qualitative output.
  • the calculation of known data to obtain a specific value is regression.
  • y = f(x) is a typical regression relationship.
  • the calculation of known data or annotated data to obtain one class is classification.
  • Training may be performed on to-be-trained images annotated on an image level by using an offset network and an image classification network, so that while the performance of a semantic image segmentation network model is ensured, manual pixel-level annotation is not required, to reduce the costs of manual annotation, thereby improving the efficiency of model training.
  • the determining a second model parameter corresponding to the to-be-trained image classification network by using a classification loss function based on the image content class information and the first prediction class annotation information includes:
  • A method for determining the second model parameter is described. First, a prediction probability value corresponding to each class is determined based on an actual value (that is, the image content class information of the to-be-trained image) and a predicted value (that is, the first prediction class annotation information of the to-be-trained image). It is assumed that there are five classes, namely, "person", "horse", "refrigerator", "television", and "couch".
  • the first prediction class annotation information includes "person", "refrigerator", "horse", "television", and "couch", and prediction probability values may be obtained as follows: A prediction probability value of "person" is 0.93, a prediction probability value of "refrigerator" is 0.88, a prediction probability value of "horse" is 0, a prediction probability value of "television" is 0.5, and a prediction probability value of "couch" is 0.65. Next, a classification loss of the classification loss function is determined based on the prediction probability value corresponding to each class.
  • a model parameter corresponding to the to-be-trained image classification network in the case of the minimum value may be obtained.
  • the model parameter is the second model parameter. It may be understood that a classification loss of the classification loss function in the present disclosure may be a cross-entropy classification loss.
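For example, the prediction probability values above can be turned into a classification loss as follows. This is only an illustrative sketch: the use of per-class binary cross-entropy and the assumed ground-truth annotation (here, that the image actually contains "person", "refrigerator", and "couch") are assumptions, not details stated in the patent.

```python
import torch

# Classes: ["person", "refrigerator", "horse", "television", "couch"]
pred = torch.tensor([0.93, 0.88, 0.0, 0.5, 0.65])   # prediction probability values from the example
target = torch.tensor([1., 1., 0., 0., 1.])          # assumed image-level annotation (actual value)

# Per-class binary cross-entropy; averaging gives the classification loss.
eps = 1e-7
loss = -(target * (pred + eps).log() + (1 - target) * (1 - pred + eps).log()).mean()

# The image classification network is then updated toward the parameters that
# minimize this loss, which yields the "second model parameter".
print(float(loss))
```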
  • the determining a third model parameter corresponding to the to-be-trained offset network by using the classification loss function based on the image content class information and the second prediction class annotation information includes:
  • a method for determining the third model parameter is described.
  • a prediction probability value corresponding to each class is determined based on an actual value (that is, the image content class information of the to-be-trained image) and a predicted value (that is, the second prediction class annotation information of the to-be-trained image).
  • the second prediction class annotation information herein is obtained after processing by a deformable convolutional neural network. It is assumed that there are five classes, namely, "person", "horse", "refrigerator", "television", and "couch".
  • the second prediction class annotation information includes "person", "horse", "refrigerator", "television", and "couch", and prediction probability values may be obtained as follows: A prediction probability value of "person" is 0.75, a prediction probability value of "horse" is 0.19, a prediction probability value of "refrigerator" is 0.66, a prediction probability value of "television" is 0.43, and a prediction probability value of "couch" is 0.78. Next, a classification loss of the classification loss function is determined based on the prediction probability value corresponding to each class.
  • a model parameter corresponding to the to-be-trained offset network in the case of the maximum value may be obtained.
  • the model parameter is the third model parameter. It may be understood that a classification loss of the classification loss function in the present disclosure may be a cross-entropy classification loss.
  • the classification loss of the classification loss function on an image level is maximized, so that the classification difficulty of the image classification network can be increased, to implement adversarial training and enable the image classification network to achieve a better image classification effect.
  • the classification loss of the classification loss function on an image level is maximized, so that the offset network may provide an input point position that has relatively weak contribution to classification. Based on a changed offset variable, an objective of locating an image content region with relatively low discriminativeness is achieved.
  • an image content region on an image level is used as a training object, so that an obtained image classification network and offset network can predict the class of each image content region in an image.
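A minimal sketch of the corresponding offset-network update follows. The only difference from the classifier update is the direction of optimization: the same classification loss is maximized with respect to the offset network while the classifier stays fixed. The loss choice and function names here are assumptions for illustration; `logits` is assumed to have been produced through the offset network's deformable sampling so that gradients reach its parameters.

```python
import torch.nn as nn

criterion = nn.BCEWithLogitsLoss()

def update_offset_network(optimizer, logits, labels):
    """Adversarial step: maximize the image-level classification loss with respect
    to the offset network (the classifier is frozen), by descending on -loss."""
    loss = criterion(logits, labels)
    optimizer.zero_grad()
    (-loss).backward()     # gradient ascent on the classification loss
    optimizer.step()       # yields the "third model parameter"
    return loss.item()
```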
  • exemplary content of the classification loss function is provided.
  • a feasible method can be provided for the implementation of the solution, thereby improving the feasibility and operability of the solution.
  • the method may further include:
  • the generating the second prediction class annotation information by using the deformable convolutional neural network is described.
  • a to-be-trained image is first inputted into the deformable convolutional neural network (deformable convolution).
  • the deformable convolutional neural network outputs an offset variable obtained through prediction.
  • the offset variable is a position offset of an input pixel corresponding to each weight value of a convolutional kernel. An actual input feature of an operation can be changed by using the offset variable.
  • FIG. 5 is a schematic structural diagram of a deformable convolutional neural network according to an embodiment of the present disclosure. As shown in the figure, for conventional convolutional windows, it is only necessary to train a pixel weight value of each convolutional window. For the deformable convolutional network, some parameters additionally need to be added to train the shape of the convolutional window.
  • An offset region 51 in FIG. 5 is a to-be-trained parameter additionally added for deformable convolution. The size of the to-be-trained parameter is the same as that of a to-be-trained image 52.
  • the convolutional window slides in the offset region 51 to present an effect of a convolutional pixel offset, to implement sampling point optimization, and finally outputs a to-be-trained feature image 53.
  • the to-be-trained feature image is inputted into the to-be-trained offset network.
  • the to-be-trained offset network outputs the second prediction class annotation information.
  • a position offset variable of an input pixel corresponding to each weight in one convolutional kernel can be predicted, to change an actual input feature of convolutional operation, and training is performed to obtain the most effective transformation manner, so that an adversarial training mode can be implemented.
  • the obtaining a to-be-trained feature image corresponding to the to-be-trained image by using a deformable convolutional neural network may include:
  • the method for obtaining the to-be-trained feature image by using the deformable convolutional neural network is described.
  • For a feature at an output position p0, if a conventional convolutional layer is used, the input feature position set corresponding to the convolutional layer is p0 + pn, where pn ∈ R, and R is the set of all standard square offsets centered at 0.
  • R corresponding to one 3 × 3 convolutional kernel is {(0, 0), (-1, -1), (-1, 1), (1, 1), (1, -1), (-1, 0), (1, 0), (0, -1), (0, 1)}.
  • An additional offset variable Δpn obtained through prediction is introduced into the input feature set of the deformable convolutional neural network based on p0 + pn. Therefore, the actual input feature position set is p0 + pn + Δpn.
  • an exemplary manner of generating the to-be-trained feature image is provided in the foregoing manner.
  • a feasible method can be provided for the implementation of the solution, thereby improving the feasibility and operability of the solution.
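The sampling rule p0 + pn + Δpn can be sketched with torchvision's deformable convolution operator. The channel counts and the way the offsets are produced here are illustrative assumptions; only the general mechanism (per-position predicted offsets added to the regular sampling grid, with bilinear interpolation for fractional positions) follows the description above.

```python
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d

x = torch.randn(1, 3, 32, 32)                 # input feature map

# The offset branch predicts delta_p_n: (dy, dx) for each of the 3x3 kernel
# positions, i.e. 2 * 3 * 3 = 18 channels, at every output position p_0.
offset_branch = nn.Conv2d(3, 18, kernel_size=3, padding=1)
offsets = offset_branch(x)                    # shape (1, 18, 32, 32)

weight = torch.randn(8, 3, 3, 3)              # weights of the 3x3 deformable convolution

# Each output value is computed from inputs sampled at p_0 + p_n + delta_p_n
# instead of the fixed grid p_0 + p_n.
y = deform_conv2d(x, offsets, weight, padding=1)
print(y.shape)                                # torch.Size([1, 8, 32, 32])
```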
  • the method may further include:
  • the model training apparatus may start a next round of condition training.
  • a weight value of a to-be-trained offset network needs to be fixed first. That is, a third model parameter of the to-be-trained offset network is fixed.
  • a to-be-trained image is then inputted into the to-be-trained image classification network.
  • the to-be-trained image classification network outputs third prediction class annotation information of the to-be-trained image.
  • the model training apparatus trains the to-be-trained image classification network by using a classification loss function.
  • the classification loss function is used for estimating a degree of inconsistency between a model prediction value and an actual value.
  • Image content class information of the to-be-trained image is an actual value.
  • the third prediction class annotation information of the to-be-trained image is a predicted value.
  • a fourth model parameter corresponding to the to-be-trained image classification network can be obtained according to the classification loss function.
  • After obtaining the fourth model parameter of the to-be-trained image classification network through training, the model training apparatus performs model-based alternate training. In this case, a weight value of the to-be-trained image classification network needs to be fixed. That is, the fourth model parameter of the to-be-trained image classification network is fixed.
  • the to-be-trained image is then inputted into the to-be-trained offset network.
  • the to-be-trained offset network outputs the fourth prediction class annotation information of the to-be-trained image.
  • the model training apparatus trains the to-be-trained offset network by using the same classification loss function.
  • the classification loss function is used for estimating a degree of inconsistency between a model prediction value and an actual value.
  • the image content class information of the to-be-trained image is an actual value.
  • the fourth prediction class annotation information of the to-be-trained image is a predicted value.
  • the fifth model parameter corresponding to the offset network can be obtained based on the classification loss function.
  • the model training apparatus trains the to-be-trained semantic image segmentation network model based on model parameters (including the second model parameter, the third model parameter, the fourth model parameter, and the fifth model parameter obtained through training) obtained in each round of training.
  • offset variables predicted in a training process of the offset network are fused into one relatively complete image content region.
  • the obtained image content region is used as pixel-level segmented supervised information.
  • the to-be-trained semantic image segmentation network model is trained by using the supervised information, to obtain the semantic image segmentation network model.
  • the semantic image segmentation network model outputs a corresponding semantic segmentation result.
  • one branch is fixed while the other branch is trained, to enable the image classification network and the offset network to continuously perform adversarial learning, so that the trained classifier is continuously strengthened as regions carrying weaker information are inputted into the image classification network, and the offset network branch can also continuously locate regions with weaker discriminativeness.
  • the training a to-be-trained semantic image segmentation network model based on the second model parameter and the third model parameter, to obtain a semantic image segmentation network model may include:
  • a method for generating the semantic image segmentation network model is described. After N times of alternate training ends, all offset variables obtained through prediction in a training process of the offset network are fused. Therefore, a relatively complete image content region may be obtained, to obtain the image content region corresponding to the to-be-trained image. The obtained image content region is used as pixel-level segmented supervised information. Next, the to-be-trained semantic image segmentation network model is trained by using the target loss function. The semantic image segmentation network model is generated when a loss result of the target loss function is minimum.
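The following sketch illustrates this stage under stated assumptions. The patent only says that the offset variables predicted across the rounds of alternate training are fused into an image content region that serves as pixel-level supervision for the target loss function; the specific fusion rule (averaging offset magnitudes and thresholding) and the pixel-wise cross-entropy used here are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def fuse_offsets_to_mask(offset_maps, threshold=0.5):
    """Fuse offset variables predicted over several training rounds into one
    binary image content region used as pseudo pixel-level supervision.
    offset_maps: list of tensors of shape (N, C, H, W)."""
    magnitude = torch.stack([o.abs().mean(dim=1) for o in offset_maps]).mean(dim=0)
    magnitude = (magnitude - magnitude.min()) / (magnitude.max() - magnitude.min() + 1e-7)
    return (magnitude > threshold).long()          # (N, H, W), values in {0, 1}

def target_loss(seg_logits, pseudo_mask):
    """Train the to-be-trained semantic image segmentation network model with
    pixel-wise cross-entropy against the fused pseudo mask."""
    return F.cross_entropy(seg_logits, pseudo_mask)
```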
  • the semantic image segmentation network model has a very wide application range, for example, helps a retoucher with precise image beautification or assists an uncrewed vehicle in accurately understanding obstacles in front.
  • a threshold method is used.
  • An objective of the threshold method is to convert a grayscale image into a binary image with the foreground and background separated. It is assumed that the grayscale image only includes two main classes, namely, foreground image content and a background image.
  • an adequate pixel threshold is found in a manner of balancing an image statistics histogram. All points in the image are classified into the two types. A point with a value greater than the threshold is the image content, and a point with a value less than or equal to the threshold is the background.
  • a pixel clustering method is used. K center points are first chosen. All points in an image are assigned to the K centers based on the differences between each pixel point and the K center points. Subsequently, each class center is recalculated, and iteration and optimization are performed based on the foregoing steps, so that all pixels in the image are classified into K classes.
  • an image edge segmentation method is used. Different regions in an image are segmented by using extracted edge information.
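For comparison with the learned approach, the threshold and clustering methods mentioned above can be written in a few lines of Python with OpenCV. Otsu's rule as the histogram-based threshold, K = 3 clusters, and the file name are assumptions chosen only for illustration.

```python
import numpy as np
import cv2

gray = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE)

# Threshold method: derive one pixel threshold from the grayscale histogram and
# split the image into foreground (above the threshold) and background.
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Pixel clustering method: assign every pixel to one of K centers and iterate
# until the centers converge (k-means on pixel intensities).
pixels = gray.reshape(-1, 1).astype(np.float32)
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
_, labels, centers = cv2.kmeans(pixels, 3, None, criteria, 5, cv2.KMEANS_RANDOM_CENTERS)
segmented = labels.reshape(gray.shape)
```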
  • a pixel-level image is used as a training object, so that the obtained semantic image segmentation network model can predict the class of each feature point in an image.
  • exemplary content of the target loss function is provided.
  • a feasible method can be provided for the implementation of the solution, thereby improving the feasibility and operability of the solution.
  • an image processing method in the present disclosure is described below.
  • the method may be performed by a computer device, for example, may be performed by a model training apparatus in the computer device.
  • the computer device may be the terminal device or server in the foregoing system shown in FIG. 1 .
  • an embodiment of an image processing method in this embodiment of the present disclosure includes the following steps:
  • the image processing apparatus may obtain a to-be-processed image.
  • for an uncrewed vehicle, the image processing apparatus may use a camera to obtain a street view image acquired while the vehicle travels.
  • for a robot, the image processing apparatus may acquire, in real time, a real view image of the environment where the robot is located.
  • the image processing apparatus may obtain a photo photographed by a user or a picture downloaded from a website. These images may all be used as to-be-processed images.
  • the to-be-processed image includes, but is not only limited to, the following formats: a BMP format, a PCX format, a TIF, a GIF, a JPEG, an EXIF, an SVG format, a DXF, an EPS format, a PNG format, an HDRI format, and a WMF format.
  • a semantic image segmentation network model being obtained based on alternate training of a to-be-trained image classification network and a to-be-trained offset network
  • the to-be-trained offset network being configured to classify the image based on an offset variable
  • the to-be-trained image classification network being configured to classify image content in the image.
  • the image processing apparatus inputs the to-be-processed image into the semantic image segmentation network model, and the semantic image segmentation network model outputs a corresponding semantic segmentation result.
  • the semantic image segmentation network model is obtained based on alternate training of a to-be-trained image classification network and a to-be-trained offset network.
  • the to-be-trained offset network is configured to classify the image based on an offset variable.
  • the to-be-trained image classification network is configured to classify image content in the image. It may be understood that a training process of the semantic image segmentation network model is content described in the foregoing FIG. 3 and the first to eighth embodiments corresponding to FIG. 3 . Details are not described herein.
  • the semantic image segmentation network model may be obtained through training based on a fully convolutional network (FCN), a conditional random field (CRF), or a Markov random field (MRF), or may be obtained through training of a neural network having another structure. Details are not limited herein.
  • In the FCN, a convolution technology, an upsampling technology, and a skip structure (skip layer) technology are mainly used.
  • a fully connected layer is discarded in an ordinary classification network, for example, a network such as VGG16 or a residual network (ResNet) 50/101, and is replaced with a corresponding convolutional layer.
  • the upsampling is deconvolution.
  • the deconvolution and convolution are similar, and are both operations of multiplication and addition.
  • deconvolution is a one-to-many operation in its forward propagation and backpropagation; it is only necessary to invert the forward propagation and backpropagation of convolution.
  • the function of the skip structure is to optimize the result. Results obtained by directly upsampling the output of the fully convolutional layers are relatively coarse, so upsampling also needs to be performed on the results of different pooling layers to optimize the output.
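A compact sketch of the FCN structure described above: fully connected layers are replaced by 1x1 convolutions, upsampling is done with transposed convolutions (deconvolution), and a skip connection from an earlier pooling stage refines the coarse result. The layer sizes and class count are illustrative assumptions, not the patent's network.

```python
import torch
import torch.nn as nn

class TinyFCN(nn.Module):
    def __init__(self, num_classes=21):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                                    nn.MaxPool2d(2))             # 1/2 resolution
        self.stage2 = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
                                    nn.MaxPool2d(2))             # 1/4 resolution
        # "Fully convolutional": 1x1 convolutions replace the fully connected classifier.
        self.score2 = nn.Conv2d(64, num_classes, 1)
        self.score1 = nn.Conv2d(32, num_classes, 1)               # skip branch from stage1
        # Upsampling by transposed convolution (deconvolution).
        self.up2 = nn.ConvTranspose2d(num_classes, num_classes, 2, stride=2)
        self.up1 = nn.ConvTranspose2d(num_classes, num_classes, 2, stride=2)

    def forward(self, x):
        f1 = self.stage1(x)                                       # 1/2
        f2 = self.stage2(f1)                                      # 1/4
        s = self.up2(self.score2(f2)) + self.score1(f1)           # skip structure: fuse coarse and fine
        return self.up1(s)                                        # back to the input resolution

logits = TinyFCN()(torch.randn(1, 3, 64, 64))
print(logits.shape)                                               # torch.Size([1, 21, 64, 64])
```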
  • the image processing apparatus processes the to-be-processed image based on the semantic segmentation result.
  • the semantic segmentation result may be used for search by image on a website, that is, search for another image related to the to-be-processed image, or may be used for personalized recommendation or the like based on the analysis of image content.
  • the semantic segmentation result usually has the following characteristics. First, different regions obtained through segmentation are flat inside and have similar texture and grayscale. Second, attributes used as the basis for segmentation are significantly different in adjacent semantic segmentation regions. Third, different semantic regions obtained after segmentation have specific and regular boundaries.
  • weakly-supervised semantic image segmentation can be implemented and applied to cases in which the annotated data lacks fine pixel-level segmentation, and high-accuracy image segmentation is achieved by relying only on full-image classification annotations.
  • FIG. 7 is a schematic flowchart of image processing based on a deformable convolutional neural network according to an embodiment of the present disclosure.
  • a to-be-processed image 71 is obtained first.
  • An image of interest, for example, a red vehicle shown in FIG. 7, is extracted from the to-be-processed image.
  • the image of interest extracted from the to-be-processed image is inputted into a convolutional layer 72.
  • a region of interest pooling layer 73 is used to obtain a feature map 74 of the image of interest.
  • a target of pooling is a 3 ⁇ 3 feature map.
  • region of interest pooling may be performed on an inputted image of interest first, to obtain the feature map 74 with a size of 3 ⁇ 3, and an offset variable 76 corresponding to each region is then outputted by using a fully connected layer 75.
  • a semantic segmentation result (including classification information 78 and positioning information 79) is obtained.
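The flow in FIG. 7 can be sketched with torchvision's RoI pooling followed by a fully connected layer that predicts the offsets. The feature-map size, channel counts, and box coordinates are assumed; only the 3 × 3 pooled output and the "pooling, then fully connected layer, then one offset per bin" order follow the description.

```python
import torch
import torch.nn as nn
from torchvision.ops import roi_pool

feature_map = torch.randn(1, 64, 50, 50)             # output of the convolutional layer 72
boxes = torch.tensor([[0., 10., 10., 34., 34.]])      # (batch_index, x1, y1, x2, y2): the image of interest

# Region-of-interest pooling layer 73: pool the region to a 3x3 feature map 74.
pooled = roi_pool(feature_map, boxes, output_size=(3, 3))   # shape (1, 64, 3, 3)

# Fully connected layer 75 predicts one (dy, dx) offset variable 76 per pooled bin.
fc = nn.Linear(64 * 3 * 3, 2 * 3 * 3)
offsets = fc(pooled.flatten(1)).view(-1, 3, 3, 2)
print(offsets.shape)                                   # torch.Size([1, 3, 3, 2])
```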
  • FIG. 8 is a schematic diagram of an embodiment of a model training apparatus according to an embodiment of the present disclosure.
  • a model training apparatus 30 includes:
  • the obtaining module 301 obtains a to-be-trained image having class annotation information, the class annotation information representing image content class information of an image content that is included in the to-be-trained image; when a first model parameter of a to-be-trained offset network is fixed, the obtaining module 301 obtains first prediction class annotation information of the to-be-trained image by using a to-be-trained image classification network, the to-be-trained offset network being configured to classify the to-be-trained image based on an offset variable, the to-be-trained image classification network being configured to classify the image content in the to-be-trained image; the determining module 302 determines a second model parameter corresponding to the to-be-trained image classification network by using a classification loss function based on the image content class information and the first prediction class annotation information that is obtained by the obtaining module 301; when the second model parameter of the to-be-trained image classification network is fixed, the obtaining module 301 obtains second prediction class annotation information of the to-be-trained image by using the to-be-trained offset network; and the determining module 302 determines a third model parameter corresponding to the to-be-trained offset network by using the classification loss function based on the image content class information and the second prediction class annotation information.
  • Training may be performed on to-be-trained images annotated on an image level by using an offset network and an image classification network, so that while the performance of a semantic image segmentation network model is ensured, manual pixel-level annotation is not required, to reduce the costs of manual annotation, thereby improving the efficiency of model training.
  • the classification loss of the classification loss function on an image level is maximized, so that the classification difficulty of the image classification network can be improved, to implement adversarial training, to enable the image classification network to have a better classification effect, that is, a better image classification effect.
  • the classification loss of the classification loss function on an image level is maximized, so that the offset network may provide an input point position that has relatively weak contribution to classification. Based on a changed offset variable, an objective of locating an image content region with relatively low discriminativeness is achieved.
  • a position offset variable of an input pixel corresponding to each weight in one convolutional kernel can be predicted, to change an actual input feature of convolutional operation, and training is performed to obtain the most effective transformation manner, so that an adversarial training mode can be implemented.
  • an exemplary manner of generating the to-be-trained feature image is provided in the foregoing manner.
  • a feasible method can be provided for the implementation of the solution, thereby improving the feasibility and operability of the solution.
  • one branch is fixed while the other branch is trained, to enable the image classification network and the offset network to continuously perform adversarial learning, so that the trained classifier is continuously strengthened as regions carrying weaker information are inputted into the image classification network, and the offset network branch can also continuously locate regions with weaker discriminativeness.
  • exemplary content of the target loss function is provided.
  • a feasible method can be provided for the implementation of the solution, thereby improving the feasibility and operability of the solution.
  • FIG. 9 is a schematic diagram of an embodiment of an image processing apparatus according to an embodiment of the present disclosure.
  • An image processing apparatus 40 includes:
  • the obtaining module 401 obtains a to-be-processed image
  • the obtaining module 401 obtains a semantic segmentation result of the to-be-processed image by using a semantic image segmentation network model, the semantic image segmentation network model being obtained based on alternate training of a to-be-trained image classification network and a to-be-trained offset network, the to-be-trained offset network being configured to classify the image based on an offset variable, the to-be-trained image classification network being configured to classify image content in the image
  • the processing module 402 processes the to-be-processed image based on the semantic segmentation result obtained by the obtaining module 401.
  • weakly-supervised semantic image segmentation can be implemented and applied to cases in which the annotated data lacks fine pixel-level segmentation, and high-accuracy image segmentation is achieved by relying only on full-image classification annotations.
  • FIG. 10 is a schematic structural diagram of a server according to an embodiment of the present disclosure.
  • the server 500 may vary greatly due to different configurations or performance, and may include one or more central processing units (CPU) 522 (for example, one or more processors) and a memory 532, and one or more storage media 530 (for example, one or more mass storage devices) that store application programs 542 or data 544.
  • the memory 532 and the storage medium 530 may be transient or persistent storages.
  • a program stored in the storage medium 530 may include one or more modules (which are not marked in the figure), and each module may include a series of instruction operations on the server.
  • the CPU 522 may be set to communicate with the storage medium 530, and perform, on the server 500, the series of instruction operations in the storage medium 530.
  • the server 500 may further include one or more power supplies 526, one or more wired or wireless network interfaces 550, one or more input/output interfaces 558, and/or one or more operating systems 541 such as Windows Server™, Mac OS X™, Unix™, Linux™, or FreeBSD™.
  • the steps performed by the server in the foregoing embodiments may be based on the server structure shown in FIG. 10 .
  • the CPU 522 included in the server may further be configured to perform all or some steps in the foregoing embodiment shown in FIG. 3 or FIG. 6 .
  • An embodiment of the present disclosure further provides another image processing apparatus, as shown in FIG. 11 .
  • the terminal device may be any terminal device including a mobile phone, a tablet computer, a personal digital assistant (PDA), a point of sales (POS), and an on-board computer, and the terminal device being a mobile phone is used as an example.
  • FIG. 11 is a block diagram of the structure of a part of a mobile phone related to a terminal device according to an embodiment of the present disclosure.
  • the mobile phone includes components such as: a radio frequency (RF) circuit 610, a memory 620, an input unit 630, a display unit 640, a sensor 650, an audio circuit 660, a wireless fidelity (Wi-Fi) module 670, a processor 680, and a power supply 690.
  • the RF circuit 610 may be configured to receive and transmit signals during an information receiving and transmitting process or a call process. Specifically, the RF circuit receives downlink information from a base station, then delivers the downlink information to the processor 680 for processing, and transmits designed uplink data to the base station.
  • the memory 620 may be configured to store a software program and module.
  • the processor 680 runs the software program and module stored in the memory 620, to implement various functional applications and data processing of the mobile phone.
  • the input unit 630 may be configured to receive input digit or character information, and generate a keyboard signal input related to the user setting and function control of the mobile phone.
  • the input unit 630 may include a touch panel 631 and another input device 632.
  • the input unit 630 may further include another input device 632.
  • the another input device 632 may include, but is not limited to, one or more of a physical keyboard, a functional key (such as a volume control key or a switch key), a track ball, a mouse, and a joystick.
  • the display unit 640 may be configured to display information input by the user or information provided for the user, and various menus of the mobile phone.
  • the display unit 640 may include a display panel 641.
  • the display panel 641 may be configured by using a liquid crystal display (LCD), an organic light-emitting diode (OLED), or the like.
  • the touch panel 631 may cover the display panel 641.
  • Although in FIG. 11 the touch panel 631 and the display panel 641 are shown as two separate parts implementing the input and output functions of the mobile phone, in some embodiments the touch panel 631 and the display panel 641 may be integrated to implement the input and output functions of the mobile phone.
  • the mobile phone may further include at least one sensor 650 such as an optical sensor, a motion sensor, and other sensors.
  • the audio circuit 660, a loudspeaker 661, and a microphone 662 may provide audio interfaces between the user and the mobile phone.
  • Although FIG. 11 shows the Wi-Fi module 670, it may be understood that the Wi-Fi module is not a necessary component of the mobile phone and may be omitted as required, provided that the essence of the present disclosure is not changed.
  • The processor 680 is the control center of the mobile phone and is connected to the various parts of the entire mobile phone by using various interfaces and lines. By running or executing the software programs and/or modules stored in the memory 620 and invoking data stored in the memory 620, the processor executes the various functions of the mobile phone and processes data, thereby performing overall monitoring of the mobile phone.
  • the mobile phone further includes the power supply 690 (such as a battery) for supplying power to the components.
  • the power supply may be logically connected to the processor 680 by using a power management system.
  • the mobile phone may further include a camera, a Bluetooth module, and the like, and details are not described herein again.
  • The processor 680 included in the terminal device may further be configured to perform all or some of the steps in the foregoing embodiments shown in FIG. 3 or FIG. 6; a purely illustrative terminal-side sketch is given after this list.
  • the disclosed system, apparatus, and method may be implemented in other manners.
  • the described apparatus embodiment is merely an example.
  • The unit division is merely logical function division; other division manners may be used during actual implementation.
  • a plurality of units or components may be combined or integrated into another system, or some features may be ignored or not performed.
  • the displayed or discussed mutual couplings or direct couplings or communication connections may be implemented by using some interfaces.
  • the indirect couplings or communication connections between the apparatuses or units may be implemented in electronic, mechanical, or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, and may be located in one place or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each of the units may be physically separated, or two or more units may be integrated into one unit.
  • the integrated unit may be implemented in the form of hardware, or may be implemented in a form of a software functional unit.
  • When the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, the integrated unit may be stored in a computer-readable storage medium.
  • The corresponding computer software product is stored in a storage medium and includes several instructions for instructing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present disclosure.
  • the foregoing storage medium includes: any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
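
For illustration only, the following Python sketch shows one way a server process of the kind described above might load a trained classification network from its storage medium and execute a classification step on a to-be-processed image, as referenced for the CPU 522. The helper names (load_model, classify), the file paths, and the use of a torchvision ResNet as a stand-in backbone are assumptions of this sketch; they are not the network or the training procedure of the foregoing embodiments.

```python
# Hypothetical sketch only: a generic server-side classification entry point.
# The ResNet used here is a stand-in backbone, not the network of the
# foregoing embodiments; weights_path is an assumed file on the storage medium.
import torch
from PIL import Image
from torchvision import models, transforms

_preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def load_model(weights_path=None, num_classes=1000):
    """Build a classification network and, if given, load trained weights from disk."""
    model = models.resnet18(num_classes=num_classes)
    if weights_path is not None:
        model.load_state_dict(torch.load(weights_path, map_location="cpu"))
    model.eval()
    return model

def classify(model, image_path):
    """Return (class index, probability) for a single to-be-processed image."""
    image = Image.open(image_path).convert("RGB")
    batch = _preprocess(image).unsqueeze(0)      # shape: [1, 3, 224, 224]
    with torch.no_grad():
        probs = torch.softmax(model(batch), dim=1)
    prob, idx = probs.max(dim=1)
    return idx.item(), prob.item()

if __name__ == "__main__":
    net = load_model()                           # random weights: sketch only
    print(classify(net, "example.jpg"))          # hypothetical input file
```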
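
Likewise for illustration only, the sketch below shows how a terminal-side processor such as the processor 680 might perform just the image preparation step on the device and hand the prepared data to whichever component, local or server-side, executes the remaining classification steps. The function name prepare_image, the target size, and the JPEG re-encoding are assumptions of this sketch rather than part of the disclosed method.

```python
# Hypothetical sketch only: terminal-side preparation of a to-be-processed image.
import io
from PIL import Image

TARGET_SIZE = (224, 224)   # assumed input size of the downstream classification network

def prepare_image(raw_bytes):
    """Decode, resize, and re-encode an image captured on the device.

    The returned JPEG bytes could then be fed to a local model or uploaded
    to a server that performs the remaining classification steps.
    """
    image = Image.open(io.BytesIO(raw_bytes)).convert("RGB")
    image = image.resize(TARGET_SIZE)
    out = io.BytesIO()
    image.save(out, format="JPEG", quality=90)
    return out.getvalue()

if __name__ == "__main__":
    with open("camera_capture.jpg", "rb") as f:  # hypothetical camera output
        payload = prepare_image(f.read())
    print(f"prepared {len(payload)} bytes for classification")
```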

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
EP20777689.9A 2019-03-26 2020-03-16 Verfahren zum trainieren eines bildklassifizierungsmodells sowie verfahren und vorrichtung zur bildverarbeitung Pending EP3951654A4 (de)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910233985.5A CN109784424B (zh) 2019-03-26 2019-03-26 一种图像分类模型训练的方法、图像处理的方法及装置
PCT/CN2020/079496 WO2020192471A1 (zh) 2019-03-26 2020-03-16 一种图像分类模型训练的方法、图像处理的方法及装置

Publications (2)

Publication Number Publication Date
EP3951654A1 true EP3951654A1 (de) 2022-02-09
EP3951654A4 EP3951654A4 (de) 2022-05-25

Family

ID=66490551

Family Applications (1)

Application Number Title Priority Date Filing Date
EP20777689.9A Pending EP3951654A4 (de) 2019-03-26 2020-03-16 Verfahren zum trainieren eines bildklassifizierungsmodells sowie verfahren und vorrichtung zur bildverarbeitung

Country Status (6)

Country Link
US (1) US20210241109A1 (de)
EP (1) EP3951654A4 (de)
JP (1) JP7185039B2 (de)
KR (1) KR20210072051A (de)
CN (1) CN109784424B (de)
WO (1) WO2020192471A1 (de)

Families Citing this family (56)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111161274B (zh) * 2018-11-08 2023-07-07 上海市第六人民医院 腹部图像分割方法、计算机设备
CN109784424B (zh) * 2019-03-26 2021-02-09 腾讯科技(深圳)有限公司 一种图像分类模型训练的方法、图像处理的方法及装置
CN110210544B (zh) * 2019-05-24 2021-11-23 上海联影智能医疗科技有限公司 图像分类方法、计算机设备和存储介质
CN110223230A (zh) * 2019-05-30 2019-09-10 华南理工大学 一种多前端深度图像超分辨率系统及其数据处理方法
CN111047130B (zh) * 2019-06-11 2021-03-02 北京嘀嘀无限科技发展有限公司 用于交通分析和管理的方法和系统
CN110458218B (zh) * 2019-07-31 2022-09-27 北京市商汤科技开发有限公司 图像分类方法及装置、分类网络训练方法及装置
CN110490239B (zh) * 2019-08-06 2024-02-27 腾讯医疗健康(深圳)有限公司 图像质控网络的训练方法、质量分类方法、装置及设备
CN110807760B (zh) * 2019-09-16 2022-04-08 北京农业信息技术研究中心 一种烟叶分级方法及系统
CN110705460B (zh) * 2019-09-29 2023-06-20 北京百度网讯科技有限公司 图像类别识别方法及装置
CN110737783B (zh) * 2019-10-08 2023-01-17 腾讯科技(深圳)有限公司 一种推荐多媒体内容的方法、装置及计算设备
CN110826596A (zh) * 2019-10-09 2020-02-21 天津大学 一种基于多尺度可变形卷积的语义分割方法
CN110704661B (zh) * 2019-10-12 2021-04-13 腾讯科技(深圳)有限公司 一种图像分类方法和装置
CN110930417B (zh) * 2019-11-26 2023-08-08 腾讯科技(深圳)有限公司 图像分割模型的训练方法和装置、图像分割方法和装置
CN110956214B (zh) * 2019-12-03 2023-10-13 北京车和家信息技术有限公司 一种自动驾驶视觉定位模型的训练方法及装置
CN112750128B (zh) * 2019-12-13 2023-08-01 腾讯科技(深圳)有限公司 图像语义分割方法、装置、终端及可读存储介质
CN113053332B (zh) * 2019-12-28 2022-04-22 Oppo广东移动通信有限公司 背光亮度调节方法、装置、电子设备及可读存储介质
CN111259904B (zh) * 2020-01-16 2022-12-27 西南科技大学 一种基于深度学习和聚类的语义图像分割方法及系统
CN111369564B (zh) * 2020-03-04 2022-08-09 腾讯科技(深圳)有限公司 一种图像处理的方法、模型训练的方法及装置
CN111523548B (zh) * 2020-04-24 2023-11-28 北京市商汤科技开发有限公司 一种图像语义分割、智能行驶控制方法及装置
CN113673668A (zh) * 2020-05-13 2021-11-19 北京君正集成电路股份有限公司 一种车辆检测训练中二级损失函数的计算方法
CN111723813B (zh) 2020-06-05 2021-07-06 中国科学院自动化研究所 基于类内判别器的弱监督图像语义分割方法、系统、装置
CN111814833B (zh) * 2020-06-11 2024-06-07 浙江大华技术股份有限公司 票据处理模型的训练方法及图像处理方法、图像处理设备
CN111783635A (zh) * 2020-06-30 2020-10-16 北京百度网讯科技有限公司 图像标注方法、装置、设备以及存储介质
CN111784673B (zh) * 2020-06-30 2023-04-18 创新奇智(上海)科技有限公司 缺陷检测模型训练和缺陷检测方法、设备及存储介质
CN112132841B (zh) * 2020-09-22 2024-04-09 上海交通大学 医疗图像切割方法及装置
CN112333402B (zh) * 2020-10-20 2021-10-22 浙江大学 一种基于声波的图像对抗样本生成方法及系统
CN112257727B (zh) * 2020-11-03 2023-10-27 西南石油大学 一种基于深度学习自适应可变形卷积的特征图像提取方法
CN112487479B (zh) * 2020-12-10 2023-10-13 支付宝(杭州)信息技术有限公司 一种训练隐私保护模型的方法、隐私保护方法及装置
CN112232355B (zh) * 2020-12-11 2021-04-02 腾讯科技(深圳)有限公司 图像分割网络处理、图像分割方法、装置和计算机设备
CN112950639B (zh) * 2020-12-31 2024-05-10 山西三友和智慧信息技术股份有限公司 一种基于SA-Net的MRI医学图像分割方法
CN112819008B (zh) * 2021-01-11 2022-10-28 腾讯科技(深圳)有限公司 实例检测网络的优化方法、装置、介质及电子设备
CN112767420B (zh) * 2021-02-26 2021-11-23 中国人民解放军总医院 基于人工智能的核磁影像分割方法、装置、设备和介质
CN113033549B (zh) * 2021-03-09 2022-09-20 北京百度网讯科技有限公司 定位图获取模型的训练方法和装置
CN113033436B (zh) * 2021-03-29 2024-04-16 京东鲲鹏(江苏)科技有限公司 障碍物识别模型训练方法及装置、电子设备、存储介质
CN113139618B (zh) * 2021-05-12 2022-10-14 电子科技大学 一种基于集成防御的鲁棒性增强的分类方法及装置
CN113505800A (zh) * 2021-06-30 2021-10-15 深圳市慧鲤科技有限公司 图像处理方法及其模型的训练方法和装置、设备、介质
CN113822901B (zh) * 2021-07-21 2023-12-12 南京旭锐软件科技有限公司 图像分割方法、装置、存储介质及电子设备
CN113610807B (zh) * 2021-08-09 2024-02-09 西安电子科技大学 基于弱监督多任务学习的新冠肺炎分割方法
CN113642581B (zh) * 2021-08-12 2023-09-22 福州大学 基于编码多路径语义交叉网络的图像语义分割方法及系统
CN114004854B (zh) * 2021-09-16 2024-06-07 清华大学 一种显微镜下的切片图像实时处理显示系统和方法
KR102430989B1 (ko) 2021-10-19 2022-08-11 주식회사 노티플러스 인공지능 기반 콘텐츠 카테고리 예측 방법, 장치 및 시스템
CN113723378B (zh) * 2021-11-02 2022-02-08 腾讯科技(深圳)有限公司 一种模型训练的方法、装置、计算机设备和存储介质
CN114049516A (zh) * 2021-11-09 2022-02-15 北京百度网讯科技有限公司 训练方法、图像处理方法、装置、电子设备以及存储介质
CN113780249B (zh) * 2021-11-10 2022-02-15 腾讯科技(深圳)有限公司 表情识别模型的处理方法、装置、设备、介质和程序产品
CN114332554A (zh) * 2021-11-10 2022-04-12 腾讯科技(深圳)有限公司 图像分割模型的训练方法、图像分割方法、装置及设备
CN113963220A (zh) * 2021-12-22 2022-01-21 熵基科技股份有限公司 安检图像分类模型训练方法、安检图像分类方法及装置
TWI806392B (zh) * 2022-01-27 2023-06-21 國立高雄師範大學 表格文本的表格辨識方法
CN115019038B (zh) * 2022-05-23 2024-04-30 杭州海马体摄影有限公司 一种相似图像像素级语义匹配方法
CN114677677B (zh) * 2022-05-30 2022-08-19 南京友一智能科技有限公司 一种质子交换膜燃料电池气体扩散层材料比例预测方法
CN114792398B (zh) * 2022-06-23 2022-09-27 阿里巴巴(中国)有限公司 图像分类的方法、存储介质、处理器及系统
CN115170809B (zh) * 2022-09-06 2023-01-03 浙江大华技术股份有限公司 图像分割模型训练、图像分割方法、装置、设备及介质
CN116403163B (zh) * 2023-04-20 2023-10-27 慧铁科技有限公司 一种截断塞门手把开合状态的识别方法和装置
CN116363374B (zh) * 2023-06-02 2023-08-29 中国科学技术大学 图像语义分割网络持续学习方法、系统、设备及存储介质
CN117218686B (zh) * 2023-10-20 2024-03-29 广州脉泽科技有限公司 一种开放场景下的掌静脉roi提取方法及系统
CN117333493B (zh) * 2023-12-01 2024-03-15 深圳市志达精密科技有限公司 一种基于机器视觉的显示器底座生产用检测系统以及方法
CN117911501B (zh) * 2024-03-20 2024-06-04 陕西中铁华博实业发展有限公司 一种金属加工钻孔高精度定位方法

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102436583B (zh) * 2011-09-26 2013-10-30 哈尔滨工程大学 基于对标注图像学习的图像分割方法
US10019657B2 (en) * 2015-05-28 2018-07-10 Adobe Systems Incorporated Joint depth estimation and semantic segmentation from a single image
US10657364B2 (en) * 2016-09-23 2020-05-19 Samsung Electronics Co., Ltd System and method for deep network fusion for fast and robust object detection
JP6722351B2 (ja) 2017-04-26 2020-07-15 株式会社ソニー・インタラクティブエンタテインメント 学習装置、画像認識装置、学習方法及びプログラム
EP3432263B1 (de) * 2017-07-17 2020-09-16 Siemens Healthcare GmbH Semantische segmentierung zum nachweis von krebs in digitaler brusttomosynthese
CN108764164B (zh) * 2018-05-30 2020-12-08 华中科技大学 一种基于可变形卷积网络的人脸检测方法及系统
CN109101897A (zh) * 2018-07-20 2018-12-28 中国科学院自动化研究所 水下机器人的目标检测方法、系统及相关设备
CN109493330A (zh) * 2018-11-06 2019-03-19 电子科技大学 一种基于多任务学习的细胞核实例分割方法
CN109784424B (zh) * 2019-03-26 2021-02-09 腾讯科技(深圳)有限公司 一种图像分类模型训练的方法、图像处理的方法及装置

Also Published As

Publication number Publication date
US20210241109A1 (en) 2021-08-05
CN109784424A (zh) 2019-05-21
KR20210072051A (ko) 2021-06-16
CN109784424B (zh) 2021-02-09
WO2020192471A1 (zh) 2020-10-01
EP3951654A4 (de) 2022-05-25
JP2022505775A (ja) 2022-01-14
JP7185039B2 (ja) 2022-12-06

Similar Documents

Publication Publication Date Title
EP3951654A1 (de) Verfahren zum trainieren eines bildklassifizierungsmodells sowie verfahren und vorrichtung zur bildverarbeitung
US11861829B2 (en) Deep learning based medical image detection method and related device
EP3940638B1 (de) Bildregionpositionierungsverfahren, modelltrainingsverfahren und zugehörige vorrichtung
EP3968179A1 (de) Verfahren und vorrichtung zur erkennung eines ortes, modelltrainingsverfahren und vorrichtung zur ortserkennung sowie elektronische vorrichtung
US12008810B2 (en) Video sequence selection method, computer device, and storage medium
JP2022529557A (ja) 医用画像分割方法、医用画像分割装置、電子機器及びコンピュータプログラム
CN111507378A (zh) 训练图像处理模型的方法和装置
WO2022001623A1 (zh) 基于人工智能的图像处理方法、装置、设备及存储介质
CN110765882B (zh) 一种视频标签确定方法、装置、服务器及存储介质
CN111062441A (zh) 基于自监督机制和区域建议网络的场景分类方法及装置
CN113052106B (zh) 一种基于PSPNet网络的飞机起降跑道识别方法
CN112419326B (zh) 图像分割数据处理方法、装置、设备及存储介质
CN113723378B (zh) 一种模型训练的方法、装置、计算机设备和存储介质
CN113706562B (zh) 图像分割方法、装置、系统及细胞分割方法
CN116935188B (zh) 模型训练方法、图像识别方法、装置、设备及介质
CN114722937A (zh) 一种异常数据检测方法、装置、电子设备和存储介质
CN114385662A (zh) 路网更新方法、装置、存储介质及电子设备
CN112906517A (zh) 一种自监督的幂律分布人群计数方法、装置和电子设备
CN115937661A (zh) 一种3d场景理解方法、系统、电子设备及存储介质
CN117036658A (zh) 一种图像处理方法及相关设备
CN113822143A (zh) 文本图像的处理方法、装置、设备以及存储介质
US11790228B2 (en) Methods and systems for performing tasks on media using attribute specific joint learning
CN114283290B (zh) 图像处理模型的训练、图像处理方法、装置、设备及介质
CN117315516B (zh) 基于多尺度注意力相似化蒸馏的无人机检测方法及装置
CN117011578A (zh) 对象识别方法和装置、存储介质及电子设备

Legal Events

Date Code Title Description
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE

PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20210517

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

REG Reference to a national code

Ref country code: DE

Ref legal event code: R079

Free format text: PREVIOUS MAIN CLASS: G06K0009620000

Ipc: G06V0010820000

A4 Supplementary search report drawn up and despatched

Effective date: 20220426

RIC1 Information provided on ipc code assigned before grant

Ipc: G06N 3/04 20060101ALI20220420BHEP

Ipc: G06K 9/62 20220101ALI20220420BHEP

Ipc: G06V 10/26 20220101ALI20220420BHEP

Ipc: G06V 10/82 20220101AFI20220420BHEP

DAV Request for validation of the european patent (deleted)
DAX Request for extension of the european patent (deleted)
RAP3 Party data changed (applicant data changed or rights of an application transferred)

Owner name: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: EXAMINATION IS IN PROGRESS

17Q First examination report despatched

Effective date: 20240315