WO2020253663A1 - Artificial intelligence-based image region recognition method, model training method, and apparatus
- Publication number: WO2020253663A1
- Application: PCT/CN2020/096237 (CN2020096237W)
- Authority: WO — WIPO (PCT)
- Prior art keywords: image, area, segmentation, heat map, trained
Classifications
- G06T7/11—Region-based segmentation
- G06T7/12—Edge-based segmentation
- G06T7/149—Segmentation; Edge detection involving deformable models, e.g. active contour models
- G06T7/187—Segmentation; Edge detection involving region growing; involving region merging; involving connected component labelling
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
- G06V10/469—Contour-based spatial representations, e.g. vector-coding
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
- G06V20/20—Scenes; Scene-specific elements in augmented reality scenes
- G06T2207/10004—Still image; Photographic image
- G06T2207/10024—Color image
- G06T2207/20081—Training; Learning
- G06T2207/20084—Artificial neural networks [ANN]
- G06T2207/20092—Interactive image processing based on input by user
- G06T2207/20101—Interactive definition of point of interest, landmark or seed
Definitions
- This application relates to the field of artificial intelligence, in particular to the segmentation and recognition of image regions.
- Image segmentation technology is increasingly widely used, for example in medical image segmentation and natural image segmentation.
- Image segmentation technology refers to technology that divides an image into several specific areas with unique properties and extracts objects of interest.
- For example, a medical image can be segmented so that the various tissues of the human body can be clearly distinguished in the segmented image.
- In the related art, an auxiliary segmentation tool is provided, in which the user draws a bounding box (bbox) in the image. The bbox must enclose the target to be labeled, and a neural network model then outputs a polygon for the target. If the segmentation result is not accurate, the user can revise it.
- The embodiments of this application provide an image region recognition method, a model training method, and an apparatus based on artificial intelligence. Regions with poor results in the first stage of image segmentation are further segmented, thereby obtaining more accurate image segmentation results and improving image segmentation performance.
- The first aspect of the present application provides an image region recognition method, executed by an image processing device, and the method includes:
- acquiring an image to be segmented, where the image to be segmented includes multiple extreme points;
- generating first image feature information according to the image to be segmented, where the first image feature information includes N image matrices and a first heat map, the first heat map is generated based on the multiple extreme points, and N is an integer greater than or equal to 1;
- acquiring, through a first image segmentation model, the first image segmentation area corresponding to the first image feature information, where the first image segmentation model includes a first heat map channel and N first matrix channels, the N first matrix channels have a one-to-one correspondence with the N image matrices, and the first heat map channel has a corresponding relationship with the first heat map;
- acquiring a second heat map according to the annotation points corresponding to the first image segmentation area;
- acquiring, through a second image segmentation model, the second image segmentation area corresponding to the image to be segmented, where the second image segmentation model includes a segmentation area channel, a second heat map channel, and N second matrix channels, the N second matrix channels have a one-to-one correspondence with the N image matrices, the segmentation area channel has a corresponding relationship with the first image segmentation area, and the second heat map channel has a corresponding relationship with the second heat map;
- generating an image recognition result of the image to be segmented according to the second image segmentation area.
- The second aspect of the present application provides a model training method, executed by an image processing device, and the method includes:
- acquiring a set of images to be trained, where the set includes at least one image to be trained;
- acquiring a first predicted segmentation area of the image to be trained through a first image segmentation model, where the first image segmentation model is an image segmentation model obtained by pre-training;
- generating a heat map to be trained based on the real segmentation area of the image to be trained and the first predicted segmentation area, where the heat map to be trained is generated from at least one difference point;
- acquiring a second predicted segmentation area through the image segmentation model to be trained according to the image to be trained, the first predicted segmentation area, the heat map to be trained, and the real segmentation area;
- determining, using a target loss function, the model parameters corresponding to the image segmentation model to be trained according to the second predicted segmentation area and the real segmentation area;
- training the image segmentation model to be trained using the model parameters to obtain a second image segmentation model.
- The third aspect of the present application provides an image processing device, including:
- an acquisition module for acquiring an image to be segmented, where the image to be segmented includes multiple extreme points;
- a generating module configured to generate first image feature information according to the image to be segmented acquired by the acquisition module, where the first image feature information includes N image matrices and a first heat map, the first heat map is generated based on the multiple extreme points, and N is an integer greater than or equal to 1;
- the acquisition module is configured to acquire, through a first image segmentation model, the first image segmentation area corresponding to the first image feature information generated by the generating module, where the first image segmentation model includes a first heat map channel and N first matrix channels, the N first matrix channels have a one-to-one correspondence with the N image matrices, and the first heat map channel has a corresponding relationship with the first heat map;
- the acquisition module is further configured to acquire a second heat map according to the first image segmentation area and the annotation points corresponding to the first image segmentation area;
- the acquisition module is further configured to acquire, through a second image segmentation model, the second image segmentation area corresponding to the image to be segmented, where the second image segmentation model includes a segmentation area channel, a second heat map channel, and N second matrix channels, the N second matrix channels have a one-to-one correspondence with the N image matrices, the segmentation area channel has a corresponding relationship with the first image segmentation area, and the second heat map channel has a corresponding relationship with the second heat map;
- the generating module is configured to generate an image recognition result of the image to be segmented according to the second image segmentation area.
- A fourth aspect of the present application provides an image processing device, including:
- an acquisition module for acquiring a set of images to be trained, where the set of images to be trained includes at least one image to be trained;
- the acquisition module is further configured to acquire a first predicted segmentation area of the image to be trained through a first image segmentation model, where the first image segmentation model is an image segmentation model obtained by pre-training;
- a generating module configured to generate a heat map to be trained based on the real segmentation area of the image to be trained and the first predicted segmentation area acquired by the acquisition module, where the heat map to be trained is generated from at least one difference point;
- the acquisition module is further configured to acquire a second predicted segmentation area through the image segmentation model to be trained according to the image to be trained, the first predicted segmentation area, the heat map to be trained generated by the generating module, and the real segmentation area;
- a determining module configured to use a target loss function to determine the model parameters corresponding to the image segmentation model to be trained according to the second predicted segmentation area acquired by the acquisition module and the real segmentation area;
- a training module configured to use the model parameters determined by the determining module to train the image segmentation model to be trained to obtain a second image segmentation model.
- A fifth aspect of the present application provides a terminal device, including: a memory, a transceiver, a processor, and a bus system;
- the memory is used to store a program;
- the processor is used to execute the program in the memory, including the following steps:
- acquiring an image to be segmented, where the image to be segmented includes multiple extreme points;
- generating first image feature information according to the image to be segmented, where the first image feature information includes N image matrices and a first heat map, the first heat map is generated based on the multiple extreme points, and N is an integer greater than or equal to 1;
- acquiring, through a first image segmentation model, the first image segmentation area corresponding to the first image feature information, where the first image segmentation model includes a first heat map channel and N first matrix channels, the N first matrix channels have a one-to-one correspondence with the N image matrices, and the first heat map channel has a corresponding relationship with the first heat map;
- acquiring a second heat map according to the annotation points corresponding to the first image segmentation area;
- acquiring, through a second image segmentation model, the second image segmentation area corresponding to the image to be segmented, where the second image segmentation model includes a segmentation area channel, a second heat map channel, and N second matrix channels, the N second matrix channels have a one-to-one correspondence with the N image matrices, the segmentation area channel has a corresponding relationship with the first image segmentation area, and the second heat map channel has a corresponding relationship with the second heat map;
- the bus system is used to connect the memory and the processor so that the memory and the processor communicate.
- A sixth aspect of the present application provides a server, including: a memory, a transceiver, a processor, and a bus system;
- the memory is used to store a program;
- the processor is used to execute the program in the memory to perform the methods described in the foregoing aspects;
- the bus system is used to connect the memory and the processor so that the memory and the processor communicate.
- A seventh aspect of the present application provides a computer-readable storage medium for storing a computer program, and the computer program is used to execute the methods described in the foregoing aspects.
- The eighth aspect of the present application provides a computer program product including instructions which, when run on a computer, cause the computer to execute the methods described in the foregoing aspects.
- In the embodiments of this application, an image region recognition method is provided. The image to be segmented is first acquired, where the image to be segmented includes multiple extreme points; first image feature information is then generated according to the image to be segmented; the first image segmentation model obtains the first image segmentation area corresponding to the first image feature information; a second heat map is obtained based on the first image segmentation area and its corresponding annotation points; and finally the second image segmentation area corresponding to the image to be segmented is obtained through the second image segmentation model.
- In this way, regions with poor results in the first-stage image segmentation can be further segmented, so as to obtain more accurate image segmentation results.
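- For illustration only, the following is a minimal Python sketch of the two-stage flow summarized above. All function and model names are hypothetical placeholders, not APIs from this application.

```python
import numpy as np

# Hypothetical sketch of the two-stage flow described above; first_model,
# second_model, make_heatmap and get_user_annotation_points are placeholders.
def segment_with_interaction(image, extreme_points, first_model, second_model,
                             make_heatmap, get_user_annotation_points):
    # Stage 1: 4-channel input = N image matrices + heat map of extreme points.
    first_heat = make_heatmap(image.shape[:2], extreme_points)
    first_input = np.concatenate([image, first_heat[..., None]], axis=-1)
    first_area = first_model(first_input)        # first image segmentation area

    # Stage 2: the user marks points on the over-/under-recognized regions.
    points, signs = get_user_annotation_points(first_area)  # signs: +1 / -1
    second_heat = make_heatmap(image.shape[:2], points, signs)

    # 5-channel input = N image matrices + first mask + second heat map.
    second_input = np.concatenate(
        [image, first_area[..., None], second_heat[..., None]], axis=-1)
    return second_model(second_input)            # refined segmentation area
```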
- FIG. 1 is a schematic diagram of an architecture of an image area recognition system in an embodiment of the application
- FIG. 2 is a schematic flowchart of an image area recognition method in an embodiment of this application.
- FIG. 3 is a schematic diagram of an embodiment of an image area recognition method in an embodiment of the application.
- FIG. 4 is a schematic diagram of an embodiment of selecting four extreme points in an embodiment of the application.
- FIG. 5 is a schematic diagram of an embodiment in which the first image segmentation model returns the first image segmentation area in an embodiment of the application;
- FIG. 6 is a schematic diagram of a labeling position based on a first image segmentation area in an embodiment of the application
- FIG. 7 is a schematic diagram of another labeling position based on the first image segmentation area in an embodiment of the application.
- FIG. 8 is a schematic diagram of an embodiment of generating first image feature information in an embodiment of the application.
- FIG. 9 is a schematic diagram of an embodiment of generating second image feature information in an embodiment of the application.
- FIG. 10 is a schematic structural diagram of a second image segmentation model in an embodiment of the application.
- FIG. 11 is a schematic structural diagram of the Xception model in an embodiment of the application.
- FIG. 12 is a schematic diagram of an atrous depthwise separable convolution in an embodiment of the application;
- FIG. 13 is a schematic diagram of an embodiment of a model training method in an embodiment of the application.
- FIG. 14 is a schematic diagram of an example original image in an embodiment of the application.
- FIG. 15 is a schematic diagram of an embodiment of selecting a difference point in an embodiment of the application.
- FIG. 16 is a schematic diagram of an embodiment of an image processing device in an embodiment of the application.
- FIG. 17 is a schematic diagram of an embodiment of an image processing device in an embodiment of the application.
- FIG. 18 is a schematic diagram of a structure of a terminal device in an embodiment of the application.
- FIG. 19 is a schematic diagram of a structure of a server in an embodiment of the application.
- The embodiments of this application provide an image region segmentation method and a model training method and apparatus. Regions with poorer results in the first stage of image segmentation are further segmented, so as to obtain more accurate image segmentation results and improve image segmentation performance.
- The image region segmentation method provided in this application can be applied to the field of artificial intelligence (AI), and specifically to the field of computer vision.
- With the development of AI, image processing and analysis have gradually formed a scientific system, and new processing methods continue to emerge.
- Images are the foundation of vision, so digital images have become effective tools for scholars in psychology, physiology, and computer science to study visual perception, and image processing is in large-scale demand in applications such as military, remote sensing, and meteorology.
- Image segmentation technology has always been a basic technology and an important research direction in the field of computer vision.
- Image segmentation technology is an important part of image semantic understanding.
- As the technology has developed, image processing capabilities have been significantly improved.
- Image segmentation technology plays an increasingly important role in medical image analysis (including tumor and other pathological localization, tissue volume measurement, computer-guided surgery, customization of treatment plans, and research on anatomical structures), face recognition, fingerprint recognition, unmanned driving, and machine vision.
- Figure 1 is a schematic diagram of the architecture of the image area recognition system in an embodiment of this application.
- The image processing device provided by this application includes a terminal device or a server.
- The client can be an auxiliary segmentation tool.
- The terminal devices on which the client is deployed include, but are not limited to, tablets, laptops, handheld computers, mobile phones, voice interaction devices, and personal computers (PCs), which is not limited here.
- This application proposes an interactive image auxiliary segmentation tool based on neural network models (i.e., a first image segmentation model and a second image segmentation model).
- The auxiliary segmentation tool can feed back a fairly accurate pre-segmentation result (that is, the first image segmentation area) through the neural network model (i.e., the first image segmentation model) from only a small amount of user interaction, and the final segmentation result (i.e., the second image segmentation area) is obtained through further interaction.
- This application proposes a "small number of labeled points, interactive" segmentation method and improves the image segmentation model, thereby obtaining better segmentation results and real-time tool performance.
- The first image segmentation model and the second image segmentation model can be deployed in a server acting as the image processing device, and the image segmentation area is predicted through the two models, thereby realizing online image segmentation.
- The first image segmentation model and the second image segmentation model can also be deployed on a terminal device acting as the image processing device, where the image segmentation area is predicted in order to realize offline image segmentation.
- Figure 2 is a schematic flow chart of the image region recognition method in an embodiment of the application.
- The user uses the auxiliary segmentation tool to mark the extreme points of the image to be processed; for example, in the image to be segmented in Figure 2, the tree in the image is annotated, and the auxiliary segmentation tool generates a first heat map according to the result of the user's annotation.
- The first heat map is combined with the image matrix of the image to be segmented to obtain the first image feature information.
- The first image feature information is input to the first image segmentation model, and features are extracted through the first image segmentation model, thereby outputting the first image segmentation area, for example, the segmentation area of the tree.
- The first image segmentation model may be an image segmentation convolutional neural network (CNN), whose model structure mainly includes an input layer, a feature extraction layer, and an output layer. Since the effect of the generated first image segmentation area may not be good enough, the auxiliary segmentation tool can also be used to input annotation points. For example, a second heat map is generated according to the annotation points input by the user, and the second heat map is combined with the image matrix and the first image segmentation area to obtain the second image feature information. The second image feature information is input to the second image segmentation model, features are extracted through the second image segmentation model, the second image segmentation area is output, and a more accurate segmentation area of the tree is obtained. Image recognition is then performed on the image to be segmented according to the second image segmentation area, and the obtained image recognition result has higher accuracy.
- An embodiment of the image region recognition method in the embodiments of this application includes the following steps.
- First, the image to be segmented is acquired, where the image to be segmented includes multiple extreme points.
- The image processing device acquires the image to be segmented; the image processing device can be represented by an auxiliary segmentation tool deployed in it. The user uses the auxiliary segmentation tool to label multiple extreme points, and the image to be segmented is generated based on these extreme points. It is understandable that the image processing device provided in this application can be deployed on a terminal device.
- The multiple extreme points may be the highest point, the lowest point, the leftmost point, and the rightmost point of the target object in the image to be segmented, or several other extreme points, which is not limited here.
- Next, first image feature information is generated, where the first image feature information includes N image matrices and a first heat map; the first heat map is generated based on the multiple extreme points, and N is an integer greater than or equal to 1.
- The image processing device generates N image matrices based on the image to be segmented, generates a first heat map based on the multiple extreme points, and combines the first heat map with the N image matrices to obtain the first image feature information corresponding to the image to be segmented.
- Digital image data can be represented by matrices. If the size of the image to be segmented is 128*128, the size of the image matrix is 128*128*N, where N is an integer greater than or equal to 1.
- The image matrix can be the matrix corresponding to a grayscale image.
- The image matrix can also be the matrices of a red-green-blue (RGB) image.
- An RGB image is three-dimensional; the three dimensions represent the red, green, and blue components, each taking a value from 0 to 255, and each pixel is composed of these three components.
- Each RGB channel corresponds to an image matrix (that is, the first image matrix, the second image matrix, and the third image matrix), and the three RGB channels stacked together form a color image, that is, the image to be segmented.
- The image matrix can also correspond to the red, green, blue, and alpha (RGBA) color space, which is commonly used for Portable Network Graphics (PNG) images.
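- As a minimal illustration of the image matrices discussed above (an assumption-level sketch, not code from this application), an image can be read into N matrices as follows:

```python
import numpy as np
from PIL import Image

# Reading one image as N = 1, 3, or 4 matrices, matching the cases above.
img = Image.open("to_be_segmented.png")          # hypothetical file name

gray = np.asarray(img.convert("L"))      # N = 1: grayscale matrix
rgb  = np.asarray(img.convert("RGB"))    # N = 3: R, G, B matrices, values 0..255
rgba = np.asarray(img.convert("RGBA"))   # N = 4: R, G, B, A matrices (PNG-style)

print(gray.shape, rgb.shape, rgba.shape) # e.g. (128, 128) (128, 128, 3) (128, 128, 4)
```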
- the first image segmentation model includes a first heat map channel and N first matrix channels.
- the N first matrix channels have a one-to-one correspondence with the N image matrices.
- the first heat map channel and the first heat map have a one-to-one correspondence.
- The corresponding relationship here can be understood as follows: if image matrix a has a corresponding relationship with first matrix channel a, then when the first image segmentation area corresponding to the image feature information is obtained through the image segmentation model, image matrix a is input to the image segmentation model through first matrix channel a.
- The corresponding relationship between the first heat map and the first heat map channel likewise indicates an input method similar to the above.
- The image processing device inputs the first image feature information into the first image segmentation model, where the first image segmentation model can adopt a DeepLab structure, including but not limited to DeepLabV1, DeepLabV2, DeepLabV3, and DeepLabV3+.
- The DeepLabV2 structure is a CNN model structure for image segmentation: given an input picture, it outputs a mask image of the same size as the original picture, in which the value of each pixel represents the category label of that pixel.
- The DeepLabV3+ structure is an improved CNN model structure for image segmentation based on DeepLabV2, and it usually achieves better results in image segmentation competitions.
- CNN is a development of the neural network model that uses convolutional layers to replace the fully connected layer structure of artificial neural networks, and it has achieved excellent performance in various computer vision fields.
- The first image segmentation model includes N first matrix channels and one first heat map channel.
- If N is 3, there are 3 image matrices, which correspond to 3 first matrix channels, each first matrix channel corresponding to one image matrix; there is also one first heat map channel, which corresponds to the first heat map.
- If N is 1, there is 1 image matrix, which corresponds to 1 first matrix channel, the matrix being that of a grayscale image; there is also one first heat map channel, which corresponds to the first heat map.
- If N is 4, there are 4 image matrices, which correspond to 4 first matrix channels, each first matrix channel corresponding to one image matrix; there is also one first heat map channel, which corresponds to the first heat map.
- Next, the image processing device receives annotation points, where there may be one or more annotation points, annotated by the user according to the first image segmentation area; the image processing device generates the second heat map according to the annotation points.
- The generation method of the second heat map is similar to that of the first heat map and will not be repeated here.
- Relative to the target object that actually needs to be recognized in the image to be segmented, the first image segmentation area may contain an under-recognized area and an over-recognized area.
- The under-recognized area can be understood as a part of the target object that is not in the first image segmentation area, and the over-recognized area can be understood as an area in the first image segmentation area that obviously does not belong to the target object.
- For the under-recognized area, the corresponding annotation point can be on the edge or at a non-edge position; such a point is a negative point, which can be expressed as -1, and the Gaussian distribution (that is, the second heat map) is multiplied by -1 when it is generated.
- For the over-recognized area, the corresponding annotation point can likewise be on the edge or at a non-edge position; such a point is a positive point, which can be expressed as 1, and the Gaussian distribution (that is, the second heat map) is multiplied by 1 when it is generated.
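- A minimal sketch of such a signed Gaussian heat map, assuming a free choice of standard deviation (sigma) and treating the point list and signs as given; this is illustrative, not the exact formula used in this application:

```python
import numpy as np

def make_heatmap(shape, points, signs=None, sigma=10.0):
    """2D Gaussians centered on each (x, y) point; a point's Gaussian is
    multiplied by -1 for a negative point and by +1 for a positive point."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    heat = np.zeros((h, w), dtype=np.float32)
    if signs is None:
        signs = [1] * len(points)
    for (px, py), sign in zip(points, signs):
        heat += sign * np.exp(-((xs - px) ** 2 + (ys - py) ** 2)
                              / (2.0 * sigma ** 2))
    return heat
```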
- The second image segmentation model includes a segmentation area channel, a second heat map channel, and N second matrix channels; the N second matrix channels have a one-to-one correspondence with the N image matrices, the segmentation area channel has a corresponding relationship with the first image segmentation area, and the second heat map channel has a corresponding relationship with the second heat map.
- The image processing device combines the second heat map, the first image segmentation area, and the N image matrices of the image to be segmented to obtain the second image feature information corresponding to the image to be segmented, and then inputs the second image feature information to the second image segmentation model.
- the second image segmentation model includes N second matrix channels, segmented region channels, and second heat map channels.
- the N second matrix channels have a one-to-one correspondence with the N image matrices.
- the segmented area channel has a corresponding relationship with the first image segmented area
- the second heat map channel has a corresponding relationship with the second heat map.
- The image processing device can also generate an image recognition result of the image to be segmented according to the second image segmentation area.
- Specifically, the second image segmentation area and the first image segmentation area are both mask images; the edge of the target object in the image to be segmented can be obtained based on the mask image, and the image recognition result is finally obtained.
- the image recognition result can be displayed through text information, for example, the image recognition result is an object such as "monkey” or "car”.
- the result of image recognition can also be to highlight the target object in the image to be segmented.
- the target object can be an object such as a "car” or a "monkey".
- In the embodiments of this application, a method for image region segmentation is provided. The image to be segmented is first acquired, where the image to be segmented includes multiple extreme points; first image feature information is then generated according to the image to be segmented; the first image segmentation model obtains the first image segmentation area corresponding to the first image feature information; the second heat map is obtained based on the first image segmentation area, where the second heat map is generated according to the annotation points; and finally the second image segmentation area corresponding to the image to be segmented is acquired through the second image segmentation model.
- In this way, regions with poor results in the first-stage image segmentation can be further segmented, so as to obtain more accurate image segmentation results.
- Optionally, acquiring the image to be segmented includes:
- receiving an object labeling instruction for the image to be processed, where the image to be processed includes a target object, the object labeling instruction carries position information of multiple extreme points corresponding to the target object, and the multiple extreme points are used to identify the contour edge of the target object;
- the extreme points can be determined around the contour edge of the target object, for example the extreme points in the four directions of up, down, left, and right, as shown in FIG. 4;
- the multiple extreme points may include four, and correspondingly, the position information of the four extreme points includes the position information of the first extreme point, the position information of the second extreme point, the position information of the third extreme point, and the position information of the fourth extreme point;
- in response to the object labeling instruction, the image to be segmented is generated according to the image to be processed.
- FIG. 4 is a schematic diagram of an embodiment of selecting four extreme points in an embodiment of this application.
- An image to be processed is first shown, and the image to be processed includes a target object; target objects include, but are not limited to, people, animals, vehicles, and other objects.
- The user can trigger the object labeling instruction, for example by clicking to select several extreme points from the image to be processed.
- Through the auxiliary segmentation tool, the user can select the four extreme points of the tree, namely the first extreme point A, the second extreme point B, the third extreme point C, and the fourth extreme point D.
- The object labeling instruction carries the coordinate information of these four extreme points, so that the image to be segmented corresponding to the image to be processed is generated according to the object labeling instruction. As shown in Figure 4, the image to be segmented is the image corresponding to the tree, and it includes the region constituted by the first extreme point A, the second extreme point B, the third extreme point C, and the fourth extreme point D, as sketched below.
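- As a small illustrative sketch (an assumption for illustration, not mandated by the text), the rectangular region enclosed by the four extreme points can be derived from their coordinate extremes:

```python
# The four extreme points jointly bound the target object, so the enclosing
# rectangle is given by their coordinate extremes.
def region_from_extreme_points(points):
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)    # x0, y0, x1, y1

# Hypothetical extreme points A, B, C, D of the tree:
x0, y0, x1, y1 = region_from_extreme_points([(30, 5), (5, 60), (90, 70), (50, 120)])
```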
- The auxiliary segmentation tool generates the first image feature information (including the first heat map and N image matrices) according to the image to be segmented, and then obtains the first image segmentation area corresponding to the first image feature information through the first image segmentation model; see Figure 5.
- Figure 5 is a schematic diagram of an embodiment in which the first image segmentation model returns the first image segmentation area in an embodiment of the application.
- The auxiliary segmentation tool calculates the first image segmentation area according to the four extreme points and returns it; for example, the image corresponding to the shaded part in FIG. 5 is the image segmentation area. It is understandable that the image segmentation area may be a pre-segmented polygon result.
- FIG. 5 is only an illustration and should not be construed as a limitation of the application.
- In the embodiments of this application, a method for marking extreme points is provided. The image to be processed is first displayed, and then an object labeling instruction is received, where the object labeling instruction carries the position information of the first, second, third, and fourth extreme points corresponding to the target object; finally, in response to the object labeling instruction, the image to be segmented is generated according to the image to be processed.
- In this way, the auxiliary segmentation tool can be used to label the image to be processed; the tool is easy to operate and convenient to use, thereby improving the feasibility and operability of the solution.
- Optionally, acquiring the second heat map according to the first image segmentation area may include:
- receiving a first labeling instruction, where the first labeling instruction carries M labeling points, the labeling points are located inside the first image segmentation area, and M is an integer greater than or equal to 1;
- in response to the first labeling instruction, generating a second heat map according to the M labeling points corresponding to the first labeling instruction.
- FIG. 6 is a schematic diagram of a labeling position based on the first image segmentation area in an embodiment of the application.
- The user uses the auxiliary segmentation tool to mark M annotation points on the extra area, where M is an integer greater than or equal to 1. The M annotation points are inside the first image segmentation area; that is, the extra area in the first image segmentation area is marked, for example the annotation point A in FIG. 6.
- The over-recognized area can be marked on the edge or at a non-edge position, which is not limited here, and the annotation point of the over-recognized area is a positive point, which can be expressed as 1.
- In the embodiments of this application, a method for generating a second heat map based on annotation points is provided: a first labeling instruction is received, and in response to it, a second heat map is generated based on the M annotation points carried in the instruction.
- In this way, the auxiliary segmentation tool can be used to perform a second labeling of the first image segmentation area obtained by the preliminary prediction. The tool is easy to operate and convenient to use, and the second labeling can produce a more accurate image segmentation result, thereby improving the operability and feasibility of the solution.
- Optionally, acquiring the second heat map according to the first image segmentation area may instead include:
- receiving a second labeling instruction, where the second labeling instruction carries M labeling points, the labeling points are located outside the first image segmentation area, and M is an integer greater than or equal to 1;
- in response to the second labeling instruction, generating a second heat map according to the M labeling points corresponding to the second labeling instruction.
- FIG. 7 is a schematic diagram of another labeling position based on the first image segmentation area in an embodiment of the application.
- The user uses the auxiliary segmentation tool to mark M labeling points on the missing area, where M is an integer greater than or equal to 1. The M labeling points are outside the first image segmentation area; that is, the missing area of the target object is marked, for example the annotation point B in FIG. 7.
- The under-recognized area can be marked on the edge or at a non-edge position; the annotation point of the under-recognized area is a negative point, which can be expressed as -1, and if it is a negative point, the Gaussian distribution is multiplied by -1 when generated.
- In the embodiments of this application, another method of generating a second heat map based on annotation points is provided: a second labeling instruction is received, and in response to it, a second heat map is generated according to the M annotation points carried in the instruction.
- In this way, the auxiliary segmentation tool can again be used to perform a second labeling of the first image segmentation area obtained by the preliminary prediction; the tool is easy to operate and convenient to use, and the second labeling can produce a more accurate image segmentation result, thereby improving the operability and feasibility of the solution.
- The embodiments of this application provide an optional embodiment in which the N first matrix channels include a red channel, a green channel, and a blue channel, and generating the first image feature information according to the image to be segmented may include:
- generating N image matrices according to the image to be segmented, the N image matrices including a first image matrix corresponding to the red channel, a second image matrix corresponding to the green channel, and a third image matrix corresponding to the blue channel;
- generating the first image feature information according to the first heat map and the N image matrices.
- FIG. 8 is a schematic diagram of an embodiment of generating first image feature information in an embodiment of the application.
- This application adopts the input format of Deep Extreme Cut (DEXTR) and inputs a four-channel image matrix. That is, in addition to the original image, the input of the first image segmentation model used in this application also includes the information of the four extreme points; in order to make full use of this information, a heat map with the same size as the image to be segmented is generated.
- The three image matrices are the first image matrix, the second image matrix, and the third image matrix: the first image matrix corresponds to the red (R) input channel, the second image matrix corresponds to the green (G) input channel, and the third image matrix corresponds to the blue (B) input channel.
- The principle of heat map generation can be summarized as follows: gray values are superimposed in areas where the buffers around the points overlap, so the more buffers cross an area, the larger its gray value and the hotter the area.
- In the embodiments of this application, a method for generating the first image feature information based on the image to be segmented is provided: the first heat map is generated based on the multiple extreme points in the image to be segmented, the first, second, and third image matrices are generated according to the image to be segmented, and the first image feature information is generated from the first heat map and the three image matrices.
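- Reusing the make_heatmap sketch above, assembling the 4-channel DEXTR-style input can be sketched as follows (shapes, normalization, and point coordinates are illustrative assumptions):

```python
import numpy as np

# Stand-in RGB image and the first heat map built from four extreme points.
rgb = np.random.randint(0, 256, (128, 128, 3), dtype=np.uint8)
first_heat = make_heatmap((128, 128), [(30, 5), (5, 60), (90, 70), (50, 120)])

# R, G, B matrices plus the first heat map -> first image feature information.
first_image_feature = np.dstack([rgb.astype(np.float32) / 255.0, first_heat])
print(first_image_feature.shape)   # (128, 128, 4)
```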
- The embodiments of this application provide an optional embodiment in which the N second matrix channels include a red channel, a green channel, and a blue channel, and the N image matrices are determined in the following way:
- N image matrices are generated according to the image to be segmented, the N image matrices including a first image matrix corresponding to the red channel, a second image matrix corresponding to the green channel, and a third image matrix corresponding to the blue channel.
- After acquiring the second heat map according to the first image segmentation area and its corresponding annotation points in step 104, the method may further include:
- generating second image feature information according to the first image segmentation area, the second heat map, and the N image matrices, where the second image feature information is the input information of the second image segmentation model when the second image segmentation area is acquired.
- FIG. 9 is a schematic diagram of an embodiment of generating second image feature information in an embodiment of this application.
- This application adopts the DEXTR input format and inputs a five-channel image matrix. That is, in addition to the original image, the input of the second image segmentation model also includes the information of the annotation points and the first image segmentation area output by the first image segmentation model.
- From the information of the M annotation points, a heat map with the same size as the image to be segmented (that is, the second heat map) is generated: as shown in Figure 9, 2D Gaussian distributions are generated centered on the M annotation point coordinates. The second heat map is then used as the input of the second heat map channel (i.e., the fourth matrix channel), the first image segmentation area is used as the input of the segmentation area channel (i.e., the fifth matrix channel), and these are combined with the other three image matrices to obtain the second image feature information, which is finally used as the input of the second image segmentation model.
- The three image matrices are the first image matrix, the second image matrix, and the third image matrix: the first image matrix corresponds to the R input channel, the second image matrix corresponds to the G input channel, and the third image matrix corresponds to the B input channel.
- This application gives the second image segmentation model some prior knowledge, so that the model knows the annotation points were selected by the user; considering that the points the user selects may not be exactly optimal, a second heat map distribution is generated with each annotation point as its center.
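- Continuing the stand-ins from the earlier sketches (rgb and make_heatmap), the 5-channel second input can be assembled as follows; this is illustrative only:

```python
import numpy as np

# Stand-in first image segmentation area (mask) and one negative annotation point.
mask = np.zeros((128, 128), dtype=np.float32)
second_heat = make_heatmap((128, 128), [(40, 40)], signs=[-1])

# R, G, B matrices + segmentation area channel + second heat map channel.
second_image_feature = np.dstack(
    [rgb.astype(np.float32) / 255.0, mask, second_heat])
print(second_image_feature.shape)  # (128, 128, 5)
```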
- In the embodiments of this application, a method for generating second image feature information according to the image to be segmented is provided: the first, second, and third image matrices are generated according to the image to be segmented, and the second image feature information is generated according to the first image segmentation area, the second heat map, the first image matrix, the second image matrix, and the third image matrix, where the second image feature information is the input information of the second image segmentation model.
- Optionally, acquiring the second image segmentation area corresponding to the image to be segmented through the second image segmentation model can include:
- encoding the second image feature information by the encoder of the second image segmentation model to obtain a first feature map and a second feature map, where the encoder includes a middle flow module and atrous depthwise separable convolutions; the atrous depthwise separable convolution is used to extract the feature map of the second image feature information, and the middle flow module is executed repeatedly T times, where T is an integer greater than 8;
- splicing the first feature map and the second feature map to obtain a target feature map;
- decoding the target feature map by the decoder of the second image segmentation model to obtain the second image segmentation area.
- Next, the structure of the second image segmentation model is introduced. This application uses two models, namely the first image segmentation model and the second image segmentation model.
- The first-stage first image segmentation model is used to obtain a mask (that is, the first image segmentation area). Boundary points of the area to be corrected are then interactively marked on the mask, and Gaussian centers are generated at these boundary points to form a heat map matching the instance size. Finally, the original image, the mask, and the heat map form a 5-channel input matrix, which is input into the second-stage second image segmentation model to obtain the corresponding segmentation result.
- The image segmentation models may adopt structures such as U-Net, the mask region-based convolutional neural network (Mask R-CNN), or the Pyramid Scene Parsing Network (PSPNet).
- The first image segmentation model can also use DeepLabV3+, an efficient and fast semantic segmentation algorithm capable of handling multi-scale instances.
- FIG. 10 is a schematic structural diagram of the second image segmentation model in an embodiment of the application.
- The second image feature information is input to the second image segmentation model, which includes an encoder (Encoder) and a decoder (Decoder): the encoder is used to obtain rich high-level semantic information, and the decoder is used to gradually restore boundary information.
- The image feature information is encoded through the deep convolutional neural network (DCNN) in the encoder, and the resolution is restored by a factor of 4 through bilinear interpolation to obtain the first feature map.
- A 1*1 convolution is applied to reduce the number of channels, so as to extract the low-level features of the image feature information and obtain the second feature map.
- The first feature map and the second feature map are spliced (concatenated) in the decoder of the image segmentation model to obtain the target feature map. A 3*3 convolution is then used to enhance the target feature map, after which another interpolation further restores 4x resolution, up to the size of the image to be segmented.
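- A minimal PyTorch sketch of this decoder splice, assuming illustrative channel sizes (e.g., 48 reduced low-level channels, as is common in DeepLabV3+ implementations); it is not the exact network of this application:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoderSketch(nn.Module):
    def __init__(self, high_ch=256, low_ch=256, low_reduced=48, n_classes=2):
        super().__init__()
        self.reduce = nn.Conv2d(low_ch, low_reduced, kernel_size=1)   # 1x1 conv
        self.fuse = nn.Conv2d(high_ch + low_reduced, n_classes,
                              kernel_size=3, padding=1)               # 3x3 conv

    def forward(self, high_feat, low_feat):
        # Restore 4x resolution of the encoder output by bilinear interpolation.
        high = F.interpolate(high_feat, size=low_feat.shape[2:],
                             mode="bilinear", align_corners=False)
        low = self.reduce(low_feat)              # reduce low-level channel count
        target = torch.cat([high, low], dim=1)   # splice -> target feature map
        out = self.fuse(target)                  # enhance with 3x3 convolution
        # Interpolate again to restore 4x, up to the image-to-be-segmented size.
        return F.interpolate(out, scale_factor=4,
                             mode="bilinear", align_corners=False)
```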
- The encoder is mainly composed of an improved Xception (Extreme Inception) and an atrous spatial pyramid pooling module.
- Figure 11 is a schematic diagram of the structure of the Xception model in the embodiment of the application. The improved Xception is used for feature extraction of the image; the specific structural parameters are shown in Figure 11 and will not be repeated here. The middle flow module of the original Xception is repeated 8 times, while the improved middle flow module is repeated at least 9 times; this application takes 16 repetitions as an example, which should not be construed as a limitation on this application.
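- A sketch of the repeated middle flow, with simplified block internals (the residual structure and the 728-channel width follow common Xception descriptions and are assumptions here, not parameters from this application):

```python
import torch.nn as nn

def sep_conv(ch):
    # depthwise 3x3 + pointwise 1x1 + batch norm + ReLU
    return nn.Sequential(
        nn.Conv2d(ch, ch, 3, padding=1, groups=ch, bias=False),
        nn.Conv2d(ch, ch, 1, bias=False),
        nn.BatchNorm2d(ch), nn.ReLU(inplace=True))

class MiddleFlowBlock(nn.Module):
    def __init__(self, ch=728):
        super().__init__()
        self.body = nn.Sequential(sep_conv(ch), sep_conv(ch), sep_conv(ch))
    def forward(self, x):
        return x + self.body(x)      # residual connection

# Original Xception repeats the block 8 times; here 16, as in this example.
middle_flow = nn.Sequential(*[MiddleFlowBlock() for _ in range(16)])
```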
- The encoder also uses atrous depthwise separable convolutions: all max pooling operations in the encoder are replaced with strided depthwise separable convolutions, which enables this application to extract feature maps at any resolution. A schematic diagram of the atrous separable convolution is shown in Figure 12.
- Figure 12 is a schematic diagram of the atrous depthwise separable convolution in the embodiment of the application: the input feature map is separated according to its channels and then convolved using the depthwise convolution operation in Figure 12(a), after which a pointwise convolution recombines the channels.
- Batch normalization and a rectified linear unit (ReLU) activation function are added after each 3x3 depthwise separable convolution.
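- A minimal sketch of one atrous depthwise separable convolution as just described (dilation rate and channel sizes are free parameters; a stride of 2 would stand in for a max pooling operation):

```python
import torch.nn as nn

class AtrousSeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch, dilation=2, stride=1):
        super().__init__()
        # Depthwise 3x3 convolution with "holes" (dilation), one filter per channel.
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                   padding=dilation, dilation=dilation,
                                   groups=in_ch, bias=False)
        # Pointwise 1x1 convolution recombines the channels.
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)       # batch normalization
        self.relu = nn.ReLU(inplace=True)      # ReLU activation

    def forward(self, x):
        return self.relu(self.bn(self.pointwise(self.depthwise(x))))
```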
- An atrous spatial pyramid pooling module is used in the encoder to capture multi-scale information, so as to handle instances of different scales.
- The original image is processed by the improved Xception, and once the feature map resolution has been reduced to 1/16 of the original, it is input into the atrous spatial pyramid structure.
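- A sketch of such an atrous spatial pyramid, with dilation rates (6, 12, 18) and channel sizes borrowed from common DeepLab practice; these values are assumptions, not taken from this application:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPPSketch(nn.Module):
    def __init__(self, in_ch=2048, out_ch=256):
        super().__init__()
        # Parallel branches: one 1x1 conv and three dilated 3x3 convs.
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, 1, bias=False)] +
            [nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False)
             for r in (6, 12, 18)])
        # Image-level pooling branch.
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.pool_conv = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.project = nn.Conv2d(out_ch * 5, out_ch, 1, bias=False)

    def forward(self, x):
        feats = [b(x) for b in self.branches]
        pooled = F.interpolate(self.pool_conv(self.pool(x)), size=x.shape[2:],
                               mode="bilinear", align_corners=False)
        return self.project(torch.cat(feats + [pooled], dim=1))
```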
- a method for obtaining the second image segmentation area through the second image segmentation model is provided, that is, the second image feature information is encoded by the encoder of the second image segmentation model to obtain the first The feature map and the second feature map, then the first feature map and the second feature map are spliced to obtain the target feature map, and finally the target feature map is decoded by the decoder of the second image segmentation model to obtain the second image segmentation area .
- a DeeplabV3+-based model structure is used to predict the image segmentation area, and the DeeplabV3+ model structure has a small amount of overall parameters. Therefore, it has a faster running speed in both training and actual prediction.
- an improved Xception model is used to reduce the size of the model while ensuring the performance of feature extraction, and to improve the segmentation speed by using deep separable convolution.
- the ASPP module constructs convolution and pooling operations with multiple dilation rates to obtain multi-scale information, which helps the model process multi-scale instances; a sketch follows.
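- A compact sketch of an ASPP-style module under the same assumptions (the dilation rates 6/12/18 are common defaults in DeeplabV3+-style models, not values stated by the patent):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    """Parallel 3x3 convolutions with several dilation rates plus image-level
    pooling; branch outputs are concatenated and fused with a 1x1 conv."""
    def __init__(self, in_ch=2048, out_ch=256, rates=(6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([nn.Conv2d(in_ch, out_ch, 1)])
        for r in rates:
            self.branches.append(
                nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r))
        self.image_pool = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                        nn.Conv2d(in_ch, out_ch, 1))
        self.project = nn.Conv2d(out_ch * (len(rates) + 2), out_ch, 1)

    def forward(self, x):
        feats = [branch(x) for branch in self.branches]
        # Image-level pooling branch, upsampled back to the feature size.
        pooled = F.interpolate(self.image_pool(x), size=x.shape[2:],
                               mode="bilinear", align_corners=False)
        feats.append(pooled)
        return self.project(torch.cat(feats, dim=1))
```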
- decoding the target feature map by the decoder of the second image segmentation model to obtain the second image segmentation area may include:
- the target feature map is decoded by the decoder of the second image segmentation model to obtain a first pixel point set and a second pixel point set.
- the first pixel point set includes a plurality of first pixels;
- the second pixel point set includes a plurality of second pixels;
- a second image segmentation area is generated according to the first pixel point set and the second pixel point set.
- a method for generating the second image segmentation region based on the second image segmentation model is introduced. After the decoder of the second image segmentation model decodes the target feature map, the first pixel point set and the second pixel point set are obtained.
- the pixels in the first pixel point set belong to the target object and can, for example, be expressed as "1";
- the pixels in the second pixel point set belong to the background and can, for example, be expressed as "0";
- together, the first pixel point set and the second pixel point set compose the second image segmentation area, in which the segmentation result of the target object can be seen.
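- A small illustration of turning the decoder output into such a 0/1 mask (hypothetical two-class logits; not the patent's code):

```python
import torch

def to_segmentation_mask(logits: torch.Tensor) -> torch.Tensor:
    """logits: (1, 2, H, W) decoder output. Returns an (H, W) mask in which
    target-object pixels (the first pixel point set) are 1 and background
    pixels (the second pixel point set) are 0."""
    probs = torch.softmax(logits, dim=1)   # per-pixel class probabilities
    return probs.argmax(dim=1).squeeze(0)  # 1 = target object, 0 = background
```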
- the encoding-decoding structure can obtain the edge information of the object by gradually recovering the spatial information.
- the DeeplabV3+ model structure adds a decoder to the DeeplabV3 model structure to enhance the segmentation of the object edge.
- the decoder in the DeeplabV3+ model uses the high-level semantic information output by the encoder and the feature map whose resolution is 1/4 of the original image resolution in the encoder for decoding operations.
- the low-level feature map with rich detail information output by the encoder undergoes a 1x1 convolution operation (mainly used to reduce the number of channels of the low-level feature map, thereby reducing its proportion in the concatenation), yielding a new low-level feature map.
- the high-level feature map with rich semantic information output by the encoder is upsampled by a factor of 4 to obtain a new high-level feature map.
- the new low-level feature map and the new high-level feature map are concatenated along the channel dimension, and the result, after a 3x3 convolution operation, is upsampled by a factor of 4 to obtain a feature map with the same size as the original image as the final output of the decoder.
- the performance of the model is improved by using high-level semantic information and low-level detailed information.
- a method for obtaining a second image segmentation area by decoding using a second image segmentation model is provided.
- the target feature map is decoded by a decoder of the second image segmentation model to obtain the first pixel point set and the second pixel point set, and the second image segmentation area is generated according to the first pixel point set and the second pixel point set.
- An example of the method of model training in the embodiment of this application includes:
- the model training device acquires a set of images to be trained, where the set includes at least one image to be trained, and each image to be trained contains original images of instances such as "horse", "person", "TV" or "building". It is understandable that the train split of the Pascal-VOC2012 data set can be used as the image set to be trained during model training, which is not limited here.
- the model training device inputs the image to be trained into the pre-trained first image segmentation model, and the first image segmentation model outputs the first predicted segmentation region corresponding to the image to be trained.
- the first prediction segmentation area includes the foreground and the background, where the pixels of the foreground can be expressed as "1" and the pixels of the background can be expressed as "0".
- the first prediction segmentation area is a mask image.
- the model training device automatically generates at least one difference point according to the first predicted segmentation area and the real segmentation area of the image to be trained, and then generates a corresponding heat map to be trained through the at least one difference point.
- the process of automatically generating at least one difference point is to simulate the process of marking the marked point by the user.
- the real segmentation area is the actual segmentation area segmented based on the image to be trained.
- according to the image to be trained, the first predicted segmentation area, the heat map to be trained, and the real segmentation area, the second predicted segmentation area is obtained through the image segmentation model to be trained;
- the model training device obtains four pieces of input information: the image to be trained (the original image), the first predicted segmentation area, the heat map to be trained, and the real segmentation area. These four inputs are then used to train the image segmentation model to be trained; that is, the image to be trained, the first predicted segmentation area, the heat map to be trained, and the real segmentation area are input into the image segmentation model to be trained, and a corresponding second predicted segmentation area is output by the model,
- the second prediction segmentation area is a mask image.
- the target loss function is used to determine the model parameters corresponding to the image segmentation model to be trained
- based on the second predicted segmentation area and the real segmentation area, the model training device uses the target loss function to train the image segmentation model to be trained that is required in the second stage.
- the image set to be trained used in the training phase may include 1464 images to be trained and a total of 3507 instances.
- the original image (for example, 512*512 in resolution),
- the first predicted segmentation area, the real segmentation area, and the heat map to be trained are input into the image segmentation model to be trained for training.
- the image segmentation model to be trained updates the generated mask images of the image set to be trained.
- the model training device determines the model parameters of the image segmentation model to be trained when the target loss function converges, and the model parameters are used to update the image segmentation model to be trained to obtain the second image segmentation model.
- a method for model training: first, a set of images to be trained is obtained; the first predicted segmentation area of the image to be trained is then obtained through the first image segmentation model; a heat map to be trained is generated according to the real segmentation area of the image to be trained and the first predicted segmentation area; the second predicted segmentation area is obtained through the image segmentation model to be trained according to the image to be trained, the first predicted segmentation area, the heat map to be trained, and the real segmentation area; and finally, according to the second predicted segmentation area and the real segmentation area, the target loss function is used to determine the model parameters corresponding to the image segmentation model to be trained, and those model parameters are used to train the image segmentation model to be trained to obtain the second image segmentation model. A minimal sketch follows.
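- A minimal second-stage training loop consistent with the description above; `first_model`, `second_model`, `make_train_heatmap`, and `balanced_loss` are hypothetical names for the components described in this section, not APIs published by the patent:

```python
import torch

def train_second_stage(second_model, first_model, loader, optimizer,
                       balanced_loss, make_train_heatmap, epochs=50):
    """Sketch of second-stage training: the frozen first-stage model produces
    a coarse mask, a heat map is simulated from its errors against the real
    mask, and the second-stage model learns to output a corrected mask."""
    first_model.eval()
    for _ in range(epochs):
        for image, real_mask in loader:             # image: (B, 3, H, W)
            with torch.no_grad():
                first_pred = first_model(image)     # first predicted area (B,1,H,W)
            # Heat map simulated from difference points (see later section).
            heatmap = make_train_heatmap(first_pred, real_mask)
            # Stack image + first prediction + heat map as model input.
            inputs = torch.cat([image, first_pred, heatmap], dim=1)
            second_pred = second_model(inputs)      # second predicted area
            loss = balanced_loss(second_pred, real_mask)  # target loss function
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```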
- the target loss function may be expressed as:
- Loss = Pos_loss*(Neg_num/Total_num) + Neg_loss*(Pos_num/Total_num);
- Loss represents the target loss function
- Pos_loss represents the sum of the positive sample loss of the second prediction segmentation area
- Neg_loss represents the sum of the negative sample loss of the second prediction segmentation area
- Pos_num represents the number of positive samples in the real segmentation area
- Neg_num represents the number of negative samples in the real segmentation area
- Total_num represents the sum of the number of positive samples and the number of negative samples.
- the objective loss function is a cross-entropy loss function using positive and negative balance.
- a probability map is obtained by applying the softmax function to the output of the decoder of the second image segmentation model, and it is compared with the real segmentation area to calculate the loss.
- the loss function in this application uses a positive and negative balanced cross-entropy loss function.
- the positive and negative balanced cross entropy loss function is based on the original standard cross entropy loss function.
- the number of positive and negative samples is considered: by counting the number of positive samples and the number of negative samples in the real segmentation area, the proportion of positive to negative samples is obtained.
- the positive-negative balanced cross-entropy loss function (i.e., the target loss function) can be expressed as follows:
- Loss = Pos_loss*(Neg_num/Total_num) + Neg_loss*(Pos_num/Total_num);
- Loss represents the target loss function
- Pos_loss represents the sum of the positive sample loss of the second prediction segmentation area
- Neg_loss represents the sum of the negative sample loss of the second prediction segmentation area
- Pos_num represents the number of positive samples in the real segmentation area
- Neg_num represents the number of negative samples in the real segmentation area
- Total_num represents the sum of the number of positive samples and the number of negative samples.
- the positive samples are the positive points of the real segmented area (that is, the foreground points)
- the negative samples are the negative points of the real segmented area (that is, the background points).
- a positive-negative balanced cross-entropy loss function is provided, which allows the model to better balance the positive and negative losses during training and prevents the model from tilting toward the class with the larger number of samples, which would cause training to fail, thereby improving the reliability of training. A sketch of this loss follows.
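- A direct transcription of the formula above into code (tensor shapes and names are assumptions; the patent gives only the formula):

```python
import torch
import torch.nn.functional as F

def balanced_bce_loss(pred_logits: torch.Tensor,
                      real_mask: torch.Tensor) -> torch.Tensor:
    """Loss = Pos_loss*(Neg_num/Total_num) + Neg_loss*(Pos_num/Total_num).
    pred_logits and real_mask: (B, 1, H, W); real_mask contains 0/1 values."""
    per_pixel = F.binary_cross_entropy_with_logits(pred_logits, real_mask,
                                                   reduction="none")
    pos = real_mask > 0.5                         # foreground (positive) points
    pos_loss, neg_loss = per_pixel[pos].sum(), per_pixel[~pos].sum()
    pos_num, neg_num = pos.sum().float(), (~pos).sum().float()
    total = pos_num + neg_num
    # Each class's loss is weighted by the other class's sample proportion.
    return pos_loss * (neg_num / total) + neg_loss * (pos_num / total)
```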
- generating the heat map to be trained may include:
- a heat map to be trained is generated according to at least one difference point.
- a method for automatically generating a heat map to be trained is introduced.
- for the second-stage auxiliary segmentation algorithm, it is necessary to use the point of maximum difference between the mask generated by the first-stage auxiliary segmentation algorithm and the real mask of the instance.
- since there is no need to manually mark the maximum difference point during training, the difference point is simulated by the following method.
- Figure 14 is a schematic diagram of an example original image in an embodiment of this application. As shown in the figure, after an original image is read, the pixel values over the real segmented area are extracted from the real segmented area, and then the upper end, lower end, left end, and right end corresponding to the instance are calculated, giving a total of four extreme points.
- the sample image is extracted by a bounding box (Bounding Box, BBox).
- when cropping, the bounding box of the sample image is expanded by 50 pixels on each side to obtain the original instance image, as in the sketch below.
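- A sketch of this simulated annotation step: extreme points computed from a binary instance mask, plus a 50-pixel padded crop (NumPy; function and variable names are hypothetical):

```python
import numpy as np

def extreme_points_and_crop(image: np.ndarray, real_mask: np.ndarray,
                            margin: int = 50):
    """From a binary instance mask, compute the top, bottom, left, and right
    extreme points, then crop the image around the instance's bounding box
    expanded by a 50-pixel margin."""
    ys, xs = np.nonzero(real_mask)            # coordinates of instance pixels
    top    = (xs[np.argmin(ys)], ys.min())    # (x, y) of a topmost pixel
    bottom = (xs[np.argmax(ys)], ys.max())
    left   = (xs.min(), ys[np.argmin(xs)])
    right  = (xs.max(), ys[np.argmax(xs)])
    h, w = real_mask.shape
    y0, y1 = max(ys.min() - margin, 0), min(ys.max() + margin, h)
    x0, x1 = max(xs.min() - margin, 0), min(xs.max() + margin, w)
    return (top, bottom, left, right), image[y0:y1, x0:x1]
```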
- the resolution is 512*512. It can be understood that the above resolution is only an illustration and is not to be understood as a limitation of this application.
- the difference point (which may be a maximum difference point) between the real mask (that is, the real segmentation area) and the generated mask (that is, the first predicted segmentation area) is then determined;
- that is, all the differences between the real mask (the real segmentation area) and the generated mask (the first predicted segmentation area) are considered.
- FIG. 15 is a schematic diagram of an embodiment of selecting difference points in an embodiment of this application.
- first, it is determined whether the pixels at corresponding positions of the real mask (that is, the real segmentation area) and the generated mask (that is, the first predicted segmentation area) are consistent, so as to obtain a difference map, which can be expressed as the S1 area and the S2 area shown in FIG. 15.
- the connected domains of the difference map are computed, and the largest connected domain is taken as candidate area 1. Since the S2 area is larger than the S1 area, the S2 area is taken as candidate area 1.
- at least one difference point is then selected; for example, the D1 point shown in FIG. 15 is a randomly selected difference point. Based on the at least one difference point, the heat map to be trained can be produced.
- a method for automatically generating a heat map to be trained is provided: a difference map is determined according to the real segmentation area and the first predicted segmentation area, the first candidate area and the second candidate area are then determined according to the difference map, at least one difference point is selected according to the first candidate area and the second candidate area, and finally the heat map to be trained is generated according to the at least one difference point. A sketch of this simulation follows.
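- A sketch of the simulation (NumPy/SciPy; the Gaussian width `sigma` is an assumed rendering choice, not a value specified by the patent):

```python
import numpy as np
from scipy import ndimage

def simulate_difference_heatmap(real_mask, pred_mask, sigma=10.0):
    """Difference map = pixels where the real and predicted masks disagree;
    the largest connected region is taken as the candidate area, one point is
    sampled from it, and a 2D Gaussian centered there forms the heat map."""
    diff = (real_mask != pred_mask).astype(np.uint8)      # difference map
    labels, num = ndimage.label(diff)                     # connected domains
    if num == 0:
        return np.zeros(real_mask.shape, dtype=np.float32)
    sizes = ndimage.sum(diff, labels, range(1, num + 1))
    candidate = labels == (np.argmax(sizes) + 1)          # largest region (e.g. S2)
    ys, xs = np.nonzero(candidate)
    i = np.random.randint(len(ys))                        # random point (e.g. D1)
    yy, xx = np.mgrid[0:real_mask.shape[0], 0:real_mask.shape[1]]
    return np.exp(-((yy - ys[i]) ** 2 + (xx - xs[i]) ** 2) / (2 * sigma ** 2))
```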
- FIG. 16 is a schematic diagram of an embodiment of the image processing device in an embodiment of the application.
- the image processing device 30 includes:
- the acquiring module 301 is configured to acquire an image to be segmented, where the image to be segmented includes a plurality of extreme points;
- the generating module 302 is configured to generate first image feature information according to the image to be divided obtained by the obtaining module 301, where the first image feature information includes N image matrices and a first heat map, and the first The heat map is generated based on the multiple extreme points, and the N is an integer greater than or equal to 1;
- the acquisition module 301 is configured to acquire, through a first image segmentation model, the first image segmentation area corresponding to the first image feature information generated by the generation module 302, wherein the first image segmentation model includes a first heat map channel and N first matrix channels, the N first matrix channels have a one-to-one correspondence with the N image matrices, and the first heat map channel has a corresponding relationship with the first heat map;
- the acquiring module 301 is further configured to acquire a second heat map according to the annotation points corresponding to the first image segmentation area and the first image segmentation area;
- the acquisition module 301 is further configured to acquire a second image segmentation area corresponding to the image to be segmented through a second image segmentation model, where the second image segmentation model includes a segmentation area channel, a second heat map channel, and N second matrix channels, the N second matrix channels have a one-to-one correspondence with the N image matrices, the segmented area channels have a corresponding relationship with the first image segmented area, and the second The heat map channel has a corresponding relationship with the second heat map.
- the generating module 302 is configured to generate an image recognition result of the image to be divided according to the second image segmentation area.
- the acquisition module 301 obtains the image to be segmented, wherein the image to be segmented includes a plurality of extreme points, and the generation module 302 generates first image feature information according to the image to be segmented obtained by the acquisition module 301,
- where the first image feature information includes N image matrices and a first heat map,
- the first heat map is generated according to the multiple extreme points,
- and N is an integer greater than or equal to 1;
- the acquisition module 301 acquires, through the first image segmentation model, the first image segmentation area corresponding to the first image feature information generated by the generation module 302, wherein the first image segmentation model includes N first matrix channels and a first heat map channel, the N first matrix channels have a one-to-one correspondence with the N image matrices, and the first heat map channel has a corresponding relationship with the first heat map;
- the acquisition module 301 then obtains a second heat map according to the first image segmentation area, where the second heat map is generated according to annotation points, and the acquisition module 301 acquires the second image segmentation area through the second image segmentation model.
- an image processing device is provided.
- the image segmentation process is divided into two stages.
- through the auxiliary segmentation of the second stage, regions where the first-stage image segmentation performed poorly are further segmented to obtain a more accurate image segmentation result without spending a lot of time correcting it, thereby improving the performance of image segmentation.
- the acquiring module 301 is specifically configured to receive an object labeling instruction for an image to be processed, wherein the image to be processed includes a target object, and the object labeling instruction carries position information of multiple extreme points corresponding to the target object , The multiple extreme points are used to identify the contour edge of the target object;
- the image to be segmented is generated according to the image to be processed.
- the position information of the multiple extreme points includes first extreme point position information, second extreme point position information, third extreme point position information, and fourth extreme point position information that respectively identify positions around the contour edge of the target object.
- a method for marking extreme points is provided.
- the auxiliary segmentation tool can be used to annotate the image to be processed.
- the auxiliary segmentation tool is less difficult to operate and more convenient to use, thereby improving the feasibility and operability of the solution.
- the acquisition module 301 is specifically configured to receive a first labeling instruction, where the first labeling instruction corresponds to M labeling points, the labeling points are located inside the first image segmentation area, and M is an integer greater than or equal to 1;
- the second heat map is generated according to the M labeling points corresponding to the first labeling instruction.
- a method for generating a second heat map based on annotated points is provided.
- an auxiliary segmentation tool can be used to perform secondary annotation on the first image segmentation area obtained by preliminary prediction.
- the segmentation tool is less difficult to operate and more convenient to use.
- the second labeling can generate more accurate image segmentation results, thereby improving the operability and feasibility of the solution.
- the acquisition module 301 is specifically configured to receive a second labeling instruction, where the second labeling instruction corresponds to M labeling points, the labeling points are located outside the first image segmentation area, and M is an integer greater than or equal to 1;
- the second heat map is generated according to the M labeling points corresponding to the second labeling instruction.
- an auxiliary segmentation tool can be used to perform secondary annotation on the first image segmentation area obtained by preliminary prediction.
- auxiliary segmentation tools are less difficult to operate and more convenient to use.
- through secondary labeling, more accurate image segmentation results can be generated, thereby improving the operability and feasibility of the solution. The sketch below illustrates how such labeling points can be rendered into the second heat map.
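- One plausible rendering of M labeling points into the second heat map (a sketch only; the patent does not fix the rendering function, so the Gaussian form and `sigma` are assumptions):

```python
import numpy as np

def heatmap_from_clicks(shape, points, sigma=10.0):
    """Render the second heat map from M annotation points (clicks inside or
    outside the first image segmentation area); each point contributes a 2D
    Gaussian, and overlapping responses are max-combined."""
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]]
    heat = np.zeros(shape, dtype=np.float32)
    for (x, y) in points:  # points given as (x, y) pixel coordinates
        g = np.exp(-((yy - y) ** 2 + (xx - x) ** 2) / (2 * sigma ** 2))
        heat = np.maximum(heat, g)
    return heat
```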
- the N first matrix channels include a red channel, a green channel, and a blue channel.
- In another embodiment of the image processing device 30 provided in this embodiment of the application,
- the generating module 302 is specifically configured to generate the first heat map according to the multiple extreme points in the image to be segmented;
- N image matrices are generated according to the image to be segmented, the N image matrices including a first image matrix corresponding to the red channel, a second image matrix corresponding to the green channel, and a third image matrix corresponding to the blue channel;
- the first image feature information is generated according to the first heat map, the first image matrix, the second image matrix, and the third image matrix.
- a method for generating first image feature information based on the image to be segmented is provided.
- the generated heat map can better provide effective information, thereby improving the feasibility and operability of the solution. A sketch of assembling this input follows.
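- A sketch of assembling the 4-channel input (the three RGB image matrices plus the first heat map), assuming a channel-first layout (the layout is an implementation assumption):

```python
import numpy as np

def build_first_feature_info(image_rgb: np.ndarray,
                             first_heatmap: np.ndarray) -> np.ndarray:
    """Stack the three image matrices (R, G, B channels) and the first heat
    map into the 4-channel input of the first image segmentation model
    (N = 3 matrix channels + 1 heat map channel)."""
    r, g, b = image_rgb[..., 0], image_rgb[..., 1], image_rgb[..., 2]
    return np.stack([r, g, b, first_heatmap], axis=0)  # shape: (4, H, W)
```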
- the N second matrix channels include a red channel, a green channel, and a blue channel.
- the generating module 302 is further configured to generate N image matrices according to the image to be divided, the N image matrices including a first image matrix corresponding to the red channel, and a second image matrix corresponding to the green channel, And a third image matrix corresponding to the blue channel;
- second image feature information is generated according to the first image segmentation area, the second heat map, the first image matrix, the second image matrix, and the third image matrix, wherein the second image feature information is the input information of the second image segmentation model when acquiring the second image segmentation area.
- a method for generating second image feature information based on the image to be segmented is provided.
- the generated heat map can better provide effective information, thereby improving the feasibility and operability of the solution.
- the acquisition module 301 is specifically configured to encode the second image feature information by the encoder of the second image segmentation model to obtain a first feature map and a second feature map, wherein the encoder includes a middle flow module and an atrous depthwise separable convolution,
- the atrous depthwise separable convolution is used to extract the feature maps of the second image feature information,
- the middle flow module is executed repeatedly T times, where T is an integer greater than 8; the first feature map and the second feature map are spliced to obtain a target feature map;
- the target feature map is decoded by the decoder of the second image segmentation model to obtain the second image segmentation area.
- a method for obtaining the second image segmentation area through the second image segmentation model is provided.
- a model structure based on DeeplabV3+ is used to predict the image segmentation area. The overall parameter count of the DeeplabV3+ model structure is small; therefore, it runs faster in both training and actual prediction, can respond to user operations faster when applied to auxiliary segmentation tools, improves usage efficiency, and enhances user stickiness.
- the improved Xception model preserves feature extraction performance while reducing the size of the model, using depthwise separable convolution to increase the segmentation speed.
- the ASPP module constructs convolution and pooling operations with multiple dilation rates to obtain multi-scale information, which helps the model process multi-scale instances.
- the acquisition module 301 is specifically configured to decode the target feature map by the decoder of the second image segmentation model to obtain a first pixel point set and a second pixel point set, wherein the first pixel point set includes a plurality of first pixels and the second pixel point set includes a plurality of second pixels;
- the second image segmentation area is generated according to the first pixel point set and the second pixel point set.
- a method for obtaining a second image segmentation area by decoding using a second image segmentation model is provided.
- the target feature map is decoded by a decoder of the second image segmentation model to obtain the first pixel point set and the second pixel point set, and the second image segmentation area is generated according to the first pixel point set and the second pixel point set.
- FIG. 17 is a schematic diagram of an embodiment of an image processing device in an embodiment of this application.
- the image processing device 40 includes:
- the obtaining module 401 is configured to obtain a set of images to be trained, wherein the set of images to be trained includes at least one image to be trained;
- the acquiring module 401 is further configured to acquire a first predicted segmentation area of an image to be trained through a first image segmentation model, where the first image segmentation model is an image segmentation model obtained by pre-training;
- the generation module 402 is configured to generate a heat map to be trained according to the real segmentation area of the image to be trained and the first predicted segmentation area acquired by the acquisition module 401, wherein the heat map to be trained is generated from at least one difference point;
- the acquisition module 401 is further configured to acquire a second predicted segmentation area through the image segmentation model to be trained according to the image to be trained, the first predicted segmentation area, the heat map to be trained generated by the generation module 402, and the real segmentation area;
- the determining module 403 is configured to use a target loss function to determine the model parameters corresponding to the image segmentation model to be trained according to the second predicted segmented region and the real segmented region acquired by the acquiring module 401;
- the training module 404 is configured to use the model parameters determined by the determining module 403 to train the image segmentation model to be trained to obtain a second image segmentation model.
- the acquisition module 401 acquires a set of images to be trained, wherein the set includes at least one image to be trained, and the acquisition module 401 acquires the first predicted segmentation area of the image to be trained through the first image segmentation model,
- where the first image segmentation model is a pre-trained image segmentation model;
- the generation module 402 generates the heat map to be trained according to the real segmentation area of the image to be trained and the first predicted segmentation area acquired by the acquisition module 401, wherein the heat map to be trained is generated from at least one difference point;
- the acquisition module 401 acquires the second predicted segmentation area through the image segmentation model to be trained according to the image to be trained, the first predicted segmentation area, the heat map to be trained generated by the generation module 402, and the real segmentation area;
- the determination module 403 uses the target loss function to determine the model parameters corresponding to the image segmentation model to be trained according to the second predicted segmentation area and the real segmentation area acquired by the acquisition module 401, and the training module 404 uses the model parameters determined by the determination module 403 to train the image segmentation model to be trained to obtain a second image segmentation model.
- a method for model training is provided. Through the above method, a higher mIOU (mean Intersection Over Union) value can be obtained on the basis of the first-stage segmentation algorithm, so that once the second image segmentation model is trained, image segmentation results can be predicted more accurately based on the first image segmentation model and the second image segmentation model. A sketch of the mIOU metric follows.
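- For reference, a minimal binary mIOU computation (the standard definition, not patent-specific code):

```python
import numpy as np

def miou(pred_mask: np.ndarray, real_mask: np.ndarray) -> float:
    """Mean Intersection Over Union for a binary segmentation, averaged over
    the background (0) and foreground (1) classes."""
    ious = []
    for cls in (0, 1):
        p, r = pred_mask == cls, real_mask == cls
        union = np.logical_or(p, r).sum()
        if union:  # skip a class absent from both masks
            ious.append(np.logical_and(p, r).sum() / union)
    return float(np.mean(ious))
```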
- the objective loss function is expressed as:
- Loss = Pos_loss*(Neg_num/Total_num) + Neg_loss*(Pos_num/Total_num);
- the Loss represents the target loss function
- the Pos_loss represents the sum of the positive sample losses of the second prediction segmentation area
- the Neg_loss represents the sum of the negative sample losses of the second prediction segmentation area
- the Pos_num represents the number of positive samples in the real segmentation area
- the Neg_num represents the number of negative samples in the real segmentation area
- the Total_num represents the sum of the number of positive samples and the number of negative samples.
- a positive-negative balanced cross-entropy loss function is provided, which allows the model to better balance the positive and negative losses during training and prevents the model from tilting toward the class with the larger number of samples, which would cause training to fail, thereby improving the reliability of training.
- the generating module 402 is specifically configured to determine a difference map according to the real segmented area and the first predicted segmented area, where the difference map represents an area where the real segmented area is inconsistent with the first predicted segmented area ;
- the first candidate area and the second candidate area are determined according to the difference map, the at least one difference point is selected according to the first candidate area and the second candidate area, and the heat map to be trained is generated according to the at least one difference point.
- a method for automatically generating a heat map to be trained is provided: a difference map is determined according to the real segmentation area and the first predicted segmentation area, the first candidate area and the second candidate area are then determined according to the difference map, at least one difference point is selected according to the first candidate area and the second candidate area, and finally the heat map to be trained is generated according to the at least one difference point.
- the embodiment of the application also provides another image processing device, as shown in FIG. 18.
- the image processing device is a terminal device for image area recognition.
- the terminal device can be any terminal device including a mobile phone, a tablet computer, a personal digital assistant (PDA), a point of sales (POS), a vehicle-mounted computer, etc.
- take the terminal device being a mobile phone as an example:
- FIG. 18 shows a block diagram of a part of the structure of a mobile phone related to a terminal device provided in an embodiment of the present application.
- the mobile phone includes: a radio frequency (RF) circuit 510, a memory 520, an input unit 530, a display unit 540, a sensor 550, an audio circuit 560, a wireless fidelity (WiFi) module 570, a processor 580, a power supply 590, and other components.
- the RF circuit 510 can be used for receiving and sending signals during information transmission and reception or during a call. In particular, after receiving downlink information from the base station, it delivers the information to the processor 580 for processing; in addition, it sends designed uplink data to the base station.
- the RF circuit 510 includes but is not limited to an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (LNA), a duplexer, and the like.
- the RF circuit 510 can also communicate with the network and other devices through wireless communication.
- the above-mentioned wireless communication can use any communication standard or protocol, including but not limited to Global System of Mobile communication (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (Code Division Multiple) Access, CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), Email, Short Messaging Service (SMS), etc.
- the memory 520 can be used to store software programs and modules.
- the processor 580 executes various functional applications and data processing of the mobile phone by running the software programs and modules stored in the memory 520.
- the memory 520 may mainly include a storage program area and a storage data area.
- the storage program area may store an operating system, application programs required by at least one function (such as a sound playback function or an image playback function), and the like; the storage data area may store data created by the use of the mobile phone (such as audio data and a phone book).
- the memory 520 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
- the input unit 530 can be used to receive inputted number or character information, and generate key signal input related to the user settings and function control of the mobile phone.
- the input unit 530 may include a touch panel 531 and other input devices 532.
- the touch panel 531, also called a touch screen, can collect the user's touch operations on or near it (for example, operations performed by the user on or near the touch panel 531 using a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connection device according to a preset program.
- the touch panel 531 may include two parts: a touch detection device and a touch controller.
- the touch detection device detects the user's touch position, detects the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, sends them to the processor 580, and can receive and execute commands sent by the processor 580.
- the touch panel 531 may be implemented in multiple types such as resistive, capacitive, infrared, and surface acoustic wave.
- the input unit 530 may also include other input devices 532. Specifically, other input devices 532 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control buttons, switch buttons, etc.), trackball, mouse, joystick, and the like.
- the display unit 540 may be used to display information input by the user or information provided to the user and various menus of the mobile phone.
- the display unit 540 may include a display panel 541.
- the display panel 541 may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED), etc.
- the touch panel 531 can cover the display panel 541. When the touch panel 531 detects a touch operation on or near it, it transmits the operation to the processor 580 to determine the type of the touch event, and the processor 580 then provides corresponding visual output on the display panel 541 according to the type of the touch event.
- in FIG. 18, the touch panel 531 and the display panel 541 are used as two independent components to implement the input and output functions of the mobile phone, but in some embodiments, the touch panel 531 and the display panel 541 can be integrated to realize the input and output functions of the mobile phone.
- the mobile phone may also include at least one sensor 550, such as a light sensor, a motion sensor, and other sensors.
- the light sensor can include an ambient light sensor and a proximity sensor.
- the ambient light sensor can adjust the brightness of the display panel 541 according to the brightness of the ambient light.
- the proximity sensor can turn off the display panel 541 and/or the backlight when the mobile phone is moved to the ear.
- as a kind of motion sensor, the accelerometer sensor can detect the magnitude of acceleration in various directions (usually three axes), can detect the magnitude and direction of gravity when stationary, and can be used in applications that recognize mobile phone posture (such as horizontal/vertical screen switching, related games, and magnetometer posture calibration) and in vibration-recognition related functions (such as a pedometer or tapping detection); other sensors that can be configured on the mobile phone, such as gyroscopes, barometers, hygrometers, thermometers, and infrared sensors, are not repeated here.
- the audio circuit 560, the speaker 561, and the microphone 562 can provide an audio interface between the user and the mobile phone.
- the audio circuit 560 can transmit the electrical signal converted from received audio data to the speaker 561, which converts it into a sound signal for output; on the other hand, the microphone 562 converts collected sound signals into electrical signals, which the audio circuit 560 receives and converts into audio data; after the audio data is processed by the output processor 580, it is sent to another mobile phone via the RF circuit 510, or output to the memory 520 for further processing.
- WiFi is a short-distance wireless transmission technology.
- the mobile phone can help users send and receive emails, browse webpages, and access streaming media through the WiFi module 570. It provides users with wireless broadband Internet access.
- although FIG. 18 shows the WiFi module 570, it is understandable that it is not a necessary component of the mobile phone and can be omitted as needed without changing the essence of the invention.
- the processor 580 is the control center of the mobile phone. It connects all parts of the entire mobile phone through various interfaces and lines, and performs the various functions of the mobile phone and processes data by running or executing software programs and/or modules stored in the memory 520 and calling data stored in the memory 520, thereby monitoring the mobile phone as a whole.
- the processor 580 may include one or more processing units; optionally, the processor 580 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, and application programs, and the modem processor mainly handles wireless communication. It can be understood that the foregoing modem processor may alternatively not be integrated into the processor 580.
- the mobile phone also includes a power source 590 (such as a battery) for supplying power to the various components.
- the power source can be logically connected to the processor 580 through a power management system, so that functions such as charging, discharging, and power consumption management can be managed through the power management system.
- the mobile phone may also include a camera, a Bluetooth module, etc., which will not be repeated here.
- the processor 580 included in the terminal device also has the following functions:
- an image to be segmented is acquired, where the image to be segmented includes a plurality of extreme points; first image feature information is generated according to the image to be segmented, where the first image feature information includes N image matrices and a first heat map, and the first heat map is generated based on the plurality of extreme points,
- the N is an integer greater than or equal to 1;
- the first image segmentation region corresponding to the first image feature information is acquired through a first image segmentation model, where the first image segmentation model includes a first heat map channel and N first matrix channels, the N The first matrix channel has a one-to-one correspondence with the N image matrices, and the first heat map channel has a corresponding relationship with the first heat map;
- a second heat map is acquired according to the annotation points corresponding to the first image segmentation area and the first image segmentation area; the second image segmentation area corresponding to the image to be segmented is acquired through a second image segmentation model, where the second image segmentation model includes a segmentation area channel, a second heat map channel, and N second matrix channels.
- the N second matrix channels have a one-to-one correspondence with the N image matrices
- the segmented area channels have a corresponding relationship with the first image segmented areas
- the second heat map channel has a corresponding relationship with the second heat map;
- the image recognition result of the image to be divided is generated according to the second image segmentation area.
- FIG. 19 is a schematic diagram of a server structure provided by an embodiment of the present application.
- the server 600 is a possible implementation form of an image processing device.
- the server 600 may vary considerably due to different configurations or performance, and may include one or more central processing units (CPUs) 622 (for example, one or more processors), a memory 632, and one or more storage media 630 (for example, one or more mass storage devices) for storing application programs 642 or data 644.
- the memory 632 and the storage medium 630 may be short-term storage or persistent storage.
- the program stored in the storage medium 630 may include one or more modules (not shown in the figure), and each module may include a series of command operations on the server.
- the central processing unit 622 may be configured to communicate with the storage medium 630, and execute a series of instruction operations in the storage medium 630 on the server 600.
- the server 600 may also include one or more power supplies 626, one or more wired or wireless network interfaces 650, one or more input/output interfaces 658, and/or one or more operating systems 641, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.
- the steps performed by the server in the above embodiment may be based on the server structure shown in FIG. 19.
- the CPU 622 included in the server also has the following functions:
- a set of images to be trained is acquired, where the set includes at least one image to be trained; a first predicted segmentation area of an image to be trained is acquired through a first image segmentation model, where the first image segmentation model is a pre-trained image segmentation model; a heat map to be trained is generated according to the real segmentation area of the image to be trained and the first predicted segmentation area, where the heat map to be trained is generated from at least one difference point; a second predicted segmentation area is acquired through the image segmentation model to be trained according to the image to be trained, the first predicted segmentation area, the heat map to be trained, and the real segmentation area; the target loss function is used to determine the model parameters corresponding to the image segmentation model to be trained according to the second predicted segmentation area and the real segmentation area; and the model parameters are used to train the image segmentation model to be trained to obtain a second image segmentation model.
- an embodiment of the present application also provides a storage medium, where the storage medium is used to store a computer program, and the computer program is used to execute the method provided in the foregoing embodiment.
- the embodiments of the present application also provide a computer program product including instructions, which when run on a computer, cause the computer to execute the method provided in the foregoing embodiments.
- the disclosed system, device, and method may be implemented in other ways.
- the device embodiments described above are only illustrative.
- the division into units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components can be combined or integrated into another system, or some features can be ignored or not implemented.
- the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
- the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
- each unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
- the above-mentioned integrated unit can be realized in the form of hardware or software functional unit.
- the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer readable storage medium.
- the technical solution of this application, in essence, or the part that contributes to the existing technology, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which can be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the embodiments of this application.
- the aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Claims (17)
- A method for image region recognition, the method being performed by an image processing device, the method comprising: acquiring an image to be segmented, wherein the image to be segmented includes a plurality of extreme points; generating first image feature information according to the image to be segmented, wherein the first image feature information includes N image matrices and a first heat map, the first heat map is generated according to the plurality of extreme points, and N is an integer greater than or equal to 1; acquiring, through a first image segmentation model, a first image segmentation area corresponding to the first image feature information, wherein the first image segmentation model includes a first heat map channel and N first matrix channels, the N first matrix channels have a one-to-one correspondence with the N image matrices, and the first heat map channel has a corresponding relationship with the first heat map; acquiring a second heat map according to annotation points corresponding to the first image segmentation area and the first image segmentation area; acquiring, through a second image segmentation model, a second image segmentation area corresponding to the image to be segmented, wherein the second image segmentation model includes a segmentation area channel, a second heat map channel, and N second matrix channels, the N second matrix channels have a one-to-one correspondence with the N image matrices, the segmentation area channel has a corresponding relationship with the first image segmentation area, and the second heat map channel has a corresponding relationship with the second heat map; and generating an image recognition result of the image to be segmented according to the second image segmentation area.
- The method according to claim 1, wherein acquiring the image to be segmented comprises: receiving an object labeling instruction for an image to be processed, wherein the image to be processed includes a target object, the object labeling instruction carries position information of a plurality of extreme points corresponding to the target object, and the plurality of extreme points are used to identify the contour edge of the target object; and in response to the object labeling instruction, generating the image to be segmented according to the image to be processed.
- The method according to claim 2, wherein the position information of the plurality of extreme points includes first extreme point position information, second extreme point position information, third extreme point position information, and fourth extreme point position information that respectively identify positions around the contour edge of the target object.
- The method according to claim 1, wherein acquiring the second heat map according to the annotation points corresponding to the first image segmentation area and the first image segmentation area comprises: receiving a first labeling instruction, wherein the first labeling instruction corresponds to M labeling points, the labeling points are located inside the first image segmentation area, and M is an integer greater than or equal to 1; and in response to the first labeling instruction, generating the second heat map according to the M labeling points corresponding to the first labeling instruction; or, acquiring the second heat map according to the annotation points corresponding to the first image segmentation area and the first image segmentation area comprises: receiving a second labeling instruction, wherein the second labeling instruction corresponds to M labeling points, the labeling points are located outside the first image segmentation area, and M is an integer greater than or equal to 1; and in response to the second labeling instruction, generating the second heat map according to the M labeling points corresponding to the second labeling instruction.
- The method according to claim 1, wherein the N first matrix channels include a red channel, a green channel, and a blue channel, and generating the first image feature information according to the image to be segmented comprises: generating the first heat map according to the plurality of extreme points in the image to be segmented; generating N image matrices according to the image to be segmented, the N image matrices including a first image matrix corresponding to the red channel, a second image matrix corresponding to the green channel, and a third image matrix corresponding to the blue channel; and generating the first image feature information according to the first heat map, the first image matrix, the second image matrix, and the third image matrix.
- The method according to claim 1, wherein the N second matrix channels include a red channel, a green channel, and a blue channel, and the N image matrices are determined in the following manner: generating N image matrices according to the image to be segmented, the N image matrices including a first image matrix corresponding to the red channel, a second image matrix corresponding to the green channel, and a third image matrix corresponding to the blue channel; and after the second heat map is acquired according to the annotation points corresponding to the first image segmentation area and the first image segmentation area, the method further comprises: generating second image feature information according to the first image segmentation area, the second heat map, the first image matrix, the second image matrix, and the third image matrix, wherein the second image feature information is the input information of the second image segmentation model when the second image segmentation area is acquired.
- The method according to claim 6, wherein acquiring the second image segmentation area corresponding to the image to be segmented through the second image segmentation model comprises: encoding the second image feature information by an encoder of the second image segmentation model to obtain a first feature map and a second feature map, wherein the encoder includes a middle flow module and an atrous depthwise separable convolution, the atrous depthwise separable convolution is used to extract feature maps of the second image feature information, the middle flow module is executed repeatedly T times, and T is an integer greater than 8; splicing the first feature map and the second feature map to obtain a target feature map; and decoding the target feature map by a decoder of the second image segmentation model to obtain the second image segmentation area.
- The method according to claim 7, wherein decoding the target feature map by the decoder of the second image segmentation model to obtain the second image segmentation area comprises: decoding the target feature map by the decoder of the second image segmentation model to obtain a first pixel point set and a second pixel point set, wherein the first pixel point set includes a plurality of first pixels and the second pixel point set includes second pixels; and generating the second image segmentation area according to the first pixel point set and the second pixel point set.
- A method for model training, the method being performed by an image processing device, the method comprising: acquiring a set of images to be trained, wherein the set of images to be trained includes at least one image to be trained; acquiring a first predicted segmentation area of the image to be trained through a first image segmentation model, wherein the first image segmentation model is a pre-trained image segmentation model; generating a heat map to be trained according to a real segmentation area of the image to be trained and the first predicted segmentation area, wherein the heat map to be trained is generated from at least one difference point; acquiring a second predicted segmentation area through an image segmentation model to be trained according to the image to be trained, the first predicted segmentation area, the heat map to be trained, and the real segmentation area; determining model parameters corresponding to the image segmentation model to be trained by using a target loss function according to the second predicted segmentation area and the real segmentation area; and training the image segmentation model to be trained by using the model parameters to obtain a second image segmentation model.
- The method according to claim 9, wherein the target loss function is expressed as: Loss = Pos_loss*(Neg_num/Total_num) + Neg_loss*(Pos_num/Total_num); where Loss represents the target loss function, Pos_loss represents the sum of positive sample losses of the second predicted segmentation area, Neg_loss represents the sum of negative sample losses of the second predicted segmentation area, Pos_num represents the number of positive samples in the real segmentation area, Neg_num represents the number of negative samples in the real segmentation area, and Total_num represents the sum of the number of positive samples and the number of negative samples.
- The method according to claim 9, wherein generating the heat map to be trained according to the real segmentation area of the image to be trained and the first predicted segmentation area comprises: determining a difference map according to the real segmentation area and the first predicted segmentation area, wherein the difference map represents the area where the real segmentation area and the first predicted segmentation area are inconsistent; determining a first candidate area and a second candidate area according to the difference map; selecting the at least one difference point according to the first candidate area and the second candidate area; and generating the heat map to be trained according to the at least one difference point.
- An image processing device, comprising: an acquisition module configured to acquire an image to be segmented, wherein the image to be segmented includes a plurality of extreme points; and a generation module configured to generate first image feature information according to the image to be segmented acquired by the acquisition module, wherein the first image feature information includes N image matrices and a first heat map, the first heat map is generated according to the plurality of extreme points, and N is an integer greater than or equal to 1; the acquisition module being configured to acquire, through a first image segmentation model, a first image segmentation area corresponding to the first image feature information generated by the generation module, wherein the first image segmentation model includes a first heat map channel and N first matrix channels, the N first matrix channels have a one-to-one correspondence with the N image matrices, and the first heat map channel has a corresponding relationship with the first heat map; the acquisition module being further configured to acquire a second heat map according to annotation points corresponding to the first image segmentation area and the first image segmentation area; the acquisition module being further configured to acquire, through a second image segmentation model, a second image segmentation area corresponding to the image to be segmented, wherein the second image segmentation model includes a segmentation area channel, a second heat map channel, and N second matrix channels, the N second matrix channels have a one-to-one correspondence with the N image matrices, the segmentation area channel has a corresponding relationship with the first image segmentation area, and the second heat map channel has a corresponding relationship with the second heat map; and the generation module being configured to generate an image recognition result of the image to be segmented according to the second image segmentation area.
- An image processing device, comprising: an acquisition module configured to acquire a set of images to be trained, wherein the set of images to be trained includes at least one image to be trained; the acquisition module being further configured to acquire a first predicted segmentation area of an image to be trained through a first image segmentation model, wherein the first image segmentation model is a pre-trained image segmentation model; a generation module configured to generate a heat map to be trained according to a real segmentation area of the image to be trained and the first predicted segmentation area acquired by the acquisition module, wherein the heat map to be trained is generated from at least one difference point; the acquisition module being further configured to acquire a second predicted segmentation area through an image segmentation model to be trained according to the image to be trained, the first predicted segmentation area, the heat map to be trained generated by the generation module, and the real segmentation area; a determination module configured to determine model parameters corresponding to the image segmentation model to be trained by using a target loss function according to the second predicted segmentation area and the real segmentation area acquired by the acquisition module; and a training module configured to train the image segmentation model to be trained by using the model parameters determined by the determination module to obtain a second image segmentation model.
- A terminal device, comprising: a memory, a transceiver, a processor, and a bus system; wherein the memory is configured to store a program; the processor is configured to execute the program in the memory, including the following steps: acquiring an image to be segmented, wherein the image to be segmented includes a plurality of extreme points; generating first image feature information according to the image to be segmented, wherein the first image feature information includes N image matrices and a first heat map, the first heat map is generated according to the plurality of extreme points, and N is an integer greater than or equal to 1; acquiring, through a first image segmentation model, a first image segmentation area corresponding to the first image feature information, wherein the first image segmentation model includes a first heat map channel and N first matrix channels, the N first matrix channels have a one-to-one correspondence with the N image matrices, and the first heat map channel has a corresponding relationship with the first heat map; acquiring a second heat map according to annotation points corresponding to the first image segmentation area and the first image segmentation area; acquiring, through a second image segmentation model, a second image segmentation area corresponding to the image to be segmented, wherein the second image segmentation model includes a segmentation area channel, a second heat map channel, and N second matrix channels, the N second matrix channels have a one-to-one correspondence with the N image matrices, the segmentation area channel has a corresponding relationship with the first image segmentation area, and the second heat map channel has a corresponding relationship with the second heat map; and generating an image recognition result of the image to be segmented according to the second image segmentation area; and the bus system is configured to connect the memory and the processor so that the memory and the processor communicate.
- A server, comprising: a memory, a transceiver, a processor, and a bus system; wherein the memory is configured to store a program; the processor is configured to execute the program in the memory, including the following steps: acquiring a set of images to be trained, wherein the set of images to be trained includes at least one image to be trained; acquiring a first predicted segmentation area of an image to be trained through a first image segmentation model, wherein the first image segmentation model is a pre-trained image segmentation model; generating a heat map to be trained according to a real segmentation area of the image to be trained and the first predicted segmentation area, wherein the heat map to be trained is generated from at least one difference point; acquiring a second predicted segmentation area through an image segmentation model to be trained according to the image to be trained, the first predicted segmentation area, the heat map to be trained, and the real segmentation area; determining model parameters corresponding to the image segmentation model to be trained by using a target loss function according to the second predicted segmentation area and the real segmentation area; and training the image segmentation model to be trained by using the model parameters to obtain a second image segmentation model; and the bus system is configured to connect the memory and the processor so that the memory and the processor communicate.
- A computer-readable storage medium storing a computer program, the computer program being configured to execute the method according to any one of claims 1 to 8, or to execute the method according to any one of claims 9 to 11.
- A computer program product including instructions which, when run on a computer, cause the computer to execute the method according to any one of claims 1 to 8, or to execute the method according to any one of claims 9 to 11.
Priority Applications (3)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP20827039.7A | 2019-06-20 | 2020-06-16 | Method and device for image area recognition based on artificial intelligence, and model training method and device |
| JP2021537734A | 2019-06-20 | 2020-06-16 | Image region recognition method based on artificial intelligence, model training method, image processing device, terminal device, server, computer device, and computer program |
| US17/395,329 | 2019-06-20 | 2021-08-05 | AI-based image region recognition method and apparatus and AI-based model training method and apparatus |

Applications Claiming Priority (2)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201910537529.XA | 2019-06-20 | 2019-06-20 | Image region segmentation method, model training method, and apparatus |
| CN201910537529.X | 2019-06-20 | | |

Related Child Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US17/395,329 (Continuation) | AI-based image region recognition method and apparatus and AI-based model training method and apparatus | 2019-06-20 | 2021-08-05 |

Publications (1)

| Publication Number | Publication Date |
|---|---|
| WO2020253663A1 | 2020-12-24 |

Family: ID=67856917

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2020/096237 | Image region recognition method based on artificial intelligence, and model training method and apparatus | 2019-06-20 | 2020-06-16 |

Country Status (5)

| Country | Link |
|---|---|
| US | US11983881B2 |
| CN | CN110232696B |
| EP | EP3989166A4 |
| JP | JP7238139B2 |
| WO | WO2020253663A1 |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116188995A (zh) * | 2023-04-13 | 2023-05-30 | 国家基础地理信息中心 | 一种遥感图像特征提取模型训练方法、检索方法及装置 |
CN116563615A (zh) * | 2023-04-21 | 2023-08-08 | 南京讯思雅信息科技有限公司 | 基于改进多尺度注意力机制的不良图片分类方法 |
CN114742988B (zh) * | 2022-03-14 | 2024-06-28 | 上海人工智能创新中心 | 多阶段检测器进行点标注到框标注转换的方法 |
Families Citing this family (31)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110232696B (zh) | 2019-06-20 | 2024-03-08 | 腾讯科技(深圳)有限公司 | 一种图像区域分割的方法、模型训练的方法及装置 |
KR20210042696A (ko) * | 2019-10-10 | 2021-04-20 | 삼성전자주식회사 | 모델 학습 방법 및 장치 |
CN110826449A (zh) * | 2019-10-30 | 2020-02-21 | 杭州叙简科技股份有限公司 | 基于轻量型卷积神经网络的非机动车再识别目标检索方法 |
CN110910405B (zh) * | 2019-11-20 | 2023-04-18 | 湖南师范大学 | 基于多尺度空洞卷积神经网络的脑肿瘤分割方法及系统 |
CN110889858A (zh) * | 2019-12-03 | 2020-03-17 | 中国太平洋保险(集团)股份有限公司 | 一种基于点回归的汽车部件分割方法及装置 |
CN111210439B (zh) * | 2019-12-26 | 2022-06-24 | 中国地质大学(武汉) | 通过抑制非感兴趣信息的语义分割方法、设备及存储设备 |
CN111259900A (zh) * | 2020-01-13 | 2020-06-09 | 河海大学 | 一种卫星遥感图像的语义分割方法 |
CN111325714B (zh) * | 2020-01-21 | 2024-03-26 | 上海联影智能医疗科技有限公司 | 感兴趣区域的处理方法、计算机设备和可读存储介质 |
CN113221897B (zh) * | 2020-02-06 | 2023-04-18 | 马上消费金融股份有限公司 | 图像矫正方法、图像文本识别方法、身份验证方法及装置 |
CN111445440B (zh) * | 2020-02-20 | 2023-10-31 | 上海联影智能医疗科技有限公司 | 一种医学图像分析方法、设备和存储介质 |
JP7446903B2 (ja) * | 2020-04-23 | 2024-03-11 | 株式会社日立製作所 | 画像処理装置、画像処理方法及び画像処理システム |
CN111582104B (zh) * | 2020-04-28 | 2021-08-06 | 中国科学院空天信息创新研究院 | 基于自注意特征聚合网络的遥感图像语义分割方法及装置 |
CN111598900B (zh) * | 2020-05-18 | 2022-08-09 | 腾讯医疗健康(深圳)有限公司 | 一种图像区域分割模型训练方法、分割方法和装置 |
US11823379B2 (en) * | 2020-08-05 | 2023-11-21 | Ping An Technology (Shenzhen) Co., Ltd. | User-guided domain adaptation for rapid annotation from user interactions for pathological organ segmentation |
CN112116612A (zh) * | 2020-09-15 | 2020-12-22 | 南京林业大学 | 基于Mask R-CNN的行道树图像实例分割方法 |
CN112258431B (zh) * | 2020-09-27 | 2021-07-20 | 成都东方天呈智能科技有限公司 | 基于混合深度可分离膨胀卷积的图像分类模型及其分类方法 |
CN112634282B (zh) * | 2020-12-18 | 2024-02-13 | 北京百度网讯科技有限公司 | 图像处理方法、装置以及电子设备 |
CN112633148B (zh) * | 2020-12-22 | 2022-08-09 | 杭州景联文科技有限公司 | 一种签名指印真假检测方法及系统 |
CN112529894B (zh) * | 2020-12-22 | 2022-02-15 | 徐州医科大学 | 一种基于深度学习网络的甲状腺结节的诊断方法 |
CN113538456B (zh) * | 2021-06-22 | 2022-03-18 | 复旦大学 | 基于gan网络的图像软分割及背景替换系统 |
CN113608805B (zh) * | 2021-07-08 | 2024-04-12 | 阿里巴巴创新公司 | 掩膜预测方法、图像处理方法、显示方法及设备 |
CN113989251B (zh) * | 2021-11-02 | 2022-05-24 | 河南中平自动化股份有限公司 | 一种矿用煤矸分选智能控制系统及方法 |
CN113850249A (zh) * | 2021-12-01 | 2021-12-28 | 深圳市迪博企业风险管理技术有限公司 | 一种图表信息格式化提取方法 |
CN114187318B (zh) * | 2021-12-10 | 2023-05-05 | 北京百度网讯科技有限公司 | 图像分割的方法、装置、电子设备以及存储介质 |
CN114049569B (zh) * | 2022-01-13 | 2022-03-18 | 自然资源部第三地理信息制图院 | 一种深度学习模型性能评价方法及系统 |
CN116934769A (zh) * | 2022-03-29 | 2023-10-24 | 北京字跳网络技术有限公司 | 交互式分割模型训练方法、标注数据生成方法及设备 |
CN114918944A (zh) * | 2022-06-02 | 2022-08-19 | 哈尔滨理工大学 | 基于卷积神经网络融合的家庭服务机器人抓取检测方法 |
CN115272288B (zh) * | 2022-08-22 | 2023-06-02 | 杭州微引科技有限公司 | 一种医学图像标记点自动识别方法、电子设备及存储介质 |
CN115861739B (zh) * | 2023-02-08 | 2023-07-14 | 海纳云物联科技有限公司 | 图像分割模型的训练方法、装置、设备、存储介质及产品 |
CN116020122B (zh) * | 2023-03-24 | 2023-06-09 | 深圳游禧科技有限公司 | 游戏攻略推荐方法、装置、设备及存储介质 |
CN116071376B (zh) * | 2023-04-04 | 2023-06-20 | 江苏势通生物科技有限公司 | 图像分割方法及相关装置、设备和存储介质 |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080240564A1 (en) * | 2007-03-29 | 2008-10-02 | Siemens Corporate Research, Inc. | Fast 4D Segmentation of Large Datasets Using Graph Cuts |
CN102982529A (zh) * | 2011-08-31 | 2013-03-20 | 奥林巴斯株式会社 | 图像处理装置及图像处理方法 |
US20160098589A1 (en) * | 2014-08-29 | 2016-04-07 | Definiens Ag | Applying Pixelwise Descriptors to a Target Image that are Generated by Segmenting Objects in Other Images |
US20170124415A1 (en) * | 2015-11-04 | 2017-05-04 | Nec Laboratories America, Inc. | Subcategory-aware convolutional neural networks for object detection |
CN107657619A (zh) * | 2017-10-13 | 2018-02-02 | 西安科技大学 | 一种低照度林火图像分割方法 |
CN110210487A (zh) * | 2019-05-30 | 2019-09-06 | 上海商汤智能科技有限公司 | 一种图像分割方法及装置、电子设备和存储介质 |
CN110232696A (zh) * | 2019-06-20 | 2019-09-13 | 腾讯科技(深圳)有限公司 | 一种图像区域分割的方法、模型训练的方法及装置 |
CN110276344A (zh) * | 2019-06-04 | 2019-09-24 | 腾讯科技(深圳)有限公司 | 一种图像分割的方法、图像识别的方法以及相关装置 |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9740957B2 (en) * | 2014-08-29 | 2017-08-22 | Definiens Ag | Learning pixel visual context from object characteristics to generate rich semantic images |
EP3679581A4 (en) * | 2017-08-31 | 2021-09-01 | Zeto, Inc. | HIGH-RESOLUTION ELECTROENCEPHALOGRAPHY DATA HOSTING PROCESS |
2019
- 2019-06-20: CN application CN201910537529.XA filed; published as CN110232696B (status: active)
2020
- 2020-06-16: EP application EP20827039.7A filed; published as EP3989166A4 (status: pending)
- 2020-06-16: JP application 2021537734 filed; published as JP7238139B2 (status: active)
- 2020-06-16: WO application PCT/CN2020/096237 filed as WO2020253663A1 (application filing)
2021
- 2021-08-05: US application US17/395,329 filed; published as US11983881B2 (status: active)
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080240564A1 (en) * | 2007-03-29 | 2008-10-02 | Siemens Corporate Research, Inc. | Fast 4D Segmentation of Large Datasets Using Graph Cuts |
CN102982529A (zh) * | 2011-08-31 | 2013-03-20 | Olympus Corporation | Image processing apparatus and image processing method |
US20160098589A1 (en) * | 2014-08-29 | 2016-04-07 | Definiens Ag | Applying Pixelwise Descriptors to a Target Image that are Generated by Segmenting Objects in Other Images |
US20170124415A1 (en) * | 2015-11-04 | 2017-05-04 | Nec Laboratories America, Inc. | Subcategory-aware convolutional neural networks for object detection |
CN107657619A (zh) * | 2017-10-13 | 2018-02-02 | Xi'an University of Science and Technology | Low-illumination forest fire image segmentation method |
CN110210487A (zh) * | 2019-05-30 | 2019-09-06 | Shanghai SenseTime Intelligent Technology Co., Ltd. | Image segmentation method and apparatus, electronic device, and storage medium |
CN110276344A (zh) * | 2019-06-04 | 2019-09-24 | Tencent Technology (Shenzhen) Co., Ltd. | Image segmentation method, image recognition method, and related apparatus |
CN110232696A (zh) * | 2019-06-20 | 2019-09-13 | Tencent Technology (Shenzhen) Co., Ltd. | Image region segmentation method, model training method, and apparatus |
Non-Patent Citations (1)
Title |
---|
See also references of EP3989166A4 * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114742988B (zh) * | 2022-03-14 | 2024-06-28 | Shanghai Artificial Intelligence Innovation Center | Method for converting point annotations into box annotations using a multi-stage detector |
CN116188995A (zh) * | 2023-04-13 | 2023-05-30 | National Geomatics Center of China | Remote sensing image feature extraction model training method, retrieval method, and apparatus |
CN116188995B (zh) * | 2023-04-13 | 2023-08-15 | National Geomatics Center of China | Remote sensing image feature extraction model training method, retrieval method, and apparatus |
CN116563615A (zh) * | 2023-04-21 | 2023-08-08 | Nanjing Xunsiya Information Technology Co., Ltd. | Inappropriate image classification method based on an improved multi-scale attention mechanism |
CN116563615B (zh) * | 2023-04-21 | 2023-11-07 | Nanjing Xunsiya Information Technology Co., Ltd. | Inappropriate image classification method based on an improved multi-scale attention mechanism |
Also Published As
Publication number | Publication date |
---|---|
US20210366123A1 (en) | 2021-11-25 |
CN110232696B (zh) | 2024-03-08 |
EP3989166A4 (en) | 2022-08-17 |
EP3989166A1 (en) | 2022-04-27 |
JP2022515620A (ja) | 2022-02-21 |
JP7238139B2 (ja) | 2023-03-13 |
US11983881B2 (en) | 2024-05-14 |
CN110232696A (zh) | 2019-09-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020253663A1 (zh) | Image region recognition method based on artificial intelligence, model training method, and apparatus | |
WO2020244373A1 (zh) | Image recognition method based on artificial intelligence and related apparatus | |
JP7090971B2 (ja) | Image fusion method, model training method, image fusion apparatus, model training apparatus, terminal device, server device, and computer program |
JP7408048B2 (ja) | Artificial-intelligence-based animated character driving method and related apparatus |
CN109635621B (zh) | System and method for recognizing gestures based on deep learning in a first-person view |
EP3940638A1 (en) | Image region positioning method, model training method, and related apparatus |
WO2020192471A1 (zh) | Image classification model training method, image processing method, and apparatus |
CN110490213B (zh) | Image recognition method, apparatus, and storage medium |
WO2020103721A1 (zh) | Information processing method, apparatus, and storage medium |
CN110704661B (zh) | Image classification method and apparatus |
CN112990390B (zh) | Training method for an image recognition model, image recognition method, and apparatus |
CN110555337B (zh) | Method and apparatus for detecting an indicated object, and related device |
CN111209423B (zh) | Image management method based on an electronic album, apparatus, and storage medium |
CN110517339B (zh) | Artificial-intelligence-based animated image driving method and apparatus |
CN113421547B (zh) | Speech processing method and related device |
CN113723378B (zh) | Model training method, apparatus, computer device, and storage medium |
CN115471662B (zh) | Training method, recognition method, apparatus, and storage medium for a semantic segmentation model |
CN111914106B (zh) | Texture and normal library construction method, texture and normal map generation method, and apparatus |
CN116543076B (zh) | Image processing method, apparatus, electronic device, and storage medium |
CN113723168A (zh) | Subject recognition method based on artificial intelligence, related apparatus, and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 20827039; Country of ref document: EP; Kind code of ref document: A1 |
 | ENP | Entry into the national phase | Ref document number: 2021537734; Country of ref document: JP; Kind code of ref document: A |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | WWE | WIPO information: entry into national phase | Ref document number: 2020827039; Country of ref document: EP |
 | ENP | Entry into the national phase | Ref document number: 2020827039; Country of ref document: EP; Effective date: 20220120 |