WO2020244373A1 - Artificial intelligence-based image recognition method and related apparatus - Google Patents
- Publication number: WO2020244373A1 (PCT/CN2020/090787)
- Authority: WIPO (PCT)
- Prior art keywords: image, heat map, image segmentation, information, matrix
Classifications
- G06T7/11 — Region-based segmentation
- G06T7/149 — Segmentation; edge detection involving deformable models, e.g. active contour models
- G06V10/25 — Determination of region of interest [ROI] or a volume of interest [VOI]
- G06V10/26 — Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
- G06V10/40 — Extraction of image or video features
- G06V10/44 — Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
- G06V10/462 — Salient features, e.g. scale invariant feature transforms [SIFT]
- G06V10/48 — Extraction of image or video features by mapping characteristic values of the pattern into a parameter space, e.g. Hough transformation
- G06V10/82 — Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
- G06T2207/20084 — Artificial neural networks [ANN]
Definitions
- This application relates to the field of artificial intelligence, and in particular to image recognition.
- image segmentation technology has become more and more widely used, for example in medical image segmentation and natural image segmentation.
- image segmentation technology refers to technology that divides an image into several specific regions with unique properties and extracts the objects of interest. For example, in a human-tissue image segmentation scenario, a medical image can be segmented so that the various tissues of the human body are clearly distinguishable in the segmented image.
- the embodiments of this application provide an artificial-intelligence-based image recognition method and related devices, which use a heat map generated from extreme points as part of the image feature information to enrich the features of the image, thereby generating more accurate image segmentation regions and improving the versatility and applicability of image segmentation.
- a first aspect of the present application provides an image segmentation method, including:
- an image to be segmented is acquired, wherein the image to be segmented includes multiple extreme points;
- image feature information is generated according to the image to be segmented, wherein the image feature information includes N image matrices and a heat map, the heat map is generated based on the multiple extreme points, and N is an integer greater than or equal to 1;
- the image segmentation area corresponding to the image feature information is acquired through an image segmentation model, where the image segmentation model includes N matrix input channels and one heat map input channel, the N matrix input channels have a one-to-one correspondence with the N image matrices, and the heat map input channel has a corresponding relationship with the heat map;
- the image recognition result of the image to be segmented is generated according to the image segmentation area.
- the second aspect of the present application provides an image recognition method, including:
- an object labeling instruction for an image to be processed is received, wherein the image to be processed includes a target object, and the object labeling instruction carries position information of multiple extreme points corresponding to the target object;
- in response to the object labeling instruction, an image to be segmented is generated according to the image to be processed;
- image feature information is generated according to the image to be segmented, wherein the image feature information includes N image matrices and a heat map, the heat map is generated based on the multiple extreme points, and N is an integer greater than or equal to 1;
- the image segmentation area corresponding to the image feature information is acquired through an image segmentation model, where the image segmentation model includes N matrix input channels and one heat map input channel, the N matrix input channels have a one-to-one correspondence with the N image matrices, and the heat map input channel has a corresponding relationship with the heat map;
- a third aspect of the present application provides an image recognition device, including:
- an acquisition module for acquiring an image to be segmented, wherein the image to be segmented includes a plurality of extreme points;
- a generating module configured to generate image feature information according to the image to be segmented obtained by the acquisition module, wherein the image feature information includes N image matrices and a heat map, the heat map is generated based on the multiple extreme points, and N is an integer greater than or equal to 1;
- the acquisition module is further configured to acquire, through the image segmentation model, the image segmentation area corresponding to the image feature information generated by the generating module, wherein the image segmentation model includes N matrix input channels and one heat map input channel, the N matrix input channels have a one-to-one correspondence with the N image matrices, and the heat map input channel has a corresponding relationship with the heat map;
- the generating module is further configured to generate an image recognition result of the image to be segmented according to the image segmentation area acquired by the acquisition module.
- the acquisition module is specifically configured to display an image to be processed, wherein the image to be processed includes a target object;
- receive an object labeling instruction, wherein the object labeling instruction carries position information of a plurality of extreme points corresponding to the target object, and the plurality of extreme points are used to identify the contour edge of the target object;
- the image to be segmented is generated according to the image to be processed.
- the position information of the multiple extreme points includes first extreme point position information, second extreme point position information, third extreme point position information, and fourth extreme point position information around the contour edge of the target object.
- the image segmentation device further includes a receiving module and a processing module;
- the receiving module is configured to receive a first adjustment instruction for a first vertex, where the first vertex belongs to an edge point of the image segmentation area, and the first vertex corresponds to first position information;
- the processing module is configured to, in response to the first adjustment instruction received by the receiving module, perform reduction processing on the image segmentation area to obtain a target segmentation area, wherein the target segmentation area includes a second vertex obtained by adjusting the first vertex, the second vertex corresponds to second position information, and the second position information is different from the first position information.
- the image segmentation device further includes the receiving module and the processing module;
- the receiving module is further configured to receive a second adjustment instruction for a third vertex, where the third vertex belongs to the image segmentation area;
- the processing module is further configured to, in response to the second adjustment instruction received by the receiving module, perform magnification processing on the image segmentation area to obtain a target segmentation area, wherein the target segmentation area includes a fourth vertex obtained by adjusting the third vertex.
- the N matrix input channels include a red input channel, a green input channel, and a blue input channel;
- the generating module is specifically configured to generate the heat map according to the multiple extreme points in the image to be segmented;
- and generate N image matrices according to the image to be segmented, the N image matrices including a first image matrix corresponding to the red input channel, a second image matrix corresponding to the green input channel, and a third image matrix corresponding to the blue input channel.
- the acquisition module is specifically configured to encode the image feature information through the encoder of the image segmentation model to obtain a first feature map and a second feature map, which are combined to obtain a target feature map;
- the target feature map is decoded by the decoder of the image segmentation model to obtain the image segmentation area.
- the acquisition module is specifically configured to decode the target feature map through the decoder of the image segmentation model to obtain a first pixel point set and a second pixel point set, wherein the first pixel point set includes multiple first pixel points, and the second pixel point set includes multiple second pixel points;
- the image segmentation area is generated according to the first pixel point set and the second pixel point set.
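As a concrete illustration of how two pixel sets could be turned into a segmentation area, the NumPy sketch below builds a binary mask. The convention that the first set holds foreground pixels and the second set holds background pixels is an assumption for illustration only; the claims do not specify it.

```python
import numpy as np

def build_segmentation_area(shape, first_pixel_set, second_pixel_set):
    """Build a binary segmentation mask from two pixel sets.

    Assumed convention: the first pixel set holds (row, col) coordinates
    of pixels inside the target object, the second set holds background
    pixels; everything else defaults to background.
    """
    mask = np.zeros(shape, dtype=np.uint8)
    for r, c in first_pixel_set:   # foreground pixels -> 1
        mask[r, c] = 1
    for r, c in second_pixel_set:  # background pixels -> 0
        mask[r, c] = 0
    return mask

area = build_segmentation_area(
    (4, 4),
    first_pixel_set=[(1, 1), (1, 2), (2, 1), (2, 2)],
    second_pixel_set=[(0, 0), (3, 3)],
)
```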
- the image segmentation device further includes the processing module and the determining module;
- the processing module is further configured to, after the acquisition module acquires the image segmentation area corresponding to the image feature information through the image segmentation model, process the image to be segmented through a polygon fitting function to obtain polygon vertex information, wherein the polygon vertex information includes position information of multiple vertices;
- the determining module is configured to determine a target object from the image to be segmented according to the polygon vertex information processed by the processing module.
- a fourth aspect of the present application provides an image recognition device, including:
- a receiving module configured to receive an object labeling instruction for an image to be processed, wherein the image to be processed includes a target object, and the object labeling instruction carries position information of multiple extreme points corresponding to the target object;
- a generating module configured to generate an image to be segmented according to the image to be processed in response to the object labeling instruction received by the receiving module;
- the generating module is further configured to generate image feature information according to the image to be segmented, wherein the image feature information includes N image matrices and a heat map, the heat map is generated based on the multiple extreme points, and N is an integer greater than or equal to 1;
- the acquiring module is configured to acquire, through the image segmentation model, the image segmentation area corresponding to the image feature information generated by the generating module, wherein the image segmentation model includes N matrix input channels and one heat map input channel, the N matrix input channels have a one-to-one correspondence with the N image matrices, and the heat map input channel has a corresponding relationship with the heat map;
- a processing module configured to process the image to be segmented through a polygon fitting function to obtain polygon vertex information, wherein the polygon vertex information includes position information of multiple vertices;
- a display module configured to display the target object in the image to be segmented according to the polygon vertex information processed by the processing module.
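A "polygon fitting function" of the kind referenced here is commonly implemented with the Ramer-Douglas-Peucker algorithm (this is what OpenCV's `cv2.approxPolyDP` does). The pure-Python sketch below is illustrative only; the patent does not name a specific algorithm.

```python
import math

def _point_line_dist(p, a, b):
    # Perpendicular distance from point p to the line through a and b.
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * px - dx * py + bx * ay - by * ax) / math.hypot(dx, dy)

def approx_polygon(points, epsilon):
    """Ramer-Douglas-Peucker simplification of a contour into polygon vertices."""
    if len(points) < 3:
        return list(points)
    # Find the point farthest from the chord between the endpoints.
    dmax, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = _point_line_dist(points[i], points[0], points[-1])
        if d > dmax:
            dmax, index = d, i
    if dmax > epsilon:
        # Keep that point and recurse on both halves.
        left = approx_polygon(points[: index + 1], epsilon)
        right = approx_polygon(points[index:], epsilon)
        return left[:-1] + right
    return [points[0], points[-1]]

# A straight run of contour points collapses to its two endpoints.
contour = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
vertices = approx_polygon(contour, epsilon=0.5)
```

The `epsilon` tolerance controls how aggressively the contour is simplified; larger values yield fewer polygon vertices.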
- a fifth aspect of the present application provides a terminal device, including: a memory, a transceiver, a processor, and a bus system;
- the memory is used to store programs
- the processor is used to execute the program in the memory, including the following steps:
- an image to be segmented is acquired, wherein the image to be segmented includes multiple extreme points;
- image feature information is generated according to the image to be segmented, wherein the image feature information includes N image matrices and a heat map, the heat map is generated based on the multiple extreme points, and N is an integer greater than or equal to 1;
- the image segmentation area corresponding to the image feature information is acquired through an image segmentation model, where the image segmentation model includes N matrix input channels and one heat map input channel, the N matrix input channels have a one-to-one correspondence with the N image matrices, and the heat map input channel has a corresponding relationship with the heat map;
- the bus system is used to connect the memory and the processor, so that the memory and the processor communicate.
- a sixth aspect of the present application provides a server, including: a memory, a transceiver, a processor, and a bus system;
- the memory is used to store programs
- the processor is used to execute the program in the memory, including the following steps:
- an image to be segmented is acquired, wherein the image to be segmented includes multiple extreme points;
- image feature information is generated according to the image to be segmented, wherein the image feature information includes N image matrices and a heat map, the heat map is generated based on the multiple extreme points, and N is an integer greater than or equal to 1;
- the image segmentation area corresponding to the image feature information is acquired through an image segmentation model, where the image segmentation model includes N matrix input channels and one heat map input channel, the N matrix input channels have a one-to-one correspondence with the N image matrices, and the heat map input channel has a corresponding relationship with the heat map;
- the bus system is used to connect the memory and the processor, so that the memory and the processor communicate.
- a seventh aspect of the present application provides a terminal device, including: a memory, a transceiver, a processor, and a bus system;
- the memory is used to store programs
- the processor is used to execute the program in the memory, including the following steps:
- an object labeling instruction for an image to be processed is received, wherein the image to be processed includes a target object, and the object labeling instruction carries position information of multiple extreme points corresponding to the target object;
- in response to the object labeling instruction, an image to be segmented is generated according to the image to be processed;
- image feature information is generated according to the image to be segmented, wherein the image feature information includes N image matrices and a heat map, the heat map is generated based on the multiple extreme points, and N is an integer greater than or equal to 1;
- the image segmentation area corresponding to the image feature information is acquired through an image segmentation model, where the image segmentation model includes N matrix input channels and one heat map input channel, the N matrix input channels have a one-to-one correspondence with the N image matrices, and the heat map input channel has a corresponding relationship with the heat map;
- the bus system is used to connect the memory and the processor, so that the memory and the processor communicate.
- the eighth aspect of the present application provides a computer-readable storage medium, where the storage medium is used to store a computer program, and the computer program is used to execute the methods described in each of the foregoing aspects.
- the ninth aspect of the present application provides a computer program product including instructions, which when run on a computer, causes the computer to execute the method described in the above aspect.
- a method for image segmentation is provided.
- the image to be segmented is first obtained, where the image to be segmented includes multiple extreme points, and then image feature information is generated according to the image to be segmented, where the image feature information includes N image matrices and a heat map.
- the heat map is generated based on multiple extreme points, and then the image segmentation area corresponding to the image feature information is obtained through the image segmentation model.
- the image segmentation model includes N matrix input channels and one heat map input channel; the N matrix input channels have a one-to-one correspondence with the N image matrices, and the heat map input channel has a corresponding relationship with the heat map.
- the image recognition result of the image to be segmented is generated according to the image segmentation area.
- the heat map generated from the extreme points is used as a part of the image feature information, which enriches the feature content of the image, so that the image segmentation model can generate the image segmentation area more accurately based on the image feature information, thereby enhancing the versatility and applicability of image segmentation.
- FIG. 1 is a schematic diagram of an architecture of an image recognition system in an embodiment of the application
- FIG. 2 is a schematic structural diagram of an image segmentation model in an embodiment of the application
- FIG. 3 is a schematic diagram of an embodiment of an image recognition method in an embodiment of the application.
- FIG. 4 is a schematic diagram of an embodiment of selecting four extreme points in an embodiment of the application.
- FIG. 5 is a schematic diagram of an embodiment in which the image segmentation model returns the image segmentation area in an embodiment of the application;
- FIG. 6 is a schematic diagram of an embodiment of reducing the image segmentation area in an embodiment of the application.
- FIG. 7 is a schematic diagram of an embodiment of increasing the image segmentation area in an embodiment of the application.
- FIG. 8 is a schematic diagram of an embodiment of generating image feature information in an embodiment of the application.
- FIG. 9 is a schematic structural diagram of an image segmentation model in an embodiment of the application.
- FIG. 10 is a schematic diagram of an embodiment of an image segmentation model output process in an embodiment of the application.
- FIG. 11 is a schematic diagram of an embodiment of an image recognition method in an embodiment of the application.
- FIG. 12 is a schematic diagram of a comparison of experimental results based on a segmentation method in an embodiment of the application.
- FIG. 13 is a schematic diagram of an embodiment of an image recognition device in an embodiment of the application.
- FIG. 14 is a schematic diagram of another embodiment of an image recognition device in an embodiment of the application.
- FIG. 15 is a schematic diagram of another embodiment of an image recognition device in an embodiment of the application.
- FIG. 16 is a schematic diagram of an embodiment of an image recognition device in an embodiment of the application.
- FIG. 17 is a schematic structural diagram of a terminal device in an embodiment of the application.
- FIG. 18 is a schematic diagram of a structure of a server in an embodiment of the application.
- the embodiments of the application provide a method for image segmentation, a method for image recognition, and related devices.
- the heat map generated from extreme points is used as part of the image feature information to enrich the features of the image, thereby generating a more accurate image segmentation area and enhancing the versatility and applicability of image segmentation.
- the image segmentation method and image recognition method provided in this application can be applied to the field of artificial intelligence, and specifically can be applied to the field of computer vision.
- image processing and analysis have gradually formed a scientific system of their own, and new processing methods emerge one after another.
- although its development history is not long, it has attracted widespread attention from all walks of life.
- vision is the most important means of human perception, and images are the foundation of vision; therefore, digital images have become an effective tool for scholars in psychology, physiology, and computer science to study visual perception.
- image processing has also seen large-scale application in fields such as the military, remote sensing, and meteorology.
- Image segmentation technology has always been a foundational technology and an important research direction in the field of computer vision. Specifically, it segments an area of interest (such as a person, car, or building) from an image according to its real contour. Image segmentation technology is an important part of image semantic understanding. In recent years, with the development of neural networks, image processing capabilities have improved significantly, and image segmentation technology plays an increasingly important role in medical image analysis (including tumor and other pathological localization, tissue volume measurement, computer-guided surgery, customization of treatment plans, and research on anatomical structure), face recognition, fingerprint recognition, autonomous driving, and machine vision.
- Figure 1 is a schematic diagram of the architecture of the image recognition system in an embodiment of this application.
- the image processing equipment provided by this application includes a terminal device or a server.
- the client terminal device can be an auxiliary segmentation tool.
- the terminal device on which the client is deployed includes, but is not limited to, tablet computers, laptops, handheld computers, mobile phones, voice interactive devices, and personal computers (PCs), which is not limited here.
- this application proposes an interactive image auxiliary segmentation tool based on a neural network model (ie, image segmentation model).
- the auxiliary segmentation tool only needs a small amount of user interaction to feed back a fairly accurate pre-segmentation result (that is, the image segmentation area) through the neural network model (i.e., the image segmentation model); the user can then obtain the final segmentation result (that is, the target segmentation area) based on the pre-segmentation result.
- This application proposes a "four-point interactive" segmentation method, and improves the original image segmentation model, thereby obtaining better segmentation results and real-time tool performance.
- the image segmentation model can be deployed in a server serving as the image processing device, where it is used to predict the image segmentation area, so as to achieve online image segmentation.
- the image segmentation model can also be deployed on a terminal device serving as the image processing device, where it is used to predict the image segmentation area, so as to achieve offline image segmentation.
- FIG. 2 is a schematic structural diagram of the image segmentation model in the embodiment of this application.
- the user uses the auxiliary segmentation tool to label the extreme points of the image to be processed, such as labeling the tree in Figure 2
- the auxiliary segmentation tool generates a heat map 100 according to the results marked by the user, and the heat map 100 is combined with the image matrix 200 of the image to be processed to obtain image feature information.
- the image feature information is input to the image segmentation model 300, and features are extracted by the image segmentation model 300, thereby outputting the image segmentation area 400, such as the black tree-like area shown in FIG. 2.
- the image segmentation model can be an image segmentation convolutional neural network (CNN), whose model structure mainly includes an input layer, a feature extraction layer, and an output layer.
- An embodiment of the image recognition method in the embodiment of this application includes:
- the image to be segmented includes multiple extreme points.
- the image recognition device acquires the image to be segmented; the image recognition device can have an auxiliary segmentation tool deployed on it, and the image to be segmented can be annotated with that tool: the user uses the auxiliary segmentation tool to annotate multiple extreme points, and the image to be segmented is generated based on these extreme points.
- the image recognition device provided in this application may be a terminal device or a server.
- the multiple extreme points may be the highest point, the lowest point, the leftmost point, and the rightmost point of the target object in the image to be segmented.
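For illustration, the four extreme points (highest, lowest, leftmost, rightmost) of an object can be derived from a binary object mask as follows. In the interactive tool they would instead come from user clicks; deriving them from a mask is only a stand-in, commonly used to simulate annotations from ground truth, and is an assumption rather than something this passage specifies.

```python
import numpy as np

def extreme_points(mask):
    """Return (top, bottom, left, right) extreme points of a binary mask.

    Each point is (row, col). Ties are broken by the first pixel found
    in row-major order.
    """
    rows, cols = np.nonzero(mask)
    top = (rows.min(), cols[rows.argmin()])
    bottom = (rows.max(), cols[rows.argmax()])
    left = (rows[cols.argmin()], cols.min())
    right = (rows[cols.argmax()], cols.max())
    return top, bottom, left, right

mask = np.zeros((5, 5), dtype=np.uint8)
mask[1:4, 1:4] = 1  # a 3x3 square object
top, bottom, left, right = extreme_points(mask)
```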
- the image feature information includes N image matrices and heat maps, the heat maps are generated based on multiple extreme points, and N is an integer greater than or equal to 1.
- the image segmentation device generates N image matrices based on the image to be segmented, and generates a heat map based on multiple extreme points, and combines the heat map with the N image matrices to obtain image feature information corresponding to the image to be segmented .
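One common way to realize the step above is to render a 2D Gaussian centered on each extreme point and stack the resulting heat map with the N image matrices into an H×W×(N+1) tensor. The Gaussian kernel and the `sigma` value below are illustrative assumptions; the passage does not fix how the heat map is rendered.

```python
import numpy as np

def heat_map(shape, points, sigma=1.5):
    """Render a heat map with a 2D Gaussian centered on each extreme point.

    sigma is an illustrative choice; the patent does not fix the kernel.
    """
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    hm = np.zeros(shape, dtype=np.float32)
    for (r, c) in points:
        g = np.exp(-((ys - r) ** 2 + (xs - c) ** 2) / (2 * sigma ** 2))
        hm = np.maximum(hm, g)  # overlapping Gaussians keep the max response
    return hm

rgb = np.random.rand(128, 128, 3).astype(np.float32)  # N = 3 image matrices
points = [(0, 64), (127, 64), (64, 0), (64, 127)]     # top/bottom/left/right
hm = heat_map(rgb.shape[:2], points)
feature_info = np.dstack([rgb, hm])  # N + 1 = 4 input channels
```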
- the digital image data can be represented by a matrix. If the size of the read image to be segmented is 128*128, the size of the image matrix is 128*128*N, where N is an integer greater than or equal to 1.
- the image matrix can be a matrix corresponding to a grayscale image.
- the image matrix can be a matrix of red green blue (RGB) images.
- the RGB image is three-dimensional. The three dimensions represent the red, green, and blue components, each component takes a value from 0 to 255, and each pixel is composed of these three components.
- Each RGB channel corresponds to an image matrix (that is, the first image matrix, the second image matrix, and the third image matrix).
- the three RGB channels are stacked together to form a color image, that is, the image to be segmented is obtained.
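The channel decomposition described above can be sketched as follows. This is a minimal NumPy illustration with a hypothetical 128*128 image; the image size and random values are assumptions, not part of the application:

```python
import numpy as np

# Hypothetical 128*128 RGB image; each component lies in 0..255.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(128, 128, 3), dtype=np.uint8)

# Each RGB channel corresponds to one image matrix.
first_matrix = image[:, :, 0]   # red
second_matrix = image[:, :, 1]  # green
third_matrix = image[:, :, 2]   # blue

# Stacking the three matrices together restores the color image.
restacked = np.stack([first_matrix, second_matrix, third_matrix], axis=-1)
print(restacked.shape)  # (128, 128, 3)
```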
- the image matrix can also correspond to the red-green-blue-alpha (RGBA) color space.
- a Portable Network Graphics (PNG) image, for example, uses the RGBA color space, in which case there are four image matrices.
- the image segmentation model includes N matrix input channels and one heat map input channel.
- the N matrix input channels have a one-to-one correspondence with the N image matrices
- the heat map input channel has a corresponding relationship with the heat map.
- the corresponding relationship proposed here can be understood as follows: if the image matrix a has a corresponding relationship with the matrix input channel a, then when the image segmentation area corresponding to the image feature information is obtained through the image segmentation model, the image matrix a is input into the image segmentation model from the matrix input channel a.
- the heat map is likewise input into the model through the heat map input channel.
- the image segmentation device inputs the image feature information into the image segmentation model, where the image segmentation model may adopt a Deep Lab structure, including but not limited to DeepLabV1, DeepLabV2, DeepLabV3, and DeepLabV3+.
- the DeepLabV2 structure is a CNN model structure for image segmentation. It inputs a picture and outputs a mask image of the same size as the original picture. The value of each pixel in the picture represents the category label value that this pixel belongs to.
- the DeepLabV3+ structure is an improved CNN model structure for image segmentation based on DeeplabV2. It usually achieves better results in image segmentation competitions.
- CNN is a development of the neural network model. It uses convolutional layers to replace the fully connected layer structure of the artificial neural network, and has achieved excellent performance in various computer vision fields.
- This application needs to improve the structure of the image segmentation model by modifying the first-layer parameters of the image segmentation model so that it can receive (N+1) channels of image data; that is, the image segmentation model includes N matrix input channels and one heat map input channel. Assuming N is 3, there are 3 image matrices corresponding to 3 matrix input channels, each matrix input channel corresponds to one image matrix, and there is also a heat map input channel that corresponds to the heat map.
- if N is 1, the single matrix input channel corresponds to the image matrix of a grayscale image, and the heat map input channel corresponds to the heat map.
- if N is 4, there are 4 image matrices corresponding to 4 matrix input channels, each matrix input channel corresponds to one image matrix, and there is also a heat map input channel that corresponds to the heat map.
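The first-layer modification for (N+1) input channels can be illustrated at the parameter level. In this sketch the kernel size, channel counts, and the zero initialization of the extra heat map slice are assumptions for illustration, not details specified by the application:

```python
import numpy as np

# A first convolution originally accepting N = 3 channels is extended to
# accept N + 1 = 4 channels by adding one more input-channel slice for
# the heat map. Shapes follow the (out_channels, in_channels, k, k)
# convention; 64 filters of size 7*7 are a hypothetical choice.
rng = np.random.default_rng(0)
w_rgb = rng.normal(size=(64, 3, 7, 7)).astype(np.float32)

extra = np.zeros((64, 1, 7, 7), dtype=np.float32)  # heat map input channel
w_extended = np.concatenate([w_rgb, extra], axis=1)

print(w_rgb.shape)       # (64, 3, 7, 7)
print(w_extended.shape)  # (64, 4, 7, 7): N matrix channels + 1 heat map channel
```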
- the image segmentation device generates the image recognition result of the image to be segmented according to the image segmentation area.
- the image segmentation area is a mask image. Based on the mask image, the edge of the target object in the image to be segmented can be obtained. The user can manually adjust the edge of the image segmentation area and finally obtain the image recognition result.
- the image recognition result can be displayed through text information, for example, the image recognition result is an object such as "monkey” or "car”.
- the result of image recognition can also be to highlight the target object in the image to be segmented.
- the target object can be an object such as a "car” or a "monkey".
- an image recognition method is provided. First, an image to be segmented is acquired, where the image to be segmented includes multiple extreme points; then image feature information is generated based on the image to be segmented, where the image feature information includes a first image matrix, a second image matrix, a third image matrix, and a heat map, and the heat map is generated based on the multiple extreme points. Finally, the image segmentation area corresponding to the image feature information can be obtained through the image segmentation model.
- the image segmentation model includes a first input channel, a second input channel, a third input channel, and a fourth input channel. The first input channel has a corresponding relationship with the first image matrix, and the second input channel has a corresponding relationship with the second image matrix.
- the third input channel has a corresponding relationship with the third image matrix;
- the fourth input channel has a corresponding relationship with the heat map.
- acquiring the image to be segmented may include:
- receiving an object labeling instruction for the image to be processed, where the image to be processed includes a target object, the object labeling instruction carries position information of multiple extreme points corresponding to the target object, and the multiple extreme points are used to identify the contour edge of the target object;
- the extreme value points can be determined around the contour edge of the target object, for example, the extreme value points in the four directions of up, down, left, and right, such as the situation shown in FIG. 4.
- the plurality of extreme points may include four; correspondingly, the position information of the four extreme points includes the position information of the first extreme point, the position information of the second extreme point, the position information of the third extreme point, and the position information of the fourth extreme point.
- the image to be segmented is generated according to the image to be processed.
- FIG. 4 is a schematic diagram of an embodiment of selecting four extreme points in an embodiment of this application.
- a to-be-processed image is first shown, and the to-be-processed image includes a target object.
- target objects include, but are not limited to, people, animals, vehicles, and other objects.
- the user can trigger the object labeling instructions, such as selecting several extreme points from the image to be processed by clicking and selecting.
- the user can select through the auxiliary segmentation tool
- the four extreme points of the tree, namely the first extreme point A, the second extreme point B, the third extreme point C, and the fourth extreme point D.
- the object labeling instruction specifically carries the coordinate information of these four extreme points, so that the image to be segmented corresponding to the image to be processed is generated according to the object labeling instruction; the image to be segmented is the image corresponding to the tree shown in Figure 4.
- the image to be segmented includes an area constituted by the first extreme point A, the second extreme point B, the third extreme point C, and the fourth extreme point D.
- the auxiliary segmentation tool generates image feature information (including heat map and image matrix) according to the image to be segmented, and then obtains the image segmentation area corresponding to the image feature information through the image segmentation model.
- Figure 5 is a schematic diagram of an embodiment in which the image segmentation model returns the image segmentation area in an embodiment of this application.
- the auxiliary segmentation tool calculates the image segmentation area according to the four extreme points and returns the image segmentation area.
- the image corresponding to the shaded area in Figure 5 is the image segmentation area. It is understandable that the image segmentation area may be a pre-segmented polygon result.
- FIG. 5 is only an illustration, and should not be construed as a limitation of the application.
- a method for marking extreme points is provided.
- the image to be processed is first displayed, and then an object marking instruction is received.
- the object marking instruction carries the first extreme point location information, the second extreme point location information, the third extreme point location information, and the fourth extreme point location information corresponding to the target object; finally, in response to the object labeling instruction, the image to be segmented is generated according to the image to be processed.
- the auxiliary segmentation tool can be used to label the image to be processed; the tool is easy to operate and convenient to use, thereby improving the feasibility and operability of the solution.
- the image segmentation area corresponding to the image feature information is obtained through the image segmentation model.
- the image segmentation area is reduced to obtain a target segmentation area, where the target segmentation area includes a second vertex adjusted based on the first vertex, and the second vertex corresponds to the second position information.
- the second position information is different from the first position information.
- a method for adjusting the image segmentation area is introduced.
- the user can trigger the first adjustment instruction through the auxiliary segmentation tool.
- FIG. 6 is a schematic diagram of an embodiment of reducing the image segmentation area in this embodiment of the application.
- the modification method includes dragging the edges or vertices of the polygon.
- the image segmentation area has a vertex E1, a vertex E2, and a vertex E3, where the line segment formed by vertex E1 and vertex E2 exceeds the range of the tree. Therefore, the user can trigger the first adjustment instruction, that is, press and drag the first vertex (such as vertex E2) toward the target object (such as the tree) to change the position of the first vertex.
- the auxiliary segmentation tool responds to the first adjustment instruction to reduce the image segmentation area to obtain the target segmentation area, which is the adjusted image segmentation area.
- the original position of the first vertex has changed and becomes the position where the second vertex is located, and the second vertex can be the position shown in E3 in FIG. 6.
- a method for adjusting the image segmentation area is provided; that is, a first adjustment instruction is received, and then, in response to the first adjustment instruction, reduction processing is performed on the image segmentation area to obtain a target segmentation area.
- after the image segmentation area corresponding to the image feature information is obtained through the image segmentation model, the method may further include:
- the image segmentation area is enlarged to obtain a target segmentation area, where the target segmentation area includes a fourth vertex adjusted based on the third vertex.
- FIG. 7 is a schematic diagram of an embodiment of enlarging the image segmentation area in this embodiment of the application. As shown in the figure, if there is an error in the pre-segmented image segmentation area, the user can directly modify the image segmentation area.
- the modification method includes adding vertices on the polygon edge, or drawing a new polygon to cover the error area and merging it.
- the image segmentation area has vertices E1, E2, and E3.
- the line segment formed by the vertices E1 and E2 enters the range of the tree.
- the user can trigger the second adjustment instruction, that is, press and drag the third vertex (such as vertex E2) toward the outside of the target object (such as the tree) to change the position of the third vertex.
- the auxiliary segmentation tool responds to the second adjustment instruction to enlarge the image segmentation area to obtain the target segmentation area. The target segmentation area is the adjusted image segmentation area, and the original third vertex position has changed, becoming a new vertex (the fourth vertex) on the target segmentation area. The fourth vertex can be the position shown as E3 in Figure 7.
- another method for adjusting the image segmentation area is provided, that is, first receiving the second adjustment instruction, and then in response to the second adjustment instruction, the image segmentation area is enlarged to obtain the target segmentation area.
- the user can use the auxiliary segmentation tool to adjust the image segmentation area, so as to obtain a more accurate segmentation result, thereby enhancing the practicability and flexibility of the solution.
- the N matrix input channels include a red input channel, a green input channel, and a blue input channel.
- generating image feature information according to the image to be segmented may include:
- generating N image matrices according to the image to be segmented, where the N image matrices include a first image matrix corresponding to the red input channel, a second image matrix corresponding to the green input channel, and a third image matrix corresponding to the blue input channel.
- FIG. 8 is a schematic diagram of an embodiment of generating image feature information in an embodiment of this application. As shown in the figure, this application adopts the input format of Deep Extreme Cut (DEXTR) and inputs a four-channel image matrix; that is, in addition to the image matrices, the model input also includes the information of the four extreme points.
- a heat map of the same size as the image to be segmented is generated, as shown in Figure 8.
- the coordinates of the four extreme points are used as centers to generate 2D Gaussian distributions; this heat map is then used as the fourth channel and combined with the other three image matrices to obtain the image feature information, and finally the image feature information is used as the input to the image segmentation model.
- the three image matrices are the first image matrix, the second image matrix, and the third image matrix.
- the first image matrix corresponds to the red (R) input channel
- the second image matrix corresponds to the green (G) input channel
- the third image matrix corresponds to the blue (B) input channel.
- the gray value can be superimposed for the area where the buffer crosses, so the more the buffer crosses, the larger the gray value, the hotter the area;
- there are other ways to generate heat maps. For example, it is also possible to directly construct four solid circles with each extreme point as the center.
- the characteristic of 2D Gaussian distribution is that the closer to the center point, the larger the value, and it decays rapidly as the distance from the center point is farther away.
- the reason why the heat map is used in this application is to give the image segmentation model some prior knowledge via the input heat map, so that the image segmentation model knows that these four points are the extreme points selected by the user. However, considering that the points selected by the user are not necessarily the true extreme points and may contain certain errors, a heat map distribution is generated with the extreme points as centers.
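The heat map generation described above can be sketched as follows; the image size, the extreme point coordinates, and the standard deviation are hypothetical values chosen for illustration:

```python
import numpy as np

def gaussian_heat_map(shape, points, sigma=10.0):
    """Place a 2D Gaussian at each extreme point: the value is largest at
    the point itself and decays rapidly with distance (values in [0, 1])."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    heat = np.zeros((h, w), dtype=np.float32)
    for (x, y) in points:
        g = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
        heat = np.maximum(heat, g)  # keep the strongest response per pixel
    return heat

# Hypothetical topmost, bottommost, leftmost, and rightmost points (x, y).
extremes = [(64, 10), (64, 118), (10, 64), (118, 64)]
heat = gaussian_heat_map((128, 128), extremes)
print(heat.shape)  # (128, 128): same size as the image to be segmented
```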
- a method for generating image feature information according to the image to be segmented is provided: a heat map is generated according to the multiple extreme points in the image to be segmented, the first image matrix is generated according to the image to be segmented, the second image matrix is generated according to the image to be segmented, and the third image matrix is generated according to the image to be segmented.
- obtaining the image segmentation area corresponding to the image feature information through the image segmentation model may include:
- the target feature map is decoded by the decoder of the image segmentation model to obtain the image segmentation area.
- the structure of an image segmentation model is introduced.
- This application uses the DeeplabV3+ model structure as an example. It is understandable that the DeeplabV2 model structure, U-Net, or the Pyramid Scene Parsing Network (PSPNet), etc., can also be used.
- Figure 9 is a schematic structural diagram of the image segmentation model in the embodiment of the application.
- the features of the image to be segmented are extracted to obtain image feature information, and the image feature information is input to the image segmentation model.
- the image segmentation model includes an encoder (Encoder) and a decoder (Decoder).
- the encoder is used to reduce the resolution of the feature map and capture more abstract segmentation information, and the decoder is used to restore spatial information.
- the image feature information is encoded by the deep convolutional neural network (DCNN) in the encoder, and the resulting feature map is restored to 4 times the resolution by bilinear interpolation to obtain the first feature map.
- the 1*1 convolution process is used to reduce the number of channels, so as to extract the low-level features of the image feature information, and then the second feature map can be obtained.
- the first feature map and the second feature map are spliced through the concat operation in the decoder of the image segmentation model to obtain the target feature map. Then a convolution with a size of 3*3 is used to enhance the target feature map, and interpolation is then used to further restore 4 times the resolution, reaching the size of the image to be segmented.
- the encoding-decoding structure can obtain the edge information of the object by gradually recovering the spatial information.
- the DeeplabV3+ model structure adds a decoder to the DeeplabV3 model structure to enhance the segmentation of the object edge.
- a method for obtaining image segmentation areas through an image segmentation model is provided; that is, first, the image feature information is encoded by the encoder of the image segmentation model to obtain a first feature map and a second feature map, then the first feature map and the second feature map are spliced to obtain the target feature map, and finally the target feature map is decoded by the decoder of the image segmentation model to obtain the image segmentation area.
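The encode-splice-decode path can be traced at the shape level with the following sketch. The feature map sizes and channel counts are assumptions, and the learned convolutions and bilinear interpolation are replaced by simple stand-ins, so only the shapes are meaningful here:

```python
import numpy as np

def upsample4(x):
    # Stand-in for 4x bilinear upsampling (nearest-neighbour via np.kron).
    return np.kron(x, np.ones((4, 4, 1), dtype=x.dtype))

# Hypothetical shapes for a 512*512 input image:
encoder_out = np.zeros((32, 32, 256), dtype=np.float32)  # DCNN encoder output
low_level = np.zeros((128, 128, 256), dtype=np.float32)  # low-level features

first_map = upsample4(encoder_out)   # restore 4x resolution -> (128, 128, 256)
second_map = low_level[:, :, :48]    # stand-in for 1*1 conv channel reduction
target_map = np.concatenate([first_map, second_map], axis=-1)  # concat/splice
scores = target_map[:, :, :2]        # stand-in for the 3*3 conv -> 2 classes
final = upsample4(scores)            # restore 4x again, to the input size
print(target_map.shape)  # (128, 128, 304)
print(final.shape)       # (512, 512, 2)
```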
- a model structure based on DeeplabV3+ is used to predict the image segmentation area. The DeeplabV3+ model structure has a small number of overall parameters, so it runs faster both in training and in actual prediction; applied to an auxiliary segmentation tool, it can respond to user operations more quickly, improving efficiency and enhancing user stickiness.
- decoding the target feature map by the decoder of the image segmentation model to obtain the image segmentation area may include:
- the target feature map is decoded by the decoder of the image segmentation model to obtain a first pixel point set and a second pixel point set.
- the first pixel point set includes a plurality of first pixels;
- the second pixel point set includes a plurality of second pixels;
- an image segmentation area is generated according to the first pixel point set and the second pixel point set.
- a method for generating image segmentation areas based on an image segmentation model is introduced. After the image segmentation model decodes the target feature map, the first pixel point set and the second pixel point set are obtained, where the first pixel point set contains the pixels belonging to the target object (which can be represented as "1"), and the second pixel point set contains the pixels belonging to the background (which can be represented as "0").
- the first pixel point set and the second pixel point set together constitute the image segmentation area; that is, the segmentation result of the target object can be seen in the image segmentation area.
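A minimal sketch of reading the two pixel point sets out of a 0/1 mask; the mask contents here are hypothetical:

```python
import numpy as np

# Decoder output viewed as a 0/1 mask: pixels labelled 1 form the first
# pixel point set (target object) and pixels labelled 0 form the second
# pixel point set (background).
mask = np.zeros((8, 8), dtype=np.uint8)
mask[2:6, 3:7] = 1  # hypothetical target object region

first_set = np.argwhere(mask == 1)   # object pixel coordinates
second_set = np.argwhere(mask == 0)  # background pixel coordinates
print(len(first_set), len(second_set))  # 16 48
```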
- the DeeplabV3+ model structure has fewer overall parameters than DeeplabV2. This makes the DeeplabV3+ model structure faster both during training and in actual use, which, for a real-time auxiliary segmentation tool, means it can respond more quickly to user requests.
- a method for obtaining image segmentation areas by decoding with the image segmentation model is provided; that is, the target feature map is decoded by the decoder of the image segmentation model to obtain the first pixel point set and the second pixel point set, and then the image segmentation area is generated according to the first pixel point set and the second pixel point set.
- the image segmentation area corresponding to the image feature information is obtained through the image segmentation model.
- the image to be segmented is processed by a polygon fitting function to obtain polygon vertex information, where the polygon vertex information includes position information of multiple vertices;
- the target object is determined from the image to be segmented.
- a method of determining the target object from the image to be segmented is introduced. After the image segmentation area is obtained, the edge processing of the image segmentation area needs to be performed.
- FIG. 10 is a schematic diagram of an embodiment of this application.
- the auxiliary segmentation tool proposed in the present application is a segmentation tool that does not need to specify a specific object category.
- the model is suitable for any object on a picture. It can provide more accurate segmentation results according to the four extreme points given by the user.
- the pixels are not classified according to a preloaded category number; instead, a binary classification is performed on each pixel of the image, that is, whether the current pixel is inside the object indicated by the extreme points.
- the image segmentation area output by the image segmentation model can be expressed as a mask map (it can be understood as a two-dimensional image of the size of the original image containing only 1s and 0s, where 1 means the model classifies the pixel as positive and 0 means it classifies the pixel as negative). The value of each pixel in the image segmentation area is 0 or 1.
- the pixel value is 1, which means that the image segmentation model judges this pixel as an internal point of the target object, and the pixel value is 0, which means that the image segmentation model judges this pixel as a background point.
- the image segmentation model extracts the contour edge of the target object based on this mask map and performs polygon fitting on the edge of the target object. Finally, the polygon vertex information is fed back to the auxiliary segmentation tool and marked in the image to be segmented, where the polygon vertex information includes two-dimensional coordinate information.
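A simple stand-in for extracting the contour edge of the target object from the mask map. Real tools typically use a contour-tracing routine; this sketch only keeps object pixels adjacent to the background, and the mask is hypothetical:

```python
import numpy as np

def mask_edge(mask):
    """Keep object pixels (value 1) whose 4-neighbourhood is not entirely
    object, i.e. pixels on the edge of the target object."""
    m = mask.astype(bool)
    padded = np.pad(m, 1, constant_values=False)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1]
                & padded[1:-1, :-2] & padded[1:-1, 2:])
    return (m & ~interior).astype(np.uint8)

# Hypothetical mask map: a 4*4 object region inside an 8*8 image.
mask = np.zeros((8, 8), dtype=np.uint8)
mask[2:6, 2:6] = 1
edge = mask_edge(mask)
print(int(edge.sum()))  # 12: the border of the 4*4 square
```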
- the polygon fitting function used in this application may specifically be the approxPolyDP function.
- the main function of the approxPolyDP function is to approximate a continuous smooth curve with a polyline, performing polygon fitting on the contour points of the image.
- the approxPolyDP function can be expressed as:
- InputArray curve represents the point set composed of the contour points of the image
- OutputArray approxCurve represents the output polygon point set
- double epsilon represents the fitting accuracy, that is, the maximum allowed distance between the original contour and the fitted polygon;
- bool closed represents whether the output polygon is closed.
- the polygon fitting function may also be another type of function. This is only an illustration and should not be construed as a limitation of this application.
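For reference, approxPolyDP implements Ramer-Douglas-Peucker simplification. Below is a pure-Python sketch of that algorithm; the sample contour and epsilon are hypothetical:

```python
import math

def fit_polygon(points, epsilon):
    """Ramer-Douglas-Peucker polyline simplification, the algorithm behind
    OpenCV's approxPolyDP: keep a point only if it lies farther than
    epsilon from the chord between the current endpoints."""
    def point_line_dist(p, a, b):
        (px, py), (ax, ay), (bx, by) = p, a, b
        dx, dy = bx - ax, by - ay
        if dx == 0 and dy == 0:
            return math.hypot(px - ax, py - ay)
        return abs(dy * px - dx * py + bx * ay - by * ax) / math.hypot(dx, dy)

    if len(points) < 3:
        return list(points)
    idx, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_line_dist(points[i], points[0], points[-1])
        if d > dmax:
            idx, dmax = i, d
    if dmax <= epsilon:
        return [points[0], points[-1]]  # all middle points within tolerance
    left = fit_polygon(points[:idx + 1], epsilon)
    right = fit_polygon(points[idx:], epsilon)
    return left[:-1] + right

# A noisy near-straight contour collapses to its two endpoints.
contour = [(0, 0), (1, 0.1), (2, -0.1), (3, 0.05), (4, 0)]
print(fit_polygon(contour, epsilon=0.5))  # [(0, 0), (4, 0)]
```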
- a method for processing the image segmentation area is provided, that is, the image to be segmented is first processed by a polygon fitting function to obtain polygon vertex information, where the polygon vertex information includes multiple vertices Position information, and then determine the target object from the image to be segmented according to the polygon vertex information.
- An embodiment of the image recognition method in the embodiment of this application includes:
- the object marking instruction for an image to be processed.
- the image to be processed includes a target object
- the object labeling instruction carries position information of multiple extreme points corresponding to the target object.
- the multiple extreme points may include the four mentioned in the previous embodiment, and the corresponding position information includes the first extreme point position information, the second extreme point position information, the third extreme point position information, and the fourth extreme point position information.
- the image recognition device displays the image to be processed, where an auxiliary segmentation tool may be deployed in the image recognition device, and the user uses the auxiliary segmentation tool to mark multiple extreme points (including the first extreme point, the second extreme point, the third extreme point, and the fourth extreme point), that is, the object marking instruction is triggered.
- the image recognition device provided in this application may be a terminal device.
- the image recognition device responds to the object labeling instruction, and then generates the image to be segmented based on these extreme points.
- the image to be segmented includes the first extreme point location information, the second extreme point location information, the third extreme point location information, and the fourth extreme point location information.
- the image feature information includes N image matrices and heat maps, the heat maps are generated based on multiple extreme points, and N is an integer greater than or equal to 1.
- the image recognition device generates N image matrices based on the image to be segmented, and generates a heat map based on multiple extreme points, and combines the heat map with the N image matrices to obtain image feature information corresponding to the image to be segmented .
- the digital image data can be represented by a matrix. If the size of the read image to be segmented is 128*128, the size of the image matrix is 128*128*N, where N is an integer greater than or equal to 1.
- the image matrix can be a matrix corresponding to a grayscale image.
- the image matrix can be a matrix of RGB images.
- the RGB image is three-dimensional. The three dimensions represent the three components of red, green, and blue, each component takes a value from 0 to 255, and each pixel is a combination of these three components.
- Each RGB channel corresponds to an image matrix (that is, the first image matrix, the second image matrix, and the third image matrix).
- the three RGB channels are stacked together to form a color image, that is, the image to be segmented is obtained.
- the image matrix can also correspond to an RGBA color space.
- for a Portable Network Graphics (PNG) image, for example, there are four image matrices; the value of N is not limited here.
- the image recognition device inputs image feature information into the image segmentation model, where the image segmentation model can adopt a Deep Lab structure, including but not limited to DeepLabV1, DeepLabV2, DeepLabV3, and DeepLabV3+.
- This application needs to improve the structure of the image segmentation model by modifying the first-layer parameters of the image segmentation model so that the image segmentation model can receive four channels of image data; that is, the image segmentation model includes a first input channel, a second input channel, a third input channel, and a fourth input channel. The first image matrix is used as the input data of the first input channel, the second image matrix as the input data of the second input channel, the third image matrix as the input data of the third input channel, and the heat map as the input data of the fourth input channel.
- This application needs to improve the structure of the image segmentation model and modify the first layer parameters of the image segmentation model so that the image segmentation model can receive (N+1) channel image data, that is, the image segmentation model includes N matrix inputs Channel and a heat map input channel.
- if N is 3, there are 3 image matrices corresponding to 3 matrix input channels, and each matrix input channel corresponds to one image matrix.
- if N is 1, the single matrix input channel corresponds to the image matrix of a grayscale image, and the heat map input channel corresponds to the heat map.
- if N is 4, there are 4 image matrices corresponding to 4 matrix input channels, each matrix input channel corresponds to one image matrix, and there is also a heat map input channel that corresponds to the heat map.
- the polygon vertex information includes position information of multiple vertices.
- the image segmentation area output by the image recognition device can specifically be expressed as a mask map, which can be understood as a two-dimensional image with the same size as the image to be segmented whose values are only 1 and 0, where 1 means the pixel is classified as positive and 0 means it is classified as negative; the value of each pixel in the image segmentation area is 0 or 1.
- the pixel value is 1, which means that the image segmentation model judges this pixel as an internal point of the target object, and the pixel value is 0, which means that the image segmentation model judges this pixel as a background point.
- the image recognition device uses a polygon fitting function to process the image to be segmented to obtain polygon vertex information, and feedback the polygon vertex information to the auxiliary segmentation tool.
- the image recognition device highlights the target object in the image to be segmented based on the vertex information of the polygon. Specifically, the polygon vertex information is fed back to the auxiliary segmentation tool, and then marked in the image to be segmented.
- an image recognition method is provided.
- an image to be processed is displayed, an object labeling instruction is received, and in response to the object labeling instruction, the image to be segmented is generated based on the image to be processed. Image feature information is then generated based on the image to be segmented, the image segmentation area corresponding to the image feature information is obtained through the image segmentation model, and the image to be segmented is processed by the polygon fitting function to obtain polygon vertex information. Finally, according to the polygon vertex information, the target object is highlighted in the image to be segmented.
- the heat map generated from the extreme points is used as part of the image feature information, which enriches the feature content of the image, so that the image segmentation model can generate the image segmentation area more accurately based on the image feature information, thereby improving the versatility and applicability of the auxiliary segmentation tool, and the target object can be directly highlighted.
- FIG. 12 is a schematic diagram of a comparison of experimental results based on the segmentation method in an embodiment of the application. As shown in the figure, Figure (a) in FIG. 12 shows the original image, Figure (b) shows the image obtained by using Google's Fluid Annotation auxiliary segmentation tool, Figure (c) shows the image obtained by labeling with the Polygon-RNN++ tool on the segmentation data set, and Figure (d) shows the image labeled with the auxiliary segmentation tool provided by this application. Compared with the original image, (b), (c), and (d) are each covered with a layer. This is because the segmentation result combines the original image and the segmented mask: after segmentation, the mask is rendered in a transparent color and superimposed on the original image.
- the auxiliary segmentation tool provided by the present application can provide more accurate segmentation results compared to existing auxiliary segmentation tools.
- the improved image segmentation model of the present application also reduces the model response time while ensuring that segmentation accuracy does not decrease, which keeps interaction latency low for online auxiliary segmentation tools. Please refer to Table 1, which compares the performance and running time of the image segmentation model provided by this application and the original model.
- mIOU (mean Intersection Over Union) is an important indicator for measuring the accuracy of image segmentation.
- for each category, the IOU is the intersection of the predicted area and the actual area divided by their union; the mIOU is the average of the IOU over all categories.
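The definition above can be sketched directly in code. The following is a minimal illustration; the function and variable names are my own and are not taken from the application:

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean Intersection over Union: per-class IoU averaged over classes.

    pred, target: integer label maps of the same shape.
    """
    ious = []
    for c in range(num_classes):
        p = (pred == c)
        t = (target == c)
        union = np.logical_or(p, t).sum()
        if union == 0:          # class absent in both maps: skip it
            continue
        inter = np.logical_and(p, t).sum()
        ious.append(inter / union)
    return float(np.mean(ious))
```

Classes that appear in neither the prediction nor the ground truth are skipped here; benchmark implementations differ on that detail, so it is only one reasonable convention.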
- Pascal is an image segmentation data set.
- the Semantic Boundaries Dataset (SBD) is an image segmentation data set.
- Tesla P100 is the model of the graphics card used.
- Table 1 shows the performance of the image segmentation model provided by this application and of the original DEXTR model after training on different data sets.
- the indicator mIOU is used to represent model performance. When training uses only the Pascal data set, the image segmentation model used in this application provides more accurate results on the test data set.
- the performance of the image segmentation model used in this application is not much different from that of the original DEXTR model. Table 1 also compares the average time each model takes to process a single picture in the same graphics-card environment; the image segmentation model used in this application shows a very significant improvement in time performance over the original DEXTR model.
- the auxiliary segmentation tool provided by this application can provide more accurate segmentation results in complex scenes. On the one hand, it gives equally accurate pre-segmentation results; on the other hand, it achieves faster model speeds, allowing the online auxiliary tool to respond more quickly.
- FIG. 13 is a schematic diagram of an embodiment of an image recognition device in an embodiment of this application.
- the image recognition device 30 includes:
- the acquiring module 301 is configured to acquire an image to be segmented, where the image to be segmented includes a plurality of extreme points;
- the generating module 302 is configured to generate image feature information according to the image to be segmented obtained by the obtaining module 301, where the image feature information includes N image matrices and a heat map, the heat map is generated based on the multiple extreme points, and N is an integer greater than or equal to 1;
- the acquiring module 301 is further configured to acquire, through an image segmentation model, the image segmentation area corresponding to the image feature information generated by the generating module 302, where the image segmentation model includes N matrix input channels and one heat map input channel, the N matrix input channels have a one-to-one correspondence with the N image matrices, and the heat map input channel has a corresponding relationship with the heat map;
- the generating module 302 is further configured to generate an image recognition result of the image to be segmented according to the image segmentation area acquired by the acquiring module 301.
- the obtaining module 301 obtains the image to be segmented, where the image to be segmented includes multiple extreme points; the generating module 302 generates image feature information according to the image to be segmented obtained by the obtaining module 301, where the image feature information includes N image matrices and a heat map, the heat map is generated based on the multiple extreme points, and N is an integer greater than or equal to 1; the obtaining module 301 obtains, through the image segmentation model, the image segmentation area corresponding to the image feature information, where the image segmentation model includes N matrix input channels and one heat map input channel, the N matrix input channels have a one-to-one correspondence with the N image matrices, and the one heat map input channel has a corresponding relationship with the heat map; the generating module 302 then generates the image recognition result of the image to be segmented according to the image segmentation area acquired by the obtaining module 301.
- an image recognition device is provided, which first obtains an image to be segmented, where the image to be segmented includes multiple extreme points, and then generates image feature information based on the image to be segmented, where the image feature information includes N image matrices and a heat map.
- the heat map is generated based on multiple extreme points, and then the image segmentation area corresponding to the image feature information is obtained through the image segmentation model.
- the image segmentation model includes N matrix input channels and one heat map input channel; the N matrix input channels have a one-to-one correspondence with the N image matrices, the one heat map input channel has a corresponding relationship with the heat map, and finally the image recognition result of the image to be segmented is generated according to the image segmentation area.
- the heat map generated from the extreme points is used as part of the image feature information, which enriches the feature content of the image, so that the image segmentation model can generate the image segmentation area more accurately based on the image feature information, thereby enhancing the versatility and applicability of image segmentation.
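As a rough illustration of how a heat map might be rendered from the extreme points, the sketch below places a 2D Gaussian at each point. The Gaussian form, the `sigma` value, and all names are assumptions for illustration, not details taken from the application:

```python
import numpy as np

def extreme_point_heatmap(shape, points, sigma=10.0):
    """Render one Gaussian peak per extreme point on an H x W grid.

    shape: (H, W); points: iterable of (x, y) pixel coordinates.
    """
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    heat = np.zeros((h, w), dtype=np.float32)
    for x, y in points:
        g = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
        heat = np.maximum(heat, g)   # keep the strongest response per pixel
    return heat
```

Taking the per-pixel maximum rather than the sum keeps every peak at 1.0 even when two extreme points are close together.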
- the acquisition module 301 is specifically configured to display an image to be processed, where the image to be processed includes a target object;
- an object labeling instruction for the image to be processed is received, where the image to be processed includes a target object, the object labeling instruction carries position information of multiple extreme points corresponding to the target object, and the multiple extreme points are used to identify the contour edge of the target object;
- the image to be segmented is generated according to the image to be processed.
- a method for marking extreme points is provided.
- the image to be processed is first displayed, and an object labeling instruction is then received, where the object labeling instruction carries the first extreme point position information, the second extreme point position information, the third extreme point position information, and the fourth extreme point position information corresponding to the target object; finally, in response to the object labeling instruction, the image to be segmented is generated according to the image to be processed.
- the auxiliary segmentation tool can be used to label the image to be processed; the auxiliary segmentation tool is easy to operate and convenient to use, thereby improving the feasibility and operability of the solution.
- the image recognition device 30 further includes a receiving module 303 and a processing module 304;
- the receiving module 303 is configured to receive a first adjustment instruction for a first vertex, where the first vertex belongs to an edge point of the image segmentation area, and the first vertex corresponds to first position information;
- the processing module 304 is configured to, in response to the first adjustment instruction received by the receiving module 303, reduce the image segmentation area to obtain a target segmentation area, where the target segmentation area includes a second vertex obtained by adjusting the first vertex, the second vertex corresponds to second position information, and the second position information is different from the first position information.
- a method for adjusting the image segmentation area is provided, that is, a first adjustment instruction is received, and then, in response to the first adjustment instruction, the image segmentation area is reduced to obtain a target segmentation area.
- the image recognition device 30 further includes the receiving module 303 and the processing module 304;
- the receiving module 303 is further configured to receive a second adjustment instruction for a third vertex, where the third vertex belongs to the image segmentation area;
- the processing module 304 is further configured to, in response to the second adjustment instruction received by the receiving module 303, enlarge the image segmentation area to obtain a target segmentation area, where the target segmentation area includes a fourth vertex obtained by adjusting the third vertex.
- another method for adjusting the image segmentation area is provided, that is, first receiving the second adjustment instruction, and then in response to the second adjustment instruction, the image segmentation area is enlarged to obtain the target segmentation area.
- the user can use the auxiliary segmentation tool to adjust the image segmentation area, so as to obtain a more accurate segmentation result, thereby enhancing the practicability and flexibility of the solution.
- the N matrix input channels include a red input channel, a green input channel, and a blue input channel.
- in another embodiment of the image recognition device 30 provided in this embodiment of the present application,
- the generating module 302 is specifically configured to generate the heat map according to the multiple extreme points in the image to be segmented;
- and generate N image matrices according to the image to be segmented, the N image matrices including a first image matrix corresponding to the red input channel, a second image matrix corresponding to the green input channel, and a third image matrix corresponding to the blue input channel.
- a method for generating image feature information according to the image to be segmented is provided: the heat map is generated according to the multiple extreme points in the image to be segmented, and the first image matrix, the second image matrix, and the third image matrix are generated according to the image to be segmented.
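The three colour matrices and the heat map together form the (N + 1)-channel model input described above. A minimal sketch of that stacking step, with illustrative names not taken from the application:

```python
import numpy as np

def build_feature_tensor(image_rgb, heatmap):
    """Stack the three colour matrices and the heat map into a 4-channel input.

    image_rgb: (H, W, 3) array; heatmap: (H, W) array.
    """
    r, g, b = image_rgb[..., 0], image_rgb[..., 1], image_rgb[..., 2]
    return np.stack([r, g, b, heatmap], axis=0)  # shape (4, H, W)
```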
- the acquiring module 301 is specifically configured to encode the image feature information through the encoder of the image segmentation model to obtain a first feature map and a second feature map;
- the first feature map and the second feature map are spliced to obtain a target feature map, and the target feature map is decoded by the decoder of the image segmentation model to obtain the image segmentation area.
- a method for obtaining the image segmentation area through the image segmentation model is provided: first, the image feature information is encoded by the encoder of the image segmentation model to obtain a first feature map and a second feature map; the first feature map and the second feature map are then spliced to obtain the target feature map; finally, the target feature map is decoded by the decoder of the image segmentation model to obtain the image segmentation area.
- a model structure based on DeeplabV3+ is used to predict the image segmentation area. The DeeplabV3+ model structure has a small number of overall parameters, so it runs faster both in training and in actual prediction; applied to an auxiliary segmentation tool, it can respond to user operations more quickly, improving usage efficiency and enhancing user stickiness.
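The splicing of the two feature maps in a DeeplabV3+-style decoder can be illustrated at the shape level: the deep feature map is upsampled to the low-level map's resolution and the two are concatenated along the channel axis. This is a sketch with assumed channel counts and nearest-neighbour upsampling, not the model's actual implementation:

```python
import numpy as np

def splice_feature_maps(low_level, high_level):
    """Upsample the deep feature map to the low-level map's resolution,
    then concatenate along the channel axis (decoder input).

    low_level: (C1, H, W); high_level: (C2, H/f, W/f) with integer factor f.
    """
    c1, h1, w1 = low_level.shape
    c2, h2, w2 = high_level.shape
    fy, fx = h1 // h2, w1 // w2
    up = high_level.repeat(fy, axis=1).repeat(fx, axis=2)  # nearest-neighbour upsample
    return np.concatenate([low_level, up], axis=0)         # (C1 + C2, H, W)
```

In the real network the upsampling would be bilinear and the concatenation would feed further convolutions; only the shape bookkeeping is shown here.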
- the acquisition module 301 is specifically configured to decode the target feature map through the decoder of the image segmentation model to obtain a first pixel point set and a second pixel point set, where the first pixel point set includes a plurality of first pixel points, and the second pixel point set includes second pixel points;
- the image segmentation area is generated according to the first pixel point set and the second pixel point set.
- a method for obtaining the image segmentation area by decoding with the image segmentation model is provided, that is, the target feature map is decoded by the decoder of the image segmentation model to obtain the first pixel point set and the second pixel point set, and the image segmentation area is then generated according to the first pixel point set and the second pixel point set.
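One plausible reading of the two pixel point sets is a foreground/background partition of the decoder's score map by a threshold. The sketch below assumes that reading; the threshold value and names are illustrative:

```python
import numpy as np

def split_pixel_sets(score_map, threshold=0.5):
    """Partition pixels into the segmentation-area set and the background set.

    score_map: (H, W) array of per-pixel foreground scores in [0, 1].
    Returns two arrays of (row, col) coordinates.
    """
    mask = score_map >= threshold
    first_set = np.argwhere(mask)      # pixels inside the segmentation area
    second_set = np.argwhere(~mask)    # pixels outside it
    return first_set, second_set
```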
- the image recognition device 30 further includes the processing module 304 and a determining module 306;
- the processing module 304 is further configured to, after the acquisition module 301 acquires the image segmentation area corresponding to the image feature information through the image segmentation model, process the image to be segmented through a polygon fitting function to obtain polygon vertex information, where the polygon vertex information includes position information of multiple vertices;
- the determining module 306 is configured to determine a target object from the image to be segmented according to the polygon vertex information processed by the processing module 304.
- a method for processing the image segmentation area is provided, that is, the image to be segmented is first processed by a polygon fitting function to obtain polygon vertex information, where the polygon vertex information includes position information of multiple vertices, and the target object is then determined from the image to be segmented according to the polygon vertex information.
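A common polygon fitting function for this kind of step is Ramer-Douglas-Peucker simplification (the algorithm behind, e.g., OpenCV's `approxPolyDP`). The pure-Python sketch below illustrates that technique and is not the application's own implementation; all names and the `epsilon` value are my own:

```python
import math

def _point_line_dist(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg = math.hypot(dx, dy)
    if seg == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / seg

def fit_polygon(contour, epsilon):
    """Douglas-Peucker: keep a vertex only if it deviates from the chord
    between the endpoints by more than epsilon."""
    if len(contour) < 3:
        return list(contour)
    a, b = contour[0], contour[-1]
    idx, dmax = 0, 0.0
    for i in range(1, len(contour) - 1):
        d = _point_line_dist(contour[i], a, b)
        if d > dmax:
            idx, dmax = i, d
    if dmax <= epsilon:
        return [a, b]                       # everything close enough: drop it
    left = fit_polygon(contour[:idx + 1], epsilon)
    right = fit_polygon(contour[idx:], epsilon)
    return left[:-1] + right                # merge without duplicating the pivot
```

Applied to the contour of the image segmentation area, the surviving vertices are the polygon vertex information.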
- FIG. 16 is a schematic diagram of an embodiment of the image recognition device in the embodiment of the application.
- the image recognition device 40 includes:
- the receiving module 401 is configured to receive an object labeling instruction for an image to be processed, where the image to be processed includes a target object, and the object labeling instruction carries position information of multiple extreme points corresponding to the target object;
- a generating module 402 configured to generate an image to be segmented based on the image to be processed in response to the object labeling instruction received by the receiving module 401;
- the generating module 402 is further configured to generate image feature information according to the image to be segmented, where the image feature information includes N image matrices and a heat map, the heat map is generated based on the multiple extreme points, and N is an integer greater than or equal to 1;
- the obtaining module 403 is configured to obtain the image segmentation area corresponding to the image feature information generated by the generating module 402 through the image segmentation model, wherein the image segmentation model includes N matrix input channels and one heat map input channel, The N matrix input channels have a one-to-one correspondence with the N image matrices, and the heat map input channels have a corresponding relationship with the heat map;
- the processing module 404 is configured to process the image to be segmented acquired by the acquisition module 403 through a polygon fitting function to obtain polygon vertex information, where the polygon vertex information includes position information of multiple vertices;
- the display module 405 is configured to display the target object in the image to be segmented according to the polygon vertex information processed by the processing module 404.
- when displaying the image to be processed, the receiving module 401 receives an object labeling instruction, where the object labeling instruction carries the first extreme point position information, the second extreme point position information, the third extreme point position information, and the fourth extreme point position information corresponding to the target object; the generating module 402, in response to the object labeling instruction received by the receiving module 401, generates the image to be segmented according to the image to be processed and generates image feature information according to the image to be segmented, where the image feature information includes N image matrices and a heat map, the heat map is generated based on the multiple extreme points, and N is an integer greater than or equal to 1; the acquiring module 403 acquires, through the image segmentation model, the image segmentation area corresponding to the image feature information generated by the generating module 402, where the image segmentation model includes N matrix input channels and one heat map input channel, the N matrix input channels have a one-to-one correspondence with the N image matrices, and the one heat map input channel has a corresponding relationship with the heat map; the processing module 404 processes the image to be segmented through a polygon fitting function to obtain polygon vertex information, and the display module 405 displays the target object in the image to be segmented according to the polygon vertex information.
- an image recognition device is provided. When displaying an image to be processed, it receives an object labeling instruction, generates an image to be segmented based on the image to be processed in response to the object labeling instruction, and then generates image feature information based on the image to be segmented; it obtains the image segmentation area corresponding to the image feature information through the image segmentation model, processes the image to be segmented through the polygon fitting function to obtain polygon vertex information, and finally highlights the target object in the image to be segmented according to the polygon vertex information.
- the heat map generated from the extreme points is used as part of the image feature information, which enriches the feature content of the image, so that the image segmentation model can generate the image segmentation area more accurately based on the image feature information, thereby improving the versatility and applicability of the auxiliary segmentation tool, and the target object can be highlighted directly.
- the embodiment of the present application also provides another image recognition device. As shown in FIG. 17, for ease of description, only the parts related to the embodiment of the present application are shown. For specific technical details that are not disclosed, please refer to the method part of the embodiments of the present application.
- the terminal device can be any terminal device including a mobile phone, a tablet computer, a personal digital assistant (PDA), a point of sale (POS) terminal, an in-vehicle computer, and the like. The following takes a mobile phone as an example:
- FIG. 17 shows a block diagram of a part of the structure of a mobile phone related to a terminal device provided in an embodiment of the present application.
- the mobile phone includes components such as a radio frequency (RF) circuit 510, a memory 520, an input unit 530, a display unit 540, a sensor 550, an audio circuit 560, a wireless fidelity (WiFi) module 570, a processor 580, and a power supply 590.
- the RF circuit 510 can be used to receive and send signals during information transmission and reception or during a call. In particular, downlink information from the base station is received and handed to the processor 580 for processing; in addition, uplink data is sent to the base station.
- the RF circuit 510 includes but is not limited to an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (LNA), a duplexer, and the like.
- the RF circuit 510 can also communicate with the network and other devices through wireless communication.
- the above wireless communication can use any communication standard or protocol, including but not limited to Global System of Mobile Communication (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), e-mail, Short Messaging Service (SMS), and the like.
- the memory 520 can be used to store software programs and modules.
- the processor 580 executes various functional applications and data processing of the mobile phone by running the software programs and modules stored in the memory 520.
- the memory 520 may mainly include a storage program area and a storage data area.
- the storage program area may store an operating system, an application program required by at least one function (such as a sound playback function or an image playback function), and the like; the storage data area may store data (such as audio data and a phone book) created according to use of the mobile phone.
- the memory 520 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
- the input unit 530 can be used to receive inputted number or character information, and generate key signal input related to user settings and function control of the mobile phone.
- the input unit 530 may include a touch panel 531 and other input devices 532.
- the touch panel 531, also called a touch screen, can collect the user's touch operations on or near it (for example, operations performed by the user on or near the touch panel 531 using a finger, a stylus, or any other suitable object or accessory), and drive the corresponding connection device according to a preset program.
- the touch panel 531 may include two parts: a touch detection device and a touch controller.
- the touch detection device detects the user's touch position, detects the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, and sends them to the processor 580, and can receive and execute commands sent by the processor 580.
- the touch panel 531 can be implemented in multiple types such as resistive, capacitive, infrared, and surface acoustic wave.
- the input unit 530 may also include other input devices 532. Specifically, other input devices 532 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control buttons, switch buttons, etc.), trackball, mouse, joystick, and the like.
- the display unit 540 may be used to display information input by the user or information provided to the user and various menus of the mobile phone.
- the display unit 540 may include a display panel 541.
- the display panel 541 may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED), etc.
- the touch panel 531 can cover the display panel 541. When the touch panel 531 detects a touch operation on or near it, it transmits the operation to the processor 580 to determine the type of the touch event, and the processor 580 then provides corresponding visual output on the display panel 541 according to the type of the touch event.
- the touch panel 531 and the display panel 541 are used as two independent components to implement the input and output functions of the mobile phone, but in some embodiments, the touch panel 531 and the display panel 541 may be integrated to implement the input and output functions of the mobile phone.
- the mobile phone may also include at least one sensor 550, such as a light sensor, a motion sensor, and other sensors.
- the light sensor can include an ambient light sensor and a proximity sensor.
- the ambient light sensor can adjust the brightness of the display panel 541 according to the brightness of the ambient light.
- the proximity sensor can turn off the display panel 541 and/or the backlight when the mobile phone is moved to the ear.
- the accelerometer sensor can detect the magnitude of acceleration in various directions (usually three axes), can detect the magnitude and direction of gravity when stationary, and can be used in applications that recognize the mobile phone's posture (such as switching between landscape and portrait screens, related games, and magnetometer posture calibration) and in vibration-recognition functions (such as a pedometer or tapping); other sensors that can be configured on the mobile phone, such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, are not described here.
- the audio circuit 560, the speaker 561, and the microphone 562 can provide an audio interface between the user and the mobile phone.
- the audio circuit 560 can transmit the electric signal converted from the received audio data to the speaker 561, and the speaker 561 converts it into a sound signal for output; on the other hand, the microphone 562 converts the collected sound signal into an electric signal, which the audio circuit 560 receives and converts into audio data; the audio data is then processed by the processor 580 and sent via the RF circuit 510 to, for example, another mobile phone, or output to the memory 520 for further processing.
- WiFi is a short-distance wireless transmission technology.
- through the WiFi module 570, the mobile phone can help users send and receive emails, browse webpages, and access streaming media, providing users with wireless broadband Internet access.
- although FIG. 17 shows the WiFi module 570, it is understandable that the WiFi module is not a necessary component of the mobile phone and can be omitted as needed without changing the essence of the invention.
- the processor 580 is the control center of the mobile phone. It connects the various parts of the entire mobile phone through various interfaces and lines, and performs the various functions of the mobile phone and processes data by running or executing software programs and/or modules stored in the memory 520 and calling data stored in the memory 520, thereby monitoring the mobile phone as a whole.
- the processor 580 may include one or more processing units; optionally, the processor 580 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It can be understood that the foregoing modem processor may alternatively not be integrated into the processor 580.
- the mobile phone also includes a power source 590 (such as a battery) for supplying power to various components.
- the power source can be logically connected to the processor 580 through a power management system, so that functions such as charging, discharging, and power consumption management can be managed through the power management system.
- the mobile phone may also include a camera, a Bluetooth module, etc., which will not be repeated here.
- the processor 580 included in the terminal device also has the following functions:
- image feature information is generated according to the image to be segmented, where the image feature information includes N image matrices and a heat map, the heat map is generated based on the multiple extreme points, and N is an integer greater than or equal to 1;
- the image segmentation area corresponding to the image feature information is acquired through an image segmentation model, where the image segmentation model includes N matrix input channels and one heat map input channel, the N matrix input channels have a one-to-one correspondence with the N image matrices, and the one heat map input channel has a corresponding relationship with the heat map;
- the image recognition result of the image to be segmented is generated according to the image segmentation area.
- processor 580 is specifically configured to execute the following steps:
- an object labeling instruction for an image to be processed is received, where the image to be processed includes a target object, the object labeling instruction carries position information of multiple extreme points corresponding to the target object, and the multiple extreme points are used to identify the contour edge of the target object;
- the image to be segmented is generated according to the image to be processed.
- the position information of the multiple extreme points includes first extreme point position information, second extreme point position information, third extreme point position information, and fourth extreme point position information that respectively identify points around the contour edge of the target object.
- processor 580 is further configured to execute the following steps:
- in response to the first adjustment instruction, the image segmentation area is reduced to obtain a target segmentation area, where the target segmentation area includes a second vertex obtained by adjusting the first vertex, the second vertex corresponds to second position information, and the second position information is different from the first position information.
- processor 580 is further configured to execute the following steps:
- the image segmentation area is enlarged to obtain a target segmentation area, where the target segmentation area includes a fourth vertex adjusted based on the third vertex.
- the N matrix input channels include a red input channel, a green input channel, and a blue input channel
- the processor 580 is specifically configured to perform the following steps:
- N image matrices are generated according to the image to be segmented, the N image matrices including a first image matrix corresponding to the red input channel, a second image matrix corresponding to the green input channel, and a third image matrix corresponding to the blue input channel.
- processor 580 is specifically configured to execute the following steps:
- the target feature map is decoded by the decoder of the image segmentation model to obtain the image segmentation area.
- processor 580 is specifically configured to execute the following steps:
- the target feature map is decoded by the decoder of the image segmentation model to obtain a first pixel point set and a second pixel point set, where the first pixel point set includes a plurality of first pixel points, and the second pixel point set includes second pixel points;
- the image segmentation area is generated according to the first pixel point set and the second pixel point set.
- processor 580 is further configured to execute the following steps:
- a target object is determined from the image to be segmented.
- the processor 580 included in the terminal device also has the following functions:
- an object labeling instruction for an image to be processed is received, where the image to be processed includes a target object, and the object labeling instruction carries position information of multiple extreme points corresponding to the target object;
- image feature information is generated according to the image to be segmented, where the image feature information includes N image matrices and a heat map, the heat map is generated based on the multiple extreme points, and N is an integer greater than or equal to 1;
- the image segmentation area corresponding to the image feature information is acquired through an image segmentation model, where the image segmentation model includes N matrix input channels and one heat map input channel, the N matrix input channels have a one-to-one correspondence with the N image matrices, and the one heat map input channel has a corresponding relationship with the heat map;
- the target object is displayed in the image to be segmented.
- FIG. 18 is a schematic diagram of a server structure provided by an embodiment of the present application.
- the server 600 may vary considerably in configuration or performance, and may include one or more central processing units (CPU) 622 (for example, one or more processors), a memory 632, and one or more storage media 630 (for example, one or more storage devices) storing application programs 642 or data 644.
- the memory 632 and the storage medium 630 may be short-term storage or persistent storage.
- the program stored in the storage medium 630 may include one or more modules (not shown in the figure), and each module may include a series of command operations on the server.
- the central processing unit 622 may be configured to communicate with the storage medium 630, and execute a series of instruction operations in the storage medium 630 on the server 600.
- the server 600 may also include one or more power supplies 626, one or more wired or wireless network interfaces 650, one or more input and output interfaces 658, and/or one or more operating systems 641, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.
- the steps performed by the server in the foregoing embodiment may be based on the server structure shown in FIG. 18.
- the CPU 622 included in the server also has the following functions:
- image feature information is generated according to the image to be divided, wherein the image feature information includes N image matrices and a heat map, the heat map is generated based on the multiple extreme points, and N is an integer greater than or equal to 1;
- the image segmentation area corresponding to the image feature information is acquired through an image segmentation model, where the image segmentation model includes N matrix input channels and one heat map input channel, the N matrix input channels have a one-to-one correspondence with the N image matrices, and the heat map input channel has a corresponding relationship with the heat map;
- the image recognition result of the image to be divided is generated according to the image segmentation area.
- the CPU 622 is specifically configured to execute the following steps:
- an object labeling instruction for an image to be processed is received, wherein the image to be processed includes a target object, and the object labeling instruction carries position information of multiple extreme points corresponding to the target object;
- the image to be segmented is generated according to the image to be processed.
- the CPU 622 is also used to execute the following steps:
- the image segmentation area is reduced to obtain a target segmentation area, wherein the target segmentation area includes a second vertex adjusted based on the first vertex, the second vertex corresponds to second location information, and the second location information is different from the first location information.
- the CPU 622 is also used to execute the following steps:
- the image segmentation area is enlarged to obtain a target segmentation area, where the target segmentation area includes a fourth vertex adjusted based on the third vertex.
- the N matrix input channels include a red input channel, a green input channel, and a blue input channel
- the CPU 622 is specifically configured to execute the following steps:
- N image matrices are generated according to the image to be divided, the N image matrices including a first image matrix corresponding to the red input channel, a second image matrix corresponding to the green input channel, and a third image matrix corresponding to the blue input channel.
- the CPU 622 is specifically configured to execute the following steps:
- the target feature map is decoded by the decoder of the image segmentation model to obtain the image segmentation area.
- the CPU 622 is specifically configured to execute the following steps:
- the target feature map is decoded by the decoder of the image segmentation model to obtain a first pixel point set and a second pixel point set, wherein the first pixel point set includes a plurality of first pixel points, and the second pixel point set includes a second pixel point;
- the image segmentation area is generated according to the first pixel point set and the second pixel point set.
- the CPU 622 is also used to execute the following steps:
- a target object is determined from the image to be divided.
- the CPU 622 included in the terminal device also has the following functions:
- an object labeling instruction for an image to be processed is received, wherein the image to be processed includes a target object, and the object labeling instruction carries position information of multiple extreme points corresponding to the target object;
- image feature information is generated according to the image to be divided, wherein the image feature information includes N image matrices and a heat map, the heat map is generated based on the multiple extreme points, and N is an integer greater than or equal to 1;
- the image segmentation area corresponding to the image feature information is acquired through an image segmentation model, where the image segmentation model includes N matrix input channels and one heat map input channel, the N matrix input channels have a one-to-one correspondence with the N image matrices, and the heat map input channel has a corresponding relationship with the heat map;
- the target object is displayed in the image to be divided.
- an embodiment of the present application also provides a storage medium, where the storage medium is used to store program code, and the program code is used to execute the method provided in the foregoing embodiment.
- the embodiments of the present application also provide a computer program product including instructions, which when running on a server, cause the server to execute the method provided in the foregoing embodiment.
- the disclosed system, device, and method may be implemented in other ways.
- the device embodiments described above are only illustrative.
- the division of the units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components can be combined or integrated into another system, or some features can be ignored or not implemented.
- the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
- the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
- each unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
- the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
- if the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
- the technical solution of this application, in essence, or the part that contributes to the existing technology, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions to make a computer device (which can be a personal computer, a server, or a network device, etc.) execute all or part of the steps of the method described in each embodiment of the present application.
- the aforementioned storage media include: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or other media that can store program code.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Software Systems (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Computing Systems (AREA)
- Databases & Information Systems (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Health & Medical Sciences (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
一种图像识别装置和图像识别方法,包括:获取待分割图像(101);根据待分割图像生成图像特征信息(102);通过图像分割模型获取图像特征信息所对应的图像分割区域(103);根据图像分割区域生成待分割图像的图像识别结果(104)。该方法利用极值点所生成的热图作为图像特征信息的一部分,丰富图像的特征,从而生成更加准确的图像分割区域,由此提升图像分割的通用性和适用性。
Description
本申请要求于2019年06月04日提交中国专利局、申请号为201910481441.0、申请名称为“一种图像分割的方法、图像识别的方法以及相关装置”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
本申请涉及人工智能领域,尤其涉及图像识别。
随着计算机技术的发展,图像分割技术应用越来越广泛,例如,医学图像分割以及自然图像分割等。其中,图像分割技术是指把图像分成若干个特定的、具有独特性质的区域并提出感兴趣目标的技术。例如,人体组织图像分割场景中,可以对医学图像进行分割,使得分割后的图像中能明显区分人体各个组织。
发明内容
本申请实施例提供了一种基于人工智能的图像识别方法以及相关装置,利用极值点所生成的热图作为图像特征信息的一部分,丰富图像的特征,从而生成更加准确的图像分割区域,由此提升图像分割的通用性和适用性。
有鉴于此,本申请第一方面提供一种图像分割的方法,包括:
获取待分割图像,其中,所述待分割图像包括多个极值点;
根据所述待分割图像生成图像特征信息,其中,所述图像特征信息包括N个图像矩阵以及热图,所述热图为根据所述多个极值点生成的,所述N为大于或等于1的整数;
通过图像分割模型获取所述图像特征信息所对应的图像分割区域,其中,所述图像分割模型包括N个矩阵输入通道以及一个热图输入通道,所述N个矩阵输入通道与所述N个图像矩阵具有一一对应的关系,所述热图输入通道与所述热图具有对应关系;
根据所述图像分割区域生成所述待分割图像的图像识别结果。
本申请第二方面提供一种图像识别的方法,包括:
接收针对待处理图像的物体标注指令,其中,所述待处理图像包括目标对象,所述物体标注指令携带所述目标对象所对应的多个极值点的位置信息;
响应于所述物体标注指令,根据所述待处理图像生成待分割图像;
根据所述待分割图像生成图像特征信息,其中,所述图像特征信息包括N个图像矩阵以及热图,所述热图为根据所述多个极值点生成的,所述N为大于或等于1的整数;
通过图像分割模型获取所述图像特征信息所对应的图像分割区域,其中,所述图像分割模型包括N个矩阵输入通道以及一个热图输入通道,所述N个矩阵输入通道与所述N个图像矩阵具有一一对应的关系,所述热图输入通道与所述热图具有对应关系;
通过多边拟合函数对所述待分割图像进行处理,得到多边形顶点信息,其中,所述多边形顶点信息包括多个顶点的位置信息;
根据所述多边形顶点信息,在所述待分割图像中突出展示目标对象。
本申请第三方面提供一种图像识别装置,包括:
获取模块,用于获取待分割图像,其中,所述待分割图像包括多个极值点;
生成模块,用于根据所述获取模块获取的所述待分割图像生成图像特征信息,其中,所述图像特征信息包括N个图像矩阵以及热图,所述热图为根据所述多个极值点生成的,所述N为大于或等于1的整数;
所述获取模块,还用于通过图像分割模型获取所述生成模块生成的所述图像特征信息所对应的图像分割区域,其中,所述图像分割模型包括N个矩阵输入通道以及一个热图输入通道,所述N个矩阵输入通道与所述N个图像矩阵具有一一对应的关系,所述热图输入通道与所述热图具有对应关系;
所述生成模块,还用于根据所述获取模块获取的所述图像分割区域生成所述待分割图像的图像识别结果。
在一种可能的设计中,在本申请实施例的第三方面的第一种实现方式中,
所述获取模块,具体用于展示待处理图像,其中,所述待处理图像中包括目标对象;
接收物体标注指令,其中,所述物体标注指令携带所述目标对象所对应多个极值点的位置信息,所述多个极值点用于标识所述目标对象的轮廓边缘;
响应于所述物体标注指令,根据所述待处理图像生成所述待分割图像。
在一种可能的设计中,在本申请实施例的第三方面的第二种实现方式中,所述多个极值点的位置信息包括分别标识所述目标对象的轮廓边缘四周的第一极值点位置信息、第二极值点位置信息、第三极值点位置信息以及第四极值点位置信息。
在一种可能的设计中,在本申请实施例的第三方面的第三种实现方式中,所述图像分割装置还包括接收模块以及处理模块;
所述接收模块,用于接收针对第一顶点的第一调整指令,其中,所述第一顶点属于所述图像分割区域的边缘点,所述第一顶点对应于第一位置信息;
所述处理模块,用于响应于所述接收模块接收的所述第一调整指令,对所述图像分割区域进行缩小处理,得到目标分割区域,其中,所述目标分割区域包括基于所述第一顶点调整得到的第二顶点,所述第二顶点对应于第二位置信息,所述第二位置信息与所述第一位置信息不相同。
在一种可能的设计中,在本申请实施例的第三方面的第四种实现方式中,所述图像分割装置还包括所述接收模块以及所述处理模块;
所述接收模块,还用于接收针对第三顶点的第二调整指令,其中,所述第三顶点属于所述图像分割区域;
所述处理模块,还用于响应于所述接收模块接收的所述第二调整指令,对所述图像分割区域进行放大处理,得到目标分割区域,其中,所述目标分割区域包括基于所述第三顶点调整得到的第四顶点。
在一种可能的设计中,在本申请实施例的第三方面的第五种实现方式中,所述N个矩阵输入通道包括红色输入通道、绿色输入通道和蓝色输入通道,
所述生成模块,具体用于根据所述待分割图像中的所述多个极值点生成所述热图;
根据所述待分割图像生成N个图像矩阵,所述N个图像矩阵包括对应所述红色输入通道的第一图像矩阵,对应所述绿色输入通道的第二图像矩阵,以及对应所述蓝色输入通道的第三图像矩阵。
在一种可能的设计中,在本申请实施例的第三方面的第六种实现方式中,
所述获取模块,具体用于通过所述图像分割模型的编码器对所述图像特征信息进行编码,得到第一特征图以及第二特征图;
将所述第一特征图以及所述第二特征图进行拼接,得到目标特征图;
通过所述图像分割模型的解码器对所述目标特征图进行解码,得到所述图像分割区域。
在一种可能的设计中,在本申请实施例的第三方面的第七种实现方式中,
所述获取模块,具体用于通过所述图像分割模型的解码器对所述目标特征图进行解码,得到第一像素点集合以及第二像素点集合,其中,所述第一像素点集合包括多个第一像素点,所述第二像素点集合包括第二像素点;
根据所述第一像素点集合以及所述第二像素点集合,生成所述图像分割区域。
在一种可能的设计中,在本申请实施例的第三方面的第八种实现方式中,所述图像分割装置还包括所述处理模块以及确定模块;
所述处理模块,还用于所述获取模块通过图像分割模型获取所述图像特征信息所对应的图像分割区域之后,通过多边拟合函数对所述待分割图像进行处理,得到多边形顶点信息,其中,所述多边形顶点信息包括多个顶点的位置信息;
所述确定模块,用于根据所述处理模块处理得到的所述多边形顶点信息,从所述待分割图像中确定目标对象。
本申请第四方面提供一种图像识别装置,包括:
接收模块,用于接收针对待处理图像的物体标注指令,其中,所述待处理图像包括目标对象,所述物体标注指令携带所述目标对象所对应的多个极值点的位置信息;
生成模块,用于响应于所述接收模块接收的所述物体标注指令,根据所述待处理图像生成待分割图像;
所述生成模块,还用于根据所述待分割图像生成图像特征信息,其中,所述图像特征信息包括N个图像矩阵以及热图,所述热图为根据所述多个极值点生成的,所述N为大于或等于1的整数;
获取模块,用于通过图像分割模型获取所述生成模块生成的所述图像特征信息所对应的图像分割区域,其中,所述图像分割模型包括N个矩阵输入通道以及一个热图输入通道,所述N个矩阵输入通道与所述N个图像矩阵具有一一对应的关系,所述热图输入通道与所述热图具有对应关系;
处理模块,用于通过多边拟合函数对所述获取模块获取的所述待分割图像进行处理,得到多边形顶点信息,其中,所述多边形顶点信息包括多个顶点的位置信息;
展示模块,用于根据所述处理模块处理得到的所述多边形顶点信息,在所述待分割图像中展示所述目标对象。
本申请第五方面提供一种终端设备,包括:存储器、收发器、处理器以及总线系统;
其中,所述存储器用于存储程序;
所述处理器用于执行所述存储器中的程序,包括如下步骤:
获取待分割图像,其中,所述待分割图像包括多个极值点;
根据所述待分割图像生成图像特征信息,其中,所述图像特征信息包括N个图像矩阵以及热图,所述热图为根据所述多个极值点生成的,所述N为大于或等于1的整数;
通过图像分割模型获取所述图像特征信息所对应的图像分割区域,其中,所述图像分割模型包括N个矩阵输入通道以及一个热图输入通道,所述N个矩阵输入通道与所述N个图像矩阵具有一一对应的关系,所述热图输入通道与所述热图具有对应关系;
根据所述图像分割区域生成所述待分割图像的图像识别结果;
所述总线系统用于连接所述存储器以及所述处理器,以使所述存储器以及所述处理器进行通信。
本申请第六方面提供一种服务器,包括:存储器、收发器、处理器以及总线系统;
其中,所述存储器用于存储程序;
所述处理器用于执行所述存储器中的程序,包括如下步骤:
获取待分割图像,其中,所述待分割图像包括多个极值点;
根据所述待分割图像生成图像特征信息,其中,所述图像特征信息包括N个图像矩阵以及热图,所述热图为根据所述多个极值点生成的,所述N为大于或等于1的整数;
通过图像分割模型获取所述图像特征信息所对应的图像分割区域,其中,所述图像分割模型包括N个矩阵输入通道以及一个热图输入通道,所述N个矩 阵输入通道与所述N个图像矩阵具有一一对应的关系,所述热图输入通道与所述热图具有对应关系;
根据所述图像分割区域生成所述待分割图像的图像识别结果;
所述总线系统用于连接所述存储器以及所述处理器,以使所述存储器以及所述处理器进行通信。
本申请第七方面提供一种终端设备,包括:存储器、收发器、处理器以及总线系统;
其中,所述存储器用于存储程序;
所述处理器用于执行所述存储器中的程序,包括如下步骤:
接收针对待处理图像的物体标注指令,其中,所述待处理图像包括目标对象,所述物体标注指令携带所述目标对象所对应的多个极值点的位置信息;
响应于所述物体标注指令,根据所述待处理图像生成待分割图像;
根据所述待分割图像生成图像特征信息,其中,所述图像特征信息包括N个图像矩阵以及热图,所述热图为根据所述多个极值点生成的,所述N为大于或等于1的整数;
通过图像分割模型获取所述图像特征信息所对应的图像分割区域,其中,所述图像分割模型包括N个矩阵输入通道以及一个热图输入通道,所述N个矩阵输入通道与所述N个图像矩阵具有一一对应的关系,所述热图输入通道与所述热图具有对应关系;
通过多边拟合函数对所述待分割图像进行处理,得到多边形顶点信息,其中,所述多边形顶点信息包括多个顶点的位置信息;
根据所述多边形顶点信息,在所述待分割图像中展示所述目标对象;
所述总线系统用于连接所述存储器以及所述处理器,以使所述存储器以及所述处理器进行通信。
本申请的第八方面提供了一种计算机可读存储介质,所述存储介质用于存储计算机程序,所述计算机程序用于执行上述各方面所述的方法。
本申请的第九方面提供了一种包括指令的计算机程序产品,当其在计算机上运行时,使得所述计算机执行以上方面所述的方法。
从以上技术方案可以看出,本申请实施例具有以下优点:
本申请实施例中,提供了一种图像分割的方法,首先获取待分割图像,其中,待分割图像包括多个极值点,然后根据待分割图像生成图像特征信息,其中,图像特征信息包括N个图像矩阵以及热图,热图为根据多个极值点生成的,再通过图像分割模型获取图像特征信息所对应的图像分割区域,其中,图像分割模型包括N个矩阵输入通道以及一个热图输入通道,N个矩阵输入通道与N个图像矩阵具有一一对应的关系,热图输入通道与热图具有对应关系,最后根据图像分割区域生成待分割图像的图像识别结果。通过上述方式,无需考虑目标是否满足特定类别,而是利用极值点所生成的热图作为图像特征信息的一部分,丰富了图像的特征内容,使得图像分割模型能够根据该图像特征信息生成更加准确的图像分割区域,从而提升图像分割的通用性和适用性。
图1为本申请实施例中图像识别系统的一个架构示意图;
图2为本申请实施例中图像分割模型的一个结构示意图;
图3为本申请实施例中图像识别方法的一个实施例示意图;
图4为本申请实施例中选取四个极值点的一个实施例示意图;
图5为本申请实施例中图像分割模型返回图像分割区域的一个实施例示意图;
图6为本申请实施例中缩小图像分割区域的一个实施例示意图;
图7为本申请实施例中增大图像分割区域的一个实施例示意图;
图8为本申请实施例中生成图像特征信息的一个实施例示意图;
图9为本申请实施例中图像分割模型的一个结构示意图;
图10为本申请实施例中图像分割模型输出过程的一个实施例示意图;
图11为本申请实施例中图像识别的方法一个实施例示意图;
图12为本申请实施例中基于分割方式的一个实验结果对比示意图;
图13为本申请实施例中图像识别装置一个实施例示意图;
图14为本申请实施例中图像识别装置另一个实施例示意图;
图15为本申请实施例中图像识别装置另一个实施例示意图;
图16为本申请实施例中图像识别装置一个实施例示意图;
图17为本申请实施例中终端设备的一个结构示意图;
图18为本申请实施例中服务器的一个结构示意图。
本申请实施例提供了一种图像分割的方法、图像识别的方法以及相关装置,利用极值点所生成的热图作为图像特征信息的一部分,丰富图像的特征,从而生成更加准确的图像分割区域,由此提升图像分割的通用性和适用性。
本申请的说明书和权利要求书及上述附图中的术语“第一”、“第二”、“第三”、“第四”等(如果存在)是用于区别类似的对象,而不必用于描述特定的顺序或先后次序。应该理解这样使用的数据在适当情况下可以互换,以便这里描述的本申请的实施例例如能够以除了在这里图示或描述的那些以外的顺序实施。此外,术语“包括”和“对应于”以及他们的任何变形,意图在于覆盖不排他的包含,例如,包含了一系列步骤或单元的过程、方法、系统、产品或设备不必限于清楚地列出的那些步骤或单元,而是可包括没有清楚地列出的或对于这些过程、方法、产品或设备固有的其它步骤或单元。
应理解,本申请所提供的图像分割(Image segmentation)方法以及图像识别方法可应用于人工智能领域,具体可以应用于计算机视觉领域。随着计算机科学技术的不断发展,图像处理和分析逐渐形成了一套科学体系,新的处理方法层出不穷,尽管其发展历史不长,但却引起各方面人士的广泛关注。首先,视觉是人类最重要的感知手段,图像又是视觉的基础,因此,数字图像成为心理学、生理学以及计算机科学等诸多领域内的学者们研究视觉感知的有效工具。其次,图像处理在军事、遥感以及气象等大型应用中有不断增长的需求。图像分割技术一直是计算机视觉领域的基础技术和重要研究方向,具体来说,就是从一张图像上将感兴趣的区域(比如人、车以及建筑物等)按照真实轮廓分割出来。图像分割技术是图像语义理解的重要一环,近年来,随着神经网络的发展,图像处理能力显著提升,图像分割技术在医学影像分析(包括肿瘤和其他病理的定位,组织体积的测量,计算机引导的手术,治疗方案的定制,解剖学结构的研究)、人脸识别、指纹识别、无人驾驶以及机器视觉等领域中也发挥出了更加重要的作用。
为了便于理解,请参阅图1,图1为本申请实施例中图像识别系统的一个架构示意图,如图所示,本申请所提供的图像处理设备包括终端设备或服务器,例如可以为部署有客户端的终端设备,该客户端具体可以是一款辅助分割工具,需要说明的是,部署了该客户端的终端设备包含但不仅限于平板电脑、笔记本电脑、掌上电脑、手机、语音交互设备及个人电脑(personal computer,PC),此处不做限定。
为了方便用户标注图像分割数据集,本申请提出了一种基于神经网络模型(即图像分割模型)的交互式图像辅助分割工具。在图像分割标注任务中,辅助分割工具只要获取少量的用户交互行为,就能够通过神经网络模型(即图像分割模型)反馈一个较为准确的预分割结果(即得到图像分割区域),然后用户再基于预分割的结果(即图像分割区域)进行少量修改甚至无需修改,就能获得最终的分割结果(即得到目标分割区域)。本申请提出“四点交互”式的分割方法,并改进了原有的图像分割模型,从而获得了更好的分割结果与工具实时性表现。
图像分割模型可以部署在作为图像处理设备的服务器中,通过图像分割模型进行图像分割区域的预测,从而实现图像在线分割的目的,可选地,图像分割模型也可以部署在作为图像处理设备的终端设备上,通过图像分割模型进行图像分割区域的预测,从而实现图像离线分割的目的。
请参阅图2,图2为本申请实施例中图像分割模型的一个结构示意图,如图所示,用户通过辅助分割工具对待处理图像进行极值点的标注,比如对图2中的树进行标注,辅助分割工具根据用户标注的结果生成热图100,该热图100与待处理图像的图像矩阵200进行组合,得到图像特征信息。将图像特征信息输入至图像分割模型300,通过该图像分割模型300提取特征,从而输出图像分割区域400,比如图2示出的黑色树状区域。图像分割模型可以是一种图像分割卷积神经网络(Convolutional Neural Networks,CNN),其模型结构主要包括输入层、特征提取层以及输出层。
结合上述介绍,下面将对本申请中图像识别方法进行介绍,请参阅图3,本申请实施例中图像识别方法一个实施例包括:
101、获取待分割图像。其中,待分割图像包括多个极值点。
本实施例中,图像识别设备获取待分割图像,其中,图像识别设备中可以部署辅助分割工具,待分割图像可以通过该辅助分割工具标注得到,用户使用辅助分割工具标注多个极值点,根据这些极值点生成待分割图像。可以理解的是,本申请所提供的图像识别设备可为终端设备或服务器。
具体地,多个极值点可以是待分割图像中目标对象的最高点,最低点,最左点和最右点。
102、根据待分割图像生成图像特征信息。其中,图像特征信息包括N个图像矩阵以及热图,热图为根据多个极值点生成的,N为大于或等于1的整数。
本实施例中,图像分割装置根据待分割图像生成N个图像矩阵,并且根据多个极值点生成热图,将热图与N个图像矩阵进行组合,得到待分割图像所对应的图像特征信息。
其中,数字图像数据可以用矩阵来表示,如果读取的待分割图像大小为128*128,则图像矩阵大小为128*128*N,其中,N为大于或等于1的整数。当N为1时,图像矩阵可以是灰度图像所对应的矩阵。当N为3时,图像矩阵可以是红绿蓝(red green blue,RGB)图像的矩阵,RGB图像是三维的,三个维度分别表示红、绿和蓝三个分量,大小是0到255,每个像素都是由这三个分量组合而成。每一个RGB通道都对应一个图像矩阵(即第一图像矩阵、第二图像矩阵以及第三图像矩阵),因此,这三个RGB通道叠在一起形成了彩色图像,即得到待分割图像。当N为4时,图像矩阵可以是红绿蓝和Alpha(red green blue Alpha,RGBA)的色彩空间,对于便携式网络图形(Portable Network Graphics,PNG)而言,也具有四个图像矩阵,此处不对N的数量进行限定。
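上述图像矩阵的组织方式可以用如下示意代码说明(仅为基于numpy的示意实现,split_into_matrices等名称为本文为说明而假设,并非本申请的具体实现):

```python
import numpy as np

def split_into_matrices(image):
    """将一幅图像拆分为N个单通道图像矩阵。

    灰度图像(H*W)对应N=1,只有一个矩阵;RGB图像(H*W*3)对应N=3,
    依次返回红、绿、蓝三个分量矩阵,取值范围均为0到255。
    """
    if image.ndim == 2:  # 灰度图像,N = 1
        return [image]
    # 彩色图像:每个通道对应一个图像矩阵
    return [image[:, :, c] for c in range(image.shape[2])]

# 一幅128*128的RGB待分割图像,对应128*128*3的图像矩阵
rgb = np.random.randint(0, 256, size=(128, 128, 3), dtype=np.uint8)
matrices = split_into_matrices(rgb)
```

对于N=4的RGBA图像,同样的拆分方式会返回四个图像矩阵。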
103、通过图像分割模型获取图像特征信息所对应的图像分割区域。其中,图像分割模型包括N个矩阵输入通道以及一个热图输入通道,N个矩阵输入通道与N个图像矩阵具有一一对应的关系,热图输入通道与热图具有对应关系。这里所提出的对应关系可以理解为,若图像矩阵a与矩阵输入通道a具有对应关系,在通过图像分割模型获取图像特征信息所对应的图像分割区域时,图像矩阵a从矩阵输入通道a输入图像分割模型。热图与热图输入通道也是这种输入方式。
本实施例中,图像分割装置将图像特征信息输入至图像分割模型,其中,图像分割模型可以采用深度实验(Deep Lab)结构,包含但不仅限于DeepLabV1、DeepLabV2、DeepLabV3以及DeepLabV3+。其中,DeepLabV2结构是一种用于图像分割的CNN模型结构,输入一张图片,输出原图同大小的掩码图,图中每个像素点的值表示这个像素属于的类别标签值。DeepLabV3+结构是在DeeplabV2的基础上改进后的一种用于图像分割的CNN模型结构,它在图像分割比赛中通常能够取得更好的成绩。CNN是神经网络模型的一种发展,用卷积层替代了人工神经网络中的全连接层结构,在各种计算机视觉领域中取得了非常优异的表现。
本申请需要对图像分割模型的结构进行改进,对图像分割模型的第一层参数进行修改,使图像分割模型能够接收(N+1)个通道(channel)的图像数据,即图像分割模型包括N个矩阵输入通道以及一个热图输入通道。假设N为3,则表示有3个图像矩阵,此时对应3个矩阵输入通道,每个矩阵输入通道对应一个图像矩阵,且此时还具有一个热图输入通道,该热图输入通道对应于热图。
类似地,假设N为1,则表示有1个图像矩阵,此时对应1个矩阵输入通道,1个矩阵输入通道对应灰度图像的一个图像矩阵,且此时还具有一个热图输入通道,该热图输入通道对应于热图。
类似地,假设N为4,则表示有4个图像矩阵,此时对应4个矩阵输入通道,每个矩阵输入通道对应一个图像矩阵,且此时还具有一个热图输入通道,该热图输入通道对应于热图。
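上述(N+1)个输入通道的组织方式可以用如下示意代码说明(仅为numpy层面的通道堆叠示意,不涉及具体深度学习框架中第一层卷积参数的修改,build_model_input为假设的函数名):

```python
import numpy as np

def build_model_input(image_matrices, heat_map):
    """把N个图像矩阵与一张热图堆叠成(N+1)通道的模型输入。

    N个矩阵输入通道与N个图像矩阵一一对应,额外的一个通道对应热图,
    与上文描述的(N+1)个channel的输入格式一致。
    """
    channels = list(image_matrices) + [heat_map]
    assert all(c.shape == heat_map.shape for c in channels)
    return np.stack(channels, axis=0)  # 形状:(N+1, H, W)

matrices = [np.zeros((128, 128)) for _ in range(3)]  # N = 3,对应RGB三个矩阵
heat = np.ones((128, 128))                           # 热图输入通道
x = build_model_input(matrices, heat)
```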
104、根据图像分割区域生成待分割图像的图像识别结果。
本实施例中,图像分割装置根据图像分割区域生成待分割图像的图像识别结果,具体地,图像分割区域是一个掩膜(mask)图像,基于该mask图像可以得到待分割图像中目标对象的边缘,用户可以手动调整该图像分割区域的边缘,最后得到图像识别结果。其中,该图像识别结果可以通过文本信息展示,比如,图像识别结果为“猴子”或者“汽车”等对象。图像识别结果还可以是在待分割图像中突出展示目标对象,目标对象可以是“汽车”或者“猴子”等对象。
本申请实施例中,提供了一种图像识别的方法,首先获取待分割图像,其中,待分割图像包括多个极值点,然后根据待分割图像生成图像特征信息,其中,图像特征信息包括第一图像矩阵、第二图像矩阵、第三图像矩阵以及热图,热图为根据多个极值点生成的,最后可以通过图像分割模型获取图像特征信息所对应的图像分割区域,其中,图像分割模型包括第一输入通道、第二输入通道、第三输入通道以及第四输入通道,第一输入通道与第一图像矩阵具有对应关系,第二输入通道与第二图像矩阵具有对应关系,第三输入通道与第三图像矩阵具有对应关系,第四输入通道与热图具有对应关系。通过上述方式,无需考虑目标是否满足特定类别,而是利用极值点所生成的热图作为图像特征信息的一部分,丰富了图像的特征内容,使得图像分割模型能够根据该图像特征信息生成更加准确的图像分割区域,从而提升图像分割的通用性和适用性,分割结果也可以为目标对象的图像识别提供准确的数据,进一步地提高了图像识别的精度。
可选地,在上述图3以及图3对应的各个实施例的基础上,本申请实施例提供的图像识别方法一个可选实施例中,获取待分割图像,可以包括:
接收针对待处理图像的物体标注指令,其中,待处理图像中包括目标对象,物体标注指令携带目标对象所对应的多个极值点的位置信息,所述多个极值点用于标识所述目标对象的轮廓边缘;
为了标识目标对象,可以通过目标对象的轮廓边缘四周来确定极值点,例如上下左右四个方向的极值点,例如图4示出的情况。这一情况下,该多个极值点可以包括四个,相应的,四个极值点的位置信息包括第一极值点位置信息、第二极值点位置信息、第三极值点位置信息以及第四极值点位置信息;
响应于物体标注指令,根据待处理图像生成待分割图像。
本实施例中,介绍了一种基于极值点标注的方式,用户可以使用辅助分割工具标注多个极值点。为了便于理解,请参阅图4,图4为本申请实施例中选取四个极值点的一个实施例示意图,如图所示,首先展示一个待处理图像,该待处理图像中包括目标对象,比如包括花朵、草堆以及树木,在实际应用中,目标对象包含但不仅限于人物、动物、车辆以及其他物体。在启动辅助分割工具之后用户即可触发物体标注指令,比如通过点选的方式从待处理图像中选择若干个极值点,以图4为例,假设目标对象为树木,用户通过辅助分割工具选择树木的四个极值点,即第一极值点A、第二极值点B、第三极值点C以及第四极值点D。在物体标注指令中具体携带了这四个极值点的坐标信息,从而根据物体标注指令生成待处理图像所对应的待分割图像,待分割图像为如图4所示的树木所对应的图像,且待分割图像包括第一极值点A、第二极值点B、第三极值点C以及第四极值点D所构成的区域。
辅助分割工具根据待分割图像生成图像特征信息(包括热图以及图像矩阵),然后通过图像分割模型获取图像特征信息所对应的图像分割区域,请参阅图5,图5为本申请实施例中图像分割模型返回图像分割区域的一个实施例示意图,如图所示,辅助分割工具根据四个极值点计算得到图像分割区域,并返回该图像分割区域,比如图5中阴影部分所对应的图像即为图像分割区域。可以理解的是,图像分割区域可以为一个预分割的多边形结果,图5仅为一个示意,不应理解为对本申请的限定。
其次,本申请实施例中,提供了一种标注极值点的方法,首先展示待处理图像,然后接收物体标注指令,其中,物体标注指令携带目标对象所对应的第一极值点位置信息、第二极值点位置信息、第三极值点位置信息以及第四极值点位置信息,最后响应于物体标注指令,根据待处理图像生成待分割图像。通过上述方式,能够利用辅助分割工具对待处理图像进行标注,辅助分割工具的操作难度较低,使用的便利性较高,从而提升方案的可行性和可操作性。
可选地,在上述图3以及图3对应的各个实施例的基础上,本申请实施例提供的图像识别方法的一个可选实施例中,通过图像分割模型获取图像特征信息所对应的图像分割区域之后,还可以包括:
接收针对第一顶点的第一调整指令,其中,第一顶点属于图像分割区域的边缘点,第一顶点对应于第一位置信息;
响应于第一调整指令,对图像分割区域进行缩小处理,得到目标分割区域,其中,目标分割区域包括基于所述第一顶点调整得到的第二顶点,第二顶点对应于第二位置信息,第二位置信息与第一位置信息不相同。
本实施例中,介绍了一种对图像分割区域进行调整的方法,用户可以通过辅助分割工具触发第一调整指令,为了便于理解,请参阅图6,图6为本申请实施例中缩小图像分割区域的一个实施例示意图,如图所示,如果预分割的图像分割区域存在错误,用户可以直接对图像分割区域进行修改,修改方式包括拖动多边形的边或者顶点,比如,图像分割区域具有顶点E1、顶点E2和顶点E3,其中,顶点E1和顶点E2构成的线段超出了树木的范围,因此,用户可以触发第一调整指令,即按住第一顶点(如顶点E2)向目标对象(如树木)的内部拖动,从而改变第一顶点的位置,辅助分割工具响应于第一调整指令,对图像分割区域进行缩小处理,得到目标分割区域,该目标分割区域即为调整过的图像分割区域,且原来的第一顶点位置发生了变化,变成第二顶点所在的位置,第二顶点可以为图6中E3所示的位置。
其次,本申请实施例中,提供了一种对图像分割区域进行调整方法,即接收第一调整指令,然后响应于第一调整指令,对图像分割区域进行缩小处理,得到目标分割区域。通过上述方式,用户可以采用辅助分割工具对图像分割区域进行调整,从而得到更加准确的分割结果,由此提升方案的实用性和灵活性。
可选地,在上述图3以及图3对应的各个实施例的基础上,本申请实施例提供的图像分割的方法一个可选实施例中,通过图像分割模型获取图像特征信息所对应的图像分割区域之后,还可以包括:
接收针对第三顶点的第二调整指令,其中,第三顶点属于图像分割区域;
响应于第二调整指令,对图像分割区域进行放大处理,得到目标分割区域,其中,目标分割区域包括基于所述第三顶点调整得到的第四顶点。
本实施例中,介绍了另一种对图像分割区域进行调整方法,用户可以通过辅助分割工具触发第二调整指令,为了便于理解,请参阅图7,图7为本申请实施例中增大图像分割区域的一个实施例示意图,如图所示,如果预分割的图像分割区域存在错误,用户可以直接对图像分割区域进行修改,修改方式包括在多边形边上新增顶点,或者画一个新的多边形覆盖错误区域并进行合并,比如,图像分割区域具有顶点E1、顶点E2和顶点E3,其中,顶点E1和顶点E2构成的线段落入树木的范围,因此,用户可以触发第二调整指令,即按住第三顶点(如顶点E2)向目标对象(如树木)的外部拖动,从而改变第三顶点的位置,辅助分割工具响应于第二调整指令,对图像分割区域进行放大处理,得到目标分割区域,该目标分割区域即为调整过的图像分割区域,且原来的第三顶点位置发生了变化,成为目标分割区域上的一个新的顶点(第四顶点),第四顶点可以为图7中E3所示的位置。
其次,本申请实施例中,提供了另一种对图像分割区域进行调整的方法,即首先接收第二调整指令,然后响应于第二调整指令,对图像分割区域进行放大处理,得到目标分割区域。通过上述方式,用户可以采用辅助分割工具对图像分割区域进行调整,从而得到更加准确的分割结果,由此提升方案的实用性和灵活性。
可选地,在上述图3以及图3对应的各个实施例的基础上,本申请实施例提供的图像分割的方法一个可选实施例中,所述N个矩阵输入通道包括红色输入通道、绿色输入通道和蓝色输入通道,根据待分割图像生成图像特征信息,可以包括:
根据待分割图像中的多个极值点生成热图;
根据待分割图像生成N个图像矩阵,所述N个图像矩阵包括对应所述红色输入通道的第一图像矩阵,对应所述绿色输入通道的第二图像矩阵,以及对应所述蓝色输入通道的第三图像矩阵。
本实施例中,将以N=3个矩阵输入通道以及1个热图输入通道为例,介绍了一种生成图像特征信息的方式,为了便于理解,请参阅图8,图8为本申请实施例中生成图像特征信息的一个实施例示意图,如图所示,本申请采用深度极值点分割(Deep Extreme Cut,DEXTR)的输入格式,输入一个四通道的图像矩阵,也就是说本申请采用的模型输入除了原始图像以外,还包括四个极值点的信息,为了充分利用四个极值点的信息,生成一个和待分割图像尺寸一样的热图(heat map),即如图8所示,分别以四个极值点坐标为中心,生成2D高斯分布,然后把这个热图作为第四个通道,再与另外三个图像矩阵进行合并,得到图像特征信息,最后将图像特征信息作为图像分割模型的输入。
其中,三个图像矩阵分别为第一图像矩阵、第二图像矩阵和第三图像矩阵,第一图像矩阵对应于红色(R)输入通道,第二图像矩阵对应于绿色(G)输入通道,第三图像矩阵对应于蓝色(B)输入通道。
通过热图可以简单地聚合大量数据,并使用一种渐进的色带来表现,最终效果一般优于离散点的直接显示,可以很直观地展现空间数据的疏密程度或频率高低。热图生成的原理主要分为四个步骤,具体为:
(1)为离散点设定一个半径,创建一个缓冲区;
(2)对每个离散点的缓冲区,使用渐进的灰度带从内而外,由浅至深地填充;
(3)由于灰度值可以叠加,从而对于有缓冲区交叉的区域,可以叠加灰度值,因而缓冲区交叉的越多,灰度值越大,这块区域也就越热;
(4)以叠加后的灰度值为索引,从一条有256种颜色的色带中映射颜色,并对图像重新着色,从而生成热图。
可以理解的是,在实际应用中,还存在其他热图生成方式,比如说,还可以直接以每个极值点为中心构建四个实心圆。2D高斯分布的特点是,越靠近中心点,值越大,并随着距离中心点变远迅速衰减。本申请采用热图的原因是为了在输入热图中,给予图像分割模型一些先验知识,让图像分割模型知道这四个点是用户选择的极值点,但是考虑到用户选择的不一定是真实的极值点,可能存在一定误差,所以以极值点为中心生成了一个热图的分布。
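以四个极值点为中心生成2D高斯分布热图的过程,可以用如下示意代码说明(其中sigma为假设的衰减参数,重叠区域取最大值而非灰度叠加,仅为一种简化示意):

```python
import numpy as np

def make_heat_map(extreme_points, height, width, sigma=10.0):
    """在每个极值点处放置一个2D高斯分布,生成与图像同尺寸的热图。

    越靠近极值点中心,值越大,并随距离迅速衰减,从而给予模型
    容忍用户点击误差的先验信息。
    """
    ys, xs = np.mgrid[0:height, 0:width]
    heat = np.zeros((height, width))
    for (px, py) in extreme_points:
        g = np.exp(-((xs - px) ** 2 + (ys - py) ** 2) / (2.0 * sigma ** 2))
        heat = np.maximum(heat, g)  # 重叠区域取最大值
    return heat

# 目标对象的上、下、左、右四个极值点,坐标为(x, y)
points = [(64, 10), (64, 120), (10, 64), (120, 64)]
heat = make_heat_map(points, 128, 128)
```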
其次,本申请实施例中,提供了一种根据待分割图像生成图像特征信息的方法,根据待分割图像中的多个极值点生成热图,根据待分割图像生成第一图像矩阵,根据待分割图像生成第二图像矩阵,根据待分割图像生成第三图像矩阵。通过上述方式,能够有效地提升方案的可行性和可操作性。
可选地,在上述图3以及图3对应的各个实施例的基础上,本申请实施例提供的图像分割的方法一个可选实施例中,通过图像分割模型获取图像特征信息所对应的图像分割区域,可以包括:
通过图像分割模型的编码器对图像特征信息进行编码,得到第一特征图以及第二特征图;
将第一特征图以及第二特征图进行拼接,得到目标特征图;
通过图像分割模型的解码器对目标特征图进行解码,得到图像分割区域。
本实施例中,介绍一种图像分割模型的结构,本申请是以DeeplabV3+模型结构为例进行介绍的,可以理解的是,还可以采用DeeplabV2模型结构,U-Net或者金字塔场景解析网络(Pyramid Scene Parsing Network,PSPNet)等。
为了便于理解,请参阅图9,图9为本申请实施例中图像分割模型的一个结构示意图,如图所示,提取待分割图像的特征,得到图像特征信息,将图像特征信息输入至图像分割模型。其中,图像分割模型包括编码器(Encoder)以及解码器(Decoder),编码器用于减少特征图的分辨率并捕捉更抽象的分割信息,解码器用于恢复空间信息。
首先通过编码器中的深度卷积神经网络(Deep Convolutional Neural Network,DCNN)对图像特征信息进行编码,即通过双线性插值恢复4倍大小的分辨率,得到第一特征图。采用1*1的卷积处理降低通道数,从而提取到图像特征信息的低层次特征,即可得到第二特征图。通过图像分割模型的解码器中的拼接层(concat)对第一特征图和第二特征图进行拼接,得到目标特征图。接一个大小为3*3的卷积来增强目标特征图,再通过一个插值来进一步恢复4倍分辨率至待分割图像的大小。
编码-解码结构可以通过逐渐恢复空间信息获得物体的边缘信息,DeeplabV3+模型结构在DeeplabV3模型结构的基础上增加了一个解码器来增强物体边缘的分割。
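上述编码-解码结构中特征图的上采样与拼接,可以用如下形状层面的示意代码说明(这里用最近邻重复代替双线性插值,通道数均为假设值,仅演示数据流的形状变化):

```python
import numpy as np

def upsample4(x):
    """把(C, H, W)的特征图在两个空间维度上各放大4倍。

    以最近邻重复代替解码器中的双线性插值,只保证形状一致。
    """
    return x.repeat(4, axis=1).repeat(4, axis=2)

def decoder_concat(deep, low_level):
    """解码器的拼接层:先把深层特征上采样4倍恢复分辨率,
    再与1*1卷积降维后的低层次特征沿通道维拼接,得到目标特征图。"""
    return np.concatenate([upsample4(deep), low_level], axis=0)

deep = np.zeros((256, 32, 32))   # 第一特征图:深层、低分辨率(通道数为假设值)
low = np.zeros((48, 128, 128))   # 第二特征图:低层次、经1*1卷积降低通道数
target = decoder_concat(deep, low)
```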
其次,本申请实施例中,提供了一种通过图像分割模型获取图像分割区域的方法,即首先通过图像分割模型的编码器对图像特征信息进行编码,得到第一特征图以及第二特征图,然后将第一特征图以及第二特征图进行拼接,得到目标特征图,最后通过图像分割模型的解码器对目标特征图进行解码,得到图像分割区域。通过上述方式,采用一种基于深度实验V3+版本(DeeplabV3+)的模型结构进行图像分割区域的预测,而DeeplabV3+模型结构总体参数量较少,因此,无论在训练还是实际预测都具有较快的运行速度,应用于辅助分割工具上能够更快地响应用户操作,提升使用效率,增强用户粘度。
可选地,在上述图3以及图3对应的各个实施例的基础上,本申请实施例提供的图像识别方法的一个可选实施例中,通过图像分割模型的解码器对目标特征图进行解码,得到图像分割区域,可以包括:
通过图像分割模型的解码器对目标特征图进行解码,得到第一像素点集合以及第二像素点集合,其中,第一像素点集合包括多个第一像素点,第二像素点集合包括第二像素点;
根据第一像素点集合以及第二像素点集合,生成图像分割区域。
本实施例中,介绍了一种基于图像分割模型生成图像分割区域的方法,在图像分割模型对目标特征图进行解码之后,得到第一像素点集合以及第二像素点集合,这里的第一像素点集合属于目标对象的像素点,比如可以表示为“1”,第二像素点集合属于背景,比如可以表示为“0”,由第一像素点集合以及第二像素点集合共同构成图像分割区域,也就是在图像分割区域中可以看到目标对象的分割结果。
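掩码图到第一、第二像素点集合的划分可以用如下示意代码说明(split_pixel_sets为说明用的假设函数名):

```python
import numpy as np

def split_pixel_sets(mask):
    """把二值掩码图划分为第一像素点集合(值为1,属于目标对象内部)
    与第二像素点集合(值为0,属于背景),以坐标数组的形式返回。"""
    first_set = np.argwhere(mask == 1)   # 目标对象内部的像素点
    second_set = np.argwhere(mask == 0)  # 背景像素点
    return first_set, second_set

mask = np.zeros((8, 8), dtype=np.uint8)
mask[2:6, 2:6] = 1                       # 一个4*4的目标区域
fg, bg = split_pixel_sets(mask)
```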
DeeplabV3+模型结构总体参数量相较于DeeplabV2来说更少,这一特性使得DeeplabV3+模型结构不论是在训练还是实际使用的时候,运行速度都会获得提升,反映在实时的辅助分割工具使用上,能够更快响应用户给出的请求。
再次,本申请实施例中,提供了一种利用图像分割模型解码得到图像分割区域的方法,即通过图像分割模型的解码器对目标特征图进行解码,得到第一像素点集合以及第二像素点集合,然后根据第一像素点集合以及第二像素点集合,生成图像分割区域。通过上述方式,为方案的实现提供了具体的依据,并且基于图像分割模型的结构对特征进行解码,从而有利于提升图像分割模型应用的可靠性。
可选地,在上述图3以及图3对应的各个实施例的基础上,本申请实施例提供的图像识别方法的一个可选实施例中,通过图像分割模型获取图像特征信息所对应的图像分割区域之后,还可以包括:
通过多边拟合函数对待分割图像进行处理,得到多边形顶点信息,其中,多边形顶点信息包括多个顶点的位置信息;
根据多边形顶点信息,从待分割图像中确定目标对象。
本实施例中,介绍了一种从待分割图像中确定目标对象的方式,在得到图像分割区域之后,还需要对图像分割区域进行边缘处理,具体地,请参阅图10,图10为本申请实施例中图像分割模型输出过程的一个实施例示意图,如图所示,本申请提出的辅助分割工具是一种不需指定特定物体类别的分割工具,模型对于一张图上的任何物体,都可以根据用户给出的四个极值点,提供较准确的分割结果,因此在图像分割模型的输出层,不是根据预加载的类别编号对像素点进行分类,而是对图像上每一个像素点进行一次二分类,代表的意思是当前像素点是否属于极值点指向的这个物体内部。图像分割模型输出的图像分割区域具体可以表现为掩码图(可以理解为一个原图大小的二维图像,里面的值只有1和0,1表示模型分类成正的,0表示分类成负的),图像分割区域中每一个像素点的值都是0或1。像素点值为1,代表图像分割模型判断此像素点为目标对象的内部点,像素值为0,代表图像分割模型判断此像素点为背景点。图像分割模型根据这个掩码图,提取目标对象的轮廓边缘,并对目标对象的边缘进行多边形拟合,最后将多边形顶点信息反馈给辅助分割工具,并在待分割图像中标注出来,其中,多边形顶点信息包括二维的坐标信息。
本申请采用的多边拟合函数具体可以是approxPolyDP函数,approxPolyDP函数的主要功能是把一个连续光滑曲线折线化,对图像轮廓点进行多边形拟合。approxPolyDP函数可以表示为:
void approxPolyDP(InputArray curve,OutputArray approxCurve,double epsilon,bool closed)
其中,InputArray curve表示由图像的轮廓点组成的点集,OutputArray approxCurve表示输出的多边形点集,double epsilon表示拟合的精度,即原始轮廓点与拟合多边形之间允许的最大距离,bool closed表示输出的多边形是否封闭。
可以理解的是,多边拟合函数还可以是其他类型的函数,此处仅为一个示意,不应理解为对本申请的限定。
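approxPolyDP背后的多边形拟合思想是道格拉斯-普克(Douglas-Peucker)折线简化算法,下面给出一个简化的纯numpy示意实现(仅处理开曲线,approx_poly为假设的函数名,不等价于OpenCV的完整实现):

```python
import numpy as np

def approx_poly(points, epsilon):
    """道格拉斯-普克折线简化:递归丢弃到首尾弦距离小于epsilon的轮廓点。"""
    points = np.asarray(points, dtype=float)
    if len(points) < 3:
        return points
    start, end = points[0], points[-1]
    dx, dy = end - start
    norm = np.hypot(dx, dy)
    # 每个点到首尾连线的垂直距离
    raw = np.abs(dx * (points[:, 1] - start[1]) - dy * (points[:, 0] - start[0]))
    dists = raw / norm if norm else np.hypot(points[:, 0] - start[0],
                                             points[:, 1] - start[1])
    idx = int(np.argmax(dists))
    if dists[idx] > epsilon:        # 偏差最大的点需要保留,两侧递归简化
        left = approx_poly(points[: idx + 1], epsilon)
        right = approx_poly(points[idx:], epsilon)
        return np.vstack([left[:-1], right])
    return np.vstack([start, end])  # 所有点都足够接近弦,只保留端点

# 一条带噪声的“L”形轮廓被简化为三个真正的顶点
contour = [(0, 0), (1, 0.05), (2, -0.04), (4, 0), (4, 2), (4.03, 4)]
simplified = approx_poly(contour, epsilon=0.5)
```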
进一步地,本申请实施例中,提供了一种对图像分割区域进行处理的方式,即首先通过多边拟合函数对待分割图像进行处理,得到多边形顶点信息,其中,多边形顶点信息包括多个顶点的位置信息,然后根据多边形顶点信息,从待分割图像中确定目标对象。通过上述方式,考虑到图像可能受到各种噪声干扰,这些噪声在图像上通常表现为孤立像素的离散变化,因此,采用多边拟合函数对待分割图像进行处理,能够很好地保留目标对象的边缘,获得更好的图像增强效果。
结合上述介绍,下面将对本申请中图像识别的方法进行介绍,请参阅图11,本申请实施例中图像识别方法的一个实施例包括:
201、接收针对待处理图像的物体标注指令。其中,所述待处理图像包括目标对象,物体标注指令携带目标对象所对应的多个极值点的位置信息。
需要说明的是,该多个极值点可以如前述实施例提及的为四个,对应的位置信息包括第一极值点位置信息、第二极值点位置信息、第三极值点位置信息以及第四极值点位置信息。
本实施例中,图像识别设备展示待处理图像,其中,该图像识别设备中可以部署辅助分割工具,用户使用辅助分割工具标注多个极值点(包括第一极值点、第二极值点、第三极值点以及第四极值点),即触发物体标注指令,可以理解的是,本申请所提供的图像识别设备可为终端设备。
202、响应于物体标注指令,根据待处理图像生成待分割图像。
本实施例中,图像识别装置响应于物体标注指令,然后可以根据这些极值点生成待分割图像,待分割图像包括第一极值点位置信息、第二极值点位置信息、第三极值点位置信息以及第四极值点位置信息。
203、根据待分割图像生成图像特征信息。其中,图像特征信息包括N个图像矩阵以及热图,热图为根据多个极值点生成的,N为大于或等于1的整数。
本实施例中,图像识别装置根据待分割图像生成N个图像矩阵,并且根据多个极值点生成热图,将热图与N个图像矩阵进行组合,得到待分割图像所对应的图像特征信息。
其中,数字图像数据可以用矩阵来表示,如果读取的待分割图像大小为128*128,则图像矩阵大小为128*128*N,其中,N为大于或等于1的整数。当N为1时,图像矩阵可以是灰度图像所对应的矩阵。当N为3时,图像矩阵可以是RGB图像的矩阵,RGB图像是三维的,三个维度分别表示红、绿和蓝三个分量,大小是0到255,每个像素都是由这三个分量组合而成。每一个RGB通道都对应一个图像矩阵(即第一图像矩阵、第二图像矩阵以及第三图像矩阵),因此,这三个RGB通道叠在一起形成了彩色图像,即得到待分割图像。当N为4时,图像矩阵可以是RGBA的色彩空间,对于PNG而言,也具有四个图像矩阵,此处不对N的数量进行限定。
204、通过图像分割模型获取图像特征信息所对应的图像分割区域,其中,图像分割模型包括N个矩阵输入通道以及一个热图输入通道,N个矩阵输入通道与N个图像矩阵具有一一对应的关系,一个热图输入通道与热图具有对应关系;
本实施例中,图像识别装置将图像特征信息输入至图像分割模型,其中,图像分割模型可以采用Deep Lab结构,包含但不仅限于DeepLabV1、DeepLabV2、DeepLabV3以及DeepLabV3+。本申请需要对图像分割模型的结构进行改进,对图像分割模型的第一层参数进行修改,使图像分割模型能够接收四个channel的图像数据,即图像分割模型包括第一输入通道、第二输入通道、第三输入通道以及第四输入通道,第一图像矩阵作为第一输入通道的输入数据,第二图像矩阵作为第二输入通道的输入数据,第三图像矩阵作为第三输入通道的输入数据,热图作为第四输入通道的输入数据。
本申请需要对图像分割模型的结构进行改进,对图像分割模型的第一层参数进行修改,使图像分割模型能够接收(N+1)个channel的图像数据,即图像分割模型包括N个矩阵输入通道以及一个热图输入通道。假设N为3,则表示有3个图像矩阵,此时对应3个矩阵输入通道,每个矩阵输入通道对应一个图像矩阵,且此时还具有一个热图输入通道,该热图输入通道对应于热图。
类似地,假设N为1,则表示有1个图像矩阵,此时对应1个矩阵输入通道,1个矩阵输入通道对应灰度图像的一个图像矩阵,且此时还具有一个热图输入通道,该热图输入通道对应于热图。
类似地,假设N为4,则表示有4个图像矩阵,此时对应4个矩阵输入通道,每个矩阵输入通道对应一个图像矩阵,且此时还具有一个热图输入通道,该热图输入通道对应于热图。
205、通过多边拟合函数对待分割图像进行处理,得到多边形顶点信息。其中,多边形顶点信息包括多个顶点的位置信息。
本实施例中,图像识别装置输出的图像分割区域具体可以表现为掩码图,掩码图可以理解为是一个与待分割图像大小一样的二维图像,里面的值只有1和0,1表示分类为正,0表示分类为负,图像分割区域中每一个像素点的值都是0或1。像素点值为1,代表图像分割模型判断此像素点为目标对象的内部点,像素值为0,代表图像分割模型判断此像素点为背景点。图像识别装置采用多边拟合函数对待分割图像进行处理,得到多边形顶点信息,将多边形顶点信息反馈给辅助分割工具。
206、根据多边形顶点信息,在待分割图像中展示目标对象。
本实施例中,图像识别装置根据多边形顶点信息,在待分割图像中突出展示目标对象。具体可以是将多边形顶点信息反馈给辅助分割工具,然后在待分割图像中标注出来。
本申请实施例中,提供了一种图像识别的方法,当展示待处理图像时,接收物体标注指令,响应于物体标注指令,根据待处理图像生成待分割图像,然后根据待分割图像生成图像特征信息,通过图像分割模型获取图像特征信息所对应的图像分割区域,再通过多边拟合函数对待分割图像进行处理,得到多边形顶点信息,最后根据多边形顶点信息,在待分割图像中突出展示目标对象。通过上述方式,无需考虑目标是否满足特定类别,而是利用极值点所生成的热图作为图像特征信息的一部分,丰富了图像的特征内容,使得图像分割模型能够根据该图像特征信息生成更加准确的图像分割区域,从而提升辅助分割工具的通用性和适用性,进而还可以直接突出展示目标对象。
下面将结合实验数据对本申请提供的图像分割方法进行说明,请参阅图12,图12为本申请实施例中基于分割方式的一个实验结果对比示意图,如图所示,其中,图12中的(a)图表示原图,(b)图表示采用谷歌公司流体标注(Fluid Annotation)的分割辅助工具所得到的图像,(c)图表示采用分割数据集高效标记Polygon-RNN++工具所得到的图像,(d)图表示采用本申请提供的辅助分割工具所标注的图像。相比于原图而言,(b)图、(c)图和(d)图分别蒙上了一层,这是因为分割结果把原图和分割后的蒙版结合在一起了,分割后的蒙版提供一个透明的色彩,然后和原图叠加在一起。
基于图12的分割结果可以看出,本申请提供的辅助分割工具,相比于现有的辅助分割工具而言,能够提供更准确的分割结果。另外,本申请改进的图像分割模型相比于原始的分割模型,在保证分割精度不下降的情况下,还降低了模型响应时间,对于线上的辅助分割工具来说,所增加的交互开销很小。请参阅表1,表1为本申请提供的图像分割模型和原始模型性能及时间的对比。
表1
其中,mIOU表示平均交并比(mean Intersection Over Union,mIOU),mIOU是一个衡量图像分割精度的重要指标,mIOU即预测区域和实际区域的交集除以预测区域和实际区域的并集,对所有类别取平均。Pascal为一种图像分割数据集,语义边界数据集(Semantic Boundaries Dataset,SBD)为一种图像分割数据集,Tesla P100为采用的显卡的型号。表1展示了本申请提供的图像分割模型和原始DEXTR模型在不同数据集下训练以后的表现,这里采用mIOU这个指标来表示模型性能。在只用pascal数据集训练的情况下,本申请采用的图像分割模型在测试数据集上能够提供更准确的结果,在使用pascal+SBD数据集训练的情况下,本申请采用的图像分割模型和原始DEXTR模型性能相差不大。表1还展示了两个模型在相同显卡环境下运行单张图片的平均时间对比,可以看到,本申请采用的图像分割模型,相比于原始DEXTR模型,在时间性能上有非常显著的提升。
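mIOU的计算方式可以用如下示意代码说明(mean_iou为说明用的假设函数名,按上文定义对每个类别求交并比后取平均):

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """mIOU:对每个类别,预测区域与实际区域的交集除以并集,再对所有类别取平均。"""
    ious = []
    for c in range(num_classes):
        p, g = (pred == c), (gt == c)
        union = np.logical_or(p, g).sum()
        if union == 0:
            continue  # 预测与标注中都不存在的类别不参与平均
        ious.append(np.logical_and(p, g).sum() / union)
    return float(np.mean(ious))

# 一个2类分割的小例子:类别0的IoU为1/2,类别1的IoU为2/3
pred = np.array([[0, 0], [1, 1]])
gt = np.array([[0, 1], [1, 1]])
score = mean_iou(pred, gt, num_classes=2)
```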
由此可见,本申请提供的辅助分割工具能够提供复杂场景下更加准确的分割结果,一方面能够给出同样准确的预分割结果,另一方面也能实现更快的模型速度,让线上辅助工具能够更快地响应。
下面对本申请中的图像分割装置进行详细描述,请参阅图13,图13为本申请实施例中图像识别装置一个实施例示意图,图像识别装置30包括:
获取模块301,用于获取待分割图像,其中,所述待分割图像包括多个极值点;
生成模块302,用于根据获取模块301获取的所述待分割图像生成图像特征信息,其中,所述图像特征信息包括N个图像矩阵以及热图,所述热图为根据所述多个极值点生成的,所述N为大于或等于1的整数;
所述获取模块301,还用于通过图像分割模型获取所述生成模块302生成的所述图像特征信息所对应的图像分割区域,其中,所述图像分割模型包括N个矩阵输入通道以及一个热图输入通道,所述N个矩阵输入通道与所述N个图像矩阵具有一一对应的关系,所述热图输入通道与所述热图具有对应关系;
所述生成模块302,还用于根据所述获取模块301获取的所述图像分割区域生成所述待分割图像的图像识别结果。
本实施例中,获取模块301获取待分割图像,其中,所述待分割图像包括多个极值点,生成模块302根据获取模块301获取的所述待分割图像生成图像特征信息,其中,所述图像特征信息包括N个图像矩阵以及热图,所述热图为根据所述多个极值点生成的,所述N为大于或等于1的整数,所述获取模块301通过图像分割模型获取所述生成模块302生成的所述图像特征信息所对应的图像分割区域,其中,所述图像分割模型包括N个矩阵输入通道以及一个热图输入通道,所述N个矩阵输入通道与所述N个图像矩阵具有一一对应的关系,所述一个热图输入通道与所述热图具有对应关系,所述生成模块302根据所述获取模块301获取的所述图像分割区域生成所述待分割图像的图像识别结果。
本申请实施例中,提供了一种图像识别装置,首先获取待分割图像,其中,待分割图像包括多个极值点,然后根据待分割图像生成图像特征信息,其中,图像特征信息包括N个图像矩阵以及热图,热图为根据多个极值点生成的,再通过图像分割模型获取图像特征信息所对应的图像分割区域,其中,图像分割模型包括N个矩阵输入通道以及一个热图输入通道,N个矩阵输入通道与N个图像矩阵具有一一对应的关系,一个热图输入通道与热图具有对应关系,最后根据图像分割区域生成待分割图像的图像识别结果。通过上述方式,无需考虑目标是否满足特定类别,而是利用极值点所生成的热图作为图像特征信息的一部分,丰富了图像的特征内容,使得图像分割模型能够根据该图像特征信息生成更加准确的图像分割区域,从而提升图像分割的通用性和适用性。
可选地,在上述图13所对应的实施例的基础上,本申请实施例提供的图像识别装置30的另一实施例中,
所述获取模块301,具体用于展示待处理图像,其中,所述待处理图像中包括目标对象;
接收针对待处理图像的物体标注指令,其中,所述待处理图像包括目标对象,所述物体标注指令携带所述目标对象所对应多个极值点的位置信息,所述多个极值点用于标识所述目标对象的轮廓边缘;
响应于所述物体标注指令,根据所述待处理图像生成所述待分割图像。
其次,本申请实施例中,提供了一种标注极值点的方法,首先展示待处理图像,然后接收物体标注指令,其中,物体标注指令携带目标对象所对应的第一极值点位置信息、第二极值点位置信息、第三极值点位置信息以及第四极值点位置信息,最后响应于物体标注指令,根据待处理图像生成待分割图像。通过上述方式,能够利用辅助分割工具对待处理图像进行标注,辅助分割工具的操作难度较低,使用的便利性较高,从而提升方案的可行性和可操作性。
可选地,在上述图13所对应的实施例的基础上,请参阅图14,本申请实施例提供的图像识别装置30的另一实施例中,所述图像识别装置30还包括接收模块303以及处理模块304;
所述接收模块303,用于接收针对第一顶点的第一调整指令,其中,所述第一顶点属于所述图像分割区域的边缘点,所述第一顶点对应于第一位置信息;
所述处理模块304,用于响应于所述接收模块303接收的所述第一调整指令,对所述图像分割区域进行缩小处理,得到目标分割区域,其中,所述目标分割区域包括基于所述第一顶点调整得到的第二顶点,所述第二顶点对应于第二位置信息,所述第二位置信息与所述第一位置信息不相同。
其次,本申请实施例中,提供了一种对图像分割区域进行调整方法,即接收第一调整指令,然后响应于第一调整指令,对图像分割区域进行缩小处理,得到目标分割区域。通过上述方式,用户可以采用辅助分割工具对图像分割区域进行调整,从而得到更加准确的分割结果,由此提升方案的实用性和灵活性。
可选地,在上述图14所对应的实施例的基础上,本申请实施例提供的图像识别装置30的另一实施例中,所述图像识别装置30还包括所述接收模块303以及所述处理模块304;
所述接收模块303,还用于接收针对第三顶点的第二调整指令,其中,所述第三顶点属于所述图像分割区域;
所述处理模块304,还用于响应于所述接收模块303接收的所述第二调整指令,对所述图像分割区域进行放大处理,得到目标分割区域,其中,所述目标分割区域包括基于所述第三顶点调整得到的第四顶点。
其次,本申请实施例中,提供了另一种对图像分割区域进行调整方法,即首先接收第二调整指令,然后响应于第二调整指令,对图像分割区域进行放大处理,得到目标分割区域。通过上述方式,用户可以采用辅助分割工具对图像分割区域进行调整,从而得到更加准确的分割结果,由此提升方案的实用性和灵活性。
可选地,在上述图13或图14所对应的实施例的基础上,所述N个矩阵输入通道包括红色输入通道、绿色输入通道和蓝色输入通道,本申请实施例提供的图像识别装置30的另一实施例中,
所述生成模块302,具体用于根据所述待分割图像中的所述多个极值点生成所述热图;
根据所述待分割图像生成N个图像矩阵,所述N个图像矩阵包括对应所述红色输入通道的第一图像矩阵,对应所述绿色输入通道的第二图像矩阵,以及对应所述蓝色输入通道的第三图像矩阵。
其次,本申请实施例中,提供了一种根据待分割图像生成图像特征信息的方法,根据待分割图像中的多个极值点生成热图,根据待分割图像生成第一图像矩阵,根据待分割图像生成第二图像矩阵,根据待分割图像生成第三图像矩阵。通过上述方式,能够有效地提升方案的可行性和可操作性。
可选地,在上述图13或图14所对应的实施例的基础上,本申请实施例提供的图像识别装置30的另一实施例中,
所述获取模块301,具体用于通过所述图像分割模型的编码器对所述图像特征信息进行编码,得到第一特征图以及第二特征图;
将所述第一特征图以及所述第二特征图进行拼接,得到目标特征图;
通过所述图像分割模型的解码器对所述目标特征图进行解码,得到所述图像分割区域。
其次,本申请实施例中,提供了一种通过图像分割模型获取图像分割区域的方法,即首先通过图像分割模型的编码器对图像特征信息进行编码,得到第一特征图以及第二特征图,然后将第一特征图以及第二特征图进行拼接,得到目标特征图,最后通过图像分割模型的解码器对目标特征图进行解码,得到图像分割区域。通过上述方式,采用一种基于深度实验V3+版本(DeeplabV3+)的模型结构进行图像分割区域的预测,而DeeplabV3+模型结构总体参数量较少,因此,无论在训练还是实际预测都具有较快的运行速度,应用于辅助分割工具上能够更快地响应用户操作,提升使用效率,增强用户粘度。
可选地,在上述图13或图14所对应的实施例的基础上,本申请实施例提供的图像识别装置30的另一实施例中,
所述获取模块301,具体用于通过所述图像分割模型的解码器对所述目标特征图进行解码,得到第一像素点集合以及第二像素点集合,其中,所述第一像素点集合包括多个第一像素点,所述第二像素点集合包括第二像素点;
根据所述第一像素点集合以及所述第二像素点集合,生成所述图像分割区域。
再次,本申请实施例中,提供了一种利用图像分割模型解码得到图像分割区域的方法,即通过图像分割模型的解码器对目标特征图进行解码,得到第一像素点集合以及第二像素点集合,然后根据第一像素点集合以及第二像素点集合,生成图像分割区域。通过上述方式,为方案的实现提供了具体的依据,并且基于图像分割模型的结构对特征进行解码,从而有利于提升图像分割模型应用的可靠性。
可选地,在上述图13或图14所对应的实施例的基础上,请参阅图15,本申请实施例提供的图像识别装置30的另一实施例中,所述图像识别装置还包括所述处理模块304以及确定模块306;
所述处理模块304,还用于所述获取模块301通过图像分割模型获取所述图像特征信息所对应的图像分割区域之后,通过多边拟合函数对所述待分割图像进行处理,得到多边形顶点信息,其中,所述多边形顶点信息包括多个顶点的位置信息;
所述确定模块306,用于根据所述处理模块304处理得到的所述多边形顶点信息,从所述待分割图像中确定目标对象。
进一步地,本申请实施例中,提供了一种对图像分割区域进行处理的方式,即首先通过多边拟合函数对待分割图像进行处理,得到多边形顶点信息,其中,多边形顶点信息包括多个顶点的位置信息,然后根据多边形顶点信息,从待分割图像中确定目标对象。通过上述方式,考虑到图像可能受到各种噪声干扰,这些噪声在图像上通常表现为孤立像素的离散变化,因此,采用多边拟合函数对待分割图像进行处理,能够很好地保留目标对象的边缘,获得更好地图像增强效果。
下面对本申请中的图像识别装置进行详细描述,请参阅图16,图16为本申请实施例中图像识别装置一个实施例示意图,图像识别装置40包括:
接收模块401,用于接收针对待处理图像的物体标注指令,其中,所述待处理图像包括目标对象,所述物体标注指令携带所述目标对象所对应的多个极值点的位置信息;
生成模块402,用于响应于所述接收模块401接收的所述物体标注指令,根据所述待处理图像生成待分割图像;
所述生成模块402,还用于根据所述待分割图像生成图像特征信息,其中,所述图像特征信息包括N个图像矩阵以及热图,所述热图为根据所述多个极值点生成的,所述N为大于或等于1的整数;
获取模块403,用于通过图像分割模型获取所述生成模块402生成的所述图像特征信息所对应的图像分割区域,其中,所述图像分割模型包括N个矩阵输入通道以及一个热图输入通道,所述N个矩阵输入通道与所述N个图像矩阵具有一一对应的关系,所述热图输入通道与所述热图具有对应关系;
处理模块404,用于通过多边拟合函数对所述获取模块403获取的所述待分割图像进行处理,得到多边形顶点信息,其中,所述多边形顶点信息包括多个顶点的位置信息;
展示模块405,用于根据所述处理模块404处理得到的所述多边形顶点信息,在所述待分割图像中展示所述目标对象。
本实施例中,当展示待处理图像时,接收模块401接收物体标注指令,其中,所述物体标注指令携带所述目标对象所对应的第一极值点位置信息、第二极值点位置信息、第三极值点位置信息以及第四极值点位置信息,生成模块402响应于所述接收模块401接收的所述物体标注指令,根据所述待处理图像生成待分割图像,所述生成模块402根据所述待分割图像生成图像特征信息,其中,所述图像特征信息包括N个图像矩阵以及热图,所述热图为根据所述多个极值点生成的,所述N为大于或等于1的整数,获取模块403通过图像分割模型获取所述生成模块402生成的所述图像特征信息所对应的图像分割区域,其中,所述图像分割模型包括N个矩阵输入通道以及一个热图输入通道,所述N个矩阵输入通道与所述N个图像矩阵具有一一对应的关系,所述一个热图输入通道与所述热图具有对应关系,处理模块404通过多边拟合函数对所述获取模块403获取的所述待分割图像进行处理,得到多边形顶点信息,其中,所述多边形顶点信息包括多个顶点的位置信息,展示模块405根据所述处理模块404处理得到的所述多边形顶点信息,在所述待分割图像中突出展示目标对象。
本申请实施例中,提供了一种图像识别装置,当展示待处理图像时,接收物体标注指令,响应于物体标注指令,根据待处理图像生成待分割图像,然后 根据待分割图像生成图像特征信息,通过图像分割模型获取图像特征信息所对应的图像分割区域,再通过多边拟合函数对待分割图像进行处理,得到多边形顶点信息,最后根据多边形顶点信息,在待分割图像中突出展示目标对象。通过上述方式,无需考虑目标是否满足特定类别,而是利用极值点所生成的热图作为图像特征信息的一部分,丰富了图像的特征内容,使得图像分割模型能够根据该图像特征信息生成更加准确的图像分割区域,从而提升辅助分割工具的通用性和适用性,进而还可以直接突出展示目标对象。
An embodiment of this application further provides another image recognition apparatus, as shown in FIG. 17. For ease of description, only the parts related to the embodiments of this application are shown; for specific technical details that are not disclosed, refer to the method part of the embodiments of this application. The terminal device may be any terminal device including a mobile phone, a tablet computer, a PDA, a POS terminal, or an in-vehicle computer; the following takes a mobile phone as an example.
FIG. 17 is a block diagram of a partial structure of a mobile phone related to the terminal device provided in this embodiment of this application. Referring to FIG. 17, the mobile phone includes components such as a radio frequency (RF) circuit 510, a memory 520, an input unit 530, a display unit 540, a sensor 550, an audio circuit 560, a wireless fidelity (WiFi) module 570, a processor 580, and a power supply 590. A person skilled in the art will understand that the mobile phone structure shown in FIG. 17 does not constitute a limitation on the mobile phone: it may include more or fewer components than shown, combine some components, or arrange the components differently.
The components of the mobile phone are described below with reference to FIG. 17:
The RF circuit 510 may be used to receive and send signals during information transmission and calls. In particular, after receiving downlink information from a base station, the RF circuit 510 delivers the information to the processor 580 for processing, and it sends uplink data to the base station. Generally, the RF circuit 510 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (LNA), and a duplexer. In addition, the RF circuit 510 may also communicate with networks and other devices through wireless communication. Any communication standard or protocol may be used, including but not limited to Global System for Mobile Communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), e-mail, and Short Messaging Service (SMS).
The memory 520 may be used to store software programs and modules; the processor 580 performs the various functional applications and data processing of the phone by running the software programs and modules stored in the memory 520. The memory 520 may mainly include a program storage area and a data storage area: the program storage area may store the operating system and the application programs required by at least one function (such as a sound playback function and an image playback function), and the data storage area may store data (such as audio data and a phone book) created according to the use of the phone. In addition, the memory 520 may include a high-speed random access memory and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
The input unit 530 may be used to receive entered digit or character information and to generate key signal input related to the user settings and function control of the phone. Specifically, the input unit 530 may include a touch panel 531 and other input devices 532. The touch panel 531, also referred to as a touchscreen, can collect touch operations performed by the user on or near it (for example, operations performed with a finger, a stylus, or any other suitable object or accessory on or near the touch panel 531) and drive the corresponding connection apparatus according to a preset program. Optionally, the touch panel 531 may include two parts: a touch detection apparatus and a touch controller. The touch detection apparatus detects the touch position of the user, detects the signal produced by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection apparatus, converts it into contact coordinates, sends these to the processor 580, and can receive and execute commands sent by the processor 580. The touch panel 531 may be implemented in multiple types such as resistive, capacitive, infrared, and surface acoustic wave. Besides the touch panel 531, the input unit 530 may further include other input devices 532, which may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys and a power key), a trackball, a mouse, and a joystick.
The display unit 540 may be used to display information entered by the user, information provided to the user, and the various menus of the phone. The display unit 540 may include a display panel 541, which may optionally be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like. Further, the touch panel 531 may cover the display panel 541: when the touch panel 531 detects a touch operation on or near it, the operation is transmitted to the processor 580 to determine the type of the touch event, and the processor 580 then provides corresponding visual output on the display panel 541 according to the type of the touch event. Although in FIG. 17 the touch panel 531 and the display panel 541 are shown as two separate components implementing the input and output functions of the phone, in some embodiments the touch panel 531 and the display panel 541 may be integrated to implement those functions.
The phone may further include at least one sensor 550, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor and a proximity sensor: the ambient light sensor can adjust the brightness of the display panel 541 according to the ambient light, and the proximity sensor can switch off the display panel 541 and/or its backlight when the phone is moved to the ear. As one type of motion sensor, an accelerometer can detect the magnitude of acceleration in all directions (generally along three axes) and, when static, can detect the magnitude and direction of gravity; it can be used in applications that recognize phone posture (such as landscape/portrait switching, related games, and magnetometer posture calibration) and in vibration-recognition functions (such as a pedometer and tapping). Other sensors that may also be configured on the phone, such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, are not described here.
The audio circuit 560, a loudspeaker 561, and a microphone 562 may provide an audio interface between the user and the phone. The audio circuit 560 can transmit the electrical signal converted from received audio data to the loudspeaker 561, which converts it into a sound signal for output. Conversely, the microphone 562 converts a collected sound signal into an electrical signal, which the audio circuit 560 receives and converts into audio data; the audio data is output to the processor 580 for processing and then, for example, sent to another phone through the RF circuit 510, or output to the memory 520 for further processing.
WiFi is a short-range wireless transmission technology. Through the WiFi module 570, the phone can help the user send and receive e-mail, browse web pages, and access streaming media, providing wireless broadband Internet access. Although FIG. 17 shows the WiFi module 570, it is understood that the module is not a necessary component of the phone and may be omitted as needed without changing the essence of the invention.
The processor 580 is the control center of the phone. It connects the parts of the entire phone through various interfaces and lines, and performs the functions and data processing of the phone by running or executing the software programs and/or modules stored in the memory 520 and invoking the data stored in the memory 520, thereby monitoring the phone as a whole. Optionally, the processor 580 may include one or more processing units. Optionally, the processor 580 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interfaces, and application programs, and the modem processor mainly handles wireless communication. It is understood that the modem processor may alternatively not be integrated into the processor 580.
The phone further includes a power supply 590 (such as a battery) that supplies power to the components. Optionally, the power supply may be logically connected to the processor 580 through a power management system, so that functions such as charging, discharging, and power-consumption management are implemented through the power management system.
Although not shown, the phone may further include a camera, a Bluetooth module, and the like, which are not described here.
In this embodiment of this application, the processor 580 included in the terminal device further has the following functions:
obtaining a to-be-segmented image, where the to-be-segmented image includes a plurality of extreme points;
generating image feature information according to the to-be-segmented image, where the image feature information includes N image matrices and a heat map, the heat map is generated according to the plurality of extreme points, and N is an integer greater than or equal to 1;
obtaining, through an image segmentation model, an image segmentation region corresponding to the image feature information, where the image segmentation model includes N matrix input channels and one heat map input channel, the N matrix input channels are in one-to-one correspondence with the N image matrices, and the heat map input channel corresponds to the heat map; and
generating an image recognition result of the to-be-segmented image according to the image segmentation region.
Optionally, the processor 580 is specifically configured to perform the following steps:
receiving an object labeling instruction for a to-be-processed image, where the to-be-processed image includes a target object, the object labeling instruction carries position information of a plurality of extreme points corresponding to the target object, and the plurality of extreme points are used to identify the contour edge of the target object; and
generating the to-be-segmented image according to the to-be-processed image in response to the object labeling instruction.
Optionally, the position information of the plurality of extreme points includes first extreme-point position information, second extreme-point position information, third extreme-point position information, and fourth extreme-point position information that respectively identify the periphery of the contour edge of the target object.
Optionally, the processor 580 is further configured to perform the following steps:
receiving a first adjustment instruction for a first vertex, where the first vertex is an edge point of the image segmentation region and corresponds to first position information; and
in response to the first adjustment instruction, shrinking the image segmentation region to obtain a target segmentation region, where the target segmentation region includes a second vertex obtained through adjustment based on the first vertex, the second vertex corresponds to second position information, and the second position information is different from the first position information.
Optionally, the processor 580 is further configured to perform the following steps:
receiving a second adjustment instruction for a third vertex, where the third vertex belongs to the image segmentation region; and
in response to the second adjustment instruction, enlarging the image segmentation region to obtain a target segmentation region, where the target segmentation region includes a fourth vertex obtained through adjustment based on the third vertex.
Optionally, the N matrix input channels include a red input channel, a green input channel, and a blue input channel, and the processor 580 is specifically configured to perform the following steps:
generating the heat map according to the plurality of extreme points in the to-be-segmented image; and
generating N image matrices according to the to-be-segmented image, the N image matrices including a first image matrix corresponding to the red input channel, a second image matrix corresponding to the green input channel, and a third image matrix corresponding to the blue input channel.
Optionally, the processor 580 is specifically configured to perform the following steps:
encoding the image feature information by using an encoder of the image segmentation model to obtain a first feature map and a second feature map;
concatenating the first feature map and the second feature map to obtain a target feature map; and
decoding the target feature map by using a decoder of the image segmentation model to obtain the image segmentation region.
Optionally, the processor 580 is specifically configured to perform the following steps:
decoding the target feature map by using the decoder of the image segmentation model to obtain a first pixel set and a second pixel set, where the first pixel set includes a plurality of first pixels, and the second pixel set includes second pixels; and
generating the image segmentation region according to the first pixel set and the second pixel set.
Optionally, the processor 580 is further configured to perform the following steps:
processing the to-be-segmented image by using a polygon fitting function to obtain polygon vertex information, where the polygon vertex information includes position information of a plurality of vertices; and
determining the target object from the to-be-segmented image according to the polygon vertex information.
In this embodiment of this application, the processor 580 included in the terminal device further has the following functions:
receiving an object labeling instruction for a to-be-processed image, where the to-be-processed image includes a target object, and the object labeling instruction carries position information of a plurality of extreme points corresponding to the target object;
generating a to-be-segmented image according to the to-be-processed image in response to the object labeling instruction;
generating image feature information according to the to-be-segmented image, where the image feature information includes N image matrices and a heat map, the heat map is generated according to the plurality of extreme points, and N is an integer greater than or equal to 1;
obtaining, through an image segmentation model, an image segmentation region corresponding to the image feature information, where the image segmentation model includes N matrix input channels and one heat map input channel, the N matrix input channels are in one-to-one correspondence with the N image matrices, and the heat map input channel corresponds to the heat map;
processing the to-be-segmented image by using a polygon fitting function to obtain polygon vertex information, where the polygon vertex information includes position information of a plurality of vertices; and
displaying the target object in the to-be-segmented image according to the polygon vertex information.
FIG. 18 is a schematic structural diagram of a server according to an embodiment of this application. The server 600 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPUs) 622 (for example, one or more processors), a memory 632, and one or more storage media 630 (for example, one or more mass storage devices) storing application programs 642 or data 644. The memory 632 and the storage medium 630 may provide transient or persistent storage. The programs stored in the storage medium 630 may include one or more modules (not shown), each of which may include a series of instruction operations on the server. Further, the CPU 622 may be configured to communicate with the storage medium 630 and execute, on the server 600, the series of instruction operations in the storage medium 630.
The server 600 may further include one or more power supplies 626, one or more wired or wireless network interfaces 650, one or more input/output interfaces 658, and/or one or more operating systems 641, such as Windows Server™, Mac OS X™, Unix™, Linux™, or FreeBSD™.
The steps performed by the server in the foregoing embodiments may be based on the server structure shown in FIG. 18.
In this embodiment of this application, the CPU 622 included in the server further has the following functions:
obtaining a to-be-segmented image, where the to-be-segmented image includes a plurality of extreme points;
generating image feature information according to the to-be-segmented image, where the image feature information includes N image matrices and a heat map, the heat map is generated according to the plurality of extreme points, and N is an integer greater than or equal to 1;
obtaining, through an image segmentation model, an image segmentation region corresponding to the image feature information, where the image segmentation model includes N matrix input channels and one heat map input channel, the N matrix input channels are in one-to-one correspondence with the N image matrices, and the heat map input channel corresponds to the heat map; and
generating an image recognition result of the to-be-segmented image according to the image segmentation region.
Optionally, the CPU 622 is specifically configured to perform the following steps:
receiving an object labeling instruction for a to-be-processed image, where the to-be-processed image includes a target object, and the object labeling instruction carries position information of a plurality of extreme points corresponding to the target object; and
generating the to-be-segmented image according to the to-be-processed image in response to the object labeling instruction.
Optionally, the CPU 622 is further configured to perform the following steps:
receiving a first adjustment instruction for a first vertex, where the first vertex is an edge point of the image segmentation region and corresponds to first position information; and
in response to the first adjustment instruction, shrinking the image segmentation region to obtain a target segmentation region, where the target segmentation region includes a second vertex obtained through adjustment based on the first vertex, the second vertex corresponds to second position information, and the second position information is different from the first position information.
Optionally, the CPU 622 is further configured to perform the following steps:
receiving a second adjustment instruction for a third vertex, where the third vertex belongs to the image segmentation region; and
in response to the second adjustment instruction, enlarging the image segmentation region to obtain a target segmentation region, where the target segmentation region includes a fourth vertex obtained through adjustment based on the third vertex.
Optionally, the N matrix input channels include a red input channel, a green input channel, and a blue input channel, and the CPU 622 is specifically configured to perform the following steps:
generating the heat map according to the plurality of extreme points in the to-be-segmented image; and
generating N image matrices according to the to-be-segmented image, the N image matrices including a first image matrix corresponding to the red input channel, a second image matrix corresponding to the green input channel, and a third image matrix corresponding to the blue input channel.
Optionally, the CPU 622 is specifically configured to perform the following steps:
encoding the image feature information by using an encoder of the image segmentation model to obtain a first feature map and a second feature map;
concatenating the first feature map and the second feature map to obtain a target feature map; and
decoding the target feature map by using a decoder of the image segmentation model to obtain the image segmentation region.
Optionally, the CPU 622 is specifically configured to perform the following steps:
decoding the target feature map by using the decoder of the image segmentation model to obtain a first pixel set and a second pixel set, where the first pixel set includes a plurality of first pixels, and the second pixel set includes second pixels; and
generating the image segmentation region according to the first pixel set and the second pixel set.
Optionally, the CPU 622 is further configured to perform the following steps:
processing the to-be-segmented image by using a polygon fitting function to obtain polygon vertex information, where the polygon vertex information includes position information of a plurality of vertices; and
determining the target object from the to-be-segmented image according to the polygon vertex information.
In this embodiment of this application, the CPU 622 included in the server further has the following functions:
receiving an object labeling instruction for a to-be-processed image, where the to-be-processed image includes a target object, and the object labeling instruction carries position information of a plurality of extreme points corresponding to the target object;
generating a to-be-segmented image according to the to-be-processed image in response to the object labeling instruction;
generating image feature information according to the to-be-segmented image, where the image feature information includes N image matrices and a heat map, the heat map is generated according to the plurality of extreme points, and N is an integer greater than or equal to 1;
obtaining, through an image segmentation model, an image segmentation region corresponding to the image feature information, where the image segmentation model includes N matrix input channels and one heat map input channel, the N matrix input channels are in one-to-one correspondence with the N image matrices, and the heat map input channel corresponds to the heat map;
processing the to-be-segmented image by using a polygon fitting function to obtain polygon vertex information, where the polygon vertex information includes position information of a plurality of vertices; and
displaying the target object in the to-be-segmented image according to the polygon vertex information.
In addition, an embodiment of this application further provides a storage medium storing program code, the program code being used to perform the methods provided in the foregoing embodiments.
An embodiment of this application further provides a computer program product including instructions that, when run on a server, cause the server to perform the methods provided in the foregoing embodiments.
A person skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is merely a division by logical function, and in actual implementation there may be other divisions; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be implemented through some interfaces, and the indirect couplings or communication connections between apparatuses or units may be electrical, mechanical, or in other forms.
The units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
When the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of this application essentially, or the part contributing to the related art, or all or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing embodiments are merely intended to describe the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that modifications may still be made to the technical solutions described in the foregoing embodiments, or equivalent replacements may be made to some technical features thereof, without making the essence of the corresponding technical solutions depart from the spirit and scope of the technical solutions of the embodiments of this application.
Claims (16)
- An image recognition method, performed by an image processing device, the method comprising: obtaining a to-be-segmented image, wherein the to-be-segmented image comprises a plurality of extreme points; generating image feature information according to the to-be-segmented image, wherein the image feature information comprises N image matrices and a heat map, the heat map is generated according to the plurality of extreme points, and N is an integer greater than or equal to 1; obtaining, through an image segmentation model, an image segmentation region corresponding to the image feature information, wherein the image segmentation model comprises N matrix input channels and one heat map input channel, the N matrix input channels are in one-to-one correspondence with the N image matrices, and the heat map input channel corresponds to the heat map; and generating an image recognition result of the to-be-segmented image according to the image segmentation region.
- The method according to claim 1, wherein the obtaining a to-be-segmented image comprises: receiving an object labeling instruction for a to-be-processed image, wherein the to-be-processed image comprises a target object, the object labeling instruction carries position information of a plurality of extreme points corresponding to the target object, and the plurality of extreme points are used to identify a contour edge of the target object; and generating the to-be-segmented image according to the to-be-processed image in response to the object labeling instruction.
- The method according to claim 2, wherein the position information of the plurality of extreme points comprises first extreme-point position information, second extreme-point position information, third extreme-point position information, and fourth extreme-point position information that respectively identify the periphery of the contour edge of the target object.
- The method according to claim 1, wherein after the obtaining, through an image segmentation model, an image segmentation region corresponding to the image feature information, the method further comprises: receiving a first adjustment instruction for a first vertex, wherein the first vertex is an edge point of the image segmentation region and corresponds to first position information; and in response to the first adjustment instruction, shrinking the image segmentation region to obtain a target segmentation region, wherein the target segmentation region comprises a second vertex obtained through adjustment based on the first vertex, the second vertex corresponds to second position information, and the second position information is different from the first position information.
- The method according to claim 1, wherein after the obtaining, through an image segmentation model, an image segmentation region corresponding to the image feature information, the method further comprises: receiving a second adjustment instruction for a third vertex, wherein the third vertex belongs to the image segmentation region; and in response to the second adjustment instruction, enlarging the image segmentation region to obtain a target segmentation region, wherein the target segmentation region comprises a fourth vertex obtained through adjustment based on the third vertex.
- The method according to claim 1, wherein the N matrix input channels comprise a red input channel, a green input channel, and a blue input channel, and the generating image feature information according to the to-be-segmented image comprises: generating the heat map according to the plurality of extreme points in the to-be-segmented image; and generating N image matrices according to the to-be-segmented image, the N image matrices comprising a first image matrix corresponding to the red input channel, a second image matrix corresponding to the green input channel, and a third image matrix corresponding to the blue input channel.
- The method according to claim 1, wherein the obtaining, through an image segmentation model, an image segmentation region corresponding to the image feature information comprises: encoding the image feature information by using an encoder of the image segmentation model to obtain a first feature map and a second feature map; concatenating the first feature map and the second feature map to obtain a target feature map; and decoding the target feature map by using a decoder of the image segmentation model to obtain the image segmentation region.
- The method according to claim 7, wherein the decoding the target feature map by using a decoder of the image segmentation model to obtain the image segmentation region comprises: decoding the target feature map by using the decoder of the image segmentation model to obtain a first pixel set and a second pixel set, wherein the first pixel set comprises a plurality of first pixels, and the second pixel set comprises second pixels; and generating the image segmentation region according to the first pixel set and the second pixel set.
- The method according to any one of claims 1 to 8, wherein after the obtaining, through an image segmentation model, an image segmentation region corresponding to the image feature information, the method further comprises: processing the to-be-segmented image by using a polygon fitting function to obtain polygon vertex information, wherein the polygon vertex information comprises position information of a plurality of vertices; and determining a target object from the to-be-segmented image according to the polygon vertex information.
- An image recognition method, performed by a terminal device, the method comprising: receiving an object labeling instruction for a to-be-processed image, wherein the to-be-processed image comprises a target object, and the object labeling instruction carries position information of a plurality of extreme points corresponding to the target object; generating a to-be-segmented image according to the to-be-processed image in response to the object labeling instruction; generating image feature information according to the to-be-segmented image, wherein the image feature information comprises N image matrices and a heat map, the heat map is generated according to the plurality of extreme points, and N is an integer greater than or equal to 1; obtaining, through an image segmentation model, an image segmentation region corresponding to the image feature information, wherein the image segmentation model comprises N matrix input channels and one heat map input channel, the N matrix input channels are in one-to-one correspondence with the N image matrices, and the heat map input channel corresponds to the heat map; processing the to-be-segmented image by using a polygon fitting function to obtain polygon vertex information, wherein the polygon vertex information comprises position information of a plurality of vertices; and displaying the target object in the to-be-segmented image according to the polygon vertex information.
- An image recognition apparatus, comprising: an acquisition module, configured to obtain a to-be-segmented image, wherein the to-be-segmented image comprises a plurality of extreme points; and a generation module, configured to generate image feature information according to the to-be-segmented image obtained by the acquisition module, wherein the image feature information comprises N image matrices and a heat map, the heat map is generated according to the plurality of extreme points, and N is an integer greater than or equal to 1; the acquisition module being further configured to obtain, through an image segmentation model, an image segmentation region corresponding to the image feature information generated by the generation module, wherein the image segmentation model comprises N matrix input channels and one heat map input channel, the N matrix input channels are in one-to-one correspondence with the N image matrices, and the heat map input channel corresponds to the heat map; and the generation module being further configured to generate an image recognition result of the to-be-segmented image according to the image segmentation region obtained by the acquisition module.
- An image recognition apparatus, comprising: a receiving module, configured to receive an object labeling instruction for a to-be-processed image, wherein the to-be-processed image comprises a target object, and the object labeling instruction carries position information of a plurality of extreme points corresponding to the target object; a generation module, configured to generate a to-be-segmented image according to the to-be-processed image in response to the object labeling instruction received by the receiving module, the generation module being further configured to generate image feature information according to the to-be-segmented image, wherein the image feature information comprises N image matrices and a heat map, the heat map is generated according to the plurality of extreme points, and N is an integer greater than or equal to 1; an acquisition module, configured to obtain, through an image segmentation model, an image segmentation region corresponding to the image feature information generated by the generation module, wherein the image segmentation model comprises N matrix input channels and one heat map input channel, the N matrix input channels are in one-to-one correspondence with the N image matrices, and the heat map input channel corresponds to the heat map; a processing module, configured to process, by using a polygon fitting function, the to-be-segmented image obtained by the acquisition module to obtain polygon vertex information, wherein the polygon vertex information comprises position information of a plurality of vertices; and a display module, configured to display the target object in the to-be-segmented image according to the polygon vertex information obtained by the processing module.
- A terminal device, comprising a memory, a transceiver, a processor, and a bus system, wherein the memory is configured to store a program; the processor is configured to execute the program in the memory, including the following steps: obtaining a to-be-segmented image, wherein the to-be-segmented image comprises a plurality of extreme points; generating image feature information according to the to-be-segmented image, wherein the image feature information comprises N image matrices and a heat map, the heat map is generated according to the plurality of extreme points, and N is an integer greater than or equal to 1; obtaining, through an image segmentation model, an image segmentation region corresponding to the image feature information, wherein the image segmentation model comprises N matrix input channels and one heat map input channel, the N matrix input channels are in one-to-one correspondence with the N image matrices, and the heat map input channel corresponds to the heat map; and generating an image recognition result of the to-be-segmented image according to the image segmentation region; and the bus system is configured to connect the memory and the processor so that the memory and the processor communicate with each other.
- A terminal device, comprising a memory, a transceiver, a processor, and a bus system, wherein the memory is configured to store a program; the processor is configured to execute the program in the memory, including the following steps: receiving an object labeling instruction for a to-be-processed image, wherein the to-be-processed image comprises a target object, and the object labeling instruction carries position information of a plurality of extreme points corresponding to the target object; generating a to-be-segmented image according to the to-be-processed image in response to the object labeling instruction; generating image feature information according to the to-be-segmented image, wherein the image feature information comprises N image matrices and a heat map, the heat map is generated according to the plurality of extreme points, and N is an integer greater than or equal to 1; obtaining, through an image segmentation model, an image segmentation region corresponding to the image feature information, wherein the image segmentation model comprises N matrix input channels and one heat map input channel, the N matrix input channels are in one-to-one correspondence with the N image matrices, and the heat map input channel corresponds to the heat map; processing the to-be-segmented image by using a polygon fitting function to obtain polygon vertex information, wherein the polygon vertex information comprises position information of a plurality of vertices; and displaying the target object in the to-be-segmented image according to the polygon vertex information; and the bus system is configured to connect the memory and the processor so that the memory and the processor communicate with each other.
- A computer-readable storage medium, configured to store a computer program, the computer program being used to perform the method according to any one of claims 1 to 9, or the method according to claim 10.
- A computer program product comprising instructions that, when run on a computer, cause the computer to perform the method according to any one of claims 1 to 9, or the method according to claim 10.
Priority Applications (2)
- EP20819540.4A (EP3982290A4), priority date 2019-06-04, filed 2020-05-18: IMAGE RECOGNITION METHODS BASED ON ARTIFICIAL INTELLIGENCE AND RESPECTIVE DEVICE
- US17/407,140 (US12045990B2), priority date 2019-06-04, filed 2021-08-19: Image recognition method and related apparatus based on artificial intelligence
Applications Claiming Priority (2)
- CN201910481441.0A (CN110276344B), filed 2019-06-04: Image segmentation method, image recognition method, and related apparatus
- CN201910481441.0, 2019-06-04
Related Child Applications (1)
- US17/407,140 (US12045990B2), continuation, priority date 2019-06-04, filed 2021-08-19: Image recognition method and related apparatus based on artificial intelligence
Publications (1)
- WO2020244373A1, published 2020-12-10
Family
ID=67960540
Family Applications (1)
- PCT/CN2020/090787 (WO2020244373A1), priority date 2019-06-04, filed 2020-05-18: Image recognition method based on artificial intelligence and related apparatus
Country Status (4)
- US: US12045990B2
- EP: EP3982290A4
- CN: CN110276344B
- WO: WO2020244373A1
Citations (4)
- US20170004628A1 (Samsung Electronics Co., Ltd.), priority 2013-08-27, published 2017-01-05: Method and apparatus for segmenting object in image
- CN108022243A (浙江清华长三角研究院), priority 2017-11-23, published 2018-05-11: Deep-learning-based method for detecting paper sheets in images
- CN109447994A (陕西师范大学), priority 2018-11-05, published 2019-03-08: Remote sensing image segmentation method combining full residual connections and feature fusion
- CN110276344A (腾讯科技(深圳)有限公司), priority 2019-06-04, published 2019-09-24: Image segmentation method, image recognition method, and related apparatus
Family Cites Families (2)
- US10751548B2 (Elekta, Inc.), priority 2017-07-28, published 2020-08-25: Automated image segmentation using DCNN such as for radiation therapy
- CN108427951B (腾讯科技(深圳)有限公司), priority 2018-02-08, published 2023-08-04: Image processing method and apparatus, storage medium, and computer device
Application events
- 2019-06-04: CN application CN201910481441.0A filed (granted as CN110276344B, active)
- 2020-05-18: PCT application PCT/CN2020/090787 filed (WO2020244373A1); EP application EP20819540.4A filed (EP3982290A4, pending)
- 2021-08-19: US continuation US17/407,140 filed (US12045990B2, active)
Non-Patent Citations (5)
- Dim P. Papadopoulos et al., "Extreme clicking for efficient object annotation", 2017 IEEE International Conference on Computer Vision (ICCV), 2017, XP033283371
- K.-K. Maninis et al., "Deep Extreme Cut: From Extreme Points to Object Segmentation", 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, XP033476022 (cited twice in the record under different examiner reference numbers)
- See also references of EP3982290A4
- Yong Wood, "Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation (DeepLab v3+) Notes", 25 February 2019, pages 1-14, XP055826195, retrieved from the Internet: https://blog.csdn.net/Lin_Danny/article/details/87924277
Also Published As
- US20210383549A1, published 2021-12-09
- EP3982290A4, published 2022-07-27
- US12045990B2, published 2024-07-23
- CN110276344B, published 2023-11-24
- EP3982290A1, published 2022-04-13
- CN110276344A, published 2019-09-24
Legal Events
- 121 (EP): The EPO has been informed by WIPO that EP was designated in this application. Ref document number: 20819540; country of ref document: EP; kind code of ref document: A1
- NENP: Non-entry into the national phase. Ref country code: DE
- ENP: Entry into the national phase. Ref document number: 2020819540; country of ref document: EP; effective date: 2022-01-04