CN110705558B - Image instance segmentation method and device

Image instance segmentation method and device

Info

Publication number
CN110705558B
Authority
CN
China
Prior art keywords
image
mask image
generation module
mask
expected
Prior art date
Legal status
Active
Application number
CN201910932796.7A
Other languages
Chinese (zh)
Other versions
CN110705558A
Inventor
Li Tao (李涛)
Current Assignee
Zhengzhou Apas Technology Co ltd
Original Assignee
Zhengzhou Apas Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Zhengzhou Apas Technology Co ltd
Priority to CN201910932796.7A
Publication of CN110705558A
Application granted
Publication of CN110705558B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks


Abstract

One embodiment of the present specification provides an image instance segmentation method and apparatus. The method includes: acquiring an image to be segmented, inputting it into an image instance segmentation model, and extracting features of the image to be segmented through an image feature extraction module in the model to generate a first feature image and a second feature image; determining, through a classification positioning module in the model, an instance object in the image to be segmented according to the first feature image, and determining the category and position of the instance object; generating, through a first mask image generation module in the model, a background mask image, a foreground mask image, and a transition band mask image corresponding to the instance object according to the second feature image; and generating, through a second mask image generation module in the model, an instance segmentation mask image corresponding to the instance object according to the second feature image, the background mask image, the foreground mask image, and the transition band mask image.

Description

Image instance segmentation method and device
Technical Field
The present invention relates to the field of image processing, and in particular, to an image instance segmentation method and apparatus.
Background
In the prior art, an image may be instance-segmented by any of various existing image instance segmentation models. Instance segmentation refers to identifying the category, position, and occupied pixel range of each instance object in an image and then extracting the instance object from the image, where instance objects include, but are not limited to, people, articles, and the like.
However, when an image is instance-segmented by an existing image instance segmentation model, the model can only extract a foreground mask image and a background mask image corresponding to an instance object; it cannot attend to the transition region between the background and the foreground of the instance object, which often makes the segmentation contour insufficiently accurate.
Disclosure of Invention
An object of one embodiment of the present specification is to provide an image instance segmentation method and apparatus, so as to solve the problem that the segmentation contour is not accurate enough when an image is instance-segmented by an existing image instance segmentation model.
To solve the above technical problem, one embodiment of the present specification is implemented as follows:
in a first aspect, an embodiment of the present specification provides an image instance segmentation method, including:
acquiring an image to be segmented, inputting the image to be segmented into an image instance segmentation model, and extracting the features of the image to be segmented and generating a first feature image and a second feature image through an image feature extraction module in the image instance segmentation model;
determining, through a classification positioning module in the image instance segmentation model, an instance object in the image to be segmented according to the first feature image, and determining the category and the position of the instance object;
generating, through a first mask image generation module in the image instance segmentation model, a background mask image and a foreground mask image corresponding to the instance object and a transition band mask image between the background and the foreground of the instance object according to the second feature image;
generating, through a second mask image generation module in the image instance segmentation model, an instance segmentation mask image corresponding to the instance object according to the second feature image, the background mask image, the foreground mask image, and the transition band mask image; wherein the category and position of the instance object and the instance segmentation mask image are used for extracting the instance object from the image to be segmented.
In a second aspect, another embodiment of the present specification provides an image example segmentation apparatus, including:
a feature extraction unit, configured to acquire an image to be segmented, input the image to be segmented into an image instance segmentation model, and extract features of the image to be segmented through an image feature extraction module in the image instance segmentation model to generate a first feature image and a second feature image;
a positioning determination unit, configured to determine, through a classification positioning module in the image instance segmentation model, an instance object in the image to be segmented according to the first feature image, and to determine the category and the position of the instance object;
a first generating unit, configured to generate, through a first mask image generation module in the image instance segmentation model, a background mask image and a foreground mask image corresponding to the instance object and a transition band mask image between the background and the foreground of the instance object according to the second feature image;
a second generating unit, configured to generate, through a second mask image generation module in the image instance segmentation model, an instance segmentation mask image corresponding to the instance object according to the second feature image, the background mask image, the foreground mask image, and the transition band mask image; wherein the category and position of the instance object and the instance segmentation mask image are used for extracting the instance object from the image to be segmented.
In a third aspect, a further embodiment of the present specification provides an image instance segmentation apparatus including: a memory, a processor and computer executable instructions stored on the memory and executable on the processor, the computer executable instructions when executed by the processor implementing the steps of the image instance segmentation method as described in the first aspect above.
In a fourth aspect, a further embodiment of the present specification provides a computer-readable storage medium for storing computer-executable instructions which, when executed by a processor, implement the steps of the image instance segmentation method according to the first aspect.
In this embodiment, when an image to be segmented is segmented by the image instance segmentation model, the first mask image generation module generates, according to the feature image corresponding to the image to be segmented, a background mask image and a foreground mask image corresponding to an instance object in the image to be segmented and a transition band mask image between the background and the foreground of the instance object; the second mask image generation module then generates an instance segmentation mask image corresponding to the instance object according to the feature image, the background mask image, the foreground mask image, and the transition band mask image. The transition region between the background and the foreground of the instance object is therefore taken into account during instance segmentation, which improves the accuracy of the segmentation contour and solves the problem that the segmentation contour is not accurate enough when an image is instance-segmented by an existing image instance segmentation model.
Drawings
In order to more clearly illustrate the technical solutions in one or more embodiments of the present disclosure, the drawings used in the embodiments or the prior art descriptions will be briefly described below.
FIG. 1 is a flowchart illustrating an image instance segmentation method according to an embodiment of the present disclosure;
FIG. 2 is a schematic structural diagram of an image instance segmentation model provided in an embodiment of the present disclosure;
FIG. 3 is a schematic structural diagram of an image instance segmentation model provided in another embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a mask image provided in one embodiment of the present description;
FIG. 5 is a schematic structural diagram of an image instance segmentation model provided in yet another embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of an image instance segmentation model according to yet another embodiment of the present disclosure;
FIG. 7 is a block diagram of a mask IOU module according to another embodiment of the present disclosure;
FIG. 8 is a schematic diagram of the structural modification from faster rcnn to maskrcnn provided in an embodiment of the present disclosure;
FIG. 9 is a block diagram of an image instance segmentation apparatus according to an embodiment of the present disclosure;
FIG. 10 is a schematic structural diagram of an image instance segmentation device provided in an embodiment of the present specification.
Detailed Description
The technical solutions in one or more embodiments of the present specification will be clearly and completely described below with reference to the accompanying drawings in one or more embodiments of the present specification.
An object of one embodiment of the present specification is to provide an image instance segmentation method and apparatus, so as to solve the problem that a segmentation contour is not accurate enough when an image is instance segmented by an existing image instance segmentation model. The image instance segmentation method provided in one embodiment of the present specification can be applied to a mobile terminal and executed by the mobile terminal, and can also be applied to a server and executed by the server.
Fig. 1 is a schematic flowchart of an image instance segmentation method provided in an embodiment of the present specification. As shown in fig. 1, the flow includes the following steps:
step S102, acquiring an image to be segmented, inputting the image to be segmented into an image instance segmentation model, extracting features of the image to be segmented through an image feature extraction module in the image instance segmentation model, and generating a first feature image and a second feature image;
step S104, determining, through a classification positioning module in the image instance segmentation model, an instance object in the image to be segmented according to the first feature image, and determining the category and the position of the instance object;
step S106, generating, through a first mask image generation module in the image instance segmentation model, a background mask image and a foreground mask image corresponding to the instance object and a transition band mask image between the background and the foreground of the instance object according to the second feature image;
step S108, generating, through a second mask image generation module in the image instance segmentation model, an instance segmentation mask image corresponding to the instance object according to the second feature image, the background mask image, the foreground mask image, and the transition band mask image; the category and the position of the instance object and the instance segmentation mask image are used for extracting the instance object from the image to be segmented.
In this embodiment, when an image to be segmented is segmented by the image instance segmentation model, the first mask image generation module generates, according to the feature image corresponding to the image to be segmented, a background mask image and a foreground mask image corresponding to an instance object in the image to be segmented and a transition band mask image between the background and the foreground of the instance object; the second mask image generation module then generates an instance segmentation mask image corresponding to the instance object according to the feature image, the background mask image, the foreground mask image, and the transition band mask image. The transition region between the background and the foreground of the instance object is therefore taken into account during instance segmentation, which improves the accuracy of the segmentation contour and solves the problem that the segmentation contour is not accurate enough when an image is instance-segmented by an existing image instance segmentation model.
Fig. 2 is a schematic structural diagram of an image instance segmentation model provided in an embodiment of the present specification. As shown in fig. 2, the image instance segmentation model includes an image feature extraction module, a classification positioning module, a first mask image generation module, and a second mask image generation module. The image feature extraction module is configured to receive an input image to be segmented, extract features of the image to be segmented, and generate a first feature image and a second feature image.
The image feature extraction module is connected with the classification positioning module, which is configured to determine an instance object in the image to be segmented according to the first feature image and to determine the category and the position of the instance object, where the position may be the position of the instance object in the image to be segmented, and the category may be a preset category such as "person", "automobile", "sun", or "flower".
The image feature extraction module is further connected to the first mask image generation module, which is configured to generate a background mask image, a foreground mask image, and a transition band mask image corresponding to the instance object according to the second feature image. In the background mask image, the background pixels of the instance object are represented by pixels with the pixel value "1"; in the foreground mask image, the foreground pixels of the instance object are represented by pixels with the pixel value "1"; and in the transition band mask image, the transition region between the background region and the foreground region of the instance object is represented by pixels with pixel values between "0" and "1".
The first mask image generation module is further connected to the second mask image generation module, which is configured to generate an instance segmentation mask image corresponding to the instance object according to the second feature image and the background mask image, foreground mask image, and transition band mask image generated by the first mask image generation module; the instance segmentation mask image is also referred to as an instance segmentation mask.
When instance segmentation is carried out, the instance object can be extracted from the image to be segmented through the instance segmentation mask image and the category and position of the instance object. For example, the instance segmentation mask is superimposed on the image to be segmented, and the category and the position of the instance object are marked on the image to be segmented, thereby extracting the instance object. The first feature image and the second feature image are two images of different sizes.
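To make the mask-overlay step concrete, the following is a minimal sketch, not taken from the patent, of applying a predicted instance segmentation mask to extract an instance object; the function name, array shapes, and value ranges are assumptions for illustration.

```python
import numpy as np

def extract_instance(image: np.ndarray, alpha: np.ndarray) -> np.ndarray:
    """Apply an instance segmentation mask to an image.

    image: HxWx3 uint8 image to be segmented.
    alpha: HxW float mask in [0, 1] (the instance segmentation mask).
    Returns the extracted instance; masked-out pixels go to black.
    """
    alpha = alpha[..., None]  # HxWx1, broadcast over the color channels
    return (image.astype(np.float32) * alpha).astype(np.uint8)
```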
In a specific embodiment, the image instance segmentation model is obtained by improving on the maskrcnn model. Fig. 3 is a schematic structural diagram of the image instance segmentation model provided in another embodiment of the present specification. As shown in fig. 3, the image instance segmentation model obtained by improving on the maskrcnn model includes an image feature extraction module, a classification positioning module, a first mask image generation module, and a second mask image generation module.
As shown in fig. 3, the image feature extraction module is a front-end feature extraction network comprising four convolution layers, through which it performs convolution operations on the input image to be segmented; the feature image produced by the fourth convolution layer is taken as the first feature image, and the feature image produced by the second convolution layer is taken as the second feature image. Of course, the front-end feature extraction network may also adopt other structures, such as five or six convolution layers; any convolutional neural network capable of extracting image features can serve as the front-end feature extraction network, and no limitation is imposed here. In one embodiment, the front-end feature extraction network is one commonly used in a maskrcnn model.
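The four-layer front end just described might be sketched as follows in PyTorch; this is an illustrative assumption, not the patent's exact network (channel widths, strides, and activations are not specified in the text). The second and fourth convolution outputs are tapped as the second and first feature images, respectively.

```python
import torch.nn as nn

class FrontEndExtractor(nn.Module):
    """Hypothetical four-layer convolutional front end; the outputs of the
    second and fourth convolution layers serve as the second and first
    feature images, respectively."""
    def __init__(self, in_ch: int = 3):
        super().__init__()
        chs = [in_ch, 32, 64, 128, 256]  # assumed channel widths
        self.convs = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(chs[i], chs[i + 1], 3, stride=2, padding=1),
                nn.ReLU(inplace=True),
            )
            for i in range(4)
        )

    def forward(self, x):
        feats = []
        for conv in self.convs:
            x = conv(x)
            feats.append(x)
        first_feature = feats[3]    # output of the 4th convolution layer
        second_feature = feats[1]   # output of the 2nd convolution layer
        return first_feature, second_feature
```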
As shown in fig. 3, the classification positioning module is an RCNN front-end module composed of a convolution layer and two fully connected layers, and the RCNN front end includes an RPN (Region Proposal Network) layer. After the first feature image is input into the RCNN front end and processed by it, the instance object in the image to be segmented can be determined, along with its category and position; the category of the instance object is "class" in fig. 3, and its position is "box" in fig. 3. As shown in fig. 3, the first feature image may be input into the RCNN front end through a RoiAlign structure. In one embodiment, the RCNN front-end module is a common RCNN front-end module in a maskrcnn model.
As shown in fig. 3, the first mask image generation module is a mask front end, which is configured to generate a background mask image, a foreground mask image, and a transition band mask image corresponding to the instance object according to the input second feature image. As shown in fig. 3, the second feature image may be input into the mask front end through a RoiAlign structure; the mask front end may include multiple convolution layers, and the background mask image, foreground mask image, and transition band mask image corresponding to the instance object are generated through convolution operations. The working process and the training process of the first mask image generation module are further described later.
As shown in fig. 3, the second mask image generation module includes a cascade unit, an image size recovery unit and an image generation unit, wherein the cascade unit is a concat unit, the image size recovery unit is a unet decoder network structure, and the image generation unit includes a plurality of convolutional layers. The working process and the training process of the second mask image generation module will be further described later. It should be noted that the specific convolution size or image size marked in fig. 3 is only schematically illustrated, and is not used as a limitation on the image example segmentation model.
The method of fig. 1 is further described below.
In step S102, the image feature extraction module in the image instance segmentation model extracts features of the image to be segmented and generates the first feature image and the second feature image, specifically as follows: the image to be segmented is input to the image feature extraction module for convolution operations, the feature image output by the N-th convolution layer is taken as the first feature image, and the feature image output by the M-th convolution layer is taken as the second feature image, where N and M are two unequal positive integers and the first feature image and the second feature image are two images of different sizes. The image feature extraction module may adopt any general convolutional network structure for extracting image features, and is not specifically limited here.
In one embodiment, the image feature extraction module in the image instance segmentation model adopts a mobilenetv2 structure, a shufflenet structure, a pvanet structure, or the like. These structures have the advantages of small size, sufficiently expressive extracted features, and fast operation, which reduces the size of the image instance segmentation model and increases its running speed, so that the model can be applied to a mobile terminal for instance segmentation. Preferably, the image feature extraction module in the image instance segmentation model adopts the mobilenetv2 structure.
In step S104, the classification positioning module in the image instance segmentation model determines an instance object in the image to be segmented according to the first feature image, and determines the category and the position of the instance object, specifically as follows: the first feature image is input into the classification positioning module for convolution operations, thereby determining an instance object in the image to be segmented, the category of the instance object, and its position in the image to be segmented. In one embodiment, the classification positioning module may include an RPN (Region Proposal Network) layer, and the first feature image is input to the classification positioning module through a RoiAlign structure. The classification positioning module may be any network structure commonly used in the field of instance segmentation for determining the category and position of instance objects.
In step S106, the first mask image generation module in the image instance segmentation model generates a background mask image and a foreground mask image corresponding to the instance object and a transition band mask image between the background and the foreground of the instance object according to the second feature image, specifically: the second feature image is input into the first mask image generation module through a RoiAlign structure, and the first mask image generation module performs convolution operations on the second feature image to obtain the background mask image, the foreground mask image, and the transition band mask image.
In this embodiment, the first mask image generation module may be a mask front end; its specific structure can be seen in the mask front end in fig. 3. The first mask image generation module comprises a plurality of convolution layers and can perform convolution operations on the input second feature image to obtain the background mask image, the foreground mask image, and the transition band mask image.
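A minimal sketch of such a mask front end follows, assuming a small stack of 3x3 convolutions over the RoIAligned second feature image and a final 1x1 convolution with three output channels (background B, foreground F, transition band U); the depth and channel counts are assumptions.

```python
import torch.nn as nn

class MaskFrontEnd(nn.Module):
    """Hypothetical mask front end: convolutions over the RoIAligned
    second feature image, ending in three channels (B, F, U logits)."""
    def __init__(self, in_ch: int = 64, hidden: int = 128, depth: int = 4):
        super().__init__()
        layers, ch = [], in_ch
        for _ in range(depth):
            layers += [nn.Conv2d(ch, hidden, 3, padding=1), nn.ReLU(inplace=True)]
            ch = hidden
        layers.append(nn.Conv2d(ch, 3, 1))  # unnormalized B, F, U maps
        self.net = nn.Sequential(*layers)

    def forward(self, roi_feat):
        return self.net(roi_feat)  # (N, 3, h, w)
```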
In this embodiment, the first mask image generation module is obtained by training on a predetermined first training sample image and the expected foreground mask image, expected background mask image, and expected transition band mask image corresponding to it. The expected foreground mask image, expected background mask image, and expected transition band mask image can be ground truth.
Based on this, the method in this embodiment further includes the following steps to train the first mask image generation module:
(a1) when the first mask image generation module is trained, acquiring a sample foreground mask image, a sample background mask image and a sample transition band mask image which are generated by the first mask image generation module based on a first training sample image, and acquiring an expected foreground mask image, an expected background mask image and an expected transition band mask image which correspond to the first training sample image;
(a2) calculating a training error value corresponding to a first mask image generation module according to a sample foreground mask image, an expected foreground mask image, a sample background mask image, an expected background mask image, a sample transition band mask image and an expected transition band mask image by using a preset first loss function;
(a3) and adjusting the calculation parameters in the first mask image generation module according to the training error value corresponding to the first mask image generation module so as to train the first mask image generation module.
In action (a1), the first training sample image is a predetermined training sample. When training the first mask image generation module, the first training sample image may first be input to the already-trained image feature extraction module for feature extraction to obtain the corresponding second feature image; the second feature image is then input, through the RoiAlign structure, to the first mask image generation module being trained, which processes it to produce the sample foreground mask image, sample background mask image, and sample transition band mask image. Separately, the expected foreground mask image, expected background mask image, and expected transition band mask image corresponding to the first training sample image, determined manually in advance, are obtained.
In the action (a2), a training error value corresponding to the first mask image generation module is calculated according to the sample foreground mask image, the expected foreground mask image, the sample background mask image, the expected background mask image, the sample transition band mask image, and the expected transition band mask image by using a preset first loss function, and specifically:
(a21) calculating a first distance value between the sample foreground mask image and the expected foreground mask image by using a preset first loss function;
(a22) calculating a second distance value between the sample background mask image and the expected background mask image by using a preset first loss function;
(a23) calculating a third distance value between the sample transition band mask image and the expected transition band mask image by using a preset first loss function;
(a24) determining a training error value corresponding to the first mask image generation module according to the first distance value, the second distance value and the third distance value;
and the first distance value, the second distance value, and the third distance value are all L2 distance values as used in the k-nearest-neighbor (KNN) algorithm.
Specifically, the first distance value, the second distance value, and the third distance value are all L2 distance values in the KNN (k-nearest-neighbor) algorithm. The calculation formula of the L2 distance value is:
$$d_2(I_1, I_2) = \sqrt{\sum_{p}\bigl(I_1(p) - I_2(p)\bigr)^2}$$

where $p$ ranges over the pixel positions of the two images.
wherein, when calculating the first distance value, $I_1$ may represent the sample foreground mask image and $I_2$ the expected foreground mask image; when calculating the second distance value, $I_1$ may represent the sample background mask image and $I_2$ the expected background mask image; and when calculating the third distance value, $I_1$ may represent the sample transition band mask image and $I_2$ the expected transition band mask image.
In this embodiment, the first distance value, the second distance value, and the third distance value may each be used as a training error value corresponding to the first mask image generation module, or their sum may be used as that training error value. The training error value represents the difference between the mask images generated by the first mask image generation module and the corresponding expected mask images: the larger the difference, the worse the precision of the module; when the difference is small enough to meet a given requirement, the precision of the first mask image generation module can be deemed adequate and its training complete.
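Under the reconstructed L2 formula above, the training error value of actions (a21)-(a24) might be computed as in the following sketch; summing the three distances into one value follows the paragraph above, and the tensor shapes are assumptions.

```python
import torch

def first_loss(b_s, f_s, u_s, gt_b, gt_f, gt_u):
    """Sum of the three per-image L2 distances between the generated and
    expected masks; all tensors are (N, h, w)."""
    d1 = torch.sqrt(((f_s - gt_f) ** 2).sum(dim=(1, 2)))  # foreground
    d2 = torch.sqrt(((b_s - gt_b) ** 2).sum(dim=(1, 2)))  # background
    d3 = torch.sqrt(((u_s - gt_u) ** 2).sum(dim=(1, 2)))  # transition band
    return (d1 + d2 + d3).mean()  # training error value
```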
After the training error value corresponding to the first mask image generation module is determined, in the above act (a3), the calculation parameters in the first mask image generation module are adjusted according to the training error value corresponding to the first mask image generation module to train the first mask image generation module. For example, the calculation parameters in the first mask image generation module may be adjusted according to the training error value corresponding to the first mask image generation module in a bayesian parameter tuning manner, so as to train the first mask image generation module.
In a specific embodiment, after the calculation parameters in the first mask image generation module are adjusted according to the training error value corresponding to the first mask image generation module, the above actions (a1) and (a2) are repeatedly performed until the training error value corresponding to the first mask image generation module is smaller than the corresponding preset error value, and it is determined that the training of the first mask image generation module is completed.
In this embodiment, the first training sample image and the corresponding expected foreground mask image, expected background mask image, and expected transition band mask image are used to train the first mask image generation module, so that it acquires the function of generating the background mask image, the foreground mask image, and the transition band mask image. The transition region between the background and the foreground of the instance object is thus considered during instance segmentation, improving the accuracy of the segmentation contour.
FIG. 4 is a schematic diagram of a mask image according to an embodiment of the present disclosure. As shown in fig. 4, in the background mask image $B_s$, the background portion of the instance object is represented by pixels with the pixel value "1" and the foreground portion by pixels with the pixel value "0"; in the foreground mask image $F_s$, the foreground portion of the instance object is represented by pixels with the pixel value "1" and the background portion by pixels with the pixel value "0"; and in the transition band mask image $U_s$, the background and foreground portions of the instance object are represented by pixels with the pixel value "0", while the transition region between the background region and the foreground region of the instance object is represented by pixels with pixel values between "0" and "1".
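For illustration, expected $B_s$/$F_s$/$U_s$ masks with the properties shown in fig. 4 could be derived from a ground-truth alpha matte roughly as follows; the thresholds, band width, and use of OpenCV erosion are assumptions, not the patent's procedure.

```python
import cv2
import numpy as np

def make_bfu_masks(alpha: np.ndarray, band: int = 5):
    """alpha: HxW float matte in [0, 1]. Erode the certain-foreground and
    certain-background regions so a band remains around the contour;
    transition-band pixels keep their fractional alpha values."""
    fg = (alpha >= 0.99).astype(np.uint8)
    bg = (alpha <= 0.01).astype(np.uint8)
    kernel = np.ones((band, band), np.uint8)
    f_s = cv2.erode(fg, kernel).astype(np.float32)       # foreground = 1
    b_s = cv2.erode(bg, kernel).astype(np.float32)       # background = 1
    u_s = np.where((f_s == 0) & (b_s == 0), alpha, 0.0)  # values in (0, 1)
    return b_s, f_s, u_s
```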
The following describes the relevant procedure of the above step S108.
In step S108, generating, by a second mask image generation module in the image instance segmentation model, an instance segmentation mask image corresponding to the instance object according to the second feature image, the background mask image, the foreground mask image, and the transition band mask image, specifically:
(b1) cascading the second feature image, the background mask image, the foreground mask image, and the transition band mask image through a cascade unit in the second mask image generation module;
(b2) restoring, through an image size recovery unit in the second mask image generation module, the size of the cascaded image to be consistent with the size of the image to be segmented;
(b3) generating, through an image generation unit in the second mask image generation module, the instance segmentation mask image according to the size-recovered image, the foreground mask image, and the transition band mask image;
wherein the second mask image generation module is obtained by training on a predetermined second training sample image and the instance segmentation expected mask image corresponding to it.
Restoring the size of the cascaded image to be consistent with the size of the image to be segmented through the image size recovery unit makes the generated instance segmentation mask image the same size as the image to be segmented, so that an alpha image of the same size as the original image is directly output, where alpha refers to the transition band.
Referring to fig. 3, the background mask image $B_s$, foreground mask image $F_s$, and transition band mask image $U_s$ generated by the first mask image generation module, together with the second feature image generated by the image feature extraction module, are input into the cascade unit in the second mask image generation module, and the cascade unit concatenates (concat) the second feature image with $B_s$, $F_s$, and $U_s$. The image size recovery unit in the second mask image generation module may be a unet decoder network structure and is configured to restore the size of the cascaded image to that of the image to be segmented; the size-recovered image is shown as $\alpha_r$ in fig. 3. The image generation unit in the second mask image generation module generates the instance segmentation mask image from the size-recovered image $\alpha_r$, the foreground mask image $F_s$, and the transition band mask image $U_s$; the generated instance segmentation mask image is shown as $\alpha_p$ in fig. 3.
Specifically, the image generation unit may generate the instance segmentation mask image $\alpha_p$ from the size-recovered image $\alpha_r$, the foreground mask image $F_s$, and the transition band mask image $U_s$ by the following formula:

$$\alpha_p = F_s + U_s \cdot \alpha_r$$
In this embodiment, the first mask image generation module generates the background mask image $B_s$, foreground mask image $F_s$, and transition band mask image $U_s$ as follows: it first generates a background mask initial image $B$, a foreground mask initial image $F$, and a transition band mask initial image $U$, and then normalizes them pixel-wise by the following formulas to obtain $B_s$, $F_s$, and $U_s$:

$$B_s = \frac{e^{B}}{e^{B} + e^{F} + e^{U}},\qquad F_s = \frac{e^{F}}{e^{B} + e^{F} + e^{U}},\qquad U_s = \frac{e^{U}}{e^{B} + e^{F} + e^{U}}$$
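Taken together, the normalization and composition formulas above amount to the following sketch (a softmax over the three initial maps, then $\alpha_p = F_s + U_s \cdot \alpha_r$); the tensor layouts are assumptions.

```python
import torch

def normalize_and_compose(bfu_logits, alpha_r):
    """bfu_logits: (N, 3, h, w) initial B/F/U maps; alpha_r: (N, h, w)
    size-recovered image. Returns B_s, F_s, U_s and the composed mask."""
    bfu = torch.softmax(bfu_logits, dim=1)     # per-pixel normalization
    b_s, f_s, u_s = bfu[:, 0], bfu[:, 1], bfu[:, 2]
    alpha_p = f_s + u_s * alpha_r              # instance segmentation mask
    return b_s, f_s, u_s, alpha_p
```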
In this embodiment, the second mask image generation module may be trained on a predetermined second training sample image and the instance segmentation expected mask image corresponding to it. Accordingly, a method for training the second mask image generation module is presented, comprising the following steps:
(c1) when training the second mask image generation module, acquiring the instance segmentation sample mask image generated by the second mask image generation module based on the second training sample image, and acquiring the instance segmentation expected mask image corresponding to the second training sample image;
(c2) calculating a training error value corresponding to the second mask image generation module according to the instance segmentation sample mask image and the instance segmentation expected mask image by using a preset second loss function;
(c3) and adjusting the calculation parameters in the second mask image generation module according to the training error value corresponding to the second mask image generation module so as to train the second mask image generation module.
In action (c1), the second training sample image is a predetermined training sample. When training the second mask image generation module, the second training sample image may first be input to the already-trained image feature extraction module for feature extraction to obtain the corresponding second feature image; the second feature image is then input through the RoiAlign structure to the already-trained first mask image generation module, which processes it to obtain accurate background, foreground, and transition band mask images; these mask images and the second feature image are then input to the second mask image generation module, yielding the instance segmentation sample mask image generated by the second mask image generation module based on the second training sample image. Separately, the instance segmentation expected mask image corresponding to the second training sample image, determined manually in advance, is obtained.
In action (c2), the preset second loss function is used to calculate the training error value corresponding to the second mask image generation module according to the instance segmentation sample mask image and the instance segmentation expected mask image, specifically: a distance value between the instance segmentation sample mask image and the instance segmentation expected mask image is calculated by using the preset second loss function, where the distance value is the L1 distance value in the k-nearest-neighbor (KNN) algorithm, and the training error value corresponding to the second mask image generation module is determined from this distance value.
The calculation formula of the L1 distance value in KNN is as follows:
$$d_1(I_1, I_2) = \sum_{p}\bigl|I_1(p) - I_2(p)\bigr|$$

where $p$ ranges over the pixel positions of the two images.
wherein, when calculating the distance value between the instance segmentation sample mask image and the instance segmentation expected mask image, $I_1$ may represent the instance segmentation sample mask image and $I_2$ may represent the instance segmentation expected mask image.
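The second loss, under the L1 formula above, might look like this sketch; the per-image reduction to a mean is an assumption.

```python
def second_loss(alpha_p, alpha_g):
    """L1 distance between the instance segmentation sample mask image
    alpha_p and the expected mask image alpha_g, both (N, h, w) tensors."""
    return (alpha_p - alpha_g).abs().sum(dim=(1, 2)).mean()
```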
The calculation of the L1 distance value in the k-nearest-neighbor algorithm is not repeated here; the calculated distance value between the instance segmentation sample mask image and the instance segmentation expected mask image may be used as the training error value corresponding to the second mask image generation module in action (c2). The training error value represents the difference between the mask image generated by the second mask image generation module and the corresponding expected mask image: the larger the difference, the worse the precision of the module; when the difference is small enough to meet a given requirement, the precision of the second mask image generation module can be deemed adequate and its training complete.
After the training error value corresponding to the second mask image generation module is determined, in the above act (c3), the calculation parameters in the second mask image generation module are adjusted according to the training error value corresponding to the second mask image generation module to train the second mask image generation module. For example, the calculation parameters in the second mask image generation module may be adjusted according to the training error value corresponding to the second mask image generation module in a bayesian parameter tuning manner, so as to train the second mask image generation module.
In a specific embodiment, after the calculation parameters in the second mask image generation module are adjusted according to the training error value corresponding to the second mask image generation module, the above actions (c1) and (c2) are repeatedly performed until the training error value corresponding to the second mask image generation module is smaller than the corresponding preset error value, and it is determined that the training of the second mask image generation module is completed.
It can be understood that, in the second mask image generation module, the image generation unit is composed of a plurality of convolution layers, and in training the second mask image generation module, the calculation parameters in the image generation unit are mainly trained. The cascade unit and the image size recovery unit in the second mask image generation module may respectively adopt a common concat structure and a unet decoder network structure.
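A minimal sketch of the second mask image generation module along the lines just described follows: a concat (cascade) unit, a small upsampling decoder standing in for the unet decoder (restoring input size and yielding $\alpha_r$), and a convolutional image generation unit; channel counts and the number of upsampling steps are assumptions.

```python
import torch
import torch.nn as nn

class SecondMaskModule(nn.Module):
    """Hypothetical second mask image generation module. The final
    composition alpha_p = F_s + U_s * alpha_r is applied outside
    this sketch (see normalize_and_compose above)."""
    def __init__(self, feat_ch: int = 64, hidden: int = 64, up_steps: int = 2):
        super().__init__()
        blocks, ch = [], feat_ch + 3   # feature image concatenated with B/F/U
        for _ in range(up_steps):      # decoder: upsample + convolution
            blocks += [
                nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
                nn.Conv2d(ch, hidden, 3, padding=1),
                nn.ReLU(inplace=True),
            ]
            ch = hidden
        self.decoder = nn.Sequential(*blocks)
        self.head = nn.Sequential(     # image generation unit (convolutions)
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, 1, 1), nn.Sigmoid(),
        )

    def forward(self, feat, bfu):      # feat: (N, feat_ch, h, w); bfu: (N, 3, h, w)
        x = torch.cat([feat, bfu], dim=1)              # cascade (concat) unit
        return self.head(self.decoder(x)).squeeze(1)   # alpha_r: (N, H, W)
```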
In one embodiment of the present specification, the transition band mask image may be referred to as an α image, and in this embodiment, when the mask image is generated by the first mask image generation module and the second mask image generation module, the effect of the α channel is considered, so that the accuracy of the segmentation contour can be improved when the instance segmentation is performed.
Fig. 5 is a schematic structural diagram of an image instance segmentation model according to still another embodiment of the present disclosure. As shown in fig. 5, in this embodiment the image instance segmentation model further includes a scoring module connected to the first mask image generation module and configured to score it in the test stage of the image instance segmentation model; the resulting score indicates the accuracy with which the first mask image generation module generates the background mask image, the foreground mask image, and the transition band mask image.
Correspondingly, the embodiment further provides the following steps:
(d1) in the testing stage of the image instance segmentation model, acquiring a testing background mask image, a testing foreground mask image and a testing transition band mask image which are generated by a first mask image generation module based on a testing sample image, and acquiring a characteristic image of the testing sample image and an expected background mask image, an expected foreground mask image and an expected transition band mask image which correspond to the testing sample image;
(d2) inputting the obtained test background mask image, test foreground mask image, test transition band mask image, characteristic image, expected background mask image, expected foreground mask image and expected transition band mask image to a scoring module, and scoring the first mask image generation module through the scoring module.
Specifically, in a testing stage of the image instance segmentation model, a test sample image is preset, the test sample image is input to a tested image feature extraction module, features of the test sample image are extracted through the image feature extraction module, and a first feature image and a second feature image are generated.
In this embodiment, a tested module is one that, after training is completed, has undergone and passed its corresponding test. Testing a trained model is common practice in the field of machine learning and is not described here.
Then, the accurate second feature image is input to the first mask image generation module under test, which processes it to obtain the test background mask image, test foreground mask image, and test transition band mask image generated by the first mask image generation module based on the test sample image. Separately, the feature image of the test sample image and the expected background mask image, expected foreground mask image, and expected transition band mask image corresponding to the test sample image, all determined manually in advance, are obtained.
Then, the obtained test background mask image, test foreground mask image, test transition band mask image, feature image, expected background mask image, expected foreground mask image and expected transition band mask image are input to a scoring module, and the first mask image generation module is scored through the scoring module. The score obtained by the scoring is used for indicating the accuracy of the first mask image generation module in generating the background mask image, the foreground mask image and the transition band mask image, the higher the score obtained by the scoring is, the higher the accuracy of the background mask image, the foreground mask image and the transition band mask image generated by the first mask image generation module is, and when the score exceeds a preset score, the first mask image generation module is determined to pass the test.
In this embodiment, the first mask image generation module may be composed of a plurality of convolution layers and a plurality of fully connected layers. Fig. 6 is a schematic structural diagram of an image instance segmentation model provided in another embodiment of the present disclosure; the model is obtained by improving on the maskrcnn model. As shown in fig. 6, the image instance segmentation model includes a mask iou module, which is the above-mentioned scoring module.
Fig. 7 is a schematic structural diagram of a mask iou module according to yet another embodiment of the present disclosure. As shown in fig. 7, the mask iou module is configured to perform regression between the test background mask image, test foreground mask image, and test transition band mask image and the expected background mask image, expected foreground mask image, and expected transition band mask image. In practical application, the feature image, the test background mask image, the test foreground mask image, the test transition band mask image, the expected background mask image, the expected foreground mask image, and the expected transition band mask image can be spliced through the RoiAlign structure and input into the mask iou module. At input time, max pooling is used to ensure that the input images are of equal size. The dimensions in fig. 7 are for illustrative purposes and are not intended to be limiting.
As shown in fig. 6 and fig. 7, the mask iou module may be composed of 4 convolutional layers and 3 fully-connected layers, for the 4 convolutional layers, the kernel sizes and the filter numbers of all convolutional layers are set to 3 and 256, respectively, for the 3 fully-connected layers, the outputs of the first two layers are set to 1024, and the output of the last layer is set to the number of classes.
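The stated layer configuration (four 3x3 convolutions with 256 filters, then fully connected layers with outputs of 1024, 1024, and the number of classes) could be sketched as follows; the input channel count, the spatial size, and where downsampling happens are assumptions.

```python
import torch.nn as nn

class MaskIoUHead(nn.Module):
    """Sketch of the mask iou module: 4 conv layers (kernel 3, 256
    filters) and 3 fully connected layers (1024, 1024, num_classes).
    Assumes a 14x14 input pooled to 7x7 by the last convolution."""
    def __init__(self, in_ch: int = 257, num_classes: int = 80):
        super().__init__()
        convs, ch = [], in_ch
        for i in range(4):
            stride = 2 if i == 3 else 1  # assumed downsampling on last conv
            convs += [nn.Conv2d(ch, 256, 3, stride=stride, padding=1),
                      nn.ReLU(inplace=True)]
            ch = 256
        self.convs = nn.Sequential(*convs)
        self.fcs = nn.Sequential(
            nn.Flatten(),
            nn.Linear(256 * 7 * 7, 1024), nn.ReLU(inplace=True),
            nn.Linear(1024, 1024), nn.ReLU(inplace=True),
            nn.Linear(1024, num_classes),
        )

    def forward(self, x):               # x: (N, in_ch, 14, 14) assumed
        return self.fcs(self.convs(x))  # per-class mask IoU scores
```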
When training the mask iou module, RPN proposals may be used as training samples, and the mask IoU is regressed using an L2 loss, with the loss weight set to a particular value such as 1. At inference time, the predicted mask IoU is multiplied by the classification score to obtain the final scoring value.
This embodiment is inspired by the AP metric: describing the quality of instance segmentation by the pixel-level IoU between the predicted mask and the reference mask better matches how instance segmentation is actually evaluated. Mask Scoring R-CNN proposes a network that directly learns this IoU, called MaskIoU; with this score, both the semantic category and the completeness of the instance mask can be assessed. Such a scoring criterion improves the quality of the finally obtained mask and raises the AP value.
For further details of the training and working process of the mask IOU module, reference may be made to the general literature; they are not elaborated here.
In other embodiments, the scoring module may calculate a first similarity between the test background mask image and the expected background mask image, a second similarity between the test foreground mask image and the expected foreground mask image, and a third similarity between the test transition band mask image and the expected transition band mask image, and score according to the first, second, and third similarities.
In this embodiment, the loss function of the image instance segmentation model is formed by combining the loss function of the first mask image generation module and the loss function of the second mask image generation module, that is, the loss function of the image instance segmentation model is formed by combining the first loss function and the second loss function.
Specifically, the loss function of the image instance segmentation model can be expressed as

$$L_P = \lambda\,\lVert \alpha_p - \alpha_g \rVert_1 + (1 - \gamma)\,\mathrm{score}$$

where $\lVert \alpha_p - \alpha_g \rVert_1$ represents the second loss function, $\mathrm{score}$ represents the first loss function, $\lambda$ and $\gamma$ represent the coefficients used to combine the two loss functions, and $L_P$ represents the loss function of the image instance segmentation model.
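A sketch of combining the two terms as in the formula above; the coefficient values and the reduction of the L1 term are assumptions.

```python
def model_loss(alpha_p, alpha_g, score, lam: float = 0.5, gamma: float = 0.5):
    """L_P = lambda * ||alpha_p - alpha_g||_1 + (1 - gamma) * score."""
    l1 = (alpha_p - alpha_g).abs().sum()
    return lam * l1 + (1.0 - gamma) * score
```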
In one embodiment, when training the image instance segmentation model, the first loss function and the second loss function may be trained in combination.
An image instance segmentation model in an embodiment of the present specification may be obtained by improving on maskrcnn, and the maskrcnn model is itself an improvement on faster rcnn. Fig. 8 is a schematic diagram of the structural modification from faster rcnn to maskrcnn provided in an embodiment of the present specification. As shown in fig. 8, Conv is the front-end feature extraction module and rpn is the region proposal network module, from which the relevant regions of interest are obtained; the ROIs are pixel-corrected by RoiAlign, and each ROI is then predicted using an FCN (fully convolutional network) framework to obtain the different instance classifications, yielding the final instance segmentation result of the image. The loss functions used in the maskrcnn network comprise classification error + detection error + segmentation error; the mask loss in maskrcnn generally adopts a cross-entropy loss function.
In summary, the embodiment of the present specification provides an image instance segmentation method that performs instance segmentation based on an image instance segmentation model. The model may be applied to a mobile terminal to implement multi-target instance segmentation, and may be implemented based on maskrcnn with an alpha (transition band) output. With the image instance segmentation method in this embodiment, the transition region between the background and the foreground of an instance object is considered during instance segmentation, the accuracy of the segmentation contour is improved, and the problem that the segmentation contour is not accurate enough when an image is instance-segmented by an existing image instance segmentation model is solved.
An embodiment of the present specification further provides an image instance segmentation apparatus. Fig. 9 is a schematic block diagram of the image instance segmentation apparatus provided in an embodiment of the present specification. As shown in fig. 9, the image instance segmentation apparatus includes:
the feature extraction unit 91 is configured to acquire an image to be segmented, input the image to be segmented into an image instance segmentation model, and extract features of the image to be segmented and generate a first feature image and a second feature image through an image feature extraction module in the image instance segmentation model;
a positioning determination unit 92, configured to determine, through a classification positioning module in the image instance segmentation model, an instance object in the image to be segmented according to the first feature image, and determine the category and the position of the instance object;
a first generating unit 93, configured to generate, by a first mask image generating module in the image instance segmentation model, a background mask image and a foreground mask image corresponding to the instance object and a transition band mask image between the background and the foreground of the instance object according to the second feature image;
a second generating unit 94, configured to generate, by a second mask image generating module in the image instance segmentation model, an instance segmentation mask image corresponding to the instance object according to the second feature image, the background mask image, the foreground mask image, and the transition band mask image; wherein the class and position of the instance object and the instance segmentation mask image are used for extracting the instance object in the image to be segmented.
Optionally, the first generating unit 93 is specifically configured to:
inputting the second characteristic image into the first mask image generation module through a RoiAlign structure, and performing convolution operation on the second characteristic image through the first mask image generation module to obtain the background mask image, the foreground mask image and the transition band mask image;
the first mask image generation module is obtained by training according to a predetermined first training sample image and an expected foreground mask image, an expected background mask image and an expected transition band mask image corresponding to the first training sample image.
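A minimal sketch of such a first mask image generation module is given below, assuming a small convolutional head over RoiAlign features with a per-pixel softmax so that the three maps partition each pixel; the layer sizes and the softmax normalization are illustrative assumptions, not the fixed structure of the module:

```python
import torch
import torch.nn as nn

class FirstMaskHead(nn.Module):
    # Convolutional head over RoiAlign features that predicts three maps:
    # background, foreground, and the transition band between them.
    def __init__(self, in_channels=256):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(in_channels, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(256, 3, kernel_size=1),  # bg / fg / transition band
        )

    def forward(self, roi_features):
        # roi_features: RoiAlign output, e.g. from torchvision.ops.roi_align
        maps = self.convs(roi_features).softmax(dim=1)  # three maps sum to 1 per pixel
        background, foreground, transition = maps[:, 0:1], maps[:, 1:2], maps[:, 2:3]
        return background, foreground, transition
```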
Optionally, the apparatus further comprises a first training unit for:
when the first mask image generation module is trained, acquiring a sample foreground mask image, a sample background mask image and a sample transition band mask image which are generated by the first mask image generation module based on the first training sample image, and acquiring an expected foreground mask image, an expected background mask image and an expected transition band mask image which correspond to the first training sample image;
calculating a training error value corresponding to the first mask image generation module according to the sample foreground mask image, the expected foreground mask image, the sample background mask image, the expected background mask image, the sample transition band mask image and the expected transition band mask image by using a preset first loss function;
and adjusting the calculation parameters in the first mask image generation module according to the training error value corresponding to the first mask image generation module so as to train the first mask image generation module.
Optionally, the first training unit is specifically configured to:
calculating a first distance value between the sample foreground mask image and the expected foreground mask image by using a preset first loss function;
calculating a second distance value between the sample background mask image and the expected background mask image by using a preset first loss function;
calculating a third distance value between the sample transition band mask image and the expected transition band mask image by using a preset first loss function;
determining a training error value corresponding to the first mask image generation module according to the first distance value, the second distance value and the third distance value;
wherein the first distance value, the second distance value, and the third distance value are all L2 distance values in the proximity algorithm KNN.
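Under this description, the first loss function might be sketched as follows; combining the three L2 distances by an unweighted sum is an assumption, since the text states only that the training error is determined from the three distance values:

```python
import torch

def first_loss(sample_fg, expected_fg, sample_bg, expected_bg,
               sample_tb, expected_tb):
    d1 = torch.dist(sample_fg, expected_fg, p=2)  # first distance value (L2)
    d2 = torch.dist(sample_bg, expected_bg, p=2)  # second distance value (L2)
    d3 = torch.dist(sample_tb, expected_tb, p=2)  # third distance value (L2)
    # Training error of the first mask image generation module; equal
    # weighting of the three distances is an assumption.
    return d1 + d2 + d3
```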
Optionally, the second generating unit 94 is specifically configured to:
cascading the second feature image, the background mask image, the foreground mask image and the transition band mask image through a cascading unit in the second mask image generation module;
restoring the size of the image obtained after the cascade connection to be consistent with the size of the image to be segmented through an image size restoring unit in the second mask image generating module;
generating, by an image generation unit in the second mask image generation module, the example segmentation mask image according to the image obtained after size restoration, the foreground mask image, and the transition band mask image;
the second mask image generation module is obtained by training according to a predetermined second training sample image and an example segmentation expected mask image corresponding to the second training sample image.
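A hedged sketch of the cascade unit, the image size restoring unit and the image generation unit described above; the channel counts and the single fusion convolution are assumptions, not the fixed structure of the module:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SecondMaskModule(nn.Module):
    def __init__(self, feat_channels=256):
        super().__init__()
        # Generation-unit input: the size-restored cascade (feature image plus
        # three masks) together with the foreground and transition-band masks.
        self.generate = nn.Conv2d(feat_channels + 3 + 2, 1,
                                  kernel_size=3, padding=1)

    def forward(self, feat, bg, fg, tb, out_size):
        def up(t):  # restore a tensor to the size of the image to be segmented
            return F.interpolate(t, size=out_size, mode="bilinear",
                                 align_corners=False)

        x = torch.cat([feat, bg, fg, tb], dim=1)    # cascade unit
        x = up(x)                                   # image size restoring unit
        x = torch.cat([x, up(fg), up(tb)], dim=1)   # image generation unit inputs
        return torch.sigmoid(self.generate(x))      # instance segmentation mask
```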
Optionally, the apparatus further comprises a second training unit for:
when the second mask image generation module is trained, acquiring an example segmentation sample mask image generated by the second mask image generation module based on the second training sample image, and acquiring an example segmentation expected mask image corresponding to the second training sample image;
calculating a training error value corresponding to the second mask image generation module according to the example segmentation sample mask image and the example segmentation expected mask image by using a preset second loss function;
and adjusting the calculation parameters in the second mask image generation module according to the training error value corresponding to the second mask image generation module so as to train the second mask image generation module.
Optionally, the second training unit is specifically configured to:
calculating a distance value between the example segmentation sample mask image and the example segmentation expected mask image by using a preset second loss function, wherein the distance value is an L1 distance value in a proximity algorithm KNN;
and determining a training error value corresponding to the second mask image generation module according to the distance value.
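A corresponding sketch of the second loss function, using the L1 distance described above:

```python
import torch

def second_loss(sample_mask, expected_mask):
    # Training error of the second mask image generation module: the L1
    # distance between the example segmentation sample mask image and the
    # example segmentation expected mask image.
    return torch.dist(sample_mask, expected_mask, p=1)
```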
Optionally, the image instance segmentation model further includes a scoring module, configured to score the first mask image generation module at a test stage of the image instance segmentation model, where a score obtained by scoring is used to indicate an accuracy of the first mask image generation module in generating a background mask image, a foreground mask image, and a transition band mask image; the apparatus further comprises a scoring unit for:
in a test stage of the image instance segmentation model, acquiring a test background mask image, a test foreground mask image and a test transition band mask image which are generated by the first mask image generation module based on a test sample image, and acquiring a feature image of the test sample image and an expected background mask image, an expected foreground mask image and an expected transition band mask image which correspond to the test sample image;
inputting the obtained test background mask image, test foreground mask image, test transition band mask image, second characteristic image, expected background mask image, expected foreground mask image and expected transition band mask image to the scoring module, and scoring the first mask image generation module through the scoring module.
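Purely as an assumed illustration of the scoring module (the specification does not give the scoring formula, nor how the feature image enters it), a score could be derived from the agreement between the test masks and the expected masks:

```python
import torch

def score_first_module(test_masks, expected_masks):
    # test_masks / expected_masks: (background, foreground, transition band)
    # mask tensors produced on a test sample image. The score here is one
    # minus the mean absolute error, an assumed accuracy measure.
    errors = [torch.mean(torch.abs(t - e))
              for t, e in zip(test_masks, expected_masks)]
    return 1.0 - torch.stack(errors).mean()
```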
In this embodiment, when an image to be segmented is segmented by the image instance segmentation model, the first mask image generation module can generate, according to the feature image corresponding to the image to be segmented, a background mask image and a foreground mask image corresponding to an instance object in the image to be segmented, as well as a transition band mask image between the background and the foreground of the instance object; the second mask image generation module can then generate, according to that feature image, the background mask image, the foreground mask image and the transition band mask image, an instance segmentation mask image corresponding to the instance object. The transition region between the background and the foreground of the instance object is therefore taken into account during instance segmentation, which improves the accuracy of the segmentation contour and solves the problem that the segmentation contour is not accurate enough when instance segmentation is performed by an existing image instance segmentation model.
The image instance segmentation apparatus provided in this embodiment of the present specification can implement each process in the foregoing image instance segmentation method embodiments and achieves the same functions and effects, which are not repeated here.
Further, an embodiment of the present specification provides another image instance segmentation apparatus. Fig. 10 is a schematic structural diagram of the image instance segmentation apparatus provided in this embodiment of the present specification; as shown in fig. 10, the apparatus includes: a memory 1001, a processor 1002, a bus 1003, and a communication interface 1004. The memory 1001, the processor 1002, and the communication interface 1004 communicate via the bus 1003. The communication interface 1004 may include input and output interfaces, including but not limited to a keyboard, a mouse, a display, a microphone, and the like.
In fig. 10, the memory 1001 has stored thereon computer-executable instructions executable on the processor 1002, and when executed by the processor 1002, the computer-executable instructions implement the following procedures:
acquiring an image to be segmented, inputting the image to be segmented into an image instance segmentation model, and extracting the features of the image to be segmented and generating a first feature image and a second feature image through an image feature extraction module in the image instance segmentation model;
determining an example object in the image to be segmented according to the first characteristic image and determining the category and the position of the example object through a classification positioning module in the image example segmentation model;
generating a background mask image and a foreground mask image corresponding to the example object and a transition band mask image between the background and the foreground of the example object according to the second characteristic image through a first mask image generation module in the image example segmentation model;
generating an example segmentation mask image corresponding to the example object according to the second feature image, the background mask image, the foreground mask image and the transition band mask image through a second mask image generation module in the image example segmentation model; wherein the class and position of the instance object and the instance segmentation mask image are used for extracting the instance object in the image to be segmented.
Optionally, when executed by the processor, the computer-executable instructions generate, by a first mask image generation module in the image instance segmentation model, a background mask image and a foreground mask image corresponding to the instance object and a transition band mask image between the background and the foreground of the instance object according to the second feature image, including:
inputting the second characteristic image into the first mask image generation module through a RoiAlign structure, and performing convolution operation on the second characteristic image through the first mask image generation module to obtain the background mask image, the foreground mask image and the transition band mask image;
the first mask image generation module is obtained by training according to a predetermined first training sample image and an expected foreground mask image, an expected background mask image and an expected transition band mask image corresponding to the first training sample image.
Optionally, the computer executable instructions, when executed by the processor, further comprise:
when the first mask image generation module is trained, acquiring a sample foreground mask image, a sample background mask image and a sample transition band mask image which are generated by the first mask image generation module based on the first training sample image, and acquiring an expected foreground mask image, an expected background mask image and an expected transition band mask image which correspond to the first training sample image;
calculating a training error value corresponding to the first mask image generation module according to the sample foreground mask image, the expected foreground mask image, the sample background mask image, the expected background mask image, the sample transition band mask image and the expected transition band mask image by using a preset first loss function;
and adjusting the calculation parameters in the first mask image generation module according to the training error value corresponding to the first mask image generation module so as to train the first mask image generation module.
Optionally, when executed by the processor, the computer-executable instructions utilize a preset first loss function to calculate training error values corresponding to the first mask image generation module according to the sample foreground mask image, the desired foreground mask image, the sample background mask image, the desired background mask image, the sample transition band mask image, and the desired transition band mask image, and include:
calculating a first distance value between the sample foreground mask image and the expected foreground mask image by using a preset first loss function;
calculating a second distance value between the sample background mask image and the expected background mask image by using a preset first loss function;
calculating a third distance value between the sample transition band mask image and the expected transition band mask image by using a preset first loss function;
determining a training error value corresponding to the first mask image generation module according to the first distance value, the second distance value and the third distance value;
wherein the first distance value, the second distance value, and the third distance value are all L2 distance values in the proximity algorithm KNN.
Optionally, when executed by the processor, the computer-executable instructions generate, by a second mask image generation module in the image instance segmentation model, an instance segmentation mask image corresponding to the instance object according to the second feature image, the background mask image, the foreground mask image, and the transition-band mask image, including:
cascading the second feature image, the background mask image, the foreground mask image and the transition band mask image through a cascading unit in the second mask image generation module;
restoring the size of the image obtained after the cascade connection to be consistent with the size of the image to be segmented through an image size restoring unit in the second mask image generating module;
generating, by an image generation unit in the second mask image generation module, the example segmentation mask image according to the image obtained after size restoration, the foreground mask image, and the transition band mask image;
the second mask image generation module is obtained by training according to a predetermined second training sample image and an example segmentation expected mask image corresponding to the second training sample image.
Optionally, the computer executable instructions, when executed by the processor, further comprise:
when the second mask image generation module is trained, acquiring an example segmentation sample mask image generated by the second mask image generation module based on the second training sample image, and acquiring an example segmentation expected mask image corresponding to the second training sample image;
calculating a training error value corresponding to the second mask image generation module according to the example segmentation sample mask image and the example segmentation expected mask image by using a preset second loss function;
and adjusting the calculation parameters in the second mask image generation module according to the training error value corresponding to the second mask image generation module so as to train the second mask image generation module.
Optionally, when executed by the processor, the computer-executable instructions utilize a preset second loss function to calculate a training error value corresponding to the second mask image generation module according to the example segmentation sample mask image and the example segmentation expected mask image, and include:
calculating a distance value between the example segmentation sample mask image and the example segmentation expected mask image by using a preset second loss function, wherein the distance value is an L1 distance value in a proximity algorithm KNN;
and determining a training error value corresponding to the second mask image generation module according to the distance value.
Optionally, when the computer-executable instructions are executed by the processor, the image instance segmentation model further includes a scoring module, configured to score the first mask image generation module in a test stage of the image instance segmentation model, where a score obtained by scoring is used to indicate an accuracy of the first mask image generation module in generating a background mask image, a foreground mask image, and a transition band mask image; further comprising:
in a test stage of the image instance segmentation model, acquiring a test background mask image, a test foreground mask image and a test transition band mask image which are generated by the first mask image generation module based on a test sample image, and acquiring a feature image of the test sample image and an expected background mask image, an expected foreground mask image and an expected transition band mask image which correspond to the test sample image;
inputting the obtained test background mask image, test foreground mask image, test transition band mask image, second characteristic image, expected background mask image, expected foreground mask image and expected transition band mask image to the scoring module, and scoring the first mask image generation module through the scoring module.
In this embodiment, when an image to be segmented is segmented by the image instance segmentation model, the first mask image generation module can generate, according to the feature image corresponding to the image to be segmented, a background mask image and a foreground mask image corresponding to an instance object in the image to be segmented, as well as a transition band mask image between the background and the foreground of the instance object; the second mask image generation module can then generate, according to that feature image, the background mask image, the foreground mask image and the transition band mask image, an instance segmentation mask image corresponding to the instance object. The transition region between the background and the foreground of the instance object is therefore taken into account during instance segmentation, which improves the accuracy of the segmentation contour and solves the problem that the segmentation contour is not accurate enough when instance segmentation is performed by an existing image instance segmentation model.
The image instance segmentation apparatus provided in this embodiment of the present specification can implement each process in the foregoing image instance segmentation method embodiments and achieves the same functions and effects, which are not repeated here.
Further, another embodiment of the present specification also provides a computer-readable storage medium for storing computer-executable instructions, which when executed by a processor implement the following process:
acquiring an image to be segmented, inputting the image to be segmented into an image instance segmentation model, and extracting the features of the image to be segmented and generating a first feature image and a second feature image through an image feature extraction module in the image instance segmentation model;
determining an example object in the image to be segmented according to the first characteristic image and determining the category and the position of the example object through a classification positioning module in the image example segmentation model;
generating a background mask image and a foreground mask image corresponding to the example object and a transition band mask image between the background and the foreground of the example object according to the second characteristic image through a first mask image generation module in the image example segmentation model;
generating an example segmentation mask image corresponding to the example object according to the second feature image, the background mask image, the foreground mask image and the transition band mask image through a second mask image generation module in the image example segmentation model; wherein the class and position of the instance object and the instance segmentation mask image are used for extracting the instance object in the image to be segmented.
Optionally, when executed by the processor, the computer-executable instructions generate, by a first mask image generation module in the image instance segmentation model, a background mask image and a foreground mask image corresponding to the instance object and a transition band mask image between the background and the foreground of the instance object according to the second feature image, including:
inputting the second characteristic image into the first mask image generation module through a RoiAlign structure, and performing convolution operation on the second characteristic image through the first mask image generation module to obtain the background mask image, the foreground mask image and the transition band mask image;
the first mask image generation module is obtained by training according to a predetermined first training sample image and an expected foreground mask image, an expected background mask image and an expected transition band mask image corresponding to the first training sample image.
Optionally, the computer executable instructions, when executed by the processor, further comprise:
when the first mask image generation module is trained, acquiring a sample foreground mask image, a sample background mask image and a sample transition band mask image which are generated by the first mask image generation module based on the first training sample image, and acquiring an expected foreground mask image, an expected background mask image and an expected transition band mask image which correspond to the first training sample image;
calculating a training error value corresponding to the first mask image generation module according to the sample foreground mask image, the expected foreground mask image, the sample background mask image, the expected background mask image, the sample transition band mask image and the expected transition band mask image by using a preset first loss function;
and adjusting the calculation parameters in the first mask image generation module according to the training error value corresponding to the first mask image generation module so as to train the first mask image generation module.
Optionally, when executed by the processor, the computer-executable instructions utilize a preset first loss function to calculate training error values corresponding to the first mask image generation module according to the sample foreground mask image, the desired foreground mask image, the sample background mask image, the desired background mask image, the sample transition band mask image, and the desired transition band mask image, and include:
calculating a first distance value between the sample foreground mask image and the expected foreground mask image by using a preset first loss function;
calculating a second distance value between the sample background mask image and the expected background mask image by using a preset first loss function;
calculating a third distance value between the sample transition band mask image and the expected transition band mask image by using a preset first loss function;
determining a training error value corresponding to the first mask image generation module according to the first distance value, the second distance value and the third distance value;
wherein the first distance value, the second distance value, and the third distance value are all L2 distance values in the proximity algorithm KNN.
Optionally, when executed by the processor, the computer-executable instructions generate, by a second mask image generation module in the image instance segmentation model, an instance segmentation mask image corresponding to the instance object according to the second feature image, the background mask image, the foreground mask image, and the transition-band mask image, including:
cascading the second feature image, the background mask image, the foreground mask image and the transition band mask image through a cascading unit in the second mask image generation module;
restoring the size of the image obtained after the cascade connection to be consistent with the size of the image to be segmented through an image size restoring unit in the second mask image generating module;
generating, by an image generation unit in the second mask image generation module, the example segmentation mask image according to the image obtained after size restoration, the foreground mask image, and the transition band mask image;
the second mask image generation module is obtained by training according to a predetermined second training sample image and an example segmentation expected mask image corresponding to the second training sample image.
Optionally, the computer executable instructions, when executed by the processor, further comprise:
when the second mask image generation module is trained, acquiring an example segmentation sample mask image generated by the second mask image generation module based on the second training sample image, and acquiring an example segmentation expected mask image corresponding to the second training sample image;
calculating a training error value corresponding to the second mask image generation module according to the example segmentation sample mask image and the example segmentation expected mask image by using a preset second loss function;
and adjusting the calculation parameters in the second mask image generation module according to the training error value corresponding to the second mask image generation module so as to train the second mask image generation module.
Optionally, when the computer-executable instructions are executed by the processor, calculating a training error value corresponding to the second mask image generation module according to the example segmentation sample mask image and the example segmentation expected mask image by using a preset second loss function includes:
calculating a distance value between the example segmentation sample mask image and the example segmentation expected mask image by using a preset second loss function, wherein the distance value is an L1 distance value in a proximity algorithm KNN;
and determining a training error value corresponding to the second mask image generation module according to the distance value.
Optionally, when the computer-executable instructions are executed by the processor, the image instance segmentation model further includes a scoring module, configured to score the first mask image generation module in a test stage of the image instance segmentation model, where a score obtained by scoring is used to indicate an accuracy of the first mask image generation module in generating the background mask image, the foreground mask image and the transition-band mask image; further comprising:
in a test stage of the image instance segmentation model, acquiring a test background mask image, a test foreground mask image and a test transition band mask image which are generated by the first mask image generation module based on a test sample image, and acquiring a feature image of the test sample image and an expected background mask image, an expected foreground mask image and an expected transition band mask image which correspond to the test sample image;
inputting the obtained test background mask image, test foreground mask image, test transition band mask image, second characteristic image, expected background mask image, expected foreground mask image and expected transition band mask image to the scoring module, and scoring the first mask image generation module through the scoring module.
The computer-readable storage medium includes a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
In this embodiment, when an image to be segmented is segmented by the image instance segmentation model, the first mask image generation module can generate, according to the feature image corresponding to the image to be segmented, a background mask image and a foreground mask image corresponding to an instance object in the image to be segmented, as well as a transition band mask image between the background and the foreground of the instance object; the second mask image generation module can then generate, according to that feature image, the background mask image, the foreground mask image and the transition band mask image, an instance segmentation mask image corresponding to the instance object. The transition region between the background and the foreground of the instance object is therefore taken into account during instance segmentation, which improves the accuracy of the segmentation contour and solves the problem that the segmentation contour is not accurate enough when instance segmentation is performed by an existing image instance segmentation model.
The image instance segmentation apparatus provided in this embodiment of the present specification can implement each process in the foregoing image instance segmentation method embodiments and achieves the same functions and effects, which are not repeated here.
The above description presents only examples of the present specification and is not intended to limit this document. Various modifications and changes to the embodiments described herein will be apparent to those skilled in the art. Any modifications, equivalents, improvements and the like made within the spirit and principle of this disclosure are intended to be included within the scope of the claims of this document.

Claims (14)

1. An image instance segmentation method, comprising:
acquiring an image to be segmented, inputting the image to be segmented into an image instance segmentation model, and extracting the features of the image to be segmented and generating a first feature image and a second feature image through an image feature extraction module in the image instance segmentation model;
determining an example object in the image to be segmented according to the first characteristic image and determining the category and the position of the example object through a classification positioning module in the image example segmentation model;
generating a background mask image and a foreground mask image corresponding to the example object and a transition band mask image between the background and the foreground of the example object according to the second characteristic image through a first mask image generation module in the image example segmentation model;
generating an example segmentation mask image corresponding to the example object according to the second feature image, the background mask image, the foreground mask image and the transition band mask image through a second mask image generation module in the image example segmentation model; wherein the class and position of the instance object and the instance segmentation mask image are used for extracting the instance object in the image to be segmented;
when the instance segmentation mask image corresponding to the instance object is generated, the second feature image, the background mask image, the foreground mask image and the transition band mask image are cascaded through a cascade unit in the second mask image generation module;
restoring the size of the image obtained after the cascade connection to be consistent with the size of the image to be segmented through an image size restoring unit in the second mask image generating module;
and generating the example segmentation mask image according to the image obtained after size recovery, the foreground mask image and the transition band mask image by an image generation unit in the second mask image generation module.
2. The method according to claim 1, wherein generating, by a first mask image generation module in the image instance segmentation model, a background mask image and a foreground mask image corresponding to the instance object and a transition band mask image between the background and the foreground of the instance object according to the second feature image comprises:
inputting the second characteristic image into the first mask image generation module through a RoiAlign structure, and performing convolution operation on the second characteristic image through the first mask image generation module to obtain the background mask image, the foreground mask image and the transition band mask image;
the first mask image generation module is obtained by training according to a predetermined first training sample image and an expected foreground mask image, an expected background mask image and an expected transition band mask image corresponding to the first training sample image.
3. The method of claim 2, further comprising:
when the first mask image generation module is trained, acquiring a sample foreground mask image, a sample background mask image and a sample transition band mask image which are generated by the first mask image generation module based on the first training sample image, and acquiring an expected foreground mask image, an expected background mask image and an expected transition band mask image which correspond to the first training sample image;
calculating a training error value corresponding to the first mask image generation module according to the sample foreground mask image, the expected foreground mask image, the sample background mask image, the expected background mask image, the sample transition band mask image and the expected transition band mask image by using a preset first loss function;
and adjusting the calculation parameters in the first mask image generation module according to the training error value corresponding to the first mask image generation module so as to train the first mask image generation module.
4. The method according to claim 3, wherein calculating training error values corresponding to the first mask image generation module according to the sample foreground mask image, the desired foreground mask image, the sample background mask image, the desired background mask image, the sample transition band mask image, and the desired transition band mask image by using a preset first loss function comprises:
calculating a first distance value between the sample foreground mask image and the expected foreground mask image by using a preset first loss function;
calculating a second distance value between the sample background mask image and the expected background mask image by using a preset first loss function;
calculating a third distance value between the sample transition band mask image and the expected transition band mask image by using a preset first loss function;
determining a training error value corresponding to the first mask image generation module according to the first distance value, the second distance value and the third distance value;
wherein the first distance value, the second distance value, and the third distance value are all L2 distance values in the proximity algorithm KNN.
5. The method according to claim 1, wherein the second mask image generation module is obtained by training according to a predetermined second training sample image and an example segmentation expected mask image corresponding to the second training sample image.
6. The method of claim 5, further comprising:
when the second mask image generation module is trained, acquiring an example segmentation sample mask image generated by the second mask image generation module based on the second training sample image, and acquiring an example segmentation expected mask image corresponding to the second training sample image;
calculating a training error value corresponding to the second mask image generation module according to the example segmentation sample mask image and the example segmentation expected mask image by using a preset second loss function;
and adjusting the calculation parameters in the second mask image generation module according to the training error value corresponding to the second mask image generation module so as to train the second mask image generation module.
7. The method of claim 6, wherein calculating a training error value corresponding to the second mask image generation module according to the example segmentation sample mask image and the example segmentation expected mask image by using a preset second loss function comprises:
calculating a distance value between the example segmentation sample mask image and the example segmentation expected mask image by using a preset second loss function, wherein the distance value is an L1 distance value in a proximity algorithm KNN;
and determining a training error value corresponding to the second mask image generation module according to the distance value.
8. The method according to any one of claims 1 to 7, wherein the image instance segmentation model further comprises a scoring module, which is used for scoring the first mask image generation module in a test stage of the image instance segmentation model, and a score obtained by scoring is used for indicating an accuracy of the first mask image generation module in generating a background mask image, a foreground mask image and a transition band mask image; the method further comprises the following steps:
in a test stage of the image instance segmentation model, acquiring a test background mask image, a test foreground mask image and a test transition band mask image which are generated by the first mask image generation module based on a test sample image, and acquiring a feature image of the test sample image and an expected background mask image, an expected foreground mask image and an expected transition band mask image which correspond to the test sample image;
inputting the obtained test background mask image, test foreground mask image, test transition band mask image, second characteristic image, expected background mask image, expected foreground mask image and expected transition band mask image to the scoring module, and scoring the first mask image generation module through the scoring module.
9. An image instance segmentation apparatus, comprising:
the image segmentation device comprises a feature extraction unit, a feature extraction unit and an image segmentation unit, wherein the feature extraction unit is used for acquiring an image to be segmented, inputting the image to be segmented into an image instance segmentation model, and extracting features of the image to be segmented through an image feature extraction module in the image instance segmentation model to generate a first feature image and a second feature image;
the positioning determination unit is used for determining an example object in the image to be segmented according to the first characteristic image and determining the category and the position of the example object through a classification positioning module in the image example segmentation model;
a first generating unit, configured to generate, by a first mask image generating module in the image instance segmentation model, a background mask image and a foreground mask image corresponding to the instance object and a transition band mask image between the background and the foreground of the instance object according to the second feature image;
a second generating unit, configured to generate, by a second mask image generating module in the image instance segmentation model, an instance segmentation mask image corresponding to the instance object according to the second feature image, the background mask image, the foreground mask image, and the transition band mask image; wherein the class and position of the instance object and the instance segmentation mask image are used for extracting the instance object in the image to be segmented;
the second generating unit is specifically configured to: cascading the second feature image, the background mask image, the foreground mask image and the transition band mask image through a cascading unit in the second mask image generation module;
restoring the size of the image obtained after the cascade connection to be consistent with the size of the image to be segmented through an image size restoring unit in the second mask image generating module;
and generating the example segmentation mask image according to the image obtained after size recovery, the foreground mask image and the transition band mask image by an image generation unit in the second mask image generation module.
10. The apparatus according to claim 9, wherein the first generating unit is specifically configured to:
inputting the second characteristic image into the first mask image generation module through a RoiAlign structure, and performing convolution operation on the second characteristic image through the first mask image generation module to obtain the background mask image, the foreground mask image and the transition band mask image;
the first mask image generation module is obtained by training according to a predetermined first training sample image and an expected foreground mask image, an expected background mask image and an expected transition band mask image corresponding to the first training sample image.
11. The apparatus of claim 10, further comprising a first training unit to:
when the first mask image generation module is trained, acquiring a sample foreground mask image, a sample background mask image and a sample transition band mask image which are generated by the first mask image generation module based on the first training sample image, and acquiring an expected foreground mask image, an expected background mask image and an expected transition band mask image which correspond to the first training sample image;
calculating a training error value corresponding to the first mask image generation module according to the sample foreground mask image, the expected foreground mask image, the sample background mask image, the expected background mask image, the sample transition band mask image and the expected transition band mask image by using a preset first loss function;
and adjusting the calculation parameters in the first mask image generation module according to the training error value corresponding to the first mask image generation module so as to train the first mask image generation module.
12. The apparatus of claim 11, wherein the first training unit is specifically configured to:
calculating a first distance value between the sample foreground mask image and the expected foreground mask image by using a preset first loss function;
calculating a second distance value between the sample background mask image and the expected background mask image by using a preset first loss function;
calculating a third distance value between the sample transition band mask image and the expected transition band mask image by using a preset first loss function;
determining a training error value corresponding to the first mask image generation module according to the first distance value, the second distance value and the third distance value;
wherein the first distance value, the second distance value, and the third distance value are all L2 distance values in the proximity algorithm KNN.
13. The apparatus of claim 9, wherein the second mask image generation module is obtained by training according to a predetermined second training sample image and an example segmentation expected mask image corresponding to the second training sample image.
14. The apparatus of claim 13, further comprising a second training unit to:
when the second mask image generation module is trained, acquiring an example segmentation sample mask image generated by the second mask image generation module based on the second training sample image, and acquiring an example segmentation expected mask image corresponding to the second training sample image;
calculating a training error value corresponding to the second mask image generation module according to the example segmentation sample mask image and the example segmentation expected mask image by using a preset second loss function;
and adjusting the calculation parameters in the second mask image generation module according to the training error value corresponding to the second mask image generation module so as to train the second mask image generation module.
CN201910932796.7A 2019-09-29 2019-09-29 Image instance segmentation method and device Active CN110705558B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910932796.7A CN110705558B (en) 2019-09-29 2019-09-29 Image instance segmentation method and device


Publications (2)

Publication Number Publication Date
CN110705558A CN110705558A (en) 2020-01-17
CN110705558B true CN110705558B (en) 2022-03-08





Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109325954A (en) * 2018-09-18 2019-02-12 北京旷视科技有限公司 Image partition method, device and electronic equipment
CN110008832A (en) * 2019-02-27 2019-07-12 西安电子科技大学 Based on deep learning character image automatic division method, information data processing terminal
CN109949316A (en) * 2019-03-01 2019-06-28 东南大学 A kind of Weakly supervised example dividing method of grid equipment image based on RGB-T fusion
CN109949317A (en) * 2019-03-06 2019-06-28 东南大学 Based on the semi-supervised image instance dividing method for gradually fighting study
CN110008915A (en) * 2019-04-11 2019-07-12 电子科技大学 The system and method for dense human body attitude estimation is carried out based on mask-RCNN

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Tsung-Yi Lin et al.; "Feature Pyramid Networks for Object Detection"; 2017 IEEE Conference on Computer Vision and Pattern Recognition; 2017-11-09; pp. 936-944 *
Xiangyi Zhang et al.; "Mask R-CNN with Feature Pyramid Attention for Instance Segmentation"; 2018 14th IEEE International Conference on Signal Processing (ICSP); 2019-02-28; pp. 1194-1197 *
Kaiming He et al.; "Mask R-CNN"; 2017 IEEE International Conference on Computer Vision; 2017-12-25; pp. 2980-2988 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant