CN110705558B - Image instance segmentation method and device

Image instance segmentation method and device

Info

Publication number
CN110705558B
Authority
CN
China
Prior art keywords
image
mask image
generation module
mask
expected
Prior art date
Legal status
Active
Application number
CN201910932796.7A
Other languages
Chinese (zh)
Other versions
CN110705558A
Inventor
Li Tao (李涛)
Current Assignee
Zhengzhou Apas Technology Co ltd
Original Assignee
Zhengzhou Apas Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Zhengzhou Apas Technology Co ltd
Priority to CN201910932796.7A
Publication of CN110705558A
Application granted
Publication of CN110705558B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks


Abstract

One embodiment of the present specification provides an image instance segmentation method and apparatus. The method includes: acquiring an image to be segmented, inputting it into an image instance segmentation model, and extracting features of the image to be segmented through an image feature extraction module in the model to generate a first feature image and a second feature image; determining, through a classification positioning module in the model, an instance object in the image to be segmented according to the first feature image, and determining the category and position of the instance object; generating, through a first mask image generation module in the model, a background mask image, a foreground mask image, and a transition band mask image corresponding to the instance object according to the second feature image; and generating, through a second mask image generation module in the model, an instance segmentation mask image corresponding to the instance object according to the second feature image, the background mask image, the foreground mask image, and the transition band mask image.

Description

Image instance segmentation method and device
Technical Field
The present invention relates to the field of image processing, and in particular, to an image instance segmentation method and apparatus.
Background
In the prior art, an image may be instance-segmented by any of various existing image instance segmentation models. Instance segmentation refers to identifying the category, position, and occupied pixel range of each instance object in an image and then extracting the instance object from the image, where instance objects include, but are not limited to, people, articles, and the like.
However, when an image is instance-segmented by an existing image instance segmentation model, the model can only extract a foreground mask image and a background mask image corresponding to an instance object; it cannot attend to the transition region between the background and the foreground of the instance object, which often makes the segmentation contour insufficiently accurate.
Disclosure of Invention
An object of one embodiment of the present specification is to provide an image instance segmentation method and apparatus, so as to solve the problem that the segmentation contour is not accurate enough when an image is instance-segmented by an existing image instance segmentation model.
To solve the above technical problem, one embodiment of the present specification is implemented as follows:
in a first aspect, an embodiment of the present specification provides an image instance segmentation method, including:
acquiring an image to be segmented, inputting the image to be segmented into an image instance segmentation model, and extracting the features of the image to be segmented and generating a first feature image and a second feature image through an image feature extraction module in the image instance segmentation model;
determining, through a classification positioning module in the image instance segmentation model, an instance object in the image to be segmented according to the first feature image, and determining the category and the position of the instance object;
generating, through a first mask image generation module in the image instance segmentation model, a background mask image and a foreground mask image corresponding to the instance object and a transition band mask image between the background and the foreground of the instance object according to the second feature image;
generating, through a second mask image generation module in the image instance segmentation model, an instance segmentation mask image corresponding to the instance object according to the second feature image, the background mask image, the foreground mask image, and the transition band mask image; wherein the category and position of the instance object and the instance segmentation mask image are used for extracting the instance object from the image to be segmented.
In a second aspect, another embodiment of the present specification provides an image example segmentation apparatus, including:
a feature extraction unit, configured to acquire an image to be segmented, input the image to be segmented into an image instance segmentation model, and extract features of the image to be segmented through an image feature extraction module in the image instance segmentation model to generate a first feature image and a second feature image;
a positioning determination unit, configured to determine, through a classification positioning module in the image instance segmentation model, an instance object in the image to be segmented according to the first feature image, and to determine the category and the position of the instance object;
a first generating unit, configured to generate, through a first mask image generation module in the image instance segmentation model, a background mask image and a foreground mask image corresponding to the instance object and a transition band mask image between the background and the foreground of the instance object according to the second feature image;
a second generating unit, configured to generate, through a second mask image generation module in the image instance segmentation model, an instance segmentation mask image corresponding to the instance object according to the second feature image, the background mask image, the foreground mask image, and the transition band mask image; wherein the category and position of the instance object and the instance segmentation mask image are used for extracting the instance object from the image to be segmented.
In a third aspect, a further embodiment of the present specification provides an image instance segmentation apparatus including: a memory, a processor and computer executable instructions stored on the memory and executable on the processor, the computer executable instructions when executed by the processor implementing the steps of the image instance segmentation method as described in the first aspect above.
In a fourth aspect, a further embodiment of the present specification provides a computer-readable storage medium for storing computer-executable instructions which, when executed by a processor, implement the steps of the image instance segmentation method according to the first aspect.
In this embodiment, when an image to be segmented is segmented by the image instance segmentation model, the first mask image generation module generates, according to the feature image corresponding to the image to be segmented, a background mask image and a foreground mask image corresponding to an instance object in the image to be segmented and a transition band mask image between the background and the foreground of the instance object; the second mask image generation module then generates an instance segmentation mask image corresponding to the instance object according to the feature image, the background mask image, the foreground mask image, and the transition band mask image. The transition region between the background and the foreground of the instance object is therefore taken into account during instance segmentation, which improves the accuracy of the segmentation contour and solves the problem that the segmentation contour is not accurate enough when an image is instance-segmented by an existing image instance segmentation model.
Drawings
In order to more clearly illustrate the technical solutions in one or more embodiments of the present disclosure, the drawings used in the embodiments or the prior art descriptions will be briefly described below.
FIG. 1 is a flowchart illustrating an image instance segmentation method according to an embodiment of the present disclosure;
FIG. 2 is a schematic structural diagram of an image instance segmentation model provided in an embodiment of the present disclosure;
FIG. 3 is a schematic structural diagram of an image instance segmentation model provided in another embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a mask image provided in one embodiment of the present description;
FIG. 5 is a schematic structural diagram of an image instance segmentation model provided in yet another embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of an image instance segmentation model according to yet another embodiment of the present disclosure;
FIG. 7 is a block diagram of a mask IOU module according to another embodiment of the present disclosure;
FIG. 8 is a schematic diagram of the structural modification from faster rcnn to maskrcnn provided in an embodiment of the present disclosure;
FIG. 9 is a block diagram of an image instance segmentation apparatus according to an embodiment of the present disclosure;
FIG. 10 is a schematic structural diagram of an image instance segmentation device provided in an embodiment of the present specification.
Detailed Description
The technical solutions in one or more embodiments of the present specification will be clearly and completely described below with reference to the accompanying drawings in one or more embodiments of the present specification.
An object of one embodiment of the present specification is to provide an image instance segmentation method and apparatus, so as to solve the problem that a segmentation contour is not accurate enough when an image is instance segmented by an existing image instance segmentation model. The image instance segmentation method provided in one embodiment of the present specification can be applied to a mobile terminal and executed by the mobile terminal, and can also be applied to a server and executed by the server.
Fig. 1 is a schematic flowchart of an image instance segmentation method provided in an embodiment of the present specification. As shown in fig. 1, the flow includes the following steps:
step S102, acquiring an image to be segmented, inputting the image to be segmented into an image instance segmentation model, extracting features of the image to be segmented through an image feature extraction module in the image instance segmentation model, and generating a first feature image and a second feature image;
step S104, determining, through a classification positioning module in the image instance segmentation model, an instance object in the image to be segmented according to the first feature image, and determining the category and the position of the instance object;
step S106, generating, through a first mask image generation module in the image instance segmentation model, a background mask image and a foreground mask image corresponding to the instance object and a transition band mask image between the background and the foreground of the instance object according to the second feature image;
step S108, generating, through a second mask image generation module in the image instance segmentation model, an instance segmentation mask image corresponding to the instance object according to the second feature image, the background mask image, the foreground mask image, and the transition band mask image; the category and the position of the instance object and the instance segmentation mask image are used for extracting the instance object from the image to be segmented.
In this embodiment, when an image to be segmented is segmented by the image instance segmentation model, the first mask image generation module generates, according to the feature image corresponding to the image to be segmented, a background mask image and a foreground mask image corresponding to an instance object in the image to be segmented and a transition band mask image between the background and the foreground of the instance object; the second mask image generation module then generates an instance segmentation mask image corresponding to the instance object according to the feature image, the background mask image, the foreground mask image, and the transition band mask image. The transition region between the background and the foreground of the instance object is therefore taken into account during instance segmentation, which improves the accuracy of the segmentation contour and solves the problem that the segmentation contour is not accurate enough when an image is instance-segmented by an existing image instance segmentation model.
Fig. 2 is a schematic structural diagram of an image instance segmentation model provided in an embodiment of the present specification. As shown in fig. 2, the image instance segmentation model includes an image feature extraction module, a classification positioning module, a first mask image generation module, and a second mask image generation module. The image feature extraction module is configured to receive an input image to be segmented, extract features of the image to be segmented, and generate a first feature image and a second feature image.
The image feature extraction module is connected with the classification positioning module, which is configured to determine an instance object in the image to be segmented according to the first feature image and to determine the category and the position of the instance object, where the position may be the position of the instance object in the image to be segmented, and the category may be a preset category such as "person", "automobile", "sun", or "flower".
The image feature extraction module is further connected to the first mask image generation module, which is configured to generate a background mask image, a foreground mask image, and a transition band mask image corresponding to the instance object according to the second feature image. In the background mask image, the background pixels of the instance object are represented by pixels with the pixel value "1"; in the foreground mask image, the foreground pixels of the instance object are represented by pixels with the pixel value "1"; and in the transition band mask image, the transition region between the background region and the foreground region of the instance object is represented by pixels with pixel values between "0" and "1".
The first mask image generation module is further connected to the second mask image generation module, which is configured to generate an instance segmentation mask image corresponding to the instance object according to the second feature image and the background mask image, foreground mask image, and transition band mask image generated by the first mask image generation module; the instance segmentation mask image is also referred to as an instance segmentation mask.
When instance segmentation is carried out, the instance object can be extracted from the image to be segmented through the instance segmentation mask image and the category and position of the instance object. For example, the instance segmentation mask is superimposed on the image to be segmented, and the category and the position of the instance object are marked on the image to be segmented, thereby extracting the instance object. The first feature image and the second feature image are two images of different sizes.
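To make the mask-overlay step concrete, the following is a minimal sketch, not taken from the patent, of applying a predicted instance segmentation mask to extract an instance object; the function name, array shapes, and value ranges are assumptions for illustration.

```python
import numpy as np

def extract_instance(image: np.ndarray, alpha: np.ndarray) -> np.ndarray:
    """Apply an instance segmentation mask to an image.

    image: HxWx3 uint8 image to be segmented.
    alpha: HxW float mask in [0, 1] (the instance segmentation mask).
    Returns the extracted instance; masked-out pixels go to black.
    """
    alpha = alpha[..., None]  # HxWx1, broadcast over the color channels
    return (image.astype(np.float32) * alpha).astype(np.uint8)
```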
In a specific embodiment, the image instance segmentation model is obtained by improving on the maskrcnn model. Fig. 3 is a schematic structural diagram of the image instance segmentation model provided in another embodiment of the present specification. As shown in fig. 3, the image instance segmentation model obtained by improving on the maskrcnn model includes an image feature extraction module, a classification positioning module, a first mask image generation module, and a second mask image generation module.
As shown in fig. 3, the image feature extraction module is a front-end feature extraction network comprising four convolution layers, through which it performs convolution operations on the input image to be segmented; the feature image produced by the fourth convolution layer is taken as the first feature image, and the feature image produced by the second convolution layer is taken as the second feature image. Of course, the front-end feature extraction network may also adopt other structures, such as five or six convolution layers; any convolutional neural network capable of extracting image features can serve as the front-end feature extraction network, and no limitation is imposed here. In one embodiment, the front-end feature extraction network is one commonly used in a maskrcnn model.
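The four-layer front end just described might be sketched as follows in PyTorch; this is an illustrative assumption, not the patent's exact network (channel widths, strides, and activations are not specified in the text). The second and fourth convolution outputs are tapped as the second and first feature images, respectively.

```python
import torch.nn as nn

class FrontEndExtractor(nn.Module):
    """Hypothetical four-layer convolutional front end; the outputs of the
    second and fourth convolution layers serve as the second and first
    feature images, respectively."""
    def __init__(self, in_ch: int = 3):
        super().__init__()
        chs = [in_ch, 32, 64, 128, 256]  # assumed channel widths
        self.convs = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(chs[i], chs[i + 1], 3, stride=2, padding=1),
                nn.ReLU(inplace=True),
            )
            for i in range(4)
        )

    def forward(self, x):
        feats = []
        for conv in self.convs:
            x = conv(x)
            feats.append(x)
        first_feature = feats[3]    # output of the 4th convolution layer
        second_feature = feats[1]   # output of the 2nd convolution layer
        return first_feature, second_feature
```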
As shown in fig. 3, the classification positioning module is an RCNN front-end module composed of a convolution layer and two fully connected layers, and the RCNN front end includes an RPN (Region Proposal Network) layer. After the first feature image is input into the RCNN front end and processed by it, the instance object in the image to be segmented can be determined, along with its category and position; the category of the instance object is "class" in fig. 3, and its position is "box" in fig. 3. As shown in fig. 3, the first feature image may be input into the RCNN front end through a RoiAlign structure. In one embodiment, the RCNN front-end module is a common RCNN front-end module in a maskrcnn model.
As shown in fig. 3, the first mask image generation module is a mask front end, which is configured to generate a background mask image, a foreground mask image, and a transition band mask image corresponding to the instance object according to the input second feature image. As shown in fig. 3, the second feature image may be input into the mask front end through a RoiAlign structure; the mask front end may include multiple convolution layers, and the background mask image, foreground mask image, and transition band mask image corresponding to the instance object are generated through convolution operations. The working process and the training process of the first mask image generation module are further described later.
As shown in fig. 3, the second mask image generation module includes a cascade unit, an image size recovery unit and an image generation unit, wherein the cascade unit is a concat unit, the image size recovery unit is a unet decoder network structure, and the image generation unit includes a plurality of convolutional layers. The working process and the training process of the second mask image generation module will be further described later. It should be noted that the specific convolution size or image size marked in fig. 3 is only schematically illustrated, and is not used as a limitation on the image example segmentation model.
The method of fig. 1 is further described below.
In step S102, the image feature extraction module in the image instance segmentation model extracts features of the image to be segmented and generates the first feature image and the second feature image, specifically as follows: the image to be segmented is input to the image feature extraction module for convolution operations, the feature image output by the N-th convolution layer is taken as the first feature image, and the feature image output by the M-th convolution layer is taken as the second feature image, where N and M are two unequal positive integers and the first feature image and the second feature image are two images of different sizes. The image feature extraction module may adopt any general convolutional network structure for extracting image features, and is not specifically limited here.
In one embodiment, the image feature extraction module in the image instance segmentation model adopts a mobilenetv2 structure, a shufflenet structure, a pvanet structure, or the like. These structures have the advantages of small size, sufficiently expressive extracted features, and fast operation, which reduces the size of the image instance segmentation model and increases its running speed, so that the model can be applied to a mobile terminal for instance segmentation. Preferably, the image feature extraction module in the image instance segmentation model adopts the mobilenetv2 structure.
In step S104, the classification positioning module in the image instance segmentation model determines an instance object in the image to be segmented according to the first feature image, and determines the category and the position of the instance object, specifically as follows: the first feature image is input into the classification positioning module for convolution operations, thereby determining an instance object in the image to be segmented, the category of the instance object, and its position in the image to be segmented. In one embodiment, the classification positioning module may include an RPN (Region Proposal Network) layer, and the first feature image is input to the classification positioning module through a RoiAlign structure. The classification positioning module may be any network structure commonly used in the field of instance segmentation for determining the category and position of instance objects.
In step S106, the first mask image generation module in the image instance segmentation model generates a background mask image and a foreground mask image corresponding to the instance object and a transition band mask image between the background and the foreground of the instance object according to the second feature image, specifically: the second feature image is input into the first mask image generation module through a RoiAlign structure, and the first mask image generation module performs convolution operations on the second feature image to obtain the background mask image, the foreground mask image, and the transition band mask image.
In this embodiment, the first mask image generation module may be a mask front end; its specific structure can be seen in the mask front end in fig. 3. The first mask image generation module comprises a plurality of convolution layers and can perform convolution operations on the input second feature image to obtain the background mask image, the foreground mask image, and the transition band mask image.
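A minimal sketch of such a mask front end follows, assuming a small stack of 3x3 convolutions over the RoIAligned second feature image and a final 1x1 convolution with three output channels (background B, foreground F, transition band U); the depth and channel counts are assumptions.

```python
import torch.nn as nn

class MaskFrontEnd(nn.Module):
    """Hypothetical mask front end: convolutions over the RoIAligned
    second feature image, ending in three channels (B, F, U logits)."""
    def __init__(self, in_ch: int = 64, hidden: int = 128, depth: int = 4):
        super().__init__()
        layers, ch = [], in_ch
        for _ in range(depth):
            layers += [nn.Conv2d(ch, hidden, 3, padding=1), nn.ReLU(inplace=True)]
            ch = hidden
        layers.append(nn.Conv2d(ch, 3, 1))  # unnormalized B, F, U maps
        self.net = nn.Sequential(*layers)

    def forward(self, roi_feat):
        return self.net(roi_feat)  # (N, 3, h, w)
```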
In this embodiment, the first mask image generation module is obtained by training on a predetermined first training sample image and the expected foreground mask image, expected background mask image, and expected transition band mask image corresponding to it. The expected foreground mask image, expected background mask image, and expected transition band mask image can be ground truth.
Based on this, the method in this embodiment further includes the following steps to train the first mask image generation module:
(a1) when the first mask image generation module is trained, acquiring a sample foreground mask image, a sample background mask image and a sample transition band mask image which are generated by the first mask image generation module based on a first training sample image, and acquiring an expected foreground mask image, an expected background mask image and an expected transition band mask image which correspond to the first training sample image;
(a2) calculating a training error value corresponding to a first mask image generation module according to a sample foreground mask image, an expected foreground mask image, a sample background mask image, an expected background mask image, a sample transition band mask image and an expected transition band mask image by using a preset first loss function;
(a3) and adjusting the calculation parameters in the first mask image generation module according to the training error value corresponding to the first mask image generation module so as to train the first mask image generation module.
In action (a1), the first training sample image is a predetermined training sample. When training the first mask image generation module, the first training sample image may first be input to the already-trained image feature extraction module for feature extraction to obtain the corresponding second feature image; the second feature image is then input, through the RoiAlign structure, to the first mask image generation module being trained, which processes it to produce the sample foreground mask image, sample background mask image, and sample transition band mask image. Separately, the expected foreground mask image, expected background mask image, and expected transition band mask image corresponding to the first training sample image, determined manually in advance, are obtained.
In the action (a2), a training error value corresponding to the first mask image generation module is calculated according to the sample foreground mask image, the expected foreground mask image, the sample background mask image, the expected background mask image, the sample transition band mask image, and the expected transition band mask image by using a preset first loss function, and specifically:
(a21) calculating a first distance value between the sample foreground mask image and the expected foreground mask image by using a preset first loss function;
(a22) calculating a second distance value between the sample background mask image and the expected background mask image by using a preset first loss function;
(a23) calculating a third distance value between the sample transition band mask image and the expected transition band mask image by using a preset first loss function;
(a24) determining a training error value corresponding to the first mask image generation module according to the first distance value, the second distance value and the third distance value;
and the first distance value, the second distance value, and the third distance value are all L2 distance values as used in the k-nearest-neighbor (KNN) algorithm.
Specifically, the first distance value, the second distance value, and the third distance value are all L2 distance values in the KNN (k-nearest-neighbor) algorithm. The calculation formula of the L2 distance value is:
$$d_2(I_1, I_2) = \sqrt{\sum_{p}\bigl(I_1(p) - I_2(p)\bigr)^2}$$

where $p$ ranges over the pixel positions of the two images.
wherein, when calculating the first distance value, $I_1$ may represent the sample foreground mask image and $I_2$ the expected foreground mask image; when calculating the second distance value, $I_1$ may represent the sample background mask image and $I_2$ the expected background mask image; and when calculating the third distance value, $I_1$ may represent the sample transition band mask image and $I_2$ the expected transition band mask image.
In this embodiment, the first distance value, the second distance value, and the third distance value may each be used as a training error value corresponding to the first mask image generation module, or their sum may be used as that training error value. The training error value represents the difference between the mask images generated by the first mask image generation module and the corresponding expected mask images: the larger the difference, the worse the precision of the module; when the difference is small enough to meet a given requirement, the precision of the first mask image generation module can be deemed adequate and its training complete.
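Under the reconstructed L2 formula above, the training error value of actions (a21)-(a24) might be computed as in the following sketch; summing the three distances into one value follows the paragraph above, and the tensor shapes are assumptions.

```python
import torch

def first_loss(b_s, f_s, u_s, gt_b, gt_f, gt_u):
    """Sum of the three per-image L2 distances between the generated and
    expected masks; all tensors are (N, h, w)."""
    d1 = torch.sqrt(((f_s - gt_f) ** 2).sum(dim=(1, 2)))  # foreground
    d2 = torch.sqrt(((b_s - gt_b) ** 2).sum(dim=(1, 2)))  # background
    d3 = torch.sqrt(((u_s - gt_u) ** 2).sum(dim=(1, 2)))  # transition band
    return (d1 + d2 + d3).mean()  # training error value
```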
After the training error value corresponding to the first mask image generation module is determined, in the above act (a3), the calculation parameters in the first mask image generation module are adjusted according to the training error value corresponding to the first mask image generation module to train the first mask image generation module. For example, the calculation parameters in the first mask image generation module may be adjusted according to the training error value corresponding to the first mask image generation module in a bayesian parameter tuning manner, so as to train the first mask image generation module.
In a specific embodiment, after the calculation parameters in the first mask image generation module are adjusted according to the training error value corresponding to the first mask image generation module, the above actions (a1) and (a2) are repeatedly performed until the training error value corresponding to the first mask image generation module is smaller than the corresponding preset error value, and it is determined that the training of the first mask image generation module is completed.
In this embodiment, the first training sample image and the corresponding expected foreground mask image, expected background mask image, and expected transition band mask image are used to train the first mask image generation module, so that it acquires the function of generating the background mask image, the foreground mask image, and the transition band mask image. The transition region between the background and the foreground of the instance object is thus considered during instance segmentation, improving the accuracy of the segmentation contour.
FIG. 4 is a schematic diagram of a mask image according to an embodiment of the present disclosure. As shown in fig. 4, in the background mask image $B_s$, the background portion of the instance object is represented by pixels with the pixel value "1" and the foreground portion by pixels with the pixel value "0"; in the foreground mask image $F_s$, the foreground portion of the instance object is represented by pixels with the pixel value "1" and the background portion by pixels with the pixel value "0"; and in the transition band mask image $U_s$, the background and foreground portions of the instance object are represented by pixels with the pixel value "0", while the transition region between the background region and the foreground region of the instance object is represented by pixels with pixel values between "0" and "1".
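For illustration, expected $B_s$/$F_s$/$U_s$ masks with the properties shown in fig. 4 could be derived from a ground-truth alpha matte roughly as follows; the thresholds, band width, and use of OpenCV erosion are assumptions, not the patent's procedure.

```python
import cv2
import numpy as np

def make_bfu_masks(alpha: np.ndarray, band: int = 5):
    """alpha: HxW float matte in [0, 1]. Erode the certain-foreground and
    certain-background regions so a band remains around the contour;
    transition-band pixels keep their fractional alpha values."""
    fg = (alpha >= 0.99).astype(np.uint8)
    bg = (alpha <= 0.01).astype(np.uint8)
    kernel = np.ones((band, band), np.uint8)
    f_s = cv2.erode(fg, kernel).astype(np.float32)       # foreground = 1
    b_s = cv2.erode(bg, kernel).astype(np.float32)       # background = 1
    u_s = np.where((f_s == 0) & (b_s == 0), alpha, 0.0)  # values in (0, 1)
    return b_s, f_s, u_s
```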
The following describes the relevant procedure of the above step S108.
In step S108, generating, by a second mask image generation module in the image instance segmentation model, an instance segmentation mask image corresponding to the instance object according to the second feature image, the background mask image, the foreground mask image, and the transition band mask image, specifically:
(b1) cascading the second feature image, the background mask image, the foreground mask image, and the transition band mask image through a cascade unit in the second mask image generation module;
(b2) restoring, through an image size recovery unit in the second mask image generation module, the size of the cascaded image to be consistent with the size of the image to be segmented;
(b3) generating, through an image generation unit in the second mask image generation module, the instance segmentation mask image according to the size-recovered image, the foreground mask image, and the transition band mask image;
wherein the second mask image generation module is obtained by training on a predetermined second training sample image and the instance segmentation expected mask image corresponding to it.
Restoring the size of the cascaded image to be consistent with the size of the image to be segmented through the image size recovery unit makes the generated instance segmentation mask image the same size as the image to be segmented, so that an alpha image of the same size as the original image is directly output, where alpha refers to the transition band.
Referring to fig. 3, the background mask image $B_s$, foreground mask image $F_s$, and transition band mask image $U_s$ generated by the first mask image generation module, together with the second feature image generated by the image feature extraction module, are input into the cascade unit in the second mask image generation module, and the cascade unit concatenates (concat) the second feature image with $B_s$, $F_s$, and $U_s$. The image size recovery unit in the second mask image generation module may be a unet decoder network structure and is configured to restore the size of the cascaded image to that of the image to be segmented; the size-recovered image is shown as $\alpha_r$ in fig. 3. The image generation unit in the second mask image generation module generates the instance segmentation mask image from the size-recovered image $\alpha_r$, the foreground mask image $F_s$, and the transition band mask image $U_s$; the generated instance segmentation mask image is shown as $\alpha_p$ in fig. 3.
Specifically, the image generation unit may generate the instance segmentation mask image $\alpha_p$ from the size-recovered image $\alpha_r$, the foreground mask image $F_s$, and the transition band mask image $U_s$ by the following formula:

$$\alpha_p = F_s + U_s \cdot \alpha_r$$
In this embodiment, the first mask image generation module generates the background mask image $B_s$, foreground mask image $F_s$, and transition band mask image $U_s$ as follows: it first generates a background mask initial image $B$, a foreground mask initial image $F$, and a transition band mask initial image $U$, and then normalizes them pixel-wise by the following formulas to obtain $B_s$, $F_s$, and $U_s$:

$$B_s = \frac{e^{B}}{e^{B} + e^{F} + e^{U}},\qquad F_s = \frac{e^{F}}{e^{B} + e^{F} + e^{U}},\qquad U_s = \frac{e^{U}}{e^{B} + e^{F} + e^{U}}$$
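Taken together, the normalization and composition formulas above amount to the following sketch (a softmax over the three initial maps, then $\alpha_p = F_s + U_s \cdot \alpha_r$); the tensor layouts are assumptions.

```python
import torch

def normalize_and_compose(bfu_logits, alpha_r):
    """bfu_logits: (N, 3, h, w) initial B/F/U maps; alpha_r: (N, h, w)
    size-recovered image. Returns B_s, F_s, U_s and the composed mask."""
    bfu = torch.softmax(bfu_logits, dim=1)     # per-pixel normalization
    b_s, f_s, u_s = bfu[:, 0], bfu[:, 1], bfu[:, 2]
    alpha_p = f_s + u_s * alpha_r              # instance segmentation mask
    return b_s, f_s, u_s, alpha_p
```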
In this embodiment, the second mask image generation module may be trained on a predetermined second training sample image and the instance segmentation expected mask image corresponding to it. Accordingly, a method for training the second mask image generation module is presented, comprising the following steps:
(c1) when training the second mask image generation module, acquiring the instance segmentation sample mask image generated by the second mask image generation module based on the second training sample image, and acquiring the instance segmentation expected mask image corresponding to the second training sample image;
(c2) calculating a training error value corresponding to the second mask image generation module according to the instance segmentation sample mask image and the instance segmentation expected mask image by using a preset second loss function;
(c3) and adjusting the calculation parameters in the second mask image generation module according to the training error value corresponding to the second mask image generation module so as to train the second mask image generation module.
In action (c1), the second training sample image is a predetermined training sample. When training the second mask image generation module, the second training sample image may first be input to the already-trained image feature extraction module for feature extraction to obtain the corresponding second feature image; the second feature image is then input through the RoiAlign structure to the already-trained first mask image generation module, which processes it to obtain accurate background, foreground, and transition band mask images; these mask images and the second feature image are then input to the second mask image generation module, yielding the instance segmentation sample mask image generated by the second mask image generation module based on the second training sample image. Separately, the instance segmentation expected mask image corresponding to the second training sample image, determined manually in advance, is obtained.
In action (c2), the preset second loss function is used to calculate the training error value corresponding to the second mask image generation module according to the instance segmentation sample mask image and the instance segmentation expected mask image, specifically: a distance value between the instance segmentation sample mask image and the instance segmentation expected mask image is calculated by using the preset second loss function, where the distance value is the L1 distance value in the k-nearest-neighbor (KNN) algorithm, and the training error value corresponding to the second mask image generation module is determined from this distance value.
The calculation formula of the L1 distance value in KNN is as follows:
$$d_1(I_1, I_2) = \sum_{p}\bigl|I_1(p) - I_2(p)\bigr|$$

where $p$ ranges over the pixel positions of the two images.
wherein, when calculating the distance value between the instance segmentation sample mask image and the instance segmentation expected mask image, $I_1$ may represent the instance segmentation sample mask image and $I_2$ may represent the instance segmentation expected mask image.
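The second loss, under the L1 formula above, might look like this sketch; the per-image reduction to a mean is an assumption.

```python
def second_loss(alpha_p, alpha_g):
    """L1 distance between the instance segmentation sample mask image
    alpha_p and the expected mask image alpha_g, both (N, h, w) tensors."""
    return (alpha_p - alpha_g).abs().sum(dim=(1, 2)).mean()
```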
The calculation of the L1 distance value in the k-nearest-neighbor algorithm is not repeated here; the calculated distance value between the instance segmentation sample mask image and the instance segmentation expected mask image may be used as the training error value corresponding to the second mask image generation module in action (c2). The training error value represents the difference between the mask image generated by the second mask image generation module and the corresponding expected mask image: the larger the difference, the worse the precision of the module; when the difference is small enough to meet a given requirement, the precision of the second mask image generation module can be deemed adequate and its training complete.
After the training error value corresponding to the second mask image generation module is determined, in the above act (c3), the calculation parameters in the second mask image generation module are adjusted according to the training error value corresponding to the second mask image generation module to train the second mask image generation module. For example, the calculation parameters in the second mask image generation module may be adjusted according to the training error value corresponding to the second mask image generation module in a bayesian parameter tuning manner, so as to train the second mask image generation module.
In a specific embodiment, after the calculation parameters in the second mask image generation module are adjusted according to the training error value corresponding to the second mask image generation module, the above actions (c1) and (c2) are repeatedly performed until the training error value corresponding to the second mask image generation module is smaller than the corresponding preset error value, and it is determined that the training of the second mask image generation module is completed.
It can be understood that, in the second mask image generation module, the image generation unit is composed of a plurality of convolution layers, and in training the second mask image generation module, the calculation parameters in the image generation unit are mainly trained. The cascade unit and the image size recovery unit in the second mask image generation module may respectively adopt a common concat structure and a unet decoder network structure.
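A minimal sketch of the second mask image generation module along the lines just described follows: a concat (cascade) unit, a small upsampling decoder standing in for the unet decoder (restoring input size and yielding $\alpha_r$), and a convolutional image generation unit; channel counts and the number of upsampling steps are assumptions.

```python
import torch
import torch.nn as nn

class SecondMaskModule(nn.Module):
    """Hypothetical second mask image generation module. The final
    composition alpha_p = F_s + U_s * alpha_r is applied outside
    this sketch (see normalize_and_compose above)."""
    def __init__(self, feat_ch: int = 64, hidden: int = 64, up_steps: int = 2):
        super().__init__()
        blocks, ch = [], feat_ch + 3   # feature image concatenated with B/F/U
        for _ in range(up_steps):      # decoder: upsample + convolution
            blocks += [
                nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
                nn.Conv2d(ch, hidden, 3, padding=1),
                nn.ReLU(inplace=True),
            ]
            ch = hidden
        self.decoder = nn.Sequential(*blocks)
        self.head = nn.Sequential(     # image generation unit (convolutions)
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, 1, 1), nn.Sigmoid(),
        )

    def forward(self, feat, bfu):      # feat: (N, feat_ch, h, w); bfu: (N, 3, h, w)
        x = torch.cat([feat, bfu], dim=1)              # cascade (concat) unit
        return self.head(self.decoder(x)).squeeze(1)   # alpha_r: (N, H, W)
```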
In one embodiment of the present specification, the transition band mask image may be referred to as an α image, and in this embodiment, when the mask image is generated by the first mask image generation module and the second mask image generation module, the effect of the α channel is considered, so that the accuracy of the segmentation contour can be improved when the instance segmentation is performed.
Fig. 5 is a schematic structural diagram of an image instance segmentation model according to still another embodiment of the present disclosure. As shown in fig. 5, in this embodiment the image instance segmentation model further includes a scoring module connected to the first mask image generation module and configured to score it in the test stage of the image instance segmentation model; the resulting score indicates the accuracy with which the first mask image generation module generates the background mask image, the foreground mask image, and the transition band mask image.
Correspondingly, the embodiment further provides the following steps:
(d1) in the testing stage of the image instance segmentation model, acquiring a testing background mask image, a testing foreground mask image and a testing transition band mask image which are generated by a first mask image generation module based on a testing sample image, and acquiring a characteristic image of the testing sample image and an expected background mask image, an expected foreground mask image and an expected transition band mask image which correspond to the testing sample image;
(d2) inputting the obtained test background mask image, test foreground mask image, test transition band mask image, characteristic image, expected background mask image, expected foreground mask image and expected transition band mask image to a scoring module, and scoring the first mask image generation module through the scoring module.
Specifically, in a testing stage of the image instance segmentation model, a test sample image is preset, the test sample image is input to a tested image feature extraction module, features of the test sample image are extracted through the image feature extraction module, and a first feature image and a second feature image are generated.
In this embodiment, a tested module is one that, after training is completed, has undergone and passed its corresponding test. Testing a trained model is common practice in the field of machine learning and is not described here.
Then, the accurate second feature image is input to the first mask image generation module under test, which processes it to obtain the test background mask image, test foreground mask image, and test transition band mask image generated by the first mask image generation module based on the test sample image. Separately, the feature image of the test sample image and the expected background mask image, expected foreground mask image, and expected transition band mask image corresponding to the test sample image, all determined manually in advance, are obtained.
Then, the obtained test background mask image, test foreground mask image, test transition band mask image, feature image, expected background mask image, expected foreground mask image and expected transition band mask image are input to a scoring module, and the first mask image generation module is scored through the scoring module. The score obtained by the scoring is used for indicating the accuracy of the first mask image generation module in generating the background mask image, the foreground mask image and the transition band mask image, the higher the score obtained by the scoring is, the higher the accuracy of the background mask image, the foreground mask image and the transition band mask image generated by the first mask image generation module is, and when the score exceeds a preset score, the first mask image generation module is determined to pass the test.
In this embodiment, the first mask image generation module may be composed of a plurality of convolution layers and a plurality of fully connected layers. Fig. 6 is a schematic structural diagram of an image instance segmentation model provided in another embodiment of the present disclosure; the model is obtained by improving on the maskrcnn model. As shown in fig. 6, the image instance segmentation model includes a mask iou module, which is the above-mentioned scoring module.
Fig. 7 is a schematic structural diagram of a mask iou module according to yet another embodiment of the present disclosure. As shown in fig. 7, the mask iou module is configured to perform regression between the test background mask image, test foreground mask image, and test transition band mask image and the expected background mask image, expected foreground mask image, and expected transition band mask image. In practical application, the feature image, the test background mask image, the test foreground mask image, the test transition band mask image, the expected background mask image, the expected foreground mask image, and the expected transition band mask image can be spliced through the RoiAlign structure and input into the mask iou module. At input time, max pooling is used to ensure that the input images are of equal size. The dimensions in fig. 7 are for illustrative purposes and are not intended to be limiting.
As shown in fig. 6 and fig. 7, the mask iou module may be composed of 4 convolutional layers and 3 fully-connected layers, for the 4 convolutional layers, the kernel sizes and the filter numbers of all convolutional layers are set to 3 and 256, respectively, for the 3 fully-connected layers, the outputs of the first two layers are set to 1024, and the output of the last layer is set to the number of classes.
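The stated layer configuration (four 3x3 convolutions with 256 filters, then fully connected layers with outputs of 1024, 1024, and the number of classes) could be sketched as follows; the input channel count, the spatial size, and where downsampling happens are assumptions.

```python
import torch.nn as nn

class MaskIoUHead(nn.Module):
    """Sketch of the mask iou module: 4 conv layers (kernel 3, 256
    filters) and 3 fully connected layers (1024, 1024, num_classes).
    Assumes a 14x14 input pooled to 7x7 by the last convolution."""
    def __init__(self, in_ch: int = 257, num_classes: int = 80):
        super().__init__()
        convs, ch = [], in_ch
        for i in range(4):
            stride = 2 if i == 3 else 1  # assumed downsampling on last conv
            convs += [nn.Conv2d(ch, 256, 3, stride=stride, padding=1),
                      nn.ReLU(inplace=True)]
            ch = 256
        self.convs = nn.Sequential(*convs)
        self.fcs = nn.Sequential(
            nn.Flatten(),
            nn.Linear(256 * 7 * 7, 1024), nn.ReLU(inplace=True),
            nn.Linear(1024, 1024), nn.ReLU(inplace=True),
            nn.Linear(1024, num_classes),
        )

    def forward(self, x):               # x: (N, in_ch, 14, 14) assumed
        return self.fcs(self.convs(x))  # per-class mask IoU scores
```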
When training the mask iou module, RPN proposals may be used as training samples, and the mask IoU is regressed using an L2 loss, with the loss weight set to a particular value such as 1. At inference time, the predicted mask IoU is multiplied by the classification score to obtain the final scoring value.
This embodiment is inspired by the AP metric: describing the quality of instance segmentation by the pixel-level IoU between the predicted mask and the reference mask better matches how instance segmentation is actually evaluated. Mask Scoring R-CNN proposes a network that directly learns this IoU, called MaskIoU; with this score, both the semantic category and the completeness of the instance mask can be assessed. Such a scoring criterion improves the quality of the finally obtained mask and raises the AP value.
For further details of the training and working process of the mask IOU module, reference may be made to the general literature; they are not elaborated here.
In other embodiments, the scoring module may calculate a first similarity between the test background mask image and the expected background mask image, a second similarity between the test foreground mask image and the expected foreground mask image, and a third similarity between the test transition band mask image and the expected transition band mask image, and score according to the first, second, and third similarities.
In this embodiment, the loss function of the image instance segmentation model is formed by combining the loss function of the first mask image generation module and the loss function of the second mask image generation module, that is, the loss function of the image instance segmentation model is formed by combining the first loss function and the second loss function.
Specifically, the loss function of the image instance segmentation model can be expressed as

$$L_P = \lambda\,\lVert \alpha_p - \alpha_g \rVert_1 + (1 - \gamma)\,\mathrm{score}$$

where $\lVert \alpha_p - \alpha_g \rVert_1$ represents the second loss function, $\mathrm{score}$ represents the first loss function, $\lambda$ and $\gamma$ represent the coefficients used to combine the two loss functions, and $L_P$ represents the loss function of the image instance segmentation model.
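A sketch of combining the two terms as in the formula above; the coefficient values and the reduction of the L1 term are assumptions.

```python
def model_loss(alpha_p, alpha_g, score, lam: float = 0.5, gamma: float = 0.5):
    """L_P = lambda * ||alpha_p - alpha_g||_1 + (1 - gamma) * score."""
    l1 = (alpha_p - alpha_g).abs().sum()
    return lam * l1 + (1.0 - gamma) * score
```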
In one embodiment, when training the image instance segmentation model, the first loss function and the second loss function may be trained in combination.
An image instance segmentation model in an embodiment of the present specification may be obtained by improving on maskrcnn, and the maskrcnn model is itself an improvement on faster rcnn. Fig. 8 is a schematic diagram of the structural modification from faster rcnn to maskrcnn provided in an embodiment of the present specification. As shown in fig. 8, Conv is the front-end feature extraction module and rpn is the region proposal network module, from which the relevant regions of interest are obtained; the ROIs are pixel-corrected by RoiAlign, and each ROI is then predicted using an FCN (fully convolutional network) framework to obtain the different instance classifications, yielding the final instance segmentation result of the image. The loss functions used in the maskrcnn network comprise classification error + detection error + segmentation error; the mask loss in maskrcnn generally adopts a cross-entropy loss function.
In summary, the embodiment of the present specification provides an image instance segmentation method that performs instance segmentation based on an image instance segmentation model. The model may be applied to a mobile terminal to implement multi-target instance segmentation, and may be implemented based on maskrcnn with an alpha (transition band) output. With the image instance segmentation method in this embodiment, the transition region between the background and the foreground of an instance object is considered during instance segmentation, the accuracy of the segmentation contour is improved, and the problem that the segmentation contour is not accurate enough when an image is instance-segmented by an existing image instance segmentation model is solved.
An embodiment of the present specification further provides an image instance segmentation apparatus. Fig. 9 is a schematic block diagram of the image instance segmentation apparatus provided in an embodiment of the present specification. As shown in fig. 9, the image instance segmentation apparatus includes:
the feature extraction unit 91 is configured to acquire an image to be segmented, input the image to be segmented into an image instance segmentation model, and extract features of the image to be segmented and generate a first feature image and a second feature image through an image feature extraction module in the image instance segmentation model;
a positioning determination unit 92, configured to determine, through a classification positioning module in the image instance segmentation model, an instance object in the image to be segmented according to the first feature image, and determine the category and the position of the instance object;
a first generating unit 93, configured to generate, by a first mask image generating module in the image instance segmentation model, a background mask image and a foreground mask image corresponding to the instance object and a transition band mask image between the background and the foreground of the instance object according to the second feature image;
a second generating unit 94, configured to generate, by a second mask image generating module in the image instance segmentation model, an instance segmentation mask image corresponding to the instance object according to the second feature image, the background mask image, the foreground mask image, and the transition band mask image; wherein the class and position of the instance object and the instance segmentation mask image are used for extracting the instance object in the image to be segmented.
Optionally, the first generating unit 93 is specifically configured to:
inputting the second characteristic image into the first mask image generation module through a RoiAlign structure, and performing convolution operation on the second characteristic image through the first mask image generation module to obtain the background mask image, the foreground mask image and the transition band mask image;
the first mask image generation module is obtained by training according to a predetermined first training sample image and an expected foreground mask image, an expected background mask image and an expected transition band mask image corresponding to the first training sample image.
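A minimal sketch of such a first mask image generation module is given below, assuming a small convolutional head over RoiAlign features with a per-pixel softmax so that the three maps partition each pixel; the layer sizes and the softmax normalization are illustrative assumptions, not the fixed structure of the module:

```python
import torch
import torch.nn as nn

class FirstMaskHead(nn.Module):
    # Convolutional head over RoiAlign features that predicts three maps:
    # background, foreground, and the transition band between them.
    def __init__(self, in_channels=256):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(in_channels, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(256, 3, kernel_size=1),  # bg / fg / transition band
        )

    def forward(self, roi_features):
        # roi_features: RoiAlign output, e.g. from torchvision.ops.roi_align
        maps = self.convs(roi_features).softmax(dim=1)  # three maps sum to 1 per pixel
        background, foreground, transition = maps[:, 0:1], maps[:, 1:2], maps[:, 2:3]
        return background, foreground, transition
```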
Optionally, the apparatus further comprises a first training unit for:
when the first mask image generation module is trained, acquiring a sample foreground mask image, a sample background mask image and a sample transition band mask image which are generated by the first mask image generation module based on the first training sample image, and acquiring an expected foreground mask image, an expected background mask image and an expected transition band mask image which correspond to the first training sample image;
calculating a training error value corresponding to the first mask image generation module according to the sample foreground mask image, the expected foreground mask image, the sample background mask image, the expected background mask image, the sample transition band mask image and the expected transition band mask image by using a preset first loss function;
and adjusting the calculation parameters in the first mask image generation module according to the training error value corresponding to the first mask image generation module so as to train the first mask image generation module.
Optionally, the first training unit is specifically configured to:
calculating a first distance value between the sample foreground mask image and the expected foreground mask image by using a preset first loss function;
calculating a second distance value between the sample background mask image and the expected background mask image by using a preset first loss function;
calculating a third distance value between the sample transition band mask image and the expected transition band mask image by using a preset first loss function;
determining a training error value corresponding to the first mask image generation module according to the first distance value, the second distance value and the third distance value;
wherein the first distance value, the second distance value, and the third distance value are all L2 distance values in the proximity algorithm KNN.
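Under this description, the first loss function might be sketched as follows; combining the three L2 distances by an unweighted sum is an assumption, since the text states only that the training error is determined from the three distance values:

```python
import torch

def first_loss(sample_fg, expected_fg, sample_bg, expected_bg,
               sample_tb, expected_tb):
    d1 = torch.dist(sample_fg, expected_fg, p=2)  # first distance value (L2)
    d2 = torch.dist(sample_bg, expected_bg, p=2)  # second distance value (L2)
    d3 = torch.dist(sample_tb, expected_tb, p=2)  # third distance value (L2)
    # Training error of the first mask image generation module; equal
    # weighting of the three distances is an assumption.
    return d1 + d2 + d3
```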
Optionally, the second generating unit 94 is specifically configured to:
cascading the second feature image, the background mask image, the foreground mask image and the transition band mask image through a cascading unit in the second mask image generation module;
restoring the size of the image obtained after the cascade connection to be consistent with the size of the image to be segmented through an image size restoring unit in the second mask image generating module;
generating, by an image generation unit in the second mask image generation module, the example segmentation mask image according to the image obtained after size restoration, the foreground mask image, and the transition band mask image;
the second mask image generation module is obtained by training according to a predetermined second training sample image and an example segmentation expected mask image corresponding to the second training sample image.
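A hedged sketch of the cascade unit, the image size restoring unit and the image generation unit described above; the channel counts and the single fusion convolution are assumptions, not the fixed structure of the module:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SecondMaskModule(nn.Module):
    def __init__(self, feat_channels=256):
        super().__init__()
        # Generation-unit input: the size-restored cascade (feature image plus
        # three masks) together with the foreground and transition-band masks.
        self.generate = nn.Conv2d(feat_channels + 3 + 2, 1,
                                  kernel_size=3, padding=1)

    def forward(self, feat, bg, fg, tb, out_size):
        def up(t):  # restore a tensor to the size of the image to be segmented
            return F.interpolate(t, size=out_size, mode="bilinear",
                                 align_corners=False)

        x = torch.cat([feat, bg, fg, tb], dim=1)    # cascade unit
        x = up(x)                                   # image size restoring unit
        x = torch.cat([x, up(fg), up(tb)], dim=1)   # image generation unit inputs
        return torch.sigmoid(self.generate(x))      # instance segmentation mask
```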
Optionally, the apparatus further comprises a second training unit for:
when the second mask image generation module is trained, acquiring an example segmentation sample mask image generated by the second mask image generation module based on the second training sample image, and acquiring an example segmentation expected mask image corresponding to the second training sample image;
calculating a training error value corresponding to the second mask image generation module according to the example segmentation sample mask image and the example segmentation expected mask image by using a preset second loss function;
and adjusting the calculation parameters in the second mask image generation module according to the training error value corresponding to the second mask image generation module so as to train the second mask image generation module.
Optionally, the second training unit is specifically configured to:
calculating a distance value between the example segmentation sample mask image and the example segmentation expected mask image by using a preset second loss function, wherein the distance value is an L1 distance value in a proximity algorithm KNN;
and determining a training error value corresponding to the second mask image generation module according to the distance value.
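A corresponding sketch of the second loss function, using the L1 distance described above:

```python
import torch

def second_loss(sample_mask, expected_mask):
    # Training error of the second mask image generation module: the L1
    # distance between the example segmentation sample mask image and the
    # example segmentation expected mask image.
    return torch.dist(sample_mask, expected_mask, p=1)
```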
Optionally, the image instance segmentation model further includes a scoring module, configured to score the first mask image generation module at a test stage of the image instance segmentation model, where a score obtained by scoring is used to indicate an accuracy of the first mask image generation module in generating a background mask image, a foreground mask image, and a transition band mask image; the apparatus further comprises a scoring unit for:
in a test stage of the image instance segmentation model, acquiring a test background mask image, a test foreground mask image and a test transition band mask image which are generated by the first mask image generation module based on a test sample image, and acquiring a feature image of the test sample image and an expected background mask image, an expected foreground mask image and an expected transition band mask image which correspond to the test sample image;
inputting the obtained test background mask image, test foreground mask image, test transition band mask image, second characteristic image, expected background mask image, expected foreground mask image and expected transition band mask image to the scoring module, and scoring the first mask image generation module through the scoring module.
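Purely as an assumed illustration of the scoring module (the specification does not give the scoring formula, nor how the feature image enters it), a score could be derived from the agreement between the test masks and the expected masks:

```python
import torch

def score_first_module(test_masks, expected_masks):
    # test_masks / expected_masks: (background, foreground, transition band)
    # mask tensors produced on a test sample image. The score here is one
    # minus the mean absolute error, an assumed accuracy measure.
    errors = [torch.mean(torch.abs(t - e))
              for t, e in zip(test_masks, expected_masks)]
    return 1.0 - torch.stack(errors).mean()
```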
In this embodiment, when an image to be segmented is segmented by the image instance segmentation model, the first mask image generation module can generate, according to the feature image corresponding to the image to be segmented, a background mask image and a foreground mask image corresponding to an instance object in the image to be segmented, as well as a transition band mask image between the background and the foreground of the instance object; the second mask image generation module can then generate, according to that feature image, the background mask image, the foreground mask image and the transition band mask image, an instance segmentation mask image corresponding to the instance object. The transition region between the background and the foreground of the instance object is therefore taken into account during instance segmentation, which improves the accuracy of the segmentation contour and solves the problem that the segmentation contour is not accurate enough when instance segmentation is performed by an existing image instance segmentation model.
The image instance segmentation apparatus provided in this embodiment of the present specification can implement each process in the foregoing image instance segmentation method embodiments and achieves the same functions and effects, which are not repeated here.
Further, an embodiment of the present specification provides another image instance segmentation apparatus. Fig. 10 is a schematic structural diagram of the image instance segmentation apparatus provided in this embodiment of the present specification; as shown in fig. 10, the apparatus includes: a memory 1001, a processor 1002, a bus 1003, and a communication interface 1004. The memory 1001, the processor 1002, and the communication interface 1004 communicate via the bus 1003. The communication interface 1004 may include input and output interfaces, including but not limited to a keyboard, a mouse, a display, a microphone, and the like.
In fig. 10, the memory 1001 has stored thereon computer-executable instructions executable on the processor 1002, and when executed by the processor 1002, the computer-executable instructions implement the following procedures:
acquiring an image to be segmented, inputting the image to be segmented into an image instance segmentation model, and extracting the features of the image to be segmented and generating a first feature image and a second feature image through an image feature extraction module in the image instance segmentation model;
determining an example object in the image to be segmented according to the first characteristic image and determining the category and the position of the example object through a classification positioning module in the image example segmentation model;
generating a background mask image and a foreground mask image corresponding to the example object and a transition band mask image between the background and the foreground of the example object according to the second characteristic image through a first mask image generation module in the image example segmentation model;
generating an example segmentation mask image corresponding to the example object according to the second feature image, the background mask image, the foreground mask image and the transition band mask image through a second mask image generation module in the image example segmentation model; wherein the class and position of the instance object and the instance segmentation mask image are used for extracting the instance object in the image to be segmented.
Optionally, when executed by the processor, the computer-executable instructions generate, by a first mask image generation module in the image instance segmentation model, a background mask image and a foreground mask image corresponding to the instance object and a transition band mask image between the background and the foreground of the instance object according to the second feature image, including:
inputting the second characteristic image into the first mask image generation module through a RoiAlign structure, and performing convolution operation on the second characteristic image through the first mask image generation module to obtain the background mask image, the foreground mask image and the transition band mask image;
the first mask image generation module is obtained by training according to a predetermined first training sample image and an expected foreground mask image, an expected background mask image and an expected transition band mask image corresponding to the first training sample image.
Optionally, the computer executable instructions, when executed by the processor, further comprise:
when the first mask image generation module is trained, acquiring a sample foreground mask image, a sample background mask image and a sample transition band mask image which are generated by the first mask image generation module based on the first training sample image, and acquiring an expected foreground mask image, an expected background mask image and an expected transition band mask image which correspond to the first training sample image;
calculating a training error value corresponding to the first mask image generation module according to the sample foreground mask image, the expected foreground mask image, the sample background mask image, the expected background mask image, the sample transition band mask image and the expected transition band mask image by using a preset first loss function;
and adjusting the calculation parameters in the first mask image generation module according to the training error value corresponding to the first mask image generation module so as to train the first mask image generation module.
Optionally, when executed by the processor, the computer-executable instructions utilize a preset first loss function to calculate training error values corresponding to the first mask image generation module according to the sample foreground mask image, the desired foreground mask image, the sample background mask image, the desired background mask image, the sample transition band mask image, and the desired transition band mask image, and include:
calculating a first distance value between the sample foreground mask image and the expected foreground mask image by using a preset first loss function;
calculating a second distance value between the sample background mask image and the expected background mask image by using a preset first loss function;
calculating a third distance value between the sample transition band mask image and the expected transition band mask image by using a preset first loss function;
determining a training error value corresponding to the first mask image generation module according to the first distance value, the second distance value and the third distance value;
wherein the first distance value, the second distance value, and the third distance value are all L2 distance values in the proximity algorithm KNN.
Optionally, when executed by the processor, the computer-executable instructions generate, by a second mask image generation module in the image instance segmentation model, an instance segmentation mask image corresponding to the instance object according to the second feature image, the background mask image, the foreground mask image, and the transition-band mask image, including:
cascading the second feature image, the background mask image, the foreground mask image and the transition band mask image through a cascading unit in the second mask image generation module;
restoring the size of the image obtained after the cascade connection to be consistent with the size of the image to be segmented through an image size restoring unit in the second mask image generating module;
generating, by an image generation unit in the second mask image generation module, the example segmentation mask image according to the image obtained after size restoration, the foreground mask image, and the transition band mask image;
the second mask image generation module is obtained by training according to a predetermined second training sample image and an example segmentation expected mask image corresponding to the second training sample image.
Optionally, the computer executable instructions, when executed by the processor, further comprise:
when the second mask image generation module is trained, acquiring an example segmentation sample mask image generated by the second mask image generation module based on the second training sample image, and acquiring an example segmentation expected mask image corresponding to the second training sample image;
calculating a training error value corresponding to the second mask image generation module according to the example segmentation sample mask image and the example segmentation expected mask image by using a preset second loss function;
and adjusting the calculation parameters in the second mask image generation module according to the training error value corresponding to the second mask image generation module so as to train the second mask image generation module.
Optionally, when executed by the processor, the computer-executable instructions utilize a preset second loss function to calculate a training error value corresponding to the second mask image generation module according to the example segmentation sample mask image and the example segmentation expected mask image, and include:
calculating a distance value between the example segmentation sample mask image and the example segmentation expected mask image by using a preset second loss function, wherein the distance value is an L1 distance value in a proximity algorithm KNN;
and determining a training error value corresponding to the second mask image generation module according to the distance value.
Optionally, when the computer-executable instructions are executed by the processor, the image instance segmentation model further includes a scoring module, configured to score the first mask image generation module in a test stage of the image instance segmentation model, where a score obtained by scoring is used to indicate an accuracy of the first mask image generation module in generating a background mask image, a foreground mask image, and a transition band mask image; further comprising:
in a test stage of the image instance segmentation model, acquiring a test background mask image, a test foreground mask image and a test transition band mask image which are generated by the first mask image generation module based on a test sample image, and acquiring a feature image of the test sample image and an expected background mask image, an expected foreground mask image and an expected transition band mask image which correspond to the test sample image;
inputting the obtained test background mask image, test foreground mask image, test transition band mask image, second characteristic image, expected background mask image, expected foreground mask image and expected transition band mask image to the scoring module, and scoring the first mask image generation module through the scoring module.
In this embodiment, when an image to be segmented is segmented by the image instance segmentation model, the first mask image generation module can generate, according to the feature image corresponding to the image to be segmented, a background mask image and a foreground mask image corresponding to an instance object in the image to be segmented, as well as a transition band mask image between the background and the foreground of the instance object; the second mask image generation module can then generate, according to that feature image, the background mask image, the foreground mask image and the transition band mask image, an instance segmentation mask image corresponding to the instance object. The transition region between the background and the foreground of the instance object is therefore taken into account during instance segmentation, which improves the accuracy of the segmentation contour and solves the problem that the segmentation contour is not accurate enough when instance segmentation is performed by an existing image instance segmentation model.
The image instance segmentation apparatus provided in this embodiment of the present specification can implement each process in the foregoing image instance segmentation method embodiments and achieves the same functions and effects, which are not repeated here.
Further, another embodiment of the present specification also provides a computer-readable storage medium for storing computer-executable instructions, which when executed by a processor implement the following process:
acquiring an image to be segmented, inputting the image to be segmented into an image instance segmentation model, and extracting the features of the image to be segmented and generating a first feature image and a second feature image through an image feature extraction module in the image instance segmentation model;
determining an example object in the image to be segmented according to the first characteristic image and determining the category and the position of the example object through a classification positioning module in the image example segmentation model;
generating a background mask image and a foreground mask image corresponding to the example object and a transition band mask image between the background and the foreground of the example object according to the second characteristic image through a first mask image generation module in the image example segmentation model;
generating an example segmentation mask image corresponding to the example object according to the second feature image, the background mask image, the foreground mask image and the transition band mask image through a second mask image generation module in the image example segmentation model; wherein the class and position of the instance object and the instance segmentation mask image are used for extracting the instance object in the image to be segmented.
Optionally, when executed by the processor, the computer-executable instructions generate, by a first mask image generation module in the image instance segmentation model, a background mask image and a foreground mask image corresponding to the instance object and a transition band mask image between the background and the foreground of the instance object according to the second feature image, including:
inputting the second characteristic image into the first mask image generation module through a RoiAlign structure, and performing convolution operation on the second characteristic image through the first mask image generation module to obtain the background mask image, the foreground mask image and the transition band mask image;
the first mask image generation module is obtained by training according to a predetermined first training sample image and an expected foreground mask image, an expected background mask image and an expected transition band mask image corresponding to the first training sample image.
Optionally, the computer executable instructions, when executed by the processor, further comprise:
when the first mask image generation module is trained, acquiring a sample foreground mask image, a sample background mask image and a sample transition band mask image which are generated by the first mask image generation module based on the first training sample image, and acquiring an expected foreground mask image, an expected background mask image and an expected transition band mask image which correspond to the first training sample image;
calculating a training error value corresponding to the first mask image generation module according to the sample foreground mask image, the expected foreground mask image, the sample background mask image, the expected background mask image, the sample transition band mask image and the expected transition band mask image by using a preset first loss function;
and adjusting the calculation parameters in the first mask image generation module according to the training error value corresponding to the first mask image generation module so as to train the first mask image generation module.
Optionally, when executed by the processor, the computer-executable instructions utilize a preset first loss function to calculate training error values corresponding to the first mask image generation module according to the sample foreground mask image, the desired foreground mask image, the sample background mask image, the desired background mask image, the sample transition band mask image, and the desired transition band mask image, and include:
calculating a first distance value between the sample foreground mask image and the expected foreground mask image by using a preset first loss function;
calculating a second distance value between the sample background mask image and the expected background mask image by using a preset first loss function;
calculating a third distance value between the sample transition band mask image and the expected transition band mask image by using a preset first loss function;
determining a training error value corresponding to the first mask image generation module according to the first distance value, the second distance value and the third distance value;
wherein the first distance value, the second distance value, and the third distance value are all L2 distance values in the proximity algorithm KNN.
Optionally, when executed by the processor, the computer-executable instructions generate, by a second mask image generation module in the image instance segmentation model, an instance segmentation mask image corresponding to the instance object according to the second feature image, the background mask image, the foreground mask image, and the transition-band mask image, including:
cascading the second feature image, the background mask image, the foreground mask image and the transition band mask image through a cascading unit in the second mask image generation module;
restoring the size of the image obtained after the cascade connection to be consistent with the size of the image to be segmented through an image size restoring unit in the second mask image generating module;
generating, by an image generation unit in the second mask image generation module, the example segmentation mask image according to the image obtained after size restoration, the foreground mask image, and the transition band mask image;
the second mask image generation module is obtained by training according to a predetermined second training sample image and an example segmentation expected mask image corresponding to the second training sample image.
Optionally, the computer executable instructions, when executed by the processor, further comprise:
when the second mask image generation module is trained, acquiring an example segmentation sample mask image generated by the second mask image generation module based on the second training sample image, and acquiring an example segmentation expected mask image corresponding to the second training sample image;
calculating a training error value corresponding to the second mask image generation module according to the example segmentation sample mask image and the example segmentation expected mask image by using a preset second loss function;
and adjusting the calculation parameters in the second mask image generation module according to the training error value corresponding to the second mask image generation module so as to train the second mask image generation module.
Optionally, when the computer-executable instructions are executed by the processor, calculating a training error value corresponding to the second mask image generation module according to the example segmentation sample mask image and the example segmentation expected mask image by using a preset second loss function includes:
calculating a distance value between the example segmentation sample mask image and the example segmentation expected mask image by using a preset second loss function, wherein the distance value is an L1 distance value in a proximity algorithm KNN;
and determining a training error value corresponding to the second mask image generation module according to the distance value.
Optionally, when the computer-executable instructions are executed by the processor, the image instance segmentation model further includes a scoring module, configured to score the first mask image generation module in a test stage of the image instance segmentation model, where a score obtained by scoring is used to indicate an accuracy of the first mask image generation module in generating the background mask image, the foreground mask image and the transition-band mask image; further comprising:
in a test stage of the image instance segmentation model, acquiring a test background mask image, a test foreground mask image and a test transition band mask image which are generated by the first mask image generation module based on a test sample image, and acquiring a feature image of the test sample image and an expected background mask image, an expected foreground mask image and an expected transition band mask image which correspond to the test sample image;
inputting the obtained test background mask image, test foreground mask image, test transition band mask image, second characteristic image, expected background mask image, expected foreground mask image and expected transition band mask image to the scoring module, and scoring the first mask image generation module through the scoring module.
The computer-readable storage medium includes a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
In this embodiment, when an image to be segmented is segmented by the image instance segmentation model, the first mask image generation module can generate, according to the feature image corresponding to the image to be segmented, a background mask image and a foreground mask image corresponding to an instance object in the image to be segmented, as well as a transition band mask image between the background and the foreground of the instance object; the second mask image generation module can then generate, according to that feature image, the background mask image, the foreground mask image and the transition band mask image, an instance segmentation mask image corresponding to the instance object. The transition region between the background and the foreground of the instance object is therefore taken into account during instance segmentation, which improves the accuracy of the segmentation contour and solves the problem that the segmentation contour is not accurate enough when instance segmentation is performed by an existing image instance segmentation model.
The image instance segmentation apparatus provided in this embodiment of the present specification can implement each process in the foregoing image instance segmentation method embodiments and achieves the same functions and effects, which are not repeated here.
The above description presents only examples of the present specification and is not intended to limit this document. Various modifications and changes to the embodiments described herein will be apparent to those skilled in the art. Any modifications, equivalents, improvements and the like made within the spirit and principle of this disclosure are intended to be included within the scope of the claims of this document.

Claims (14)

1. An image instance segmentation method, comprising:
acquiring an image to be segmented, inputting the image to be segmented into an image instance segmentation model, and extracting the features of the image to be segmented and generating a first feature image and a second feature image through an image feature extraction module in the image instance segmentation model;
determining an example object in the image to be segmented according to the first characteristic image and determining the category and the position of the example object through a classification positioning module in the image example segmentation model;
generating a background mask image and a foreground mask image corresponding to the example object and a transition band mask image between the background and the foreground of the example object according to the second characteristic image through a first mask image generation module in the image example segmentation model;
generating an example segmentation mask image corresponding to the example object according to the second feature image, the background mask image, the foreground mask image and the transition band mask image through a second mask image generation module in the image example segmentation model; wherein the class and position of the instance object and the instance segmentation mask image are used for extracting the instance object in the image to be segmented;
when the instance segmentation mask image corresponding to the instance object is generated, the second feature image, the background mask image, the foreground mask image and the transition band mask image are cascaded through a cascade unit in the second mask image generation module;
restoring the size of the image obtained after the cascade connection to be consistent with the size of the image to be segmented through an image size restoring unit in the second mask image generating module;
and generating the example segmentation mask image according to the image obtained after size recovery, the foreground mask image and the transition band mask image by an image generation unit in the second mask image generation module.
2. The method according to claim 1, wherein generating, by a first mask image generation module in the image instance segmentation model, a background mask image and a foreground mask image corresponding to the instance object and a transition band mask image between the background and the foreground of the instance object according to the second feature image comprises:
inputting the second characteristic image into the first mask image generation module through a RoiAlign structure, and performing convolution operation on the second characteristic image through the first mask image generation module to obtain the background mask image, the foreground mask image and the transition band mask image;
the first mask image generation module is obtained by training according to a predetermined first training sample image and an expected foreground mask image, an expected background mask image and an expected transition band mask image corresponding to the first training sample image.
3. The method of claim 2, further comprising:
when the first mask image generation module is trained, acquiring a sample foreground mask image, a sample background mask image and a sample transition band mask image which are generated by the first mask image generation module based on the first training sample image, and acquiring an expected foreground mask image, an expected background mask image and an expected transition band mask image which correspond to the first training sample image;
calculating a training error value corresponding to the first mask image generation module according to the sample foreground mask image, the expected foreground mask image, the sample background mask image, the expected background mask image, the sample transition band mask image and the expected transition band mask image by using a preset first loss function;
and adjusting the calculation parameters in the first mask image generation module according to the training error value corresponding to the first mask image generation module so as to train the first mask image generation module.
4. The method according to claim 3, wherein calculating training error values corresponding to the first mask image generation module according to the sample foreground mask image, the desired foreground mask image, the sample background mask image, the desired background mask image, the sample transition band mask image, and the desired transition band mask image by using a preset first loss function comprises:
calculating a first distance value between the sample foreground mask image and the expected foreground mask image by using a preset first loss function;
calculating a second distance value between the sample background mask image and the expected background mask image by using a preset first loss function;
calculating a third distance value between the sample transition band mask image and the expected transition band mask image by using a preset first loss function;
determining a training error value corresponding to the first mask image generation module according to the first distance value, the second distance value and the third distance value;
wherein the first distance value, the second distance value, and the third distance value are all L2 distance values in the proximity algorithm KNN.
5. The method according to claim 1, wherein the second mask image generation module is obtained by training according to a predetermined second training sample image and an example segmentation expected mask image corresponding to the second training sample image.
6. The method of claim 5, further comprising:
when the second mask image generation module is trained, acquiring an example segmentation sample mask image generated by the second mask image generation module based on the second training sample image, and acquiring an example segmentation expected mask image corresponding to the second training sample image;
calculating a training error value corresponding to the second mask image generation module according to the example segmentation sample mask image and the example segmentation expected mask image by using a preset second loss function;
and adjusting the calculation parameters in the second mask image generation module according to the training error value corresponding to the second mask image generation module so as to train the second mask image generation module.
7. The method of claim 6, wherein calculating a training error value corresponding to the second mask image generation module according to the example segmentation sample mask image and the example segmentation expected mask image by using a preset second loss function comprises:
calculating a distance value between the example segmentation sample mask image and the example segmentation expected mask image by using a preset second loss function, wherein the distance value is an L1 distance value in a proximity algorithm KNN;
and determining a training error value corresponding to the second mask image generation module according to the distance value.
8. The method according to any one of claims 1 to 7, wherein the image instance segmentation model further comprises a scoring module, which is used for scoring the first mask image generation module in a test stage of the image instance segmentation model, and a score obtained by scoring is used for indicating an accuracy of the first mask image generation module in generating a background mask image, a foreground mask image and a transition band mask image; the method further comprises the following steps:
in a test stage of the image instance segmentation model, acquiring a test background mask image, a test foreground mask image and a test transition band mask image which are generated by the first mask image generation module based on a test sample image, and acquiring a feature image of the test sample image and an expected background mask image, an expected foreground mask image and an expected transition band mask image which correspond to the test sample image;
inputting the obtained test background mask image, test foreground mask image, test transition band mask image, second characteristic image, expected background mask image, expected foreground mask image and expected transition band mask image to the scoring module, and scoring the first mask image generation module through the scoring module.
9. An image instance segmentation apparatus, comprising:
the image segmentation device comprises a feature extraction unit, a feature extraction unit and an image segmentation unit, wherein the feature extraction unit is used for acquiring an image to be segmented, inputting the image to be segmented into an image instance segmentation model, and extracting features of the image to be segmented through an image feature extraction module in the image instance segmentation model to generate a first feature image and a second feature image;
the positioning determination unit is used for determining an example object in the image to be segmented according to the first characteristic image and determining the category and the position of the example object through a classification positioning module in the image example segmentation model;
a first generating unit, configured to generate, by a first mask image generating module in the image instance segmentation model, a background mask image and a foreground mask image corresponding to the instance object and a transition band mask image between the background and the foreground of the instance object according to the second feature image;
a second generating unit, configured to generate, by a second mask image generating module in the image instance segmentation model, an instance segmentation mask image corresponding to the instance object according to the second feature image, the background mask image, the foreground mask image, and the transition band mask image; wherein the class and position of the instance object and the instance segmentation mask image are used for extracting the instance object in the image to be segmented;
the second generating unit is specifically configured to: cascading the second feature image, the background mask image, the foreground mask image and the transition band mask image through a cascading unit in the second mask image generation module;
restoring the size of the image obtained after the cascade connection to be consistent with the size of the image to be segmented through an image size restoring unit in the second mask image generating module;
and generating the example segmentation mask image according to the image obtained after size recovery, the foreground mask image and the transition band mask image by an image generation unit in the second mask image generation module.
10. The apparatus according to claim 9, wherein the first generating unit is specifically configured to:
inputting the second characteristic image into the first mask image generation module through a RoiAlign structure, and performing convolution operation on the second characteristic image through the first mask image generation module to obtain the background mask image, the foreground mask image and the transition band mask image;
the first mask image generation module is obtained by training according to a predetermined first training sample image and an expected foreground mask image, an expected background mask image and an expected transition band mask image corresponding to the first training sample image.
11. The apparatus of claim 10, further comprising a first training unit to:
when the first mask image generation module is trained, acquiring a sample foreground mask image, a sample background mask image and a sample transition band mask image which are generated by the first mask image generation module based on the first training sample image, and acquiring an expected foreground mask image, an expected background mask image and an expected transition band mask image which correspond to the first training sample image;
calculating a training error value corresponding to the first mask image generation module according to the sample foreground mask image, the expected foreground mask image, the sample background mask image, the expected background mask image, the sample transition band mask image and the expected transition band mask image by using a preset first loss function;
and adjusting the calculation parameters in the first mask image generation module according to the training error value corresponding to the first mask image generation module so as to train the first mask image generation module.
12. The apparatus of claim 11, wherein the first training unit is specifically configured to:
calculating a first distance value between the sample foreground mask image and the expected foreground mask image by using a preset first loss function;
calculating a second distance value between the sample background mask image and the expected background mask image by using a preset first loss function;
calculating a third distance value between the sample transition band mask image and the expected transition band mask image by using a preset first loss function;
determining a training error value corresponding to the first mask image generation module according to the first distance value, the second distance value and the third distance value;
wherein the first distance value, the second distance value, and the third distance value are all L2 distance values in the proximity algorithm KNN.
13. The apparatus of claim 9, wherein the second mask image generation module is obtained by training according to a predetermined second training sample image and an example segmentation expected mask image corresponding to the second training sample image.
14. The apparatus of claim 13, further comprising a second training unit to:
when the second mask image generation module is trained, acquiring an example segmentation sample mask image generated by the second mask image generation module based on the second training sample image, and acquiring an example segmentation expected mask image corresponding to the second training sample image;
calculating a training error value corresponding to the second mask image generation module according to the example segmentation sample mask image and the example segmentation expected mask image by using a preset second loss function;
and adjusting the calculation parameters in the second mask image generation module according to the training error value corresponding to the second mask image generation module so as to train the second mask image generation module.
CN201910932796.7A 2019-09-29 2019-09-29 Image instance segmentation method and device Active CN110705558B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910932796.7A CN110705558B (en) 2019-09-29 2019-09-29 Image instance segmentation method and device


Publications (2)

Publication Number Publication Date
CN110705558A CN110705558A (en) 2020-01-17
CN110705558B true CN110705558B (en) 2022-03-08





Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109325954A (en) * 2018-09-18 2019-02-12 北京旷视科技有限公司 Image partition method, device and electronic equipment
CN110008832A (en) * 2019-02-27 2019-07-12 西安电子科技大学 Based on deep learning character image automatic division method, information data processing terminal
CN109949316A (en) * 2019-03-01 2019-06-28 东南大学 A kind of Weakly supervised example dividing method of grid equipment image based on RGB-T fusion
CN109949317A (en) * 2019-03-06 2019-06-28 东南大学 Based on the semi-supervised image instance dividing method for gradually fighting study
CN110008915A (en) * 2019-04-11 2019-07-12 电子科技大学 The system and method for dense human body attitude estimation is carried out based on mask-RCNN

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Tsung-Yi Lin et al.; "Feature Pyramid Networks for Object Detection"; 2017 IEEE Conference on Computer Vision and Pattern Recognition; 2017-11-09; pp. 936-944 *
Xiangyi Zhang et al.; "Mask R-CNN with Feature Pyramid Attention for Instance Segmentation"; 2018 14th IEEE International Conference on Signal Processing (ICSP); 2019-02-28; pp. 1194-1197 *
Kaiming He et al.; "Mask R-CNN"; 2017 IEEE International Conference on Computer Vision; 2017-12-25; pp. 2980-2988 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant