WO2021120695A1 - Image segmentation method and apparatus, electronic device and readable storage medium - Google Patents


Info

Publication number
WO2021120695A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
pixel
background
feature map
segmented
Prior art date
Application number
PCT/CN2020/113134
Other languages
French (fr)
Chinese (zh)
Inventor
李林泽 (Li Linze)
邹晓敏 (Zou Xiaomin)
Original Assignee
北京迈格威科技有限公司 (Beijing Megvii Technology Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 北京迈格威科技有限公司 (Beijing Megvii Technology Co., Ltd.)
Publication of WO2021120695A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/12 Fingerprints or palmprints
    • G06V40/1347 Preprocessing; Feature extraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/136 Segmentation; Edge detection involving thresholding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/194 Segmentation; Edge detection involving foreground-background segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10004 Still image; Photographic image

Definitions

  • the embodiments of the present application relate to the field of computer vision technology, and in particular, to an image segmentation method, device, electronic device, and readable storage medium.
  • human biological characteristics include, but are not limited to: fingerprints, palm prints, hand shapes, human faces, iris, auricles, and so on.
  • Taking fingerprints as an example of a human biological feature, the original fingerprint image must first be acquired by a fingerprint acquisition device, the fingerprint area must then be segmented from the original image, and finally key point extraction, fingerprint alignment, comparison, recognition and other processing are performed on the segmented fingerprint area.
  • The success and accuracy of fingerprint region segmentation directly affect these subsequent processes, and in turn the final unlocking, authentication, criminal investigation or target tracking result.
  • The embodiments of the present application provide an image segmentation method, apparatus, electronic device, and storage medium, aiming to improve the accuracy of image segmentation.
  • A first aspect of the embodiments of the present application provides an image segmentation method, the method including:
  • performing feature extraction on an image to be segmented to obtain image features of the image to be segmented;
  • performing an up-sampling operation on the image features to obtain a foreground feature map and a background feature map, where each pixel in the foreground feature map corresponds one-to-one to a pixel in the image to be segmented, the pixel value of each pixel in the foreground feature map represents the possibility that the corresponding pixel in the image to be segmented belongs to the target area, each pixel in the background feature map corresponds one-to-one to a pixel in the image to be segmented, and the pixel value of each pixel in the background feature map represents the possibility that the corresponding pixel belongs to the background area;
  • normalizing the pixel values of the foreground feature map and the background feature map to obtain a target area mask image and a background area mask image, where the pixel value of each pixel in the target area mask image represents the probability that the corresponding pixel in the image to be segmented belongs to the target area, and the pixel value of each pixel in the background area mask image represents the probability that the corresponding pixel belongs to the background area;
  • segmenting the image to be segmented according to the target area mask image and the background area mask image.
  • a second aspect of the embodiments of the present application provides an image segmentation device, the device including:
  • the feature extraction module is used to perform feature extraction on the image to be segmented to obtain the image features of the image to be segmented;
  • the up-sampling module is used to perform an up-sampling operation on the image features to obtain a foreground feature map and a background feature map, where each pixel in the foreground feature map corresponds one-to-one to a pixel in the image to be segmented, the pixel value of each pixel in the foreground feature map represents the possibility that the corresponding pixel in the image to be segmented belongs to the target area, each pixel in the background feature map corresponds one-to-one to a pixel in the image to be segmented, and the pixel value of each pixel in the background feature map represents the possibility that the corresponding pixel belongs to the background area;
  • the normalization module is used to normalize the pixel value of each pixel of the foreground feature map and the pixel value of each pixel of the background feature map to obtain a target area mask image and a background area mask image, where the pixel value of each pixel in the target area mask image represents the probability that the corresponding pixel in the image to be segmented belongs to the target area, and the pixel value of each pixel in the background area mask image represents the probability that the corresponding pixel belongs to the background area;
  • the segmentation module is configured to segment the image to be segmented according to the target area mask image and the background area mask image.
  • A third aspect of the embodiments of the present application provides a readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the image segmentation method described in the first aspect are implemented.
  • A fourth aspect of the embodiments of the present application provides an electronic device including a memory, a processor, and a computer program stored in the memory and runnable on the processor; when the processor executes the computer program, the steps of the image segmentation method described in the first aspect are implemented.
  • A fifth aspect of the embodiments of the present application provides a computer program including computer-readable code which, when run on a computing processing device, causes the computing processing device to execute the image segmentation method described above.
  • With the image segmentation method provided in this application, feature extraction is performed on the image to be segmented to obtain its image features; the image features are then up-sampled to obtain a foreground feature map and a background feature map; the pixel values of the foreground feature map and the background feature map are normalized to obtain a target area mask image and a background area mask image; finally, the image to be segmented is segmented according to the two mask images.
  • Each pixel of the foreground feature map corresponds to a pixel of the image to be segmented, and its pixel value represents the possibility that the corresponding pixel belongs to the target area; likewise, each pixel of the background feature map corresponds to a pixel of the image to be segmented, and its pixel value represents the possibility that the corresponding pixel belongs to the background area.
  • After normalization, the pixel value of each pixel in the target area mask image represents the probability that the corresponding pixel in the image to be segmented belongs to the target area, and the pixel value of each pixel in the background area mask image represents the probability that the corresponding pixel belongs to the background area.
  • The image to be segmented is then segmented based on the probability that each pixel belongs to the target area and/or the background area, yielding a more accurate segmentation result.
  • FIG. 1 is a flowchart of an image segmentation method proposed in an embodiment of the present application.
  • FIG. 2 is a schematic diagram of an image segmentation method proposed in an embodiment of the present application.
  • FIG. 3 is a schematic diagram of fingerprint image area division proposed by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of network structures corresponding to several upsampling methods proposed in an embodiment of the present application.
  • FIG. 5 is a schematic diagram of normalization proposed by an embodiment of the present application.
  • FIG. 6 is a segmentation effect diagram of the image segmentation method proposed by an embodiment of the present application.
  • FIG. 7 is a flowchart of an image segmentation method proposed by another embodiment of the present application.
  • FIG. 8 is a schematic diagram of an image segmentation method proposed by another embodiment of the present application.
  • FIG. 9 is a schematic diagram of feature map fusion proposed by an embodiment of the present application.
  • FIG. 10 is a flowchart of model training proposed in an embodiment of the present application.
  • FIG. 11 is a schematic diagram of an image segmentation apparatus proposed in an embodiment of the present application.
  • In the related art, the contrast and directional consistency of the fingerprint area are usually computed from the original image and then combined with hand-crafted rules to segment the fingerprint area from the original image.
  • FIG. 1 is a flowchart of an image segmentation method proposed in an embodiment of the present application. As shown in Figure 1, the method includes the following steps:
  • Step S11 Perform feature extraction on the image to be segmented to obtain image features of the image to be segmented.
  • the image to be segmented is an image including biological characteristics of the human body.
  • the image to be segmented may be: a fingerprint image, a palmprint image, a face image, or an iris image, etc.
  • For example, a fingerprint identification system performs feature extraction on a fingerprint image when executing step S11, and a face recognition system performs feature extraction on a face image when executing step S11.
  • A feature extraction module CNN can be used to perform feature extraction on the image to be segmented.
  • FIG. 2 is a schematic diagram of an image segmentation method proposed by an embodiment of the present application. As shown in FIG. 2, taking a fingerprint image as the image to be segmented, the N×H×W fingerprint image is input to the feature extraction module CNN to obtain an N′×H′×W′ feature map, that is, the image features.
  • N represents the number of image channels of the fingerprint image
  • H represents the height of the fingerprint image
  • W represents the width of the fingerprint image
  • The specific structure of the feature extraction module CNN can be a network backbone such as VGG, ResNet (Residual Network), or ShuffleNet.
  • Before the feature extraction module CNN is used to extract features from the image to be segmented, it can be established in advance and trained with sample images; the trained module is then used to extract features from the image to be segmented to obtain the corresponding image features. The specific training method is described below.
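As an illustration of the shape bookkeeping in this step, the sketch below uses a toy single-layer strided convolution as a stand-in for the VGG/ResNet/ShuffleNet backbone named above (an assumption for illustration only, not the patent's actual network), showing how an N×H×W input becomes an N′×H′×W′ feature map:

```python
import numpy as np

def conv2d(x, w, stride=2):
    # x: (C_in, H, W), w: (C_out, C_in, k, k) -> ReLU(conv(x)): (C_out, H', W')
    c_out, c_in, k, _ = w.shape
    _, h, wd = x.shape
    h_out = (h - k) // stride + 1
    w_out = (wd - k) // stride + 1
    y = np.zeros((c_out, h_out, w_out))
    for o in range(c_out):
        for i in range(h_out):
            for j in range(w_out):
                patch = x[:, i*stride:i*stride+k, j*stride:j*stride+k]
                y[o, i, j] = np.sum(patch * w[o])
    return np.maximum(y, 0.0)  # ReLU non-linearity

rng = np.random.default_rng(0)
img = rng.standard_normal((1, 32, 32))       # N x H x W input, N = 1 channel
weights = rng.standard_normal((8, 1, 3, 3))  # 8 output channels, 3x3 kernels
feat = conv2d(img, weights, stride=2)
print(feat.shape)  # (8, 15, 15): an N' x H' x W' feature map
```

A real backbone stacks many such layers, but the N×H×W to N′×H′×W′ shape transformation is the same idea.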
  • Step S12 Perform an up-sampling operation on the image feature to obtain a foreground feature map and a background feature map.
  • Each pixel in the foreground feature map corresponds one-to-one to a pixel in the image to be segmented, and the pixel value of each pixel in the foreground feature map represents the possibility that the corresponding pixel belongs to the target area; each pixel in the background feature map corresponds one-to-one to a pixel in the image to be segmented, and the pixel value of each pixel in the background feature map represents the possibility that the corresponding pixel belongs to the background area.
  • In step S12, the image features are up-sampled to obtain a foreground feature map and a background feature map whose resolution matches that of the image to be segmented, so that each pixel of the foreground feature map, and likewise each pixel of the background feature map, can be placed in one-to-one correspondence with the pixels of the image to be segmented.
  • FIG. 3 is a schematic diagram of fingerprint image area division proposed by an embodiment of the present application. As shown in Figure 3, taking a fingerprint image as an example, the area within the dashed frame is the effective area of the fingerprint, that is, the target area, and the area outside the dashed frame is the background area.
  • the foreground feature map includes multiple pixels, and the multiple pixels correspond to their respective pixel values. For each pixel in the foreground feature map, the larger the pixel value of the pixel, the more likely the pixel at the same position in the image to be segmented belongs to the target area.
  • the background feature map includes multiple pixels, and the multiple pixels correspond to respective pixel values. For each pixel in the background feature map, the larger the pixel value of the pixel, the more likely the pixel at the same position in the image to be segmented belongs to the background area.
  • an image segmentation module may be used to up-sample the image features.
  • Taking a fingerprint image as the image to be segmented, the N′×H′×W′ feature map is input into the image segmentation module to obtain a 2×H×W segmentation result, namely the foreground feature map and the background feature map, both of resolution H×W.
  • the up-sampling methods adopted by the image segmentation module include but are not limited to: deconvolution up-sampling, picture scaling up-sampling, and sub-pixel convolution up-sampling.
  • the up-sampling method of the image segmentation module determines the network structure of the image segmentation module.
  • FIG. 4 is a schematic diagram of the network structure corresponding to each of several upsampling methods proposed in an embodiment of the present application.
  • The network structure for deconvolution upsampling includes a deconvolution network; the network structure for image-scaling upsampling includes a convolutional neural network CNN and an image scaling module; the network structure for sub-pixel convolution upsampling includes an optional convolutional neural network CNN and a sub-pixel convolution module.
  • Before the image segmentation module is used to up-sample the image features, it can be established in advance and trained with sample images; the trained image segmentation module is then used to up-sample the image features to obtain the corresponding foreground feature map and background feature map.
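Two of the up-sampling methods named above, picture-scaling and sub-pixel convolution, can be sketched in NumPy; the channel rearrangement below follows the standard pixel-shuffle convention and is an illustrative assumption, not code from the patent:

```python
import numpy as np

def nearest_upsample(x, scale):
    # picture-scaling upsampling: repeat each pixel scale x scale times
    return x.repeat(scale, axis=-2).repeat(scale, axis=-1)

def pixel_shuffle(x, scale):
    # sub-pixel convolution rearrangement: (C*r^2, H, W) -> (C, H*r, W*r)
    c2, h, w = x.shape
    c = c2 // (scale * scale)
    x = x.reshape(c, scale, scale, h, w)
    x = x.transpose(0, 3, 1, 4, 2)  # (C, H, r, W, r)
    return x.reshape(c, h * scale, w * scale)

feat = np.arange(16.0).reshape(4, 2, 2)  # 4 channels = 1 channel x 2^2
up = pixel_shuffle(feat, 2)
print(up.shape)                          # (1, 4, 4)
print(nearest_upsample(feat, 2).shape)   # (4, 4, 4)
```

In sub-pixel convolution, a preceding convolution produces the extra r² channels that the rearrangement then folds into spatial resolution; deconvolution upsampling instead learns a transposed convolution kernel.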
  • Step S13 Normalize the pixel value of each pixel of the foreground feature map and the pixel value of each pixel of the background feature map to obtain a target area mask image and a background area mask image, where the pixel value of each pixel in the target area mask image represents the probability that the corresponding pixel in the image to be segmented belongs to the target area, and the pixel value of each pixel in the background area mask image represents the probability that the corresponding pixel belongs to the background area.
  • FIG. 5 is a normalized schematic diagram proposed by an embodiment of the present application.
  • the pixel value of the pixel point A1 in the foreground feature map is 7, and the pixel value of the pixel point A2 at the same position in the background feature map is 2.
  • the softmax function can be used to normalize the pixel value of the pixel A1 and the pixel value of the pixel A2.
  • In the normalized target area mask image, the pixel value of pixel A1′ is equal to e^7/(e^7+e^2), which is approximately 0.99; in the normalized background area mask image, the pixel value of pixel A2′ is equal to e^2/(e^7+e^2), which is approximately 0.01. Thus the pixel A at the same position in the image to be segmented is very likely to belong to the target area and almost certainly does not belong to the background area.
  • the pixel value of the pixel point B1 in the foreground feature map is 3, and the pixel value of the pixel point B2 at the same position in the background feature map is 5.
  • the softmax function can be used to normalize the pixel value of the pixel point B1 and the pixel value of the pixel point B2.
  • In the normalized target area mask image, the pixel value of pixel B1′ is equal to e^3/(e^3+e^5), which is approximately 0.12; in the normalized background area mask image, the pixel value of pixel B2′ is equal to e^5/(e^3+e^5), which is approximately 0.88. Thus the probability that pixel B at the same position in the image to be segmented belongs to the target area is 0.12, and the probability that it belongs to the background area is 0.88.
  • Step S14 segment the image to be segmented according to the target area mask image and the background area mask image.
  • The target area can be segmented from the image to be segmented according to the pixel values in the target area mask image, and the background area can be segmented from the image to be segmented according to the pixel values in the background area mask image.
  • step S14 may specifically include the following sub-steps:
  • Sub-step S14-1: For each pixel in the target area mask image, if the pixel value of the pixel is greater than a first preset threshold, the corresponding pixel in the image to be segmented is determined to belong to the target area, and the pixels of the target area are segmented from the image to be segmented;
  • and/or sub-step S14-2: For each pixel in the background area mask image, if the pixel value of the pixel is greater than a second preset threshold, the corresponding pixel in the image to be segmented is determined to belong to the background area, and the pixels of the background area are segmented from the image to be segmented.
  • For example, the target area mask image can be binarized and each of its pixel values multiplied by the pixel value of the corresponding pixel in the image to be segmented: the product for pixels in the background area is equal to 0, and the product for pixels in the target area is equal to the original pixel value, thereby segmenting the target area from the image to be segmented.
  • Similarly, the background area mask image can be updated so that a pixel with value 1 corresponds to the background area of the image to be segmented and a pixel with value 0 corresponds to the target area; the pixel value of each pixel in the updated background area mask image is then multiplied, one-to-one, by the pixel value of the corresponding pixel in the image to be segmented. The product for pixels in the background area is equal to the original pixel value, and the product for pixels in the target area is equal to 0, thereby segmenting the background area from the image to be segmented.
  • Each pixel of the foreground feature map corresponds to a pixel of the image to be segmented, and its pixel value represents the possibility that the corresponding pixel belongs to the target area; each pixel of the background feature map corresponds to a pixel of the image to be segmented, and its pixel value represents the possibility that the corresponding pixel belongs to the background area.
  • After normalization, the pixel value of each pixel in the target area mask image represents the probability that the corresponding pixel belongs to the target area, and the pixel value of each pixel in the background area mask image represents the probability that the corresponding pixel belongs to the background area.
  • The image to be segmented is then segmented based on the probability that each pixel belongs to the target area and/or the background area, so as to obtain a more accurate segmentation result and improve the image segmentation recall rate.
  • FIG. 6 is a segmentation effect diagram of the image segmentation method proposed by an embodiment of the present application.
  • In FIG. 6, the top three fingerprint images are the images to be segmented, and the bottom three are the corresponding segmentation results; the dashed frame in each segmentation result marks the segmented fingerprint area.
  • As shown, the fingerprint region is accurately segmented from each image.
  • FIG. 7 is a flowchart of an image segmentation method proposed by another embodiment of the present application. As shown in Figure 7, the method includes the following steps:
  • Step S71 Perform feature extraction operations of multiple scales on the image to be segmented to obtain multiple image features of different scales of the image to be segmented.
  • Step S72 Perform an up-sampling operation on the multiple image features of different scales, respectively, to obtain a foreground feature map and a background feature map corresponding to each of the multiple image features.
  • Step S71 is a specific implementation of step S11 above, and step S72 is a specific implementation of step S12 above.
  • For example, the image to be segmented can be input to multiple convolutional neural networks CNN, each containing a different number of convolutional layers. The more convolutional layers a CNN has, the deeper the scale of its feature extraction operation on the image to be segmented, and the deeper the scale of the image features it outputs.
  • FIG. 8 is a schematic diagram of an image segmentation method proposed by another embodiment of the present application.
  • As shown in FIG. 8, the convolutional neural network CNN1 performs feature extraction on the N×H×W image to be segmented to obtain the image feature N′×H′×W′.
  • The image feature N′×H′×W′ can be input to image segmentation module 1 for an up-sampling operation, obtaining a 2×H×W segmentation result, that is, a foreground feature map and a background feature map.
  • The image feature N′×H′×W′ can also be input into the convolutional neural network CNN2, which performs further feature extraction to obtain the image feature N″×H″×W″.
  • The image feature N″×H″×W″ is input into image segmentation module 2, which performs an up-sampling operation to obtain a 2×H×W segmentation result, that is, a foreground feature map and a background feature map.
  • The image feature N″×H″×W″ can likewise be input into the convolutional neural network CNN3, and so on.
  • In this way, multiple image features of the image to be segmented at different scales are obtained, and an up-sampling operation is performed on each of them to obtain the foreground feature map and background feature map corresponding to each image feature; the feature depths of N′×H′×W′, N″×H″×W″, N‴×H‴×W‴, and so on increase in order.
  • By obtaining multiple image features of the image to be segmented at different scales and performing the subsequent up-sampling, fusion, and normalization on features of each scale, this application increases the richness of the image features, which further improves the accuracy of image segmentation.
  • When the above steps S71 and S72 are executed using the model shown in FIG. 8, the structure of the model is richer and it has a stronger learning ability during training, which is also conducive to further improving the accuracy of image segmentation.
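The cascaded multi-scale extraction of steps S71/S72 can be illustrated with a toy stand-in in which each CNN stage is replaced by 2×2 average pooling (an assumption for illustration only; the patent's CNN1/CNN2/CNN3 are learned networks):

```python
import numpy as np

def downsample(x, factor=2):
    # stand-in for one CNN stage: 2x2 average pooling halves H and W
    c, h, w = x.shape
    h2, w2 = h // factor, w // factor
    return x[:, :h2 * factor, :w2 * factor] \
            .reshape(c, h2, factor, w2, factor).mean(axis=(2, 4))

img = np.random.default_rng(1).standard_normal((1, 32, 32))  # N x H x W
feats, x = [], img
for _ in range(3):  # CNN1 -> CNN2 -> CNN3, each deepening the feature scale
    x = downsample(x)
    feats.append(x)
print([f.shape for f in feats])  # [(1, 16, 16), (1, 8, 8), (1, 4, 4)]
```

Each feature in `feats` would then be up-sampled back to H×W by its own image segmentation module, producing one foreground/background pair per scale.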
  • the image segmentation method may further include the following steps:
  • Step S73-1 Fuse the foreground feature maps corresponding to the multiple image features to obtain a fused foreground feature map, and fuse the background feature maps corresponding to the multiple image features to obtain a fused background feature map.
  • Step S73-2 Normalize the pixel value of each pixel of the fused foreground feature map and the pixel value of each pixel of the fused background feature map to obtain the target area mask image and the background area mask image.
  • step S73-1 and step S73-2 are used as a specific implementation of the above step S13.
  • the multiple foreground feature maps can be superimposed first according to the feature depth order of the multiple foreground feature maps; then the superimposed multiple foreground feature maps are convolved to obtain a fused foreground feature map .
  • the multiple background feature maps can be superimposed first according to the feature depth order of the multiple background feature maps; then, the superimposed multiple background feature maps can be convolved to obtain a fused background feature map.
  • FIG. 9 is a schematic diagram of feature map fusion proposed by an embodiment of the present application.
  • The 2M×H×W image features are divided into an M×H×W foreground feature map and an M×H×W background feature map.
  • That is, the 2M×H×W image features correspond to foreground feature maps of M scales and background feature maps of M scales.
  • That is, in step S71, feature extraction operations of M scales are performed on the image to be segmented.
  • The M×H×W foreground feature map is formed by superimposing M 1×H×W foreground feature maps in order of feature depth.
  • The M×H×W background feature map is formed by superimposing M 1×H×W background feature maps in order of feature depth.
  • The single-layer convolutional neural network CNN1 is used to convolve the M×H×W foreground feature map to obtain a 1×H×W fused foreground feature map.
  • A single-layer convolutional neural network CNN2 is used to convolve the M×H×W background feature map to obtain a 1×H×W fused background feature map.
  • The fused foreground feature map and the fused background feature map are superimposed to obtain a 2×H×W fused feature map.
  • the network structure shown in FIG. 9 is the segmentation result fusion module in FIG. 8.
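With a 1×1 kernel, the single-layer convolution used for fusion reduces at each pixel to a weighted sum across the M stacked channels; a hypothetical NumPy sketch (weights and map contents are illustrative, not from the patent):

```python
import numpy as np

def fuse(maps, weights, bias=0.0):
    # stack the M single-channel maps, then apply a 1x1 convolution,
    # i.e. a learned weighted sum across the M channels at each pixel
    stacked = np.stack(maps)                              # (M, H, W)
    return np.tensordot(weights, stacked, axes=1) + bias  # (H, W)

m1 = np.full((4, 4), 1.0)  # foreground maps from M = 2 scales
m2 = np.full((4, 4), 3.0)
fused = fuse([m1, m2], weights=np.array([0.25, 0.25]))
print(fused[0, 0])  # 1.0  (0.25*1 + 0.25*3)
```

In training, the weights (and bias) of CNN1 and CNN2 are learned, letting the model decide how much each scale contributes to the fused map.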
  • When performing step S73-2, reference may be made to the explanation of step S13 above and the content shown in FIG. 5, which is not repeated here.
  • the image segmentation method may further include the following steps:
  • Step S74 segment the image to be segmented according to the target area mask image and the background area mask image.
  • When performing step S74, reference may be made to the above explanation of step S14, which is not repeated here.
  • In some embodiments, multiple up-sampling methods can be applied simultaneously to the image features obtained in step S11, thereby combining the advantages of multiple up-sampling methods and improving the accuracy of image segmentation.
  • the image features are respectively up-sampled through multiple up-sampling paths to obtain a foreground feature map and a background feature map output by each up-sampling path.
  • Specifically, the image features may be input to an image segmentation module that includes multiple up-sampling paths, so that the image features are up-sampled separately through each path, yielding a foreground feature map and a background feature map from each path, where each of the multiple paths uses a different up-sampling method.
  • the image segmentation module includes three up-sampling paths.
  • The network structures of the three up-sampling paths are shown in FIG. 4: the three paths implement deconvolution up-sampling, image-scaling up-sampling, and sub-pixel convolution up-sampling of the image features, respectively. After the image segmentation module up-samples the image features, the foreground feature map and background feature map output by each path are obtained.
  • the multiple foreground feature maps output by the multiple up-sampling paths are fused to obtain a fused foreground feature map, and the multiple background feature maps output by the multiple up-sampling paths are fused to obtain a fused background feature map; then the pixel values of each pixel of the fused foreground feature map and of each pixel of the fused background feature map are normalized to obtain the target area mask image and the background area mask image.
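The normalization described above can be realized as a pixel-wise softmax over the pair of fused maps, so that at every pixel the two mask values are probabilities summing to 1. A hedged numpy sketch, assuming the two fused maps have identical shape (the stability shift is an illustrative choice):

```python
import numpy as np

def normalize_masks(fg: np.ndarray, bg: np.ndarray):
    """Pixel-wise softmax over fused foreground/background feature maps.

    Returns a target-area mask and a background-area mask whose values at
    every pixel are probabilities that sum to 1.
    """
    m = np.maximum(fg, bg)              # subtract the max for numerical stability
    e_fg, e_bg = np.exp(fg - m), np.exp(bg - m)
    total = e_fg + e_bg
    return e_fg / total, e_bg / total

fg = np.array([[2.0, -1.0], [0.0, 3.0]])
bg = np.array([[0.0,  1.0], [0.0, -3.0]])
target_mask, background_mask = normalize_masks(fg, bg)
# At every pixel, target_mask + background_mask == 1.
```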
  • the image features can be up-sampled in multiple ways simultaneously, thereby combining the advantages of the multiple up-sampling methods and improving the accuracy of image segmentation.
  • this application has introduced the application process of the image segmentation method through the embodiments.
  • the application process of the image segmentation method involves the feature extraction module CNN and the image segmentation module.
  • the present application introduces the training process of each module through embodiments. It should be understood that the implementation of the above-mentioned image segmentation method does not necessarily depend on the above-mentioned modules, and the use of the above-mentioned modules should not be understood as a limitation of the present application.
  • FIG. 10 is a flow chart of model training proposed in an embodiment of the present application. As shown in Figure 10, the training process includes the following steps:
  • Step S10-1: obtain a sample image, where the sample image carries a target area mask annotation map and a background area mask annotation map; in the target area mask annotation map, the pixel value of pixels in the target area is the first pixel value and the pixel value of pixels in the background area is the second pixel value; in the background area mask annotation map, the pixel value of pixels in the target area is the second pixel value and the pixel value of pixels in the background area is the first pixel value.
  • the first pixel value is different from the second pixel value.
  • for example, the first pixel value is 1
  • and the second pixel value is 0.
  • the target area mask annotation map and the background area mask annotation map of the sample fingerprint image are generated according to the fingerprint area and the background area in the sample fingerprint image.
  • the pixel value of the pixel in the fingerprint area in the target area mask annotation map is 1, and the pixel value of the pixel in the background area is 0.
  • the pixel value of the pixel in the fingerprint area in the background area mask annotation map is 0, and the pixel value of the pixel in the background area is 1.
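The complementary relationship between the two annotation maps (the target map is 1 in the fingerprint area and 0 elsewhere; the background map is its exact complement) can be sketched as follows. The boolean region map used as input is an illustrative assumption, not a format prescribed by this application:

```python
import numpy as np

def make_mask_annotations(fingerprint_region: np.ndarray):
    """Build complementary annotation maps from a boolean fingerprint-region map."""
    target = np.where(fingerprint_region, 1, 0)  # 1 inside the fingerprint, 0 in background
    background = 1 - target                      # exact complement
    return target, background

region = np.array([[True, False], [False, True]])
t, b = make_mask_annotations(region)
# t and b sum to 1 at every pixel, as required of the two annotation maps.
```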
  • multiple sample fingerprint images can come from multiple scenes. Examples include: sample fingerprint images collected in cold weather, sample fingerprint images collected when fingertips are wet, and sample fingerprint images collected when fingertips are stained with inkpad.
  • the sample fingerprint image can be cropped. For example, an original sample fingerprint image with resolution H×W is cropped into an H/2×W/2 sample fingerprint image, and the target area mask annotation map and the background area mask annotation map are then generated for the H/2×W/2 sample fingerprint image.
  • Step S10-2: input the sample image into a preset model, perform feature extraction on the sample image through the preset model to obtain image features, and perform an up-sampling operation on the image features through the preset model to obtain a foreground prediction feature map and a background prediction feature map.
  • the structure of the preset model can refer to the network structure shown in FIG. 2 or FIG. 8.
  • the image segmentation module may include one or more up-sampling paths. If the image segmentation module includes multiple up-sampling paths, the up-sampling paths are different from each other.
  • Step S10-3 Perform a normalization operation on the respective pixel values of the foreground prediction feature map and the background prediction feature map to obtain a target region mask prediction map and a background region mask prediction map.
  • Step S10-4 Update the preset model according to the target area mask prediction map, the background area mask prediction map, the target area mask annotation map, and the background area mask annotation map.
  • the cross entropy loss of the target area mask prediction map and the target area mask annotation map can be calculated, and then the cross entropy loss can be used to update each module of the preset model.
  • the cross entropy loss of the background area mask prediction map and the background area mask annotation map can be calculated, and then the cross entropy loss can be used to update each module of the preset model.
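As a sketch of the loss described in the two steps above, the following computes a mean binary cross-entropy between a mask prediction map (per-pixel probabilities) and a 0/1 mask annotation map; the clipping constant is an illustrative numerical-stability choice, not a value specified by this application:

```python
import numpy as np

def mask_cross_entropy(pred: np.ndarray, label: np.ndarray, eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between a mask prediction map (probabilities in [0, 1])
    and a 0/1 mask annotation map."""
    pred = np.clip(pred, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(label * np.log(pred) + (1 - label) * np.log(1 - pred)))

pred = np.array([[0.9, 0.1], [0.8, 0.2]])
label = np.array([[1, 0], [1, 0]])
loss = mask_cross_entropy(pred, label)
```

The same function would apply to either the target-area pair or the background-area pair of maps; the resulting loss drives the parameter update of each module of the preset model.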
  • the parameters of the preset model can be converted from floating point to integer to achieve model quantization. The preset model before quantization is then used as the teacher model and the quantized preset model as the student model, a loss function is established between the output layers of the two models, and the quantized preset model is retrained.
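The float-to-integer conversion mentioned above can be illustrated with a simple symmetric per-tensor int8 quantization; this is a generic sketch under assumed conventions (127-level symmetric range), not the specific quantization scheme of the preset model:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor quantization of float weights to int8."""
    max_abs = float(np.abs(w).max())
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the int8 codes."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.27, 0.0, 1.0], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
```

The retraining step then compensates for the small error between `w` and `w_hat` by distilling from the full-precision teacher.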
  • FIG. 11 is a schematic diagram of an image segmentation device proposed in an embodiment of the present application. As shown in Figure 11, the device includes:
  • the feature extraction module 1101 is configured to perform feature extraction on the image to be segmented to obtain image features of the image to be segmented;
  • the up-sampling module 1102 is used to perform an up-sampling operation on the image features to obtain a foreground feature map and a background feature map, where each pixel in the foreground feature map corresponds one-to-one to a pixel in the image to be segmented, the pixel value of each pixel in the foreground feature map represents the possibility that the corresponding pixel in the image to be segmented belongs to the target area, each pixel in the background feature map corresponds one-to-one to a pixel in the image to be segmented, and the pixel value of each pixel in the background feature map represents the possibility that the corresponding pixel in the image to be segmented belongs to the background area;
  • the normalization module 1103 is used to normalize the pixel value of each pixel of the foreground feature map and the pixel value of each pixel of the background feature map to obtain a target area mask image and a background area mask image, where the pixel value of each pixel in the target area mask image represents the probability that the corresponding pixel in the image to be segmented belongs to the target area, and the pixel value of each pixel in the background area mask image represents the probability that the corresponding pixel in the image to be segmented belongs to the background area;
  • the segmentation module 1104 is configured to segment the image to be segmented according to the target area mask map and the background area mask map.
  • the feature extraction module is specifically configured to: perform feature extraction operations of multiple scales on the image to be segmented to obtain multiple image features of different scales of the image to be segmented;
  • the up-sampling module is specifically configured to perform an up-sampling operation on the multiple image features of different scales respectively, to obtain a foreground feature map and a background feature map corresponding to each of the multiple image features.
  • the normalization module includes:
  • the feature map fusion sub-module is used to fuse the respective foreground feature maps of the multiple image features to obtain a fused foreground feature map, and to fuse the background feature maps corresponding to each of the multiple image features to obtain a fused background feature map;
  • the normalization sub-module is used to normalize the pixel value of each pixel of the fused foreground feature map and the pixel value of each pixel of the fused background feature map to obtain the target area mask map and the background area mask map.
  • the feature map fusion sub-module includes:
  • the first feature map superimposing subunit is used to superimpose multiple foreground feature maps according to the feature depth sequence of the multiple foreground feature maps;
  • the first feature map convolution subunit is used to convolve multiple superimposed foreground feature maps to obtain the merged foreground feature map;
  • the second feature map superimposing subunit is used to superimpose multiple background feature maps according to the feature depth sequence of the multiple background feature maps
  • the second feature map convolution subunit is used to convolve multiple background feature maps after superimposition to obtain the fused background feature map.
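The superimpose-then-convolve fusion performed by the subunits above can be sketched as stacking the maps along a depth (channel) axis and collapsing them with a 1×1 convolution, which at every pixel is a weighted sum over depth. The weights here are illustrative, standing in for the learned convolution kernel:

```python
import numpy as np

def fuse_feature_maps(maps: list, weights: np.ndarray) -> np.ndarray:
    """Superimpose feature maps along the depth axis, then apply a 1x1
    convolution (per-pixel weighted sum over depth) to obtain one fused map."""
    stacked = np.stack(maps, axis=0)  # (depth, H, W), in feature-depth order
    return np.tensordot(weights, stacked, axes=([0], [0]))

m1 = np.ones((2, 2))
m2 = 2 * np.ones((2, 2))
m3 = 3 * np.ones((2, 2))
fused = fuse_feature_maps([m1, m2, m3], np.array([0.5, 0.25, 0.25]))
```

The same routine applies unchanged to the foreground maps and to the background maps, each with its own convolution weights.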
  • the up-sampling module is specifically configured to perform up-sampling processing on the image features through multiple up-sampling paths, respectively, to obtain a foreground feature map and a background feature map output by each up-sampling path, where the up-sampling modes of the multiple up-sampling paths differ from one another;
  • the normalization module includes:
  • the feature map fusion sub-module is used to fuse the multiple foreground feature maps output by the multiple up-sampling paths to obtain a fused foreground feature map, and to fuse the multiple background feature maps output by the multiple up-sampling paths to obtain a fused background feature map;
  • the normalization sub-module is used to normalize the pixel value of each pixel of the fused foreground feature map and the pixel value of each pixel of the fused background feature map to obtain the target area mask map and the background area mask map.
  • the segmentation module includes:
  • the target area segmentation sub-module is used to, for each pixel in the target area mask image, if the pixel value of the pixel is greater than a first preset threshold, determine that the pixel corresponding to that pixel in the image to be segmented belongs to the target area, and segment the pixels of the target area from the image to be segmented;
  • the background area segmentation sub-module is used to, for each pixel in the background area mask image, if the pixel value of the pixel is greater than a second preset threshold, determine that the pixel corresponding to that pixel in the image to be segmented belongs to the background area, and segment the pixels of the background area from the image to be segmented.
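The thresholding performed by the two segmentation sub-modules can be sketched as follows, with illustrative thresholds of 0.5 for both the target-area and background-area masks (the patent leaves the two preset thresholds unspecified):

```python
import numpy as np

def segment(image: np.ndarray, target_mask: np.ndarray, background_mask: np.ndarray,
            t1: float = 0.5, t2: float = 0.5):
    """Split an image into target and background pixel sets by thresholding
    the two probability mask maps; non-selected pixels are zeroed out."""
    target_pixels = np.where(target_mask > t1, image, 0)
    background_pixels = np.where(background_mask > t2, image, 0)
    return target_pixels, background_pixels

img = np.array([[10, 20], [30, 40]])
tm = np.array([[0.9, 0.2], [0.8, 0.1]])  # probability of belonging to the target area
bm = 1.0 - tm                            # probability of belonging to the background area
tgt, bgd = segment(img, tm, bm)
```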
  • the device further includes:
  • the sample image obtaining module is used to obtain a sample image, where the sample image carries a target area mask annotation map and a background area mask annotation map; in the target area mask annotation map, the pixel value of pixels in the target area is the first pixel value and the pixel value of pixels in the background area is the second pixel value; in the background area mask annotation map, the pixel value of pixels in the target area is the second pixel value and the pixel value of pixels in the background area is the first pixel value;
  • the prediction feature map obtaining module is used to perform feature extraction on the sample image through the preset model to obtain image features, and perform an up-sampling operation on the image features through the preset model to obtain foreground prediction features Map and background prediction feature map;
  • a mask prediction map obtaining module configured to perform a normalization operation on the respective pixel values of the foreground prediction feature map and the background prediction feature map to obtain a target region mask prediction map and a background region mask prediction map;
  • the model update module is used to update the preset model according to the target area mask prediction map, the background area mask prediction map, the target area mask annotation map, and the background area mask annotation map.
  • Using the image segmentation device provided in this application, feature extraction is performed on the image to be segmented to obtain the image features of the image to be segmented; then the image features are up-sampled to obtain the foreground feature map and the background feature map; the pixel values of each pixel of the foreground feature map and the background feature map are then normalized to obtain the target area mask image and the background area mask image; finally, the image to be segmented is segmented according to the target area mask image and the background area mask image. The image segmentation device provided by this application thus has at least the following advantages:
  • each pixel of the foreground feature map corresponds to a pixel of the image to be segmented, and the pixel value of each pixel in the foreground feature map represents the possibility that the corresponding pixel in the image to be segmented belongs to the target area; each pixel of the background feature map corresponds to a pixel of the image to be segmented, and the pixel value of each pixel in the background feature map represents the possibility that the corresponding pixel in the image to be segmented belongs to the background area.
  • after normalization, the pixel value of each pixel in the obtained target area mask image represents the probability that the corresponding pixel in the image to be segmented belongs to the target area, and the pixel value of each pixel in the background area mask image represents the probability that the corresponding pixel in the image to be segmented belongs to the background area.
  • the image to be segmented is segmented based on the probability that each pixel belongs to the target area and/or the background area, so as to obtain a more accurate segmentation result.
  • another embodiment of the present application provides a readable storage medium on which a computer program is stored.
  • when the program is executed by a processor, the steps of the image segmentation method described in any of the foregoing embodiments of the present application are implemented.
  • the computer-readable storage medium includes, but is not limited to, any type of disk (including floppy disk, hard disk, optical disk, CD-ROM, and magneto-optical disk), ROM (Read-Only Memory), RAM (Random Access Memory), EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash memory, magnetic card, or optical card. That is, a readable storage medium includes any medium that stores or transmits information in a form readable by a device (for example, a computer).
  • another embodiment of the present application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor; when the processor executes the computer program, the steps of the image segmentation method described in any of the foregoing embodiments are implemented.
  • the description is relatively brief; for related parts, please refer to the description of the method embodiments.
  • the embodiments of the present application may be provided as methods, devices, or computer program products. Therefore, the embodiments of the present application may take the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware. Moreover, the embodiments of the present application may take the form of computer program products implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
  • an embodiment of the present application further provides a computer program, including computer-readable code, which, when run on a computing processing device, causes the computing processing device to execute the image segmentation method described in any one of the embodiments of this application.
  • These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing terminal equipment to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing terminal equipment produce means for realizing the functions specified in one or more processes in the flowchart and/or one or more blocks in the block diagram.
  • These computer program instructions can also be stored in a computer-readable memory that can guide a computer or other programmable data processing terminal equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including the instruction device.
  • the instruction device implements the functions specified in one process or multiple processes in the flowchart and/or one block or multiple blocks in the block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing terminal equipment, so that a series of operation steps are executed on the computer or other programmable terminal equipment to produce computer-implemented processing, and the instructions executed on the computer or other programmable terminal equipment thus provide steps for implementing the functions specified in one or more processes in the flowchart and/or one or more blocks in the block diagram.


Abstract

An image segmentation method and apparatus, a device and a storage medium, which aim to improve the accuracy of image segmentation. The method comprises: obtaining image features of an image to be segmented (S11); performing an up-sampling operation on the image features to obtain a foreground feature map and a background feature map (S12); normalizing pixel values of pixel points of the foreground feature map and pixel values of pixel points of the background feature map to obtain a target region mask map and a background region mask map, wherein the pixel value of each pixel point in the target region mask map represents the probability of a pixel point that corresponds to the pixel point in the image belonging to a target region, and the pixel value of each pixel point in the background region mask map represents the probability of a pixel point that corresponds to the pixel point in the image belonging to a background region (S13); and according to the target region mask map and the background region mask map, segmenting the image (S14).

Description

Image segmentation method, device, electronic equipment and readable storage medium
This application claims priority to Chinese patent application No. 201911331052.6, filed with the Chinese Patent Office on December 20, 2019 and entitled "Image Segmentation Method, Apparatus, Electronic Equipment, and Readable Storage Medium", the entire contents of which are incorporated herein by reference.
Technical field
The embodiments of the present application relate to the field of computer vision technology, and in particular, to an image segmentation method, device, electronic device, and readable storage medium.
Background technique
With the development of computer vision technology, more and more application scenarios combine computer vision technology with human biological characteristics to perform tasks such as unlocking, authentication, criminal investigation, or target tracking. Human biological characteristics include, but are not limited to: fingerprints, palm prints, hand shapes, human faces, irises, auricles, and so on. During the execution of the above tasks, it is usually necessary to collect an original image including the human biological characteristics, segment the region of the human biological characteristics from the original image, and then perform key point extraction, feature comparison, and other processes on the segmented region, finally realizing the execution of the above-mentioned tasks.
Taking the fingerprint as an example of a human biological characteristic, it is first necessary to obtain an original image of the fingerprint with a fingerprint acquisition device, then segment the fingerprint area from the original image, and finally perform key point extraction, fingerprint alignment, comparison and recognition, and other processes on the segmented fingerprint area. Whether the fingerprint area is segmented successfully, and how accurately, directly affects the subsequent processes and the final unlocking result, authentication result, criminal investigation result, or target tracking result.
In related technologies, in order to segment the area where fingerprints or other human biological characteristics are located from the original image, features such as the contrast and direction consistency of the fingerprint area are usually obtained from the original image, and manually derived rules are then combined to segment the fingerprint area from the original image. However, when the original image contains a complex background, it is difficult to segment an accurate fingerprint area in this way.
Summary of the invention
The embodiments of the present application provide an image segmentation method, apparatus, device, and storage medium, aiming to improve the accuracy of image segmentation.
The first aspect of the embodiments of the present application provides an image segmentation method, the method including:
performing feature extraction on an image to be segmented to obtain image features of the image to be segmented;
performing an up-sampling operation on the image features to obtain a foreground feature map and a background feature map, where each pixel in the foreground feature map corresponds one-to-one to a pixel in the image to be segmented, the pixel value of each pixel in the foreground feature map represents the possibility that the corresponding pixel in the image to be segmented belongs to the target area, each pixel in the background feature map corresponds one-to-one to a pixel in the image to be segmented, and the pixel value of each pixel in the background feature map represents the possibility that the corresponding pixel in the image to be segmented belongs to the background area;
normalizing the pixel value of each pixel of the foreground feature map and the pixel value of each pixel of the background feature map to obtain a target area mask map and a background area mask map, where the pixel value of each pixel in the target area mask map represents the probability that the corresponding pixel in the image to be segmented belongs to the target area, and the pixel value of each pixel in the background area mask map represents the probability that the corresponding pixel in the image to be segmented belongs to the background area;
segmenting the image to be segmented according to the target area mask map and the background area mask map.
A second aspect of the embodiments of the present application provides an image segmentation device, the device including:
a feature extraction module, used to perform feature extraction on an image to be segmented to obtain image features of the image to be segmented;
an up-sampling module, used to perform an up-sampling operation on the image features to obtain a foreground feature map and a background feature map, where each pixel in the foreground feature map corresponds one-to-one to a pixel in the image to be segmented, the pixel value of each pixel in the foreground feature map represents the possibility that the corresponding pixel in the image to be segmented belongs to the target area, each pixel in the background feature map corresponds one-to-one to a pixel in the image to be segmented, and the pixel value of each pixel in the background feature map represents the possibility that the corresponding pixel in the image to be segmented belongs to the background area;
a normalization module, used to normalize the pixel value of each pixel of the foreground feature map and the pixel value of each pixel of the background feature map to obtain a target area mask map and a background area mask map, where the pixel value of each pixel in the target area mask map represents the probability that the corresponding pixel in the image to be segmented belongs to the target area, and the pixel value of each pixel in the background area mask map represents the probability that the corresponding pixel in the image to be segmented belongs to the background area;
a segmentation module, used to segment the image to be segmented according to the target area mask map and the background area mask map.
A third aspect of the embodiments of the present application provides a readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps in the image segmentation method described in the first aspect of the present application are implemented.
The fourth aspect of the embodiments of the present application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor; when the processor executes the computer program, the steps of the image segmentation method described in the first aspect are implemented.
The fifth aspect of the embodiments of the present application provides a computer program, including computer-readable code, which, when run on a computing processing device, causes the computing processing device to execute the image segmentation method described above.
The above description is only an overview of the technical solution of the present invention. In order that the technical means of the present invention may be understood more clearly and implemented in accordance with the content of the specification, and in order to make the above and other objectives, features, and advantages of the present invention more obvious and understandable, specific embodiments of the present invention are set forth below.
Using the image segmentation method provided in this application, feature extraction is performed on the image to be segmented to obtain the image features of the image to be segmented; then the image features are up-sampled to obtain a foreground feature map and a background feature map; the pixel values of each pixel of the foreground feature map and the background feature map are then normalized to obtain a target area mask map and a background area mask map; finally, the image to be segmented is segmented according to the target area mask map and the background area mask map.
Since each pixel of the foreground feature map corresponds to a pixel of the image to be segmented and the pixel value of each pixel in the foreground feature map represents the possibility that the corresponding pixel in the image to be segmented belongs to the target area, while each pixel of the background feature map corresponds to a pixel of the image to be segmented and the pixel value of each pixel in the background feature map represents the possibility that the corresponding pixel belongs to the background area, after the pixel values of the foreground feature map and the background feature map are normalized, the pixel value of each pixel in the obtained target area mask map represents the probability that the corresponding pixel in the image to be segmented belongs to the target area, and the pixel value of each pixel in the background area mask map represents the probability that the corresponding pixel belongs to the background area.
In this way, according to the target area mask map and the background area mask map, the image to be segmented is segmented based on the probability that each pixel belongs to the target area and/or the background area, so as to obtain a more accurate segmentation result.
Description of the Drawings
To explain the technical solutions of the embodiments of this application more clearly, the drawings required for the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description illustrate only some embodiments of this application; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of an image segmentation method proposed in an embodiment of this application;
Fig. 2 is a schematic diagram of an image segmentation method proposed in an embodiment of this application;
Fig. 3 is a schematic diagram of fingerprint image area division proposed in an embodiment of this application;
Fig. 4 is a schematic diagram of the network structures corresponding to several up-sampling methods proposed in an embodiment of this application;
Fig. 5 is a schematic diagram of normalization proposed in an embodiment of this application;
Fig. 6 is a segmentation effect diagram of the image segmentation method proposed in an embodiment of this application;
Fig. 7 is a flowchart of an image segmentation method proposed in another embodiment of this application;
Fig. 8 is a schematic diagram of an image segmentation method proposed in another embodiment of this application;
Fig. 9 is a schematic diagram of feature map fusion proposed in an embodiment of this application;
Fig. 10 is a flowchart of model training proposed in an embodiment of this application;
Fig. 11 is a schematic diagram of an image segmentation apparatus proposed in an embodiment of this application.
Detailed Description of the Embodiments
The technical solutions in the embodiments of this application will be described clearly and completely below in conjunction with the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of this application. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of this application.
In the related art, more and more application scenarios combine computer vision technology with human biometric features to perform tasks such as unlocking, authentication, criminal investigation, or target tracking. Human biometric features include, but are not limited to, fingerprints, palm prints, hand shapes, faces, irises, and auricles. When performing these tasks, it is usually necessary to capture an original image containing the biometric feature, segment the region of the biometric feature from the original image, and then perform key-point extraction, feature comparison, and other processes on the segmented region so as to finally carry out the tasks above.
Taking the fingerprint as an example, an original fingerprint image is first acquired with a fingerprint acquisition device, the fingerprint area is then segmented from the original image, and key-point extraction, fingerprint alignment, comparison, and recognition are finally performed on the segmented fingerprint area. To segment the region containing a fingerprint or another biometric feature from the original image, current approaches usually extract features such as contrast and orientation consistency of the fingerprint area from the original image and then segment the fingerprint area with manually formulated rules. When the original image contains a complex background, however, it is difficult for such approaches to segment an accurate fingerprint area.
To this end, some embodiments of this application propose an image segmentation method that aims to improve the accuracy of image segmentation. Referring to Fig. 1, Fig. 1 is a flowchart of an image segmentation method proposed in an embodiment of this application. As shown in Fig. 1, the method includes the following steps:
Step S11: Perform feature extraction on an image to be segmented to obtain image features of the image to be segmented.
The image to be segmented is an image containing a human biometric feature; for example, it may be a fingerprint image, a palmprint image, a face image, or an iris image. For instance, after the image segmentation method of this application is applied to a fingerprint recognition system, the fingerprint recognition system performs feature extraction on a fingerprint image when executing step S11. Likewise, after the method is applied to a face recognition system, the face recognition system performs feature extraction on a face image when executing step S11.
To implement feature extraction on the image to be segmented, in some embodiments a feature extraction module CNN may be used. Referring to Fig. 2, Fig. 2 is a schematic diagram of an image segmentation method proposed in an embodiment of this application. As shown in Fig. 2, taking a fingerprint image as the image to be segmented, an N×H×W fingerprint image is input into the feature extraction module CNN to obtain an N'×H'×W' feature map, i.e., the image features. Here, N is the number of channels of the fingerprint image, H its height, and W its width; N' is the number of channels of the image features, H' their height, and W' their width. Generally, H' is less than H and W' is less than W. In Fig. 2, the feature extraction module CNN may adopt the backbone of a network such as VGG, ResNet (Residual Network), or ShuffleNet.
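The relation H' < H and W' < W follows from the standard convolution output-size formula. A minimal sketch, assuming a hypothetical backbone of three stride-2 convolutions (the actual VGG/ResNet/ShuffleNet backbones differ in depth and layer types):

```python
import math

def conv_out_size(size, kernel, stride, padding):
    """Spatial size after one convolution layer (standard formula)."""
    return math.floor((size + 2 * padding - kernel) / stride) + 1

# Hypothetical backbone: three 3x3 convolutions with stride 2, padding 1,
# applied to an H x W = 128 x 128 fingerprint image.
h = w = 128
for _ in range(3):
    h = conv_out_size(h, kernel=3, stride=2, padding=1)
    w = conv_out_size(w, kernel=3, stride=2, padding=1)

print(h, w)  # 16 16 -- H' and W' come out smaller than H and W, as stated above
```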
Before the feature extraction module CNN is used to extract features from the image to be segmented, the module may be established in advance and trained with sample images; the trained module is then used to extract features from the image to obtain the corresponding image features. The specific training method is described below.
Step S12: Perform an up-sampling operation on the image features to obtain a foreground feature map and a background feature map, where each pixel in the foreground feature map corresponds one-to-one to a pixel in the image to be segmented, the pixel value of each pixel in the foreground feature map characterizes the likelihood that the corresponding pixel in the image to be segmented belongs to the target area, each pixel in the background feature map corresponds one-to-one to a pixel in the image to be segmented, and the pixel value of each pixel in the background feature map characterizes the likelihood that the corresponding pixel in the image to be segmented belongs to the background area.
By executing step S12, the image features are up-sampled to obtain a foreground feature map and a background feature map whose resolution matches that of the image to be segmented. In this way, each pixel of the foreground feature map corresponds one-to-one to a pixel of the image to be segmented, and likewise each pixel of the background feature map corresponds one-to-one to a pixel of the image to be segmented.
The target area is the region to be segmented out of the image to be segmented, and the background area is the remainder of the image. Referring to Fig. 3, Fig. 3 is a schematic diagram of fingerprint image area division proposed in an embodiment of this application. As shown in Fig. 3, taking a fingerprint image as an example, the area inside the dashed frame is the effective fingerprint area, i.e., the target area, and the area outside the dashed frame is the background area.
As described above, the foreground feature map includes multiple pixels, each with its own pixel value. For each pixel in the foreground feature map, the larger its pixel value, the more likely it is that the pixel at the same position in the image to be segmented belongs to the target area.
Likewise, the background feature map includes multiple pixels, each with its own pixel value. For each pixel in the background feature map, the larger its pixel value, the more likely it is that the pixel at the same position in the image to be segmented belongs to the background area.
To implement the up-sampling operation on the image features, in some embodiments an image segmentation module may be used. As shown in Fig. 2, taking a fingerprint image as the image to be segmented, the N'×H'×W' feature map is input into the image segmentation module to obtain a 2×H×W segmentation result, i.e., a foreground feature map and a background feature map, each with resolution H×W. In Fig. 2, the up-sampling methods the image segmentation module may adopt include, but are not limited to, deconvolution up-sampling, image-scaling up-sampling, and sub-pixel convolution up-sampling. The chosen up-sampling method determines the network structure of the image segmentation module.
Referring to Fig. 4, Fig. 4 is a schematic diagram of the network structures corresponding to several up-sampling methods proposed in an embodiment of this application. As shown in Fig. 4, the network structure for deconvolution up-sampling includes a deconvolution network; the network structure for image-scaling up-sampling includes a convolutional neural network CNN and an image-scaling module; and the network structure for sub-pixel convolution up-sampling includes an optional convolutional neural network CNN and a sub-pixel convolution module.
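Of the three up-sampling methods, sub-pixel convolution is the least self-explanatory: after a convolution produces an (r²·C)×H×W tensor, a channel-to-space rearrangement turns it into a C×(rH)×(rW) tensor. A minimal pure-Python sketch of that rearrangement step only, using nested lists as tensors (the channel ordering shown is one common convention, not necessarily the one used in this application):

```python
def pixel_shuffle(x, r):
    """Rearrange an (r*r*C) x H x W tensor (nested lists) into C x (r*H) x (r*W).

    Channel index c*r*r + dy*r + dx supplies the sub-pixel at offset (dy, dx)
    inside each r x r output cell -- the rearrangement step of
    sub-pixel convolution up-sampling.
    """
    in_ch = len(x)
    h, w = len(x[0]), len(x[0][0])
    c = in_ch // (r * r)
    out = [[[0.0] * (w * r) for _ in range(h * r)] for _ in range(c)]
    for ch in range(in_ch):
        base, rem = divmod(ch, r * r)
        dy, dx = divmod(rem, r)
        for i in range(h):
            for j in range(w):
                out[base][i * r + dy][j * r + dx] = x[ch][i][j]
    return out

# Four 1x1 channels, upscale factor r = 2 -> one 2x2 map
x = [[[1.0]], [[2.0]], [[3.0]], [[4.0]]]
print(pixel_shuffle(x, 2))  # [[[1.0, 2.0], [3.0, 4.0]]]
```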
Before the image segmentation module is used to up-sample the image features, the module may be established in advance and trained with sample images; the trained module is then used to up-sample the image features to obtain the corresponding foreground and background feature maps. The specific training method is described below.
Step S13: Normalize the pixel value of each pixel of the foreground feature map and the pixel value of each pixel of the background feature map to obtain a target area mask map and a background area mask map, where the pixel value of each pixel in the target area mask map represents the probability that the corresponding pixel in the image to be segmented belongs to the target area, and the pixel value of each pixel in the background area mask map represents the probability that the corresponding pixel in the image to be segmented belongs to the background area.
For example, referring to Fig. 5, Fig. 5 is a schematic diagram of normalization proposed in an embodiment of this application. As shown in Fig. 5, pixel A1 in the foreground feature map has a pixel value of 7, and pixel A2 at the same position in the background feature map has a pixel value of 2. As shown in Fig. 2, the softmax function may be used to normalize the pixel values of A1 and A2. In the normalized target area mask map, the pixel value of A1' equals e^7/(e^7+e^2), i.e., approximately 1; in the normalized background area mask map, the pixel value of A2' equals e^2/(e^7+e^2), i.e., approximately 0. Thus, pixel A at the same position in the image to be segmented is very likely to belong to the target area and almost certainly does not belong to the background area.
Continuing with Fig. 5, pixel B1 in the foreground feature map has a pixel value of 3, and pixel B2 at the same position in the background feature map has a pixel value of 5. The softmax function may be used to normalize the pixel values of B1 and B2. In the normalized target area mask map, the pixel value of B1' equals e^3/(e^3+e^5), i.e., approximately 0.12; in the normalized background area mask map, the pixel value of B2' equals e^5/(e^3+e^5), i.e., approximately 0.88. Thus, pixel B at the same position in the image to be segmented belongs to the target area with probability 0.12 and to the background area with probability 0.88.
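The two worked examples above can be reproduced directly. A minimal sketch of the per-pixel softmax over the foreground/background pair:

```python
import math

def softmax_pair(fg, bg):
    """Softmax over one pixel's (foreground, background) value pair,
    as in the normalization of step S13."""
    ef, eb = math.exp(fg), math.exp(bg)
    return ef / (ef + eb), eb / (ef + eb)

# Pixel A: foreground value 7, background value 2
p_fg, p_bg = softmax_pair(7, 2)
print(round(p_fg, 3), round(p_bg, 3))  # 0.993 0.007 -- approximately 1 and 0

# Pixel B: foreground value 3, background value 5
p_fg, p_bg = softmax_pair(3, 5)
print(round(p_fg, 2), round(p_bg, 2))  # 0.12 0.88
```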
Step S14: Segment the image to be segmented according to the target area mask map and the background area mask map.
Specifically, the target area may be segmented from the image to be segmented according to the pixel values of the pixels in the target area mask map, and the background area may be segmented from the image according to the pixel values of the pixels in the background area mask map.
In some embodiments, step S14 may specifically include the following sub-steps:
Sub-step S14-1: For each pixel in the target area mask map, if the pixel value of the pixel is greater than a first preset threshold, determine that the corresponding pixel in the image to be segmented belongs to the target area, and segment the pixels of the target area from the image to be segmented;
and/or sub-step S14-2: For each pixel in the background area mask map, if the pixel value of the pixel is greater than a second preset threshold, determine that the corresponding pixel in the image to be segmented belongs to the background area, and segment the pixels of the background area from the image to be segmented.
For example, taking a first preset threshold of 0.5, for each pixel in the target area mask map, if its pixel value is greater than 0.5, the pixel value is updated to 1; otherwise, it is updated to 0. This yields an updated target area mask map in which pixels with value 1 correspond to the target area of the image to be segmented and pixels with value 0 correspond to the background area. The updated target area mask map is then multiplied with the image to be segmented; in other words, the pixel value of each pixel in the updated mask map is multiplied one-to-one with the pixel value of the corresponding pixel in the image. As a result, the product for each background pixel equals 0 while the product for each target-area pixel equals its original pixel value, so that the target area is segmented out of the image.
Taking a second preset threshold of 0.5, for each pixel in the background area mask map, if its pixel value is greater than 0.5, the pixel value is updated to 1; otherwise, it is updated to 0. This yields an updated background area mask map in which pixels with value 1 correspond to the background area of the image to be segmented and pixels with value 0 correspond to the target area. The updated background area mask map is then multiplied with the image to be segmented in the same one-to-one manner. As a result, the product for each background pixel equals its original pixel value while the product for each target-area pixel equals 0, so that the background area is segmented out of the image.
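Sub-steps S14-1/S14-2 amount to thresholding the mask and multiplying element-wise. A minimal sketch on a hypothetical 2×2 image, with the threshold of 0.5 used in the example above:

```python
def apply_mask(image, mask, threshold=0.5):
    """Binarize the probability mask at `threshold`, then multiply
    element-wise with the image: pixels whose mask probability exceeds
    the threshold keep their value, all others are zeroed out."""
    return [
        [px if m > threshold else 0 for px, m in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]

image = [[10, 20],
         [30, 40]]
target_mask = [[0.99, 0.12],   # probabilities from the target area mask map
               [0.88, 0.60]]

print(apply_mask(image, target_mask))  # [[10, 0], [30, 40]]
```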
Compared with the prior art, executing the image segmentation method comprising steps S11 to S14 provides at least the following advantages:
Each pixel of the foreground feature map corresponds one-to-one to a pixel of the image to be segmented, and the pixel value of each pixel in the foreground feature map characterizes the likelihood that the corresponding pixel in the image to be segmented belongs to the target area; similarly, each pixel of the background feature map corresponds one-to-one to a pixel of the image to be segmented, and the pixel value of each pixel in the background feature map characterizes the likelihood that the corresponding pixel belongs to the background area. Therefore, after the pixel values of the foreground and background feature maps are normalized, the pixel value of each pixel in the resulting target area mask map represents the probability that the corresponding pixel in the image to be segmented belongs to the target area, and the pixel value of each pixel in the background area mask map represents the probability that the corresponding pixel belongs to the background area.
In this way, the image to be segmented is segmented according to the target area mask map and the background area mask map, based on the probability that each pixel belongs to the target area and/or the background area, so that a more accurate segmentation result is obtained and the recall rate of image segmentation is improved.
Referring to Fig. 6, Fig. 6 is a segmentation effect diagram of the image segmentation method proposed in an embodiment of this application. As shown in Fig. 6, the upper three fingerprint images are three images to be segmented, the lower three are the corresponding segmentation results, and the dashed-frame area in each result is the segmented fingerprint area. As shown in Fig. 6, the fingerprint area is accurately segmented from each image to be segmented.
Referring to Fig. 7, Fig. 7 is a flowchart of an image segmentation method proposed in another embodiment of this application. As shown in Fig. 7, the method includes the following steps:
Step S71: Perform feature extraction operations at multiple scales on the image to be segmented to obtain multiple image features of different scales.
Step S72: Perform an up-sampling operation on each of the multiple image features of different scales to obtain the foreground feature map and background feature map corresponding to each image feature.
Step S71 is a specific implementation of step S11 above, and step S72 is a specific implementation of step S12 above.
To perform feature extraction at multiple scales on the image to be segmented, in some embodiments the image may be input into multiple convolutional neural networks CNN, each containing a different number of convolutional layers. The more convolutional layers a CNN contains, the deeper the scale of the feature extraction it performs on the image to be segmented and the deeper the scale of the image features it outputs.
Referring to Fig. 8, Fig. 8 is a schematic diagram of an image segmentation method proposed in another embodiment of this application. In other embodiments, as shown in Fig. 8, after the convolutional neural network CNN1 performs feature extraction on the N×H×W image to be segmented to obtain the N'×H'×W' image features, on the one hand the N'×H'×W' image features may be input into image segmentation module 1 for up-sampling, yielding a 2×H×W segmentation result, i.e., one foreground feature map and one background feature map; on the other hand, the N'×H'×W' image features may be input into the convolutional neural network CNN2.
After CNN2 performs further feature extraction on the N'×H'×W' image features to obtain the N''×H''×W'' image features, on the one hand the N''×H''×W'' image features may be input into image segmentation module 2 for up-sampling, yielding a 2×H×W segmentation result, i.e., one foreground feature map and one background feature map; on the other hand, the N''×H''×W'' image features may be input into the convolutional neural network CNN3.
By analogy, multiple image features of the image to be segmented at different scales can be obtained, and an up-sampling operation is performed on each of them to obtain the foreground and background feature maps corresponding to each image feature. In Fig. 8, H'''<H''<H'<H and W'''<W''<W'<W, so the scale depths of the image features N'×H'×W', N''×H''×W'', and N'''×H'''×W''' increase in turn.
By obtaining image features of the image to be segmented at multiple scales and performing the subsequent up-sampling, fusion, and normalization processes on features of each scale, this application increases the complexity of the image features so that they cover a richer range of scales, which helps further improve segmentation accuracy. In addition, when steps S71 and S72 are executed with the model shown in Fig. 8, the model's richer structure gives it a stronger learning capacity during training, which also helps further improve segmentation accuracy.
As shown in Fig. 7, the image segmentation method may further include the following steps:
Step S73-1: Fuse the foreground feature maps corresponding to the multiple image features to obtain one fused foreground feature map, and fuse the background feature maps corresponding to the multiple image features to obtain one fused background feature map.
Step S73-2: Normalize the pixel value of each pixel of the fused foreground feature map and the pixel value of each pixel of the fused background feature map to obtain the target area mask map and the background area mask map.
Steps S73-1 and S73-2 together constitute a specific implementation of step S13 above.
When executing step S73-1, the multiple foreground feature maps may first be stacked in order of feature depth, and the stacked foreground feature maps are then convolved to obtain the fused foreground feature map. Likewise, the multiple background feature maps may first be stacked in order of feature depth and then convolved to obtain the fused background feature map.
For example, referring to Fig. 9, Fig. 9 is a schematic diagram of feature map fusion proposed in an embodiment of this application. As shown in Fig. 9, the 2M×H×W image features are split into M×H×W foreground feature maps and M×H×W background feature maps. The 2M×H×W image features correspond to foreground feature maps at M scales and background feature maps at M scales; in other words, in step S71 above, feature extraction operations at M scales were performed on the image to be segmented. The M×H×W foreground feature maps are formed by stacking M 1×H×W foreground feature maps in order of feature depth, and the M×H×W background feature maps are formed by stacking M 1×H×W background feature maps in order of feature depth.
As shown in Fig. 9, a single-layer convolutional neural network CNN1 convolves the M×H×W foreground feature maps to obtain a 1×H×W fused foreground feature map, and a single-layer convolutional neural network CNN2 convolves the M×H×W background feature maps to obtain a 1×H×W fused background feature map. Finally, the fused foreground feature map and the fused background feature map are stacked to obtain a 2×H×W fused feature map. The network structure shown in Fig. 9 is the segmentation result fusion module in Fig. 8.
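A single-layer convolution that reduces M stacked H×W maps to one H×W map acts, at every pixel, as a weighted sum across the M channels (the 1×1-kernel case). A minimal sketch with hypothetical fixed weights standing in for learned ones:

```python
def fuse_maps(maps, weights, bias=0.0):
    """Fuse M stacked HxW maps (nested lists) into one HxW map via a
    1x1 convolution, i.e. a per-pixel weighted sum across the M channels."""
    h, w = len(maps[0]), len(maps[0][0])
    return [
        [sum(wt * m[i][j] for wt, m in zip(weights, maps)) + bias
         for j in range(w)]
        for i in range(h)
    ]

# M = 2 foreground maps of size 1x2, fused with hypothetical learned weights
fg_maps = [[[4.0, 0.0]],
           [[2.0, 6.0]]]
print(fuse_maps(fg_maps, weights=[0.5, 0.5]))  # [[3.0, 3.0]]
```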
When performing step S73-2, reference may be made to the explanation of step S13 above and to the content shown in FIG. 5; details are not repeated here.
As shown in FIG. 7, the image segmentation method may further include the following step:
Step S74: segment the image to be segmented according to the target area mask map and the background area mask map.
When performing step S74, reference may be made to the explanation of step S14 above; details are not repeated here.
In addition, when performing step S12 above, multiple types of upsampling operations may be performed simultaneously on the image features obtained in step S11, so that the advantages of multiple upsampling methods are combined and the accuracy of image segmentation is improved.
Specifically, the image features obtained in step S11 are separately upsampled through multiple upsampling paths, and one foreground feature map and one background feature map output by each upsampling path are obtained. In a possible implementation, the image features may be input into an image segmentation module that includes multiple upsampling paths, so that the image features are separately upsampled by each upsampling path in the image segmentation module and one foreground feature map and one background feature map output by each upsampling path are obtained, where the upsampling methods corresponding to the multiple upsampling paths differ from one another.
For example, the image segmentation module includes three upsampling paths whose network structures are respectively shown in FIG. 4; the three paths respectively implement deconvolution upsampling, image-resizing upsampling, and sub-pixel convolution upsampling of the image features. After the image features are upsampled by this image segmentation module, the foreground feature map and background feature map output by each upsampling path are obtained.
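The sub-pixel convolution path mentioned above can be sketched as follows. The sketch shows only the channel-to-space rearrangement (pixel shuffle) that defines sub-pixel upsampling; the convolution that produces the C·r² input channels is omitted, and the exact layer configuration of FIG. 4 is not reproduced here.

```python
import numpy as np

def pixel_shuffle(x, r):
    """Sub-pixel upsampling: rearrange a (C*r^2, H, W) feature tensor
    into a (C, H*r, W*r) tensor that is r times larger in each spatial
    dimension."""
    c2, h, w = x.shape
    c = c2 // (r * r)
    x = x.reshape(c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)   # -> (C, H, r, W, r)
    return x.reshape(c, h * r, w * r)

x = np.arange(16.0).reshape(4, 2, 2)  # (C*r^2, H, W) with C=1, r=2
y = pixel_shuffle(x, 2)               # -> (1, 4, 4)
```

Each output 2×2 block interleaves one value from each of the four input channels, so spatial resolution is gained without interpolation, unlike the image-resizing path.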
Then, when performing step S13 above, reference may be made to the content shown in FIG. 9: first, the multiple foreground feature maps output by the multiple upsampling paths are fused to obtain one fused foreground feature map, and the multiple background feature maps output by the multiple upsampling paths are fused to obtain one fused background feature map; then the pixel values of the pixels of the fused foreground feature map and of the fused background feature map are normalized to obtain the target area mask map and the background area mask map.
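The normalization that turns the two fused feature maps into mask maps can be sketched as follows. A per-pixel two-class softmax is assumed here as the concrete form of the normalization; the patent refers to FIG. 5 for the details, which are not reproduced in this section.

```python
import numpy as np

def masks_from_features(fg, bg):
    """Per-pixel two-class softmax over a foreground and a background
    feature map, yielding complementary probability mask maps."""
    e_fg, e_bg = np.exp(fg), np.exp(bg)
    total = e_fg + e_bg
    return e_fg / total, e_bg / total  # target mask map, background mask map

fg = np.array([[2.0, -1.0], [0.0, 3.0]])   # fused foreground feature map
bg = np.array([[0.0,  1.0], [0.0, -3.0]])  # fused background feature map
target_mask, background_mask = masks_from_features(fg, bg)
```

By construction the two mask maps sum to 1 at every pixel, matching the requirement that each pixel's target-area and background-area probabilities are complementary.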
Alternatively, when performing step S72 above, for each of the multiple image features of different scales obtained in step S71, multiple types of upsampling operations may be performed on that image feature simultaneously, so that the advantages of multiple upsampling methods are combined and the accuracy of image segmentation is improved.
The foregoing embodiments describe the application process of the image segmentation method. In some embodiments, this process involves the feature extraction module CNN and the image segmentation module. The following embodiments describe the training process of each module. It should be understood that the implementation of the image segmentation method does not necessarily depend on the modules described above, and the use of these modules should not be understood as limiting the present application.
Referring to FIG. 10, FIG. 10 is a flowchart of model training according to an embodiment of the present application. As shown in FIG. 10, the training process includes the following steps:
Step S10-1: obtain a sample image carrying a target area mask annotation map and a background area mask annotation map, where the pixel value of pixels in the target area of the target area mask annotation map is a first pixel value and the pixel value of pixels in the background area is a second pixel value, while the pixel value of pixels in the target area of the background area mask annotation map is the second pixel value and the pixel value of pixels in the background area is the first pixel value.
The first pixel value and the second pixel value are different. In a possible implementation, the first pixel value is 1 and the second pixel value is 0.
For example, multiple sample fingerprint images are obtained, and for each sample fingerprint image, a target area mask annotation map and a background area mask annotation map are generated according to the fingerprint area and the background area in that image. In the target area mask annotation map, pixels in the fingerprint area have a pixel value of 1 and pixels in the background area have a pixel value of 0; in the background area mask annotation map, pixels in the fingerprint area have a pixel value of 0 and pixels in the background area have a pixel value of 1.
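The annotation rule above can be sketched as follows: given a boolean map marking the fingerprint (target) region, the two annotation maps are exact complements, using 1 as the first pixel value and 0 as the second.

```python
import numpy as np

def annotation_masks(target_region):
    """Build the two complementary mask annotation maps from a boolean
    target-region map (True = fingerprint pixel)."""
    target = target_region.astype(np.uint8)   # 1 in fingerprint area, 0 elsewhere
    background = 1 - target                   # 1 in background area, 0 elsewhere
    return target, background

region = np.array([[True, True],
                   [False, True]])
target_annotation, background_annotation = annotation_masks(region)
```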
To further broaden the applicability of the model, the sample fingerprint images may come from multiple scenarios, for example: sample fingerprint images collected in cold weather, sample fingerprint images collected with wet fingertips, and sample fingerprint images collected with fingertips stained with inkpad.
To make the model applicable to terminal devices such as mobile phones and tablet computers, it should be considered that the fingerprint collection area reserved on such devices is small, usually smaller than the area of a fingertip. For this reason, the sample fingerprint images may be cropped. For example, an original sample fingerprint image with a resolution of H×W is cropped to an H/2×W/2 sample fingerprint image, and an H/2×W/2 target area mask annotation map and background area mask annotation map are then generated for the cropped image.
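The cropping step can be sketched as follows. The patent only states that an H×W sample is cropped to H/2×W/2; cropping around the image center is an assumption made here for illustration.

```python
import numpy as np

def center_crop_half(img):
    """Crop an HxW sample to H/2 x W/2 around the center, matching the
    smaller fingerprint-sensor area of mobile terminal devices."""
    h, w = img.shape
    top, left = h // 4, w // 4
    return img[top:top + h // 2, left:left + w // 2]

sample = np.zeros((8, 6))          # an 8x6 stand-in for an HxW sample image
cropped = center_crop_half(sample)  # -> 4x3
```

The same crop would be applied to the corresponding annotation maps so that image and labels stay aligned.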
Step S10-2: input the sample image into a preset model, perform feature extraction on the sample image through the preset model to obtain image features, and perform an upsampling operation on the image features through the preset model to obtain a foreground prediction feature map and a background prediction feature map.
For example, the structure of the preset model may follow the network structure shown in FIG. 2 or FIG. 8, where the image segmentation module may include one or more upsampling paths. If the image segmentation module includes multiple upsampling paths, the paths differ from one another.
Step S10-3: perform a normalization operation on the pixel values of the foreground prediction feature map and of the background prediction feature map to obtain a target area mask prediction map and a background area mask prediction map.
For the specific manner of the normalization operation, reference may be made to the explanation of step S13 above and to the content shown in FIG. 5; details are not repeated here.
Step S10-4: update the preset model according to the target area mask prediction map, the background area mask prediction map, the target area mask annotation map, and the background area mask annotation map.
For example, the cross-entropy loss between the target area mask prediction map and the target area mask annotation map may be calculated, and each module of the preset model is updated using this loss. Alternatively, the cross-entropy loss between the background area mask prediction map and the background area mask annotation map may be calculated and used to update each module of the preset model. Or the cross-entropy loss between the target area mask prediction map and the target area mask annotation map and the cross-entropy loss between the background area mask prediction map and the background area mask annotation map may both be calculated, the average of the two losses is then computed, and each module of the preset model is updated using that average loss value.
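The third option above can be sketched as follows. Per-pixel binary cross-entropy is assumed as the concrete form of the loss; the patent names cross-entropy but does not spell out the formula.

```python
import numpy as np

def cross_entropy(pred, label, eps=1e-7):
    """Per-pixel binary cross-entropy between a predicted mask map and
    its 0/1 annotation map, averaged over all pixels."""
    pred = np.clip(pred, eps, 1.0 - eps)   # guard against log(0)
    return -np.mean(label * np.log(pred) + (1 - label) * np.log(1 - pred))

target_pred = np.array([[0.9, 0.2], [0.1, 0.8]])  # target mask prediction map
target_gt   = np.array([[1.0, 0.0], [0.0, 1.0]])  # target mask annotation map
bg_pred, bg_gt = 1.0 - target_pred, 1.0 - target_gt  # complementary background maps
avg_loss = 0.5 * (cross_entropy(target_pred, target_gt)
                  + cross_entropy(bg_pred, bg_gt))
```

Because the background maps are exact complements of the target maps in this example, the two losses coincide; with independently predicted maps the average genuinely combines both supervision signals.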
In addition, to simplify the preset model so that it can be deployed on devices with limited computing power such as mobile phones and attendance machines, the parameters of the preset model may be changed from floating-point type to integer type to quantize the model. The preset model before quantization is then used as the teacher model and the quantized preset model as the student model; a loss function is established at the output layers of the two models, and the quantized preset model is retrained.
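The distillation step can be sketched as follows. The float teacher and the quantized student are run on the same input and a loss is built at their output layers; mean-squared error between the two mask predictions is assumed here, since the patent does not fix the form of the output-layer loss.

```python
import numpy as np

def output_distillation_loss(student_out, teacher_out):
    """Output-layer loss between the quantized student's and the float
    teacher's mask prediction maps (MSE assumed for illustration)."""
    return np.mean((student_out - teacher_out) ** 2)

teacher_masks = np.array([[0.9, 0.1], [0.2, 0.8]])  # float teacher output
student_masks = np.array([[0.8, 0.2], [0.3, 0.7]])  # quantized student output
loss = output_distillation_loss(student_masks, teacher_masks)
```

Minimizing this loss during retraining pushes the quantized model's outputs back toward the full-precision model's behavior.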
Based on the same inventive concept, an embodiment of the present application provides an image segmentation apparatus. Referring to FIG. 11, FIG. 11 is a schematic diagram of an image segmentation apparatus according to an embodiment of the present application. As shown in FIG. 11, the apparatus includes:
a feature extraction module 1101, configured to perform feature extraction on an image to be segmented to obtain image features of the image to be segmented;
an upsampling module 1102, configured to perform an upsampling operation on the image features to obtain a foreground feature map and a background feature map, where each pixel in the foreground feature map corresponds one-to-one to a pixel in the image to be segmented, the pixel value of each pixel in the foreground feature map represents the possibility that the corresponding pixel in the image to be segmented belongs to the target area, each pixel in the background feature map corresponds one-to-one to a pixel in the image to be segmented, and the pixel value of each pixel in the background feature map represents the possibility that the corresponding pixel in the image to be segmented belongs to the background area;
a normalization module 1103, configured to normalize the pixel values of the pixels of the foreground feature map and of the background feature map to obtain a target area mask map and a background area mask map, where the pixel value of each pixel in the target area mask map represents the probability that the corresponding pixel in the image to be segmented belongs to the target area, and the pixel value of each pixel in the background area mask map represents the probability that the corresponding pixel in the image to be segmented belongs to the background area; and
a segmentation module 1104, configured to segment the image to be segmented according to the target area mask map and the background area mask map.
Optionally, the feature extraction module is specifically configured to perform feature extraction operations of multiple scales on the image to be segmented to obtain multiple image features of different scales of the image to be segmented;
and the upsampling module is specifically configured to perform upsampling operations on the multiple image features of different scales respectively, to obtain a foreground feature map and a background feature map corresponding to each of the multiple image features.
Optionally, the normalization module includes:
a feature map fusion submodule, configured to fuse the foreground feature maps corresponding to the multiple image features to obtain one fused foreground feature map, and to fuse the background feature maps corresponding to the multiple image features to obtain one fused background feature map; and
a normalization submodule, configured to normalize the pixel values of the pixels of the fused foreground feature map and of the fused background feature map to obtain the target area mask map and the background area mask map.
Optionally, the feature map fusion submodule includes:
a first feature map superimposing subunit, configured to superimpose multiple foreground feature maps according to their feature depth order;
a first feature map convolution subunit, configured to convolve the superimposed foreground feature maps to obtain the fused foreground feature map;
a second feature map superimposing subunit, configured to superimpose multiple background feature maps according to their feature depth order; and
a second feature map convolution subunit, configured to convolve the superimposed background feature maps to obtain the fused background feature map.
Optionally, the upsampling module is specifically configured to perform upsampling processing on the image features through multiple upsampling paths, respectively obtaining one foreground feature map and one background feature map output by each upsampling path, where the upsampling methods corresponding to the multiple upsampling paths differ from one another;
and the normalization module includes:
a feature map fusion submodule, configured to fuse the multiple foreground feature maps output by the multiple upsampling paths to obtain one fused foreground feature map, and to fuse the multiple background feature maps output by the multiple upsampling paths to obtain one fused background feature map; and
a normalization submodule, configured to normalize the pixel values of the pixels of the fused foreground feature map and of the fused background feature map to obtain the target area mask map and the background area mask map.
The segmentation module includes:
a target area segmentation submodule, configured to, for each pixel in the target area mask map, determine that the corresponding pixel in the image to be segmented belongs to the target area when the pixel value of that pixel is greater than a first preset threshold, and to segment the pixels of the target area from the image to be segmented;
and/or a background area segmentation submodule, configured to, for each pixel in the background area mask map, determine that the corresponding pixel in the image to be segmented belongs to the background area when the pixel value of that pixel is greater than a second preset threshold, and to segment the pixels of the background area from the image to be segmented.
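The two segmentation submodules can be sketched as follows. The thresholds of 0.5 are illustrative only; the patent leaves the first and second preset threshold values open.

```python
import numpy as np

def segment(image, target_mask, background_mask, t1=0.5, t2=0.5):
    """Keep the pixels whose target-area (resp. background-area) mask
    probability exceeds the preset threshold; other pixels are zeroed."""
    target_pixels = np.where(target_mask > t1, image, 0)
    background_pixels = np.where(background_mask > t2, image, 0)
    return target_pixels, background_pixels

img = np.array([[10, 20],
                [30, 40]])
t_mask = np.array([[0.9, 0.2],
                   [0.4, 0.8]])     # target area mask map
b_mask = 1.0 - t_mask              # background area mask map
target_px, background_px = segment(img, t_mask, b_mask)
```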
Optionally, the apparatus further includes:
a sample image obtaining module, configured to obtain a sample image carrying a target area mask annotation map and a background area mask annotation map, where the pixel value of pixels in the target area of the target area mask annotation map is a first pixel value and the pixel value of pixels in the background area is a second pixel value, while the pixel value of pixels in the target area of the background area mask annotation map is the second pixel value and the pixel value of pixels in the background area is the first pixel value;
a prediction feature map obtaining module, configured to perform feature extraction on the sample image through the preset model to obtain image features, and to perform an upsampling operation on the image features through the preset model to obtain a foreground prediction feature map and a background prediction feature map;
a mask prediction map obtaining module, configured to perform a normalization operation on the pixel values of the foreground prediction feature map and of the background prediction feature map to obtain a target area mask prediction map and a background area mask prediction map; and
a model update module, configured to update the preset model according to the target area mask prediction map, the background area mask prediction map, the target area mask annotation map, and the background area mask annotation map.
With the image segmentation apparatus provided by the present application, feature extraction is performed on an image to be segmented to obtain its image features; an upsampling operation is then performed on the image features to obtain a foreground feature map and a background feature map; the pixel values of the pixels of the foreground feature map and of the background feature map are then normalized to obtain a target area mask map and a background area mask map; and finally the image to be segmented is segmented according to the target area mask map and the background area mask map. The image segmentation apparatus provided by the present application therefore has at least the following advantages:
Because each pixel of the foreground feature map corresponds one-to-one to a pixel of the image to be segmented and the pixel value of each pixel in the foreground feature map represents the possibility that the corresponding pixel in the image to be segmented belongs to the target area, while each pixel of the background feature map likewise corresponds one-to-one to a pixel of the image to be segmented and the pixel value of each pixel in the background feature map represents the possibility that the corresponding pixel belongs to the background area, after the pixel values of the foreground feature map and of the background feature map are normalized, the pixel value of each pixel in the resulting target area mask map represents the probability that the corresponding pixel in the image to be segmented belongs to the target area, and the pixel value of each pixel in the background area mask map represents the probability that the corresponding pixel belongs to the background area.
In this way, according to the target area mask map and the background area mask map, the image to be segmented is segmented based on the probability that each pixel belongs to the target area and/or the background area, so that a more accurate segmentation result is obtained.
Based on the same inventive concept, another embodiment of the present application provides a readable storage medium on which a computer program is stored; when the program is executed by a processor, the steps of the image segmentation method described in any of the foregoing embodiments of the present application are implemented.
The computer-readable storage medium provided by the embodiments of the present application includes, but is not limited to, any type of disk (including floppy disks, hard disks, optical disks, CD-ROMs, and magneto-optical disks), ROM (Read-Only Memory), RAM (Random Access Memory), EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash memory, magnetic cards, or optical cards. That is, a readable storage medium includes any medium that stores or transmits information in a form readable by a device (for example, a computer).
Based on the same inventive concept, another embodiment of the present application provides an electronic device including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the program, implements the steps of the image segmentation method described in any of the foregoing embodiments of the present application.
As for the apparatus embodiments, since they are basically similar to the method embodiments, the description is relatively brief; for relevant details, reference may be made to the description of the method embodiments.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar among the embodiments, reference may be made to one another.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, an apparatus, or a computer program product. Therefore, the embodiments of the present application may take the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware. Moreover, the embodiments of the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The embodiments of the present application are described with reference to the flowcharts and/or block diagrams of the methods, terminal devices (systems), and computer program products according to the embodiments of the present application. It should be understood that each process and/or block in the flowcharts and/or block diagrams, and combinations of processes and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions.
Therefore, an embodiment of the present application further provides a computer program including computer-readable code which, when run on a computing and processing device, causes that device to execute any of the image segmentation methods set forth in any embodiment of the present application. Specifically, these computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing terminal device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing terminal device produce an apparatus for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing terminal device to operate in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing terminal device, so that a series of operation steps are executed on the computer or other programmable terminal device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable terminal device provide steps for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present application have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn the basic inventive concept. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications falling within the scope of the embodiments of the present application.
Finally, it should also be noted that, herein, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between those entities or operations. Moreover, the terms "comprise", "include", or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or terminal device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or terminal device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or terminal device that includes the element.
Those skilled in the art can understand that the various operations, methods, and steps, measures, and solutions in the processes discussed in the present application may be alternated, changed, combined, or deleted. Further, other steps, measures, and solutions in the various operations, methods, and processes discussed in the present application may also be alternated, changed, rearranged, decomposed, combined, or deleted. Further, steps, measures, and solutions in the prior art corresponding to the various operations, methods, and processes disclosed in the present application may also be alternated, changed, rearranged, decomposed, combined, or deleted.
The image segmentation method, apparatus, electronic device, and readable storage medium provided by the present application have been described in detail above. Specific examples are used herein to illustrate the principles and implementations of the present application, and the description of the above embodiments is only intended to help understand the method of the present application and its core ideas. Meanwhile, for those of ordinary skill in the art, there will be changes in the specific implementation and the scope of application in accordance with the ideas of the present application. In summary, the content of this specification should not be construed as limiting the present application. Those skilled in the art can make various changes and modifications without departing from the spirit and scope of the present disclosure, and such changes and modifications will fall within the protection scope of the present invention.

Claims (11)

  1. An image segmentation method, characterized in that the method comprises:
    performing feature extraction on an image to be segmented, to obtain image features of the image to be segmented;
    performing an up-sampling operation on the image features, to obtain a foreground feature map and a background feature map, wherein each pixel in the foreground feature map corresponds one-to-one to a pixel in the image to be segmented, the pixel value of each pixel in the foreground feature map characterizes the likelihood that the corresponding pixel in the image to be segmented belongs to a target region, each pixel in the background feature map corresponds one-to-one to a pixel in the image to be segmented, and the pixel value of each pixel in the background feature map characterizes the likelihood that the corresponding pixel in the image to be segmented belongs to a background region;
    normalizing the pixel values of the pixels of the foreground feature map and the pixel values of the pixels of the background feature map, to obtain a target-region mask map and a background-region mask map, wherein the pixel value of each pixel in the target-region mask map represents the probability that the corresponding pixel in the image to be segmented belongs to the target region, and the pixel value of each pixel in the background-region mask map represents the probability that the corresponding pixel in the image to be segmented belongs to the background region;
    segmenting the image to be segmented according to the target-region mask map and the background-region mask map.
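Read as an algorithm, the normalization step of claim 1 is commonly realized as a two-way softmax over each pixel's foreground and background scores, followed by mask-based segmentation. A minimal NumPy sketch under those assumptions (the claim fixes neither the softmax nor the 0.5 decision threshold used below):

```python
import numpy as np

def normalize_masks(fg_feat, bg_feat):
    """Jointly normalize the foreground and background feature maps so that,
    per pixel, the two resulting values are probabilities summing to 1
    (the target-region and background-region mask maps of claim 1)."""
    m = np.maximum(fg_feat, bg_feat)            # stabilizer: subtract per-pixel max
    e_fg = np.exp(fg_feat - m)
    e_bg = np.exp(bg_feat - m)
    z = e_fg + e_bg
    return e_fg / z, e_bg / z

def segment(image, fg_feat, bg_feat, threshold=0.5):
    """Cut the target-region pixels out of the image using the normalized mask."""
    target_mask, background_mask = normalize_masks(fg_feat, bg_feat)
    return np.where(target_mask > threshold, image, 0)

# toy example: a 4x4 image whose left half scores high as foreground
image = np.arange(16, dtype=float).reshape(4, 4) + 1.0
fg = np.zeros((4, 4)); fg[:, :2] = 3.0          # strong foreground evidence on the left
bg = np.zeros((4, 4))
target = segment(image, fg, bg)                 # keeps only the left two columns
```

The two mask maps sum to 1 at every pixel, which matches the claim's reading of the mask values as complementary probabilities.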
  2. The method according to claim 1, wherein performing feature extraction on the image to be segmented to obtain the image features of the image to be segmented comprises:
    performing feature extraction operations at multiple scales on the image to be segmented, to obtain image features of the image to be segmented at multiple different scales;
    and wherein performing the up-sampling operation on the image features to obtain the foreground feature map and the background feature map comprises:
    performing an up-sampling operation separately on each of the image features at the multiple different scales, to obtain a foreground feature map and a background feature map corresponding to each of the multiple image features.
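One way to picture claim 2's multi-scale step: extract a feature map at each of several scales, then up-sample each one back to the input resolution. In the sketch below, plain average pooling stands in for the (unspecified) convolutional feature extractor, and nearest-neighbor up-sampling is an illustrative choice; both are assumptions, not claim language:

```python
import numpy as np

def avg_pool(x, k):
    """k-fold average pooling, a stand-in feature extractor at scale 1/k."""
    h, w = x.shape
    return x.reshape(h // k, k, w // k, k).mean(axis=(1, 3))

def upsample_nearest(x, k):
    """k-fold nearest-neighbor up-sampling back to the input resolution."""
    return np.repeat(np.repeat(x, k, axis=0), k, axis=1)

def multi_scale_maps(image, scales=(1, 2, 4)):
    """One up-sampled feature map per scale, all aligned pixel-for-pixel with
    the image to be segmented (side lengths must divide by every scale)."""
    maps = []
    for k in scales:
        feat = image if k == 1 else avg_pool(image, k)
        maps.append(feat if k == 1 else upsample_nearest(feat, k))
    return maps

maps = multi_scale_maps(np.arange(64, dtype=float).reshape(8, 8))
```

Each returned map has the input's resolution, which is what lets the later claims fuse them pixel-for-pixel.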
  3. The method according to claim 2, wherein normalizing the pixel values of the pixels of the foreground feature map and the pixel values of the pixels of the background feature map to obtain the target-region mask map and the background-region mask map comprises:
    fusing the foreground feature maps corresponding to the multiple image features to obtain one fused foreground feature map, and fusing the background feature maps corresponding to the multiple image features to obtain one fused background feature map;
    normalizing the pixel values of the pixels of the fused foreground feature map and the pixel values of the pixels of the fused background feature map, to obtain the target-region mask map and the background-region mask map.
  4. The method according to claim 3, wherein fusing the foreground feature maps corresponding to the multiple image features to obtain one fused foreground feature map comprises:
    superimposing the multiple foreground feature maps according to their feature-depth order;
    convolving the superimposed foreground feature maps to obtain the fused foreground feature map;
    and wherein fusing the background feature maps corresponding to the multiple image features to obtain one fused background feature map comprises:
    superimposing the multiple background feature maps according to their feature-depth order;
    convolving the superimposed background feature maps to obtain the fused background feature map.
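In the simplest (1×1 convolution) case, claim 4's "superimpose in feature-depth order, then convolve" amounts to stacking the maps along a depth axis and taking a learned weighted sum across that axis. A sketch with placeholder weights standing in for trained convolution parameters:

```python
import numpy as np

def fuse(maps, weights=None):
    """Stack maps along a new depth axis in the given (feature-depth) order,
    then collapse that axis with a 1x1 convolution, i.e. one weight per map."""
    stack = np.stack(maps, axis=0)               # shape (depth, H, W)
    if weights is None:                          # placeholder for learned weights
        weights = np.full(stack.shape[0], 1.0 / stack.shape[0])
    return np.tensordot(weights, stack, axes=1)  # shape (H, W)

a = np.ones((2, 2))
b = 3.0 * np.ones((2, 2))
fused = fuse([a, b])                             # uniform weights -> mean of the maps
```

The same routine serves for both the foreground and the background branch of the claim, since the two fusions are symmetric.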
  5. The method according to claim 1, wherein performing the up-sampling operation on the image features to obtain the foreground feature map and the background feature map comprises:
    up-sampling the image features separately through multiple up-sampling paths, to obtain one foreground feature map and one background feature map output by each up-sampling path, wherein the up-sampling manners of the multiple up-sampling paths differ from one another;
    and wherein normalizing the pixel values of the pixels of the foreground feature map and the pixel values of the pixels of the background feature map to obtain the target-region mask map and the background-region mask map comprises:
    fusing the multiple foreground feature maps output by the multiple up-sampling paths to obtain one fused foreground feature map, and fusing the multiple background feature maps output by the multiple up-sampling paths to obtain one fused background feature map;
    normalizing the pixel values of the pixels of the fused foreground feature map and the pixel values of the pixels of the fused background feature map, to obtain the target-region mask map and the background-region mask map.
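Claim 5's parallel paths differ only in up-sampling manner; nearest-neighbor and bilinear interpolation are two natural choices (the claim names neither). A sketch of two such paths feeding a simple averaging fusion:

```python
import numpy as np

def up_nearest(x, k):
    """Nearest-neighbor up-sampling: replicate each pixel k times per axis."""
    return np.repeat(np.repeat(x, k, axis=0), k, axis=1)

def up_bilinear(x, k):
    """Bilinear up-sampling with half-pixel sample centers."""
    h, w = x.shape
    ys = np.clip((np.arange(h * k) + 0.5) / k - 0.5, 0, h - 1)
    xs = np.clip((np.arange(w * k) + 0.5) / k - 0.5, 0, w - 1)
    y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
    y1, x1 = np.minimum(y0 + 1, h - 1), np.minimum(x0 + 1, w - 1)
    wy, wx = (ys - y0)[:, None], (xs - x0)[None, :]
    return (x[np.ix_(y0, x0)] * (1 - wy) * (1 - wx)
            + x[np.ix_(y0, x1)] * (1 - wy) * wx
            + x[np.ix_(y1, x0)] * wy * (1 - wx)
            + x[np.ix_(y1, x1)] * wy * wx)

feat = np.array([[0.0, 2.0], [4.0, 6.0]])
paths = [up_nearest(feat, 2), up_bilinear(feat, 2)]  # two differing up-sampling manners
fused = sum(paths) / len(paths)                      # one simple fusion of the path outputs
```

Running the same feature through both paths and fusing the outputs mirrors the claim's structure for one of the two (foreground/background) branches.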
  6. The method according to claim 1, wherein segmenting the image to be segmented according to the target-region mask map and the background-region mask map comprises:
    for each pixel in the target-region mask map, when the pixel value of the pixel is greater than a first preset threshold, determining that the corresponding pixel in the image to be segmented belongs to the target region, and segmenting the pixels of the target region from the image to be segmented;
    and/or, for each pixel in the background-region mask map, when the pixel value of the pixel is greater than a second preset threshold, determining that the corresponding pixel in the image to be segmented belongs to the background region, and segmenting the pixels of the background region from the image to be segmented.
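The "and/or" of claim 6 amounts to two independent threshold tests over the two mask maps. A sketch with both branches active and arbitrary example thresholds (the claim leaves the threshold values open):

```python
import numpy as np

def split(image, target_mask, background_mask, t1=0.5, t2=0.5):
    """Segment out target-region pixels (target mask > t1) and background-region
    pixels (background mask > t2); a pixel passing neither test lands in neither
    output, since the two tests are independent."""
    target_pixels = np.where(target_mask > t1, image, 0)
    background_pixels = np.where(background_mask > t2, image, 0)
    return target_pixels, background_pixels

image = np.full((2, 2), 7.0)
t_mask = np.array([[0.9, 0.1], [0.8, 0.2]])
tgt, bg = split(image, t_mask, 1.0 - t_mask)   # complementary masks, as after softmax
```

When the masks come from the normalization of claim 1 they are complementary, so with t1 = t2 = 0.5 every pixel falls into exactly one of the two outputs.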
  7. The method according to claim 1, wherein before performing feature extraction on the image to be segmented to obtain the image features of the image to be segmented, the method further comprises:
    obtaining a sample image, the sample image carrying a target-region mask annotation map and a background-region mask annotation map, wherein in the target-region mask annotation map the pixels of the target region have a first pixel value and the pixels of the background region have a second pixel value, and in the background-region mask annotation map the pixels of the target region have the second pixel value and the pixels of the background region have the first pixel value;
    inputting the sample image into a preset model, so as to perform feature extraction on the sample image through the preset model to obtain image features, and to perform an up-sampling operation on the image features through the preset model to obtain a foreground prediction feature map and a background prediction feature map;
    normalizing the respective pixel values of the foreground prediction feature map and the background prediction feature map, to obtain a target-region mask prediction map and a background-region mask prediction map;
    updating the preset model according to the target-region mask prediction map, the background-region mask prediction map, the target-region mask annotation map, and the background-region mask annotation map.
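Claim 7 leaves the update rule open; per-pixel cross-entropy between the mask prediction maps and the mask annotation maps, minimized by gradient descent, is one standard choice. In this reduced sketch the background score is pinned to 0, so the two-way softmax collapses to a sigmoid, and the "preset model" is just the score map itself; the learning rate and iteration count are arbitrary:

```python
import numpy as np

def mask_loss_and_grad(fg_score, target_annot):
    """Cross-entropy between the predicted target-region mask and the binary
    annotation map, plus its gradient with respect to the foreground scores."""
    p = 1.0 / (1.0 + np.exp(-fg_score))       # softmax over {fg, bg=0} = sigmoid
    eps = 1e-12
    loss = -np.mean(target_annot * np.log(p + eps)
                    + (1.0 - target_annot) * np.log(1.0 - p + eps))
    grad = (p - target_annot) / target_annot.size
    return loss, grad

scores = np.zeros((4, 4))                     # stand-in for the preset model's output
annot = np.zeros((4, 4)); annot[:2, :] = 1.0  # first pixel value marks the target region
for _ in range(200):                          # crude gradient-descent "model update"
    loss, grad = mask_loss_and_grad(scores, annot)
    scores -= 50.0 * grad
pred = 1.0 / (1.0 + np.exp(-scores))          # learned target-region mask prediction
```

After the updates, the predicted mask approaches the annotation map: high probability where the annotation holds the first pixel value, low probability elsewhere.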
  8. An image segmentation apparatus, characterized in that the apparatus comprises:
    a feature extraction module, configured to perform feature extraction on an image to be segmented to obtain image features of the image to be segmented;
    an up-sampling module, configured to perform an up-sampling operation on the image features to obtain a foreground feature map and a background feature map, wherein each pixel in the foreground feature map corresponds one-to-one to a pixel in the image to be segmented, the pixel value of each pixel in the foreground feature map characterizes the likelihood that the corresponding pixel in the image to be segmented belongs to a target region, each pixel in the background feature map corresponds one-to-one to a pixel in the image to be segmented, and the pixel value of each pixel in the background feature map characterizes the likelihood that the corresponding pixel in the image to be segmented belongs to a background region;
    a normalization module, configured to normalize the pixel values of the pixels of the foreground feature map and the pixel values of the pixels of the background feature map to obtain a target-region mask map and a background-region mask map, wherein the pixel value of each pixel in the target-region mask map represents the probability that the corresponding pixel in the image to be segmented belongs to the target region, and the pixel value of each pixel in the background-region mask map represents the probability that the corresponding pixel in the image to be segmented belongs to the background region;
    a segmentation module, configured to segment the image to be segmented according to the target-region mask map and the background-region mask map.
  9. A computer-readable storage medium on which a computer program is stored, characterized in that, when the computer program is executed by a processor, the steps of the image segmentation method according to any one of claims 1 to 7 are implemented.
  10. An electronic device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that, when the processor executes the computer program, the steps of the image segmentation method according to any one of claims 1 to 7 are implemented.
  11. A computer program comprising computer-readable code which, when run on a computing processing device, causes the computing processing device to execute the image segmentation method according to any one of claims 1 to 7.
PCT/CN2020/113134 2019-12-20 2020-09-03 Image segmentation method and apparatus, electronic device and readable storage medium WO2021120695A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911331052.6 2019-12-20
CN201911331052.6A CN111178211B (en) 2019-12-20 2019-12-20 Image segmentation method, device, electronic equipment and readable storage medium

Publications (1)

Publication Number Publication Date
WO2021120695A1

Family

ID=70652108

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/113134 WO2021120695A1 (en) 2019-12-20 2020-09-03 Image segmentation method and apparatus, electronic device and readable storage medium

Country Status (2)

Country Link
CN (1) CN111178211B (en)
WO (1) WO2021120695A1 (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113744280A (en) * 2021-07-20 2021-12-03 北京旷视科技有限公司 Image processing method, apparatus, device and medium
CN113781505A (en) * 2021-11-08 2021-12-10 深圳市瑞图生物技术有限公司 Chromosome segmentation method, chromosome analyzer, and storage medium
CN114219951A (en) * 2021-11-05 2022-03-22 安徽省配天机器人集团有限公司 Shape matching method, device, computer equipment and storage device
CN114445426A (en) * 2022-01-28 2022-05-06 深圳大学 Method and device for segmenting polyp region in endoscope image and related assembly
CN114581433A (en) * 2022-03-22 2022-06-03 中国工程物理研究院流体物理研究所 Method and system for obtaining metal ball cavity inner surface appearance detection image
CN114612971A (en) * 2022-03-04 2022-06-10 北京百度网讯科技有限公司 Face detection method, model training method, electronic device, and program product
WO2023143178A1 (en) * 2022-01-28 2023-08-03 北京字跳网络技术有限公司 Object segmentation method and apparatus, device and storage medium

Families Citing this family (12)

Publication number Priority date Publication date Assignee Title
CN111178211B (en) * 2019-12-20 2024-01-12 天津极豪科技有限公司 Image segmentation method, device, electronic equipment and readable storage medium
CN111723934B (en) * 2020-06-24 2022-11-01 北京紫光展锐通信技术有限公司 Image processing method and system, electronic device and storage medium
CN113420769A (en) * 2020-11-12 2021-09-21 阿里巴巴集团控股有限公司 Image mask recognition, matting and model training method and device and electronic equipment
CN112465843A (en) * 2020-12-22 2021-03-09 深圳市慧鲤科技有限公司 Image segmentation method and device, electronic equipment and storage medium
CN112651959B (en) * 2020-12-31 2023-08-15 众阳健康科技集团有限公司 CT intracranial hemorrhage detection system based on dynamic map loss neural network
CN113240699B (en) * 2021-05-20 2022-02-08 推想医疗科技股份有限公司 Image processing method and device, model training method and device, and electronic equipment
CN113240696B (en) * 2021-05-20 2022-02-08 推想医疗科技股份有限公司 Image processing method and device, model training method and device, and electronic equipment
CN113379762A (en) * 2021-05-28 2021-09-10 上海商汤智能科技有限公司 Image segmentation method and device, electronic equipment and storage medium
CN113643304B (en) * 2021-08-05 2024-09-06 应急管理部天津消防研究所 Real-time smoke segmentation method integrating multi-resolution characterization
CN113888537B (en) * 2021-12-03 2022-04-12 深圳市网旭科技有限公司 Mask extraction method, device, equipment and storage medium
CN114445447A (en) * 2021-12-27 2022-05-06 天翼云科技有限公司 Image segmentation method, device, equipment and medium
CN115345895B (en) * 2022-10-19 2023-01-06 深圳市壹倍科技有限公司 Image segmentation method and device for visual detection, computer equipment and medium

Citations (5)

Publication number Priority date Publication date Assignee Title
CN106447721A (en) * 2016-09-12 2017-02-22 北京旷视科技有限公司 Image shadow detection method and device
CN109446951A (en) * 2018-10-16 2019-03-08 腾讯科技(深圳)有限公司 Semantic segmentation method, apparatus, equipment and the storage medium of 3-D image
CN110163910A (en) * 2019-03-22 2019-08-23 腾讯科技(深圳)有限公司 Subject localization method, device, computer equipment and storage medium
CN111178211A (en) * 2019-12-20 2020-05-19 北京迈格威科技有限公司 Image segmentation method and device, electronic equipment and readable storage medium
CN111260666A (en) * 2020-01-19 2020-06-09 上海商汤临港智能科技有限公司 Image processing method and device, electronic equipment and computer readable storage medium

Family Cites Families (14)

Publication number Priority date Publication date Assignee Title
ES2209041T3 (en) * 1997-12-03 2004-06-16 Fuji Photo Film Co., Ltd. PROCEDURE AND APPARATUS FOR THE INTERPOLATION OF IMAGE SIGNALS.
EP2669865A1 (en) * 2012-05-31 2013-12-04 Thomson Licensing Segmentation of a foreground object in a 3D scene
CN107808111B (en) * 2016-09-08 2021-07-09 北京旷视科技有限公司 Method and apparatus for pedestrian detection and attitude estimation
KR102070956B1 (en) * 2016-12-20 2020-01-29 서울대학교산학협력단 Apparatus and method for processing image
CN109034162B (en) * 2018-07-13 2022-07-26 南京邮电大学 Image semantic segmentation method
CN109190626A (en) * 2018-07-27 2019-01-11 国家新闻出版广电总局广播科学研究院 A kind of semantic segmentation method of the multipath Fusion Features based on deep learning
CN109242869B (en) * 2018-09-21 2021-02-02 安徽科大讯飞医疗信息技术有限公司 Image instance segmentation method, device, equipment and storage medium
CN109492638A (en) * 2018-11-07 2019-03-19 北京旷视科技有限公司 Method for text detection, device and electronic equipment
CN109493350B (en) * 2018-11-09 2020-09-22 重庆中科云从科技有限公司 Portrait segmentation method and device
CN109977793B (en) * 2019-03-04 2022-03-04 东南大学 Roadside image pedestrian segmentation method based on variable-scale multi-feature fusion convolutional network
CN110163878A (en) * 2019-05-28 2019-08-23 四川智盈科技有限公司 A kind of image, semantic dividing method based on dual multiple dimensioned attention mechanism
CN110349167A (en) * 2019-07-10 2019-10-18 北京悉见科技有限公司 A kind of image instance dividing method and device
CN110517267B (en) * 2019-08-02 2022-05-10 Oppo广东移动通信有限公司 Image segmentation method and device and storage medium
CN110569731B (en) * 2019-08-07 2023-04-14 北京旷视科技有限公司 Face recognition method and device and electronic equipment


Cited By (10)

Publication number Priority date Publication date Assignee Title
CN113744280A (en) * 2021-07-20 2021-12-03 北京旷视科技有限公司 Image processing method, apparatus, device and medium
CN114219951A (en) * 2021-11-05 2022-03-22 安徽省配天机器人集团有限公司 Shape matching method, device, computer equipment and storage device
CN113781505A (en) * 2021-11-08 2021-12-10 深圳市瑞图生物技术有限公司 Chromosome segmentation method, chromosome analyzer, and storage medium
CN113781505B (en) * 2021-11-08 2022-11-18 深圳市瑞图生物技术有限公司 Chromosome segmentation method, chromosome analyzer, and storage medium
CN114445426A (en) * 2022-01-28 2022-05-06 深圳大学 Method and device for segmenting polyp region in endoscope image and related assembly
CN114445426B (en) * 2022-01-28 2022-08-26 深圳大学 Method and device for segmenting polyp region in endoscope image and related assembly
WO2023143178A1 (en) * 2022-01-28 2023-08-03 北京字跳网络技术有限公司 Object segmentation method and apparatus, device and storage medium
CN114612971A (en) * 2022-03-04 2022-06-10 北京百度网讯科技有限公司 Face detection method, model training method, electronic device, and program product
CN114581433A (en) * 2022-03-22 2022-06-03 中国工程物理研究院流体物理研究所 Method and system for obtaining metal ball cavity inner surface appearance detection image
CN114581433B (en) * 2022-03-22 2023-09-19 中国工程物理研究院流体物理研究所 Method and system for acquiring surface morphology detection image in metal ball cavity

Also Published As

Publication number Publication date
CN111178211A (en) 2020-05-19
CN111178211B (en) 2024-01-12

Similar Documents

Publication Publication Date Title
WO2021120695A1 (en) Image segmentation method and apparatus, electronic device and readable storage medium
CN110517278B (en) Image segmentation and training method and device of image segmentation network and computer equipment
US20220189142A1 (en) Ai-based object classification method and apparatus, and medical imaging device and storage medium
US11830230B2 (en) Living body detection method based on facial recognition, and electronic device and storage medium
JP2022534337A (en) Video target tracking method and apparatus, computer apparatus, program
US11475681B2 (en) Image processing method, apparatus, electronic device and computer readable storage medium
CN108345892B (en) Method, device and equipment for detecting significance of stereo image and storage medium
CN111275034B (en) Method, device, equipment and storage medium for extracting text region from image
JP7246104B2 (en) License plate identification method based on text line identification
CN110610526B (en) Method for segmenting monocular image and rendering depth of field based on WNET
CN107633237B (en) Image background segmentation method, device, equipment and medium
CN110781980B (en) Training method of target detection model, target detection method and device
WO2023284182A1 (en) Training method for recognizing moving target, method and device for recognizing moving target
CN112836625A (en) Face living body detection method and device and electronic equipment
CN111382647B (en) Picture processing method, device, equipment and storage medium
CN112101262A (en) Multi-feature fusion sign language recognition method and network model
CN113344000A (en) Certificate copying and recognizing method and device, computer equipment and storage medium
CN112001285B (en) Method, device, terminal and medium for processing beauty images
CN112417947B (en) Method and device for optimizing key point detection model and detecting face key points
CN115115552B (en) Image correction model training method, image correction device and computer equipment
CN113034514A (en) Sky region segmentation method and device, computer equipment and storage medium
CN110991412A (en) Face recognition method and device, storage medium and electronic equipment
CN112766028B (en) Face fuzzy processing method and device, electronic equipment and storage medium
CN114821736A (en) Multi-modal face recognition method, device, equipment and medium based on contrast learning
CN112712468B (en) Iris image super-resolution reconstruction method and computing device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20901405; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20901405; Country of ref document: EP; Kind code of ref document: A1)