CN117372699A - Image segmentation method and device based on cascade model, medium and electronic equipment - Google Patents


Info

Publication number
CN117372699A
CN117372699A
Authority
CN
China
Prior art keywords
segmentation
image
target
segmentation result
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311435602.5A
Other languages
Chinese (zh)
Inventor
杨志雄
杨延展
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zitiao Network Technology Co Ltd
Original Assignee
Beijing Zitiao Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zitiao Network Technology Co Ltd
Priority to CN202311435602.5A
Publication of CN117372699A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Abstract

The disclosure relates to an image segmentation method and apparatus based on a cascade model, a medium, and an electronic device. The image segmentation method includes: acquiring an image to be segmented, the image to be segmented being a low-resolution image; inputting the image to be segmented into a first image segmentation model to obtain a first segmentation result corresponding to the image to be segmented; and repeatedly executing a correction process until the image resolution of the image corresponding to the obtained third segmentation result reaches a target resolution. The correction process includes: upsampling the first segmentation result to obtain a second segmentation result, correcting a target segmentation region in the second segmentation result through a second image segmentation model, obtaining a third segmentation result from the corrected segmentation region and the second segmentation result, and taking the third segmentation result as a new first segmentation result, where the image corresponding to the target segmentation region is a low-resolution image. A high-resolution segmentation result is thus obtained with low memory consumption and a low amount of computation.

Description

Image segmentation method and device based on cascade model, medium and electronic equipment
Technical Field
The present disclosure relates to the technical field of image processing, and in particular to an image segmentation method and apparatus based on a cascade model, a medium, and an electronic device.
Background
Convolutional neural networks are widely used and perform excellently in the field of image segmentation. However, such models have many parameters, high computational complexity, and high memory usage, which is particularly evident when the image resolution is high.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
In a first aspect, the present disclosure provides a cascade model-based image segmentation method, including:
acquiring an image to be segmented, wherein the image to be segmented is a low-resolution image;
inputting the image to be segmented into a first image segmentation model to obtain a first segmentation result corresponding to the image to be segmented;
repeatedly executing the correction process until the image resolution of the image corresponding to the obtained third segmentation result reaches the target resolution; the correction process includes:
upsampling the first segmentation result to obtain a second segmentation result, correcting a target segmentation region in the second segmentation result through a second image segmentation model, obtaining a third segmentation result from the corrected segmentation region and the second segmentation result, and taking the third segmentation result as a new first segmentation result, where the image corresponding to the target segmentation region is a low-resolution image.
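As an illustrative aid (not part of the patent text), the correction loop described in the first aspect can be sketched in Python. The nearest-neighbour 2x upsampling, the placeholder model callables, and the use of the mask height as a proxy for resolution are all simplifying assumptions; the patent itself contemplates bilinear upsampling and U-Net-style models.

```python
def upsample2x(mask):
    """Nearest-neighbour 2x upsampling of a 2-D mask (a simple stand-in
    for the bilinear upsampling described in the patent)."""
    return [[v for v in row for _ in (0, 1)] for row in mask for _ in (0, 1)]

def cascade_segment(image, coarse_model, refine_model, target_size):
    # First segmentation result from the first (low-resolution) model.
    result = coarse_model(image)
    # Repeat the correction process until the target resolution is reached.
    while len(result) < target_size:
        second = upsample2x(result)       # second segmentation result
        corrected = refine_model(second)  # correct target regions (2nd model)
        result = corrected                # third result becomes new first result
    return result
```

With an identity refinement model, the loop simply doubles the resolution until the target size is reached; a real refinement model would re-segment the boundary regions at each scale.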
In a second aspect, the present disclosure provides an image segmentation apparatus based on a cascade model, including:
the first acquisition module is used for acquiring an image to be segmented, wherein the image to be segmented is a low-resolution image;
the first segmentation module is used for inputting the image to be segmented into a first image segmentation model to obtain a first segmentation result corresponding to the image to be segmented;
the second segmentation module is used for repeatedly executing the correction process under the condition that the image resolution of the image corresponding to the obtained third segmentation result does not reach the target resolution;
the correction process includes: upsampling the first segmentation result to obtain a second segmentation result, correcting a target segmentation region in the second segmentation result through a second image segmentation model, obtaining a third segmentation result from the corrected segmentation region and the second segmentation result, and taking the third segmentation result as a new first segmentation result, where the image corresponding to the target segmentation region is a low-resolution image.
In a third aspect, the present disclosure provides a computer readable medium having stored thereon a computer program which, when executed by a processing device, implements the steps of the image segmentation method described in the first aspect.
In a fourth aspect, the present disclosure provides an electronic device comprising:
a storage device having a computer program stored thereon;
processing means for executing said computer program in said storage means to carry out the steps of said image segmentation method in the first aspect.
According to the above technical solution, the first image segmentation model and the second image segmentation model form a cascade model, and the inputs of both models are low-resolution images, so both models are trained on low-resolution images and therefore have a small memory footprint and a small amount of computation. During upsampling, the result may not yet reach the target resolution, and upsampling may produce blurred boundaries; the second image segmentation model in the cascade model corrects the target segmentation regions of the higher-resolution image, so the blurred boundaries are progressively optimized from low resolution to high resolution. A high-resolution segmentation result is thus obtained with low memory consumption and a low amount of computation, and the cascade model is applicable to image segmentation at different target resolutions.
Additional features and advantages of the present disclosure will be set forth in the detailed description which follows.
Drawings
The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. The same or similar reference numbers will be used throughout the drawings to refer to the same or like elements. It should be understood that the figures are schematic and that elements and components are not necessarily drawn to scale. In the drawings:
Fig. 1 is a flowchart illustrating a cascade model-based image segmentation method according to an exemplary embodiment of the present disclosure.
Fig. 2 shows segmented images at two different resolutions according to an exemplary embodiment of the present disclosure.
Fig. 3 is a schematic diagram illustrating the division of a first segmentation result into target segmentation regions according to an exemplary embodiment of the present disclosure.
Fig. 4 is a flowchart illustrating the output of a target image according to an exemplary embodiment of the present disclosure.
Fig. 5 is a block diagram illustrating an image segmentation apparatus based on a cascade model according to an exemplary embodiment of the present disclosure.
Fig. 6 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order and/or performed in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "including" and variations thereof as used herein are open-ended, i.e., "including, but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Related definitions of other terms will be given in the description below.
It should be noted that the terms "first," "second," and the like in this disclosure are merely used to distinguish between different devices, modules, or units and are not used to define an order or interdependence of functions performed by the devices, modules, or units.
It should be noted that references to "a" and "a plurality" in this disclosure are illustrative rather than limiting, and those of ordinary skill in the art will appreciate that they should be understood as "one or more" unless the context clearly indicates otherwise.
The names of messages or information interacted between the various devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.
It will be appreciated that, before the technical solutions disclosed in the embodiments of the present disclosure are used, the user should be informed, in an appropriate manner and in accordance with relevant laws and regulations, of the type, scope of use, and usage scenarios of the personal information involved, and the user's authorization should be obtained.
For example, in response to receiving an active request from a user, a prompt is sent to the user to explicitly inform the user that the operation requested will require the acquisition and use of the user's personal information. The user can thus autonomously decide, according to the prompt, whether to provide personal information to the software or hardware (such as an electronic device, application, server, or storage medium) that executes the operations of the technical solution of the present disclosure.
As an optional but non-limiting implementation, in response to receiving an active request from the user, the prompt information may be sent to the user, for example, in a popup window, in which the prompt information may be presented as text. In addition, the popup window may carry a selection control for the user to choose "agree" or "disagree" to providing personal information to the electronic device.
It will be appreciated that the above-described notification and user authorization process is merely illustrative and not limiting of the implementations of the present disclosure, and that other ways of satisfying relevant legal regulations may be applied to the implementations of the present disclosure.
Meanwhile, it can be understood that the data (including but not limited to the data itself, the acquisition or the use of the data) related to the technical scheme should conform to the requirements of the corresponding laws and regulations and related regulations.
In the related art, a high-resolution image segmentation result is generally obtained in one of the following ways:
First, upsampling: a low-resolution image is upsampled to a high-resolution image, which is then processed with a conventional image segmentation model. However, simple upsampling generally cannot recover all the details of the original image, and the upsampled image may introduce blurring or artifacts.
Second, transfer learning: a pre-trained model is fine-tuned on a dataset of target-resolution images. The disadvantage is that the pre-trained model may not be suitable for image segmentation in specific application scenarios, particularly pedestrian scenarios.
Third, data augmentation: the training data is extended through various augmentation techniques to simulate the diversity of high-resolution images. This does not address the fundamental problem of obtaining high-resolution segmentation from low-resolution images.
Fourth, deeper network structures: deeper networks are designed to capture more image detail, but they require more computing resources and may overfit, especially when training data is limited.
Fifth, multi-scale input: the network is designed to accept inputs at multiple resolutions simultaneously and then performs feature extraction and segmentation at different scales. However, this increases computational complexity and requires annotations at all scales, increasing annotation cost.
In summary, the above related art lacks a way to infer a high-resolution segmentation result with a good segmentation effect using only low-resolution training data, especially for pedestrian image segmentation.
In view of this, embodiments of the present disclosure provide an image segmentation method, apparatus, medium, and electronic device based on a cascade model.
Embodiments of the present disclosure are further explained and illustrated below in conjunction with the various figures.
Fig. 1 is a flowchart of a cascade model-based image segmentation method according to an exemplary embodiment of the present disclosure. The image segmentation method may be performed by an electronic device, and in particular by an image segmentation apparatus based on a cascade model, where the apparatus may be implemented by software and/or hardware and configured in the electronic device. As shown in fig. 1, the image segmentation method may include the following steps:
In step 110, an image to be segmented is obtained, the image to be segmented being a low-resolution image.
It should be noted that, in the embodiment of the present disclosure, the image resolution of the low resolution image is lower than the preset image resolution.
The image to be segmented comprises a target object, which can be a pedestrian.
Step 120, inputting the image to be segmented into the first image segmentation model to obtain a first segmentation result corresponding to the image to be segmented.
Before the image to be segmented is input into the first image segmentation model, data preprocessing can be performed first, and the image to be segmented after the data preprocessing is input into the first image segmentation model. By way of example, the data preprocessing may be normalization processing, and the like.
A segmentation result in the embodiments of the present disclosure includes, for each pixel in the image to be segmented, the probability that the pixel belongs to each category, where the categories include the target object. Based on these probabilities, the boundary of the target object is determined, thereby segmenting the target object. The segmentation results here are, for example, the first, second, and third segmentation results in this embodiment.
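As a minimal illustration of the paragraph above (the function name and the convention that class index 1 is the target object are assumptions, not from the patent), per-pixel class probabilities can be turned into a segmentation mask by taking the most probable class for each pixel:

```python
def probs_to_mask(probs):
    """probs[y][x] is a list of per-class probabilities for that pixel;
    the mask stores, per pixel, the index of the most probable class."""
    return [[max(range(len(p)), key=lambda c: p[c]) for p in row]
            for row in probs]
```

The boundary of the target object is then simply the set of pixels where the mask value changes between the target class and any other class.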
The first image segmentation model may be a U-Net model, and the U-Net model may refer to the following related embodiments, which are not described herein.
Step 130, repeating the correction process until the image resolution of the image corresponding to the obtained third segmentation result reaches the target resolution. The correction process includes: upsampling the first segmentation result to obtain a second segmentation result, correcting a target segmentation region in the second segmentation result through a second image segmentation model, obtaining a third segmentation result from the corrected segmentation region and the second segmentation result, and taking the third segmentation result as a new first segmentation result, where the image corresponding to the target segmentation region is a low-resolution image.
It is worth noting that the target resolution is the required high resolution, and may be configured according to the actual situation.
The second image segmentation model may be a U-Net model, and the related embodiments of the U-Net model may be referred to as follows, which will not be described herein.
Upsampling is an operation that increases image resolution; that is, the image resolution corresponding to the second segmentation result is greater than that corresponding to the first segmentation result. The upsampling of the first segmentation result may be implemented with a bilinear interpolation algorithm.
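For concreteness, a pure-Python sketch of bilinear interpolation on a 2-D probability map is shown below. The align-corners coordinate mapping is one of several common conventions and is an assumption here, as the patent does not specify it.

```python
def bilinear_upsample(grid, out_h, out_w):
    """Bilinearly resample a 2-D grid of floats to out_h x out_w."""
    in_h, in_w = len(grid), len(grid[0])
    out = [[0.0] * out_w for _ in range(out_h)]
    for oy in range(out_h):
        # Align-corners mapping from output to input coordinates.
        fy = oy * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(fy); y1 = min(y0 + 1, in_h - 1); wy = fy - y0
        for ox in range(out_w):
            fx = ox * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(fx); x1 = min(x0 + 1, in_w - 1); wx = fx - x0
            # Interpolate horizontally on the two bracketing rows,
            # then vertically between them.
            top = grid[y0][x0] * (1 - wx) + grid[y0][x1] * wx
            bot = grid[y1][x0] * (1 - wx) + grid[y1][x1] * wx
            out[oy][ox] = top * (1 - wy) + bot * wy
    return out
```

Interpolating fractional probabilities like this is exactly what produces the soft, blurred boundaries discussed next, which the second image segmentation model is then used to correct.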
It should be noted that each target segmentation region may include a boundary portion of the target object. The boundary is an important part of image segmentation, and the upsampled high-resolution segmentation result may have blurred boundaries. Referring to the two segmented images at different resolutions shown in fig. 2, the pixel value of pixels of the target object is a first pixel value and the pixel value of pixels of the non-target object is a second, different pixel value, so the boundary of the target object is visible. As can be clearly seen from fig. 2, at the larger resolution the boundary is more blurred than in the low-resolution image. The boundary of the target object can therefore be corrected by the second image segmentation model to obtain a segmentation result with a better boundary segmentation effect.
If the image resolution of the third segmentation result does not reach the target resolution, the third segmentation result is taken as a new first segmentation result and the above correction process is executed again. The image corresponding to the second segmentation result and the image corresponding to the third segmentation result have the same resolution.
As can be seen from the above, the first image segmentation model and the second image segmentation model form a cascade model, and the inputs of both models are low-resolution images, so both models are trained on low-resolution images and therefore have a small memory footprint and a small amount of computation. During upsampling, the result may not yet reach the target resolution, and upsampling may produce blurred boundaries; the second image segmentation model in the cascade model corrects the target segmentation regions of the higher-resolution image, so the blurred boundaries are progressively optimized from low resolution to high resolution. A high-resolution segmentation result is thus obtained with low memory consumption and a low amount of computation, and the cascade model is applicable to image segmentation at different target resolutions.
In some embodiments, the image segmentation method may further include the following step: performing image post-processing on the third segmentation result whose image resolution reaches the target resolution to obtain a target image, where the target image is an image obtained by segmenting the target object from the image to be segmented.
The image post-processing may include morphological operations such as erosion, dilation, opening, closing, morphological gradient, top-hat, and black-hat operations.
Wherein the image post-processing may comprise thresholding for converting the third segmentation result into a binarized image.
In a possible manner, the morphological operation may be performed first, after which the morphologically processed third segmentation result is thresholded to obtain the target image.
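A minimal sketch of such post-processing follows, assuming thresholding followed by a single 3x3 dilation as the morphological operation. The patent allows the morphological operation to come first and lists several other operations; this ordering and choice are illustrative only, as are the function names.

```python
def threshold(probs, t=0.5):
    """Binarize a 2-D probability map at threshold t."""
    return [[1 if p >= t else 0 for p in row] for row in probs]

def dilate3x3(mask):
    """Morphological dilation of a binary mask with a 3x3 square element."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # A pixel is set if any pixel in its 3x3 neighbourhood is set.
            out[y][x] = max(
                mask[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
            )
    return out
```

Erosion is the dual operation (replace `max` with `min`), and opening/closing compose the two in either order.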
Taking the target object being a pedestrian as an example: in the promotion field, a dynamic effect, such as a clothes try-on, can be added to the target object in the target image based on the target image; in the AR (Augmented Reality) or VR (Virtual Reality) field, the target image can help the AR or VR system determine the position and posture of the pedestrian, providing a more realistic interactive experience.
By the method, the target object is segmented from the image to be segmented, so that a data basis is provided for applications in different fields.
In some embodiments, the target segmentation regions in the second segmentation result may be obtained by directly dividing the second segmentation result, thereby obtaining low-resolution images; in this case, the corresponding third segmentation result can be obtained simply by combining the corrected segmentation regions corresponding to the target segmentation regions. Specifically, the step of correcting the target segmentation regions in the second segmentation result through the second image segmentation model and obtaining the third segmentation result from the corrected segmentation regions and the second segmentation result may be performed as follows: dividing the second segmentation result into a plurality of target segmentation regions, each of which includes a boundary of the target object; correcting the boundary in each target segmentation region through the second image segmentation model to obtain a corresponding corrected segmentation region; and stitching the corrected segmentation regions corresponding to all the target segmentation regions according to the positions of the target segmentation regions in the second segmentation result to obtain the third segmentation result.
Fig. 3 is a schematic diagram illustrating the division of a first segmentation result into target segmentation regions according to an exemplary embodiment of the present disclosure. The dividing lines in fig. 3 produce four target segmentation regions corresponding to the first segmentation result, each of which includes a boundary of the target object. It is worth noting that the image resolution of each divided target segmentation region is consistent with the image resolution of the training samples used to train the second image segmentation model. The second image segmentation model corrects each target segmentation region to obtain corrected segmentation regions that optimize the boundary of the target object; after the corrected segmentation regions are obtained, they are stitched according to the positions of the target segmentation regions in the second segmentation result to obtain the third segmentation result.
In this way, dividing the second segmentation result yields low-resolution target segmentation regions, which suit a second image segmentation model trained on low-resolution images. The optimized corrected segmentation regions are stitched to obtain a third segmentation result with optimized boundaries. Since each target segmentation region includes a boundary of the target object, the second image segmentation model does not perform useless correction on regions that need none, avoiding wasted computing resources.
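The divide-correct-stitch step above can be sketched as follows, assuming a quadrant split like the one shown in fig. 3. The `refine_model` callable is a placeholder for the second image segmentation model, and all names are illustrative.

```python
def split_quadrants(mask):
    """Divide a 2-D mask into four equally sized quadrant tiles,
    keyed by their (row, col) position."""
    h2, w2 = len(mask) // 2, len(mask[0]) // 2
    return {(dy, dx): [row[dx * w2:(dx + 1) * w2]
                       for row in mask[dy * h2:(dy + 1) * h2]]
            for dy in (0, 1) for dx in (0, 1)}

def stitch_quadrants(tiles):
    """Reassemble quadrant tiles at their original positions."""
    h2, w2 = len(tiles[(0, 0)]), len(tiles[(0, 0)][0])
    out = [[0] * (2 * w2) for _ in range(2 * h2)]
    for (dy, dx), tile in tiles.items():
        for y, row in enumerate(tile):
            out[dy * h2 + y][dx * w2:(dx + 1) * w2] = row
    return out

def correct_tiles(mask, refine_model):
    """Correct each target segmentation region, then stitch the results."""
    tiles = split_quadrants(mask)
    corrected = {pos: refine_model(t) for pos, t in tiles.items()}
    return stitch_quadrants(corrected)
```

In the patent's scheme each tile contains part of the object boundary, so the refinement model always has meaningful work to do on every tile.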
In some embodiments, before the correction process is executed for the first time, the cascade model-based image segmentation method may further include the following steps: determining the quality score of each first target connected region in the first segmentation result; correcting each first target connected region whose quality score is smaller than a first preset score through a third image segmentation model to obtain a corrected connected region corresponding to that first target connected region; and replacing the corresponding first target connected region in the first segmentation result with its corrected connected region to obtain a fourth segmentation result, where the fourth segmentation result replaces the first segmentation result when the correction process is executed for the first time.
The third image segmentation model may be a U-Net model, and the U-Net model may refer to the following related embodiments, which are not described herein.
The first preset score may be set according to the actual situation, which is not limited in this embodiment; for the determination of the quality score, reference may be made to the related embodiments below, which are not repeated here.
It should be noted that, generally speaking, the boundaries of some regions in the first segmentation result obtained by the first image segmentation model are segmented with higher quality and those of other regions with lower quality. Therefore, before the first upsampling, the boundaries of the regions with lower segmentation quality may be optimized once, and the optimized regions replace the corresponding regions in the first segmentation result to obtain an optimized segmentation result, the fourth segmentation result. This fourth segmentation result replaces the first segmentation result when the correction process is executed for the first time; that is, in the first correction process for the image to be segmented, the fourth segmentation result is upsampled to obtain a second segmentation result, the target segmentation regions in the second segmentation result are corrected through the second image segmentation model, a third segmentation result is obtained from the corrected segmentation regions and the second segmentation result, and the third segmentation result is taken as a new first segmentation result. If the third segmentation result does not reach the target resolution, the next correction process continues as before: upsampling the first segmentation result to obtain a second segmentation result, correcting the target segmentation regions in the second segmentation result through the second image segmentation model, obtaining a third segmentation result from the corrected segmentation regions and the second segmentation result, and taking the third segmentation result as a new first segmentation result.
In this way, a third image segmentation model is added, and the third, first, and second image segmentation models together form the cascade model. The first segmentation result is optimized once before it is upsampled; on this basis, the second image segmentation model progressively optimizes the boundaries in the segmentation result from low resolution to high resolution. Moreover, only first target connected regions whose quality score is smaller than the first preset score are corrected, which avoids useless correction of regions that are already well segmented and reduces the amount of computation.
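The quality-gated correction can be sketched in a few lines; `correct_fn` stands in for the third image segmentation model, and all names are illustrative rather than from the patent.

```python
def gated_correct(regions, scores, preset_score, correct_fn):
    """Correct only connected regions whose quality score falls below
    the preset score; well-segmented regions pass through untouched."""
    return [correct_fn(r) if s < preset_score else r
            for r, s in zip(regions, scores)]
```

The same gating applies to the second target connected regions in the second segmentation result, as described next.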
In a possible manner, the target segmentation regions may be second target connected regions in the second segmentation result whose quality scores are lower than a second preset score.
The second preset score may be the same as or different from the first preset score and may be set according to the actual situation, which is not limited in this embodiment; for the determination of the quality score, reference may be made to the related embodiments below, which are not repeated here.
In this way, similarly to the correction of the first target connected regions described above, only regions with lower quality scores in the second segmentation result need to be corrected, which avoids useless correction of well-segmented regions and reduces the amount of computation.
In some embodiments, the quality score of a target connected region may be determined as follows: determining target connected regions in the segmentation result, where the area of each target connected region is larger than a preset area; determining parameters of each target connected region; and determining the quality score of each target connected region according to the degree of matching between its parameters and standard reference parameters.
Here, the target connected region is, for example, the first target connected region or the second target connected region described above, and the segmentation result is, for example, the first segmentation result or the third segmentation result described above.
It should be noted that the segmentation result contains, for each pixel, the probability that the pixel belongs to each object (for example, the target object). Connected regions in the segmentation result can therefore be formed based on these probabilities: pixels that are determined, based on probability, to belong to the same object and that are adjacent to one another are grouped together as one connected region. The connected regions in the segmentation result can be obtained with a connected-component labeling algorithm.
In addition, the size of a region can be expressed by the number of pixels it contains or by its area. Since the connected region corresponding to a pedestrian is generally large, a connected region with a very small area can be regarded as noise or a segmentation error. To avoid such interference, connected regions whose size is smaller than or equal to a preset area can be filtered out of the segmentation result; only connected regions larger than the preset area are taken as target connected regions, and the subsequent quality-score determination is performed on these target connected regions.
The preset area may be set according to the actual situation and is not described in detail here.
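The two steps above — grouping adjacent same-object pixels into connected regions and filtering out regions at or below the preset area — can be sketched as follows. This is a minimal illustration only, not the patent's implementation; the 4-connectivity choice and the `min_area` threshold are assumptions for the example.

```python
from collections import deque

def connected_regions(mask, min_area):
    """Label 4-connected regions of a binary mask (list of lists of 0/1)
    and keep only those whose pixel count exceeds min_area."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                # Breadth-first flood fill collects one connected region.
                queue, region = deque([(y, x)]), []
                seen[y][x] = True
                while queue:
                    cy, cx = queue.popleft()
                    region.append((cy, cx))
                    for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                if len(region) > min_area:  # filter out noise / tiny segmentation errors
                    regions.append(region)
    return regions
```

On a mask with one large blob and one stray pixel, only the blob survives the area filter, matching the noise-suppression behaviour described above.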
The size, shape, and boundary of a pedestrian each have their own characteristics; for example, most pedestrians exhibit a relatively regular shape, and when the segmentation result shows a region that does not conform to these shape characteristics, the segmentation quality is poor and further correction is required. The parameters may therefore include shape parameters characterizing the shape and/or boundary parameters characterizing whether the boundary is smooth. Illustratively, the shape parameters may include aspect ratio, roundness, and compactness, and the boundary parameters may include curvature or smoothness.
When the parameters include both shape parameters and boundary parameters, the degree to which the shape parameters match the shape reference parameters and the degree to which the boundary parameters match the boundary reference parameters can be combined to obtain the quality score of the segmentation result.
In this way, the quality score of a target connected region is determined from the degree to which its parameters match the standard reference parameters characterizing the target object.
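As an illustration of parameter matching only — the patent fixes no concrete formulas — a region's aspect ratio and compactness, and a score from their match against assumed reference values, might be computed like this. The reference values and the averaging scheme are assumptions for the example.

```python
def shape_parameters(region):
    """Compute simple shape parameters for a region given as (y, x) pixel pairs."""
    ys = [p[0] for p in region]
    xs = [p[1] for p in region]
    height = max(ys) - min(ys) + 1
    width = max(xs) - min(xs) + 1
    area = len(region)
    # Compactness here: fraction of the bounding box the region fills, in (0, 1].
    return {"aspect_ratio": height / width, "compactness": area / (height * width)}

def quality_score(params, reference):
    """Match each parameter against its reference value; 1.0 means a perfect match.
    The per-parameter match is 1 minus the relative deviation, clipped at 0."""
    matches = [max(0.0, 1.0 - abs(params[k] - reference[k]) / reference[k])
               for k in reference]
    return sum(matches) / len(matches)
```

A filled 4x2 rectangle (a roughly pedestrian-like upright box) matches a reference of aspect ratio 2.0 and compactness 1.0 exactly, giving a score of 1.0; irregular regions score lower and would fall below the preset score.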
In some embodiments, the U-Net model may include an encoder, a decoder, and an output layer. The encoder encodes the input to obtain encoded features; the decoder decodes the encoder's features and, via skip connections, fuses its features with the corresponding encoder features; the output layer then processes the fused features to obtain the probability that each pixel belongs to each class. Based on these probabilities, a target image segmenting each class can be obtained, where the classes include the target object.
When training the cascade model constructed from the first and second image segmentation models, the training process may be as follows. First, the low-resolution sample image is input into the first image segmentation model to obtain a first sample segmentation result. Then a sample correction process is executed iteratively until the third sample segmentation result reaches the target sample resolution. The sample correction process includes: upsampling the first sample segmentation result to obtain a second sample segmentation result and dividing it into regions; upsampling the low-resolution sample image to obtain a second sample image and dividing it into regions; optimizing, with the second image segmentation model, the sample segmentation result of each partition according to that partition's segmentation result and image, and outputting the optimized sample segmentation result for each partition; merging the optimized sample segmentation results of all partitions into a third sample segmentation result; and taking the third sample segmentation result as the first sample segmentation result of the next sample correction process. After the sample correction process has been executed, a loss function is constructed, and the parameters of the first and second image segmentation models are optimized using the loss function and the annotation data corresponding to the low-resolution sample image, finally yielding trained first and second image segmentation models. It should be noted that in each sample correction process, the first sample segmentation result and the low-resolution sample image need to be upsampled to the same resolution.
When training the cascade model constructed from the first, second, and third image segmentation models, the training process may be as follows. First, the low-resolution sample image is input into the first image segmentation model to obtain a first sample segmentation result. Then the first sample segmentation result is input into the third image segmentation model to obtain a fourth sample segmentation result. Next, a first correction process is executed, which includes: upsampling the fourth sample segmentation result to obtain a second sample segmentation result and dividing it into regions; upsampling the low-resolution sample image to obtain a second sample image and dividing it into regions; optimizing, with the second image segmentation model, the sample segmentation result of each partition according to that partition's segmentation result and image, and outputting the optimized sample segmentation result for each partition; and merging the optimized sample segmentation results of all partitions into a third sample segmentation result. The third sample segmentation result is then taken as the first sample segmentation result, and the sample correction process described above for the cascade of the first and second image segmentation models is executed iteratively until the third sample segmentation result reaches the target sample resolution.
By referring to the sample correction process described above, the manner in which each model obtains a third segmentation result satisfying the target resolution during inference can be derived directly.
Fig. 4 is a flowchart illustrating the output of a target image according to an exemplary embodiment of the present disclosure. Referring to fig. 4, first a low-resolution pedestrian image is acquired and preprocessed to obtain the image to be segmented; the image to be segmented is initially segmented with the first image segmentation model to obtain a first segmentation result; the quality score of each first target connected region in the first segmentation result is checked; boundaries in first target connected regions whose quality score is below the first preset score are optimized with the third image segmentation model, and a fourth segmentation result is obtained from the optimization result; it is then determined whether further optimization is needed, that is, whether the image corresponding to the fourth segmentation result meets the target resolution, and if so, morphological operations and thresholding are applied to the fourth segmentation result to obtain the target image;
if the image corresponding to the fourth segmentation result does not meet the target resolution, the intermediate segmentation result is optimized with the second image segmentation model and a third segmentation result is obtained from the optimization result; it is then determined whether the third segmentation result reaches the target resolution, and if so, morphological operations and thresholding are applied to the third segmentation result to obtain the target image;
and if the image corresponding to the third segmentation result does not meet the target resolution, the step of optimizing the intermediate segmentation result with the second image segmentation model and obtaining a third segmentation result from the optimization result is executed again.
It is worth noting that when the step of optimizing the intermediate segmentation result with the second image segmentation model and obtaining a third segmentation result from the optimization result is executed for the first time, the intermediate segmentation result is the fourth segmentation result; on subsequent executions, the intermediate segmentation result is the third segmentation result obtained from the previous execution of the step.
It should be noted that when the step of optimizing the intermediate segmentation result with the second image segmentation model and obtaining a third segmentation result from the optimization result is executed, only the regions with a low quality score in the intermediate segmentation result may be optimized; for the detailed process, refer to the related embodiments above, which are not repeated here.
In this way, a high-resolution segmentation result is obtained with the cascade model.
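The loop of fig. 4 — upsample the current result, refine only its low-quality regions, merge, and repeat until the target resolution is reached — might be sketched as follows. Here `refine` is a stand-in for the second image segmentation model, the quadrant tiling and the doubling of resolution per iteration are assumptions for the example, and `needs_refinement` stands in for the quality-score check.

```python
import numpy as np

def upsample2(seg):
    """Nearest-neighbour 2x upsampling of an (H, W) probability map."""
    return seg.repeat(2, axis=0).repeat(2, axis=1)

def refine(patch):
    """Stand-in for the second image segmentation model: pushes
    probabilities toward 0/1 as a placeholder boundary correction."""
    return np.clip((patch - 0.5) * 2.0 + 0.5, 0.0, 1.0)

def cascade_inference(first_result, target_size, needs_refinement):
    """Iteratively upsample the current result and correct only its
    low-quality quadrants until the target resolution is reached."""
    seg = first_result
    while seg.shape[0] < target_size:
        seg = upsample2(seg)                       # second segmentation result
        h2, w2 = seg.shape[0] // 2, seg.shape[1] // 2
        for ys in (slice(0, h2), slice(h2, None)):
            for xs in (slice(0, w2), slice(w2, None)):
                if needs_refinement(seg[ys, xs]):  # quality check per region
                    seg[ys, xs] = refine(seg[ys, xs])
        # The merged result becomes the new first result for the next pass.
    return seg
```

Starting from a 4x4 coarse result and a target size of 16, the loop runs twice and returns a 16x16 probability map, mirroring the "third segmentation result as new first segmentation result" iteration in the text.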
Fig. 5 is a block diagram of an image segmentation apparatus based on a cascade model according to an exemplary embodiment of the present disclosure. Referring to fig. 5, the image segmentation apparatus 500 includes:
A first obtaining module 501, configured to obtain an image to be segmented, where the image to be segmented is a low resolution image;
the first segmentation module 502 is configured to input the image to be segmented into a first image segmentation model, and obtain a first segmentation result corresponding to the image to be segmented;
a second segmentation module 503, configured to repeatedly execute the correction process when the image resolution of the image corresponding to the obtained third segmentation result does not reach the target resolution;
the correction process includes: and upsampling the first segmentation result to obtain a second segmentation result, correcting a target segmentation area in the second segmentation result through a second image segmentation model, obtaining a third segmentation result according to the corrected segmentation area and the second segmentation result, and taking the third segmentation result as a new first segmentation result, wherein an image corresponding to the target segmentation area is a low-resolution image.
In some embodiments, the second segmentation module 503 is specifically configured to divide the second segmentation result into a plurality of target segmentation areas, where each of the target segmentation areas includes a boundary of a target object; correcting the boundary in each target segmentation area through a second image segmentation model to obtain a corresponding corrected segmentation area; and splicing all the corrected segmentation areas corresponding to the target segmentation areas according to the positions of the target segmentation areas in the second segmentation result to obtain a third segmentation result.
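The divide-correct-stitch behaviour of the second segmentation module described above might be sketched as follows. The fixed square tiling and the `correct` callback (standing in for the second image segmentation model's boundary correction) are assumptions for illustration; the patent's partitioning around object boundaries is not reproduced here.

```python
def split_and_stitch(seg, tile, correct):
    """Split an (H, W) map (list of lists) into tile x tile patches,
    correct each patch, and stitch the corrected patches back at
    their original positions in the result."""
    h, w = len(seg), len(seg[0])
    out = [row[:] for row in seg]  # copy; stitched output
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            patch = [row[x:x + tile] for row in seg[y:y + tile]]
            fixed = correct(patch)  # stand-in for the second segmentation model
            for dy, row in enumerate(fixed):
                out[y + dy][x:x + len(row)] = row
    return out
```

Because each corrected patch is written back at the position it was cut from, the stitched result has the same layout as the second segmentation result, as the module description requires.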
In some embodiments, the image segmentation apparatus 500 further includes:
the first determining module is used for determining the mass fraction of each first target communication area in the first segmentation result;
the third segmentation module is used to correct, through a third image segmentation model, first target connected regions whose quality score is smaller than a first preset score, to obtain corrected connected regions corresponding to the first target connected regions;
and the fourth segmentation module is used to replace the corresponding first target connected regions in the first segmentation result with the corrected connected regions, to obtain a fourth segmentation result, the fourth segmentation result replacing the first segmentation result when the correction process is executed for the first time.
In some embodiments, the target segmentation region is a second target connected region in the second segmentation result whose quality score is lower than a second preset score.
In some embodiments, the quality score of the target connected region is determined by:
determining a target connected region in a segmentation result, wherein the area of the target connected region is larger than a preset area;
determining parameters of the target connected region;
and determining the quality score of the target connected region according to the degree to which the parameters of the target connected region match standard reference parameters.
In some embodiments, the first image segmentation model, the second image segmentation model, and the third image segmentation model are all U-Net models.
In some embodiments, the image segmentation apparatus 500 further includes:
and the processing module is used for carrying out image post-processing on a third segmentation result with the image resolution reaching the target resolution to obtain a target image, wherein the target image is an image of a target object segmented from the image to be segmented.
For the implementation of each module in the image segmentation apparatus 500, reference may be made to the related embodiments above; details are not repeated here.
The presently disclosed embodiments also provide a computer readable medium having stored thereon a computer program which, when executed by a processing device, implements the steps of the above-described image segmentation method.
The embodiment of the disclosure also provides an electronic device, including:
a storage device having a computer program stored thereon;
processing means for executing the computer program in the storage means to implement the steps of the above image segmentation method.
Referring now to fig. 6, a schematic diagram of an electronic device 600 suitable for use in implementing embodiments of the present disclosure is shown. The terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., in-vehicle navigation terminals), and the like, and stationary terminals such as digital TVs, desktop computers, and the like. The electronic device shown in fig. 6 is merely an example and should not be construed to limit the functionality and scope of use of the disclosed embodiments.
As shown in fig. 6, the electronic device 600 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 601, which may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage means 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the electronic apparatus 600 are also stored. The processing device 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
In general, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, and the like; an output device 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 608 including, for example, magnetic tape, hard disk, etc.; and a communication device 609. The communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 6 shows an electronic device 600 having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a non-transitory computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via communication means 609, or from storage means 608, or from ROM 602. The above-described functions defined in the methods of the embodiments of the present disclosure are performed when the computer program is executed by the processing device 601.
It should be noted that the computer readable medium described in the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.
In some implementations, the electronic device may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed networks.
The computer readable medium may be contained in the electronic device; or may exist alone without being incorporated into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring an image to be segmented, wherein the image to be segmented is a low-resolution image; inputting the image to be segmented into a first image segmentation model to obtain a first segmentation result corresponding to the image to be segmented; repeatedly executing the correction process until the image resolution of the image corresponding to the obtained third segmentation result reaches the target resolution; the correction process includes: and upsampling the first segmentation result to obtain a second segmentation result, correcting a target segmentation area in the second segmentation result through a second image segmentation model, obtaining a third segmentation result according to the corrected segmentation area and the second segmentation result, and taking the third segmentation result as a new first segmentation result, wherein an image corresponding to the target segmentation area is a low-resolution image.
Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages, including, but not limited to, object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present disclosure may be implemented in software or hardware. The name of a module does not, in some cases, constitute a limitation on the module itself.
The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The foregoing description is only of the preferred embodiments of the present disclosure and an explanation of the technical principles employed. It will be appreciated by persons skilled in the art that the scope of the disclosure is not limited to the specific combinations of features described above, but also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the spirit of the disclosure, for example, solutions formed by substituting the above features with (but not limited to) technical features having similar functions disclosed in the present disclosure.
Moreover, although operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are example forms of implementing the claims. The specific manner in which the various modules perform their operations in the apparatus of the above embodiments has been described in detail in connection with the method embodiments and will not be elaborated here.

Claims (10)

1. An image segmentation method based on a cascading model is characterized by comprising the following steps:
acquiring an image to be segmented, wherein the image to be segmented is a low-resolution image;
inputting the image to be segmented into a first image segmentation model to obtain a first segmentation result corresponding to the image to be segmented;
repeatedly executing the correction process until the image resolution of the image corresponding to the obtained third segmentation result reaches the target resolution; the correction process includes:
and upsampling the first segmentation result to obtain a second segmentation result, correcting a target segmentation area in the second segmentation result through a second image segmentation model, obtaining a third segmentation result according to the corrected segmentation area and the second segmentation result, and taking the third segmentation result as a new first segmentation result, wherein an image corresponding to the target segmentation area is a low-resolution image.
2. The image segmentation method according to claim 1, wherein the correcting the target segmentation region in the second segmentation result by the second image segmentation model, and obtaining a third segmentation result based on the corrected segmentation region obtained by the correction and the second segmentation result, comprises:
dividing the second segmentation result into a plurality of target segmentation areas, wherein each target segmentation area comprises a boundary of a target object;
correcting the boundary in each target segmentation area through a second image segmentation model to obtain a corresponding corrected segmentation area;
and splicing all the corrected segmentation areas corresponding to the target segmentation areas according to the positions of the target segmentation areas in the second segmentation result to obtain a third segmentation result.
3. The image segmentation method as set forth in claim 1, further comprising, prior to the first execution of the correction procedure:
determining the quality score of each first target connected region in the first segmentation result;
correcting, through a third image segmentation model, a first target connected region whose quality score is smaller than a first preset score, to obtain a corrected connected region corresponding to the first target connected region;
and replacing the corresponding first target connected region in the first segmentation result with the corrected connected region corresponding to the first target connected region, to obtain a fourth segmentation result, wherein the fourth segmentation result replaces the first segmentation result when the correction process is executed for the first time.
4. The image segmentation method according to claim 3, wherein the target segmentation region is a second target connected region in the second segmentation result whose quality score is lower than a second preset score.
5. The image segmentation method as set forth in claim 3, wherein the quality score of the target connected region is determined by:
determining a target connected region in a segmentation result, wherein the area of the target connected region is larger than a preset area;
determining parameters of the target connected region;
and determining the quality score of the target connected region according to the degree to which the parameters of the target connected region match standard reference parameters.
6. The image segmentation method as set forth in claim 3, wherein the first, second, and third image segmentation models are each a U-Net model.
7. The image segmentation method as set forth in claim 1, further comprising:
and carrying out image post-processing on a third segmentation result with the image resolution reaching the target resolution to obtain a target image, wherein the target image is an image obtained by segmenting a target object from the image to be segmented.
8. An image segmentation apparatus based on a cascade model, comprising:
the first acquisition module is used for acquiring an image to be segmented, wherein the image to be segmented is a low-resolution image;
the first segmentation module is used for inputting the image to be segmented into a first image segmentation model to obtain a first segmentation result corresponding to the image to be segmented;
the second segmentation module is used for repeatedly executing the correction process under the condition that the image resolution of the image corresponding to the obtained third segmentation result does not reach the target resolution;
the correction process includes: and upsampling the first segmentation result to obtain a second segmentation result, correcting a target segmentation area in the second segmentation result through a second image segmentation model, obtaining a third segmentation result according to the corrected segmentation area and the second segmentation result, and taking the third segmentation result as a new first segmentation result, wherein an image corresponding to the target segmentation area is a low-resolution image.
9. A computer-readable medium having a computer program stored thereon, characterized in that the program, when executed by a processing apparatus, implements the steps of the image segmentation method according to any one of claims 1-7.
10. An electronic device, comprising:
a storage apparatus having a computer program stored thereon; and
a processing apparatus for executing the computer program in the storage apparatus to carry out the steps of the image segmentation method according to any one of claims 1-7.
CN202311435602.5A 2023-10-31 2023-10-31 Image segmentation method and device based on cascade model, medium and electronic equipment Pending CN117372699A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311435602.5A CN117372699A (en) 2023-10-31 2023-10-31 Image segmentation method and device based on cascade model, medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN117372699A true CN117372699A (en) 2024-01-09

Family

ID=89402150

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311435602.5A Pending CN117372699A (en) 2023-10-31 2023-10-31 Image segmentation method and device based on cascade model, medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN117372699A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination