CN117853507A - Interactive image segmentation method, device, storage medium and program product

Publication number
CN117853507A
Authority
CN
China
Prior art keywords
image
segmentation
target
result
prompt
Prior art date
Legal status
Granted
Application number
CN202410257252.6A
Other languages
Chinese (zh)
Other versions
CN117853507B (en)
Inventor
郭恒
张剑锋
黄家兴
莫志榮
郭大洲
闫轲
吕乐
金达开
许敏丰
Current Assignee
Alibaba China Co Ltd
Original Assignee
Alibaba China Co Ltd
Priority date
Filing date
Publication date
Application filed by Alibaba China Co Ltd
Priority to CN202410257252.6A
Publication of CN117853507A
Application granted
Publication of CN117853507B
Legal status: Active (current)
Anticipated expiration

Landscapes

  • Apparatus For Radiation Diagnosis (AREA)
  • Image Processing (AREA)

Abstract

Embodiments of the present invention provide an interactive image segmentation method, apparatus, storage medium, and program product. In the process of interactive image segmentation, a segmentation model responds to an interactive operation triggered by a user and acquires a prompt image corresponding to a target image, where the prompt image reflects the position at which the interactive operation occurs in the target image. The segmentation model then uses the prompt image as prompt information to segment the target image. Both the target image and the corresponding prompt image are three-dimensional images. In this method, the target image can be segmented using the prompt image, represented as a three-dimensional image, as the prompt information. Because the prompt image contains rich spatial information, it can accurately describe where the interactive operation occurs in the three-dimensional image, so it provides more accurate prompt information for the segmentation model and improves segmentation accuracy.

Description

Interactive image segmentation method, device, storage medium and program product
Technical Field
The present invention relates to the field of artificial intelligence, and in particular, to an interactive image segmentation method, apparatus, storage medium, and program product.
Background
Image segmentation is the technique and process of dividing an image into several specific regions with distinctive properties and extracting objects of interest. It is a key step from image processing to image analysis. Existing image segmentation techniques can accurately segment two-dimensional images in a variety of scenarios. However, for three-dimensional images generated in certain scenarios, such as medical, animation-modeling, and meteorological scenarios, accurate segmentation results are often still unavailable.
Therefore, how to improve the segmentation accuracy of the three-dimensional image is a problem to be solved.
Disclosure of Invention
In view of this, embodiments of the present invention provide an interactive image segmentation method, device, storage medium and program product for improving the segmentation accuracy of three-dimensional images.
In a first aspect, an embodiment of the present invention provides an interactive image segmentation method, including:
acquiring a target image represented as a three-dimensional image;
responding to the current interactive operation of a user on the target image, acquiring a prompt image corresponding to the target image, wherein the prompt image reflects the occurrence position of the interactive operation in the target image;
and performing image segmentation on the target image according to the prompt image expressed as a three-dimensional image.
In a second aspect, an embodiment of the present invention provides an interactive image segmentation method, including:
responding to interactive operation of a user on a target area in an original image, and acquiring prompt information corresponding to the target area, wherein the prompt information reflects the occurrence position of the interactive operation in the target area;
image segmentation is carried out on the target area according to prompt information corresponding to the target area, and the original image is a three-dimensional image;
and taking the target area and the segmentation result of the target area as prompt information, and carrying out image segmentation on other areas in the original image, wherein the target area is overlapped with the other areas, and the sizes of the target area and the other areas are smaller than the size of a target object in the original image.
In a third aspect, an embodiment of the present invention provides an interactive image segmentation method, including:
displaying a target image represented as a three-dimensional image;
and responding to the interactive operation of the user on the target image, and displaying the segmentation result of the target image.
The segmentation result is determined according to a prompt image expressed as a three-dimensional image, and the prompt image reflects the occurrence position of the interactive operation in the target image.
In a fourth aspect, an embodiment of the present invention provides an interactive image segmentation method, including:
displaying an original image represented as a three-dimensional image;
responding to interactive operation of a user on a target area in the original image, and displaying a segmentation result of the original image;
the determining process of the segmentation result comprises the following steps:
image segmentation is carried out on the target area according to prompt information corresponding to the target area, and the prompt information reflects the occurrence position of the interactive operation in the target area;
and taking the target area and the segmentation result of the target area as prompt information, carrying out image segmentation on other areas in the original image, wherein the target area and the other areas which are overlapped correspond to a target object in the original image, and the sizes of the target area and the other areas are smaller than the size of the target object in the original image.
In a fifth aspect, an embodiment of the present invention provides an interactive image segmentation method, including:
receiving an image segmentation request generated by a user, wherein the image segmentation request comprises an image segmentation task, the image segmentation task carries a target image expressed as a three-dimensional image and a prompt image corresponding to the target image, and the prompt image reflects the occurrence position of the interactive operation currently triggered by the user on the target image;
image segmentation is carried out on the target image according to the prompt image expressed as a three-dimensional image;
and sending an image segmentation result to the user.
In a sixth aspect, an embodiment of the present invention provides an interactive image segmentation method, including:
receiving an image segmentation request generated by a user, wherein the image segmentation request comprises an image segmentation task, the image segmentation task carries an original image which is expressed as a three-dimensional image and prompt information corresponding to a target area in the original image, and the prompt information reflects the occurrence position of interactive operation triggered by the user on the target area;
image segmentation is carried out on the target area according to prompt information corresponding to the target area, and the original image is a three-dimensional image;
taking the target area and the segmentation result of the target area as prompt information, and carrying out image segmentation on other areas in the original image, wherein the target area is overlapped with the other areas, and the sizes of the target area and the other areas are smaller than the size of a target object in the original image;
And sending an image segmentation result to the user.
In a seventh aspect, an embodiment of the present invention provides an interactive image segmentation platform, including: an image display system and an image processing system;
the image display system is used for displaying a target image which is represented as a three-dimensional image; displaying the segmentation result of the target image;
the image processing system is used for responding to the current interactive operation of a user on the target image, acquiring a prompt image corresponding to the target image, and reflecting the occurrence position of the interactive operation in the target image; and carrying out image segmentation on the target image according to the prompt image expressed as a three-dimensional image.
In an eighth aspect, an embodiment of the present invention provides an interactive image segmentation platform, including: an image display system and an image processing system;
the image display system is used for displaying an original image which is represented as a three-dimensional image; displaying the segmentation result of the original image;
the image processing system is used for responding to the interactive operation of a user on a target area in the original image, acquiring prompt information corresponding to the target area, and reflecting the occurrence position of the interactive operation in the target area;
Image segmentation is carried out on the target area according to the prompt information corresponding to the target area;
and taking the target area and the segmentation result of the target area as prompt information, and carrying out image segmentation on other areas in the original image, wherein the target area is overlapped with the other areas, and the sizes of the target area and the other areas are smaller than the size of a target object in the original image.
In a ninth aspect, an embodiment of the present invention provides an electronic device, including a processor and a memory, where the memory is configured to store one or more computer instructions, and where the one or more computer instructions, when executed by the processor, implement the interactive image segmentation method in any one of the first to sixth aspects. The electronic device may also include a communication interface for communicating with other devices or communication systems.
In a tenth aspect, embodiments of the present invention provide a non-transitory machine-readable storage medium having stored thereon executable code, which when executed by a processor of an electronic device, causes the processor to at least implement the interactive image segmentation method as in any of the first to sixth aspects above.
In an eleventh aspect, embodiments of the present invention provide a program product comprising a computer program or instructions which, when executed by a processor, cause the processor to implement the interactive image segmentation method as in any of the first to sixth aspects above.
In the interactive image segmentation method provided by the embodiment of the invention, the prompt image corresponding to the target image can be obtained in response to the interactive operation triggered by the user, and the prompt image reflects the occurrence position of the interactive operation in the target image. Finally, the target image can be subjected to image segmentation by taking the prompt image as prompt information. The target image and the prompt image may be three-dimensional images.
In the above method, the target image may be segmented using the presentation image represented as the three-dimensional image as the presentation information. Because the prompt image contains rich spatial information, the prompt image can accurately describe the occurrence position of the interactive operation in the three-dimensional image, so that the prompt image can provide more accurate prompt information for the image segmentation process, and the segmentation accuracy of the three-dimensional image is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required in the description of the embodiments or the prior art are briefly introduced below. It is apparent that the drawings in the following description show some embodiments of the present invention, and that other drawings can be obtained from these drawings by a person skilled in the art without inventive effort.
FIG. 1 is a flowchart of an interactive image segmentation method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a hint image according to an embodiment of the present invention;
FIG. 3 is a flowchart of another method for interactive image segmentation according to an embodiment of the present invention;
FIG. 4 is a schematic diagram illustrating a working process of a segmentation model according to an embodiment of the present invention;
FIG. 5 is a schematic diagram illustrating the operation of another segmentation model according to an embodiment of the present invention;
FIG. 6 is a schematic diagram illustrating the operation of a segmentation model according to another embodiment of the present invention;
FIG. 7 is a flowchart of yet another method for interactive image segmentation according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of a relationship between a multi-level sampling layer and a feature extraction layer according to an embodiment of the present invention;
FIG. 9 is a schematic diagram illustrating the operation of a segmentation model according to another embodiment of the present invention;
FIG. 10 is a schematic diagram illustrating the operation of a segmentation model according to another embodiment of the present invention;
FIG. 11 is a schematic diagram of a positional relationship between images according to an embodiment of the present invention;
FIG. 12 is a flowchart of yet another method for interactive image segmentation according to an embodiment of the present invention;
FIG. 13 is a flowchart of yet another method for interactive image segmentation according to an embodiment of the present invention;
FIG. 14 is a flowchart of yet another method for interactive image segmentation according to an embodiment of the present invention;
FIG. 15 is a flowchart of yet another method for interactive image segmentation according to an embodiment of the present invention;
FIG. 16 is a flowchart of yet another method for interactive image segmentation according to an embodiment of the present invention;
FIG. 17 is a flowchart of yet another method for interactive image segmentation according to an embodiment of the present invention;
fig. 18 is a schematic structural diagram of an interactive image segmentation platform according to an embodiment of the present invention;
FIG. 19 is a flowchart of a model training method according to an embodiment of the present invention;
FIG. 20 is a flowchart of another model training method according to an embodiment of the present invention;
fig. 21 is a schematic structural diagram of an interactive image segmentation apparatus according to an embodiment of the present invention;
FIG. 22 is a schematic diagram of another interactive image segmentation apparatus according to an embodiment of the present invention;
FIG. 23 is a schematic structural diagram of another interactive image segmentation apparatus according to an embodiment of the present invention;
fig. 24 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to limit the invention. As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. The term "plurality" generally means at least two, but does not exclude the case of at least one.
It should be understood that the term "and/or" as used herein merely describes an association relationship between associated objects, indicating that three relationships may exist. For example, "A and/or B" may represent the following three cases: A exists alone, A and B exist together, or B exists alone. In addition, the character "/" herein generally indicates that the associated objects before and after it are in an "or" relationship.
The word "if" as used herein may be interpreted as "when", "upon", "in response to determining", or "in response to identifying", depending on the context. Similarly, the phrases "if it is determined" or "if (a stated condition or event) is identified" may be interpreted as "when it is determined", "in response to determining", "when (a stated condition or event) is identified", or "in response to identifying (a stated condition or event)", depending on the context.
It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present invention are information and data authorized by the user or fully authorized by each party, and the collection, use and processing of the related data need to comply with the related laws and regulations and standards of the related country and region, and provide corresponding operation entries for the user to select authorization or rejection.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a product or system that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such a product or system. Without further limitation, an element preceded by the phrase "comprising a ..." does not exclude the presence of other like elements in a product or system that comprises that element.
Before describing the embodiments of the present invention in detail, some concepts related to the present invention are first explained:
interactive image segmentation is an image segmentation technique. The user can trigger the interactive operation on the image, and the occurrence position of the interactive operation in the image can be used as prompt information of image segmentation, so that the image segmentation is realized. The interaction operation may include a clicking operation, a scribing operation, a framing operation, or the like. For a detailed description of the interaction, reference may be made to the relevant description in the following embodiments.
Currently, interactive image segmentation is widely applied to images generated in many scenarios. In a traffic scenario, for example, two-dimensional road images acquired on roads can be segmented. In the medical, animation-modeling, and meteorological scenarios mentioned in the background, interactive image segmentation can also be applied to the three-dimensional images generated in those scenarios.
In practice, interactive image segmentation is used not only to obtain a segmentation result; more importantly, attribute information of the segmented object can be obtained from that result. For example, after a target anatomical structure in a medical image is segmented, attribute information of the target anatomical structure, such as its size and gray-level information, can be obtained from the segmentation result.
In order to improve the segmentation accuracy of three-dimensional images, the methods provided by the following embodiments of the present invention can be used.
Some embodiments of the invention will now be described in detail with reference to the accompanying drawings. In the case of no conflict between the embodiments, the following embodiments and features and steps in the embodiments may be combined with each other. In addition, the sequence of steps in the method embodiments described below is only an example and is not strictly limited.
Fig. 1 is a flowchart of an interactive image segmentation method according to an embodiment of the present invention. The interactive image segmentation method provided by the embodiment of the invention can be executed by processing equipment with image segmentation capability. As shown in fig. 1, the method may include the steps of:
S101, acquiring a target image expressed as a three-dimensional image.
The processing device may acquire a target image to be segmented. The target image may include at least one object, and the objects may specifically include a target object that needs to be segmented, or may include other objects that need not be segmented.
Wherein the target image may be a three-dimensional image. The size of the target image may be preset. Alternatively, in different scenarios, the target image may be a medical image, a game modeling image, a satellite cloud image, a remote sensing image, or the like. Optionally, the medical image may specifically include an X-ray image, an ultrasound image, a nuclear magnetic resonance image (Nuclear Magnetic Resonance Imaging, NMRI), or the like. The X-ray images may specifically include plain X-ray films, computed radiography (Computed Radiography, abbreviated as CR) images, digital radiography (Digital Radiography, abbreviated as DR) images, and computed tomography (Computed Tomography, abbreviated as CT) images.
When the target image is a medical image, the object contained in the target image may be an anatomical structure of a human body; when the target image is a game modeling image, the object contained in the target image may be a game environment such as a city, a forest, or a mountain, or may be a game character, a prop, or the like. These objects in the target image may be target objects or other objects, depending on the user's needs.
Alternatively, the target object in the target image may be a complete object or may be a part of a complete object. Taking the medical image as an example, the target image may comprise a complete anatomy, such as a complete kidney; the target image may also contain a part of the anatomy, such as a part of the kidney, in which case the target image is an image region (patch) in an image containing the complete anatomy.
S102, responding to the current interactive operation of the user on the target image, acquiring a prompt image corresponding to the target image, wherein the prompt image reflects the occurrence position of the interactive operation in the target image.
The processing device may present the acquired target image to a user, based on which the user may trigger an interactive operation on the target image. The processing device may further obtain a hint image corresponding to the target image in response to the interactive operation.
The prompt image may also be represented as a three-dimensional image, and the prompt image and the target image are of the same size. Meanwhile, the prompt image reflects the occurrence position of the interactive operation in the target image. For a target image represented as a three-dimensional image, the three-dimensional prompt image can describe the occurrence position of the interactive operation in the target image more accurately than a two-dimensional vector used to reflect that position.
Optionally, in the prompt image, the voxels corresponding to the positions where the interactive operation occurs have color values different from those of the voxels corresponding to the positions where it does not occur.
Optionally, the interactive operation may include a frame selection operation performed by the user on the target image, or the like. As shown in fig. 2, assuming that the interactive operation triggered by the user is a frame selection operation, the background of the prompt image is a first color value, and a frame formed by a second color value also exists in the prompt image; the frame is the position where the user's frame selection operation occurs in the target image.
Optionally, the interactive operation may further include a click operation triggered by the user on the target image. The click operation may specifically include a click operation triggered on the target object to be segmented in the target image, i.e., a positive click operation, and a click operation triggered on other objects that do not need to be segmented in the target image, i.e., a negative click operation. The background of the prompt image is a first color value; the prompt image may further contain a point formed by a second color value, which is the position where the user's positive click operation occurs in the target image, and a point formed by a third color value, which is the position where the user's negative click operation occurs in the target image.
A positive click operation and a negative click operation are described below with reference to a medical image. Assume that the target image is an abdominal CT image containing organs such as the liver, the left kidney, the right kidney, and the stomach, and that the target object to be segmented is the liver. The user may trigger a positive click operation on the image region where the liver is located. Further, in order to segment the liver more accurately, the user may optionally trigger negative click operations on the image regions of the left kidney, the right kidney, and the stomach; that is, image regions where the liver is not located can be excluded by the negative clicks.
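By way of illustration only, a prompt image of this kind can be sketched as a volume of the same size as the target image in which background voxels, positive-click voxels and negative-click voxels carry different values; the function name, voxel values and Python/NumPy representation below are assumptions, not taken from the disclosure:

```python
import numpy as np

def build_prompt_image(shape, positive_points=(), negative_points=(), box=None):
    """Build a 3D prompt image with the same size as the target image.

    shape           -- (D, H, W) of the target volume
    positive_points -- iterable of (z, y, x) clicks on the target object
    negative_points -- iterable of (z, y, x) clicks on objects to exclude
    box             -- optional ((z0, y0, x0), (z1, y1, x1)) frame selection
    The values 0 / 1 / 2 stand in for the first / second / third color
    values mentioned above and are purely illustrative.
    """
    prompt = np.zeros(shape, dtype=np.float32)   # background: first color value
    for z, y, x in positive_points:
        prompt[z, y, x] = 1.0                    # positive click: second color value
    for z, y, x in negative_points:
        prompt[z, y, x] = 2.0                    # negative click: third color value
    if box is not None:
        (z0, y0, x0), (z1, y1, x1) = box
        prompt[z0:z1, y0:y1, x0:x1] = 1.0        # simplified: mark the selected region
    return prompt
```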
S103, performing image segmentation on the target image according to the prompt image expressed as the three-dimensional image.
Finally, the processing device may segment the target image by using the prompt image as the prompt information, that is, complete the image segmentation task. Optionally, the target image and the prompt image may be input into a segmentation model built into the processing device, thereby achieving segmentation of the target image.
Alternatively, the segmentation model may specifically include a SAM (Segment Anything Model) model, a full convolutional neural network (Fully Convolutional Networks, abbreviated as FCN) model, a Mask Region convolutional neural network (Mask Region-Convolutional Neural Network, abbreviated as Mask R-CNN) model, a recursive residual neural network-based U-network (Recurrent Residual CNN-based U-Net, abbreviated as R2U-Net) model, and the like.
In the interactive image segmentation method provided by the embodiment of the invention, the prompt image corresponding to the target image can be obtained in response to the interactive operation triggered by the user, and the prompt image reflects the occurrence position of the interactive operation in the target image. Finally, the target image can be subjected to image segmentation by taking the prompt image as prompt information. The target image and the prompt image may be three-dimensional images.
In the above method, the target image may be segmented using the prompt image, represented as a three-dimensional image, as the prompt information. Compared with using a two-dimensional vector to reflect the occurrence position of the interactive operation in the three-dimensional image, the three-dimensional prompt image can contain richer spatial information and therefore describe the occurrence position of the interactive operation in the three-dimensional image more accurately. As a result, the prompt image can provide more accurate prompt information for the image segmentation process, improving the segmentation accuracy of the three-dimensional image.
In practice, the user can trigger at least one round of interactive operations on the target image; the more rounds of interaction there are, the richer the prompt information that can be provided for the image segmentation process, and thus the higher the accuracy of image segmentation. Optionally, each round of interaction may in turn comprise at least one interactive operation. When the user has triggered a round of interactive operations and the segmentation result corresponding to that round does not meet the user's requirements, the user can trigger the next round of interactive operations.
Whether or not the current interactive operation triggered by the user on the target image in step S102 is the first interactive operation triggered on the target image, in an alternative manner, the prompt image obtained after the current round of interactive operations triggered by the user may be used as the prompt information to perform image segmentation according to the embodiment shown in fig. 1.
If, in step S102, the current interactive operation triggered by the user on the target image is not the first interactive operation triggered on the target image, image segmentation may alternatively be implemented according to the embodiment shown in fig. 3. In the embodiment shown in fig. 3, the historical segmentation result corresponding to the target image, obtained after the previous round of interactive operations triggered by the user, and the image of the occurrence positions of the interactive operations currently triggered by the user can together be taken as the prompt image, so as to realize image segmentation.
Fig. 3 is a flowchart of another interactive image segmentation method according to an embodiment of the present invention. As shown in fig. 3, the following steps may be included:
S201, a target image expressed as a three-dimensional image is acquired.
The specific implementation process of the above step S201 may refer to the specific description of the related steps in the embodiment shown in fig. 1, which is not repeated herein.
S202, responding to the current interactive operation of the user on the target image, acquiring a prompt image corresponding to the target image, wherein the prompt image reflects the occurrence position of the interactive operation in the target image, and the prompt image further includes a historical segmentation result corresponding to the target image after the user triggered the previous round of interactive operations on the target image.
If the current interactive operation triggered by the user on the target image is not the first round of interactive operations triggered on the target image, the processing device may, in response to the current interactive operation, obtain a prompt image that includes an image capable of reflecting the occurrence position of the current interactive operation, and may further include the historical segmentation result corresponding to the target image after the previous round of interactive operations triggered by the user on the target image.
For example, suppose the user currently triggers a second round of interactive operations on the target image. The processing device may, in response to the second round of interactive operations, obtain an image reflecting the occurrence positions of the second round of interactive operations in the target image. When the second round of interactive operations triggered by the user is a frame selection operation, that image may be as shown in fig. 2. The processing device may also obtain the historical segmentation result corresponding to the target image, where the historical segmentation result may be the segmentation result output by the processing device in response to the first round of interactive operations. Together, the two obtained images are the prompt image corresponding to the target image after the second round of interactive operations is triggered.
S203, inputting the target image and the prompt image with the same size into the segmentation model to output a segmentation result by the segmentation model.
The target image and the presentation image acquired in the above steps may be input into a segmentation model built in the processing apparatus to segment the target image by the segmentation model. Wherein the size of the target image and the prompt image are the same.
Alternatively, the working process of the segmentation model may be as follows: the segmentation model may encode the target image and the prompt image to extract a first feature image from the target image and a second feature image from the prompt image. Then, the segmentation model may perform feature fusion processing, that is, fuse the first feature image and the second feature image. Optionally, the fusion operation may include at least one of a dot product operation, an addition operation, and an attention-mechanism weighting operation. Finally, the segmentation model can decode the fusion result and output the segmentation result.
In this embodiment, on the one hand, when the interactive operation currently triggered by the user is not the first round, the prompt image not only includes an image capable of reflecting the current occurrence position of the interactive operation, but also includes the historical segmentation result corresponding to the target image after the user triggered the previous round of interactive operations on the target image. Because the historical segmentation result of the target image can reflect how the target object in the target image has been segmented, also using the historical segmentation result as prompt information when the user currently triggers an interactive operation makes the prompt information used in the image segmentation process richer, so the accuracy of image segmentation can be improved.
On the other hand, the segmentation model can process the target image and the prompt image, and the prompt image expressed as the three-dimensional image contains rich spatial information, namely, the prompt image can accurately describe the occurrence position of the interactive operation in the three-dimensional image, so that the segmentation model can output more accurate segmentation results through the position prompt function of the prompt image.
Based on the embodiment shown in fig. 3, the specific process of completing image segmentation by the user through multiple rounds of trigger interactions can also be understood in conjunction with the following examples:
the user can trigger the first round of interaction operation on the target image, and at this time, the processing device can acquire the prompt image corresponding to the target image after the first round of interaction operation is generated. The hint image and the target image of the same size may be input into a segmentation model, from which a historical segmentation result corresponding to the target image is output.
Then, the user can trigger the second round of interaction operation on the target image, and the processing device can acquire the prompt image corresponding to the second round of interaction operation and input the prompt image and the target image into the segmentation model so as to output a segmentation result corresponding to the target image by the segmentation model. The prompting image corresponding to the second round of interaction operation can comprise a historical segmentation result of the target image and an image reflecting the occurrence position of the interaction operation triggered by the second round of interaction operation of the user. And if the user is satisfied with the segmentation result, the image segmentation can be stopped, and the user does not need to trigger the next round of interaction operation.
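A minimal sketch of this multi-round flow is given below. `segmentation_model` and `get_user_interaction` are assumed callables used only to show how the historical segmentation result joins the prompt from the second round onward:

```python
import numpy as np

def interactive_segmentation(target_image, segmentation_model,
                             get_user_interaction, max_rounds=5):
    """Each round builds a prompt from the new interaction image; from the
    second round on, the previous (historical) segmentation result is
    stacked into the prompt as an additional channel."""
    history_result = None
    for _ in range(max_rounds):
        interaction_image, satisfied = get_user_interaction(target_image, history_result)
        if satisfied:                              # user accepts the previous result
            break
        if history_result is None:                 # first round: interaction image only
            prompt = interaction_image[np.newaxis]
        else:                                      # later rounds: interaction + history
            prompt = np.stack([interaction_image, history_result])
        history_result = segmentation_model(target_image, prompt)
    return history_result
```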
It has been mentioned in the embodiment shown in fig. 3 that the segmentation model may be used for image segmentation of the target image. The working of the segmentation model may be described in detail below in connection with its model structure. Alternatively, a schematic diagram of the working process of a segmentation model may be seen in fig. 4. As shown in fig. 4, the segmentation model may include an encoder and a decoder.
Specifically, after the target image and the prompt image are input into the segmentation model, the encoder in the segmentation model may extract a first feature image from the target image and a second feature image from the prompt image, and the encoder may further fuse the first feature image and the second feature image. The decoder in the segmentation model may decode the fusion result and output the segmentation result of the target image. The first feature image may be denoted by X and the second feature image by M. Optionally, in order to ensure the effect of fusion, the size of the first feature image X may be the same as the size of the second feature image M.
For the extraction of the first feature image X and the second feature image M, the feature extraction layer in the encoder may be used to extract the first feature image X from the target image and the second feature image M from the prompt image.
Optionally, in order to further improve the efficiency of the segmentation model, i.e. reduce the computation of the segmentation model, the encoder in the segmentation model may further comprise a sampling layer, as shown in fig. 5, on the basis of comprising a feature extraction layer.
The sampling layer in the encoder may downsample the target image and the prompt image to obtain a first sampled image corresponding to the target image and a second sampled image corresponding to the prompt image. Optionally, the downsampling factor may be set according to actual needs.
Then, the feature extraction layer in the encoder may perform feature extraction on the first sampled image and the second sampled image, respectively, to obtain a first feature image X and a second feature image M having the same size. The sizes of the first sampling image and the first feature image are the same, and the sizes of the second sampling image and the second feature image are the same, that is, the sizes of the image of the input feature extraction layer and the image of the output feature extraction layer are the same, that is, the feature extraction layer in the encoder does not change the size of the input image.
In this embodiment, dimension reduction of the target image and the prompt image can be achieved by downsampling the image and extracting features, and the dimension-reduced image has smaller data size, so that the calculation amount of the segmentation model in the image segmentation process can be reduced, and the efficiency of the segmentation model can be improved.
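A sketch of such an encoder branch is shown below, assuming PyTorch; the channel count, downsampling factor, and layer types are illustrative assumptions rather than the disclosed implementation:

```python
import torch
import torch.nn as nn

class DownsampleEncoder(nn.Module):
    """Sampling layer (strided convolution) followed by a feature extraction
    layer whose padding keeps the already-reduced spatial size unchanged."""

    def __init__(self, in_channels=1, channels=32, down_factor=4):
        super().__init__()
        self.sampling = nn.Conv3d(in_channels, channels,
                                  kernel_size=down_factor, stride=down_factor)
        self.features = nn.Sequential(
            nn.Conv3d(channels, channels, kernel_size=3, padding=1),
            nn.InstanceNorm3d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, volume):
        sampled = self.sampling(volume)   # first / second sampled image
        return self.features(sampled)     # first / second feature image (same size)

encoder = DownsampleEncoder()
# the same structure is applied to the target image and the prompt image
# (whether the two branches share weights is not specified in the disclosure)
x = encoder(torch.zeros(1, 1, 128, 128, 128))   # first feature image X
m = encoder(torch.zeros(1, 1, 128, 128, 128))   # second feature image M
```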
For the process of fusing the first feature image X and the second feature image M, the fusion may be implemented by a feature fusion layer in the encoder. The working process of the feature fusion layer may specifically be as follows: the feature fusion layer first convolves the second feature image M to obtain an intermediate convolution result. The feature fusion layer then convolves the intermediate convolution result with a first convolution parameter to obtain a first convolution result, and also convolves the intermediate convolution result with a second convolution parameter to obtain a second convolution result.
Then, a dot multiplication operation is performed between the first feature image X and the first convolution result to obtain a dot multiplication result, and the dot multiplication result is added to the second convolution result, finally yielding the fusion result of the first feature image X and the second feature image M.
Optionally, in order to ensure the effect of feature fusion, the feature fusion layer may first normalize the first feature image X and then perform the dot multiplication operation between the normalization result and the first convolution result. The above fusion process can also be understood in conjunction with fig. 6.
In addition, the above fusion process may also be regarded as a progressively and spatially aligned prompt encoding scheme (Progressively and Spatially Aligned Prompt, abbreviated as PSAP).
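A minimal sketch of this fusion step is given below, assuming PyTorch; the specific layer types (1×1×1 convolutions, instance normalization) are assumptions, while the structure follows the description above: convolve M to an intermediate result, derive the first and second convolution results from it, multiply the normalized X by the first result, and add the second:

```python
import torch.nn as nn

class PromptFusion(nn.Module):
    """Feature fusion layer sketch: out = normalize(X) * conv1(conv(M)) + conv2(conv(M))."""

    def __init__(self, channels):
        super().__init__()
        self.intermediate = nn.Conv3d(channels, channels, kernel_size=3, padding=1)
        self.conv_first = nn.Conv3d(channels, channels, kernel_size=1)   # first convolution parameter
        self.conv_second = nn.Conv3d(channels, channels, kernel_size=1)  # second convolution parameter
        self.norm = nn.InstanceNorm3d(channels)

    def forward(self, x, m):
        inter = self.intermediate(m)          # intermediate convolution result
        first = self.conv_first(inter)        # first convolution result
        second = self.conv_second(inter)      # second convolution result
        return self.norm(x) * first + second  # dot multiplication, then addition
```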
In this embodiment, the feature fusion layer in the encoder can fuse the first feature image corresponding to the target image and the second feature image corresponding to the prompt image, so as to obtain a fusion result with rich information, so as to further improve accuracy of the segmentation result.
For the decoding process of the fusion result, the decoder in the segmentation model may optionally specifically comprise a sampling layer.
Because the fusion result is a result of fusing the downsampled image and is different from the size of the target image, in order to ensure that the size of the finally output segmentation result is the same as the size of the target image, the fusion result can be upsampled by using a sampling layer in the decoder to obtain an upsampled result with the same size as the target image. The decoder may then output a segmentation result of the target image based on the upsampling result.
In this embodiment, the size of the fusion result can be up-sampled to be the same as the size of the target image by the sampling layer in the decoder, that is, the size of the segmented image of the target image which is finally output is ensured to be the same as the size of the target image, that is, the output segmented result is more visual and clear.
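A one-line sketch of this step, assuming trilinear interpolation (the disclosure only requires that the upsampled result match the size of the target image):

```python
import torch.nn.functional as F

def decode_to_target_size(fusion_result, target_image):
    """Upsample the fusion result back to the spatial size of the target image
    before predicting the segmentation mask."""
    return F.interpolate(fusion_result, size=target_image.shape[2:],
                         mode="trilinear", align_corners=False)
```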
The above embodiments do not limit the number of stages of the sampling layers and feature extraction layers in the encoder. Optionally, the more stages the sampling layers and feature extraction layers have, the richer the set of feature image sizes that can be obtained. Feature images of different sizes contain features of different levels in the target image and the prompt image; that is, the smaller the size, the higher-level the features contained in the feature image. Such a rich hierarchy of features can significantly improve the accuracy of image segmentation.
When the sampling layer and the feature extraction layer each include multiple stages, a process of segmenting the target image by the segmentation model may be described with two adjacent sampling layers and two adjacent feature extraction layers as an example, and may also be as shown in fig. 7. Fig. 7 is a flowchart of yet another interactive image segmentation method according to an embodiment of the present invention. As shown in fig. 7, the following steps may be included:
S301, a target image expressed as a three-dimensional image is acquired.
S302, responding to the current interactive operation of the user on the target image, acquiring a prompt image corresponding to the target image, wherein the prompt image reflects the occurrence position of the interactive operation in the target image, and the prompt image further includes a historical segmentation result corresponding to the target image after the user triggered the previous round of interactive operations on the target image.
The specific implementation process of the steps S301 to S302 may refer to the specific description of the related steps in the embodiment shown in fig. 3, which is not repeated herein.
S303, performing downsampling by using a previous-stage sampling layer of the encoder in the segmentation model, so as to obtain a third sampled image corresponding to the target image and a fourth sampled image corresponding to the prompt image.
S304, performing feature extraction on the third sampled image and the fourth sampled image, respectively, by using a previous-stage feature extraction layer of the encoder in the segmentation model, so as to obtain a third feature image of the target image and a fourth feature image of the prompt image.
S305, downsampling the third feature image and the fourth feature image by using a next-stage sampling layer of the encoder in the segmentation model, so as to obtain a first sampled image corresponding to the target image and a second sampled image corresponding to the prompt image.
S306, performing feature extraction on the first sampled image and the second sampled image, respectively, by using a next-stage feature extraction layer of the encoder in the segmentation model, so as to obtain a first feature image and a second feature image; the encoder fuses the first feature image and the second feature image, the decoder in the segmentation model decodes the fusion result, and a segmentation result of the target image is output. The size of the third feature image is larger than that of the first feature image.
When the sampling layer and the feature extraction layer each include multiple stages, the relationship between the input and output of each of the sampling layer and the feature extraction layer of adjacent stages may be as shown in fig. 8, specifically as follows:
the output of the previous-stage sampling layer may serve as the input of the previous-stage feature extraction layer, the output of the previous-stage feature extraction layer may serve as the input of the next-stage sampling layer, and the output of the next-stage sampling layer may serve as the input of the next-stage feature extraction layer.
Specifically, the previous-stage sampling layer in the encoder may perform downsampling to obtain the third sampled image corresponding to the target image and the fourth sampled image corresponding to the prompt image.
Here, if the previous-stage sampling layer is the first sampling layer in the encoder, this sampling layer samples the target image and the prompt image, and the sampling layer may include a stem block. If the previous-stage sampling layer is not the first sampling layer in the encoder, this sampling layer samples the feature images output by the feature extraction layer adjacent to it. Alternatively, this sampling layer may include a ResT stage.
Then, the previous-stage feature extraction layer in the encoder may perform feature extraction on the third sampled image and the fourth sampled image, respectively, so as to obtain a third feature image of the target image and a fourth feature image of the prompt image.
Then, the next-stage sampling layer in the encoder may downsample the third feature image and the fourth feature image to obtain the first sampled image corresponding to the target image and the second sampled image corresponding to the prompt image. The size of the third feature image is larger than the size of the first feature image.
Then, the next-stage feature extraction layer in the encoder may perform feature extraction on the first sampled image and the second sampled image, respectively, so as to obtain a first feature image and a second feature image. The encoder may further fuse the first feature image and the second feature image, so that the decoder in the segmentation model decodes the fusion result and outputs the segmentation result of the target image.
For example, assume that the size of the target image is W×H×D, where W represents the width, H the height, and D the depth. Assume also that the encoder in the segmentation model includes two stages of sampling layers and two stages of feature extraction layers. The relationship between the inputs and outputs of the two stages of sampling layers and feature extraction layers may be as shown in fig. 9.
As shown in fig. 9, the sizes of the third sampled image and the fourth sampled image output by the first-stage sampling layer may be W/16×H/16×D/16. Since the feature extraction layer does not change the size of its input image, the sizes of the third feature image and the fourth feature image extracted by the first-stage feature extraction layer may also be W/16×H/16×D/16. The sizes of the first sampled image and the second sampled image output by the second-stage sampling layer may be W/32×H/32×D/32, and the sizes of the first feature image and the second feature image output by the second-stage feature extraction layer may also be W/32×H/32×D/32.
In practice, the encoder may optionally include four stages of sampling layers and four stages of feature extraction layers. If the sampling factor of each stage of sampling layer is 2, the sizes of the sampled images output by the successive sampling layers may be W/4×H/4×D/4, W/8×H/8×D/8, W/16×H/16×D/16, and W/32×H/32×D/32, and the sizes of the feature images output by the corresponding feature extraction layers may likewise be W/4×H/4×D/4, W/8×H/8×D/8, W/16×H/16×D/16, and W/32×H/32×D/32.
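The four-stage layout can be sketched as follows, assuming PyTorch; channel counts are illustrative, and the first stage is assumed to downsample by a factor of 4 so that the W/4 to W/32 sizes listed above are produced:

```python
import torch
import torch.nn as nn

def make_stage(in_ch, out_ch, factor):
    """One sampling layer (strided conv) plus one size-preserving feature extraction layer."""
    return nn.Sequential(
        nn.Conv3d(in_ch, out_ch, kernel_size=factor, stride=factor),
        nn.Conv3d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
    )

stages = nn.ModuleList([
    make_stage(1, 32, factor=4),     # W/4  x H/4  x D/4
    make_stage(32, 64, factor=2),    # W/8  x H/8  x D/8
    make_stage(64, 128, factor=2),   # W/16 x H/16 x D/16
    make_stage(128, 256, factor=2),  # W/32 x H/32 x D/32
])

volume = torch.zeros(1, 1, 128, 128, 128)    # example target image
features = []
for stage in stages:
    volume = stage(volume)
    features.append(volume)                  # feature images of decreasing size
```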
In this embodiment, the encoder in the segmentation model includes multi-stage sampling layers and multi-stage feature extraction layers, through which feature images of different sizes corresponding to the target image and the prompt image can be extracted respectively, that is, features of different levels in the images are extracted. Then, through the fusion results between these feature images, a more accurate segmentation result can be obtained, improving the accuracy of target image segmentation.
Optionally, in addition to the multi-stage sampling layers and multi-stage feature extraction layers, the encoder may also include multi-stage feature fusion layers, and the sampling layers, feature extraction layers, and feature fusion layers in the segmentation model have the same number of stages. In this case, the working process of two adjacent stages of feature fusion layers may specifically be as follows:
First, the previous-stage feature fusion layer in the encoder may use different convolution parameters to perform convolution processing, respectively, on the same feature image corresponding to the target image that is output by the feature extraction layer of the same stage, and then fuse the convolution results. The fusion process can be understood with reference to the description of the embodiment shown in fig. 6.
Then, the sampling layer in the decoder may upsample the fusion result and splice the upsampled result with the feature image corresponding to the prompt image to obtain a spliced result. The size of the upsampled result is the same as the size of that feature image, which is output by the feature extraction layer at the same level as the previous-stage feature fusion layer.
Then, the next-stage feature fusion layer in the encoder may perform convolution processing, using different convolution parameters, on the feature images output by the feature extraction layers of the same level, and further fuse the convolution results.
For example, as shown in fig. 10, assume that the encoder specifically includes two stages of sampling layers, two stages of feature extraction layers, and two stages of feature fusion layers. The encoder may first perform convolution processing on the second feature image M using the different convolution parameters contained in the first-stage feature fusion layer, so as to obtain a third convolution result and a fourth convolution result.
Then, the first-stage feature fusion layer fuses the first feature image X with the third convolution result and the fourth convolution result. The fusion process can be understood in detail with reference to the embodiment shown in fig. 6. Optionally, in order to ensure the effect of feature fusion, the first-stage feature fusion layer may first normalize the first feature image X, perform a point multiplication operation between the normalization result and the third convolution result, and then add the point multiplication result to the fourth convolution result to obtain a first fusion result.
Then, the sampling layer in the decoder may further up-sample the first fusion result, and splice (concat) the up-sampled result and the third feature image to obtain a spliced result. Wherein the up-sampling result has the same size as the third feature image.
Then, the encoder can perform convolution processing on the fourth feature image using the different convolution parameters contained in the second-stage feature fusion layer, so as to obtain a first convolution result and a second convolution result.
Finally, the second-stage feature fusion layer fuses the spliced result with the first convolution result and the second convolution result. The fusion process can also be understood in detail with reference to the embodiment shown in fig. 6. Optionally, in order to ensure the effect of feature fusion, the second-stage feature fusion layer may normalize the spliced result, perform a dot multiplication operation between the normalization result and the first convolution result, and then add the dot multiplication result to the second convolution result to obtain a second fusion result.
In this embodiment, the multi-stage feature fusion layers of the encoder are used to fuse the feature extraction results corresponding to the target image with the feature extraction results corresponding to the prompt image serving as prompt information, so that the information contained in the fusion result is more complete. Moreover, the fusion result generated by the previous-stage feature fusion layer can be used by the next-stage feature fusion layer during fusion, making the information contained in the fusion result more accurate and improving the accuracy of target image segmentation.
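A sketch of the two-level fusion in the example above is shown below; it reuses the PromptFusion sketch given earlier, and the channel counts and the choice to concatenate the upsampled result with the target-image feature image follow the fig. 10 description as one interpretation, not a verified implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoLevelFusion(nn.Module):
    """Deep-level fusion, upsample, splice with the shallower feature image,
    then a second fusion of the spliced result with the convolution results
    derived from the shallower prompt feature image."""

    def __init__(self, deep_ch=256, shallow_ch=128):
        super().__init__()
        self.fuse_deep = PromptFusion(deep_ch)          # first-stage feature fusion layer
        spliced_ch = deep_ch + shallow_ch               # channels after concatenation
        self.conv_first = nn.Conv3d(shallow_ch, spliced_ch, kernel_size=1)
        self.conv_second = nn.Conv3d(shallow_ch, spliced_ch, kernel_size=1)
        self.norm = nn.InstanceNorm3d(spliced_ch)

    def forward(self, x_deep, m_deep, x_shallow, m_shallow):
        first_fusion = self.fuse_deep(x_deep, m_deep)   # first fusion result
        upsampled = F.interpolate(first_fusion, size=x_shallow.shape[2:],
                                  mode="trilinear", align_corners=False)
        spliced = torch.cat([upsampled, x_shallow], dim=1)   # splice with third feature image
        first = self.conv_first(m_shallow)              # first convolution result
        second = self.conv_second(m_shallow)            # second convolution result
        return self.norm(spliced) * first + second      # second fusion result
```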
The above embodiments have mentioned that the target object included in a target image of a preset size may be a complete object or a part of an object. In practice, in view of the computing power of the segmentation model, the target object contained in the target image is often only a part of the complete object, while the complete object is contained in the original image; in that case, the target image is one image area (patch) of the original image.
For the determination of the target image, the processing device may optionally determine, as the target image, a target area in the original image in which the interaction occurs in response to the interaction of the user with the original image. The target area corresponds to a target object in the original image that needs to be segmented. Alternatively, the interactive operation may be a user's framing operation of the image region. Alternatively, the size of the target image may be set in advance.
For example, assume that the original image is an abdominal CT image of a human body. When the user initiates an interactive operation on the region of the liver, the processing apparatus may, in response to the interactive operation, determine the image region in which the liver is located as the target image.
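A minimal sketch of how such a preset-size target image might be cropped around the interaction position is given below; the 64x64x64 patch size, the clamping behaviour at the volume border, and the use of NumPy are assumptions made for illustration.

```python
import numpy as np

def crop_target_image(original: np.ndarray, click_zyx, patch_size=(64, 64, 64)):
    """Crop a preset-size region of the original 3D image, centred on the click
    and clamped so that the patch stays inside the volume."""
    starts = []
    for dim, c, p in zip(original.shape, click_zyx, patch_size):
        starts.append(min(max(c - p // 2, 0), max(dim - p, 0)))
    z, y, x = starts
    pz, py, px = patch_size
    return original[z:z + pz, y:y + py, x:x + px], (z, y, x)

# e.g. a click landing on the liver region of an abdominal CT volume:
volume = np.random.randn(128, 256, 256).astype(np.float32)
target_image, origin = crop_target_image(volume, click_zyx=(60, 130, 140))
```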
In practice, the target object to be segmented in the original image may be large while the preset size of the target image is relatively small, so a single target image may not completely contain the target object. Therefore, after the target image corresponding to the target area has been segmented in the manner described in the above embodiments, other images corresponding to other areas of the original image often still need to be segmented in order to complete the segmentation of the target object in the original image. Optionally, the target image and the other images have the same size and may overlap. The positional relationship among the original image, the target image, and the other images may be as shown in fig. 11.
For example, when the original image is an abdominal CT image of a human body and the liver to be segmented is large, the image region in which the liver is located in the abdominal CT may be covered by a plurality of image regions of the same size, and the segmentation model segments one such image region (patch) at a time, as sketched below.
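The sketch below illustrates one way the origins of such same-size, partially overlapping regions could be enumerated; the patch size and stride (and hence the amount of overlap between neighbouring regions) are assumptions.

```python
import itertools

def tile_with_overlap(vol_shape, patch_size=(64, 64, 64), stride=(48, 48, 48)):
    """Origins of same-size, partially overlapping image regions covering the volume."""
    per_axis = []
    for dim, p, st in zip(vol_shape, patch_size, stride):
        last = max(dim - p, 0)
        starts = sorted(set(list(range(0, last, st)) + [last]))  # final patch flush with the border
        per_axis.append(starts)
    return list(itertools.product(*per_axis))

# e.g. the 128 x 256 x 256 abdominal CT volume used above:
origins = tile_with_overlap((128, 256, 256))
```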
Optionally, the processing device may also provide the user with an automatic segmentation function of the adjacent regions. If the function is not started, optionally, the user may trigger a round of interaction operation, so that the segmentation model segments the target image corresponding to the interaction operation. When the user triggers one round of interactive operation and the segmentation result corresponding to the round of interactive operation does not meet the user requirement, the user can further trigger the next round of interactive operation so that the segmentation model determines other images in the original image and continues to segment the other images, and therefore the segmentation of the target object in the original image is finally completed.
If the automatic segmentation function is turned on, the following embodiment shown in fig. 12 may be used, where after the target image corresponding to the first round of interaction is segmented, other images adjacent to the target image are automatically segmented, so as to improve the segmentation efficiency of the target object.
Fig. 12 is a flowchart of yet another interactive image segmentation method according to an embodiment of the present invention. As shown in fig. 12, the following steps may be included:
S401, determining other areas in the original image as other images, wherein the other areas have the same size as the target area and overlap with it.
S402, taking the target image and the segmentation result of the target image as a prompt image, and carrying out image segmentation on other images, wherein other areas and the target area correspond to the same object in the original image.
Optionally, in response to the activation of the automatic segmentation function, the processing device may determine other areas in the original image as other images. The other areas have the same size as the target area, overlap with it, and correspond to the same target object in the original image. Optionally, there may be one or more other areas. After the target image in the original image has been segmented, the processing device may further perform image segmentation on the other images, using the target image and the segmentation result of the target image as a prompt image.
Optionally, the above segmentation of the target image and of the other images may be performed by a segmentation model in the processing device. Specifically, when the target image is segmented, the input of the segmentation model is the target image and the prompt image of the target image. When another image is segmented, the input of the segmentation model is that other image together with the target image and the segmentation result of the target image, all of the same size. The target image and its segmentation result thus serve as the prompt image of the other image, and this prompt image is also a three-dimensional image. A hedged sketch of these two kinds of model calls is given below.
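In this sketch, the channel layout and the stand-in convolutions used in place of the real segmentation model are assumptions made so the example runs end to end.

```python
import torch
import torch.nn as nn

def segment_target(model, target_image, prompt_image):
    # first call: the model sees the target image plus its 3D prompt image
    return model(torch.cat([target_image, prompt_image], dim=1))

def segment_adjacent(model, other_image, target_image, target_result):
    # later calls: the already-segmented target image and its result act as the prompt image
    return model(torch.cat([other_image, target_image, target_result], dim=1))

# usage with trivial stand-ins for the segmentation model (channel counts are assumptions):
two_ch_model = nn.Conv3d(2, 1, 3, padding=1)
three_ch_model = nn.Conv3d(3, 1, 3, padding=1)
target = torch.randn(1, 1, 64, 64, 64)
prompt = torch.zeros(1, 1, 64, 64, 64)                        # 3D prompt image built from the interaction
target_mask = torch.sigmoid(segment_target(two_ch_model, target, prompt))
other = torch.randn(1, 1, 64, 64, 64)                         # adjacent, same-size image region
other_mask = torch.sigmoid(segment_adjacent(three_ch_model, other, target, target_mask))
```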
In this embodiment, when the target object to be segmented in the original image is large and an image area of a preset size cannot completely contain the target object, after the automatic segmentation function is started, other images adjacent to the target image can be automatically segmented, and the user does not need to trigger the interactive operation again. Therefore, the present embodiment improves the image segmentation efficiency by reducing the number of interactions.
After the automatic segmentation function is started, when the segmentation model segments the target image and other images, the used prompting images are three-dimensional images, and the occurrence position of interactive operation or the segmentation result contained in the prompting images can provide accurate prompting information, so that the accuracy of three-dimensional image segmentation is improved.
Fig. 13 is a flowchart of yet another interactive image segmentation method according to an embodiment of the present invention. The execution subject of the method may in particular also be a processing device, and the processing device may also provide the user with an automatic segmentation function of the adjacent areas. As shown in fig. 13, the method may include the steps of:
S501, responding to the interactive operation of a user on a target area in an original image, acquiring prompt information corresponding to the target area, wherein the prompt information reflects the occurrence position of the interactive operation in the target area.
S502, image segmentation is carried out on the target area according to the prompt information corresponding to the target area, and the original image is a three-dimensional image.
S503, taking the target area and the segmentation result of the target area as prompt information, and carrying out image segmentation on other areas in the original image, wherein the target area is overlapped with the other areas, and the size of the target area and the other areas is smaller than the size of a target object in the original image.
The processing device may present an original image to a user, based on which the user may trigger a round of interaction with a target area in the original image. Alternatively, the interactive operation may include a user's selection operation or a click operation of a target area in the original image. Wherein the original image may be represented as a three-dimensional image.
The processing device responds to the round of interactive operation and can acquire prompt information corresponding to the target area. The hint information may reflect the location of the occurrence of the interactive operation in the target area. Alternatively, the hint information may be a normal two-dimensional vector, or may be a hint image represented as a three-dimensional image as mentioned in the above embodiments. Then, the processing device may perform image segmentation on the target area according to the prompt information corresponding to the target area, so as to obtain a segmentation result of the target area.
At this time, if the automatic segmentation function of the processing device is not started, optionally, the user may further trigger the next round of interaction operation, so that the processing device uses the target area and the segmentation result of the target area as prompt information to segment the image of other areas in the original image, thereby finally completing the segmentation of the target object in the original image. The target area in the original image is overlapped with other areas, and the size of the target area and the other areas is smaller than that of the target object in the original image.
If the automatic segmentation function of the processing device is started, optionally, the processing device may automatically take the target area and the segmentation result of the target area as prompt information in response to the starting of the automatic segmentation function, and perform image segmentation on other areas in the original image, so as to finally complete segmentation of the target object in the original image.
Alternatively, the above-described process of segmenting the target region and the other regions may be performed by a segmentation model in the processing device.
In this embodiment, after the processing device starts the automatic segmentation function, the processing device may automatically segment the target region and other adjacent regions in the original image, without triggering the interactive operation again by the user. Therefore, by the automatic dividing function of the processing device, the image dividing efficiency can be improved by reducing the number of interactions.
After the automatic segmentation function is started, the processing equipment can automatically segment other areas which are overlapped with the target area after the target area is segmented, and the segmentation result corresponding to the target area, namely the historical segmentation result, is also used when the other areas are segmented, so that the prompt information used in the segmentation process of the other areas is richer, and the accuracy of image segmentation can be improved.
In addition, for details not described in this embodiment and the technical effects that can be achieved, reference may be made to the descriptions in the above embodiments, which are not repeated here.
The process of image segmentation has been described in detail in the above embodiments. On this basis, the process of interactive image segmentation is optionally described below from the interaction perspective. Fig. 14 is a flowchart of yet another interactive image segmentation method according to an embodiment of the present invention. The execution subject of the method may in particular also be a processing device, and the processing device may also provide a presentation function for the user. As shown in fig. 14, the method may include the following steps:
S601, a target image expressed as a three-dimensional image is displayed.
S602, responding to interactive operation of a user on the target image, and displaying a segmentation result of the target image.
The processing device may present the target image to the user. Based on the target image, the user can trigger an interactive operation on the target image; in response to the interactive operation, the processing device can perform image segmentation on the target image and display the segmentation result of the target image to the user.
Alternatively, the interactive operation may include a frame selection operation or a click operation of the target image by the user. The specific description may be found in the above related embodiments, and will not be repeated here.
Wherein the segmentation result of the target image may be determined by the processing device from the hint image and the target image. Both the prompt image and the target image are three-dimensional images, and the prompt image can reflect the occurrence position of the interactive operation in the target image.
Optionally, the image segmentation may be specifically implemented by a segmentation model in the processing device, and the acquiring process of the prompting image and the specific process of the image segmentation may refer to the related descriptions in the above embodiments, which are not repeated herein.
In this embodiment, the implementation process of image segmentation is described from the interaction point of view. In addition, for what is not described in detail in this embodiment and the technical effects that can be achieved, reference may be made to the related descriptions in the above embodiments, which are not repeated here.
When the automatic segmentation function of the processing device is turned on, image segmentation may be performed in accordance with the method of the embodiment shown in fig. 15. Fig. 15 is a flowchart of yet another interactive image segmentation method according to an embodiment of the present invention. The execution subject of the method may in particular also be a processing device, and the processing device may also provide a presentation function for a user. As shown in fig. 15, the method may include the steps of:
S701, an original image expressed as a three-dimensional image is displayed.
S702, responding to interactive operation of a user on a target area in the original image, and displaying a segmentation result of the original image.
Optionally, the processing device may present the original image to the user. The original image may be a three-dimensional image, and may include the target region and other regions. Based on the original image, the user can trigger an interactive operation on the target area in the original image, and the processing device displays the segmentation result of the original image to the user. Optionally, the interactive operation may include a user's frame selection operation and/or a click operation on the target region in the original image.
The determining of the segmentation result may include: the processing device may perform image segmentation on the target image corresponding to the target area according to the prompt information corresponding to the target area, where the prompt information reflects the occurrence position of the interaction operation in the target area. Then, the processing device may use the target area and the segmentation result of the target area as prompt information, and perform image segmentation on other images corresponding to other areas in the original image. The target area and the other areas overlap, correspond to the target object in the original image, and are each smaller in size than the target object in the original image. The positional relationship among the original image, the target image, and the other images may be as shown in fig. 11.
In this embodiment, in response to the activation of the automatic segmentation function, the processing device may first perform image segmentation on the target region. After that, the processing device can automatically segment other areas in the original image by taking the segmentation result of the target area as prompt information without triggering interaction operation again by the user. The automatic segmentation function in this embodiment is particularly suitable for the case where the target object to be segmented in the original image has a large volume, and the target object cannot be completely covered by one target area.
The processing device for executing the interactive image segmentation method mentioned in the above embodiments may specifically be a device deployed in the cloud. Alternatively, the image segmentation may be provided to the user as a service platform. The workflow of this platform may be as shown in fig. 16. Fig. 16 is a flowchart of yet another interactive image segmentation method according to an embodiment of the present invention. The method specifically comprises the following steps:
S801, receiving an image segmentation request generated by a user, wherein the image segmentation request comprises an image segmentation task, the image segmentation task carries a target image expressed as a three-dimensional image and a prompt image corresponding to the target image, and the prompt image reflects the occurrence position, in the target image, of the interactive operation currently triggered by the user on the target image.
S802, performing image segmentation on the target image according to the prompt image expressed as the three-dimensional image.
S803, the image segmentation result is sent to the user.
The present embodiment provides an interactive image segmentation method, which uses a prompt image expressed as a three-dimensional image as prompt information to perform image segmentation. Since a three-dimensional image can more accurately describe the occurrence position of the interactive operation in the image than a two-dimensional vector, image segmentation can also be performed more accurately.
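A minimal sketch of the request/response flow of such a cloud service is given below; the request schema, the `SegmentationTask` container, and the `run_segmentation` callable are illustrative assumptions rather than the platform's actual interface.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class SegmentationTask:
    target_image: np.ndarray   # three-dimensional target image
    prompt_image: np.ndarray   # three-dimensional prompt image of the same size

def handle_segmentation_request(task: SegmentationTask, run_segmentation):
    """Receive an image segmentation task, run the model, return the result to the user."""
    assert task.target_image.shape == task.prompt_image.shape
    result = run_segmentation(task.target_image, task.prompt_image)
    return {"segmentation_result": result}

# usage with a trivial stand-in for the segmentation model:
task = SegmentationTask(np.zeros((64, 64, 64), dtype=np.float32),
                        np.zeros((64, 64, 64), dtype=np.float32))
response = handle_segmentation_request(task, lambda img, prm: (img > 0).astype(np.uint8))
```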
In addition, for details not described in this embodiment and the technical effects that can be achieved, reference may be made to the descriptions in the above embodiments, which are not repeated here.
The workflow of the platform may also be as shown in fig. 17. Fig. 17 is a flowchart of yet another interactive image segmentation method according to an embodiment of the present invention. The method specifically comprises the following steps:
S901, receiving an image segmentation request generated by a user, wherein the image segmentation request comprises an image segmentation task, the image segmentation task carries an original image expressed as a three-dimensional image and prompt information corresponding to a target area in the original image, and the prompt information reflects the occurrence position of the interactive operation triggered by the user on the target area.
S902, image segmentation is carried out on the target area according to the prompt information corresponding to the target area.
S903, taking the target area and the segmentation result of the target area as prompt information, and carrying out image segmentation on other areas in the original image, wherein the target area is overlapped with the other areas, and the size of the target area and the other areas is smaller than the size of the target object in the original image.
S904, the image segmentation result is sent to the user.
The embodiment provides an interactive image segmentation method, after an interactive operation is triggered by a user, a target area in an original image can be segmented, and then image segmentation can be automatically performed on other areas adjacent to the target area, so that segmentation of the original image is finally completed. The method can reduce the interaction times generated by users in the interactive image segmentation process by automatically segmenting the adjacent areas, thereby improving the segmentation efficiency.
In addition, for details not described in this embodiment and the technical effects that can be achieved, reference may be made to the descriptions in the above embodiments, which are not repeated here.
Fig. 18 is a schematic structural diagram of an interactive image segmentation platform according to an embodiment of the present invention, corresponding to fig. 16. Alternatively, the platform may be deployed in the cloud. As shown in fig. 18, the interactive image segmentation platform includes: an image display system and an image processing system.
The image display system can display the target image to be segmented to a user. Wherein the target image is a three-dimensional image. The target image may include at least one object, and these objects may specifically include a target object that needs to be segmented, or may include other objects that need not be segmented.
Based on the target image, the user may then trigger an interactive operation on it. At this time, the image processing system may acquire a hint image corresponding to the target image in response to the user's current interactive operation on the target image. The hint image is also represented as a three-dimensional image and has the same size as the target image; it reflects the occurrence position of the interactive operation in the target image. For a target image represented as a three-dimensional image, such a three-dimensional hint image can describe the occurrence position of the interaction more accurately than a two-dimensional vector, as illustrated by the sketch below.
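One plausible way to rasterise a click into such a three-dimensional hint image is sketched here; the binary ball encoding and its radius are assumptions made for illustration (the construction actually used by the embodiments is the one described earlier in this document).

```python
import numpy as np

def click_to_prompt_image(target_shape, click_zyx, radius=3, positive=True):
    """Return a 3D prompt image of the same size as the target image that marks
    where the interactive operation occurred."""
    zz, yy, xx = np.ogrid[:target_shape[0], :target_shape[1], :target_shape[2]]
    dist2 = (zz - click_zyx[0]) ** 2 + (yy - click_zyx[1]) ** 2 + (xx - click_zyx[2]) ** 2
    prompt = np.zeros(target_shape, dtype=np.float32)
    prompt[dist2 <= radius ** 2] = 1.0 if positive else -1.0  # e.g. negative clicks on other objects
    return prompt

prompt_image = click_to_prompt_image((64, 64, 64), click_zyx=(32, 20, 40))
```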
Thereafter, the image processing system may perform image segmentation on the target image based on the hint image that appears as a three-dimensional image. Finally, the image display system may also present the segmentation result of the target image to the user.
In this embodiment, the interactive image segmentation platform includes an image display system and an image processing system. The image display system can display the target image and the segmentation result of the target image to the user, and the image processing system segments the target image using the prompt image represented as a three-dimensional image. For details of this embodiment and the technical effects that can be achieved, reference may be made to the descriptions in the above embodiments, which are not repeated here.
Optionally, the interactive image segmentation platform in the embodiment shown in fig. 18 may further include an automatic segmentation function, and the activation of the automatic segmentation function may reduce the number of interactions in the image interaction process. The working process of the interactive image segmentation platform may also be specifically as follows:
wherein the image display system may present the original image, which is represented as a three-dimensional image, to the user. The original image may include a target region and other regions.
Based on the original image, the user can trigger a first round of interaction operation on a target area in the original image. Alternatively, the interactive operation may include a user's selection operation or a click operation of a target area in the original image.
The image processing system can respond to this round of interaction operation to acquire the prompt information corresponding to the target area. The prompt information may reflect the occurrence position of the interactive operation in the target area. Optionally, the prompt information may be an ordinary two-dimensional vector, or may be a prompt image represented as a three-dimensional image as in the above embodiments.
Then, the image processing system can divide the image of the target area according to the prompt information corresponding to the target area so as to obtain a division result of the target area.
After the automatic segmentation function is started, optionally, the image processing system may automatically use the target area and the segmentation result of the target area as prompt information to segment the other areas in the original image, thereby completing the segmentation of the target object in the original image. The target area and the other areas are two overlapping areas in the original image.
This embodiment also corresponds to the method shown in fig. 17.
In this embodiment, after the image processing system starts the automatic segmentation function, the target area and other adjacent areas in the original image may be automatically segmented, without the need for the user to trigger the interactive operation again. Therefore, by the automatic segmentation function of the image processing system, the image segmentation efficiency can be improved by reducing the interaction times. And when other areas are segmented, the corresponding segmentation results of the target areas, namely the historical segmentation results, are used, so that the prompt information used in the segmentation process of the other areas is richer, and the accuracy of image segmentation can be improved.
In addition, for details not described in this embodiment and the technical effects that can be achieved, reference may be made to the descriptions in the above embodiments, which are not repeated here.
If the automatic segmentation function is not started, optionally, the user may further trigger a next round of interaction operation to use the target area and the segmentation result of the target area as prompt information, and segment the image of other areas in the original image, thereby finally completing the segmentation of the target object in the original image. The target area in the original image is overlapped with other areas, and the size of the target area and the other areas is smaller than that of the target object in the original image.
As can be seen from the use of the processing device described in the embodiments shown in fig. 1 to 17, the segmentation of the target image is realized by a segmentation model built into the processing device. The training process of the segmentation model can be described as follows. The embodiments described below may in particular be performed by a training device, which may be the processing device mentioned in the above embodiments or a stand-alone device. The training device may be deployed in the cloud.
Fig. 19 is a flowchart of a model training method according to an embodiment of the present invention. The present embodiment may be performed by the training device described above. As shown in fig. 19, the method may include the steps of:
S1001, obtaining a training image, a reference segmentation result corresponding to the training image, and a prompt image corresponding to the training image, all of the same size, wherein the prompt image reflects the occurrence position, in the training image, of the interactive operation triggered by the user on the training image, and the training image and the prompt image are three-dimensional images.
Alternatively, the reference segmentation result corresponding to the training image and the prompt image corresponding to the training image may be obtained in advance and stored in the training image set. The training device may acquire images and segmentation results directly from the training image set. The size of the reference segmentation result is the same as that of the prompting image, and the prompting image and the training image are three-dimensional images. And the hint image may reflect where in the training image the user's interaction with the training image triggered occurred.
S1002, inputting the training image and the prompt image into the segmentation model, and outputting a first prediction segmentation result corresponding to the training image by the segmentation model.
S1003, training a segmentation model according to a loss calculation result between a first prediction segmentation result corresponding to the training image and a reference segmentation result.
The training device may then input the training images of the same size and the hint images into the segmentation model to output a first predictive segmentation result corresponding to the training images from the segmentation model.
And then, the training equipment can perform loss calculation on the first prediction segmentation result output by the segmentation model and the reference segmentation result corresponding to the training image, and train the segmentation model according to the obtained loss calculation result. Alternatively, the loss function used by the loss calculation process may include one of a cross entropy loss function, a logarithmic loss function, and a square loss function.
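A hedged sketch of one such supervised training step is given below; the optimizer, the particular cross-entropy-style loss, the tensor shapes, and the stand-in convolution used in place of the real segmentation model are all assumptions.

```python
import torch
import torch.nn as nn

seg_model = nn.Conv3d(2, 1, 3, padding=1)            # stand-in for the real segmentation model
optimizer = torch.optim.Adam(seg_model.parameters(), lr=1e-4)
criterion = nn.BCEWithLogitsLoss()                    # a cross-entropy-style choice among those listed above

training_image = torch.randn(1, 1, 64, 64, 64)        # three-dimensional training image
prompt_image = torch.zeros(1, 1, 64, 64, 64)          # same-size three-dimensional prompt image
reference = torch.randint(0, 2, (1, 1, 64, 64, 64)).float()  # reference segmentation result

prediction = seg_model(torch.cat([training_image, prompt_image], dim=1))  # first prediction segmentation result
loss = criterion(prediction, reference)               # loss between prediction and reference result
optimizer.zero_grad()
loss.backward()
optimizer.step()
```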
In this embodiment, the segmentation model is trained by using a supervised training manner, that is, by using the reference segmentation result corresponding to the training image, so that the training effect of the segmentation model is better, and further, the prediction segmentation result output by the segmentation model is more accurate when the model is used in stages.
In order for the processing device to have the automatic segmentation function described above, the segmentation model may also be trained in the following manner. Fig. 20 is a flowchart of another model training method according to an embodiment of the present invention. The present embodiment may be performed by the training device described above. As shown in fig. 20, the method may include the steps of:
S1101, obtaining training images, reference segmentation results corresponding to the training images, and prompt images corresponding to the training images, all of the same size, wherein the prompt images reflect the occurrence positions, in the training images, of the interaction operations triggered by users on the training images, and the training images comprise target training images and other training images.
The training device may acquire a target training image, a target reference segmentation result corresponding to the target training image, and a hint image corresponding to the target training image. Meanwhile, the training equipment can also acquire other training images, other reference segmentation results corresponding to the other training images and prompt images corresponding to the other training images.
The target training image, the other training images, their respective reference segmentation results, and the corresponding prompt images all have the same size, and the target training image and the other training images are overlapping image areas in the same three-dimensional image. The other training image may be denoted as V and the target training image as U; each has its own corresponding reference segmentation result (the other reference segmentation result and the target reference segmentation result, respectively).
S1102, inputting the training image and the prompt image into the segmentation model to output a first prediction segmentation result corresponding to the training image by the segmentation model.
The training device may input the target training image U and the prompt image corresponding to U, which have the same size, into the segmentation model, so that the segmentation model outputs a first prediction segmentation result corresponding to the target training image U.
Meanwhile, the training device may also input the other training image V and the prompt image corresponding to V into the segmentation model, so that the segmentation model outputs a first prediction segmentation result corresponding to the other training image V.
At this point, the segmentation model is undergoing its first-stage training, which is in effect a non-cross training process.
S1103, inputting other training images and first prompt information into the segmentation model to output second prediction segmentation results corresponding to the other training images by the segmentation model, wherein the first prompt information comprises a target training image and first prediction segmentation results corresponding to the target training image.
The training device may then perform the second-stage training of the segmentation model, which is in effect a cross-training process. Specifically, the other training image V and the first prompt information may be input into the segmentation model, so that the segmentation model outputs a second prediction segmentation result corresponding to the other training image V. The first prompt information can be regarded as an image pair, consisting of the target training image U and the first prediction segmentation result corresponding to the target training image U.
S1104, inputting the target training image and second prompt information into the segmentation model to output a second prediction segmentation result corresponding to the target training image by the segmentation model, wherein the second prompt information comprises other training images and first prediction segmentation results of other training images.
Similar to step S1103, the training device may also input the target training image U and the second prompt information into the segmentation model, so that the segmentation model outputs a second prediction segmentation result corresponding to the target training image U. The second prompt information may comprise the other training image V and the first prediction segmentation result of the other training image V.
S1105, training a segmentation model according to a loss calculation result between a first prediction segmentation result corresponding to the target training image and a target reference segmentation result, a loss calculation result between a first prediction segmentation result corresponding to other training images and other reference segmentation results, a loss calculation result between a second prediction segmentation result corresponding to other training images and other reference segmentation results, and a loss calculation result between a second prediction segmentation result corresponding to the target training image and the target reference segmentation result.
The training device can perform a loss calculation between the first prediction segmentation result corresponding to the target training image U and the target reference segmentation result to obtain a first loss value. Optionally, the loss function used in the loss calculation may include one of a cross entropy loss function, a logarithmic loss function, and a square loss function.
Similarly, the training device may perform a loss calculation between the first prediction segmentation result corresponding to the other training image V and the other reference segmentation result to obtain a second loss value; a loss calculation between the second prediction segmentation result corresponding to the other training image V and the other reference segmentation result to obtain a third loss value; and a loss calculation between the second prediction segmentation result corresponding to the target training image U and the target reference segmentation result to obtain a fourth loss value.
Finally, the training device may train the segmentation model according to the calculated loss values. The sum of the first loss value and the second loss value is the loss of the first-stage training (i.e. non-cross training), and the sum of the third loss value and the fourth loss value is the loss of the second-stage training (i.e. cross training). A hedged sketch of this loss composition is given below.
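In this sketch, the stand-in model, the two-channel prompt encoding, and the equal weighting of the non-cross and cross losses are illustrative choices, not the claimed scheme.

```python
import torch
import torch.nn as nn

criterion = nn.BCEWithLogitsLoss()

class StandInModel(nn.Module):
    """Stand-in for the segmentation model: takes a patch plus a two-channel prompt
    (a prompt image or a neighbouring patch, and an optional prior mask)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv3d(3, 1, 3, padding=1)

    def forward(self, image, prompt):
        return self.net(torch.cat([image, prompt], dim=1))

def two_stage_loss(model, U, V, ref_U, ref_V, prompt_U, prompt_V):
    # stage one (non-cross): each training patch is segmented with its own prompt image
    pred_U1 = model(U, prompt_U)
    pred_V1 = model(V, prompt_V)
    loss_noncross = criterion(pred_U1, ref_U) + criterion(pred_V1, ref_V)   # first + second loss values
    # stage two (cross): the other patch and its stage-one prediction act as the prompt
    pred_V2 = model(V, torch.cat([U, torch.sigmoid(pred_U1)], dim=1))
    pred_U2 = model(U, torch.cat([V, torch.sigmoid(pred_V1)], dim=1))
    loss_cross = criterion(pred_V2, ref_V) + criterion(pred_U2, ref_U)      # third + fourth loss values
    return loss_noncross + loss_cross

model = StandInModel()
shape = (1, 1, 32, 32, 32)
U, V = torch.randn(shape), torch.randn(shape)
ref_U, ref_V = torch.randint(0, 2, shape).float(), torch.randint(0, 2, shape).float()
empty_prompt = torch.zeros((1, 2, 32, 32, 32))        # assumed two-channel prompt encoding
loss = two_stage_loss(model, U, V, ref_U, ref_V, empty_prompt, empty_prompt)
loss.backward()
```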
In this embodiment, the automatic segmentation function may be implemented in a multi-stage training manner. In the training process of the second stage, the other training images and the prediction segmentation results obtained in the first stage can be used as the prompt information of the target training image to train the segmentation model, and similarly, the target training image and the prediction segmentation results obtained in the first stage can be used as the prompt information of the other images to train the segmentation model, namely, the cross training is completed. Therefore, the scheme can further improve the training effect of the segmentation model by combining respective training and cross training, so that the prediction result output by the segmentation model is more accurate in the use stage of the model.
An interactive image segmentation apparatus according to one or more embodiments of the present invention will be described in detail below. Those skilled in the art will appreciate that these interactive image segmentation devices may each be configured by the steps taught by the present solution using commercially available hardware components.
Fig. 21 is a schematic structural diagram of an interactive image segmentation apparatus according to an embodiment of the present invention, as shown in fig. 21, the apparatus includes:
The first acquisition module 11 is used for acquiring a target image represented as a three-dimensional image.
And the second acquisition module 12 is used for responding to the current interactive operation of the user on the target image and acquiring a prompt image corresponding to the target image, wherein the prompt image reflects the occurrence position of the interactive operation in the target image.
An image segmentation module 13, configured to perform image segmentation on the target image according to the hint image that is represented as a three-dimensional image.
Wherein the target image comprises a medical image; the interaction includes at least one of: and performing clicking operation on an image area where the target object is located in the target image, clicking operation on image areas where other objects are located in the target image and framing operation on the target image.
Optionally, the prompt image further includes a historical segmentation result of the target image obtained in response to the previous round of interaction operation triggered by the user on the target image.
The image segmentation module 13 is configured to input the target image and the hint image with the same size into a segmentation model, so as to output a segmentation result from the segmentation model.
Optionally, the image segmentation module 13 is configured to input the target image and the hint image into the segmentation model, to extract a first feature image from the target image by an encoder in the segmentation model, to extract a second feature image from the hint image, to fuse the first feature image and the second feature image by the encoder, to decode a fusion result by a decoder in the segmentation model, and to output a segmentation result of the target image.
Optionally, the image segmentation module 13 is configured to downsample the target image and the hint image with a sampling layer in the encoder, so as to obtain a first sampled image corresponding to the target image and a second sampled image corresponding to the hint image; and respectively carrying out feature extraction on the first sampling image and the second sampling image by utilizing a feature extraction layer in the encoder so as to obtain the first feature image and the second feature image with the same size, wherein the image input into the feature extraction layer has the same size as the image output from the feature extraction layer.
Optionally, the image segmentation module 13 is configured to perform convolution processing on the second feature image by using different convolution parameters included in a feature fusion layer in the encoder, so as to obtain a first convolution result and a second convolution result; and fusing the normalization processing result, the first convolution result and the second convolution result of the first characteristic image by utilizing the characteristic fusion layer.
Optionally, the image segmentation module 13 is configured to upsample the fusion result by using a sampling layer in the decoder; and outputting a segmentation result of the target image according to the upsampling result by using the decoder.
Optionally, the encoder includes a multi-level sampling layer and a multi-level feature extraction layer, and an image input to any feature extraction layer is the same size as an image output from any feature extraction layer.
The image segmentation module 13 is configured to perform downsampling by using a previous sampling layer in the encoder, so as to obtain a third sampling image corresponding to the target image and a fourth sampling image corresponding to the prompt image; performing feature extraction on the third sampling image and the fourth sampling image by using a previous feature extraction layer in the encoder to obtain a third feature image of the target image and a fourth feature image of the prompt image; downsampling the third characteristic image and the fourth characteristic image by using a next sampling layer in the encoder to obtain a first sampling image corresponding to the target image and a second sampling image corresponding to the prompt image; and respectively carrying out feature extraction on the first sampling image and the second sampling image by utilizing a next-stage feature extraction layer in the encoder so as to obtain the first feature image and the second feature image, wherein the size of the third feature image is larger than that of the first feature image.
Optionally, the encoder includes a multi-level feature fusion layer; the encoder fuses the first feature image and the second feature image.
The image segmentation module 13 is configured to perform convolution processing on the second feature image by using different convolution parameters included in a previous feature fusion layer in the encoder, so as to obtain a third convolution result and a fourth convolution result; fusing the normalization processing result, the third convolution result and the fourth convolution result of the first characteristic image by using the upper-level characteristic fusion layer; upsampling a fusion result of the third convolution result and the fourth convolution result using a sampling layer in the decoder; splicing the up-sampling result and the third characteristic image to obtain a splicing result; respectively carrying out convolution processing on the fourth characteristic image by utilizing different convolution parameters contained in a next-stage characteristic fusion layer in the encoder so as to obtain a first convolution result and a second convolution result; and fusing the normalization processing result, the first convolution result and the second convolution result of the splicing result by using the next-stage feature fusion layer.
Optionally, the first obtaining module 11 is configured to determine, in response to an interaction operation of the user on an original image, a target area corresponding to the interaction operation in the original image as the target image, where the original image is represented as a three-dimensional image.
Optionally, the apparatus further comprises: the other image determining module 14 is configured to determine other areas in the original image as other images, where the target area has the same size and overlaps with the other areas.
The image segmentation module 13 is configured to use the target image and a segmentation result of the target image as a prompt image, and perform image segmentation on the other image, where the other region and the target region correspond to the same object in the original image.
Optionally, the image segmentation module 13 is configured to input the target image, the segmentation result of the target image, and the other image with the same size into a segmentation model, so as to output the segmentation result by the segmentation model.
Optionally, the apparatus further comprises: the training module 15 is configured to obtain a training image with the same size, a reference segmentation result corresponding to the training image, and a prompt image corresponding to the training image, where the prompt image reflects an occurrence position of an interaction operation triggered by the user on the training image in the training image, and the training image and the prompt image are three-dimensional images; inputting the training image and the prompting image into the segmentation model to output a first prediction segmentation result corresponding to the training image by the segmentation model; and training the segmentation model according to a loss calculation result between the first prediction segmentation result corresponding to the training image and the reference segmentation result.
Optionally, the training image includes a target training image and other training images, the target training image and the other training images are the same in size and are overlapping image areas in the same three-dimensional image; the reference segmentation results comprise other reference segmentation results corresponding to the other training images and target reference segmentation results corresponding to the target training images.
The training module 15 is configured to input the other training images and first prompt information into the segmentation model, so that a second prediction segmentation result corresponding to the other training images is output by the segmentation model, and the first prompt information includes the target training image and a first prediction segmentation result corresponding to the target training image; inputting the target training image and second prompt information into the segmentation model to output a second prediction segmentation result corresponding to the target training image by the segmentation model, wherein the second prompt information comprises the other training images and first prediction segmentation results of the other training images; and training the segmentation model according to a loss calculation result between a second prediction segmentation result corresponding to the other training image and the other reference segmentation result and a loss calculation result between a second prediction segmentation result corresponding to the target training image and the target reference segmentation result.
The apparatus shown in fig. 21 may perform the methods of the embodiments shown in fig. 1 to 12 and of the embodiments shown in fig. 19 to 20; for portions of this embodiment that are not described in detail, reference may be made to the relevant descriptions of those embodiments. The implementation process and technical effects of this technical solution are likewise described in the embodiments shown in fig. 1 to 12 and fig. 19 to 20, and are not repeated here.
Optionally, the second obtaining module 12 in the apparatus shown in fig. 21 is further configured to obtain, in response to an interaction operation performed by a user on a target area in an original image, prompt information corresponding to the target area, where the prompt information reflects an occurrence position of the interaction operation in the target area.
The image segmentation module 13 is further configured to perform image segmentation on the target area according to the prompt information corresponding to the target area, where the original image is a three-dimensional image; and taking the target area and the segmentation result of the target area as prompt information, and carrying out image segmentation on other areas in the original image, wherein the target area is overlapped with the other areas, and the sizes of the target area and the other areas are smaller than the size of a target object in the original image.
The apparatus of fig. 21 may also perform the method of the embodiment of fig. 13, and reference is made to the relevant description of the embodiment of fig. 13 for parts of this embodiment not described in detail. The implementation process and the technical effect of this technical solution are described in the embodiment shown in fig. 13, and are not described herein.
Fig. 22 is a schematic structural diagram of yet another interactive image segmentation apparatus according to an embodiment of the present invention, as shown in fig. 22, the apparatus includes:
the image display module 21 is used for displaying the target image represented as a three-dimensional image.
And the segmentation result display module 22 is used for responding to the interactive operation of the user on the target image and displaying the segmentation result of the target image.
The segmentation result is determined according to a prompt image expressed as a three-dimensional image, and the prompt image reflects the occurrence position of the interactive operation in the target image.
The apparatus shown in fig. 22 may perform the method of the embodiment shown in fig. 14, and reference is made to the relevant description of the embodiment shown in fig. 14 for parts of this embodiment not described in detail. The implementation process and the technical effect of this technical solution are described in the embodiment shown in fig. 14, and are not described herein.
Optionally, the image display module 21 in the apparatus shown in fig. 22 is further configured to display an original image represented as a three-dimensional image.
The segmentation result display module 22 is further configured to display a segmentation result of the original image in response to an interaction operation of a user on a target region in the original image.
The determining process of the segmentation result comprises the following steps:
image segmentation is carried out on the target area according to prompt information corresponding to the target area, and the prompt information reflects the occurrence position of the interactive operation in the target area;
and taking the target area and the segmentation result of the target area as prompt information, carrying out image segmentation on other areas in the original image, wherein the target area and the other areas have the same size, overlap with each other, and correspond to the target object in the original image, and the size of the target area and of the other areas is smaller than the size of the target object in the original image.
The apparatus of fig. 22 may also perform the method of the embodiment of fig. 15, and reference is made to the relevant description of the embodiment of fig. 15 for parts of this embodiment not described in detail. The implementation process and the technical effect of this technical solution are described in the embodiment shown in fig. 15, and are not described herein.
Fig. 23 is a schematic structural diagram of yet another interactive image segmentation apparatus according to an embodiment of the present invention, as shown in fig. 23, the apparatus includes:
The receiving module 23 is configured to receive an image segmentation request generated by a user, where the image segmentation request includes an image segmentation task, and the image segmentation task carries a target image represented as a three-dimensional image and a prompt image corresponding to the target image, and the prompt image reflects an occurrence position of an interaction operation triggered by the user on the target image currently.
A processing module 24, configured to perform image segmentation on the target image according to the hint image that appears as a three-dimensional image.
And a sending module 25, configured to send the image segmentation result to the user.
The apparatus shown in fig. 23 may perform the method of the embodiment shown in fig. 16, and reference is made to the relevant description of the embodiment shown in fig. 16 for parts of this embodiment not described in detail. The implementation process and the technical effect of this technical solution are described in the embodiment shown in fig. 16, and are not described herein.
Optionally, the receiving module 23 in the apparatus shown in fig. 23 is further configured to receive an image segmentation request generated by a user, where the image segmentation request includes an image segmentation task, and the image segmentation task carries an original image represented as a three-dimensional image and prompt information corresponding to a target area in the original image, where the prompt information reflects an occurrence position of an interaction operation triggered by the user on the target area.
The processing module 24 is further configured to perform image segmentation on the target area according to the prompt information corresponding to the target area, where the original image is a three-dimensional image; and taking the target area and the segmentation result of the target area as prompt information, and carrying out image segmentation on other areas in the original image, wherein the target area is overlapped with the other areas, and the sizes of the target area and the other areas are smaller than the size of a target object in the original image.
The apparatus shown in fig. 23 may perform the method of the embodiment shown in fig. 17, and reference is made to the relevant description of the embodiment shown in fig. 17 for parts of this embodiment not described in detail. The implementation process and the technical effect of this technical solution are described in the embodiment shown in fig. 17, and are not described herein.
In one possible design, the interactive image segmentation method provided in the above embodiments may be applied to an electronic device. As shown in fig. 24, the electronic device may include: a processor 31 and a memory 32. The memory 32 is used for storing a program that enables the electronic device to execute the methods of the embodiments shown in fig. 1 to 17 and of the embodiments shown in fig. 19 to 20, and the processor 31 is configured to execute the program stored in the memory 32.
The program comprises one or more computer instructions which, when executed by the processor 31, enable the steps in the interactive image segmentation method provided in the various embodiments of the figures described above to be carried out.
The electronic device may also include a communication interface 33 in its structure for communicating with other devices or communication systems.
In addition, an embodiment of the present invention provides a computer storage medium storing computer software instructions for the electronic device, which includes a program for executing the interactive image segmentation method according to the embodiments shown in fig. 1 to 17 and the embodiments shown in fig. 19 to 20.
Additionally, embodiments of the present invention provide a computer program product. The computer program product comprises a computer program or instructions. The computer programs or instructions, when executed by the processor, cause the processor to perform the steps or functions of the methods described above with respect to the embodiments of fig. 1-17 and the embodiments of fig. 19-20.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (18)

1. An interactive image segmentation method, comprising:
acquiring a target image represented as a three-dimensional image;
responding to the current interactive operation of a user on the target image, acquiring a prompt image corresponding to the target image, wherein the prompt image reflects the occurrence position of the interactive operation in the target image;
and carrying out image segmentation on the target image according to the prompt image expressed as a three-dimensional image.
2. The method of claim 1, wherein the target image comprises a medical image; the interaction includes at least one of: and performing clicking operation on an image area where the target object is located in the target image, clicking operation on image areas where other objects are located in the target image and framing operation on the target image.
3. The method of claim 1, wherein the hint image further comprises a history segmentation result corresponding to the target image in response to a previous round of interaction with the target image by the user;
the image segmentation of the target image according to the prompt image represented as a three-dimensional image comprises:
Inputting the target image and the prompt image with the same size into a segmentation model to output a segmentation result by the segmentation model.
4. A method according to claim 3, wherein said inputting the target image and the hint image into a segmentation model to output segmentation results from the segmentation model comprises:
inputting the target image and the prompt image into the segmentation model to extract a first characteristic image from the target image by an encoder in the segmentation model, extracting a second characteristic image from the prompt image to fuse the first characteristic image and the second characteristic image by the encoder, decoding a fusion result by a decoder in the segmentation model, and outputting a segmentation result of the target image.
5. The method of claim 4, wherein the extracting, by the encoder in the segmentation model, a first feature image from the target image and a second feature image from the prompt image comprises:
downsampling the target image and the prompt image by using a sampling layer in the encoder to obtain a first sampled image corresponding to the target image and a second sampled image corresponding to the prompt image;
and performing feature extraction on the first sampled image and the second sampled image respectively by using a feature extraction layer in the encoder to obtain the first feature image and the second feature image of the same size, wherein an image input to the feature extraction layer has the same size as an image output from the feature extraction layer.
6. The method of claim 4, wherein the encoder comprises multiple levels of sampling layers and multiple levels of feature extraction layers, and an image input to any feature extraction layer has the same size as an image output from that feature extraction layer;
the extracting, by the encoder in the segmentation model, a first feature image from the target image and a second feature image from the prompt image comprises:
downsampling the target image and the prompt image by using a previous-level sampling layer in the encoder to obtain a third sampled image corresponding to the target image and a fourth sampled image corresponding to the prompt image;
performing feature extraction on the third sampled image and the fourth sampled image by using a previous-level feature extraction layer in the encoder to obtain a third feature image of the target image and a fourth feature image of the prompt image;
downsampling the third feature image and the fourth feature image by using a next-level sampling layer in the encoder to obtain a first sampled image corresponding to the target image and a second sampled image corresponding to the prompt image;
and performing feature extraction on the first sampled image and the second sampled image respectively by using a next-level feature extraction layer in the encoder to obtain the first feature image and the second feature image, wherein the size of the third feature image is larger than that of the first feature image.
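A minimal PyTorch sketch of one encoder branch with the two-level structure of claims 5-6 is given below: sampling layers halve the spatial size, while feature extraction layers preserve the size of their input. Layer counts, kernel sizes, and channel widths are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TwoLevelEncoderBranch(nn.Module):
    """One encoder branch with two sampling levels: each sampling layer halves
    the spatial size, and each feature extraction layer keeps the size of its
    input (stride-1, padding-1 convolutions)."""

    def __init__(self, in_ch=1, ch=16):
        super().__init__()
        self.down1 = nn.Conv3d(in_ch, ch, 3, stride=2, padding=1)       # previous-level sampling layer
        self.feat1 = nn.Conv3d(ch, ch, 3, stride=1, padding=1)          # previous-level feature extraction layer
        self.down2 = nn.Conv3d(ch, 2 * ch, 3, stride=2, padding=1)      # next-level sampling layer
        self.feat2 = nn.Conv3d(2 * ch, 2 * ch, 3, stride=1, padding=1)  # next-level feature extraction layer

    def forward(self, x):
        s3 = self.down1(x)    # third (or fourth) sampled image, half size
        f3 = self.feat1(s3)   # third (or fourth) feature image, same size as s3
        s1 = self.down2(f3)   # first (or second) sampled image, quarter size
        f1 = self.feat2(s1)   # first (or second) feature image, same size as s1
        return f3, f1         # the third feature image is larger than the first

branch = TwoLevelEncoderBranch()
f3, f1 = branch(torch.randn(1, 1, 32, 64, 64))  # f3: 16x32x32, f1: 8x16x16 spatially
```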
7. The method of claim 6, wherein the encoder comprises multiple levels of feature fusion layers, and the fusing, by the encoder, the first feature image and the second feature image comprises:
performing convolution processing on the second feature image respectively with different convolution parameters contained in a previous-level feature fusion layer in the encoder to obtain a third convolution result and a fourth convolution result;
fusing a normalization result of the first feature image, the third convolution result, and the fourth convolution result by using the previous-level feature fusion layer;
upsampling the fusion result of the third convolution result and the fourth convolution result by using a sampling layer in the decoder;
concatenating the upsampling result and the third feature image to obtain a concatenation result;
performing convolution processing on the fourth feature image respectively with different convolution parameters contained in a next-level feature fusion layer in the encoder to obtain a first convolution result and a second convolution result;
and fusing a normalization result of the concatenation result, the first convolution result, and the second convolution result by using the next-level feature fusion layer.
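Claim 7 can be read as a spatially-adaptive modulation of the normalized image features by two convolutions of the prompt features, followed by upsampling, concatenation with the finer-level feature image, and a second fusion. The PyTorch sketch below assumes a scale-and-shift fusion rule; the exact fusion used in the embodiments may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PromptFusionLayer(nn.Module):
    """One feature fusion level: the prompt feature image is convolved with two
    different sets of convolution parameters, and the two results modulate the
    normalized image feature (scale-and-shift is the assumed fusion rule)."""

    def __init__(self, image_ch, prompt_ch):
        super().__init__()
        self.norm = nn.InstanceNorm3d(image_ch, affine=False)
        self.conv_a = nn.Conv3d(prompt_ch, image_ch, 3, padding=1)  # first set of convolution parameters
        self.conv_b = nn.Conv3d(prompt_ch, image_ch, 3, padding=1)  # second set of convolution parameters

    def forward(self, image_feat, prompt_feat):
        scale = self.conv_a(prompt_feat)
        shift = self.conv_b(prompt_feat)
        return self.norm(image_feat) * (1 + scale) + shift

# Cross-level use, following the order of claim 7: fuse at the coarse level,
# upsample with a decoder sampling layer, concatenate with the finer image
# feature, then fuse again at the next level with the finer prompt feature.
coarse_fuse = PromptFusionLayer(image_ch=32, prompt_ch=32)
fine_fuse = PromptFusionLayer(image_ch=48, prompt_ch=16)
f1_img, f1_prm = torch.randn(1, 32, 8, 16, 16), torch.randn(1, 32, 8, 16, 16)    # first/second feature images
f3_img, f3_prm = torch.randn(1, 16, 16, 32, 32), torch.randn(1, 16, 16, 32, 32)  # third/fourth feature images
fused = coarse_fuse(f1_img, f1_prm)
up = F.interpolate(fused, scale_factor=2, mode="trilinear", align_corners=False)
cat = torch.cat([up, f3_img], dim=1)   # concatenation result (48 channels)
out = fine_fuse(cat, f3_prm)           # next-level fusion result
```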
8. The method of claim 1, wherein acquiring the target image represented as a three-dimensional image comprises:
in response to an interactive operation of the user on an original image, determining a target area corresponding to the interactive operation in the original image as the target image, wherein the original image is represented as a three-dimensional image;
determining other areas in the original image as other images, wherein the target area and the other areas are of the same size and overlap one another;
and performing image segmentation on the other images by taking the target image and the segmentation result of the target image as a prompt image, wherein the other areas and the target area correspond to the same object in the original image.
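The propagation scheme of claim 8 — segmenting one region and reusing that region together with its result as the prompt for an overlapping neighbouring region — can be sketched as follows. The callable `model_fn`, the crop and overlap sizes, and the restriction to a single axis are illustrative assumptions.

```python
import numpy as np

def segment_by_propagation(volume, model_fn, crop=64, overlap=32):
    """Segment one cropped region, then move to the next, overlapping region
    while feeding the previous crop and its mask back in as the prompt.
    `model_fn(image, prompt_img, prompt_mask)` is a hypothetical callable that
    wraps the segmentation model; only propagation along the depth axis is shown."""
    depth = volume.shape[0]
    full_mask = np.zeros(volume.shape, dtype=np.uint8)
    prev_img, prev_mask = None, None
    start = 0
    while start < depth:
        stop = min(start + crop, depth)
        region = volume[start:stop]
        mask = model_fn(region, prev_img, prev_mask)  # prompt: previous crop + its segmentation result
        full_mask[start:stop] = np.maximum(full_mask[start:stop], mask)
        prev_img, prev_mask = region, mask
        if stop == depth:
            break
        start = stop - overlap  # the next region overlaps the previous one
    return full_mask
```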
9. The method of claim 8, wherein the method further comprises:
acquiring a training image, a reference segmentation result corresponding to the training image, and a prompt image corresponding to the training image, which are of the same size, wherein the prompt image reflects the position at which an interactive operation triggered by the user on the training image occurs in the training image, and the training image and the prompt image are both three-dimensional images;
inputting the training image and the prompt image into the segmentation model, so that the segmentation model outputs a first prediction segmentation result corresponding to the training image;
and training the segmentation model according to a loss calculated between the first prediction segmentation result corresponding to the training image and the reference segmentation result.
10. The method of claim 9, wherein the training image comprises a target training image and other training images, the target training image and the other training images are of the same size and are overlapping image areas in the same three-dimensional image, and the reference segmentation results comprise other reference segmentation results corresponding to the other training images and a target reference segmentation result corresponding to the target training image;
the method further comprises:
inputting the other training images and first prompt information into the segmentation model, so that the segmentation model outputs second prediction segmentation results corresponding to the other training images, wherein the first prompt information comprises the target training image and the first prediction segmentation result corresponding to the target training image;
inputting the target training image and second prompt information into the segmentation model, so that the segmentation model outputs a second prediction segmentation result corresponding to the target training image, wherein the second prompt information comprises the other training images and the first prediction segmentation results of the other training images;
and training the segmentation model according to a loss calculated between the second prediction segmentation results corresponding to the other training images and the other reference segmentation results and a loss calculated between the second prediction segmentation result corresponding to the target training image and the target reference segmentation result.
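A compact sketch of the training procedure of claims 9-10 is shown below: a first pass supervised against the reference results, and a second pass in which each overlapping crop is prompted with the other crop's first-pass prediction. The loss function, the sigmoid/detach handling, and the simplification of passing only the predicted mask (rather than the mask plus the neighbouring crop, as recited) are assumptions made for brevity.

```python
import torch
import torch.nn.functional as F

def cross_prompt_training_step(model, tgt_img, tgt_prompt, tgt_ref, oth_img, oth_prompt, oth_ref):
    """One training step over two overlapping crops of the same 3-D image.
    `model(image, prompt)` returns voxel-wise logits; all tensors are
    (batch, 1, D, H, W).  Only the predicted mask is recycled as the other
    crop's prompt here, whereas claim 10 also includes the crop itself."""
    # First pass: plain prompt images, supervised by the reference results.
    tgt_pred1 = model(tgt_img, tgt_prompt)
    oth_pred1 = model(oth_img, oth_prompt)
    loss = (F.binary_cross_entropy_with_logits(tgt_pred1, tgt_ref)
            + F.binary_cross_entropy_with_logits(oth_pred1, oth_ref))

    # Second pass: each crop is prompted with the other crop's first-pass
    # prediction, mimicking the propagation used at inference time.
    oth_pred2 = model(oth_img, torch.sigmoid(tgt_pred1).detach())
    tgt_pred2 = model(tgt_img, torch.sigmoid(oth_pred1).detach())
    loss = loss + (F.binary_cross_entropy_with_logits(oth_pred2, oth_ref)
                   + F.binary_cross_entropy_with_logits(tgt_pred2, tgt_ref))
    return loss
```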
11. An interactive image segmentation method, comprising:
in response to an interactive operation of a user on a target area in an original image, acquiring prompt information corresponding to the target area, wherein the prompt information reflects the position at which the interactive operation occurs in the target area;
performing image segmentation on the target area according to the prompt information corresponding to the target area, wherein the original image is a three-dimensional image;
and performing image segmentation on other areas in the original image by taking the target area and the segmentation result of the target area as prompt information, wherein the target area overlaps the other areas, and the sizes of the target area and the other areas are smaller than the size of a target object in the original image.
12. An interactive image segmentation method, comprising:
displaying a target image represented as a three-dimensional image;
in response to an interactive operation of a user on the target image, displaying a segmentation result of the target image;
wherein the segmentation result is determined according to a prompt image represented as a three-dimensional image, and the prompt image reflects the position at which the interactive operation occurs in the target image.
13. An interactive image segmentation method, comprising:
displaying an original image represented as a three-dimensional image;
in response to an interactive operation of a user on a target area in the original image, displaying a segmentation result of the original image;
wherein the segmentation result is determined by:
performing image segmentation on the target area according to prompt information corresponding to the target area, wherein the prompt information reflects the position at which the interactive operation occurs in the target area;
and performing image segmentation on other areas in the original image by taking the target area and the segmentation result of the target area as prompt information, wherein the overlapping target area and other areas correspond to a target object in the original image, and the sizes of the target area and the other areas are smaller than the size of the target object in the original image.
14. An interactive image segmentation method, comprising:
receiving an image segmentation request generated by a user, wherein the image segmentation request comprises an image segmentation task, the image segmentation task carries a target image represented as a three-dimensional image and a prompt image corresponding to the target image, and the prompt image reflects the position at which an interactive operation currently triggered by the user on the target image occurs;
performing image segmentation on the target image according to the prompt image, which is represented as a three-dimensional image;
and sending an image segmentation result to the user.
15. An interactive image segmentation method, comprising:
receiving an image segmentation request generated by a user, wherein the image segmentation request comprises an image segmentation task, the image segmentation task carries an original image represented as a three-dimensional image and prompt information corresponding to a target area in the original image, and the prompt information reflects the position at which an interactive operation triggered by the user on the target area occurs;
performing image segmentation on the target area according to the prompt information corresponding to the target area, wherein the original image is a three-dimensional image;
performing image segmentation on other areas in the original image by taking the target area and the segmentation result of the target area as prompt information, wherein the target area overlaps the other areas, and the sizes of the target area and the other areas are smaller than the size of a target object in the original image;
and sending an image segmentation result to the user.
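Claims 14-15 describe a request/response flow in which a client submits an image segmentation task and receives the segmentation result back. A minimal server-side sketch, with transport and serialization omitted and all names hypothetical, is given below.

```python
def handle_segmentation_request(request, model_fn):
    """Server-side handling of an image segmentation request.  `request` is a
    plain dict standing in for the deserialized request body and `model_fn` is
    a hypothetical callable wrapping the segmentation model; transport,
    serialization, and authentication are omitted."""
    task = request["image_segmentation_task"]
    target_image = task["target_image"]       # 3-D target image (or original image)
    prompt_image = task.get("prompt_image")   # 3-D prompt image / prompt information
    result = model_fn(target_image, prompt_image)
    return {"segmentation_result": result}    # sent back to the requesting user
```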
16. An electronic device, comprising: a memory, a processor; wherein the memory has executable code stored thereon, which when executed by the processor causes the processor to perform the interactive image segmentation method according to any one of claims 1-15.
17. A non-transitory machine-readable storage medium having executable code stored thereon, which when executed by a processor of an electronic device, causes the processor to perform the interactive image segmentation method of any of claims 1-15.
18. A computer program product, characterized in that the computer program product comprises a computer program or instructions which, when executed by a processor, carry out the steps of the interactive image segmentation method according to any one of claims 1-15.
CN202410257252.6A 2024-03-06 2024-03-06 Interactive image segmentation method, device, storage medium and program product Active CN117853507B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410257252.6A CN117853507B (en) 2024-03-06 2024-03-06 Interactive image segmentation method, device, storage medium and program product

Publications (2)

Publication Number Publication Date
CN117853507A (en) 2024-04-09
CN117853507B CN117853507B (en) 2024-06-18

Family

ID=90536411

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410257252.6A Active CN117853507B (en) 2024-03-06 2024-03-06 Interactive image segmentation method, device, storage medium and program product

Country Status (1)

Country Link
CN (1) CN117853507B (en)

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101826209A (en) * 2010-04-29 2010-09-08 电子科技大学 Canny model-based method for segmenting three-dimensional medical image
US20140198979A1 (en) * 2011-09-19 2014-07-17 Oxipita Inc. Methods and systems for interactive 3d image segmentation
US20130169639A1 (en) * 2012-01-04 2013-07-04 Feng Shi System and method for interactive contouring for 3d medical images
US20160188274A1 (en) * 2014-12-31 2016-06-30 Coretronic Corporation Interactive display system, operation method thereof, and image intermediary apparatus
CN110766694A (en) * 2019-09-24 2020-02-07 清华大学 Interactive segmentation method of three-dimensional medical image
WO2021179205A1 (en) * 2020-03-11 2021-09-16 深圳先进技术研究院 Medical image segmentation method, medical image segmentation apparatus and terminal device
US20210319213A1 (en) * 2020-04-09 2021-10-14 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for prompting motion, electronic device and storage medium
WO2023070447A1 (en) * 2021-10-28 2023-05-04 京东方科技集团股份有限公司 Model training method, image processing method, computing processing device, and non-transitory computer readable medium
US20230215121A1 (en) * 2022-01-05 2023-07-06 Electronics And Telecommunications Research Institute Method and apparatus for providing user-interactive customized interaction for xr real object transformation
WO2023185391A1 (en) * 2022-03-29 2023-10-05 北京字跳网络技术有限公司 Interactive segmentation model training method, labeling data generation method, and device
CN116934769A (en) * 2022-03-29 2023-10-24 北京字跳网络技术有限公司 Interactive segmentation model training method, annotation data generation method and equipment
CN115564953A (en) * 2022-09-22 2023-01-03 广东人工智能与先进计算研究院 Image segmentation method, device, equipment and storage medium
CN116342629A (en) * 2023-06-01 2023-06-27 深圳思谋信息科技有限公司 Image interaction segmentation method, device, equipment and storage medium
CN116912827A (en) * 2023-07-19 2023-10-20 广东弓叶科技有限公司 Interactive labeling method and system based on large model

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JIAZHONG CEN et al., "Segment Any 3D Gaussians", arXiv, 1 December 2023 (2023-12-01), pages 1-10 *
XUAN LIAO et al., "Iteratively-Refined Interactive 3D Medical Image Segmentation with Multi-Agent Reinforcement Learning", arXiv, 23 November 2019 (2019-11-23), pages 1-9 *

Also Published As

Publication number Publication date
CN117853507B (en) 2024-06-18

Similar Documents

Publication Publication Date Title
CN112396115B (en) Attention mechanism-based target detection method and device and computer equipment
CN110599492B (en) Training method and device for image segmentation model, electronic equipment and storage medium
TWI715117B (en) Method, device and electronic apparatus for medical image processing and storage mdeium thereof
CN111369582B (en) Image segmentation method, background replacement method, device, equipment and storage medium
CN111369440B (en) Model training and image super-resolution processing method, device, terminal and storage medium
CN112330574A (en) Portrait restoration method and device, electronic equipment and computer storage medium
CN114821605B (en) Text processing method, device, equipment and medium
CN115438215B (en) Image-text bidirectional search and matching model training method, device, equipment and medium
CN111369567B (en) Method and device for segmenting target object in three-dimensional image and electronic equipment
WO2024060558A1 (en) Feasible region prediction method and apparatus, and system and storage medium
US12026857B2 (en) Automatically removing moving objects from video streams
CN115631205B (en) Method, device and equipment for image segmentation and model training
CN116797768A (en) Method and device for reducing reality of panoramic image
CN114519710A (en) Disparity map generation method and device, electronic equipment and storage medium
CN112419342A (en) Image processing method, image processing device, electronic equipment and computer readable medium
CN116980541B (en) Video editing method, device, electronic equipment and storage medium
CN116740161B (en) Binocular stereo matching aggregation method
CN117853507B (en) Interactive image segmentation method, device, storage medium and program product
CN117726513A (en) Depth map super-resolution reconstruction method and system based on color image guidance
CN115311550B (en) Remote sensing image semantic change detection method and device, electronic equipment and storage medium
CN113496225B (en) Image processing method, image processing device, computer equipment and storage medium
CN114926882A (en) Human face detection method based on DETR
CN114299087A (en) Image optimization method, device, equipment and storage medium
CN111815631B (en) Model generation method, device, equipment and readable storage medium
CN117541703B (en) Data rendering method, device, equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant