CN115965785A - Image segmentation method, device, equipment, program product and medium


Publication number
CN115965785A
Authority
CN
China
Prior art keywords
image
category
features
feature
class
Prior art date
Legal status
Pending
Application number
CN202310014416.8A
Other languages
Chinese (zh)
Inventor
黄雅雯
黄慧敏
王红
李悦翔
郑冶枫
Current Assignee
Tencent Healthcare Shenzhen Co Ltd
Original Assignee
Tencent Healthcare Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Healthcare Shenzhen Co Ltd filed Critical Tencent Healthcare Shenzhen Co Ltd
Priority to CN202310014416.8A
Publication of CN115965785A


Abstract

The invention provides an image segmentation method, apparatus, device, program product and medium, wherein the method comprises: extracting intra-class image features corresponding to a feature image; obtaining a block sequence of the feature image through convolution calculation according to the intra-class image features; performing pixel set calculation according to the intra-class image features to obtain class center features of the feature image; interacting the block sequence and the class center features to obtain enhanced class center features; splicing the enhanced class center features and the intra-class image features to obtain inter-class image features; and segmenting the feature image through the inter-class image features to obtain a segmentation result of the feature image of the target object. The method improves image segmentation accuracy and adapts to different image segmentation scenarios.

Description

Image segmentation method, device, equipment, program product and medium
Technical Field
The present invention relates to machine learning technologies, and in particular, to an image segmentation method, an image segmentation apparatus, an electronic device, a computer program product, and a storage medium.
Background
Deep-learning-based recognition of various types has become an important tool for handling large volumes of data in many application scenarios. For example, in applications such as image processing and natural language processing, large-scale classification and recognition of massive data makes it possible to obtain the relevant classification and prediction results quickly and accurately, accelerating the functional implementation of those applications.
In the classification and prediction of images, the images used and the way classification prediction is implemented differ with the deployed application scenario, for example an AI + medical scenario. Taking the AI + medical scenario as an example, medical images of various kinds are continuously produced by different medical devices; for example, images are captured at different time points as a patient's condition develops, or continuously within a department, and accumulate into a large amount of data, so there is an urgent need to segment these medical images by means of image segmentation.
However, in the related art, although deep convolutional neural network algorithms are widely applied to the segmentation and enhancement of medical images, they cannot capture global and local characteristics at the same time, and it is difficult to balance network accuracy against memory consumption.
Disclosure of Invention
In view of this, embodiments of the present invention provide an image segmentation method, an apparatus, an electronic device, a computer program product, and a storage medium, which can fully learn the characteristics of the intra-class image features and the inter-class image features, and alleviate the problems of insufficient intra-class consistency and insufficient inter-class difference: intra-class feature consistency is achieved while the feature distinctiveness between different classes is enhanced through inter-class constraints, so that the segmentation result of the feature image of the target object is more accurate and the segmentation boundary is clearer, adapting to image segmentation environments with various complex image backgrounds.
The technical scheme of the embodiment of the invention is realized as follows:
the embodiment of the invention provides an image segmentation method, which comprises the following steps:
acquiring a feature image of a target object;
extracting the intra-class image features corresponding to the feature image;
obtaining a block sequence of the feature image through convolution calculation according to the intra-class image features;
performing pixel set calculation according to the intra-class image features to obtain class center features of the feature image;
interacting the block sequence and the class center features to obtain enhanced class center features;
splicing the enhanced class center features and the intra-class image features to obtain inter-class image features;
and segmenting the feature image through the inter-class image features to obtain the segmentation result of the feature image of the target object.
An embodiment of the present invention further provides an image segmentation apparatus, where the apparatus includes:
an information transmission module, configured to acquire a feature image of a target object;
an information processing module, configured to extract the intra-class image features corresponding to the feature image;
the information processing module is configured to obtain a block sequence of the feature image through convolution calculation according to the intra-class image features;
the information processing module is configured to perform pixel set calculation according to the intra-class image features to obtain class center features of the feature image;
the information processing module is configured to interact the block sequence and the class center features to obtain enhanced class center features;
the information processing module is configured to splice the enhanced class center features and the intra-class image features to obtain inter-class image features;
and the information processing module is configured to segment the feature image according to the inter-class image features to obtain the segmentation result of the feature image of the target object.
In the above scheme, the information processing module is configured to perform iteration and decoupling processing on the feature image to obtain a decoupled feature image set;
the information processing module is configured to calculate the query features corresponding to the target class feature image in the decoupled feature image set;
the information processing module is configured to weight the query features to obtain a saliency-aware map;
the information processing module is configured to extract the salient features of the target class feature image according to the saliency-aware map;
and the information processing module is configured to calculate the intra-class image features according to the salient features of the target class feature image.
In the above scheme, the information processing module is configured to obtain a sliding window matching the target object;
the information processing module is configured to divide the saliency-aware map by the sliding window to obtain at least two non-overlapping sub-regions;
the information processing module is configured to determine a salient position in each sub-region;
the information processing module is configured to extract salient features from the target class feature image according to the salient positions;
the information processing module is configured to project the salient features to obtain keys and values;
the information processing module is configured to calculate the enhancement-sequence features of the different classes from the keys and values through a multi-head attention mechanism;
and the information processing module is configured to calculate the intra-class image features according to the enhancement-sequence features of the different classes.
In the above scheme, the information processing module is configured to determine the information to be extracted at different scales according to the image segmentation requirement of the target object;
the information processing module is configured to determine the total number of scales according to the information extracted at the different scales;
and the information processing module is configured to determine each sliding window matching the target object according to the total number of scales.
In the above scheme, the information processing module is configured to determine a convolution stride matching the intra-class image features;
and the information processing module is configured to convolve the intra-class image features according to the convolution stride to obtain the block sequence of the feature image.
In the above solution, the information processing module is configured to calculate the initial segmentation result of each class according to the intra-class image features;
the information processing module is configured to calculate a pixel representation set of the target class feature image according to the initial segmentation result;
and the information processing module is configured to determine the class center features of the feature image according to the pixel representation set of the target class feature image.
In the above scheme, the information processing module is configured to splice the block sequence and the class center features to obtain a spliced feature;
the information processing module is configured to perform feature complementation on the spliced feature through an attention matrix to obtain a complementary spliced feature, where the attention matrix comprises four normalized similarity matrices;
and the information processing module is configured to split the complementary spliced feature according to the splicing order to obtain an enhanced block sequence and enhanced class center features.
In the above scheme, the information processing module is configured to obtain the feature image of the target object through the intra-class dynamic Transformer network model;
the information processing module is configured to extract the intra-class image features corresponding to the feature image through the intra-class dynamic Transformer network model;
the information processing module is configured to obtain the block sequence of the feature image through convolution calculation according to the intra-class image features through the inter-class interactive Transformer network model;
the information processing module is configured to perform pixel set calculation according to the intra-class image features through the inter-class interactive Transformer network model to obtain the class center features of the feature image;
the information processing module is configured to interact the block sequence and the class center features through the inter-class interactive Transformer network model to obtain the enhanced class center features;
and the information processing module is configured to splice the enhanced class center features and the intra-class image features through the inter-class interactive Transformer network model to obtain the inter-class image features.
An embodiment of the present invention further provides an electronic device, including:
a memory for storing executable instructions;
and a processor, configured to implement the image segmentation method described above when executing the executable instructions stored in the memory.
An embodiment of the present invention further provides a computer program product, where the computer program or instructions, when executed by a processor, implement the image segmentation method described above.
An embodiment of the present invention further provides a computer-readable storage medium storing executable instructions, wherein the executable instructions, when executed by a processor, implement the image segmentation method described above.
The embodiment of the invention has the following beneficial effects:
the embodiment of the invention obtains the characteristic image of the target object; extracting image features in the category corresponding to the feature image; obtaining a block sequence of the characteristic image through convolution calculation according to the image characteristics in the category; according to the image features in the category, calculating a pixel set to obtain category central features of the feature images; interacting the block sequences and the category central features to obtain enhanced category central features; splicing the enhanced category central features and the image features in the categories to obtain image features between the categories; and segmenting the characteristic image according to the image characteristics among the categories to obtain the segmentation result of the characteristic image of the target object. Therefore, the characteristics of the image characteristics in the category and the image characteristics between the categories can be fully learned, the problems of insufficient intra-category consistency and inter-category difference are solved, intra-category feature consistency is realized, meanwhile, the feature distinctiveness between different categories is enhanced through inter-category constraint, the segmentation result of the feature image of the target object is more accurate, the segmentation boundary is clearer, and the method can adapt to the image segmentation environment of various complex image backgrounds.
Drawings
FIG. 1 is a schematic diagram of a usage environment of an image segmentation method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the principle of image segmentation with a Transformer network in a conventional scheme;
FIG. 3 is a diagram illustrating the effect of image segmentation with a Transformer network in a conventional scheme;
FIG. 4 is a schematic flow chart of an alternative image segmentation method provided in the present application;
FIG. 5 is a schematic diagram of the model structure of the class-aware network model according to an embodiment of the present invention;
FIG. 6 is a schematic diagram illustrating the working process of the intra-class dynamic Transformer network model according to an embodiment of the present invention;
FIG. 7 is a schematic diagram illustrating the working process of the inter-class interactive Transformer network model according to an embodiment of the present invention;
FIG. 8 is a schematic diagram comparing the image segmentation method according to an embodiment of the present invention with related-art tests on the Synapse dataset;
FIG. 9 is a schematic diagram comparing the image segmentation method according to an embodiment of the present invention with related-art tests on the ACDC and MoNuSeg datasets;
FIG. 10 is a diagram comparing the segmentation effect of the image segmentation method according to an embodiment of the present invention with related-art tests on the Synapse dataset;
FIG. 11 is a schematic diagram comparing the segmentation effect of the image segmentation method according to an embodiment of the present invention with related-art tests on the ACDC dataset;
FIG. 12 is a schematic diagram comparing the segmentation effect of the image segmentation method according to an embodiment of the present invention with related-art tests on the MoNuSeg dataset;
FIG. 13 is a schematic view of a usage scenario of the image segmentation method according to an embodiment of the present invention;
FIG. 14 is a schematic flow chart of an alternative image segmentation method according to an embodiment of the present invention;
FIG. 15 is an alternative schematic diagram of an image processing method according to an embodiment of the present invention;
FIG. 16 is a schematic diagram of the front end of medical image segmentation by the image segmentation method according to an embodiment of the present invention;
FIG. 17 is a schematic structural diagram of an image segmentation apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings. The described embodiments should not be construed as limiting the present invention, and all other embodiments obtained by a person of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.
Before the embodiments of the present invention are described in further detail, the terms used in the embodiments of the present invention are explained; the following explanations apply to these terms as used herein.
1) Convolutional Neural Networks (CNN): a class of feed-forward neural networks that involve convolution computations and have a deep structure, and one of the representative algorithms of deep learning. Convolutional neural networks have representation-learning ability and can perform shift-invariant classification of input information according to their hierarchical structure.
2) Model training: multi-class learning on an image dataset. The model can be built with deep learning frameworks such as TensorFlow and Torch, combining multiple neural network layers such as CNN layers into a multi-class segmentation network. The input of the model is a three-channel or original-channel matrix obtained by reading the image with tools such as OpenCV; the output of the model is the multi-class probability, and the image segmentation result is finally produced through algorithms such as softmax. During training, the model is driven toward the correct behavior by objective functions such as cross entropy.
3) Neural Network (NN): an Artificial Neural Network (ANN), neural network for short, is a mathematical or computational model in machine learning and cognitive science that imitates the structure and function of biological neural networks (the central nervous system of animals, especially the brain) and is used to estimate or approximate functions.
4) Computer Aided Diagnosis (CAD): combining imaging technology, medical image processing technology and other possible physiological and biochemical means with computer analysis and calculation, to assist in finding lesions and to improve the accuracy of diagnosis.
5) Endoscopic video stream: pathological information in video form, produced by capturing images of a body region (different target organs of a human body, or an in-vivo lesion) with an image acquisition device (e.g., an endoscope).
Fig. 1 is a schematic usage scenario diagram of an image segmentation method provided in an embodiment of the present invention, in which medical images in a medical environment can be processed by the image segmentation method provided by the present invention. Referring to fig. 1, terminals (including terminal 10-1 and terminal 10-2) are provided with corresponding clients capable of executing different functions. These clients acquire medical images of different corresponding target objects from the corresponding server 200 through the network 300 for browsing, or acquire the corresponding medical images and analyze the target regions shown in them (e.g., regions of lesion tissue). The terminals are connected to the server 200 through the network 300, which may be a wide area network, a local area network, or a combination of the two, with data transmission implemented over wireless links. The types of the medical images that the terminals acquire from the corresponding server 200 through the network 300 may be the same or different: for example, the terminals (including terminal 10-1 and terminal 10-2) may acquire from the corresponding server 200 a pathological image or a set of medical images matching the target object, or acquire only a set of medical images matching the current target (such as CT images) for browsing. The server 200 may store medical images corresponding to different target objects, and may also store auxiliary analysis information matching those medical images. In some embodiments of the present invention, the different types of medical images of the respective target objects saved in the server 200 may be endoscopic images acquired by an endoscope or CT images of a patient acquired by a CT machine.
Medical imaging refers to the techniques and processes for obtaining images of the internal tissues of the human body, or of a part of it, in a non-invasive manner for medical treatment or medical research, including but not limited to images generated by medical instruments such as CT, MRI, ultrasound, X-ray, electrocardiogram, electroencephalogram and optical photography. Such images are an important means of, and reference for, assisting clinical diagnosis, and the intrinsic heterogeneity of different diseases is also reflected in their imaging phenotypes (appearance and shape). Therefore, using medical images for etiological diagnosis, or segmenting the region of a lesion tissue, can effectively assist a doctor in making an accurate diagnosis. In the related art, deep convolutional neural network algorithms have been widely applied to image segmentation. Medical images of various kinds are continuously produced by different medical devices, for example captured at different time points as a patient's condition develops or continuously within a department, and accumulate into a large amount of data, so there is an urgent need for large-scale classification and recognition by means of classification prediction.
However, in the related art, although deep convolutional neural network algorithms are widely applied to image segmentation, most classical segmentation methods cannot capture global and local characteristics at the same time, and it is difficult to balance network accuracy against memory consumption. Medical images, meanwhile, contain many details that the network must capture, so the auxiliary diagnostic information in large 2D and 3D diagnostic images has to be obtained by building deeper networks and extracting more local features.
Conventional neural network techniques usually adopt an encoder-decoder structure for segmentation: the image is first downsampled to extract features and then upsampled back to the original image size, while a high-resolution image channel is maintained to preserve effective information. However, the high-resolution operations occupy a large amount of memory, and the extra channels brought by multiple dimensions further increase the memory occupied by the parameters, so such techniques can only build shallow networks and cannot be applied to larger 2D images or 3D images.
Alternatively, the reversible residual network (RevNet) technique can be adopted. The advantage of RevNet is that it can be stacked almost indefinitely without increasing the memory consumed by image computation, adding only some intermediate computation parameters. The problem with reversible residual networks, however, is that the structure is too simple and the computation must be carried out at a single resolution scale, so the limited complexity cannot improve the accuracy of the network's task.
In a medical-image usage environment, taking the medical image as an endoscopic image as an example, at least two original endoscopic images in an endoscopic video stream form a set of multi-view pathological images, obtained by repeatedly observing a suspected target region through operations such as moving the camera and switching magnification while a doctor uses the endoscope, and fusing the information of the specific views under the endoscope. Since the video stream records all the information in the endoscope's field of view while the doctor observes the patient's target object (such as a region of lesion tissue), using this information as a continuous video stream prevents the doctor from overlooking tiny lesion regions while moving the endoscope quickly, and provides more information than a single frame to assist the doctor in diagnosing and finding tiny lesion regions. In this process a clear endoscopic image is required to assist the doctor's diagnosis, but due to the constraints of the mechanical endoscopic imaging environment or of the operator, the presented endoscopic image is often an image of the broad environment around the lesion; it cannot be focused on the specific position of the lesion, which is unfavorable for classification based on endoscopic images focused on the lesion.
The embodiments of the present invention may be implemented in combination with cloud technology. Cloud technology refers to a hosting technology that unifies resources such as hardware, software and networks in a wide area network or a local area network to realize the computation, storage, processing and sharing of data; it can also be understood as a general term for the network, information, integration, management-platform and application technologies applied on the basis of the cloud computing business model. The background services of technical network systems, such as video websites, image websites and portal websites, require a large amount of computing and storage resources, so cloud technology needs the support of cloud computing.
It should be noted that cloud computing is a computing model that distributes computing tasks over a resource pool formed by a large number of computers, enabling various application systems to obtain computing power, storage space and information services as needed. The network that provides the resources is referred to as the "cloud". To users, the resources in the "cloud" appear infinitely expandable, available at any time, obtainable on demand, and paid for by use. As a basic capability provider of cloud computing, a cloud computing resource pool platform, Infrastructure as a Service (IaaS) for short, is established, in which multiple types of virtual resources are deployed for external clients to use selectively. The cloud computing resource pool mainly comprises computing devices (virtualized machines, including operating systems), storage devices and network devices.
With reference to the embodiment shown in fig. 1, the image segmentation method provided in the embodiment of the present invention may be implemented by corresponding cloud devices, for example: the terminals (including terminal 10-1 and terminal 10-2) are connected to the server 200 located in the cloud through a network 300, which may be a wide area network, a local area network, or a combination of the two. It should be noted that the server 200 may be a physical device or a virtualized device.
Specifically, as shown in fig. 1 in the preamble embodiment, the server 200 may be an independent physical server, may also be a server cluster or a distributed system formed by a plurality of physical servers, and may also be a cloud server that provides basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a network service, cloud communication, a middleware service, a domain name service, a security service, a CDN, and a big data and artificial intelligence platform. The terminal may be, but is not limited to, a smart phone, a tablet computer, a laptop computer, a desktop computer, a smart speaker, a smart watch, and the like. The terminal and the server may be directly or indirectly connected through wired or wireless communication, and the application is not limited herein.
It should be noted that observation of a patient's target object (e.g., a region of lesion tissue) under an endoscope (a medical device connected to the target object) may cover a number of different application scenarios, such as different video streams for screening glycogenopathy, early screening for cervical cancer, and the like. The image segmentation method of this embodiment can be deployed in these various application scenarios, facilitating remote reference and use by doctors.
The server 200 transmits the medical image information of the corresponding target object to the terminals (terminal 10-1 and/or terminal 10-2) through the network 300, so that the users of the terminals can analyze the image information of the corresponding target object. As an example, the server 200 deploys a corresponding image segmentation model, which includes the class-aware network model provided in this application for outputting inter-class image features, so that the feature image can be segmented using the inter-class image features to obtain the segmentation result of the feature image of the target object.
Before introducing the image segmentation method provided by the embodiment of the present invention, the process of image segmentation in the related art is first introduced. Fig. 2 is a schematic diagram of the principle of image segmentation with a Transformer network in a conventional scheme, and fig. 3 is a schematic diagram of the effect of image segmentation with a Transformer network in a conventional scheme. As shown in fig. 2, the Transformer network mainly comprises two structures: a multi-head self-attention mechanism (MHSA) and a position-wise fully-connected feed-forward network (FFN). A feature map is input, and three nonlinear mappings are obtained from it: queries (Q), keys (K) and values (V). The dot product between Q and the transpose of K is computed first; the resulting attention matrix (of size HW × HW) is then normalized into a probability distribution by a Softmax operation and multiplied by the matrix V to obtain a weighted-sum representation. The enhanced features are further input into the FFN to learn more nonlinear relationships between features. When the medical image is segmented with the neural network model of fig. 2, as shown in fig. 3, where fig. 3 (a) and (c) show the segmentation results of TransUNet and fig. 3 (b) and (d) show the segmentation results of SwinUNet, the long-range relationships between pixels on the liver and pixels on organs such as the stomach, left kidney and right kidney can indeed be captured. However, the probability of segmenting part of the stomach as spleen is high, and the edges of the segmentation results are blurred. The conventional Transformer usually acts on the feature map without distinguishing classes, that is, all class objects are highlighted at the same time. The weighted set of all these salient regions (based on multi-object inference) produces confusing pixel-to-pixel relationships that can compromise intra-class consistency learning, such as the incomplete stomach segmentation shown in fig. 3. Meanwhile, conventional Transformers typically model only pixel-level dependencies and often ignore the object-to-object dependencies between semantic classes. This limits the ability to segment different organs accurately; organs with similar contextual information and close locations in particular are easily confused, as in fig. 3 where part of the stomach is segmented incorrectly as spleen.
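For readers who want the computation pinned down, the class-agnostic attention described above can be sketched in a few lines of PyTorch. This is an illustrative single-head simplification, not the patent's code; the feature-map size H, W, D and the projection layers are assumed placeholders.

```python
import torch
import torch.nn as nn

H, W, D = 28, 28, 64                       # assumed feature-map size
x = torch.randn(1, H * W, D)               # feature map flattened to HW tokens

q_proj, k_proj, v_proj = nn.Linear(D, D), nn.Linear(D, D), nn.Linear(D, D)
Q, K, V = q_proj(x), k_proj(x), v_proj(x)  # three nonlinear mappings

# Dot product of Q and K^T, normalized by Softmax into a probability
# distribution: an HW x HW attention matrix, then a weighted sum with V.
attn = torch.softmax(Q @ K.transpose(-2, -1) / D ** 0.5, dim=-1)
out = attn @ V

# Position-wise feed-forward network learns further nonlinear relations.
ffn = nn.Sequential(nn.Linear(D, 4 * D), nn.GELU(), nn.Linear(4 * D, D))
out = ffn(out)
```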
To address the above drawbacks, refer to fig. 4, an alternative flowchart of the image segmentation method provided in the present application. It should be understood that the steps shown in fig. 4 can be executed by various electronic devices running the segmentation network apparatus, for example a medical image processing terminal, a server with an image segmentation function, or a server cluster, to realize accurate segmentation of the complex images found in different usage scenarios. The steps shown in fig. 4 are described below.
Step 401: the image segmentation device acquires a feature image of a target object.
In some embodiments of the present invention, the image segmentation method provided by the present application may be implemented based on a class-aware network model, which comprises at least an intra-class dynamic Transformer network model and an inter-class interactive Transformer network model. Referring to fig. 5, fig. 5 is a schematic diagram of the model structure of the class-aware network model in the embodiments of the present invention. The architecture of the class-aware network model (ClassFormer) consists of two parts: the Intra-class Dynamic Transformer (IDT) network model and the Inter-class Interactive Transformer (IIT) network model. A feature image X extracted from a certain layer of the network serves as the input of the class-aware network model; it is processed sequentially by the IDT shown on the left and the IIT module on the right; finally, the enhanced inter-class image features X_inter are output. A complex image can be segmented using the inter-class image features, yielding a segmentation result with clear edges and accurate boundaries. In the structure shown in fig. 5, inter-class and intra-class are terms from image classification used to describe two different kinds of features. For example, the dataset used for training in a face recognition task is divided into many different people; each person is a class, and each class has several separate pictures. Inter-class refers to features between classes, such as the differences between different faces in face recognition; intra-class refers to features within a class, such as the differences between the same person's face in different states.
The following description of the processing in steps 402 to 406 continues, taking the class-aware network model as the executor of the image segmentation method provided by the present application.
Step 402: the image segmentation device extracts the intra-class image features corresponding to the feature image.
In some embodiments of the present invention, extracting the intra-class image features corresponding to the feature image through the intra-class dynamic Transformer network model may be implemented as follows:
performing iteration and decoupling processing on the feature image to obtain a decoupled feature image set; calculating the query features corresponding to the target class feature image in the decoupled feature image set; weighting the query features to obtain a saliency-aware map; extracting the salient features of the target class feature image according to the saliency-aware map; and calculating the intra-class image features according to the salient features of the target class feature image. Specifically, a feature image X ∈ R^(H×W×D) is extracted from a certain layer of a CNN (such as ResUNet), where H, W and D are the height, width and feature dimension of the feature image, respectively. The class-agnostic feature map is then repeated C times to obtain X_rep ∈ R^(C×H×W×D), where C is the number of classes. The repeated feature image is input into a grouped convolution layer (group conv) with C groups to generate a decoupled feature image set X_dec = [X^1, ..., X^C], where X^c ∈ R^(H×W×D) denotes the features of class c and [·] denotes a splicing operation. To let each decoupled feature X^c carry more class-c-specific information, each decoupled feature generates a corresponding binary prediction result, i.e., each predicted pixel value represents the likelihood that the pixel belongs to that class. Supervised by a binary cross-entropy loss function L_bce, each decoupled feature map gradually acquires explicit class-aware cues.
Referring to fig. 6, fig. 6 is a schematic diagram of the working process of the intra-class dynamic Transformer network model in the embodiment of the present invention. To capture class-wise long-range dependencies, a simple approach would be to apply a Transformer with global relations to the feature map of each class. However, in a conventional Transformer, each query Q has to attend to a large number of keys K, resulting in redundant relevance computations between irrelevant keys. Therefore, the salient keys in each class can be adaptively selected by the intra-class dynamic Transformer network model shown in fig. 6, substantially reducing redundant computation and focusing more on salient features.
Calculating the intra-class image features according to the salient features of the target class feature image can be implemented through the following steps: acquiring a sliding window matching the target object; dividing the saliency-aware map by the sliding window to obtain at least two non-overlapping sub-regions; determining a salient position in each sub-region; extracting salient features from the target class feature image according to the salient positions; projecting the salient features to obtain keys and values; calculating the enhancement-sequence features of the different classes from the keys and values through a multi-head attention mechanism; and calculating the intra-class image features from the enhancement-sequence features of the different classes. As shown in fig. 6, given the input decoupled class-c feature image X^c, the query features Q_c = X^c W_q are obtained by linear projection. The query features are then fed into a simple saliency-aware network θ_SN (comprising a convolution layer and an activation layer), which re-weights the pixels of the target class feature image, i.e., increases the weights of salient pixels while suppressing those of irrelevant pixels. In the saliency-aware network θ_SN, the input features pass through a 3×3 convolution operation and a GELU activation function to extract local features. The activation layer may use a GELU, Softmax, or ReLU activation function, which is not specifically limited in the embodiments of the present invention. Since pooling along the channel dimension can effectively activate informative regions, the channel-wise maxima and means can be pooled and then concatenated. The concatenated features are then processed by the 3×3 convolution layer shown in fig. 6 to obtain the saliency-aware map M ∈ R^(H×W), where H and W are the corresponding height and width. To achieve dynamic key selection over the whole space Λ, the saliency map M is first divided into a set of non-overlapping sub-regions Λ_s using a sliding window of size δ. In each Λ_s, the position with the largest value is selected as the salient position p_s = argmax_{p∈Λ_s} M(p), where p is the position of a pixel. A set of salient positions P = {p_s} is thus formed over the entire space Λ, where |P| denotes the number of positions.
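The saliency-aware map and the per-window selection of salient positions can be sketched as follows, assuming PyTorch. Using max_pool2d with return_indices to pick the maximum of each δ×δ sub-region is a convenient equivalent of the description above; all layer sizes here are made up for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

H, W, D, delta = 28, 28, 64, 7             # assumed sizes; delta = window size
x_c = torch.randn(1, D, H, W)              # decoupled class-c feature image

# Saliency-aware network: 3x3 conv + GELU, then channel-wise max/mean
# pooling concatenated and fused by another 3x3 conv into a 1-channel map.
local = nn.GELU()(nn.Conv2d(D, D, 3, padding=1)(x_c))
pooled = torch.cat([local.max(1, keepdim=True).values,
                    local.mean(1, keepdim=True)], dim=1)
m = nn.Conv2d(2, 1, 3, padding=1)(pooled)  # saliency map M, shape (1,1,H,W)

# Split M into non-overlapping delta x delta sub-regions and keep the
# position of the maximum in each one as a salient position.
_, idx = F.max_pool2d(m, delta, stride=delta, return_indices=True)
salient_positions = idx.flatten()          # flat indices into the H*W grid
```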
Because the image segmentation method provided by the present application needs to segment different complex images, the class-aware network model is designed as a plug-and-play module and integrated into a ResUNet network to segment complex images. Different image segmentation environments have different segmentation requirements and extraction scales. To increase the generality of the image segmentation method and adapt to different segmentation requirements, the information to be extracted at different scales can be determined according to the image segmentation requirement of the target object; the total number of scales is determined from the information extracted at the different scales; and each sliding window matching the target object is determined according to the total number of scales. Referring to fig. 6, a pyramid mechanism can be adopted to adaptively sample salient points at multiple scales. Specifically, for each scale a different saliency-aware network θ_SN and a corresponding window size δ are used, so that the unique patterns at different scales can be learned adaptively. The pyramid P = P_1 ∪ ... ∪ P_S thus fuses the information extracted at different scales, where S is the total number of scales. According to the salient positions P, the salient features X_P can be extracted from the input X^c, and the corresponding keys K_c and values V_c are further obtained by linear projection, where K_c, V_c ∈ R^(|P|×D). Long-range dependencies are then captured by a multi-head attention mechanism: T_c = MHSA(Q_c, K_c, V_c). The enhancement sequences from the different classes [T_1, ..., T_C] are converted back into two-dimensional features, spliced with the original input features, and fed into a 1×1 convolution layer to generate the final intra-class image features X_intra ∈ R^(H×W×D), where H, W and D are the height, width and feature dimension of the feature image, respectively.
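Continuing the sketch, the selected positions gather a small set of keys and values over which multi-head attention runs, so each query attends to |P| keys instead of all HW. The head count, |P| and the random positions below are assumptions; in the method above, the positions come from the saliency step.

```python
import torch
import torch.nn as nn

H, W, D, P = 28, 28, 64, 16                # assume |P| = 16 salient positions
x_c = torch.randn(1, H * W, D)             # class-c features, flattened
q = nn.Linear(D, D)(x_c)                   # queries Q_c over all pixels

salient_positions = torch.randint(0, H * W, (P,))   # from the previous step
x_sal = x_c[:, salient_positions]          # salient features X_P, (1, P, D)
k, v = nn.Linear(D, D)(x_sal), nn.Linear(D, D)(x_sal)

# Every query attends only to the P selected keys, so the attention matrix
# is HW x P instead of HW x HW, cutting the redundant computation.
mhsa = nn.MultiheadAttention(D, num_heads=4, batch_first=True)
t_c, _ = mhsa(q, k, v)                     # enhanced sequence for class c

# The per-class sequences are reshaped to 2-D, spliced with the input and
# fused by a 1x1 convolution to give X_intra (omitted in this sketch).
```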
Through the processing of steps 401 to 402, the intra-class dynamic Transformer network model completes its image processing and outputs the intra-class image features X_intra ∈ R^(H×W×D) to the inter-class interactive Transformer network model. In steps 403 to 406, the inter-class interactive Transformer network model processes the intra-class image features X_intra and outputs the inter-class image features. The working process of the inter-class interactive Transformer network model is described below.
Step 403: the image segmentation device obtains a block sequence of the feature image through convolution calculation according to the intra-class image features.
Referring to fig. 7, fig. 7 is a schematic diagram of the working process of the inter-class interactive Transformer network model according to an embodiment of the present invention. As shown on the left side of fig. 7, a convolution stride matching the intra-class image features is first determined, and the intra-class image features are convolved according to this stride to obtain the block sequence of the feature image. Specifically, the input intra-class image features X_intra are passed through one max-pooling layer with stride r and two consecutive 3×3 convolution layers to obtain a high-level representation of size H/r × W/r × D, whose shape is further transformed into a semantically rich block sequence (patch sequence) T_p ∈ R^((HW/r^2)×D).
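A sketch of this down-sampling path, assuming PyTorch; the stride r, the channel width, and the ReLU activations between the convolutions are assumptions rather than the patent's exact layers.

```python
import torch
import torch.nn as nn

H, W, D, r = 28, 28, 64, 2                 # assumed sizes; r = pooling stride
x_intra = torch.randn(1, D, H, W)          # intra-class image features

to_patches = nn.Sequential(
    nn.MaxPool2d(kernel_size=r, stride=r),         # one max-pooling layer
    nn.Conv2d(D, D, 3, padding=1), nn.ReLU(),      # two consecutive 3x3 convs
    nn.Conv2d(D, D, 3, padding=1), nn.ReLU(),
)
high = to_patches(x_intra)                 # (1, D, H/r, W/r)
t_p = high.flatten(2).transpose(1, 2)      # patch sequence, (1, HW/r^2, D)
```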
Step 404: the image segmentation device performs pixel set calculation according to the intra-class image features to obtain the class center features of the feature image.
For the class center features of the feature image used in fig. 7, the initial segmentation result of each class may first be calculated from the intra-class image features; a pixel representation set of the target class feature image is then calculated from the initial segmentation result; and finally the class center features of the feature image are determined from the pixel representation set of the target class feature image. As shown in fig. 7, the intra-class image features X_intra are first used to predict an initial segmentation result y_c ∈ R^(H×W) for each class, where y_c is the initial segmentation result and H and W are the height and width of the feature image. The representations of all pixels are then aggregated according to their probability of belonging to the c-th class, g_c = Σ_i y_c^i · x_i, where x_i is the feature of the i-th pixel and y_c^i is the probability that the i-th pixel belongs to class c. The generated g_c ∈ R^D is taken as the center feature of the c-th class. This overcomes the drawback in the related art of attending only to pixel-level relations while ignoring the class dependencies between semantic objects.
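The class-center aggregation amounts to a soft-mask-weighted sum of pixel features, which fits in one einsum. A hedged sketch follows; the 1×1 segmentation head and all sizes are placeholders.

```python
import torch
import torch.nn as nn

C, H, W, D = 4, 28, 28, 64                 # assumed class count and sizes
x_intra = torch.randn(1, D, H, W)

seg_head = nn.Conv2d(D, C, 1)              # predicts the initial segmentation
y = seg_head(x_intra).softmax(dim=1)       # y_c: per-class probability maps

# g_c = sum_i y_c^i * x_i: aggregate every pixel feature, weighted by its
# probability of belonging to class c, into one D-dim center per class.
pixels = x_intra.flatten(2)                # (1, D, H*W)
masks = y.flatten(2)                       # (1, C, H*W)
centers = torch.einsum('bcn,bdn->bcd', masks, pixels)   # (1, C, D)
```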
Step 405: the image segmentation device interacts the block sequence and the class center features to obtain enhanced class center features.
In some embodiments of the present invention, by interacting the block sequence carrying high-level semantics with the different class center features, the class correlations between them can be determined. When interacting the block sequence and the class center features, the block sequence and the class center features are first spliced to obtain a spliced feature; feature complementation is performed on the spliced feature through an attention matrix to obtain a complementary spliced feature, where the attention matrix comprises four normalized similarity matrices; and the complementary spliced feature is split according to the splicing order to obtain an enhanced block sequence and enhanced class center features. Specifically, referring to fig. 7, the block sequence T_p ∈ R^((HW/r^2)×D) and the class center features G = [g_1, ..., g_C] ∈ R^(C×D) are spliced, and long-range interactions are then captured by a Transformer. The three linearly projected features Q/K/V therefore have shape (C + HW/r^2) × D, and the resulting attention matrix A can be expressed as the block matrix
A = [[A_c→c, A_c→p], [A_p→c, A_p→p]]
where A_c→c, A_p→p, A_c→p and A_p→c are normalized similarity matrices: A_c→c represents class-to-class attention, A_p→p patch-to-patch, A_c→p class-to-patch, and A_p→c patch-to-class. A_c→c and A_p→p are the normalized similarity matrices of a self-attention mechanism, while A_c→p and A_p→c learn complementary cues through cross-attention to achieve feature interaction. In this way, the high-level-semantic block sequence can explore explicit class dependencies from the class center features, realizing class-guided feature enhancement; and the class center features can capture rich semantics from the block sequence, increasing the diversity of the classes. The enhanced sequence is then split again into a block sequence and class center features according to the previous splicing order. To further improve the distinctiveness of the features, the enhanced class center features are constrained by a Euclidean distance loss function L_EU so that the distance between the enhanced class tokens is maximized.
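A sketch of the interaction step: because class tokens and patch tokens share a single attention pass, the four blocks A_c→c, A_c→p, A_p→c and A_p→p all emerge from one (C + HW/r^2)-token similarity computation. The head count and sizes are assumptions, and the negated mean pairwise distance merely stands in for the Euclidean distance constraint L_EU.

```python
import torch
import torch.nn as nn

C, N, D = 4, 196, 64                        # classes, HW/r^2 patches, dim
t_p = torch.randn(1, N, D)                  # block (patch) sequence
g = torch.randn(1, C, D)                    # class center features

tokens = torch.cat([g, t_p], dim=1)         # splice: (1, C + N, D)
attn = nn.MultiheadAttention(D, num_heads=4, batch_first=True)
enhanced, _ = attn(tokens, tokens, tokens)  # one attention over all tokens;
                                            # its (C+N)x(C+N) matrix holds the
                                            # blocks c->c, c->p, p->c, p->p
g_new, t_p_new = enhanced[:, :C], enhanced[:, C:]   # split in original order

# Push the enhanced class tokens apart (Euclidean distance constraint);
# negating the mean pairwise distance makes gradient descent maximize it.
l_eu = -torch.pdist(g_new.squeeze(0)).mean()
```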
Step 406: the image segmentation device splices the enhanced class center features and the intra-class image features to obtain the inter-class image features.
As shown in fig. 7, when calculating the inter-class image features, the source information is retained by splicing with the input intra-class image features X_intra, and the final output inter-class image features X_inter ∈ R^(H×W×D) are obtained after a 1×1 convolution layer.
Through the processing of steps 403 to 406, the inter-class interactive Transformer network model converts the input intra-class image features into inter-class image features. The inter-class image features obtained in step 406 inherit the intra-class consistency, so that the image segmentation network can maintain intra-class consistency while increasing inter-class difference, thereby improving the characterization capability of the features and producing high-quality segmentation performance.
Step 407: the image segmentation device segments the feature image according to the inter-class image features to obtain the segmentation result of the feature image of the target object.
Because the image segmentation method provided by the present application needs to segment different complex images, the class-aware network model is designed as a plug-and-play module and integrated into a ResUNet network to segment complex images. It is therefore necessary to train the class-aware network model, comprising the intra-class dynamic Transformer network model and the inter-class interactive Transformer network model, to determine the parameters of both. Training and evaluation use the Synapse, ACDC and MoNuSeg datasets, which contain different numbers of classes and medical images acquired by different devices. Wherein:
the Synapse dataset contains 9 classes (8 abdominal organs and 1 background class) involving 30 Computed Tomogry (CT) scan cases, 18 for training and 12 for testing.
The ACDC dataset contains 4 classes (left ventricle (LV), right ventricle (RV), myocardium (MYO), and 1 background class), involving 100 Magnetic Resonance Imaging (MRI) cases, 70 for training, 10 for validation and 20 for testing.
The MoNuSeg dataset contains 2 classes (cell foreground and background) and a total of 44 pathology images, 30 for training and 14 for testing.
To avoid overfitting, two data augmentation methods are adopted: random rotation and random flipping. The batch size (bs), learning rate (lr), maximum training epochs (ep) and solver (opt) used for the three datasets are listed in turn:
Synapse:bs=8;lr=3e-3;ep=600;opt=SGD;
ACDC:bs=8;lr=3e-3;ep=200;opt=SGD;
MoNuSeg:bs=4;lr=1e-3;ep=200;opt=Adam;
all datasets used momentum =0.9 and weight decay =0.0001.
After the parameters of the intra-class dynamic Transformer network model and the inter-class interactive Transformer network model have been determined, the trained class-aware network model can be integrated into a ResUNet network. The encoder portion of ResUNet has 4 convolution modules consisting of basic resnet-34 blocks, in which the number of channels increases with network depth while the resolution of the feature map decreases. Given an input image I ∈ R^(224×224), the ResNet encoder first maps the pixels in I to a nonlinear feature X ∈ R^(28×28×512), which is input into the class-aware network model module to explore class-aware dependencies. The enhanced features are further sent to the decoder for progressive upsampling. Finally, the final segmentation result is obtained through a segmentation layer.
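Structurally, the integration amounts to inserting the module between encoder and decoder. The sketch below uses a stand-in identity module for the class-aware block and a heavily simplified encoder/decoder whose strides are chosen only to reproduce the 224 × 224 to 28 × 28 × 512 shapes mentioned above; none of it is the patent's actual network.

```python
import torch
import torch.nn as nn

def stage(cin, cout, stride):
    # Simplified stand-in for a resnet-34 stage (real blocks have skips).
    return nn.Sequential(nn.Conv2d(cin, cout, 3, stride, 1),
                         nn.BatchNorm2d(cout), nn.ReLU())

class ClassFormerStub(nn.Module):
    """Stand-in for the IDT + IIT module; output shape equals input shape."""
    def forward(self, x):
        return x

encoder = nn.Sequential(          # 4 modules: channels grow, resolution falls
    stage(3, 64, 2), stage(64, 128, 2), stage(128, 256, 2), stage(256, 512, 1))
class_former = ClassFormerStub()  # plug-and-play class-aware module
decoder = nn.Sequential(          # upsampling + segmentation layer (sketch)
    nn.Upsample(scale_factor=8, mode='bilinear', align_corners=False),
    nn.Conv2d(512, 9, 1))

img = torch.randn(1, 3, 224, 224)            # input image I
feats = encoder(img)                         # X: (1, 512, 28, 28)
out = decoder(class_former(feats))           # final segmentation result
```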
When testing the image segmentation method provided by the present application, refer to fig. 8 and fig. 9. Fig. 8 compares the image segmentation method in the embodiment of the present invention with the related art on the Synapse dataset, and fig. 9 compares it with the related art on the ACDC and MoNuSeg datasets, where ClassFormer denotes the test results of the image segmentation method of the present application. On the Synapse dataset, the Dice coefficient (DSC) and Hausdorff Distance (HD) are used, and the DSC of each of the 8 abdominal organs is listed; on the ACDC dataset, the DSC is used, and the DSC of the left ventricle (LV), right ventricle (RV) and myocardium (MYO) are listed; on the MoNuSeg dataset, the DSC and Intersection over Union (IoU) are used. As can be seen from fig. 8 and fig. 9, the image segmentation method implemented by the class-aware network model of the present application consistently outperforms all related-art test results and significantly improves performance on all datasets. In particular, among the parameter quantities shown in fig. 8, the image segmentation method implemented by the class-aware network model of the present application uses 76.15M fewer parameters than the previous best method while improving accuracy by DSC +1.54 and HD -4.59 mm.
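For reference, the two overlap metrics reduce to a few tensor operations; a sketch over binary masks (random stand-in data):

```python
import torch

def dice_iou(pred, target, eps=1e-6):
    # DSC = 2|A∩B| / (|A| + |B|); IoU = |A∩B| / |A∪B| over binary masks.
    inter = (pred & target).sum().float()
    dsc = 2 * inter / (pred.sum() + target.sum() + eps)
    iou = inter / ((pred | target).sum() + eps)
    return dsc.item(), iou.item()

pred = torch.randint(0, 2, (224, 224), dtype=torch.bool)
target = torch.randint(0, 2, (224, 224), dtype=torch.bool)
print(dice_iou(pred, target))
```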
Referring to fig. 10, fig. 11 and fig. 12: fig. 10 compares the segmentation effect of the image segmentation method of the embodiment of the present invention with related technologies on the Synapse dataset, fig. 11 on the ACDC dataset, and fig. 12 on the MoNuSeg dataset. As shown in fig. 10 to fig. 12, across the three types of test data the image segmentation method provided by the present application accurately segments salient organs of different sizes, shapes and positions. In addition, the boundaries of its segmentation results are clearer than those of the other methods, the probability of mis-segmentation is reduced, and accurate segmentation of complex medical images by the user is facilitated.
To better explain the working process of the image segmentation method, the method provided by the present invention is described below by taking internal hemorrhage of the organism, such as cerebral hemorrhage and fundus hemorrhage of the target object, as an example.
Fig. 13 is a schematic view of a use scenario of the image segmentation method according to the embodiment of the present invention, showing an application scenario of the blood vessel image processing system 10. Referring to fig. 13, the terminal 200 may be located in various institutions with medical attributes (e.g., hospitals and medical research institutes) and may be used to acquire a fundus image of a patient (i.e., a blood vessel image to be processed), either through an image acquisition device of the terminal 200 or through a separate image acquisition device 400.
In some embodiments, the terminal 200 locally executes the blood vessel image processing method provided by the embodiment of the present invention to perform blood vessel segmentation and blood vessel classification on the fundus image, and outputs the results graphically for doctors and researchers to study the diagnosis, review and treatment of diseases. For example, the morphological performance of different types of blood vessels can be determined from the segmentation and classification results of the fundus image, which then assists in, or directly supports, diagnosing whether the patient is at risk of cardiovascular and cerebrovascular disease or hypertensive retinopathy.
The terminal 200 can also transmit the fundus image to the server 100 through the network 300 and invoke the remote diagnosis service provided by the server 100. The server 100 performs the multitask of blood vessel segmentation and blood vessel classification through the blood vessel image processing method provided by the embodiment of the present invention, and returns the results to the terminal 200 for doctors and researchers to study the diagnosis, review and treatment of diseases.
The terminal 200 can display various intermediate results and final results of the blood vessel image processing, such as a fundus image, segmentation results and classification results of fundus blood vessels, and the like, in the graphical interface 210.
Continuing with the structure of the blood vessel image processing device provided by the embodiment of the present invention, the blood vessel image processing device may be any of various terminals, such as medical diagnostic equipment or a computer, or may be the server 100 shown in fig. 1.
The following describes the image segmentation method provided by the present invention, taking the determination of medical information for a cerebral hemorrhage case as an example. Various images forming medical imagery are continuously generated, for example magnetic resonance (MRI) images captured continuously alongside CT scout images, producing a large amount of data that urgently requires large-scale classification and identification by means of classification prediction.
Referring to fig. 14, fig. 14 is an optional flowchart of the image segmentation method according to the embodiment of the present invention, where the user may be a doctor, and the target object is a patient, and the method specifically includes the following steps:
Step 1401: acquire the blood vessel feature image to be segmented of the target object.
Fig. 15 is an optional schematic diagram of an image processing method according to an embodiment of the present invention. A foreground A (for example, the medical terminal 400 shown in fig. 13) receives image data (for example, the medical image to be processed by the user in the preceding embodiments), applies a pre-processing algorithm, including but not limited to data augmentation such as translation, rotation and symmetry, and uploads the data to a background (for example, the server 100 shown in fig. 13). The background segments the medical image using the image processing method provided by the present application and outputs the segmented medical image to a foreground B (for example, the terminal 200 shown in fig. 13), where a doctor can clearly observe the segmented medical image through the display device of foreground B.
Step 1402: extract the image features in the category corresponding to the blood vessel feature image.
When segmenting the blood vessel feature image, note that the shapes of tumor capillaries are extremely diverse and that multiple signaling pathways participate in regulating capillary growth; current medical research shows that capillaries of different shapes and densities in a patient's tumor tissue are related to the patient's prognosis and response to treatment. Therefore, during image segmentation, separating microvessels from mature vessels and distinguishing different microvessel morphologies from one another reduces intra-class variation and increases inter-class variation, which can greatly improve the accuracy of image segmentation.
Step 1403: obtain a block sequence of the blood vessel feature image through convolution calculation, and perform pixel set calculation according to the image features in the category to obtain the category central features of the blood vessel feature image.
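A hedged sketch of step 1403 follows (the patent does not publish its exact operators; the strided patch convolution and soft-mask average pooling below are common realizations of these two computations, and all names are illustrative):

```python
import torch

def block_sequence(feats, patch_conv):
    """feats: (B, C, H, W) -> patch-token sequence (B, N, C').
    patch_conv could be e.g. a strided nn.Conv2d(C, C', 2, stride=2)."""
    p = patch_conv(feats)                # (B, C', H/s, W/s)
    return p.flatten(2).transpose(1, 2)  # (B, N, C')

def class_centers(feats, seg_logits):
    """Pixel-set calculation as soft-mask average pooling: one center per class.

    feats: (B, C, H, W); seg_logits: (B, K, H, W) initial per-class logits.
    Returns (B, K, C) category central features.
    """
    probs = seg_logits.softmax(dim=1).flatten(2)              # (B, K, H*W)
    probs = probs / (probs.sum(dim=-1, keepdim=True) + 1e-8)  # normalize per class
    pixels = feats.flatten(2).transpose(1, 2)                 # (B, H*W, C)
    return torch.bmm(probs, pixels)                           # (B, K, C)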
Step 1404: interact the block sequence with the category central features to obtain enhanced category central features, and splice the enhanced category central features with the image features in the category to obtain the inter-category image features.
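Step 1404 can likewise be sketched as joint attention over the concatenated sequence, followed by splitting and splicing. For simplicity, this sketch treats the intra-category features as one vector per class, which is an assumption rather than the patent's exact layout:

```python
import torch
import torch.nn as nn

class CenterInteraction(nn.Module):
    """Sketch of step 1404: block tokens and class centers attend jointly."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.fuse = nn.Linear(2 * dim, dim)  # splice enhanced centers with features

    def forward(self, tokens, centers, intra_feats):
        # tokens: (B, N, C); centers: (B, K, C); intra_feats: (B, K, C)
        seq = torch.cat([tokens, centers], dim=1)  # joint sequence (B, N+K, C)
        out, _ = self.attn(seq, seq, seq)          # mutual interaction
        enhanced = out[:, tokens.size(1):]         # (B, K, C) enhanced centers
        spliced = torch.cat([enhanced, intra_feats], dim=-1)
        return self.fuse(spliced)                  # (B, K, C) inter-category features
```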
Because the image segmentation method provided by the present application needs to segment different complex images, the class-aware network model is designed as a plug-and-play module and integrated into a ResUNet network to segment complex blood vessel images. Different blood vessel image segmentation environments have different segmentation requirements and extraction scales. Therefore, to increase the universality of the image segmentation method and adapt to different blood vessel image segmentation requirements, the information to be extracted at different scales can be determined according to the blood vessel image segmentation requirements; the total number of scales is determined from the information extracted at the different scales; and each sliding window matching the target object is determined according to the total number of scales, yielding accurate inter-category image features for different types of blood vessel images.
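As a hedged sketch of this multi-scale window selection (the window sizes, the power-of-two progression, and the one-salient-pixel-per-window rule are all assumptions for illustration):

```python
import torch

def windows_for_scales(base, num_scales):
    """One window size per scale, e.g. base=2, num_scales=3 -> [2, 4, 8]."""
    return [base * (2 ** i) for i in range(num_scales)]

def salient_positions(saliency, window):
    """Split a (B, H, W) saliency map into non-overlapping window x window
    sub-regions and keep the flat index of the most salient pixel of each."""
    B, H, W = saliency.shape
    nH, nW = H // window, W // window
    s = saliency.unfold(1, window, window).unfold(2, window, window)
    s = s.contiguous().view(B, nH * nW, window * window)
    local = s.argmax(dim=-1)                              # offset inside each window
    widx = torch.arange(nH * nW, device=saliency.device)  # window indices
    row = (widx // nW) * window + local // window         # global row
    col = (widx % nW) * window + local % window           # global col
    return row * W + col                                  # (B, nH*nW) flat indices
```

The features gathered at these positions would then be projected into keys and values for multi-head attention, in the spirit of claim 3 below.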
Step 1405: segment the blood vessel feature image through the inter-category image features to obtain the segmentation result of the blood vessel feature image.
Fig. 16 is a schematic front-end view of segmenting a medical image by the image segmentation method according to the embodiment of the present invention. When the medical image in the medical information is displayed, the segmentation area of the medical image in the user interface is locked through the control component; the medical image is segmented by the image segmentation method; and the segmentation results of the CT image and the contrast image, corresponding respectively to internal hemorrhage of the organism such as cerebral hemorrhage, fundus hemorrhage and pulmonary hemorrhage of the target object, are displayed in the display interface through the user interface.
To implement the image segmentation method provided by the present application, the present application further provides corresponding hardware. The structure of the image segmentation apparatus according to the embodiment of the present invention is described in detail below. The image segmentation apparatus may be implemented in various forms, such as a dedicated terminal with a logic rule network training processing function, or a server provided with the processing function of the image segmentation apparatus, such as the server 200 in the foregoing fig. 1. Fig. 17 is a schematic diagram of the composition structure of the image segmentation apparatus according to the embodiment of the present invention; it can be understood that fig. 17 shows only an exemplary structure, not the entire structure, and part or all of the structure shown in fig. 17 may be implemented as needed.
The image segmentation device provided by the embodiment of the invention comprises: at least one processor 201, memory 202, user interface 203, and at least one network interface 204. The various components in the image segmentation apparatus are coupled together by a bus system 205. It will be appreciated that the bus system 205 is used to enable communications among the components. The bus system 205 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are labeled as bus system 205 in fig. 17.
The user interface 203 may include, among other things, a display, a keyboard, a mouse, a trackball, a click wheel, a key, a button, a touch pad, or a touch screen.
It will be appreciated that the memory 202 can be either volatile memory or nonvolatile memory, and can include both volatile and nonvolatile memory. The memory 202 in the embodiments of the present invention is capable of storing data to support the operation of the terminal (e.g., 10-1). Examples of such data include: any computer program, such as an operating system and application programs, for operating on a terminal (e.g., 10-1). The operating system includes various system programs, such as a framework layer, a core library layer, a driver layer, and the like, and is used for implementing various basic services and processing hardware-based tasks. The application program may include various application programs.
In some embodiments, the image segmentation apparatus provided by the embodiment of the present invention may be implemented by a combination of hardware and software. As an example, the image segmentation apparatus may be a processor in the form of a hardware decoding processor programmed to execute the image segmentation method provided by the embodiment of the present invention. For example, the processor in the form of a hardware decoding processor may employ one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs), or other electronic components.
As an example of implementing the image segmentation apparatus by combining software and hardware, the image segmentation apparatus provided by the embodiment of the present invention may be directly embodied as a combination of software modules executed by the processor 201. The software modules may be located in a storage medium in the memory 202; the processor 201 reads the executable instructions included in the software modules in the memory 202 and, in combination with the necessary hardware (for example, the processor 201 and other components connected to the bus 205), completes the image segmentation method provided by the embodiment of the present invention.
By way of example, the processor 201 may be an integrated circuit chip having signal processing capabilities, such as a general-purpose processor, a Digital Signal Processor (DSP), another programmable logic device, discrete gate or transistor logic, or discrete hardware components, where the general-purpose processor may be a microprocessor or any conventional processor.
As an example of implementing the image segmentation apparatus purely in hardware, the apparatus provided by the embodiment of the present invention may be implemented directly by the processor 201 in the form of a hardware decoding processor, for example by one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs), or other electronic components, to execute the image segmentation method provided by the embodiment of the present invention.
The memory 202 in the embodiment of the present invention is used to store various types of data to support the operation of the image segmentation apparatus. Examples of such data include any executable instructions for operating on the image segmentation apparatus; the program implementing the image segmentation method of the embodiment of the present invention may be contained in these executable instructions.
In other embodiments, the image segmentation apparatus provided by the embodiment of the present invention may be implemented in software. Fig. 17 illustrates the image segmentation apparatus stored in the memory 202, which may be software in the form of programs, plug-ins, and the like, including a series of modules. As an example of the programs stored in the memory 202, the image segmentation apparatus may include the following software modules:
an information transmission module 2081 and an information processing module 2082. When the software modules in the image segmentation apparatus are read into the RAM by the processor 201 and executed, the image segmentation method provided by the embodiment of the present invention will be implemented, where the functions of each software module in the image segmentation apparatus include:
the information transmission module 2081 is used for acquiring a characteristic image of a target object;
the information processing module 2082 is used for extracting image features in the category corresponding to the feature image;
the information processing module 2082 is used for obtaining a block sequence of the characteristic image through convolution calculation according to the image characteristics in the category;
the information processing module 2082 is used for performing pixel set calculation according to the image characteristics in the category to obtain the category central characteristics of the characteristic image;
the information processing module 2082 is used for interacting the blocking sequences and the category central features to obtain enhanced category central features;
the information processing module 2082 is used for splicing the enhanced category central features and the intra-category image features to obtain inter-category image features;
the information processing module 2082 is configured to segment the feature image according to the inter-category image features to obtain a segmentation result of the feature image of the target object.
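Gathering the module responsibilities above into one self-contained toy pipeline (every layer shape and operator choice here is illustrative; the patent's actual intra-class and inter-class dynamic converter networks are more elaborate):

```python
import torch
import torch.nn as nn

class ClassAwarePipeline(nn.Module):
    """Toy end-to-end version of the information-processing steps above."""
    def __init__(self, in_ch=512, dim=256, num_classes=9, heads=4):
        super().__init__()
        self.intra = nn.Conv2d(in_ch, dim, 3, padding=1)  # stand-in for the
                                                          # intra-class transformer
        self.init_seg = nn.Conv2d(dim, num_classes, 1)    # initial per-class logits
        self.patch = nn.Conv2d(dim, dim, 2, stride=2)     # block sequence via conv
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.fuse = nn.Conv2d(2 * dim, dim, 1)            # splice enhanced info back
        self.head = nn.Conv2d(dim, num_classes, 1)        # segmentation layer

    def forward(self, x):                                 # x: (B, in_ch, H, W)
        f = torch.relu(self.intra(x))                     # intra-category features
        B, C, H, W = f.shape
        logits = self.init_seg(f)                         # (B, K, H, W)
        probs = logits.softmax(1).flatten(2)              # (B, K, H*W)
        probs = probs / (probs.sum(-1, keepdim=True) + 1e-8)
        centers = torch.bmm(probs, f.flatten(2).transpose(1, 2))   # (B, K, C)
        tokens = self.patch(f).flatten(2).transpose(1, 2)           # (B, N, C)
        seq = torch.cat([tokens, centers], 1)             # joint interaction
        out, _ = self.attn(seq, seq, seq)
        centers = out[:, tokens.size(1):]                 # enhanced centers (B, K, C)
        # Broadcast each pixel's class-weighted center and splice with features.
        ctx = torch.bmm(logits.softmax(1).flatten(2).transpose(1, 2), centers)
        ctx = ctx.transpose(1, 2).view(B, C, H, W)        # (B, C, H, W)
        inter = self.fuse(torch.cat([f, ctx], 1))         # inter-category features
        return self.head(inter)                           # segmentation result
```

For instance, `ClassAwarePipeline()(torch.randn(2, 512, 28, 28))` returns per-class logits of shape (2, 9, 28, 28) at the encoder's feature resolution.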
According to the electronic device shown in fig. 17, in one aspect of the present application, there is further provided a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. The processor of the electronic device reads the computer instructions from the computer-readable storage medium and executes them, so that the electronic device performs the various embodiments and combinations of embodiments provided in the alternative implementations of the image segmentation method.
The invention has the following beneficial technical effects:
1) The embodiment of the invention obtains the feature image of the target object; extracts the image features in the category corresponding to the feature image; obtains a block sequence of the feature image through convolution calculation according to the image features in the category; performs pixel set calculation according to the image features in the category to obtain the category central features of the feature image; interacts the block sequence with the category central features to obtain enhanced category central features; splices the enhanced category central features with the intra-category image features to obtain inter-category image features; and segments the feature image through the inter-category image features to obtain the segmentation result of the feature image of the target object. In this way, both intra-category and inter-category image features can be fully learned, alleviating the problems of insufficient intra-category consistency and insufficient inter-category difference: intra-category feature consistency is achieved while inter-category constraints enhance the distinctiveness of features between different categories, so that the segmentation result of the feature image of the target object is more accurate, the segmentation boundary is clearer, and the method can adapt to image segmentation environments with various complex image backgrounds.
2) Through the processing of the intra-class dynamic converter network model, the representations of different classes are decoupled and salient keys/values are selected adaptively from multiple scales, so that compact learning can be realized and more compact intra-class features obtained. A more representative initial class central feature can also be provided.
3) Through the processing of the inter-class dynamic converter network model, the dependency relationships between different classes can be obtained, the class information is fully mined, and the accuracy of complex image segmentation is improved.
The above description is only exemplary of the present invention and should not be construed as limiting the scope of the present invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (12)

1. A method of image segmentation, the method comprising:
acquiring a characteristic image of a target object;
extracting image features in the category corresponding to the feature image;
obtaining a block sequence of the characteristic image through convolution calculation according to the image characteristics in the category;
according to the image features in the category, calculating a pixel set to obtain category central features of the feature images;
interacting the block sequences and the category central features to obtain enhanced category central features;
splicing the enhanced category central features and the image features in the categories to obtain image features between the categories;
and segmenting the characteristic image according to the image characteristics among the categories to obtain the segmentation result of the characteristic image of the target object.
2. The method according to claim 1, wherein the extracting the image features in the category corresponding to the feature image comprises:
carrying out iteration and decoupling processing on the characteristic image to obtain a decoupling characteristic image set;
calculating query features corresponding to target category feature images in the decoupling feature image set;
carrying out weighting processing on the query features to obtain significance perception mapping;
extracting the salient features of the target category feature image according to the salient perception mapping;
and calculating the image features in the category according to the salient features of the target category feature image.
3. The method according to claim 2, wherein the calculating the image features in the class according to the salient features of the target class feature image comprises:
acquiring a sliding window matched with the target object;
dividing the significance perception mapping through the sliding window to obtain at least two non-overlapping sub-regions;
determining a significant location in each sub-region;
extracting a salient feature from the target category feature image according to the salient position;
projecting the significant features to obtain key values and value items;
calculating the characteristics of the enhancement sequences of different classes by using the key values and the value items through a multi-head attention mechanism;
and calculating the image characteristics in the categories according to the characteristics of the enhancement sequences of the different categories.
4. The method of claim 3, further comprising:
determining information extracted in different scales according to the image segmentation requirements of the target object;
determining the total scale quantity according to the information extracted by the different scales;
and determining each sliding window matched with the target object according to the total scale quantity.
5. The method according to claim 1, wherein obtaining the block sequence of the feature image by convolution calculation according to the image feature in the class comprises:
determining a convolution step size matching the image features within the category;
and carrying out convolution processing on the image features in the category according to the convolution step length to obtain a block sequence of the feature image.
6. The method according to claim 1, wherein the performing a pixel set calculation according to the image features in the category to obtain a category center feature of the feature image comprises:
calculating an initial segmentation result of each category according to the image features in the categories;
calculating a pixel representation set of a target category characteristic image according to the initial segmentation result;
and determining the class central feature of the feature image according to the pixel representation set of the target class feature image.
7. The method of claim 1, wherein the interacting the sequence of patches and the category-centric feature to obtain an enhanced category-centric feature comprises:
splicing the block sequences and the category central features to obtain spliced features;
performing feature complementation on the splicing features through an attention matrix to obtain complementary splicing features, wherein the attention matrix comprises 4 groups of normalized similarity matrices;
and splitting the complementary splicing characteristics according to the splicing sequence to obtain an enhanced block sequence and enhanced category central characteristics.
8. The method according to claim 1, wherein the method is implemented based on a class-aware network model, the class-aware network model includes at least an intra-class dynamic transformer network model and an inter-class dynamic transformer network model, and the obtaining the feature image of the target object includes:
acquiring a characteristic image of the target object through the in-class dynamic converter network model;
the extracting of the image features in the category corresponding to the feature image includes:
extracting the image features in the category corresponding to the feature images through the dynamic converter network model in the category;
obtaining a block sequence of the feature image through convolution calculation according to the image features in the category, wherein the block sequence comprises the following steps:
obtaining a block sequence of the characteristic image through convolution calculation according to the image characteristics in the category through the inter-category dynamic converter network model;
the calculating a pixel set according to the image features in the category to obtain the category central features of the feature images comprises:
performing pixel set calculation according to the image characteristics in the categories through the inter-category dynamic converter network model to obtain category central characteristics of the characteristic images;
the interacting the block sequences and the category central features to obtain enhanced category central features, including:
interacting the block sequences and the category central features through the inter-category dynamic converter network model to obtain enhanced category central features;
the step of splicing the enhanced category central features and the intra-category image features to obtain inter-category image features comprises the following steps:
and splicing the enhanced class central features and the intra-class image features through the inter-class dynamic converter network model to obtain inter-class image features.
9. An image segmentation apparatus, characterized in that the apparatus comprises:
the information transmission module is used for acquiring a characteristic image of the target object;
the information processing module is used for extracting the image features in the category corresponding to the feature image;
the information processing module is used for obtaining a block sequence of the characteristic image through convolution calculation according to the image characteristics in the category;
the information processing module is used for carrying out pixel set calculation according to the image characteristics in the category to obtain category central characteristics of the characteristic image;
the information processing module is used for interacting the block sequences and the category central features to obtain enhanced category central features;
the information processing module is used for splicing the enhanced category central feature and the intra-category image feature to obtain an inter-category image feature;
and the information processing module is used for segmenting the characteristic image according to the image characteristics among the categories to obtain the segmentation result of the characteristic image of the target object.
10. An electronic device, characterized in that the electronic device comprises:
a memory for storing executable instructions;
a processor for implementing the image segmentation method of any one of claims 1 to 8 when executing the executable instructions stored by the memory.
11. A computer program product comprising a computer program or instructions for implementing the image segmentation method according to any one of claims 1 to 8 when executed by a processor.
12. A computer-readable storage medium storing executable instructions, wherein the executable instructions when executed by a processor implement the image segmentation method of any one of claims 1 to 8.
CN202310014416.8A 2023-01-05 2023-01-05 Image segmentation method, device, equipment, program product and medium Pending CN115965785A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310014416.8A CN115965785A (en) 2023-01-05 2023-01-05 Image segmentation method, device, equipment, program product and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310014416.8A CN115965785A (en) 2023-01-05 2023-01-05 Image segmentation method, device, equipment, program product and medium

Publications (1)

Publication Number Publication Date
CN115965785A true CN115965785A (en) 2023-04-14

Family

ID=87359972

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310014416.8A Pending CN115965785A (en) 2023-01-05 2023-01-05 Image segmentation method, device, equipment, program product and medium

Country Status (1)

Country Link
CN (1) CN115965785A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117253228A (en) * 2023-11-14 2023-12-19 山东大学 Cell cluster space constraint method and system based on nuclear image distance intra-coding
CN117253228B (en) * 2023-11-14 2024-02-09 山东大学 Cell cluster space constraint method and system based on nuclear image distance intra-coding


Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40083959

Country of ref document: HK