CN113470029B

CN113470029B - Training method and device, image processing method, electronic device and storage medium

Info

Publication number: CN113470029B
Application number: CN202111032467.0A
Authority: CN
Inventors: 边成
Original assignee: Beijing ByteDance Network Technology Co Ltd
Current assignee: Beijing ByteDance Network Technology Co Ltd
Priority date: 2021-09-03
Filing date: 2021-09-03
Publication date: 2021-12-03
Anticipated expiration: 2041-09-03
Also published as: CN113470029A; WO2023030281A1

Abstract

A training method and device for an image processing network, an image processing method, an electronic device and a storage medium are provided. The training method of the image processing network comprises the following steps: acquiring a first training image, wherein the first training image comprises a target area to be segmented; processing the first training image by using a classification network to obtain a peak response image corresponding to the first training image, wherein the peak response image comprises position information corresponding to the target area; and performing region segmentation training on the segmentation network by using the position information as auxiliary information and using the first training image. The image processing method uses a peak response map with position guidance information as auxiliary information, performs target region segmentation on an input training image, solves the problem of data loss of a medical image (such as a medical image obtained through endoscopy), and improves the generalization property and clinical usability of a network.

Description

Training method and device, image processing method, electronic device and storage medium

Technical Field

Embodiments of the present disclosure relate to a training method of an image processing network, an image processing method, a training apparatus of an image processing network, an image processing apparatus, an electronic device, and a non-transitory computer-readable storage medium.

Background

For gastrointestinal disorders, such as colorectal cancer, the best treatment is colon polyp detection to detect and eliminate polyps as early as possible and avoid malignant lesions. For example, the colon may be examined by colonoscopy, using video endoscopy to examine a significant area of the colon's surface and tissue. To provide support for the gastroenterologist during this examination, image processing methods may be employed to aid in the detection of neoplastic lesions, such as texture and feature-based methods using various classification models. Along with the development of artificial intelligence, a lot of researches show that better accuracy and higher analysis speed can be obtained by applying the artificial intelligence to the scene.

In the image processing method, there is a Weakly Supervised Segmentation related article, "weak Supervised distance Segmentation using Class Peak Response, 20180623".

Disclosure of Invention

This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

At least one embodiment of the present disclosure provides a training method for an image processing network, where the image processing network includes a segmentation network for obtaining a segmented image, the training method including: acquiring a first training image, wherein the first training image comprises a target area to be segmented; processing the first training image by using a classification network to obtain a peak response image corresponding to the first training image, wherein the peak response image comprises position information corresponding to the target area; and performing region segmentation training on the segmentation network by using the first training image by using the position information as auxiliary information.

At least one embodiment of the present disclosure provides an image processing method, including: acquiring an input image, wherein the input image comprises a target area; performing region segmentation processing on the input image by using an image processing network to obtain a segmented image corresponding to the input image, wherein the segmented image has a segmented region corresponding to the target region in the input image, and the image processing network is at least partially trained according to the training method of any embodiment of the disclosure.

At least one embodiment of the present disclosure provides a training apparatus for an image processing network, where the image processing network includes a segmentation network for obtaining a segmented image, the training apparatus including: an acquisition unit configured to acquire a first training image, wherein the first training image includes a target region to be segmented; a processing unit, configured to process the first training image by using the classification network to obtain a peak response map corresponding to the first training image, where the peak response map includes location information corresponding to the target area; and the training unit is configured to perform region segmentation training on the segmentation network by using the first training image by using the position information as auxiliary information.

At least one embodiment of the present disclosure provides an image processing apparatus including: an image input unit configured to acquire an input image, wherein the input image includes a target region; a segmentation processing unit configured to perform a region segmentation process on the input image by using an image processing network to obtain a segmented image corresponding to the input image, wherein the segmented image has a segmented region corresponding to the target region in the input image, and the image processing network is at least partially trained according to the training method of any embodiment of the present disclosure.

At least one embodiment of the present disclosure provides an electronic device, including: a memory non-transiently storing computer executable instructions; a processor configured to execute the computer-executable instructions, wherein the computer-executable instructions, when executed by the processor, implement a training method or an image processing method of an image processing network according to any embodiment of the present disclosure.

At least one embodiment of the present disclosure provides a non-transitory computer-readable storage medium, wherein the non-transitory computer-readable storage medium stores computer-executable instructions, which when executed by a processor, implement a training method or an image processing method of an image processing network according to any one of the embodiments of the present disclosure.

Drawings

To more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings of the embodiments will be briefly introduced below, and it is apparent that the drawings in the following description relate only to some embodiments of the present disclosure and are not limiting to the present disclosure.

Fig. 1 is a schematic flow chart of a training method of an image processing network according to at least one embodiment of the present disclosure;

fig. 2A and 2B are schematic diagrams of a region segmentation label provided in at least one embodiment of the present disclosure;

fig. 3 is a schematic flowchart of step S20 provided by at least one embodiment of the present disclosure;

fig. 4 is a schematic structural diagram of a split network provided in at least one embodiment of the present disclosure;

FIG. 5 is a schematic block diagram of a training network provided in at least one embodiment of the present disclosure;

fig. 6 is a schematic flow chart of an image processing method according to at least one embodiment of the disclosure;

fig. 7A is a schematic block diagram of an exercise device provided in at least one embodiment of the present disclosure;

fig. 7B is a schematic block diagram of an image processing apparatus according to at least one embodiment of the present disclosure;

fig. 7C is a schematic processing diagram of an image processing apparatus according to at least one embodiment of the disclosure;

fig. 8 is a schematic block diagram of an electronic device provided in at least one embodiment of the present disclosure;

FIG. 9 is a schematic diagram of a non-transitory computer-readable storage medium provided in at least one embodiment of the present disclosure;

fig. 10 is a schematic structural diagram of another electronic device according to at least one embodiment of the present disclosure.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be described below clearly and completely with reference to the accompanying drawings of the embodiments of the present disclosure. It is to be understood that the described embodiments are only a few embodiments of the present disclosure, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the described embodiments of the disclosure without any inventive step, are within the scope of protection of the disclosure.

Unless otherwise defined, technical or scientific terms used herein shall have the ordinary meaning as understood by one of ordinary skill in the art to which this disclosure belongs. The use of "first," "second," and similar terms in this disclosure is not intended to indicate any order, quantity, or importance, but rather is used to distinguish one element from another. The word "comprising" or "comprises", and the like, means that the element or item listed before the word covers the element or item listed after the word and its equivalents, but does not exclude other elements or items. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "upper", "lower", "left", "right", and the like are used merely to indicate relative positional relationships, and when the absolute position of the object being described is changed, the relative positional relationships may also be changed accordingly. To maintain the following description of the embodiments of the present disclosure clear and concise, a detailed description of some known functions and components have been omitted from the present disclosure.

Generally, image segmentation is a process of segmenting an image into several parts, and such image processing process can obtain objects or textures in the image, and is often used in remote sensing image or lesion detection applications.

Medical image segmentation is a new biomedical image processing technology, makes a great contribution to sustainable medical care, and is one of important research directions in the field of computer vision. The medical image segmentation based on deep learning is currently a focus of research, and has achieved enormous achievements in recent years. In the research of medical image segmentation networks, the current research mainly aims at modifying the structure of a deep learning network or improving the preprocessing of data.

Although the deep learning network has strong detection capability, many methods can detect the focus (such as polyp, etc.) and achieve high precision and analysis efficiency, the essence of the success of the deep learning method still depends on a large amount of high-quality labeled data sets. However, in the field of medical examination, such as the practical use of medical examination means like endoscopes, it is difficult and expensive to collect data and to train a model with a sufficient number of carefully labeled data sets in a short time.

Zero-sample learning focuses on a transfer learning process of information inference from visible classes to invisible classes by implementing cross-modal or class knowledge transfer with assistance information. For example, the zero sample learning method is defined to learn and predict the invisible data set result based on the visible data set and the auxiliary information, for example, the class intersection of the visible data set and the invisible data set is empty. That is, the zero-sample learning can implement classification, region segmentation, and other processing on the untrained (and therefore not labeled) invisible data set based on less training set data (i.e., the labeled visible data set) and auxiliary information.

At present, aiming at medical images, such as endoscope images, methods and products for segmenting a focus area mainly improve a segmentation network to improve segmentation accuracy, and the method does not consider that clinical data may be missing and cannot solve the problem of a large number of training data sets.

Although zero sample learning has achieved a good effect in a conventional computer vision task, zero sample learning has not been applied to the field of medical images, and no related research work appears at present in scenes such as lesion region segmentation of medical images.

At least one embodiment of the present disclosure provides a training method of an image processing network, a training apparatus of an image processing network, an image processing method, an electronic device, and a non-transitory computer-readable storage medium.

The image processing method comprises the following steps: acquiring a first training image, wherein the first training image comprises a target area to be segmented; processing the first training image by using a classification network to obtain a peak response image corresponding to the first training image, wherein the peak response image comprises position information corresponding to the target area; and performing region segmentation training on the segmentation network by using the position information as auxiliary information and using the first training image.

In at least one embodiment, the segmentation network performs region segmentation on the unseen image by using the position information in the peak response map output by the classification network as auxiliary information based on the learning features obtained in the process of learning the visible data set. The image processing method can enable the segmentation network to have the region segmentation capability on an unseen target region without a new region segmentation label, effectively solves the problem of insufficient labeling data (such as clinical data missing), and greatly improves the generalization property and clinical usability of the model.

In at least one example, the image processing method realizes that a zero sample learning idea is applied to medical image segmentation, a peak response graph with position guide information is used as auxiliary information, and the input training image is subjected to lesion region segmentation, so that the existing clinical data set can be better utilized, the image processing network model can still distinguish and adapt to more lesion types under the condition of no complete data set and data annotation, the problem of data loss of the medical image such as an endoscope image is solved, the early detection capability of the lesion region such as polyp is greatly improved, and the generalization property and the clinical usability of the model are improved.

The image processing method provided by the embodiment of the disclosure can be applied to the image processing device provided by the embodiment of the disclosure, and the image processing device can be configured on an electronic device. The electronic device may be a personal computer, a server, a mobile terminal, and the like, and for example, the mobile terminal may be a hardware device such as a mobile phone and a tablet computer.

For example, the image processing network in the embodiments of the present disclosure may be a neural network.

Embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings, but the present disclosure is not limited to these specific embodiments.

Fig. 1 is a schematic flow chart of a training method of an image processing network according to at least one embodiment of the present disclosure.

As shown in fig. 1, a training method for an image processing network according to at least one embodiment of the present disclosure includes steps S10 to S30.

In step S10, a first training image is acquired.

For example, the first training image includes a target region to be segmented, for example, for a medical image, the target region may be a lesion region, that is, a region that a doctor needs to make medical judgment, and a diagnosis result of a patient may be obtained through analysis of the lesion region.

For example, when the first training image is an endoscopic image, the target region may be a polyp region.

In step S20, the first training image is processed using the classification network to obtain a peak response map corresponding to the first training image.

For example, the peak response map includes location information corresponding to the target area.

In step S30, the segmentation network is subjected to region segmentation training using the first training image, using the position information as auxiliary information.

For example, the training image may be a medical image.

For example, the imaging modality of the training image may be any suitable medical imaging modality. For example, the training image may be imaged by NBI (Narrow Band Imaging), white light, CT (Computed Tomography), MRI (Magnetic Resonance Imaging), and the like.

For example, the training image may be an image obtained by performing medical examination on an arbitrary examination region by using an arbitrary medical examination means, and the training image needs to be a real pathological image of the patient. For example, the medical detection means may include endoscopy, angiography, ultrasound, and the like. For example, the detection region may include a human body region such as the colon, stomach, lung, heart, etc.

For example, the training image may be an image taken from a medical examination video, e.g., the medical examination video may be an endoscopic video, a cardiac ultrasound video, etc., e.g., the endoscopic video from real case situations of a plurality of anonymized patients.

For example, the training image may be obtained by scanning or shooting the medical detection image through the image acquisition device, and the training image may be an image directly acquired by the image acquisition device, or an image obtained by performing certain preprocessing on the acquired image.

For example, acquisition region requirements and quality requirements may be set for the training images.

For example, the sharpness of the training image should meet the requirement to ensure that an accurate lesion region is obtained, thereby facilitating the physician to obtain an accurate diagnosis result.

For example, the training images may be from a pre-acquired dataset comprising 3433 images taken from endoscopic video, including white light images and NBI images, which was created from real case situations in 48 patients.

For example, the training image has normalized annotation information, which may include one or more of a classification label, a modality label, a region segmentation label, and the like. For example, the modality label indicates an imaging modality of the training image, such as white light imaging or NBI imaging, and the like. For example, the classification labels characterize classes of target regions in the training image, such as colon polyps and the like. For example, the region segmentation labels represent position information of the target region in the training image.

Fig. 2A and 2B are schematic diagrams of a region segmentation label according to at least one embodiment of the present disclosure. For example, fig. 2A is a schematic diagram of a training image, and fig. 2B is a region segmentation label corresponding to the training image shown in fig. 2A.

For example, in fig. 2B, a white portion indicates a target region, that is, a polyp region, and a black portion indicates a background region, that is, a portion other than the target region in the training image.

For example, the training image may be divided into a visible data set and an invisible data set according to a zero sample learned data set definition. For example, when the annotation information of the training image includes a region segmentation label and a classification label, the training image belongs to the visible data set, for example, when the annotation information of the training image does not include a region segmentation label, for example, when the annotation information of the training image includes a classification label and a modality label, the training image belongs to the invisible data set.

For example, for medical images, the acquisition of region segmentation labels is costly and data collection is difficult, while the acquisition of classification labels is relatively easy, so the number of training images in the visible dataset is much smaller than the number of training images in the invisible dataset.

For example, the first training image does not have a region segmentation label corresponding to the target region to be segmented, and further for example, the segmentation network does not have any region segmentation training on the target region to be segmented before the region segmentation training on the segmentation network. That is, the target region in the first training image belongs to an image that was not seen before the segmentation network, and the first training image belongs to the invisible dataset for the segmentation network.

For example, the class of the target region in the first training image belongs to a class for which the classification network has been trained, but the target region in the first training image belongs to a target region that has not been seen before the segmentation network.

For example, the classification network to be trained may be trained in advance using the visible data set and the invisible data set to obtain a trained classification network, and the classification network may identify the target region of at least one class. For example, because the total number of visible data sets and invisible data sets is large, a large number of training images can be used to train the classification network, and the accuracy of the resulting classification network is also high.

For example, a segmentation network to be trained may be trained using the visible data sets to obtain a segmentation network, which may be less effective in directly performing region segmentation due to the limited number of visible data sets.

For example, the classification network can identify 3 types of target areas through training, and only 1 type of the 3 types of target areas has the normalized area segmentation labels, the segmentation network is trained by using the 1 type of training image with the normalized area segmentation labels, and the other two types of target areas belong to an "unseen" state for the trained segmentation network, and the segmentation network cannot directly perform area segmentation on the two types of target areas.

For example, a peak response graph may be generated with knowledge already learned by a trained classification network, which assists in performing zero sample inference learning on a segmented network that has been trained on a visible data set. For example, the segmentation network performs region segmentation on an unseen image by using, as auxiliary information, position information in a peak response map output by the classification network based on a learning feature obtained in a process of learning a visible data set. The method can enable the segmentation network to have the region segmentation capability for an unseen target region without a new region segmentation label, effectively solves the problem of insufficient labeling data (such as clinical data missing), and greatly improves the generalization property and clinical usability of the model.

The following describes an implementation procedure of a training method for an image processing network according to at least one embodiment of the present disclosure with reference to the drawings.

For example, the classification network includes a classification model configured to identify at least one class of target regions and at least one peak stimulation layer.

For example, the classification model may be a Full Convolutional Network (FCN), which may capture spatial information in a forward propagation process and output a class response map with pixel-level prediction, i.e., a semantically segmented image, and thus is suitable for spatial prediction.

For example, with respect to Convolutional Neural Networks (CNNs), in a Full Convolutional Network (FCN), the last fully-connected layer is replaced with a full convolutional layer. This significant improvement makes FCNs have dense pixel-level prediction capabilities. To achieve better positioning performance, high resolution activation maps are combined with the up-sampled output and passed to the convolutional layer to assemble a more accurate output. This improvement therefore enables the FCN to make pixel-level predictions for full-size images, rather than segment predictions, and also to make predictions for the entire image in one forward pass.

For example, the classification model may use a full-convolution network, and the class response map is a semantic segmentation image with pixel-level prediction output by the full-convolution network.

It should be noted that, in other embodiments, the classification model may also be other neural network structures capable of performing classification and outputting a class response graph, which is not limited by the present disclosure.

For example, the classification network may be trained in advance and may be capable of identifying at least one category of target region, that is, the classification network may be trained in advance to identify a plurality of desired lesion regions.

For example, the training method of the classification network may include: acquiring at least one image to be trained and at least one category label corresponding to the at least one image to be trained one by one; and training the classification model to be trained based on at least one image to be trained and at least one class label to obtain the classification model.

For example, the images to be trained for training the classification model may be from the visible data set as described above or may be from the invisible data set as described above.

For example, the classification model is trained separately, for example, the classification model may be trained by a conventional fully supervised training method, the classification model may be trained separately, and the loss function is a standard mean square error loss function, which is expressed by the following formula (1):

(formula 1)

Where MSE represents the mean squared error loss value calculated by the loss function,

representing the predicted result of the output of the classification model,

representing a classification label, n representing a total number of at least one image to be trained,

a summation formula is represented.

It should be noted that the loss function of the classification model is only the feature information that helps the classification model to learn the category, and the loss function of the classification model may also take other available forms, which is not limited by the present disclosure.

For example, to make a classification network with spatial prediction capability, one or more peak stimulus layers may be constructed for the classification network and inserted into the top layer of the classification network, which generates peaks for the classes to obtain a peak response map. For example, the peak response map includes location information corresponding to the target area, which may assist the split network in performing zero sample learning. For example, the peak response map may be in the form of a pixel level predictive image, e.g., in the form of a region segmentation tag similar to that shown in fig. 2B.

Fig. 3 is a schematic flowchart of step S20 provided in at least one embodiment of the present disclosure. As shown in fig. 3, step S20 may include at least steps S201-S203.

In step S201, a classification model is used to classify the first training image to obtain at least one class response map corresponding to at least one class one to one.

For example, each category response map has a response region for the category to which each category response map corresponds.

For example, the response regions in the class response maps of different classes are different, e.g., the response regions are used to indicate the location of the target region belonging to the class in one class response map. For example, when the target region is a polyp region, the response region in the category response map corresponding to the category a is a polyp region belonging to the category a, and the response region in the category response map corresponding to the category B is a polyp region belonging to the category B.

In step S202, the at least one category response map is processed with the at least one peak stimulation layer to obtain at least one confidence score, respectively.

For example, in one example, step S202 may include: for each of the at least one category response graph: determining a response area in the category response map; and calculating the average value of the numerical values of all the elements in the response area, and taking the average value as the confidence score corresponding to the category response graph.

For example, where at least one peak stimulus layer is a convolutional layer, calculating an average of the values of all elements in the response region may include: the class response graph is convolved with a convolution kernel corresponding to the class response graph to obtain an average value, for example, the formula of the convolution kernel is as follows:

(formula 2)

Wherein the content of the first and second substances,

represents a convolution kernel corresponding to the c-th class response map, (C)

,

) Coordinate values representing the kth element in the response region of the c-th class response map,

represents the total number of elements in the response area of the c-th category response map,

the representation of the function of a dirac,

indicating an addition process.

For example, confidence score for the c-th category

From the c-th class response map

And corresponding convolution kernel

The convolution is carried out to obtain the result, and the specific formula is as follows:

(formula 3)

Wherein the content of the first and second substances,

the "x" indicates convolution operation, and the meanings of other parameters are as described in the parameter description of formula 2, and are not described herein again.

As can be seen from equation 3, the classification network ultimately uses only the peaks in the response region for final prediction, so the gradient will be shared by the convolution kernel to all the peak coordinates during the subsequent probability back propagation.

From the perspective of model learning, the class response map is mainly obtained by densely sampling all receptive fields, and since the class response map presents semantic segmentation information of the full input image, most information in the full input image is irrelevant to a focus and is not an object to be focused, the class response map has the problem of unbalanced foreground and background.

Different from the traditional method of learning from extreme foreground and background imbalance unconditionally, the peak stimulation layer forces the classification network to learn through an information receptive field set (namely a response region) estimated through a class response diagram, and forces the segmentation network to learn only the peak region according to the peak region in the peak response diagram estimated through the class response diagram, so that negative effects brought by learning of a negative sample in the training process are avoided.

In step S203, based on the at least one confidence score and the at least one category response map, performing a probability back propagation process on the classification model to obtain a peak response map corresponding to the first training image.

Probabilistic back-propagation in this disclosure is a probabilistic back-propagation process for peaks to further generate a refined and example-aware representation, i.e., a peak response map. Unlike previous attention models in the back propagation direction, the probabilistic back propagation in the present disclosure finds the most relevant neurons in the output class through the receptive field to generate the class response map, and extracts the spatial information that determines the most relevant of the class from the classification labels that do not contain the location information, so that the correlation of each element in the back propagation direction (the direction from the output end to the input end) can be obtained at the bottom layer (called the bottom layer near the input end in the classification model), and thus the peak response map is obtained through the probabilistic back propagation.

The probability back propagation process in this disclosure can be interpreted, for example, as a process in which a walker starts from the top layer (referred to as the top layer closest to the output end in the classification model) and walks randomly to the bottom layer (referred to as the bottom layer closest to the input end in the classification model), and then converts the correlation formula in the back propagation direction for each element in the bottom layer input image into a probability that the walker visits the element.

For example, in some embodiments, step S203 may include: performing probability back propagation processing on the classification model based on at least one confidence score and at least one category response graph, and determining at least one intermediate peak response graph corresponding to at least one category one to one; acquiring the category of a target area in a first training image; and taking the intermediate peak response graph corresponding to the category of the target area in the first training image as the peak response graph corresponding to the first training image.

For example, performing a probabilistic back propagation process on the classification model based on the at least one confidence score and the at least one class response map to determine at least one intermediate peak response map corresponding to the at least one class in a one-to-one manner may include: determining N processing layers included by the classification model, wherein the N processing layers are sequentially from a 1 st processing layer to an Nth processing layer along a forward propagation direction, the forward propagation direction is a direction from input to output of the classification model, and N is a positive integer greater than 1; and for each selected category in at least one category, sequentially obtaining a probability map corresponding to each processing layer for the selected category along a backward propagation direction based on the confidence score corresponding to the selected category and the category response map corresponding to the selected category, and taking the probability map corresponding to the 1 st processing layer for the selected category as an intermediate peak response map corresponding to the selected category, wherein the backward propagation direction is opposite to the forward propagation direction.

For example, the processing layer may comprise a convolutional layer, e.g., the processing layer may also comprise other commonly used intermediate layers, such as a pooling layer, e.g., an average pooling layer or a maximum pooling layer, etc., which is considered to be the same type of layer that performs the affine transformation of the input, and thus the corresponding probabilistic backpropagation processing may be performed in the same manner as the convolutional layer.

For example, the most relevant spatial locations may be located for each class's peak response in turn along the counter-propagation direction to generate detailed example perceptual visual cues, i.e., a map of the peak response for that class.

For example, sequentially obtaining the probability map for the selected category corresponding to each processing layer along the back propagation direction based on the confidence score corresponding to the selected category and the category response map corresponding to the selected category may include: for the ith processing layer of the N processing layers: acquiring a switching probability corresponding to the ith processing layer, wherein the switching probability is used for indicating the correlation of each element in the input feature diagram and the output feature diagram of the ith processing layer; obtaining a probability map corresponding to the ith processing layer based on the transition probability corresponding to the ith processing layer and the probability parameter corresponding to the ith processing layer, wherein when i is equal to N, the probability parameter corresponding to the ith processing layer is a category response map corresponding to the selected category, and the transition probability corresponding to the ith processing layer is obtained, including: and acquiring the switching probability corresponding to the Nth processing layer based on the confidence score corresponding to the selected category, wherein when i is greater than or equal to 1 and less than N, the probability parameter corresponding to the ith processing layer is the probability map corresponding to the (i + 1) th processing layer, and i is a positive integer.

For example, obtaining the transition probability corresponding to the ith processing layer may include: when i is equal to N, determining an input characteristic parameter corresponding to the Nth processing layer based on the confidence score corresponding to the selected category, and when i is greater than or equal to 1 and less than N, taking the input characteristic graph of the ith processing layer as the input characteristic parameter of the ith processing layer; determining a gradient map corresponding to the ith processing layer in the forward propagation direction based on the input characteristic parameters and the loss function of the classification model; and obtaining the switching probability corresponding to the ith processing layer based on the gradient map corresponding to the ith processing layer.

For example, determining the input feature parameter corresponding to the nth processing layer based on the confidence score corresponding to the selected category may include: determining the input characteristic parameters corresponding to the Nth processing layer by using the following formula:

(formula 4)

Wherein the content of the first and second substances,

the input characteristic parameters corresponding to the Nth processing layer and aiming at the selected category are shown,

representing the total number of elements in the response region of the class response map corresponding to the selected class, L representing the loss function of the classification model,

a confidence score is indicated that corresponds to the selected category,

representing the convolution kernel in the peak stimulation layer corresponding to the selected category, and c representing the selected category.

For example, in the probability back propagation process, for each selected category, the input feature parameters for the selected category corresponding to the nth processing layer calculated according to formula 4 for the nth processing layer are first input

So that the gradient is shared by the convolution kernel to all peak coordinates during back propagation.

Then, the input characteristic parameters corresponding to the Nth processing layer and aiming at the selected category

The transition probability corresponding to the nth processing layer is obtained according to the following formula 6, the class response map corresponding to the selected class is used as a probability parameter, and the probability map corresponding to the nth processing layer for the selected class is obtained according to the following formula 5 according to the probability parameter and the transition probability.

Then, for the (N-1) th processing layer, the input feature map of the (N-1) th processing layer is used as an input feature parameter, the transition probability corresponding to the (N-1) th processing layer is obtained according to the following formula 6, the probability map corresponding to the selected category of the (N) th processing layer is used as a probability parameter, and the probability map corresponding to the selected category of the (N) th processing layer is obtained by calculation according to the following formula 5 according to the probability parameter and the transition probability.

Then, for the N-2 th processing layer, the input feature map of the N-2 th processing layer is used as an input feature parameter, the transition probability corresponding to the N-2 th processing layer is obtained according to the following formula 6, the probability map corresponding to the N-1 th processing layer and corresponding to the selected category is used as a probability parameter, and the probability map corresponding to the N-2 th processing layer and corresponding to the selected category is obtained according to the following formula 5 according to the probability parameter and the transition probability.

And (4) according to the above process, obtaining a probability map corresponding to the 1 st processing layer, and taking the probability map as a middle peak response map corresponding to the selected category.

For example, for the ith processing layer, the probability map calculation formula is as follows:

(formula 5)

Wherein P is: (

) A probability map for the selected category corresponding to the ith processing layer is shown,

represents the transition probability corresponding to the ith processing layer, P: (

) A probability parameter for the selected category corresponding to the ith processing layer is represented,

indicates the input feature parameters corresponding to the ith processing layer for the selected category, i and j indicate the position coordinates of each element in the input feature parameters (e.g., input feature map),

an output feature map representing the i-th processing layer, p and q representing position coordinates of respective elements in the output feature map, k representing a ratio of sizes of the input feature map and the output feature map of the i-th processing layer, H and W representing

The height dimension and the width dimension of (a).

For example, for the ith processing layer, the transition probability calculation formula is as follows:

(formula 6)

Wherein the content of the first and second substances,

expressing the normalization factor to ensure

，

Representing the loss function versus the input characteristic parameter during forward propagation

Partial derivatives of elements with (i, j) as the coordinates of the middle position, i.e. input characteristic parameters

A gradient map of an element with a middle position coordinate of (i, j),

and the method is used for discarding the negative relation in the probability back propagation process.

For example, acquiring the category of the target region in the first training image may include: based on the at least one class response map, a class of the target region in the first training image is determined.

For example, a category score is calculated separately for each category to be learned, and the calculation formula is as follows:

(formula 7)

Wherein Score represents the category Score of the c-th category,

and

the non-correlation coefficient of the representative category,

a class response graph corresponding to the c-th class is shown,

a contour line indicating a response region calculated from the morphological gradient, S a background mask image for the contour line obtained from a deviation between the class response map and the contour line, Q a background mask image for the response region obtained from a deviation between the class response map and the response region, and "×" represents a convolution operation.

For relevant content on category scores, reference may be made to the following papers: j.r.r. Uijlings, k.e.a. van de Sande, t. Gevers, and a.w.m. smeulders. Selective search for object registration. International Journal of Computer Vision (IJCV); J. Pont-Tuset, p. Arbelaez, j.t. Barron, f. Marques, and j. malik. Multiscale combined grouping for image segmentation and object pro-posal generation, IEEE trans. Pattern anal. inner, 39(1): 128-ion 140, 2017.2, 3, 5; K. manitis, j. Pont-Tuset, p.a. Arbelaez, and l.v. Gool. resonant oriented buildings From image segmentation to high-level tasks, CoRR, abs/1701.04658, 2017.3, 5. And will not be described in detail herein.

And calculating according to formula 7 for each category, taking the category with the highest category score as the category of the target region in the first training image, and taking the intermediate peak response map corresponding to the category of the target region in the first training image as the peak response map corresponding to the first training image.

For example, in other embodiments, step S203 may include obtaining a target class of the target region in the first training image; performing probability back propagation processing on the classification model based on the confidence score corresponding to the target class and the class response graph corresponding to the target class, and determining a middle peak value response graph corresponding to the target class; and taking the intermediate peak response graph corresponding to the target class as a peak response graph corresponding to the first training image.

The specific implementation procedure of the probability back propagation processing in this embodiment is as described above, and is not described here again.

For example, the process of probability back propagation may be performed on all classes first, and then the intermediate peak response map corresponding to the class of the first training image is used as the peak response map corresponding to the first training image, or the class of the first training image may be determined first, probability back propagation is performed only on the class, and the obtained intermediate peak response map is used as the peak response map corresponding to the first training image, which is not limited in this disclosure.

According to the above process, the peak response graph provided by the present disclosure as the auxiliary information of the segmented network explicitly considers the receptive field and can extract visual cues perceived by instances from a specific spatial position, i.e. class peak responses, and further generates detailed and instance perception representations in the probability back propagation process, so that the spatial information that determines the most relevant class can be extracted from the classification label without position information, and the spatial information can be used as the auxiliary information of the region segmentation process, so as to realize the zero sample learning of the segmented network by using the information of the classified network; in addition, the peak response graph provided by the disclosure can encode effective information in a training image into the form of an image for zero sample learning of a segmentation model.

For example, the structure of the split network may adopt a U-Net (U-type network) structure.

U-Net is a convolutional neural network that better segments biomedical images. U-Net can be understood as the fusion of encoder and decoder. For example, the encoder part is used for realizing the feature extraction of the input image through a plurality of convolution layers, activation functions, pooling layers and the like to obtain a plurality of feature maps; the decoder part utilizes a plurality of transposed convolutional layers and the like to carry out up-sampling processing on a plurality of characteristic graphs, and carries out channel splicing processing on processing results of some convolutional layers of the encoder part and some convolution processing results after up-sampling processing to obtain a segmented image corresponding to the input image, so that the image segmentation of the input image is realized.

In the analysis process of the medical image, different from the conventional image, the analysis time and the processing time can be longer, the medical image needs to be analyzed in the shortest time as possible, and immediate feedback is provided for a doctor, so that the processing speed of the medical image is required to be higher.

For example, the residual network can solve the problems of gradient dispersion, explosion and network degradation in the deep convolution, the residual connection can make information propagation smoother, and the same convolution layer can be converged more quickly. Because the number of convolutional layers in the U-Net structure is large, fewer training parameters can be used when the convolutional layers are completely replaced by residual modules under the condition of obtaining the same result, so that the model becomes a lightweight model with higher speed, and the use requirement of the medical image is better met.

For example, the segmentation network includes at least one encoder module and at least one decoder module, the at least one encoder module is configured to perform a feature extraction process on the first training image to obtain a plurality of feature maps, the at least one decoder module is configured to perform a resizing process and a channel connection process on the plurality of feature maps to obtain a segmentation result, and the at least one encoder module and the at least one decoder module perform a convolution process using the residual module.

The conventional convolution structures in the encoder module and the decoder module are changed into residual structures, and compared with a U-Net model, the residual structure uses fewer training parameters, so that the model is changed into a lightweight model with higher speed, and the residual structure is more in line with the use requirement of medical images.

For example, fig. 4 illustrates a schematic diagram of a split network provided by at least one embodiment of the present disclosure.

As shown in fig. 4, the split network consists of two encoder modules (encoder module 1 and encoder module 2) and two decoder modules (decoder module 1 and decoder module 2). As shown in fig. 4, the encoder module 1 includes 2 Residual modules (Residual Block), 1 Strided Convolutional layer (Strided Convolutional layer), and 1 pooling layer (Pool layer), and the encoder module 2 includes 2 Residual modules and 1 Strided Convolutional layer; the decoder module 1 and the decoder module 2 each include 2 residual modules and 2 transposed Convolutional layers (transposed layers), and channel-connect feature maps output by different residual modules.

For example, the encoder module learns to extract all necessary information from the input image and transfer the information to the decoder module, and the transposed convolutional layer is also added to improve the coarse-level features into the fine-level features layer by using spatial cross-layer attention when the encoder module and the decoder module are connected, that is, the transposed convolutional layer can introduce an attention mechanism and add higher weight to the important features of the encoder module, so as to help the segmentation network pay more attention to the important features. In addition, because a residual error structure is used, the segmentation network can be helped to generate more semantic information and meaningful information, and the segmentation result can be generated quickly and accurately.

It should be noted that the present disclosure does not limit the structure of the split network, and other split network architectures may be adopted in different embodiments, and all the network architectures may be replaced to perform feature extraction and region splitting.

For example, before the step S30 is executed, the training method of the image processing network may further include: acquiring at least one second training image and at least one region segmentation label corresponding to the at least one second training image one to one; and training the segmentation network to be trained based on at least one second training image and at least one region segmentation label to obtain the segmentation network.

For example, the second training image is from a visible data set, i.e., the labeling information of the training image includes region segmentation labels and classification labels. For example, the segmentation network to be trained is trained first using the existing training image from the visible data set, so that the segmentation network has a priori knowledge, so that region segmentation of the unseen training image can be realized later according to the auxiliary information.

For example, the second training image and the training image used for training the classification model are the same image, i.e. both training images in the visible data set.

For example, the process of training the segmented network based on the visible data set may employ a conventional fully supervised training approach.

For example, step S30 may include: processing the first training image by using a segmentation network to obtain a training segmentation image; calculating the loss value of the segmentation network through a loss function corresponding to the segmentation network on the basis of the training segmentation image by using the position information as auxiliary information; modifying the parameters of the segmented network based on the loss values; and when the loss value corresponding to the segmentation network does not meet the preset accuracy condition, continuously inputting the first training image to repeatedly execute the training process.

For example, the process of training the segmentation network by using the position information as the auxiliary information and combining the invisible data set can be understood as a normal fully supervised training mode, specifically, the loss value of the segmentation network is calculated by combining the training segmentation image output by the segmentation network and the label converted by the peak response map based on the loss function in the training process, then, the parameter of the segmentation network is adjusted according to the loss value, when the loss value corresponding to the segmentation network does not meet the predetermined accuracy condition, the first training image is continuously input to repeatedly execute the training process, and when the loss value corresponding to the segmentation network meets the predetermined accuracy condition, the trained segmentation network is obtained.

Fig. 5 is a schematic structural diagram of a training network according to at least one embodiment of the present disclosure.

As shown in fig. 5, the training network includes a segmentation network and a classification network.

For example, a classification network includes a classification model and a plurality of peak stimulus layers, the classification network having been trained previously through visible and invisible data sets.

For example, the segmentation network is a segmentation network used for obtaining a segmentation image in the image processing network as described above, and the segmentation network has been subjected to preliminary region segmentation training by the visible data set in advance.

For example, the classification model firstly performs classification processing on the input training image to obtain class response maps respectively corresponding to a plurality of classes, and the specific process is as described in step S201, which is not described herein again.

Then, the peak stimulation layer processes the multiple category response maps to obtain multiple confidence scores, and the specific process is as described in step S202, which is not described herein again.

Then, probability back propagation processing is performed based on the plurality of confidence scores and the plurality of category response maps to obtain a peak response map, and the specific process is as described in step S203, which is not described herein again.

For example, the segmentation network processes an input training image to obtain a training segmentation image corresponding to the training image, the segmentation network uses a peak response map as auxiliary information, that is, uses position information in the peak response map as an area segmentation label, combines the training segmentation image, calculates a loss value according to a loss function corresponding to the segmentation network, adjusts parameters of the segmentation network according to the loss value, and finally obtains the trained segmentation network when the loss value corresponding to the segmentation network meets a predetermined accuracy condition.

In the training method of the image processing network provided in at least one embodiment of the present disclosure, a zero sample learning method is used to implement a transfer of cross-modal or class knowledge by using a peak response graph as auxiliary information, that is, to implement region segmentation on an unseen training image, so as to effectively solve the problem of insufficient annotation information encountered in a deep learning model during clinical application, provide a precise and fast medical segmentation image for a doctor, for example, an endoscopic segmentation image, and simultaneously reduce the cost of data collection and annotation, thereby greatly improving the generalization property and clinical usability of the model.

For example, at least one embodiment of the present disclosure further provides an image processing method. Fig. 6 is a schematic flowchart of an image processing method according to at least one embodiment of the present disclosure.

As shown in fig. 6, the image processing method includes steps S40-S50.

In step S40, an input image is acquired.

For example, the input image includes a target region, e.g., the input image is a medical image and the target region is a lesion region.

For example, regarding the relevant properties and the obtaining manner of the input image, reference may be made to the relevant content of the foregoing step S10, which is not described herein again.

In step S50, the input image is subjected to region segmentation processing using an image processing network to obtain a segmented image corresponding to the input image.

For example, the divided image has a divided region corresponding to the target region in the input image.

For example, the image processing network is at least partially trained according to the training method of the image processing network according to at least one embodiment of the present disclosure.

The training method for the image processing network is as described above, and will not be described herein.

For example, after the image processing network to be trained is trained according to the above-mentioned manner, the trained image processing network can be obtained to perform region segmentation processing on the input image, the image processing network does not need to provide auxiliary information by a classification network, the image processing network obtained by the training method can perform region segmentation processing independently, the target region capable of performing region segmentation is not limited to the limited type in the visible data set, and the generalization property and clinical usability of the model are improved.

At least one embodiment of the present disclosure further provides a training apparatus of an image processing network, and fig. 7A is a schematic block diagram of the training apparatus provided in at least one embodiment of the present disclosure.

As shown in fig. 7A, the training apparatus 100 of the image processing network may include an acquisition unit 101, a processing unit 102, and a training unit 103. These components are interconnected by a bus system and/or other form of connection mechanism (not shown). It should be noted that the components and configuration of the exercise device 100 shown in FIG. 7A are exemplary only, and not limiting, and that the exercise device 100 may have other components and configurations as desired.

For example, the obtaining unit 101 is configured to obtain a first training image, wherein the first training image comprises a target region to be segmented.

For example, the processing unit 102 is configured to process the first training image by using the classification network, and obtain a peak response map corresponding to the first training image, where the peak response map includes location information corresponding to the target area.

For example, the training unit 103 is configured to perform region segmentation training on the segmentation network using the first training image, using the position information as auxiliary information.

For example, the training unit 103 includes a training network 104, the training network includes a classification network, an image processing network for obtaining a segmentation image, a loss function, and the like (all not shown), and the training unit 103 is configured to train the image processing network to be trained to obtain a trained image processing network.

For example, the training network 104 may refer to the training network shown in FIG. 5.

It should be noted that the image processing network in the training unit 103 is the same as the image processing network in the embodiment of the training method of the image processing network, and the structure and the function of the image processing network are not described herein again.

It should be noted that the specific process of training the image processing network by using the training unit 103 may refer to the related description in the embodiment of the training method for the image processing network, and repeated details are not repeated.

At least one embodiment of the present disclosure further provides an image processing apparatus, and fig. 7B is a schematic block diagram of an image processing apparatus provided in at least one embodiment of the present disclosure.

As shown in fig. 7B, the image processing apparatus 200 may include an image input unit 201, a segmentation processing unit 202. These components are interconnected by a bus system and/or other form of connection mechanism (not shown). It should be noted that the components and structure of the image processing apparatus 200 shown in fig. 7B are only exemplary and not limiting, and the image processing apparatus 200 may have other components and structures as necessary.

For example, the image input unit 201 is configured to acquire an input image, wherein the input image includes a target region.

For example, the segmentation processing unit 202 is configured to perform region segmentation processing on the input image using an image processing network to obtain a segmented image corresponding to the input image.

For example, at least part of the image processing network is obtained by training according to the training method described in any embodiment of the present disclosure, and the specific process of training and the like may refer to the related description in the embodiment of the training method of the image processing network, and repeated parts are not described again.

Fig. 7C is a schematic processing procedure diagram of an image processing apparatus according to at least one embodiment of the disclosure.

As shown in fig. 7C, the left input image is a medical image obtained by endoscopy, and the input image is processed by the image processing apparatus provided in at least one embodiment of the present disclosure to obtain a segmented image shown on the right side of fig. 7C, where the segmented region is a white region in the segmented image.

Some embodiments of the present disclosure also provide an electronic device. Fig. 8 is a schematic block diagram of an electronic device according to at least one embodiment of the present disclosure.

For example, as shown in fig. 8, an electronic device 800 includes a processor 801 and a memory 802. It should be noted that the components of the electronic device 800 shown in fig. 8 are exemplary only, and not limiting, and the electronic device 800 may have other components according to the actual application.

For example, the processor 801 and the memory 802 may be in direct or indirect communication with each other.

For example, the processor 801 and the memory 802 may communicate over a network. The network may include a wireless network, a wired network, and/or any combination of wireless and wired networks. The processor 801 and the memory 802 may also communicate with each other via a system bus, which is not limited by the present disclosure.

For example, in some embodiments, memory 802 is used to store computer-readable instructions non-transiently. The processor 801 is configured to execute computer readable instructions, and when the computer readable instructions are executed by the processor 801, the training method of the image processing network according to any of the above embodiments or the image processing method according to any of the above embodiments is implemented. For specific implementation and related explanation of each step of the training method for the image processing network, reference may be made to the above-mentioned embodiment of the training method for the image processing network, and for specific implementation and related explanation of each step of the image processing method, reference may be made to the above-mentioned embodiment of the image processing method, and repeated points are not described herein again.

For example, the processor 801 and the memory 802 may be disposed on a server side (or cloud side).

For example, the processor 801 may control other components in the electronic device 800 to perform desired functions. The processor 801 may be a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a Network Processor (NP), or the like; but may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components. The Central Processing Unit (CPU) may be an X86 or ARM architecture, etc.

For example, memory 802 may include any combination of one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. Volatile memory can include, for example, Random Access Memory (RAM), cache memory (or the like). The non-volatile memory may include, for example, Read Only Memory (ROM), a hard disk, an Erasable Programmable Read Only Memory (EPROM), a portable compact disc read only memory (CD-ROM), USB memory, flash memory, and the like. One or more computer readable instructions may be stored on the computer readable storage medium and executed by the processor 801 to implement various functions of the electronic device 800. Various application programs and various data and the like can also be stored in the storage medium.

For example, in some embodiments, the electronic device 800 may be a cell phone, a tablet computer, electronic paper, a television, a display, a laptop computer, a digital photo frame, a navigator, a wearable electronic device, a smart home device, and so on.

For example, the electronic device 800 may include a display panel, which may be used to segment images, and the like. For example, the display panel may be a rectangular panel, a circular panel, an oval panel, a polygonal panel, or the like. In addition, the display panel can be not only a plane panel, but also a curved panel, even a spherical panel.

For example, the electronic device 800 may have a touch function, i.e., the electronic device 800 may be a touch device.

For example, for a detailed description of a process of the electronic device 800 executing the training method of the image processing network, reference may be made to the related description in the embodiment of the training method of the image processing network, and for a detailed description of a process of the electronic device 800 executing the image processing method, reference may be made to the related description in the embodiment of the image processing method, and repeated details are not repeated.

Fig. 9 is a schematic diagram of a non-transitory computer-readable storage medium according to at least one embodiment of the disclosure. For example, as shown in FIG. 9, one or more computer readable instructions 901 may be stored non-temporarily on a storage medium 900. For example, the computer readable instructions 901 may perform one or more steps of a training method according to the image processing network described above when executed by the processor, or the computer readable instructions 901 may perform one or more steps of an image processing method according to the image processing method described above when executed by the processor.

For example, the storage medium 900 may be applied to the electronic device 800 described above. For example, the storage medium 900 may include the memory 802 in the electronic device 800.

For example, the description of the storage medium 900 may refer to the description of the memory 802 in the embodiment of the electronic device 800, and repeated descriptions are omitted here.

Referring now to FIG. 10, FIG. 10 illustrates a schematic diagram of an electronic device 300 suitable for use in implementing embodiments of the present disclosure. The electronic devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., car navigation terminals), wearable electronic devices, and the like, and fixed terminals such as digital TVs, desktop computers, smart home devices, and the like, and fixed terminals such as digital TVs, desktop computers, and the like. The electronic device shown in fig. 10 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.

As shown in fig. 10, the electronic device 300 may comprise a processing means (e.g. a central processing unit, a graphics processor, etc.) 301 which may perform various suitable actions and processes according to a program stored in a Read Only Memory (ROM) 302 or a program loaded from a storage means 306 into a Random Access Memory (RAM) 303 to perform one or more steps of a training method or an image processing method according to an image processing network as described above. In the RAM 303, various programs and data necessary for the operation of the electronic apparatus 300 are also stored. The processing device 301, the ROM 302, and the RAM 303 are connected to each other via a bus 304. An input/output (I/O) interface 305 is also connected to bus 304.

Generally, the following devices may be connected to the I/O interface 305: input devices 306 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 307 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage devices 306 including, for example, magnetic tape, hard disk, etc.; and a communication device 309. The communication means 309 may allow the electronic device 300 to communicate wirelessly or by wire with other devices to exchange data. While fig. 7 illustrates an electronic device 300 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.

In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated by the flow chart to perform one or more steps of the method of the image processing method or the training method of the image processing network according to the above. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 309, or installed from the storage means 306, or installed from the ROM 302. The computer program, when executed by the processing device 301, performs the above-described functions defined in the methods of the embodiments of the present disclosure.

It should be noted that in the context of this disclosure, a computer-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable medium may be a computer readable signal medium or a computer readable storage medium or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to: an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.

In some embodiments, the clients, servers may communicate using any currently known or future developed network Protocol, such as HTTP (HyperText Transfer Protocol), and may interconnect with any form or medium of digital data communication (e.g., a communications network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), the Internet (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.

The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.

Computer program code for carrying out operations for the present disclosure may be written in any combination of one or more programming languages, including but not limited to an object oriented programming language such as Java, Smalltalk, C + +, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units described in the embodiments of the present disclosure may be implemented by software or hardware. Where the name of an element does not in some cases constitute a limitation on the element itself.

The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.

According to one or more embodiments of the present disclosure, a training method of an image processing network, wherein the image processing network includes a segmentation network for obtaining a segmented image, the training method includes: acquiring a first training image, wherein the first training image comprises a target area to be segmented; processing the first training image by using a classification network to obtain a peak response image corresponding to the first training image, wherein the peak response image comprises position information corresponding to the target area; and performing region segmentation training on the segmentation network by using the first training image by using the position information as auxiliary information.

According to one or more embodiments of the present disclosure, the first training image does not have a region segmentation label corresponding to a target region to be segmented, and the segmentation network does not perform any region segmentation training on the target region to be segmented before performing the region segmentation training on the segmentation network.

According to one or more embodiments of the present disclosure, the classification network includes a classification model and at least one peak stimulation layer, the classification model is configured to identify at least one class of target regions, and the first training image is processed by the classification network to obtain a peak response map corresponding to the first training image, including: classifying the first training image by using the classification model to obtain at least one class response map corresponding to the at least one class one by one, wherein each class response map has a response region corresponding to the class corresponding to each class response map; processing the at least one category response map with the at least one peak stimulation layer, respectively, to obtain at least one confidence score; and performing probability back propagation processing on the classification model based on the at least one confidence score and the at least one category response map to obtain a peak response map corresponding to the first training image.

According to one or more embodiments of the present disclosure, the processing the at least one class response map with the at least one peak stimulation layer to obtain at least one confidence score, respectively, includes: for each of the at least one category response graph: determining a response region in the category response map; and calculating the average value of the numerical values of all the elements in the response area, and taking the average value as the confidence score corresponding to the category response graph.

According to one or more embodiments of the present disclosure, the at least one peak stimulation layer is a convolutional layer, and calculating an average of values of all elements in the response region includes: performing convolution processing on the class response graph by using a convolution kernel corresponding to the class response graph to obtain the average value, wherein a formula of the convolution kernel is as follows:

wherein the content of the first and second substances,

a convolution kernel representing the class response map, c represents the class corresponding to the class response map, and (c)

,

) Coordinate values representing the kth element in the response region of the class response map,

representing a total number of elements in a response region of the category response graph,

the representation of the function of a dirac,

indicating an addition process.

According to one or more embodiments of the present disclosure, performing probability back propagation processing on the classification model based on the at least one confidence score and the at least one category response map to obtain a peak response map corresponding to the first training image includes: performing probability back propagation processing on the classification model based on the at least one confidence score and the at least one category response map, and determining at least one intermediate peak response map corresponding to the at least one category in a one-to-one mode; acquiring the category of a target area in the first training image; and taking the intermediate peak response graph corresponding to the category of the target area in the first training image as the peak response graph corresponding to the first training image.

According to one or more embodiments of the present disclosure, performing a probability back propagation process on the classification model based on the at least one confidence score and the at least one class response map, and determining at least one intermediate peak response map corresponding to the at least one class in a one-to-one manner, includes: determining N processing layers included by the classification model, wherein the N processing layers are sequentially from a 1 st processing layer to an Nth processing layer along a forward propagation direction, the forward propagation direction is a direction from input to output of the classification model, and N is a positive integer greater than 1; for each selected category in the at least one category, sequentially obtaining a probability map for the selected category corresponding to each processing layer along a backward propagation direction based on the confidence score corresponding to the selected category and the category response map corresponding to the selected category, and taking the probability map for the selected category corresponding to the 1 st processing layer as an intermediate peak response map corresponding to the selected category, wherein the backward propagation direction is opposite to the forward propagation direction.

According to one or more embodiments of the present disclosure, sequentially obtaining a probability map for the selected category corresponding to each processing layer along a back propagation direction based on the confidence score corresponding to the selected category and the category response map corresponding to the selected category includes: for an ith processing layer of the N processing layers: acquiring a transition probability corresponding to the ith processing layer, wherein the transition probability is used for indicating the correlation of each element in the input feature map and the output feature map of the ith processing layer; obtaining a probability map corresponding to the ith processing layer based on the transition probability corresponding to the ith processing layer and the probability parameter corresponding to the ith processing layer, wherein when i is equal to N, the probability parameter corresponding to the ith processing layer is a category response map corresponding to the selected category, and the transition probability corresponding to the ith processing layer is obtained, including: and acquiring the switching probability corresponding to the Nth processing layer based on the confidence score corresponding to the selected category, wherein when i is greater than or equal to 1 and less than N, the probability parameter corresponding to the ith processing layer is a probability map corresponding to the (i + 1) th processing layer, and i is a positive integer.

According to one or more embodiments of the present disclosure, obtaining the transition probability corresponding to the ith processing layer includes: when i is equal to N, determining the input characteristic parameter corresponding to the Nth processing layer based on the confidence score corresponding to the selected category, and when i is greater than or equal to 1 and less than N, taking the input characteristic graph of the ith processing layer as the input characteristic parameter of the ith processing layer; determining a gradient map corresponding to the i-th processing layer in the forward propagation direction based on the input feature parameters and a loss function of the classification model; and obtaining the switching probability corresponding to the ith processing layer based on the gradient map corresponding to the ith processing layer.

According to one or more embodiments of the present disclosure, determining the input feature parameter corresponding to the nth processing layer based on the confidence score corresponding to the selected category includes: determining the input characteristic parameters corresponding to the Nth processing layer by using the following formula:

wherein the content of the first and second substances,

representing the input characteristic parameters for the selected category corresponding to the Nth processing layer,

representing a total number of elements in a response region of a class response map corresponding to the selected class, L representing a loss function of the classification model,

representing a confidence score corresponding to the selected category,

representing the convolution kernel in the peak stimulation layer corresponding to the selected class.

According to one or more embodiments of the present disclosure, performing probability back propagation processing on the classification model based on the at least one confidence score and the at least one category response map to obtain a peak response map corresponding to the first training image includes: acquiring a target type of a target area in the first training image; performing probability back propagation processing on the classification model based on the confidence score corresponding to the target class and the class response graph corresponding to the target class, and determining an intermediate peak value response graph corresponding to the target class; and taking the intermediate peak response graph corresponding to the target class as the peak response graph corresponding to the first training image.

According to one or more embodiments of the present disclosure, performing region segmentation training on the segmentation network using the first training image with the position information as auxiliary information includes: processing the first training image by using the segmentation network to obtain a training segmentation image; calculating a loss value of the segmentation network through a loss function corresponding to the segmentation network based on the training segmentation image by using the position information as auxiliary information; and modifying parameters of the segmented network based on the loss values; and when the loss value corresponding to the segmentation network does not meet the preset accuracy rate condition, continuously inputting the first training image to repeatedly execute the training process.

According to one or more embodiments of the present disclosure, before performing region segmentation training on the segmentation network using the first training image using the position information as auxiliary information, the training method further includes: acquiring at least one second training image and at least one region segmentation label corresponding to the at least one second training image one to one; and training the segmentation network to be trained based on the at least one second training image and the at least one region segmentation label to obtain the segmentation network.

According to one or more embodiments of the present disclosure, the segmentation network includes at least one encoder module and at least one decoder module, the at least one encoder module is configured to perform a feature extraction process on the first training image to obtain a plurality of feature maps, the at least one decoder module is configured to perform a resizing process and a channel join process on the plurality of feature maps to obtain a segmentation result, and the at least one encoder module and the at least one decoder module perform a convolution process using a residual module.

According to one or more embodiments of the present disclosure, the classification model is a full convolution network.

According to one or more embodiments of the present disclosure, the first training image is a medical image.

According to one or more embodiments of the present disclosure, an image processing method includes: acquiring an input image, wherein the input image comprises a target area; performing region segmentation processing on the input image by using an image processing network to obtain a segmented image corresponding to the input image, wherein the segmented image has a segmented region corresponding to the target region in the input image, and the image processing network is at least partially trained according to the training method of any embodiment of the disclosure.

According to one or more embodiments of the present disclosure, a training apparatus of an image processing network, wherein the image processing network includes a segmentation network for obtaining a segmented image, the training apparatus includes: an acquisition unit configured to acquire a first training image, wherein the first training image includes a target region to be segmented; a processing unit, configured to process the first training image by using the classification network to obtain a peak response map corresponding to the first training image, where the peak response map includes location information corresponding to the target area; and the training unit is configured to perform region segmentation training on the segmentation network by using the first training image by using the position information as auxiliary information.

According to one or more embodiments of the present disclosure, an image processing apparatus includes: an image input unit configured to acquire an input image, wherein the input image includes a target region; a segmentation processing unit configured to perform a region segmentation process on the input image by using an image processing network to obtain a segmented image corresponding to the input image, wherein the segmented image has a segmented region corresponding to the target region in the input image, and the image processing network is at least partially trained according to the training method of any embodiment of the present disclosure.

According to one or more embodiments of the present disclosure, an electronic device includes: a memory non-transiently storing computer executable instructions; a processor configured to execute the computer-executable instructions, wherein the computer-executable instructions, when executed by the processor, implement the training method of the image processing network according to any embodiment of the present disclosure or the image processing method according to any embodiment of the present disclosure.

According to one or more embodiments of the present disclosure, a non-transitory computer-readable storage medium stores computer-executable instructions that, when executed by a processor, implement a training method of an image processing network according to any one of the embodiments of the present disclosure or an image processing method according to any one of the embodiments of the present disclosure.

The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure herein is not limited to the particular combination of features described above, but also encompasses other embodiments in which any combination of the features described above or their equivalents does not depart from the spirit of the disclosure. For example, the above features and (but not limited to) the features disclosed in this disclosure having similar functions are replaced with each other to form the technical solution.

Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

For the present disclosure, there are also the following points to be explained:

(1) the drawings of the embodiments of the disclosure only relate to the structures related to the embodiments of the disclosure, and other structures can refer to the common design.

(2) Thicknesses and dimensions of layers or structures may be exaggerated in the drawings used to describe embodiments of the present invention for clarity. It will be understood that when an element such as a layer, film, region, or substrate is referred to as being "on" or "under" another element, it can be "directly on" or "under" the other element or intervening elements may be present.

(3) Without conflict, embodiments of the present disclosure and features of the embodiments may be combined with each other to arrive at new embodiments.

The above description is only for the specific embodiments of the present disclosure, but the scope of the present disclosure is not limited thereto, and the scope of the present disclosure should be subject to the scope of the claims.

Claims

1. A training method for an image processing network, wherein the image processing network comprises a segmentation network for obtaining segmented images, the training method comprising:

acquiring a first training image, wherein the first training image comprises a target area to be segmented;

processing the first training image by using a classification network to obtain a peak response image corresponding to the first training image, wherein the peak response image comprises position information corresponding to the target area;

performing region segmentation training on the segmentation network by using the first training image with the position information as auxiliary information,

wherein the classification network comprises a classification model configured to identify at least one class of target regions and at least one peak stimulation layer,

processing the first training image by using the classification network to obtain a peak response graph corresponding to the first training image, wherein the peak response graph comprises:

classifying the first training image by using the classification model to obtain at least one class response map corresponding to the at least one class one by one, wherein each class response map has a response region corresponding to the class corresponding to each class response map;

processing the at least one category response map with the at least one peak stimulation layer, respectively, to obtain at least one confidence score;

and performing probability back propagation processing on the classification model based on the at least one confidence score and the at least one category response map to obtain a peak response map corresponding to the first training image.

2. The training method of claim 1, wherein the first training image has no region segmentation labels corresponding to a target region to be segmented, and the segmentation network has not undergone any region segmentation training on the target region to be segmented before the region segmentation training on the segmentation network.

3. Training method according to claim 1, wherein the processing of the at least one class response map with the at least one peak stimulation layer, respectively, to derive at least one confidence score comprises:

for each of the at least one category response graph:

determining a response region in the category response map;

and calculating the average value of the numerical values of all the elements in the response area, and taking the average value as the confidence score corresponding to the category response graph.

4. Training method according to claim 3, wherein the at least one peak stimulation layer is a convolutional layer,

calculating an average of the values of all elements in the response region, comprising:

performing convolution processing on the class response graph by using a convolution core corresponding to the class response graph to obtain the average value,

wherein the formula of the convolution kernel is expressed as:

wherein the content of the first and second substances,

,

the representation of the function of a dirac,

indicating an addition process.

5. The training method of claim 1, wherein performing a probability back-propagation process on the classification model based on the at least one confidence score and the at least one class response map to obtain a peak response map corresponding to the first training image comprises:

performing probability back propagation processing on the classification model based on the at least one confidence score and the at least one category response map, and determining at least one intermediate peak response map corresponding to the at least one category in a one-to-one mode;

acquiring the category of a target area in the first training image;

and taking the intermediate peak response graph corresponding to the category of the target area in the first training image as the peak response graph corresponding to the first training image.

6. The training method of claim 5, wherein performing a probabilistic back-propagation process on the classification model based on the at least one confidence score and the at least one class response map to determine at least one intermediate peak response map corresponding to the at least one class in a one-to-one manner comprises:

determining N processing layers included by the classification model, wherein the N processing layers are sequentially from a 1 st processing layer to an Nth processing layer along a forward propagation direction, the forward propagation direction is a direction from input to output of the classification model, and N is a positive integer greater than 1;

for each selected category in the at least one category, sequentially obtaining a probability map for the selected category corresponding to each processing layer along a backward propagation direction based on the confidence score corresponding to the selected category and the category response map corresponding to the selected category, and taking the probability map for the selected category corresponding to the 1 st processing layer as an intermediate peak response map corresponding to the selected category, wherein the backward propagation direction is opposite to the forward propagation direction.

7. The training method of claim 6, wherein obtaining the probability map for the selected category corresponding to each processing layer in turn along a back propagation direction based on the confidence score corresponding to the selected category and the category response map corresponding to the selected category comprises:

for an ith processing layer of the N processing layers:

acquiring a transition probability corresponding to the ith processing layer, wherein the transition probability is used for indicating the correlation of each element in the input feature map and the output feature map of the ith processing layer;

obtaining a probability map corresponding to the ith processing layer based on the switching probability corresponding to the ith processing layer and the probability parameter corresponding to the ith processing layer,

when i is equal to N, the probability parameter corresponding to the ith processing layer is the category response map corresponding to the selected category, and the transition probability corresponding to the ith processing layer is obtained, including:

acquiring the switching probability corresponding to the Nth processing layer based on the confidence score corresponding to the selected category,

and when i is greater than or equal to 1 and less than N, the probability parameter corresponding to the ith processing layer is a probability map corresponding to the (i + 1) th processing layer, and i is a positive integer.

8. The training method of claim 7, wherein obtaining the transition probability corresponding to the i-th processing layer comprises:

when i is equal to N, determining the input characteristic parameter corresponding to the Nth processing layer based on the confidence score corresponding to the selected category, and when i is greater than or equal to 1 and less than N, taking the input characteristic graph of the ith processing layer as the input characteristic parameter of the ith processing layer;

determining a gradient map corresponding to the i-th processing layer in the forward propagation direction based on the input feature parameters and a loss function of the classification model;

and obtaining the switching probability corresponding to the ith processing layer based on the gradient map corresponding to the ith processing layer.

9. The training method of claim 8, wherein determining the input feature parameters corresponding to the nth processing layer based on the confidence scores corresponding to the selected categories comprises:

determining the input characteristic parameters corresponding to the Nth processing layer by using the following formula:

wherein c represents the selected category,

representing a confidence score corresponding to the selected category,

10. The training method of claim 1, wherein performing a probability back-propagation process on the classification model based on the at least one confidence score and the at least one class response map to obtain a peak response map corresponding to the first training image comprises:

acquiring a target type of a target area in the first training image;

performing probability back propagation processing on the classification model based on the confidence score corresponding to the target class and the class response graph corresponding to the target class, and determining an intermediate peak value response graph corresponding to the target class;

and taking the intermediate peak response graph corresponding to the target class as the peak response graph corresponding to the first training image.

11. The training method according to claim 1 or 2, wherein performing region segmentation training on the segmentation network using the first training image using the position information as auxiliary information includes:

processing the first training image by using the segmentation network to obtain a training segmentation image;

calculating a loss value of the segmentation network through a loss function corresponding to the segmentation network based on the training segmentation image by using the position information as auxiliary information; and

modifying parameters of the segmented network based on the loss values;

and when the loss value corresponding to the segmentation network does not meet the preset accuracy rate condition, continuously inputting the first training image to repeatedly execute the training process.

12. The training method according to claim 1, wherein, before performing region segmentation training on the segmentation network using the first training image using the position information as auxiliary information, the training method further comprises:

acquiring at least one second training image and at least one region segmentation label corresponding to the at least one second training image one to one;

and training the segmentation network to be trained based on the at least one second training image and the at least one region segmentation label to obtain the segmentation network.

13. Training method according to claim 1 or 2, wherein said segmentation network comprises at least one encoder module and at least one decoder module,

the at least one encoder module is configured to perform a feature extraction process on the first training image to obtain a plurality of feature maps,

the at least one decoder module is configured to perform a resize process and a channel join process on the plurality of feature maps to obtain a segmentation result,

the at least one encoder module and the at least one decoder module perform convolution processing with a residual module.

14. The training method of claim 1, wherein the classification model is a full convolution network.

15. An image processing method comprising:

acquiring an input image, wherein the input image comprises a target area;

performing a region segmentation process on the input image using an image processing network to obtain a segmented image corresponding to the input image, wherein the segmented image has a segmented region corresponding to the target region in the input image,

wherein the image processing network is at least partly trained according to the training method of any one of claims 1-14.

16. A training apparatus for an image processing network, wherein the image processing network comprises a segmentation network for obtaining a segmented image,

the training apparatus includes:

an acquisition unit configured to acquire a first training image, wherein the first training image includes a target region to be segmented;

a processing unit configured to process the first training image by using a classification network to obtain a peak response map corresponding to the first training image, wherein the peak response map includes position information corresponding to the target area;

a training unit configured to perform region segmentation training on the segmentation network using the first training image using the position information as auxiliary information,

17. An image processing apparatus comprising:

an image input unit configured to acquire an input image, wherein the input image includes a target region;

a segmentation processing unit configured to perform region segmentation processing on the input image using an image processing network to obtain a segmented image corresponding to the input image, wherein the segmented image has a segmented region corresponding to the target region in the input image,

18. An electronic device, comprising:

a memory non-transiently storing computer executable instructions;

a processor configured to execute the computer-executable instructions,

wherein the computer executable instructions, when executed by the processor, implement a training method for an image processing network according to any one of claims 1-14 or an image processing method according to claim 15.

19. A non-transitory computer-readable storage medium, wherein the non-transitory computer-readable storage medium stores computer-executable instructions,

the computer executable instructions, when executed by a processor, implement a method of training of an image processing network according to any one of claims 1 to 14 or a method of image processing according to claim 15.