WO2023165033A1

WO2023165033A1 - Method for training model for recognizing target in medical image, method for recognizing target in medical image, and device and medium

Info

Publication number: WO2023165033A1
Application number: PCT/CN2022/095137
Authority: WO
Inventors: 潘晓春; 王娟; 陈素平; 夏斌
Original assignee: 深圳硅基智能科技有限公司
Priority date: 2022-03-02
Filing date: 2022-05-26
Publication date: 2023-09-07
Also published as: CN114581709A

Abstract

A method for training a model for recognizing a target in a medical image, a method for recognizing a target in a medical image, and a device and a medium. The model training method comprises acquiring a medical image as a training sample, and a marked region corresponding to a target in the training sample (S102); determining a region segmentation result corresponding to the marked region, and constructing a training set by using the training sample and the region segmentation result (S104), wherein the region segmentation result is acquired by performing under-segmentation on image data in the marked region; and on the basis of the training set, training a model to be trained, and optimizing said model by using a training loss function (S106), wherein in the training loss function, the negative effect of pixels of a first region in the training sample on said model is reduced by using a spatial weight, the first region is a region in the training sample other than a target region of the target in the marked region, and the target region is determined by the region segmentation result. Therefore, a small target can be effectively recognized.

Description

Model training, method, device and medium for recognizing objects in medical images

technical field

The present disclosure relates to the field of image processing based on artificial intelligence, and in particular to a model training, method, device and medium for identifying targets in medical images.

Background technique

In recent years, artificial intelligence technology has made great achievements in the field of computer vision. For example, deep learning techniques are increasingly used in semantic segmentation, image classification, and object recognition. Especially in the medical field, segmentation, recognition or classification of targets in medical images is often used to assist target analysis.

At present, the deep learning target recognition technology can obtain high recognition accuracy for large-sized targets, but the recognition performance for small targets (such as thin objects or small objects) is not satisfactory, which is likely to cause false positives and false alarms. , and it is also difficult to distinguish the categories of small objects. For example, in fundus images, small target signs such as spot hemorrhages and microvascular tumors are not easy to find and distinguish when deep learning is used for target recognition because the targets are small, light in color, and close in color. Therefore, how to effectively identify small targets remains to be studied.

Contents of the invention

The present disclosure is proposed in view of the above-mentioned state of the prior art, and its purpose is to provide a model training, method, device and medium for identifying targets in medical images that can effectively identify small targets.

To this end, the first aspect of the present disclosure provides a model training method for identifying a target in a medical image, including: acquiring the medical image as a training sample and a labeled region corresponding to the target in the training sample; determining the The region segmentation result corresponding to the marked region, and using the training sample and the region segmentation result to construct a training set, wherein the region segmentation result is obtained by under-segmenting the image data in the marked region; and based on The training set trains the model to be trained, and optimizes the model to be trained using a training loss function, wherein, in the training loss function, spatial weights are used to reduce the pair of pixels in the first region of the training sample to the Negative impact of the model to be trained, the first region is a region other than the target region of the target in the labeled region in the training sample, and the target region is determined by the region segmentation result. In this case, by under-segmenting the image data in the labeled area in the training sample to identify pixels of undetermined categories in the labeled area, and combining the spatial weights to train the model to be trained to reduce the undetermined category in the labeled area The negative impact of the pixels on the model to be trained can improve the accuracy of the prediction result of the model to be trained on the input image after training. Thus, small targets can be effectively identified.

In addition, in the model training method involved in the first aspect of the present disclosure, optionally, obtaining the region segmentation result further includes: obtaining image data to be segmented based on the image data corresponding to the marked region in the training sample, Or obtain the image data to be segmented based on the image data corresponding to the marked region in the training sample and the image data corresponding to the marked region in the segmentation result of interest, wherein the segmentation result of interest is used to identify the A binary image of the region of interest of the training sample; and performing threshold segmentation on the image data to be segmented by using a target segmentation threshold, and then obtaining the region segmentation result, wherein the region segmentation result is a binary image. In this case, the target region in the image data to be segmented can be identified through threshold segmentation, and when the labeled region includes a region other than the region of interest, noise outside the region of interest can be eliminated.

In addition, in the model training method involved in the first aspect of the present disclosure, optionally, the target segmentation threshold is obtained according to the method for obtaining the threshold value of the label category to which the target belongs, wherein the method for obtaining the threshold value of each label category is determined by The average area and average color of each label category are determined, and the threshold value acquisition method includes the first method and the second method, and the average area of the label category corresponding to the first method is larger than the label corresponding to the second method The average area of the category and the average color of the label category corresponding to the first method is lighter than the average color of the label category corresponding to the second method; for the first method, find the threshold, so that the to-be-segmented The area of the pixel whose grayscale value is greater than the threshold in the image data is less than a preset multiple of the area of the image data to be segmented, and the threshold is used as the target segmentation threshold, wherein the preset multiple is greater than 0 and less than 1. For the second method, if the length of the smallest side of the image data to be segmented is less than a preset length, then take the mean value of the grayscale values of the pixels in the image data to be segmented as the target segmentation threshold , otherwise the target segmentation threshold is determined based on the gray values of the four corner areas and the central area of the image data to be segmented. In this case, the target segmentation threshold can be obtained according to the characteristics of the label category corresponding to the target. Accordingly, the accuracy of threshold segmentation can be improved.

In addition, in the model training method involved in the first aspect of the present disclosure, optionally, before obtaining the region segmentation result: an erosion operation is also performed on the threshold segmentation result of the image data to be segmented to obtain at least one connected region, selecting the connected region whose center is closest to the center of the image data to be segmented from the at least one connected region as the region segmentation result. Thus, an accurate target area can be obtained.

In addition, in the model training method involved in the first aspect of the present disclosure, optionally, among the spatial weights, the pixels in the first region in the training sample are assigned a first weight, wherein the The first weight is 0. In this case, samples of undetermined categories can be ignored to reduce the negative impact of samples of undetermined categories on the model to be trained.

In addition, in the model training method involved in the first aspect of the present disclosure, optionally, the pixels in the first region, the second region, the third region and the fourth region in the training sample are respectively assigned the first A weight, a second weight, a third weight, and a fourth weight, wherein the second area is the target area, the third area is an area in the area of interest that does not belong to the marked area, and the The fourth area is an area outside the area of interest, the first weight is less than the second weight and less than the third weight, the fourth weight is less than the second weight and less than the third weight Weights. In this case, the negative impact of pixels of undetermined categories and pixels outside the region of interest on the model to be trained can be suppressed, and the positive influence of the target region within the target region and the non-target region within the region of interest on the model to be trained can be improved. Thereby, the accuracy of the model can be improved.

In addition, in the model training method according to the first aspect of the present disclosure, optionally, the model to be trained is a semantic segmentation model, and a prediction result of the model to be trained is a semantic segmentation result of the training sample. Thus, small objects can be identified.

In addition, in the model training method according to the first aspect of the present disclosure, optionally, the shape of the labeled region is a rectangle. Thus, the difficulty of labeling can be reduced.

A second aspect of the present disclosure provides an electronic device, which includes: at least one processing circuit configured to execute the steps of the model training method described in the first aspect of the present disclosure.

A third aspect of the present disclosure provides a computer-readable storage medium, the computer-readable storage medium stores at least one instruction, and when the at least one instruction is executed by a processor, the steps of the above-mentioned model training method are implemented.

A fourth aspect of the present disclosure provides a method for identifying a target in a medical image, the method comprising: acquiring the medical image as an input image; and using at least one model training method according to the first aspect of the present disclosure to train training a model, determining a prediction result of each trained model for the input image, and obtaining a target prediction result based on the prediction result of the at least one trained model.

In addition, in the method involved in the fourth aspect of the present disclosure, optionally, the prediction results of each trained model include the probability that each pixel in the input image belongs to the corresponding label category, and the Integrating the prediction results of at least one trained model to obtain an integrated probability that each pixel of the input image belongs to the corresponding label category, determining a connected region based on the integrated probability, and obtaining the target prediction corresponding to each label category based on the connected region As a result, wherein, if there is only one trained model, the probability is used as the integrated probability, otherwise, the average value of the prediction results of multiple trained models is calculated to obtain the probability that each pixel in the input image belongs to the corresponding label category The probability mean is used as the integrated probability. In this case, obtaining the target prediction result based on the integrated probability can further improve the accuracy of the target prediction result.

In addition, in the method related to the fourth aspect of the present disclosure, optionally, the medical image is a fundus image. In this case, the trained model is able to recognize small objects in fundus images.

In addition, in the method according to the fourth aspect of the present disclosure, optionally, the target includes microangioma, spotting, sheet bleeding and linear bleeding. In this case, the trained model is able to recognize small objects in fundus images.

A fifth aspect of the present disclosure provides an electronic device, the electronic device comprising: at least one processing circuit configured to: acquire the medical image as an input image; and utilize the medical image according to the first aspect of the present disclosure The at least one trained model trained by the model training method determines a prediction result of each trained model for the input image, and obtains a target prediction result based on the prediction result of the at least one trained model.

According to the present disclosure, a model training, method, device and medium for identifying targets in medical images capable of effectively identifying small targets are provided.

Description of drawings

The present disclosure will now be explained in further detail by way of example only with reference to the accompanying drawings, in which:

FIG. 1 is a schematic diagram showing an example of a recognition target environment related to an example of the present disclosure.

FIG. 2 is a flowchart illustrating an example of a model training method related to an example of the present disclosure.

FIG. 3 is a schematic diagram illustrating some examples of labeled regions related to examples of the present disclosure.

FIG. 4 is a schematic diagram showing some example region segmentation results related to examples of the present disclosure.

FIG. 5 is a flowchart illustrating an example of acquiring a result of area division related to an example of the present disclosure.

FIG. 6 is an architecture diagram showing an example of a model to be trained using the U-Net architecture involved in an example of the present disclosure.

FIG. 7 is a schematic diagram illustrating several areas of some examples to which examples of the present disclosure relate.

FIG. 8 is a flowchart illustrating an example of a method of recognizing an object in an image according to an example of the present disclosure.

Detailed ways

Hereinafter, preferred embodiments of the present disclosure will be described in detail with reference to the drawings. In the following description, the same reference numerals are given to the same components, and repeated descriptions are omitted. In addition, the drawings are only schematic diagrams, and the ratio of dimensions between components, the shape of components, and the like may be different from the actual ones. It should be noted that the terms "comprising" and "having" and any variations thereof in the present disclosure, such as a process, method, system, product or device including or having a series of steps or units, are not necessarily limited to the clearly listed instead, may include or have other steps or elements not explicitly listed or inherent to the process, method, product or apparatus. All methods described in this disclosure can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context.

The term "circuitry" herein may refer to hardware circuits and/or a combination of hardware circuits and software. The term "model" in this disclosure is capable of processing an input and providing a corresponding output. In this article, the terms "neural network", "deep neural network", "model", "network" and "neural network model" are used interchangeably. In addition, this paper mentions the rectangular characteristics (such as side, width, height, width, and height) of related objects (such as labeled regions, image data to be segmented, and objects). If the object itself is not rectangular, unless otherwise specified, it can be defaulted Rectangle property that is the bounding rectangle of the object.

Existing deep learning target recognition schemes use frame labels of various shapes (that is, labeling methods that do not require accurate boundaries) to identify small targets. However, as mentioned briefly above, this scheme is unsatisfactory in the recognition of small targets in the image, and there is a large risk of false negatives and false alarms. This is due to the fact that small objects have smaller areas, fewer features that can be extracted, and are easily disturbed by noise and other tissues. Therefore, a better solution is to segment small targets by means of deep learning target segmentation to realize the recognition of small targets. However, this scheme requires precise labeling of the boundaries of small objects, which makes image labeling difficult. In order to overcome the shortcomings of the above two solutions, the present disclosure obtains a region segmentation result by under-segmenting the labeled region, and uses the result as a gold standard to segment the image, thereby realizing accurate recognition of small objects. In particular, the present disclosure adopts a spatial weight method to deal with the negative impact of pixels of undetermined categories on image segmentation caused by under-segmentation in the labeled region. In this case, small targets can be effectively identified.

Accordingly, examples of the present disclosure propose a scheme for training models and recognizing objects in images to address one or more of the above-mentioned problems and/or other potential problems. The scheme adopts the method of image segmentation for target recognition (that is, image segmentation is first performed on the image data in the marked area in the training sample to obtain the area segmentation result, and then the area segmentation result is post-processed to obtain the target recognition result). Specifically, this scheme identifies pixels of undetermined categories in the labeled region by under-segmenting the image data in the labeled region in the training sample, and combines spatial weights (that is, the setting of the weights can be related to the position of the pixels) to neural networks. The network model is trained to reduce the negative impact of pixels of undetermined categories in the labeled region on the neural network model, which can improve the accuracy of the prediction result of the trained model on the input image (eg, medical image). In addition, the trained model can be a trained neural network model (that is, a trained neural network model. For example, a trained semantic segmentation model. Thus, it is possible to optimize the performance of the trained model and improve the accuracy of the model for small objects. Accuracy of recognition. In some examples, the trained model may be an optimal neural network model obtained after training.

Examples of the present disclosure relate to training models and schemes for recognizing objects in images, which efficiently recognize small objects. The model training method for recognizing objects in images involved in the examples of the present disclosure may be simply referred to as a model training method or a training method. It should be noted that the solutions involved in the examples of the present disclosure are also applicable to the recognition of large objects.

Examples of the present disclosure may refer to images from cameras, CT scans, PET-CT scans, SPECT scans, MRI, ultrasound, X-rays, angiograms, fluoroscopy, capsule endoscopic images, or combinations thereof. In some examples, the images may be medical images. For example, medical images may include, but are not limited to, fundus images, lung images, stomach images, chest images, brain images, and the like. Thus, small targets in medical images can be identified. In some examples, the image can be a natural image. The natural image may be an image observed or captured in a natural scene. Thus, small objects in natural images can be identified. The following describes an example of the present disclosure by taking the image as an example of a fundus image in a medical image, and such description does not limit the scope of the present disclosure. For those skilled in the art, other types of images can be used without limitation.

Examples of the present disclosure will be described in detail below with reference to the accompanying drawings. For ease of understanding, the specific data mentioned in the following description are exemplary, and are not intended to limit the protection scope of the present disclosure. It should be understood that examples according to the present disclosure may also include additional modules not shown, modules shown may be omitted, additional acts not shown, and/or acts shown may be omitted, and that the scope of the present disclosure is not limited in this regard. Restricted.

FIG. 1 is a schematic diagram showing an example of a recognition target environment 100 related to an example of the present disclosure. As shown in FIG. 1 , recognition target environment 100 may include computing device 110 . Computing device 110 may be any device having computing capabilities. For example, computing device 110 may be a cloud server, a personal computer, a mainframe, a distributed computing system, and the like.

The computing device 110 can obtain an input 120 and generate an output 140 corresponding to the input 120 by using a neural network model 130 (sometimes also referred to as a model to be trained 130 or a model 130 for short). In some examples, the input 120 may be the above-mentioned image, and the output 140 may be a prediction result, training parameters (eg, weights), or performance indicators (eg, accuracy rate and error rate), etc. In some examples, the neural network model 130 may include, but is not limited to, a semantic segmentation model (eg, U-Net), or other models related to image processing. Additionally, the neural network model 130 may be implemented using any suitable network structure. For example, convolutional neural network (CNN), recurrent neural network (RNN) and deep neural network (DNN), etc.

In some examples, the recognition target environment 100 may further include a model training device and a model application device (not shown). The model training device can be used to implement the training method of training the neural network model 130 to obtain a trained model. The model application device can be used to implement a related method for obtaining prediction results using a trained model to identify objects in an image. In addition, in the model training phase, the neural network model 130 may be the model 130 to be trained. In the model application phase, the neural network model 130 may be a trained model.

FIG. 2 is a flowchart illustrating an example of a model training method related to an example of the present disclosure. For example, the model training method can be performed by the computing device 110 shown in FIG. 1 . Additionally, the model training method can train a model that recognizes objects in medical images.

As shown in FIG. 2, the model training method may include step S102. In step S102, medical images serving as training samples and labeled regions corresponding to targets in the training samples may be acquired. That is, in the training phase, medical images can be obtained as training samples. Thereby, it is possible to recognize the target in the medical image. In some examples, the medical images may be color images. Thus, the accuracy of recognizing small targets can be improved.

In addition, the medical image may contain a corresponding target, and the target may belong to at least one category of interest (that is, the category that needs to be identified). In some examples, if the medical image is a fundus image, the target may include small targets such as microangioma, spot hemorrhage, sheet hemorrhage, and linear hemorrhage. In this case, the trained model is able to recognize small objects in fundus images.

In some examples, objects in the training samples may be labeled to obtain labeled regions. In addition, the shape of the labeled region can be a rectangle, a circle, or a shape that matches the shape of the object in the training sample (for example, the shape of the labeled region can be the outline of the object). Preferably, the shape of the marked area can be a rectangle. Thus, the difficulty of labeling can be reduced. As an example, FIG. 3 shows a marked area D1 in a fundus image, where the marked area D1 is rectangular in shape, and the target in the marked area D1 is a sheet hemorrhage.

In addition, the marked region may have a corresponding marked label (that is, marked category of the target), and the marked label may be used to distinguish the category of the target. Label categories can be in one-to-one correspondence with target categories. For example, for a fundus image, the target category and label category may respectively include but not limited to microvascular tumor, spot hemorrhage, sheet hemorrhage, linear hemorrhage, and the like. In some examples, corresponding label categories may be represented numerically. Thus, it is possible to facilitate calculation by the computing device 110 . In addition, the labeled regions and corresponding labeled labels may be referred to as labeled results.

As shown in FIG. 2, the model training method may further include step S104. In step S104, a region segmentation result (also called a pseudo-segmentation result) corresponding to the labeled region in the training sample may be determined, and a training set may be constructed using the training sample and the region segmentation result. It should be noted that, in some other examples, it is not necessary to determine the region segmentation result corresponding to the marked region, as long as the target region in the marked region (described later) can be identified, and the pixels in the target region are determined to belong to the target. Can.

In some examples, according to the actual situation (for example, the quality of the training samples does not meet the training requirements or the size of the training samples is not uniform), before constructing the training set, the training samples can be preprocessed and then used to construct the training set. set.

In some examples, preprocessing the training samples may include unifying the size of the training samples. For example, the size of the training samples can be unified to 1024×1024 or 2048×2048. The disclosure does not limit the size of the training samples. In some examples, preprocessing the training samples may include clipping the training samples. In some examples, in clipping the training samples, a region of interest in the training sample may be obtained and the region of interest may be used to clip the training sample. Accordingly, it is possible to make the size of the training samples uniform and include the region of interest. In some examples, the region of interest may be a region where objects may exist (also referred to as a foreground region). For example, for a fundus image, the region of interest may be a fundus region.

In some examples, training samples may be segmented to obtain regions of interest. In some examples, threshold segmentation may be performed on the training samples to obtain segmentation results of interest, wherein the segmentation results of interest may be used to identify regions of interest of the training samples. Thereby, a region of interest can be identified. In addition, the segmentation result of interest obtained through threshold segmentation may be a binary image (also referred to as a binary image). It can be understood that, although the segmentation result of interest is obtained through threshold segmentation above, other segmentation results suitable for obtaining the segmentation result of interest are also applicable. For example, the segmentation result of interest can be obtained through a neural network.

In some examples, performing threshold segmentation on the training sample may be to divide the training sample into a preset number of shares (for example, 9 equal parts), determine the segmentation threshold based on the gray values of the four corners of the training sample and the central area, and based on The segmentation threshold performs threshold segmentation on the training samples, and then obtains the segmentation results of interest. In some examples, the determination of the segmentation threshold based on the gray values of the four corner areas and the central area of the training sample may be the gray mean value of the pixels in each area in the four corner areas and the gray level of the pixels in the middle area The average value of the mean is used as the segmentation threshold for threshold segmentation, and then the segmentation result of interest is obtained.

In addition, in the threshold segmentation, before obtaining the segmentation result of interest, an erosion operation may be performed on the segmentation result of the threshold corresponding to the training sample (that is, the initial segmentation result) to obtain the segmentation result of interest. For example, two erosion operations may be performed on the threshold segmentation results of the training samples to obtain the segmentation results of interest, where the size of the erosion kernel may be 5. In this way, noise at the edge of the region of interest (for example, the fundus region) can be eliminated.

Referring back to FIG. 2 , as described above, in step S104 , the region segmentation result corresponding to the labeled region may be determined. In addition, the region segmentation result can be used to determine the object region of the object within the labeled region. In this case, the target area in the labeled area can be identified, and then pixels of undetermined categories can be determined based on the target area. Specifically, the pixels outside the target area in the marked area in the training sample may be pixels of undetermined categories.

In addition, the region segmentation result may be any form of data (for example, an image) that can identify the target region. In some examples, the region segmentation result may be a binary image. In some examples, in the region segmentation result of a binary image, the region corresponding to the pixel with a value of 1 can be set as the target region (that is, if the value of the pixel is 1, it can represent the pixel at the corresponding position in the training sample Belonging to the target, if the value of the pixel is 0, it can indicate that the pixel at the corresponding position in the training sample is a pixel of an undetermined category). In this case, it is possible to reduce the negative impact of pixels of undetermined categories on the model to be trained 130 .

As an example, FIG. 4 shows the region segmentation result A1 corresponding to the labeled region D1 in FIG. 3 , where D2 is the target region. In addition, in order to display the region segmentation result A1 more clearly, the region segmentation result A1 in FIG. 4 is the result of an equal scale enlargement, which does not represent a limitation to the present disclosure. The region segmentation result A1 in FIG. 4 can actually be compared with the labeled region D1 of the same size.

In some examples, under-segmentation may be performed on the image data in the marked region in the training sample to obtain the region segmentation result (that is, the target region corresponding to the target in the marked region may be segmented through under-segmentation to obtain the region segmentation result). In this way, pixels of undetermined categories in the labeled region can be identified based on the region segmentation result obtained by under-segmentation. Generally speaking, the mis-segmentation of the foreground object as the background but the background is not mis-segmented as the foreground object can be called under-segmentation. Here, under-segmentation can be that the pixels belonging to the target in the labeled area are mis-segmented as non-targets but the pixels in the labeled area that do not belong to the target are not mis-segmented as targets. In this case, the pixels in the target area in the area segmentation result can be determined to belong to the target. In addition, within the labeled region, pixels outside the target region do not necessarily belong to the target (that is, they may be pixels of undetermined categories).

In some examples, the region segmentation results corresponding to the marked regions may be determined based on the training samples and the image data respectively corresponding to the marked regions in the above segmentation results of interest. Specifically, the image data corresponding to the labeled region in the training sample (hereinafter referred to as the first image data corresponding to the labeled region in the training sample) and the above-mentioned segmentation result of interest (that is, the segmentation result of interest can be The image data corresponding to the marked region in the binary image used to identify the region of interest of the training sample (hereinafter referred to as the second image data corresponding to the marked region in the segmentation result of interest) is subjected to a product operation to obtain the image to be segmented data (that is, the image data in the marked area), the image data to be segmented is under-segmented to determine the area segmentation result corresponding to the marked area. In this case, when the labeled region includes regions other than the region of interest, noise outside the region of interest can be eliminated.

FIG. 5 is a flowchart illustrating an example of acquiring a result of area division related to an example of the present disclosure. That is, some examples of the present disclosure obtain the flow of region segmentation results.

As shown in FIG. 5, obtaining the region segmentation result may include step S202. In step S202, image data to be segmented may be acquired based on the labeled region. As mentioned above, the first image data may be the image data corresponding to the labeled region in the training sample, and the second image data may be the image data corresponding to the labeled region in the segmentation result of interest. That is, the first image data and/or the second image data may be acquired based on the marked region, and then the image data to be segmented may be acquired based on the first image data, or the first image data and the second image data.

In some examples, image data to be segmented may be obtained based on the first image data. In some examples, the image data to be segmented may be acquired based on color channels (eg, red channel, green channel, blue channel) of the first image data. Taking the fundus image as an example, the image data to be segmented may be acquired based on the green channel of the first image data. Specifically, the first image data corresponding to the labeled region can be obtained (for example, cropped) from the training sample, and then the green channel (ie, G channel) of the first image data is taken, and the image to be segmented is obtained based on the green channel of the first image data data. In some examples, the corresponding color channel (green channel) of the first image data may be used as the image data to be segmented to obtain the image data to be segmented. In addition, the color space and color channel can be selected according to the characteristics of the medical image itself, which is not particularly limited in this disclosure.

In some other examples, the image data to be segmented may be acquired based on the first image data and the second image data. In this case, when the labeled region includes regions other than the region of interest, noise outside the region of interest can be eliminated. In some examples, the image data to be segmented may be acquired based on the color channels of the first image data and the second image data. Specifically, the color channel of the first image data can be expressed as G ₁ , and the second image data can be expressed as B ₁ , then the image data to be divided can be expressed as I ₁ =G ₁ □B ₁ , where I ₁ can represent Segmenting image data, □ can represent element (that is, the gray value of a pixel) product operation.

It should be noted that the first image data, the second image data, and the image data to be segmented may represent image data (for example, pixel data, data stream or image) of the corresponding area, and in practice, the pixels of the corresponding area may be The value of or the position mark of the pixel is stored in a corresponding medium (such as a memory or a disk) to form a corresponding form of image data, which can be conveniently processed. In addition, the shapes of the first image data, the second image data, and the image data to be segmented can match the shape of the marked area, or can be a circumscribed rectangle of the marked area, which can be selected according to the method of obtaining the area segmentation result.

In addition, in the process of obtaining the region segmentation result, if it is necessary to use the rectangular characteristics of the image data to be segmented (for example, side, length, width, height, four corners) and the shape of the marked region is not rectangular, it can be based on the circumscribed region of the marked region The area corresponding to the rectangle acquires the image data to be segmented. That is, after the shape of the labeled region is converted into a rectangle, the image data to be segmented can be obtained based on the converted labeled region.

As shown in FIG. 5, obtaining the region segmentation result may further include step S204. In step S204, threshold segmentation may be performed on the image data to be segmented to obtain a region segmentation result. However, the examples of the present disclosure are not limited thereto. In other examples, the image data to be segmented may be under-segmented in other ways to obtain the region segmentation result.

In some examples, in step S204, a target segmentation threshold (described later) may be used to perform threshold segmentation on the image data to be segmented, and then obtain a region segmentation result. Thus, the target region in the image data to be segmented can be identified through threshold segmentation. In some examples, in the threshold segmentation, the value of the pixel whose gray value is not less than the target segmentation threshold in the image data to be segmented can be set to 1, and the value of other pixels to be 0, and then the region segmentation result can be obtained.

In some examples, before obtaining the region segmentation result, an erosion operation may also be performed on the threshold segmentation result (that is, the initial segmentation result) of the image data to be segmented. In this case, it is possible to reduce the probability of isolated pixels in the threshold segmentation result due to the influence of noise.

In some examples, during the erosion operation on the threshold segmentation result of the image data to be segmented, the erosion kernel k may satisfy the formula:

Among them, h can represent the height of the marked area (that is, the marked area corresponding to the image data to be segmented) and w can represent the width of the marked area, H can represent the height of the training sample, W can represent the width of the training sample, and p can represent preset hyperparameters. In this case, a corrosion kernel of an appropriate size can be obtained according to the size of the training sample, the size of the labeled region, and the preset hyperparameters. Thereby, excessive corrosion can be suppressed.

In some examples, preset hyperparameters may be used to tune the size of the corrosion kernel. In this case, smaller corrosion nuclei can be used for particularly small objects. Thereby, it is possible to avoid an excessive erosion operation which would cause the target area of a particularly small target to disappear.

In some examples, the preset hyperparameters may be fixed values. In some examples, the preset hyperparameters may be determined according to the average size of objects of the same category in the medical images. In some examples, the preset hyperparameters may be determined according to the average width and average height of objects of the same category in the medical images. In some examples, the preset hyperparameter p can satisfy the formula:

in,

and

can respectively represent the average width and average height of the same category of targets in medical images, _σw and _σh can represent the standard deviation of width and standard deviation of height, respectively,

and

The average width and average height of medical images can be represented respectively. Here, the medical image may be an image in a data source for obtaining preset hyperparameters. In some examples, the width and height of objects of the same category in multiple training samples and the width and height of the training samples may be counted to obtain relevant parameters of preset hyperparameters. That is, the data source can be training data. In some examples, when acquiring preset hyperparameters in a medical image with labeled regions (for example, a training sample), the width and height of the target may also be the width and height of the corresponding labeled region. Thus, the width and height of the target can be acquired conveniently.

Generally speaking, there may be multiple connected regions in the threshold segmentation result of the image data to be segmented. In some examples, an erosion operation may be performed on the threshold segmentation result of the image data to be segmented to obtain at least one connected region, and a connected region whose center is closest to the center of the image data to be segmented is selected from the at least one connected region as the region segmentation result. In addition, the connected region closest to the center of the image data to be segmented may represent the identified target region. Thus, an accurate target area can be obtained. In some examples, contours can be searched for corrosion results (that is, at least one connected region), and the preset number (for example, 3) contours with the largest area are taken as candidates, and the distance between the center of the contour and the image data to be segmented is retained in the candidate contours. The connected region corresponding to the nearest contour of the center of is the region segmentation result.

In addition, in the threshold segmentation of the image data to be segmented, there are many ways to obtain the target segmentation threshold. For example, the object segmentation threshold can be obtained according to the common big law (OTSU) way. In some examples, at least one manner of acquiring the object segmentation threshold may be selected from the manners described in the examples of the present disclosure.

In some examples, the target segmentation threshold may be obtained according to the label category to which the target belongs. In some examples, the object segmentation threshold may be acquired according to an acquisition threshold method of the label category to which the object belongs. In this case, the target segmentation threshold can be obtained according to the characteristics of the label category corresponding to the target. Accordingly, the accuracy of threshold segmentation can be improved. In addition, the method for obtaining the threshold value of the labeled category may include the first method and the second method. In addition, the labeled categories of the objects in the training samples can be known. For example, the labeling category to which the target in the training sample belongs may be the labeling label in the labeling result.

In some examples, the method for obtaining the threshold value of each labeling category may be obtained through the features of each labeling category. In some examples, the acquisition threshold method may be determined according to the average area and average color of each label category. However, the examples of the present disclosure are not limited thereto, and in some other examples, the method for obtaining the threshold value of the label category may also be determined empirically. For example, for a fundus image, the first method can be used for sheet hemorrhage in the fundus image, and the second method can be used for microvascular tumors, spot hemorrhages, and linear hemorrhages in the fundus image.

In some examples, the average area and average color of each label category may be fixed values, which may be obtained according to statistics on sample data. For example, the area and color of objects of the same category (eg, for the training sample, the same category may refer to the same labeled category) in sample data (eg, training samples) may be averaged to obtain the average area and average color. In other examples, the fixed value may also be an empirical value.

In some examples, in the method for obtaining the threshold value of the label category determined according to the average area and average color of each label category, the average area of the label category corresponding to the first method may be greater than the average area of the label category corresponding to the second method and The average color of the label category corresponding to the first method may be lighter than the average color of the label category corresponding to the second method. For example, the first method can be used for objects with large areas and light colors (for example, flake hemorrhages in fundus images). The second method can target objects of this labeling category with small areas and dark colors (for example, microvascular tumors, spotting and linear bleeding in fundus images).

In some examples, in determining the method for obtaining the threshold value of the label category according to the average area and the average color of each label category, the method for obtaining the threshold value for the label category may be determined by the first preset area and preset color values. In this way, it is possible to automatically acquire the acquisition threshold method used by the label category.

In some examples, if the average area of the label category is larger than the first preset area and the average color is smaller than the preset color value (that is, the area of the target of the label category is relatively large, and the color is relatively light), then the label category can be It is determined to use the first method, otherwise if the average area of the label category is not greater than the first preset area and the average color is not less than the preset color value (that is, the area of the target of the label category is relatively small, and the color is relatively dark), Then the label class can be determined to use the second method.

In some examples, the first preset area and the preset color value can be adjusted according to the result of the region segmentation. In some examples, the first preset area and the preset color value may be fixed values, and the fixed values may be obtained according to statistics on sample data. That is, a statistical method can be used to count the region segmentation results of a small amount of sample data under different first preset areas and preset color values to determine the best first preset area and preset color values for classification .

As mentioned above, the target segmentation threshold can be obtained according to the method for obtaining the threshold value of the label category to which the target belongs. In some examples, the object segmentation threshold may be obtained according to the method for obtaining the threshold value of the labeled category to which the object belongs and the image data to be segmented corresponding to the training samples.

In some examples, for the first method (that is, the method of obtaining the threshold value of the label category to which the target belongs is the first method), the threshold value can be searched to make the area of the pixel whose gray value is greater than the threshold value in the image data to be segmented If it is smaller than a preset multiple of the area of the image data to be segmented, the threshold is used as the target segmentation threshold, wherein the preset multiple may be greater than 0 and less than 1. Taking the medical image as an 8-bit quantized image as an example, the threshold value from 0 to 255 can be traversed to find the threshold value so that the area of the pixel whose gray value is greater than the threshold value in the image data to be segmented is smaller than the preset multiple of the area of the image data to be segmented. Use this threshold as the object segmentation threshold. In addition, the preset multiple can be any value that makes the target area not divided. For example, the preset multiplier can take a smaller value so that the target area is not divided. In some examples, the preset multiplier may be empirically determined by the shape of the target.

In some examples, for the second method (that is, the method of obtaining the threshold value of the label category to which the target belongs is the second method), the mean value of the gray value of the pixels in the image data to be segmented can be used as the target segmentation threshold or based on The gray values of the four corner regions and the central region of the image data to be segmented determine the target segmentation threshold.

In some examples, for the second method, if the length of the smallest side of the image data to be segmented is less than a preset length, the average value of grayscale values of pixels in the image data to be segmented may be taken as the target segmentation threshold. In some examples, the preset length can be any value that prevents the target area from being segmented. In some examples, the preset length may be a first preset ratio of the smallest side of the training sample. Specifically, the preset length can be represented as min(rH, rW), where r can represent the first preset ratio, H can represent the height of the training sample, and W can represent the width of the training sample.

In some examples, the first preset ratio may be a fixed value. In some examples, the first preset ratio may be determined according to an average size of objects of the same category in the medical image. In some examples, the first preset ratio may be determined according to the average width and average height of objects of the same category in the medical image. In some examples, the first preset ratio may satisfy the formula:

in,

and

The average width and average height of medical images can be represented respectively. Here, the medical image may be an image in the data source used to obtain the first preset ratio. In some examples, the data source can be training data. In addition, related parameters related to the first preset ratio may be acquired in a manner similar to related parameters related to acquiring preset hyperparameters, which will not be repeated here.

In some examples, for the second method, if the length of the smallest side of the image data to be segmented is not less than the preset length, the target can be determined based on the gray values of the four corner areas and the central area of the image data to be segmented Segmentation threshold. Specifically, the image data to be segmented can be equally divided into a preset number of parts (for example, 9 equal parts), and the target segmentation threshold can be determined based on the gray values of the four corner regions and the central region of the image data to be segmented. For details, refer to the relevant description of determining the segmentation threshold based on the gray values of the four corner regions and the central region of the training sample above.

Referring back to FIG. 2, as mentioned above, in step S104, a training set may be constructed using training samples and region segmentation results. That is, the training set may be constructed based on the training samples and at least one region segmentation result corresponding to the training samples. In some examples, the training set can include training samples and a gold standard of training samples. In some examples, a gold standard of training samples can be obtained based on the region segmentation results. That is, the target region can be identified based on the region segmentation result, and then the real category to which the pixels in the training samples belong can be determined based on the target region. Thereby, the gold standard of training samples can be obtained.

In some examples, the ground-truth category may include at least one of annotated categories of objects (for example, for fundus images, may include microangioma, spotting, sheet-like bleeding, and linear bleeding), no-target category, and undetermined category . Specifically, it is related to the process of optimizing the model to be trained 130 .

In addition, the labeled category of the object in the real category may be the category to which the pixels of the target area (ie, the second area described later) of the object in the labeled area in the training sample belong. In addition, the undetermined category in the true category may be the category to which the pixels of the region other than the target region (that is, the first region described later) of the object in the labeled region in the training sample belong. In addition, the non-target category in the ground truth category can be the category to which the pixels outside the labeled region in the training samples belong. In some examples, the areas outside the labeled area in the training samples may include areas within the ROI that do not belong to the labeled area (ie, the third area described later). For example, for a medical image, the region within the region of interest and not belonging to the marked region may be the region corresponding to the non-target tissue in the medical image. In some examples, the areas outside the marked area in the training samples may include areas within the ROI that do not belong to the marked area, and areas outside the ROI (ie, the fourth area described later).

In some examples, a validation set and a test set can also be constructed using the training samples and the region segmentation results.

Referring back to FIG. 2 , the model training method may further include step S106. In step S106, the model to be trained 130 may be trained based on the training set, and the model to be trained 130 may be optimized using a training loss function.

In some examples, the model to be trained 130 may include, but is not limited to, a semantic segmentation model. In addition, the prediction results of the model to be trained 130 may include, but not limited to, semantic segmentation results of training samples. Thus, small objects can be identified. For example, in the above example where the input 120 is the image data to be semantically segmented and the model to be trained 130 is a semantic segmentation model, the prediction result may be the semantic segmentation result of the image data. In addition, the above-mentioned input 120 may be color image data.

In some examples, high-dimensional feature information can be added to the model to be trained 130 . Thus, the accuracy of recognition of small objects can be improved. In some examples, in the model to be trained 130, feature information of different dimensions in medical images (such as training samples) can be extracted, and the feature information of a preset dimension close to the feature information of the highest dimension can be combined with the feature information of the highest dimension Fusion is performed to increase high-dimensional feature information.

FIG. 6 is an architecture diagram showing an example of a model to be trained 130 employing a U-Net architecture involved in an example of the present disclosure.

As an example, FIG. 6 shows a model 130 to be trained using the U-Net architecture, wherein, for the common network layers in the U-Net architecture, too much explanation is not given here. As shown in Figure 6, the preset dimension can be 2, and the feature information of the two dimensions can include feature information 131a and feature information 132b, wherein the feature information 131a can be fused with the feature information of the highest dimension through the upsampling layer 132a, and the feature information The information 131b can be fused with the feature information of the highest dimension through the upsampling layer 132b. In addition, the convolution size of the upsampling layer 132a and the upsampling layer 132b may be any value that makes feature information (for example, feature information 131a and feature information 131b ) consistent with the size of the feature information of the highest dimension after upsampling.

In some examples, in training the model to be trained 130, the prediction result corresponding to the training sample can be obtained based on the training sample of the training set by the model to be trained 130, and then the training loss function can be constructed based on the region segmentation result and the prediction result corresponding to the training sample (That is, the training loss function can be constructed using the gold standard and prediction results of the training samples obtained based on the region segmentation results). In addition, the training loss function can represent the degree of difference between the gold standard of the training sample and the corresponding prediction result.

In some examples, the region segmentation results can be directly used as the gold standard for training samples. In some examples, the region segmentation result may be used as the gold standard of the pixels in the labeled region corresponding to the target in the training sample to obtain the gold standard of the training sample. In addition, the gold standard of pixels in areas other than the marked area corresponding to the target in the training sample can be set as required. For example, it may be fixed as a category (for example, it may be the no-target category involved in the example of the present disclosure). For another example, it may be set by manually labeling the training samples or automatically labeling the training samples by an artificial intelligence algorithm. The example of the present disclosure does not specifically limit the method of setting the gold standard of the pixels in the region other than the marked region corresponding to the target in the training sample.

In some examples, in the training loss function, weights may be assigned to the aforementioned pixels of undetermined categories in the training samples to reduce the negative impact of the pixels of undetermined categories on the model 130 to be trained. Thus, the accuracy of the model to be trained 130 can be improved. In some examples, in the training loss function, spatial weights can be used to reduce the negative impact of pixels of undetermined categories in the training samples on the model 130 to be trained.

In some examples, in the spatial weighting, the training sample may be divided into several regions (also referred to as at least one region), and weights are used to adjust the influence of each of the several regions on the model 130 to be trained.

In some examples, the number of regions may include the first region. The first area may be an area of pixels of undetermined categories in the training sample (that is, an area other than the target area in the marked area in the training sample). In some examples, in the training loss function, spatial weights may be used to reduce the negative impact of pixels in the first region of the training samples on the model to be trained 130 . In some examples, among the spatial weights, the pixels in the first region in the training samples may be assigned a first weight to reduce the negative impact on the model 130 to be trained.

In addition, the first weight may be any value that reduces the negative impact on the model 130 to be trained. In some examples, the first weight may be a fixed value. In some examples, the first weight may be zero. In this case, samples of undetermined categories can be ignored, so as to reduce the negative impact of samples of undetermined categories on the model 130 to be trained.

In some examples, the number of regions may include the second region. The second area may be the target area of the training samples. In some examples, pixels of the second region may be assigned a second weight in the spatial weights. In some examples, the first weight may be less than the second weight. In addition, the second weight may be any value that increases the positive influence of the pixels in the second region on the model 130 to be trained. In some examples, the second weight can be a fixed value. In some examples, the second weight may be one.

In some examples, the number of regions may include a third region. The third region may be a region in the region of interest in the training sample that does not belong to the labeled region. In some examples, pixels of the third region may be assigned a third weight in the spatial weights. In some examples, the first weight may be less than the third weight. In addition, the principle of setting the third weight may be similar to that of the second weight.

In some examples, the number of regions may include a fourth region. The fourth area may be an area outside the area of interest in the training sample. In some examples, pixels of the fourth region may be assigned a fourth weight in the spatial weights. In some examples, the fourth weight may be less than the second weight. In addition, the principle of setting the fourth weight may be similar to that of the first weight.

In some examples, several areas may include the first area, the second area, the third area and the fourth area at the same time, and the pixels in the first area, the second area, the third area and the fourth area may be respectively assigned the first weight, second weight, third weight and fourth weight, wherein the first weight may be smaller than the second weight and smaller than the third weight, and the fourth weight may be smaller than the second weight and smaller than the third weight. In this case, it is possible to suppress the negative impact of pixels of undetermined categories and pixels outside the region of interest on the model to be trained 130 , and increase the positive influence of the target region and non-target regions within the region of interest on the model to be trained 130 . Thereby, the accuracy of the model can be improved. Preferably, the first weight may be 0, the second weight may be 1, the third weight may be 1, and the fourth weight may be 0. In this case, it is possible to avoid the negative impact of pixels of undetermined categories and pixels outside the region of interest on the model to be trained 130 , and increase the positive influence of the target region and non-target regions within the region of interest on the model to be trained 130 . Thus, the accuracy of the model to be trained 130 can be improved.

But examples of the present disclosure are not limited thereto, and in some other examples, the several areas may include any combination of the first area, the second area, the third area and the fourth area.

In addition, in order to clearly describe several regions, FIG. 7 is a schematic diagram showing various binarized regions, and does not limit the present disclosure to be divided into all the regions shown in FIG. 7 . Wherein, D3 may represent the first area, D4 may represent the second area, D5 may represent the third area, and D6 may represent the fourth area.

As mentioned above, in some examples, in the spatial weighting, the training sample can be divided into several regions, and the influence of each region in the several regions can be adjusted by using the weights to the model 130 to be trained.

In some examples, in the training loss function, losses may be computed per class. As mentioned above, the ground-truth category may include at least one of the labeled category of the target, the non-target category, and the undetermined category. In some examples, in the training loss function, the classes can be derived from the ground-truth classes described above. That is, the categories in the training loss function may include the labeled category of the target and the non-target category, or the categories in the training loss function may include the labeled category of the target, the non-target category, and the undetermined category. The categories in the specific training loss function are related to the samples selected in the training loss function.

In some examples, in the training loss function, if samples (that is, pixels) of each category in the training samples belong to corresponding regions among several regions, the loss of the corresponding samples may be multiplied by the weight of the corresponding region. In this case, the training loss function can be determined based on the spatial weights, and then the influence of pixels in different regions on the model to be trained 130 can be adjusted.

In some examples, in the training loss function, the influence of the samples of each category on the model 130 to be trained can be adjusted based on the weight of each category. Thus, it is possible to adjust the influence of samples of different categories on the model to be trained 130 .

In some examples, in the training loss function, the influence of samples to be trained on the model 130 can be adjusted based on both spatial weights and category weights. Thus, the influence of samples on the model to be trained 130 can be adjusted by region and class.

In some examples, the training loss function may employ weighted balanced cross-entropy. In this case, the imbalance between positive and negative samples can be suppressed, thereby further improving the recognition accuracy of the model 130 to be trained for small targets. In some examples, when training the model to be trained 130 , a training loss function based on weighted equalization cross entropy may be used, and spatial weights may be used to control the negative impact of pixels of undetermined categories on the model to be trained 130 .

In the following space weights, the first weight of the first area is 0, the second weight of the second area is 1, the third weight of the third area is 1, and the fourth weight of the fourth area is 0 as an example. The description is based on Weighted balanced cross-entropy training loss function. It should be noted that this does not represent a limitation to the disclosure, and those skilled in the art can design a training loss function based on weighted balanced cross-entropy by freely combining the weights of each area and each category in several areas according to the situation. The training loss function L based on weighted balanced cross-entropy can satisfy the formula (that is, it is equivalent to ignoring the losses of the first area and the fourth area by setting the first weight and the fourth weight to 0):

Among them, C can represent the number of categories, W _i can represent the weight of the i-th category, M _i can represent the number of samples of the i-th category, and y _ij can represent the jth of the i-th category in the gold standard of the above training samples. The actual value of the sample, p _ij can represent the predicted value of the jth sample of the i-th category in the prediction result (that is, the probability that the j-th sample belongs to the i-th category). In addition, the samples of each category may be pixels of the corresponding category in the training samples. In addition, the samples of a category can be determined based on the above-mentioned gold standard of the training samples. As mentioned above, the weight of the category can adjust the impact of samples of each category on the model 130 to be trained.

In addition, in formula (1), by setting the first weight and the fourth weight to 0, ignoring the samples in the first area and the fourth area, the categories in the training loss function can include the label category of the target and the non-target category, and the target The labeled category may be the category to which the pixels in the second region in the training sample belong, and the non-target category may be the category to which the pixels in the third region in the training sample belong. Taking the fundus image as an example, the categories in the training loss function of Equation (1) can include microangioma, spot hemorrhage, sheet hemorrhage and linear hemorrhage and no target category.

The method for recognizing a target in an image (hereinafter referred to as the recognition method) involved in the present disclosure will be described below with reference to FIG. 8 . In addition, recognition methods can identify objects in medical images. FIG. 8 is a flowchart illustrating an example of a method of recognizing an object in an image according to an example of the present disclosure.

As shown in FIG. 8, the identification method may include step S302. In step S302, a medical image as an input image may be acquired. In some examples, the input image can be input into the trained model after the same preprocessing as the above-mentioned training samples.

As shown in FIG. 8, the identification method may further include step S304. In step S304, at least one trained model can be used to determine the prediction results of each trained model for the input image, and the target prediction result can be obtained based on the prediction results of at least one trained model, wherein at least one trained model can be based on the above-mentioned Obtained by training with the model training method. In addition, at least one trained model may be a model based on the same type of network architecture (eg, U-Net), but with different network structures and/or parameters. For example, some branches or network levels may be added or subtracted to form at least one trained model. However, examples of the present disclosure are not limited thereto, and in other examples, at least one trained model may not be based on the same type of network architecture.

In addition, the prediction results of each trained model may include the probability that each pixel in the input image belongs to the corresponding labeled category. The labeling category may be the labeling category of the above-mentioned target. In some examples, the prediction results of at least one trained model may be integrated according to the label category and pixels to obtain the integrated probability that each pixel of the input image belongs to the corresponding label category, determine the connected region based on the integrated probability, and obtain each pixel based on the connected region. The target prediction result corresponding to the labeled category. In this case, obtaining the target prediction result based on the integrated probability can further improve the accuracy of the target prediction result.

In some examples, in obtaining the integrated probability, if there is only one trained model, the probability that each pixel in the input image in the predicted result of the trained model belongs to the corresponding label category can be used as the integrated probability; otherwise, the multi- The prediction results of each trained model are averaged to obtain the average probability of each pixel in the input image belonging to the corresponding label category (that is, the pixel-level probability average can be calculated according to the label category).

In some examples, in determining the connected region based on the integrated probability, the connected region may be determined based on the integrated probability and the classification threshold of each label category. Specifically, the values of pixels whose integration probability is not less than the classification threshold can be set to 1, and the values of other pixels can be set to 0. In some examples, the classification threshold can be determined based on a validation set using performance metrics. In addition, if there are connected regions, the number of connected regions may be one or more.

In some examples, in obtaining the target prediction result based on the connected region, the circumscribed rectangle of each connected region can be obtained. If the area of the circumscribed rectangle is greater than the second preset area, it can indicate that there is a target at the circumscribed rectangle, otherwise it can indicate that the There is no target at the bounding rectangle.

In some examples, the second preset area may be a second preset ratio of the area of the training samples. Specifically, the second preset area may be expressed as sHW, where s may represent the second preset ratio, H may represent the height of the input image, and W may represent the width of the input image.

In some examples, the second preset ratio may be a fixed value. In some examples, the second preset ratio may be determined according to the median area of objects of the same category in the medical image. In some examples, the second preset ratio s may satisfy the formula:

Among them, m can represent the median value of the area of the target of the same category in the medical image, and σ can represent the standard deviation of the area of the target of the same category in the medical image,

and

The average width and average height of medical images can be represented respectively. Here, the medical image may be an image in the data source used to acquire the second preset ratio. In some examples, the data source can be training data. In addition, the relevant parameters involved in the second preset ratio may be acquired in a similar manner to the related parameters involved in acquiring the preset hyperparameters, which will not be repeated here.

The present disclosure also relates to a computer-readable storage medium. The computer-readable storage medium can store at least one instruction, and when the at least one instruction is executed by a processor, one or more steps in the above-mentioned model training method or recognition method are realized.

The present disclosure also relates to electronic devices, which may include at least one processing circuit. At least one processing circuit is configured as one or more steps in the above-mentioned model training method or recognition method.

The model training, method, device and medium for identifying targets in medical images according to the example of the present disclosure, under-segment the image data in the marked area in the training sample to identify pixels of undetermined categories in the marked area, and combine the spatial weights The model to be trained 130 is trained to reduce the negative impact of the pixels of undetermined categories in the labeled region on the model to be trained 130 , thereby improving the accuracy of the predicted result of the trained model 130 to the input image. Thus, small targets can be effectively identified.

Although the present disclosure has been described in detail with reference to the drawings and examples, it should be understood that the above description does not limit the present disclosure in any form. Those skilled in the art can make modifications and changes to the present disclosure as needed without departing from the true spirit and scope of the present disclosure, and these modifications and changes fall within the scope of the present disclosure.

Claims

A model training method for identifying a target in a medical image, comprising: acquiring the medical image as a training sample and a marked area corresponding to the target in the training sample; determining the corresponding marked area Region segmentation results, and construct a training set using the training samples and the region segmentation results, wherein the region segmentation results are obtained by under-segmenting the image data in the marked region; and training based on the training set A model to be trained, and using a training loss function to optimize the model to be trained, wherein, in the training loss function, spatial weights are used to reduce the negative impact of pixels in the first region of the training sample on the model to be trained influence, the first area is an area other than the target area of the target in the marked area in the training sample, and the target area is determined by the area segmentation result.
The model training method according to claim 1, wherein obtaining the region segmentation result further comprises:

Acquiring the image data to be segmented based on the image data corresponding to the labeled region in the training sample, or acquiring based on the image data corresponding to the labeled region in the training sample and the image data corresponding to the labeled region in the segmentation result of interest The image data to be segmented, wherein the segmentation result of interest is a binary image used to identify the region of interest of the training sample; and the image data to be segmented is subjected to threshold segmentation using a target segmentation threshold, and then obtained The region segmentation result, wherein the region segmentation result is a binary image.
The model training method according to claim 2, characterized in that:

The target segmentation threshold is obtained according to the method of obtaining the threshold value of the label category to which the target belongs, wherein the method of obtaining the threshold value of each label category is determined by the average area and average color of each label category, and the method of obtaining the threshold value includes the first method and the second method, the average area of the label category corresponding to the first method is greater than the average area of the label category corresponding to the second method and the average color of the label category corresponding to the first method is larger than the The average color of the label category corresponding to the second method is light; for the first method, the threshold is searched so that the area of the pixel whose gray value is greater than the threshold in the image data to be segmented is smaller than the image data to be segmented The preset multiple of the area, the threshold is used as the target segmentation threshold, wherein, the preset multiple is greater than 0 and less than 1; for the second method, if the smallest edge of the image data to be segmented If the length of the image data to be segmented is less than the preset length, the average value of the gray value of the pixel in the image data to be segmented is taken as the target segmentation threshold, otherwise based on the gray levels of the four corner areas and the central area of the image data to be segmented value determines the object segmentation threshold.
The model training method according to claim 2, wherein, before obtaining the region segmentation result: an erosion operation is also performed on the threshold segmentation result of the image data to be segmented to obtain at least one connected region, from the at least The connected region whose center is closest to the center of the image data to be segmented is selected from one connected region as the region segmentation result.
The model training method according to claim 1, characterized in that:

In the spatial weight, pixels in the first region in the training sample are assigned a first weight, where the first weight is 0.
The model training method according to claim 1, characterized in that:

Pixels in the first region, the second region, the third region and the fourth region in the training sample are respectively assigned a first weight, a second weight, a third weight and a fourth weight, wherein the second The area is the target area, the third area is an area in the area of interest that does not belong to the marked area, the fourth area is an area outside the area of interest, and the first weight is less than the The second weight is smaller than the third weight, and the fourth weight is smaller than the second weight and smaller than the third weight.
The model training method according to claim 1, characterized in that:

The model to be trained is a semantic segmentation model, and a prediction result of the model to be trained is a semantic segmentation result of the training sample.
The model training method according to claim 1, characterized in that:

The shape of the marked area is a rectangle.
An electronic device, characterized by comprising at least one processing circuit configured to execute the model training method according to any one of claims 1 to 8.
A computer-readable storage medium, characterized in that the computer-readable storage medium stores at least one instruction, and when the at least one instruction is executed by a processor, the model according to any one of claims 1 to 8 is realized training method.
A method for identifying a target in a medical image, comprising: acquiring the medical image as an input image; and using at least one trained model trained by the model training method according to any one of claims 1 to 8 A model is used to determine a prediction result of each trained model for the input image, and to obtain a target prediction result based on the prediction result of the at least one trained model.
The method according to claim 11, characterized in that:

The prediction results of each trained model include the probability that each pixel in the input image belongs to the corresponding labeled category, and the predicted results of the at least one trained model are integrated according to the labeled category and pixel to obtain each pixel of the input image Belonging to the integrated probability of the corresponding label category, determining the connected region based on the integrated probability, and obtaining the target prediction results corresponding to each label category based on the connected region, wherein, if there is only one trained model, the probability is used as the Otherwise, average the prediction results of multiple trained models to obtain the mean value of the probability that each pixel in the input image belongs to the corresponding label category and use it as the integrated probability.
The method according to claim 11, characterized in that:

The medical image is a fundus image.
The method according to claim 13, characterized in that:

The targets included microangiomas, spotting, sheeting, and threading.
An electronic device, characterized in that it comprises: at least one processing circuit configured to: acquire a medical image as an input image; and use the model according to any one of claims 1 to 8 The training method trains at least one trained model, determines a prediction result of each trained model for the input image, and obtains a target prediction result based on the prediction result of the at least one trained model.