CN109919928B - Medical image detection method and device and storage medium - Google Patents


Info

Publication number
CN109919928B
Authority
CN
China
Prior art keywords
medical image
region
type
information
area
Prior art date
Legal status
Active
Application number
CN201910167844.8A
Other languages
Chinese (zh)
Other versions
CN109919928A (en)
Inventor
周洪宇
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201910167844.8A
Publication of CN109919928A
Application granted
Publication of CN109919928B

Landscapes

  • Image Analysis (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The embodiment of the invention discloses a medical image detection method, a medical image detection device, and a storage medium. The method and the device can acquire the medical image and pathological text information of an object to be detected, predict the pathology type of the medical image to obtain prediction information, and recognize the pathological text information through a trained multilayer perceptron to obtain reference information; the prediction information and the reference information are then fused to obtain a prediction result, and when the prediction result indicates that the medical image is of the target pathology type, a region conforming to the target pathology type is detected in the medical image to obtain a detection result. The scheme can improve the accuracy of classification and positioning and the reliability of detection results.

Description

Medical image detection method and device and storage medium
Technical Field
The invention relates to the technical field of communication, in particular to a medical image detection method, a medical image detection device and a storage medium.
Background
In recent years, with the rapid development of artificial intelligence, the application of artificial intelligence in the medical field is increasing, especially in the detection of medical images.
There are various ways of acquiring medical images, of which X-ray imaging is the most common. X-rays are widely used in radiology today, mainly to detect lesions in bone and soft tissue. However, most existing detection schemes for medical images, such as X-ray images, are designed for relatively simple natural images; they are difficult to apply to medical images with more specialized structures, such as lung images. In particular, when locating a lesion region in a medical image, the large amount of background easily produces false positives, causing localization errors and greatly reducing the reliability of the detection result.
Disclosure of Invention
The embodiment of the invention provides a medical image detection method, a medical image detection device and a storage medium, which can improve the positioning accuracy and further improve the reliability of a detection result.
The embodiment of the invention provides a medical image detection method, which comprises the following steps:
acquiring medical images and pathological text information of an object to be detected;
predicting the pathological type of the medical image to obtain prediction information;
identifying the pathological text information through a trained multilayer perceptron (MLP) to obtain reference information;
fusing the prediction information and the reference information to obtain a prediction result;
and when the prediction result indicates that the medical image is the target pathology type, detecting a region which accords with the target pathology type from the medical image to obtain a detection result.
Correspondingly, an embodiment of the present invention further provides a medical image detection apparatus, including:
the acquiring unit is used for acquiring medical images and pathological text information of the object to be detected;
the prediction unit is used for predicting the pathological type of the medical image to obtain prediction information;
the recognition unit is used for recognizing the pathological text information through the trained multilayer perceptron to obtain reference information;
the fusion unit is used for fusing the prediction information and the reference information to obtain a prediction result;
and the detection unit is used for detecting a region which accords with the target pathological type from the medical image to obtain a detection result when the prediction result indicates that the medical image is the target pathological type.
Optionally, in some embodiments, the identification unit is specifically configured to perform one-hot encoding on the pathological text information to obtain coded information, and to recognize the coded information through the trained multilayer perceptron to obtain reference information.
Optionally, in some embodiments, the fusion unit is specifically configured to perform feature splicing on the prediction information and the reference information to obtain spliced information, and perform convolution processing on the spliced information to obtain a prediction result.
Optionally, in some embodiments, the detection unit includes a feature extraction subunit, a screening subunit, and a classification subunit;
the feature extraction subunit is configured to screen feature information that meets the target pathology type from the medical image;
the screening subunit is used for determining the region of the screened characteristic information in the medical image to obtain an alternative region and a region type probability;
the classification subunit is configured to select a candidate region with a region type probability greater than a threshold as a detection region, acquire a position of the detection region, and generate a detection result according to the detection region and the position of the detection region.
Optionally, in some embodiments, the feature extraction subunit is specifically configured to, through a trained detection model, perform feature extraction on the medical image by using different receptive fields to obtain a feature map of the medical image, and screen feature information that meets the target pathology type from the feature map of the medical image.
Optionally, in some embodiments, the trained detection model at least includes a convolutional neural network and a pyramid network, and the feature extraction subunit is specifically configured to perform feature extraction on the medical image through the convolutional neural network to obtain feature information output by the plurality of convolutional layers, process the feature information output by the plurality of convolutional layers through the pyramid network, and generate a feature map of the medical image according to a processing result of each layer in the pyramid network.
Optionally, in some embodiments, the medical image detection apparatus further includes an acquisition unit and a training unit;
the acquisition unit is used for acquiring a plurality of medical image samples marked with real values of the region types;
the training unit is used for dividing the medical image sample into a foreground area and a background area through a preset detection model; determining an area type predicted value of the foreground area, and determining an area type confidence coefficient according to the area type predicted value and an area type true value; constructing a loss function according to the region type confidence coefficient and a preset adjusting coefficient, wherein the adjusting coefficient is used for adjusting the influence degree of the background region on feature learning; and adopting the loss function to converge the detection model to obtain the trained detection model.
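The summary above does not spell out the loss function beyond a confidence term and an adjustment coefficient that limits how much the background region drives feature learning. A focal-loss-style weighting is one common realization of such a coefficient; the sketch below is an assumption for illustration (the `gamma` and `alpha` values are not taken from the patent):

```python
import math

def focal_loss(p, is_foreground, gamma=2.0, alpha=0.25):
    """Focal-loss-style term: the (1 - p_t)**gamma factor acts as an
    adjustment coefficient that shrinks the contribution of easy
    (mostly background) regions to feature learning."""
    p_t = p if is_foreground else 1.0 - p
    weight = alpha if is_foreground else 1.0 - alpha
    return -weight * (1.0 - p_t) ** gamma * math.log(max(p_t, 1e-12))

# A confidently classified background region contributes far less loss
# than a hard foreground region with the same predicted probability.
easy_bg = focal_loss(0.1, is_foreground=False)  # p_t = 0.9, well classified
hard_fg = focal_loss(0.1, is_foreground=True)   # p_t = 0.1, badly classified
```

With these values the easy background term is several hundred times smaller than the hard foreground term, which is exactly the effect of damping the background region's influence.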
Optionally, in some embodiments, the training unit is specifically configured to perform feature extraction on the medical image sample by using different receptive fields through a preset detection model to obtain a feature map of the medical image sample, and divide the medical image sample into a foreground region and a background region according to the feature map of the medical image sample.
Optionally, in some embodiments, the detection model at least includes a convolutional neural network and a pyramid network, and the training unit is specifically configured to perform feature extraction on the medical image sample through the convolutional neural network to obtain feature information output by the plurality of convolutional layers, process the feature information output by the plurality of convolutional layers through the pyramid network, and generate a feature map of the medical image sample according to a processing result of each layer in the pyramid network.
Optionally, in some embodiments, the training unit is specifically configured to screen a plurality of candidate regions and region type probabilities that satisfy a target pathology type according to a feature map of the medical image sample, select a candidate region having a region type probability greater than a threshold as a foreground region, and obtain a region other than the foreground region in the medical image sample to obtain a background region.
Optionally, in some embodiments, the training unit is specifically configured to, when the degree of overlap between the candidate regions is greater than a set value, select, as the filtered candidate region, the candidate region with the highest region type probability in the candidate regions where overlap is generated, and reduce the region type probabilities of other candidate regions, except for the filtered candidate region, in the candidate regions where overlap is generated; and selecting the filtered candidate area with the area type probability larger than the threshold value as the foreground area.
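The filtering just described, which keeps the highest-probability candidate among overlapping regions and merely reduces (rather than discards) the probabilities of the others, resembles soft non-maximum suppression. A minimal pure-Python sketch follows; the decay factor and overlap threshold are illustrative choices, not values from the patent:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

def soft_filter(boxes, scores, overlap_thresh=0.5, decay=0.5):
    """Keep the highest-scoring box among overlapping candidates and
    reduce the scores of the overlapping lower-ranked ones."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    out = list(scores)
    for rank, i in enumerate(order):
        for j in order[rank + 1:]:
            if iou(boxes[i], boxes[j]) > overlap_thresh:
                out[j] *= decay
    return out

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
adjusted = soft_filter(boxes, scores)
# The second box overlaps the first heavily, so its score is decayed;
# the isolated third box is untouched.
```

Candidates whose decayed probability still exceeds the threshold remain eligible as foreground regions, matching the behavior described above.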
Optionally, in some embodiments, the training unit is specifically configured to classify the foreground region according to the feature map by using a classification regression network, so as to obtain a region type prediction value of the foreground region.
In addition, the embodiment of the present invention further provides a storage medium, where the storage medium stores a plurality of instructions, and the instructions are suitable for being loaded by a processor to perform the steps in any one of the medical image detection methods provided by the embodiments of the present invention.
The embodiment of the invention can predict the pathological type of the medical image of the detection object, simultaneously recognize the pathological text information of the detection object by the trained multilayer perceptron, then obtain a prediction result by combining the two results, and detect the area which is in accordance with the target pathological type from the medical image to obtain a detection result (namely, position the area which is in accordance with the target pathological type) when the prediction result indicates that the medical image is the target pathological type such as pneumonia; according to the scheme, when the pathological type of the medical image is predicted, pathological text information is introduced as a reference factor, so that the accuracy of classification can be improved; moreover, the subsequent positioning of the region conforming to a certain target pathological type is based on the prediction that the medical image has the region of the target pathological type, so that the positioning accuracy can be improved, namely the reliability of the detection result can be improved by the scheme as a whole.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic view of a scene of a medical image detection method according to an embodiment of the present invention;
FIG. 2 is a flow chart of a method for detecting medical images according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a detection model provided in an embodiment of the present invention;
FIG. 4 is a schematic connection diagram of a convolutional neural network and a pyramid network in a detection model according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of an alternative area network provided in an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a classification regression network according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a classification model provided by an embodiment of the present invention;
FIG. 8 is another connection diagram of the convolutional neural network and the pyramid network provided by the embodiment of the present invention;
FIG. 9 is another flowchart of a method for medical image detection according to an embodiment of the present invention;
fig. 10 is a scene schematic diagram of a medical image detection method according to an embodiment of the present invention;
FIG. 11 is a diagram illustrating an example of candidate regions in a medical image detection method according to an embodiment of the present invention;
FIG. 12 is a schematic structural diagram of a medical image detection apparatus according to an embodiment of the present invention;
fig. 13 is another schematic structural diagram of a medical image detection apparatus provided in an embodiment of the present invention;
fig. 14 is a schematic structural diagram of a network device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention provides a medical image detection method, a medical image detection device and a storage medium. The medical image detection apparatus may be specifically integrated in a network device, such as a terminal or a server.
For example, taking the case that the detection of the medical image is integrated in a network device, referring to fig. 1, the network device may acquire a medical image and pathological text information of an object to be detected. Then, on one hand, it performs pathology type prediction on the medical image to obtain prediction information; on the other hand, it recognizes the pathological text information through a trained multilayer perceptron to obtain reference information, and adjusts the prediction information using the reference information (for example, the prediction information and the reference information may be fused) to determine a final prediction result. When the prediction result indicates that the medical image is of a target pathology type, such as pneumonia, a region conforming to the target pathology type is detected in the medical image to obtain a detection result, where the detection result may include information such as the type and specific location of the region.
The following are detailed below. It should be noted that the following description of the embodiments is not intended to limit the preferred order of the embodiments.
The embodiment will be described in terms of a medical image detection apparatus, which may be specifically integrated in a network device, where the network device may be a terminal or a server, and the terminal may include a mobile phone, a tablet computer, a notebook computer, a personal computer (PC), a medical device, or the like.
A method for medical image detection, comprising: the method comprises the steps of obtaining a medical image and pathological text information of an object to be detected, conducting pathological type prediction on the medical image to obtain prediction information, identifying the pathological text information through a trained multilayer perceptron to obtain reference information, fusing the prediction information and the reference information to obtain a prediction result, and detecting an area which is in accordance with a target pathological type from the medical image to obtain a detection result when the prediction result indicates that the medical image is the target pathological type.
As shown in fig. 2, the specific process of the medical image detection method may be as follows:
101. Acquire the medical image and pathological text information of the object to be detected.
For example, the medical image of the object to be detected sent by the medical image acquisition device may be received, or the medical image of the object to be detected input by the user (such as importing, shooting, or scanning) may be received from the device (i.e., the device where the detection apparatus of the medical image is located), or the stored medical image of the object to be detected may be read from the device, and so on.
The detection object refers to a living tissue to be detected, and the medical image may be an X-ray image or other two-dimensional image, and may be obtained by acquiring an image of the living tissue through a medical image acquisition device, such as a medical detection device or a medical monitoring device. The living tissue refers to a component of a living body (an independent individual with a living form is a living body and can correspondingly reflect external stimulation), such as a lung, intestines and stomach, or a heart of a human body, and intestines and stomach, even an oral cavity or skin of other animals.
Similarly, the pathological text information may also be obtained from other devices, such as other servers or terminals, or may also be obtained from the device (i.e., the device where the detection apparatus of the medical image is located), for example, the stored pathological text information is directly read from the device, or pathological text information that the user inputs (such as importing, shooting, or scanning) is received by the user at the device.
The pathological text information refers to the patient information recorded in the text form, and may include personal information and medical record information of the patient. The personal information may include name, gender, age, and/or occupation, and the medical record information may include past examination, diagnosis, and treatment records of the patient.
It should be noted that, when the medical image and the pathological text information of the object to be detected are acquired, they may be acquired simultaneously or separately, and when they are acquired separately, the acquiring order of the medical image and the pathological text information may not be sequential.
102. Predict the pathology type of the medical image (namely the medical image of the object to be detected) to obtain prediction information.
For example, a trained densely connected convolutional network (DenseNet) may be used to predict the pathology type of the medical image, so as to obtain prediction information.
For example, the medical image may be convolved by a convolution layer, the result of the convolution processing is maximally pooled by a pooling layer, then the maximal pooling result is processed by different dense blocks in sequence, and finally the output result of the dense blocks is classified by a classification layer, so as to obtain the prediction information. Different dense blocks can be connected through a transition layer, and the transition layer is used for adjusting output data of the current dense block into data meeting input requirements of the next dense block.
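The pipeline above (initial convolution, max pooling, dense blocks joined by transition layers, classification layer) implies simple channel bookkeeping: in a dense block each layer's output is concatenated onto everything it received, and the transition layer compresses channels before the next block. The sketch below traces this with DenseNet-121-style layer counts and growth rate; these figures are assumptions, since the patent leaves network parameters to the practical application:

```python
def dense_block(channels, growth_rate, num_layers):
    """In a dense block each layer's output channels are concatenated
    onto its input, so the channel count grows linearly with depth."""
    for _ in range(num_layers):
        channels += growth_rate
    return channels

def transition(channels, compression=0.5):
    """Transition layer (1x1 conv + pooling) compresses channels so the
    output of one dense block matches the next block's expected input."""
    return int(channels * compression)

# Channel bookkeeping for a DenseNet-121-like configuration (assumed):
c = 64  # after the initial convolution and max pooling
for num_layers in (6, 12, 24):
    c = transition(dense_block(c, growth_rate=32, num_layers=num_layers))
c = dense_block(c, growth_rate=32, num_layers=16)  # final block, no transition
# c is the channel count fed to the classification layer
```

This illustrates why the transition layer is needed: without the compression step, each dense block would hand the next block an ever-larger input than it was configured for.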
The network parameters of the trained densely-connected convolutional network can be set according to the requirements of practical application.
In addition, it should be noted that the trained densely connected convolutional network may be preset by maintenance personnel, or may be trained by the medical image detection apparatus itself. For example, a plurality of medical image samples labeled with pathology types may be collected; the pathology types of the medical image samples are then predicted by a preset densely connected convolutional network, and the network is converged according to the predicted and labeled pathology types, yielding the trained densely connected convolutional network.
103. Identifying pathological text information (namely pathological text information of an object to be detected) through a trained multilayer perceptron (MLP) to obtain reference information; for example, the following may be specifically mentioned:
and carrying out one-hot coding (one-hot coding) on the pathological text information to obtain coded information, and identifying the coded information through the trained MLP to obtain reference information.
One-hot encoding is an encoding method, usually used to convert text information into a vector representation, in which N states are encoded with N-bit status registers; each state has its own register bit and only one bit is valid at any time, so it may also be referred to as one-bit-valid encoding. The trained MLP is a feedforward artificial neural network with multiple layers, which can map a set of inputs onto a single output data set for text recognition.
For example, taking the pathological text information including the sex and age information of the patient as an example, at this time, the sex and age information may be encoded separately, and then the encoded sex and age information may be identified through the trained MLP, and the identification result may be used as reference information, and so on.
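A minimal sketch of this per-field one-hot encoding follows. The field vocabularies (`SEX`, `AGE_BUCKETS`) and the bucket boundaries are hypothetical, since the patent does not specify how the fields are discretized:

```python
def one_hot(value, vocabulary):
    """N-bit encoding: one register bit per state, exactly one bit set."""
    vec = [0] * len(vocabulary)
    vec[vocabulary.index(value)] = 1
    return vec

# Hypothetical field vocabularies for illustration only.
SEX = ["female", "male"]
AGE_BUCKETS = ["0-18", "19-40", "41-65", "65+"]

def encode_patient(sex, age_bucket):
    # Encode each field separately, then concatenate into one MLP input.
    return one_hot(sex, SEX) + one_hot(age_bucket, AGE_BUCKETS)

vec = encode_patient("male", "41-65")
```

The concatenated vector is what the trained MLP would consume to produce the reference information.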
Optionally, the trained MLP may be preset by a maintenance person, or may be trained by the detection device of the medical image, for example, a plurality of pathological text information samples are collected, then, text recognition is performed on the pathological text information samples by using a preset MLP to obtain a text recognition result, and the MLP is converged according to the text recognition result and the actual text content of the pathological text information samples, so that the trained MLP is obtained.
It should be noted that, in the embodiment of the present invention, the network in which the trained densely-connected convolutional network and the trained MLP are integrated may be referred to as a classification model.
104. Fusing the prediction information and the reference information to obtain a prediction result; for example, the following may be specifically mentioned:
performing feature splicing on the prediction information and the reference information to obtain spliced information, performing convolution processing on the spliced information to obtain a prediction result, for example, performing convolution processing on the spliced information through a convolution layer (conv), and outputting a result, namely the prediction result.
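A toy sketch of the splicing-then-convolution fusion: concatenating the two branches and applying a 1 × 1-convolution-style weighted reduction. The weights below are illustrative, not learned parameters from the patent:

```python
def fuse(prediction, reference, weights, bias=0.0):
    """Feature splicing followed by a toy 1x1 convolution: the spliced
    vector is reduced to a single fused score by a learned weighting
    (here fixed for illustration)."""
    spliced = list(prediction) + list(reference)  # feature splicing
    assert len(weights) == len(spliced)
    return sum(w * x for w, x in zip(weights, spliced)) + bias

# The image branch gives a borderline 0.55 pneumonia score; the text
# branch (e.g., age and history) pushes the fused result higher.
score = fuse([0.55], [0.8], weights=[0.7, 0.3])
```

In the real model the weights of this convolution layer are learned jointly, so the network decides how much the pathological text information should sway the image-based prediction.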
105. When the prediction result indicates that the medical image (i.e. the medical image of the object to be detected) is the target pathology type, detecting a region which conforms to the target pathology type from the medical image, and obtaining a detection result.
For example, if the medical image is an X-ray image of a lung and the target pathology type is pneumonia, then, when the prediction result indicates that the X-ray image is an image corresponding to pneumonia, a region conforming to the pneumonia in the medical image may be detected to obtain a detection result. For another example, if the target pathological type is tuberculosis, then, when the prediction result indicates that the X-ray image is an image corresponding to tuberculosis, a region corresponding to the tuberculosis may be detected from the medical image, and a detection result may be obtained, and so on.
Optionally, there are various ways to detect the region conforming to the target pathology type from the medical image, for example, the following steps may be specifically included:
(1) Screen feature information meeting the target pathology type from the medical image.
For example, the feature extraction may be performed on the medical image by using different receptive fields through a trained detection model to obtain a feature map of the medical image, and then feature information meeting the target pathology type is screened from the feature map of the medical image.
As shown in fig. 3, the trained detection model may include at least a convolutional neural network and a pyramid network (feature pyramid network), and may further include a candidate region network and a classification regression network (with RoI pooling).
If the trained detection model includes a convolutional neural network and a pyramid network, the step of "extracting the features of the medical image by using different receptive fields through the trained detection model to obtain the feature map of the medical image" may include:
and performing feature extraction on the medical image through the convolutional neural network to obtain feature information output by the plurality of convolutional layers, processing the feature information output by the plurality of convolutional layers through the pyramid network, and generating a feature map of the medical image according to a processing result of each layer in the pyramid network.
The number of convolutional layers of the convolutional neural network and the number of layers of the pyramid network may be set according to the requirements of practical applications, for example, the number of layers of the pyramid network may be set to 4, and so on. If the number of the convolutional layers of the convolutional neural network is greater than the number of the layers of the pyramid network, a plurality of convolutional layers can be selected from the convolutional neural network as convolutional layers corresponding to the pyramid network, so that the plurality of convolutional layers correspond to the layers of the pyramid network one by one, in this way, the feature information output by the plurality of convolutional layers can be processed through the pyramid network respectively, and a feature map of the medical image is generated according to the processing result of each layer in the pyramid network.
For example, as shown in fig. 4, take convolutional layers i, i+1, and i+2 of the convolutional neural network as the layers corresponding to the pyramid network. The feature information output by convolutional layer i+2 is processed by layer t+2 of the pyramid network to obtain processing result 1, which is output on the one hand and, on the other hand, fused with the feature information output by convolutional layer i+1 to serve as the input of layer t+1 of the pyramid network. Layer t+1 processes this input to obtain processing result 2, which is likewise output and also fused with the feature information output by convolutional layer i to serve as the input of layer t, yielding processing result 3, and so on. In this way the processing result of each layer in the pyramid network is obtained, and the feature map of the medical image can then be generated from the processing results of all layers.
Because the processing result of each layer of the pyramid corresponds to the features of different receptive fields, the obtained feature map comprises feature information of different scales, that is, the lesions of different sizes can be detected.
It should be noted that, when each pyramid layer fuses its processing result with the feature information output by the corresponding convolutional layer, the two may first be preprocessed so that their dimensions and sizes match, and then fused; for example, referring to fig. 5, the processing result may be upsampled by a factor of 2, the feature information output by the convolutional layer may be passed through a 1 × 1 convolution (conv), and the upsampled result and the convolution result may then be fused, and so on.
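The top-down fusion just described (upsample the coarser level, then add it element-wise to the laterally connected finer level) can be sketched on 1-D toy features. The real model operates on 2-D feature maps and applies the 1 × 1 lateral convolution that is omitted here:

```python
def upsample2x(row):
    """Nearest-neighbour 2x upsampling of a 1-D feature row."""
    out = []
    for v in row:
        out += [v, v]
    return out

def top_down_merge(coarse, fine):
    """Pyramid-style lateral fusion: upsample the coarser level so its
    size matches the finer level, then add the two element-wise."""
    up = upsample2x(coarse)
    assert len(up) == len(fine), "sizes must match before fusing"
    return [a + b for a, b in zip(up, fine)]

# A coarse level of length 2 merges into the next finer level of length 4.
merged = top_down_merge([1.0, 2.0], [0.5, 0.5, 0.5, 0.5])
```

Repeating this merge down the pyramid is what makes each level's output carry both coarse semantic context and fine spatial detail, so lesions of different sizes remain detectable.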
After obtaining the feature map of the medical image, feature information satisfying the target pathology type may be screened from the feature map of the medical image, for example, if the target pathology type is "pneumonia", in this case, feature information satisfying "pneumonia" may be screened from the feature map of the medical image.
(2) Determine the region of the screened feature information in the medical image to obtain candidate regions (proposals) and region type probabilities.
For example, the region of the screened feature information in the medical image may be specifically determined by the candidate region network, so as to obtain the candidate regions and the region type probabilities of the respective candidate regions.
For example, referring to fig. 5, the candidate area network may include a 3 × 3 convolutional layer, two 1 × 1 convolutional layers, and an activation function (sigmoid) layer, specifically, the 3 × 3 convolutional layers may be used to process the screened feature information, then the output of the 3 × 3 convolutional layers is respectively transmitted to the two 1 × 1 convolutional layers, the output of one 1 × 1 convolutional layer is transmitted to the activation function layer for processing, and then the region of the "screened feature information" in the medical image is predicted by combining the output of the activation function layer and the output of the other 1 × 1 convolutional layer, so as to obtain the candidate region and the region type probability of each candidate region.
(3) Select candidate regions with a region type probability greater than a threshold as detection regions, obtain the positions of the detection regions, and then generate the detection result according to the detection regions and their positions.
For example, a candidate region with a region type probability greater than a threshold may be specifically selected as a detection region (i.e., a suspected lesion region) through a classification regression network, and a position of the detection region is obtained, where the detection region and a corresponding position are a detection result.
For example, referring to fig. 6, the classification regression network may include a region of interest pooling layer (RoI Pooling), a fully connected layer (FC), and a classification and regression module. Specifically, a plurality of regions of interest (RoIs) may be selected from the candidate regions; the regions of interest and a feature map output by the pyramid network (such as the feature map of the medical image) are transmitted to the RoI pooling layer for pooling; the pooled data is fully connected via the fully connected layer and input to the classification and regression module, which determines the category to which each region belongs and locates its position, so as to obtain the desired detection region and the position of the detection region. In the embodiment of the present invention, the detection region and its position are referred to as the detection result.
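RoI pooling, the step that turns variably sized regions of interest into fixed-size features, can be sketched as follows (a simplified sketch with integer bin boundaries; real implementations handle sub-pixel coordinates):

```python
import numpy as np

def roi_pool(feature, roi, out_size=2):
    # feature: (C, H, W) map; roi: (x1, y1, x2, y2) in feature-map
    # coordinates. The RoI is divided into out_size x out_size bins and
    # each bin is max-pooled, giving a fixed-size output regardless of
    # the RoI's shape -- which is what lets the fully connected layer
    # that follows accept RoIs of any size.
    c = feature.shape[0]
    x1, y1, x2, y2 = roi
    xs = np.linspace(x1, x2, out_size + 1).astype(int)
    ys = np.linspace(y1, y2, out_size + 1).astype(int)
    out = np.zeros((c, out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            ya, yb = ys[i], max(ys[i + 1], ys[i] + 1)
            xa, xb = xs[j], max(xs[j + 1], xs[j] + 1)
            out[:, i, j] = feature[:, ya:yb, xa:xb].max(axis=(1, 2))
    return out
```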
Optionally, the trained detection model may be set in advance by operation and maintenance personnel, or may be obtained by training performed by the medical image detection device itself; that is, before the step "performing feature extraction on the medical image by using different receptive fields through the trained detection model to obtain a feature map of the medical image", the method may further include:
A. Collect a plurality of medical image samples labeled with region type real values.
For example, in addition to directly obtaining a plurality of medical image samples labeled with region type real values, a plurality of medical image samples may be obtained from a medical image database, or collected by a medical image acquisition device, and the region type real values may then be labeled on them; taking pneumonia as an example, a detection frame for pneumonia may be labeled at this time, and so on.
B. Divide the medical image sample into a foreground region and a background region through a preset detection model.
for example, feature extraction may be performed on a medical image sample by using different receptive fields through a preset detection model to obtain a feature map of the medical image sample, and the medical image sample is divided into a foreground region and a background region according to the feature map of the medical image sample.
The detection model may include at least a convolutional neural network and a pyramid network, and may further include a candidate area network, a classification regression network, and the like.
If the detection model includes a convolutional neural network and a pyramid network, the step of "performing feature extraction on the medical image sample by using different receptive fields through a preset detection model to obtain a feature map of the medical image sample" may include:
and performing feature extraction on the medical image sample through the convolutional neural network to obtain feature information output by the plurality of convolutional layers, processing the feature information output by the plurality of convolutional layers through the pyramid network, and generating a feature map of the medical image sample according to a processing result of each layer in the pyramid network.
Then, the medical image sample may be divided into a foreground region and a background region according to its feature map. For example, a plurality of candidate regions satisfying the target pathology type, together with their region type probabilities, may be screened according to the feature map of the medical image sample; candidate regions whose region type probability is greater than a threshold are selected as foreground regions; and the regions of the medical image sample other than the foreground regions constitute the background region. The threshold may be set according to the requirements of the practical application.
Optionally, candidate regions may overlap. In that case, if the overlap exceeds a certain degree, the candidate region with the lower region type probability is usually discarded entirely, which may cause some reliable labeled regions (i.e., regions labeled with region type real values, also referred to as detection boxes) to be omitted. To retain these reliable labeled regions, a soft non-maximum suppression algorithm (Soft-NMS) may be used: when the degree of overlap between candidate regions is greater than a set value, the candidate region with the lower region type probability is not discarded entirely; instead, its region type probability is reduced. That is, before the step "selecting candidate regions with a region type probability greater than a threshold as foreground regions", the medical image detection method may further include:
When the degree of overlap between candidate regions is greater than a set value, select the candidate region with the highest region type probability among the overlapping candidate regions as the filtered candidate region, and reduce the region type probabilities of the other overlapping candidate regions (that is, among the overlapping candidate regions, the one with the highest region type probability is retained, and the region type probabilities of the others are reduced, rather than those regions being discarded entirely).
Then, at this time, the step "selecting candidate regions with a region type probability greater than a threshold as foreground regions" may specifically be: selecting the filtered candidate regions with a region type probability greater than the threshold as foreground regions.
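The score-decay behavior described above can be sketched with the linear variant of Soft-NMS (the decay rule `score *= 1 − IoU` is one common choice; the patent does not fix a specific decay function):

```python
import numpy as np

def iou(a, b):
    # intersection-over-union of two (x1, y1, x2, y2) boxes
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def soft_nms(boxes, scores, iou_thresh=0.5, score_thresh=0.01):
    # Linear Soft-NMS: the highest-scoring box of an overlapping group is
    # kept; overlapping boxes are not discarded outright but have their
    # region type probabilities decayed, so reliable regions can survive.
    boxes, scores = [list(b) for b in boxes], list(scores)
    kept = []
    while scores:
        best = max(range(len(scores)), key=scores.__getitem__)
        box, s = boxes.pop(best), scores.pop(best)
        if s < score_thresh:
            break
        kept.append((box, s))
        for k in range(len(scores)):
            ov = iou(box, boxes[k])
            if ov > iou_thresh:
                scores[k] *= (1.0 - ov)  # decay instead of discard
    return kept
```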
C. Determine the region type predicted value of the foreground region, and determine the region type confidence according to the region type predicted value and the region type real value.
For example, the foreground region may be classified by using a classification regression network according to the feature map to obtain a region type predicted value of the foreground region, and the region type confidence may be determined according to the region type predicted value and the region type true value.
The region type predicted value may include information such as the region type and the region type probability; for example, a region may be a pneumonia region with probability 80%, or a tuberculosis region with probability 60%, and so on.
For example, if the predicted region type t is consistent with the real region type, the prediction is true and y may be set to 1; otherwise, the prediction is false and y may be set to 0. When y is 1, the confidence p_t of the predicted region type t is the region type probability p corresponding to t; when y is not 1 (for example, 0), p_t is the difference between 1 and p. Expressed as a formula:

p_t = p, if y = 1
p_t = 1 − p, otherwise
D. Construct a loss function (Focal Loss) according to the region type confidence and a preset adjustment coefficient, where the adjustment coefficient is used to adjust the degree of influence of the background region on feature learning.
For example, the loss function FL(p_t) may be as follows:

FL(p_t) = −α_t (1 − p_t)^γ log(p_t)

where p_t is the confidence of the predicted region type t, α_t is a balance coefficient, and γ is the influence degree adjustment coefficient; the larger γ is, the more the detection model emphasizes the foreground region (i.e., the smaller the influence of the background region on feature learning).
The values of the balance coefficient α_t and the influence degree adjustment coefficient γ may be set according to the requirements of the practical application; for example, α_t may be set to 0.25 for the background region and 0.75 for the foreground region, with γ set to 1, and so on. Experiments show that, under this parameter setting, compared with a traditional loss function such as cross entropy loss, this loss function can better learn pneumonia features against background images, thereby improving positioning accuracy.
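The confidence and loss formulas above can be combined into a few lines (using the example settings from the text, α_t = 0.75 for foreground / 0.25 for background and γ = 1, and assuming y = 1 corresponds to a foreground example):

```python
import math

def region_confidence(p, y):
    # p_t from the piecewise formula: the predicted probability when the
    # prediction matches the ground truth (y = 1), else 1 - p
    return p if y == 1 else 1.0 - p

def focal_loss(p, y, alpha_fg=0.75, alpha_bg=0.25, gamma=1.0):
    # FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t): the (1 - p_t)^gamma
    # factor down-weights easy, high-confidence examples, so abundant
    # background regions contribute less to feature learning
    p_t = region_confidence(p, y)
    alpha_t = alpha_fg if y == 1 else alpha_bg
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

A well-classified example (p = 0.9, y = 1) yields a much smaller loss than a marginal one (p = 0.6, y = 1), which is exactly the emphasis-shifting effect γ controls.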
E. Converge the detection model by using the loss function to obtain the trained detection model.
Optionally, in order to accelerate training, the model may be trained in a warm-start manner; for example, the initial learning rate of the detection model may be set to 0.001, and when training reaches the three-quarter mark, the learning rate may be reduced to 0.0001, and so on.
As can be seen from the above, in this embodiment, while pathology type prediction is performed on the medical image of the detection object, the trained MLP may also be used to recognize the pathological text information of the detection object; the two results are then combined to obtain a prediction result, and when the prediction result indicates that the medical image is of a target pathology type, such as pneumonia, a region conforming to the target pathology type is detected from the medical image to obtain a detection result (i.e., the region conforming to the target pathology type is located). Because this scheme introduces pathological text information as a reference factor when predicting the pathology type of the medical image, classification accuracy can be improved; moreover, because the subsequent positioning of regions conforming to a certain target pathology type proceeds only when the medical image is predicted to contain a region of that type, positioning accuracy can also be improved. Overall, the scheme improves the reliability of the detection result.
The method described in the above examples is further illustrated in detail below by way of example.
In this embodiment, the medical image detection apparatus is described as being integrated in a network device; a classification model is used to process the medical image and the pathological text information of the detection object respectively, and a detection model is used to perform further detection based on the prediction result obtained by the classification model.
And (I) training a classification model.
As shown in fig. 7, the classification model may include a dense connection convolutional network (DenseNet) and MLP, and may further include a concatenation module and a convolutional layer (conv).
The method for training the classification model may be various, for example, the classification model may be trained separately for each branch, or the classification model may be integrally trained as follows:
the first mode is as follows: and (5) training respectively.
Specifically, the network device may collect a plurality of medical image samples labeled with a pathological type to train the dense connection convolutional network in the classification model, and collect a plurality of pathological text information samples to train the MLP in the classification model, for example, taking the pathological type as pneumonia as an example, the training processes of the dense connection convolutional network and the MLP may be as follows:
(1) Training the densely connected convolutional network.
The network device acquires a plurality of medical image samples with label values (the label value indicates whether pneumonia exists in the medical image sample), predicts whether pneumonia exists in each medical image sample through a preset densely connected convolutional network, and then converges the densely connected convolutional network according to the prediction results and the label values, thereby obtaining the trained densely connected convolutional network.
The initial values of the network structure and the parameters of the preset dense connection convolutional network can be set according to the requirements of practical application. Optionally, in order to accelerate the training speed of the dense connected convolutional network, the initial value of the parameter may adopt a parameter of the dense connected convolutional network pre-trained on an ImageNet data set, where the ImageNet is a large visual database for the study of visual object recognition software.
For example, taking DenseNet-169 as a specific example of the densely connected convolutional network, the network parameters may be set as shown in table one.
Table one:
[Table one: DenseNet-169 network layer parameters; presented as an image in the original publication]
it should be noted that the above network layer number and network parameters are merely examples, and it should be understood that both the network layer number and the network parameters may be set according to the requirements of the practical application, and are not listed here.
(2) Training the MLP.
The network device collects a plurality of pathological text information samples, performs text recognition on them through a preset MLP to obtain text recognition results, and then converges the MLP according to the text recognition results and the actual text content of the samples, thereby obtaining the trained MLP.
The preset MLP may be initialized by using a gaussian distribution with a variance of 0.01 and a mean of 0.
Mode 2: overall training.
Optionally, besides training the densely connected convolutional network and the MLP separately, the classification model may also be trained as a whole. For example, a plurality of sample pairs may be provided, where each sample pair includes a medical image sample with a label value (i.e., a pathology type, such as whether pneumonia is indicated) and the pathological text information sample corresponding to that medical image sample. The medical image sample and the pathological text information sample in each pair are then processed respectively by the preset densely connected convolutional network and MLP (i.e., the densely connected convolutional network predicts whether pneumonia exists in the medical image sample, and the MLP performs text recognition on the pathological text information sample), so as to obtain prediction information and a text recognition result. The prediction information and the text recognition result are fused; for example, the prediction confidence and the text recognition result may be feature-spliced to obtain spliced information, and convolution processing is then performed on the spliced information to obtain a prediction result for the sample pair, such as "pneumonia present" or "pneumonia absent". The classification model is converged according to the prediction results and the label values of the sample pairs, thereby obtaining the trained classification model.
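The splice-then-convolve fusion of the two branches can be sketched as follows (a minimal sketch; modeling the convolution stage as a single linear map plus sigmoid is an assumption, as are the vector sizes):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fuse_branches(pred_info, ref_info, w, b):
    # feature-splice (concatenate) the image branch's prediction
    # information with the text branch's recognition result, then apply
    # a linear map plus sigmoid standing in for the conv layer that
    # produces the final prediction result
    spliced = np.concatenate([pred_info, ref_info])
    return sigmoid(w @ spliced + b)
```

Because the spliced vector carries both branches, the final score can be pushed up or down by the pathological text even when the image branch is uncertain.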
The classification model after training comprises a dense connection convolution network after training, an MLP after training and the like.
It should be noted that, in order to enable the classification model to learn features better, its training process may be divided into 3 stages as required, with different learning rates in different stages; for example, the learning rate may be 0.03 in stage 1, 0.003 in stage 2, and 0.001 in stage 3, and so on. The densely connected convolutional network and the MLP use the same learning rate.
It should be noted that, when acquiring medical image samples, in addition to direct acquisition by a medical image acquisition device, the samples may also be acquired from a medical image database, such as that of the Radiological Society of North America (RSNA). In addition, among the acquired medical image samples, besides those used as a training set, a portion may also be selected as a verification set to verify the trained model, thereby further improving its accuracy.
Optionally, after the medical image samples are acquired, they may be preprocessed, for example by scaling, flipping, cropping, or denoising. For example, when training the classification model, the image size may be adjusted to 448 × 448 to reduce memory consumption; in addition, during training, the lung images may also be randomly sampled. These preprocessing steps can greatly expand the size of the training set and effectively suppress overfitting.
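A minimal sketch of the scaling and flipping steps (nearest-neighbour resizing is used here for simplicity; real pipelines typically interpolate):

```python
import numpy as np

def preprocess(img, size=448, hflip=False):
    # resize a 2D image to size x size via nearest-neighbour sampling,
    # then optionally flip horizontally -- the scaling/flipping
    # augmentations mentioned above
    h, w = img.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    out = img[np.ix_(rows, cols)]
    if hflip:
        out = out[:, ::-1]
    return out
```

Applying the flip randomly per sample (e.g. with probability 0.5) is what expands the effective training set.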
And (II) training a detection model.
The detection model may include a convolutional neural network, a pyramid network, a candidate area network, a classification regression network (Roi Pooling), and the like, and the specific structure of the detection model may be shown in fig. 3.
The structure of each network may be determined according to the requirements of the practical application. For example, the convolutional neural network may adopt a residual network (ResNet), such as ResNet-101, where the stride of the first layer of conv3-x and conv4-x may be 2, and each convolutional layer may be followed by an activation function, such as a ReLU layer, and a batch normalization layer. For example, referring to table two, the network parameters of ResNet-101 may be as follows:
table two:
[Table two: ResNet-101 network layer parameters; presented as an image in the original publication]
it should be noted that the above network layer number and network parameters are merely examples, and it should be understood that both the network layer number and the network parameters may be set according to the requirements of the practical application, and are not listed here.
Optionally, in order to accelerate the training speed of the detection model, the initial values of the ResNet network parameters may adopt the parameters of a ResNet pre-trained on the ImageNet dataset.
Similarly, the number of layers of the pyramid network may be set according to the requirements of the practical application; for example, it may be set to 4, and so on. If the number of convolutional layers of the convolutional neural network is greater than the number of pyramid layers, a plurality of convolutional layers may be selected from the convolutional neural network to correspond one-to-one with the pyramid layers; the feature information output by these convolutional layers is then processed by the pyramid network, and the feature map of the medical image sample is generated according to the processing result of each layer in the pyramid network. For details, refer to fig. 8 and the foregoing embodiment, which will not be repeated here.
Optionally, when training the detection model, the network device may solve the convolution template parameters w and bias parameters b of the convolutional neural network based on stochastic gradient descent (SGD): in each iteration, the prediction error is calculated and back-propagated through the convolutional neural network, the gradients are computed, and the network parameters are updated; after many iterations, the trained detection model can be obtained. For example, taking pneumonia as the pathology type, the training of the detection model may specifically be as follows:
(1) the network device collects a plurality of medical image samples marked with pneumonia areas (i.e. pneumonia detection boxes).
For example, the network device may obtain a plurality of medical image samples (e.g., 28,000 lung X-ray images) from a medical image database, such as RSNA, or receive a plurality of medical image samples sent by the medical image acquisition device, and then mark the pneumonia area on the medical image samples, and so on.
It should be noted that, in the obtained multiple medical image samples, besides the medical image samples may be used as a training set, a part, for example, one fifth (for example, 2,000 images) may also be selected as a verification set, so as to verify the trained detection model, thereby improving the accuracy of the detection model.
In addition, after a plurality of medical image samples are acquired, the medical image samples can be preprocessed, such as scaling, turning, shearing, denoising and the like.
In the embodiment of the present invention, the size of each medical image sample may be specifically adjusted to 1024 × 1024, and the horizontal flipping may be performed at the same time.
(2) And the network equipment performs feature extraction on the medical image sample through the convolutional neural network in the detection model to obtain feature information output by the plurality of convolutional layers.
(3) And the network equipment processes the feature information output by the plurality of convolutional layers through the pyramid network in the detection model respectively, and generates a feature map of the medical image sample according to the processing result of each layer in the pyramid network.
For example, referring to fig. 8, the convolutional neural network may be used to perform feature extraction on the medical image sample to obtain the feature information output by each convolutional layer, such as layers i, i+1, and i+2 (i being an integer greater than or equal to 0). The feature information output by convolutional layer i+2 may then be processed by layer t+2 of the pyramid network to obtain processing result 1. Processing result 1 is fused with the feature information output by convolutional layer i+1, and the fused result serves as the input of layer t+1 of the pyramid network, which processes it to obtain processing result 2. Similarly, processing result 2 is fused with the feature information output by convolutional layer i, and the fused result serves as the input of layer t of the pyramid network, which processes it to obtain processing result 3; processing result 3 is fused with the output of another convolutional layer of the convolutional neural network and then serves as the input of layer t−1 of the pyramid network, and so on, until the processing result of each layer in the pyramid network is obtained. The feature map of the medical image sample is then generated according to the processing results of all layers of the pyramid network.
(4) The network equipment divides the medical image sample into a foreground area and a background area according to the characteristic diagram of the medical image sample.
For example, if the target pathology type is pneumonia as an example, at this time, the network device may screen a plurality of candidate regions and region type probabilities that satisfy pneumonia according to the feature map of the medical image sample, then select a candidate region having a region type probability greater than a threshold as a foreground region, and obtain regions other than the foreground region in the medical image sample, to obtain a background region.
The threshold may be set according to the requirement of practical application, which is not described herein.
Optionally, if the candidate regions overlap, when the degree of overlap between the candidate regions is greater than the set value, the network device may select, as the filtered candidate region, the candidate region with the highest region type probability in the candidate regions where overlap occurs, and reduce the region type probabilities of other candidate regions except the filtered candidate region in the candidate regions where overlap occurs, so that some reliable labeled regions may be retained without being completely discarded.
(5) And the network equipment determines an area type predicted value of the foreground area and determines an area type confidence coefficient according to the area type predicted value and the area type real value.
For example, the network device may classify the foreground region according to the feature map by using the classification regression network to obtain the region type predicted value of the foreground region (for example, "pneumonia region, probability 80%"), and then determine the region type confidence according to the region type predicted value and the region type real value. For example, after the region type predicted value of the foreground region is obtained, the position of the foreground region may be determined according to the feature map, the region type real value may be obtained according to that position, and the region type confidence may then be determined from the predicted value and the real value, and so on.
The region type confidence p_t may be calculated as follows:

p_t = p, if y = 1
p_t = 1 − p, otherwise
where t is the region type (e.g., a pneumonia region or a tuberculosis region), p is the region type probability corresponding to the predicted region type t, and y is the result of comparing the predicted region type t with the real region type: if they are consistent, y is 1; otherwise y is not 1 (generally, y may be 0), and so on.
(6) The network device constructs a loss function FL(p_t) according to the region type confidence and a preset adjustment coefficient; for example, it may specifically be as follows:

FL(p_t) = −α_t (1 − p_t)^γ log(p_t)

where p_t is the confidence of the predicted region type t, α_t is a balance coefficient, and γ is the influence degree adjustment coefficient; the larger γ is, the more the detection model emphasizes the foreground region (i.e., the smaller the influence of the background region on feature learning). The values of α_t and γ may be set according to the requirements of the practical application; for example, α_t may be set to 0.25 for the background region and 0.75 for the foreground region, with γ set to 1, and so on, which will not be repeated here.
Due to the introduction of the adjustment coefficient gamma, the influence degree of the background region on the feature learning can be flexibly adjusted (the influence degree of the foreground region on the feature learning can also be considered as being adjusted), so that in the process of training the model, feature information such as pneumonia features can be better learned from the background image, the positioning accuracy is further improved, and the performance of the whole model is favorably improved.
(7) And the network equipment adopts the loss function to converge the detection model to obtain the trained detection model.
Optionally, when training the detection model, a warm start may be adopted to accelerate the training process; for example, at the beginning of training, a quarter of the learning rate may be used, the learning rate may then be linearly increased to 0.001, and when training reaches the three-quarter mark, the learning rate may be reduced to 0.0001 so that the model parameters are adjusted in small steps, steadily improving the model's performance, and so on.
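The warm-start schedule can be sketched as a simple function of the training step (the length of the warm-up window is an assumption; the text only says the ramp happens at the beginning of training):

```python
def warm_start_lr(step, total_steps, base_lr=0.001, warmup_frac=0.1):
    # start at a quarter of the base rate, ramp linearly up to base_lr
    # over the warm-up window, then drop tenfold at the three-quarter
    # mark, matching the 0.001 -> 0.0001 reduction described above
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        return base_lr * (0.25 + 0.75 * step / warmup_steps)
    if step < 3 * total_steps // 4:
        return base_lr
    return base_lr * 0.1
```

Querying this function each iteration reproduces the ramp-then-plateau-then-drop shape of the schedule.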
After the trained classification model and the trained detection model are obtained, the trained classification model and the trained detection model can be used for detecting the medical image of the object to be detected.
And (III) application of the classification model after training and the detection model after training.
As shown in fig. 9, a specific process of the method for detecting a medical image may be as follows:
201. the network equipment acquires medical images and pathological text information of the object to be detected.
The detection object refers to the living tissue to be detected, such as a human body. Taking the case where Zhang San needs to be examined for pneumonia as an example, medical image acquisition may be performed on Zhang San's lungs by a medical image acquisition device (for example, an X-ray image of the lungs is acquired), and the medical image and Zhang San's pathological text information are then provided to the network device.
The pathological text information can include personal information and medical record information of the patient. The personal information may include name, gender, age, and/or occupation, and the medical record information may include past examination, diagnosis, and treatment records of the patient.
202. The network device performs "pneumonia" (pathological type) prediction on the medical image by using the trained dense connection convolutional network to obtain prediction information, which is shown in fig. 10.
The convolutional network after training may be specifically DenseNet-169, and the network structure thereof may be as shown in table one.
For example, taking the network structure of the trained densely connected convolutional network as the network structure shown in table one as an example, the step "the network device performs pneumonia prediction on the medical image by using the trained densely connected convolutional network to obtain prediction information" may include:
the network equipment carries out convolution processing on the medical image through the convolution layer, transmits a convolution processing result to the pooling layer, carries out maximum pooling processing on the convolution processing result by the pooling layer, and then transmits the convolution processing result to the dense block 1 for processing, the dense block 1 carries out convolution and average pooling processing on the processing result through the transition layer 1 and then transmits the processing result to the dense block 2 for processing, and in the same way, the dense block 2 carries out convolution and average pooling processing on the processing result through the transition layer 2 and then transmits the processing result to the dense block 3 for processing, the dense block 3 carries out convolution and average pooling processing on the processing result through the transition layer 3 and then transmits the processing result to the dense block 4 for processing, and the dense block transmits the processing result to the classification layer for processing and then outputs final prediction information through the classification layer.
203. The network device identifies the pathological text information of the object to be detected through the trained MLP to obtain reference information, which is shown in fig. 10.
For example, the network device may perform one-hot encoding on the pathological text information, such as the gender and age information of the patient, to obtain encoded information, and then recognize the encoded information through the trained MLP to obtain the reference information.
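The one-hot encoding of step 203's input can be sketched as below; the category lists, field names, and age normalization are illustrative assumptions, not values prescribed by this embodiment.

```python
def one_hot(value, categories):
    # Map a categorical value to a one-hot vector over `categories`.
    vec = [0] * len(categories)
    vec[categories.index(value)] = 1
    return vec

def encode_patient(gender, age, max_age=100):
    # Gender becomes a one-hot pair; age is scaled to [0, 1]
    # (normalization is one common choice for continuous fields).
    return one_hot(gender, ["male", "female"]) + [age / max_age]

print(encode_patient("female", 45))  # [0, 1, 0.45]
```

In practice the resulting vector would be fed to the trained MLP, and categorical fields with more values (such as occupation) would extend the vector in the same way.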
The execution of steps 202 and 203 is not limited to a particular order; they may be performed in any order or in parallel.
204. The network device performs feature splicing on the prediction information and the reference information to obtain spliced information.
For example, as shown in fig. 10, the prediction information and the reference information may be feature-spliced by the splicing module to obtain spliced information, and then step 205 is performed.
205. The network device performs convolution processing on the spliced information to obtain a prediction result.
For example, as shown in fig. 10, the prediction result can be obtained by performing convolution processing on the spliced information by a convolution layer (conv).
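Steps 204 and 205 — splicing followed by convolution — can be sketched as follows. For a 1 × 1 convolution applied to a spliced feature vector, the operation reduces to a linear map plus an activation; the weights, dimensions, and sigmoid output below are placeholders for illustration, not trained parameters.

```python
import numpy as np

def fuse(prediction_info, reference_info, w, b):
    # Step 204: feature splicing (channel-wise concatenation).
    spliced = np.concatenate([prediction_info, reference_info])
    # Step 205: a 1x1 convolution over a single spatial position is
    # just a linear layer; a sigmoid turns the logit into a probability.
    logit = spliced @ w + b
    return 1.0 / (1.0 + np.exp(-logit))

pred = np.array([0.9])                # image-branch pneumonia score
ref = np.array([0.0, 1.0, 0.45])      # encoded pathological text info
w = np.array([2.0, -0.5, 0.5, 1.0])   # placeholder fusion weights
result = fuse(pred, ref, w, b=-1.0)
print(result > 0.5)
```

The point of the fusion is that the final prediction result is a learned function of both branches, so the text information can raise or lower the image-only score rather than being a separate decision.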
206. When the prediction result indicates that the medical image is pneumonia, the network device screens, through the trained detection model, feature information satisfying pneumonia from the medical image; the screened feature information may be referred to as pneumonia features. Step 207 is then executed.
For example, as shown in fig. 10, taking the example that the trained detection model includes a convolutional neural network, a pyramid network, a candidate region network, and a classification regression network, the network device may perform feature extraction on the medical image through the convolutional neural network to obtain feature information output by a plurality of convolutional layers, process this feature information through the pyramid network, and generate a feature map of the medical image according to the processing result of each layer in the pyramid network. Because the processing result of each pyramid layer corresponds to a different receptive field, the obtained feature map contains feature information at different scales, so pneumonia lesions of different sizes can be detected.
After the feature map of the medical image is obtained, feature information satisfying pneumonia can be screened from the feature map of the medical image, and pneumonia features can be obtained.
207. The network device determines the regions in which the pneumonia features are located in the medical image, and obtains candidate regions and region type probabilities.
For example, as shown in fig. 10, the network device may determine the region of the pneumonia feature in the medical image through the candidate region network, and obtain the candidate regions and the region type probabilities of the respective candidate regions.
For example, referring to fig. 5, taking the example that the candidate area network includes a 3 × 3 convolution layer, two 1 × 1 convolution layers, and an activation function (sigmoid) layer, the step "determining the area of the pneumonia feature in the medical image through the candidate area network to obtain the candidate area and the area type probability of each candidate area" may include:
the pneumonia features are processed by the 3 × 3 convolutional layer, and its output is transmitted to the two 1 × 1 convolutional layers for convolution processing; the output of one 1 × 1 convolutional layer is then transmitted to the activation function layer for processing by the activation function, and the regions of the pneumonia features in the medical image are predicted by combining the output of the activation function layer with the output of the other 1 × 1 convolutional layer, thereby obtaining the candidate regions and the region type probability of each candidate region.
For example, referring to fig. 11, three candidate regions may be obtained: candidate region 1, candidate region 2, and candidate region 3. The region type probability of candidate region 1 is "80%", that is, the probability that this region is a lesion region of pneumonia is 80%; the region type probability of candidate region 2 is "40%", that is, the probability that this region is a lesion region of pneumonia is 40%; and the region type probability of candidate region 3 is "60%", that is, the probability that this region is a lesion region of pneumonia is 60%.
208. The network equipment selects a candidate area with the area type probability larger than a threshold value as a detection area, acquires the position of the detection area, and then generates a detection result according to the detection area and the position of the detection area.
For example, as shown in fig. 10, a candidate region with a region type probability greater than a threshold may be selected as a detection region (i.e., a suspected pneumonia lesion region) through a classification regression network, and a position of the detection region is obtained, where the detection region and the corresponding position are detection results.
For example, taking the case where the classification regression network includes a region-of-interest pooling layer, a fully connected layer, and a classification and regression module, a plurality of regions of interest (RoIs) may be selected from the candidate regions; the regions of interest and the feature maps output by the pyramid network are then transmitted to the RoI pooling layer for pooling, the pooled data is subjected to full connection processing by the fully connected layer, and the result is input to the classification and regression module to determine the category to which each region belongs and to regress its position, thereby obtaining the required detection regions and their positions.
For example, if the threshold is 59%, in step 207, if the candidate region 1, the candidate region 2, and the candidate region 3 have been obtained, then since the candidate region 1 (with a region type probability of "80%") and the candidate region 3 (with a region type probability of "60%") are both greater than the threshold of "59%", the candidate region 1 and the candidate region 3 may be used as detection regions, and the positions of the candidate region 1 and the candidate region 3 may be obtained; on the other hand, since the region type probability of the candidate region 2 is "40%" and is smaller than the threshold value "59%", the candidate region 2 is not regarded as the detection region.
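The threshold screening of step 208 can be sketched as follows, reproducing the worked example above; representing each candidate region as a (name, probability, box) tuple is a simplification for illustration, and the box coordinates are invented.

```python
def select_detections(candidates, threshold):
    # Keep only candidate regions whose region type probability
    # exceeds the threshold; the kept regions and their positions
    # together form the detection result.
    return [(name, prob, box) for name, prob, box in candidates
            if prob > threshold]

candidates = [
    ("region 1", 0.80, (10, 20, 60, 90)),    # boxes are illustrative
    ("region 2", 0.40, (70, 15, 120, 80)),
    ("region 3", 0.60, (30, 100, 85, 160)),
]
detections = select_detections(candidates, threshold=0.59)
print([name for name, _, _ in detections])  # ['region 1', 'region 3']
```

As in the example, regions 1 and 3 pass the 59% threshold and become detection regions, while region 2 is treated as background.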
It can be seen that even if the trained classification model determines that pneumonia features may exist in a certain region, the region is still treated as a background image rather than a detection region (lesion region) if the trained detection model does not detect a corresponding lesion there (i.e., the region type probability is not greater than the threshold), as with candidate region 2 in the above example. Only a region that both the trained classification model and the trained detection model agree contains pneumonia is determined as a detection region (lesion region), so the detection accuracy is relatively high.
It should be noted that the above description takes the pathological type "pneumonia" as an example only; the pathological type may also be another type, such as "tuberculosis". In addition, the classification model is not limited to binary classification and may also recognize multiple pathological types, and the detection model is not limited to detecting regions of a single pathological type and may also detect regions of multiple pathological types. The implementation is the same as that described above except for the training samples used, and is therefore not repeated here.
As can be seen from the above, in this embodiment, while pneumonia prediction is performed on the medical image of the object to be detected, the pathological text information of the object is identified through the trained MLP, and a prediction result is obtained by combining the two results; when the prediction result indicates that the medical image is pneumonia, the pneumonia region is detected from the medical image to obtain a detection result. Because this scheme introduces pathological text information as a reference factor when predicting the pathological type of the medical image, the accuracy of classification can be improved; moreover, the subsequent localization of the pneumonia region is performed only on the basis that pneumonia has been predicted to exist in the medical image, so the accuracy of localization can also be improved.
In addition, in the embodiment, when the detection model is trained, not only the feature learning is performed on the foreground region (such as the pneumonia region), but also the useful feature representation can be learned from the background region (such as the non-pneumonia region), so that the characteristics can be fully learned, the expression capability of the extracted features is improved, and the accuracy of subsequent region positioning can be further improved. In addition, the flexible non-maximum suppression algorithm, the hot start and other manners adopted by the embodiment also improve the classification and positioning accuracy of the model, and in summary, the scheme provided by the embodiment of the invention can improve the performance of the model and improve the reliability of the detection result.
In order to better implement the method, an embodiment of the present invention further provides a medical image detection apparatus, where the medical image detection apparatus may be specifically integrated in a network device, and the network device may be a terminal or a server.
For example, as shown in fig. 12, the medical image detection apparatus includes an acquisition unit 301, a prediction unit 302, a recognition unit 303, a fusion unit 304, and a detection unit 305, as follows:
(1) an acquisition unit 301;
the acquiring unit 301 is configured to acquire a medical image and pathological text information of an object to be detected.
For example, the acquiring unit 301 may specifically receive a medical image of the object to be detected sent by a medical image acquisition device, receive a medical image of the object to be detected input by a user, or read a locally stored medical image of the object to be detected, and so on.
The medical image may be an X-ray image or other two-dimensional image, and the pathological text information may include personal information and medical record information of the patient, which may be referred to in the foregoing embodiments of the method.
(2) A prediction unit 302;
the prediction unit 302 is configured to perform a pathological type prediction on the medical image to obtain prediction information.
For example, the prediction unit 302 may be specifically configured to perform a pathology type prediction on the medical image by using a trained dense connection convolutional network to obtain prediction information.
The network parameters and the training mode of the trained densely connected convolutional network may refer to the foregoing method embodiments, and are not described herein again.
(3) An identification unit 303;
the identifying unit 303 is configured to identify the pathological text information through the trained MLP to obtain reference information.
For example, the identifying unit 303 may be specifically configured to perform one-hot encoding on the pathological text information to obtain encoded information, and identify the encoded information through the trained MLP to obtain the reference information.
The network parameters and the training mode of the trained MLP may refer to the foregoing method embodiments, which are not described herein again.
(4) A fusion unit 304;
a fusion unit 304, configured to fuse the prediction information and the reference information to obtain a prediction result;
for example, the fusion unit 304 may be specifically configured to perform feature splicing on the prediction information and the reference information to obtain spliced information, and then perform convolution processing on the spliced information to obtain a prediction result.
(5) A detection unit 305;
a detecting unit 305, configured to detect, when the prediction result indicates that the medical image is of a target pathology type, a region that conforms to the target pathology type from the medical image, and obtain a detection result.
For example, the detection unit may include a feature extraction subunit, a screening subunit, and a classification subunit, as follows:
the feature extraction subunit is configured to screen feature information that satisfies the target pathology type from the medical image.
For example, if the medical image is an X-ray image of a lung and the target pathology type is pneumonia, then, when the prediction result indicates that the X-ray image is an image corresponding to pneumonia, the feature extraction subunit may select feature information that satisfies the target pathology type from the medical image.
The screening subunit is used for determining the region of the screened feature information in the medical image to obtain a candidate region and a region type probability.
For example, the screening subunit may be specifically configured to determine, through the candidate region network, a region of the screened feature information in the medical image, and obtain candidate regions and a region type probability of each candidate region.
The classification subunit is configured to select a candidate region with a region type probability greater than a threshold as a detection region, acquire a position of the detection region, and generate a detection result according to the detection region and the position of the detection region.
For example, the classification subunit may be specifically configured to select, by using a classification regression network, a candidate region having a region type probability greater than a threshold as a detection region, and acquire a position of the detection region, where the detection region and a corresponding position are detection results.
The parameters and the threshold of the candidate area network and the classification regression network may be set according to the requirements of the practical application, and are not described herein again.
Optionally, the feature extraction subunit may perform feature extraction on the medical image in various ways, for example, in some embodiments, the feature extraction subunit may be specifically configured to perform feature extraction on the medical image by using different receptive fields through a detection model after training to obtain a feature map of the medical image, and screen feature information that meets the target pathology type from the feature map of the medical image.
For example, taking the example that the trained detection model includes a convolutional neural network and a pyramid network, at this time, the feature extraction subunit may be specifically configured to:
the feature extraction is carried out on the medical image through the convolutional neural network to obtain feature information output by the plurality of convolutional layers, the feature information output by the plurality of convolutional layers is processed through the pyramid network, a feature map of the medical image is generated according to the processing result of each layer in the pyramid network, and the feature information meeting the target pathological type is screened from the feature map of the medical image.
The number of convolutional layers of the convolutional neural network and the number of layers of the pyramid network may be set according to the requirements of practical application, for example, the number of layers of the pyramid network may be set to 4, and so on, which may be referred to the foregoing method embodiments specifically and will not be described herein.
Optionally, in some embodiments, as shown in fig. 13, the apparatus for detecting medical images may further include an acquisition unit 306 and a training unit 307, as follows:
the acquiring unit 306 is configured to acquire a plurality of medical image samples labeled with real values of region types, and the specific acquiring manner may be as shown in the foregoing method embodiments.
The training unit 307 is configured to divide the medical image sample into a foreground region and a background region through a preset detection model; determining a region type predicted value of the foreground region (for example, the foreground region may be classified by using a classification regression network according to the feature map to obtain the region type predicted value of the foreground region), and determining a region type confidence according to the region type predicted value and the region type true value; constructing a loss function according to the region type confidence coefficient and a preset adjusting coefficient, wherein the adjusting coefficient is used for adjusting the influence degree of the background region on feature learning; and adopting the loss function to converge the detection model to obtain the trained detection model.
For example, the training unit 307 may be specifically configured to perform feature extraction on a medical image sample by using different receptive fields through a preset detection model to obtain a feature map of the medical image sample, and then divide the medical image sample into a foreground region and a background region according to the feature map of the medical image sample.
For example, taking the example that the detection model at least includes a convolutional neural network and a pyramid network, then, the training unit 307 may perform feature extraction on the medical image sample through the convolutional neural network to obtain feature information output by the plurality of convolutional layers, process the feature information output by the plurality of convolutional layers through the pyramid network, and generate a feature map of the medical image sample according to a processing result of each layer in the pyramid network. Thereafter, the training unit 307 may screen a plurality of candidate regions and region type probabilities satisfying the target pathology types according to the feature map of the medical image sample, then select a candidate region having a region type probability greater than a threshold as a foreground region, and obtain a region other than the foreground region in the medical image sample to obtain a background region.
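The foreground/background division used during training can be sketched as follows. The pixel-mask representation of "the region other than the foreground" is a simplifying assumption for illustration; the box coordinates and threshold are invented.

```python
import numpy as np

def split_foreground_background(shape, candidates, threshold):
    # Mark pixels covered by a candidate region whose region type
    # probability exceeds the threshold as foreground; everything
    # else in the medical image sample is the background region.
    fg_mask = np.zeros(shape, dtype=bool)
    for prob, (x1, y1, x2, y2) in candidates:
        if prob > threshold:
            fg_mask[y1:y2, x1:x2] = True
    return fg_mask, ~fg_mask

fg, bg = split_foreground_background(
    (8, 8),                                    # an 8x8 toy image
    [(0.9, (0, 0, 4, 4)), (0.3, (4, 4, 8, 8))],
    threshold=0.5)
print(int(fg.sum()), int(bg.sum()))  # 16 48
```

Only the high-probability candidate contributes to the foreground; the low-probability candidate's pixels remain background, which is the pool the model still learns useful feature representations from.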
Optionally, since the candidate regions may overlap, and in order to avoid some reliable labeled regions being discarded due to low probability, a flexible non-maximum suppression algorithm (Soft-NMS) may be used to handle such cases, namely:
the training unit 307 may be specifically configured to, when the degree of overlap between the candidate regions is greater than a set value, select, as the filtered candidate region, the candidate region with the highest region type probability in the candidate regions where overlap is generated, and reduce the region type probabilities of the other candidate regions except the filtered candidate region in the candidate regions where overlap is generated; and selecting the filtered candidate area with the area type probability larger than the threshold value as the foreground area.
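The flexible non-maximum suppression behavior described above can be sketched as follows. This is a minimal variant under assumed box coordinates; the patent does not specify the decay function, so the linear decay (score multiplied by one minus the overlap) is an assumption for illustration.

```python
def iou(a, b):
    # Intersection-over-union of two (x1, y1, x2, y2) boxes.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def soft_nms(boxes, scores, overlap_thresh=0.5):
    # Instead of discarding overlapping candidates outright, keep the
    # highest-probability candidate and reduce the region type
    # probabilities of candidates overlapping it beyond the set value.
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    scores = list(scores)
    kept = []
    while order:
        best = order.pop(0)
        kept.append(best)
        for i in order:
            ov = iou(boxes[best], boxes[i])
            if ov > overlap_thresh:
                scores[i] *= 1.0 - ov   # linear decay (assumed form)
        order.sort(key=lambda i: -scores[i])
    return kept, scores

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
kept, new_scores = soft_nms(boxes, [0.9, 0.8, 0.7])
print(kept)  # [0, 2, 1]: heavy overlap decays box 1 below box 2
```

Because the overlapping region's probability is reduced rather than zeroed, a genuinely distinct lesion with moderate overlap can still pass the subsequent foreground threshold, which is exactly the "reliable labeled regions are not discarded" property described above.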
In a specific implementation, the above units may be implemented as independent entities, or may be combined arbitrarily to be implemented as the same or several entities, and the specific implementation of the above units may refer to the foregoing method embodiments, which are not described herein again.
As can be seen from the above, the medical image detection apparatus of this embodiment can perform pathological type prediction on the medical image of the object to be detected while identifying the pathological text information of the object through the trained MLP, and then obtain a prediction result by combining the two results; when the prediction result indicates that the medical image is of the target pathological type, such as pneumonia, a region conforming to the target pathological type is detected from the medical image to obtain a detection result. Because this scheme introduces pathological text information as a reference factor when predicting the pathological type of the medical image, the accuracy of classification can be improved; furthermore, the subsequent localization of the region corresponding to a target pathological type is performed only on the basis that a region of that type has been predicted to exist in the medical image, so the accuracy of localization can also be improved.
In addition, when the detection model is trained, the detection device for medical images of the embodiment not only learns the features of the foreground region, but also learns useful feature representation from the background region, so that the characteristics can be fully learned, the expression capability of the extracted features is improved, and the accuracy of subsequent region positioning can be further improved. In addition, the flexible non-maximum suppression algorithm, the hot start and other manners adopted by the embodiment also improve the classification and positioning accuracy of the model, and in summary, the scheme provided by the embodiment of the invention can improve the performance of the model and improve the reliability of the detection result.
An embodiment of the present invention further provides a network device, as shown in fig. 14, which shows a schematic structural diagram of the network device according to the embodiment of the present invention, specifically:
the network device may include components such as a processor 401 of one or more processing cores, memory 402 of one or more computer-readable storage media, a power supply 403, and an input unit 404. Those skilled in the art will appreciate that the network device architecture shown in fig. 14 does not constitute a limitation of network devices and may include more or fewer components than shown, or some components may be combined, or a different arrangement of components. Wherein:
the processor 401 is a control center of the network device, connects various parts of the entire network device by using various interfaces and lines, and performs various functions of the network device and processes data by running or executing software programs and/or modules stored in the memory 402 and calling data stored in the memory 402, thereby performing overall monitoring of the network device. Optionally, processor 401 may include one or more processing cores; preferably, the processor 401 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 401.
The memory 402 may be used to store software programs and modules, and the processor 401 executes various functional applications and data processing by operating the software programs and modules stored in the memory 402. The memory 402 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data created according to use of the network device, and the like. Further, the memory 402 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device. Accordingly, the memory 402 may also include a memory controller to provide the processor 401 access to the memory 402.
The network device further includes a power supply 403 for supplying power to each component, and preferably, the power supply 403 is logically connected to the processor 401 through a power management system, so that functions of managing charging, discharging, and power consumption are implemented through the power management system. The power supply 403 may also include any component of one or more dc or ac power sources, recharging systems, power failure detection circuitry, power converters or inverters, power status indicators, and the like.
The network device may also include an input unit 404, where the input unit 404 may be used to receive input numeric or character information and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
Although not shown, the network device may further include a display unit and the like, which are not described in detail herein. Specifically, in this embodiment, the processor 401 in the network device loads the executable file corresponding to the process of one or more application programs into the memory 402 according to the following instructions, and the processor 401 runs the application program stored in the memory 402, thereby implementing various functions as follows:
obtaining a medical image and pathological text information of an object to be detected; performing pathological type prediction on the medical image to obtain prediction information; identifying the pathological text information through the trained MLP to obtain reference information; fusing the prediction information and the reference information to obtain a prediction result; and, when the prediction result indicates that the medical image is of the target pathological type, detecting a region that conforms to the target pathological type from the medical image to obtain a detection result.
For example, a feature map of the medical image may be obtained by performing feature extraction on the medical image by using different receptive fields through a trained detection model, then, feature information meeting the target pathology type is screened from the feature map of the medical image, a region of the screened feature information in the medical image is determined, a candidate region and a region type probability are obtained, then, a candidate region with a region type probability greater than a threshold is selected as a detection region, a position of the detection region is obtained, and finally, a detection result is generated according to the detection region and the position of the detection region.
Optionally, the trained detection model may be set in advance by operation and maintenance personnel, or may be obtained through training by the medical image detection apparatus itself; that is, the processor 401 may also run the application program stored in the memory 402 to implement the following functions:
collecting a plurality of medical image samples marked with area type true values, dividing the medical image samples into a foreground area and a background area through a preset detection model, determining an area type predicted value of the foreground area, determining an area type confidence coefficient according to the area type predicted value and the area type true value, constructing a loss function according to the area type confidence coefficient and a preset adjustment coefficient, and adopting the loss function to converge the detection model to obtain a trained detection model.
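The loss construction in the training step can be sketched as a weighted cross-entropy over the region type confidences of foreground and background regions. The patent does not give the exact functional form, so the formulation below — scaling the background term by the adjustment coefficient so that background regions still contribute to feature learning without dominating it — is an assumption for illustration, as are the confidence values.

```python
import math

def region_loss(confidences, is_foreground, adjust_coeff=0.25):
    # Cross-entropy over region type confidences (each confidence is
    # the predicted probability of being foreground): foreground
    # regions contribute fully, while background regions are scaled by
    # the adjustment coefficient so they still drive feature learning
    # without overwhelming the (usually rarer) foreground regions.
    total = 0.0
    for conf, fg in zip(confidences, is_foreground):
        if fg:
            total += -math.log(conf)
        else:
            total += -adjust_coeff * math.log(1.0 - conf)
    return total / len(confidences)

# Two foreground regions and two background regions:
loss = region_loss([0.9, 0.8, 0.2, 0.1], [True, True, False, False])
```

Raising the adjustment coefficient increases the influence of background regions on the loss, which is the "degree of influence of the background region on feature learning" that the coefficient is described as adjusting.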
The above operations can be implemented in the foregoing embodiments, and are not described in detail herein.
As can be seen from the above, while predicting the pathological type of the medical image of the detection object, the network device of this embodiment may also identify the pathological text information of the detection object through the trained MLP, and then obtain a prediction result by combining the two results, and when the prediction result indicates that the medical image is the target pathological type, such as pneumonia, detect the area conforming to the target pathological type from the medical image, to obtain the detection result (i.e., locate the area conforming to the target pathological type); according to the scheme, when the pathological type of the medical image is predicted, pathological text information is introduced as a reference factor, so that the accuracy of classification can be improved; moreover, the subsequent positioning of the region conforming to a certain target pathological type is based on the prediction that the medical image has the region of the target pathological type, so that the positioning accuracy can be improved, namely the reliability of the detection result can be improved by the scheme as a whole.
It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be performed by instructions or by associated hardware controlled by the instructions, which may be stored in a computer readable storage medium and loaded and executed by a processor.
To this end, the present invention provides a storage medium, in which a plurality of instructions are stored, and the instructions can be loaded by a processor to execute the steps in any one of the medical image detection methods provided by the embodiments of the present invention. For example, the instructions may perform the steps of:
obtaining a medical image and pathological text information of an object to be detected; performing pathological type prediction on the medical image to obtain prediction information; identifying the pathological text information through the trained MLP to obtain reference information; fusing the prediction information and the reference information to obtain a prediction result; and, when the prediction result indicates that the medical image is of the target pathological type, detecting a region that conforms to the target pathological type from the medical image to obtain a detection result. For example, feature extraction may be performed on the medical image by using different receptive fields through the trained detection model to obtain a feature map of the medical image, feature information meeting the target pathological type is screened from the feature map, the regions of the screened feature information in the medical image are determined to obtain candidate regions and region type probabilities, a candidate region with a region type probability greater than the threshold is selected as a detection region, the position of the detection region is obtained, and a detection result may then be generated according to the detection region and its position, and so on.
Optionally, the trained detection model may be set in advance by operation and maintenance personnel, or may be obtained through training by the medical image detection apparatus itself; that is, the instructions may further perform the following steps:
collecting a plurality of medical image samples annotated with region type true values; dividing the medical image samples into a foreground region and a background region through a preset detection model; determining a region type predicted value of the foreground region, and determining a region type confidence according to the region type predicted value and the region type true value; constructing a loss function according to the region type confidence and a preset adjustment coefficient; and converging the detection model with the loss function to obtain the trained detection model.
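The loss construction described above, in which the region type confidence modulates a cross-entropy term and the adjustment coefficient suppresses the influence of easy background regions, resembles a focal-loss formulation. A minimal sketch under that assumption (the `gamma` and `alpha` values are illustrative, not taken from the patent):

```python
import math

def modulated_loss(p_t, gamma=2.0, alpha=0.25):
    """Cross-entropy modulated by the region type confidence p_t: the
    adjustment coefficient gamma down-weights well-classified (mostly
    background) regions so they contribute less to feature learning."""
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)

easy = modulated_loss(0.9)  # confidently classified background region
hard = modulated_loss(0.1)  # poorly classified foreground region
```

With `gamma = 0` this reduces to alpha-scaled cross-entropy; increasing `gamma` shrinks the background contribution further, which matches the stated purpose of the adjustment coefficient.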
The specific implementation of the above operations can be found in the foregoing embodiments and is not described in detail here.
The storage medium may include: a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and the like.
Since the instructions stored in the storage medium can execute the steps of any medical image detection method provided by the embodiments of the present invention, they can achieve the beneficial effects achievable by any such method; for details, see the foregoing embodiments, which are not repeated here.
The medical image detection method, apparatus, and storage medium provided by the embodiments of the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the present invention, and the description of the above embodiments is only intended to help in understanding the method of the present invention and its core idea. Meanwhile, those skilled in the art may make changes to the specific embodiments and the application scope according to the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (14)

1. A method for medical image detection, comprising:
acquiring medical images and pathological text information of an object to be detected;
predicting a pathology type of the medical image by using a trained densely connected convolutional network to obtain prediction information, which specifically comprises: performing convolution processing on the medical image through a convolution layer in the trained densely connected convolutional network, performing max pooling on the convolution result through a pooling layer, sequentially processing the max pooling result through different dense blocks, and classifying the output result of the dense blocks through a classification layer to obtain the prediction information;
performing one-hot encoding on the pathological text information to obtain encoded information;
recognizing the pathological text information through a trained multilayer perceptron to obtain reference information;
fusing the prediction information and the reference information to obtain a prediction result;
and when the prediction result indicates that the medical image is the target pathology type, detecting a region which accords with the target pathology type from the medical image to obtain a detection result.
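The one-hot encoding of pathological text fields recited in claim 1 can be sketched as follows; the vocabulary and label values are hypothetical examples, not drawn from the patent:

```python
def one_hot_encode(labels, vocabulary):
    """One-hot encode categorical pathological-text fields so that a
    multilayer perceptron can consume them as numeric vectors."""
    encoding = []
    for label in labels:
        row = [0.0] * len(vocabulary)
        row[vocabulary.index(label)] = 1.0  # single hot position per label
        encoding.append(row)
    return encoding

# Hypothetical vocabulary of pathological-text categories.
vocab = ["smoker", "non-smoker", "family-history"]
coded = one_hot_encode(["smoker", "family-history"], vocab)
```

Each input label becomes a vector with exactly one 1.0, so the encoded information has a fixed width equal to the vocabulary size regardless of the free-text source.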
2. The method according to claim 1, wherein the fusing the prediction information and the reference information to obtain the prediction result comprises:
performing feature splicing on the prediction information and the reference information to obtain spliced information;
and carrying out convolution processing on the spliced information to obtain a prediction result.
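A minimal sketch of the fusion recited in claim 2, treating the convolution over the spliced vector as a learned weighted sum (the feature values and weights below are illustrative only):

```python
def fuse(prediction_info, reference_info, weights):
    """Splice (concatenate) the image-branch prediction information and the
    text-branch reference information, then apply a learned linear map --
    the convolution step of claim 2 -- to obtain the fused prediction."""
    spliced = prediction_info + reference_info  # feature splicing
    return sum(w * x for w, x in zip(weights, spliced))

prediction_info = [0.2, 0.8]         # from the densely connected network
reference_info = [0.6, 0.4]          # from the multilayer perceptron
weights = [0.25, 0.25, 0.25, 0.25]   # illustrative learned kernel
fused = fuse(prediction_info, reference_info, weights)
```

The splicing preserves both modalities' evidence, and the learned weights decide how much the text branch corrects the image branch.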
3. The method according to claim 1 or 2, wherein the detecting a region conforming to the target pathology type from the medical image to obtain a detection result comprises:
screening feature information meeting the target pathological type from the medical image;
determining the region of the screened feature information in the medical image to obtain a candidate region and a region type probability;
selecting a candidate region with the region type probability larger than a threshold value as a detection region, and acquiring the position of the detection region;
and generating a detection result according to the detection area and the position of the detection area.
4. The method of claim 3, wherein the screening the medical image for feature information that satisfies the target pathology type comprises:
performing feature extraction on the medical image by adopting different receptive fields through a trained detection model to obtain a feature map of the medical image;
and screening feature information meeting the target pathology type from the feature map of the medical image.
5. The method according to claim 4, wherein the trained detection model at least includes a convolutional neural network and a pyramid network, and the obtaining of the feature map of the medical image by performing feature extraction on the medical image by using different receptive fields through the trained detection model comprises:
extracting features of the medical image through the convolutional neural network to obtain feature information output by a plurality of convolution layers;
processing the feature information output by the plurality of convolution layers through the pyramid network;
and generating a feature map of the medical image according to the processing result of each layer in the pyramid network.
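One top-down step of the pyramid network recited in claim 5 can be sketched as upsampling a coarser feature map and adding it to the laterally connected finer map, a common feature-pyramid construction; the maps below are illustrative:

```python
def upsample2x(grid):
    """Nearest-neighbour 2x upsampling of a 2-D feature map."""
    out = []
    for row in grid:
        expanded = [v for v in row for _ in range(2)]  # widen each row
        out.append(expanded)
        out.append(list(expanded))                     # duplicate vertically
    return out

def top_down_merge(coarse, fine):
    """Merge a coarser pyramid level into a finer convolution-layer output
    by elementwise addition after upsampling."""
    up = upsample2x(coarse)
    return [[f + u for f, u in zip(frow, urow)]
            for frow, urow in zip(fine, up)]

c5 = [[1.0, 1.0], [1.0, 1.0]]            # coarse convolution-layer output
c4 = [[0.5] * 4 for _ in range(4)]       # finer convolution-layer output
p4 = top_down_merge(c5, c4)              # pyramid level combining both
```

Repeating this step per level yields a feature map at each scale, which is how different receptive fields contribute to the final feature map of the medical image.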
6. The method according to claim 4, wherein before the feature extraction of the medical image by using different receptive fields through the trained detection model to obtain the feature map of the medical image, the method further comprises:
collecting a plurality of medical image samples marked with real values of the region types;
dividing a medical image sample into a foreground area and a background area through a preset detection model;
determining a region type predicted value of the foreground region, and determining a region type confidence according to the region type predicted value and the region type true value;
constructing a loss function according to the region type confidence and a preset adjustment coefficient, wherein the adjustment coefficient is used for adjusting the degree of influence of the background region on feature learning;
and converging the detection model with the loss function to obtain the trained detection model.
7. The method according to claim 6, wherein the dividing the medical image sample into a foreground region and a background region by a preset detection model comprises:
performing feature extraction on the medical image sample by adopting different receptive fields through a preset detection model to obtain a feature map of the medical image sample;
and dividing the medical image sample into a foreground region and a background region according to the feature map of the medical image sample.
8. The method according to claim 7, wherein the detection model at least includes a convolutional neural network and a pyramid network, and the obtaining of the feature map of the medical image sample by performing feature extraction on the medical image sample by using different receptive fields through a preset detection model includes:
performing feature extraction on the medical image sample through the convolutional neural network to obtain feature information output by a plurality of convolutional layers;
processing the feature information output by the plurality of convolution layers through the pyramid network;
and generating a feature map of the medical image sample according to the processing result of each layer in the pyramid network.
9. The method according to claim 7, wherein the dividing the medical image sample into a foreground region and a background region according to the feature map of the medical image sample comprises:
screening a plurality of candidate regions meeting the target pathology type, together with their region type probabilities, according to the feature map of the medical image sample;
selecting a candidate region whose region type probability is greater than a threshold as the foreground region;
and acquiring the region except the foreground region in the medical image sample to obtain a background region.
10. The method according to claim 9, wherein before selecting the candidate region with the region type probability greater than the threshold as the foreground region, further comprising:
when the degree of overlap between candidate regions is greater than a set value, selecting the candidate region with the highest region type probability among the overlapping candidate regions as a filtered candidate region, and reducing the region type probabilities of the other overlapping candidate regions;
the selecting a candidate region whose region type probability is greater than the threshold as the foreground region specifically comprises: selecting a filtered candidate region whose region type probability is greater than the threshold as the foreground region.
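The overlap filtering of claim 10, which keeps the highest-probability candidate and reduces, rather than discards, the probabilities of the candidates overlapping it, resembles soft non-maximum suppression; a minimal sketch with illustrative overlap threshold and decay factor:

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def soft_filter(regions, overlap_thresh=0.5, decay=0.5):
    """Keep the highest-probability candidate among overlapping regions and
    decay the region type probabilities of the others (soft-NMS style)."""
    regions = sorted(regions, key=lambda r: r[1], reverse=True)
    kept = []
    for box, prob in regions:
        for kept_box, _ in kept:
            if iou(box, kept_box) > overlap_thresh:
                prob *= decay  # reduce, rather than discard, the overlap
        kept.append((box, prob))
    return kept

filtered = soft_filter([((0, 0, 10, 10), 0.9), ((1, 1, 11, 11), 0.8)])
```

The two boxes here overlap with IoU ≈ 0.68, so the second candidate's probability is halved to 0.4; if the foreground threshold of claim 10 is above 0.4, only the filtered winner survives.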
11. The method of claim 6, wherein determining the region type predictor for the foreground region comprises:
and classifying the foreground region by using a classification regression network according to the feature map to obtain a region type predicted value of the foreground region.
12. An apparatus for detecting medical images, comprising:
the acquiring unit is used for acquiring medical images and pathological text information of the object to be detected;
the prediction unit is used for predicting a pathology type of the medical image by using a trained densely connected convolutional network to obtain prediction information, which specifically comprises: performing convolution processing on the medical image through a convolution layer in the trained densely connected convolutional network, performing max pooling on the convolution result through a pooling layer, sequentially processing the max pooling result through different dense blocks, and classifying the output result of the dense blocks through a classification layer to obtain the prediction information;
the identification unit is used for performing one-hot encoding on the pathological text information to obtain encoded information, and recognizing the pathological text information through a trained multilayer perceptron to obtain reference information;
the fusion unit is used for fusing the prediction information and the reference information to obtain a prediction result;
and the detection unit is used for detecting a region which accords with the target pathological type from the medical image to obtain a detection result when the prediction result indicates that the medical image is the target pathological type.
13. The apparatus of claim 12, further comprising an acquisition unit and a training unit;
the acquisition unit is used for acquiring a plurality of medical image samples marked with real values of the region types;
the training unit is used for dividing the medical image samples into a foreground region and a background region through a preset detection model; determining a region type predicted value of the foreground region, and determining a region type confidence according to the region type predicted value and the region type true value; constructing a loss function according to the region type confidence and a preset adjustment coefficient, wherein the adjustment coefficient is used for adjusting the degree of influence of the background region on feature learning; and converging the detection model with the loss function to obtain the trained detection model.
14. A storage medium storing a plurality of instructions, the instructions being adapted to be loaded by a processor to execute the steps of the medical image detection method according to any one of claims 1 to 11.
CN201910167844.8A 2019-03-06 2019-03-06 Medical image detection method and device and storage medium Active CN109919928B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910167844.8A CN109919928B (en) 2019-03-06 2019-03-06 Medical image detection method and device and storage medium

Publications (2)

Publication Number Publication Date
CN109919928A CN109919928A (en) 2019-06-21
CN109919928B true CN109919928B (en) 2021-08-03

Family

ID=66963547

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910167844.8A Active CN109919928B (en) 2019-03-06 2019-03-06 Medical image detection method and device and storage medium

Country Status (1)

Country Link
CN (1) CN109919928B (en)

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110517225B (en) * 2019-07-19 2023-07-11 平安科技(深圳)有限公司 AI image recognition method, apparatus, device and storage medium
CN110516714B (en) * 2019-08-05 2022-04-01 网宿科技股份有限公司 Feature prediction method, system and engine
CN110765908A (en) * 2019-10-14 2020-02-07 三江学院 Cascade type cancer cell detection system based on deep learning
CN110853737B (en) * 2019-10-15 2023-05-23 吉林大学 Updating method and system of medical image recognition model
CN111008957A (en) * 2019-11-19 2020-04-14 北京推想科技有限公司 Medical information processing method and device
CN111180011A (en) * 2019-12-31 2020-05-19 上海依智医疗技术有限公司 Method and device for detecting gene mutation of focus
CN111444960A (en) * 2020-03-26 2020-07-24 上海交通大学 Skin disease image classification system based on multi-mode data input
CN111652840B (en) * 2020-04-22 2022-08-30 北京航空航天大学 Turbid screening and classifying device for X-ray chest X-ray image lung
CN111724345A (en) * 2020-05-18 2020-09-29 天津大学 Pneumonia picture verification device and method capable of adaptively adjusting size of receptive field
CN112133441B (en) * 2020-08-21 2024-05-03 广东省人民医院 Method and terminal for establishing MH postoperative crack state prediction model
WO2022139943A2 (en) * 2020-10-23 2022-06-30 Remmie, Inc. Machine learning for ear disease diagnosis assistance
TWI816078B (en) * 2021-01-05 2023-09-21 財團法人工業技術研究院 Mining method for sample grouping
CN112767329B (en) * 2021-01-08 2021-09-10 北京安德医智科技有限公司 Image processing method and device and electronic equipment
CN113435260A (en) * 2021-06-07 2021-09-24 上海商汤智能科技有限公司 Image detection method, related training method, related device, equipment and medium
CN113689382B (en) * 2021-07-26 2023-12-01 北京知见生命科技有限公司 Tumor postoperative survival prediction method and system based on medical images and pathological images
CN114387227A (en) * 2021-12-23 2022-04-22 沈阳东软智能医疗科技研究院有限公司 Nodule type prediction method and device, storage medium and electronic equipment
CN114974522A (en) * 2022-07-27 2022-08-30 中国医学科学院北京协和医院 Medical image processing method and device, electronic equipment and storage medium
CN117274185B (en) * 2023-09-19 2024-05-07 阿里巴巴达摩院(杭州)科技有限公司 Detection method, detection model product, electronic device, and computer storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102982242A (en) * 2012-11-28 2013-03-20 徐州医学院 Intelligent medical image read error reminding system
CN106780460A (en) * 2016-12-13 2017-05-31 杭州健培科技有限公司 A kind of Lung neoplasm automatic checkout system for chest CT image
CN106909778A (en) * 2017-02-09 2017-06-30 北京市计算中心 A kind of Multimodal medical image recognition methods and device based on deep learning
CN107145747A (en) * 2017-05-09 2017-09-08 山东省千佛山医院 The pathological diagnosis report preparing system and method for breast cancer and female reproductive system
CN108733651A (en) * 2018-05-17 2018-11-02 新华网股份有限公司 Emoticon prediction technique and model building method, device, terminal
CN109345527A (en) * 2018-09-28 2019-02-15 广西师范大学 A kind of tumor of bladder detection method based on MaskRcnn




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant