CN113780421B - Brain PET image identification method based on artificial intelligence - Google Patents
- Publication number
- CN113780421B CN113780421B CN202111065379.0A CN202111065379A CN113780421B CN 113780421 B CN113780421 B CN 113780421B CN 202111065379 A CN202111065379 A CN 202111065379A CN 113780421 B CN113780421 B CN 113780421B
- Authority
- CN
- China
- Prior art keywords
- image
- pet
- brain
- sequence
- channel
- Prior art date
- Legal status (an assumption, not a legal conclusion; no legal analysis has been performed)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
- G06T7/0012—Biomedical image inspection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/136—Segmentation; Edge detection involving thresholding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/194—Segmentation; Edge detection involving foreground-background segmentation
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10072—Tomographic images
- G06T2207/10104—Positron emission tomography [PET]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20048—Transform domain processing
- G06T2207/20064—Wavelet transform [DWT]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20112—Image segmentation details
- G06T2207/20132—Image cropping
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20212—Image combination
- G06T2207/20224—Image subtraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30004—Biomedical image processing
- G06T2207/30016—Brain
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30004—Biomedical image processing
- G06T2207/30096—Tumor; Lesion
Abstract
The invention provides a brain PET image identification method based on artificial intelligence, which comprises the following steps: receiving a sequence of PET images of the brain of a target patient; analyzing the PET image sequence to identify a plurality of images; creating a multi-channel image of the PET image sequence; and computing a lesion classification for the multi-channel image. By automatically learning and extracting feature information, the method captures the characteristic rules of brain nodule imaging, achieves a higher detection rate, obtains a more accurate three-dimensional model through segmentation, and assists physicians in identifying brain nodule lesions and making accurate diagnoses.
Description
Technical Field
The invention relates to data mining, in particular to a brain PET image identification method based on artificial intelligence.
Background
Computer-aided diagnosis greatly improves diagnostic accuracy, increases working efficiency, and reduces missed diagnoses. With the development of computer technology and artificial intelligence, computer-aided diagnosis is also becoming increasingly intelligent. The detection and identification of brain nodules is of great significance for the diagnosis of early brain tumors. Brain nodules are a general term for small, high-density shadow lesions in some PET images, and their imaging appearance is very complex. Various image algorithms have been applied to brain nodule detection and segmentation, such as thresholding, morphological algorithms, active contour methods, and nonlinear regression. In recent years, researchers have proposed deep learning models for brain nodule detection and segmentation that significantly outperform earlier methods, but these still face the following problems: two-dimensional networks cannot fully exploit three-dimensional shape and texture information, and three-dimensional boundaries are difficult to segment accurately; brain region images and nodule features are highly complex, making it difficult to distinguish nodules from other similar tissue.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides an artificial intelligence-based brain PET image identification method, which comprises the following steps:
receiving a sequence of PET images of a brain of a target patient;
identifying a plurality of images from the sequence of PET images, the plurality of images including a base image, a peak gray-scale enhanced image, an initial captured image, and a delayed response image; the base image represents the sequence before gray-scale enhancement, the peak gray-scale enhanced image represents the image with the highest relative brightness value, the initial captured image represents the image in which gray-scale enhancement is first detected in the sequence of PET images, and the delayed response image represents the end portion of the sequence of PET images, i.e., the last image after a predefined time;
creating a multi-channel image of the PET image sequence, wherein the multi-channel image comprises a brightness channel, a gray-level updating channel, and a gray-level clearing channel; the brightness channel comprises the peak gray-scale enhanced image, the gray-level updating channel is the arithmetic difference between the peak gray-scale enhanced image and the base image, and the gray-level clearing channel is the arithmetic difference between the initial captured image and the delayed response image;
wherein the analysis is performed by calculating a score image, assigning to each pixel a score according to the saliency values above a threshold within a region around the pixel, and applying non-maximum suppression to the score image to obtain a binary detection mask comprising candidate regions representing local-maximum locations;
the method further comprises: cropping the candidate regions from the image and resizing each cropped candidate region to match the input of the deep RNN, wherein the deep RNN computes a classification representing the lesion of each candidate region;
performing a wavelet transform on the denoised PET brain image to obtain the high-frequency information in the PET image; dividing the PET brain image into a plurality of regions through lifting-tree decomposition and processing each local region separately; if each local region is Φ × Φ, the number of sampling angles is set to Φ² − 1, i.e., the projection angles are uπ/(Φ² − 1), where u = 1, 2, …, Φ² − 1;
constructing a Φ × Φ window of the same size as the sub-region and calculating the orthogonal projection η_u(i) of the region at each sampling angle:
η_u(i) = −x(i)·sin(u) + y(i)·cos(u)
where u is the projection angle and x(i), y(i) are the window coordinates; projecting at each angle yields the bending coefficient η_d;
calculating a gradient vector field indicating the direction of change of the region at each point in the PET image; performing a wavelet transform on η_d to obtain the transform coefficients {ε_k}, predetermining a threshold T, and thresholding ε_k:
ε_k'(x) = 0, for |x| ≤ T
ε_k'(x) = ε_k(x), for |x| > T
after thresholding, applying an inverse wavelet transform to obtain the approximation signal R_d of η_d; over all projection angles u, the angle with the minimum difference between η_d and R_d is taken as the optimal gradient vector field direction of the region:
u' = argmin ||η_d − R_d||², subject to ζ < H, u ∈ [0, Φ² − 1];
ζ = min ||η_d − R_d||²;
where H is a threshold determining whether a gradient vector field exists in the region.
Preferably, the method further comprises:
a patch saliency histogram is computed for respective images of the sequence of PET images, and wherein the method further comprises creating a single patch saliency map by combining a plurality of the patch saliency histograms, wherein the patch saliency channel stores the single patch saliency map.
Preferably, the deep RNN outputs a binary detection map comprising the candidate regions, each classified with a classification representing its lesion.
Preferably, the method further comprises summing the values of the binary detection maps generated for each image of the sequence of PET images along the longitudinal axis to generate a projection heat map representing the spatial density of the candidate regions.
Preferably, a transverse saliency histogram is computed for each respective image of the sequence of PET images, wherein the method further comprises creating a single transverse saliency heat map by combining the plurality of transverse saliency histograms, wherein the transverse saliency channel stores the single transverse saliency heat map.
Preferably, the transverse saliency histogram is computed by computing a side-to-side patch flow from a flow field between patches of the left and right brain for identifying, for each patch of the brain, a corresponding most adjacent patch in the symmetric part of the brain, wherein the transverse saliency value of the transverse saliency histogram of each patch is estimated from the error of the most adjacent patch.
Compared with the prior art, the invention has the following advantages:
the invention provides an artificial intelligence-based brain PET image recognition method, which is beneficial to summarizing the rules of the characteristics of brain nodule imaging by automatically learning and extracting characteristic information, achieves higher detection rate, obtains a more accurate three-dimensional model by segmentation and is beneficial to brain nodule lesion recognition and accurate diagnosis of doctors.
Drawings
Fig. 1 is a flowchart of an artificial intelligence-based brain PET image recognition method according to an embodiment of the present invention.
Detailed Description
A detailed description of one or more embodiments of the invention is provided below along with accompanying figures that illustrate the principles of the invention. The invention is described in connection with such embodiments, but the invention is not limited to any embodiment. The scope of the invention is limited only by the claims and the invention encompasses numerous alternatives, modifications and equivalents. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. These details are provided for the purpose of example and the invention may be practiced according to the claims without some or all of these specific details.
The invention provides a brain PET image identification method based on artificial intelligence. Fig. 1 is a flowchart of a brain PET image recognition method based on artificial intelligence according to an embodiment of the invention.
The PET image sequence of the invention comprises at least a portion of the brain of the target patient. The deep RNN is trained before PET image sequences of patients are received for classification; it is trained on the multi-channel images and their associated renderings and labels. Standard automatic machine learning methods that rely on large data sets cannot be used here, owing to the lack of available training data sets and because such methods are typically applied to the classification of natural images rather than of diverse medical images. The present invention provides a solution that yields accurate classification results when a large training data set is not available. Owing to the small size of the training set and the use of the multi-channel image data structure, the neural network is trained relatively quickly without sacrificing the accuracy of the lesion computation.
A multi-channel image is created for each sequence of PET images. The PET image sequence is preprocessed before the multi-channel image computed from it is passed to the trained recurrent neural network. Preprocessing includes segmenting the brain tissue from the sequence images, for example by registering the PET images along the longitudinal axis so that they superimpose accurately. The luminance values of the PET images are regularized by normalizing the standard deviation of the luminance values to the unit value, which defines a relative measure between the images. The regularization is performed, for example, by calculating the total luminance value, the average luminance value, the regularized luminance value, or the relative luminance value of each image. The values of the regularized images are analyzed to identify the images of the sequence used to compute the channels of the multi-channel representation.
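The regularization step above can be sketched as follows; the function name and the zero-mean/unit-standard-deviation normalization choice are illustrative assumptions, not taken verbatim from the method:

```python
import numpy as np

def regularize_sequence(images):
    """Regularize the luminance of a PET image sequence so that images are
    comparable; `regularize_sequence` is a hypothetical name."""
    stack = np.stack([np.asarray(img, dtype=np.float64) for img in images])
    # Normalize so that the standard deviation of the luminance values
    # becomes the unit value, defining a relative measure between images.
    std = stack.std()
    stack = (stack - stack.mean()) / (std if std > 0 else 1.0)
    # Per-image summary values later used to pick the channel images.
    totals = stack.sum(axis=(1, 2))
    means = stack.mean(axis=(1, 2))
    return stack, totals, means
```

The per-image totals and means correspond to the example luminance statistics named in the preceding paragraph.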
For PET brain tissue image segmentation, only the brain boundary neighboring region needs to be analyzed, and the specific steps are as follows:
A wavelet transform is performed on the denoised PET brain image to obtain the high-frequency information in the PET image. The PET brain image is divided into a plurality of regions by lifting-tree decomposition, and each local region is processed separately. Assuming each local region is of size Φ × Φ, the number of sampling angles is set to Φ² − 1, i.e., the projection angles are uπ/(Φ² − 1), where u = 1, 2, …, Φ² − 1.
A Φ × Φ window having the same size as the sub-region is constructed, and the orthogonal projection of the region at each sampling angle is calculated:
η_u(i) = −x(i)·sin(u) + y(i)·cos(u)
where u is the projection angle and x(i), y(i) are the window coordinates; projecting at each angle yields the bending coefficient η_d.
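The projection formula above can be sketched as follows; taking the window coordinates on a plain integer grid is an assumption:

```python
import numpy as np

def orthogonal_projection(phi, u):
    """Project the phi x phi window coordinates onto the sampling angle u,
    per eta_u(i) = -x(i)*sin(u) + y(i)*cos(u)."""
    ys, xs = np.mgrid[0:phi, 0:phi]
    x = xs.ravel().astype(float)
    y = ys.ravel().astype(float)
    return -x * np.sin(u) + y * np.cos(u)
```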
A gradient vector field is calculated that indicates the direction of change of the region at each point in the PET image. A wavelet transform is performed on η_d to obtain the transform coefficients {ε_k}; a threshold T is predetermined, and ε_k is thresholded:
ε_k'(x) = 0, for |x| ≤ T
ε_k'(x) = ε_k(x), for |x| > T
After thresholding, an inverse wavelet transform is applied to obtain the approximation signal R_d of η_d. Over all projection angles u, the angle with the minimum difference between η_d and R_d is taken as the optimal gradient vector field direction of that region:
u' = argmin ||η_d − R_d||², subject to ζ < H, u ∈ [0, Φ² − 1]
ζ = min ||η_d − R_d||²
where H is a threshold determining whether a gradient vector field exists in the region.
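A minimal sketch of the thresholding and angle-selection steps, assuming a single-level Haar wavelet (the method does not name a particular wavelet) and an even Φ so the Haar split is valid:

```python
import numpy as np

def haar_1d(signal):
    """One level of a 1-D Haar wavelet transform (even-length input assumed)."""
    a = (signal[0::2] + signal[1::2]) / np.sqrt(2.0)  # approximation
    d = (signal[0::2] - signal[1::2]) / np.sqrt(2.0)  # detail
    return a, d

def inv_haar_1d(a, d):
    out = np.empty(2 * len(a))
    out[0::2] = (a + d) / np.sqrt(2.0)
    out[1::2] = (a - d) / np.sqrt(2.0)
    return out

def denoised_reconstruction(eta, T):
    """Hard-threshold the wavelet coefficients {eps_k} of eta (zero where
    |eps_k| <= T) and invert, yielding the approximation signal R_d."""
    a, d = haar_1d(eta)
    coeffs = np.concatenate([a, d])
    coeffs[np.abs(coeffs) <= T] = 0.0
    return inv_haar_1d(coeffs[:len(a)], coeffs[len(a):])

def best_angle(phi, T, H):
    """Scan all projection angles and return (u', zeta): the angle minimizing
    ||eta_d - R_d||^2, accepted only if the minimum falls below H."""
    ys, xs = np.mgrid[0:phi, 0:phi]
    best_u, best_err = None, np.inf
    for u in np.arange(1, phi * phi) * np.pi / (phi * phi - 1):
        eta = -xs.ravel() * np.sin(u) + ys.ravel() * np.cos(u)
        err = float(np.sum((eta - denoised_reconstruction(eta, T)) ** 2))
        if err < best_err:
            best_u, best_err = u, err
    return (best_u, best_err) if best_err < H else (None, best_err)
```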
To reduce the computational complexity of the algorithm and the number of target regions, adjacent regions of the lifting tree with similar gradient vector field characteristics are merged to construct new PET brain segmentation target regions:
1. calculate the optimal gradient vector field direction u' and the reconstruction error ζ of all blocks;
2. for a region Ω of width 2Φ, calculate the optimal gradient vector field direction u_d' and reconstruction error ζ'; the four sub-regions Ω₁–Ω₄ of Ω have reconstruction errors ζ₁, ζ₂, ζ₃, ζ₄ respectively; if ζ' = ζ₁ + ζ₂ + ζ₃ + ζ₄, merge Ω₁–Ω₄;
3. repeat steps 1 and 2 until the maximum block size is reached.
The region where the gradient vector field is present is finally taken as the target region of the PET brain image and further processed.
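The bottom-up merging procedure can be sketched as follows; `reconstruction_error` is a hypothetical caller-supplied function, and the exact-additivity merge test follows step 2 above:

```python
def merge_regions(reconstruction_error, size, phi, tol=1e-9):
    """Merge four phi x phi children into their 2*phi parent whenever the
    parent's reconstruction error equals the sum of the children's.
    `reconstruction_error(x, y, w)` returns zeta for the block at (x, y)
    of width w; everything here is an illustrative sketch."""
    blocks = [(x, y, phi) for y in range(0, size, phi)
                          for x in range(0, size, phi)]
    w = phi
    while 2 * w <= size:
        survivors, merged = set(blocks), []
        for y in range(0, size, 2 * w):
            for x in range(0, size, 2 * w):
                children = [(x, y, w), (x + w, y, w),
                            (x, y + w, w), (x + w, y + w, w)]
                if all(c in survivors for c in children):
                    parent_err = reconstruction_error(x, y, 2 * w)
                    child_sum = sum(reconstruction_error(*c) for c in children)
                    if abs(parent_err - child_sum) <= tol:  # zeta' == zeta1+..+zeta4
                        for c in children:
                            survivors.discard(c)
                        merged.append((x, y, 2 * w))
        blocks = sorted(survivors) + merged
        w *= 2
    return blocks
```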
PET nodule image segmentation is accomplished as follows: each target region is divided into two parts, a nodule part and a background part, the nodule part being a connected region after division;
the richness of the gray values is represented by entropy values, which are defined as
Wherein W is the number of gray levels contained in the target region, PiProbability of occurrence of a pixel at gray level i in a PET sub-image.
G_x(x, y) = 2f(x+2, y) + f(x+1, y) − f(x−1, y) − 2f(x−2, y)
G_y(x, y) = 2f(x, y+2) + f(x, y+1) − f(x, y−1) − 2f(x, y−2)
where f is the pixel value at the corresponding position of the PET brain image, D is a segmented region, and N_D is the number of pixels of the region used in the calculation of the region gradient function G_D.
The nodule region after PET image segmentation is calculated as:
D_F = argmin[weight₁·U_{D1} + (1 − weight₁)·U_{TF−D1} + weight₂·G_{D1} + (1 − weight₂)·G_{TF−D1}]
where TF denotes the whole brain image and weight₁ and weight₂ are the weights of the entropy function U and the gradient function G_D in the segmentation algorithm. After the segmentation of each region is completed, the segmentation results of all target regions are fused to complete the segmentation of the PET brain image.
The sequence of PET images is analyzed to identify a plurality of images, including a base image, a peak gray-scale enhanced image, an initial captured image, and a delayed response image. The base image is the image of the sequence without contrast information; it may be identified, for example, as the first image of the sequence or as the image associated with the lowest relative luminance value. The peak gray-scale enhanced image is identified, for example, as the image having the highest relative luminance value, i.e., the peak of the generated luminance pattern. The initial captured image is the image in which gray-scale enhancement is first detectable in the sequence; it is identified as an image whose luminance value exceeds that of the base image by a threshold chosen to exclude luminance changes due to noise or artifacts. The delayed response image is an image from the end portion of the sequence; it may be the last image of the sequence after a predefined time has passed.
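The selection of the four key images can be sketched as follows; using the mean as the relative luminance value and a fixed noise threshold are assumptions:

```python
import numpy as np

def identify_key_images(stack, noise_threshold=0.1):
    """Pick the base, peak, initial, and delayed images from a regularized
    (T, H, W) PET sequence; thresholds here are illustrative."""
    luminance = stack.mean(axis=(1, 2))      # relative luminance per image
    base_idx = int(np.argmin(luminance))     # lowest relative luminance
    peak_idx = int(np.argmax(luminance))     # highest relative luminance
    # First image whose luminance exceeds the base by more than the noise
    # threshold: the initially detectable gray-scale enhancement.
    above = np.nonzero(luminance > luminance[base_idx] + noise_threshold)[0]
    initial_idx = int(above[0]) if len(above) else base_idx
    delayed_idx = len(stack) - 1             # last image of the sequence
    return base_idx, peak_idx, initial_idx, delayed_idx
```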
A trained Recurrent Neural Network (RNN) receives as input the multi-channel images and computes an output representing a classification of the lesion. For example, the output may include one of the following classifications: malignant lesions, benign lesions and normal tissue.
Optionally, the sequence of PET images is extracted from three-dimensional image data as two-dimensional slices, and the deep RNN analyzes each two-dimensional slice sequence; alternatively, the PET image sequence comprises 3D images. A patch saliency histogram may then also be computed for the plurality of images. The patch saliency histogram is computed from the LP distance between each patch of the image and the average patch along the principal components of the image patches, and is represented as an additional channel of the multi-channel image. The patch discriminative saliency histogram is analyzed to identify candidate regions containing a relatively high density of saliency values. The candidate regions are cropped and fed to the deep RNN, which calculates the lesion classification for each candidate region. The candidate regions may be cropped from the luminance channel, the gray-level updating channel, and the gray-level clearing channel to create a multi-channel representation of each candidate region. The deep RNN may output a binary detection map that includes the candidate regions classified with a classification representing the lesion; the candidate regions classified as lesion representations mark the locations of the lesions. The binary detection maps computed from the patch discriminative saliency histograms are combined by an OR operation and summed together to generate a projection heat map representing the spatial density of the candidate regions, by which the locations of the detected lesions are represented.
The average patch may be computed from the principal components of the image patches of the particular image, with distances measured by the LP distance.
For a vectorized patch p_{x,y} around a point (x, y), the discriminability of the patch is given by:
PD(p_{x,y}) = Σ_{k=1}^{n} |p_{x,y}·ω_k^T|
where:
PD denotes the discriminability of the patch and n is the number of principal components;
p_{x,y} denotes the vectorized patch around the point (x, y);
ω_k^T denotes the k-th principal component of the overall image patch distribution.
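A sketch of the PD computation, with the principal components obtained via SVD (an implementation choice, not specified by the method):

```python
import numpy as np

def patch_discriminability(patches, n_components):
    """PD(p) = sum_k |p . w_k^T| over the top-n principal components of
    the patch distribution; `patches` is an (N, d) array of vectorized
    patches, one per row."""
    centered = patches - patches.mean(axis=0)
    # Rows of vt are the principal directions of the patch distribution.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    w = vt[:n_components]                     # (n, d) principal components
    return np.abs(patches @ w.T).sum(axis=1)  # PD for every patch
```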
Optionally, for each image of the sequence, a patch saliency histogram is computed. The patch discriminative saliency histograms are then summed to create a thermodynamic diagram representing the degree of saliency.
The patch saliency map may be fed as an input to the deep RNN as an additional channel of the multi-channel representation. The patch discriminative saliency heat map represents the location of a detected lesion; for example, the peak intensity point of the heat map associated with the multi-channel image of a lesion or tumor may represent the location of the lesion or tumor.
In another embodiment, a transverse saliency histogram is computed. For each patch of the brain, the transverse saliency is computed with respect to the corresponding most adjacent patch in the symmetric region of the brain, and the transverse saliency histogram stores the transverse saliency value of each patch. The transverse saliency histogram is calculated from the LP distances between each patch of the brain in an image of the PET image sequence and the corresponding patch of the symmetric part of the brain in the same image, or by computing a contralateral patch flow from a flow field between the patches of the left and right brain. The flow field identifies, for each patch of the brain, the corresponding most adjacent patch in the symmetric portion of the brain.
The dense correspondence field is computed for each pixel, considering the k × k patch surrounding it. Each pixel location (x, y) is assigned a random displacement vector, denoted T, which marks the position of a corresponding patch in the symmetric part of the brain. The quality of the displacement vector T is calculated from a patch distance (e.g., the LP distance):
D(p_{x,y}, p_{x+T_x, y+T_y}) = ||p_{x,y} − p_{x+T_x, y+T_y}||_p
where:
k denotes the size of the patch around each pixel, p_{x,y} denotes the patch at coordinate (x, y), T denotes the displacement vector, and D denotes the quality metric.
The displacement of each patch of the brain is adjusted according to the displacement vectors of the adjacent patches of the symmetric part of the brain; the adjustment is generated from the displacement vectors of adjacent patches in the same image. After the displacements are adjusted, the random-vector assignment and displacement-adjustment steps are iterated multiple times, and the position of the best corresponding patch in the symmetric part of the brain is determined according to the LP distance.
The transverse saliency value of the transverse saliency histogram of each patch is estimated from the error of the most adjacent patch. The nearest-neighbor error (denoted NHE) may be calculated from the following mathematical relationship:
NHE(p_{x,y}) = min_T D(p_{x,y}, p_{x+T_x, y+T_y})
where p_{x,y} denotes the patch at coordinate (x, y), T denotes the displacement vector, D denotes the quality metric, and NHE denotes the nearest-neighbor error metric.
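A sketch of the quality metric D and the nearest-neighbor error NHE; the p = 2 LP distance and the explicit candidate list are illustrative assumptions, and out-of-bounds handling is left to the caller:

```python
import numpy as np

def patch_distance(img, x, y, tx, ty, k, p=2):
    """Quality metric D: the LP distance between the k x k patch at (x, y)
    and the patch displaced by T = (tx, ty)."""
    a = img[y:y + k, x:x + k].astype(float)
    b = img[y + ty:y + ty + k, x + tx:x + tx + k].astype(float)
    return float(np.sum(np.abs(a - b) ** p) ** (1.0 / p))

def nearest_neighbor_error(img, x, y, candidates, k):
    """NHE(p_xy) = min over candidate displacements T of D(p_xy, p_(x+Tx, y+Ty)).
    `candidates` stands in for the randomly assigned and iteratively refined
    displacement vectors described above."""
    return min(patch_distance(img, x, y, tx, ty, k) for tx, ty in candidates)
```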
Optionally, a transverse saliency histogram is computed for each image of the sequence. The transverse saliency histograms may be summed to create a heat map whose values represent degrees of saliency. The transverse saliency heat map is fed as an input to the deep RNN as an additional channel of the multi-channel representation. The transverse saliency heat map may represent the location of a detected lesion; for example, a peak intensity point of the heat map associated with a representation of a lesion or tumor may represent the location of the lesion or tumor.
The patch discriminative saliency histograms of the plurality of images, or the transverse saliency histograms of the plurality of images, are analyzed to identify candidate regions containing a relatively high density of saliency values. The candidate regions are bounded by bounding boxes; a bounding box ensures that the entire lesion is included in the cropped image, and the extracted lesion image may be resized to match the input of the deep RNN. For a set of window sizes (w_i, h_j) over a given range and a set of thresholds t_1, t_2, …, t_n, the following score is evaluated in the w_i × h_j region s_{x,y} at and around each pixel (x, y):
count(x, y) = #{(x', y') ∈ s_{x,y} : saliency(x', y') > t_k}
where count(x, y) denotes the score calculated for the pixel (x, y). A score image is generated from the set of scores calculated for each pixel, and non-maximum suppression is applied to the score image to obtain the locations of the local maxima.
Optionally, the candidate regions are cropped from the image, and each cropped candidate region is resized to match the input of the deep RNN. The deep RNN computes a classification representing the lesion of each candidate region. As described above, the cropped candidate regions may be taken from the patch saliency histogram or the transverse saliency histogram fed into the respective channels of the multi-channel image.
When the candidate region is cropped, each channel of the multi-channel image includes a region corresponding to the candidate region. The multi-channel image includes at least the following three channels:
a luminance channel, comprising the peak gray-scale enhanced image;
a gray-level updating channel, comprising the arithmetic difference between the peak gray-scale enhanced image and the base image;
a gray-level clearing channel, comprising the arithmetic difference between the initial captured image and the delayed response image.
The channels in the multi-channel image are computed from a series of axial PET imaging images.
The multi-channel image may include the following channels:
a patch saliency path comprising a patch saliency histogram, such as a patch saliency heat map, or a candidate region.
A transverse saliency channel, comprising a transverse saliency histogram, a transverse saliency heat map, or a candidate region.
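The three mandatory channels described above can be assembled as follows; a minimal sketch, with the channel order an assumption:

```python
import numpy as np

def build_multichannel(stack, base_idx, peak_idx, initial_idx, delayed_idx):
    """Stack the luminance, gray-level updating, and gray-level clearing
    channels into one (H, W, 3) multi-channel image."""
    luminance = stack[peak_idx]                           # peak gray-scale enhanced image
    gray_update = stack[peak_idx] - stack[base_idx]       # peak minus base
    gray_clear = stack[initial_idx] - stack[delayed_idx]  # initial minus delayed
    return np.stack([luminance, gray_update, gray_clear], axis=-1)
```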
The multi-channel images are provided as input to the trained deep RNN for computing a plurality of classifications representative of lesions. Optionally, the trained deep RNN includes 9 convolutional layers in three consecutive blocks. The first block may include two convolutional layers with 4 × 4 × 32 filters, each followed by a ReLU layer, and a max-pooling layer. The second block may include four convolutional layers with 4 × 4 × 32 filters. The third block may include three convolutional layers of sizes 4 × 4 × 72, 6 × 6 × 72, and 3 × 3 × 72, each followed by a ReLU. The trained deep RNN may further include a fully connected layer with a number of neurons and a softmax loss layer.
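The layer dimensions above can be checked with a small parameter-count sketch; the input-channel chaining between blocks is an assumption, since the text does not state it:

```python
def conv_params(kernel_h, kernel_w, channels_out, channels_in, bias=True):
    """Parameter count of one convolutional layer."""
    n = kernel_h * kernel_w * channels_in * channels_out
    return n + (channels_out if bias else 0)

# The 9-layer stack described above, read as (kernel_h, kernel_w, filters).
layers = ([(4, 4, 32)] * 2                         # block 1 (+ ReLU, max pooling)
          + [(4, 4, 32)] * 4                       # block 2
          + [(4, 4, 72), (6, 6, 72), (3, 3, 72)])  # block 3, each with ReLU

def total_conv_params(layers, in_channels):
    """Chain each layer's output channels into the next layer's input."""
    total = 0
    for kh, kw, out in layers:
        total += conv_params(kh, kw, out, in_channels)
        in_channels = out
    return total
```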
The deep RNN output comprises a binary detection map of the candidate regions, each classified with a classification representing its lesion. The values of the binary detection maps generated for each image of the sequence of PET images are summed along the longitudinal axis to generate a projection heat map representing the spatial density of the candidate regions.
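The heat map generation reduces to a summation over the per-slice binary detection maps:

```python
import numpy as np

def projection_heat_map(binary_maps):
    """Sum the per-slice binary detection maps along the longitudinal axis
    to obtain a heat map of candidate-region spatial density."""
    return np.sum(np.stack(binary_maps).astype(np.int64), axis=0)
```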
The PET image sequence is acquired as 3D image data, and a set of 2D image slices is extracted from each sequence. Each sequence, or the images identified for computing the channels of the multi-channel image, is preprocessed; the patch discrimination saliency histogram or lateral saliency histogram is then computed and analyzed to identify candidate regions; the multi-channel image is classified by the deep RNN; and an output is provided based on the classification result.
According to one embodiment of the invention, a method of training a deep RNN to detect a representation of a lesion from a multi-channel image computed from a sequence of PET images comprises the following steps.
Training images of patients are received. The training images may be stored, for example, in a PET image repository or on a medical record server. Each set of training images comprises a sequence of PET images.
The training images are preprocessed, for example by shifting the region of interest or by adding multiple rotated or flipped variants.
In a subset of the images of each sequence that include a lesion, the lesion boundary is manually delineated. The training images may be stored in an electronic medical record together with the manual delineation. Optionally, a plurality of images without lesions are manually or automatically annotated as including normal tissue. Each annotated lesion is associated with an indicator. The indicator is stored as a tag, metadata, or field value in the electronic medical record, according to the delineation color or another representation. Images of symmetric brain areas without lesions or annotations may be associated with indicators, such as normal patches.
A patch difference saliency histogram and a lateral saliency histogram are calculated for a plurality of images of each sequence, a multi-channel image is created for each sequence, and the deep RNN is trained on the multi-channel images and the associated labels. The deep RNN is trained using stochastic gradient descent.
Finally, the trained deep RNN is provided for classifying target sequences of PET images.
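Stochastic gradient descent, the optimizer named above, updates the model after every sample rather than after the full dataset. The toy one-weight least-squares fit below illustrates the update rule; the learning rate and data are illustrative, not from the patent.

```python
# Toy stochastic gradient descent: fit a single weight w so that
# w * x approximates y, updating once per sample ("stochastic").

def sgd_fit(samples, lr=0.1, epochs=50):
    w = 0.0
    for _ in range(epochs):
        for x, y in samples:
            grad = 2 * (w * x - y) * x   # d/dw of (w*x - y)^2
            w -= lr * grad               # per-sample gradient step
    return w

samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # generated by y = 2x
w = sgd_fit(samples)
print(round(w, 3))  # -> 2.0
```

Training the deep RNN applies the same per-sample (or per-minibatch) update to all network weights, with the loss taken from the softmax layer.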
In conclusion, the invention provides an artificial-intelligence-based brain PET image recognition method. By automatically learning and extracting feature information, it helps summarize the imaging characteristics of brain nodules, achieves a higher detection rate, obtains a more accurate three-dimensional model through segmentation, and assists physicians in identifying brain nodule lesions and making accurate diagnoses.
It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented in a general purpose computing system, centralized on a single computing system, or distributed across a network of computing systems, and optionally implemented in program code that is executable by the computing system, such that the program code is stored in a storage system and executed by the computing system. Thus, the present invention is not limited to any specific combination of hardware and software.
It is to be understood that the above-described embodiments of the present invention are merely illustrative of or explaining the principles of the invention and are not to be construed as limiting the invention. Therefore, any modification, equivalent replacement, improvement and the like made without departing from the spirit and scope of the present invention should be included in the protection scope of the present invention. Further, it is intended that the appended claims cover all such variations and modifications as fall within the scope and boundaries of the appended claims or the equivalents of such scope and boundaries.
Claims (1)
1. A brain PET image identification method based on artificial intelligence is characterized by comprising the following steps:
receiving a sequence of PET images of a brain of a target patient;
identifying a plurality of images from the sequence of PET images, the plurality of images including a base image, a peak grayscale-enhanced image, an initial captured image, and a delayed response image; the base image represents the PET image sequence without grayscale enhancement, the peak grayscale-enhanced image represents the image with the highest relative brightness value, the initial captured image represents the first grayscale image detected in the sequence of PET images, and the delayed response image represents the end portion of the sequence of PET images, i.e., the last image beyond a predefined time;
creating a multi-channel image of the PET image sequence, wherein the multi-channel image comprises a luminance channel, a gray-level update channel, and a gray-level clearing channel; the luminance channel comprises the peak grayscale-enhanced image, the gray-level update channel is the arithmetic difference between the peak grayscale-enhanced image and the base image, and the gray-level clearing channel is the arithmetic difference between the initial captured image and the delayed response image;
calculating a score image by assigning to each pixel a score according to the saliency values above a threshold within a region around the pixel, and applying non-maximum suppression to the score image to obtain a binary detection mask comprising candidate regions representing local maximum locations;
the method further comprises: cropping the candidate regions from the score image and resizing each cropped candidate region to match the input of a deep RNN, the deep RNN calculating a classification representing the lesion of each candidate region;
performing wavelet transform on the denoised PET brain image to obtain the high-frequency information in the PET image; dividing the PET brain image into a plurality of local regions by lifting-tree decomposition and processing each local region separately; if each local region is Φ × Φ, the number of sampling angles is Φ² − 1, i.e., the projection angles are uπ/(Φ² − 1), where u = 1, 2, …, Φ² − 1;
constructing a Φ × Φ window of the same size as the local region, and calculating the orthogonal projection η_u(i) of the local region at each sampling angle:
η_u(i) = −x(i) · sin(u) + y(i) · cos(u)
where u is the projection angle and x(i), y(i) are the window coordinates; projecting at each angle yields the bending coefficient η_d;
calculating a gradient vector field indicating, at each point in the PET image, the direction of change of the local region; performing a wavelet transform on η_d to obtain transform coefficients {ε_k}, predetermining a threshold T, and thresholding ε_k:
ε_k′(x) = 0, for |x| ≤ T
ε_k′(x) = ε_k(x), for |x| > T
after thresholding, applying an inverse wavelet transform to obtain R_d, an approximate signal of η_d; over all projection angles u, the angle minimizing the difference between η_d and R_d is taken as the optimal gradient vector field direction of the local region:
u′ = argmin ‖η_d − R_d‖², with ζ < H, u ∈ [0, Φ²];
ζ = min ‖η_d − R_d‖²;
where H is a threshold for determining whether a gradient vector field exists in the local region.
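The projection-and-minimization loop of the claim can be illustrated numerically. This sketch takes sin(u)/cos(u) literally as in the formula, uses toy values for Φ, the window contents, and the threshold, and replaces the wavelet/inverse-wavelet round trip with the hard-thresholding rule alone; it is an assumption-laden illustration, not the claimed method.

```python
import math

def projection(xs, ys, u):
    """Orthogonal projection eta_u(i) = -x(i)*sin(u) + y(i)*cos(u)."""
    return [-x * math.sin(u) + y * math.cos(u) for x, y in zip(xs, ys)]

def hard_threshold(coeffs, T):
    """eps'(x) = 0 for |x| <= T, else eps(x)."""
    return [c if abs(c) > T else 0.0 for c in coeffs]

xs, ys = [0.0, 1.0, 2.0], [2.0, 1.0, 0.0]   # toy window coordinates
Phi = 2
best_u, best_err = None, float("inf")
for u in range(0, Phi ** 2):                # candidate projection angles
    eta = projection(xs, ys, u)             # bending coefficients eta_d
    R = hard_threshold(eta, T=0.5)          # stand-in for wavelet round trip
    err = sum((a - b) ** 2 for a, b in zip(eta, R))   # ||eta_d - R_d||^2
    if err < best_err:                      # argmin over angles
        best_u, best_err = u, err
print(best_u, best_err)
```

A final comparison of `best_err` against the threshold H would then decide whether a gradient vector field exists in the region.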
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111065379.0A CN113780421B (en) | 2021-06-07 | 2021-06-07 | Brain PET image identification method based on artificial intelligence |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110630596.3A CN113077021A (en) | 2021-06-07 | 2021-06-07 | Machine learning-based electronic medical record multidimensional mining method |
CN202111065379.0A CN113780421B (en) | 2021-06-07 | 2021-06-07 | Brain PET image identification method based on artificial intelligence |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110630596.3A Division CN113077021A (en) | 2021-06-07 | 2021-06-07 | Machine learning-based electronic medical record multidimensional mining method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113780421A CN113780421A (en) | 2021-12-10 |
CN113780421B true CN113780421B (en) | 2022-06-07 |
Family
ID=76617154
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111065379.0A Active CN113780421B (en) | 2021-06-07 | 2021-06-07 | Brain PET image identification method based on artificial intelligence |
CN202110630596.3A Pending CN113077021A (en) | 2021-06-07 | 2021-06-07 | Machine learning-based electronic medical record multidimensional mining method |
Country Status (1)
Country | Link |
---|---|
CN (2) | CN113780421B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114334130B (en) * | 2021-12-25 | 2023-08-22 | 浙江大学 | Brain symmetry-based PET molecular image computer-aided diagnosis system |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107506761A (en) * | 2017-08-30 | 2017-12-22 | 山东大学 | Brain image dividing method and system based on notable inquiry learning convolutional neural networks |
CN108364006A (en) * | 2018-01-17 | 2018-08-03 | 超凡影像科技股份有限公司 | Medical Images Classification device and its construction method based on multi-mode deep learning |
CN109447963A (en) * | 2018-10-22 | 2019-03-08 | 杭州依图医疗技术有限公司 | A kind of method and device of brain phantom identification |
CN110580693A (en) * | 2018-06-07 | 2019-12-17 | 湖南爱威医疗科技有限公司 | Image processing method, image processing device, computer equipment and storage medium |
WO2020034469A1 (en) * | 2018-08-13 | 2020-02-20 | Beijing Ande Yizhi Technology Co., Ltd. | Method and apparatus for classifying a brain anomaly based on a 3d mri image |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7317821B2 (en) * | 2004-11-22 | 2008-01-08 | Carestream Health, Inc. | Automatic abnormal tissue detection in MRI images |
US20070279716A1 (en) * | 2006-06-02 | 2007-12-06 | Chunghwa Picture Tubes, Ltd | Process method of image data for liquid crystal display |
CN106778506A (en) * | 2016-11-24 | 2017-05-31 | 重庆邮电大学 | A kind of expression recognition method for merging depth image and multi-channel feature |
WO2019010470A1 (en) * | 2017-07-07 | 2019-01-10 | University Of Louisville Research Foundation, Inc. | Segmentation of medical images |
WO2019041262A1 (en) * | 2017-08-31 | 2019-03-07 | Shenzhen United Imaging Healthcare Co., Ltd. | System and method for image segmentation |
CN108428225A (en) * | 2018-01-30 | 2018-08-21 | 李家菊 | Image department brain image fusion identification method based on multiple dimensioned multiple features |
CN108234884B (en) * | 2018-02-12 | 2019-12-10 | 西安电子科技大学 | camera automatic focusing method based on visual saliency |
CN111227821B (en) * | 2018-11-28 | 2022-02-11 | 苏州润迈德医疗科技有限公司 | Microcirculation resistance index calculation method based on myocardial blood flow and CT (computed tomography) images |
CN111079596A (en) * | 2019-12-05 | 2020-04-28 | 国家海洋环境监测中心 | System and method for identifying typical marine artificial target of high-resolution remote sensing image |
CN111489330B (en) * | 2020-03-24 | 2021-06-22 | 中国科学院大学 | Weak and small target detection method based on multi-source information fusion |
CN112434172A (en) * | 2020-10-29 | 2021-03-02 | 西安交通大学 | Pathological image prognosis feature weight calculation method and system |
Non-Patent Citations (2)
Title |
---|
Clinical Assessment of MR-Assisted PET Image Reconstruction Algorithms for Low-Dose Brain PET Imaging; Abolfazl Mehranian et al.; 2019 IEEE Nuclear Science Symposium and Medical Imaging Conference (NSS/MIC); 2020-04-09; 1-3 *
Research on PET/CT multimodal image recognition based on convolutional neural networks; Wang Yuanyuan et al.; Video Applications and Engineering; 2017-03-08; Vol. 41, No. 3; 88-94 *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
PE01 | Entry into force of the registration of the contract for pledge of patent right | ||
Denomination of invention: Brain PET Image Recognition Method Based on Artificial Intelligence
Effective date of registration: 2023-04-07
Granted publication date: 2022-06-07
Pledgee: Bank of China Limited by Share Ltd. Guangzhou Haizhu branch
Pledgor: GUANGZHOU TIANPENG COMPUTER TECHNOLOGY CO.,LTD.
Registration number: Y2023980037535 |