CN113269225A - Non-invasive liver epithelium sample vascular smooth muscle lipoma image classification device based on image omics - Google Patents

Non-invasive liver epithelium sample vascular smooth muscle lipoma image classification device based on image omics

Info

Publication number
CN113269225A
CN113269225A · CN202110378227.XA · CN202110378227A
Authority
CN
China
Prior art keywords
image
liver
smooth muscle
omics
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110378227.XA
Other languages
Chinese (zh)
Inventor
丁勇
邵嘉源
夏靖雯
田吴炜
陆晨燕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202110378227.XA priority Critical patent/CN113269225A/en
Publication of CN113269225A publication Critical patent/CN113269225A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/24323Tree-organised classifiers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0012Biomedical image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/60Analysis of geometric attributes
    • G06T7/62Analysis of geometric attributes of area, perimeter, diameter or volume
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/50Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/02Preprocessing
    • G06F2218/04Denoising
    • G06F2218/06Denoising by applying a scale-space analysis, e.g. using wavelet analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10072Tomographic images
    • G06T2207/10081Computed x-ray tomography [CT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10072Tomographic images
    • G06T2207/10088Magnetic resonance imaging [MRI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20048Transform domain processing
    • G06T2207/20064Wavelet transform [DWT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30056Liver; Hepatic
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30096Tumor; Lesion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30101Blood vessel; Artery; Vein; Vascular
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/03Recognition of patterns in medical or anatomical images
    • G06V2201/032Recognition of patterns in medical or anatomical images of protuberances, polyps nodules, etc.

Abstract

The invention discloses a non-invasive, radiomics-based image classification device for hepatic epithelioid angiomyolipoma, belonging to the technical field of medical image processing. The device comprises: a sampling module for acquiring qualifying CT/MRI images of hepatic epithelioid angiomyolipoma, liver cancer and focal nodular hyperplasia of the liver; a lesion region extraction module for extracting the lesion region; a feature extraction module for extracting radiomics features from the lesion region; a feature screening module for screening the radiomics features; a random forest network training module for training a random forest model to obtain a radiomics label; a clinical index fusion module for fusing the radiomics prediction label with the patient's clinical indexes and training a multivariable logistic regression model; and a classification module for combining the random forest network and the multivariable logistic regression model to obtain the final prediction label, thereby classifying hepatic epithelioid angiomyolipoma images. The device offers high recognition accuracy, fast recognition speed, safety and stability.

Description

Non-invasive liver epithelium sample vascular smooth muscle lipoma image classification device based on image omics
Technical Field
The invention belongs to the technical field of medical image processing, and in particular relates to a non-invasive, radiomics-based image classification device for hepatic epithelioid angiomyolipoma.
Background
At present, preoperative differential assessment of liver tumors remains a challenging medical problem for clinicians. On the one hand, because China has a very large population with hepatitis B virus-related cirrhosis, a large number of new liver lesion cases arise every year. On the other hand, with the increasing popularity of health check-ups, clinicians detect various types of liver masses by means of non-invasive imaging techniques. Clinicians therefore often need to evaluate large numbers of liver lesion cases in order to implement individualized diagnosis, treatment and follow-up strategies.
Hepatic epithelioid angiomyolipoma (HEAML) is a rare potentially malignant tumor belonging to the PEComa family and is pathologically characterized by abnormal differentiation of perivascular epithelioid cells. As a particular subtype of angiomyolipoma, hepatic epithelioid angiomyolipoma with no visible fat is easily confused with other liver tumors, including hepatocellular carcinoma (HCC) and focal nodular hyperplasia (FNH). Since differential diagnostic evaluation is an important prerequisite for implementing individualized treatment strategies, accurate differentiation between HEAML and non-HEAML lesions is crucial to the treatment of HEAML. The recurrence rate of HEAML is low, and local excision yields good results. In contrast, according to diagnostic and therapeutic guidelines, HCC patients require radiotherapy, surgical resection or transarterial chemoembolization after a comprehensive clinical assessment, while FNH is completely benign and typically requires only periodic observation. In conclusion, developing a non-invasive method for the differential diagnosis of hepatic epithelioid angiomyolipoma has important clinical significance.
Disclosure of Invention
The invention discloses a non-invasive, radiomics-based image classification device for hepatic epithelioid angiomyolipoma. The aim is to complete a full-pipeline design of a recognition model for hepatic epithelioid angiomyolipoma using radiomics technology, construct a prediction model based on radiomics features, provide a practical auxiliary tool for the preoperative differential diagnosis of hepatic epithelioid angiomyolipoma, and help achieve precise treatment of liver tumor patients.
The technical scheme adopted by the invention for solving the technical problems is as follows:
a non-invasive, radiomics-based image classification device for hepatic epithelioid angiomyolipoma comprises:
a sampling module for acquiring CT or MRI image data of patients diagnosed with hepatic epithelioid angiomyolipoma, liver cancer or focal nodular hyperplasia of the liver; all epithelioid angiomyolipoma cases are assigned to an epithelioid angiomyolipoma group, all liver cancer and focal nodular hyperplasia cases are assigned to a non-epithelioid angiomyolipoma group, and each case is given an actual data label according to its group;
a lesion region extraction module for extracting the lesion-region images of hepatic epithelioid angiomyolipoma, liver cancer and focal nodular hyperplasia;
a feature extraction module for extracting four types of radiomics features from the lesion-region images, namely gray-level statistical features, morphological features, texture features and wavelet features, forming a radiomics feature set X = {X_1, X_2, X_3, …, X_n}, where n is the total number of features; the i-th feature vector is X_i = {x_i1, x_i2, x_i3, …, x_im}, where m is the total number of cases and x_im denotes the i-th feature of the m-th case;
a feature screening module for screening the radiomics feature set obtained by the feature extraction module; the screening is based on the mutual information coefficient between each radiomics feature and the data label: if the mutual information coefficient is greater than 0.3 the feature is retained, otherwise it is discarded;
a random forest network training module for training a random forest network with the screened radiomics feature set and the actual data label of each sample, and mapping the radiomics feature set to a radiomics prediction label with the trained random forest network;
a clinical index fusion module for acquiring and screening clinical indexes of the cases, fusing the normalized clinical indexes with the radiomics prediction labels, and training a multivariable logistic regression model with five-fold cross-validation;
a classification module for, in the actual classification stage, loading the trained random forest network and multivariable logistic regression model, acquiring the clinical CT or MRI image of a case to be identified, and processing it with the lesion region extraction module, the feature extraction module and the feature screening module in turn to obtain the screened radiomics feature set of the lesion region of the case to be identified; the trained random forest network then yields the radiomics prediction label of the image to be identified, which is combined with the clinical data of the case to be identified and used as the input of the multivariable logistic regression model to obtain the final prediction label of the case, thereby completing the classification of the hepatic epithelioid angiomyolipoma image.
The invention has the beneficial effects that:
the invention provides a complete radiomics-based auxiliary system for the differential diagnosis of hepatic epithelioid angiomyolipoma, covering data acquisition, lesion segmentation, feature extraction, feature screening and model construction. A differential diagnosis classification model based on radiomics features and clinical indexes is obtained using machine learning methods such as random forest and logistic regression. Experimental results show that the model can accurately judge whether a lesion is a hepatic epithelioid angiomyolipoma, can conveniently serve as a preoperative assessment tool, and is expected to help surgeons customize individualized preoperative treatment plans and prognosis assessment in the future.
Drawings
Fig. 1 is a flowchart of the non-invasive, radiomics-based image classification device for hepatic epithelioid angiomyolipoma.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
A non-invasive, radiomics-based image classification device for hepatic epithelioid angiomyolipoma comprises:
a sampling module for acquiring CT or MRI image data of patients diagnosed with hepatic epithelioid angiomyolipoma, liver cancer or focal nodular hyperplasia of the liver; all epithelioid angiomyolipoma cases are assigned to an epithelioid angiomyolipoma group, all liver cancer and focal nodular hyperplasia cases are assigned to a non-epithelioid angiomyolipoma group, and each case is given an actual data label according to its group;
a lesion region extraction module for extracting the lesion-region images of hepatic epithelioid angiomyolipoma, liver cancer and focal nodular hyperplasia;
a feature extraction module for extracting four types of radiomics features from the lesion-region images, namely gray-level statistical features, morphological features, texture features and wavelet features, forming a radiomics feature set X = {X_1, X_2, X_3, …, X_n}, where n is the total number of features; the i-th feature vector is X_i = {x_i1, x_i2, x_i3, …, x_im}, where m is the total number of cases and x_im denotes the i-th feature of the m-th case;
a feature screening module for screening the radiomics feature set obtained by the feature extraction module; the screening is based on the mutual information coefficient between each radiomics feature and the data label: if the mutual information coefficient is greater than 0.3 the feature is retained, otherwise it is discarded;
a random forest network training module for training a random forest network with the screened radiomics feature set and the actual data label of each sample, and mapping the radiomics feature set to a radiomics prediction label with the trained random forest network;
a clinical index fusion module for acquiring and screening clinical indexes of the cases, fusing the normalized clinical indexes with the radiomics prediction labels, and training a multivariable logistic regression model with five-fold cross-validation;
a classification module for, in the actual classification stage, loading the trained random forest network and multivariable logistic regression model, acquiring the clinical CT or MRI image of a case to be identified, and processing it with the lesion region extraction module, the feature extraction module and the feature screening module in turn to obtain the screened radiomics feature set of the lesion region of the case to be identified; the trained random forest network then yields the radiomics prediction label of the image to be identified, which is combined with the clinical data of the case to be identified and used as the input of the multivariable logistic regression model to obtain the final prediction label of the case, thereby completing the classification of the hepatic epithelioid angiomyolipoma image.
As shown in Fig. 1, the workflow of the above device specifically comprises the following steps:
and (1) acquiring a clinical CT or MRI image of a patient through a sampling module.
Inclusion criteria for data: pathological diagnosis of smooth muscle lipoma of liver epithelium sample blood vessel, liver cancer and liver focal nodule hyperplasia after surgical excision or biopsy; performing CT or MRI enhanced scanning within 1 month before operation; the image data is complete and can be further analyzed;
exclusion criteria for data: diagnosing a recurrent tumor or a multi-organ malignancy; receiving anti-tumor treatment before CT or MRI enhanced scanning; the quality of the image of the liver mass is poor.
Step (2): delineate and extract the lesion-region images of hepatic epithelioid angiomyolipoma, liver cancer and focal nodular hyperplasia with the lesion region extraction module; in this embodiment, the lesion region extraction module may be implemented with the ITK-SNAP software.
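For illustration only, the following Python sketch shows how a segmented lesion might be loaded and cropped once the mask has been drawn in ITK-SNAP; it relies on SimpleITK and NumPy, and the file names are hypothetical rather than taken from the patent.

import SimpleITK as sitk
import numpy as np

# Hypothetical file names: an enhanced CT/MRI volume and its ITK-SNAP mask
image = sitk.ReadImage("case001_ct.nii.gz")
mask = sitk.ReadImage("case001_roi.nii.gz")

img_arr = sitk.GetArrayFromImage(image)    # array shape: (slices, rows, cols)
msk_arr = sitk.GetArrayFromImage(mask) > 0

# Crop the volume to the bounding box of the delineated lesion
zs, ys, xs = np.where(msk_arr)
lesion_crop = img_arr[zs.min():zs.max() + 1, ys.min():ys.max() + 1, xs.min():xs.max() + 1]
print("lesion voxels:", int(msk_arr.sum()), "crop shape:", lesion_crop.shape)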
Step (3): extract radiomics features of the lesion region segmented in step (2) with the feature extraction module; the extracted features mainly comprise the following categories:
A. Morphological features, describing the shape of the tumor; each morphological feature is calculated as follows:
(a) Mesh surface area:
A_i = (1/2)·|Oa_i × Ob_i|,  A = Σ_{i=1..N_f} A_i
where Oa_i and Ob_i are the edges of the i-th triangle in the mesh, formed by the vertices a_i, b_i and the origin O; A_i denotes the area of the i-th triangle and N_f the number of triangles in the mesh.
(b) Pixel surface area:
A_pixel = Σ_{k=1..N_v} A_k
where A_k denotes the surface area of the single pixel k and N_v the number of pixels. The surface area of the region of interest is approximated by multiplying the number of pixels in the region of interest by the area of a single pixel.
(c) Perimeter:
P_i = sqrt((a_i − b_i)²),  P = Σ_{i=1..N_w} P_i
where a_i and b_i are the vertices of the i-th line of the perimeter mesh, P_i denotes the length of the i-th line, and N_w denotes the number of lines. The length of each line of the perimeter mesh is calculated first, and the total perimeter P is obtained by summing over all lines.
(d) Perimeter-to-surface-area ratio:
perimeter-to-surface ratio = P / A
(e) Sphericity:
sphericity = 2·sqrt(π·A) / P
(f) Spherical disproportion:
spherical disproportion = P / (2·sqrt(π·A))
(g) Major axis length:
major axis length = 4·sqrt(λ_major)
where λ_major is the largest principal component value obtained by principal component analysis of the physical coordinates of the pixel centers; the major axis length is the largest axis length of the ellipsoid enclosing the region of interest and is calculated from the largest principal component λ_major.
(h) Minor axis length:
minor axis length = 4·sqrt(λ_minor)
where λ_minor is the second-largest principal component value obtained by principal component analysis of the physical coordinates of the pixel centers; the minor axis length is the second-largest axis length of the ellipsoid enclosing the region of interest and is calculated from the second-largest principal component λ_minor.
(i) Elongation:
elongation = sqrt(λ_minor / λ_major)
The elongation reflects the relationship between the two largest principal components of the region-of-interest shape.
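As a rough illustration of the shape features above (not the patent's implementation), the following sketch computes the pixel surface area, perimeter, sphericity, axis lengths and elongation of a toy 2D lesion mask with NumPy and scikit-image, assuming an isotropic pixel spacing of 1 mm.

import numpy as np
from skimage.measure import perimeter
from skimage.draw import ellipse

mask = np.zeros((128, 128), dtype=bool)
rr, cc = ellipse(64, 64, 20, 35)           # toy elliptical "lesion"
mask[rr, cc] = True

A = float(mask.sum())                      # pixel surface area, formula (b)
P = perimeter(mask)                        # approximate perimeter, formula (c)
sphericity = 2.0 * np.sqrt(np.pi * A) / P  # formula (e)

coords = np.argwhere(mask).astype(float)   # physical coordinates of pixel centers
eigvals = np.sort(np.linalg.eigvalsh(np.cov(coords.T)))[::-1]
major_axis = 4.0 * np.sqrt(eigvals[0])     # formula (g)
minor_axis = 4.0 * np.sqrt(eigvals[1])     # formula (h)
elongation = np.sqrt(eigvals[1] / eigvals[0])   # formula (i)
print(A, P, sphericity, major_axis, minor_axis, elongation)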
B. First-order features (gray-level statistical features): the gray-level statistics are derived from an intensity histogram and describe the distribution of pixel intensities within the tumor region. The intensity histogram is defined by the following formula:
H(i) = n_i / N_p,  i = 1, 2, …, N_g
where i denotes a discrete intensity level, N_p the total number of pixels in the liver lesion-region image, N_g the number of discrete intensity levels, n_i the number of pixels with discrete intensity level i in the liver lesion-region image, and H(i) the frequency of occurrence of pixels with discrete intensity level i; let M denote the set of all pixels in the liver lesion region and M(i) the value of the i-th pixel.
the gray scale statistical characteristics comprise:
(a) Energy:
energy = Σ_{i=1..N_p} (M(i) + c)²
where c is an optional value shift applied to the intensities to prevent negative values in M.
(b) Total energy:
total energy = V_voxel · Σ_{i=1..N_p} (M(i) + c)²
where V_voxel denotes the voxel volume in cubic millimeters.
(c) Entropy:
entropy = −Σ_{i=1..N_g} H(i) · log₂(H(i) + ε)
where ε denotes an arbitrarily small positive number.
(d) Interquartile range:
interquartile range = n_75 − n_25
where n_25 and n_75 denote the 25th and 75th percentile values of the discrete intensity levels, respectively.
(e) Mean absolute deviation:
mean absolute deviation = (1/N_p) · Σ_{i=1..N_p} |M(i) − X̄|
where X̄ denotes the mean gray-level intensity of the image matrix.
(f) Robust mean absolute deviation:
robust mean absolute deviation = (1/N_{10-90}) · Σ_{i=1..N_{10-90}} |M_{10-90}(i) − X̄_{10-90}|
where M_{10-90}(i) denotes the value of the i-th pixel whose discrete intensity level lies between the 10th and 90th percentiles, X̄_{10-90} denotes the corresponding mean, and N_{10-90} denotes the number of pixels with discrete intensity levels between the 10th and 90th percentiles; the robust mean absolute deviation is the mean distance from the mean of all intensity values, calculated on the subset of the pixel matrix whose gray levels lie between or equal the 10th and 90th percentiles.
(g) Skewness:
skewness = μ₃ / σ³
where μ₃ denotes the third-order central moment and σ the standard deviation.
(h) Kurtosis:
kurtosis = μ₄ / σ⁴
where μ₄ denotes the fourth-order central moment.
(i) Uniformity:
uniformity = Σ_{i=1..N_g} p(i)²
where p(i) denotes the normalized frequency of gray level i in the image.
in addition, the first-order statistical features commonly used include maximum, minimum, mean, variance, and standard deviation, which are not described herein again.
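The first-order quantities above reduce to simple NumPy operations; the sketch below assumes the masked lesion intensities are already available as a 1-D array and that the histogram uses a fixed number of bins, both of which are assumptions rather than values specified by the patent.

import numpy as np

values = np.random.default_rng(0).normal(60, 15, size=5000)   # placeholder lesion intensities
hist, _ = np.histogram(values, bins=32)
H = hist / values.size                     # normalized histogram H(i)
eps = np.finfo(float).eps

energy = np.sum(values ** 2)                                   # formula (a), with c = 0
entropy = -np.sum(H * np.log2(H + eps))                        # formula (c)
iqr = np.percentile(values, 75) - np.percentile(values, 25)    # formula (d)
mad = np.mean(np.abs(values - values.mean()))                  # formula (e)
mu, sigma = values.mean(), values.std()
skewness = np.mean((values - mu) ** 3) / sigma ** 3            # formula (g)
kurtosis = np.mean((values - mu) ** 4) / sigma ** 4            # formula (h)
uniformity = np.sum(H ** 2)                                    # formula (i)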
C. Second-order features (texture features): texture features describing the texture distribution within the tumor are extracted from the gray-level co-occurrence matrix (GLCM), the gray-level run-length matrix (GLRLM), the gray-level size-zone matrix (GLSZM) and the neighboring gray-tone difference matrix (NGTDM) of the image.
GLCM: characterizes image texture by computing the gray-level relationships between neighboring pixels in the lesion region. A GLCM of size N_g × N_g describes the second-order joint probability function of the image region constrained by the mask, defined as P(i, j | δ, θ). The (i, j)-th element of the matrix is the number of times a pixel with gray level i occurs together with a pixel with gray level j at a distance of δ pixels along the angle θ; the distance δ from the center pixel is defined with the infinity norm.
GLRLM: analyzes the spatial relationship of pixels with the same intensity to express the run-like (band) texture of the image. A gray-level run is defined as the number of consecutive pixels with the same gray-level value. In the gray-level run-length matrix P(i, j | θ), the (i, j)-th element describes the number of runs with gray level i and length j occurring in the region of interest (ROI) along the angle θ.
GLSZM: quantifies the gray-level zones in the image. A gray-level zone is defined as the number of connected pixels sharing the same gray-level intensity. Under the infinity norm, pixels are considered connected if their distance is 1 (8-connectivity in 2D, 26-connectivity in 3D). In the size-zone matrix P(i, j), the (i, j)-th element equals the number of zones with gray level i and size j appearing in the image. In contrast to the GLCM and GLRLM, the GLSZM is rotation independent, and only one matrix is calculated over all directions of the region of interest.
NGTDM: describes visual texture based on a voxel and its neighborhood. The neighboring gray-tone difference matrix quantifies the difference between a gray value and the average gray value of its neighbors within a distance δ; the sum of the absolute differences for gray level i is stored in the matrix. Let X_gl be the set of segmented voxels and x_gl(j_x, j_y, j_z) ∈ X_gl the voxel at position (j_x, j_y, j_z); the average gray level of its neighborhood is
Ā_i = (1/W) · Σ_{k_x=−δ..δ} Σ_{k_y=−δ..δ} Σ_{k_z=−δ..δ} x_gl(j_x + k_x, j_y + k_y, j_z + k_z)
where (k_x, k_y, k_z) ≠ (0, 0, 0), x_gl(j_x + k_x, j_y + k_y, j_z + k_z) ∈ X_gl, and W is the number of voxels in the neighborhood.
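By way of example, a GLCM of the kind described above can be computed with scikit-image; the lesion patch below is synthetic, and the choice of 32 gray levels, the distance of 1 pixel and the four angles are illustrative only (in scikit-image releases before 0.19 the functions are spelled greycomatrix/greycoprops).

import numpy as np
from skimage.feature import graycomatrix, graycoprops

rng = np.random.default_rng(0)
lesion = rng.integers(0, 32, size=(64, 64), dtype=np.uint8)    # discretized lesion patch

# P(i, j | delta = 1, theta in {0, 45, 90, 135} degrees)
glcm = graycomatrix(lesion, distances=[1],
                    angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                    levels=32, symmetric=True, normed=True)
contrast = graycoprops(glcm, "contrast").mean()
homogeneity = graycoprops(glcm, "homogeneity").mean()
print(contrast, homogeneity)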
D. Wavelet features: the original two-dimensional image is filtered with a two-dimensional (2D) wavelet transform. The two-dimensional image is regarded as a set of row vectors; after wavelet filtering, the original signal is divided into a high-frequency part and a low-frequency part, both parts are down-sampled by keeping the even-indexed elements of each row vector, the high-frequency part yields a high-frequency matrix, and the low-frequency part yields a new low-frequency matrix. Wavelet filtering and down-sampling are then applied to the column vectors of the newly generated matrices, producing three high-frequency sub-bands and one low-frequency sub-band. Repeating this procedure on the low-frequency sub-band J_dec times finally yields J_dec × 3 high-frequency sub-bands and one low-frequency approximation image. The squared modulus of the Daubechies wavelet used is
|m₀(ω)|² = [cos²(ω/2)]^N · P(sin²(ω/2))
where the discrete form of the wavelet is
m₀(ω) = (1/√2) · Σ_k h_k · e^{−ikω}
ω is the angular frequency and the h_k are the filter coefficients. Multi-scale intensity and texture features are then extracted from the wavelet-decomposed images to obtain the wavelet features.
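A single level of the row and column filtering with down-sampling described above corresponds to a standard 2D discrete wavelet transform; the sketch below uses PyWavelets with a Daubechies wavelet, where the order ("db2") and the decomposition depth J_dec = 2 are arbitrary choices, not values fixed by the patent.

import numpy as np
import pywt

image = np.random.default_rng(0).normal(size=(128, 128))      # placeholder lesion image

# One decomposition level: low-frequency approximation cA plus three
# high-frequency sub-bands (horizontal cH, vertical cV, diagonal cD)
cA, (cH, cV, cD) = pywt.dwt2(image, "db2")

# Repeating the decomposition J_dec times on the approximation gives
# J_dec x 3 high-frequency sub-bands and one low-frequency image
J_dec = 2
coeffs = pywt.wavedec2(image, "db2", level=J_dec)
print(cA.shape, len(coeffs) - 1)   # intensity/texture features are then computed per sub-band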
Step (4): to avoid overfitting, the feature screening module screens the radiomics feature set obtained by the feature extraction module. Screening is based on the mutual information (MI) coefficient between each radiomics feature and the data label: if the mutual information coefficient is greater than 0.3, the feature is retained; otherwise it is discarded.
The mutual information coefficient is calculated as
I(X_i; Y) = Σ_{x∈X_i} Σ_{y∈Y} p(x, y) · log( p(x, y) / (p(x)·p(y)) )
where X_i denotes the i-th feature in the radiomics feature set, Y denotes the data label from step (1), and I(X_i; Y) denotes the mutual information value between the i-th feature and the label; p(x, y) is the joint probability distribution function of X_i and Y, p(x) and p(y) are the marginal probability distribution functions of X_i and Y respectively, and x and y denote any feature value in the radiomics feature set and any target label, respectively.
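In practice the per-feature mutual information can be estimated with scikit-learn, as in the sketch below; the data are synthetic, and the nearest-neighbor estimator used by mutual_info_classif is an assumption, since the patent does not specify how the probability distributions are estimated.

import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
X = rng.normal(size=(90, 200))             # 90 cases x 200 radiomics features (synthetic)
y = rng.integers(0, 2, size=90)            # 1 = epithelioid AML group, 0 = non-AML group

mi = mutual_info_classif(X, y, random_state=0)   # estimate of I(X_i; Y) per feature
keep = mi > 0.3
X_selected = X[:, keep]
print("features kept:", int(keep.sum()))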
Step (5): train a random forest model on the feature set obtained in step (4) with the random forest network training module. Cases from the epithelioid angiomyolipoma group and the non-epithelioid angiomyolipoma group are mixed and shuffled, stratified random sampling is applied, and the cases are divided into a training set and a test set at a ratio of 2:1; the training set is used to train the model and the test set to test its predictive performance. The optimal random forest model is selected according to the receiver operating characteristic (ROC) curve and the area under the curve (AUC): the model that achieves the best AUC on the test set without overfitting is taken as the optimal random forest model, and the trained random forest network maps the radiomics feature set to the radiomics prediction label. In this embodiment, the prediction label is a probability value between 0 and 1.
The AUC value is calculated as
AUC = Σ U(P_positive, P_negative) / (M × N)
where M is the number of positive samples, N the number of negative samples, P_positive the prediction probability of a positive sample and P_negative the prediction probability of a negative sample; U(P_positive, P_negative) is expressed as
U(P_positive, P_negative) = 1 if P_positive > P_negative; 0.5 if P_positive = P_negative; 0 if P_positive < P_negative.
The prediction probability of each positive sample paired with the prediction probability of each negative sample forms a sample pair; there are M × N such pairs in total, and hence M × N corresponding values of the U function.
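A minimal sketch of step (5) with scikit-learn is given below, using synthetic screened features; the stratified 2:1 split and the AUC evaluation follow the text, while the forest hyper-parameters are placeholders rather than values disclosed by the patent.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X_selected = rng.normal(size=(90, 25))     # screened radiomics features (synthetic)
y = rng.integers(0, 2, size=90)            # actual data labels

X_train, X_test, y_train, y_test = train_test_split(
    X_selected, y, test_size=1 / 3, stratify=y, random_state=0)   # 2:1 split

rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(X_train, y_train)

# The radiomics prediction label is the predicted probability in [0, 1]
print("train AUC:", roc_auc_score(y_train, rf.predict_proba(X_train)[:, 1]),
      "test AUC:", roc_auc_score(y_test, rf.predict_proba(X_test)[:, 1]))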
The effect of the optimal random forest model on the training set and the test set is shown in table 1.
TABLE 1 Effect of optimal random forest models on training and test sets
Step (6): acquire the clinical indexes of the cases with the clinical index fusion module, including but not limited to: age, sex, maximum imaging tumor diameter, presence of multiple tumors, history of smoking or alcohol abuse, presence of liver dysfunction, and so on. The clinical data of the patients need to be screened, and indexes with between-group differences are selected to participate in constructing the fusion model. The χ² test and the Mann–Whitney U test are used for between-group difference analysis of categorical variables and continuous variables, respectively, and clinical index features with a significance level P value greater than 0.01 are removed.
The χ² test is calculated as
χ² = Σ_{i=1..k} (A_i − E_i)² / E_i = Σ_{i=1..k} (A_i − n·p_i)² / (n·p_i)
where A_i is the observed frequency at level i, E_i the expected frequency at level i under the null hypothesis, n the total frequency, p_i the expected probability at level i, and k the number of cells; the expected frequency E_i at level i equals the total frequency n multiplied by the expected probability p_i at level i. When n is relatively large, the χ² statistic approximately follows a chi-square distribution with k − 1 degrees of freedom.
For the comparison of the means of two independent (group-designed) samples, the statistic u of the u test is calculated as
u = (x̄₁ − x̄₂) / S_{x̄₁−x̄₂}
where x̄₁ and x̄₂ are the sample means of the two groups and S_{x̄₁−x̄₂} is the estimate of the standard error of the difference between the sample means.
With the significance level of the two hypothesis tests above set to 0.01, clinical indexes with a P value greater than 0.01 are discarded.
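The two screening tests can be run with SciPy as sketched below on synthetic data; the clinical index names are illustrative, with "sex" and "maximum tumor diameter" standing in for whatever categorical and continuous indexes are actually collected.

import numpy as np
from scipy.stats import chi2_contingency, mannwhitneyu

rng = np.random.default_rng(0)
group = rng.integers(0, 2, size=90)        # 1 = HEAML group, 0 = non-HEAML group

sex = rng.integers(0, 2, size=90)          # categorical clinical index
table = np.array([[np.sum((group == g) & (sex == s)) for s in (0, 1)] for g in (0, 1)])
_, p_sex, _, _ = chi2_contingency(table)   # chi-square test for categorical variables

diameter = rng.normal(40, 12, size=90)     # continuous clinical index (mm)
_, p_diam = mannwhitneyu(diameter[group == 1], diameter[group == 0])   # Mann-Whitney U test

selected = [name for name, p in [("sex", p_sex), ("max_diameter", p_diam)] if p <= 0.01]
print(selected)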
Then the normalized clinical indexes are fused with the radiomics prediction labels, and a multivariable logistic regression model is trained with five-fold cross-validation; the accuracy of the multivariable logistic regression model can be characterized by the mean AUC:
mean AUC = (1/N) · Σ_{n=1..N} AUC_n
where N denotes the total number of folds and AUC_n denotes the AUC value on the internal validation set of the n-th fold, n ∈ [1, N]. Finally, the accuracy of the best fusion model (the multivariable logistic regression model) on the data set from step (1) is shown in Table 2.
TABLE 2 accuracy of the best fusion model
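A sketch of the fusion step with scikit-learn is shown below: the radiomics prediction label is concatenated with normalized clinical indexes and a logistic regression model is scored by five-fold cross-validation using the mean AUC; all data here are synthetic.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
rad_score = rng.uniform(0, 1, size=(90, 1))        # radiomics prediction labels
clinical = rng.normal(size=(90, 3))                # normalized clinical indexes
X_fused = np.hstack([rad_score, clinical])
y = rng.integers(0, 2, size=90)

aucs = []
for train_idx, val_idx in StratifiedKFold(n_splits=5, shuffle=True, random_state=0).split(X_fused, y):
    model = LogisticRegression(max_iter=1000).fit(X_fused[train_idx], y[train_idx])
    aucs.append(roc_auc_score(y[val_idx], model.predict_proba(X_fused[val_idx])[:, 1]))
print("mean AUC over 5 folds:", float(np.mean(aucs)))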
Step (7): perform the actual image classification with the classification module. The trained random forest network and the multivariable logistic regression model are loaded, the clinical CT or MRI image of the case to be identified is acquired, and the lesion region extraction module, the feature extraction module and the feature screening module are applied in turn to obtain the screened radiomics feature set of the lesion region of the case to be identified. The trained random forest network yields the radiomics prediction label of the image to be identified; this label is combined with the clinical data of the case to be identified and used as the input of the multivariable logistic regression model to obtain the final prediction label of the case, and the classification of the hepatic epithelioid angiomyolipoma image is completed according to this prediction label.
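The inference path of the classification module could look like the following sketch; the model and feature file names are hypothetical, and the 0.5 decision threshold is an assumption, since the patent only states that the final prediction label determines the class.

import joblib
import numpy as np

rf = joblib.load("random_forest.pkl")              # trained random forest (hypothetical file)
fusion = joblib.load("logistic_fusion.pkl")        # trained multivariable logistic regression (hypothetical file)

# Screened radiomics features and normalized clinical indexes of the case to be identified
features_selected = np.load("case_features.npy").reshape(1, -1)
clinical_norm = np.load("case_clinical.npy").reshape(1, -1)

rad_score = rf.predict_proba(features_selected)[:, 1:2]       # radiomics prediction label
final_prob = fusion.predict_proba(np.hstack([rad_score, clinical_norm]))[0, 1]
print("HEAML" if final_prob >= 0.5 else "non-HEAML", final_prob)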
The foregoing lists merely illustrate specific embodiments of the invention. It is obvious that the invention is not limited to the above embodiments, but that many variations are possible. All modifications which can be derived or suggested by a person skilled in the art from the disclosure of the present invention are to be considered within the scope of the invention.

Claims (9)

1. A non-invasive, radiomics-based image classification device for hepatic epithelioid angiomyolipoma, characterized by comprising:
a sampling module for acquiring CT or MRI image data of patients diagnosed with hepatic epithelioid angiomyolipoma, liver cancer or focal nodular hyperplasia of the liver; all epithelioid angiomyolipoma cases are assigned to an epithelioid angiomyolipoma group, all liver cancer and focal nodular hyperplasia cases are assigned to a non-epithelioid angiomyolipoma group, and each case is given an actual data label according to its group;
a lesion region extraction module for extracting the lesion-region images of hepatic epithelioid angiomyolipoma, liver cancer and focal nodular hyperplasia;
a feature extraction module for extracting four types of radiomics features from the lesion-region images, namely gray-level statistical features, morphological features, texture features and wavelet features, forming a radiomics feature set X = {X_1, X_2, X_3, ..., X_n}, where n is the total number of features; the i-th feature vector is X_i = {x_i1, x_i2, x_i3, ..., x_im}, where m is the total number of cases and x_im denotes the i-th feature of the m-th case;
a feature screening module for screening the radiomics feature set obtained by the feature extraction module; the screening is based on the mutual information coefficient between each radiomics feature and the data label: if the mutual information coefficient is greater than 0.3 the feature is retained, otherwise it is discarded;
a random forest network training module for training a random forest network with the screened radiomics feature set and the actual data label of each sample, and mapping the radiomics feature set to a radiomics prediction label with the trained random forest network;
a clinical index fusion module for acquiring and screening clinical indexes of the cases, fusing the normalized clinical indexes with the radiomics prediction labels, and training a multivariable logistic regression model with five-fold cross-validation;
a classification module for, in the actual classification stage, loading the trained random forest network and multivariable logistic regression model, acquiring the clinical CT or MRI image of a case to be identified, and processing it with the lesion region extraction module, the feature extraction module and the feature screening module in turn to obtain the screened radiomics feature set of the lesion region of the case to be identified; the trained random forest network then yields the radiomics prediction label of the image to be identified, which is combined with the clinical data of the case to be identified and used as the input of the multivariable logistic regression model to obtain the final prediction label of the case, thereby completing the classification of the hepatic epithelioid angiomyolipoma image.
2. The non-invasive, radiomics-based image classification device for hepatic epithelioid angiomyolipoma according to claim 1, characterized in that the feature screening module calculates the mutual information coefficient as
I(X_i; Y) = Σ_{x∈X_i} Σ_{y∈Y} p(x, y) · log( p(x, y) / (p(x)·p(y)) )
where X_i denotes the i-th feature in the radiomics feature set, Y denotes the label, p(x, y) is the joint probability distribution function of X_i and Y, p(x) and p(y) are the marginal probability distribution functions of X_i and Y respectively, I(X_i; Y) denotes the mutual information coefficient between X_i and the label, y ∈ Y denotes a target label in Y, and x ∈ X_i denotes a feature value of X_i.
3. The non-invasive, radiomics-based image classification device for hepatic epithelioid angiomyolipoma according to claim 1, characterized in that, when the random forest network training module trains the random forest network, the model that achieves the optimal AUC value without overfitting, judged from the receiver operating characteristic (ROC) curve and the area under the curve (AUC), is taken as the trained random forest model;
the AUC value is calculated as
AUC = Σ U(P_positive, P_negative) / (M × N)
where M is the number of positive samples, N the number of negative samples, P_positive the prediction probability of a positive sample and P_negative the prediction probability of a negative sample; U(P_positive, P_negative) is expressed as
U(P_positive, P_negative) = 1 if P_positive > P_negative; 0.5 if P_positive = P_negative; 0 if P_positive < P_negative;
the prediction probability of each positive sample paired with the prediction probability of each negative sample forms a sample pair; there are M × N such pairs in total, and hence M × N corresponding values of the U function.
4. The non-invasive, radiomics-based image classification device for hepatic epithelioid angiomyolipoma according to claim 1, characterized in that the clinical CT or MRI images acquired by the sampling module comply with the following criteria:
inclusion criteria: pathological diagnosis of hepatic epithelioid angiomyolipoma, liver cancer or focal nodular hyperplasia after surgical resection or biopsy; contrast-enhanced CT or MRI scan performed within one month before surgery; image data complete and suitable for further analysis;
exclusion criteria: diagnosis of recurrent tumor or multi-organ malignancy; anti-tumor treatment received before the contrast-enhanced CT or MRI scan; poor image quality of the liver mass.
5. The non-invasive, radiomics-based image classification device for hepatic epithelioid angiomyolipoma according to claim 1, characterized in that the morphological features extracted by the feature extraction module comprise:
(a) mesh surface area A:
A_i = (1/2)·|Oa_i × Ob_i|,  A = Σ_{i=1..N_f} A_i
where Oa_i and Ob_i are the edges of the i-th triangle in the mesh, formed by the vertices a_i, b_i and the origin O; A_i denotes the area of the i-th triangle and N_f the number of triangles in the mesh;
(b) pixel surface area A_pixel:
A_pixel = Σ_{k=1..N_v} A_k
where A_k denotes the surface area of the single pixel k and N_v the number of pixels;
(c) perimeter P:
P_i = sqrt((a_i − b_i)²),  P = Σ_{i=1..N_w} P_i
where a_i and b_i are the vertices of the i-th line of the perimeter mesh, P_i denotes the length of the i-th line, and N_w denotes the number of lines;
(d) perimeter-to-surface-area ratio: P / A;
(e) sphericity: 2·sqrt(π·A) / P;
(f) spherical disproportion: P / (2·sqrt(π·A));
(g) major axis length: 4·sqrt(λ_major), where λ_major is the largest principal component value obtained by principal component analysis of the physical coordinates of the pixel centers; the major axis length is the largest axis length of the ellipsoid enclosing the region of interest and is calculated from the largest principal component λ_major;
(h) minor axis length: 4·sqrt(λ_minor), where λ_minor is the second-largest principal component value obtained by principal component analysis of the physical coordinates of the pixel centers; the minor axis length is the second-largest axis length of the ellipsoid enclosing the region of interest and is calculated from the second-largest principal component λ_minor;
(i) elongation: sqrt(λ_minor / λ_major).
6. the device for classifying smooth muscle lipoma in non-invasive liver epithelial-like blood vessel based on iconomics according to claim 1, wherein the statistical features of the gray scale extracted by the feature extraction module are features obtained based on an intensity histogram, which represents the distribution of pixel intensities in a focal region, and the intensity histogram can be defined by a formula:
Figure RE-FDA0003144011790000041
where i represents a discrete intensity level, NpRepresenting the total number of pixels in the image of the lesion region of the liver, NgNumber of classes representing discrete intensity levels, niRepresenting the number of pixels with discrete intensity level i in the liver lesion area image, and H (i) representing the frequency of occurrence of the pixels with discrete intensity level i in the liver lesion area image;
let M denote the set of all pixels in the liver lesion region, and M (i) denote the pixel value of the ith pixel; the gray scale statistical characteristics comprise:
(a) energy:
Figure RE-FDA0003144011790000042
where c is an optional pixel intensity used to indicate motion, preventing negative values in M;
(b) total energy:
Figure RE-FDA0003144011790000043
wherein, VvoxelRepresents the voxel volume in cubic millimeters;
(c) entropy:
Figure RE-FDA0003144011790000044
wherein ε represents an arbitrarily small positive number;
(d) the interquartile distance:
Interquartile range=n75-n25
wherein n is25And n75Respectively representing the number of pixels with discrete intensity levels in the 25 th percentile and the 75 th percentile;
(e) mean absolute deviation:
Figure RE-FDA0003144011790000045
wherein the content of the first and second substances,
Figure RE-FDA0003144011790000046
means representing the frequency of occurrence of the pixel;
(f) robust mean absolute deviation:
Figure RE-FDA0003144011790000051
wherein M is10-90(i) A pixel value representing the ith pixel at a discrete intensity level between the 10 th and 90 th percentiles,
Figure RE-FDA0003144011790000052
denotes the mean value, N10-90A number of class classes representing a discrete intensity level between the 10 th and 90 th percentiles; robust average absolute deviation is the average distance of all intensity values from the average calculated over a subset of the pixel matrix with gray levels between or equal to the 10 th and 90 th percentiles;
(g) skewness:
Figure RE-FDA0003144011790000053
wherein, mu3Represents the 3-order center distance, and σ represents the standard deviation;
(h) kurtosis:
Figure RE-FDA0003144011790000054
(i) consistency:
Figure RE-FDA0003144011790000055
where p (.) represents the grayscale intensity value of the image.
7. The non-invasive, radiomics-based image classification device for hepatic epithelioid angiomyolipoma according to claim 1, characterized in that the texture features extracted by the feature extraction module are derived from the gray-level co-occurrence matrix (GLCM), the gray-level run-length matrix (GLRLM), the gray-level size-zone matrix (GLSZM) and the neighboring gray-tone difference matrix (NGTDM).
8. The non-invasive, radiomics-based image classification device for hepatic epithelioid angiomyolipoma according to claim 1, characterized in that, for the wavelet features extracted by the feature extraction module, the original two-dimensional image is filtered with a two-dimensional wavelet transform: the two-dimensional image is regarded as a set of row vectors; after wavelet filtering, the original signal is divided into a high-frequency part and a low-frequency part, both parts are down-sampled by keeping the even-indexed elements of each row vector, and a high-frequency matrix and a low-frequency matrix are generated;
wavelet filtering and down-sampling are then applied to the column vectors of the newly generated matrices, producing three high-frequency sub-bands and one low-frequency sub-band; this processing of the low-frequency sub-band is repeated J_dec times, finally yielding J_dec × 3 high-frequency sub-bands and one low-frequency approximation image; the squared modulus of the Daubechies wavelet used is
|m₀(ω)|² = [cos²(ω/2)]^N · P(sin²(ω/2))
where the discrete form of the wavelet is
m₀(ω) = (1/√2) · Σ_k h_k · e^{−ikω}
ω is the angular frequency, the h_k are the filter coefficients, and N denotes the order of the wavelet; for the 4 wavelet components obtained after the wavelet decomposition, the gray-level statistical features and texture features are calculated respectively to obtain the wavelet features.
9. The non-invasive, radiomics-based image classification device for hepatic epithelioid angiomyolipoma according to claim 1, characterized in that the clinical index fusion module screens the clinical indexes of the cases using the χ² test and the Kruskal–Wallis H test, and removes clinical index features with a significance level P value greater than 0.01.
CN202110378227.XA 2021-04-08 2021-04-08 Non-invasive liver epithelium sample vascular smooth muscle lipoma image classification device based on image omics Pending CN113269225A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110378227.XA CN113269225A (en) 2021-04-08 2021-04-08 Non-invasive liver epithelium sample vascular smooth muscle lipoma image classification device based on image omics

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110378227.XA CN113269225A (en) 2021-04-08 2021-04-08 Non-invasive liver epithelium sample vascular smooth muscle lipoma image classification device based on image omics

Publications (1)

Publication Number Publication Date
CN113269225A (en) 2021-08-17

Family

ID=77228551

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110378227.XA Pending CN113269225A (en) 2021-04-08 2021-04-08 Non-invasive liver epithelium sample vascular smooth muscle lipoma image classification device based on image omics

Country Status (1)

Country Link
CN (1) CN113269225A (en)



Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111178449A (en) * 2019-12-31 2020-05-19 浙江大学 Liver cancer image classification method and device combining computer vision characteristics and imaging omics characteristics

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WENJIE LIANG et al.: "Differentiating Hepatic Epithelioid Angiomyolipoma From Hepatocellular Carcinoma and Focal Nodular Hyperplasia via Radiomics Models", Frontiers in Oncology *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114067154A (en) * 2021-11-12 2022-02-18 深圳大学 Crohn's disease fibrosis classification method based on multi-sequence MRI and related equipment
CN114067154B (en) * 2021-11-12 2022-07-08 深圳大学 Crohn's disease fibrosis classification method based on multi-sequence MRI and related equipment
CN114581382A (en) * 2022-02-21 2022-06-03 北京医准智能科技有限公司 Training method and device for breast lesions and computer readable medium
CN114581382B (en) * 2022-02-21 2023-02-21 北京医准智能科技有限公司 Training method and device for breast lesions and computer readable medium
CN114419135A (en) * 2022-03-29 2022-04-29 武汉楚精灵医疗科技有限公司 Pancreas marker size quantification method, pancreas marker size quantification device, pancreas marker size quantification terminal and readable storage medium
CN114419135B (en) * 2022-03-29 2022-06-28 武汉楚精灵医疗科技有限公司 Pancreas marker size quantification method and device, terminal and readable storage medium
CN115148365A (en) * 2022-05-31 2022-10-04 中山大学肿瘤防治中心(中山大学附属肿瘤医院、中山大学肿瘤研究所) Method and system for predicting prognosis of germ cell tumor of central nervous system
CN115131642A (en) * 2022-08-30 2022-09-30 之江实验室 Multi-modal medical data fusion system based on multi-view subspace clustering
CN116548950A (en) * 2023-07-12 2023-08-08 北京大学 Focus positioning method, apparatus, device and storage medium
CN116548950B (en) * 2023-07-12 2023-11-10 北京大学 Focus positioning method, apparatus, device and storage medium

Similar Documents

Publication Publication Date Title
CN111178449B (en) Liver cancer image classification method combining computer vision characteristics and imaging omics characteristics
Zebari et al. Improved threshold based and trainable fully automated segmentation for breast cancer boundary and pectoral muscle in mammogram images
CN111242174B (en) Liver cancer image feature extraction and pathological classification method based on imaging omics
CN113269225A (en) Non-invasive liver epithelium sample vascular smooth muscle lipoma image classification device based on image omics
Shah et al. Artificial intelligence for breast cancer analysis: Trends & directions
Sorensen et al. Quantitative analysis of pulmonary emphysema using local binary patterns
Byra et al. Early prediction of response to neoadjuvant chemotherapy in breast cancer sonography using Siamese convolutional neural networks
US6760468B1 (en) Method and system for the detection of lung nodule in radiological images using digital image processing and artificial neural network
Zhang et al. Automated semantic segmentation of red blood cells for sickle cell disease
US20090097728A1 (en) System and Method for Detecting Tagged Material Using Alpha Matting
El-Baz et al. Three-dimensional shape analysis using spherical harmonics for early assessment of detected lung nodules
Taha et al. Automatic polyp detection in endoscopy videos: A survey
Ganesan et al. Fuzzy-C-means clustering based segmentation and CNN-classification for accurate segmentation of lung nodules
Kuo et al. Automatic lung nodule detection system using image processing techniques in computed tomography
CN108695000B (en) Ultrasonic image-based intelligent diagnosis system for diffuse thyroid diseases
Maitra et al. Automated digital mammogram segmentation for detection of abnormal masses using binary homogeneity enhancement algorithm
Maghsoudi et al. Informative and uninformative regions detection in WCE frames
CN110838114A (en) Pulmonary nodule detection method, device and computer storage medium
CN112102343A (en) Ultrasound image-based PTC diagnostic system
Al-Tam et al. Breast cancer detection and diagnosis using machine learning: a survey
Hamed et al. Comparative study and analysis of recent computer aided diagnosis systems for masses detection in mammograms
EP4118617A1 (en) Automated detection of tumors based on image processing
Shirazi et al. Automated pathology image analysis
Azli et al. Ultrasound image segmentation using a combination of edge enhancement and kirsch’s template method for detecting follicles in ovaries
Bhuyan et al. Diagnosis system for cancer disease using a single setting approach

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20210817