CN109711469B - Breast cancer diagnosis system based on semi-supervised neighborhood discrimination index - Google Patents

Publication number
CN109711469B
Authority
CN
China
Prior art keywords
features
cell data
feature
supervised
semi
Prior art date
Legal status
Active
Application number
CN201811615503.4A
Other languages
Chinese (zh)
Other versions
CN109711469A (en)
Inventor
张莉
庞晴晴
王邦军
周伟达
Current Assignee
Suzhou University
Original Assignee
Suzhou University
Application filed by Suzhou University
Priority to CN201811615503.4A
Publication of CN109711469A
Application granted
Publication of CN109711469B

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00: Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Measuring And Recording Apparatus For Diagnosis (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention discloses a breast cancer diagnosis system based on a semi-supervised neighborhood discrimination index. The system comprises a data acquisition module, a feature extraction module, a feature screening module and a classification module. It acquires a labeled breast cell data sample and an unlabeled breast cell data sample, extracts a plurality of features of the breast cell data samples, calculates the semi-supervised neighborhood discrimination index of each feature, and screens out the features whose index meets a preset condition. Finally, the breast cell data sample to be diagnosed is diagnosed according to the screened features to obtain a diagnosis result. Because the system is based on semi-supervised learning, it selects the features most strongly associated with breast cancer by calculating the semi-supervised neighborhood discrimination index of each feature, and during diagnosis it extracts the data of those features from the sample to be diagnosed. This avoids labeling a large amount of data and greatly reduces cost.

Description

Breast cancer diagnosis system based on semi-supervised neighborhood discrimination index
Technical Field
The invention relates to the field of computers, in particular to a breast cancer diagnosis system based on a semi-supervised neighborhood discrimination index.
Background
Global breast cancer incidence has been rising since the late 1970s. Statistics indicate that 1 in every 8 women in the United States has breast cancer. In recent years, the growth rate of breast cancer incidence in China has been as much as 1% to 2% of the national rate. The key to the prevention and treatment of breast cancer is early detection: early-stage breast cancer, especially stage 0 breast cancer, can be radically treated by surgery while preserving the breast, at low cost and with a 5-year survival rate above 90%.
In "Feature Selection Based on Neighborhood Discrimination Index", Wang, Hu et al. propose using a neighborhood discrimination index for feature selection. They introduce the basic ideas of Shannon information theory into the context of neighborhood relations and propose a discrimination index that measures the discriminating ability of feature subsets. However, the method is based on fully supervised learning, which requires a large number of data labels. In practical applications, obtaining labels often takes considerable manual effort and is therefore expensive.
Disclosure of Invention
The invention aims to provide a breast cancer diagnosis system based on a semi-supervised neighborhood discrimination index, in order to solve the problem that existing breast cancer diagnosis methods based on fully supervised learning require a large number of data labels and are therefore costly.
In order to solve the technical problems, the invention provides a breast cancer diagnosis system based on a semi-supervised neighborhood discrimination index, comprising:
a data acquisition module, used for obtaining a labeled breast cell data sample and an unlabeled breast cell data sample;
a feature extraction module, used for extracting a plurality of features of a breast cell data sample from the labeled breast cell data sample and the unlabeled breast cell data sample;
a feature screening module, used for screening out, from the plurality of features, the features whose semi-supervised neighborhood discrimination index meets a preset condition;
a classification module, used for diagnosing the breast cell data sample to be diagnosed according to the features meeting the preset condition, to obtain a diagnosis result.
Optionally, the feature extraction module includes:
a normalization unit, used for normalizing a labeled breast cell data matrix and an unlabeled breast cell data matrix, wherein each breast cell data matrix comprises the data of a plurality of breast cell data samples;
a splicing unit, used for splicing the normalized labeled breast cell data matrix and the normalized unlabeled breast cell data matrix to obtain a target data matrix;
a feature extraction unit, used for extracting each column of data in the target data matrix to obtain a plurality of features of the breast cell data sample.
Optionally, the feature screening module is specifically configured to screen out, from the plurality of features, a preset number of features with the largest semi-supervised neighborhood discrimination indexes.
Optionally, the feature screening module includes:
a judging unit, used for judging whether the number of screened features is 0;
a first calculation unit, used for calculating, according to a first formula, the semi-supervised neighborhood discrimination index of each feature in a feature set to be screened when the number of screened features is 0, wherein the feature set to be screened comprises the plurality of features at initialization and is updated as features are screened;
a second calculation unit, used for calculating, according to a second formula, the semi-supervised neighborhood discrimination index of each feature in the feature set to be screened when the number of screened features is not 0;
a screening unit, used for screening out the feature with the largest semi-supervised neighborhood discrimination index from the feature set to be screened and updating the feature set to be screened, until the number of screened features reaches the preset number.
Optionally, the classification module includes:
an extraction unit, used for extracting target feature data from the breast cell data sample to be diagnosed according to the features meeting the preset condition;
a classification unit, used for inputting the target feature data into a pre-trained KNN classifier to obtain a diagnosis result.
Optionally, the breast cell data sample comprises breast cell data extracted from a fine needle aspiration digital image, and the breast cell data comprises any one or any combination of the following: cell nucleus radius, texture, smoothness, perimeter, concavity.
The invention provides a breast cancer diagnosis system based on a semi-supervised neighborhood discrimination index, comprising a data acquisition module, a feature extraction module, a feature screening module and a classification module. The system acquires a labeled breast cell data sample and an unlabeled breast cell data sample, extracts a plurality of features of the breast cell data samples, calculates the semi-supervised neighborhood discrimination index of each feature, screens out the features whose index meets a preset condition, and finally diagnoses the breast cell data sample to be diagnosed according to the screened features to obtain a diagnosis result. Because the system is based on semi-supervised learning, it selects the features most strongly associated with breast cancer by calculating the semi-supervised neighborhood discrimination index of each feature, and during diagnosis it extracts the data of those features from the sample to be diagnosed. This avoids labeling a large amount of data and greatly reduces cost.
Drawings
To describe the embodiments of the invention or the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the invention, and a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a schematic diagram of a breast cancer diagnosis system based on a semi-supervised neighborhood discrimination index according to an embodiment of the present invention;
FIG. 2 is a flowchart of a first embodiment of a breast cancer diagnosis system based on a semi-supervised neighborhood discrimination index according to the present invention;
FIG. 3 is a schematic structural diagram of a second embodiment of a breast cancer diagnosis system based on a semi-supervised neighborhood discrimination index according to the present invention;
FIG. 4 is a flowchart of a breast cancer diagnosis system based on a semi-supervised neighborhood discrimination index according to a second embodiment of the present invention.
Detailed Description
The core of the invention is to provide a breast cancer diagnosis system based on a semi-supervised neighborhood discrimination index. The system performs diagnosis based on semi-supervised learning and screens features according to their semi-supervised neighborhood discrimination index, which saves the cost of labeling a large amount of data.
In order to better understand the aspects of the present invention, the present invention will be described in further detail with reference to the accompanying drawings and detailed description. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
A first embodiment of the breast cancer diagnosis system based on a semi-supervised neighborhood discrimination index is described below. Referring to FIG. 1, the embodiment includes:
Data acquisition module 100: used for obtaining a labeled breast cell data sample and an unlabeled breast cell data sample.
The breast cell data refer to feature data of the cell nuclei in a digitized image of a fine needle aspirate of a breast tumor, such as radius, texture, perimeter, smoothness, compactness, concavity and concave points. In fine needle aspiration (FNA), a fine needle is used to draw cells and other components from the lesion, which are prepared as a smear so that morphological and interstitial changes of tumor and non-tumor cells can be observed. Compared with the unlabeled breast cell data samples, the labeled breast cell data samples additionally include a diagnosis result, which is either benign or malignant. In this embodiment, the number of unlabeled breast cell data samples is greater than the number of labeled breast cell data samples.
Feature extraction module 200: for extracting a plurality of features of the breast cell data sample from the labeled breast cell data sample and the unlabeled breast cell data sample.
Specifically, suppose the data acquisition module 100 acquires a labeled breast cell data matrix and an unlabeled breast cell data matrix. The labeled matrix is an l×n matrix, where l is the number of labeled breast cell data samples and n is the dimension of each sample; the unlabeled matrix is a u×n matrix, where u is the number of unlabeled breast cell data samples and n is again the sample dimension. In the feature extraction module 200, the two matrices are normalized separately so that the feature values of each sample sum to 1, and the two normalized matrices are then spliced together to obtain an (l+u)×n matrix, referred to below as the target data matrix. Each column of the target data matrix is one feature of the breast cell data samples, so n features are finally obtained.
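The normalization and splicing described above can be sketched as follows. This is a minimal, dependency-free illustration rather than the patented implementation: the function names `normalize_rows`, `build_target_matrix` and `columns` are hypothetical, the toy values are invented, and leaving an all-zero row unchanged is an assumption the source does not address.

```python
def normalize_rows(matrix):
    """Scale each sample (row) so that its feature values sum to 1."""
    normalized = []
    for row in matrix:
        total = sum(row)
        # Assumption: a row that sums to 0 is left unchanged to avoid division by zero.
        normalized.append([v / total for v in row] if total != 0 else list(row))
    return normalized


def build_target_matrix(labeled, unlabeled):
    """Normalize both matrices separately, then stack them into an (l+u) x n matrix."""
    return normalize_rows(labeled) + normalize_rows(unlabeled)


def columns(matrix):
    """Return the n feature vectors (columns) of the target data matrix."""
    return [list(col) for col in zip(*matrix)]


labeled = [[1.0, 3.0], [2.0, 2.0]]   # l = 2 samples, n = 2 features (toy values)
unlabeled = [[4.0, 1.0]]             # u = 1 sample
target = build_target_matrix(labeled, unlabeled)
features = columns(target)           # n feature vectors, each of length l + u
```

Each entry of `features` corresponds to one column of the target data matrix, which is the unit the screening module scores.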
Feature screening module 300: used for screening out, from the plurality of features, the features whose semi-supervised neighborhood discrimination index meets a preset condition.
The purpose of the feature screening module 300 is to screen out, from the n features of the breast cell samples extracted by the feature extraction module 200, the features most strongly associated with the label. In other words, it finds the relationship between features and labels, so that for unknown data that has features but no label, the label can be inferred from the established relationship. In this embodiment, the degree of association between a feature and the label is measured by the semi-supervised neighborhood discrimination index, a parameter that measures the discriminating ability of a feature subset: the larger the index, the better the discriminating ability of the feature subset. In this embodiment, the semi-supervised neighborhood discrimination index of each of the n features is calculated first, and the features meeting the preset condition are then screened out according to the index. As an optional implementation, a preset number of features with the largest semi-supervised neighborhood discrimination indexes may be screened out; the preset number can be adjusted according to actual requirements and is not specifically limited in this embodiment.
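As a small illustration of the optional "preset number with the largest index" rule, the sketch below ranks features by a score and keeps the top ones. The scores are invented placeholder values, not outputs of the patent's formulas, and `select_top_features` is a hypothetical helper name.

```python
def select_top_features(scores, preset_number):
    """Return the indices of the `preset_number` features with the largest scores."""
    ranked = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    return ranked[:preset_number]


# Hypothetical index values for n = 4 features (placeholders only).
scores = [0.12, 0.57, 0.33, 0.91]
selected = select_top_features(scores, preset_number=2)
```

Here `selected` holds the column indices that the classification module would later extract from a sample to be diagnosed.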
Classification module 400: used for diagnosing the breast cell data to be diagnosed according to the features meeting the preset condition, to obtain a diagnosis result.
Specifically, according to the screened features, which are highly associated with the label, the feature data of those features are extracted from the breast cell data to be diagnosed and used as the input of a pre-trained classification model. The resulting output is the diagnosis result, which is either benign or malignant.
Specifically, the overall implementation flow of the above system embodiment is shown in fig. 2, and includes the following steps:
step S101: a labeled breast cell data sample and an unlabeled breast cell data sample are obtained.
Step S102: and respectively normalizing the labeled mammary gland cell data sample and the unlabeled mammary gland cell data sample to obtain a plurality of characteristics of the mammary gland cell data sample.
Step S103: and screening out the characteristics of which the semi-supervised neighborhood discrimination index meets the preset condition from the plurality of characteristics.
Step S104: and extracting feature data of the features from the breast cell data sample to be diagnosed according to the features meeting the preset conditions, and inputting the feature data into a classifier trained in advance to obtain a diagnosis result.
It can be seen that the breast cancer diagnosis system based on the semi-supervised neighborhood discrimination index provided in this embodiment includes a data acquisition module 100, a feature extraction module 200, a feature screening module 300 and a classification module 400. The system acquires a labeled breast cell data sample and an unlabeled breast cell data sample, extracts a plurality of features of the breast cell data samples, calculates the semi-supervised neighborhood discrimination index of each feature, screens out the features whose index meets a preset condition, and finally diagnoses the breast cell data sample to be diagnosed according to the screened features to obtain a diagnosis result. Because the system is based on semi-supervised learning, it selects the features most strongly associated with breast cancer by calculating the semi-supervised neighborhood discrimination index of each feature, and during diagnosis it extracts the data of those features from the sample to be diagnosed. This avoids labeling a large amount of data and greatly reduces cost.
The second embodiment of the breast cancer diagnosis system based on the semi-supervised neighborhood discrimination index provided by the invention is realized based on the first embodiment, and is expanded to a certain extent based on the first embodiment.
As shown in fig. 3, the breast cancer diagnosis system based on the semi-supervised neighborhood discrimination index is divided into four modules, namely, a data acquisition module 100, a feature extraction module 200, a feature screening module 300 and a classification module 400, and the implementation procedures of the four modules are described below:
In this embodiment, two data matrices are input to the data acquisition module 100: a labeled breast cell data matrix $[x_1, \ldots, x_l]^T \in \mathbb{R}^{l \times n}$ and an unlabeled breast cell data matrix $[x_{l+1}, \ldots, x_{l+u}]^T \in \mathbb{R}^{u \times n}$, where $x_i$ is an n-dimensional sample representing breast cell data, l is the number of labeled data samples, u is the number of unlabeled data samples, and n is the total number of features of the data. Note that the labeled breast cell data samples come with a label vector $Y = [y_1, \ldots, y_l]^T$, where $y_i \in \{-1, 1\}$ for $i = 1, 2, \ldots, l$: $y_i = -1$ indicates that the diagnosis result of the sample is malignant, and $y_i = 1$ indicates that it is benign. As an optional implementation, this example sets l to 172, u to 341, and n to 30.
As shown in fig. 3, the feature extraction module 200 may be divided into the following three units, namely, a normalization unit 201, a stitching unit 202, and a feature extraction unit 203, where the functions of the respective units are as follows:
Normalization unit 201: used for normalizing the labeled breast cell data matrix and the unlabeled breast cell data matrix separately.
The purpose of normalization is to make the feature values of each sample sum to 1, which simplifies subsequent calculation.
Splicing unit 202: used for splicing the normalized labeled breast cell data matrix and the normalized unlabeled breast cell data matrix to obtain a target data matrix.
The splicing result is the target data matrix $[x_1, \ldots, x_m]^T \in \mathbb{R}^{m \times n}$, where m = l + u is the total number of samples in the data matrix.
Feature extraction unit 203: used for extracting each column of data in the target data matrix to obtain a plurality of features of the breast cell data sample.
That is, the target data matrix is divided by columns, and each column is one feature of the breast cell data samples:
$[x_1, \ldots, x_m]^T = [f_1, f_2, \ldots, f_n]$
where $f_k = [f_{1k}, \ldots, f_{mk}]^T$ is the k-th feature of the breast cell data samples, $k = 1, 2, \ldots, n$.
In this embodiment, as shown in fig. 3, the feature screening module 300 includes four units, namely a judging unit 301, a first calculating unit 302, a second calculating unit 303, and a screening unit 304, where the functions of the units are as follows:
Judging unit 301: used for judging whether the number of screened features is 0.
As an optional implementation, an initialization operation may be performed before entering the judging unit 301: a target feature subset G and a feature subset to be screened B are set and initialized as $G = \emptyset$ and $B = \{f_1, f_2, \ldots, f_n\}$. The target feature subset stores the features, screened out by the feature screening module 300, that have a high degree of association with the label. The feature subset to be screened initially stores the plurality of features extracted by the preceding feature extraction module and is updated as features are screened: whenever a feature is screened out, it is moved from the feature subset to be screened to the target feature subset.
First calculation unit 302: used for calculating, according to a first formula, the semi-supervised neighborhood discrimination index of each feature in the feature set to be screened when the number of screened features is 0.
Specifically, when the target feature subset $G = \emptyset$, the semi-supervised neighborhood discrimination index of each feature in the set B is calculated as follows:
semi-supervised neighborhood discrimination index
Figure SMS_7
where $k \in B$, $|\cdot|$ denotes the number of non-zero elements of a matrix, $\delta$ is an adjustable constant parameter, and [Figure SMS_8] and [Figure SMS_9] are the neighborhood similarity relation matrices of the feature vector [Figure SMS_10] under the distance function, calculated respectively as:
Figure SMS_11
Figure SMS_12
where [Figure SMS_13] denotes the k-th feature vector of the labeled data, [Figure SMS_14] denotes the k-th feature of the i-th labeled sample, and $\varepsilon$ is the neighborhood radius with $0 \le \varepsilon \le 1$. Furthermore, in formula (3), [Figure SMS_15], $\mathbf{1} = [1, \ldots, 1]^T$, $f_k$ is the k-th feature vector of all data, $L$ is the Laplacian matrix $L = D - S$, $D$ is a diagonal matrix with $D_{ii} = \sum_j S_{ij}$, and $S_{ij}$ satisfies the following formula:
Figure SMS_16
where $t > 0$ is a constant to be tuned and $\mathrm{KNN}(f_i)$ denotes the set of K nearest neighbors of $f_i$.
As an optional implementation, the parameters in this example may be set to δ = 0.7, K = 3, t = 100, and ε = 0.2.
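The Laplacian matrix L = D − S with D_ii = Σ_j S_ij can be illustrated with the sketch below. The exact weight formula for S_ij is an image that did not survive in this text, so the code assumes the heat-kernel weighting commonly used in Laplacian-score methods: S_ij = exp(−(f_i − f_j)² / t) when either point is among the other's K nearest neighbors, and 0 otherwise. Measuring distances on the entries of a single feature vector is also an assumption, and the function names and toy values are hypothetical.

```python
import math


def knn_indices(values, i, k):
    """Indices of the k nearest neighbors of values[i], excluding i itself."""
    others = [j for j in range(len(values)) if j != i]
    others.sort(key=lambda j: abs(values[i] - values[j]))
    return set(others[:k])


def laplacian(values, k=3, t=100.0):
    """Build S, the diagonal of D, and L = D - S for one feature vector."""
    m = len(values)
    neighbors = [knn_indices(values, i, k) for i in range(m)]
    S = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            # Assumed rule: connect i and j if either is a K-nearest neighbor of the other.
            if i != j and (j in neighbors[i] or i in neighbors[j]):
                S[i][j] = math.exp(-((values[i] - values[j]) ** 2) / t)
    D = [sum(row) for row in S]  # diagonal entries D_ii = sum_j S_ij
    L = [[(D[i] if i == j else 0.0) - S[i][j] for j in range(m)] for i in range(m)]
    return S, D, L


f_k = [0.10, 0.12, 0.30, 0.31, 0.90]  # toy feature vector over m = 5 samples
S, D, L = laplacian(f_k, k=2, t=100.0)
```

With this construction S is symmetric and every row of L sums to zero, which are the usual properties of a graph Laplacian.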
Second calculation unit 303: used for calculating, according to a second formula, the semi-supervised neighborhood discrimination index of each feature in the feature set to be screened when the number of screened features is not 0.
Specifically, when the target feature subset G is not empty, a semi-supervised neighborhood discrimination index is calculated for each feature in the set B according to the following formula:
Figure SMS_17
where $k \in B$; [Figure SMS_20] and [Figure SMS_22] denote the neighborhood similarity matrices of the data matrices [Figure SMS_24] and [Figure SMS_19] under the distance function, and [Figure SMS_21] and [Figure SMS_23] denote the labeled data matrices under the feature subsets $G$ and $G \cup \{k\}$, respectively. [Figure SMS_25] and [Figure SMS_18] are calculated as follows:
Figure SMS_26
Figure SMS_27
Screening unit 304: used for screening out the feature with the largest semi-supervised neighborhood discrimination index from the feature set to be screened, and updating the feature set to be screened.
Specifically, according to the semi-supervised neighborhood discrimination index $\mathrm{SNDI}(f_k, F_G, Y)$ $(k = 1, \ldots, n)$ calculated for each feature, the feature with the largest index is selected and added to the target feature subset G, and the candidate feature subset is updated as B = B − G.
It should be noted that the above screening process is iterative: screening is repeated until the number of screened features reaches the preset number. Alternatively, iteration may stop when
Figure SMS_28
holds, after which the first N features with the largest semi-supervised neighborhood discrimination indexes among the target features are selected as the finally screened features.
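The iterative screening loop can be sketched as a greedy forward selection. Because the two index formulas are not recoverable from this text, the sketch takes them as caller-supplied functions: `score_first` stands in for the first formula (used while G is empty) and `score_next` for the second (used once G is non-empty). Both names, and the toy scoring rules below, are hypothetical placeholders.

```python
def greedy_select(n_features, preset_number, score_first, score_next):
    """Move the best-scoring feature from B to G until G holds preset_number features."""
    G = []                        # target feature subset, initially empty
    B = list(range(n_features))   # feature subset to be screened
    while B and len(G) < preset_number:
        if not G:
            scores = {k: score_first(k) for k in B}
        else:
            scores = {k: score_next(k, G) for k in B}
        best = max(B, key=lambda k: scores[k])
        G.append(best)            # add the winner to the target subset
        B.remove(best)            # equivalent to the update B = B - G
    return G


# Toy stand-ins for the two formulas (illustrative values only).
relevance = [0.2, 0.9, 0.5, 0.7]


def toy_first(k):
    return relevance[k]


def toy_next(k, G):
    return relevance[k] - 0.1 * len(G)


selected = greedy_select(4, 2, toy_first, toy_next)
```

Passing the scoring rules as parameters keeps the loop structure separate from the formulas, so the same loop would work with either stopping rule mentioned above by changing `preset_number`.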
As shown in fig. 3, the classification module 400 in this embodiment includes two units, namely an extraction unit 401 and a classification unit 402, which function as:
Extraction unit 401: used for extracting target feature data from the breast cell data sample to be diagnosed according to the features meeting the preset condition.
Classification unit 402: used for inputting the target feature data into a pre-trained KNN classifier to obtain a diagnosis result.
As an optional implementation manner, a KNN classifier may be selected to classify, so as to obtain a classification result that the sample to be tested is benign or malignant.
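A KNN classifier of the kind named here can be sketched in a few lines. This is a generic majority-vote nearest-neighbor classifier with Euclidean distance, offered as an illustration only: the source does not state the classifier's distance metric or its number of neighbors, so `k=3` and the toy training data are assumptions. Labels follow the source's convention of −1 for malignant and 1 for benign.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, sample, k=3):
    """Predict the label of `sample` by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(x, sample), y) for x, y in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy training data restricted to two already-screened features (invented values).
train_X = [[0.10, 0.20], [0.15, 0.25], [0.20, 0.10],
           [0.80, 0.90], [0.85, 0.80], [0.90, 0.95]]
train_y = [1, 1, 1, -1, -1, -1]   # 1 = benign, -1 = malignant

result = knn_predict(train_X, train_y, [0.82, 0.88], k=3)
```

In the system described above, `train_X` would contain only the columns chosen by the feature screening module, and the sample to be diagnosed would be reduced to the same columns before prediction.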
It can be seen that the breast cancer diagnosis system based on the semi-supervised neighborhood discrimination index provided in this embodiment includes a data acquisition module 100, a feature extraction module 200, a feature screening module 300 and a classification module 400. After the original data are acquired, they are normalized. During feature screening, the features most strongly associated with the label are screened out iteratively. Finally, feature data are extracted from the breast cell data sample to be diagnosed and a KNN classifier is used to obtain the final diagnosis result. More features are screened, and the accuracy of the diagnosis result is improved.
In summary, according to the breast cancer diagnosis system based on the semi-supervised neighborhood discrimination index provided in the present embodiment, the whole implementation flow is shown in fig. 4, and the method includes the following steps:
step S201: a labeled breast cell data matrix and an unlabeled breast cell data matrix are acquired.
Step S202: the labeled and unlabeled breast cell data matrices were normalized separately.
Step S203: splicing the normalized labeled mammary gland cell data matrix and the unlabeled mammary gland cell data matrix to obtain a target data matrix, and extracting each column in the target data matrix to obtain a plurality of characteristics of the mammary gland cell data sample.
Step S204: the target feature subset G is initialized to null and the feature subset to be screened is initialized to the plurality of features.
Step S205: whether the target feature subset G is empty is determined, if yes, the process proceeds to step S206, otherwise the process proceeds to step S207.
Step S206: and (3) calculating a semi-supervised neighborhood discrimination index of each feature in the feature set to be screened according to the formula (3).
Step S207: and (3) calculating a semi-supervised neighborhood discrimination index of each feature in the feature set to be screened according to the formula (7).
Step S208: and screening the features with the maximum semi-supervised neighborhood discrimination indexes from the feature set to be screened, and moving the features from the feature subset to be screened to the target feature subset.
Step S209: and extracting target characteristic data from the breast cell data sample to be diagnosed according to the characteristics meeting the preset conditions, and inputting the target characteristic data into a pre-trained KNN classifier to obtain a diagnosis result.
To verify the effect of this embodiment, a comparative experiment is also provided. Specifically, this embodiment is tested on the UCI data set WDBC (Breast Cancer Wisconsin (Diagnostic) Data Set), which contains 569 data samples in total, each with 31 attributes including the sample label. The features are computed from digitized images of fine needle aspirates (FNA) of breast tumors and describe the characteristics of the cell nuclei present in the images. Table 1 compares the recognition results of this embodiment with those of the fully supervised neighborhood discrimination index (HANDI) method, including the average number of features and the average recognition rate over ten-fold cross-validation experiments. The number of features screened in this embodiment is larger, and the accuracy of the obtained diagnosis result is higher.
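For readers unfamiliar with ten-fold cross-validation, the sketch below shows one common way to split 569 sample indices into ten train/test partitions. It is a generic illustration: the source does not say how the folds were drawn or how the labeled and unlabeled portions were assigned within each fold, and the shuffling seed and the function name `ten_fold_indices` are assumptions.

```python
import random


def ten_fold_indices(n_samples, n_folds=10, seed=0):
    """Yield (train, test) index lists for each of the n_folds partitions."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # assumed: random assignment to folds
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i in range(n_folds):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(ten_fold_indices(569))
```

The average recognition rate would then be the mean of the per-fold accuracies obtained by training on each `train` list and evaluating on the matching `test` list.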
TABLE 1
Figure SMS_29
In this specification, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different point from other embodiments, so that the same or similar parts between the embodiments are referred to each other. For the device disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple, and the relevant points refer to the description of the method section.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software modules may be disposed in Random Access Memory (RAM), memory, read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The breast cancer diagnosis system based on the semi-supervised neighborhood discrimination index provided by the invention has been described in detail above. The principles and embodiments of the present invention have been described herein with reference to specific examples, and the description of these examples is intended only to facilitate an understanding of the method of the present invention and its core ideas. It should be noted that those skilled in the art can make various modifications and adaptations without departing from the principles of the invention, and these modifications and adaptations also fall within the scope of the invention as defined in the following claims.

Claims (4)

1. A breast cancer diagnosis system based on a semi-supervised neighborhood discrimination index, comprising:
a data acquisition module, configured to obtain labeled breast cell data samples and unlabeled breast cell data samples;
a feature extraction module, configured to extract a plurality of features of the breast cell data samples from the labeled breast cell data samples and the unlabeled breast cell data samples;
a feature screening module, configured to screen out, from the plurality of features, the features whose semi-supervised neighborhood discrimination index meets a preset condition;
a classification module, configured to diagnose a breast cell data sample to be diagnosed according to the features meeting the preset condition, to obtain a diagnosis result;
wherein the feature extraction module includes:
a normalization unit, configured to normalize a labeled breast cell data matrix and an unlabeled breast cell data matrix, wherein each breast cell data matrix comprises data of a plurality of breast cell data samples;
a splicing unit, configured to splice the normalized labeled breast cell data matrix and the normalized unlabeled breast cell data matrix to obtain a target data matrix;
a feature extraction unit, configured to extract each column of data in the target data matrix to obtain the plurality of features of the breast cell data samples; the target data matrix is divided by columns, each column being one feature of the breast cell data samples, according to the following formula:
Figure FDA0004058587340000011
wherein $f_k = [f_{1k}, \ldots, f_{mk}]^T$ is a feature of the breast cell data sample, $k = 1, 2, \ldots, n$;
the feature screening module is specifically configured to screen out, from the plurality of features, a preset number of features with the largest semi-supervised neighborhood discrimination index.
2. The system of claim 1, wherein the feature screening module comprises:
a judging unit, configured to judge whether the number of screened features is 0;
a first calculation unit, configured to calculate, when the number of screened features is 0, the semi-supervised neighborhood discrimination index of each feature in a feature set to be screened according to a first formula, wherein the feature set to be screened comprises the plurality of features at initialization and is updated as the features are screened;
a second calculation unit, configured to calculate, when the number of screened features is not 0, the semi-supervised neighborhood discrimination index of each feature in the feature set to be screened according to a second formula;
a screening unit, configured to screen out the feature with the largest semi-supervised neighborhood discrimination index from the feature set to be screened and to update the feature set to be screened, until the number of screened features reaches the preset number.
3. The system of claim 1 or 2, wherein the classification module comprises:
an extraction unit, configured to extract target feature data from the breast cell data sample to be diagnosed according to the features meeting the preset condition;
a classification unit, configured to input the target feature data into a pre-trained KNN classifier to obtain the diagnosis result.
4. The system of claim 3, wherein the breast cell data sample comprises breast cell data extracted from a digitized fine needle aspiration image, the breast cell data comprising any one or any combination of: nucleus radius, texture, smoothness, perimeter, concavity.
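As an illustration only, the data flow recited in claims 1 to 3 (normalize, splice, split into columns, screen features one at a time, classify with KNN) can be sketched in plain Python as follows. Several points are assumptions made for this sketch and are not taken from the claims: min-max scaling is used as the normalization, Euclidean distance and `k = 3` are used for KNN, and all function names are hypothetical. The first and second formulas for the semi-supervised neighborhood discrimination index are not reproduced; they are passed in as the callables `first_score` and `second_score`.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence

Matrix = List[List[float]]   # rows are samples, columns are features
Feature = List[float]        # one column f_k of the target data matrix

def min_max_normalize(matrix: Matrix) -> Matrix:
    """Scale every column to [0, 1] (assumed scheme; constant columns map to 0)."""
    cols = list(zip(*matrix))
    lows = [min(c) for c in cols]
    spans = [max(c) - min(c) for c in cols]
    return [
        [(v - lo) / sp if sp else 0.0 for v, lo, sp in zip(row, lows, spans)]
        for row in matrix
    ]

def extract_features(labeled: Matrix, unlabeled: Matrix) -> List[Feature]:
    """Normalize both matrices, splice them row-wise, and return the columns."""
    target = min_max_normalize(labeled) + min_max_normalize(unlabeled)
    return [list(col) for col in zip(*target)]

def screen_features(
    features: Sequence[Feature],
    n_select: int,
    first_score: Callable[[Feature], float],
    second_score: Callable[[Feature, List[Feature]], float],
) -> List[int]:
    """Greedily pick the indices of n_select features with the largest score.

    first_score is used while nothing has been selected yet; second_score
    also receives the features selected so far. Both stand in for the
    patent's first and second formulas, which are not reproduced here.
    """
    remaining = list(range(len(features)))
    selected: List[int] = []
    while len(selected) < n_select and remaining:
        if not selected:
            scores = {k: first_score(features[k]) for k in remaining}
        else:
            chosen = [features[j] for j in selected]
            scores = {k: second_score(features[k], chosen) for k in remaining}
        best = max(remaining, key=lambda k: scores[k])
        selected.append(best)
        remaining.remove(best)  # update the feature set to be screened
    return selected

def knn_predict(train_X: Matrix, train_y: Sequence, query: Sequence[float], k: int = 3):
    """Return the majority label among the k nearest training samples."""
    nearest = sorted(
        range(len(train_X)), key=lambda i: math.dist(train_X[i], query)
    )[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]
```

For a sample `x` to be diagnosed, the target feature data would be `[x[k] for k in selected]`, where `selected` is the list returned by `screen_features`; that reduced vector is what `knn_predict` would receive, with `train_X` restricted to the same columns.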
CN201811615503.4A 2018-12-27 2018-12-27 Breast cancer diagnosis system based on semi-supervised neighborhood discrimination index Active CN109711469B (en)

Publications (2)

Publication Number Publication Date
CN109711469A CN109711469A (en) 2019-05-03
CN109711469B CN109711469B (en) 2023-06-20

Family

ID=66258762

Country Status (1)

Country Link
CN (1) CN109711469B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110533190B (en) * 2019-07-18 2023-09-05 武汉烽火众智数字技术有限责任公司 Data object analysis method and device based on machine learning
CN111290369A (en) * 2020-02-24 2020-06-16 苏州大学 Fault diagnosis method based on semi-supervised recursive feature retention
CN117497064A (en) * 2023-12-04 2024-02-02 电子科技大学 Single-cell three-dimensional genome data analysis method based on semi-supervised learning

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101877053A (en) * 2009-11-25 2010-11-03 北京交通大学 Semi-supervised neighborhood discrimination analysis method for face recognition
CN105138862A (en) * 2015-07-31 2015-12-09 同济大学 Collaborative anti-cancer pharmaceutical combination prediction method and pharmaceutical composition

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7412425B2 (en) * 2005-04-14 2008-08-12 Honda Motor Co., Ltd. Partially supervised machine learning of data classification based on local-neighborhood Laplacian Eigenmaps

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
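Research on an infrared breast cancer detection method based on a semi-supervised ladder network; Hou Li et al.; 《信息技术与信息化》 (Information Technology and Informatization); 2018-06-30; 179-182 *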

Similar Documents

Publication Publication Date Title
Linder et al. Identification of tumor epithelium and stroma in tissue microarrays using texture analysis
Xu et al. Automatic nuclei detection based on generalized laplacian of gaussian filters
Kothari et al. Automatic batch-invariant color segmentation of histological cancer images
CN109711469B (en) Breast cancer diagnosis system based on semi-supervised neighborhood discrimination index
WO2019110567A1 (en) Method of computing tumor spatial and inter-marker heterogeneity
CN111462042B (en) Cancer prognosis analysis method and system
Sarkar et al. Sdl: Saliency-based dictionary learning framework for image similarity
WO2015069824A2 (en) Diagnostic system and method for biological tissue analysis
Sanchez-Morillo et al. Classification of breast cancer histopathological images using KAZE features
Xu et al. Using transfer learning on whole slide images to predict tumor mutational burden in bladder cancer patients
Win et al. Comparative study on automated cell nuclei segmentation methods for cytology pleural effusion images
Song et al. Hybrid deep autoencoder with Curvature Gaussian for detection of various types of cells in bone marrow trephine biopsy images
Moraru et al. Texture analysis of parasitological liver fibrosis images
Rodríguez-Esparza et al. Automatic detection and classification of abnormal tissues on digital mammograms based on a bag-of-visual-words approach
Mabaso et al. Spot detection methods in fluorescence microscopy imaging: a review
Zheng et al. Retrieval of pathology image for breast cancer using PLSA model based on texture and pathological features
Akbar et al. Tumor localization in tissue microarrays using rotation invariant superpixel pyramids
Chang et al. Multireference level set for the characterization of nuclear morphology in glioblastoma multiforme
CN117612711A (en) Multi-mode prediction model construction method and system for analyzing liver cancer recurrence data
Mhala et al. Improved approach towards classification of histopathology images using bag-of-features
Kurmi et al. Histopathology image segmentation and classification for cancer revelation
CN114863163A (en) Method and system for cell classification based on cell image
US7769547B2 (en) Karyometry-based method for prediction of cancer event recurrence
Kowal et al. Segmentation of breast cancer fine needle biopsy cytological images using fuzzy clustering
Vincent et al. Automated segmentation and classification of nuclei in histopathological images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant