CN112508889B - Chromosome karyotype analysis system - Google Patents

Publication number
CN112508889B
Authority
CN
China
Prior art keywords
chromosome
algorithm
chromosomes
pixels
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011352831.7A
Other languages
Chinese (zh)
Other versions
CN112508889A (en
Inventor
梁静
岳彩通
于坤杰
瞿博阳
杨昊天
胡毅
李鹏帅
李功平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou University
Original Assignee
Zhengzhou University
Application filed by Zhengzhou University filed Critical Zhengzhou University
Priority to CN202011352831.7A priority Critical patent/CN112508889B/en
Publication of CN112508889A publication Critical patent/CN112508889A/en
Application granted granted Critical
Publication of CN112508889B publication Critical patent/CN112508889B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G06T7/0012 Biomedical image inspection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/12 Edge-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/13 Edge detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/60 Analysis of geometric attributes
    • G06T7/62 Analysis of geometric attributes of area, perimeter, diameter or volume
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10056 Microscopic image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30004 Biomedical image processing
    • G06T2207/30072 Microarray; Biochip, DNA array; Well plate

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • Quality & Reliability (AREA)
  • Radiology & Medical Imaging (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Geometry (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention relates to a chromosome karyotype analysis system in which (1) a filtering algorithm and a segmentation algorithm are designed to filter impurities out of human metaphase cell images and extract the chromatids; and (2) a recognition algorithm and a correction algorithm are designed to recognize and pair the extracted chromosomes, thereby generating a karyotype map. The invention combines karyotype analysis with image processing, machine learning and related technologies to develop a reliable automatic chromosome karyotype analysis system, which makes karyotype analysis automatic and intelligent and improves the overall efficiency and accuracy of chromosome karyotype classification.

Description

Chromosome karyotype analysis system
Technical Field
The invention belongs to the technical field of artificial intelligence, and particularly relates to a karyotype analysis system which is applied to research on human genetic disease mechanisms, species genetic relationship and evolution, tumor pathology and the like.
Background
In metaphase, the chromosomes of human somatic cells are condensed and visible; a normal cell has 46 chromosomes (22 pairs of autosomes and one pair of sex chromosomes). Chromosomes are the carriers of genetic material, and abnormalities in their number or structure can lead to genetic disorders. Karyotype analysis is therefore important for studying the mechanisms of human genetic diseases, the genetic relationships and evolution of species, tumor pathology and the like. As shown in FIG. 1, karyotyping refers to grouping, arranging and pairing the chromosomes in an image of a human metaphase cell and generating a karyotype map.
Early karyotyping was entirely manual: an operator had to isolate the chromosomes from a metaphase cell image by hand and then pair and order them according to morphology and banding pattern to generate a karyotype map. This is tedious and complicated work. Because it demands a high level of professional skill and the training period is long, researchers and technicians in this field are in very short supply. In addition, pairing and ordering chromosomes by eye alone is error-prone and inefficient.
In recent years, as automation and intelligent technologies have spread rapidly across many fields, demand for automated karyotype analysis in medicine has grown. Several chromosome karyotyping systems are in commercial use, such as the CytoVision system developed by Leica (Germany) and the Ikaros system developed by Carl Zeiss Management, Inc. These products have moved karyotyping from purely manual operation to semi-automated processing; it is only semi-automated because the systems still require a significant amount of manual assistance in use. The existing classification methods are time-consuming, inefficient and insufficiently accurate, and cannot meet the requirements of clinical work.
Disclosure of Invention
The invention aims to provide a chromosome karyotype analysis system that can rapidly extract chromatids from human metaphase cell images, pair them, and generate a karyotype map.
To solve the above technical problem, the invention adopts the following technical solution:
A karyotyping system, comprising:
(1) a filtering algorithm and a segmentation algorithm designed to filter impurities out of the human metaphase cell image and extract the chromatids;
(2) a recognition algorithm and a correction algorithm designed to recognize and pair the extracted chromosomes, thereby generating a karyotype map.
Preferably, the flow of the filtering algorithm is as follows:
(1) binarize the metaphase cell image (I) to generate a filter map (B);
(2) detect the contours of all objects in image (B) and denote them in order as C_1 to C_n;
(3) initialize i = 1 and an empty set Contours;
(4) calculate the area A_i of contour C_i;
(5) remove impurities according to the following rule (α, β, η are threshold parameters):
Figure GDA0003762534580000021
(6) check whether i ≥ n; if yes, go to the next step; if no, set i = i + 1 and go to step (4);
(7) process each pixel in the binary image (B): if the pixel (x, y) lies within any contour in the set Contours, assign it the value 255; otherwise assign it the value 0;
(8) process the metaphase cell image (I) as follows to generate the filtered image (G):
Figure GDA0003762534580000031
A manually assisted impurity-removal function is added to the chromosome karyotype analysis system; when the filtering algorithm cannot remove the impurities completely, the remaining impurities are removed with manual assistance.
Preferably, the flow of the segmentation algorithm is as follows:
(1) initialize an empty set Contours;
(2) detect the contours of all objects in the filtered image and add them to the set Contours;
(3) initialize i = 1;
(4) compute the minimum bounding rectangle of the i-th contour in the set Contours to obtain the coordinates of its four vertices in the filtered image;
(5) crop and rotate the filtered image according to the coordinates obtained in step (4) to obtain chromosome i in a vertical orientation;
(6) check whether i is greater than or equal to the number n of elements in the set Contours; if yes, finish; otherwise set i = i + 1 and return to step (4).
Preferably, in the segmentation algorithm, chromosomes that overlap in a cross shape are separated automatically by the algorithm; other types of overlapping chromosomes are handled by human-computer interaction, that is, each chromosome is manually painted in a different color with the mouse, and the algorithm then extracts the chromosomes by color.
Preferably, the extraction of chromosome features comprises extraction of the medial axis, the length, the area and the banding features;
Extraction of the medial axis: the method consists mainly of preprocessing, layer-by-layer boundary deletion and post-processing, in which background pixels, pattern pixels, contour pixels and skeleton pixels are each given a specific value and graphical representation. The main task of the preprocessing stage is to determine the contour pixels of the image and eliminate the influence of edge noise; the Sobel edge detection operator is used to detect the contour of the image. The layer-by-layer deletion stage deletes the contour pixels marked in the preprocessing stage that satisfy the decision conditions and marks the remaining contour pixels as skeleton pixels. The post-processing stage operates on the skeleton line obtained after multiple iterations of the preprocessing and deletion stages; the problem to be solved is that parts of the skeleton line are two pixels wide, so one of the two pixels is deleted according to a corresponding decision condition to obtain a skeleton line one pixel wide. The preprocessing and post-processing of the algorithm use a serial method;
Extraction of chromosome length: first, take one end-point pixel of the medial axis of the chromosome as Q_0 and set the chromosome length L = 0. Starting from Q_0, traverse along the medial axis and take the second pixel on the axis as Q_1. If Q_1 lies in the horizontal or vertical direction of Q_0, then L = L + 1; if Q_1 lies in a diagonal direction of Q_0, then
$L = L + \sqrt{2}$
The value of L is updated each time a pixel is traversed, and the loop continues until the other end point of the medial axis is reached; L at that moment is the length of the chromosome;
Extraction of chromosome area: the area of the chromosome can be computed from the binarized chromosome image. For a binary image in which the background pixels are black (0) and the chromosome pixels are white (1), the area of the chromosome is the number of white pixels;
Extraction of chromosome banding features: after the medial axis has been extracted from the grayscale chromosome image, the gray-level information of the pixels where the line perpendicular to the axis at each medial-axis point intersects the chromosome is computed, with the medial-axis point as the independent variable; the chromosome banding features are then computed using the WDD transform;
Normalization of features:
Normalization of length and area: compute the maximum value m and the minimum value n of the lengths of all chromosomes in one image; a chromosome of length x then has the normalized length (m-x)/(m-n), so that the normalized length lies in the range [0,1]. The area feature of the chromosome is normalized in the same way;
Normalization of banding features: because the WDD functions are uniform, only the projection curve representing the bands needs to be normalized before the WDD banding features are computed. Because the bands express the texture information of the chromosome, rescaling the values of the curve does not affect its trend; the maximum and minimum of the projection curve of each chromosome are therefore found and the projection values are normalized in the same way as the length;
Further processing of the features: for each person's chromosome set, i.e. 46 chromosomes, the feature values are summed and averaged, and the average is then subtracted from the feature value of each chromosome. The purpose is to reduce spurious differences between the chromosome sets of different people, such as differences in brightness when the images are captured under the microscope;
Finally, the normalized length, area and projection features are combined to obtain 852-dimensional chromosome feature data for chromosome classification.
Preferably, the identification and pairing use ensemble learning together with a correction algorithm based on prior knowledge: the test sample is first predicted by the ensemble learner, the correction algorithm is then applied to the predicted probabilities, and the final predicted label is obtained;
The component learners used for ensemble learning are kNN, SVM and ELM, adapted to the chromosome classification task as follows:
Adaptation of kNN: the classification criterion is changed from the Euclidean distance between the sample to be tested and its nearest training sample to the average distance between the sample to be tested and its nearest training samples of the same class. In addition, K is set to 3;
Adaptation of the SVM: C is set to 1, g is set to 0.07, and the output is changed to a probability output;
Adaptation of the ELM: the number of neural units is set to 1500, and the output is changed to a probability output;
The predicted probabilities are obtained with an ensemble algorithm, as follows:
Step 1: draw k samples at random, with replacement, from the k training samples; do this 5 times to form 5 new training sets tr_1, tr_2, tr_3, tr_4, tr_5;
Step 2: train the average-distance kNN, the SVM and the ELM with tr_1 as the training set and test the sample to be tested. Then take a vote over the three predicted labels and record the result as L_1; average the three probability outputs and record the result as P_1;
Step 3: repeat Step 2 with the other training sets to obtain L_1, L_2, L_3, L_4, L_5, take a vote over these 5 predicted labels and record the result as Label; average P_1, P_2, P_3, P_4, P_5 and record the result as P;
Step 4: using Label and P, apply the correction algorithm to obtain the final predicted label, and test its accuracy.
Preferably, the correction algorithm is as follows: the chromosome set of a normal person is known to consist of 22 pairs of autosomes plus 1 pair of sex chromosomes (XX or XY); the correction algorithm proposed for this product finds, from the classification probabilities, the most probable label assignment that satisfies this distribution;
Define the variables:
P_ij: the probability that the i-th of the 46 chromosomes has the j-th label (1 ≤ i ≤ 46, 1 ≤ j ≤ 24)
X_ij: the indicator variable assigning sample i to label j, taking the value 0 or 1
Finding the maximum probability means solving:
Figure GDA0003762534580000061
If the subject is female, the following constraints are satisfied:
X_ij ∈ {0,1}
Figure GDA0003762534580000062
Figure GDA0003762534580000063
j=24 ΣX_ij
If the subject is male, the following constraints are satisfied:
X_ij ∈ {0,1}
Figure GDA0003762534580000071
Figure GDA0003762534580000072
j=23,24 ΣX_ij ≤ 1
The steps of the correction algorithm are as follows:
Input: the 24 classification probabilities of each of the 46 chromosomes
Output: the predicted labels
Step 1: compute the maximum probability P_1 under the male constraints;
Step 2: compute the maximum probability P_2 under the female constraints;
Figure GDA0003762534580000073
The invention has the beneficial effects that:
Karyotype analysis is combined with image processing, machine learning and related technologies to develop a reliable automatic chromosome karyotype analysis system, which makes karyotype analysis automatic and intelligent and improves the overall efficiency and accuracy of chromosome karyotype classification.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the description of the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a schematic diagram of karyotyping;
FIG. 2 is a technical roadmap of the chromosome karyotype analysis;
FIG. 3 is a karyotype map;
FIG. 4 shows the effect of the filtering algorithm;
FIG. 5 shows the effect of the segmentation algorithm;
FIG. 6 shows touching chromosomes;
FIG. 7 shows overlapping chromosomes;
FIG. 8 shows the processing of touching chromosomes;
FIG. 9 shows the processing of overlapping chromosomes;
FIG. 10 shows the medial axis extraction;
FIG. 11 shows the structure of the ensemble algorithm;
FIG. 12 is a bar chart comparing accuracy;
FIG. 13 is a comparison of ten-fold accuracy.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in FIG. 2, a karyotype analysis system comprises:
(1) a filtering algorithm and a segmentation algorithm designed to filter impurities out of the human metaphase cell image and extract the chromatids;
(2) a recognition algorithm and a correction algorithm designed to recognize and pair the extracted chromosomes, thereby generating a karyotype map.
This embodiment describes the karyotype analysis system of the invention in detail, taking the karyotype map of FIG. 3 as an example.
The implementation consists mainly of three parts: filtering and impurity removal, segmentation and extraction, and identification and pairing. The relevant technical details of these three parts are described below.
Filtering to remove impurities
This part of the work removes impurities from the human metaphase cell image. A corresponding filtering algorithm was designed for this purpose; its flow is as follows:
Image I is the "metaphase cell image" at the upper left of FIG. 2.
Filter map B is the "filter map" at the upper right of FIG. 2.
1. Binarize the metaphase cell image I to generate the filter map B;
2. Detect the contours of all objects in image B and denote them in order as C_1 to C_n;
3. Initialize i = 1 and an empty set Contours;
4. Calculate the area A_i of contour C_i;
5. Remove impurities according to the following rule (α, β, η are threshold parameters):
Figure GDA0003762534580000091
6. Check whether i ≥ n; if yes, go to the next step; if no, set i = i + 1 and go to step 4;
7. Process each pixel in the binary image B:
if the pixel (x, y) lies within any contour in the set Contours, assign it the value 255; otherwise assign it the value 0.
8. Process the metaphase cell image I as follows to generate the filtered image G:
Figure GDA0003762534580000101
Note that, to ensure that chromosomes in the metaphase cell image are not filtered out as impurities, the threshold parameters α, β, η of the filtering algorithm are set conservatively, so the algorithm cannot guarantee that all impurities in the image are removed. To address this, a manually assisted impurity-removal function is added to the intelligent karyotype analysis software: when the filtering algorithm cannot remove the impurities completely, the remainder are removed with manual assistance. The filtering effect is shown in FIG. 4.
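The filtering flow above can be sketched in a few lines of pure Python. This is only an illustration under stated assumptions: the actual decision rule in α, β, η and the formula for the filtered image are given in figures that are not reproduced in this text, so the sketch assumes a simple area band (`min_area`, `max_area`) and a white background value; it also uses the pixel count of each 8-connected component as a stand-in for the contour area. All function and parameter names are hypothetical.

```python
from collections import deque

def connected_components(binary):
    """Return 8-connected foreground components as lists of (row, col) pixels."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for r in range(h):
        for c in range(w):
            if binary[r][c] == 0 or seen[r][c]:
                continue
            queue, pixels = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components

def filter_impurities(image, threshold=128, min_area=3, max_area=50, background=255):
    """Sketch of steps 1 to 8; the area band and background value are assumptions."""
    h, w = len(image), len(image[0])
    # Step 1: binarize, treating dark (stained) pixels as foreground.
    binary = [[255 if v < threshold else 0 for v in row] for row in image]
    # Steps 2 to 6: keep only objects whose area lies inside the assumed band.
    kept = [p for p in connected_components(binary) if min_area <= len(p) <= max_area]
    # Step 7: rebuild the filter map from the kept objects.
    mask = [[0] * w for _ in range(h)]
    for pixels in kept:
        for y, x in pixels:
            mask[y][x] = 255
    # Step 8: copy gray values under the mask, fill the rest with background.
    return [[image[y][x] if mask[y][x] == 255 else background for x in range(w)]
            for y in range(h)]
```

With conservative bounds, as the text notes, some impurities survive this pass and would still need manual removal.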
Segmentation extraction
This part of the work extracts each chromatid from the filtered image. A corresponding segmentation algorithm was designed for this purpose; its flow is as follows:
1. Initialize an empty set Contours;
2. Detect the contours of all objects in the filtered image G and add them to the set Contours;
3. Initialize i = 1;
4. Compute the minimum bounding rectangle of the i-th contour in the set Contours to obtain the coordinates of its four vertices in the filtered image: (x_1, y_1), (x_2, y_2), (x_3, y_3), (x_4, y_4);
5. Crop and rotate the filtered image G according to the coordinates obtained in step 4 to obtain chromosome i in a vertical orientation;
6. Check whether i is greater than or equal to the number n of elements in the set Contours; if yes, finish; otherwise set i = i + 1 and return to step 4.
The effect of the segmentation algorithm is shown in fig. 5.
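Step 4 and the rotation in step 5 can be illustrated with a small pure-Python sketch. It assumes the contour is available as a list of (x, y) points treated as ordinary Cartesian coordinates, builds the convex hull, and tries the orientation of every hull edge to find the minimum-area bounding rectangle. The function names and the returned tuple layout are hypothetical, and the patent does not say how its rectangle is computed.

```python
import math

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def min_area_rect(points):
    """Return (corners, width, height, theta) of the minimum-area bounding rectangle.

    theta is the direction of the rectangle's width side, in radians.
    """
    hull = convex_hull(points)
    best = None
    for i in range(len(hull)):
        (x0, y0), (x1, y1) = hull[i], hull[(i + 1) % len(hull)]
        theta = math.atan2(y1 - y0, x1 - x0)
        c, s = math.cos(theta), math.sin(theta)
        # Express the hull in a frame whose u axis runs along this edge.
        us = [x * c + y * s for x, y in hull]
        vs = [-x * s + y * c for x, y in hull]
        w, h = max(us) - min(us), max(vs) - min(vs)
        if best is None or w * h < best[0]:
            box = [(min(us), min(vs)), (max(us), min(vs)),
                   (max(us), max(vs)), (min(us), max(vs))]
            # Rotate the box corners back into the original frame.
            corners = [(u * c - v * s, u * s + v * c) for u, v in box]
            best = (w * h, corners, w, h, theta)
    _, corners, w, h, theta = best
    return corners, w, h, theta

def upright_rotation(w, h, theta):
    """Angle (radians) by which to rotate the crop so its long side is vertical."""
    long_axis = theta if w >= h else theta + math.pi / 2
    return math.pi / 2 - long_axis
```

The four `corners` play the role of (x_1, y_1) to (x_4, y_4), and `upright_rotation` gives the angle that would make the chromosome vertical after cropping.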
Because chromosomes are non-rigid objects, they can touch and cross one another in metaphase cell images. The objects extracted by the segmentation algorithm are therefore not all single chromosomes; some are touching chromosomes and some are overlapping chromosomes, as shown in FIG. 6 and FIG. 7.
Touching chromosomes are handled by human-computer interaction: the touching region is erased manually with the mouse. The effect is shown in FIG. 8.
For chromosomes that overlap in a cross shape (the most common case), we designed a corresponding algorithm that separates the chromosomes automatically. Other types of overlapping chromosomes (uncommon) are handled by human-computer interaction: each chromosome is manually painted in a different color with the mouse, and an algorithm then extracts the chromosomes by color. The effect is shown in FIG. 9.
Feature extraction and processing
The extraction of chromosome features comprises extraction of the medial axis (which makes the length and area easier to compute), the length, the area and the banding features.
Extraction of the chromosome medial axis: the method consists mainly of preprocessing, layer-by-layer boundary deletion and post-processing, in which background pixels, pattern pixels, contour pixels and skeleton pixels are each given a specific value and graphical representation. The main task of the preprocessing stage is to determine the contour pixels of the image and eliminate the influence of edge noise; the Sobel edge detection operator is used to detect the contour of the image. The layer-by-layer deletion stage deletes the contour pixels marked in the preprocessing stage that satisfy the decision conditions and marks the remaining contour pixels as skeleton pixels. The post-processing stage operates on the skeleton line obtained after multiple iterations of the preprocessing and deletion stages; it solves the problem that parts of the skeleton line are two pixels wide by deleting one of the two pixels according to a corresponding decision condition, giving a skeleton line one pixel wide. The preprocessing and post-processing of the algorithm use a serial method. The algorithm is illustrated in FIG. 10.
Extraction of chromosome length: first, take one end-point pixel of the medial axis of the chromosome as Q_0 and set the chromosome length L = 0. Starting from Q_0, traverse along the medial axis and take the second pixel on the axis as Q_1. If Q_1 lies in the horizontal or vertical direction of Q_0, then L = L + 1; if Q_1 lies in a diagonal direction of Q_0, then
$L = L + \sqrt{2}$
The value of L is updated each time a pixel is traversed, and the loop continues until the other end point of the medial axis is reached; L at that moment is the length of the chromosome.
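The traversal can be written as a short function. This sketch assumes the medial axis is already available as an ordered list of 8-connected (row, col) pixels from one end point to the other; the function name and input format are illustrative assumptions, not taken from the patent.

```python
import math

def axis_length(axis_pixels):
    """Length of an ordered, 8-connected medial axis given as (row, col) pixels."""
    length = 0.0
    for (r0, c0), (r1, c1) in zip(axis_pixels, axis_pixels[1:]):
        dr, dc = abs(r1 - r0), abs(c1 - c0)
        if dr + dc == 1:
            # Horizontal or vertical neighbour: one pixel step.
            length += 1.0
        elif dr == 1 and dc == 1:
            # Diagonal neighbour: step of sqrt(2).
            length += math.sqrt(2)
        else:
            raise ValueError("axis pixels must be ordered and 8-connected")
    return length
```

For example, an axis with one vertical step, one diagonal step and one horizontal step has length 2 + √2.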
Extraction of chromosome area: the area of the chromosome can be computed from the binarized chromosome image. For a binary image in which the background pixels are black (0) and the chromosome pixels are white (1), the area of the chromosome is the number of white pixels.
Extraction of chromosome banding features: the banding features are extracted from the grayscale chromosome image. After the medial axis has been extracted, the gray-level information of the pixels where the line perpendicular to the axis at each medial-axis point intersects the chromosome is computed, with the medial-axis point as the independent variable. This work uses a global description method, so the resulting banding features represent the overall shape of the banding curve. According to the related literature, banding features obtained with the WDD transform give better classification results, so the WDD transform is used to compute the chromosome banding features. The WDD transform takes the inner product of the chromosome banding curve with each of a series of WDD functions; the resulting values are called WDD coefficients and are used as the banding feature values of the chromosome.
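As a rough illustration of the inner-product idea, the sketch below builds a banding profile and projects it onto a family of weighting functions. Two simplifications are assumed and should be read as such: the chromosome is taken to be already vertical, so that each image row stands in for the line perpendicular to the medial axis, and the exact WDD functions are not given in this text, so triangle waves are used as hypothetical stand-ins. All names are illustrative.

```python
def band_profile(gray, mask):
    """Mean gray level of the chromosome pixels in each row of an upright chromosome."""
    profile = []
    for g_row, m_row in zip(gray, mask):
        values = [g for g, m in zip(g_row, m_row) if m]
        if values:
            profile.append(sum(values) / len(values))
    return profile

def triangle_wave(k, n):
    """Hypothetical weighting function: k half-periods of a triangle wave on n samples."""
    values = []
    for i in range(n):
        phase = (i + 0.5) / n * k  # position measured in half-periods
        values.append(1.0 - 2.0 * abs(phase % 2.0 - 1.0))
    return values

def wdd_like_coefficients(profile, n_functions=4):
    """Inner product of the profile with each weighting function, scaled by length."""
    n = len(profile)
    return [sum(p * w for p, w in zip(profile, triangle_wave(k, n))) / n
            for k in range(1, n_functions + 1)]
```

Dividing by the profile length keeps the coefficients comparable between chromosomes of different lengths; whether the original method scales them this way is not stated.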
Normalization of features:
1. Normalization of length and area: compute the maximum value m and the minimum value n of the lengths of all chromosomes in one image; a chromosome of length x then has the normalized length (m-x)/(m-n), so that the normalized length lies in the range [0,1]. The area feature of the chromosome is normalized in the same way.
2. Normalization of banding features: because the WDD functions are uniform, only the projection curve representing the bands needs to be normalized before the WDD banding features are computed. Because the bands express the texture information of the chromosome, rescaling the values of the curve does not affect its trend; the maximum and minimum of the projection curve of each chromosome are therefore found and the projection values are normalized in the same way as the length.
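A minimal sketch of the normalization formula as written, (m-x)/(m-n). Taken literally, it maps the largest value to 0 and the smallest to 1. The function name and the handling of the degenerate case m = n are assumptions added for the example.

```python
def normalize_feature(values):
    """Normalize values to [0, 1] using (m - x) / (m - n) with m = max and n = min."""
    m, n = max(values), min(values)
    if m == n:
        # Degenerate case (all values equal); returning zeros is an assumption.
        return [0.0 for _ in values]
    return [(m - x) / (m - n) for x in values]
```

The same function can be applied to the lengths and areas of all chromosomes in one image, or to the projection curve of a single chromosome.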
Further processing of the features: for each person's chromosome set, i.e. 46 chromosomes, the feature values are summed and averaged, and the average is then subtracted from the feature value of each chromosome. The purpose is to reduce spurious differences between the chromosome sets of different people, such as differences in brightness when the images are captured under the microscope.
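The mean subtraction can be sketched as follows, assuming the features of one person's chromosomes are stored as a list of equal-length rows (one row per chromosome); the function name and layout are illustrative.

```python
def center_features(rows):
    """Subtract the per-feature mean over one chromosome set from every chromosome."""
    n = len(rows)
    means = [sum(column) / n for column in zip(*rows)]
    return [[value - mean for value, mean in zip(row, means)] for row in rows]
```

After this step every feature has zero mean within the chromosome set, so a uniform offset such as overall image brightness no longer contributes.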
Finally, the normalized length, area and projection features are combined to obtain 852-dimensional chromosome feature data for chromosome classification.
Identifying and pairing
This product performs identification and pairing using ensemble learning together with a correction algorithm based on prior knowledge. The test sample is first predicted by the ensemble learner, and the correction algorithm is then applied to the predicted probabilities to obtain the final predicted label.
The component learners used for ensemble learning are kNN, SVM and ELM. They were chosen because, after testing a variety of learners, these three gave the highest classification accuracy. They are adapted to the chromosome classification task as follows:
1. Adaptation of kNN: the classification criterion is changed from the Euclidean distance between the sample to be tested and its nearest training sample to the average distance between the sample to be tested and its nearest training samples of the same class. In addition, K is set to 3;
2. Adaptation of the SVM: C is set to 1, g is set to 0.07, and the output is changed to a probability output;
3. Adaptation of the ELM: the number of neural units is set to 1500, and the output is changed to a probability output.
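One possible reading of the average-distance kNN is sketched below: for every class, average the Euclidean distances from the query to its K nearest training samples of that class, then predict the class with the smallest average. This interpretation, the function name and the returned score dictionary are assumptions; the probability output mentioned for the other learners is not modelled here.

```python
import math
from collections import defaultdict

def mean_distance_knn(train_x, train_y, query, k=3):
    """Predict the class whose k nearest samples are closest to the query on average."""
    distances_by_class = defaultdict(list)
    for x, y in zip(train_x, train_y):
        distances_by_class[y].append(math.dist(x, query))
    scores = {}
    for label, distances in distances_by_class.items():
        nearest = sorted(distances)[:k]
        scores[label] = sum(nearest) / len(nearest)
    best = min(scores, key=scores.get)
    return best, scores
```

Compared with a plain nearest-neighbour rule, averaging over several same-class neighbours makes the decision less sensitive to a single outlying training sample.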
FIG. 11 shows the structure of the ensemble algorithm; the algorithm itself is as follows:
Step 1: draw k samples at random, with replacement, from the k training samples; do this 5 times to form 5 new training sets tr_1, tr_2, tr_3, tr_4, tr_5;
Step 2: train the average-distance kNN, the SVM and the ELM with tr_1 as the training set and test the sample to be tested. Then take a vote over the three predicted labels and record the result as L_1; average the three probability outputs and record the result as P_1;
Step 3: repeat Step 2 with the other training sets to obtain L_1, L_2, L_3, L_4, L_5, take a vote over these 5 predicted labels and record the result as Label; average P_1, P_2, P_3, P_4, P_5 and record the result as P;
Step 4: using Label and P, apply the correction algorithm to obtain the final predicted label, and test its accuracy.
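Steps 1 to 3 can be sketched with pluggable learners. In this illustration each learner is a hypothetical `fit` function that takes a training set and returns a function mapping a query to a dictionary of class probabilities. The `toy_learner` below is only a stand-in that makes the example runnable; it is not kNN, SVM or ELM, and the one-dimensional features are an assumption for brevity.

```python
import random
from collections import Counter

def majority(labels):
    """Most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]

def ensemble_predict(train, query, learners, n_rounds=5, seed=0):
    """train: list of (feature, label) pairs. Returns (Label, P) as in Steps 1 to 3."""
    rng = random.Random(seed)
    round_labels, round_probs = [], []
    for _ in range(n_rounds):
        # Step 1: bootstrap sample of the same size as the training set.
        boot = [rng.choice(train) for _ in train]
        # Step 2: train every learner, vote on labels, average probabilities.
        probs = [fit(boot)(query) for fit in learners]
        round_labels.append(majority([max(p, key=p.get) for p in probs]))
        classes = set().union(*probs)
        round_probs.append({c: sum(p.get(c, 0.0) for p in probs) / len(probs)
                            for c in classes})
    # Step 3: vote over the rounds and average the per-round probabilities.
    label = majority(round_labels)
    classes = set().union(*round_probs)
    mean_prob = {c: sum(p.get(c, 0.0) for p in round_probs) / n_rounds
                 for c in classes}
    return label, mean_prob

def toy_learner(power):
    """Stand-in learner: inverse-distance scores to class means (1-D features)."""
    def fit(train):
        sums = {}
        for x, y in train:
            total, count = sums.get(y, (0.0, 0))
            sums[y] = (total + x, count + 1)
        means = {y: total / count for y, (total, count) in sums.items()}
        def predict_proba(query):
            weights = {y: 1.0 / (1e-9 + abs(query - mu)) ** power
                       for y, mu in means.items()}
            norm = sum(weights.values())
            return {y: w / norm for y, w in weights.items()}
        return predict_proba
    return fit
```

The pair returned by `ensemble_predict` corresponds to Label and P, which Step 4 would pass to the correction algorithm.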
Description of an integrated algorithm: the improved kNN, SVM, ELM have similar and higher accuracy in the test of chromosome classification. Because of the close precision, no weighted votes were selected, but direct votes were selected. The integration algorithm utilizes the clustering idea, and through the disturbance on the training samples and the selection of three different learners, the classification labels have differences, and the requirements of integration on 'good and different' are met. And finally, the classification precision is further improved through a correction algorithm.
We performed ten-fold cross-validation on 550 metaphase images, with a training-to-test data ratio of 9:1, to verify the accuracy improvement from the ensemble. Fig. 12 compares the average accuracy over the ten tests: the ensemble learner reaches 93.89%, the highest among the compared algorithms; the SVM and kNN+ have similar accuracy, and the ELM has the lowest.
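A ten-fold split of 550 images with a 9:1 ratio gives 495 training images and 55 test images per fold. The sketch below shows one way to produce such folds; the shuffling, the seed and the function name `ten_fold_splits` are assumptions, since the text does not say how the images were partitioned.

```python
import random

def ten_fold_splits(n_items, n_folds=10, seed=0):
    """Yield (train_indices, test_indices) pairs so each item is tested exactly once."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    for f in range(n_folds):
        test = indices[f::n_folds]          # every n_folds-th item, starting at offset f
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test

splits = list(ten_fold_splits(550))
```

Averaging the per-fold accuracy over the ten `(train, test)` pairs gives the kind of mean accuracy reported for Fig. 12.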
Correction algorithm: a normal human chromosome set is known to consist of 22 pairs of autosomes plus 1 pair of sex chromosomes (XX or XY). The correction algorithm proposed in this work obtains, from the classification probabilities, the most probable label assignment that satisfies this distribution.
Define the variables:
$P_{ij}$: the probability that the i-th of the 46 chromosomes has the j-th label ($1 \le i \le 46$, $1 \le j \le 24$);
$X_{ij}$: the indicator variable assigning sample i to label j, taking the value 0 or 1.
Finding the maximum probability means solving:
Figure GDA0003762534580000141
If the subject is female, the following constraints must be satisfied:
$X_{ij} \in \{0,1\}$
Figure GDA0003762534580000151
Figure GDA0003762534580000152
$\sum_{i} X_{ij} = 0$ for $j = 24$.
If the subject is male, the following constraints must be satisfied:
$X_{ij} \in \{0,1\}$
Figure GDA0003762534580000153
Figure GDA0003762534580000154
$\sum_{i} X_{ij} \le 1$ for $j = 23$ and $j = 24$.
The steps of the correction algorithm are as follows:
Input: the 24 classification probabilities of each of the 46 chromosomes
Output: predicted labels
Step 1: calculate the maximum probability $P_1$ under the male constraints
Step 2: calculate the maximum probability $P_2$ under the female constraints
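Step 3: if $P_1 \ge P_2$, take the labels corresponding to $P_1$ as the output labels; otherwise, take the labels corresponding to $P_2$ as the output labels.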
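The correction steps can be illustrated on a toy problem. The sketch below assumes that the objective is the sum of $P_{ij} X_{ij}$ and that each autosome label must be used exactly twice; both are assumptions, because the objective and the autosome constraints appear only as figures in the source. It uses a reduced set of 2 autosome labels plus X and Y (6 chromosomes, 0-based labels) and brute-force search. A full 46 × 24 problem would need an assignment or integer-programming solver instead. The names `best_assignment` and `correct_labels` are hypothetical.

```python
from itertools import permutations

def best_assignment(P, slots):
    """Brute-force the slot assignment with the largest summed probability."""
    best_score, best_labels = float("-inf"), None
    for labels in set(permutations(slots)):   # distinct ways to hand out the label slots
        score = sum(P[i][j] for i, j in enumerate(labels))
        if score > best_score:
            best_score, best_labels = score, labels
    return best_score, best_labels

def correct_labels(P, n_autosome_labels):
    """Compare the best male and female assignments and keep the more probable one."""
    x, y = n_autosome_labels, n_autosome_labels + 1    # indices of the X and Y labels
    pairs = [j for j in range(n_autosome_labels) for _ in range(2)]  # two slots per autosome
    p1, male = best_assignment(P, pairs + [x, y])      # Step 1: male constraints
    p2, female = best_assignment(P, pairs + [x, x])    # Step 2: female constraints
    return ("male", male) if p1 >= p2 else ("female", female)   # Step 3

# Toy probabilities (hypothetical): rows are chromosomes, columns are labels 0, 1, X, Y.
P = [
    [0.90, 0.05, 0.03, 0.02],
    [0.40, 0.50, 0.05, 0.05],   # raw argmax says label 1, which would give three 1s
    [0.10, 0.80, 0.05, 0.05],
    [0.10, 0.70, 0.10, 0.10],
    [0.05, 0.05, 0.80, 0.10],
    [0.05, 0.05, 0.20, 0.70],
]
sex, labels = correct_labels(P, 2)
raw = [max(range(4), key=row.__getitem__) for row in P]
```

Here the raw per-chromosome argmax (`raw`) assigns three chromosomes to label 1, which violates the pairing constraint; the corrected assignment moves the least confident of them to label 0.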
Likewise, the ten-fold cross-validation accuracy is shown in Fig. 13. The 'ensemble + correction' algorithm has the highest accuracy across the ten folds, and its average accuracy is higher than that of the other algorithms, which verifies the effectiveness of the ensemble and correction algorithms in chromosome karyotype analysis.
The above description only illustrates the preferred embodiments of the present invention and is not to be construed as limiting the invention; any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention are intended to be included in its scope.

Claims (5)

1. A karyotyping system, comprising:
(1) a filtering algorithm and a segmentation algorithm designed to filter and remove impurities from the human metaphase cell image and to extract the individual chromosomes;
(2) a recognition algorithm and a correction algorithm designed to recognize and pair the extracted chromosomes so as to generate a karyotype map;
the recognition and pairing are specifically carried out using ensemble learning and a correction algorithm based on prior knowledge: the test sample is first predicted by the ensemble learner, the correction algorithm is then applied to the predicted probabilities, and the predicted labels are finally obtained;
the component learners used by the ensemble learning comprise kNN, SVM and ELM, with the following adaptive designs made for the chromosome classification task:
adaptive design for kNN: the classification basis is changed from the Euclidean distance to the average distance, that is, from the Euclidean distance between the sample under test and its nearest training sample to the average distance between the sample under test and its nearest training samples of the same class, and k is set to 3;
adaptive design for the SVM: c is set to 1, g is set to 0.07, and the output is changed to a probability output;
adaptive design for the ELM: the number of neural units is set to 1500, and the output is changed to a probability output;
the predicted probabilities are obtained by an ensemble algorithm, which specifically comprises the following steps:
Step 1: randomly draw k samples with replacement from the k training samples, and repeat this 5 times to form 5 new training sets $tr_1, tr_2, tr_3, tr_4, tr_5$;
Step 2: using the average-distance kNN, the SVM and the ELM, train on $tr_1$ as the training set and test the sample under test; put the predicted labels of the three learners to a vote and record the result as $L_1$; average the probability outputs of the three learners and record the result as $P_1$;
Step 3: repeat Step 2 for each of the other training sets to obtain $L_1, L_2, L_3, L_4, L_5$; put these 5 predicted labels to a vote and record the result as Label; average $P_1, P_2, P_3, P_4, P_5$ and record the result as P;
Step 4: using Label and P, apply the correction algorithm to obtain the final predicted labels, and test their accuracy;
the correction algorithm is specifically as follows: a normal human chromosome set is known to consist of 22 pairs of autosomes plus 1 pair of sex chromosomes, XX or XY; subject to this distribution, the most probable assignment is obtained from the classification probabilities;
define the variables:
$P_{ij}$: the probability that the i-th of the 46 chromosomes has the j-th label, where $1 \le i \le 46$ and $1 \le j \le 24$;
$X_{ij}$: the indicator variable assigning sample i to label j, taking the value 0 or 1;
finding the maximum probability means solving:
Figure FDA0003740473010000021
if the subject is female, the following constraints must be satisfied:
$X_{ij} \in \{0,1\}$
Figure FDA0003740473010000022
Figure FDA0003740473010000023
$\sum_{i} X_{ij} = 0$ for $j = 24$;
if the subject is male, the following constraints must be satisfied:
$X_{ij} \in \{0,1\}$
Figure FDA0003740473010000024
Figure FDA0003740473010000025
$\sum_{i} X_{ij} \le 1$ for $j = 23$ and $j = 24$;
the steps of the correction algorithm are as follows:
input: the 24 classification probabilities of each of the 46 chromosomes;
output: predicted labels;
Step 1: calculate the maximum probability $P_1$ under the male constraints;
Step 2: calculate the maximum probability $P_2$ under the female constraints;
Step 3: if $P_1 \ge P_2$, take the labels corresponding to $P_1$ as the output labels;
otherwise, take the labels corresponding to $P_2$ as the output labels.
2. The karyotyping system according to claim 1, wherein the flow of the filtering algorithm is as follows:
(1) binarize the metaphase cell image I to generate a binary image B;
(2) detect the contours of all objects in the binary image B and record them in order as $C_1$ to $C_q$;
(3) initialize p = 1 and an empty set Contours;
(4) calculate the area $A_p$ of contour $C_p$;
(5) remove impurities according to the following process, where $\alpha$, $\beta$ and $\eta$ are threshold parameters:
if $\alpha < A_p < \beta$: add $C_p$ to the set Contours;
if $A_p$ […] $\beta$: calculate the area $B_p$ of the polygon circumscribing contour $C_p$;
if $(A_p / B_p)$ […] $\eta$: add $C_p$ to the set Contours;
(6) judge whether $p \ge q$; if yes, execute the next step; if no, set p = p + 1 and go to step (4);
(7) process each pixel in the binary image B:
if a pixel (x, y) lies within any contour in the set Contours, the pixel is assigned 255; otherwise the pixel is assigned 0;
(8) process the metaphase image I as follows to generate the filtered image G:
Figure FDA0003740473010000041
a manually assisted impurity-removal function is also added to the karyotyping system, so that impurities can be removed with manual assistance when the filtering algorithm cannot remove them completely.
3. The karyotyping system according to claim 1, wherein the flow of the segmentation algorithm is as follows:
(1) initialize an empty set Contours;
(2) detect the contours of all objects in the filtered image and add them to the set Contours;
(3) initialize d = 1;
(4) calculate the minimum circumscribed rectangle of the d-th contour in the set Contours to obtain the coordinates of its four vertices in the filtered image;
(5) crop and rotate the filtered image according to the coordinates obtained in step (4) to obtain a vertically oriented chromosome V;
(6) judge whether d is greater than or equal to the number of elements e in the set Contours; if yes, finish; otherwise set d = d + 1 and return to step (4).
4. The karyotyping system according to claim 3, wherein chromosomes are separated automatically by the segmentation algorithm; other types of overlapping chromosomes are handled by human-computer interaction, that is, each chromosome is manually traced in a different color with the mouse, and the algorithm then extracts the chromosomes according to their colors.
5. The karyotyping system according to claim 1, wherein the extraction of chromosome features comprises extraction of the medial axis, the area and the band features;
extraction of the medial axis: the method mainly comprises preprocessing, layer-by-layer boundary deletion and post-processing, in which background pixels, pattern pixels, contour pixels and skeleton pixels are each given a specific value and graphical representation; the main task of the preprocessing stage is to determine the contour pixels of the image and to eliminate the influence of edge noise, with the Sobel edge detection operator used to detect the image contour; the constraint-based pixel deletion stage deletes, according to the judgment conditions, the contour pixels marked in the preprocessing stage, or marks them as skeleton pixels; the post-processing stage operates on the skeleton line obtained after multiple iterations of the preprocessing and deletion stages, and solves the problem that parts of the skeleton line are two pixels wide: one of the two pixels is deleted according to the corresponding judgment condition to obtain a skeleton line of single-pixel width; the preprocessing and post-processing of the algorithm use a serial method;
extraction of chromosome length: first, take one end-point pixel of the medial axis of the chromosome as $Q_0$ and set the chromosome length L = 0; starting from $Q_0$, traverse along the medial axis and find the second pixel on the axis, $Q_1$; if $Q_1$ is horizontally or vertically adjacent to $Q_0$, then L = L + 1; if $Q_1$ is diagonally adjacent to $Q_0$, then
$$L = L + \sqrt{2}$$
the value of L is updated each time a pixel is traversed, and the loop continues until the other end point of the medial axis is reached; the value of L at that point is the length of the chromosome;
extraction of chromosome area: the area of the chromosome is calculated from the binarized chromosome image; for a binary image whose background is black (pixel value 0) and whose chromosome pixels are white (pixel value 1), the area of the chromosome is the number of white pixels;
extraction of chromosome band features: after the medial axis has been extracted from the chromosome grayscale image, each medial-axis point is taken as the independent variable, and the gray-level information of the pixels shared by the chromosome and the line perpendicular to the axis at that point is computed; the chromosome band features are then calculated using the WDD transform;
normalization of features:
normalization of length and area: the maximum value m and the minimum value n of the lengths of all chromosomes in one image are calculated; a chromosome of length x has the normalized length (m-x)/(m-n), so the normalized length lies in the range [0,1]; the area feature of the chromosome is normalized by the same method;
normalization of band features: because the WDD function is uniform, only the projection curve representing the bands needs to be normalized before the WDD features of the bands are obtained; because the bands express the texture information of the chromosome, changing the values of the curve does not affect its trend, so the maximum and minimum values of the projection curve of each chromosome are obtained and the projection values are normalized by the same method as the length;
further processing of the features: the feature values of each chromosome set, namely 46 chromosomes, are summed and averaged, and the average is then subtracted from the feature value of each chromosome; the purpose of this step is to reduce extraneous differences between the chromosome sets of different people, such as brightness differences during microscope imaging;
finally, the normalized length, area and projection features are combined into 852-dimensional chromosome feature data for chromosome classification.
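The length measurement and the two normalization steps of this claim can be sketched as follows. The sketch assumes that the medial axis is already available as an ordered list of (row, column) pixels and that a diagonal step adds $\sqrt{2}$ to the length (the source gives that increment only as a figure). The function names and sample values are hypothetical.

```python
import math

def axis_length(axis_pixels):
    """Sum step lengths along an ordered medial axis: 1 for straight steps, sqrt(2) for diagonal ones."""
    length = 0.0
    for (r0, c0), (r1, c1) in zip(axis_pixels, axis_pixels[1:]):
        diagonal = r0 != r1 and c0 != c1
        length += math.sqrt(2) if diagonal else 1.0
    return length

def normalize(values):
    """Map each value x to (m - x) / (m - n), with m the maximum and n the minimum.

    Assumes the values are not all equal, otherwise m - n is zero.
    """
    m, n = max(values), min(values)
    return [(m - x) / (m - n) for x in values]

def center(values):
    """Subtract the mean over the chromosome set from every value."""
    mean = sum(values) / len(values)
    return [x - mean for x in values]

axis = [(0, 0), (1, 0), (2, 1), (3, 1)]     # two straight steps and one diagonal step
lengths = [axis_length(axis), 2.0, 6.0]     # one measured length plus two made-up ones
normalized = normalize(lengths)
centered = center(normalized)
```

With the formula as stated, the longest chromosome maps to 0 and the shortest to 1; the centered values sum to zero over the set.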
CN202011352831.7A 2020-11-26 2020-11-26 Chromosome karyotype analysis system Active CN112508889B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011352831.7A CN112508889B (en) 2020-11-26 2020-11-26 Chromosome karyotype analysis system


Publications (2)

Publication Number Publication Date
CN112508889A CN112508889A (en) 2021-03-16
CN112508889B (en) 2022-09-13

Family

ID=74966566

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011352831.7A Active CN112508889B (en) 2020-11-26 2020-11-26 Chromosome karyotype analysis system

Country Status (1)

Country Link
CN (1) CN112508889B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113781505B (en) * 2021-11-08 2022-11-18 深圳市瑞图生物技术有限公司 Chromosome segmentation method, chromosome analyzer, and storage medium
CN114170218B (en) * 2021-12-16 2022-12-06 易构智能科技(广州)有限公司 Chromosome image instance label generation method and system
CN115049686B (en) * 2022-08-15 2022-11-29 湖南自兴智慧医疗科技有限公司 Complex chromosome region segmentation method and device based on auxiliary information

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103632168A (en) * 2013-12-09 2014-03-12 天津工业大学 Classifier integration method for machine learning
CN109242842A (en) * 2018-08-31 2019-01-18 郑州金域临床检验中心有限公司 Human chromosomal analytical equipment, equipment and storage medium based on image recognition
CN111986183A (en) * 2020-08-25 2020-11-24 中国科学院长春光学精密机械与物理研究所 Chromosome scattergram image automatic segmentation and identification system and device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018078447A1 (en) * 2016-10-27 2018-05-03 Scopio Labs Ltd. Digital microscope which operates as a server
CN109150104A (en) * 2018-08-10 2019-01-04 江南大学 A kind of diagnosing failure of photovoltaic array method based on random forests algorithm


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
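Chromosome Medical Axis Extraction Method Based on Graphic Geometry and Competitive Extreme Learning Machines Teams (GELMT) Classifier for Chromosome Classification; Jie Wang et al.; Bio-inspired Computing: Theories and Applications; 20200402; Vol. 1160; pp. 550-564 *
On fully automatic feature measurement for banded chromosome classification; Jim Piper et al.; Journal of Quantitative Cell Science; 19890531; Vol. 10 (No. 3); pp. 242-255 *
Research on classification of targets sensed by wireless sensor networks based on decision-level fusion; Zhang Yang; 《信息科技辑》 (Information Science and Technology series); 20191115 (No. 11); pp. 9-97 *
Application research on metaphase chromosome classification based on deep convolutional neural networks; Zhang Chengcheng et al.; 《中国临床新医学》 (Chinese Journal of New Clinical Medicine); 20200229; Vol. 13 (No. 2); pp. 123-126 *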



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant