CN112508889B

CN112508889B - Chromosome karyotype analysis system

Info

Publication number: CN112508889B
Application number: CN202011352831.7A
Authority: CN
Inventors: 梁静; 岳彩通; 于坤杰; 瞿博阳; 杨昊天; 胡毅; 李鹏帅; 李功平
Original assignee: Zhengzhou University
Current assignee: Zhengzhou University
Priority date: 2020-11-26
Filing date: 2020-11-26
Publication date: 2022-09-13
Anticipated expiration: 2040-11-26
Also published as: CN112508889A

Abstract

The invention relates to a chromosome karyotype analysis system in which (1) a filtering algorithm and a segmentation algorithm are designed to filter impurities out of human metaphase cell images and extract the chromatids; and (2) a recognition algorithm and a correction algorithm are designed to recognize and pair the extracted chromosomes, thereby generating a karyotype map. The invention combines karyotype analysis with image processing, machine learning and related technologies to provide a reliable automatic chromosome karyotype analysis system, which automates karyotype analysis and improves the overall efficiency and accuracy of chromosome classification.

Description

Chromosome karyotype analysis system

Technical Field

The invention belongs to the technical field of artificial intelligence, and particularly relates to a karyotype analysis system which is applied to research on human genetic disease mechanisms, species genetic relationship and evolution, tumor pathology and the like.

Background

Chromosomes in human somatic cells become clearly visible at metaphase; a normal cell has 46 chromosomes (22 pairs of autosomes and one pair of sex chromosomes). Chromosomes are the carriers of genetic material, and abnormalities in their number or structure can lead to genetic disorders. Karyotype analysis is therefore important for studying the mechanisms of human genetic disease, the genetic relationships and evolution of species, tumor pathology and related topics. As shown in FIG. 1, karyotyping refers to grouping, aligning and pairing the chromosomes in an image of a human metaphase cell and generating a karyotype map.

Early karyotyping was performed entirely by hand: operators had to isolate the chromosomes from the metaphase cell image manually, and then pair and order them according to morphology and banding pattern to generate a karyotype map. This is a tedious and complicated task. Because it demands a high level of professional skill and the training period is long, researchers and clinical staff in this area are in very short supply. In addition, pairing and ordering chromosomes by eye alone is error-prone and inefficient.

In recent years, with the rapid spread of automation and intelligent technology in many fields, automated karyotype analysis has become increasingly desirable in medicine. Some chromosome karyotyping systems are already in commercial use, such as the CytoVision system developed by Leica (Germany) and the Ikaros system developed by Carl Zeiss. These products have moved karyotyping from purely manual operation to semi-automatic processing. The processing is called semi-automatic because these systems still require a significant amount of manual assistance in use. Existing classification methods are time-consuming, inefficient and insufficiently accurate, and cannot meet the requirements of clinical work.

Disclosure of Invention

The invention aims to provide a chromosome karyotype analysis system that can rapidly extract chromatids from human metaphase cell images, pair them, and generate a karyotype map.

In order to solve the technical problems, the technical scheme adopted by the invention is as follows:

A chromosome karyotype analysis system, comprising:

(1) designing a filtering algorithm and a segmentation algorithm to filter impurities out of the human metaphase cell image and extract the chromatids;

(2) designing a recognition algorithm and a correction algorithm to recognize and pair the extracted chromosomes, thereby generating a karyotype map.

Preferably: the flow of the filtering algorithm is as follows:

(1) binarizing the metaphase cell image (I) to generate a filter map (B);

(2) detecting the contours of all objects in image (B) and denoting them in sequence as C₁ to Cₙ;

(3) initializing i = 1 and an empty set Contours;

(4) calculating the area Aᵢ of contour Cᵢ;

(5) removing impurities according to the following rule (α, β, η are threshold parameters):

(6) judging whether i is greater than or equal to n; if yes, executing the next step; if no, setting i = i + 1 and returning to step (4);

(7) processing each pixel in the binary image (B):

if the pixel (x, y) lies within any contour in the set Contours, it is assigned the value 255; otherwise it is assigned the value 0;

(8) processing the metaphase cell image (I) as follows to generate a filtered image (G):

In addition, a manual impurity-removal function is added to the chromosome karyotype analysis system, so that when the filtering algorithm cannot remove the impurities completely, the remaining impurities are removed with manual assistance.

Preferably: the flow of the segmentation algorithm is as follows:

(1) initializing an empty set Contours;

(2) detecting the contours of all objects in the filter map and adding them to the set Contours;

(3) initializing i = 1;

(4) calculating the minimum bounding rectangle of the i-th contour in the set Contours to obtain the coordinates of its four vertices in the filter map;

(5) cropping and rotating the filter map according to the coordinates obtained in step (4) to obtain the vertically oriented chromosome i;

(6) judging whether i is greater than or equal to the number n of elements in the set Contours; if yes, finishing; otherwise, setting i = i + 1 and returning to step (4).

Preferably: in the segmentation algorithm, chromosomes that overlap in a cross shape are separated automatically by the algorithm; other types of overlapping chromosomes are handled by human-computer interaction, that is, each chromosome is manually traced with the mouse in a different color, and the algorithm then extracts the chromosomes according to color.

Preferably: the extraction of chromosome features comprises extraction of the central axis, the area and the banding features;

extraction of the central axis: the procedure consists of preprocessing, layer-by-layer boundary deletion and post-processing, in which background pixels, pattern pixels, contour pixels and skeleton pixels are each given a specific value and graphical representation. The main task of the preprocessing stage is to determine the contour pixels of the image and to eliminate the influence of edge noise; the Sobel edge detection operator is used to detect the image contour. In the deletion stage, the contour pixels marked during preprocessing are deleted according to the judgment conditions, and those that must be kept are marked as skeleton pixels. The post-processing stage operates on the skeleton line obtained after multiple iterations of the preprocessing and deletion stages; it handles the parts of the skeleton line that are two pixels wide, deleting one pixel according to the corresponding judgment conditions to obtain a skeleton line of single-pixel width. The preprocessing and post-processing of the algorithm use a serial method;

extraction of chromosome length: first, one end-point pixel of the central axis of the chromosome is taken as Q₀ and the chromosome length is initialized to L = 0. Starting from Q₀, the central axis is traversed and the second pixel on the axis is taken as Q₁. If Q₁ is horizontally or vertically adjacent to Q₀, then L = L + 1; if Q₁ lies in a diagonal direction from Q₀, then L = L + √2.

The value of L is updated each time a pixel is traversed, and the loop continues until the other end point of the central axis is reached; the value of L at that moment is the length of the chromosome;

extraction of chromosome area: the area of the chromosome is calculated from the binarized chromosome image; for a binary image in which the background is black (pixel value 0) and the chromosome is white (pixel value 1), the area of the chromosome is the number of white pixels;

extraction of chromosome banding features: after the central axis has been extracted from the chromosome grayscale image, the central-axis points are taken as the independent variable, and for each point the gray values of the pixels shared by the chromosome and the line perpendicular to the axis at that point are obtained; the chromosome banding features are then calculated using the WDD transformation;

normalization of features:

normalization of length and area: the maximum value m and the minimum value n of the lengths of all chromosomes in one image are calculated; a chromosome of length x then has the normalized length (m-x)/(m-n), which lies in the range [0,1]; the area feature of the chromosome is normalized in the same way;

normalization of banding features: because the WDD functions are already normalized, only the projection curve representing the banding needs to be normalized before the WDD banding features are computed; and because the banding expresses the texture information of the chromosome, rescaling the values of the curve does not affect its trend. The maximum and minimum values of the projection curve of each chromosome are therefore obtained, and the projection values are normalized in the same way as the length;

further processing of the features: for each individual's chromosome set, that is, 46 chromosomes, the feature values are summed and averaged, and the average is then subtracted from the feature value of each chromosome; the purpose of this step is to reduce spurious differences between the chromosome sets of different individuals, such as differences in brightness during microscope imaging;

and finally, combining the normalized length, area and projection characteristics to obtain 852-dimensional characteristic data of the chromosomes for chromosome classification.

Preferably: the recognition and pairing are performed using ensemble learning together with a correction algorithm based on prior knowledge; a test sample is first predicted by the ensemble learner, and the correction algorithm is then applied to the predicted probabilities to obtain the final predicted label;

the component learners used for ensemble learning are kNN, SVM and ELM, adapted to the chromosome classification task as follows:

adaptation of kNN: the classification criterion is changed from the Euclidean distance between the sample under test and its nearest training sample to the average distance between the sample under test and its nearest training samples of the same class; in addition, k is set to 3;

adaptation of the SVM: c is set to 1, g is set to 0.07, and the output is changed to a probability output;

adaptation of the ELM: the number of neural units is set to 1500, and the output is changed to a probability output;

the prediction probability adopts an integration algorithm, and the method specifically comprises the following steps:

Step 1: draw k samples at random with replacement from the k training samples, and repeat this 5 times to form 5 new training sets tr₁, tr₂, tr₃, tr₄, tr₅;

Step 2: train the average-distance kNN, the SVM and the ELM with tr₁ as the training set and test the sample under test; vote on the three predicted labels and record the result as L₁; average the three probability outputs and record the result as P₁;

Step 3: repeat Step 2 for the other training sets to obtain L₁, L₂, L₃, L₄, L₅; vote on these 5 predicted labels and record the result as Label; average P₁, P₂, P₃, P₄, P₅ and record the result as P;

Step 4: apply the correction algorithm to Label and P to obtain the final predicted labels, and test their accuracy.

Preferably: the correction algorithm is as follows. The chromosome set of a normal person is known to consist of 22 pairs of autosomes plus 1 pair of sex chromosomes (XX or XY); the correction algorithm proposed here finds, subject to this distribution, the most probable assignment according to the classification probabilities;

defining variables:

$P_{ij}$: the probability that the i-th of the 46 chromosomes belongs to the j-th label ($1 \le i \le 46$, $1 \le j \le 24$)

$X_{ij}$: the indicator variable assigning chromosome i to label j, taking the value 0 or 1

The maximum probability is found by solving:

if the subject is female, the following constraints must be satisfied:

$X_{ij} \in \{0, 1\}$

$\sum_{i} X_{ij} = 0$ for $j = 24$

if the subject is male, the following constraints must be satisfied:

$X_{ij} \in \{0, 1\}$

$\sum_{i} X_{ij} \le 1$ for $j = 23, 24$

the steps of the correction algorithm are as follows:

Input: the 24 classification probabilities of each of the 46 chromosomes

Output: the predicted labels

Step 1: calculate the maximum probability P₁ under the male constraints

Step 2: calculate the maximum probability P₂ under the female constraints

Step 3: if P₁ ≥ P₂, output the labels corresponding to P₁; otherwise, output the labels corresponding to P₂

The beneficial effects of the invention are as follows:

Karyotype analysis is combined with image processing, machine learning and related technologies to develop a reliable automatic chromosome karyotype analysis system, which automates karyotype analysis and improves the overall efficiency and accuracy of chromosome classification.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the description of the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

FIG. 1 is a schematic diagram of karyotyping;

FIG. 2 is a technical roadmap of the chromosome karyotype analysis;

FIG. 3 is a karyotype chart;

FIG. 4 is a graph of the effect of the filtering algorithm;

FIG. 5 is a graph of the effect of the segmentation algorithm;

FIG. 6 is a map of adherent chromosomes;

FIG. 7 is an overlapping chromosome map;

FIG. 8 is a processing diagram of adherent chromosomes;

FIG. 9 is a diagram of overlapping chromosome processing;

FIG. 10 is a medial axis extraction view;

FIG. 11 is a diagram of an integrated algorithm structure;

FIG. 12 is a bar chart comparing accuracy;

FIG. 13 is a chart comparing ten-fold cross-validation accuracy.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

As shown in FIG. 2, a chromosome karyotype analysis system comprises:

(1) designing a filtering algorithm and a segmentation algorithm to filter impurities out of the human metaphase cell image and extract the chromatids;

(2) designing a recognition algorithm and a correction algorithm to recognize and pair the extracted chromosomes, thereby generating a karyotype map.

This example will specifically describe the karyotype analysis system of the present invention, taking the karyotype chart of FIG. 3 as an example.

The implementation consists of three main parts: filtering and impurity removal, segmentation and extraction, and recognition and pairing. The technical details of these three parts are described below.

Filtering to remove impurities

The part of the work is mainly to remove impurities in the human metaphase cell image, and to achieve the purpose, a corresponding filtering algorithm is designed, and the algorithm flow is as follows:

Image I is the "metaphase cell image" at the upper left corner of FIG. 2.

Filter map B is the "filter map" at the upper right corner of FIG. 2.

1. Binarize the metaphase cell image I to generate the filter map B;

2. Detect the contours of all objects in image B and denote them in sequence as C₁ to Cₙ;

3. Initialize i = 1 and an empty set Contours;

4. Calculate the area Aᵢ of contour Cᵢ;

5. Remove impurities according to the following rule (α, β, η are threshold parameters):

6. Judge whether i is greater than or equal to n; if yes, execute the next step; if no, set i = i + 1 and return to step 4;

7. Process each pixel in the binary image B:

8. Process the metaphase cell image I as follows to generate the filter map G:

It should be noted that, to ensure that chromosomes in the metaphase cell image are not filtered out as impurities, the threshold parameters α, β, η in the filtering algorithm are set conservatively. As a result, the filtering algorithm cannot guarantee that all impurities in the metaphase cell image are removed. To address this, a manual impurity-removal function is added to the intelligent chromosome karyotype analysis software: when the filtering algorithm cannot remove the impurities completely, the remaining impurities are removed with manual assistance. The filtering effect is shown in FIG. 4.
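The filtering flow above can be sketched in pure Python. This is a minimal sketch under stated assumptions, not the patented implementation: the actual removal rule involving α, β, η is not reproduced in this text, so the sketch substitutes a hypothetical rule that keeps a region only if its pixel area lies in a window `[min_area, max_area]`. It also uses 4-connected region labelling in place of contour detection, assumes the chromosomes are the foreground (value 1) of the binary image, and assumes that step 8 paints everything outside the mask with a background value. All function and parameter names are illustrative.

```python
from collections import deque

def connected_regions(binary):
    """Return the 4-connected foreground regions, each as a list of (row, col)."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue, region = deque([(r, c)]), []
                while queue:
                    y, x = queue.popleft()
                    region.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        inside = 0 <= ny < rows and 0 <= nx < cols
                        if inside and binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                regions.append(region)
    return regions

def build_mask(binary, min_area, max_area):
    """Steps 2-7: keep regions that pass the assumed area window; 255 inside, 0 elsewhere."""
    mask = [[0] * len(binary[0]) for _ in binary]
    for region in connected_regions(binary):
        # Hypothetical stand-in for the alpha/beta/eta rule of step 5.
        if min_area <= len(region) <= max_area:
            for y, x in region:
                mask[y][x] = 255
    return mask

def apply_mask(image, mask, background=255):
    """Step 8 (assumed form): keep grey values inside the mask, paint the rest as background."""
    return [[px if m == 255 else background for px, m in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, mask)]

binary = [
    [1, 0, 0, 0, 0],   # a one-pixel speck, treated as an impurity
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],   # a six-pixel object, treated as a chromosome
]
grey = [[10] * 5 for _ in range(4)]
mask = build_mask(binary, min_area=3, max_area=10)
filtered = apply_mask(grey, mask)
```

A conservative window (small `min_area`, large `max_area`) mirrors the point made above: it protects chromosomes at the cost of leaving some impurities for manual removal.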

Segmentation extraction

This part of the work extracts each chromatid from the filter map. A corresponding segmentation algorithm is designed for this purpose, and its flow is as follows:

1. Initialize an empty set Contours;

2. Detect the contours of all objects in the filter map G and add them to the set Contours;

3. Initialize i = 1;

4. Calculate the minimum bounding rectangle of the i-th contour in the set Contours to obtain the coordinates of its four vertices in the filter map: (x₁, y₁), (x₂, y₂), (x₃, y₃), (x₄, y₄);

5. Crop and rotate the filter map G according to the coordinates obtained in step 4 to obtain the vertically oriented chromosome i;

6. Judge whether i is greater than or equal to the number n of elements in the set Contours; if yes, finish; otherwise, set i = i + 1 and return to step 4.

The effect of the segmentation algorithm is shown in fig. 5.
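The "rotate to vertical" part of step 5 can be illustrated with a small pure-Python sketch. Note the substitution: instead of the minimum bounding rectangle used in the flow above, this sketch estimates the orientation of one object from the principal axis of its pixel coordinates (second-order moments), which is a different but related technique chosen only to keep the example dependency-free. The function name `upright` and the point-list representation are assumptions for illustration.

```python
import math

def upright(points):
    """Rotate (x, y) pixel coordinates about their centroid so the principal axis is vertical."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sxx = sum((x - cx) ** 2 for x, _ in points) / n
    syy = sum((y - cy) ** 2 for _, y in points) / n
    sxy = sum((x - cx) * (y - cy) for x, y in points) / n
    # Angle between the principal axis and the x-axis.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    # Rotation that brings the principal axis onto the y-axis.
    phi = math.pi / 2 - theta
    cos_p, sin_p = math.cos(phi), math.sin(phi)
    return [((x - cx) * cos_p - (y - cy) * sin_p,
             (x - cx) * sin_p + (y - cy) * cos_p) for x, y in points]

# A thin object lying along the 45-degree diagonal.
diagonal = [(i, i) for i in range(10)]
rotated = upright(diagonal)
width = max(x for x, _ in rotated) - min(x for x, _ in rotated)
height = max(y for _, y in rotated) - min(y for _, y in rotated)
```

After rotation the object has essentially zero horizontal extent and its full length along the vertical axis. A real implementation would also resample the grey-level image, which this sketch omits.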

Since chromosomes are non-rigid objects, chromosomes in metaphase cell images may touch or cross one another. The chromosomes extracted by the segmentation algorithm are therefore not all single chromosomes; some are touching chromosomes and some are overlapping chromosomes, as shown in FIG. 6 and FIG. 7.

Touching chromosomes are handled by human-computer interaction: the touching regions are erased manually with the mouse. The effect is shown in FIG. 8.

For chromosomes that overlap in a cross shape (the most common case), we designed a corresponding algorithm that separates the chromosomes automatically. Other types of overlapping chromosomes (uncommon) are handled by human-computer interaction: each chromosome is manually traced with the mouse in a different color, and the algorithm then extracts the chromosomes according to color. The effect is shown in FIG. 9.

Feature extraction and processing

The extraction of chromosome features comprises extraction of the central axis (which makes the area and length convenient to calculate), the area and the banding features.

Extraction of the chromosome central axis: the procedure consists of preprocessing, layer-by-layer boundary deletion and post-processing, in which background pixels, pattern pixels, contour pixels and skeleton pixels are each given a specific value and graphical representation. The main task of the preprocessing stage is to determine the contour pixels of the image and to eliminate the influence of edge noise; the Sobel edge detection operator is used to detect the image contour. In the deletion stage, the contour pixels marked during preprocessing are deleted according to the judgment conditions, and those that must be kept are marked as skeleton pixels. The post-processing stage operates on the skeleton line obtained after multiple iterations of the preprocessing and deletion stages; it handles the parts of the skeleton line that are two pixels wide, deleting one pixel according to the corresponding judgment conditions to obtain a skeleton line of single-pixel width. The preprocessing and post-processing of the algorithm use a serial method. The result is shown in FIG. 10.

Extraction of chromosome length: first, one end-point pixel of the central axis of the chromosome is taken as Q₀ and the chromosome length is initialized to L = 0. Starting from Q₀, the central axis is traversed and the second pixel on the axis is taken as Q₁. If Q₁ is horizontally or vertically adjacent to Q₀, then L = L + 1; if Q₁ lies in a diagonal direction from Q₀, then L = L + √2.

The value of L is updated each time a pixel is traversed, and the loop continues until the other end point of the central axis is reached; the value of L at that moment is the length of the chromosome.
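The traversal can be sketched as follows. This is a minimal illustration, assuming the central axis is an unbranched, single-pixel-wide, 8-connected skeleton given as a set of `(row, col)` pixels, and assuming that a diagonal step adds √2 (the Euclidean distance between diagonally adjacent pixel centers). The function name `axis_length` is hypothetical.

```python
import math

# Orthogonal offsets first, so straight steps are preferred over diagonal ones.
NEIGHBOURS = [(-1, 0), (1, 0), (0, -1), (0, 1), (-1, -1), (-1, 1), (1, -1), (1, 1)]

def axis_length(skeleton):
    """Length of an unbranched, single-pixel-wide central axis (set of (row, col))."""
    if len(skeleton) < 2:
        return 0.0

    def neighbours(p):
        return [(p[0] + dr, p[1] + dc) for dr, dc in NEIGHBOURS
                if (p[0] + dr, p[1] + dc) in skeleton]

    # An end point Q0 has exactly one neighbour on the axis.
    current = next(p for p in sorted(skeleton) if len(neighbours(p)) == 1)
    visited, length = {current}, 0.0
    while True:
        unvisited = [q for q in neighbours(current) if q not in visited]
        if not unvisited:            # reached the other end point
            return length
        nxt = unvisited[0]
        diagonal = nxt[0] != current[0] and nxt[1] != current[1]
        length += math.sqrt(2) if diagonal else 1.0
        visited.add(nxt)
        current = nxt

# Two vertical steps followed by two diagonal steps.
axis = {(0, 0), (1, 0), (2, 0), (3, 1), (4, 2)}
length = axis_length(axis)
```

Counting pixels alone would give 4 for this axis; weighting diagonal steps gives 2 + 2√2, which is why the step direction is checked at every pixel.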

Extraction of chromosome area: the area of the chromosome is calculated from the binarized chromosome image. For a binary image in which the background is black (pixel value 0) and the chromosome is white (pixel value 1), the area of the chromosome is the number of white pixels.
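As a small illustration of the area rule, assuming the binary image is stored as a list of rows with 0 for background and 1 for chromosome pixels (the function name is hypothetical):

```python
def chromosome_area(binary):
    """Area of a chromosome: the number of white (value 1) pixels in its binary image."""
    return sum(row.count(1) for row in binary)

patch = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
]
area = chromosome_area(patch)
```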

Extraction of chromosome banding features: the banding features are extracted from the chromosome grayscale image. After the central axis has been extracted, the central-axis points are taken as the independent variable, and for each point the gray values of the pixels shared by the chromosome and the line perpendicular to the axis at that point are obtained. This work uses a global description method, so the resulting banding features represent the overall character of the banding curve. According to the related literature, banding features obtained with the WDD transformation give better classification results, so the WDD transformation is used to calculate the chromosome banding features. In the WDD transformation, the inner product of the chromosome banding curve with each of a series of WDD functions is taken; the resulting values are called WDD coefficients and serve as the banding feature values of the chromosome.

Normalization of features:

1. Normalization of length and area: the maximum value m and the minimum value n of the lengths of all chromosomes in one image are calculated; a chromosome of length x then has the normalized length (m-x)/(m-n), which lies in the range [0,1]. The area feature of the chromosome is normalized in the same way.

2. Normalization of banding features: because the WDD functions are already normalized, only the projection curve representing the banding needs to be normalized before the WDD banding features are computed; and because the banding expresses the texture information of the chromosome, rescaling the values of the curve does not affect its trend. The maximum and minimum values of the projection curve of each chromosome are therefore obtained, and the projection values are normalized in the same way as the length.
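A minimal sketch of the normalization formula as stated, (m-x)/(m-n). The function name and the handling of the degenerate case where all values are equal are assumptions added for illustration.

```python
def normalise(values):
    """Map each value x to (m - x) / (m - n), with m the maximum and n the minimum."""
    m, n = max(values), min(values)
    if m == n:
        # Assumption: with no spread, return zeros rather than divide by zero.
        return [0.0 for _ in values]
    return [(m - x) / (m - n) for x in values]

lengths = [2, 4, 6]
normalised = normalise(lengths)
```

With this formula the longest chromosome maps to 0 and the shortest to 1; the same function can be applied to the areas in one image or to the projection values of one chromosome.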

Further processing of the features: for each individual's chromosome set, that is, 46 chromosomes, the feature values are summed and averaged, and the average is then subtracted from the feature value of each chromosome. The purpose of this step is to reduce spurious differences between the chromosome sets of different individuals, such as differences in brightness during microscope imaging.
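The mean-subtraction step might look like the sketch below. It assumes the average is taken separately for each feature dimension over all chromosomes of one cell; the text does not spell this out, so that reading and the function name are assumptions.

```python
def centre_features(features):
    """Subtract the per-dimension mean over one cell's chromosomes from each feature vector."""
    count = len(features)
    dims = len(features[0])
    means = [sum(vec[d] for vec in features) / count for d in range(dims)]
    return [[vec[d] - means[d] for d in range(dims)] for vec in features]

# Two chromosomes with two features each (a real cell would have 46 rows).
cell = [[1.0, 10.0], [3.0, 20.0]]
centred = centre_features(cell)
```

After centering, each feature dimension sums to zero within the cell, so a uniform offset such as an overall brightness shift is removed.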

Finally, the length, the area and the projection characteristics after normalization are combined to obtain 852-dimensional characteristic data of the chromosomes for chromosome classification.

Identifying and pairing

Recognition and pairing are performed using ensemble learning together with a correction algorithm based on prior knowledge. A test sample is first predicted by the ensemble learner, and the correction algorithm is then applied to the predicted probabilities to obtain the final predicted label.

The component learners used for ensemble learning are kNN, SVM and ELM. They were selected because, after testing a variety of learners, these three showed the highest classification accuracy. The adaptations made for the chromosome classification task are as follows:

1. Adaptation of kNN: the classification criterion is changed from the Euclidean distance between the sample under test and its nearest training sample to the average distance between the sample under test and its nearest training samples of the same class. In addition, k is set to 3;

2. Adaptation of the SVM: c is set to 1, g is set to 0.07, and the output is changed to a probability output;

3. Adaptation of the ELM: the number of neural units is set to 1500, and the output is changed to a probability output.
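One possible reading of the average-distance kNN is sketched below: for every class, take the k = 3 training samples of that class nearest to the query, average their Euclidean distances, and predict the class with the smallest average. This interpretation, the function name and the toy data are assumptions; the source does not give the exact procedure.

```python
import math
from collections import defaultdict

def mean_distance_knn(train, query, k=3):
    """train: list of (feature_vector, label). Returns (predicted_label, per-class scores)."""
    by_class = defaultdict(list)
    for vec, label in train:
        by_class[label].append(math.dist(vec, query))
    # Average distance to the k nearest samples of each class.
    scores = {label: sum(sorted(d)[:k]) / min(k, len(d))
              for label, d in by_class.items()}
    return min(scores, key=scores.get), scores

train = [
    ([1.0, 0.0], "a"), ([0.0, 1.0], "a"), ([1.0, 1.0], "a"),
    ([0.1, 0.0], "b"), ([9.0, 9.0], "b"), ([9.0, 10.0], "b"),
]
label, scores = mean_distance_knn(train, [0.0, 0.0])
```

In this toy data the single nearest neighbour of the query belongs to class "b", but it is an isolated outlier; averaging over the three nearest samples per class selects "a" instead, which illustrates why the averaged criterion can be more robust.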

FIG. 11 shows the structure of the integration algorithm; the specific algorithm is as follows:

Step 1: Draw k samples at random with replacement from the k training samples, and repeat this 5 times to form 5 new training sets tr₁, tr₂, tr₃, tr₄, tr₅;

Step 2: Train the average-distance kNN, the SVM and the ELM with tr₁ as the training set and test the sample under test. Vote on the three predicted labels and record the result as L₁; average the three probability outputs and record the result as P₁;

Step 3: Repeat Step 2 for the other training sets to obtain L₁, L₂, L₃, L₄, L₅; vote on these 5 predicted labels and record the result as Label; average P₁, P₂, P₃, P₄, P₅ and record the result as P;

Step 4: Apply the correction algorithm to Label and P to obtain the final predicted labels, and test their accuracy.
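Steps 1 to 3 can be sketched generically, leaving the three component learners abstract. In this illustration each learner is a callable that takes a training set and a sample and returns a label and a probability vector; that interface, the function names, and the three stub learners are assumptions made so the example is self-contained, and they do not implement kNN, SVM or ELM.

```python
import random
from collections import Counter

def majority(labels):
    """Most common label; ties resolve to the label counted first."""
    return Counter(labels).most_common(1)[0][0]

def mean_vector(vectors):
    """Element-wise average of equally long probability vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def ensemble_predict(train, sample, learners, rounds=5, seed=0):
    """learners: callables (train_set, sample) -> (label, probability_vector)."""
    rng = random.Random(seed)
    round_labels, round_probs = [], []
    for _ in range(rounds):
        # Step 1: resample with replacement, same size as the original training set.
        boot = rng.choices(train, k=len(train))
        # Step 2: every learner predicts; vote on labels, average probabilities.
        outputs = [learner(boot, sample) for learner in learners]
        round_labels.append(majority([lab for lab, _ in outputs]))
        round_probs.append(mean_vector([probs for _, probs in outputs]))
    # Step 3: vote across the rounds and average across the rounds.
    return majority(round_labels), mean_vector(round_probs)

# Hypothetical stub learners with fixed outputs, standing in for kNN, SVM and ELM.
stub_a = lambda train_set, sample: (0, [0.7, 0.3])
stub_b = lambda train_set, sample: (0, [0.6, 0.4])
stub_c = lambda train_set, sample: (1, [0.2, 0.8])

final_label, final_probs = ensemble_predict(list(range(10)), None,
                                            [stub_a, stub_b, stub_c])
```

Here the voted label is 0 even though the averaged probabilities are evenly split, which shows that Label and P carry different information; Step 4 passes both on to the correction algorithm.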

Notes on the integration algorithm: in chromosome classification tests, the improved kNN, the SVM and the ELM have similar and relatively high accuracy. Because their accuracies are close, direct voting is used rather than weighted voting. The integration algorithm draws on the clustering idea: by perturbing the training samples and selecting three different learners, it makes the classification labels differ, which satisfies the "good and diverse" requirement of ensemble learning. Finally, the classification accuracy is further improved by the correction algorithm.

We performed ten-fold cross-validation on 550 metaphase images, with a training-to-test ratio of 9:1, to verify the accuracy gain from integration. FIG. 12 compares the average accuracy over the ten tests: the integrated learner reaches 93.89%, the highest among the compared algorithms; the SVM and KNN+ have similar accuracy; and the ELM has the lowest.
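The 9:1 ten-fold split over 550 images can be sketched as below. It assumes the split is made at the level of whole metaphase images, with a shuffled index list and a fixed seed; the source does not state how the folds were formed, and the function name is hypothetical.

```python
import random

def ten_fold_splits(n_items, folds=10, seed=0):
    """Yield (train_indices, test_indices) pairs; each item is tested exactly once."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    parts = [indices[f::folds] for f in range(folds)]
    for f in range(folds):
        test = parts[f]
        train = [i for g, part in enumerate(parts) if g != f for i in part]
        yield train, test

splits = list(ten_fold_splits(550))
```

Each of the ten folds then holds 55 test images and 495 training images, matching the 9:1 ratio.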

Correction algorithm: the chromosome set of a normal person is known to consist of 22 pairs of autosomes plus 1 pair of sex chromosomes (XX or XY). The correction algorithm proposed in this work finds, subject to this distribution, the most probable assignment according to the classification probabilities.

Defining variables:

$P_{ij}$: the probability that the i-th of the 46 chromosomes belongs to the j-th label ($1 \le i \le 46$, $1 \le j \le 24$)

$X_{ij}$: the indicator variable assigning chromosome i to label j, taking the value 0 or 1

The maximum probability is found by solving:

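If the subject is female, the following constraints must be satisfied:

$X_{ij} \in \{0, 1\}$

$\sum_{i} X_{ij} = 0$ for $j = 24$

If the subject is male, the following constraints must be satisfied:

$X_{ij} \in \{0, 1\}$

$\sum_{i} X_{ij} \le 1$ for $j = 23, 24$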

the steps of the correction algorithm are as follows:

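Input: the 24 classification probabilities of each of the 46 chromosomes

Output: the predicted labels

Step 1: Calculate the maximum probability P₁ under the male constraints

Step 2: Calculate the maximum probability P₂ under the female constraints

Step 3: If P₁ ≥ P₂, output the labels corresponding to P₁; otherwise, output the labels corresponding to P₂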
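The idea of the correction algorithm can be shown on a toy problem with brute-force search. Several assumptions are made here because the objective function and some constraints are not reproduced in this text: the objective is taken to be the sum of $P_{ij} X_{ij}$, every chromosome receives exactly one label, every autosome label is used exactly twice, a female set uses X twice and Y never, and a male set uses X and Y once each. The toy has one autosome class and four chromosomes; enumerating assignments like this is not feasible for 46 chromosomes, where a proper assignment solver would be needed. All names are illustrative.

```python
from itertools import permutations

def best_assignment(probs, slots):
    """probs[i][label]: probability that chromosome i has that label.
    slots: each label listed once per allowed copy; len(slots) == len(probs)."""
    best_score, best_labels = float("-inf"), None
    for labels in sorted(set(permutations(slots))):
        score = sum(probs[i][label] for i, label in enumerate(labels))
        if score > best_score:
            best_score, best_labels = score, labels
    return best_score, list(best_labels)

def correct(probs, autosomes):
    """Compare the best male and best female assignment and return the better one."""
    pairs = [a for a in autosomes for _ in range(2)]
    p1, male_labels = best_assignment(probs, pairs + ["X", "Y"])      # male constraints
    p2, female_labels = best_assignment(probs, pairs + ["X", "X"])    # female constraints
    return male_labels if p1 >= p2 else female_labels

# Toy cell: one autosome class "1" plus the sex chromosomes.
probs = [
    {"1": 0.8, "X": 0.1, "Y": 0.1},
    {"1": 0.5, "X": 0.4, "Y": 0.1},
    {"1": 0.6, "X": 0.3, "Y": 0.1},
    {"1": 0.2, "X": 0.2, "Y": 0.6},
]
labels = correct(probs, ["1"])
```

Taking the most probable label for each chromosome independently would assign three chromosomes to class "1", which violates the pairing constraint; the constrained search reassigns the least confident of them to X.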

Likewise, the ten-fold cross-validation accuracy is shown in FIG. 13. The "integration + correction" algorithm has the highest accuracy across the ten folds and a higher average accuracy than the other algorithms, which verifies the effectiveness of the integration and correction algorithms for chromosome karyotype analysis.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims

1. A karyotyping system, comprising

(2) designing a recognition algorithm and a correction algorithm to recognize and pair the extracted chromosomes, thereby generating a karyotype map;

the recognition and pairing are performed using ensemble learning together with a correction algorithm based on prior knowledge; a test sample is first predicted by the ensemble learner, and the correction algorithm is then applied to the predicted probabilities to obtain the final predicted label;

the component learners used for ensemble learning are kNN, SVM and ELM, adapted to the chromosome classification task as follows:

adaptation of kNN: the classification criterion is changed from the Euclidean distance between the sample under test and its nearest training sample to the average distance between the sample under test and its nearest training samples of the same class, and k is set to 3;

Step 1: randomly drawing k samples with replacement from the k training samples, and repeating this 5 times to form 5 new training sets tr₁, tr₂, tr₃, tr₄, tr₅;

Step 2: training the mean-distance-based kNN, the SVM and the ELM with tr₁ as the training set and testing the sample to be tested; voting on the prediction labels of the three learners and recording the result as L₁; averaging the probability outputs of the three learners and recording the result as P₁;

Step 3: repeating Step 2 for the other training sets to obtain L₁, L₂, L₃, L₄, L₅, voting on these 5 prediction labels, and recording the result as Label; averaging P₁, P₂, P₃, P₄, P₅ and recording the result as P;

Step 4: obtaining the final prediction label from Label and P by means of the correction algorithm, and testing its accuracy;
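Steps 1 to 3 above amount to bootstrap resampling followed by two levels of voting and averaging. The following Python sketch shows only that aggregation logic. The base learners are passed in as hypothetical callables of the form `(training_set, query) -> (label, probability_vector)`; the names `bootstrap`, `vote`, `average` and `ensemble_predict` are illustrative assumptions, and the tie-breaking rule (first label seen wins) is not specified in the text.

```python
import random
from collections import Counter


def bootstrap(train, rng):
    """Draw len(train) samples with replacement (one resampled training set)."""
    return [rng.choice(train) for _ in train]


def vote(labels):
    """Majority vote; on a tie the label encountered first wins (assumption)."""
    return Counter(labels).most_common(1)[0][0]


def average(prob_vectors):
    """Element-wise mean of several probability vectors."""
    n = len(prob_vectors)
    return [sum(column) / n for column in zip(*prob_vectors)]


def ensemble_predict(train, query, learners, rounds=5, seed=0):
    """Return (Label, P) for one test sample, following Steps 1 to 3."""
    rng = random.Random(seed)
    round_labels, round_probs = [], []
    for _ in range(rounds):
        tr = bootstrap(train, rng)                                # Step 1
        outputs = [learn(tr, query) for learn in learners]        # Step 2
        round_labels.append(vote([label for label, _ in outputs]))   # L_r
        round_probs.append(average([probs for _, probs in outputs])) # P_r
    return vote(round_labels), average(round_probs)               # Step 3
```

The returned pair corresponds to Label and P, which Step 4 then passes to the correction algorithm.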

the correction algorithm is specifically as follows: it is known that the normal human chromosome complement consists of 22 pairs of autosomes plus 1 pair of sex chromosomes (XX or XY); subject to this distribution, the most probable assignment is obtained from the classification probabilities;

defining variables:

P_ij: the probability that the i-th of the 46 chromosomes belongs to the j-th label, where 1 ≤ i ≤ 46 and 1 ≤ j ≤ 24;

X_ij: the indicator variable of sample i for label j, taking the value 0 or 1;

solving for the maximum probability is to solve:

X_ij ∈ {0, 1};

Σ_i X_ij = 0 when j = 24;

if the subject to be tested is male, the constraints to be satisfied are as follows:

X_ij ∈ {0, 1};

Σ_i X_ij ≤ 1 when j = 23 and when j = 24;

The steps of the correction algorithm are as follows:

Input: the 24 classification probabilities for each of the 46 chromosomes;

Output: the predicted labels;

Step 1: calculating the maximum probability P₁ under the male constraints;

Step 2: calculating the maximum probability P₂ under the female constraints;

Step 3: if P₁ ≥ P₂, taking the labels corresponding to P₁ as the output labels;

otherwise, taking the labels corresponding to P₂ as the output labels.
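The correction step can be read as a constrained assignment problem, and the sketch below shows one way to solve it in pure Python. Several points are assumptions rather than statements of the claim: the objective is taken to be the sum of P_ij·X_ij, each autosome label (1 to 22) is used exactly twice, the female case uses label 23 twice, and the male case uses labels 23 and 24 once each. Under these assumptions there are exactly 46 label "slots", so the problem becomes a 46 × 46 assignment solved here with the Hungarian algorithm. All function names are hypothetical.

```python
def hungarian(cost):
    """Minimum-cost assignment for a square (or wide) cost matrix.
    Returns, for each row, the index of the column assigned to it."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (m + 1)
    p, way = [0] * (m + 1), [0] * (m + 1)
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [inf] * (m + 1), [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:                       # augment along the alternating path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    rows_to_cols = [0] * n
    for j in range(1, m + 1):
        if p[j]:
            rows_to_cols[p[j] - 1] = j - 1
    return rows_to_cols


def best_assignment(P, slots):
    """Maximize the summed probability when each slot (a label) is used once."""
    cost = [[-row[label - 1] for label in slots] for row in P]
    labels = [slots[c] for c in hungarian(cost)]
    total = sum(P[i][labels[i] - 1] for i in range(len(P)))
    return total, labels


def correct(P):
    """P: 46 rows of 24 class probabilities. Returns 46 corrected labels."""
    autosomes = [j for j in range(1, 23) for _ in range(2)]
    p1, male_labels = best_assignment(P, autosomes + [23, 24])      # Step 1
    p2, female_labels = best_assignment(P, autosomes + [23, 23])    # female case
    return male_labels if p1 >= p2 else female_labels               # Step 3
```

Expanding labels into slots keeps the solver generic: the karyotype prior is expressed entirely by how many times each label appears in the slot list.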

2. The karyotyping system according to claim 1, wherein: the filtering algorithm flow is as follows:

(1) binarizing the metaphase cell image I to generate a binary image B;

(2) detecting the contours of all objects in the binary image B, and recording them in sequence as C₁ to C_q;

(3) initializing p = 1 and an empty set Contours;

(4) calculating the area A_p of the contour C_p;

(5) removing impurities according to the following process, where α, β and η are threshold parameters:

if α < A_p < β: adding C_p to the set Contours;

if A_p ≥ β: calculating the area B_p of the circumscribed polygon of the contour C_p;

if the ratio (A_p / B_p) meets the threshold η: adding C_p to the set Contours;

(6) judging whether p ≥ q; if yes, executing the next step; if no, setting p = p + 1 and going to step (4);

(7) processing each pixel in the binary image B:

if a pixel (x, y) lies within any contour in the set Contours, the pixel is assigned 255; otherwise, the pixel is assigned 0;

(8) processing the metaphase image I as follows to generate the filtered image G:

a function for manually assisted impurity removal is added to the chromosome karyotype analysis system, so that impurities the filtering algorithm cannot completely remove are removed with manual assistance.

3. The karyotyping system according to claim 1, wherein: the flow of the segmentation algorithm is as follows:

(1) initializing an empty set Contours;

(3) initializing d = 1;

(4) calculating the minimum circumscribed rectangle of the d-th contour in the set Contours to obtain the coordinates of its four vertices in the filtered image;

(5) segmenting and rotating the filtered image according to the coordinates obtained in step (4) to obtain a vertically oriented chromosome V;

(6) judging whether d is greater than or equal to the number of elements e in the set Contours; if yes, finishing; otherwise, setting d = d + 1 and returning to step (4).

4. A karyotyping system according to claim 3, wherein: chromosomes are automatically separated by the segmentation algorithm; for other types of overlapping chromosomes, human-computer interaction is used, namely each chromosome is manually traced with the mouse in a different color, and the chromosomes are then extracted by color by an algorithm.

5. The karyotyping system according to claim 1, wherein: the extraction of chromosome features comprises extracting the central axis, the area and the banding features;

extraction of chromosome length: first, an end-point pixel of the central axis of the chromosome is determined as Q₀ and the chromosome length is set to L = 0; the central axis is traversed starting from point Q₀, and the second pixel on the central axis is found as Q₁; if Q₁ is located in the horizontal or vertical direction of Q₀, L = L + 1; if Q₁ is located in a diagonal direction of Q₀

extracting the chromosome area: the area of the chromosome is calculated from the binarized chromosome image; for a binary image in which black pixels (0) are the background and white pixels (1) are the chromosome, the area of the chromosome is the number of white pixels;

extracting the chromosome banding features: after the central axis is extracted from the chromosome grayscale image, each central-axis point is taken as the independent variable, and the gray-level information of the pixels shared by the chromosome and the line perpendicular to the central axis at that point is obtained; the chromosome banding features are then calculated using the WDD transform;
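The area feature described above reduces to counting foreground pixels. A minimal Python sketch, assuming the binarized chromosome image is given as a list of rows containing 0 (background) and 1 (chromosome); the function name `chromosome_area` is hypothetical:

```python
def chromosome_area(binary_image):
    """Return the number of white (value 1) pixels in a 0/1 image."""
    return sum(1 for row in binary_image for pixel in row if pixel == 1)


# A 3 x 4 toy image containing a 5-pixel object.
toy_image = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
]
toy_area = chromosome_area(toy_image)
```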

normalization of features:

normalization of banding features: because the WDD functions are uniform, only the projection curve representing the banding needs to be normalized before the WDD features of the banding are computed; because the banding expresses the texture information of the chromosome, changing the curve values does not affect the trend of the curve; therefore the maximum and minimum values of the projection curve of each chromosome are obtained, and the projection values are normalized according to the length normalization method;

further processing of the features: for each chromosome set, namely the 46 chromosomes of one cell, the feature values are summed and averaged, and the average is then subtracted from the feature value of each chromosome; the purpose of this processing is to reduce anomalous differences between the chromosome sets of different people, such as brightness differences during microscope imaging;
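The per-cell centering described above can be sketched in Python as follows. The sketch assumes each chromosome is represented by a feature vector of equal length and that all vectors passed in belong to the same cell; the function name `center_features` is hypothetical.

```python
def center_features(features):
    """Subtract the per-cell mean of each feature from every chromosome.

    features: list of feature vectors, one per chromosome of a single cell
    (46 rows for a normal karyotype).
    """
    n = len(features)
    means = [sum(column) / n for column in zip(*features)]
    return [[value - mean for value, mean in zip(row, means)] for row in features]


# Two chromosomes, two features each (toy values).
centered = center_features([[1.0, 10.0], [3.0, 20.0]])
```

After centering, every feature has zero mean within the cell, so a global offset such as uniformly brighter staining no longer shifts the feature values.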