CN114842472B - Method and device for detecting chromosome structure abnormality based on deep learning

Info

Publication number: CN114842472B
Application number: CN202210776295.6A
Authority: CN
Inventors: 宋宁; 韦然; 晏青; 吕明; 马伟旗; 贾瑞
Original assignee: Hangzhou Daigens Biotech Ltd
Current assignee: Hangzhou Daigens Biotech Ltd
Priority date: 2022-07-04
Filing date: 2022-07-04
Publication date: 2022-09-23
Anticipated expiration: 2042-07-04
Also published as: CN114842472A

Abstract

The invention provides a method and a device for detecting chromosome structural abnormality based on deep learning. The detection method comprises the following steps: acquiring chromosome image data of a user to be diagnosed; obtaining, from the chromosome image data, a feature matrix of each chromosome from the monomer sequence data of its sister chromatids together with its type data and banding number data; obtaining a difference matrix characterizing the difference between a homologous chromosome pair based on the two feature matrices of the chromosomes that are homologous to each other; and determining whether a chromosome of a given type of the user to be diagnosed has a structural abnormality based at least on the difference matrices of the homologous chromosome pairs of the various types in at least one cell. Because each chromosome is represented by a feature matrix and the difference between a homologous chromosome pair is represented by a difference matrix, deep learning can judge from the difference matrix whether the user has a chromosome structural abnormality, which can greatly improve the efficiency of screening for chromosome structural abnormalities.

Description

Method and device for detecting chromosome structure abnormality based on deep learning

Technical Field

The present invention relates to detection of structural abnormalities of chromosomes, and more particularly to a method and apparatus for detecting structural abnormalities of chromosomes based on deep learning.

Background

Chromosomal abnormalities, including deletions, duplications, or irregular portions of chromosomal DNA, are the underlying cause of various genetic diseases. They occur in about 0.6% of live-born infants and often lead to malformations and/or developmental disorders. Diseases caused by chromosomal abnormalities can have serious consequences: for example, chromosomal abnormalities account for about 25% of miscarriages and stillbirths, and for 50%-60% of miscarriages in early gestation. With the aid of chromosomal abnormality detection, the clinician can identify all abnormalities that may lead to birth defects. Chromosomal abnormalities are generally divided into two types: quantitative abnormalities and structural abnormalities. The former refers to an abnormality in the number of chromosomes. A healthy human cell contains 46 chromosomes of 23 or 24 types, so a quantitative abnormality can be easily detected by accurately counting the chromosomes. Structural abnormalities are more challenging to detect.

Chromosomal structural abnormalities refer to chromosomal abnormalities caused by large chromosomal mutations. With existing karyotype classification and imaging technology, a quantitative abnormality can be identified visually and easily. A structural abnormality, by contrast, appears only in a local part of the karyotype image of one or more chromosomes. Whereas a quantitative abnormality can be observed by a layperson with nothing more than a microscope, a structural abnormality can only be detected by a human expert with sufficient karyotype knowledge. More specifically, structural abnormalities take various forms, and the detection process, which relies on domain knowledge, cannot be replaced by specific rules. Moreover, diagnosing a structural abnormality takes a human expert a long time. In a doctor's actual examination process, there are 10 karyotype pictures for each potential patient, and up to 46 chromosomes in each karyotype picture. Manual detection of structural abnormalities is therefore complicated and time consuming.

In view of the above, it is desirable to provide a method and an apparatus for detecting a chromosome structural abnormality based on deep learning, so as to implement automatic screening of a chromosome structural abnormality by means of a deep learning algorithm, thereby effectively improving screening efficiency of a chromosome structural abnormality.

Disclosure of Invention

The following presents a simplified summary of one or more aspects in order to provide a basic understanding of such aspects. This summary is not an extensive overview of all contemplated aspects, and is intended to neither identify key or critical elements of all aspects nor delineate the scope of any or all aspects. Its sole purpose is to present some concepts of one or more aspects in a simplified form as a prelude to the more detailed description that is presented later.

As described above, in order to solve the problems of the prior art that the manual detection of the chromosome structural abnormality is complex and time-consuming, the invention provides a method and a device for detecting the chromosome structural abnormality based on deep learning, which implement automatic screening of the chromosome structural abnormality by means of a deep learning algorithm, thereby effectively improving the screening efficiency of the chromosome structural abnormality.

The method for detecting the chromosome structural abnormality based on deep learning provided by one aspect of the invention comprises the following steps: acquiring chromosome image data of a user to be diagnosed; acquiring monomer sequence data of sister chromatids in each chromosome of at least one cell of the user to be diagnosed according to the chromosome image data; for each chromosome in the at least one cell, merging monomer sequence data of a sister chromatid thereof into sequence data of the chromosome, and splicing type data and banding number data of the chromosome in the sequence data of the chromosome to obtain a feature matrix of the chromosome; for each type of homologous chromosome pair in the at least one cell, performing adaptive structurally aligned similarity calculation on two feature matrices of the homologous chromosome pair to obtain a difference matrix characterizing differences between the homologous chromosome pair; and determining whether the structural abnormality exists in the chromosome of the type of the user to be diagnosed at least based on the difference matrix of the homologous chromosome pairs of the types in the at least one cell.

When a chromosome has a structural abnormality, this usually appears as a difference in one or more karyotype regions between the two chromosomes of a homologous pair. The present invention therefore detects the presence or absence of a chromosomal structural abnormality by using deep learning to measure the degree of difference between homologous chromosome pairs. In the invention, monomer sequence data of the sister chromatids are first extracted from the chromosome image data, the monomer sequence data are merged, and the chromosome type and banding number data are spliced on, so that each chromosome is represented by a feature matrix. A difference matrix characterizing the difference between a homologous chromosome pair is then obtained by performing adaptive structure-aligned similarity calculation on the two feature matrices of the pair, so that chromosome structural abnormality detection can be carried out based on the difference matrix.

In an embodiment of the above detection method, optionally, the determining whether there is a structural abnormality in the chromosome of the type of the user to be diagnosed at least based on the difference matrix of the homologous chromosome pairs of the types in the at least one cell further includes: and judging whether the type of chromosome of the user to be diagnosed has structural abnormality or not based on the difference matrix of a plurality of homologous chromosome pairs of the same type in a plurality of cells of the user to be diagnosed.

Chromosome data from a single cell may be affected by batch effects and biological noise. To reduce these influences, the invention combines the data of homologous chromosome pairs from a plurality of cells of the same user for a comprehensive prediction, thereby further improving the accuracy of screening for chromosome structural abnormalities.

In an embodiment of the foregoing detection method, optionally, the determining whether there is a structural abnormality in the chromosome of the type of the user to be diagnosed based on the difference matrix of the plurality of homologous chromosome pairs of the same type in the plurality of cells of the user to be diagnosed further includes: acquiring a difference combination matrix of the same kind of the user to be diagnosed based on a plurality of difference matrices of a plurality of homologous chromosome pairs of the same kind in a plurality of cells; inputting the difference combination matrix into a first feature aggregation model obtained by pre-training so as to obtain a difference feature matrix based on homologous chromosome pairs of a plurality of cells of the user to be diagnosed; and judging whether structural abnormality exists in the chromosome of each type of the user to be diagnosed based on the difference characteristic matrix of the homologous chromosome pair of each type.

In an embodiment of the foregoing detection method, optionally, a difference combination matrix of the same kind of the user to be diagnosed is obtained by adaptive weighted summation based on a plurality of difference matrices of a plurality of homologous chromosome pairs of the same kind in a plurality of cells; and/or the first feature aggregation model at least performs matrix multiplication processing on the input difference combination matrix.
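The multi-cell combination described above can be illustrated with a minimal sketch. It assumes that the adaptive weights come from a softmax over one hypothetical learnable score per cell and that the first feature aggregation model reduces to a single matrix product; the source does not specify either detail, and the names `combine_cells`, `aggregate`, `cell_scores` and `projection` are illustrative only.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def combine_cells(diff_matrices, cell_scores):
    """Adaptive weighted sum of per-cell difference matrices.

    diff_matrices: array [n_cells, rows, cols], one difference matrix per cell
    cell_scores:   array [n_cells], hypothetical learnable scores (assumption)
    """
    weights = softmax(np.asarray(cell_scores, dtype=float))
    # Contract the cell axis: sum over c of weights[c] * diff_matrices[c]
    return np.tensordot(weights, diff_matrices, axes=1)

def aggregate(combined, projection):
    """Stand-in for the first feature aggregation model: one matrix product."""
    return combined @ projection

rng = np.random.default_rng(0)
diffs = rng.normal(size=(3, 4, 8))        # 3 cells, toy 4x8 difference matrices
scores = np.zeros(3)                      # equal scores reduce to a plain average
combined = combine_cells(diffs, scores)
features = aggregate(combined, rng.normal(size=(8, 5)))
```

With equal scores the combination is a plain average over cells; in a trained model the scores would presumably down-weight noisy cells.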

In an embodiment of the foregoing detection method, optionally, performing adaptive structure-aligned similarity calculation on the two feature matrices of the homologous chromosome pair to obtain a difference matrix representing a difference between the homologous chromosome pair further includes: convolving the two feature matrices of the homologous chromosome pair respectively to obtain their respective query matrices, key matrices and feature value matrices; obtaining a difference-weighted feature matrix for each of the two chromosomes based on the corresponding query, key and feature value matrices; and obtaining the difference matrix characterizing the difference between the homologous chromosome pair based on the two difference-weighted feature matrices and a pre-trained second feature aggregation model.

In an embodiment of the foregoing detection method, optionally, obtaining the difference-weighted feature matrix of each chromosome based on the query, key and feature value matrices further includes: subjecting a query matrix and a key matrix of the pair to transposition, matrix multiplication and normalization to obtain a first similarity weight matrix, and multiplying the first similarity weight matrix by a feature value matrix to obtain the first difference-weighted feature matrix; and subjecting the other query matrix and key matrix to transposition, matrix multiplication and normalization to obtain a second similarity weight matrix, and multiplying the second similarity weight matrix by a feature value matrix to obtain the second difference-weighted feature matrix. Obtaining the difference matrix characterizing the difference between the homologous chromosome pair based on the difference-weighted feature matrices and the pre-trained second feature aggregation model further includes: performing feature integration on each of the two difference-weighted feature matrices, unfolding the integrated features into feature vectors, combining the feature vectors, and inputting the combined features into the second feature aggregation model to obtain the difference matrix of the difference between the homologous chromosome pair.

In an embodiment of the foregoing detection method, optionally, the acquiring monomer sequence data of sister chromatids in each chromosome of at least one cell of the user to be diagnosed according to the chromosome image data further includes: performing image thinning on the chromosome image data to extract a skeleton line of the chromatid; and acquiring, along the extending direction of the skeleton line, the gray-level mean values of a preset number of scan lines perpendicular to the skeleton line, so as to obtain a sequence of gray-level mean values of the preset length characterizing the extending direction as the monomer sequence data.

In an embodiment of the above detection method, optionally, the determining whether there is a structural abnormality in the chromosome of the type of the user to be diagnosed at least based on the difference matrix of the homologous chromosome pairs of the types in the at least one cell further includes: inputting the difference matrix into a classifier model obtained by pre-training so as to judge whether the chromosome of the type of the user to be diagnosed has structural abnormality or not; the classifier model is trained by taking a difference matrix of homologous chromosome pairs of an artificial defect chromosome and a normal chromosome as a sample, and is adjusted by taking a difference matrix of homologous chromosome pairs of a real defect chromosome as a sample.

Another aspect of the present invention also provides a detection apparatus for detecting chromosome structural abnormality based on deep learning, including: at least one processor; and a memory coupled to the at least one processor, the memory containing instructions stored therein, which when executed by the at least one processor, cause the detection apparatus to perform a method for detecting a deep learning based chromosomal structural abnormality as described in any one of the embodiments of the present invention.

Another aspect of the present invention also provides a computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing a method for detecting a chromosome structure abnormality based on deep learning as described in any one of the embodiments of the present invention.

According to the method and the device for detecting chromosome structural abnormality based on deep learning provided by the invention, each chromosome can be represented by a feature matrix and the difference between a homologous chromosome pair by a difference matrix, so that deep learning can judge from the difference matrix whether a user has a chromosome structural abnormality, which can greatly improve the screening efficiency. In addition, the feature matrix of a chromosome is built from the monomer sequence data of its sister chromatids, which captures the important information of the chromosome accurately and in as much detail as possible, so that the influence of noise in the chromosome image data is effectively eliminated and the accuracy of detecting chromosome structural abnormalities is improved.

Drawings

The above features and advantages of the present disclosure will be better understood upon reading the detailed description of embodiments of the disclosure in conjunction with the following drawings. In the drawings, components are not necessarily drawn to scale, and components having similar relative characteristics or features may have the same or similar reference numerals.

Fig. 1 is a schematic flow chart of a method for detecting structural abnormality of a chromosome based on deep learning according to an aspect of the present invention.

Fig. 2 illustrates chromosome data flow in a method for detecting structural abnormality of a chromosome based on deep learning according to an aspect of the present invention.

Fig. 3 is a schematic diagram illustrating an embodiment of adaptive structure-aligned similarity calculation in a method for detecting structural abnormality of a chromosome based on deep learning according to an aspect of the present invention.

Fig. 4 is a schematic structural diagram illustrating an embodiment of a device for detecting chromosome structural abnormality based on deep learning according to another aspect of the present invention.

Detailed Description

The invention is described in detail below with reference to the figures and the specific embodiments. It is noted that the aspects described below in connection with the figures and the specific embodiments are only illustrative and should not be construed as imposing any limitation on the scope of the present invention.

The following description is presented to enable any person skilled in the art to make and use the invention and is incorporated in the context of a particular application. Various modifications, as well as various uses in different applications will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to a wide range of embodiments. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

In the following detailed description, numerous specific details are set forth in order to provide a more thorough understanding of the invention. It will be apparent, however, to one skilled in the art that the invention may be practiced without necessarily being limited to these specific details. In other instances, well-known structures and devices are shown in block diagram form, rather than in detail, in order to avoid obscuring the present invention.

The reader's attention is directed to all papers and documents which are filed concurrently with this specification and which are open to public inspection with this specification, and the contents of all such papers and documents are incorporated herein by reference. All the features disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise. Thus, unless expressly stated otherwise, each feature disclosed is one example only of a generic series of equivalent or similar features.

Note that where used, the designations left, right, front, back, top, bottom, positive, negative, clockwise, and counterclockwise are used for convenience only and do not imply any particular fixed orientation. In fact, they are used to reflect the relative position and/or orientation between the various parts of the object. Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.

It is noted that, where used, the terms "further", "preferably", "still further" and "more preferably" briefly introduce an alternative embodiment built on the preceding embodiment; the content following such a term is combined with the preceding embodiment to form the complete alternative embodiment. Several "further", "preferred", "still further" or "more preferred" arrangements following the same embodiment may be combined in any combination to form a further embodiment.

Please refer to fig. 1 to understand the specific implementation process of the detection method provided by the present invention. As shown in fig. 1, the detection method provided by the present invention includes:

step S100: acquiring chromosome image data of a user to be diagnosed;

step S200: acquiring monomer sequence data of sister chromatids in each chromosome of at least one cell of a user to be diagnosed according to the chromosome image data;

step S300: for each chromosome in the at least one cell, merging the monomer sequence data of its sister chromatids into sequence data of the chromosome, and splicing the type data and banding number data of the chromosome onto the sequence data to obtain a feature matrix of the chromosome;

step S400: for each type of homologous chromosome pair in at least one cell, performing adaptive structure-aligned similarity calculation on two feature matrices of the homologous chromosome pair to obtain a difference matrix characterizing differences between the homologous chromosome pair; and

step S600: determining whether there is a structural abnormality in a chromosome of the type of the user to be diagnosed based at least on a difference matrix of pairs of homologous chromosomes of the type in the at least one cell.

Cytologists have found that the morphology of each chromosome can be clearly observed in chromosome specimens stained with fluorescent dyes. In step S100, the chromosome image data of the user to be diagnosed may be acquired by various existing or future techniques. For example, microscopic images of chromosomes may be obtained with existing or future staining techniques using the prior-art AutoVision system, the open-source chromosome image processing software ImageJ referred to by Uhlmann et al., and the like. Individual chromosome images are obtained by segmenting each chromosome region from the background of the microscopic image and then classifying and extracting the chromosome images. The skilled person can adopt any existing or future method of classifying and extracting chromosome images, as long as the chromosome images can be obtained. Referring also to fig. 2, for the user to be diagnosed, the chromosome image data of each chromosome in at least one cell (cell 100 to cell 300) can be acquired.

A human somatic cell contains 46 chromosomes in 23 pairs. Apart from the sex chromosomes, the two chromosomes of each of the remaining 22 pairs of autosomes (homologous chromosome pairs) are highly similar in structure under normal conditions. As shown in fig. 2, chromosome 111 and chromosome 112 in cell 100 are homologous to each other and have similar structures. Likewise, chromosome 211 and chromosome 212 in cell 200 are homologous to each other, chromosome 311 and chromosome 312 in cell 300 are homologous to each other, and chromosomes 111, 112, 211, 212, 311, and 312 are chromosomes of the same type in different cells. When chromosomes have a structural abnormality, it generally manifests as a structural difference between the two chromosomes of a homologous pair. It is very unlikely that both chromosomes suffer substantially identical defects, so a homologous chromosome pair with a structural defect is very unlikely to show high structural similarity. The present invention therefore proposes to detect chromosomal structural abnormalities by detecting the degree of difference between each pair of homologous chromosomes. Capturing these differences is a challenge, however, because each chromosome has a different shape (it can bend freely within the cell) and the chromosomes in karyotype photographs may be flipped.

In order to detect the degree of difference between each pair of homologous chromosomes, it is necessary to extract features of each chromosome, so that each chromosome can be described in machine language. In order to obtain the feature matrix of the chromosome, the present invention performs steps S200 and S300.

For autosomes, each chromosome (e.g., chromosome 112 in fig. 2) contains two sister chromatids, which lie side by side and are joined at the same centromere. Because the DNA sequences of the two sister chromatids are identical, the image of a chromosome should in theory be left-right symmetric. However, chromosomes can swing or bend freely in living cells, which compresses the chromosome information on one side. Furthermore, the dye may not attach uniformly to the chromosome, leaving more or less dye in some locations. Both effects make the left and right halves of a chromosome image asymmetric.

To address these problems, the present invention first acquires two sets of monomer sequence data, one per sister chromatid, from each chromosome image. Specifically, referring to fig. 2, step S200 of obtaining monomer sequence data of the sister chromatids in each chromosome (e.g., chromosome 112) of at least one cell of the user to be diagnosed according to the chromosome image data further includes: performing image thinning on the chromosome image data to extract a skeleton line of the chromosome, the skeleton line dividing the chromosome into a left sister chromatid and a right sister chromatid; and then acquiring, along the extending direction of the skeleton line, the gray-level mean values of a preset number of scan lines perpendicular to the skeleton line. Specifically, the gray-level mean value on the left side and on the right side of the skeleton line is acquired separately, yielding two sequences (left and right) of gray-level mean values of preset length along the extending direction, each of which is the monomer sequence data of one sister chromatid.
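The left and right gray-level profiles of step S200 can be sketched as follows. This is a simplified illustration, not the patented procedure: it assumes the chromosome has already been segmented and is roughly vertical, so that each image row serves as a scan line perpendicular to the skeleton line and the skeleton point of a row is approximated by the midpoint of the mask in that row. A real pipeline would use a proper thinning algorithm to trace a curved skeleton. The function name `chromatid_sequences` and its parameters are hypothetical.

```python
import numpy as np

def chromatid_sequences(image, mask, length=512):
    """Return a [2, length] array: left and right gray-level profiles.

    image: 2-D gray-level array; mask: boolean array marking chromosome pixels.
    Assumption: the chromosome is roughly vertical, so each image row acts as
    a scan line perpendicular to the skeleton line.
    """
    rows = np.flatnonzero(mask.any(axis=1))
    if rows.size == 0:
        raise ValueError("mask contains no chromosome pixels")
    left, right = [], []
    for r in rows:
        cols = np.flatnonzero(mask[r])
        center = (cols[0] + cols[-1]) / 2.0       # crude skeleton point
        l_cols, r_cols = cols[cols < center], cols[cols >= center]
        if l_cols.size == 0:
            l_cols = r_cols                        # one-pixel-wide row
        left.append(image[r, l_cols].mean())
        right.append(image[r, r_cols].mean())
    # Resample both profiles to the preset length along the skeleton.
    src = np.linspace(0.0, 1.0, rows.size)
    dst = np.linspace(0.0, 1.0, length)
    return np.stack([np.interp(dst, src, left), np.interp(dst, src, right)])

image = np.full((100, 20), 10.0)
image[:, 10:] = 200.0                              # brighter right chromatid
mask = np.ones((100, 20), dtype=bool)
sequences = chromatid_sequences(image, mask)
```

Keeping the two profiles separate, instead of averaging across the full width, preserves the left-right asymmetry that the text identifies as informative.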

Because factors such as distortion and staining differences mean that the left and right sister chromatid sequences are not completely symmetric in most cases, representing a chromosome directly by a single averaged gray-level sequence would lose much information. The present invention therefore inputs the sister chromatid sequences separately and recombines them. In a more preferred embodiment, the input may carry still more information, for example by using more than the two sequences described above, which are then combined adaptively by the model.

For example, the monomer sequence data of each sister chromatid may have shape [1, 512] (i.e., it can be understood as a vector). In the present invention, every chromosome is first assumed to have the same shape, so that all monomer sequence data have the same length (or chromosome height) d = 512.

Subsequently, in step S300, the monomer sequence data of the sister chromatids of a chromosome are first merged into sequence data of the chromosome. The chromosome type information and the banding number information are then spliced onto the merged sequence data to obtain the feature matrix of the chromosome, i.e., a machine-language description of the chromosome.

For example, in the merging step, the input is two pieces of monomer sequence data of shape [1, 512], i.e., the input can be regarded as having shape [2, 512], where each sequence has length 512. The output has shape [14, 64], where 14 (the number of channels) can be understood as 14 different features and 64 as the input sequence length 512 compressed to 64. The merging at this step may be convolution-based: merging the two sequences can be loosely understood as the model automatically convolving a multichannel input, similar to the way a typical convolutional network handles the three RGB channels of a picture.
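The shape change from [2, 512] to [14, 64] can be reproduced by a single strided one-dimensional convolution. The sketch below assumes a kernel size and stride of 8 and random weights; the source states only the input and output shapes, so the layer configuration and the name `conv1d_merge` are illustrative assumptions.

```python
import numpy as np

def conv1d_merge(x, weight, bias, stride):
    """Strided 1-D convolution over a multi-channel sequence.

    x: [in_channels, length]; weight: [out_channels, in_channels, kernel]
    """
    k = weight.shape[2]
    n_out = (x.shape[1] - k) // stride + 1
    # Index matrix of sliding windows: [n_out, kernel]
    idx = np.arange(n_out)[:, None] * stride + np.arange(k)[None, :]
    windows = x[:, idx]                             # [in_channels, n_out, kernel]
    return np.einsum("oik,ink->on", weight, windows) + bias[:, None]

rng = np.random.default_rng(0)
chromatids = rng.random((2, 512))                   # left/right monomer sequences
weight = rng.normal(size=(14, 2, 8))                # hypothetical: kernel size 8
bias = np.zeros(14)
sequence_features = conv1d_merge(chromatids, weight, bias, stride=8)
```

The two chromatid sequences play the role of input channels, in the same way that the R, G and B planes do for an image.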

In the above step S300, the input chromosome type, when expressed in machine language, has shape [1, 24] (a one-hot vector over the 24 chromosome classes). This data is the chromosome class obtained when chromosome instance segmentation and identification are performed on the chromosome image; the person skilled in the art can obtain the chromosome type by existing or future methods and express it in machine language. The input banding number has shape [1, 5] (a one-hot vector over 5 different banding numbers) and is obtained automatically by a chromosome banding pattern recognition module; similarly, the person skilled in the art can obtain the banding pattern by existing or future methods and express it in machine language. In step S300, the type data and the banding number data are concatenated into a vector of shape [1, 29], which is then processed to obtain a feature matrix of shape [14, 64].
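The one-hot encoding and concatenation can be sketched as follows. The source states that the [1, 29] vector is processed into a [14, 64] matrix but does not say how; the sketch assumes a single linear layer followed by a reshape, purely for illustration. The names `one_hot` and `type_band_features` are hypothetical.

```python
import numpy as np

def one_hot(index, size):
    """Return a [1, size] one-hot row vector."""
    vec = np.zeros((1, size))
    vec[0, index] = 1.0
    return vec

def type_band_features(type_index, band_index, weight):
    """Concatenate the one-hot vectors and project them to [14, 64]."""
    meta = np.concatenate([one_hot(type_index, 24), one_hot(band_index, 5)], axis=1)
    projected = (meta @ weight).reshape(14, 64)     # hypothetical linear layer
    return meta, projected

rng = np.random.default_rng(0)
weight = rng.normal(size=(29, 14 * 64))
meta, projected = type_band_features(type_index=3, band_index=2, weight=weight)
```

Projecting to [14, 64] gives the type and banding information the same shape as the merged sequence features, which makes a later integration of the two straightforward.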

In a preferred embodiment, step S300 further includes integrating the feature matrices mentioned above, so that the final expression of the features of an individual chromosome is a single feature matrix.

After modeling chromosome features, two chromosome feature matrices can be obtained for two chromosomes which are homologous to each other in each cell, and in the present invention, in step S400, similarity calculation of adaptive structure alignment is performed on two feature matrices of homologous chromosome pairs to obtain a difference matrix representing differences between the homologous chromosome pairs.

Specifically, please refer to fig. 3 to understand the step S400. Step S400 of performing adaptive structure-aligned similarity calculation on the two feature matrices of the homologous chromosome pair to obtain a difference matrix representing the difference between the homologous chromosome pair further includes:

convolving the two feature matrices of the homologous chromosome pair respectively to obtain their respective query matrices, key matrices and feature value matrices;

obtaining a difference-weighted feature matrix for each of the two chromosomes based on the corresponding query, key and feature value matrices; and

obtaining a difference matrix characterizing the difference between the homologous chromosome pair based on the two difference-weighted feature matrices and a pre-trained second feature aggregation model.

Fig. 3 continues the preferred embodiment of fig. 2: the feature matrix of each chromosome is aligned, transposed and integrated again to obtain an integrated feature matrix. The two integrated feature matrices of the homologous chromosome pair are then convolved to obtain their respective query matrices, key matrices and feature value matrices.

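As can be seen from fig. 3, a query matrix and a key matrix are subjected to transposition, matrix multiplication and normalization (yielding a similarity weight matrix characterizing the similarity between the two feature matrices), and the result is multiplied by a feature value matrix to obtain the first difference-weighted feature matrix.

Similarly, the other query matrix and key matrix are subjected to transposition, matrix multiplication and normalization (yielding the corresponding similarity weight matrix), and the result is multiplied by a feature value matrix to obtain the second difference-weighted feature matrix.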

The two difference-weighted feature matrices are each subjected to feature integration, unfolded into vectors and combined; the combined features are input into the pre-trained second feature aggregation model for homologous chromosomes to obtain the difference matrix characterizing the difference between the homologous chromosomes.

By processing the chromosome images, and in particular by first acquiring the monomer sequence data of the sister chromatids of each chromosome, more accurate sequence data for the chromosome can be obtained by merging the monomer sequence data, and a feature matrix that accurately represents the chromosome can then be obtained by incorporating the type data and banding number data. On the basis of the feature matrices, performing the adaptive structure-aligned similarity calculation on the two feature matrices of a homologous chromosome pair yields a difference matrix (that is, a machine-readable representation of the differences between the homologous chromosome pair), so that differences between homologous chromosomes can be captured.
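The attention-style weighting of step S400 can be illustrated with a minimal sketch. It assumes that each similarity weight matrix is a row-wise softmax of one chromosome's query matrix multiplied by the transpose of the other chromosome's key matrix, and that it then weights the other chromosome's value matrix. This pairing, the plain linear projections standing in for the convolutions, and all function and variable names are illustrative assumptions, not the implementation of the invention.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def softmax_rows(m):
    """Normalize each row so that its entries are positive and sum to 1."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def cross_weighted(q_a, k_b, v_b):
    """Weight B's values by the similarity of A's queries to B's keys (assumed pairing)."""
    weights = softmax_rows(matmul(q_a, transpose(k_b)))  # similarity weight matrix
    return matmul(weights, v_b), weights

# Toy feature matrices: 3 positions along the chromosome, 2 features each.
f1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
f2 = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # last position differs from f1

# Hypothetical projection weights standing in for the learned convolutions.
w_q = [[1.0, 0.0], [0.0, 1.0]]
w_k = [[1.0, 0.0], [0.0, 1.0]]
w_v = [[2.0, 0.0], [0.0, 2.0]]

q1, k1, v1 = matmul(f1, w_q), matmul(f1, w_k), matmul(f1, w_v)
q2, k2, v2 = matmul(f2, w_q), matmul(f2, w_k), matmul(f2, w_v)

weighted_1, weights_12 = cross_weighted(q1, k2, v2)
weighted_2, weights_21 = cross_weighted(q2, k1, v1)
```

In this sketch the two weighted matrices would then be integrated, flattened, and concatenated before being passed to the second feature aggregation model, which is not modeled here.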

Once the difference matrix, which characterizes the differences between homologous chromosome pairs in machine-readable form, has been obtained, step S600 is performed: determining whether a chromosome of a given type of the user to be diagnosed has a structural abnormality based at least on the difference matrices of the homologous chromosome pairs of the various types in the at least one cell.

Specifically, the step S600 further includes: and inputting the difference matrix into a classifier model obtained by pre-training to judge whether the chromosome of the user to be diagnosed has structural abnormality or not. The classifier model is trained by taking a difference matrix of a pair of homologous chromosomes of an artificial defect chromosome and a normal chromosome as a sample, and is adjusted by taking a difference matrix of a pair of homologous chromosomes of a real defect chromosome as a sample.

As a preparatory work, the classifier model described above needs to be trained before the detection method provided by the present invention is performed. Through big data learning, under the condition that enough difference matrixes between normal homologous chromosome pairs and enough difference matrixes between defect homologous chromosome pairs exist, a proper classifier model can be obtained through training, and therefore the difference matrixes between the homologous chromosome pairs of the user to be diagnosed, which are input subsequently, can be distinguished and detected.

Because structurally abnormal chromosomes are far rarer than normal chromosomes in the real world, the imbalance between normal and abnormal samples makes discriminative features for abnormality detection difficult to learn, which hinders training of the classifier model. To obtain a classifier model with more accurate detection, the classifier model of the present invention is trained with difference matrices of homologous chromosome pairs formed from an artificial defect chromosome and a normal chromosome as samples, and is then adjusted with difference matrices of homologous chromosome pairs containing a real defect chromosome as samples.

Specifically, training the classifier involves the following. Real normal chromosomes are acquired, and a first difference matrix between a real normal homologous chromosome pair is obtained (using the construction of the difference matrix described above), where both chromosomes of the real normal homologous chromosome pair are real normal chromosomes. An artificial defect chromosome is constructed from a real normal chromosome, and a second difference matrix between an artificial defect homologous chromosome pair is obtained (again using the construction described above), where at least one of the two chromosomes of the artificial defect homologous chromosome pair is an artificial defect chromosome and the other may be either a real normal chromosome or an artificial normal chromosome. The classifier model for detecting chromosome structural abnormality is then trained using at least the first difference matrix and the second difference matrix as samples. Artificially constructing defect chromosomes addresses the real-world imbalance between the numbers of defect chromosomes and normal chromosomes.

The artificial normal chromosome is obtained by first constructing an artificial defect chromosome and then applying the inverse construction to it. Although the operation is inverted, reconstructing an artificial chromosome requires irreversibly smoothing the relevant chromosome sequence curve, so an artificial normal chromosome obtained by applying artificial construction twice to a real normal chromosome still differs slightly from that real normal chromosome. Using an artificial defect chromosome and an artificial normal chromosome as the artificial defect homologous chromosome pair when constructing the difference matrix reduces the traces left by artificial construction.
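The effect of the irreversible smoothing can be shown with a small sketch. It assumes, purely for illustration, that a chromosome is a one-dimensional sequence of gray values, that the artificial defect is a segment inversion, and that smoothing is a 3-point moving average; none of these choices is specified by the invention, and all names are hypothetical.

```python
def smooth(seq):
    """3-point moving average with edge clamping; information is lost, so it cannot be undone."""
    n = len(seq)
    return [(seq[max(i - 1, 0)] + seq[i] + seq[min(i + 1, n - 1)]) / 3.0 for i in range(n)]

def invert_segment(seq, start, end):
    """Reverse the order of seq[start:end], mimicking an inversion."""
    return seq[:start] + seq[start:end][::-1] + seq[end:]

def make_artificial_defect(normal, start, end):
    """Edit a normal sequence into a defect sequence, then smooth the result."""
    return smooth(invert_segment(normal, start, end))

def make_artificial_normal(defect, start, end):
    """Apply the inverse edit to the defect sequence, then smooth again."""
    return smooth(invert_segment(defect, start, end))

def mean_abs_diff(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

real_normal = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 24.0]
artificial_defect = make_artificial_defect(real_normal, 2, 6)
artificial_normal = make_artificial_normal(artificial_defect, 2, 6)

# The inverse edit does not restore the original exactly because of the smoothing.
residual = mean_abs_diff(artificial_normal, real_normal)
```

Because both members of the pair (artificial defect, artificial normal) have passed through the same smoothing, the smoothing itself is less likely to act as a cue that separates the two classes.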

Further, to obtain an accurate classifier model, the classifier model needs to be optimized using difference matrices of homologous chromosome pairs containing a real defect chromosome as samples. That is, a real defect chromosome is acquired, and a third difference matrix between a real defect homologous chromosome pair is obtained (using the construction of the difference matrix described above), where the two mutually homologous chromosomes of the real defect homologous chromosome pair are a real defect chromosome and a real normal chromosome, respectively. The classifier model is then trained with the third difference matrix as a sample so as to optimize it.
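The two-stage procedure (training on the first and second difference matrices, then adjusting with the third) might be sketched as follows. The sketch assumes a logistic-regression classifier over flattened difference matrices trained by plain gradient descent; the classifier type, the toy data, the learning rates, and all names are assumptions made for illustration only.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, x):
    """Probability that the flattened difference matrix x indicates an abnormality."""
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)

def train(samples, weights, bias, lr, epochs):
    """Per-sample gradient descent on the logistic loss; returns updated parameters."""
    for _ in range(epochs):
        for x, label in samples:
            err = predict(weights, bias, x) - label
            weights = [w - lr * err * v for w, v in zip(weights, x)]
            bias -= lr * err
    return weights, bias

# Flattened toy difference matrices; label 1 = structural abnormality, 0 = normal.
first = [([0.1, 0.0], 0), ([0.0, 0.2], 0)]    # real normal pairs
second = [([0.9, 0.8], 1), ([1.0, 0.7], 1)]   # artificial defect pairs
third = [([0.6, 0.5], 1)]                     # real defect pairs (scarce)

# Stage 1: train on the plentiful first and second difference matrices.
w, b = train(first + second, [0.0, 0.0], 0.0, lr=0.5, epochs=200)
# Stage 2: adjust with the real defect samples, using a smaller learning rate.
w, b = train(third, w, b, lr=0.1, epochs=50)

p_defect = predict(w, b, [0.8, 0.7])
p_normal = predict(w, b, [0.05, 0.1])
```

The smaller learning rate in the second stage is a design choice of this sketch, intended to let the scarce real samples nudge the model without discarding what was learned from the artificial ones.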

After obtaining the representation of the difference (difference matrix) between the homologous chromosome pairs, the difference between the homologous chromosome pairs of the user to be diagnosed can be detected through a pre-trained classifier model, so that whether the chromosome structure abnormality exists in the user to be diagnosed can be judged based on the difference degree between the homologous chromosome pairs of the user to be diagnosed. Therefore, the chromosome structural abnormality of the user can be automatically detected based on the chromosome karyotype picture in a deep learning mode, and the detection efficiency of the chromosome structural abnormality can be greatly improved.

It will be appreciated that the presence of structural chromosomal abnormalities can already be predicted visually after obtaining a differential representation of homologous chromosome pairs within a cell. However, when a chromosome structural abnormality is detected, if only difference data of a pair of homologous chromosomes in a certain cell is considered, the result may be disturbed by noise, and the accuracy of prediction may be lowered.

Therefore, in a preferred embodiment, the detection method provided by the present invention further includes: and judging whether the type of chromosome of the user to be diagnosed has structural abnormality or not based on the difference matrix of a plurality of homologous chromosome pairs of the same type in a plurality of cells of the user to be diagnosed. That is, comprehensive prediction is performed by considering data of a plurality of pairs of homologous chromosomes.

If a chromosome structural abnormality exists for a user, the corresponding chromosomes of the user's multiple cells should all exhibit the abnormality. Therefore, when diagnosing a chromosomal abnormality, if a comprehensive diagnosis is performed based on a plurality of karyotypes corresponding to a plurality of cells, the reliability of the diagnosis result can be improved.

Further, in the present invention, the difference matrix between homologous chromosome pairs of different single cells is not simply repeatedly put into the classifier model for determination. In order to more precisely perform the detection of the structural abnormality of the chromosome based on the difference between the pair of homologous chromosomes of the same type in the plurality of cells, the present invention further comprises the step S500: a plurality of difference feature matrices for pairs of homologous chromosomes of the same type in a plurality of cells of a user to be diagnosed are obtained.

Specifically, as shown in fig. 1 and fig. 2, step S500 further includes:

step S510: acquiring a differential combination matrix of the same kind of the user to be diagnosed based on a plurality of differential matrices of homologous chromosome pairs of the same kind in a plurality of cells; and

step S520: inputting the difference combination matrix into a first feature aggregation model obtained by pre-training, so as to obtain a difference feature matrix based on homologous chromosome pairs of a plurality of cells of the user to be diagnosed.

Correspondingly, when step S500 is performed, step S600 becomes: determining whether a chromosome of each type of the user to be diagnosed has a structural abnormality based on the difference feature matrix of the homologous chromosome pairs of that type. In this way, the determination is based on the difference matrices of homologous chromosome pairs of the same type across a plurality of cells of the user to be diagnosed.

That is, through steps S510 and S520, the chromosome sample used for detecting the structural abnormality of the chromosome is case-level rather than cell-level, so that detection errors due to noise and the like can be avoided.

Further, step S510 specifically includes: obtaining the difference combination matrix for the same type of the user to be diagnosed by adaptive weighted summation over the m difference matrices of the homologous chromosome pairs of that type in m cells. That is, this step can be regarded as simply superimposing the difference matrices of the homologous chromosome pairs of the same type across the plurality of cells.
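A minimal sketch of an adaptive weighted summation over m difference matrices follows. It assumes that the weights come from a softmax over one score per cell; how such scores would be produced is not stated here, so the scores, the function names, and the toy matrices are hypothetical.

```python
import math

def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def combine(diff_matrices, scores):
    """Weighted sum of m equally shaped difference matrices."""
    weights = softmax(scores)
    rows, cols = len(diff_matrices[0]), len(diff_matrices[0][0])
    return [[sum(w * d[r][c] for w, d in zip(weights, diff_matrices))
             for c in range(cols)] for r in range(rows)]

# Three cells (m = 3), each contributing a 2x2 difference matrix for one chromosome type.
cells = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[1.0, 0.0], [0.1, 0.9]],
    [[0.2, 0.7], [0.6, 0.1]],  # a noisy cell
]
scores = [2.0, 2.0, 0.0]  # hypothetical per-cell scores; the noisy cell is down-weighted
combined = combine(cells, scores)
uniform = combine(cells, [0.0, 0.0, 0.0])  # equal scores reduce to a plain average
```

With equal scores the combination reduces to an element-wise mean, which corresponds to the plain superposition mentioned above up to a constant factor.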

The present invention preferably performs step S520, namely inputting the difference combination matrix into the first feature aggregation model obtained by pre-training. This model performs at least matrix multiplication on the input difference combination matrix to obtain a difference feature matrix based on the homologous chromosome pairs of a plurality of cells of the user to be diagnosed. That is, feature extraction is performed again on the case-level difference combination matrix of homologous chromosome pairs of the same type across a plurality of cells, so that the difference between the homologous chromosome pairs of a given type for a given case is characterized at the case level by a difference feature matrix. It will be appreciated that since humans have 23 types of homologous chromosome pairs, there will be 23 difference feature matrices, one for each type.
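As a rough illustration of step S520, the sketch below treats the aggregation model as a single matrix multiplication with a weight matrix and applies it to one difference combination matrix per chromosome type. The weight values, the matrix shapes, and the names `aggregate`, `combination_by_type`, and `feature_by_type` are assumptions; a trained model would typically involve more than one such operation.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def aggregate(combined, weight):
    """Project a difference combination matrix with a (hypothetically learned) weight matrix."""
    return matmul(combined, weight)

# Hypothetical learned weight mapping 2 input features to 3 output features.
weight = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]

# One difference combination matrix per homologous chromosome type (1..23); toy values.
combination_by_type = {t: [[0.1 * t, 0.0], [0.0, 0.1 * t]] for t in range(1, 24)}

# One case-level difference feature matrix per type.
feature_by_type = {t: aggregate(m, weight) for t, m in combination_by_type.items()}
```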

Accordingly, when the classifier model makes case-level determinations, it must be trained with case-level difference feature matrices of homologous chromosome pairs formed from an artificial defect chromosome and a normal chromosome, and adjusted with case-level difference feature matrices of homologous chromosome pairs containing a real defect chromosome as samples.

Thus far, the method for detecting chromosomal structural abnormality based on deep learning provided by one aspect of the present invention has been described. According to the method, starting from the monomer sequence data of the sister chromatids of each chromosome, a difference matrix that characterizes the differences between homologous chromosome pairs in machine-readable form is built up step by step, so that whether the user to be diagnosed has a structural abnormality can be determined based on the difference matrix.

Please refer to fig. 4 for an understanding of the apparatus for detecting chromosomal structural abnormality based on deep learning provided in the present invention. As shown in fig. 4, in this embodiment, the detecting apparatus 400 is represented in the form of a general-purpose computer device, and is used to implement the steps of the method for detecting the chromosome structure abnormality based on deep learning described in any one of the above embodiments. For details, please refer to the above description of the method for detecting structural abnormality of chromosome based on deep learning, which is not repeated herein.

The components of the detection apparatus 400 may include one or more memories 401, one or more processors 402, and a bus 403 that connects the various system components (including the memories 401 and the processors 402).

The bus 403 includes a data bus, an address bus, and a control bus. The product of the number of bits of the data bus and the operating frequency is proportional to the data transfer rate, the number of bits of the address bus determines the maximum addressable memory space, and the control bus (read/write) indicates the type of bus cycle and the time at which the present I/O operation is completed. The processor 402 is connected to the memory 401 via a bus 403 and is configured to implement the method for detecting structural abnormality of a chromosome based on deep learning provided in any one of the above embodiments.
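The two bus relationships stated above can be made concrete with a short worked example. The figures used (a 64-bit data bus at 100 MHz and a 32-bit address bus) are arbitrary example values, the function names are hypothetical, and the sketch assumes one transfer per clock cycle.

```python
def peak_transfer_rate_bytes(data_bus_bits, frequency_hz):
    """Upper bound on bytes per second, assuming one transfer per clock cycle."""
    return data_bus_bits * frequency_hz // 8

def addressable_units(address_bus_bits):
    """Number of distinct addresses an address bus of the given width can select."""
    return 2 ** address_bus_bits

rate = peak_transfer_rate_bytes(64, 100_000_000)  # 64-bit data bus at 100 MHz
space = addressable_units(32)                     # 32-bit address bus

print(rate)   # 800000000
print(space)  # 4294967296
```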

The processor 402 is a final execution unit for information processing and program execution, which is an operation and control core of the detection apparatus 400. The operation of all software layers in the computer system will eventually be mapped to the operation of the processor 402 by the instruction set. The processor 402 has the main functions of processing instructions, executing operations, controlling time and processing data.

The memory 401 is a storage device for storing programs and data in the computer. Memory 401 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 404 and/or cache memory 405.

A Random Access Memory (RAM) 404 is an internal memory that exchanges data directly with the processor 402. It can be read and written at any time (except for refreshing), and is fast, usually used as a temporary data storage medium for an operating system or other programs in operation, and the stored data will be lost when power is off. Cache memory (Cache) 405 is a level one memory existing between main memory and processor 402, and has a relatively small capacity but much higher speed than main memory, close to the speed of processor 402.

It should be noted that, in the case that the detection apparatus 400 includes a plurality of memories 401 and a plurality of processors 402, the plurality of memories 401 and the plurality of processors 402 may have a distributed structure, for example, the detection apparatus may include memories and processors respectively located at a local end and a backend cloud end, and the local end and the backend cloud end jointly implement the above-described method for detecting the chromosome structure abnormality based on deep learning. Furthermore, in the embodiment adopting the distributed structure, each step may adjust a specific execution terminal according to the actual situation, and the specific implementation scheme of each step in a specific terminal should not unduly limit the scope of the present invention.

The memories 401 may store the classifier model obtained by pre-training, so that after the processors 402 obtain the difference matrix of a homologous chromosome pair, chromosome structural abnormalities are detected based on the classifier model.

The detection apparatus 400 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. In this embodiment, the storage system 406 may be used to read from and write to non-removable, nonvolatile magnetic media.

Memory 401 may also include at least one set of program modules 407. Program modules 407 may be stored in memory 401. Program modules 407 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment. Program modules 407 generally perform the functions and/or methodologies of embodiments of the invention as described.

The detection apparatus 400 may also communicate with one or more external devices 408. The external device 408 in this embodiment includes a display 409 or other interactive device for interacting with the user, so that the user can clearly obtain the detection result of the chromosome structure abnormality from the detection apparatus.

Detection apparatus 400 may also communicate with one or more devices that enable a user to interact with detection apparatus 400, and/or with any device (e.g., network card, modem, etc.) that enables detection apparatus 400 to communicate with one or more other computing devices. Such communication may occur through input/output (I/O) interfaces 410.

The detection apparatus 400 may also communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via the network adapter 411. As shown in fig. 4, the network adapter 411 communicates with the other modules of the detection apparatus 400 via the bus 403. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the detection apparatus 400, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.

Another aspect of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method for detecting a chromosome structure abnormality based on deep learning described in any one of the above embodiments. For details, refer to the above description, which is not repeated here. In addition, it is understood that the computer-readable storage medium may also take the form of a system, that is, it may include a plurality of computer-readable storage sub-media, so that the steps of the method for detecting chromosome structural abnormality based on deep learning described above are implemented jointly by the plurality of computer-readable storage sub-media.

The method and the device for detecting the chromosome structural abnormality based on deep learning provided by the invention have been described so far. According to the method and the device for detecting the chromosome structural abnormality based on the deep learning, provided by the invention, the chromosome can be represented through the characteristic matrix, and the difference between the homologous chromosome pairs can be represented through the difference matrix, so that whether the chromosome structural abnormality exists in a user can be judged according to the difference matrix through the deep learning, and the screening efficiency of the chromosome structural abnormality can be greatly improved. In addition, the structural data source of the characteristic matrix of the chromosome is monomer sequence data of sister chromatids, and important information of the chromosome can be accurately acquired as detailed as possible, so that the influence caused by noise of chromosome image data is effectively eliminated, and the accuracy of detecting the structural abnormality of the chromosome is improved.

The various illustrative logical modules and circuits described in connection with the embodiments disclosed herein may be implemented or performed with a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.

In one or more exemplary embodiments, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software as a computer program product, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media include both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. Any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a web site, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, include Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk, and blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

The previous description is provided to enable any person skilled in the art to practice the various aspects described herein. It is to be understood that the scope of the invention is to be defined by the appended claims and not by the specific constructions and components of the embodiments illustrated above. Those skilled in the art can make various changes and modifications to the embodiments within the spirit and scope of the present invention, and these changes and modifications also fall within the scope of the present invention.

Claims

1. A method for detecting chromosome structural abnormality based on deep learning is characterized by comprising the following steps:

acquiring chromosome image data of a user to be diagnosed;

acquiring monomer sequence data of sister chromatids in each chromosome of at least one cell of the user to be diagnosed according to the chromosome image data;

for each chromosome in the at least one cell, merging monomer sequence data of a sister chromatid thereof into sequence data of the chromosome, and splicing type data and banding number data of the chromosome in the sequence data of the chromosome to obtain a feature matrix of the chromosome;

for each type of homologous chromosome pair in the at least one cell, performing adaptive structural alignment similarity calculation on two feature matrices of the homologous chromosome pair to obtain a difference matrix characterizing differences between the homologous chromosome pair; and

determining whether there is a structural abnormality in the chromosome of the type of the user to be diagnosed based at least on the difference matrix of the pairs of homologous chromosomes of the types in the at least one cell,

wherein performing adaptive structure-aligned similarity calculation on the two feature matrices of the homologous chromosome pair to obtain the difference matrix characterizing differences between the homologous chromosome pair further comprises:

convolving the two feature matrices of the homologous chromosome pair respectively to obtain their respective query matrices, key matrices, and value matrices;

obtaining a first difference-weighted feature matrix based on three of the matrices so obtained, and obtaining a second difference-weighted feature matrix based on a further combination of three of those matrices; and

obtaining the difference matrix characterizing differences between the homologous chromosome pair based on the two difference-weighted feature matrices and a pre-trained second feature aggregation model,

wherein obtaining the first difference-weighted feature matrix and the second difference-weighted feature matrix further comprises computing each of them from its respective matrices; and

wherein obtaining the difference matrix characterizing differences between the homologous chromosome pair based on the two difference-weighted feature matrices and the pre-trained second feature aggregation model further comprises:

performing feature integration on each of the two difference-weighted feature matrices, unfolding the results into feature vectors, combining the features, and inputting the combined features into the second feature aggregation model to obtain the difference matrix of differences between the homologous chromosome pair.

2. The method of claim 1, wherein determining whether the structural abnormality exists for the type of chromosome of the user to be diagnosed based at least on the difference matrix for the type of homologous chromosome pair in the at least one cell further comprises:

and judging whether the chromosome of the type of the user to be diagnosed has structural abnormality or not based on a plurality of difference matrixes of the homologous chromosome pairs of the same type in a plurality of cells of the user to be diagnosed.

3. The method of claim 2, wherein determining whether the structural abnormality exists for the type of chromosome of the user to be diagnosed based on a plurality of difference matrices for pairs of homologous chromosomes of the same type for a plurality of cells of the user to be diagnosed further comprises:

acquiring a difference combination matrix of the same kind of the user to be diagnosed based on a plurality of difference matrices of homologous chromosome pairs of the same kind in a plurality of cells;

inputting the difference combination matrix into a first feature aggregation model obtained by pre-training so as to obtain a difference feature matrix based on homologous chromosome pairs of a plurality of cells of the user to be diagnosed; and

and judging whether structural abnormality exists in the chromosome of each type of the user to be diagnosed based on the difference characteristic matrix of the homologous chromosome pair of each type.

4. The detection method according to claim 3, wherein a difference combination matrix of the same kind of the user to be diagnosed is obtained by adaptive weighted summation based on a plurality of difference matrices of homologous chromosome pairs of the same kind in a plurality of cells; and/or

the first feature aggregation model performs matrix multiplication processing on at least the input difference combination matrix.

5. The detection method according to claim 1, wherein obtaining monomer sequence data for sister chromatids in each chromosome of at least one cell of the user to be diagnosed from the chromosome image data further comprises:

image refining the chromosome image data to extract skeleton lines of the chromosomes; and

acquiring a gray average value of a preset number of vertical scanning lines on the skeleton line based on the extending direction of the skeleton line, and acquiring a sequence of the gray average values of a preset number of lengths representing the extending direction as the monomer sequence data.

6. The method of any one of claims 1-5, wherein determining whether the type of chromosome of the user to be diagnosed has the structural abnormality based at least on the difference matrix of pairs of homologous chromosomes of various types in the at least one cell further comprises:

inputting the difference matrix into a classifier model obtained by pre-training so as to judge whether the chromosome of the type of the user to be diagnosed has structural abnormality or not; wherein

The classifier model is trained by taking the difference matrix of the homologous chromosome pair of the artificial defect chromosome and the normal chromosome as a sample, and is adjusted by taking the difference matrix of the homologous chromosome pair of the real defect chromosome as a sample.

7. A detection apparatus for detecting chromosome structural abnormality based on deep learning, the detection apparatus comprising: at least one processor; and

a memory coupled to the at least one processor, the memory containing instructions stored therein, which when executed by the at least one processor, cause the detection apparatus to perform the method for detecting deep learning based chromosomal structural abnormalities of any of claims 1-6.

8. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method for detecting deep learning-based chromosomal structural abnormalities according to any of claims 1-6.