CN116977338A

CN116977338A - Chromosome case-level abnormality prompting system based on visual semantic association

Info

Publication number: CN116977338A
Application number: CN202311235013.2A
Authority: CN
Inventors: 穆阳; 张金超; 高悦; 汤滔; 徐思; 邓代华; 邹磊; 刘丽珏; 蔡昱峰; 彭伟雄
Original assignee: Hunan Zixing Wisdom Medical Technology Co ltd
Current assignee: Hunan Zixing Wisdom Medical Technology Co ltd
Priority date: 2023-09-25
Filing date: 2023-09-25
Publication date: 2023-10-31
Anticipated expiration: 2043-09-25
Also published as: CN116977338B

Abstract

The application provides a chromosome case-level abnormality prompting system based on visual semantic association. A karyogram preprocessing module segments the chromosomes and splices a position code onto each one, and a karyotype image encoding module then encodes the karyogram into an image feature vector. N abnormal karyotypes are input into a text encoder to obtain text feature vectors, from which an abnormal text vector base is constructed. Finally, the feature similarity between the image feature vector and the abnormal text vector base is calculated; the highest similarity is output and compared with a set threshold, and the corresponding karyotype is output if the threshold is reached, otherwise no karyotype is output. Because the text encoder guides the image encoder to learn important features, numerical and structural abnormalities are detected directly end to end, which makes the system simple and effective.

Description

Chromosome case-level abnormality prompting system based on visual semantic association

Technical Field

The application relates to the technical field of medical artificial intelligence, in particular to a chromosome case-level abnormality prompting system based on visual semantic association.

Background

Chromosome karyotyping is a technique for detecting numerical and structural abnormalities in human chromosomes. However, hundreds of metaphase images of varying quality are captured from a single patient's slide, and at least 30 metaphase images must be analyzed per patient to determine whether an abnormality (numerical or structural) exists. Before analysis, the doctor knows nothing about any individual metaphase image, so misdiagnosis and missed diagnosis occur easily. To reduce the burden on professionals and improve the efficiency of karyotype analysis, a chromosome case-level abnormality prompting system based on visual semantic association is provided. The system makes a preliminary judgment on a patient case before the doctor's analysis, so that the doctor can analyze the corresponding images in a targeted manner, thereby improving the accuracy and efficiency of karyotype analysis.

Several methods for analyzing chromosome abnormalities already exist in current karyotype analysis technology, mainly the following.

Rule-based methods: whether a chromosome is normal is judged by hand-set rules such as the chromosome length and the position of its center point. These methods work well on images with a clear chromosome structure, but the rules are difficult to design and they perform poorly on low-quality karyograms.

Traditional machine-learning-based methods: features such as chromosome morphology and gray level are extracted through feature engineering and input into a classification model, which is trained to judge whether a chromosome is normal. This approach is sensitive to feature design and works poorly when the image is complex or of low resolution.

Deep-learning-based methods: a convolutional neural network is trained for anomaly detection directly on chromosome images. This approach is limited in the types of abnormality it can detect, for example inversion of chromosome 9 or the Robertsonian translocation der(13;15). A detection model trained for a specific karyotype is highly limited and can only prompt that one type of abnormality. A system that prompts at the case level from metaphase images, without restricting the abnormality categories, is therefore more reasonable and effective.

Disclosure of Invention

The application aims to provide a chromosome case-level abnormality prompting system based on visual semantic association that prompts case-level abnormalities with the single metaphase image as the unit. An abnormality flag shown on a case in the system indicates that the case is likely abnormal and reminds the doctor to pay special attention, and the corresponding metaphase images are also marked with abnormality flags. In short, the data of a case are segmented and recognized to obtain the corresponding karyograms. The karyograms are then automatically input into the model of the application to obtain the abnormality information for each karyogram in the case. Finally, the system counts whether the amount of abnormality information in the case meets a set threshold; if so, a case abnormality is raised and displayed on the case, thereby prompting the case-level abnormality.

The application provides a chromosome case-level abnormality prompting system based on visual semantic association, which comprises:

a karyogram preprocessing module, used for segmenting the chromosomes of the input karyogram by category and outputting 24 chromosome images with spliced position codes;

a model offline training module, used for training the model, reading the data of the local karyotype image database with multiple threads and completing distributed training on multiple GPUs;

a karyotype image encoding module, used for encoding the chromosome karyogram into an image feature vector;

a karyotype text encoding module, used for encoding the karyotype text describing the abnormality into a text feature vector;

an abnormal karyotype text vectorization module, used for inputting N abnormal karyotypes into the text encoder to obtain abnormal text feature vectors and constructing an abnormal text vector base;

a feature vector similarity calculation module, used for calculating the feature similarity between the image feature vector and the abnormal text vector base, outputting the highest similarity and judging whether it reaches a set threshold, outputting the corresponding karyotype if the threshold is reached and otherwise outputting no karyotype;

and a user interaction interface, used for displaying the case-level abnormality information determined by the feature vector similarity calculation module and the abnormality information of the individual metaphase images in the case.

Specifically, the karyogram preprocessing module segments the chromosomes of an input karyogram, outputs 24 segmented chromosome images by category, takes the maximum size of all chromosome blocks as the standard size, and pads each chromosome image with pixel value 255 up to the standard size, where the standard size is defined as 128 x 128;

the karyotype image encoding module serves as the hub connecting visual information and semantic information in the whole pipeline, and extracts high-level semantic features of the image by means of the model's strong multimodal representation capability; unlike pure pixel-level information, these semantic features focus on the visual patterns of the chromosomes and are associated with linguistic concepts; the final encoding vector fully fuses the visual and semantic information of the karyogram; these encoding features can be used directly for abnormality diagnosis, or input into other diagnostic modules to enhance their effect;

the abnormal karyotype text vectorization module inputs N abnormal karyotype texts into the text encoder to obtain feature vectors and constructs the abnormal text vector base;

the feature vector similarity calculation module calculates the cosine similarity between the karyotype image encoding vector and the vectors of the abnormal text vector base;

the user interaction interface displays the case-level abnormality information and the abnormality information of the individual metaphase images in the case, directly prompting the abnormality recognition result and precisely locating the abnormal metaphase images within the case.

Further, in segmenting the chromosomes of the karyogram, the chromosomes are divided into blocks by karyotype category, the long side of each chromosome image is scaled to the standard length 128, the padding amount for the short side is then calculated, and pixels of value 255 are padded on both sides of the short side until it matches the long side of 128; the padding formula is as follows:

$$\mathrm{pad\_h} = \frac{H - h}{2}$$

where $H$ denotes the standard length 128, $h$ is the size of the short side to be padded, and $\mathrm{pad\_h}$ denotes the padding amount on each side of the short side.

Further, the karyogram preprocessing module operates as follows: the karyogram is first divided by category into 24 patches in total (chromosomes 1-22, X and Y), 24 vectors are obtained through linear mapping, a position code recording the vector's position is spliced onto each vector, and the 24 vectors are input into a Transformer encoder to obtain the corresponding image encoding vector.

Further, the linear mapping converts the 24 patches of size 128 x 128 into 24 768-dimensional vectors through convolution, pooling, activation and batch normalization.

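Further, the batch normalization proceeds as follows:

$$\mu = \frac{1}{m}\sum_{i=1}^{m} x_i \qquad (1)$$

$$\sigma^2 = \frac{1}{m}\sum_{i=1}^{m} (x_i - \mu)^2 \qquad (2)$$

$$\hat{x}_i = \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}} \qquad (3)$$

$$y_i = \gamma \hat{x}_i + \beta \qquad (4)$$

where $\mu$ is the mean of the input patches, $x_i$ denotes the input feature map, $\sigma^2$ is the variance of the patches, $\hat{x}_i$ is the normalized value of $x_i$, $\gamma$ is the scaling parameter, $\beta$ is the translation parameter, $y_i$ is the normalized output for each patch, and $i = 1, 2, 3, \ldots, 24$ (so $m = 24$); $\epsilon$ is the small constant used in standard batch normalization for numerical stability.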

Further, the data of the local karyotype image database come from millions of metaphase images annotated with karyotype analysis results, and each metaphase image corresponds to one karyotype information text.

Further, the karyotype text is tokenized and then input into the text encoder to obtain a karyotype text vector; contrastive learning is then performed against the image encoding vector obtained by the karyotype image encoding module, and the loss function is minimized so that the image encoding vector and the karyotype text vector tend toward agreement, the loss function being:

$$\mathcal{L}_q = -\log \frac{\exp(q \cdot k_+ / T)}{\sum_{i=0}^{k} \exp(q \cdot k_i / T)}$$

where $q$ denotes the vector obtained by the image encoder, $k_+$ denotes the correct text vector matching $q$, and $k_i$ denotes the $i$-th text vector in the sum; $k$ refers to the number of categories in the data set, which in contrastive learning is the number of negative samples; the sum in the denominator runs over 1 positive sample and $k$ negative samples, from 0 to $k$, so $k+1$ samples in total; $T$ is a scalar hyperparameter with default value $T = 1$; and $\mathcal{L}_q$ denotes the loss value of one sample.

Further, the abnormal text vector base is constructed as follows: k+1 abnormal karyotypes are prepared in advance, their karyotype expressions being obtained through annotation; the k+1 karyotypes are input into the trained BERT text encoder to obtain the feature vectors of the corresponding karyotypes; and finally a karyotype feature vector library with k+1 vectors of length 768 is established.

Further, the feature vector similarity is calculated as follows: a karyogram is input into the image encoder to obtain a feature vector Q, and the cosine similarity between Q and every vector of the abnormal text vector base is computed, giving a cosine similarity D ∈ [0, 1]; when D > threshold, the matched karyotype text is considered credible, and otherwise it is not, where threshold is the threshold for judging whether the abnormal karyotype is correct; the similarity calculation formula is as follows:

$$D = \cos(Q, b_j) = \frac{Q \cdot b_j}{\lVert Q \rVert \, \lVert b_j \rVert}$$

where $Q$ is the feature vector of the karyogram and $b_j$ is the $j$-th karyotype feature vector stored in the abnormal text vector base.

Further, the user interaction interface marks structural abnormalities with arrows, marks numerical abnormalities with circles, and displays text information for the abnormalities.

The application has the beneficial effects that:

the application takes the karyogram directly as input regardless of slide preparation quality, and has better image modeling capability: because the karyogram carries category prior information and is divided into blocks by category, its global content and local relationships can be represented more fully;

the application is trained end to end: the components are jointly optimized, the text encoder guides the image encoder to learn important features, and numerical and structural abnormalities are detected directly end to end, which is simple and effective; the specific abnormality can be prompted accurately; and because semantic information is introduced, the output conforms to clinical expression conventions;

the application is highly interpretable: detection results are explained through text descriptions, so that the suspected numerical or structural abnormalities in the images of a case are known, allowing a doctor or professional to determine quickly and efficiently whether the case is truly abnormal, which improves understandability;

the application is flexible and general: it can be extended to other pathological image classification and detection tasks, and further modalities can be added, which facilitates model transfer;

the application is easy to optimize: more data can be collected, the model architecture adjusted, and so on, to improve performance iteratively.

Drawings

For a further understanding of the nature and technical aspects of the present application, reference should be made to the following detailed description of the application and to the accompanying drawings, which are provided for purposes of reference only and are not intended to limit the application.

In the drawings,

FIG. 1 is a block diagram of the application;

FIG. 2 is a flow chart of the operation of the application;

FIG. 3 is a diagram of the image encoder model of the application;

FIG. 4 is a diagram of the text encoder model of the application;

FIG. 5 is a diagram of the karyotype text vector base of the application;

FIG. 6 is a schematic diagram of the abnormality prompting system of the application.

Detailed Description

In order to further explain the technical means adopted by the present application and the effects thereof, the following detailed description is given with reference to the preferred embodiments of the present application and the accompanying drawings.

The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to fall within the scope of the application.

In the description of the present application, it should be understood that the terms "center", "longitudinal", "lateral", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", etc. indicate orientations or positional relationships based on the drawings are merely for convenience in describing the present application and simplifying the description, and do not indicate or imply that the apparatus or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and thus should not be construed as limiting the present application. Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more features. In the description of the present application, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.

In the present application, the term "exemplary" is used to mean "serving as an example, instance, or illustration." Any embodiment described as "exemplary" in this disclosure is not necessarily to be construed as preferred or advantageous over other embodiments. The following description is presented to enable any person skilled in the art to make and use the application. In the following description, details are set forth for purposes of explanation. It will be apparent to one of ordinary skill in the art that the present application may be practiced without these specific details. In other instances, well-known structures and processes have not been described in detail so as not to obscure the description of the application with unnecessary detail. Thus, the present application is not intended to be limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features disclosed herein.

Examples

Referring to figs. 1-6, the application can be divided into three parts: the local karyotype image database, the model offline training module and the karyotype image encoding module form the first part; the karyotype text encoding module, the abnormal karyotype text vectorization module and the feature vector similarity calculation module form the second part; and the user interaction interface forms the third part.

First, the implementation of the local karyotype image database, the model offline training module, the karyotype image encoding module and the karyotype text encoding module is described as follows:

The data of the local karyotype image database are derived from millions of metaphase images annotated with karyotype analysis results and are used as training data; each metaphase image has one karyotype information text as its training label.

The left side of fig. 3 shows the image encoder of the application, which takes a karyogram as input and outputs an encoding vector. As shown in fig. 3, the leftmost part is a complete karyogram; even if the karyogram contains a chromosome abnormality, the model can hardly tell which category is abnormal. To solve this category position encoding problem, the application first divides the complete karyogram by category into small chromosome images, 24 patches in total (chromosomes 1-22, X and Y). The linear mapping on the right side of fig. 3 yields 24 vectors, and a position code such as (1, 2, 3, ..., 24) is spliced onto each vector to record the position of each category vector. These vectors are then input into a Transformer encoder to obtain the corresponding image encoding vector.

The right side of fig. 3 shows the linear mapping module, which converts the 24 patches of size 128 x 128 into 24 768-dimensional vectors through convolution, pooling, activation and batch normalization. Batch normalization is added to stabilize the input data, strengthen the generalization capability of the model, and reduce the dependence of the gradients on the scale of the parameters' initial values.

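$$\mu = \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad \sigma^2 = \frac{1}{m}\sum_{i=1}^{m} (x_i - \mu)^2, \qquad \hat{x}_i = \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}}$$

In the above formulas, $\mu$ is the mean of the input patches, $x_i$ denotes the input feature map, $\sigma^2$ is the variance of the patches, and $\hat{x}_i$ is the value of $x_i$ after normalization; $m$ is the number of patches and $\epsilon$ is the small constant used in standard batch normalization for numerical stability. The final normalized value is:

$$y_i = \gamma \hat{x}_i + \beta$$

The two parameters $\gamma$ and $\beta$ are introduced to perform scaling and translation. The network learns these two reconstruction parameters, so that the model can recover the feature distribution of the original network.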
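The normalization and the learned scaling and translation can be illustrated with a minimal Python sketch. It is not the implementation of the application: the function name `batch_norm`, the value of `eps`, and the use of one scalar activation per patch are assumptions made only for illustration (a real network normalizes each feature channel).

```python
import math


def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a list of scalars, then apply the learned scale and shift."""
    m = len(values)
    mean = sum(values) / m                                # mean of the inputs
    variance = sum((x - mean) ** 2 for x in values) / m   # biased variance
    return [gamma * (x - mean) / math.sqrt(variance + eps) + beta
            for x in values]


# Hypothetical activations, one scalar per chromosome patch (24 patches).
activations = [float(i) for i in range(1, 25)]
normalized = batch_norm(activations)                    # gamma = 1, beta = 0
shifted = batch_norm(activations, gamma=2.0, beta=0.5)  # learned scale and shift
```

With `gamma = 1` and `beta = 0` the output has approximately zero mean and unit variance; other values of the two parameters let the network move away from that distribution when training favors it.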

The text encoder used in the application is shown in fig. 4. It uses a BERT model as the text vector extractor; BERT combines the representation learning ability of the Transformer with large-scale pre-training, is an important model for text representation and understanding, and is widely applied to NLP tasks. Given the karyotype text corresponding to a karyogram, for example karyotype: +18, t(9,22), the text is first input into the tokenizer, and the resulting tokens are then input into the embedding of the text encoder to obtain a 768-dimensional text vector.

The model offline training module is used for training the model provided by the application. The model is a multimodal machine learning model that obtains a joint visual and semantic representation mainly through contrastive learning on images and text. The image encoder model structure of the application is shown in fig. 3: it divides a karyogram into a number of patches and then generates an embedding for each patch, analogous to word embeddings in NLP. These patch embeddings are input into the Transformer encoder together with the position embeddings, and global features of the image are extracted through the self-attention mechanism. A conventional ViT (Vision Transformer) divides the input image into equally sized blocks; for example, a 224 x 224 input image is divided into 16 x 16 patches, each of fixed size 14 x 14. In the karyograms of the application, however, the chromosomes differ in size and their number may vary, so they cannot be divided at a fixed size; instead, the chromosomes are divided into blocks by karyotype category. First, each chromosome image is scaled so that its long side equals the standard length 128; the padding amount for the short side is then calculated, and pixels of value 255 are padded on both sides of the short side until it matches the long side of 128. Each block is then input into the Transformer encoder by category, so that the model receives the category prior information and can derive position information from the category, and thus knows which chromosome is abnormal. The padding formula is as follows, where $H$ denotes the standard length 128, i.e. the size of the image block to be input into the model, $h$ is the size of the short side to be padded, and $\mathrm{pad\_h}$ denotes the padding amount on each side of the short side.

$$\mathrm{pad\_h} = \frac{H - h}{2}$$
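The padding step can be sketched in Python as below. This is a minimal illustration rather than the implementation of the application: it assumes the crop has already been scaled so that its long side (taken here as the height) equals 128, that the padding on each side is half the difference between the standard length and the short side, and that an odd difference puts the extra pixel on the second side. The function name `pad_short_side` and the nested-list image representation are hypothetical.

```python
def pad_short_side(image, standard=128, fill=255):
    """Pad the short side (width) of a crop whose long side (height) is `standard`."""
    height, width = len(image), len(image[0])
    if height != standard or width > standard:
        raise ValueError("long side must equal `standard` and short side must not exceed it")
    total = standard - width   # H - h
    left = total // 2          # padding on one side
    right = total - left       # the other side takes the extra pixel when H - h is odd
    return [[fill] * left + list(row) + [fill] * right for row in image]


# Hypothetical 128 x 50 crop filled with black pixels.
crop = [[0] * 50 for _ in range(128)]
padded = pad_short_side(crop)
print(len(padded), len(padded[0]))  # 128 128
```

Padding with 255 keeps the added border the same white as the karyogram background, so the padded region carries no spurious structure.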

The padded karyogram is input into the image encoder of the application shown in fig. 3, and a 768-dimensional karyogram feature vector q is obtained.

Next, as shown in fig. 4, the text encoder of the application tokenizes the karyotype text and encodes it to obtain a karyotype text vector k; contrastive learning is then performed between the karyogram feature vector q obtained by the image encoder and the karyotype text vector k obtained by the text encoder, so that the karyogram feature vector and the karyotype text vector tend toward agreement.

The loss function used in the application is given below. The motivation for this loss design is that if the problem were treated as binary classification, with only data samples and noise samples, it might be unfriendly to model learning, since many noise samples do not belong to a single class at all; it is therefore more reasonable to treat it as a multi-class problem.

$$\mathcal{L}_q = -\log \frac{\exp(q \cdot k_+ / T)}{\sum_{i=0}^{k} \exp(q \cdot k_i / T)}$$

Here $q$ denotes the vector obtained by the image encoder, $k_+$ denotes the correct text vector matching $q$, and $k_i$ denotes the $i$-th text vector in the sum. $k$ refers to the number of categories in the data set, which in contrastive learning is the number of negative samples. The sum in the denominator runs over 1 positive sample and $k$ negative samples, from 0 to $k$, so $k+1$ samples in total (the number of karyotype information categories). $T$ is a scalar hyperparameter with default value $T = 1$, and $\mathcal{L}_q$ denotes the loss value of one sample.
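A contrastive loss of this kind (one positive text vector among k+1 candidates, scaled by T) can be sketched in Python as follows. This is an illustrative sketch, not the training code of the application: the function name `info_nce`, the plain-list vectors and the two-dimensional toy inputs are assumptions, and the log-sum-exp rearrangement is a standard numerical-stability device rather than something the source describes.

```python
import math


def info_nce(q, text_vectors, positive_index, temperature=1.0):
    """Loss for one image vector q against k+1 candidate text vectors."""
    logits = [sum(a * b for a, b in zip(q, k)) / temperature
              for k in text_vectors]
    peak = max(logits)  # subtract the maximum before exponentiating
    log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
    # -log(exp(positive) / sum(exp(all))) = log_denominator - positive
    return log_denominator - logits[positive_index]


# Hypothetical toy vectors: q matches the first text vector.
q = [1.0, 0.0]
texts = [[1.0, 0.0], [0.0, 1.0]]
loss_correct = info_nce(q, texts, positive_index=0)
loss_wrong = info_nce(q, texts, positive_index=1)
```

The loss is smaller when the image vector points toward its matching text vector, which is what pulls the two encoders toward agreement during training.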

Then, to construct the abnormal karyotype text vector base, k+1 abnormal karyotypes are prepared in advance; all of them are karyotype expressions annotated by professional karyotype analysts. The k+1 karyotypes are input into the previously trained BERT text encoder to obtain the feature vectors of the corresponding karyotypes. Finally, a karyotype feature vector library B with k+1 vectors of length 768 is established, as shown in fig. 5.

A karyogram is input into the image encoder to obtain a feature vector Q, and the cosine similarity between Q and every vector of the karyotype text vector base is computed, giving a cosine similarity D ∈ [0, 1]. When D > threshold, the matched karyotype text is considered credible; otherwise it is not, where threshold is the threshold for deciding whether the abnormal karyotype is correct. The similarity calculation formula is as follows, where $b_j$ is the $j$-th vector of the karyotype text vector base:

$$D = \cos(Q, b_j) = \frac{Q \cdot b_j}{\lVert Q \rVert \, \lVert b_j \rVert}$$

Finally, by taking the highest similarity and judging whether it reaches the set threshold, the doctor is assisted in judging whether the case has a numerical or structural abnormality.
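The matching step can be sketched in Python as below. The sketch is illustrative only: the names `cosine` and `match_karyotype`, the dictionary-based vector base, the three-dimensional toy vectors and the threshold value 0.8 are all assumptions (the application does not state a threshold value, and its vectors are 768-dimensional). Vectors are assumed to be non-zero.

```python
import math


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def match_karyotype(q, vector_base, threshold):
    """Return (karyotype, similarity); karyotype is None below the threshold."""
    scored = [(label, cosine(q, vec)) for label, vec in vector_base.items()]
    best_label, best_score = max(scored, key=lambda pair: pair[1])
    if best_score > threshold:
        return best_label, best_score
    return None, best_score


# Hypothetical vector base with two abnormal karyotypes.
base = {"+18": [1.0, 0.0, 0.0], "t(9,22)": [0.0, 1.0, 0.0]}
confident = match_karyotype([0.9, 0.1, 0.0], base, threshold=0.8)
uncertain = match_karyotype([1.0, 1.0, 1.0], base, threshold=0.8)
```

Returning `None` below the threshold mirrors the rule that no karyotype is output when the highest similarity is not credible.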

Application example:

As shown in fig. 2, after an abnormal case passes through the segmentation and recognition algorithms at the front of the system, the karyogram corresponding to each cell (metaphase image) in the case is obtained; a karyogram is the image obtained by segmenting the chromosomes from the metaphase image by category and placing them at the positions given by the recognition results. Each karyogram in the case is input into the previously trained image encoder to obtain an image feature vector Q; the cosine similarity between Q and the abnormal karyotype text vector base B is then computed for matching, and the karyotype text vector $b_j$ with the maximum similarity is found. It is then determined whether this similarity D is greater than the set threshold.

After the above operations, the karyotype information for each cell in the case, normal or abnormal, is obtained. The system then counts how often each abnormal karyotype occurs; if the proportion for a certain abnormal karyotype (such as karyotype: +18, t(9,11)) exceeds n (where n is a threshold, for example n = 0.6), the case is indicated as having that abnormality (karyotype: +18, t(9,11)). A prompt flag is displayed in the system, and when the case is opened, the cells corresponding to the abnormality are also marked, with arrows for structural abnormalities and red circles for numerical abnormalities, as in fig. 6.
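The case-level decision can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the proportion is taken over all karyograms in the case (the text does not say which denominator is used), a karyogram with no credible match is represented by `None`, and the function name `case_level_abnormality` is hypothetical.

```python
from collections import Counter


def case_level_abnormality(per_image_karyotypes, ratio_threshold=0.6):
    """Return the abnormal karyotype whose share of the case exceeds the threshold.

    `per_image_karyotypes` holds one entry per karyogram: the matched abnormal
    karyotype text, or None when no abnormal karyotype was output.
    """
    total = len(per_image_karyotypes)
    counts = Counter(k for k in per_image_karyotypes if k is not None)
    if total == 0 or not counts:
        return None
    label, count = counts.most_common(1)[0]
    return label if count / total > ratio_threshold else None


# Hypothetical case with 10 karyograms.
flagged = case_level_abnormality(["+18, t(9,11)"] * 7 + [None] * 3)
borderline = case_level_abnormality(["+18, t(9,11)"] * 6 + [None] * 4)
```

Requiring the proportion to exceed n, rather than reacting to a single karyogram, keeps an isolated mismatch on one metaphase image from flagging the whole case.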

The embodiments of the present application described above do not limit the scope of the present application.

Claims

1. A chromosome case-level abnormality cue system based on visual semantic association, comprising:

2. The visual semantic association-based chromosome case-level abnormality prompting system according to claim 1, wherein in segmenting the chromosomes of the karyogram, the chromosomes are divided into blocks by karyotype category, the long side of each chromosome image is scaled to the standard length 128, the padding amount for the short side is then calculated, and pixels of value 255 are padded on both sides of the short side until it matches the long side of 128; the padding formula is as follows:

$$\mathrm{pad\_h} = \frac{H - h}{2}$$

3. The system for prompting chromosomal case-level abnormalities based on visual semantic association of claim 1, wherein said karyogram preprocessing module operates as follows: the karyogram is first divided by category into 24 patches in total (chromosomes 1-22, X and Y), 24 vectors are obtained through linear mapping, a position code recording the vector's position is spliced onto each vector, and the 24 vectors are input into a Transformer encoder to obtain the corresponding image encoding vector.

4. A chromosome case-level abnormality prompting system based on visual semantic association as claimed in claim 3, wherein said linear mapping converts the 24 patches of size 128 x 128 into 24 768-dimensional vectors through convolution, pooling, activation and batch normalization.

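5. The system for prompting chromosomal case-level abnormalities based on visual semantic association of claim 4, wherein said batch normalization proceeds as follows:

$$\mu = \frac{1}{m}\sum_{i=1}^{m} x_i \qquad (1)$$

$$\sigma^2 = \frac{1}{m}\sum_{i=1}^{m} (x_i - \mu)^2 \qquad (2)$$

$$\hat{x}_i = \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}} \qquad (3)$$

$$y_i = \gamma \hat{x}_i + \beta \qquad (4)$$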

6. The visual semantic association-based chromosome case-level abnormality prompting system according to claim 1, wherein the data of the local karyotype image database come from millions of metaphase images annotated with karyotype analysis results, and each metaphase image corresponds to one karyotype information text.

7. The chromosome case-level abnormality prompting system based on visual semantic association according to claim 6, wherein the karyotype information text is tokenized and then input into the text encoder to obtain a karyotype text vector; contrastive learning is then performed against the image encoding vector obtained by the karyotype image encoding module, and the loss function is minimized so that the image encoding vector and the karyotype text vector tend toward agreement, the loss function being:

$$\mathcal{L}_q = -\log \frac{\exp(q \cdot k_+ / T)}{\sum_{i=0}^{k} \exp(q \cdot k_i / T)}$$

8. The chromosome case-level abnormality prompting system based on visual semantic association according to claim 6, wherein the abnormal text vector base is constructed as follows: k+1 abnormal karyotypes are prepared in advance, their karyotype expressions being obtained through annotation; the k+1 karyotypes are input into the trained BERT text encoder to obtain the feature vectors of the corresponding karyotypes; and finally a karyotype feature vector library with k+1 vectors of length 768 is established.

9. The system for prompting chromosomal case-level abnormalities based on visual semantic association according to claim 6, wherein said feature vector similarity is calculated as follows: a karyogram is input into the image encoder to obtain a feature vector Q, and the cosine similarity between Q and every vector of the abnormal text vector base is computed, giving a cosine similarity D ∈ [0, 1]; when D > threshold, the matched karyotype text is considered credible, and otherwise it is not, where threshold is the threshold for judging whether the abnormal karyotype is correct; the similarity calculation formula is as follows:

$$D = \cos(Q, b_j) = \frac{Q \cdot b_j}{\lVert Q \rVert \, \lVert b_j \rVert}$$

10. The system for prompting chromosomal case-level abnormalities based on visual semantic association according to claim 1, wherein said user interaction interface marks structural abnormalities with arrows, marks numerical abnormalities with circles, and displays text information for the abnormal cells.