CN112562819B - Report generation method of ultrasonic multi-section data for congenital heart disease - Google Patents
- Publication number
- CN112562819B CN112562819B CN202011454009.1A CN202011454009A CN112562819B CN 112562819 B CN112562819 B CN 112562819B CN 202011454009 A CN202011454009 A CN 202011454009A CN 112562819 B CN112562819 B CN 112562819B
- Authority
- CN
- China
- Prior art keywords
- ultrasonic image
- ultrasonic
- convolution
- graph
- label
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H15/00—ICT specially adapted for medical reports, e.g. generation or transmission thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T11/00—2D [Two Dimensional] image generation
- G06T11/60—Editing figures and text; Combining figures or text
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
Abstract
The invention discloses a report generation method for multi-section ultrasound data of congenital heart disease, characterized by the following steps: step 1, preparing the training data and preprocessing it; step 2, extracting ultrasound image features with an ultrasound image feature extractor, in which a residual structure carries shallow texture and color information through 4 convolution modules, each containing 2 convolution layers, 2 batch normalization layers and 2 activation functions; step 3, constructing a pathology label graph; step 4, extracting information with a multi-frame ultrasound image attention mechanism; and step 5, building a multi-frame ultrasound image report generation model. The model structure is designed for generating reports from multi-section congenital heart disease ultrasound and is grounded in basic clinical requirements: because network speed matters clinically, an overly complex network structure is avoided, yet the model built to this standard still meets the accuracy required for clinical use.
Description
Technical Field
The invention relates to a classification method using a multi-scale detection network with a multi-scale feature extraction module and a lesion region detection module, and in particular to a report generation method for multi-section ultrasound data of congenital heart disease.
Background
Congenital heart disease is one of the most common diseases among newborns in China and many other countries. It occurs in 8-12 per 1000 live births in China, meaning that 120,000 to 200,000 children with congenital heart disease are born in China every year. Of these, complicated congenital heart disease, for which existing treatments cannot achieve a good outcome or which is prone to early postnatal death, accounts for about 20 percent, making it one of the leading causes of death in newborns and children.
Although congenital heart disease is quite common, the quality of cardiac ultrasonography for newborns and children currently varies widely, and the capacity for processing ultrasound images urgently needs to improve. Accordingly, experts and scholars in the field have proposed using artificial intelligence to process the relevant ultrasound images. Perrin et al. proposed a method for classifying congenital heart disease images based on a convolutional neural network. Abdi et al. developed a deep convolutional neural network for quality assessment of apical four-chamber echo views. Dezaki et al. designed a neural network that extracts the temporal correlations of echocardiograms.
This prior work lays a solid foundation for artificial-intelligence image recognition in congenital heart disease, but at present there is no system that performs artificial-intelligence image processing on echocardiograms, and no report generation method for multi-section congenital heart disease ultrasound data.
Disclosure of Invention
The invention aims to provide a report generation method for multi-section ultrasound data of congenital heart disease that is grounded in basic clinical requirements and improves the efficiency of generating reports from ultrasound images.
The technical scheme of the invention provides a report generation method for multi-section ultrasound data of congenital heart disease, characterized by the following steps:
step 1, preparing the training data and preprocessing it;
step 2, extracting ultrasound image features with an ultrasound image feature extractor.
In the feature extractor, a residual structure is adopted to carry shallow texture and color information, and 4 convolution modules are used, each containing 2 convolution layers, 2 batch normalization layers and 2 activation functions;
step 3, constructing a pathology label graph: a language parser automatically extracts subject-predicate-object combinations from the reports, which are manually screened and summarized into 25 pathology labels, each comprising a positive and a negative observation representing the presence or absence of the pathology; once extracted, the pathology label graph serves as additional label data to guide the training of the feature extractor;
step 4, extracting information with a multi-frame ultrasound image attention mechanism;
and step 5, building a multi-frame ultrasound image report generation model: topic division and pathology label extraction are performed on the reports in the data set, yielding 5 topic sentences and 25 pathology labels; the input multi-view ultrasound images are fused with an attention mechanism, and an initial fully connected input graph and fully connected adjacency matrix over the 5 topic sentences are constructed.
Further, in step 2, the image is first resized to 224 × 224 in the preprocessing operation to fit the network input; a 7 × 7 convolution layer then reduces it to 112 × 112, and a 3 × 3 max-pooling layer with stride 2 reduces it to 56 × 56. The image then passes through the 4 convolution modules, each containing 2 3 × 3 convolution layers; after these, a batch normalization layer and a ReLU activation layer keep the feature distribution of each channel consistent.
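The preprocessing and residual-module pipeline described above can be sketched in PyTorch. This is a minimal illustration, not the patented implementation: the class names, the fixed channel width, and the use of identity shortcuts throughout are assumptions for clarity.

```python
import torch
import torch.nn as nn

class ConvModule(nn.Module):
    """One convolution module as described: 2 conv layers, 2 batch-norm
    layers, 2 activations, with a residual shortcut that preserves
    shallow texture and color information."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # shortcut add, then second activation

class FeatureExtractor(nn.Module):
    """224x224 input -> 7x7 conv (112x112) -> 3x3 stride-2 max-pool
    (56x56) -> 4 residual convolution modules."""
    def __init__(self, channels=64):
        super().__init__()
        self.stem = nn.Conv2d(3, channels, 7, stride=2, padding=3)
        self.pool = nn.MaxPool2d(3, stride=2, padding=1)
        self.blocks = nn.Sequential(*[ConvModule(channels) for _ in range(4)])

    def forward(self, x):
        return self.blocks(self.pool(self.stem(x)))

x = torch.randn(1, 3, 224, 224)   # one preprocessed ultrasound frame
feats = FeatureExtractor()(x)     # spatial size 56 x 56, as in the text
```

The stride/padding choices above are chosen so the intermediate sizes (112 × 112, then 56 × 56) match the sizes stated in the description.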
Further, in step 3, a 25-node pathology label graph structure is constructed to model the relationships between pathologies.
The invention has the following beneficial effects: it improves the efficiency of ultrasound image interpretation by means of artificial intelligence. The model structure is designed for generating reports from multi-section congenital heart disease ultrasound and is grounded in basic clinical requirements; because network speed matters clinically, an overly complex network structure is avoided, yet the model built to this standard still meets the accuracy required for clinical use.
Drawings
FIG. 1 is a diagram of the multi-section report generation model architecture.
FIG. 2 is a training structure diagram of the ultrasound image feature extractor.
FIG. 3 is a schematic diagram of the report generation model.
FIG. 4 is the pathology label graph.
Detailed Description
The technical scheme of the invention is explained in detail in the following with reference to the attached drawings 1-4.
To achieve the purpose of the invention, the classification method based on multiple congenital heart disease ultrasound sections comprises the following aspects:
and step 1, finishing training data and preprocessing.
The model training data comprises 310 cases, wherein 61 cases of normal person section data, 104 cases of congenital heart disease atrial septal defect patient section data and 145 cases of congenital heart disease ventricular septal defect patient section data. The data classification method is provided by Wuhan Asia heart disease hospitals, and is classified by professional doctors of ultrasonic departments of the Wuhan Asia heart disease hospitals, so that the accuracy of section data classification is guaranteed. The training data are stored in the DICOM format in the sequence shown in table 1, and the number of frames in each slice is different, so that the training data can be pre-processed.
TABLE 1 ultrasonic cardiogram section each classification name table
Step 2: extract ultrasound image features with an ultrasound image feature extractor.
In the feature extractor, this embodiment adopts a residual structure to carry shallow texture and color information while avoiding the vanishing-gradient problem.
This embodiment uses 4 convolution modules, each containing 2 convolution layers, 2 batch normalization layers and 2 activation functions. In total, each ultrasound image passes through only an 18-layer network structure, which suits efficient ultrasound image feature extraction.
The feature extractor is designed based on the ResNet18 network; the preliminary model structure is shown in FIG. 2. Because the shortcut connections of the residual structure also preserve shallow features of the image, this embodiment adopts the residual design for the convolution modules in the network.
Owing to the limit on the total number of layers, only 4 convolution modules are used. Each picture of each section's data is input into the network shown in FIG. 3.
First, the picture is resized to 224 × 224 in the preprocessing operation to fit the network input; a 7 × 7 convolution layer then reduces it to 112 × 112, and a 3 × 3 max-pooling layer with stride 2 reduces it to 56 × 56. The picture then passes through the 4 convolution modules, each containing 2 3 × 3 convolution layers; after these, a Batch Normalization (BN) layer and a ReLU activation layer keep the feature distribution of each channel consistent. Before the output of each convolution module, the input features are added to the convolved features and passed through a second ReLU activation layer, avoiding the vanishing-gradient problem. This structure follows the work of He et al. [18]. After the input image has passed through the 4 convolution modules, this embodiment classifies the resulting features with a softmax layer. The softmax function, also called the normalized exponential function, exponentiates a set of values and normalizes them, as shown in (1):

softmax(z)_j = exp(z_j) / Σ_k exp(z_k)    (1)
That is, for each class the weight is computed exponentially, giving the probability that the feature belongs to the j-th class. Owing to the exponential, normalization suppresses low-probability classes and amplifies high-probability ones, which is why softmax is widely used in multi-class problems. Applying the softmax function yields a 1 × 10 vector in which each position i is the probability that the single frame belongs to the i-th class; the largest value in the vector determines the classification of the frame. For pathology label classification, this embodiment adds an extra fully connected output branch to the feature extraction network that predicts a 1 × 25 vector, each position i corresponding to the output for the i-th pathology label; a sigmoid function then maps each position i to the probability that the picture contains the i-th pathology label.
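The two output branches can be illustrated numerically. This sketch only shows the softmax of Eq. (1) over the 10 view classes and the independent sigmoid over the 25 pathology labels; the logit values are hypothetical, not taken from the patent.

```python
import numpy as np

def softmax(z):
    """Normalized exponential (Eq. 1): suppresses low-scoring classes
    and amplifies the highest-scoring one."""
    e = np.exp(z - z.max())      # shift by max for numerical stability
    return e / e.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical logits from the view-classification branch (1 x 10)
# and the pathology-label branch (1 x 25).
view_logits = np.array([0.1, 2.5, -1.0, 0.3, 0.0, 0.2, -0.5, 1.1, 0.4, -2.0])
label_logits = np.random.randn(25)

view_probs = softmax(view_logits)          # sums to 1 over the 10 classes
predicted_view = int(np.argmax(view_probs))  # the frame's view classification
label_probs = sigmoid(label_logits)        # independent per-label probabilities
```

Note the contrast: softmax forces a single mutually exclusive view class, while sigmoid lets any subset of the 25 pathology labels be active at once.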
Step 3: construct a pathology label graph. A language parser automatically extracts subject-predicate-object combinations from the reports; these are manually screened and summarized into 25 pathology labels, each comprising a positive and a negative observation representing the presence or absence of the pathology. Once extracted, the pathology label graph serves as additional label data to guide the training of the feature extractor.
When training the ultrasound image feature extractor, the images can be classified by view angle and by whether they contain obvious heart-disease features. However, view angle and the presence of pathological features alone do not give the feature extractor enough guidance: images that share the same view angle and both contain ASD or VSD features would show too little intra-class variation to support diverse automatic report generation. An additional image prior is therefore needed to assist the feature extractor in learning.
This embodiment innovatively introduces a pathology label graph, on the view that a medical report must describe every pathological feature accurately, and that the accuracy of the pathology description matters far more than generating pathology-irrelevant words.
During training of the feature extractor, this embodiment requires the extractor to predict accurately the section to which an ultrasound image belongs and, for images with an obvious lesion, additionally to predict the type of congenital heart disease from the image. After the feature extractor has been trained with the aid of the pathology label graph, its parameters are frozen, providing accurate prior information about the ultrasound images for the subsequent report generation stage.
To this end, this embodiment adds a pathology label graph structure to the network; the structure is shown in FIG. 4.
And 3.1, analyzing the sentences by using a language analyzer for the whole report data set, extracting the main and predicate object structures, and grouping the extracted main and predicate objects according to the description subject.
And 3.2, dividing the main predicate structure of each group into two types of description directions which respectively correspond to normal and abnormal conditions of pathology, and constructing a pathology label graph structure of 25 nodes by using a graph neural network at the end of the feature extractor to simulate the relation between pathologies.
Considering that the occurrence of the pathologies is not independent of each other, the embodiment needs to consider the correlation of the pathologies in the actual prediction, so the embodiment uses the graph neural network at the end of the feature extractor to construct the pathology label graph structure with 25 nodes for simulating the relationship between the pathologies.
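A single graph-convolution step over the 25-node label graph can be sketched as follows. This is a generic GCN propagation rule under stated assumptions, not the patent's exact layer: the fully connected adjacency, feature width, and symmetric normalization are illustrative choices.

```python
import numpy as np

def gcn_layer(H, A, W):
    """One graph-convolution step H' = ReLU(D^-1/2 (A+I) D^-1/2 H W),
    letting each pathology node aggregate evidence from related nodes."""
    A_hat = A + np.eye(A.shape[0])              # add self-loops
    d = A_hat.sum(axis=1)                       # node degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))      # symmetric normalization
    return np.maximum(0.0, D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W)

num_labels, feat_dim = 25, 16
A = np.ones((num_labels, num_labels)) - np.eye(num_labels)  # fully connected
H = np.random.randn(num_labels, feat_dim)                   # node features
W = np.random.randn(feat_dim, feat_dim)                     # learned weights
H_next = gcn_layer(H, A, W)
```

In training, `A` could instead be estimated from label co-occurrence in the reports, so that correlated pathologies (e.g. a defect and its secondary findings) reinforce each other's predictions.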
Step 4, extracting information by adopting a multi-frame ultrasonic image attention mechanism;
Because ultrasound data form a sequence of images with considerable redundancy, a key problem is how to extract the important information from the redundant frames and generate a report. To extract important information and reduce redundancy, this embodiment designs a multi-frame ultrasound image attention mechanism.
For the 20 ultrasound images, this embodiment first extracts features with the pre-trained feature extractor, giving a B × 20 × D tensor. A first fully connected layer then squeezes the D dimension to obtain B × 20 × D/r features, where r is a set reduction ratio, here 4. After ReLU activation, a second fully connected layer restores the features to size B × 20 × D, and finally a sigmoid activation maps the weights into [0, 1]. The output weights are multiplied element-wise with the original features to obtain the weighted features; to retain the original feature information, the weighted features and the original features are then added element-wise.
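The squeeze-then-restore gating above can be sketched in NumPy. This is an illustrative implementation under the assumption that the second fully connected layer restores dimension D so the gate can multiply the original B × 20 × D features element-wise; the weight matrices here are random stand-ins for learned parameters.

```python
import numpy as np

def frame_attention(feats, W1, W2):
    """Attention over B x N x D frame features: FC squeeze to D/r,
    ReLU, FC restore to D, sigmoid gate, element-wise multiply,
    then a residual add to keep the original information."""
    squeezed = np.maximum(0.0, feats @ W1)            # B x N x D/r, ReLU
    gates = 1.0 / (1.0 + np.exp(-(squeezed @ W2)))    # B x N x D, in [0, 1]
    weighted = feats * gates                          # element-wise weighting
    return weighted + feats                           # residual add

B, N, D, r = 2, 20, 32, 4          # batch, frames, feature dim, ratio
feats = np.random.randn(B, N, D)   # stand-in for extractor output
W1 = np.random.randn(D, D // r)    # first FC: squeeze
W2 = np.random.randn(D // r, D)    # second FC: restore
out = frame_attention(feats, W1, W2)
```

The reduction ratio r = 4 trades gate expressiveness against parameter count, the same bottleneck idea used in squeeze-and-excitation blocks.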
Step 5: build a multi-frame ultrasound image report generation model. Topic division and pathology label extraction are performed on the reports in the data set, yielding 5 topic sentences and 25 pathology labels; the input multi-view ultrasound images are fused with an attention mechanism, and an initial fully connected input graph and fully connected adjacency matrix over the 5 topic sentences are constructed.
Because medical reports vary in length and use a flexible description format, this embodiment performs topic division and pathology label extraction on the reports in the data set, obtaining 5 topic sentences and 25 pathology labels. For the 20 input multi-view ultrasound images, this embodiment first fuses them using the attention mechanism and then constructs the initial fully connected input graph and adjacency matrix over the 5 topic sentences. The overall model uses a graph convolution network and the recurrent neural network LSTM, as shown in FIG. 3. The report is generated gradually as the network iterates: at each step one word is generated, then one graph convolution models the relationships between the nodes of the 5 topic sentences, and the next iteration begins. Through continued iteration, the 5 topic reports are finally generated and combined into the final report for the input.
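The alternation of LSTM word steps and graph convolutions over the 5 topic nodes can be sketched in PyTorch. All names, dimensions, and the degree-normalized adjacency are illustrative assumptions; a real decoder would feed word embeddings back in and train the weights.

```python
import torch
import torch.nn as nn

class TopicReportDecoder(nn.Module):
    """Per iteration: each of the 5 topic nodes emits one word via a
    shared LSTM cell, then one graph convolution mixes the topic
    hidden states before the next iteration."""
    def __init__(self, feat_dim=64, vocab_size=100, num_topics=5):
        super().__init__()
        self.cell = nn.LSTMCell(feat_dim, feat_dim)
        self.gcn_weight = nn.Linear(feat_dim, feat_dim)
        self.word_head = nn.Linear(feat_dim, vocab_size)
        # fully connected adjacency over topic nodes, degree-normalized
        self.register_buffer("adj", torch.ones(num_topics, num_topics) / num_topics)

    def forward(self, topic_feats, steps=8):
        h = torch.zeros_like(topic_feats)
        c = torch.zeros_like(topic_feats)
        x, words = topic_feats, []
        for _ in range(steps):
            h, c = self.cell(x, (h, c))                    # one word step per topic
            words.append(self.word_head(h).argmax(-1))     # greedy word id per topic
            x = torch.relu(self.adj @ self.gcn_weight(h))  # graph convolution
        return torch.stack(words, dim=1)                   # num_topics x steps

decoder = TopicReportDecoder()
fused = torch.randn(5, 64)     # fused ultrasound features per topic node
word_ids = decoder(fused)      # one word sequence per topic sentence
```

Joining the 5 per-topic word sequences in order would mirror the final step of combining the topic reports into the complete report for the input.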
This embodiment devises the structure of a multi-section report generation model based on congenital heart disease ultrasound. The model is built on basic clinical requirements: because network speed matters, a very complex network structure is not chosen, yet the model built to this standard reaches the accuracy required for clinical application.
In the ultrasound image feature extraction structure, this embodiment adopts a residual structure to carry shallow texture and color information while avoiding the vanishing-gradient problem. Owing to the speed limitation, it uses 4 convolution modules, each with 2 convolution layers, 2 batch normalization layers and 2 activation functions, rather than a deeper, more complex network. In total, each ultrasound image passes through only an 18-layer network structure, which suits the extraction of clinical section features where speed requirements are high.
This embodiment innovatively introduces a pathology label graph, on the view that a medical report must describe every pathological feature accurately, and that the accuracy of the pathology description matters far more than generating pathology-irrelevant words. A language parser therefore automatically extracts subject-predicate-object combinations from the reports; these are manually screened and summarized into 25 pathology labels, each comprising a positive and a negative observation representing the presence or absence of the pathology. After extracting the pathology label graph, this embodiment uses it as additional label data to guide the training of the feature extractor.
Claims (3)
1. A report generation method of ultrasonic multi-section data for congenital heart disease is characterized by comprising the following steps:
step 1, preparing the training data and preprocessing it;
step 2, extracting ultrasound image features with an ultrasound image feature extractor;
in the feature extractor, a residual structure is adopted to carry shallow texture and color information, and 4 convolution modules are used, each containing 2 convolution layers, 2 batch normalization layers and 2 activation functions;
step 3, constructing a pathology label graph: a language parser automatically extracts subject-predicate-object combinations from the reports, which are manually screened and summarized into 25 pathology labels, each comprising a positive and a negative observation representing the presence or absence of the pathology; after extraction, the pathology label graph serves as additional label data to guide the training of the feature extractor;
step 4, extracting information by adopting a multi-frame ultrasonic image attention mechanism;
for N ultrasound images, a pre-trained feature extractor first extracts B × N × D dimensional features; a first fully connected layer then squeezes the D dimension to obtain B × N × D/r features, where r is a set reduction ratio, here 4; after ReLU activation, a second fully connected layer restores the features to size B × N × D, and finally a sigmoid activation maps the weights into [0, 1]; the output weights are multiplied element-wise with the original features to obtain the weighted features; to retain the original feature information, the weighted features and the original features are added element-wise;
step 5, building a multi-frame ultrasound image report generation model: topic division and pathology label extraction are performed on the reports in the data set, yielding 5 topic sentences and 25 pathology labels; the input multi-view ultrasound images are fused with an attention mechanism, and an initial fully connected input graph and fully connected adjacency matrix over the 5 topic sentences are constructed;
the overall model uses a graph convolution network and the recurrent neural network LSTM; the report is generated gradually as the network iterates: at each step one word is generated, then one graph convolution models the relationships between the nodes of the 5 topic sentences, and the next iteration begins; through continued iteration, the 5 topic reports are finally generated and combined into the final report for the input.
2. The method of claim 1, wherein in step 2 the image is resized to 224 × 224 in the preprocessing operation to fit the network input; a 7 × 7 convolution layer then reduces it to 112 × 112, and a 3 × 3 max-pooling layer with stride 2 reduces it to 56 × 56; the image then passes through 4 convolution modules, each containing 2 3 × 3 convolution layers, after which a batch normalization layer and a ReLU activation layer keep the feature distribution of each channel consistent.
3. The method of claim 1, wherein in step 3 a 25-node pathology label graph structure is constructed to model the relationships between pathologies.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011454009.1A CN112562819B (en) | 2020-12-10 | 2020-12-10 | Report generation method of ultrasonic multi-section data for congenital heart disease |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011454009.1A CN112562819B (en) | 2020-12-10 | 2020-12-10 | Report generation method of ultrasonic multi-section data for congenital heart disease |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112562819A CN112562819A (en) | 2021-03-26 |
CN112562819B true CN112562819B (en) | 2022-06-17 |
Family
ID=75061745
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011454009.1A Active CN112562819B (en) | 2020-12-10 | 2020-12-10 | Report generation method of ultrasonic multi-section data for congenital heart disease |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112562819B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113139956B (en) * | 2021-05-12 | 2023-04-14 | 深圳大学 | Generation method and identification method of section identification model based on language knowledge guidance |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106934397B (en) * | 2017-03-13 | 2020-09-01 | 北京市商汤科技开发有限公司 | Image processing method and device and electronic equipment |
CN109754015B (en) * | 2019-01-02 | 2021-01-26 | 京东方科技集团股份有限公司 | Neural networks for drawing multi-label recognition and related methods, media and devices |
CN111968064B (en) * | 2020-10-22 | 2021-01-15 | 成都睿沿科技有限公司 | Image processing method and device, electronic equipment and storage medium |
- 2020-12-10 (CN): application CN202011454009.1A filed; patent CN112562819B, status Active
Also Published As
Publication number | Publication date |
---|---|
CN112562819A (en) | 2021-03-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Wang et al. | Prior-attention residual learning for more discriminative COVID-19 screening in CT images | |
CN109886273B (en) | CMR image segmentation and classification system | |
CN113040715B (en) | Human brain function network classification method based on convolutional neural network | |
Liao et al. | On modelling label uncertainty in deep neural networks: automatic estimation of intra-observer variability in 2d echocardiography quality assessment | |
CN109544517A (en) | Method and system are analysed in multi-modal ultrasound group credit based on deep learning | |
Nurmaini et al. | Accurate detection of septal defects with fetal ultrasonography images using deep learning-based multiclass instance segmentation | |
CN111080643A (en) | Method and device for classifying diabetes and related diseases based on fundus images | |
Zhang et al. | Hybrid graph convolutional network for semi-supervised retinal image classification | |
CN110427994A (en) | Digestive endoscope image processing method, device, storage medium, equipment and system | |
CN114863185A (en) | Light-weight echocardiogram standard section identification method, device and medium | |
CN112906748A (en) | 12-lead ECG arrhythmia detection classification model construction method based on residual error network | |
CN114398979A (en) | Ultrasonic image thyroid nodule classification method based on feature decoupling | |
CN113662664A (en) | Instrument tracking-based objective and automatic evaluation method for surgical operation quality | |
CN112562819B (en) | Report generation method of ultrasonic multi-section data for congenital heart disease | |
CN111047590A (en) | Hypertension classification method and device based on fundus images | |
CN111540467A (en) | Schizophrenia classification identification method, operation control device and medical equipment | |
CN111462082A (en) | Focus picture recognition device, method and equipment and readable storage medium | |
CN114399634A (en) | Three-dimensional image classification method, system, device and medium based on weak supervised learning | |
CN112085742B (en) | NAFLD ultrasonic video diagnosis method based on context attention | |
CN111340807B (en) | Nidus positioning core data extraction method, system, electronic equipment and storage medium | |
CN112863650A (en) | Cardiomyopathy identification system based on convolution and long-short term memory neural network | |
CN117316369A (en) | Chest image diagnosis report automatic generation method for balancing cross-mode information | |
Sengan et al. | Echocardiographic image segmentation for diagnosing fetal cardiac rhabdomyoma during pregnancy using deep learning | |
CN115761216A (en) | Method for identifying brain nuclear magnetic resonance image of autism | |
CN113786185A (en) | Static brain network feature extraction method and system based on convolutional neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||