CN111986177A - Chest rib fracture detection method based on attention convolution neural network


Info

Publication number
CN111986177A
CN111986177A (application CN202010845981.5A)
Authority
CN
China
Prior art keywords
convolution
attention
point
chest
fracture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010845981.5A
Other languages
Chinese (zh)
Other versions
CN111986177B (en)
Inventor
张雄
彭司春
上官宏
侯婷
郝雅文
王安红
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Taiyuan University of Science and Technology
Original Assignee
Taiyuan University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Taiyuan University of Science and Technology
Priority to CN202010845981.5A
Publication of CN111986177A
Application granted
Publication of CN111986177B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/0002 - Inspection of images, e.g. flaw detection
    • G06T7/0012 - Biomedical image inspection
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/084 - Backpropagation, e.g. using gradient descent
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/60 - Analysis of geometric attributes
    • G06T7/66 - Analysis of geometric attributes of image moments or centre of gravity
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10072 - Tomographic images
    • G06T2207/10081 - Computed x-ray tomography [CT]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20081 - Training; Learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20084 - Artificial neural networks [ANN]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/30 - Subject of image; Context of image processing
    • G06T2207/30004 - Biomedical image processing
    • G06T2207/30008 - Bone
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 - Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 - Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Geometry (AREA)
  • Medical Informatics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the field of CT image target detection. A chest rib fracture detection method based on an attention convolutional neural network comprises the following specific steps. First, a chest rib fracture data set is obtained. Second, the data set is used for training: 1) the data set is sent to a preprocessing module for preprocessing; 2) a feature extraction network extracts primary features; 3) a multi-scale Inception module extracts multi-scale features and recombines features of different scales; 4) a cascade corner pooling prediction module predicts keypoints and outputs the corresponding heatmaps, connection vectors and offsets; 5) the keypoint heatmaps, connection vectors and offsets are constrained by an overall loss function. Third, the chest rib fracture data set is tested: targets are classified and located according to the predicted parameters of the top-left, bottom-right and center keypoints, with high classification accuracy and high localization precision.

Description

Chest rib fracture detection method based on attention convolution neural network
Technical Field
The invention belongs to the technical field of CT image target detection, and particularly relates to a chest rib fracture detection method based on an attention convolution neural network.
Background
Rib fracture is a pathological condition in which the broken end of a rib is displaced inward or outward, or crushed, under direct or indirect external violence (such as front-and-back compression of the chest). It is a serious chest injury that frequently occurs in daily life (for example in physical exercise, falls from height, criminal cases and traffic accidents).
Rib fracture is the main cause of chest pain and hydropneumothorax after trauma and brings severe pain to patients. In addition, fracture forms are complex and varied, which makes fracture diagnosis difficult. To find an optimal treatment plan for a patient in time, a doctor needs an accurate pathological judgment. A rib fracture lesion can, to a certain extent, induce lesions of adjacent structures such as the lung, chest wall and mediastinum, so rapid and accurate diagnosis of rib fracture also has a positive effect on treating diseases of other parts. Furthermore, rib fracture diagnosis is important evidence in judicial identification, insurance claims and the like. For these reasons, accurate diagnosis of the positions and number of rib fractures is very important for judging the degree of disability, the fracture type and the injury grade, improving the level of medical diagnosis and treatment, and avoiding medical disputes.
The most widely used clinical diagnostic basis for chest trauma is Computed Tomography (CT). CT images taken by commercial or clinical CT devices now have high definition; compared with a conventional X-ray film, a CT scan can accurately reveal the details of a rib fracture, such as the number and specific locations of fractures, and can also be used to evaluate damage to adjacent tissue structures. A doctor with three years of clinical experience can accurately diagnose the fracture type and injury degree by reading a high-definition CT image. However, hospitals and judicial institutions currently depend heavily on CT image quality and physician experience when diagnosing or treating rib fracture. A reader without clinical experience may diagnose rib fracture inaccurately, while even an experienced doctor needs to spend 2-3 minutes comparing slices back and forth for each diagnosis, making the process time-consuming and tedious. In addition, manual interpretation is affected by factors such as reading fatigue, the number of rib fractures and non-standard anatomical plane distribution, leading to a high missed-diagnosis rate. Therefore, it is highly desirable to develop a rapid and accurate automatic rib fracture identification technique so that patients can undergo surgical treatment as early as possible.
In 2006, Professor Geoffrey Hinton first proposed the concept of Deep Learning (DL), a method by which computers automatically learn pattern features. Compared with traditional algorithms, deep learning has strong feature extraction capability: relying on a large amount of sample data, it can obtain deep features with stronger robustness and generalization ability, and it performs excellently in many image processing fields. In recent years it has become a research hotspot in fields such as breast cancer, lung nodule and lung tumor prediction.
CornerNet and CenterNet are among the latest deep-learning target detection methods and perform well in many detection tasks. However, applying them directly to chest rib fracture detection raises the following problems. First, chest rib fracture detection is a small-scale target detection problem (a whole chest CT image is 1176 × 1194 pixels, while a fracture is only about 50 × 50 to 100 × 100 pixels, so the fracture occupies a small area of the whole image); fracture forms are complex and varied and highly similar to the surrounding background, so directly using HourglassNet to extract fracture features does not work well. Second, CenterNet improves on CornerNet by supplementing it with the features of the geometric center point, but the geometric-center features of a chest rib fracture are often sparse; the rib fracture features extracted by CenterNet therefore have poor expressive power, and neither the detection rate nor the classification accuracy of fractures can be guaranteed.
Disclosure of Invention
In order to solve the above technical problems in the prior art, the invention provides a chest rib fracture detection method that detects a chest CT image, identifies all rib fracture types contained in it, marks the fracture positions with specific bounding boxes, and achieves high bounding-box accuracy.
To achieve this purpose, the technical scheme adopted by the invention is as follows: a chest rib fracture detection method based on an attention convolutional neural network, which adopts an attention module to capture the effective feature points in the central area that best express semantic information, thereby improving the fracture detection rate and classification accuracy, and adds a multi-scale Inception block to effectively extract rib fracture features of complex and varied forms, thereby improving the detection accuracy of ultra-small-target chest rib fractures.
The method comprises the following specific steps:
Step one, a CT scanner scans the chest to obtain chest rib fracture data, and category classification, manual labeling and format conversion are performed on the data to obtain a chest rib fracture data set.
Step two, training the data set of the fracture of the chest rib, wherein the training process comprises the following steps:
1) sending the data set of the chest rib fracture into a preprocessing module for preprocessing;
2) sending the preprocessed image into a feature extraction network for primary feature extraction;
3) sending the image after primary feature extraction into a multi-scale Inception module for multi-scale feature extraction, and recombining features of different scales;
4) sending the recombined features into a cascade corner pooling prediction module, predicting the keypoints at the top-left and bottom-right corners of the target to be detected, and outputting a heatmap, connection vectors and offsets respectively;
sending the recombined features into a center pooling prediction module, predicting the center point of the target to be detected, and outputting offsets and a heatmap respectively, wherein the center-point heatmap is processed by an attention module;
5) constraining the corner heatmaps, connection vectors and offsets, together with the center-point offsets and heatmap, through an overall loss function;
step three, testing the data set of the fracture of the chest rib:
1) sending the data set of the chest rib fracture into a preprocessing module for preprocessing;
2) sending the preprocessed image into a feature extraction network for primary feature extraction;
3) sending the image after primary feature extraction into a multi-scale Inception module for multi-scale feature extraction, and recombining features of different scales;
4) sending the recombined features into a cascade corner pooling prediction module, predicting the keypoints at the top-left and bottom-right corners of the target to be detected, and outputting a heatmap, connection vectors and offsets respectively;
sending the recombined features into a center pooling prediction module, predicting the center point of the target to be detected, and outputting offsets and a heatmap respectively, wherein the center-point heatmap is processed by an attention module;
5) processing the center-point heatmap through the attention module, and classifying and locating targets according to the predicted parameters of the top-left, bottom-right and center keypoints.
The heatmaps are constrained by a heatmap loss function, the connection vectors by a connection-vector loss function, and the offsets by an offset loss function.
The multi-scale Inception module comprises four branches. Three branches perform convolution with 1 × 1 kernels and then apply 3 × 3 convolution to the results; the fourth branch performs 3 × 3 pooling and then applies 1 × 1 convolution to the pooled result. The outputs of the four branches are fed into a convolution layer for data dimensionality reduction.
The attention module comprises three branches. The first branch convolves the input image with a 1 × 1 kernel to obtain a feature map f(x); the second branch convolves the input image with a 1 × 1 kernel to obtain a feature map g(x); the autocorrelation coefficients of f(x) and g(x) are computed to obtain a pixel-weight distribution map of the feature image. The third branch convolves the input image with a 1 × 1 kernel to obtain a feature map h(x); h(x) is multiplied by the pixel-weight distribution map to obtain the corresponding self-attention feature map, in which the point with the maximum weight, i.e. the most prominent response point, is found; finally, the self-attention feature map is superimposed on the original input feature map through a modulation factor α to obtain the output.
The overall loss function comprises the corner position loss, the attention-point position loss, the embedding loss, and the corner and attention-point compensation (offset) losses, expressed as

L = L_det^co + L_det^at + α·L_pull + β·L_push + γ·(L_off^co + L_off^at)

where α = β = 0.1 and γ = 1; L_det^co denotes the corner position loss; L_det^at denotes the attention-point position loss; L_pull and L_push form the embedding loss, with L_pull used to reduce the distance between the connection vectors of two corners belonging to the same target and L_push used to enlarge the distance between the connection vectors of two corners not belonging to the same target; L_off^co denotes the corner compensation loss; and L_off^at denotes the attention-point compensation loss.
The invention improves the inherent structure of CenterNet (a network model that performs target detection based on three keypoints) so that it becomes suitable for chest rib fracture detection:
First, in feature extraction, considering that rib fracture is an ultra-small-scale target and that fracture forms are complex, varied and highly similar to the surrounding background, the proposed detection network retains the HourglassNet structure and, after primary feature extraction with HourglassNet, feeds the feature map into an Inception module capable of multi-scale feature extraction, so that rib fracture features are effectively extracted while the detection accuracy for ultra-small targets is improved.
Second, an attention sub-network is designed in the prediction module to adaptively estimate the effective feature points of the central area of the rib fracture; the target category and the bounding-box position are determined from the estimated central feature point together with the top-left and bottom-right keypoints, yielding high localization precision.
Drawings
Fig. 1 is a schematic view of the overall architecture of a rib fracture detection network.
Fig. 2 is a schematic diagram of a multi-scale inclusion block.
FIG. 3 is a schematic diagram of an attention module.
Fig. 4 is a CT image of several fractures commonly seen in rib bone images of the chest.
FIG. 5 is a detected Broken1 rib image with the defect location within a small box.
FIG. 6 is a detected Broken2 rib image with the defect location within a small box.
FIG. 7 is a detected Broken3 rib image with the defect location within a small box.
FIG. 8 is a detected Broken4 rib image with the defect location within a small box.
FIG. 9 is a detected Broken5 rib image with the defect location within a small box.
FIG. 10 is a histogram of the detection accuracy of three defect types under different detection models.
Detailed Description
In order to make the technical problems, technical solutions and advantageous effects to be solved by the present invention more clearly apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As shown in fig. 1, a method for detecting a fracture of a rib of a chest based on an attention convolution neural network includes the following steps:
scanning the chest by adopting a CT scanner to obtain the fracture data of the rib of the chest;
secondly, carrying out category classification, manual labeling and format conversion operation on the chest rib fracture data to obtain a chest rib fracture data set;
step three, training a chest rib fracture data set;
and step four, testing the data set of the chest rib fracture.
The training process is specifically divided into the following steps:
1) sending the data set of the chest rib fracture into a preprocessing module for preprocessing;
2) first, sending the rib fracture image into HourglassNet (an hourglass network with a symmetric encoder-decoder structure) for primary feature extraction;
3) then sending the result into a multi-scale Inception module (a basic block of the GoogLeNet network) to extract multi-scale features and recombine features of different scales;
4) in the cascade corner pooling prediction module, predicting the keypoints at the top-left and bottom-right corners of the target to be detected, and outputting the keypoint heatmap, connection vectors and offsets respectively;
5) in the center pooling prediction module, predicting the center point of the target to be detected; in particular, an attention subnet is adopted for center-point prediction to extract the sensitive points that best express the central features of the target, and the heatmap, connection vectors and offsets are output;
6) constraining the network with the heatmap loss, connection-vector loss and offset loss.
The testing process is essentially the same as the training process, except that in the final step the targets are classified and located according to the predicted parameters of the top-left, bottom-right and center keypoints.
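The final test step (classifying and locating targets from the predicted top-left, bottom-right and center keypoints) can be sketched as a minimal decoding routine. This is an illustrative NumPy sketch under stated assumptions, not the patent's implementation: the function name, the central-region fraction n, and the use of raw heatmap peaks are assumptions, and embedding-based corner pairing and offset refinement are omitted.

```python
import numpy as np

def decode_box(tl_heat, br_heat, ct_heat, n=3):
    """Decode one detection from top-left / bottom-right / center heatmaps.
    The corner pair is kept only if a center-point peak falls inside the
    central region of the candidate box (a CenterNet-style check)."""
    ty, tx = np.unravel_index(np.argmax(tl_heat), tl_heat.shape)
    by, bx = np.unravel_index(np.argmax(br_heat), br_heat.shape)
    if bx <= tx or by <= ty:          # corners must form a valid box
        return None
    # central region: middle 1/n of the box in each dimension
    cx1 = tx + (n - 1) * (bx - tx) / (2 * n)
    cx2 = tx + (n + 1) * (bx - tx) / (2 * n)
    cy1 = ty + (n - 1) * (by - ty) / (2 * n)
    cy2 = ty + (n + 1) * (by - ty) / (2 * n)
    cy, cx = np.unravel_index(np.argmax(ct_heat), ct_heat.shape)
    if cx1 <= cx <= cx2 and cy1 <= cy <= cy2:
        return (tx, ty, bx, by)       # confirmed bounding box
    return None                       # center check failed: reject the pair
```

A corner pair whose center check fails is discarded, which is exactly how the center keypoint suppresses low-quality boxes in the scheme described above.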
The specific structure of the multi-scale Inception module is as follows:
To improve the network's detection precision for chest rib fracture, the feature extraction module comprises a HourglassNet module and a multi-scale Inception module. HourglassNet is a feature extraction network with an encoder-decoder structure, and using it for preliminary feature extraction ensures the network's ability to detect rib fractures. In addition, the multi-scale Inception module increases the receptive field of the network at low computational cost while simultaneously extracting the low-level features and the high-level semantic features of the image, thereby improving the network's detection accuracy for ultra-small targets.
As shown in fig. 2, the multi-scale Inception structure is divided into four branches. First, to obtain features of multiple scales simultaneously, the input is convolved with 1 × 1 kernels (in three branches) and pooled with a 3 × 3 kernel (in the fourth branch); the 1 × 1 convolution results are then convolved with 3 × 3 kernels, and the pooled result is convolved with a 1 × 1 kernel. Second, the extracted features of different scales are fused and fed into a convolution layer consisting of a bottleneck layer (convolution kernel size 1 × 1), a batch normalization layer (BatchNorm, BN) and an activation layer (ReLU). The 1 × 1 convolution can be regarded as a linear transformation of the input channels; it reduces the data dimensionality, improves the feature-characterization capability of the network and increases the network's adaptability to scale.
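The branch layout just described can be illustrated with a small NumPy sketch. This is a hedged illustration rather than the patented implementation: the channel counts, random weights and the omission of BN are assumptions; what it shows is the structure itself (three 1 × 1 → 3 × 3 convolution branches, one 3 × 3-pool → 1 × 1 branch, channel concatenation, then a 1 × 1 bottleneck with ReLU).

```python
import numpy as np

def conv1x1(x, w):
    # a 1x1 convolution is a linear map over channels: x (C_in,H,W), w (C_out,C_in)
    return np.tensordot(w, x, axes=([1], [0]))

def conv3x3(x, w):
    # naive 3x3 same-padding convolution; w: (C_out, C_in, 3, 3)
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], h, wd))
    for i in range(3):
        for j in range(3):
            out += np.tensordot(w[:, :, i, j], xp[:, i:i + h, j:j + wd],
                                axes=([1], [0]))
    return out

def maxpool3x3(x):
    # 3x3 max pooling, stride 1, same padding
    c, h, w = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)), constant_values=-np.inf)
    out = np.full((c, h, w), -np.inf)
    for i in range(3):
        for j in range(3):
            out = np.maximum(out, xp[:, i:i + h, j:j + w])
    return out

def inception_block(x, rng):
    c = x.shape[0]
    branches = []
    for _ in range(3):  # three branches: 1x1 conv followed by 3x3 conv
        y = conv1x1(x, rng.standard_normal((c // 2, c)))
        branches.append(conv3x3(y, rng.standard_normal((c // 2, c // 2, 3, 3))))
    # fourth branch: 3x3 pooling followed by 1x1 conv
    branches.append(conv1x1(maxpool3x3(x), rng.standard_normal((c // 2, c))))
    cat = np.concatenate(branches, axis=0)  # recombine features of different scales
    # final 1x1 bottleneck for dimensionality reduction (BN omitted), then ReLU
    return np.maximum(conv1x1(cat, rng.standard_normal((c, cat.shape[0]))), 0.0)
```

For an 8-channel 16 × 16 input the block returns an 8-channel 16 × 16 output while internally mixing the four scale branches, matching the fuse-then-reduce description above.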
Compared with traditional feature extraction modules such as AlexNet and VGG, the multi-scale Inception module is characterized by the following:
(1) Input via a bottleneck layer (bottleneck, convolution kernel size 1 × 1): the classical multi-scale approach convolves the same input with kernels of different sizes and then merges the results, which is very computationally intensive. Introducing the bottleneck layer performs feature dimensionality reduction and reduces the number of feature-map channels; this reduces parameters and accelerates computation while increasing the nonlinear expressive capability of the model.
(2) BN and ReLU: BN normalizes each layer toward a standard normal distribution; the ReLU nonlinear activation unit propagates errors well in backpropagation, and since the gradient of ReLU is either 1 or 0, gradient explosion/vanishing is suppressed and training is accelerated. This combination is added after each convolution and repeatedly accelerates the update of the weight parameters.
(3) Each convolution with a 5 × 5 kernel is split into two convolutions with 3 × 3 kernels, which preserves the extracted image information while enriching the convolution kernels and increasing the width of the network. At the same time, a small convolution kernel can better extract the information of smaller objects.
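A quick arithmetic check of point (3): for C input and C output channels, two stacked 3 × 3 convolutions use 2·9·C² weights against 25·C² for a single 5 × 5 convolution, while covering the same 5 × 5 receptive field (the channel count C is an illustrative assumption; bias terms are ignored for simplicity):

```python
C = 64                                 # example channel count (an assumption)
params_5x5 = 5 * 5 * C * C             # one 5x5 conv layer
params_two_3x3 = 2 * (3 * 3 * C * C)   # two stacked 3x3 conv layers

# receptive field of two stacked 3x3 convs at stride 1: 3 + (3 - 1) = 5
rf = 3 + (3 - 1)

print(params_two_3x3 / params_5x5)     # 0.72: 28% fewer parameters
```

The split therefore costs less and, because each 3 × 3 convolution is followed by its own nonlinearity, adds expressive depth as well.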
In general, a conventional convolutional layer performs convolution on the input at a single scale only (for example, the kernel size is always 3 × 3), the output dimensionality is fixed, and the output features are distributed roughly uniformly over that 3 × 3 scale range, i.e. a sparsely distributed feature set is output. Because the Inception module extracts features at multiple scales (e.g. kernel sizes 1 × 1, 3 × 3, 5 × 5), the features it outputs follow a different distribution rule: instead of being uniformly distributed, strongly correlated features are gathered together (the 1 × 1 features cluster together, the 3 × 3 features cluster together, and the 5 × 5 features cluster together) and irrelevant non-critical features are weakened, i.e. several densely distributed sub-feature sets are output. For the same number of output features, the multi-scale Inception module therefore outputs less redundant information. This purer feature set is propagated layer by layer and finally used as the input of the backward computation, so convergence is naturally faster and a high-quality detection box can be obtained. In short, the multi-scale Inception module decomposes a sparse matrix into dense matrices, greatly reducing the amount of computation and accelerating network convergence.
The specific structure of the self-attention module is as follows:
Because the features of the fracture central area in a rib image are sparse, the features at the geometric center point of the target may be useless information; if that point is captured as the feature basis for classification, the network's fracture detection rate and classification accuracy are reduced. To solve this problem, the central-keypoint prediction module of CenterNet is improved: the position of the geometric center is no longer taken as the criterion for the central keypoint; instead, the feature point attended to by a self-attention mechanism is taken as the central keypoint. The self-attention mechanism automatically focuses on the feature center of the target: by computing the autocorrelation coefficients among the pixels of the feature map, the pixel with the largest weight is obtained and used as the central keypoint.
As shown in fig. 3, the self-attention module is divided into three branches. First, the input is convolved with 1 × 1 kernels to obtain the feature maps f(x) and g(x) in the 1st and 2nd branches respectively, and the autocorrelation coefficients of f(x) and g(x) are computed to obtain a pixel-weight distribution map of the feature image. Second, in the 3rd branch, the input is also convolved with a 1 × 1 kernel to obtain a feature map h(x); h(x) is multiplied by the pixel-weight distribution map obtained in the previous step to produce the corresponding self-attention feature map, in which the point with the maximum weight, i.e. the most prominent response point, is found. Finally, the self-attention feature map is superimposed on the original input feature map through a modulation factor α to obtain the output.
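The three-branch computation just described can be sketched in NumPy. This is a hedged illustration under stated assumptions: the softmax normalization of the correlation map, the reduced channel count of f and g, and the way the peak is read off are choices made here for the sketch, not details given in the text.

```python
import numpy as np

def self_attention(x, wf, wg, wh, alpha=0.5):
    """x: (C, H, W) feature map; wf, wg: (C', C) and wh: (C, C) are the
    1x1-conv weights of the f, g and h branches (a 1x1 conv over channels
    is a matrix multiplication). Returns the modulated output map and the
    position of the strongest response point (central-keypoint candidate)."""
    c, h, w = x.shape
    flat = x.reshape(c, h * w)
    f = wf @ flat                                  # branch 1: f(x), (C', N)
    g = wg @ flat                                  # branch 2: g(x), (C', N)
    hx = wh @ flat                                 # branch 3: h(x), (C, N)
    logits = f.T @ g                               # pairwise correlation, (N, N)
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)        # pixel-weight distribution map
    out = hx @ attn.T                              # attended h(x), (C, N)
    total = attn.sum(axis=0)                       # total weight received per pixel
    peak = np.unravel_index(np.argmax(total), (h, w))  # most prominent response
    return x + alpha * out.reshape(c, h, w), peak  # residual output via factor alpha
```

The returned peak position then plays the role the text assigns to the attended central keypoint, replacing the raw geometric center.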
The self-attention mechanism does not depend on the positions of the feature-map pixels: it captures the internal correlation of the data or features by computing the similarity between pixels, reducing the dependence on external feature information. In particular, it can quickly extract the important features of sparse data, parallelizes well, and greatly improves computational efficiency. Adopting a self-attention structure for rib fracture detection effectively alleviates the drop in detection accuracy and the increase in false detections that CenterNet suffers from due to the sparse features of the fracture central region, and has a positive effect on improving the rib fracture detection rate and the classification accuracy of fracture types.
The loss function applied to the attention convolutional neural network is expressed as follows.
The loss function is shown in formula (1) and includes five terms: the corner position loss, the attention-point position loss, the embedding loss, the corner offset loss and the attention-point offset loss.
Figure BDA0002642205340000081
Wherein, α ═ β ═ 0.1, γ ═ 1,
Figure BDA0002642205340000082
representing the corner location (headers) loss,
Figure BDA0002642205340000083
the position loss of the attention point is shown and is an improved version of focal loss (the improvement of a cross entropy loss function balances the imbalance problem of positive and negative sample proportions);
Figure BDA0002642205340000084
and expressing vector loss, namely, finding a pair of corner points of each target based on the distance between connecting vectors of different corner points, wherein if one upper left corner point and one lower right corner point belong to the same target, the distance between embedding vectors of the upper left corner point and the lower right corner point is small. The embedding training is carried out by
Figure BDA0002642205340000085
The two loss functions are implemented such that,
Figure BDA0002642205340000086
for reducing the distance between two corner connecting vectors belonging to the same object,
Figure BDA0002642205340000087
for enlarging purposes other thanTwo corner points of the same object connect the distance of the vector. Model training
Figure BDA0002642205340000088
The loss function groups the vertices of the same object,
Figure BDA0002642205340000089
the loss function is used for separating the vertexes of different targets;
Figure BDA00026422053400000810
the method specifically adopts learning of a smooth L1 loss function supervision parameter gamma.
The data set construction process comprises the following steps:
the invention collects image data of 30 patients who underwent chest CT examination for chest trauma; the resolution of the images is unified to 1176×1194. Because chest rib fractures vary in pathology and orientation and are difficult to distinguish, a reasonable classification of chest rib fractures is particularly important. The invention therefore sets a classification standard based on medical diagnostic practice, so that it is both suitable for implementation under a deep learning framework and close to the actual requirements of clinical medicine. Rib fractures are specifically classified into the following 5 categories: bilateral cortical bone fracture, lateral cortical bone fracture, medial cortical bone fracture, cortical bone buckling fracture and other non-classifiable cases. As shown in fig. 4, for the convenience of the experiment, these five categories are labelled broken1, broken2, broken3, broken4 and broken5 in order when constructing the data set.
Finally, a chest CT rib fracture image data set is constructed from the chest CT rib fracture images collected from the hospital by an image-cutting data set expansion method, giving 12276 images in total, all of which contain defects. Among these defective images, 11079 contain a single defect type and 1197 contain mixed defect types. The composition of the single-type defect image data set is shown in table 1.
TABLE 1 Single type Defect image dataset construction
[Table 1 appears as an image in the original document and is not reproduced here.]
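The patent names an image-cutting expansion step but not its parameters. A minimal sketch of sliding-window cropping, with patch size and stride as hypothetical values:

```python
def crop_patches(img_h, img_w, patch=512, stride=256):
    """Enumerate (x1, y1, x2, y2) coordinates of sliding-window crops covering
    an img_h x img_w image. Patch size and stride are assumptions; the patent
    only names the image-cutting expansion method, not its parameters."""
    coords = []
    for y in range(0, max(img_h - patch, 0) + 1, stride):
        for x in range(0, max(img_w - patch, 0) + 1, stride):
            coords.append((x, y, x + patch, y + patch))
    return coords
```

Each crop inherits the annotations that fall inside it, which is how one source image can yield several training samples.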
The overall image dataset is specifically constructed as shown in table 2:
TABLE 2 Overall image dataset composition
Figure BDA0002642205340000092
Figure BDA0002642205340000101
In the experiment, the training set contains 11314 pictures, the verification set contains 212 pictures, and the remaining 750 pictures are used as the test set. In addition, the invention adopts the annotation format of the COCO data set; the process is as follows: (1) all collected chest rib fracture images are unified into jpg format and named with 8-digit numbers, such as 00000001.jpg; the images are divided into three parts for training, verification and testing, stored under the train2014, minival2014 and testdev2017 folders of the images folder respectively; (2) the rib fracture images in the train2014, minival2014 and testdev2017 folders are manually annotated with graphical image annotation tool software to serve as the Ground Truth (the reference labels against which a supervised model is trained and evaluated). After annotation, an xml file is generated for each image, containing the rib category, the coordinates of the upper-left corner of the bounding box, its length and width, and other information; (3) the xml files are converted into instances_trainval2014.json, instances_minival2014.json and instances_testdev2017.json files and stored uniformly under the annotations folder; these json files are in the format required by the COCO data set.
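Step (3) above converts the per-image xml annotations to a COCO-style json file. A minimal sketch, assuming labelImg-style VOC xml fields (`filename`, `size`, `object/name`, `object/bndbox`) and the five category names used in this data set; the exact xml layout produced by the annotation tool is an assumption:

```python
import json
import xml.etree.ElementTree as ET

CATEGORIES = ["broken1", "broken2", "broken3", "broken4", "broken5"]

def voc_xml_to_coco(xml_paths, out_json):
    """Merge per-image VOC-style xml annotations into one COCO-format json file."""
    coco = {"images": [], "annotations": [],
            "categories": [{"id": i + 1, "name": c} for i, c in enumerate(CATEGORIES)]}
    ann_id = 1
    for img_id, path in enumerate(xml_paths, start=1):
        root = ET.parse(path).getroot()
        coco["images"].append({
            "id": img_id,
            "file_name": root.findtext("filename"),
            "width": int(root.findtext("size/width")),
            "height": int(root.findtext("size/height")),
        })
        for obj in root.iter("object"):
            box = obj.find("bndbox")
            x1, y1 = float(box.findtext("xmin")), float(box.findtext("ymin"))
            x2, y2 = float(box.findtext("xmax")), float(box.findtext("ymax"))
            coco["annotations"].append({
                "id": ann_id, "image_id": img_id,
                "category_id": CATEGORIES.index(obj.findtext("name")) + 1,
                "bbox": [x1, y1, x2 - x1, y2 - y1],  # COCO convention: [x, y, w, h]
                "area": (x2 - x1) * (y2 - y1), "iscrowd": 0,
            })
            ann_id += 1
    with open(out_json, "w") as fh:
        json.dump(coco, fh)
    return coco
```

Note the bbox conversion: VOC stores opposite corners, while COCO stores the top-left corner plus width and height.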
The training set in this experiment was used to train three networks: CornerNet, CenterNet and our improved network. After training, the test images in the data set are input into each model, which identifies and localizes the 5 defect types broken1, broken2, broken3, broken4 and broken5 and outputs the target categories and bounding boxes.
The results of the experiments performed on the above data sets are as follows:
in order to evaluate the performance of the improved network, the invention adopts a test set of 750 pictures, corresponding to 150 defects of each of the 5 types broken1, broken2, broken3, broken4 and broken5.
As can clearly be seen from figs. 5, 6, 7, 8 and 9, the invention accurately detects all 5 types of rib defects. Because the multi-scale Inception idea is adopted, the network model can learn upper-layer and lower-layer features at the same time, the extracted features are more comprehensive, and the detection precision of the method is improved. Combined with the advantages of the attention module, the positions of the bounding boxes are also more accurate.
The invention adopts three indexes, namely the missed detection rate, the false detection rate and the detection position accuracy, to quantitatively analyze the detection performance of CornerNet, CenterNet and the method of the invention on the defect types. The statistics of the defect detection results of the three methods are shown in table 3. The statistical results of the detection accuracy, false detection rate and missed detection rate of the three methods are shown in table 4. The accuracy histogram for the defects detected using the different detection models is shown in fig. 10.
TABLE 3 statistics of the results of the three methods of defect detection (unit: sheet)
[Table 3 appears as an image in the original document and is not reproduced here.]
TABLE 4 statistical results of detection accuracy, false detection rate and missed detection rate of the three methods
[Table 4 appears as an image in the original document and is not reproduced here.]
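The three indexes used in the quantitative analysis can be computed from simple per-class counts. The definitions below are assumptions (missed = ground-truth defects with no correct detection; false = detections matching no ground truth), since the patent does not define them formally:

```python
def detection_metrics(num_gt, num_correct, num_false):
    """Counting metrics for one defect class (assumed definitions):
    missed rate   = missed defects / ground-truth defects
    false rate    = false detections / total detections
    accuracy      = correct detections / ground-truth defects"""
    missed = num_gt - num_correct
    miss_rate = missed / num_gt
    total_det = num_correct + num_false
    false_rate = num_false / total_det if total_det else 0.0
    accuracy = num_correct / num_gt
    return miss_rate, false_rate, accuracy
```

For example, with 150 ground-truth defects of one type, 140 correct detections and 5 false detections, the missed rate is 10/150 and the accuracy 140/150 (the counts here are hypothetical, not taken from tables 3 and 4).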
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principles of the present invention are intended to be included therein.

Claims (6)

1. A chest rib fracture detection method based on an attention convolution neural network is characterized by comprising the following specific steps:
firstly, constructing a chest rib fracture data set;
secondly, training a chest rib fracture data set:
1) sending the data set of the chest rib fracture into a preprocessing module for preprocessing;
2) sending the preprocessed image into a feature extraction network for primary feature extraction;
3) sending the image after the primary feature extraction into a multi-scale Inception module for multi-scale feature extraction, and recombining features of different scales;
4) sending the recombined image into a cascade corner pooling prediction module, predicting the key points at the upper left corner and the lower right corner of the target to be detected, and respectively outputting a hot spot map, connection vectors and offsets;
sending the reconstructed image into a central pooling prediction module, predicting a central point of a target to be detected, and respectively outputting an offset and a hot spot map, wherein the central point hot spot map is processed by an attention module;
5) constraining the hot spot map, connection vectors and offsets of the corner points and the offsets and hot spot map of the central point through an overall loss function;
thirdly, testing the data set of the fracture of the chest rib:
the early stage of the test is the same as steps 1), 2), 3) and 4) of the training process in step two; the difference is as follows: the central point hot spot map is processed by the attention module, and target classification and localization are carried out on the predicted parameters of the upper-left corner, lower-right corner and central key points.
2. The method of claim 1, wherein the hotspot graph is constrained by a hotspot loss function, the connection vector is constrained by a connection vector loss function, and the offset is constrained by an offset loss function.
3. The chest rib fracture detection method based on the attention convolution neural network, characterized in that the multi-scale Inception module comprises four branches, wherein three branches first perform convolution with 1×1 convolution kernels and then perform a convolution operation on the results with 3×3 convolution kernels; the remaining branch performs pooling with a 3×3 pooling kernel and then performs a convolution operation on the pooled result with a 1×1 convolution kernel; and the convolution results of the four branches are input into a convolution layer for data dimension reduction.
4. The chest rib fracture detection method based on the attention convolution neural network, characterized in that the attention module comprises three branches: the first branch convolves the input image with a 1×1 convolution kernel to obtain a feature map f(x); the second branch convolves the input image with a 1×1 convolution kernel to obtain a feature map g(x), and the autocorrelation coefficients of f(x) and g(x) are calculated to obtain a pixel weight distribution map of the feature image; the third branch convolves the input image with a 1×1 convolution kernel to obtain a feature map h(x), h(x) is multiplied by the pixel weight distribution map to obtain the corresponding self-attention feature map, the point with the largest weight, i.e. the most prominent response point of the feature, is found, and the self-attention feature map is superimposed on the original input feature map through a modulation factor α to obtain the output.
5. The method for detecting the fracture of the rib of the chest based on the attention convolution neural network as claimed in claim 4, wherein the overall loss function includes a corner point position loss, an attention point position loss, an embedding loss, a corner point offset loss and an attention point offset loss, and is specifically expressed as follows:

L = L_det^co + L_det^at + α·L_pull + β·L_push + γ·(L_off^co + L_off^at)

wherein α = β = 0.1 and γ = 1; L_det^co represents the corner point position loss; L_det^at represents the attention point position loss; L_pull and L_push form the embedding loss, L_pull being used for reducing the distance between the embedding vectors of two corner points belonging to the same target and L_push being used for enlarging the distance between the embedding vectors of two corner points not belonging to the same target; L_off^co represents the corner point offset loss; and L_off^at represents the attention point offset loss.
6. The chest rib fracture detection method based on the attention convolution neural network as claimed in claim 1, wherein in the first step, the specific process of constructing the chest rib fracture data set is as follows: the CT scanner scans the chest to obtain the fracture data of the chest rib, and performs classification, manual labeling and format conversion on the fracture data of the chest rib to obtain a fracture data set of the chest rib.
CN202010845981.5A 2020-08-20 2020-08-20 Chest rib fracture detection method based on attention convolution neural network Active CN111986177B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010845981.5A CN111986177B (en) 2020-08-20 2020-08-20 Chest rib fracture detection method based on attention convolution neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010845981.5A CN111986177B (en) 2020-08-20 2020-08-20 Chest rib fracture detection method based on attention convolution neural network

Publications (2)

Publication Number Publication Date
CN111986177A true CN111986177A (en) 2020-11-24
CN111986177B CN111986177B (en) 2023-06-16

Family

ID=73443498

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010845981.5A Active CN111986177B (en) 2020-08-20 2020-08-20 Chest rib fracture detection method based on attention convolution neural network

Country Status (1)

Country Link
CN (1) CN111986177B (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112669312A (en) * 2021-01-12 2021-04-16 中国计量大学 Chest radiography pneumonia detection method and system based on depth feature symmetric fusion
CN112784924A (en) * 2021-02-08 2021-05-11 宁波大学 Rib fracture CT image classification method based on grouping aggregation deep learning model
CN112802484A (en) * 2021-04-12 2021-05-14 四川大学 Panda sound event detection method and system under mixed audio frequency
CN112837264A (en) * 2020-12-23 2021-05-25 南京市江宁医院 Rib positioning and fracture clinical outcome prediction device and automatic diagnosis system
CN112907537A (en) * 2021-02-20 2021-06-04 司法鉴定科学研究院 Skeleton sex identification method based on deep learning and on-site virtual simulation technology
CN113077874A (en) * 2021-03-19 2021-07-06 浙江大学 Spine disease rehabilitation intelligent auxiliary diagnosis and treatment system and method based on infrared thermography
CN113111754A (en) * 2021-04-02 2021-07-13 中国科学院深圳先进技术研究院 Target detection method, device, terminal equipment and storage medium
CN113129278A (en) * 2021-04-06 2021-07-16 华东师范大学 X-Ray picture femoral shaft fracture detection method based on non-local separation attention mechanism
CN113674261A (en) * 2021-08-26 2021-11-19 上海脊影慧智能科技有限公司 Bone detection method, system, electronic device and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140376798A1 (en) * 2013-06-20 2014-12-25 Carestream Health, Inc. Rib enhancement in radiographic images
WO2017084222A1 (en) * 2015-11-22 2017-05-26 南方医科大学 Convolutional neural network-based method for processing x-ray chest radiograph bone suppression
CN110298266A (en) * 2019-06-10 2019-10-01 天津大学 Deep neural network object detection method based on multiple dimensioned receptive field Fusion Features
CN111128348A (en) * 2019-12-27 2020-05-08 上海联影智能医疗科技有限公司 Medical image processing method, device, storage medium and computer equipment

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140376798A1 (en) * 2013-06-20 2014-12-25 Carestream Health, Inc. Rib enhancement in radiographic images
WO2017084222A1 (en) * 2015-11-22 2017-05-26 南方医科大学 Convolutional neural network-based method for processing x-ray chest radiograph bone suppression
CN110298266A (en) * 2019-06-10 2019-10-01 天津大学 Deep neural network object detection method based on multiple dimensioned receptive field Fusion Features
CN111128348A (en) * 2019-12-27 2020-05-08 上海联影智能医疗科技有限公司 Medical image processing method, device, storage medium and computer equipment

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112837264B (en) * 2020-12-23 2023-12-15 南京市江宁医院 Rib positioning and fracture clinical outcome prediction device and automatic diagnosis system
CN112837264A (en) * 2020-12-23 2021-05-25 南京市江宁医院 Rib positioning and fracture clinical outcome prediction device and automatic diagnosis system
CN112669312A (en) * 2021-01-12 2021-04-16 中国计量大学 Chest radiography pneumonia detection method and system based on depth feature symmetric fusion
CN112784924A (en) * 2021-02-08 2021-05-11 宁波大学 Rib fracture CT image classification method based on grouping aggregation deep learning model
CN112784924B (en) * 2021-02-08 2023-05-23 宁波大学 Rib fracture CT image classification method based on grouping aggregation deep learning model
CN112907537A (en) * 2021-02-20 2021-06-04 司法鉴定科学研究院 Skeleton sex identification method based on deep learning and on-site virtual simulation technology
CN113077874A (en) * 2021-03-19 2021-07-06 浙江大学 Spine disease rehabilitation intelligent auxiliary diagnosis and treatment system and method based on infrared thermography
CN113077874B (en) * 2021-03-19 2023-11-28 浙江大学 Intelligent auxiliary diagnosis and treatment system and method for rehabilitation of spine diseases based on infrared thermal images
CN113111754A (en) * 2021-04-02 2021-07-13 中国科学院深圳先进技术研究院 Target detection method, device, terminal equipment and storage medium
CN113129278A (en) * 2021-04-06 2021-07-16 华东师范大学 X-Ray picture femoral shaft fracture detection method based on non-local separation attention mechanism
CN112802484B (en) * 2021-04-12 2021-06-18 四川大学 Panda sound event detection method and system under mixed audio frequency
CN112802484A (en) * 2021-04-12 2021-05-14 四川大学 Panda sound event detection method and system under mixed audio frequency
CN113674261A (en) * 2021-08-26 2021-11-19 上海脊影慧智能科技有限公司 Bone detection method, system, electronic device and storage medium

Also Published As

Publication number Publication date
CN111986177B (en) 2023-06-16

Similar Documents

Publication Publication Date Title
CN111986177B (en) Chest rib fracture detection method based on attention convolution neural network
Sahinbas et al. Transfer learning-based convolutional neural network for COVID-19 detection with X-ray images
Li et al. Automatic lumbar spinal MRI image segmentation with a multi-scale attention network
Chen et al. Anatomy-aware siamese network: Exploiting semantic asymmetry for accurate pelvic fracture detection in x-ray images
CN111553892A (en) Lung nodule segmentation calculation method, device and system based on deep learning
Yao et al. Pneumonia detection using an improved algorithm based on faster r-cnn
CN112784856A (en) Channel attention feature extraction method and identification method of chest X-ray image
CN113706491A (en) Meniscus injury grading method based on mixed attention weak supervision transfer learning
Guan et al. Automatic detection and localization of thighbone fractures in X-ray based on improved deep learning method
WO2022110525A1 (en) Comprehensive detection apparatus and method for cancerous region
CN113743463A (en) Tumor benign and malignant identification method and system based on image data and deep learning
Lu et al. Lung cancer detection using a dilated CNN with VGG16
Pradhan et al. Lung cancer detection using 3D convolutional neural networks
Huang et al. Bone feature segmentation in ultrasound spine image with robustness to speckle and regular occlusion noise
Kaliyugarasan et al. Pulmonary nodule classification in lung cancer from 3D thoracic CT scans using fastai and MONAI
Zhang et al. An algorithm for automatic rib fracture recognition combined with nnU-Net and DenseNet
Irene et al. Segmentation and approximation of blood volume in intracranial hemorrhage patients based on computed tomography scan images using deep learning method
CN117058149B (en) Method for training and identifying medical image measurement model of osteoarthritis
Giv et al. Lung segmentation using active shape model to detect the disease from chest radiography
Wang et al. False positive reduction in pulmonary nodule classification using 3D texture and edge feature in CT images
Sha et al. The improved faster-RCNN for spinal fracture lesions detection
CN113139627B (en) Mediastinal lump identification method, system and device
Kanawade et al. A Deep Learning Approach for Pneumonia Detection from X-ray Images
Paul et al. Computer-Aided Diagnosis Using Hybrid Technique for Fastened and Accurate Analysis of Tuberculosis Detection with Adaboost and Learning Vector Quantization
Gan et al. Improved traumatic brain injury classification approach based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant