CN115546142A - X-ray image bone detection method and system based on deep learning - Google Patents

X-ray image bone detection method and system based on deep learning

Info

Publication number
CN115546142A
CN115546142A
Authority
CN
China
Prior art keywords
layer
ray image
feature
image
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211214029.0A
Other languages
Chinese (zh)
Inventor
陈振学
孙露娜
王修宇
张玉娇
曹佳倩
赵宏剑
蔡磊
孙胜斌
冀晶晶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Jianhua Food Machinery Manufacture Co ltd
Shandong University
Beijing Research Institute of Automation for Machinery Industry Co Ltd
Original Assignee
Qingdao Jianhua Food Machinery Manufacture Co ltd
Shandong University
Beijing Research Institute of Automation for Machinery Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao Jianhua Food Machinery Manufacture Co Ltd, Shandong University, and Beijing Research Institute of Automation for Machinery Industry Co Ltd
Priority to CN202211214029.0A
Publication of CN115546142A
Current legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0012Biomedical image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/34Smoothing or thinning of the pattern; Morphological operations; Skeletonisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10116X-ray image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30008Bone
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/03Recognition of patterns in medical or anatomical images
    • G06V2201/033Recognition of patterns in medical or anatomical images of skeletal patterns

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of image processing, and provides an X-ray image bone detection method and system based on deep learning. The method comprises the following steps: acquiring an X-ray image to be detected and preprocessing the X-ray image; inputting the preprocessed image into a trained two-stage double-U-shaped network, extracting coarse features of the bone region in the image, refining the extracted coarse features layer by layer from the high level to the low level to obtain multi-level refined feature maps with progressively richer details, and fusing the refined feature maps layer by layer to generate the final bone region saliency map. The method performs salient object detection of the bone region on the X-ray image through the two-stage double-U-shaped network: the two-stage feature extraction and refinement operations guarantee detection accuracy, and the network fuses the feature maps produced by the two stages to make full use of the high-level semantic information contained in different levels, so that detection accuracy and image processing speed are ensured at the same time.

Description

X-ray image bone detection method and system based on deep learning
Technical Field
The disclosure relates to the technical field of image processing, in particular to an X-ray image bone detection method and system based on deep learning.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
By means of X-ray imaging technology, an X-ray image can display otherwise invisible object information, such as bones, human tumors, tires and industrial defects, on a film or a screen. From the viewpoint of image processing, its essence is to convert the distribution of the internal material of the object to be detected into pixel values in the image, namely the image brightness. X-ray images are widely used in medicine, security inspection and industrial defect detection, helping doctors diagnose diseases and helping security inspectors detect dangerous goods and defective industrial products. X-ray images can also be used for bone detection: for example, an X-ray image of a pork leg reveals the distribution of bone, muscle and fat in the leg, and a bone detection method extracts the key bone information from the X-ray image, so that bone edge extraction can be realized, a robot can be guided to perform cutting, and the bone-meat separation task can be prepared for.
The inventor finds that most existing X-ray image processing methods use image enhancement and denoising to improve image quality and thereby assist manual diagnosis, or perform target detection with morphological methods or thresholding. However, the results of these methods rarely yield a complete and accurate bone edge, so they can hardly meet the requirements of subsequent robot cutting-path planning.
Disclosure of Invention
In order to solve the above problems, the present disclosure provides an X-ray image bone detection method and system based on deep learning.
In order to achieve this purpose, the present disclosure adopts the following technical solutions:
one or more embodiments provide a deep learning-based X-ray image bone detection method, which includes the following steps:
acquiring an X-ray image to be detected and preprocessing the X-ray image;
inputting the preprocessed image into a trained two-stage double-U-shaped network, extracting coarse features of the bone region in the image, refining the extracted coarse features layer by layer from the high level to the low level to obtain multi-level refined feature maps with progressively richer details, and fusing the refined feature maps layer by layer to generate the final bone region saliency map.
One or more embodiments provide a deep learning based X-ray image bone detection system, comprising:
a data acquisition module: configured to acquire an X-ray image to be detected and preprocess the X-ray image;
a data processing module: configured to input the preprocessed image into a trained two-stage double-U-shaped network, extract coarse features of the bone region in the image, refine the extracted coarse features layer by layer from the high level to the low level to obtain multi-level refined feature maps with progressively richer details, and fuse the refined feature maps layer by layer to generate the final bone region saliency map.
An electronic device, comprising a memory, a processor, and computer instructions stored on the memory and executed on the processor, wherein the computer instructions, when executed by the processor, perform the steps of the above method.
A computer readable storage medium storing computer instructions which, when executed by a processor, perform the steps of the above method.
Compared with the prior art, the beneficial effects of the present disclosure are as follows:
In the method, salient object detection of the bone region is performed on the X-ray image through a two-stage double-U-shaped network. The two-stage feature extraction and refinement operations ensure detection accuracy, and at the same time the network fuses the feature maps produced by the two stages so as to make full use of the high-level semantic information contained in different levels, ensuring both detection accuracy and image processing speed. The method can solve the problems of false region detection and incomplete detection that exist in threshold segmentation methods, as well as their large memory consumption during computation.
Advantages of the present disclosure, as well as advantages of additional aspects, will be described in detail in the detailed description which follows.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure and not to limit the disclosure.
FIG. 1 is a flow chart of a bone detection method using X-ray images in embodiment 1 of the present disclosure;
FIG. 2 is a schematic diagram of an image preprocessing method in embodiment 1 of the present disclosure;
FIG. 3 is a schematic diagram of the two-stage double-U-shaped network framework in embodiment 1 of the present disclosure;
FIG. 4 is a schematic diagram of the operation of the dual U-shaped network, an important component of the network in embodiment 1 of the present disclosure;
FIG. 5 is a schematic diagram of the residual convolution block of the backbone network, an important network component, in embodiment 1 of the present disclosure.
Detailed Description
The present disclosure is further described with reference to the following drawings and examples.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present disclosure. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of the stated features, steps, operations, devices, components, and/or combinations thereof. It should be noted that, in the case of no conflict, the embodiments and the features in the embodiments of the present disclosure may be combined with each other. The embodiments will be described in detail below with reference to the accompanying drawings.
Example 1
In one or more embodiments, as shown in FIGS. 1 to 5, a deep learning-based X-ray image bone detection method includes the following steps:
Step 1, obtaining an X-ray image to be detected and preprocessing it, wherein the preprocessing removes the large amount of noise introduced during X-ray image acquisition;
Step 2, inputting the preprocessed image into a trained two-stage double-U-shaped network, extracting coarse features of the bone region in the image, refining the extracted coarse features layer by layer from the high level to the low level to obtain multi-level refined feature maps with progressively richer details, and fusing the refined feature maps layer by layer to generate the final bone region saliency map.
The two-stage double-U-shaped network comprises a single-coding-double-decoding U-shaped network, salient and edge U-shaped networks, and a feature map fusion module.
The two-stage double-U-shaped network first obtains salient and edge coarse feature maps at multiple scales through the single-coding-double-decoding U-shaped network; the two types of coarse feature maps are then refined layer by layer from the high level to the low level by independent salient and edge U-shaped networks to obtain refined feature maps with richer details; finally, the feature map fusion module fuses the refined feature maps layer by layer to generate the final bone region saliency map.
In this embodiment, salient object detection of the bone region is performed on the X-ray image through the two-stage double-U-shaped network. The two-stage feature extraction and refinement operations ensure detection accuracy, and the network fuses the feature maps produced by the two stages, so that the high-level semantic information contained in different levels can be fully utilized and detection accuracy and image processing speed can be ensured at the same time. The method can solve the problems of false region detection and incomplete detection that exist in threshold segmentation methods, as well as their large memory consumption during computation.
The method is applied to pig leg X-ray image bone detection, can effectively support the subsequent extraction of pig leg bone edges and edge key points, and achieves accurate detection with good real-time performance.
Optionally, the structure of the two-stage double-U-shaped network is shown in fig. 3 and fig. 4, in which Conv denotes a convolutional layer.
In some embodiments, the single-coding-double-decoding U-shaped network comprises a plurality of sequentially cascaded encoders Eni, each connected to one salient coarse feature decoder SDei and one edge feature decoder BDei. In this embodiment, both the encoding and the decoding are set to 5 levels, i.e., i = 1, 2, 3, 4, 5.
Optionally, the encoder En may use the convolution blocks of a residual network (ResNet-50). The En1 convolution block may consist of a 7 × 7 convolutional layer with a stride of 2, with a coding feature size of (w/2) × (h/2) × 64; the En2 convolution block may consist of a 3 × 3 max pooling layer with a stride of 2 and 3 groups of residual convolutions (the residual convolution is shown in fig. 5), with a coding feature size of (w/4) × (h/4) × 256; the En3 convolution block consists of 4 groups of residual convolutions, with a coding feature size of (w/8) × (h/8) × 512; the En4 convolution block consists of 6 groups of residual convolutions, with a coding feature size of (w/16) × (h/16) × 1024; the En5 convolution block consists of 3 groups of residual convolutions, with a coding feature size of (w/32) × (h/32) × 2048.
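The following is a minimal sketch of how such a single-coding backbone could be assembled, assuming PyTorch and torchvision (neither framework is named in this disclosure); the class name SingleEncoder is illustrative only. The channel counts and strides follow the description above.

```python
import torch.nn as nn
from torchvision.models import resnet50

class SingleEncoder(nn.Module):
    """Five coding blocks En1-En5 taken from a ResNet-50 backbone."""
    def __init__(self):
        super().__init__()
        backbone = resnet50(weights=None)
        # En1: 7x7 convolution, stride 2 -> (w/2) x (h/2) x 64
        self.en1 = nn.Sequential(backbone.conv1, backbone.bn1, backbone.relu)
        # En2: 3x3 max pooling (stride 2) + 3 residual blocks -> (w/4) x (h/4) x 256
        self.en2 = nn.Sequential(backbone.maxpool, backbone.layer1)
        # En3: 4 residual blocks -> (w/8) x (h/8) x 512
        self.en3 = backbone.layer2
        # En4: 6 residual blocks -> (w/16) x (h/16) x 1024
        self.en4 = backbone.layer3
        # En5: 3 residual blocks -> (w/32) x (h/32) x 2048
        self.en5 = backbone.layer4

    def forward(self, x):
        f1 = self.en1(x)
        f2 = self.en2(f1)
        f3 = self.en3(f2)
        f4 = self.en4(f3)
        f5 = self.en5(f4)
        return [f1, f2, f3, f4, f5]  # coding features, low level -> high level
```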
Optionally, the salient coarse feature decoder SDei and the edge feature decoder BDei use the same decoding module. Each decoding module comprises two convolutional layers, a BatchNorm layer and an activation module connected in sequence, and the activation module may use a ReLU activation function. The input of each decoder is the fused feature of the bilinearly upsampled output of the previous stage and the output of the corresponding stage in the encoder. The decoding feature sizes of the five levels from top to bottom may be set to (w/32) × (h/32) × 64, (w/16) × (h/16) × 64, (w/8) × (h/8) × 64, (w/4) × (h/4) × 64 and (w/2) × (h/2) × 64, respectively.
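A minimal PyTorch sketch of one such decoding module is given below. The 3 × 3 kernel size and the use of channel concatenation as the fusion operation are assumptions (the text only specifies two convolutional layers, a BatchNorm layer, a ReLU and a fused input); the class name DecoderBlock is illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoderBlock(nn.Module):
    """Decoding module shared by SDei and BDei: two convolutions + BatchNorm + ReLU."""
    def __init__(self, in_ch, out_ch=64):
        # in_ch must equal the channel count of the (optionally concatenated) input
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, encoder_feat, previous=None):
        if previous is not None:
            # bilinearly upsample the previous decoder output and fuse it with the
            # encoder feature of the corresponding level (concatenation assumed)
            previous = F.interpolate(previous, size=encoder_feat.shape[2:],
                                     mode="bilinear", align_corners=False)
            encoder_feat = torch.cat([encoder_feat, previous], dim=1)
        return self.body(encoder_feat)
```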
In some embodiments, the salient and edge U-shaped networks include a salient feature refinement U-shaped network and an edge feature refinement U-shaped network, which may adopt the same network structure. The salient feature refinement U-shaped network comprises a plurality of cascaded encoders SUEni and corresponding sequentially connected decoders SUDei, and a top-level refinement module TopRefine is further arranged between the encoders and the decoders. The top-level refinement module TopRefine may comprise a plurality of convolutional layers connected end to end in sequence.
The edge feature refinement U-shaped network likewise comprises a plurality of cascaded encoders BUEni and corresponding sequentially connected decoders BUDei, with a top-level refinement module TopRefine arranged between the encoders and the decoders; this module may also comprise a plurality of convolutional layers connected end to end in sequence.
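A minimal PyTorch sketch of one refinement U-shaped network is given below (the salient and edge branches share this structure). The layer parameters follow the detailed description in step SA3.3 later in this embodiment (3 × 3 stride-2 encoder convolutions with 64 output channels, a four-block top-level refinement module, and 4 × 4 stride-2 deconvolution decoders); the padding values and the class name RefineUNet are assumptions, and the input size is assumed to be divisible by 64.

```python
import torch
import torch.nn as nn

def enc_block(in_ch):
    # 3x3 convolution with stride 2 halves the feature size (SUEn / BUEn style)
    return nn.Sequential(nn.Conv2d(in_ch, 64, 3, stride=2, padding=1),
                         nn.BatchNorm2d(64), nn.ReLU(inplace=True))

def dec_block(in_ch):
    # 4x4 transposed convolution with stride 2 doubles the feature size (SUDe / BUDe style)
    return nn.Sequential(nn.ConvTranspose2d(in_ch, 64, 4, stride=2, padding=1),
                         nn.BatchNorm2d(64), nn.ReLU(inplace=True))

class RefineUNet(nn.Module):
    def __init__(self):
        super().__init__()
        # the first encoder takes a single 64-channel coarse feature; the rest take
        # the concatenation of a coarse feature and the previous encoding (128 channels)
        self.encoders = nn.ModuleList([enc_block(64)] + [enc_block(128) for _ in range(4)])
        # top-level refinement: 4 convolution blocks, 3x3 kernel, stride 1, 64 channels
        self.top_refine = nn.Sequential(*[
            nn.Sequential(nn.Conv2d(64, 64, 3, padding=1),
                          nn.BatchNorm2d(64), nn.ReLU(inplace=True))
            for _ in range(4)])
        # each decoder concatenates an encoding feature with the feature being decoded
        self.decoders = nn.ModuleList([dec_block(128) for _ in range(5)])

    def forward(self, coarse):
        # coarse: five 64-channel coarse feature maps from stage one, low level -> high level
        enc = [self.encoders[0](coarse[0])]
        for i in range(1, 5):
            enc.append(self.encoders[i](torch.cat([coarse[i], enc[-1]], dim=1)))
        x = self.top_refine(enc[-1])
        refined = []
        for i in range(4, -1, -1):  # decode from the top level down
            x = self.decoders[i](torch.cat([enc[i], x], dim=1))
            refined.append(x)
        return refined  # refined feature maps, high level -> low level
```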
In some embodiments, the feature map fusion module includes a pixel addition module, a fusion convolution module and a bilinear upsampling module.
The fused output prediction map is obtained as follows: the salient and edge refined feature maps and the fused feature map of the two-stage double-U-shaped network are restored to the size of the original image through a prediction convolution and bilinear upsampling, and the resulting output prediction map is the bone region saliency map.
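The following is a minimal PyTorch sketch of the fusion module together with the output prediction step. Element-wise addition of the two refined features, a 3 × 3 fusion convolution with BatchNorm and ReLU, a single-channel 3 × 3 prediction convolution and a final sigmoid are assumptions consistent with, but not all stated in, the description; the class name FusionPredictHead is illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionPredictHead(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        self.predict = nn.Conv2d(channels, 1, kernel_size=3, padding=1)

    def forward(self, sal_feat, edge_feat, out_size):
        fused = self.fuse(sal_feat + edge_feat)    # pixel addition + fusion convolution
        pred = self.predict(fused)                 # 1-channel prediction map
        pred = F.interpolate(pred, size=out_size,  # restore to the original image size
                             mode="bilinear", align_corners=False)
        return torch.sigmoid(pred)                 # bone region saliency map in [0, 1]
```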
Further, the training process of the two-stage double-U-shaped network comprises the following steps:
S1, performing pixel-level bone region annotation on the acquired X-ray images to form a training set;
S1.1: the images may be captured from different angles by an X-ray machine;
S1.2: each pixel is automatically labeled using a threshold segmentation algorithm, and the bone edge region is then corrected pixel by pixel manually to obtain the ground-truth image label.
S2, preprocessing the images of the training set;
specifically, the preprocessing comprises: a mean subtraction operation and data enhancement by random horizontal flipping; the result is then used as the network input to train the two-stage double-U-shaped network;
S3, inputting the preprocessed images into the two-stage double-U-shaped network for training: salient and edge coarse features are first extracted by the single-coding-double-decoding U-shaped network, the salient and edge refined feature maps are then obtained by the independent salient and edge U-shaped networks, and the refined feature maps are fused layer by layer to generate the final bone region saliency map;
during training, the processing in the two-stage double-U-shaped network comprises a coarse feature extraction stage based on the single-coding-double-decoding U-shaped network, a feature refinement stage based on the independent salient and edge U-shaped networks, and salient and edge information fusion, specifically as follows:
in the coarse feature extraction stage, the single-coding-double-decoding U-shaped network first encodes the input image from the low level to the high level through the backbone network, i.e., single coding, and then decodes layer by layer from the high level to the low level with the double-decoding structure to obtain the salient and edge coarse features;
in the feature refinement stage, the salient and edge coarse features are gradually refined from the high level to the low level by two independent salient and edge U-shaped networks, and abstract semantic information is further extracted by the refinement module at the top of each network, yielding refined feature maps with richer details;
in the fusion operation, the feature map fusion module gradually fuses and upsamples the salient and edge refined feature maps of the corresponding scales layer by layer, using pixel addition, fusion convolution and bilinear upsampling, to obtain the bone region information map;
S4, the salient refined feature maps, the edge refined feature maps and the fused bone region saliency map are supervised: the cross-entropy loss and IoU loss between each feature map and its label (or label edge) are calculated, and the convolution parameters of the model are optimized to obtain the trained two-stage double-U-shaped network.
The two-stage double-U-shaped network training process is described below, taking pig leg X-ray image bone detection as an example.
Specifically, SA1: a pig leg X-ray image of arbitrary size w × h is acquired, the image is labeled to determine the label, and a training set and a test set are then constructed.
The specific process of step SA1 is:
SA1.1: the data set is formed by imaging different pig forelegs and hind legs with an X-ray machine.
SA1.2: each pixel is automatically labeled using a threshold segmentation algorithm, and the bone edge region is then corrected pixel by pixel manually to obtain the ground-truth image label.
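A minimal sketch of this automatic labeling step is shown below, assuming OpenCV and Otsu thresholding as the threshold segmentation algorithm (the disclosure does not name a specific one); the resulting mask would still be corrected manually along the bone edges.

```python
import cv2

def auto_label(xray_path):
    """Coarse pixel-level bone mask obtained by threshold segmentation (Otsu assumed)."""
    img = cv2.imread(xray_path, cv2.IMREAD_GRAYSCALE)
    # Otsu's method picks the threshold automatically from the image histogram
    _, mask = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return mask  # 0 = background, 255 = candidate bone region
```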
SA1.3: referring to fig. 2, the training set is augmented by a random horizontal flipping operation to improve training accuracy, and a normalization operation is performed by subtracting the mean value.
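A minimal sketch of this preprocessing (detailed further in SA2.1 and SA2.2 below) is given here, assuming the image and its label are NumPy arrays; the function name and the flip probability of 0.5 are assumptions.

```python
import numpy as np

def preprocess(image, label, mean, p_flip=0.5):
    """Mean subtraction plus a synchronized random horizontal flip of image and label."""
    image = image.astype(np.float32) - mean   # normalization by mean subtraction
    if np.random.rand() < p_flip:             # random horizontal flip for data enhancement
        image = image[:, ::-1].copy()
        label = label[:, ::-1].copy()
    return image, label
```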
Step SA2, preprocessing the images of the training set, and the specific process is as follows:
SA2.1: the mean value of all image pixels is calculated and subtracted from the input image before the network is trained; normalizing the pixel values in this way improves the computational accuracy of the network.
SA2.2: the normalized image and the labeled image are randomly and horizontally flipped together, and finally input into the network to complete training.
And SA3: the training set of the pre-processed pig leg X-ray images was input into a two-stage dual U-network as shown in fig. 3 for training.
The input image is first encoded by a series of convolution operations of the single-coding backbone network to obtain multi-resolution features from the low level to the high level. The double-decoding U-shaped network of the first stage then decodes from the high level to the low level to obtain salient and edge coarse features at five levels. The two types of coarse feature maps are refined by the double U-shaped networks of the second stage to obtain feature maps with rich details; during this process, the semantic information contained in the feature maps of different levels is extracted by multiple layers of 7 × 7, 3 × 3 and 1 × 1 convolutions.
Taking five layers as an example, the specific process of step SA3 is:
SA3.1: in the single-coding feature process of the first stage, the preprocessed input image (w × h) obtains coarse coding features through the first five convolution blocks of the residual network (ResNet-50): the En1 convolution block consists of a 7 × 7 convolutional layer with a stride of 2, with a coding feature size of (w/2) × (h/2) × 64; the En2 convolution block consists of a 3 × 3 max pooling layer with a stride of 2 and 3 groups of residual convolutions (the residual convolution is shown in fig. 5), with a coding feature size of (w/4) × (h/4) × 256; the En3 convolution block consists of 4 groups of residual convolutions, with a coding feature size of (w/8) × (h/8) × 512; the En4 convolution block consists of 6 groups of residual convolutions, with a coding feature size of (w/16) × (h/16) × 1024; the En5 convolution block consists of 3 groups of residual convolutions, with a coding feature size of (w/32) × (h/32) × 2048.
SA3.2: in the double-decoding feature process of the first stage, the salient and edge feature decoders (denoted SDei and BDei in fig. 3) consist of five groups of decoding convolutions, structurally symmetrical to the encoder but different in module design. Each decoding module consists of 2 convolutional layers, a BatchNorm layer and a ReLU activation function. The input of each module is the fused feature of the bilinearly upsampled output of the previous stage and the output of the corresponding stage in the encoder. The decoding feature sizes of the five levels from top to bottom are (w/32) × (h/32) × 64, (w/16) × (h/16) × 64, (w/8) × (h/8) × 64, (w/4) × (h/4) × 64 and (w/2) × (h/2) × 64, respectively.
SA3.3: the U-shaped network for salient feature refinement in the second stage also adopts an encoder-decoder structure. The encoder SUEn1 consists of a convolutional layer (3 × 3 kernel, stride of 2, 64 channels), a BatchNorm layer and a ReLU activation function; the convolutional layer with a stride of 2 halves the feature size, giving an output size of (w/4) × (h/4) × 64. The coarse feature of the corresponding level and the coding feature of the previous level are concatenated as the input of the four encoders SUEn2 to SUEn5, so the number of input channels of their convolutional layers is set to 128; the remaining parameters and the number of layers are the same as for SUEn1, and the corresponding output feature sizes are (w/8) × (h/8) × 64, (w/16) × (h/16) × 64, (w/32) × (h/32) × 64 and (w/64) × (h/64) × 64. A top-level refinement module for the salient features is arranged between the encoder and decoder structures; it consists of 4 convolution blocks whose parameters are a 3 × 3 kernel, a stride of 1 and 64 channels. The top-level refinement does not change the feature size, so its output size is (w/64) × (h/64) × 64. The decoder SUDe5 first concatenates the top-level refined features with the coding features of the corresponding level, then applies a deconvolution layer (4 × 4 kernel, stride of 2) followed by a BatchNorm layer and a ReLU activation function for normalization and non-linear processing to accelerate training, producing decoding features of size (w/32) × (h/32) × 64; SUDe4 to SUDe1 proceed in the same way, and the output salient refined feature sizes are (w/16) × (h/16) × 64, (w/8) × (h/8) × 64, (w/4) × (h/4) × 64 and (w/2) × (h/2) × 64.
SA3.4: the edge feature refinement process of the second stage is similar to SA3.3, and the output refined edge feature sizes are (w/32) × (h/32) × 64, (w/16) × (h/16) × 64, (w/8) × (h/8) × 64, (w/4) × (h/4) × 64 and (w/2) × (h/2) × 64.
SA3.5: the fusion module for the two types of refined feature maps consists of pixel addition, a fusion convolution and a bilinear upsampling layer. The fusion convolution parameters are a 3 × 3 kernel, a stride of 3 and 64 channels, followed by a BatchNorm layer and a ReLU activation function; the final output feature map of the fusion module has a size of (w/2) × (h/2) × 64.
SA3.6: referring to FIG. 3, the salient refined feature S1r, the edge refined feature B1r and the fused feature are obtained from the two refinement networks and the fusion module. Each of these three features is then passed through a prediction convolution (3 × 3 kernel, stride of 1, 1 channel) and bilinear upsampling, yielding the output feature map S, the salient feature map Swo and the edge feature map B, each of size w × h × 1.
SA3.7: the two-stage double-U-shaped network is optimized by supervising the output salient feature maps S and Swo and the edge feature map B and calculating the cross-entropy loss and IoU loss of the network. The loss calculation formula is as follows:
L = L_bce + L_iou (computed for each supervised output map and summed)
where L represents the total loss of the network.
L_bce represents the cross-entropy loss, which is calculated as:
L_bce = − Σ_(x,y) [ g(x,y) log p(x,y) + (1 − g(x,y)) log(1 − p(x,y)) ]
where g (x, y) e {0,1} represents a true value map label, and p (x, y) e [0,1] represents the pixel prediction probability value.
The IoU loss was originally used to measure the similarity between two sets; here it is applied to the bone region (salient feature) detection task, i.e., it measures the similarity between the prediction map and the ground-truth map. Its calculation formula is as follows:
L_iou = 1 − ( Σ_(x,y) g(x,y) p(x,y) ) / ( Σ_(x,y) [ g(x,y) + p(x,y) − g(x,y) p(x,y) ] )
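A minimal PyTorch sketch of this supervision is given below: binary cross-entropy plus IoU loss between a predicted map and its ground-truth map, summed over the supervised output maps. The function names are illustrative, and the equal weighting of the two terms is an assumption.

```python
import torch.nn.functional as F

def bce_iou_loss(pred, target, eps=1e-6):
    """pred: predicted probability map in [0, 1]; target: ground-truth map in {0, 1} (float)."""
    bce = F.binary_cross_entropy(pred, target)
    inter = (pred * target).sum(dim=(1, 2, 3))
    union = (pred + target - pred * target).sum(dim=(1, 2, 3))
    iou = 1.0 - (inter + eps) / (union + eps)
    return bce + iou.mean()

def total_loss(predictions, targets):
    # predictions / targets: matching lists of supervised output maps and their labels
    return sum(bce_iou_loss(p, t) for p, t in zip(predictions, targets))
```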
the embodiment can solve the problem of detecting the bone area in the pig leg X-ray image by utilizing a two-stage double-U-shaped network. Carrying out pixel level labeling on the acquired image to form a training set and a testing set, and randomly and horizontally overturning the image in the training process to expand the training set; the method comprises the steps that significant and edge coarse features are extracted based on a single-coding-double-decoding U-shaped structure in a first stage, and a skeleton region is located; in the second stage, coarse features of five levels are gradually refined from a high level to a low level by utilizing a U-shaped network based on independent saliency and edge to obtain a feature map with rich details; the two types of feature fusion processes further excavate the hierarchical relevance of the significant features and the edge features, and ensure the integrity and accuracy of the feature map; cross entropy loss and IOU loss are introduced in a loss calculation stage, and a significant feature map and an edge feature map are supervised. The method can effectively provide help for extracting the pig leg bone edge and the edge key point in the next step, meets the requirements on accuracy and real-time performance, and solves the problems of region false detection, incomplete detection and large calculation memory occupation existing in the detection of the threshold segmentation method.
Example 2
Based on embodiment 1, the present embodiment provides a deep learning based X-ray image bone detection system, including:
a data acquisition module: configured to acquire an X-ray image to be detected and preprocess the X-ray image;
a data processing module: configured to input the preprocessed image into a trained two-stage double-U-shaped network, extract coarse features of the bone region in the image, refine the extracted coarse features layer by layer from the high level to the low level to obtain multi-level refined feature maps with progressively richer details, and fuse the refined feature maps layer by layer to generate the final bone region saliency map.
It should be noted that each module in this embodiment corresponds one-to-one to a step in embodiment 1, and the specific implementation process is the same and is not repeated here.
Example 3
The present embodiment provides an electronic device comprising a memory, a processor, and computer instructions stored on the memory and executed on the processor, wherein the computer instructions, when executed by the processor, perform the steps of the method of embodiment 1.
Example 4
The present embodiment provides a computer readable storage medium for storing computer instructions which, when executed by a processor, perform the steps of the method of embodiment 1.
The electronic device provided by the present disclosure may be a mobile terminal or a non-mobile terminal. The non-mobile terminal includes a desktop computer, and the mobile terminal includes a smart phone (such as an Android phone or an iOS phone), smart glasses, a smart watch, a smart bracelet, a tablet computer, a notebook computer, a personal digital assistant, and other mobile Internet devices capable of wireless communication.
It should be understood that in the present disclosure, the processor may be a central processing unit (CPU), but may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory may include both read-only memory and random access memory, and may provide instructions and data to the processor, and a portion of the memory may also include non-volatile random access memory. For example, the memory may also store device type information.
In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The steps of a method disclosed in connection with the present disclosure may be embodied directly in a hardware processor, or in a combination of hardware and software modules in the processor. The software modules may be located in RAM, flash memory, ROM, PROM or EPROM, registers, or other storage media well known in the art. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the method in combination with its hardware. To avoid repetition, this is not described in detail here. Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented as electronic hardware or as a combination of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present disclosure, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some interfaces, and may be in an electrical, mechanical or other form.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present disclosure may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present disclosure. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.
The above description is only a preferred embodiment of the present disclosure and is not intended to limit the present disclosure, and various modifications and changes may be made to the present disclosure by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.
Although the present disclosure has been described with reference to specific embodiments, it should be understood that the scope of the present disclosure is not limited thereto, and those skilled in the art will appreciate that various modifications and changes can be made without departing from the spirit and scope of the present disclosure.

Claims (10)

1. An X-ray image bone detection method based on deep learning, characterized by comprising the following steps:
acquiring an X-ray image to be detected and preprocessing the X-ray image;
inputting the preprocessed image into a trained two-stage double-U-shaped network, extracting coarse features of the bone region in the image, refining the extracted coarse features layer by layer from the high level to the low level to obtain multi-level refined feature maps with progressively richer details, and fusing the refined feature maps layer by layer to generate the final bone region saliency map.
2. The deep learning-based X-ray image bone detection method according to claim 1, wherein: the two-stage double-U type network comprises a single coding-double decoding U type network, a salient and edge U type network and a feature map fusion module;
the single-coding-double-decoding U-shaped network comprises a plurality of sequentially cascaded encoders Eni, and each encoder Eni is connected with a salient coarse feature decoder SDei and an edge feature decoder BDei;
the feature map fusion module comprises a pixel addition module, a fusion convolution module and a bilinear upsampling module.
3. The deep learning-based X-ray image bone detection method according to claim 2, wherein: the salient coarse feature decoder SDei and the edge feature decoder BDei adopt the same decoding module, and each decoding module comprises two convolutional layers, a BatchNorm layer and an activation module connected in sequence.
4. The deep learning-based X-ray image bone detection method according to claim 2, wherein: the salient and edge U-shaped networks comprise a salient feature refinement U-shaped network and an edge feature refinement U-shaped network which adopt the same network structure; the salient feature refinement U-shaped network and the edge feature refinement U-shaped network each comprise a plurality of cascaded encoders and correspondingly and sequentially connected decoders, and a top-level refinement module is further arranged between the encoders and the decoders.
5. The deep learning-based X-ray image bone detection method according to claim 4, wherein: the top-level refinement module TopRefine comprises a plurality of convolutional layers sequentially connected end to end.
6. The deep learning-based X-ray image bone detection method according to claim 2, wherein: the two-stage double-U-shaped network training process comprises the following steps:
performing pixel-level bone region annotation on the acquired X-ray images to form a training set;
preprocessing images of the training set;
inputting the preprocessed images into the two-stage double-U-shaped network for training: extracting salient and edge coarse features with the single-coding-double-decoding U-shaped network, refining them with the independent salient and edge U-shaped networks to obtain salient and edge refined feature maps, and fusing the refined feature maps layer by layer to generate the final bone region saliency map;
and calculating the cross-entropy loss and IoU loss between each feature map and its label by supervising the salient refined feature maps, the edge refined feature maps and the fused bone region saliency map, and optimizing the convolution parameters of the model to obtain the trained two-stage double-U-shaped network.
7. The deep learning-based X-ray image bone detection method according to claim 6, wherein the pixel-level bone region annotation comprises: automatically labeling each pixel using a threshold segmentation algorithm, and then correcting the bone edge region pixel by pixel manually to obtain the ground-truth image label;
optionally, the preprocessing comprises: performing a mean subtraction operation and performing data enhancement by random horizontal flipping.
8. An X-ray image bone detection system based on deep learning, characterized by comprising:
a data acquisition module: configured to acquire an X-ray image to be detected and preprocess the X-ray image;
a data processing module: configured to input the preprocessed image into a trained two-stage double-U-shaped network, extract coarse features of the bone region in the image, refine the extracted coarse features layer by layer from the high level to the low level to obtain multi-level refined feature maps with progressively richer details, and fuse the refined feature maps layer by layer to generate the final bone region saliency map.
9. An electronic device, comprising a memory, a processor, and computer instructions stored on the memory and executed on the processor, wherein the computer instructions, when executed by the processor, perform the steps of the method according to any one of claims 1 to 7.
10. A computer-readable storage medium storing computer instructions which, when executed by a processor, perform the steps of the method of any one of claims 1 to 7.
CN202211214029.0A 2022-09-30 2022-09-30 X-ray image bone detection method and system based on deep learning Pending CN115546142A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211214029.0A CN115546142A (en) 2022-09-30 2022-09-30 X-ray image bone detection method and system based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211214029.0A CN115546142A (en) 2022-09-30 2022-09-30 X-ray image bone detection method and system based on deep learning

Publications (1)

Publication Number Publication Date
CN115546142A true CN115546142A (en) 2022-12-30

Family

ID=84732568

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211214029.0A Pending CN115546142A (en) 2022-09-30 2022-09-30 X-ray image bone detection method and system based on deep learning

Country Status (1)

Country Link
CN (1) CN115546142A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116977650A (en) * 2023-07-31 2023-10-31 西北工业大学深圳研究院 Image denoising method, image denoising device, electronic equipment and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination