CN110310280B - Image recognition method, system, equipment and storage medium for hepatobiliary duct and calculus

Info

Publication number
CN110310280B
CN110310280B (application CN201910620001.9A)
Authority
CN
China
Prior art keywords
layer
encoder
convolutional
image
sparse
Prior art date
Legal status
Active
Application number
CN201910620001.9A
Other languages
Chinese (zh)
Other versions
CN110310280A (en)
Inventor
蔡念
符小睿
夏皓
王慧恒
王晗
王平
Current Assignee
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date
Filing date
Publication date
Application filed by Guangdong University of Technology filed Critical Guangdong University of Technology
Priority to CN201910620001.9A priority Critical patent/CN110310280B/en
Publication of CN110310280A publication Critical patent/CN110310280A/en
Application granted granted Critical
Publication of CN110310280B publication Critical patent/CN110310280B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/0012 Biomedical image inspection
    • G06T7/12 Edge-based segmentation
    • G06T2207/10081 Computed x-ray tomography [CT]
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/30004 Biomedical image processing

Abstract

The invention discloses an image recognition system for hepatobiliary ducts and calculi, comprising: an input module for acquiring an image to be detected and inputting it into a trained sparse convolutional neural network; and the sparse convolutional neural network for performing image recognition on the image to be detected and segmenting, in the recognition result, the position contours of the hepatobiliary duct and of the gallstone. Each convolutional layer in the 1st encoder of the sparse convolutional neural network uses a small convolution kernel, and each convolutional layer in the remaining encoders and in every decoder uses a sparse convolution kernel. During the first encoding pass, feature maps at different scales are output to the subsequent encoders and decoders and fused with the feature maps at the corresponding resolutions. By applying the scheme of the application, hepatobiliary ducts and calculi can be recognized in images more effectively, assisting physicians in treatment. The application also provides an image recognition method, device and storage medium for hepatobiliary ducts and calculi, with corresponding effects.

Description

Image recognition method, system, equipment and storage medium for hepatobiliary duct and calculus
Technical Field
The invention relates to the technical field of medical image segmentation, and in particular to an image recognition method, system, device and storage medium for hepatobiliary ducts and calculi.
Background
At present, the surgical treatment of hepatobiliary calculi faces many difficulties and challenges, including stones that are difficult to remove, incomplete stone clearance, and easy postoperative recurrence. In the usual lithotomy scheme, an enhanced CT scan is performed before the operation to obtain two-dimensional image information, and the surgeon then mentally reconstructs a 3D model from that information by experience; this places high demands on the surgeon's clinical experience and professional knowledge and may introduce deviation. To make better use of digital imaging technology in assisting physicians, medical image segmentation techniques targeting specific organ regions and the corresponding lesion areas have begun to appear, commonly for blood-vessel segmentation, organ segmentation, tumor segmentation and the like. However, segmentation dedicated to hepatobiliary ducts and biliary calculi is almost blank.
A common blood-vessel segmentation technique uses region growing. It is very effective for segmenting arterial vessels in the enhancement phase, because those vessels appear as bright, distinct regions with higher contrast than the background on a CT image. Its effect on bile ducts and biliary stones, however, is not ideal: the target regions visible in both the plain-scan phase and the portal phase are gray or even dark gray, so hepatobiliary ducts and stones are difficult to distinguish by pixel gray value. Another scheme performs image segmentation with a fully convolutional neural network, of which U-net is the more effective example: convolution extracts features and learns the shape and position of the foreground, feature maps at the same level are fused, and deconvolution finally restores the size to obtain a segmentation map. However, this method suits organ or cell boundaries that are large, smooth-edged and regularly shaped. For highly deformable calculi and for bile ducts with small pixel area and changeable shape, the effect is not particularly ideal.
In summary, how to recognize hepatobiliary ducts and calculi in images more effectively, so as to better assist physicians in treatment, is a technical problem that those skilled in the art currently need to solve.
Disclosure of Invention
The invention aims to provide an image recognition method, system, device and storage medium for hepatobiliary ducts and calculi, which can effectively recognize hepatobiliary ducts and calculi in images so as to better assist physicians in treatment.
In order to solve the above technical problems, the invention provides the following technical scheme:
an image recognition system for hepatobiliary ducts and stones, comprising:
the input module, used for acquiring an image to be detected and inputting it into the trained sparse convolutional neural network;
the sparse convolutional neural network, used for performing image recognition on the image to be detected and segmenting, in the recognition result, the position contour of the hepatobiliary duct and the position contour of the gallstone in the image to be detected;
the sparse convolutional neural network comprises N encoders, N decoders, sparse convolutional layers connecting adjacent encoders and decoders, and an output module connected to the Nth decoder, the encoders and decoders being connected alternately; each encoder and each decoder comprises K blocks connected in sequence; p, q, N and K are positive integers, N ≥ 2 and K ≥ 3; each convolutional layer in the 1st encoder uses a small convolution kernel, and each convolutional layer in the remaining encoders and in every decoder uses a sparse convolution kernel;
the 1st to (K-1)th blocks of the 1st encoder and the 2nd to (K-1)th blocks of the other encoders each comprise p convolutional layers and a pooling layer connected to the last of the p convolutional layers; the Kth block of each encoder comprises q convolutional layers; the 1st block of each of the 2nd to Nth encoders comprises 1 pooling layer;
the output of the pooling layer of the ith block of each of the 2nd to Nth encoders is fused with the output of the last convolutional layer in the (i+1)th block of the 1st encoder, and the fused result serves as the input of the (i+1)th block of that encoder, where 1 ≤ i ≤ K-2;
the 1st block of each of the N decoders comprises p convolutional layers, the Kth block comprises 1 up-convolutional layer, and the (K-1)th to 2nd blocks each comprise p convolutional layers and an up-convolutional layer connected to the last of the p convolutional layers;
the output of the up-convolutional layer of the jth block of each decoder is fused with the output of the last convolutional layer in the (j-1)th block of the 1st encoder, and the fused result serves as the input of the jth block of that decoder, where 2 ≤ j ≤ K.
Preferably, in the sparse convolutional neural network, N is 2, p is 2, q is 1, and K is 5.
Preferably, each convolutional layer in the 1st encoder uses a 1 × 1 convolution kernel or a 2 × 2 Gaussian convolution kernel.
Preferably, the scales of the sparse convolution kernels used in the 2 nd to nth encoders and the N decoders are different from each other.
Preferably, each convolutional layer in the 1st decoder uses a 3 × 3 sparse convolution kernel, each convolutional layer in the 2nd encoder uses a 5 × 5 sparse convolution kernel, and each convolutional layer in the 2nd decoder uses a 7 × 7 sparse convolution kernel.
Preferably, the loss function L in the output module is expressed as:
L = -\frac{1}{Num}\sum_{i=1}^{Num}\sum_{j=1}^{Cls} \mathbf{1}\{(y_i = j) \cap (p_{ij} \le t)\}\,\log p_{ij}
where Cls is the total number of pixel classes in the image, Num is the total number of pixels, y_i denotes the class of the ith pixel, p_ij denotes the probability that the ith pixel belongs to the jth class, and t is a preset probability threshold.
Preferably, the loss function L in the output module is expressed as:
L = -\frac{1}{Num}\sum_{i=1}^{Num}\sum_{j=1}^{Cls} \mathbf{1}\{(y_i = j) \cap (p_{i\hat{y}_i} \le t)\}\,\log p_{ij}
where Cls is the total number of pixel classes in the image, Num is the total number of pixels, y_i denotes the class of the ith pixel, p_ij denotes the probability that the ith pixel belongs to the jth class, t is a preset probability threshold, and \hat{y}_i = \arg\max_j p_{ij} is the predicted class of the ith pixel.
An image recognition method for hepatobiliary ducts and stones, comprising the following steps:
the input module acquires an image to be detected and inputs it into the trained sparse convolutional neural network;
the sparse convolutional neural network performs image recognition on the image to be detected and segments, in the recognition result, the position contour of the hepatobiliary duct and the position contour of the gallstone in the image to be detected;
the sparse convolutional neural network comprises N encoders, N decoders, sparse convolutional layers connecting adjacent encoders and decoders, and an output module connected to the Nth decoder, the encoders and decoders being connected alternately; each encoder and each decoder comprises K blocks connected in sequence; p, q, N and K are positive integers, N ≥ 2 and K ≥ 3; each convolutional layer in the 1st encoder uses a small convolution kernel, and each convolutional layer in the remaining encoders and in every decoder uses a sparse convolution kernel;
the 1st to (K-1)th blocks of the 1st encoder and the 2nd to (K-1)th blocks of the other encoders each comprise p convolutional layers and a pooling layer connected to the last of the p convolutional layers; the Kth block of each encoder comprises q convolutional layers; the 1st block of each of the 2nd to Nth encoders comprises 1 pooling layer;
the output of the pooling layer of the ith block of each of the 2nd to Nth encoders is fused with the output of the last convolutional layer in the (i+1)th block of the 1st encoder, and the fused result serves as the input of the (i+1)th block of that encoder, where 1 ≤ i ≤ K-2;
the 1st block of each of the N decoders comprises p convolutional layers, the Kth block comprises 1 up-convolutional layer, and the (K-1)th to 2nd blocks each comprise p convolutional layers and an up-convolutional layer connected to the last of the p convolutional layers;
the output of the up-convolutional layer of the jth block of each decoder is fused with the output of the last convolutional layer in the (j-1)th block of the 1st encoder, and the fused result serves as the input of the jth block of that decoder, where 2 ≤ j ≤ K.
An image recognition device for hepatobiliary ducts and stones, comprising:
a memory for storing a computer program;
a processor for executing the computer program to implement the image recognition system for hepatobiliary ducts and stones as defined in any one of the above.
A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the image recognition system for hepatobiliary ducts and stones of any one of the above.
In the scheme of the application, during the first encoding pass the feature maps at different scales are output to the subsequent encoders and decoders and fused with the feature maps at the corresponding resolutions. That is, the output of the pooling layer of the ith block of each of the 2nd to Nth encoders is fused with the output of the last convolutional layer in the (i+1)th block of the 1st encoder, and the fused result serves as the input of the (i+1)th block of that encoder, where 1 ≤ i ≤ K-2; the output of the up-convolutional layer of the jth block of each decoder is fused with the output of the last convolutional layer in the (j-1)th block of the 1st encoder, and the fused result serves as the input of the jth block of that decoder, where 2 ≤ j ≤ K. Such a scheme largely preserves the primary features, whereas in a conventional U-net network the flow of primary features stops at the output of the encoder. Because the primary features are largely retained, the information lost through layer-by-layer pooling and convolution is reduced, which helps refine edge segmentation and lessens the loss of edge information. In addition, each convolutional layer in the 1st encoder uses a small convolution kernel, while each convolutional layer in the remaining encoders and in every decoder uses a sparse convolution kernel; on the premise of guaranteeing a large receptive field and fast convergence, a small kernel is more sensitive to small targets, so a first encoder with small kernels effectively retains edge details for highly deformable calculi and for bile ducts with small pixel area and changeable shape. It can be seen that in the scheme of the application edge details are effectively retained by the small convolution kernels, and feature-map fusion avoids the loss of primary features, that is, the loss of the retained edge details, so hepatobiliary ducts and calculi can be recognized in images more effectively, better assisting physicians in treatment.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a schematic structural diagram of an image recognition system for hepatobiliary ducts and calculi according to the present invention;
FIG. 2 is a schematic structural diagram of a sparse convolutional neural network according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the connection structure of the 1st encoder and the 1st decoder of the sparse convolutional neural network according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of the 1st encoder of the sparse convolutional neural network according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of the 1st decoder of the sparse convolutional neural network according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of the 2nd encoder of the sparse convolutional neural network according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of the 2nd decoder of the sparse convolutional neural network according to an embodiment of the present invention.
Detailed Description
The core of the invention is to provide an image recognition system for hepatobiliary ducts and calculi that effectively retains edge details through small convolution kernels and avoids the loss of primary features through feature-map fusion, so that hepatobiliary ducts and calculi can be recognized in images more effectively, better assisting physicians in treatment.
In order that those skilled in the art may better understand the disclosure, the invention is described in further detail below with reference to the accompanying drawings and specific embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments herein without creative effort shall fall within the protection scope of the present invention.
Referring to fig. 1, fig. 1 is a schematic structural diagram of the image recognition system for hepatobiliary ducts and calculi of the present invention. The system includes:
an input module 10, used for acquiring an image to be detected and inputting it into the trained sparse convolutional neural network 20.
Specifically, during training the input module may augment the original CT image samples, for example by flipping, blurring, warping and cropping, to obtain a large number of training images; every training image input into the sparse convolutional neural network 20 must have the same size. After training, when the input module 10 acquires an image to be detected whose size differs from the input size required by the sparse convolutional neural network 20, the input module 10 can apply operations such as all-zero padding and cropping so that the sparse convolutional neural network 20 can still perform image recognition on it.
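As an illustration of this preprocessing, the following is a minimal Python sketch assuming grayscale CT slices stored as NumPy arrays; the function names and the fixed 512 × 512 target size are illustrative assumptions rather than details taken from the patent.

    import numpy as np

    def pad_or_crop(image, target=512):
        """All-zero pad or crop a 2-D slice so that it becomes target x target."""
        h, w = image.shape
        out = np.zeros((target, target), dtype=image.dtype)
        h2, w2 = min(h, target), min(w, target)
        out[:h2, :w2] = image[:h2, :w2]
        return out

    def augment(image):
        """Yield simple augmented variants; blurring and warping would be added similarly."""
        yield image
        yield np.fliplr(image)   # horizontal flip
        yield np.flipud(image)   # vertical flip
        yield np.rot90(image)    # rotation as a further cheap variant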
The sparse convolutional neural network 20 is used for performing image recognition on the image to be detected and segmenting, in the recognition result, the position contour of the hepatobiliary duct and the position contour of the gallstone in the image to be detected.
The sparse convolutional neural network 20 comprises N encoders, N decoders, sparse convolutional layers connecting adjacent encoders and decoders, and an output module connected to the Nth decoder, the encoders and decoders being connected alternately; each encoder and each decoder comprises K blocks connected in sequence; p, q, N and K are positive integers, N ≥ 2 and K ≥ 3; each convolutional layer in the 1st encoder uses a small convolution kernel, and each convolutional layer in the remaining encoders and in every decoder uses a sparse convolution kernel.
The 1st to (K-1)th blocks of the 1st encoder and the 2nd to (K-1)th blocks of the other encoders each comprise p convolutional layers and a pooling layer connected to the last of the p convolutional layers; the Kth block of each encoder comprises q convolutional layers; the 1st block of each of the 2nd to Nth encoders comprises 1 pooling layer.
The output of the pooling layer of the ith block of each of the 2nd to Nth encoders is fused with the output of the last convolutional layer in the (i+1)th block of the 1st encoder, and the fused result serves as the input of the (i+1)th block of that encoder, where 1 ≤ i ≤ K-2.
The 1st block of each of the N decoders comprises p convolutional layers, the Kth block comprises 1 up-convolutional layer, and the (K-1)th to 2nd blocks each comprise p convolutional layers and an up-convolutional layer connected to the last of the p convolutional layers.
The output of the up-convolutional layer of the jth block of each decoder is fused with the output of the last convolutional layer in the (j-1)th block of the 1st encoder, and the fused result serves as the input of the jth block of that decoder, where 2 ≤ j ≤ K.
For convenience of description, fig. 2 is taken as an example; it is a schematic diagram of the internal structure of the sparse convolutional neural network 20 in one embodiment, with the output module not shown. In fig. 2, N takes the value 2, i.e. in the embodiment of fig. 2 the sparse convolutional neural network 20 comprises two encoder-decoder passes. Solid arrows pointing right indicate feature-map fusion; hollow dashed arrows pointing right indicate convolutional layers using small convolution kernels, which exist only in the 1st encoder; filled arrows pointing right indicate sparse convolutional layers, i.e. layers using sparse convolution kernels. Upward arrows indicate pooling layers, and all pooling layers in this application are max-pooling layers. Downward arrows indicate up-convolutional layers, also called upsampling layers.
In other specific scenarios, N may take other values, i.e. 3 or more encoder-decoder pairs may be arranged in the sparse convolutional neural network 20; it should be noted, however, that for image recognition of hepatobiliary ducts and calculi, experimental verification and theoretical analysis show that the effect is better when N = 2.
The depth of a convolutional neural network can be increased in two ways. One is longitudinal: convolving layer by layer to obtain a deeper network. The other is transverse: convolving one feature layer simultaneously with kernels of several scales and fusing the results in the next feature layer, or building several network architectures and fusing the networks.
For a longitudinally deepened network, the added convolutions increase the number of weights, which makes memory usage surge while improving precision only slightly. In addition, since a pooling operation is introduced at the end of each convolutional stage, higher-level features can indeed be obtained from the viewpoint of the feature map, which benefits classification and position identification; but each pooling also means reduced resolution, reduced definition, further loss of edge information, and even a tendency to over-fit. The value of K influences the network depth; in a specific embodiment, experimental verification and theoretical analysis show that the effect is better when K = 5. Of course, K may take other values in other implementation scenarios. Correspondingly, the values of p and q can be set and adjusted as needed; for example, p may be 2 or 3, and q may be 1.
For convenience of description, the preferred embodiment of fig. 2 with N = 2, p = 2, q = 1 and K = 5 is described hereinafter.
For a transversely deepened network, some conventional schemes cascade U-nets, which deepens the network to a certain extent; but because the circulation of primary features is not fundamentally solved, the precision gain is limited, i.e. the flow of information still ends at the output of the preceding sub-network and shallow feature information is largely lost.
In fig. 2, the 1st block of the 1st encoder includes: convolutional layer-convolutional layer-pooling layer. To show the structure contained in each block more clearly, refer to fig. 3; note that fig. 3 shows only the 1st encoder and the 1st decoder. Likewise, the 2nd, 3rd and 4th blocks of the 1st encoder each include: convolutional layer-convolutional layer-pooling layer. The 5th block of the 1st encoder includes one convolutional layer. The convolutional layers in the 1st encoder all use small convolution kernels, for example 1 × 1 convolution kernels or 2 × 2 Gaussian convolution kernels. The concept of a "block" in this application may also be called a feature layer.
In the embodiment of fig. 2, the 2nd, 3rd and 4th blocks of the 2nd encoder each include: convolutional layer-convolutional layer-pooling layer. The 5th block of the 2nd encoder includes 1 convolutional layer, and the 1st block of the 2nd encoder includes 1 pooling layer.
The 1st block of each of the 1st and 2nd decoders includes: convolutional layer-convolutional layer, i.e. two convolutional layers. The 5th block of each of the 1st and 2nd decoders includes 1 up-convolutional layer, and the 4th, 3rd and 2nd blocks of the 1st decoder each include: convolutional layer-convolutional layer-up-convolutional layer; the 4th, 3rd and 2nd blocks of the 2nd decoder are structured likewise.
In addition, since the convolutional layers in the 1st decoder, the 2nd encoder and the 2nd decoder all use sparse convolution kernels, i.e. they are all sparse convolutional layers, fig. 3 labels them directly as sparse convolution.
Considering that calculi are highly deformable and that the bile duct has a small pixel area and a changeable shape, the segmentation effect at the target edge needs to be improved, i.e. more primary features need to be retained.
On the other hand, in object segmentation the scale of the convolution kernel is also a very important parameter. A large convolution kernel can effectively enlarge the receptive field of the image and accelerate convergence, but it also increases the number of model parameters, raising implementation complexity and cost; a sparse convolution kernel is a better choice for achieving the same receptive field. A small convolution kernel is more sensitive to the detection of small targets; considering that this application targets small-volume objects such as bile-duct endings and calculi, small kernels combined with sparse kernels give a good segmentation effect.
It can be seen that this application effectively retains edge details through the small convolution kernels and achieves a good segmentation effect, while the fusion of feature maps avoids the loss of primary features, i.e. the loss of the edge details retained in the 1st encoder, so hepatobiliary ducts and calculi can be recognized in images more effectively.
Furthermore, since both calculi and hepatobiliary ducts must be segmented, multi-scale sparse convolution kernels benefit the segmentation of multiple objects and multiple targets; that is, small kernels combined with multi-scale sparse kernels achieve a better segmentation effect. Therefore, in one embodiment, the scales of the sparse convolution kernels used in the 2nd to Nth encoders and in the N decoders are different from each other.
Further, in the embodiment of fig. 2, each convolutional layer in the 1st encoder may use a 1 × 1 small convolution kernel, each convolutional layer in the 1st decoder a 3 × 3 sparse convolution kernel, each convolutional layer in the 2nd encoder a 5 × 5 sparse convolution kernel, and each convolutional layer in the 2nd decoder a 7 × 7 sparse convolution kernel; this is a proven preferable configuration.
The 1st encoder uses 1 × 1 small kernels to learn features of small target regions, improving the segmentation of small targets, and passes feature maps at different scales to the following encoders and decoders for feature fusion, reducing the loss of primary features. The 1st decoder is a feature-reconstruction process; its 3 × 3 sparse kernels enlarge the image receptive field and yield more accurate position and classification information. The 2nd encoder extracts secondary features and obtains more accurate lesion-area position information; its 5 × 5 sparse kernels further widen the depth of the network structure while enlarging the receptive field, extracting image features further. The 2nd decoder performs multi-stage feature fusion with 7 × 7 sparse kernels and obtains a prediction map at the original resolution through an upsampling operation.
Referring to fig. 4, in the flow of the 1st encoder, an original 1-channel 512 × 512 image is input; after two convolutions a 64-channel 512 × 512 feature map is obtained. This map is fused with the output of the up-convolutional layer of the 2nd block of the 1st decoder; the fused result is the 128-channel 512 × 512 image shown in fig. 5 and serves as the input of the 1st block of the 1st decoder, i.e. of its convolutional layer-convolutional layer structure. Meanwhile, the 64-channel 512 × 512 map obtained by the 1st encoder is also fused with the output of the up-convolutional layer of the 2nd block of the 2nd decoder; the fused result is the 128-channel 512 × 512 image shown in fig. 7 and serves as the input of the 1st block of the 2nd decoder. Within the 1st encoder itself, the 64-channel 512 × 512 map is the input of the pooling layer in the 1st block; after pooling, a 64-channel 256 × 256 map is output and serves as the input of the 2nd block of the 1st encoder.
In fig. 4 to 7, the upper numbers indicate the number of channels and the lower numbers indicate the resolution.
The 2nd block of the 1st encoder also includes two convolutional layers and one pooling layer; after two convolutions of the input 64-channel 256 × 256 map, a 128-channel 256 × 256 map is obtained. This map is fused with the output of the pooling layer of the 1st block of the 2nd encoder to form the input of the 2nd block of the 2nd encoder, with the output of the up-convolutional layer of the 3rd block of the 1st decoder to form the input of the 2nd block of the 1st decoder, and with the output of the up-convolutional layer of the 3rd block of the 2nd decoder to form the input of the 2nd block of the 2nd decoder. In addition, the map is pooled, and the pooled 128-channel map becomes 128 × 128 and serves as the input of the 3rd block of the 1st encoder.
The 5th block of the 1st encoder includes 1 convolutional layer; it receives a 512-channel 32 × 32 image and outputs a 1024-channel 32 × 32 image. This 1024-channel 32 × 32 image is input into a sparse convolutional layer, namely the sparse convolutional layer connecting the 1st encoder and the 1st decoder; its output, again a 1024-channel 32 × 32 image, is the input of the 5th block of the 1st decoder. After the 1st decoder receives the 1024-channel 32 × 32 image output by the sparse convolutional layer, the up-convolutional layer of its 5th block produces a 512-channel 64 × 64 image; after fusion, a 1024-channel 64 × 64 image becomes the input of the convolutional layer-convolutional layer structure of the 4th block of the 1st decoder.
The scale of the sparse convolutional layer connecting the 1st encoder and the 1st decoder may be the same as the scale of the convolutional layers in the 1st decoder, e.g. 3 × 3. Correspondingly, the scale of the sparse convolutional layer connecting the 1st decoder and the 2nd encoder may be 5 × 5, and that of the sparse convolutional layer connecting the 2nd encoder and the 2nd decoder may be 7 × 7.
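The channel arithmetic of these fusions can be illustrated with a small hedged sketch: assuming the fusion is a channel-wise concatenation, which is consistent with the 64 + 64 = 128 channel counts described above, two 64-channel 512 × 512 maps yield the 128-channel decoder input.

    import torch

    enc_feat = torch.randn(1, 64, 512, 512)  # last conv output of block 1, 1st encoder
    up_feat = torch.randn(1, 64, 512, 512)   # up-convolution output feeding block 1 of a decoder
    fused = torch.cat([enc_feat, up_feat], dim=1)
    print(fused.shape)                       # torch.Size([1, 128, 512, 512])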
In the scheme of the application, during the first encoding pass the feature maps at different scales are output to the subsequent encoders and decoders and fused with the feature maps at the corresponding resolutions. That is, the output of the pooling layer of the ith block of each of the 2nd to Nth encoders is fused with the output of the last convolutional layer in the (i+1)th block of the 1st encoder, and the fused result serves as the input of the (i+1)th block of that encoder, where 1 ≤ i ≤ K-2; the output of the up-convolutional layer of the jth block of each decoder is fused with the output of the last convolutional layer in the (j-1)th block of the 1st encoder, and the fused result serves as the input of the jth block of that decoder, where 2 ≤ j ≤ K. Such a scheme largely preserves the primary features, whereas in a conventional U-net network the flow of primary features stops at the output of the encoder. Because the primary features are largely retained, the information lost through layer-by-layer pooling and convolution is reduced, which helps refine edge segmentation and lessens the loss of edge information. In addition, each convolutional layer in the 1st encoder uses a small convolution kernel, while each convolutional layer in the remaining encoders and in every decoder uses a sparse convolution kernel; on the premise of guaranteeing a large receptive field and fast convergence, a small kernel is more sensitive to small targets, so a first encoder with small kernels effectively retains edge details for highly deformable calculi and for bile ducts with small pixel area and changeable shape. It can be seen that in the scheme of the application edge details are effectively retained by the small convolution kernels, and feature-map fusion avoids the loss of primary features, that is, the loss of the retained edge details, so hepatobiliary ducts and calculi can be recognized in images more effectively, better assisting physicians in treatment.
In one embodiment of the present invention, the loss function L in the output module is expressed as:
L = -\frac{1}{Num}\sum_{i=1}^{Num}\sum_{j=1}^{Cls} \mathbf{1}\{(y_i = j) \cap (p_{ij} \le t)\}\,\log p_{ij}
where Cls is the total number of pixel classes in the image, Num is the total number of pixels, y_i denotes the class of the ith pixel, p_ij denotes the probability that the ith pixel belongs to the jth class, and t is a preset probability threshold.
Specifically, when an object with a regular shape and smooth edges is segmented, the boundary is easy to determine; but bile ducts and calculi are highly deformable objects whose boundary parts are usually difficult to segment, so more accurate segmentation information must be obtained at these pixels for segmentation accuracy.
As training proceeds, the sparse convolutional neural network over-learns the classes of most pixels, i.e. pixels of these classes become easy to distinguish, which means the loss reduction during training is largely contributed by these easily distinguished pixels. The loss of pixels belonging to a few classes remains unchanged or even increases, and these few hard-to-distinguish pixels are often boundary points or artifact points. The purpose of this embodiment is therefore to find these few hard pixels and ensure that the sparse convolutional neural network keeps training on them until their loss drops to the same level as that of the majority, i.e. the loss decreases closer to the direction actually required.
In the above formula, 1{·} equals 1 when its argument is true and 0 otherwise; i.e. when (y_i = j) ∩ (p_ij ≤ t) holds, 1{(y_i = j) ∩ (p_ij ≤ t)} equals 1, and otherwise it is 0. It can be seen that when the classification probability of a pixel is relatively high, i.e. the pixel is already well distinguished, the pixel is discarded; what remain are the pixels that are not well distinguished. The preset probability threshold t is a number between 0 and 1 set in advance. A reasonable number of pixels should be kept in each batch, otherwise the computed gradient becomes very noisy; of course, if the sparse convolutional neural network performs reasonably well on a particular mini-batch, the value of t may be increased, whereas if the effect is poor, t may be decreased.
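A hedged PyTorch sketch of this loss follows. It keeps only the pixels whose true-class probability is still at or below the threshold t, matching the indicator in the formula, and normalizes by the total pixel count Num as the formula does; variable and function names are illustrative.

    import torch

    def hard_pixel_loss(probs, labels, t=0.7):
        """probs: (Num, Cls) softmax outputs; labels: (Num,) true class indices."""
        num = probs.size(0)
        p_true = probs[torch.arange(num), labels]  # p_{i, y_i} for every pixel i
        mask = p_true <= t                         # indicator 1{(y_i = j) and (p_ij <= t)}
        if not mask.any():                         # every pixel already easy: zero loss
            return probs.new_zeros(())
        return -torch.log(p_true[mask]).sum() / num

    probs = torch.softmax(torch.randn(6, 3), dim=1)
    labels = torch.tensor([0, 1, 2, 0, 1, 2])
    print(hard_pixel_loss(probs, labels))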
In one embodiment of the present invention, the loss function L in the output module is expressed as:
L = -\frac{1}{Num}\sum_{i=1}^{Num}\sum_{j=1}^{Cls} \mathbf{1}\{(y_i = j) \cap (p_{i\hat{y}_i} \le t)\}\,\log p_{ij}
where Cls is the total number of pixel classes in the image, Num is the total number of pixels, y_i denotes the class of the ith pixel, p_ij denotes the probability that the ith pixel belongs to the jth class, t is a preset probability threshold, and \hat{y}_i = \arg\max_j p_{ij} is the predicted class of the ith pixel.
Specifically, this embodiment sets the loss function in combination with the concept of cross entropy, which helps further relieve the bottleneck in loss reduction.
Cross entropy is used to evaluate the difference between the probability distribution obtained by the current training and the true distribution, and can be expressed as:
C = -\frac{1}{n}\sum_x \left[\, y \ln a + (1 - y) \ln(1 - a) \,\right]
In this equation, y is the desired output and a is the actual output of the neuron, a = σ(z) with z = \sum_j w_j x_j + b. Differentiating with respect to the weights gives:
\frac{\partial C}{\partial w_j} = \frac{1}{n}\sum_x x_j\,(\sigma(z) - y)
The derivative contains no σ'(z) term; the weight update is governed by σ(z) - y, i.e. by the error, so when the error is large the weights update quickly, and when the error is small they update slowly. Based on this property, the application uses cross entropy as the loss function, which effectively measures the similarity between the true label distribution and the predicted label distribution. A further advantage of this embodiment is that, when the sigmoid function is used in gradient descent, the learning-rate problem of the mean-squared-error loss function is reduced, because the update rate is controlled by the output error.
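This property, namely that the gradient scales with the error σ(z) - y rather than with σ'(z), can be checked numerically; the sketch below compares the analytic gradient of the cross entropy with a finite-difference estimate for a single neuron, with all input values chosen purely for illustration.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def cost(w, x=0.5, b=0.1, y=1.0):
        a = sigmoid(w * x + b)  # actual output of the neuron
        return -(y * np.log(a) + (1 - y) * np.log(1 - a))

    w, x, b, y = 0.8, 0.5, 0.1, 1.0
    analytic = x * (sigmoid(w * x + b) - y)  # x_j * (sigma(z) - y)
    eps = 1e-6
    numeric = (cost(w + eps) - cost(w - eps)) / (2 * eps)
    print(analytic, numeric)  # the two values agree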
In the loss formula above, the predicted-class probability p_{i\hat{y}_i} is in effect equivalent to p_ij.
Because the sparse convolutional neural network performs multi-stage feature extraction and multi-scale feature fusion, the loss tends to stabilize after a certain amount of training, i.e. it no longer decreases obviously. This is because there are easily distinguished pixels and hard-to-distinguish pixels: the easy ones are already classified efficiently in early iterations, while the probabilities of foreground edge pixels are still far from 0 and 1. With the loss function provided by this application, the bottleneck in loss reduction can be further relieved, i.e. a more targeted loss function is set so that the loss decreases closer to the direction actually required.
In addition, it should be noted that a visualization unit may be arranged in the output module. Specifically, the image output by the Nth decoder becomes a 1-channel image after one sparse convolution, and the activation function yields a gray-level map with gray values 0, 1 and 2, which is visually almost completely black; it therefore needs to be colored by the visualization unit, i.e. an operation of pixel-value conversion is performed. For example, a point with gray value 0, corresponding to the background, can be converted to the three-channel value (255, 255, 255); correspondingly, for a hepatobiliary duct with gray value 1 the pixel value is changed to (255, 0, 0), and for a calculus with gray value 2 the pixel value is changed to (0, 255, 0).
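A minimal sketch of such a visualization unit follows, assuming the label map is a NumPy array with gray values 0, 1 and 2 as described; the palette matches the pixel values given above.

    import numpy as np

    PALETTE = {0: (255, 255, 255),  # background -> white
               1: (255, 0, 0),      # hepatobiliary duct -> red
               2: (0, 255, 0)}      # calculus (stone) -> green

    def colorize(label_map):
        """Convert a 1-channel label map into a 3-channel RGB image."""
        rgb = np.zeros(label_map.shape + (3,), dtype=np.uint8)
        for gray, color in PALETTE.items():
            rgb[label_map == gray] = color
        return rgb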
Corresponding to the above system embodiments, the embodiments of the present invention further provide an image recognition method for hepatobiliary ducts and calculi; the two can be referred to in correspondence with each other.
The image recognition method for hepatobiliary ducts and calculi may comprise the following steps:
Step S101: the input module acquires an image to be detected and inputs it into the trained sparse convolutional neural network;
Step S102: the sparse convolutional neural network performs image recognition on the image to be detected and segments, in the recognition result, the position contour of the hepatobiliary duct and the position contour of the gallstone in the image to be detected;
the sparse convolutional neural network comprises N encoders, N decoders, sparse convolutional layers connecting adjacent encoders and decoders, and an output module connected to the Nth decoder, the encoders and decoders being connected alternately; each encoder and each decoder comprises K blocks connected in sequence; p, q, N and K are positive integers, N ≥ 2 and K ≥ 3; each convolutional layer in the 1st encoder uses a small convolution kernel, and each convolutional layer in the remaining encoders and in every decoder uses a sparse convolution kernel;
the 1st to (K-1)th blocks of the 1st encoder and the 2nd to (K-1)th blocks of the other encoders each comprise p convolutional layers and a pooling layer connected to the last of the p convolutional layers; the Kth block of each encoder comprises q convolutional layers; the 1st block of each of the 2nd to Nth encoders comprises 1 pooling layer;
the output of the pooling layer of the ith block of each of the 2nd to Nth encoders is fused with the output of the last convolutional layer in the (i+1)th block of the 1st encoder, and the fused result serves as the input of the (i+1)th block of that encoder, where 1 ≤ i ≤ K-2;
the 1st block of each of the N decoders comprises p convolutional layers, the Kth block comprises 1 up-convolutional layer, and the (K-1)th to 2nd blocks each comprise p convolutional layers and an up-convolutional layer connected to the last of the p convolutional layers;
the output of the up-convolutional layer of the jth block of each decoder is fused with the output of the last convolutional layer in the (j-1)th block of the 1st encoder, and the fused result serves as the input of the jth block of that decoder, where 2 ≤ j ≤ K.
Corresponding to the above method and system embodiments, the embodiment of the invention also provides an image recognition device for hepatobiliary and calculus and a computer readable storage medium, which can be referred to in correspondence with the above.
The image recognition device for hepatobiliary and calculus may include:
a memory for storing a computer program;
a processor for executing a computer program to implement the hepatobiliary and stone image recognition system in any of the above embodiments.
The computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the hepatobiliary and stone image recognition system of any of the above embodiments. A computer-readable storage medium as referred to herein may include Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
It is further noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The principle and the implementation of the present invention are explained in the present application by using specific examples, and the above description of the embodiments is only used to help understanding the technical solution and the core idea of the present invention. It should be noted that, for those skilled in the art, it is possible to make various improvements and modifications to the present invention without departing from the principle of the present invention, and those improvements and modifications also fall within the scope of the claims of the present invention.

Claims (10)

1. An image recognition system for hepatobiliary ducts and stones, comprising:
the input module, used for acquiring an image to be detected and inputting it into the trained sparse convolutional neural network;
the sparse convolutional neural network, used for performing image recognition on the image to be detected and segmenting, in the recognition result, the position contour of the hepatobiliary duct and the position contour of the gallstone in the image to be detected;
the sparse convolutional neural network comprises N encoders, N decoders, sparse convolutional layers connecting adjacent encoders and decoders, and an output module connected to the Nth decoder, the encoders and decoders being connected alternately; each encoder and each decoder comprises K blocks connected in sequence; p, q, N and K are positive integers, N ≥ 2 and K ≥ 3; each convolutional layer in the 1st encoder uses a small convolution kernel, and each convolutional layer in the remaining encoders and in every decoder uses a sparse convolution kernel;
the 1st to (K-1)th blocks of the 1st encoder and the 2nd to (K-1)th blocks of the other encoders each comprise p convolutional layers and a pooling layer connected to the last of the p convolutional layers; the Kth block of each encoder comprises q convolutional layers; the 1st block of each of the 2nd to Nth encoders comprises 1 pooling layer;
the output of the pooling layer of the ith block of each of the 2nd to Nth encoders is fused with the output of the last convolutional layer in the (i+1)th block of the 1st encoder, and the fused result serves as the input of the (i+1)th block of that encoder, where 1 ≤ i ≤ K-2;
the 1st block of each of the N decoders comprises p convolutional layers, the Kth block comprises 1 up-convolutional layer, and the (K-1)th to 2nd blocks each comprise p convolutional layers and an up-convolutional layer connected to the last of the p convolutional layers;
the output of the up-convolutional layer of the jth block of each decoder is fused with the output of the last convolutional layer in the (j-1)th block of the 1st encoder, and the fused result serves as the input of the jth block of that decoder, where 2 ≤ j ≤ K.
2. The image recognition system for hepatobiliary ducts and stones according to claim 1, wherein in the sparse convolutional neural network N = 2, p = 2, q = 1 and K = 5; N = 2 means that the sparse convolutional neural network is composed of the 1st encoder, the 1st decoder, the 2nd encoder and the 2nd decoder.
3. The image recognition system for hepatobiliary ducts and stones according to claim 2, wherein each convolutional layer in the 1st encoder uses a 1 × 1 convolution kernel or a 2 × 2 Gaussian convolution kernel.
4. The system of claim 3, wherein the scales of the sparse convolution kernels used in the 2 nd to Nth encoders and the N decoders are different from each other.
5. The image recognition system for hepatobiliary ducts and stones according to claim 4, wherein each convolutional layer in the 1st decoder uses a 3 × 3 sparse convolution kernel, each convolutional layer in the 2nd encoder uses a 5 × 5 sparse convolution kernel, and each convolutional layer in the 2nd decoder uses a 7 × 7 sparse convolution kernel.
6. The image recognition system for hepatobiliary ducts and stones according to any one of claims 1 to 5, wherein the loss function L in the output module is expressed as:
L = -\frac{1}{Num}\sum_{i=1}^{Num}\sum_{j=1}^{Cls} \mathbf{1}\{(y_i = j) \cap (p_{ij} \le t)\}\,\log p_{ij}
where Cls is the total number of pixel classes in the image, Num is the total number of pixels, y_i denotes the class of the ith pixel, p_ij denotes the probability that the ith pixel belongs to the jth class, and t is a preset probability threshold.
7. The image recognition system for hepatobiliary ducts and stones according to any one of claims 1 to 5, wherein the loss function L in the output module is expressed as:
L = -\frac{1}{Num}\sum_{i=1}^{Num}\sum_{j=1}^{Cls} \mathbf{1}\{(y_i = j) \cap (p_{i\hat{y}_i} \le t)\}\,\log p_{ij}
where Cls is the total number of pixel classes in the image, Num is the total number of pixels, y_i denotes the class of the ith pixel, p_ij denotes the probability that the ith pixel belongs to the jth class, t is a preset probability threshold, and \hat{y}_i = \arg\max_j p_{ij} is the predicted class of the ith pixel.
8. An image recognition method for hepatobiliary ducts and stones, comprising:
the input module acquires an image to be detected and inputs it into the trained sparse convolutional neural network;
the sparse convolutional neural network performs image recognition on the image to be detected and segments, in the recognition result, the position contour of the hepatobiliary duct and the position contour of the gallstone in the image to be detected;
the sparse convolutional neural network comprises N encoders, N decoders, sparse convolutional layers connecting adjacent encoders and decoders, and an output module connected to the Nth decoder, the encoders and decoders being connected alternately; each encoder and each decoder comprises K blocks connected in sequence; p, q, N and K are positive integers, N ≥ 2 and K ≥ 3; each convolutional layer in the 1st encoder uses a small convolution kernel, and each convolutional layer in the remaining encoders and in every decoder uses a sparse convolution kernel;
the 1st to (K-1)th blocks of the 1st encoder and the 2nd to (K-1)th blocks of the other encoders each comprise p convolutional layers and a pooling layer connected to the last of the p convolutional layers; the Kth block of each encoder comprises q convolutional layers; the 1st block of each of the 2nd to Nth encoders comprises 1 pooling layer;
the output of the pooling layer of the ith block of each of the 2nd to Nth encoders is fused with the output of the last convolutional layer in the (i+1)th block of the 1st encoder, and the fused result serves as the input of the (i+1)th block of that encoder, where 1 ≤ i ≤ K-2;
the 1st block of each of the N decoders comprises p convolutional layers, the Kth block comprises 1 up-convolutional layer, and the (K-1)th to 2nd blocks each comprise p convolutional layers and an up-convolutional layer connected to the last of the p convolutional layers;
the output of the up-convolutional layer of the jth block of each decoder is fused with the output of the last convolutional layer in the (j-1)th block of the 1st encoder, and the fused result serves as the input of the jth block of that decoder, where 2 ≤ j ≤ K.
9. An image recognition device for hepatobiliary ducts and stones, comprising:
a memory for storing a computer program;
a processor for executing the computer program to implement the hepatobiliary and stone image recognition system of any of claims 1 to 7.
10. A computer-readable storage medium, characterized in that a computer program is stored thereon, which computer program, when being executed by a processor, implements the hepatobiliary and stone image recognition system according to any one of claims 1 to 7.
CN201910620001.9A 2019-07-10 2019-07-10 Image recognition method, system, equipment and storage medium for hepatobiliary duct and calculus Active CN110310280B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910620001.9A CN110310280B (en) 2019-07-10 2019-07-10 Image recognition method, system, equipment and storage medium for hepatobiliary duct and calculus

Publications (2)

Publication Number Publication Date
CN110310280A CN110310280A (en) 2019-10-08
CN110310280B true CN110310280B (en) 2021-05-11

Family

ID=68080659

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910620001.9A Active CN110310280B (en) 2019-07-10 2019-07-10 Image recognition method, system, equipment and storage medium for hepatobiliary duct and calculus

Country Status (1)

Country Link
CN (1) CN110310280B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111105873B (en) * 2019-12-24 2023-03-24 武汉大学 Auxiliary diagnosis and measurement method and system in endoscopic retrograde cholangiopancreatography
CN111161270B (en) * 2019-12-24 2023-10-27 上海联影智能医疗科技有限公司 Vascular segmentation method for medical image, computer device and readable storage medium
CN111209916B (en) * 2019-12-31 2024-01-23 中国科学技术大学 Focus identification method and system and identification equipment
CN111275721B (en) * 2020-02-14 2021-06-08 推想医疗科技股份有限公司 Image segmentation method and device, electronic equipment and storage medium
CN112184719A (en) * 2020-08-27 2021-01-05 盐城工学院 Fatty liver intelligent grading evaluation method based on abdominal CT
CN112734748B (en) * 2021-01-21 2022-05-17 广东工业大学 Image segmentation system for hepatobiliary and biliary calculi
CN117764994A (en) * 2024-02-22 2024-03-26 浙江首鼎视介科技有限公司 biliary pancreas imaging system and method based on artificial intelligence

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016094330A2 (en) * 2014-12-08 2016-06-16 20/20 Genesystems, Inc Methods and machine learning systems for predicting the liklihood or risk of having cancer
US11263525B2 (en) * 2017-10-26 2022-03-01 Nvidia Corporation Progressive modification of neural networks

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109191476A (en) * 2018-09-10 2019-01-11 重庆邮电大学 The automatic segmentation of Biomedical Image based on U-net network structure
CN109448006A (en) * 2018-11-01 2019-03-08 江西理工大学 A kind of U-shaped intensive connection Segmentation Method of Retinal Blood Vessels of attention mechanism
CN109598728A (en) * 2018-11-30 2019-04-09 腾讯科技(深圳)有限公司 Image partition method, device, diagnostic system and storage medium
CN109801294A (en) * 2018-12-14 2019-05-24 深圳先进技术研究院 Three-dimensional atrium sinistrum dividing method, device, terminal device and storage medium
CN109685814A (en) * 2019-01-02 2019-04-26 兰州交通大学 Cholecystolithiasis ultrasound image full-automatic partition method based on MSPCNN

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
U-net: Convolutional networks for biomedical image segmentation; O. Ronneberger et al.; Medical Image Computing and Computer-Assisted Intervention; 2015-12-31; full text *
Convolutional Sparse Autoencoder Neural Network (卷积稀疏自编码神经网络); 牛玉虎; Computer and Modernization (计算机与现代化); 2017-02-28; full text *

Also Published As

Publication number Publication date
CN110310280A (en) 2019-10-08

Similar Documents

Publication Publication Date Title
CN110310280B (en) Image recognition method, system, equipment and storage medium for hepatobiliary duct and calculus
CN110232383B (en) Focus image recognition method and focus image recognition system based on deep learning model
CN107506761B (en) Brain image segmentation method and system based on significance learning convolutional neural network
CN107977969B (en) Endoscope fluorescence image segmentation method, device and storage medium
CN108198184B (en) Method and system for vessel segmentation in contrast images
CN111325739B (en) Method and device for detecting lung focus and training method of image detection model
CN105825509A (en) Cerebral vessel segmentation method based on 3D convolutional neural network
CN107203989A (en) End-to-end chest CT image dividing method based on full convolutional neural networks
US11735316B2 (en) Method and apparatus of labeling target in image, and computer recording medium
CN108830149B (en) Target bacterium detection method and terminal equipment
CN110110723B (en) Method and device for automatically extracting target area in image
CN112233777A (en) Gallstone automatic identification and segmentation system based on deep learning, computer equipment and storage medium
WO2021017006A1 (en) Image processing method and apparatus, neural network and training method, and storage medium
CN112308846B (en) Blood vessel segmentation method and device and electronic equipment
CN115239716B (en) Medical image segmentation method based on shape prior U-Net
CN110859642B (en) Method, device, equipment and storage medium for realizing medical image auxiliary diagnosis based on AlexNet network model
CN110674824A (en) Finger vein segmentation method and device based on R2U-Net and storage medium
WO2021027152A1 (en) Image synthesis method based on conditional generative adversarial network, and related device
CN110570394A (en) medical image segmentation method, device, equipment and storage medium
CN110047075A (en) A kind of CT image partition method based on confrontation network
CN113436173A (en) Abdomen multi-organ segmentation modeling and segmentation method and system based on edge perception
CN110827283B (en) Head and neck blood vessel segmentation method and device based on convolutional neural network
CN113781403B (en) Chest CT image processing method and device
CN109034218B (en) Model training method, device, equipment and storage medium
CN113920109A (en) Medical image recognition model training method, recognition method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant