CN113643238A - Mobile phone terminal vein imaging method based on deep learning - Google Patents
- Publication number: CN113643238A
- Application number: CN202110796882.7A
- Authority
- CN
- China
- Prior art keywords
- image
- convolution
- network
- vein
- mobile phone
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
- G06T7/0012—Biomedical image inspection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformation in the plane of the image
- G06T3/40—Scaling the whole image or part thereof
- G06T3/4038—Scaling the whole image or part thereof for image mosaicing, i.e. plane images composed of plane sub-images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10024—Color image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10048—Infrared image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20024—Filtering details
- G06T2207/20032—Median filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30004—Biomedical image processing
- G06T2207/30101—Blood vessel; Artery; Vein; Vascular
Abstract
The invention discloses a deep-learning-based method for imaging veins on a mobile phone, belonging to the technical fields of image processing and deep learning, and comprising the following steps: 1) vein image dataset processing: acquire synchronized visible-light and near-infrared skin images, enhance the near-infrared images to highlight vein information, and provide ground-truth data for subsequent network training; 2) vein imaging network design and training: build a vein imaging network and train it on the processed vein image dataset; 3) network lightweighting: improve the imaging network and compress its parameter count and computational cost; 4) mobile-phone deployment and application: deploy the lightweight network to the mobile phone to perform on-device vein imaging. By combining a deep learning method with an open-source mobile neural-network inference framework, the method improves the convenience of vein imaging and has substantial practical value in the medical field.
Description
Technical Field
The invention relates to the technical fields of image processing and deep learning, and in particular to a deep-learning-based mobile-phone vein imaging method.
Background
In the medical field, venipuncture is an indispensable procedure in modern medical care and nursing, used clinically to administer fluids and blood transfusions. Venipuncture presupposes that the vein position has been determined; an experienced medical technician typically locates a patient's veins by eye and by touch. Because of the risk of viral infection, medical personnel must wear protective gloves, protective clothing and goggles when treating infected patients. Tightly sealed, poorly ventilated protective clothing traps body heat, and exhaled water vapor condenses on the inner surface of the goggles, greatly impairing vision; at the same time, protective gloves reduce tactile sensitivity. Together these factors make it much harder for medical personnel to locate veins and perform venipuncture.
To reduce the difficulty of venipuncture and improve its success rate, many auxiliary devices and methods have been proposed, including imaging methods based on near-infrared light, on ultrasound, and on optical imaging principles. Near-infrared vein imaging exploits the fact that hemoglobin in blood absorbs near-infrared light in the 780 nm to 960 nm range more strongly than other body tissues; when the skin is irradiated with near-infrared light of uniform intensity, the vein positions appear darker than the surrounding skin under a near-infrared camera, so the veins can be located. Ultrasound-based imaging distinguishes veins from other body tissues by their different ultrasonic reflection intensities, which arise from the differing acoustic properties of vein tissue and surrounding tissue; in practice, ultrasonic vein imaging is usually combined with an ultrasound contrast agent to improve clarity and accuracy. Vein imaging based on optical imaging principles uses K-M theory to model the optical behavior of skin, combining a large number of biophysical parameters, the spectral response of a digital camera sensor, and information about the light source illuminating the skin to simulate the formation of skin color. This process serves as a forward model.
A series of skin parameters, including melanin concentration, hemoglobin concentration and dermal depth, are fed into the forward model to obtain the corresponding RGB values of a visible-light image. These correspondences are then used to train a shallow feedforward neural network that takes RGB values as input and predicts the skin parameters. Collecting the predicted skin parameters yields three distribution maps: the spatial distributions of melanin and hemoglobin, and the variation in dermal depth. Finally, the texture of the subcutaneous veins can be extracted from these maps. Imaging methods based on near-infrared light or ultrasound depend on dedicated external equipment, such as a near-infrared light source or an ultrasound transmitter, and are therefore poorly portable. The visualization method based on optical imaging principles computes per pixel, so its final results are very noisy.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a deep-learning-based mobile-phone vein imaging method, built on a deep learning approach and an open-source mobile neural-network inference framework, which overcomes the poor portability of existing vein imaging methods that depend on dedicated external equipment, as well as the heavy noise in their imaging results.
The invention is realized by the following steps:
firstly, processing the vein image dataset; a deep learning method needs real data as a training set for the network model, so the originally acquired same-frame visible-light and near-infrared images must undergo certain preprocessing operations, as follows:
1) performing contrast-limited adaptive histogram equalization (CLAHE) on the near-infrared image to improve the contrast of the near-infrared image so as to strengthen the vein information in the image;
2) performing median filtering on the CLAHE-processed near-infrared image to remove the small amount of noise introduced by CLAHE in the previous step; the median filter replaces the value of the central pixel with the median of the pixel gray levels contained in the image area covered by the filter, as shown in equation (1):

f^(x, y) = median{ g(m, n) : (m, n) ∈ S_xy }        (1)

where f^(x, y) is the computed median of the pixel gray levels in the region, S_xy is the image area covered by the filter, and g(m, n) are the pixel values within the filter area;
3) augmenting the image dataset: the visible-light images undergo random chromaticity changes to simulate skin-color differences between people, random counterclockwise rotation by 15 to 45 degrees, and horizontal and vertical flipping to simulate changes in shooting angle. Apart from the chromaticity changes, every spatial transformation applied to a visible-light image must also be applied identically to its same-frame near-infrared image, so that the pixels of each same-frame image pair always remain in one-to-one correspondence;
4) normalizing the image data: the pixel distribution of each image in the training set is analyzed, an image whose pixel values are close to the mean is selected as the reference image, and histogram specification is applied to all images in the training set, to prevent large differences in pixel distribution between images from making model training hard to converge;
5) cropping the original images into blocks of size 192 × 128 as training-set data. This removes the useless background outside the skin area, i.e. the area to be imaged, and reduces the risk of running out of GPU memory when a large batch size is set.
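The median filter of equation (1) and the block cropping of step 5 can be sketched in NumPy (an illustrative, framework-agnostic sketch; the function names and window/block handling details are assumptions, not taken from the patent):

```python
import numpy as np

def median_filter(img, k=3):
    """Equation (1): replace each pixel with the median of the k x k window S_xy."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")  # replicate border pixels
    out = np.empty_like(img)
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            out[y, x] = np.median(padded[y:y + k, x:x + k])
    return out

def crop_blocks(img, bh=128, bw=192):
    """Cut an image into non-overlapping 192 x 128 training blocks (step 5);
    any remainder at the right/bottom edge is discarded."""
    h, w = img.shape[:2]
    return [img[y:y + bh, x:x + bw]
            for y in range(0, h - bh + 1, bh)
            for x in range(0, w - bw + 1, bw)]
```

In practice a library routine such as OpenCV's median blur would replace the explicit loop; the loop form is kept here only to mirror equation (1) directly.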
Secondly, constructing and training the vein imaging network; a convolutional neural network preserves the positional relations between pixels when processing an image, is highly invariant to rotation, translation and scaling of the image, and is well suited to extracting features from two-dimensional images, so a convolutional architecture is chosen for the vein imaging network. The construction and training are as follows:
1) the network is divided into three main parts: a feature extractor section, a joint upsampling section, and an upsampling restoration section. The feature extractor performs convolution on the input image to extract feature maps, with some convolutional layers downsampling the image to produce feature maps at different scales; the joint upsampling section upsamples the differently sized feature maps from the feature extractor to a common size, concatenates them, and applies dilated convolution to the concatenated maps to extract features again, so that features at different scales are fused to capture the scale and shape information of the veins; the upsampling restoration section upsamples the output of the joint upsampling section so that the final output map has the same size as the input image.
2) The expected output is an image of the same size as the input that displays the vein information, so the imaging network adopts a fully convolutional structure and directly takes the feature map of the last convolutional layer as output. The output image should be as close as possible to the near-infrared image, so during training the near-infrared image corresponding to the input is taken as the ground-truth image, and mean-squared-error loss measures the difference between the output and the ground truth. The loss function has two parts: the first is the mean-squared-error loss between the network output and the ground-truth image, denoted A; the second uses a VGG16 network to extract feature maps from both the network output image and the ground-truth image and takes the mean-squared-error loss between the two sets of feature maps, denoted B. The final loss function is: Loss = 0.8 × A + 0.2 × B.
3) Training uses 5-fold cross-validation, with 90% of the images in the training set used for training and 10% for validation. Let the base learning rate be lr0, the total number of training epochs be epochs, and the current epoch be epo; the learning rate then changes as shown in formula (2):

lr(epo) = lr0,                epo < 10
        = lr0 / 2,            10 ≤ epo < 20
        = lr0 / 4,            20 ≤ epo < 30
        = 0.9 × lr(epo − 1),  epo ≥ 30        (2)
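The two-part loss Loss = 0.8 × A + 0.2 × B can be sketched as follows. Since loading a real VGG16 is out of scope here, a toy gradient operator stands in for the patent's VGG16 feature extractor — that substitution is an assumption for illustration only, as are the function names:

```python
import numpy as np

def mse(a, b):
    """Mean-squared error between two arrays."""
    return float(np.mean((a - b) ** 2))

def features(img):
    # Stand-in feature extractor: horizontal gradients. The patent uses VGG16
    # feature maps here; this toy stand-in only illustrates the loss structure.
    return np.diff(img, axis=1)

def imaging_loss(output, target):
    # A: pixel-wise MSE between network output and ground-truth (NIR) image.
    A = mse(output, target)
    # B: MSE between feature maps of output and ground truth.
    B = mse(features(output), features(target))
    return 0.8 * A + 0.2 * B
```

The 0.8/0.2 weighting keeps the pixel term dominant while the feature term encourages structural (vein-shape) agreement.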
step three, making the network model lightweight; the vein imaging network is trained on the server side, but the memory and computational performance of a mobile phone clearly cannot match the server's. To let the final model run normally on the phone, and to shorten the network's forward computation time to a certain extent, the server-trained network model is transformed into a lightweight version, reducing the network's parameter count and computational cost.
Depthwise separable convolution is adopted to replace the ordinary convolution used in the original network model, which reduces the network's parameter count and computation and thus lightens the model. Let the number of input channels be M, the output feature map size be D_F × D_F × N, and the convolution kernel size be D_K × D_K × M, with a convolutional layer's filter bank containing N kernels in total. The parameter count of an ordinary convolution is D_K × D_K × M × N. The parameters of a depthwise separable convolution are those of the grouped (depthwise) convolution plus those of the pointwise convolution: the grouped convolution has D_K × D_K × M parameters and the pointwise convolution has M × N, so the total is (D_K × D_K + N) × M. The computation of an ordinary convolution is D_K × D_K × M × D_F × D_F × N; the computation of a depthwise separable convolution comprises that of the grouped convolution, D_K × D_K × M × D_F × D_F × 1, plus that of the pointwise convolution, 1 × 1 × M × D_F × D_F × N, giving a total of D_F × D_F × M × (D_K × D_K + N). The ratios of the depthwise separable convolution's parameters and computation to the ordinary convolution's are shown in equations (3) and (4):

(D_K × D_K + N) × M / (D_K × D_K × M × N) = 1/N + 1/D_K²        (3)

D_F × D_F × M × (D_K × D_K + N) / (D_K × D_K × M × D_F × D_F × N) = 1/N + 1/D_K²        (4)
in convolutional layers, N tends to be large, taking the common convolutional kernel size DKWhen depth-separable convolution is used instead of normal convolution, the parameter amount of the convolution layer is about 1/9. In actual operation, not all convolutional layers in the original imaging network are replaced by depth separable convolutions, and after the convolutional layers are replaced by the depth separable convolutions, a group of point-by-point convolutions are added before the grouping convolution to carry out dimension raising on the input feature map, the width of the convolutional layers is expanded, more feature maps are generated, but due to the later depth separable convolutions, the parameters and the calculated quantity of the network model can be well compressed.
Step four, deploying the network model on the mobile phone; to solve the lack of portability in vein imaging applications, the best approach is to deploy the vein imaging algorithm to a mobile phone. The smartphone is now something almost no one is without, so realizing vein imaging on the phone incurs no additional carrying cost.
In the deployment example, a smartphone running the Android system with an ARM (Advanced RISC Machines) architecture CPU is chosen as the hardware platform. Because the chip architectures differ, the server-side neural-network training framework cannot be deployed to the phone; therefore NCNN, the mobile neural-network inference framework open-sourced by Tencent, is chosen as the phone-side inference framework, and the open-source computer-vision library OpenCV is configured for convenient processing of image data.
1) Network model file conversion: the server side uses a network model trained with the PyTorch framework, and the weight-file formats required by the NCNN and PyTorch frameworks differ, so the parameter weight file saved by PyTorch must be converted into a format NCNN can load. The original PyTorch weight file can be converted to ONNX format by calling the torch.onnx.export() method, and then converted into the format required by NCNN;
2) application project construction: and using Android Studio as a development environment, constructing projects by using the converted model weight file and the compiled NCNN _ Android and OpenCV _ Android libraries, and deploying the vein imaging algorithm to a mobile phone terminal in the form of APP.
Compared with the prior art, the invention has the following beneficial effects:
the invention is based on the combination of a deep learning method, semantic segmentation and a mobile phone end, the deep learning method is extremely successful in the field of computer vision, and the semantic segmentation is a common task in the field. The semantic segmentation allocates a predefined label representing the semantic category of each pixel in the image, detects object information of a certain category in the image and segments the object information, and can be used for extracting specific information in the image. The vein imaging can be understood as a semantic segmentation task to some extent, and the vein lines are segmented from the skin image to increase the visualization degree. In addition, smart phones are popularized nowadays, almost all the things that people do not leave are, and a vein imaging method based on deep learning is deployed to be realized at a mobile phone end, so that the method can well solve the portability problem in the vein imaging problem;
the invention builds a lightweight imaging network model and deploys the imaging network model to the mobile phone end for operation, thereby well solving the problem of vein imaging that the vein imaging problem depends on external equipment, such as a near infrared light source, even an embedded development board and the like, and hardly generating any carrying cost because the current smart phone becomes a carry-on object of people; meanwhile, compared with an imaging method based on an optical principle, the obtained vein imaging result is smoother, and the noise is greatly reduced.
Drawings
FIG. 1 is an original near-infrared image acquired in an embodiment of the present invention;
FIG. 2 is a near-infrared image after contrast-limited adaptive histogram equalization (CLAHE) processing according to an embodiment of the present invention;
fig. 3 is a CLAHE enhanced image after median filtering processing in an embodiment of the present invention;
FIG. 4 is a diagram illustrating a display network structure according to an embodiment of the present invention;
FIG. 5 is a block diagram of a joint upsampling module in a visualization network according to an embodiment of the present invention;
FIG. 6 is a first set of examples of the display network of the present invention running on a computer, wherein (a) is a visible-light image and (b) is the corresponding display result of (a);
FIG. 7 is a second set of examples of the display network of the present invention running on a computer, wherein (a) is a visible-light image and (b) is the corresponding display result of (a);
FIG. 8 is a graph comparing normal convolution with depth separable convolution in an embodiment of the present invention;
FIG. 9 is a block diagram illustrating the basic block structure after the normal convolution is converted into the depth separable convolution according to an embodiment of the present invention;
fig. 10 is a diagram of a display result of the operation of the first group of mobile phone terminals in the embodiment of the present invention; wherein (a) is the interface for selecting the visible light image, and (b) is the interface of the display result corresponding to the visible light image selected in (a);
fig. 11 is a diagram of a second group of mobile phone end operation display results in the embodiment of the present invention; wherein (a) is the interface for selecting the visible light image, and (b) is the interface for displaying the result corresponding to the visible light image selected in (a).
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples serve only to illustrate the technical solutions of the invention more clearly and do not limit its scope of protection. In the description of this patent application, note that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, covering not only the listed elements but also other elements not expressly listed.
The invention relates to a mobile phone end vein imaging method based on deep learning, which comprises four parts, namely vein image data set processing, vein imaging network construction and training, network model lightweight and network model mobile phone end deployment, and specifically comprises the following implementation steps:
the method comprises the following steps: vein image dataset processing; the vein imaging algorithm in the invention is a deep learning-based method. Deep learning cannot be supported by real data, so that data related to task requirements are collected as real data to be used for deep network to 'learn' intrinsic characteristics, and then the data are generalized to the same type of problems. The arm region is selected as the image capturing region, considering that the veins in the arm region are often abundant and are often found in the hand region during daily medical infusion. During specific collection, a double CCD camera JAI-AD080CL is selected to shoot arm images. The camera is internally provided with a beam splitter prism, and can simultaneously send incident light into two independent channels, namely a visible light channel (400nm to 700nm) and a near infrared channel (750nm to 920nm), so as to provide synchronous images with different spectrums. Therefore, the visible light image and the near infrared image which completely correspond to the arm area can be acquired. The raw near-infrared image obtained by the acquisition is shown in fig. 1. The original picture is not suitable for being directly used as training data, so that certain preprocessing is carried out on the acquired original data.
(1) Vein positions are visible in the raw near-infrared images, but in some pictures the vein traces are faint. To highlight the less obvious veins, contrast-limited adaptive histogram equalization (CLAHE) is applied to the raw near-infrared image to enhance the contrast between the vein and skin regions and highlight the vein positions, as shown in FIG. 2;
(2) Some salt-and-pepper noise appears in the CLAHE-processed images; applying median filtering to these images suppresses the noise well, as shown in FIG. 3;
(3) The near-infrared images and the corresponding visible-light images are cropped into 192 × 128 image blocks to remove irrelevant background and reduce the size of the training input;
(4) Data augmentation is applied to the image blocks, including random chromaticity transformation, random counterclockwise rotation of 15 to 45 degrees, and horizontal and vertical flipping, expanding the dataset;
(5) Data normalization is an important operation in deep learning: images shot by different cameras have different pixel distributions, and when the distributions of different images differ too much, the model is hard to converge. The per-channel pixel mean of the visible-light images in the dataset is computed, the visible-light image closest to the mean and its corresponding near-infrared image are selected as reference images, and histogram specification is applied to the other images in the dataset so that their pixel distributions resemble the reference.
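The histogram specification of step (5) can be sketched via CDF matching in NumPy. The patent does not spell out the exact algorithm, so this CDF-interpolation approach, and the function name, are assumptions for illustration:

```python
import numpy as np

def match_histogram(src, ref):
    """Histogram specification: remap src's gray levels so that its
    distribution approximates that of the reference image."""
    s_vals, s_counts = np.unique(src, return_counts=True)
    r_vals, r_counts = np.unique(ref, return_counts=True)
    # Empirical CDFs of source and reference images.
    s_cdf = np.cumsum(s_counts) / src.size
    r_cdf = np.cumsum(r_counts) / ref.size
    # For each source gray level, find the reference level with the same CDF.
    mapped = np.interp(s_cdf, r_cdf, r_vals)
    lut = dict(zip(s_vals, mapped))
    return np.vectorize(lut.get)(src)
```

For color images this mapping would be applied per channel, matching each channel of an image to the corresponding channel of the reference.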
Step two: construction and training of a visualization network:
(1) A convolutional neural network is chosen as the basic architecture; the network structure is shown schematically in FIG. 4. The network is divided into a feature extractor part, a joint upsampling part, and an upsampling restoration part. Since the final model will be deployed on a mobile phone, it must avoid excessive parameters and computation, so the network scale is controlled from the start of the design rather than relying too heavily on lightweighting afterwards. Widely used models such as VGG16 and ResNet50 are therefore not used directly in the feature extractor part; instead a shallow feature extractor is built from plain convolutions, including three downsampling operations to extract feature maps of different sizes. The joint upsampling part upsamples the three differently sized feature maps from the feature extractor to a uniform size, concatenates them, applies dilated convolution to the concatenated maps, and takes the dilated-convolution output as the output of this part, as shown in FIG. 5. The upsampling restoration part upsamples and further convolves the output feature maps of the previous part so that the final output map has the same size as the input image. Batch normalization layers follow the convolutional layers to accelerate training convergence and mitigate vanishing gradients. PReLU is chosen as the activation function, defined as formula (5):

PReLU(x) = x,      x > 0
         = a · x,  x ≤ 0        (5)
the initial value of a is usually a positive number less than 1, for example, a is 0.25, and the value of a can be updated during the training process. The PReLU does not discard all negative axis information but gives a smaller weight to it, and the non-linearity factor can be improved to a certain extent.
(2) The training machine is equipped with an Intel Xeon CPU and an NVIDIA TITAN Xp GPU. Python 3.6 is used to build the training environment, and tool libraries such as opencv-python and numpy are used for image processing and matrix operations to improve the efficiency of loading and processing image data. The deep neural network training framework is PyTorch. Training uses 5-fold cross-validation. Of the 1200 image blocks obtained after cropping and enhancement, 90% are divided into the training set and 10% into the validation set; the batch size is set to 4 and the total number of training rounds to 100. The initial learning rate is set to 4×10⁻⁴; after 10 rounds of training it decays to 2×10⁻⁴, after 20 rounds to 1×10⁻⁴, and after 30 rounds it decays exponentially, each round's learning rate being 0.9 times that of the previous round. The loss function is a mean-square-error loss with two main parts: the first part is the pixel-wise loss between the result image output by the network and the ground-truth image, denoted A; the second part extracts features from the network output image and the ground-truth image with a VGG16 network and computes a pixel-wise loss between the two feature maps, denoted B. The total loss is Loss = 0.8·A + 0.2·B. The network visualization results are shown in fig. 6 and fig. 7.
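The learning-rate schedule described above (referenced as formula (2) in claim 3) can be sketched as a piecewise function; the function name is illustrative:

```python
def learning_rate(epo: int) -> float:
    """Piecewise learning-rate schedule from the training description:
    4e-4 for rounds 0-9, 2e-4 for rounds 10-19, 1e-4 for rounds 20-29,
    then exponential decay by a factor of 0.9 per round."""
    if epo < 10:
        return 4e-4
    if epo < 20:
        return 2e-4
    if epo < 30:
        return 1e-4
    return 1e-4 * 0.9 ** (epo - 30)
```

In PyTorch this kind of schedule is typically attached to the optimizer with `torch.optim.lr_scheduler.LambdaLR`.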
Step three: imaging network lightweight
Depthwise separable convolutions are used in place of some of the convolution layers in the visualization network to reduce the number of network parameters and the amount of computation. A schematic comparison of an ordinary convolution and a depthwise separable convolution is shown in fig. 8. Before the depthwise (grouped) convolution, a pointwise convolution first raises the dimension of the feature maps, widening the corresponding convolution layer and producing more feature maps. The structure of the basic module after replacing ordinary convolution with depthwise separable convolution is shown in fig. 9.
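The savings can be checked numerically. A small sketch (function names and the example channel counts are illustrative) counting parameters for an ordinary convolution versus its depthwise separable replacement, following the formulas given in claim 4:

```python
def conv_params(dk: int, m: int, n: int) -> int:
    """Parameters of an ordinary convolution: N filters of size Dk x Dk x M."""
    return dk * dk * m * n

def dw_separable_params(dk: int, m: int, n: int) -> int:
    """Depthwise (Dk*Dk*M) plus pointwise (M*N) parameters: (Dk*Dk + N) * M."""
    return (dk * dk + n) * m

# Example: 3x3 kernels, 64 input channels, 128 output channels.
ratio = dw_separable_params(3, 64, 128) / conv_params(3, 64, 128)
# ratio equals 1/N + 1/Dk^2 = 1/128 + 1/9, roughly 1/9 when N is large.
```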
Step four: network model handset side deployment
(1) Compiling and installing NCNN for Windows, NCNN_Android and OpenCV_Android;
(2) Converting the network weight file obtained from PyTorch training into the format required by the NCNN framework. Calling the torch.onnx.export() method converts the saved PyTorch model into the Open Neural Network Exchange (ONNX) format. ONNX serializes models in the protobuf binary format; it is a model exchange format compatible with multiple frameworks and can serve as a bridge when converting between framework model formats. After the ONNX model is obtained, the compiled Windows NCNN tools convert it into the weight-storage format that NCNN loads and calls, yielding a network structure file with suffix .param and a network weight file with suffix .bin.
(3) Building the project in the Android Studio development environment. The relevant NCNN framework interface functions are called from C++ to load the network model and run forward inference, realizing vein imaging from a visible-light image. The main activities of the application are built in Java and include reading and displaying pictures; the C++ file containing model loading and inference is called through the Java Native Interface (JNI) to realize the complete function of the application. When creating the project, the Native C++ project template is selected. Two peer directories, cpp and java, appear under the src/main path of the new project folder: the cpp folder holds CMakeLists.txt and native-lib.cpp, the build file and source file of the project, the source file containing the function that calls NCNN forward computation; the java folder holds MainActivity.java, which builds the main activities of the application, including reading in pictures, calling the display function and displaying the result image. After the project is built, an image in the mobile phone album can be selected and the display button clicked to obtain the result. The results of running the vein visualization algorithm on the mobile phone are shown in fig. 10 and fig. 11, two display examples of operation at the mobile phone end. Fig. 10(a) and fig. 6(a) are the same visible-light image, and fig. 11(a) and fig. 7(a) are the same visible-light image; they are used for comparison with the computer-side examples and show that the mobile phone produces the same result as the computer. Fig. 10(a) and 11(a) are the interfaces for selecting a visible-light image on the mobile phone, and fig. 10(b) and 11(b) are the corresponding result-display interfaces for the images selected in fig. 10(a) and 11(a), respectively.
Comparison of multiple groups of pictures shows that the result at the mobile phone end is completely consistent with that at the server end, and that the vein imaging method can be normally deployed and run on a mobile phone.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that modifications can be made by those skilled in the art without departing from the principle of the present invention, and these modifications should also be construed as the protection scope of the present invention.
Claims (5)
1. A method for imaging veins at a mobile phone terminal based on deep learning, characterized by comprising the following steps:
step one, processing a vein image data set;
step two, constructing and training a vein imaging network;
step three, lightweighting the vein imaging network;
and step four, deploying the mobile phone end of the network model.
2. The method for imaging the vein of the mobile phone terminal based on the deep learning of claim 1, wherein the first step is specifically as follows:
1.1, carrying out contrast-limited adaptive histogram equalization on the visible-light image and near-infrared image of the same frame obtained in the original acquisition, improving the contrast of the near-infrared image so as to strengthen the vein information in the image;
1.2, performing median filtering on the near-infrared image after contrast-limited adaptive histogram equalization; the median filter replaces the value of the central pixel by the median of the pixel gray levels contained in the image area enclosed by the filter, as shown in equation (1):

f̂(x, y) = median over (m, n) ∈ S_xy of { g(m, n) }   (1)

where f̂(x, y) is the calculated median of the pixel gray levels in the region, S_xy is the image area enclosed by the filter, and g(m, n) is a pixel value in the filter area;
1.3, performing data enhancement on the image data set: random chromaticity changes are applied to the visible-light images to simulate differences in skin color between people; the visible-light images are randomly rotated counterclockwise by 15° to 45° and flipped horizontally and vertically to simulate changes in shooting angle; except for the chromaticity change, the same spatial transformations are applied equally to the near-infrared image of the same frame;
1.4, normalizing the image data: the pixel distribution of each image in the training set is counted, an image whose pixel values are close to the mean is selected as the reference image, and histogram specification is performed on all images in the training set;
1.5, cropping the original images into 192 × 128 image blocks as the training set data.
3. The method for imaging the vein at the mobile phone end based on deep learning of claim 1, wherein the second step is specifically as follows:
2.1, the network is divided into three parts: a feature extractor part, a joint upsampling part and an upsampling restoration part; the feature extractor part performs convolution operations on the input image to extract image feature maps, some convolution layers downsampling the image to produce feature maps at different scales; the joint upsampling part upsamples the feature maps of different sizes obtained by the feature extractor to the same size and concatenates them, then extracts features from the concatenated feature map again using dilated convolution so that features of different sizes are fused to obtain the scale and shape information of the veins; the upsampling restoration part upsamples the feature map output by the joint upsampling part so that the final output result has the same size as the input image;
2.2, the imaging network adopts a fully convolutional structure and directly takes the feature map of the last convolution layer as output; during network training, the near-infrared picture corresponding to the input picture serves as the ground-truth picture, and mean-square-error loss is used as the loss function to measure the difference between the output result and the ground-truth picture; the loss function is divided into two parts: the first part is the mean-square-error loss between the network output result and the ground-truth image, denoted A; the second part extracts feature maps of the network output result image and the ground-truth image with a VGG16 network and computes the mean-square-error loss between the two feature maps, denoted B; finally, the loss function is determined as Loss = 0.8·A + 0.2·B;
2.3, training adopts five-fold cross-validation; 90% of the images in the training data are divided into the training set and 10% into the validation set; let the learning rate be lr, the total number of training rounds be epochs, and the current training round be epo; the learning rate follows the change shown in formula (2):

lr = 4×10⁻⁴ for 0 ≤ epo < 10; lr = 2×10⁻⁴ for 10 ≤ epo < 20; lr = 1×10⁻⁴ for 20 ≤ epo < 30; lr = 1×10⁻⁴ × 0.9^(epo−30) for 30 ≤ epo < epochs.   (2)
4. the method for imaging the vein at the mobile phone end based on deep learning of claim 1, wherein the third step is specifically as follows:
the depthwise separable convolution replaces the ordinary convolution used in the original network model, reducing the parameter count and computation of the network and thus lightweighting the model; let the number of input channels be M, the output feature map size be D_F × D_F × N, and the convolution kernel size be D_K × D_K × M, a convolution layer filter comprising N convolution kernels in total;
the parameter count of the ordinary convolution is D_K · D_K · M · N; the parameter count of the depthwise separable convolution is that of the grouped (depthwise) convolution plus that of the pointwise convolution, where the grouped convolution has D_K · D_K · M parameters and the pointwise convolution has M · N parameters, so the total parameter count of the depthwise separable convolution is (D_K · D_K + N) · M; the computation of the ordinary convolution is D_K · D_K · M · D_F · D_F · N; the computation of the depthwise separable convolution comprises that of the grouped convolution, D_K · D_K · M · D_F · D_F · 1, plus that of the pointwise convolution, 1 · 1 · M · D_F · D_F · N, so the total computation of the depthwise separable convolution is D_F · D_F · M · (D_K · D_K + N); the ratios of the parameter counts and of the computation of the depthwise separable convolution to those of the ordinary convolution are shown in equations (3) and (4):

(D_K · D_K + N) · M / (D_K · D_K · M · N) = 1/N + 1/(D_K · D_K)   (3)

D_F · D_F · M · (D_K · D_K + N) / (D_K · D_K · M · D_F · D_F · N) = 1/N + 1/(D_K · D_K)   (4)

in convolution layers N tends to be large; taking the common convolution kernel size D_K = 3, when the depthwise separable convolution replaces the ordinary convolution, the parameter count of the convolution layer is about 1/9 of the original.
5. The method for vein imaging at a mobile phone terminal based on deep learning of claim 1, wherein in step four a mobile phone running the Android system, namely a smartphone with an ARM-architecture CPU, is used as the hardware platform in the deployment example; the NCNN framework is adopted as the inference framework on the mobile phone, and the computer-vision open-source library OpenCV is used to process image data; Android Studio is used as the development environment, and the converted model weight files together with the compiled NCNN_Android and OpenCV_Android libraries are used to build the project, deploying the vein imaging algorithm to the mobile phone as an APP.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110796882.7A CN113643238A (en) | 2021-07-14 | 2021-07-14 | Mobile phone terminal vein imaging method based on deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113643238A true CN113643238A (en) | 2021-11-12 |
Family
ID=78417545
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113643238A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110427832A (en) * | 2019-07-09 | 2019-11-08 | 华南理工大学 | A kind of small data set finger vein identification method neural network based |
CN111178229A (en) * | 2019-12-26 | 2020-05-19 | 南京航空航天大学 | Vein imaging method and device based on deep learning |
AU2020103901A4 (en) * | 2020-12-04 | 2021-02-11 | Chongqing Normal University | Image Semantic Segmentation Method Based on Deep Full Convolutional Network and Conditional Random Field |
Non-Patent Citations (1)
Title |
---|
Yang Kenan: "Research on Finger Vein Recognition Based on Deep Learning", China Excellent Master's and Doctoral Theses Full-text Database (Master's), Information Science and Technology, 15 May 2021 (2021-05-15), pages 5 - 74 *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116338159A (en) * | 2022-12-13 | 2023-06-27 | 西交利物浦大学 | Full-automatic paper-based micro-fluidic system based on smart phone local detection |
CN116338159B (en) * | 2022-12-13 | 2024-02-09 | 西交利物浦大学 | Full-automatic paper-based micro-fluidic system based on smart phone local detection |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9317761B2 (en) | Method and an apparatus for determining vein patterns from a colour image | |
CN112017185B (en) | Focus segmentation method, device and storage medium | |
CN108764342B (en) | Semantic segmentation method for optic discs and optic cups in fundus image | |
Li et al. | Robust retinal image enhancement via dual-tree complex wavelet transform and morphology-based method | |
CN111407245A (en) | Non-contact heart rate and body temperature measuring method based on camera | |
CN110675335B (en) | Superficial vein enhancement method based on multi-resolution residual error fusion network | |
CN109919954B (en) | Target object identification method and device | |
CN104919491A (en) | Improvements in or relating to image processing | |
Sharma et al. | Hyperspectral reconstruction from RGB images for vein visualization | |
CN109859217A (en) | The dividing method in pore region and calculating equipment in facial image | |
JP2018106720A (en) | Apparatus and method for image processing | |
Rosado et al. | 12 From Dermoscopy to Mobile Teledermatology | |
CN113643238A (en) | Mobile phone terminal vein imaging method based on deep learning | |
Li et al. | Low-light hyperspectral image enhancement | |
CN114049290A (en) | Image processing method, device, equipment and storage medium | |
Choudhary et al. | Mathematical modeling and simulation of multi-focus image fusion techniques using the effect of image enhancement criteria: A systematic review and performance evaluation | |
Uddin et al. | SIFNet: Free-form image inpainting using color split-inpaint-fuse approach | |
CN113327191A (en) | Face image synthesis method and device | |
CN109919098A (en) | The recognition methods of target object and device | |
Mendi et al. | Contour-based image segmentation using selective visual attention | |
CN115587981A (en) | Vein re-projection system method and terminal based on FPGA | |
Kottath et al. | Image preprocessing techniques in skin diseases prediction using deep learning: A review | |
Lafraxo et al. | Melanoma lesion recognition using deep convolutional neural network and global average pooling | |
EP4260295A1 (en) | Self-supervised machine learning for medical image analysis | |
CN113889238A (en) | Image identification method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||