CN114998230A - Pharynx swab oral cavity nucleic acid sampling area image identification method - Google Patents

Pharynx swab oral cavity nucleic acid sampling area image identification method

Info

Publication number
CN114998230A
Authority
CN
China
Prior art keywords
oral cavity
convolution
layer
model
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210563500.0A
Other languages
Chinese (zh)
Inventor
朱天军
张闯
梁建国
李伟豪
韩诗婷
陈悦任
蔡淼纯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhaoqing University
Original Assignee
Zhaoqing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhaoqing University filed Critical Zhaoqing University
Priority to CN202210563500.0A priority Critical patent/CN114998230A/en
Publication of CN114998230A publication Critical patent/CN114998230A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G06T 7/0012 Biomedical image inspection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10004 Still image; Photographic image

Abstract

The invention discloses a pharyngeal swab oral cavity nucleic acid sampling area image identification method, relating to the technical field of image segmentation. The technical scheme is as follows: S1: a robot acquires oral cavity images of people of different ages in different environments, and the acquired images are used as a training set, a verification set and a test set; S2: a Deeplab V3+ network model is trained on the acquired images; S3: the trained oral cavity M-region segmentation model is verified and analyzed. Experimental results show that the method can effectively distinguish the oral cavity M region.

Description

Pharyngeal swab oral nucleic acid sampling area image identification method
Technical Field
The invention relates to the technical field of image segmentation, and in particular to a pharyngeal swab oral cavity nucleic acid sampling area image identification method.
Background
At present, COVID-19 testing mainly uses the pharyngeal swab method for sample collection. During sampling, medical personnel must be in close contact with the patient, and when a patient coughs, a large amount of droplets or aerosol is produced, so medical staff face a high risk of infection. In addition, because medical staff differ in skill level and sampling technique, the quality of the collected pharyngeal swabs varies; false negatives cannot be avoided, and there is a risk of misdiagnosis.
To avoid prolonged exposure of medical personnel to high-risk areas, pharyngeal swab sample collection can be completed by a sampling robot. When a robot performs pharyngeal swab sampling, accurately identifying the sampling area of the oral cavity (the M region) is extremely important and plays a decisive role in the sampling process.
The sampled picture must be segmented; the purpose of image segmentation is to separate the object from the background. However, with existing image identification methods, the boundary of the extracted oral cavity M region is discontinuous or blurred.
Disclosure of Invention
The invention aims to provide a pharyngeal swab oral cavity nucleic acid sampling area image identification method that can effectively distinguish the M region from the background region and has good segmentation performance.
The technical purpose of the invention is achieved by the following technical scheme: a pharyngeal swab oral cavity nucleic acid sampling area image identification method, specifically comprising the following steps:
S1: acquiring oral cavity images of people of different ages with a robot in different environments, and using the acquired images as a training set, a verification set and a test set;
S2: training a Deeplab V3+ network model on the acquired images;
S3: carrying out verification analysis on the trained oral cavity M-region segmentation model;
the specific steps for training the Deeplab V3+ model in S2 are as follows (a code sketch of the overall data flow follows these steps):
1) the training set, verification set and test set are divided in the ratio 8:1:1; the image is first input into the Encoder and, after processing by a deep convolutional neural network (DCNN), two feature layers are obtained: a shallow effective feature layer and a deep effective feature layer;
2) after a 1×1 convolution, the shallow effective feature layer enters the Decoder and is stacked with the result of 4× up-sampling of the high-semantic-information feature layer;
3) after a 3×3 convolution of the stacked feature layers, up-sampling by a factor of four yields the final effective feature layer, i.e. a condensed feature representation of the whole picture;
4) resize is used to make the height and width of the final output layer the same as the size of the original picture;
the specific steps in step 2) for obtaining the high-semantic-information feature layer are as follows:
(1) using the ASPP structure, the deep effective feature layer obtained in step 1) is given a 1×1 convolution, 3×3 convolutions with dilation rates of 6, 12 and 18 respectively, and image pooling, yielding 5 effective feature layers;
(2) the 5 effective feature layers are stacked and the number of channels is adjusted with a 1×1 convolution to obtain the high-semantic-information feature layer.
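As a rough illustration of the data flow in steps 1)-4), the following is a minimal PyTorch-style sketch. The `backbone` and `aspp` modules are hypothetical placeholders, and the channel sizes (256 shallow channels, 48 after projection) are typical DeepLabV3+ values assumed here, not values taken from the patent:

```python
import torch
import torch.nn.functional as F
from torch import nn

class DeepLabV3PlusSketch(nn.Module):
    """Minimal sketch of the Encoder/Decoder data flow in steps 1)-4).

    `backbone` must return (shallow, deep) feature maps and `aspp` is the
    ASPP module described in steps (1)-(2); both are hypothetical
    placeholders, as are the channel sizes.
    """

    def __init__(self, backbone: nn.Module, aspp: nn.Module, num_classes: int = 2):
        super().__init__()
        self.backbone = backbone
        self.aspp = aspp
        self.shallow_proj = nn.Conv2d(256, 48, kernel_size=1)            # step 2): 1x1 conv on shallow layer
        self.fuse = nn.Conv2d(48 + 256, 256, kernel_size=3, padding=1)   # step 3): 3x3 conv on the stack
        self.classifier = nn.Conv2d(256, num_classes, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[-2:]
        shallow, deep = self.backbone(x)        # step 1): shallow and deep effective feature layers
        sem = self.aspp(deep)                   # high-semantic-information feature layer
        sem = F.interpolate(sem, size=shallow.shape[-2:], mode="bilinear",
                            align_corners=False)                        # step 2): 4x up-sampling
        y = self.fuse(torch.cat([self.shallow_proj(shallow), sem], dim=1))  # stack, then 3x3 conv
        y = F.interpolate(y, size=(h, w), mode="bilinear",
                          align_corners=False)  # steps 3)-4): up-sample/resize to the input size
        return self.classifier(y)
```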
The backbone network of the Deeplab V3+ model is Xception; Xception is an extreme version of Inception.
In conclusion, the invention has the following beneficial effects:
1. Based on depthwise separable convolution, a Deeplab V3+ model is provided; dilated convolutions with different dilation rates are used for feature extraction, which enlarges the receptive field of the network and gives it different feature-perception conditions, so that M regions with discontinuous or blurred boundaries in pharyngeal swab images can be segmented effectively, with higher segmentation precision than other networks;
2. The Deeplab V3+ model uses an Xception network; Xception is decoupled entirely into depthwise separable convolutions, so that the mapping of cross-channel correlations and the mapping of spatial correlations in the feature maps of the convolutional neural network are completely decoupled.
Drawings
FIG. 1 is a schematic flow chart of a method for identifying an image of a pharyngeal swab oral nucleic acid sampling area according to an embodiment of the present invention;
FIG. 2 is a network architecture diagram of Xception in an embodiment of the present invention;
FIG. 3 is a graph of the loss function obtained from training in an embodiment of the present invention;
FIG. 4 is a comparison of a distorted input image and an image padded with gray bars in an embodiment of the present invention;
FIG. 5 is a comparison of the M-region segmentation results of the MobileNetV2 and Xception networks of the Deeplab V3+ model in an embodiment of the present invention;
FIG. 6 is a graph of the test results for a standard oral cavity in an embodiment of the invention;
FIG. 7 is a graph of the test results for non-standard oral cavities in an embodiment of the present invention, in which the M region is not visible;
FIG. 8 is a training set picture in an embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to FIGS. 1 to 8.
Example: a pharyngeal swab oral cavity nucleic acid sampling area image identification method, as shown in FIGS. 1 to 8, specifically comprising the following steps:
S1: acquiring oral cavity images of people of different ages with a robot in different environments, and using the acquired images as a training set, a verification set and a test set;
S2: training a Deeplab V3+ network model on the acquired images; the Deeplab V3+ model is shown in FIG. 1 and comprises an Encoder part and a Decoder part;
S3: carrying out verification analysis on the trained oral cavity M-region segmentation model;
the specific steps for training the Deeplab V3+ model in S2 are as follows (a sketch of the ASPP branch follows these steps):
1) the training set, verification set and test set are divided in the ratio 8:1:1; the image is first input into the Encoder and, after processing by a deep convolutional neural network (DCNN), two feature layers are obtained: a shallow effective feature layer and a deep effective feature layer;
2) after a 1×1 convolution, the shallow effective feature layer enters the Decoder and is stacked with the result of 4× up-sampling of the high-semantic-information feature layer;
3) after a 3×3 convolution of the stacked feature layers, up-sampling by a factor of four yields the final effective feature layer, i.e. a condensed feature representation of the whole picture;
4) resize is used to make the height and width of the final output layer the same as the size of the original picture;
the specific steps in step 2) for obtaining the high-semantic-information feature layer are as follows:
(1) using the ASPP structure, the deep effective feature layer obtained in step 1) is given a 1×1 convolution, 3×3 convolutions with dilation rates of 6, 12 and 18 respectively, and image pooling, yielding 5 effective feature layers;
(2) the 5 effective feature layers are stacked and the number of channels is adjusted with a 1×1 convolution to obtain the high-semantic-information feature layer.
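A minimal PyTorch sketch of the ASPP branch just described: a 1×1 convolution, three 3×3 convolutions with dilation rates 6, 12 and 18, and image pooling, stacked and reduced with a 1×1 convolution. The channel sizes (2048 in, 256 out) are illustrative assumptions, not values from the patent:

```python
import torch
import torch.nn.functional as F
from torch import nn

class ASPPSketch(nn.Module):
    """Sketch of the ASPP branch in steps (1)-(2); channel sizes are assumptions."""

    def __init__(self, in_ch: int = 2048, out_ch: int = 256):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, out_ch, kernel_size=1)                           # 1x1 convolution
        self.branch2 = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=6, dilation=6)    # 3x3, dilation 6
        self.branch3 = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=12, dilation=12)  # 3x3, dilation 12
        self.branch4 = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=18, dilation=18)  # 3x3, dilation 18
        self.image_pool = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                        nn.Conv2d(in_ch, out_ch, kernel_size=1))         # image pooling
        self.project = nn.Conv2d(5 * out_ch, out_ch, kernel_size=1)    # step (2): adjust channel count

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        size = x.shape[-2:]
        pooled = F.interpolate(self.image_pool(x), size=size, mode="bilinear", align_corners=False)
        feats = [self.branch1(x), self.branch2(x), self.branch3(x), self.branch4(x), pooled]
        return self.project(torch.cat(feats, dim=1))   # stack the 5 effective feature layers, 1x1 conv
```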
The Deeplab V3+ model uses an Xception backbone network. Xception is an extreme version of Inception: the Inception structure is an intermediate form between conventional convolution and depthwise separable convolution, whereas Xception is decoupled entirely into depthwise separable convolutions, so that the mapping of cross-channel correlations and the mapping of spatial correlations in the feature maps of the convolutional neural network are completely decoupled. The Xception architecture has 36 convolutional layers forming the feature-extraction base of the network; these 36 layers are organized into 14 modules, all of which, except the first and the last, have linear residual connections around them. Data first passes through the entry flow, then through the middle flow, which is repeated 8 times, and finally through the exit flow.
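The unit Xception repeats inside these modules is the depthwise separable convolution, which performs the decoupling described above: a per-channel spatial convolution followed by a 1×1 pointwise convolution across channels. A minimal sketch, with illustrative channel counts:

```python
from torch import nn

class SeparableConv2d(nn.Module):
    """Depthwise separable convolution, the building block of Xception:
    a per-channel spatial convolution (spatial correlations) followed by
    a 1x1 pointwise convolution (cross-channel correlations)."""

    def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 3):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   padding=kernel_size // 2, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))
```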
Experimental data:
Oral cavity images of 81 volunteers, including children, young adults, middle-aged and elderly people, were collected in different environments, 1569 oral cavity images in total; the data set was expanded to 7845 pharyngeal swab images through various horizontal flips and rotations. A professional pharyngeal swab nucleic acid sampling doctor labeled each picture with the labelme labeling tool to generate a corresponding label file; the resulting training set is shown in FIG. 8. 80% of the images are used as the training set, 10% as the verification set, and 10% as the test set.
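As an illustration of the data-set expansion and the 80/10/10 split, here is a sketch under the assumption that each original image yields itself, one horizontal flip and three rotations (1569 × 5 = 7845, consistent with the counts above); the specific rotation angles are assumptions:

```python
import random
from PIL import Image

def expand_pair(img: Image.Image, mask: Image.Image):
    """Expand one labeled picture into 5 variants: the original, one
    horizontal flip, and rotations by the assumed angles 90/180/270,
    applying the identical transform to image and label mask."""
    pairs = [(img, mask),
             (img.transpose(Image.FLIP_LEFT_RIGHT),
              mask.transpose(Image.FLIP_LEFT_RIGHT))]
    for angle in (90, 180, 270):
        pairs.append((img.rotate(angle), mask.rotate(angle)))
    return pairs

def split_8_1_1(samples: list, seed: int = 0):
    """Shuffle and split the expanded data set 80%/10%/10% into
    training, verification and test sets."""
    rng = random.Random(seed)
    rng.shuffle(samples)
    n_train = int(0.8 * len(samples))
    n_val = int(0.1 * len(samples))
    return (samples[:n_train],
            samples[n_train:n_train + n_val],
            samples[n_train + n_val:])
```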
The environment configuration for model training is shown in the following table:
[Table: environment configuration for model training]
The backbone feature-extraction network used is Xception, and model training is divided into a freezing stage and a thawing stage. Freezing stage: the backbone of the model is frozen and the feature-extraction network does not change; the occupied video memory is small, and only the rest of the network is fine-tuned. Thawing stage: the backbone of the model is no longer frozen and the feature-extraction network changes; the occupied video memory is large, and all parameters of the network are updated. The model is trained in a pytorch-gpu environment, with the training set, verification set and test set in the ratio 8:1:1. The LOSS function used for training consists of two parts: the common cross-entropy loss function (Cross Entropy Loss) and the set similarity measurement function (Dice Loss). The cross-entropy loss function formula is:
Cross Entropy Loss = -Σ_{i=1}^{C} p_i log(q_i)
where C represents the number of classes (here the number of classes is 1), p_i is the true value, and q_i is the predicted value. The set similarity measurement function is generally used to calculate the similarity between two samples; its formula is:
Dice = 2|X ∩ Y| / (|X| + |Y|)
where |X ∩ Y| is the intersection of X and Y, and |X| and |Y| denote the numbers of elements of X and Y, respectively. The coefficient 2 in the numerator is there because the denominator counts the elements common to X and Y twice. The larger the Dice coefficient, the greater the overlap between the predicted result and the true result; used as a loss, Dice Loss = 1 - Dice, so a smaller value is better, and this is taken as the semantic-segmentation loss. The loss curves obtained from training are shown in FIG. 3. The curves show that the loss value of the training set (final value 0.08613) and the loss value of the test set (final value 0.10026) gradually decrease and converge, so the trained model is satisfactory.
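The following sketch assembles the training loss just described (Cross Entropy Loss plus Dice Loss, with Dice Loss = 1 - Dice) and the freeze/thaw switch for the backbone. It assumes a two-class output (M region vs. background) and a model with a `backbone` attribute, neither of which is specified in the patent:

```python
import torch
import torch.nn.functional as F
from torch import nn

def dice_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Dice Loss = 1 - 2|X ∩ Y| / (|X| + |Y|), on the M-region probabilities."""
    prob = pred.softmax(dim=1)[:, 1]          # assumed: channel 1 is the M region
    inter = (prob * target).sum()
    return 1.0 - (2.0 * inter + eps) / (prob.sum() + target.sum() + eps)

def training_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """LOSS = Cross Entropy Loss + Dice Loss, as described above."""
    return F.cross_entropy(pred, target) + dice_loss(pred, target.float())

def set_backbone_frozen(model: nn.Module, frozen: bool) -> None:
    """Freezing stage: backbone fixed, only the head is fine-tuned;
    thawing stage: all network parameters are updated."""
    for p in model.backbone.parameters():
        p.requires_grad = not frozen
```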
Image processing:
For the trained model, the input image is first resized. This processing can distort the image; to ensure that the image is not distorted during resizing, gray bars are padded into the vacant regions of the image. A comparison of the distorted image and the image padded with gray bars is shown in FIG. 4. The image with the added gray bars must then be normalized, its channels moved to the first dimension, and a batch-size dimension added. Because the prediction result includes the gray-bar region, the gray bars must be cropped off, and seg-img is used to create a new image the same size as the original.
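A sketch of this pre- and post-processing: pad the resized image with gray bars to avoid distortion, normalize, move the channels to the first dimension, add the batch-size dimension, and crop the gray bars back out of the prediction. The 512×512 target size and the gray value 128 are assumptions:

```python
import numpy as np
from PIL import Image

def letterbox(img: Image.Image, size: int = 512):
    """Resize while keeping the aspect ratio; fill the vacancy with gray bars."""
    w, h = img.size
    scale = min(size / w, size / h)
    nw, nh = int(w * scale), int(h * scale)
    canvas = Image.new("RGB", (size, size), (128, 128, 128))   # gray bars
    offset = ((size - nw) // 2, (size - nh) // 2)
    canvas.paste(img.resize((nw, nh), Image.BILINEAR), offset)
    return canvas, offset, (nw, nh)

def to_batch(img: Image.Image) -> np.ndarray:
    """Normalize, move channels to the first dimension, add the batch-size dimension."""
    x = np.asarray(img, dtype=np.float32) / 255.0   # normalization
    x = x.transpose(2, 0, 1)                        # channels first
    return x[None, ...]                             # batch dimension

def crop_gray_bars(pred: np.ndarray, offset, content_size) -> np.ndarray:
    """Cut the gray-bar region back off an HxW prediction map."""
    (ox, oy), (nw, nh) = offset, content_size
    return pred[oy:oy + nh, ox:ox + nw]
```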
Evaluation indexes:
To evaluate the segmentation effect on the M region, Accuracy, Recall and Precision are used as evaluation indexes; the calculation formulas are as follows:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Recall = TP / (TP + FN)
Precision = TP / (TP + FP)
where TP (true positive) denotes a positive class predicted as positive, i.e. a true positive; FP (false positive) denotes a negative class predicted as positive, i.e. a false positive; TN (true negative) denotes a negative class predicted as negative, i.e. a true negative; FN (false negative) denotes a positive class predicted as negative, i.e. a false negative. The segmentation results of the images were judged and classified by a professional pharyngeal swab nucleic acid sampling doctor to determine the corresponding values.
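The three indexes computed from the confusion counts; as a usage example, the counts reported below (169 TP, 4 FP, 19 TN, 13 FN) are plugged in:

```python
def segmentation_metrics(tp: int, fp: int, tn: int, fn: int):
    """Accuracy, Recall and Precision from the confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    return accuracy, recall, precision

# Usage with the counts reported below (169 TP, 4 FP, 19 TN, 13 FN):
acc, rec, prec = segmentation_metrics(tp=169, fp=4, tn=19, fn=13)
print(f"Accuracy={acc:.2%}  Recall={rec:.2%}  Precision={prec:.2%}")
# Recall -> 92.86%, Precision -> 97.69%
```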
For further test analysis of the model, 205 new oral cavity images were collected as a test set, comprising 182 standard oral cavity images and 23 non-standard oral cavity images; a professional pharyngeal swab nucleic acid sampling doctor judged whether each prediction met the standard. The effect of predicting the M region and the background region for a standard oral cavity is shown in FIG. 6; for a non-standard oral cavity, in FIG. 7. The detection classification results are shown in Table 1. The model's classification results are 169 TP, 4 FP, 19 TN and 13 FN, and the calculated Accuracy, Recall and Precision are 92.12%, 92.86% and 97.69%, respectively.
          N          P
F        13 (FN)    4 (FP)
T        19 (TN)  169 (TP)
TABLE 1. Classification results
Experimental comparison: to verify the segmentation effect of the Deeplab V3+ model, U-Net and the MobileNetV2 and Xception networks of the Deeplab V3+ model are selected for comparison; the Accuracy, Recall and Precision values of the different networks on the test set are shown in Table 2:
              Accuracy   Recall    Precision
U-Net         75.45%     79.56%    89.67%
MobileNetV2   77.07%     81.87%    91.41%
Xception      92.12%     92.86%    97.69%
TABLE 2
The segmentation results of MobileNetV2 and Xception are shown in FIG. 5. In the first column, the first two rows are pharyngeal swab images containing the M region and the last two rows are pharyngeal swab images without the M region; in the second column, the first two rows show M-region contours labeled by doctors, while the non-standard acquisition regions in the last two rows are unlabeled; the third and fourth columns show the M region and background region segmented by MobileNetV2 and Xception, respectively. The experimental results show that the improved Xception effectively distinguishes the M region from the background region and predicts the M-region contour more accurately.
The working principle is as follows: according to the depth separable convolution, a Deeplab V3+ model is provided, the expansion convolution with different expansion rates is adopted for feature extraction, the receptive field of the network is improved, the network has different feature receptive conditions, and therefore M areas with discontinuous or fuzzy pharyngeal swab image boundaries can be effectively segmented, and the segmentation progress is higher than that of other networks; the Deeplab V3+ model adopts an Xception network, the Xception is completely decoupled into depth separable convolution, and the mapping of cross channel correlation and spatial correlation in the convolutional neural network feature mapping can be completely decoupled.
This embodiment merely explains the present invention and does not limit it; those skilled in the art may, after reading this specification, modify the embodiment as needed without making an inventive contribution, and all such modifications are protected by patent law within the scope of the claims of the present invention.

Claims (2)

1. A pharyngeal swab oral cavity nucleic acid sampling area image identification method, characterized in that the method specifically comprises the following steps:
S1: acquiring oral cavity images of people of different ages with a robot in different environments, and using the acquired images as a training set, a verification set and a test set;
S2: training a Deeplab V3+ network model on the acquired images;
S3: carrying out verification analysis on the trained oral cavity M-region segmentation model;
the specific steps for training the Deeplab V3+ model in S2 are as follows:
1) dividing the training set, the verification set and the test set in the ratio 8:1:1; inputting the image into the Encoder first, and obtaining, after processing by a deep convolutional neural network (DCNN), two feature layers: a shallow effective feature layer and a deep effective feature layer;
2) after a 1×1 convolution, passing the shallow effective feature layer into the Decoder and stacking it with the result of 4× up-sampling of the high-semantic-information feature layer;
3) after a 3×3 convolution of the stacked feature layers, up-sampling by a factor of four to obtain the final effective feature layer, i.e. a condensed feature representation of the whole picture;
4) using resize to make the height and width of the final output layer the same as the size of the original picture;
the specific steps in step 2) for obtaining the high-semantic-information feature layer are as follows:
(1) using the ASPP structure, applying to the deep effective feature layer obtained in step 1) a 1×1 convolution, 3×3 convolutions with dilation rates of 6, 12 and 18 respectively, and image pooling, to obtain 5 effective feature layers;
(2) stacking the 5 effective feature layers and adjusting the number of channels with a 1×1 convolution to obtain the high-semantic-information feature layer.
2. The method according to claim 1, characterized in that: the backbone network of the Deeplab V3+ model is Xception, and Xception is an extreme version of Inception.
CN202210563500.0A 2022-05-23 2022-05-23 Pharynx swab oral cavity nucleic acid sampling area image identification method Pending CN114998230A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210563500.0A CN114998230A (en) 2022-05-23 2022-05-23 Pharynx swab oral cavity nucleic acid sampling area image identification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210563500.0A CN114998230A (en) 2022-05-23 2022-05-23 Pharynx swab oral cavity nucleic acid sampling area image identification method

Publications (1)

Publication Number Publication Date
CN114998230A true CN114998230A (en) 2022-09-02

Family

ID=83027697

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210563500.0A Pending CN114998230A (en) 2022-05-23 2022-05-23 Pharynx swab oral cavity nucleic acid sampling area image identification method

Country Status (1)

Country Link
CN (1) CN114998230A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116109982A (en) * 2023-02-16 2023-05-12 哈尔滨星云智造科技有限公司 Biological sample collection validity checking method based on artificial intelligence
CN116129112A (en) * 2022-12-28 2023-05-16 深圳市人工智能与机器人研究院 Oral cavity three-dimensional point cloud segmentation method of nucleic acid detection robot and robot

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112489019A (en) * 2020-12-01 2021-03-12 合肥工业大学 Method for rapidly identifying chopped fibers in GFRC image based on deep learning
CN112508977A (en) * 2020-12-29 2021-03-16 天津科技大学 Deep learning-based semantic segmentation method for automatic driving scene

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112489019A (en) * 2020-12-01 2021-03-12 合肥工业大学 Method for rapidly identifying chopped fibers in GFRC image based on deep learning
CN112508977A (en) * 2020-12-29 2021-03-16 天津科技大学 Deep learning-based semantic segmentation method for automatic driving scene

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116129112A (en) * 2022-12-28 2023-05-16 深圳市人工智能与机器人研究院 Oral cavity three-dimensional point cloud segmentation method of nucleic acid detection robot and robot
CN116109982A (en) * 2023-02-16 2023-05-12 哈尔滨星云智造科技有限公司 Biological sample collection validity checking method based on artificial intelligence

Similar Documents

Publication Publication Date Title
CN114998230A (en) Pharynx swab oral cavity nucleic acid sampling area image identification method
CN111985536B (en) Based on weak supervised learning gastroscopic pathology image Classification method
CN102096819B (en) Method for segmenting images by utilizing sparse representation and dictionary learning
CN110021425B (en) Comparison detector, construction method thereof and cervical cancer cell detection method
CN108596038B (en) Method for identifying red blood cells in excrement by combining morphological segmentation and neural network
CN110097974A (en) A kind of nasopharyngeal carcinoma far-end transfer forecasting system based on deep learning algorithm
CN111524137A (en) Cell identification counting method and device based on image identification and computer equipment
CN110111895A (en) A kind of method for building up of nasopharyngeal carcinoma far-end transfer prediction model
CN113069080A (en) Difficult airway assessment method and device based on artificial intelligence
CN115984850A (en) Lightweight remote sensing image semantic segmentation method based on improved Deeplabv3+
CN114782753A (en) Lung cancer histopathology full-section classification method based on weak supervision learning and converter
CN114201632A (en) Label noisy data set amplification method for multi-label target detection task
CN112085742B (en) NAFLD ultrasonic video diagnosis method based on context attention
CN114140437A (en) Fundus hard exudate segmentation method based on deep learning
CN112927215A (en) Automatic analysis method for digestive tract biopsy pathological section
CN115690704B (en) LG-CenterNet model-based complex road scene target detection method and device
CN115908421A (en) Active learning medical image segmentation method based on superpixels and diversity
CN116563205A (en) Wheat spike counting detection method based on small target detection and improved YOLOv5
CN111259914B (en) Hyperspectral extraction method for characteristic information of tea leaves
CN114565762A (en) Weakly supervised liver tumor segmentation based on ROI and split fusion strategy
CN111783571A (en) Cervical cell automatic classification model establishment and cervical cell automatic classification method
CN110992309A (en) Fundus image segmentation method based on deep information transfer network
CN116386857B (en) Pathological analysis system and method
CN115841847B (en) Microorganism information determination and extraction system and method
CN116758068B (en) Marrow picture cell morphology analysis method based on artificial intelligence

Legal Events

Code  Description
PB01  Publication
SE01  Entry into force of request for substantive examination
RJ01  Rejection of invention patent application after publication (application publication date: 20220902)