CN113113119A - Training method of semantic segmentation network, image processing method and equipment thereof - Google Patents
- Publication number: CN113113119A
- Application number: CN202110309167.6A
- Authority
- CN
- China
- Prior art keywords
- training
- label data
- image
- network
- loss function
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H30/00—ICT specially adapted for the handling or processing of medical images
- G16H30/40—ICT specially adapted for the handling or processing of medical images for processing medical images, e.g. editing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
Abstract
The present application provides a training method for a semantic segmentation network, an image processing method, a terminal device, and a computer-readable storage medium. The semantic segmentation network comprises an encoding network and a decoding network; the decoding network comprises a main decoding network and an auxiliary decoding network, and the auxiliary decoding network comprises a disturbance setting and a decoder. The training method comprises the following steps: inputting a training image set comprising a plurality of training images into the encoding network to obtain intermediate representations of the training images; inputting the intermediate representations into the main decoding network to obtain first label data of the training images; inputting the intermediate representations into the disturbance setting and the decoder to obtain second label data of the training images; and obtaining a first loss function using the first label data and the second label data of the training images, and training the semantic segmentation network based on the first loss function. The technical solution provided by the present application helps improve the accuracy and robustness of the network.
Description
Technical Field
The present application relates to the field of image processing technologies, and in particular to a training method for a semantic segmentation network, an image processing method, a terminal device, and a computer-readable storage medium.
Background
In the field of medical image processing, machine learning has found a wide variety of applications, particularly in endoscopic and microscopic surgical image processing. Because the operable space inside the body is small, endoscopic and microscopic surgery can only be observed through a very limited field of view, and the surgical environment is complex: smoke, blood, specular reflections, and the like all degrade this limited view. Semantic segmentation therefore has to meet high accuracy requirements, and the existing deep learning network DeepLabV3+ cannot satisfy the image processing accuracy required by endoscopic and microscopic surgery.
Disclosure of Invention
The application provides a training method of a semantic segmentation network, an image processing method, a terminal device and a computer readable storage medium.
The first technical scheme adopted by the present application is to provide a training method for a semantic segmentation network. The semantic segmentation network comprises an encoding network and a decoding network; the decoding network comprises a main decoding network and an auxiliary decoding network, and the auxiliary decoding network comprises a disturbance setting and a decoder. The training method comprises the following steps: inputting a training image set comprising a plurality of training images into the encoding network to obtain intermediate representations of the training images; inputting the intermediate representations into the main decoding network to obtain first label data of the training images; inputting the intermediate representations into the disturbance setting and the decoder to obtain second label data of the training images; and obtaining a first loss function using the first label data and the second label data of the training images, and training the semantic segmentation network based on the first loss function.
Optionally, the perturbation setting connects decoders in the encoding network and the secondary decoding network; inputting the intermediate representation into a disturbance setting and a decoder to obtain second label data of the training image, wherein the second label data comprises: and inputting the intermediate representation into disturbance setting to generate a disturbance version of the intermediate representation, and inputting the disturbance version into a decoder to obtain second label data of the training image.
Optionally, the obtaining a first loss function using the first label data and the second label data of the training image includes: calculating an error between first label data and second label data of the training image by using a mean square error loss function; an average of errors between the first label data and the second label data for all training images is calculated as a first loss function.
Optionally, the number of the secondary decoding networks is multiple, and each secondary decoding network includes a corresponding disturbance setting; calculating an error between the first label data and the second label data of the training image using a mean square error loss function, comprising: acquiring second label data of the training image output by each auxiliary decoding network; calculating an error between first label data of the training image and second label data corresponding to each auxiliary decoding network by using a mean square error loss function; and calculating the average number of errors between the first label data of the training image and the second label data corresponding to all the auxiliary decoding networks as the errors between the first label data and the second label data of the training image.
Optionally, the disturbance setting includes any one or more of F-Noise, F-Drop, Guided Masking, Intermediate VAT, and spatialDropout.
Optionally, the training image set comprises a first image and a second image with third label data; before inputting the training image set into the coding network, the method comprises the following steps: training a semantic segmentation network by using the second image; training a semantic segmentation network based on a first loss function, comprising: acquiring a second loss function by using the first label data and the third label data of the second image; forming a third loss function by using the second loss function and the first loss function; the semantic segmentation network is trained with the goal of reducing the third loss function.
Optionally, obtaining a second loss function using the first label data and the third label data of the second image includes: calculating an error between the first label data and the third label data of the second image using a cross entropy loss function; an average of errors between the first label data and the third label data of all the second images is calculated as a second loss function.
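As an illustrative sketch only (not the patent's actual implementation), the averaging of cross-entropy errors described above might look as follows in NumPy, assuming the first label data are per-pixel class probabilities and the third label data are one-hot maps:

```python
import numpy as np

def cross_entropy(pred, onehot, eps=1e-12):
    # per-pixel cross-entropy between predicted class probabilities
    # (H, W, C) and one-hot third label data (H, W, C)
    return -np.mean(np.sum(onehot * np.log(pred + eps), axis=-1))

def second_loss(first_label_data, third_label_data):
    # average of the per-image errors over all labeled (second) images
    errs = [cross_entropy(p, y)
            for p, y in zip(first_label_data, third_label_data)]
    return float(np.mean(errs))
```

The `(H, W, C)` layout and the list-of-arrays interface are assumptions for this sketch.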
Optionally, before inputting the training image set into the encoding network, the method includes: and preprocessing the training image set by using a Poisson image editing algorithm.
The second technical scheme adopted by the present application is as follows: an image processing method is provided, including: acquiring an intra-operative image; processing the surgical image using a semantic segmentation network, where the semantic segmentation network is trained by any one of the training methods described above; and obtaining position information of a surgical instrument used in the operation based on the processing result of the semantic segmentation network on the surgical image.
The third technical scheme adopted by the application is as follows: a terminal device is provided, the terminal device comprising a processor and a memory; the memory has stored therein a computer program for execution by the processor to implement the steps of the training method described above and/or the image processing method described above.
The fourth technical scheme adopted by the application is as follows: a computer storage medium is provided, which stores a computer program that, when executed, implements the steps of the training method and/or the image processing method described above.
The beneficial effects of the present application are as follows. In the present application, the semantic segmentation network comprises an encoding network and a decoding network; the decoding network comprises a main decoding network and an auxiliary decoding network, and the auxiliary decoding network comprises a disturbance setting and a decoder. The training method comprises: inputting a training image set comprising a plurality of training images into the encoding network to obtain intermediate representations of the training images; inputting the intermediate representations into the main decoding network to obtain first label data of the training images; inputting the intermediate representations into the disturbance setting and the decoder to obtain second label data of the training images; and obtaining a first loss function using the first label data and the second label data of the training images, and training the semantic segmentation network based on the first loss function. By introducing disturbance settings into the auxiliary decoding network and improving the original deep learning network DeepLabV3+ based on the principle of domain adaptation, the method improves the accuracy and robustness of the semantic segmentation network.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a schematic flow chart diagram illustrating an embodiment of a training method for semantic segmentation networks provided in the present application;
FIG. 2 is a schematic diagram of a semantic segmentation network in the training method shown in FIG. 1;
FIG. 3 is another schematic flow chart diagram of the training method shown in FIG. 1;
FIG. 4 is another schematic flow chart diagram of the training method shown in FIG. 1;
FIG. 5 is a schematic flow chart of S30 in the training method shown in FIG. 1;
FIG. 6 is a schematic diagram of one embodiment of a perturbation setting of the present application;
FIG. 7 is a schematic diagram of another embodiment of a perturbation setting of the present application;
FIG. 8 is a schematic flow chart of S40 in the training method shown in FIG. 1;
FIG. 9 is a schematic flow chart diagram illustrating one embodiment of S41 of FIG. 8;
FIG. 10 is a schematic flow chart diagram illustrating one embodiment of S43 of FIG. 8;
FIG. 11 is a flowchart illustrating an embodiment of an image processing method of the present application;
FIG. 12 is a schematic structural diagram of an embodiment of a terminal device according to the present application;
FIG. 13 is a schematic structural diagram of an embodiment of a computer-readable storage medium of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be noted that the following examples are only illustrative of the present application, and do not limit the scope of the present application. Likewise, the following examples are only some examples and not all examples of the present application, and all other examples obtained by a person of ordinary skill in the art without any inventive work are within the scope of the present application.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
The terms "first", "second" and "third" in this application are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any indication of the number of technical features indicated. Thus, a feature defined as "first," "second," or "third" may explicitly or implicitly include at least one of the feature. In the description of the present application, "plurality" means at least two, e.g., two, three, etc., unless explicitly specifically limited otherwise. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Referring to fig. 1 and fig. 2, fig. 1 is a schematic flowchart of an embodiment of a training method for a semantic segmentation network provided in the present application, and fig. 2 is a schematic diagram of the semantic segmentation network in the training method shown in fig. 1.
As shown in fig. 2, the semantic segmentation network is an end-to-end neural network, and may specifically include an encoding network e and a decoding network. The decoding network comprises a main decoding network d and K auxiliary decoding networks, where K is a natural number; the k-th auxiliary decoding network consists of a decoder d_a^k and a corresponding disturbance setting p_k.
As shown in FIG. 2, in some embodiments, the decoding network may include multiple auxiliary decoding networks, and each auxiliary decoding network may include a corresponding perturbation setting p_k and decoder d_a^k. For example, the number of auxiliary decoding networks may be 2, 3, 4, or 5; this is not limited in the present application and can be selected by those skilled in the art according to actual needs.
Of course, in some embodiments, the number of auxiliary decoding networks may be one. To a certain extent, increasing the number of auxiliary decoding networks helps improve the accuracy and robustness of the semantic segmentation network.
The training method of the semantic segmentation network is applied to a terminal device, where the terminal device may be a server, a mobile device, or a system in which a server and a mobile device cooperate with each other. Accordingly, the parts included in the terminal device, such as units, sub-units, modules, and sub-modules, may all be disposed in the server, may all be disposed in the mobile device, or may be disposed in the server and the mobile device, respectively.
Further, the server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or may be implemented as a single server. When the server is software, it may be implemented as a plurality of software or software modules, for example, software or software modules for providing distributed servers, or as a single software or software module, and is not limited herein.
As shown in fig. 1, the training method for semantic segmentation networks provided in the embodiment of the present application specifically includes the following steps:
s10: and inputting a training image set into the coding network, wherein the training image set comprises a plurality of training images to obtain the intermediate representation of the training images.
Specifically, a video camera or a video recorder may be used to capture images during a surgical procedure to obtain a plurality of training images, forming a training image set D = {x_1, x_2, …, x_i, …, x_m}, where m denotes that the training image set D includes m training images and x_i denotes the i-th training image. As shown in FIG. 2, a training image x_i is input into the encoding network e to obtain the intermediate representation z_i = e(x_i) of the training image x_i.
Referring to fig. 3, fig. 3 is another flow chart of the training method shown in fig. 1, and in some embodiments, to further improve the accuracy of the semantic segmentation network, before S10, the method may further include:
s01: the Poisson image editing algorithm is utilized to preprocess the training image set so as to remove the highlight part in the training image and avoid the influence of point light sources on image segmentation.
Specifically, the highlight region Ω in the training image may be extracted by threshold processing, and the highlight in the training image may then be removed by solving the following formulas:

g(x) = (I − G_δ ∗ I)(x)   (1)

f(x) = I(x), x ∈ ∂Ω   (3)

where x denotes a pixel in the highlight region Ω, I denotes the original image, G_δ denotes Gaussian filtering of the image, and f denotes the image with the highlight removed.
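A minimal NumPy sketch of this preprocessing idea is shown below. It covers only the threshold-based extraction of the highlight region Ω and the residual g of formula (1); the Poisson solve itself is omitted, and the kernel radius, σ, and threshold values are illustrative assumptions, not values from the patent:

```python
import numpy as np

def gaussian_blur(img, sigma=1.0, radius=2):
    # separable Gaussian filter G_delta applied to a 2-D image
    ax = np.arange(-radius, radius + 1)
    k = np.exp(-(ax ** 2) / (2 * sigma ** 2))
    k /= k.sum()
    rows = np.apply_along_axis(np.convolve, 1, img, k, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="same")

def highlight_mask(img, thresh):
    # threshold processing to extract the highlight region Omega
    return img > thresh

def guidance_field(img, sigma=1.0):
    # Eq. (1): g(x) = (I - G_delta * I)(x), the high-frequency residual
    # that would guide the Poisson solve inside Omega
    return img - gaussian_blur(img, sigma)
```

In a full pipeline, `guidance_field` would supply the right-hand side of the Poisson equation solved inside the masked region.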
In some embodiments, the training image set D may include both the first image without the label and the second image with the label, at which time the semantic segmentation network is trained as a semi-supervised learning network.
In particular, the training image set D may comprise a first image set D_1 and a second image set D_2, where the second image set may be represented as D_2 = {x_1, x_2, …, x_j, …, x_n}, n denotes that the second image set D_2 comprises n labeled second images, and x_j denotes the j-th labeled second image. Accordingly, the first image set D_1 may include m − n unlabeled first images. The labels of the second image set D_2 may be generated by manual annotation.
Referring to fig. 4, fig. 4 is another schematic flow chart of the training method shown in fig. 1, when the training image set D includes both the first image without the label and the second image with the label, before S10, the method may further include:
s02: and training the semantic segmentation network by using the second image.
In particular, the encoding network e and the main decoding network d may first be trained using the second images x_j, and the auxiliary decoding networks may then be trained using the prediction consistency between the main decoding network d and the auxiliary decoding networks.
As shown in fig. 4, both S01 and S02 may be performed before S10, with S01 preceding S02. In some embodiments, only S02 (and not S01) may be performed before S10; this is not limited in the present application, and those skilled in the art can choose according to actual needs. It should be noted that, in the present application, the sequence numbers of the steps do not represent the actual execution order of steps that have no inherent precedence relationship; for example, in some embodiments, S02 may precede S01.
In some embodiments, the training image set D may further include m training images without labels, and at this time, the semantic segmentation network is trained as an unsupervised learning network.
Generally speaking, a fully supervised learning network has high accuracy, but it requires a large amount of manually labeled training data, which is difficult to acquire and consumes considerable manpower; it also has poor generalization capability and lacks flexibility.
Compared with the fully supervised learning network, the unsupervised learning network has better generalization capability, but the accuracy needs to be further improved.
Compared with an unsupervised learning network, the semi-supervised learning network has higher accuracy, only a small amount of artificially labeled training data is needed, and the acquisition difficulty of the training data is reduced. In addition, semi-supervised deep learning also has better generalization capability than fully supervised deep learning.
S20: and inputting the intermediate representation into a main decoding network to obtain first label data of the training image.
As shown in FIG. 2, the intermediate representation z_i = e(x_i) of a training image x_i is input into the main decoding network d to obtain the first label data d(z_i) of the training image x_i output by the main decoding network d.
S30: and inputting the intermediate representation into a disturbance setting and a decoder to obtain second label data of the training image.
As previously described, in this embodiment the decoding network may include a plurality of auxiliary decoding networks, and each auxiliary decoding network can correspondingly output one piece of second label data d_a^k(z_i^k) for a training image x_i. With K auxiliary decoding networks, K pieces of second label data are output for each training image x_i.
As shown in fig. 2, in this embodiment, the disturbance setting p_k in the auxiliary decoding network connects the encoding network e and the decoder d_a^k. As shown in fig. 5, fig. 5 is a schematic flow chart of S30 in the training method shown in fig. 1, and S30 may specifically include:
s31: the intermediate representation is input to the perturbation setting, generating a perturbed version of the intermediate representation.
As shown in FIG. 2, the intermediate representation z_i = e(x_i) of a training image x_i is input into the disturbance setting p_k of the auxiliary decoding network to obtain the perturbed version z_i^k of the intermediate representation.
S32: and inputting the disturbed version into a decoder to obtain second label data of the training image.
As shown in fig. 2, the perturbed version z_i^k of the intermediate representation z_i = e(x_i) of the training image x_i is input into the decoder d_a^k of the auxiliary decoding network to obtain the second label data d_a^k(z_i^k) of the training image.
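As a toy illustration of the data flow of S10, S20, S31, and S32 (the encoder, decoders, and perturbation below are placeholder functions, not the networks described in the patent), the primary and auxiliary decoding paths can be sketched as:

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(x):
    # stand-in for the encoding network e: x_i -> z_i
    return 0.5 * x

def main_decoder(z):
    # stand-in for the main decoding network d: z_i -> d(z_i)
    return 2.0 * z

def perturb(z):
    # stand-in disturbance setting p_k: inject small uniform noise
    return z + rng.uniform(-0.1, 0.1, size=z.shape)

def aux_decoder(z_perturbed):
    # stand-in decoder d_a^k of the k-th auxiliary decoding network
    return 2.0 * z_perturbed

x_i = np.ones((4, 4))                     # a training image
z_i = encoder(x_i)                        # S10: intermediate representation
first_label = main_decoder(z_i)           # S20: d(z_i)
second_label = aux_decoder(perturb(z_i))  # S31/S32: d_a^k(p_k(z_i))
```

Because the perturbation is small, the auxiliary output stays close to the main output, which is exactly what the consistency loss of S40 exploits.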
In some other embodiments, the decoder d_a^k of the auxiliary decoding network may instead connect the encoding network e and the disturbance setting p_k; this is not a limitation of the present application, and those skilled in the art can choose freely depending on the actual situation.

Experimental results show that, compared with the scheme in which the decoder d_a^k connects the encoding network e and the disturbance setting p_k, the scheme in which the disturbance setting p_k connects the encoding network e and the decoder d_a^k is more beneficial to improving the accuracy and robustness of the semantic segmentation network.
As described above, in the present embodiment, each auxiliary decoding network may include a corresponding perturbation setting p_k and decoder d_a^k, where p_k may include any one or more of F-Noise, F-Drop, Guided Masking, Intermediate VAT, and SpatialDropout.
Next, these disturbance settings will be described in detail.
F-Noise: FIG. 6 is a schematic diagram of an embodiment of the disturbance setting of the present application. A noise tensor N ∼ U(−0.3, 0.3) with the same shape as the intermediate representation z_i is uniformly sampled; its range is adjusted by multiplying it by z_i, and the noise is injected into the encoder output to obtain the perturbed version z_i^k = z_i + z_i ⊙ N. The injected noise is thus proportional to z_i, as shown in FIG. 6.
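A short NumPy sketch of F-Noise as just described; the proportional-injection form z̃ = z + z ⊙ N is an interpretation of the description, not code from the patent:

```python
import numpy as np

def f_noise(z, rng=None):
    # F-Noise: sample a noise tensor N ~ U(-0.3, 0.3) with the same shape
    # as the intermediate representation z, and inject noise proportional
    # to z:  z_tilde = z + z * N
    rng = np.random.default_rng(0) if rng is None else rng
    n = rng.uniform(-0.3, 0.3, size=z.shape)
    return z + z * n
```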
F-Drop: as shown in fig. 7, fig. 7 is a schematic diagram of another embodiment of the disturbance setting of the present application, and the threshold γ to U (0.6, 0.9) is uniformly sampled first. Summing and normalizing feature maps z in channel dimensionsiTo obtain zi' later, we generate a mask Mdrop={zi’<γ } which is then used to obtain the perturbed version zi k=zi⊙MdropThus, we can mask 10% to 40% of the most active region in the feature map.
Guided Masking: context-related objects can be more quickly located and identified in familiar environments, and the information constituting elements of a scene can be contextThe inference provides a very important influencing factor. Creating z using a mask context (Con-Msk)iTo apply them to the intermediate representation ziTo obtain a perturbed version zi k。
Intermediate VAT (I-VAT): A slight perturbation of the input data affects the model output, and the training result needs to be smooth in order to be stable. Therefore, a VAT function is used to perturb z_i: for a given auxiliary decoder, the adversarial perturbation that most affects the prediction result is found, and this noise is injected into the intermediate representation z_i to obtain the perturbed version z_i^k.
SpatialDropout (Dropout): applied as a random perturbation in this network.
S40: and acquiring a first loss function by using the first label data and the second label data of the training image, and training the semantic segmentation network based on the first loss function.
As shown in fig. 8, fig. 8 is a flowchart of S40 in the training method shown in fig. 1, and the obtaining the first loss function by using the first label data and the second label data of the training image may include:
s41: an error between the first label data and the second label data of the training image is calculated using a mean square error loss function.
In some embodiments, other loss functions may also be used to calculate the error between the first label data and the second label data of each training image, for example, a cross entropy loss function, a mean absolute value error loss function, a Huber loss function, etc., which is not limited in this application and can be selected by one skilled in the art according to actual needs.
As described above, in the semantic segmentation network the decoding network includes a plurality of auxiliary decoding networks; each auxiliary decoding network correspondingly outputs one piece of second label data d_a^k(z_i^k) for a training image x_i, so the K auxiliary decoding networks correspondingly output K pieces of second label data.
Referring to fig. 9, fig. 9 is a schematic flowchart of an embodiment of S41 in fig. 8, where S41 may specifically include:
s411: and acquiring second label data of the training image output by each auxiliary decoding network.
S412: and calculating the error between the first label data of the training image and the second label data corresponding to each auxiliary decoding network by using a mean square error loss function.
In some embodiments, other loss functions may also be used to calculate the error between the first label data and each second label data of the training image, such as a cross entropy loss function, a mean absolute value error loss function, a Huber loss function, and so on, which is not limited in this application and can be selected by one skilled in the art according to actual needs.
S413: and calculating the average number of errors between the first label data of the training image and the second label data corresponding to all the auxiliary decoding networks as the errors between the first label data and the second label data of the training image.
That is, the errors between the first label data of the training image and the second label data corresponding to each auxiliary decoding network are calculated and summed, and the sum is divided by the number K of auxiliary decoding networks to obtain the error between the first label data and the second label data of the training image.
S42: an average of errors between the first label data and the second label data for all training images is calculated as a first loss function.
The training image set comprises a plurality of training images, errors between first label data and second label data of each training image are calculated, summed, and divided by the number of the training images, so that a first loss function can be obtained.
The first loss function may be expressed as:

L_u = (1/m) · Σ_{x_i ∈ D} (1/K) · Σ_{k=1}^{K} MSE(d(z_i), d_k^a(z_i))

wherein L_u represents the first loss function, D represents the training image set, m represents the number of training images included in the training image set, K represents the number of auxiliary decoding networks, x_i represents the i-th training image, d(z_i) represents the first label data of the i-th training image, d_k^a(z_i) represents the second label data of the i-th training image output by the k-th auxiliary decoding network, and MSE represents the mean square error function.
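For illustration only, the first loss function may be sketched in NumPy as follows; the function and variable names are hypothetical, and the label data are represented as small arrays rather than full segmentation maps:

```python
import numpy as np

def mse(a, b):
    # Mean square error between two label-data arrays.
    return float(np.mean((a - b) ** 2))

def first_loss(first_labels, second_labels_per_image):
    # L_u: average over the m training images of the average over the K
    # auxiliary decoders of MSE(d(z_i), d_k^a(z_i)).
    m = len(first_labels)
    total = 0.0
    for d_zi, aux_outputs in zip(first_labels, second_labels_per_image):
        k = len(aux_outputs)
        total += sum(mse(d_zi, a) for a in aux_outputs) / k
    return total / m

# Toy check: m = 2 images, K = 2 auxiliary decoders per image
firsts = [np.array([1.0, 1.0]), np.array([0.0, 0.0])]
seconds = [[np.array([1.0, 1.0]), np.array([1.0, 0.0])],  # errors 0.0 and 0.5
           [np.array([0.0, 0.0]), np.array([0.0, 0.0])]]  # errors 0.0 and 0.0
# per-image errors: 0.25 and 0.0, so first_loss(firsts, seconds) = 0.125
```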
As previously mentioned, the training image set D includes both first images without labels and second images with labels; the label carried by a second image x_j can be recorded as third label data y_j. At this point, continuing to refer to fig. 8, training the semantic segmentation network based on the first loss function may include:
s43: a second loss function is obtained using the first label data and the third label data of the second image.
Specifically, as shown in fig. 10, fig. 10 is a schematic flowchart of an embodiment of S43 in fig. 8, where the step may include:
S431: calculating an error between the first label data and the third label data of the second image using a cross entropy loss function.
In some embodiments, other loss functions may also be used to calculate the error between the first label data and the third label data of the second image, such as a mean square error loss function, a mean absolute value error loss function, a Huber loss function, and the like, which is not limited in this application and can be selected by one skilled in the art according to practical needs.
S432: an average of errors between the first label data and the third label data of all the second images is calculated as a second loss function.
The second loss function is obtained by calculating the error between the first label data and the third label data of each second image, summing the errors, and dividing the sum by the number of second images.
The second loss function may be expressed as:

L_s = (1/n) · Σ_{x_j ∈ D_2} CE(y_j, d(z_j))

wherein L_s represents the second loss function, n represents the number of second images in the second image set D_2, y_j represents the third label data of the j-th second image, d(z_j) represents the first label data of the j-th second image, and CE represents the cross entropy function.
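For illustration only, the second (supervised) loss function may be sketched as follows; the names and the one-pixel toy example are assumptions made for readability:

```python
import numpy as np

def cross_entropy(y_true, p_pred, eps=1e-12):
    # Cross entropy between one-hot third label data y_j and the predicted
    # probabilities d(z_j) (first label data) for one second image.
    p = np.clip(p_pred, eps, 1.0)
    return float(-np.sum(y_true * np.log(p)) / y_true.shape[0])

def second_loss(third_labels, first_labels):
    # L_s: average of the cross-entropy errors over the n labeled second images.
    n = len(third_labels)
    return sum(cross_entropy(y, p) for y, p in zip(third_labels, first_labels)) / n

# Toy check with a single one-pixel, two-class "image"
y = np.array([[1.0, 0.0]])     # third label data (one-hot)
p = np.array([[0.5, 0.5]])     # first label data (predicted probabilities)
# second_loss([y], [p]) = -log(0.5) ≈ 0.6931
```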
S44: a third loss function is formed using the second loss function and the first loss function.
The third loss function may be expressed as:

L = L_s + ω_1 · L_u

wherein ω_1 represents the weight of the first loss function L_u; for example, ω_1 may be 1. The present application does not limit the value of ω_1, which can be selected by those skilled in the art according to actual needs.
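The combination in S44 reduces to a weighted sum, sketched below for illustration (the function name and default weight are assumptions):

```python
def combined_loss(second_loss_value, first_loss_value, omega1=1.0):
    # Third loss: supervised loss L_s plus the weighted consistency loss L_u.
    return second_loss_value + omega1 * first_loss_value

# e.g. combined_loss(0.5, 0.25) gives 0.75 with the default weight omega1 = 1
```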
S45: the semantic segmentation network is trained with the goal of reducing the third loss function.
In some embodiments, S43 may precede S41, or S43 may be performed simultaneously with S41; this is not limited herein and may be selected by those skilled in the art according to actual needs. In the present application, the serial numbers of the steps do not represent the actual execution order of steps that have no inherent precedence relationship.
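The goal of S45, reducing the third loss function by iteratively updating parameters, can be illustrated with a toy gradient-descent sketch. The scalar parameter and quadratic stand-in loss are illustrative assumptions; in practice the loss is L = L_s + ω_1 · L_u and the parameters are the network weights:

```python
def toy_loss(w):
    # Quadratic stand-in for the third loss function, minimized at w = 2.
    return (w - 2.0) ** 2

def toy_grad(w):
    # Its derivative with respect to the parameter w.
    return 2.0 * (w - 2.0)

w, lr = 0.0, 0.1                   # initial parameter, learning rate (as in SGD)
losses = []
for _ in range(50):
    losses.append(toy_loss(w))
    w -= lr * toy_grad(w)          # gradient step toward a lower loss
# The loss decreases toward 0 as w approaches 2.
```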
On one hand, the semantic segmentation network includes an encoding network and a decoding network, the decoding network includes a main decoding network and auxiliary decoding networks, perturbation settings are introduced into the auxiliary decoding networks, and the original deep learning network DeepLabV3+ is improved based on the domain adaptation principle, thereby improving the accuracy and robustness of the semantic segmentation network. On the other hand, the semi-supervised semantic segmentation network obtained by the training method can accurately segment images with little label data; the segmentation results have natural, smooth boundaries that are easy to observe. The network achieves high accuracy and robustness, reduces the number of labels required for training, and has strong generalization ability. It is flexible enough to adapt to various segmentation scenarios and can maintain high segmentation accuracy and robustness even for types of surgical instruments that appear infrequently.
The DeepLabv3+ network based on the domain adaptation principle was implemented on the public CATARACTS Semantic Segmentation 2020 data set and achieved good results; the results have high reliability and can basically meet the safety requirements of the medical image processing field.
Specifically, the CATARACTS Semantic Segmentation 2020 data set was selected, which includes 50 videos of cataract surgery performed at Brest University Hospital between January 22, 2015 and September 10, 2015, totaling over 9 hours of recorded surgical video. The training set contains 4 hours 42 minutes of video and the test set contains 4 hours 24 minutes, providing a sufficient number of samples. The data set has 25 video subsets; the training set, validation set, and test set contain 3550, 534 (video subsets 5, 7, and 16), and 587 (video subsets 2, 12, and 22) images, respectively. The input picture resolution was 512 × 512, and a stochastic gradient descent (SGD) optimizer was used.
The quality of the algorithm's processing results was compared using the Intersection over Union (IoU) metric to obtain the final experimental results. Throughout the process, theoretical and methodological research proceeded synchronously with algorithm implementation and verification, with alternating optimization.
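For illustration only, the Intersection over Union metric used for evaluation may be sketched as follows (the function name and the two-pixel-square toy masks are assumptions):

```python
import numpy as np

def iou(pred_mask, gt_mask):
    # Intersection over Union between predicted and ground-truth binary masks.
    inter = np.logical_and(pred_mask, gt_mask).sum()
    union = np.logical_or(pred_mask, gt_mask).sum()
    return float(inter) / float(union) if union > 0 else 1.0

pred = np.array([[1, 1], [0, 0]], dtype=bool)  # predicted instrument pixels
gt = np.array([[1, 0], [0, 0]], dtype=bool)    # ground-truth instrument pixels
# intersection = 1 pixel, union = 2 pixels, so IoU = 0.5
```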
Referring to fig. 11, fig. 11 is a schematic flowchart illustrating an embodiment of an image processing method according to the present application, the image processing method including:
s201: an intra-surgical image is acquired.
For example, the intra-surgical images may be acquired by a camera or video recorder.
S202: and processing the operation image by utilizing a semantic segmentation network, wherein the semantic segmentation network is obtained by training through the training method.
Specifically, the semantic segmentation network obtained through training by the training method is used for performing semantic segmentation on the operation image.
S203: obtaining the position information of the surgical instrument used in the surgical operation based on the processing result of the semantic segmentation network on the surgical image.
For example, the image processing method can be applied to image processing in cataract surgery and can accurately segment surgical images with little label data. The boundaries of the segmentation results are natural and smooth, easy to observe, and offer high accuracy and robustness, providing a reliable reference for the operator. Of course, the image processing method can also be used in other endoscopic and microscopic surgeries; the application is not limited in this respect, and those skilled in the art can choose according to actual needs.
Referring to fig. 12, fig. 12 is a schematic structural diagram of a terminal device according to an embodiment of the present application. The terminal device 100 includes a processor 10 and a memory 20 coupled to each other; the memory 20 stores a computer program to be executed by the processor 10 to implement the steps of the training method and/or the image processing method described above.
Wherein the processor 10 is used for the operation of the terminal 100, the processor 10 may also be referred to as a CPU (Central Processing Unit). The processor 10 may be an integrated circuit chip having signal processing capabilities. The processor 10 may also be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components. A general purpose processor may be a microprocessor or the processor 10 may be any conventional processor or the like.
The memory 20 may include Random Access Memory (RAM), Read Only Memory (ROM), flash memory, Erasable Programmable Read Only Memory (EPROM), Electrically Erasable Programmable Read Only Memory (EEPROM), registers, a hard disk, a removable disk, a CD-ROM, and so forth. Memory 20 may store program data, which may include a single instruction, or many instructions, for example, and may be distributed over several different code segments, among different programs, and across multiple memories 20. Memory 20 may be coupled to processor 10 such that processor 10 can read information from, and write information to, memory 20. Of course, the memory 20 may be integral to the processor 10.
Referring to fig. 13, fig. 13 is a schematic structural diagram of an embodiment of a computer-readable storage medium 200 of the present application, in which a computer program is stored; when executed, the computer program implements the steps of the training method and/or the image processing method described above.
The technical solution of the present application, or the part thereof that contributes to the prior art, may be substantially embodied in the form of a software product. The software product is stored in a storage device and includes instructions (program data) for causing a computer device (which may be a personal computer, a server, or a network device) or a processor to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage device includes various media that can store program code, such as a USB flash drive, a portable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk, as well as electronic devices having such a storage medium, such as computers, mobile phones, notebook computers, tablet computers, and cameras.
In the several embodiments provided in the present application, it should be understood that the disclosed training method for the semantic segmentation network may be implemented in other ways. For example, the embodiments of the electronic device described above are merely illustrative, and the division of modules or units is only a division by logical function; in actual implementation there may be other division manners, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices, or units, and may be in electrical, mechanical, or other forms.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The above description is only a part of the embodiments of the present application, and not intended to limit the scope of the present application, and all equivalent devices or equivalent processes performed by the content of the present application and the attached drawings, or directly or indirectly applied to other related technical fields, are also included in the scope of the present application.
Claims (11)
1. A training method of a semantic segmentation network, wherein the semantic segmentation network comprises an encoding network and a decoding network, the decoding network comprises a primary decoding network and a secondary decoding network, wherein the secondary decoding network comprises a disturbance setting and a decoder, and the training method comprises the following steps:
inputting a training image set into the coding network, wherein the training image set comprises a plurality of training images, and an intermediate representation of the training images is obtained;
inputting the intermediate representation into the main decoding network to obtain first label data of the training image;
inputting the intermediate representation into the perturbation setting and the decoder to obtain second label data of the training image;
and acquiring a first loss function by using the first label data and the second label data of the training image, and training the semantic segmentation network based on the first loss function.
2. Training method according to claim 1, wherein the perturbation setting connects decoders in the coding network and the secondary decoding network;
the inputting the intermediate representation into the perturbation setting and the decoder to obtain second label data of the training image comprises:
inputting the intermediate representation into the perturbation setting, generating a perturbed version of the intermediate representation,
and inputting the disturbed version into the decoder to obtain second label data of the training image.
3. The training method of claim 1, wherein the obtaining a first loss function using the first label data and the second label data of the training image comprises:
calculating an error between the first label data and the second label data of the training image using a mean square error loss function;
calculating an average of errors between the first label data and the second label data for all of the training images as the first loss function.
4. The training method of claim 3, wherein the number of the secondary decoding networks is plural, each of the secondary decoding networks comprising a corresponding perturbation setting;
the calculating an error between the first label data and the second label data of the training image using a mean square error loss function includes:
acquiring second label data of the training image output by each auxiliary decoding network;
calculating an error between the first label data of the training image and second label data corresponding to each secondary decoding network by using a mean square error loss function;
calculating an average of errors between the first label data of the training image and the second label data corresponding to all the secondary decoding networks as errors between the first label data and the second label data of the training image.
5. The training method according to claim 4, wherein the perturbation settings comprise any one or more of F-Noise, F-Drop, Guided Masking, Intermediate VAT, and Spatial Dropout.
6. Training method according to claim 1, wherein the set of training images comprises a first image and a second image with third label data;
before inputting the training image set into the coding network, the method comprises the following steps:
training the semantic segmentation network with the second image;
the training the semantic segmentation network based on the first loss function includes:
obtaining a second loss function using the first label data and the third label data of the second image;
forming a third loss function using the second loss function and the first loss function;
and training the semantic segmentation network with the aim of reducing the third loss function.
7. The training method of claim 6, wherein the obtaining a second loss function using the first label data and the third label data of the second image comprises:
calculating an error between the first label data and the third label data of the second image using a cross entropy loss function;
calculating an average of errors between the first label data and the third label data for all of the second images as the second loss function.
8. The training method of claim 1, wherein prior to inputting the set of training images into the encoding network, comprising:
and preprocessing the training image set by utilizing a Poisson image editing algorithm.
9. An image processing method, comprising:
acquiring an intra-surgical image;
processing the surgical image by using a semantic segmentation network, wherein the semantic segmentation network is obtained by training through the training method of any one of claims 1-8;
and obtaining the position information of the surgical instrument used in the surgical operation based on the processing result of the semantic segmentation network on the surgical image.
10. A terminal device, characterized in that the terminal device comprises a processor and a memory; the memory has stored therein a computer program for executing the computer program to implement the steps of the training method of any one of claims 1 to 8 and/or the image processing method of claim 9.
11. A computer storage medium, characterized in that it stores a computer program which, when executed, implements the steps of the training method of any one of claims 1 to 8 and/or the image processing method of claim 9.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110309167.6A CN113113119A (en) | 2021-03-23 | 2021-03-23 | Training method of semantic segmentation network, image processing method and equipment thereof |
PCT/CN2021/137599 WO2022199137A1 (en) | 2021-03-23 | 2021-12-13 | Training method for semantic segmentation network, image processing method and device thereof |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110309167.6A CN113113119A (en) | 2021-03-23 | 2021-03-23 | Training method of semantic segmentation network, image processing method and equipment thereof |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113113119A true CN113113119A (en) | 2021-07-13 |
Family
ID=76710438
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110309167.6A Pending CN113113119A (en) | 2021-03-23 | 2021-03-23 | Training method of semantic segmentation network, image processing method and equipment thereof |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN113113119A (en) |
WO (1) | WO2022199137A1 (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113705666A (en) * | 2021-08-26 | 2021-11-26 | 平安科技(深圳)有限公司 | Segmentation network training method, using method, device, equipment and storage medium |
CN114494800A (en) * | 2022-02-17 | 2022-05-13 | 平安科技(深圳)有限公司 | Prediction model training method and device, electronic equipment and storage medium |
WO2022199137A1 (en) * | 2021-03-23 | 2022-09-29 | 中国科学院深圳先进技术研究院 | Training method for semantic segmentation network, image processing method and device thereof |
US20230154185A1 (en) * | 2021-11-12 | 2023-05-18 | Adobe Inc. | Multi-source panoptic feature pyramid network |
WO2024175045A1 (en) * | 2023-02-22 | 2024-08-29 | 华为技术有限公司 | Model training method and apparatus, and electronic device and storage medium |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115546483B * | 2022-09-30 | 2023-05-12 | Harbin Kejia General Electromechanical Co., Ltd. | Deep learning-based method for measuring residual usage amount of carbon slide plate of subway pantograph |
CN116168242B (en) * | 2023-02-08 | 2023-12-01 | 阿里巴巴(中国)有限公司 | Pixel-level label generation method, model training method and equipment |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111161279A (en) * | 2019-12-12 | 2020-05-15 | 中国科学院深圳先进技术研究院 | Medical image segmentation method and device and server |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109472360B (en) * | 2018-10-30 | 2020-09-04 | 北京地平线机器人技术研发有限公司 | Neural network updating method and updating device and electronic equipment |
CN110097131B (en) * | 2019-05-08 | 2023-04-28 | 南京大学 | Semi-supervised medical image segmentation method based on countermeasure cooperative training |
CN110533044B (en) * | 2019-05-29 | 2023-01-20 | 广东工业大学 | Domain adaptive image semantic segmentation method based on GAN |
CN110909744B (en) * | 2019-11-26 | 2022-08-19 | 山东师范大学 | Multi-description coding method and system combined with semantic segmentation |
CN111091166B (en) * | 2020-03-25 | 2020-07-28 | 腾讯科技(深圳)有限公司 | Image processing model training method, image processing device, and storage medium |
CN112035834A (en) * | 2020-08-28 | 2020-12-04 | 北京推想科技有限公司 | Countermeasure training method and device, and application method and device of neural network model |
CN113113119A (en) * | 2021-03-23 | 2021-07-13 | 中国科学院深圳先进技术研究院 | Training method of semantic segmentation network, image processing method and equipment thereof |
- 2021-03-23 CN CN202110309167.6A patent/CN113113119A/en active Pending
- 2021-12-13 WO PCT/CN2021/137599 patent/WO2022199137A1/en active Application Filing
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111161279A (en) * | 2019-12-12 | 2020-05-15 | 中国科学院深圳先进技术研究院 | Medical image segmentation method and device and server |
Non-Patent Citations (3)
Title |
---|
HONGYU CHEN 等: "Semi-supervised Semantic Segmentation of Cataract Surgical Images based on DeepLab v3+", 《ACM》 * |
YASSINE OUALI 等: "Semi-Supervised Semantic Segmentation with Cross-Consistency Training", 《IEEE》 * |
SHI Pan et al.: "A Review of the Application of Deep Learning in Minimally Invasive Surgery Video Analysis", Chinese Journal of Biomedical Engineering *
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2022199137A1 (en) * | 2021-03-23 | 2022-09-29 | 中国科学院深圳先进技术研究院 | Training method for semantic segmentation network, image processing method and device thereof |
CN113705666A (en) * | 2021-08-26 | 2021-11-26 | 平安科技(深圳)有限公司 | Segmentation network training method, using method, device, equipment and storage medium |
CN113705666B (en) * | 2021-08-26 | 2023-10-27 | 平安科技(深圳)有限公司 | Split network training method, use method, device, equipment and storage medium |
US20230154185A1 (en) * | 2021-11-12 | 2023-05-18 | Adobe Inc. | Multi-source panoptic feature pyramid network |
US11941884B2 (en) * | 2021-11-12 | 2024-03-26 | Adobe Inc. | Multi-source panoptic feature pyramid network |
CN114494800A (en) * | 2022-02-17 | 2022-05-13 | 平安科技(深圳)有限公司 | Prediction model training method and device, electronic equipment and storage medium |
CN114494800B (en) * | 2022-02-17 | 2024-05-10 | 平安科技(深圳)有限公司 | Predictive model training method and device, electronic equipment and storage medium |
WO2024175045A1 (en) * | 2023-02-22 | 2024-08-29 | 华为技术有限公司 | Model training method and apparatus, and electronic device and storage medium |
Also Published As
Publication number | Publication date |
---|---|
WO2022199137A1 (en) | 2022-09-29 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20210713 |