CN113113119A - Training method of semantic segmentation network, image processing method and equipment thereof - Google Patents

Training method of semantic segmentation network, image processing method and equipment thereof

Info

Publication number
CN113113119A
CN113113119A
Authority
CN
China
Prior art keywords
training
label data
image
network
loss function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110309167.6A
Other languages
Chinese (zh)
Inventor
贾富仓
陈宏宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS filed Critical Shenzhen Institute of Advanced Technology of CAS
Priority to CN202110309167.6A priority Critical patent/CN113113119A/en
Publication of CN113113119A publication Critical patent/CN113113119A/en
Priority to PCT/CN2021/137599 priority patent/WO2022199137A1/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H - HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 30/00 - ICT specially adapted for the handling or processing of medical images
    • G16H 30/40 - ICT specially adapted for the handling or processing of medical images for processing medical images, e.g. editing
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/20 - Image preprocessing
    • G06V 10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/267 - Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Primary Health Care (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Epidemiology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Public Health (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a training method for a semantic segmentation network, an image processing method, a terminal device, and a computer-readable storage medium. The semantic segmentation network comprises an encoding network and a decoding network; the decoding network comprises a main decoding network and an auxiliary decoding network, and the auxiliary decoding network comprises a perturbation setting and a decoder. The training method comprises the following steps: inputting a training image set comprising a plurality of training images into the encoding network to obtain intermediate representations of the training images; inputting the intermediate representations into the main decoding network to obtain first label data of the training images; inputting the intermediate representations into the perturbation setting and the decoder to obtain second label data of the training images; and obtaining a first loss function using the first label data and the second label data of the training images, and training the semantic segmentation network based on the first loss function. The technical scheme provided by the application helps improve the accuracy and robustness of the network.

Description

Training method of semantic segmentation network, image processing method and equipment thereof
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a training method for a semantic segmentation network, an image processing method, a terminal device, and a computer-readable storage medium.
Background
In the field of medical image processing, machine learning has found a wide variety of applications, particularly in processing images from endoscopic and microscopic surgery. In endoscopic and microscopic surgery, the operable internal space is small and the procedure can be observed only through a very limited field of view; the surgical environment is complicated, and smoke, blood, specular reflections, and the like all degrade that limited field of view. Semantic segmentation therefore faces high accuracy requirements, and the existing DeepLabV3+ deep learning network cannot meet the image processing accuracy required by endoscopic and microscopic surgery.
Disclosure of Invention
The application provides a training method of a semantic segmentation network, an image processing method, a terminal device and a computer readable storage medium.
The first technical scheme adopted by the application is as follows: a training method for a semantic segmentation network is provided. The semantic segmentation network comprises an encoding network and a decoding network; the decoding network comprises a main decoding network and an auxiliary decoding network, and the auxiliary decoding network comprises a perturbation setting and a decoder. The training method comprises the following steps: inputting a training image set comprising a plurality of training images into the encoding network to obtain intermediate representations of the training images; inputting the intermediate representations into the main decoding network to obtain first label data of the training images; inputting the intermediate representations into the perturbation setting and the decoder to obtain second label data of the training images; and obtaining a first loss function using the first label data and the second label data of the training images, and training the semantic segmentation network based on the first loss function.
Optionally, the perturbation setting connects the encoding network and the decoder in the auxiliary decoding network. Inputting the intermediate representation into the perturbation setting and the decoder to obtain the second label data of the training image comprises: inputting the intermediate representation into the perturbation setting to generate a perturbed version of the intermediate representation, and inputting the perturbed version into the decoder to obtain the second label data of the training image.
Optionally, the obtaining a first loss function using the first label data and the second label data of the training image includes: calculating an error between first label data and second label data of the training image by using a mean square error loss function; an average of errors between the first label data and the second label data for all training images is calculated as a first loss function.
Optionally, there are a plurality of auxiliary decoding networks, and each auxiliary decoding network includes a corresponding perturbation setting. Calculating the error between the first label data and the second label data of the training image using a mean square error loss function comprises: acquiring the second label data of the training image output by each auxiliary decoding network; calculating the error between the first label data of the training image and the second label data corresponding to each auxiliary decoding network by using a mean square error loss function; and calculating the average of the errors between the first label data of the training image and the second label data corresponding to all the auxiliary decoding networks as the error between the first label data and the second label data of the training image.
Optionally, the perturbation setting includes any one or more of F-Noise, F-Drop, Guided Masking, Intermediate VAT, and SpatialDropout.
Optionally, the training image set comprises a first image and a second image with third label data; before inputting the training image set into the coding network, the method comprises the following steps: training a semantic segmentation network by using the second image; training a semantic segmentation network based on a first loss function, comprising: acquiring a second loss function by using the first label data and the third label data of the second image; forming a third loss function by using the second loss function and the first loss function; the semantic segmentation network is trained with the goal of reducing the third loss function.
Optionally, obtaining a second loss function using the first label data and the third label data of the second image includes: calculating an error between the first label data and the third label data of the second image using a cross entropy loss function; an average of errors between the first label data and the third label data of all the second images is calculated as a second loss function.
Optionally, before inputting the training image set into the encoding network, the method includes: and preprocessing the training image set by using a Poisson image editing algorithm.
The second technical scheme adopted by the application is as follows: an image processing method is provided, including acquiring an intra-surgical image; processing the operation image by utilizing a semantic segmentation network, wherein the semantic segmentation network is obtained by training through any one of the training methods; and obtaining the position information of the surgical instrument used in the surgical operation based on the processing result of the semantic segmentation network on the surgical image.
The third technical scheme adopted by the application is as follows: a terminal device is provided, the terminal device comprising a processor and a memory; the memory has stored therein a computer program for execution by the processor to implement the steps of the training method described above and/or the image processing method described above.
The fourth technical scheme adopted by the application is as follows: a computer storage medium is provided, which stores a computer program that, when executed, implements the steps of the training method and/or the image processing method described above.
The beneficial effects of this application are as follows. In this application, the semantic segmentation network comprises an encoding network and a decoding network; the decoding network comprises a main decoding network and an auxiliary decoding network, and the auxiliary decoding network comprises a perturbation setting and a decoder. The training method comprises: inputting a training image set comprising a plurality of training images into the encoding network to obtain intermediate representations of the training images; inputting the intermediate representations into the main decoding network to obtain first label data of the training images; inputting the intermediate representations into the perturbation setting and the decoder to obtain second label data of the training images; and obtaining a first loss function using the first label data and the second label data of the training images, and training the semantic segmentation network based on the first loss function. By introducing the perturbation setting into the auxiliary decoding network and improving the original DeepLabV3+ deep learning network based on the domain adaptation principle, the method improves the accuracy and robustness of the semantic segmentation network.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a schematic flow chart diagram illustrating an embodiment of a training method for semantic segmentation networks provided in the present application;
FIG. 2 is a schematic diagram of a semantic segmentation network in the training method shown in FIG. 1;
FIG. 3 is another schematic flow chart diagram of the training method shown in FIG. 1;
FIG. 4 is another schematic flow chart diagram of the training method shown in FIG. 1;
FIG. 5 is a schematic flow chart of S30 in the training method shown in FIG. 1;
FIG. 6 is a schematic diagram of one embodiment of a perturbation setting of the present application;
FIG. 7 is a schematic diagram of another embodiment of a perturbation setting of the present application;
FIG. 8 is a schematic flow chart of S40 in the training method shown in FIG. 1;
FIG. 9 is a schematic flow chart diagram illustrating one embodiment of S41 of FIG. 8;
FIG. 10 is a schematic flow chart diagram illustrating one embodiment of S43 of FIG. 8;
FIG. 11 is a flowchart illustrating an embodiment of an image processing method of the present application;
FIG. 12 is a schematic structural diagram of an embodiment of a terminal device according to the present application;
FIG. 13 is a schematic structural diagram of an embodiment of a computer-readable storage medium of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be noted that the following examples are only illustrative of the present application, and do not limit the scope of the present application. Likewise, the following examples are only some examples and not all examples of the present application, and all other examples obtained by a person of ordinary skill in the art without any inventive work are within the scope of the present application.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
The terms "first", "second" and "third" in this application are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any indication of the number of technical features indicated. Thus, a feature defined as "first," "second," or "third" may explicitly or implicitly include at least one of the feature. In the description of the present application, "plurality" means at least two, e.g., two, three, etc., unless explicitly specifically limited otherwise. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Referring to fig. 1 and fig. 2, fig. 1 is a schematic flowchart of an embodiment of a training method for a semantic segmentation network provided in the present application, and fig. 2 is a schematic diagram of the semantic segmentation network in the training method shown in fig. 1.
As shown in fig. 2, the semantic segmentation network is an end-to-end neural network, and may specifically include an encoding network e and a decoding network. The decoding network comprises a main decoding network d and K auxiliary decoding networks, where K is a natural number; the decoder corresponding to the k-th auxiliary decoding network is denoted d_a^k, and the perturbation setting corresponding to the k-th auxiliary decoding network is denoted p_k.

As shown in fig. 2, in some embodiments, the decoding network may include multiple auxiliary decoding networks, and each auxiliary decoding network may include its corresponding perturbation setting p_k and decoder d_a^k. For example, the number of auxiliary decoding networks may be 2, 3, 4, or 5; this is not limited in this application and can be chosen by those skilled in the art according to actual needs.
Of course, in some embodiments, the number of auxiliary decoding networks may be one. To a certain extent, increasing the number of auxiliary decoding networks improves the accuracy and robustness of the semantic segmentation network.
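For illustration only, the structure described above can be sketched in PyTorch-style code as follows. This is a minimal sketch under assumed module names (the encoder, decoders, and perturbation modules are placeholders, e.g. a DeepLabV3+ backbone), not the implementation disclosed in this application:

```python
import torch.nn as nn

class AuxiliaryDecodingNetwork(nn.Module):
    """One auxiliary decoding network: a perturbation setting p_k followed by a decoder d_a^k."""
    def __init__(self, perturbation: nn.Module, decoder: nn.Module):
        super().__init__()
        self.perturbation = perturbation  # p_k: perturbs the intermediate representation
        self.decoder = decoder            # d_a^k: decodes the perturbed version

    def forward(self, z):
        z_k = self.perturbation(z)        # perturbed version z_i^k
        return self.decoder(z_k)          # second label data d_a^k(z_i^k)

class SemanticSegmentationNetwork(nn.Module):
    """Encoding network e, main decoding network d, and K auxiliary decoding networks."""
    def __init__(self, encoder: nn.Module, main_decoder: nn.Module, aux_decoders):
        super().__init__()
        self.encoder = encoder                           # e
        self.main_decoder = main_decoder                 # d
        self.aux_decoders = nn.ModuleList(aux_decoders)  # K auxiliary networks

    def forward(self, x):
        z = self.encoder(x)                              # intermediate representation z_i = e(x_i)
        first = self.main_decoder(z)                     # first label data d(z_i)
        second = [aux(z) for aux in self.aux_decoders]   # K pieces of second label data
        return first, second
```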
The training method of the semantic segmentation network is applied to a terminal device, where the terminal device may be a server, a mobile device, or a system in which a server and a mobile device cooperate with each other. Accordingly, the parts included in the terminal device, such as units, sub-units, modules, and sub-modules, may all be disposed in the server, may all be disposed in the mobile device, or may be distributed between the server and the mobile device.
Further, the server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or may be implemented as a single server. When the server is software, it may be implemented as a plurality of software or software modules, for example, software or software modules for providing distributed servers, or as a single software or software module, and is not limited herein.
As shown in fig. 1, the training method for semantic segmentation networks provided in the embodiment of the present application specifically includes the following steps:
S10: inputting a training image set into the encoding network, wherein the training image set comprises a plurality of training images, to obtain intermediate representations of the training images.
Specifically, a video camera or video recorder may be used to capture images during a surgical procedure, obtaining a plurality of training images that form a training image set D = {x_1, x_2, …, x_i, …, x_m}, where m denotes the number of training images included in D and x_i denotes the i-th training image. As shown in fig. 2, a training image x_i is input into the encoding network e to obtain its intermediate representation z_i = e(x_i).
Referring to fig. 3, fig. 3 is another flow chart of the training method shown in fig. 1, and in some embodiments, to further improve the accuracy of the semantic segmentation network, before S10, the method may further include:
s01: the Poisson image editing algorithm is utilized to preprocess the training image set so as to remove the highlight part in the training image and avoid the influence of point light sources on image segmentation.
Specifically, the highlight region Ω in a training image may be extracted by thresholding, and the highlight may then be removed by solving the following equations:

g(x) = (I − G_δ ∗ I)(x)   (1)

∇²f(x) = ∇²g(x), x ∈ Ω   (2)

f(x) = I(x), x ∈ ∂Ω   (3)

where x denotes a pixel, Ω denotes the highlight region and ∂Ω its boundary, I denotes the original image, G_δ ∗ I denotes the image after Gaussian filtering with parameter δ, and f denotes the image with the highlight removed.
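As a rough illustration of this preprocessing step (not the disclosed implementation), the following NumPy/SciPy sketch thresholds the highlight region, builds the guidance field of equation (1), and relaxes the Poisson equation (2) inside Ω with simple Jacobi iterations. The threshold value, the Gaussian parameter δ, and the iteration count are assumed values:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def laplacian(a):
    # 5-point discrete Laplacian (edges wrap around; adequate for a sketch)
    return (np.roll(a, 1, 0) + np.roll(a, -1, 0)
            + np.roll(a, 1, 1) + np.roll(a, -1, 1) - 4.0 * a)

def remove_highlights(I, thresh=0.95, delta=5.0, iters=500):
    """I: grayscale image as a float array in [0, 1]."""
    omega = I > thresh                    # highlight region Ω extracted by thresholding
    g = I - gaussian_filter(I, delta)     # equation (1): g = (I - G_δ * I)
    rhs = laplacian(g)                    # right-hand side of equation (2)
    f = I.copy()                          # pixels outside Ω keep I, enforcing equation (3)
    for _ in range(iters):                # Jacobi relaxation of ∇²f = ∇²g inside Ω
        f_new = 0.25 * (np.roll(f, 1, 0) + np.roll(f, -1, 0)
                        + np.roll(f, 1, 1) + np.roll(f, -1, 1) - rhs)
        f[omega] = f_new[omega]           # update only the highlight region
    return f
```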
In some embodiments, the training image set D may include both the first image without the label and the second image with the label, at which time the semantic segmentation network is trained as a semi-supervised learning network.
In particular, the training image set D may comprise a first image set D_1 and a second image set D_2, where the second image set may be represented as D_2 = {x_1, x_2, …, x_j, …, x_n}, n denoting the number of labeled second images in D_2 and x_j denoting the j-th labeled second image. Accordingly, the first image set D_1 may include the remaining m − n unlabeled first images. The labels of the second image set D_2 may be generated by manual annotation.
Referring to fig. 4, fig. 4 is another schematic flow chart of the training method shown in fig. 1, when the training image set D includes both the first image without the label and the second image with the label, before S10, the method may further include:
s02: and training the semantic segmentation network by using the second image.
In particular, the second images x_j may first be used to train the encoding network e and the main decoding network d, and the prediction consistency between the main decoding network d and the auxiliary decoding networks may then be used to train the auxiliary decoding networks.
As shown in fig. 4, S01 and S02 may both precede S10, with S01 before S02. In some embodiments, only S02 may be included before S10, without S01; this application is not limited thereto, and those skilled in the art can choose according to actual needs. It should be noted that, for steps without a necessary precedence relationship, the order indicated by the step numbers does not represent the actual execution order; for example, in some embodiments, S02 may precede S01.
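A hedged sketch of this two-stage schedule is given below, assuming `model` follows the structure of fig. 2 with `encoder`, `main_decoder`, and `aux_decoders` attributes; these names, and the choice to hold the main prediction fixed during the consistency step, are illustrative assumptions rather than the disclosed implementation:

```python
import torch
import torch.nn.functional as F

def train_supervised_step(model, x, y, optimizer):
    # S02: fit encoding network e and main decoding network d on labeled pairs
    logits = model.main_decoder(model.encoder(x))
    loss = F.cross_entropy(logits, y)
    optimizer.zero_grad(); loss.backward(); optimizer.step()

def train_consistency_step(model, x, optimizer):
    # then train the auxiliary decoders toward the main decoder's predictions
    z = model.encoder(x)
    with torch.no_grad():
        target = model.main_decoder(z)             # first label data, held fixed here
    preds = [aux(z) for aux in model.aux_decoders]
    loss = sum(F.mse_loss(p, target) for p in preds) / len(preds)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
```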
In some embodiments, the training image set D may further include m training images without labels, and at this time, the semantic segmentation network is trained as an unsupervised learning network.
Generally speaking, a fully supervised learning network has high accuracy, but it needs a large amount of manually labeled training data, which is difficult to acquire and consumes considerable manpower; it also has poor generalization capability and lacks flexibility.
Compared with the fully supervised learning network, the unsupervised learning network has better generalization capability, but the accuracy needs to be further improved.
Compared with an unsupervised learning network, the semi-supervised learning network has higher accuracy, only a small amount of artificially labeled training data is needed, and the acquisition difficulty of the training data is reduced. In addition, semi-supervised deep learning also has better generalization capability than fully supervised deep learning.
S20: inputting the intermediate representation into the main decoding network to obtain the first label data of the training image.
As shown in fig. 2, the intermediate representation z_i = e(x_i) of a training image x_i is input into the main decoding network d to obtain the first label data d(z_i) of the training image x_i output by the main decoding network d.
S30: inputting the intermediate representation into the perturbation setting and the decoder to obtain the second label data of the training image.
As described above, in this embodiment, the decoding network may include a plurality of auxiliary decoding networks, and each auxiliary decoding network can correspondingly output one piece of second label data d_a^k(z_i^k) of the training image x_i. When the number of auxiliary decoding networks is K, K pieces of second label data are output for the training image x_i.
As shown in fig. 2, in this embodiment, the perturbation setting p_k in the auxiliary decoding network connects the encoding network e and the decoder d_a^k. As shown in fig. 5, fig. 5 is a schematic flow chart of S30 in the training method shown in fig. 1, and S30 may specifically include:
s31: the intermediate representation is input to the perturbation setting, generating a perturbed version of the intermediate representation.
As shown in fig. 2, the intermediate representation z_i = e(x_i) of the training image x_i is input into the perturbation setting p_k of the auxiliary decoding network to obtain the perturbed version z_i^k of the intermediate representation.
S32: and inputting the disturbed version into a decoder to obtain second label data of the training image.
As shown in fig. 2, the perturbed version z_i^k is input into the decoder d_a^k of the auxiliary decoding network to obtain the second label data d_a^k(z_i^k) of the training image.
In some other embodiments, the decoder d_a^k of the auxiliary decoding network may instead connect the encoding network e and the perturbation setting p_k; this is not limited in this application, and those skilled in the art can choose freely according to the actual situation.

Experimental results show that, compared with the scheme in which the decoder d_a^k connects the encoding network e and the perturbation setting p_k, the scheme in which the perturbation setting p_k connects the encoding network e and the decoder d_a^k is more beneficial to improving the accuracy and robustness of the semantic segmentation network.
As described above, in the present embodiment, each auxiliary decoding network may include its corresponding perturbation setting p_k and decoder d_a^k, where p_k may include any one or more of F-Noise, F-Drop, Guided Masking, Intermediate VAT, and SpatialDropout.
Next, these perturbation settings are described in detail.
F-Noise: fig. 6 is a schematic diagram of an embodiment of the perturbation setting of the present application. A noise tensor N ~ U(0.2, 0.3) with the same shape as the intermediate representation z_i is uniformly sampled; its range is adjusted by multiplying it by z_i, and the resulting noise is injected into the encoder output z_i to obtain the perturbed version z_i^k. The injected noise is thus proportional to z_i, as shown in fig. 6.
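A minimal sketch of this perturbation, taking the sampling range U(0.2, 0.3) from the text above at face value:

```python
import torch

def f_noise(z, lo=0.2, hi=0.3):
    # sample a noise tensor N with the same shape as z, N ~ U(lo, hi)
    noise = torch.empty_like(z).uniform_(lo, hi)
    # scale by z and inject, so the injected noise is proportional to z
    return z + z * noise
```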
F-Drop: as shown in fig. 7, fig. 7 is a schematic diagram of another embodiment of the disturbance setting of the present application, and the threshold γ to U (0.6, 0.9) is uniformly sampled first. Summing and normalizing feature maps z in channel dimensionsiTo obtain zi' later, we generate a mask Mdrop={zi’<γ } which is then used to obtain the perturbed version zi k=zi⊙MdropThus, we can mask 10% to 40% of the most active region in the feature map.
Guided Masking: context-related objects can be more quickly located and identified in familiar environments, and the information constituting elements of a scene can be contextThe inference provides a very important influencing factor. Creating z using a mask context (Con-Msk)iTo apply them to the intermediate representation ziTo obtain a perturbed version zi k
Intermediate VAT (I-VAT): the slight perturbation of the input data will have an effect on the model result and the training result needs to be smooth so as to be stable. Thus, the range z is perturbed using the VAT functioniFor a given auxiliary encoder, the prediction result will be most affected against disturbances, and noise is injected into the intermediate representation ziTo obtain a perturbed version zi k
Spatialdropout (dropout): as a random perturbation is applied in this network.
S40: and acquiring a first loss function by using the first label data and the second label data of the training image, and training the semantic segmentation network based on the first loss function.
As shown in fig. 8, fig. 8 is a flowchart of S40 in the training method shown in fig. 1, and the obtaining the first loss function by using the first label data and the second label data of the training image may include:
s41: an error between the first label data and the second label data of the training image is calculated using a mean square error loss function.
In some embodiments, other loss functions may also be used to calculate the error between the first label data and the second label data of each training image, for example, a cross entropy loss function, a mean absolute value error loss function, a Huber loss function, etc., which is not limited in this application and can be selected by one skilled in the art according to actual needs.
As described above, in the semantic segmentation network, the decoding network includes a plurality of auxiliary decoding networks, and each auxiliary decoding network can correspondingly output one piece of second label data d_a^k(z_i^k) of the training image x_i; the K auxiliary decoding networks correspondingly output K pieces of second label data.
Referring to fig. 9, fig. 9 is a schematic flowchart of an embodiment of S41 in fig. 8, where S41 may specifically include:
s411: and acquiring second label data of the training image output by each auxiliary decoding network.
S412: and calculating the error between the first label data of the training image and the second label data corresponding to each auxiliary decoding network by using a mean square error loss function.
In some embodiments, other loss functions may also be used to calculate the error between the first label data and each second label data of the training image, such as a cross entropy loss function, a mean absolute value error loss function, a Huber loss function, and so on, which is not limited in this application and can be selected by one skilled in the art according to actual needs.
S413: and calculating the average number of errors between the first label data of the training image and the second label data corresponding to all the auxiliary decoding networks as the errors between the first label data and the second label data of the training image.
That is, the errors between the first label data of the training image and the second label data corresponding to each auxiliary decoding network are calculated, summed, and divided by the number K of auxiliary decoding networks to obtain the error between the first label data and the second label data of the training image.
S42: an average of errors between the first label data and the second label data for all training images is calculated as a first loss function.
The training image set comprises a plurality of training images; the errors between the first label data and the second label data of each training image are calculated, summed, and divided by the number of training images to obtain the first loss function.
Specifically, the first loss function may be:

L_1 = (1/m) Σ_{x_i ∈ D} (1/K) Σ_{k=1}^{K} SE(d(z_i), d_a^k(z_i^k))

where L_1 represents the first loss function, D represents the training image set, m represents the number of training images included in the training image set, K represents the number of auxiliary decoding networks, x_i represents the i-th training image, d(z_i) represents the first label data of the i-th training image, d_a^k(z_i^k) represents the second label data of the i-th training image output by the k-th auxiliary decoding network, and SE represents the mean square error function.
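Assuming the first and second label data are prediction maps of identical shape, the first loss function can be sketched as follows; detaching the main output is an assumption, reflecting that only the auxiliary branches are pushed toward the main prediction:

```python
import torch.nn.functional as F

def first_loss(first_label, second_labels):
    # first_label: d(z_i) for a batch; second_labels: list of K tensors d_a^k(z_i^k)
    errs = [F.mse_loss(s, first_label.detach()) for s in second_labels]
    return sum(errs) / len(errs)   # average over the K auxiliary decoding networks
```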
As previously mentioned, the training image set D comprises both first images without labels and second images with labels; the label carried by a second image x_j may be recorded as third label data y_j. At this point, continuing to refer to fig. 8, training the semantic segmentation network based on the first loss function may include:
s43: a second loss function is obtained using the first label data and the third label data of the second image.
Specifically, as shown in fig. 10, fig. 10 is a schematic flowchart of an embodiment of S43 in fig. 8, where the step may include:
s431: an error between the first label data and the third label data of the second image is calculated using a cross entropy loss function.
In some embodiments, other loss functions may also be used to calculate the error between the first label data and the third label data of the second image, such as a mean square error loss function, a mean absolute value error loss function, a Huber loss function, and the like, which is not limited in this application and can be selected by one skilled in the art according to practical needs.
S432: an average of errors between the first label data and the third label data of all the second images is calculated as a second loss function.
The second loss function is obtained by calculating the error between the first label data and the third label data of each second image, summing the errors, and dividing the sum by the number of second images.
That is, the second loss function may be:

L_2 = (1/n) Σ_{x_j ∈ D_2} CE(d(z_j), y_j)

where L_2 represents the second loss function, n represents the number of second images in the second image set D_2, y_j represents the third label data of the j-th second image, d(z_j) represents the first label data of the j-th second image, and CE represents the cross entropy function.
S44: a third loss function is formed using the second loss function and the first loss function.
Specifically, the third loss function may be expressed as:

L_3 = L_2 + ω_1 · L_1

where ω_1 represents the weight of the first loss function L_1; for example, ω_1 may be 1. This application does not limit the value of ω_1, which can be selected by those skilled in the art according to actual needs.
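Put together, a sketch of the third loss function, reusing the `first_loss` helper sketched earlier and taking ω_1 = 1 as the example value:

```python
import torch.nn.functional as F

def third_loss(first_label_labeled, y, first_label, second_labels, omega1=1.0):
    l2 = F.cross_entropy(first_label_labeled, y)   # second loss: CE on labeled second images
    l1 = first_loss(first_label, second_labels)    # first loss: MSE consistency term
    return l2 + omega1 * l1                        # third loss to be minimized
```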
S45: the semantic segmentation network is trained with the goal of reducing the third loss function.
In some embodiments, S43 may precede S41, or S43 may be performed simultaneously with S41; this is not limited here and can be chosen by those skilled in the art according to actual needs. As noted above, for steps without a necessary precedence relationship, the order indicated by the step numbers does not represent the actual execution order.
On one hand, the semantic segmentation network of the present application comprises an encoding network and a decoding network, and the decoding network comprises a main decoding network and auxiliary decoding networks; by introducing perturbation settings into the auxiliary decoding networks and improving the original DeepLabV3+ deep learning network based on the domain adaptation principle, the accuracy and robustness of the semantic segmentation network are improved. On the other hand, the semi-supervised semantic segmentation network obtained with this training method can accurately segment images even when little label data is available. The segmentation results have natural, smooth boundaries that are easy to observe; the network achieves high accuracy and robustness, reduces the number of labels required for training, and generalizes well. It is therefore more flexible, can adapt to various segmentation scenes, and maintains high segmentation accuracy and robustness even for types of surgical instruments that appear infrequently.
The DeepLabV3+ network based on the domain adaptation principle was implemented on the public CATARACTS Semantic Segmentation 2020 dataset and achieved good results; the results have high reliability and can basically meet the safety requirements in the field of medical image processing.
Specifically, the CATARACTS Semantic Segmentation 2020 dataset was selected, which includes 50 videos of cataract surgery performed at Brest University Hospital between January 22, 2015 and September 10, 2015. Over 9 hours of surgical video were recorded in total: the training set contains 4 hours 42 minutes of video and the test set 4 hours 24 minutes, so the number of samples is sufficient. The dataset has 25 video subsets, with the training, validation, and test sets containing 3550, 534 (video subsets 5, 7, and 16), and 587 (video subsets 2, 12, and 22) frames, respectively. The input picture resolution was 512 × 512, and a stochastic gradient descent (SGD) optimizer was used.
The Intersection over Union (IoU) metric was relied upon to compare the quality of the algorithms' processing results and obtain the final experimental result. Throughout the whole process, theoretical and methodological research proceeded in parallel with algorithm implementation and verification, and the two were optimized alternately.
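For reference, a minimal per-class Intersection over Union computation (an illustrative sketch, not the challenge's official evaluation code):

```python
import numpy as np

def iou(pred, gt, cls):
    # pred, gt: integer label maps; cls: class index of interest
    p, g = pred == cls, gt == cls
    union = np.logical_or(p, g).sum()
    return np.logical_and(p, g).sum() / union if union > 0 else float('nan')
```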
Referring to fig. 11, fig. 11 is a schematic flowchart illustrating an embodiment of an image processing method according to the present application, the image processing method including:
s201: an intra-surgical image is acquired.
For example, the intra-surgical images may be acquired by a camera or video recorder.
S202: and processing the operation image by utilizing a semantic segmentation network, wherein the semantic segmentation network is obtained by training through the training method.
Specifically, the semantic segmentation network obtained through training by the training method is used for performing semantic segmentation on the operation image.
S202: and obtaining the position information of the surgical instrument used in the surgical operation based on the processing result of the semantic segmentation network on the surgical image.
For example, the image processing method can be applied to image processing for cataract surgery and can accurately segment the surgical image even with little label data. The boundaries of the segmentation results are natural and smooth and easy to observe, high accuracy and robustness are achieved, and a reliable reference is provided for the operating personnel. Of course, the image processing method can also be used in other endoscopic and microscopic surgeries; this application is not limited, and those skilled in the art can choose according to actual needs.
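An illustrative inference sketch for S201 to S203 is given below, assuming the network structure sketched earlier and an assumed instrument class index; deriving position as mask centroid and bounding box is one simple choice, not mandated by the disclosure:

```python
import torch
import numpy as np

INSTRUMENT_CLASS = 1  # assumed label index of the surgical instrument class

def locate_instrument(model, frame):
    # frame: preprocessed image tensor of shape (C, H, W)
    with torch.no_grad():
        first_label, _ = model(frame.unsqueeze(0))   # use the main decoder output
    mask = first_label.argmax(dim=1)[0].cpu().numpy() == INSTRUMENT_CLASS
    if not mask.any():
        return None                                  # no instrument detected
    ys, xs = np.nonzero(mask)
    centroid = (float(xs.mean()), float(ys.mean()))  # instrument position estimate
    bbox = (xs.min(), ys.min(), xs.max(), ys.max())  # enclosing bounding box
    return centroid, bbox
```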
Referring to fig. 12, fig. 12 is a schematic structural diagram of a terminal device according to an embodiment of the present application. The terminal device 100 comprises a processor 10 and a memory 20 coupled to each other; the memory 20 stores a computer program executed by the processor 10 to implement the steps of the training method and/or the image processing method described above.
The processor 10 controls the operation of the terminal device 100 and may also be referred to as a CPU (Central Processing Unit). The processor 10 may be an integrated circuit chip having signal processing capabilities. The processor 10 may also be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. A general purpose processor may be a microprocessor, or the processor 10 may be any conventional processor or the like.
The memory 20 may include Random Access Memory (RAM), Read Only Memory (ROM), flash memory, Erasable Programmable Read Only Memory (EPROM), Electrically Erasable Programmable Read Only Memory (EEPROM), registers, a hard disk, a removable disk, a CD-ROM, and so forth. Memory 20 may store program data, which may include a single instruction, or many instructions, for example, and may be distributed over several different code segments, among different programs, and across multiple memories 20. Memory 20 may be coupled to processor 10 such that processor 10 can read information from, and write information to, memory 20. Of course, the memory 20 may be integral to the processor 10.
Referring to fig. 13, fig. 13 is a schematic structural diagram of an embodiment of a computer-readable storage medium 200 of the present application, in which a computer program is stored, and the computer program implements the steps of the training method and/or the image processing method when being executed.
The technical solution of the present application, in essence, or the part thereof contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage device and includes instructions (program data) for causing a computer device (which may be a personal computer, a server, or a network device) or a processor to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage device includes various media that can store program code, such as a USB flash drive, a portable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk, as well as electronic devices such as computers, mobile phones, notebook computers, tablet computers, and cameras equipped with such storage media.
In the several embodiments provided in this application, it should be understood that the disclosed training method for the semantic segmentation network may be implemented in other ways. For example, the embodiments described above are merely illustrative: the division into modules or units is only a division by logical function, and other divisions are possible in actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The above description is only a part of the embodiments of the present application, and not intended to limit the scope of the present application, and all equivalent devices or equivalent processes performed by the content of the present application and the attached drawings, or directly or indirectly applied to other related technical fields, are also included in the scope of the present application.

Claims (11)

1. A training method of a semantic segmentation network, wherein the semantic segmentation network comprises an encoding network and a decoding network, the decoding network comprises a main decoding network and an auxiliary decoding network, wherein the auxiliary decoding network comprises a perturbation setting and a decoder, and the training method comprises the following steps:
inputting a training image set into the encoding network, wherein the training image set comprises a plurality of training images, and an intermediate representation of the training images is obtained;
inputting the intermediate representation into the main decoding network to obtain first label data of the training image;
inputting the intermediate representation into the perturbation setting and the decoder to obtain second label data of the training image;
and acquiring a first loss function by using the first label data and the second label data of the training image, and training the semantic segmentation network based on the first loss function.
2. The training method according to claim 1, wherein the perturbation setting connects the encoding network and the decoder in the auxiliary decoding network;
the inputting the intermediate representation into the perturbation setting and the decoder to obtain second label data of the training image comprises:
inputting the intermediate representation into the perturbation setting, generating a perturbed version of the intermediate representation,
and inputting the disturbed version into the decoder to obtain second label data of the training image.
3. The training method of claim 1, wherein the obtaining a first loss function using the first label data and the second label data of the training image comprises:
calculating an error between the first label data and the second label data of the training image using a mean square error loss function;
calculating an average of errors between the first label data and the second label data for all of the training images as the first loss function.
4. The training method of claim 3, wherein there are a plurality of the auxiliary decoding networks, each auxiliary decoding network comprising a corresponding perturbation setting;
the calculating an error between the first label data and the second label data of the training image using a mean square error loss function includes:
acquiring second label data of the training image output by each auxiliary decoding network;
calculating an error between the first label data of the training image and the second label data corresponding to each auxiliary decoding network by using a mean square error loss function;
calculating an average of errors between the first label data of the training image and the second label data corresponding to all the auxiliary decoding networks as the error between the first label data and the second label data of the training image.
5. The training method according to claim 4, wherein the perturbation setting comprises any one or more of F-Noise, F-Drop, Guided Masking, Intermediate VAT, and SpatialDropout.
6. Training method according to claim 1, wherein the set of training images comprises a first image and a second image with third label data;
before inputting the training image set into the encoding network, the method comprises the following steps:
training the semantic segmentation network with the second image;
the training the semantic segmentation network based on the first loss function includes:
obtaining a second loss function using the first label data and the third label data of the second image;
forming a third loss function using the second loss function and the first loss function;
and training the semantic segmentation network with the aim of reducing the third loss function.
7. The training method of claim 6, wherein the obtaining a second loss function using the first label data and the third label data of the second image comprises:
calculating an error between the first label data and the third label data of the second image using a cross entropy loss function;
calculating an average of errors between the first label data and the third label data for all of the second images as the second loss function.
8. The training method of claim 1, wherein, prior to inputting the training image set into the encoding network, the method comprises:
and preprocessing the training image set by utilizing a Poisson image editing algorithm.
9. An image processing method, comprising:
acquiring an intra-surgical image;
processing the surgical image by using a semantic segmentation network, wherein the semantic segmentation network is obtained by training through the training method of any one of claims 1-8;
and obtaining the position information of the surgical instrument used in the surgical operation based on the processing result of the semantic segmentation network on the surgical image.
10. A terminal device, characterized in that the terminal device comprises a processor and a memory; the memory has stored therein a computer program for executing the computer program to implement the steps of the training method of any one of claims 1 to 8 and/or the image processing method of claim 9.
11. A computer storage medium, characterized in that it stores a computer program which, when executed, implements the steps of the training method of any one of claims 1 to 8 and/or the image processing method of claim 9.
CN202110309167.6A 2021-03-23 2021-03-23 Training method of semantic segmentation network, image processing method and equipment thereof Pending CN113113119A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110309167.6A CN113113119A (en) 2021-03-23 2021-03-23 Training method of semantic segmentation network, image processing method and equipment thereof
PCT/CN2021/137599 WO2022199137A1 (en) 2021-03-23 2021-12-13 Training method for semantic segmentation network, image processing method and device thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110309167.6A CN113113119A (en) 2021-03-23 2021-03-23 Training method of semantic segmentation network, image processing method and equipment thereof

Publications (1)

Publication Number Publication Date
CN113113119A 2021-07-13

Family

Family ID: 76710438

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110309167.6A Pending CN113113119A (en) 2021-03-23 2021-03-23 Training method of semantic segmentation network, image processing method and equipment thereof

Country Status (2)

Country Link
CN (1) CN113113119A (en)
WO (1) WO2022199137A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113705666A (en) * 2021-08-26 2021-11-26 平安科技(深圳)有限公司 Segmentation network training method, using method, device, equipment and storage medium
CN114494800A (en) * 2022-02-17 2022-05-13 平安科技(深圳)有限公司 Prediction model training method and device, electronic equipment and storage medium
WO2022199137A1 (en) * 2021-03-23 2022-09-29 中国科学院深圳先进技术研究院 Training method for semantic segmentation network, image processing method and device thereof
US20230154185A1 (en) * 2021-11-12 2023-05-18 Adobe Inc. Multi-source panoptic feature pyramid network
WO2024175045A1 (en) * 2023-02-22 2024-08-29 华为技术有限公司 Model training method and apparatus, and electronic device and storage medium

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115546483B (en) * 2022-09-30 2023-05-12 哈尔滨市科佳通用机电股份有限公司 Deep learning-based method for measuring residual usage amount of carbon slide plate of subway pantograph
CN116168242B (en) * 2023-02-08 2023-12-01 阿里巴巴(中国)有限公司 Pixel-level label generation method, model training method and equipment

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111161279A (en) * 2019-12-12 2020-05-15 中国科学院深圳先进技术研究院 Medical image segmentation method and device and server

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109472360B (en) * 2018-10-30 2020-09-04 北京地平线机器人技术研发有限公司 Neural network updating method and updating device and electronic equipment
CN110097131B (en) * 2019-05-08 2023-04-28 南京大学 Semi-supervised medical image segmentation method based on countermeasure cooperative training
CN110533044B (en) * 2019-05-29 2023-01-20 广东工业大学 Domain adaptive image semantic segmentation method based on GAN
CN110909744B (en) * 2019-11-26 2022-08-19 山东师范大学 Multi-description coding method and system combined with semantic segmentation
CN111091166B (en) * 2020-03-25 2020-07-28 腾讯科技(深圳)有限公司 Image processing model training method, image processing device, and storage medium
CN112035834A (en) * 2020-08-28 2020-12-04 北京推想科技有限公司 Countermeasure training method and device, and application method and device of neural network model
CN113113119A (en) * 2021-03-23 2021-07-13 中国科学院深圳先进技术研究院 Training method of semantic segmentation network, image processing method and equipment thereof

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111161279A (en) * 2019-12-12 2020-05-15 中国科学院深圳先进技术研究院 Medical image segmentation method and device and server

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HONGYU CHEN et al.: "Semi-supervised Semantic Segmentation of Cataract Surgical Images based on DeepLab v3+", ACM *
YASSINE OUALI et al.: "Semi-Supervised Semantic Segmentation with Cross-Consistency Training", IEEE *
史攀 et al.: "A Survey of Deep Learning Applications in Minimally Invasive Surgery Video Analysis", Chinese Journal of Biomedical Engineering *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022199137A1 (en) * 2021-03-23 2022-09-29 中国科学院深圳先进技术研究院 Training method for semantic segmentation network, image processing method and device thereof
CN113705666A (en) * 2021-08-26 2021-11-26 平安科技(深圳)有限公司 Segmentation network training method, using method, device, equipment and storage medium
CN113705666B (en) * 2021-08-26 2023-10-27 平安科技(深圳)有限公司 Split network training method, use method, device, equipment and storage medium
US20230154185A1 (en) * 2021-11-12 2023-05-18 Adobe Inc. Multi-source panoptic feature pyramid network
US11941884B2 (en) * 2021-11-12 2024-03-26 Adobe Inc. Multi-source panoptic feature pyramid network
CN114494800A (en) * 2022-02-17 2022-05-13 平安科技(深圳)有限公司 Prediction model training method and device, electronic equipment and storage medium
CN114494800B (en) * 2022-02-17 2024-05-10 平安科技(深圳)有限公司 Predictive model training method and device, electronic equipment and storage medium
WO2024175045A1 (en) * 2023-02-22 2024-08-29 华为技术有限公司 Model training method and apparatus, and electronic device and storage medium

Also Published As

Publication number Publication date
WO2022199137A1 (en) 2022-09-29

Similar Documents

Publication Publication Date Title
CN113113119A (en) Training method of semantic segmentation network, image processing method and equipment thereof
CN110347799B (en) Language model training method and device and computer equipment
CN112330685B (en) Image segmentation model training method, image segmentation device and electronic equipment
CN110677598A (en) Video generation method and device, electronic equipment and computer storage medium
CN113836992B (en) Label identification method, label identification model training method, device and equipment
US20230143452A1 (en) Method and apparatus for generating image, electronic device and storage medium
CN115762484B (en) Multi-mode data fusion method, device, equipment and medium for voice recognition
CN112818995B (en) Image classification method, device, electronic equipment and storage medium
CN116050496A (en) Determination method and device, medium and equipment of picture description information generation model
CN116127080A (en) Method for extracting attribute value of description object and related equipment
CN116993864A (en) Image generation method and device, electronic equipment and storage medium
CN115620371A (en) Training method and device for speaking video generation model, electronic equipment and storage medium
CN113837179B (en) Multi-discriminant GAN network construction method, device and system for processing images and storage medium
CN114529917B (en) Zero-sample Chinese single-word recognition method, system, device and storage medium
CN112488148A (en) Clustering method and device based on variational self-encoder
CN113705276A (en) Model construction method, model construction device, computer apparatus, and medium
CN114926479A (en) Image processing method and device
CN113177957B (en) Cell image segmentation method and device, electronic equipment and storage medium
CN117556048A (en) Artificial intelligence-based intention recognition method, device, equipment and medium
CN108765413B (en) Method, apparatus and computer readable medium for image classification
CN116402831A (en) Partially-supervised abdomen CT sequence image multi-organ automatic segmentation method and device
CN111598904B (en) Image segmentation method, device, equipment and storage medium
CN112182268B (en) Image classification method, device, electronic equipment and storage medium
CN110781646B (en) Name standardization method, device, medium and electronic equipment
CN117541758B (en) Virtual face configuration parameter generation method, device, equipment and storage medium

Legal Events

Code: PB01 - Publication
Code: SE01 - Entry into force of request for substantive examination
Code: RJ01 - Rejection of invention patent application after publication (application publication date: 20210713)