CN113113119A - Training method of semantic segmentation network, image processing method and equipment thereof - Google Patents

Training method of semantic segmentation network, image processing method and equipment thereof

Info

Publication number
CN113113119A
CN113113119A
Authority
CN
China
Prior art keywords
training
label data
image
network
loss function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110309167.6A
Other languages
Chinese (zh)
Inventor
贾富仓
陈宏宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS filed Critical Shenzhen Institute of Advanced Technology of CAS
Priority to CN202110309167.6A priority Critical patent/CN113113119A/en
Publication of CN113113119A publication Critical patent/CN113113119A/en
Priority to PCT/CN2021/137599 priority patent/WO2022199137A1/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H - HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 30/00 - ICT specially adapted for the handling or processing of medical images
    • G16H 30/40 - ICT specially adapted for the handling or processing of medical images for processing medical images, e.g. editing
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/20 - Image preprocessing
    • G06V 10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/267 - Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Primary Health Care (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Epidemiology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Public Health (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a training method for a semantic segmentation network, an image processing method, a terminal device, and a computer-readable storage medium. The semantic segmentation network comprises an encoding network and a decoding network; the decoding network comprises a main decoding network and an auxiliary decoding network, and the auxiliary decoding network comprises a perturbation setting and a decoder. The training method comprises the following steps: inputting a training image set comprising a plurality of training images into the encoding network to obtain intermediate representations of the training images; inputting the intermediate representations into the main decoding network to obtain first label data of the training images; inputting the intermediate representations into the perturbation setting and the decoder to obtain second label data of the training images; and obtaining a first loss function using the first label data and the second label data of the training images, and training the semantic segmentation network based on the first loss function. The technical scheme provided by the application helps improve the accuracy and robustness of the network.

Description

Training method of semantic segmentation network, image processing method and equipment thereof
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a training method for a semantic segmentation network, an image processing method, a terminal device, and a computer-readable storage medium.
Background
In the field of medical image processing, machine learning has found a wide variety of applications, particularly in processing images from endoscopic and microscopic surgery. In endoscopic and microscopic surgery, the operable internal space is small and the procedure can be observed only through a very limited field of view; the surgical environment is complicated, and smoke, blood, specular reflections, and the like all degrade that limited field of view. Semantic segmentation therefore faces high accuracy requirements, and the existing DeepLabV3+ deep learning network cannot meet the image processing accuracy required by endoscopic and microscopic surgery.
Disclosure of Invention
The application provides a training method of a semantic segmentation network, an image processing method, a terminal device and a computer readable storage medium.
The first technical scheme adopted by the application is as follows: a training method for a semantic segmentation network is provided. The semantic segmentation network comprises an encoding network and a decoding network; the decoding network comprises a main decoding network and an auxiliary decoding network, and the auxiliary decoding network comprises a perturbation setting and a decoder. The training method comprises the following steps: inputting a training image set comprising a plurality of training images into the encoding network to obtain intermediate representations of the training images; inputting the intermediate representations into the main decoding network to obtain first label data of the training images; inputting the intermediate representations into the perturbation setting and the decoder to obtain second label data of the training images; and obtaining a first loss function using the first label data and the second label data of the training images, and training the semantic segmentation network based on the first loss function.
Optionally, the perturbation setting connects the encoding network and the decoder in the auxiliary decoding network. Inputting the intermediate representation into the perturbation setting and the decoder to obtain the second label data of the training image comprises: inputting the intermediate representation into the perturbation setting to generate a perturbed version of the intermediate representation, and inputting the perturbed version into the decoder to obtain the second label data of the training image.
Optionally, the obtaining a first loss function using the first label data and the second label data of the training image includes: calculating an error between first label data and second label data of the training image by using a mean square error loss function; an average of errors between the first label data and the second label data for all training images is calculated as a first loss function.
Optionally, there are a plurality of auxiliary decoding networks, and each auxiliary decoding network includes a corresponding perturbation setting. Calculating the error between the first label data and the second label data of the training image using a mean square error loss function comprises: acquiring the second label data of the training image output by each auxiliary decoding network; calculating the error between the first label data of the training image and the second label data corresponding to each auxiliary decoding network by using a mean square error loss function; and calculating the average of the errors between the first label data of the training image and the second label data corresponding to all the auxiliary decoding networks as the error between the first label data and the second label data of the training image.
Optionally, the perturbation setting includes any one or more of F-Noise, F-Drop, Guided Masking, Intermediate VAT, and SpatialDropout.
Optionally, the training image set comprises a first image and a second image with third label data; before inputting the training image set into the coding network, the method comprises the following steps: training a semantic segmentation network by using the second image; training a semantic segmentation network based on a first loss function, comprising: acquiring a second loss function by using the first label data and the third label data of the second image; forming a third loss function by using the second loss function and the first loss function; the semantic segmentation network is trained with the goal of reducing the third loss function.
Optionally, obtaining a second loss function using the first label data and the third label data of the second image includes: calculating an error between the first label data and the third label data of the second image using a cross entropy loss function; an average of errors between the first label data and the third label data of all the second images is calculated as a second loss function.
Optionally, before inputting the training image set into the encoding network, the method includes: and preprocessing the training image set by using a Poisson image editing algorithm.
The second technical scheme adopted by the application is as follows: an image processing method is provided, including acquiring an intra-surgical image; processing the operation image by utilizing a semantic segmentation network, wherein the semantic segmentation network is obtained by training through any one of the training methods; and obtaining the position information of the surgical instrument used in the surgical operation based on the processing result of the semantic segmentation network on the surgical image.
The third technical scheme adopted by the application is as follows: a terminal device is provided, the terminal device comprising a processor and a memory; the memory has stored therein a computer program for execution by the processor to implement the steps of the training method described above and/or the image processing method described above.
The fourth technical scheme adopted by the application is as follows: a computer storage medium is provided, which stores a computer program that, when executed, implements the steps of the training method and/or the image processing method described above.
The beneficial effects of this application are as follows. In this application, the semantic segmentation network comprises an encoding network and a decoding network; the decoding network comprises a main decoding network and an auxiliary decoding network, and the auxiliary decoding network comprises a perturbation setting and a decoder. The training method comprises: inputting a training image set comprising a plurality of training images into the encoding network to obtain intermediate representations of the training images; inputting the intermediate representations into the main decoding network to obtain first label data of the training images; inputting the intermediate representations into the perturbation setting and the decoder to obtain second label data of the training images; and obtaining a first loss function using the first label data and the second label data of the training images, and training the semantic segmentation network based on the first loss function. By introducing the perturbation setting into the auxiliary decoding network and improving the original DeepLabV3+ deep learning network based on the domain adaptation principle, the method improves the accuracy and robustness of the semantic segmentation network.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a schematic flow chart diagram illustrating an embodiment of a training method for semantic segmentation networks provided in the present application;
FIG. 2 is a schematic diagram of a semantic segmentation network in the training method shown in FIG. 1;
FIG. 3 is another schematic flow chart diagram of the training method shown in FIG. 1;
FIG. 4 is another schematic flow chart diagram of the training method shown in FIG. 1;
FIG. 5 is a schematic flow chart of S30 in the training method shown in FIG. 1;
FIG. 6 is a schematic diagram of one embodiment of a perturbation setting of the present application;
FIG. 7 is a schematic diagram of another embodiment of a perturbation setting of the present application;
FIG. 8 is a schematic flow chart of S40 in the training method shown in FIG. 1;
FIG. 9 is a schematic flow chart diagram illustrating one embodiment of S41 of FIG. 8;
FIG. 10 is a schematic flow chart diagram illustrating one embodiment of S43 of FIG. 8;
FIG. 11 is a flowchart illustrating an embodiment of an image processing method of the present application;
FIG. 12 is a schematic structural diagram of an embodiment of a terminal device according to the present application;
FIG. 13 is a schematic structural diagram of an embodiment of a computer-readable storage medium of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be noted that the following examples are only illustrative of the present application, and do not limit the scope of the present application. Likewise, the following examples are only some examples and not all examples of the present application, and all other examples obtained by a person of ordinary skill in the art without any inventive work are within the scope of the present application.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
The terms "first", "second" and "third" in this application are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any indication of the number of technical features indicated. Thus, a feature defined as "first," "second," or "third" may explicitly or implicitly include at least one of the feature. In the description of the present application, "plurality" means at least two, e.g., two, three, etc., unless explicitly specifically limited otherwise. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Referring to fig. 1 and fig. 2, fig. 1 is a schematic flowchart of an embodiment of a training method for a semantic segmentation network provided in the present application, and fig. 2 is a schematic diagram of the semantic segmentation network in the training method shown in fig. 1.
As shown in fig. 2, the semantic segmentation network is an end-to-end neural network, and may specifically include an encoding network e and a decoding network. The decoding network comprises a main decoding network d and K auxiliary decoding networks, where K is a natural number; the decoder corresponding to the k-th auxiliary decoding network is denoted d_a^k, and the perturbation setting corresponding to the k-th auxiliary decoding network is denoted p_k.

As shown in fig. 2, in some embodiments, the decoding network may include multiple auxiliary decoding networks, and each auxiliary decoding network may include its corresponding perturbation setting p_k and decoder d_a^k. For example, the number of auxiliary decoding networks may be 2, 3, 4, or 5; this is not limited in this application and can be chosen by those skilled in the art according to actual needs.
Of course, in some embodiments, the number of auxiliary decoding networks may be one. To a certain extent, increasing the number of auxiliary decoding networks improves the accuracy and robustness of the semantic segmentation network.
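For illustration only, the structure described above can be sketched in PyTorch-style code as follows. This is a minimal sketch under assumed module names (the encoder, decoders, and perturbation modules are placeholders, e.g. a DeepLabV3+ backbone), not the implementation disclosed in this application:

```python
import torch.nn as nn

class AuxiliaryDecodingNetwork(nn.Module):
    """One auxiliary decoding network: a perturbation setting p_k followed by a decoder d_a^k."""
    def __init__(self, perturbation: nn.Module, decoder: nn.Module):
        super().__init__()
        self.perturbation = perturbation  # p_k: perturbs the intermediate representation
        self.decoder = decoder            # d_a^k: decodes the perturbed version

    def forward(self, z):
        z_k = self.perturbation(z)        # perturbed version z_i^k
        return self.decoder(z_k)          # second label data d_a^k(z_i^k)

class SemanticSegmentationNetwork(nn.Module):
    """Encoding network e, main decoding network d, and K auxiliary decoding networks."""
    def __init__(self, encoder: nn.Module, main_decoder: nn.Module, aux_decoders):
        super().__init__()
        self.encoder = encoder                           # e
        self.main_decoder = main_decoder                 # d
        self.aux_decoders = nn.ModuleList(aux_decoders)  # K auxiliary networks

    def forward(self, x):
        z = self.encoder(x)                              # intermediate representation z_i = e(x_i)
        first = self.main_decoder(z)                     # first label data d(z_i)
        second = [aux(z) for aux in self.aux_decoders]   # K pieces of second label data
        return first, second
```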
The training method of the semantic segmentation network is applied to a terminal device, where the terminal device may be a server, a mobile device, or a system in which a server and a mobile device cooperate with each other. Accordingly, the parts included in the terminal device, such as units, sub-units, modules, and sub-modules, may all be disposed in the server, may all be disposed in the mobile device, or may be distributed between the server and the mobile device.
Further, the server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or may be implemented as a single server. When the server is software, it may be implemented as a plurality of software or software modules, for example, software or software modules for providing distributed servers, or as a single software or software module, and is not limited herein.
As shown in fig. 1, the training method for semantic segmentation networks provided in the embodiment of the present application specifically includes the following steps:
S10: inputting a training image set into the encoding network, wherein the training image set comprises a plurality of training images, to obtain intermediate representations of the training images.
Specifically, a video camera or video recorder may be used to capture images during a surgical procedure, obtaining a plurality of training images that form a training image set D = {x_1, x_2, …, x_i, …, x_m}, where m denotes the number of training images included in D and x_i denotes the i-th training image. As shown in fig. 2, a training image x_i is input into the encoding network e to obtain its intermediate representation z_i = e(x_i).
Referring to fig. 3, fig. 3 is another flow chart of the training method shown in fig. 1, and in some embodiments, to further improve the accuracy of the semantic segmentation network, before S10, the method may further include:
s01: the Poisson image editing algorithm is utilized to preprocess the training image set so as to remove the highlight part in the training image and avoid the influence of point light sources on image segmentation.
Specifically, the highlight region Ω in a training image may be extracted by thresholding, and the highlight may then be removed by solving the following equations:

g(x) = (I − G_δ ∗ I)(x)   (1)

∇²f(x) = ∇²g(x), x ∈ Ω   (2)

f(x) = I(x), x ∈ ∂Ω   (3)

where x denotes a pixel, Ω denotes the highlight region and ∂Ω its boundary, I denotes the original image, G_δ ∗ I denotes the image after Gaussian filtering with parameter δ, and f denotes the image with the highlight removed.
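As a rough illustration of this preprocessing step (not the disclosed implementation), the following NumPy/SciPy sketch thresholds the highlight region, builds the guidance field of equation (1), and relaxes the Poisson equation (2) inside Ω with simple Jacobi iterations. The threshold value, the Gaussian parameter δ, and the iteration count are assumed values:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def laplacian(a):
    # 5-point discrete Laplacian (edges wrap around; adequate for a sketch)
    return (np.roll(a, 1, 0) + np.roll(a, -1, 0)
            + np.roll(a, 1, 1) + np.roll(a, -1, 1) - 4.0 * a)

def remove_highlights(I, thresh=0.95, delta=5.0, iters=500):
    """I: grayscale image as a float array in [0, 1]."""
    omega = I > thresh                    # highlight region Ω extracted by thresholding
    g = I - gaussian_filter(I, delta)     # equation (1): g = (I - G_δ * I)
    rhs = laplacian(g)                    # right-hand side of equation (2)
    f = I.copy()                          # pixels outside Ω keep I, enforcing equation (3)
    for _ in range(iters):                # Jacobi relaxation of ∇²f = ∇²g inside Ω
        f_new = 0.25 * (np.roll(f, 1, 0) + np.roll(f, -1, 0)
                        + np.roll(f, 1, 1) + np.roll(f, -1, 1) - rhs)
        f[omega] = f_new[omega]           # update only the highlight region
    return f
```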
In some embodiments, the training image set D may include both the first image without the label and the second image with the label, at which time the semantic segmentation network is trained as a semi-supervised learning network.
In particular, the training image set D may comprise a first image set D_1 and a second image set D_2, where the second image set may be represented as D_2 = {x_1, x_2, …, x_j, …, x_n}, n denoting the number of labeled second images in D_2 and x_j denoting the j-th labeled second image. Accordingly, the first image set D_1 may include the remaining m − n unlabeled first images. The labels of the second image set D_2 may be generated by manual annotation.
Referring to fig. 4, fig. 4 is another schematic flow chart of the training method shown in fig. 1, when the training image set D includes both the first image without the label and the second image with the label, before S10, the method may further include:
s02: and training the semantic segmentation network by using the second image.
In particular, the second images x_j may first be used to train the encoding network e and the main decoding network d, and the prediction consistency between the main decoding network d and the auxiliary decoding networks may then be used to train the auxiliary decoding networks.
As shown in fig. 4, S01 and S02 may both precede S10, with S01 before S02. In some embodiments, only S02 may be included before S10, without S01; this application is not limited thereto, and those skilled in the art can choose according to actual needs. It should be noted that, for steps without a necessary precedence relationship, the order indicated by the step numbers does not represent the actual execution order; for example, in some embodiments, S02 may precede S01.
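A hedged sketch of this two-stage schedule is given below, assuming `model` follows the structure of fig. 2 with `encoder`, `main_decoder`, and `aux_decoders` attributes; these names, and the choice to hold the main prediction fixed during the consistency step, are illustrative assumptions rather than the disclosed implementation:

```python
import torch
import torch.nn.functional as F

def train_supervised_step(model, x, y, optimizer):
    # S02: fit encoding network e and main decoding network d on labeled pairs
    logits = model.main_decoder(model.encoder(x))
    loss = F.cross_entropy(logits, y)
    optimizer.zero_grad(); loss.backward(); optimizer.step()

def train_consistency_step(model, x, optimizer):
    # then train the auxiliary decoders toward the main decoder's predictions
    z = model.encoder(x)
    with torch.no_grad():
        target = model.main_decoder(z)             # first label data, held fixed here
    preds = [aux(z) for aux in model.aux_decoders]
    loss = sum(F.mse_loss(p, target) for p in preds) / len(preds)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
```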
In some embodiments, the training image set D may further include m training images without labels, and at this time, the semantic segmentation network is trained as an unsupervised learning network.
Generally speaking, a fully supervised learning network has high accuracy, but it needs a large amount of manually labeled training data, which is difficult to acquire and consumes considerable manpower; it also has poor generalization capability and lacks flexibility.
Compared with the fully supervised learning network, the unsupervised learning network has better generalization capability, but the accuracy needs to be further improved.
Compared with an unsupervised learning network, the semi-supervised learning network has higher accuracy, only a small amount of artificially labeled training data is needed, and the acquisition difficulty of the training data is reduced. In addition, semi-supervised deep learning also has better generalization capability than fully supervised deep learning.
S20: inputting the intermediate representation into the main decoding network to obtain the first label data of the training image.
As shown in fig. 2, the intermediate representation z_i = e(x_i) of a training image x_i is input into the main decoding network d to obtain the first label data d(z_i) of the training image x_i output by the main decoding network d.
S30: inputting the intermediate representation into the perturbation setting and the decoder to obtain the second label data of the training image.
As described above, in this embodiment, the decoding network may include a plurality of auxiliary decoding networks, and each auxiliary decoding network can correspondingly output one piece of second label data d_a^k(z_i^k) of the training image x_i. When the number of auxiliary decoding networks is K, K pieces of second label data are output for the training image x_i.
As shown in fig. 2, in this embodiment, the perturbation setting p_k in the auxiliary decoding network connects the encoding network e and the decoder d_a^k. As shown in fig. 5, fig. 5 is a schematic flow chart of S30 in the training method shown in fig. 1, and S30 may specifically include:
s31: the intermediate representation is input to the perturbation setting, generating a perturbed version of the intermediate representation.
As shown in fig. 2, the intermediate representation z_i = e(x_i) of the training image x_i is input into the perturbation setting p_k of the auxiliary decoding network to obtain the perturbed version z_i^k of the intermediate representation.
S32: and inputting the disturbed version into a decoder to obtain second label data of the training image.
As shown in fig. 2, the perturbed version z_i^k is input into the decoder d_a^k of the auxiliary decoding network to obtain the second label data d_a^k(z_i^k) of the training image.
In some other embodiments, the decoder d_a^k of the auxiliary decoding network may instead connect the encoding network e and the perturbation setting p_k; this is not limited in this application, and those skilled in the art can choose freely according to the actual situation.

Experimental results show that, compared with the scheme in which the decoder d_a^k connects the encoding network e and the perturbation setting p_k, the scheme in which the perturbation setting p_k connects the encoding network e and the decoder d_a^k is more beneficial to improving the accuracy and robustness of the semantic segmentation network.
As described above, in the present embodiment, each auxiliary decoding network may include its corresponding perturbation setting p_k and decoder d_a^k, where p_k may include any one or more of F-Noise, F-Drop, Guided Masking, Intermediate VAT, and SpatialDropout.
Next, these perturbation settings are described in detail.
F-Noise: fig. 6 is a schematic diagram of an embodiment of the perturbation setting of the present application. A noise tensor N ~ U(0.2, 0.3) with the same shape as the intermediate representation z_i is uniformly sampled; its range is adjusted by multiplying it by z_i, and the resulting noise is injected into the encoder output z_i to obtain the perturbed version z_i^k. The injected noise is thus proportional to z_i, as shown in fig. 6.
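A minimal sketch of this perturbation, taking the sampling range U(0.2, 0.3) from the text above at face value:

```python
import torch

def f_noise(z, lo=0.2, hi=0.3):
    # sample a noise tensor N with the same shape as z, N ~ U(lo, hi)
    noise = torch.empty_like(z).uniform_(lo, hi)
    # scale by z and inject, so the injected noise is proportional to z
    return z + z * noise
```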
F-Drop: as shown in fig. 7, fig. 7 is a schematic diagram of another embodiment of the disturbance setting of the present application, and the threshold γ to U (0.6, 0.9) is uniformly sampled first. Summing and normalizing feature maps z in channel dimensionsiTo obtain zi' later, we generate a mask Mdrop={zi’<γ } which is then used to obtain the perturbed version zi k=zi⊙MdropThus, we can mask 10% to 40% of the most active region in the feature map.
Guided Masking: context-related objects can be more quickly located and identified in familiar environments, and the information constituting elements of a scene can be contextThe inference provides a very important influencing factor. Creating z using a mask context (Con-Msk)iTo apply them to the intermediate representation ziTo obtain a perturbed version zi k
Intermediate VAT (I-VAT): the slight perturbation of the input data will have an effect on the model result and the training result needs to be smooth so as to be stable. Thus, the range z is perturbed using the VAT functioniFor a given auxiliary encoder, the prediction result will be most affected against disturbances, and noise is injected into the intermediate representation ziTo obtain a perturbed version zi k
Spatialdropout (dropout): as a random perturbation is applied in this network.
S40: and acquiring a first loss function by using the first label data and the second label data of the training image, and training the semantic segmentation network based on the first loss function.
As shown in fig. 8, fig. 8 is a flowchart of S40 in the training method shown in fig. 1, and the obtaining the first loss function by using the first label data and the second label data of the training image may include:
s41: an error between the first label data and the second label data of the training image is calculated using a mean square error loss function.
In some embodiments, other loss functions may also be used to calculate the error between the first label data and the second label data of each training image, for example, a cross entropy loss function, a mean absolute value error loss function, a Huber loss function, etc., which is not limited in this application and can be selected by one skilled in the art according to actual needs.
As described above, in the semantic segmentation network, the decoding network includes a plurality of auxiliary decoding networks, and each auxiliary decoding network can correspondingly output one piece of second label data d_a^k(z_i^k) of the training image x_i; the K auxiliary decoding networks correspondingly output K pieces of second label data.
Referring to fig. 9, fig. 9 is a schematic flowchart of an embodiment of S41 in fig. 8, where S41 may specifically include:
s411: and acquiring second label data of the training image output by each auxiliary decoding network.
S412: and calculating the error between the first label data of the training image and the second label data corresponding to each auxiliary decoding network by using a mean square error loss function.
In some embodiments, other loss functions may also be used to calculate the error between the first label data and each second label data of the training image, such as a cross entropy loss function, a mean absolute value error loss function, a Huber loss function, and so on, which is not limited in this application and can be selected by one skilled in the art according to actual needs.
S413: and calculating the average number of errors between the first label data of the training image and the second label data corresponding to all the auxiliary decoding networks as the errors between the first label data and the second label data of the training image.
That is, the errors between the first label data of the training image and the second label data corresponding to each auxiliary decoding network are calculated, summed, and divided by the number K of auxiliary decoding networks to obtain the error between the first label data and the second label data of the training image.
S42: an average of errors between the first label data and the second label data for all training images is calculated as a first loss function.
The training image set comprises a plurality of training images; the errors between the first label data and the second label data of each training image are calculated, summed, and divided by the number of training images to obtain the first loss function.
Specifically, the first loss function may be:

L_1 = (1/m) Σ_{x_i ∈ D} (1/K) Σ_{k=1}^{K} SE(d(z_i), d_a^k(z_i^k))

where L_1 represents the first loss function, D represents the training image set, m represents the number of training images included in the training image set, K represents the number of auxiliary decoding networks, x_i represents the i-th training image, d(z_i) represents the first label data of the i-th training image, d_a^k(z_i^k) represents the second label data of the i-th training image output by the k-th auxiliary decoding network, and SE represents the mean square error function.
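Assuming the first and second label data are prediction maps of identical shape, the first loss function can be sketched as follows; detaching the main output is an assumption, reflecting that only the auxiliary branches are pushed toward the main prediction:

```python
import torch.nn.functional as F

def first_loss(first_label, second_labels):
    # first_label: d(z_i) for a batch; second_labels: list of K tensors d_a^k(z_i^k)
    errs = [F.mse_loss(s, first_label.detach()) for s in second_labels]
    return sum(errs) / len(errs)   # average over the K auxiliary decoding networks
```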
As previously mentioned, the training image set D comprises both first images without labels and second images with labels; the label carried by a second image x_j may be recorded as third label data y_j. At this point, continuing to refer to fig. 8, training the semantic segmentation network based on the first loss function may include:
s43: a second loss function is obtained using the first label data and the third label data of the second image.
Specifically, as shown in fig. 10, fig. 10 is a schematic flowchart of an embodiment of S43 in fig. 8, where the step may include:
s431: an error between the first label data and the third label data of the second image is calculated using a cross entropy loss function.
In some embodiments, other loss functions may also be used to calculate the error between the first label data and the third label data of the second image, such as a mean square error loss function, a mean absolute value error loss function, a Huber loss function, and the like, which is not limited in this application and can be selected by one skilled in the art according to practical needs.
S432: an average of errors between the first label data and the third label data of all the second images is calculated as a second loss function.
The second loss function is obtained by calculating the error between the first label data and the third label data of each second image, summing the errors, and dividing the sum by the number of second images.
That is, the second loss function may be:

L_2 = (1/n) Σ_{x_j ∈ D_2} CE(d(z_j), y_j)

where L_2 represents the second loss function, n represents the number of second images in the second image set D_2, y_j represents the third label data of the j-th second image, d(z_j) represents the first label data of the j-th second image, and CE represents the cross entropy function.
S44: a third loss function is formed using the second loss function and the first loss function.
Specifically, the third loss function may be expressed as:

L_3 = L_2 + ω_1 · L_1

where ω_1 represents the weight of the first loss function L_1; for example, ω_1 may be 1. This application does not limit the value of ω_1, which can be selected by those skilled in the art according to actual needs.
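Put together, a sketch of the third loss function, reusing the `first_loss` helper sketched earlier and taking ω_1 = 1 as the example value:

```python
import torch.nn.functional as F

def third_loss(first_label_labeled, y, first_label, second_labels, omega1=1.0):
    l2 = F.cross_entropy(first_label_labeled, y)   # second loss: CE on labeled second images
    l1 = first_loss(first_label, second_labels)    # first loss: MSE consistency term
    return l2 + omega1 * l1                        # third loss to be minimized
```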
S45: the semantic segmentation network is trained with the goal of reducing the third loss function.
In some embodiments, S43 may precede S41, or S43 may be performed simultaneously with S41; this is not limited here and can be chosen by those skilled in the art according to actual needs. As noted above, for steps without a necessary precedence relationship, the order indicated by the step numbers does not represent the actual execution order.
On one hand, the semantic segmentation network of the present application comprises an encoding network and a decoding network, and the decoding network comprises a main decoding network and auxiliary decoding networks; by introducing perturbation settings into the auxiliary decoding networks and improving the original DeepLabV3+ deep learning network based on the domain adaptation principle, the accuracy and robustness of the semantic segmentation network are improved. On the other hand, the semi-supervised semantic segmentation network obtained with this training method can accurately segment images even when little label data is available. The segmentation results have natural, smooth boundaries that are easy to observe; the network achieves high accuracy and robustness, reduces the number of labels required for training, and generalizes well. It is therefore more flexible, can adapt to various segmentation scenes, and maintains high segmentation accuracy and robustness even for types of surgical instruments that appear infrequently.
The DeepLabV3+ network based on the domain adaptation principle was implemented on the public CATARACTS Semantic Segmentation 2020 dataset and achieved good results; the results have high reliability and can basically meet the safety requirements in the field of medical image processing.
Specifically, the CATARACTS Semantic Segmentation 2020 dataset was selected, which includes 50 videos of cataract surgery performed at Brest University Hospital between January 22, 2015 and September 10, 2015. Over 9 hours of surgical video were recorded in total: the training set contains 4 hours 42 minutes of video and the test set 4 hours 24 minutes, so the number of samples is sufficient. The dataset has 25 video subsets, with the training, validation, and test sets containing 3550, 534 (video subsets 5, 7, and 16), and 587 (video subsets 2, 12, and 22) frames, respectively. The input picture resolution was 512 × 512, and a stochastic gradient descent (SGD) optimizer was used.
The Intersection over Union (IoU) metric was relied upon to compare the quality of the algorithms' processing results and obtain the final experimental result. Throughout the whole process, theoretical and methodological research proceeded in parallel with algorithm implementation and verification, and the two were optimized alternately.
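For reference, a minimal per-class Intersection over Union computation (an illustrative sketch, not the challenge's official evaluation code):

```python
import numpy as np

def iou(pred, gt, cls):
    # pred, gt: integer label maps; cls: class index of interest
    p, g = pred == cls, gt == cls
    union = np.logical_or(p, g).sum()
    return np.logical_and(p, g).sum() / union if union > 0 else float('nan')
```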
Referring to fig. 11, fig. 11 is a schematic flowchart illustrating an embodiment of an image processing method according to the present application, the image processing method including:
s201: an intra-surgical image is acquired.
For example, the intra-surgical images may be acquired by a camera or video recorder.
S202: and processing the operation image by utilizing a semantic segmentation network, wherein the semantic segmentation network is obtained by training through the training method.
Specifically, the semantic segmentation network obtained through training by the training method is used for performing semantic segmentation on the operation image.
S202: and obtaining the position information of the surgical instrument used in the surgical operation based on the processing result of the semantic segmentation network on the surgical image.
For example, the image processing method can be applied to image processing for cataract surgery and can accurately segment the surgical image even with little label data. The boundaries of the segmentation results are natural and smooth and easy to observe, high accuracy and robustness are achieved, and a reliable reference is provided for the operating personnel. Of course, the image processing method can also be used in other endoscopic and microscopic surgeries; this application is not limited, and those skilled in the art can choose according to actual needs.
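An illustrative inference sketch for S201 to S203 is given below, assuming the network structure sketched earlier and an assumed instrument class index; deriving position as mask centroid and bounding box is one simple choice, not mandated by the disclosure:

```python
import torch
import numpy as np

INSTRUMENT_CLASS = 1  # assumed label index of the surgical instrument class

def locate_instrument(model, frame):
    # frame: preprocessed image tensor of shape (C, H, W)
    with torch.no_grad():
        first_label, _ = model(frame.unsqueeze(0))   # use the main decoder output
    mask = first_label.argmax(dim=1)[0].cpu().numpy() == INSTRUMENT_CLASS
    if not mask.any():
        return None                                  # no instrument detected
    ys, xs = np.nonzero(mask)
    centroid = (float(xs.mean()), float(ys.mean()))  # instrument position estimate
    bbox = (xs.min(), ys.min(), xs.max(), ys.max())  # enclosing bounding box
    return centroid, bbox
```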
Referring to fig. 12, fig. 12 is a schematic structural diagram of a terminal device according to an embodiment of the present application. The terminal device 100 comprises a processor 10 and a memory 20 coupled to each other; the memory 20 stores a computer program executed by the processor 10 to implement the steps of the training method and/or the image processing method described above.
The processor 10 controls the operation of the terminal device 100 and may also be referred to as a CPU (Central Processing Unit). The processor 10 may be an integrated circuit chip having signal processing capabilities. The processor 10 may also be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. A general purpose processor may be a microprocessor, or the processor 10 may be any conventional processor or the like.
The memory 20 may include Random Access Memory (RAM), Read Only Memory (ROM), flash memory, Erasable Programmable Read Only Memory (EPROM), Electrically Erasable Programmable Read Only Memory (EEPROM), registers, a hard disk, a removable disk, a CD-ROM, and so forth. Memory 20 may store program data, which may include a single instruction, or many instructions, for example, and may be distributed over several different code segments, among different programs, and across multiple memories 20. Memory 20 may be coupled to processor 10 such that processor 10 can read information from, and write information to, memory 20. Of course, the memory 20 may be integral to the processor 10.
Referring to fig. 13, fig. 13 is a schematic structural diagram of an embodiment of a computer-readable storage medium 200 of the present application, in which a computer program is stored, and the computer program implements the steps of the training method and/or the image processing method when being executed.
The technical solution of the present application, in essence, or the part thereof contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage device and includes instructions (program data) for causing a computer device (which may be a personal computer, a server, or a network device) or a processor to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage device includes various media that can store program code, such as a USB flash drive, a portable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk, as well as electronic devices such as computers, mobile phones, notebook computers, tablet computers, and cameras equipped with such storage media.
In the several embodiments provided in this application, it should be understood that the disclosed training method for the semantic segmentation network may be implemented in other ways. For example, the embodiments described above are merely illustrative: the division into modules or units is only a division by logical function, and other divisions are possible in actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The above description is only a part of the embodiments of the present application, and not intended to limit the scope of the present application, and all equivalent devices or equivalent processes performed by the content of the present application and the attached drawings, or directly or indirectly applied to other related technical fields, are also included in the scope of the present application.

Claims (11)

1. A training method of a semantic segmentation network, wherein the semantic segmentation network comprises an encoding network and a decoding network, the decoding network comprises a main decoding network and an auxiliary decoding network, wherein the auxiliary decoding network comprises a perturbation setting and a decoder, and the training method comprises the following steps:
inputting a training image set into the encoding network, wherein the training image set comprises a plurality of training images, and an intermediate representation of the training images is obtained;
inputting the intermediate representation into the main decoding network to obtain first label data of the training image;
inputting the intermediate representation into the perturbation setting and the decoder to obtain second label data of the training image;
and acquiring a first loss function by using the first label data and the second label data of the training image, and training the semantic segmentation network based on the first loss function.
2. The training method according to claim 1, wherein the perturbation setting connects the encoding network and the decoder in the auxiliary decoding network;
the inputting the intermediate representation into the perturbation setting and the decoder to obtain second label data of the training image comprises:
inputting the intermediate representation into the perturbation setting, generating a perturbed version of the intermediate representation,
and inputting the disturbed version into the decoder to obtain second label data of the training image.
3. The training method of claim 1, wherein the obtaining a first loss function using the first label data and the second label data of the training image comprises:
calculating an error between the first label data and the second label data of the training image using a mean square error loss function;
calculating an average of errors between the first label data and the second label data for all of the training images as the first loss function.
4. The training method of claim 3, wherein there are a plurality of the auxiliary decoding networks, each auxiliary decoding network comprising a corresponding perturbation setting;
the calculating an error between the first label data and the second label data of the training image using a mean square error loss function includes:
acquiring second label data of the training image output by each auxiliary decoding network;
calculating an error between the first label data of the training image and the second label data corresponding to each auxiliary decoding network by using a mean square error loss function;
calculating an average of errors between the first label data of the training image and the second label data corresponding to all the auxiliary decoding networks as the error between the first label data and the second label data of the training image.
5. The training method according to claim 4, wherein the perturbation setting comprises any one or more of F-Noise, F-Drop, Guided Masking, Intermediate VAT, and SpatialDropout.
6. Training method according to claim 1, wherein the set of training images comprises a first image and a second image with third label data;
before inputting the training image set into the encoding network, the method comprises the following steps:
training the semantic segmentation network with the second image;
the training the semantic segmentation network based on the first loss function includes:
obtaining a second loss function using the first label data and the third label data of the second image;
forming a third loss function using the second loss function and the first loss function;
and training the semantic segmentation network with the aim of reducing the third loss function.
7. The training method of claim 6, wherein the obtaining a second loss function using the first label data and the third label data of the second image comprises:
calculating an error between the first label data and the third label data of the second image using a cross entropy loss function;
calculating an average of errors between the first label data and the third label data for all of the second images as the second loss function.
8. The training method of claim 1, wherein, prior to inputting the training image set into the encoding network, the method comprises:
and preprocessing the training image set by utilizing a Poisson image editing algorithm.
9. An image processing method, comprising:
acquiring an intra-surgical image;
processing the surgical image by using a semantic segmentation network, wherein the semantic segmentation network is obtained by training through the training method of any one of claims 1-8;
and obtaining the position information of the surgical instrument used in the surgical operation based on the processing result of the semantic segmentation network on the surgical image.
10. A terminal device, characterized in that the terminal device comprises a processor and a memory; the memory has stored therein a computer program for executing the computer program to implement the steps of the training method of any one of claims 1 to 8 and/or the image processing method of claim 9.
11. A computer storage medium, characterized in that it stores a computer program which, when executed, implements the steps of the training method of any one of claims 1 to 8 and/or the image processing method of claim 9.
CN202110309167.6A 2021-03-23 2021-03-23 Training method of semantic segmentation network, image processing method and equipment thereof Pending CN113113119A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110309167.6A CN113113119A (en) 2021-03-23 2021-03-23 Training method of semantic segmentation network, image processing method and equipment thereof
PCT/CN2021/137599 WO2022199137A1 (en) 2021-03-23 2021-12-13 Training method for semantic segmentation network, image processing method and device thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110309167.6A CN113113119A (en) 2021-03-23 2021-03-23 Training method of semantic segmentation network, image processing method and equipment thereof

Publications (1)

Publication Number Publication Date
CN113113119A 2021-07-13

Family

Family ID: 76710438

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110309167.6A Pending CN113113119A (en) 2021-03-23 2021-03-23 Training method of semantic segmentation network, image processing method and equipment thereof

Country Status (2)

Country Link
CN (1) CN113113119A (en)
WO (1) WO2022199137A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113705666A (en) * 2021-08-26 2021-11-26 平安科技(深圳)有限公司 Segmentation network training method, using method, device, equipment and storage medium
CN114494800A (en) * 2022-02-17 2022-05-13 平安科技(深圳)有限公司 Prediction model training method and device, electronic equipment and storage medium
WO2022199137A1 (en) * 2021-03-23 2022-09-29 中国科学院深圳先进技术研究院 Training method for semantic segmentation network, image processing method and device thereof
US20230154185A1 (en) * 2021-11-12 2023-05-18 Adobe Inc. Multi-source panoptic feature pyramid network
WO2024175045A1 (en) * 2023-02-22 2024-08-29 华为技术有限公司 Model training method and apparatus, and electronic device and storage medium

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115546483B (en) * 2022-09-30 2023-05-12 哈尔滨市科佳通用机电股份有限公司 Deep learning-based method for measuring residual usage amount of carbon slide plate of subway pantograph
CN116168242B (en) * 2023-02-08 2023-12-01 阿里巴巴(中国)有限公司 Pixel-level label generation method, model training method and equipment

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111161279A (en) * 2019-12-12 2020-05-15 中国科学院深圳先进技术研究院 Medical image segmentation method and device and server

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109472360B (en) * 2018-10-30 2020-09-04 北京地平线机器人技术研发有限公司 Neural network updating method and updating device and electronic equipment
CN110097131B (en) * 2019-05-08 2023-04-28 南京大学 Semi-supervised medical image segmentation method based on countermeasure cooperative training
CN110533044B (en) * 2019-05-29 2023-01-20 广东工业大学 Domain adaptive image semantic segmentation method based on GAN
CN110909744B (en) * 2019-11-26 2022-08-19 山东师范大学 Multi-description coding method and system combined with semantic segmentation
CN111091166B (en) * 2020-03-25 2020-07-28 腾讯科技(深圳)有限公司 Image processing model training method, image processing device, and storage medium
CN112035834A (en) * 2020-08-28 2020-12-04 北京推想科技有限公司 Countermeasure training method and device, and application method and device of neural network model
CN113113119A (en) * 2021-03-23 2021-07-13 中国科学院深圳先进技术研究院 Training method of semantic segmentation network, image processing method and equipment thereof

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111161279A (en) * 2019-12-12 2020-05-15 中国科学院深圳先进技术研究院 Medical image segmentation method and device and server

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HONGYU CHEN et al.: "Semi-supervised Semantic Segmentation of Cataract Surgical Images based on DeepLab v3+", ACM *
YASSINE OUALI et al.: "Semi-Supervised Semantic Segmentation with Cross-Consistency Training", IEEE *
史攀 et al.: "A Survey of Deep Learning Applications in Minimally Invasive Surgery Video Analysis", Chinese Journal of Biomedical Engineering *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022199137A1 (en) * 2021-03-23 2022-09-29 中国科学院深圳先进技术研究院 Training method for semantic segmentation network, image processing method and device thereof
CN113705666A (en) * 2021-08-26 2021-11-26 平安科技(深圳)有限公司 Segmentation network training method, using method, device, equipment and storage medium
CN113705666B (en) * 2021-08-26 2023-10-27 平安科技(深圳)有限公司 Split network training method, use method, device, equipment and storage medium
US20230154185A1 (en) * 2021-11-12 2023-05-18 Adobe Inc. Multi-source panoptic feature pyramid network
US11941884B2 (en) * 2021-11-12 2024-03-26 Adobe Inc. Multi-source panoptic feature pyramid network
CN114494800A (en) * 2022-02-17 2022-05-13 平安科技(深圳)有限公司 Prediction model training method and device, electronic equipment and storage medium
CN114494800B (en) * 2022-02-17 2024-05-10 平安科技(深圳)有限公司 Predictive model training method and device, electronic equipment and storage medium
WO2024175045A1 (en) * 2023-02-22 2024-08-29 华为技术有限公司 Model training method and apparatus, and electronic device and storage medium

Also Published As

Publication number Publication date
WO2022199137A1 (en) 2022-09-29

Similar Documents

Publication Publication Date Title
CN113113119A (en) Training method of semantic segmentation network, image processing method and equipment thereof
CN110347799B (en) Language model training method and device and computer equipment
CN112330685B (en) Image segmentation model training method, image segmentation device and electronic equipment
CN110677598A (en) Video generation method and device, electronic equipment and computer storage medium
CN113836992B (en) Label identification method, label identification model training method, device and equipment
US20230143452A1 (en) Method and apparatus for generating image, electronic device and storage medium
CN115762484B (en) Multi-mode data fusion method, device, equipment and medium for voice recognition
CN112818995B (en) Image classification method, device, electronic equipment and storage medium
CN116050496A (en) Determination method and device, medium and equipment of picture description information generation model
CN116127080A (en) Method for extracting attribute value of description object and related equipment
CN116993864A (en) Image generation method and device, electronic equipment and storage medium
CN115620371A (en) Training method and device for speaking video generation model, electronic equipment and storage medium
CN113837179B (en) Multi-discriminant GAN network construction method, device and system for processing images and storage medium
CN114529917B (en) Zero-sample Chinese single-word recognition method, system, device and storage medium
CN112488148A (en) Clustering method and device based on variational self-encoder
CN113705276A (en) Model construction method, model construction device, computer apparatus, and medium
CN114926479A (en) Image processing method and device
CN113177957B (en) Cell image segmentation method and device, electronic equipment and storage medium
CN117556048A (en) Artificial intelligence-based intention recognition method, device, equipment and medium
CN108765413B (en) Method, apparatus and computer readable medium for image classification
CN116402831A (en) Partially-supervised abdomen CT sequence image multi-organ automatic segmentation method and device
CN111598904B (en) Image segmentation method, device, equipment and storage medium
CN112182268B (en) Image classification method, device, electronic equipment and storage medium
CN110781646B (en) Name standardization method, device, medium and electronic equipment
CN117541758B (en) Virtual face configuration parameter generation method, device, equipment and storage medium

Legal Events

Code: PB01 - Publication
Code: SE01 - Entry into force of request for substantive examination
Code: RJ01 - Rejection of invention patent application after publication (application publication date: 20210713)