CN112101453A - Network model training method and device and marking information determining method and device

Network model training method and device and marking information determining method and device

Info

Publication number
CN112101453A
Authority
CN
China
Prior art keywords
image
determining
annotated
information
probability
Prior art date
Legal status
Granted
Application number
CN202010961988.3A
Other languages
Chinese (zh)
Other versions
CN112101453B (en)
Inventor
于朋鑫
夏晨
张荣国
李新阳
王少康
陈宽
Current Assignee
Infervision Medical Technology Co Ltd
Original Assignee
Infervision Medical Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Infervision Medical Technology Co Ltd
Priority to CN202010961988.3A
Publication of CN112101453A
Application granted
Publication of CN112101453B
Status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • G06T5/77
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides a network model training method and device, an annotation information determining method and device, a computer-readable storage medium, and an electronic device, and relates to the technical field of image processing. The network model training method includes the following steps: determining an initial network model; and training the initial network model based on a sample image to be annotated and a probability image corresponding to the sample image to be annotated, so as to generate an image understanding model, where the image understanding model is used for generating, based on an image to be annotated, a feature map and a repaired image corresponding to the image to be annotated, and the probability image is used for representing the probability that a display unit in the sample image to be annotated is erased. The image understanding model can thus focus on learning the boundary information of the image (including boundary information usable for segmentation), and the trained image understanding model has image understanding capability. Therefore, the method and device can provide a data basis for the subsequent redefinition of the annotation information and further improve the redefinition accuracy of the annotation information.

Description

Network model training method and device and marking information determining method and device
Technical Field
The disclosure relates to the technical field of image processing, in particular to a training method and device of a network model, a method and device for determining labeling information, a computer-readable storage medium and an electronic device.
Background
In the field of medical image processing, image segmentation can greatly improve the working efficiency of doctors and is an important technology for assisting doctors in disease diagnosis. Image segmentation is performed on the basis of the annotation information corresponding to the object to be segmented, and therefore the quality of the annotation information directly affects the segmentation result of the object to be segmented.
However, existing methods for determining annotation information rely mainly on manual work (for example, by doctors), so their accuracy is difficult to guarantee and the requirement of accurate segmentation is even harder to meet.
Disclosure of Invention
The present disclosure is proposed to solve the above technical problems. The embodiment of the disclosure provides a network model training method and device, a method and device for determining labeling information, a computer-readable storage medium and electronic equipment.
In one aspect, an embodiment of the present disclosure provides a method for training a network model, where the method includes: determining an initial network model; and training an initial network model based on the sample image to be annotated and the probability image corresponding to the sample image to be annotated so as to generate an image understanding model. The image understanding model is used for generating a feature map and a repaired image corresponding to the image to be annotated based on the image to be annotated, and the probability image is used for representing the probability that the display unit in the sample image to be annotated is erased.
In another aspect, an embodiment of the present disclosure provides a method for determining annotation information, where the method includes: inputting an image to be annotated into an image understanding model to obtain a feature map and a repaired image corresponding to the image to be annotated, wherein the image understanding model is obtained based on the network model training method mentioned in the embodiment; and determining second annotation information corresponding to the image to be annotated based on the feature map, the repaired image, the image to be annotated and the first annotation information corresponding to the image to be annotated.
In another aspect, an embodiment of the present disclosure provides an apparatus for training a network model, where the apparatus includes: a model determination module for determining an initial network model; and the training module is used for training the initial network model based on the sample image to be labeled and the probability image corresponding to the sample image to be labeled so as to generate an image understanding model. The image understanding model is used for generating a feature map and a repaired image corresponding to the image to be annotated based on the image to be annotated, and the probability image is used for representing the probability that the display unit in the sample image to be annotated is erased.
In another aspect, an embodiment of the present disclosure provides an apparatus for determining annotation information, where the apparatus includes: the input module is used for inputting the image to be annotated into the image understanding model so as to obtain a feature map and a repaired image corresponding to the image to be annotated, wherein the image understanding model is obtained based on the network model training method mentioned in the embodiment; and the second annotation information determining module is used for determining second annotation information corresponding to the image to be annotated based on the feature map, the repaired image, the image to be annotated and the first annotation information corresponding to the image to be annotated.
In another aspect, the present disclosure provides a computer-readable storage medium, which stores a computer program for executing the training method of the network model and/or the determination method of the label information mentioned in the foregoing embodiments.
In another aspect, an embodiment of the present disclosure provides an electronic device, including: a processor and a memory for storing processor executable instructions, wherein the processor is configured to perform the method for training the network model and/or the method for determining the label information mentioned in the above embodiments.
Because the training data of the image understanding model includes the sample image to be annotated and the probability image corresponding to it, and because the image understanding model generated by training can generate the feature map corresponding to the image to be annotated based on that image and can generate the repaired image corresponding to the feature map based on the feature map, the training method provided by the embodiment of the disclosure enables the image understanding model to focus on learning the boundary information of the image (including boundary information usable for segmentation), and thus gives the trained image understanding model image understanding capability. Therefore, the image understanding model obtained by training according to the embodiment of the disclosure can provide a more accurate data basis for the subsequent redefinition of the annotation information, and thus provides a precondition for improving the redefinition accuracy of the annotation information.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent by describing in more detail embodiments of the present disclosure with reference to the attached drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the principles of the disclosure and not to limit the disclosure. In the drawings, like reference numbers generally represent like parts or steps.
Fig. 1 is a schematic view of a scene to which the embodiment of the present disclosure is applied.
Fig. 2 is a schematic diagram of another scenario in which the embodiment of the present disclosure is applied.
Fig. 3 is a schematic flowchart illustrating a method for training a network model according to an exemplary embodiment of the present disclosure.
Fig. 4 is a schematic flowchart illustrating a process of training an initial network model based on a to-be-labeled sample image and a probability image corresponding to the to-be-labeled sample image according to an exemplary embodiment of the present disclosure.
Fig. 5 is a schematic flow chart illustrating a process of generating an erased image corresponding to a sample image to be labeled based on the sample image to be labeled and a probability image according to an exemplary embodiment of the present disclosure.
Fig. 6 is a schematic flow chart illustrating a process of generating an erased image corresponding to a sample image to be labeled based on the sample image to be labeled and a probability image according to another exemplary embodiment of the present disclosure.
Fig. 7 is a schematic flowchart illustrating a process of determining an erased image at the second stage based on distance information between first erased areas included in the first erased area set according to another exemplary embodiment of the present disclosure.
Fig. 8 is a flowchart illustrating a method for training a network model according to another exemplary embodiment of the present disclosure.
Fig. 9 is a schematic flowchart illustrating a process of determining a probability image based on first annotation information and a sample image to be annotated according to an exemplary embodiment of the disclosure.
Fig. 10 is a schematic flowchart illustrating a process of generating a probability image based on a first distance set and a second distance set according to an exemplary embodiment of the present disclosure.
Fig. 11 is a flowchart illustrating a method for determining annotation information according to an exemplary embodiment of the present disclosure.
Fig. 12 is a schematic flow chart illustrating that the second annotation information corresponding to the image to be annotated is determined based on the feature map, the repaired image, and the image to be annotated according to an exemplary embodiment of the present disclosure.
Fig. 13 is a schematic flow chart illustrating a process of determining second annotation information corresponding to an image to be annotated based on a feature map, a repaired image, and the image to be annotated according to another exemplary embodiment of the present disclosure.
Fig. 14 is a schematic flowchart illustrating the determination of the second annotation information based on the L1 distance set and the feature map according to an exemplary embodiment of the present disclosure.
Fig. 15 is a schematic flowchart illustrating a first redefinition operation performed on the first annotation information based on the first L1 distance subset, the first redefinition threshold, and the feature map according to an exemplary embodiment of the present disclosure.
Fig. 16 is a schematic flowchart illustrating a process of determining a redefined labeled value corresponding to a second pixel position based on a feature map according to an exemplary embodiment of the disclosure.
FIG. 17 is a flowchart illustrating a process of determining second annotation information based on first annotation information undergoing a first redefinition operation according to an exemplary embodiment of the present disclosure.
Fig. 18 is a schematic structural diagram of a training apparatus for a network model according to an exemplary embodiment of the present disclosure.
Fig. 19 is a schematic structural diagram of a training apparatus for a network model according to another exemplary embodiment of the present disclosure.
Fig. 20 is a schematic structural diagram of a device for determining annotation information according to an exemplary embodiment of the present disclosure.
Fig. 21 is a schematic structural diagram of a device for determining annotation information according to another exemplary embodiment of the present disclosure.
Fig. 22 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present disclosure.
Detailed Description
Hereinafter, example embodiments according to the present disclosure will be described in detail with reference to the accompanying drawings. It is to be understood that the described embodiments are merely a subset of the embodiments of the present disclosure and not all embodiments of the present disclosure, with the understanding that the present disclosure is not limited to the example embodiments described herein.
In recent years, with the rapid development of intelligent technologies, image processing technologies and computer vision technologies have attracted much attention. The importance of image segmentation techniques as an important component in the field of image processing technology and in the field of computer vision technology is self-evident. Specifically, the image segmentation technique refers to a technique of extracting an object to be segmented from an image.
The image segmentation technique performs segmentation on the basis of the annotation information corresponding to the object to be segmented; therefore, the quality of the annotation information directly affects the segmentation result of the object to be segmented. This is particularly true in the field of medical image segmentation, where the number of images is huge and tissue-structure boundaries are often unclear, so determining the annotation information of a medical image is especially difficult. However, existing methods for determining annotation information rely mainly on manual work (for example, by doctors), which is inefficient, makes accuracy difficult to guarantee, and makes the requirement of accurate segmentation even harder to meet.
Based on the above mentioned technical problems, the basic concept of the present disclosure is to provide a method and an apparatus for training a network model, a method and an apparatus for determining label information, a computer-readable storage medium, and an electronic device.
Because the training data of the image understanding model are the sample image to be annotated and the probability image corresponding to it, and because the image understanding model generated by training can generate the feature map corresponding to the image to be annotated based on that image and can generate the restored image corresponding to the feature map based on the feature map, the training method provided by the embodiment of the disclosure enables the image understanding model to focus on learning the boundary information of the image (including boundary information usable for segmentation), and thus gives the trained image understanding model image understanding capability. Therefore, the image understanding model obtained by training according to the embodiment of the disclosure can provide a more accurate data basis for the subsequent redefinition of the annotation information, and thus provides a precondition for improving the redefinition accuracy of the annotation information.
The method for determining annotation information provided by the embodiment of the disclosure can effectively correct the first annotation information based on the image understanding model, so as to obtain second annotation information with higher accuracy. In addition, when the obtained second annotation information is used in the subsequent training of an image segmentation model, it can effectively improve the stability and robustness of the trained image segmentation model, avoid large losses of the model at ambiguously annotated positions, and thereby improve the performance of the image segmentation model.
Having described the general principles of the present disclosure, various non-limiting embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings.
Fig. 1 is a schematic view of a scene to which the embodiment of the present disclosure is applied. As shown in fig. 1, a scenario to which the embodiment of the present disclosure is applied includes a server 1 and an image capturing device 2, where there is a communication connection relationship between the server 1 and the image capturing device 2.
Specifically, the image acquisition device 2 is configured to acquire a sample image to be annotated, and the server 1 is configured to determine an initial network model, acquire a probability image corresponding to the sample image to be annotated, and train the initial network model based on the sample image to be annotated and the probability image corresponding to the sample image to be annotated, so as to generate an image understanding model. The image understanding model is used for generating a feature map and a repaired image corresponding to the image to be annotated based on the image to be annotated, and the probability image is used for representing the probability that the display unit in the sample image to be annotated is erased. That is, this scenario implements a network model training method.
Or, the image acquisition device 2 is configured to acquire an image to be annotated, and the server 1 is configured to input the image to be annotated into the image understanding model to obtain a feature map and a repaired image corresponding to the image to be annotated, and determine second annotation information corresponding to the image to be annotated based on the feature map, the repaired image, the image to be annotated, and first annotation information corresponding to the image to be annotated. That is, the scenario implements a method for determining annotation information. The image understanding model mentioned in this scenario can be generated based on the above network model training method scenario. Since the scene shown in fig. 1 uses the server 1 to implement the network model training method and/or the determination method of the annotation information, the adaptive capacity of the scene can be improved, and the calculation amount of the image acquisition device 2 can be effectively reduced.
It should be noted that the present disclosure is also applicable to another scenario. Fig. 2 is a schematic diagram of another scenario in which the embodiment of the present disclosure is applied. Specifically, the image processing apparatus 3 is included in the scene, and an image acquisition module 301 and a calculation module 302 are included in the image processing apparatus 3.
Specifically, the image acquisition module 301 in the image processing device 3 is configured to acquire a sample image to be annotated, and the calculation module 302 in the image processing device 3 is configured to determine an initial network model, obtain a probability image corresponding to the sample image to be annotated, and train the initial network model based on the sample image to be annotated and the probability image corresponding to the sample image to be annotated, so as to generate an image understanding model. The image understanding model is used for generating a feature map and a repaired image corresponding to the image to be annotated based on the image to be annotated, and the probability image is used for representing the probability that the display unit in the sample image to be annotated is erased. That is, this scenario implements a network model training method.
Or, the image acquisition module 301 in the image processing device 3 is configured to acquire an image to be annotated, and the calculation module 302 in the image processing device 3 is configured to input the image to be annotated into the image understanding model to obtain a feature map and a restored image corresponding to the image to be annotated, and determine second annotation information corresponding to the image to be annotated based on the feature map, the restored image, the image to be annotated, and the first annotation information corresponding to the image to be annotated. That is, the scenario implements a method for determining annotation information. The image understanding model mentioned in this scenario can be generated based on the above network model training method scenario. Since the scene shown in fig. 2 uses the image processing device 3 to implement the network model training method and/or the method for determining the annotation information, and does not need to perform data transmission operation with a server or other related devices, the scene can ensure the real-time performance of the network model training method or the method for determining the annotation information.
It should be noted that the image acquisition device 2 and the image acquisition module 301 mentioned in the above scenarios include, but are not limited to, image acquisition devices such as an X-ray machine, a CT scanner, and an MRI apparatus. Correspondingly, the sample image to be annotated and the image to be annotated acquired by the image acquisition device 2 and the image acquisition module 301 include, but are not limited to, medical image data such as X-ray images, CT images, and MRI images, which can present information such as the structure and density of tissues and organs inside a human or animal body. In addition, it should be understood that the network model training method and the method for determining annotation information provided by the embodiments of the present disclosure are not limited to the above-mentioned medical-image application scenarios; any application scenario that requires determining annotation information falls within the scope of application of the embodiments of the present disclosure.
Fig. 3 is a schematic flow chart of a network model training method according to an exemplary embodiment of the present disclosure. As shown in fig. 3, the network model training method provided by the embodiment of the present disclosure includes the following steps.
Step S100, an initial network model is determined.
Optionally, the initial network model mentioned in step S100 is a deep learning based neural network model. The specific model structure of the initial network model can be flexibly set according to the actual situation.
And S200, training an initial network model based on the sample image to be annotated and the probability image corresponding to the sample image to be annotated to generate an image understanding model.
Illustratively, the probability image is used to characterize the probability that the display unit in the sample image to be labeled is erased. It should be understood that each display unit in the sample image to be labeled corresponds to a probability value of being erased. For example, the sample image to be labeled is a two-dimensional image including a plurality of display units, correspondingly, the probability image is a two-dimensional image including a plurality of probability units, and the size of the sample image to be labeled is completely the same as the size of the probability image. The probability units are in one-to-one correspondence with the display units, and each probability unit represents the probability that the display unit at the same pixel position (also called an image position) is erased. It should be understood that the display unit can be a single pixel unit, or a pixel unit set formed by a plurality of pixel units.
Preferably, the probability value corresponding to a probability unit included in the probability image falls within the closed interval [0, 1]. When the probability value corresponding to a probability unit is 0, the probability of the corresponding display unit being erased is 0, that is, the display unit will not be erased. When the probability value corresponding to a probability unit is 1, the probability of the corresponding display unit being erased is 1, that is, it will certainly be erased. Limiting the probability value corresponding to the probability unit to the closed interval [0, 1] can effectively reduce the amount of computation of subsequent operations such as image erasing, and ultimately speeds up the determination of the annotation information.
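To make this correspondence concrete, the following minimal illustration assumes that each display unit is a single pixel and that the images are stored as numpy arrays; the specific values are invented purely for the example:

```python
import numpy as np

# The sample image to be annotated and its probability image share the same size;
# each probability value gives the chance that the display unit at the same pixel
# position is erased, and the values lie in the closed interval [0, 1].
sample_image = np.array([[34., 78., 12.],
                         [ 4., 56., 90.],
                         [23., 67., 45.]])

probability_image = np.array([[0.21, 0.56, 0.00],
                              [0.00, 0.80, 0.10],
                              [0.05, 0.00, 1.00]])

assert sample_image.shape == probability_image.shape
probability_image = np.clip(probability_image, 0.0, 1.0)  # keep values in [0, 1]
```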
The image understanding model mentioned in step S200 is used to generate a feature map and a restored image corresponding to the image to be annotated based on the image to be annotated. Preferably, the image to be labeled and the sample image to be labeled should be the same type of image (for example, both lung medical images), so that the image to be labeled can be better processed by the image understanding model trained based on the sample image to be labeled.
Preferably, the sample images to be annotated include the image to be annotated. For example, the sample images to be annotated include an image M and an image N, and the image to be annotated is the image N. More preferably, the sample image to be annotated and the image to be annotated are identical, for example the same three-dimensional image sequence. In other words, the embodiment of the present disclosure first trains the image understanding model using sample images that contain (or are identical to) the image whose annotation is to be redefined, and then uses the trained image understanding model to redefine the annotation of that image. This arrangement ensures a good fit between the image understanding model and the annotation redefinition operation, and thus improves the precision of the subsequent annotation redefinition.
Exemplarily, the training of the initial network model based on the sample image to be annotated and the probability image corresponding to the sample image to be annotated mentioned in step S200 refers to adjusting model parameters of the initial network model based on the sample image to be annotated and the probability image, and further generating an image understanding model capable of generating a feature map corresponding to the image to be annotated and a restored image based on the image to be annotated.
In the practical application process, an initial network model is determined first, and the initial network model is then trained based on the sample image to be annotated and the probability image corresponding to the sample image to be annotated, so as to obtain an image understanding model that is used for generating the feature map and the restored image corresponding to the image to be annotated.
Because the training data of the image understanding model includes the sample image to be annotated and the probability image corresponding to it, and because the image understanding model generated by training can generate the feature map corresponding to the image to be annotated based on that image and can generate the repaired image corresponding to the feature map based on the feature map, the training method provided by the embodiment of the disclosure enables the image understanding model to focus on learning the boundary information of the image (including boundary information usable for segmentation), and thus gives the trained image understanding model image understanding capability. Therefore, the image understanding model obtained by training according to the embodiment of the disclosure can provide a more accurate data basis for the subsequent redefinition of the annotation information, and thus provides a precondition for improving the redefinition accuracy of the annotation information.
Fig. 4 is a schematic flowchart illustrating a process of training an initial network model based on a to-be-labeled sample image and a probability image corresponding to the to-be-labeled sample image according to an exemplary embodiment of the present disclosure. The embodiment shown in fig. 4 of the present disclosure is extended on the basis of the embodiment shown in fig. 3 of the present disclosure, and the differences between the embodiment shown in fig. 4 and the embodiment shown in fig. 3 are emphasized below, and the descriptions of the same parts are omitted.
As shown in fig. 4, in the training method of a network model provided in the embodiment of the present disclosure, an initial network model includes a feature extraction module and an image restoration module connected to the feature extraction module. The method comprises the following steps of training an initial network model based on a sample image to be annotated and a probability image corresponding to the sample image to be annotated.
And step S210, generating an erasing image corresponding to the sample image to be annotated based on the sample image to be annotated and the probability image.
The probability image can represent the probability that the display unit in the sample image to be labeled is erased, so that an erased image corresponding to the sample image to be labeled can be generated based on the probability image and the sample image to be labeled. It should be understood that the size of the erased image is the same as the size of the sample image to be annotated.
In step S220, the erased image is input to the feature extraction module to generate a feature map.
Illustratively, the size of the erased image is 1xHxW, and the size of the feature map generated by the feature extraction module is CxHxW. In other words, the feature extraction module performs a feature extraction operation on the erased image and outputs a high-dimensional tensor consisting of C feature maps, where C is a positive integer greater than 1.
Step S230, inputting the feature map into an image restoration module to generate a restored image.
On the basis of the feature map generated by the feature extraction module, the feature map is input into the image restoration module to generate the restored image. Illustratively, the size of the restored image is 1xHxW. It should be understood that the restored image is obtained by performing an image restoration operation on the feature map.
In one embodiment of the present disclosure, the image restoration module includes a 3x3 convolution, which is used to assist the image restoration operation.
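Since the patent leaves the concrete model structure open (see step S100), the following PyTorch sketch only illustrates the interfaces described above: a feature extraction module mapping a 1xHxW erased image to a CxHxW feature map, and an image restoration module whose 3x3 convolution produces a 1xHxW restored image. The encoder depth, the channel count C = 64, and all other layer choices are assumptions, not the patented architecture.

```python
import torch
import torch.nn as nn

class FeatureExtractionModule(nn.Module):
    """Maps an erased image of size 1xHxW to a feature map of size CxHxW."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.body(x)  # CxHxW feature map

class ImageRestorationModule(nn.Module):
    """Restores a 1xHxW image from the CxHxW feature map with a 3x3 convolution."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.restore = nn.Conv2d(channels, 1, kernel_size=3, padding=1)

    def forward(self, feature_map: torch.Tensor) -> torch.Tensor:
        return self.restore(feature_map)

class InitialNetworkModel(nn.Module):
    """Initial network model: feature extraction module followed by an image restoration module."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.feature_extractor = FeatureExtractionModule(channels)
        self.image_restorer = ImageRestorationModule(channels)

    def forward(self, erased_image: torch.Tensor):
        feature_map = self.feature_extractor(erased_image)
        restored_image = self.image_restorer(feature_map)
        return feature_map, restored_image
```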
And S240, training an initial network model based on the sample image to be annotated and the restored image.
It should be understood that, in the practical application process, the initial network model may be trained end to end based on the L1 loss between the restored image output by the image restoration module and the sample image to be annotated, finally generating the image understanding model. The image understanding model obtained after training can generate a high-precision feature map and restored image corresponding to the image to be annotated, based on the image to be annotated.
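A minimal end-to-end optimisation loop for step S240 might look as follows. It assumes the InitialNetworkModel sketched above, a data loader yielding (sample image, probability image) pairs, and an erase_image helper implementing the erasing operation of step S210 (a sketch of that helper appears in the discussion of Fig. 5 below); the optimiser, learning rate, and epoch count are arbitrary placeholders.

```python
import torch
import torch.nn.functional as F

def train_image_understanding_model(model, data_loader, num_epochs=10, lr=1e-4, device="cpu"):
    """End-to-end training driven by the L1 loss between the restored image
    and the original (un-erased) sample image to be annotated."""
    model = model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(num_epochs):
        for sample_image, probability_image in data_loader:
            sample_image = sample_image.to(device)
            probability_image = probability_image.to(device)
            # The erasing is re-sampled every iteration; the probability image stays fixed.
            erased_image = erase_image(sample_image, probability_image)
            _, restored_image = model(erased_image)
            loss = F.l1_loss(restored_image, sample_image)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```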
According to the training method of the network model provided by the embodiment of the present disclosure, the erased image corresponding to the sample image to be annotated is generated based on the sample image to be annotated and the probability image, the erased image is input into the feature extraction module to generate the feature map, the feature map is input into the image restoration module to generate the restored image, and the initial network model is then trained based on the sample image to be annotated and the restored image; this achieves the purpose of training the initial network model based on the sample image to be annotated and the corresponding probability image. The image understanding model obtained through training thus has a better image understanding capability, in particular for the boundary information of the image, which in turn provides a precondition for improving the accuracy of the subsequently determined annotation information.
Fig. 5 is a schematic flow chart illustrating a process of generating an erased image corresponding to a sample image to be labeled based on the sample image to be labeled and a probability image according to an exemplary embodiment of the present disclosure. The embodiment shown in fig. 5 of the present disclosure is extended on the basis of the embodiment shown in fig. 4 of the present disclosure, and the differences between the embodiment shown in fig. 5 and the embodiment shown in fig. 4 are emphasized below, and the descriptions of the same parts are omitted.
As shown in fig. 5, in the training method of a network model provided in the embodiment of the present disclosure, the step of generating an erased image corresponding to a sample image to be labeled based on the sample image to be labeled and a probability image includes the following steps.
And step S211, carrying out erasing operation on the sample image to be annotated based on the probability image so as to generate a first-stage erasing image.
Note that the erasing operation mentioned in step S211 is performed to erase information contained in the erased display unit to generate a first-stage erased image with information missing. For example, the pixel values of the display elements that need to be erased are replaced with a globally uniform value. The global uniform value mentioned here may be 0 or the maximum pixel value of all display units included in the sample image to be labeled.
In an embodiment of the present disclosure, step S211 includes: determining the erasing probability corresponding to each display unit in the sample image to be marked based on the probability image; and based on the erasing probability corresponding to each display unit in the sample image to be annotated, erasing the sample image to be annotated so as to generate a first-stage erasing image.
The following illustrates the implementation procedure of step S211, and in the following example, the pixel value of the display unit that needs to be erased in the sample image to be labeled is replaced by the minimum pixel value of the display unit in the sample image to be labeled.
The size of the sample image to be annotated is 1x3x3, and the size of the corresponding probability image is also 1x3x3. The matrix forms of the sample image to be annotated (matrix A) and the probability image (matrix B) are as follows.
[Matrix A and matrix B are shown as an embedded formula image in the original document.]
Combining matrices A and B, it can be seen that the probability of erasing the display unit in row 1, column 1 of matrix A (pixel value 34) is 0.21, the probability of erasing the display unit in row 1, column 2 of matrix A (pixel value 78) is 0.56, and so on. From matrix A it can also be seen that the minimum pixel value of the display units included in the sample image to be annotated is 4.
During the actual erasing operation, an erasing decision needs to be made for each display unit included in matrix A. For example, if it is determined that the display units in row 1, column 1 and row 2, column 2 of matrix A need to be erased, the matrix form of the first-stage erased image obtained after the erasing operation (matrix C) is as follows.
[Matrix C is shown as an embedded formula image in the original document.]
In step S215, the first-stage erased image is determined as an erased image.
According to the training method of the network model provided by the embodiment of the disclosure, the erasing operation is performed on the sample image to be marked based on the probability image to generate the first-stage erasing image, and the first-stage erasing image is determined as the erasing image, so that the purpose of generating the erasing image corresponding to the sample image to be marked based on the sample image to be marked and the probability image is achieved.
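A compact sketch of the first-stage erasing (steps S211 and S215) is given below. It assumes that each display unit is a single pixel, that the per-unit erase decision is drawn from a Bernoulli distribution with the probability given by the probability image, and that erased units are replaced by the minimum pixel value of the sample image, as in the worked example above.

```python
import torch

def erase_image(sample_image: torch.Tensor, probability_image: torch.Tensor) -> torch.Tensor:
    """First-stage erasing: sample a per-display-unit erase decision from the
    probability image and replace erased units with the minimum pixel value."""
    erase_mask = torch.bernoulli(probability_image).bool()  # True = erase this unit
    erased = sample_image.clone()
    erased[erase_mask] = sample_image.min()
    return erased
```

With the matrices of the example above, the positions in row 1, column 1 and row 2, column 2 would be set to 4 whenever they are sampled for erasure.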
It should be noted that, in the training phase of the initial network model, the erasing operation mentioned in the above embodiment may also be iterated multiple times, so as to train the initial network model for multiple iterations. It should be understood that each iteration may result in a corresponding first phase erased image, and that the first phase erased image resulting from each iteration is different, but the probability image used for each iteration is unchanged.
Fig. 6 is a schematic flow chart illustrating a process of generating an erased image corresponding to a sample image to be labeled based on the sample image to be labeled and a probability image according to another exemplary embodiment of the present disclosure. The embodiment shown in fig. 6 of the present disclosure is extended on the basis of the embodiment shown in fig. 5 of the present disclosure, and the differences between the embodiment shown in fig. 6 and the embodiment shown in fig. 5 are emphasized below, and the descriptions of the same parts are omitted.
As shown in fig. 6, in the training method of a network model provided in the embodiment of the present disclosure, before determining the first-stage erased image as the erased image step, the following steps are further included.
Step S212, a first erased area set is generated based on the size information corresponding to each of the plurality of first erased areas and a preset size condition.
Illustratively, first erasing regions which do not meet the preset size condition are filtered out based on the preset size condition, and a first erasing region set is generated based on the first erasing regions which meet the preset size condition.
For example, the preset size condition is that the minimum side length of the minimum bounding rectangle of the erasing area is greater than or equal to 3 pixels. Then, of the plurality of first erasing areas mentioned in step S212, the first erasing area whose minimum side length of the minimum bounding rectangle is greater than or equal to 3 pixels is added to the first erasing area set, and the first erasing area whose minimum side length of the minimum bounding rectangle is less than 3 pixels is revoked, that is, the erasing operation for the first erasing area is revoked.
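One way to realise step S212 is sketched below. It assumes that the first erased areas are the connected components of the erase mask of the first-stage erased image, uses scipy for the component labelling, and follows the 3-pixel example above for the preset size condition; "revoking" an erasure is interpreted as keeping the original pixel values at those positions.

```python
import numpy as np
from scipy import ndimage

def filter_erased_areas(sample_image: np.ndarray, erase_mask: np.ndarray, min_side: int = 3):
    """Keep only first erased areas whose minimum bounding rectangle has a minimum
    side length of at least `min_side` pixels; erasures in smaller areas are revoked."""
    labeled, _ = ndimage.label(erase_mask)
    kept_mask = np.zeros_like(erase_mask, dtype=bool)
    for region_id, region_slices in enumerate(ndimage.find_objects(labeled), start=1):
        if region_slices is None:
            continue
        sides = [s.stop - s.start for s in region_slices]
        if min(sides) >= min_side:
            kept_mask[region_slices] |= (labeled[region_slices] == region_id)
    # Rebuild the erased image: only the kept areas stay erased
    # (fill value follows the minimum-pixel-value convention of the earlier example).
    erased = sample_image.copy()
    erased[kept_mask] = sample_image.min()
    return erased, kept_mask
```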
In step S213, the second-stage erased image is determined based on the distance information between the first erased areas included in the first erased area set.
Because the distance information between the first erasing areas can effectively reflect the relevance between the first erasing areas, the determined second-stage erasing images can be optimized by using the distance information between the first erasing areas, for example, the erasing precision and the erasing consistency are improved.
In the training method of the network model provided by the embodiment of the disclosure, the step of determining the first-stage erased image as the erased image includes the following steps.
In step S2151, the second-stage erased image is determined as an erased image.
In the practical application process, firstly, erasing operation is carried out on a sample image to be marked based on a probability image so as to generate a first-stage erased image, then, a first erased area set is generated based on size information corresponding to each of a plurality of first erased areas and a preset size condition, then, an erased image at a second stage is determined based on distance information between the first erased areas included in the first erased area set, and then, the erased image at the second stage is determined as the erased image.
The training method for the network model provided by the embodiment of the disclosure can avoid the situation that an excessively small erasing area interferes with the learning accuracy of the model, and can optimize the determined erasing image by using the distance information between the first erasing areas.
One implementation of step S213 in the embodiment shown in fig. 6 is specifically described below with reference to fig. 7.
Fig. 7 is a schematic flowchart illustrating a process of determining an erased image at the second stage based on distance information between first erased areas included in the first erased area set according to another exemplary embodiment of the present disclosure. As shown in fig. 7, in the embodiment of the present disclosure, the step of determining the second-stage erasing image based on the distance information between the first erasing areas included in the first erasing area set includes the following steps.
In step S2131, for any two first erased areas in the first erased area set, the Euclidean distance between the two first erased areas is calculated.
Step S2132, determining whether the Euclidean distance meets a preset connection condition.
Illustratively, the preset connection condition is that the Euclidean distance between the two first erased areas is smaller than the maximum diameter of the smaller of the two first erased areas.
In step S2132, when the Euclidean distance is determined to meet the preset connection condition, the two first erased areas are connected (that is, step S2133 is executed); when the Euclidean distance is determined not to meet the preset connection condition, no connection processing is performed (step S2134 is executed).
In step S2133, the two first erased areas are connected.
In step S2134, no connection processing is performed.
Step S2135 is to generate a second-stage erased image based on the first erased area set subjected to the connectivity processing.
Illustratively, the erased image containing the first erased area set after the connectivity processing is taken as the second-stage erased image.
The embodiment of the disclosure can improve the consistency of the internal information of the erased area, so as to better assist in training the image understanding model by using the erased image.
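A possible reading of steps S2131 to S2135 is sketched below, with three explicit assumptions that the patent does not pin down: the Euclidean distance between two first erased areas is taken as the minimum distance between their pixels, the "maximum diameter" of an area is taken as the largest pairwise distance within it, and "connecting" two areas is interpreted as additionally erasing the pixels on the straight line between their two closest points.

```python
import numpy as np
from scipy.spatial.distance import cdist

def max_diameter(coords):
    """Largest pairwise distance within one erased area (assumed 'maximum diameter')."""
    return cdist(coords, coords).max() if len(coords) > 1 else 0.0

def connect_erased_areas(erase_mask, regions):
    """regions: list of (N_i, 2) arrays of pixel coordinates, one per first erased area.
    Connects every pair whose distance is below the maximum diameter of the smaller area."""
    mask = erase_mask.copy()
    for i in range(len(regions)):
        for j in range(i + 1, len(regions)):
            a, b = regions[i], regions[j]
            pair_dist = cdist(a, b)
            smaller = a if len(a) <= len(b) else b
            if pair_dist.min() < max_diameter(smaller):  # preset connection condition
                ia, ib = np.unravel_index(pair_dist.argmin(), pair_dist.shape)
                # Erase the pixels on the straight line between the two closest points.
                steps = int(pair_dist.min()) + 2
                for t in np.linspace(0.0, 1.0, steps):
                    r, c = np.round(a[ia] * (1 - t) + b[ib] * t).astype(int)
                    mask[r, c] = True
    return mask
```

The regions list can be obtained, for example, by applying np.argwhere to each connected component of the erase mask kept after the size filtering of step S212.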
Fig. 8 is a flowchart illustrating a network model training method according to another exemplary embodiment of the present disclosure. The embodiment shown in fig. 8 of the present disclosure is extended on the basis of the embodiment shown in fig. 3 of the present disclosure, and the differences between the embodiment shown in fig. 8 and the embodiment shown in fig. 3 are emphasized below, and the descriptions of the same parts are omitted.
As shown in fig. 8, in the training method of a network model provided in the embodiment of the present disclosure, before the step of training an initial network model based on a sample image to be labeled and a probability image corresponding to the sample image to be labeled, the following steps are further included.
Step S150, determining first annotation information corresponding to the sample image to be annotated.
Illustratively, the first annotation information mentioned in step S150 refers to the initial annotation information before the annotation information is redefined. For example, the sample image to be labeled is a medical image, and the first labeling information is labeling information given by a professional doctor based on the sample image to be labeled.
In an embodiment of the present disclosure, the sample image to be annotated can be divided into a foreground region and a background region based on the first annotation information. Specifically, the first annotation information is embodied in the form of an image, that is, as a first annotation image. The size of the first annotation image is the same as that of the sample image to be annotated, and the first annotation image includes annotation units in one-to-one correspondence with the display units in the sample image to be annotated. An annotation unit whose value is 1 in the first annotation image corresponds to the foreground region of the sample image to be annotated, and an annotation unit whose value is 0 corresponds to the background region of the sample image to be annotated.
And step S160, determining a probability image based on the first annotation information and the sample image to be annotated.
Because the probability image is determined based on the first annotation information and the sample image to be annotated, the probability image determined by the embodiment of the disclosure can make full use of effective information in the first annotation information, and further improve the image understanding accuracy of the image understanding model obtained based on the probability image training.
Fig. 9 is a schematic flowchart illustrating a process of determining a probability image based on first annotation information and a sample image to be annotated according to an exemplary embodiment of the disclosure. The embodiment shown in fig. 9 of the present disclosure is extended on the basis of the embodiment shown in fig. 8 of the present disclosure, and the differences between the embodiment shown in fig. 9 and the embodiment shown in fig. 8 are emphasized below, and the descriptions of the same parts are omitted.
As shown in fig. 9, in the training method of a network model provided in the embodiment of the present disclosure, the step of determining a probability image based on first annotation information and a sample image to be annotated includes the following steps.
In step S161, a first region and a second region of the sample image to be labeled are determined based on the first labeling information.
Optionally, the first region is a foreground region of the sample image to be labeled, and the first region includes a plurality of first display units. The second area is a background area of the sample image to be marked and comprises a plurality of second display units.
Step S162, for the plurality of first display units, respectively calculating a first Euclidean distance between each first display unit and the second display unit closest to it, to generate a first distance set.
Specifically, for each of the plurality of first display units, the first Euclidean distance between the first display unit and the second display unit closest to it is calculated. Then, a first distance set is generated based on the first Euclidean distances corresponding to the respective first display units.
Step S163, for the plurality of second display units, respectively calculating a second Euclidean distance between each second display unit and the first display unit closest to it, to generate a second distance set.
Specifically, for each of the plurality of second display units, the second Euclidean distance between the second display unit and the first display unit closest to it is calculated. Then, a second distance set is generated based on the second Euclidean distances corresponding to the respective second display units.
In step S164, a probability image is generated based on the first distance set and the second distance set.
According to the training method of the network model provided by the embodiment of the present disclosure, the first region and the second region of the sample image to be annotated are determined based on the first annotation information; for the plurality of first display units, the first Euclidean distance between each first display unit and the second display unit closest to it is calculated to generate the first distance set; for the plurality of second display units, the second Euclidean distance between each second display unit and the first display unit closest to it is calculated to generate the second distance set; and the probability image is then generated based on the first distance set and the second distance set. In this way, the purpose of determining the probability image based on the first annotation information and the sample image to be annotated is achieved.
It should be understood that the distance information of the first display unit and the second area included in the first area can be characterized based on the first distance set obtained by the above steps, and the distance information of the second display unit and the first area included in the second area can be characterized based on the second distance set obtained by the above steps. The embodiment of the disclosure can effectively simplify the generation steps of the probability image, and save the calculation time and cost.
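Steps S162 and S163 amount to two Euclidean distance transforms. A short sketch using scipy is given below, assuming the first annotation information is provided as a binary mask (1 for the first/foreground region, 0 for the second/background region):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def compute_distance_sets(first_annotation: np.ndarray):
    """Returns two per-pixel distance maps holding the first and second distance sets."""
    foreground = first_annotation.astype(bool)
    # For every first display unit: Euclidean distance to the closest second display unit.
    first_distances = distance_transform_edt(foreground)
    # For every second display unit: Euclidean distance to the closest first display unit.
    second_distances = distance_transform_edt(~foreground)
    return first_distances, second_distances, foreground
```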
One implementation of the step of generating a probability image based on the first set of distances and the second set of distances is described below in conjunction with fig. 10.
Fig. 10 is a schematic flowchart illustrating a process of generating a probability image based on a first distance set and a second distance set according to an exemplary embodiment of the present disclosure. Specifically, in the embodiment of the present disclosure, the first distance set includes a plurality of first Euclidean distances, which are in one-to-one correspondence with the plurality of first display units included in the first region. The second distance set includes a plurality of second Euclidean distances, which are in one-to-one correspondence with the plurality of second display units included in the second region.
As shown in fig. 10, in the training method of a network model provided in the embodiment of the present disclosure, the step of generating a probability image based on a first distance set and a second distance set includes the following steps.
In step S1641, a preset distance threshold is determined based on the plurality of first Euclidean distances.
Step S1642, determining the erasing probability corresponding to each of the plurality of first display units based on the relationship between the plurality of first Euclidean distances and the preset distance threshold.
Step S1643, determining the erasing probability corresponding to each of the plurality of second display units based on the relationship between the plurality of second Euclidean distances and the preset distance threshold.
Step S1644, generate a probability image based on the erasure probabilities corresponding to the plurality of first display units and the erasure probabilities corresponding to the plurality of second display units.
For example, one tenth of the largest first Euclidean distance among all the first Euclidean distances included in the first distance set is taken as the preset distance threshold T, and all the first Euclidean distances included in the first distance set are then compared with T. If a first Euclidean distance is greater than or equal to the preset distance threshold T, the probability value at the position, in the probability image, corresponding to that first display unit is 0. If the first Euclidean distance is smaller than the preset distance threshold T, the probability value at the corresponding position in the probability image is -log(1 - (T - D_fore + 1)/T + eps), where D_fore denotes the first Euclidean distance and eps denotes a very small value, such as 0.000001, used to avoid an infinite result in the calculation.
Then, all the second Euclidean distances included in the second distance set are likewise compared with the preset distance threshold T. If a second Euclidean distance is greater than or equal to T, the probability value at the position, in the probability image, corresponding to that second display unit is 0. If the second Euclidean distance is smaller than T, the probability value at the corresponding position in the probability image is -log(1 - (T - D_back + 1)/T + eps), where D_back denotes the second Euclidean distance.
In summary, the embodiments of the present disclosure can improve the accuracy of the generated probability image by using the distance information.
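Putting steps S1641 to S1644 together with the example formula above gives the following sketch, which reuses the distance maps from the previous snippet. The patent's formula can exceed 1 close to the boundary, so the final clipping to [0, 1] is an assumption made to match the earlier statement that probability values fall in that closed interval.

```python
import numpy as np

def build_probability_image(first_distances, second_distances, foreground, eps=1e-6):
    """Erase probability per display unit; T is one tenth of the largest first Euclidean distance."""
    T = first_distances[foreground].max() / 10.0

    def erase_prob(distance_map):
        inner = 1.0 - (T - distance_map + 1.0) / T + eps
        # Guard against log of non-positive values; valid distances (>= 1) are unaffected.
        prob = -np.log(np.maximum(inner, eps))
        return np.where(distance_map < T, prob, 0.0)  # units far from the boundary are never erased

    probability_image = np.zeros_like(first_distances, dtype=np.float64)
    probability_image[foreground] = erase_prob(first_distances)[foreground]
    probability_image[~foreground] = erase_prob(second_distances)[~foreground]
    return np.clip(probability_image, 0.0, 1.0)  # assumption: clip to the closed interval [0, 1]
```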
Fig. 11 is a flowchart illustrating a method for determining annotation information according to an exemplary embodiment of the present disclosure. As shown in fig. 11, the method for determining annotation information provided by the embodiment of the present disclosure includes the following steps.
And step S300, inputting the image to be annotated into an image understanding model to obtain a feature map and a restored image corresponding to the image to be annotated.
It should be understood that the image understanding model provided by the embodiments of the present disclosure may be obtained by training based on the training method of the network model mentioned in the above embodiments.
Step S400, determining second annotation information corresponding to the image to be annotated based on the feature map, the repaired image, the image to be annotated, and the first annotation information corresponding to the image to be annotated.
Optionally, redefining the first annotation information corresponding to the image to be annotated based on the feature map, the repaired image, and the image to be annotated to obtain the modified second annotation information. In other words, the second annotation information is redefined annotation information obtained by redefining the first annotation information corresponding to the image to be annotated. For example, the image to be labeled is a medical image, and the first labeling information is labeling information given by a professional doctor based on the image to be labeled.
In the practical application process, firstly, the image to be annotated is input into the image understanding model to obtain a feature map and a repaired image corresponding to the image to be annotated, and then second annotation information corresponding to the image to be annotated is determined based on the feature map, the repaired image, the image to be annotated and the first annotation information corresponding to the image to be annotated.
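As a minimal sketch of this flow, the snippet below assumes a PyTorch-style image understanding model that returns the feature map and the repaired image in that order; both the output order and the function name are illustrative assumptions rather than part of this embodiment.

    import torch

    def run_image_understanding(model, image):
        # image: 1xHxW tensor of the image to be annotated (single channel assumed)
        model.eval()
        with torch.no_grad():
            feature_map, repaired = model(image.unsqueeze(0))   # add a batch dimension
        return feature_map.squeeze(0), repaired.squeeze(0)

The returned feature map and repaired image, together with the first annotation information, are then passed to the redefinition steps described below.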
The method for determining annotation information provided by the embodiment of the disclosure can effectively correct the first annotation information based on the image understanding model, so as to obtain second annotation information with higher accuracy. In addition, when the obtained second annotation information is applied to the subsequent training of an image segmentation model, the stability and robustness of the trained image segmentation model can be effectively improved, excessive loss of the model at ambiguously annotated positions can be avoided, and the performance of the image segmentation model is further improved.
Fig. 12 is a schematic flow chart illustrating a process of determining second annotation information corresponding to an image to be annotated based on a feature map, a repaired image, the image to be annotated, and first annotation information corresponding to the image to be annotated according to an exemplary embodiment of the present disclosure. The embodiment shown in fig. 12 of the present disclosure is extended on the basis of the embodiment shown in fig. 11 of the present disclosure, and the differences between the embodiment shown in fig. 12 and the embodiment shown in fig. 11 will be emphasized below, and the descriptions of the same parts will not be repeated.
As shown in fig. 12, in the method for determining annotation information provided in the embodiment of the present disclosure, the step of determining, based on the feature map, the repaired image, the image to be annotated, and the first annotation information corresponding to the image to be annotated, the second annotation information corresponding to the image to be annotated includes the following steps.
And step S410, determining similarity information between the repaired image and the image to be annotated.
In step S420, second labeling information is determined based on the similarity information, the feature map and the first labeling information.
The restored image is determined based on the image understanding model, and the image understanding model can focus on the boundary information of the image to be annotated and reflect the focused boundary information into the restored image in a mapping mode, so that the accuracy of the first annotation information can be verified based on the similarity information between the restored image and the image to be annotated, and the first annotation information is corrected based on the verification result and the feature map, so that the second annotation information with higher accuracy is obtained.
A specific implementation of the embodiment shown in fig. 12 is given below in conjunction with fig. 13.
Fig. 13 is a schematic flow chart illustrating a process of determining second annotation information corresponding to an image to be annotated based on a feature map, a repaired image, the image to be annotated, and first annotation information corresponding to the image to be annotated according to another exemplary embodiment of the present disclosure.
As shown in fig. 13, in the embodiment of the present disclosure, the repair image includes a plurality of repair units, the image to be labeled includes a plurality of display units, and the plurality of repair units and the plurality of display units are in a one-to-one correspondence relationship. And the step of determining the similarity information between the repaired image and the image to be annotated comprises the following steps.
Step S411, for a plurality of display units, respectively determining L1 distances between the display units and the repair units corresponding to the display units to generate an L1 distance set.
Illustratively, the repair unit in the repair image mentioned above refers to a single pixel unit in the repair image. Alternatively, the repair unit refers to a repair display unit formed by combining a plurality of pixel units in a repair image. In addition, the above-mentioned one-to-one correspondence relationship means that a plurality of repair units and a plurality of display units are in one-to-one correspondence in terms of pixel position and size.
It should be understood that the L1 distance mentioned in step S411 refers to manhattan distance.
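A minimal sketch of step S411 follows, assuming each display unit is a single pixel; the function name and the per-channel summation for multi-channel inputs are illustrative assumptions.

    import numpy as np

    def l1_distance_map(repaired, image):
        # both inputs are HxW (or HxWxC) arrays of the same shape; returns an HxW L1 distance set
        diff = np.abs(repaired.astype(np.float64) - image.astype(np.float64))
        return diff if diff.ndim == 2 else diff.sum(axis=-1)   # Manhattan distance sums over channels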
In addition, in the embodiment of the present disclosure, the step of determining the second label information based on the similarity information, the feature map, and the first label information includes the following steps.
In step S421, the second label information is determined based on the L1 distance set, the feature map, and the first label information.
According to the method for determining annotation information provided by the embodiment of the disclosure, the L1 distances between the display unit and the repair unit corresponding to the display unit are respectively determined for the plurality of display units to generate an L1 distance set, and then the purpose of determining the similarity information between the repaired image and the image to be annotated is achieved in a manner of determining the second annotation information based on the L1 distance set, the feature map and the first annotation information.
Fig. 14 is a schematic flowchart illustrating a process of determining the second label information based on the L1 distance set, the feature map, and the first label information according to an exemplary embodiment of the disclosure. The embodiment shown in fig. 14 of the present disclosure is extended on the basis of the embodiment shown in fig. 13 of the present disclosure, and the differences between the embodiment shown in fig. 14 and the embodiment shown in fig. 13 are emphasized below, and the descriptions of the same parts are omitted.
As shown in fig. 14, in the method for determining label information provided by the embodiment of the present disclosure, the step of determining the second label information based on the L1 distance set, the feature map and the first label information includes the following steps.
In step S4211, a first L1 distance subset is generated based on the L1 distance corresponding to the first display unit of the first region of the image to be annotated in the L1 distance set.
Illustratively, the first region of the image to be annotated mentioned in step S4211 is determined based on the corresponding first annotation information of the image to be annotated. For example, the image to be labeled can be divided into a foreground region and a background region based on the first labeling information, and the first region is the foreground region.
It should be understood that the first region of the image to be labeled mentioned here is equivalent to the first region of the sample image to be labeled mentioned in the above embodiment.
Step S4212, determining a first redefinition threshold based on the first L1 distance subset.
Illustratively, the first redefinition threshold is determined as follows: all the L1 distances included in the first L1 distance subset are arranged in ascending order, and the L1 distance value located at the 90% point (i.e., the 90th percentile) is selected as the first redefinition threshold.
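A minimal sketch of steps S4211 and S4212 is given below; np.percentile is used here as an assumed shortcut for the sort-then-select description above, and the foreground mask is derived from the first annotation information.

    import numpy as np

    def first_redefinition_threshold(l1_map, foreground_mask):
        first_subset = l1_map[foreground_mask.astype(bool)]     # first L1 distance subset
        return float(np.percentile(first_subset, 90))           # value at the 90% point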
Step S4213, performing a first redefinition operation on the first annotation information based on the first L1 distance subset, the first redefinition threshold, and the feature map.
Optionally, whether to redefine the annotation value of the pixel position of the first display unit corresponding to the L1 distance is determined based on the relationship between the L1 distance included in the first L1 distance subset and the first redefining threshold.
Optionally, the feature map mentioned in step S4213 is used to determine the redefined annotation value that needs to be subjected to the first redefining operation.
In step S4214, the second annotation information is determined based on the first annotation information subjected to the first redefinition.
The method for determining annotation information provided by the embodiment of the disclosure achieves the purpose of determining second annotation information based on the L1 distance set, the feature map and the first annotation information by generating a first L1 distance subset based on the L1 distance corresponding to the first display unit of the first region of the image to be annotated in the L1 distance set, then determining a first redefinition threshold based on the first L1 distance subset, performing a first redefinition operation on the first annotation information based on the first L1 distance subset, the first redefinition threshold and the feature map, and then determining the second annotation information based on the first annotation information subjected to the first redefinition operation. The embodiment of the disclosure can correct the labeled value of the display unit in which the category confusion occurs (for example, the display unit belonging to the foreground region is labeled as belonging to the background region) based on the first redefined threshold, thereby achieving the purpose of correcting the wrong labeling.
Fig. 15 is a schematic flowchart illustrating a first redefinition operation performed on the first annotation information based on the first L1 distance subset, the first redefinition threshold, and the feature diagram according to an exemplary embodiment of the present disclosure. The embodiment shown in fig. 15 of the present disclosure is extended on the basis of the embodiment shown in fig. 14 of the present disclosure, and the differences between the embodiment shown in fig. 15 and the embodiment shown in fig. 14 are emphasized below, and the descriptions of the same parts are omitted.
As shown in fig. 15, in the method for determining annotation information provided by the embodiment of the present disclosure, the step of performing a first redefinition operation on the first annotation information based on the first L1 distance subset, the first redefinition threshold, and the feature map includes the following steps.
Step S42131, for a first pixel position in the first L1 distance subset that is not greater than the first redefined threshold, retaining an annotation value corresponding to the first pixel position of the first annotation information.
It should be noted that the first pixel position mentioned in step S42131 refers to the pixel position of the first display unit whose L1 distance in the first L1 distance subset is not greater than the first redefinition threshold. Since the annotation image in which the first annotation information is located corresponds to the image to be annotated, the annotation value at that pixel position can be determined from the pixel position of the first display unit.
Step S42132, for a second pixel position in the first L1 distance subset that is greater than the first redefined threshold, acquiring L1 distances corresponding to respective ones of a plurality of neighboring pixel positions circumferentially adjacent to the second pixel position.
It should be noted that the second pixel position mentioned in step S42132 refers to the pixel position of the first display unit whose L1 distance in the first L1 distance subset is greater than the first redefinition threshold.
Illustratively, the above-mentioned plurality of adjacent pixel positions circumferentially adjacent to the second pixel position refers to a plurality of adjacent pixel positions where a plurality of first display units circumferentially adjacent to the first display unit corresponding to the second pixel position are located. For example, 8 adjacent pixel positions where 8 first display units adjacent in the circumferential direction are located.
Step S42133 determines a relationship between the L1 distance corresponding to each of the plurality of adjacent pixel positions and the first redefined threshold.
In step S42133, if at least one of the L1 distances corresponding to the plurality of adjacent pixel positions is greater than the first redefinition threshold, the redefined annotation value corresponding to the second pixel position is determined based on the feature map (i.e., step S42134 is performed). If none of the L1 distances corresponding to the adjacent pixel positions is greater than the first redefinition threshold, the annotation value of the first annotation information corresponding to the second pixel position is retained (i.e., step S42135 is performed).
Step S42134, determining a redefined label value corresponding to the second pixel position based on the feature map.
In step S42135, the label value corresponding to the first label information and the second pixel position is retained.
The method for determining labeling information provided by the embodiment of the disclosure can correct the labeling value of the display unit in which the category confusion occurs (for example, the display unit belonging to the foreground region is labeled as belonging to the background region) based on the first redefining threshold, thereby achieving the purpose of correcting the wrong labeling. In addition, the embodiment of the disclosure can make the second annotation information generated by final modification be a continuous value (for example, a continuous value of a closed interval from 0 to 1), so that an excessive loss of a subsequent image segmentation model at an ambiguous pixel position can be avoided, and an adverse effect of the ambiguous annotation on the model is reduced.
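A minimal sketch of steps S42131 to S42135 is given below. The helper redefine_from_features is an assumed name and is sketched after the description of Fig. 16; the 8-neighbourhood handling and the per-pixel loop are illustrative simplifications, not requirements of this embodiment.

    import numpy as np

    def first_redefinition(l1_map, annotation, foreground_mask, threshold, feature_map):
        refined = annotation.astype(np.float64).copy()
        h, w = l1_map.shape
        ys, xs = np.nonzero(foreground_mask)
        for y, x in zip(ys, xs):
            if l1_map[y, x] <= threshold:
                continue                                        # step S42131: keep the original annotation value
            neighbours = [(y + dy, x + dx)                      # 8 circumferentially adjacent pixel positions
                          for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                          if not (dy == 0 and dx == 0)
                          and 0 <= y + dy < h and 0 <= x + dx < w]
            if any(l1_map[ny, nx] > threshold for ny, nx in neighbours):
                # step S42134: determine the redefined annotation value from the feature map
                refined[y, x] = redefine_from_features(feature_map, (y, x), neighbours,
                                                       l1_map, threshold, annotation)
            # otherwise step S42135: keep the original annotation value
        return refined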
Fig. 16 is a schematic flowchart illustrating a process of determining a redefined labeled value corresponding to a second pixel position based on a feature map according to an exemplary embodiment of the disclosure. The embodiment shown in fig. 16 of the present disclosure is extended on the basis of the embodiment shown in fig. 15 of the present disclosure, and the differences between the embodiment shown in fig. 16 and the embodiment shown in fig. 15 are emphasized below, and the descriptions of the same parts are omitted.
As shown in fig. 16, in the method for determining annotation information provided in the embodiment of the present disclosure, the step of determining the redefined annotation value corresponding to the second pixel position based on the feature map includes the following steps.
Step S610, calculating a first feature extraction vector of the second pixel position in the feature map.
For example, the feature map is a high-dimensional vector comprising C feature maps, with a size of CxHxW. Correspondingly, the size of the first feature extraction vector of the second pixel position in the feature map is Cx1x1.
Step S620, determining at least one second feature extraction vector based on euclidean distances between the second pixel position and a plurality of adjacent pixel positions.
Preferably, it is first determined whether the L1 distances corresponding to each of the plurality of adjacent pixel positions mentioned in step S620 satisfy the condition of being not greater than the first redefinition threshold. Then, among the adjacent pixel positions satisfying this condition, the adjacent pixel position having a relatively small (e.g., the smallest) Euclidean distance to the second pixel position is selected as the adjacent pixel position for which the second feature extraction vector needs to be determined. With this arrangement, the annotation accuracy of the foreground region can be improved.
Similarly, a second feature extraction vector corresponding to the adjacent pixel position is also determined according to the feature map, and its size is Cx1x1.
Step S630, determining a redefined label value corresponding to the second pixel position based on the vector similarity information between the first feature extraction vector and the at least one second feature extraction vector.
For example, the vector similarity information is determined based on the JS (Jensen-Shannon) divergence between the first feature extraction vector and the at least one second feature extraction vector. For another example, the vector similarity information is determined based on the cosine similarity or KL divergence between the first feature extraction vector and the at least one second feature extraction vector.
Illustratively, the labeling value of the adjacent pixel position corresponding to the second feature extraction vector having the highest vector similarity with the first feature extraction vector is used as the redefining labeling value of the second pixel position.
The embodiment of the disclosure achieves the purpose of determining the redefined label value corresponding to the second pixel position based on the feature map by calculating the first feature extraction vector of the second pixel position in the feature map, then determining at least one second feature extraction vector based on the euclidean distances between the second pixel position and a plurality of adjacent pixel positions, and determining the redefined label value corresponding to the second pixel position based on the vector similarity information between the first feature extraction vector and the at least one second feature extraction vector.
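Continuing the sketch above, a minimal illustration of steps S610 to S630 follows; the helper name matches the one assumed after Fig. 15, and cosine similarity is used as one of the similarity measures named in this embodiment (JS divergence or KL divergence could be substituted). For simplicity, all qualifying neighbours are compared rather than only the spatially nearest ones selected in step S620.

    import numpy as np

    def redefine_from_features(feature_map, pos, neighbours, l1_map, threshold, annotation):
        # feature_map: CxHxW array; pos: (y, x) of the second pixel position
        y, x = pos
        anchor = feature_map[:, y, x]                           # first feature extraction vector (length C)
        candidates = [(ny, nx) for ny, nx in neighbours if l1_map[ny, nx] <= threshold]
        if not candidates:
            return float(annotation[y, x])                      # no reliable neighbour: keep the original value
        def cosine(p):
            v = feature_map[:, p[0], p[1]]                      # second feature extraction vector
            return float(np.dot(anchor, v) / (np.linalg.norm(anchor) * np.linalg.norm(v) + 1e-8))
        best = max(candidates, key=cosine)                      # neighbour with the highest vector similarity
        return float(annotation[best])                          # its annotation value becomes the redefined value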
FIG. 17 is a flowchart illustrating a process of determining second annotation information based on first annotation information undergoing a first redefinition operation according to an exemplary embodiment of the present disclosure. The embodiment shown in fig. 17 of the present disclosure is extended on the basis of the embodiment shown in fig. 14 of the present disclosure, and the differences between the embodiment shown in fig. 17 and the embodiment shown in fig. 14 are emphasized below, and the descriptions of the same parts are omitted.
As shown in fig. 17, in the method for determining annotation information provided by the embodiment of the present disclosure, the step of determining the second annotation information based on the first annotation information subjected to the first redefinition operation includes the following steps.
In step S710, a second L1 distance subset is generated based on the L1 distance corresponding to the second display unit of the second region of the image to be annotated in the L1 distance set.
Illustratively, the second area of the image to be annotated mentioned in step S710 is determined based on the first annotation information corresponding to the image to be annotated. For example, the image to be labeled can be divided into a foreground region and a background region based on the first labeling information, and the second region is the background region.
It should be understood that the second region of the image to be labeled mentioned here is equivalent to the second region of the sample image to be labeled mentioned in the above embodiment.
In step S720, a second redefined threshold is determined based on the second L1 subset of distances.
Illustratively, the second redefinition threshold is determined as follows: all the L1 distances included in the second L1 distance subset are arranged in ascending order, and the L1 distance value located at the 50% point (i.e., the median) is selected as the second redefinition threshold.
In step S730, a second redefinition operation is performed on the first annotation information subjected to the first redefinition operation based on the second L1 distance subset, the second redefinition threshold, and the feature map to generate second annotation information.
Exemplarily, the second redefinition operation mentioned in step S730 is equivalent to the first redefinition operation mentioned in the above embodiment, and the specific implementation steps thereof may refer to the first redefinition operation mentioned above (refer to the embodiments shown in fig. 15 and fig. 16 in particular).
An example of the main steps of the second redefinition operation is given below.
First, for a third pixel position in the second L1 distance subset that is not greater than the second redefined threshold, the labeled value of the first labeling information corresponding to the third pixel position is retained, and then, for a fourth pixel position in the second L1 distance subset that is greater than the second redefined threshold, the L1 distances corresponding to each of a plurality of adjacent pixel positions circumferentially adjacent to the fourth pixel position are obtained. And then judging the relationship between the L1 distance corresponding to each of the plurality of adjacent pixel positions and the second redefined threshold, and when judging that the L1 distance corresponding to at least one adjacent pixel position in the L1 distances corresponding to each of the plurality of adjacent pixel positions is greater than the second redefined threshold, determining the redefined marking value corresponding to the fourth pixel position based on the feature map. And when all the L1 distances in the L1 distances corresponding to the adjacent pixel positions are not greater than the second redefined threshold, keeping the labeled values corresponding to the first labeled information and the fourth pixel position.
Exemplarily, the step of determining the redefined annotation value corresponding to the fourth pixel position based on the feature map may be seen in the following example.
The method comprises the steps of firstly calculating a third feature extraction vector of a fourth pixel position in a feature map, then determining at least one fourth feature extraction vector based on Euclidean distances between the fourth pixel position and a plurality of adjacent pixel positions, then determining a redefined mark value corresponding to the fourth pixel position based on vector similarity information between the third feature extraction vector and the at least one fourth feature extraction vector.
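A minimal sketch of the second redefinition operation is given below; it reuses the helpers assumed in the earlier sketches, applied to the background (second) region with the 50% point threshold, as described above.

    import numpy as np

    def second_redefinition(l1_map, refined_annotation, background_mask, feature_map):
        second_subset = l1_map[background_mask.astype(bool)]          # second L1 distance subset
        second_threshold = float(np.percentile(second_subset, 50))    # value at the 50% point
        return first_redefinition(l1_map, refined_annotation, background_mask,
                                  second_threshold, feature_map)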
Fig. 18 is a schematic structural diagram of a training apparatus for a network model according to an exemplary embodiment of the present disclosure. As shown in fig. 18, the training apparatus for a network model provided in the embodiment of the present disclosure includes:
a model determination module 100 for determining an initial network model;
the training module 200 is configured to train an initial network model based on the sample image to be labeled and the probability image corresponding to the sample image to be labeled, so as to generate an image understanding model.
Fig. 19 is a schematic structural diagram of a training apparatus for a network model according to another exemplary embodiment of the present disclosure. The embodiment shown in fig. 19 of the present disclosure is extended on the basis of the embodiment shown in fig. 18 of the present disclosure, and the differences between the embodiment shown in fig. 19 and the embodiment shown in fig. 18 will be emphasized below, and the descriptions of the same parts will not be repeated.
As shown in fig. 19, the training apparatus for a network model provided in the embodiment of the present disclosure further includes:
the first annotation information determining module 150 is configured to determine first annotation information corresponding to a sample image to be annotated;
and a probability image determining module 160, configured to determine a probability image based on the first annotation information and the sample image to be annotated.
Optionally, in an embodiment of the present disclosure, the probability image determining module 160 includes:
a determining unit 161, configured to determine a first region and a second region of the sample image to be labeled based on the first labeling information;
a first distance set generating unit 162 configured to calculate, for each of the plurality of first display units, a first euclidean distance between the first display unit and a second display unit closest to the first display unit to generate a first distance set;
a second distance set generating unit 163 for calculating, for the plurality of second display units, second euclidean distances between the second display units and the first display units closest to the second display units, respectively, to generate a second distance set;
a probability image generating unit 164 for generating a probability image based on the first distance set and the second distance set.
Optionally, in an embodiment of the present disclosure, the probability image generating unit 164 is further configured to determine a preset distance threshold based on the plurality of first euclidean distances, determine an erasure probability corresponding to each of the plurality of first display units based on a relationship between the plurality of first euclidean distances and the preset distance threshold, then determine an erasure probability corresponding to each of the plurality of second display units based on a relationship between the plurality of second euclidean distances and the preset distance threshold, and generate the probability image based on the erasure probability corresponding to each of the plurality of first display units and the erasure probability corresponding to each of the plurality of second display units.
Optionally, in an embodiment of the present disclosure, the training module 200 includes:
an erased image generating unit 210, configured to generate an erased image corresponding to the sample image to be labeled based on the sample image to be labeled and the probability image;
a feature map generation unit 220, configured to input the erased image to the feature extraction module to generate a feature map;
a repair image generation unit 230 for inputting the feature map into an image repair module to generate a repair image;
and a training unit 240, configured to train an initial network model based on the sample image to be labeled and the repaired image.
Optionally, in an embodiment of the present disclosure, the erased image generating unit 210 is further configured to perform an erasing operation on the sample image to be annotated based on the probability image to generate a first-stage erased image, and determine the first-stage erased image as the erased image.
Optionally, in an embodiment of the present disclosure, the erased image generating unit 210 is further configured to perform an erasing operation on the sample image to be annotated based on the probability image to generate a first-stage erased image, generate a first erased area set based on size information corresponding to each of the plurality of first erased areas and a preset size condition, then determine a second-stage erased image based on distance information between the first erased areas included in the first erased area set, and determine the second-stage erased image as the erased image.
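A minimal sketch of the two-stage erasing performed by the erased image generating unit 210 is given below. The Bernoulli sampling against the probability image, the size condition, the centre-to-centre connection rule, and the zero fill value are all illustrative assumptions rather than prescriptions of this embodiment.

    import numpy as np
    from scipy import ndimage

    def generate_erased_image(sample, prob_image, min_region_size=20, connect_dist=5.0, rng=None):
        # sample, prob_image: HxW arrays of the same shape
        rng = rng if rng is not None else np.random.default_rng()
        # first stage: erase each display unit with the probability given by the probability image
        erase_mask = rng.random(sample.shape) < np.clip(prob_image, 0.0, 1.0)
        labels, n = ndimage.label(erase_mask)                              # first erased areas
        sizes = ndimage.sum(erase_mask, labels, range(1, n + 1))
        kept = [i + 1 for i, s in enumerate(sizes) if s >= min_region_size]  # preset size condition
        erase_mask = np.isin(labels, kept)                                 # first erased area set
        # second stage: connect kept areas whose centres satisfy the preset connection condition
        centres = ndimage.center_of_mass(erase_mask, labels, kept) if kept else []
        for i in range(len(kept)):
            for j in range(i + 1, len(kept)):
                (y1, x1), (y2, x2) = centres[i], centres[j]
                if np.hypot(y1 - y2, x1 - x2) < connect_dist:
                    for t in np.linspace(0.0, 1.0, 20):                    # draw a straight bridge between the areas
                        erase_mask[int(round(y1 + t * (y2 - y1))), int(round(x1 + t * (x2 - x1)))] = True
        erased = sample.copy()
        erased[erase_mask] = 0                                             # erased display units are zeroed out
        return erased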
Fig. 20 is a schematic structural diagram of a device for determining annotation information according to an exemplary embodiment of the present disclosure. As shown in fig. 20, the apparatus for determining annotation information provided in the embodiment of the present disclosure includes:
the input module 300 is configured to input the image to be annotated into the image understanding model to obtain a feature map and a repaired image corresponding to the image to be annotated;
and a second annotation information determining module 400, configured to determine, based on the feature map, the repaired image, the image to be annotated, and the first annotation information corresponding to the image to be annotated, second annotation information corresponding to the image to be annotated.
Fig. 21 is a schematic structural diagram of a device for determining annotation information according to another exemplary embodiment of the present disclosure. The embodiment shown in fig. 21 of the present disclosure is extended on the basis of the embodiment shown in fig. 20 of the present disclosure, and the differences between the embodiment shown in fig. 21 and the embodiment shown in fig. 20 are emphasized below, and the descriptions of the same parts are omitted.
As shown in fig. 21, in the embodiment of the present disclosure, the second annotation information determination module 400 includes:
a similarity information determining unit 410, configured to determine similarity information between the repaired image and the image to be annotated;
a second labeling information determining unit 420, configured to determine second labeling information based on the similarity information, the feature map, and the first labeling information.
Optionally, in an embodiment of the present disclosure, the similarity information determining unit 410 is further configured to determine, for a plurality of display units, L1 distances between the display unit and the repair unit corresponding to the display unit, respectively, to generate an L1 distance set; and, the second label information determining unit 420 is further configured to determine the second label information based on the L1 distance set, the feature map, and the first label information.
Optionally, in an embodiment of the present disclosure, the second annotation information determining unit 420 is further configured to generate a first L1 distance subset based on the L1 distance in the L1 distance set corresponding to the first display unit of the first region of the image to be annotated, then determine a first redefinition threshold based on the first L1 distance subset, perform a first redefinition operation on the first annotation information based on the first L1 distance subset, the first redefinition threshold, and the feature map, and determine the second annotation information based on the first annotation information subjected to the first redefinition operation.
It should be understood that, for the operations and functions of the model determining module 100, the first annotation information determining module 150, the probability image determining module 160, and the training module 200 in the training apparatus for network models provided in fig. 18 and 19, the determining unit 161, the first distance set generating unit 162, the second distance set generating unit 163, and the probability image generating unit 164 included in the probability image determining module 160, and the erased image generating unit 210, the feature map generation unit 220, the repair image generation unit 230, and the training unit 240 included in the training module 200, reference may be made to the training method for network models provided in fig. 3 to 10 above, and no further description is given here to avoid repetition.
Further, it should be understood that the operations and functions of the input module 300 and the second annotation information determination module 400 in the annotation information determination apparatus provided in fig. 20 and 21, and the similarity information determination unit 410 and the second annotation information determination unit 420 included in the second annotation information determination module 400 may refer to the above-mentioned determination method of the annotation information provided in fig. 11 to 17, and are not described herein again in order to avoid redundancy.
Next, an electronic apparatus according to an embodiment of the present disclosure is described with reference to fig. 22. Fig. 22 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present disclosure.
As shown in fig. 22, the electronic device 2200 includes one or more processors 2201 and memory 2202.
The processor 2201 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 2200 to perform desired functions.
Memory 2202 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM), cache memory (cache), and/or the like. The non-volatile memory may include, for example, Read Only Memory (ROM), hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 2201 to implement the network model training method, the annotation information determination method, and/or other desired functions of the various embodiments of the present disclosure described above. Various contents such as sample image data may also be stored in the computer-readable storage medium.
In one example, the electronic device 2200 may further include: an input device 2203 and an output device 2204, which are interconnected by a bus system and/or other form of connection mechanism (not shown).
The input device 2203 may include, for example, a keyboard, a mouse, and the like.
The output device 2204 can output various information including the specified second label information to the outside. The output devices 2204 may include, for example, a display, speakers, a printer, and a communication network and remote output devices connected thereto, among others.
Of course, for simplicity, only some of the components of the electronic device 2200 relevant to the present disclosure are shown in fig. 22, omitting components such as buses, input/output interfaces, and the like. In addition, the electronic device 2200 may include any other suitable components depending on the particular application.
In addition to the above methods and apparatus, embodiments of the present disclosure may also be a computer program product comprising computer program instructions which, when executed by a processor, cause the processor to perform the steps in the method for training a network model, the method for determining annotation information according to various embodiments of the present disclosure described above in this specification.
The computer program product may write program code for carrying out operations for embodiments of the present disclosure in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present disclosure may also be a computer-readable storage medium having stored thereon computer program instructions, which, when executed by a processor, cause the processor to perform the steps in the training method of a network model, the determination method of annotation information according to various embodiments of the present disclosure described above in this specification.
The computer-readable storage medium may take any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The foregoing describes the general principles of the present disclosure in conjunction with specific embodiments, however, it is noted that the advantages, effects, etc. mentioned in the present disclosure are merely examples and are not limiting, and they should not be considered essential to the various embodiments of the present disclosure. Furthermore, the foregoing disclosure of specific details is for the purpose of illustration and description and is not intended to be limiting, since the disclosure is not intended to be limited to the specific details so described.
The block diagrams of devices, apparatuses, and systems referred to in this disclosure are only given as illustrative examples and are not intended to require or imply that the connections, arrangements, and configurations must be made in the manner shown in the block diagrams. These devices, apparatuses, and systems may be connected, arranged, and configured in any manner, as will be appreciated by those skilled in the art. Words such as "including," "comprising," "having," and the like are open-ended words that mean "including, but not limited to," and are used interchangeably therewith. The word "or" as used herein means, and is used interchangeably with, the word "and/or," unless the context clearly dictates otherwise. The phrase "such as" is used herein to mean, and is used interchangeably with, the phrase "such as but not limited to".
It is also noted that in the devices, apparatuses, and methods of the present disclosure, each component or step can be decomposed and/or recombined. These decompositions and/or recombinations are to be considered equivalents of the present disclosure.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the disclosure. Thus, the present disclosure is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit embodiments of the disclosure to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.

Claims (19)

1. A method for training a network model, comprising:
determining an initial network model;
training the initial network model based on a sample image to be annotated and a probability image corresponding to the sample image to be annotated to generate an image understanding model, wherein the image understanding model is used for generating a feature map and a repair image corresponding to the image to be annotated based on the image to be annotated, and the probability image is used for representing the probability that a display unit in the sample image to be annotated is erased.
2. The method according to claim 1, wherein the initial network model comprises a feature extraction module and an image restoration module, and wherein the training the initial network model based on the sample image to be labeled and the probability image corresponding to the sample image to be labeled comprises:
generating an erasing image corresponding to the sample image to be marked based on the sample image to be marked and the probability image;
inputting the erasing image into the feature extraction module to generate a feature map;
inputting the feature map into the image restoration module to generate a restored image;
and training the initial network model based on the sample image to be annotated and the repaired image.
3. The method according to claim 2, wherein the generating of the erased image corresponding to the sample image to be labeled based on the sample image to be labeled and the probability image comprises:
performing erasing operation on the sample image to be marked based on the probability image to generate a first-stage erasing image, wherein the first-stage erasing image comprises a first erasing area;
determining the first stage erased image as the erased image.
4. The method according to claim 3, wherein the number of the first erased areas is plural, and before the determining the first-stage erased image as the erased image, further comprising:
generating a first erasing area set based on the size information and the preset size condition corresponding to the plurality of first erasing areas respectively;
determining a second-stage erased image based on distance information between first erased areas in the first erased area set;
wherein the determining the first stage erased image as the erased image comprises:
determining the second stage erased image as the erased image.
5. The method of claim 4, wherein determining a second-stage erased image based on distance information between first erased areas in the first erased area set comprises:
calculating Euclidean distance between any two first erasing areas in the first erasing area set;
if the Euclidean distance meets a preset communication condition, performing communication processing on the two first erasing areas;
and generating the second-stage erasing image based on the first erasing area set subjected to the connection processing.
6. The method according to any one of claims 1 to 5, wherein before the training of the initial network model based on the sample image to be labeled and the probability image corresponding to the sample image to be labeled, the method further comprises:
determining first labeling information corresponding to the sample image to be labeled;
and determining the probability image based on the first labeling information and the sample image to be labeled.
7. The method of claim 6, wherein the determining the probability image based on the first annotation information and the sample image to be annotated comprises:
determining a first area and a second area of the sample image to be annotated based on the first annotation information, wherein the first area comprises a plurality of first display units, and the second area comprises a plurality of second display units;
calculating first Euclidean distances between the first display unit and a second display unit closest to the first display unit respectively aiming at the plurality of first display units to generate a first distance set;
calculating second Euclidean distances between the second display units and first display units closest to the second display units respectively aiming at the plurality of second display units to generate a second distance set;
generating the probability image based on the first set of distances and the second set of distances.
8. The method of claim 7, wherein the first set of distances comprises a plurality of first Euclidean distances, wherein the second set of distances comprises a plurality of second Euclidean distances, and wherein generating the probabilistic image based on the first set of distances and the second set of distances comprises:
determining a preset distance threshold based on the plurality of first Euclidean distances;
determining the erasing probability corresponding to each of the first display units based on the relation between the first Euclidean distances and the preset distance threshold;
determining the erasing probability corresponding to each of the second display units based on the relation between the second Euclidean distances and the preset distance threshold;
and generating the probability image based on the erasing probability corresponding to each of the first display units and the erasing probability corresponding to each of the second display units.
9. A method for determining label information, comprising:
inputting an image to be annotated into an image understanding model to obtain a feature map and a repaired image corresponding to the image to be annotated, wherein the image understanding model is obtained based on the training method of a network model according to any one of claims 1 to 8;
and determining second labeling information corresponding to the image to be labeled based on the feature map, the repaired image, the image to be labeled and the first labeling information corresponding to the image to be labeled.
10. The method according to claim 9, wherein the determining second labeling information corresponding to the image to be labeled based on the feature map, the repaired image, the image to be labeled and the first labeling information corresponding to the image to be labeled comprises:
determining similarity information between the repaired image and the image to be marked;
and determining the second labeling information based on the similarity information, the feature map and the first labeling information.
11. The method according to claim 10, wherein the repaired image comprises a plurality of repaired units, the image to be annotated comprises a plurality of display units, the plurality of repaired units and the plurality of display units are in one-to-one correspondence, and the determining the similarity information between the repaired image and the image to be annotated comprises:
determining L1 distances between the display unit and a repair unit corresponding to the display unit, respectively, for the plurality of display units to generate an L1 distance set;
wherein the determining the second label information based on the similarity information, the feature map, and the first label information includes:
determining the second label information based on the L1 distance set, the feature map, and the first label information.
12. The method of claim 11, wherein said determining the second label information based on the L1 distance set, the feature map, and the first label information comprises:
generating a first L1 distance subset based on the L1 distance corresponding to the first display unit of the first area of the image to be labeled in the L1 distance set, wherein the first area is determined based on the first labeling information;
determining a first redefinition threshold based on the first L1 subset of distances;
performing a first redefinition operation on the first annotation information based on the first L1 distance subset, the first redefinition threshold, and the feature map;
determining the second annotation information based on the first annotation information subjected to the first redefinition operation.
13. The method of claim 12, wherein said first redefining said first annotation information based on said first L1 subset of distances, said first redefining threshold, and said feature map comprises:
for a first pixel location in the first L1 distance subset that is not greater than the first redefined threshold, retaining an annotation value for the first annotation information corresponding to the first pixel location;
for a second pixel position in the first L1 distance subset that is greater than the first redefined threshold, obtaining L1 distances corresponding to each of a plurality of adjacent pixel positions that are circumferentially adjacent to the second pixel position;
if the L1 distance corresponding to each of the plurality of adjacent pixel positions is not greater than the first redefined threshold, retaining the labeled value corresponding to the first labeled information and the second pixel position;
determining a redefined annotation value for the second pixel position based on the feature map if the L1 distance for at least one of the plurality of neighboring pixel positions is greater than the first redefined threshold.
14. The method of claim 13, wherein determining the redefined label value corresponding to the second pixel location based on the feature map comprises:
calculating a first feature extraction vector of the second pixel position in the feature map;
determining at least one second feature extraction vector based on Euclidean distances between the second pixel position and the plurality of adjacent pixel positions, wherein the second feature extraction vector is a feature extraction vector of the adjacent pixel positions in the feature map;
and determining a redefined marking value corresponding to the second pixel position based on the vector similarity information between the first feature extraction vector and the at least one second feature extraction vector.
15. The method of any of claims 12 to 14, wherein said determining the second annotation information based on the first annotation information subjected to the first redefinition comprises:
generating a second L1 distance subset based on a L1 distance in the L1 distance set corresponding to a second display unit of a second area of the image to be labeled, wherein the second area is determined based on the first labeling information;
determining a second redefined threshold based on the second L1 subset of distances;
performing a second redefinition operation on the first annotation information subjected to the first redefinition operation based on the second L1 distance subset, the second redefinition threshold, and the feature map to generate the second annotation information.
16. An apparatus for training a network model, comprising:
a model determination module for determining an initial network model;
the training module is used for training the initial network model based on a sample image to be annotated and a probability image corresponding to the sample image to be annotated so as to generate an image understanding model, wherein the image understanding model is used for generating a feature map and a repair image corresponding to the image to be annotated based on the image to be annotated, and the probability image is used for representing the probability that a display unit in the sample image to be annotated is erased.
17. An apparatus for determining label information, comprising:
the image annotation device comprises an input module, a processing module and a display module, wherein the input module is used for inputting an image to be annotated into an image understanding model so as to obtain a feature map and a repaired image corresponding to the image to be annotated, and the image understanding model is obtained based on the network model training method of any one of the 1 to 8;
and the second annotation information determining module is used for determining second annotation information corresponding to the image to be annotated based on the feature map, the repaired image, the image to be annotated and the first annotation information corresponding to the image to be annotated.
18. A computer-readable storage medium storing a computer program for executing the method for training a network model according to any one of claims 1 to 8 or the method for determining label information according to any one of claims 9 to 15.
19. An electronic device, the electronic device comprising:
a processor;
a memory for storing the processor-executable instructions;
the processor is configured to perform a method for training a network model according to any one of claims 1 to 8, or perform a method for determining label information according to any one of claims 9 to 15.