CN112506797A

CN112506797A - Performance test method for medical image recognition system

Info

Publication number: CN112506797A
Application number: CN202011525218.0A
Authority: CN
Inventors: 陈芳; 成楚凡; 张道强
Original assignee: Nanjing University of Aeronautics and Astronautics
Current assignee: Nanjing University of Aeronautics and Astronautics
Priority date: 2020-12-22
Filing date: 2020-12-22
Publication date: 2021-03-16
Anticipated expiration: 2040-12-22
Also published as: CN112506797B

Abstract

The invention discloses a performance test method of a medical image recognition system, which comprises the following steps that 1) a multi-class image test data generation module comprises a confrontation sample generation network and an entity and background recombination method; 2) the multi-angle test module is based on system stability, reliability and safety performance; 3) and a model decision evaluation module. The invention realizes the generation of multi-class image test data and multi-angle complete system test, finally completes the decision evaluation of the medical image recognition system, and has wide future application prospect.

Description

Performance test method for medical image recognition system

Technical Field

The invention belongs to the technical field of performance analysis of medical image recognition systems, and particularly relates to a performance test method for a medical image recognition system.

Background

The medical image recognition system plays an important role in clinical medical diagnosis, greatly changes the clinical diagnosis mode and promotes the development of clinical medicine. The intelligent medical image recognition is based on an artificial intelligence technology and is used for analyzing and processing images and operation videos scanned by common medical imaging technology such as X-ray films, computed tomography, magnetic resonance imaging and the like, and the development direction of the intelligent medical image recognition mainly comprises intelligent image diagnosis, three-dimensional image reconstruction and registration, intelligent operation video analysis and the like. Currently, research in this field has progressed to a certain extent, and is gradually moving to clinical applications. Therefore, evaluation and testing of the performance of medical image recognition systems is particularly important for the development of future clinical medicine. The FERET sets a performance standard for the recognition algorithm for the first time, defines a series of evaluation standards, greatly promotes the development of the recognition technology, and the established evaluation standards and evaluation protocols always influence the prior art, thereby bringing a profound influence on the development of the face recognition technology in the future. Although there are some test schemes for general image recognition systems, a test scheme for medical image recognition systems has not been proposed. Moreover, because the identification technology at that time is not mature, the identification system participating in FERET evaluation is mostly a prototype system in a university laboratory, and the identification effect is not very satisfactory.

In recent years, there has been an increasing demand for a test method that analyzes an identification model and automatically performs performance analysis. With the rapid development of the deep learning technology, the performance index of the medical image recognition system is also rapidly improved, and the recognition efficiency is greatly improved, so how to test the performance of the models needs to be solved urgently. Aiming at the fact that the visual reality of a generated test image needs to be considered in the test of a medical image recognition system, a method for resisting against a sample generation network and entity and background recombination is provided, and the authenticity of a generated sample is fully guaranteed; and the medical identification system has higher requirements on the test of reliability and safety, so that a multi-angle test scheme is provided, and the countermeasure sample is applied to the medical image, thereby achieving the effect of better analyzing the medical identification model.

Disclosure of Invention

The invention provides a performance testing method for a medical image recognition system, which aims to solve the problems in the prior art.

In order to achieve the purpose, the invention adopts the following technical scheme:

a performance test method for a medical image recognition system comprises the following steps: the multi-class image test data generation module comprises a confrontation sample generation network and an entity and background recombination method; the multi-angle test module comprises a performance test, a reliability test and a safety test; the decision evaluation module analyzes the input test result, judges the performance of the model and gives a detailed test report;

the network inputs a group of pictures to be classified and identified, the pictures are input into the multi-class image test data generation module for the first time, the input models are classified after image augmentation, classification results are input into the multi-angle test module, the multi-angle test module tests the learning results of the models and transmits the results to the decision evaluation module, and the decision evaluation module analyzes the input test results, judges the performance of the models and gives detailed test reports.

Further, the countermeasure sample generation network and entity, background recombination method includes using multi-loss hybrid countermeasure camouflage against augmentation,

multiple loss function

Expressed as:

wherein: λ represents the antagonistic strength,

Indicating a loss of resistance,

Representing a style loss for style generation,

Representing a content loss for preserving source image content,

Representing a smoothness penalty for ensuring smoothness of the extended sample;

defining an existing image, a target attack area and an expected target pattern by a user, generating a required pattern in the required area, and adding additional physical adaptation training to the generated extended sample in each step;

the style distance between two images is defined by the difference in the style representation of the two images,

wherein:

as a feature distance, l is a style level feature, S_lIs a collection of style layers from which a style representation is extracted,

is a feature extractor for a style or style,

is from

A set of pattern layers of the extracted deep layer features of the Gram matrix, x^sIs a stylistic reference image, x' is the generated countermeasure sample;

pattern loss for pattern generation

The content of the enhanced image in the generated reference pattern is very different from the content of the original image; specifically, as follows, the following description will be given,

wherein:

is content loss, t is content layer characteristic, c_tIs a set of content layers that extracts the content representation,

is a feature extractor for the content layer, x is the original image, x' is the generated countermeasure sample;

improving the smoothness of the enhanced image by reducing the variation between adjacent pixels; for the enhanced image, the smoothness penalty is defined as,

wherein: x'_i,jTo combat the pixel value at the sample (i, j) coordinate, x_i+1,jIs the pixel value, x, at the (i +1, j) coordinate of the original image_i,j+1For the pixel value at the original image (i, j +1) coordinate:

for the loss of antagonism

The following cross entropy loss was used:

wherein: p is a radical of_yadv() And p_y() The probability outputs of the object model F (F refers to the object function of a general machine model, e.g. the object function F of vgg is fc8, from which the probability outputs corresponding to 1000 classes can be derived) to the labels yadv (class of confrontational samples) and y (class of original images), respectively.

Realistic conditions are introduced into the generation process of the augmented example, as follows:

wherein: o is a random background image sampled in the physical world, T is a random transformation of rotation, resizing and color shift,

is a set of transformations; by following the original graphLike x and background image o, the resulting enhanced sample is substantially legitimate to a human observer;

and the target background is recombined and expanded, the target is segmented from the background by using a segmentation algorithm Mask R-CNN, pixels are supplemented to a blank part in the background by using an interpolation algorithm, and finally the target and the background are randomly combined to realize the image expansion.

Furthermore, the performance test in the multi-angle test module comprises different angles: judging and judging the identification Accuracy Accuracy, judging and judging the identification Loss value Loss and judging the metamorphic relation; judging the Accuracy Accuracy and the Loss value Loss, wherein the Accuracy Accuracy and the Loss value Loss are obtained by subtracting the Accuracy Accuracy and the identification Loss value Loss output by the models before and after the augmentation to obtain the identification Accuracy difference percentage delta acc before and after the augmentation and the identification Loss difference percentage delta Loss before and after the augmentation;

the disintegration test is defined as: c_iFor the original test image

Is classified by the image recognition system, S_iFor the original test image

A confidence score of; c_i' use for association with metamorphic relations

Synthesized new test image

Class label of S_i' use for association with metamorphic relations

Synthesized new test image

Then the metamorphic relationship is expressed as:

C_i＝C′_iandΔS＝|S_i-S′_i|<c (7)

wherein: c is the hyperparameter, 0< c <100, c is set to 50, Δ S is the difference in confidence scores before and after expansion.

Further, the reliability test in the multi-angle test module is a robustness test, and under the condition that an original image x meets confidence assurance, immune attack can be performed within a norm sphere radius R:

wherein: z () is a loss function, g () is an objective function to be optimized,

the method is characterized in that the method is arbitrary, epsilon is introduced noise, B (x; R) is a noise set, R is a norm sphere radius, R is a value of a wireless approximate 0, and x is an original image;

the final robustness accuracy (robac) is defined as:

the security test in the multi-angle test module is a model invariance test, selects a random image, selects a perturbation of a pixel using one of four methods described below, then measures the sensitivity of the network to the perturbation, the first method is the "Crop" (Crop) method, randomly selects a square in the original image and resizes the square to 224x224px, then we shift the square diagonally by one pixel to create a second image that is different from the first image by shifting a single pixel; the second method is the "Embedding" (Embedding) method, which first reduces the image to a minimum size of 100px while maintaining the aspect ratio and embeds it at a random location within the 224x224px image while filling the rest of the image with Black (Black) pixels, then shifts the Embedding location by a single pixel, creating two identical images again until a single pixel is shifted; in the third method, the image is first reduced to a minimum size of 100px while maintaining the aspect ratio and embedded at random locations within the 224x224px image, then using a simple repair algorithm (each black pixel is replaced by a weighted average of the non-black pixels in its neighborhood), the fourth method is the same as the second protocol, the image is first reduced to a minimum size of 100px, but we do not move the embedding location, but instead keep the embedding location unchanged and change the size of the embedded image by a single pixel (e.g., from a size of 100x100px to a size of 101x101px pixels).

Further, in the safety test, two methods are used for measuring sensitivity as an invariance test of a model, the first method is called P (Top-1Change), and the TOP-1 of the network predicts the probability of Change after single-pixel disturbance; second referred to as "mean absolute change" (MAC), measures the change in the mean absolute value of the probability computed by the network (i.e., the class with the highest probability in the first of the two frames) after a pixel perturbation of the top class (i.e., the class with the highest probability in the first of the two frames).

Further, the decision evaluation module analyzes the input test result, judges the model performance [ identification Accuracy after Accuracy expansion, identification Loss after Loss expansion, identification Accuracy difference before and after Δ acc expansion, identification Loss difference before and after Δ Loss expansion, CR model robustness (characterized by robac), confidence score difference before and after Δ S expansion, probability of Change of TOP-1 prediction of P (Top-1Change) network after single pixel disturbance, MAC average absolute Change) ] and gives a detailed test report, when the performance of a plurality of recognition models is compared, a large number of independent performance indexes are often too complicated for users, so that the users are difficult to make reasonable judgment, therefore, the comprehensive influence of different indexes on the identification system is considered in the design of the performance indexes, then, a comprehensive performance index CM (composite value) is defined to reflect the comprehensive performance of different recognition systems; the formula is as follows:

wherein: CM (compact message processor)_iRepresenting the integrated performance value, ω, of the i-th identification system_jWeight, max (M) representing jth performance index value of cloud service_j) Represents the maximum value of the jth individual performance index, min (M), in multiple recognition systems_j) Representing the minimum value of the j-th performance indicator, M, in a plurality of recognition systems_ijA j-th performance index value representing an i-th identification system, N representing a total number of identification system performance index values, by using the formula (2 max (M)_j)-M_ij)/(2*max(M_j)-min(M_j) ) to M_ijIs normalized to [0, 1 ]]An interval. It can be seen that the larger the value of CM, the better the overall performance of the identification system.

For some identification system performance indicators, such as Loss, P (Top-1Change), MAC, etc., the value M_ijThe smaller the value of the CM comprehensive performance index is, the larger the value of the performance index is, the higher the value of the performance index is, the better the identification performance is, but the direct substitution of the value into the formula causes the CM value to be reduced, which is not expected, so that the values of the performance indexes need to be processed and used (1-M)_ij) To replace M in the formula_ijThe value is obtained.

Further, the decision evaluation module finally outputs the results of the multi-angle test module test and generates a corresponding test report table, as shown in tables 1-3. The test system also gives corresponding suggestions due to different requirements of different task scenarios.

Compared with the prior art, the invention has the following beneficial effects:

the invention realizes the generation of multi-class image test data and multi-angle complete system test, finally completes the decision evaluation of the medical image recognition system, and has wide future application prospect.

Drawings

FIG. 1 is a framework flow diagram of the present invention;

FIG. 2 is a flow chart of the present invention for countering amplification;

FIG. 3 is a flow chart of the background recombination augmentation of an object in the present invention;

FIG. 4 is a block diagram of a decision evaluation module of the present invention.

Detailed Description

The present invention will be further described with reference to the following examples.

A performance testing method for a medical image recognition system, as shown in fig. 1, the performance testing method includes: the system comprises a multi-class image test data generation module, a multi-angle test module and a decision evaluation module, wherein the multi-class image test data generation module comprises a confrontation sample generation network and an entity and background recombination method; the multi-angle test module comprises a performance test, a reliability test and a safety test; the decision evaluation module analyzes the input test result, judges the performance of the model and gives a detailed test report;

A countermeasure sample generation network and an entity and background recombination method use a countermeasure sample generation joint entity and background recombination scheme in consideration of the characteristics of medical images and the visual reality of generated test images. Counter-augmentation uses multi-loss hybrid counter-camouflaging, which can generate new enhanced images that human observers look legitimate without relying on large amounts of data to train the generating network. The method aims to develop a mechanism to generate an extended sample with a custom pattern, realize image enhancement by using a pattern transformation technology and realize concealment of the image by using an anti-attack technology. Final multiple loss function

Is toResistance to loss of strength lambda and antagonism

Product of (2), pattern loss for pattern generation

Content loss for preserving source image content

And smoothness penalty for ensuring smoothness of extended samples

Combinations of (a) and (b).

Multiple loss function

Expressed as:

wherein: λ represents the antagonistic strength,

Indicating a loss of resistance,

Representing a style loss for style generation,

Representing a content loss for preserving source image content,

as shown in fig. 2, an overview of the anti-augmentation method is shown. The user defines the existing image, the target attack area and the expected target pattern, and generates the required pattern in the required area, as shown on the right side of fig. 2. In order to make the extended samples robust to various environmental conditions (including illumination, rotation, etc.), additional physical adaptation training is added to the generated extended samples at each step;

wherein:

is a feature extractor for a style or style,

is from

pattern loss for pattern generation

wherein:

is a content loss, t isContent layer characteristics, c_tIs a set of content layers that extracts the content representation,

wherein: x'_i，jTo combat the pixel value at the sample (i, j) coordinate, x_i+1，jIs the pixel value, x, at the (i +1, j) coordinate of the original image_i,j+1For the pixel value at the original image (i, j +1) coordinate:

for the loss of antagonism

The following cross entropy loss was used:

In order to make the confrontational image samples realistic in the real world, we model the realistic conditions in the process of generating the augmented samples. Since real-world environments often involve fluctuations in conditions such as viewpoint movement, image noise and other natural transformations, we use a series of adjustments to accommodate these different conditions. In particular, we use a technique similar to expected over-conversion (EOT). Our goal is to improve the adaptability of the expanded samples to different physical conditions. Therefore, we consider transformations to model fluctuations in physical world conditions, including rotation, scaling, color shifting (to model lighting variations), and random backgrounds. Here, realistic conditions are introduced into the generation process of the augmented example, as follows:

is a set of transformations; by generating an enhanced sample from the original image x and the background image o that is substantially legitimate to a human observer;

the target background recombination and augmentation uses a segmentation algorithm Mask R-CNN to segment the target from the background, uses an interpolation algorithm to supplement pixels to the blank part in the background, and finally randomly combines the target and the background to realize the image augmentation, wherein the overall method frame is shown in figure 3.

The performance test in the multi-angle test module comprises different angles: judging and judging the identification Accuracy Accuracy, judging and judging the identification Loss value Loss and judging the metamorphic relation; judging the Accuracy Accuracy and the Loss value Loss, wherein the Accuracy Accuracy and the Loss value Loss are obtained by subtracting the Accuracy Accuracy and the identification Loss value Loss output by the models before and after the augmentation to obtain the identification Accuracy difference percentage delta acc before and after the augmentation and the identification Loss difference percentage delta Loss before and after the augmentation;

the disintegration test is defined as: c_iFor the original test image

Is classified by the image recognition system, S_iFor the original test image

A confidence score of; c_i' is a combination of transmutation relationshipBy using

Synthesized new test image

Class label of S_i' use for association with metamorphic relations

Synthesized new test image

Then the metamorphic relationship is expressed as:

C_i＝C′_iandΔS＝|S_i-S′_i|<c (7)

The reliability test in the multi-angle test module is a robustness test, and under the condition that an original image x meets confidence guarantee, immune attack can be carried out in a norm sphere radius R:

the final robustness accuracy (robac) is defined as:

In the safety test, two methods are used for measuring sensitivity as an invariance test of a model, the first method is called P (Top-1Change), and the TOP-1 of the network predicts the probability of Change after single-pixel disturbance; second referred to as "mean absolute change" (MAC), measures the change in the mean absolute value of the probability computed by the network (i.e., the class with the highest probability in the first of the two frames) after a pixel perturbation of the top class (i.e., the class with the highest probability in the first of the two frames).

As shown in FIG. 4, the decision evaluation module analyzes the input test result, and determines the model performance [ recognition Accuracy after Accuracy expansion, recognition Loss after Loss expansion, recognition Accuracy difference before and after Δ acc expansion, recognition Loss difference before and after Δ Loss expansion, CR model robustness (characterized by robac), confidence score difference before and after Δ S expansion, probability of Change after single pixel disturbance of TOP-1 prediction of P (Top-1Change) network, MAC average absolute Change) ] and gives a detailed test report, when comparing the performance of a plurality of recognition models, a large number of individual performance indexes are often too complicated for users, so that the users are difficult to make reasonable judgment, therefore, the comprehensive influence of different indexes on the identification system is considered in the design of the performance indexes, then, a comprehensive performance index CM (composite value) is defined to reflect the comprehensive performance of different recognition systems; the formula is as follows:

The decision evaluation module finally outputs the test results of the multi-angle test module and generates a corresponding test report table as shown in tables 1-3. The test system also gives corresponding suggestions due to different requirements of different task scenarios.

TABLE 1 Performance index report

TABLE 2 safety index report

TABLE 3 model stability index report and model combination property report

The above description is only of the preferred embodiments of the present invention, and it should be noted that: it will be apparent to those skilled in the art that various modifications and adaptations can be made without departing from the principles of the invention and these are intended to be within the scope of the invention.

Claims

1. A performance test method for a medical image recognition system is characterized by comprising the following steps: the performance test method comprises the following steps: the multi-class image test data generation module comprises a confrontation sample generation network and an entity and background recombination method; the multi-angle test module comprises a performance test, a reliability test and a safety test; the decision evaluation module analyzes the input test result, judges the performance of the model and gives a detailed test report;

2. The performance testing method for medical image recognition system according to claim 1, wherein:

the challenge sample generation network and entity, the background recombination method includes using multiple loss hybrid challenge camouflage against augmentation,

multiple loss function

Expressed as:

wherein: λ represents the antagonistic strength,

Indicating a loss of resistance,

Representing a style loss for style generation,

Representing a content loss for preserving source image content,

wherein:

is a feature extractor for a style or style,

is from

pattern loss for pattern generation

wherein:

wherein: x'_i，jTo combat the pixel value at the sample (i, j) coordinate, x_i+1，jIs the pixel value, x, at the (i +1, j) coordinate of the original image_i，j+1For the pixel value at the original image (i, j +1) coordinate:

for the loss of antagonism

The following cross entropy loss was used:

wherein: p is a radical of_yadv() And p_y() Respectively outputting the probability of the target model F to labels yadv and y, wherein yadv is the category of the confrontation sample, and y is the category of the original image;

wherein: o is sampling in the physical worldA machine background image, T is a random transformation of rotation, resizing and color shifts,

is a set of transformations;

3. The performance testing method for medical image recognition system according to claim 1, wherein:

the disintegration test is defined as: c_iFor the original test image

Is classified by the image recognition system, S_iFor the original test image

A confidence score of; c_i' use for association with metamorphic relations

Synthesized new test image

Class label of S_i' use for association with metamorphic relations

Synthesized new test image

The confidence score of (a) is determined,

the metamorphic relationship is expressed as:

C_i＝C′_iand ΔS＝|s_i-s′_i|＜c (7)

wherein: c is a hyperparameter, c is more than 0 and less than 100, c is set to be 50, and Delta S is the difference of confidence scores before and after expansion.

4. The performance testing method for medical image recognition system according to claim 1, wherein:

the reliability test in the multi-angle test module is a robust verified robustness) test, and under the condition that an original image x meets confidence guarantee, immune attack can be carried out in a norm sphere radius R:

the method is characterized in that the method is arbitrary, epsilon is introduced noise, B (x; R) is a noise set, R is a norm sphere radius, and x is an original image;

the final robustness accuracy robac is defined as:

5. the performance testing method for medical image recognition system according to claim 1, wherein:

the security test in the multi-angle test module is a model invariance test, a random image is selected, the disturbance of a pixel is selected by using one of four methods described below, and then the sensitivity of the network to the disturbance is measured, the first method is a Crop method, a square is randomly selected in an original image, the size of the square is adjusted to 224x224px, then the diagonal of the square is translated by one pixel to create a second image, and the image is different from the first image by translating a single pixel; the second method, Embedding, is the Embedding method, which first reduces the image to a minimum size of 100px while maintaining the aspect ratio and embeds it at a random location within the 224x224px image while filling the rest of the image with black pixels, then shifts the Embedding location by a single pixel, and creates two identical images again until a single pixel is shifted; in the third method, the image is first reduced to a minimum size of 100px while maintaining the aspect ratio and embedded at random locations within the 224x224px image, then a simple repair algorithm is used, i.e., each black pixel is replaced by a weighted average of the non-black pixels in its neighborhood, and the fourth method is the same as the second method, the image is first reduced to a minimum size of 100px without moving the embedded location, but keeping the embedded location unchanged, and changing the size of the embedded image by a single pixel.

6. The performance testing method for medical image recognition system according to claim 5, wherein:

in the security test, sensitivity is measured as an invariance test of a model by two methods, the first is the probability that TOP-1 of the network predicts the Change after a single-pixel disturbance, which is called P, i.e. Top-1Change, and the second is the average absolute Change of the probability calculated by the network after a pixel disturbance of the TOP class, which is called MAC.

7. The performance testing method for medical image recognition system according to claim 1, wherein:

the decision evaluation module analyzes the input test result, and judges the model performance, namely the identification Accuracy after Accuracy expansion, the identification Loss after Loss expansion, the identification Accuracy difference before and after delta acc expansion, the identification Loss difference before and after delta Loss expansion, the CR model robustness is represented by robac, the confidence score difference before and after delta S expansion, the probability that the TOP-1 of a P network changes after single-pixel disturbance is predicted, the MAC average absolute change is given, and a test report is given; the formula is as follows:

wherein: CM (compact message processor)_iRepresenting the integrated performance value, ω, of the i-th identification system_jWeight, max (M) representing jth performance index value of cloud service_j) Represents the maximum value of the jth individual performance index, min (M), in multiple recognition systems_j) Representing the minimum value of the j-th performance indicator, M, in a plurality of recognition systems_ijA j-th performance index value representing an i-th identification system, N representing a total number of identification system performance index values, by using the formula (2 max (M)_j)-M_ij)/(2*max(M_j)-min(M_j) ) to M_ijIs normalized to [0, 1 ]]An interval.

8. The performance testing method for medical image recognition system according to claim 1, wherein:

and the decision evaluation module finally outputs various test results of the multi-angle test module and generates a corresponding test reporting table.