Detailed Description
To describe in detail the technical content, objectives and effects of the present invention, the following description is given with reference to the embodiments and the accompanying drawings.
Referring to fig. 1, a method for evaluating an analysis result includes the steps of:
S1, acquiring a preset number of files as a test set;
S2, acquiring a first marking result of the test set produced with a first device and a second marking result of the test set produced by an AI model;
S3, acquiring a gold standard, and performing a t-test on first difference values between the first marking result and the gold standard and second difference values between the second marking result and the gold standard to obtain a first test result;
S4, determining whether the first test result is greater than a threshold value, and if so, determining that the second marking result is accurate.
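Steps S3 and S4 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the prescribed implementation: the text only says "t-test", and a paired t-test is assumed here because each file yields one first difference value and one second difference value. To stay dependency-free, the two-sided P value is obtained by integrating Student's t density with Simpson's rule instead of calling a statistics library. All function names are hypothetical.

```python
import math
from statistics import mean, stdev


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t: float, df: float, steps: int = 2000) -> float:
    """Two-sided P value, 1 - 2 * P(0 <= T <= |t|).

    The integral is evaluated with Simpson's rule (steps must be even),
    so no statistics library is required.
    """
    a = abs(t)
    if a == 0:
        return 1.0
    if math.isinf(a):
        return 0.0
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3)


def paired_t_test(first_diffs, second_diffs):
    """Paired t-test on per-file difference values; returns (t, P)."""
    if len(first_diffs) != len(second_diffs) or len(first_diffs) < 2:
        raise ValueError("need at least two files, each with both difference values")
    d = [a - b for a, b in zip(first_diffs, second_diffs)]
    m, sd = mean(d), stdev(d)
    if sd == 0:
        # Every file has the same gap, so t is either 0 or unbounded.
        return (0.0, 1.0) if m == 0 else (math.copysign(math.inf, m), 0.0)
    t = m / (sd / math.sqrt(len(d)))
    return t, two_sided_p(t, len(d) - 1)


def second_marking_is_accurate(first_diffs, second_diffs, threshold=0.05):
    """Step S4: the first test result (the P value) must exceed the threshold."""
    _, p = paired_t_test(first_diffs, second_diffs)
    return p > threshold
```

The two lists must be ordered by file so that position i in both refers to the same file. With SciPy installed, `scipy.stats.ttest_rel` performs the same paired test.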
From the above description, the beneficial effects of the invention are as follows: the first device and the AI model mark the same test set, and each marking result is compared with the gold standard to calculate difference values, so that how far each marking mode deviates from the gold standard can be compared directly; subjecting the difference values to a t-test quantifies the accuracy criterion, so that whether the marking result of the AI model is accurate can be judged directly from the t-test result, thereby achieving accuracy evaluation of the AI analysis result.
Further, the first difference value and the second difference value in step S3 are calculated as follows:
acquiring the coordinates of the first marking result, the coordinates of the second marking result and the coordinates of the gold standard;
and calculating, by a trigonometric (Pythagorean) relation, i.e. the straight-line distance between two coordinate points, a first difference value between the coordinates of the first marking result and the coordinates of the gold standard and a second difference value between the coordinates of the second marking result and the coordinates of the gold standard.
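A minimal Python sketch of these two calculations, assuming that all marks and gold-standard points are (x, y) pixel coordinates in one shared coordinate system, and reading the "trigonometric function" as the straight-line (Pythagorean) distance used in the formula of the first embodiment. The helper names and sample coordinates are hypothetical.

```python
import math


def mark_difference(a, b) -> float:
    """Straight-line distance sqrt((xa - xb)^2 + (ya - yb)^2) between two marks."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def difference_values(marks, reference):
    """Per-file difference values; marks[i] and reference[i] belong to the same file."""
    if len(marks) != len(reference):
        raise ValueError("each file needs one mark and one reference point")
    return [mark_difference(m, r) for m, r in zip(marks, reference)]


# Hypothetical coordinates for two files.
gold = [(1500.0, 2000.0), (1480.0, 2010.0)]
human1 = [(1503.0, 2004.0), (1480.0, 2010.0)]  # first marking result
ai = [(1494.0, 1992.0), (1485.0, 2022.0)]      # second marking result

first = difference_values(human1, gold)   # [5.0, 0.0]
second = difference_values(ai, gold)      # [10.0, 13.0]
third = difference_values(ai, human1)     # [15.0, 13.0]
```

Passing the first marking result as the reference, as in the last line, gives the third difference values used when no gold standard is available.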
From the above description, the deviations of the first marking result and the second marking result from the gold standard are obtained from their coordinates, so that the deviation of the first marking from the gold standard and the deviation of the second marking from the gold standard can be compared conveniently.
Further, the step S3 further includes:
generating a scatter diagram from the first difference values and the second difference values, taking the gold standard as the circle center and the difference value as the radius;
or, taking the first marking result as the circle center, generating a scatter diagram according to third difference values between the coordinates of the first marking result and the coordinates of the second marking result;
or, taking the second marking result as the circle center, generating a scatter diagram according to the third difference values.
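The data behind these scatter diagrams can be computed as below, with the plotting itself left to any charting tool. This sketch assumes, as in the alternative embodiment that also records direction, that each point is drawn at its actual offset from the circle center, so its distance from the origin equals the difference value. The function name is hypothetical.

```python
import math


def scatter_offsets(centers, marks):
    """Place each mark relative to its file's circle center.

    Returns one (dx, dy, radius, angle_deg) tuple per file:
      dx, dy    offset to plot on a diagram whose origin is the circle center
      radius    the difference value (straight-line distance)
      angle_deg direction of the mark from the center, measured from the +x
                axis; with image coordinates (y grows downward) the angle
                increases clockwise on screen.
    """
    if len(centers) != len(marks):
        raise ValueError("each file needs one center and one mark")
    points = []
    for (cx, cy), (x, y) in zip(centers, marks):
        dx, dy = x - cx, y - cy
        points.append((dx, dy, math.hypot(dx, dy), math.degrees(math.atan2(dy, dx))))
    return points
```

Passing the gold-standard coordinates as `centers` gives the diagram for the first or second difference values; passing the first (or second) marking result as `centers` and the other marking result as `marks` gives the diagram based on the third difference values. The `dx` and `dy` columns can then be handed to a plotting call such as matplotlib's `scatter`.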
As can be seen from the above description, generating a scatter diagram from the first difference values, the second difference values and the gold standard visually presents how far the first marking and the second marking each deviate from the gold standard, i.e., the accuracy of the first and second marking results can be seen at a glance; and since the gold standard may not be obtainable, the first marking or the second marking can instead be used as the circle center and a scatter diagram generated from the third difference values between the first marking and the second marking, so that the difference between the two markings can be seen directly, which makes it easier to judge the accuracy of the second marking.
Further, the step S2 further includes:
obtaining a third marking result of the test set produced with a second device and a fourth marking result of the test set produced with the first device, wherein the fourth marking result and the first marking result are generated at different times;
the step S4 further includes:
calculating a first set of intraclass correlation coefficients between the first marking result and the fourth marking result, a second set of intraclass correlation coefficients between the first marking result and the third marking result, and a third set of intraclass correlation coefficients between the first marking result and the second marking result;
performing a t-test on the first set and the third set of intraclass correlation coefficients to obtain a second test result, and performing a t-test on the second set and the third set of intraclass correlation coefficients to obtain a third test result;
determining whether the second test result and the third test result are both greater than a threshold value, and if so, determining that the second marking result is repeatable.
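The repeatability decision can be sketched as follows, assuming each set already holds several bootstrap intraclass correlation coefficients. The text does not name the t-test variant; Welch's two-sample test (no equal-variance assumption) is assumed here because the coefficient sets come from different pairs of marking results. The P value is again computed by numerically integrating Student's t density to avoid third-party dependencies, and all function names are hypothetical.

```python
import math
from statistics import mean, variance


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution (df may be non-integer)."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t: float, df: float, steps: int = 2000) -> float:
    """Two-sided P value via Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    if a == 0:
        return 1.0
    if math.isinf(a):
        return 0.0
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3)


def welch_t_test(a, b):
    """Two-sample t-test without assuming equal variances; returns (t, P)."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    diff = mean(a) - mean(b)
    se = math.sqrt(va + vb)
    if se == 0:
        return (0.0, 1.0) if diff == 0 else (math.copysign(math.inf, diff), 0.0)
    # Welch-Satterthwaite degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    t = diff / se
    return t, two_sided_p(t, df)


def second_marking_is_repeatable(first_iccs, second_iccs, third_iccs, threshold=0.05):
    """Both the second and the third test result must exceed the threshold."""
    _, p_second = welch_t_test(first_iccs, third_iccs)  # second test result
    _, p_third = welch_t_test(second_iccs, third_iccs)  # third test result
    return p_second > threshold and p_third > threshold
```

Because the P value of such a test shrinks as more bootstrap coefficients are included, the number of coefficients per set is best fixed before the comparison is run. With SciPy installed, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same Welch test.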
As can be seen from the above description, the second device is added to obtain the third marking result, and the first device additionally provides the fourth marking result, which is generated at a different time from the first marking result; these added comparison groups make the evaluation result more reliable. The intraclass correlation coefficients of the comparison groups are compared by t-tests, so that the repeatability of the second marking result, i.e., the AI marking result, is also evaluated, making the evaluation more complete and its result more reliable.
Further, the first, second and third sets of intraclass correlation coefficients are calculated specifically as follows:
calculating the first, second and third intraclass correlation coefficients by the bootstrap method, so as to obtain a plurality of first, second and third intraclass correlation coefficients, respectively.
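One way to obtain the bootstrap coefficients is sketched below. The text does not specify which ICC form is used or which quantity is rated; this sketch assumes one scalar per file (for example, the x-coordinate of each mark) and the two-way, absolute-agreement, single-measure form ICC(A,1) of McGraw and Wong. Files are resampled with replacement while each file's pair of ratings is kept together, and the default of 50 resamples follows the "at least 50" mentioned in the second embodiment. Function names are hypothetical.

```python
import random
from statistics import mean


def icc_a1(x, y):
    """ICC(A,1) for two raters: two-way model, absolute agreement, single measures.

    x[i] and y[i] are the two ratings of file i. Returns None when the ratings
    have no variance at all (the coefficient is then undefined).
    """
    n, k = len(x), 2
    if n < 2 or len(y) != n:
        raise ValueError("need at least two files rated by both raters")
    values = list(x) + list(y)
    grand = mean(values)
    row_means = [(a + b) / 2 for a, b in zip(x, y)]
    col_means = [mean(x), mean(y)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between files
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((v - grand) ** 2 for v in values)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = max(ss_total - ss_rows - ss_cols, 0.0) / ((n - 1) * (k - 1))
    denom = ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    if denom <= 0:
        return None
    return (ms_rows - ms_err) / denom


def bootstrap_iccs(x, y, n_boot=50, seed=0):
    """Resample files with replacement (keeping each file's pair together)
    and compute one ICC per resample, giving n_boot coefficients."""
    if icc_a1(x, y) is None:
        raise ValueError("ratings have no variance; ICC is undefined")
    rng = random.Random(seed)
    n = len(x)
    iccs = []
    while len(iccs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        value = icc_a1([x[i] for i in idx], [y[i] for i in idx])
        if value is not None:  # skip resamples that drew identical values only
            iccs.append(value)
    return iccs
```

Calling `bootstrap_iccs` on the first and fourth, first and third, and first and second marking results yields the three sets of coefficients that are then compared by t-tests.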
From the above description, calculating the intraclass correlation coefficients by the bootstrap method yields a large number of coefficients from a single group of samples, so that the coefficients of different groups can subsequently be compared by t-tests, finally achieving the repeatability evaluation.
Referring to fig. 2, an evaluation terminal for analysis results includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer program:
S1, acquiring a preset number of files as a test set;
S2, acquiring a first marking result of the test set produced with a first device and a second marking result of the test set produced by an AI model;
S3, acquiring a gold standard, and performing a t-test on first difference values between the first marking result and the gold standard and second difference values between the second marking result and the gold standard to obtain a first test result;
S4, determining whether the first test result is greater than a threshold value, and if so, determining that the second marking result is accurate.
The beneficial effects of the invention are as follows: the first device and the AI model mark the same test set, and each marking result is compared with the gold standard to calculate difference values, so that how far each marking mode deviates from the gold standard can be compared directly; subjecting the difference values to a t-test quantifies the accuracy criterion, so that whether the marking result of the AI model is accurate can be judged directly from the t-test result, thereby achieving accuracy evaluation of the AI analysis result.
Further, when calculating the first difference value and the second difference value in step S3, the processor performs the following:
acquiring the coordinates of the first marking result, the coordinates of the second marking result and the coordinates of the gold standard;
and calculating, by a trigonometric (Pythagorean) relation, i.e. the straight-line distance between two coordinate points, a first difference value between the coordinates of the first marking result and the coordinates of the gold standard and a second difference value between the coordinates of the second marking result and the coordinates of the gold standard.
From the above description, the deviations of the first marking result and the second marking result from the gold standard are obtained from their coordinates, so that the deviation of the first marking from the gold standard and the deviation of the second marking from the gold standard can be compared conveniently.
Further, the step S3 further includes:
generating a scatter diagram from the first difference values and the second difference values, taking the gold standard as the circle center and the difference value as the radius;
or, taking the first marking result as the circle center, generating a scatter diagram according to third difference values between the coordinates of the first marking result and the coordinates of the second marking result;
or, taking the second marking result as the circle center, generating a scatter diagram according to the third difference values.
As can be seen from the above description, generating a scatter diagram from the first difference values, the second difference values and the gold standard visually presents how far the first marking and the second marking each deviate from the gold standard, i.e., the accuracy of the first and second marking results can be seen at a glance; and since the gold standard may not be obtainable, the first marking or the second marking can instead be used as the circle center and a scatter diagram generated from the third difference values between the first marking and the second marking, so that the difference between the two markings can be seen directly, which makes it easier to judge the accuracy of the second marking.
Further, the step S2 further includes:
obtaining a third marking result of the test set produced with a second device and a fourth marking result of the test set produced with the first device, wherein the fourth marking result and the first marking result are generated at different times;
the step S4 further includes:
calculating a first set of intraclass correlation coefficients between the first marking result and the fourth marking result, a second set of intraclass correlation coefficients between the first marking result and the third marking result, and a third set of intraclass correlation coefficients between the first marking result and the second marking result;
performing a t-test on the first set and the third set of intraclass correlation coefficients to obtain a second test result, and performing a t-test on the second set and the third set of intraclass correlation coefficients to obtain a third test result;
determining whether the second test result and the third test result are both greater than a threshold value, and if so, determining that the second marking result is repeatable.
As can be seen from the above description, the second device is added to obtain the third marking result, and the first device additionally provides the fourth marking result, which is generated at a different time from the first marking result; these added comparison groups make the evaluation result more reliable. The intraclass correlation coefficients of the comparison groups are compared by t-tests, so that the repeatability of the second marking result, i.e., the AI marking result, is also evaluated, making the evaluation more complete and its result more reliable.
Further, the first, second and third sets of intraclass correlation coefficients are calculated specifically as follows:
calculating the first, second and third intraclass correlation coefficients by the bootstrap method, so as to obtain a plurality of first, second and third intraclass correlation coefficients, respectively.
From the above description, calculating the intraclass correlation coefficients by the bootstrap method yields a large number of coefficients from a single group of samples, so that the coefficients of different groups can subsequently be compared by t-tests, finally achieving the repeatability evaluation.
Referring to fig. 1, a first embodiment of the present invention is as follows:
the method for evaluating analysis results specifically includes the following steps:
S1, acquiring a preset number of files as a test set;
in an alternative embodiment, the files are images;
S2, acquiring a first marking result of the test set produced with the first device and a second marking result of the test set produced by the AI model;
the first marker (a human annotator) marks the test set using the first device to obtain the first marking result;
S3, acquiring a gold standard, and performing a t-test on the first difference values between the first marking result and the gold standard and the second difference values between the second marking result and the gold standard to obtain a first test result;
the gold standard is the correct position of the mark. Taking the determination of the gold standard for the macular fovea in an ultra-wide-angle fundus image as an example, the same patient may undergo both OCTA (Optical Coherence Tomography Angiography) and ultra-wide-angle fundus imaging. The position of the macular fovea is determined in the OCTA tomographic image, and the relative positional relationship between the fovea and the retinal blood vessels in that image is established. Because OCTA and ultra-wide-angle fundus imaging capture the same retinal blood vessels, a linear regression equation is built from the relative position of the fovea and the retinal vessels on the OCTA tomographic image; applying it to the positions of the retinal vessels in the ultra-wide-angle fundus image yields the accurate position of the macular fovea, i.e., the gold standard for the fovea;
a corresponding gold standard is determined for each file in the test set;
on this basis, the first difference value and the second difference value in step S3 are calculated as follows:
acquiring the coordinates of the first marking result, the coordinates of the second marking result and the coordinates of the gold standard; specifically, the marked files in the test set may be placed in the same coordinate system in the same way, and the coordinates of the first marking result, the second marking result and the gold standard are read in that system;
in an alternative embodiment, the images in the test set all have the same size, for example 3000×4000 pixels, so the pixel position of each mark point can be used directly as the coordinates of the marking result;
calculating from these coordinates, by the trigonometric (Pythagorean) relation, the first difference value between the coordinates of the first marking result and the coordinates of the gold standard and the second difference value between the coordinates of the second marking result and the coordinates of the gold standard. For example, if the mark for a file in the second marking result has coordinates $(X_{AI}, Y_{AI})$ and the mark for the same file in the first marking result has coordinates $(X_{Human1}, Y_{Human1})$, the difference between them is:
$$d = \sqrt{(X_{AI} - X_{Human1})^2 + (Y_{AI} - Y_{Human1})^2}$$
the first and second difference values are obtained in the same way by substituting the coordinates of the gold standard for those of the other marking result.
After the first difference value and the second difference value are obtained in step S3, the method further includes:
generating a scatter diagram from the first difference values and the second difference values, taking the coordinates of the gold standard as the circle center and the difference value as the radius;
or, taking the coordinates of the first marking result as the circle center, generating a scatter diagram according to the third difference values between the coordinates of the first marking result and the coordinates of the second marking result;
or, taking the coordinates of the second marking result as the circle center, generating a scatter diagram according to the third difference values;
in an alternative embodiment, the direction of the coordinates of the first marking result relative to the coordinates of the gold standard and the direction of the coordinates of the second marking result relative to the coordinates of the gold standard are also obtained, and the scatter diagram is generated from both the difference values and the directions; alternatively, the scatter diagram is generated directly from the coordinates of the first marking result, the coordinates of the second marking result and the coordinates of the gold standard;
S4, determining whether the first test result is greater than a threshold value, and if so, determining that the second marking result is accurate;
the first test result is the P value of the t-test, and the threshold may be 0.05; when P is greater than 0.05, the difference between the second marking and the gold standard is considered not to differ significantly from the difference between the first marking and the gold standard, i.e., the AI marking result is considered to have the same accuracy as the manual marking result of the first marker;
in an alternative embodiment, when no gold standard is available, the agreement between the first marking and the second marking can be described with a Bland-Altman plot, i.e., the agreement between the second marking result of the AI and the first marking result of the first marker is obtained;
if the second marking result of the AI agrees well with the first marking result of the first marker, the marking result of the AI model is considered to have the same accuracy as the manual marking result of the first marker.
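The Bland-Altman comparison can be sketched as follows, assuming it is applied to one coordinate axis at a time (for example, the x-coordinates of the AI marks and of the first marker's marks for the same files). The 95% limits of agreement use the conventional bias ± 1.96 × SD of the differences; what counts as "agreeing well" is not defined in the text and is left to the evaluator. The function name is hypothetical.

```python
from statistics import mean, stdev


def bland_altman(first, second):
    """Bland-Altman statistics for paired measurements of the same files.

    Returns (means, diffs, bias, lower_loa, upper_loa): the x-values (pair
    means) and y-values (pair differences) of the plot, the mean difference,
    and the 95% limits of agreement bias +/- 1.96 * SD of the differences.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [a - b for a, b in zip(first, second)]
    means = [(a + b) / 2 for a, b in zip(first, second)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return means, diffs, bias, bias - spread, bias + spread
```

Plotting `diffs` against `means` with horizontal lines at `bias`, `lower_loa` and `upper_loa` gives the usual Bland-Altman plot; a narrow band around a bias near zero indicates close agreement between the AI and the first marker.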
The second embodiment of the invention is as follows:
a method for evaluating analysis results, which differs from the first embodiment in that:
the step S2 further includes:
obtaining a third marking result of the test set produced with the second device and a fourth marking result of the test set produced with the first device, wherein the fourth marking result and the first marking result are generated at different times;
a second marker marks the test set using the second device to obtain the third marking result; the first marker marks the test set again using the first device to obtain the fourth marking result, which is generated at a different time from the first marking result; for example, the first marker marks the test set again four days after producing the first marking result, and this later marking is the fourth marking result;
the step S4 further includes:
S5, calculating a first intraclass correlation coefficient (ICC) between the first marking result and the fourth marking result, a second intraclass correlation coefficient between the first marking result and the third marking result, and a third intraclass correlation coefficient between the first marking result and the second marking result;
these respectively measure the agreement of the first marker's own markings of the same test set at different times, the agreement between the markings of the first marker and the second marker on the same test set, and the agreement between the markings of the first marker and the AI model on the same test set; that is, the agreement of markings of the same test set by the same person at different times, by different persons, and by a person and the AI is obtained;
in this embodiment, the bootstrap method may be used to calculate the first, second and third intraclass correlation coefficients, so as to obtain a plurality of first, second and third intraclass correlation coefficients, respectively;
specifically, the first marking result and the fourth marking result are resampled with replacement (bootstrap resampling), an intraclass correlation coefficient is calculated from each resample, and a plurality of first intraclass correlation coefficients is finally obtained; the first and third marking results are resampled in the same way to obtain a plurality of second intraclass correlation coefficients; and the first and second marking results are resampled in the same way to obtain a plurality of third intraclass correlation coefficients;
in an alternative embodiment, at least 50 first, at least 50 second and at least 50 third intraclass correlation coefficients are generated to ensure the accuracy of the t-test;
performing a t-test on the first and third intraclass correlation coefficients to obtain a second test result, and performing a t-test on the second and third intraclass correlation coefficients to obtain a third test result;
determining whether the second test result and the third test result are both greater than a threshold value, and if so, determining that the second marking result is repeatable;
the t-test on the first and third intraclass correlation coefficients checks whether the repeatability of the same person's (the first marker's) markings of the same test set at different times differs from the repeatability between that person's markings and the AI model's markings of the same test set; the t-test on the second and third intraclass correlation coefficients checks whether the repeatability between the markings of one person (the first marker) and another person (the second marker) on the same test set differs from the repeatability between the markings of the person (the first marker) and the AI model on the same test set; with a threshold of 0.05, when the P values of both t-tests are greater than 0.05, the repeatability between the AI model and the person is considered the same as the repeatability between different persons and as that of the same person at different times, i.e., the repeatability of the AI model is the same as that of the manual method;
in an alternative embodiment, the reproducibility of the AI model is also evaluated: a fifth marking result of the test set by the AI model is obtained, the fifth marking result being generated at a different time from the second marking result; a fourth set of intraclass correlation coefficients between the second marking result and the fifth marking result is calculated by the bootstrap method to obtain a plurality of fourth intraclass correlation coefficients, and a t-test is performed on the first and fourth intraclass correlation coefficients to obtain a fourth test result; whether the fourth test result and the fourth intraclass correlation coefficients are both greater than a threshold value is then determined, and if so, the marking result of the AI model is determined to be reproducible.
Referring to fig. 2, a third embodiment of the present invention is as follows:
an evaluation terminal 1 for analysis results comprises a processor 2, a memory 3, and a computer program stored in the memory 3 and executable on the processor 2; when executing the computer program, the processor 2 implements the steps of the first or second embodiment.
In summary, the present invention provides a method and a terminal for evaluating analysis results. A first marker marks a test set using a first device to obtain a first marking result and a fourth marking result, an AI model marks the test set to obtain a second marking result, and a second marker marks the test set using a second device to obtain a third marking result. A gold standard is obtained, and the difference values between the first marking result and the gold standard and between the second marking result and the gold standard are calculated, represented in a scatter diagram and subjected to a t-test; the scatter diagram shows the spatial distribution of the marking results, and if the t-test result exceeds the threshold, the marking result of the AI model and the manual marking result are considered equally accurate. In addition, obtaining the fourth marking result of the first marker and the third marking result of the second marker makes it possible to measure the repeatability of markings of the same test set by the same person (the first and fourth marking results), by different persons (the first and third marking results), and by a person and the AI (the first and second marking results). The bootstrap method yields a large number of intraclass correlation coefficients, so that t-tests can determine whether the repeatability between the person and the AI differs from the repeatability within the same person and between different persons. This solves the problem that a single value such as the ICC cannot systematically and comprehensively judge the repeatability of an AI model's marking results, and achieves accurate evaluation of AI analysis results.
The foregoing description is merely illustrative of the present invention and is not intended to limit its scope; all equivalent changes made using the specification and drawings of the present invention, and their direct or indirect application in related technical fields, are likewise included in the scope of the present invention.