CN111724374A - Evaluation method of analysis result and terminal - Google Patents

Evaluation method of analysis result and terminal

Info

Publication number
CN111724374A
Authority
CN
China
Prior art keywords
result
marking
marking result
difference value
test
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010572829.4A
Other languages
Chinese (zh)
Other versions
CN111724374B (en)
Inventor
林晨
喻碧莺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lin Chen
Wisdom Medical Shenzhen Co ltd
Original Assignee
Ke Junlong
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ke Junlong filed Critical Ke Junlong
Priority to CN202010572829.4A priority Critical patent/CN111724374B/en
Publication of CN111724374A publication Critical patent/CN111724374A/en
Application granted granted Critical
Publication of CN111724374B publication Critical patent/CN111724374B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0012Biomedical image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10072Tomographic images
    • G06T2207/10101Optical tomography; Optical coherence tomography [OCT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30041Eye; Retina; Ophthalmic
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an evaluation method of an analysis result and a terminal. A preset number of files is acquired as a test set; a first marking result of a first device on the test set and a second marking result of an AI model on the test set are acquired; a gold standard is obtained, and a t test is carried out on a first difference value between the first marking result and the gold standard and a second difference value between the second marking result and the gold standard to obtain a first test result; whether the first test result is larger than a threshold value is judged, and if so, the second marking result is considered to be accurate. By having the first device and the AI model mark the same test set, obtaining the gold standard of the test set, calculating the difference value between the first marking result and the gold standard and the difference value between the second marking result and the gold standard, and carrying out a t test on the difference values, the accuracy of the AI model relative to the first device is obtained, thereby realizing accuracy evaluation of the AI analysis result.

Description

Evaluation method of analysis result and terminal
Technical Field
The invention relates to the field of statistical methods, in particular to an evaluation method and a terminal for an analysis result.
Background
The existing method for evaluating the accuracy of an AI analysis result mainly calculates the difference between the AI analysis result and a gold standard. This judgment method, however, cannot show the distribution trend of the differences in space: in some scenes, for example, the AI measurement result tends to deviate in the horizontal direction while the deviation in the vertical direction is very small. Such spatial distribution has important prompting significance for further improving the method, but the existing method cannot embody it, so an accuracy evaluation compared with a manual method needs to be designed.
Disclosure of Invention
The technical problem to be solved by the invention is to provide an evaluation method and a terminal for an analysis result that can accurately evaluate an AI analysis result.
In order to solve the technical problems, the invention adopts a technical scheme that:
a method of evaluating an analysis result, comprising the steps of:
S1, acquiring a preset number of files as a test set;
S2, acquiring a first marking result of the first device on the test set and a second marking result of the AI model on the test set;
S3, obtaining a gold standard, and carrying out a t test on a first difference value between the first marking result and the gold standard and a second difference value between the second marking result and the gold standard to obtain a first test result;
S4, judging whether the first test result is larger than a threshold value, and if so, determining that the second marking result is accurate.
In order to solve the technical problem, the invention adopts another technical scheme as follows:
an evaluation terminal for analyzing results, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program:
S1, acquiring a preset number of files as a test set;
S2, acquiring a first marking result of the first device on the test set and a second marking result of the AI model on the test set;
S3, obtaining a gold standard, and carrying out a t test on a first difference value between the first marking result and the gold standard and a second difference value between the second marking result and the gold standard to obtain a first test result;
S4, judging whether the first test result is larger than a threshold value, and if so, determining that the second marking result is accurate.
The invention has the beneficial effects that: the first device and the AI model mark the same test set, each marking result is compared with the gold standard to calculate a difference value, and a t test is carried out on the difference values. Calculating the difference values between the different marking results and the gold standard gives an intuitive comparison of how far each marking mode deviates from the gold standard, and the t test on the difference values quantifies the accuracy criterion, so that whether the marking result of the AI model is accurate can be judged directly from the result of the t test, thereby realizing accuracy evaluation of the AI analysis result.
Drawings
FIG. 1 is a flowchart illustrating the steps of a method for evaluating an analysis result according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of an evaluation terminal for analyzing a result according to an embodiment of the present invention;
FIG. 3 is a scatter plot of an embodiment of the present invention;
description of reference numerals:
1. an evaluation terminal for analysis results; 2. a processor; 3. a memory;
Detailed Description
In order to explain technical contents, achieved objects, and effects of the present invention in detail, the following description is made with reference to the accompanying drawings in combination with the embodiments.
Referring to fig. 1, an evaluation method of an analysis result includes the steps of:
S1, acquiring a preset number of files as a test set;
S2, acquiring a first marking result of the first device on the test set and a second marking result of the AI model on the test set;
S3, obtaining a gold standard, and carrying out a t test on a first difference value between the first marking result and the gold standard and a second difference value between the second marking result and the gold standard to obtain a first test result;
S4, judging whether the first test result is larger than a threshold value, and if so, determining that the second marking result is accurate.
From the above description, the beneficial effects of the present invention are: the first device and the AI model mark the same test set, each marking result is compared with the gold standard to calculate a difference value, and a t test is carried out on the difference values. Calculating the difference values between the different marking results and the gold standard gives an intuitive comparison of how far each marking mode deviates from the gold standard, and the t test on the difference values quantifies the accuracy criterion, so that whether the marking result of the AI model is accurate can be judged directly from the result of the t test, thereby realizing accuracy evaluation of the AI analysis result.
Further, the method for calculating the first difference and the second difference in step S3 is as follows:
acquiring the coordinate of the first marking result, the coordinate of the second marking result and the coordinate of the gold standard;
and calculating a first difference value between the coordinate of the first marking result and the coordinate of the gold standard and a second difference value between the coordinate of the second marking result and the coordinate of the gold standard by utilizing a trigonometric function.
From the above description, the differences among the first marking result, the second marking result and the gold standard are obtained from their coordinates and are quantified, so that the difference between the first marking result and the gold standard can easily be compared with the difference between the second marking result and the gold standard.
Further, the step S3 further includes:
generating a scatter diagram according to the first difference value and the second difference value by taking the gold standard as a circle center and taking the difference value as a radius;
or generating a scatter diagram according to a third difference value between the coordinate of the first marking result and the coordinate of the second marking result by taking the first marking result as a circle center;
or generating a scatter diagram according to the third difference value by taking the second marking result as a circle center.
According to the above description, generating a scatter diagram from the first difference value, the second difference value and the gold standard makes the difference values between the first marking result and the gold standard and between the second marking result and the gold standard visually apparent, so that the accuracy of the first marking result and of the second marking result can be judged intuitively; in addition, for the situation in which the gold standard cannot be acquired, the first marking result or the second marking result can be used as the circle center and a scatter diagram is generated from the third difference value between them, so that the difference between the first marking result and the second marking result can be observed intuitively, making it more convenient to judge the accuracy of the second marking result.
Further, the step S2 further includes:
acquiring a third marking result of the second device on the test set and a fourth marking result of the first device on the test set, wherein the fourth marking result is generated at a time different from that of the first marking result;
the step S4 is followed by:
calculating a first intra-group correlation coefficient between the first marking result and the fourth marking result, a second intra-group correlation coefficient between the first marking result and the third marking result, and a third intra-group correlation coefficient between the first marking result and the second marking result;
carrying out a t test on the first intra-group correlation coefficient and the third intra-group correlation coefficient to obtain a second test result, and carrying out a t test on the second intra-group correlation coefficient and the third intra-group correlation coefficient to obtain a third test result;
and judging whether the second test result and the third test result are both larger than a threshold value, and if so, determining that the second marking result is repeatable.
As can be seen from the above description, a second device is added to obtain a third marking result, and a fourth marking result of the first device is obtained whose generation time differs from that of the first marking result. Comparison groups are thus added, which makes the evaluation result more reliable; a t test is carried out on the intra-group correlation coefficients of the comparison groups, so that the second marking result, i.e. the marking result of the AI, is further evaluated for repeatability, making the evaluation dimensions more complete and the evaluation result more reliable.
Further, the calculating the first, second and third intra-group correlation coefficients specifically includes:
and calculating the first intra-group correlation coefficient, the second intra-group correlation coefficient and the third intra-group correlation coefficient by using a bootstrap method to respectively obtain a plurality of first intra-group correlation coefficients, second intra-group correlation coefficients and third intra-group correlation coefficients.
As can be seen from the above description, the intra-group correlation coefficients are calculated by the bootstrap method, and a large number of intra-group correlation coefficients can be obtained from one group of samples, so that the intra-group correlation coefficients of different groups can subsequently be subjected to a t test to obtain a comparison result, finally achieving the repeatability evaluation.
Referring to fig. 2, an evaluation terminal for analyzing a result includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer program:
S1, acquiring a preset number of files as a test set;
S2, acquiring a first marking result of the first device on the test set and a second marking result of the AI model on the test set;
S3, obtaining a gold standard, and carrying out a t test on a first difference value between the first marking result and the gold standard and a second difference value between the second marking result and the gold standard to obtain a first test result;
S4, judging whether the first test result is larger than a threshold value, and if so, determining that the second marking result is accurate.
The invention has the beneficial effects that: the first device and the AI model mark the same test set, each marking result is compared with the gold standard to calculate a difference value, and a t test is carried out on the difference values. Calculating the difference values between the different marking results and the gold standard gives an intuitive comparison of how far each marking mode deviates from the gold standard, and the t test on the difference values quantifies the accuracy criterion, so that whether the marking result of the AI model is accurate can be judged directly from the result of the t test, thereby realizing accuracy evaluation of the AI analysis result.
Further, when the processor performs the calculation of the first difference and the second difference in step S3:
acquiring the coordinate of the first marking result, the coordinate of the second marking result and the coordinate of the gold standard;
and calculating a first difference value between the coordinate of the first marking result and the coordinate of the gold standard and a second difference value between the coordinate of the second marking result and the coordinate of the gold standard by utilizing a trigonometric function.
From the above description, the differences among the first marking result, the second marking result and the gold standard are obtained from their coordinates and are quantified, so that the difference between the first marking result and the gold standard can easily be compared with the difference between the second marking result and the gold standard.
Further, the step S3 further includes:
generating a scatter diagram according to the first difference value and the second difference value by taking the gold standard as a circle center and taking the difference value as a radius;
or generating a scatter diagram according to a third difference value between the coordinate of the first marking result and the coordinate of the second marking result by taking the first marking result as a circle center;
or generating a scatter diagram according to the third difference value by taking the second marking result as a circle center.
According to the above description, generating a scatter diagram from the first difference value, the second difference value and the gold standard makes the difference values between the first marking result and the gold standard and between the second marking result and the gold standard visually apparent, so that the accuracy of the first marking result and of the second marking result can be judged intuitively; in addition, for the situation in which the gold standard cannot be acquired, the first marking result or the second marking result can be used as the circle center and a scatter diagram is generated from the third difference value between them, so that the difference between the first marking result and the second marking result can be observed intuitively, making it more convenient to judge the accuracy of the second marking result.
Further, the step S2 further includes:
acquiring a third marking result of the second device on the test set and a fourth marking result of the first device on the test set, wherein the fourth marking result is generated at a time different from that of the first marking result;
the step S4 is followed by:
calculating a first intra-group correlation coefficient between the first marking result and the fourth marking result, a second intra-group correlation coefficient between the first marking result and the third marking result, and a third intra-group correlation coefficient between the first marking result and the second marking result;
carrying out a t test on the first intra-group correlation coefficient and the third intra-group correlation coefficient to obtain a second test result, and carrying out a t test on the second intra-group correlation coefficient and the third intra-group correlation coefficient to obtain a third test result;
and judging whether the second test result and the third test result are both larger than a threshold value, and if so, determining that the second marking result is repeatable.
As can be seen from the above description, a second device is added to obtain a third marking result, and a fourth marking result of the first device is obtained whose generation time differs from that of the first marking result. Comparison groups are thus added, which makes the evaluation result more reliable; a t test is carried out on the intra-group correlation coefficients of the comparison groups, so that the second marking result, i.e. the marking result of the AI, is further evaluated for repeatability, making the evaluation dimensions more complete and the evaluation result more reliable.
Further, the calculating the first, second and third intra-group correlation coefficients specifically includes:
and calculating the first intra-group correlation coefficient, the second intra-group correlation coefficient and the third intra-group correlation coefficient by using a bootstrap method to respectively obtain a plurality of first intra-group correlation coefficients, second intra-group correlation coefficients and third intra-group correlation coefficients.
As can be seen from the above description, the intra-group correlation coefficients are calculated by the bootstrap method, and a large number of intra-group correlation coefficients can be obtained from one group of samples, so that the intra-group correlation coefficients of different groups can subsequently be subjected to a t test to obtain a comparison result, finally achieving the repeatability evaluation.
Referring to fig. 1, a first embodiment of the present invention is:
an evaluation method of an analysis result specifically comprises the following steps:
S1, acquiring a preset number of files as a test set;
in an optional embodiment, the files are images;
S2, acquiring a first marking result of the first device on the test set and a second marking result of the AI model on the test set;
a first marker marks the test set through the first device to obtain the first marking result;
S3, obtaining a gold standard, and carrying out a t test on a first difference value between the first marking result and the gold standard and a second difference value between the second marking result and the gold standard to obtain a first test result;
the gold standard is the correct position of the mark. Taking the gold standard of the macular fovea in an ultra-wide-angle fundus image as an example, the same patient can undergo two examinations, OCTA (Optical Coherence Tomography Angiography) and ultra-wide-angle fundus photography. The position of the macular fovea and its relative position relationship with the retinal blood vessels are determined in the OCTA tomographic image; because the blood vessels imaged by OCTA are the same vessels captured by ultra-wide-angle fundus photography, a linear regression equation is established from the relative positions of the macular fovea and the retinal blood vessels in the OCTA tomographic image, and the accurate position of the macular fovea on the ultra-wide-angle fundus image can then be obtained from the positions of the retinal blood vessels on that image, yielding the gold standard of the macular fovea;
calculating the corresponding gold standard of each file in the test set;
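As a rough illustration of this gold-standard step, the sketch below fits a linear relation between retinal-vessel landmark coordinates and the foveal coordinate on OCTA data and then applies it to vessel landmarks located on the ultra-wide-angle image. The array names, shapes, and the use of scikit-learn's LinearRegression are assumptions for illustration only; the patent does not specify which inputs enter the regression.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical inputs (placeholders, not defined in the patent):
#   octa_vessel_coords: (n_eyes, 2*k) vessel landmark coordinates measured on OCTA
#   octa_fovea_coords:  (n_eyes, 2)   foveal centre read from the OCTA tomogram
#   uwf_vessel_coords:  (n_eyes, 2*k) the same landmarks located on the ultra-wide-angle image
rng = np.random.default_rng(0)
octa_vessel_coords = rng.random((30, 6))
octa_fovea_coords = rng.random((30, 2))
uwf_vessel_coords = rng.random((30, 6))

# Linear regression linking vessel positions to the foveal position
reg = LinearRegression().fit(octa_vessel_coords, octa_fovea_coords)

# Predicted foveal position on each ultra-wide-angle image = gold standard for that file
gold_fovea = reg.predict(uwf_vessel_coords)   # shape (n_eyes, 2)
```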
on this basis, the calculation method for obtaining the first difference and the second difference in step S3 is as follows:
acquiring the coordinates of the first marking result, the coordinates of the second marking result and the coordinates of the gold standard; specifically, the marked files in the test set can be placed in the same coordinate system in the same manner, and the coordinates of the first marking result, the second marking result and the gold standard are obtained;
in an optional implementation manner, the sizes of the pictures in the test set are the same, and if the sizes of the pictures are 3000 × 4000 (pixels), the positions of the pixel points where the marking points are located can be directly used as the coordinates of the marking result;
calculating the first difference value between the coordinates of the first marking result and the coordinates of the gold standard and the second difference value between the coordinates of the second marking result and the coordinates of the gold standard by utilizing a trigonometric function according to the coordinates; if the coordinates of the mark for a file in the second marking result are [X_AI, Y_AI] and the coordinates of the mark for the same file in the first marking result are [X_person1, Y_person1], the difference value between them is:
Sqrt[(X_AI − X_person1)² + (Y_AI − Y_person1)²]
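A minimal sketch of this distance calculation in Python follows; the array names (first_marks, ai_marks, gold_marks) and the example coordinates are placeholders assumed for illustration, not values taken from the patent.

```python
import numpy as np

def mark_error(marks, gold):
    """Euclidean distance between each mark and its gold-standard point."""
    marks = np.asarray(marks, dtype=float)   # shape (n_files, 2): (x, y) in pixels
    gold = np.asarray(gold, dtype=float)     # shape (n_files, 2)
    return np.sqrt(np.sum((marks - gold) ** 2, axis=1))

# Hypothetical coordinate arrays for the test set
first_marks = np.array([[1510.0, 2020.0], [1495.0, 1980.0]])   # first marker (person)
ai_marks = np.array([[1502.0, 2015.0], [1489.0, 1978.0]])      # AI model
gold_marks = np.array([[1500.0, 2010.0], [1490.0, 1975.0]])    # gold standard

first_diff = mark_error(first_marks, gold_marks)    # first difference values
second_diff = mark_error(ai_marks, gold_marks)      # second difference values
```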
After the first difference value and the second difference value are acquired in step S3, the method further includes:
generating a scatter diagram according to the first difference value and the second difference value by taking the coordinate of the gold standard as a circle center and the difference value as a radius;
or generating a scatter diagram according to a third difference value between the coordinate of the first marking result and the coordinate of the second marking result by taking the coordinate of the first marking result as a circle center;
or generating a scatter diagram according to the third difference value by taking the coordinate of the second marking result as the center of a circle;
in an optional implementation manner, the direction of the coordinate of the first marking result relative to the coordinate of the gold standard and the direction of the coordinate of the second marking result relative to the coordinate of the gold standard are also obtained, and a scatter diagram is generated according to the difference value and the direction; or generating a scatter diagram directly according to the coordinates of the first marking result, the coordinates of the second marking result and the coordinates of the gold standard;
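One possible way to draw such a scatter diagram, with the gold standard placed at the origin so that each point shows both the direction and the distance of a mark from its gold standard, is sketched below; matplotlib is assumed, and the variable names continue the placeholder arrays from the previous sketch.

```python
import matplotlib.pyplot as plt

# Offsets of each mark relative to its own gold standard (gold standard at the origin)
rater_offset = first_marks - gold_marks
ai_offset = ai_marks - gold_marks

plt.scatter(rater_offset[:, 0], rater_offset[:, 1], marker="o", label="first marking result vs gold")
plt.scatter(ai_offset[:, 0], ai_offset[:, 1], marker="x", label="AI marking result vs gold")
plt.axhline(0, color="gray", linewidth=0.5)
plt.axvline(0, color="gray", linewidth=0.5)
plt.xlabel("horizontal offset (pixels)")
plt.ylabel("vertical offset (pixels)")
plt.legend()
plt.show()
```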
S4, judging whether the first test result is larger than the threshold value, and if so, considering that the second marking result is accurate;
wherein the first test result is the p value of the t test, and the threshold value may be 0.05; that is, when the p value is greater than 0.05, it is determined that the differences between the second marking result and the gold standard do not differ from the differences between the first marking result and the gold standard, i.e. the marking result of the AI and the manual marking result of the first marker have the same accuracy;
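The sketch below shows one way to run this t test on the two groups of difference values with SciPy; a paired test is used on the assumption that both difference values are computed on the same files in the same order, although the patent only specifies a t test.

```python
from scipy import stats

# first_diff: distances of the manual marks from the gold standard (from the earlier sketch)
# second_diff: distances of the AI marks from the gold standard (same files, same order)
t_stat, p_value = stats.ttest_rel(first_diff, second_diff)

THRESHOLD = 0.05   # threshold value used in this embodiment
if p_value > THRESHOLD:
    print("No significant difference: the AI marking result is as accurate as the manual one.")
else:
    print("Significant difference between the AI and manual marking errors.")
```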
in an alternative embodiment, when the gold standard cannot be obtained, a Bland-Altman plot is used to describe the agreement between the first marking result and the second marking result, i.e. the agreement between the second marking result of the AI and the first marking result of the first marker is obtained;
and if the second marking result of the AI shows good agreement with the first marking result of the first marker, the marking result of the AI model and the manual marking result of the first marker are considered to have the same accuracy.
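A minimal Bland-Altman sketch is given below; because the marks here are 2D coordinates, it is applied to one coordinate axis at a time, which is an assumption made for illustration since the patent does not say how the plot is constructed. It continues the placeholder arrays from the earlier sketches.

```python
import numpy as np
import matplotlib.pyplot as plt

def bland_altman(a, b, label):
    """Bland-Altman plot of two measurement series (e.g. manual vs AI x coordinates)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    mean = (a + b) / 2.0
    diff = a - b
    md, sd = diff.mean(), diff.std(ddof=1)
    plt.scatter(mean, diff)
    plt.axhline(md, color="red", label="mean difference")
    plt.axhline(md + 1.96 * sd, color="gray", linestyle="--", label="limits of agreement")
    plt.axhline(md - 1.96 * sd, color="gray", linestyle="--")
    plt.xlabel(f"mean of the two {label} measurements")
    plt.ylabel(f"difference (manual − AI) in {label}")
    plt.legend()
    plt.show()

# e.g. agreement of the x coordinates of the manual and AI marks
bland_altman(first_marks[:, 0], ai_marks[:, 0], label="x coordinate")
```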
The second embodiment of the invention is as follows:
an evaluation method of an analysis result, which is different from the first embodiment in that:
the step S2 further includes:
acquiring a third marking result of the second device on the test set and a fourth marking result of the first device on the test set, wherein the fourth marking result is generated at a time different from that of the first marking result;
a second marker marks the test set through the second device to obtain the third marking result; the first marker marks the test set through the first device to obtain the fourth marking result, wherein the generation time of the fourth marking result is different from that of the first marking result; for example, four days after the first marker has marked the test set to obtain the first marking result, the first marker marks the test set again to obtain the fourth marking result;
the step S4 is followed by:
S5, calculating a first intra-group correlation coefficient (Intra-class Correlation Coefficient, ICC) between the first marking result and the fourth marking result, a second intra-group correlation coefficient between the first marking result and the third marking result, and a third intra-group correlation coefficient between the first marking result and the second marking result;
that is, calculating the consistency of the marks made by the first marker on the same test set at different times, the consistency between the marks of the first marker and the second marker on the same test set, and the consistency between the marks of the first marker and the AI model on the same test set; in other words, the consistency of the same person at different times, of different persons, and of a person versus the AI on the same test set;
in this embodiment, a bootstrap method (Bootstrap Method) may be used to calculate the first intra-group correlation coefficient, the second intra-group correlation coefficient and the third intra-group correlation coefficient, so as to obtain a plurality of first, second and third intra-group correlation coefficients respectively;
specifically, bootstrap resampling is performed on the first marking result and the fourth marking result, and an intra-group correlation coefficient is calculated from each resample, finally obtaining a plurality of first intra-group correlation coefficients; bootstrap resampling is performed on the first marking result and the third marking result, and an intra-group correlation coefficient is calculated from each resample, finally obtaining a plurality of second intra-group correlation coefficients; bootstrap resampling is performed on the first marking result and the second marking result, and an intra-group correlation coefficient is calculated from each resample, finally obtaining a plurality of third intra-group correlation coefficients;
in an optional embodiment, at least 50 of each of the first, second and third intra-group correlation coefficients are generated to ensure the accuracy of the t test; a rough sketch of this bootstrap calculation is given below;
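The sketch below computes a one-way random-effects ICC for two measurement series and bootstraps it over subjects. The ICC(1,1) form and the helper names are assumptions chosen for illustration, since the patent does not specify which ICC variant is used.

```python
import numpy as np

def icc_oneway(x, y):
    """One-way random-effects ICC(1,1) for two measurements of the same subjects."""
    data = np.column_stack([x, y])            # shape (n_subjects, 2)
    n, k = data.shape
    grand_mean = data.mean()
    subject_means = data.mean(axis=1)
    # Between-subject and within-subject mean squares
    ms_between = k * np.sum((subject_means - grand_mean) ** 2) / (n - 1)
    ms_within = np.sum((data - subject_means[:, None]) ** 2) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

def bootstrap_icc(x, y, n_boot=50, seed=0):
    """Resample subjects with replacement and return n_boot ICC values."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(x)
    return np.array([icc_oneway(x[idx], y[idx])
                     for idx in (rng.integers(0, n, size=n) for _ in range(n_boot))])
```

A call such as bootstrap_icc(first_result_values, fourth_result_values) would then yield the plurality of first intra-group correlation coefficients, where the input arrays hold one scalar measurement per file; which scalar is correlated is an assumption, as the patent does not specify it.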
carrying out a t test on the first intra-group correlation coefficients and the third intra-group correlation coefficients to obtain a second test result, and carrying out a t test on the second intra-group correlation coefficients and the third intra-group correlation coefficients to obtain a third test result;
judging whether the second test result and the third test result are both larger than a threshold value, and if so, determining that the second marking result is repeatable;
the t test on the first and third intra-group correlation coefficients tests whether the repeatability of the marks made by the same person (the first marker) on the same test set at different times differs from the repeatability between that person's marks and the AI model's marks on the same test set; the t test on the second and third intra-group correlation coefficients tests whether the repeatability between the marks of one person (the first marker) and another person (the second marker) on the same test set differs from the repeatability between the first marker's marks and the AI model's marks; if the threshold value is 0.05, then when the p value of the t test is greater than 0.05, the repeatability between the AI model's marks and the person's marks on the same test set is considered not to differ from the repeatability between different persons' marks or from the repeatability of the same person's marks at different times, i.e. the repeatability of the AI model is the same as that of the manual method;
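These two comparisons could be run as follows, reusing bootstrap_icc from the previous sketch; the per-file scalar values fed into it are placeholders assumed for illustration only.

```python
import numpy as np
from scipy import stats

# Hypothetical per-file scalar measurements for each marking result (placeholders)
rng = np.random.default_rng(1)
first_result_values = rng.normal(10.0, 1.0, size=40)                     # first marker, 1st pass
fourth_result_values = first_result_values + rng.normal(0, 0.3, 40)      # first marker, 2nd pass
third_result_values = first_result_values + rng.normal(0, 0.5, 40)       # second marker
second_result_values = first_result_values + rng.normal(0, 0.4, 40)      # AI model

icc_same_person = bootstrap_icc(first_result_values, fourth_result_values)   # first ICCs
icc_two_persons = bootstrap_icc(first_result_values, third_result_values)    # second ICCs
icc_person_vs_ai = bootstrap_icc(first_result_values, second_result_values)  # third ICCs

THRESHOLD = 0.05
_, second_test_result = stats.ttest_ind(icc_same_person, icc_person_vs_ai)
_, third_test_result = stats.ttest_ind(icc_two_persons, icc_person_vs_ai)

if second_test_result > THRESHOLD and third_test_result > THRESHOLD:
    print("The AI marking result is considered repeatable (same repeatability as the manual method).")
```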
in an optional embodiment, the reproducibility of the AI model is also evaluated: a fifth marking result of the AI model on the test set is obtained, where the fifth marking result is generated at a different time from the second marking result; a fourth intra-group correlation coefficient between the second marking result and the fifth marking result is calculated by the bootstrap method to obtain a plurality of fourth intra-group correlation coefficients, and a t test is carried out on the first intra-group correlation coefficients and the fourth intra-group correlation coefficients to obtain a fourth test result; whether the fourth test result and the fourth intra-group correlation coefficients are both larger than a threshold value is judged, and if so, the marking result of the AI model is determined to be reproducible.
Referring to fig. 2, a third embodiment of the present invention is:
an evaluation terminal 1 for analyzing results comprises a processor 2, a memory 3 and a computer program stored on the memory 3 and capable of running on the processor 2, wherein the processor 2 implements the steps of the first embodiment or the second embodiment when executing the computer program.
In summary, the present invention provides an evaluation method and a terminal for an analysis result. A first marker marks a test set with a first device to obtain a first marking result and a fourth marking result, the two being marked at different times; an AI model marks the test set to obtain a second marking result; a second marker marks the test set with a second device to obtain a third marking result. A gold standard is obtained, the difference value between the first marking result and the gold standard and the difference value between the second marking result and the gold standard are calculated, and the difference values are represented by a scatter diagram and subjected to a t test; the scatter diagram shows the spatial distribution trend of the marking results, and the result of the t test determines whether the accuracy of the AI model's marks is consistent with that of the manual marks; if the result of the t test exceeds the threshold value, the marking result of the AI model and the manual marking result are considered to have consistent accuracy. In addition, the fourth marking result of the first marker and the third marking result of the second marker are also obtained, so that the repeatability of marks on the same test set can be evaluated between different times of the same person (the first and fourth marking results), between different persons (the first and third marking results), and between a person and the AI (the first and second marking results); a large number of intra-group correlation coefficients can be obtained by the bootstrap method, so that a t test can be carried out to compare the repeatability between the person and the AI on the same test set with the repeatability of the same person at different times and of different persons. This solves the problems that existing methods use single values such as the ICC or the Cronbach coefficient for repeatability evaluation and cannot judge the analysis result systematically and comprehensively, so the marking result of the AI model is evaluated for repeatability systematically and comprehensively, thereby accurately evaluating the AI analysis result.
The above description is only an embodiment of the present invention, and not intended to limit the scope of the present invention, and all equivalent changes made by using the contents of the present specification and the drawings, or applied directly or indirectly to the related technical fields, are included in the scope of the present invention.

Claims (10)

1. A method for evaluating an analysis result, comprising the steps of:
S1, acquiring a preset number of files as a test set;
S2, acquiring a first marking result of the first device on the test set and a second marking result of the AI model on the test set;
S3, obtaining a gold standard, and carrying out a t test on a first difference value between the first marking result and the gold standard and a second difference value between the second marking result and the gold standard to obtain a first test result;
S4, judging whether the first test result is larger than a threshold value, and if so, determining that the second marking result is accurate.
2. The method of claim 1, wherein the first difference and the second difference are calculated in step S3 by:
acquiring the coordinate of the first marking result, the coordinate of the second marking result and the coordinate of the gold standard;
and calculating a first difference value between the coordinate of the first marking result and the coordinate of the gold standard and a second difference value between the coordinate of the second marking result and the coordinate of the gold standard by utilizing a trigonometric function.
3. The method for evaluating an analysis result according to claim 1, wherein the step S3 further comprises:
generating a scatter diagram according to the first difference value and the second difference value by taking the gold standard as a circle center and taking the difference value as a radius;
or generating a scatter diagram according to a third difference value between the coordinate of the first marking result and the coordinate of the second marking result by taking the first marking result as a circle center;
or generating a scatter diagram according to the third difference value by taking the second marking result as a circle center.
4. The method for evaluating an analysis result according to claim 1, wherein the step S2 further comprises:
acquiring a third marking result of the second device on the test set and a fourth marking result of the first device on the test set, wherein the fourth marking result is generated at a time different from that of the first marking result;
the step S4 is followed by:
calculating a first intra-group correlation coefficient between the first marking result and the fourth marking result, a second intra-group correlation coefficient between the first marking result and the third marking result, and a third intra-group correlation coefficient between the first marking result and the second marking result;
carrying out a t test on the first intra-group correlation coefficient and the third intra-group correlation coefficient to obtain a second test result, and carrying out a t test on the second intra-group correlation coefficient and the third intra-group correlation coefficient to obtain a third test result;
and judging whether the second test result and the third test result are both larger than a threshold value, and if so, determining that the second marking result is repeatable.
5. The method of claim 4, wherein the calculating the first, second and third intra-group correlation coefficients comprises:
and calculating the first intra-group correlation coefficient, the second intra-group correlation coefficient and the third intra-group correlation coefficient by using a bootstrap method to respectively obtain a plurality of first intra-group correlation coefficients, second intra-group correlation coefficients and third intra-group correlation coefficients.
6. An evaluation terminal for analyzing results, comprising a memory, a processor and a computer program stored on the memory and operable on the processor, characterized in that the processor implements the following steps when executing the computer program:
S1, acquiring a preset number of files as a test set;
S2, acquiring a first marking result of the first device on the test set and a second marking result of the AI model on the test set;
S3, obtaining a gold standard, and carrying out a t test on a first difference value between the first marking result and the gold standard and a second difference value between the second marking result and the gold standard to obtain a first test result;
S4, judging whether the first test result is larger than a threshold value, and if so, determining that the second marking result is accurate.
7. The terminal for evaluating an analysis result according to claim 6, wherein when the processor performs the calculation of the first difference and the second difference in the step S3:
acquiring the coordinate of the first marking result, the coordinate of the second marking result and the coordinate of the gold standard;
and calculating a first difference value between the coordinate of the first marking result and the coordinate of the gold standard and a second difference value between the coordinate of the second marking result and the coordinate of the gold standard by utilizing a trigonometric function.
8. The terminal for evaluating an analysis result according to claim 6, wherein the step S3 further comprises:
generating a scatter diagram according to the first difference value and the second difference value by taking the gold standard as a circle center and taking the difference value as a radius;
or generating a scatter diagram according to a third difference value between the coordinate of the first marking result and the coordinate of the second marking result by taking the first marking result as a circle center;
or generating a scatter diagram according to the third difference value by taking the second marking result as a circle center.
9. The terminal for evaluating an analysis result according to claim 6, wherein the step S2 further comprises:
acquiring a third marking result of the second device on the test set and a fourth marking result of the first device on the test set, wherein the fourth marking result is generated at a time different from that of the first marking result;
the step S4 is followed by:
calculating a first intra-group correlation coefficient between the first marking result and the fourth marking result, a second intra-group correlation coefficient between the first marking result and the third marking result, and a third intra-group correlation coefficient between the first marking result and the second marking result;
carrying out a t test on the first intra-group correlation coefficient and the third intra-group correlation coefficient to obtain a second test result, and carrying out a t test on the second intra-group correlation coefficient and the third intra-group correlation coefficient to obtain a third test result;
and judging whether the second test result and the third test result are both larger than a threshold value, and if so, determining that the second marking result is repeatable.
10. The terminal for evaluating an analysis result according to claim 9, wherein the calculating the first, second and third intra-group correlation coefficients specifically comprises:
and calculating the first intra-group correlation coefficient, the second intra-group correlation coefficient and the third intra-group correlation coefficient by using a bootstrap method to respectively obtain a plurality of first intra-group correlation coefficients, second intra-group correlation coefficients and third intra-group correlation coefficients.
CN202010572829.4A 2020-06-22 2020-06-22 Evaluation method and terminal of analysis result Active CN111724374B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010572829.4A CN111724374B (en) 2020-06-22 2020-06-22 Evaluation method and terminal of analysis result

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010572829.4A CN111724374B (en) 2020-06-22 2020-06-22 Evaluation method and terminal of analysis result

Publications (2)

Publication Number Publication Date
CN111724374A true CN111724374A (en) 2020-09-29
CN111724374B CN111724374B (en) 2024-03-01

Family

ID=72569897

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010572829.4A Active CN111724374B (en) 2020-06-22 2020-06-22 Evaluation method and terminal of analysis result

Country Status (1)

Country Link
CN (1) CN111724374B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107633265A (en) * 2017-09-04 2018-01-26 深圳市华傲数据技术有限公司 For optimizing the data processing method and device of credit evaluation model
CN108805134A (en) * 2018-06-25 2018-11-13 慧影医疗科技(北京)有限公司 A kind of construction method of dissection of aorta parted pattern and application
CN109598415A (en) * 2018-11-13 2019-04-09 黑龙江金域医学检验所有限公司 Method for evaluating quality and device, the computer readable storage medium of detection system
CN109685870A (en) * 2018-11-21 2019-04-26 北京慧流科技有限公司 Information labeling method and device, tagging equipment and storage medium
JP2020009141A (en) * 2018-07-06 2020-01-16 株式会社 日立産業制御ソリューションズ Machine learning device and method
CN110826494A (en) * 2019-11-07 2020-02-21 达而观信息科技(上海)有限公司 Method and device for evaluating quality of labeled data, computer equipment and storage medium
CN110826908A (en) * 2019-11-05 2020-02-21 北京推想科技有限公司 Evaluation method and device for artificial intelligent prediction, storage medium and electronic equipment
CN110858327A (en) * 2018-08-24 2020-03-03 宏达国际电子股份有限公司 Method of validating training data, training system and computer program product
CN111311558A (en) * 2020-02-09 2020-06-19 华中科技大学同济医学院附属协和医院 Construction method of imaging omics model for pancreatic cancer prediction

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107633265A (en) * 2017-09-04 2018-01-26 深圳市华傲数据技术有限公司 For optimizing the data processing method and device of credit evaluation model
CN108805134A (en) * 2018-06-25 2018-11-13 慧影医疗科技(北京)有限公司 A kind of construction method of dissection of aorta parted pattern and application
JP2020009141A (en) * 2018-07-06 2020-01-16 株式会社 日立産業制御ソリューションズ Machine learning device and method
CN110858327A (en) * 2018-08-24 2020-03-03 宏达国际电子股份有限公司 Method of validating training data, training system and computer program product
CN109598415A (en) * 2018-11-13 2019-04-09 黑龙江金域医学检验所有限公司 Method for evaluating quality and device, the computer readable storage medium of detection system
CN109685870A (en) * 2018-11-21 2019-04-26 北京慧流科技有限公司 Information labeling method and device, tagging equipment and storage medium
CN110826908A (en) * 2019-11-05 2020-02-21 北京推想科技有限公司 Evaluation method and device for artificial intelligent prediction, storage medium and electronic equipment
CN110826494A (en) * 2019-11-07 2020-02-21 达而观信息科技(上海)有限公司 Method and device for evaluating quality of labeled data, computer equipment and storage medium
CN111311558A (en) * 2020-02-09 2020-06-19 华中科技大学同济医学院附属协和医院 Construction method of imaging omics model for pancreatic cancer prediction

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SKYLAR STOLTE ET AL.: "A survey on medical image analysis in diabetic retinopathy", MEDICAL IMAGE ANALYSIS, pages 1 - 27 *
陈桂林 (Chen Guilin): "Analysis of morphological features of breast tumors based on mammographic (molybdenum-target) images", China Master's Theses Full-text Database (《中国优秀硕士学位论文全文数据库》), pages 1 - 87 *

Also Published As

Publication number Publication date
CN111724374B (en) 2024-03-01

Similar Documents

Publication Publication Date Title
CN110490851B (en) Mammary gland image segmentation method, device and system based on artificial intelligence
CN110428415B (en) Medical image quality evaluation method, device, equipment and storage medium
US8840555B2 (en) System and method of ultrasound image processing
RU2542096C2 (en) System for lung ventilation information presentation
Jannin et al. Validation in medical image processing.
JP2016531709A (en) Image analysis technology for diagnosing disease
JP2004174254A (en) Method and system for measuring disease related tissue change
RU2752690C2 (en) Detecting changes in medical imaging
JP2014042684A (en) Medical image processing device, and program
RU2005133397A (en) AUTOMATIC SKIN DETECTION
US10729389B2 (en) 3D assessment of conjugant eye deviation for the identificaiton of acute ischemic stroke
US20110148861A1 (en) Pet data processing system, an arrangement, a method and a computer program product for determining a distribution of a tracer uptake
KR20200108686A (en) Programs and applications for sarcopenia analysis using deep learning algorithms
TWI542320B (en) Human weight estimating method by using depth images and skeleton characteristic
KR20140089103A (en) Magnetic Resonance Diffusion Tensor Imaging Registration and Distortion Correction Method and System Using Image Intensity Minimization
CN103743819A (en) Detection method and device for content of fat in swine muscle
CN113749646A (en) Monocular vision-based human body height measuring method and device and electronic equipment
CN116712094A (en) Knee joint measurement system based on load simulation CT device
CN111724374B (en) Evaluation method and terminal of analysis result
Fu et al. Automated analysis of multi site MRI phantom data for the NIHPD project
US7415142B2 (en) Method of visualizing the perfusion of an organ while utilizing a perfusion measurement
EP1571999B1 (en) Method of tomographic imaging
CN109350062B (en) Medical information acquisition method, medical information acquisition device and non-volatile computer storage medium
Holden et al. Detecting small anatomical change with 3D serial MR subtraction images
Asi et al. Automatic craniofacial anthropometry landmarks detection and measurements for the orbital region

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210309

Address after: Unit G7, block a, floor 1, building 9, Baoneng Science Park, Qinghu village, Qinghu community, Longhua street, Longhua District, Shenzhen, Guangdong 518000

Applicant after: Lin Chen

Applicant after: Huishili medical (Shenzhen) Co.,Ltd.

Address before: Room 604, block 4, ginkgo garden, 296 Shangdu Road, Cangshan District, Fuzhou City, Fujian Province 350000

Applicant before: Lin Chen

Applicant before: Ke Junlong

TA01 Transfer of patent application right
TA01 Transfer of patent application right

Effective date of registration: 20220909

Address after: Room 804, Building 3A, Qiaoxiang Mansion, Qiaoxiang Road, Futian District, Shenzhen, Guangdong 518000

Applicant after: Wisdom Medical (Shenzhen) Co.,Ltd.

Applicant after: Lin Chen

Address before: Unit G7, block a, floor 1, building 9, Baoneng Science Park, Qinghu village, Qinghu community, Longhua street, Longhua District, Shenzhen, Guangdong 518000

Applicant before: Lin Chen

Applicant before: Huishili medical (Shenzhen) Co.,Ltd.

TA01 Transfer of patent application right
GR01 Patent grant
GR01 Patent grant