Background
Cervical cancer is among the most feared cancers in Korean women, because a hysterectomy may affect pregnancy or childbirth or cause a perceived loss of femininity.
According to 2013 statistics, the number of Korean cervical cancer patients was 26,207, making it the fourth most common cancer in women (data from the Korean Ministry of Health and Welfare). In addition, cervical cancer is one of the seven major cancers for which screening is recommended in Korea, and since cervical cancer was incorporated into the national cancer screening program in 1999, its early diagnosis rate has been increasing. In recent years, carcinoma in situ (CIS) of the cervix, a precancerous condition called "stage 0" cervical cancer, has also steadily increased, and women with sexual experience are advised to be examined every year.
Market data for cervical cancer screening show that the age of screening targets has fallen from 30 to 20 years as the rate of CIS in young women has increased since 2016. In particular, unlike other cancers, national health insurance covers 100% of the cost of cervical cytology-based screening tests. However, since the false-negative rate (i.e., misdiagnosis rate) is as high as 55%, it is recommended to supplement the screening test with cervicography (cervical imaging). The global cervical cancer screening market is thus valued at 6.86 trillion won, of which cervical imaging accounts for 30%, or about 2 trillion won.
Fig. 1 is a diagram schematically illustrating a cervical cytological examination and cervical imaging, which are commonly used for diagnosing cervical cancer. Referring to the lower part of fig. 1, when a captured image of the cervix uteri is acquired from the outside of the vagina of a female subject by a predetermined photographing apparatus (e.g., a cervical colposcope shown in fig. 1), the result of analyzing the captured image may be used to reduce the misdiagnosis rate of cervical cancer.
However, with a conventional cervical colposcope, medical professionals determine whether cervical cancer is present from images of the cervix based on their training and experience. This method is often repetitive and ambiguous, is time-consuming even for experienced physicians, and may also reduce accuracy.
To overcome these drawbacks, devices for determining the incidence of cervical cancer have been introduced. Such a device acquires captured images of the cervix, analyzes the acquired cervical images based on a machine-learned model of cervical cancer, and provides analytical information about whether the subject suffers from cervical cancer.
A key factor in the performance of such a determining device is that the images used for learning must be accurately classified and organized. If the data classification is not performed accurately and clearly, the accuracy of the analysis of cervical cancer incidence is reduced. Unlike general medical images, colposcopic images of the cervix may appear in different forms depending on the imaging environment and the operator. Therefore, the apparatus for determining cervical cancer incidence must classify the images to be used for learning according to definite and strict criteria and then perform learning.
(Prior art documents)
(Patent documents)
(Patent document 1) Korean patent application No. 10-0850347
(Patent document 2) Korean patent laid-open publication No. 10-2016-0047720
Disclosure of Invention
1. Technical problem
It is a technical object of the present invention to provide a method of classifying cervical learning data for deep learning of the cervix, which classifies the cervical data necessary for accurately diagnosing the presence or absence of cervical cancer lesions according to an accurate criterion.
Further, another technical object of the present invention is to provide a system for generating cervical learning data and a method of classifying cervical learning data, which enable an accurate diagnosis of the cervix uteri by preventing over-learning of cervical image data of a specific shape, or a failure to properly learn a specific type of image.
2. Means for solving the problems
A method executable in a computer system for generating cervical learning data, the method comprising: receiving captured image data of an unclassified cervix from an external device; classifying the captured image data of the unclassified cervix based on a neural network algorithm according to a plurality of multi-level classification criteria; and generating and storing classification criterion-specific learning data from the captured image data of the cervix classified according to the classification criterion, or learning the generated classification criterion-specific learning data.
In the method according to the present invention, it is characterized in that the generating learning data and the learning further include generating additional learning data for controlling a numerical balance of the learning data specific to the classification criterion.
In the method according to the present invention, it is characterized in that the additional learning data is generated based on learning data specific to each of the classification criteria.
In the method according to the invention, it is characterized in that the plurality of multi-level classification criteria comprises at least two of a first-level classification criterion based on color, a second-level classification criterion based on the size of the cervix in the captured image data, and a third-level classification criterion based on a combination of color and shape in the cervix image data.
In the method according to the invention, it is characterized in that the plurality of multi-level classification criteria further comprises a fourth-level classification criterion based on exposure and focus.
In the method according to the present invention, it is characterized in that classifying the image data includes: primarily classifying the captured image data of the unclassified cervix according to a first-level color-based classification criterion; secondarily classifying the primarily classified captured image data according to a second-level classification criterion based on the size of the cervix in the primarily classified captured image data; and thirdly classifying the secondarily classified captured image data according to a third-level classification criterion based on a combination of color and shape in the secondarily classified cervix image data.
In the method according to the invention, it is characterized in that the captured image data of the cervix that remains unclassified after the tertiary classification is fourthly classified according to a fourth-level classification criterion based on exposure and focus.
In the method according to the present invention, it is characterized in that the first-level classification criterion includes a color value as a classification reference value for identifying each of at least one of an acetic acid reaction image, a Lugol solution reaction image, a green filter image, and a general image.
In the method according to the present invention, it is characterized in that the third-level classification criterion includes a combination of a color value and a shape as a classification reference value for identifying each of at least one of blood, mucus, a ring, a colposcope, a treatment trace, and a surgical instrument in the cervical image data.
A system for generating cervical learning data, the system comprising: an image receiving unit configured to receive captured image data of an unclassified cervix from an external device allowing transmission and reception of data; an image data classification unit configured to classify the captured image data of the unclassified cervix based on a neural network algorithm according to a plurality of multi-level classification criteria; a learning data generation unit configured to generate learning data specific to a classification criterion from the captured image data of the cervix uteri classified according to the classification criterion, and store the learning data or transmit the learning data to an artificial intelligence learning system for learning; and a data storage unit configured to store the multi-level classification criteria, the captured image data of the unclassified cervix, and the generated learning data.
In the system according to the present invention, it is characterized in that the learning data generation unit further generates additional learning data for controlling a numerical balance of the generated classification criterion-specific learning data.
In the system according to the present invention, it is characterized in that the image data classifying unit classifies the captured image data of the unclassified cervix using at least two of a first-level classification criterion based on color, a second-level classification criterion based on the size of the cervix in the captured image data, and a third-level classification criterion based on a combination of color and shape in the cervix image data.
In the system according to the present invention, it is characterized in that the image data classifying unit further uses a fourth-level classification criterion based on exposure and focus to classify captured image data of the cervix that remains unclassified.
In the system according to the present invention, it is characterized in that the image data classifying unit primarily classifies the captured image data of the unclassified cervix according to a first-level color-based classification criterion, secondarily classifies the primarily classified captured image data according to a second-level classification criterion based on the size of the cervix in the primarily classified captured image data, and thirdly classifies the secondarily classified captured image data according to a third-level classification criterion based on a combination of a color and a shape in the secondarily classified cervix image data.
In the system according to the present invention, it is characterized in that the image data classifying unit fourthly classifies, according to a fourth-level classification criterion based on exposure and focus, the captured image data of the cervix that remains unclassified after the tertiary classification.
In the system according to the present invention, it is characterized in that the image data classifying unit classifies the captured image data of the unclassified cervix according to a first-level color-based classification criterion, and the first-level classification criterion includes a color value as a classification reference value for identifying each of at least one of an acetic acid reaction image, a Lugol solution reaction image, a green filter image, and a general image.
In the system according to the present invention, it is characterized in that the image data classifying unit thirdly classifies the captured image data of the unclassified cervix according to a third-level classification criterion based on a combination of a color and a shape in the cervix image data, and the third-level classification criterion includes a combination of a color value and a shape as a classification reference value for identifying at least one of blood, mucus, a ring, a colposcope, a treatment mark, and a surgical instrument in the cervical image data.
3. Advantageous effects
According to the present invention, the system for generating cervical learning data has an advantageous effect in that, since the learning data is generated by classifying captured image data of an unclassified cervix uteri according to multi-level classification criteria, the presence or absence of cervical lesions can be diagnosed more accurately than with existing systems.
Further, there is another advantageous effect in that, since additional learning data for controlling the numerical balance of the learning data specific to each classification criterion is generated and used for learning, over-learning of cervical image data of a specific shape, and failure to properly learn a specific type of image, are prevented, so that the presence or absence of a cervical lesion can be diagnosed accurately.
Detailed Description
Hereinafter, detailed embodiments of the present invention will be described with reference to the accompanying drawings. In the description of the present invention, a detailed description of related well-known functions or configurations determined to unnecessarily obscure the gist of the present invention will be omitted.
Fig. 2 is an exemplary diagram illustrating a configuration of a system 200 for generating cervical learning data according to an embodiment of the present invention. The illustrated system 200 for generating cervical learning data may be implemented as a set of code data executable in a computer system. In the following description, the system 200 and the Artificial Intelligence (AI) learning system 300 for generating cervical learning data are respectively illustrated, but the two systems may be integrated into one system according to a system implementation method.
For reference, the computer system is a system including a communication unit capable of data transmission/reception with an external device, a storage unit, and a control unit configured to control the overall operation of the system according to a set of control code data stored in the storage unit, and the storage unit may further include a software module for executing a specific-purpose application program.
Hereinafter, the configuration of the system 200 for generating cervical learning data will be described in detail with reference to fig. 2. A system 200 for generating cervical learning data according to one embodiment of the invention includes: an image receiving unit 210 configured to receive captured image data of an unclassified cervix from an external apparatus capable of transmitting/receiving data, for example, the apparatus 100 (photographing apparatus) capable of acquiring a captured image of a cervix or a storage apparatus storing a captured image of a cervix, and store the received captured image data of the unclassified cervix in the data storage unit 240; an image data classification unit 220 configured to classify the captured image data of the unclassified cervix based on a neural network algorithm (e.g., a Convolutional Neural Network (CNN) or a model containing the CNN and a Support Vector Machine (SVM)) according to a plurality of multi-level classification criteria; a learning data generation unit 230 configured to generate classification criterion-specific learning data from the captured image data of the cervix uteri classified according to each classification criterion, and transmit the learning data to the AI learning system 300 or store the learning data in the data storage unit 240; and a data storage unit 240 in which the multi-level classification criteria, the captured image data of the unclassified cervix uteri, and the generated learning data are stored.
In addition, the learning data generation unit 230 also generates additional learning data for controlling the numerical balance of the generated learning data specific to the plurality of classification criteria, thereby preventing the over-learning of the image data of the cervix uteri (or cervical cancer) of a specific shape or preventing a phenomenon that the image of a specific shape (or a specific type) is not normally learned.
Meanwhile, the image data classifying unit 220 may classify the captured image data of the unclassified cervix using at least two of a first-level classification criterion based on color, a second-level criterion based on the size of the cervix in the captured image data, a third-level criterion based on a combination of color and shape in the cervix image data, and a fourth-level criterion based on exposure and focus.
Specifically, the image data classifying unit 220 may primarily classify the captured image data of the unclassified cervix according to a first-level color-based classification criterion, secondarily classify the primarily classified captured image data according to a second-level classification criterion based on the size of the cervix in the primarily classified captured image data, and thirdly classify the secondarily classified captured image data according to a third-level classification criterion based on a combination of a color and a shape in the secondarily classified cervix image data.
In addition, the image data classifying unit 220 may classify captured image data of the cervix uteri, which has not been classified yet after the tertiary classification, four times according to a fourth-level classification criterion based on the exposure level and the focus.
In addition, the image data classifying unit 220 may first classify the captured image data of the unclassified cervix according to a first-level color-based classification criterion including a color value as a classification reference value for identifying each of at least one of an acetic acid reaction image, a Lugol solution reaction image, a green filter image, and a general image.
In addition, the image data classifying unit 220 may perform the secondary classification according to the size of the cervix uteri in the primarily classified captured image data, for example, whether the cervix occupies 150%, 100%, 80%, or 50% of the image, and whether or not a colposcope and other parts are included in the image.
Further, the image data classifying unit 220 may classify the captured image data of the cervix that has not been classified after the secondary classification three times according to a third-level classification criterion based on a combination of a color and a shape of the cervix image data in the captured image data of the unclassified cervix, wherein the third-level classification criterion includes a combination of a color value and a shape as a classification reference value for identifying at least one of blood, mucus, a ring, a colposcope, a treatment trace, and a surgical instrument in the cervix image data to classify the foreign substance affecting the cervix.
For example, blood typically appears as a reddish shape flowing down from the center of the cervix, mucus typically appears as a yellowish shape flowing down from the center of the cervix, and a ring is typically located in the middle of the cervix and shows a clear boomerang-shaped line. Colposcopes and other surgical instruments appear in colors different from that of the cervix (silver, blue, etc.); thus, as described above, foreign objects affecting the cervix can be classified using a combination of the color and shape of each foreign object.
Meanwhile, the image data classification unit 220 may additionally classify images that have not been classified by the above-described three classification criteria (i.e., images in which lesions are not recognized) according to the classification criteria based on the exposure and focus. For example, when underexposure or overexposure occurs, the histogram may show an extreme value on one side and thus may be classified using such a characteristic, and when the image is out of focus, an edge may not be detected or the color contrast may not be clear and thus may be classified using such a characteristic (four-time classification).
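The exposure and focus cues described above can be sketched as follows. This is a minimal illustration only, assuming a grayscale input and hypothetical threshold values; it stands in for, and is not, the fourth-level classifier of the embodiment.

```python
import numpy as np

def classify_exposure_focus(gray, extreme_frac=0.4, sharpness_min=5.0):
    """Flag an image as under/over-exposed or out of focus.

    gray: 2-D uint8 luminance array. The thresholds (extreme_frac,
    sharpness_min) are illustrative assumptions, not values from the
    specification.
    """
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    total = gray.size
    # Under/over-exposure: histogram mass piled up at one extreme.
    if hist[:16].sum() / total > extreme_frac:
        return "underexposed"
    if hist[240:].sum() / total > extreme_frac:
        return "overexposed"
    # Focus: weak gradients mean edges are not detected and contrast is low.
    gx = np.abs(np.diff(gray.astype(float), axis=1)).mean()
    gy = np.abs(np.diff(gray.astype(float), axis=0)).mean()
    if (gx + gy) / 2 < sharpness_min:
        return "out_of_focus"
    return "usable"
```

For example, an all-black frame is flagged as underexposed, while a flat mid-gray frame (no gradients at all) is flagged as out of focus.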
In the first through fourth classification processes described above, each classification may be performed using a CNN as the deep learning technique. In the first, second, and fourth classifications, the features to be extracted are definite, so classification with high accuracy can be performed using a small number of layers; in the third classification, many features need to be extracted, so accuracy can be improved by arranging deeper layers.
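To illustrate the building blocks such a CNN stacks, the following sketch implements a single convolution, ReLU, and max-pooling stage in plain NumPy. The edge-detecting kernel in the usage example is hypothetical; a real classifier learns its kernels, stacking few such stages where features are definite and more where many features must be extracted.

```python
import numpy as np

def conv2d(img, kernel):
    """Valid-mode 2-D filtering (cross-correlation, as in CNN frameworks)."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Non-linearity applied after each convolution."""
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    """Downsample by taking the maximum over size x size windows."""
    h, w = (x.shape[0] // size) * size, (x.shape[1] // size) * size
    return x[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

# Hypothetical usage: a horizontal-edge kernel responds to a brightness step.
img = np.zeros((8, 8))
img[:, 4:] = 1.0                      # dark left half, bright right half
fmap = max_pool(relu(conv2d(img, np.array([[-1.0, 1.0]]))))
```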
Hereinafter, a learning data classification method of the system 200 for generating cervical learning data according to an embodiment of the present invention will be described in more detail with reference to fig. 2 to 4.
Fig. 3 is a flowchart illustrating a method of classifying cervical learning data according to an embodiment of the present invention. Fig. 4 is a diagram for describing in more detail the multi-level classification criteria for generating cervical learning data according to one embodiment of the present invention.
Referring to fig. 3, the image receiving unit 210 receives captured image data of an unclassified cervix from an external device, such as the photographing device 100, and stores the unclassified captured image data in the data storage unit 240 (S100).
The image data classifying unit 220 classifies one or more unclassified captured image data based on a neural network algorithm (e.g., CNN) according to a plurality of multi-level classification criteria, and stores the classified captured image data (S200).
For example, the image data classification unit 220 first classifies the captured image data of the unclassified cervix according to a first-level color-based classification criterion.
For the first classification, the image data classification unit 220 may use color values as classification reference values for identifying each of the acetic acid reaction image, the Lugol solution reaction image, the green filter image, and the general image, and may classify the data into these four image types.
Specifically, in the acetic acid reaction image, white spots appear on the cervix, so it can be distinguished from the pink cervix and vagina. In the Lugol solution reaction image, brown or dark orange appears, and in the green filter image, green appears over the entire image. Accordingly, color values indicative of the features of these images may be used as classification reference values to classify the captured image data of the unclassified cervix.
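The color-value triage just described can be sketched as follows. The numeric thresholds and the near-white-spot test are illustrative assumptions, not the learned reference values of the embodiment.

```python
import numpy as np

def classify_by_color(rgb):
    """First-level color triage of a cervical image.

    rgb: H x W x 3 float array in [0, 1]. All thresholds below are
    illustrative stand-ins for learned color reference values.
    """
    r, g, b = (rgb[..., c].mean() for c in range(3))
    # Green filter image: green dominates the whole frame.
    if g > 1.3 * max(r, b):
        return "green_filter"
    # Lugol solution reaction: brown / dark orange (red >> green >> blue).
    if r > 1.3 * g and g > 1.3 * b:
        return "lugol_reaction"
    # Acetic acid reaction: a sizable fraction of near-white spots.
    if (rgb.min(axis=-1) > 0.8).mean() > 0.1:
        return "acetic_reaction"
    return "general"
```

For instance, a pink frame with a patch of near-white pixels is routed to the acetic acid class, while a uniformly brownish frame is routed to the Lugol class.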
When the primary classification is completed, the image data classification unit 220 secondarily classifies the unclassified captured image data according to a second-level classification criterion based on the size of the cervix uteri in the primarily classified captured image data.
The cervix appears as a circle about the size of a 500-won coin, usually located in the middle of the image. Thus, based on the size of the cervix in the image (150%, 100%, 80%, etc.), the image may be secondarily classified, for example, as an image in which only the cervix is enlarged, an image showing the entire cervix, an image in which the cervix occupies about 80% of the entire area, an image in which the cervix occupies about 50% of the entire area, or an image including the cervix, a colposcope, and other parts.
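A minimal sketch of this size-based binning follows, assuming a boolean cervix segmentation mask is available from an upstream step; the bin edges are hypothetical values chosen to mirror the example categories above.

```python
import numpy as np

def classify_by_cervix_size(mask):
    """Second-level binning by the fraction of the frame the cervix occupies.

    mask: H x W boolean array marking cervix pixels (assumed to come from an
    upstream segmentation step). Bin edges are illustrative assumptions.
    """
    frac = mask.mean()
    if frac >= 0.95:
        return "enlarged"            # ~150 %: only the enlarged cervix
    if frac >= 0.85:
        return "full_cervix"         # ~100 %: the entire cervix fills the frame
    if frac >= 0.65:
        return "cervix_80"           # cervix is about 80 % of the area
    if frac >= 0.35:
        return "cervix_50"           # cervix is about 50 % of the area
    return "with_surroundings"       # colposcope and other parts included
```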
Then, the image data classifying unit 220 thirdly classifies foreign substances affecting the cervix according to a third-level classification criterion based on a combination of color and shape in the secondarily classified cervix image data.
As described above, blood generally appears as a reddish shape flowing down from the center of the cervix, mucus as a yellowish shape flowing down from the center of the cervix, and a ring is generally located in the middle of the cervix and shows a clear boomerang-shaped line. Colposcopes and other surgical instruments appear in colors different from that of the cervix (silver, blue, etc.); thus, foreign objects affecting the cervix can be classified using a combination of the color and shape of each foreign object.
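The color-and-shape cues above can be sketched as a toy screening function; the thresholds, the central-strip "flowing down" shape test, and the blue-tone instrument test are all illustrative assumptions standing in for the learned third-level classifier.

```python
import numpy as np

def detect_foreign_object(rgb):
    """Toy color+shape screen for foreign material in a cervical image.

    rgb: H x W x 3 float array in [0, 1]. Thresholds and shape tests are
    hypothetical; the embodiment learns these cues with a deep CNN.
    """
    h, w, _ = rgb.shape
    strip = rgb[:, w // 3: 2 * w // 3]        # streaks flow down the middle
    r, g, b = (strip[..., c] for c in range(3))
    reddish = (r > 0.5) & (g < 0.35) & (b < 0.35)
    yellowish = (r > 0.5) & (g > 0.5) & (b < 0.35)
    # Shape cue: the streak touches most rows of the central strip.
    if reddish.any(axis=1).mean() > 0.6:
        return "blood"
    if yellowish.any(axis=1).mean() > 0.6:
        return "mucus"
    # Instruments: blue-toned pixels anywhere in the frame.
    metallic = (rgb[..., 2] > 0.6) & (rgb[..., 0] < 0.4)
    if metallic.mean() > 0.05:
        return "instrument"
    return "clean"
```

A red streak down the center is reported as blood, a yellow streak as mucus, and a blue-toned patch anywhere in the frame as an instrument.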
The image data classification unit 220 may, as the case may be, fourthly classify images that remain unclassified after the tertiary classification, based on exposure and focus.
As described above, the captured image data specific to the plurality of classification criteria classified according to the multi-level classification criteria is stored in the data storage unit 240.
When the classification of the image data is completed, the learning data generating unit 230 generates the learning data specific to each classification criterion from the captured image data of the cervix classified according to that criterion, and stores the generated learning data (S300). In generating the learning data, the learning data generating unit 230 may also generate additional learning data for controlling the numerical balance of the learning data specific to each classification criterion, and it is preferable to generate the additional learning data based on each set of classification criterion-specific learning data.
As a method of generating the additional learning data, the left and right of the image may be reversed by mirroring about the vertical axis, the top and bottom of the image may be reversed by mirroring about the horizontal axis, or the image may be cropped to a size smaller than the original on the top, bottom, left, and right sides. Furthermore, when mirroring and cropping are used together, up to 16 times more additional learning data can be generated.
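The mirroring-and-cropping combination can be sketched as follows: four flip states (none, left-right, up-down, both) combined with four corner crops yield the 16 variants mentioned above. The crop ratio is an assumed parameter, not a value from the specification.

```python
import numpy as np

def augment(img, crop_ratio=0.9):
    """Generate 16 variants of one image by combining mirroring and cropping.

    img: H x W (or H x W x C) array. crop_ratio is an illustrative
    assumption for how much of the original each crop retains.
    """
    h, w = img.shape[:2]
    ch, cw = int(h * crop_ratio), int(w * crop_ratio)
    # Four flip states: identity, left-right, up-down, both.
    flips = [img, img[:, ::-1], img[::-1, :], img[::-1, ::-1]]
    out = []
    for f in flips:
        out.append(f[:ch, :cw])      # top-left crop
        out.append(f[:ch, -cw:])     # top-right crop
        out.append(f[-ch:, :cw])     # bottom-left crop
        out.append(f[-ch:, -cw:])    # bottom-right crop
    return out
```

Applied to one 10x10 image with a 0.9 crop ratio, this yields 16 additional 9x9 samples, matching the "up to 16 times" figure.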
When the learning data has been generated by classifying the captured image data of the unclassified cervix according to the multi-level classification criteria as described above, the AI learning system 300 learns and verifies the generated learning data (S400). When the AI learning system 300 is implemented within the system 200 for generating cervical learning data, the generated learning data can be learned and validated, and the presence or absence of a lesion in the cervix can be diagnosed or determined based on the learning.
Meanwhile, when additional unclassified cervical image data is obtained, further learning data is generated as described above and can be used for relearning to improve performance.
As described above, the system 200 for generating cervical learning data according to one embodiment of the present invention generates learning data by classifying captured image data of an unclassified cervix according to multi-level classification criteria, so that AI diagnosis devices (determination devices, AI engines, etc.) trained on the learning data generated according to these various classification criteria can diagnose the presence or absence of a cervical lesion more accurately than existing systems.
In addition, according to the present invention, additional learning data is further generated to control the numerical balance of the learning data specific to each classification criterion and is used for learning, so that over-learning of image data of a cervix uteri (or cervical cancer) of a specific shape, or failure to learn an image of a specific shape (or type), can be prevented, thereby making it possible to accurately diagnose whether there is a lesion in the cervix uteri.
While the present invention has been particularly shown and described with reference to exemplary embodiments thereof, it will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present invention as defined by the following claims.