KR20220109527A

KR20220109527A - Method and system for predicting adenoma related information based on machine-learned model

Info

Publication number: KR20220109527A
Application number: KR1020210012416A
Authority: KR
Inventors: 정정일; 이현정
Original assignee: 주식회사 피씨티
Priority date: 2021-01-28
Filing date: 2021-01-28
Publication date: 2022-08-05
Also published as: KR102577294B1

Abstract

The present invention relates to a method and system for predicting adenoma-related information based on a machine-learned model. Using a model trained on lifestyle data obtained from health checkup results, the method derives the likelihood of an adenoma that may cause colorectal cancer, so that a user can use the result, on the basis of lifestyle information, as a big-data-based indicator for predicting the onset of colorectal cancer.

Description

Method and system for predicting adenoma related information based on machine-learned model

The present invention relates to a method and system for predicting adenoma-related information based on a machine learning model. More particularly, it relates to a method and system that use a model trained on lifestyle data obtained from health checkup results to derive the likelihood of an adenoma that may cause colorectal cancer, so that a user can use the result, on the basis of lifestyle information, as a big-data-based indicator for predicting the onset of colorectal cancer.

Colorectal cancer is a malignant tumor that arises in the colon or rectum, and its incidence has been rising rapidly worldwide. In Korea, the incidence of colorectal cancer was 51.7 cases per 100,000 people in 2012, ranking third among all cancers. As the domestic incidence has risen, interest in early screening has grown, and colonoscopy and polypectomy are now widely performed. Adenomatous polyps in the colon are known to be precursor lesions of cancer, and removing them has been reported to reduce the incidence of colorectal cancer. The National Cancer Center currently performs more than 8,000 diagnostic and therapeutic colonoscopies every year, and this number is expected to grow further with patient demand. The incidence of colorectal cancer is low through the 30s, begins to rise from the 40s, and increases sharply from the 50s. Incidence is also higher in men than in women. Because the risk factors for colorectal cancer are relatively well established compared with other cancers, many studies have focused on its prevention.

Against this background, there is a need for a technology that derives the likelihood of colon polyps and colorectal cancer from an individual's characteristics and lifestyle and thereby provides an indicator for predicting colorectal cancer, but no such prior art exists.

A polyp is a part of the mucous membrane that protrudes above the surrounding surface like a lump. Polyps can form in the digestive tract or in any organ lined with mucous membrane, but they occur most often in the large intestine. Most colon polyps cause no particular symptoms, so they are difficult to detect until a colonoscopy is performed.

An adenoma is a neoplastic polyp, that is, a polyp that can develop into colorectal cancer. Most polyps are benign, but adenomas can progress to colorectal cancer over time and therefore require special caution. About 80% or more of colorectal cancers develop from adenomas, and removing an adenoma correspondingly lowers the incidence of colorectal cancer, so predicting adenomas is necessary.

However, adenomas can currently be found only through colonoscopy, a procedure that requires a form of anesthesia. As a result, they are often discovered late, after they have already progressed to cancer.

An object of the present invention is to provide a method and system for predicting adenoma-related information based on a machine learning model, which use a model trained on lifestyle data obtained from health checkup results to derive the likelihood of an adenoma that may cause colorectal cancer, so that a user can use the result, on the basis of lifestyle information, as a big-data-based indicator for predicting the onset of colorectal cancer.

To solve the above problem, the present invention provides a method for predicting adenoma-related information based on a machine learning model, performed by a server system, the method comprising: a second learning data deriving step of deriving predicted adenoma information for lifestyle information on a plurality of first items of arbitrary individuals, using a first inference model statistically trained on first learning data that includes lifestyle information on the plurality of first items for each individual collected from a medical institution and adenoma information including information on the presence or absence of an adenoma according to colonoscopy results, and thereby deriving second learning data including the lifestyle information on the plurality of first items of the arbitrary individuals and the adenoma information; a compressed data generating step of generating first compressed learning data, which includes adenoma information and lifestyle information on second items obtained by removing some of the plurality of first items from the first learning data, and generating second compressed learning data, which includes adenoma information and lifestyle information on the second items obtained by removing some of the plurality of first items from the second learning data; and a second inference model training step of training, with the first compressed learning data and the second compressed learning data, a second inference model that includes an artificial neural network and infers, from input lifestyle information on the second items, the likelihood of the presence of an adenoma.
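The data flow of the first two steps can be sketched as follows. This is a minimal illustration, not the patented implementation: the field names, the choice of `SECOND_ITEMS`, and `toy_first_model` are assumptions made for the example, and training the neural-network-based second inference model on `training_set` is left out.

```python
# First items named in the description; the key spellings are assumptions.
FIRST_ITEMS = ("age", "sex", "bmi", "exercise", "drinking",
               "smoking", "family_history", "personal_history")
# Assumed subset kept as the second items. The source only states that
# some of the first items are removed, not which ones.
SECOND_ITEMS = ("age", "sex", "bmi", "smoking")


def derive_second_learning_data(user_inputs, first_model):
    """Label arbitrary users' lifestyle records with the first model's prediction."""
    return [{**record, "adenoma": first_model(record)} for record in user_inputs]


def compress(record):
    """Keep only the second items plus the adenoma label."""
    compressed = {key: record[key] for key in SECOND_ITEMS}
    compressed["adenoma"] = record["adenoma"]
    return compressed


def toy_first_model(record):
    """Hypothetical stand-in for the statistically trained first inference model."""
    return 0.7 if record["smoking"] == "yes" else 0.2


# First learning data: lifestyle items plus a colonoscopy-confirmed label.
first_learning_data = [
    {"age": 55, "sex": "M", "bmi": 27.0, "exercise": "rare", "drinking": "often",
     "smoking": "yes", "family_history": "yes", "personal_history": "no",
     "adenoma": 1},
]
# Lifestyle input from an arbitrary individual, with no colonoscopy result.
user_inputs = [
    {"age": 34, "sex": "F", "bmi": 21.5, "exercise": "daily", "drinking": "rare",
     "smoking": "no", "family_history": "no", "personal_history": "no"},
]

second_learning_data = derive_second_learning_data(user_inputs, toy_first_model)
# Both compressed sets together would form the training set of the second model.
training_set = [compress(r) for r in first_learning_data + second_learning_data]
```

The point of the compression is that the second model only needs the reduced set of items at inference time, while still benefiting from labels produced with the full set.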

In some embodiments of the present invention, the second learning data deriving step may include: a learning information receiving step of receiving the first learning data, which includes lifestyle information on the plurality of first items for each individual collected from a medical institution and adenoma information including information on the presence or absence of an adenoma according to colonoscopy results; a likelihood deriving step of clustering the adenoma information in the first learning data according to specific lifestyle information on the first items and, based on the clustered information, deriving an adenoma likelihood for the group having that specific lifestyle information; a prediction dataset deriving step of deriving a prediction dataset that includes the adenoma likelihood of each group having specific lifestyle information on the plurality of first items; and a prediction result deriving step of deriving, for input lifestyle information on the plurality of first items of an arbitrary individual, a result on the adenoma likelihood of a person having that lifestyle information, based on the prediction dataset.
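One simple reading of the likelihood deriving and prediction dataset deriving steps is a group-by over identical lifestyle records. The sketch below assumes that "clustering" means grouping records with the same lifestyle values and that the likelihood is the fraction of the group with an adenoma; the three-item records are illustrative only.

```python
from collections import defaultdict

# Hypothetical first learning data: (lifestyle tuple, adenoma found: 1 or 0).
# Only three items (age band, sex, smoking) are used to keep the example short.
first_learning_data = [
    (("50s", "M", "smoker"), 1),
    (("50s", "M", "smoker"), 1),
    (("50s", "M", "smoker"), 0),
    (("30s", "F", "non-smoker"), 0),
    (("30s", "F", "non-smoker"), 0),
]


def derive_prediction_dataset(records):
    """Group records by identical lifestyle information and return
    {lifestyle: fraction of the group in which an adenoma was found}."""
    counts = defaultdict(lambda: [0, 0])  # lifestyle -> [adenoma count, total]
    for lifestyle, has_adenoma in records:
        counts[lifestyle][0] += has_adenoma
        counts[lifestyle][1] += 1
    return {lifestyle: pos / total for lifestyle, (pos, total) in counts.items()}


prediction_dataset = derive_prediction_dataset(first_learning_data)
```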

In some embodiments of the present invention, the prediction result deriving step may include: deriving a similarity according to the degree of matching between the input lifestyle information on the plurality of first items of the arbitrary individual and the lifestyle information included in the prediction dataset, and deriving a plurality of pieces of similar-group prediction data by determining whether the similarity meets a preset criterion; and deriving probability information on the presence or absence of an adenoma for a person having the input lifestyle information, using a representative value of the numerical adenoma information in the plurality of pieces of similar-group prediction data.
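The matching step can be illustrated as below. The source does not define the similarity measure, the preset criterion, or the representative value, so this sketch assumes the fraction of matching items, a threshold of 0.75, and the arithmetic mean, respectively.

```python
from statistics import mean


def similarity(query, group):
    """Fraction of shared items on which two lifestyle records agree."""
    shared = [key for key in query if key in group]
    if not shared:
        return 0.0
    return sum(query[key] == group[key] for key in shared) / len(shared)


def predict_probability(query, prediction_dataset, threshold=0.75):
    """Average the adenoma likelihood of every group whose similarity to the
    query meets the threshold; return None when no group qualifies."""
    similar = [likelihood for lifestyle, likelihood in prediction_dataset
               if similarity(query, lifestyle) >= threshold]
    return mean(similar) if similar else None


# Hypothetical prediction dataset: (group lifestyle, group adenoma likelihood).
groups = [
    ({"age": "50s", "sex": "M", "smoking": "yes", "drinking": "yes"}, 0.6),
    ({"age": "50s", "sex": "M", "smoking": "yes", "drinking": "no"}, 0.4),
    ({"age": "30s", "sex": "F", "smoking": "no", "drinking": "no"}, 0.1),
]
query = {"age": "50s", "sex": "M", "smoking": "yes", "drinking": "yes"}
probability = predict_probability(query, groups)
```

Here the first two groups pass the threshold (similarity 1.0 and 0.75) and the third does not, so the estimate is the mean of their likelihoods.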

In some embodiments of the present invention, the information on the presence or absence of an adenoma in the first learning data may be derived in normalized form by extracting keywords related to polyps or adenomas from the colonoscopy result information and the biopsy result information collected from the medical institution, deriving first polyp data that includes at least one of polyp location, polyp size, and polyp type for the individual, and determining the presence or absence of an adenoma from the first polyp data.
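A keyword-based extraction of this kind might look like the following sketch. The keyword lists, the size pattern, and the output field names are assumptions for illustration; real reports would need far more robust handling, for example of negations such as "no adenoma", which this naive substring match would misclassify.

```python
import re

# Assumed keyword lists; the source does not enumerate them.
ADENOMA_KEYWORDS = ("adenoma", "adenomatous")
LOCATIONS = ("cecum", "ascending colon", "transverse colon",
             "descending colon", "sigmoid colon", "rectum")
SIZE_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*(mm|cm)")


def extract_polyp_data(report):
    """Derive normalized polyp data (location, size in mm, adenoma flag)
    from free-text colonoscopy or biopsy findings."""
    text = report.lower()
    size_mm = None
    size_match = SIZE_PATTERN.search(text)
    if size_match:
        value, unit = float(size_match.group(1)), size_match.group(2)
        size_mm = value * 10 if unit == "cm" else value
    location = next((loc for loc in LOCATIONS if loc in text), None)
    has_adenoma = int(any(keyword in text for keyword in ADENOMA_KEYWORDS))
    return {"location": location, "size_mm": size_mm, "has_adenoma": has_adenoma}


sample = "Sigmoid colon: 0.6 cm polyp, biopsy shows tubular adenoma."
polyp = extract_polyp_data(sample)
```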

In some embodiments of the present invention, the first items and the second items include age, sex, BMI, exercise habits, drinking habits, smoking habits, family medical history, and personal medical history, and the second items may omit some of the first items.

In some embodiments of the present invention, in the second learning data deriving step, the server system derives the second learning data by performing a first service in which it receives lifestyle information on the plurality of first items from a first user requesting his or her predicted adenoma information, derives the predicted adenoma information using the received lifestyle information on the first items and the first inference model, and provides it to the first user. The method may further include a second service providing step of receiving lifestyle information on a plurality of second items from a second user requesting his or her predicted adenoma information, deriving the predicted adenoma information using the received lifestyle information on the second items and the second inference model, and providing it to the second user.

According to an embodiment of the present invention, lifestyle information is received and the likelihood of adenoma occurrence derived from it is provided, which makes it possible to present a criterion for colorectal cancer screening.

According to an embodiment of the present invention, information on the likelihood of adenoma occurrence, which is colorectal cancer-related information, is provided by a machine learning model trained not only on the lifestyle information, colonoscopy result information, and biopsy result information of various learning subjects, but also on additional learning data generated by an inference model that was statistically trained on that information. The user can therefore check the health of his or her colon against an objective indicator through the likelihood provided.

According to an embodiment of the present invention, when the statistically trained first inference model has no data similar to the lifestyle information for which adenoma-related information is to be predicted, detailed category information of the lifestyle information is excluded according to a preset criterion and similar data are filtered on the remaining categories, so that the first inference model can be trained more efficiently.
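The fallback described here can be sketched as a loop that drops one detailed category at a time until a matching group appears. The exclusion order stands in for the "preset criterion", which the text above does not spell out, so `exclusion_order` and the sample data are assumptions.

```python
def find_similar(query, dataset, exclusion_order):
    """Return (matches, ignored_items). Items listed in exclusion_order are
    dropped one at a time until some group matches on all remaining items."""
    ignored = []
    remaining = list(exclusion_order)
    while True:
        keys = [key for key in query if key not in ignored]
        matches = [(group, likelihood) for group, likelihood in dataset
                   if all(group.get(key) == query[key] for key in keys)]
        if matches or not remaining:
            return matches, ignored
        ignored.append(remaining.pop(0))


# Hypothetical prediction dataset: (group lifestyle, group adenoma likelihood).
dataset = [
    ({"age": "50s", "sex": "M", "exercise": "rare"}, 0.5),
    ({"age": "30s", "sex": "F", "exercise": "daily"}, 0.1),
]
query = {"age": "50s", "sex": "M", "exercise": "daily"}

# No group matches on all three items, so "exercise" is excluded first.
matches, ignored = find_similar(query, dataset, exclusion_order=["exercise"])
```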

According to an embodiment of the present invention, even when some data are not received, the data are supplemented based on previously collected learning information, which improves the reliability of the colorectal cancer-related information derived from the data.

FIG. 1 schematically illustrates the operation of a server system according to a method for predicting colorectal cancer-related information based on lifestyle information, according to an embodiment of the present invention.
FIG. 2 schematically illustrates the internal configuration of a server system according to an embodiment of the present invention.
FIG. 3 schematically illustrates the form of learning information processed by a likelihood deriving unit according to an embodiment of the present invention.
FIG. 4 schematically illustrates criteria for classifying lifestyle information according to an embodiment of the present invention.
FIG. 5 schematically illustrates how information derived by the likelihood deriving unit is processed according to an embodiment of the present invention.
FIG. 6 schematically illustrates a preset mapping table on which the likelihood deriving unit bases the colorectal cancer likelihood of a learning subject, according to an embodiment of the present invention.
FIG. 7 schematically illustrates the form of data derived by the likelihood deriving unit and a prediction dataset deriving unit according to an embodiment of the present invention.
FIG. 8 schematically illustrates the steps performed by a prediction result deriving unit according to an embodiment of the present invention.
FIG. 9 schematically illustrates criteria by which the prediction result deriving unit excludes detailed category items of lifestyle information, according to an embodiment of the present invention.
FIG. 10 schematically illustrates how information derived by the prediction result deriving unit is processed according to an embodiment of the present invention.
FIG. 11 schematically illustrates a system for predicting adenoma-related information based on a machine learning model according to an embodiment of the present invention.
FIG. 12 schematically illustrates a process of generating normalized lifestyle information according to an embodiment of the present invention.
FIG. 13 schematically illustrates a process of obtaining first learning data and second learning data according to an embodiment of the present invention.
FIG. 14 schematically illustrates the steps by which a user uses a first service according to an embodiment of the present invention.
FIG. 15 schematically illustrates a process of compressing the first learning data and the second learning data according to an embodiment of the present invention.
FIG. 16 schematically illustrates a process of training a second inference model using first compressed learning data and second compressed learning data according to an embodiment of the present invention.
FIG. 17 schematically illustrates the steps of providing a second service according to an embodiment of the present invention.
FIG. 18 schematically illustrates processes of generating third compressed learning data according to an embodiment of the present invention.
FIG. 19 schematically illustrates processes of generating fourth compressed learning data according to an embodiment of the present invention.
FIG. 20 schematically illustrates a process of training a third inference model according to an embodiment of the present invention.
FIG. 21 schematically illustrates the steps by which a second service providing unit provides high-risk adenoma information according to an embodiment of the present invention.
FIG. 22 illustrates an adenoma occurrence prediction information UI displayed on a user terminal through the operation of the second service providing unit according to an embodiment of the present invention.
FIG. 23 illustrates a high-risk adenoma occurrence prediction information UI displayed on a user terminal through the operation of the second service providing unit according to an embodiment of the present invention.
FIG. 24 shows data on the accuracy of adenoma occurrence prediction and high-risk adenoma occurrence prediction according to embodiments of the present invention.
FIG. 25 schematically illustrates the configuration of a computing device according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention pertains can readily practice it. The present invention may, however, be embodied in many different forms and is not limited to the embodiments described herein. To describe the present invention clearly, parts of the drawings that are unrelated to the description are omitted, and similar parts are given similar reference numerals throughout the specification.

Throughout the specification, when a part is said to be "connected" to another part, this includes not only the case where it is "directly connected" but also the case where it is "electrically connected" with another element in between. When a part is said to "include" a component, this means that it may further include other components, not that it excludes them, unless specifically stated otherwise.

Terms that include ordinal numbers, such as first and second, may be used to describe various components, but the components are not limited by these terms. The terms are used only to distinguish one component from another. For example, without departing from the scope of the present invention, a first component may be called a second component, and similarly a second component may be called a first component. The term "and/or" covers any combination of a plurality of related listed items or any one of those items.

In this specification, a "unit" includes a unit realized by hardware, a unit realized by software, and a unit realized using both. One unit may be realized using two or more pieces of hardware, and two or more units may be realized by one piece of hardware. A "unit" is not limited to software or hardware; it may be configured to reside in an addressable storage medium or configured to run on one or more processors. Thus, as an example, a "unit" includes components such as software components, object-oriented software components, class components, and task components, as well as processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuitry, data, databases, data structures, tables, arrays, and variables. The functions provided by components and "units" may be combined into a smaller number of components and "units" or further separated into additional components and "units". In addition, components and "units" may be implemented to run on one or more CPUs in a device or a secure multimedia card.

The "user terminal" referred to below may be implemented as a computer or portable terminal that can connect to a server or another terminal over a network. Here, the computer includes, for example, a notebook, desktop, or laptop equipped with a web browser, and the portable terminal is, for example, a wireless communication device that guarantees portability and mobility, and may include any kind of handheld wireless communication device, such as a PCS (Personal Communication System), GSM (Global System for Mobile communications), PDC (Personal Digital Cellular), PHS (Personal Handyphone System), PDA (Personal Digital Assistant), IMT (International Mobile Telecommunication)-2000, CDMA (Code Division Multiple Access)-2000, W-CDMA (W-Code Division Multiple Access), or Wibro (Wireless Broadband Internet) terminal. The "network" may be implemented as a wired network, such as a local area network (LAN), wide area network (WAN), or value added network (VAN), or as any kind of wireless network, such as a mobile radio communication network or a satellite communication network.

FIG. 1 schematically illustrates the operation of a server system according to a method for predicting colorectal cancer-related information based on lifestyle information, according to an embodiment of the present invention.

The server system 1000 shown in FIG. 1 learns from data on people's lifestyles obtained through health checkup results and derives and provides colorectal cancer-related information, such as the likelihood of colon polyps and colorectal cancer associated with a lifestyle, so that a user can use the colorectal cancer-related information for a specific lifestyle as an indicator for predicting the onset of colorectal cancer under that lifestyle. Preferably, the lifestyle information, which includes a plurality of pieces of detailed category information, includes one or more of age, sex, BMI, exercise habits, drinking habits, smoking habits, family medical history, and personal medical history. The lifestyle information may be derived from received health checkup information, such as a health checkup questionnaire, based on keywords related to the detailed category items, or may be received directly as input for each detailed category item. The server system 1000 receives learning information 1510 including such lifestyle information, colonoscopy result information, and biopsy result information, learns from the learning information 1510 to derive a polyp likelihood and a colorectal cancer likelihood, and derives the polyp likelihood and colorectal cancer likelihood for specific lifestyle information. Colorectal cancer-related result information 1530 can then be derived from the derived polyp likelihood and colorectal cancer likelihood, and the colorectal cancer-related result information 1530 for a person having input lifestyle information can be derived based on that input.

The server system 1000 shown in FIG. 1 includes a processor that receives the learning information 1510 and derives colorectal cancer-related information, and one or more memories that store instructions executable by the processor. The processor includes a learning information receiving unit 1100, a likelihood deriving unit 1200, a prediction dataset deriving unit 1300, and a prediction result deriving unit 1400.

The operations of the learning information receiving unit 1100, the likelihood deriving unit 1200, the prediction dataset deriving unit 1300, and the prediction result deriving unit 1400 are described in more detail below.

FIG. 2 schematically illustrates the internal configuration of the server system 1000 according to an embodiment of the present invention.

The server system 1000 of the present invention can perform a method for predicting colorectal cancer-related information based on lifestyle information. In outline, it learns from data on people's lifestyles obtained through health checkup results and derives and provides colorectal cancer-related information, such as the likelihood of colon polyps and colorectal cancer associated with a lifestyle. The server system 1000 may include a learning information receiving unit 1100, a likelihood deriving unit 1200, a prediction dataset deriving unit 1300, and a prediction result deriving unit 1400.

The learning information receiving unit 1100 receives lifestyle information on a plurality of items, colonoscopy result information, and biopsy result information of a learning subject. Specifically, the lifestyle information of the learning subject may be received from information that contains information about the subjects' lifestyles, such as health checkup information. Preferably, the lifestyle information may include one or more of age, gender, BMI, exercise habits, drinking habits, smoking habits, family medical history, and personal medical history. The colonoscopy result information means information on the colonoscopy result of a learning subject who has been diagnosed through colonoscopy and holds the diagnosis result, such as a colonoscopy diagnosis certificate. The biopsy result information means information on the biopsy result of a learning subject who has undergone a biopsy and holds the biopsy result, such as a biopsy result report.

The possibility derivation unit 1200 clusters the colonoscopy result information and biopsy result information of one or more learning subjects having specific lifestyle information and, based on the clustered information, derives the polyp possibility and colorectal cancer possibility for the group having that specific lifestyle information. Specifically, it first clusters the colonoscopy result information and biopsy result information of the one or more learning subjects, among the plurality of learning subjects, that have the same specific lifestyle information. It then determines from the clustered information whether each learning subject has a polyp, and derives the polyp possibility for the group having the specific lifestyle information from the polyp status of the plurality of learning subjects having the same specific lifestyle information. It also derives the colorectal cancer possibility for the group from the clustered information and a preset mapping table. The possibility derivation unit 1200 repeats this process for a plurality of pieces of specific lifestyle information, thereby deriving the polyp possibility and colorectal cancer possibility for each group across all learning subjects.

The prediction dataset derivation unit 1300 derives a prediction dataset 1520 of the polyp possibility and colorectal cancer possibility of the groups having the plurality of pieces of specific lifestyle information. After the possibility derivation unit 1200 has derived the polyp possibility and colorectal cancer possibility for each group having specific lifestyle information, the prediction dataset derivation unit 1300 derives a prediction dataset 1520 that includes the lifestyle information comprising a plurality of pieces of detailed category information, the polyp possibility, and the colorectal cancer possibility. Preferably, the lifestyle information includes one or more of age, gender, BMI, exercise habits, drinking habits, smoking habits, family medical history, and personal medical history.

Thereafter, for input lifestyle information, the prediction result derivation unit 1400 derives results on the polyp possibility and colorectal cancer possibility of a person having that input lifestyle information, based on the prediction dataset 1520 derived by the prediction dataset derivation unit 1300. Specifically, the prediction result derivation unit 1400 matches the received input lifestyle information against the lifestyle information included in the prediction dataset 1520 and, according to the degree to which the detailed category information of each matches, may derive the polyp possibility and colorectal cancer possibility of the person from the detailed prediction dataset with a high degree of matching.
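By way of illustration only, the matching of detailed category information can be sketched as follows. The sketch assumes that lifestyle information is held as the hyphen-separated item-index ID described with FIG. 4, and it ranks dataset entries by a plain count of matching categories. That count is a stand-in chosen for the sketch and is not the similarity formula of the invention; the names `count_matching_categories` and `closest_entry`, and the percentages of the second dataset entry, are hypothetical.

```python
from typing import Dict, Tuple

# Maps a lifestyle ID to (polyp possibility %, colorectal cancer possibility %).
PredictionDataset = Dict[str, Tuple[float, float]]


def count_matching_categories(input_id: str, candidate_id: str) -> int:
    """Count the detailed categories whose item indices are identical."""
    input_fields = input_id.split("-")
    candidate_fields = candidate_id.split("-")
    if len(input_fields) != len(candidate_fields):
        raise ValueError("lifestyle IDs must have the same number of categories")
    return sum(a == b for a, b in zip(input_fields, candidate_fields))


def closest_entry(
    input_id: str, dataset: PredictionDataset
) -> Tuple[str, Tuple[float, float]]:
    """Return the dataset entry sharing the most categories with the input."""
    best_id = max(dataset, key=lambda lid: count_matching_categories(input_id, lid))
    return best_id, dataset[best_id]


# Hypothetical dataset; the second entry's percentages are placeholders.
dataset: PredictionDataset = {
    "30-01-00-00-10-10-01-01": (60.0, 21.4),
    "40-00-00-00-10-10-01-01": (50.0, 10.0),
}
match_id, (polyp_pct, cancer_pct) = closest_entry("30-01-00-00-10-10-01-00", dataset)
```

Here the input differs from the first entry only in the last category, so that entry is selected and its two possibilities are returned.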

Meanwhile, the DB 1500 of the server system 1000 may store: the learning information 1510, which includes lifestyle information on a plurality of items, colonoscopy result information, and biopsy result information of the learning subjects; the prediction dataset 1520, which includes lifestyle information comprising one or more of age, gender, BMI, exercise habits, drinking habits, smoking habits, family medical history, and personal medical history, together with the polyp possibility and colorectal cancer possibility; and the colorectal cancer-related result information 1530, which includes the polyp possibility and colorectal cancer possibility of a person having the input lifestyle information.

The server system 1000 shown in FIG. 2 may further include elements other than those shown; for convenience, only the elements related to the method of predicting colorectal cancer-related information based on lifestyle information according to embodiments of the present invention are shown.

FIG. 3 schematically shows the form of the learning information 1510 processed by the possibility derivation unit 1200 according to an embodiment of the present invention.

The learning information 1510 received by the learning information receiving unit 1100 of the server system 1000 includes lifestyle information on a plurality of items, colonoscopy result information, and biopsy result information of a learning subject. The received learning information 1510 may then be processed by the possibility derivation unit 1200. Specifically, FIG. 3(a) schematically shows lifestyle information, FIG. 3(b) colonoscopy result information, and FIG. 3(c) biopsy result information. As for the lifestyle information, when health checkup information such as the learning subject's health checkup questionnaire shown in FIG. 3(a) is received, the possibility derivation unit 1200 may derive the lifestyle information contained in the questionnaire image from the received questionnaire. Alternatively, lifestyle information on the plurality of pieces of detailed category information, already preprocessed externally from a health checkup questionnaire, may be received directly.

Meanwhile, the colonoscopy result information means information on the colonoscopy result of a learning subject who has been diagnosed through colonoscopy and holds the diagnosis result, such as a colonoscopy diagnosis certificate. The colonoscopy result information is clustered by the possibility derivation unit 1200 and, as shown in FIG. 3(b), the polyp location (Pericolica area in FIG. 3) and polyp size (3 cm in FIG. 3) can be derived from it. The biopsy result information means information on the biopsy result of a learning subject who has undergone a biopsy and holds the test result for tissue in the large intestine, such as a biopsy result report. Likewise, as shown in FIG. 3(c), the polyp type (adenoma in FIG. 3) can be derived from the biopsy result information by the possibility derivation unit 1200. In this way, when the learning information 1510 is received, the possibility derivation unit 1200 performs the step of extracting keywords related to colon polyps from the learning information 1510 and deriving polyp data including the polyp size, polyp location, and polyp type of the learning subject.
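The keyword extraction described above might be sketched as follows, assuming the colonoscopy and biopsy reports are available as plain text. The keyword lists, the size pattern, and the function names are assumptions made for the sketch; the description does not enumerate the keywords actually used.

```python
import re
from typing import Dict, Optional, Sequence, Union

# Hypothetical keyword lists; more specific terms come first so they win.
LOCATION_KEYWORDS = ("pericolica area", "sigmoid colon", "rectum")
TYPE_KEYWORDS = ("villous adenoma", "tubular adenoma", "adenoma")

# Matches sizes written like "3cm" or "3.5 cm".
SIZE_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*cm", re.IGNORECASE)


def find_keyword(text: str, keywords: Sequence[str]) -> Optional[str]:
    """Return the first keyword found in the text, ignoring case."""
    lowered = text.lower()
    for keyword in keywords:
        if keyword in lowered:
            return keyword
    return None


def extract_polyp_data(
    colonoscopy_text: str, biopsy_text: str
) -> Dict[str, Union[str, float, None]]:
    """Derive polyp size and location from the colonoscopy report
    and polyp type from the biopsy report; missing items stay None."""
    size_match = SIZE_PATTERN.search(colonoscopy_text)
    return {
        "size_cm": float(size_match.group(1)) if size_match else None,
        "location": find_keyword(colonoscopy_text, LOCATION_KEYWORDS),
        "polyp_type": find_keyword(biopsy_text, TYPE_KEYWORDS),
    }


example = extract_polyp_data("Polyp 3cm in Pericolica area", "Diagnosis: adenoma")
```

A report with no polyp-related keyword yields `None` for all three items, which corresponds to a learning subject for whom no polyp data is derived.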

FIG. 4 schematically shows criteria for classifying lifestyle information according to an embodiment of the present invention.

The server system 1000 of the present invention may classify the received lifestyle information according to preset criteria for each piece of detailed category information. The classification criteria may be as shown in FIG. 4. Lifestyle information including one or more of age, gender, BMI, exercise habits, drinking habits, smoking habits, family medical history, and personal medical history is classified for each detailed category according to the preset criteria shown in FIG. 4, and is recognized by assigning an item index to each category. For example, for the gender category, a gender item index of 00 may be assigned to women and 01 to men. For detailed category items in which the degree of the habit may vary widely between individuals, such as drinking or smoking habits, the item index can be derived, as shown in FIG. 4, by converting the degree of the item into a numerical value using a criterion such as smoking period * average daily smoking amount, and assigning the smoking item index according to whether that value meets a preset criterion. The lifestyle information can be recognized from the derived item indices, and the item indices derived for each detailed category may be combined into a single ID for the lifestyle information (for example, 30-01-00-00-10-10-01-01). Classifying lifestyle information in this way makes it possible to classify the information of a plurality of learning subjects objectively.
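A minimal sketch of this encoding is given below. Only the gender indices (00 for women, 01 for men) and the product smoking period * average daily smoking amount come from the description; the threshold of 100, the index values "10" and "00" returned for smoking, and the function names are hypothetical placeholders for the criteria of FIG. 4.

```python
from typing import Sequence

# Gender item indices as stated in the description.
GENDER_INDEX = {"female": "00", "male": "01"}


def smoking_index(
    years_smoked: float, cigarettes_per_day: float, threshold: float = 100.0
) -> str:
    """Convert the smoking habit to a number and compare it with a preset
    criterion. The threshold and the returned index values are hypothetical."""
    score = years_smoked * cigarettes_per_day
    return "10" if score >= threshold else "00"


def lifestyle_id(item_indices: Sequence[str]) -> str:
    """Join the per-category item indices into a single lifestyle ID."""
    return "-".join(item_indices)


# Order: age, gender, BMI, exercise, drinking, smoking, family history,
# personal history. Indices other than gender and smoking are given directly.
example_id = lifestyle_id(
    ["30", GENDER_INDEX["male"], "00", "00", "10", smoking_index(10, 20), "01", "01"]
)
```

With these placeholder criteria the example reproduces the ID 30-01-00-00-10-10-01-01 used in the description.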

FIG. 5 schematically shows the process by which the possibility derivation unit 1200 processes information according to an embodiment of the present invention, and FIG. 6 schematically shows the preset mapping table on which the possibility derivation unit 1200 bases its derivation of a learning subject's colorectal cancer possibility.

The possibility derivation unit 1200 of the present invention performs: a step (S100) of extracting keywords related to colon polyps from the colonoscopy result information and the biopsy result information of the learning subject and deriving polyp data including one or more of the polyp size, polyp location, and polyp type of the learning subject; a step (S110) of determining whether the learning subject has a polyp from one or more of the polyp size, polyp location, and polyp type, and deriving the polyp possibility for the specific lifestyle information based on the polyp status of two or more learning subjects having the same specific lifestyle information; and a step (S120) of deriving the colorectal cancer possibility for the specific lifestyle information according to the polyp size and polyp type of two or more learning subjects having the same specific lifestyle information and a preset mapping table.

Specifically, in step S100, the possibility derivation unit 1200 may extract keywords related to colon polyps from the colonoscopy result information and biopsy result information of the learning subject and derive polyp data including one or more of the polyp size, polyp location, and polyp type of the learning subject. As described above with reference to FIG. 3, the possibility derivation unit 1200 derives polyp data including one or more of polyp location, polyp size, and polyp type. FIG. 5(a) shows the learning information 1510 of one or more learning subjects having specific lifestyle information, for which polyp data has been derived by the possibility derivation unit 1200. According to the criteria of FIG. 4, the lifestyle information comprising the plurality of pieces of detailed category information is expressed as item indices, and the polyp location, polyp size, and polyp type derived from each learning subject's colonoscopy result information and biopsy result information are shown. For learning information #1 and learning information #4, no keywords related to colon polyps were extracted from the subject's colonoscopy result and biopsy result information, which means that no polyp was found in the subject's large intestine.

Thereafter, in step S110, the possibility derivation unit 1200 determines whether the learning subject has a polyp from one or more of the polyp location, polyp size, and polyp type. As FIG. 5(b) shows, for learning subject #1 and learning subject #4 no polyp data was derived because, as described above, no keywords related to colon polyps were extracted from the colonoscopy result and biopsy result information in their learning information 1510. Thus, whether a learning subject has a polyp can be determined by whether polyp data including one or more of polyp location, polyp size, and polyp type has been derived. If one or more of the polyp location, polyp size, and polyp type has been derived, the subject is determined to have a polyp; if none of them has been derived, the subject is determined not to have a polyp.

After the polyp status of the plurality of learning subjects having the same specific lifestyle information has been determined, the possibility derivation unit 1200 may derive the polyp possibility for that specific lifestyle information based on the polyp status of the two or more learning subjects having it. In one embodiment of the present invention, FIG. 5(b) shows five learning subjects with the same specific lifestyle information 30-01-00-00-10-10-01-01, where the fields represent age-gender-BMI-exercise habits-drinking habits-smoking habits-family medical history-personal medical history. Of these, two (learning subjects #1 and #4) do not have polyps and three (learning subjects #2, #3, and #5) do. On this basis, the polyp possibility can be derived according to Equation 1 below.

[Equation 1]

(Number of learning subjects having the specific lifestyle information who have polyps / Total number of learning subjects having the specific lifestyle information) * 100 (%)

Applying the example shown in FIG. 5(b) to Equation 1 gives a polyp possibility of 3/5 * 100 = 60%. Therefore, the polyp possibility of learning subjects having the lifestyle information 30-01-00-00-10-10-01-01 is derived as 60%, as shown in FIG. 5(c).
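Step S110 and Equation 1 can be illustrated with a short sketch. It assumes each learning subject's polyp data is held as a (location, size, type) tuple in which an item that was not derived is `None`; the tuple layout, the function names, and the placeholder values for subjects #3 and #5 are assumptions of the sketch.

```python
from typing import Optional, Sequence, Tuple

# (polyp location, polyp size in cm, polyp type); None means "not derived".
PolypData = Tuple[Optional[str], Optional[float], Optional[str]]


def has_polyp(data: PolypData) -> bool:
    """A subject has a polyp if at least one polyp item was derived."""
    return any(item is not None for item in data)


def polyp_possibility(group: Sequence[PolypData]) -> float:
    """Equation 1: share of subjects with polyps in one lifestyle group, in %."""
    if not group:
        raise ValueError("the group must contain at least one learning subject")
    with_polyp = sum(1 for data in group if has_polyp(data))
    return 100 * with_polyp / len(group)


# Five subjects sharing lifestyle ID 30-01-00-00-10-10-01-01, as in FIG. 5(b).
# Only subject #2's size and type are quoted in the text; the rest are placeholders.
group = [
    (None, None, None),                # subject #1: no polyp data
    (None, 4.0, "villous adenoma"),    # subject #2
    (None, 7.0, None),                 # subject #3 (placeholder values)
    (None, None, None),                # subject #4: no polyp data
    (None, 1.0, None),                 # subject #5 (placeholder values)
]
result = polyp_possibility(group)  # 100 * 3 / 5
```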

In this way, the possibility derivation unit 1200 may derive the polyp possibility for specific lifestyle information.

Thereafter, the step (S120) of deriving the colorectal cancer possibility for the specific lifestyle information, performed by the possibility derivation unit 1200, includes: a step (S120-1) of deriving the colorectal cancer possibility of each learning subject according to the information on the polyp size and polyp type of the two or more learning subjects having the specific lifestyle information and the preset mapping table; and a step (S120-2) of deriving the colorectal cancer possibility for the specific lifestyle information based on the colorectal cancer possibility of each learning subject.

Specifically, in step S120-1, the possibility derivation unit 1200 derives the colorectal cancer possibility for each piece of learning information 1510 having the specific lifestyle information using a preset mapping table such as that of FIG. 6. As shown in FIGS. 5 and 6, the mapping table serves as the criterion for deriving the colorectal cancer possibility from the polyp size and polyp type. According to FIG. 5(b), learning information #2 has a polyp size of 4 and a polyp type of villous adenoma. Under the mapping table of FIG. 6, a villous adenoma with a polyp size of 3 cm or more and less than 6 cm yields a colorectal cancer possibility of 25%, as shown in FIG. 5(b). Likewise, for learning information #3 and learning information #5, the colorectal cancer possibility of each piece of learning information 1510 is derived according to the mapping table of FIG. 6. In this way, the possibility derivation unit 1200 derives the colorectal cancer possibility of each learning subject according to the information on the polyp size and polyp type of the two or more learning subjects having the specific lifestyle information and the preset mapping table.
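The mapping-table lookup of step S120-1 might look like the following sketch. Only the row for a villous adenoma of 3 cm or more and less than 6 cm (25%) is taken from the description; the second row is a hypothetical placeholder, since the full table of FIG. 6 is not reproduced in the text, and the function name is likewise an assumption.

```python
from typing import List, Optional, Tuple

# (polyp type, minimum size in cm inclusive, maximum size in cm exclusive,
#  colorectal cancer possibility in %)
MappingRow = Tuple[str, float, float, float]

MAPPING_TABLE: List[MappingRow] = [
    ("villous adenoma", 3.0, 6.0, 25.0),  # row quoted in the description
    ("villous adenoma", 0.0, 3.0, 5.0),   # hypothetical placeholder row
]


def subject_cancer_possibility(
    polyp_type: str, size_cm: float, table: List[MappingRow] = MAPPING_TABLE
) -> Optional[float]:
    """Look up the possibility for one subject; None if no row applies."""
    for row_type, min_cm, max_cm, possibility in table:
        if row_type == polyp_type and min_cm <= size_cm < max_cm:
            return possibility
    return None


subject_2 = subject_cancer_possibility("villous adenoma", 4.0)
```

Treating the upper bound as exclusive follows the wording "less than 6 cm"; a size of exactly 6 cm therefore falls outside the quoted row.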

Thereafter, in step S120-2, the colorectal cancer possibility for the specific lifestyle information is derived based on the colorectal cancer possibility of each learning subject. The possibility derivation unit 1200 may derive the colorectal cancer possibility for the specific lifestyle information according to Equation 2 below, based on the colorectal cancer possibilities of the two or more learning subjects having the specific lifestyle information.

[Equation 2]

(Sum of the colorectal cancer possibilities of the learning subjects having the specific lifestyle information) / N (%)

In Equation 2, N is the total number of learning subjects having the specific lifestyle information.

Applying the example shown in FIG. 5(b) to Equation 2 gives a colorectal cancer possibility for the specific lifestyle information of (25 + 80 + 2)/5 = 21.4%, where the sum covers the three learning subjects with polyps and N = 5 counts all five learning subjects. Therefore, the colorectal cancer possibility for the lifestyle information 30-01-00-00-10-10-01-01 is derived as 21.4%, as shown in FIG. 5(c).
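The arithmetic of this example can be checked with a short sketch. It assumes, consistently with the worked figures, that a learning subject without a polyp adds nothing to the sum but is still counted in N; the function name and the use of `None` for such subjects are choices made for the sketch.

```python
from typing import Optional, Sequence


def group_cancer_possibility(per_subject: Sequence[Optional[float]]) -> float:
    """Average the per-subject possibilities over all N subjects in the group.
    None marks a subject without a polyp, who contributes 0 to the sum."""
    n = len(per_subject)
    if n == 0:
        raise ValueError("the group must contain at least one learning subject")
    total = sum(p for p in per_subject if p is not None)
    return total / n


# Subjects #1 to #5 of FIG. 5(b): #1 and #4 have no polyp.
group_result = group_cancer_possibility([None, 25.0, 80.0, None, 2.0])  # 107 / 5
```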

In this way, the possibility derivation unit 1200 may derive the colorectal cancer possibility for specific lifestyle information.

FIG. 7 schematically shows the form of the data derived by the possibility derivation unit 1200 and the prediction dataset derivation unit 1300 according to an embodiment of the present invention.

As described above, the possibility derivation unit 1200 derives polyp data including one or more of the learning subject's polyp location, polyp size, and polyp type based on the lifestyle information, colonoscopy result information, and biopsy result information, and derives the polyp possibility and colorectal cancer possibility for specific lifestyle information. FIG. 7(a) shows an example of learning information 1510 that includes polyp data, comprising one or more of polyp location, polyp size, and polyp type, derived by the possibility derivation unit 1200. If the polyp data has a value, the learning subject with that learning information 1510 is determined to have a polyp; if it has no value, the learning subject is determined not to have a polyp.

Meanwhile, when the learning information 1510 of a learning subject contains no biopsy result information, the possibility derivation unit 1200 derives the colorectal cancer possibility for the learning information 1510 received by the learning information receiving unit 1100 based on previously collected learning information 1510 that has the same lifestyle information as the learning subject's, has colonoscopy result information similar within a preset criterion, and contains biopsy result information. Specifically, FIG. 7(a) shows that learning information #1 has values for polyp location and polyp size but none for polyp type. This means that no biopsy result information was received. Thus, in one embodiment of the present invention, the learning information 1510 received by the server system 1000 may lack biopsy result information. In such a case, the possibility derivation unit 1200 first filters the previously collected learning information 1510 for records having the same lifestyle information as the learning information 1510 that lacks biopsy result information. It then compares the polyp location and polyp size of those records with the polyp location and polyp size in the received learning information 1510, and filters for records that are similar within the preset criterion.

FIGS. 7(a) and 7(b) show the learning information 1510 of a plurality of learning subjects having the same lifestyle information 40-00-00-00-10-10-01-01 as learning information #1, which has no polyp type value. Comparing the polyp location and polyp size of each record shows that learning information #1 and learning information #5 have the same polyp location and polyp size, R and 4. Accordingly, the possibility derivation unit 1200 derives the polyp type of learning information #1 as Type4, the same polyp type as learning information #5. Preferably, if among the previously collected learning information 1510 of one or more learning subjects having the same lifestyle information there is learning information 1510 whose polyp location and polyp size derived from the colonoscopy result information are identical, the polyp type of that learning information 1510 is taken as the polyp type of the learning information 1510 received by the learning information receiving unit 1100; if there is no learning information 1510 with identical polyp location and polyp size, the polyp type of learning information 1510 whose polyp location is identical and whose polyp size falls within a preset range may be taken as the polyp type of the received learning information 1510.
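The two-stage fallback described above (exact match on location and size first, then same location with size within a preset range) can be sketched as follows. The `Record` type, the 1 cm tolerance, and the choice of the closest size when several past records fall within range are assumptions of the sketch; the description does not fix the preset range or a tie-breaking rule.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Record:
    lifestyle_id: str
    location: Optional[str]
    size_cm: Optional[float]
    polyp_type: Optional[str]


def impute_polyp_type(
    new: Record, history: Sequence[Record], size_tolerance_cm: float = 1.0
) -> Optional[str]:
    """Borrow a polyp type from past records with the same lifestyle ID."""
    if new.size_cm is None or new.location is None:
        return None
    # Past records with the same lifestyle ID, the same location, and a known type.
    candidates = [
        r for r in history
        if r.lifestyle_id == new.lifestyle_id
        and r.location == new.location
        and r.size_cm is not None
        and r.polyp_type is not None
    ]
    # Stage 1: identical polyp size.
    for r in candidates:
        if r.size_cm == new.size_cm:
            return r.polyp_type
    # Stage 2: size within the (hypothetical) preset range; take the closest.
    near = [r for r in candidates if abs(r.size_cm - new.size_cm) <= size_tolerance_cm]
    if near:
        return min(near, key=lambda r: abs(r.size_cm - new.size_cm)).polyp_type
    return None


LID = "40-00-00-00-10-10-01-01"
history = [
    Record(LID, "L", 2.0, "Type1"),   # placeholder past record
    Record(LID, "R", 4.0, "Type4"),   # corresponds to learning information #5
]
imputed = impute_polyp_type(Record(LID, "R", 4.0, None), history)
```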

In this way, when biopsy result information is missing, the possibility derivation unit 1200 fills in the biopsy result information of the learning information 1510 from previously collected learning information 1510. It then derives the colorectal cancer possibility for the learning information 1510 received in the learning information receiving step using the method described with reference to FIGS. 5 and 6, and derives the colorectal cancer possibility for the corresponding specific lifestyle information.

Thus, even when the learning information 1510 has not been received in full, the possibility derivation unit 1200 can supplement it based on the partial learning information 1510, which has the effect of improving data reliability.

Meanwhile, the prediction dataset derivation unit 1300 derives the prediction dataset 1520 of the polyp possibility and colorectal cancer possibility of the groups having the plurality of pieces of specific lifestyle information, based on the polyp possibility and colorectal cancer possibility for specific lifestyle information derived by the possibility derivation unit 1200. The prediction dataset 1520 includes lifestyle information comprising a plurality of detailed categories, the polyp possibility, and the colorectal cancer possibility, and the lifestyle information includes age, gender, BMI, exercise habits, drinking habits, smoking habits, family medical history, and personal medical history. As shown in FIG. 7(c), the possibility derivation unit 1200 derives a polyp possibility A1 and a colorectal cancer possibility B1 for the lifestyle information 40-00-00-00-10-10-01-01; as shown in FIG. 7(d), the prediction dataset derivation unit 1300 then derives, from the entire learning information 1510, the prediction dataset 1520 comprising, for each group having specific lifestyle information, the lifestyle information, its polyp possibility, and its colorectal cancer possibility. Based on the prediction dataset 1520 derived in this way, the prediction result derivation unit 1400, described later, can predict the polyp possibility and colorectal cancer possibility for input lifestyle information.

FIG. 8 schematically shows the steps performed by the prediction result derivation unit 1400 according to an embodiment of the present invention, and FIG. 9 schematically shows the criteria by which the prediction result derivation unit 1400 excludes detailed category items of the lifestyle information according to an embodiment of the present invention.

For input lifestyle information, the prediction result derivation unit 1400 of the present invention can derive a result for the polyp possibility and colorectal cancer possibility of a person having that input lifestyle information, based on the prediction dataset 1520 derived by the prediction dataset derivation unit 1300. Specifically, the prediction result derivation unit 1400 performs: a step (S1000 to S1300) of deriving a similarity according to the degree of matching between the input lifestyle information and the lifestyle information included in the prediction dataset 1520, and determining whether the similarity meets a preset criterion, to derive similar group prediction data; and a step (S1400) of deriving the polyp possibility and colorectal cancer possibility of the person having the input lifestyle information based on the similar group prediction data.

In step S1000, a similarity is derived according to the degree of matching between the input lifestyle information and the lifestyle information included in the prediction dataset 1520. The input lifestyle information is compared with the lifestyle information of the prediction dataset 1520 derived in the prediction dataset derivation step, and a match or mismatch is determined for each detailed category of the lifestyle information, namely age, sex, BMI, exercise habits, drinking habits, smoking habits, family medical history, and personal medical history. The similarity is then derived from these per-category results. The formula for deriving the similarity is given in Derivation Formula 3 below.

[Derivation Formula 3]

Similarity (%) = (number of matched detailed category items of the lifestyle information / total number of detailed category items) * 100

An example in which the prediction result derivation unit 1400 derives the similarity is as follows.

[Example 1]

Input lifestyle information: 30-01-00-00-10-10-01-01

Lifestyle information of the detailed prediction dataset: 50-00-01-00-12-10-01-01

Match result: X-X-X-O-X-O-O-O

Applying Derivation Formula 3 to the above example, in which 4 of the 8 items match, gives a similarity of (4/8)*100 = 50%. In this way, the prediction result derivation unit 1400 derives the similarity between the input lifestyle information and the prediction dataset 1520.
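
The calculation in Example 1 can be sketched as follows. This is a minimal illustration, assuming the lifestyle information is encoded as a hyphen-separated string of category codes as in the example; the function names `similarity` and `match_pattern` are illustrative and do not come from the original.

```python
def similarity(input_code: str, dataset_code: str) -> float:
    """Derivation Formula 3: (matched items / total items) * 100."""
    a = input_code.split("-")
    b = dataset_code.split("-")
    if len(a) != len(b):
        raise ValueError("codes must have the same number of category items")
    matched = sum(x == y for x, y in zip(a, b))
    return matched / len(a) * 100


def match_pattern(input_code: str, dataset_code: str) -> str:
    """Return the per-item match result, 'O' for a match and 'X' otherwise."""
    pairs = zip(input_code.split("-"), dataset_code.split("-"))
    return "-".join("O" if x == y else "X" for x, y in pairs)


if __name__ == "__main__":
    inp = "30-01-00-00-10-10-01-01"
    ref = "50-00-01-00-12-10-01-01"
    print(match_pattern(inp, ref))  # X-X-X-O-X-O-O-O
    print(similarity(inp, ref))     # 50.0
```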

Then, in step S1100, it is determined whether the similarity derived in step S1000 meets the preset criterion. For example, assume that the preset criterion is a similarity of 80% or more. Since the lifestyle information in an embodiment of the present invention has 8 detailed category items, a similarity of 80% or more is obtained only when 7 or more items are identical (7/8 = 87.5%, whereas 6/8 = 75%). Accordingly, the prediction result derivation unit 1400 identifies, within the prediction dataset 1520, the detailed prediction datasets for which a similarity of 80% or more is derived. If there is a detailed prediction dataset whose similarity meets the preset criterion, step S1200 is performed; if there is none, step S1300 is performed.

Step S1200 is performed by the prediction result derivation unit 1400 when there is a detailed prediction dataset whose similarity meets the preset criterion. In step S1200, the one or more detailed prediction datasets in the prediction dataset 1520 whose similarity meets the preset criterion are derived as the similar group prediction data.
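
Steps S1100 and S1200 amount to filtering the prediction dataset by a similarity threshold. The sketch below assumes an 80% threshold, as in the example given for S1100, and represents each detailed prediction dataset as a dictionary with a hypothetical `code` key; the data layout and function names are assumptions made for illustration.

```python
def similarity(input_code: str, dataset_code: str) -> float:
    """Percentage of category items that match (Derivation Formula 3)."""
    a, b = input_code.split("-"), dataset_code.split("-")
    return sum(x == y for x, y in zip(a, b)) / len(a) * 100


def select_similar_group(input_code, prediction_dataset, threshold=80.0):
    """S1100/S1200: keep the detailed prediction datasets meeting the criterion.

    An empty result corresponds to the branch that leads to step S1300.
    """
    return [
        row for row in prediction_dataset
        if similarity(input_code, row["code"]) >= threshold
    ]


if __name__ == "__main__":
    rows = [
        {"code": "20-01-00-00-10-10-01-01"},  # 7 of 8 items match: 87.5%
        {"code": "50-01-00-00-10-10-01-01"},  # 6 of 8 items match: 75.0%
    ]
    print(select_similar_group("20-00-00-00-10-10-01-01", rows))
```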

Meanwhile, step S1300 is performed by the prediction result derivation unit 1400 when there is no detailed prediction dataset whose similarity meets the preset criterion. In step S1300, some of the detailed category information of the input lifestyle information is excluded according to a preset criterion. As shown in FIG. 9, when no detailed prediction dataset meets the preset criterion for the similarity computed over the eight detailed category items (age, sex, BMI, exercise habits, drinking habits, smoking habits, family medical history, and personal medical history), age and sex are excluded and the similarity is evaluated over the remaining six items (BMI, exercise habits, drinking habits, smoking habits, family medical history, and personal medical history). In this way, the prediction result derivation unit 1400 can exclude some of the detailed category information according to the preset criterion shown in FIG. 9. Preferably, the input lifestyle information includes one or more of age, sex, BMI, exercise habits, drinking habits, smoking habits, family medical history, and personal medical history, and the step of deriving the similar group prediction data proceeds as follows: it determines whether a first similarity over age, sex, BMI, exercise habits, drinking habits, smoking habits, family medical history, and personal medical history meets the preset criterion; if the first similarity does not meet the preset criterion, it determines whether a second similarity over BMI, exercise habits, drinking habits, smoking habits, family medical history, and personal medical history meets the preset criterion; and if the second similarity does not meet the preset criterion, it determines whether a third similarity over BMI, smoking habits, drinking habits, and personal medical history meets the preset criterion, thereby deriving the similar group prediction data.
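
The three-stage fallback over the first, second, and third similarity can be sketched as below. This is a hedged illustration, not the patented implementation: it assumes the code fields appear in the order age, sex, BMI, exercise, drinking, smoking, family history, personal history, assumes the same 80% threshold at every stage, and uses hypothetical names (`TIERS`, `find_similar_group`).

```python
# Assumed field order: 0 age, 1 sex, 2 BMI, 3 exercise, 4 drinking,
# 5 smoking, 6 family history, 7 personal history.
TIERS = [
    (0, 1, 2, 3, 4, 5, 6, 7),  # first similarity: all eight items
    (2, 3, 4, 5, 6, 7),        # second similarity: age and sex excluded
    (2, 4, 5, 7),              # third similarity: BMI, drinking, smoking, personal history
]


def similarity_on(input_code, dataset_code, indices):
    """Similarity restricted to the category items listed in `indices`."""
    a, b = input_code.split("-"), dataset_code.split("-")
    matched = sum(a[i] == b[i] for i in indices)
    return matched / len(indices) * 100


def find_similar_group(input_code, rows, threshold=80.0):
    """Try each tier in turn; return (tier number, matching rows)."""
    for tier, indices in enumerate(TIERS, start=1):
        group = [
            r for r in rows
            if similarity_on(input_code, r["code"], indices) >= threshold
        ]
        if group:
            return tier, group
    return None, []


if __name__ == "__main__":
    rows = [{"code": "50-01-00-00-10-10-01-01"}]
    print(find_similar_group("20-00-00-00-10-10-01-01", rows))
```

With four items in the last tier and an 80% threshold, all four must match, since 3/4 is only 75%.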

After some of the detailed category information used to match the input lifestyle information against the prediction dataset 1520 has been excluded in this way, the prediction result derivation unit 1400 performs steps S1000 to S1200 again and can derive, as the similar group prediction data, one or more detailed prediction datasets whose similarity meets the preset criterion.

In step S1400, the prediction result derivation unit 1400, having derived the similar group prediction data through steps S1000 to S1300, derives the polyp possibility and colorectal cancer possibility of the person having the input lifestyle information based on the similar group prediction data.

In an embodiment of the present invention, when input lifestyle information such as that in FIG. 10(a) is entered, the prediction result derivation unit 1400 can derive a detailed prediction dataset such as that in FIG. 10(b) as the similar group prediction data by performing steps S1000 to S1300. FIG. 10(b) shows that the lifestyle information of the detailed prediction dataset is 20-00-00-00-10-10-01-01, identical to the input lifestyle information. After deriving this similar group prediction data, the prediction result derivation unit 1400 derives the polyp possibility and colorectal cancer possibility of the person whose input lifestyle information is 20-00-00-00-10-10-01-01 from the similar group prediction data for that same lifestyle information. Accordingly, FIG. 10(c) shows that the polyp possibility A1 and colorectal cancer possibility B2 of the similar group prediction data for 20-00-00-00-10-10-01-01 are derived as the polyp possibility A1 and colorectal cancer possibility B2 of the person having that input lifestyle information.

FIG. 10(b), as described above, illustrates the case in which every detailed category item of the lifestyle information matches the input lifestyle information. In an embodiment of the present invention, however, there may be cases in which no lifestyle information in the prediction dataset 1520 matches the input lifestyle information on every detailed category item. FIG. 10(c) shows the similar group prediction data derived according to the preset criterion when no detailed prediction dataset matches on all detailed category items. According to FIG. 10(c), five detailed prediction datasets were derived as similar group prediction data whose similarity to 20-00-00-00-10-10-01-01 meets the preset criterion. Detailed prediction dataset #1 is identical in every category item except sex, detailed prediction dataset #2 is identical in every category item except age, and detailed prediction datasets #3, #4, and #5 are identical in every category item except drinking habits, smoking habits, and BMI, respectively. In this case, the prediction result derivation unit 1400 derives the polyp possibility and colorectal cancer possibility of the person whose input lifestyle information is 20-00-00-00-10-10-01-01 based on this similar group prediction data. Accordingly, FIG. 10(e) shows that the averages of the polyp possibility values and colorectal cancer possibility values of the five detailed prediction datasets, (A1+A2+A3+A4+A5)/5 and (B1+B2+B3+B4+B5)/5, are derived as the polyp possibility and colorectal cancer possibility, respectively, of the person having that input lifestyle information.
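
The averaging over the five detailed prediction datasets can be sketched as follows. The dictionary keys `polyp` and `cancer` and the numeric values are hypothetical placeholders for A1..A5 and B1..B5; the original does not give concrete values.

```python
from statistics import fmean


def predict_from_group(group):
    """S1400: average the possibilities over the similar group prediction data."""
    if not group:
        raise ValueError("similar group prediction data must not be empty")
    polyp = fmean([row["polyp"] for row in group])
    cancer = fmean([row["cancer"] for row in group])
    return polyp, cancer


if __name__ == "__main__":
    # Placeholder values standing in for A1..A5 and B1..B5.
    group = [
        {"polyp": 0.10, "cancer": 0.01},
        {"polyp": 0.20, "cancer": 0.02},
        {"polyp": 0.30, "cancer": 0.03},
        {"polyp": 0.40, "cancer": 0.04},
        {"polyp": 0.50, "cancer": 0.05},
    ]
    print(predict_from_group(group))
```

When the group contains a single dataset, as in the exact-match case of FIG. 10(b), the average reduces to that dataset's own values.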

Preferably, the polyp possibility and colorectal cancer possibility of the person having the input lifestyle information can be derived based on Derivation Formula 4 and Derivation Formula 5 below.

[Derivation Formula 4]

[Derivation Formula 5]

Preferably, in the step of deriving the similar group prediction data performed by the prediction result derivation unit 1400, when there is a detailed prediction dataset whose similarity meets the preset criterion and whose detailed category items all match those of the input lifestyle information, only that detailed prediction dataset is derived as the similar group prediction data. When the similarity meets the preset criterion but no detailed prediction dataset matches on all detailed category items, the one or more detailed prediction datasets that were derived are used as the similar group prediction data.
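
This preference for an exact match can be expressed as a small selection rule. The sketch assumes `candidates` already contains only detailed prediction datasets that meet the similarity criterion; the function name and the `code` key are illustrative.

```python
def choose_similar_group(input_code, candidates):
    """Prefer the exactly matching dataset; otherwise keep all candidates."""
    exact = [row for row in candidates if row["code"] == input_code]
    return exact if exact else list(candidates)


if __name__ == "__main__":
    candidates = [
        {"code": "20-00-00-00-10-10-01-01"},
        {"code": "20-01-00-00-10-10-01-01"},
    ]
    print(choose_similar_group("20-00-00-00-10-10-01-01", candidates))
```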

In this way, the prediction result derivation unit 1400 can derive the polyp possibility and colorectal cancer possibility of a person having the input lifestyle information, based on the prediction dataset 1520 derived by learning from the learning information 1510.

FIG. 11 schematically shows a system for predicting adenoma-related information based on a machine learning model according to an embodiment of the present invention.

As shown in FIG. 11, the system for predicting adenoma-related information based on a machine learning model, which is implemented by a server system, includes a second learning data derivation unit 2100, a compressed data generation unit 2200, a second inference model learning unit 2300, a third inference model learning unit 2400, a service provision unit 2500, and a DB.

The second learning data derivation unit 2100 includes the learning information reception unit, the possibility derivation unit, the prediction dataset derivation unit, and the prediction result derivation unit described with reference to FIGS. 1 to 10. Descriptions of these components that would duplicate what was described with reference to FIGS. 1 to 10 are omitted. In the embodiment of the system shown in FIG. 11, however, these four units operate primarily on information about the presence or absence of adenoma, and the prediction result derivation unit, which performs prediction based on the prediction dataset derived by the prediction dataset derivation unit, is included in the first inference model.

Specifically, the second learning data derivation unit 2100 performs a second learning data derivation step. The first learning data includes lifestyle information on a plurality of first items for each individual, collected from medical institutions, and adenoma information that includes information on the presence or absence of adenoma according to colonoscopy results. Using a first inference model statistically trained on this first learning data, the unit derives predicted adenoma information for arbitrary per-individual lifestyle information on the plurality of first items, and thereby derives second learning data that includes that arbitrary lifestyle information and the predicted adenoma information.

That is, the second learning data derivation step predicts adenoma information for arbitrary lifestyle information using the first inference model, which is statistically trained with the prediction dataset derived from the collected lifestyle information and adenoma information, and thereby extracts virtual learning data.

The arbitrary per-individual lifestyle information on the plurality of first items that is used to derive the second learning data may be lifestyle information entered by users through the service shown in FIG. 14, described later, or lifestyle information generated artificially, either by random number generation or by randomly varying existing lifestyle information.
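
The two artificial generation routes, pure random generation and random variation of existing lifestyle information, can be sketched as below. The per-category value sets in `CATEGORY_VALUES` are hypothetical, since the actual code tables are not given in this section, and the function names are likewise illustrative.

```python
import random

# Hypothetical value sets per category; the real code tables are not
# specified in this part of the description.
CATEGORY_VALUES = {
    "age": ["20", "30", "40", "50"],
    "sex": ["00", "01"],
    "bmi": ["00", "01"],
    "exercise": ["00", "01"],
    "drinking": ["10", "12"],
    "smoking": ["10", "11"],
    "family_history": ["00", "01"],
    "personal_history": ["00", "01"],
}
CATEGORIES = list(CATEGORY_VALUES)


def random_lifestyle(rng: random.Random) -> str:
    """Generate lifestyle information purely from random numbers."""
    return "-".join(rng.choice(CATEGORY_VALUES[c]) for c in CATEGORIES)


def perturb_lifestyle(code: str, rng: random.Random) -> str:
    """Randomly vary one category item of existing lifestyle information."""
    fields = code.split("-")
    i = rng.randrange(len(fields))
    fields[i] = rng.choice(CATEGORY_VALUES[CATEGORIES[i]])
    return "-".join(fields)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only to make the demo repeatable
    print(random_lifestyle(rng))
    print(perturb_lifestyle("20-00-00-00-10-10-01-01", rng))
```

Each generated record would then be passed to the statistically trained first inference model to obtain its predicted adenoma information.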

In an embodiment of the present invention, the learning data for the artificial neural network-based second and third inference models, described later, is expanded with not only the first learning data but also the second learning data obtained from the statistically trained first inference model, which can improve the inference accuracy of the second and third inference models.

The compressed data generation unit 2200 performs a compressed data generation step. It generates first compressed learning data, which includes adenoma information and lifestyle information on second items obtained by removing some of the plurality of first items from the first learning data, and second compressed learning data, which includes adenoma information and lifestyle information on the second items obtained by removing some of the plurality of first items from the second learning data.

The lifestyle information according to the first items in the first and second learning data may have many lifestyle information categories. In the present invention, to improve learning accuracy and reduce the computational load, the learning data is compressed to the second items, from which some of the first items have been deleted, and the artificial neural network-based second and third inference models are then trained on the compressed learning data.
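
The compression from first items to second items is, in effect, a projection of each record onto a subset of its lifestyle fields. The sketch below assumes records stored as dictionaries; which items are actually dropped is not stated in this section, so the choice of `SECOND_ITEMS` here is purely hypothetical.

```python
FIRST_ITEMS = (
    "age", "sex", "bmi", "exercise",
    "drinking", "smoking", "family_history", "personal_history",
)
# Hypothetical subset: the description does not say which items are removed.
SECOND_ITEMS = ("age", "sex", "bmi", "drinking", "smoking")


def compress(records, kept_items=SECOND_ITEMS):
    """Keep only the second items of each record; adenoma info is unchanged."""
    return [
        {
            "lifestyle": {k: r["lifestyle"][k] for k in kept_items},
            "adenoma": r["adenoma"],
        }
        for r in records
    ]


if __name__ == "__main__":
    first_learning_data = [
        {"lifestyle": dict(zip(FIRST_ITEMS, "40-00-00-00-10-10-01-01".split("-"))),
         "adenoma": 1},
    ]
    print(compress(first_learning_data))
```

The same function would be applied to both the first and the second learning data to obtain the first and second compressed learning data.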

In the second inference model learning step, the first compressed learning data and the second compressed learning data are used to train a second inference model that includes an artificial neural network and that infers, for input lifestyle information on the second items, the possibility of the presence of adenoma.

Such a second inference model can output adenoma information, for example predictive information on the possibility that an adenoma will occur, for the input lifestyle information according to the second items.

The operation of the detailed components of the second learning data derivation unit 2100 is described below. Basically, these components perform operations corresponding to those of the identically named elements described with reference to FIGS. 1 to 10, and more specifically those of FIG. 2; only the parts essential to the present invention are described here.

The learning information reception unit performs a learning information reception step of receiving the first learning data, which includes lifestyle information on a plurality of first items for each individual, collected from medical institutions, and adenoma information that includes information on the presence or absence of adenoma according to colonoscopy results. In the learning information reception step, the normalization step described with reference to FIG. 12 may be performed.

The possibility derivation unit performs a step of clustering the adenoma information in the first learning data by specific lifestyle information according to the first items and, based on the clustered information, deriving the adenoma possibility for each group having that specific lifestyle information.

The prediction dataset derivation unit derives a prediction dataset that includes the adenoma possibility of each of a plurality of groups having specific lifestyle information according to the first items.
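
The clustering and prediction dataset derivation described above can be sketched as a group-by over identical lifestyle codes. As a loudly stated assumption, the adenoma possibility of a group is taken here to be the fraction of its members with an adenoma; the formula actually used is defined elsewhere in the description and may differ. The function name is illustrative.

```python
from collections import defaultdict


def build_prediction_dataset(records):
    """Group (lifestyle_code, adenoma_flag) pairs by code.

    Assumption: the group's adenoma possibility is the share of records in
    the group whose adenoma flag is 1.
    """
    groups = defaultdict(list)
    for code, has_adenoma in records:
        groups[code].append(has_adenoma)
    return {code: sum(flags) / len(flags) for code, flags in groups.items()}


if __name__ == "__main__":
    first_learning_data = [
        ("40-00-00-00-10-10-01-01", 1),
        ("40-00-00-00-10-10-01-01", 0),
        ("20-00-00-00-10-10-01-01", 0),
    ]
    print(build_prediction_dataset(first_learning_data))
```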

For input arbitrary per-individual lifestyle information on the plurality of first items, the prediction result derivation unit derives a result for the adenoma possibility of a person having that lifestyle information, based on the prediction dataset.

The second learning data can be extracted through the operations of the learning information reception unit, the possibility derivation unit, the prediction dataset derivation unit, and the prediction result derivation unit described above.

Meanwhile, for some input lifestyle information there is no identical entry in the prediction dataset, so it may be difficult to derive a statistical prediction result for it. To address this, in a preferred embodiment of the present invention the prediction result derivation unit may perform: a step of deriving a similarity according to the degree of matching between the input arbitrary per-individual lifestyle information on the plurality of first items and the lifestyle information included in the prediction dataset, and determining whether the similarity meets a preset criterion, to derive a plurality of similar group prediction data; and a step of deriving probability information on the presence or absence of adenoma for a person having the input lifestyle information, using a representative value of the numerical adenoma presence information in the plurality of similar group prediction data.

The detailed operation of the above system or method for predicting adenoma-related information based on a machine learning model is described with reference to FIGS. 12 to 16.

In another embodiment of the present invention, the server system shown in FIG. 11 may also operate as a system for predicting high-risk adenoma-related information based on a machine learning model.

In this case, the second learning data derivation unit 2100 performs a second learning data derivation step. Using a first inference model statistically trained on first learning data, which includes lifestyle information on a plurality of first items for each individual collected from medical institutions and adenoma information including information on the presence or absence of adenoma according to colonoscopy results, it derives predicted adenoma information for arbitrary per-individual lifestyle information on the plurality of first items, and thereby derives second learning data that includes that arbitrary lifestyle information and the predicted adenoma information.

In addition, the compressed data generation unit 2200 performs a compressed data generation step of generating first compressed learning data, which includes adenoma information and lifestyle information on second items obtained by removing some of the plurality of first items from the first learning data, and second compressed learning data, which includes adenoma information and lifestyle information on the second items obtained by removing some of the plurality of first items from the second learning data.

In addition, the second inference model learning unit 2300 uses the first compressed learning data and the second compressed learning data to train a second inference model that includes an artificial neural network and that infers, for input lifestyle information on the second items, the possibility of the presence of adenoma.

In addition, the third inference model learning unit 2400 performs a third inference model learning step of training a third inference model that includes an artificial neural network and that infers, for input lifestyle information on the second items, the possibility of the presence of high-risk adenoma. The training uses two sets of data: third compressed learning data, which includes lifestyle information on a plurality of second items for each individual collected from medical institutions and extended adenoma information including information on the presence or absence of adenoma and on the presence or absence of high-risk adenoma according to colonoscopy results; and fourth compressed learning data, which includes arbitrary per-individual lifestyle information on the plurality of second items and adenoma information including information on the presence or absence of adenoma generated from that lifestyle information and the second inference model.

The detailed operation of the above system or method for predicting high-risk adenoma-related information based on a machine learning model is described with reference to FIGS. 12 to 16.

FIG. 12 schematically illustrates a process of generating normalized lifestyle information according to an embodiment of the present invention.

The process shown in FIG. 12 may be performed by the learning information reception unit.

The information on the presence or absence of adenoma in the first learning data may be derived in normalized form as follows: keywords related to polyps or adenomas are extracted from the colonoscopy result information and the biopsy result information collected from the medical institutions, first polyp data including one or more of the polyp location, polyp size, and polyp type for the individual is derived, and the presence or absence of adenoma is determined from the first polyp data.

The colonoscopy result information and the biopsy result information collected from the medical institution may correspond to the medical institution's EMR information. Such information has not been preprocessed or normalized.

The learning information receiving unit normalizes the unstructured lifestyle information collected from medical institutions and converts it into lifestyle information according to the first items. For example, when different medical institutions hold EMR data in different formats, converting that data into lifestyle information according to the first items allows it to be used as learning data for the first, second, and third inference models described later.

Meanwhile, when the information collected from the medical institution is colonoscopy result information and biopsy result information in text form, the learning information receiving unit extracts keywords related to polyps or adenomas and derives first polyp data including one or more of polyp location, polyp size, and polyp type for the individual concerned.

The presence or absence of adenoma can be determined from such polyp data, and the information extracted in this way can serve as a kind of ground truth. In an embodiment of the present invention, the information on the presence or absence of adenoma takes one of two values, 1 or 0.
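The keyword-extraction step described above can be sketched as follows. This is a minimal illustration only: the keyword lists, the size pattern, and the function names `extract_polyp_data` and `adenoma_label` are assumptions introduced here, since the specification does not enumerate the actual keywords or rules. A practical system would also need to handle negated findings such as "no adenoma", which this sketch does not.

```python
import re

# Hypothetical keyword lists; the specification does not list the real ones.
LOCATION_KEYWORDS = (
    "cecum", "ascending colon", "transverse colon",
    "descending colon", "sigmoid colon", "rectum",
)
# More specific adenoma terms come first so they win over the generic term.
ADENOMA_KEYWORDS = (
    "tubular adenoma", "tubulovillous adenoma", "villous adenoma", "adenoma",
)
OTHER_TYPE_KEYWORDS = ("hyperplastic polyp", "inflammatory polyp")
SIZE_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*(mm|cm)")


def extract_polyp_data(report_text):
    """Derive polyp location, size (mm), and type from free-text results."""
    text = report_text.lower()
    location = next((k for k in LOCATION_KEYWORDS if k in text), None)
    polyp_type = next(
        (k for k in ADENOMA_KEYWORDS + OTHER_TYPE_KEYWORDS if k in text), None
    )
    size_mm = None
    match = SIZE_PATTERN.search(text)
    if match:
        factor = 10.0 if match.group(2) == "cm" else 1.0
        size_mm = float(match.group(1)) * factor
    return {"location": location, "size_mm": size_mm, "type": polyp_type}


def adenoma_label(polyp_data):
    """Normalize the polyp data into a ground-truth label of 1 or 0."""
    polyp_type = polyp_data["type"]
    return 1 if polyp_type is not None and "adenoma" in polyp_type else 0


report = "Sigmoid colon, 0.8 cm polyp. Biopsy: tubular adenoma."
polyp = extract_polyp_data(report)
label = adenoma_label(polyp)
```

Keeping the polyp data as an intermediate record, rather than going straight from text to a label, matches the two-stage description in the text and leaves the location and size available for later use.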

FIG. 13 schematically illustrates a process of securing the first learning data and the second learning data according to an embodiment of the present invention.

In (A) of FIG. 13, the second learning data derivation unit 2100 statistically trains the first inference model using first learning data, which includes lifestyle information on a plurality of first items for each individual collected from medical institutions and adenoma information including information on the presence or absence of adenoma according to colonoscopy results.

In an embodiment of the present invention, the prediction result derivation unit operates on the basis of prediction datasets, and the prediction datasets derived by the above-described prediction dataset derivation unit are continuously collected; as a result, the first inference model can be regarded as being trained.

Because the information used to train the first inference model is collected at actual medical institutions, its amount may be limited. Accordingly, the first inference model can compensate for the shortage of learning data to some extent by having the prediction result derivation unit perform the complementary algorithms described above with reference to FIGS. 9 and 10.

Meanwhile, the second inference model of the present invention is a deep learning model based on an artificial neural network; preferably, the second inference model and the third inference model may include an artificial neural network model based on logistic regression analysis.

Because it may be difficult to train such a second inference model using only the first learning data collected at actual medical institutions, the present invention uses the statistically trained first inference model to generate second learning data that expands the learning data.

As shown in (B) of FIG. 13, in embodiments of the present invention, the first inference model is used to derive predicted adenoma information for lifestyle information on a plurality of first items for arbitrary individuals, thereby deriving second learning data that includes the lifestyle information on the plurality of first items for the arbitrary individuals and the adenoma information.

That is, arbitrary lifestyle information is input to the first inference model, and the predicted adenoma information that is output serves as a kind of label, so that the pair can be secured as second learning data. The predicted adenoma information in (B) of FIG. 13 may be expressed, for example, as a value between 0 and 1. Such a value corresponds to a kind of probability and may be used for training the second inference model described later.
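The labeling scheme described above, in which the output of the first inference model becomes the label for an arbitrary input, can be sketched as follows. Everything concrete here is an assumption made for illustration: the field names in `FIRST_ITEMS`, the value ranges, and in particular `first_inference_model`, which is a stand-in with invented coefficients and not the statistically trained model of the specification.

```python
import math
import random

# Hypothetical field names for the first items.
FIRST_ITEMS = (
    "age", "sex", "bmi", "exercise", "drinking",
    "smoking", "family_history", "personal_history",
)


def first_inference_model(record):
    """Stand-in for the first inference model; coefficients are invented."""
    score = (
        -4.0
        + 0.05 * record["age"]
        + 0.4 * record["smoking"]
        + 0.3 * record["family_history"]
    )
    return 1.0 / (1.0 + math.exp(-score))  # value between 0 and 1


def random_lifestyle_record(rng):
    """Generate one arbitrary lifestyle record (binary habits for simplicity)."""
    return {
        "age": rng.randint(20, 80),
        "sex": rng.randint(0, 1),
        "bmi": round(rng.uniform(16.0, 40.0), 1),
        "exercise": rng.randint(0, 1),
        "drinking": rng.randint(0, 1),
        "smoking": rng.randint(0, 1),
        "family_history": rng.randint(0, 1),
        "personal_history": rng.randint(0, 1),
    }


def derive_second_learning_data(n, seed=0):
    """Pair each arbitrary record with the model's predicted probability."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        record = random_lifestyle_record(rng)
        data.append((record, first_inference_model(record)))
    return data


second_learning_data = derive_second_learning_data(1000)
```

Because the labels are probabilities rather than 0 or 1, a model trained on them needs a loss that accepts soft targets, such as cross-entropy.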

FIG. 14 schematically illustrates the steps by which a user uses the first service according to an embodiment of the present invention.

The server system receives lifestyle information on a plurality of first items from a first user requesting his or her own predicted adenoma information, derives predicted adenoma information using the received lifestyle information on the first items and the first inference model, and provides it to the first user; by performing this first service, the second learning data can be derived.

That is, the user of the first user terminal can check prediction information on the presence or absence of his or her own adenoma using the statistically trained first inference model, and the lifestyle information entered by the user can in turn become part of the second learning data.

FIG. 15 schematically illustrates a process of compressing the first learning data and the second learning data according to an embodiment of the present invention.

The compressed data generation unit 2200 generates first compressed learning data, which includes adenoma information and lifestyle information on second items obtained by removing some of the plurality of first items from the first learning data, and generates second compressed learning data, which includes adenoma information and lifestyle information on second items obtained by removing some of the plurality of first items from the second learning data.

Preferably, the first items and the second items include age, sex, BMI, exercise habits, drinking habits, smoking habits, family medical history, and personal medical history, with some of the first items excluded from the second items. More preferably, the second items correspond to the first items with exercise habits excluded.
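Under the more preferred reading above, "compression" amounts to dropping one input field from every record while keeping the label. A minimal sketch follows; the field names and the helper names `compress_record` and `compress_learning_data` are hypothetical.

```python
# Hypothetical field names for the first items.
FIRST_ITEMS = (
    "age", "sex", "bmi", "exercise", "drinking",
    "smoking", "family_history", "personal_history",
)
# Second items: the first items with exercise habits removed.
SECOND_ITEMS = tuple(item for item in FIRST_ITEMS if item != "exercise")


def compress_record(record):
    """Keep only the second items of one lifestyle record."""
    return {item: record[item] for item in SECOND_ITEMS}


def compress_learning_data(learning_data):
    """Apply the compression to (record, adenoma_info) pairs, label unchanged."""
    return [(compress_record(record), label) for record, label in learning_data]


example = {
    "age": 52, "sex": 1, "bmi": 27.1, "exercise": 0, "drinking": 1,
    "smoking": 1, "family_history": 0, "personal_history": 0,
}
first_compressed = compress_learning_data([(example, 1)])
```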

FIG. 16 schematically illustrates a process of training the second inference model using the first compressed learning data and the second compressed learning data according to an embodiment of the present invention.

The second inference model learning unit 2300 uses the first compressed data and the second compressed data to train a second inference model that includes an artificial neural network and infers the likelihood of the presence of adenoma from input lifestyle information on the second items.
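As a rough illustration of training on both kinds of data, the sketch below fits a single sigmoid unit (the simplest logistic-regression-style network) with stochastic gradient descent on a mix of hard labels (0 or 1, as in the first compressed data) and soft labels (values between 0 and 1, as in the second compressed data). The network size, learning rate, epoch count, and toy feature values are all assumptions; the specification does not disclose the actual architecture or hyperparameters.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, features):
    """Probability of adenoma for one (already scaled) feature vector."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def train_logistic_unit(samples, epochs=200, learning_rate=0.1):
    """Fit one sigmoid unit by SGD on cross-entropy loss.

    samples: list of (features, target) with target in [0, 1].
    The gradient of cross-entropy with respect to the pre-activation is
    (prediction - target), which holds for soft targets as well.
    """
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, target in samples:
            error = predict(weights, bias, features) - target
            weights = [
                w - learning_rate * error * x for w, x in zip(weights, features)
            ]
            bias -= learning_rate * error
    return weights, bias


# Toy data with one scaled feature; the values are illustrative only.
first_compressed = [([0.0], 0), ([1.0], 1)]        # ground-truth labels
second_compressed = [([0.2], 0.1), ([0.9], 0.8)]   # model-generated labels
weights, bias = train_logistic_unit(first_compressed + second_compressed)
```

Concatenating the two datasets treats real and generated samples equally; weighting the ground-truth samples more heavily would be a natural variation, though the text does not say whether this is done.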

FIG. 17 schematically illustrates steps of providing the second service according to an embodiment of the present invention.

The second service providing unit 2520 performs a second service providing step of receiving lifestyle information on a plurality of second items from a second user requesting his or her own predicted adenoma information, deriving predicted adenoma information using the received lifestyle information on the second items and the second inference model, and providing it to the second user.

For example, when a user enters his or her BMI, drinking habits, smoking habits, family medical history, and personal medical history through his or her terminal, the second service providing unit 2520 can use the trained second inference model to provide predicted information, based on a machine learning model, on the presence or absence of adenoma.

FIG. 18 schematically illustrates processes of generating the third compressed learning data according to an embodiment of the present invention.

Hereinafter, the specific technical configuration of the method and system for predicting high-risk adenoma-related information based on a machine learning model is described. A high-risk adenoma is an adenoma that can develop into colorectal cancer, and embodiments of the present invention derive the likelihood of, or prediction information on, a high-risk adenoma based on input lifestyle information.

To predict such high-risk adenomas, the present invention uses a third inference model based on an artificial neural network, and the third inference model is trained in a different manner from the second inference model. In the system for predicting such high-risk adenomas, the first learning data is derived as shown in FIG. 18: keywords related to polyps or adenomas are extracted from the colonoscopy result information and the biopsy result information collected from the medical institution to derive first polyp data including one or more of polyp location, polyp size, and polyp type for the individual concerned, and the presence or absence of adenoma in the learning subject is determined from the first polyp data and derived in normalized form.

Meanwhile, the information on the presence or absence of high-risk adenoma in the third compressed learning data is derived in normalized form by determining the presence or absence of high-risk adenoma from the first polyp data.

Such third learning data includes lifestyle information according to the first items, information on the presence or absence of adenoma, and information on the presence or absence of high-risk adenoma, and corresponds to ground truth collected from medical institutions.

This data is likewise compressed by the compressed data generation unit 2200, and the third compressed learning data finally used to train the third inference model includes lifestyle information on a plurality of second items for each individual collected from medical institutions, together with information on the presence or absence of adenoma and on the presence or absence of high-risk adenoma according to colonoscopy results.

FIG. 19 schematically illustrates processes of generating the fourth compressed learning data according to an embodiment of the present invention.

As described above, the third inference model is trained using the third compressed learning data and the fourth compressed learning data, the latter being generated using the first inference model and the second inference model.

As shown in FIG. 19, lifestyle information according to the second items is first input to the second inference model. The lifestyle information input at this point is information collected through users' use of the first service providing unit 2510 or the second service providing unit 2520 described above, or information generated by, for example, random number generation.

The lifestyle information according to the second items is input to the second inference model, which derives predicted information on the presence or absence of adenoma.

The fourth compressed learning data is then generated so as to include the lifestyle information on the plurality of second items for arbitrary individuals and the information on the presence or absence of adenoma generated based on that lifestyle information and the second inference model.

FIG. 20 schematically illustrates a process of training the third inference model according to an embodiment of the present invention.

Using the third compressed learning data and the fourth compressed learning data described above, the third inference model learning unit 2400 trains the third inference model. The third compressed learning data is ground truth, in which the information on the presence or absence of adenoma is specified as 0 or 1 and the information on the presence or absence of high-risk adenoma is likewise specified as 0 or 1.

Meanwhile, the fourth compressed learning data contains no information on the presence or absence of high-risk adenoma, but its information on the presence or absence of adenoma takes a value between 0 and 1. In this way, the learning data of the third inference model can be expanded by extending the third compressed data, and the model can learn the relationship among lifestyle information, probability information on the presence of adenoma, and probability information on the presence of high-risk adenoma.
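The specification does not state how samples that lack a high-risk label enter the training objective. One possible reading, shown here purely as an assumption, is a model with two outputs (adenoma and high-risk adenoma) and a masked loss that skips the high-risk term whenever that label is missing. The names `binary_cross_entropy` and `masked_loss`, the tuple layout, and the sample values are all hypothetical.

```python
import math


def binary_cross_entropy(prediction, target):
    """Cross-entropy for one output; accepts soft targets in [0, 1]."""
    eps = 1e-12
    p = min(max(prediction, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def masked_loss(prediction, target):
    """Loss for one sample.

    prediction: (p_adenoma, p_high_risk) from a hypothetical two-output model.
    target: (y_adenoma, y_high_risk), where y_high_risk is None when the
    sample comes from the fourth compressed learning data.
    """
    loss = binary_cross_entropy(prediction[0], target[0])
    if target[1] is not None:
        loss += binary_cross_entropy(prediction[1], target[1])
    return loss


# (features, (adenoma, high_risk)) pairs; the values are illustrative only.
third_compressed = [
    ((52, 1, 27.1), (1, 1)),   # ground truth: both labels are 0 or 1
    ((34, 0, 21.4), (0, 0)),
]
fourth_compressed = [
    ((40, 0, 22.0), (0.31, None)),  # soft adenoma label, no high-risk label
]
training_set = third_compressed + fourth_compressed
```

With this arrangement the generated samples shape only the adenoma output, while the shared layers of the model still benefit from the larger dataset.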

FIG. 21 schematically illustrates steps of providing high-risk adenoma information by the second service providing unit 2520 according to an embodiment of the present invention.

As described above, the server system receives lifestyle information on a plurality of first items from a first user requesting his or her own predicted adenoma information, derives predicted adenoma information using the received lifestyle information on the first items and the first inference model, and provides it to the first user; by performing this first service, the second learning data is derived.

Meanwhile, the method for predicting adenoma-related information based on a machine learning model may perform a second service providing step of receiving lifestyle information on a plurality of second items from a second user requesting his or her own predicted adenoma information, deriving predicted adenoma information using the received lifestyle information on the second items together with the second inference model and the third inference model, and providing it to the second user.

With the first, second, and third inference models built in this way, a user can receive predicted high-risk adenoma information based on a machine learning model by entering his or her lifestyle information.

FIG. 22 illustrates an adenoma occurrence prediction information UI displayed on a user terminal by the operation of the second service providing unit 2520 according to an embodiment of the present invention. FIG. 23 illustrates a high-risk adenoma occurrence prediction information UI displayed on a user terminal by the operation of the second service providing unit 2520 according to an embodiment of the present invention.

FIG. 24 shows data on the accuracy of adenoma occurrence prediction and high-risk adenoma occurrence prediction according to embodiments of the present invention.

Model performance was evaluated by examining performance statistics obtained through 5-fold cross-validation. The results are shown in FIG. 24.

- Total data: 78,792 records

- Data collection period: August 1, 2008 to December 27, 2019

- Ratio of training data to test data: 8:2
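The evaluation setup listed above can be sketched with index bookkeeping alone. The text does not say how the 8:2 split and the 5-fold cross-validation were combined, so the arrangement below (hold out 20% as test data, then form 5 folds over the remaining training indices) is only one plausible reading; the function names and the seed are assumptions.

```python
import random


def split_train_test(n_records, test_ratio=0.2, seed=0):
    """Shuffle record indices and split them into train and test lists."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    n_test = int(n_records * test_ratio)
    return indices[n_test:], indices[:n_test]


def k_fold(indices, k=5):
    """Yield (train, validation) index lists for each of the k folds."""
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        validation = folds[held_out]
        train = [
            idx
            for fold_number, fold in enumerate(folds)
            if fold_number != held_out
            for idx in fold
        ]
        yield train, validation


train_indices, test_indices = split_train_test(78792)
fold_sizes = [(len(tr), len(va)) for tr, va in k_fold(train_indices, k=5)]
```

For 78,792 records this leaves 63,034 training indices and 15,758 test indices, and every training index appears in exactly one validation fold.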

Tables 1 and 2 below show the model performance results for the second inference model and the third inference model, respectively, according to embodiments of the present invention.

[Table 1]

[Table 2]

FIG. 25 schematically illustrates the internal configuration of a computing device according to an embodiment of the present invention.

The server system 1000 that performs the method for predicting colorectal cancer-related information according to FIG. 1 may include one or more modules of the computing device illustrated in FIG. 13.

As shown in FIG. 25, the computing device 11000 may include at least one processor 11100, a memory 11200, a peripheral interface 11300, an input/output (I/O) subsystem 11400, a power circuit 11500, and a communication circuit 11600. The computing device 11000 may include the system 1000 for deriving asymmetric characteristic information of the face, or may be connected to the server system 1000 through the input/output subsystem 11400.

The memory 11200 may include, for example, high-speed random access memory, a magnetic disk, SRAM, DRAM, ROM, flash memory, or non-volatile memory. The memory 11200 may contain software modules, instruction sets, or various other data required for the operation of the computing device 11000.

Access to the memory 11200 by other components, such as the processor 11100 or the peripheral interface 11300, may be controlled by the processor 11100. The processor 11100 may consist of a single processor or a plurality of processors, and may include GPU-type and TPU-type processors to improve processing speed.

The peripheral interface 11300 may couple input and/or output peripherals of the computing device 11000 to the processor 11100 and the memory 11200. The processor 11100 may execute software modules or instruction sets stored in the memory 11200 to perform various functions for the computing device 11000 and to process data.

The input/output subsystem 11400 may couple various input/output peripherals to the peripheral interface 11300. For example, the input/output subsystem 11400 may include a controller for coupling peripherals such as a monitor, keyboard, mouse, or printer, or, as needed, a touch screen or sensor, to the peripheral interface 11300. According to another aspect, input/output peripherals may be coupled to the peripheral interface 11300 without passing through the input/output subsystem 11400.

The power circuit 11500 may supply power to all or some of the components of the terminal. For example, the power circuit 11500 may include a power management system, one or more power sources such as a battery or alternating current (AC), a charging system, a power failure detection circuit, a power converter or inverter, a power status indicator, or any other components for generating, managing, and distributing power.

The communication circuit 11600 may enable communication with other computing devices using at least one external port.

Alternatively, as described above, the communication circuit 11600 may, as needed, include an RF circuit and enable communication with other computing devices by transmitting and receiving RF signals, also known as electromagnetic signals.

The embodiment of FIG. 25 is only one example of the computing device 11000; the computing device 11000 may omit some of the components shown in FIG. 25, may further include additional components not shown in FIG. 25, or may have a configuration or arrangement that combines two or more components. For example, a computing device for a communication terminal in a mobile environment may further include a touch screen, a sensor, and the like in addition to the components shown in FIG. 25, and the communication circuit 11600 may include circuits for RF communication using various communication methods (WiFi, 3G, LTE, Bluetooth, NFC, Zigbee, etc.). The components that may be included in the computing device 11000 may be implemented in hardware, including one or more signal-processing or application-specific integrated circuits, in software, or in a combination of hardware and software.

The methods according to embodiments of the present invention may be implemented in the form of program instructions executable by various computing devices and recorded on a computer-readable medium. In particular, the program according to this embodiment may be a PC-based program or an application dedicated to mobile terminals. An application to which the present invention is applied may be installed on a user terminal through a file provided by a file distribution system. For example, the file distribution system may include a file transmission unit (not shown) that transmits the file at the request of the user terminal.

The apparatus described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the apparatus and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. A processing device may run an operating system (OS) and one or more software applications executed on that operating system. The processing device may also access, store, manipulate, process, and generate data in response to the execution of software. Although a single processing device is sometimes described for ease of understanding, a person of ordinary skill in the art will appreciate that a processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, a processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure a processing device to operate as desired or may instruct the processing device independently or collectively. Software and/or data may be embodied, permanently or temporarily, in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by a processing device or to provide instructions or data to a processing device. Software may be distributed over computing devices connected by a network and stored or executed in a distributed manner. Software and data may be stored on one or more computer-readable recording media.

The method according to an embodiment may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiment, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules in order to perform the operations of the embodiments, and vice versa.

Although the methods and systems of the present invention have been described with reference to specific embodiments, some or all of their components or operations may be implemented using a computer system having a general-purpose hardware architecture.

The foregoing description of the present invention is for illustration, and those of ordinary skill in the art to which the present invention pertains will understand that it can easily be modified into other specific forms without changing its technical spirit or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and likewise components described as distributed may be implemented in a combined form.

The scope of the present invention is indicated by the claims below rather than by the above detailed description, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be interpreted as falling within the scope of the present invention.

Claims

A method of predicting adenoma-related information based on a machine learning model, performed by a server system, the method comprising:
a second learning data derivation step of deriving second learning data, which includes lifestyle information for a plurality of first items for each individual and adenoma information, by deriving predicted adenoma information for the lifestyle information for the plurality of first items for each individual using a first inference model statistically trained on first learning data, the first learning data including lifestyle information for the plurality of first items for each individual collected from medical institutions and adenoma information including information on the presence or absence of an adenoma according to colonoscopy results;
a compressed data generation step of generating, from the first learning data, first compressed learning data including adenoma information and lifestyle information for second items obtained by removing some of the plurality of first items, and generating, from the second learning data, second compressed learning data including adenoma information and lifestyle information for the second items obtained by removing some of the plurality of first items; and
a second inference model training step of training, using the first compressed learning data and the second compressed learning data, a second inference model that includes an artificial neural network and infers the probability of the presence or absence of an adenoma for input lifestyle information for the second items.
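The data flow of the second learning data derivation step and the compressed data generation step can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the field names, the choice of `SECOND_ITEMS`, and `toy_first_model` (a stand-in for the statistically trained first inference model) are all hypothetical, and training the artificial neural network on the resulting set is omitted.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]

# Hypothetical field names for the first items.
FIRST_ITEMS = ["age", "gender", "bmi", "exercise", "drinking", "smoking",
               "family_history", "personal_history"]
# Assumed subset: the claims do not say which first items are removed.
SECOND_ITEMS = ["age", "gender", "bmi", "smoking"]


def derive_second_learning_data(inputs: List[Record],
                                first_model: Callable[[Record], float]) -> List[Record]:
    """Attach the first model's predicted adenoma information to each input."""
    return [{**row, "adenoma": first_model(row)} for row in inputs]


def compress(records: List[Record], second_items: List[str]) -> List[Record]:
    """Keep only the second items plus the adenoma information."""
    return [{**{k: row[k] for k in second_items}, "adenoma": row["adenoma"]}
            for row in records]


def toy_first_model(row: Record) -> float:
    """Placeholder for the first inference model (assumption)."""
    return 0.6 if row["smoking"] == "yes" else 0.2


first_learning_data = [{
    "age": 55, "gender": "M", "bmi": 27.1, "exercise": "rare", "drinking": "weekly",
    "smoking": "yes", "family_history": "none", "personal_history": "none",
    "adenoma": 1,  # label taken from a colonoscopy result
}]
user_inputs = [{
    "age": 42, "gender": "F", "bmi": 22.4, "exercise": "often", "drinking": "none",
    "smoking": "no", "family_history": "none", "personal_history": "none",
}]

second_learning_data = derive_second_learning_data(user_inputs, toy_first_model)
# Combined training set for the second inference model.
training_set = (compress(first_learning_data, SECOND_ITEMS)
                + compress(second_learning_data, SECOND_ITEMS))
```

In this sketch the second inference model would see only the reduced item set, so a user could later be served with fewer questionnaire fields than the first model requires.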

The method according to claim 1,
wherein the second learning data derivation step comprises:
a learning information receiving step of receiving the first learning data including lifestyle information for a plurality of first items for each individual collected from a medical institution, and adenoma information including information on the presence or absence of an adenoma according to a colonoscopy result;
an adenoma probability extraction step of clustering the adenoma information of the first learning data according to specific lifestyle information for the first items and, based on the clustered information, deriving the adenoma probability for a group having the specific lifestyle information for the first items;
a prediction data set derivation step of deriving a prediction data set including the adenoma probability of each group having the specific lifestyle information for a plurality of the first items; and
a prediction result derivation step of deriving, for input lifestyle information for a plurality of first items for an individual, a result on the adenoma probability of a person having the input lifestyle information, based on the prediction data set.
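The adenoma probability extraction and prediction data set derivation steps can be illustrated with a simple grouping. This sketch assumes that "clustering" means grouping records that share identical values for the chosen items and that the group probability is the fraction of records with an adenoma; the claims do not fix either choice, and the function and field names are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def build_prediction_dataset(records: List[Record],
                             items: List[str]) -> Dict[Tuple[Any, ...], float]:
    """Map each lifestyle group to the share of its members with an adenoma."""
    counts = defaultdict(lambda: [0, 0])  # group -> [adenoma count, total count]
    for row in records:
        group = tuple(row[item] for item in items)
        counts[group][0] += int(row["adenoma"])
        counts[group][1] += 1
    return {group: positive / total for group, (positive, total) in counts.items()}


records = [
    {"age_band": "50s", "smoking": "yes", "adenoma": 1},
    {"age_band": "50s", "smoking": "yes", "adenoma": 1},
    {"age_band": "50s", "smoking": "yes", "adenoma": 0},
    {"age_band": "30s", "smoking": "no", "adenoma": 0},
]
prediction_dataset = build_prediction_dataset(records, ["age_band", "smoking"])
```

With real data, continuous items such as age or BMI would need to be binned first (as `age_band` is here) so that groups contain enough individuals.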

3. The method according to claim 2,
wherein the prediction result derivation step comprises:
a step of deriving a degree of similarity according to the degree of matching between the input lifestyle information for the plurality of first items for the individual and the lifestyle information included in the prediction data set, and deriving a plurality of similar-group prediction data by determining whether the degree of similarity meets a preset criterion; and
a step of deriving probability information on the presence or absence of an adenoma for the person having the input lifestyle information for the plurality of first items, using a representative value of the numerical information on the presence or absence of an adenoma in the plurality of similar-group prediction data.
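The similarity matching described in this claim can be sketched as below. The sketch assumes that the degree of similarity is the fraction of items whose values match exactly, that the preset criterion is a minimum threshold, and that the representative value is the arithmetic mean; the claim leaves all three open, and the names used are hypothetical.

```python
from statistics import mean
from typing import Any, Dict, List, Optional

Record = Dict[str, Any]


def predict_probability(query: Record, prediction_dataset: List[Record],
                        items: List[str], threshold: float = 0.5) -> Optional[float]:
    """Average the adenoma probability of groups similar enough to the query."""
    similar = []
    for group in prediction_dataset:
        matches = sum(1 for item in items
                      if group["lifestyle"].get(item) == query.get(item))
        similarity = matches / len(items)
        if similarity >= threshold:  # assumed form of the preset criterion
            similar.append(group["adenoma_probability"])
    return mean(similar) if similar else None  # None when no group qualifies


items = ["age_band", "gender", "smoking", "drinking"]
prediction_dataset = [
    {"lifestyle": {"age_band": "50s", "gender": "M", "smoking": "yes",
                   "drinking": "weekly"}, "adenoma_probability": 0.5},
    {"lifestyle": {"age_band": "50s", "gender": "M", "smoking": "no",
                   "drinking": "none"}, "adenoma_probability": 0.25},
    {"lifestyle": {"age_band": "30s", "gender": "F", "smoking": "no",
                   "drinking": "weekly"}, "adenoma_probability": 0.125},
]
query = {"age_band": "50s", "gender": "M", "smoking": "yes", "drinking": "weekly"}
result = predict_probability(query, prediction_dataset, items)
```

Here the first two groups match the query on four and two of the four items, so both pass the 0.5 threshold, while the third matches on only one item and is excluded.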

The method according to claim 1,
wherein the information on the presence or absence of an adenoma in the first learning data
is derived in the form of normalized information by extracting keywords related to polyps or adenomas from colonoscopy result information and biopsy result information collected from the medical institution, deriving first polyp data including at least one of polyp location, polyp size, and polyp type for the individual, and determining the information on the presence or absence of an adenoma from the first polyp data.
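A keyword-based extraction of this kind might look like the following sketch. The keyword lists, the English report wording, and the output field names are assumptions made for illustration (the source reports would be in Korean and the patent does not list its keywords), and negated findings such as "no adenoma" are not handled.

```python
import re
from typing import Any, Dict

# Hypothetical keyword lists; a real system would use the institution's vocabulary.
LOCATION_KEYWORDS = ["cecum", "ascending colon", "transverse colon",
                     "descending colon", "sigmoid colon", "rectum"]
TYPE_KEYWORDS = ["adenoma", "hyperplastic", "inflammatory"]
SIZE_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*(mm|cm)")


def extract_polyp_data(colonoscopy_text: str, biopsy_text: str) -> Dict[str, Any]:
    """Derive polyp location, size, and type, plus a normalized adenoma flag."""
    text = f"{colonoscopy_text} {biopsy_text}".lower()
    # Convert every size mention to millimetres.
    sizes_mm = [float(value) * (10 if unit == "cm" else 1)
                for value, unit in SIZE_PATTERN.findall(text)]
    types = [keyword for keyword in TYPE_KEYWORDS if keyword in text]
    return {
        "locations": [loc for loc in LOCATION_KEYWORDS if loc in text],
        "sizes_mm": sizes_mm,
        "types": types,
        "has_adenoma": 1 if "adenoma" in types else 0,  # normalized 0/1 label
    }


polyp_data = extract_polyp_data(
    "A 0.5 cm polyp was found in the sigmoid colon.",
    "Tubular adenoma, low grade.",
)
```

The 0/1 `has_adenoma` field plays the role of the normalized presence-or-absence information that would serve as the label in the first learning data.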

The method according to claim 1,
wherein the first items and the second items include age, gender, BMI, exercise habits, drinking habits, smoking habits, family medical history, and personal medical history, and
the second items are the first items with some items excluded.

The method according to claim 1,
wherein, in the second learning data derivation step,
the server system derives the second learning data by performing a first service of receiving lifestyle information for a plurality of first items from a first user who requests his or her predicted adenoma information, deriving the predicted adenoma information using the received lifestyle information for the first items and the first inference model, and providing it to the first user, and
wherein the method of predicting adenoma-related information based on a machine learning model
further comprises a second service providing step of receiving lifestyle information for a plurality of second items from a second user who requests his or her predicted adenoma information, deriving predicted adenoma information using the received lifestyle information for the second items and the second inference model, and providing it to the second user.