KR102672710B1 - Apparatus and method for synthesizing endoscopic image

Info

Publication number: KR102672710B1
Application number: KR1020230087486A
Authority: KR
Inventors: 이해진; 이연주
Original assignee: 주식회사 서르
Priority date: 2023-07-06
Filing date: 2023-07-06
Publication date: 2024-06-05

Abstract

The present invention relates to an endoscopic image synthesis device and method using artificial intelligence. The present disclosure can provide a device for synthesizing virtual endoscopic images. The device includes an input unit configured to receive at least one of text containing attribute values related to an endoscopic image, lesion location information, mask information, and an actually captured endoscopic image, and a processor configured to synthesize a virtual endoscopic image based on at least one of the received text, lesion location information, mask information, and actually captured endoscopic image. The processor includes an image synthesis module that includes an artificial intelligence model. Using as training data at least one of an actually captured endoscopic image and at least one attribute value related to the captured endoscopic image, lesion location information, and mask information, the model is trained to receive at least one of text containing attribute values related to an endoscopic image, lesion location information, mask information, and an actually captured endoscopic image, and to synthesize a virtual endoscopic image.

Description

Endoscopic image synthesis device and synthesis method {APPARATUS AND METHOD FOR SYNTHESIZING ENDOSCOPIC IMAGE}

The present invention relates to an endoscopic image synthesis device and method using artificial intelligence.

Artificial intelligence image synthesis is a technology that combines computer vision and artificial intelligence to add digital elements to real images or to modify them. It is used in various fields such as virtual reality (VR), augmented reality (AR), film production, advertising, and game development.

Recently, medical image synthesis using artificial intelligence has come into use in the medical field for purposes such as diagnosis, treatment, and research. For example, medical image synthesis can improve the resolution of medical images or remove noise from them, which allows medical professionals to perform more accurate and detailed analysis and to judge a patient's condition more accurately. As another example, medical image synthesis can be used to visualize conditions such as lesions or tumors: combining medical image data with artificial intelligence algorithms makes it possible to present the location, size, and shape of a tumor more clearly.

However, training medical artificial intelligence models for tasks such as lesion detection, segmentation, and classification requires a large amount of image data, and the nature of the medical field makes such data difficult to obtain. Accordingly, there is a need for technology that synthesizes images usable for training artificial intelligence models.

Korean Registered Patent Publication No. 10-2513394

An object of the present disclosure is to provide a device and method for synthesizing the large volume of endoscopic images used to train a model that predicts lesions of the stomach, small intestine, or large intestine.

Another object of the present disclosure is to provide an image synthesis device and method that make it easy to synthesize a virtual endoscopic image from at least one of a plurality of attribute values, location information of the lesion to be generated, mask information, and an actually captured endoscopic image.

The problems to be solved by the present disclosure are not limited to those mentioned above, and other problems not mentioned will be clearly understood by those skilled in the art from the description below.

The present disclosure can provide a device for synthesizing virtual endoscopic images. The device includes an input unit configured to receive at least one of text containing attribute values related to an endoscopic image, lesion location information, mask information, and an actually captured endoscopic image, and a processor configured to synthesize a virtual endoscopic image based on at least one of the received text, lesion location information, mask information, and actually captured endoscopic image. The processor includes an image synthesis module that includes an artificial intelligence model. Using as training data at least one of an actually captured endoscopic image and at least one attribute value related to the captured endoscopic image, lesion location information, and mask information, the model is trained to receive at least one of text containing attribute values related to an endoscopic image, lesion location information, mask information, and an actually captured endoscopic image, and to synthesize a virtual endoscopic image.
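The "at least one of" condition on the input unit can be illustrated with a small container type. This is a minimal sketch, not the patented implementation: the class name `SynthesisInput`, its field names, and the bounding-box and mask formats are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SynthesisInput:
    """Hypothetical container for the four optional inputs named in the text."""

    text: Optional[str] = None  # text containing attribute values
    lesion_box: Optional[Tuple[int, int, int, int]] = None  # assumed (x, y, w, h)
    mask: Optional[List[List[int]]] = None  # assumed binary mask, 1 = target region
    real_image_path: Optional[str] = None  # path to an actually captured image

    def provided(self) -> List[str]:
        """Return the names of the inputs that were actually supplied."""
        return [name for name, value in vars(self).items() if value is not None]

    def validate(self) -> None:
        """Enforce the 'at least one of' condition from the description."""
        if not self.provided():
            raise ValueError(
                "at least one of text, lesion_box, mask, real_image_path is required"
            )


# Example: text plus a mask is a valid combination; an empty input is not.
example = SynthesisInput(text="colon, ulcerated lesion", mask=[[0, 1], [1, 1]])
example.validate()
```

Keeping every field optional and checking the combination in one place mirrors the wording of the description, which allows any non-empty subset of the four inputs.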

In one embodiment, the device further includes a display unit configured to display an input window for receiving at least one of the actually captured endoscopic image, attribute values related to the captured endoscopic image, the lesion location information, and the mask information, and the training data may be generated by matching at least one attribute value received through the input window to the captured endoscopic image.

In one embodiment, the processor further includes a sentence generation module that generates, from attribute values related to an endoscopic image, text containing those attribute values, and the training data may be generated by matching, to the captured endoscopic image, at least one attribute value received through the input window and the text generated from that attribute value.

In one embodiment, the sentence generation module generates a plurality of texts based on the attribute values related to the captured endoscopic image, and the training data may be generated by matching the plurality of texts to the captured endoscopic image.
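One way to picture the sentence generation module and the matching step is the sketch below. It assumes a simple template-based generator, which the source does not specify (the actual module could equally be a language model). The templates, attribute keys, and function names are hypothetical.

```python
from typing import Dict, List

# Hypothetical sentence templates; the source does not describe how texts are produced.
TEMPLATES = [
    "Endoscopic image of the {organ} showing a {shape} lesion, {size_mm} mm, at the {location}.",
    "{organ} endoscopy: {shape} lesion ({size_mm} mm) located at the {location}.",
    "A {size_mm} mm {shape} lesion is visible at the {location} of the {organ}.",
]


def generate_texts(attributes: Dict[str, str]) -> List[str]:
    """Produce several texts that each contain every attribute value."""
    return [template.format(**attributes) for template in TEMPLATES]


def build_training_records(image_path: str, attributes: Dict[str, str]) -> List[dict]:
    """Match each generated text (and the raw attributes) to the same captured image."""
    return [
        {"image": image_path, "attributes": attributes, "text": text}
        for text in generate_texts(attributes)
    ]


records = build_training_records(
    "real_0001.png",  # hypothetical file name
    {"organ": "colon", "shape": "ulcerated", "size_mm": "8", "location": "ascending segment"},
)
for record in records:
    print(record["text"])
```

Generating several differently worded texts for one image gives the model more than one description per training image, which is the effect the embodiment with a plurality of texts describes.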

In one embodiment, the attribute values related to the captured endoscopic image may include attribute values related to the evaluation items of the global standard SES-CD (Simple Endoscopic Score for Crohn's Disease) score.

In one embodiment, the attribute values related to the captured endoscopic image may include attribute values related to at least one of the type of organ imaged, the age of the subject, the diagnosis, the field of view of the endoscopic imaging device, the field of view of the lesion, the location of the imaged lesion, the shape of the imaged lesion, the size of the imaged lesion, the type of endoscopic imaging device, the type of endoscopic procedure tool, and the endoscopic imaging situation.

In one embodiment, the artificial intelligence model may be trained to synthesize a lesion image onto an actually captured endoscopic image based on at least one of the text containing attribute values related to the endoscopic image, the lesion location information, and the mask information.
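The role of the mask in this embodiment can be illustrated without any learned model. The sketch below is not the artificial intelligence model itself. It assumes a binary mask in which 1 marks the lesion region, and it shows only the compositing idea: generated pixels replace the real image inside the mask, and the real image is kept elsewhere. The function name and the grayscale list-of-lists image format are simplifications made up for this example.

```python
from typing import List

Image = List[List[int]]  # grayscale pixel rows, a simplification for illustration


def composite_with_mask(real: Image, generated: Image, mask: Image) -> Image:
    """Keep real pixels where mask is 0 and take generated pixels where mask is 1."""
    if not (len(real) == len(generated) == len(mask)):
        raise ValueError("real, generated, and mask must have the same height")
    out: Image = []
    for real_row, gen_row, mask_row in zip(real, generated, mask):
        if not (len(real_row) == len(gen_row) == len(mask_row)):
            raise ValueError("real, generated, and mask must have the same width")
        out.append([g if m else r for r, g, m in zip(real_row, gen_row, mask_row)])
    return out


real = [[10, 10, 10], [10, 10, 10]]
generated = [[99, 99, 99], [99, 99, 99]]  # stand-in for model output
mask = [[0, 1, 0], [0, 1, 1]]
result = composite_with_mask(real, generated, mask)
print(result)
```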

The present disclosure can also provide a method for synthesizing a virtual endoscopic image, the method including: training an artificial intelligence model, using as training data an actually captured endoscopic image and at least one attribute value related to the captured endoscopic image, to receive at least one of text containing attribute values related to an endoscopic image, lesion location information, mask information, and an actually captured endoscopic image and to synthesize a virtual endoscopic image; receiving at least one of text containing attribute values related to an endoscopic image, lesion location information, mask information, and an actually captured endoscopic image; and synthesizing, by the artificial intelligence model, a virtual endoscopic image based on at least one of the received text, lesion location information, mask information, and actually captured endoscopic image.

In addition, a computer program stored in a computer-readable recording medium for implementing the present disclosure may be further provided.

In addition, a computer-readable recording medium recording a computer program for executing a method for implementing the present disclosure may be further provided.

The image synthesis device and method according to the present disclosure make it possible to synthesize a large number of images of various lesions even without actual endoscopic data. The large volume of image data generated in this way can be used as training data for an artificial intelligence model that predicts lesions from endoscopic images. That is, according to the present disclosure, a large amount of training data (virtual endoscopic images) for training an artificial intelligence model can be generated from only a small number of actually captured endoscopic images.

The effects of the present disclosure are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the description below.

Figure 1 is an overall system diagram of the present disclosure.
Figure 2 is a block diagram of a server included in the image synthesis device of the present disclosure.
Figure 3 is a block diagram of a terminal included in the image synthesis device of the present disclosure.
Figure 4 is a block diagram of a processor included in the image synthesis device of the present disclosure.
Figure 5 is a flowchart of an image synthesis method according to the present disclosure.
Figure 6 is a flowchart showing a method of training an artificial intelligence model included in the image synthesis device according to the present disclosure.
Figures 7 and 8 are user interface screens for generating training data for an artificial intelligence model.
Figures 9 and 10 are conceptual diagrams showing an embodiment of an artificial intelligence model included in the image synthesis device according to the present disclosure.

Like reference numerals refer to like elements throughout this disclosure. This disclosure does not describe every element of the embodiments, and content that is common knowledge in the technical field to which this disclosure pertains, or that overlaps between embodiments, is omitted. The terms 'unit, module, member, block' used in the specification may be implemented in software or hardware, and depending on the embodiment, a plurality of 'units, modules, members, blocks' may be implemented as a single component, or a single 'unit, module, member, block' may include a plurality of components.

Throughout the specification, when a part is said to be "connected" to another part, this includes not only direct connection but also indirect connection, and indirect connection includes connection through a wireless communication network.

In addition, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise.

Throughout the specification, when a member is said to be located "on" another member, this includes not only the case where the member is in contact with the other member but also the case where a further member exists between the two.

Terms such as first and second are used to distinguish one component from another, and the components are not limited by these terms.

Singular expressions include plural expressions unless the context clearly indicates otherwise.

The identification codes of the steps are used for convenience of explanation and do not describe the order of the steps. Unless the context clearly specifies a particular order, the steps may be performed in an order different from the one stated.

Hereinafter, the operating principle and embodiments of the present disclosure will be described with reference to the attached drawings.

In this specification, a 'device according to the present disclosure' includes any device that can perform computation and provide results to a user. For example, the device according to the present disclosure may include all of a computer, a server device, and a portable terminal, or may take the form of any one of them.

Here, the computer may include, for example, a notebook, desktop, laptop, tablet PC, or slate PC equipped with a web browser.

The server device is a server that processes information by communicating with external devices and may include an application server, computing server, database server, file server, game server, mail server, proxy server, web server, and the like.

The portable terminal is, for example, a wireless communication device that ensures portability and mobility. It may include all kinds of handheld wireless communication devices, such as PCS (Personal Communication System), GSM (Global System for Mobile communications), PDC (Personal Digital Cellular), PHS (Personal Handyphone System), PDA (Personal Digital Assistant), IMT (International Mobile Telecommunication)-2000, CDMA (Code Division Multiple Access)-2000, W-CDMA (Wideband Code Division Multiple Access), and WiBro (Wireless Broadband Internet) terminals and smartphones, as well as wearable devices such as watches, rings, bracelets, anklets, necklaces, glasses, contact lenses, and head-mounted devices (HMDs).

The image synthesis device according to the present disclosure may be implemented by at least one of a server and a terminal. Specifically, the image synthesis device according to the present disclosure may be implemented by either a server or a terminal alone, or may be implemented as a system through data transmission and reception between the server and the terminal.

Hereinafter, the image synthesis device according to the present disclosure will be described.

Referring to FIG. 1, the image synthesis device according to the present disclosure may include at least one of a server and a terminal.

The server 10 is connected to the terminal 20 through a network. It can receive the data required for image synthesis from the terminal 20 and then transmit the image synthesis result to the terminal 20.
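The exchange between the terminal and the server can be sketched as a request and a response. The source does not define a wire format, so everything below is an assumption for illustration: JSON serialization, the field names, the function names `build_request` and `handle_request`, and the placeholder result identifier. The synthesis step is a stub.

```python
import json
from typing import Optional, Sequence


def build_request(text: Optional[str] = None,
                  lesion_box: Optional[Sequence[int]] = None) -> str:
    """Terminal side: serialize only the inputs that were actually provided."""
    payload = {"text": text, "lesion_box": lesion_box}
    return json.dumps({key: value for key, value in payload.items() if value is not None})


def handle_request(raw: str) -> str:
    """Server side: parse the request, run a stubbed synthesis, and return the result."""
    request = json.loads(raw)
    if not request:
        raise ValueError("request must carry at least one input for image synthesis")
    # Stub: a real server would run the trained model here and return image data.
    result = {
        "status": "ok",
        "used_inputs": sorted(request),
        "image_id": "synthetic-0001",  # hypothetical placeholder identifier
    }
    return json.dumps(result)


response = json.loads(handle_request(build_request(text="colon, 8 mm ulcerated lesion")))
print(response["status"], response["used_inputs"])
```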

Meanwhile, it is obvious to those skilled in the art that the terminal 20 is not limited to the portable terminal described above and may include a processor-equipped notebook, desktop, laptop, tablet PC, slate PC, and the like.

As described above, the image synthesis device according to the present disclosure can be implemented through data transmission and reception between the server 10 and the terminal 20.

Hereinafter, the server 10 and the terminal 20 for implementing the image synthesis device according to the present disclosure will each be described.

Figure 2 is a block diagram of a server included in the image synthesis device of the present disclosure.

The server 100 according to the present disclosure may include at least one of a communication unit 110, a storage unit 120, and a processor 130.

The communication unit 110 may communicate with at least one of a terminal, external storage (for example, a database 140), an external server, and a cloud server.

Meanwhile, an external server or cloud server may be configured to perform at least part of the role of the processor 130. That is, data processing or computation may be performed on an external server or cloud server, and the present invention places no particular restriction on this approach.

Meanwhile, the communication unit 110 may support various communication methods depending on the communication standard of the communication target (for example, an electronic device, external server, or other device).

For example, the communication unit 110 may be configured to communicate with a communication target using at least one of WLAN (Wireless LAN), Wi-Fi (Wireless Fidelity), Wi-Fi Direct, DLNA (Digital Living Network Alliance), WiBro (Wireless Broadband), WiMAX (Worldwide Interoperability for Microwave Access), HSDPA (High Speed Downlink Packet Access), HSUPA (High Speed Uplink Packet Access), LTE (Long Term Evolution), LTE-A (Long Term Evolution-Advanced), 5G (5th Generation Mobile Telecommunication), Bluetooth™, RFID (Radio Frequency Identification), infrared communication (Infrared Data Association; IrDA), UWB (Ultra-Wideband), ZigBee, NFC (Near Field Communication), and Wireless USB (Wireless Universal Serial Bus) technologies.

Next, the storage unit 120 may be configured to store various information related to the present invention. The storage unit 120 may be provided in the device according to the present invention itself. Alternatively, at least part of the storage unit 120 may refer to at least one of a database (DB) 140 and cloud storage (or a cloud server). That is, the storage unit 120 need only be a space in which the information required for the device and method according to the present invention is stored, and it can be understood that there is no restriction on its physical location. Accordingly, hereinafter, the storage unit 120, the database 140, external storage, and cloud storage (or a cloud server) are not distinguished from one another and are all referred to as the storage unit 120.

Next, the processor 130 may be configured to control the overall operation of the device related to the present invention. The processor 130 may process signals, data, and information that are input or output through the components described above, or may provide or process information or functions appropriate for the user.

The processor 130 may include at least one CPU (Central Processing Unit) and perform the functions according to the present invention.

At least one component may be added or removed depending on the performance of the components shown in FIG. 2. In addition, those skilled in the art will readily understand that the relative positions of the components may be changed depending on the performance or structure of the device.

Hereinafter, the terminal included in the image synthesis device of the present disclosure will be described in detail.

Figure 3 is a block diagram of a terminal included in the image synthesis device of the present disclosure.

Referring to FIG. 3, the terminal 200 according to the present disclosure may include a communication unit 210, an input unit 220, a display unit 230, a processor 240, and the like. The components shown in FIG. 3 are not essential for implementing the image synthesis device according to the present disclosure, so the terminal described in this specification may have more or fewer components than those listed above.

Among these components, the communication unit 210 may include one or more components that enable communication with an external device, for example, at least one of a broadcast reception module, a wired communication module, a wireless communication module, a short-range communication module, and a location information module.

The wired communication module may include various wired communication modules such as a Local Area Network (LAN) module, a Wide Area Network (WAN) module, or a Value Added Network (VAN) module, as well as various cable communication modules such as USB (Universal Serial Bus), HDMI (High Definition Multimedia Interface), DVI (Digital Visual Interface), RS-232 (Recommended Standard 232), power line communication, or POTS (plain old telephone service).

In addition to a Wi-Fi module and a WiBro (Wireless Broadband) module, the wireless communication module may include wireless communication modules that support various wireless communication methods such as GSM (Global System for Mobile Communication), CDMA (Code Division Multiple Access), WCDMA (Wideband Code Division Multiple Access), UMTS (Universal Mobile Telecommunications System), TDMA (Time Division Multiple Access), LTE (Long Term Evolution), 4G, 5G, and 6G.

The input unit 220 is for inputting image information (or signals), audio information (or signals), data, or information entered by a user, and may include at least one of at least one camera, at least one microphone, and a user input unit. Voice data or image data collected by the input unit can be analyzed and processed as a control command from the user.

The display unit 230 is for generating output related to sight, hearing, or touch, and may include at least one of a display, an audio output unit, a haptic module, and an optical output unit. The display may form a layered structure with a touch sensor, or be formed integrally with it, to implement a touch screen. Such a touch screen can function as a user input unit that provides an input interface between the device and the user while also providing an output interface between the device and the user.

The display shows (outputs) information processed by the device. For example, the display may show execution screen information of an application program running on the device, or UI (User Interface) and GUI (Graphic User Interface) information corresponding to that execution screen information.

상술한 구성요소 외에, 상술한 단말기는 인터페이스부 및 메모리를 더 포함할 수 있다. In addition to the above-described components, the above-described terminal may further include an interface unit and memory.

인터페이스부는 본 장치에 연결되는 다양한 종류의 외부 기기와의 통로 역할을 수행한다. 이러한 인터페이스부는 유/무선 헤드셋 포트(port), 외부 충전기 포트(port), 유/무선 데이터 포트(port), 메모리 카드(memory card) 포트, 식별 모듈(SIM)이 구비된 장치를 연결하는 포트(port), 오디오 I/O(Input/Output) 포트(port), 비디오 I/O(Input/Output) 포트(port), 이어폰 포트(port) 중 적어도 하나를 포함할 수 있다. 본 장치에서는, 상기 인터페이스부에 연결된 외부 기기와 관련된 적절한 제어를 수행할 수 있다.The interface unit serves as a passageway for various types of external devices connected to this device. These interface units include a wired/wireless headset port, an external charger port, a wired/wireless data port, a memory card port, and a port for connecting a device equipped with an identification module (SIM) ( port), an audio input/output (I/O) port, a video input/output (I/O) port, and an earphone port. In this device, appropriate control related to external devices connected to the interface unit can be performed.

The memory may store data supporting the various functions of the device and programs for the operation of the processor, may store input/output data (for example, music files, still images, and videos), and may store a plurality of application programs (applications) run on the device, as well as data and instructions for the operation of the device. At least some of these application programs may be downloaded from an external server via wireless communication.

The memory may include at least one type of storage medium among a flash memory type, a hard disk type, an SSD (Solid State Disk) type, an SDD (Silicon Disk Drive) type, a multimedia card micro type, a card-type memory (for example, SD or XD memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, and an optical disk. The memory may also be a database that is separate from the device but connected to it by wire or wirelessly.

Meanwhile, the terminal includes a processor 240. The processor may be implemented with a memory that stores data for an algorithm for controlling the operation of the components in the device, or for a program that reproduces the algorithm, and at least one processor (not shown) that performs the operations described above using the data stored in the memory. The memory and the processor may be implemented as separate chips, or as a single chip.

Meanwhile, as shown in FIG. 4, the processor included in at least one of the server and the terminal may include a plurality of modules for implementing the image synthesis device described later. Specifically, the processor 300 may include a sentence generation module 310 and an image synthesis module 320. The artificial intelligence model training method described later is described as being implemented by the operation of these modules, but each step described later need not necessarily be performed by these modules.

In addition, the processor may control any one or a combination of the components described above in order to implement, on the device, the various embodiments of the present disclosure described in the following drawings.

Meanwhile, at least one component may be added or deleted in correspondence with the performance of the components shown in FIGS. 1 to 4. It will also be readily understood by those of ordinary skill in the art that the relative positions of the components may be changed in correspondence with the performance or structure of the device.

Hereinafter, the artificial intelligence described in the present invention is explained in detail.

Functions related to artificial intelligence according to the present disclosure are operated through the processor and the memory mounted on the server and the terminal described above. The processor may consist of one or a plurality of processors. The one or more processors may be a general-purpose processor such as a CPU, an AP, or a DSP (Digital Signal Processor), a graphics-dedicated processor such as a GPU or a VPU (Vision Processing Unit), or an artificial-intelligence-dedicated processor such as an NPU. The one or more processors control the processing of input data according to a predefined operation rule or an artificial intelligence model stored in the memory. Alternatively, when the one or more processors are artificial-intelligence-dedicated processors, they may be designed with a hardware structure specialized for processing a specific artificial intelligence model.

The predefined operation rule or artificial intelligence model is characterized by being created through learning. Here, being created through learning means that a base artificial intelligence model is trained by a learning algorithm using a large amount of training data, so that a predefined operation rule or artificial intelligence model set to perform a desired characteristic (or purpose) is created. Such learning may be performed on the device itself on which the artificial intelligence according to the present disclosure runs, or through a separate server and/or system. Examples of learning algorithms include supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning, but are not limited to these examples.

The artificial intelligence model may be composed of a plurality of neural network layers. Each of the neural network layers has a plurality of weight values and performs a neural network operation by combining the operation result of the previous layer with its weights. The weights of the neural network layers may be optimized by the training result of the artificial intelligence model. For example, the weights may be updated so that the loss value or cost value obtained from the artificial intelligence model during training is reduced or minimized. The artificial neural network may include a deep neural network (DNN), for example a CNN (Convolutional Neural Network), a DNN (Deep Neural Network), an RNN (Recurrent Neural Network), an RBM (Restricted Boltzmann Machine), a DBN (Deep Belief Network), a BRDNN (Bidirectional Recurrent Deep Neural Network), or Deep Q-Networks, but is not limited to these examples.
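The weight update described above can be illustrated with a minimal sketch. It is not the model of the present disclosure: it assumes a single linear unit, a mean squared error loss, and plain gradient descent, and the names `train_linear`, `lr`, and `steps` are hypothetical.

```python
def train_linear(xs, ys, lr=0.05, steps=200):
    """Fit y ~ w * x + b by gradient descent on the mean squared error.

    Illustrative only: one weight and one bias stand in for the
    "plurality of weight values" of a real neural network layer.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    losses = []
    for _ in range(steps):
        preds = [w * x + b for x in xs]
        # Loss value obtained from the model at the current weights.
        loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / n
        losses.append(loss)
        # Gradients of the loss with respect to w and b.
        grad_w = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / n
        grad_b = sum(2 * (p - y) for p, y in zip(preds, ys)) / n
        # Update the weights in the direction that reduces the loss.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, losses


if __name__ == "__main__":
    w, b, losses = train_linear([0, 1, 2, 3], [1, 3, 5, 7])
    print(f"w={w:.3f} b={b:.3f} first loss={losses[0]:.3f} last loss={losses[-1]:.6f}")
```

The recorded `losses` list makes the claim in the text observable: each update moves the weights so that the loss obtained during training decreases.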

According to an exemplary embodiment of the present disclosure, the processor may implement artificial intelligence. Artificial intelligence here refers to a machine learning method based on an artificial neural network, which lets a machine learn by imitating human biological neurons. By learning method, artificial intelligence methodologies may be divided into supervised learning, in which input data and output data are provided together as training data so that the answer (output data) to the problem (input data) is fixed; unsupervised learning, in which only input data is provided without output data so that the answer (output data) to the problem (input data) is not fixed; and reinforcement learning, in which a reward is given from the external environment whenever an action is taken in the current state and learning proceeds in the direction that maximizes this reward. Artificial intelligence methodologies may also be divided by architecture, that is, the structure of the learning model. Widely used deep learning architectures include the convolutional neural network (CNN), the recurrent neural network (RNN), the Transformer, and the generative adversarial network (GAN).

The device and system may include an artificial intelligence model. The artificial intelligence model may be a single model or may be implemented as a plurality of models. The artificial intelligence model may be composed of a neural network (or artificial neural network) and may include a statistical learning algorithm that, in machine learning and cognitive science, imitates biological neurons. A neural network may refer to a model as a whole in which artificial neurons (nodes), forming a network through synaptic connections, acquire problem-solving ability by changing the strength of those connections through learning. The neurons of a neural network may include a combination of weights or biases. A neural network may include one or more layers, each composed of one or more neurons or nodes. For example, the device may include an input layer, a hidden layer, and an output layer. The neural network constituting the device can infer the result (output) to be predicted from an arbitrary input by changing the weights of its neurons through learning.

The processor may create a neural network, train (or learn) a neural network, perform an operation based on received input data and generate an information signal based on the result, or retrain a neural network. Neural network models may include, but are not limited to, various types of models such as CNNs (Convolutional Neural Networks) like GoogleNet, AlexNet, and VGG Network, R-CNN (Region with Convolutional Neural Network), RPN (Region Proposal Network), RNN (Recurrent Neural Network), S-DNN (Stacking-based Deep Neural Network), S-SDNN (State-Space Dynamic Neural Network), Deconvolution Network, DBN (Deep Belief Network), RBM (Restricted Boltzmann Machine), Fully Convolutional Network, LSTM (Long Short-Term Memory) Network, and Classification Network. The processor may include one or more processors for performing operations according to the neural network models. For example, the neural network may include a deep neural network.

Those skilled in the art will understand that the neural network may include at least one of CNN (Convolutional Neural Network), RNN (Recurrent Neural Network), perceptron, multilayer perceptron, FF (Feed Forward), RBF (Radial Basis Function network), DFF (Deep Feed Forward), LSTM (Long Short-Term Memory), GRU (Gated Recurrent Unit), AE (Auto Encoder), VAE (Variational Auto Encoder), DAE (Denoising Auto Encoder), SAE (Sparse Auto Encoder), MC (Markov Chain), HN (Hopfield Network), BM (Boltzmann Machine), RBM (Restricted Boltzmann Machine), DBN (Deep Belief Network), DCN (Deep Convolutional Network), DN (Deconvolutional Network), DCIGN (Deep Convolutional Inverse Graphics Network), GAN (Generative Adversarial Network), LSM (Liquid State Machine), ELM (Extreme Learning Machine), ESN (Echo State Network), DRN (Deep Residual Network), DNC (Differentiable Neural Computer), NTM (Neural Turing Machine), CN (Capsule Network), KN (Kohonen Network), AN (Attention Network), DDPM (Denoising Diffusion Probabilistic Model), Latent Diffusion, and Stable Diffusion, but is not limited to these and may include any neural network.

According to an exemplary embodiment of the present disclosure, the processor may use various artificial intelligence structures and algorithms, including but not limited to: CNNs (Convolutional Neural Networks) such as GoogleNet, AlexNet, and VGG Network, R-CNN (Region with Convolutional Neural Network), RPN (Region Proposal Network), RNN (Recurrent Neural Network), S-DNN (Stacking-based Deep Neural Network), S-SDNN (State-Space Dynamic Neural Network), Deconvolution Network, DBN (Deep Belief Network), RBM (Restricted Boltzmann Machine), Fully Convolutional Network, LSTM (Long Short-Term Memory) Network, Classification Network, Generative Modeling, eXplainable AI, Continual AI, Representation Learning, and AI for Material Design; BERT, SP-BERT, MRC/QA, Text Analysis, Dialog System, GPT-3, and GPT-4 for natural language processing; Visual Analytics, Visual Understanding, Video Synthesis, and ResNet for vision processing; and Anomaly Detection, Prediction, Time-Series Forecasting, Optimization, Recommendation, and Data Creation for data intelligence. Hereinafter, embodiments of the present disclosure are described in detail with reference to the attached drawings.

Hereinafter, an image synthesis method using the components described above is explained in detail. The image synthesis method described below is implemented through data transmission and reception between the server and the terminal, and some steps of the method may be performed in either the server or the terminal. However, the method is not limited to this, and it will be apparent to those skilled in the art that it may be performed by either the server or the terminal alone.

FIG. 5 is a flowchart of an image synthesis method according to the present disclosure.

Referring to FIG. 5, a step of training an artificial intelligence model is performed based on a captured endoscopic image and at least one of an attribute value related to the captured endoscopic image, lesion location information, mask information, and a lesion image (S110).

The image synthesis module 320 may include an artificial intelligence model that receives predetermined data and synthesizes a virtual endoscopic image. The artificial intelligence model may be trained on actually captured endoscopic images.

In one embodiment, the captured endoscopic image may be any one of a gastroscopy image, a small-intestine endoscopy image, and a colonoscopy image, but is not limited to these.

The artificial intelligence model may be trained, using as training data an actually captured endoscopic image together with at least one attribute value related to the captured image, lesion location information, mask information, and a lesion image, to receive text containing attribute values related to an endoscopic image, lesion location information, and mask information and to synthesize a virtual endoscopic image. The training data and training method for the artificial intelligence model are described in detail later.

Next, a step is performed in which the trained artificial intelligence model receives at least one of text containing attribute values related to an endoscopic image, lesion location information, and mask information (S120).

Through the input unit included in the terminal 20, the user may enter attribute values related to an endoscopic image, or text containing those attribute values.

In one embodiment, when attribute values related to an endoscopic image are to be entered, the display unit included in the terminal 20 may show a plurality of items and at least one attribute value corresponding to each item. The user may enter the attribute values related to the endoscopic image by selecting at least one of the displayed attribute values.

In this case, the sentence generation module 310 may generate text in sentence form based on the entered attribute values.

For example, when the attribute values 'wall view', 'TI location', 'flat', and 'polyp' are entered by the user, the sentence generation module 310 may generate the sentence 'make a wall view picture taken at the colon TI location, and flat polyp at the center of the picture.'
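As a rough illustration of how attribute values might be turned into such a sentence, the following sketch fills a fixed template. The disclosure does not specify how the sentence generation module 310 builds its text; the function name `build_prompt`, its parameters, and the defaults for the organ and lesion position are assumptions chosen to reproduce the example sentence.

```python
def build_prompt(view, location, shape, diagnosis,
                 organ="colon", position="center"):
    """Compose a sentence-form prompt from selected attribute values.

    Hypothetical template: `organ` and `position` default to the values
    that appear in the example sentence of the description.
    """
    return (
        f"make a {view} view picture taken at the {organ} {location} location, "
        f"and {shape} {diagnosis} at the {position} of the picture."
    )


if __name__ == "__main__":
    # Attribute values from the example: wall view, TI location, flat, polyp.
    print(build_prompt("wall", "TI", "flat", "polyp"))
```

A template keeps the generated wording consistent with the text used during training, which is one plausible reason to generate sentences from selected values rather than rely only on free-form input.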

Alternatively, the terminal 20 may receive from the user a complete sentence of text that contains attribute values related to the endoscopic image.

For example, the terminal 20 may receive from the user the sentence 'make a wall view picture taken at the colon TI location, and flat polyp at the center of the picture.'

Meanwhile, the attribute values related to an endoscopic image may include a plurality of attribute values, each corresponding to one of a plurality of items.

In one embodiment, the attribute values may include attribute values related to the evaluation items of the global standard SES-CD score.

For example, the attribute values may include values corresponding to each of the following items: the type of organ imaged, the age of the subject, the diagnosis, the field of view of the endoscopic imaging device, the lesion view, the location of the imaged lesion, the shape of the imaged lesion, the size of the imaged lesion, the type of endoscopic imaging device, the type of endoscopic procedure tool, and the endoscopic imaging situation.

The 'type of organ imaged' item is the type of organ that is the subject of the endoscopic image. In one embodiment, the attribute value corresponding to this item may be either the stomach or the large intestine.

The 'age of the subject' item is the age of the person who is the subject of the endoscopic image. In one embodiment, the attribute value corresponding to this item may be any one of 'under 10 years', '10 to 20 years', '20 to 60 years', and 'over 60 years'.

The 'diagnosis' item is the diagnosis corresponding to the lesion found in the endoscopic image. In one embodiment, the attribute value corresponding to this item may be any one of ulcer, polyp, and cancer.

The 'field of view of the endoscopic imaging device' item is the view seen by the endoscope. In one embodiment, the attribute value corresponding to this item may be any one of lumen, wall, and retroflexion (J turn).

The 'lesion view' item is the direction from which the lesion in the endoscopic image was captured. In one embodiment, the attribute value corresponding to this item may be any one of front, oblique, and lateral.

The 'location of the imaged lesion' item is the organ in which the lesion in the endoscopic image is located. In one embodiment, the attribute value corresponding to this item may be any one of esophagus (E1/2), stomach (S1/2/3/4/5), duodenum (D1/2/3), terminal ileum (TI), cecum (C), ascending colon (AC), hepatic flexure (HF), transverse colon (TC), splenic flexure (SF), descending colon (DC), sigmoid colon (SC), and rectum (R).

The 'shape of the imaged lesion' item is the shape of the lesion in the endoscopic image. In one embodiment, the attribute value corresponding to this item may be any one of flat, sessile, pedunculated, oval, and irregular.

The 'size of the imaged lesion' item is the size of the lesion in the endoscopic image. In one embodiment, the attribute value corresponding to this item may be any one of aphthous, large, and very large.

The 'type of endoscopic imaging device, type of endoscopic procedure tool' item is the type of equipment used in capturing the endoscopic image. In one embodiment, the attribute value corresponding to this item may be either forceps or snare.

The 'endoscopic imaging situation' item indicates the circumstances under which the endoscopic image was captured. In one embodiment, the attribute value corresponding to this item may be any one of biopsy, dye injection, and narrow-band imaging.
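The items and attribute values listed above can be viewed as a small schema. The sketch below encodes a subset of them as a dictionary and checks a user selection against it. The key names (for example `scope_view`) and the function `validate_selection` are hypothetical; only the allowed values are taken from the description, and the location item is omitted for brevity.

```python
# Allowed attribute values per item, taken from the embodiments above.
# Item keys are illustrative names, not identifiers from the disclosure.
ATTRIBUTE_SCHEMA = {
    "organ": {"stomach", "large intestine"},
    "age": {"under 10", "10 to 20", "20 to 60", "over 60"},
    "diagnosis": {"ulcer", "polyp", "cancer"},
    "scope_view": {"lumen", "wall", "J turn"},
    "lesion_view": {"front", "oblique", "lateral"},
    "lesion_shape": {"flat", "sessile", "pedunculated", "oval", "irregular"},
    "lesion_size": {"aphthous", "large", "very large"},
    "tool": {"forceps", "snare"},
    "situation": {"biopsy", "dye injection", "narrow-band imaging"},
}


def validate_selection(selection):
    """Return a list of problems found in a {item: value} selection."""
    errors = []
    for item, value in selection.items():
        allowed = ATTRIBUTE_SCHEMA.get(item)
        if allowed is None:
            errors.append(f"unknown item: {item}")
        elif value not in allowed:
            errors.append(f"invalid value for {item}: {value}")
    return errors


if __name__ == "__main__":
    print(validate_selection({"diagnosis": "polyp", "lesion_shape": "flat"}))
    print(validate_selection({"diagnosis": "fracture"}))
```

Since the embodiments are explicitly non-limiting, a real system would presumably extend this table rather than treat it as closed.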

Meanwhile, without being limited to the embodiments described above, the attribute values matched to a captured endoscopic image may include values corresponding to at least one of the following: whether circular plastic (a part of the endoscope) is present at the outer edge of the image, and whether boxes and text are present in the endoscopic image.

Additionally, for endoscopic image synthesis, the user may enter at least one of an actually captured endoscopic image, lesion location information, and mask information.

When an actually captured endoscopic image is entered, the synthesis may produce an endoscopic image in which the attribute values, lesion location information, and mask information entered by the user are applied to the entered image.

Meanwhile, the lesion location information defines the position on the endoscopic image at which the user wants the lesion to be created. In one embodiment, the lesion location information may include the center coordinates of the lesion, its thickness in the x direction, and its thickness in the y direction.
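One possible representation of this lesion location information is sketched below. It assumes that the x-direction and y-direction thickness values denote the full extent of the lesion along each axis, in pixels, which the description does not state explicitly; the class name `LesionLocation` and the method `to_box` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LesionLocation:
    """Center coordinates plus x/y thickness, all in pixels (assumed unit)."""
    cx: float
    cy: float
    x_thickness: float
    y_thickness: float

    def to_box(self, width, height):
        """Return (left, top, right, bottom), clipped to the image bounds."""
        half_x = self.x_thickness / 2
        half_y = self.y_thickness / 2
        left = max(0.0, self.cx - half_x)
        top = max(0.0, self.cy - half_y)
        right = min(float(width), self.cx + half_x)
        bottom = min(float(height), self.cy + half_y)
        return (left, top, right, bottom)


if __name__ == "__main__":
    loc = LesionLocation(cx=100, cy=80, x_thickness=40, y_thickness=20)
    print(loc.to_box(width=256, height=256))
```

Converting to a clipped box is convenient when the location has to be rasterized into a conditioning map of the same size as the image to be synthesized.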

Meanwhile, the mask information defines the regions that appear dark and the regions that appear bright as a result of the amount of light and the direction of illumination during endoscopic imaging, or the regions that appear clear and the regions that appear unclear as a result of the presence or absence of floating matter during endoscopic imaging. In one embodiment, the mask information may be an image of the same size as the endoscopic image to be synthesized, and the pixel value of each region of that image may be defined as one of a plurality of preset pixel values.

For example, each pixel value of the mask information may be one of three kinds of pixel value: a pixel value defining a dark region, a pixel value defining a bright region, and a pixel value defining a region in which floating matter is present.

Furthermore, the region in which floating matter is present may be defined with more finely subdivided pixel values. For example, this region may be defined with pixel values subdivided according to the transparency of the water, the density of the floating matter in the water, and so on.
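A three-valued mask of this kind can be sketched with nested lists. The concrete pixel values (0, 128, 255), the rectangular regions, and the function name `make_mask` are assumptions for illustration only; the description fixes neither the values nor the shape of the regions, and finer levels for floating matter could be added in the same way.

```python
# Hypothetical preset pixel values for the three region kinds.
DARK, BRIGHT, FLOATING = 0, 128, 255


def make_mask(width, height, dark_rows, floating_box):
    """Build a height x width mask of preset pixel values.

    dark_rows: number of rows at the top marked as a dark region.
    floating_box: (x0, y0, x1, y1), half-open, marked as floating matter.
    """
    mask = [[BRIGHT] * width for _ in range(height)]
    for y in range(min(dark_rows, height)):
        mask[y] = [DARK] * width
    x0, y0, x1, y1 = floating_box
    for y in range(max(0, y0), min(height, y1)):
        for x in range(max(0, x0), min(width, x1)):
            mask[y][x] = FLOATING
    return mask


def is_valid_mask(mask, width, height):
    """Check the size and that every pixel is one of the preset values."""
    allowed = {DARK, BRIGHT, FLOATING}
    return (len(mask) == height
            and all(len(row) == width for row in mask)
            and all(p in allowed for row in mask for p in row))


if __name__ == "__main__":
    for row in make_mask(4, 4, dark_rows=1, floating_box=(1, 2, 3, 4)):
        print(row)
```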

In another embodiment, the mask information may define background characteristics of the endoscopic image to be generated. For example, the mask information may hold values that define the overall degree of inflammation inside the organ being imaged, the condition of the organ according to age, the degree of tortuosity of the organ, or a state in which the lesion or its surroundings are compressed in order to image the lesion from a specific angle.

Finally, a step of synthesizing a virtual endoscopic image is performed based on the input text and at least one of an actually captured endoscopic image, lesion location information, and mask information (S130).

The artificial intelligence model can synthesize a virtual endoscopic image that reflects the attribute values included in the text input by the user, as described above. The model can synthesize virtual endoscopic images in which lesions of various sizes and shapes appear at various locations.

Additionally, the artificial intelligence model can synthesize an endoscopic image using the text input by the user together with at least one of an actually captured endoscopic image, lesion location information, and mask information input by the user.

In one embodiment, when synthesizing an endoscopic image, the artificial intelligence model may take an actually captured endoscopic image input by the user and apply to it the attribute values input by the user (or attribute values extracted from the text), the lesion location information, and the mask information.

In another embodiment, when synthesizing an endoscopic image, the artificial intelligence model may use the lesion location information input by the user to determine the location of the lesion in the endoscopic image to be synthesized.

In another embodiment, when synthesizing an endoscopic image, the artificial intelligence model may use the mask information input by the user to determine which regions of the endoscopic image to be synthesized are clearly visible and which are not.

Meanwhile, the artificial intelligence model may synthesize an endoscopic image by applying all of the information input by the user at once, or by applying the different kinds of input information in stages.

For example, when the user inputs an actually captured endoscopic image, text containing at least one attribute value, lesion location information, and mask information, the artificial intelligence model may first synthesize an image in which the attribute values are applied to the actually captured endoscopic image, then synthesize a second image in which the lesion location information is applied to the first image, and finally apply the mask information to the second image to produce the final image. In this case, each of the images synthesized in stages may be displayed on the terminal 20.
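The staged application of conditions can be sketched as a simple pipeline that keeps every intermediate result so that each one can be shown to the user. This is only an illustrative outline: `synthesize_staged`, `make_stage`, and the dictionary standing in for an image are hypothetical, and the stand-in stages merely record which condition was applied rather than running any model.

```python
def synthesize_staged(base_image, stages):
    """Apply each (name, function) stage in order and keep every intermediate image."""
    current = base_image
    intermediates = []
    for name, apply_stage in stages:
        current = apply_stage(current)
        intermediates.append((name, current))
    return intermediates


def make_stage(condition):
    """Build a stand-in stage that records `condition` instead of calling a real model."""
    def apply_stage(image):
        # Return a new object so earlier intermediates stay unchanged.
        return {"source": image["source"], "applied": image["applied"] + [condition]}
    return apply_stage


# Placeholder for an actually captured endoscopic image.
base = {"source": "real_endoscopy.png", "applied": []}

# Order taken from the text: attribute values, then lesion location, then mask.
stages = [
    ("attributes", make_stage("text attributes")),
    ("lesion location", make_stage("lesion location information")),
    ("mask", make_stage("mask information")),
]

results = synthesize_staged(base, stages)
for name, image in results:
    print(name, "->", image["applied"])
```

Keeping the list of intermediates, rather than only the final image, is what would allow a terminal to display each stage in turn.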

Hereinafter, a method of training the artificial intelligence model included in the image synthesis device according to the present disclosure is described in detail.

FIG. 6 is a flowchart showing a method of training the artificial intelligence model included in the image synthesis device according to the present disclosure, FIGS. 7 and 8 show user interface screens for generating training data for the artificial intelligence model, and FIGS. 9 and 10 are conceptual diagrams showing an embodiment of the artificial intelligence model included in the image synthesis device according to the present disclosure.

Referring to FIG. 6, when training data for the artificial intelligence model is to be generated, a step in which the terminal 20 displays a captured endoscopic image and a plurality of attribute values is performed (S210).

To this end, the display unit of the terminal 20 may display the captured endoscopic image and an input window for receiving attribute values related to the captured endoscopic image.

For example, the terminal 20 may display the attribute values item by item, and the displayed items may relate to at least one of: the type of organ imaged, the age of the imaged subject, the diagnosis, the field of view of the endoscopic imaging device, the field of view of the lesion, the location of the imaged lesion, the shape of the imaged lesion, the size of the imaged lesion, the type of endoscopic imaging device, the type of endoscopic procedure tool, and the endoscopic imaging situation.

Meanwhile, without being limited to the above embodiment, the attribute values displayed on the terminal 20 may include attribute values corresponding to at least one of whether a circular plastic piece (a part of the endoscope device) is present at the outer edge of the image and whether boxes and text are present in the endoscopic image.

Next, a step in which the terminal 20 receives, from among the displayed attribute values, at least one attribute value related to the captured endoscopic image is performed (S210).

Referring to the captured endoscopic image, the user may enter the attribute values corresponding to that image directly, or may select at least one of the plurality of attribute values displayed on the terminal 20.

For example, referring to FIG. 7, the terminal 20 may display an actually captured endoscopic image 310 together with an input window in which a plurality of items and the attribute values corresponding to each item are shown as check boxes. Specifically, the terminal 20 displays the 'age of the imaged subject' item (age, 320), and the attribute values corresponding to age, namely 'under 10' (321), '10 to 20' (322), '20 to 60' (323), and 'over 60' (324), may each be displayed as a check box.

Furthermore, referring to FIG. 7, the terminal 20 may display an input window 330 in which lesion location information and mask information can be entered.

For example, in the input window for lesion location information and mask information displayed on the terminal 20, the user may select the location of the lesion, or designate regions that are hard to see because they are dark, regions that are clearly visible, regions that are hard to see because of floating matter, and so on.

Next, a step of generating text based on the attribute values input by the user is performed (S320).

When the user inputs at least one attribute value to be applied to endoscopic image synthesis, the sentence generation module 310 may generate text in the form of a complete sentence containing the selected attribute values.

For example, referring to FIG. 8, when the user, referring to the actually captured endoscopic image 310 displayed on the terminal 20, selects 'lumen', 'C', 'flat', 'sessile', 'forcep', and 'NBI', the sentence generation module 310 may generate the complete sentence 'a lumen view picture taken at the colon C location' (341).

Meanwhile, the sentence generation module 310 may generate a plurality of differently worded sentences from the attribute values input by the user. For example, for the same attribute values the sentence generation module 310 may generate a plurality of texts such as 'Pedunculated polyp on the lumen view using NBI with a cap from cancer patient aged 10 to 20.', 'Pedunculated polyp on the lumen view.', and 'Large polyp using NBI with a cap from cancer patient'.
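One simple way to obtain several sentences from one set of attribute values is template filling. The sketch below assumes this approach purely for illustration; the source does not say how the sentence generation module works internally. The attribute keys (`shape`, `view`, `imaging`, `tool`, `diagnosis`, `age`, `size`) and the function name are hypothetical, while the three templates are modeled on the example sentences quoted above.

```python
# Templates modeled on the example sentences in the text; the field names are assumptions.
TEMPLATES = [
    "{shape} polyp on the {view} view using {imaging} with a {tool} "
    "from {diagnosis} patient aged {age}.",
    "{shape} polyp on the {view} view.",
    "{size} polyp using {imaging} with a {tool} from {diagnosis} patient",
]


def generate_captions(attributes, templates=TEMPLATES):
    """Fill every template whose fields are all present; skip the others."""
    captions = []
    for template in templates:
        try:
            captions.append(template.format(**attributes))
        except KeyError:
            # At least one attribute required by this template was not selected.
            continue
    return captions


selected = {
    "shape": "Pedunculated",
    "size": "Large",
    "view": "lumen",
    "imaging": "NBI",
    "tool": "cap",
    "diagnosis": "cancer",
    "age": "10 to 20",
}

for caption in generate_captions(selected):
    print(caption)
```

Because templates with missing fields are skipped, a user who selects only a few attributes still gets the shorter sentences.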

In addition, referring to FIG. 8, the terminal 20 may generate lesion location information 342 and mask information 343 from the information entered through the lesion location information and mask information input window 330.

Finally, a step of training the artificial intelligence model is performed using at least one of the captured endoscopic image, the attribute values input by the user, the text generated from those attribute values, the lesion location information, and the mask information (S240).

Here, the training data for the artificial intelligence model may be generated by matching at least one of the attribute values input by the user, the lesion location information, and the mask information to the captured endoscopic image. That is, the training data may be generated by matching to the captured endoscopic image the attribute values selected by the user, the location of the lesion the user wishes to generate, and information defining which regions are clearly visible and which are not when the endoscopic image is captured.

Furthermore, the training data may be generated by matching to the captured endoscopic image both the at least one attribute value input by the user and the text generated from the at least one attribute value entered through the input window. That is, the training data may be generated by matching to the captured endoscopic image not only the attribute values selected by the user but also the text generated from those attribute values.

Here, a plurality of texts may be matched to the captured endoscopic image. As described above, the sentence generation module 310 can generate different sentences from the same attribute values, and each of those sentences may be matched to the captured endoscopic image.

Meanwhile, location information of the lesion contained in the captured endoscopic image may be matched to the training data. In one embodiment, the lesion location information may include the center coordinates of the lesion, its thickness in the x direction, and its thickness in the y direction.
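A center-plus-thickness description converts directly into box corners. The sketch below assumes that "thickness" means the full extent of the lesion along each axis and that coordinates are in pixels with the origin at the top left; neither convention is stated in the source, and the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LesionBox:
    """Lesion location as center coordinates plus x- and y-direction thickness."""
    cx: float
    cy: float
    x_thickness: float  # assumed to be the full extent along x
    y_thickness: float  # assumed to be the full extent along y

    def corners(self):
        """Return (x_min, y_min, x_max, y_max) of the enclosing box."""
        half_w = self.x_thickness / 2
        half_h = self.y_thickness / 2
        return (self.cx - half_w, self.cy - half_h, self.cx + half_w, self.cy + half_h)


box = LesionBox(cx=50, cy=40, x_thickness=20, y_thickness=10)
print(box.corners())
```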

For example, referring to FIG. 9, the training data (train data) may include at least one of a captured endoscopic image (Image), text containing at least one attribute value, lesion location information (a bounding box), and mask information (a segmentation mask). Using this training data, the artificial intelligence model may be trained to synthesize a virtual endoscopic image from text containing attribute values related to an endoscopic image (Text2Image), to synthesize a virtual endoscopic image from an input endoscopic image (Image2Image), or to synthesize a lesion image onto an actually captured endoscopic image based on text containing attribute values related to that image (Multi-modal2Image).
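The composition of one training record can be sketched as a small data structure. This is an illustrative layout only: the class `TrainingSample`, its field names, the file names, and the helper `to_text_image_pairs` are assumptions, not something the source specifies. The helper shows how several texts matched to one image could be expanded into separate text and image pairs for Text2Image training.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class TrainingSample:
    """One captured image with the information matched to it (all names hypothetical)."""
    image_path: str
    attributes: Dict[str, str]
    captions: List[str] = field(default_factory=list)
    # Bounding box as (cx, cy, x_thickness, y_thickness); optional.
    bounding_box: Optional[Tuple[float, float, float, float]] = None
    # Path to a segmentation mask of the same size as the image; optional.
    mask_path: Optional[str] = None


def to_text_image_pairs(sample):
    """Expand one sample into one (caption, image) pair per matched caption."""
    return [(caption, sample.image_path) for caption in sample.captions]


sample = TrainingSample(
    image_path="case_0001.png",
    attributes={"view": "lumen", "shape": "Pedunculated"},
    captions=[
        "Pedunculated polyp on the lumen view.",
        "a lumen view picture taken at the colon C location",
    ],
    bounding_box=(50.0, 40.0, 20.0, 10.0),
    mask_path="case_0001_mask.png",
)

pairs = to_text_image_pairs(sample)
```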

Meanwhile, mask information may be matched to the training data. For example, each pixel value constituting the mask information may be one of three kinds of pixel values: a pixel value defining a dark region, a pixel value defining a bright region, and a pixel value defining a region where floating matter is present.

Meanwhile, a lesion image may be matched to the training data. The lesion image is obtained by cropping, from the image input by the user, the region corresponding to the lesion location information designated by the user. The cropped image may be used as training data for the artificial intelligence model.
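Cropping the lesion region can be sketched as slicing a pixel grid with the box derived from the location information. The sketch assumes the image is a list of rows, that the location is given as a center with x- and y-direction thickness in pixels, and that the box is clamped to the image bounds; the function name and these conventions are illustrative, not taken from the source.

```python
def crop_lesion(image, cx, cy, x_thickness, y_thickness):
    """Cut out the region centered at (cx, cy), clamped to the image bounds."""
    height = len(image)
    width = len(image[0]) if height else 0
    x0 = max(0, int(cx - x_thickness / 2))
    x1 = min(width, int(cx + x_thickness / 2))
    y0 = max(0, int(cy - y_thickness / 2))
    y1 = min(height, int(cy + y_thickness / 2))
    # Slice rows first, then columns; the original image is left untouched.
    return [row[x0:x1] for row in image[y0:y1]]


# A 5-row, 6-column stand-in image whose pixel value encodes its position.
image = [[r * 10 + c for c in range(6)] for r in range(5)]

center_patch = crop_lesion(image, cx=2, cy=2, x_thickness=2, y_thickness=2)
corner_patch = crop_lesion(image, cx=0, cy=0, x_thickness=4, y_thickness=4)
```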

Meanwhile, when the artificial intelligence model receives an actually captured endoscopic image, the image may be used for training as is, or a model that gradually adds or removes noise (a diffusion model) may be applied to generate training images with noise added or removed, which are then used to train the artificial intelligence model.
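The gradual addition of noise can be illustrated with the closed-form forward step commonly used in DDPM-style diffusion models, where a noised sample at step t is sqrt(a_t) * x0 + sqrt(1 - a_t) * eps with a_t the cumulative product of (1 - beta). The source only says that a diffusion model adds or removes noise; the linear beta schedule, the number of steps, and all function names below are assumptions chosen for illustration.

```python
import math
import random


def linear_betas(steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise rates (an assumed schedule)."""
    span = max(steps - 1, 1)
    return [beta_start + (beta_end - beta_start) * i / span for i in range(steps)]


def cumulative_alphas(betas):
    """Running product of (1 - beta): the fraction of signal variance kept at each step."""
    out, product = [], 1.0
    for beta in betas:
        product *= 1.0 - beta
        out.append(product)
    return out


def add_noise(pixels, t, alpha_bars, rng):
    """Jump directly to noise level t by mixing the clean signal with Gaussian noise."""
    a = alpha_bars[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in pixels]


alpha_bars = cumulative_alphas(linear_betas(1000))
rng = random.Random(0)
clean = [0.5, -0.25, 1.0, 0.0]  # stand-in for normalized pixel values

slightly_noisy = add_noise(clean, 0, alpha_bars, rng)   # almost the clean signal
very_noisy = add_noise(clean, 999, alpha_bars, rng)     # almost pure noise
```

Removing noise is the learned reverse direction and requires a trained network, so it is not sketched here.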

FIG. 10 is a conceptual diagram showing the latent space included in the artificial intelligence model, but the model is not limited thereto.

As described above, the image synthesis device and method according to the present disclosure make it possible to synthesize a large volume of varied lesion images even without actual endoscopic data. The large volume of image data generated in this way can be used as training data for an artificial intelligence model that predicts lesions from endoscopic images. That is, according to the present disclosure, a large amount of training data (virtual endoscopic images) for training an artificial intelligence model can be generated from only a small number of actually captured endoscopic images.

Meanwhile, the disclosed embodiments may be implemented in the form of a recording medium that stores instructions executable by a computer. The instructions may be stored in the form of program code and, when executed by a processor, may generate program modules that perform the operations of the disclosed embodiments. The recording medium may be implemented as a computer-readable recording medium.

Computer-readable recording media include all kinds of recording media in which instructions decodable by a computer are stored. Examples include ROM (Read Only Memory), RAM (Random Access Memory), magnetic tape, magnetic disks, flash memory, and optical data storage devices.

The disclosed embodiments have been described above with reference to the accompanying drawings. A person of ordinary skill in the art to which the present disclosure pertains will understand that the present disclosure may be practiced in forms different from the disclosed embodiments without changing its technical idea or essential features. The disclosed embodiments are illustrative and should not be construed as limiting.

Claims

A virtual endoscopic image synthesis device comprising:
an input unit configured to receive at least one of text containing attribute values related to an endoscopic image, lesion location information, mask information, and an actually captured endoscopic image; and
a processor configured to synthesize a virtual endoscopic image based on at least one of the received text, lesion location information, mask information, and actually captured endoscopic image,
wherein the processor includes an image synthesis module including an artificial intelligence model trained, using as training data at least one of an actually captured endoscopic image, at least one attribute value related to the captured endoscopic image, lesion location information, and mask information, to receive at least one of text containing attribute values related to an endoscopic image, lesion location information, mask information, and an actually captured endoscopic image and to synthesize a virtual endoscopic image,
wherein the artificial intelligence model is trained to synthesize a lesion image onto an actually captured endoscopic image based on at least one of the text containing attribute values related to the endoscopic image, the lesion location information, and the mask information,
wherein the mask information defines the brightness and darkness of regions according to the amount of light and the direction of illumination during endoscopic imaging, or defines the clarity of regions according to the presence or absence of floating matter during endoscopic imaging, and
wherein the mask information has the same size as the synthesized virtual endoscopic image.

The virtual endoscopic image synthesis device of claim 1,
further comprising a display unit configured to display an input window for receiving at least one of the actually captured endoscopic image, attribute values related to the captured endoscopic image, the lesion location information, and the mask information,
wherein the training data is generated by matching at least one attribute value entered through the input window to the captured endoscopic image.

The virtual endoscopic image synthesis device of claim 2,
wherein the processor further includes a sentence generation module that generates text containing the attribute values based on attribute values related to the endoscopic image, and
wherein the training data is generated by matching, to the captured endoscopic image, at least one attribute value entered through the input window and text generated from the at least one attribute value entered through the input window.

The virtual endoscopic image synthesis device of claim 3,
wherein the sentence generation module generates a plurality of texts based on attribute values related to the captured endoscopic image, and
wherein the training data is generated by matching the plurality of texts to the captured endoscopic image.

The virtual endoscopic image synthesis device of claim 3,
wherein the attribute values related to the captured endoscopic image include attribute values related to the evaluation items of the global standard SES-CD score.

The virtual endoscopic image synthesis device of claim 5,
wherein the attribute values related to the captured endoscopic image include attribute values related to at least one of: the type of organ imaged, the age of the imaged subject, the diagnosis, the field of view of the endoscopic imaging device, the field of view of the lesion, the location of the imaged lesion, the shape of the imaged lesion, the size of the imaged lesion, the type of endoscopic imaging device, the type of endoscopic procedure tool, and the endoscopic imaging situation.

(Deleted)

A method of synthesizing a virtual endoscopic image, performed by a processor, the method comprising:
training an artificial intelligence model, using as training data an actually captured endoscopic image and at least one attribute value related to the captured endoscopic image, to receive at least one of text containing attribute values related to an endoscopic image, lesion location information, mask information, and an actually captured endoscopic image and to synthesize a virtual endoscopic image;
receiving at least one of text containing attribute values related to an endoscopic image, lesion location information, mask information, and an actually captured endoscopic image; and
synthesizing, by the artificial intelligence model, a virtual endoscopic image based on at least one of the received text, lesion location information, mask information, and actually captured endoscopic image,
wherein the artificial intelligence model is trained to synthesize a lesion image onto an actually captured endoscopic image based on at least one of the text containing attribute values related to the endoscopic image, the lesion location information, and the mask information,
wherein the mask information defines the brightness and darkness of regions according to the amount of light and the direction of illumination during endoscopic imaging, or defines the clarity of regions according to the presence or absence of floating matter during endoscopic imaging, and
wherein the mask information has the same size as the synthesized virtual endoscopic image.

The virtual endoscopic image synthesis method of claim 8,
wherein training the artificial intelligence model further comprises displaying an input window for entering the actually captured endoscopic image and attribute values related to the captured endoscopic image, and
wherein the training data is generated by matching, to the captured endoscopic image, at least one of at least one attribute value, lesion location information, and mask information entered through the input window.

A program stored in a computer-readable recording medium that, in combination with a computer, executes the image synthesis method of claim 8 or claim 9.