KR20220008126A

KR20220008126A - Method and apparatus for detecting dysphagia automatically based on machine learning

Info

Publication number: KR20220008126A
Application number: KR1020200086342A
Authority: KR
Inventors: 이성재; 최상일
Original assignee: 단국대학교 산학협력단
Priority date: 2020-07-13
Filing date: 2020-07-13
Publication date: 2022-01-20
Also published as: KR102430526B1

Abstract

The present invention provides a machine-learning-based method and apparatus for automatically detecting dysphagia that can improve determination accuracy and reduce determination time. According to one embodiment of the present invention, the method comprises: receiving image data including the cervical vertebrae; inputting the image data into a first machine learning unit to recognize the position of the cervical vertebrae, and generating a plurality of region of interest (ROI) images for a plurality of frames in the image data; inputting the plurality of ROI images into a second machine learning unit to identify whether dysphagia occurs in the plurality of ROI images; and determining that the patient corresponding to the image data has dysphagia if, among predetermined consecutive frames included in the plurality of frames, the number of frames identified as showing dysphagia is equal to or greater than a first threshold.

Description

Machine learning-based automatic detection method and device for dysphagia

The present invention relates to a method and apparatus for automatically detecting dysphagia based on machine learning.

Dysphagia refers to a condition in which food cannot be swallowed smoothly because of a problem in the passage of food from the oral cavity to the esophagus. Early detection of dysphagia is important because it can cause aspiration, in which food passes into the airway, leading to pneumonia and suffocation.

Conventionally, determining whether a patient has dysphagia has required a medical practitioner to read image data of the patient swallowing food. When medical personnel view and read the image data directly, reading accuracy varies widely from reader to reader and reading takes a long time. To address this, research on automatic detection of dysphagia using machine learning is under way.

Republic of Korea Registered Patent Publication No. 10-2094828 (April 27, 2020)

To solve the problems described above, an object of the present invention is to provide a machine-learning-based method and apparatus for automatically detecting dysphagia that recognizes the cervical vertebrae using a first machine learning unit, generates an ROI image containing the cervical vertebrae, and determines whether dysphagia is present in the ROI image using a second machine learning unit.

The problems to be solved by the present invention are not limited to those mentioned above; other problems not mentioned will be clearly understood by those skilled in the art from the following description.

To solve the problems described above, a machine-learning-based method for automatically detecting dysphagia according to an embodiment of the present invention includes: receiving image data including the cervical vertebrae; inputting the image data to a first machine learning unit to recognize the position of the cervical vertebrae, and generating a plurality of region of interest (ROI) images for a plurality of frames in the image data; inputting the plurality of ROI images to a second machine learning unit to identify whether dysphagia occurs in the plurality of ROI images; and determining that the patient corresponding to the image data has dysphagia when, among predetermined consecutive frames included in the plurality of frames, the number of frames identified as showing dysphagia is equal to or greater than a first threshold.

In addition, to solve the problems described above, a machine-learning-based apparatus for automatically detecting dysphagia according to an embodiment of the present invention includes: an image data receiver that receives image data including the cervical vertebrae; a dynamic ROI generator that inputs the image data to a first machine learning unit to recognize the position of the cervical vertebrae and generates a plurality of region of interest (ROI) images for a plurality of frames in the image data; and a dysphagia determination unit that inputs the plurality of ROI images to a second machine learning unit to identify whether dysphagia occurs in the plurality of ROI images and, when the number of frames identified as showing dysphagia among predetermined consecutive frames included in the plurality of frames is equal to or greater than the first threshold, determines that the patient corresponding to the image data has dysphagia.

In addition, a program stored in a medium may be provided that, in combination with a computer as hardware, executes the method for implementing the present invention.

Other methods and systems for implementing the present invention, and a computer-readable recording medium storing a computer program for executing the method, may also be provided.

According to the present invention as described above, once the image data is received, the presence or absence of dysphagia can be detected without the intervention of a medical professional, which improves the accuracy of the determination and reduces reading time.

In addition, because the method according to an embodiment of the present invention generates ROI images from the image data before detecting the presence or absence of dysphagia, it can achieve higher accuracy than reading the presence or absence of dysphagia directly from the full image data.

The effects of the present invention are not limited to those mentioned above; other effects not mentioned will be clearly understood by those skilled in the art from the following description.

FIG. 1 is a diagram illustrating a machine-learning-based method for automatically detecting dysphagia according to an embodiment of the present invention.
FIG. 2 is a flowchart illustrating a machine-learning-based method for automatically detecting dysphagia according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating pre-processing of image data according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating the cervical vertebrae recognized by a dynamic ROI generator and the resulting ROI image according to an embodiment of the present invention.
FIG. 5 is a diagram illustrating a method of generating an ROI image according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating how the dynamic ROI generator recognizes the cervical vertebrae according to an embodiment of the present invention.
FIGS. 7 to 9 are diagrams illustrating experimental results of the machine-learning-based method for automatically detecting dysphagia according to an embodiment of the present invention.
FIG. 10 is a block diagram illustrating a machine-learning-based apparatus for automatically detecting dysphagia according to an embodiment of the present invention.

The advantages and features of the present invention, and the methods of achieving them, will become apparent with reference to the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. These embodiments are provided only so that this disclosure is complete and fully conveys the scope of the invention to those of ordinary skill in the art to which the invention pertains; the present invention is defined only by the scope of the claims.

The terminology used herein is for describing the embodiments and is not intended to limit the present invention. In this specification, the singular form includes the plural form unless the context specifically states otherwise. As used herein, "comprises" and/or "comprising" do not exclude the presence or addition of one or more components other than those stated. Like reference numerals refer to like elements throughout the specification, and "and/or" includes each of the listed components and every combination of one or more of them. Although "first," "second," and so on are used to describe various components, these components are not limited by these terms; the terms are used only to distinguish one component from another. Accordingly, a first component mentioned below may be a second component within the technical spirit of the present invention.

Unless otherwise defined, all terms (including technical and scientific terms) used herein have the meanings commonly understood by those of ordinary skill in the art to which the present invention belongs. Terms defined in commonly used dictionaries are not to be interpreted in an idealized or overly formal sense unless expressly so defined.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

Before the description, the meanings of the terms used in this specification are briefly explained. These explanations are intended only to aid understanding of the specification; unless explicitly stated as limiting the present invention, they should not be read as limiting its technical spirit.

FIG. 1 is a diagram illustrating a machine-learning-based method for automatically detecting dysphagia according to an embodiment of the present invention.

Dysphagia is identified by whether swallowed food moves into the airway instead of the esophagus. Because it is impossible to predict when dysphagia will occur in a given patient, the method according to an embodiment of the present invention first determines, for each frame of the image data capturing the patient swallowing food, whether dysphagia has occurred in that frame. It then uses context information across frames, based on those per-frame results, to identify the point in the image data at which dysphagia occurred.

Referring to FIG. 1, the machine-learning-based method and apparatus for automatically detecting dysphagia according to an embodiment of the present invention may receive image data of a patient swallowing food. In FIG. 1 and the subsequent drawings, the image data is shown as X-ray image data, but the type of image data is not limited thereto.

Image pre-processing may then be performed to improve the efficiency of machine learning. The pre-processing may include adjusting pixel brightness and cropping the image.

The pre-processed image data is input to the dynamic ROI generator (Dynamic ROI Creator), which recognizes the cervical vertebrae in a plurality of frames of the image data and then generates a plurality of ROI images. Referring to FIG. 1, the red region 110 is the region recognized as the cervical vertebrae by the method according to an embodiment of the present invention, and the green region 120 is the region to be included in the ROI image.

The plurality of ROI images is then input to the dysphagia determination unit (Airway Invasion Detection), which may automatically detect the presence or absence of dysphagia.

FIG. 2 is a flowchart illustrating a machine-learning-based method for automatically detecting dysphagia according to an embodiment of the present invention.

In S210, the machine-learning-based method for automatically detecting dysphagia according to an embodiment of the present invention may receive image data including the cervical vertebrae.

The method according to an embodiment of the present invention may further include pre-processing the image data after receiving the image data including the cervical vertebrae. Specifically, the pre-processing may include at least one of equalizing the pixel brightness of the image data and cropping the image data to a uniform size.
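The cropping step can be sketched as follows. This is a minimal illustration, assuming each frame is a grayscale image stored as a list of pixel rows and that crops are taken around the frame center; the source says only that the image data is cropped to the same size, so `center_crop`, `crop_all`, and cropping to the smallest frame in the batch are hypothetical choices.

```python
from typing import List

Image = List[List[int]]  # grayscale frame stored as rows of pixel values


def center_crop(img: Image, out_h: int, out_w: int) -> Image:
    """Crop a frame to out_h x out_w around its center.

    Taking the crop from the center is an illustrative assumption; the
    source does not say where the crop is placed.
    """
    h, w = len(img), len(img[0])
    if out_h > h or out_w > w:
        raise ValueError("crop size exceeds frame size")
    top = (h - out_h) // 2
    left = (w - out_w) // 2
    return [row[left:left + out_w] for row in img[top:top + out_h]]


def crop_all(frames: List[Image]) -> List[Image]:
    """Crop every frame to the smallest height and width found in the batch."""
    out_h = min(len(f) for f in frames)
    out_w = min(len(f[0]) for f in frames)
    return [center_crop(f, out_h, out_w) for f in frames]
```

Cropping to the smallest frame avoids padding; a fixed target size would work the same way by calling `center_crop` directly.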

In S220, the image data is input to the first machine learning unit to recognize the position of the cervical vertebrae, and a plurality of region of interest (ROI) images may be generated for a plurality of frames in the image data based on that position. The ROI image may be generated dynamically for each of the plurality of frames.

In S230, the plurality of ROI images is input to the second machine learning unit to identify whether dysphagia occurs in the plurality of ROI images.

In S240, if the number of frames identified as showing dysphagia among predetermined consecutive frames included in the plurality of frames is equal to or greater than the first threshold, the patient corresponding to the image data may be determined to have dysphagia.
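One way to read the decision rule of S240 is as a sliding window over the per-frame 0/1 outputs of the second machine learning unit. The sketch below assumes that reading; `has_dysphagia`, `window` (the number of consecutive frames) and `threshold` (the first threshold) are hypothetical names, and the source gives no values for either parameter.

```python
from typing import Sequence


def has_dysphagia(labels: Sequence[int], window: int, threshold: int) -> bool:
    """Return True if any run of `window` consecutive frames contains at
    least `threshold` frames labelled 1 (dysphagia) by the frame classifier.
    """
    if window <= 0 or threshold <= 0:
        raise ValueError("window and threshold must be positive")
    if len(labels) < window:
        return False  # not enough frames to form one full window
    count = sum(labels[:window])  # positives in the first window
    if count >= threshold:
        return True
    for end in range(window, len(labels)):
        # Slide by one frame: add the newest frame, drop the oldest one.
        count += labels[end] - labels[end - window]
        if count >= threshold:
            return True
    return False
```

For example, `has_dysphagia([0, 1, 1, 0, 1, 0, 0], window=4, threshold=3)` returns `True`, because frames 1 to 4 contain three positive frames. Keeping a running count makes the check linear in the number of frames.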

S240 may include determining in which swallow within the image data dysphagia occurred. S240 may also include determining how many times dysphagia occurred within the image data.
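Counting and locating occurrences can be sketched by grouping consecutive positive frames into events. This assumes that one uninterrupted run of positive frames corresponds to one occurrence of dysphagia and that the first frame of each swallow is already known; both are assumptions, and `find_events`, `swallow_index`, and `swallow_starts` are hypothetical names.

```python
from bisect import bisect_right
from typing import List, Optional, Sequence, Tuple


def find_events(labels: Sequence[int]) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices, end inclusive, of each run of 1s."""
    events: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, v in enumerate(labels):
        if v and start is None:
            start = i  # a run of positive frames begins
        elif not v and start is not None:
            events.append((start, i - 1))  # the run ended on the previous frame
            start = None
    if start is not None:
        events.append((start, len(labels) - 1))  # run reaches the last frame
    return events


def swallow_index(event_start: int, swallow_starts: Sequence[int]) -> int:
    """Map an event to the 1-based index of the swallow it falls in.

    `swallow_starts` is a hypothetical, sorted list of the first frame of
    each swallow; the source does not say how swallow boundaries are found.
    """
    return bisect_right(swallow_starts, event_start)
```

The number of occurrences is then `len(find_events(labels))`.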

The method according to an embodiment may further include a post-correction step: when the frames identified as showing dysphagia (numbering at least the first threshold) are not contiguous, the frames lying between them are corrected to be labeled as showing dysphagia. This post-correction may use a median filter.
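A 1-D median filter over the per-frame labels behaves as described: a short gap of 0s inside a run of 1s becomes 1s. The sketch assumes an odd kernel size and pads the edges by repeating the first and last labels, neither of which the source specifies; `median_filter_labels` is a hypothetical name. The same filter also removes isolated single positive frames.

```python
from statistics import median
from typing import List, Sequence


def median_filter_labels(labels: Sequence[int], kernel: int = 5) -> List[int]:
    """1-D median filter over per-frame 0/1 labels.

    Short gaps of 0s inside a run of 1s are filled and isolated 1s are
    removed. Edges are padded by repeating the first/last label; the
    default kernel size of 5 is an illustrative choice.
    """
    if kernel < 1 or kernel % 2 == 0:
        raise ValueError("kernel must be a positive odd integer")
    if not labels:
        return []
    half = kernel // 2
    padded = [labels[0]] * half + list(labels) + [labels[-1]] * half
    # For an odd-length window of 0/1 values the median is the majority label.
    return [int(median(padded[i:i + kernel])) for i in range(len(labels))]
```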

FIG. 3 is a diagram illustrating pre-processing of image data according to an embodiment of the present invention.

The image data used to determine whether each patient has dysphagia may have been captured at different times and in different imaging environments. Furthermore, the pixel brightness distribution of such image data tends to be concentrated in a narrow part of the dynamic range, which can make machine learning difficult.

Accordingly, the method according to an embodiment of the present invention may pre-process the received image data to make the conditions of the individual image data consistent, so that the convolution filters can be learned effectively during machine learning. Specifically, the pre-processing may be at least one of pixel-brightness equalization and cropping to a common image size. Cropping all image data to the same size also makes the ROI detection process more efficient.

FIG. 3 illustrates the result of pre-processing image data using CLAHE (Contrast Limited Adaptive Histogram Equalization). CLAHE divides the image into small blocks of fixed size and equalizes the pixel-brightness histogram within each block, clipping each histogram at a preset limit so that noise is not over-amplified (hence "contrast limited"). This equalizes pixel brightness across the entire image.
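The clipped, block-wise equalization described above can be illustrated in plain Python. This is a simplified variant, not a full CLAHE: standard CLAHE blends the lookup tables of neighboring blocks by bilinear interpolation to avoid visible block boundaries, which this sketch omits. The block size `tile` and `clip_factor` are illustrative values not taken from the source, and `simple_clahe` is a hypothetical name. In practice a library implementation such as OpenCV's `cv2.createCLAHE(clipLimit=..., tileGridSize=...)` would typically be used.

```python
from typing import List

Image = List[List[int]]  # 8-bit grayscale frame: rows of values in 0..255


def _tile_lut(values: List[int], clip_factor: float) -> List[int]:
    """Build a 256-entry intensity lookup table from one tile's clipped histogram."""
    n = len(values)
    hist = [0] * 256
    for v in values:
        hist[v] += 1
    # Contrast limiting: cap each bin, then spread the clipped excess evenly.
    limit = max(1, int(clip_factor * n / 256))
    excess = 0
    for b in range(256):
        if hist[b] > limit:
            excess += hist[b] - limit
            hist[b] = limit
    per_bin, remainder = divmod(excess, 256)
    for b in range(256):
        hist[b] += per_bin + (1 if b < remainder else 0)
    # Histogram equalization: map each intensity through the cumulative count
    # (the histogram still sums to n, so the brightest value maps to 255).
    lut, cdf = [], 0
    for b in range(256):
        cdf += hist[b]
        lut.append(round(255 * cdf / n))
    return lut


def simple_clahe(img: Image, tile: int = 8, clip_factor: float = 2.0) -> Image:
    """Equalize each tile x tile block separately (no inter-block interpolation)."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for top in range(0, h, tile):
        for left in range(0, w, tile):
            rows = range(top, min(top + tile, h))
            cols = range(left, min(left + tile, w))
            lut = _tile_lut([img[r][c] for r in rows for c in cols], clip_factor)
            for r in rows:
                for c in cols:
                    out[r][c] = lut[img[r][c]]
    return out
```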

FIG. 3(a) shows the original image data, and FIG. 3(b) shows the result of pre-processing. Referring to FIG. 3(b), the CLAHE process markedly improves the contrast around the jaw and cervical vertebrae, making the structural features of the cervical vertebrae stand out.

FIG. 4 is a diagram illustrating the cervical vertebrae recognized by the dynamic ROI generator and the resulting ROI image according to an embodiment of the present invention.

FIG. 4(a) shows the ground truth used for the dynamic ROI generator to learn to recognize the cervical vertebrae in the image data, with the cervical-vertebrae region shown in red. FIG. 4(b) shows the cervical vertebrae recognized by the dynamic ROI generator in blue, and FIG. 4(c) shows the ROI image generated based on the recognized cervical-vertebrae region.

Because of the anatomy of the throat, dysphagia occurs near the larynx, so regions of the image data other than the area around the neck are unnecessary for diagnosing dysphagia. On the contrary, the complex texture patterns of the skull and cervical vertebrae seen in FIG. 4(a) and (b) can make it harder to analyze the path of the food. Therefore, the method according to an embodiment of the present invention may generate ROI data for detecting dysphagia.

Referring to FIG. 3, food swallowed by the patient is generally found in front of the cervical vertebrae. Accordingly, the method according to an embodiment of the present invention may select the larynx and its surrounding area as the ROI for effective dysphagia detection.

In addition, because the patient's head and neck move with each swallow during a videofluoroscopic swallowing study (VFSS), the present invention may generate a dynamic ROI that moves adaptively with the patient's movement. To this end, the method according to an embodiment of the present invention may locate the cervical vertebrae, which have structurally distinctive features, in each frame of the image data and use them as the reference point for setting the ROI.

In a machine-learning-based dysphagia detection apparatus implemented according to an embodiment of the present invention, 157 image data items (video files), about 50% of the total of 322, may be selected at random to train the first machine learning unit, and one image in which the cervical vertebrae are clearly visible may be chosen from each selected video, giving a training set of 157 images. The test set may consist of 165 images, one chosen from each of the remaining video files not used for training. To train the first machine learning unit, the pixels of the cervical-vertebrae region may be labeled as shown by the red region in FIG. 4(a). The specific number of image data items used is not limited to these values.

FIG. 5 is a diagram illustrating a method of generating an ROI image according to an embodiment of the present invention.

The method according to an embodiment of the present invention may generate a rectangular ROI image including the larynx, the cervical vertebrae, and adjacent regions, based on the result of locating the cervical vertebrae. When generating the ROI image, the method may use the center coordinates of the pixels corresponding to the cervical vertebrae as the reference point.

The red regions shown in FIG. 5(b) and (c) are the pixels classified as cervical vertebrae. When the number of pixels classified as cervical vertebrae is N, the coordinates of the ROI reference point shown in FIG. 5(c) are given by Equation 1.

[Equation 1]

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad \bar{y} = \frac{1}{N}\sum_{i=1}^{N} y_i$$

Here, x_i and y_i (i = 1, ..., N) are the coordinates of each pixel classified as cervical vertebrae, with the upper-left corner of the image data in FIG. 5(a) as the origin. The bottom edge of the ROI image is aligned with the bottom edge of the image data, and the size of the ROI image can be determined heuristically relative to the reference point $(\bar{x}, \bar{y})$.
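Equation 1 and a bottom-aligned ROI can be sketched as follows, assuming the first machine learning unit outputs a binary mask (1 for cervical-vertebrae pixels). The source states only that the ROI is rectangular, that its bottom edge matches the bottom of the frame, and that its size is chosen heuristically; centering the box horizontally on the centroid and the `half_width` and `margin_above` parameters are hypothetical stand-ins for that heuristic.

```python
from typing import List, Tuple

Mask = List[List[int]]  # 1 where a pixel was segmented as cervical vertebrae, else 0


def spine_centroid(mask: Mask) -> Tuple[float, float]:
    """Equation 1: mean x and mean y of the N pixels labelled 1.

    Origin at the top-left corner, x increasing to the right, y downward.
    """
    xs, ys = [], []
    for y, row in enumerate(mask):
        for x, v in enumerate(row):
            if v:
                xs.append(x)
                ys.append(y)
    if not xs:
        raise ValueError("mask contains no cervical-vertebrae pixels")
    n = len(xs)
    return sum(xs) / n, sum(ys) / n


def roi_box(mask: Mask, half_width: int, margin_above: int) -> Tuple[int, int, int, int]:
    """Return an ROI as (left, top, right, bottom), right/bottom exclusive.

    The bottom edge is the bottom of the frame, as in the text. Centering on
    the centroid's x, starting margin_above pixels above its y, and both size
    parameters are hypothetical choices standing in for the heuristic sizing.
    """
    h, w = len(mask), len(mask[0])
    cx, cy = spine_centroid(mask)
    left = max(0, int(round(cx)) - half_width)   # clamp to the frame
    right = min(w, int(round(cx)) + half_width)
    top = max(0, int(round(cy)) - margin_above)
    return left, top, right, h
```

Because the centroid is recomputed for every frame, the box follows the head and neck movement that the dynamic ROI is meant to track.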

The green region in FIG. 5(c) is the ROI image generated by the method according to an embodiment of the present invention, and FIG. 5(d) shows the ROI image resized to 299x299.
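Cutting the ROI out of the frame and resizing it to 299x299 can be sketched with nearest-neighbor sampling. The interpolation method is an assumption (the source states only the target size), and `crop` and `resize_nearest` are hypothetical helper names; boxes use the (left, top, right, bottom) format with exclusive right and bottom edges.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale frame stored as rows of pixel values


def crop(img: Image, box: Tuple[int, int, int, int]) -> Image:
    """Cut out (left, top, right, bottom), right/bottom exclusive."""
    left, top, right, bottom = box
    return [row[left:right] for row in img[top:bottom]]


def resize_nearest(img: Image, out_h: int = 299, out_w: int = 299) -> Image:
    """Nearest-neighbor (floor) resize; 299x299 matches the size in the text."""
    in_h, in_w = len(img), len(img[0])
    return [
        # Output pixel (r, c) copies the input pixel it falls on after scaling.
        [img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]
```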

The ROI image shown in FIG. 5(d) may be input to the dysphagia determination unit. To determine dysphagia, the method according to an embodiment of the present invention may use a second machine learning unit based on a convolutional neural network (CNN). Based on the position and shape of the food in the ROI image, the dysphagia determination unit may determine whether the food is passing normally through the esophagus or whether dysphagia has occurred.

FIG. 6 is a diagram illustrating how the dynamic ROI generator recognizes the cervical vertebrae according to an embodiment of the present invention.

According to an embodiment, U-Net, a machine learning model used for segmenting biomedical images, may be used to locate the cervical vertebrae in the image data. U-Net is based on the fully convolutional network (FCN) and has the advantage that the network can be trained with only a small amount of data.

To train the network effectively with a small amount of training data, the method according to an embodiment of the present invention may use transfer learning. FIG. 6 shows the structure of the U-Net used to implement the method according to an embodiment of the present invention. In this implementation, the encoder consists of 11 convolutional layers and the decoder uses 4 convolutional layers.

Referring to FIG. 6, received image data of size 224x224 passes through the convolutional encoder, which produces a 14x14 feature map. From this feature map, the cervical vertebrae are located at a reduced resolution of 112x112. In addition, the U-Net structure shown in FIG. 6 may pass features extracted by the convolutional encoder directly to the convolutional decoder (skip connections).
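The sizes quoted above are consistent with the encoder halving the spatial resolution four times and the decoder doubling it three times. These stage counts are inferred from the stated sizes under the assumption that each stage changes the resolution by a factor of 2; the source does not list them.

```latex
\frac{224}{2^{4}} = 14, \qquad 14 \times 2^{3} = 112
```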

FIGS. 7 to 9 are diagrams illustrating experimental results of the machine-learning-based method for automatically detecting dysphagia according to an embodiment of the present invention.

The method according to an embodiment of the present invention may identify whether dysphagia has occurred in each of the plurality of ROI images and output the result as a binary value. For example, if an ROI image belongs to the positive class, in which dysphagia is determined to have occurred, the method may output the binary value 1; if it belongs to the negative class, in which dysphagia is determined not to have occurred, the method may output 0.

FIG. 7 is a diagram illustrating the classes used by the second machine learning unit to read dysphagia. Specifically, an image in which dysphagia is determined to have occurred, as in FIG. 7(a), is classified into the positive class, and an image in which dysphagia is determined not to have occurred, as in FIG. 7(b), may be classified into the negative class.

The method according to an embodiment of the present invention may be implemented using a CNN with the Xception architecture. To evaluate its performance, 70% of each class, drawn from the 16,062 positive-class images and 269,579 negative-class images contained in the 322 video files, was used as training data. Another 10% of the images were used to validate the trained network, and the remaining 20% were used as test data to evaluate the performance of the method. Table 1 shows the total number of images used in the experiment and the number of images per class used for training, validation, and testing.
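Splitting each class separately keeps the positive-to-negative ratio the same in the training, validation, and test sets. The sketch below assumes a seeded random shuffle and integer percentages; `split_class` is a hypothetical helper, and the source does not describe how individual images were assigned to each set.

```python
import random
from typing import Dict, List, Sequence


def split_class(items: Sequence, seed: int = 0,
                train_pct: int = 70, val_pct: int = 10) -> Dict[str, List]:
    """Shuffle one class and split it into train/validation/test lists.

    Integer percentages avoid float rounding; whatever remains after the
    train and validation parts goes to test, so no item is dropped.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # reproducible for a fixed seed
    n_train = len(shuffled) * train_pct // 100
    n_val = len(shuffled) * val_pct // 100
    return {
        "train": shuffled[:n_train],
        "val": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],
    }
```

It would be called once on the positive images and once on the negative images, and the matching parts concatenated.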

[Table 1]

Because dysphagia occupies only a short portion of each recording, the negative samples in Table 1 far outnumber the positive samples. To mitigate this class imbalance, a focal loss function may be used.

Focal loss assigns a small loss to the class that is numerous and therefore relatively easy to classify, so that it contributes little to the updates, and assigns a larger loss to the class that is hard to classify, so that learning concentrates on the hard class.

In addition, both in the process in which the first machine learning unit recognizes the cervical spine and in the process in which the second machine learning unit detects dysphagia, finding positive samples can be regarded as harder than finding negative samples. Accordingly, the method and apparatus according to an embodiment may train the network using focal loss instead of cross-entropy loss.
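The document names focal loss but gives neither its formula nor its hyperparameters. A minimal per-sample sketch of the commonly used binary form, FL(p_t) = -α_t (1 - p_t)^γ log(p_t), is shown below next to ordinary cross-entropy. The values γ = 2 and α = 0.25 are the defaults popularized by the original focal-loss work and are assumptions here, not values from this document.

```python
import math


def binary_cross_entropy(p: float, y: int, eps: float = 1e-7) -> float:
    """Standard cross-entropy for one sample; p is the predicted positive-class probability."""
    p = min(max(p, eps), 1.0 - eps)
    return -math.log(p if y == 1 else 1.0 - p)


def binary_focal_loss(p: float, y: int, gamma: float = 2.0, alpha: float = 0.25,
                      eps: float = 1e-7) -> float:
    """Focal loss for one sample: -alpha_t * (1 - p_t)**gamma * log(p_t)."""
    p = min(max(p, eps), 1.0 - eps)
    p_t = p if y == 1 else 1.0 - p              # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy, confidently correct negative is scaled down far more than a hard positive.
easy = binary_focal_loss(0.05, 0) / binary_cross_entropy(0.05, 0)
hard = binary_focal_loss(0.30, 1) / binary_cross_entropy(0.30, 1)
print(f"{easy:.6f} {hard:.4f}")  # 0.001875 0.1225
```

With γ = 0 and α = 0.5 the focal loss reduces to half the cross-entropy, which is a quick way to check the implementation.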

FIG. 8 shows the CNN classification results for the image data of each patient participating in the experiment. In each graph of FIG. 8, the horizontal axis lists the images in order of capture time, and the vertical axis shows the class assigned to each image: 1 when the image was judged to belong to the positive class and 0 when it was judged to belong to the negative class.

FIG. 8(a) shows the results for a normal subject without dysphagia, and FIG. 8(b) shows the results for a patient with dysphagia. FIGS. 8(c) and 8(d) show the results for the normal subject and the patient, respectively, after post-correction with a median filter.

Referring to FIGS. 8(a) and 8(b), even a normal subject may momentarily be judged to have dysphagia, and even a patient with dysphagia may be judged in some frames not to have it. However, because swallowing food is a continuous motion, the images around the point where dysphagia occurs should be classified as positive samples consecutively along the time axis. To reflect this temporal context in the final decision on whether dysphagia occurred in the image data, the method according to an embodiment of the present invention may perform post-correction using a median filter.

For example, when the image data is captured at 24 fps, it is reasonable to regard the impulse-shaped positive sample 810 shown in FIG. 8(a) as a negative sample for which a positive result was output in error. Accordingly, the method according to an embodiment of the present invention may use a median filter to correct such a positive sample to a negative sample, as shown in FIG. 8(c).
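A minimal sketch of median-filter post-correction on the per-frame 0/1 labels follows. The window length and edge handling are not given in the document; a default window of 5 frames and zero padding at both ends are assumptions, and `median_filter_1d` is a hypothetical name. For binary labels the median of an odd window is simply a majority vote, so an isolated positive impulse is removed and a short gap inside a run of positives can be filled.

```python
from typing import List, Sequence


def median_filter_1d(labels: Sequence[int], window: int = 5) -> List[int]:
    """Sliding median over a 0/1 label sequence, zero-padded at both ends."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    padded = [0] * half + list(labels) + [0] * half
    # padded[i:i + window] is centred on labels[i]; its middle sorted value is the median.
    return [sorted(padded[i:i + window])[half] for i in range(len(labels))]


print(median_filter_1d([0, 0, 0, 1, 0, 0, 0]))                  # [0, 0, 0, 0, 0, 0, 0]
print(median_filter_1d([0, 0, 1, 1, 0, 1, 1, 0, 0], window=3))  # [0, 0, 1, 1, 1, 1, 1, 0, 0]
```

Zero padding slightly erodes positive runs that touch the first or last frame; replicating the edge values instead would be an equally valid choice.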

In addition, the method according to an embodiment of the present invention may determine that dysphagia has occurred in a section only when at least two of seven consecutive images are classified as positive samples. To do so, the vector holding the CNN classification results for the image data may be convolved with a filter of length 7 whose components are all 1. FIG. 8(d) shows the result of this post-correction.
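The seven-frame rule can be sketched in plain Python as below. Convolving the label vector with an all-ones kernel of length 7 is the same as a sliding sum (the kernel is symmetric, so flipping it changes nothing). Evaluating only full-length windows ("valid" mode) and treating a sequence shorter than seven frames as negative are assumptions, and the function names are hypothetical; `min_positive` plays the role of the first threshold.

```python
from typing import List, Sequence


def window_counts(labels: Sequence[int], length: int = 7) -> List[int]:
    """'Valid' convolution of the 0/1 label vector with an all-ones kernel of the given length."""
    if len(labels) < length:
        return []
    # With an all-ones kernel, convolution reduces to a sliding sum over `length` frames.
    return [sum(labels[i:i + length]) for i in range(len(labels) - length + 1)]


def flagged_windows(labels: Sequence[int], length: int = 7, min_positive: int = 2) -> List[int]:
    """Start indices of windows containing at least `min_positive` positive frames."""
    return [i for i, c in enumerate(window_counts(labels, length)) if c >= min_positive]


def has_dysphagia(labels: Sequence[int], length: int = 7, min_positive: int = 2) -> bool:
    """True if any window of `length` consecutive frames meets the threshold."""
    return bool(flagged_windows(labels, length, min_positive))


print(window_counts([1, 0, 1, 0, 0, 0, 0, 0]))    # [2, 1]
print(has_dysphagia([0] * 10 + [1] + [0] * 10))   # False
```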

Meanwhile, Table 2 shows the cervical spine detection results obtained with U-Net.

[Table 2]

Performance was evaluated on 39 images not used to train U-Net, and the figures in Table 2 are measured at the pixel level. For example, recall is the fraction of ground-truth cervical spine pixels (pixels labeled as cervical spine by a human annotator) that the model predicts as cervical spine. Referring to Table 2, U-Net detects the cervical spine with a recall of 90.4%, an accuracy of 99.6%, and a precision of 85.6%.
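The pixel-level figures follow the usual confusion-matrix definitions. A minimal sketch computing them from a predicted and a ground-truth binary mask (nested lists of 0/1; `pixel_metrics` is a hypothetical name) is shown below. Because background pixels dominate a frame, accuracy is inflated by true negatives, which is consistent with the reported accuracy (99.6%) sitting well above the precision (85.6%).

```python
from typing import Dict, Sequence


def pixel_metrics(pred_mask: Sequence[Sequence[int]],
                  true_mask: Sequence[Sequence[int]]) -> Dict[str, float]:
    """Pixel-level recall, precision and accuracy of a binary segmentation mask."""
    tp = fp = fn = tn = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            if p and t:
                tp += 1        # spine pixel correctly predicted
            elif p:
                fp += 1        # background predicted as spine
            elif t:
                fn += 1        # spine pixel missed
            else:
                tn += 1        # background correctly left out
    total = tp + fp + fn + tn
    # Returning 0.0 for an undefined ratio is an assumption of this sketch.
    return {
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "accuracy": (tp + tn) / total if total else 0.0,
    }


print(pixel_metrics([[1, 1, 0], [0, 0, 0]], [[1, 0, 0], [1, 0, 0]]))
# {'recall': 0.5, 'precision': 0.5, 'accuracy': 0.6666666666666666}
```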

Table 3 and FIG. 9 show, respectively, the image classification results and the ROC curve of the Xception-based classifier.

[Table 3]

In Table 3, recall is the fraction of the images read as showing dysphagia by two medical professionals that the method according to an embodiment of the present invention also detected as showing dysphagia. Referring to Table 3, the method achieves a recall of 90.4%, an accuracy of 99.6%, and a precision of 85.6%.

In addition, referring to FIG. 9, when the method according to an embodiment of the present invention is performed, the false positive rate (FPR), i.e., the fraction of negative samples incorrectly identified as positive, is close to 0, and the true positive rate (TPR), i.e., the fraction of positive samples correctly identified as positive, is close to 1.
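A minimal sketch of how the FPR/TPR pairs behind an ROC curve such as FIG. 9 can be computed from per-image scores and reference labels, with the area under the curve taken by the trapezoidal rule, is given below. The function names are hypothetical, and the document does not say how its curve was produced. The threshold sweep is quadratic, which is fine for illustration but not for hundreds of thousands of frames.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs obtained by sweeping the decision threshold over every distinct score."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative samples")
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y != 1)
        points.append((fp / n_neg, tp / n_pos))  # FPR = FP / N, TPR = TP / P
    return points


def auc_trapezoid(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


pts = roc_points([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
print(pts)                 # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(auc_trapezoid(pts))  # 0.75
```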

Table 4 shows the final dysphagia detection results and the ROC curve after the per-image dysphagia identification results have been post-corrected with a convolution filter to reflect context information.

[Table 4]

Referring to Table 4, the final detection achieves a recall of 83.3%, an accuracy of 93.9%, and a precision of 98.0%.

The machine learning-based automatic dysphagia detection apparatus according to the embodiment of the present invention differs from the machine learning-based automatic dysphagia detection method described with reference to FIGS. 1 to 9 only in the category of invention; since the content is the same, redundant descriptions and examples are omitted.

FIG. 10 is a block diagram illustrating a machine learning-based automatic dysphagia detection apparatus according to an embodiment of the present invention.

The machine learning-based automatic dysphagia detection apparatus 1000 according to an embodiment of the present invention may include an image data receiving unit 1010, a dynamic ROI generator 1020, and a dysphagia determination unit 1030.

The image data receiving unit 1010 may receive image data including the cervical spine.

Meanwhile, the image data receiving unit 1010 may include an image preprocessor that preprocesses the image data. The image preprocessor may perform at least one of equalizing the pixel brightness of the image data and cropping the image data to the same size.
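The document does not specify how brightness is equalized or where the crop is taken. A minimal sketch, assuming 8-bit grayscale frames stored as nested lists, global histogram equalization for the brightness step, and a center crop for the size step, follows; both function names are hypothetical.

```python
from typing import List, Sequence

Image = List[List[int]]


def equalize_histogram(img: Sequence[Sequence[int]], levels: int = 256) -> Image:
    """Global histogram equalization of an integer grayscale image with values in [0, levels)."""
    hist = [0] * levels
    for row in img:
        for v in row:
            hist[v] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    n = running
    if n == 0:
        return []
    cdf_min = next(c for c in cdf if c > 0)
    if n == cdf_min:  # constant image: nothing to spread out
        return [list(row) for row in img]
    # Map each grey level through the normalized cumulative histogram.
    lut = [round((c - cdf_min) * (levels - 1) / (n - cdf_min)) for c in cdf]
    return [[lut[v] for v in row] for row in img]


def center_crop(img: Sequence[Sequence[int]], height: int, width: int) -> Image:
    """Cut a height x width window from the middle of the image."""
    h, w = len(img), len(img[0])
    if height > h or width > w:
        raise ValueError("crop size exceeds image size")
    top, left = (h - height) // 2, (w - width) // 2
    return [list(row[left:left + width]) for row in img[top:top + height]]


print(equalize_histogram([[10, 10], [20, 30]]))  # [[0, 0], [128, 255]]
```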

The dynamic ROI generator 1020 may input the image data received by the image data receiving unit 1010 to the first machine learning unit to recognize the position of the cervical spine, and may generate a plurality of region of interest (ROI) images for a plurality of frames in the image data based on the position of the cervical spine.
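How the ROI is positioned relative to the recognized cervical spine is not described. The sketch below assumes the first machine learning unit outputs a per-frame binary mask of the cervical spine, places a fixed-size ROI around the mask centroid shifted by an optional offset, and clamps the box to the frame; the offset, the ROI size, and the function names are hypothetical. Recomputing the box for every frame is one way to make the ROI dynamic, so that it follows the spine as it moves between frames.

```python
from typing import List, Optional, Sequence, Tuple

Image = List[List[int]]


def mask_centroid(mask: Sequence[Sequence[int]]) -> Optional[Tuple[float, float]]:
    """Mean (row, col) of the nonzero pixels, or None if the mask is empty."""
    total_y = total_x = count = 0
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value:
                total_y += y
                total_x += x
                count += 1
    if count == 0:
        return None
    return total_y / count, total_x / count


def roi_from_mask(frame: Image, mask: Sequence[Sequence[int]], roi_h: int, roi_w: int,
                  offset: Tuple[int, int] = (0, 0)) -> Optional[Image]:
    """Crop a roi_h x roi_w window around the spine centroid, clamped to the frame."""
    centroid = mask_centroid(mask)
    if centroid is None:
        return None  # spine not found in this frame
    h, w = len(frame), len(frame[0])
    if roi_h > h or roi_w > w:
        raise ValueError("ROI larger than frame")
    top = int(round(centroid[0])) + offset[0] - roi_h // 2
    left = int(round(centroid[1])) + offset[1] - roi_w // 2
    top = min(max(top, 0), h - roi_h)      # keep the box inside the frame
    left = min(max(left, 0), w - roi_w)
    return [row[left:left + roi_w] for row in frame[top:top + roi_h]]


frame = [[r * 6 + c for c in range(6)] for r in range(6)]
mask = [[1 if (r, c) == (2, 3) else 0 for c in range(6)] for r in range(6)]
print(roi_from_mask(frame, mask, 2, 2))  # [[8, 9], [14, 15]]
```

Applied to a video, the same call would simply run once per frame with that frame's mask.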

The dysphagia determination unit 1030 may input the plurality of ROI images to the second machine learning unit to identify whether dysphagia occurs in the plurality of ROI images, and may determine that the patient corresponding to the image data has dysphagia when, among predetermined consecutive frames included in the plurality of frames, the number of frames identified as having dysphagia is greater than or equal to a first threshold.

Meanwhile, when the frames identified as having dysphagia, numbering at least the first threshold, are not consecutive, the dysphagia determination unit 1030 may post-correct the frames lying between them so that they are treated as having dysphagia. The dysphagia determination unit 1030 may use a median filter for this post-correction.

In addition, the dysphagia determination unit 1030 may determine at which swallow in the image data the dysphagia occurred.

In addition, the dysphagia determination unit 1030 may determine how many times dysphagia occurred in the image data.
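Counting dysphagia events can be sketched by treating each maximal run of consecutive positive frames in the post-corrected labels as one event. Assigning an event to a particular swallow additionally requires the frame range of each swallow, and the document does not say how that is obtained; the `swallow_ranges` input and both function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def positive_runs(labels: Sequence[int]) -> List[Tuple[int, int]]:
    """Inclusive (start, end) frame indices of each maximal run of 1s."""
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, value in enumerate(labels):
        if value and start is None:
            start = i                      # a run begins
        elif not value and start is not None:
            runs.append((start, i - 1))    # the run ended on the previous frame
            start = None
    if start is not None:
        runs.append((start, len(labels) - 1))  # run reaches the last frame
    return runs


def swallows_with_dysphagia(runs: Sequence[Tuple[int, int]],
                            swallow_ranges: Sequence[Tuple[int, int]]) -> List[int]:
    """1-based indices of swallows whose frame range overlaps any dysphagia run."""
    return [k for k, (s0, s1) in enumerate(swallow_ranges, start=1)
            if any(r0 <= s1 and s0 <= r1 for r0, r1 in runs)]


labels = [0, 1, 1, 0, 0, 1, 0, 1, 1, 1]
runs = positive_runs(labels)
print(len(runs), runs)  # 3 [(1, 2), (5, 5), (7, 9)]
print(swallows_with_dysphagia(runs, [(0, 3), (4, 4), (6, 9)]))  # [1, 3]
```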

The method according to an embodiment of the present invention described above may be implemented as a program (or application) and stored in a medium so as to be executed in combination with a server, which is hardware.

The program may include code written in a computer language, such as C, C++, JAVA, or machine language, that the processor (CPU) of the computer can read through the computer's device interface, so that the computer reads the program and executes the methods implemented as the program. Such code may include functional code related to functions that define the capabilities needed to execute the methods, and may include execution-procedure control code needed for the processor of the computer to execute those functions according to a predetermined procedure. The code may further include memory-reference code indicating at which location (address) in the internal or external memory of the computer any additional information or media needed by the processor to execute the functions should be referenced. In addition, when the processor of the computer needs to communicate with a remote computer or server to execute the functions, the code may further include communication-related code specifying how to communicate with that remote computer or server using the computer's communication module and what information or media to transmit and receive during communication.

The storage medium is not a medium that stores data for a brief moment, such as a register, cache, or memory, but a medium that stores data semi-permanently and can be read by a device. Specifically, examples of the storage medium include, but are not limited to, ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices. That is, the program may be stored in various recording media on various servers accessible to the computer, or in various recording media on the user's computer. The medium may also be distributed over network-connected computer systems so that computer-readable code is stored in a distributed manner.

The steps of a method or algorithm described in connection with an embodiment of the present invention may be implemented directly in hardware, as a software module executed by hardware, or by a combination of the two. A software module may reside in random access memory (RAM), read only memory (ROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory, a hard disk, a removable disk, a CD-ROM, or any other form of computer-readable recording medium well known in the art to which the present invention pertains.

Although embodiments of the present invention have been described above with reference to the accompanying drawings, those skilled in the art to which the present invention pertains will understand that the present invention may be embodied in other specific forms without changing its technical spirit or essential features. Therefore, the embodiments described above should be understood as illustrative in all respects and not restrictive.

1000: machine learning-based automatic dysphagia detection apparatus
1010: image data receiving unit
1020: dynamic ROI generator
1030: dysphagia determination unit

Claims

1. A machine learning-based method for automatically detecting dysphagia, the method comprising:
receiving image data including a cervical spine;
inputting the image data to a first machine learning unit to recognize a position of the cervical spine, and generating a plurality of region of interest (ROI) images for a plurality of frames in the image data based on the position of the cervical spine;
inputting the plurality of ROI images to a second machine learning unit to identify whether dysphagia occurs in the plurality of ROI images; and
determining that a patient corresponding to the image data has dysphagia if, among predetermined consecutive frames included in the plurality of frames, the number of frames identified as having dysphagia is greater than or equal to a first threshold.

2. The method of claim 1, further comprising:
if the frames identified as having dysphagia, numbering at least the first threshold, are not consecutive,
post-correcting a frame located between the frames identified as having dysphagia so that it is treated as having dysphagia.

3. The method of claim 2,
wherein the post-correcting uses a median filter.

4. The method of claim 1,
wherein determining that the patient has dysphagia comprises:
determining at which swallow in the image data the dysphagia occurred.

5. The method of claim 1,
wherein determining that the patient has dysphagia comprises:
determining how many times dysphagia occurred in the image data.

6. The method of claim 1, further comprising, after receiving the image data including the cervical spine, preprocessing the image data,
wherein preprocessing the image data comprises at least one of equalizing the pixel brightness of the image data and cropping the image data to the same size.

7. A machine learning-based apparatus for automatically detecting dysphagia, the apparatus comprising:
an image data receiving unit that receives image data including a cervical spine;
a dynamic ROI generator that inputs the image data to a first machine learning unit to recognize a position of the cervical spine, and generates a plurality of region of interest (ROI) images for a plurality of frames in the image data based on the position of the cervical spine; and
a dysphagia determination unit that inputs the plurality of ROI images to a second machine learning unit to identify whether dysphagia occurs in the plurality of ROI images, and determines that a patient corresponding to the image data has dysphagia when, among predetermined consecutive frames included in the plurality of frames, the number of frames identified as having dysphagia is greater than or equal to a first threshold.

8. The apparatus of claim 7,
wherein the dysphagia determination unit,
when the frames identified as having dysphagia, numbering at least the first threshold, are not consecutive, post-corrects a frame located between the frames identified as having dysphagia so that it is treated as having dysphagia.

9. The apparatus of claim 8,
wherein the dysphagia determination unit uses a median filter for the post-correction.

10. The apparatus of claim 7,
wherein the dysphagia determination unit
determines at which swallow in the image data the dysphagia occurred.

11. The apparatus of claim 7,
wherein the dysphagia determination unit
determines how many times dysphagia occurred in the image data.

12. The apparatus of claim 7,
wherein the image data receiving unit includes an image preprocessor that preprocesses the image data, and
the image preprocessor performs at least one of equalizing the pixel brightness of the image data and cropping the image data to the same size.

13. A program stored in a medium and combined with a computer, which is hardware, to execute the method of any one of claims 1 to 6.