KR20200010621A

KR20200010621A - Device and method for insurance unfair claim and unfair pattern detection based on artificial intelligence

Info

Publication number: KR20200010621A
Application number: KR1020180070403A
Authority: KR
Inventors: 김지혁; 안재형
Original assignee: (주)위세아이텍
Priority date: 2018-06-19
Filing date: 2018-06-19
Publication date: 2020-01-31

Abstract

According to an embodiment of the present invention, an apparatus for detecting a fraudulent insurance claim and a fraudulent pattern based on artificial intelligence includes: a data preprocessing part deriving property variables by standardizing insurance claim history data; an abnormal value detection part detecting a normal pattern and a fraudulent pattern from an abnormal group detected based on an abnormality detection algorithm into which the property variables are inputted; a new pattern classification part detecting a new pattern of a fraudulent claim by performing unsupervised learning into which the normal and fraudulent patterns are inputted; a new pattern detection part detecting a pattern of a new fraudulent claim by building a pattern determination model based on supervised learning into which the new pattern of the fraudulent claim and a pre-classified insurance claim pattern are inputted; and a fraudulent claim determination part determining whether a new insurance claim history is fraudulent by building a fraudulent claim detection model based on supervised learning into which the property variables and the normal claim data are inputted.

Description

Artificial intelligence-based insurance claim fraud and fraud pattern detection device and method {DEVICE AND METHOD FOR INSURANCE UNFAIR CLAIM AND UNFAIR PATTERN DETECTION BASED ON ARTIFICIAL INTELLIGENCE}

The present application relates to an artificial intelligence-based apparatus and method for detecting fraudulent insurance claims and fraudulent patterns.

Existing insurance fraud prevention systems are based on business rules: for each insurance claim, rules are derived from the reviewer's experience and knowledge and are divided into rules for cases to be investigated and rules for cases excluded from investigation. However, as insurance fraud becomes increasingly intelligent and sophisticated, new fraudulent-claim patterns and fraud patterns must be updated continuously.

In addition, an insurance company can neither leave payout leakage unattended by taking no action against insurance fraud, nor spend unlimited investigation costs in order to uncover insurance fraud completely. An insurance company therefore needs to set its level of investigative effort at a reasonable point that reduces payouts leaked to insurance fraud without incurring excessive investigation costs. Recently, techniques that analyze insurance fraud with artificial intelligence-based learning models have been developed, but their level of development remains inadequate.

The background art of the present application is disclosed in Korean Registered Patent Publication No. 10-0862181.

The present application is intended to solve the above-described problems of the prior art, and an object thereof is to provide an artificial intelligence-based apparatus and method for detecting fraudulent insurance claims and fraudulent patterns that provide a discrimination model capable of identifying new patterns of fraudulent insurance claims.

The present application is intended to solve the above-described problems of the prior art, and an object thereof is to provide an artificial intelligence-based apparatus and method for detecting fraudulent insurance claims and fraudulent patterns that learn from fraudulent-claim data to analyze new fraudulent-claim patterns and determine the claim pattern type of a new claim.

The present application is intended to solve the above-described problems of the prior art, and an object thereof is to provide an artificial intelligence-based apparatus and method for detecting fraudulent insurance claims and fraudulent patterns that learn from insurance-claim-related data to determine whether a new claim is fraudulent.

However, the technical problems to be solved by the embodiments of the present application are not limited to those described above, and other technical problems may exist.

As a technical means for achieving the above technical object, an artificial intelligence-based apparatus for detecting fraudulent insurance claims and fraudulent patterns according to an embodiment of the present application may include: a data preprocessing unit that derives feature variables by standardizing insurance claim history data; an outlier detection unit that detects normal patterns and fraudulent patterns from anomalous clusters detected by an anomaly detection algorithm that takes the feature variables as input; a new pattern classification unit that detects new patterns of fraudulent claims by performing unsupervised learning with the normal and fraudulent patterns as input; a new pattern detection unit that detects patterns of new fraudulent claims by building a supervised-learning-based pattern discrimination model that takes the new patterns of fraudulent claims and pre-classified insurance claim patterns as input; and a fraudulent claim determination unit that determines whether a new insurance claim history is fraudulent by building a supervised-learning-based fraudulent claim detection model that takes the feature variables and normal claim data as input.

According to an embodiment of the present application, the apparatus may further include a database that records at least one of the insurance claim history data, the feature variables, and the new patterns, and the insurance claim history data may include at least one of claim data, contract data, payment data, insurance planner data, and customer data.

According to an embodiment of the present application, the outlier detection unit may, based on the anomaly detection algorithm that takes the feature variables as input, detect the normal values and outliers of the feature variables of normal transactions and of fraudulent transactions among insurance claims, and may detect the anomalous clusters based on the frequency of the feature variables corresponding to the outliers.

According to an embodiment of the present application, the new pattern classification unit may cluster the normal and fraudulent patterns using a clustering algorithm and detect the new patterns of fraudulent claims based on the inter-cluster separation of the normal and fraudulent patterns.

According to an embodiment of the present application, the new pattern detection unit may detect the patterns of new fraudulent claims based on a classification algorithm.

An artificial intelligence-based method for detecting fraudulent insurance claims according to an embodiment of the present application may include: (a) deriving feature variables by standardizing insurance claim history data; (b) detecting normal patterns and fraudulent patterns from anomalous clusters detected by an anomaly detection algorithm that takes the feature variables as input; (c) detecting new patterns of fraudulent claims by performing unsupervised learning with the normal and fraudulent patterns as input; (d) detecting patterns of new fraudulent claims by building a supervised-learning-based pattern discrimination model that takes the new patterns of fraudulent claims and pre-classified insurance claim patterns as input; and (e) determining whether a new insurance claim history is fraudulent by building a supervised-learning-based fraudulent claim detection model that takes the feature variables and normal claim data as input.

According to an embodiment of the present application, the artificial intelligence-based method for detecting fraudulent insurance claims may further include recording at least one of the insurance claim history data, the feature variables, and the new patterns, and the insurance claim history data may include at least one of claim data, contract data, payment data, insurance planner data, and customer data.

According to an embodiment of the present application, step (b) may, based on the anomaly detection algorithm that takes the feature variables as input, detect the normal values and outliers of the feature variables of normal transactions and of fraudulent transactions among insurance claims, and may detect the anomalous clusters based on the frequency of the feature variables corresponding to the outliers.

According to an embodiment of the present application, step (c) may cluster the normal and fraudulent patterns using a clustering algorithm and detect the new patterns of fraudulent claims based on the inter-cluster separation of the normal and fraudulent patterns.

According to an embodiment of the present application, step (d) may detect the patterns of new fraudulent claims based on a classification algorithm.

The above-described means for solving the problems are merely exemplary and should not be construed as limiting the present application. In addition to the exemplary embodiments described above, additional embodiments may exist in the drawings and the detailed description of the invention.

According to the above-described means for solving the problems, it is possible to provide an artificial intelligence-based apparatus and method for detecting fraudulent insurance claims and fraudulent patterns that provide a discrimination model capable of identifying new patterns of fraudulent insurance claims.

According to the above-described means for solving the problems, it is possible to provide an artificial intelligence-based apparatus and method for detecting fraudulent insurance claims and fraudulent patterns that learn from fraudulent-claim data to analyze new fraudulent-claim patterns and determine the claim pattern type of a new claim.

According to the above-described means for solving the problems, it is possible to provide an artificial intelligence-based apparatus and method for detecting fraudulent insurance claims and fraudulent patterns that learn from insurance-claim-related data to determine whether a new claim is fraudulent.

FIG. 1 is a diagram illustrating the configuration of an artificial intelligence-based apparatus for detecting fraudulent insurance claims and fraudulent patterns according to an embodiment of the present application.
FIG. 2 is a diagram illustrating examples of the feature variables of the apparatus according to an embodiment of the present application.
FIG. 3 is a diagram illustrating the patterns classified by the anomaly detection algorithm of the apparatus according to an embodiment of the present application.
FIG. 4 is a diagram illustrating the detection results of the anomaly detection algorithm of the apparatus according to an embodiment of the present application.
FIG. 5 is a diagram illustrating the accuracy, by algorithm, of the fraudulent claim detection model of the apparatus according to an embodiment of the present application.
FIG. 6 is a flowchart of an artificial intelligence-based method for detecting fraudulent insurance claims and fraudulent patterns according to an embodiment of the present application.

Hereinafter, embodiments of the present application are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present application pertains can readily practice them. However, the present application may be implemented in many different forms and is not limited to the embodiments described here. In the drawings, parts unrelated to the description are omitted for clarity, and similar parts are given similar reference numerals throughout the specification.

Throughout this specification, when a part is said to be "connected" to another part, this includes not only the case where it is "directly connected" but also the case where it is "electrically connected" with another element in between.

Throughout this specification, when a member is said to be located "on", "above", "at the top of", "under", "below", or "at the bottom of" another member, this includes not only the case where the member is in contact with the other member but also the case where a further member exists between the two.

Throughout this specification, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise.

FIG. 1 is a diagram illustrating the configuration of an artificial intelligence-based apparatus for detecting fraudulent insurance claims and fraudulent patterns according to an embodiment of the present application.

Referring to FIG. 1, the artificial intelligence-based apparatus 100 for detecting fraudulent insurance claims and fraudulent patterns may include a data preprocessing unit 110, an outlier detection unit 120, a new pattern classification unit 130, a new pattern detection unit 140, and a fraudulent claim determination unit 150. The data preprocessing unit 110 may derive feature variables by standardizing insurance claim history data.

FIG. 2 is a diagram illustrating examples of the feature variables of the apparatus according to an embodiment of the present application.

Although the following description evidently applies not only to fraudulent insurance claims but also to fraudulent claims for subsidies, grants, deposits, and the like, it focuses on fraudulent insurance claims for convenience of explanation. The insurance claim history data may include, for example, at least one of claim data for claimed insurance benefits, insurance- or subsidy-related contract data, insurance payment data, insurance planner data, customer data, and insurance fraud detection result data. The data preprocessing unit 110 may derive feature variables by standardizing the insurance claim history data. Standardizing the insurance claim history data means, for example in the case of customer data, quantifying the customer's income level, the number of hospitals the customer visited, the number of disease reasons the customer claimed, and so on. A feature variable derived by the data preprocessing unit 110 is a variable related to insurance claims that can take a numerical value, and may include, for example, at least one of: customer ID number, whether the customer is an insurance fraudster, number of duplicate claims under the same disease name, number of distinct contract-signing year-months, maximum number of contracts in one day, number of policies approved for payment, number of policies for which payment was requested, number of policies the customer has signed, number of protection-type insurance claims, change in credit rating, types of contracts held, total number of visits to flagged hospitals, number of disease reasons the customer claimed, number of doctors the customer saw, number of hospitals the customer visited, total valid inpatient/outpatient days, number of medical departments, customer income level, number of FP (Financial Planner) changes, number of indemnity claims processed, and number of contracts through fraudulent FPs.
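The standardization step can be illustrated with a minimal sketch. The record layout (`customer_id`, `hospital`, `doctor`, `disease`) and the function name `derive_features` are assumptions made for illustration; the source does not specify a schema or an implementation. The sketch derives four of the listed feature variables per customer.

```python
from collections import Counter, defaultdict

def derive_features(claims):
    """Aggregate raw claim records into numeric per-customer feature variables.

    The field names are hypothetical; the source does not define a schema.
    """
    hospitals = defaultdict(set)
    doctors = defaultdict(set)
    diseases = defaultdict(Counter)
    for claim in claims:
        cid = claim["customer_id"]
        hospitals[cid].add(claim["hospital"])
        doctors[cid].add(claim["doctor"])
        diseases[cid][claim["disease"]] += 1

    features = {}
    for cid in hospitals:
        features[cid] = {
            "n_hospitals": len(hospitals[cid]),
            "n_doctors": len(doctors[cid]),
            "n_disease_reasons": len(diseases[cid]),
            # Claims beyond the first one filed under the same disease name.
            "n_duplicate_disease_claims": sum(n - 1 for n in diseases[cid].values()),
        }
    return features

claims = [
    {"customer_id": "C1", "hospital": "H1", "doctor": "D1", "disease": "flu"},
    {"customer_id": "C1", "hospital": "H2", "doctor": "D2", "disease": "flu"},
    {"customer_id": "C1", "hospital": "H2", "doctor": "D2", "disease": "sprain"},
    {"customer_id": "C2", "hospital": "H1", "doctor": "D1", "disease": "flu"},
]
features = derive_features(claims)
```

Each resulting row is a purely numeric vector, which is the form the later anomaly detection and clustering steps expect.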

FIG. 3 is a diagram illustrating the patterns classified by the anomaly detection algorithm of the apparatus according to an embodiment of the present application.

The outlier detection unit 120 may detect normal patterns and fraudulent patterns from anomalous clusters detected by an anomaly detection algorithm that takes the feature variables as input. The anomaly detection algorithm is an algorithm that detects the small amount of anomalous data that does not fit the average characteristics of the whole data set, that is, the insurance claim history data. Specifically, the outlier detection unit 120 may, based on the anomaly detection algorithm that takes the feature variables as input, detect the normal values and outliers of the feature variables of normal transactions and of fraudulent transactions among insurance claims, and may detect the anomalous clusters based on the frequency of the feature variables corresponding to the outliers. For example, the anomaly detection algorithm may include the Isolation Forest algorithm, which detects anomalous clusters by isolating anomalous data with trees.
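The tree-based isolation idea can be sketched in plain Python as follows. This is a simplified illustration of the general Isolation Forest technique, not the implementation used in the source: it omits the subsampling of the original algorithm, and the two-column feature layout, the function names, and the sample values are all assumptions.

```python
import math
import random

EULER_GAMMA = 0.5772156649

def avg_path_length(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def build_tree(points, depth, max_depth, rng):
    """Recursively split the points on a random feature at a random value."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    feature = rng.randrange(len(points[0]))
    lo = min(p[feature] for p in points)
    hi = max(p[feature] for p in points)
    if lo == hi:  # simplification: stop if the chosen feature cannot be split
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[feature] < split]
    right = [p for p in points if p[feature] >= split]
    return {
        "feature": feature,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }

def path_length(tree, point, depth=0):
    """Depth at which the point lands, plus a correction for unsplit leaves."""
    if "size" in tree:
        return depth + avg_path_length(tree["size"])
    branch = "left" if point[tree["feature"]] < tree["split"] else "right"
    return path_length(tree[branch], point, depth + 1)

def anomaly_scores(points, n_trees=100, seed=0):
    """Score in (0, 1); points that are isolated after few splits score higher."""
    rng = random.Random(seed)
    max_depth = math.ceil(math.log2(max(len(points), 2)))
    trees = [build_tree(points, 0, max_depth, rng) for _ in range(n_trees)]
    norm = avg_path_length(len(points))
    return [
        2.0 ** (-sum(path_length(t, p) for t in trees) / (n_trees * norm))
        for p in points
    ]

# Hypothetical feature vectors: [hospitals visited, disease reasons claimed].
data_rng = random.Random(1)
customers = [[data_rng.uniform(1, 3), data_rng.uniform(1, 4)] for _ in range(40)]
customers.append([25.0, 30.0])  # one clearly unusual customer
scores = anomaly_scores(customers)
```

Because the unusual customer is separated from the rest by almost any random split, its average path length is short and its score is the highest in the list.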

The outlier detection unit 120 may detect the minority clusters found by the anomaly detection algorithm as anomalous clusters. For insurance claims, the proportion of fraudulent claims is smaller than that of normal claims, so the algorithm can isolate the anomalous data. That is, the outlier detection unit 120 may detect anomalous clusters as fraudulent patterns and clusters that do not belong to the anomalous clusters as normal patterns. Referring to FIG. 3, the outlier detection unit 120 may classify patterns as normal, caution, suspicious, or anomalous according to the degree of isolation given by the Isolation Forest algorithm. Caution and suspicious patterns may be fraudulent claims, but because they have not been determined to be fraudulent they may fall within the normal category.
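A sketch of the four-tier classification by degree of isolation might look like the following. The source does not give cut-off values, so the thresholds, the English tier names, and the function names are assumptions chosen only to make the example concrete.

```python
def classify_by_isolation(score, caution=0.55, suspicious=0.65, anomalous=0.75):
    """Map an isolation score in (0, 1) to one of four tiers.

    The threshold values are hypothetical; the source does not state them.
    """
    if score >= anomalous:
        return "anomalous"
    if score >= suspicious:
        return "suspicious"
    if score >= caution:
        return "caution"
    return "normal"

def is_fraudulent_pattern(tier):
    """Only the anomalous tier is treated as fraudulent; caution and
    suspicious cases stay in the normal category, as described above."""
    return tier == "anomalous"

tiers = [classify_by_isolation(s) for s in (0.40, 0.60, 0.70, 0.90)]
```

Keeping the caution and suspicious tiers separate from the binary decision lets reviewers prioritize borderline cases without labeling them as fraud.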

FIG. 4 is a diagram illustrating the detection results of the anomaly detection algorithm of the apparatus according to an embodiment of the present application.

Referring to FIG. 4, the outlier detection unit 120 may use the anomaly detection algorithm to detect normal values and outliers for both normal transactions and fraudulent transactions among insurance claims. A fraudulent transaction that follows an existing fraudulent pattern may be detected as a normal value, whereas an outlier among fraudulent transactions is likely to be a new pattern of fraudulent transaction. As an example, normal-value and outlier detection using the Isolation Forest algorithm achieves an accuracy of 87%.

The new pattern classification unit 130 may detect new patterns of fraudulent claims, that is, fraudulent patterns, by performing unsupervised learning with the normal and fraudulent patterns as input. Unsupervised learning refers to algorithms that learn by analyzing or clustering the data itself rather than from pre-built training data. As this is well known, a detailed description is omitted. The new pattern classification unit 130 may cluster the normal and fraudulent patterns using a clustering algorithm and detect the new patterns of fraudulent claims (fraudulent patterns) based on the inter-cluster separation of the normal and fraudulent patterns.

As examples, a logistic regression algorithm, a Random Forest algorithm, an SVM (Support Vector Machine) algorithm, a decision algorithm, and a clustering algorithm may be used as the clustering algorithm for the unsupervised learning. In addition to these, the new pattern classification unit 130 may perform unsupervised learning with algorithms such as the Extra Tree algorithm, the XGBoost algorithm, a Deep Learning algorithm, the K-means clustering algorithm, the SOM (Self-Organizing Maps) algorithm, and the EM & Canopy algorithm. The Random Forest algorithm builds a forest of many decision trees and averages their individual predictions into a single result variable. The SVM algorithm is a non-probabilistic algorithm that determines the class to which data belongs by finding the widest-margin boundary in the data distribution space. The Extra Tree algorithm is similar to Random Forest but faster. Whereas the trees of a Random Forest are independent, the XGBoost algorithm is a boosting algorithm that applies the result of each tree to the next tree. The Deep Learning algorithm is based on a multi-layer neural network and learns by adjusting weights that capture the effect of variable patterns on the result. The K-means clustering algorithm is a traditional technique that iteratively partitions the target population into K clusters based on mean distance (similarity), and the SOM algorithm is a technique based on an artificial neural network that clusters by learning the input patterns of the training set as weights. The EM & Canopy algorithm is a technique that clusters by starting from given initial values and iteratively updating the parameter values toward maximum likelihood.
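Of the algorithms listed, K-means is the simplest to show end to end. The following is a minimal plain-Python sketch of the standard K-means iteration, not the source's implementation; the two-column feature layout, the initial centroids, and the sample values are assumptions.

```python
import math

def kmeans(points, initial_centroids, n_iter=20):
    """Standard K-means: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    centroids = [list(c) for c in initial_centroids]
    labels = []
    for _ in range(n_iter):
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical feature vectors: [hospitals visited, FP changes].
patterns = [[1, 1], [1, 2], [2, 1], [8, 8], [9, 8], [8, 9]]
labels, centroids = kmeans(patterns, initial_centroids=[[0, 0], [10, 10]])
```

With these inputs the two groups separate after the first iteration, and the centroids settle at the means of each group.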

The new pattern classification unit 130 may group the normal and fraudulent patterns into a plurality of pattern clusters using the clustering algorithm. That is, a pattern cluster may be a cluster of homogeneous feature variables. The unit may also detect new patterns of fraudulent claims based on the inter-cluster separation of the pattern clusters, that is, the distance between clusters. The new pattern classification unit 130 may cluster similar normal and fraudulent patterns together and classify dissimilar patterns as new patterns; if a new pattern is judged, based on its feature variables, to be a normal claim pattern, it is classified as a normal pattern, and if it is judged to be a fraudulent pattern, it is detected as a new pattern of fraudulent claims. Specifically, when the frequencies of the feature variables belonging to normal or fraudulent patterns are similar, the similar feature variables may be assigned the same or a similar pattern label (or identifier). That is, when patterns carry the same or similar labels, the inter-cluster separation can be said to be low. In addition, a claim pattern clustered by similar feature-variable frequencies can be judged to be a claim pattern clustered according to the feature-variable frequencies that arise from normal insurance claims, that is, a normal claim pattern rather than a fraudulent claim.
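The separation test can be sketched as a distance check between cluster centroids. The Euclidean metric, the `min_separation` threshold, the cluster names, and the centroid values below are all assumptions for illustration; the source states only that inter-cluster distance is used.

```python
import math

def find_new_pattern_candidates(cluster_centroids, known_centroids, min_separation):
    """Return the clusters whose centroid is far from every known pattern.

    A cluster close to a known normal or fraudulent pattern has low
    separation and is treated as an instance of that pattern.
    """
    candidates = []
    for name, centroid in cluster_centroids.items():
        nearest = min(math.dist(centroid, k) for k in known_centroids.values())
        if nearest > min_separation:
            candidates.append(name)
    return candidates

# Hypothetical centroids over [hospitals visited, FP changes].
known = {"normal": [1.0, 1.0], "known_fraud": [8.0, 8.0]}
clusters = {"A": [1.2, 1.0], "B": [8.5, 8.0], "C": [2.0, 9.0]}
candidates = find_new_pattern_candidates(clusters, known, min_separation=3.0)
```

A candidate found this way is only a new pattern; whether it is a normal or a fraudulent one still has to be judged from its feature variables, as the paragraph above describes.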

On the other hand, a pattern clustered from feature variables whose frequencies differ from those of the similar-frequency feature variables may be given a label different from the labeling above. Because the feature-variable frequencies of such a pattern differ from those of normal claim patterns (for example, when the number of FP changes is relatively high compared with normal claim patterns), it may be a new pattern of fraudulent claims and may be detected by the new pattern classification unit 130. Detecting new patterns of fraudulent claims is important for detecting newly emerging fraudulent claims even when human resources are limited. Accumulating such new patterns can improve the reliability of fraudulent claim detection and lay a foundation for detecting, based on the accumulated pattern data, as-yet-unknown fraudulent claims that appear in the future.

The new pattern detection unit 140 may detect patterns of new fraudulent claims by building a supervised-learning-based pattern discrimination model that takes the new patterns of fraudulent claims and pre-classified insurance claim patterns as input. Supervised learning means training a model on pre-built training data. The new pattern detection unit 140 may detect the patterns of new fraudulent claims based on a classification algorithm. The classification algorithm includes a decision algorithm, which may be, for example, a Decision Tree algorithm, but is not limited thereto. The new pattern detection unit 140 may learn new pattern discrimination rules with a decision algorithm that takes the new patterns of fraudulent claims and the pre-classified insurance claim patterns as input, and may build a pattern discrimination model for new fraudulent claims that contains these rules. The unsupervised-learning-based new pattern classification described above can classify new patterns simply on the basis of frequency, but it cannot tell which variables drove the classification. The new pattern discrimination rules are therefore learned with a supervised-learning-based decision algorithm.
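How a decision tree exposes the variable behind a cluster label can be shown with a single split (a decision stump) chosen by Gini impurity. This is a generic sketch of the standard split search, not the source's model; the feature names, sample rows, and labels are assumptions.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(rows, labels, feature_names):
    """Search every feature and midpoint threshold for the split with the
    lowest weighted Gini impurity; returns (impurity, feature, threshold)."""
    best = None
    n = len(rows)
    for j, name in enumerate(feature_names):
        values = sorted({row[j] for row in rows})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2.0
            left = [y for row, y in zip(rows, labels) if row[j] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[j] > threshold]
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or impurity < best[0]:
                best = (impurity, name, threshold)
    return best

# Hypothetical training data: label 1 marks rows from a new fraudulent pattern.
feature_names = ["n_hospitals", "n_fp_changes"]
rows = [[2, 0], [3, 1], [2, 1], [4, 0], [3, 5], [2, 6], [4, 7]]
labels = [0, 0, 0, 0, 1, 1, 1]
impurity, feature, threshold = best_split(rows, labels, feature_names)
```

Here the search selects `n_fp_changes` with a threshold of 3.0, which reads directly as a rule of the form "more than 3 FP changes indicates the new pattern". A full decision tree repeats this search recursively on each side of the split.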

A new-pattern discrimination rule is a rule that identifies a claim pattern as a new pattern based on feature variables whose values are at or above a threshold. These rules may be combined to build the pattern discrimination model for new fraudulent claims. For example, referring to FIG. 4, outliers may be detected even among normal transactions, and the possibility that they represent a new pattern of fraudulent claims cannot be ruled out. The new-pattern discrimination rules may therefore be learned from all patterns, including both the new patterns of fraudulent claims and the pre-classified insurance claim patterns. The importance of each feature variable may also be taken into account during learning. That is, a feature variable associated with a high risk of fraudulent claims may be given relatively high importance, and that importance may be considered when the rules are learned. For example, the total number of visits to hospitals of concern may be given higher importance than the number of doctors the customer has seen. By considering the importance of the feature variables, the new pattern detection unit 140 can build a more accurate pattern discrimination model for new fraudulent claims.
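As a rough illustration of how a threshold-based discrimination rule might be learned, the sketch below fits a single-split decision stump by importance-weighted Gini gain. The feature names, the toy values, the `learn_rule` helper, and the way importance multiplies the gain are all assumptions made for illustration, not details from the application; a full decision tree would apply such splits recursively.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def learn_rule(rows, labels, importance=None):
    """Return (feature, threshold) of the single split with the best
    importance-weighted Gini gain. The rule reads: feature >= threshold."""
    importance = importance or {}
    base = gini(labels)
    n = len(labels)
    best = (None, None, 0.0)  # (feature, threshold, weighted gain)
    for feature in rows[0]:
        for threshold in sorted({r[feature] for r in rows}):
            hit = [y for r, y in zip(rows, labels) if r[feature] >= threshold]
            miss = [y for r, y in zip(rows, labels) if r[feature] < threshold]
            if not hit or not miss:
                continue  # a split must leave claims on both sides
            split = (len(hit) * gini(hit) + len(miss) * gini(miss)) / n
            # Assumption: importance simply scales the impurity reduction.
            gain = (base - split) * importance.get(feature, 1.0)
            if gain > best[2]:
                best = (feature, threshold, gain)
    return best[0], best[1]

# Hypothetical labelled patterns: 1 = new fraudulent pattern, 0 = pre-classified pattern.
rows = [
    {"fp_changes": 0, "doctors_seen": 2},
    {"fp_changes": 1, "doctors_seen": 5},
    {"fp_changes": 1, "doctors_seen": 3},
    {"fp_changes": 4, "doctors_seen": 4},
    {"fp_changes": 5, "doctors_seen": 2},
    {"fp_changes": 6, "doctors_seen": 5},
]
labels = [0, 0, 0, 1, 1, 1]

feature, threshold = learn_rule(rows, labels)
print(f"rule: {feature} >= {threshold} -> new pattern")
```

Unlike the frequency-based clustering, the learned rule names the variable and the cut-off that separate the new pattern, which is the interpretability benefit the paragraph above describes.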

FIG. 5 is a diagram showing the accuracy, by algorithm, of the fraudulent claim detection model of the artificial intelligence-based apparatus for detecting fraudulent insurance claims and fraudulent patterns according to an embodiment of the present application.

The fraudulent claim determination unit 150 may determine whether a new insurance claim is fraudulent by building a supervised-learning-based fraudulent claim detection model that takes the feature variables and normal claim data as input. The fraudulent claim detection model takes the feature variables and the normal claim data as input and outputs a fraudulent claim detection result. The fraudulent claim determination unit 150 may build the model based on a supervised classification/prediction algorithm. The classification/prediction algorithm may be any of the algorithms described above in connection with the clustering algorithm, so a redundant description is omitted.

The fraudulent claim detection model may take the feature variables and the normal claim data as input and output a fraudulent claim detection result for a new insurance claim. For example, the model may output the detection result based on the similarity between the feature variables and the normal claim data. If the similarity is below a preset threshold, the insurance claim associated with the feature variables may be judged normal; if the similarity is at or above the preset threshold, the claim may be judged fraudulent. The similarity may be computed by at least one of a K-means clustering algorithm, a SOM (Self-Organizing Maps) algorithm, and an EM & Canopy algorithm. The fraudulent claim determination unit 150 may improve the accuracy of the fraudulent claim detection model by repeatedly deriving outputs from inputs.
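The threshold rule can be sketched as follows. The application does not define how the score is measured, so this sketch assumes, as one possible reading, that the score is the Euclidean distance of a claim's feature vector from the centroid of the normal claim data, which makes "at or above the threshold" mean "unlike normal claims". The feature choice, the sample values, and the threshold are hypothetical.

```python
import math

def centroid(points):
    """Component-wise mean of equal-length feature vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def score(claim, normal_centroid):
    """Assumed score: Euclidean distance from the normal-claim centroid."""
    return math.sqrt(sum((x - c) ** 2 for x, c in zip(claim, normal_centroid)))

def judge(claim, normal_centroid, threshold):
    """Threshold rule: below the threshold -> normal, at or above -> fraudulent."""
    if score(claim, normal_centroid) >= threshold:
        return "fraudulent"
    return "normal"

# Hypothetical normal claims as (hospital visits, FP changes).
normal_claims = [[2.0, 0.0], [3.0, 1.0], [4.0, 1.0], [3.0, 2.0]]
center = centroid(normal_claims)  # [3.0, 1.0]
THRESHOLD = 3.0                   # hypothetical preset threshold

print(judge([3.0, 1.5], center, THRESHOLD))  # normal
print(judge([9.0, 6.0], center, THRESHOLD))  # fraudulent
```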

The fraudulent claim determination unit 150 may determine whether a new insurance claim is fraudulent based on the fraudulent claim detection model. When a new insurance claim is input to the model, whether it is fraudulent may be determined based on the similarity, as described above. The new insurance claim is also input to the pattern discrimination model built by the new pattern detection unit 140. This makes it possible to determine not only whether the new claim is fraudulent and which pattern it follows, but also, if it is fraudulent, whether it follows a new pattern of fraudulent claims.

According to an embodiment of the present application, the artificial intelligence-based insurance claim fraud detection apparatus 100 may further include a database, and the database may record at least one of the insurance claim history data, the feature variables, and the new patterns.

FIG. 6 is a flowchart of an artificial intelligence-based method for detecting fraudulent insurance claims and fraudulent patterns according to an embodiment of the present application.

The method shown in FIG. 6 may be performed by the artificial intelligence-based apparatus for detecting fraudulent insurance claims and fraudulent patterns described above with reference to FIGS. 1 to 5. Accordingly, even where omitted below, the description of the apparatus given with reference to FIGS. 1 to 5 applies equally to FIG. 6.

Referring to FIG. 6, in step S610 the data preprocessing unit 110 may derive feature variables by standardizing the insurance claim history data. For customer data, for example, standardizing means quantifying the customer's income level, the number of hospitals the customer visited, the number of disease reasons the customer claimed, and so on. The feature variables derived by the data preprocessing unit 110 are variables related to insurance claims that can take numerical values. They may include, for example, at least one of: customer ID number, whether the customer is a known insurance fraudster, number of duplicate claims under the same disease name, number of distinct contract year-months, maximum number of contracts in a single day, number of policies approved for payment, number of policies submitted for payment, number of policies taken out by the customer, number of protection-type insurance claims, change in credit rating, types of contracts held, total number of visits to hospitals of concern, number of disease reasons claimed by the customer, number of doctors seen by the customer, number of hospitals visited by the customer, total valid inpatient/outpatient days, number of medical departments, customer income level, number of FP (Financial Planner) changes, number of indemnity claims processed, and number of contracts with fraudulent FPs.
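A minimal sketch of this quantification step is shown below, assuming raw claim records arrive as dictionaries in chronological order. The record fields (`customer_id`, `hospital`, `doctor`, `disease`, `fp_id`) and the `derive_features` helper are hypothetical names chosen for illustration; only four of the listed feature variables are derived.

```python
from collections import defaultdict

def derive_features(claims):
    """Aggregate raw claim records into numeric per-customer feature variables."""
    grouped = defaultdict(list)
    for claim in claims:
        grouped[claim["customer_id"]].append(claim)

    features = {}
    for customer_id, records in grouped.items():
        planners = [r["fp_id"] for r in records]
        features[customer_id] = {
            "hospitals_visited": len({r["hospital"] for r in records}),
            "doctors_seen": len({r["doctor"] for r in records}),
            "disease_reasons": len({r["disease"] for r in records}),
            # An FP change is counted whenever consecutive records name a different planner.
            "fp_changes": sum(a != b for a, b in zip(planners, planners[1:])),
        }
    return features

# Hypothetical claim history, already sorted by date.
claims = [
    {"customer_id": "c1", "hospital": "A", "doctor": "kim", "disease": "flu", "fp_id": "F1"},
    {"customer_id": "c1", "hospital": "B", "doctor": "lee", "disease": "flu", "fp_id": "F2"},
    {"customer_id": "c1", "hospital": "A", "doctor": "kim", "disease": "sprain", "fp_id": "F2"},
    {"customer_id": "c2", "hospital": "C", "doctor": "park", "disease": "flu", "fp_id": "F3"},
]

features = derive_features(claims)
for customer_id, row in features.items():
    print(customer_id, row)
```

Each customer ends up with one fixed-length numeric row, which is the form the later detection steps take as input.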

In step S620, the outlier detection unit 120 may detect a normal pattern and a fraudulent pattern from an anomalous cluster detected by an anomaly detection algorithm that takes the feature variables as input. The anomaly detection algorithm is an algorithm that detects the small amount of anomalous data that does not fit the average characteristics of the whole data set, that is, the insurance claim history data. Specifically, the outlier detection unit 120 may use the anomaly detection algorithm, with the feature variables as input, to detect the normal values and outliers of each feature variable for normal and fraudulent insurance transactions, and may detect the anomalous cluster based on the frequency of the feature variables corresponding to the outliers. For example, the anomaly detection algorithm may include the Isolation Forest algorithm, which detects an anomalous cluster by isolating anomalous data using trees. The outlier detection unit 120 may treat the minority cluster found by the anomaly detection algorithm as the anomalous cluster. Because fraudulent claims make up a smaller share of insurance claims than normal claims, the algorithm can isolate them as anomalous data. That is, the outlier detection unit 120 may detect the anomalous cluster as the fraudulent pattern and the clusters that do not belong to the anomalous cluster as the normal pattern.
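To make the isolation idea concrete, here is a simplified, standard-library-only sketch of an isolation forest. It is not the implementation used in the application: it skips the subsampling that the standard algorithm performs, and the sample feature vectors are hypothetical. Points that random splits separate quickly receive scores closer to 1.

```python
import math
import random

def c_factor(n):
    """Average path length of an unsuccessful search in a binary search tree of n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n

def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random value."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {
        "dim": dim,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }

def path_length(point, node, depth=0):
    """Depth at which the point lands in a leaf, adjusted for unsplit leaves."""
    if "size" in node:
        return depth + c_factor(node["size"])
    child = node["left"] if point[node["dim"]] < node["split"] else node["right"]
    return path_length(point, child, depth + 1)

def anomaly_scores(points, n_trees=100, seed=0):
    """Scores in (0, 1]; expects at least two points. Higher means easier to isolate."""
    rng = random.Random(seed)
    max_depth = math.ceil(math.log2(len(points)))
    trees = [build_tree(points, 0, max_depth, rng) for _ in range(n_trees)]
    norm = c_factor(len(points))
    return [
        2.0 ** (-sum(path_length(p, t) for t in trees) / n_trees / norm)
        for p in points
    ]

# Hypothetical feature vectors (hospital visits, FP changes); the last one is extreme.
points = [[2, 0], [3, 1], [2, 1], [3, 0], [4, 1], [3, 2], [2, 2], [4, 0], [20, 9]]
scores = anomaly_scores(points)
suspect = scores.index(max(scores))
print("most anomalous claim index:", suspect)
```

In practice a threshold on the score, or a fixed contamination rate, would decide which claims form the anomalous cluster; that choice is left open here.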

In step S630, the new pattern classification unit 130 may detect a new pattern of fraudulent claims by performing unsupervised learning that takes the normal pattern and the fraudulent pattern as input. Unsupervised learning refers to algorithms that learn by analyzing or clustering the data itself rather than from prepared training data. This is well known, so a detailed description is omitted. The new pattern classification unit 130 may cluster the normal pattern and the fraudulent pattern based on a clustering algorithm, and may detect the new pattern of fraudulent claims based on the degree of separation between the clusters of the normal pattern and the fraudulent pattern.
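The clustering step and an inter-cluster separation measure might look like the sketch below. K-means is used because the application names it elsewhere; the separation formula (centroid distance divided by mean within-cluster spread), the starting centroids, and the sample points are assumptions, since the application does not specify how the degree of separation is computed.

```python
import math

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(points):
    """Component-wise mean of a non-empty list of feature vectors."""
    return [sum(col) / len(points) for col in zip(*points)]

def kmeans(points, centroids, iterations=20):
    """Plain Lloyd iterations from the given starting centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters

def separation(centroids, clusters):
    """Assumed measure for two clusters: centroid distance over mean within-cluster spread."""
    spread = [
        sum(dist(p, centroids[i]) for p in c) / len(c)
        for i, c in enumerate(clusters) if c
    ]
    within = sum(spread) / len(spread)
    return dist(centroids[0], centroids[1]) / within if within else math.inf

# Hypothetical patterns: four near typical values, four far from them.
points = [[1, 1], [2, 1], [2, 2], [3, 1], [8, 6], [9, 6], [8, 7], [9, 7]]
centroids, clusters = kmeans(points, [points[0], points[4]])
print("centroids:", centroids)
print("separation:", round(separation(centroids, clusters), 2))
```

A large ratio indicates that the two groups are far apart relative to their internal spread, which is one way to decide that a cluster represents a distinct pattern rather than noise.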

In step S640, the new pattern detection unit 140 may detect patterns of new fraudulent claims by building a supervised-learning-based pattern discrimination model that takes as input the new patterns of fraudulent claims and the pre-classified insurance claim patterns. Supervised learning means training a model on training data prepared in advance. The new pattern detection unit 140 may detect the patterns of new fraudulent claims based on a classification algorithm. The classification algorithm includes a decision algorithm, which may be, for example, a decision tree algorithm, but is not limited thereto. The new pattern detection unit 140 may learn new-pattern discrimination rules with a decision algorithm that takes the new patterns of fraudulent claims and the pre-classified insurance claim patterns as input, and may build a pattern discrimination model for new fraudulent claims that includes those rules. The unsupervised new-pattern classification described above can classify new patterns on the basis of frequency alone, but it does not reveal which variables drove the classification. The new-pattern discrimination rules are therefore learned with a supervised decision algorithm.

A new-pattern discrimination rule is a rule that identifies a claim pattern as a new pattern based on feature variables whose values are at or above a threshold. These rules may be combined to build the pattern discrimination model for new fraudulent claims. For example, outliers may be detected even among normal transactions, and the possibility that they represent a new pattern of fraudulent claims cannot be ruled out. The new-pattern discrimination rules may therefore be learned from all patterns, including both the new patterns of fraudulent claims and the pre-classified insurance claim patterns. The importance of each feature variable may also be taken into account during learning. That is, a feature variable associated with a high risk of fraudulent claims may be given relatively high importance, and that importance may be considered when the rules are learned. For example, the total number of visits to hospitals of concern may be given higher importance than the number of doctors the customer has seen. By considering the importance of the feature variables, the new pattern detection unit 140 can build a more accurate pattern discrimination model for new fraudulent claims.

In step S650, the fraudulent claim determination unit 150 may determine whether a new insurance claim is fraudulent by building a supervised-learning-based fraudulent claim detection model that takes the feature variables and normal claim data as input. The fraudulent claim detection model takes the feature variables and the normal claim data as input and outputs a fraudulent claim detection result. The fraudulent claim determination unit 150 may build the model based on a supervised classification/prediction algorithm. The classification/prediction algorithm may be any of the algorithms described above in connection with the clustering algorithm, so a redundant description is omitted.

The fraudulent claim detection model may take the feature variables and the normal claim data as input and output a fraudulent claim detection result for a new insurance claim. For example, the model may output the detection result based on the similarity between the feature variables and the normal claim data. If the similarity is below a preset threshold, the insurance claim associated with the feature variables may be judged normal; if the similarity is at or above the preset threshold, the claim may be judged fraudulent.

According to an embodiment of the present application, the artificial intelligence-based insurance claim fraud detection method may further include recording at least one of the insurance claim history data, the feature variables, and the new patterns.

The artificial intelligence-based insurance claim fraud detection method according to an embodiment of the present application may be implemented as program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the present invention, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code produced by a compiler but also high-level language code that a computer can execute using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the present invention, and vice versa.

The foregoing description of the present application is illustrative, and those of ordinary skill in the art to which the present application pertains will understand that it can readily be modified into other specific forms without changing its technical idea or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and components described as distributed may likewise be implemented in combined form.

The scope of the present application is defined by the claims below rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present application.

100: artificial intelligence-based insurance claim fraud detection apparatus
110: data preprocessing unit
120: outlier detection unit
130: new pattern classification unit
140: new pattern detection unit
150: fraudulent claim determination unit

Claims

An artificial intelligence-based apparatus for detecting fraudulent insurance claims and fraudulent patterns, the apparatus comprising:
a data preprocessing unit that derives feature variables by standardizing insurance claim history data;
an outlier detection unit that detects a normal pattern and a fraudulent pattern from an anomalous cluster detected by an anomaly detection algorithm that takes the feature variables as input;
a new pattern classification unit that detects a new pattern of fraudulent claims by performing unsupervised learning that takes the normal pattern and the fraudulent pattern as input;
a new pattern detection unit that detects a pattern of a new fraudulent claim by building a supervised-learning-based pattern discrimination model that takes the new pattern of fraudulent claims and a pre-classified insurance claim pattern as input; and
a fraudulent claim determination unit that determines whether a new insurance claim is fraudulent by building a supervised-learning-based fraudulent claim detection model that takes the feature variables and normal claim data as input.

The apparatus of claim 1,
further comprising a database that records at least one of the insurance claim history data, the feature variables, and the new pattern,
wherein the insurance claim history data includes at least one of claim data, contract data, payment data, insurer data, and customer data.

The apparatus of claim 1,
wherein the outlier detection unit,
based on the anomaly detection algorithm that takes the feature variables as input, detects normal values and outliers of each feature variable for normal and fraudulent insurance transactions, and detects the anomalous cluster based on the frequency of the feature variables corresponding to the outliers.

The apparatus of claim 1,
wherein the new pattern classification unit
clusters the normal pattern and the fraudulent pattern based on a clustering algorithm, and detects the new pattern of fraudulent claims based on the degree of separation between the clusters of the normal pattern and the fraudulent pattern.

The apparatus of claim 1,
wherein the new pattern detection unit
detects the pattern of the new fraudulent claim based on a classification algorithm.

An artificial intelligence-based insurance claim fraud detection method, comprising:
(a) deriving feature variables by standardizing insurance claim history data;
(b) detecting a normal pattern and a fraudulent pattern from an anomalous cluster detected by an anomaly detection algorithm that takes the feature variables as input;
(c) detecting a new pattern of fraudulent claims by performing unsupervised learning that takes the normal pattern and the fraudulent pattern as input;
(d) detecting a pattern of a new fraudulent claim by building a supervised-learning-based pattern discrimination model that takes the new pattern of fraudulent claims and a pre-classified insurance claim pattern as input; and
(e) determining whether a new insurance claim is fraudulent by building a supervised-learning-based fraudulent claim detection model that takes the feature variables and normal claim data as input.

The method of claim 6,
further comprising recording at least one of the insurance claim history data, the feature variables, and the new pattern,
wherein the insurance claim history data includes at least one of claim data, contract data, payment data, insurer data, and customer data.

The method of claim 6,
wherein step (b) comprises,
based on the anomaly detection algorithm that takes the feature variables as input, detecting normal values and outliers of each feature variable for normal and fraudulent insurance transactions, and detecting the anomalous cluster based on the frequency of the feature variables corresponding to the outliers.

The method of claim 6,
wherein step (c) comprises
clustering the normal pattern and the fraudulent pattern based on a clustering algorithm, and detecting the new pattern of fraudulent claims based on the degree of separation between the clusters of the normal pattern and the fraudulent pattern.

The method of claim 6,
wherein step (d) comprises
detecting the pattern of the new fraudulent claim based on a classification algorithm.

A computer-readable recording medium having recorded thereon a program for executing the method of any one of claims 6 to 10.