KR20210042449A

KR20210042449A - Data scrapping apparatus and method

Info

Publication number: KR20210042449A
Application number: KR1020190125002A
Authority: KR
Inventors: 이승배
Original assignee: 주식회사 핀마트
Priority date: 2019-10-10
Filing date: 2019-10-10
Publication date: 2021-04-20

Abstract

Provided are a data scraping apparatus and a data scraping method, capable of providing any element information required by a financial institution as alternative information by collecting and analyzing data through an artificial intelligence scraping solution without separate scraping structure analysis for an unstructured region as well as structured open API scraping data. The provided apparatus includes: a retrieval unit for retrieving structured data and unstructured data matching input information from an outside based on a machine learning algorithm; a collection unit for scraping the retrieved structured data and the retrieved unstructured data from a scraping source based on the machine learning algorithm; an analysis unit for analyzing whether the structured data and the unstructured data from the collection unit are appropriate data based on the machine learning algorithm; and a matching unit for matching the structured data and the unstructured data from the analysis unit to a mapping table for each evaluation subject and for each financial institution.

Description

Data scraping apparatus and method

Technical field

The present invention relates to a data scraping apparatus and method, and more particularly to an artificial-intelligence big-data scraping apparatus and method based on a machine learning algorithm.

The household loan market is enormous: as of the end of March 2019 it had an outstanding balance of 1,540 trillion won and 22 million users. Yet there are few means of comparing loans, so customers widely suffer from unfavorable loan choices caused by information asymmetry, excessive profit-taking by loan solicitors, mis-selling, and information leaks.

Traditionally, the licensed financial business, protected by entry barriers, generated stable profits from the loan-deposit margin. However, shrinking interest margins, loan limit restrictions, and the entry of internet banks and fintech companies now call for stronger multifaceted customer service and stronger non-face-to-face loan services linked with IT.

However, because each financial institution uses a different credit evaluation model, it is difficult to scrape the evaluation information and data that a non-face-to-face process requires.

In addition, customers face the inconvenience of visiting the financial institution at least twice for real estate collateral appraisal and submission of collateral registration documents.

Prior Art 1: Korean Patent Registration No. 10-2009336 (Cloud scraping system and method using pre-scraped big data, and computer program therefor)
Prior Art 2: Korean Patent Laid-Open No. 10-2018-0093213 (Method of processing a home mortgage loan through a non-face-to-face channel)

The present invention has been proposed to solve the above problems. Its object is to provide a data scraping apparatus and method that can collect and analyze, through an artificial-intelligence scraping solution, any element information a financial institution requires, covering not only structured open API scraping data but also unstructured areas without separate scraping structure analysis, and provide the result as alternative information.

To achieve the above object, a data scraping apparatus according to a preferred embodiment of the present invention includes: a search unit that searches, based on a machine learning algorithm, for structured data and unstructured data matching input information from outside; a collection unit that scrapes the retrieved structured and unstructured data from a scraping source based on a machine learning algorithm; an analysis unit that analyzes, based on a machine learning algorithm, whether the structured and unstructured data from the collection unit are appropriate; and a matching unit that matches the structured and unstructured data from the analysis unit to a mapping table for each evaluation subject and each financial institution.

When a scraping request arrives from outside, the search unit may: analyze the evaluation item elements from the external input information based on a machine learning algorithm; extract the keywords corresponding to the analyzed evaluation item elements based on a machine learning algorithm; identify which scraping source holds the structured or unstructured data that best matches the extracted keywords; and search, based on a machine learning algorithm, for whether the best-matching structured data and the best-matching unstructured data each exist as text or as an image in the identified scraping source.
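As a minimal sketch of the source-identification step described above, the following Python ranks scraping sources by how many extracted keywords each one covers. The patent states that this step is driven by a machine learning algorithm but does not specify one; the plain keyword-overlap count here is only a stand-in, and the catalogue `SOURCE_KEYWORDS`, the source names, and `rank_sources` are hypothetical.

```python
# Hypothetical catalogue: keywords each scraping source is assumed to cover.
SOURCE_KEYWORDS = {
    "national_tax_service": {"income", "tax"},
    "health_insurance": {"employer", "income"},
    "sns": {"spending", "interests"},
}


def rank_sources(keywords, catalogue=SOURCE_KEYWORDS):
    """Return source names ordered by how many requested keywords they cover.

    Keyword overlap is an illustrative stand-in for the unspecified
    machine learning step; sources that match nothing are dropped.
    """
    wanted = set(keywords)
    scores = {name: len(wanted & covered) for name, covered in catalogue.items()}
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    return [name for name in ranked if scores[name] > 0]
```

Calling `rank_sources(["income", "employer"])` places the source covering both keywords ahead of the source covering only one.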

The collection unit may identify, based on a machine learning algorithm, the location, form, and structure of the structured data and of the unstructured data within the retrieved scraping source, collect the identified structured and unstructured data from that source in a batch, and then separate and extract the data for analysis and processing based on a machine learning algorithm.

The analysis unit learns from the structured and unstructured data collected by the collection unit using a machine learning algorithm and, if any data was collected incorrectly, feeds this back to the collection unit so that the data can be collected again.

The analysis unit may analyze the adequacy, timeliness, and utility of the structured and unstructured data from the collection unit based on a machine learning algorithm.

The analysis unit may send data that satisfies all of adequacy, timeliness, and utility to the matching unit.
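The gating behavior of the analysis unit can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the three criteria are represented as precomputed booleans (in the patent they come from an unspecified machine learning analysis), and treating every failing record as a candidate for re-collection is an assumption. The `Record` class and `triage` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Record:
    field: str
    value: str
    adequate: bool  # adequacy (assumed to be precomputed upstream)
    timely: bool    # timeliness
    usable: bool    # utility


def triage(records):
    """Split records into (forward to matching unit, send back for re-collection)."""
    passed, recollect = [], []
    for record in records:
        if record.adequate and record.timely and record.usable:
            passed.append(record)      # satisfies all three criteria
        else:
            recollect.append(record)   # assumption: fed back to the collection unit
    return passed, recollect
```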

The matching unit may store the matching result and output it to the financial institution server.
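One way to picture a mapping table keyed by financial institution and evaluation subject is a dictionary lookup, sketched below. The table contents, institution names, and target field names are all hypothetical; the patent does not describe the table's layout.

```python
# Hypothetical mapping table:
# (institution, evaluation subject) -> field name in that institution's evaluation message.
MAPPING_TABLE = {
    ("bank_a", "income"): "ANNUAL_INCOME",
    ("bank_a", "employer"): "EMPLOYER_NAME",
    ("bank_b", "income"): "INC_AMT",
}


def match_to_institution(institution, data, table=MAPPING_TABLE):
    """Rename analyzed data keys to the institution's own field names.

    Subjects with no entry for the institution are dropped (an assumption).
    """
    return {
        table[(institution, subject)]: value
        for subject, value in data.items()
        if (institution, subject) in table
    }
```

With this table, `match_to_institution("bank_b", {"income": 5000, "employer": "A"})` keeps only the income value, renamed to that institution's field.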

Meanwhile, a data scraping method according to a preferred embodiment of the present invention is performed in a data scraping apparatus and includes: searching, based on a machine learning algorithm, for structured data and unstructured data matching input information from outside; scraping the retrieved structured and unstructured data from a scraping source based on a machine learning algorithm; analyzing, based on a machine learning algorithm, whether the scraped structured and unstructured data are appropriate; and matching the analyzed structured and unstructured data to a mapping table for each evaluation subject and each financial institution.
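The order of the four method steps can be sketched as a simple function composition. The step implementations are passed in as callables because the patent does not specify them; `run_pipeline` and its parameter names are hypothetical.

```python
from typing import Any, Callable


def run_pipeline(
    request: dict,
    search: Callable[[dict], Any],
    collect: Callable[[Any], Any],
    analyze: Callable[[Any], Any],
    match: Callable[[Any], Any],
) -> Any:
    """Run the four steps in the order the method recites them."""
    found = search(request)    # step 1: locate matching data
    scraped = collect(found)   # step 2: scrape it from the source
    vetted = analyze(scraped)  # step 3: keep only appropriate data
    return match(vetted)       # step 4: map to the institution's table
```

Keeping the steps as injected callables is a design choice for the sketch: each unit can be replaced or tested independently.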

According to the present invention configured as above, for any element information a financial institution requires, the appropriate structured and unstructured data can be collected and analyzed through an artificial-intelligence scraping solution, and alternative information can be provided in the evaluation message for each loan product that any financial institution wants.

That is, with the customer's consent, the structured and unstructured evaluation element data needed for a loan application is automatically scraped from multiple institutions and matched, using AI machine learning algorithms for holder-institution analysis, structure analysis, content analysis, validity analysis, and the like. This can drastically reduce processing time and cost.

FIG. 1 is a block diagram of a system to which the present invention is applied.
FIG. 2 is an internal configuration diagram of the scraping server shown in FIG. 1.
FIG. 3 is a flowchart illustrating a data scraping method according to an embodiment of the present invention.

The present invention may be modified in various ways and may have various embodiments; specific embodiments are illustrated in the drawings and described in detail.

However, this is not intended to limit the present invention to specific embodiments, and the invention should be understood to include all changes, equivalents, and substitutes falling within its spirit and technical scope.

The terms used in the present application are used only to describe specific embodiments and are not intended to limit the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In the present application, terms such as "comprise" or "have" are intended to specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to preclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless otherwise defined, all terms used herein, including technical or scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. Terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the related art, and are not to be interpreted in an idealized or excessively formal sense unless explicitly so defined in the present application.

Hereinafter, preferred embodiments of the present invention are described in more detail with reference to the accompanying drawings. To facilitate overall understanding, the same reference numerals are used for the same elements in the drawings, and duplicate descriptions of the same elements are omitted.

FIG. 1 is a block diagram of a system to which the present invention is applied.

The system of FIG. 1 may include a customer terminal 10, a scraping server 20, a financial institution server 30, a scraping source 40, and a network 50.

The customer terminal 10 may access the financial institution server 30 through the network 50 to apply for a loan and enter the desired loan details. The customer terminal 10 may also enter detailed information related to the loan application, for example, name, resident registration number, email address, home address, home phone number, work address, work phone number, and mobile phone number.

The desired loan details and the detailed information related to the loan application described above may serve as the basis for the evaluation item elements required by the scraping server 20.

For example, the customer terminal 10 may be implemented as a portable terminal or a portable computer. Here, a portable terminal is a wireless communication device that guarantees portability and mobility, and may include any kind of handheld wireless communication device, such as a PCS (Personal Communication System), GSM (Global System for Mobile communications), PDC (Personal Digital Cellular), PHS (Personal Handyphone System), PDA (Personal Digital Assistant), IMT (International Mobile Telecommunication)-2000, CDMA (Code Division Multiple Access)-2000, W-CDMA (W-Code Division Multiple Access), Wibro (Wireless Broadband Internet), Wi-Fi, or LTE (Long Term Evolution) terminal. A portable computer may include a notebook, a laptop, and the like.

The customer terminal 10 may also be any of various smart devices such as a smartphone, a smart note, a tablet PC, or a wearable computer.

In response to a scraping request from the financial institution server 30, the scraping server 20 may use artificial intelligence to scrape the various structured and unstructured data needed to evaluate the customer from the scraping source 40, load the resulting alternative information (evaluation information) into the evaluation message of the financial institution server 30, and provide it to that server.

That is, for all loan evaluation item element information a financial institution requires, the scraping server 20 may collect and analyze not only scraping data obtained through structured Open APIs but also data from unstructured areas, using AI data scraping based on a machine learning algorithm, and provide alternative information (evaluation information) in the evaluation message for each loan product that each financial institution wants.

In other words, in response to a scraping request from the financial institution server 30, the scraping server 20 may collect and analyze structured and unstructured evaluation information about the customer from the scraping source 40 through AI data scraping based on a machine learning algorithm, and generate and provide alternative information applicable to the evaluation items of each loan product that each financial institution wants.

For example, assume that Manager Park, who works at small or medium-sized enterprise A, applies for a 10 million won non-face-to-face credit loan to fund a daughter's wedding. Conventionally, only loan repayment capacity information (structured data: employer, assets, income status) would be scraped from sources such as the National Health Insurance Service and the National Tax Service to evaluate Park's repayment capacity. On that basis, because Park has a grade 3 credit rating and already holds a 30 million won credit loan, Park would be judged ineligible for an additional non-face-to-face credit loan and the loan would be refused.

In this case, however, the scraping server 20 may scrape not only the repayment capacity evaluation information described above (structured data: assets, income status, etc.) but also information for evaluating willingness to repay (unstructured data: employer, personal interests, etc.), and perform a multifaceted evaluation of repayment capacity and willingness to decide loan eligibility, the loan limit, the interest rate, and so on. For example, the unstructured employer data may include the information that the current employer is classified as a small or medium-sized enterprise but has passed the preliminary review for KOSDAQ listing on the strength of new drug development. The unstructured personal interest data may include information showing that the applicant has a good reputation as a key technology developer at work, holds five relevant certifications, and is continuing studies alongside work.

In other words, Manager Park, the applicant for a 10 million won non-face-to-face credit loan, would have been ineligible under the conventional evaluation because of insufficient credit rating and qualifications. The scraping server 20, however, may comprehensively evaluate the employer details and work history, the occupation type and related expertise, lifestyle stability, and loan repayment tendency, and determine that the credit rating could be upgraded to grade 2. Based on this information, the financial institution server 30 may approve an additional credit loan (e.g., a decision to approve a loan increase with a 10 million won limit at a 4% interest rate).

As another example, assume that Mr. Kim, who works at large company B, applies for a 10 million won non-face-to-face credit loan after squandering money on excessive luxury and entertainment spending. Conventionally, Kim would have qualified for non-face-to-face loan approval on a credit rating evaluation based on employer and income level alone.

In this case, however, the scraping server 20 may scrape not only the repayment capacity evaluation information described above (structured data: employer, assets, income status) but also information for evaluating willingness to repay (unstructured data: personal life pattern, personal interests, etc.), and perform a multifaceted evaluation of repayment capacity and willingness to decide loan eligibility and the like. For example, the personal life pattern may include everyday information such as whether giro payment due dates are met, the number of vehicle fines paid, and court and police case lookups. The personal interest data may include tendency information such as activity records on homepages and SNS, and the number of bar visits and the spending there.

In other words, the scraping server 20 may determine that Mr. Kim, although strong in credit rating and repayment capacity, shows a very strong tendency to flaunt luxury goods on SNS, overspends impulsively on entertainment, is frequently late on giro payments and frequently fined, and has been sued for fraud, and is therefore highly likely to default on the loan in the near future. Based on this information, the financial institution server 30 may decide on a reduced approval, lowering the loan limit to 3 million won with a 5% interest rate.

As yet another example, assume a non-face-to-face application for an additional 30 million won subordinated real estate mortgage loan secured by a recently remodeled 20-year-old detached house in Osan, Gyeonggi-do (market value 100 million won, senior lien 40 million won, two rooms). Under the conventional collateral valuation method, deducting the senior lien (40 million won) and the small tenant deposit (30 million won) from the appraised value at 80% (80 million won) leaves too little collateral headroom, so only a 10 million won additional subordinated loan could be approved.
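The conventional calculation in this example is simple arithmetic: 80 million won appraised value minus a 40 million won senior lien minus a 30 million won small tenant deposit leaves 10 million won. A minimal sketch follows; the function and parameter names are hypothetical, and amounts are expressed in units of 10,000 won to keep the arithmetic in integers.

```python
def subordinate_loan_headroom(market_value, appraisal_pct, senior_lien, small_deposit):
    """Collateral headroom left for a subordinated loan (units: 10,000 won).

    appraisal_pct is the share of market value recognized as appraised value,
    given as an integer percentage (80 in the example above).
    """
    appraised = market_value * appraisal_pct // 100
    # Headroom cannot be negative; zero means no further lending is possible.
    return max(0, appraised - senior_lien - small_deposit)


# Osan detached-house example: 100M won market value, 80% appraisal,
# 40M won senior lien, 30M won small tenant deposit.
headroom = subordinate_loan_headroom(10_000, 80, 4_000, 3_000)
print(headroom)  # 1000, i.e. 10 million won
```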

In this case, however, the scraping server 20 may scrape not only the conventional repayment capacity information (structured data: collateral appraisal data) but also information for evaluating willingness to repay (unstructured data: value change evaluation data), and perform a multifaceted evaluation of repayment capacity and willingness to decide loan eligibility and the like. For example, the collateral appraisal data may include nearby market prices, nearby appraised values, and resident registration information. The value change evaluation data may include favorable development news near the collateral (e.g., a Samsung employee housing complex) and changes in asset value (e.g., remodeling).

In other words, according to public records the property is an old ordinary detached house of brick construction with a slate roof, so its collateral value is low and little additional collateral headroom remains after deducting the senior lien and the small tenant deposit. Through AI scraping, however, the scraping server 20 may determine that the area is a residential complex dedicated to Samsung employees with recent favorable development news nearby, that the remodeling and extension of the property is expected to raise its value, and that there is therefore sufficient headroom for the additional loan (i.e., 30 million won).

As yet another example, assume a non-face-to-face application for a 30 million won subordinated real estate mortgage loan secured by an apartment in Seoul (market value 200 million won, senior lien 80 million won) whose three rooms, registered as the owner's address, are rented out to six university students as a shared apartment (deposit of 10 million won each, monthly rent of 500,000 won). Conventionally, the property would be treated as owner-occupied, and the loan would be approved at branch level on a desktop appraisal, using only evaluations such as LTV and DTI.

In this case, however, the scraping server 20 may scrape not only the conventional repayment capacity information (structured data: collateral appraisal data) but also information for evaluating willingness to repay (unstructured data: personal life pattern data), and perform a multifaceted evaluation of repayment capacity and willingness to decide loan eligibility and the like. For example, the collateral appraisal data may include nearby market prices, nearby appraised values, and resident registration information. The personal life pattern data may include the delivery address information entered at every shopping mall that uses the address, and post office receipt information.

In other words, according to public records the property is a residential apartment occupied only by its owner, so a 30 million won subordinated loan appears possible even after deducting the senior mortgage (80 million won) and the small tenant deposit (45 million won for three rooms under the Seoul standard) from the appraised value (160 million won, 80% of market value). The scraping server 20, however, may identify that six other people actually share the space, and determine that the collateral value should be reduced further through additional measures such as requiring the applicant to submit a tenant register inspection report or conducting an on-site appraisal. In this case, therefore, the loan limit may be set at about 10 million won.

The financial institution server 30 may communicate with the customer terminal 10 or the scraping server 20 through the network 50.

On receiving a loan application from the customer terminal 10, the financial institution server 30 may provide the scraping server 20 with the loan application input information from that terminal (including the desired loan details and the detailed information related to the application) and request data scraping for the loan review.

The scraping source 40 includes sources that can provide structured data and sources that can provide unstructured data.

The sources that can provide structured data may be, for example, one or more of a registry office, the National Tax Service, the National Health Insurance Service, financial institutions, CB/NICE, the Supreme Court, and public institutions.

The sources that can provide unstructured data may be, for example, one or more of SNS, real estate agencies, appraisal firms, Minwon24, telecommunications carriers, VANs, private lenders, religious organizations, shopping malls, schools, and workplaces.

Of course, the sources of structured data and of unstructured data are not limited to those listed above, and further sources may be added.

The network 50 enables communication among the customer terminal 10, the scraping server 20, the financial institution server 30, and the scraping source 40.

For example, the network 50 may be implemented as a wired network such as a local area network (LAN), a wide area network (WAN), or a value added network (VAN).

In addition, the network 50 may be implemented as any type of wireless network, such as a mobile radio communication network, a satellite communication network, Bluetooth, WiBro (Wireless Broadband Internet), HSDPA (High Speed Downlink Packet Access), Wi-Fi, or LTE (Long Term Evolution).

If necessary, the network 50 may be a network that combines wired and wireless links.

FIG. 2 is an internal configuration diagram of the scraping server shown in FIG. 1.

The scraping server 20 may include a search unit 21, a collection unit 22, an analysis unit 23, and a matching unit 24. The scraping server 20 may also be referred to as a data scraping apparatus.

The search unit 21 analyzes evaluation item elements based on the input information from the financial institution server 30, automatically extracts (analyzes) keywords corresponding to the evaluation item elements, and searches for the scraping source 40 that matches the keywords and can provide the most data, as well as for the text, images, and the like that contain the data within that scraping source 40. Here, the input information from the financial institution server 30 may be the desired loan details and detailed information related to the loan application (e.g., name, resident registration number, e-mail address, home address, home phone number, work address, work phone number, mobile phone number).

The search performed by the search unit 21 is based on a machine learning algorithm.

In other words, when there is a scraping request from the financial institution server 30, the search unit 21 automatically analyzes (extracts) the evaluation item elements from the input information supplied by that server, based on a machine learning algorithm. Here, the evaluation item elements may include occupation group, employer name, hire date, annual income, address of owned real estate (apartment, house, etc.), market price of owned real estate, delinquency tendency, willingness to repay, level of legal compliance, areas of interest, religious orientation, education level, career level, income increase, income decrease, collateral value increase, collateral value decrease, lease status, SNS tendencies, hobbies, and special skills.

The search unit 21 then automatically extracts (analyzes), based on a machine learning algorithm, the keywords that correspond to the analyzed evaluation item elements.

In addition, the search unit 21 identifies which scraping source holds the structured data that best matches the extracted keywords, and which scraping source holds the unstructured data that best matches the extracted keywords.

The structured data and the unstructured data may exist as text, images, or the like in the corresponding scraping source. Accordingly, the search unit 21 searches, based on a machine learning algorithm, whether the structured data that best matches the keywords exists as text or as an image in the identified scraping source. Likewise, the search unit 21 searches, based on a machine learning algorithm, whether the unstructured data that best matches the keywords exists as text or as an image in the identified scraping source.
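The source-selection step described above can be illustrated with a minimal sketch. The patent performs keyword extraction and source matching with a machine learning algorithm that it does not specify; here a fixed dictionary and a plain keyword count stand in for both. The names `ITEM_KEYWORDS`, `ScrapingSource`, `count_matches`, and `pick_sources`, and all sample data, are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from evaluation item elements to keywords.
# The patent derives these with a machine learning algorithm; a fixed
# dictionary stands in for that step here.
ITEM_KEYWORDS = {
    "lease_status": ["parcel delivery", "mail receipt", "tenant"],
    "annual_income": ["income", "salary", "tax"],
}


@dataclass
class ScrapingSource:
    name: str
    structured: bool  # True: structured source, False: unstructured source
    documents: list = field(default_factory=list)  # text snippets held by the source


def count_matches(source, keywords):
    """Count keyword occurrences across all documents of one source."""
    return sum(doc.lower().count(kw) for doc in source.documents for kw in keywords)


def pick_sources(item, sources):
    """Return the best-matching structured and unstructured source names."""
    keywords = ITEM_KEYWORDS[item]
    best = {}
    for kind, label in ((True, "structured"), (False, "unstructured")):
        scored = [(count_matches(s, keywords), s.name)
                  for s in sources if s.structured == kind]
        scored = [pair for pair in scored if pair[0] > 0]  # drop sources with no hit
        best[label] = max(scored)[1] if scored else None
    return best


sources = [
    ScrapingSource("tax_office", True, ["Reported income and tax for 2018"]),
    ScrapingSource("registry", True, ["Ownership record"]),
    ScrapingSource("shopping_mall", False, ["Parcel delivery to tenant at unit 101"]),
    ScrapingSource("sns", False, ["Holiday photos"]),
]
print(pick_sources("lease_status", sources))
print(pick_sources("annual_income", sources))
```

A real implementation would replace the keyword count with whatever learned relevance model the search unit uses; the sketch only shows the shape of the decision (one best source per data kind).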

The collection unit 22 analyzes, based on a machine learning algorithm, the structure of the structured data and unstructured data found by the search unit 21, and then scrapes that data from the scraping source 40.

That is, the collection unit 22 scrapes the retrieved structured data and unstructured data regardless of format, whether the data is text or an image.

In other words, for each of the structured data and the unstructured data, the collection unit 22 identifies (analyzes), based on a machine learning algorithm, its location, form, and structure within the scraping source that was found.

The collection unit 22 then collects the identified structured data and unstructured data from the corresponding scraping source 40 in a batch, separates and extracts them, and analyzes and processes them based on a machine learning algorithm.
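The batch-collect, separate, and process sequence of the collection unit can be sketched as follows. This is only an illustration under stated assumptions: the item layout (`location`, `format`, `payload`), the function name `collect_and_separate`, and the whitespace-stripping "processing" step are all hypothetical, and a simple format check replaces the machine learning based structure analysis the patent describes.

```python
from collections import defaultdict

# Hypothetical raw items as they might come back from one scraping source.
# Each item records its location in the source, its format, and its payload.
raw_items = [
    {"location": "/notice/12", "format": "text", "payload": "Tenant: Hong Gil-dong"},
    {"location": "/scan/7.png", "format": "image", "payload": b"\x89PNG"},
    {"location": "/notice/13", "format": "text", "payload": "  Move-in date: 2019-03-02 "},
]


def collect_and_separate(items):
    """Collect everything in one pass, then split by format and normalise text."""
    collected = [dict(item) for item in items]  # batch collection (copies, input untouched)
    separated = defaultdict(list)
    for item in collected:                      # separation by format
        separated[item["format"]].append(item)
    for item in separated["text"]:              # light processing of text payloads only
        item["payload"] = item["payload"].strip()
    return dict(separated)


result = collect_and_separate(raw_items)
print(sorted(result))
print([item["payload"] for item in result["text"]])
```

Copying each item before processing keeps the raw scrape available, which matters if the analysis unit later asks for the data to be collected again.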

The analysis unit 23 analyzes, based on a machine learning algorithm, whether the data from the collection unit 22 (i.e., the structured data and unstructured data) is appropriate data.

In other words, the analysis unit 23 analyzes the adequacy, timeliness, and usability of the structured data and unstructured data from the collection unit 22, based on a machine learning algorithm. Here, adequacy may mean that the data must match an evaluation item element. Timeliness may mean that the data must have been extracted at the present time or most recently. Usability may mean that the data must be usable by multiple financial institutions.

In particular, the analysis unit 23 learns the data collected by the collection unit 22 (i.e., the structured data and unstructured data) with a machine learning algorithm and, if any data was collected incorrectly, feeds that fact back to the collection unit 22 so that the data can be collected again.

The analysis unit 23 sends data that satisfies all of adequacy, timeliness, and usability to the matching unit 24.
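The three checks and the feedback path can be sketched as a simple filter. The patent does not say how the machine learning algorithm scores adequacy, timeliness, or usability, so the rule-based checks below are stand-ins, and the thresholds `max_age_days` and `min_institutions`, the `Record` fields, and the function name `analyze` are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Record:
    item: str            # evaluation item element the record claims to support
    extracted_on: date   # when the data was scraped
    institutions: int    # hypothetical count of institutions able to use it


def analyze(records, wanted_items, today, max_age_days=30, min_institutions=2):
    """Split records into accepted ones and ones to send back for re-collection."""
    accepted, recollect = [], []
    for r in records:
        adequate = r.item in wanted_items                                   # adequacy
        timely = (today - r.extracted_on) <= timedelta(days=max_age_days)   # timeliness
        usable = r.institutions >= min_institutions                         # usability
        (accepted if adequate and timely and usable else recollect).append(r)
    return accepted, recollect


today = date(2019, 10, 10)
records = [
    Record("lease_status", date(2019, 10, 1), 3),  # passes all three checks
    Record("lease_status", date(2019, 1, 1), 3),   # too old
    Record("hobby", date(2019, 10, 9), 3),         # not a requested item
]
accepted, recollect = analyze(records, {"lease_status"}, today)
print(len(accepted), len(recollect))
```

In this sketch `accepted` corresponds to what is forwarded to the matching unit 24, and `recollect` to what is fed back to the collection unit 22.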

The matching unit 24 matches the structured data and unstructured data from the analysis unit 23 to a mapping table for each evaluation subject (loan subject) and each financial institution. Here, the mapping table for each evaluation subject and each financial institution may include, for example, information such as resident registration number, name, e-mail address, home address, home phone number, work address, work phone number, mobile phone number, occupation group, employer name, hire date, annual income, address of owned real estate, market price of owned real estate, delinquency tendency, willingness to repay, level of legal compliance, areas of interest, religious orientation, education level, career level, income increase, income decrease, collateral value increase, collateral value decrease, lease status, SNS tendencies, hobbies, and special skills.

The matching unit 24 may store the matching result. The matching unit 24 may also transmit the matching result to the financial institution server 30 through the network 50.
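One way to picture the per-subject, per-institution mapping table is as a lookup keyed by the pair (loan subject, financial institution). The sketch below is an assumption about the table's shape, not the patent's design: `MAPPING_TABLES`, the institution names, the target field names, and `match_to_table` are all hypothetical.

```python
# Hypothetical mapping tables keyed by (loan subject, financial institution).
# Each table maps the scraper's field names to the institution's own names.
MAPPING_TABLES = {
    ("mortgage", "bank_a"): {"annual_income": "INCOME_YR", "property_price": "COLL_VAL"},
    ("credit", "bank_b"): {"annual_income": "yearly_income", "job_group": "occupation"},
}


def match_to_table(data, subject, institution):
    """Rename fields per the mapping table; list items the table wants but data lacks."""
    table = MAPPING_TABLES[(subject, institution)]
    matched = {target: data[src] for src, target in table.items() if src in data}
    missing = [src for src in table if src not in data]
    return matched, missing


scraped = {"annual_income": 50_000_000, "hobby": "golf"}
print(match_to_table(scraped, "mortgage", "bank_a"))
print(match_to_table(scraped, "credit", "bank_b"))
```

Fields that no table asks for (here `hobby`) are simply dropped for that institution, while missing fields are reported so the caller can decide whether to request further scraping.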

For example, when lease status is an evaluation item element, the scraping server 20 may automatically recognize keywords for parcel delivery and mail receipt information at the address in question, automatically analyze data (i.e., structured data and unstructured data) from shopping malls, post offices, and the like that can identify the actual resident or tenant, analyze the location, form, and structure of that data, extract and separate it, and provide it matched to the evaluation items of the financial institution.

As another example, for a person who is currently receiving the loan interest rate corresponding to credit grade 2, the scraping server 20 may analyze and use the AI scraping data to provide comparison information so that the loan interest rate and limit corresponding to grade 1 can be applied following an improvement in credit grade.

Meanwhile, for real estate appraisal value assessment, the scraping server 20 may take surrounding market prices, past appraised values of comparable properties, auction prices of nearby comparable properties, past appraisal records of appraisal firms, the financial institution's own past desk appraisals, asking prices quoted by nearby real estate brokerage offices, new pre-sale prices in the neighboring area, and the like as structured data and unstructured data for calculating the appraised value, and may analyze and reflect them to provide alternative information.

For verification of real estate rights analysis, the scraping server 20 may take records of established rights such as maximum-amount mortgages (근저당권), superficies, and leasehold rights, records of auction commencement decisions, provisional attachment decisions, and delivery order decisions, actual resident information, information on whether the property is actually leased, whether a double sale contract exists, and the like as structured data and unstructured data for the rights analysis, and may analyze and reflect them to provide alternative information.

For calculating changes in real estate value, the scraping server 20 may take records of recent extension or reconstruction, recent repair and maintenance records, records of recent fire, flooding, destruction, or damage, favorable development factors in the neighboring area, traffic conditions in the neighboring area, and the like as structured data and unstructured data for identifying factors of future value change, and may analyze and reflect them to provide alternative information.

According to the embodiment of the present invention described above, a Smart Meta database can be designed that accommodates and responds to all of the evaluation items and formats required by the credit evaluation information systems that each financial company operates differently.

In addition, structured data and unstructured data for evaluating the credit loans and secured loans of individuals, companies, and the like are collected and analyzed through artificial intelligence (AI) data scraping based on a machine learning algorithm, so that optimal loan conditions can be searched for and non-face-to-face loan applications become possible.

Furthermore, because the information items and formats that each data-holding institution builds differently can all be accommodated, optimal alternative information can be provided.

FIG. 3 is a flowchart illustrating a data scraping method according to an embodiment of the present invention.

First, the customer terminal 10 may connect to the financial institution server 30 through the network 50 and enter a loan application together with the desired loan details. At this time, the customer terminal 10 may enter detailed information related to the loan application (e.g., name, resident registration number, e-mail address, home address, home phone number, work address, work phone number, mobile phone number).

Accordingly, when the financial institution server 30 receives the loan application from the customer terminal 10, it provides the scraping server 20 with the input information related to that loan application (including the desired loan details and detailed information related to the application) and requests data scraping for the loan review evaluation.

When the scraping server 20 receives the data scraping request for the loan review evaluation from the financial institution server 30 (S10), it automatically analyzes the evaluation item elements from the input information supplied by that server, based on a machine learning algorithm. It then automatically extracts, based on a machine learning algorithm, the keywords that correspond to the analyzed evaluation item elements, and identifies which scraping sources hold the structured data and unstructured data that best match the extracted keywords. The scraping server 20 searches, based on a machine learning algorithm, whether the structured data that best matches the keywords exists as text or as an image in the identified scraping source. Likewise, the scraping server 20 searches, based on a machine learning algorithm, whether the unstructured data that best matches the keywords exists as text or as an image in the identified scraping source (S12).

Next, for each of the structured data and the unstructured data, the scraping server 20 identifies (analyzes), based on a machine learning algorithm, its location, form, and structure within the scraping source that was found, collects the identified structured data and unstructured data from the corresponding scraping source 40 in a batch, separates and extracts them, and analyzes and processes them based on a machine learning algorithm (S14).

Next, the scraping server 20 analyzes, based on a machine learning algorithm, whether the data from the collection unit 22 (i.e., the structured data and unstructured data) is appropriate data. That is, the scraping server 20 analyzes the adequacy, timeliness, and usability of the structured data and unstructured data from the collection unit 22, based on a machine learning algorithm (S16).

The scraping server 20 then matches the data (i.e., the structured data and unstructured data) that satisfies all of adequacy, timeliness, and usability to the mapping table for each loan subject and each financial institution (S18).

Thereafter, the scraping server 20 stores the matching result and transmits it to the financial institution server 30 through the network 50 (S20).
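The order of steps S10 through S20 can be summarized as a small orchestration sketch. The function `run_scraping` and the stand-in callables passed to it are hypothetical; each stage is injected as a parameter precisely because the patent leaves the underlying machine learning algorithms unspecified.

```python
def run_scraping(request, search, collect, analyze, match, send):
    """Run steps S12 to S20 for one scraping request (S10 is its receipt)."""
    found = search(request)        # S12: locate matching data in scraping sources
    collected = collect(found)     # S14: scrape, separate, and process
    accepted = analyze(collected)  # S16: adequacy, timeliness, usability
    result = match(accepted)       # S18: map to per-institution tables
    send(result)                   # S20: store and transmit the result
    return result


# Hypothetical stand-ins for the four units; each just tags the data it receives.
sent = []
result = run_scraping(
    {"applicant": "sample"},
    search=lambda req: ["found:" + req["applicant"]],
    collect=lambda found: [f.replace("found", "collected") for f in found],
    analyze=lambda items: [i for i in items if i.startswith("collected")],
    match=lambda items: {"bank_a": items},
    send=sent.append,
)
print(result)
```

The sketch omits the feedback path from analysis back to collection; adding it would mean looping over S14 and S16 until no record is flagged for re-collection.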

The data scraping method of the present invention described above can also be implemented as computer-readable code on a computer-readable recording medium. The computer-readable recording medium includes all types of recording devices that store data readable by a computer system. Examples of computer-readable recording media include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices. The computer-readable recording medium may also be distributed over computer systems connected through a network, so that the computer-readable code is stored and executed in a distributed manner. Functional programs, code, and code segments for implementing the method can be readily inferred by programmers in the technical field to which the present invention pertains.

As described above, an optimal embodiment has been disclosed in the drawings and the specification. Although specific terms have been used herein, they are used only for the purpose of describing the present invention and not to limit its meaning or the scope of the present invention set forth in the claims. Therefore, those of ordinary skill in the art will understand that various modifications and other equivalent embodiments are possible. Accordingly, the true technical scope of protection of the present invention should be determined by the technical spirit of the appended claims.

10: customer terminal 20: scraping server
21: search unit 22: collection unit
23: analysis unit 24: matching unit
30: financial institution server 40: scraping source
50: network

Claims

A data scraping apparatus comprising:
a search unit that searches, based on a machine learning algorithm, for structured data and unstructured data matching input information from the outside;
a collection unit that scrapes the retrieved structured data and unstructured data from a scraping source based on a machine learning algorithm;
an analysis unit that analyzes, based on a machine learning algorithm, whether the structured data and the unstructured data from the collection unit are appropriate data; and
a matching unit that matches the structured data and the unstructured data from the analysis unit to a mapping table for each evaluation subject and each financial institution.

The data scraping apparatus according to claim 1,
wherein the search unit,
when there is a scraping request from the outside, analyzes evaluation item elements from the input information from the outside based on a machine learning algorithm, extracts keywords corresponding to the analyzed evaluation item elements based on a machine learning algorithm, identifies which scraping source holds the structured data and unstructured data that best match the extracted keywords, searches based on a machine learning algorithm whether the structured data that best matches the keywords exists as text or as an image in the identified scraping source, and searches based on a machine learning algorithm whether the unstructured data that best matches the keywords exists as text or as an image in the identified scraping source.

The data scraping apparatus according to claim 1,
wherein the collection unit,
for each of the structured data and the unstructured data, identifies its location, form, and structure within the searched scraping source based on a machine learning algorithm, collects the identified structured data and unstructured data from the corresponding scraping source in a batch, separates and extracts them, and analyzes and processes them based on a machine learning algorithm.

The data scraping apparatus according to claim 1,
wherein the analysis unit
learns the structured data and the unstructured data collected by the collection unit with a machine learning algorithm and, if there is incorrectly collected data, feeds that fact back to the collection unit so that the data can be collected again.

The data scraping apparatus according to claim 1,
wherein the analysis unit
analyzes the adequacy, timeliness, and usability of the structured data and the unstructured data from the collection unit based on a machine learning algorithm.

The data scraping apparatus according to claim 5,
wherein the analysis unit
sends data that satisfies all of the adequacy, timeliness, and usability to the matching unit.

The data scraping apparatus according to claim 1,
wherein the matching unit
stores the matching result and outputs the matching result to a financial institution server.

A data scraping method performed in a data scraping apparatus, the method comprising:
searching, based on a machine learning algorithm, for structured data and unstructured data matching input information from the outside;
scraping the retrieved structured data and unstructured data from a scraping source based on a machine learning algorithm;
analyzing, based on a machine learning algorithm, whether the scraped structured data and unstructured data are appropriate data; and
matching the analyzed structured data and unstructured data to a mapping table for each evaluation subject and each financial institution.