KR102315181B1

KR102315181B1 - Method, apparatus and system for named entity linking and computer program thereof

Info

Publication number: KR102315181B1
Application number: KR1020190158194A
Authority: KR
Inventors: 손대능; 정유진; 이동주; 장영환; 김효승; 서대룡; 손훈석; 장성은; 백송이; 강인호; 치아키 코다마
Original assignee: 네이버 주식회사
Priority date: 2017-04-06
Filing date: 2019-12-02
Publication date: 2021-10-21
Also published as: KR102053419B1; KR20190138623A; KR20180113444A

Abstract

The present invention relates to a named entity linking method, apparatus, system, and computer program. More specifically, it discloses a method of performing named entity linking for a polysemous word contained in a text, in which a named entity linking apparatus performs: a word-vector conversion step of converting a plurality of words in the text, including the polysemous word, into a corresponding plurality of vectors; a neural network input step of inputting the plurality of vectors into a deep neural network having a hidden layer; and a named entity linking step of performing named entity linking for the polysemous word using the output value of the deep neural network.

Description

METHOD, APPARATUS AND SYSTEM FOR NAMED ENTITY LINKING AND COMPUTER PROGRAM THEREOF

The present invention relates to a named entity linking method, apparatus, system, and computer program, and more specifically to a method, apparatus, system, and computer program for performing named entity linking for a polysemous word located in a text.

Named entity linking refers to the task of determining, when a word or the like located in a given text can be interpreted in more than one way, in which sense it is used by considering its usage in that text, and accordingly linking it to a person, object, place, or the like (for example, determining whether the word "거미" (geomi) used in a text refers to the singer of that name or to the arthropod, a spider). Named entity linking can be an important component of search engines, dialogue systems, and other systems whose quality depends on how polysemous words are interpreted.

Conventionally, named entity linking has mainly been performed by computing features such as the words surrounding the target word in a given text and then applying a support vector machine (SVM), logistic regression, similarity analysis, or the like. With such prior art, however, the time required for entity linking may increase as the number of entity names to be distinguished grows, and it may become difficult to obtain accurate results within a given time. In particular, in a search engine or similar setting, where entity linking is not limited to a small number of specific words and many polysemous words may be used, the number of entity names to be distinguished can grow from the hundreds to the tens of thousands or more, so that the accuracy of entity linking with the prior art drops sharply.

Republic of Korea Patent Registration No. 10-1255957 (2013.4.24.)

The present invention was devised to solve the problems of the prior art described above. Its object is to provide a named entity linking method, apparatus, system, and computer program that can perform accurate named entity linking for polysemous words located in a text and, further, can do so while using computing resources more efficiently.

A named entity linking method according to one aspect of the present invention for solving the above problem is a method of performing named entity linking for a polysemous word contained in a text, characterized in that a named entity linking system performs: a word-vector conversion step of converting a plurality of words in the text, including the polysemous word, into a corresponding plurality of vectors; a neural network input step of inputting the plurality of vectors into a deep neural network having a hidden layer; and a named entity linking step of performing named entity linking for the polysemous word using the output value of the deep neural network.

A computer program according to another aspect of the present invention is characterized in that it is stored in a computer-readable medium in order to cause a computer to execute each step of the named entity linking method described above.

A named entity linking system according to yet another aspect of the present invention is a system for performing named entity linking for a polysemous word contained in a text, characterized in that it comprises: a word-vector conversion unit that converts a plurality of words in the text, including the polysemous word, into a corresponding plurality of vectors; a neural network input unit that inputs the plurality of vectors into a deep neural network having a hidden layer; and a named entity linking unit that performs named entity linking for the polysemous word using the output value of the deep neural network.

According to an embodiment of the present invention, it is possible to provide a named entity linking method, apparatus, system, and computer program that can perform accurate named entity linking for polysemous words located in a text and, further, can do so while using computing resources more efficiently.

The accompanying drawings, which are included as part of the detailed description to aid understanding of the present invention, provide embodiments of the present invention and, together with the detailed description, explain its technical idea.
FIG. 1 is a diagram illustrating the configuration of a named entity linking system according to an embodiment of the present invention.
FIG. 2 is a diagram for explaining the concept of named entity linking according to an embodiment of the present invention.
FIGS. 3 and 4 are exemplary views of a user terminal screen in the named entity linking system according to an embodiment of the present invention.
FIG. 5 is an exemplary view of web documents to which entity names are linked according to an embodiment of the present invention.
FIG. 6 is a flowchart of a named entity linking method according to an embodiment of the present invention.
FIG. 7 is a diagram for explaining the operation of the deep neural network in the named entity linking system according to an embodiment of the present invention.
FIG. 8 is a diagram for explaining the operation of the N-gram convolution filter in the named entity linking system according to an embodiment of the present invention.
FIG. 9 is a diagram for explaining the dynamic data sampling technique in the named entity linking system according to an embodiment of the present invention.
FIG. 10 is a diagram for explaining the static policy-tailored loss function technique in the named entity linking system according to an embodiment of the present invention.
FIG. 11 is a diagram for explaining the negative sampling technique for general entities in the named entity linking system according to an embodiment of the present invention.
FIG. 12 is a diagram for explaining the operation of the named entity linking system according to an embodiment of the present invention.
FIG. 13 is a diagram for explaining the operation of the automatic training corpus construction module according to an embodiment of the present invention.
FIG. 14 is a diagram for explaining the functional configuration of the named entity linking system according to an embodiment of the present invention.
FIG. 15 is a diagram for explaining the distance-based word-vector model and post-processing of the named entity linking system according to an embodiment of the present invention.
FIG. 16 is a block diagram of the named entity linking system according to an embodiment of the present invention.

The present invention may be modified in various ways and may have various embodiments; specific embodiments are described in detail below with reference to the accompanying drawings.

In describing the present invention, detailed descriptions of related known technology are omitted where they are judged likely to obscure the gist of the invention.

Terms such as "first" and "second" may be used to describe various components, but the components are not limited by these terms; the terms are used only to distinguish one component from another.

Hereinafter, exemplary embodiments of the named entity linking method, apparatus, system, and computer program according to the present invention are described in detail with reference to the accompanying drawings.

First, FIG. 1 is a diagram for explaining the operation of a named entity linking system 100 according to an embodiment of the present invention. As shown in FIG. 1, the named entity linking system 100 is connected to a database 30 or the like and, for documents stored there (for example, various electronic documents including blogs, online community (cafe) posts, and news articles), performs named entity linking for polysemous words located in text comprising one or more sentences contained in those documents.

As a more specific example, suppose that a particular electronic document stored in the database 30 contains, as shown in FIG. 2, the sentence "Today I went to the store with my daughter and picked out a 'One Piece'" (in Korean, '원피스'). Depending on its usage, the word 'One Piece' may be the title of the manga 'One Piece', or it may denote a one-piece dress, a type of clothing. The named entity linking system 100 according to an embodiment of the present invention determines whether 'One Piece' in the sentence refers to the manga or to the clothing, and performs named entity linking for the polysemous word ('One Piece') in the sentence. Accordingly, an entity name such as 'One Piece_clothing' can be linked to 'One Piece' in the sentence and, further, stored.

In addition, the named entity linking system 100 according to an embodiment of the present invention can provide services to users that reflect the entity linking results. As a more specific example, as shown in FIG. 3, when the user of a terminal 10a wishes to search for the manga 'One Piece' (for example, by entering a search term that specifies one particular sense of the polysemous word in the query format supported by the search service, such as One Piece (manga) or One Piece (clothing); 310 in FIG. 3), the system provides search results that take the intended sense of the polysemous search term into account (320 in FIG. 3), thereby helping the user obtain the desired search results more conveniently.

Looking in more detail at the named entity linking system 100 and its related components with reference to FIG. 1, the system may be configured to include one or more servers. It may also be implemented in various other ways, for example using dedicated hardware for performing named entity linking, or as one or more terminal devices operating in conjunction.

In addition, various portable terminals such as smartphones, tablet PCs, PDAs, and mobile phones may be used as the terminals 10a and 10b, and various other terminals such as personal computers (PCs) and notebook PCs may also be employed.

In addition, the communication network 30 connecting the terminals 10a and 10b to the named entity linking system 100 may include wired and wireless networks; specifically, it may include various communication networks such as a local area network (LAN), a metropolitan area network (MAN), and a wide area network (WAN). The communication network 30 may also include the well-known World Wide Web (WWW). However, the communication network 30 according to the present invention is not limited to the networks listed above, and may also include various other networks such as a known wireless data network, a known telephone network, or a known wired or wireless television network.

To describe the operation of the named entity linking system 100 according to an embodiment of the present invention with a more specific example: when a text contains the polysemous word "Jimin", that word may, depending on its usage, refer to "Jimin" of the group AOA, to "Jimin" of the group BTS, or to some other "Jimin". Likewise, when a text contains the polysemous word "One Piece", it may, depending on its usage, be the title of the manga 'One Piece' or denote a one-piece dress, a type of clothing. The named entity linking system 100 determines which person "Jimin" refers to in the given text, or whether "One Piece" in the given text refers to the manga or to the clothing, and performs named entity linking for the polysemous word used in the text.

Accordingly, the named entity linking system 100 according to an embodiment of the present invention can link entity names such as "Jimin_singerAOA" and "One Piece_manga" to polysemous words such as "Jimin" and "One Piece" in a given text and store them, and can further provide users with various services, such as a search service, that reflect the entity linking results.

Furthermore, the named entity linking system 100 according to an embodiment of the present invention may provide an interface that presents a list of senses so that the user can enter a polysemous word with its sense specified. For example, as shown in FIG. 4, when the user enters the polysemous word "Jimin" or "One Piece", the system may present a list of that word's senses and let the user select one (410 and 420 in FIG. 4).

As another example, when a user wishes to search for the manga 'One Piece', the user may be allowed to enter a search term that specifies one particular sense of the polysemous word in the query format supported by the search service (for example, One Piece (manga) or One Piece (clothing)). The named entity linking system 100 according to an embodiment of the present invention then provides search results that take the intended sense of the polysemous search term into account, helping the user obtain the desired search results more conveniently.

Accordingly, as shown in FIG. 5, the named entity linking system 100 according to an embodiment of the present invention identifies in which sense the same word, 'One Piece', is used in each text (for example, as One Piece (clothing) in FIG. 5(a) and as One Piece (manga) in FIG. 5(b)), and can thereby provide the information the user wants more accurately.

Although the present invention is described using examples in which a single word such as "Jimin" or "One Piece" has multiple senses, the invention is not necessarily limited to this and also applies when a sequence of two or more words has multiple senses (for example, "Winter Sea", which may be a song title or the sea in a particular season). Accordingly, the term "word" as used in the present invention may also cover a phrase comprising two or more words.

FIG. 6 shows a flowchart of a named entity linking method according to an embodiment of the present invention. As shown in FIG. 6, the method performs named entity linking for a polysemous word contained in a text and may include: a word-vector conversion step (S110) in which the named entity linking system 100 converts a plurality of words in the text, including the polysemous word, into a corresponding plurality of vectors; a neural network input step (S120) of inputting the plurality of vectors into a deep neural network having a hidden layer; and a named entity linking step (S130) of performing named entity linking for the polysemous word using the output value of the deep neural network.

Below, the named entity linking method according to an embodiment of the present invention is examined in detail, step by step, with reference to FIG. 6.

First, in step S110, the named entity linking system 100 converts a plurality of words in the text, including the polysemous word, into a corresponding plurality of vectors. Here, the named entity linking system 100 according to an embodiment of the present invention may select a plurality of words within a predetermined distance of the polysemous word in the text and then compute a vector for each of those words.

As a more specific example, FIG. 7 considers linking an entity name to the polysemous word "One Piece" (원피스) contained in the text "In Chief Kim, Jeong Hae-seong (as Hong Ga-eun) looks lovely, matching a lace blouse with a tweed color-block One Piece" (710 in FIG. 7), taken from an article about the drama <Chief Kim>. The text is first segmented into morphemes, and a plurality of words within a predetermined distance of "One Piece" are extracted (720 in FIG. 7): for example, the four tokens before it, "블라우스" (blouse), "와" (and), "트위드" (tweed), and "배색" (color-block), and the four tokens after it, "를" (object particle), "매치" (match), "해" (do), and "러블리" (lovely). The words are then converted into vectors of a predetermined dimension. For example, as shown at 731 in FIG. 7, a word-to-vector conversion (word2vec) function may convert the word "블라우스" into a 128-dimensional vector (0.0, 0.0, 0.1, 0.3, 0.8, ...), the word "와" into a 128-dimensional vector (0.3, 0.2, 0.5, 0.7, 1.2, ...), and so on, and these vectors are input to the embedding layer. The dimension of the vectors may be determined in consideration of the accuracy of entity linking for the polysemous word, the time required for entity linking, computing resources, and the like.
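The window selection and vector lookup of step S110 can be sketched as follows. This is a minimal illustration, not the patented implementation: `context_window` and `toy_embedding` are hypothetical names, the token list assumes the morpheme segmentation has already been done, and the hash-based embedding merely stands in for a trained word2vec table (with 8 dimensions instead of 128).

```python
import hashlib

def context_window(tokens, target_index, radius=4):
    """Return the target token plus up to `radius` tokens on each side."""
    start = max(0, target_index - radius)
    end = min(len(tokens), target_index + radius + 1)
    return tokens[start:end]

def toy_embedding(token, dim=8):
    """Deterministic placeholder for a trained word2vec lookup (assumption)."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()  # 32 bytes
    return [byte / 255.0 for byte in digest[:dim]]

# Morphemes of the example sentence around the target "원피스" (index 5).
tokens = ["레이스", "블라우스", "와", "트위드", "배색", "원피스",
          "를", "매치", "해", "러블리", "한"]
window = context_window(tokens, target_index=5, radius=4)
vectors = [toy_embedding(tok) for tok in window]
```

With a radius of 4 the window holds nine tokens, from "블라우스" to "러블리", with the polysemous word in the middle.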

Next, in step S120, the plurality of vectors is input into a deep neural network having a hidden layer (733 in FIG. 7). Here, the converted vectors (M of them) may be grouped several at a time and reduced by a predetermined operation to N vectors (732 in FIG. 7), where M > N, before being input to the deep neural network.

That is, in the present invention the vectors that have been converted and input to the embedding layer are passed to the hidden layer for processing (732 in FIG. 7). At this point, an N-gram convolution filter 732 or the like mixes the M input vectors through a convolution operation and reduces them to N vectors, which can greatly improve the training and decoding speed and efficiency of the deep neural network.

That is, as shown in FIG. 8, compared with the case where no N-gram convolution filter 732 is used (FIG. 8(a)), when the filter is used (FIG. 8(b)), word 1 and word 2 are mixed, the polysemous word to be tagged and word 4 are mixed, and so on. The input dimension can thus be cut roughly in half, and training and decoding can accordingly become about twice as fast, greatly improving efficiency.

As a more specific example, if the 12 words on each side of the polysemous word are extracted from the text (the polysemous word + 12 words before and 12 words after = 25 words) and each word is converted into a 128-dimensional vector, the deep neural network must receive and process 128 x 25 = 3,200 input values. By contrast, as shown in FIG. 8, reducing the plurality of words (M) to N words through a convolution operation or the like effectively reduces the time and computing resources required for training and decoding.
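The mixing described above can be sketched as a non-overlapping bigram filter that combines each pair of adjacent word vectors into one. This is an illustration under stated assumptions: the function name `ngram_mix` is hypothetical, and the fixed scalar weights stand in for filter parameters that would be learned in a real network.

```python
def ngram_mix(vectors, weights):
    """Mix each run of len(weights) adjacent vectors into one weighted sum.

    Windows do not overlap, so M inputs yield ceil(M / n) outputs.
    A shorter trailing window is mixed as-is, which is equivalent
    to padding the sequence with zero vectors.
    """
    n = len(weights)
    dim = len(vectors[0])
    mixed = []
    for start in range(0, len(vectors), n):
        out = [0.0] * dim
        for w, vec in zip(weights, vectors[start:start + n]):
            for i, x in enumerate(vec):
                out[i] += w * x
        mixed.append(out)
    return mixed

# 25 words (target + 12 on each side), each a 128-dimensional vector.
word_vectors = [[float(k)] * 128 for k in range(25)]
mixed = ngram_mix(word_vectors, weights=[0.5, 0.5])

inputs_before = len(word_vectors) * 128  # 25 * 128 = 3200
inputs_after = len(mixed) * 128          # 13 * 128 = 1664
```

Under these assumptions the 25 word vectors become 13, so the network receives 1,664 values instead of 3,200, roughly the halving described for FIG. 8.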

Finally, in step S130, named entity linking is performed for the polysemous word using the output value of the deep neural network.

Here, the deep neural network may be a network trained on a plurality of groups of texts in which the polysemous word is used in different senses, so that it learns to distinguish and output the sense in which the polysemous word is used in a text.

In addition, information that has passed through the hidden layer may pass through a neuron (activation) function such as a ReLU (Rectified Linear Unit), after which the entity linking result is output through the output layer.

Accordingly, as shown at 734 in FIG. 7, the output layer of the deep neural network may be structured as a plurality of nodes, one for each distinct sense of each polysemous word. For example, the output layer may include separate nodes for the polysemous word "One Piece" used as "One Piece (clothing)" and used as "One Piece (manga)". Thus, when the text "matching a lace blouse with a tweed color-block One Piece, looks lovely" (710 in FIG. 7) is input, the output value of the "One Piece (clothing)" node (0.8) is higher than that of the "One Piece (manga)" node (0.2), as shown at 734 in FIG. 7. The system therefore determines that the polysemous word "One Piece" is used in the text in the sense of "One Piece (clothing)" and performs named entity linking for it accordingly.
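The decision in step S130 amounts to choosing the sense node with the highest output value. A minimal sketch, using the output values given for FIG. 7 (the function name `link_entity` is an assumption introduced for illustration):

```python
def link_entity(scores):
    """Return the sense whose output node has the highest value."""
    return max(scores, key=scores.get)

# Output-layer values for the example text (734 in FIG. 7).
output_layer = {"One Piece (clothing)": 0.8, "One Piece (manga)": 0.2}
linked = link_entity(output_layer)
```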

Accordingly, the named entity linking system 100 according to an embodiment of the present invention 1) converts the words surrounding the tagging target into distributed energy vector values and inputs them to the embedding layer; 2) may mix and compress the input vector values through a convolution operation or the like, so as to reduce the amount of computation required for processing; 3) multiplies the compressed vectors by the hidden layer and applies non-linear activation with the ReLU function; and 4) multiplies the result by an output layer having a plurality of nodes, one per distinct sense of each polysemous word, and finally determines from the output layer values which sense the polysemous word being tagged represents.
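Steps 3) and 4) above can be sketched as two matrix multiplications with a ReLU in between. The sizes and weights below are toy values chosen by hand so the arithmetic is easy to follow; they are assumptions for illustration, not trained parameters, and the helper names are hypothetical.

```python
def relu(values):
    """Rectified Linear Unit: negative activations become zero."""
    return [max(0.0, v) for v in values]

def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def forward(compressed_vectors, hidden_weights, output_weights):
    """Flatten the compressed vectors, apply hidden layer + ReLU, then the output layer."""
    flat = [x for vec in compressed_vectors for x in vec]
    hidden = relu(matvec(hidden_weights, flat))
    return matvec(output_weights, hidden)

# Toy sizes: 2 compressed vectors of dimension 2 -> 3 hidden units -> 2 sense nodes.
compressed = [[1.0, 0.0], [0.0, 1.0]]
hidden_w = [[1.0, 0.0, 0.0, 1.0],
            [-1.0, 0.0, 0.0, -1.0],
            [0.0, 1.0, 1.0, 0.0]]
output_w = [[0.5, 0.0, 0.0],
            [0.0, 1.0, 1.0]]

scores = forward(compressed, hidden_w, output_w)
senses = ["One Piece (clothing)", "One Piece (manga)"]
predicted = senses[scores.index(max(scores))]
```

Here the second hidden unit receives a negative pre-activation and is zeroed by the ReLU, so only the first sense node receives a non-zero score.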

Furthermore, additional techniques for optimizing the performance of the named entity linking system 100 may also be applied in the present invention.

First, the named entity linking system 100 according to an embodiment of the present invention can mitigate a problem that arises when named entity linking is performed with a neural network according to the prior art: because the training data are unevenly distributed, entity names tend to be linked to whichever sense of a polysemous word has the most training data.

That is, as shown in table 910 of FIG. 9, the amount of training data may differ greatly across the senses of a polysemous word (One Piece_(clothing): 51,099 examples >> One Piece_(manga): 5,313 examples; One Piece_(music group): 1,672 examples). In that case, linking may be biased toward the sense with the most training data (here, One Piece_(clothing)), and accuracy drops for the senses with little training data (here, One Piece_(manga) and One Piece_(music group)).

To address this, as shown in equation 920 of FIG. 9, the named entity linking system 100 according to an embodiment of the present invention dynamically corrects the sampling of training data for a given polysemous word so that the sampling share increases for senses on which performance, such as linking accuracy, is poor. This reduces the unevenness of the training data distribution across senses and effectively mitigates the bias toward majority entities caused by frequency differences among entities in the training data.
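
Equation 920 itself is not reproduced in this text, so the following is only a hedged sketch of one way such a dynamic correction could work, not the patent's formula. It assumes that each sense's sampling share is made proportional to its current error rate, with a small floor so that well-learned senses are still sampled. The counts come from table 910; the accuracy values, the `floor` parameter, and the function name are hypothetical.

```python
COUNTS = {
    "One Piece (clothing)": 51099,
    "One Piece (manga)": 5313,
    "One Piece (music group)": 1672,
}

def sense_sampling_distribution(accuracy, floor=0.05):
    """Give senses with lower current accuracy a larger share of sampled examples.

    Assumption: share is proportional to max(error rate, floor).
    """
    raw = {sense: max(1.0 - acc, floor) for sense, acc in accuracy.items()}
    total = sum(raw.values())
    return {sense: weight / total for sense, weight in raw.items()}

# Hypothetical per-sense accuracy measured after some training epoch.
accuracy = {
    "One Piece (clothing)": 0.97,
    "One Piece (manga)": 0.80,
    "One Piece (music group)": 0.60,
}
corrected = sense_sampling_distribution(accuracy)

# For comparison: the share each sense would get from raw frequency alone.
total_count = sum(COUNTS.values())
raw_share = {sense: count / total_count for sense, count in COUNTS.items()}
```

Recomputing `corrected` after each evaluation pass is what would make the correction dynamic in this sketch.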

In addition, as shown in FIG. 10, the named entity linking system 100 according to an embodiment of the present invention computes the ranking by comparing only the candidate entities to which the tagging-target polysemous word can be linked. This effectively mitigates the degradation of training quality caused by a very large number of output entity classes.

More specifically, as shown at output layer 1010 in FIG. 10, the output layer of the named entity linking system 100 according to an embodiment of the present invention comprises a plurality of nodes corresponding to the different senses of each polysemous word (for example, One Piece (manga), One Piece (clothing), Jimin (AOA), Jimin (BTS), Nabi (singer), Nabi (animal), and so on, where "Nabi" (나비) also means "butterfly"). When training on the polysemous word "One Piece", the result is determined by comparing only the candidate entities in the output layer to which "One Piece" can be linked (One Piece (manga) and One Piece (clothing)). This effectively suppresses the quality degradation caused by the many unrelated output entities (Jimin (AOA), Jimin (BTS), Nabi (singer), Nabi (animal), and so on).
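
The candidate-only comparison can be illustrated with a softmax restricted to the nodes of one surface form. This is a minimal sketch under assumptions: the logit values are made up, the candidate table is hypothetical, and softmax is assumed as the normalization.

```python
import math

OUTPUT_NODES = [
    "One Piece (manga)", "One Piece (clothing)",
    "Jimin (AOA)", "Jimin (BTS)",
    "Nabi (singer)", "Nabi (animal)",
]
# Hypothetical lookup from a surface form to the indices of its candidate nodes.
CANDIDATES = {"One Piece": [0, 1], "Jimin": [2, 3], "Nabi": [4, 5]}

def rank_candidates(logits, surface_form):
    """Normalize and rank only the output nodes that the surface form can link to."""
    indices = CANDIDATES[surface_form]
    peak = max(logits[i] for i in indices)
    exps = {i: math.exp(logits[i] - peak) for i in indices}
    total = sum(exps.values())
    ranked = sorted(indices, key=lambda i: exps[i], reverse=True)
    return [(OUTPUT_NODES[i], exps[i] / total) for i in ranked]

# Made-up logits: "Jimin (AOA)" scores highest overall but is not a candidate.
logits = [0.3, 1.7, 4.0, 2.5, -1.0, 0.2]
ranking = rank_candidates(logits, "One Piece")
best_entity = ranking[0][0]
```

Because unrelated nodes never enter the normalization, a high score on "Jimin (AOA)" cannot affect the decision or the loss for "One Piece" in this sketch.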

More specifically, as shown in equation 1020 of FIG. 10, an f1_prior function, which is a non-differential function (such as a convex function) capable of reflecting the F1 measure, is applied so that loss propagation is decreased or increased depending on whether prediction of the same entity for a polysemous word succeeds or fails. This effectively improves the accuracy of named entity linking and yielded an improvement of 1% or more in Macro F1.

In addition, the named entity linking system 100 according to an embodiment of the present invention can mitigate another problem of prior-art neural-network-based named entity linking: accuracy may drop when linking polysemous words that can also be used as common-sense entities, such as common nouns, verbs, or adjectives.

To address this, as shown in FIG. 11, the named entity linking system 100 according to an embodiment of the present invention defines common-sense entities such as common nouns, verbs, and adjectives as a class representing entities that do not need named entity linking, and applies negative sampling. This effectively suppresses linking errors for common-sense entities.

Here, a common-sense entity in the present invention need not include all of common nouns, verbs, and adjectives; it may be defined as including one or more of them. The named entity linking system 100 according to an embodiment of the present invention therefore broadly covers cases in which the deep neural network is trained with training data for common-sense entities that are common nouns, verbs, or adjectives.

That is, as shown at 1110 in FIG. 11, the aim is to prevent the linking error of misrecognizing a common-sense entity such as a common noun (the Korean word "공유", meaning "sharing", as in "sharing skills and experience") as a proper noun (the actor "공유", Gong Yoo). Applying negative sampling to common-sense entities effectively prevents such linking errors for common nouns and other common-sense entities.

For example, to prevent named entity linking for common nouns, verbs, adjectives, and the like that should not be tagged, such entities may be defined as a separate class and trained, and words classified into that class are not linked. In particular, training the model to classify words that frequently cause linking errors into the separate class prevents tagging errors efficiently.
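
A minimal sketch of the separate-class idea follows. It assumes that negative examples are simply contexts labeled with an extra class, and that a word is left unlinked when that class scores highest; the label name `NO_LINK`, the helper names, the example contexts, and the scores are all hypothetical.

```python
NO_LINK = "NO_LINK"  # hypothetical label for the "linking not needed" class

def make_training_examples(linked, common_sense):
    """Label entity contexts with their entity and common-sense contexts with NO_LINK."""
    examples = [(context, entity) for context, entity in linked]
    examples += [(context, NO_LINK) for context in common_sense]
    return examples

def link_or_skip(scores):
    """Return the best-scoring entity, or None when the NO_LINK class wins."""
    best = max(scores, key=scores.get)
    return None if best == NO_LINK else best

examples = make_training_examples(
    linked=[("actor Gong Yoo stars in the new film", "Gong Yoo (actor)")],
    common_sense=["sharing skills and experience"],  # negative example
)
# Hypothetical classifier scores for the common-noun usage.
decision = link_or_skip({"Gong Yoo (actor)": 0.15, NO_LINK: 0.85})
```

Here `decision` is `None`, so the common-noun usage is left untagged.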

Furthermore, in a conventional deep neural network (DNN) model, linking entity names for many polysemous words can greatly increase the amount of computation, such as multiplications, and slow down processing. The named entity linking system 100 according to an embodiment of the present invention may instead compress the input embedding layer, for example with an N-gram convolution filter, to reduce the amount of computation and thereby improve the processing speed of named entity linking.

FIG. 12 illustrates the operation of the named entity linking system 100 according to an embodiment of the present invention. As shown in FIG. 12, the system 100 may include a named entity linking engine 1220, and the engine 1220 may include a deep neural network (DNN) model decoder and a word-vector model decoder. By using the DNN model decoder together with words converted into vector form (word to vector), the present invention can perform accurate named entity linking while using computing resources more efficiently.

Furthermore, the named entity linking engine 1220 according to an embodiment of the present invention may further include a manual tagging issue response module to compensate for linking errors that may arise in the DNN model decoder or the word-vector model decoder. This enables a fast and effective response to errors that may appear, for example, during training.

Accordingly, the named entity linking engine 1220 receives a text 1210 containing one or more polysemous words and can effectively perform named entity linking for them (1250).

More specifically, when named entity linking is performed with prior-art techniques such as a support vector machine (SVM), logistic regression, or similarity analysis, the required time increases and accuracy relative to time drops as the number of entity names to be distinguished grows. The named entity linking system 100 according to an embodiment of the present invention performs linking using the DNN model decoder together with words converted into vector form (word to vector), which effectively suppresses both the increase in required time and the drop in accuracy even as the number of entity names to be distinguished grows.

More specifically, conventional techniques assigned an entity ID to each word based on whether words that describe each entity well were present in the surrounding context, together with their weight values. In that case, when no word corresponding to an entity is present (an unseen word), it becomes difficult to decide on that entity, and it is not feasible to collect words related to the entity indefinitely to compensate. In addition, when linking relies simply on a sum of word weights, correlations and dependencies between words cannot be adequately expressed, which can reduce the coverage or quality of named entity linking.

To address this, the present invention introduces a deep neural network (DNN) model and, before training the DNN model, learns from a large volume of news articles, blogs, and the like using word-to-vector conversion (word2vec) to obtain energy vector representations for a large number of words. This markedly reduces hard-to-judge cases, such as a word that is missing from the model dictionary (an unseen word).

In addition, the named entity linking system 100 according to an embodiment of the present invention can automatically generate the training corpus needed to train the DNN model decoder by using an automatic training corpus construction module 1230, making it possible to effectively build a DNN model decoder with higher accuracy.

More specifically, FIG. 13 illustrates the operation of the training corpus construction module 1230 according to an embodiment of the present invention. As shown in FIG. 13, the training corpus construction module 1230 parses metadata, such as keywords corresponding to a polysemous word used in different senses, from a previously prepared database such as Wikipedia (1310).

For example, as shown in FIG. 13, parsing metadata for the polysemous word "Yeoreum" (여름, which also means "summer") yields metadata for "Yeoreum" the "footballer" of "Sangju Sangmu Professional Football Club", for "Yeoreum" the "singer" who is a member of the group "Cosmic Girls" (우주소녀), and so on.

Next, the training corpus construction module 1230 uses the generated metadata to generate a search expression for each entity, for example through rule-based query generation (1320).

For example, suitable rules such as name plus occupation, or name plus affiliated organization, may be applied. Accordingly, for "Yeoreum" the "footballer", a search expression containing search terms such as "Yeoreum, Gwangju FC, footballer, NOT(singer)" can be generated, and for "Yeoreum" the "singer", a search expression containing search terms such as "Yeoreum, Cosmic Girls, Starship Entertainment, NOT(player)" can be generated.
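
The rule-based query generation described above can be sketched as follows. The metadata field names (`name`, `terms`, `exclude`) and the function name are hypothetical, and the search terms are English renderings of the Korean examples; the actual rule set and query syntax are not specified in this text.

```python
def build_query(metadata):
    """Join the entity name, its describing terms, and NOT(...) exclusions into one query."""
    include = [metadata["name"], *metadata["terms"]]
    exclude = [f"NOT({term})" for term in metadata["exclude"]]
    return ", ".join(include + exclude)

# Hypothetical metadata records, as they might come out of the parsing step (1310).
footballer = {
    "name": "Yeoreum",
    "terms": ["Gwangju FC", "footballer"],   # affiliation and occupation
    "exclude": ["singer"],                   # occupation of the competing sense
}
singer = {
    "name": "Yeoreum",
    "terms": ["Cosmic Girls", "Starship Entertainment"],
    "exclude": ["player"],
}

queries = {
    "Yeoreum (footballer)": build_query(footballer),
    "Yeoreum (singer)": build_query(singer),
}
```

Documents returned for each query could then be labeled with the corresponding sense to form training data, as the following step (1330) describes.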

Accordingly, the training corpus construction module 1230 searches documents such as web pages, news articles, and blogs with each generated search expression and can thereby generate training data for each sense of the polysemous word, such as "Yeoreum" the "footballer" and "Yeoreum" the "singer" (1330).

In this way, the training corpus construction module 1230 according to the present invention effectively generates a large amount of training data for many polysemous words, making it possible to effectively build a DNN model decoder with high accuracy.

Furthermore, FIG. 14 describes the configuration of the named entity linking engine 1410 according to an embodiment of the present invention in more detail.

As shown in FIG. 14, the named entity linking engine 1410 according to an embodiment of the present invention may include a tokenizer 1411, a deep neural network (DNN) model 1412, a distance-based word-vector model 1413, a post-processing unit 1414, and a resource unit 1415.

The tokenizer 1411 analyzes the given text and splits it into units such as morphemes.

Next, the DNN model 1412 receives the resulting morphemes and the like and performs named entity linking for the polysemous words used in the text.

In addition, when converting a first word in the text into a vector, the distance-based word-vector model 1413 computes the vector for the first word in consideration of a second word contained in the text and the distance to that second word.

In addition, the post-processing unit 1414 post-processes the results produced by the DNN model 1412 and the distance-based word-vector model 1413.

In the named entity linking system 100 according to an embodiment of the present invention, entities that are difficult to classify with the DNN model 1412 may instead be linked using the distance-based word-vector model 1413 together with a pre-built database 1415 or the like.

More specifically, FIG. 15 describes the operation of the distance-based word-vector model 1413 and the post-processing unit 1414 in more detail.

As shown in FIG. 15, the distance-based word-vector model 1413 according to an embodiment of the present invention can use, for a linking-target word in a given text, its relationship with the surrounding words and their distance within the sentence. Specifically, suppose an actress named "Jung Hye-sung" (정혜성) appears in a drama titled "Chief Kim" (김과장) and plays a character named "Hong Ga-eun" (홍가은). When "Chief Kim" is analyzed, the fact that the words "Jung Hye-sung" and "Hong Ga-eun" appear nearby indicates that it refers to the drama "Chief Kim" rather than to "Manager Kim" (also 김과장), a job title within a company.

Here, the distance between words may be used as a weight. Likewise, when "Jung Hye-sung" is analyzed, the preceding "Chief Kim" and the following "Hong Ga-eun" can be taken into account to determine that "Jung Hye-sung" refers to the actress Jung Hye-sung. With this method, "Hong Ga-eun" can be tagged as a character even if the word "Hong Ga-eun" has not been registered or learned.

As a more specific example, as shown in FIG. 15, when named entity linking is performed for the polysemous word "Jung Hye-sung" in a given text 1510, the distance-based word-vector model can use a pre-built database 1520 or the like to obtain the keywords corresponding to the drama "Chief Kim" ("Chief Kim", "Jung Hye-sung", "Hong Ga-eun", and so on in FIG. 15). Taking the linking-target word "Chief Kim" in the text 1510 as the reference, it then assigns weights according to the distances to "Jung Hye-sung", "Hong Ga-eun", and so on, and can determine that "Chief Kim" refers to the drama "Chief Kim" rather than to "Manager Kim", a job title within a company.
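
The distance weighting in this example can be sketched with a simple keyword score. This is an illustrative simplification, not the patented model: it assumes keyword sets per candidate entity (standing in for the pre-built database), an inverse-distance weight of 1/distance, and a toy token sequence; all names are hypothetical.

```python
def distance_weighted_score(tokens, target_index, keywords):
    """Sum 1/distance over context tokens that are keywords of a candidate entity."""
    score = 0.0
    for i, token in enumerate(tokens):
        if i != target_index and token in keywords:
            score += 1.0 / abs(i - target_index)  # assumption: closer words weigh more
    return score

def disambiguate(tokens, target_index, candidate_keywords):
    """Score every candidate entity and return the best one with all scores."""
    scores = {
        entity: distance_weighted_score(tokens, target_index, keywords)
        for entity, keywords in candidate_keywords.items()
    }
    return max(scores, key=scores.get), scores

# Hypothetical stand-in for the pre-built database of entity keywords.
CANDIDATE_KEYWORDS = {
    "Chief Kim (drama)": {"Jung Hye-sung", "Hong Ga-eun"},
    "Manager Kim (job title)": {"company", "promotion"},
}
tokens = ["Chief Kim", "Jung Hye-sung", "plays", "Hong Ga-eun"]
best, scores = disambiguate(tokens, 0, CANDIDATE_KEYWORDS)
```

In this toy input the drama sense scores 1/1 + 1/3 and the job-title sense scores 0, so the drama is selected.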

Furthermore, named entity linking for "One Piece", another polysemous word in the text 1510, may be performed using the DNN model 1530. The distance-based word-vector model 1413 and the DNN model 1530 may be applied selectively, taking into account the accuracy of each model for the linking-target word and the time and computing resources required for linking. If necessary, the results of both models may be considered together to further improve accuracy.

Next, the post-processing unit 1414 performs named entity linking for the polysemous word (1550), for example by considering one of the results of the distance-based word-vector model 1413 and the neural network model 1530, or by computing both and comparing them (1540).
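
One possible form of this post-processing is sketched below. The source does not specify how the two results are compared, so this sketch assumes each model returns an (entity, confidence) pair on a comparable scale and that the more confident result wins; the function name and sample values are hypothetical.

```python
def post_process(dnn_result=None, distance_result=None):
    """Pick a final entity from whichever model results are available.

    Each result is an (entity, confidence) pair or None.
    Assumption: confidences from the two models are directly comparable.
    """
    available = [r for r in (dnn_result, distance_result) if r is not None]
    if not available:
        return None
    return max(available, key=lambda result: result[1])[0]

# Only one model produced a result: use it as is.
single = post_process(distance_result=("Chief Kim (drama)", 0.7))
# Both models produced a result: compare and keep the more confident one.
both = post_process(("One Piece (clothing)", 0.8), ("One Piece (manga)", 0.6))
```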

Accordingly, the named entity linking system 100 according to an embodiment of the present invention can achieve several hundred TPS per core even on a CPU, can tag entities on the order of tens of millions, and can achieve an F1 score of 88% or higher.

In addition, a computer program according to another aspect of the present invention is a computer program stored in a computer-readable medium for causing a computer to execute each step of the named entity linking method described above. The computer program may be a program containing machine code produced by a compiler, or a program containing high-level language code that can be executed on a computer using an interpreter or the like. The computer is not limited to a personal computer (PC) or notebook computer, and includes any information processing device equipped with a central processing unit (CPU) capable of executing a computer program, such as a server, smartphone, tablet PC, PDA, or mobile phone.

In addition, the computer-readable medium may store a computer-executable program persistently, or temporarily for execution or download. The medium may be any of various recording or storage means in the form of a single piece of hardware or several pieces combined; it is not limited to a medium directly connected to a particular computer system and may be distributed over a network. Examples of the medium include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and ROM, RAM, flash memory, and the like configured to store program instructions. Other examples include recording or storage media managed by app stores that distribute applications, or by sites and servers that supply or distribute various other software.

In addition, FIG. 16 illustrates a configuration diagram of the named entity linking system 100 according to an embodiment of the present invention.

As shown in FIG. 16, the named entity linking system 100 according to an embodiment of the present invention may include a word-vector conversion unit 110, a neural network input unit 120, and a named entity linking unit 130.

The named entity linking system 100 according to an embodiment of the present invention is described below component by component. Further details of the system 100 can be inferred from the description of the named entity linking method according to an embodiment of the present invention given above, so a more detailed description is omitted below.

First, the word-vector conversion unit 110 converts a plurality of words in the text, including the polysemous word, into a corresponding plurality of vectors.

Next, the neural network input unit 120 inputs the plurality of vectors into a deep neural network having a hidden layer.

Finally, the named entity linking unit 130 performs named entity linking for the polysemous word using the output values of the deep neural network.

Here, the deep neural network may be a neural network trained using a plurality of text groups in which the polysemous word is used in different senses.

Furthermore, the named entity linking system 100 according to an embodiment of the present invention may process the M converted vectors by grouping them into units of several vectors and computing N vectors from them (where M > N).

The above description merely illustrates the technical idea of the present invention, and a person of ordinary skill in the art to which the present invention pertains will be able to make various modifications and variations without departing from its essential characteristics. The embodiments described herein are therefore intended to explain, not to limit, the technical idea of the present invention, and the invention is not limited to these embodiments. The scope of protection of the present invention should be construed according to the claims below, and all technical ideas within their equivalent scope should be construed as falling within the scope of the present invention.

10a, 10b: terminal
20: communication network
30: database
100: named entity linking system
110: word-vector conversion unit
120: neural network input unit
130: named entity linking unit

Claims

delete

A method of performing a search for a search term that includes a polysemous word, the method comprising:
a search term input step of receiving a search term entered by a user in a search service according to a predetermined search term format, the search term including a polysemous word whose meaning is specified by including the polysemous word together with an entity name and an identifier for distinguishing it; and
a search performing step of specifying the meaning of the polysemous word using the entity name, and performing a search for the search term in consideration of the polysemous word having the specified meaning.

4. The method of claim 3,
wherein, in the search term input step,
the search term includes the polysemous word and a word that specifies the meaning of the polysemous word and is entered in a predetermined format following the polysemous word.

A computer program stored in a computer-readable medium for causing a computer to execute each step of the method according to any one of claims 3 to 4.

delete

A search system for performing a search for a search term that includes a polysemous word, the system comprising:
a search term input unit that receives a search term entered by a user in a search service according to a predetermined search term format, the search term including a polysemous word whose meaning is specified by including the polysemous word together with an entity name and an identifier for distinguishing it; and
a search performing unit that specifies the meaning of the polysemous word using the entity name, and performs a search for the search term in consideration of the polysemous word having the specified meaning.