KR20210085158A

KR20210085158A - Method and apparatus for recognizing named entity considering context

Info

Publication number: KR20210085158A
Application number: KR1020190177904A
Authority: KR
Inventors: 이재길; 홍승기
Original assignee: 한국과학기술원
Priority date: 2019-12-30
Filing date: 2019-12-30
Publication date: 2021-07-08
Also published as: KR102361616B1

Abstract

A method by which a computing device operated by at least one processor recognizes entity names comprises: receiving training text in which a plurality of words are tagged with entity name labels; separately training a unary entity name prediction model, which learns the relationship between each word and the entity name label tagged to that word, and a binary entity name prediction model, which learns the relationship between the adjacent word pairs of the training text and the entity name label pair corresponding to each adjacent word pair so as to output the likelihood that arbitrary entity name labels are adjacent to each other; and generating an entity name prediction model by coupling a label determination model to the output ends of the trained unary entity name prediction model and the trained binary entity name prediction model. The method thereby improves the performance and accuracy of entity name recognition.

Description

Method and apparatus for recognizing entity names considering context information {METHOD AND APPARATUS FOR RECOGNIZING NAMED ENTITY CONSIDERING CONTEXT}

The present invention relates to a technology for recognizing entity names in consideration of context information.

Named entity recognition (NER) finds entities of interest in input text: for example, people, organizations, and geographic locations in general news text, or diseases, proteins, genes, and chemical compounds in biomedical text. Entity name recognition is therefore fundamental to natural language processing (NLP), information extraction, and knowledge services.

To perform entity name recognition automatically on large-scale text, experts with domain knowledge and experience conventionally designed rules or features by hand and fed them into an algorithm that produces a label sequence.

More recently, machine learning has been used to extract features or representations suited to entity name recognition from large-scale text as fixed-length vectors, and entity name recognition is performed with deep learning techniques that have been adopted in many fields, including image processing and natural language processing.

For example, a deep learning model that describes the relationships between the words of the input text is combined with a deep learning model that describes the relationships between the characters of each word, and a conditional random field (hereinafter "CRF") is used to derive a result in which each individual word of the text is tagged with a label.

Here, the CRF model uses a transition matrix to output a label sequence that reflects the relationships between neighboring labels. A label sequence is the ordered series of individual labels, one for each word of the text.

The transition matrix models the relationship between the neighboring labels of two consecutive input words in the text. A transition-matrix-based model is trained solely on the training data; when new text is input, it merely looks up the values of the previously learned transition matrix and cannot reflect the context of the new text in the relationships between neighboring labels.
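The limitation described above can be made concrete with a minimal sketch of a conventional CRF-style path score. The label set, the per-word scores, and the transition values are all made up for illustration and do not come from the patent; the point is only that `transition` is a fixed table that is reused unchanged for every input sentence.

```python
# Hypothetical toy example: labels and all scores are invented for illustration.
LABELS = ["O", "B", "I"]

# Per-word scores from some upstream network for a three-word sentence;
# emission[t][k] is the score of label k at position t.
emission = [
    [0.1, 2.0, 0.3],
    [0.2, 0.4, 1.5],
    [1.8, 0.2, 0.1],
]

# Learned transition matrix: transition[i][j] scores label i followed by label j.
# It is fixed after training and does not depend on the input sentence.
transition = [
    [0.5, 0.3, -5.0],  # O -> O, O -> B, O -> I (O -> I is invalid in BIO)
    [0.1, -0.2, 0.9],  # B -> O, B -> B, B -> I
    [0.4, 0.0, 0.6],   # I -> O, I -> B, I -> I
]


def sequence_score(label_ids, emission, transition):
    """Sum per-word scores and static transition scores along a label path."""
    score = sum(emission[t][k] for t, k in enumerate(label_ids))
    score += sum(transition[a][b] for a, b in zip(label_ids, label_ids[1:]))
    return score


path = [LABELS.index(tag) for tag in ["B", "I", "O"]]
print(sequence_score(path, emission, transition))
```

Only `emission` changes when a new sentence arrives. A context-aware alternative would replace the single `transition` table with pairwise scores computed per sentence, which is the direction the rest of this document takes.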

Meanwhile, to improve entity name recognition in Korean text, a method has been developed that exploits the tendency of entity names to appear at the beginning of a word (eojeol) in Korean, an agglutinative language in which several morphemes combine into one word. This method learns vectors in units of syllable bigrams in Korean text and assigns one label to each syllable bigram. Although it uses an artificial neural network to recognize the label of each syllable bigram, it does not exploit the correlation between neighboring consecutive syllable bigrams.

Accordingly, there is a need for a method that identifies, for input text, not only individual words but also their contextual relationships with neighboring words, and predicts a label sequence that reflects those relationships.

The problem to be solved is to provide a method and apparatus that, in order to predict the label sequence of input text, recognize the entity name of each word by considering both the independent characteristics of the individual words in the input text and the relationships between each word and its neighboring words.

According to one embodiment, a method by which a computing device operated by at least one processor recognizes entity names includes: receiving training text in which a plurality of words are tagged with entity name labels; separately training a unary entity name prediction model, which learns the relationship between each word and the entity name label tagged to that word, and a binary entity name prediction model, which learns the relationship between the adjacent word pairs of the training text and the entity name label pair corresponding to each adjacent word pair so as to output the likelihood that arbitrary entity name labels are adjacent to each other; and generating an entity name prediction model by coupling a label determination model to the output ends of the trained unary entity name prediction model and the trained binary entity name prediction model.

In the receiving step, the training text may be preprocessed with a word embedding model.

In the receiving step, the plurality of words may be tagged with the entity name labels in the IOBES scheme or the BIO scheme. In the IOBES scheme, a word is marked B if it is the beginning of an entity name composed of multiple words, I if it is in the middle of an entity name, E if it is the end of an entity name, O if it is not part of an entity name, and S if it is an entity name consisting of a single word. In the BIO scheme, a word is marked B if it is the beginning of an entity name composed of multiple words, I if it belongs to an entity name but is not its beginning, and O if it is not part of an entity name.

In the training step, a separate label determination model may be connected to each of the output of the unary entity name prediction model and the output of the binary entity name prediction model.

The entity name prediction model may be a model in which the parameters of the bidirectional long short-term memory (BiLSTM) layer of the unary entity name prediction model and the parameters of the BiLSTM layer of the binary entity name prediction model are integrated.

The method may further include inputting text into the entity name prediction model, and using the entity name prediction model to predict a label sequence that lists the entity name labels of the words in the text.

The label sequence may be the one that maximizes the product of the probability value output by the trained unary entity name prediction model and the probability value output by the trained binary entity name prediction model.
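Written as a formula, this selection rule could read as follows. The notation is not from the original text and is introduced here only for illustration: x denotes the input text, y a candidate label sequence, and P_u and P_b the probability values output by the trained unary and binary entity name prediction models, respectively.

```latex
\hat{\mathbf{y}} = \operatorname*{arg\,max}_{\mathbf{y}} \; P_{u}(\mathbf{y} \mid \mathbf{x}) \cdot P_{b}(\mathbf{y} \mid \mathbf{x})
```

How each probability factorizes over words and adjacent word pairs is not stated at this point in the text, so the formula leaves both terms unfactorized.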

The text may be word embedding vectors preprocessed by a word embedding model.

According to another embodiment, a method by which a computing device operated by at least one processor recognizes entity names includes: inputting text into a unary entity name prediction model and a binary entity name prediction model; predicting, with the unary entity name prediction model, the entity name label of each word of the text, and predicting, with the binary entity name prediction model, the likelihood that the predicted entity name label pair corresponding to an adjacent word pair of the text is output; and outputting, based on the predicted results, a label sequence that lists the entity name labels of the words in the text. The unary entity name prediction model is a model that has learned, from training text in which a plurality of words are tagged with entity name labels, the relationship between each word and the entity name label tagged to that word. The binary entity name prediction model is a model that has learned, from the training text, the adjacency relationships between the tagged entity name labels so as to output the likelihood that arbitrary entity name labels are adjacent to each other.

In the outputting step, the prediction result of the unary entity name prediction model and the prediction result of the binary entity name prediction model may be combined by a single label determination model.

According to one embodiment, a computing device includes a memory and at least one processor that executes instructions of a program loaded into the memory. The program includes instructions written to execute: training an entity name prediction model with training text tagged with entity name labels; inputting text into the trained entity name prediction model; predicting, with the entity name prediction model, the entity name label of each word of the text and whether the entity name labels corresponding to neighboring words can be output adjacent to each other; and outputting, as the label sequence of the text, those predicted entity name labels that can be adjacent to each other.

In the inputting step, the words of the text may be converted into word embedding vectors using an arbitrary word embedding model, and the generated word embedding vectors may be input into the entity name prediction model.

The entity name prediction model may include a unary entity name prediction model that predicts the entity name label of each word of the text, and a binary entity name prediction model that predicts the likelihood that the entity name labels corresponding to neighboring words of the text can be output adjacent to each other.

In the training step, different label determination models may be connected to the unary entity name prediction model and the binary entity name prediction model, respectively.

According to the present invention, a label sequence, which is the result of entity name recognition, is generated by reflecting not only the characteristics of individual words but also their correlations with neighboring words, which improves the performance and accuracy of entity name recognition.

In addition, according to the present invention, entity name recognition performance can be improved not only for general text such as news but also for specialized text in fields such as biology and medicine.

FIG. 1 is a structural diagram of an entity name recognition apparatus according to an embodiment.
FIG. 2 is an exemplary diagram of an entity name recognition result according to an embodiment.
FIG. 3 is a flowchart of a method of operating an entity name recognition apparatus according to an embodiment.
FIG. 4 is a structural diagram of a unary deep learning model and a binary deep learning model according to an embodiment.
FIG. 5 is an exemplary diagram of a transition matrix according to an embodiment.
FIG. 6 is a diagram illustrating a structure in which a unary deep learning model and a binary deep learning model are combined according to an embodiment.
FIG. 7 is a structural diagram of an entity name prediction model according to an embodiment.
FIG. 8 is a hardware configuration diagram of a computing device according to an embodiment.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention pertains can readily practice them. The present invention may, however, be embodied in many different forms and is not limited to the embodiments described herein. To describe the present invention clearly, parts of the drawings that are irrelevant to the description are omitted, and similar parts are given similar reference numerals throughout the specification.

Throughout the specification, when a part is said to "include" an element, this means that it may further include other elements rather than excluding them, unless specifically stated otherwise. In addition, terms such as "...unit", "...er", and "module" used in the specification refer to a unit that processes at least one function or operation, which may be implemented in hardware, in software, or in a combination of hardware and software.

FIG. 1 is a structural diagram of an entity name recognition apparatus according to an embodiment, and FIG. 2 is an exemplary diagram of an entity name recognition result according to an embodiment.

Referring to FIG. 1, the entity name recognition apparatus 1000 includes a preprocessing unit 100 that vectorizes an input sentence; a training unit 200 that uses the preprocessed training text to train two separate deep learning models 210 and 220; and a prediction unit 300 that predicts the entity names of an input sentence and outputs a label sequence using an entity name prediction model 310, which includes the trained models 210 and 220 and a label determination model 311.

For convenience of description they are named the preprocessing unit 100, the training unit 200, and the prediction unit 300, but they are computing devices operated by at least one processor. The preprocessing unit 100, the training unit 200, and the prediction unit 300 may be implemented on a single computing device or distributed across separate computing devices. When distributed across separate computing devices, they may communicate with one another through a communication interface. Any device capable of executing a software program written to carry out the present invention suffices as the computing device; examples include a server and a laptop computer.

Each of the preprocessing unit 100, the training unit 200, and the prediction unit 300 may be a single artificial intelligence model or may be implemented as a plurality of artificial intelligence models. Likewise, each of the word embedding model 110, the unary deep learning model 210, the binary deep learning model 220, the unary label determination model 230, the binary label determination model 240, the label determination model 311, and the entity name prediction model 310 may be a single artificial intelligence model or may be implemented as a plurality of artificial intelligence models. The entity name recognition apparatus 1000 may also be a single artificial intelligence model or be implemented as a plurality of artificial intelligence models. Accordingly, the one or more artificial intelligence models corresponding to the components described above may be implemented by one or more computing devices.

When a sentence is input, the entity name recognition apparatus 1000 attaches a label, which is the entity name recognition result, to each word of the sentence and outputs a label sequence in which the entity name labels are arranged in order. From the perspective of the artificial intelligence model, a word may be called a node.

BIO tagging may be used as the scheme for representing the entity names that make up a label sequence. In BIO tagging, the first word of an entity name made up of several words is marked B (Beginning), a word inside the entity name is marked I (Inside), and a word that is not part of an entity name is marked O (Outside).

As another example, IOBES tagging may be used, which extends BIO by additionally marking the last word of an entity name as E (End) and a word that is an entity name by itself as S (Singleton). The scheme for tagging entity names is not limited to any one of these.
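The two tagging schemes can be illustrated with a minimal sketch that converts entity spans into tags. The function name `tag_spans`, the `(start, end, type)` span format, the `B-Disease`-style type suffix, and the entity type `Chemical` are assumptions made for this example and are not taken from the patent.

```python
def tag_spans(words, spans, scheme="IOBES"):
    """Tag words given entity spans as (start, end_inclusive, type) tuples."""
    tags = ["O"] * len(words)
    for start, end, etype in spans:
        if scheme == "IOBES":
            if start == end:
                # A single-word entity gets the singleton tag.
                tags[start] = f"S-{etype}"
                continue
            tags[start] = f"B-{etype}"
            for i in range(start + 1, end):
                tags[i] = f"I-{etype}"
            tags[end] = f"E-{etype}"
        elif scheme == "BIO":
            tags[start] = f"B-{etype}"
            for i in range(start + 1, end + 1):
                tags[i] = f"I-{etype}"
        else:
            raise ValueError(f"unknown scheme: {scheme}")
    return tags


words = ["ureteric", "obstruction", "caused", "by", "indinavir"]
# "Chemical" is a hypothetical type chosen for the single-word entity.
spans = [(0, 1, "Disease"), (4, 4, "Chemical")]

print(tag_spans(words, spans, "IOBES"))
print(tag_spans(words, spans, "BIO"))
```

In this sketch a single-word entity is tagged `B` under BIO, following common practice; IOBES distinguishes it with `S`.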

Referring to FIG. 2, the entity name recognition result for each word in the first line of the input text is output as a label sequence. For example, "ureteric" is the beginning of an entity name of type Disease and "obstruction" is the end of an entity name of type Disease, so "ureteric obstruction" is a single entity name of type Disease. Likewise, "caused" and "by" are not entity names, and "indinavir" is an entity name consisting of a single word.
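Reading entity names back out of such a label sequence can be sketched as follows. The tag strings (for example `B-Disease`), the helper name `decode_iobes`, and the type `Chemical` assigned to "indinavir" are assumptions for illustration; the figure itself is not reproduced here.

```python
def decode_iobes(words, tags):
    """Collect (entity_text, type) pairs from an IOBES label sequence."""
    entities, current, current_type = [], [], None
    for word, tag in zip(words, tags):
        prefix, _, etype = tag.partition("-")
        if prefix == "S":
            entities.append((word, etype))
            current, current_type = [], None
        elif prefix == "B":
            current, current_type = [word], etype
        elif prefix in ("I", "E") and current and etype == current_type:
            current.append(word)
            if prefix == "E":
                entities.append((" ".join(current), etype))
                current, current_type = [], None
        else:
            # "O" or an inconsistent tag: drop any partially built entity.
            current, current_type = [], None
    return entities


words = ["ureteric", "obstruction", "caused", "by", "indinavir"]
tags = ["B-Disease", "E-Disease", "O", "O", "S-Chemical"]
print(decode_iobes(words, tags))
```

Silently discarding inconsistent tag runs is one possible design choice; a stricter decoder could raise an error instead.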

Returning to FIG. 1, the roles of the parts that make up the entity name recognition apparatus 1000 are described next.

The preprocessing unit 100 converts the words of the training text and the input text into embedding vectors. The conversion may, for example, look up each input word in a dictionary or use the values output by a pre-trained model. The characters of an input word may also be fed into a separate deep learning model and the vector output by that model used. Several of these methods may be combined, and the conversion is not limited to any one of them.

This specification uses a word embedding model 110 and describes, as one example, the case of ELMo (Embeddings from Language Models), a pre-trained language model that embeds each word in consideration of the entire input sentence.

The preprocessing unit 100 converts the training text into word embedding vectors and passes them to the training unit 200, and converts the input text into word embedding vectors and passes them to the prediction unit 300.

The training unit 200 trains the unary deep learning model 210 and the binary deep learning model 220 separately, using the training text converted into word embedding vectors and two label determination models 230 and 240.

The unary deep learning model 210 is a model that computes the probability of each individual label for a word by considering only that word itself. The binary deep learning model 220 is a deep learning model that computes the probability of a word's label by considering the labels of the neighboring words located around that word.

The unary deep learning model 210 or the binary deep learning model 220 may be implemented as whichever model suits the characteristics of the task to which the entity name recognition apparatus 1000 is applied. For example, it may be implemented as a convolutional neural network (CNN), a recurrent neural network (RNN), a long short-term memory (LSTM) network, or an attention-based Transformer, and is not limited to any one of these.

This specification describes the case in which the unary deep learning model 210 and the binary deep learning model 220 are implemented as bidirectional long short-term memory (BiLSTM) networks. The detailed structure of the unary deep learning model 210 and the binary deep learning model 220 and the process of training each model are described with reference to FIG. 4.

The prediction unit 300 generates the entity name prediction model 310 by combining the unary deep learning model 210 and the binary deep learning model 220, both trained by the training unit 200, through a single label determination model 311. The structure of the entity name prediction model 310, in which the unary deep learning model 210 and the binary deep learning model 220 are combined through one label determination model 311, is described in detail with reference to FIGS. 6 and 7.

The prediction unit 300 then uses the generated entity name prediction model 310 to output one label sequence as the entity name recognition result for the input text.

The label determination model 311 collects the probability values computed by the unary deep learning model 210 and the binary deep learning model 220 and, taking the relationships between neighboring labels into account, outputs the single most suitable label sequence.
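One way to picture how per-word and per-pair probabilities could be combined into a single label sequence is Viterbi-style dynamic programming, sketched below. This is a standard decoding technique substituted here for illustration, not necessarily the procedure used by the label determination model 311. The array layout (`unary[t][k]`, `pairwise[t][i][j]`), the three-label set, and every probability value are assumptions invented for the example.

```python
import math


def _log(p):
    """Logarithm that maps zero probability to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(unary, pairwise):
    """Return the label-index path maximizing the product of all probabilities.

    unary[t][k]       : probability of label k for word t
    pairwise[t][i][j] : probability that word t has label i and word t+1 has label j
    """
    n_labels = len(unary[0])
    best = [_log(p) for p in unary[0]]
    back = []
    for t in range(1, len(unary)):
        new_best, pointers = [], []
        for j in range(n_labels):
            scores = [best[i] + _log(pairwise[t - 1][i][j]) for i in range(n_labels)]
            i_star = max(range(n_labels), key=scores.__getitem__)
            new_best.append(scores[i_star] + _log(unary[t][j]))
            pointers.append(i_star)
        best = new_best
        back.append(pointers)
    # Trace the best path backwards from the best final label.
    path = [max(range(n_labels), key=best.__getitem__)]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


LABELS = ["O", "B", "E"]
unary = [
    [0.20, 0.70, 0.10],
    [0.45, 0.15, 0.40],  # taken alone, word 1 slightly prefers O over E
    [0.90, 0.05, 0.05],
]
pairwise = [
    [[0.20, 0.05, 0.00], [0.05, 0.00, 0.65], [0.03, 0.02, 0.00]],
    [[0.30, 0.05, 0.00], [0.00, 0.00, 0.05], [0.55, 0.05, 0.00]],
]

independent = [max(range(3), key=row.__getitem__) for row in unary]
print([LABELS[k] for k in independent])                # per-word argmax only
print([LABELS[k] for k in viterbi(unary, pairwise)])   # with pairwise context
```

With these made-up numbers, choosing each word's label independently yields B, O, O, whereas including the sentence-specific pairwise probabilities yields B, E, O, because B followed by E is far more likely than B followed by O for this input.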

In this specification the label determination model 311 is implemented as a CRF, but it is not necessarily limited to this and may instead be implemented as a hidden Markov model (HMM), a maximum entropy Markov model (MEMM), or the like.

FIG. 3 is a flowchart of a method of operating an entity name recognition apparatus according to an embodiment.

Referring to FIG. 3, the entity name recognition apparatus 1000 receives training text (S110). The training text may come from news, biology, medicine, or other fields, and is not limited to any one of them. The training text is data in which each word has been tagged in the IOBES scheme.

The preprocessing unit 100 of the entity name recognition apparatus 1000 converts the training text into word embedding vectors using the word embedding model 110 (S120). The word embedding method used to convert words into vectors is not limited to any one method; as examples, Word2Vec, a distributed representation model that represents words as vectors, or ELMo (Embeddings from Language Models), which uses a pre-trained language model, may be used as the word embedding model 110.

The training unit 200 connects the unary label determination model 230 to the output end of the unary deep learning model 210 and trains the unary deep learning model 210 with the word embedding vectors produced in step S120 (S130). That is, the unary deep learning model 210 learns the relationship between each word of the training text and its IOBES tag.

In parallel with step S130, the training unit 200 connects the binary label determination model 240 to the output end of the binary deep learning model 220 and trains the binary deep learning model 220 with the word embedding vectors produced in step S120 (S140). That is, the binary deep learning model 220 uses the plurality of words of the training text and the IOBES tags of those words to learn the relationships between the entity names that are output.
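The kind of training example the binary model consumes can be pictured with a small sketch that pairs each word with its right-hand neighbor and pairs their tags in the same way. The helper name `adjacent_pairs`, the plain-string representation of words (the actual system uses embedding vectors), and the tag strings are assumptions for illustration.

```python
def adjacent_pairs(words, labels):
    """Pair each word with its right-hand neighbor, together with their labels."""
    if len(words) != len(labels):
        raise ValueError("words and labels must have the same length")
    word_pairs = zip(words, words[1:])
    label_pairs = zip(labels, labels[1:])
    return list(zip(word_pairs, label_pairs))


words = ["ureteric", "obstruction", "caused", "by", "indinavir"]
# The type suffixes are hypothetical; the source only names the IOBES letters.
labels = ["B-Disease", "E-Disease", "O", "O", "S-Chemical"]

for word_pair, label_pair in adjacent_pairs(words, labels):
    print(word_pair, "->", label_pair)
```

A sentence of n words yields n - 1 such pairs, so the unary model sees one target per word while the binary model sees one target per adjacent word pair.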

The prediction unit 300 of the entity name recognition apparatus 1000 combines the unary deep learning model 210 and the binary deep learning model 220 trained in steps S130 and S140 and connects the label determination model 311 to their output end to generate the entity name prediction model 310 (S150). The two trained models are coupled to a new label determination model 311, not to the label determination models 230 and 240 used while training them. This is because, as described with reference to FIG. 2, the label determination models 230 and 240 were attached to the two deep learning models during training only to make it easier to learn the parameters of the unary deep learning model 210 and the binary deep learning model 220.

Thereafter, the entity name recognition apparatus 1000 receives input text, and the preprocessing unit 100 converts the input text into word embedding vectors (S170). The word embedding method may be the same as that used in step S120.

The prediction unit 300 feeds the converted word embedding vectors into the entity name prediction model 310 and outputs a label sequence, which is the entity name recognition result for the input text (S180). The output label sequence may take the form shown in FIG. 2.

Hereinafter, the way the learning unit 200 trains the unary deep learning model 210 and the binary deep learning model 220 in steps S130 and S140 is described through the structure of each deep learning model.

FIG. 4 is a structural diagram of a unary deep learning model and a binary deep learning model according to an embodiment, and FIG. 5 is an exemplary diagram of a transition matrix according to an embodiment.

FIG. 4(a) shows the structure of the unary deep learning model 210, and FIG. 4(b) shows the structure of the binary deep learning model 220.

When the training text is converted into word embedding vectors by the preprocessing unit 100 and input to the learning unit 200, the learning unit 200 trains the unary deep learning model 210 and the binary deep learning model 220 separately.

In the prediction unit 300, the two deep learning models are combined through a single label determination model 311. During training, however, each deep learning model is trained individually to facilitate learning of the parameters of the unary deep learning model 210 and the binary deep learning model 220. That is, the unary label determination model 230 and the binary label determination model 240 are each used.

Each deep learning model 210, 220 is trained with its own label determination model 230, 240 so as to maximize the likelihood of the training text.

The unary deep learning model 210 includes a BiLSTM layer 211 and a combination layer 212 and is connected to the unary label determination model 230. The binary deep learning model 220 includes a BiLSTM layer 221 and a combination layer 222 and is connected to the binary label determination model 240.

Hereinafter, the BiLSTM layer 211 of the unary deep learning model 210 and the BiLSTM layer 221 of the binary deep learning model 220 play similar roles and are therefore described together. The combination layer 212 of the unary deep learning model 210 and the combination layer 222 of the binary deep learning model 220, as well as the unary label determination model 230 and the binary label determination model 240, are described separately.

The BiLSTM layers 211 and 221 receive the word embedding vectors generated by the preprocessing unit 100 and generate state vectors of an arbitrary dimension. Specifically, the BiLSTM state vector at time t is generated by simply concatenating the state vector of the forward LSTM at time t and the state vector of the backward LSTM at time t.

Here, the state vector of the forward LSTM at time t is computed by a function whose inputs are the word embedding vector at time t and the forward LSTM state vector at the previous time t-1. Likewise, the state vector of the backward LSTM at time t is computed by a function whose inputs are the word embedding vector at time t and the backward LSTM state vector at the following time t+1.
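The recurrence and concatenation described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `step` function is a toy tanh cell standing in for a real LSTM cell, and the scalar weights `w_x` and `w_h` are hypothetical.

```python
import math

def step(x, h_prev, w_x, w_h):
    """Toy recurrent cell standing in for an LSTM cell (assumption)."""
    return [math.tanh(w_x * xi + w_h * hi) for xi, hi in zip(x, h_prev)]

def bidirectional_states(embeddings, w_x=0.5, w_h=0.3):
    """Return one concatenated [forward; backward] state per time step."""
    n, dim = len(embeddings), len(embeddings[0])

    # Forward pass: the state at t depends on the state at t-1.
    forward, h = [], [0.0] * dim
    for t in range(n):
        h = step(embeddings[t], h, w_x, w_h)
        forward.append(h)

    # Backward pass: the state at t depends on the state at t+1.
    backward, h = [None] * n, [0.0] * dim
    for t in reversed(range(n)):
        h = step(embeddings[t], h, w_x, w_h)
        backward[t] = h

    # Simple concatenation of the two state vectors at each time step.
    return [f + b for f, b in zip(forward, backward)]

states = bidirectional_states([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

Each returned state has twice the dimension of a single directional state, which is the only property the later layers rely on.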

The state vectors of arbitrary dimension generated by the BiLSTM layers 211 and 221 become the input vectors of the combination layers 212 and 222 above them. The size of the vector output by the combination layers 212 and 222 may match the number of possible labels. As an example, consider recognizing entity names of the types Disease and Chemical in a text using the IOBES tagging scheme. Each word can take the prefix B, I, E, or S for each of the Disease and Chemical types, or the label O, so the number of possible labels is 2*4+1 = 9.
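The label count in this example can be checked with a short sketch. The helper name `iobes_labels` is hypothetical and is used only for illustration.

```python
def iobes_labels(entity_types):
    """Enumerate IOBES labels: four prefixed labels per type, plus 'O'."""
    labels = [f"{prefix}-{etype}" for etype in entity_types for prefix in "BIES"]
    labels.append("O")
    return labels

labels = iobes_labels(["Disease", "Chemical"])
num_labels = len(labels)  # 2 * 4 + 1
```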

The combination layers 212 and 222 consist of at least one linear transformation, but are not limited to this and may in some cases consist of a combination of several linear and non-linear transformations. The roles of the combination layer 212 of the unary deep learning model 210 and the combination layer 222 of the binary deep learning model 220 are described below with reference to the drawings.

First, referring to FIG. 4(a), the combination layer 212 of the unary deep learning model 210 generates a score vector through an arbitrary function.

A score is a numerical value indicating how well a word of the input text corresponds to a given label, and a score vector lists the scores of one word in vector form. In the example above, for each word of the text a score may be output for each of the labels "B-Disease", "I-Disease", "E-Disease", "S-Disease", "B-Chemical", "I-Chemical", "E-Chemical", "S-Chemical", and "O".

In this case, the combination layer 212 of the unary deep learning model 210 generates a score vector for each node, and the size of each score vector is 9. The score vector of every other node likewise contains the scores of that node's word for each of the nine labels.
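As a rough sketch of how a combination layer made of a single linear transformation could map each state vector to a 9-element score vector: the state size of 4, the random weights, and the function name `linear` are all assumptions made for illustration.

```python
import random

NUM_LABELS = 9   # 2 entity types * 4 IOBES prefixes + "O"
STATE_DIM = 4    # hypothetical BiLSTM state size

def linear(state, weight, bias):
    """Single linear transformation: one score per label."""
    return [sum(w * s for w, s in zip(row, state)) + b
            for row, b in zip(weight, bias)]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
weight = [[rng.uniform(-1, 1) for _ in range(STATE_DIM)]
          for _ in range(NUM_LABELS)]
bias = [0.0] * NUM_LABELS

# Two words, each represented by a hypothetical state vector.
state_vectors = [[0.1, -0.2, 0.3, 0.0], [0.5, 0.5, -0.1, 0.2]]
score_vectors = [linear(h, weight, bias) for h in state_vectors]
```

In a trained model the weights would be learned; here they only demonstrate the shapes involved.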

The unary label determination model 230 connected to the unary deep learning model 210 calculates the label sequence probability for the training text as in Equation 1.

In Equation 1, X is the input training text, Y is the output label sequence, and $P_u(Y \mid X)$ is the probability of the output label sequence. $Z$ is the sum of the values of all possible label sequences, used to normalize the output into a probability, and N is the number of words in the training data, which is also the number of output labels. $U_i(y_i)$ is the unary score for assigning the label $y_i$ to the i-th word, and $A_{y_{i-1}, y_i}$ is the entry of the transition matrix A at the row of the (i-1)-th label and the column of the i-th label, described in detail with reference to FIG. 5.
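Equation 1 itself is not reproduced in the text. One form consistent with the definitions above, assuming the standard linear-chain CRF formulation, is sketched below. The symbol names $P_u$, $U_i$, and $A_{y_{i-1}, y_i}$ are notational assumptions.

```latex
P_u(Y \mid X) = \frac{1}{Z} \exp\left( \sum_{i=1}^{N} U_i(y_i) + \sum_{i=2}^{N} A_{y_{i-1},\, y_i} \right),
\qquad
Z = \sum_{Y'} \exp\left( \sum_{i=1}^{N} U_i(y'_i) + \sum_{i=2}^{N} A_{y'_{i-1},\, y'_i} \right)
```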

Meanwhile, one of the terms above can be calculated through Equation 2.

Referring to FIG. 5, the transition matrix is a matrix covering every label pair that can be formed from two labels. Each entry of the matrix is a compatibility score indicating how well the pair of labels given by its row and column fits as neighbors in a label sequence.
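A small sketch of how unary scores and a transition matrix could combine into a label sequence probability, assuming the standard linear-chain CRF form. The normalization is done by brute-force enumeration, which is practical only for tiny examples, and all numbers are made up.

```python
import itertools
import math

def sequence_score(unary, trans, seq):
    """Sum of per-word unary scores plus transition scores of neighbors."""
    score = sum(unary[i][y] for i, y in enumerate(seq))
    score += sum(trans[a][b] for a, b in zip(seq, seq[1:]))
    return score

def sequence_probability(unary, trans, seq):
    """Normalize exp(score) over every possible label sequence."""
    num_labels = len(trans)
    z = sum(math.exp(sequence_score(unary, trans, cand))
            for cand in itertools.product(range(num_labels), repeat=len(unary)))
    return math.exp(sequence_score(unary, trans, seq)) / z

# Hypothetical scores: 2 words, 2 labels.
unary = [[0.5, 1.0], [0.2, -0.3]]
trans = [[0.1, 0.4], [0.0, -0.2]]  # trans[a][b]: label a followed by label b
p = sequence_probability(unary, trans, (1, 0))
```

Note that `trans` is the same for every input text, which is the limitation the binary model is meant to address.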

Because the transition matrix holds values that are produced and stored during machine learning, it always provides the same value for a given label pair regardless of the text that is input. It therefore cannot take into account how the relationship and compatibility between neighboring labels depend on the context of the text. For this reason, the present invention additionally uses the binary deep learning model 220, which considers, for each text input, the relationship between neighboring labels in the output label sequence.

Next, the role of the combination layer 222 of the binary deep learning model 220 is described with reference to FIG. 4(b). The combination layer 222 of the binary deep learning model 220 receives two state vectors at neighboring positions output by the BiLSTM layer 221 below it and generates a score vector for the mutual compatibility of the neighboring label pair.

The size of this score vector may be chosen to hold a score for every label pair that can be formed. Consider again the example of recognizing entity names of the types Disease and Chemical using the IOBES tagging scheme. In this case, each word of the text has a corresponding score vector of size 9*9 = 81.
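One way to address the 81 entries is to flatten the (previous label, current label) pair into a single index. The row-major layout below is an assumption, since the text does not specify how the score vector is ordered.

```python
NUM_LABELS = 9  # from the Disease/Chemical IOBES example

def pair_index(prev_label, cur_label, num_labels=NUM_LABELS):
    """Map a (previous, current) label pair to a flat index (row-major)."""
    return prev_label * num_labels + cur_label

def pair_from_index(index, num_labels=NUM_LABELS):
    """Inverse of pair_index."""
    return divmod(index, num_labels)

vector_size = NUM_LABELS * NUM_LABELS
```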

The binary deep learning model 220 infers from the input text the relationship between two consecutive labels output by the label determination model 240. In the model training stage, the binary label determination model 240 connected to the binary deep learning model 220 calculates the label sequence probability for the training text as in Equation 3.

In Equation 3, X is the input training text, Y is a label sequence output for the training text, and $P_b(Y \mid X)$ is the probability of that label sequence. $Z$ is the sum of the scores of all label sequences that can be generated, used to normalize the label sequence score into a probability, and N is the number of words in the text input, which is also the number of output labels. $V_{i-1,i}(y_{i-1}, y_i)$ is the score for the mutual compatibility of a label at position i-1 and a label at position i. Mutual compatibility means the possibility that a given label pair appears at that position.
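Equation 3 itself is not reproduced in the text. One form consistent with the definitions above, assuming the same exponential normalization as a linear-chain CRF, is sketched below. The symbol names $P_b$ and $V_{i-1,i}$ are notational assumptions.

```latex
P_b(Y \mid X) = \frac{1}{Z} \exp\left( \sum_{i=2}^{N} V_{i-1,i}(y_{i-1}, y_i) \right),
\qquad
Z = \sum_{Y'} \exp\left( \sum_{i=2}^{N} V_{i-1,i}(y'_{i-1}, y'_i) \right)
```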

Meanwhile, one of the terms above can be calculated through Equation 4.

The operation used to generate the score vector is not limited to any one method. For example, the neighboring vectors output by the neural network below may simply be concatenated into one vector, or the two neighboring vectors may be multiplied element-wise (the Hadamard product) to form one vector.
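The two combination methods mentioned here can be sketched in a few lines. The function names are hypothetical.

```python
def concat(h_prev, h_cur):
    """Join two neighboring state vectors into one longer vector."""
    return h_prev + h_cur

def hadamard(h_prev, h_cur):
    """Element-wise (Hadamard) product of two same-sized vectors."""
    return [a * b for a, b in zip(h_prev, h_cur)]

joined = concat([1.0, 2.0], [3.0, 4.0])
product = hadamard([1.0, 2.0], [3.0, 4.0])
```

Concatenation doubles the vector size, while the element-wise product keeps it unchanged, which affects the input size of whatever transformation follows.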

In addition, a single vector may be formed by taking the absolute value of the difference between two neighboring vectors of the same size. In this case, the score term of Equation 3 may take a form based on a bilinear model, where $W_j$ denotes the j-th matrix of the bilinear model.
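The exact expression is not reproduced in the text, so the sketch below only illustrates the two building blocks that are named: the absolute difference of neighboring vectors and a bilinear form with one matrix per output index j. How the patent composes them is not specified here, and the function names and numbers are assumptions.

```python
def abs_diff(h_prev, h_cur):
    """Element-wise absolute difference of two same-sized vectors."""
    return [abs(a - b) for a, b in zip(h_prev, h_cur)]

def bilinear(h_prev, w_j, h_cur):
    """Bilinear form h_prev^T W_j h_cur for a single matrix W_j."""
    return sum(h_prev[r] * w_j[r][c] * h_cur[c]
               for r in range(len(h_prev))
               for c in range(len(h_cur)))

def bilinear_scores(h_prev, matrices, h_cur):
    """One score per matrix W_j, e.g. one per label pair."""
    return [bilinear(h_prev, w_j, h_cur) for w_j in matrices]

identity = [[1.0, 0.0], [0.0, 1.0]]
swap = [[0.0, 1.0], [1.0, 0.0]]
scores = bilinear_scores([1.0, 2.0], [identity, swap], [3.0, 4.0])
```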

Although the binary deep learning model 220 has been described as considering only the relationship between adjacent state vectors, it is not necessarily limited to this.

FIG. 6 is a diagram showing a structure in which a unary deep learning model and a binary deep learning model are combined according to an embodiment, and FIG. 7 is a structural diagram of an entity name prediction model according to an embodiment.

Referring to FIG. 6, X is a word input to the unary deep learning model 210, and Y is the output label for each word of the input text. That is, $Y_{t-1}$ is the label output at time t-1, and $Y_t$ is the label output at time t.

The unary deep learning model 210 is connected to the individual labels output by the label determination model 311, and the binary deep learning model 220 is connected between adjacent labels output by the label determination model 311.

Referring to FIG. 7, the prediction unit 300 integrates the trained unary deep learning model 210 and the trained binary deep learning model 220 in a single label determination model 311 to generate the entity name prediction model 310. The entity name prediction model 310 provides the single most suitable label sequence for the input text. Here, the parameters of the BiLSTM layer 211 of the unary deep learning model 210 and the parameters of the BiLSTM layer 221 of the binary deep learning model 220 may be integrated with each other.

The label determination model 311 of the entity name prediction model 310 may provide a value calculated through Equation 5 using both the output of the unary deep learning model 210 and the output of the binary deep learning model 220.

In Equation 5, X is the input text, Y is the output label sequence, $P_u(Y \mid X)$ is the probability of the label sequence calculated by the unary deep learning model 210, and $P_b(Y \mid X)$ is the probability of the label sequence calculated by the binary deep learning model 220.

As one example, the output of the entity name prediction model 310 may be obtained by multiplying the result of the unary deep learning model 210 by the result of the binary deep learning model 220. In this case, predicting the entity name labels amounts to finding the one label sequence that maximizes this product.
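Under the assumption that both models produce probabilities of the form $\exp(\text{score}) / Z$, maximizing the product reduces to maximizing a sum of unnormalized scores, because the normalizers do not depend on Y. The symbols $S_u$, $S_b$, $Z_u$, and $Z_b$ below are introduced only for this illustration.

```latex
\begin{aligned}
Y^{*} &= \arg\max_{Y} \; P_u(Y \mid X) \, P_b(Y \mid X) \\
      &= \arg\max_{Y} \; \bigl[ S_u(Y, X) - \log Z_u + S_b(Y, X) - \log Z_b \bigr] \\
      &= \arg\max_{Y} \; \bigl[ S_u(Y, X) + S_b(Y, X) \bigr]
\end{aligned}
```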

Since the goal is to find the label sequence that maximizes the probability rather than the probability value itself, the normalization step may be omitted. Specifically, decoding techniques such as Viterbi decoding may be used.
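A minimal Viterbi sketch over unnormalized scores is shown below. It assumes that `unary[i][y]` holds the unary score of label `y` for word `i` and that `pairwise[i][a][b]` holds the position-dependent score for label `a` at position i-1 followed by label `b` at position i. This data layout and the numbers are illustrative assumptions, not the patent's specification.

```python
def viterbi(unary, pairwise):
    """Return the label sequence with the highest total score.

    pairwise[0] is unused because the first word has no predecessor.
    """
    n, k = len(unary), len(unary[0])
    best = list(unary[0])   # best score of any path ending in each label
    backpointers = []

    for i in range(1, n):
        new_best, pointers = [], []
        for b in range(k):
            # Best previous label for ending at label b at position i.
            a_star = max(range(k), key=lambda a: best[a] + pairwise[i][a][b])
            new_best.append(best[a_star] + pairwise[i][a_star][b] + unary[i][b])
            pointers.append(a_star)
        best = new_best
        backpointers.append(pointers)

    # Trace back from the best final label.
    y = max(range(k), key=lambda j: best[j])
    path = [y]
    for pointers in reversed(backpointers):
        y = pointers[y]
        path.append(y)
    return path[::-1]

# Hypothetical scores: 3 words, 2 labels.
unary = [[1.0, 0.0], [0.0, 0.5], [0.2, 0.1]]
pairwise = [None,
            [[0.0, 2.0], [0.0, 0.0]],
            [[0.0, 0.0], [1.5, 0.0]]]
best_path = viterbi(unary, pairwise)
```

No normalization constant is computed, in line with the observation that only the maximizing sequence is needed.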

FIG. 8 is a hardware configuration diagram of a computing device according to an embodiment.

Referring to FIG. 8, the preprocessing unit 100, the learning unit 200, and the prediction unit 300 execute, on a computing device 400 operated by at least one processor, a program containing instructions written to carry out the operations of the present invention.

The hardware of the computing device 400 may include at least one processor 410, a memory 420, a storage 430, and a communication interface 440, which may be connected through a bus. Hardware such as input devices and output devices may also be included. The computing device 400 may be loaded with various software, including an operating system capable of running the program.

The processor 410 controls the operation of the computing device 400 and may be any of various types of processor that process the instructions of a program, for example a CPU (Central Processing Unit), an MPU (Micro Processor Unit), an MCU (Micro Controller Unit), or a GPU (Graphic Processing Unit). The memory 420 loads the program so that the instructions written to carry out the operations of the present invention are processed by the processor 410. The memory 420 may be, for example, ROM (read only memory) or RAM (random access memory). The storage 430 stores the data, programs, and other items required to carry out the operations of the present invention. The communication interface 440 may be a wired or wireless communication module.

The entity name recognition apparatus of the present invention may be used to recognize entity names such as names of people, names of organizations, companies, or institutions, and geographic names in general news text.

The entity name recognition apparatus of the present invention may also be used on domain-specific text, not only on news text covering general topics. In biological and medical text in particular, the names of proteins, genes, chemical components, diseases, DNA, and RNA may be entity names.

Entity names in specialized fields are newly discovered or coined as each field is studied, there is no standard rule for naming them, and entities are sometimes named with descriptive or adjectival phrases. Because the entity name recognition apparatus of the present invention determines entity names and label sequences by capturing the context between neighboring words, it can also be used to recognize entity names that span several words in specialized fields.

The embodiments of the present invention described above are not implemented only through an apparatus and a method. They may also be implemented through a program that realizes functions corresponding to the configuration of the embodiments, or through a recording medium on which the program is recorded.

Although the embodiments of the present invention have been described in detail above, the scope of the present invention is not limited to them. Various modifications and improvements made by those skilled in the art using the basic concept of the present invention as defined in the following claims also fall within the scope of the present invention.

Claims

A method for a computing device operated by at least one processor to recognize an entity name, the method comprising:
receiving training text in which a plurality of words are tagged with entity name labels;
training each of a unary entity name prediction model, which learns the relationship between each word and the entity name label tagged to that word, and a binary entity name prediction model, which learns the relationship between adjacent word pairs of the training text and the entity name label pairs corresponding to those adjacent word pairs so as to output the possibility that arbitrary entity name labels are adjacent to each other; and
generating an entity name prediction model by connecting a label determination model to the outputs of the trained unary entity name prediction model and the trained binary entity name prediction model.

The method of claim 1,
wherein the receiving comprises preprocessing the training text with a word embedding model.

The method of claim 1,
wherein the receiving comprises tagging the plurality of words with the entity name labels using an IOBES scheme or a BIO scheme,
wherein, in the IOBES scheme, a word is tagged B if it is the beginning of an entity name composed of a plurality of words, I if it is in the middle of the entity name, E if it is the end of the entity name, O if it is not part of an entity name, and S if it is an entity name composed of a single word, and
wherein, in the BIO scheme, a word is tagged B if it is the beginning of an entity name composed of a plurality of words, I if it is a part of the entity name other than the beginning, and O if it is not part of an entity name.

The method of claim 1,
wherein the training comprises connecting a separate label determination model to each of the output of the unary entity name prediction model and the output of the binary entity name prediction model.

The method of claim 1,
wherein the entity name prediction model is a model in which the parameters of a Bidirectional Long Short-Term Memory (BiLSTM) layer included in the unary entity name prediction model and the parameters of a BiLSTM layer included in the binary entity name prediction model are integrated.

The method of claim 1, further comprising:
inputting text into the entity name prediction model; and
predicting, using the entity name prediction model, a label sequence listing the entity name labels of the words included in the text.

The method of claim 6,
wherein the label sequence is the sequence that maximizes the product of the probability value output by the trained unary entity name prediction model and the probability value output by the trained binary entity name prediction model.

The method of claim 7,
wherein the text is word embedding vectors preprocessed by a word embedding model.

A method for a computing device operated by at least one processor to recognize an entity name, the method comprising:
inputting text into a unary entity name prediction model and a binary entity name prediction model;
predicting, using the unary entity name prediction model, the entity name label corresponding to each word of the text, and predicting, using the binary entity name prediction model, the possibility that the entity name label pair corresponding to each adjacent word pair of the text is output; and
outputting, based on the prediction results, a label sequence listing the entity name labels of the words included in the text,
wherein the unary entity name prediction model is a model trained, on training text in which a plurality of words are tagged with entity name labels, to learn the relationship between each word and the entity name label tagged to that word, and
wherein the binary entity name prediction model is a model trained, on the training text, to learn the adjacency relationship between the tagged entity name labels so as to output the possibility that arbitrary entity name labels are adjacent to each other.

The method of claim 9,
wherein the outputting comprises combining the prediction result of the unary entity name prediction model and the prediction result of the binary entity name prediction model in a single label determination model.

A computing device comprising:
a memory; and
at least one processor that executes instructions of a program loaded into the memory,
wherein the program comprises instructions written to execute:
training an entity name prediction model with training text tagged with entity name labels;
inputting text into the trained entity name prediction model;
predicting, using the entity name prediction model, the entity name label of each word of the text and whether the entity name labels corresponding to neighboring words can be output adjacent to each other; and
outputting, as the label sequence of the text, the entity name labels that can be adjacent among the predicted entity name labels.

The computing device of claim 11,
wherein the inputting comprises generating word embedding vectors from the words of the text using an arbitrary word embedding model, and inputting the generated word embedding vectors into the entity name prediction model.

The computing device of claim 11,
wherein the entity name prediction model comprises a unary entity name prediction model that predicts the entity name label of each word of the text, and a binary entity name prediction model that predicts the possibility that the entity name labels corresponding to neighboring words of the text can be output adjacent to each other.

The computing device of claim 13,
wherein the training comprises connecting different label determination models to the unary entity name prediction model and the binary entity name prediction model, respectively.