KR102605709B1

KR102605709B1 - A pyramid layered attention model for nested and overlapped named entity recognition

Info

Publication number: KR102605709B1
Application number: KR1020210138239A
Authority: KR
Inventors: 조인휘; 최성민
Original assignee: 한양대학교 산학협력단
Priority date: 2021-10-18
Filing date: 2021-10-18
Publication date: 2023-11-23
Also published as: KR20230055021A

Abstract

A pyramid layered attention model for nested and overlapped named entity recognition is disclosed. A named entity recognition method performed by a named entity recognition system according to an embodiment includes: receiving text data as input to a deep learning model for named entity recognition; and recognizing a named entity from the input text data through the deep learning model, wherein the deep learning model may be constructed such that the encoding result output from an encoder is used as input data of an attention-based decoder and the output length of each layer of the decoder decreases.

Description

Pyramid Layered Attention Model for Nested and Overlapped Named Entity Recognition {A PYRAMID LAYERED ATTENTION MODEL FOR NESTED AND OVERLAPPED NAMED ENTITY RECOGNITION}

The description below relates to named entity recognition technology.

Named entity recognition (NER) is a subtask of information extraction that aims to locate named entities (NEs) in text and classify them into predefined categories such as persons, organizations, locations, and the like. Named entity recognition is not only a tool for information extraction but also plays an important role in a variety of natural language processing applications, such as text understanding, automatic text summarization, question answering systems, machine translation, and knowledge base construction.

Early named entity recognition systems achieved high accuracy, but because their rules had to be designed by hand with considerable manpower, a large number of rules had to be redesigned whenever a system was moved to a different domain. In recent years, deep learning has advanced rapidly and produced good results in many fields, so many named entity recognition systems have adopted deep learning models and achieve the best performance.

A method and a system can be provided that use a deep learning model to address nested and overlapped named entity recognition.

A named entity recognition method performed by a named entity recognition system includes: receiving text data as input to a deep learning model for named entity recognition; and recognizing a named entity from the input text data through the deep learning model, wherein the deep learning model may be constructed such that the encoding result output from an encoder is used as input data of an attention-based decoder and the output length of each layer of the decoder decreases.

The deep learning model may include: an encoder that performs word embedding and character embedding on the text data; and a decoder that uses the encoding result output by the encoder as input data and obtains the output data of each layer by computing attention scores between a plurality of mutually adjacent input data.

In the decoder, a plurality of attention layers are arranged in a pyramid shape. The decoder may multiply the encoding result output from the encoder by different matrices to generate a plurality of vectors for the encoding result at each time step; obtain correlation values by computing attention scores in which, to capture the correlation between two adjacent input data, the query of each of the two input data is multiplied by the key of the other; and use the tanh function to convert the obtained correlation values to values between -1 and 1.

The decoder may obtain the output of each attention layer by multiplying the correlation values, converted to values between -1 and 1, by the values and summing the products, and may perform classification prediction through a softmax function as the obtained output of each attention layer is passed to a fully connected layer.

The encoder consists of an embedding layer, an encoding layer, and a concatenate layer. In the embedding layer, word embedding and character embedding are performed on the text data. In the encoding layer, the embedding result formed by concatenating the word embedding result and the character embedding result is passed through a bidirectional LSTM to obtain an encoding output. In the concatenate layer, the obtained encoding output is concatenated with the output of a pre-trained language model, and the encoding result may be output through a fully connected layer.

A named entity recognition system includes: a data input unit that receives text data as input to a deep learning model for named entity recognition; and a named entity recognition unit that recognizes a named entity from the input text data through the deep learning model, wherein the deep learning model may be constructed such that the encoding result output from an encoder is used as input data of an attention-based decoder and the output length of each layer of the decoder decreases.

By using character-level, word-level, and context-level features to learn the content of the text well, and by using a pyramid-shaped attention-based decoder, the model can not only solve flat named entity recognition (flat NER) but can also be applied to nested and overlapped named entity recognition.

FIG. 1 is a diagram for explaining the general operation of a named entity recognition system according to an embodiment.
FIG. 2 is a diagram for explaining the operation of a named entity recognition system according to an embodiment.
FIG. 3 is a diagram for explaining an attention layer according to an embodiment.
FIG. 4 is a block diagram for explaining the configuration of a named entity recognition system according to an embodiment.
FIG. 5 is a flowchart for explaining a named entity recognition method in a named entity recognition system according to an embodiment.

Hereinafter, embodiments will be described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram for explaining the general operation of a named entity recognition system according to an embodiment.

The named entity recognition system 100 can address nested and overlapped named entity recognition using the deep learning model 110. The named entity recognition system 100 may receive text data as input to the deep learning model 110, and may recognize named entities 102 from the text data 101 through the deep learning model 110.

Here, the deep learning model 110 may be designed as a pyramid model based on an attention mechanism. When the embedding and encoding results of the text are fed into the pyramid-structured decoder, the layers are stacked from bottom to top in a pyramid shape. Each layer based on the designed attention mechanism computes attention over a plurality of (for example, two) mutually adjacent inputs and outputs the result, so the output of each layer is one element shorter than its input. The L-th layer of the decoder (L being a natural number) is used to recognize named entities of length L+1, so the layers do not affect one another when nested or overlapped named entities are recognized. The model is therefore suitable for flat, nested, and overlapped named entity recognition tasks.

For reference, nested named entity recognition (nested NER), overlapped named entity recognition, and flat named entity recognition (flat NER) are described below. Many named entities themselves contain other named entities, and recognizing them is called nested named entity recognition. For example, in the span "University of Washington", "University of Washington" is an organization and "Washington" is a location. This kind of task cannot be treated simply as a sequence labeling task, because the word "Washington" would have to be labeled both as the end of an organization and as a location in its own right. Recently, much work has addressed nested named entity recognition by borrowing object detection methods from computer vision, so that the model directly predicts the boundaries and classes of entities.

In addition, the task of recognizing named entities that partially share tokens is called overlapped named entity recognition. Nested named entity recognition can therefore be regarded as a special case of overlapped named entity recognition. However, there are also many cases in which several named entities overlap but none of them fully contains another. Methods based on boundary detection are therefore also suitable for overlapped named entity recognition.

The flat named entity recognition task annotates named entities in a single flat layer: there is no nesting or overlap among any of the entities. This task is comparatively simple to solve and can be treated directly as a sequence labeling task. A currently popular approach classifies each word using long short-term memory (LSTM) together with conditional random fields (CRF) or a simple softmax.
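The three cases above can be told apart mechanically once entities are represented as token spans. The following is a minimal sketch, not part of the patent: the `Span` type, the half-open index convention, and the `ORG`/`LOC` labels are assumptions made for illustration. It reports containment as "nested" and partial overlap as "overlapped", even though the text treats nesting as a special case of overlap.

```python
from typing import NamedTuple


class Span(NamedTuple):
    start: int  # index of the first token (inclusive)
    end: int    # index one past the last token (exclusive)
    label: str


def relation(a: Span, b: Span) -> str:
    """Classify two entity spans as disjoint, nested, or overlapped."""
    if a.end <= b.start or b.end <= a.start:
        return "disjoint"  # no shared token: the flat NER situation
    a_in_b = b.start <= a.start and a.end <= b.end
    b_in_a = a.start <= b.start and b.end <= a.end
    if a_in_b or b_in_a:
        return "nested"  # one span fully contains the other
    return "overlapped"  # shared tokens, but neither contains the other


tokens = ["University", "of", "Washington"]
org = Span(0, 3, "ORG")  # "University of Washington"
loc = Span(2, 3, "LOC")  # "Washington"
print(relation(org, loc))
```

A flat corpus is one in which every pair of annotated spans is reported as disjoint.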

FIG. 2 is a diagram for explaining the operation of a named entity recognition system according to an embodiment.

The named entity recognition system can recognize named entities from text data through the deep learning model 110. The named entity recognition system may design the deep learning model 110 to solve the named entity recognition task. The deep learning model 110 may consist of an encoder 210 and a decoder 220.

The encoder 210 may consist of three components: an embedding layer, an encoding layer, and a concatenate layer. Specifically, the embedding layer may combine embeddings: character embedding and word embedding may be used together, which combines word-level features with character-level features. The embedding result may be passed to the encoding layer, whose purpose is to incorporate context information. The output of the encoding layer may be combined with the output of a pre-trained language model, such as BERT embeddings, and passed to the decoding layers.

The decoder 220 may be designed as a pyramid-shaped attention model. The output of the L-th layer (L being a natural number) is a feature of spans of length L+1, and a softmax is finally used for classification.

Specifically, the named entity recognition system may receive text data as input to the deep learning model for named entity recognition. For example, the text data may be a sentence. The input to the deep learning model is then a text sentence of length T, and the output is L tag sequences in IOB2 format, with lengths T, T-1, ..., T-L+1 respectively. In the IOB2 format, B-{class} marks the first token of a named entity, I-{class} marks every token from the second onward among the one or more tokens of the named entity, and O marks a token that does not belong to any named entity.
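The IOB2 convention and the sequence lengths described above can be sketched as follows. This is an illustrative sketch only: the function names, the half-open `(start, end, label)` span convention, and the example sentence are assumptions, and the patent does not specify how spans are stored.

```python
def spans_to_iob2(num_tokens, spans):
    """Encode non-overlapping (start, end, label) spans as one IOB2 tag sequence."""
    tags = ["O"] * num_tokens  # tokens outside every entity
    for start, end, label in spans:
        tags[start] = f"B-{label}"  # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"  # second token onward
    return tags


def tag_sequence_lengths(T, L):
    """Lengths of the L output tag sequences: T, T-1, ..., T-L+1."""
    return [T - offset for offset in range(L)]


tokens = ["She", "studies", "at", "University", "of", "Washington"]
print(spans_to_iob2(len(tokens), [(3, 6, "ORG")]))
print(tag_sequence_lengths(len(tokens), 3))
```

For the six-token sentence, the first call yields three `O` tags followed by `B-ORG`, `I-ORG`, `I-ORG`, and the second yields the lengths 6, 5, and 4.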

The deep learning model 110 can be divided into two parts: the encoder 210 and the decoder 220. Encoding may be performed by the encoder 210 of the deep learning model 110. The embedding sequence obtained by encoding may be fed recursively to the decoder (decoding layers) for named entity recognition. Each layer of the decoder 220 performs sequence labeling over features of length T-L.

First, a sentence containing T tokens is denoted as a sequence of its tokens (the notation itself is not reproduced in this text).

For character embedding, the encoder 210 may use a bidirectional LSTM. For word embedding, the GloVe method may be used. The character embedding and word embedding results may be combined and passed to the encoding layer, where a bidirectional LSTM may be used to incorporate context information. At the end of the encoder, the output of the encoding layer is combined with the BERT embedding result and passed through a fully connected layer to form the output of the encoder. The encoder formulas for the input sentence can accordingly be expressed as follows.

More specifically, the word embeddings may be initialized with pre-trained word vectors; here, pre-trained GloVe vectors may be chosen. Words that are missing from GloVe, known as the out-of-vocabulary (OOV) problem, may be initialized randomly. This step is expressed as a formula (not reproduced in this text).

In addition, the character embeddings may be generated dynamically with a bidirectional LSTM whose weights are updated during training. Introducing character embeddings allows the model to handle OOV words better. Each word is split into characters, and an embedding is applied to each character. Finally, the last hidden state of the forward LSTM and the first hidden state of the backward LSTM are concatenated, and the result may be used as the vector of the word. This step is expressed as formulas (not reproduced in this text).

In addition, the encoding layer may concatenate the word embedding result and the character embedding result and pass them to a bidirectional LSTM, in order to make further use of contextual information. This step is expressed as formulas (not reproduced in this text).

In addition, a pre-trained language model may be used to obtain better context information; here, BERT may be used. This step is expressed as a formula (not reproduced in this text).

The concatenate layer may concatenate the output of the encoding layer with the output of the pre-trained language model and then output the result through a fully connected layer. This step is expressed as a formula (not reproduced in this text).
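Since the encoder formulas are missing from this text, the steps described above might be written as follows as a reading aid. This is a reconstruction under stated assumptions, not the patent's own notation: the symbols $x_t$ (the $t$-th token), $c_{t,1}, \dots, c_{t,m}$ (its characters), $e^{w}_t$, $e^{c}_t$, $h_t$, $u_t$, $z_t$, $W_e$, and $b_e$ are all introduced here for illustration, and any activation on the fully connected layer is left unspecified because the description does not state one.

```latex
\begin{aligned}
e^{w}_t &= \mathrm{GloVe}(x_t) \\
e^{c}_t &= \bigl[\,\overrightarrow{\mathrm{LSTM}}(c_{t,1},\dots,c_{t,m})_{m}\;;\;\overleftarrow{\mathrm{LSTM}}(c_{t,1},\dots,c_{t,m})_{1}\,\bigr] \\
h_t &= \mathrm{BiLSTM}\bigl([\,e^{w}_1 ; e^{c}_1\,],\dots,[\,e^{w}_T ; e^{c}_T\,]\bigr)_t \\
u_t &= \mathrm{BERT}(x_1,\dots,x_T)_t \\
z_t &= W_e\,[\,h_t ; u_t\,] + b_e
\end{aligned}
```

Here $[\,\cdot\,;\,\cdot\,]$ denotes concatenation, and $z_1, \dots, z_T$ would be the encoder output that is fed to the first attention layer of the decoder.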

FIG. 3 is a diagram for explaining the attention layer 230. The decoder 220 may consist of pairwise attention layers. The decoder obtains three vectors q, k, and v by multiplying each input by three different matrices. In other words, the decoder 220 may multiply the output of the encoder by three different matrices to generate three vectors for the output at each time step. The input data of the decoder is the output of the encoder. Here, q denotes the query, k the key, and v the value. The vectors q, k, and v are computed as follows.

Here, the input in question is the i-th input data of the L-th attention layer of the decoder, and the key, query, and value are those computed from that input. In addition, d_k = d_q.

The three matrices are parameters to be learned and may be shared among the attention layers of the decoder.
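The projection formula is missing from this text. One form consistent with the description, with assumed symbol names, is given below: $x_i^{L}$ stands for the i-th input of the L-th attention layer as a row vector, and $W_q$, $W_k$, $W_v$ stand for the three learned matrices. None of these names is confirmed by the source.

```latex
q_i^{L} = x_i^{L} W_q, \qquad
k_i^{L} = x_i^{L} W_k, \qquad
v_i^{L} = x_i^{L} W_v, \qquad
q_i^{L},\, k_i^{L} \in \mathbb{R}^{d_k}, \quad d_k = d_q
```

Because the same $W_q$, $W_k$, and $W_v$ may be shared across layers, the superscript $L$ appears only on the inputs and the resulting vectors, not on the matrices.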

Then, to capture the correlation between two adjacent input data, the query of each of the two input data is multiplied by the key of the other. The higher the value, the stronger the relationship between the two positions. The tanh function is then used to scale the correlation values into the range [-1, 1]. The attention scores are computed as follows.

In this formula, the query of one input data is multiplied by the transpose of the key of the adjacent input data, and the result represents the attention score of one input data with respect to the other.
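The score formula is missing from this text. A form consistent with the description, with assumed notation, is shown below: $q_i^{L}$ and $k_i^{L}$ are the query and key (row vectors) of the i-th input of layer $L$, and $a^{L}_{i \to i+1}$ is a name introduced here for the score of input $i$ with respect to input $i+1$. Whether the original also applies a scaling factor before the tanh is not stated, so none is shown.

```latex
a^{L}_{i \to i+1} = \tanh\!\bigl(q_i^{L}\,(k_{i+1}^{L})^{\top}\bigr), \qquad
a^{L}_{i+1 \to i} = \tanh\!\bigl(q_{i+1}^{L}\,(k_{i}^{L})^{\top}\bigr)
```

Unlike softmax-normalized attention weights, these scores can be negative, so an adjacent input can contribute with either sign.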

The output of the attention layer 230 of the decoder 220 may be obtained by multiplying the attention scores of two adjacent input data by the corresponding values and summing the products. It is computed as follows.

Here, the result is the i-th output of the L-th attention layer of the decoder, and it serves as the i-th input of the (L+1)-th attention layer.
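The pairwise attention layer described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names and the toy matrices are hypothetical, and the pairing of each score with the value of the input whose key it used is an assumption, since the text only says the scores are multiplied by "the corresponding values".

```python
import math


def matvec(x, W):
    """Multiply row vector x (length d_in) by matrix W (d_in rows, d_out columns)."""
    return [sum(x[r] * W[r][c] for r in range(len(x))) for c in range(len(W[0]))]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def pairwise_attention_layer(inputs, W_q, W_k, W_v):
    """Map n input vectors to n - 1 outputs, one per pair of adjacent inputs."""
    q = [matvec(x, W_q) for x in inputs]
    k = [matvec(x, W_k) for x in inputs]
    v = [matvec(x, W_v) for x in inputs]
    outputs = []
    for i in range(len(inputs) - 1):
        s_right = math.tanh(dot(q[i], k[i + 1]))  # query of i with key of i+1
        s_left = math.tanh(dot(q[i + 1], k[i]))   # query of i+1 with key of i
        # Assumption: each score weights the value of the input whose key it used.
        outputs.append([s_right * vr + s_left * vl for vr, vl in zip(v[i + 1], v[i])])
    return outputs


identity = [[1.0, 0.0], [0.0, 1.0]]  # toy weights, shared by both layers
encoded = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three encoder outputs
layer1 = pairwise_attention_layer(encoded, identity, identity, identity)
layer2 = pairwise_attention_layer(layer1, identity, identity, identity)
print(len(encoded), len(layer1), len(layer2))
```

Stacking the layer on its own output shortens the sequence by one each time, which gives the pyramid shape; reusing the same matrices at both levels mirrors the statement that the parameters may be shared among layers.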

Each time the data passes through an attention layer of the decoder, the output length decreases by 1. The output of attention layer L is equivalent to applying a sliding window over the text, where the window size is L+1.
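The sliding-window reading above can be made concrete with a short sketch. The helper name and the 1-based layer index are assumptions for illustration; the function only enumerates which tokens each output position of a layer covers.

```python
def windows_for_layer(tokens, layer):
    """Token windows covered by the outputs of attention layer `layer` (1-based).

    Layer L uses window size L + 1 and yields len(tokens) - L windows.
    """
    size = layer + 1
    return [tokens[i:i + size] for i in range(len(tokens) - size + 1)]


tokens = ["University", "of", "Washington"]
print(windows_for_layer(tokens, 1))
print(windows_for_layer(tokens, 2))
```

With three tokens, layer 1 covers the two adjacent pairs and layer 2 covers the single three-token span, so a three-token entity such as "University of Washington" would be labeled at layer 2.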

Finally, classification may be performed at each attention layer using a fully connected layer and a softmax. The output of each attention layer is passed to the fully connected layer, and the softmax function may be used to perform classification prediction. The formula is as follows.

Here, the result is the prediction for the i-th output of the L-th attention layer of the decoder, and d_class is the total number of tag classes.
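The classification step can be illustrated with a small sketch of a fully connected layer followed by a softmax. The weights, bias, feature vector, and tag set below are made-up values for illustration, and `classify` is a hypothetical helper name; the number of columns in `W` plays the role of d_class.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features, W, b, tags):
    """Apply a fully connected layer and softmax; return the best tag and all probabilities."""
    logits = [sum(f * W[r][c] for r, f in enumerate(features)) + b[c]
              for c in range(len(b))]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return tags[best], probs


tags = ["O", "B-ORG", "I-ORG"]  # d_class = 3 in this toy example
W = [[0.0, 1.0, 0.0],
     [0.0, 1.0, 0.5]]  # maps 2 features to 3 class logits
b = [0.0, 0.0, 0.0]
tag, probs = classify([1.0, 2.0], W, b, tags)
print(tag, probs)
```

In the model described here, this step would be applied to every output position of every attention layer, producing one tag sequence per layer.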

FIG. 4 is a block diagram for explaining the configuration of a named entity recognition system according to an embodiment, and FIG. 5 is a flowchart for explaining a named entity recognition method in a named entity recognition system according to an embodiment.

The processor of the named entity recognition system 100 may include a data input unit 410 and a named entity recognition unit 420. These components of the processor may be representations of different functions performed by the processor according to control instructions provided by program code stored in the named entity recognition system. The processor and its components may control the named entity recognition system to perform steps 510 to 520 of the named entity recognition method of FIG. 5. The processor and its components may be implemented to execute instructions according to the code of an operating system and the code of at least one program contained in memory.

The processor may load into memory the program code stored in a file of a program for the named entity recognition method. For example, when the program is executed in the named entity recognition system, the processor may control the system to load the program code from the program file into memory under the control of the operating system. The data input unit 410 and the named entity recognition unit 420 may each be a different functional representation of the processor for executing the subsequent steps 510 to 520 by executing the instructions of the corresponding portion of the program code loaded into memory.

In step 510, the data input unit 410 may receive text data as input to the deep learning model for named entity recognition. The data input unit 410 may receive text data in the form of sentences. The text data may include one or more sentences, and may be written in a single language, such as Korean or English, or in a mixture of several languages.

In step 520, the named entity recognition unit 420 may recognize named entities from the input text data through the deep learning model. This enables named entity recognition suitable for flat, nested, and overlapped named entity recognition tasks.

The apparatus described above may be implemented with hardware components, software components, and/or a combination of hardware and software components. For example, the apparatus and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. A processing device may run an operating system (OS) and one or more software applications executed on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to the execution of software. For ease of understanding, a single processing device is sometimes described as being used, but those of ordinary skill in the art will appreciate that a processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, a processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure a processing device to operate as desired or may instruct the processing device independently or collectively. Software and/or data may be embodied in any type of machine, component, physical device, virtual equipment, computer storage medium, or device in order to be interpreted by the processing device or to provide instructions or data to the processing device. Software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on one or more computer-readable recording media.

The method according to the embodiments may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code, such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.

Although the embodiments have been described above with reference to a limited set of embodiments and drawings, those of ordinary skill in the art can make various modifications and variations based on the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or components of the described systems, structures, devices, circuits, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the claims that follow.

Claims

An entity name recognition method performed by an entity name recognition system, the method comprising:
inputting text data into a deep learning model for entity name recognition; and
recognizing an entity name from the input text data through the deep learning model,
wherein the deep learning model is constructed so that the output length of each layer of a decoder decreases, by using the encoding result output from an encoder as input data of the attention-based decoder, and comprises:
the encoder, which outputs the encoding result by combining context information, obtained by combining word embedding and character embedding of the text data, with the output of a pre-trained language model; and
the decoder, which uses the encoding result output from the encoder as input data and obtains the output data of each layer through an operation of calculating attention scores between a plurality of mutually adjacent input data,
wherein the decoder:
multiplies the encoding result output from the encoder by different matrices to generate a plurality of vectors for the encoding result at each time step; obtains the value of the correlation between two adjacent input data through an attention score operation in which the query of each of the two input data is multiplied by the key of the other; and converts the obtained correlation value to a value between -1 and 1 using the tanh function.
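The adjacent-pair attention score recited above can be illustrated with a minimal sketch. It assumes the "plurality of vectors" includes a query and a key per time step, and that each adjacent pair yields two scores (the left query with the right key, and the right query with the left key). The function names, dimensions, and random weights are hypothetical and are not taken from the patent.

```python
import math
import random


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def adjacent_scores(encodings, w_q, w_k):
    """Return tanh-squashed cross scores for each pair of adjacent inputs.

    For inputs i and i+1, the query of each is multiplied by the key of the
    other, and tanh maps each result into the range -1 to 1.
    """
    queries = [matvec(w_q, h) for h in encodings]
    keys = [matvec(w_k, h) for h in encodings]
    scores = []
    for i in range(len(encodings) - 1):
        left_query_right_key = math.tanh(dot(queries[i], keys[i + 1]))
        right_query_left_key = math.tanh(dot(queries[i + 1], keys[i]))
        scores.append((left_query_right_key, right_query_left_key))
    return scores


def random_matrix(rows, cols):
    """Hypothetical weight initialisation, for illustration only."""
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
dim = 4
# Five encoded time steps standing in for the encoder output.
encodings = [[random.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(5)]
w_q = random_matrix(dim, dim)
w_k = random_matrix(dim, dim)

scores = adjacent_scores(encodings, w_q, w_k)
print(len(encodings), "inputs ->", len(scores), "adjacent pairs")
```

Because only adjacent pairs are scored, a layer over n inputs produces n - 1 pair results, which matches the decreasing output length per decoder layer described in the claim.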

(Deleted)

The entity name recognition method of claim 1,
wherein the decoder:
obtains the output of each attention layer by multiplying the correlation values converted to values between -1 and 1 by the values and adding the results; and
performs classification prediction through a softmax function as the obtained output of each attention layer is passed to a fully connected layer.
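As a rough illustration of this claim, the sketch below stacks two adjacent-pair attention layers and classifies the final output with a fully connected layer followed by softmax. Which score weights which value vector is an assumption made here (each score weights the value of the token whose key it used); the identity weight matrices, the three-class output layer, and all names are hypothetical.

```python
import math


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attention_layer(inputs, w_q, w_k, w_v):
    """One layer over adjacent pairs; returns len(inputs) - 1 vectors."""
    q = [matvec(w_q, h) for h in inputs]
    k = [matvec(w_k, h) for h in inputs]
    v = [matvec(w_v, h) for h in inputs]
    outputs = []
    for i in range(len(inputs) - 1):
        # Cross scores, squashed into the range -1 to 1 by tanh.
        weight_right = math.tanh(dot(q[i], k[i + 1]))
        weight_left = math.tanh(dot(q[i + 1], k[i]))
        # Assumed pairing: weighted sum of the two value vectors.
        outputs.append([weight_right * right + weight_left * left
                        for left, right in zip(v[i], v[i + 1])])
    return outputs


def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(vec, w_out, bias):
    """Fully connected layer followed by softmax."""
    logits = [z + b for z, b in zip(matvec(w_out, vec), bias)]
    return softmax(logits)


identity = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical weights
inputs = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.6]]

layer1 = attention_layer(inputs, identity, identity, identity)  # 3 -> 2 vectors
layer2 = attention_layer(layer1, identity, identity, identity)  # 2 -> 1 vector

w_out = [[0.2, -0.1], [0.0, 0.3], [-0.5, 0.4]]  # 3 hypothetical classes
bias = [0.0, 0.0, 0.0]
probs = classify(layer2[0], w_out, bias)
print(len(layer1), len(layer2), probs)
```

In this sketch each vector in layer k summarises a span of k + 1 adjacent tokens, so classifying every layer's outputs is one way nested spans of different lengths could each receive a label.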

The entity name recognition method of claim 1, wherein:
the encoder consists of an embedding layer, an encoding layer, and a concatenate layer;
in the embedding layer, word embedding and character embedding are performed on the text data;
in the encoding layer, an encoding output is obtained by passing the embedding result, formed by concatenating the word embedding result and the character embedding result, through a bidirectional LSTM; and
in the concatenate layer, the obtained encoding output is concatenated with the output of the pre-trained language model, and the encoding result is output through a fully connected layer.
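The data flow through the three encoder layers can be sketched as follows. To keep the example short and dependency-free, the bidirectional LSTM is replaced by a much simpler tanh recurrence run in both directions; this stand-in is not an LSTM and only illustrates how the vector sizes change. All embeddings, the pre-trained language model outputs, and the weights are hypothetical placeholder values.

```python
import math


def linear(matrix, vec):
    """Fully connected layer without bias: matrix times vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def simple_recurrent_pass(vectors, hidden_size):
    """Stand-in for one LSTM direction (NOT a real LSTM cell)."""
    state = [0.0] * hidden_size
    outputs = []
    for x in vectors:
        drive = sum(x) / len(x)
        state = [math.tanh(drive + h) for h in state]
        outputs.append(state)
    return outputs


def bidirectional(vectors, hidden_size):
    """Run the recurrence left-to-right and right-to-left, then concatenate."""
    forward = simple_recurrent_pass(vectors, hidden_size)
    backward = simple_recurrent_pass(vectors[::-1], hidden_size)[::-1]
    return [f + b for f, b in zip(forward, backward)]


def encode(word_vecs, char_vecs, lm_vecs, w_fc, hidden_size):
    # Embedding layer: concatenate word and character embeddings per token.
    embedded = [w + c for w, c in zip(word_vecs, char_vecs)]
    # Encoding layer: bidirectional recurrence over the embedded sequence.
    contextual = bidirectional(embedded, hidden_size)
    # Concatenate layer: join with the pre-trained language model output,
    # then project through a fully connected layer.
    joined = [h + l for h, l in zip(contextual, lm_vecs)]
    return [linear(w_fc, j) for j in joined]


# Three tokens with hypothetical embeddings.
word_vecs = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]]      # word embedding, size 2
char_vecs = [[0.4], [-0.2], [0.1]]                       # character embedding, size 1
lm_vecs = [[0.2, 0.1, 0.0], [0.0, 0.3, 0.1], [0.5, 0.2, -0.3]]  # LM output, size 3

hidden_size = 2                                          # per direction
w_fc = [[0.1] * (2 * hidden_size + 3) for _ in range(5)]  # projects 7 -> 5

encoded = encode(word_vecs, char_vecs, lm_vecs, w_fc, hidden_size)
print(len(encoded), "tokens, each encoded to size", len(encoded[0]))
```

The sequence length is unchanged by the encoder; only the per-token vector size changes at each stage, and the resulting sequence is what the decoder would consume.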

An entity name recognition system, comprising:
a data input unit that inputs text data into a deep learning model for entity name recognition; and
an entity name recognition unit that recognizes an entity name from the input text data through the deep learning model,
wherein the deep learning model is constructed so that the output length of each layer of a decoder decreases, by using the encoding result output from an encoder as input data of the attention-based decoder, and comprises:
the encoder, which outputs the encoding result by combining context information, obtained by combining word embedding and character embedding of the text data, with the output of a pre-trained language model; and
the decoder, which uses the encoding result output from the encoder as input data and obtains the output data of each layer through an operation of calculating attention scores between a plurality of mutually adjacent input data,
wherein the decoder:
multiplies the encoding result output from the encoder by different matrices to generate a plurality of vectors for the encoding result at each time step; obtains the value of the correlation between two adjacent input data through an attention score operation in which the query of each of the two input data is multiplied by the key of the other; and converts the obtained correlation value to a value between -1 and 1 using the tanh function.