KR102642012B1

KR102642012B1 - Electronic apparatus for performing pre-processing regarding analysis of text constituting electronic medical record

Info

Publication number: KR102642012B1
Application number: KR1020210182628A
Authority: KR
Inventors: 김철호; 김유섭; 최정명; 서수영; 이재준
Original assignee: 한림대학교 산학협력단
Priority date: 2021-12-20
Filing date: 2021-12-20
Publication date: 2024-02-27
Also published as: KR20230093754A

Abstract

An electronic device is disclosed. The electronic device includes a memory that stores a word dictionary indicating a matching relationship between words of a first language and words of a second language and at least one deep learning model for processing text of the second language, and a processor connected to the memory. The processor identifies, within an entire text, a first text corresponding to the first language and a second text corresponding to the second language; converts the identified first text into a third text corresponding to the second language based on the word dictionary; performs natural language processing on an integrated text including the second text and the third text to obtain one or more vectors; and trains the deep learning model based on the obtained vectors.

Description

An electronic device that performs preprocessing related to the analysis of text constituting an electronic medical record { ELECTRONIC APPARATUS FOR PERFORMING PRE-PROCESSING REGARDING ANALYSIS OF TEXT CONSTITUTING ELECTRONIC MEDICAL RECORD }

The present disclosure relates to an electronic device that performs preprocessing for text analysis and, more specifically, to an electronic device that can effectively process an entire text in which multiple languages are mixed by translating a specific language based on a word dictionary specialized for a specialized field.

When using the electronic medical records of patients with various diseases such as cerebral infarction, general-purpose machine translation has not been effective at rendering medical terms accurately in English.

In particular, the words frequently used in everyday conversation or news articles are far removed from medical terminology. When a general text preprocessing method is used, the imprecise conversion of Korean into English prevents the meaning of the text from being recognized accurately by a machine.

Compared with artificial languages, natural languages, and Korean in particular, use the same word with different meanings in different fields such as medicine and art. Applying mechanical translation therefore produces incorrect vectors, which makes it impossible to build effective text data.

FIG. 1a is a schematic diagram illustrating a conventional English natural language processing procedure, and FIG. 1b is a schematic diagram illustrating a conventional Korean natural language processing procedure.

Referring to FIG. 1a, English text preprocessed through tokenization, stemming, stop word removal, part-of-speech tagging, and the like can be used as English text vectors.

Referring to FIG. 1b, Korean text can be used as vectors after undergoing entity detection, relation detection, and the like in addition to tokenization and POS tagging.

However, when Korean and English are mixed, Korean tokens and English tokens must be produced separately. In that case the original order of the Korean and English words is scrambled, so an accurate interpretation of the context cannot be expected.

Registered Patent Publication No. 10-233144 (Text recognition system and method using a neural network)

The present disclosure provides an electronic device and a control method that, in preprocessing text containing multiple languages (e.g., Korean and English), convert all of the text into a specific language using a word dictionary specialized for a specialized field and then build a deep learning model using the integrated text.

The objects of the present disclosure are not limited to those mentioned above. Other objects and advantages of the present disclosure that are not mentioned can be understood from the following description and will be understood more clearly from the embodiments of the present disclosure. It will also be readily apparent that the objects and advantages of the present disclosure can be realized by the means recited in the claims and combinations thereof.

An electronic device according to an embodiment of the present disclosure includes a memory that stores a word dictionary indicating a matching relationship between words of a first language and words of a second language and at least one deep learning model for processing text of the second language, and a processor connected to the memory. The processor identifies, within an entire text, a first text corresponding to the first language and a second text corresponding to the second language; converts the identified first text into a third text corresponding to the second language based on the word dictionary; performs natural language processing on an integrated text including the second text and the third text to obtain one or more vectors; and trains the deep learning model based on the obtained vectors.

The processor may identify, within the first text, at least one first keyword included in the word dictionary; convert the first keyword included in the first text into a second keyword corresponding to the second language based on the word dictionary; and, with the second keyword set as a single entity, convert the first text into the third text corresponding to the second language.

In this case, the processor may compare the accuracy of the deep learning model before and after it is trained on the obtained vectors and, if the comparison shows that the accuracy of the deep learning model has decreased, update the word dictionary so that the second-language keyword matched to the first keyword is changed.
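The accuracy check and dictionary update described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the helper names `accuracy` and `update_if_degraded`, the toy threshold classifiers, and the idea of choosing the replacement from a list of alternative translations are all assumptions, since the disclosure only states that the matched keyword is changed when accuracy drops.

```python
def accuracy(predict, samples):
    """Fraction of (vector, label) samples that `predict` labels correctly."""
    return sum(predict(x) == y for x, y in samples) / len(samples)


def update_if_degraded(word_dict, keyword, alternatives, acc_before, acc_after):
    """Return a dictionary whose entry for `keyword` is changed if accuracy dropped.

    `alternatives` is a hypothetical list of candidate translations; the first
    one that differs from the current entry is used as the replacement.
    """
    if acc_after >= acc_before:
        return word_dict  # training did not hurt accuracy: keep the dictionary
    current = word_dict[keyword]
    replacement = next((c for c in alternatives if c != current), current)
    return {**word_dict, keyword: replacement}


# Toy validation set and two stand-in "models" (before and after training).
samples = [([1.0], 1), ([0.0], 0), ([0.6], 1), ([0.4], 0)]
before = accuracy(lambda x: int(x[0] > 0.5), samples)
after = accuracy(lambda x: int(x[0] > 0.7), samples)

updated = update_if_degraded(
    {"활력증상": "vital symptom"}, "활력증상",
    ["vital symptom", "vital sign"], before, after,
)
```

Returning a new dictionary rather than mutating the stored one keeps the previous mapping available in case the update itself needs to be rolled back.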

In addition, when a plurality of second keywords matching the first keyword are identified according to the word dictionary, the processor may obtain a plurality of third texts by independently reflecting each of the plurality of second keywords. If the difference between the outputs that the deep learning model produces when the vectors matching each of the third texts are input independently is less than a threshold, the processor may maintain the word dictionary; if the difference is greater than or equal to the threshold, the processor may update the word dictionary.
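A minimal sketch of this threshold test is shown below. It assumes the model output is a single scalar score and measures the "difference between outputs" as the spread between the largest and smallest score; the disclosure does not fix either choice. `candidates_disagree`, `toy_vectorize`, and `toy_model` are hypothetical names used only for illustration.

```python
def candidates_disagree(first_text, keyword, candidates, vectorize, model, threshold):
    """True when swapping candidate translations moves the model output by >= threshold."""
    outputs = [
        model(vectorize(first_text.replace(keyword, candidate)))
        for candidate in candidates
    ]
    return max(outputs) - min(outputs) >= threshold


def toy_vectorize(text):
    # Stand-in for the NLP pipeline: a one-element vector holding the word count.
    return [len(text.split())]


def toy_model(vector):
    # Stand-in for the deep learning model: a scalar score.
    return vector[0] / 10


text = "환자 활력증상 안정"
keep_dictionary = not candidates_disagree(
    text, "활력증상", ["vital sign", "vital symptom"], toy_vectorize, toy_model, 0.05
)
needs_update = candidates_disagree(
    text, "활력증상", ["vital sign", "sign"], toy_vectorize, toy_model, 0.05
)
```

When the candidates yield nearly the same output, the choice between them does not matter to the model, so the dictionary can stay as it is; a large spread signals that the ambiguous entry should be resolved.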

The processor may obtain one or more vectors by performing tokenization, part-of-speech tagging, and word embedding on the integrated text.

Meanwhile, the memory may include a first word dictionary for a first specialized field, a second word dictionary for a second specialized field, and a third word dictionary for a third specialized field. In this case, when the entire text is set to the first specialized field, the processor may convert the first text into the third text corresponding to the second language using the first word dictionary.

Here, when no keyword included in the first word dictionary exists in the first text, the processor may convert the first text into the third text corresponding to the second language using at least one of the second word dictionary and the third word dictionary.

In this case, the processor may select, from the second word dictionary and the third word dictionary, at least one word dictionary related to a keyword included in the first text; convert the first text into the third text corresponding to the second language using the selected word dictionary; and update the first word dictionary based on the second-language keyword that matches the keyword in the selected word dictionary.
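The fallback across field-specific dictionaries can be sketched as below. The function name `convert_keywords`, the field labels, and the rule "use the first other dictionary that contains any keyword of the text" are assumptions made for illustration; the disclosure only requires that a related dictionary be selected and that the first dictionary be updated from it.

```python
def convert_keywords(first_text, field, dictionaries):
    """Replace keywords using the dictionary for `field`, falling back to the others.

    `dictionaries` maps a field name to a {first-language: second-language} dict.
    Pairs borrowed from a fallback dictionary are copied into the primary one,
    so `dictionaries[field]` is modified in place.
    """
    primary = dictionaries[field]
    hits = {k: v for k, v in primary.items() if k in first_text}
    if not hits:
        for name, other in dictionaries.items():
            if name == field:
                continue
            hits = {k: v for k, v in other.items() if k in first_text}
            if hits:
                primary.update(hits)  # update the first dictionary with the borrowed pairs
                break
    for source, target in hits.items():
        first_text = first_text.replace(source, target)
    return first_text


dictionaries = {
    "brain": {"뇌경색": "cerebral infarction"},
    "heart": {"심근경색": "myocardial infarction"},
}
converted = convert_keywords("심근경색 의심", "brain", dictionaries)
```

After the call, the brain-disease dictionary also contains the borrowed pair, so the next text with the same keyword is resolved without a fallback.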

According to an embodiment of the present disclosure, a control method of an electronic device that includes a word dictionary indicating a matching relationship between words of a first language and words of a second language and at least one deep learning model for processing text of the second language includes: identifying, within an entire text, a first text corresponding to the first language and a second text corresponding to the second language; converting the identified first text into a third text corresponding to the second language based on the word dictionary; performing natural language processing on an integrated text including the second text and the third text to obtain one or more vectors; and training the deep learning model based on the obtained vectors.

An electronic device according to an embodiment of the present disclosure includes a memory that stores a word dictionary indicating a matching relationship between words of a first language and words of a second language, and a processor connected to the memory. The processor may identify, within an entire text, a first text corresponding to the first language and a second text corresponding to the second language; convert the identified first text into a third text corresponding to the second language based on the word dictionary; and perform natural language processing on an integrated text including the second text and the third text to obtain one or more vectors.

In handling an entire text in which multiple languages are mixed, the electronic device and control method according to the present disclosure can unify the text into a single language by using a word dictionary optimized for a specialized field. As a result, precise natural language processing with a unified context is performed, which improves the training environment of the deep learning model.

When applied to the medical field, the electronic device according to the present disclosure builds a knowledge-based dictionary that can accurately convey the meaning of medical terms, applies it to electronic medical records, and then performs additional machine translation. This integrated preprocessing method secures the efficiency of natural language processing for electronic medical records in which Korean and English are mixed. In this case, the meaning and context of sentences can be reflected as fully as possible, in addition to the individual meanings of words. The electronic device according to the present disclosure performs a preprocessing procedure that can be applied to various types of diseases, including brain diseases such as cerebral infarction.

FIG. 1a is a schematic diagram illustrating a conventional English natural language processing procedure;
FIG. 1b is a schematic diagram illustrating a conventional Korean natural language processing procedure;
FIG. 2 is a block diagram illustrating the configuration of an electronic device including a word dictionary according to an embodiment of the present disclosure;
FIG. 3 is a flowchart illustrating the operation of an electronic device according to an embodiment of the present disclosure;
FIG. 4 is a diagram illustrating an operation in which an electronic device performs Korean-to-English conversion using a word dictionary according to an embodiment of the present disclosure; and
FIG. 5 is an algorithm illustrating an operation in which an electronic device selectively uses word dictionaries classified by specialized field according to an embodiment of the present disclosure.

Before describing the present disclosure in detail, the manner in which this specification and the drawings are written will be explained.

First, the terms used in this specification and the claims are general terms selected in consideration of their functions in the various embodiments of the present disclosure. However, these terms may vary depending on the intent of those skilled in the art, legal or technical interpretation, the emergence of new technologies, and the like. Some terms have also been arbitrarily selected by the applicant. Such terms may be interpreted as defined in this specification and, absent a specific definition, may be interpreted based on the overall content of this specification and common technical knowledge in the relevant field.

In addition, the same reference numerals or symbols in the drawings attached to this specification denote parts or components that perform substantially the same function. For convenience of explanation and understanding, the same reference numerals or symbols are used across different embodiments. That is, even if components having the same reference numeral appear in a plurality of drawings, the plurality of drawings do not necessarily represent a single embodiment.

In this specification and the claims, terms including ordinal numbers such as "first" and "second" may be used to distinguish between components. These ordinal numbers are used to distinguish identical or similar components from one another, and their use should not be construed as limiting the meaning of the terms. For example, the order of use or arrangement of a component combined with such an ordinal number is not limited by the number. Where necessary, the ordinal numbers may be used interchangeably.

In this specification, singular expressions include plural expressions unless the context clearly indicates otherwise. Terms such as "comprise" or "consist of" are intended to specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood as not precluding the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

In the embodiments of the present disclosure, terms such as "module", "unit", and "part" refer to components that perform at least one function or operation, and such components may be implemented in hardware, in software, or in a combination of hardware and software. In addition, a plurality of "modules", "units", "parts", and the like may be integrated into at least one module or chip and implemented by at least one processor, except where each needs to be implemented in separate, specific hardware.

In the embodiments of the present disclosure, when a part is said to be connected to another part, this includes not only a direct connection but also an indirect connection through another medium. In addition, saying that a part includes a component means that it may further include other components, not that other components are excluded, unless specifically stated otherwise.

FIG. 2 is a block diagram illustrating the configuration of an electronic device including a word dictionary according to an embodiment of the present disclosure.

Referring to FIG. 2, the electronic device 100 may include a memory 110 and a processor 120.

The electronic device 100 may be a server, or may be any of various terminal devices such as a smartphone, a smart speaker, a desktop PC, a laptop PC, or a tablet PC.

The electronic device 100 may also be implemented as a system including one or more computers. For example, the electronic device 100 may be, but is not limited to, a system that manages electronic medical records and is operated by at least one hospital, medical institution, or public institution.

The memory 110 stores an operating system (OS) for controlling the overall operation of the components of the electronic device 100 and at least one instruction or piece of data related to those components.

The memory 110 may include non-volatile memory such as ROM or flash memory, and may include volatile memory such as DRAM. The memory 110 may also include a hard disk, a solid state drive (SSD), or the like.

Referring to FIG. 2, the memory 110 may include at least one word dictionary 111 and at least one deep learning model 112.

The word dictionary 111 may include information on matching relationships between words in different languages. For example, the word dictionary 111 may include words in a first language (e.g., Korean) together with the matching words in a second language (e.g., English).

A separate word dictionary 111 may be provided for each specialized field. For example, the memory 110 may separately include a word dictionary of words related to brain diseases, a word dictionary of words related to heart diseases, and so on.

The deep learning model 112 may be a network model for processing text in a specific language. For example, when the word dictionary 111 contains matching relationships between Korean words and English words, the deep learning model 112 may be a network model for processing English text.

The deep learning model 112 may be designed to process text for various purposes, such as text classification, text summarization, text recognition/conversion, and generation of dialogue related to text.

The deep learning model 112 may have been trained with various learning algorithms, either on the electronic device 100 or on a separate server or system.

Examples of learning algorithms include supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning. Unless otherwise specified, the learning algorithm in the present disclosure is not limited to these examples.

The deep learning model 112 is a network model based on a neural network (a neural network model) and may include a plurality of weighted network nodes. The network nodes may form connections based on the weights between nodes in different layers.

The processor 120 controls the electronic device 100 as a whole. Specifically, the processor 120 is connected to the memory 110 and may perform operations according to various embodiments of the present disclosure by executing at least one instruction stored in the memory 110.

The processor 120 may include a general-purpose processor such as a CPU, an AP, or a digital signal processor (DSP); a graphics-dedicated processor such as a GPU or a vision processing unit (VPU); or an artificial-intelligence-dedicated processor such as an NPU. An artificial-intelligence-dedicated processor may be designed with a hardware structure specialized for training or using a specific artificial intelligence model.

FIG. 3 is a flowchart illustrating the operation of an electronic device according to an embodiment of the present disclosure.

Referring to FIG. 3, the electronic device 100 may identify, within an entire text, a first text corresponding to the first language and a second text corresponding to the second language (S310).

For example, the entire text may be a mixture of Korean and English, with Korean as the first language and English as the second language. The disclosure is not limited to this example, however, and various languages may be identified.
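As a minimal sketch of step S310, assuming Korean as the first language and English as the second, a mixed text can be split into language runs by Unicode script. The function name `split_by_language` and the regular expression are illustrative assumptions and are not taken from the disclosure.

```python
import re

# Hangul syllables (U+AC00 to U+D7A3) form first-language runs and ASCII letters
# form second-language runs. Digits, spaces, and punctuation are skipped.
_RUN = re.compile(r"(?P<ko>[\uac00-\ud7a3]+)|(?P<en>[A-Za-z]+)")


def split_by_language(text):
    """Return (language, run) pairs in the order they appear in `text`."""
    return [(match.lastgroup, match.group()) for match in _RUN.finditer(text)]


runs = split_by_language("환자 활력증상 stable")
first_text = [run for lang, run in runs if lang == "ko"]
second_text = [run for lang, run in runs if lang == "en"]
```

Keeping the runs in their original order matters here: it lets the converted first-language runs be put back in place, so the word order of the integrated text is not scrambled.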

The electronic device 100 may then convert the first text into a third text corresponding to the second language based on the word dictionary 111 described above (S320).

Specifically, the electronic device 100 may identify, within the first text, at least one first keyword included in the word dictionary 111. Based on the word dictionary 111, the electronic device 100 may then convert the first keyword included in the first text into a second keyword corresponding to the second language.

With the second keyword set as a single entity, the electronic device 100 may convert the first text into the third text corresponding to the second language.

In this regard, FIG. 4 is a diagram illustrating an operation in which an electronic device performs Korean-to-English conversion using a word dictionary according to an embodiment of the present disclosure.

Referring to FIG. 4, the word dictionary 405 contains words related to brain diseases and, specifically, matching relationships between Korean words and English words.

The list 410 contains Korean texts, and the list 410' contains the results of converting the texts of the list 410 into English.

Referring to FIG. 4, each text in the list 410 may be converted into the corresponding text in the list 410' according to the matching relationships defined in the word dictionary 405.

For example, when the first text described above contains the keyword "활력증상", the electronic device 100 may use the word dictionary 405 to convert "활력증상" into "vital sign".

Here, the electronic device 100 may convert the first text into English with "vital sign" set as a single entity in the first text in which "활력증상" has been replaced by "vital sign". When translation is performed with the "vital sign" obtained from the word dictionary 405 fixed as a single entity in this way, the accuracy of the translation can increase.
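One way to keep a dictionary keyword fixed as a single entity during translation is to swap it for a placeholder, translate the rest, and restore it afterwards. The sketch below illustrates that idea under stated assumptions: `convert_with_pinned_keywords`, the placeholder format, and the word-by-word `toy_translate` stand-in are hypothetical, and the disclosure does not specify how the entity is protected.

```python
def convert_with_pinned_keywords(first_text, word_dict, translate):
    """Swap dictionary keywords for placeholders, translate, then restore them."""
    pinned = {}
    # Longest keywords first, so a longer term wins over a shorter one inside it.
    for i, keyword in enumerate(sorted(word_dict, key=len, reverse=True)):
        if keyword in first_text:
            slot = f"__KW{i}__"
            first_text = first_text.replace(keyword, slot)
            pinned[slot] = word_dict[keyword]
    translated = translate(first_text)  # placeholders must pass through unchanged
    for slot, target in pinned.items():
        translated = translated.replace(slot, target)
    return translated


def toy_translate(text):
    # Stand-in for a general translator: word-by-word lookup, unknown tokens kept.
    general = {"환자": "patient", "안정": "stable"}
    return " ".join(general.get(token, token) for token in text.split())


brain_dict = {"활력증상": "vital sign", "뇌경색": "cerebral infarction"}
result = convert_with_pinned_keywords("환자 활력증상 안정", brain_dict, toy_translate)
```

The toy translator keeps Korean word order, so the output is not fluent English; the point is only that "vital sign" reaches the output exactly as the dictionary defines it, whatever the general translator would have produced for "활력증상".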

이때, 전자 장치(100)는 상술한 단어 사전(111, 405) 외에 한영 번역을 수행하기 위한 적어도 하나의 인공지능 모델을 활용할 수도 있다(ex. 종래의 기계 번역 내지는 통계기반 번역).At this time, the electronic device 100 may utilize at least one artificial intelligence model to perform Korean-English translation in addition to the word dictionaries 111 and 405 described above (ex. conventional machine translation or statistical-based translation).

상술한 실시 예에 따라 제1 언어의 제1 텍스트가 제3 텍스트(: 제2 언어)로 변환되면, 전자 장치(100)는 제2 텍스트 및 제3 텍스트를 포함하는 통합 텍스트(: 제2 언어)를 자연어 처리하여 하나 이상의 벡터를 획득할 수 있다(S330).According to the above-described embodiment, when the first text of the first language is converted into the third text (: second language), the electronic device 100 converts the integrated text (: second language) including the second text and the third text. ) can be processed into natural language to obtain one or more vectors (S330).

구체적으로, 전자 장치(100)는 통합 텍스트에 대하여 tokenization, 품사 태깅(POS Tagging)을 수행한 이후, word embedding을 수행하여 하나 이상의 벡터를 획득할 수 있다.Specifically, the electronic device 100 may perform tokenization and part-of-speech tagging (POS Tagging) on the integrated text and then perform word embedding to obtain one or more vectors.

Before the vector conversion through word embedding, the electronic device 100 may also perform stop word removal, N-gram-based word selection, or a determination of whether adjacent words should be combined, on the integrated text.
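The preprocessing described above may be sketched, under stated assumptions, as follows. The stop word list, the phrase set, and the count-vector `embed` function are hypothetical stand-ins: the disclosure does not name a tokenizer, a tagger, or an embedding model, and part-of-speech tagging is omitted here because it would require an external library.

```python
import re

# Hypothetical stop word list for the second language (English).
STOP_WORDS = {"the", "a", "an", "of", "is", "was", "and"}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def remove_stop_words(tokens):
    """Drop tokens that appear in the stop word list."""
    return [t for t in tokens if t not in STOP_WORDS]

def merge_bigrams(tokens, phrases):
    """Join adjacent tokens that form a known two-word phrase."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in phrases:
            merged.append(tokens[i] + "_" + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

def embed(tokens, vocabulary):
    """Toy count vector standing in for a learned word embedding."""
    index = {word: k for k, word in enumerate(vocabulary)}
    vector = [0] * len(vocabulary)
    for t in tokens:
        if t in index:
            vector[index[t]] += 1
    return vector
```

For instance, with the phrase set `{("vital", "sign")}`, the dictionary-derived term stays a single token `vital_sign` before vectorization, which mirrors the single-entity handling used during translation.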

Then, once the vectors into which the integrated text has been converted are obtained as described above, the electronic device 100 may train the deep learning model 112 based on the obtained vectors (S340).

Compared with obtaining vectors in bulk from the entire text in which the first language and the second language are mixed, the control method of the electronic device according to the present disclosure (FIG. 3) first unifies the text into one language using the word dictionary 111, whose word conversions are backed by domain expertise, and only then obtains the vectors. This reduces the likelihood of erroneous natural language processing before vector conversion. That advantage may in turn improve the training results of the deep learning model trained on the vectors converted from the text.

Meanwhile, in one embodiment, the electronic device 100 may verify the effect of training by comparing the accuracy of the deep learning model 112 before and after it is trained on the vectors of the integrated text described above.

For example, the electronic device 100 may input vectors of various texts prepared for validating the deep learning model 112 into the deep learning model 112, and may determine the accuracy from the output of the deep learning model 112.

If the accuracy of the deep learning model 112 decreases as a result of training on the vectors of the integrated text described above, the electronic device 100 may update the word dictionary 111.
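The before-and-after comparison may be illustrated by the following minimal sketch. It assumes that the model can be called as a function from a vector to a predicted label and that the validation set is a list of (vector, label) pairs; both are assumptions of this sketch and not details given in the disclosure.

```python
def accuracy(predict, samples):
    """Fraction of validation samples whose prediction equals the label."""
    correct = sum(1 for vector, label in samples if predict(vector) == label)
    return correct / len(samples)

def training_degraded(predict_before, predict_after, validation):
    """True when accuracy after training is lower than before training."""
    return accuracy(predict_after, validation) < accuracy(predict_before, validation)
```

A result of `True` corresponds to the condition under which the word dictionary 111 may be updated.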

Specifically, assume that the integrated text (in the second language) was obtained as a result of converting a first keyword (e.g., “활력 증상”) included in the first text in the first language into a second keyword (e.g., “vital sign”) in the second language according to the word dictionary 111.

Here, if the accuracy of the deep learning model 112 trained on the vectors of the integrated text instead decreases, the electronic device 100 may change, within the word dictionary 111, the second-language keyword matched to the first keyword to a keyword other than the second keyword described above.

In this case, the keyword matched to the first keyword may be designated according to user input from one or more experts.

For example, the electronic device 100 may transmit information about the first keyword to a plurality of expert terminals. The electronic device 100 may then receive, from each expert terminal, a second-language keyword into which the first keyword has been converted. The electronic device 100 may then update the word dictionary 111 according to the keyword received from the largest number of expert terminals.
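The majority-based update may be sketched as follows. The function name and the plain `dict` representation of the word dictionary are assumptions of this sketch, and the handling of ties is not addressed by the disclosure; here a tie falls to whichever answer was received first.

```python
from collections import Counter

def update_from_expert_votes(dictionary, first_keyword, expert_answers):
    """Return a copy of the dictionary updated with the most frequent answer.

    `expert_answers` holds one second-language keyword per expert terminal.
    """
    if not expert_answers:
        return dict(dictionary)  # nothing received, keep the existing entry
    winner, _count = Counter(expert_answers).most_common(1)[0]
    updated = dict(dictionary)
    updated[first_keyword] = winner
    return updated
```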

Meanwhile, in the word dictionary 111, a plurality of second keywords (in the second language) may be matched to a single first keyword (in the first language).

As an example, assume that in step S320 described above, a first keyword in the first language is identified and a plurality of second keywords matched to the first keyword are identified according to the word dictionary 111.

In this case, the electronic device 100 may obtain a plurality of third texts by applying each of the plurality of second keywords independently. Each of the plurality of third texts may then be combined with the second text in the second language to obtain a plurality of integrated texts.

The plurality of integrated texts, each including a different third text, may then each be converted into vectors separately, and the electronic device 100 may input the vectors converted from each integrated text into the deep learning model 112 independently.

Here, if the difference between the outputs of the deep learning model 112 for the vectors converted from the respective integrated texts is less than a threshold, the word dictionary 111 may be kept unchanged. That is, the plurality of second keywords set as matching the first keyword may remain as they are in the word dictionary 111.

On the other hand, if the difference between the outputs of the deep learning model 112 for the vectors converted from the respective integrated texts is greater than or equal to the threshold, the word dictionary 111 may be updated.

As a specific example, from among the plurality of integrated texts derived from the different second keywords, the electronic device 100 may select the integrated text that produced the output of the deep learning model 112 closest to the correct answer.

The electronic device 100 may then identify, among the remaining integrated texts other than the selected one, a (low-quality) integrated text whose output differs from the output closest to the correct answer by the threshold or more. The electronic device 100 may then delete the second keyword associated with the identified (low-quality) integrated text from the list of keywords matched to the first keyword in the word dictionary 111.
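The threshold-based maintenance of candidate keywords may be sketched as follows. The sketch assumes that each model output is a single number (for example, a predicted probability) and that the "difference between outputs" is the gap between the largest and smallest output; the disclosure does not fix either choice, and the function name is hypothetical.

```python
def prune_candidates(outputs, target, threshold):
    """Return the second keywords to keep for one first keyword.

    `outputs` maps each candidate second keyword to the model output obtained
    from the integrated text built with that keyword; `target` is the correct
    answer for the same input.
    """
    values = list(outputs.values())
    if max(values) - min(values) < threshold:
        return list(outputs)  # outputs agree closely: dictionary unchanged
    # Candidate whose output is closest to the correct answer.
    best = min(outputs, key=lambda k: abs(outputs[k] - target))
    # Drop candidates whose output is a threshold or more away from the best.
    return [k for k in outputs
            if k == best or abs(outputs[k] - outputs[best]) < threshold]
```

With outputs of 0.9, 0.85, and 0.4 for three candidates, a target of 1.0, and a threshold of 0.2, only the candidate behind the 0.4 output is removed from the keyword list.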

Meanwhile, in one embodiment, the memory 110 may include a plurality of word dictionaries divided by specialized field.

For example, the memory 110 may include a first word dictionary for a first specialized field, a second word dictionary for a second specialized field, and a third word dictionary for a third specialized field.

As a specific example, the first specialized field may be brain diseases, the second specialized field heart diseases, and the third specialized field mental illnesses. The specialized fields may of course include various other medical specialties, as well as specialized fields entirely outside medicine.

When word dictionaries for various specialized fields are stored in advance in this way, the electronic device 100 may obtain the integrated text by preferentially using the word dictionary suited to the specialized field to which the entire text (a mixture of the first and second languages) belongs. Depending on the situation, however, word dictionaries of other specialized fields may also be used; a related embodiment is described below with reference to FIG. 5.

FIG. 5 illustrates an algorithm by which an electronic device according to an embodiment of the present disclosure selectively uses word dictionaries divided by specialized field.

Referring to FIG. 5, the electronic device 100 may set the specialized field of the entire text (a mixture of the first and second languages) (S510). The specialized field may be set according to user input, or may be preset according to metadata stored together with the entire text.

Alternatively, the electronic device 100 may identify the specialized field by inputting the entire text into at least one classifier model trained to select or classify specialized fields.

The electronic device 100 may then identify the first text in the first language included in the entire text (S520). This step may be included in step S310 described above.
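One possible way to separate first-language and second-language text is sketched below, on the assumption that the first language is Korean and the second language is English, consistent with the Korean-to-English example given earlier. Detecting Korean by the Unicode Hangul Syllables range (U+AC00 to U+D7A3) is a choice made for this sketch; the disclosure does not state how the languages are identified.

```python
import re

# One or more Hangul words, optionally separated by whitespace.
HANGUL_RUN = re.compile(r"[\uac00-\ud7a3]+(?:\s+[\uac00-\ud7a3]+)*")

def split_by_language(text):
    """Return (first_text_segments, second_text_segments) for a mixed text."""
    first = HANGUL_RUN.findall(text)
    second = [s.strip() for s in HANGUL_RUN.split(text) if s.strip()]
    return first, second
```

Digits and punctuation fall into the second-language segments in this sketch, which may or may not match the behavior intended for a real medical record.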

Here, the electronic device 100 may convert the first text into the third text in the second language, and in this step the word dictionary suited to the previously set specialized field may be selected and used (S530).

For example, when the first specialized field is set for the entire text, the electronic device 100 may select the first word dictionary suited to the first specialized field and may use the selected first word dictionary to convert the first text into the third text in the second language.

Specifically, if a keyword of the first word dictionary is present in the first text (S540 - Y), the electronic device 100 may convert the first text into the second language by converting that keyword using the first word dictionary (S550).

On the other hand, it is possible that no keyword of the selected first word dictionary is present in the first text (S540 - N).

In this case, the electronic device 100 may select at least one word dictionary corresponding to another specialized field (S560).

The electronic device 100 may then convert the first text into the second language based on the selected word dictionary (S550).

As an example, the electronic device 100 may select a word dictionary that includes at least one of the keywords included in the first text. As a specific example, when a keyword included in the first text is included in the second word dictionary corresponding to the second specialized field, the electronic device 100 may obtain the integrated text by using the second word dictionary to convert the first text into the second language.
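The S540 to S560 branch may be sketched as follows. The field names, the sample entries, and the substring test used to decide whether a keyword "is present" in the first text are assumptions of this sketch.

```python
def select_dictionary(first_text, primary_field, dictionaries):
    """Pick the word dictionary to use for converting `first_text`.

    `dictionaries` maps a specialized field name to a {keyword: translation}
    dict. Returns (field, dictionary), or None when no dictionary contains a
    keyword found in the text.
    """
    def has_keyword(d):
        return any(keyword in first_text for keyword in d)

    primary = dictionaries[primary_field]
    if has_keyword(primary):              # S540 - Y
        return primary_field, primary
    for field, d in dictionaries.items():  # S540 - N, then S560
        if field != primary_field and has_keyword(d):
            return field, d
    return None
```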

As another example, the electronic device 100 may select at least one specialized field that is highly related to the first specialized field set for the entire text, and may select the word dictionary of the selected specialized field.

The degree of relatedness between specialized fields may be calculated according to the similarity between the contents of the word dictionaries matched to the respective specialized fields.

Specifically, the electronic device 100 may compare each of the keywords included in the first word dictionary of the first specialized field with each of the keywords included in the second word dictionary of the second specialized field, and thereby calculate the average (or another statistic) of the similarity between keywords.

The electronic device 100 may then set the degree of relatedness between the first specialized field and the second specialized field according to the average similarity between keywords. For example, the larger the average similarity between keywords, the larger the calculated degree of relatedness between the first and second specialized fields may be.
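The pairwise average may be sketched as follows. The disclosure does not say how the similarity between two keywords is measured; the character-level ratio from Python's `difflib.SequenceMatcher` is used here purely as an assumed stand-in, and an embedding-based similarity could be substituted.

```python
from difflib import SequenceMatcher
from itertools import product

def relatedness(dict_a, dict_b):
    """Average similarity over every keyword pair drawn from two dictionaries."""
    pairs = list(product(dict_a, dict_b))
    if not pairs:
        return 0.0  # an empty dictionary gives no evidence of relatedness
    total = sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs)
    return total / len(pairs)
```

Because every keyword of one dictionary is compared with every keyword of the other, the cost grows with the product of the two dictionary sizes.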

As an example, assume that the degree of relatedness between the first and second specialized fields is greater than the degree of relatedness between the first and third specialized fields.

Here, as described above, if a keyword of the first text included in the entire text is not present in the first word dictionary of the previously selected first specialized field, the electronic device 100 may select the second word dictionary of the second specialized field, which has the highest degree of relatedness to the first specialized field.

The electronic device 100 may then use the selected second word dictionary to convert the first text into the second language.

However, the second word dictionary may also fail to include the keyword in the first text. In this case, the electronic device 100 may select the third word dictionary of the third specialized field, which has the second-highest degree of relatedness to the first specialized field. The electronic device 100 may then use the third word dictionary to convert the first text into the second language.
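The relatedness-ordered fallback may be sketched as follows. The function name, the field names, and the precomputed `scores` mapping (relatedness of each other field to the primary field) are hypothetical.

```python
def lookup_with_ranked_fallback(keyword, primary_field, dictionaries, scores):
    """Look up `keyword`, trying the primary field first and then the other
    fields in descending order of relatedness to the primary field.

    Returns (field, translation), or None if no dictionary has the keyword.
    """
    others = sorted(
        (f for f in dictionaries if f != primary_field),
        key=lambda f: scores[f],
        reverse=True,
    )
    for field in [primary_field, *others]:
        if keyword in dictionaries[field]:
            return field, dictionaries[field][keyword]
    return None
```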

Meanwhile, as described above, keyword conversion may be performed not through the word dictionary initially selected according to the specialized field of the entire text (e.g., the first word dictionary) but through a word dictionary of another specialized field (e.g., S540 - N -> S560 -> S550). In that case, the electronic device 100 may update the initially selected word dictionary (e.g., the first word dictionary) based on the finally selected word dictionary (e.g., the second word dictionary and/or the third word dictionary).

Specifically, within the finally selected word dictionary of another specialized field (e.g., the second word dictionary and/or the third word dictionary), the electronic device 100 may identify the second-language keyword matched to the keyword included in the first text.

The electronic device 100 may then add, to the initially selected word dictionary, the matching relationship between the keyword included in the first text and the second-language keyword (matched through the other word dictionary).
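The backfilling of the initially selected dictionary may be sketched as follows. The in-place update of a plain `dict` and the function name are assumptions of this sketch; existing entries of the initial dictionary are left untouched here, a point on which the disclosure is silent.

```python
def backfill_primary(primary, fallback, keywords_in_text):
    """Copy matches found only in the fallback dictionary into the primary one.

    Returns the entries that were added, so the caller can log or review them.
    """
    added = {}
    for keyword in keywords_in_text:
        if keyword not in primary and keyword in fallback:
            primary[keyword] = fallback[keyword]
            added[keyword] = fallback[keyword]
    return added
```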

As a result, the word dictionary (e.g., the first word dictionary) matched to the specialized field of the entire text (e.g., the first specialized field) can be continuously supplemented.

Meanwhile, two or more of the various embodiments described above may be implemented in combination with one another, as long as they do not conflict with or contradict each other.

Meanwhile, the various embodiments described above may be implemented in a recording medium readable by a computer or a similar device, using software, hardware, or a combination thereof.

In a hardware implementation, the embodiments described in the present disclosure may be implemented using at least one of application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), processors, controllers, micro-controllers, microprocessors, and other electrical units for performing functions.

In some cases, the embodiments described in this specification may be implemented by the processor itself. In a software implementation, embodiments such as the procedures and functions described in this specification may be implemented as separate software modules. Each of these software modules may perform one or more of the functions and operations described in this specification.

Meanwhile, computer instructions or a computer program for performing the processing operations of the electronic device 100 according to the various embodiments of the present disclosure described above may be stored in a non-transitory computer-readable medium. When executed by a processor of a specific device, the computer instructions or computer program stored in such a non-transitory computer-readable medium cause that device to perform the processing operations of the electronic device 100 according to the various embodiments described above.

A non-transitory computer-readable medium is not a medium that stores data for a short moment, such as a register, cache, or memory, but a medium that stores data semi-permanently and can be read by a device. Specific examples of non-transitory computer-readable media include a CD, DVD, hard disk, Blu-ray disc, USB storage, memory card, and ROM.

Although preferred embodiments of the present disclosure have been shown and described above, the present disclosure is not limited to the specific embodiments described. Various modifications may of course be made by those of ordinary skill in the technical field to which the disclosure pertains without departing from the gist of the present disclosure as claimed in the claims, and such modifications should not be understood separately from the technical idea or outlook of the present disclosure.

100: electronic device 110: memory
111: word dictionary 120: processor

Claims

An electronic device comprising:
a memory storing a word dictionary indicating a matching relationship between words of a first language and words of a second language, and at least one deep learning model for processing text of the second language; and
a processor connected to the memory,
wherein the processor is configured to:
identify, within an entire text, a first text corresponding to the first language and a second text corresponding to the second language,
convert the identified first text into a third text corresponding to the second language based on the word dictionary,
obtain one or more vectors by performing natural language processing on an integrated text including the second text and the third text, and
train the deep learning model based on the obtained vectors,
wherein the processor is configured to:
identify, within the first text, at least one first keyword included in the word dictionary,
convert the first keyword included in the first text into a second keyword corresponding to the second language based on the word dictionary, and
convert the first text into the third text corresponding to the second language with the second keyword set as a single entity,
wherein the processor is configured to:
compare the accuracy of the deep learning model before and after training based on the obtained vectors, and
when the comparison shows that the accuracy of the deep learning model has decreased, update the word dictionary so that the second-language keyword matched to the first keyword is changed, and
wherein the processor is configured to:
when a plurality of second keywords matched to the first keyword are identified according to the word dictionary, obtain a plurality of third texts by, for each of the plurality of second keywords, converting the first keyword according to that second keyword, setting it as an entity, and converting the first text into the second language,
maintain the word dictionary when a difference between outputs of the deep learning model for inputs of vectors matched to the respective third texts is less than a threshold, and
update the word dictionary when the difference between the outputs of the deep learning model for inputs of vectors matched to the respective third texts is greater than or equal to the threshold.

(deleted)

The electronic device of claim 1,
wherein the processor is configured to
obtain the one or more vectors by performing tokenization, part-of-speech tagging, and word embedding on the integrated text.

The electronic device of claim 1,
wherein the memory includes
a first word dictionary for a first specialized field, a second word dictionary for a second specialized field, and a third word dictionary for a third specialized field, and
wherein the processor is configured to,
when the first specialized field is set for the entire text, convert the first text into the third text corresponding to the second language using the first word dictionary.

The electronic device of claim 6,
wherein the processor is configured to,
when no keyword included in the first word dictionary is present in the first text, convert the first text into the third text corresponding to the second language using at least one of the second word dictionary and the third word dictionary.

The electronic device of claim 7,
wherein the processor is configured to:
select, from among the second word dictionary and the third word dictionary, at least one word dictionary related to a keyword included in the first text,
convert the first text into the third text corresponding to the second language using the selected word dictionary, and
update the first word dictionary based on the second-language keyword matched to the keyword in the selected word dictionary.

A method of controlling an electronic device,
wherein the electronic device includes
a word dictionary indicating a matching relationship between words of a first language and words of a second language, and at least one deep learning model for processing text of the second language,
the control method comprising:
identifying, within an entire text, a first text corresponding to the first language and a second text corresponding to the second language;
converting the identified first text into a third text corresponding to the second language based on the word dictionary;
obtaining one or more vectors by performing natural language processing on an integrated text including the second text and the third text; and
training the deep learning model based on the obtained vectors,
wherein the converting of the identified first text into the third text corresponding to the second language includes:
identifying, within the first text, at least one first keyword included in the word dictionary,
converting the first keyword included in the first text into a second keyword corresponding to the second language based on the word dictionary, and
converting the first text into the third text corresponding to the second language with the second keyword set as a single entity,
wherein the control method of the electronic device further comprises:
comparing the accuracy of the deep learning model before and after training based on the obtained vectors; and
when the comparison shows that the accuracy of the deep learning model has decreased, updating the word dictionary so that the second-language keyword matched to the first keyword is changed, and
wherein the control method of the electronic device further comprises:
when a plurality of second keywords matched to the first keyword are identified according to the word dictionary, obtaining a plurality of third texts by, for each of the plurality of second keywords, converting the first keyword according to that second keyword, setting it as an entity, and converting the first text into the second language;
maintaining the word dictionary when a difference between outputs of the deep learning model for inputs of vectors matched to the respective third texts is less than a threshold; and
updating the word dictionary when the difference between the outputs of the deep learning model for inputs of vectors matched to the respective third texts is greater than or equal to the threshold.

A computer program stored in a computer-readable medium,
wherein the computer program, when executed by a processor of an electronic device, causes the electronic device to perform the control method of claim 9.

(deleted)