KR20200114555A

KR20200114555A - Apparatus and method for matching natural language based on morphological analysis and system for controlling electronic document using the same

Info

Publication number: KR20200114555A
Application number: KR1020190036517A
Authority: KR
Inventors: 박미경; 최재호; 이상현
Original assignee: 주식회사 포시에스
Priority date: 2019-03-29
Filing date: 2019-03-29
Publication date: 2020-10-07
Also published as: KR102215091B1

Abstract

Disclosed are a device and method for matching a natural language based on a morphological analysis and a system for controlling an electronic document using the same. According to one embodiment of the present invention, the device for matching a natural language based on a morphological analysis may comprise: a pattern generation unit acquiring concepts included in a plurality of sentences through a morpheme analysis of a plurality of sentences containing similar intentions to create a pattern string and generating a pattern based on the pattern string; and a user input analysis unit receiving a user input and matching the pattern corresponding to the user input to return matching data.

Description

A natural language matching device and method based on morpheme analysis, and an electronic document control system using the same {APPARATUS AND METHOD FOR MATCHING NATURAL LANGUAGE BASED ON MORPHOLOGICAL ANALYSIS AND SYSTEM FOR CONTROLLING ELECTRONIC DOCUMENT USING THE SAME}

The present application relates to a natural language matching apparatus and method based on morpheme analysis, and to an electronic document control system using the same.

Business operations generate documents in many different forms. Such documents were traditionally prepared on printed paper, but electronic document and electronic signature services that generate documents in electronic form have recently been introduced and adopted, and they are growing rapidly in step with the government's paperless policy.

In addition, as smart devices have become widespread, users have steadily accumulated experience with voice-based services. Most voice-based services currently provide functions tied to individuals' daily lives, and they are expected to expand into corporate business operations.

Accordingly, there is a growing need for technologies and systems that generate electronic documents from various types of input, including voice signals, and for technologies and systems that control and manage electronic documents based on voice input and the like.

Furthermore, to control and manage electronic documents based on various types of input including voice signals, the input must be analyzed to identify the user's intention. For natural language input (for example, voice input or text input), rule-based matching may be used to determine whether a specific word or phrase is present or repeated in the input.

For such rule-based matching, scripts, regular expressions, and the like are used in most programming languages. Hangul, however, can express a total of 11,172 syllables. Because the number of syllables is so large, changing just one of the initial, medial, or final components of a syllable yields a substantially different syllable, so it is difficult to write a regular expression that matches all similar syllables.
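
The figure of 11,172 follows from how precomposed Hangul syllables are laid out in Unicode (U+AC00 through U+D7A3): 19 initial consonants, 21 medial vowels, and 28 final positions (27 consonants plus "no final"). The Python sketch below is an illustration added for clarity and is not part of the disclosed apparatus; the function names are assumptions.

```python
# Precomposed Hangul syllables occupy U+AC00..U+D7A3 in Unicode.
HANGUL_BASE = 0xAC00
NUM_INITIAL = 19  # initial consonants
NUM_MEDIAL = 21   # medial vowels
NUM_FINAL = 28    # final consonants, counting "no final" as index 0


def syllable_count() -> int:
    """Number of syllables expressible as initial + medial + optional final."""
    return NUM_INITIAL * NUM_MEDIAL * NUM_FINAL


def decompose(syllable: str) -> tuple[int, int, int]:
    """Return the (initial, medial, final) indices of one Hangul syllable."""
    offset = ord(syllable) - HANGUL_BASE
    if not 0 <= offset < syllable_count():
        raise ValueError(f"not a precomposed Hangul syllable: {syllable!r}")
    initial, rest = divmod(offset, NUM_MEDIAL * NUM_FINAL)
    medial, final = divmod(rest, NUM_FINAL)
    return initial, medial, final


def compose(initial: int, medial: int, final: int) -> str:
    """Inverse of decompose: build the syllable from its three indices."""
    return chr(HANGUL_BASE + (initial * NUM_MEDIAL + medial) * NUM_FINAL + final)
```

Changing only the final index moves a syllable to a different code point (for example, 가 and 각 differ only in the final), which is why a character-level regular expression has to enumerate or compute every variant it wants to accept.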

In addition, in Korean text, roots and affixes combine readily, and predicates (verbs and adjectives) in particular can take a wide variety of affixes. To match every inflected form that can be derived from a single base form, it must be possible to write a pattern that covers even irregular phenomena in which particular consonants or vowels are dropped or added when an affix is attached.

The background art of the present application is disclosed in Korean Registered Patent Publication No. 10-1604553.

The present application is intended to solve the problems of the prior art described above, and an object thereof is to provide a method and apparatus capable of matching natural language by introducing morpheme analysis.

Another object of the present application is to provide a natural language matching method and apparatus that analyze a plurality of sentences having similar intentions to extract roots and reflect them in a pattern, so that matching is possible for all natural language input, including derived words.

Another object of the present application is to provide an electronic document control system based on natural language processing that identifies, based on a pattern, the intention of the user who generated a user input, and performs control operations such as creation, processing, and search of electronic documents according to that intention.

However, the technical objects to be achieved by the embodiments of the present application are not limited to those described above, and other technical objects may exist.

As a technical means for achieving the above technical objects, a morpheme analysis-based natural language matching apparatus according to an embodiment of the present application may include: a pattern generation unit that obtains concepts included in a plurality of sentences having similar intentions through morpheme analysis of those sentences, creates a pattern string, and generates a pattern based on the pattern string; and a user input analysis unit that receives a user input, matches the pattern corresponding to the user input, and returns matching data.

In addition, the pattern generation unit may include: a morpheme analysis unit that performs morpheme analysis on each of the plurality of sentences; a concept acquisition unit that obtains concepts based on the morpheme analysis result; a pattern string creation unit that creates a pattern string including at least one of the concepts or a capture for extracting a specific phrase from the user input; and a pattern definition unit that defines a pattern based on the pattern string.

In addition, the morpheme analysis result may include the morphemes obtained by analyzing each of the plurality of sentences in morpheme units, and part-of-speech information associated with each analyzed morpheme.

In addition, the part-of-speech information may be expressed using preset part-of-speech (POS) tags.

In addition, the pattern string creation unit may create the pattern string based on a concept set that includes at least one concept obtained by the concept acquisition unit.

In addition, a concept may be defined by a concept name that encompasses the words that are its elements, or may be defined as an anonymous concept that directly lists the words that are its elements without a separate concept name.
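
The distinction between a named concept and an anonymous concept can be pictured with a small data structure. The Python sketch below is only one possible representation; the class name, field names, and example words are assumptions introduced for illustration and do not come from the specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Concept:
    """A set of words; `name` is None for an anonymous concept (assumed layout)."""
    elements: frozenset[str]
    name: Optional[str] = None

    @property
    def is_anonymous(self) -> bool:
        return self.name is None

    def contains(self, word: str) -> bool:
        """True when `word` is one of the concept's elements."""
        return word in self.elements


# Hypothetical examples: a named concept and an anonymous one.
create_concept = Concept(frozenset({"create", "write", "draft"}), name="CREATE")
anonymous_concept = Concept(frozenset({"report", "form"}))
```

A named concept can be referenced from several pattern strings by its name, whereas an anonymous concept only exists at the place where its words are listed.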

In addition, the user input analysis unit may include: a segmentation unit that splits the user input at spaces to divide it into a plurality of segments; and a matching unit that determines whether at least one of the segments matches the pattern and returns matching data when the match succeeds.

In addition, the matching unit may match at least one segment against a word, or match the at least one segment against a concept.

In addition, the matching unit may determine whether matching is performed for a word or concept described in a pattern string, and return matching data based on that determination.

In addition, to match at least one segment against a concept, the matching unit may determine whether there is a concept having, as an element, a word identical to a single segment; if there is none, it may join that segment and the following segment, including the space between them, and look for a concept having a word identical to the joined text as an element.

In addition, the matching unit may join segments one after another while checking whether the joined text matches an element of a concept, and if no element matches, it may check for a match against the elements of the next concept.
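
The segment-joining behavior described above can be sketched as follows. This is a simplified illustration under stated assumptions: concepts are modeled as a plain dictionary from a concept name to a set of words, comparison is exact string equality (the disclosed apparatus relies on morpheme analysis rather than raw equality), and the function names and example vocabulary are hypothetical.

```python
from typing import Optional


def segment(user_input: str) -> list[str]:
    """Split the user input at whitespace into segments."""
    return user_input.split()


def match_concept(
    segments: list[str],
    start: int,
    concepts: dict[str, set[str]],
) -> Optional[tuple[str, str, int]]:
    """Try each concept in order, starting at segments[start].

    For one concept, first test the single segment, then keep joining the
    following segments (with a space) and test again. When no element of the
    concept matches, move on to the next concept.
    Returns (concept name, matched text, index after the match) or None.
    """
    for name, elements in concepts.items():
        candidate = ""
        for end in range(start, len(segments)):
            candidate = segments[end] if end == start else f"{candidate} {segments[end]}"
            if candidate in elements:
                return name, candidate, end + 1
    return None


# Hypothetical concept set used only for this example.
EXAMPLE_CONCEPTS = {
    "DOCUMENT": {"business trip report", "expense report"},
    "CREATE": {"create", "write"},
}
```

With the example concepts, `match_concept(segment("create business trip report"), 1, EXAMPLE_CONCEPTS)` tests "business", then "business trip", and succeeds on "business trip report".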

In addition, when the pattern string includes a capture, the matching unit may match the capture against the at least one segment and extract the specific phrase in the user input associated with the capture.

In addition, the matching data may include at least one of data on the matched concept or the specific phrase in the user input associated with the capture.
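
To make the role of a capture concrete, the sketch below matches a pattern made of concept references and captures against a list of segments. The pattern representation (a list of `("concept", name)` and `("capture", label)` pairs), the greedy left-to-right strategy, and all names are assumptions made for illustration; the specification does not prescribe this syntax, and this simple version does not support two adjacent captures.

```python
from typing import Optional


def match_pattern(
    segments: list[str],
    pattern: list[tuple[str, str]],
    concepts: dict[str, set[str]],
) -> Optional[dict[str, str]]:
    """Return the captured phrases when the whole input matches, else None."""
    captures: dict[str, str] = {}
    pos = 0
    for i, (kind, value) in enumerate(pattern):
        if kind == "concept":
            # A concept item must match exactly one segment in this sketch.
            if pos >= len(segments) or segments[pos] not in concepts[value]:
                return None
            pos += 1
        else:
            # A capture takes segments until the next concept item matches,
            # or until the end of the input when it is the last item.
            nxt = pattern[i + 1] if i + 1 < len(pattern) else None
            end = pos
            while end < len(segments):
                if nxt is not None and nxt[0] == "concept" and segments[end] in concepts[nxt[1]]:
                    break
                end += 1
            if end == pos:
                return None  # a capture must cover at least one segment
            captures[value] = " ".join(segments[pos:end])
            pos = end
    return captures if pos == len(segments) else None


# Hypothetical concepts and pattern used only for this example.
CAPTURE_CONCEPTS = {"OPEN": {"open", "show"}, "DOCUMENT": {"document", "report"}}
CAPTURE_PATTERN = [("concept", "OPEN"), ("capture", "title"), ("concept", "DOCUMENT")]
```

The returned dictionary plays the part of the matching data: it carries the phrase that the capture extracted from the user input.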

Meanwhile, an electronic document control system based on natural language processing according to an embodiment of the present application may include: a natural language matching apparatus that receives a user input, matches a pattern corresponding to the user input, returns matching data, and identifies the user's intention implied in the user input based on the matching data; an electronic document creation device that creates or modifies an electronic document based on an electronic form template according to the user's intention; an electronic document processing device that performs approval, receipt, or deletion of a previously created electronic document according to the user's intention; an electronic document search device that provides a list of electronic documents meeting the search conditions included in the user input according to the user's intention; and a navigation module that controls movement between a plurality of electronic documents according to the user's intention.

Meanwhile, a morpheme analysis-based natural language matching method according to an embodiment of the present application may include: obtaining, through morpheme analysis of a plurality of sentences having similar intentions, concepts that are included in the plurality of sentences and whose elements consist of the key morphemes identified from the morpheme analysis result, creating a pattern string, and generating a pattern based on the pattern string; receiving a user input; and matching the pattern corresponding to the user input and returning matching data.

In addition, generating the pattern may include: performing morpheme analysis on each of the plurality of sentences; obtaining concepts based on the morpheme analysis result; creating a pattern string including at least one of the concepts or a capture for extracting a specific phrase from the user input; and defining a pattern based on the pattern string.

In addition, returning the matching data may include: splitting the received user input at spaces to divide it into a plurality of segments; and determining whether at least one segment matches the pattern and generating matching data when the match succeeds.

In addition, generating the matching data may include: matching at least one segment against a word, or matching the at least one segment against a concept; determining a pattern string that includes all of the matched words and matched concepts; when the pattern string includes a capture, matching the capture against the at least one segment and extracting the specific phrase in the user input associated with the capture; and generating the matching data including at least one of data on the matched concept or the specific phrase in the user input associated with the capture.

The means described above are merely exemplary and should not be construed as limiting the present application. In addition to the exemplary embodiments described above, additional embodiments may exist in the drawings and the detailed description of the invention.

According to the means described above, a method and apparatus capable of matching natural language by introducing morpheme analysis can be provided.

According to the means described above, a natural language matching method and apparatus can be provided that analyze a plurality of sentences having similar intentions to extract roots and reflect them in a pattern, so that matching is possible for all natural language input, including derived words.

According to the means described above, an electronic document control system based on natural language processing can be provided that identifies, based on a pattern, the intention of the user who generated a user input, and performs control operations such as creation, processing, and search of electronic documents according to that intention.

However, the effects obtainable from the present application are not limited to those described above, and other effects may exist.

FIG. 1 is a diagram illustrating the configuration of an electronic document control system based on natural language processing according to an embodiment of the present application.
FIG. 2 is a diagram illustrating the configuration of a morpheme analysis-based natural language matching apparatus according to an embodiment of the present application.
FIG. 3 is a diagram illustrating an example of preset part-of-speech tags that classify Korean morphemes by detailed part of speech according to an embodiment of the present application.
FIG. 4 is a diagram illustrating the result of morpheme analysis of a plurality of sentences having similar intentions according to an embodiment of the present application.
FIG. 5 is a diagram illustrating the result of detecting morphemes corresponding to roots from the morpheme analysis result according to an embodiment of the present application.
FIG. 6 is a diagram illustrating the result of classifying the morphemes corresponding to roots into substantives and predicates according to an embodiment of the present application.
FIG. 7 is a diagram for explaining how a concept is obtained according to an embodiment of the present application.
FIG. 8 is a diagram for explaining a pattern string according to an embodiment of the present application.
FIG. 9 is a diagram for explaining a pattern based on a pattern string according to an embodiment of the present application.
FIG. 10 is a diagram illustrating the configuration of a pattern generation unit according to an embodiment of the present application.
FIG. 11 is a diagram illustrating the configuration of a user input analysis unit according to an embodiment of the present application.
FIG. 12 is a flowchart of a morpheme analysis-based natural language matching method according to an embodiment of the present application.
FIG. 13 is a flowchart of a pattern generation method according to an embodiment of the present application.
FIG. 14 is a flowchart of a matching data generation method according to an embodiment of the present application.

Hereinafter, embodiments of the present application are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present application pertains can readily practice them. The present application may, however, be implemented in many different forms and is not limited to the embodiments described herein. In the drawings, parts unrelated to the description are omitted for clarity, and similar reference numerals denote similar parts throughout the specification.

Throughout this specification, when a part is said to be "connected" to another part, this includes not only the case where it is "directly connected" but also the case where it is "electrically connected" or "indirectly connected" with another element in between.

Throughout this specification, when a member is said to be located "on", "above", "at the top of", "under", "below", or "at the bottom of" another member, this includes not only the case where the member is in contact with the other member but also the case where yet another member is present between the two.

Throughout this specification, when a part is said to "include" a component, this means that it may further include other components, rather than excluding them, unless specifically stated otherwise.

FIG. 1 is a diagram illustrating the configuration of an electronic document control system based on natural language processing according to an embodiment of the present application.

Referring to FIG. 1, the natural language processing-based electronic document control system 10 according to an embodiment of the present application may include a morpheme analysis-based natural language matching apparatus 100, an electronic document creation device 200, an electronic document processing device 300, an electronic document search device 400, and a navigation module 500.

The morpheme analysis-based natural language matching apparatus 100 according to an embodiment of the present application may receive a user input, match a pattern corresponding to the user input, return matching data, and identify the user's intention implied in the user input based on the matching data.

In addition, according to an embodiment of the present application, the morpheme analysis-based natural language matching apparatus 100 may receive the user input as a voice input, in which a sentence spoken aloud by the user serves as the user input.

In addition, according to another embodiment of the present application, the morpheme analysis-based natural language matching apparatus 100 may receive the user input as an interactive input, in which a sentence typed by the user through a separate input device such as a keyboard serves as the user input.

In addition, according to an embodiment of the present application, the user's intention may include commands or requests covering the overall business procedure that uses electronic documents, such as creating a new electronic document or managing a previously created one, including an intention to create, modify, approve, receive, delete, search for, or move between documents.

The electronic document creation device 200 may create or modify an electronic document based on an electronic form template according to the user's intention.

According to an embodiment of the present application, when the user's intention identified by the morpheme analysis-based natural language matching apparatus 100 is a document creation intention, the electronic document creation device 200 may load an electronic form template that fits the document creation intention and enter data, extracted from the user input as key and value pairs, into the entry fields of the electronic form template.
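
As a rough illustration of entering key and value pairs into the entry fields of a form template, consider the Python sketch below. The template is modeled as a dictionary of field names to values, and the function name, field names, and handling of unknown keys are all assumptions; the specification does not describe the template format at this level of detail.

```python
from typing import Optional


def fill_template(
    template: dict[str, Optional[str]],
    extracted: list[tuple[str, str]],
) -> tuple[dict[str, Optional[str]], list[tuple[str, str]]]:
    """Copy the template and write each extracted (key, value) into its field.

    Pairs whose key is not a field of the template are returned separately
    instead of being silently dropped.
    """
    filled = dict(template)
    unmatched: list[tuple[str, str]] = []
    for key, value in extracted:
        if key in filled:
            filled[key] = value
        else:
            unmatched.append((key, value))
    return filled, unmatched


# Hypothetical template and extracted data used only for this example.
TRIP_REPORT_TEMPLATE = {"destination": None, "departure_date": None, "purpose": None}
EXTRACTED_PAIRS = [("destination", "Busan"), ("purpose", "client meeting"), ("budget", "300000")]
```

Returning the unmatched pairs keeps the sketch honest about data that the template has no field for; a real system might instead prompt the user or discard such pairs.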

The electronic document processing device 300 may perform approval, receipt, or deletion of a previously created electronic document 2 according to the user's intention.

In addition, according to an embodiment of the present application, the electronic document processing device 300 may deliver an electronic document approval request through a push notification or an alarm, and may provide a list of the electronic documents associated with the approval request.

The electronic document search device 400 may provide a list of electronic documents meeting the search conditions included in the user input according to the user's intention.

According to an embodiment of the present application, the electronic document search device 400 may obtain the search conditions based on the user input.

According to an embodiment of the present application, the electronic document search device 400 may collect the electronic documents meeting the search conditions from an electronic document database in which previously created electronic documents 2 are stored, compile them into a list, and provide the list of electronic documents.
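
The search step can be pictured as filtering stored documents by the conditions obtained from the user input. The sketch below uses an in-memory list in place of the electronic document database and treats each condition as an exact match on a field; the record fields, function name, and sample data are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ElectronicDocument:
    """Minimal stand-in for a stored electronic document (assumed fields)."""
    title: str
    author: str
    created: date


def search_documents(
    documents: list[ElectronicDocument],
    conditions: dict[str, object],
) -> list[ElectronicDocument]:
    """Return the documents whose fields equal every given condition."""
    return [
        doc
        for doc in documents
        if all(getattr(doc, field) == expected for field, expected in conditions.items())
    ]


# Hypothetical stored documents used only for this example.
STORED_DOCUMENTS = [
    ElectronicDocument("Trip report", "Kim", date(2019, 3, 4)),
    ElectronicDocument("Expense claim", "Lee", date(2019, 3, 4)),
    ElectronicDocument("Trip report", "Lee", date(2019, 3, 11)),
]
```

For instance, `search_documents(STORED_DOCUMENTS, {"author": "Lee"})` returns the two documents written by Lee, in storage order.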

The navigation module 500 may control movement between a plurality of electronic documents according to the user's intention.

The natural language matching apparatus 100, the electronic document creation device 200, the electronic document processing device 300, the electronic document search device 400, and the navigation module 500 may communicate with one another through a network 1 that interconnects them.

The network 1 refers to a wired or wireless connection structure through which information can be exchanged among the natural language matching apparatus 100, the electronic document creation device 200, the electronic document processing device 300, the electronic document search device 400, and the navigation module 500. Examples include, but are not limited to, a 3GPP (3rd Generation Partnership Project) network, an LTE (Long Term Evolution) network, a 5G network, a WiMAX (Worldwide Interoperability for Microwave Access) network, the Internet, a LAN (Local Area Network), a wireless LAN (Wireless Local Area Network), a WAN (Wide Area Network), a PAN (Personal Area Network), a Bluetooth network, a satellite broadcasting network, an analog broadcasting network, and a DMB (Digital Multimedia Broadcasting) network.

FIG. 2 is a diagram illustrating the configuration of a morpheme analysis-based natural language matching apparatus according to an embodiment of the present application.

Referring to FIG. 2, the morpheme analysis-based natural language matching apparatus 100 according to an embodiment of the present application may include a pattern generation unit 110 and a user input analysis unit 120.

The pattern generation unit 110 may obtain the concepts included in a plurality of sentences having similar intentions through morpheme analysis of those sentences, create a pattern string, and generate a pattern based on the pattern string.

In addition, according to an embodiment of the present application, the pattern generation unit 110 may perform morpheme analysis on each of the plurality of sentences.

In addition, according to an embodiment of the present application, the pattern generation unit 110 may obtain concepts based on the morpheme analysis result.

According to an embodiment of the present application, a concept may be understood as a collection of the minimum prior information that the user input analysis unit 120 refers to when matching the user input against a pattern, described later. Specifically, a concept may be the object that is tested, during pattern matching, for a match against one or more segments constituting the user input.

또한, 본원의 일 실시예에 따르면, 상기 형태소 분석 결과는, 상기 복수의 문장 각각을 형태소 단위로 분석한 형태소 및 상기 분석된 형태소와 연계된 품사 정보가 포함될 수 있으며, 상기 품사 정보는 기 설정된 품사 태그에 기초하여 표기될 수 있다.In addition, according to an embodiment of the present application, the morpheme analysis result may include a morpheme obtained by analyzing each of the plurality of sentences in a morpheme unit and part of speech information associated with the analyzed morpheme, and the part of speech information is a preset part of speech. It can be marked based on tags.

FIG. 3 is a diagram illustrating an example of preset POS tags that classify Hangul morphemes by detailed part of speech according to an embodiment of the present application.

Referring to FIG. 3, the preset POS tags 32 according to an embodiment of the present application may divide the parts of speech of Hangul morphemes into major categories 31, such as substantives, predicates, determiners, adverbs, interjections, particles, endings, affixes, roots, symbols, and non-Hangul, classify them further by detailed part of speech 33, such as common noun, proper noun, dependent noun, and numeral, and denote each in uppercase letters according to a predetermined rule.

However, the preset POS tags shown in FIG. 3 should be understood as exemplary, and may be determined differently depending on the morpheme analyzer used for the morpheme analysis.

FIG. 4 is a diagram illustrating the result of morpheme analysis of a plurality of sentences having similar intent according to an embodiment of the present application.

Referring to FIG. 4, the plurality of sentences 41 having similar intent may convey a 'document creation intent', and the pattern generation unit 110 may analyze each of the sentences 41 in morpheme units and write the POS information associated with each analyzed morpheme, based on the preset POS tags, together with the morpheme (42).

According to an embodiment of the present application, the pattern generation unit 110 may separate each sentence at word spacing (marked with '/'), provide the morpheme analysis result in the form (analyzed morpheme):(POS tag), and join multiple morphemes with ','.
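The notation described above can be illustrated with a minimal sketch. The function names `format_analysis` and `parse_analysis` are hypothetical, and the analyzed input is hand-written for illustration rather than produced by an actual morpheme analyzer.

```python
# Sketch of the "(morpheme):(POS tag)" notation: morphemes within a
# spacing unit are joined with ',' and spacing units are joined with '/'.
# Function names and the sample data are illustrative assumptions.

def format_analysis(segments):
    """segments: list of spacing units, each a list of (morpheme, tag) pairs."""
    return "/".join(
        ",".join(f"{morph}:{tag}" for morph, tag in unit) for unit in segments
    )


def parse_analysis(text):
    """Inverse of format_analysis for well-formed input."""
    return [
        [tuple(item.split(":", 1)) for item in unit.split(",")]
        for unit in text.split("/")
    ]


analyzed = [[("문서", "NNG")], [("작성", "NNG"), ("하", "VV"), ("싶", "VX")]]
encoded = format_analysis(analyzed)
print(encoded)  # 문서:NNG/작성:NNG,하:VV,싶:VX
```

Parsing the encoded string with `parse_analysis` returns the original nested list, so the notation can serve as a simple interchange form between the analysis step and the later concept-building step.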

In addition, according to an embodiment of the present application, the pattern generation unit 110 may detect the morphemes corresponding to roots in the morpheme analysis result and classify those morphemes into substantives and predicates.

FIG. 5 is a diagram illustrating the result of detecting the morphemes corresponding to roots in the morpheme analysis result according to an embodiment of the present application.

Referring to FIG. 5, the morphemes 50 corresponding to roots are the morphemes shown in bold, and according to an embodiment of the present application, they may be morphemes belonging to the noun family (POS tags of the form NNx), the adjective family (VA), or the verb family (VV).

According to an embodiment of the present application, the morphemes in the analysis result other than the morphemes 50 corresponding to roots (the parts not shown in bold in FIG. 5) may not be used in the process of obtaining the concepts, described later.
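As a rough sketch of this filtering step, the following assumes that root morphemes are selected purely by POS tag: tags beginning with NN, or exactly VA or VV. The helper names and the sample analysis (including the JKO and ETM tags) are hypothetical. Because the later examples in this description also place VX-tagged morphemes in words, the accepted tags are left as parameters rather than fixed.

```python
# Keep only morphemes whose POS tag marks them as roots.
# Assumption: NNx, VA and VV are the accepted tag families; other
# analyzers or embodiments may use a different set.

def is_root(tag, exact=("VA", "VV"), prefixes=("NN",)):
    return tag in exact or any(tag.startswith(p) for p in prefixes)


def root_morphemes(pairs, **kwargs):
    """pairs: list of (morpheme, tag); returns the root morphemes in order."""
    return [(m, t) for m, t in pairs if is_root(t, **kwargs)]


# Hand-written sample analysis, not real analyzer output.
sample = [("문서", "NNG"), ("를", "JKO"), ("쓰", "VV"), ("ㄹ", "ETM"), ("거", "NNB")]
roots = root_morphemes(sample)
print(roots)  # [('문서', 'NNG'), ('쓰', 'VV'), ('거', 'NNB')]
```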

FIG. 6 is a diagram illustrating the result of classifying the morphemes corresponding to roots into substantives and predicates according to an embodiment of the present application.

Referring to FIG. 6, the morphemes 50 corresponding to roots may be classified into a substantive side 61 and a predicate side 62. According to an embodiment of the present application, the substantive side 61 may primarily include root morphemes 50 that are nouns, pronouns, or numerals, and the predicate side 62 may primarily include root morphemes 50 that are adjectives or verbs. However, even a substantive (that is, a noun, pronoun, or numeral) that functions strongly as a predicate within its sentence may be classified on the predicate side 62.

According to an embodiment of the present application, one or more morphemes classified on the substantive side or the predicate side within one sentence may be defined as a word 60 that encompasses those morphemes, and the word 60 may be an element of the concept.

According to an embodiment of the present application, the word 60 may be a string word, which writes the morpheme as is, or a tag word, which writes the analyzed morpheme together with its associated POS information using the preset POS tags.

In addition, according to an embodiment of the present application, a string word may be regarded as matched only when it exactly matches a segment of the user input during the matching of the user input against a pattern, described later.

In addition, according to an embodiment of the present application, a tag word may be regarded as matched with a segment of the user input when the morphemes and POS information written in the tag word are present in the result of analyzing that segment during the matching of the user input against a pattern, described later.
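The two matching rules can be contrasted in a short sketch. It assumes the segment has already been analyzed into (morpheme, tag) pairs; the function names and the sample analysis (including the EC and EF tags) are hypothetical.

```python
# String word: matches only on exact equality with the segment text.
# Tag word: matches when every (morpheme, tag) pair it lists appears in
# the analysis of the segment. Both helpers are illustrative assumptions.

def match_string_word(string_word, segment_text):
    return string_word == segment_text


def match_tag_word(tag_word, segment_analysis):
    present = set(segment_analysis)
    return all(pair in present for pair in tag_word)


tag_word = [("작성", "NNG"), ("하", "VV"), ("싶", "VX")]
# Hand-written analysis of a (possibly combined) segment.
analysis = [("작성", "NNG"), ("하", "VV"), ("고", "EC"), ("싶", "VX"), ("어", "EF")]

tag_hit = match_tag_word(tag_word, analysis)       # True: extra endings are ignored
string_hit = match_string_word("문서", "문서를")    # False: not an exact match
```

The tag word tolerates endings and particles that the string word does not, which is consistent with using tag words for forms that vary with inflection.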

According to an embodiment of the present application, the word 60 may encompass a plurality of morphemes 50, such as '작성:NNG', '하:VV', and '싶:VX', as in the predicate-side word for the third sentence in FIG. 6.

In addition, according to an embodiment of the present application, the pattern generation unit 110 may obtain a substantive-side concept, whose elements are substantive-side words containing the morphemes classified as substantives, and a predicate-side concept, whose elements are predicate-side words containing the morphemes classified as predicates.

For example, referring to FIG. 6, the substantive-side concept may have the words '문서:NNG' and '문서:NNP' as elements, and the predicate-side concept may have the words '작성:NNG', '쓰:VV,거:NNB', '작성:NNG,하:VV,싶:VX', '쓰:VV,하:VX', and '쓰:VV,싶:VX' as elements.

According to an embodiment of the present application, because substantives change form relatively little in Korean sentence structure, a substantive-side concept, whose elements are substantive-side words, may be set to match preferentially against a single segment of the user input. Because predicates change form in many ways, a predicate-side concept, whose elements are predicate-side words, may be set to match preferentially against two or more segments of the user input.

In addition, according to an embodiment of the present application, the concept may be defined under a concept name that encompasses the words that are its elements, or may be defined as an anonymous concept that lists its element words directly without a separate concept name.

FIG. 7 is a diagram for explaining how a concept is obtained according to an embodiment of the present application.

Referring to FIG. 7, the named concepts 71a and 71b each have a concept name (for example, 문서 or 작성) that encompasses their element words and may be defined based on that name, whereas the concepts 72a and 72b defined in anonymous form may be defined by directly listing their elements (for example, 문서:NN, 작성:NN, 쓰:VV,거:NNB) without a separate concept name.

In addition, as described above, the concept may have words (groups of one or more morphemes) as elements, and may also have other concepts as elements.

For example, the {~문서} concept may include a word such as '문서:NNP' as an element, and may also include as elements previously created sub-concepts such as {~보고서}, {~품의서}, and {~요청서}.

According to an embodiment of the present application, an element of a concept may be at least one of a string word, a tag word, or another concept, and when a concept has another concept as an element, the two concepts need not agree on whether they have a concept name.
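One way to picture a concept whose elements are words or other concepts is a small recursive structure. The `Concept` class below is a hypothetical sketch, and the element words '보고서:NNG' and '품의서:NNG' are invented for illustration; only '문서:NNP' and the concept names come from the description above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    """A concept holds words (as strings here) and/or nested concepts."""
    elements: list = field(default_factory=list)
    name: Optional[str] = None  # None models an anonymous concept

    def words(self):
        """Yield every word reachable through nested concepts, depth first."""
        for element in self.elements:
            if isinstance(element, Concept):
                yield from element.words()
            else:
                yield element


# A named sub-concept and an anonymous one may sit side by side.
report = Concept(name="보고서", elements=["보고서:NNG"])
anonymous = Concept(elements=["품의서:NNG"])
document = Concept(name="문서", elements=["문서:NNP", report, anonymous])

all_words = list(document.words())
print(all_words)  # ['문서:NNP', '보고서:NNG', '품의서:NNG']
```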

In addition, according to an embodiment of the present application, the pattern generation unit 110 may write a pattern string that includes at least one of the concept or a capture, where the capture is a placeholder substituted in to mark the position of a specific phrase to be extracted from the user input.

According to an embodiment of the present application, the pattern string is written for a purpose similar to that of a regular expression (regex), a kind of formal language used mainly in string-related programming, and may include at least one of the concept obtained by the pattern generation unit 110 or the capture.

In addition, the capture may refer to the part of the pattern string that is replaced with the form {_*} to mark the position of a specific phrase, so that the user input analysis unit 120 can extract that phrase from the user input while matching the user input against the pattern. However, the capture, written as {_*} throughout the present application, is not limited to the {_*} notation and may take various forms.

FIG. 8 is a diagram for explaining a pattern string according to an embodiment of the present application.

Referring to FIG. 8, the pattern strings 80a and 80b according to an embodiment of the present application may include at least one of a capture 81 or a concept 82.

According to an embodiment of the present application, the capture in the upper pattern string 80a may be a {_*} substituted in so that the string that precedes the concept included in the pattern string ({~[문서:NN}}) and narrows the type of document (for example, 기획, 증빙, or 휴가신청, as in a '기획' document, a '증빙' document, or a '휴가신청' document) can be obtained from the user input.

According to another embodiment of the present application, the capture in the lower pattern string 80b may be a {_*} substituted in so that the quantity information preceding the concept {~단위}, which encompasses units of quantity as in '한'마리, '두'그루, and '세'번, can be obtained from the user input.

Depending on the embodiment, when a sentence has no part from which a specific phrase is to be extracted, the pattern string may not include the capture.
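To make the structure of a pattern string concrete, the following sketch splits one into capture and concept tokens. It assumes that named concepts are written {~name} and captures are written {_*}, as in the examples above; the anonymous-concept notation is not handled, and the sample pattern strings and the `tokenize_pattern` name are hypothetical.

```python
import re

# Matches either the capture placeholder {_*} or a named concept {~name}.
TOKEN = re.compile(r"\{_\*\}|\{~([^{}]+)\}")


def tokenize_pattern(pattern_string):
    """Return ('capture', None) and ('concept', name) tokens in order."""
    tokens = []
    for match in TOKEN.finditer(pattern_string):
        if match.group(1) is None:
            tokens.append(("capture", None))
        else:
            tokens.append(("concept", match.group(1)))
    return tokens


tokens = tokenize_pattern("{_*}{~문서} {~작성}")
print(tokens)  # [('capture', None), ('concept', '문서'), ('concept', '작성')]

# A pattern string without a capture yields concept tokens only.
no_capture = tokenize_pattern("{~문서} {~작성}")
```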

In addition, according to an embodiment of the present application, the pattern generation unit 110 may obtain the concepts written in the pattern string by looking them up in a concept set that includes at least one concept. The concepts obtained in this way may be used as the criteria for deciding whether matching succeeds during the matching of the user input against the pattern, described later.

In addition, according to an embodiment of the present application, the pattern generation unit 110 may write the pattern string based on a concept set that includes at least one concept.

That is, when writing a pattern string, the pattern generation unit 110 does not need to define anew, each time, the concepts for the substantive side or predicate side of the plurality of sentences. It can write the pattern string by loading concepts that already exist in the concept set, which simplifies the processing required to generate a pattern.

In addition, according to an embodiment of the present application, the pattern generation unit 110 may define a pattern based on the pattern string.

FIG. 9 is a diagram for explaining a pattern based on a pattern string according to an embodiment of the present application.

Referring to FIG. 9, the patterns 90a and 90b according to an embodiment of the present application may include a pattern string 80, a pattern name 91 determined by the similar intent conveyed by the plurality of sentences, a language type 92 used to write the pattern string, and a concept set 93 to which the concepts included in the pattern string belong.

According to an embodiment of the present application, the language type 92 may be expressed as a locale. It indicates the language used to write the pattern string, so that patterns can also be defined for languages other than Korean, and may be written according to a predetermined rule (for example, ko_KR for Korean).

According to an embodiment of the present application, the concept set 93 indicates the concept set to which the concepts included in the pattern string 80 belong, and may include every concept available when defining the pattern.

According to an embodiment of the present application, the concept set 93 may be understood as an input passed in during pattern generation so that the concepts named in the pattern string can be found and set in the pattern.

According to an embodiment of the present application, the concept set 93 may denote a concept group in which a plurality of concepts are grouped according to a predetermined criterion. For example, the concept set 93 may be specified when defining a pattern to indicate which concept group supplied the concepts on which the pattern string of that pattern was written.

In addition, according to an embodiment of the present application, when a concept named in the pattern string exists in the concept set 93, that concept may be loaded from the concept set 93 and used in determining whether the pattern matches the user input, described later.
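The four parts of a pattern and the lookup of named concepts in the concept set might be sketched as follows. The `Pattern` class, its field names, the pattern name "create_document", and the concept contents are all hypothetical; the concept set is modeled as a plain dictionary from concept name to element words, and the {~name} notation is assumed.

```python
import re
from dataclasses import dataclass


@dataclass
class Pattern:
    name: str             # pattern name, reflecting the shared intent
    pattern_string: str   # e.g. "{_*}{~문서} {~작성}"
    locale: str           # language type, e.g. "ko_KR"
    concept_set: dict     # concept name -> list of element words

    def resolve_concepts(self):
        """Load only the concepts that the pattern string actually names."""
        names = re.findall(r"\{~([^{}]+)\}", self.pattern_string)
        return {n: self.concept_set[n] for n in names if n in self.concept_set}


concept_set = {
    "문서": ["문서:NNG", "문서:NNP"],
    "작성": ["작성:NNG", "쓰:VV,싶:VX"],
    "단위": ["마리:NNB"],  # present in the set but unused by this pattern
}
pattern = Pattern("create_document", "{_*}{~문서} {~작성}", "ko_KR", concept_set)
resolved = pattern.resolve_concepts()
print(list(resolved))  # ['문서', '작성']
```

Keeping the concept set separate from the pattern string means a concept is defined once and reused by name across patterns, which is the simplification the description attributes to the concept set.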

The user input analysis unit 120 may receive a user input, match it against the pattern corresponding to the user input, and return matching data.

In addition, according to an embodiment of the present application, the user input analysis unit 120 may receive the user input as a voice input, in which a sentence spoken aloud by the user serves as the user input.

According to an embodiment of the present application, the user input analysis unit 120 may recognize the speech uttered by the user and convert it into text.

To this end, the user input analysis unit 120 may use at least one speech recognition algorithm widely known in the art, and a speech-to-text (STT) technique may be used for the conversion into text.

In addition, according to another embodiment of the present application, the user input analysis unit 120 may receive the user input as an interactive input, in which a sentence typed by the user through a separate input device such as a keyboard serves as the user input.

In addition, according to an embodiment of the present application, the user input analysis unit 120 may cut the user input at word spacing to divide it into a plurality of segments.

In addition, according to an embodiment of the present application, the user input analysis unit 120 may determine whether at least one segment matches the pattern and return matching data when the matching succeeds.

In addition, according to an embodiment of the present application, the user input analysis unit 120 may match at least one segment against a word or against a concept, determine a pattern string that includes all of the matched words and matched concepts, and return matching data.

In addition, according to an embodiment of the present application, to match at least one segment against a concept, the user input analysis unit 120 may determine whether there is a concept having as an element a word that matches a single segment, and if there is none, may look for a concept having as an element a word that matches that segment combined with the segment that follows it. The user input analysis unit 120 may combine segments one after another while checking whether the result matches an element of a concept, and when no element matches, may check the elements of the next concept.
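The segment-combining procedure can be sketched with a simplified matcher. As a loudly stated simplification, concept elements here are plain surface strings compared by equality rather than morpheme-level tag words, and the order in which concepts and combined spans are tried is an assumption. The `match_segments` name and the concept contents are hypothetical.

```python
def match_segments(segments, concepts):
    """Greedily match one segment, then growing runs of segments, to concepts.

    segments: list of strings produced by splitting the input at spaces.
    concepts: dict of concept name -> set of surface strings (simplification).
    Returns a list of (concept_name, matched_text) in input order.
    """
    matches = []
    i = 0
    while i < len(segments):
        found = None
        # Try the single segment first, then combine with following segments.
        for j in range(i + 1, len(segments) + 1):
            candidate = " ".join(segments[i:j])
            for name, elements in concepts.items():
                if candidate in elements:
                    found = (name, candidate, j)
                    break
            if found:
                break
        if found:
            matches.append(found[:2])
            i = found[2]       # continue after the consumed segments
        else:
            i += 1             # no concept covers this segment; skip it
    return matches


concepts = {"문서": {"문서"}, "작성": {"쓰려고 해", "작성"}}
segments = "문서 쓰려고 해".split()   # split at word spacing
result = match_segments(segments, concepts)
print(result)  # [('문서', '문서'), ('작성', '쓰려고 해')]
```

Here '쓰려고' alone matches nothing, so it is combined with the following segment '해' before a match with the 작성 concept is found.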

According to an embodiment of the present application, when a concept named in the pattern string exists in the concept set, the user input analysis unit 120 may load it and use it for matching.

In addition, according to an embodiment of the present application, when the pattern string includes a capture, the user input analysis unit 120 may match the capture against the at least one segment and extract the specific phrase in the user input that is associated with the capture.

In addition, according to an embodiment of the present application, the matching data may include at least one of data on the matched concepts or the specific phrase in the user input associated with the capture.

FIG. 10 is a diagram illustrating the configuration of the pattern generation unit according to an embodiment of the present application.

Referring to FIG. 10, the pattern generation unit 110 according to an embodiment of the present application may include a morpheme analysis unit 111, a concept acquisition unit 112, a pattern string writing unit 113, and a pattern definition unit 114.

The morpheme analysis unit 111 may perform morpheme analysis on each of the plurality of sentences.

In addition, according to an embodiment of the present application, the morpheme analysis unit 111 may detect the morphemes corresponding to roots in the morpheme analysis result and classify those morphemes into substantives and predicates.

The concept acquisition unit 112 may obtain a concept based on the morpheme analysis result.

In addition, according to an embodiment of the present application, the concept acquisition unit 112 may obtain a substantive-side concept, whose elements are substantive-side words containing the morphemes classified as substantives, and a predicate-side concept, whose elements are predicate-side words containing the morphemes classified as predicates.

The pattern string writing unit 113 may write a pattern string that includes at least one of the concept or a capture substituted in to mark the position of a specific phrase to be extracted from the user input.

In addition, according to an embodiment of the present application, the pattern string writing unit 113 may write the pattern string based on a concept set that includes at least one concept obtained by the concept acquisition unit 112.

The pattern definition unit 114 may define a pattern based on the pattern string.

In addition, according to an embodiment of the present application, the pattern definition unit 114 may define the pattern so that it includes the pattern string, a pattern name determined by the similar intent conveyed by the plurality of sentences, the language type used to write the pattern string, and the concept set to which the concepts included in the pattern string belong.

FIG. 11 is a diagram illustrating the configuration of the user input analysis unit according to an embodiment of the present application.

Referring to FIG. 11, the user input analysis unit 120 according to an embodiment of the present application may include a segmentation unit 121 and a matching unit 122.

The segmentation unit 121 may cut the user input at word spacing to divide it into a plurality of segments.

The matching unit 122 may determine whether at least one of the segments matches the pattern and return matching data when the matching succeeds.

According to an embodiment of the present application, the matching unit 122 may match at least one segment against a word or against a concept, determine a pattern string that includes all of the matched words and matched concepts, and return matching data.

In addition, according to an embodiment of the present application, when the pattern string includes a capture, the matching unit 122 may match the capture against the at least one segment and extract the specific phrase in the user input that is associated with the capture.

To match at least one segment against a concept, the matching unit 122 may determine whether there is a concept having as an element a word that matches a single segment, and if there is none, may look for a concept having as an element a word that matches that segment combined with the segment that follows it. In addition, the matching unit 122 may combine segments one after another while checking whether the result matches an element of a concept, and when no element matches, may check the elements of the next concept.

According to an embodiment of the present application, when the user input is the utterance '휴가신청문서 쓰려고 해' (roughly, "I am going to write a leave request document"), the segmentation unit 121 may cut the user input at word spacing as '휴가신청문서/쓰려고/해', yielding a first segment '휴가신청문서', a second segment '쓰려고', and a third segment '해'. That is, each part cut at word spacing becomes a segment.

Next, the matching unit 122 may look for concepts having as elements words that match the first to third segments. As a result, the first segment may match the concept {~문서}. Neither the second segment nor the third segment matches a concept on its own, but the second and third segments combined may match the concept {~작성}.

Next, the matching unit 122 may determine a pattern string that includes the {~문서} and {~작성} concepts, and when the capture {_*} is written in front of the {~문서} concept in that pattern string, may extract from the user input the specific phrase '휴가신청', which is the data corresponding to the capture.

Accordingly, the matching data may include information that the user input contains parts matching the concept {~문서} and the concept {~작성}, together with the specific phrase '휴가신청'.
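The walkthrough above can be traced in a minimal sketch. The capture is extracted here by stripping a trailing concept word from the segment text, which is an assumption made for brevity; the description performs matching on morpheme analysis results rather than on raw suffixes. The `split_capture` helper, the concept contents, and the shape of `matching_data` are hypothetical.

```python
def split_capture(segment, concept_words):
    """Return (captured_text, concept_word) if the segment ends with a concept
    word and has text in front of it; otherwise return None."""
    for word in sorted(concept_words, key=len, reverse=True):
        if segment.endswith(word) and len(segment) > len(word):
            return segment[: -len(word)], word
    return None


# Surface-string stand-ins for the {~문서} and {~작성} concepts.
concepts = {"문서": {"문서"}, "작성": {"쓰려고 해"}}

segments = "휴가신청문서 쓰려고 해".split()      # ['휴가신청문서', '쓰려고', '해']
captured, matched_word = split_capture(segments[0], concepts["문서"])
combined = " ".join(segments[1:])               # second and third segments joined

matching_data = {
    "concepts": ["문서"] + (["작성"] if combined in concepts["작성"] else []),
    "captures": [captured],
}
print(matching_data)  # {'concepts': ['문서', '작성'], 'captures': ['휴가신청']}
```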

Thus, based on the matching data, the morpheme analysis-based natural language matching device 100 according to an embodiment of the present application can identify the user's intention contained in the user input (namely, that the utterance expresses an intention to write a vacation request document). In accordance with that intention, the electronic document creation device 200 may load the electronic form template for the vacation request document and begin creating an electronic document based on that template.
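How matching data might be turned into an intention and a template choice can be sketched as below. The intent name, the template identifier, the table layout, and the function names are hypothetical; the text does not specify how the intention is derived or how templates are keyed.

```python
from typing import Optional

# Hypothetical tables: the intent name, the template identifier, and the rule
# that maps a set of concepts to an intent are assumptions for this sketch.
INTENT_BY_CONCEPTS = {
    frozenset({"~document", "~write"}): "create_document",
}
TEMPLATES = {
    "휴가신청": "vacation_request_form",
}


def determine_intent(matching_data: dict) -> Optional[str]:
    """Look up the intent associated with the matched concepts."""
    return INTENT_BY_CONCEPTS.get(frozenset(matching_data["concepts"]))


def start_document(matching_data: dict) -> Optional[str]:
    """Return the template to load when the intent is document creation."""
    if determine_intent(matching_data) != "create_document":
        return None
    return TEMPLATES.get(matching_data.get("capture"))


matching_data = {"concepts": ["~document", "~write"], "capture": "휴가신청"}
print(determine_intent(matching_data))  # create_document
print(start_document(matching_data))    # vacation_request_form
```

The captured phrase selects the template, so the same pattern could serve other document types simply by adding entries to the assumed `TEMPLATES` table.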

FIG. 12 is an operation flowchart of a morpheme analysis-based natural language matching method according to an embodiment of the present application.

The morpheme analysis-based natural language matching method shown in FIG. 12 may be performed by the morpheme analysis-based natural language matching device 100 described above. Accordingly, even where details are omitted below, the description given for the natural language matching device 100 applies equally to FIG. 12.

Referring to FIG. 12, in step S1210 the pattern generation unit 110 may acquire the concepts included in a plurality of sentences containing similar intentions through morpheme analysis of those sentences, create a pattern string, and generate a pattern based on the pattern string.

Next, in step S1220, the user input analysis unit 120 may receive a user input.

Next, in step S1230, the user input analysis unit 120 may match the pattern corresponding to the user input and return matching data.

According to an embodiment of the present application, the step of returning the matching data (S1230) shown in FIG. 12 may include dividing the received user input into a plurality of segments by cutting it at each space, and determining whether at least one segment matches the pattern and generating matching data when the matching succeeds.

In the above description, steps S1210 to S1230 may be further divided into additional steps or combined into fewer steps, depending on the implementation of the present application. In addition, some steps may be omitted as needed, and the order of the steps may be changed.

FIG. 13 is an operation flowchart of a pattern generation method according to an embodiment of the present application.

The pattern generation method shown in FIG. 13 may be performed by the morpheme analysis-based natural language matching device 100 described above. Accordingly, even where details are omitted below, the description given for the natural language matching device 100 applies equally to FIG. 13.

Referring to FIG. 13, in step S1310 the morpheme analysis unit 111 may perform morpheme analysis on each of the plurality of sentences.

Next, in step S1320, the concept acquisition unit 112 may acquire a concept based on the result of the morpheme analysis.

Next, in step S1330, the pattern string creation unit 113 may create a pattern string that includes at least one of the concept or a capture for extracting a specific phrase from the user input.

Next, in step S1340, the pattern definition unit 114 may define a pattern based on the pattern string.

In the above description, steps S1310 to S1340 may be further divided into additional steps or combined into fewer steps, depending on the implementation of the present application. In addition, some steps may be omitted as needed, and the order of the steps may be changed.
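Steps S1310 to S1340 can be pictured with the minimal sketch below. It does not use a real morphological analyzer: `TOY_MORPHEMES` is a hand-written stand-in for the analysis result, and the tag names, the `CONCEPT_OF` lexicon, the `Pattern` class, and the exact pattern string syntax (space-separated `{~name}` tokens with an optional `{_*}` prefix) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# S1310 stand-in: hand-written (morpheme, tag) pairs for sentences with a
# similar intent. The analyses and the tag names are assumptions.
TOY_MORPHEMES = {
    "문서 작성해": [("문서", "NOUN"), ("작성", "NOUN"), ("해", "VERB")],
    "문서 쓰려고 해": [("문서", "NOUN"), ("쓰", "VERB"), ("려고", "ENDING"), ("해", "VERB")],
}
CONTENT_TAGS = {"NOUN", "VERB"}
# Hypothetical lexicon that assigns content morphemes to concepts.
CONCEPT_OF = {"문서": "~document", "작성": "~write", "쓰": "~write"}


@dataclass
class Pattern:
    name: str
    pattern_string: str


def analyze_morphemes(sentence: str) -> List[Tuple[str, str]]:
    """S1310: return (morpheme, part-of-speech tag) pairs for a sentence."""
    return TOY_MORPHEMES[sentence]


def acquire_concepts(morphemes: List[Tuple[str, str]]) -> List[str]:
    """S1320: collect the concepts of content morphemes, in order, without repeats."""
    concepts: List[str] = []
    for morpheme, tag in morphemes:
        name = CONCEPT_OF.get(morpheme) if tag in CONTENT_TAGS else None
        if name is not None and name not in concepts:
            concepts.append(name)
    return concepts


def write_pattern_string(concepts: List[str], capture_before: Optional[str]) -> str:
    """S1330: emit one token per concept, prefixing a capture where requested."""
    parts = []
    for concept in concepts:
        prefix = "{_*}" if concept == capture_before else ""
        parts.append(prefix + "{" + concept + "}")
    return " ".join(parts)


def define_pattern(name: str, sentences: List[str],
                   capture_before: Optional[str] = None) -> Pattern:
    """S1340: build the concept set over all sentences and define the pattern."""
    concept_set: List[str] = []
    for sentence in sentences:
        for concept in acquire_concepts(analyze_morphemes(sentence)):
            if concept not in concept_set:
                concept_set.append(concept)
    return Pattern(name, write_pattern_string(concept_set, capture_before))


pattern = define_pattern("create_document", list(TOY_MORPHEMES),
                         capture_before="~document")
print(pattern.pattern_string)  # {_*}{~document} {~write}
```

Both sample sentences reduce to the same concept set, which is why a single pattern string can cover utterances that are worded differently but carry a similar intention.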

FIG. 14 is an operation flowchart of a matching data generation method according to an embodiment of the present application.

The matching data generation method shown in FIG. 14 may be performed by the morpheme analysis-based natural language matching device 100 described above. Accordingly, even where details are omitted below, the description given for the natural language matching device 100 applies equally to FIG. 14.

Referring to FIG. 14, in step S1410 the segmentation unit 121 may divide the received user input into a plurality of segments by cutting it at each space.

Next, in step S1420, the matching execution unit 122 may match at least one segment against a word, or match the at least one segment against a concept.

Next, in step S1430, the matching execution unit 122 may determine a pattern string that includes both the matched word and the matched concept.

Next, in step S1440, the matching execution unit 122 may determine whether the pattern string includes a capture.

Next, in step S1450, when the pattern string includes a capture, the matching execution unit 122 may match the capture against the at least one segment and extract the specific phrase in the user input that is associated with the capture.

Conversely, when it is determined in step S1440 that the pattern string does not include a capture, the matching execution unit 122 may proceed directly to step S1460, described below.

Next, in step S1460, the matching execution unit 122 may generate the matching data, which includes at least one of data on the matched concept or the specific phrase in the user input associated with the capture.

In the above description, steps S1410 to S1460 may be further divided into additional steps or combined into fewer steps, depending on the implementation of the present application. In addition, some steps may be omitted as needed, and the order of the steps may be changed.
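The flow of steps S1410 to S1460, including the branch at S1440, can be sketched as follows. The concept table, the two pattern strings, the substring-based concept matching, the rule that a capture takes the text preceding the following concept word inside one segment, and the shape of the returned dictionary are all assumptions of this sketch rather than details stated in the text.

```python
import re
from typing import Dict, List, Optional

# Hypothetical concept table and pattern strings for illustration only.
CONCEPTS: Dict[str, List[str]] = {
    "~document": ["문서"],
    "~write": ["쓰려고 해", "작성"],
    "~delete": ["삭제"],
}
PATTERN_STRINGS = ["{_*}{~document} {~write}", "{~document} {~delete}"]
TOKEN = re.compile(r"\{(_\*|~[^}]+)\}")  # finds "_*" or "~name" inside braces


def generate_matching_data(user_input: str) -> Optional[dict]:
    # S1410: cut the input at spaces.
    segments = user_input.split()
    text = " ".join(segments)

    # S1420: record, per concept, the element word found in the input.
    matched: Dict[str, str] = {}
    for name, words in CONCEPTS.items():
        for word in words:
            if word in text:
                matched[name] = word
                break

    # S1430: pick the first pattern string whose concepts were all matched.
    pattern_string = None
    for candidate in PATTERN_STRINGS:
        needed = [t for t in TOKEN.findall(candidate) if t != "_*"]
        if needed and all(t in matched for t in needed):
            pattern_string = candidate
            break
    if pattern_string is None:
        return None  # matching failed, so no matching data is produced

    tokens = TOKEN.findall(pattern_string)
    phrase = None
    # S1440: does the pattern string contain a capture?
    if "_*" in tokens:
        idx = tokens.index("_*")
        if idx + 1 < len(tokens):
            # S1450: take the text in front of the concept word that follows
            # the capture, within the segment that ends with that word.
            word = matched[tokens[idx + 1]]
            for segment in segments:
                if segment.endswith(word) and len(segment) > len(word):
                    phrase = segment[: -len(word)]
                    break

    # S1460: assemble the matching data.
    return {
        "pattern_string": pattern_string,
        "concepts": [t for t in tokens if t != "_*"],
        "phrase": phrase,
    }


print(generate_matching_data("휴가신청문서 쓰려고 해"))
# {'pattern_string': '{_*}{~document} {~write}', 'concepts': ['~document', '~write'], 'phrase': '휴가신청'}
print(generate_matching_data("문서 삭제"))
# {'pattern_string': '{~document} {~delete}', 'concepts': ['~document', '~delete'], 'phrase': None}
```

The second call shows the path where S1440 finds no capture and the flow goes straight to S1460, so the matching data carries only the matched concepts.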

The morpheme analysis-based natural language matching method according to an embodiment of the present application may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the present invention, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules in order to perform the operations of the present invention, and vice versa.

In addition, the morpheme analysis-based natural language matching method described above may also be implemented in the form of a computer program or application that is stored on a recording medium and executed by a computer.

The foregoing description of the present application is provided for illustration, and those of ordinary skill in the art to which the present application pertains will understand that it can easily be modified into other specific forms without changing its technical spirit or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not limiting. For example, each component described as a single unit may be implemented in a distributed manner, and likewise components described as distributed may be implemented in a combined form.

The scope of the present application is defined by the claims set out below rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present application.

10: Natural language processing-based electronic document control system
100: Morpheme analysis-based natural language matching device
110: Pattern generation unit
111: Morpheme analysis unit
112: Concept acquisition unit
113: Pattern string creation unit
114: Pattern definition unit
120: User input analysis unit
121: Segmentation unit
122: Matching execution unit
200: Electronic document creation device
300: Electronic document processing device
400: Electronic document search device
500: Navigation module
1: Network
2: Electronic document

Claims

A natural language matching device based on morpheme analysis, comprising:
a pattern generation unit that acquires concepts included in a plurality of sentences containing similar intentions through morpheme analysis of the plurality of sentences, creates a pattern string, and generates a pattern based on the pattern string; and
a user input analysis unit that receives a user input, matches the pattern corresponding to the user input, and returns matching data.

The natural language matching device of claim 1,
wherein the pattern generation unit comprises:
a morpheme analysis unit that performs morpheme analysis on each of the plurality of sentences;
a concept acquisition unit that acquires a concept based on a result of the morpheme analysis;
a pattern string creation unit that creates a pattern string including at least one of the concept or a capture for extracting a specific phrase from the user input; and
a pattern definition unit that defines a pattern based on the pattern string.

The natural language matching device of claim 2,
wherein the result of the morpheme analysis includes morphemes obtained by analyzing each of the plurality of sentences in morpheme units and part-of-speech information associated with the analyzed morphemes, and
wherein the part-of-speech information is indicated based on preset part-of-speech tags.

The natural language matching device of claim 2,
wherein the pattern string creation unit creates the pattern string based on a concept set including at least one concept acquired by the concept acquisition unit.

The natural language matching device of claim 2,
wherein the concept is either defined based on a concept name that encompasses the words that are elements of the concept, or defined in the form of an anonymous concept that lists the words that are elements of the concept.

The natural language matching device of claim 2,
wherein the user input analysis unit comprises:
a segmentation unit that divides the user input into a plurality of segments by cutting the user input at each space; and
a matching execution unit that determines whether at least one segment matches the pattern and returns matching data when the matching succeeds.

The natural language matching device of claim 6,
wherein the matching execution unit matches at least one segment against a word or matches the at least one segment against a concept, determines a pattern string including both the matched word and the matched concept, and returns matching data,
wherein, when the pattern string includes the capture, the matching execution unit matches the capture against the at least one segment to extract a specific phrase in the user input associated with the capture, and
wherein the matching data includes at least one of data on the matched concept or the specific phrase in the user input associated with the capture.

An electronic document control system based on natural language processing, comprising:
a natural language matching device that receives a user input, matches a pattern corresponding to the user input, returns matching data, and determines a user's intention contained in the user input based on the matching data;
an electronic document creation device that creates or modifies an electronic document based on an electronic form template according to the user's intention;
an electronic document processing device that performs approval, receipt, or deletion processing on a previously created electronic document according to the user's intention;
an electronic document search device that provides a list of electronic documents satisfying a search condition included in the user input according to the user's intention; and
a navigation module that controls movement between a plurality of electronic documents according to the user's intention.

A natural language matching method based on morpheme analysis, comprising:
acquiring concepts included in a plurality of sentences containing similar intentions through morpheme analysis of the plurality of sentences, creating a pattern string, and generating a pattern based on the pattern string;
receiving a user input; and
matching the pattern corresponding to the user input and returning matching data.

The natural language matching method of claim 9,
wherein generating the pattern comprises:
performing morpheme analysis on each of the plurality of sentences;
acquiring a concept based on a result of the morpheme analysis;
creating a pattern string based on at least one of a concept set including at least one of the concepts or a capture for extracting a specific phrase from the user input; and
defining a pattern based on the pattern string.

The natural language matching method of claim 9,
wherein returning the matching data comprises:
dividing the received user input into a plurality of segments by cutting the user input at each space; and
determining whether at least one segment matches the pattern and generating matching data when the matching succeeds.

The natural language matching method of claim 11,
wherein generating the matching data comprises:
matching at least one segment against a word or matching the at least one segment against a concept;
determining a pattern string including both the matched word and the matched concept;
when the pattern string includes a capture, matching the capture against the at least one segment to extract a specific phrase in the user input associated with the capture; and
generating the matching data including at least one of data on the matched concept or the specific phrase in the user input associated with the capture.