KR20210098135A

KR20210098135A - Apparatus, method and computer program for analyzing query data

Info

Publication number: KR20210098135A
Application number: KR1020200011949A
Authority: KR
Inventors: 김치영; 김정호; 민태홍; 성주원
Original assignee: 주식회사 케이티
Priority date: 2020-01-31
Filing date: 2020-01-31
Publication date: 2021-08-10

Abstract

Provided is a query analysis device for analyzing query data, which includes: an input unit for receiving query data; a natural language processing unit for deriving a first entity name pattern by performing natural language processing on the received query data; a deep learning model processing unit for deriving a second entity name pattern using a deep learning model with respect to the received query data; a similar meta extraction unit for extracting similar metadata for each entity name included in the second entity name pattern; a query analysis unit for performing query analysis on the received query data based on the first entity name pattern, the second entity name pattern, and the similar metadata; and an output unit for outputting a query analysis result for the received query data.

Description

Query analysis apparatus, method and computer program for analyzing query data {APPARATUS, METHOD AND COMPUTER PROGRAM FOR ANALYZING QUERY DATA}

The present invention relates to a query analysis apparatus, method, and computer program for analyzing query data.

A question answering system obtains a response to a given query. To find an appropriate response to a user's query, it uses various techniques such as query analysis, information retrieval, and information extraction. A user's query may be input as natural language data in the form of text or voice and may contain several pieces of information at once. The query therefore needs to be analyzed before the question answering system searches for and extracts information.

Conventionally, content search generally worked by having the user directly enter a search term such as a movie title, program name, genre, or cast member, and then providing keyword search results for that term. However, as the user interfaces for entering queries have diversified from keyboards and remote controls to voice recognition devices, technology capable of processing complex search queries is required.

A user's query for content includes information on the type of media the user is looking for and information that constrains the search results by means of metadata. In this case, query analysis is generally performed in two steps: first the type of media the user is looking for is recognized, and then the metadata contained in the user's query is recognized.

Metadata is recognized using a named-entity recognizer. Named-entity recognition (NER) presupposes predefined categories such as person, organization, place, time, and unit, and refers to recognizing words in unstructured text that belong to those categories and then extracting and classifying them.

Conventional content search methods can process queries containing a content title or basic metadata (genre, cast, and the like), but are limited in processing queries that contain several kinds of metadata. Examples of such queries are "American movies set in France" and "cheerful Korean movies from 2017 set in a police academy".

In addition, conventional content search methods suffer from reduced accuracy of the analysis result when a query contains metadata that has not been defined in advance. Such metadata includes cases where it is similar in meaning to predefined metadata but different in form, and cases where no predefined metadata has a similar meaning. As voice recognition technology is increasingly used together with content search, queries containing metadata that has not been defined in advance are becoming more common.

Korean Registered Patent Publication No. 10-1873873 (published June 27, 2018)

An object of the present invention is to provide a query analysis apparatus, method, and computer program for processing queries that contain several kinds of metadata or metadata that has not been defined in advance.

Another object is to provide a query analysis apparatus, method, and computer program for analyzing query data based on natural language processing and deep learning algorithms.

Another object is to provide a query analysis apparatus, method, and computer program for analyzing query data that includes a plurality of keywords.

Another object is to provide a query analysis apparatus, method, and computer program for analyzing query data that includes keywords not defined in advance.

A further object is to reduce the amount of information that must be built and managed for query analysis and to improve the accuracy of query analysis with a relatively small amount of data.

However, the technical objects to be achieved by the present embodiment are not limited to those described above, and other technical objects may exist.

As a means for achieving the above technical objects, an embodiment of the present invention may provide a query analysis apparatus for analyzing query data, including: an input unit that receives query data; a natural language processing unit that derives a first entity name pattern by performing natural language processing on the received query data; a deep learning model processing unit that derives a second entity name pattern by applying a deep learning model to the received query data; a similar meta extraction unit that extracts similar metadata for each entity name included in the second entity name pattern; a query analysis unit that performs query analysis on the received query data based on the first entity name pattern, the second entity name pattern, and the similar metadata; and an output unit that outputs a query analysis result for the received query data.

In an embodiment, the received query data includes at least one keyword corresponding to a substantive or a predicate, and the natural language processing unit may derive the first entity name pattern for the received query data by mapping the keyword to one of a plurality of entity names.

In an embodiment, the natural language processing unit further includes a pattern accuracy deriving unit that derives an accuracy for each first entity name pattern when a plurality of first entity name patterns are derived, and the natural language processing unit may select one of the derived first entity name patterns based on the accuracy.

In an embodiment, the deep learning model may have been trained using one or more of: first training data generated based on word embedding; second training data generated based on statistics on the first entity name pattern; and third training data generated based on feedback data on the query analysis result.

In an embodiment, the deep learning model may include a first model that outputs, for each of a plurality of previously learned entity name patterns, a probability that the received query data corresponds to that pattern.

In an embodiment, the deep learning model may further include a second model that derives the second entity name pattern based on the output of the first model.

In an embodiment, the similar meta extraction unit may extract, based on an entity name index, a word similar in form to each entity name included in the second entity name pattern as the similar metadata.

In an embodiment, when no word of similar form exists in the entity name index, the similar meta extraction unit may extract a word of similar form as the similar metadata based on word embedding.
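The two-stage lookup described in the last two embodiments can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the entity name index is modeled as a plain list of strings, form similarity is approximated with the standard library's `difflib.get_close_matches`, and the word embedding fallback is a hypothetical callable (`embedding_neighbors`) supplied by the caller. The function name, cutoff value, and sample words are all illustrative.

```python
from difflib import get_close_matches


def extract_similar_meta(entity, index, embedding_neighbors, cutoff=0.6):
    """Return index words similar in form to `entity`, else fall back to embeddings.

    `index` stands in for the entity name index (assumption: a list of strings).
    `embedding_neighbors` is a hypothetical callable returning nearby words
    from a word embedding model.
    """
    # Stage 1: form-based (character-level) similarity against the index.
    matches = get_close_matches(entity, index, n=3, cutoff=cutoff)
    if matches:
        return matches
    # Stage 2: nothing similar in form was indexed, so use the embedding fallback.
    return embedding_neighbors(entity)


# Illustrative data only.
index = ["police", "romance"]
fallback_table = {"cop": ["detective"]}


def fallback(word):
    return fallback_table.get(word, [])


from_index = extract_similar_meta("policeman", index, fallback)
from_embedding = extract_similar_meta("cop", index, fallback)
```

Keeping the fallback as an injected callable leaves the choice of embedding model open, which matches the specification's wording that the fallback is merely "based on word embedding".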

Another embodiment of the present invention may provide a query analysis method for analyzing query data, including: receiving query data; deriving a first entity name pattern by performing natural language processing on the received query data; deriving a second entity name pattern by applying a deep learning model to the received query data; extracting similar metadata for each entity name included in the second entity name pattern; performing query analysis on the received query data based on the first entity name pattern, the second entity name pattern, and the similar metadata; and outputting a query analysis result for the received query data.

Yet another embodiment of the present invention may provide a computer program stored in a medium and including a sequence of instructions for analyzing query data, the sequence of instructions causing query data to be received; a first entity name pattern to be derived by performing natural language processing on the received query data; a second entity name pattern to be derived by applying a deep learning model to the received query data; similar metadata to be extracted for each entity name included in the second entity name pattern; query analysis to be performed on the received query data based on the first entity name pattern, the second entity name pattern, and the similar metadata; and a query analysis result for the received query data to be output.

The above means for solving the problems are merely exemplary and should not be construed as limiting the present invention. In addition to the exemplary embodiments described above, there may be additional embodiments described in the drawings and the detailed description of the invention.

According to any one of the above means for solving the problems, query data including a plurality of keywords, or query data including keywords not defined in advance, can be analyzed.

In addition, the amount of information that must be built and managed for query analysis can be reduced.

In addition, the accuracy of query analysis can be improved by using natural language processing and a deep learning model.

In addition, a service can be provided using query analysis.

FIG. 1 is a block diagram of a query analysis apparatus according to an embodiment of the present invention.
FIG. 2 is a flowchart of a query analysis method according to an embodiment of the present invention.
FIG. 3 is a flowchart of a method of deriving a first entity name pattern according to an embodiment of the present invention.
FIG. 4 is a flowchart of a method of deriving a second entity name pattern according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the invention pertains can readily practice them. The present invention may, however, be embodied in many different forms and is not limited to the embodiments described herein. In the drawings, parts unrelated to the description are omitted for clarity, and similar parts are given similar reference numerals throughout the specification.

Throughout the specification, when a part is said to be "connected" to another part, this includes not only being "directly connected" but also being "electrically connected" with another element interposed between them. Likewise, when a part is said to "include" a component, this means that, unless specifically stated otherwise, it may further include other components rather than excluding them, and it should be understood that the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof is not precluded.

In this specification, a "unit" includes a unit realized by hardware, a unit realized by software, and a unit realized using both. One unit may be realized using two or more pieces of hardware, and two or more units may be realized by one piece of hardware. A "unit" is not limited to software or hardware; it may be configured to reside in an addressable storage medium or configured to execute on one or more processors. Thus, as an example, a "unit" includes components such as software components, object-oriented software components, class components, and task components, as well as processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuitry, data, databases, data structures, tables, arrays, and variables. The functionality provided within components and "units" may be combined into a smaller number of components and "units" or further separated into additional components and "units". In addition, components and "units" may be implemented to execute on one or more CPUs in a device or a secure multimedia card.

The "network" referred to below means a connection structure in which nodes such as terminals and servers can exchange information with one another, and includes a local area network (LAN), a wide area network (WAN), the Internet (WWW: World Wide Web), wired and wireless data communication networks, telephone networks, and wired and wireless television networks. Examples of wireless data communication networks include, but are not limited to, 3G, 4G, 5G, 3GPP (3rd Generation Partnership Project), LTE (Long Term Evolution), WiMAX (Worldwide Interoperability for Microwave Access), Wi-Fi, Bluetooth communication, infrared communication, ultrasonic communication, visible light communication (VLC), and LiFi.

In this specification, some of the operations or functions described as being performed by a terminal or device may instead be performed by a server connected to that terminal or device. Likewise, some of the operations or functions described as being performed by a server may be performed by a terminal or device connected to that server.

Hereinafter, an embodiment of the present invention is described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram of a query analysis apparatus according to an embodiment of the present invention. Referring to FIG. 1, the query analysis apparatus 100 may include an input unit 110, a natural language processing unit 120, a deep learning model processing unit 130, a similar meta extraction unit 140, a query analysis unit 150, and an output unit 160.

The query analysis apparatus 100 may analyze query data and output a corresponding result value. For example, in a media providing service, the query analysis apparatus 100 may analyze query data for searching for content desired by a user and output a list of the corresponding content as the result value.
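The data flow between the units of FIG. 1 can be pictured with a small skeleton. This is only a sketch under stated assumptions: each unit is modeled as a plain callable, an entity name pattern is modeled as a dictionary from entity name to keyword, and the class name `QueryAnalyzer`, its field names, and the stub units in the usage lines are hypothetical, not taken from the specification.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Assumption: an entity name pattern is a mapping "entity name -> keyword".
Pattern = Dict[str, str]


@dataclass
class QueryAnalyzer:
    nlp: Callable[[str], Pattern]               # natural language processing unit 120
    deep_model: Callable[[str], Pattern]        # deep learning model processing unit 130
    similar_meta: Callable[[str], List[str]]    # similar meta extraction unit 140
    analyze: Callable[[Pattern, Pattern, Dict[str, List[str]]], dict]  # query analysis unit 150

    def run(self, query: str) -> dict:
        first = self.nlp(query)           # first entity name pattern
        second = self.deep_model(query)   # second entity name pattern
        # Similar metadata is extracted per entry of the second pattern.
        similar = {name: self.similar_meta(word) for name, word in second.items()}
        return self.analyze(first, second, similar)


# Stub units, for illustration only.
analyzer = QueryAnalyzer(
    nlp=lambda q: {"content type": "movie"},
    deep_model=lambda q: {"genre": "action", "content type": "movie"},
    similar_meta=lambda word: [word.upper()],
    analyze=lambda first, second, similar: {
        "first": first, "second": second, "similar": similar,
    },
)
result = analyzer.run("action movie")
```

The input unit 110 and output unit 160 correspond here to the argument and return value of `run`.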

The input unit 110 may receive query data. The query data may include text data and the like. For example, the input unit 110 may receive natural language data as the query data.

The query data may include at least one keyword. A keyword may be, for example, a substantive such as a noun, pronoun, or numeral, or a predicate such as a verb or adjective. By analyzing query data including predicates in addition to the substantives that are generally treated as keywords, the present invention can derive more accurate analysis results.

For example, when the query data "American movies in which France appears" is received, the query data may include "France", "appears", "America", and "movies" as keywords. Here, "France", "America", and "movies" are keywords corresponding to substantives, and "appears" is a keyword corresponding to a predicate.

The natural language processing unit 120 may derive a first entity name pattern by performing natural language processing on the received query data. The natural language processing unit 120 may derive the first entity name pattern using an entity name dictionary and predefined entity name pattern rules.

The natural language processing unit 120 may map a keyword to one of a plurality of entity names. For example, the natural language processing unit 120 may extract the entity name corresponding to a keyword from an entity name dictionary that contains information on a plurality of entity names. The entity name dictionary may include a plurality of predefined entity names, such as country, content type, spatial background, and background attribute, together with information on the words classified under each entity name.

The natural language processing unit 120 may analyze the first entity name pattern based on predefined entity name pattern rules. The natural language processing unit 120 may use a plurality of predefined entity name pattern rules.

For example, for each keyword in the query data "American movies set in France", entity names may be mapped based on the entity name dictionary as "France - country", "set in - background attribute", "America - country", and "movie - content type". With reference to the predefined entity name pattern rules, the first entity name pattern may then be analyzed as "spatial background: France, production country: America, content type: movie".
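The dictionary lookup and rule application in this example can be sketched as follows. The dictionary contents, the single rule ("a country immediately followed by a background attribute is a spatial background; any other country is a production country"), and the function names are assumptions made for illustration; the specification does not disclose the actual rules. The keywords are English glosses listed in the order in which they occur in the Korean query.

```python
# Hypothetical entity name dictionary: keyword -> entity name.
ENTITY_DICT = {
    "France": "country",
    "set in": "background attribute",
    "America": "country",
    "movie": "content type",
}


def tag_keywords(keywords):
    """Pair each keyword found in the dictionary with its entity name."""
    return [(kw, ENTITY_DICT[kw]) for kw in keywords if kw in ENTITY_DICT]


def apply_pattern_rule(tagged):
    """Apply one illustrative pattern rule to a list of (keyword, entity name) pairs."""
    pattern = {}
    for i, (word, entity) in enumerate(tagged):
        next_entity = tagged[i + 1][1] if i + 1 < len(tagged) else None
        if entity == "country" and next_entity == "background attribute":
            pattern["spatial background"] = word
        elif entity == "country":
            pattern["production country"] = word
        elif entity == "content type":
            pattern["content type"] = word
    return pattern


# Keyword order follows the Korean query ("France / set in / America / movie").
keywords = ["France", "set in", "America", "movie"]
first_pattern = apply_pattern_rule(tag_keywords(keywords))
print(first_pattern)
```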

As another example, for each keyword in the query data "variety shows featuring Lee Seung-gi", entity names may be mapped based on the entity name dictionary as "Lee Seung-gi - person", "featuring - casting attribute", and "variety show - content type". With reference to the predefined entity name pattern rules, the first entity name pattern may then be analyzed as either "guest appearance: Lee Seung-gi, content type: variety show" or "cast: Lee Seung-gi, content type: variety show".

Referring again to FIG. 1, the natural language processing unit 120 may include a pattern accuracy deriving unit 121. When a plurality of first entity name patterns are derived, the pattern accuracy deriving unit 121 may derive an accuracy for each first entity name pattern.

The pattern accuracy deriving unit 121 may derive the accuracy of each first entity name pattern based on preset weights.

For example, for the pattern "guest appearance: person, content type: variety show", the weight may be derived as A1 × (1 / days elapsed since the person's last program as a guest) + B1 × (type weight × occupation weight) + C1 × (number of guest appearances / (number of guest appearances + number of regular appearances)). For the pattern "appearance: person, content type: variety show", the weight may be derived as A2 × (1 / days elapsed since the person's last program as a regular cast member) + B1 × (type weight × occupation weight) + C2 × (number of regular appearances / (number of guest appearances + number of regular appearances)). Here, the type weight is a value that quantifies the likelihood that a person of each occupation appears in the content in question, the occupation weight is a value set based on each person's occupation, and A1, A2, B1, C1, and C2 may be values set for correcting the result.
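Written out in symbols, the two weights read as below. The symbol names are introduced here for readability only and do not appear in the specification: $d$ denotes the days elapsed since the last guest or regular appearance, $n$ the number of guest or regular appearances, and $w_{\text{type}}$ and $w_{\text{job}}$ the type weight and occupation weight.

```latex
\begin{aligned}
S_{\text{guest}}   &= A_1 \cdot \frac{1}{d_{\text{guest}}}
                    + B_1 \cdot \left( w_{\text{type}} \cdot w_{\text{job}} \right)
                    + C_1 \cdot \frac{n_{\text{guest}}}{n_{\text{guest}} + n_{\text{regular}}} \\
S_{\text{regular}} &= A_2 \cdot \frac{1}{d_{\text{regular}}}
                    + B_1 \cdot \left( w_{\text{type}} \cdot w_{\text{job}} \right)
                    + C_2 \cdot \frac{n_{\text{regular}}}{n_{\text{guest}} + n_{\text{regular}}}
\end{aligned}
```

The middle term is identical in both expressions, so the choice between the two patterns is driven by how recent and how frequent each kind of appearance is.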

The natural language processing unit 120 may select one of the derived first entity name patterns based on the accuracy. For example, the natural language processing unit 120 may select, from among the plurality of first entity name patterns, the one with the higher accuracy value.
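A minimal sketch of this scoring and selection step follows, assuming the weight expressions given for the guest and regular patterns. All numeric inputs and the default coefficient values of 1.0 are invented for illustration, the function names are hypothetical, and the sketch assumes at least one elapsed day so that the reciprocal is defined.

```python
def guest_score(days_since_guest, type_w, job_w, n_guest, n_regular,
                a1=1.0, b1=1.0, c1=1.0):
    """Weight for the "guest appearance" pattern (coefficients are placeholders)."""
    return (a1 * (1 / days_since_guest)
            + b1 * (type_w * job_w)
            + c1 * (n_guest / (n_guest + n_regular)))


def regular_score(days_since_regular, type_w, job_w, n_guest, n_regular,
                  a2=1.0, b1=1.0, c2=1.0):
    """Weight for the regular cast pattern (coefficients are placeholders)."""
    return (a2 * (1 / days_since_regular)
            + b1 * (type_w * job_w)
            + c2 * (n_regular / (n_guest + n_regular)))


# Invented figures: a recent guest spot, an older regular role, mostly guest appearances.
scores = {
    "guest appearance": guest_score(30, 0.5, 0.5, n_guest=8, n_regular=2),
    "regular cast": regular_score(300, 0.5, 0.5, n_guest=8, n_regular=2),
}

# Select the candidate pattern with the highest accuracy value.
best_pattern = max(scores, key=scores.get)
print(best_pattern, scores)
```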

The deep learning model processing unit 130 may derive a second entity name pattern by applying a deep learning model to the received query data.

The deep learning model processing unit 130 may train the deep learning model using first training data, second training data, and third training data.

The training data of a deep learning model directly affects the model's performance and the quality of its results. In most cases, however, building training data is difficult. The present invention aims to provide a way to build the training data needed for the initial training of the deep learning model and a way to keep building training data continuously. Because a supervised deep learning model requires training data made up of question-and-answer pairs, the present invention aims to automate the generation of such paired training data.

The deep learning model processing unit 130 may build raw training data by collecting query data through methods such as web crawling and from web documents such as news articles. Such raw training data may not be organized as question-and-answer pairs.

The deep learning model processing unit 130 may generate the first training data based on word embedding. The deep learning model processing unit 130 may perform word embedding, which maps word segments, morphemes, and the like into a multidimensional space, on the raw training data, externally collected data, and so on. For example, the deep learning model processing unit 130 may use algorithms such as Word2Vec, Phase2Vec, and FastText, and may use them in partially modified form.

The deep learning model processing unit 130 may generate the first training data by comparing the result of the word embedding with the entity name dictionary. The deep learning model processing unit 130 may generate the first training data further based on distant supervision.

For example, suppose that for the word "classical" (고전적인), one word extracted by word embedding, "old-fashioned" (고풍스러운), falls under the entity name "mood", and another extracted word, "antique" (엔틱), falls under the entity name "subject matter". In that case, distant supervision may select "old-fashioned", the word more similar in form. Through this process, first training data is generated in which the word "classical" is mapped to the entity name "mood".
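The labeling step in this example can be sketched as follows. The sketch makes several assumptions that the specification does not state: the embedding vectors are invented two-dimensional toy values, "more similar" is measured by cosine similarity between vectors, and the entity name dictionary is a plain mapping. It is meant only to show how an unlabeled word can inherit the entity name of its closest dictionary word.

```python
import math

# Toy embedding vectors (invented values, two dimensions for readability).
EMBEDDINGS = {
    "classical": (0.9, 0.1),
    "old-fashioned": (0.8, 0.2),
    "antique": (0.3, 0.9),
}

# Hypothetical entity name dictionary: word -> entity name.
ENTITY_DICT = {
    "old-fashioned": "mood",
    "antique": "subject matter",
}


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def label_by_nearest(word):
    """Give `word` the entity name of the most similar dictionary word."""
    candidates = [w for w in ENTITY_DICT if w in EMBEDDINGS]
    nearest = max(candidates, key=lambda w: cosine(EMBEDDINGS[word], EMBEDDINGS[w]))
    return word, ENTITY_DICT[nearest]


# One automatically generated (word, entity name) training pair.
sample = label_by_nearest("classical")
print(sample)
```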

The deep learning model processing unit 130 may generate the second training data based on statistics on the first entity name pattern.

For example, the deep learning model processing unit 130 may use first entity name patterns derived from queries that users actually enter frequently.

For example, the deep learning model processing unit 130 may generate the second training data based on the statistics on the first entity name pattern and the entity name dictionary. For instance, for an entity name pattern consisting of subject matter, appearance attribute, genre, and content type, the second training data may be generated by combining the words that fall under each entity name.
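Combining the dictionary words under each entity name of a pattern amounts to taking a Cartesian product, which can be sketched as below. The word lists are invented, the function name is hypothetical, and the generated queries are English glosses joined in the order of the pattern, so they read less naturally than the Korean queries they stand for.

```python
from itertools import product

# A frequently observed entity name pattern (order matters).
PATTERN = ["subject matter", "appearance attribute", "genre", "content type"]

# Hypothetical dictionary entries: entity name -> words classified under it.
WORDS = {
    "subject matter": ["detective", "lawyer"],
    "appearance attribute": ["appearing"],
    "genre": ["action", "comedy"],
    "content type": ["movie"],
}


def generate_samples(pattern, words):
    """Build (query, labels) pairs from every combination of dictionary words."""
    samples = []
    for combo in product(*(words[name] for name in pattern)):
        query = " ".join(combo)
        labels = list(zip(combo, pattern))  # (word, entity name) per position
        samples.append((query, labels))
    return samples


samples = generate_samples(PATTERN, WORDS)
print(len(samples), samples[0])
```

Each generated pair already carries its answer (the entity name of every word), which is what makes it usable as supervised training data without manual labeling.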

The deep learning model processing unit 130 may generate third training data based on feedback data for the query analysis result. For example, the deep learning model processing unit 130 may generate the third training data based on feedback data input by an administrator of the query analysis apparatus 100. The third training data may improve the reliability and accuracy of the query analysis result.

The deep learning model processing unit 130 may use a deep learning model including a first model and a second model. The first model may be based on, for example, BiLSTM-CRF.

Using the first model, the deep learning model processing unit 130 may output, for each of a plurality of previously learned entity name patterns, a probability value that the received query data corresponds to that pattern. For example, when "action movie in which a detective appears" (형사가 나오는 액션 영화) is received as query data, the first model may output a probability of 70% for the entity name pattern "material: detective, appearance attribute: appears, genre: action, content type: movie" and a probability of 20% for another entity name pattern, "character: detective, appearance attribute: appears, genre: action, content type: movie".

The deep learning model processing unit 130 may derive the second entity name pattern using the second model. The second model may be trained to take the output values of the first model as input and to output the entity name pattern corresponding to the correct answer.

Using the second model, the deep learning model processing unit 130 may extract, from among the entity name patterns output by the first model, an entity name pattern that could actually be entered as a user query. In this way, the second model may compensate for cases in which the first model has not been sufficiently trained.
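The two-stage flow can be sketched as below. This is only an illustration of the data flow, not the trained models: the first-stage probabilities are the 70% and 20% values from the example, and the second stage is replaced by an assumed stand-in that re-weights each candidate by a hypothetical prior (how often the pattern appears in real user queries). The names `second_stage` and `pattern_prior` are not from the source.

```python
# Candidate entity name patterns, represented as tuples of entity names.
MATERIAL_PATTERN = ("material", "appearance attribute", "genre", "content type")
CHARACTER_PATTERN = ("character", "appearance attribute", "genre", "content type")

# Output of the first model for "action movie in which a detective appears"
# (values from the example in the text).
first_stage_probs = {MATERIAL_PATTERN: 0.70, CHARACTER_PATTERN: 0.20}

# Hypothetical prior: relative frequency of each pattern in actual user queries.
pattern_prior = {MATERIAL_PATTERN: 0.6, CHARACTER_PATTERN: 0.4}


def second_stage(probs, prior):
    """Stand-in for the second model: pick the candidate with the best re-weighted score."""
    scores = {p: prob * prior.get(p, 0.0) for p, prob in probs.items()}
    return max(scores, key=scores.get)


second_pattern = second_stage(first_stage_probs, pattern_prior)
print(second_pattern)
```

A trained second model would learn this mapping from data rather than from a fixed prior; the sketch only shows that the second stage consumes the first stage's probability values and returns a single pattern.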

The similar meta extraction unit 140 may extract similar metadata for each entity name included in the second entity name pattern. To do so, the similar meta extraction unit 140 may use an entity name index built by extracting morphemes judged to be important, such as nouns and adjectives, from the entity name dictionary and indexing them by entity name type. For example, the keyword "glamorous" (화려한) may be indexed as 'Key-화려, Value-화려한', and the keyword "creepy" (소름돋는) may be indexed as 'Key-소름, Key-돋다, Value-소름돋는'. The similar meta extraction unit 140 may extract morphemes judged to be important, such as nouns and adjectives, from the second entity name pattern and compare them with the entity name index.
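The entity name index described above resembles an inverted index from key morphemes to keywords, grouped by entity name type. The following sketch assumes the morphological analysis has already been done (the key morphemes are supplied by hand) and assumes, for illustration only, that both example keywords belong to the entity name type "감성"; the function names are hypothetical.

```python
from collections import defaultdict


def build_entity_index(entries):
    """Build {entity type: {key morpheme: set of keywords}} from analyzed entries."""
    index = defaultdict(lambda: defaultdict(set))
    for entity_type, keyword, key_morphemes in entries:
        for morpheme in key_morphemes:
            index[entity_type][morpheme].add(keyword)
    return index


def lookup(index, entity_type, morphemes):
    """Return every indexed keyword that shares a key morpheme with the input."""
    by_morpheme = index.get(entity_type, {})
    hits = set()
    for morpheme in morphemes:
        hits |= by_morpheme.get(morpheme, set())
    return hits


# (entity type, keyword, key morphemes); the morphemes follow the example in the text.
ENTRIES = [
    ("감성", "화려한", ["화려"]),
    ("감성", "소름돋는", ["소름", "돋다"]),
]

index = build_entity_index(ENTRIES)
print(lookup(index, "감성", ["소름"]))
```

Keying the index by morpheme rather than by the full keyword lets inflected or compound forms that share a stem retrieve the same entry.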

Based on the entity name index, the similar meta extraction unit 140 may extract, as the similar metadata, a word similar in form to each entity name included in the second entity name pattern.

When no word similar in form exists in the entity name index, the similar meta extraction unit 140 may extract a similar word as the similar metadata based on word embedding.
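The index-first, embedding-fallback behavior can be sketched as follows. The embedding vectors are made-up two-dimensional toy values, cosine similarity is an assumed choice of similarity measure, and the function names are hypothetical; a real system would query a trained word embedding model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similar_metadata(word, index_hits, embeddings, vocabulary):
    """Prefer entity name index hits; otherwise fall back to the nearest embedding."""
    if index_hits:
        return sorted(index_hits)
    if word not in embeddings:
        return []
    candidates = [w for w in vocabulary if w in embeddings and w != word]
    if not candidates:
        return []
    best = max(candidates, key=lambda w: cosine(embeddings[word], embeddings[w]))
    return [best]


# Toy embedding table (illustrative values only).
EMBEDDINGS = {"scary": [1.0, 0.1], "creepy": [0.9, 0.2], "glamorous": [0.0, 1.0]}

print(similar_metadata("scary", set(), EMBEDDINGS, ["creepy", "glamorous"]))
```

When the index lookup returns at least one hit, the embedding table is never consulted, which mirrors the order of preference described above.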

The query analysis unit 150 may perform query analysis on the received query data based on the first entity name pattern, the second entity name pattern, and the similar metadata.

The output unit 160 may output a query analysis result for the received query data. The output unit 160 may output the query analysis result using, for example, a speaker or a display.

In an embodiment, the query analysis apparatus 100 may further include a determination unit (not shown). The determination unit may determine whether to output the first entity name pattern directly for the received query data, without deriving the second entity name pattern. For example, after evaluating the derived first entity name pattern, the determination unit may decide not to output the first entity name pattern and instead to derive the second entity name pattern for the received query data.
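The text does not state the criterion the determination unit applies. Purely as an illustration, the sketch below assumes the decision is made by comparing an accuracy score of the first entity name pattern against a threshold; both the threshold value and the function name are hypothetical.

```python
def should_derive_second_pattern(first_pattern_accuracy: float,
                                 threshold: float = 0.8) -> bool:
    """Assumed rule: fall through to the deep learning model when accuracy is low."""
    return first_pattern_accuracy < threshold


# A confident first pattern is output directly; an uncertain one triggers the model.
print(should_derive_second_pattern(0.95))
print(should_derive_second_pattern(0.40))
```

Under this assumed rule, the deep learning model runs only for queries that the rule-based natural language processing handles poorly.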

FIG. 2 is a flowchart of a query analysis method according to an embodiment of the present invention. The method 200 of analyzing a query, shown in FIG. 2 and performed by the query analysis apparatus 100, includes steps processed in time series by the query analysis apparatus 100 according to the embodiment shown in FIG. 1. Accordingly, even content omitted below also applies to the method of analyzing a query performed by the query analysis apparatus 100 according to the embodiment shown in FIG. 1.

In step S210, the query analysis apparatus 100 may receive query data.

In step S220, the query analysis apparatus 100 may derive a first entity name pattern by performing natural language processing on the received query data.

In step S230, the query analysis apparatus 100 may derive a second entity name pattern from the received query data using a deep learning model.

In step S240, the query analysis apparatus 100 may extract similar metadata for each entity name included in the second entity name pattern.

In step S250, the query analysis apparatus 100 may perform query analysis on the received query data based on the first entity name pattern, the second entity name pattern, and the similar metadata.

In step S260, the query analysis apparatus 100 may output a query analysis result for the received query data.

In the above description, steps S210 to S260 may be further divided into additional steps or combined into fewer steps, depending on the implementation of the present invention. In addition, some steps may be omitted as necessary, and the order of the steps may be changed.
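The order of steps S210 to S260 can be summarized in a small sketch. The three callables stand in for the natural language processing unit, the deep learning model processing unit, and the similar meta extraction unit; their signatures, the class name `QueryAnalyzer`, and the shape of the result are assumptions made for illustration. How step S250 combines its three inputs is not detailed here, so the sketch simply bundles them.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Pattern = Dict[str, str]  # entity name -> matched word


@dataclass
class QueryAnalyzer:
    nlp: Callable[[str], Pattern]                  # S220: first entity name pattern
    deep_model: Callable[[str], Pattern]           # S230: second entity name pattern
    similar_meta: Callable[[str, str], List[str]]  # S240: similar metadata per entity name

    def analyze(self, query: str) -> dict:
        """Run S210 to S260 in order for one received query."""
        first = self.nlp(query)
        second = self.deep_model(query)
        meta = {name: self.similar_meta(name, word) for name, word in second.items()}
        # S250/S260: bundle the three inputs as the analysis result (assumed format).
        return {"query": query, "first_pattern": first,
                "second_pattern": second, "similar_metadata": meta}


# Usage with trivial stand-in components.
analyzer = QueryAnalyzer(
    nlp=lambda q: {"genre": "action"},
    deep_model=lambda q: {"material": "detective", "genre": "action"},
    similar_meta=lambda name, word: [word + " (similar)"],
)
result = analyzer.analyze("action movie in which a detective appears")
print(result["similar_metadata"])
```

Passing the components in as callables keeps the sketch independent of any particular tokenizer or model library.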

FIG. 3 is a flowchart of a method of deriving a first entity name pattern according to an embodiment of the present invention. The method 300 of deriving a first entity name pattern, shown in FIG. 3 and performed by the query analysis apparatus 100, includes steps processed in time series by the query analysis apparatus 100 according to the embodiment shown in FIG. 1. Accordingly, even content omitted below also applies to the method of deriving a first entity name pattern performed by the query analysis apparatus 100 according to the embodiment shown in FIG. 1.

In step S310, the query analysis apparatus 100 may receive query data.

In step S320, the query analysis apparatus 100 may derive a plurality of first entity name patterns for the received query data.

In step S330, the query analysis apparatus 100 may derive an accuracy for each first entity name pattern.

In step S340, the query analysis apparatus 100 may select one of the plurality of first entity name patterns based on the accuracy.
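Steps S320 to S340 amount to scoring several candidate patterns and keeping the best one. The sketch below assumes the accuracy of each candidate has already been computed and that the highest value wins; the accuracy values and the function name are hypothetical.

```python
def select_first_pattern(scored_patterns):
    """Return the candidate pattern with the highest accuracy (S340)."""
    if not scored_patterns:
        raise ValueError("no first entity name pattern was derived")
    pattern, _accuracy = max(scored_patterns, key=lambda item: item[1])
    return pattern


# (pattern, accuracy) pairs as they might look after S320 and S330; values are made up.
candidates = [
    ({"material": "detective", "genre": "action"}, 0.82),
    ({"character": "detective", "genre": "action"}, 0.35),
]
print(select_first_pattern(candidates))
```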

In the above description, steps S310 to S340 may be further divided into additional steps or combined into fewer steps, depending on the implementation of the present invention. In addition, some steps may be omitted as necessary, and the order of the steps may be changed.

FIG. 4 is a flowchart of a method of deriving a second entity name pattern according to an embodiment of the present invention. The method 400 of deriving a second entity name pattern, shown in FIG. 4 and performed by the query analysis apparatus 100, includes steps processed in time series by the query analysis apparatus 100 according to the embodiment shown in FIG. 1. Accordingly, even content omitted below also applies to the method of deriving a second entity name pattern performed by the query analysis apparatus 100 according to the embodiment shown in FIG. 1.

In step S410, the query analysis apparatus 100 may receive query data.

In step S420, the query analysis apparatus 100 may derive a first entity name pattern by performing natural language processing on the received query data.

In step S430, the query analysis apparatus 100 may derive a second entity name pattern from the received query data using a deep learning model.

In the above description, steps S410 to S430 may be further divided into additional steps or combined into fewer steps, depending on the implementation of the present invention. In addition, some steps may be omitted as necessary, and the order of the steps may be changed.

The method of analyzing query data in the query analysis apparatus described with reference to FIGS. 1 to 4 may also be implemented in the form of a computer program stored in a medium and executed by a computer, or in the form of a recording medium containing instructions executable by a computer.

A computer-readable medium may be any available medium that can be accessed by a computer, and includes volatile and nonvolatile media and removable and non-removable media. A computer-readable medium may also include a computer storage medium. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented by any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data.

The foregoing description of the present invention is illustrative, and those of ordinary skill in the art to which the present invention pertains will understand that it can be readily modified into other specific forms without changing the technical idea or essential features of the present invention. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and likewise components described as distributed may be implemented in a combined form.

The scope of the present invention is defined by the claims below rather than by the above detailed description, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

100: query analysis apparatus
110: input unit
120: natural language processing unit
130: deep learning model processing unit
140: similar meta extraction unit
150: query analysis unit
160: output unit

Claims

1. A query analysis apparatus for analyzing query data, comprising:
an input unit for receiving query data;
a natural language processing unit for deriving a first entity name pattern by performing natural language processing on the received query data;
a deep learning model processing unit for deriving a second entity name pattern from the received query data using a deep learning model;
a similar meta extraction unit for extracting similar metadata for each entity name included in the second entity name pattern;
a query analysis unit for performing query analysis on the received query data based on the first entity name pattern, the second entity name pattern, and the similar metadata; and
an output unit for outputting a query analysis result for the received query data.

2. The query analysis apparatus of claim 1, wherein
the received query data includes at least one keyword corresponding to a substantive or a predicate,
and the natural language processing unit derives the first entity name pattern for the received query data by matching the keyword to any one entity name among a plurality of entity names.

3. The query analysis apparatus of claim 1,
wherein the natural language processing unit further comprises a pattern accuracy deriving unit that derives an accuracy for each first entity name pattern when a plurality of first entity name patterns are derived,
and wherein the natural language processing unit selects one of the plurality of derived first entity name patterns based on the accuracy.

4. The query analysis apparatus of claim 1,
wherein the deep learning model is trained using one or more of first training data generated based on word embedding, second training data generated based on statistics on the first entity name pattern, and third training data generated based on feedback data for the query analysis result.

5. The query analysis apparatus of claim 4,
wherein the deep learning model includes a first model that outputs, for each of a plurality of previously learned entity name patterns, a probability value that the received query data corresponds to that pattern.

6. The query analysis apparatus of claim 5,
wherein the deep learning model further includes a second model for deriving the second entity name pattern based on the output values of the first model.

7. The query analysis apparatus of claim 1,
wherein the similar meta extraction unit extracts, as the similar metadata, a word similar in form to each entity name included in the second entity name pattern, based on an entity name index.

8. The query analysis apparatus of claim 7,
wherein, when no word similar in form exists in the entity name index, the similar meta extraction unit extracts a similar word as the similar metadata based on word embedding.

9. A query analysis method for analyzing query data, comprising:
receiving query data;
deriving a first entity name pattern by performing natural language processing on the received query data;
deriving a second entity name pattern from the received query data using a deep learning model;
extracting similar metadata for each entity name included in the second entity name pattern;
performing query analysis on the received query data based on the first entity name pattern, the second entity name pattern, and the similar metadata; and
outputting a query analysis result for the received query data.

10. The query analysis method of claim 9, wherein
the received query data includes at least one keyword corresponding to a substantive or a predicate,
and the deriving of the first entity name pattern comprises deriving the first entity name pattern for the received query data by matching the keyword to any one entity name among a plurality of entity names.

11. The query analysis method of claim 9, further comprising:
deriving an accuracy for each first entity name pattern when a plurality of first entity name patterns are derived; and
selecting one of the plurality of derived first entity name patterns based on the accuracy.

12. The query analysis method of claim 9,
wherein the deep learning model is trained using one or more of first training data generated based on word embedding, second training data generated based on statistics on the first entity name pattern, and third training data generated based on feedback data for the query analysis result.

13. The query analysis method of claim 12,
wherein the deep learning model includes a first model that outputs, for each of a plurality of previously learned entity name patterns, a probability value that the received query data corresponds to that pattern.

14. The query analysis method of claim 13,
wherein the deep learning model further includes a second model for deriving the second entity name pattern based on the output values of the first model.

15. The query analysis method of claim 9,
wherein the extracting of the similar metadata comprises extracting, as the similar metadata, a word similar in form to each entity name included in the second entity name pattern, based on an entity name index.

16. The query analysis method of claim 15,
wherein the extracting of the similar metadata comprises, when no word similar in form exists in the entity name index, extracting a similar word as the similar metadata based on word embedding.

17. A computer program stored on a medium and comprising a sequence of instructions for analyzing query data,
wherein the computer program, when executed by a computing device, causes the computing device to:
receive query data;
derive a first entity name pattern by performing natural language processing on the received query data;
derive a second entity name pattern from the received query data using a deep learning model;
extract similar metadata for each entity name included in the second entity name pattern;
perform query analysis on the received query data based on the first entity name pattern, the second entity name pattern, and the similar metadata; and
output a query analysis result for the received query data.