KR20240025728A

KR20240025728A - Query processing method and system

Info

Publication number: KR20240025728A
Application number: KR1020220103681A
Authority: KR
Inventors: 이유영; 김혜영; 김선라; 인수교; 문기윤; 김경덕; 서수빈; 남경민; 김현욱
Original assignee: 네이버 주식회사
Priority date: 2022-08-19
Filing date: 2022-08-19
Publication date: 2024-02-27

Abstract

The present invention relates to a query processing method and system for outputting a response that matches the intent of a user's query. The query processing method according to the present invention may include: receiving a user query from a user; generating a normalized query corresponding to the user query by using a query normalization model trained to take at least one sentence as input and generate a single normalized sentence or word; generating a response corresponding to the normalized query; and providing the response to the user.

Description

Query processing method and system {QUERY PROCESSING METHOD AND SYSTEM}

The present invention relates to a query processing method and system for outputting a response that matches the intent of a user's query.

In its dictionary sense, artificial intelligence is technology that realizes human learning, reasoning, perception, and natural language understanding abilities through computer programs. Artificial intelligence has advanced rapidly thanks to deep learning.

In particular, the development of artificial intelligence has led to a variety of language models. These models not only recognize text and understand its meaning, but also extract and classify information from data containing large amounts of text, such as documents, and have reached the point of generating text themselves.

These language models are actively used in many fields in which tasks can be performed on the basis of text, for example search services, document writing (e.g., résumés, reports, and posts), free conversation on various topics, data parsing from a given text (e.g., summarization and classification), provision of expert knowledge, programming, and conversion of a given sentence into a sentence of an appropriate style.

Meanwhile, for a search service it is important to receive a query from a user and derive an accurate answer (or response) that matches the intent of the query. As technology has developed, the means and situations through which user queries are received have also diversified. Recently, much research has addressed techniques that use a conversational agent to identify the user's query within a conversation between the agent and the user, determine the intent of the query, and derive an appropriate answer to it.

Various studies are being conducted to determine the intent of a user's query, and research is needed on methods that actively use language models built from vast amounts of data to determine that intent accurately.

The present invention is intended to provide a query processing method and system capable of identifying the exact intent of a user's query.

Furthermore, the present invention is intended to provide a query processing method and system capable of determining the intent of a user's query based on the context of a conversation between the user and an agent.

Furthermore, the present invention is intended to provide a query processing method and system capable of generating a normalized query that reflects the intent of a user's query.

Furthermore, the present invention is intended to provide a query processing method and system capable of generating, by using a language model, a normalized query corresponding to a user's query.

Furthermore, the present invention is intended to provide a method of training a language model to generate a normalized query according to the intent of a user's query.

To solve the problems described above, the query processing method according to the present invention may include: receiving a user query from a user; generating a normalized query corresponding to the user query by using a query normalization model trained to take at least one sentence as input and generate a single normalized sentence or word; generating a response corresponding to the normalized query; and providing the response to the user.
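The four steps of the method can be sketched as a small pipeline. This is only an illustration: the function names are hypothetical, and the two stand-in functions take the place of the trained query normalization model and the response generator, neither of which is specified at this level of the description.

```python
from typing import Callable


def process_query(
    user_query: str,
    normalize: Callable[[str], str],
    answer: Callable[[str], str],
) -> str:
    """Receive a user query, normalize it, generate a response, and return it."""
    normalized_query = normalize(user_query)  # query normalization model
    response = answer(normalized_query)       # e.g. a search or QA backend
    return response                           # provided to the user


# Hypothetical stand-ins so that the sketch runs without a trained model.
def toy_normalize(query: str) -> str:
    return {"When do you eat cherries?": "cherry season"}.get(query, query)


def toy_answer(normalized_query: str) -> str:
    return f"results for: {normalized_query}"


print(process_query("When do you eat cherries?", toy_normalize, toy_answer))
```

Passing the model and the backend in as callables keeps the sketch independent of any particular language model or search service.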

The query processing system using a language model according to the present invention may include: a query receiving unit that receives a user query from a user; and a query normalization unit that generates a normalized query corresponding to the user query by using a query normalization model trained to take at least one sentence as input and generate a single normalized sentence or word.

A program according to the present invention, which is executed by one or more processes in an electronic device and stored in a computer-readable recording medium, may include instructions for performing: receiving a user query from a user; and generating a normalized query corresponding to the user query by using a query normalization model trained to take at least one sentence as input and generate a single normalized sentence or word.

The method for providing a search service according to the present invention may include: receiving a user voice from a user terminal; obtaining a user query corresponding to the user voice based on speech-to-text (STT) conversion; obtaining, from a pre-trained language model, a normalized query corresponding to the user query; and providing the user terminal with a search page containing search results related to the normalized query.
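A minimal sketch of this voice search flow follows, assuming the STT engine, the language model, and the search backend are each available as a plain callable. `SearchPage`, `serve_voice_search`, and the three stand-in functions are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SearchPage:
    query: str          # the normalized query the page was built for
    results: List[str]  # search results related to that query


def serve_voice_search(
    audio: bytes,
    speech_to_text: Callable[[bytes], str],
    normalize: Callable[[str], str],
    search: Callable[[str], List[str]],
) -> SearchPage:
    user_query = speech_to_text(audio)  # STT conversion of the user's voice
    normalized = normalize(user_query)  # pre-trained language model
    return SearchPage(query=normalized, results=search(normalized))


# Hypothetical stand-ins so that the sketch runs without real services.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_normalize(query: str) -> str:
    table = {"Tell me a rib restaurant near my current location.": "nearby rib restaurant"}
    return table.get(query, query)


def fake_search(query: str) -> List[str]:
    return [f"result for '{query}'"]


page = serve_voice_search(
    b"Tell me a rib restaurant near my current location.",
    fake_stt,
    fake_normalize,
    fake_search,
)
print(page.query)
```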

As described above, the query processing method using a language model according to the present invention can determine the intent of a user's query and generate a normalized query that reflects that intent. The query processing method using a language model according to the present invention can thus use the normalized query intent to search for the answer (or response) corresponding to the user's query.

FIGS. 1 and 2 are conceptual diagrams for explaining a query processing method and system using a language model according to the present invention.
FIGS. 3 and 4 are conceptual diagrams for explaining normalized queries.
FIGS. 5 to 8 are conceptual diagrams for explaining training data.
FIGS. 11 to 14 are conceptual diagrams illustrating usage examples of the query processing method and system according to the present invention.

Hereinafter, embodiments disclosed in this specification are described in detail with reference to the accompanying drawings. Identical or similar components are given the same reference numbers regardless of the drawing symbols, and duplicate descriptions of them are omitted. The suffixes “module” and “unit” used for components in the following description are given or used interchangeably only for ease of drafting the specification, and do not in themselves have distinct meanings or roles. In describing the embodiments disclosed in this specification, detailed descriptions of related known technologies are omitted where they could obscure the gist of the embodiments. The accompanying drawings are provided only to facilitate understanding of the embodiments disclosed in this specification; the technical idea disclosed in this specification is not limited by the accompanying drawings and should be understood to include all changes, equivalents, and substitutes that fall within the spirit and technical scope of the present invention.

Terms containing ordinal numbers, such as first and second, may be used to describe various components, but the components are not limited by these terms. The terms are used only to distinguish one component from another.

When a component is said to be “connected” or “coupled” to another component, it may be directly connected or coupled to that other component, but it should be understood that other components may exist in between. In contrast, when a component is said to be “directly connected” or “directly coupled” to another component, it should be understood that no other component exists in between.

Singular expressions include plural expressions unless the context clearly indicates otherwise.

In this application, terms such as “comprise” or “have” are intended to indicate the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to exclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

FIGS. 1 and 2 are conceptual diagrams for explaining a query processing system according to the present invention, and FIGS. 3 and 4 are conceptual diagrams for explaining normalized queries. FIGS. 5 to 8 are conceptual diagrams for explaining training data.

An agent may be configured to receive a query from a user and output to the user an appropriate answer corresponding to the received query.

As shown in FIG. 1, the agent may be included as a function of various types of electronic devices (20), or as a function of a website, application, or software that provides services in which user queries can be made, such as a search service, a consultation service, an information provision service, or a virtual assistant service.

As an example, the agent may receive a query from a user (USER) as a spoken utterance through a speaker (20b) equipped with a voice recognition function, and provide the user with an appropriate response to the query. In this case, a spoken conversation (40) may take place between the agent and the user (USER).

As another example, the agent may receive a query from the user (USER) by chatting with the user through an electronic device equipped with a display unit, such as a smartphone (20a), a tablet, or a TV, and provide the user with an appropriate response to the query.

As another example, the agent may receive a query in voice or text form through a website or application that provides a search service, such as Naver or Google. Meanwhile, to respond appropriately to a user's query, it is important to accurately determine the intent of the query. The present invention therefore provides a method of normalizing a user query. In the query processing method and system according to the present invention, normalization may be performed that turns the user's query into a single sentence (or one sentence).

In the query processing method and system according to the present invention, a user query received through the agent is input to the query normalization unit (120), which includes the query normalization model (130), and a normalized query corresponding to the user query is obtained as the output of the query normalization unit (120).

Furthermore, in the query processing method and system according to the present invention, a response (an answer, a search result, or the like) to the normalized query may be obtained from the service server (30), and the obtained response may be provided to the agent. That is, the query normalized by the query processing system (100) may be delivered to the service server (30), and a response to the normalized query may be obtained through the service server (30). The obtained response may then be delivered to the agent and used as the agent's output for the user query.

As shown in FIGS. 1 and 2, the query processing system (100) according to the present invention may include a query receiving unit (110) and a query normalization unit (120).

The query receiving unit (110) of the query processing system (100) may be configured to receive a user query from the agent. The user query may be received in at least one of voice and text form.

Next, the query normalization unit (120) may perform a series of control operations so that a normalized query is obtained by normalizing the user query.

The query normalization unit (120) may be configured to include the query normalization model (130). The query normalization unit (120) may input the user query into the query normalization model (130) and obtain from the query normalization model (130) a normalized query for the user query.

The query normalization model (130), which generates a normalized query for a user query, is configured as a language model for natural language processing, and its type is not limited. The query normalization model (130) is trained to take at least one sentence as input and generate a single normalized sentence or word.

As shown in FIG. 2, the query normalization model (130) may use a pre-trained language model (140) that has already been trained on a variety of information. The pre-trained language model may be a hyperscale language model, that is, a deep learning model with hundreds of millions of parameters or more that has been trained on large-scale data. In this case, to further improve performance, a prompt encoder (150) may be used to fine-tune the model with training data for generating normalized queries. The p-tuning method using the prompt encoder is described later.
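The general idea behind p-tuning, as it is commonly described, is that a small trainable prompt encoder produces continuous prompt embeddings that are combined with the token embeddings of a frozen pre-trained model, so that only the encoder's parameters are updated during training. The sketch below shows only that data flow in plain Python. The class layout, dimensions, and the single linear layer are assumptions made for illustration and are not taken from this specification.

```python
import random
from typing import List

Vector = List[float]


class PromptEncoder:
    """Trainable part: maps learnable seed vectors to continuous prompt embeddings."""

    def __init__(self, num_prompt_tokens: int, dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.seeds = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(num_prompt_tokens)]
        self.weight = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]

    def forward(self) -> List[Vector]:
        # One linear layer followed by ReLU, applied to each seed vector.
        return [
            [max(0.0, sum(w * x for w, x in zip(row, seed_vec))) for row in self.weight]
            for seed_vec in self.seeds
        ]


def build_model_input(prompt_embeddings: List[Vector], token_embeddings: List[Vector]) -> List[Vector]:
    """Prepend the continuous prompts to the frozen model's token embeddings."""
    return prompt_embeddings + token_embeddings


encoder = PromptEncoder(num_prompt_tokens=3, dim=4)
# Placeholder values standing in for embeddings looked up in the frozen model.
token_embeddings = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]
inputs = build_model_input(encoder.forward(), token_embeddings)
print(len(inputs), len(inputs[0]))
```

In a real setup the combined sequence would be fed to the pre-trained model and the loss gradient would flow only into the prompt encoder; no training loop is shown here.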

The query processing system (100) may use the query normalization model (130) of the query normalization unit (120) to generate a normalized query for the user query in consideration of the context of the conversation between the user and the agent. Specifically, the query processing system (100) may input to the query normalization model (130) not only the user query to be normalized but also at least part of the conversation conducted in the conversation session formed between the agent and the user, and thereby generate a normalized query for the user query.
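One way to assemble such an input is to serialize the most recent turns of the session and place the query to be normalized last. The sketch below assumes a simple "speaker: text" line format and a hypothetical `max_context_turns` limit; the specification does not prescribe a serialization format, and the two earlier turns in the example are invented.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    speaker: str  # "user" or "agent"
    text: str


def build_normalizer_input(session: List[Turn], max_context_turns: int = 4) -> str:
    """Serialize recent turns; the final line is the user query to normalize."""
    if not session or session[-1].speaker != "user":
        raise ValueError("the last turn must be the user query to normalize")
    earlier = session[:-1]
    context = earlier[-max_context_turns:] if max_context_turns > 0 else []
    lines = [f"{turn.speaker}: {turn.text}" for turn in context]
    lines.append(f"user: {session[-1].text}")
    return "\n".join(lines)


session = [
    Turn("user", "Tell me about turtles."),                    # invented context
    Turn("agent", "Turtles are reptiles with a hard shell."),  # invented context
    Turn("user", "Is it a carnivore?"),
]
model_input = build_normalizer_input(session)
print(model_input)
```

Limiting the number of context turns is a design choice that keeps the input short; passing `max_context_turns=0` yields the user query alone.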

Here, the user query to be normalized may be the last sentence of the input. Queries may be divided into a plurality of query types, among which a first query type is an interactive query type and a second query type is a search-type query type.

The query normalization model (130) is trained on training data that includes examples of each of the plurality of query types, and can generate a normalized query according to the user's query type. That is, the query processing system (100) can generate a normalized query according to the user's query type by using the query normalization model (130) trained on training data that includes examples of each of the plurality of query types.

The interactive query type can be understood as a type in which the user query is normalized in consideration of the context of the conversation between the agent and the user.

The interactive query type may be divided into a plurality of utterance types. A first utterance type is one in which the user query is related to the content of the preceding conversation, and may also be called a “context-continuous utterance.” That is, the first utterance type may be one in which the user's last utterance, identified as the user query, is related to the context of the previous conversation. In this case, the query processing system (100) may generate a normalized query in consideration of the context of the previous conversation as well as the user query. For example, FIGS. 3(a) to 3(c) may show the context-continuous utterance type, which is the first utterance type among the interactive query types corresponding to the first query type.

For example, as shown in FIG. 3(a), while the user and the agent are conversing about the characteristics of a “turtle” (301), the user may utter “Is it a carnivore?” The user's query does not specify the subject of the question, that is, which animal is a carnivore. However, given the flow of the conversation between the user and the agent, the user would have made the utterance with the “turtle” (301) in mind, so the normalized query reflecting the user's intent may be “Is the turtle a carnivore?”

For example, at least one of the sentence components, namely the subject, predicate, object, complement, adnominal modifier, and adverbial, may be omitted from the user query. The query processing system (100) may supplement at least one of the omitted sentence components using the context of the previous conversation, thereby generating a normalized query from which a response matching the intent of the user query can be generated. As a result, as shown in FIG. 3(a), the normalized query for the last user query in the conversation between the user and the agent may include the word “turtle” (303) from the previous conversation and the word “carnivore” from the user query.
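The relationship described here, in which the normalized query contains words carried over from the context in addition to words from the user query, can be checked mechanically on labeled examples. The helper below is a hypothetical data-inspection utility, not part of the described system: it uses naive word tokenization, and the context sentences in the example are invented.

```python
import re
from typing import List, Set


def tokens(text: str) -> Set[str]:
    """Lowercased word tokens; regex word splitting is a deliberate simplification."""
    return set(re.findall(r"\w+", text.lower()))


def words_supplied_by_context(context: List[str], user_query: str, normalized: str) -> Set[str]:
    """Words of the normalized query that are absent from the user query but present in the context."""
    context_tokens = set().union(*(tokens(turn) for turn in context))
    return (tokens(normalized) - tokens(user_query)) & context_tokens


supplied = words_supplied_by_context(
    context=["Tell me about the turtle.", "The turtle is a reptile with a hard shell."],
    user_query="Is it a carnivore?",
    normalized="Is the turtle a carnivore?",
)
print(sorted(supplied))
```

For an example whose normalized query equals the user query, the function returns an empty set, which is what one would expect when nothing is taken from the context.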

Meanwhile, the content corresponding to the omitted sentence components may specify, for example, a request for an overview or meaning (e.g., What is it about? What does it mean?), the five Ws and one H (who, when, where, what, how, why), a request for a comparison (e.g., differences or similarities), a request for types (e.g., What kinds are there?), or a request for additional information (e.g., Tell me in more detail, Tell me more, Explain it more simply, Tell me something else).

As another example, as shown in FIG. 3(b), the user and the agent have conversed about “how to make electricity,” “methods of generating electricity,” and “wind power plants,” and the user query “Does it look like an electric fan?” is received as the user's last utterance. Considering the previous conversation context, the query processing system (100) can determine that the user intends to ask whether a “wind power plant” (311) looks like an “electric fan” (312), that is, to ask about the shape of a wind turbine. As a result, the query processing system (100) may combine the content (314) of the user query with the content (313) of the previous conversation to generate the normalized query “Does a wind turbine look like an electric fan?”

As a similar example, in FIG. 3(c), for the user query “Tell me the first bus time too,” which corresponds to the user's last utterance, the query processing system (100) may consider the context of the previous conversation and generate the normalized query “Tell me the first bus time for Gwangju bus No. 1187.”

Meanwhile, among the plurality of utterance types of the interactive query type corresponding to the first query type, a second utterance type is one in which the user query is unrelated to the content of the conversation between the user and the agent, and may also be called a “context-disconnected utterance.” That is, the second utterance type may be one in which the user's last utterance, identified as the user query, is unrelated to the context of the previous conversation. In this case, the query processing system (100) may consider only the user query and disregard the context of the previous conversation.

For example, as shown in FIG. 3(d), the conversation preceding the user query “How is the weather today?”, which corresponds to the user's last utterance, was about study methods for elementary school. In this case, the user query “How is the weather today?” is unrelated to the preceding context. The query processing system (100) may therefore output, as the normalized query, “How is the weather today?”, identical to the user query.

Likewise, as shown in FIG. 3(e), the conversation preceding the user query “What was our country's soccer result yesterday?”, which corresponds to the user's last utterance, was about space and shooting stars. In this case, the user query is unrelated to the preceding context. The query processing system (100) may therefore output, as the normalized query, “What was our country's soccer result yesterday?”, identical to the user query. Alternatively, the query may be normalized into a form better suited to the service, such as “Tell me our country's soccer result from yesterday.”

The search-type query type may correspond to a type in which the user query is converted into a search-style query for a search service. The search-type query type may be divided into a plurality of utterance types, where a first utterance type may be a single utterance for information retrieval and a second utterance type may be a multiple-utterance type.

As shown in FIGS. 4(a) to 4(c), the single utterance type may be a case in which a single user utterance is received in order to search for information. In this case, the user's utterance itself may be the user query.

For example, as shown in FIG. 4(a), for the user query “When do you eat cherries?”, the query processing system (100) may generate the normalized query “cherry season,” and for the user query “Tell me a rib restaurant near my current location,” it may generate the normalized query “nearby rib restaurant.” Likewise, for the user query “I want to watch a video, but it doesn't play properly,” the query processing system (100) may generate the normalized query “video playback error.” In this way, the query processing system (100) can generate noun-centered normalized queries, a form that lends itself to search.

The multiple-utterance type may be a type in which a normalized query for the user query is generated using the content of the previous context, similar to the context-continuous utterance, which is the first utterance type of the first query type described above. The query normalization model (130) may be trained to take additionally needed words (slots or intents) from user queries appearing in the previous context and recombine them with the current user query to generate a normalized query.

In this way, when the user query to be normalized is related to the user's previous query, the query processing system 100 may generate the normalized query using that previous query.

For example, as shown in (d) of FIG. 4, based on a plurality of user queries corresponding to multiple utterances, “pension lottery draw date” (401) and “Tell me the numbers for round 62” (402), the query processing system 100 may generate the normalized query “pension lottery round 62 winning numbers”.

As another example, as shown in (e) of FIG. 4, based on a plurality of user queries corresponding to multiple utterances, “BMW 520i” (411) and “Tell me the used car price” (412), the query processing system 100 may generate the normalized query “BMW 520i used car price”.

The query normalization model 130 is trained to generate normalized queries using training data for the various query types and utterance types discussed above.

For a user query corresponding to the context-continuous utterance type among the conversational query types, the query normalization model 130 may be trained to generate the normalized query using the preceding conversation context. For a user query corresponding to the context-disconnected utterance type among the conversational query types, the query normalization model 130 may be trained using training data configured so that the user query is output as the normalized query as-is, without considering the preceding context.

For a user query corresponding to the single utterance type among the search-type query types, the query normalization model 130 may be trained to generate a noun-centered normalized query. Furthermore, for a user query corresponding to the multiple utterance type among the search-type query types, the query normalization model 130 may be trained using training data configured so that the normalized query is generated in consideration of the previous user queries.

As described above, the query normalization model 130 may use the hyperscale pre-trained language model 140, and the P-tuning training method may be used to additionally train it on new data for normalization.

In one embodiment of the present invention, the pre-trained language model 140 is based on the Transformer architecture and trained on hundreds of billions of tokens, and may follow the probability distribution of Equation 1 below for T consecutive tokens x = (x1, x2, …, xT) obtained by tokenization (θ: the parameters of the pre-trained language model 140).

[Equation 1]
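The body of Equation 1 is not reproduced in this text. For a Transformer language model over the tokens x = (x1, …, xT) with parameters θ, the standard autoregressive factorization would read as follows. This is an assumed reconstruction from the surrounding definitions, not the published formula.

```latex
p_{\theta}(x) \;=\; \prod_{t=1}^{T} p_{\theta}\!\left(x_t \mid x_1, \dots, x_{t-1}\right)
```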

For the n prompt tokens p = (p1, …, pn) defined in the template, the prompt encoder 150 follows Equation 2 below.

[Equation 2]
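The body of Equation 2 is likewise not reproduced in this text, and its exact form cannot be recovered from the description. In generic notation, a prompt encoder maps the n prompt tokens to n vectors. The symbol Enc and the parameter name φ are assumptions used here for illustration only.

```latex
(h_1, \dots, h_n) \;=\; \mathrm{Enc}_{\phi}(p_1, \dots, p_n)
```

In the published P-tuning method, the encoder is a bidirectional LSTM followed by a ReLU-activated MLP. Whether Equation 2 of this document takes exactly that form is not stated here.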

The vectors h = (h1, …, hn) encoded by the prompt encoder 150 replace the prompt tokens p = (p1, …, pn) at their respective positions in the template and are fed to the pre-trained language model 140 as input. Here, ReLU may be used as the activation function.
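The position-wise substitution described above can be sketched as follows. This is a minimal illustration using plain Python lists. The function name `build_model_input`, the `[PROMPT i]` placeholder spelling, and the toy embedding table are assumptions, not part of the disclosed system.

```python
from typing import Dict, List

Vector = List[float]


def build_model_input(template: List[str],
                      embedding_table: Dict[str, Vector],
                      prompt_vectors: List[Vector]) -> List[Vector]:
    """Swap each [PROMPT i] placeholder for the next encoded vector h_i."""
    placeholders = [tok for tok in template if tok.startswith("[PROMPT")]
    if len(placeholders) != len(prompt_vectors):
        raise ValueError("need exactly one encoded vector per prompt token")
    remaining = iter(prompt_vectors)
    model_input: List[Vector] = []
    for tok in template:
        if tok.startswith("[PROMPT"):
            model_input.append(next(remaining))       # h_i replaces p_i in place
        else:
            model_input.append(embedding_table[tok])  # ordinary token embedding
    return model_input


# Hypothetical toy values: two prompt tokens around one ordinary token.
template = ["[PROMPT 1]", "cherry", "[PROMPT 2]"]
table = {"cherry": [1.0, 0.0]}
h = [[0.1, 0.2], [0.3, 0.4]]
model_input = build_model_input(template, table, h)
```

Ordinary tokens keep their embeddings from the language model, while only the prompt positions carry the encoder's output.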

In the present invention, to train the query normalization model 130, the input is defined by mapping the input value x and the vectors h extracted from the prompt encoder 150 onto the defined P-tuning template, and the objective function of Equation 3 below is followed for the K response tokens y = (y1, …, yK) paired with that input.

[Equation 3]
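The body of Equation 3 is not reproduced in this text. A maximum-likelihood objective consistent with the surrounding description would be the following, where h_φ denotes the prompt encoder output and φ its parameters. The notation is an assumption, and the published formula may differ in form.

```latex
\hat{\phi} \;=\; \arg\max_{\phi} \sum_{k=1}^{K} \log p_{\theta}\!\left(y_k \mid x,\, h_{\phi},\, y_1, \dots, y_{k-1}\right)
```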

Here, the parameters θ of the pre-trained language model 140 are frozen, and only the parameters φ of the prompt encoder 150 are trained.

Meanwhile, a plurality of prompt tokens (60; for example, [PROMPT 1], [PROMPT 2], and so on) may be placed in the prompt input to the query normalization model 130. For example, as shown in FIG. 2, the prompt may consist of the prompt tokens 60, the conversation examples (510, 520), and the correct answer (the normalized query, 530). Here, the query normalization model 130 is trained to identify the last user utterance 520 of the conversation example as the user query and to generate the correct answer 530 as the normalized query for that user query.

The query normalization model 130 may be trained to receive a conversation sequence containing at least two sentences and to generate a normalized sentence for the last sentence of the conversation sequence in consideration of the context of the sentences preceding it.

The query normalization model 130 may be trained using a training data set composed of pairs, each consisting of the conversation sequence and the normalized sentence for the last sentence of that conversation sequence.

Meanwhile, the training data set includes at least one of a first type of training data and a second type of training data. The normalized sentence included in the first type of training data may be constructed by filling in components omitted from the last sentence of the conversation sequence using at least some of the previous sentences. Here, the omitted component is at least one of a subject, an object, and a predicate indicating the query intent.

Furthermore, the normalized sentence included in the second type of training data may be constructed by changing the last sentence into a form corresponding to a specific service.

The first type of training data may be configured so that the last sentence and the previous sentences are contextually related to each other, and the second type of training data may be configured so that the last sentence and the previous sentences are not contextually related to each other.

Meanwhile, the query normalization model 130 may be trained for each query type based on training data corresponding to each of the different query types.

For example, the training data may include at least one of training data that is a conversation set (or conversation example) for the conversational query type, which is the first query type, and training data that is a conversation set (or conversation example) for the search-type query type, which is the second query type.

Furthermore, the training data for the conversational query type, which is the first query type, may include at least one of training data that is a conversation set (or conversation example) corresponding to the first utterance type (context-continuous utterance type) and training data that is a conversation set (or conversation example) corresponding to the second utterance type (context-disconnected utterance type).

Furthermore, the training data for the search-type query type, which is the second query type, may include at least one of training data that is a conversation set (or conversation example) corresponding to the first utterance type (single utterance type) and training data that is a conversation set (or conversation example) corresponding to the second utterance type (multiple utterance type).
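The two query types and their utterance types form a small taxonomy, which can be sketched as a data structure. This is an illustrative sketch only. The class and field names (`QueryType`, `UtteranceType`, `TrainingExample`, `uses_context`) are assumptions, not names from the disclosure.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class QueryType(Enum):
    CONVERSATIONAL = 1  # first query type
    SEARCH = 2          # second query type


class UtteranceType(Enum):
    CONTEXT_CONTINUOUS = "context-continuous"
    CONTEXT_DISCONNECTED = "context-disconnected"
    SINGLE = "single"
    MULTIPLE = "multiple"


# Which utterance types belong to which query type, per the description.
ALLOWED = {
    QueryType.CONVERSATIONAL: {UtteranceType.CONTEXT_CONTINUOUS,
                               UtteranceType.CONTEXT_DISCONNECTED},
    QueryType.SEARCH: {UtteranceType.SINGLE, UtteranceType.MULTIPLE},
}


@dataclass(frozen=True)
class TrainingExample:
    query_type: QueryType
    utterance_type: UtteranceType
    previous_turns: Tuple[str, ...]
    user_query: str
    normalized_query: str

    def __post_init__(self) -> None:
        if self.utterance_type not in ALLOWED[self.query_type]:
            raise ValueError("utterance type does not belong to this query type")

    @property
    def uses_context(self) -> bool:
        """True when the answer is built from earlier turns as well."""
        return self.utterance_type in (UtteranceType.CONTEXT_CONTINUOUS,
                                       UtteranceType.MULTIPLE)


# The multiple-utterance example of FIG. 4 (e).
example = TrainingExample(QueryType.SEARCH, UtteranceType.MULTIPLE,
                          ("BMW 520i",), "Tell me the used car price",
                          "BMW 520i used car price")
```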

Here, the conversation set may correspond to the conversation examples (510, 520) discussed above with reference to FIG. 2, and the query set may correspond to the user's last utterance (the user query, 520) and the correct answer (530) discussed above.

FIG. 5 is an example for explaining the structure of the training data template. The training data may be configured to include, for each conversation example, a training-target conversation set (602, 605, …) and a training-target query set (603, 606, …). In addition, the training data may be configured to include a plurality of prompt tokens (601 and so on).

The training-target conversation set (602, 605, …) of the training data may be configured to include a plurality of sentences corresponding respectively to at least one user query and the response to that user query. The training-target query set (603, 606, …) may be configured to include a plurality of sentences corresponding respectively to the training-target query that is the subject of normalization training (e.g., “What is the most exciting album?”, “Who was the US President at that time?”) and the training-target normalized query obtained by normalizing that training-target query (e.g., “What is the most exciting album among Love poem, Palette, and CHAT-SHIRE?”, “Who was the US President in 2018?”).

Meanwhile, as illustrated, the training data is configured to include a plurality of prompt tokens, and a portion 601 of the plurality of prompt tokens may be placed above at least one training-target conversation set 602. Another portion 604 of the plurality of prompt tokens may be placed in front of the normalized query.
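The template layout described above can be sketched as a small rendering function. This is a minimal sketch: the number of prompt tokens, the `User:`/`Agent:` speaker labels, the one-turn-per-line layout, and the example dialogue turns are all assumptions made for illustration, since the text does not reproduce FIG. 5 itself.

```python
from typing import List, Tuple


def render_example(dialogue: List[Tuple[str, str]],
                   user_query: str,
                   normalized_query: str,
                   n_head: int = 3,
                   n_tail: int = 2) -> str:
    """Lay out one training example: prompt tokens, dialogue, query, answer."""
    # First portion of the prompt tokens, placed above the conversation set.
    head = " ".join(f"[PROMPT {i}]" for i in range(1, n_head + 1))
    # Second portion, placed directly in front of the normalized query.
    tail = " ".join(f"[PROMPT {i}]"
                    for i in range(n_head + 1, n_head + n_tail + 1))
    lines = [head]
    lines += [f"{speaker}: {text}" for speaker, text in dialogue]
    lines.append(f"User: {user_query}")
    lines.append(f"{tail} {normalized_query}")
    return "\n".join(lines)


# Hypothetical preceding turns; only the last two strings come from the text.
rendered = render_example(
    dialogue=[("User", "What year did that happen?"),
              ("Agent", "It happened in 2018.")],
    user_query="Who was the US President at that time?",
    normalized_query="Who was the US President in 2018?",
)
print(rendered)
```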

Meanwhile, the example of FIG. 5 is training data corresponding to the context-continuous utterance type. In this case, the training data may be configured so that the sentences constituting the training-target conversation set 602 and those constituting the training-target query set 603 are contextually related to each other. The normalized query (training-target normalized query) included in the training-target query set 603 (e.g., “What is the most exciting album among Love poem, Palette, and CHAT-SHIRE?”, “Who was the US President in 2018?”) may be formed to include at least some of the words constituting the plurality of sentences in the corresponding training-target conversation set.

Furthermore, the training-target normalized query corresponding to the first utterance type may be formed to further include at least some of the words constituting the sentence of the training-target query (user query). Through this, for the first utterance type (context-continuous utterance type) of the conversational query type, which is the first query type, the query normalization model 130 may learn to generate the normalized query by considering the context of the conversation preceding the user query and combining the content of the preceding conversation with the content of the user query.

Meanwhile, although not illustrated, training data corresponding to the second utterance type (context-disconnected utterance type) of the conversational query type, which is the first query type, may be configured so that the sentences constituting the training-target conversation set (or conversation example) and those constituting the training-target query set (or user query) are not contextually related to each other. In this case, the training-target normalized query corresponding to the second utterance type may be configured to be identical or equivalent to the sentence of the training-target query.

From training data corresponding to the second utterance type (context-disconnected utterance type) of the conversational query type, which is the first query type, the query normalization model 130 may learn that there is no contextual relation between the conversation preceding the user query and the user query itself. As a result of this training, when a conversation and user query corresponding to a context-disconnected utterance are received, the model may output the user query as-is, or a sentence equivalent to the user query, as the normalized query without considering the context of the conversation.

Next, although not illustrated, training data corresponding to the first utterance type (single utterance type) of the search-type query type, which is the second query type, may be configured to include a user query and a normalized query composed of words contained in the user query (or words of similar meaning). Here, the normalized query may be configured to consist mainly of nouns, and a plurality of prompt tokens may be placed between the user query and the normalized query.

Furthermore, training data corresponding to the second utterance type (multiple utterance type) of the search-type query type, which is the second query type, may be configured as shown in FIG. 6. As illustrated, the training data may be configured to include a plurality of user queries corresponding to multiple utterances (“third king of Joseon” -> “And the next one?”, 701) and the normalized query for them, and a plurality of prompt tokens 702 may be placed at the normalized query (“fourth king of Joseon”).

Training data corresponding to the second utterance type (multiple utterance type) of the search-type query type, which is the second query type, may be configured to include a normalized query that reflects the content of a plurality of user queries. Through this, when a user query of the multiple utterance type is received, the query normalization model 130 may generate the normalized query with reference to the previous user queries.

As described above and as shown in FIGS. 7 and 8, the query normalization model 130 according to the present invention may be trained using training data that includes the correct answer for each of the various user query situations arising from each query type and utterance type.

As shown in (a) of FIG. 7, training data (e.g., the first type of training data) corresponding to the first utterance type (context-continuous utterance type) of the conversational query type, which is the first query type, may include, as the correct answer, a normalized query generated in consideration of the context of the conversation.

As shown in (a) of FIG. 7, training data corresponding to the context-continuous utterance type may be formed so that the conversation between the user and the agent and the user's query (“What is the most exciting album?”) are contextually related to each other. Here, the user query may be the last sentence of the conversation, and the correct answer corresponding to the normalized query may be constructed in consideration of the context of the conversation. Even if at least some sentence components are omitted from the last sentence corresponding to the user query, the correct answer may be constructed so that the omitted components are filled in from the conversation.

The correct answer may be constructed so that the sentence components omitted from the user query are filled in using at least part of the content of the conversation. As a result, the normalized query corresponding to the correct answer may be “What is the most exciting album among Love poem, Palette, and CHAT-SHIRE?”.

Meanwhile, as shown in (b) of FIG. 7, training data (e.g., the second type of training data) corresponding to the second utterance type (context-disconnected utterance type) of the conversational query type, which is the first query type, may include, as the correct answer, a query normalized from the user's last utterance (user query) alone, without considering the context of the conversation.

Training data corresponding to the context-disconnected utterance type may consist of training data in various categories.

One example of the context-disconnected utterance type is the case where the user query requests a response in a domain different from that of the preceding conversation (category: other-domain request). For example, as shown in (b) of FIG. 7, the user query and the conversation that took place between the user and the agent before the user query may be configured to have different domains (or topics). As illustrated, when the domain of the last sentence of the conversation, which is the user query “Where did people hold weddings in the old days?”, is “weddings”, and the domain of the conversation preceding the user query is “traditional customs”, the two differ and the utterance is a context-disconnected utterance.

Another example is the case where the user query is an inappropriate utterance (e.g., an unwholesome utterance unrelated to the content of the preceding conversation, such as “Shut up” or “Are you kidding me?”) (category: inappropriate utterance).

Another example is the case of a chat-like utterance made to react to the agent's response or simply to make conversation (e.g., “That's fascinating”, “I see”, “Thanks for letting me know”, “I know that too”, “You're smart”, “Thank you”) (category: chat). Yet another example is the case of an utterance made to end the conversation with the agent (e.g., “Stop”, “Stop telling me now”, “Okay, got it”) (category: stop request).

For these context-disconnected utterance types, the normalized query corresponding to the correct answer may be the user's query as-is, or a sentence whose content is identical or similar to the user query but whose form is better suited to the service.
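The labeling rule for context-disconnected utterances can be sketched as follows. This is only an illustration of the rule stated above. The enum `DisconnectCategory`, the function `label_disconnected`, and the `service_form` parameter are hypothetical names, and the preceding turn in the usage example is invented.

```python
from enum import Enum
from typing import Dict, List, Optional


class DisconnectCategory(Enum):
    OTHER_DOMAIN_REQUEST = "other-domain request"
    INAPPROPRIATE = "inappropriate utterance"
    CHAT = "chat"
    STOP_REQUEST = "stop request"


def label_disconnected(previous_turns: List[str],
                       user_query: str,
                       category: DisconnectCategory,
                       service_form: Optional[str] = None) -> Dict[str, object]:
    """Build one context-disconnected training record.

    The answer ignores previous_turns: it is either the user query as-is or,
    when provided, a service-friendly rewording of that same query.
    """
    normalized = service_form if service_form is not None else user_query
    return {
        "previous_turns": list(previous_turns),
        "user_query": user_query,
        "category": category.value,
        "normalized_query": normalized,
    }


record = label_disconnected(
    previous_turns=["Tell me about traditional customs."],  # hypothetical turn
    user_query="Stop telling me now",
    category=DisconnectCategory.STOP_REQUEST,
)
```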

As shown in (a) of FIG. 8, training data (e.g., the second type of training data) corresponding to the first utterance type (single utterance type) of the search-type query type, which is the second query type, may include, as the correct answer, a noun-centered normalized query for the user's query. As shown in (b) of FIG. 8, training data (e.g., the first type of training data) corresponding to the second utterance type (multiple utterance type) of the search-type query type, which is the second query type, may include, as the correct answer, a normalized query generated in consideration of the context of the conversation (the user's multiple questions).

By using the query normalization model 130 for these various query and utterance types, the query processing system 100 according to the present invention can normalize the user's query to match the user's intent in a conversation between the user and the agent.

Hereinafter, a method of processing a user query in the query processing system that normalizes user queries using the query normalization model 130, as discussed above, is described in more detail with reference to the accompanying drawings. FIGS. 9 to 14 are conceptual diagrams for explaining usage examples of the query processing method and system according to the present invention.

Referring to FIG. 9, in the query processing method according to the present invention, a process of forming a conversation session between the agent and the user is performed (S1010). Next, the query reception unit 110 of the query processing system 100 receives a user query corresponding to at least part of the conversation in the conversation session (S1020). As discussed above, the user's last utterance in the conversation of the conversation session may be identified as the user query. For example, as shown in FIG. 10, of the conversation (1111, 1112, 1113, 1114, 1115) between the user (USER) and the agent 20, the user's last utterance (1115, “What about Iron Man?”) may be identified as the user query.

Next, in the present invention, a process of generating a normalized query corresponding to the user query may be performed using the query normalization model 130 trained on the training data (S1030). The query processing system 100 may identify the user's last utterance in the conversation of the conversation session as the user query, and generate the normalized query by normalizing that last utterance into a single sentence.

When the last utterance is related to the context of the conversation preceding it in the conversation session, the query processing system 100 may normalize the user query so that at least some of the words constituting the normalized query relate to the content of the conversation preceding the last utterance.

More specifically, when the last utterance is related to the context of the conversation preceding it in the conversation session, the query processing system 100 may generate the normalized query by combining some of the words constituting the conversation preceding the last utterance with some of the words constituting the last utterance.

For example, the conversation shown in FIG. 10 is a situation in which the user asks about movie ratings. In this situation, the user's last utterance (1115), “What about Iron Man?”, which is the user query, contains a movie title. The query normalization model 130 may then normalize this last utterance in consideration of the context of the preceding conversation. In this case, the query processing system 100 may use the query normalization model 130 to generate “What is the movie rating for Iron Man?” as the normalized query.

In contrast, when the last utterance corresponding to the user's query is not related to the context of the conversation preceding it in the conversation session, the query processing system 100 may generate a normalized query whose content relates only to the last utterance. In this case, the normalized query may be identical to the user's last utterance.

The query processing system 100 may generate a response corresponding to the normalized query using the service server 30 (see FIG. 1).

For example, when the service server 30 linked with the query processing system 100 is a search engine, the query processing system 100 may transmit the normalized query to the search engine. When a search result is obtained from the search engine, the search result may be transmitted to the agent. The agent may provide the search result to the user as the response to the normalized query.

For example, when the agent is a voice-based conversational agent as illustrated, the response may be output as an utterance of the agent (1116). When the agent is a chatbot-type agent, the response may be provided to the user's terminal in the form of a chat message.

In this way, the query processing system may obtain the response (e.g., a search result) to the normalized query corresponding to the user query from the service server 30 and handle it as the response to the user query. Here, the response to the normalized query may be output as an utterance of the agent.
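The flow from the last utterance to the agent's response can be sketched as follows. This is a minimal sketch under stated assumptions: `stub_normalize` stands in for the trained query normalization model 130 and `stub_search` for the service server 30, both of which are hypothetical placeholders, and the earlier conversation turns are invented for illustration.

```python
from typing import Callable, List


def answer_last_utterance(conversation: List[str],
                          normalize: Callable[[List[str]], str],
                          search: Callable[[str], str]) -> str:
    """Normalize the last user utterance, then fetch a response for it."""
    if not conversation:
        raise ValueError("conversation must contain at least one utterance")
    # The model sees the whole conversation; the last item is the user query.
    normalized_query = normalize(conversation)
    # The service server answers the normalized query, not the raw utterance.
    return search(normalized_query)


def stub_normalize(conversation: List[str]) -> str:
    """Placeholder for the trained model (hypothetical, hard-coded)."""
    last = conversation[-1]
    if last == "What about Iron Man?":
        return "What is the movie rating for Iron Man?"
    return last  # context-disconnected case: keep the utterance as-is


def stub_search(query: str) -> str:
    """Placeholder for the service server (hypothetical)."""
    return f"<search results for: {query}>"


conversation = ["What is the movie rating for Thor?",  # hypothetical turn
                "<agent reply>",                        # hypothetical turn
                "What about Iron Man?"]
reply = answer_last_utterance(conversation, stub_normalize, stub_search)
```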

Meanwhile, the query processing system 100 may perform additional processing to provide a better response to the user.

As one example, the query processing system 100 may obtain from the service server 30 a response to the normalized query and a response to the user query (i.e., the user query that was the subject of normalization), and perform scoring. Here, the score may be understood as indicating which of the two responses better reflects the query intent of the user query. Various scoring methods may be used, and no particular limitation is placed on them.

Based on the scoring result, the query processing system 100 may determine, as the final response, whichever of the response to the normalized query and the response to the user query has the higher score. The determined final response may then be handled as the agent's utterance. That is, the determined final response may be provided to the user.

As a different example, the query processing system 100 may generate a plurality of normalized queries for the user query. The query processing system 100 may then obtain a response from the service server 30 for each of the plurality of normalized queries and score the obtained responses. The response with the highest score may be selected as the response to the user query.

As yet another example, the query processing system 100 may obtain from the service server 30 not only the responses to each of the plurality of normalized queries but also a response to the user query, and score the obtained responses. Among the responses to the normalized queries and the response to the user query, the response with the highest score may be selected as the response to the user query.

Through post-processing such as the examples described above, the response that best reflects the query intent of the user query may be provided to the user.
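The post-processing variants above share one pattern: collect a response per candidate query, score each response against the user query, and keep the highest-scoring one. A minimal sketch of that pattern follows. The description places no limitation on the scoring method, so the token-overlap scorer here is purely an assumption for illustration, as are the names `select_final_response`, `toy_fetch`, and `toy_score` and the toy response table.

```python
from typing import Callable, List, Tuple


def select_final_response(
    user_query: str,
    normalized_queries: List[str],
    fetch_response: Callable[[str], str],
    score: Callable[[str, str], float],
    include_user_query: bool = True,
) -> Tuple[str, str, float]:
    """Return (query, response, score) for the highest-scoring candidate."""
    candidates = list(normalized_queries)
    # Optionally let the original user query compete with its normalizations.
    if include_user_query and user_query not in candidates:
        candidates.append(user_query)
    if not candidates:
        raise ValueError("at least one candidate query is required")

    scored = []
    for query in candidates:
        response = fetch_response(query)
        scored.append((query, response, score(user_query, response)))
    # max() keeps the earliest candidate on ties, so normalized queries win ties.
    return max(scored, key=lambda item: item[2])


# Toy stand-ins (assumptions): canned responses and a token-overlap scorer.
_TOY_RESPONSES = {"spelling checker": "spelling checker tool", "pelling checker": ""}


def toy_fetch(query: str) -> str:
    return _TOY_RESPONSES.get(query, "")


def toy_score(user_query: str, response: str) -> float:
    # Counts user-query tokens that also appear in the response.
    response_tokens = set(response.split())
    return float(sum(1 for token in user_query.split() if token in response_tokens))


best = select_final_response("pelling checker", ["spelling checker"], toy_fetch, toy_score)
print(best)
```

Setting `include_user_query=False` corresponds to the variant in which only the responses to the normalized queries are compared.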

Next, examples of various services provided by an agent operating in conjunction with the query processing system 100 according to the present invention are described.

FIGS. 11 to 14 show examples of a search service in which a user query is received through an agent providing the search service, the user query is normalized, and a response to the normalized query is provided. As shown in FIGS. 11 to 14, the search service may be provided on the display of the user terminal through pages 1210, 1310, 1410, and 1510 that provide the search service. The user query may be received in various ways: by voice recognition, or as text entered through the query input areas 1220, 1320, 1420, and 1520.

The agent may receive the user query and transmit it to the query processing system 100. The query processing system 100 may then normalize the user query using a language model trained on training data for the search-type query type.

As one example, as shown in (a) of FIG. 11, even when the user query contains a typo, as when “춤법 검사기” (“맞춤법 검사기”, meaning “spelling checker”, with its first syllable missing) is received as the user query, the query processing system 100 may determine that the user's search intent is “맞춤법 검사기” and output it as the normalized query. As a result, the search result 1250 for the user query differs from the search result 1260 for the normalized query, and the search result for the normalized query may better reflect the user's intent.

Meanwhile, the agent may receive the normalized query from the query processing system 100 and output the received normalized query 1240 to the query input area 1220, as shown in (b) of FIG. 11. The user can thereby recognize that the search was performed on the normalized query 1240.

Meanwhile, whether the normalized query is output to the query input area 1220 may depend on the situation. The agent may output the normalized query to the query input area 1220 in place of the user query only when the reliability of the normalized query satisfies a preset criterion. Here, the reliability of the normalized query indicates how well the normalized query matches the user's query intent, and no particular limitation is placed on the method of evaluating it.

As another example, when “jewel of October” is received as the user query, as shown in (a) of FIG. 12, the query processing system 100 may normalize it to “October birthstone”, as shown in (b) of FIG. 12. The query processing system 100 may identify the query intent behind the user's query, determine that the user's search intent is “October birthstone”, and output it as the normalized query. As a result, the search result 1350 for the user query differs from the search result 1360 for the normalized query, and the search result for the normalized query may better reflect the user's intent.

Meanwhile, the agent may receive the normalized query from the query processing system 100 and output the received normalized query 1340 to the query input area 1320, as shown in (b) of FIG. 13. The user can thereby recognize that the search was performed on the normalized query 1340.

As described above, whether the normalized query is output to the query input area 1320 may depend on the situation, and the agent may output the normalized query to the query input area 1220 in place of the user query only when the reliability of the normalized query satisfies a preset criterion.

For example, as shown in (a) of FIG. 13, suppose that “board you ride in the sea” is entered as the user query and that “surfboard” is generated as the normalized query for it. Suppose further that the reliability of the normalized query does not satisfy the preset criterion. In this case, the agent may refrain from outputting the normalized query directly to the query input area 1420. Instead, as shown in (a) of FIG. 13, the agent may present the normalized query (“surfboard”) in an area 1440 of the search service page 1410. The user can thereby recognize that a normalized query exists for the user query. When the user selects the normalized query, the normalized query 1460 may be output to the query input area 1420, as shown in (b) of FIG. 13. Furthermore, the search result 1470 for the normalized query may be provided to the user terminal. The normalized query may be displayed in the area 1440 in the form of an icon that the user can select.
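The reliability-dependent placement described above can be sketched as a small state decision. This is a hedged illustration only: the numeric threshold of 0.8, the `>=` comparison standing in for "satisfies a preset criterion", and the names `SearchPageState`, `place_normalized_query`, and `on_suggestion_selected` are all assumptions. The description also does not say which results are shown before the suggestion is selected; the sketch assumes results for the user query.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SearchPageState:
    input_area_text: str       # text shown in the query input area
    suggestion: Optional[str]  # normalized query shown as a selectable icon, if any
    results_for: str           # query whose search results are displayed


def place_normalized_query(
    user_query: str,
    normalized_query: str,
    reliability: float,
    threshold: float = 0.8,  # assumed value; the source only says "preset criterion"
) -> SearchPageState:
    """Decide where the normalized query appears on the search page."""
    if normalized_query == user_query:
        return SearchPageState(user_query, None, user_query)
    if reliability >= threshold:
        # Reliable enough: replace the user query in the input area.
        return SearchPageState(normalized_query, None, normalized_query)
    # Not reliable enough: keep the user query and offer the normalized one.
    return SearchPageState(user_query, normalized_query, user_query)


def on_suggestion_selected(state: SearchPageState) -> SearchPageState:
    """Apply the suggested normalized query when the user selects its icon."""
    if state.suggestion is None:
        return state
    return SearchPageState(state.suggestion, None, state.suggestion)


low = place_normalized_query("board you ride in the sea", "surfboard", reliability=0.4)
print(low)
print(on_suggestion_selected(low))
```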

As another example, as shown in FIG. 14, suppose that a conversation 1500 took place earlier between the agent and the user, and that the conversation 1500 included a question about the reign of a “king”. In this case, the agent may transmit the preceding conversation and the user query (1530, “King Sejong”) to the query processing system 100. Based on the query normalization model 130 trained according to the training method described above, the query processing system 100 may then generate a normalized query 1520 that is relevant to the context of the preceding conversation. As a result, the normalized query may be generated as “King Sejong's reign period”.
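One way to picture the context-aware case is as a request that carries the preceding turns together with the last utterance. The sketch below is an assumption-laden illustration, not the patent's model interface: the `[SEP]` separator, the `NormalizationRequest` structure, the fallback to the last utterance on empty output, and the rule-based `toy_model` standing in for the trained query normalization model 130 are all hypothetical, as are the example turns.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class NormalizationRequest:
    previous_turns: List[str]  # conversation before the last utterance
    last_utterance: str        # the user query to be normalized

    def to_model_input(self, separator: str = " [SEP] ") -> str:
        # Serialize the conversation sequence with the last utterance at the end.
        return separator.join([*self.previous_turns, self.last_utterance])


def normalize_with_context(
    request: NormalizationRequest,
    model: Callable[[str], str],
) -> str:
    """Run the model; fall back to the last utterance if it returns nothing."""
    output = model(request.to_model_input()).strip()
    return output or request.last_utterance


def toy_model(model_input: str) -> str:
    # Hypothetical stand-in for a trained model: fills in the omitted element
    # only when the preceding context is about a reign.
    if "reign" in model_input and model_input.endswith("King Sejong"):
        return "King Sejong's reign period"
    return ""


related = NormalizationRequest(["How long did the king reign?"], "King Sejong")
unrelated = NormalizationRequest(["What is the weather today?"], "King Sejong")
print(normalize_with_context(related, toy_model))
print(normalize_with_context(unrelated, toy_model))
```

The second call mirrors the case in which the last utterance is unrelated to the earlier context, so the normalized query stays identical to the last utterance.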

As shown in (a) and (b) of FIG. 14, the search result 1550 for the user query 1530 may differ at least in part from the search result 1560 for the normalized query 1540. The search results may further include a search result 1570 that reflects the user's query intent.

As described above, the query processing method and system using a language model according to the present invention can identify the user's query intent and generate a normalized query that reflects it. The method and system can then use the normalized query to search for the answer (or response) corresponding to the user's query. As a result, the user can obtain satisfying search results that match what was intended simply by conversing with the agent.

Meanwhile, the present invention described above may be implemented as a program that is executed by one or more processes on a computer and can be stored on a computer-readable medium (or recording medium).

Furthermore, the present invention described above may be implemented as computer-readable code or instructions on a medium on which a program is recorded. That is, the present invention may be provided in the form of a program.

Meanwhile, computer-readable media include all types of recording devices that store data readable by a computer system. Examples of computer-readable media include an HDD (Hard Disk Drive), SSD (Solid State Disk), SDD (Silicon Disk Drive), ROM, RAM, CD-ROM, magnetic tape, floppy disk, and optical data storage device.

Furthermore, the computer-readable medium may be a server or cloud storage that includes storage and that an electronic device can access through communication. In this case, the computer may download the program according to the present invention from the server or cloud storage through wired or wireless communication.

Furthermore, in the present invention, the computer described above is an electronic device equipped with a processor, that is, a central processing unit (CPU), and no particular limitation is placed on its type.

Meanwhile, the above detailed description should not be construed as restrictive in any respect and should be considered illustrative. The scope of the present invention should be determined by reasonable interpretation of the appended claims, and all changes within the equivalent scope of the present invention are included in the scope of the present invention.

Claims

A query processing method comprising:
receiving a user query from a user;
generating a normalized query corresponding to the user query using a query normalization model trained to generate a normalized sentence or word from at least one input sentence;
generating a response corresponding to the normalized query; and
providing the response to the user as a response to the user query.

The method of claim 1,
wherein the query normalization model is trained to receive a conversation sequence containing at least two sentences and to generate a normalized sentence for the last sentence by taking into account the context of the sentences preceding the last sentence in the conversation sequence.

The method of claim 2,
wherein the query normalization model is trained using a training data set including pairs of the conversation sequence and a normalized sentence for the last sentence of the conversation sequence.

The method of claim 3,
wherein the training data set includes at least one of a first type of training data and a second type of training data,
wherein the normalized sentence included in the first type of training data is constructed by filling in an element omitted from the last sentence of the conversation sequence using at least some of the preceding sentences, and
wherein the normalized sentence included in the second type of training data is the last sentence changed into a form corresponding to a specific service.

The method of claim 4,
wherein the omitted element is at least one of a subject, an object, and a predicate indicating the intent of the query.

The method of claim 4,
wherein the first type of training data is configured such that the last sentence and the preceding sentences are contextually related, and
wherein the second type of training data is configured such that the last sentence and the preceding sentences are not contextually related.

A method of providing a search service using a language model, the method comprising:
receiving a user voice from a user terminal;
obtaining a user query corresponding to the user voice based on speech-to-text (STT) conversion;
obtaining a normalized query for the user query using a query normalization model trained to generate a normalized sentence or word from at least one input sentence; and
providing, to the user terminal, a search page including search results related to the normalized query.

The method of claim 7,
further comprising providing the normalized query on the search page,
wherein the normalized query is provided in a different area of the search page depending on the reliability of the normalized query.

The method of claim 8,
wherein the search page includes a query input area, and
wherein, when the reliability of the normalized query satisfies a preset criterion, the normalized query is provided in the query input area and search results related to the normalized query are provided on the search page.

The method of claim 9,
wherein, when the reliability of the normalized query does not satisfy the preset criterion, the normalized query is provided as an icon in an area different from the query input area and the user query is provided in the query input area, and
wherein search results related to the normalized query are provided on the search page when the icon is selected.

A query processing system comprising:
a query receiving unit that receives a user query from a user; and
a query normalization unit that generates a normalized query corresponding to the user query using a query normalization model trained to generate a normalized sentence or word from at least one input sentence.

A program stored on a computer-readable recording medium and executed by one or more processes in an electronic device,
the program comprising instructions for:
receiving a user query from a user; and
generating a normalized query corresponding to the user query using a query normalization model trained to generate a normalized sentence or word from at least one input sentence.