KR20210019924A

KR20210019924A - System and method for modifying voice recognition result

Info

Publication number: KR20210019924A
Application number: KR1020190162921A
Authority: KR
Inventors: Chanwoo Kim (김찬우); Dhananjaya N. Gowda (다난자야 엔. 고다); Abhinav Garg (아비나브 가르그); Kyungmin Lee (이경민)
Original assignee: Samsung Electronics Co., Ltd. (삼성전자주식회사)
Priority date: 2019-08-13
Filing date: 2019-12-09
Publication date: 2021-02-23
Also published as: KR20210019920A

Abstract

Provided are a system and method for correcting a speech recognition result. A method by which a server corrects a speech recognition result provided from a device includes: receiving, from the device, text output by an automatic speech recognition (ASR) model of the device; identifying at least one domain related to the received text; selecting, from among a plurality of text correction models included in the server, at least one text correction model corresponding to the identified domain; and correcting the received text using the selected text correction model.

Description

System and method for modifying speech recognition results {SYSTEM AND METHOD FOR MODIFYING VOICE RECOGNITION RESULT}

The present disclosure relates to a system and method for modifying a speech recognition result, and more particularly, to a system and method in which a device and a server interwork to modify a speech recognition result.

Automatic speech recognition (ASR) is a technology that receives a human voice, recognizes it, and converts it into text. Speech recognition is used in various electronic devices such as smartphones, air conditioners, refrigerators, and AI speakers. A device first receives a human voice as input. It then uses a speech recognition model already trained within the device to recognize the input voice and convert it into text, and the converted text is the final output. Recently, deep neural network (DNN) algorithms have been applied across many fields of machine learning and have improved performance. In speech recognition, too, neural networks have greatly improved performance, and ASR models for speech recognition have recently been actively studied. An artificial intelligence system improves its recognition rate and understands the user's preferences more accurately the more it is used. For this reason, existing rule-based smart systems are gradually being replaced by deep learning-based artificial intelligence systems.

An embodiment of the present disclosure may provide a system and method in which a device provides the output of its ASR (Automatic Speech Recognition) model to a server, and the output of the ASR model is modified using an artificial intelligence model of the server.

In addition, an embodiment of the present disclosure may provide a system and method capable of correcting a speech recognition result by using a text correction model corresponding to a domain related to the output of the ASR model of a device.

In addition, an embodiment of the present disclosure may provide a system and method that enable a server to effectively apply text received from a device to text correction models related to a plurality of domains.

In addition, an embodiment of the present disclosure may provide a system and method that enable a server to effectively identify a domain related to text by using a plurality of domain identification modules related to a plurality of domains.

As a technical means for achieving the above-described technical objects, one aspect of the present disclosure may provide a method, performed by a server, of modifying a speech recognition result provided from a device. The method includes: receiving, from the device, text output from an Automatic Speech Recognition (ASR) model of the device; identifying at least one domain related to the received text; selecting, from among a plurality of text correction models included in the server, at least one text correction model corresponding to the identified at least one domain; modifying the received text using the selected at least one text correction model; and providing the modified text to the device. The text correction model is an artificial intelligence model that receives the received text for speech recognition and outputs the modified text.

In addition, another aspect of the present disclosure may provide a server for modifying a speech recognition result provided from a device. The server includes: a communication interface; a storage unit storing a program including one or more instructions; and a processor executing the one or more instructions of the program stored in the storage unit. The processor receives, from the device, text output from an ASR (Automatic Speech Recognition) model of the device, and identifies at least one domain related to the received text. The processor then selects, from among a plurality of text correction models included in the server, at least one text correction model corresponding to the identified at least one domain. It modifies the received text using the selected at least one text correction model and provides the modified text to the device. The text correction model is an artificial intelligence model that receives the received text for speech recognition and outputs the modified text.

FIG. 1 is a schematic diagram of a speech recognition system according to an embodiment of the present disclosure.
FIG. 2 is a schematic diagram of a speech recognition system including text correction models related to a plurality of domains according to an embodiment of the present disclosure.
FIG. 3 is a flowchart of a method by which a device and a server in a speech recognition system recognize a voice input and obtain modified text, according to an embodiment of the present disclosure.
FIG. 4 is a diagram illustrating an example in which a server identifies a domain related to text and selects a text correction model of the domain related to the text, according to an embodiment of the present disclosure.
FIG. 5 is a diagram illustrating an example in which a device identifies a domain related to text and a server selects a text correction model of the domain related to the text, according to an embodiment of the present disclosure.
FIG. 6 is a diagram illustrating an example in which a server and a device each identify a domain related to text, and the server selects a text correction model of the domain related to the text, according to an embodiment of the present disclosure.
FIG. 7 is a flowchart of a method by which a server selects a domain related to text by using a domain reliability obtained by a device and a domain reliability obtained by the server, according to an embodiment of the present disclosure.
FIG. 8 is a diagram illustrating an example in which a server selects a text correction model by using a domain identification module selected from among a plurality of domain identification modules in the server, according to an embodiment of the present disclosure.
FIG. 9 is a flowchart of a method by which a server selects a domain for text correction by using a domain identification module selected from among a plurality of domain identification modules, according to an embodiment of the present disclosure.
FIG. 10 is a diagram illustrating an example of a first domain identification module, a second domain identification module, and text correction models related to hierarchically classified domains, according to an embodiment of the present disclosure.
FIG. 11 is a diagram illustrating an example in which a server corrects text by using a plurality of text correction models, according to an embodiment of the present disclosure.
FIG. 12 is a flowchart of a method by which a server cumulatively calculates domain reliability for text of a plurality of sections, according to an embodiment of the present disclosure.
FIG. 13 is a diagram illustrating an example in which a server obtains domain reliability for a text stream that accumulates word by word (in eojeol units), according to an embodiment of the present disclosure.
FIG. 14 is a flowchart of a method by which a server divides text into a plurality of sections and selects a domain for the text of each section, according to an embodiment of the present disclosure.
FIG. 15 is a diagram illustrating an example in which a server compares the domain reliability of text across a plurality of domains, and selects a text correction model for the text of each section to correct that text.
FIG. 16 is a diagram illustrating an example in which a server modifies text received from a device by using texts modified by a plurality of text correction models, according to an embodiment of the present disclosure.
FIG. 17 is a block diagram of a server according to an embodiment of the present disclosure.
FIG. 18 is a block diagram of a device according to an embodiment of the present disclosure.

Hereinafter, embodiments of the present disclosure will be described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present disclosure pertains can easily implement them. However, the present disclosure may be implemented in various different forms and is not limited to the embodiments described herein. In the drawings, parts irrelevant to the description are omitted to describe the present disclosure clearly, and similar reference numerals denote similar parts throughout the specification.

Throughout the specification, when a part is said to be "connected" to another part, this includes not only being "directly connected" but also being "electrically connected" with another element interposed between them. In addition, when a part "includes" a certain component, this means that the part may further include other components, rather than excluding them, unless specifically stated otherwise.

Hereinafter, the present disclosure will be described in detail with reference to the accompanying drawings.

FIG. 1 is a schematic diagram of a speech recognition system according to an embodiment of the present disclosure.

Referring to FIG. 1, a speech recognition system according to an embodiment of the present disclosure includes a device 1000 and a server 2000.

The device 1000 may include an Automatic Speech Recognition (ASR) model, and the server 2000 may include a text correction model. The device 1000 may use the ASR model to recognize a user's voice input and output text, and the server 2000 may modify the text generated by the device 1000.

The ASR model is a speech recognition model that recognizes speech by using an integrated neural network, and can output text from a user's speech input. The ASR model may be, for example, an artificial intelligence model including an acoustic model, a pronunciation dictionary, and a language model. Alternatively, the ASR model may be, for example, an end-to-end speech recognition model whose structure includes an integrated neural network instead of a separate acoustic model, pronunciation dictionary, and language model. Because it uses an integrated neural network, an end-to-end ASR model can convert speech into text directly. It does not first recognize phonemes from the speech and then convert those phonemes into text.

The text may include at least one character. A character is a symbol used to write human language in a visible form. For example, characters may include Korean characters (Hangul), letters of the alphabet, Chinese characters, numbers, phonetic symbols, punctuation marks, and other symbols. Also, for example, the text may include a character string, that is, a sequence of characters. For example, the text may include at least one grapheme. A grapheme is the smallest unit that represents a sound and consists of at least one character. For example, in an alphabetic writing system, one character may be a grapheme, and a character string may be a sequence of graphemes. For example, the text may include morphemes or words. A morpheme is the smallest meaningful unit and consists of at least one grapheme. A word is a basic unit of language that consists of at least one morpheme and either can be used on its own or indicates a grammatical function.
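The unit levels above can be made concrete with Python's standard library. The sketch below rests on two illustrative assumptions. First, Unicode NFD decomposition stands in for grapheme segmentation: it splits each precomposed Hangul syllable into conjoining jamo and leaves plain Latin letters unchanged. Second, whitespace splitting stands in for word segmentation. Morpheme segmentation would need a morphological analyzer and is not shown. The function name `text_units` is hypothetical.

```python
import unicodedata


def text_units(text: str) -> dict:
    """Split text into the unit levels described above (illustrative only)."""
    # A string is a sequence of characters (spaces included).
    characters = list(text)
    # NFD decomposes each precomposed Hangul syllable into conjoining jamo,
    # used here as the graphemes; Latin letters without diacritics are unchanged.
    graphemes = [c for c in unicodedata.normalize("NFD", text) if not c.isspace()]
    # Whitespace-delimited words (eojeol, in the case of Korean).
    words = text.split()
    return {"characters": characters, "graphemes": graphemes, "words": words}


units = text_units("음성 인식")
print(len(units["characters"]), len(units["graphemes"]), len(units["words"]))  # 5 12 2
```

In the alphabetic case, for example `text_units("speech")`, the grapheme list equals the character list. This matches the statement that one character may be a grapheme.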

The device 1000 may receive a user's voice input, recognize it using the ASR model, and provide the text output by the ASR model to the server 2000. The server 2000 may receive this text from the device 1000 and modify it. The server 2000 may identify the degree to which the text is related to a specific domain registered in the server 2000, and correct the text using the text correction model of the identified domain. The server 2000 may then provide the modified text to the device 1000.
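The server-side flow just described (score the received ASR text against registered domains, pick a domain, apply that domain's text correction model, return the result) can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation. The domain identifier and the correction models are treated as arbitrary callables. The score threshold of 0.5 and the rule of applying only the single best-scoring domain's model are simplifications. `CorrectionServer` and the toy lambdas are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical interfaces: a domain identifier maps text to {domain: score},
# and a text correction model maps ASR text to corrected text.
DomainIdentifier = Callable[[str], Dict[str, float]]
CorrectionModel = Callable[[str], str]


@dataclass
class CorrectionServer:
    identify_domains: DomainIdentifier
    models: Dict[str, CorrectionModel]  # one text correction model per registered domain
    threshold: float = 0.5  # assumed minimum score for a domain to be used

    def correct(self, asr_text: str) -> str:
        # Score how strongly the received ASR text relates to each domain.
        scores = self.identify_domains(asr_text)
        # Keep registered domains whose score clears the threshold, best first.
        selected = sorted(
            (d for d, s in scores.items() if d in self.models and s >= self.threshold),
            key=lambda d: scores[d],
            reverse=True,
        )
        if not selected:
            return asr_text  # no matching domain: return the ASR text unchanged
        # For simplicity this sketch applies only the best-scoring domain's model.
        return self.models[selected[0]](asr_text)


server = CorrectionServer(
    identify_domains=lambda t: {"music": 0.9 if "play" in t else 0.1},
    models={"music": lambda t: t.replace("beetles", "beatles")},
)
print(server.correct("play the beetles"))  # play the beatles
```

Returning the ASR text unchanged when no domain qualifies is one possible fallback. The description does not specify what the server does in that case.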

The text correction model is an artificial intelligence model trained to correct at least a portion of the text that results from speech recognition, and may include, for example, a sequence-to-sequence mapper. The text correction model may be trained using text output from the ASR model and preset ground truth text. The text correction model may be trained separately for each domain. For example, the text correction model of a first domain may be trained using text output by the ASR model and ground truth text specific to the first domain. Likewise, the text correction model of a second domain may be trained using text output by the ASR model and ground truth text specific to the second domain.
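Per-domain training can be illustrated with (ASR output, domain-specific ground truth) pairs. The sketch deliberately swaps the sequence-to-sequence mapper for a much simpler technique. It aligns each pair word by word with `difflib.SequenceMatcher`, counts one-to-one word replacements, and at inference replaces each known ASR word with its most frequent ground-truth counterpart. The class name `SubstitutionCorrector` and the training pairs are hypothetical. The point is only that the same ASR output can be corrected differently depending on the domain whose data trained the model.

```python
from collections import Counter, defaultdict
from difflib import SequenceMatcher


class SubstitutionCorrector:
    """Toy per-domain correction model learned from (ASR text, ground truth) pairs.

    A stand-in for a trained sequence-to-sequence mapper: it only learns
    one-to-one word replacements.
    """

    def __init__(self):
        self.counts = defaultdict(Counter)  # ASR word -> counts of ground-truth words

    def fit(self, pairs):
        for asr_text, truth_text in pairs:
            hyp, ref = asr_text.split(), truth_text.split()
            for tag, i1, i2, j1, j2 in SequenceMatcher(a=hyp, b=ref).get_opcodes():
                # Learn only equal-length replacements, aligned word by word.
                if tag == "replace" and i2 - i1 == j2 - j1:
                    for h, r in zip(hyp[i1:i2], ref[j1:j2]):
                        self.counts[h][r] += 1
        return self

    def __call__(self, text):
        # Replace each known ASR word with its most frequent ground-truth word.
        return " ".join(
            self.counts[w].most_common(1)[0][0] if w in self.counts else w
            for w in text.split()
        )


# Hypothetical per-domain training pairs: (ASR output, domain-specific ground truth).
training_data = {
    "music": [("play the beetles", "play the beatles")],
    "weather": [("what is the whether today", "what is the weather today")],
}
models = {domain: SubstitutionCorrector().fit(pairs) for domain, pairs in training_data.items()}
print(models["music"]("play some beetles"))  # play some beatles
print(models["music"]("is the whether nice"))  # is the whether nice (not a music correction)
```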

Further, the text correction model may be an artificial intelligence model trained using texts output from several types of ASR models and the corresponding preset ground truth texts. In this case, because the text correction model is trained on text output from several types of ASR models, it can provide an accurate output value (for example, corrected text) regardless of which type of ASR model produced its input text.

For example, the modified text may include at least one of a modified character, a modified grapheme, a modified morpheme, or a modified word. For example, if the text output from the ASR model contains a typo, the modified text may include the character corrected from the typo. Also, for example, if the text output from the ASR model includes a word whose meaning does not fit the context, the modified text may include a word with the correct meaning in its place. Also, for example, the modified text may be generated by replacing a specific word in the text output from the ASR model with a similar word.
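To see which parts of the ASR output a correction changed, the original and modified texts can be aligned at the word level. The sketch below uses `difflib.SequenceMatcher` as an assumed alignment method; the description does not say how modifications are located. It reports every differing span as an (original, corrected) pair, with an empty string standing for an inserted or deleted span. `word_edits` is a hypothetical helper.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def word_edits(asr_text: str, corrected_text: str) -> List[Tuple[str, str]]:
    """List (original span, corrected span) pairs where the two texts differ."""
    a, b = asr_text.split(), corrected_text.split()
    edits = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(a=a, b=b).get_opcodes():
        if tag != "equal":  # replace, delete, or insert
            edits.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return edits


print(word_edits("play the beetles now", "play the beatles now"))  # [('beetles', 'beatles')]
```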

The device 1000 may be a smartphone, a tablet PC, a PC, a smart TV, a mobile phone, a personal digital assistant (PDA), a laptop, a media player, a micro server, a global positioning system (GPS) device, an e-book reader, a digital broadcasting terminal, a navigation device, a kiosk, an MP3 player, a digital camera, a home appliance, or another mobile or non-mobile computing device, but is not limited thereto. The device 1000 may also be a wearable device, such as a watch, glasses, a hair band, or a ring, that has a communication function and a data processing function. However, the device 1000 is not limited thereto and may include any kind of device capable of exchanging data with the server 2000 over a network for speech recognition.

The network 200 includes a local area network (LAN), a wide area network (WAN), a value added network (VAN), a mobile radio communication network, a satellite communication network, and combinations thereof. It is a data communication network in a comprehensive sense that enables the network entities shown in FIG. 1 to communicate smoothly with each other, and includes the wired Internet, the wireless Internet, and mobile wireless communication networks.

FIG. 2 is a schematic diagram of a speech recognition system including text correction models related to a plurality of domains according to an embodiment of the present disclosure.

Referring to FIG. 2, the device 1000 may include an ASR model, and the server 2000 may include text correction models for correcting text output from the ASR model. For example, the server 2000 may include a first text correction model corresponding to a first domain and a second text correction model corresponding to a second domain.

The device 1000 may obtain a feature vector by extracting features from a voice input, and may input the obtained feature vector into the ASR model. The device 1000 may provide the output of the ASR model to the server 2000. The output of the ASR model may include text in various forms. For example, the device 1000 may provide text to the server 2000 in sentence units or as a text stream, but is not limited thereto.
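The two delivery forms can be contrasted with a small sketch. The sentence form sends the complete recognized sentence once. For the stream form, the sketch assumes that each update carries the whole text recognized so far, growing by one word at a time. This is one possible stream format, not a specified protocol; an implementation could equally send only the new word in each update. Both function names are hypothetical.

```python
from typing import Iterable, Iterator


def sentence_payload(words: Iterable[str]) -> str:
    """Send the complete recognized sentence in a single message."""
    return " ".join(words)


def stream_payloads(words: Iterable[str]) -> Iterator[str]:
    """Send a growing text stream: one update per newly recognized word."""
    prefix = []
    for word in words:
        prefix.append(word)
        yield " ".join(prefix)  # the accumulated text so far


print(list(stream_payloads(["play", "the", "beatles"])))  # ['play', 'play the', 'play the beatles']
```

With cumulative updates, the server can re-evaluate the domain on each longer prefix without having to keep track of earlier messages.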

The server 2000 may receive text from the device 1000 and select a domain related to the received text. A domain indicates the field to which the input voice relates, and may be preset according to, for example, the meaning or the properties of the input voice. Domains may also be classified according to, for example, the service to which the input voice relates. A text correction model may be trained for each domain. In this case, each domain's text correction model may be trained using input text related to that domain and the corresponding ground truth text. The server 2000 may select at least one of a plurality of preset domains, and select at least one of the text correction models corresponding to the selected domain. The server 2000 may obtain corrected text by inputting the text received from the device 1000 into the text correction model of the selected domain, and may provide the corrected text to the device 1000.
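Selecting "at least one of a plurality of preset domains" can be sketched with a deliberately simple domain identifier. Each preset domain has a keyword vocabulary, and a domain's score is the fraction of words in the text that appear in that vocabulary. Every domain scoring at or above a threshold is selected. If none does, the single best domain is selected, provided it matched anything at all. The vocabularies, the 0.3 threshold, and the fallback rule are all assumptions made for illustration. The description leaves the identification method open and later discusses dedicated domain identification modules. The selected domain names would then be used to look up their text correction models.

```python
from typing import Dict, List, Set


def domain_scores(text: str, vocab: Dict[str, Set[str]]) -> Dict[str, float]:
    """Score each preset domain as the fraction of words found in its vocabulary."""
    words = text.lower().split()
    if not words:
        return {domain: 0.0 for domain in vocab}
    return {domain: sum(w in terms for w in words) / len(words) for domain, terms in vocab.items()}


def select_domains(scores: Dict[str, float], threshold: float = 0.3) -> List[str]:
    """Select every domain at or above the threshold, best first.

    If none qualifies, fall back to the single best domain, unless no domain matched at all.
    """
    chosen = sorted((d for d, s in scores.items() if s >= threshold), key=scores.get, reverse=True)
    if chosen or not scores:
        return chosen
    best = max(scores, key=scores.get)
    return [best] if scores[best] > 0 else []


# Hypothetical keyword vocabularies for two preset domains.
vocab = {"music": {"play", "song", "album"}, "weather": {"rain", "weather", "forecast"}}
print(select_domains(domain_scores("play the song", vocab)))  # ['music']
```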

Meanwhile, the server 2000 may use the modified text to provide various types of voice assistant services to the device 1000. A voice assistant service may be a service that provides a conversation with a user. In a voice assistant service, a response message can be provided to the user, taking into account the situation of the user, the situation of the device, and the like, as if a person were talking directly with the user. In addition, a voice assistant service may, like the user's personal assistant, appropriately generate the information the user needs and provide it to the user. A voice assistant service may be linked with various services to provide the user with needed information or functions. Examples include a broadcasting service, a content sharing service, a content providing service, a power management service, a game providing service, a chat service, a document writing service, a search service, a call service, a photographing service, a transportation recommendation service, and a video playback service.

FIG. 3 is a flowchart of a method by which a device and a server in a speech recognition system recognize a voice input and obtain modified text, according to an embodiment of the present disclosure.

The device 1000 may perform the operations of the device 1000 in FIG. 3 by executing instructions stored in the memory of the device 1000. For example, it may do so by executing at least one of a speech recognition evaluation module 1430, a domain identification module 1440, an NLU determination module 1450, a domain registration module 1460, an ASR model 1410, and an NLU model 1420 of FIG. 18, which will be described later. However, the present disclosure is not limited thereto, and the device 1000 may execute another program stored in the memory to perform a predetermined operation of the device 1000.

In addition, the server 2000 may perform the operations of the server 2000 in FIG. 3 by executing instructions stored in the storage unit of the server 2000. For example, it may do so by executing at least one of a domain management module 2310, a speech interpretation management module 2340, a text correction module 2320, and an NLU module 2330 of FIG. 17, which will be described later. However, the present disclosure is not limited thereto, and the server 2000 may execute another program stored in the storage unit to perform a predetermined operation of the server 2000.

In operation S300, the device 1000 may obtain a feature vector from a voice signal. The device 1000 may receive a user's voice input (e.g., an utterance) through a microphone. Using the voice signal obtained through the microphone, it may generate a feature vector representing the characteristics of that signal. When the voice signal includes noise, the device 1000 may remove the noise and obtain the feature vector from the noise-removed voice signal. Also, for example, the device 1000 may extract, from the voice signal, a feature vector representing characteristics of the voice signal. As another example, the device 1000 may receive data representing a feature vector of a voice signal from an external device (not shown).
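The description does not specify which features are extracted or how noise is removed. The following is therefore only a sketch under common assumptions, using NumPy. A 16 kHz signal is cut into 25 ms Hamming-windowed frames with a 10 ms hop. Each frame's power spectrum is computed, and noise is removed by simple spectral subtraction, assuming the first frames contain no speech. The log of the result is then taken. Production systems more often use log-mel filterbank features. The parameter values and the function name are assumptions.

```python
import numpy as np


def log_spectral_features(signal, sr=16000, frame_ms=25, hop_ms=10, noise_frames=10):
    """Return a (frames, frame_len // 2 + 1) log power spectrum after spectral subtraction."""
    x = np.asarray(signal, dtype=np.float64)
    frame = int(sr * frame_ms / 1000)  # samples per frame (400 at 16 kHz)
    hop = int(sr * hop_ms / 1000)  # samples between frame starts (160 at 16 kHz)
    if len(x) < frame:
        x = np.pad(x, (0, frame - len(x)))
    n_frames = 1 + (len(x) - frame) // hop
    # Index matrix: row k holds the sample indices of frame k.
    idx = np.arange(frame)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = x[idx] * np.hamming(frame)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Noise removal by spectral subtraction, assuming the first frames contain no speech.
    noise = power[:noise_frames].mean(axis=0)
    clean = np.maximum(power - noise, 1e-10)  # floor avoids log(0)
    return np.log(clean)


rng = np.random.default_rng(0)
features = log_spectral_features(rng.standard_normal(16000))  # one second of noise at 16 kHz
print(features.shape)  # (98, 201)
```

Each row of the result is the feature vector for one frame. This per-frame sequence is the kind of input passed to the ASR model in operation S305.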

In operation S305, the device 1000 may obtain text from the feature vector by using the ASR model. To recognize the user's voice, the device 1000 may input the feature vector into the ASR model in the device 1000. When the device 1000 includes a plurality of ASR models, it may select one of them and convert the feature vector into a format suitable for the selected ASR model. The ASR model of the device 1000 may be, for example, an artificial intelligence model including an acoustic model, a pronunciation dictionary, and a language model. Alternatively, the ASR model of the device 1000 may be, for example, an end-to-end speech recognition model whose structure includes an integrated neural network instead of a separate acoustic model, pronunciation dictionary, and language model.
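The description states only that the device selects one of several ASR models and converts the feature vector into a suitable format. The sketch below assumes three things about what "format" means. Each model declares an expected feature dimension, an expected array layout (frames-first or dimensions-first), and whether it expects per-utterance mean/variance normalization. A model is then selected by matching the feature dimension. `ASRModelSpec`, `select_asr_model`, `adapt_features`, and the two example models are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence

import numpy as np


@dataclass(frozen=True)
class ASRModelSpec:
    """Hypothetical description of an on-device ASR model's input format."""
    name: str
    feature_dim: int  # expected number of feature dimensions per frame
    time_major: bool  # True: (frames, dims); False: (dims, frames)
    normalize: bool  # whether the model expects per-utterance mean/variance normalization


def select_asr_model(specs: Sequence[ASRModelSpec], feature_dim: int) -> ASRModelSpec:
    """Pick the first model whose input dimension matches the extracted features."""
    for spec in specs:
        if spec.feature_dim == feature_dim:
            return spec
    raise ValueError(f"no ASR model accepts {feature_dim}-dimensional features")


def adapt_features(features: np.ndarray, spec: ASRModelSpec) -> np.ndarray:
    """Convert (frames, dims) features into the layout the selected model expects."""
    x = np.asarray(features, dtype=np.float32)
    if spec.normalize:
        x = (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-5)
    return x if spec.time_major else x.T


specs = [
    ASRModelSpec("e2e", 80, time_major=False, normalize=True),
    ASRModelSpec("hybrid", 40, time_major=True, normalize=False),
]
feats = np.random.default_rng(0).standard_normal((120, 80))
spec = select_asr_model(specs, feats.shape[1])
print(spec.name, adapt_features(feats, spec).shape)  # e2e (80, 120)
```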

In S310, the device 1000 may obtain the reliability of the text. The reliability of the text may be a numerical value indicating how well the text output from the ASR model matches the input voice; it may include, for example, a confidence score, but is not limited thereto. The reliability may also be related to the probability that the text matches the input speech. For example, it may be calculated based on at least one of the likelihoods of a plurality of estimated texts output from the ASR model of the device 1000 and the posterior probabilities that at least one character in the text would be replaced with another character. For example, the device 1000 may calculate the reliability based on the likelihood output as a result of Viterbi decoding. Alternatively, the device 1000 may calculate the reliability based on the posterior probabilities output from the softmax layer of an end-to-end ASR model. Alternatively, the device 1000 may determine a plurality of estimated texts produced during the speech recognition process of its ASR model and calculate the reliability of the text based on the correlation between the characters in those estimated texts. The device 1000 may also obtain the reliability of the text by using the speech recognition evaluation module 1430 of FIG. 18.
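A minimal sketch of the softmax-posterior option, assuming the text-level reliability is the geometric mean of the highest posterior at each decoding step. The aggregation rule and the function name `sequence_confidence` are illustrative assumptions; the description does not specify how the posteriors are combined.

```python
import math
from typing import Sequence


def sequence_confidence(step_posteriors: Sequence[Sequence[float]]) -> float:
    """Return a text-level confidence in (0, 1] from per-step softmax outputs.

    step_posteriors[t] is assumed to be the softmax distribution over output
    tokens at decoding step t of an end-to-end ASR model. The confidence is
    the geometric mean of the top posterior at each step (an assumed rule).
    """
    if not step_posteriors:
        raise ValueError("need at least one decoding step")
    # Sum log-probabilities instead of multiplying raw probabilities so that
    # long utterances do not underflow to 0.0.
    log_sum = sum(math.log(max(dist)) for dist in step_posteriors)
    return math.exp(log_sum / len(step_posteriors))


# Usage: two decoding steps whose top posteriors are 0.9 and 0.8.
print(sequence_confidence([[0.9, 0.1], [0.8, 0.2]]))
```

The geometric mean is used here because it normalizes for utterance length; a plain product of posteriors would systematically penalize longer texts.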

In S315, the device 1000 may determine whether to transmit the text to the server 2000 by comparing the reliability of the text with a preset threshold. If the reliability is greater than or equal to the threshold, the device 1000 may determine not to transmit the text to the server 2000; if the reliability is less than the threshold, the device 1000 may determine to transmit the text to the server 2000.

The device 1000 may also determine whether to transmit the text to the server 2000 based on at least one high-reliability text among the plurality of estimated texts produced during the speech recognition process of the ASR model. For example, suppose the estimated texts include a first estimated text with the highest reliability and a second estimated text with the second-highest reliability. If the difference between their reliabilities is less than or equal to a predetermined threshold, the device 1000 may determine to transmit the text to the server 2000. If the difference is greater than the predetermined threshold, the device 1000 may determine not to transmit the text to the server 2000.
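The two transmission criteria above can be sketched as follows. The description presents them as alternatives; combining them with a logical OR, and the threshold values 0.8 and 0.1, are assumptions made only for illustration.

```python
from typing import Sequence


def should_send_to_server(
    nbest_scores: Sequence[float],
    confidence_threshold: float = 0.8,  # hypothetical preset threshold
    margin_threshold: float = 0.1,      # hypothetical top-2 margin threshold
) -> bool:
    """Decide whether the device should request correction from the server.

    nbest_scores holds the reliability of each estimated text, in any order.
    """
    if not nbest_scores:
        raise ValueError("need at least one estimated text")
    ranked = sorted(nbest_scores, reverse=True)
    best = ranked[0]
    # Criterion 1: the best text is not reliable enough on its own.
    if best < confidence_threshold:
        return True
    # Criterion 2: the top two estimated texts are too close to call.
    if len(ranked) > 1 and best - ranked[1] <= margin_threshold:
        return True
    # Otherwise the device keeps its own best text (no server round trip).
    return False


print(should_send_to_server([0.95, 0.5]))  # confident and unambiguous
print(should_send_to_server([0.90, 0.85]))  # confident but ambiguous
```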

When it is determined in operation S315 to transmit the text to the server 2000, the device 1000 may request the server 2000 to correct the text in operation S320.

The device 1000 may transmit the text to the server 2000 and request a corrected text from the server 2000. In this case, for example, the device 1000 may also transmit the type and the identification value of the ASR model in the device 1000 to the server 2000 along with the request, but is not limited thereto.
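One way the S320 request could be packaged is sketched below. The description does not define a wire format, so the JSON encoding, the `CorrectionRequest` class, and all field names are hypothetical; the optional `domain` field corresponds to the domain information described in the next paragraph.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class CorrectionRequest:
    """Hypothetical payload for the S320 request; field names are assumptions."""
    text: str                     # text output by the on-device ASR model
    asr_model_type: str           # e.g. "end_to_end" (illustrative value)
    asr_model_id: str             # identification value of the ASR model
    domain: Optional[str] = None  # optional domain information, if known


def encode_request(req: CorrectionRequest) -> str:
    """Serialize the request as JSON, omitting fields that were not set."""
    payload = {key: value for key, value in asdict(req).items() if value is not None}
    return json.dumps(payload, ensure_ascii=False, sort_keys=True)


print(encode_request(CorrectionRequest("play frozen to", "end_to_end", "asr-01")))
```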

The device 1000 may also provide the server 2000, along with the request for the corrected text, with domain information related to the text output from its ASR model. Domain information is information for identifying a domain and may include, for example, the name or the identifier of the domain, but is not limited thereto. The device 1000 may identify the domain related to the text by using the domain identification module 1440 in the device 1000. For example, the device 1000 may identify the domain based on the domain reliability of the text output from its ASR model. Domain reliability may be a numerical value indicating the extent to which at least part of the text is related to a specific domain. For example, the device 1000 may calculate a confidence score indicating how relevant the text output from the ASR model is to each domain registered in advance in the device 1000, and may identify the domain related to the text based on the calculated domain reliabilities. The device 1000 may identify the domain related to the text in a rule-based manner, or may obtain the domain reliability of the text by using an artificial intelligence model trained for domain identification. The artificial intelligence model for domain identification may be part of an NLU model or a model separate from the NLU model.
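The rule-based option can be sketched with keyword matching. The scoring rule (fraction of tokens that are keywords of a domain) and the keyword table `REGISTERED_DOMAINS` are hypothetical; a trained model would replace `domain_reliabilities` while keeping the same input/output shape.

```python
import re
from typing import AbstractSet, Dict, Mapping

# Hypothetical keyword table for domains registered in advance on the device.
REGISTERED_DOMAINS: Dict[str, AbstractSet[str]] = {
    "movie": {"movie", "play", "frozen", "watch"},
    "place": {"restaurant", "near", "cafe"},
}


def domain_reliabilities(
    text: str, domain_keywords: Mapping[str, AbstractSet[str]]
) -> Dict[str, float]:
    """Score each domain by the fraction of tokens that are its keywords."""
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return {domain: 0.0 for domain in domain_keywords}
    return {
        domain: sum(tok in keywords for tok in tokens) / len(tokens)
        for domain, keywords in domain_keywords.items()
    }


def identify_domain(text: str, domain_keywords: Mapping[str, AbstractSet[str]]) -> str:
    """Return the registered domain with the highest domain reliability."""
    scores = domain_reliabilities(text, domain_keywords)
    return max(scores, key=scores.get)


print(domain_reliabilities("Play the movie Frozen", REGISTERED_DOMAINS))
```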

When it is determined in operation S315 not to transmit the text to the server 2000, the device 1000 may provide a voice assistant service by using the text output from the ASR model. For example, if the reliability of the text output from the ASR model is greater than or equal to the preset threshold, the device 1000 may perform operations for the voice assistant service by using that text. Also, for example, if the estimated texts produced during speech recognition include a first estimated text with the highest reliability and a second estimated text with the second-highest reliability, and the difference between their reliabilities is greater than a predetermined threshold, the device 1000 may provide the voice assistant service by using the first estimated text.

For example, the device 1000 may display the text output from the ASR model on its screen, or may perform operations for conducting a dialogue with the user based on that text. Through such a dialogue, the device 1000 may provide the user with various services, such as a broadcasting service, a content sharing service, a content providing service, a power management service, a game service, a chat service, a document writing service, a search service, a call service, a photo shooting service, a transportation recommendation service, and a video playback service.

In operation S325, the server 2000 may identify a domain for correcting the text. When the server 2000 receives domain information from the device 1000, it may identify the domain for text correction from that domain information. When the server 2000 does not receive domain information, it may identify the domain related to the text received from the device 1000 by using the domain identification module 2312 in the server 2000. In this case, for example, the server 2000 may identify the domain based on the domain reliability of the received text: it may calculate a confidence score indicating how relevant the received text is to each domain registered in advance for text correction, and identify the related domain based on the domain reliabilities calculated for those registered domains. The server 2000 may identify the domain related to the text in a rule-based manner, or may obtain the domain reliability of the text by using an artificial intelligence model trained for domain identification. The artificial intelligence model for domain identification may be part of an NLU model or a model separate from the NLU model.
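The S325 branching can be sketched as follows, assuming the device's domain information arrives as an optional domain name and that `score_domains` stands in for the server's domain identification module 2312 (both are illustrative assumptions).

```python
from typing import Callable, Dict, Optional


def identify_domain_on_server(
    text: str,
    device_domain: Optional[str],
    score_domains: Callable[[str], Dict[str, float]],
) -> str:
    """Pick the domain used for text correction (operation S325).

    device_domain: domain information sent by the device, or None.
    score_domains: hypothetical stand-in for the server-side domain
        identification module; maps text to {domain: reliability}.
    """
    if device_domain is not None:
        # Domain information was received, so use it directly.
        return device_domain
    scores = score_domains(text)
    if not scores:
        raise ValueError("no domains are registered on the server")
    # Otherwise choose the registered domain with the highest reliability.
    return max(scores, key=scores.get)


print(identify_domain_on_server("play frozen", None, lambda t: {"movie": 0.7, "place": 0.1}))
```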

In operation S330, the server 2000 may correct the text by using the text correction model corresponding to the identified domain. The server 2000 may include a plurality of text correction models corresponding to a plurality of domains and may select, from among them, the text correction model corresponding to the domain identified in operation S325.

The server 2000 may select, from among the domains registered in the server 2000, the domain corresponding to the domain identified in operation S325, and then select the text correction model of the selected domain. The corresponding registered domain may be the same as, or similar to, the identified domain. For example, if the domains registered in the server 2000 are “movie”, “place”, and “region name”, and the domain identified in operation S325 is “movie”, the server 2000 may select “movie”. If instead the registered domains are “video content”, “place”, and “region name”, and the identified domain is “movie”, the server 2000 may select “video content”. In this case, information about identification values that are similar to the identification values of the domains registered in the server 2000 may be stored in the server 2000.
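A minimal sketch of the same-or-similar domain lookup. The description only says that information about similar identification values may be stored; representing it as the dictionary `SIMILAR_DOMAINS` is an assumption.

```python
from typing import AbstractSet, Mapping, Optional

# Hypothetical stored table: identification value -> similar registered domain.
SIMILAR_DOMAINS: Mapping[str, str] = {"movie": "video content", "film": "video content"}


def resolve_registered_domain(
    identified: str,
    registered: AbstractSet[str],
    similar: Mapping[str, str] = SIMILAR_DOMAINS,
) -> Optional[str]:
    """Map the domain identified in S325 to a domain registered on the server."""
    if identified in registered:
        return identified  # the same domain is registered
    alias = similar.get(identified)
    if alias is not None and alias in registered:
        return alias  # a similar domain is registered
    return None  # no corresponding registered domain


print(resolve_registered_domain("movie", {"video content", "place", "region name"}))
```

The returned domain name would then be used as the key for looking up that domain's text correction model.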

The server 2000 may then generate the corrected text by using the selected text correction model: it may input the text into the selected text correction model and obtain the corrected text output from it. In this case, the server 2000 may preprocess the text received from the device 1000 into a format suitable for the text correction model and input the preprocessed value into the model.

If the text received from the device 1000 is related to a plurality of domains, the server 2000 may select a plurality of text correction models corresponding to those domains. In this case, the server 2000 may obtain the corrected text to be provided to the device 1000 from the corrected texts output by those models. For example, when the server 2000 generates a plurality of corrected texts by using a plurality of text correction models, it may compare their reliabilities and determine the corrected text with the highest reliability as the corrected text to be provided to the device 1000. The reliability of a corrected text may be a numerical value indicating how well the corrected text matches the input voice and may include, for example, a confidence score, but is not limited thereto.
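A sketch of the multi-domain case, assuming each text correction model is a callable that returns a `(corrected_text, confidence)` pair. That interface and the name `correct_multi_domain` are hypothetical.

```python
from typing import Callable, Iterable, Mapping, Tuple

# Assumed interface: a correction model maps text to (corrected_text, confidence).
CorrectionModel = Callable[[str], Tuple[str, float]]


def correct_multi_domain(
    text: str, domains: Iterable[str], models: Mapping[str, CorrectionModel]
) -> str:
    """Run the model of every related domain and keep the most reliable output."""
    candidates = [models[domain](text) for domain in domains]
    if not candidates:
        raise ValueError("at least one related domain is required")
    best_text, _best_confidence = max(candidates, key=lambda cand: cand[1])
    return best_text
```

Usage: with `models = {"movie": lambda t: ("play frozen two", 0.9), "place": lambda t: ("play frozen too", 0.6)}`, calling `correct_multi_domain("play frozen to", ["movie", "place"], models)` keeps the movie model's output.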

As another example, the server 2000 may extract portions of the corrected texts output from the plurality of text correction models and combine them to obtain the corrected text to be provided to the device 1000. For example, when the server 2000 generates a first corrected text and a second corrected text by using a plurality of text correction models, and a portion of the first corrected text and a portion of the second corrected text each have high reliability, the server 2000 may combine those two portions to obtain the corrected text to be provided to the device 1000.
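The partial-combination option is sketched below under a strong simplifying assumption: both corrected texts are already split into segments that align one-to-one, each with its own confidence. How segments are aligned is not described in the text, so this is illustrative only.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[str, float]  # (segment text, segment confidence), assumed format


def combine_segments(first: Sequence[Segment], second: Sequence[Segment]) -> str:
    """Build one corrected text by keeping the more reliable segment at each position."""
    if len(first) != len(second):
        raise ValueError("segments must be aligned one-to-one")
    chosen: List[str] = []
    for (text_1, conf_1), (text_2, conf_2) in zip(first, second):
        # Ties go to the first corrected text.
        chosen.append(text_1 if conf_1 >= conf_2 else text_2)
    return " ".join(chosen)


print(combine_segments([("play", 0.9), ("frozen too", 0.4)], [("pray", 0.3), ("frozen two", 0.8)]))
```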

In operation S335, the server 2000 may provide the corrected text to the device 1000.

Meanwhile, although FIG. 3 describes the device 1000 requesting the corrected text from the server 2000 and the server 2000 providing it to the device 1000, the disclosure is not limited thereto. The server 2000 may use the corrected text to provide various types of voice assistant services to the device 1000. A voice assistant service may be a service that conducts a conversation with the user. In a voice assistant service, a response message may be provided to the user, taking into account the situation of the user, the situation of the device, and the like, as if a person were talking with the user directly. Like a personal assistant, the voice assistant service may also generate the information the user needs and provide it to the user appropriately. The voice assistant service may be linked with various services, such as a broadcasting service, a content sharing service, a content providing service, a power management service, a game service, a chat service, a document writing service, a search service, a call service, a photo shooting service, a transportation recommendation service, and a video playback service, to provide the user with the information or functions the user needs.

In this case, to provide a voice assistant service based on the text, the server 2000 may use a Natural Language Understanding (NLU) model, a Dialog Manager (DM) model, a Natural Language Generation (NLG) model, and the like in the server 2000 to provide the device 1000 with information for conducting a conversation with the user. The server 2000 may also directly control another device (not shown) based on the result of interpreting the text. In addition, the server 2000 may generate, based on the result of interpreting the corrected text, control information that allows the device 1000 to control another device (not shown), and provide the generated control information to the device 1000.

FIG. 4 is a diagram illustrating an example in which a server according to an embodiment of the present disclosure identifies a domain related to text and selects the text correction model of that domain.

Referring to FIG. 4, the text output from the ASR model 40 of the device 1000 may be provided to the domain identification module 2312 in the server 2000. The server 2000 may use the domain identification module 2312 to identify the domain related to the text received from the device 1000. In this case, the server 2000 may identify the domain based on the domain reliability of the received text; for example, it may calculate a confidence score indicating how relevant the received text is to each domain registered in advance for text correction. The domain identification module 2312 may be an artificial intelligence model trained for domain identification that takes the text as input and outputs domain reliabilities. The domain identification module 2312 may be part of an NLU model or a model separate from the NLU model. Alternatively, the domain identification module 2312 may identify the domain related to the text in a rule-based manner.

In FIG. 4, the domain identification module 2312 may obtain, for example, a domain reliability for each of a first domain, a second domain, and a third domain.

The model selection module 2313 of the server 2000 may then select the text correction model that will correct the text. For example, the model selection module 2313 may compare the domain reliabilities for the first, second, and third domains and determine that the first domain has the highest domain reliability. The model selection module 2313 may then select the text correction model 41 of the first domain from among the plurality of text correction models 41, 42, and 43 in the server 2000.

The server 2000 may input the text received from the device 1000 into the text correction model 41 of the first domain and obtain the corrected text output from the text correction model 41. The server 2000 may then provide the corrected text to the device 1000.

FIG. 5 is a diagram illustrating an example in which a device according to an embodiment of the present disclosure identifies a domain related to text, and a server selects the text correction model of that domain.

Referring to FIG. 5, the text output from the ASR model 50 of the device 1000 may be provided to the domain identification module 1440 in the device 1000. The device 1000 may use the domain identification module 1440 to identify the domain related to the text output from the ASR model 50. In this case, the device 1000 may identify the domain based on the domain reliability of that text; for example, it may calculate a confidence score indicating how relevant the text output from the ASR model 50 is to each pre-registered domain. The domain identification module 1440 may be an artificial intelligence model trained for domain identification that takes the text as input and outputs domain reliabilities. The domain identification module 1440 may be part of an NLU model or a model separate from the NLU model. Alternatively, the domain identification module 1440 may identify the domain related to the text in a rule-based manner.

In FIG. 5, the domain identification module 1440 may obtain, for example, a domain reliability for each of a first domain, a second domain, and a third domain.

The device 1000 may also provide the server 2000 with the text output from the ASR model 50 and with the domain reliabilities obtained by the domain identification module 1440. Alternatively, the device 1000 may identify the domain related to the text based on those domain reliabilities and provide identification information of the identified domain to the server 2000.

The model selection module 2313 of the server 2000 may select the text correction model that will correct the text. When the device 1000 has provided domain reliabilities to the server 2000, the model selection module 2313 may, for example, compare the domain reliabilities for the first, second, and third domains and determine that the first domain has the highest domain reliability. The model selection module 2313 may then select the text correction model 51 of the first domain from among the plurality of text correction models 51, 52, and 53 in the server 2000.

Alternatively, when the device 1000 has provided the server 2000 with the identification value of the domain related to the text, the model selection module 2313 may, for example, select the text correction model 51 of the first domain according to the domain identification value received from the device 1000.

The server 2000 may input the text received from the device 1000 into the text correction model 51 of the first domain and obtain the corrected text output from the text correction model 51. The server 2000 may then provide the corrected text to the device 1000.

FIG. 6 is a diagram illustrating an example in which a server and a device according to an embodiment of the present disclosure each identify a domain related to text, and the server selects the text correction model of that domain.

Referring to FIG. 6, the text output from the ASR model 60 of the device 1000 may be provided to the first domain identification module 61 in the device 1000. The first domain identification module 61 may be the domain identification module 1440. The device 1000 may use the first domain identification module 61 to obtain a first domain reliability of the text output from the ASR model 60, and may provide the server 2000 with both the text and the first domain reliability.

The server 2000 may receive the text from the device 1000 and provide it to the second domain identification module 62 in the server 2000. The second domain identification module 62 may be the domain identification module 2312. The server 2000 may use the second domain identification module 62 to obtain a second domain reliability of the text received from the device 1000.

Thereafter, the model selection module 2313 of the server 2000 may select the text correction model for correcting the text based on the first domain reliability and the second domain reliability. For example, based on a weighted sum of the first and second domain reliabilities, the model selection module 2313 may select the first domain, to which the text is related, from among the domains registered in the server 2000, and then select the text correction model 63 of the selected first domain. In this case, for example, the first and second domain reliabilities may be normalized before the weights are applied to them, but the method is not limited thereto. A method by which the server 2000 selects the domain related to the text based on the first and second domain reliabilities is described in more detail below with reference to FIG. 7.
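The weighted-sum selection can be sketched as below. Normalizing each side so its reliabilities sum to 1, the weights 0.4 (device) and 0.6 (server), and the function names are all assumptions for illustration; a domain scored on only one side is treated as 0 on the other.

```python
from typing import Dict, Mapping


def normalize(scores: Mapping[str, float]) -> Dict[str, float]:
    """Rescale reliabilities so they sum to 1 (an assumed normalization)."""
    total = sum(scores.values())
    if total <= 0:
        return {domain: 0.0 for domain in scores}
    return {domain: score / total for domain, score in scores.items()}


def fuse_domain_scores(
    device_scores: Mapping[str, float],
    server_scores: Mapping[str, float],
    device_weight: float = 0.4,  # hypothetical weight for the first reliability
    server_weight: float = 0.6,  # hypothetical weight for the second reliability
) -> Dict[str, float]:
    """Weighted sum of normalized first (device) and second (server) reliabilities."""
    dev, srv = normalize(device_scores), normalize(server_scores)
    return {
        domain: device_weight * dev.get(domain, 0.0) + server_weight * srv.get(domain, 0.0)
        for domain in set(dev) | set(srv)
    }


def select_domain(device_scores, server_scores, **weights) -> str:
    """Return the domain whose fused reliability is highest."""
    fused = fuse_domain_scores(device_scores, server_scores, **weights)
    return max(fused, key=fused.get)


print(select_domain({"movie": 0.6, "place": 0.4}, {"movie": 0.2, "place": 0.8}))
```

Here the server's score outweighs the device's preference for "movie", so "place" is selected (fused scores 0.36 vs 0.64).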

The server 2000 may input the text received from the device 1000 into the text correction model 63 of the first domain and obtain the corrected text output from the text correction model 63. The server 2000 may then provide the corrected text to the device 1000.

FIG. 7 is a flowchart of a method by which a server selects a domain related to text by using a domain reliability obtained by a device and a domain reliability obtained by the server, according to an embodiment of the present disclosure.

In S700, the server 2000 may receive, from the device 1000, the first domain reliability of the text calculated by the first domain identification module 61 of the device 1000. The first domain identification module 61 of the device 1000 may calculate a confidence score indicating the extent to which the text output from the ASR model 60 is related to a pre-registered domain. In this case, for example, the domain identification module 61 may be an artificial intelligence model trained for domain identification that takes the text as input and outputs the first domain reliability. When a plurality of domains are registered in the device 1000, the first domain identification module 61 may obtain a plurality of first domain reliabilities, each indicating the extent to which the text is related to one of the plurality of domains.

In S710, the server 2000 may calculate a second domain reliability of the text received from the device 1000 by using the second domain identification module 62. The second domain identification module 62 of the server 2000 may calculate a confidence score indicating the extent to which the text received from the device 1000 is related to a pre-registered domain. In this case, for example, the domain identification module 62 may be an artificial intelligence model trained for domain identification that takes the text as input and outputs the second domain reliability. When a plurality of domains are registered in the server 2000, the second domain identification module 62 may obtain a plurality of second domain reliabilities, each indicating the extent to which the text is related to one of the plurality of domains.
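The disclosure states only that the domain identification modules 61 and 62 are trained models that output a confidence per registered domain. One common way to obtain such per-domain scores, assumed here purely for illustration, is a softmax over the classifier's raw outputs (logits); `domain_confidences` is a hypothetical helper.

```python
import math
from typing import Dict, List


def domain_confidences(logits: List[float], domains: List[str]) -> Dict[str, float]:
    """Map one raw classifier score per registered domain to confidences summing to 1."""
    if len(logits) != len(domains):
        raise ValueError("expected one logit per registered domain")
    m = max(logits)  # subtract the max so math.exp cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return {d: e / total for d, e in zip(domains, exps)}
```

Because the outputs sum to 1 per module, confidences from the device and the server are already on a comparable scale before any weighting.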

In S720, the server 2000 may select a domain related to the text based on the first domain reliability and the second domain reliability. The server 2000 may select a domain related to the text from among a plurality of pre-registered domains based on a weighted sum of the first domain reliability and the second domain reliability. For example, when the domain with a high first domain reliability differs from the domain with a high second domain reliability, the server 2000 may assign a preset first weight to the first domain reliability and a preset second weight to the second domain reliability. The server 2000 may then select a domain related to the text from among the plurality of pre-registered domains based on the first domain reliability to which the first weight is assigned and the second domain reliability to which the second weight is assigned. In this case, for example, the first domain reliability and the second domain reliability may each be normalized and the weights then applied to them, but the disclosure is not limited thereto. For example, when the first domain reliability is a reliability for an upper-layer domain and the second domain reliability is a reliability for a lower-layer domain, a low weight may be assigned to the first domain reliability and a high weight may be assigned to the second domain reliability.
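The weighted-sum selection of S720 can be sketched as below. This is a hedged illustration: the sum-to-one normalization and the function names `normalize` and `fuse_domain` are assumptions, since the disclosure leaves the normalization method open. Passing `w1 < w2` corresponds to the case where the device's upper-layer reliability is trusted less than the server's lower-layer reliability.

```python
from typing import Dict


def normalize(conf: Dict[str, float]) -> Dict[str, float]:
    """Scale confidences so they sum to 1 (left unchanged if they sum to 0)."""
    total = sum(conf.values())
    return {d: v / total for d, v in conf.items()} if total > 0 else dict(conf)


def fuse_domain(first: Dict[str, float], second: Dict[str, float],
                w1: float, w2: float) -> str:
    """Return the domain with the largest weighted sum w1*first + w2*second."""
    a, b = normalize(first), normalize(second)
    domains = sorted(set(a) | set(b))  # sorted for a deterministic tie-break
    scores = {d: w1 * a.get(d, 0.0) + w2 * b.get(d, 0.0) for d in domains}
    return max(scores, key=scores.get)
```

For example, with device scores `{"location": 0.6, "weather": 0.4}`, server scores `{"location": 0.2, "weather": 0.8}` and weights 0.3/0.7, "weather" scores 0.68 against 0.32 for "location".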

According to an embodiment, the server 2000 may also consider the reliability of the text output from the ASR model of the device 1000 when selecting a domain. In this case, the reliability of the text output from the ASR model may be obtained by the device 1000 and provided to the server 2000, but is not limited thereto. For example, the server 2000 may assign a preset third weight to the text reliability received from the device 1000, and may select a domain related to the text from among the plurality of pre-registered domains based on the first domain reliability with the first weight, the second domain reliability with the second weight, and the text reliability with the third weight.

On the other hand, for example, when the domain with a high first domain reliability and the domain with a high second domain reliability are the same, the server 2000 may select, from among the plurality of domains, the domain with the high first domain reliability without considering the weights.
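The agreement check above can be sketched as a short wrapper around the weighted sum. This is a minimal sketch with a hypothetical function name, treating "high reliability" as "highest reliability". With non-negative confidences and weights, the shortcut returns the same domain the weighted sum would; it simply skips the weighting step.

```python
from typing import Dict


def select_domain(first: Dict[str, float], second: Dict[str, float],
                  w1: float, w2: float) -> str:
    top_first = max(sorted(first), key=first.get)
    top_second = max(sorted(second), key=second.get)
    if top_first == top_second:
        # Device and server agree: take that domain without weighting.
        return top_first
    # They disagree: fall back to the weighted sum of the two reliabilities.
    domains = sorted(set(first) | set(second))
    scores = {d: w1 * first.get(d, 0.0) + w2 * second.get(d, 0.0) for d in domains}
    return max(scores, key=scores.get)
```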

FIG. 8 is a diagram illustrating an example in which a server selects a text correction model by using a domain identification module selected from among a plurality of domain identification modules in the server, according to an embodiment of the present disclosure.

Referring to FIG. 8, text output from the ASR model 80 of the device 1000 may be provided to the first domain identification module 81 in the device 1000. The first domain identification module 81 may be the domain identification module 1440. Using the first domain identification module 81, the device 1000 may obtain the first domain reliability of the text output from the ASR model 80. The device 1000 may then provide the server 2000 with the text output from the ASR model 80 and the first domain reliability obtained from the first domain identification module 81.

The server 2000 may select one of the plurality of second domain identification modules 82 in the server 2000 based on the first domain reliability received from the device 1000. In a speech recognition system, the domains for speech recognition may be organized hierarchically. For example, they may include a first-layer domain, second-layer domains that are subdomains of the first-layer domain, third-layer domains that are subdomains of the second-layer domains, and fourth-layer domains that are subdomains of the third-layer domains. Also, for example, the second domain identification modules 82 may include at least one second-layer domain identification module 82-1, at least one third-layer domain identification module 82-2, and at least one fourth-layer domain identification module 82-3. For example, the first-layer domain may correspond to the first domain identification module 81, the second-layer domains to the second-layer domain identification module 82-1, the third-layer domains to the third-layer domain identification module 82-2, the fourth-layer domains to the fourth-layer domain identification module 82-3, and the fifth-layer domains to the text correction models. In this case, the server 2000 may use the domain identification module selection module 2311 to identify, according to the first domain reliability calculated by the first domain identification module 81, a second-layer domain with a high reliability among the plurality of second-layer domains. The domain identification module selection module 2311 of the server 2000 may then select the second-layer domain identification module 82-1 corresponding to the identified second-layer domain.

Also, for example, the server 2000 may obtain a second domain reliability of the text by using the selected second-layer domain identification module 82-1. For example, the domain identification module selection module 2311 of the server 2000 may identify, based on the second domain reliability, a third-layer domain with a high reliability among the plurality of third-layer domains, and select the third-layer domain identification module 82-2 corresponding to the identified third-layer domain.

Also, for example, the server 2000 may obtain a third domain reliability of the text by using the selected third-layer domain identification module 82-2. For example, the domain identification module selection module 2311 of the server 2000 may identify, based on the third domain reliability, a fourth-layer domain with a high reliability among the plurality of fourth-layer domains, and select the fourth-layer domain identification module 82-3 corresponding to the identified fourth-layer domain.

Also, for example, the server 2000 may obtain a fourth domain reliability of the text by using the selected fourth-layer domain identification module 82-3. The model selection module 2313 of the server 2000 may then select, based on the fourth domain reliability, the text correction model 85 corresponding to a third domain from among the plurality of text correction models.

Thereafter, the server 2000 may input the text received from the device 1000 into the text correction model 85 of the third domain and obtain the corrected text output from the text correction model 85. The server 2000 may provide the corrected text to the device 1000.
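The layer-by-layer selection walked through for FIG. 8 can be sketched as a greedy descent over a domain tree. This is a hedged sketch, not the disclosed implementation: `DomainNode` and `cascade` are hypothetical names, "a high reliability" is read as "the highest reliability" at each layer, and the identifiers and correction models are placeholder callables.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class DomainNode:
    name: str
    # Identifier returning confidences over this node's children (inner nodes).
    identify: Optional[Callable[[str], Dict[str, float]]] = None
    children: Dict[str, "DomainNode"] = field(default_factory=dict)
    # Text correction model attached to a leaf domain.
    correct: Optional[Callable[[str], str]] = None


def cascade(root: DomainNode, text: str,
            device_conf: Dict[str, float]) -> Tuple[str, List[str]]:
    """Descend from the root, at each layer following the most confident child,
    and apply the text correction model of the leaf that is reached."""
    path = [root.name]
    # The root layer is scored on the device; its confidences arrive with the text.
    node = root.children[max(sorted(device_conf), key=device_conf.get)]
    path.append(node.name)
    while node.children:
        if node.identify is None:
            raise ValueError(f"inner node {node.name!r} has no identifier")
        conf = node.identify(text)  # server-side identifier for this layer
        node = node.children[max(sorted(conf), key=conf.get)]
        path.append(node.name)
    if node.correct is None:
        raise ValueError(f"leaf {node.name!r} has no text correction model")
    return node.correct(text), path
```

Because each identifier only scores its own children, the server runs one identification module per layer instead of scoring every domain at once.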

Although FIG. 8 has been described with the layers of the second domain identification modules 82 in the server 2000 including the second, third, and fourth layers, and with the server 2000 sequentially selecting the second-layer domain identification module 82-1, the third-layer domain identification module 82-2, and the fourth-layer domain identification module 82-3, the disclosure is not limited thereto.

The server 2000 may also select a domain for correcting the text by jointly considering the second domain reliability calculated by the second-layer domain identification module 82-1, the third domain reliability calculated by the third-layer domain identification module 82-2, and the fourth domain reliability calculated by the fourth-layer domain identification module 82-3. In this case, the server 2000 may normalize each of these three reliabilities and compare the normalized values with one another to select the domain for correcting the text.
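Comparing normalized reliabilities across layers can be sketched as follows. The disclosure does not specify the normalization, so scaling each layer's confidences to sum to 1 is an assumption here, as is the hypothetical name `pick_across_layers`.

```python
from typing import Dict, Tuple


def pick_across_layers(layers: Dict[str, Dict[str, float]]) -> Tuple[str, str]:
    """Normalize each layer's confidences to sum to 1, then return the
    (layer, domain) pair with the largest normalized confidence."""
    best: Tuple[str, str, float] = ("", "", -1.0)
    for layer, conf in layers.items():
        total = sum(conf.values())
        if total <= 0:
            continue  # nothing usable from this layer
        for domain, value in sorted(conf.items()):
            score = value / total
            if score > best[2]:
                best = (layer, domain, score)
    if best[2] < 0:
        raise ValueError("no layer produced a positive confidence")
    return best[0], best[1]
```

Normalizing per layer matters because layers with many domains tend to spread their raw confidence thinly, which would otherwise bias the comparison toward layers with few domains.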

For example, the first domain identification module 81 may calculate domain reliabilities for the first-layer domains, the second-layer domain identification module 82-1 for the second-layer domains, the third-layer domain identification module 82-2 for the third-layer domains, and the fourth-layer domain identification module 82-3 for the fourth-layer domains.

Meanwhile, the layers of the second domain identification modules 82 in the server 2000 may include only the second layer. Alternatively, they may include more layers beyond the second to fourth layers. In either case, the server 2000 may include a domain identification module corresponding to each layer of the second domain identification modules 82.

For example, the first domain identification module 81 may calculate a reliability of 60% for the first-layer domain “location” and 30% for the first-layer domain “weather”. The second-layer domain identification module 82-1 may calculate a reliability of 40% for the second-layer domain “Canada”, 20% for “USA”, and 25% for “Rain”. The third-layer domain identification module 82-2 may calculate a reliability of 20% for the third-layer domain “British Columbia”, 30% for “Ontario”, 10% for “New York”, and 5% for “Precipitation”.

Also, for example, the domain selection module 2313 may assign a first weight to each of the reliabilities of the first-layer domains “location” and “weather”. The domain selection module 2313 may assign a second weight to each of the reliabilities of the second-layer domains “Canada”, “USA”, and “Rain”. The domain selection module 2313 may assign a third weight to each of the reliabilities of the third-layer domains “British Columbia”, “Ontario”, “New York”, and “Precipitation”. In this case, the second weight may be greater than the first weight and smaller than the third weight. The domain selection module 2313 may then select a domain for text correction by jointly considering the reliabilities with the first weight, the reliabilities with the second weight, and the reliabilities with the third weight.

Also, for example, the domain selection module 2313 may calculate a first weighted sum of the reliabilities for “location”, “Canada”, and “British Columbia”; a second weighted sum of the reliabilities for “location”, “Canada”, and “Ontario”; a third weighted sum of the reliabilities for “location”, “USA”, and “New York”; and a fourth weighted sum of the reliabilities for “weather”, “Rain”, and “Precipitation”.

For example, the domain selection module 2313 may compare the calculated weighted sums, identify the highest one, and determine the third-layer domain on that path as the domain for text correction; for instance, when the first weighted sum is the highest, “British Columbia” is determined as the domain for text correction.
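The path-wise weighted sums can be checked numerically with the sample reliabilities above. The weights (0.2, 0.3, 0.5) are hypothetical values chosen only to satisfy "first weight < second weight < third weight". Note that with these sample percentages the “Ontario” path outscores the “British Columbia” path for any positive third weight, because both paths share the “location” and “Canada” terms and 30% exceeds 20%; which leaf is selected therefore follows from the reliabilities the modules actually output.

```python
from typing import Sequence, Tuple

# Reliabilities from the example above, written as fractions.
LAYER1 = {"location": 0.60, "weather": 0.30}
LAYER2 = {"Canada": 0.40, "USA": 0.20, "Rain": 0.25}
LAYER3 = {"British Columbia": 0.20, "Ontario": 0.30,
          "New York": 0.10, "Precipitation": 0.05}

# The four root-to-leaf paths whose weighted sums are compared.
PATHS = [
    ("location", "Canada", "British Columbia"),  # first weighted sum
    ("location", "Canada", "Ontario"),           # second weighted sum
    ("location", "USA", "New York"),             # third weighted sum
    ("weather", "Rain", "Precipitation"),        # fourth weighted sum
]


def path_score(path: Sequence[str], weights: Tuple[float, float, float]) -> float:
    w1, w2, w3 = weights
    return w1 * LAYER1[path[0]] + w2 * LAYER2[path[1]] + w3 * LAYER3[path[2]]


def best_leaf(weights: Tuple[float, float, float]) -> str:
    """Third-layer domain on the path with the highest weighted sum."""
    return max(PATHS, key=lambda p: path_score(p, weights))[-1]
```

With weights (0.2, 0.3, 0.5), the first weighted sum is 0.12 + 0.12 + 0.10 = 0.34 and the second is 0.12 + 0.12 + 0.15 = 0.39.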

Alternatively, for example, the domain selection module 2313 may select a second-layer domain based on the reliabilities of the first-layer domains and the reliabilities of the second-layer domains. The domain selection module 2313 may then select the subdomains of the selected second-layer domain, and the text may be corrected by using the text correction models corresponding to the selected subdomains.
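This alternative can be sketched as scoring each second-layer domain together with its first-layer parent and then running every correction model beneath the winner. All names and the default weights are hypothetical, and the disclosure does not say how several candidate corrections are reconciled, so the sketch simply returns them all.

```python
from typing import Callable, Dict, List, Tuple


def select_subdomain_corrections(
    text: str,
    conf1: Dict[str, float],            # first-layer reliabilities
    conf2: Dict[str, float],            # second-layer reliabilities
    parent_of: Dict[str, str],          # second-layer domain -> first-layer parent
    children_of: Dict[str, List[str]],  # second-layer domain -> its subdomains
    correctors: Dict[str, Callable[[str], str]],
    w1: float = 0.4,
    w2: float = 0.6,
) -> Tuple[str, Dict[str, str]]:
    # Score each second-layer domain together with its first-layer parent.
    scores = {d: w1 * conf1.get(parent_of[d], 0.0) + w2 * v for d, v in conf2.items()}
    chosen = max(sorted(scores), key=scores.get)
    # Run every correction model under the chosen second-layer domain.
    candidates = {sub: correctors[sub](text) for sub in children_of.get(chosen, [])}
    return chosen, candidates
```

With the example reliabilities, “Canada” scores 0.24 + 0.24 = 0.48, ahead of “USA” (0.36) and “Rain” (0.27).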

Also, although FIG. 8 has been described with the first domain identification module 81 in the device 1000 corresponding to the first layer and the second domain identification modules 82 in the server 2000 corresponding to the second to fourth layers, the disclosure is not limited thereto. For example, the first domain identification module 81 in the device 1000 may correspond to the first layer, and the second domain identification modules 82 in the server 2000 may correspond to the first to third layers.

Meanwhile, the weights for the domain reliabilities may be assigned based on context information related to the device 1000. The context information may include at least one of surrounding environment information of the device 1000, state information of the device 1000, state information of the user, history information of the user's use of the device 1000, and schedule information of the user, but is not limited thereto. The surrounding environment information of the device 1000 refers to environment information within a predetermined radius of the device 1000 and may include, for example, weather, temperature, humidity, illuminance, noise, and sound information, but is not limited thereto. The state information of the device 1000 may include mode information of the device 1000 (e.g., sound mode, vibration mode, silent mode, power saving mode, blocking mode, multi-window mode, and auto-rotation mode), location information and time information of the device 1000, activation information of communication modules (e.g., Wi-Fi ON / Bluetooth OFF / GPS ON / NFC ON), network connection status information of the device 1000, and information about applications running on the device 1000 (e.g., application identification information, application type, application usage time, and application usage period), but is not limited thereto. The user's state information is information about the user's movements, life patterns, and the like, and may include information about whether the user is walking, exercising, driving, or sleeping, and about the user's mood, but is not limited thereto. The history information of the user's use of the device 1000 is information about how the user has used the device 1000, and may include the execution history of applications, the history of functions executed in applications, the user's call history, and the user's text message history, but is not limited thereto.

For example, the weights for the domain reliabilities may be determined based on context information about an application running on the device 1000. For example, when the user's voice input is directed to an application running on the device 1000, a high weight may be assigned to the domain reliability of the domain related to that application. Alternatively, the domain related to the application may be directly determined as the domain for text correction. For example, when the voice input “Acrovista” is received while a map application is running on the device 1000, a high weight may be assigned to the map domain, or the map domain may be directly determined as the domain for text correction.

For example, the weights for the domain reliabilities may be determined based on the user's conversation history through a voice assistant service provided by the device 1000. For example, when the voice input “Search for IU” is received by the device 1000 while the user is having a music-related conversation with the device 1000 through the voice assistant service, a high weight may be assigned to the music domain, or the music domain may be directly determined as the domain for text correction.

For example, the weights for the domain reliabilities may be determined based on sensing information collected by the device 1000. A weight may be assigned to a domain based on location information (e.g., GPS information) obtained by the device 1000. For example, when the device 1000 is located near a movie theater, a high weight may be assigned to the movie domain. As another example, when the user's voice input is received while the device 1000 is searching for restaurants, a high weight may be assigned to a domain related to the place where the device 1000 is located.

Meanwhile, the weights for the domain reliabilities may be assigned based on trend information. For example, a high weight may be assigned to a domain of major news or to a domain of real-time trending search terms on a portal site.
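The context-based weighting described above can be sketched as a re-weighting step applied to domain confidences before selection. This is a minimal sketch under assumptions: the `DeviceContext` fields, the multiplicative boost factor of 2.0, and the renormalization are hypothetical choices, not values from the disclosure. The `force_app` flag models the variant in which the running application's domain is chosen outright.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class DeviceContext:
    app_domain: Optional[str] = None       # domain of the running app, e.g. "map"
    dialogue_domain: Optional[str] = None  # topic of the ongoing assistant dialogue
    nearby_domains: Set[str] = field(default_factory=set)    # e.g. {"movie"} near a theater
    trending_domains: Set[str] = field(default_factory=set)  # news / real-time search terms


def apply_context(conf: Dict[str, float], ctx: DeviceContext,
                  boost: float = 2.0, force_app: bool = False) -> Dict[str, float]:
    """Re-weight domain confidences with context and renormalize them."""
    if force_app and ctx.app_domain in conf:
        # Variant in which the running app's domain is chosen outright.
        return {d: 1.0 if d == ctx.app_domain else 0.0 for d in conf}
    boosted: Dict[str, float] = {}
    for d, v in conf.items():
        factor = 1.0
        if d == ctx.app_domain or d == ctx.dialogue_domain:
            factor *= boost
        if d in ctx.nearby_domains or d in ctx.trending_domains:
            factor *= boost
        boosted[d] = v * factor
    total = sum(boosted.values())
    return {d: v / total for d, v in boosted.items()} if total > 0 else boosted
```

For instance, with `{"map": 0.3, "music": 0.7}` and a running map application, a boost of 3.0 lifts "map" (0.9 before renormalization) above "music" (0.7).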

FIG. 9 is a flowchart of a method by which a server selects a domain for text correction by using a domain identification module selected from among a plurality of domain identification modules, according to an embodiment of the present disclosure.

FIG. 10 is a diagram illustrating an example of a first domain identification module, second domain identification modules, and text correction models related to hierarchically classified domains, according to an embodiment of the present disclosure.

In FIGS. 9 and 10, for example, the domains for speech recognition may be classified into a first layer, a second layer, and a third layer.

In S900, the server 2000 may receive, from the device 1000, the first domain reliability of the text calculated by the first domain identification module 81 of the device 1000, that is, the first domain reliability of the text output from the ASR model of the device 1000. Referring to FIG. 10, for example, the first domain identification module 100 of the device 1000 may correspond to “all”, the first-layer domain related to “location”, and the first domain reliability calculated by the first domain identification module 100 may be a domain reliability for the second-layer domains related to “country”. For example, the first domain reliability may include a domain reliability for the domain “Canada” and a domain reliability for the domain “USA”. Also, for example, the second domain identification module 101 may correspond to the domain “Canada”, and the second domain identification module 102 may correspond to the domain “USA”.

In S910, the server 2000 may select at least one of the plurality of second domain identification modules 82 based on the first domain reliability. The domain identification module selection module 2311 of the server 2000 may select the second-layer domain identification module 82-1 from among the plurality of second domain identification modules 82 based on the first domain reliability. Referring to FIG. 10, for example, the server 2000 may compare the domain reliability for “Canada” with the domain reliability for “USA” and identify that the domain reliability for “Canada” is higher than a predetermined threshold. The server 2000 may then select, from between the second domain identification module 101 corresponding to the domain “Canada” and the second domain identification module 102 corresponding to the domain “USA”, the second domain identification module 101 corresponding to “Canada”. In this case, the second domain identification modules 101 and 102 may be second domain identification modules corresponding to the second layer.

In S920, the server 2000 may compute a second domain confidence of the text using the selected second-layer domain identification module 82-1, which takes the text as input. Referring to FIG. 10, the second domain confidence computed by the second domain identification module 101 may be the confidence for each third-layer domain related to “province or state”. For example, the second domain confidence may include confidences for the domains “British Columbia”, “Ontario”, “New York”, and “Illinois”. Also, for example, the text correction models 103, 104, 105, and 106 may correspond to the domains “British Columbia”, “Ontario”, “New York”, and “Illinois”, respectively.

In S930, the server 2000 may select a domain for correcting the text based on the second domain confidence. The domain selection module 2313 of the server 2000 may select one of the plurality of text correction models 83, 84, and 85 based on the second domain confidence. Referring to FIG. 10, for example, the server 2000 may compare the confidences for “British Columbia”, “Ontario”, “New York”, and “Illinois” and determine that the confidence for “British Columbia” exceeds a predetermined threshold. The server 2000 may then select “British Columbia” as the domain for text correction, so that the text is corrected by the text correction model 103 corresponding to that domain.
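The top-down selection in S900 to S930 can be pictured as a walk down a domain tree: at each layer, the module for the current domain scores its child domains, and the walk follows the best child only if its confidence exceeds a threshold. The following is a minimal Python sketch under stated assumptions, not the disclosed implementation. The `DomainNode` class, the `route` and `keyword_scorer` functions, the keyword lists, and the 0.5 threshold are all hypothetical, and because the disclosure does not say how a domain identification module computes its confidences, a keyword counter stands in for a trained classifier.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical scorer: maps text to {child_domain: confidence}.
Scorer = Callable[[str], Dict[str, float]]


@dataclass
class DomainNode:
    name: str
    scorer: Optional[Scorer] = None  # None for leaf domains (text correction models)
    children: Dict[str, "DomainNode"] = field(default_factory=dict)


def route(text: str, root: DomainNode, threshold: float = 0.5) -> List[str]:
    """Follow the most confident child at each layer while it beats the threshold."""
    path, node = [root.name], root
    while node.children and node.scorer is not None:
        scores = node.scorer(text)
        best = max(scores, key=scores.get)
        if scores[best] <= threshold:
            break  # no child domain is confident enough; stop at this layer
        node = node.children[best]
        path.append(node.name)
    return path


def keyword_scorer(keywords: Dict[str, List[str]]) -> Scorer:
    """Stand-in for a trained module: share of keyword hits per child domain."""
    def score(text: str) -> Dict[str, float]:
        lowered = text.lower()
        hits = {d: sum(k in lowered for k in kws) for d, kws in keywords.items()}
        total = sum(hits.values()) or 1  # avoid division by zero when nothing matches
        return {d: h / total for d, h in hits.items()}
    return score


# Toy version of the FIG. 10 hierarchy: all -> country -> province/state.
canada = DomainNode("Canada", keyword_scorer({"British Columbia": ["vancouver"], "Ontario": ["toronto"]}),
                    {"British Columbia": DomainNode("British Columbia"), "Ontario": DomainNode("Ontario")})
usa = DomainNode("USA", keyword_scorer({"New York": ["manhattan"], "Illinois": ["chicago"]}),
                 {"New York": DomainNode("New York"), "Illinois": DomainNode("Illinois")})
root = DomainNode("all", keyword_scorer({"Canada": ["vancouver", "toronto"], "USA": ["manhattan", "chicago"]}),
                  {"Canada": canada, "USA": usa})

print(route("best sushi in vancouver", root))  # ['all', 'Canada', 'British Columbia']
```

Because `route` loops until it reaches a leaf, the same sketch would also cover hierarchies with more than three layers.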

In FIGS. 9 and 10, the first domain identification module 81 is described as corresponding to the first layer of the domain hierarchy, the second domain identification modules 82 to the second layer, and the text correction models 83, 84, and 85 to the third layer, but the disclosure is not limited thereto. The second domain identification modules 82 may span more layers. For example, the first domain identification module 81 may correspond to the first layer, the second domain identification modules 82 to the second, third, and fourth layers, and the text correction models 83, 84, and 85 to the fifth layer.

In FIG. 10, the server 2000 may also select the text correction model by separately normalizing the domain confidences computed by the first domain identification module 100, the second domain identification module 101, and the second domain identification module 102, and then comparing the normalized values.
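The disclosure does not specify how the confidences are normalized or combined. One plausible scheme is sketched below. Each module's raw confidences are rescaled to sum to 1, and each leaf domain is scored as the product of the confidences along its path, P(country) × P(province | country). Both choices are assumptions: softmax normalization or a different combination rule would also fit the text. The function names are hypothetical.

```python
from typing import Dict, Tuple


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale non-negative confidences to sum to 1 (assumed normalization scheme)."""
    total = sum(scores.values())
    if total == 0:
        return {k: 1.0 / len(scores) for k in scores}
    return {k: v / total for k, v in scores.items()}


def best_leaf(first: Dict[str, float],
              second: Dict[str, Dict[str, float]]) -> Tuple[str, Dict[str, float]]:
    """Score each leaf as P(parent) * P(leaf | parent) after normalizing each module.

    first:  raw confidences from module 100, keyed by country.
    second: raw confidences from modules 101/102, keyed by country, then by leaf.
    """
    p_parent = normalize(first)
    leaf_scores = {}
    for parent, child_scores in second.items():
        for leaf, p in normalize(child_scores).items():
            leaf_scores[leaf] = p_parent[parent] * p
    return max(leaf_scores, key=leaf_scores.get), leaf_scores


# Module 102 reports the largest raw value ("New York": 9.0), but after
# normalization the path all -> Canada -> British Columbia scores highest.
leaf, scores = best_leaf(
    {"Canada": 3.0, "USA": 1.0},
    {"Canada": {"British Columbia": 3.0, "Ontario": 1.0},
     "USA": {"New York": 9.0, "Illinois": 1.0}},
)
print(leaf)  # British Columbia
```

Normalizing first puts modules with different output scales on a common footing, so a raw score from one module is never compared directly with a raw score from another.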

FIG. 11 is a diagram illustrating an example in which a server corrects text using a plurality of text correction models, according to an embodiment of the present disclosure.

Referring to FIG. 11, text output from the ASR model 110 of the device 1000 may be provided to the domain identification module 2312 in the server 2000. Using the domain identification module 2312, the server 2000 may identify a domain related to the text received from the device 1000, based on the domain confidence of that text.

The domain identification module 2312 may obtain, for example, a domain confidence for each of a first domain, a second domain, and a third domain. For example, the domain identification module 2312 may divide the text into a plurality of sections, such as a first, second, and third section, and obtain the confidences for the first, second, and third domains for each section separately. For example, when the server 2000 receives the text as a stream, it may obtain the domain confidences for the text of the first section, then for the second section, then for the third section. In this case, the server 2000 can correct the text it has already received in real time, without waiting for the rest of the stream, and can identify the domain related to the text more quickly.
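Per-section scoring of a stream can be written as a generator: each section is scored as soon as it arrives, so downstream correction can start before the utterance is complete. This is a minimal sketch. `score_sections`, `toy_scorer`, and the keyword lists are hypothetical stand-ins for the domain identification module 2312, whose internals the disclosure does not describe.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Scorer = Callable[[str], Dict[str, float]]

# Hypothetical keyword lists standing in for trained domain classifiers.
KEYWORDS: Dict[str, List[str]] = {
    "place": ["station"],
    "contact": ["meet"],
    "movie": ["watch"],
}


def toy_scorer(text: str) -> Dict[str, float]:
    """Return 1.0 for a domain if any of its keywords occurs in the text, else 0.0."""
    lowered = text.lower()
    return {d: float(any(k in lowered for k in kws)) for d, kws in KEYWORDS.items()}


def score_sections(stream: Iterable[str],
                   scorer: Scorer) -> Iterator[Tuple[str, Dict[str, float]]]:
    """Score each section independently as soon as it arrives from the stream."""
    for section in stream:
        yield section, scorer(section)


sections = iter(["near the station today", "meet Gil-dong", "watch Avengers"])
for section, scores in score_sections(sections, toy_scorer):
    print(section, "->", max(scores, key=scores.get))
```

Because the generator pulls one section at a time, the result for the first section is available before the second section has been read.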

Alternatively, for example, when the server 2000 receives a text stream, it may compute the domain confidence cumulatively over a plurality of sections. For example, while receiving the text stream, the server 2000 may divide one sentence of text into a plurality of sections and obtain the domain confidences for the text of the first section, then for the first and second sections together, then for the first through third sections. In this case, the server 2000 computes the domain confidence per sentence but accumulates the sections as they arrive, which allows it to identify the domain related to the text more efficiently.

Alternatively, for example, the domain identification module 2312 may obtain the confidences for the first, second, and third domains for the text of an entire sentence.

According to an embodiment, the domain selection module 2313 of the server 2000 may select the text correction model with which to correct the text. For example, when the text is divided into a plurality of sections, the server 2000 may select a different text correction model for each section. For example, when the text is divided into a first, second, and third section, the server 2000 may select the text correction model 111 to correct the first section, the text correction model 112 to correct the second section, and the text correction model 113 to correct the third section.

Alternatively, for example, the server 2000 may select a plurality of text correction models, such as the text correction models 111, 112, and 113, to correct the text of an entire sentence.

The server 2000 may then obtain a corrected version of the text received from the device 1000 using the first corrected text output from the text correction model 111, the second corrected text output from the text correction model 112, and the third corrected text output from the text correction model 113.

For example, the server 2000 may select one or more of the first, second, and third corrected texts and use at least part of the selected text(s) to obtain the corrected text. Alternatively, the server 2000 may simply select one of the three corrected texts as the corrected text. Alternatively, the server 2000 may combine at least part of the first corrected text, at least part of the second corrected text, and at least part of the third corrected text to obtain the corrected text.

The server 2000 may then provide the corrected text to the device 1000.

FIG. 12 is a flowchart of a method by which a server cumulatively computes domain confidences for text over a plurality of sections, according to an embodiment of the present disclosure.

In S1200, the server 2000 may obtain a first section of the text. The text may be divided into a plurality of sections, for example in units of eojeol (space-delimited Korean word units), words, or phrases. The server 2000 may receive the text from the device 1000 as a text stream, in which case it may obtain the first section while receiving the stream in real time. Alternatively, the server 2000 may receive the text from the device 1000 as a complete sentence and extract the first section from it.

In S1210, the server 2000 may compute a domain confidence for the text of the first section, for each domain registered in the server 2000.

In S1220, the server 2000 may obtain a second section of the text. When the server 2000 receives the text from the device 1000 as a text stream, it may obtain the second section while receiving the stream in real time. Alternatively, the server 2000 may receive the text as a complete sentence and extract the second section from it.

In S1230, the server 2000 may compute a domain confidence for the text of the first and second sections. That is, the server 2000 may accumulate the text of the first section and the text of the second section and compute the domain confidence for the accumulated text.

In S1240, the server 2000 may obtain an n-th section of the text. When the server 2000 receives the text from the device 1000 as a text stream, it may obtain the n-th section while receiving the stream in real time. Alternatively, the server 2000 may receive the text as a complete sentence and extract the n-th section from it.

In S1250, the server 2000 may compute a domain confidence for the text of the first through n-th sections. That is, the server 2000 may accumulate the text of the first through n-th sections and compute the domain confidence for the accumulated text.

In S1260, the server 2000 may determine the domain with which to correct the text received from the device 1000, based on the domain confidence for the text of the first through n-th sections.
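The accumulate-and-rescore loop of S1200 to S1260 can be sketched as follows. This is a minimal sketch under assumptions. The keyword-fraction scorer is a hypothetical stand-in for the domain identification module, and its values are not meant to reproduce the confidences shown in FIG. 13. `score_prefixes` and `choose_domain` are illustrative names.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Scorer = Callable[[str], Dict[str, float]]

# Hypothetical keyword lists; a real module would be a trained classifier.
KEYWORDS: Dict[str, List[str]] = {"music": ["twice", "play"], "movie": ["watch", "film"]}


def toy_scorer(text: str) -> Dict[str, float]:
    """Fraction of each domain's keywords found in the text."""
    lowered = text.lower()
    return {d: sum(k in lowered for k in kws) / len(kws) for d, kws in KEYWORDS.items()}


def score_prefixes(stream: Iterable[str],
                   scorer: Scorer) -> Iterator[Tuple[str, Dict[str, float]]]:
    """S1200-S1250: after each new section, re-score the accumulated text."""
    accumulated: List[str] = []
    for section in stream:
        accumulated.append(section)
        text = " ".join(accumulated)
        yield text, scorer(text)


def choose_domain(stream: Iterable[str], scorer: Scorer) -> str:
    """S1260: pick the most confident domain for the fully accumulated text."""
    final_scores: Dict[str, float] = {}
    for _, scores in score_prefixes(stream, scorer):
        final_scores = scores  # keep only the scores for the longest prefix
    if not final_scores:
        raise ValueError("empty text stream")
    return max(final_scores, key=final_scores.get)


sections = ["new", "Twice", "Yes Oh No", "play it"]
for text, scores in score_prefixes(sections, toy_scorer):
    print(f"{text!r}: music={scores['music']:.2f}")
print(choose_domain(sections, toy_scorer))
```

Re-scoring the whole prefix gives later sections the benefit of earlier context, at the cost of repeating work on text that has already been scored.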

FIG. 13 is a diagram illustrating an example in which a server obtains domain confidences for a text stream accumulated eojeol by eojeol (in space-delimited word units), according to an embodiment of the present disclosure.

Referring to FIG. 13, the domain identification module 2312 of the server 2000 may split the text into eojeol units, accumulate the split text, and obtain a domain confidence for the accumulated text.

For example, when “새로 나온” (“newly released”), the text of the first section, is input to the domain identification module 2312, the module may output the domain identification value “Rejected” because the domain confidence for “새로 나온” is low, at 0.1.

Next, when “트와이스” (“Twice”), the text of the second section, is input, the domain identification module 2312 may accumulate the first section “새로 나온” and the second section “트와이스”, and output the domain identification value “Music” with a domain confidence of 0.7 for the accumulated text “새로 나온 트와이스”.

Next, when “예스 오 노” (“Yes Oh No”), the text of the third section, is input, the domain identification module 2312 may accumulate the first section “새로 나온”, the second section “트와이스”, and the third section “예스 오 노”, and output the domain identification value “Music” with a domain confidence of 0.9 for the accumulated text “새로 나온 트와이스 예스 오 노”.

Next, when “틀어줘” (“play it”), the text of the fourth section, is input, the domain identification module 2312 may accumulate all four sections, “새로 나온”, “트와이스”, “예스 오 노”, and “틀어줘”, and output the domain identification value “Music” with a domain confidence of 1.0 for the accumulated text “새로 나온 트와이스 예스 오 노 틀어줘” (“play Twice's newly released ‘Yes Oh No’”).

Accordingly, the server 2000 may select the text correction model of the domain “Music” to correct “새로 나온 트와이스 예스 오 노 틀어줘”.

Although FIG. 13 describes the domain identification module 2312 as outputting only the domain with the highest confidence together with that confidence, the disclosure is not limited thereto. The domain identification module 2312 may instead output a domain confidence for each of the plurality of domains registered in the server 2000.
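The two output modes described for FIG. 13 could be exposed through one function. In the top-only mode, the function returns the best domain and its confidence, or “Rejected” when even the best confidence is too low, as with “새로 나온” at 0.1. Otherwise, it returns every registered domain ranked by confidence. This is a sketch only. The function name `identify` and the 0.5 rejection threshold are assumptions, since the disclosure gives no threshold value.

```python
from typing import Dict, List, Tuple, Union

REJECT_THRESHOLD = 0.5  # hypothetical; the disclosure does not give a value


def identify(scores: Dict[str, float], top_only: bool = True
             ) -> Union[Tuple[str, float], List[Tuple[str, float]]]:
    """Return the best domain (or "Rejected"), or every domain ranked by confidence."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if not top_only:
        return ranked
    domain, conf = ranked[0]
    return ("Rejected", conf) if conf < REJECT_THRESHOLD else (domain, conf)


print(identify({"Music": 0.1, "Movie": 0.05}))  # ('Rejected', 0.1)
print(identify({"Music": 0.7, "Movie": 0.2}))   # ('Music', 0.7)
print(identify({"Music": 0.7, "Movie": 0.2}, top_only=False))
```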

Although FIG. 13 describes the domain confidence as being computed cumulatively over a plurality of sections, the disclosure is not limited thereto. For example, the server 2000 may sequentially select, for each of a plurality of sections, a text correction model to correct the text of that section. For example, when the server 2000 receives a text stream, it may, while still receiving the stream, compute the domain confidence for the text of the first section and select a text correction model for that section, then do the same for the second section, and so on through the n-th section.

FIG. 14 is a flowchart of a method by which a server divides text into a plurality of sections and selects a domain for the text of each section, according to an embodiment of the present disclosure.

In S1400, the server 2000 may compute a domain confidence of the text for each of a plurality of domains. Specifically, for the text received from the device 1000, the server 2000 may compute a domain confidence for each of the plurality of domains registered in the server 2000.

In S1410, the server 2000 may divide the text into a plurality of sections by comparing the computed domain confidences, that is, by identifying, for each domain, the span of text for which that domain's confidence is high.

In S1420, the server 2000 may select a domain for text correction for each of the divided sections. For the text of each section, the server 2000 may select the domain with the highest domain confidence as the domain corresponding to that section.
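The disclosure does not say how section boundaries are derived from the per-domain confidences. One simple heuristic, assumed here, is to label each word with its most confident domain and treat each run of same-labeled words as one section. Words with no confident domain (such as “today”) join the following section, or the preceding one if they come at the end of the text. The keyword scorer, the threshold, and the romanized example tokens (which gloss the FIG. 15 utterance) are all hypothetical.

```python
from itertools import groupby
from typing import Callable, Dict, List, Optional, Tuple

Scorer = Callable[[str], Dict[str, float]]

# Hypothetical keyword lists standing in for per-domain classifiers.
KEYWORDS: Dict[str, List[str]] = {"place": ["station"], "contact": ["meet", "gil"], "movie": ["watch"]}


def toy_scorer(word: str) -> Dict[str, float]:
    lowered = word.lower()
    return {d: float(any(k in lowered for k in kws)) for d, kws in KEYWORDS.items()}


def segment_by_domain(words: List[str], scorer: Scorer,
                      threshold: float = 0.5) -> List[Tuple[str, Optional[str]]]:
    """S1400-S1420 (sketch): label each word with its best domain, then group runs."""
    labels: List[Optional[str]] = []
    for w in words:
        scores = scorer(w)
        best = max(scores, key=scores.get)
        labels.append(best if scores[best] >= threshold else None)
    # Words with no confident domain join the next labeled word's section ...
    nxt: Optional[str] = None
    for i in reversed(range(len(labels))):
        if labels[i] is None:
            labels[i] = nxt
        else:
            nxt = labels[i]
    # ... or, at the end of the text, the previous section.
    prev: Optional[str] = None
    for i in range(len(labels)):
        if labels[i] is None:
            labels[i] = prev
        else:
            prev = labels[i]
    return [(" ".join(w for w, _ in group), label)
            for label, group in groupby(zip(words, labels), key=lambda p: p[1])]


words = ["today", "Yeomtong", "Station", "meet", "Gil-hong", "watch", "Eo-bejeoseu"]
for section, domain in segment_by_domain(words, toy_scorer):
    print(f"{domain}: {section}")
```

A real system would more likely score multi-word spans, since a single word often carries too little context for a reliable domain decision.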

FIG. 15 is a diagram illustrating an example in which a server compares the domain confidences of text across a plurality of domains and selects a text correction model for each section of the text to correct it.

Referring to FIG. 15, the server 2000 may delimit sections of the text while receiving a text stream and compute domain confidences in real time for the text of each section. For example, the server 2000 may receive the text stream “오늘 염통 역 근처에서 길홍이를 만나서 어 베저스를 볼 거야” (roughly, “Today I'll meet Gil-hong near Yeomtong Station and watch Eo Bejeoseu”), an ASR output containing misrecognized words. While receiving the stream, the server 2000 may identify “오늘 염통 역 근처에서” (“near Yeomtong Station today”) as the first section, “길홍이를 만나서” (“meet Gil-hong”) as the second section, and “어 베저스를 볼 거야” (“will watch Eo Bejeoseu”) as the third section. The server 2000 may also compute, in turn, the domain confidences for each of these three sections while the stream is being received.

For example, the domain identification module 2312 of the server 2000 may compute, for each section of the text “오늘 염통 역 근처에서 길홍이를 만나서 어 베저스를 볼 거야”, the domain confidences of the “movie”, “place”, and “contact” domains.

For example, the domain selection module 2313 of the server 2000 may compare the domain confidences of the “movie”, “place”, and “contact” domains. The domain selection module 2313 may determine that the “place” domain has a high confidence for “오늘 염통 역 근처에서”, that the “contact” domain has a high confidence for “길홍이를 만나서”, and that the “movie” domain has a high confidence for “어 베저스를 볼 거야”. Accordingly, the domain selection module 2313 may sequentially identify, from the text stream “오늘 염통 역 근처에서 길홍이를 만나서 어 베저스를 볼 거야”, “오늘 염통 역 근처에서” as the first section, “길홍이를 만나서” as the second section, and “어 베저스를 볼 거야” as the third section.

Also, for example, the domain selection module 2313 may select the “place” domain for “오늘 염통 역 근처에서”, the “contact” domain for “길홍이를 만나서”, and the “movie” domain for “어 베저스를 볼 거야”.

The text correction model of the “place” domain may correct “오늘 염통 역 근처에서” to “오늘 영통 역 근처에서” (Yeomtong Station to Yeongtong Station). The text correction model of the “contact” domain may correct “길홍이를 만나서” to “길동이를 만나서” (Gil-hong to Gil-dong). The text correction model of the “movie” domain may correct “어 베저스를 볼 거야” to “어벤저스를 볼 거야” (“will watch Avengers”). At least one of the correction operations of the “place”, “contact”, and “movie” text correction models may be performed sequentially while the text stream is still being received.

FIG. 16 is a diagram illustrating an example in which a server corrects text received from a device using the texts corrected by a plurality of text correction models, according to an embodiment of the present disclosure.

Referring to FIG. 16, for example, the server 2000 may provide the text 160, “오늘 염통 역 근처에서 길홍이를 만나서 어 베저스를 볼 거야”, to each of the text correction models of the “place”, “contact”, and “movie” domains.

Accordingly, the text correction model of the “place” domain may output the corrected text 161, “오늘 영통 역 근처에서 길홍이를 만나서 어 베저스를 볼 거야”. The text correction model of the “movie” domain may output the corrected text 162, “오늘 염통 역 근처에서 길홍이를 만나서 어벤저스를 볼 거야”. The text correction model of the “contact” domain may output the corrected text 163, “오늘 염통 역 근처에서 길동이를 만나서 어 베저스를 볼 거야”.

The server 2000 may then identify the corrected word in each output: “영통” (Yeongtong) in the corrected text 161, “어벤저스” (Avengers) in the corrected text 162, and “길동이” (Gil-dong) in the corrected text 163. From these, the server 2000 may generate the corrected text 164 to be provided to the device 1000, “오늘 영통 역 근처에서 길동이를 만나서 어벤저스를 볼 거야” (“Today I'll meet Gil-dong near Yeongtong Station and watch Avengers”).
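The disclosure does not specify how the corrected words are located and merged. One way to do it, assumed here, is a word-level diff of each candidate against the original using Python's standard `difflib.SequenceMatcher`, followed by applying every non-overlapping change. `merge_corrections` is a hypothetical name. The sketch also assumes the candidates edit disjoint spans: when two candidates touch the same span, it keeps the earlier candidate's change.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def merge_corrections(original: str, candidates: List[str]) -> str:
    """Apply every word-level change found in any candidate to the original text.

    Assumes whitespace tokenization and that candidates edit disjoint spans;
    on a conflict the earlier candidate wins and overlapping edits are skipped.
    """
    src = original.split()
    edits: Dict[Tuple[int, int], List[str]] = {}
    for cand in candidates:
        tgt = cand.split()
        for tag, i1, i2, j1, j2 in SequenceMatcher(a=src, b=tgt, autojunk=False).get_opcodes():
            if tag != "equal":
                edits.setdefault((i1, i2), tgt[j1:j2])
    out: List[str] = []
    pos = 0
    for (i1, i2) in sorted(edits):
        if i1 < pos:
            continue  # overlaps an edit that was already applied
        out.extend(src[pos:i1])
        out.extend(edits[(i1, i2)])
        pos = i2
    out.extend(src[pos:])
    return " ".join(out)


text_160 = "오늘 염통 역 근처에서 길홍이를 만나서 어 베저스를 볼 거야"
candidates = [
    "오늘 영통 역 근처에서 길홍이를 만나서 어 베저스를 볼 거야",  # 161: place model
    "오늘 염통 역 근처에서 길홍이를 만나서 어벤저스를 볼 거야",    # 162: movie model
    "오늘 염통 역 근처에서 길동이를 만나서 어 베저스를 볼 거야",  # 163: contact model
]
print(merge_corrections(text_160, candidates))
```

Working with token spans rather than positions lets a two-word misrecognition such as “어 베저스를” collapse into the single corrected word “어벤저스를”.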

FIG. 17 is a block diagram of a server according to an embodiment of the present disclosure.

Referring to FIG. 17, the server 2000 according to an embodiment of the present disclosure includes a communication interface 2100, a processor 2200, and a storage unit 2300. The storage unit 2300 may include a domain management module 2310, a text correction module 2320, an NLU module 2330, and a speech interpretation management module 2340.

The communication interface 2100 may include one or more components for communicating with the device 1000 and with other servers (not shown). The communication interface 2100 may transmit and receive information for speech recognition and voice assistant services to and from the device 1000 and other servers (not shown). The communication interface 2100 may communicate through, for example, a local area network (LAN), a wide area network (WAN), a value added network (VAN), a mobile radio communication network, a satellite communication network, or a combination thereof, but is not limited thereto.

The processor 2200 controls the overall operation of the server 2000. By executing the programs stored in the storage unit 2300, the processor 2200 may control the operations of the server 2000 described in this specification.

The storage unit 2300 may store programs for processing and control by the processor 2200, and may store data input to or output from the server 2000. The storage unit 2300 may include at least one type of storage medium among a flash memory, a hard disk, a multimedia card micro type memory, a card-type memory (e.g., SD or XD memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), a magnetic memory, a magnetic disk, and an optical disc, but is not limited thereto.

The programs stored in the storage unit 2300 may be classified into a plurality of modules according to their functions, for example, a domain management module 2310, a text correction module 2320, an NLU module 2330, and a speech interpretation management module 2340.

The domain management module 2310 provides the text received from the device 1000 to the text correction module 2320. The domain management module 2310 may include a domain identification module selection module 2311, at least one domain identification module 2312, and a domain selection module 2313.

The domain identification module selection module 2311 may select a domain identification module 2312. When there are a plurality of domain identification modules 2312, the domain identification module selection module 2311 may select at least some of them.

The domain identification module selection module 2311 may select one of the plurality of domain identification modules 2312 in the server 2000 based on the first domain reliability received from the device 1000. Domains for speech recognition in the speech recognition system may be set hierarchically. For example, the domains may include a first-layer domain, second-layer domains that are subdomains of the first-layer domain, and third-layer domains that are subdomains of a second-layer domain. Also, for example, the first-layer domain may correspond to the domain identification module 1440 of the device 1000, the second-layer domains may correspond to the domain identification modules 2312 of the server 2000, and the third-layer domains may correspond to the text correction module 2320. In this case, the domain identification module selection module 2311 may use the first domain reliability calculated by the domain identification module 1440 of the device 1000 to identify the second-layer domain with high reliability. It may then select the domain identification module 2312 corresponding to the identified second-layer domain.
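The hierarchical hand-off described above can be sketched as a lookup from second-layer domains to server-side identification modules. This is a minimal sketch under two assumptions. First, the device is assumed to report one reliability score per second-layer domain. Second, the module names and domains below are hypothetical placeholders, not names from the disclosure.

```python
from typing import Dict

# Hypothetical mapping from each second-layer domain to the server-side
# domain identification module (2312) that handles it.
SECOND_LAYER_MODULES: Dict[str, str] = {
    "media": "media_domain_identifier",
    "device_control": "device_control_domain_identifier",
    "commerce": "commerce_domain_identifier",
}

def select_domain_id_module(first_domain_reliability: Dict[str, float]) -> str:
    """Pick the server module for the second-layer domain that received the
    highest reliability from the device's first-layer identification."""
    # Ignore scores for domains the server has no module for.
    candidates = {d: s for d, s in first_domain_reliability.items()
                  if d in SECOND_LAYER_MODULES}
    if not candidates:
        raise ValueError("no known second-layer domain in the device scores")
    best_domain = max(candidates, key=candidates.get)
    return SECOND_LAYER_MODULES[best_domain]
```

Picking a single maximum matches the "select one of the plurality" reading. Selecting "at least some" modules would instead keep every domain above a cutoff.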

The domain identification module 2312 may identify a domain for correcting the text. When the server 2000 receives domain information from the device 1000, the domain identification module 2312 may identify the domain for text correction from that domain information. When the server 2000 does not receive domain information from the device 1000, the domain identification module 2312 may instead identify a domain related to the text based on the domain reliability of the received text. For example, the domain identification module 2312 may calculate a confidence score indicating how relevant the received text is to each domain registered in advance for text correction. It may then identify the domain related to the text based on the domain reliabilities calculated for those pre-registered domains. The domain identification module 2312 may identify the related domain based on rules, or may obtain the domain reliability of the text by using an artificial intelligence model trained for domain identification. That model may, for example, be part of the NLU model or a model separate from the NLU model.

The domain identification module 2312 may identify the domain related to the text by accumulating a plurality of sections of text and calculating the domain reliability of the accumulated text. Alternatively, the domain identification module 2312 may divide the text into a plurality of sections and identify a related domain for the text of each section.
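A rule-based variant of domain identification with section accumulation might look like the sketch below. It assumes a simple keyword-overlap score as the "domain reliability" and a fixed cutoff. The keyword lists, the scoring rule, and the threshold are all hypothetical. A trained classifier could replace `domain_reliability` without changing the accumulation loop.

```python
from typing import Dict, List, Optional

# Hypothetical keyword lists for two pre-registered domains (illustration only).
DOMAIN_KEYWORDS: Dict[str, List[str]] = {
    "movie": ["movie", "film", "actor"],
    "music": ["song", "album", "play"],
}

def domain_reliability(text: str) -> Dict[str, float]:
    """Rule-based score per domain: fraction of words found in its keyword list."""
    words = text.lower().split()
    if not words:
        return {d: 0.0 for d in DOMAIN_KEYWORDS}
    return {d: sum(w in kws for w in words) / len(words)
            for d, kws in DOMAIN_KEYWORDS.items()}

def identify_domain_accumulating(sections: List[str],
                                 threshold: float = 0.2) -> Optional[str]:
    """Append sections one at a time and return the first domain whose score
    on the accumulated text reaches the threshold (None if none does)."""
    accumulated = ""
    for section in sections:
        accumulated = (accumulated + " " + section).strip()
        scores = domain_reliability(accumulated)
        best = max(scores, key=scores.get)
        if scores[best] >= threshold:
            return best
    return None
```

For example, `identify_domain_accumulating(["hello there", "what film"])` finds no domain after the first section. It returns `"movie"` once the second section is appended.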

The domain selection module 2313 may select, from among the plurality of text correction models 2321, 2322, and 2323, the text correction model corresponding to the domain identified by the domain identification module 2312.

The domain selection module 2313 may select, from among the domains registered in the server 2000, the domain corresponding to the domain identified by the domain identification module 2312, and then select the text correction model of the selected domain.

When the domain identification module 2312 has divided the text into a plurality of sections and identified a related domain for the text of each section, the domain selection module 2313 may select a domain for the text of each section.

The text correction module 2320 corrects the text received from the device 1000, using the text correction model corresponding to the determined domain. The text correction module 2320 may include a text correction model 2321 of a first domain, a text correction model 2322 of a second domain, and a text correction model 2323 of a third domain.

The text correction module 2320 may generate corrected text using the selected text correction model. It may input the text into the selected model and obtain the corrected text that the model outputs. In this case, the text correction module 2320 may preprocess the text received from the device 1000 into a format suitable for the text correction model, and input the preprocessed value into the model.
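The preprocess-then-correct flow can be sketched as follows. This sketch makes three assumptions. A "text correction model" is taken to be any callable that maps a token list to a corrected token list. The model's expected input format is taken to be lowercase, whitespace-tokenized text with punctuation removed. The toy lookup-table model in the usage line is hypothetical.

```python
import re
from typing import Callable, List

# Assumed model interface: tokens in, corrected tokens out.
CorrectionModel = Callable[[List[str]], List[str]]

def preprocess(text: str) -> List[str]:
    """Normalize device text into the token format the model is assumed to
    expect: lowercase, punctuation replaced by spaces, whitespace-split."""
    text = re.sub(r"[^\w\s']", " ", text.lower())
    return text.split()

def correct_text(text: str, model: CorrectionModel) -> str:
    """Preprocess the text, run the selected correction model, and rejoin."""
    tokens = preprocess(text)
    return " ".join(model(tokens))

# Usage with a toy lookup-table "model" that fixes one misrecognized word.
toy_model = lambda toks: [{"fore": "for"}.get(t, t) for t in toks]
print(correct_text("Search  movies, fore kids!", toy_model))
```

Keeping preprocessing outside the model lets the same raw device text be adapted separately for each domain model's input format.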

If the text received from the device 1000 is related to a plurality of domains, the text correction module 2320 may select a plurality of text correction models corresponding to those domains. In this case, the text correction module 2320 may obtain the corrected text to be provided to the device 1000 from the corrected texts output by those models. For example, when the text correction module 2320 generates a plurality of corrected texts using a plurality of text correction models, it may compare their reliabilities. It may then determine the corrected text with the highest reliability as the one to provide to the device 1000. The reliability of a corrected text may be a numerical value indicating how closely the corrected text matches the input speech. It may include, for example, a confidence score, but is not limited thereto.
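Choosing among the outputs of several domain models reduces to taking the candidate with the maximum reliability. The sketch below assumes each model returns a `(corrected_text, reliability)` pair. The example strings and scores in the test are invented.

```python
from typing import List, Tuple

def pick_most_reliable(candidates: List[Tuple[str, float]]) -> str:
    """candidates: one (corrected_text, reliability) pair per selected model.
    Returns the corrected text whose reliability is highest."""
    if not candidates:
        raise ValueError("no corrected texts to choose from")
    text, _ = max(candidates, key=lambda c: c[1])
    return text
```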

Also, for example, when the text correction module 2320 generates a plurality of corrected texts using a plurality of text correction models, it may extract the corrected portions from those texts and use the extracted portions to obtain the corrected text to be provided to the device 1000.

Also, for example, the text correction module 2320 may extract parts of the corrected texts output by the plurality of text correction models and combine those parts to obtain the corrected text to be provided to the device 1000. For example, suppose the text correction module 2320 generates a first corrected text and a second corrected text using a plurality of text correction models. If a part of the first corrected text and a part of the second corrected text each have high reliability, the text correction module 2320 may combine those two parts to obtain the corrected text to be provided to the device 1000.
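One way to combine high-reliability parts of several corrected texts is sketched below. It relies on a hypothetical simplification that the source does not specify: every model's output is split into the same number of aligned segments, and each segment carries its own reliability. At each position, the segment with the highest reliability is kept.

```python
from typing import List, Tuple

Segment = Tuple[str, float]  # (segment text, reliability)

def combine_segments(outputs: List[List[Segment]]) -> str:
    """outputs: one list of aligned segments per correction model.
    Keeps, at each position, the segment with the highest reliability."""
    if not outputs:
        raise ValueError("no corrected texts to combine")
    n = len(outputs[0])
    if any(len(o) != n for o in outputs):
        raise ValueError("all outputs must have the same number of segments")
    chosen = []
    for i in range(n):
        seg_text, _ = max((o[i] for o in outputs), key=lambda s: s[1])
        chosen.append(seg_text)
    return " ".join(chosen)
```

For example, if the first model is confident about "play" and the second about "titanic", the combined text is "play titanic". Real systems would need an alignment step to produce matching segments, which this sketch omits.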

Also, for example, when a related domain has been selected for the text of each section, the text correction module 2320 may provide the text of each section to the text correction model of the corresponding domain. In this case, the text correction module 2320 may combine the corrected section texts output by those models to obtain the corrected text to be provided to the device 1000.
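Per-section routing can be sketched as a dispatch table keyed by domain. This sketch assumes each section already carries its selected domain and each correction model is a `str -> str` callable. It also assumes that sections whose domain has no registered model pass through unchanged; that fallback is not stated in the source. The music model in the test is a hypothetical toy.

```python
from typing import Callable, Dict, List, Tuple

def correct_by_section(sections: List[Tuple[str, str]],
                       models: Dict[str, Callable[[str], str]]) -> str:
    """sections: (section_text, domain) pairs in utterance order.
    Each section goes to its domain's model; the outputs are concatenated."""
    corrected = [models[domain](text) if domain in models else text
                 for text, domain in sections]
    return " ".join(corrected)
```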

The NLU module 2330 may interpret the corrected text output from the text correction module 2320. The NLU module 2330 may include a plurality of NLU models for a plurality of domains, such as a first NLU model 2331 and a second NLU model 2332. The result value generated when the NLU module 2330 interprets the text may include, for example, an intent and parameters. The intent is information determined by interpreting the text using an NLU model and may indicate, for example, the user's utterance intention. The intent may include not only information indicating the user's utterance intention (hereinafter, intention information) but also a numerical value corresponding to that intention information. The numerical value may indicate the probability that the text is associated with information indicating a specific intention. When interpreting the text using the NLU model yields a plurality of pieces of intention information, the intention information with the largest numerical value may be determined as the intent. A parameter indicates detailed information related to the intent, and a plurality of types of parameters may correspond to one intent.
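The intent-resolution rule above (keep the intention information with the largest numerical value, plus its parameters) can be sketched as follows. This sketch assumes the NLU model has already produced a probability per candidate intent and a parameter dictionary per intent. The intent names and parameter keys in the test are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class NluResult:
    intent: str
    score: float
    parameters: Dict[str, str] = field(default_factory=dict)

def resolve_intent(intent_scores: Dict[str, float],
                   parameters_by_intent: Dict[str, Dict[str, str]]) -> NluResult:
    """Pick the intent with the highest probability and attach its parameters."""
    if not intent_scores:
        raise ValueError("no intention information")
    intent = max(intent_scores, key=intent_scores.get)
    return NluResult(intent, intent_scores[intent],
                     parameters_by_intent.get(intent, {}))
```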

The result value generated when the NLU module 2330 interprets the text may also be used to provide a predetermined voice assistant service to the device 1000.

The speech interpretation management module 2340 may evaluate the text corrected by the text correction module 2320 and determine whether to perform NLU processing on it. The speech interpretation management module 2340 may include a speech recognition evaluation module 2341 and an NLU determination module 2342.

The speech recognition evaluation module 2341 may calculate the reliability of the text corrected by the text correction module 2320. The reliability of the corrected text may be a numerical value representing the probability that the corrected text matches the input speech. It may include, for example, a confidence score, but is not limited thereto. The speech recognition evaluation module 2341 may also calculate the domain reliability of the corrected text. This value indicates how relevant the corrected text is to the domains registered in advance in the server 2000 for NLU processing.

The NLU determination module 2342 may determine whether the server 2000 is to perform NLU processing on the corrected text, based on the reliability of the corrected text and its domain reliability. The NLU determination module 2342 may also determine whether the domain related to the corrected text is one for which the device 1000 can perform NLU processing, or one for which the server 2000 can.
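The source names the two inputs to this decision but not how they are combined. The sketch below is therefore a hypothetical policy. The server runs NLU only if the corrected text's reliability clears a threshold and its most likely domain is registered on the server with enough domain reliability. Both thresholds and the domain names in the test are assumptions.

```python
from typing import Dict, Set

def should_server_run_nlu(text_reliability: float,
                          domain_reliability: Dict[str, float],
                          server_domains: Set[str],
                          text_threshold: float = 0.6,     # hypothetical
                          domain_threshold: float = 0.5    # hypothetical
                          ) -> bool:
    """Hypothetical policy combining text reliability and domain reliability."""
    if text_reliability < text_threshold or not domain_reliability:
        return False
    best = max(domain_reliability, key=domain_reliability.get)
    return best in server_domains and domain_reliability[best] >= domain_threshold
```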

FIG. 18 is a block diagram of a device according to an embodiment of the present disclosure.

Referring to FIG. 18, the device 1000 according to an embodiment of the present disclosure includes a communication interface 1100, an input/output unit 1200, a processor 1300, and a memory 1400. The memory 1400 may include at least one ASR model 1410, at least one NLU model 1420, a speech recognition evaluation module 1430, a domain identification module 1440, and an NLU determination module 1450.

The communication interface 1100 may include one or more components for communicating with the server 2000 and an external device (not shown). The communication interface 1100 may transmit and receive information for speech recognition and a voice assistant service to and from the server 2000 and the external device (not shown). The communication interface 1100 may communicate through, for example, a local area network (LAN), a wide area network (WAN), a value added network (VAN), a mobile radio communication network, a satellite communication network, or a combination thereof, but is not limited thereto.

The input/output unit 1200 may receive data input to the device 1000 and output data from the device 1000. The input/output unit 1200 may include a user input unit, a camera, a microphone, a display unit, and an audio output unit. The user input unit may be, for example, a key pad, a dome switch, a touch pad, a jog wheel, or a jog switch, but is not limited thereto. The touch pad may use a contact capacitive, pressure resistive-film, infrared sensing, surface ultrasonic conduction, integral tension measurement, or piezoelectric effect method, among others.

The display unit may display information processed by the device 1000. For example, the display unit may display a GUI for a voice assistant service. When the display unit and a touch pad form a layered structure that constitutes a touch screen, the display unit may be used as an input device as well as an output device. The display unit may include at least one of a liquid crystal display, a thin film transistor-liquid crystal display, an organic light-emitting diode display, a flexible display, a 3D display, and an electrophoretic display.

The audio output unit may output audio data and may include, for example, a speaker and a buzzer.

The camera may obtain image frames, such as still images or video, through an image sensor in a video call mode or a photographing mode. Images captured through the image sensor may be processed by the processor 1300 or by a separate image processing unit (not shown).

The microphone may receive the user's utterance and process it into electrical voice data.

The processor 1300 controls the overall operation of the device 1000. By executing the programs stored in the memory 1400, the processor 1300 may control the operations of the device 1000 described in this specification.

The memory 1400 may store programs for processing and control by the processor 1300, and may store data input to or output from the device 1000. The memory 1400 may include, for example, at least one type of storage medium among a flash memory type, a hard disk type, a multimedia card micro type, a card-type memory (for example, SD or XD memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, and an optical disk, but is not limited thereto.

The programs stored in the memory 1400 may be classified into a plurality of modules according to their functions, for example, the ASR model 1410, the NLU model 1420, the speech recognition evaluation module 1430, the domain identification module 1440, and the NLU determination module 1450.

The ASR model 1410 may obtain text from a feature vector generated from the user's voice input. The processor 1300 of the device 1000 may input the feature vector into the ASR model 1410 to recognize the user's speech. When the device 1000 includes a plurality of ASR models 1410, the processor 1300 may select one of them and convert the feature vector into a format suitable for the selected ASR model 1410. The ASR model 1410 may be, for example, an artificial intelligence model including an acoustic model, a pronunciation dictionary, and a language model. Alternatively, the ASR model 1410 may be an end-to-end speech recognition model built on a single integrated neural network, without a separate acoustic model, pronunciation dictionary, and language model.

The NLU model 1420 may interpret the text output from the ASR model 1410, or the corrected text provided by the server 2000. The result value generated when the NLU model 1420 interprets the text or the corrected text may be used to provide a predetermined voice assistant service to the user.

The speech recognition evaluation module 1430 may obtain the reliability of the text output from the ASR model 1410. The reliability of the text may be a numerical value indicating how closely the text output from the ASR model 1410 matches the input speech. It may include, for example, a confidence score, but is not limited thereto. The reliability of the text may also relate to the probability that the text matches the input speech. For example, the reliability may be calculated from either or both of two quantities. The first is the likelihoods of a plurality of estimated texts output by the ASR model 1410 of the device 1000. The second is the posterior probabilities that at least one character in the text is replaced by another character. For example, the speech recognition evaluation module 1430 may calculate the reliability based on the likelihood output as a result of Viterbi decoding. Alternatively, it may calculate the reliability based on the posterior probabilities output by the softmax layer of an end-to-end ASR model. As a further alternative, it may determine a plurality of estimated texts during the speech recognition process of the ASR model 1410. It may then calculate the reliability of the text from the correlation between the characters in those estimated texts.
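The softmax-posterior option can be sketched as follows. The sketch assumes the end-to-end model exposes one logit vector per decoding step and that the recognized token at each step is the arg-max. It aggregates the top posterior of each step by a geometric mean. That aggregation is a common heuristic chosen for illustration, not one specified in the source.

```python
import math
from typing import List, Sequence

def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one decoding step's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def text_reliability(step_logits: List[Sequence[float]]) -> float:
    """Geometric mean of the top softmax posterior at each decoding step
    (an assumed aggregation; product or minimum are other options)."""
    if not step_logits:
        return 0.0
    log_sum = sum(math.log(max(softmax(l))) for l in step_logits)
    return math.exp(log_sum / len(step_logits))
```

The geometric mean keeps the score in [0, 1] regardless of utterance length, unlike a raw product of posteriors, which shrinks as the text grows longer.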

The speech recognition evaluation module 1430 may also determine whether to transmit the text output from the ASR model 1410 to the server 2000, by comparing the reliability of the text with a preset threshold. If the reliability of the text is greater than or equal to the preset threshold, the speech recognition evaluation module 1430 may determine not to transmit the text to the server 2000. If the reliability is less than the threshold, it may determine to transmit the text to the server 2000.

The speech recognition evaluation module 1430 may also decide whether to transmit the text to the server 2000 based on at least one high-reliability text among the estimated texts produced during speech recognition by the ASR model 1410. For example, suppose those estimated texts include a first estimated text with the highest reliability and a second estimated text with the next-highest reliability. If the difference between their reliabilities is less than or equal to a predetermined threshold, the speech recognition evaluation module 1430 may determine to transmit the text to the server 2000.
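The two transmission criteria described above can be sketched in one function. The first is an absolute threshold on the best hypothesis. The second is a small margin between the two best hypotheses. The source presents them as separate options, so applying both with a logical OR is an assumption here. The threshold and margin values are hypothetical.

```python
from typing import List, Tuple

def should_send_to_server(hypotheses: List[Tuple[str, float]],
                          threshold: float = 0.8,   # hypothetical preset threshold
                          margin: float = 0.1       # hypothetical top-2 margin
                          ) -> bool:
    """hypotheses: (estimated_text, reliability) pairs from ASR decoding."""
    if not hypotheses:
        raise ValueError("no ASR hypotheses")
    ranked = sorted(hypotheses, key=lambda h: h[1], reverse=True)
    best = ranked[0][1]
    # Criterion 1: the best hypothesis is not reliable enough on its own.
    if best < threshold:
        return True
    # Criterion 2: the top two hypotheses are too close to call.
    if len(ranked) > 1 and best - ranked[1][1] <= margin:
        return True
    return False
```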

The domain identification module 1440 may identify a domain related to the text output from the ASR model 1410, based on the domain reliability of that text. For example, the domain identification module 1440 may calculate a confidence score indicating how relevant the text output from the ASR model 1410 is to each pre-registered domain. The domain identification module 1440 may be an artificial intelligence model trained for domain identification that takes text as input and outputs domain reliabilities. It may, for example, be part of the NLU model or a model separate from the NLU model. Alternatively, the domain identification module 1440 may identify the domain related to the text based on rules.

Domains for speech recognition in the speech recognition system may be set hierarchically. The domain identified by the domain identification module 1440 of the device 1000 may be a higher-layer domain than the domain identified by the domain identification module 2312 of the server 2000.

The NLU determination module 1450 may determine whether NLU processing of the text output from the ASR model 1410 is to be performed by the device 1000 or by the server 2000. To do so, it may determine whether the domain related to the text is one for which the device 1000 can perform NLU processing. When that domain is previously registered in the device 1000, the NLU determination module 1450 may determine that the device 1000 performs the NLU processing. When it is not, the NLU determination module 1450 may determine that the device 1000 does not perform the NLU processing.
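The device-side routing rule reduces to a membership test on the most likely domain. This sketch assumes domain reliabilities arrive as a dictionary from the domain identification module 1440. It also assumes that anything not registered on the device, including an empty result, falls back to the server. The domain names in the test are hypothetical.

```python
from typing import Dict, Set

def decide_nlu_location(domain_reliability: Dict[str, float],
                        device_domains: Set[str]) -> str:
    """Return "device" if the most likely domain is registered on the device,
    otherwise "server"."""
    if not domain_reliability:
        return "server"
    best = max(domain_reliability, key=domain_reliability.get)
    return "device" if best in device_domains else "server"
```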

Functions related to artificial intelligence according to the present disclosure are operated through a processor and a memory. The processor may consist of one or more processors. The one or more processors may be a general-purpose processor such as a CPU, an AP, or a digital signal processor (DSP), a graphics-dedicated processor such as a GPU or a vision processing unit (VPU), or an artificial-intelligence-dedicated processor such as an NPU. The one or more processors control input data to be processed according to a predefined operation rule or an artificial intelligence model stored in the memory. When the one or more processors are artificial-intelligence-dedicated processors, they may be designed with a hardware structure specialized for processing a specific artificial intelligence model.

The predefined operation rule or artificial intelligence model is created through training. Here, being created through training means that a basic artificial intelligence model is trained by a learning algorithm using a plurality of pieces of training data, so that a predefined operation rule or artificial intelligence model configured to perform a desired characteristic (or purpose) is created. Such training may be performed in the device itself on which the artificial intelligence according to the present disclosure runs, or through a separate server and/or system. Examples of the learning algorithm include supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning, but are not limited thereto.

The artificial intelligence model may be composed of a plurality of neural network layers. Each neural network layer has a plurality of weight values and performs a neural network operation through an operation between the output of the previous layer and its weights. The weights of the neural network layers may be optimized by training the artificial intelligence model; for example, the weights may be updated during training so that a loss value or cost value obtained from the model is reduced or minimized. The artificial neural network may include a deep neural network (DNN), for example, a convolutional neural network (CNN), a recurrent neural network (RNN), a restricted Boltzmann machine (RBM), a deep belief network (DBN), a bidirectional recurrent deep neural network (BRDNN), or a deep Q-network, but is not limited thereto.
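As a simplified illustration of the weight optimization described above, the sketch below trains a single fully connected layer, in which each output is a weighted sum of the previous layer's outputs, with plain gradient descent on a mean-squared-error loss. The layer size, learning rate, and data are arbitrary assumptions; practical ASR and text correction models stack many such layers with nonlinearities and are trained with dedicated frameworks.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def layer_forward(inputs: Vector, weights: Matrix) -> Vector:
    """Fully connected layer: each output is a weighted sum of the previous layer's outputs."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def mse_loss(pred: Vector, target: Vector) -> float:
    """Mean squared error between the layer output and the target."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def sgd_step(inputs: Vector, weights: Matrix, target: Vector, lr: float = 0.1) -> Matrix:
    """Return the weights after one gradient-descent step that reduces the MSE loss."""
    pred = layer_forward(inputs, weights)
    n = len(pred)
    # dLoss/dW[i][j] = 2 * (pred[i] - target[i]) * inputs[j] / n
    return [
        [w - lr * 2 * (pred[i] - target[i]) * inputs[j] / n for j, w in enumerate(row)]
        for i, row in enumerate(weights)
    ]


random.seed(0)
x = [1.0, 0.5]   # output of the previous layer (toy values)
y = [0.3, -0.2]  # desired output (toy values)
W = [[random.uniform(-1.0, 1.0) for _ in x] for _ in y]  # 2x2 weight matrix

before = mse_loss(layer_forward(x, W), y)
for _ in range(50):
    W = sgd_step(x, W, y)
after = mse_loss(layer_forward(x, W), y)
print(f"loss before: {before:.4f}, after: {after:.4f}")
```

Because every step moves the weights against the gradient of the loss, the loss on this toy problem shrinks toward zero over the 50 updates.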

An embodiment of the present disclosure may also be implemented in the form of a recording medium containing computer-executable instructions, such as program modules executed by a computer. Computer-readable media may be any available media that can be accessed by a computer and include both volatile and nonvolatile media and both removable and non-removable media. Computer-readable media may also include computer storage media and communication media. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. Communication media typically include computer-readable instructions, data structures, program modules, or other data in a modulated data signal.

In addition, in the present specification, a “unit” may be a hardware component such as a processor or a circuit, and/or a software component executed by a hardware component such as a processor.

In addition, in the present specification, “including at least one of a, b, or c” may mean including only a, only b, only c, a and b, b and c, a and c, or all of a, b, and c.

The foregoing description of the present disclosure is for illustrative purposes, and those of ordinary skill in the art to which the present disclosure pertains will understand that it can easily be modified into other specific forms without changing its technical spirit or essential features. Therefore, the embodiments described above should be understood as illustrative in all respects and not limiting. For example, each component described as a single unit may be implemented in a distributed manner, and likewise, components described as distributed may be implemented in a combined form.

The scope of the present disclosure is defined by the claims below rather than by the detailed description, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present disclosure.

Claims

A method, performed by a server, of correcting a speech recognition result provided from a device, the method comprising:
receiving, from the device, text output from an automatic speech recognition (ASR) model of the device;
identifying at least one domain related to the received text;
selecting, from among a plurality of text correction models included in the server, at least one text correction model corresponding to the identified at least one domain;
correcting the received text by using the selected at least one text correction model; and
providing the corrected text to the device,
wherein the text correction model is an artificial intelligence model that receives the received text as input and outputs corrected speech recognition text.

The method of claim 1,
wherein the text correction model is an artificial intelligence model trained by using text output from an ASR model and ground-truth text of a preset domain.

The method of claim 2,
wherein the text correction model is an artificial intelligence model trained by using text output from a plurality of types of ASR models.

The method of claim 1,
wherein the plurality of text correction models correspond to a plurality of domains, respectively, and
the selecting of the at least one text correction model comprises selecting, from among the plurality of text correction models corresponding to the plurality of domains, the text correction model corresponding to the identified domain.

The method of claim 1,
wherein the receiving of the text comprises receiving a text stream output from the ASR model, and
the identifying of the domain comprises identifying a domain related to the received text stream by accumulating the text stream in units of a predetermined section.

The method of claim 5,
wherein the identifying of the domain comprises identifying the domain related to the received text stream by calculating a domain reliability of the text stream accumulated in units of the predetermined section.

The method of claim 1,
wherein the identifying of the domain comprises dividing the text into a plurality of sections and identifying a plurality of domains related to the plurality of sections, respectively,
the selecting of the text correction model comprises selecting a plurality of text correction models corresponding to the identified plurality of domains, and
the correcting of the text comprises correcting the text of each of the plurality of sections by using the selected plurality of text correction models.

The method of claim 1,
further comprising:
receiving, from the device, a first domain reliability of the text calculated by the device; and
calculating a second domain reliability of the received text,
wherein the identifying of the at least one domain comprises identifying a domain related to the received text based on the received first domain reliability and the calculated second domain reliability.

The method of claim 8,
further comprising selecting one of a plurality of domain identification modules in the server based on the first domain reliability,
wherein the calculating of the second domain reliability comprises calculating the second domain reliability of the text by using the selected domain identification module.

The method of claim 1,
wherein the ASR model of the device is an end-to-end ASR model, and
the text correction model of the server is a sequence-to-sequence model for speech recognition.

A server for correcting a speech recognition result provided from a device, the server comprising:
a communication interface;
a storage unit storing a program including one or more instructions; and
a processor configured to execute the one or more instructions of the program stored in the storage unit,
wherein the processor is configured to receive, from the device, text output from an automatic speech recognition (ASR) model of the device, identify at least one domain related to the received text, select, from among a plurality of text correction models included in the server, at least one text correction model corresponding to the identified at least one domain, correct the received text by using the selected at least one text correction model, and provide the corrected text to the device, and
wherein the text correction model is an artificial intelligence model that receives the received text as input and outputs corrected speech recognition text.

The server of claim 11,
wherein the text correction model is an artificial intelligence model trained by using text output from an ASR model and ground-truth text of a preset domain.

The server of claim 12,
wherein the text correction model is an artificial intelligence model trained by using text output from a plurality of types of ASR models.

The server of claim 11,
wherein the plurality of text correction models correspond to a plurality of domains, respectively, and
the processor is configured to execute the one or more instructions to select, from among the plurality of text correction models corresponding to the plurality of domains, the text correction model corresponding to the identified domain.

The server of claim 11,
wherein the processor is configured to execute the one or more instructions to receive a text stream output from the ASR model and to identify a domain related to the received text stream by accumulating the text stream in units of a predetermined section.

The server of claim 15,
wherein the processor is configured to execute the one or more instructions to identify the domain related to the received text stream by calculating a domain reliability of the text stream accumulated in units of the predetermined section.

The server of claim 11,
wherein the processor is configured to execute the one or more instructions to divide the text into a plurality of sections, identify a plurality of domains related to the plurality of sections, respectively, select a plurality of text correction models corresponding to the identified plurality of domains, and correct the text of each of the plurality of sections by using the selected plurality of text correction models.

The server of claim 11,
wherein the processor is configured to execute the one or more instructions to receive, from the device, a first domain reliability of the text calculated by the device, calculate a second domain reliability of the received text, and identify a domain related to the received text based on the received first domain reliability and the calculated second domain reliability.

The server of claim 18,
wherein the processor is configured to execute the one or more instructions to select one of a plurality of domain identification modules in the server based on the first domain reliability, and to calculate the second domain reliability of the text by using the selected domain identification module.

The server of claim 11,
wherein the ASR model of the device is an end-to-end ASR model, and
the text correction model of the server is a sequence-to-sequence model for speech recognition.
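The server-side flow recited in the claims (receive ASR text, identify a domain, optionally by accumulating a text stream section by section or by combining a device-side first domain reliability with a server-side second domain reliability, then select and apply the matching per-domain text correction model) can be illustrated with the following minimal Python sketch. Everything concrete in it is a hypothetical assumption rather than the disclosed implementation: the keyword-based `domain_reliability` scorer, the weighted-average `fuse` rule and its `alpha`, the `section_words` and `threshold` parameters, and the phrase-substitution functions standing in for trained sequence-to-sequence correction models.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical keyword lists standing in for a server-side domain identification module.
DOMAIN_KEYWORDS: Dict[str, List[str]] = {
    "music": ["play", "song", "album"],
    "weather": ["weather", "rain", "forecast"],
}

# Hypothetical per-domain "correction models": phrase substitutions standing in
# for trained sequence-to-sequence text correction models.
CORRECTION_MODELS: Dict[str, Callable[[str], str]] = {
    "music": lambda t: t.replace("play the beetles", "play the beatles"),
    "weather": lambda t: t.replace("whether forecast", "weather forecast"),
}


def domain_reliability(text: str) -> Dict[str, float]:
    """Server-side (second) domain reliability: each domain's share of keyword hits."""
    words = text.lower().split()
    hits = {d: sum(w in kws for w in words) for d, kws in DOMAIN_KEYWORDS.items()}
    total = sum(hits.values())
    return {d: (h / total if total else 0.0) for d, h in hits.items()}


def fuse(first: Dict[str, float], second: Dict[str, float], alpha: float = 0.5) -> Dict[str, float]:
    """Combine device (first) and server (second) reliabilities; weighted average is an assumption."""
    domains = sorted(set(first) | set(second))  # sorted for deterministic tie-breaking
    return {d: alpha * first.get(d, 0.0) + (1 - alpha) * second.get(d, 0.0) for d in domains}


def correct_text(text: str, device_reliability: Dict[str, float]) -> str:
    """Identify the most reliable domain, then apply that domain's correction model."""
    fused = fuse(device_reliability, domain_reliability(text))
    domain = max(fused, key=fused.get)
    if fused[domain] == 0.0:  # no domain evidence: return the ASR text unchanged
        return text
    model = CORRECTION_MODELS.get(domain)
    return model(text) if model else text


def correct_sections(sections: List[str]) -> List[str]:
    """Per-section variant: identify a domain and correct each section separately."""
    return [correct_text(s, device_reliability={}) for s in sections]


class StreamingDomainIdentifier:
    """Accumulates an ASR text stream and re-scores the domain after each full section."""

    def __init__(self, section_words: int = 4, threshold: float = 0.6):
        self.section_words = section_words  # assumed section length, in words
        self.threshold = threshold  # assumed minimum reliability to commit to a domain
        self.buffer: List[str] = []
        self.scored_sections = 0
        self.domain: Optional[str] = None

    def push(self, chunk: str) -> Optional[str]:
        self.buffer.extend(chunk.split())
        sections = len(self.buffer) // self.section_words
        if sections > self.scored_sections:  # another full section has accumulated
            self.scored_sections = sections
            scores = domain_reliability(" ".join(self.buffer))
            best = max(scores, key=scores.get)
            if scores[best] >= self.threshold:
                self.domain = best
        return self.domain


print(correct_text("play the beetles", {"music": 0.7, "weather": 0.3}))  # play the beatles
ident = StreamingDomainIdentifier(section_words=4)
for chunk in ["what is", "the forecast", "for tomorrow"]:
    print(ident.push(chunk))  # None, then weather, then weather
```

Keeping one correction model per domain lets each model specialize on the vocabulary of that domain, which is the motivation for selecting a model only after the domain has been identified; the fusion rule and stream sectioning are the parts most likely to differ in a real system.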