KR102395164B1

KR102395164B1 - Method and apparatus for providing speech based conversation service

Info

Publication number: KR102395164B1
Application number: KR1020200078353A
Authority: KR
Inventors: 안민지; 윤설화; 임정보; 홍창득
Original assignee: 카티어스 주식회사
Priority date: 2020-06-26
Filing date: 2020-06-26
Publication date: 2022-05-11
Also published as: KR102475038B1; KR20220000565A; KR20220066228A

Abstract

A method for providing a voice-based conversation service is provided. A method according to an embodiment of the present invention includes: acquiring an audio input from a user device; identifying the meaning of the audio input; based on a determination that identification of the meaning has failed, receiving the meaning from an administrator of the user device; and storing the received meaning.

Description

Method and apparatus for providing voice-based conversation service

The present invention relates to a method and apparatus for providing a voice-based conversation service. More particularly, it relates to a method and apparatus for efficiently providing a voice-based conversation service to infants, toddlers, or preschool children, whose language proficiency is lower than that of adults.

Advances in machine learning based on artificial neural networks have strongly influenced technologies that provide appropriate answers to questions and various related information through conversation with people, such as voice-based conversation services and text-based chatbot services. In particular, machine learning has accelerated progress in component technologies such as text generation, natural language processing, speech recognition, and speech synthesis, and has contributed greatly to the spread into everyday life of devices such as AI speakers that provide voice assistant services or voice-based conversation services.

In particular, voice assistants and voice-based conversation services are evolving beyond handling user requests, such as providing information on request or controlling other devices so that they operate as the user wishes, to the point of meeting needs that the user has not expressed. Examples include providing information the user needs to know at the right time without an explicit request, and providing entertainment content or emotionally supportive services to relieve the user's boredom or loneliness.

Meanwhile, the performance of a machine-learning-based speech recognition and natural language processing model can vary with how well it has been trained on a given language as a whole and on a given person's speech. For the world's major languages, however, the recognition rate for adult speech has recently improved greatly, to the point where fairly natural service can be provided for frequently used subject areas.

By contrast, several difficulties remain in providing voice-based services to children, including infants and toddlers. Children's pronunciation is imperfect because their vocal organs are not fully developed, their vocabulary and command of language fall short of adults', and they sometimes use peer vocabulary that adults do not use or words of their own that other children do not use. This makes it hard to recognize a child's utterance correctly and grasp its underlying meaning accurately, and therefore hard to provide a smooth voice-based conversation service.

In particular, the utterances and vocabulary of young children are often incomprehensible to anyone other than parents or other guardians and family members living with them. Consequently, even if a voice-based conversation service provider has people listen to the children's utterances contained in conversation logs collected in the field in order to train the conversation engine by supervised learning, it is often difficult to understand what the child speaker meant.

However, no technology or service that solves these problems has been provided so far.

Korean Patent Application Publication No. 10-2017-0034154 (published March 28, 2017)

A technical problem to be solved by some embodiments of the present invention is to provide a method and apparatus for efficiently providing a voice-based conversation service for children.

Another technical problem to be solved by some embodiments of the present invention is to provide a method and apparatus for providing a voice-based conversation service that can respond by correctly understanding the utterances of a child user and the meaning of the vocabulary the child uses.

Yet another technical problem to be solved by some embodiments of the present invention is to provide a method and apparatus for providing a voice-based conversation service that can build a strong rapport with a child user and thereby increase the time the child spends using the service.

Yet another technical problem to be solved by some embodiments of the present invention is to provide a method and apparatus for providing a voice-based conversation service that can improve a child user's vocabulary habits and notify a guardian.

The technical problems of the present invention are not limited to those mentioned above, and other technical problems not mentioned will be clearly understood by those skilled in the art from the description below.

To solve the above technical problem, a method for providing a voice-based conversation service according to an embodiment of the present invention includes: acquiring an audio input from a user device; identifying the meaning of the audio input; based on a determination that identification of the meaning has failed, receiving the meaning from an administrator of the user device; and storing the received meaning.

In one embodiment, identifying the meaning of the audio input includes comparing a user-spoken word recognized from the audio input with dictionary-listed words contained in a dictionary, and storing the received meaning includes adding the user-spoken word to the dictionary.
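The flow described above (look the word up, ask the administrator on failure, store the answer) can be sketched as follows. This is a minimal illustration that assumes speech recognition has already produced a text word; the names `identify_meaning`, `handle_spoken_word`, and `ask_administrator` are hypothetical and do not come from the patent.

```python
from typing import Callable, Dict, Optional


def identify_meaning(word: str, dictionary: Dict[str, str]) -> Optional[str]:
    """Return the stored meaning of a recognized word, or None if unknown."""
    return dictionary.get(word)


def handle_spoken_word(
    word: str,
    dictionary: Dict[str, str],
    ask_administrator: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Identify a word; on failure, ask the administrator and store the answer."""
    meaning = identify_meaning(word, dictionary)
    if meaning is not None:
        return meaning
    # Identification failed: fall back to the administrator (for example a parent).
    meaning = ask_administrator(word)
    if meaning is not None:
        dictionary[word] = meaning  # store it so the next lookup succeeds
    return meaning


# Hypothetical data: a dict stands in for a real administrator prompt.
dictionary = {"ice cream": "ice cream"}
answers = {"akim": "ice cream"}
first = handle_spoken_word("akim", dictionary, answers.get)
second = handle_spoken_word("akim", dictionary, lambda w: None)
```

In this sketch the second call no longer needs the administrator, because the first call stored the word.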

In one embodiment, comparing the user-spoken word with the dictionary-listed words includes comparing the user-spoken word with words in a first dictionary and comparing the user-spoken word with words in a second dictionary personalized to the user device, and adding the user-spoken word to the dictionary includes adding the user-spoken word to the second dictionary.

In one embodiment, comparing the user-spoken word with the dictionary-listed words may further include comparing the user-spoken word with words in a third dictionary personalized to another user device distinct from the user device.

In one embodiment, the speaker of the other user device is a user belonging to the same speaker group as the speaker of the user device, and the speaker group may be determined based on at least one of the speaker's age and language proficiency level.
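One way to picture the three dictionaries is as a layered lookup: the first (shared) dictionary, then the second dictionary personalized to this device, then third dictionaries of other devices whose speakers are in the same group. The sketch below is only an assumption about how that could look; `Device`, `age_group`, the age bands, and the lookup order are illustrative and are not specified in the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Device:
    device_id: str
    speaker_age: int
    personal_dictionary: Dict[str, str] = field(default_factory=dict)


def age_group(age: int) -> str:
    """Hypothetical speaker grouping by age only; the bands are illustrative."""
    if age < 3:
        return "toddler"
    if age < 7:
        return "preschool"
    return "older"


def lookup(
    word: str,
    first_dictionary: Dict[str, str],
    device: Device,
    other_devices: List[Device],
) -> Optional[Tuple[str, str]]:
    """Return (meaning, source), where source is 'first', 'second', or 'third'."""
    if word in first_dictionary:
        return first_dictionary[word], "first"
    if word in device.personal_dictionary:
        return device.personal_dictionary[word], "second"
    group = age_group(device.speaker_age)
    for other in other_devices:
        if other.device_id == device.device_id:
            continue  # the third dictionary belongs to a different device
        if age_group(other.speaker_age) != group:
            continue  # only consult speakers in the same group
        if word in other.personal_dictionary:
            return other.personal_dictionary[word], "third"
    return None
```

A match found only in a third dictionary is a guess about this speaker, so a real system would likely confirm it with the administrator before relying on it.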

In one embodiment, requesting information related to the user-spoken word from the administrator includes providing the administrator with at least part of the conversation history generated through the user device, and receiving the meaning of the user-spoken word from the administrator.

In one embodiment, requesting information related to the user-spoken word from the administrator includes providing the administrator with a word that matches the user-spoken word from among the words in a dictionary personalized to another user device distinct from the user device.

In one embodiment, requesting information related to the user-spoken word from the administrator includes making the request based on a determination that the speaker of the user device has uttered the user-spoken word more than a preset number of times.
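The count-based trigger can be sketched with a simple counter. The class name `UnknownWordTracker`, the default threshold, and the choice to ask only once per word are assumptions made for illustration; the text only states that the request is made once a preset count is exceeded.

```python
from collections import Counter


class UnknownWordTracker:
    """Counts utterances of unidentified words and flags when to ask the administrator."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold      # the preset number of times (assumed value)
        self.counts: Counter = Counter()
        self.requested: set = set()     # words already sent to the administrator

    def record(self, word: str) -> bool:
        """Count one utterance; return True when a request should be sent now."""
        self.counts[word] += 1
        # "More than a preset number of times" is read here as count > threshold.
        if self.counts[word] > self.threshold and word not in self.requested:
            self.requested.add(word)
            return True
        return False
```

Waiting for repeated use keeps one-off mispronunciations or recognition errors from generating requests to the administrator.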

In one embodiment, requesting information related to the user-spoken word from the administrator includes receiving an input indicating whether the voice-based conversation service should guide the speaker of the user device to correct the user-spoken word to a dictionary-listed word.

In one embodiment, the method includes, based on a determination that identification of the meaning has succeeded, configuring the voice-based conversation service to use the user-spoken word in response utterances provided to the user device.

In one embodiment, the method includes, based on a determination that identification of the meaning has succeeded, configuring the voice-based conversation service to replace, in content it provides to the user device, the word corresponding to the meaning with the user-spoken word.
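Replacing dictionary words in content with the speaker's own words can be sketched as a whole-word text substitution. The function `personalize_content` and the sample mapping are hypothetical; the sketch is case-sensitive and uses English word boundaries, which would need adapting for Korean text.

```python
import re
from typing import Dict


def personalize_content(text: str, replacements: Dict[str, str]) -> str:
    """Replace each dictionary word in text with the speaker's own word."""
    if not replacements:
        return text
    # Try longer phrases first so "ice cream" wins over a shorter overlapping key.
    keys = sorted(replacements, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda m: replacements[m.group(1)], text)


story = "Grandmother bought ice cream."
personalized = personalize_content(
    story, {"ice cream": "akim", "Grandmother": "Nana"}
)
```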

In one embodiment, the method further includes: determining a level of the speaker of the user device based on a determination that identification of the meaning has succeeded; based on a determination that the speaker is at a first level, configuring the voice-based conversation service to be provided to the user device using the user-spoken word; and based on a determination that the speaker is at a second level, configuring the service to provide the user device with a response utterance that guides the speaker to correct the user-spoken word to the dictionary-listed word matching it.

In one embodiment, the method further includes, based on a determination that the speaker is at a third level, configuring the voice-based conversation service to be provided to the user device using a mix of the user-spoken word and the matched dictionary-listed word.
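The three levels amount to a small response policy. The sketch below assumes one possible reading: the first level echoes the speaker's word, the second nudges toward the dictionary word, and the third alternates between the two by turn. The enum `SpeakerLevel`, the function `build_response`, the sentence templates, and the alternation rule are all illustrative assumptions.

```python
from enum import Enum


class SpeakerLevel(Enum):
    FIRST = 1   # use the speaker's own word
    SECOND = 2  # guide the speaker toward the dictionary word
    THIRD = 3   # mix both words


def build_response(
    level: SpeakerLevel, spoken_word: str, dictionary_word: str, turn: int = 0
) -> str:
    """Build a reply whose wording depends on the speaker's level."""
    if level is SpeakerLevel.FIRST:
        return f"Do you want some {spoken_word}?"
    if level is SpeakerLevel.SECOND:
        return f"You mean {dictionary_word}? Let's say it together: {dictionary_word}."
    # THIRD: alternate between the two words on successive turns (assumed rule).
    word = spoken_word if turn % 2 == 0 else dictionary_word
    return f"Do you want some {word}?"
```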

In one embodiment, determining the level of the speaker includes determining the level based on at least one of the speaker's age and language proficiency level.

In one embodiment, the method further includes providing the administrator of the user device with information on how frequently, during a preset period, the speaker of the user device used the user-spoken word and the dictionary-listed word matching it, respectively.
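The frequency information could be computed from a dated utterance log. The sketch below assumes a log of `(date, word)` pairs and an inclusive date range; `usage_report` and the log format are hypothetical, not part of the patent.

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable, Tuple


def usage_report(
    log: Iterable[Tuple[date, str]],
    spoken_word: str,
    dictionary_word: str,
    start: date,
    end: date,
) -> Dict[str, int]:
    """Count uses of each word between start and end (both inclusive)."""
    counts = Counter(
        word
        for day, word in log
        if start <= day <= end and word in (spoken_word, dictionary_word)
    )
    return {
        spoken_word: counts[spoken_word],
        dictionary_word: counts[dictionary_word],
    }


# Hypothetical log for one speaker.
log = [
    (date(2020, 6, 1), "akim"),
    (date(2020, 6, 2), "akim"),
    (date(2020, 6, 3), "ice cream"),
    (date(2020, 7, 1), "ice cream"),  # outside the reporting period below
]
report = usage_report(log, "akim", "ice cream", date(2020, 6, 1), date(2020, 6, 30))
```

Comparing the two counts over successive periods would let a guardian see whether the child is shifting toward the dictionary word.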

In one embodiment, identifying the meaning of the audio input includes recognizing a user-spoken word from the audio input; receiving the meaning from the administrator of the user device, based on a determination that identification of the meaning has failed, includes providing the audio input to the administrator; and storing the received meaning includes updating an audio recognition model based on the information received from the administrator.

To solve the above technical problem, an apparatus for providing a voice-based conversation service according to another embodiment of the present invention performs: acquiring an audio input from a user device; identifying the meaning of the audio input; based on a determination that identification of the meaning has failed, receiving the meaning from an administrator of the user device; and storing the received meaning.

To solve the above technical problem, a non-transitory computer-readable recording medium according to yet another embodiment of the present invention stores a computer program that causes a computer to perform a method including: acquiring an audio input from a user device; identifying the meaning of the audio input; based on a determination that identification of the meaning has failed, receiving the meaning from an administrator of the user device; and storing the received meaning.

FIG. 1 is a diagram illustrating an exemplary system in which a voice-based conversation service according to an embodiment of the present invention can be provided.
FIG. 2 is a block diagram of an apparatus for providing a voice-based conversation service according to another embodiment of the present invention.
FIG. 3 is a flowchart of a method for providing a voice-based conversation service according to yet another embodiment of the present invention.
FIG. 4 is a diagram explaining in more detail some steps of the method for providing a voice-based conversation service described with reference to FIG. 3.
FIGS. 5 and 6 are diagrams explaining in more detail some of the steps described with reference to FIG. 4.
FIG. 7 is a diagram explaining in more detail some of the steps described with reference to FIG. 6.
FIGS. 8 and 9 are diagrams explaining in more detail some of the steps described with reference to FIG. 4.
FIG. 10 is a diagram explaining the hardware configuration of an apparatus for providing a voice-based conversation service according to some embodiments of the present invention.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings. The advantages and features of the present invention, and the methods of achieving them, will become clear from the embodiments described in detail below together with the accompanying drawings. The technical idea of the present invention is not limited to the following embodiments, however, and may be implemented in many different forms. The embodiments are provided only to make the disclosure of the technical idea complete and to fully convey the scope of the invention to those of ordinary skill in the art to which it belongs; the technical idea of the present invention is defined only by the scope of the claims.

In assigning reference numerals to the components in each drawing, note that the same components are given the same numerals wherever possible, even when they appear in different drawings. In describing the present invention, detailed descriptions of related known configurations or functions are omitted where they could obscure the gist of the invention.

Unless otherwise defined, all terms used in this specification (including technical and scientific terms) have the meanings commonly understood by those of ordinary skill in the art to which the present invention belongs. Terms defined in commonly used dictionaries are not to be interpreted in an idealized or excessive sense unless they are clearly and specifically defined. The terminology used in this specification is for describing the embodiments and is not intended to limit the present invention. In this specification, the singular includes the plural unless the context specifically indicates otherwise.

In describing the components of the present invention, terms such as first, second, A, B, (a), and (b) may be used. These terms serve only to distinguish one component from another and do not limit the nature, sequence, or order of the components. When a component is described as being "connected", "coupled", or "linked" to another component, it may be directly connected or linked to that other component, but it should be understood that yet another component may also be "connected", "coupled", or "linked" between them.

As used in this specification, "comprises" and/or "comprising" do not exclude the presence or addition of one or more components, steps, operations, and/or elements other than those stated.

Hereinafter, some embodiments of the present invention are described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram illustrating an exemplary system in which a voice-based conversation service according to an embodiment of the present invention can be provided.

As shown in FIG. 1, the voice-based conversation service providing system includes a voice-based conversation service providing server 1 (hereinafter, the "server") and a plurality of user devices 11a to 11d. In one embodiment, the system may further include a separate administrator device 15 that manages a user device and/or an external content providing device 3 connected to the server 1.

In this embodiment, the user devices 11a to 11d include, for example, AI speakers; smart devices that have a voice assistant or virtual assistant function and audio input/output capability; toys or dolls; smart watches; smart displays; and other home appliances. The user devices 11a to 11d can interact with the user by voice through audio input/output devices such as a speaker and a microphone, and can additionally use video input/output devices such as a camera and a display. The user devices 11a to 11d can pass voice or video data received from the user to the server 1, and can obtain responses or content in various formats, including audio and video, from the server 1 and present them to the user.

In this embodiment, the server 1 analyzes voice or video data received from the user devices 11a to 11d and provides an appropriate response to them. In addition to content it holds itself, the server 1 may receive content from a separate content providing device 3 managed by an external content provider and deliver it to the user devices 11a to 11d. The server 1 may include various components for analyzing input from the user devices 11a to 11d and providing an appropriate conversation service based on the result; these are described in more detail with reference to FIG. 2.

In this embodiment, the server 1 and the user devices 11a to 11d may be interconnected through a local area network (LAN), a wireless network, a wide area network (WAN), a mobile network, or the like.

In one embodiment, the user devices 11a to 11d may connect, over short-range wireless communication such as Wi-Fi, Bluetooth, Bluetooth Low Energy (BLE), Zigbee, Z-Wave, RFID, or NFC, to a device capable of medium- or long-range wireless communication, such as a smartphone or tablet PC (for example, the administrator device 15), and then connect through that device to the voice-based conversation service providing server 1. Building only a short-range communication module into the user devices 11a to 11d in this way has the advantage of reducing their manufacturing cost.

The configuration and operation of the voice-based conversation service providing apparatus 1 according to an embodiment of the present invention are described below with reference to FIG. 2. The apparatus 1 may be, for example, the server 1 of the voice-based conversation service providing system described with reference to FIG. 1.

As shown in FIG. 2, the voice-based conversation service providing apparatus 1 according to this embodiment may include a conversation engine 10, a dictionary management unit 20, an administrator device interface 30, a content management unit 40, a report generation unit 50, a content DB 60, a dictionary DB 70, and a conversation history DB 80.

FIG. 2 shows only the components relevant to this embodiment of the present invention. Those of ordinary skill in the art will therefore recognize that other general-purpose components may be included in addition to those shown in FIG. 2. Note also that the components of the apparatus 1 shown in FIG. 2 are functionally distinguished elements, and that several of them may be implemented in integrated form in an actual physical environment. Each component is described below.

The conversation engine 10 handles the overall conversation with the user device 11. It processes the conversation based on data retrieved from the dictionary management unit 20 and the content management unit 40, and on context information derived from the history of conversations with the user device 11 recorded in the conversation history DB 80. More specifically, it recognizes and analyzes user speech audio received from the user device 11, and generates and provides output data such as an appropriate response or content suited to the situation.

The dictionary management unit 20 manages the dictionaries used by the conversation service that the apparatus 1 provides, for example by looking up words in a dictionary and adding new words to it. Dictionary data may be recorded and stored in the dictionary DB 70.

The dictionary management unit 20 uses the dictionary DB 70 to determine the meaning of a word that the conversation engine 10 has recognized from user speech audio, and returns the result to the conversation engine 10. When a word occurs whose meaning cannot be determined from the dictionary DB 70, the dictionary management unit 20 passes the word to the administrator device interface 30, described below, so that its meaning can be obtained from the administrator of the user device 11.

The dictionary management unit 20 may use and manage a plurality of dictionaries: for example, a general vocabulary dictionary; a common dictionary containing only the vocabulary used in the conversation service or words matching the topics the service offers; and dictionaries personalized to each of a plurality of users or user devices.

In this embodiment, a dictionary personalized to a user or user device may record the distinctive words used by a specific user and what those words mean.

A distinctive word used by a specific user may be, for example, a word commonly used by peers of a particular age group at a particular time (such as the name of a character appearing in children's content, or a recent abbreviation or slang term).

A distinctive word used by a specific user may also be a word that other users do not generally use and that only that user uses. For example, one user may call or pronounce ice cream "akkung" (아꿍) or "akim" (아킴), and another may call a grandmother "nana" and a grandfather "baba". The vocabulary used by infants and toddlers in particular varies widely from child to child, and the same word is often used with different meanings. Such words therefore need to be recorded and managed in a dictionary personalized to the specific user or to the user device that the user uses.

In addition, proper nouns such as the names of family members or pets may be recorded and managed in the dictionary personalized to the user or user device.

In this embodiment, by managing and using dictionaries personalized to a specific user or user device through the dictionary management unit 20 as described above, the service can lead the user to feel greater intimacy with the conversation service and ultimately improve user engagement with it.

Next, the administrator device interface 30 provides information to the administrator device 15 used by the administrator of a specific user device 11, and obtains from the administrator device 15 information needed for the conversation service. The administrator device 15 is, for example, a device used by a guardian, such as a parent of the child who uses the user device 11. Note that in this embodiment the administrator is not a staff member on the side of the voice-based conversation service provider, but the person who owns and/or manages the user device 11 used by the end user, such as the end user's parent or guardian.

Specifically, the administrator device interface 30 may provide the conversation history of the user (speaker) of the user device 11 to the administrator device 15, and receive from the administrator device 15 the meaning of a word used by the speaker. The administrator device interface 30 may also provide the administrator device 15 with an audio clip uttered by the speaker of the user device 11, and receive from the administrator device 15 the speaker's intent as understood from the audio clip, or the meaning of the sound contained in it.

Furthermore, the administrator device interface 30 may provide the administrator device 15 with a report indicating the language ability or language habits of the user (speaker) of the user device 11. For example, it may provide statistical data on the types and frequencies of the words the user (speaker) uses.

The report generation unit 50 may generate, from the conversation history DB 80, a report indicating the language ability or language habits of the user (speaker) of each user device 11, and provide it to the administrator device interface 30.

Next, the content management unit 40 provides content in response to a content request from the conversation engine 10. In this embodiment, content includes audio content such as audiobooks and podcasts, video content, interactive multimedia content, and the like. When the user of the user device 11 is a child, content tailored to children may be provided.

The content may be obtained from the content DB 60 of the voice-based conversation service providing apparatus 1. In one embodiment, the content may be obtained from an external content providing device 3. In one embodiment, the content DB 60 stores only index data containing the associations between topic words and content identifiers, and the content management unit 40 forwards content received from the content providing device 3 to the conversation engine 10.

The content management unit 40 may process content received from the content DB 60 and/or the content providing device 3 as appropriate before providing it to the conversation engine 10. For example, to lead the user (speaker) of the user device 11 to feel greater intimacy, the content may be processed by replacing the name of the main character with the speaker's name, or by replacing a general word in the content (e.g., "ice cream") with the speaker's own word (e.g., "akkung") recorded in the dictionary personalized to the user device 11, before it is provided to the conversation engine 10. In another embodiment, the conversation engine 10 may perform this processing.
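By way of non-limiting illustration, the substitution described above might be sketched as follows. The function name `personalize_content`, its parameters, the sample story, and the use of romanized English strings are all assumptions made for this sketch and do not appear in the specification.

```python
import re

def personalize_content(text: str, protagonist: str, speaker_name: str,
                        personal_words: dict[str, str]) -> str:
    """Swap the protagonist's name for the speaker's name, then swap
    general words for the speaker's own words (hypothetical helper)."""
    text = text.replace(protagonist, speaker_name)
    for general_word, personal_word in personal_words.items():
        # Whole-word match so that "ice cream" inside a longer token is untouched.
        pattern = rf"\b{re.escape(general_word)}\b"
        text = re.sub(pattern, personal_word, text)
    return text

story = "Poli bought ice cream. The ice cream was cold."
personalized = personalize_content(story, "Poli", "Mina", {"ice cream": "akkung"})
print(personalized)
```

Real Korean text would likely need morphological analysis rather than a word-boundary pattern, since particles attach directly to nouns; the sketch ignores this.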

Next, the operation of the voice-based conversation service providing apparatus 1 according to this embodiment is described.

First, the conversation engine 10 initiates a conversation with the speaker of the user device 11, and audio containing the user's utterance is obtained from the user device 11. The conversation engine 10 records at least part of the conversation exchanged with the user device 11 in the conversation history DB 80. The conversation engine 10 analyzes the user speech audio to identify the words it contains, and provides the identified user-uttered words to the dictionary management unit 20.

The dictionary management unit 20 compares the user-uttered word with the words stored in the dictionary DB 70 to determine its meaning. As described above, the dictionary management unit 20 may use and query one or more types of dictionaries. For example, it may query the general vocabulary dictionary, the common dictionary, the dictionary personalized to the user device 11, and dictionaries personalized to user devices other than the user device 11 to try to determine the meaning of the user-uttered word.

If at least one of these dictionaries contains an entry matching the user-uttered word, the dictionary management unit 20 can determine the meaning of the user-uttered word. In this case, the dictionary management unit 20 provides the meaning to the conversation engine 10, and the conversation engine 10 continues to provide the voice-based conversation service based on that meaning. For example, the conversation engine 10 may receive, through the content management unit 40, content matching a topic (e.g., snacks) to which the user-uttered word ("akkung", meaning ice cream) belongs, and provide it to the user device 11. The conversation engine 10 may also continue the conversation with the user device 11 while using the user-uttered word ("akkung") in its sentences. As another example, the conversation engine 10 may provide the user device 11 with content processed by replacing occurrences of the common noun "ice cream" in existing content with "akkung", so that the user of the user device 11 feels that the conversation service is more intimate and personalized.

If none of these dictionaries contains an entry matching the user-uttered word, the dictionary management unit 20 has failed, at least initially, to determine the meaning of the user-uttered word.

In this case, the dictionary management unit 20 may cause the conversation engine 10 to deliver to the user device 11 an utterance asking what the user-uttered word means, prompting the speaker of the user device 11 to answer, and thereby try to determine the meaning of the user-uttered word.

When a predetermined condition is satisfied (for example, when the cumulative number of times the user-uttered word has been used exceeds a threshold), the dictionary management unit 20 may also try to determine the meaning of the user-uttered word from the administrator device 15 through the administrator device interface 30. Note that the administrator here is the person who owns and/or manages the user device 11 from which the user-uttered word was obtained, such as a guardian of the speaker, and that the administrator device 15 is the device used by that administrator. The administrator device interface 30 may, for example, provide the administrator device 15 with part of the conversation record containing the user-uttered word, and receive the meaning of the user-uttered word from the administrator device 15. Based on the input from the administrator device 15, the dictionary management unit 20 may add the user-uttered word and its meaning to the dictionary DB 70.

The configuration and operation of the voice-based conversation service providing apparatus 1 according to this embodiment have been described above with reference to FIG. 2. A method for providing a voice-based conversation service according to another embodiment of the present invention is described below with reference to FIGS. 3 to 9.

FIG. 3 is a flowchart of the method for providing a voice-based conversation service according to this embodiment. This is only a preferred embodiment for achieving the object of the present invention, and some steps may of course be added or deleted as needed.

Each step of the method shown in FIG. 3 may be performed, for example, by the voice-based conversation service providing apparatus 1 described with reference to FIG. 2. In other words, each step of the method may be implemented as one or more instructions executed by a processor of a computing device such as the voice-based conversation service providing apparatus 1. First steps of the method may be performed by a first computing device, and second steps of the method may be performed by a second computing device. The description below assumes that each step of the method is performed by the voice-based conversation service providing apparatus 1. However, the entity performing each step is only an example, the present invention is not limited by the following description, and for convenience of explanation the entity performing some steps of the method may be omitted.

Even where not separately mentioned, the technical ideas of the embodiments described with reference to FIGS. 1 and 2 may of course be reflected in each operation of the method according to this embodiment. Conversely, the technical ideas reflected in each operation of the method according to this embodiment may likewise be reflected in the configuration and operation of the voice-based conversation service providing apparatus 1 described with reference to FIG. 2.

As shown in FIG. 3, the method for providing a voice-based conversation service according to this embodiment may include providing a greeting (S10), obtaining user speech audio from a user device (S20), analyzing the user speech audio (S30), and providing a response (S40).

First, in step S10, the voice-based conversation service providing apparatus 1 may provide a greeting to the user device 11. For example, when the user device 11 connects to the voice-based conversation service providing apparatus 1 and starts the conversation service for the first time after shipment, a greeting appropriate to that situation may be provided. Providing a greeting that reflects the situation, such as when the user device 11 is started for the first time that day or restarted after being paused, can give the speaker of the user device 11 a greater sense of intimacy.

Then, using a conversation model, a conversation may be attempted on a topic likely to interest the speaker. For example, a question that can attract the speaker's attention and draw out a conversation may be provided, such as "What's your favorite animal?" or "What flavor of ice cream do you like best?"

In step S20, user speech audio is obtained from the user device. For example, an answer to the above question may be obtained from the child using the user device 11.

In step S30, the user speech audio obtained in step S20 is analyzed. In step S30, operations may be performed such as converting the user speech audio into text, applying natural language processing to determine the sentence structure, and then looking up in a dictionary the meanings of the words in the sentence. Step S30 is described in more detail with reference to FIGS. 4 to 9.

In step S40, an appropriate response may be provided to the user based on the result of the analysis in step S30. For example, the conversation may be continued using the conversation model, or content matching the user's interests, as inferred from the user's utterance, may be provided to the user through the user device 11.

The process of analyzing the user speech audio in step S30 of FIG. 3 is described in more detail below with reference to FIG. 4.

Referring to FIG. 4, in step S31 the user speech audio is first converted into text (STT: speech to text), and in step S32 natural language processing (NLP) is applied to the converted text so that the sentence contained in the user speech audio is recognized as a structured collection of user-uttered words. Various known techniques and methods may be used for STT and NLP, and the present invention is not limited to any one of them. Steps S31 and S32 may be performed independently by, for example, the conversation engine 10 of the voice-based conversation service providing apparatus 1, or may be performed using an external service or external engine connected to the apparatus over a network. Although not shown in FIG. 4, in some embodiments of the present invention, if step S31 or step S32 fails, the process may proceed to the matching failure handling routine of step S35.

In step S33, the user-uttered word recognized in the preceding step is compared with the words recorded in the dictionaries to determine its meaning. In other words, it is determined whether a dictionary entry matching the user-uttered word exists. Step S33 is described in more detail later with reference to FIG. 5.

In step S34, the process branches to step S35 if no dictionary entry matching the user-uttered word exists, and to step S36 if a matching dictionary entry exists.

In steps S35 and S36, a routine for the case where the user-uttered word is not in the dictionaries and a routine for the case where it is are performed, respectively. These are described in more detail with reference to FIGS. 6 to 9.
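Purely as an illustrative sketch, the flow of steps S31 to S36 could be arranged as below. The STT, NLP, and dictionary lookup stages are passed in as callables because the specification does not fix any particular engine; the function name `analyze_utterance`, the returned step labels, and the choice to route to S35 as soon as any word is unknown are assumptions of this sketch.

```python
from typing import Callable, Optional

def analyze_utterance(
    audio: bytes,
    stt: Callable[[bytes], Optional[str]],
    nlp: Callable[[str], Optional[list[str]]],
    lookup: Callable[[str], Optional[str]],
) -> tuple[str, dict]:
    """Return ("S36", meanings) on a full match, otherwise ("S35", details)."""
    text = stt(audio)                      # S31: speech to text
    if not text:
        return "S35", {}                   # STT failure goes to the failure routine
    words = nlp(text)                      # S32: sentence to structured words
    if not words:
        return "S35", {}                   # NLP failure goes to the failure routine
    meanings = {w: lookup(w) for w in words}            # S33: dictionary comparison
    unknown = [w for w, m in meanings.items() if m is None]
    if unknown:                            # S34: branch on match result
        return "S35", {"unknown": unknown}
    return "S36", meanings

# Stand-in stages for demonstration only.
demo_stt = lambda audio: audio.decode("utf-8")
demo_nlp = lambda text: text.split()
demo_lookup = {"i": "first person", "like": "to enjoy"}.get
print(analyze_utterance(b"i like akim", demo_stt, demo_nlp, demo_lookup))
```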

The process of comparing the user-uttered word with the dictionaries in step S33 of FIG. 4 is described in more detail below with reference to FIG. 5.

In the method according to this embodiment, a plurality of dictionaries may be used, as in the embodiment described with reference to FIG. 2. The plurality of dictionaries includes, for example, a general vocabulary dictionary; a common dictionary containing only vocabulary used in the conversation service or words matching the topics offered by the conversation service; and dictionaries personalized to each of a plurality of users or user devices.

In step S331, the user-uttered word is looked up in the general vocabulary dictionary and/or the common dictionary. In other words, the user-uttered word is compared with the words recorded in the general vocabulary dictionary and/or the common dictionary, and it is determined whether a matching dictionary entry exists.

In step S332, the user-uttered word is looked up in the dictionary personalized to the current user device, and it is determined whether a matching dictionary entry exists. The dictionary personalized to the user device may be a dictionary personalized to user device #1, with which the conversation is in progress, or to its speaker #1. It may record the words used uniquely by user device #1, on which the user-uttered word was input, or by speaker #1, who uses that device.

In step S333, the user-uttered word is looked up in dictionaries personalized to user devices other than user device #1 (user devices #2, #3, ...), and it is determined whether a matching dictionary entry exists. That is, it is checked whether the user-uttered word exists in dictionaries personalized to the user devices (devices #2, #3, ...) of other speakers (speakers #2, #3, ...), rather than to user device #1 of speaker #1 who uttered the word.

The personalized dictionaries of other user devices used here may be dictionaries personalized to the devices of speakers belonging to the same speaker group as speaker #1 of user device #1.

A speaker group may be determined based on the speaker's area of residence, age, language proficiency level, content consumption preferences, and the like. For example, if speaker #1 is a 34-month-old child living in the capital area whose language proficiency is at the first of three levels, dictionaries personalized to the devices of other speakers who live in the capital area and have a similar age and language proficiency level may be used preferentially. As another example, if speaker #1 is a 45-month-old child living in the capital area who enjoys watching the animation Robocar Poli, dictionaries personalized to the devices of children with a similar residence and age who enjoy similar content may be used preferentially.

In other words, to determine the meaning of a word uttered by speaker #1 of user device #1, the words used by other speakers with a similar residence, age, language proficiency level, and/or preferences are consulted, which can improve the efficiency and accuracy of determining the meaning of the user-uttered word.
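As a minimal sketch of how such peer dictionaries might be selected, the following groups speakers by region, language proficiency level, and age proximity. The `SpeakerProfile` fields, the six-month age window, and the function names are hypothetical; the specification does not define how similarity is measured.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SpeakerProfile:
    speaker_id: str
    region: str
    age_months: int
    language_level: int  # 1 to 3, as in the example above

def same_group(a: SpeakerProfile, b: SpeakerProfile, max_age_gap_months: int = 6) -> bool:
    """Assumed grouping rule: same region, same level, ages within a window."""
    return (a.region == b.region
            and a.language_level == b.language_level
            and abs(a.age_months - b.age_months) <= max_age_gap_months)

def peer_dictionaries(current: SpeakerProfile,
                      others: list[SpeakerProfile],
                      dictionaries: dict[str, dict[str, str]]) -> list[dict[str, str]]:
    """Collect personalized dictionaries of other speakers in the same group."""
    return [dictionaries[o.speaker_id] for o in others
            if o.speaker_id != current.speaker_id
            and o.speaker_id in dictionaries
            and same_group(current, o)]

speaker1 = SpeakerProfile("s1", "capital", 34, 1)
speaker2 = SpeakerProfile("s2", "capital", 36, 1)
speaker3 = SpeakerProfile("s3", "capital", 60, 3)
dicts = {"s2": {"askim": "ice cream"}, "s3": {"nana": "grandmother"}}
peers = peer_dictionaries(speaker1, [speaker2, speaker3], dicts)
print(peers)
```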

In another embodiment, the personalized dictionaries of other user devices used in step S333 may be dictionaries personalized to the devices of speakers belonging to the same speaker group as speaker #1 of user device #1, the group being one whose members have similar content consumption preferences.

In step S334, the results of the lookups in steps S331 to S333 are combined, and if at least one matching dictionary entry was found, the final matching dictionary entry may be determined.

For example, if dictionary entries matching the user-uttered word were found in two or more dictionaries, the final matching entry may be determined according to priority. For example, an entry in the dictionary personalized to the user device may have the highest priority, an entry in the common dictionary the next highest, and entries in dictionaries personalized to other user devices the lowest. That is, the entry in the dictionary of the user who uttered the word takes precedence, and the meaning of that entry is used.
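The priority order in this example can be sketched as a lookup that stops at the first dictionary containing the word. The function name `resolve_meaning` and the plain `dict` representation are assumptions of this sketch; the general vocabulary dictionary is left out because its position in the order is not stated.

```python
from typing import Optional

def resolve_meaning(word: str,
                    own: dict[str, str],
                    common: dict[str, str],
                    peers: list[dict[str, str]]) -> Optional[str]:
    """Check the speaker's own dictionary first, then the common dictionary,
    then other devices' dictionaries; return None if nothing matches."""
    for dictionary in (own, common, *peers):
        if word in dictionary:
            return dictionary[word]
    return None

own_dict = {"nana": "grandmother"}
common_dict = {"nana": "banana", "poli": "a police car character"}
peer_dicts = [{"askim": "ice cream", "nana": "older sister"}]

print(resolve_meaning("nana", own_dict, common_dict, peer_dicts))
print(resolve_meaning("askim", own_dict, common_dict, peer_dicts))
```

A return value of `None` corresponds to the case where no dictionary contains the word, which leads to the matching failure routine of step S35.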

The final matching dictionary entry is passed to the matching success handling routine of step S36.

Meanwhile, it may be determined in step S334 that no dictionary entry matches the user-uttered word. In this case, the user-uttered word does not exist in any dictionary managed by the voice-based conversation service providing apparatus 1, and its meaning cannot be determined. If it is determined in step S334 that no dictionary entry matches the user-uttered word, the process branches from step S34 to step S35 and the matching failure handling routine proceeds.

The matching failure handling routine performed in step S35 of FIG. 4 is described in more detail below with reference to FIGS. 6 and 7. Reference is first made to FIG. 6.

If the meaning of the user-uttered word cannot be determined, an attempt is first made in step S351 to determine the meaning through the user device. The description continues with the example of a case where the meaning of the word "akim" uttered by the speaker of the user device 11 cannot be determined.

In step S351, the speaker may be asked a question through the user device 11 in order to determine the meaning of "akim". For example, by presenting the speaker with the question "Akim? I don't know the word akim. What does it mean?" through the user device 11, an attempt may be made to obtain the meaning of "akim" directly from the speaker.

In one embodiment, rather than asking about the meaning of "akim" with an open question, a candidate word for what "akim" means may be selected and the speaker asked, through the user device 11, whether that word is correct. For instance, the dictionary personalized to speaker #2 of another user device #2 may record that "askim" means "ice cream". In this case, "ice cream", the meaning of "askim", which is similar in form and pronunciation to "akim", may be selected as the candidate word, and the speaker of the user device 11 may be asked "Akim? Do you mean ice cream?", which is a more effective way to determine the meaning of the user-uttered word.
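One possible way to pick such a candidate, shown only as a sketch, is approximate string matching over the entries of other speakers' dictionaries. Here the standard-library `difflib.get_close_matches` stands in for whatever form and pronunciation similarity measure an implementation would use; the function name `candidate_question`, the 0.6 cutoff, and the romanized strings are assumptions.

```python
import difflib
from typing import Optional

def candidate_question(unknown_word: str,
                       peer_entries: dict[str, str],
                       cutoff: float = 0.6) -> Optional[str]:
    """Build a confirmation question from the closest peer entry, if any."""
    close = difflib.get_close_matches(unknown_word, list(peer_entries), n=1, cutoff=cutoff)
    if not close:
        return None  # no similar entry: fall back to an open question
    meaning = peer_entries[close[0]]
    return f"{unknown_word.capitalize()}? Do you mean {meaning}?"

peer_entries = {"askim": "ice cream", "nana": "grandmother"}
print(candidate_question("akim", peer_entries))
```

For Hangul input, comparing decomposed syllable components (jamo) would probably give better results than comparing whole syllable characters, which this sketch does not attempt.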

If the meaning of the user-uttered word is successfully identified through the user device 11 in step S351 (S352), it is recorded in the dictionary personalized to the user device 11, and the process proceeds to step S40 of FIG. 3 to continue providing the voice-based conversation service.

If the meaning of the user-uttered word is not identified through the user device 11 in step S351 (S352), the process proceeds to step S353, in which an attempt is made to identify the meaning through the administrator of the user device. Step S353 is described in more detail with reference to FIG. 7.
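The two-stage fallback of steps S351 to S353 can be summarized in a short sketch. The function name and the callback parameters are hypothetical, and the administrator is modeled as a synchronous callable purely for brevity; in the described system the request goes out as a message or push notification and the answer arrives later.

```python
def resolve_unknown_word(word, ask_speaker, ask_administrator, personal_dictionary):
    """Try the speaker first (S351); fall back to the administrator (S353).

    ask_speaker and ask_administrator are hypothetical callables that return
    the meaning as a string, or None when no meaning could be obtained.
    """
    meaning = ask_speaker(word)            # S351: ask through the user device
    if meaning is None:                    # S352: identification failed
        meaning = ask_administrator(word)  # S353: ask the administrator
    if meaning is not None:
        # Record the result in the dictionary personalized to the user device.
        personal_dictionary[word] = meaning
    return meaning


# Usage: the child cannot explain the word, but the parent can.
dictionary = {}
result = resolve_unknown_word(
    "akim",
    ask_speaker=lambda w: None,
    ask_administrator=lambda w: "ice cream",
    personal_dictionary=dictionary,
)
```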

Referring to FIG. 7, step S353 may include sub-steps such as: comparing the cumulative number of uses of the user-uttered word with a threshold (S3531); requesting information about the user-uttered word from the administrator of the user device (S3532); obtaining, from the administrator, the meaning of the user-uttered word and whether correction is desired (S3533); and adding the user-uttered word and its meaning to the dictionary personalized to the user device (S3534).

First, in step S3531, it is determined whether the cumulative number of times the user-uttered word has been used on the user device 11 exceeds a preset threshold. By checking this before requesting information about the user-uttered word from the administrator of the user device 11 in step S3532, described below, the inefficiency of bothering the administrator with requests about words that the speaker does not use repeatedly, or words that were simply mispronounced by mistake, can be prevented.
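A minimal sketch of the threshold check in step S3531 is shown below. The class name, method names, and the default threshold of 3 are assumptions made for illustration; the patent only states that the cumulative count is compared with a preset threshold.

```python
from collections import Counter


class UsageTracker:
    """Counts how often each unidentified word is uttered on one user device."""

    def __init__(self, threshold=3):
        # Hypothetical default; the patent leaves the threshold value open.
        self.threshold = threshold
        self.counts = Counter()

    def record(self, word):
        """Register one more use of the word and return the new count."""
        self.counts[word] += 1
        return self.counts[word]

    def should_ask_administrator(self, word):
        """S3531: ask only when the cumulative count exceeds the threshold."""
        return self.counts[word] > self.threshold


tracker = UsageTracker(threshold=3)
for _ in range(3):
    tracker.record("akim")
asked_after_three = tracker.should_ask_administrator("akim")
tracker.record("akim")
asked_after_four = tracker.should_ask_administrator("akim")
```

With a threshold of 3, the administrator is contacted only from the fourth use onward, so one-off slips never generate a request.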

In step S3532, information about the user-uttered word is requested from the administrator of the user device. As described above, the administrator refers to a person who owns and/or manages the user device 11 from which the user-uttered word was obtained. In many cases, a guardian of the child who mainly uses the user device 11 will be the administrator of the user device 11. The matching relationship between the user device 11 and the administrator, or between the speaker of the user device 11 and the administrator, may be recorded in and retrieved from a DB (not shown) in the voice-based conversation service providing apparatus 1.

In step S3532, the voice-based conversation service providing apparatus 1 may, for example, transmit a message or push notification through the administrator device interface 30 to the administrator device 15 used by the administrator, thereby requesting that the administrator of the user device enter information about the user-uttered word. The voice-based conversation service providing apparatus 1 may provide at least part of the conversation history conducted through the user device 11 via the UI of the administrator device 15, and prompt the administrator to enter the meaning that the user-uttered word has within the context identified from the conversation history.

In one embodiment, in step S3532, a candidate word corresponding to the user-uttered word may be presented to the administrator together with the conversation history. For example, if the meaning of the user-uttered word is recorded in a dictionary personalized to a user device other than the user device 11 (e.g., a device used by another child), that meaning may be presented to the administrator of the user device 11 as a clue to the meaning of the user-uttered word. The other user device may be a device used by a speaker belonging to the same speaker group as the speaker of the user device 11. That is, if a word corresponding to the user-uttered word exists among the words used by a speaker whose place of residence, age, language proficiency level, content consumption preferences, and the like are similar to those of the speaker of the user device 11, its meaning may be presented for the administrator's reference.

In step S3532, for example, a message such as "Your child used the word 'akkung' (아꿍). Do you know what it means? For reference, another child of a similar age uses 'akkung' to mean 'ice cream'." is provided to the administrator through the administrator device 15, so that the administrator can enter the meaning of the user-uttered word "akkung".
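The request sent to the administrator in step S3532 could be assembled roughly as follows. The function name, its parameters, and the exact message layout are hypothetical; the sketch only combines the three pieces of information mentioned above: the word, an optional hint from a speaker in the same group, and an excerpt of the conversation history.

```python
def build_admin_request(word, conversation_excerpt, peer_meaning=None):
    """Compose the text of a message or push notification for the administrator.

    peer_meaning is the meaning recorded for the same word in a dictionary
    personalized to another device in the same speaker group, if any.
    """
    lines = [f"Your child used the word '{word}'. Do you know what it means?"]
    if peer_meaning is not None:
        lines.append(
            f"For reference, another child of a similar age uses '{word}' "
            f"to mean '{peer_meaning}'."
        )
    if conversation_excerpt:
        # Give the administrator the context in which the word was used.
        lines.append("Recent conversation:")
        lines.extend(f"  {turn}" for turn in conversation_excerpt)
    return "\n".join(lines)


message = build_admin_request(
    "akkung",
    ["Child: I want akkung!", "Device: Akkung? What is that?"],
    peer_meaning="ice cream",
)
```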

In step S3533, the meaning of the user-uttered word may be obtained from the administrator through the administrator device 15. Because the meaning of a word used by the speaker (e.g., a child) of the user device 11 is provided by the administrator (e.g., the child's parent or guardian) of the user device 11, even the meanings of expressions unique to a particular child, which only the parent or guardian of that child would know, can be identified.

In one embodiment, step S3533 may additionally obtain information on whether the administrator wants the speaker of the user device 11 to be guided toward correcting the user-uttered word to the standard word. Depending on the age or language proficiency level of the child who is the speaker of the user device 11, parents and other guardians may differ in whether they prefer that the child's continued use of a speaker-specific, non-standard word be corrected, or that the child be allowed to keep using the speaker-specific word for some time. Their preference may also depend on what the user-uttered word is and what nuance it carries. In step S3533, an interface is provided through which the administrator can specify whether to guide correction of the user-uttered word to the standard word, so that the administrator can decide how the voice-based conversation service influences the language habits of the speaker of the user device 11.

In step S3534, the meaning of the user-uttered word received from the administrator through the administrator device 15 is recorded, together with the user-uttered word, in the dictionary personalized to the user device 11. When the same word is later uttered through the user device 11, its meaning can be identified in the dictionary lookup of step S33.
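One possible shape for the personalized dictionary that receives the result of steps S3533 and S3534 is sketched below. The class and field names are hypothetical, and storing the administrator's correction preference alongside the meaning is an assumption of this sketch rather than something the patent prescribes.

```python
from dataclasses import dataclass


@dataclass
class DictionaryEntry:
    meaning: str                           # standard word, e.g. "ice cream"
    encourage_standard_word: bool = False  # administrator's choice in S3533


class PersonalDictionary:
    """Dictionary personalized to one user device (hypothetical structure)."""

    def __init__(self):
        self._entries = {}

    def add(self, user_word, meaning, encourage_standard_word=False):
        """S3534: record the user-uttered word together with its meaning."""
        self._entries[user_word] = DictionaryEntry(meaning, encourage_standard_word)

    def lookup(self, user_word):
        """Dictionary lookup (as in step S33); returns None when not found."""
        return self._entries.get(user_word)


personal = PersonalDictionary()
personal.add("akkung", "ice cream", encourage_standard_word=True)
entry = personal.lookup("akkung")
```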

In some embodiments, in step S3532, an audio clip containing the user's utterance, rather than the user-uttered word, is provided to and played on the administrator device 15, and the administrator is asked to enter the meaning of the speaker's expression contained in the clip. Utterances by young children with inaccurate pronunciation are often not recognized by the audio recognition model that converts audio into text, and therefore cannot always be converted into a user-uttered word in text form. In such cases, providing the audio clip itself to the administrator (e.g., the speaker's parent) makes it possible to identify the meaning of even an utterance that could not be converted into text. The result may also be used to update the audio recognition model through supervised learning.

The matching failure handling routine performed in step S35 of FIG. 4 has been described above with reference to FIGS. 6 and 7. In the matching failure handling routine according to this embodiment, the meaning of a user-uttered word for which no matching word was found in the dictionary lookup is obtained from the administrator or owner of the user device on which the utterance was made, such as a parent of the user who uttered the word. This is particularly effective when the voice-based conversation service is aimed at children such as infants and toddlers. The utterances and vocabulary of young children are often very difficult to understand for anyone other than parents, guardians, or family members living with the child, so it is nearly impossible for an operator or administrator employed by the service provider to work out the meaning of a word uttered by a young child and add it to the dictionary. In embodiments of the present invention, the meaning of the word is obtained from the administrator of the user device on which the child's utterance was made, that is, from a parent or the like, so the meaning of the words a young child uses can be identified efficiently and accurately, and a more advanced voice-based conversation service can be provided.

Hereinafter, the matching success handling routine performed in step S36 of FIG. 4 is described in more detail with reference to FIGS. 8 and 9.

FIG. 8 is a flowchart of a matching success handling routine according to some embodiments of the present invention. When a dictionary word matching the user-uttered word is found in the dictionary, a voice-based conversation service using the user-uttered word may be provided in step S36.

For example, referring to step S361 of FIG. 8, audio/video content corresponding to the dictionary word that matches the user-uttered word, in other words to the meaning of the user-uttered word, may be provided to the user device 11. For instance, when the user-uttered word is "akkung" and "akkung" is determined to match "ice cream", audiobook content or video content related to "ice cream" may be provided through the user device 11.

As another example, referring to step S362 of FIG. 8, the user-uttered word may be used in the response utterances of the conversation engine, based on the meaning of the dictionary word. For instance, by providing the user device 11 with response utterances that use "akkung" instead of "ice cream", such as "What flavor of akkung do you like best?" or "When did you last eat akkung with your mom?", a sense of closeness can be built with the speaker of the user device 11, who uses the expression "akkung".

As yet another example, existing content may be processed so that a dictionary word appearing in it is replaced with the user-uttered word (step S3631), and the processed content may be provided to the user device 11 (step S3632). For instance, in the parts of a children's story audiobook where the word "ice cream" appears, "ice cream" may be replaced with "akkung", and the processed content may be provided to the user device 11. By having the voice-based conversation service providing apparatus 1 appropriately imitate expressions that only the speaker of the user device 11 uses, the sense of closeness with the speaker can be further increased, and ultimately user engagement with the conversation service can be maximized.
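The substitution of step S3631 can be illustrated on the text of a story. This sketch assumes the content is available as text (for an audiobook, the modified text would still have to be re-synthesized into speech, which is outside this example), and the function name is hypothetical.

```python
import re


def personalize_content(text, standard_word, user_word):
    """Replace whole-word occurrences of the standard word with the user's word."""
    pattern = re.compile(r"\b" + re.escape(standard_word) + r"\b", re.IGNORECASE)
    return pattern.sub(user_word, text)


story = "The little bear shared her ice cream with a friend."
personalized = personalize_content(story, "ice cream", "akkung")
```

The word-boundary anchors keep the replacement from touching longer words that merely contain the standard word; handling inflected forms is left out of the sketch.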

In some embodiments, a report on the use of the user-uttered word may be provided to the administrator of the user device 11 (step S364). Specifically, statistical data such as the frequency of use of the user-uttered word and of the corresponding standard word, and their relative usage ratio, is provided to the administrator device 15, so that the speaker's parent or the like can understand the language habits and the level of language development of the speaker of the user device 11.
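The statistics in such a report could be computed along the following lines. This is a sketch under simplifying assumptions: utterances are already transcribed to text, occurrences are counted by plain substring matching, and the function name and the keys of the returned dictionary are hypothetical.

```python
def usage_report(utterances, user_word, standard_word):
    """Summarize how often the speaker used their own word vs. the standard word."""
    user_count = sum(u.lower().count(user_word.lower()) for u in utterances)
    standard_count = sum(u.lower().count(standard_word.lower()) for u in utterances)
    total = user_count + standard_count
    return {
        "user_word": user_word,
        "standard_word": standard_word,
        "user_word_count": user_count,
        "standard_word_count": standard_count,
        # Share of uses in which the speaker chose their own word.
        "user_word_ratio": user_count / total if total else 0.0,
    }


report = usage_report(
    ["I want akkung", "Akkung please", "Ice cream is cold"],
    user_word="akkung",
    standard_word="ice cream",
)
```

Tracking this ratio over time would show whether the speaker is gradually shifting toward the standard word.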

FIG. 9 is a flowchart of a matching success handling routine according to some other embodiments of the present invention. This embodiment differs from the embodiment described above with reference to FIG. 8 in that the response of the voice-based conversation service varies according to the language proficiency level of the speaker of the user device 11.

Referring to FIG. 9, the language proficiency level of the speaker of the user device 11 is determined first (step S366).

When the speaker's language proficiency level is a first level, the voice-based conversation service is provided using the user-uttered word (step S367). For example, for a speaker whose vocal organs are not yet fully developed, whose pronunciation is therefore imperfect, and whose language proficiency level is low, providing the voice-based conversation service using "akkung", the user-uttered word meaning "ice cream", can help build a sense of closeness with the speaker. For specific examples of providing the voice-based conversation service using the user-uttered word, reference may be made to the descriptions of steps S361 to S3632 of the flowchart shown in FIG. 8.

When the speaker's language proficiency level is a second level, the voice-based conversation service may be provided using the dictionary word (e.g., the standard word) that matches the user-uttered word, thereby guiding the speaker to correct the word they use (step S369). The second level may be a level of higher language proficiency than the first level. For example, when the speaker's pronunciation is accurate enough and their command of language sufficient to pronounce "ice cream" without difficulty, the service may respond using the standard word "ice cream" even if the speaker says "akkung", thereby guiding the speaker to use the standard word from then on.

If the speaker's language proficiency level corresponds to a third level, which lies between the first level and the second level, the voice-based conversation service may be provided using a mix of the user-uttered word and the dictionary word (e.g., the standard word). For a speaker at the third level, using the user-uttered word maintains the sense of closeness, while also using the standard word guides the speaker to adopt the standard word gradually.
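The level-dependent word choice of FIG. 9 can be sketched as a small selection function. The enum, the function name, and in particular the rule of alternating between the two words on successive turns for the third level are assumptions made for illustration; the patent only states that the two words are mixed.

```python
from enum import Enum


class ProficiencyLevel(Enum):
    FIRST = 1   # lowest proficiency: mirror the speaker's own word
    THIRD = 2   # between first and second: mix both words
    SECOND = 3  # highest proficiency: use the standard word


def word_for_turn(level, user_word, standard_word, turn_index):
    """Choose which word the conversation engine uses in a given response turn."""
    if level is ProficiencyLevel.FIRST:
        return user_word
    if level is ProficiencyLevel.SECOND:
        return standard_word
    # Third level: alternate by turn (hypothetical mixing policy).
    return user_word if turn_index % 2 == 0 else standard_word


template = "What flavor of {word} do you like best?"
first_level_reply = template.format(
    word=word_for_turn(ProficiencyLevel.FIRST, "akkung", "ice cream", 0)
)
third_level_words = [
    word_for_turn(ProficiencyLevel.THIRD, "akkung", "ice cream", i) for i in range(4)
]
```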

A method of providing a voice-based conversation service according to an embodiment of the present invention has been described above with reference to FIGS. 3 to 9.

According to this embodiment, by managing and using a dictionary personalized to a specific user or a specific user device, the user can be led to feel closer to the conversation service, and ultimately user engagement with the conversation service can be improved.

In addition, according to this embodiment, by referring to a dictionary personalized to another user whose place of residence, age, language proficiency level, content consumption preferences, and the like are similar to those of the user, the meaning of a word uttered by the user can be identified effectively.

In addition, according to this embodiment, for a user-uttered word whose meaning could not be identified through the dictionary lookup, the administrator of the user device, the user's parent, or the like is asked to enter the meaning of the word, so that the meaning of a word uttered by a young child, which is extremely difficult for anyone other than parents, guardians, or family members living with the child to understand, can be identified efficiently. This allows the meaning of words to be identified with a level of accuracy that could not be expected from the conventional approach of identifying a specific child's vocabulary through a third party unrelated to the child, such as an operator or administrator employed by the service provider.

In addition, according to this embodiment, the voice-based conversation service is provided by appropriately using the words unique to the user and the corresponding standard words in accordance with the user's language proficiency level or level of language development, so that a sense of closeness with the user is built while the user's word usage habits are gradually guided toward improvement.

In addition, according to this embodiment, a report on the user's word usage habits is automatically generated and provided to the user's parent, so that the parent can easily understand the child's language proficiency level and level of development.

Hereinafter, an exemplary computing device 1500 capable of implementing the voice-based conversation service providing apparatus 1 according to some embodiments of the present invention is described with reference to FIG. 10.

FIG. 10 is a hardware configuration diagram illustrating an exemplary computing device 1500 capable of implementing the voice-based conversation service providing apparatus 1 according to some embodiments of the present invention.

As shown in FIG. 10, the computing device 1500 may include one or more processors 1510, a bus 1550, a communication interface 1570, a memory 1530 into which a computer program 1591 executed by the processor 1510 is loaded, and a storage 1590 that stores the computer program 1591. However, FIG. 10 shows only the components related to the embodiments of the present invention. Accordingly, a person of ordinary skill in the art to which the present invention pertains will understand that other general-purpose components may be included in addition to those shown in FIG. 10.

The processor 1510 controls the overall operation of each component of the computing device 1500. The processor 1510 may include a CPU (Central Processing Unit), an MPU (Micro Processor Unit), an MCU (Micro Controller Unit), a GPU (Graphic Processing Unit), or any type of processor well known in the technical field of the present invention. The processor 1510 may also perform operations for at least one application or program for executing the methods according to embodiments of the present invention. The computing device 1500 may include one or more processors.

The memory 1530 stores various data, instructions, and/or information. The memory 1530 may load one or more programs 1591 from the storage 1590 in order to execute the methods according to embodiments of the present invention. The memory 1530 may be implemented as a volatile memory such as RAM, but the technical scope of the present invention is not limited thereto.

The bus 1550 provides a communication function between the components of the computing device 1500. The bus 1550 may be implemented as various types of buses, such as an address bus, a data bus, and a control bus.

The communication interface 1570 supports wired and wireless Internet communication of the computing device 1500. The communication interface 1570 may also support various communication methods other than Internet communication. To this end, the communication interface 1570 may include a communication module well known in the technical field of the present invention.

According to some embodiments, the communication interface 1570 may be omitted.

The storage 1590 may store the one or more programs 1591 and various data in a non-transitory manner.

The storage 1590 may include a non-volatile memory such as a ROM (Read Only Memory), an EPROM (Erasable Programmable ROM), an EEPROM (Electrically Erasable Programmable ROM), or a flash memory, a hard disk, a removable disk, or any type of computer-readable recording medium well known in the technical field to which the present invention pertains.

The computer program 1591 may include one or more instructions that, when loaded into the memory 1530, cause the processor 1510 to perform the methods/operations according to various embodiments of the present invention. That is, the processor 1510 may perform the methods/operations according to various embodiments of the present invention by executing the one or more instructions.

In this case, the apparatuses according to some embodiments of the present invention may be implemented through the computing device 1500.

Various embodiments of the present invention and the effects of those embodiments have been described above with reference to FIGS. 1 to 10. The effects according to the technical idea of the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those of ordinary skill in the art from the description below.

The technical idea of the present invention described above with reference to FIGS. 1 to 10 may be implemented as computer-readable code on a computer-readable medium. The computer-readable recording medium may be, for example, a removable recording medium (CD, DVD, Blu-ray disc, USB storage device, removable hard disk) or a fixed recording medium (ROM, RAM, hard disk built into a computer). The computer program recorded on the computer-readable recording medium may be transmitted to another computing device through a network such as the Internet and installed on that computing device, and may thereby be used on that computing device.

Although all the components constituting the embodiments of the present invention have been described above as being combined into one or as operating in combination, the technical idea of the present invention is not necessarily limited to such embodiments. That is, within the scope of the object of the present invention, one or more of the components may be selectively combined and operated.

Although operations are shown in a specific order in the drawings, it should not be understood that the operations must be performed in the specific order shown or in sequential order, or that all illustrated operations must be performed, in order to obtain the desired result. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of the various components in the embodiments described above should not be understood as being necessarily required, and it should be understood that the described program components and systems may generally be integrated together into a single software product or packaged into multiple software products.

Although embodiments of the present invention have been described above with reference to the accompanying drawings, those of ordinary skill in the art to which the present invention pertains will understand that the present invention may be practiced in other specific forms without changing its technical idea or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. The scope of protection of the present invention should be interpreted according to the claims below, and all technical ideas within a scope equivalent thereto should be interpreted as falling within the scope of rights of the technical idea defined by the present invention.

Claims

A method of providing a voice-based conversation service, the method comprising:
obtaining an audio input from a user device;
identifying a meaning of the audio input;
receiving the meaning from an administrator of the user device based on a determination that identification of the meaning has failed; and
storing the received meaning,
wherein the identifying of the meaning of the audio input comprises:
comparing a user-uttered word recognized from the audio input with words included in a dictionary personalized to another user device distinct from the user device.

The method of claim 1,
wherein the identifying of the meaning of the audio input comprises comparing the user-uttered word recognized from the audio input with dictionary-listed words included in a dictionary personalized to the user device, and
wherein the storing of the received meaning comprises adding the user-uttered word to the dictionary personalized to the user device.
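
By way of non-limiting illustration only, the flow recited in the two claims above (compare the recognized word with personalized dictionaries, ask the administrator when identification fails, then store the answer) might be sketched as follows. The names `PersonalDictionary`, `identify_meaning`, `handle_word`, and `ask_administrator` are hypothetical and do not appear in the specification, and the order in which the dictionaries are searched is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Optional


@dataclass
class PersonalDictionary:
    """Hypothetical per-device dictionary: speaker's own word -> meaning."""
    entries: Dict[str, str] = field(default_factory=dict)

    def lookup(self, word: str) -> Optional[str]:
        return self.entries.get(word)


def identify_meaning(word: str,
                     own: PersonalDictionary,
                     others: Iterable[PersonalDictionary]) -> Optional[str]:
    """Compare the recognized word with the device's own dictionary, then
    with dictionaries personalized to other devices. The search order is
    an assumption; the claims do not fix one."""
    for dictionary in (own, *others):
        meaning = dictionary.lookup(word)
        if meaning is not None:
            return meaning
    return None  # identification failed


def handle_word(word: str,
                own: PersonalDictionary,
                others: Iterable[PersonalDictionary],
                ask_administrator: Callable[[str], str]) -> str:
    """Return the meaning of the word, asking the administrator on failure
    and storing the administrator's answer in the device's own dictionary."""
    meaning = identify_meaning(word, own, others)
    if meaning is None:
        meaning = ask_administrator(word)
        own.entries[word] = meaning
    return meaning
```

In this sketch, a word found only in another device's dictionary is resolved without consulting the administrator, while a word found nowhere is resolved by the administrator and then added to the device's own dictionary.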

(Deleted)

The method of claim 1,
wherein a speaker of the other user device is a user belonging to the same speaker group as a speaker of the user device, and
wherein the speaker group is determined based on at least one of an age and a spoken-language level of the speaker.

The method of claim 2,
wherein the receiving of the meaning from the administrator of the user device comprises:
providing at least a portion of a conversation history generated through the user device to the administrator; and
receiving the meaning of the user-uttered word from the administrator.

The method of claim 2,
wherein the receiving of the meaning from the administrator of the user device comprises:
providing, to the administrator, a word matching the user-uttered word from among words included in a dictionary personalized to another user device distinct from the user device.

The method of claim 2,
wherein the receiving of the meaning from the administrator of the user device comprises:
requesting information related to the user-uttered word from the administrator based on a determination that the speaker of the user device has uttered the user-uttered word more than a preset number of times.
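
By way of non-limiting illustration only, the count-based trigger recited in the claim above might be sketched as follows. The class name `UnknownWordTracker`, its `threshold` parameter, and the choice to issue the request only once per word are assumptions made for this sketch and are not taken from the specification.

```python
from collections import Counter


class UnknownWordTracker:
    """Counts utterances of unidentified words and signals when the
    administrator should be asked about one of them (hypothetical helper)."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold      # the preset number of times
        self.counts = Counter()         # word -> number of utterances so far
        self.requested = set()          # words already sent to the administrator

    def record(self, word: str) -> bool:
        """Record one utterance; return True if a request should be issued now."""
        self.counts[word] += 1
        exceeded = self.counts[word] > self.threshold
        if exceeded and word not in self.requested:
            # Assumption: ask only once per word, not on every later utterance.
            self.requested.add(word)
            return True
        return False
```

Deferring the request until a word recurs keeps one-off misrecognitions from generating requests to the administrator.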

The method of claim 2,
wherein the receiving of the meaning from the administrator of the user device comprises:
receiving an input indicating whether the voice-based conversation service is to prompt the speaker of the user device to correct the user-uttered word to a dictionary-listed word.

The method of claim 2, further comprising:
setting, based on a determination that the identification of the meaning has succeeded, the voice-based conversation service to use the user-uttered word in a response utterance provided to the user device.

The method of claim 2, further comprising:
setting, based on a determination that the identification of the meaning has succeeded, a word corresponding to the meaning in content provided to the user device by the voice-based conversation service to be replaced with the user-uttered word.

A method of providing a voice-based conversation service, the method comprising:
obtaining an audio input from a user device;
identifying a meaning of the audio input;
receiving the meaning from an administrator of the user device based on a determination that identification of the meaning has failed;
storing the received meaning;
determining a level of a speaker of the user device based on a determination that the identification of the meaning has succeeded;
setting, based on a determination that the speaker corresponds to a first level, the voice-based conversation service to be provided to the user device using a user-uttered word recognized from the audio input; and
setting, based on a determination that the speaker corresponds to a second level, a response utterance to be provided to the user device that prompts the speaker to correct the user-uttered word to a dictionary-listed word matching the user-uttered word.

The method of claim 12, further comprising:
setting, based on a determination that the speaker corresponds to a third level, the voice-based conversation service to be provided to the user device using a mix of the user-uttered word and the matching dictionary-listed word.
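
By way of non-limiting illustration only, the level-dependent response behavior recited in the two claims above might be sketched as follows. The enum `SpeakerLevel`, the function `build_response`, and the response sentences are hypothetical; the specification does not prescribe any particular wording for the response utterances.

```python
from enum import Enum


class SpeakerLevel(Enum):
    FIRST = 1   # respond using the speaker's own word
    SECOND = 2  # prompt the speaker toward the dictionary-listed word
    THIRD = 3   # mix the speaker's word and the dictionary-listed word


def build_response(level: SpeakerLevel, uttered: str, listed: str) -> str:
    """Build a response utterance for an identified word.

    `uttered` is the user-uttered word and `listed` is the matching
    dictionary-listed word. The sentence templates are placeholders.
    """
    if level is SpeakerLevel.FIRST:
        return f"Here is your {uttered}."
    if level is SpeakerLevel.SECOND:
        return f"You said '{uttered}'. Can you say '{listed}'?"
    if level is SpeakerLevel.THIRD:
        return f"Here is your {uttered}. Another word for {uttered} is {listed}."
    raise ValueError(f"unsupported level: {level!r}")
```

For example, with the made-up pair `uttered="wawa"` and `listed="water"`, only the second and third levels expose the speaker to the dictionary-listed word.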

The method of claim 12,
wherein the determining of the level of the speaker comprises:
determining the level based on at least one of an age and a spoken-language level of the speaker.

The method of claim 2, further comprising:
providing, to the administrator of the user device, information on the respective frequencies with which the speaker of the user device used the user-uttered word and the dictionary-listed word matching the user-uttered word during a preset period.
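
By way of non-limiting illustration only, the frequency information recited in the claim above might be computed as follows. The function `usage_report`, the representation of the conversation log as `(timestamp, word)` pairs, and the treatment of the preset period as a window ending at `period_end` are assumptions made for this sketch.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, Tuple


def usage_report(utterances: Iterable[Tuple[datetime, str]],
                 uttered_word: str,
                 listed_word: str,
                 period_end: datetime,
                 period: timedelta) -> Dict[str, int]:
    """Count how often the speaker used the user-uttered word and the
    matching dictionary-listed word within the preset period."""
    period_start = period_end - period
    counts = {uttered_word: 0, listed_word: 0}
    for timestamp, word in utterances:
        # Only utterances inside the window and of the two tracked words count.
        if period_start <= timestamp <= period_end and word in counts:
            counts[word] += 1
    return counts
```

A report of this kind lets the administrator see whether the speaker is shifting from the personal word to the dictionary-listed word over time.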

The method of claim 1,
wherein the identifying of the meaning of the audio input comprises recognizing a user-uttered word from the audio input,
wherein the receiving of the meaning from the administrator of the user device based on the determination that the identification of the meaning has failed comprises providing the audio input to the administrator of the user device, and
wherein the storing of the received meaning comprises updating an audio recognition model based on information received from the administrator.

An apparatus for providing a voice-based conversation service, the apparatus being configured to perform the method of any one of claims 1, 2, and 5 to 16.

A non-transitory computer-readable recording medium storing a computer program that causes a computer to perform the method of providing a voice-based conversation service of any one of claims 1, 2, and 5 to 16.