KR20220154655A

KR20220154655A - Device, method and computer program for generating voice data based on family relationship

Info

Publication number: KR20220154655A
Application number: KR1020220149109A
Authority: KR
Inventors: 신보라; 박재한
Original assignee: KT Corporation (주식회사 케이티)
Priority date: 2020-07-10
Filing date: 2022-11-10
Publication date: 2022-11-22
Also published as: KR102605178B1; KR20220007490A

Abstract

A device for generating speech data comprises: a family relationship model generating part that generates a plurality of family relationship models corresponding to a family relationship of a family member, based on the speech data of the family member; an input part that receives the speech data of the family member related to a user who wants to generate the speech data; a selection part that selects one family relationship model among the plurality of family relationship models based on the family relationship between the user and the family member; and a generating part that generates the speech data of the user by inputting the speech data of the family member to the selected family relationship model.

Description

DEVICE, METHOD AND COMPUTER PROGRAM FOR GENERATING VOICE DATA BASED ON FAMILY RELATIONSHIP

The present invention relates to an apparatus, method, and computer program for generating voice data.

Text-to-speech (TTS) synthesis is a technology in which a machine automatically generates the sound waves of speech. The speech of one person selected as a model is recorded and divided into speech units of a fixed type. Each unit is labeled with a code and fed into a synthesizer. Then, according to instructions, only the required speech units are recombined to produce speech artificially.
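The recombination of labeled units can be illustrated with a toy sketch. It assumes the recorded speech has already been segmented into coded units, each stored as a short list of samples. The unit codes, the `UNIT_BANK` dictionary, and the `synthesize` function are hypothetical. Real concatenative systems also select among candidate units and smooth the joins between them.

```python
from typing import Dict, List

# Hypothetical bank of labeled speech units: each code maps to the samples cut
# from one model speaker's recordings (tiny fake values stand in for audio).
UNIT_BANK: Dict[str, List[float]] = {
    "an": [0.1, 0.3, 0.2],
    "nyeong": [0.4, -0.1],
    "ha": [0.0, 0.5, -0.2],
    "se": [0.2, 0.2],
    "yo": [-0.3, 0.1, 0.0],
}


def synthesize(unit_codes: List[str], bank: Dict[str, List[float]]) -> List[float]:
    """Recombine only the units requested by the instruction sequence."""
    samples: List[float] = []
    for code in unit_codes:
        if code not in bank:
            raise KeyError(f"no recorded unit for code {code!r}")
        samples.extend(bank[code])  # naive join, no boundary smoothing
    return samples


if __name__ == "__main__":
    output = synthesize(["an", "nyeong", "ha", "se", "yo"], UNIT_BANK)
    print(len(output))
```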

Recently, speech synthesis has made natural, personalized voice services possible. Such a service learns a desired voice, including its context-dependent pitch, stress, and pronunciation.

As prior art relating to such personalized voice services, Korean Patent Publication No. 2020-0016516 discloses a personalized virtual voice synthesis apparatus and method.

However, restoring a specific voice with conventional speech synthesis requires collecting that user's speech data over a sufficiently long period. A deaf-mute person with a congenital inability to speak cannot provide enough voice data, so that person's voice is difficult to restore.

The present invention aims to provide an apparatus, method, and computer program that generate a plurality of family relationship models from the voice data of a plurality of family members. Each model corresponds to a family relationship between those members.

It also aims to provide an apparatus, method, and computer program that operate as follows:
- receive the voice data of a family member related to a user whose voice is to be restored;
- select one of the plurality of family relationship models based on the family relationship between the user and that family member;
- generate the user's voice data by inputting the family member's voice data into the selected model.

However, the technical problems to be solved by the present embodiments are not limited to those described above, and other technical problems may exist.

As a means of solving the above technical problems, one embodiment of the present invention may provide a voice data generating apparatus that includes:
- a family relationship model generation unit that generates, from the voice data of a plurality of family members, a plurality of family relationship models corresponding to the family relationships of those members;
- an input unit that receives the voice data of a family member related to a user for whom voice data is to be generated;
- a selection unit that selects one of the plurality of family relationship models based on the family relationship between the user and the family member;
- a generation unit that generates the user's voice data by inputting the family member's voice data into the selected family relationship model.

Another embodiment of the present invention may provide a method of providing a voice generation service. The method includes:
- receiving registration of deaf-mute user information, including identifier information of the user who will use the voice generation service;
- performing an authentication process for the voice generation service based on the registered information;
- performing a call service with another user's terminal based on the result of the authentication process;
- receiving text information or voice while the call service is in progress.

When the authentication result places the call service in a deaf-mute mode, the text information is converted into voice data. The conversion uses the family relationship model corresponding to the registered deaf-mute user information, and the converted voice data is transmitted to the other user's terminal.

Yet another embodiment of the present invention may provide a user terminal. The terminal includes:
- a registration unit that receives registration of deaf-mute user information, including the identifier information of a user who will use the voice generation service;
- an authentication process unit that performs an authentication process for the voice generation service based on the registered information;
- a call service unit that performs a call service with another user's terminal based on the result of the authentication process;
- an input unit that receives text information or voice while the call service is in progress.

When the authentication result places the call service in the deaf-mute mode, the text information is converted into voice data. The conversion uses the family relationship model corresponding to the registered deaf-mute user information, and the converted voice data is transmitted to the other user's terminal.
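The flow shared by the method and terminal embodiments can be sketched as a minimal illustration, under several assumptions. The class names, the in-memory registry, and the identifier-based authentication check are hypothetical. So is the `text_to_voice` callable, which stands in for synthesis with the family relationship model. The sketch is not the patented implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DeafMuteInfo:
    user_id: str           # identifier of the voice generation service user
    family_model_id: str   # family relationship model tied to this user


@dataclass
class VoiceGenerationService:
    text_to_voice: Callable[[str, str], bytes]  # hypothetical (text, model_id) -> voice
    registry: Dict[str, DeafMuteInfo] = field(default_factory=dict)
    sent_to_peer: List[bytes] = field(default_factory=list)

    def register(self, info: DeafMuteInfo) -> None:
        self.registry[info.user_id] = info

    def authenticate(self, user_id: str) -> bool:
        # Hypothetical rule: a registered user is served in deaf-mute mode.
        return user_id in self.registry

    def on_call_input(self, user_id: str, text: str) -> None:
        """Handle text entered during a call with another user's terminal."""
        if self.authenticate(user_id):
            info = self.registry[user_id]
            voice = self.text_to_voice(text, info.family_model_id)
            self.sent_to_peer.append(voice)  # deliver to the other terminal
        # Otherwise the call proceeds as an ordinary voice call (not modeled).
```

Usage: a registered user's typed "hello" is converted with that user's model and queued for the other terminal. An unregistered user's text is not converted.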

The means for solving the problems described above are merely illustrative and should not be construed as limiting the present invention. In addition to the exemplary embodiments described above, there may be additional embodiments described in the drawings and the detailed description.

According to any one of the above means, the present invention can provide an apparatus, method, and computer program that generate a user's voice data from the voice data of family members, even without any recordings of the user's own voice. Conventional speech synthesis, by contrast, could synthesize desired utterances in a specific voice only after securing a large amount of data from that voice.

It can provide an apparatus, method, and computer program that restore a user's voice based on the family relationship between the user and family members. A user with a congenital inability to speak can thereby have a voice of his or her own.

It can also provide an apparatus, method, and computer program for a service that delivers voice data synthesized as the user's own voice to the other party during a call. This creates the experience of a spoken conversation with the other party.

FIG. 1 is a block diagram of an apparatus for generating voice data according to an embodiment of the present invention.
FIGS. 2A and 2B are exemplary diagrams illustrating a preprocessing process that classifies a plurality of family relationships based on the gender information of the family members in those relationships, according to an embodiment of the present invention.
FIG. 3 is an exemplary diagram illustrating a process of classifying family relationships based on the pitch distance value, pitch distribution, and age difference value between a plurality of family members, according to an embodiment of the present invention.
FIGS. 4A to 4E are exemplary diagrams illustrating a process of generating a family relationship model according to an embodiment of the present invention.
FIGS. 5A to 5C are exemplary diagrams illustrating a process of generating a user's voice data with a family relationship model that is selected based on the voice data of family members related to the user, according to an embodiment of the present invention.
FIGS. 6A to 6D are exemplary diagrams illustrating a process of performing a call service through an authentication process in a user terminal according to an embodiment of the present invention.
FIGS. 7A to 7C are exemplary diagrams illustrating a process in which a user terminal receives the user's voice data service through an app, according to an embodiment of the present invention.
FIGS. 8A and 8B are exemplary diagrams illustrating a process of generating a user's voice data and providing a call service according to an embodiment of the present invention.
FIG. 9 is a flowchart of a method of generating voice data in the voice data generating apparatus according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings, so that those skilled in the art can easily practice the invention. However, the present invention may be embodied in many different forms and is not limited to the embodiments described herein. To explain the invention clearly, parts irrelevant to the description are omitted from the drawings, and similar parts are given similar reference numerals throughout the specification.

Throughout the specification, a part "connected" to another part may be "directly connected" to it or "electrically connected" to it through another element in between. In addition, when a part "includes" a component, it may further include other components unless stated otherwise; other components are not excluded. This wording does not preclude in advance the existence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

In this specification, a "unit" (部) includes a unit realized by hardware, a unit realized by software, and a unit realized using both. In addition, one unit may be realized using two or more pieces of hardware, and two or more units may be realized by one piece of hardware.

In this specification, some of the operations or functions described as performed by a terminal or device may instead be performed by a server connected to that terminal or device. Likewise, some of the operations or functions described as performed by a server may be performed by a terminal or device connected to that server.

Hereinafter, an embodiment of the present invention is described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram of an apparatus for generating voice data according to an embodiment of the present invention. Referring to FIG. 1, the voice data generating apparatus 100 may include a family relationship model generation unit 110, an input unit 120, a selection unit 130, and a generation unit 140.
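The four units of FIG. 1 can be pictured as a small pipeline. The sketch is illustrative only. The relationship keys, the model dictionary, and the use of plain callables as "models" are assumptions, not the patent's implementation.

```python
from typing import Callable, Dict, List, Sequence

FeatureVector = List[float]
RelationshipModel = Callable[[FeatureVector], FeatureVector]


class VoiceDataGenerator:
    """Hypothetical sketch mirroring units 110-140 of apparatus 100."""

    def __init__(self) -> None:
        self.models: Dict[str, RelationshipModel] = {}

    # Family relationship model generation unit (110): store a trained model.
    def add_model(self, relationship: str, model: RelationshipModel) -> None:
        self.models[relationship] = model

    # Selection unit (130): pick the model matching the user/member relationship.
    def select(self, relationship: str) -> RelationshipModel:
        if relationship not in self.models:
            raise KeyError(f"no model for relationship {relationship!r}")
        return self.models[relationship]

    # Input unit (120) receives the member's voice; generation unit (140) converts it.
    def generate(self, family_voice: Sequence[float], relationship: str) -> FeatureVector:
        model = self.select(relationship)
        return model(list(family_voice))
```

Keying models by relationship keeps selection a simple lookup. In the patent, selection depends on the user/member relationship, and the trained models would replace the placeholder callables.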

The family relationship model generation unit 110 may generate, from the voice data of a plurality of family members, a plurality of family relationship models corresponding to the family relationships of those members. The models may be generated based on, for example, gender information, a pitch distance value, a pitch distribution value, and an age difference value. For example, the unit 110 may generate a total of 24 family relationship models from these criteria. The process of generating a family relationship model is described in detail with reference to FIGS. 2A to 3.

FIGS. 2A and 2B are exemplary diagrams illustrating a preprocessing process that classifies a plurality of family relationships based on the gender information of the family members in those relationships, according to an embodiment of the present invention.

The family relationship model generation unit 110 may classify a plurality of family relationships based on the gender information of the family members included in them.

For example, suppose a male 200 and a female 210 utter the same text. As FIGS. 2A and 2B show, the frequency characteristics of the male 200 and the female 210 differ markedly under the same conditions.

This is because pitch, the characteristic quality of each person's voice, depends on the length, size, and tension of the vocal cords. The difference in fundamental frequency between the sexes arises mainly from differences in vocal cord size and length. For example, there is almost no difference in pitch between the male 200 and the female 210 before puberty. After puberty, however, the male larynx grows and the vocal cords lengthen, so the fundamental frequency of the male 200 becomes lower than that of the female 210.
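Because the difference between the sexes shows up as a difference in fundamental frequency (F0), it helps to see one way F0 might be measured. The sketch uses a simple autocorrelation estimator on synthetic tones. The 120 Hz and 220 Hz tones, the 60 to 400 Hz search range, and the function name are illustrative assumptions. The specification does not say which pitch-extraction method is used.

```python
import numpy as np


def estimate_f0(signal: np.ndarray, sample_rate: int,
                f_min: float = 60.0, f_max: float = 400.0) -> float:
    """Estimate F0 as the lag of maximum autocorrelation within [f_min, f_max]."""
    x = signal - signal.mean()
    # Full autocorrelation; index len(x) - 1 is lag 0, so keep lags >= 0 only.
    corr = np.correlate(x, x, mode="full")[len(x) - 1:]
    min_lag = int(sample_rate / f_max)
    max_lag = int(sample_rate / f_min)
    best_lag = min_lag + int(np.argmax(corr[min_lag:max_lag + 1]))
    return sample_rate / best_lag


sr = 16000
t = np.arange(int(0.1 * sr)) / sr            # 100 ms of audio
male_like = np.sin(2 * np.pi * 120.0 * t)    # lower fundamental frequency
female_like = np.sin(2 * np.pi * 220.0 * t)  # higher fundamental frequency
print(estimate_f0(male_like, sr), estimate_f0(female_like, sr))
```

The estimate is quantized to whole-sample lags, so it lands close to, but not exactly on, the true frequency.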

Accordingly, the family relationship model generation unit 110 may use the gender information of the family members to classify family relationships into mother-daughter, father-son, sister, brother, and other relationships.
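The gender-based preprocessing can be sketched as a mapping from a pair of family members to one of the four relationship classes named above. The member fields (gender and a parent-generation flag) and the class labels are assumptions made for illustration. Pairs outside those four classes are left unclassified here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Member:
    gender: str       # "F" or "M"
    is_parent: bool   # True for the parent generation, False for children


def classify_by_gender(a: Member, b: Member) -> Optional[str]:
    """Map a pair of family members to a gender-based relationship class."""
    if a.gender != b.gender:
        return None  # mixed-gender pairs are not covered by this sketch
    if a.is_parent != b.is_parent:
        return "mother-daughter" if a.gender == "F" else "father-son"
    if not a.is_parent and not b.is_parent:
        return "sisters" if a.gender == "F" else "brothers"
    return None  # e.g. two same-gender members of the parent generation
```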

FIG. 3 is an exemplary diagram illustrating a process of classifying family relationships based on the pitch distance value, pitch distribution, and age difference value between a plurality of family members, according to an embodiment of the present invention. Referring to FIG. 3, the family relationship model generation unit 110 may classify family relationships into a plurality of family relationship groups based on the pitch distance values between the members' voice data. It may then generate a plurality of family relationship group models. Here, pitch refers to the characteristic quality of each person's voice, which varies with the length, size, and tension of the vocal cords.

For example, suppose the family relationship model generation unit 110 has classified a family relationship as a sister relationship 300 based on the members' gender information. It may then classify the relationship further according to whether the pitch distance between the sisters' voice data is small.

In general, the most important indicator of a person's timbre is the tone of the voice. Tone is also the indicator that distinguishes female, male, child, and adult voices. Within a family relationship such as the sister relationship 300, the voices of a pair of members are similar, but the degree of similarity can range from slight to strong. Accordingly, the family relationship model generation unit 110 may further classify family relationships by the degree of voice similarity between member pairs, measured by the pitch distance value. The unit 110 may calculate the pitch distance value using, for example, Equation 1 below.
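The next paragraph states that Equation 1 uses the Mahalanobis distance. Its standard form is shown below, in assumed notation: $\mathbf{p}_1$ and $\mathbf{p}_2$ are the pitch feature vectors of the first and second voice data, and $S$ is their covariance matrix.

$$
d_M(\mathbf{p}_1, \mathbf{p}_2) = \sqrt{(\mathbf{p}_1 - \mathbf{p}_2)^{\top} S^{-1} (\mathbf{p}_1 - \mathbf{p}_2)}
$$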

Referring to Equation 1, for example, the first voice data and the second voice data come from members of the same family. The family relationship model generation unit 110 may therefore treat them as non-independent data (data with nonzero covariance) and calculate the pitch distance value using the Mahalanobis distance.
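A minimal NumPy sketch of this computation follows. The text does not say how the pitch feature vectors are built or how the covariance matrix is estimated. Here, as one possible choice, the covariance is assumed to come from a pooled set of pitch feature vectors. The two-element features (mean F0 and F0 range) and the random samples are hypothetical.

```python
import numpy as np


def mahalanobis(p1: np.ndarray, p2: np.ndarray, cov: np.ndarray) -> float:
    """Mahalanobis distance between two feature vectors under covariance `cov`."""
    diff = p1 - p2
    # Solve cov @ z = diff rather than forming the explicit inverse.
    z = np.linalg.solve(cov, diff)
    return float(np.sqrt(diff @ z))


# Hypothetical pitch features per utterance: [mean F0 in Hz, F0 range in Hz].
rng = np.random.default_rng(0)
samples = rng.normal(loc=[210.0, 40.0], scale=[15.0, 5.0], size=(200, 2))
cov = np.cov(samples, rowvar=False)  # assumed pooled covariance estimate
print(mahalanobis(np.array([205.0, 38.0]), np.array([225.0, 45.0]), cov))
```

With an identity covariance, the function reduces to the Euclidean distance. A correlated covariance discounts differences along directions in which the two voices naturally co-vary.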

For example, the family relationship model generation unit 110 may use the pitch distance value to divide family relationships into two groups. In one, the pitch distance value exceeds a threshold (310); in the other, it is less than the threshold (320). Exceeding the threshold (310) may mean that the two voices are far apart in pitch, and falling below it (320) may mean that they are close.

The family relationship model generation unit 110 may then divide the relationships classified by pitch distance value into further family relationship groups based on the pitch distribution value. This is because spectral analysis of voice data reveals clear differences in energy distribution.

For example, when the pitch distance value is less than the threshold (320), the family relationship model generation unit 110 may further classify the group into a low frequency band 322 or a high frequency band 321. The choice depends on the frequency band in which the pitch distance values of the two similar voice data lie.

This is because the human ear perceives frequency on a logarithmic scale. As a result, changing data in the low frequency band 322 and data in the high frequency band 321 by the same amount can produce audibly different changes. In general, the human ear is more sensitive in the low frequency band 322. Suppose that, for speech synthesis, a value in the high frequency band 321 and a value in the low frequency band 322 are each changed by '2'. A listener would perceive the change in the low frequency band 322 as larger. Accordingly, the family relationship model generation unit 110 may generate the family relationship group model so that it modifies data with greater sensitivity in the low frequency band 322.
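The log-scale argument can be checked with a little arithmetic. The same absolute shift is expressed in semitones (12·log2 of the frequency ratio), a common logarithmic pitch unit. The 2 Hz shift follows the '2' in the text. The 100 Hz and 1000 Hz reference frequencies are illustrative assumptions.

```python
import math


def shift_in_semitones(f_hz: float, delta_hz: float) -> float:
    """Size of a frequency shift measured on a log (semitone) scale."""
    return 12.0 * math.log2((f_hz + delta_hz) / f_hz)


low = shift_in_semitones(100.0, 2.0)    # 2 Hz shift in a low-frequency band
high = shift_in_semitones(1000.0, 2.0)  # the same 2 Hz shift in a high band
print(f"low: {low:.3f} st, high: {high:.3f} st, ratio: {low / high:.1f}")
```

On this scale, the shift at 100 Hz is roughly ten times larger than the identical shift at 1000 Hz.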

The family relationship model generation unit 110 may also classify family relationships into a plurality of family relationship groups based on the age difference between family members. It may then generate a plurality of family relationship models. Because age is one of the factors that affect the voice, the age difference must be considered when a voice is converted or synthesized, even if the two voices are similar in tone.

For example, for a sister relationship 300 whose pitch distance value exceeds the threshold (310), the family relationship model generation unit 110 may use the age difference value to assign one of two groups. The relationship goes to a first group 311 when the age difference between the two family members exceeds a threshold, and to a second group 312 when it is less than the threshold.

As another example, when the pitch distance value for the sister relationship 300 falls in the high frequency band 321, the unit 110 may assign a third group 330 or a fourth group 331 based on the age difference value. When it falls in the low frequency band 322, the unit 110 may instead assign a fifth group 340 or a sixth group 341 based on the age difference value.

In this way, the family relationship model generation unit 110 further classifies the gender-based sister relationship 300 by pitch distance value, pitch distribution, and age difference value. The result is a total of six family relationship groups.
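The FIG. 3 cascade for the sister relationship can be written as a small decision function. Several details are hypothetical:
- The thresholds and the band boundary are invented; the specification gives no numeric values.
- The pair's mean pitch is assumed to decide between the low and high bands.
- For every pair of group numerals, the first is assumed to denote the larger age difference, as it does for 311/312.

One reading consistent with the 24 models mentioned for FIG. 1 is six groups for each of four gender-based relationships.

```python
from dataclasses import dataclass


@dataclass
class PairFeatures:
    pitch_distance: float   # e.g. Mahalanobis distance between the two voices
    mean_pitch_hz: float    # where the pair's pitch lies in frequency
    age_difference: float   # in years


# Hypothetical thresholds; the specification gives no numeric values.
PITCH_DISTANCE_THRESHOLD = 1.5
BAND_BOUNDARY_HZ = 250.0
AGE_DIFF_THRESHOLD = 5.0


def sister_group(f: PairFeatures) -> int:
    """Return the FIG. 3 reference numeral of the group for a sister pair."""
    large_age_gap = f.age_difference > AGE_DIFF_THRESHOLD
    if f.pitch_distance > PITCH_DISTANCE_THRESHOLD:   # distant voices (310)
        return 311 if large_age_gap else 312
    if f.mean_pitch_hz >= BAND_BOUNDARY_HZ:           # close voices, high band (321)
        return 330 if large_age_gap else 331
    return 340 if large_age_gap else 341              # close voices, low band (322)


RELATIONSHIPS = ["mother-daughter", "father-son", "sisters", "brothers"]
GROUPS_PER_RELATIONSHIP = 6
print(len(RELATIONSHIPS) * GROUPS_PER_RELATIONSHIP)  # total models under this reading
```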

FIGS. 4A to 4D are exemplary diagrams illustrating a process of generating a family relationship model according to an embodiment of the present invention.

FIG. 4A is an exemplary diagram illustrating an autoencoder according to an embodiment of the present invention. Referring to FIG. 4A, an autoencoder includes an encoder 410 and a decoder 411 and uses them to make the output 401 approximate the input 400. The input 400 and the output 401 have the same dimension.

The main feature of an autoencoder is that the network learns its layers step by step so that the final output reproduces the original input. The input 400 and output 401 layers have the same dimension (number of nodes), while the hidden layer has a lower dimension than either. The network therefore compresses the input data to extract features, and from those features it derives output data that reproduces the input 400 as closely as possible.
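A minimal NumPy sketch shows this shape constraint. An encoder compresses a D-dimensional input to a d-dimensional bottleneck code (d < D), and a decoder maps the code back to D dimensions. The sizes, the random initialization, and the sigmoid activation are illustrative assumptions. The weights are untrained, and only one hidden layer is shown.

```python
import numpy as np


def sigmoid(a: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-a))


D, d = 16, 4                                  # input/output size and bottleneck size
rng = np.random.default_rng(42)
W_enc = rng.normal(scale=0.1, size=(d, D))    # encoder weights
b_enc = np.zeros(d)
W_dec = rng.normal(scale=0.1, size=(D, d))    # decoder weights
b_dec = np.zeros(D)

x = rng.random(D)                       # one input vector, e.g. a normalized frame
code = sigmoid(W_enc @ x + b_enc)       # compressed feature at the bottleneck
x_hat = sigmoid(W_dec @ code + b_dec)   # reconstruction, same size as the input
print(x.shape, code.shape, x_hat.shape)
```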

Stacking multiple hidden layers lets the autoencoder extract more meaningful features. For example, the code portion that yields the most compressed feature may be the bottleneck hidden layer. Here, an input vector $\mathbf{x} \in \mathbb{R}^{D}$ of the autoencoder can be expressed, through the hidden layers, as a bottleneck code $\mathbf{y} \in \mathbb{R}^{d}$.

This process is a kind of compression called deterministic mapping, which extracts meaningful features from the input 400. The deterministic mapping can be described, for example, by Equation 2 below.

Referring to Equation 2, θ = {W, b} denotes the parameters, where W is a d×D weight matrix and b is a bias vector.
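In standard autoencoder notation consistent with these definitions, the deterministic mapping of Equation 2 takes the form below. Here $s$ is an element-wise activation function. A sigmoid is a common choice, but the specific activation is an assumption.

$$
\mathbf{y} = f_{\theta}(\mathbf{x}) = s(W\mathbf{x} + \mathbf{b}), \qquad \theta = \{W, \mathbf{b}\},\; W \in \mathbb{R}^{d \times D}
$$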

The code value y computed in the hidden layer is then mapped back to a reconstructed vector $\mathbf{z} \in \mathbb{R}^{D}$. This mapping may be performed, for example, through Equation 3 below. The loss function may be derived, for example, through Equation 4 below.
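A common form of the decoder mapping (Equation 3) and of a squared-error reconstruction loss (Equation 4) is shown below. Here $\mathbf{z}$ is the reconstructed vector, $s$ is an element-wise activation, and $\theta' = \{W', \mathbf{b}'\}$ are the decoder parameters. Squared error, rather than cross-entropy, for example, is an assumed choice.

$$
\mathbf{z} = g_{\theta'}(\mathbf{y}) = s(W'\mathbf{y} + \mathbf{b}'), \qquad W' \in \mathbb{R}^{D \times d}
$$

$$
L(\mathbf{x}, \mathbf{z}) = \lVert \mathbf{x} - \mathbf{z} \rVert^{2}
$$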

Through this training process, the family relationship model generation unit 110 can convert input voice data, for example in WAV file format, into a compressed feature vector that represents the voice well.

FIGS. 4B and 4C are exemplary diagrams illustrating a process of performing code conversion on voice data corresponding to a sister relationship, according to an embodiment of the present invention.

The family relationship model generation unit 110 may derive a first feature vector from the voice data of a first family member and a second feature vector from the voice data of a second family member. Based on the two feature vectors, it may generate a family relationship model corresponding to the family relationship between the first and second family members.

The family relationship model generation unit 110 may generate the family relationship model by performing voice similarity modeling with the first and second feature vectors.

Referring to FIG. 4B, the family relationship model generator 110 may use an autoencoder to train a voice model of the older sister 420 in the sister relationship. For example, upon receiving the older sister's voice data, the generator 110 may train the autoencoder so that its bottleneck layer extracts effective features through the auto-encoding process.

Through unsupervised learning on the input voice, the encoding part 421 may extract a first feature vector at the bottleneck layer. This vector consists of the features that most effectively represent the older sister's spoken voice. The extracted code of the older sister may be denoted 'c1'.

Referring to FIG. 4C, the family relationship model generator 110 may use an autoencoder to train a voice model of the younger sister 430 in the sister relationship. For example, upon receiving the younger sister's voice data, the generator 110 may repeat training until the output approximates the input.
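The sketch below illustrates "repeat training until the output approximates the input". It assumes a linear autoencoder trained with plain gradient descent on the mean squared reconstruction error. The patent does not specify the architecture, optimizer, or stopping criterion, so `lr`, `tol`, and `max_epochs` are illustrative values.

```python
import numpy as np

def train_autoencoder(X: np.ndarray, d: int, lr: float = 0.01, tol: float = 1e-3,
                      max_epochs: int = 5000, seed: int = 0):
    """Train a linear autoencoder on the rows of X until the mean reconstruction
    error drops below tol (or max_epochs is reached).
    Returns encoder E (d x D), decoder G (D x d) and the per-epoch loss history."""
    rng = np.random.default_rng(seed)
    n, D = X.shape
    E = rng.normal(0, 0.1, (d, D))
    G = rng.normal(0, 0.1, (D, d))
    history = []
    for _ in range(max_epochs):
        Y = X @ E.T                      # bottleneck codes, shape (n, d)
        Z = Y @ G.T                      # reconstructions, shape (n, D)
        R = Z - X                        # residuals
        loss = float(np.mean(np.sum(R ** 2, axis=1)))
        history.append(loss)
        if loss < tol:                   # output now approximates input
            break
        # Gradients of the mean squared reconstruction error.
        grad_G = 2.0 / n * R.T @ Y       # (D, d)
        grad_E = 2.0 / n * (R @ G).T @ X # (d, D)
        G -= lr * grad_G
        E -= lr * grad_E
    return E, G, history

# Synthetic rank-3 data standing in for voice feature frames (illustrative only).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 10))
E, G, history = train_autoencoder(X, d=3)
codes = X @ E.T                          # one 3-dim bottleneck code per frame
```

A practical voice model would use nonlinear layers and a stochastic optimizer. The stopping rule (loss below a tolerance) is the part that mirrors the text.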

The family relationship model generator 110 may use the encoding part 431 to extract, at the bottleneck layer, a second feature vector that reflects the younger sister's voice well. The extracted code of the younger sister may be denoted 'c2'.

Referring to FIG. 4D, the family relationship model generator 110 may perform modeling so that 'c1' (440) is converted into 'c2' (450). Here 'c1' is the code derived from the voice model of the older sister 420, and 'c2' is the code derived from the voice model of the younger sister 430. Such modeling is possible because code pairs taken from data in a sister relationship are similar. By training on pairs of sister-relationship data as inputs and outputs, the generator learns this similarity relation. It thereby performs correlation modeling of the sister relationship and produces a sister relationship model.
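The sketch below shows one way to learn the code2code mapping from paired codes (c1, c2). It assumes the mapping is a linear map with a bias, fitted by least squares. The patent describes learning the correlation between paired codes but not the model type, so the linear form is an assumption; a small neural network could take its place.

```python
import numpy as np

def fit_code2code(C1: np.ndarray, C2: np.ndarray) -> np.ndarray:
    """Fit M so that [c1, 1] @ M approximates c2 over all training pairs.
    C1: (n, d1) older-sister codes; C2: (n, d2) paired younger-sister codes."""
    C1_aug = np.hstack([C1, np.ones((C1.shape[0], 1))])   # append a bias column
    M, *_ = np.linalg.lstsq(C1_aug, C2, rcond=None)
    return M                                               # shape (d1 + 1, d2)

def code2code(c1: np.ndarray, M: np.ndarray) -> np.ndarray:
    """Convert a single older-sister code c1 into a predicted younger-sister code c2."""
    return np.append(c1, 1.0) @ M

# Synthetic paired codes with a known linear relation (illustrative only).
rng = np.random.default_rng(0)
C1 = rng.normal(size=(100, 8))
true_M = rng.normal(size=(9, 8))
C2 = np.hstack([C1, np.ones((100, 1))]) @ true_M
M = fit_code2code(C1, C2)
c2_pred = code2code(C1[0], M)
```

Real code pairs would only be approximately related. The least-squares fit then gives the best linear predictor rather than an exact recovery.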

Referring to FIG. 4E, once the sister relationship has been modeled, the family relationship model generator 110 may connect the encoder part 421 of the older sister 420 to the decoder part 431 of the younger sister 430. For example, input data is fed to the older sister's encoder part 421, and features are extracted through the bottleneck layer. The sister relationship model then infers the code of a similar voice through code2code conversion. After conversion into that voice code, the voice may be output through the decoding part 431 of the younger sister 430.
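The sketch below traces the data flow of FIG. 4E, assuming each trained component is available as a plain function. The toy linear maps are hypothetical stand-ins for three components: the older sister's encoder 421, the sister relationship model, and the younger sister's decoder. Here the decoder output is a feature vector; turning it into audio is outside this sketch.

```python
from typing import Callable
import numpy as np

Vector = np.ndarray

def make_sister_pipeline(encoder: Callable[[Vector], Vector],
                         code2code: Callable[[Vector], Vector],
                         decoder: Callable[[Vector], Vector]) -> Callable[[Vector], Vector]:
    """Compose: input features -> older-sister code c1 -> predicted c2 -> younger-sister output."""
    def pipeline(x: Vector) -> Vector:
        c1 = encoder(x)        # bottleneck code from the older sister's encoder
        c2 = code2code(c1)     # relationship model infers the paired code
        return decoder(c2)     # younger sister's decoder reconstructs features
    return pipeline

# Toy linear stand-ins (hypothetical) to show the data flow and shapes.
rng = np.random.default_rng(0)
W_enc = rng.normal(size=(8, 40))
W_rel = rng.normal(size=(8, 8))
W_dec = rng.normal(size=(40, 8))
pipeline = make_sister_pipeline(lambda v: W_enc @ v, lambda c: W_rel @ c, lambda c: W_dec @ c)

x = rng.normal(size=40)
out = pipeline(x)
```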

The input unit 120 may receive voice data of a family member related to the user for whom voice data is to be generated. For example, the input unit 120 may receive voice data of a family member (e.g., the mother or older sister) of a deaf person with a congenital inability to speak.

The selection unit 130 may select one family relationship group model from among the plurality of family relationship group models, based on the family relationship between the user and the family member. For example, when that relationship is a sister relationship, the selection unit 130 may select the family relationship group model corresponding to the sister relationship.

The selection unit 130 may also base its choice of family relationship group model on the genders of the user and the family member. For example, when both are male, the selection unit 130 may select the group model corresponding to an all-male sibling (brother) relationship or the group model corresponding to a father-son relationship.

The selection unit 130 may also base its choice of family relationship group model on the age difference between the user and the family member. For example, when the family relationship is a sister relationship, the selection unit 130 considers the sister-relationship group models. If the age difference exceeds a threshold, it may select the model for age differences above the threshold; otherwise it may select the model for age differences below the threshold.
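The sketch below shows the rule-based selection by relationship, gender, and age difference. It assumes group models are stored in a dictionary keyed by relationship, gender pair, and age-gap bucket. The key names, the model labels, and the 5-year `AGE_THRESHOLD` are hypothetical; the patent only refers to "a threshold".

```python
from dataclasses import dataclass

AGE_THRESHOLD = 5  # years; hypothetical value

@dataclass
class Person:
    gender: str  # "M" or "F"
    age: int

def select_group_model(models: dict, relationship: str, user: Person, member: Person,
                       threshold: int = AGE_THRESHOLD):
    """Pick a group model by relationship, then narrow it by gender pair and age gap."""
    genders = "".join(sorted(user.gender + member.gender))   # "FF", "FM" or "MM"
    gap = abs(user.age - member.age)
    # A gap exactly equal to the threshold is treated as "below" here (arbitrary choice).
    bucket = "gap_above" if gap > threshold else "gap_below"
    return models[(relationship, genders, bucket)]

models = {
    ("sibling", "FF", "gap_below"): "sisters_close_age",
    ("sibling", "FF", "gap_above"): "sisters_wide_age",
    ("sibling", "MM", "gap_below"): "brothers_close_age",
    ("parent-child", "MM", "gap_above"): "father_son",
}
chosen = select_group_model(models, "sibling", Person("F", 24), Person("F", 21))
```

Keying on the sorted gender pair means sisters and brothers fall under one "sibling" relationship but still map to different models.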

According to an embodiment, when some voice data of the user exists, the selection unit 130 may select one family relationship group model from among the plurality of group models based on both the user's voice data and the family member's voice data. Here, the user's voice data may be the portion of the user's vocalizations (e.g., those produced through vocal-cord vibration) that is usable as speech.

The selection unit 130 may also base its choice of family relationship group model on a pitch distance between the user's voice data and the family member's voice data. For example, when the family relationship is a sister relationship, the pitch distance is computed between the f0 (fundamental frequency) of the user's voice to be restored and the f0 of the family member's voice. Based on that distance, the selection unit 130 may select, from among the sister-relationship group models, either the model for pitch distances below a threshold or the model for pitch distances above it.
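The sketch below covers the pitch-distance rule. It assumes f0 tracks (in Hz, with 0 marking unvoiced frames) have already been estimated by a pitch tracker. It also assumes "pitch distance" means the difference between the mean voiced f0 values expressed in semitones. The semitone scale and the 2-semitone threshold are illustrative assumptions.

```python
import math

def mean_f0(f0_track) -> float:
    """Mean fundamental frequency (Hz) over voiced frames; 0 marks unvoiced frames."""
    voiced = [f for f in f0_track if f > 0]
    if not voiced:
        raise ValueError("no voiced frames in f0 track")
    return sum(voiced) / len(voiced)

def pitch_distance_semitones(f0_user, f0_member) -> float:
    """Absolute distance between the two speakers' mean f0, in semitones."""
    return abs(12 * math.log2(mean_f0(f0_user) / mean_f0(f0_member)))

def select_by_pitch(models: dict, distance: float, threshold: float = 2.0):
    """Choose the near-pitch model below the threshold, otherwise the far-pitch model."""
    return models["pitch_near"] if distance < threshold else models["pitch_far"]

distance = pitch_distance_semitones([0, 220, 220, 0], [0, 440])   # one octave apart
chosen = select_by_pitch({"pitch_near": "sisters_near", "pitch_far": "sisters_far"}, distance)
```

Measuring in semitones makes the distance depend on the frequency ratio, not the absolute gap in Hz.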

The selection unit 130 may also base its choice of family relationship group model on the pitch distribution of the user's and the family member's voice data. For example, when the family relationship is a sister relationship, the selection unit 130 checks whether the pitch distribution of the user and the family member lies in a low-frequency or a high-frequency band. It may then select the sister-relationship group model corresponding to low frequencies or the one corresponding to high frequencies.
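The sketch below covers the low-band versus high-band decision. It assumes the "pitch distribution" is summarized by the median voiced f0 of both speakers combined, compared against a boundary frequency. The 200 Hz boundary is a hypothetical value, not one given in the text.

```python
from statistics import median

def pitch_band(f0_user, f0_member, boundary_hz: float = 200.0) -> str:
    """Return 'low' or 'high' depending on where the combined voiced f0 distribution lies."""
    voiced = [f for f in list(f0_user) + list(f0_member) if f > 0]
    if not voiced:
        raise ValueError("no voiced frames")
    return "low" if median(voiced) < boundary_hz else "high"

band_models = {"low": "sisters_low_pitch", "high": "sisters_high_pitch"}
chosen = band_models[pitch_band([180, 190, 0], [170])]
```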

The generation unit 140 may generate the user's voice data by inputting the family member's voice data into the family relationship model that was selected based on the user's voice data and the family member's voice data. The generated voice data may be provided during a call service between the user and another party.

According to another embodiment, when no voice data of the user exists, the selection unit 130 may select one family relationship group model from among the plurality of group models based on the voice data of a family member related to the user. The process of selecting a group model and generating the user's voice data in this case is described in detail with reference to FIGS. 5A to 5C.

FIGS. 5A to 5C are exemplary diagrams illustrating a process of generating a user's voice data using a family relationship group model selected based on the voice data of a family member related to the user, according to an embodiment of the present invention. The following description assumes that the deaf person is the older sister and the hearing person is the younger sister.

Referring to FIG. 5A, when no usable voice data of the deaf older sister exists, the selection unit 130 may act on the relationship between the older and younger sisters. From among the plurality of family relationship group models, it may select the group model corresponding to the sister relationship 500.

The selection unit 130 may then compare the ages of the deaf older sister and the hearing younger sister. Depending on whether the age difference is above or below a threshold, it may select the model of the first group 510 or the second group 511 from among the group models for the sister relationship 500.

Finally, the selection unit 130 may compare the selected first group model 510 or second group model 511 with the younger sister's voice data. Based on that similarity, it may select a model belonging to one of the third group 520 through the sixth group 523.

Referring to FIG. 5B, the generation unit 140 may extract a feature vector from the younger sister's voice data using an autoencoder or a convolutional neural network (CNN). For example, a 40-dimensional (or, more generally, N-dimensional) mel spectrogram may be extracted from the younger sister's voice data in 10 ms frames. The generation unit 140 then forms an M × 40 matrix, where M corresponds to the time length of the voice data (the number of frames). It extracts features of the younger sister's voice from this matrix through CNN pooling and obtains a 1 × 256 feature vector. Here, pooling in a CNN downsamples the feature map and thereby reduces the number of input variables.
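The sketch below traces the shapes described above. It assumes 16 kHz audio and a 10 ms hop of 160 samples. It uses one 1-D convolution over time (256 channels, ReLU) followed by global max pooling, which collapses the variable-length M × 40 matrix into a 1 × 256 vector. Computing the mel spectrogram itself is omitted, and a random matrix stands in for it. The random kernels stand in for trained weights. The layer layout is an assumption, not the patent's network.

```python
import numpy as np

SAMPLE_RATE = 16_000          # assumed sampling rate
HOP = SAMPLE_RATE // 100      # 10 ms hop -> 160 samples
N_MELS, EMB_DIM, KERNEL = 40, 256, 5

def num_frames(n_samples: int) -> int:
    """Number of 10 ms frames M (ignoring edge padding)."""
    return n_samples // HOP

def conv1d_time(mel: np.ndarray, kernels: np.ndarray) -> np.ndarray:
    """Valid 1-D convolution over time with ReLU.
    mel: (M, 40); kernels: (256, KERNEL, 40) -> feature map (M - KERNEL + 1, 256)."""
    M = mel.shape[0]
    windows = np.stack([mel[t:t + KERNEL] for t in range(M - KERNEL + 1)])  # (T', KERNEL, 40)
    return np.maximum(np.einsum("tkf,ckf->tc", windows, kernels), 0.0)

def embed(mel: np.ndarray, kernels: np.ndarray) -> np.ndarray:
    """Global max pooling over time downsamples the feature map to a 1 x 256 vector."""
    return conv1d_time(mel, kernels).max(axis=0, keepdims=True)

rng = np.random.default_rng(0)
M = num_frames(3 * SAMPLE_RATE)                        # 3 s of audio -> 300 frames
mel = rng.random((M, N_MELS))                          # stand-in for a log-mel spectrogram
kernels = rng.normal(0, 0.1, (EMB_DIM, KERNEL, N_MELS))
vec = embed(mel, kernels)
```

Pooling over the time axis makes the output length independent of M, so utterances of any duration yield vectors that can be compared directly.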

Using Equation 5, the selection unit 130 may then compare A, the younger sister's voice feature, with B, the pre-trained family data of whichever group (third group 520 to sixth group 523) was finally selected. It then selects the most similar speaker.
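Equation 5 is not reproduced in this text. The sketch below assumes it is a cosine similarity, cos(A, B) = A·B / (‖A‖‖B‖). It compares the younger sister's feature vector A with each stored speaker vector B in the selected group and takes the speaker with the highest score. The cosine form and the speaker identifiers are assumptions.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def most_similar_speaker(A: np.ndarray, group: dict):
    """group maps speaker_id -> stored feature vector B; returns (best_id, score)."""
    scores = {sid: cosine_similarity(A, B) for sid, B in group.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

group = {
    "spk1": np.array([0.0, 1.0, 0.0]),
    "spk2": np.array([0.9, 0.1, 0.0]),
    "spk3": np.array([-1.0, 0.0, 0.0]),
}
best_id, score = most_similar_speaker(np.array([1.0, 0.0, 0.0]), group)
```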

The generation unit 140 may then analyze the similarity between feature vectors using a deep learning model with a feed-forward network structure. In a feed-forward network, data enters at an input layer, passes through one or more hidden layers, and exits as output values at a final output layer.

For example, the generation unit 140 may analyze the similarity using a fully connected network. A fully connected network forms the final stage of a deep learning model and determines the classification. It connects the layer outputs, flattened into one-dimensional vectors, into a single vector.

For example, the feature vector derived from the younger sister's voice data may be mapped to X 540, and Y 541 may be the index of a speaker used in training. Indexes may be assigned in speaker order, such as 'speaker 1: 1' and 'speaker 100: 100'. The generation unit 140 may generate the user's voice data by inputting the family member's voice data into the family relationship model selected based on the voice data of the family member related to the user. The generated voice data may be provided during a call service between the user and another party.
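The sketch below shows the fully connected, feed-forward classifier that maps a feature vector X to a speaker index Y. It assumes one hidden ReLU layer and a softmax over speakers. The layer sizes and random weights are placeholders for a trained model. Speakers are indexed from 1, as in the text.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())          # subtract max for numerical stability
    return e / e.sum()

class SpeakerClassifier:
    """Feed-forward network: 256-dim feature X -> hidden layer -> probabilities over speakers."""

    def __init__(self, in_dim: int = 256, hidden: int = 128, n_speakers: int = 100, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0, 0.05, (hidden, in_dim))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0, 0.05, (n_speakers, hidden))
        self.b2 = np.zeros(n_speakers)

    def predict_proba(self, x: np.ndarray) -> np.ndarray:
        h = np.maximum(self.W1 @ x + self.b1, 0.0)   # hidden layer (ReLU)
        return softmax(self.W2 @ h + self.b2)        # fully connected output layer

    def predict_speaker(self, x: np.ndarray) -> int:
        """Return the 1-based speaker index Y ('speaker 1: 1', ..., 'speaker 100: 100')."""
        return int(np.argmax(self.predict_proba(x))) + 1

clf = SpeakerClassifier()
x = np.random.default_rng(1).normal(size=256)
p = clf.predict_proba(x)
y = clf.predict_speaker(x)
```

After training with speaker indexes as labels, the output probabilities can be read as the similarity between the input voice and each training speaker.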

The voice data generating apparatus 100 may be implemented by a computer program stored in a medium, the program comprising a sequence of instructions for generating voice data. When executed by a computing device, the instructions cause the voice data generating apparatus 100 to perform four steps. First, it generates a plurality of family relationship models corresponding to the family relationships of family members, based on the voice data of the plurality of family members. Second, it receives voice data of a family member related to a user for whom voice data is to be generated. Third, it selects one of the family relationship models based on the family relationship between the user and the family member. Fourth, it generates the user's voice data by inputting the family member's voice data into the selected family relationship model.

FIGS. 6A to 6D are exemplary diagrams illustrating a process of performing a call service through an authentication process in a user terminal, according to an embodiment of the present invention.

The user terminal may accept registration of deaf-user information, including identifier information of the user who will use the voice generation service. For example, after launching the voice generation service app, the user terminal may accept registration of deaf-user information such as the name and phone number required for the authentication process.

The user terminal may perform a call service with another user's terminal based on the result of the authentication process. For the authentication process, registration information of the deaf person may be registered in advance in a deaf-user database.

For example, a deaf user may submit a document certifying deafness to an offline store of the telecommunications carrier in advance. After a service administrator reviews the submitted document, the deaf user's registration information may be registered in the deaf-user database.

As another example, a deaf user may scan and submit the certifying document through the carrier's app, which can be linked with the voice generation service app. After a service administrator reviews the submitted document, the deaf user's registration information may be registered in the deaf-user database.

The user terminal may perform the authentication process by checking whether the registered deaf-user information matches the registration information stored in the deaf-user database. For example, the user terminal may check whether the phone number of the user terminal, included in the deaf-user information, matches the deaf person's phone number in the registration information stored in the deaf-user database.
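The sketch below shows the phone-number matching step. It assumes the deaf-user database is a mapping keyed by normalized phone number, and that normalization simply strips non-digit characters. The record layout and function names are hypothetical.

```python
import re

def normalize_phone(number: str) -> str:
    """Keep digits only so '010-1234-5678' and '01012345678' compare equal."""
    return re.sub(r"\D", "", number)

def authenticate(registered_info: dict, deaf_db: dict) -> bool:
    """True if the terminal's phone number matches a pre-registered deaf-user record."""
    return normalize_phone(registered_info["phone"]) in deaf_db

# Records are added only after an administrator has reviewed the certifying document.
deaf_db = {"01012345678": {"name": "Kim"}}
ok = authenticate({"name": "Kim", "phone": "010-1234-5678"}, deaf_db)
```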

The user terminal may receive text or voice input while the call service is in progress. For example, when authentication succeeds, the user terminal may conduct a call service with another user's terminal (not shown) in 'deaf mode'. To enable such a call, information about the other user may be registered in advance as hearing-user registration information in a hearing-user database.

Here, the other user's hearing-user registration information may be registered when that user's terminal receives an invitation message from the user terminal. For example, to set up a call service, the (deaf) user selects, through the voice generation service app, at least one of the other (hearing) users stored in the user terminal's address book. The user terminal then sends the selected user's terminal (not shown) an invitation message containing a link to the voice generation service app. Sending the invitation message may register the hearing user's registration information (e.g., phone number) in the hearing-user database.
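The sketch below covers the invitation step. It assumes an in-memory dictionary for the hearing-user database and a caller-supplied `send_sms` function. The link is a placeholder, not a real download address. The `complete_install` helper is a hypothetical stand-in for the later step in which the invited user installs the app and authenticates.

```python
from typing import Callable

APP_LINK = "https://example.com/voice-app"  # placeholder link

def invite_hearing_users(selected_contacts: list, hearing_db: dict,
                         send_sms: Callable[[str, str], None], inviter: str) -> None:
    """Send an invitation with the app link and register each invitee in the hearing-user DB."""
    for contact in selected_contacts:
        phone = contact["phone"]
        send_sms(phone, f"{inviter} invited you to a voice call app: {APP_LINK}")
        hearing_db.setdefault(phone, {"name": contact["name"], "invited_by": inviter,
                                      "app_installed": False})

def complete_install(phone: str, hearing_db: dict) -> None:
    """Mark the invitee as installed/authenticated so they appear in the deaf user's contacts."""
    hearing_db[phone]["app_installed"] = True

sent = []
hearing_db = {}
invite_hearing_users([{"name": "Mom", "phone": "01098765432"}], hearing_db,
                     lambda p, m: sent.append((p, m)), inviter="01012345678")
```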

The invited (hearing) user may then install the voice generation service app through the link and use the installed app to authenticate their user identification information (e.g., phone number). Once authenticated, the hearing user's contact may appear in the contact list of the (deaf) user's terminal, and the hearing user may conduct the call service in 'hearing mode'.

The user terminal may receive information about at least one other (hearing) user from the hearing-user database and conduct a call service based on that information. For example, the user terminal may display, in the (deaf) user's contact list, the contacts of hearing users available for the call service. It may then conduct a call with whichever displayed hearing user is selected.

The following description refers to the UI of the voice generation service app on the user terminal.

FIG. 6A is an exemplary diagram illustrating a process of inviting another user's terminal from a user terminal, according to an embodiment of the present invention. Referring to FIG. 6A, the authentication process for the voice generation service is first completed based on the deaf-user information. The user terminal 600 may then invite at least one other user's terminal (not shown) through the 'Invite Friend' icon 611 in the contact menu 610.

For example, when the user selects the 'Invite Friend' icon 611, the user terminal 600 may display a contact list 612 containing the contacts of one or more other users. When the user selects the 'Add' icon 614 for another user to invite, that user is registered in the contact management menu 615. The user may also first use the 'Search' button 613 to find the user to invite more easily, and then select the 'Add' icon 614 to register that user in the contact management menu 615.

Among the other users registered in the contact management menu 615, the user may press the 'Invite' button 616 for a specific user, and the user terminal 600 then selects that user for the call service. If the number of users registered in the contact management menu 615 reaches a preset number (e.g., four), the user terminal 600 may accept an additional registration only after a user has been removed through the 'Delete Friend' button 617. For example, when the user selects the 'Invite' button 616 in the name field of 'Mom', an invitation message containing an installation link for the voice generation service app may be sent to Mom's user terminal. Sending the invitation message may register the hearing mother's registration information in the hearing-user database.
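The sketch below captures the contact-management limit, assuming the preset limit of four from the example. The class and method names are hypothetical.

```python
class ContactManager:
    """Holds invited contacts; refuses new ones once the preset limit is reached."""

    def __init__(self, limit: int = 4):
        self.limit = limit
        self.contacts: list[str] = []

    def add(self, name: str) -> None:
        if name in self.contacts:
            return                                   # already registered
        if len(self.contacts) >= self.limit:
            raise ValueError(f"limit of {self.limit} reached; delete a friend first")
        self.contacts.append(name)

    def delete(self, name: str) -> None:
        self.contacts.remove(name)
```

For example, after adding four contacts, adding a fifth raises an error until one is removed with `delete`, mirroring the 'Delete Friend' button.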

Once the voice generation service app has been installed on Mom's user terminal, a call service may be conducted between the user terminal and Mom's user terminal.

The user terminal 600 may display, in the contact menu 610, the list of other users who have installed the voice generation service app, and may conduct a call service with whichever user is selected from that list. Through the 'Edit' button 618, the user terminal may also change the order of users in the list and register a nickname for each user.

FIG. 6B is an exemplary diagram illustrating a process of performing a voice call service in a user terminal, according to an embodiment of the present invention. Referring to FIG. 6B, the user selects, from among the other users registered in the contact menu 620, the user to call, and the user terminal 600 receives that selection. When the user selects another user's name, the user terminal 600 may display the call history with that user.

For example, to make a voice call with 'Hong Gil-dong', one of the users registered in the contact menu 620, the user may press the voice call button 621 in Hong Gil-dong's contact area. The user terminal 600 may then display a call preparation screen 622 and receive a press of the 'Call' button 623. If the other user accepts the call, the user terminal may conduct a voice call service between the user and the other user.

During the voice call service, the user terminal 600 may operate in 'deaf mode' and the other user's terminal 630 may operate in 'hearing mode'.

The following describes how the voice call service is conducted in each of the user terminal 600 and the other user's terminal 630.

When the user terminal 600 receives text from the user through the input window 624, followed by a press of the 'Send' button 625, the input text may be converted into the user's voice data and transmitted to the other user's terminal 630.

The other user's terminal 630 may activate the speaker mode 632 to output the user's voice data.

The other user's terminal 630 may display whether the microphone mode is active (633).

The other user's terminal 630 may activate the microphone mode 631 to receive the other user's spoken reply to the user's voice data. The spoken reply may be converted into text and transmitted to the user terminal.
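The sketch below shows the two directions of a call. It assumes `synthesize` (text to speech in the user's generated voice) and `transcribe` (speech to text) are supplied by the service. Both are hypothetical callables, replaced here by trivial stubs so the flow can run.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class CallSession:
    synthesize: Callable[[str], bytes]   # text -> audio in the deaf user's generated voice
    transcribe: Callable[[bytes], str]   # hearing user's speech -> text

    def from_deaf_user(self, text: str) -> bytes:
        """Deaf mode: typed text becomes voice audio sent to the hearing user's terminal."""
        return self.synthesize(text)

    def from_hearing_user(self, audio: bytes) -> str:
        """Hearing mode: speech captured in microphone mode becomes text for the deaf user."""
        return self.transcribe(audio)

# Stub converters (hypothetical) that just encode/decode UTF-8 to show the routing.
session = CallSession(synthesize=lambda t: t.encode("utf-8"),
                      transcribe=lambda a: a.decode("utf-8"))
audio = session.from_deaf_user("Hello, Mom")
reply = session.from_hearing_user(b"Hi, dear")
```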

While the voice call service is in progress, the other user may press the video call button 634, and the other user's terminal 630 then switches from the voice call service to the video call service.

To end the voice call service, the other user may press the end-voice-call button 635 on the other user's terminal 630.

FIG. 6C is an exemplary diagram illustrating a process of performing a video call service in a user terminal, according to an embodiment of the present invention. Referring to FIG. 6C, the user terminal 600 may display the user's face and the other user's face together during the video call service. The user terminal 600 operates in 'deaf mode'. When it receives text from the user through the input window 641, followed by a press of the 'Send' button 642, the input text may be converted into the user's voice data and transmitted to the other user's terminal 630.

The other user's terminal 630 may display the other user's face and the user's face together during the video call service. It operates in 'hearing mode'. When the other user activates the microphone mode 644 and speaks, the speech may be converted into text and transmitted to the user terminal.

While performing the video call service, the other user's terminal 630 may receive an input on the voice call button 643 from the other user and switch from the video call service to the voice call service.

When the other user wishes to end the video call service, the other user's terminal 630 may receive an input on the video call end button 645 from the other user.
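The call switching described above covers voice to video, video to voice, and ending either call. It can be modeled as a small state machine. The following Python sketch is illustrative only. The state names and button labels are hypothetical and only mirror reference numerals 634, 635, 643 and 645.

```python
from enum import Enum


class CallState(Enum):
    """Hypothetical call states of the other user's terminal 630."""
    VOICE = "voice"
    VIDEO = "video"
    ENDED = "ended"


# (current state, button pressed) -> next state; button names are illustrative.
TRANSITIONS = {
    (CallState.VOICE, "video_call"): CallState.VIDEO,  # cf. button 634
    (CallState.VOICE, "end_call"): CallState.ENDED,    # cf. button 635
    (CallState.VIDEO, "voice_call"): CallState.VOICE,  # cf. button 643
    (CallState.VIDEO, "end_call"): CallState.ENDED,    # cf. button 645
}


def press(state: CallState, button: str) -> CallState:
    """Return the next call state; presses with no defined transition are ignored."""
    return TRANSITIONS.get((state, button), state)
```

For example, `press(CallState.VOICE, "video_call")` moves a voice call to video. An ended call stays ended whatever is pressed.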

While the video call service is being performed, the user terminal 600 may receive a scroll input 648 on the screen from the user. Scrolling the screen up or down through the scroll input 648 may expand or collapse the displayed conversation history.

While performing the video call service, the user terminal 600 may receive a selection of an image from the album library 649 or an image captured by the camera, and transmit the selected image to the other user's terminal 630. Up to a preset number of images (for example, five) may be transmitted. Images may be sent during the voice call service as well as during the video call service.
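The image limit can be enforced with a simple counter on the call session. This sketch assumes the limit applies per call and uses the example value of five. The text only says "a preset number", so both the per-call scope and the `CallSession` class are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

# The "preset number" of images; 5 is the example value given in the text.
MAX_IMAGES_PER_CALL = 5


@dataclass
class CallSession:
    """Hypothetical call session; images may be sent in voice or video calls."""
    call_type: str  # "voice" or "video"
    sent_images: List[str] = field(default_factory=list)

    def send_image(self, image_id: str) -> bool:
        """Queue an image for the other terminal; refuse once the cap is reached."""
        if self.call_type not in ("voice", "video"):
            raise ValueError(f"unknown call type: {self.call_type}")
        if len(self.sent_images) >= MAX_IMAGES_PER_CALL:
            return False
        self.sent_images.append(image_id)
        return True
```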

FIG. 6D is an exemplary diagram for explaining a process of converting text into the user's voice data according to an embodiment of the present invention. Referring to FIG. 6D, the user terminal 600 may receive a selection of the conversation storage menu 650 from the user. A 'real-time conversation' service or a 'create my voice' service may be provided through the conversation storage menu 650.

For example, when the user is in the presence of another user, the user terminal 600 may provide a voice conversation function through the 'real-time conversation' service.

When the user selects the real-time conversation icon 660, the user terminal 600 activates the 'deaf mode' 661 and receives text 662 from the user. Upon an input on the speak button 663, the input text may be converted into the user's voice data. The user terminal 600 may also activate the 'hearing mode' 664 and receive speech uttered by the other user through the microphone. Upon an input on the speech completion button 665, the utterance may be converted into text in real time and displayed.
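The two modes of the real-time conversation service are mirror images of each other. Deaf mode turns typed text into speech in the user's voice. Hearing mode turns the other person's speech into on-screen text. This is a minimal sketch under stated assumptions. The `synthesize` and `transcribe` callables stand in for the device's speech synthesis and speech recognition, and `fake_tts`/`fake_stt` are placeholder engines so the example runs. None of these names come from the patent.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Utterance:
    text: str
    audio: bytes  # synthesized speech (deaf mode) or recorded speech (hearing mode)


def deaf_mode(text: str, user_voice: str,
              synthesize: Callable[[str, str], bytes]) -> Utterance:
    """Deaf mode: typed text -> speech in the user's generated voice."""
    return Utterance(text=text, audio=synthesize(text, user_voice))


def hearing_mode(audio: bytes, transcribe: Callable[[bytes], str]) -> Utterance:
    """Hearing mode: the other person's speech -> text shown on screen."""
    return Utterance(text=transcribe(audio), audio=audio)


# Placeholder engines (hypothetical); a real system would call TTS/STT services.
def fake_tts(text: str, voice: str) -> bytes:
    return f"[{voice}] {text}".encode("utf-8")


def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")
```

Passing the engines in as callables keeps the mode logic independent of any particular TTS or STT backend.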

As another example, through the 'create my voice' icon 670, the user terminal 600 may convert sentences composed of text the user frequently uses into voice data and store them. For instance, when the user terminal 600 receives frequently used text 671 from the user followed by an input on the 'speak' button 672, it may convert the input text 671 into the user's voice data. When the user then presses the 'save' button 673, the user terminal 600 may store the input sentence together with its corresponding voice data. Thereafter, the user terminal can output a stored sentence as voice data simply by clicking the play button 674 for that sentence.
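The speak, save and play sequence of 'create my voice' amounts to a small phrasebook. Each entry is a saved sentence keyed to the audio already synthesized for it, so replaying a stored sentence needs no new synthesis. The sketch below is hypothetical. The class name is invented, and `synthesize` stands in for conversion into the user's voice.

```python
from typing import Callable, Dict, List, Optional, Tuple


class MyVoicePhrasebook:
    """Hypothetical store of frequently used sentences and their voice data."""

    def __init__(self, synthesize: Callable[[str], bytes]) -> None:
        self._synthesize = synthesize                   # text -> audio in user's voice
        self._last: Optional[Tuple[str, bytes]] = None  # result of the last 'speak'
        self._saved: Dict[str, bytes] = {}

    def speak(self, text: str) -> bytes:
        """'Speak' button: convert the text and remember the result."""
        audio = self._synthesize(text)
        self._last = (text, audio)
        return audio

    def save(self) -> None:
        """'Save' button: store the last spoken sentence with its voice data."""
        if self._last is None:
            raise ValueError("press 'speak' before 'save'")
        text, audio = self._last
        self._saved[text] = audio

    def play(self, text: str) -> bytes:
        """'Play' button: return stored audio without re-synthesizing."""
        return self._saved[text]

    def sentences(self) -> List[str]:
        return list(self._saved)
```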

FIGS. 7A to 7C are exemplary diagrams for explaining a process in which a user terminal receives the user's voice data service through an app according to an embodiment of the present invention. Here, the voice data generating device 100 manages access rights for each user and generates the user's voice data. When the user's voice data is generated, the device checks the result and may apply various service logic for providing the generated voice data. The voice data generating device 100 may also relay between the user terminal 700 and another user terminal (not shown). For example, it may relay text and synthesized voice files between the user terminal 700 and the other user terminal, and may control requests for video and voice calls between them.
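The relay role of the voice data generating device 100 combines three functions: per-user access rights, forwarding of text plus synthesized voice files, and control of call requests. The following sketch illustrates these under simplifying assumptions. The classes are hypothetical, the inboxes are in-memory, and "access rights" is reduced to a set of authorized user IDs.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Payload:
    sender: str
    recipient: str
    text: str
    voice_file: bytes = b""  # synthesized voice; empty for a plain text message


class RelayServer:
    """Hypothetical sketch of the relay role of the voice data generating device."""

    def __init__(self) -> None:
        self.authorized: Set[str] = set()            # users with access rights
        self.inboxes: Dict[str, List[Payload]] = {}

    def grant(self, user_id: str) -> None:
        self.authorized.add(user_id)
        self.inboxes.setdefault(user_id, [])

    def relay(self, p: Payload) -> bool:
        """Forward text plus voice file only between authorized users."""
        if p.sender not in self.authorized or p.recipient not in self.authorized:
            return False
        self.inboxes[p.recipient].append(p)
        return True

    def request_call(self, caller: str, callee: str, kind: str) -> bool:
        """Accept a 'voice' or 'video' call request between authorized users."""
        return kind in ("voice", "video") and {caller, callee} <= self.authorized
```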

The user terminal 700 may receive services that use the user's voice data generated by the voice data generating device 100.

Referring to FIG. 7A, the user terminal 700 may execute the voice data service app and, through the app, select a partner 701 to converse with. Here, the user may be a deaf person with a congenital disability that prevents speech.

After selecting the partner 701, the user terminal 700 may select one of 'my voice conversation' 702, 'video conversation' 703, and 'text'.

Referring to FIGS. 7A and 7B, when the user selects 'my voice conversation' 702, the user terminal 700 may exchange messages with the partner 701 via the voice data generating device 100. Each message may be displayed as text and may also be output through 'my voice conversation' 711 in the user's generated voice. Alternatively, when the user presses the video call button 712, the user terminal 700 may display an outgoing call screen 713 and conduct a video call 714 with the partner 701, the 'husband', via the voice data generating device 100.

Referring to FIGS. 7A and 7C, when the user selects 'video conversation' 703, the user terminal 700 may display an outgoing call screen 720 for a video call with the conversation partner 'husband' 721. It may then conduct a video call 722 with him via the voice data generating device 100. When the user then presses the 'my voice conversation' button 723, the user terminal 700 switches from the video call screen to a chat screen that supports 'my voice conversation'. It may then exchange messages with the husband via the voice data generating device 100. Each message may be displayed as text and simultaneously output through 'my voice conversation' 724 in the user's voice generated by the voice data generating device 100.

FIGS. 8A and 8B are exemplary diagrams for explaining a process of generating a user's voice data and providing a call service according to an embodiment of the present invention.

Here, the voice data generating device 100 manages access rights for each user and generates the user's voice data. When the user's voice data is generated, the device checks the result and may apply various service logic for providing the generated voice data. The voice data generating device 100 may also relay between the first user terminals 800 and 820 and the second user terminals 801 and 821. For example, it may relay text and synthesized voice files between these terminals, and may control requests for video and voice calls between them.

FIG. 8A is an exemplary diagram for explaining a process of providing a call service between a deaf user and a hearing family member according to an embodiment of the present invention. Referring to FIG. 8A, the first user terminal 800 may be the terminal of a hearing daughter, and the second user terminal 801 may be the terminal of a deaf mother.

The first user terminal 800 may execute the voice data service app 802 upon receiving an execution input from the hearing daughter. Through the app 802, the daughter may then select her mother 803 as the conversation partner. For example, after the daughter selects 'Mom' 803 as the person to call, the first user terminal 800 may receive her selection of one of 'my voice conversation' 804, 'video conversation', and 'text'.

For example, when the daughter selects 'my voice conversation' 804, the first user terminal 800 may receive the daughter's speech and transmit it to the voice data generating device 100. It may then display, as a message 805, the text that the device converted from the speech. For instance, when the daughter says "Mom, school is over," the first user terminal 800 may display the text converted from that utterance as message 805.

The second user terminal 801 may receive the converted message 805 from the voice data generating device 100 and display it in the chat window as message 806. The second user terminal 801 may then receive the deaf mother's reply to message 806 and display it as message 807. For example, the second user terminal 801 may receive a reply to message 806 such as "Okay, my eldest daughter. Mom is at home."

The first user terminal 800 may receive a message 808 containing the reply text from the second user terminal 801. The message also contains speech that the voice data generating device 100 produced from that text by P-TTS (text-to-speech), synthesized in the deaf mother's own voice. The first user terminal 800 may then display the received message 808.

When the first user terminal 800 receives speech from the hearing daughter such as "Can I play with my friend for just an hour? I'll play at the playground in front of our house," it may transmit the speech to the voice data generating device 100. It may then display the text converted from the speech by the device as message 809.

The second user terminal 801 may receive the converted message 809 from the voice data generating device 100 and display it in the chat window as message 810. The second user terminal 801 may then receive the deaf mother's reply to message 810 and display it as message 811. For example, the second user terminal 801 may receive a reply to message 810 such as "Okay. Come home by 4 o'clock." The reply is transmitted to the voice data generating device 100 and converted into voice data, so that it can be heard as speech synthesized in the deaf mother's own voice.

The first user terminal 800 may receive from the voice data generating device 100 the reply text together with the speech converted by P-TTS (text-to-speech) in the deaf mother's own voice. It may then display the text together with the speech as message 812.

FIG. 8B is an exemplary diagram for explaining a process of providing a call service between a first user and a second user who are both deaf according to an embodiment of the present invention. Referring to FIG. 8B, the first user terminal 820 may be the terminal of a deaf daughter, and the second user terminal 821 may be the terminal of a deaf mother.

The first user terminal 820 may execute the voice data service app 830 upon receiving an execution input from the deaf daughter. Through the app 830, the daughter may then select her mother 831 as the conversation partner. For example, after the daughter selects 'Mom' 831 as the person to converse with, the first user terminal 820 may receive her selection of one of 'my voice conversation' 832, 'video conversation', and 'text'.

For example, when the daughter selects 'my voice conversation' 832, the first user terminal 820 may receive text input from the daughter. For instance, the first user terminal 820 may receive the text "Mom, school is over" and display it as message 833.

When the second user terminal 821 receives, via the voice data generating device 100, the message 833 entered at the first user terminal 820, it may display a message arrival notification 834.

The second user terminal 821 may then display the received message 835 in the chat window, receive the mother's reply to message 835, and display the reply as message 836. For example, the second user terminal 821 may receive a reply such as "Okay, my eldest daughter. Mom is at home." and transmit it to the voice data generating device 100. In addition, when the mother selects message 835, the second user terminal 821 may output voice data in which the voice data generating device 100 has synthesized the selected message 835 in the deaf daughter's voice.

The first user terminal 820 may receive, via the voice data generating device 100, the message 836 entered at the second user terminal 821, and display it as message 837. When the deaf daughter selects message 837, the first user terminal 820 may output voice data in which the voice data generating device 100 has synthesized the selected message in the deaf mother's voice.
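In FIG. 8B, tapping a chat message plays it in the voice of the person who wrote it: the daughter's messages in the daughter's synthesized voice, the mother's in the mother's. The sketch below shows one way to do this with on-demand synthesis and caching. It assumes a lookup from sender to that sender's voice model. `VOICE_OF`, `synthesize` and the caching choice are all hypothetical and not taken from the patent.

```python
from functools import lru_cache
from typing import Dict, List, Tuple

# Hypothetical registry: sender id -> voice model generated for that sender.
VOICE_OF: Dict[str, str] = {"daughter": "voice-daughter", "mother": "voice-mother"}

# Records each real synthesis call, which makes the cache effect visible.
SYNTH_CALLS: List[Tuple[str, str]] = []


def synthesize(text: str, voice_id: str) -> bytes:
    """Stand-in for the device's TTS; a real system would return audio samples."""
    SYNTH_CALLS.append((text, voice_id))
    return f"{voice_id}:{text}".encode("utf-8")


@lru_cache(maxsize=256)
def play_message(sender: str, text: str) -> bytes:
    """On tap, speak a chat message in its sender's synthesized voice."""
    return synthesize(text, VOICE_OF[sender])
```

Caching on `(sender, text)` means replaying the same message does not trigger synthesis again, which suits a chat history that users may tap repeatedly.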

The first user terminal 820 may then receive the text "Can I play with my friend for just an hour? I'll play at the playground in front of our house."

The second user terminal 821 may display, as message 838, the message entered at the first user terminal 820 and received via the voice data generating device 100. It may then receive a reply to message 838 and display the reply as message 839.

FIG. 9 is a flowchart of a method of generating voice data in the voice data generating device according to an embodiment of the present invention. The method shown in FIG. 9 includes steps processed in time series according to the embodiments shown in FIGS. 1 to 8B. Accordingly, content described for the embodiments of FIGS. 1 to 8B also applies to this method, even where it is omitted below.

In step S910, the voice data generating device 100 may generate a plurality of family relationship models, each corresponding to a family relationship between family members, based on voice data of a plurality of family members.

In step S920, the voice data generating device 100 may receive voice data of a family member related to the user for whom voice data is to be generated.

In step S930, the voice data generating device 100 may select one of the plurality of family relationship models based on the family relationship between the user and the family member.

In step S940, the voice data generating device 100 may generate the user's voice data by inputting the family member's voice data into the selected family relationship model.
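Steps S910 to S940 can be traced end to end with a deliberately simple stand-in model. The patent does not specify the model form here. This sketch assumes each family relationship model is a per-dimension least-squares line that maps a family member's voice feature vector to the related user's feature vector. The feature choice (mean pitch in Hz, speaking rate) and the relationship label are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Pair = Tuple[Vector, Vector]          # (family member's features, user's features)
Model = List[Tuple[float, float]]     # per-dimension (slope, intercept)


def fit_dim(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares line y = a*x + b for one feature dimension."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return a, my - a * mx


def build_models(data: Dict[str, List[Pair]]) -> Dict[str, Model]:
    """S910: fit one model per family relationship from paired feature vectors."""
    models: Dict[str, Model] = {}
    for relation, pairs in data.items():
        dims = len(pairs[0][0])
        models[relation] = [
            fit_dim([p[0][d] for p in pairs], [p[1][d] for p in pairs])
            for d in range(dims)
        ]
    return models


def generate_user_features(models: Dict[str, Model], relation: str,
                           member_features: Vector) -> Vector:
    """S930: select the model for the relationship; S940: map the member's
    features (received in S920) to estimated features for the user."""
    model = models[relation]
    return [a * x + b for (a, b), x in zip(model, member_features)]
```

For example, after fitting a "mother->daughter" model on a few (mother, daughter) feature pairs, the mother's features can be passed to `generate_user_features` to estimate the daughter's. Turning such features into an actual synthetic voice is outside this sketch.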

In the above description, steps S910 to S940 may be further divided into additional steps or combined into fewer steps, depending on the implementation of the present invention. Some steps may also be omitted as needed, and the order of the steps may be changed.

The method of generating voice data in the voice data generating device described with reference to FIGS. 1 to 9 may also be implemented in the form of a computer program stored in a computer-executable medium, or a recording medium containing computer-executable instructions.

A computer-readable medium may be any available medium that can be accessed by a computer, and includes volatile and non-volatile media as well as removable and non-removable media. Computer-readable media may also include computer storage media. Computer storage media include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data.

The foregoing description of the present invention is for illustrative purposes. Those of ordinary skill in the art to which the present invention pertains will understand that it can easily be modified into other specific forms without changing its technical spirit or essential features. Therefore, the embodiments described above should be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and likewise, components described as distributed may be implemented in a combined form.

The scope of the present invention is defined by the claims below rather than by the detailed description above. All changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

100: voice data generating device
110: family relationship model generating unit
120: input unit
130: selection unit
140: generating unit

Claims

A method for providing a voice generation service through a user terminal, the method comprising:
registering deaf-person information including identifier information of a user who will use the voice generation service;
performing an authentication process for the voice generation service based on the registered deaf-person information;
performing a call service with another user terminal based on a result of the authentication process; and
receiving text information or speech while the call service is being performed,
wherein, when the call service is provided in a deaf mode according to the result of the authentication process, the text information is converted into voice data based on a family relationship model corresponding to the registered deaf-person information, and
the converted voice data is transmitted to the other user terminal.

The method of claim 1, wherein
a first feature vector is derived from voice data of a first family member among a plurality of family members, and a second feature vector is derived from voice data of a second family member, and
the family relationship model is generated, based on the first feature vector and the second feature vector, to correspond to the family relationship between the first family member and the second family member.

The method of claim 2, wherein
the family relationship model
is generated by performing voice similarity modeling using the first feature vector and the second feature vector.

The method of claim 2, further comprising:
classifying the family relationships into a plurality of family relationship groups based on any one of an age difference value between the plurality of family members, a pitch distance value between their voice data, and a pitch distribution value; and
generating a plurality of family relationship group models for the plurality of classified family relationship groups.

The method of claim 4, wherein
the family relationship model
is one family relationship group model selected from among the plurality of family relationship group models based on the deaf-person information.

A user terminal providing a voice generation service, comprising:
a registration unit configured to register deaf-person information including identifier information of a user who will use the voice generation service;
an authentication process performing unit configured to perform an authentication process for the voice generation service based on the registered deaf-person information;
a call service performing unit configured to perform a call service with another user terminal based on a result of the authentication process; and
an input unit configured to receive text information or speech while the call service is being performed,
wherein, when the call service is provided in a deaf mode according to the result of the authentication process, the text information is converted into voice data based on a family relationship model corresponding to the registered deaf-person information, and
the converted voice data is transmitted to the other user terminal.

The user terminal of claim 6, wherein
a first feature vector is derived from voice data of a first family member among a plurality of family members, and a second feature vector is derived from voice data of a second family member, and
the family relationship model is generated, based on the first feature vector and the second feature vector, to correspond to the family relationship between the first family member and the second family member.

The user terminal of claim 7, wherein
the family relationship model
is generated by performing voice similarity modeling using the first feature vector and the second feature vector.

The user terminal of claim 7,
further comprising a family relationship model generating unit configured to classify the family relationships into a plurality of family relationship groups based on any one of an age difference value between the plurality of family members, a pitch distance value between their voice data, and a pitch distribution value, and to generate a plurality of family relationship group models for the plurality of family relationship groups.

The user terminal of claim 9, wherein
the family relationship model
is one family relationship group model selected from among the plurality of family relationship group models based on the deaf-person information.
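The grouping in the dependent claims assigns each family relationship to a group by exactly one criterion: age difference, pitch distance between the two voices, or pitch distribution. One group model is then selected for the user. The claims name the criteria but no thresholds or formulas. This sketch therefore assumes fixed bucket edges, measures pitch distance as the difference of mean pitches, and measures pitch distribution as the difference of pitch standard deviations. All of these choices and names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence

# Hypothetical bucket edges; the claims do not state any thresholds.
AGE_EDGES = (15, 30)            # years
PITCH_DIST_EDGES = (20, 60)     # Hz, difference of mean pitch
PITCH_SPREAD_EDGES = (5, 15)    # Hz, difference of pitch standard deviation


def bucket(value: float, edges: Sequence[float]) -> int:
    """Index of the first edge the value falls below (0..len(edges))."""
    for i, edge in enumerate(edges):
        if value < edge:
            return i
    return len(edges)


def group_of(criterion: str, age_a: int, age_b: int,
             pitch_a: List[float], pitch_b: List[float]) -> str:
    """Assign the relationship between members a and b to a group by one criterion."""
    if criterion == "age_difference":
        return f"age-{bucket(abs(age_a - age_b), AGE_EDGES)}"
    if criterion == "pitch_distance":
        dist = abs(mean(pitch_a) - mean(pitch_b))
        return f"pdist-{bucket(dist, PITCH_DIST_EDGES)}"
    if criterion == "pitch_distribution":
        spread = abs(pstdev(pitch_a) - pstdev(pitch_b))
        return f"pspread-{bucket(spread, PITCH_SPREAD_EDGES)}"
    raise ValueError(f"unknown criterion: {criterion}")


def select_group_model(models: Dict[str, str], group: str) -> str:
    """Pick the group model matching the group derived from the user's info."""
    return models[group]
```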