KR102644989B1

KR102644989B1 - Method for providing psychological counseling service using voice data of the deceased based on artificial intelligence algorithm

Info

Publication number: KR102644989B1
Application number: KR1020230048347A
Authority: KR
Inventors: 김경임; 김경호
Original assignee: 주식회사 알을깨는사람들
Priority date: 2023-04-12
Filing date: 2023-04-12
Publication date: 2024-03-08

Abstract

A method and device for providing a counseling service based on an artificial intelligence algorithm are disclosed. According to an embodiment of the present disclosure, a method of providing a counseling service based on an artificial intelligence algorithm, performed by a device, includes: collecting voice data of a deceased person; preprocessing the voice data of the deceased and extracting feature data from the preprocessed voice data; determining whether the number of phoneme types obtainable from the extracted feature data exceeds a first threshold; when that number is determined to exceed the first threshold, training a first AI model, based on the feature data and text data corresponding to the feature data, to output the voice of the deceased corresponding to input text; obtaining an overall AI model by connecting the input layer of the first AI model to the output layer of a second AI model trained to output question text or answer text based on a question text database and an answer text database for psychological counseling and cognitive behavioral therapy; and training the overall AI model, using first user voice data containing a specific question for psychological counseling or cognitive behavioral therapy as input data, to output the voice of the deceased containing an answer to the specific question. When the number of phoneme types obtainable from the extracted feature data is determined to be less than or equal to the first threshold, the first AI model may be trained based on voice data of a second user associated with the voice of the deceased.

Description

A method of providing psychological counseling services using voice data of the deceased based on an artificial intelligence algorithm {METHOD FOR PROVIDING PSYCHOLOGICAL COUNSELING SERVICE USING VOICE DATA OF THE DECEASED BASED ON ARTIFICIAL INTELLIGENCE ALGORITHM}

The present disclosure relates to the field of voice analysis and service provision, and more specifically to a method of providing a psychological counseling service using voice data of a deceased person based on an artificial intelligence algorithm.

A chatbot is a computer program designed to perform a specific task through voice or text conversation with a user. As artificial intelligence (AI) algorithms for language analysis, which have advanced exponentially in recent years, are applied to chatbots, the performance and usefulness of chatbots continue to expand.

Chatbots are currently installed on smartphones, AI speakers, and similar devices, and are mainly used to perform specific tasks such as information retrieval, device control, and legal consultation. In addition, as the numbers of single-person households and elderly people living alone grow, people increasingly regard chatbots as friends, personal assistants, or counselors.

Meanwhile, busy modern people experience severe stress in many areas, such as economic and interpersonal activities, yet often lack the time and money to relieve it. Chatbots equipped with AI algorithms are emerging as an alternative for them.

Korean Laid-Open Patent Publication, Application No. 10-2019-0148962 (published May 27, 2021)

The present disclosure has been devised to solve the problems described above, and its object is to provide a method of providing a psychological counseling service using voice data of a deceased person based on an artificial intelligence algorithm.

The problems to be solved by the present disclosure are not limited to those mentioned above, and other problems not mentioned will be clearly understood by those skilled in the art from the description below.

In one embodiment of the present disclosure, a method of providing a counseling service based on an artificial intelligence algorithm, performed by a device, includes: collecting voice data of a deceased person; preprocessing the voice data of the deceased and extracting feature data from the preprocessed voice data; determining whether the number of phoneme types obtainable from the extracted feature data exceeds a first threshold; when that number is determined to exceed the first threshold, training a first AI model, based on the feature data and text data corresponding to the feature data, to output the voice of the deceased corresponding to input text; obtaining an overall AI model by connecting the input layer of the first AI model to the output layer of a second AI model trained to output question text or answer text based on a question text database and an answer text database for psychological counseling and cognitive behavioral therapy; and training the overall AI model, using first user voice data containing a specific question for psychological counseling or cognitive behavioral therapy as input data, to output the voice of the deceased containing an answer to the specific question. When the number of phoneme types obtainable from the extracted feature data is determined to be less than or equal to the first threshold, the first AI model may be trained based on voice data of a second user associated with the voice of the deceased.

In addition, based on a determination that the number of phoneme types obtainable from the extracted feature data is less than or equal to the first threshold: voice data of the second user, whose similarity to the voice of the deceased exceeds a second threshold, may be identified from a voice database built from voice data of a plurality of users; and supplementary voice data related to phoneme types that cannot be obtained from the extracted feature data may be extracted from the voice data of the second user.
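The selection of the second user and of the supplementary voice data can be illustrated with a minimal sketch. The disclosure does not specify how similarity is measured or how the database is organized, so the cosine similarity over fixed-length voice embeddings, the `voice_db` layout, and all function names below are assumptions made only for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def find_second_user(deceased_embedding, voice_db, second_threshold):
    """Return (user_id, similarity) of the most similar voice whose similarity
    exceeds the second threshold, or None if no voice qualifies."""
    best = None
    for user_id, entry in voice_db.items():
        sim = cosine_similarity(deceased_embedding, entry["embedding"])
        if sim > second_threshold and (best is None or sim > best[1]):
            best = (user_id, sim)
    return best

def supplementary_phonemes(required, available, donor_clips):
    """Pick donor clips only for phoneme types missing from the deceased's data."""
    missing = set(required) - set(available)
    return {p: donor_clips[p] for p in sorted(missing) if p in donor_clips}

# Hypothetical database: one embedding and per-phoneme clips for each user.
voice_db = {
    "user_a": {"embedding": [0.9, 0.1, 0.0], "clips": {"k": "a_k.wav", "s": "a_s.wav"}},
    "user_b": {"embedding": [0.0, 1.0, 0.0], "clips": {"k": "b_k.wav"}},
}
match = find_second_user([1.0, 0.0, 0.0], voice_db, second_threshold=0.8)
supplement = supplementary_phonemes({"a", "k", "s"}, {"a"}, voice_db[match[0]]["clips"])
```

Here `match` selects `user_a`, and `supplement` contains clips only for the phoneme types `k` and `s` that the deceased's own data lacks.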

In addition, corrected voice data may be obtained by synthesizing the feature data and the supplementary voice data, and the first AI model may be trained based on the corrected voice data and text data corresponding to the corrected voice data.

In addition, the synthesis ratio of the feature data to the supplementary voice data for obtaining the corrected voice data may be determined as A:B, where A is a value obtained by applying a first weight and a second weight to the number of phoneme types obtainable from the extracted feature data, and B is a value obtained by applying a third weight and a fourth weight to the number of phoneme types that cannot be obtained from the extracted feature data.

In addition, the first weight may be determined based on a value corresponding to the precision of the deceased's voice requested by the first user; the second weight may be determined based on a value corresponding to the uniqueness of the deceased's voice; the third weight may be determined based on the similarity between the deceased's voice and the second user's voice data; and the fourth weight may be determined based on a value corresponding to the uniqueness of the second user's voice data.

In addition, the value corresponding to the uniqueness of the deceased's voice may be determined based on a timbre pattern corresponding to the deceased's voice, and the value corresponding to the uniqueness of the second user's voice data may be determined based on a timbre pattern corresponding to the uniqueness of the second user's voice.
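The A:B synthesis ratio can be made concrete with a short worked sketch. The disclosure says only that the weights are "applied" to the phoneme-type counts; multiplying the count by both weights is an assumption, as are the function names and the example weight values.

```python
def synthesis_ratio(n_available, n_missing, w1, w2, w3, w4):
    """Compute (A, B), assuming each weight is applied multiplicatively.

    n_available: phoneme types obtainable from the deceased's feature data
    n_missing:   phoneme types not obtainable from that data
    w1: weight from the precision requested by the first user
    w2: weight from the uniqueness of the deceased's voice
    w3: weight from the similarity between the deceased's and second user's voices
    w4: weight from the uniqueness of the second user's voice data
    """
    a = n_available * w1 * w2
    b = n_missing * w3 * w4
    return a, b

def mixing_fractions(a, b):
    """Normalize A:B into fractions that sum to 1."""
    total = a + b
    if total <= 0:
        raise ValueError("A + B must be positive")
    return a / total, b / total

a, b = synthesis_ratio(n_available=30, n_missing=10, w1=1.0, w2=0.5, w3=0.5, w4=1.0)
share_deceased, share_supplement = mixing_fractions(a, b)
```

With these illustrative inputs, A = 15.0 and B = 5.0, so the deceased's own feature data would contribute three quarters of the corrected voice data.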

In addition, the first threshold and the second threshold may be determined based on a value corresponding to the precision of the deceased's voice requested by the first user.

In addition, when an evaluation score for the voice of the deceased output through the overall AI model, entered from the terminal device used by the first user, is less than or equal to a third threshold, the value of A may be increased.
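The feedback rule on A can be sketched as follows. The disclosure states only that A may be increased when the evaluation score is at or below the third threshold; the proportional `step` and the function name are hypothetical.

```python
def adjust_a(a, evaluation_score, third_threshold, step=0.1):
    """Increase A when the first user's evaluation score is at or below
    the third threshold; otherwise leave A unchanged.

    The multiplicative step size is an assumption, not part of the disclosure.
    """
    if evaluation_score <= third_threshold:
        return a * (1.0 + step)
    return a

raised = adjust_a(10.0, evaluation_score=2, third_threshold=3, step=0.5)
unchanged = adjust_a(10.0, evaluation_score=4, third_threshold=3, step=0.5)
```

Raising A shifts the A:B ratio toward the deceased's own feature data, which is one plausible reading of why a low score triggers the increase.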

In addition, a computer-readable recording medium storing a computer program for executing a method for implementing the present disclosure may be further provided.

According to various embodiments of the present disclosure, a method of providing a psychological counseling service using voice data of a deceased person based on an artificial intelligence algorithm can be provided.

According to various embodiments of the present disclosure, the effectiveness of psychological counseling can be increased by providing an AI chatbot service that conducts psychological counseling or cognitive behavioral therapy in the voice of a deceased person familiar to the user.

The effects of the present disclosure are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the description below.

FIG. 1 is a diagram schematically illustrating a system that provides a psychological counseling service using voice data of a deceased person based on an artificial intelligence algorithm, according to an embodiment of the present disclosure.
FIG. 2 is a block diagram schematically illustrating a device that provides a psychological counseling service using voice data of a deceased person based on an artificial intelligence algorithm, according to an embodiment of the present disclosure.
FIG. 3 is a flowchart illustrating a method of providing a psychological counseling service using voice data of a deceased person based on an artificial intelligence algorithm, according to an embodiment of the present disclosure.

The advantages and features of the present disclosure, and the methods of achieving them, will become clear with reference to the embodiments described in detail below together with the accompanying drawings. However, the present disclosure is not limited to the embodiments disclosed below and may be implemented in various different forms. These embodiments are provided only to make the disclosure complete and to fully convey the scope of the present disclosure to those skilled in the art to which it pertains; the present disclosure is defined only by the scope of the claims.

The terminology used herein is for describing the embodiments and is not intended to limit the present disclosure. In this specification, singular forms include plural forms unless the context specifically states otherwise. As used in the specification, "comprises" and/or "comprising" do not exclude the presence or addition of one or more elements other than those mentioned.

Like reference numerals refer to like elements throughout the specification, and "and/or" includes each of the mentioned elements and every combination of one or more of them. Although "first", "second", and the like are used to describe various elements, these elements are of course not limited by these terms. The terms are used only to distinguish one element from another. Accordingly, a first element mentioned below may also be a second element within the technical spirit of the present disclosure.

Unless otherwise defined, all terms (including technical and scientific terms) used in this specification have the meanings commonly understood by those skilled in the art to which the present disclosure pertains. In addition, terms defined in commonly used dictionaries are not to be interpreted ideally or excessively unless expressly and specifically defined.

Spatially relative terms such as "below", "beneath", "lower", "above", and "upper" may be used to easily describe the relationship between one element and other elements as shown in the drawings. Spatially relative terms should be understood to encompass different orientations of the elements in use or operation, in addition to the orientation shown in the drawings.

For example, if an element shown in a drawing is turned over, an element described as "below" or "beneath" another element may be placed "above" that element. Accordingly, the exemplary term "below" may encompass both the downward and upward directions. Elements may also be oriented in other directions, and spatially relative terms may be interpreted according to that orientation.

In describing the present disclosure, "user" may refer to a student who performs learning through a service provided by a server. "Guardian" may refer to a parent, legal guardian, or the like who protects the user. "Instructor" may refer to a teacher who guides the user through the service provided by the server.

Hereinafter, a method of providing a psychological counseling service using voice data of a deceased person based on an artificial intelligence algorithm will be described in detail with reference to the drawings.

FIG. 1 is a diagram schematically illustrating a system that provides a psychological counseling service using voice data of a deceased person based on an artificial intelligence algorithm, according to an embodiment of the present disclosure.

As shown in FIG. 1, a system 1000 for implementing the method of providing a psychological counseling service using voice data of a deceased person based on an artificial intelligence algorithm may include a device 100, a terminal device 200 used by a first user, and a database 300.

Although FIG. 1 shows the device 100 implemented as a desktop computer and the terminal device 200 used by the first user implemented as a single smartphone, the present disclosure is not limited thereto.

The device 100 and the terminal device 200 used by the first user may be implemented as various types of electronic devices (e.g., a notebook computer, desktop, laptop, tablet PC, slate PC, or server device), or as a device group in which devices of one or more types are connected.

For example, the device 100 may be implemented as a device group in which devices of one or more types are connected. For example, the device 100 may be implemented as a device that manages an application providing a counseling service based on an artificial intelligence algorithm, a cloud server that stores data related to the application, or the like.

FIG. 1 assumes a single user (i.e., the first user) receiving the counseling service based on an artificial intelligence algorithm, but the present disclosure is not limited thereto. The number of users may vary.

The device 100, the terminal device 200 used by the first user, and the database 300 (i.e., a cloud server containing the database 300) included in the system 1000 may communicate through a network W.

Here, the network W may include wired and wireless networks. For example, the network may include various networks such as a local area network (LAN), a metropolitan area network (MAN), and a wide area network (WAN).

The network W may also include the well-known World Wide Web (WWW). However, the network W according to an embodiment of the present disclosure is not limited to the networks listed above and may include, at least in part, a known wireless data network, a known telephone network, or a known wired or wireless television network.

The device 100 may provide one or more users with an application that provides a counseling service based on an artificial intelligence algorithm. The application may provide a service that outputs, in the voice of the deceased, answers to specific questions for psychological counseling or cognitive behavioral therapy. Here, "the deceased" may refer to a deceased person associated with the first user who receives the counseling service (e.g., a parent of the first user).

Specifically, the device 100 may extract feature data from the voice data of the deceased and, based on the extracted feature data, train an AI model that outputs, in the voice of the deceased, answers to specific questions for psychological counseling or cognitive behavioral therapy.
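The connection of the second AI model's output to the first AI model's input, described in the summary above, can be pictured as a composition of two stages. The sketch below is only a structural illustration: it uses plain function composition rather than joined neural network layers, and the toy stand-in models and all names are hypothetical.

```python
from typing import Callable

def build_overall_model(
    answer_model: Callable[[bytes], str],
    voice_model: Callable[[str], bytes],
) -> Callable[[bytes], bytes]:
    """Chain the second AI model (user voice -> answer text) into the
    first AI model (text -> voice of the deceased)."""
    def overall(user_voice: bytes) -> bytes:
        answer_text = answer_model(user_voice)   # output of the second model
        return voice_model(answer_text)          # fed as input to the first model
    return overall

# Toy stand-ins so the sketch runs; real models would be trained networks.
def toy_answer_model(user_voice: bytes) -> str:
    return "answer to: " + user_voice.decode("utf-8")

def toy_voice_model(text: str) -> bytes:
    return b"VOICE[" + text.encode("utf-8") + b"]"

overall_model = build_overall_model(toy_answer_model, toy_voice_model)
result = overall_model(b"how are you")
```

In the disclosure the combined model is additionally trained end to end on the first user's voice data, a step this composition does not attempt to show.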

The manner in which the device 100 performs the operations described above will be described in detail with reference to FIGS. 2 and 3.

The database 300 (i.e., a cloud server containing the database 300) may refer to a database storing the various voice/text data that the device 100 uses to train the AI models. The database 300 may be implemented as a component of the device 100, but is not limited thereto and may instead be implemented as a separate cloud server.

FIG. 2 is a block diagram schematically illustrating the configuration of a server that provides a psychological counseling service using voice data of a deceased person based on an artificial intelligence algorithm, according to an embodiment of the present disclosure.

As shown in FIG. 2, the device 100 may include a memory 110, a communication module 120, a display 130, and a processor 150.

However, the configuration shown in FIG. 2 is an example for implementing embodiments of the present disclosure, and suitable hardware and software components obvious to those skilled in the art may additionally be included in the device 100.

The memory 110 may store one or more instructions for the processor 150 to perform various operations. The memory 110 may store data supporting various functions of the device 100, programs for the operation of the processor 150, and input/output data.

The memory 110 may store the feature data extracted from the voice data of the deceased. The memory 110 may store one or more parameters for performing training and inference operations on the various AI models (e.g., the first AI model, the second AI model, and the overall AI model). As described above, the memory 110 may store the voice/text data built in the database 300 of FIG. 1.

The memory 110 may include at least one type of storage medium among a flash memory type, a hard disk type, a solid state disk (SSD) type, a silicon disk drive (SDD) type, a multimedia card micro type, card-type memory (e.g., SD or XD memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, and an optical disk.

The communication module 120 may include one or more components containing circuitry that enables communication with an external device (e.g., a terminal device used by a user). For example, the communication module 120 may include at least one of a broadcast reception module, a wired communication module, a wireless communication module, a short-range communication module, and a location information module.

The display 130 displays (outputs) information processed by the device 100. For example, the display 130 may display execution screen information of an application running on the device 100 (e.g., an application or website that provides a psychological counseling service using voice data of a deceased person based on an artificial intelligence algorithm), or user interface (UI) information (e.g., a UI through which the user can enter the desired precision of the deceased's voice) and graphical user interface (GUI) information according to the execution screen information.

The input module 140 refers to a component for applying various input data and/or input interactions (e.g., touch, swipe) to the device 100.

The processor 150 may carry out the method of providing a psychological counseling service using voice data of a deceased person based on an artificial intelligence algorithm by executing one or more instructions stored in the memory 110. That is, the processor 150 may control the overall operations and functions of the device 100 using each of its components.

Specifically, the processor 150 may be implemented with a memory that stores data for an algorithm for controlling the operation of components in the device 100, or for a program implementing the algorithm, and at least one processor that performs the operations described above using the data stored in the memory. The memory and the processor may be implemented as separate chips or as a single chip.

In addition, the processor 150 may control any one or a combination of the components described above in order to implement, on the device 100, the various embodiments of the present disclosure described below with reference to FIG. 3.

FIG. 3 is a flowchart illustrating a method of providing a psychological counseling service using voice data of a deceased person based on an artificial intelligence algorithm, according to an embodiment of the present disclosure.

The device may collect voice data of the deceased (S310).

For example, the device may obtain voice data recorded during the deceased's lifetime from the terminal device used by the first user. As another example, the device may extract voice data of the first user from a video, obtained from the terminal device used by the first user, in which the first user appears. As yet another example, the device may extract voice/video data of the deceased from a terminal device that the deceased used.

That is, the terminal device used by the first user may install the application, provided by the device, that provides a psychological counseling service using voice data of a deceased person based on an artificial intelligence algorithm. The first user may enter voice/video data recorded during the deceased's lifetime on a UI screen, displayed on the terminal device, for synthesizing/generating the voice data of the deceased.

The device may preprocess the deceased's voice data and extract feature data from the preprocessed voice data (S320).

The device may perform various preprocessing operations on the deceased's voice data. For example, the device may remove noise from the collected voice data of the deceased and perform sampling, quantization, and similar operations.
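
The preprocessing step can be pictured with a minimal Python sketch. The source does not say how noise is removed or at what bit depth the signal is quantized, so the noise gate, the `noise_floor` value, the 16-bit default, and the function name `preprocess` are all assumptions made for illustration.

```python
def preprocess(samples, noise_floor=0.02, bits=16):
    """Apply a crude noise gate, then quantize to signed integers.

    samples     -- amplitudes in [-1.0, 1.0] (hypothetical input format)
    noise_floor -- magnitudes below this are treated as noise (assumed method)
    bits        -- bit depth of the quantized output (assumed value)
    """
    max_level = 2 ** (bits - 1) - 1  # 32767 for 16-bit audio
    quantized = []
    for s in samples:
        s = max(-1.0, min(1.0, s))   # clip to the valid amplitude range
        if abs(s) < noise_floor:     # noise gate: silence very quiet samples
            s = 0.0
        quantized.append(round(s * max_level))
    return quantized
```

A production system would more likely use spectral noise reduction and resampling from an audio library; the gate above only shows where such a step sits in the flow.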

The device may then extract feature data from the preprocessed voice data of the deceased. For example, the device may extract the feature data by converting the preprocessed voice data into a numerical array so that it can be used for training, similarity measurement, classification, and the like.
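
As a rough illustration of converting voice data into a numerical array, the sketch below computes two simple per-frame features. The choice of features (RMS energy and zero-crossing rate), the frame size, and the name `extract_features` are assumptions; the source does not name a specific feature set.

```python
import math

def extract_features(samples, frame_size=400):
    """Convert a waveform into rows of [rms_energy, zero_crossing_rate].

    frame_size must be at least 2; 400 samples is 25 ms at 16 kHz (assumed).
    Trailing samples that do not fill a whole frame are ignored.
    """
    features = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        rms = math.sqrt(sum(x * x for x in frame) / frame_size)
        # Count sign changes between neighbouring samples in the frame.
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0)
        )
        features.append([rms, crossings / (frame_size - 1)])
    return features
```

Speech systems typically use richer representations such as mel spectrograms or MFCCs; the point here is only the shape of the result, one numeric row per frame.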

The device may determine whether the number of phoneme types obtainable from the extracted feature data exceeds a first threshold (S330).

Specifically, the device may use the feature data extracted from the preprocessed voice data of the deceased to identify the number of phoneme types needed to generate the deceased's voice. The device may then determine whether that number of phoneme types exceeds the first threshold.

Here, the first threshold may be determined based on a value corresponding to the precision of the deceased's voice requested by the first user. The first user may input this value on the UI screen, displayed on the terminal device used by the first user, for synthesizing/generating the deceased's voice data.

The higher the value corresponding to the precision of the deceased's voice requested by the first user, the larger the first threshold may be; the lower that value, the smaller the first threshold may be.
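
The threshold check in S330 can be sketched as follows. The source states only that the first threshold grows with the requested precision, so the linear mapping, the bounds `min_types` and `max_types`, the precision scale of 0 to 1, and both function names are assumptions. The phoneme labels are assumed to come from some upstream recognizer that is not shown.

```python
def first_threshold(precision, min_types=10, max_types=40):
    """Map a requested precision in [0, 1] to a phoneme-type threshold.

    The linear form and the bounds are illustrative assumptions.
    """
    precision = max(0.0, min(1.0, precision))
    return min_types + precision * (max_types - min_types)

def exceeds_first_threshold(phoneme_labels, precision):
    """Return True if the number of distinct phoneme types exceeds the threshold."""
    return len(set(phoneme_labels)) > first_threshold(precision)
```

The comparison is strict, matching the wording "exceeds"; a count equal to the threshold falls into the "less than or equal to" branch.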

If it is determined that the number of phoneme types exceeds the first threshold, the device may train a first AI model, based on the feature data and text data corresponding to the feature data, to output the deceased's voice corresponding to input text (S340).

The number of phoneme types exceeding the first threshold may mean that the deceased's voice can be generated/converted from the currently extracted feature data. Accordingly, when it is determined that the number of phoneme types exceeds the first threshold, the device may train the first AI model based on the feature data and the text data corresponding to the feature data (that is, the data obtained when the feature data, which has a voice component, is converted into text).

The device may train the first AI model to output the deceased's voice corresponding to input text. That is, the device may train the first AI model as a text-to-speech (TTS) model that outputs input text in the deceased's voice.

The device may obtain an entire AI model by connecting the input layer of the first AI model to the output layer of a second AI model, which is trained to output question text or answer text based on a question text database and an answer text database for psychological counseling therapy and cognitive behavioral therapy (S350).

The device may then train the entire AI model to output the deceased's voice containing an answer to a specific question, using as input data first user voice data that contains the specific question for counseling therapy or cognitive behavioral therapy (S360). In other words, the entire AI model may be trained by transfer learning.

Specifically, the second AI model may be trained to output question text or answer text based on a question text database and an answer text database for psychological counseling therapy and cognitive behavioral therapy.

For example, when a specific question text from the question text database is input, the second AI model may be trained to output the specific answer text in the answer text database that corresponds to that question text. As another example, when a specific answer text is input, the second AI model may be trained to output question text associated with (and/or following) that answer text.

Accordingly, the device may configure the entire AI model by connecting the input layer of the first AI model to the output layer of the second AI model, so that data output from the output layer of the second AI model is input to the input layer of the first AI model. For example, when answer/question text output by the second AI model is input to the first AI model, the first AI model may output that text in the deceased's voice.
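
The data flow between the two models can be sketched as plain function composition. This is a simplification under stated assumptions: the source connects network layers, whereas the sketch connects callables at the text boundary; it starts from text and leaves out the speech recognition needed for spoken user input; and `connect`, `toy_counselor`, and `toy_tts` are hypothetical stand-ins, not real models.

```python
from typing import Callable

TextModel = Callable[[str], str]     # second AI model: text in, text out
VoiceModel = Callable[[str], bytes]  # first AI model: text in, audio out

def connect(second_model: TextModel, first_model: VoiceModel) -> Callable[[str], bytes]:
    """Feed the second model's text output into the first model's input."""
    def entire_model(text: str) -> bytes:
        reply_text = second_model(text)  # counseling answer or follow-up question
        return first_model(reply_text)   # rendered in the target voice
    return entire_model

def toy_counselor(question: str) -> str:
    """Stand-in for the second AI model; returns a fixed-form reply."""
    return "I hear you: " + question

def toy_tts(text: str) -> bytes:
    """Stand-in for the first AI model; returns bytes in place of audio."""
    return text.encode("utf-8")

entire = connect(toy_counselor, toy_tts)
```

Calling `entire("I miss him")` runs the counseling stand-in first and passes its reply to the voice stand-in, mirroring the output-layer-to-input-layer connection described above.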

Additionally or alternatively, if it is determined that the number of phoneme types obtainable from the feature data extracted from the preprocessed voice data of the deceased is less than or equal to the first threshold, the device may train the first AI model based on voice data of a second user associated with the deceased's voice.

That is, if the number of phoneme types available for synthesizing/generating the deceased's voice is less than or equal to the first threshold, the device may train the first AI model using another user's voice data associated with the deceased's voice.

The device may identify, from a voice database built from the voice data of a plurality of users, voice data of a second user whose similarity to the deceased's voice exceeds a second threshold. That is, the device may identify in the voice database a second user's voice that is similar to the deceased's voice.

For example, the device may extract feature data from the voice data of the plurality of users in the voice database and obtain the similarity between the feature data of each user's voice data and the feature data extracted from the deceased's voice data. The device may then identify, among the voice data of the plurality of users, the voice data of a second user whose obtained similarity exceeds the second threshold.
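
The lookup against the voice database can be sketched as below. The source does not name a similarity measure, so cosine similarity over fixed-length feature vectors is an assumption, as are the dictionary layout of `voice_db` and the function names.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def find_similar_voices(deceased_vec, voice_db, second_threshold):
    """Return (user_id, similarity) pairs above the threshold, most similar first.

    voice_db maps a user id to that user's feature vector (hypothetical layout).
    """
    scored = [(uid, cosine_similarity(deceased_vec, vec)) for uid, vec in voice_db.items()]
    matches = [(uid, sim) for uid, sim in scored if sim > second_threshold]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)
```

Any returned entry is a candidate second user; how the device picks among several candidates is not specified in the source.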

The device may extract, from the second user's voice data, supplementary voice data related to the phoneme types that cannot be obtained from the feature data extracted from the deceased's voice data. The device may obtain corrected voice data by synthesizing the feature data and the supplementary voice data. In other words, the device may obtain corrected voice data by combining other voice data with the deceased's voice data, which lacks a sufficient number of phoneme types.

For example, the synthesis ratio of the feature data to the supplementary voice data used to obtain the corrected voice data may be determined as A:B. A is a value obtained by applying a first weight and a second weight to the number of phoneme types obtainable from the feature data extracted from the deceased's voice data, and B may be a value obtained by applying a third weight and a fourth weight to the number of phoneme types not obtainable from that feature data.

For example, A may be the value obtained by applying the first weight and the second weight to the number of phoneme types obtainable from the feature data extracted from the deceased's voice data, and B may be the value obtained by applying the third weight and the fourth weight to the number of phoneme types not obtainable from that feature data.

Here, the first weight may be determined based on the value corresponding to the precision of the deceased's voice requested by the first user, and the second weight may be determined based on a value corresponding to the uniqueness of the deceased's voice.

The third weight may be determined based on the similarity between the deceased's voice and the second user's voice data, and the fourth weight may be determined based on a value corresponding to the uniqueness of the second user's voice data.
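
A small sketch of the A:B computation follows. The source says the weights are "applied" to the phoneme-type counts without giving an operation, so multiplication is an assumption here, as are the function names and the helper that turns the ratio into a fraction.

```python
def synthesis_ratio(n_obtainable, n_missing, w1, w2, w3, w4):
    """Return (A, B), applying the weights by multiplication (an assumption).

    n_obtainable -- phoneme types obtainable from the deceased's feature data
    n_missing    -- phoneme types not obtainable from that feature data
    w1, w2       -- first weight (requested precision), second weight (uniqueness)
    w3, w4       -- third weight (similarity), fourth weight (second user's uniqueness)
    """
    a = n_obtainable * w1 * w2
    b = n_missing * w3 * w4
    return a, b

def deceased_share(a, b):
    """Fraction of the corrected voice data drawn from the deceased's feature data."""
    total = a + b
    return a / total if total else 0.0
```

For instance, with 30 obtainable types, 10 missing types, and all weights set to 1.0, the ratio is 30:10 and the deceased's share is 0.75.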

The value corresponding to the uniqueness of the deceased's voice may be determined based on the timbre pattern of the deceased's voice. The feature data of the deceased's voice may include a feature vector representing that timbre pattern. The device may calculate the degree of correlation between the feature vector representing the timbre pattern of the deceased's voice and the feature vectors representing the timbre patterns of the voices included in the voice database. The higher this degree of correlation, the lower the device may set the second weight; the lower the degree of correlation, the higher the device may set the second weight.
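
The relation between correlation and the uniqueness-based weight can be sketched as follows. Only the direction is taken from the text (higher correlation with the database voices gives a lower weight); using mean cosine similarity as the degree of correlation, the linear mapping, the `low` and `high` bounds, and the function names are assumptions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def uniqueness(timbre_vec, db_vecs):
    """One minus the mean similarity to the database voices (assumed definition)."""
    if not db_vecs:
        return 1.0
    mean_sim = sum(cosine_similarity(timbre_vec, v) for v in db_vecs) / len(db_vecs)
    return 1.0 - mean_sim

def uniqueness_weight(timbre_vec, db_vecs, low=0.5, high=1.5):
    """Map uniqueness, clipped to [0, 1], linearly onto [low, high]."""
    u = max(0.0, min(1.0, uniqueness(timbre_vec, db_vecs)))
    return low + u * (high - low)
```

A voice that closely resembles the database voices gets a weight near `low`, and a distinctive voice gets a weight near `high`.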

The higher the similarity between the deceased's voice and the second user's voice data, the higher the device may set the third weight. That is, when the similarity is high, the proportion of the second user's voice data may be increased.

The value corresponding to the uniqueness of the second user's voice may be determined in a manner similar to the value corresponding to the uniqueness of the deceased's voice.

The first and second thresholds described above may be determined based on the value corresponding to the precision of the deceased's voice requested by the first user. The higher that value, the higher the device may set the first and second thresholds.

The device may train the first AI model based on the corrected voice data and text data corresponding to the corrected voice data. The device may then train the entire AI model by connecting the input layer of the first AI model to the output layer of the second AI model.

Here, if an evaluation score, input from the terminal device used by the first user, for the deceased's voice output by the entire AI model is less than or equal to a third threshold, the device may increase the value of A. That is, the device may generate corrected voice data that reflects the characteristics of the deceased's voice more strongly. The device may then further train the entire AI model based on the newly generated corrected voice data.
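
The feedback rule can be sketched in a few lines. The source says only that A is increased when the score is at or below the third threshold, so the fixed increment `step`, the numeric score scale, and the name `adjust_a` are assumptions.

```python
def adjust_a(a, score, third_threshold, step=1.0):
    """Increase A when the user's evaluation score is at or below the threshold.

    The fixed increment is an illustrative assumption; the source does not
    say by how much A grows.
    """
    if score <= third_threshold:
        return a + step
    return a
```

After an adjustment, the corrected voice data would be regenerated with the new A:B ratio and the entire AI model trained further on it; those steps are outside this sketch.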

Meanwhile, the disclosed embodiments may be implemented in the form of a recording medium that stores computer-executable instructions. The instructions may be stored in the form of program code and, when executed by a processor, may generate program modules that perform the operations of the disclosed embodiments. The recording medium may be implemented as a computer-readable recording medium.

Computer-readable recording media include all types of recording media that store instructions readable by a computer. Examples include read-only memory (ROM), random access memory (RAM), magnetic tape, magnetic disks, flash memory, and optical data storage devices.

The disclosed embodiments have been described above with reference to the accompanying drawings. A person of ordinary skill in the art to which the present disclosure pertains will understand that the present disclosure may be practiced in forms different from the disclosed embodiments without changing its technical idea or essential features. The disclosed embodiments are illustrative and should not be construed as limiting.

100: device
110: memory
120: communication module
130: display
140: input module
150: processor

Claims

A method, performed by a device, of providing a counseling service based on an artificial intelligence algorithm, the method comprising:
collecting voice data of the deceased;
preprocessing the voice data of the deceased and extracting feature data from the preprocessed voice data of the deceased;
determining whether the number of phoneme types obtainable from the feature data exceeds a first threshold;
when it is determined that the number of phoneme types obtainable from the feature data exceeds the first threshold, training a first AI model, based on the feature data and text data corresponding to the feature data, to output the voice of the deceased corresponding to input text;
obtaining an entire AI model by connecting the input layer of the first AI model to the output layer of a second AI model trained to output question text or answer text based on a question text database and an answer text database for psychological counseling therapy and cognitive behavioral therapy; and
training the entire AI model to output the voice of the deceased containing an answer to a specific question, using as input data first user voice data containing the specific question for psychological counseling therapy or cognitive behavioral therapy,
wherein, when it is determined that the number of phoneme types obtainable from the feature data is less than or equal to the first threshold, the first AI model is trained based on the voice data of a second user associated with the voice of the deceased,
Based on the determination that the number of phoneme types that can be obtained from the feature data is less than or equal to the first threshold:
The voice data of the second user whose similarity to the voice of the deceased exceeds a second threshold is identified from a voice database constructed from voice data of a plurality of users,
Supplementary voice data related to a phoneme type that cannot be obtained from the feature data is extracted from the voice data of the second user,
Corrected voice data is obtained by synthesizing the feature data and the supplementary voice data,
The first AI model is trained based on the corrected voice data and text data corresponding to the corrected voice data,
A synthesis ratio of the feature data and the supplementary voice data for obtaining the corrected voice data is determined as A:B,
A is a value obtained by applying a first weight and a second weight to the number of phoneme types that can be obtained from the feature data,
B is a value obtained by applying a third weight and a fourth weight to the number of phoneme types that cannot be obtained from the feature data,
The first weight is determined based on a value corresponding to the precision of the deceased's voice requested by the first user,
The second weight is determined based on a value corresponding to the uniqueness of the deceased's voice,
The third weight is determined based on the similarity between the voice of the deceased and the voice data of the second user,
The fourth weight is determined based on a value corresponding to the uniqueness of the voice data of the second user,
The value corresponding to the uniqueness of the voice of the deceased is determined based on the timbre pattern corresponding to the voice of the deceased,
A numerical value corresponding to the uniqueness of the second user's voice data is determined based on a timbre pattern corresponding to the uniqueness of the second user's voice.