KR20200024427A

KR20200024427A - AI assistant voice matching system and operation method thereof

Info

Publication number: KR20200024427A
Application number: KR1020180101084A
Authority: KR
Inventors: 백송이
Original assignee: KT Corporation (주식회사 케이티)
Priority date: 2018-08-28
Filing date: 2018-08-28
Publication date: 2020-03-09

Abstract

The present invention relates to an AI assistant voice matching method based on a content purchase (playback) history. The AI assistant voice matching method, which automatically generates an AI assistant voice, comprises the steps of: collecting information about content consumed by a user over a certain period of time; detecting a predetermined number of candidate contents based on the collected content information; selecting target content based on user preference scores of the candidate contents; and analyzing sound sources included in the selected target content to extract a representative voice, and determining the representative voice as the user-preferred voice.

Description

AI assistant voice matching system and its operation method {AI ASSITANT VOICE MATCHING SYSTEM AND OPERATION METHOD THEREOF}

The present invention relates to an AI assistant voice matching system and an operation method thereof. More particularly, it relates to an AI assistant voice matching system, and an operation method thereof, capable of generating and providing a user-customized AI assistant voice based on information about a content purchase (playback) history.

Recently, with the rapid development of information and communication technology, interest in and demand for Internet of Things (IoT) technology have been increasing rapidly. IoT can be defined in various ways depending on one's point of view. In essence, however, IoT is an intelligent information and communication technology or service that connects various things to a communication network over the Internet. This enables communication between people and things and between things themselves.

IoT technology is applied in various technical fields such as smart homes, smart health and smart cars. In particular, research on smart home services that incorporate IoT technology into home network systems is being actively conducted.

A smart home service refers to an overall system that seeks to improve quality of living through IoT-enabled devices in a residential environment equipped with a communication network. Home appliances such as TVs, refrigerators and air conditioners, energy-consuming equipment such as electricity and water systems, and security services are connected to the communication network. The user can then remotely check and control the situation in the home in real time via a smartphone or a voice controller (or AI speaker). In particular, smart home services that remotely control IoT devices in the home through an AI speaker have recently been increasing.

An AI speaker can provide an interactive AI assistant service using voice recognition and artificial intelligence technologies. Here, an interactive AI assistant service literally means a service in which artificial intelligence acts as the speaker's assistant. Such AI speakers can provide various services, including:
- personal schedule management;
- SNS management;
- app execution;
- Internet shopping;
- email and messenger management;
- multimedia playback;
- weather, traffic and travel information;
- IoT device control.

However, a conventional AI speaker outputs a mechanical voice provided unilaterally by the service providing server, which makes it difficult to build intimacy between the user and the AI assistant. The service providing server also bears the burden of individually defining, producing and modifying every voice the AI speaker can output. Furthermore, the service providing server should offer various AI assistant voices according to user preferences. In practice, however, users may prefer so many kinds of voices that they cannot all be processed and provided. A method for easily outputting a user's preferred voice through an AI speaker is therefore needed.

The present invention aims to solve the above and other problems. Another object is to provide an AI assistant voice matching system, and an operation method thereof, that extracts a user-preferred voice contained in specific content based on information about a content purchase (playback) history. The system then automatically generates an AI assistant voice based on the characteristics of the user-preferred voice.

Still another object is to provide an AI assistant voice matching system, and an operation method thereof, that automatically generates a user-customized AI assistant voice and provides it to the corresponding user. The voice is generated based on information about content the user consumed over a certain period of time.

To achieve the above or other objects, one aspect of the present invention provides an AI assistant voice matching method comprising the steps of: collecting information about content consumed by a user over a certain period of time; detecting a predetermined number of candidate contents based on the collected content information; selecting target content based on user preference scores of the candidate contents; and analyzing sound sources included in the selected target content to extract a representative voice, and determining the representative voice as the user-preferred voice.

More preferably, the detecting step may calculate the cumulative playback time of the contents consumed by the user over a predetermined period of time and detect the candidate contents based on that cumulative playback time. Alternatively, the detecting step may calculate the cumulative playback count of the contents consumed by the user over the predetermined period and detect the candidate contents based on that cumulative playback count.
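The detecting step above can be sketched as follows. This is a minimal illustration, not the applicant's implementation. The record fields (`content_id`, `started_at`, `seconds_played`), the 30-day window and the function name `top_candidates` are all hypothetical. The sketch ranks contents by either cumulative playback time or cumulative playback count within the window and keeps the top `n` as candidates.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class PlaybackRecord:
    # Hypothetical playback-log entry; field names are illustrative only.
    content_id: str
    started_at: datetime
    seconds_played: float


def top_candidates(
    records: Iterable[PlaybackRecord],
    now: datetime,
    window_days: int = 30,
    n: int = 3,
    by: str = "time",
) -> List[str]:
    """Return the n content IDs with the highest cumulative playback in the window.

    by="time" ranks by cumulative seconds played; by="count" ranks by number of plays.
    """
    if by not in ("time", "count"):
        raise ValueError("by must be 'time' or 'count'")
    cutoff = now - timedelta(days=window_days)
    total_seconds: Dict[str, float] = defaultdict(float)
    play_count: Dict[str, int] = defaultdict(int)
    for r in records:
        if cutoff <= r.started_at <= now:  # keep only plays inside the window
            total_seconds[r.content_id] += r.seconds_played
            play_count[r.content_id] += 1
    metric = total_seconds if by == "time" else play_count
    # Sort descending by the chosen metric; ties broken by content ID for determinism.
    ranked = sorted(metric, key=lambda cid: (-metric[cid], cid))
    return ranked[:n]
```

The two metrics can disagree. A long film watched once ranks high by time, while a short clip replayed many times ranks high by count. The patent allows either metric.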

More preferably, the selecting step may further include classifying each of the candidate contents into one of a set of predetermined content types. The selecting step may further include calculating the user preference scores of the candidate contents using weight information corresponding to the classified content types. In the selecting step, the candidate content with the highest user preference score is selected as the target content.
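A minimal sketch of the selecting step follows, with three assumptions. First, the content type is already known from metadata. Second, the score is the type weight multiplied by cumulative playback hours. Third, the type names and weight values are placeholders. The specification does not disclose the actual types, weights or scoring formula.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical content types and weights for illustration only;
# the specification does not publish actual values.
TYPE_WEIGHTS: Dict[str, float] = {
    "drama": 1.0,
    "movie": 0.9,
    "radio": 0.8,
    "music": 0.5,
}


@dataclass(frozen=True)
class Candidate:
    content_id: str
    content_type: str        # assumed to be known from content metadata
    cumulative_hours: float  # cumulative playback time in the collection period


def preference_score(c: Candidate, weights: Dict[str, float]) -> float:
    """Assumed scoring rule: type weight x cumulative playback hours.
    Unknown content types get weight 0."""
    return weights.get(c.content_type, 0.0) * c.cumulative_hours


def select_target(
    candidates: List[Candidate], weights: Dict[str, float] = TYPE_WEIGHTS
) -> Candidate:
    """Return the candidate with the highest user preference score."""
    if not candidates:
        raise ValueError("no candidate contents")
    return max(candidates, key=lambda c: preference_score(c, weights))
```

With these placeholder weights, a drama watched for 6 hours (score 6.0) outranks music played for 10 hours (score 5.0). This is how type weights can favor content with more spoken voice.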

More preferably, the determining step may distinguish one or more voices included in the target content and extract the representative voice based on the cumulative playback time of each voice.
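The determining step can be illustrated as below. The sketch assumes an upstream speaker-diarization step, not described in the patent, that has already split the target content's audio into labeled segments `(label, start, end)` in seconds. The labels and function names are hypothetical. The representative voice is the label with the largest total duration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Each segment: (voice label, start seconds, end seconds), as an assumed
# upstream diarization step might emit for the target content's audio track.
Segment = Tuple[str, float, float]


def voice_durations(segments: Iterable[Segment]) -> Dict[str, float]:
    """Sum the cumulative playback time of each distinguished voice."""
    totals: Dict[str, float] = defaultdict(float)
    for label, start, end in segments:
        if end < start:
            raise ValueError(f"segment for {label!r} ends before it starts")
        totals[label] += end - start
    return dict(totals)


def representative_voice(segments: Iterable[Segment]) -> str:
    """Return the label of the voice with the longest cumulative playback time."""
    totals = voice_durations(segments)
    if not totals:
        raise ValueError("no voice segments")
    return max(totals, key=totals.get)
```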

More preferably, the AI assistant voice matching method may further include extracting feature values of the user-preferred voice and processing the AI assistant voice using the extracted feature values. The method may further include transmitting the processed AI assistant voice to an AI speaker.
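The patent does not say which feature values are extracted or how the AI assistant voice is processed. As one hedged illustration, the sketch below uses a single feature: fundamental frequency (pitch), estimated with a basic autocorrelation method on a voiced frame. It then computes the semitone shift needed to move a base assistant voice's pitch onto the preferred voice's pitch. Applying that shift would require a separate pitch-shifting or speech-synthesis stage, which is not shown. The names `estimate_f0` and `semitone_shift` and the 60 to 400 Hz search range are assumptions.

```python
import math
from typing import List


def estimate_f0(
    frame: List[float], sample_rate: int, fmin: float = 60.0, fmax: float = 400.0
) -> float:
    """Estimate the fundamental frequency (Hz) of one voiced frame by picking the
    autocorrelation peak within the lag range [sample_rate/fmax, sample_rate/fmin]."""
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_r = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        r = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    if best_lag == 0:
        raise ValueError("no positive autocorrelation peak; frame may be unvoiced")
    return sample_rate / best_lag


def semitone_shift(base_f0: float, target_f0: float) -> float:
    """Pitch shift, in semitones, that moves base_f0 onto target_f0."""
    return 12.0 * math.log2(target_f0 / base_f0)
```

Plain autocorrelation is used only for clarity. A production system would more likely use a robust pitch tracker and several features, such as timbre and speaking rate, not pitch alone.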

Another aspect of the present invention provides a program stored in a computer-readable recording medium that causes a computer to execute the following processes: collecting information about content consumed by a user over a certain period of time; detecting a predetermined number of candidate contents based on the collected content information; selecting target content based on user preference scores of the candidate contents; and analyzing sound sources included in the selected target content to extract a representative voice, and determining the representative voice as the user-preferred voice.

Still another aspect of the present invention provides an AI assistant voice matching server comprising the following units:
- a content purchase information collection unit that collects information about content purchased by a user over a certain period of time;
- a preferred voice extraction unit that detects a predetermined number of candidate contents based on the content information acquired through the content purchase information collection unit, selects target content based on user preference scores of the candidate contents, and analyzes the sound sources included in the selected target content to extract a representative voice;
- an AI assistant voice processing unit that extracts feature values of the representative voice detected by the preferred voice extraction unit and processes the AI assistant voice using those feature values.

The effects of the AI assistant voice matching system and its operation method according to embodiments of the present invention are described below.

According to at least one of the embodiments, the user is provided with a user-customized AI assistant voice generated from the content purchase (playback) history. The user can therefore hear an AI assistant voice that suits his or her taste. The service provider, in turn, incurs no separate cost or time in producing AI assistant voices.

According to at least one of the embodiments, the user is provided with a user-customized AI assistant voice generated from information about content the user consumed over a certain period of time. This can increase the user's affinity for the AI assistant and improve intimacy between the user and the AI assistant.

However, the effects achievable by the AI assistant voice matching system and its operation method according to embodiments of the present invention are not limited to those mentioned above. Other effects not mentioned will be clearly understood by those skilled in the art from the following description.

FIG. 1 is an overall configuration diagram of an AI assistant voice matching system according to an embodiment of the present invention;
FIG. 2 is a block diagram of an AI speaker according to an embodiment of the present invention;
FIG. 3 is a block diagram of an AI assistant voice matching server according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating an AI assistant voice matching method according to an embodiment of the present invention;
FIG. 5 is a view explaining an example of a method of automatically matching an AI assistant voice through a content purchase history.

Hereinafter, embodiments disclosed in this specification will be described in detail with reference to the accompanying drawings. Identical or similar components are assigned the same reference numerals regardless of the drawing in which they appear, and redundant descriptions of them are omitted.

The suffixes "module" and "unit" for components in the following description are assigned or used interchangeably only for ease of drafting the specification. They do not by themselves have distinct meanings or roles. The term 'unit' as used in the present invention means software or a hardware component such as an FPGA or ASIC, and a 'unit' performs certain roles. However, a 'unit' is not limited to software or hardware. A 'unit' may be configured to reside in an addressable storage medium or to execute on one or more processors.

Thus, as an example, a 'unit' includes:
- components such as software components, object-oriented software components, class components and task components;
- processes, functions, attributes, procedures and subroutines;
- segments of program code, drivers, firmware, microcode and circuits;
- data, databases, data structures, tables, arrays and variables.

The functionality provided within the components and 'units' may be combined into fewer components and 'units' or further separated into additional components and 'units'.

In describing the embodiments disclosed herein, detailed descriptions of related known technologies are omitted when they might obscure the gist of the embodiments. The accompanying drawings are provided only to aid understanding of the embodiments disclosed herein and do not limit the technical idea disclosed in this specification. That technical idea should be understood to include all changes, equivalents and substitutes falling within the spirit and technical scope of the present invention.

The present invention proposes an AI assistant voice matching system, and an operation method thereof, that extracts a user-preferred voice contained in specific content based on information about the content purchase (playback) history. It then automatically generates an AI assistant voice based on the characteristics of the user-preferred voice. The present invention also proposes an AI assistant voice matching system, and an operation method thereof, that automatically generates a user-customized AI assistant voice from information about content the user consumed over a certain period of time. That voice is then provided to the user.

Hereinafter, various embodiments of the present invention will be described in detail with reference to the drawings.

FIG. 1 is an overall configuration diagram of an AI assistant voice matching system according to an embodiment of the present invention.

Referring to FIG. 1, the AI assistant voice matching system 100 according to an embodiment of the present invention may include components located in the home and outside the home. Inside the home are a router 110, an AI speaker 120 and a plurality of IoT devices 130-1 to 130-N. Outside the home are an Internet network 140 and an AI assistant voice providing server 150. The AI assistant voice providing server 150 may be built into an operating server of a service provider or a platform provider.

The router (access point) 110 provides a communication connection so that the AI speaker 120 and the IoT devices 130-1 to 130-N located in the home can access the Internet network 140. The router 110 may share an IP address provided by an Internet service provider so that multiple terminals in the home can use it in common.

The AI speaker (or voice controller) 120 may be connected to the router 110 through a wired/wireless communication interface. Through the router 110, it may communicate with the AI assistant voice providing server 150 located outside the home.

The AI speaker 120 may be directly connected to the IoT devices 130-1 to 130-N through a wired/wireless communication interface to communicate with them. In another embodiment, not shown in the drawing, the AI speaker 120 may instead connect to a home hub (not shown) through a wired/wireless communication interface. It may then communicate with the IoT devices 130-1 to 130-N through the home hub.

The AI speaker 120 may provide an AI assistant service corresponding to a user's voice command. For example, in response to a user's voice command, the AI speaker 120 may check the states of the IoT devices 130-1 to 130-N in the home or control their operations. The AI speaker 120 may also provide an AI assistant voice service based on information received from the AI assistant voice providing server 150.

The IoT devices 130-1 to 130-N are devices equipped with an Internet of Things (IoT) function. They may include smart appliances, security devices, lighting devices, energy devices and the like. The IoT devices 130-1 to 130-N may exchange data with the AI speaker 120, a home hub, a set-top box and the like.

The AI assistant voice providing server 150 may connect to the router 110 in the home through the Internet network 140 and communicate with the AI speaker 120 through the router 110.

The AI assistant voice providing server 150 may automatically generate a user-customized AI assistant voice from information about content the user consumed over a predetermined period of time. The generated voice may be provided to the user through the AI speaker 120.

FIG. 2 is a block diagram of an AI speaker according to an embodiment of the present invention.

Referring to FIG. 2, the AI speaker 200 according to an embodiment of the present invention may include a communication unit 210, an input unit 220, an output unit 230, a memory 240 and a controller 250. The components shown in FIG. 2 are not essential for implementing the AI speaker 200. The AI speaker described herein may therefore have more or fewer components than those listed above.

The communication unit 210 may include a wired communication module that supports wired communication and a short-range communication module that supports short-range wireless communication.

The wired communication module transmits and receives wired signals to and from at least one of a communication modem, a router and an IoT device. It does so over a wired network built according to a wired communication standard, for example Ethernet, Power Line Communication (PLC), Home PNA or IEEE 1394.

The short-range communication module may support short-range wireless communication using at least one of the following:
- Bluetooth™;
- Radio Frequency Identification (RFID);
- Infrared Data Association (IrDA);
- Ultra-Wideband (UWB);
- ZigBee;
- Near Field Communication (NFC);
- Wi-Fi (Wireless Fidelity) and Wi-Fi Direct;
- Wireless USB (Wireless Universal Serial Bus).

The input unit 220 may include a microphone for inputting audio signals. The microphone converts external sound signals into electrical voice data. The processed voice data may be used in various ways depending on the function being performed (or the application program being executed) by the AI speaker 200. Various noise reduction algorithms may also be implemented in the microphone to remove noise generated while receiving external sound signals.

The output unit 230 generates output related to sight, hearing, touch and the like. It may include at least one of a display unit, a sound output unit, a haptic module and a light output unit. The sound output unit may output audio data stored in the memory 240, as well as sound signals related to functions performed by the AI speaker 200. The sound output unit may include a speaker, a buzzer and the like.

The memory 240 stores data supporting various functions of the AI speaker 200. This may include application programs (or applications) running on the AI speaker 200, together with data and instructions for operating the AI speaker 200.

The controller 250 controls operations related to the application programs stored in the memory 240 and, typically, the overall operation of the AI speaker 200. To implement the various embodiments described below on the AI speaker 200, the controller 250 may control at least one of the above-described components, or a combination of them.

The controller 250 may further include a voice recognition module 255. The voice recognition module 255 runs a voice recognition engine, to which a voice recognition algorithm is applied, to recognize external speech input through the microphone. It proceeds in the following order:
1. It converts the external speech input through the microphone into digital data.
2. It applies pre-emphasis (amplification) to the converted digital data.
3. It detects the start point and end point of the digitized speech.
4. It extracts speech feature values for the speech between the detected start and end points to recognize a distinctive voice or timbre.
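The pre-emphasis and endpoint-detection stages above can be sketched in pure Python. Pre-emphasis is shown as the common first-order filter y[n] = x[n] - alpha * x[n-1], with alpha = 0.97 as a typical value. Endpoint detection is shown as a simple frame-energy threshold. The patent does not specify either algorithm, so both, along with the frame length (160 samples, i.e. 10 ms at 16 kHz) and the threshold value, are assumptions.

```python
from typing import List, Optional, Tuple


def pre_emphasis(samples: List[float], alpha: float = 0.97) -> List[float]:
    """First-order pre-emphasis filter: y[n] = x[n] - alpha * x[n-1]."""
    if not samples:
        return []
    return [samples[0]] + [
        samples[n] - alpha * samples[n - 1] for n in range(1, len(samples))
    ]


def frame_energies(samples: List[float], frame_len: int) -> List[float]:
    """Mean energy of consecutive non-overlapping frames (a trailing partial frame is dropped)."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def detect_endpoints(
    samples: List[float], frame_len: int = 160, threshold: float = 0.01
) -> Optional[Tuple[int, int]]:
    """Return (start, end) sample indices spanning the first through last frame
    whose energy exceeds threshold, or None if no frame does."""
    energies = frame_energies(samples, frame_len)
    active = [i for i, e in enumerate(energies) if e > threshold]
    if not active:
        return None
    return active[0] * frame_len, (active[-1] + 1) * frame_len
```

Pre-emphasis boosts high frequencies relative to low ones. As a result, the energy threshold in this sketch reacts more strongly to high-frequency content, and a real system would tune the threshold, or use a trained voice-activity detector, on its own microphone data.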

In this embodiment, the voice recognition module 255 is illustrated as being implemented within the controller 250, but the present invention is not limited to this arrangement. The module may be configured independently of the controller 250, or it may even be installed on an external platform that interworks with the AI speaker 200.

FIG. 3 is a block diagram of an AI assistant voice providing server according to an embodiment of the present invention.

Referring to FIG. 3, the AI assistant voice providing server 300 according to an embodiment of the present invention may include a communication unit 310, a content purchase information collection unit 320, a preferred voice extraction unit 330, an AI assistant voice processing unit 340, a database 350 and a controller 360. The components shown in FIG. 3 are not essential for implementing the AI assistant voice providing server 300. The server described herein may therefore have more or fewer components than those listed above.

The communication unit 310 may include a wired communication module that supports wired communication and a wireless communication module that supports wireless communication. The wired communication module transmits and receives wired signals to and from at least one of a communication modem, a router and another server. It does so over a wired network built according to a wired communication standard, for example Ethernet, Power Line Communication (PLC), Home PNA or IEEE 1394.

The wireless communication module transmits and receives radio signals to and from at least one of a base station and an external terminal. It does so over a wireless network built according to one of the following standards or methods:
- Global System for Mobile communication (GSM);
- Code Division Multiple Access (CDMA) and CDMA2000;
- Enhanced Voice-Data Optimized or Enhanced Voice-Data Only (EV-DO);
- Wideband CDMA (WCDMA);
- High Speed Downlink Packet Access (HSDPA) and High Speed Uplink Packet Access (HSUPA);
- Long Term Evolution (LTE) and LTE-Advanced (LTE-A).

The content purchase information collection unit 320 may collect information about the contents purchased by the user over a predetermined period of time. The contents may include video contents such as TV programs, movies, VOD (video on demand), and YouTube videos, and audio contents such as radio, podcasts, and music.

The content purchase information collection unit 320 may identify each speaker by recognizing the users' voices input through the AI speaker, and collect information on the purchased contents separately for each identified speaker.
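Per-speaker collection can be pictured as grouping purchase events by the speaker label that voice recognition assigns. The sketch below is illustrative only: it assumes the speaker ID has already been produced by some speaker-recognition step (not shown), and the `PurchaseEvent` type and `group_by_speaker` helper are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PurchaseEvent:
    speaker_id: str  # label assumed to come from a separate speaker-recognition step
    content_id: str


def group_by_speaker(events):
    """Collect purchased content IDs separately for each identified speaker."""
    per_speaker = defaultdict(list)
    for event in events:
        per_speaker[event.speaker_id].append(event.content_id)
    return dict(per_speaker)


events = [
    PurchaseEvent("speaker-1", "content-A"),
    PurchaseEvent("speaker-2", "content-B"),
    PurchaseEvent("speaker-1", "content-C"),
]
print(group_by_speaker(events))
# {'speaker-1': ['content-A', 'content-C'], 'speaker-2': ['content-B']}
```

Each speaker's list can then be fed to the preference pipeline independently, so household members sharing one AI speaker get separate preferred voices.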

The preferred voice extraction unit 330 may extract the voice each user prefers (preference voice) based on information on the contents that the user purchased over a predetermined period of time.

The preferred voice extraction unit 330 calculates at least one of the cumulative playing time and the cumulative playback count of the purchased contents and, based on the calculated value, detects a predetermined number of contents ranked highest in cumulative playback (hereinafter, for convenience of description, referred to as 'upper contents' or 'candidate contents').

The preferred voice extraction unit 330 may classify the upper contents by type and calculate user preference scores for the upper contents based on preset weight information for each content type.

The preferred voice extraction unit 330 may detect the optimal content based on the user preference scores and determine the representative voice extracted from the optimal content as the user preferred voice.

The AI assistant voice processing unit 340 may extract feature values (or feature vectors) of the user preferred voice and process the AI assistant voice based on the extracted feature values. The AI assistant voice processed in this way may be provided to the user through the AI speaker.

The database (or memory) 350 stores data supporting the various functions of the AI assistant voice providing server 300. The database 350 may store a plurality of application programs (or applications) running on the AI assistant voice providing server 300, as well as data and instructions for operating the server 300.

The database 350 may include a content database (DB) storing the contents purchased by the user and a voice database (DB) storing voice data extracted from the purchased contents.
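The specification names a content DB and a voice DB but does not define their columns. As a rough illustration only, the two stores could be laid out as below using Python's built-in `sqlite3`; every table and column name here is a hypothetical choice, not part of the disclosure.

```python
import sqlite3

# Hypothetical layout for the content DB and the voice DB described above.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE content (
        content_id    TEXT PRIMARY KEY,
        user_id       TEXT NOT NULL,
        title         TEXT,
        content_type  TEXT            -- type letter A to E, as in Table 1
    );
    CREATE TABLE voice (
        voice_id      INTEGER PRIMARY KEY,
        content_id    TEXT NOT NULL REFERENCES content(content_id),
        speaker_label TEXT,
        audio_path    TEXT            -- where the extracted voice clip is stored
    );
    """
)
conn.execute("INSERT INTO content VALUES (?, ?, ?, ?)", ("c1", "u1", "Example drama", "B"))
conn.execute(
    "INSERT INTO voice (content_id, speaker_label, audio_path) VALUES (?, ?, ?)",
    ("c1", "lead-actor", "/voices/c1/lead.wav"),
)

# Look up which voices were extracted from which purchased content.
rows = conn.execute(
    "SELECT c.title, v.speaker_label FROM voice v JOIN content c USING (content_id)"
).fetchall()
print(rows)  # [('Example drama', 'lead-actor')]
```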

The controller 360 controls operations related to the application programs stored in the database 350 and, in general, the overall operation of the AI assistant voice providing server 300. Furthermore, to implement the various embodiments described below on the AI assistant voice providing server 300 according to the present invention, the controller 360 may control at least one of the components described above, or a combination of them.

The controller 360 may control the overall operation of automatically generating a user-customized AI assistant voice based on information on the contents the user purchased over a predetermined period of time, and transmitting the user-customized AI assistant voice to the AI speaker 200.

Although this embodiment illustrates the AI assistant voice providing server 300 as including the content purchase information collection unit 320, the present invention is not necessarily limited thereto, and the server may include a content playback information collection unit instead of the content purchase information collection unit 320. The content playback information collection unit may collect information about the contents consumed (played) by the user over a predetermined period of time.

As described above, the AI assistant voice providing server according to the present invention provides the user with a customized AI assistant voice generated from information on the contents the user consumed (played) over a predetermined period of time. As a result, the user can hear a voice that matches his or her taste, and the intimacy between the user and the AI assistant can be improved.

FIG. 4 is a flowchart illustrating an AI assistant voice matching method according to an embodiment of the present invention.

Referring to FIG. 4, the AI assistant voice providing server 300 may collect the users' content purchase information over a predetermined period of time (S410). The server 300 may collect the content purchase information in conjunction with the AI speaker 200, a set-top box, a service provider, a platform provider, and the like. The content purchase information may include content identification information, purchase date information, purchase time information, purchase cost information, purchase place information, and the like.
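The purchase-information fields listed for step S410 map naturally onto a small record type, and "over a predetermined period" becomes a time-window filter. This is a minimal sketch under assumptions: the `PurchaseRecord` fields mirror the listed information, while the field types, the integer cost unit, and the `collect_within_period` helper are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class PurchaseRecord:
    content_id: str         # content identification information
    purchased_at: datetime  # purchase date and time information
    cost: int               # purchase cost (currency unit assumed)
    place: str              # purchase place, e.g. the device or platform used


def collect_within_period(records, now, period):
    """Keep only records purchased within the window [now - period, now]."""
    start = now - period
    return [r for r in records if start <= r.purchased_at <= now]


now = datetime(2018, 8, 28)
records = [
    PurchaseRecord("vod-1", datetime(2018, 8, 20, 21, 0), 1500, "set-top box"),
    PurchaseRecord("vod-2", datetime(2018, 5, 1, 10, 0), 2000, "AI speaker"),
]
recent = collect_within_period(records, now, timedelta(days=30))
print([r.content_id for r in recent])  # ['vod-1']
```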

The AI assistant voice providing server 300 may predetermine (or set) the types of purchased contents and a weight for each content type, and store this information in a database (S420).

For example, as shown in Table 1 below, the purchased contents may be classified into a first content type (education/documentary/current affairs/culture/animation), a second content type (movie/drama/theater), a third content type (entertainment/variety/sports), a fourth content type (concert/performance/art/music), and a fifth content type (other). The weight for each content type may be set to 100 for the first content type, 70 for the second, 50 for the third, 20 for the fourth, and 10 for the fifth. That is, higher weights are assigned to content types whose voices are better suited to an AI voice.

Table 1

| Content | Content type | Sound source | Weight |
|---|---|---|---|
| A | First content type (education, documentary, current affairs, culture, animation) | Narration / voice-actor voice | 100 |
| B | Second content type (movie, drama, theater) | Actor voice | 70 |
| C | Third content type (entertainment, variety, sports) | MC voice | 50 |
| D | Fourth content type (concert, performance, art, music) | Singing voice | 20 |
| E | Fifth content type (other) | Unclassified | 10 |

The AI assistant voice providing server 300 may calculate the cumulative playing time of the contents the user purchased over a predetermined period of time (S430). The server 300 may then sort the purchased contents in descending order of cumulative playing time and detect a predetermined number of upper contents (S440).
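Steps S430 and S440 amount to summing playing time per content and keeping the top N after a descending sort. The sketch below assumes a hypothetical play log of `(content_id, seconds_played)` pairs, one per playback session; the log format and function name are illustrative, not taken from the specification.

```python
from collections import defaultdict


def top_n_by_cumulative_time(play_log, n):
    """Return the n content IDs with the longest total playing time.

    play_log is an iterable of (content_id, seconds_played) pairs,
    one per playback session (an assumed log format).
    """
    totals = defaultdict(int)
    for content_id, seconds in play_log:
        totals[content_id] += seconds
    # Longest total first; Python's sort is stable, so ties keep first-seen order.
    ranked = sorted(totals, key=totals.get, reverse=True)
    return ranked[:n]


log = [("A", 3600), ("B", 1200), ("A", 1800), ("C", 5000), ("B", 600)]
print(top_n_by_cumulative_time(log, 2))  # ['A', 'C']
```

Counting sessions instead of summing seconds (`totals[content_id] += 1`) would give the cumulative-playback-count variant that the description also allows.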

The AI assistant voice providing server 300 may classify each of the detected upper contents into one of the predetermined content types (S450), so that different weights can be applied according to the type of content.
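The classification of step S450, together with the weights of Table 1, can be held as two lookup tables. This sketch assumes each content carries a genre label in its metadata (a hypothetical input) and uses the letters A to E for the first through fifth content types, following the labels in Table 1 and FIG. 5; any genre not listed falls into the fifth type.

```python
# Example genres from Table 1 mapped to content-type letters.
GENRE_TO_TYPE = {
    "education": "A", "documentary": "A", "current affairs": "A",
    "culture": "A", "animation": "A",
    "movie": "B", "drama": "B", "theater": "B",
    "entertainment": "C", "variety": "C", "sports": "C",
    "concert": "D", "performance": "D", "art": "D", "music": "D",
}

# Per-type weights from Table 1.
TYPE_WEIGHT = {"A": 100, "B": 70, "C": 50, "D": 20, "E": 10}


def classify(genre):
    """Map a genre label to a content type; unlisted genres become type E (other)."""
    return GENRE_TO_TYPE.get(genre.lower(), "E")


print(classify("Drama"), TYPE_WEIGHT[classify("Drama")])  # B 70
```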

The AI assistant voice providing server 300 may calculate the user preference scores of the upper contents using the preset weight information for each content type (S460). In one embodiment, the server 300 may calculate the user preference score K using Equation 1 below.

In another embodiment, the AI assistant voice providing server 300 may calculate the user preference score K using Equation 2 below. It will be apparent to those skilled in the art that the user preference score K may also be calculated in various other ways.

Here, Equation 2 uses a weight for the playback count and a weight for the cumulative playing time.
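The exact forms of Equations 1 and 2 are not given in this text. The sketch below therefore assumes simple product forms consistent with the surrounding description: Equation 1 as type weight times playback count, and Equation 2 as type weight times a blend of playback count and cumulative playing time, with `alpha` and `beta` standing in for the count and time weights. Both formulas and their default values are assumptions for illustration only.

```python
def preference_score_count(type_weight, play_count):
    """Assumed form of Equation 1: K = type weight x playback count."""
    return type_weight * play_count


def preference_score_mixed(type_weight, play_count, play_hours, alpha=0.5, beta=0.5):
    """Assumed form of Equation 2: K = type weight x (alpha x count + beta x hours).

    alpha is the playback-count weight and beta the cumulative-playing-time weight.
    """
    return type_weight * (alpha * play_count + beta * play_hours)


print(preference_score_count(70, 10))       # 700
print(preference_score_mixed(100, 4, 2.0))  # 300.0
```

Raising `beta` relative to `alpha` favors long-form content such as movies over frequently replayed short clips; the specification leaves this trade-off to the implementer.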

The AI assistant voice providing server 300 may detect the optimal content based on the user preference scores (S470). That is, the server 300 may determine the content with the highest user preference score among the upper contents as the optimal content (or target content).
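Step S470 is an argmax over the candidate scores. A minimal sketch, assuming the scores are kept in a dictionary keyed by content ID (a hypothetical representation); on a tie, Python's `max` returns the first candidate encountered.

```python
def select_target(scores):
    """Return the content ID with the highest user preference score."""
    if not scores:
        raise ValueError("no candidate contents")
    return max(scores, key=scores.get)


print(select_target({"c1": 350, "c2": 700, "c3": 500}))  # c2
```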

The AI assistant voice providing server 300 may detect a representative voice by analyzing the sound source information included in the optimal content (S480). For example, the server 300 may distinguish the voices played in the content and extract the representative voice based on the cumulative playing time of each voice. The server 300 may then determine the representative voice extracted from the optimal content as the user preferred voice.
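Choosing the representative voice by cumulative playing time reduces to summing each voice's speaking time and taking the longest. This sketch assumes the voices have already been separated into labeled `(speaker_label, start_sec, end_sec)` segments by some speaker-diarization step, which is not shown and is a hypothetical input here.

```python
from collections import defaultdict


def representative_voice(segments):
    """Return the speaker label with the longest total speaking time.

    segments: (speaker_label, start_sec, end_sec) tuples from an assumed
    diarization step.
    """
    talk_time = defaultdict(float)
    for label, start, end in segments:
        talk_time[label] += max(0.0, end - start)  # ignore malformed negative spans
    return max(talk_time, key=talk_time.get)


segments = [("narrator", 0, 30), ("guest", 30, 40), ("narrator", 40, 100)]
print(representative_voice(segments))  # narrator
```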

The AI assistant voice providing server 300 may detect feature values (or feature vectors) of the user preferred voice and process the AI assistant voice based on the detected feature values (S490). The AI assistant voice processed in this way may be output to the user through the AI speaker.
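The specification does not say which feature values are extracted in step S490. As a stand-in, the sketch below computes two simple voice features in pure Python: a rough fundamental frequency (pitch) picked from the autocorrelation peak within a typical speaking range, and RMS loudness. These features, the 60 to 400 Hz search range, and all function names are assumptions; a practical system would more likely use richer spectral features or speaker embeddings.

```python
import math


def estimate_f0(samples, sample_rate, fmin=60.0, fmax=400.0):
    """Rough pitch estimate: the lag with maximal autocorrelation in [fmin, fmax]."""
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(samples) - 1)
    best_lag, best_corr = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        corr = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return sample_rate / best_lag


def rms(samples):
    """Root-mean-square amplitude, a simple loudness feature."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def voice_features(samples, sample_rate):
    return {"f0_hz": estimate_f0(samples, sample_rate), "rms": rms(samples)}


sr = 8000
tone = [math.sin(2 * math.pi * 200 * i / sr) for i in range(800)]  # 0.1 s test tone at 200 Hz
features = voice_features(tone, sr)
print(round(features["f0_hz"]), round(features["rms"], 3))  # 200 0.707
```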

As described above, the AI assistant voice matching method according to the present invention provides the user with a customized AI assistant voice generated from information on the contents the user consumed (played) over a predetermined period of time. As a result, the user can hear a voice that matches his or her taste, and the intimacy between the user and the AI assistant can be improved.

FIG. 5 is a diagram illustrating an example of the method of automatically matching an AI assistant voice using the content purchase (playback) history. As shown in FIG. 5(a), the AI assistant voice providing server 300 may calculate the cumulative playing time of the contents the user purchased over a predetermined period of time. Based on the highest cumulative playing times, the server 300 may detect first to fifth contents (Radio Star, Something in the Rain, I Live Alone, One Piece, and BTS).

Next, as shown in FIG. 5(b), the AI assistant voice providing server 300 may classify each of the first to fifth contents into one of the predetermined content types. For example, the server 300 may classify the first content as 'content type C', the second content as 'content type B', the third content as 'content type C', the fourth content as 'content type A', and the fifth content as 'content type D'.

Next, as shown in FIG. 5(c), the AI assistant voice providing server 300 may calculate a user preference score for each of the first to fifth contents. Here, the server 300 may calculate the user preference score based on the weight information for each content and the number of times each content was played.

Then, as shown in FIG. 5(d), the AI assistant voice providing server 300 may detect the second content (e.g., Something in the Rain), which has the highest user preference score. The server 300 may extract a representative voice (e.g., the voice of 'Hong Gil-dong', a Korean placeholder name) by analyzing the sound sources included in the second content, and determine the extracted representative voice as the user preferred voice.
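Using only what the text states for FIG. 5 (the type of each content and the Table 1 weights), a short check shows why playback count matters in step (c): by type weight alone the fourth content would win, yet FIG. 5(d) selects the second. The product form "weight x play count" used in the final comment is an assumption, since the exact scoring formula is not given here; no play counts from FIG. 5 are reproduced.

```python
# Content types stated for FIG. 5(b); letters and weights follow Table 1.
TYPE_WEIGHT = {"A": 100, "B": 70, "C": 50, "D": 20, "E": 10}
FIG5_TYPES = {
    "content 1": "C",  # Radio Star
    "content 2": "B",  # Something in the Rain (drama)
    "content 3": "C",  # I Live Alone
    "content 4": "A",  # One Piece (animation)
    "content 5": "D",  # BTS (music)
}

weights = {name: TYPE_WEIGHT[t] for name, t in FIG5_TYPES.items()}
print(max(weights, key=weights.get))  # content 4

# Under an assumed score of weight x play count, content 2 (weight 70) beats
# content 4 (weight 100) only if it was played more than 100 / 70 times as often.
print(round(TYPE_WEIGHT["A"] / TYPE_WEIGHT["B"], 2))  # 1.43
```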

The present invention described above can be embodied as computer-readable code on a medium in which a program is recorded. The computer-readable medium includes all kinds of recording devices that store data readable by a computer system. Examples of computer-readable media include hard disk drives (HDDs), solid state disks (SSDs), silicon disk drives (SDDs), ROM, RAM, CD-ROMs, magnetic tapes, floppy disks, and optical data storage devices. The computer may also include the controller of a terminal. Accordingly, the above detailed description should not be construed as limiting in any respect and should be considered illustrative. The scope of the present invention should be determined by a reasonable interpretation of the appended claims, and all changes within the equivalent scope of the present invention are included in the scope of the present invention.

100: AI assistant voice matching system  110: router
120/200: AI speaker  130: IoT device
140: Internet network  150/300: AI assistant voice providing server
310: communication unit  320: content purchase information collection unit
330: preferred voice extraction unit  340: AI assistant voice processing unit
350: database  360: controller

Claims

1. An AI assistant voice matching method for automatically generating an AI assistant voice, the method comprising:
collecting content information consumed by a user over a predetermined period of time;
detecting a predetermined number of candidate contents based on the collected content information;
selecting target content based on user preference scores of the candidate contents; and
analyzing sound sources included in the selected target content to extract a representative voice, and determining the representative voice as a user preferred voice.

2. The method of claim 1, wherein the detecting comprises calculating a cumulative playing time of the contents consumed by the user over the predetermined period of time, and detecting the candidate contents based on the cumulative playing time.

3. The method of claim 1, wherein the detecting comprises calculating a cumulative playback count of the contents consumed by the user over the predetermined period of time, and detecting the candidate contents based on the cumulative playback count.

4. The method of claim 1, wherein the selecting comprises classifying each of the candidate contents into one of predetermined content types.

5. The method of claim 4, wherein the selecting further comprises calculating the user preference scores of the candidate contents using weight information corresponding to the classified content type.

6. The method of claim 5, wherein the selecting further comprises selecting the candidate content having the highest user preference score as the target content.

7. The method of claim 1, wherein the determining comprises distinguishing one or more voices included in the target content and extracting the representative voice based on a cumulative playing time of each voice.

8. The method of claim 1, further comprising extracting feature values of the user preferred voice and processing the AI assistant voice using the extracted feature values.

9. The method of claim 8, further comprising transmitting the processed AI assistant voice to an AI speaker.

10. A program stored on a computer-readable recording medium for executing the method according to any one of claims 1 to 9 on a computer.

11. An AI assistant voice matching system comprising:
a content purchase information collection unit configured to collect information on contents purchased by a user over a predetermined period of time;
a preferred voice extraction unit configured to detect a predetermined number of candidate contents based on the content information acquired through the content purchase information collection unit, select target content based on user preference scores of the candidate contents, and extract a representative voice by analyzing sound sources included in the selected target content; and
an AI assistant voice processing unit configured to extract feature values of the representative voice detected by the preferred voice extraction unit and process the AI assistant voice using the feature values.