KR20210010594A

KR20210010594A - Method for managing contents using speaker recognition, computer-readable medium and computing device

Info

Publication number: KR20210010594A
Application number: KR1020210004202A
Authority: KR
Inventors: 류창선; 박성원; 박종세
Original assignee: 주식회사 케이티
Priority date: 2021-01-12
Filing date: 2021-01-12
Publication date: 2021-01-27

Abstract

A content management method using speaker recognition, which automatically recognizes the people associated with a piece of content, comprises: storing acquaintance voice data for acquaintances in an address book; storing speaker voice data in association with content; recognizing one or more speakers included in the speaker voice data based on the acquaintance voice data; and managing information on the recognized speakers in association with the content.

Description

A method of managing contents through speaker recognition, a computer-readable medium, and a computing device {METHOD FOR MANAGING CONTENTS USING SPEAKER RECOGNITION, COMPUTER-READABLE MEDIUM AND COMPUTING DEVICE}

The present invention relates to a method for managing content through speaker recognition, a computer-readable medium containing a sequence of instructions for managing content in an application, and a computing device that executes a content management application.

Content refers to characters, figures, colors, sounds, motions, images, or combinations thereof, as well as data related to movies, music, plays, literature, photographs, cartoons, animations, and computer games. In the past, content was produced only by professional creators. Advances in digital technology now allow anyone to create and edit photos, videos, and music on a personal computer or smartphone. Because ordinary individuals can now create content, the number and volume of content items have grown very rapidly. A way to manage this vast amount of content has therefore become necessary. As one example of such a content management method, the prior art Korean Patent Publication No. 2013-0090570 discloses an electronic device for managing content and a management method thereof.

As noted above, individuals create and consume content directly on their own electronic devices, such as personal computers and smartphones. The rapid growth of this user-created content (UCC) calls for a new content management method.

Korean Registered Patent Publication No. 10-2016-0119740 (published October 14, 2016)

An object of the present invention is to provide a content management method that automatically recognizes the people associated with content. It does so by using voice data captured from people while the content is being created, viewed, or browsed on an electronic device. Another object is to provide a content management method for conveniently and efficiently tagging, searching for, and transmitting content with respect to people stored in the address book of the electronic device. However, the technical problems addressed by the present embodiments are not limited to those described above, and other technical problems may exist.

As a technical means for achieving the above objects, an embodiment of the present invention may provide a content management method that includes the following steps: storing acquaintance voice data for acquaintances in an address book; storing speaker voice data in association with content; recognizing, based on the acquaintance voice data, one or more speakers included in the speaker voice data; and managing information on the recognized speakers in association with the content.

Another embodiment of the present invention may provide a content management method that includes the following steps: transmitting acquaintance voice data for acquaintances in an address book to a speaker recognition server; storing speaker voice data in association with content; transmitting the speaker voice data to the speaker recognition server; receiving, from the speaker recognition server, a recognition result for one or more speakers included in the speaker voice data; and managing the recognition result in association with the content.

Yet another embodiment may provide a content management method that includes the following steps: receiving, from a mobile terminal, acquaintance voice data for acquaintances in an address book; receiving speaker voice data from the mobile terminal; recognizing, based on the acquaintance voice data, one or more speakers included in the speaker voice data; and transmitting information on the recognized speakers to the mobile terminal.

Yet another embodiment may provide a computing device that includes a memory and a processing unit arranged to interface with the memory. The processing unit is configured to do the following: store acquaintance voice data for acquaintances in an address book; store speaker voice data in association with content; recognize one or more speakers included in the speaker voice data based on the acquaintance voice data; and manage information on the recognized speakers in association with the content.

Yet another embodiment may provide a computer-readable medium storing instructions that are executed by a computing device. When executed, the instructions cause the computing device to do the following: store acquaintance voice data for acquaintances in an address book; store speaker voice data in association with content; recognize one or more speakers included in the speaker voice data based on the acquaintance voice data; and manage information on the recognized speakers in association with the content.

The means for solving the problems described above are merely exemplary and should not be construed as limiting the present invention. Additional embodiments beyond the exemplary embodiments above may be described in the drawings and the detailed description.

Any one of the above means for solving the problems can provide a content management method that automatically recognizes the people associated with content. The method uses voice data captured from people while the content is being created, viewed, or browsed on an electronic device. It can also provide a content management method for conveniently and efficiently tagging, searching for, and transmitting content with respect to people stored in the address book of the electronic device.

FIG. 1 is a block diagram of a system for providing a content management method through speaker recognition according to an embodiment of the present invention.
FIG. 2 is an example configuration diagram of a user device according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating an example of acquaintance voice data displayed on a user device according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating an example of content management using recognized speaker information according to an embodiment of the present invention.
FIG. 5 is a signal flow diagram illustrating a content management method performed by a user device and a speaker recognition server according to an embodiment of the present invention.
FIG. 6 is a flowchart illustrating a content management method performed by a user device according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings, so that those of ordinary skill in the art to which the present invention pertains can easily carry them out. However, the present invention may be implemented in various different forms and is not limited to the embodiments described herein. In the drawings, parts irrelevant to the description are omitted for clarity. Similar reference numerals are assigned to similar parts throughout the specification.

Throughout the specification, when a part is said to be "connected" to another part, this covers two cases: the part may be "directly connected", or it may be "electrically connected" with another element interposed between them. In addition, when a part "includes" a certain component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise. It should be understood that this does not preclude the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

In this specification, the term "unit" includes a unit implemented by hardware, a unit implemented by software, and a unit implemented using both. Further, one unit may be implemented using two or more pieces of hardware, and two or more units may be implemented by one piece of hardware.

In this specification, some of the operations or functions described as being performed by a terminal or device may instead be performed by a server connected to that terminal or device. Likewise, some of the operations or functions described as being performed by a server may be performed by a terminal or device connected to that server.

Hereinafter, an embodiment of the present invention is described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram of a system 1 for providing a content management method through speaker recognition according to an embodiment of the present invention. Referring to FIG. 1, the system 1 may include a user device 100 and a speaker recognition server 200. The user device 100 and the speaker recognition server 200 shown in FIG. 1 are examples of components that can be controlled by the system 1.

The components of the system 1 of FIG. 1 are generally connected through a network. For example, as shown in FIG. 1, the user device 100 may be connected to the speaker recognition server 200 through the network.

A network is a connection structure that allows information to be exchanged between nodes such as terminals and servers. Examples include, but are not limited to, Wi-Fi, the Internet, a local area network (LAN), a wireless LAN (WLAN), a wide area network (WAN), a personal area network (PAN), 3G, 4G, and Long-Term Evolution (LTE).

When the user makes a voice call with an acquaintance in the address book of the user device 100, the user device 100 may automatically store the acquaintance's voice as acquaintance voice data. Alternatively, the user device 100 may store the acquaintance's voice manually, that is, only when the user requests recording during the call. As a further alternative, the user device 100 may receive the acquaintance voice data and store it. The data may come from the acquaintance's device or from a server to which the acquaintance has uploaded it. In each case, the acquaintance voice data may be stored together with that acquaintance's information in the address book.
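There are three storage options above: automatic recording during a call, recording on the user's request, and receiving the data from the acquaintance's device or a server. One way to picture them is an address-book record that carries its own voice samples. The following is a minimal Python sketch under that assumption. `Contact`, `VoiceSample`, `CaptureSource`, `on_call_ended`, and `on_voice_received` are hypothetical names, and raw audio is treated as opaque bytes.

```python
from dataclasses import dataclass, field
from enum import Enum


class CaptureSource(Enum):
    """How an acquaintance's voice sample was obtained (hypothetical model)."""
    AUTO_CALL = "auto_call"            # recorded automatically during a voice call
    REQUESTED_CALL = "requested_call"  # recorded because the user asked to record
    RECEIVED = "received"              # sent from the acquaintance's device or a server


@dataclass
class VoiceSample:
    audio: bytes
    source: CaptureSource


@dataclass
class Contact:
    # Address-book entry with the acquaintance voice data stored alongside it.
    contact_id: str
    name: str
    phone: str
    voice_samples: list[VoiceSample] = field(default_factory=list)


def on_call_ended(contact: Contact, remote_audio: bytes,
                  auto_record: bool, user_requested: bool) -> bool:
    """Attach the other party's audio from a finished call to the contact.

    Returns True if a sample was stored. Nothing is stored when automatic
    recording is off and the user did not ask to record this call.
    """
    if auto_record:
        source = CaptureSource.AUTO_CALL
    elif user_requested:
        source = CaptureSource.REQUESTED_CALL
    else:
        return False
    contact.voice_samples.append(VoiceSample(remote_audio, source))
    return True


def on_voice_received(contact: Contact, audio: bytes) -> None:
    """Store a sample pushed from the acquaintance's device or an upload server."""
    contact.voice_samples.append(VoiceSample(audio, CaptureSource.RECEIVED))
```

Keeping the capture source on each sample is a design choice of this sketch, not of the source. It would let a device drop or re-weight samples by origin later.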

The user device 100 may generate content, for example by taking a photo or recording a video. It may also download content through the network so that its user can view or browse it. The voices of the people captured while content is created or viewed in this way may be stored in the user device 100 as speaker voice data.
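In the same spirit, the speaker voice data captured while content is created or viewed can be kept as a list of clips attached to the content record. This is a minimal sketch that assumes each clip is stored as opaque audio bytes with a time offset. The class names, field names, and the empty-capture policy are illustrative only.

```python
from dataclasses import dataclass, field
from enum import Enum


class Origin(Enum):
    CAPTURED = "captured"      # photo/video taken with the device camera
    DOWNLOADED = "downloaded"  # obtained over the network


@dataclass
class SpeakerClip:
    audio: bytes
    offset_s: float  # seconds from start of capture/playback (assumed field)


@dataclass
class ContentItem:
    # Hypothetical content record holding its linked speaker voice data.
    content_id: str
    kind: str          # e.g. "photo" or "video"
    origin: Origin
    clips: list[SpeakerClip] = field(default_factory=list)


def record_speech(item: ContentItem, audio: bytes, offset_s: float) -> None:
    """Link one microphone capture to the content being created or viewed."""
    if not audio:
        return  # ignore empty captures (an assumed policy, not from the source)
    item.clips.append(SpeakerClip(audio, offset_s))
```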

The user device 100 may recognize one or more speakers included in the speaker voice data itself. For example, it may compare the acquaintance voice data of the address book with the speaker voice data tagged to the content. Alternatively, the user device 100 may transmit the acquaintance voice data and the speaker voice data to the speaker recognition server 200 and receive speaker recognition information back from it. The recognized speaker information can then be linked to the content and used to manage it.
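The patent does not say how acquaintance voice data and speaker voice data are compared. The sketch below uses one common approach purely as an assumption:

- a speaker-embedding model (unspecified) maps each voice sample to a fixed-length vector;
- an acquaintance's vectors are averaged into a single voiceprint;
- each clip's best cosine-similarity match is accepted only at or above a threshold.

The toy vectors in the test stand in for real embeddings. The 0.8 threshold is an arbitrary illustrative value, and all function names are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def enroll(samples: list[list[float]]) -> list[float]:
    """Average several embeddings of one acquaintance into a single voiceprint."""
    n = len(samples)
    return [sum(col) / n for col in zip(*samples)]


def recognize_speakers(clip_embeddings: list[list[float]],
                       voiceprints: dict[str, list[float]],
                       threshold: float = 0.8) -> set[str]:
    """Return the acquaintances heard in the clips.

    Each clip is matched to its best-scoring voiceprint, and the match is kept
    only if the score reaches `threshold`. Speakers absent from the address
    book therefore stay unrecognized instead of being forced onto the nearest
    acquaintance.
    """
    recognized: set[str] = set()
    for emb in clip_embeddings:
        best_name, best_score = None, -1.0
        for name, voiceprint in voiceprints.items():
            score = cosine(emb, voiceprint)
            if score > best_score:
                best_name, best_score = name, score
        if best_name is not None and best_score >= threshold:
            recognized.add(best_name)
    return recognized
```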

When managing the recognized speaker information in association with the content, the user device 100 may, for example, transmit the content to the recognized speakers. It may also convert the recognized speakers' voice data into text and store that text in association with the content.
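Transmitting content to recognized speakers needs two things: a lookup from each recognized name to that contact's address, and a switch between automatic and user-confirmed sending. The sketch below works under those assumptions. `share_with_speakers`, `Recipient`, and the `send` callback are hypothetical placeholders for whatever messaging channel a device actually uses.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Recipient:
    name: str
    phone: str


def share_with_speakers(content_id: str,
                        recognized: set[str],
                        address_book: dict[str, str],
                        send: Callable[[str, str], None],
                        auto_send: bool) -> list[Recipient]:
    """Send (or only propose) content for each recognized speaker.

    `address_book` maps a contact name to a phone number.
    `send(phone, content_id)` is a hypothetical delivery callback.
    With auto_send=False nothing is sent, and the candidates are returned
    so the user can confirm them manually.
    """
    recipients = [Recipient(n, address_book[n])
                  for n in sorted(recognized) if n in address_book]
    if auto_send:
        for r in recipients:
            send(r.phone, content_id)
    return recipients
```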

The user device 100 may be a wireless communication device 101 that ensures portability and mobility. Examples include PCS (Personal Communication System), GSM (Global System for Mobile communications), PDC (Personal Digital Cellular), PHS (Personal Handyphone System), PDA (Personal Digital Assistant), IMT (International Mobile Telecommunication)-2000, CDMA (Code Division Multiple Access)-2000, W-CDMA (Wideband Code Division Multiple Access), and WiBro (Wireless Broadband Internet) terminals, as well as smartphones. The user device 100 may also be any kind of handheld wireless communication device 102, such as a smart pad or a tablet PC.

The operation of the user device 100 is described below with reference to FIG. 2.

The speaker recognition server 200 receives two kinds of data from the user device 100 through the network: the acquaintance voice data of the address book, and the speaker voice data associated with content. When it receives acquaintance voice data, it stores that data. When it receives speaker voice data, it may recognize one or more speakers included in that data based on the previously stored acquaintance voice data. If the user device 100 performs the speaker recognition process by itself, the speaker recognition server 200 may be omitted.
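The server's store-then-recognize behavior can be modeled as two operations. This sketch adds one assumption the text does not state: each device's acquaintance voiceprints are kept separate, so speaker voice data from one device is matched only against that device's address book. The similarity function is injected so that any scoring method can be plugged in. The class and method names are hypothetical.

```python
from typing import Callable

Embedding = list[float]
Scorer = Callable[[Embedding, Embedding], float]


class SpeakerRecognitionServer:
    """In-memory sketch of the server role (hypothetical API)."""

    def __init__(self, score: Scorer, threshold: float) -> None:
        self._score = score
        self._threshold = threshold
        # device_id -> {contact name -> voiceprint}
        self._enrolled: dict[str, dict[str, Embedding]] = {}

    def enroll(self, device_id: str, contact: str, voiceprint: Embedding) -> None:
        # Acquaintance voice data received: simply store it for that device.
        self._enrolled.setdefault(device_id, {})[contact] = voiceprint

    def recognize(self, device_id: str, clips: list[Embedding]) -> list[str]:
        # Speaker voice data received: match each clip against that device's
        # stored acquaintances and report each recognized name once.
        book = self._enrolled.get(device_id, {})
        found: list[str] = []
        for clip in clips:
            scored = [(self._score(clip, vp), name) for name, vp in book.items()]
            if not scored:
                continue
            best_score, best_name = max(scored)
            if best_score >= self._threshold and best_name not in found:
                found.append(best_name)
        return found
```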

The speaker recognition server 200 transmits the information on the recognized speakers to the user device 100.

FIG. 2 is an example configuration diagram of the user device 100 according to an embodiment of the present invention. Referring to FIG. 2, the user device 100 may include a memory 110, a processing unit 120, a camera 130, and a microphone 140.

However, the user device 100 shown in FIG. 2 is only one implementation example. Those of ordinary skill in the art will understand that it can be modified in various ways based on the components shown in FIG. 2. For example, the components, and the functions provided within them, may be combined into fewer components or further divided into additional components.

The memory 110 is an area in which content, acquaintance voice data, and speaker voice data are stored. The content may be a photo or video that the user device 100 downloaded through the network, or that its user created with the camera 130 or the like. The acquaintance voice data is, for example, data generated automatically or manually by the processing unit 120 and the microphone 140 when the user of the user device 100 makes a voice call with acquaintances in the address book. The speaker voice data is the voice of the speakers captured while the user of the user device 100 creates or uses the content. The processing unit 120 and the microphone 140 convert that voice into data, automatically or manually.

The processing unit 120 converts the acquaintance voices and speaker voices input through the microphone 140 into acquaintance voice data and speaker voice data, respectively. The processing unit 120 also executes the user's command (download or creation) when the user downloads or creates content. In some cases the user device 100 itself recognizes one or more speakers included in the speaker voice data based on the acquaintance voice data. In that case, the processing unit 120 performs this function by, for example, comparing the acquaintance voice data stored in the memory 110 with the speaker voice data.

The camera 130 enables the user of the user device 100 to create content by taking photos or recording videos. The content generated in this way may be stored in the memory 110 or processed by the processing unit 120.

The microphone 140 receives a speaker's voice when the user of the user device 100 makes a call with acquaintances in the address book, or when the user creates or uses content.

FIG. 3 is a diagram illustrating an example of acquaintance voice data displayed on the user device 100 according to an embodiment of the present invention. When the user of the user device 100 makes a voice call with an acquaintance in its address book, the acquaintance's voice is stored, automatically or manually, as acquaintance voice data. At this time, information 320 related to the acquaintance voice data may be stored together with address-book information such as the acquaintance's name 310.

FIG. 4 is a diagram illustrating an example of content management using recognized speaker information according to an embodiment of the present invention. Suppose the user of the user device 100 has a photo 410 of five people 401, 402, 403, 404, and 405. Among them, people 403 and 405 are acquaintances stored in the address book as A and B, respectively. What they said when the photo was taken may then be recorded as a voice tag 420, and their recognition information may be recorded as a speaker tag 430. In addition, a setting 440 that controls whether the photo is automatically transmitted to the recognized speakers may be displayed as one of the content management options. The recorded voice tag 420 and speaker tag 430 can later be used when searching for content.
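The tags in FIG. 4 map naturally onto a small record per photo. It holds a voice tag for what was said, a speaker tag for who was recognized, and the auto-send setting. The sketch below uses hypothetical names. Only recognized acquaintances (403 and 405, stored as A and B) would be added, so the other people in the photo are not tagged. The utterances in the test are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TaggedPhoto:
    # Hypothetical record mirroring FIG. 4; numbers refer to the figure.
    photo_id: str
    voice_tags: list[str] = field(default_factory=list)    # cf. voice tag 420
    speaker_tags: list[str] = field(default_factory=list)  # cf. speaker tag 430
    auto_send: bool = False                                 # cf. setting 440

    def add_recognition(self, speaker: str, utterance: str) -> None:
        """Record what a recognized acquaintance said and who said it."""
        self.voice_tags.append(f"{speaker}: {utterance}")
        if speaker not in self.speaker_tags:
            self.speaker_tags.append(speaker)
```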

FIG. 5 is a signal flow diagram illustrating a content management method performed by the user device 100 and the speaker recognition server 200 according to an embodiment of the present invention. Referring to FIG. 5, step S510 applies whenever the user of the user device 100 makes a phone call with acquaintances in the address book. The user device 100 converts the voices of those acquaintances, who are the other parties to the call, into data and stores it. For example, when the user calls acquaintance A stored in the address book, the user device 100 converts A's voice into data, automatically or manually depending on the configured setting, and stores A's voice data in the user device 100.

In step S520, the user device 100 transmits the stored acquaintance voice data to the speaker recognition server 200.

In step S530, the user device 100 converts into data the voices of the speaker(s) captured while its user creates new content or uses existing content. It stores that data in association with the content.

As an example of creating content, suppose the user of the user device 100 photographs three people (speakers A, B, and C) gathered together, and those three subjects speak. The voices of speakers A, B, and C may all be converted into data and stored, and that speaker voice data may be stored in association with the photo.

As an example of using existing content created by someone else, suppose the user watches a downloaded video together with two people (speakers D and E), and those two people speak while watching. The voices of speakers D and E may be converted into data and stored, and that speaker voice data may be stored in association with the video content.

In step S540, the user device 100 transmits the stored speaker voice data to the speaker recognition server 200.
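Steps S520 and S540 only state that voice data is transmitted to the server. The patent names no protocol, but suppose, as an assumption, that the transport is JSON. Binary audio would then need a text encoding such as Base64. The recognition request would also carry a content identifier, so that the result returned in S560 can be linked back to the right content in S570. All field and function names below are hypothetical.

```python
import base64
import json


def enrollment_request(device_id: str, contact_id: str, audio: bytes) -> str:
    """Hypothetical body for step S520: one acquaintance's voice data."""
    return json.dumps({
        "type": "enroll",
        "device_id": device_id,
        "contact_id": contact_id,
        "audio_b64": base64.b64encode(audio).decode("ascii"),
    })


def recognition_request(device_id: str, content_id: str, clips: list[bytes]) -> str:
    """Hypothetical body for step S540: speaker voice data linked to one content item."""
    return json.dumps({
        "type": "recognize",
        "device_id": device_id,
        "content_id": content_id,  # lets the S560 reply be tied back to the content
        "clips_b64": [base64.b64encode(c).decode("ascii") for c in clips],
    })


def decode_clips(body: str) -> list[bytes]:
    """Server-side inverse: recover the raw clips from a recognition request."""
    msg = json.loads(body)
    return [base64.b64decode(s) for s in msg["clips_b64"]]
```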

In step S550, the speaker recognition server 200 recognizes one or more speakers in the speaker voice data received from the user device 100 in step S540. It does so based on the acquaintance voice data received from the user device 100 in step S520.

For example, suppose the user device 100 has transmitted the acquaintance voice data of A, B, and C from its address book to the speaker recognition server 200. Suppose further that A, C, and D speak while the user takes photo 1, so that speaker voice data for speakers A, C, and D is delivered to the server. The speaker recognition server 200 can then recognize speakers A and C using the acquaintance voice data and the speaker voice data. Speaker D is not recognized, because no acquaintance voice data for D was provided.

In step S560, the speaker recognition server 200 transmits the recognized speaker information to the user device 100.

In step S570, the user device 100 takes the recognized speaker information received from the speaker recognition server 200 and manages it in association with the content from step S530. For example, the user device 100 may transmit that content to the recognized speakers, either automatically or according to a manual setting. As another example, the user device 100 may convert the recognized speakers' voice data into text, separate the text by speaker, and store it in association with the content.
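The second option in step S570, separating the converted text by speaker, amounts to grouping transcripts under the speaker label assigned to each clip. This sketch assumes the recognizer labels each clip with a speaker name, or with `None` when no acquaintance matched. `transcribe` is a stand-in for an unspecified speech-to-text engine, and `transcripts_by_speaker` is a hypothetical name.

```python
from collections import defaultdict
from typing import Callable, Optional


def transcripts_by_speaker(
    labeled_clips: list[tuple[Optional[str], bytes]],
    transcribe: Callable[[bytes], str],
) -> dict[str, list[str]]:
    """Group transcribed utterances by recognized speaker, keeping their order.

    Clips labeled None (no acquaintance matched) are skipped.
    `transcribe` is a placeholder for an unspecified speech-to-text engine.
    """
    grouped: dict[str, list[str]] = defaultdict(list)
    for speaker, audio in labeled_clips:
        if speaker is None:
            continue
        grouped[speaker].append(transcribe(audio))
    return dict(grouped)
```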

Thereafter, when the user device 100 receives a search request that includes speaker information from its user, it may search for and display the content corresponding to that speaker information.
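A search request that includes speaker information can be served by filtering content on its stored speaker tags. The search can optionally be narrowed by a keyword in the stored transcript. This is a minimal linear-scan sketch with hypothetical names. A larger content library would more likely keep an index from speaker to content.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    # Hypothetical stored content with its recognized speakers and transcript.
    content_id: str
    speakers: set[str] = field(default_factory=set)
    transcript: str = ""


def search(items: list[Item], speaker: str, keyword: str = "") -> list[str]:
    """Return ids of content whose recognized speakers include `speaker`.

    If `keyword` is given, the stored transcript must also contain it
    (case-insensitive). An empty keyword matches every transcript.
    """
    kw = keyword.casefold()
    return [it.content_id for it in items
            if speaker in it.speakers and kw in it.transcript.casefold()]
```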

In the above description, steps S510 to S570 may be further divided into additional steps or combined into fewer steps, depending on the implementation of the present invention. Some steps may also be omitted as needed, and the order of the steps may be changed.

FIG. 6 is a flowchart illustrating a content management method performed by the user device 100 according to an embodiment of the present invention. The method shown in FIG. 6 includes steps that the user device 100 described with reference to FIGS. 1 and 2 processes in time sequence. Therefore, the description of the user device 100 given with reference to FIGS. 1 and 2 also applies to FIG. 6, even where it is omitted below.

In step S610, whenever the user of the user device 100 makes ordinary phone calls with acquaintances in the address book, the user device 100 converts the voices of those acquaintances (the other parties to the calls) into data and stores them. For example, when the user of the user device 100 calls acquaintance A, who is stored in the address book, the user device 100 converts acquaintance A's voice into data according to an automatic or manual setting and stores acquaintance A's voice data in the user device 100.
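
A minimal Python sketch of step S610, assuming each acquaintance's voice is reduced to a fixed-length feature vector by some front end that the source does not describe. `AcquaintanceVoiceStore`, `on_call`, and the vector values are hypothetical. The sketch shows only that voice data is stored per address-book entry, and only when the automatic setting is on or the user approves manually.

```python
from dataclasses import dataclass, field


@dataclass
class AcquaintanceVoiceStore:
    """Hypothetical store that binds voice data to address-book entries (step S610)."""
    address_book: dict[str, str]  # contact_id -> display name
    voice_data: dict[str, list[list[float]]] = field(default_factory=dict)

    def on_call(self, contact_id: str, voice_features: list[float],
                auto_record: bool, user_approved: bool = False) -> bool:
        """Store the call partner's voice features if the partner is in the
        address book and recording is enabled automatically or approved manually.
        Returns True if the features were stored."""
        if contact_id not in self.address_book:
            return False  # only acquaintances in the address book are enrolled
        if not (auto_record or user_approved):
            return False
        self.voice_data.setdefault(contact_id, []).append(voice_features)
        return True


store = AcquaintanceVoiceStore(address_book={"A": "Acquaintance A"})
store.on_call("A", [0.9, 0.1, 0.0], auto_record=True)  # stored
store.on_call("X", [0.2, 0.2, 0.6], auto_record=True)  # not in address book
print(store.voice_data)  # {'A': [[0.9, 0.1, 0.0]]}
```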

In step S620, when the user of the user device 100 creates content or uses existing content, the user device 100 converts the voice(s) of the input speaker(s) into data and stores the data in association with that content. As an example of creating content: when the user of the user device 100 uses the user device 100 to take a picture of three people gathered together (speaker A, speaker B, and speaker C) and those three people speak, the voices of speakers A, B, and C are converted into data and stored, and this speaker voice data can be stored in association with the picture. As an example of using existing content created by someone else: when the user of the user device 100 watches a video downloaded over a network together with two people (speaker D and speaker E) and those two people speak while watching, the voices of speakers D and E are converted into data and stored, and this speaker voice data can be stored in association with the video content.
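
The association in step S620 can be sketched as below. The source does not say how captured voice is matched to a particular piece of content. This sketch assumes, hypothetically, that segments whose start time falls within a fixed window around the capture time belong to that content. For the video example, one would instead keep the segments recorded during playback. All names and values are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class VoiceSegment:
    start_s: float          # segment start, seconds on the device clock
    features: list[float]   # hypothetical per-segment voice features


@dataclass
class ContentRecord:
    content_id: str
    created_at_s: float
    speaker_voice_data: list[VoiceSegment] = field(default_factory=list)


def attach_speaker_voice(record: ContentRecord, segments: list[VoiceSegment],
                         window_s: float = 5.0) -> ContentRecord:
    """Store the voice segments that start within +/- window_s of the content's
    creation time together with that content (step S620)."""
    for seg in segments:
        if abs(seg.start_s - record.created_at_s) <= window_s:
            record.speaker_voice_data.append(seg)
    return record


photo = ContentRecord("photo-1", created_at_s=100.0)
mic = [VoiceSegment(97.5, [0.9, 0.1]), VoiceSegment(101.0, [0.1, 0.9]),
       VoiceSegment(130.0, [0.5, 0.5])]
attach_speaker_voice(photo, mic)
print(len(photo.speaker_voice_data))  # 2
```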

In step S630, the user device 100 recognizes one or more speakers included in the speaker voice data based on the acquaintance voice data. For example, suppose acquaintance voice data exists for acquaintances A, B, and C in the address book of the user device 100, and that A, C, and D speak while the user of the user device 100 takes picture 1, so that speaker voice data for speakers A, C, and D is generated. The user device 100 can then use the acquaintance voice data and the speaker voice data to recognize speakers A and C. Speaker D, for whom no acquaintance voice data exists, is not recognized.
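
The source does not specify how acquaintance voice data and speaker voice data are compared. One common approach, used here as an assumption, is to compare speaker embeddings by cosine similarity and accept the best match at or above a threshold. The toy vectors and the 0.8 threshold are hypothetical; they reproduce the example above: A, B, and C are enrolled, A, C, and D speak, and only A and C are recognized.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recognize_speakers(acquaintances: dict[str, list[float]],
                       segments: list[list[float]],
                       threshold: float = 0.8) -> list[str]:
    """For each speaker segment, pick the enrolled acquaintance with the highest
    similarity at or above `threshold`; segments with no such match (speakers
    not in the address book) are skipped. Returns recognized names in order."""
    recognized: list[str] = []
    for seg in segments:
        best_name, best_score = None, threshold
        for name, enrolled in acquaintances.items():
            score = cosine(seg, enrolled)
            if score >= best_score:
                best_name, best_score = name, score
        if best_name is not None and best_name not in recognized:
            recognized.append(best_name)
    return recognized


# Toy enrolled vectors for acquaintances A, B, C (from step S610).
acquaintances = {"A": [1.0, 0.0, 0.0], "B": [0.0, 1.0, 0.0], "C": [0.0, 0.0, 1.0]}
# Toy segments captured with picture 1: A, C, and an unknown speaker D.
segments = [[0.95, 0.05, 0.0], [0.0, 0.1, 0.9], [0.5, 0.5, 0.5]]
print(recognize_speakers(acquaintances, segments))  # ['A', 'C']
```

The threshold is the design lever here. Raising it reduces false matches against unknown speakers such as D, at the cost of missing acquaintances recorded in noisy conditions.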

In step S640, the user device 100 manages the recognized speaker information obtained in step S630 in association with the content from step S620. For example, the user device 100 may transmit that content to the recognized speaker, either automatically or according to a manual setting. As another example, the user device 100 may convert the recognized speaker's voice data into text, separate the converted text by speaker, and store it in association with that content.
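
The second example in step S640 (converting voice to text and separating the text by speaker) can be sketched as below. Speech-to-text is assumed to have already run and produced (speaker, text) pairs, since the source names no recognizer. `Content` and `attach_transcripts` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Content:
    """Hypothetical content record (e.g. a photo or video) with per-speaker text."""
    content_id: str
    transcripts: dict[str, list[str]] = field(default_factory=dict)


def attach_transcripts(content: Content, segments: list[tuple[str, str]]) -> Content:
    """Group already-transcribed (speaker, text) segments by speaker and store them
    in association with the content. The speech-to-text step itself is not shown."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for speaker, text in segments:
        grouped[speaker].append(text)
    # Merge with any transcripts already stored for this content.
    for speaker, texts in grouped.items():
        content.transcripts.setdefault(speaker, []).extend(texts)
    return content


photo = Content("photo-1")
attach_transcripts(photo, [("A", "Smile!"), ("C", "Say cheese"), ("A", "One more")])
print(photo.transcripts)  # {'A': ['Smile!', 'One more'], 'C': ['Say cheese']}
```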

In the above description, steps S610 to S640 may, depending on the implementation of the present invention, be further divided into additional steps or combined into fewer steps. In addition, some steps may be omitted as necessary, and the order of the steps may be changed.

The method for managing content through speaker recognition performed by the user device 100 described with reference to FIG. 6 may also be implemented in the form of a recording medium containing computer-executable instructions, such as program modules executed by a computer. A computer-readable medium may be any available medium that can be accessed by a computer, and includes volatile and nonvolatile media as well as removable and non-removable media. The computer-readable medium may also include computer storage media. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data.

Meanwhile, the method for managing content through speaker recognition performed by the user device described with reference to FIG. 6 may also be performed through a predetermined application.

The foregoing description of the present invention is for illustration only, and those of ordinary skill in the art to which the present invention pertains will understand that it can easily be modified into other specific forms without changing the technical spirit or essential features of the present invention. Therefore, the embodiments described above should be understood as illustrative in all respects and not limiting. For example, each component described as a single unit may be implemented in a distributed manner, and likewise, components described as distributed may be implemented in a combined form.

The scope of the present invention is defined by the claims below rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be interpreted as falling within the scope of the present invention.

100: user device
200: server
210: memory
220: processing unit

Claims

A method for managing content through speaker recognition, performed by a user device, the method comprising:
storing acquaintance voice data for acquaintances in an address book in combination with acquaintance information in the address book;
storing speaker voice data in association with content;
comparing the acquaintance voice data stored in combination with the acquaintance information with the speaker voice data to recognize one or more speakers, included in the speaker voice data, that correspond to the acquaintance information; and
managing information on the recognized speaker in association with the content,
wherein managing the information on the recognized speaker in association with the content comprises:
recording the speaker voice data as a voice tag;
recording the information on the recognized speaker as a speaker tag; and
searching for and displaying the content based on the voice tag and the speaker tag,
the content management method further comprising:
receiving a search request including information on the speaker; and
searching for content corresponding to the information on the speaker and displaying the content together with a setting indicating whether or not to transmit the content.

The method of claim 1,
wherein the acquaintance voice data is recorded and generated during a call with an acquaintance in the address book.

The method of claim 1,
wherein the speaker voice data is recorded and generated when a picture is taken.

The method of claim 1, further comprising:
transmitting the content to the recognized speaker.

The method of claim 1, further comprising:
converting the speaker voice data into text; and
dividing the converted text by speaker and storing it in association with the content.

A method for managing content through speaker recognition, performed by a user device, the method comprising:
combining acquaintance voice data for acquaintances in an address book with acquaintance information in the address book and transmitting it to a speaker recognition server;
storing speaker voice data in association with content;
transmitting the speaker voice data to the speaker recognition server;
receiving, from the speaker recognition server, a result of recognizing one or more speakers, included in the speaker voice data, that correspond to the acquaintance information, the result being obtained by comparing the acquaintance voice data stored in combination with the acquaintance information with the speaker voice data; and
managing the recognition result in association with the content,
wherein managing the recognition result in association with the content comprises:
recording the speaker voice data as a voice tag;
recording the information on the recognized speaker as a speaker tag; and
searching for and displaying the content based on the voice tag and the speaker tag,
the content management method further comprising:
receiving a search request including information on the speaker; and
searching for content corresponding to the information on the speaker and displaying the content together with a setting indicating whether or not to transmit the content.

The method of claim 6,
wherein the acquaintance voice data is recorded and generated during a call with an acquaintance in the address book.

The method of claim 6,
wherein the speaker voice data is recorded and generated when a picture is taken.

The method of claim 6, further comprising:
transmitting the content to the recognized speaker.

The method of claim 6, further comprising:
converting the speaker voice data into text; and
dividing the converted text by speaker and storing it in association with the content.

A method for managing content through speaker recognition, the method comprising:
receiving, from a mobile terminal, acquaintance voice data for acquaintances in an address book combined with acquaintance information in the address book;
receiving speaker voice data from the mobile terminal;
comparing the acquaintance voice data stored in combination with the acquaintance information with the speaker voice data to recognize one or more speakers, included in the speaker voice data, that correspond to the acquaintance information; and
transmitting information on the recognized speaker to the mobile terminal,
wherein the speaker voice data is recorded as a voice tag,
the information on the recognized speaker is recorded as a speaker tag, and
the content is searched for and displayed based on the voice tag and the speaker tag,
the content management method further comprising:
receiving a search request including information on the speaker; and
searching for content corresponding to the information on the speaker and displaying the content together with a setting indicating whether or not to transmit the content.

The method of claim 11, further comprising:
converting the speaker voice data into text; and
dividing the converted text by speaker and transmitting the divided text to the mobile terminal.

A computing device for running a content management application, the computing device comprising:
a memory; and
a processing unit arranged to interface with the memory,
wherein the processing unit is configured to:
store acquaintance voice data for acquaintances in an address book in combination with acquaintance information in the address book;
store speaker voice data in association with content;
recognize one or more speakers, included in the speaker voice data, that correspond to the acquaintance information by comparing the acquaintance voice data stored in combination with the acquaintance information with the speaker voice data;
manage information on the recognized speaker in association with the content;
record the speaker voice data as a voice tag;
record the information on the recognized speaker as a speaker tag;
search for and display the content based on the voice tag and the speaker tag;
receive a search request including information on the speaker; and
search for content corresponding to the information on the speaker and display the content together with a setting indicating whether or not to transmit the content.

A computer-readable medium containing a sequence of instructions for managing content in an application,
wherein the sequence of instructions, when executed by a computing device, causes the computing device to:
store acquaintance voice data for acquaintances in an address book in combination with acquaintance information in the address book;
store speaker voice data in association with content;
recognize one or more speakers, included in the speaker voice data, that correspond to the acquaintance information by comparing the acquaintance voice data stored in combination with the acquaintance information with the speaker voice data;
manage information on the recognized speaker in association with the content;
record the speaker voice data as a voice tag;
record the information on the recognized speaker as a speaker tag;
search for and display the content based on the voice tag and the speaker tag;
receive a search request including information on the speaker; and
search for content corresponding to the information on the speaker and display the content together with a setting indicating whether or not to transmit the content.

The computer-readable medium of claim 14,
further comprising a sequence of instructions that, when executed by the computing device, cause the computing device to record the acquaintance voice data during a call with the acquaintance in the address book.

The computer-readable medium of claim 15,
further comprising a sequence of instructions that, when executed by the computing device, cause the computing device to record the speaker voice data when a picture is taken.

The computer-readable medium of claim 14,
further comprising a sequence of instructions that, when executed by the computing device, cause the computing device to transmit the content to the recognized speaker.

The computer-readable medium of claim 17,
further comprising a sequence of instructions that, when executed by the computing device, cause the computing device to convert the speaker voice data into text, divide the converted text by speaker, and store the divided text in association with the content.