KR20140086853A - Apparatus and Method Managing Contents Based on Speaker Using Voice Data Analysis - Google Patents

Apparatus and Method Managing Contents Based on Speaker Using Voice Data Analysis

Info

Publication number
KR20140086853A
Authority
KR
South Korea
Prior art keywords
speaker
voice
voice signal
extracted
characteristic information
Prior art date
Application number
KR1020130161071A
Other languages
Korean (ko)
Inventor
장세진
신사임
Original Assignee
전자부품연구원 (Korea Electronics Technology Institute)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 전자부품연구원 (Korea Electronics Technology Institute)
Publication of KR20140086853A

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 — Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/40 — Information retrieval of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F 16/43 — Querying
    • G06F 16/432 — Query formulation
    • G06F 16/433 — Query formulation using audio data
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06Q — INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 50/00 — Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q 50/10 — Services
    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 — Speech recognition
    • G10L 15/02 — Feature extraction for speech recognition; Selection of recognition unit

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Economics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a device and method for managing content based on a speaker using voice data analysis. The device, according to the present invention, extracts voice feature information by analyzing voice data taken from content, more specifically a video, for the purpose of automatic content classification and user-based listing, and lists the videos on the basis of similarity between pieces of the extracted voice feature information so that they can be grouped and managed. According to the present invention, content can be managed on the basis of meaning, improving user satisfaction. In particular, since content can be managed on the basis of voice data, pieces of content featuring the same speaker can be grouped together and managed, which is convenient for the user.

Description

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a speaker-based content management apparatus and method through voice data analysis, and more particularly, to an apparatus and method for managing content in a user's personal device.

Generally, a mobile communication terminal provides, in addition to its basic phone call function, a short message transmission/reception function and an address book management function, as well as wireless Internet access. Furthermore, recently manufactured and sold mobile communication terminals include a digital camera, a function for photographing a desired subject or recording it as a moving picture, and a function for playing MP3 sound source files.

As mobile communication terminals have gained various functions, various data (for example, sent and received text messages, photo images, and moving pictures) are generated on the terminal itself, and various other data (for example, MP3 sound source data, moving pictures, and contents for executing games) are required to perform those functions. Because such data is constantly generated or required, the mobile communication terminal should be able to manage the corresponding data (contents) smoothly.

Meanwhile, in the related art, the task of managing and listing contents on a personalized commercial device that utilizes multimedia, i.e., a mobile communication terminal, is based only on file names or file creation times.

In recent years, with the expansion of convergence technology, various electronic devices provide multimedia-related functions, and their use has become increasingly personalized. That is, mobile communication terminals such as mobile phones and personal digital assistants (PDAs) have become a necessity of life for modern people and complex devices capable of performing various functions.

Accordingly, the amount of multimedia content directly generated and managed by users has increased, and a new content management method has become necessary.

The present invention has been made in view of the above needs, and it is an object of the present invention to provide a speaker-based content management apparatus and method through voice data analysis that extract voice characteristic information by analyzing voice data taken from contents, particularly moving pictures, for automatic classification of contents and user-based listing, and that list and group the moving pictures based on the similarity of the extracted voice characteristic information.

According to an aspect of the present invention, there is provided a speaker-based content management apparatus through voice data analysis, comprising: a voice signal extracting unit for extracting a voice signal for performing speaker recognition from contents input to a user device; a voice characteristic information extracting unit for extracting voice characteristic information necessary for speaker analysis from the voice signal extracted by the voice signal extracting unit; a similarity measuring unit for measuring the similarity of the voice characteristic information extracted by the voice characteristic information extracting unit according to a speaker recognition algorithm, based on a stored speaker model reference pattern; and a management unit for sorting and grouping the contents of the user device based on the measured similarity to construct a speaker-based content list and group information.

According to another aspect of the present invention, there is provided a speaker-based content management method through voice data analysis, comprising: extracting a voice signal for performing speaker recognition from video content input to a user device; extracting voice characteristic information for speaker analysis from the extracted voice signal; measuring the similarity of the extracted voice characteristic information according to a speaker recognition algorithm, based on a stored speaker model reference pattern; and sorting and grouping the contents of the user device based on the measured similarity to construct a speaker-based content list and group information.
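The two aspects above describe the same four-step pipeline: signal extraction, feature extraction, similarity measurement, and grouping. A minimal sketch of that pipeline follows; all function names, the toy two-dimensional feature, the inverse-distance similarity, and the 0.9 threshold are illustrative assumptions, since the patent does not fix concrete algorithms.

```python
# Hypothetical end-to-end sketch of the four claimed steps.
import math

def extract_voice_signal(audio_samples, start=0, length=16000):
    """S200: take a fixed-length excerpt of the content's audio track."""
    return audio_samples[start:start + length]

def extract_features(signal):
    """S201: toy (mean, energy) features standing in for real speaker
    features such as MFCCs."""
    n = max(len(signal), 1)
    return (sum(signal) / n, sum(x * x for x in signal) / n)

def similarity(f1, f2):
    """S202: inverse-distance similarity in (0, 1]."""
    return 1.0 / (1.0 + math.dist(f1, f2))

def group_contents(features_by_id, threshold=0.9):
    """S203: greedily group contents whose features are similar enough,
    producing the speaker-based group information."""
    groups = []
    for content_id, feat in features_by_id.items():
        for group in groups:
            if similarity(feat, group["centroid"]) >= threshold:
                group["members"].append(content_id)
                break
        else:
            groups.append({"centroid": feat, "members": [content_id]})
    return [group["members"] for group in groups]
```

With this sketch, two clips whose features nearly coincide fall into one group while a dissimilar clip forms its own, mirroring the claimed grouping by speaker.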

According to the present invention, contents can be managed based on the meaning of the contents, thereby improving the user's satisfaction.

In particular, since contents can be managed on the basis of voice data, it is convenient to group and manage contents in which the same speaker appears.

FIG. 1 is a view for explaining a speaker-based content management apparatus through voice data analysis according to an embodiment of the present invention;
FIG. 2 is a diagram for explaining a speaker-based content management method through voice data analysis according to an embodiment of the present invention.

The advantages and features of the present invention, and the manner of achieving them, will become apparent with reference to the embodiments described in detail below with reference to the accompanying drawings. The present invention may, however, be embodied in many different forms and should not be construed as being limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete and will fully convey the scope of the invention to those skilled in the art; the invention is defined by the claims. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the invention. In this specification, the singular form includes the plural form unless otherwise specified. The terms "comprises" or "comprising," as used herein, do not exclude the presence or addition of one or more other components, steps, or operations.

The present invention can provide a content list in a manner different from conventional ones in the content-providing environment of various personalized devices (PCs, digital cameras, camcorders, mobile phones, tablets, etc.) used to create and utilize video contents. By moving away from listing and managing contents based on file order or file creation date, this personalized content management method can improve the user's satisfaction with both the device and its contents.

For example, the present invention relates to an apparatus and method for quickly extracting semantic information from moving picture contents and managing the contents accordingly, and proposes a novel paradigm for data management on personal devices. Speaker recognition through voice data analysis (an artificial intelligence technology) offers high accuracy and portability compared with other voice-data-based technologies; through it, meaningful information about contents can be extracted without a complicated metadata annotation process, and contents can be classified based on the extracted information.

Hereinafter, a speaker-based content management apparatus through voice data analysis according to an embodiment of the present invention will be described with reference to FIG. 1, which is a diagram for explaining that apparatus.

Referring to FIG. 1, a speaker-based content management apparatus 100 through voice data analysis according to the present invention includes a voice signal extracting unit 110, a voice characteristic information extracting unit 120, a similarity measuring unit 130, and a management unit 140.

When content (hereinafter referred to as a moving picture) is input to the user device, the voice signal extracting unit 110 extracts from it a voice signal for performing speaker recognition.

Here, speaker recognition is a technique that searches a database, in which voice data corresponding to speaker information is stored in advance, for a match to the input voice data, and, when matching voice data is found, identifies who the speaker is based on the corresponding speaker information.
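The database lookup just described can be illustrated as follows. The helper name, the Euclidean distance measure, and the `max_distance` rejection threshold are assumptions for illustration, not details from the patent.

```python
# Nearest-neighbor speaker identification against a pre-stored database
# mapping registered speaker names to reference feature vectors.
def identify_speaker(input_features, speaker_db, max_distance=1.0):
    best_name, best_dist = None, float("inf")
    for name, ref in speaker_db.items():
        # Euclidean distance between input and reference features.
        d = sum((a - b) ** 2 for a, b in zip(input_features, ref)) ** 0.5
        if d < best_dist:
            best_name, best_dist = name, d
    # Reject the match if even the closest reference is too far away.
    return best_name if best_dist <= max_distance else None
```

An unmatched input returns `None`, modeling the case where no stored voice data corresponds to the input.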

The voice signal extracting unit 110 may determine the length of the voice signal for performing speaker recognition within a range that ensures accuracy in speaker recognition and enables real-time interworking with the system to which the present invention is applied.

In addition, the voice signal extracting unit 110 can determine the extraction position of the voice signal within the input moving picture according to the target of the speaker recognition application, that is, the purpose and policy of the service or system to which the present invention is applied.

For example, the voice signal extracting unit 110 may extract the voice signal from the beginning of the moving picture, or may detect a predetermined highlight portion of the moving picture and extract the voice signal from that highlight portion; that is, it can extract the voice signal from various extraction positions set within the input moving picture.
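The policy-driven choice of extraction window described above might be sketched as follows; the policy names ("beginning", "highlight") and the fixed window length are assumptions, since the patent leaves the concrete policy to the application target.

```python
# Choose the [start, end) sample range of the voice signal to extract,
# according to an application-defined extraction policy.
def choose_window(num_samples, policy="beginning", window=16000,
                  highlight_start=None):
    if policy == "beginning":
        start = 0
    elif policy == "highlight" and highlight_start is not None:
        start = highlight_start
    else:
        raise ValueError("unknown policy or missing highlight position")
    # Clamp the window so it never runs past the end of the audio.
    end = min(start + window, num_samples)
    return start, end
```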

The voice characteristic information extracting unit 120 extracts voice characteristic information necessary for speaker analysis from the voice signal extracted by the voice signal extracting unit 110.

For example, the voice characteristic information extracting unit 120 extracts voice characteristic information necessary for speaker analysis according to a speaker recognition algorithm to be applied by the similarity measuring unit 130.
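As a toy stand-in for the characteristic information an actual speaker recognition algorithm would require (for example, MFCCs, which the patent does not name), frame-level log energy and zero-crossing rate can illustrate the extraction step:

```python
# Toy frame-level features standing in for real speaker features.
import math

def frame_features(frame):
    n = max(len(frame), 1)
    # Average energy of the frame, log-compressed for stability.
    energy = sum(x * x for x in frame) / n
    log_energy = math.log(energy + 1e-10)
    # Fraction of adjacent sample pairs that change sign.
    zcr = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0) / n
    return (log_energy, zcr)
```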

The similarity measuring unit 130 measures the similarity of the voice characteristic information extracted by the voice characteristic information extracting unit 120 according to the applied speaker recognition algorithm, based on the stored speaker model reference pattern.

Meanwhile, a general speaker recognition algorithm is a technique of recognizing the speaker appearing in a voice signal by analyzing that signal. The speaker recognition algorithm used here compares the similarity between voice signals extracted from moving pictures to measure the similarity between the speakers appearing in those moving pictures, so that moving pictures whose inter-speaker similarity meets or exceeds a reference value can be grouped together.
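The threshold comparison above can be sketched with cosine similarity between speaker feature vectors; the patent does not fix a particular similarity measure, so both the measure and the 0.95 reference value are assumptions.

```python
# Decide whether two feature vectors belong to the same speaker by
# comparing their cosine similarity against a reference value.
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def same_speaker(u, v, threshold=0.95):
    return cosine_similarity(u, v) >= threshold
```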

The management unit 140 arranges and groups the contents of the user device based on the measured similarity to construct a speaker-based contents list and group information.

For example, in the prior art, moving pictures photographed by a camcorder or personal smart device are listed based on the shooting date and file name, and that list is provided to the user. The present invention, however, analyzes the voice signal so that photographed moving pictures can be grouped in a way that reflects their context and classified into groups of similar moving pictures for the user.

More specifically, it is possible to group moving pictures in which the same speakers appear into a specific group based on the similarity of the speakers appearing in the audio signal, to group moving pictures recorded in similar environments (room, party, performance, etc.), or to group them by the character of the video (lecture, conversation, music, etc.), and to provide the grouped result to the user.

As described above, video management through such grouping can improve the user's satisfaction when browsing videos and searching for desired content, and enables efficient management not only of content photographed by the user but also of animations, movies, and lecture materials; for example, movies and animations can be managed by character, and personal videos can be managed by speaker or lecturer.

The speaker-based content management apparatus through voice data analysis according to an exemplary embodiment of the present invention has been described above with reference to FIG. 1. Hereinafter, a speaker-based content management method through voice data analysis according to an exemplary embodiment of the present invention will be described with reference to FIG. 2, which is a diagram for explaining that method.

As shown in FIG. 2, when a new content is input to the user device, the speaker-based content management apparatus 100 extracts a speech signal for performing speaker recognition from the input content (S200).

The length of the voice signal to be extracted can be set within a range that guarantees speaker recognition accuracy and enables real-time system interworking.

In addition, the extraction position of the voice signal within the input moving picture may be the beginning portion or a highlight portion of the moving picture, according to the purpose and policy of the application target.

The voice characteristic information for speaker analysis is extracted from the extracted voice signal (S201).

On the other hand, the voice characteristic information can be extracted from the extracted voice signal, from a user voice signal, or from pre-registered (pre-stored) contents.

Similarity measurement is then performed on the extracted voice characteristic information according to the speaker recognition algorithm applied to the speaker recognition model.

For example, the similarity of the voice characteristic information extracted from the voice signal according to the speaker recognition algorithm is measured based on the stored speaker model reference pattern (S202).

Based on the measured similarity, the contents of the user device are sorted and grouped to form a speaker-based content list and group information (S203).

As described above, according to the present invention, the file list of the user device can be managed by analyzing the voice data of moving pictures: the moving pictures of the user device can be managed based on the users appearing in them, and can be grouped and managed on that basis. Likewise, the file list of the user device can be managed by analyzing the audio data of audio files, with content managed and grouped based on the users appearing in those audio files.

According to the present invention, content can be managed on a semantic basis, thereby improving user satisfaction. In particular, since the contents of the user device can be managed based on voice data, contents in which the same speaker appears can be conveniently grouped and managed.

While the present invention has been particularly shown and described with reference to exemplary embodiments thereof, it is to be understood that the invention is not limited to the disclosed embodiments, but, on the contrary, is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims. Therefore, the scope of the present invention should not be limited by the illustrated embodiments, but should be determined by the scope of the appended claims and equivalents thereof.

110: voice signal extracting unit    120: voice characteristic information extracting unit
130: similarity measuring unit    140: management unit

Claims (4)

1. A speaker-based content management apparatus through voice data analysis, comprising:
a voice signal extracting unit for extracting a voice signal for performing speaker recognition from contents input to a user device;
a voice characteristic information extracting unit for extracting voice characteristic information necessary for speaker analysis from the voice signal extracted by the voice signal extracting unit;
a similarity measuring unit for measuring the similarity of the voice characteristic information extracted by the voice characteristic information extracting unit according to a speaker recognition algorithm, based on a stored speaker model reference pattern; and
a management unit for sorting and grouping the contents of the user device based on the measured similarity to construct a speaker-based content list and group information.

2. The apparatus according to claim 1, wherein the voice signal extracting unit sets the extraction position of the voice signal within the input contents according to the purpose and policy of the speaker recognition application target.

3. A speaker-based content management method through voice data analysis, comprising:
extracting a voice signal for performing speaker recognition from video content input to a user device;
extracting voice characteristic information for speaker analysis from the extracted voice signal;
measuring the similarity of the extracted voice characteristic information according to a speaker recognition algorithm, based on a stored speaker model reference pattern; and
sorting and grouping the contents of the user device based on the measured similarity to construct a speaker-based content list and group information.

4. The method of claim 3, wherein the voice characteristic information is extracted from at least one of the extracted voice signal, a user voice signal, and pre-registered (pre-stored) contents.
KR1020130161071A 2012-12-27 2013-12-23 Apparatus and Method Managing Contents Based on Speaker Using Voice Data Analysis KR20140086853A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR1020120155116 2012-12-27
KR20120155116 2012-12-27

Publications (1)

Publication Number Publication Date
KR20140086853A 2014-07-08

Family

ID=51735981

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020130161071A KR20140086853A (en) 2012-12-27 2013-12-23 Apparatus and Method Managing Contents Based on Speaker Using Voice Data Analysis

Country Status (1)

Country Link
KR (1) KR20140086853A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11200904B2 (en) 2018-05-25 2021-12-14 Samsung Electronics Co., Ltd. Electronic apparatus, controlling method and computer readable medium
KR102135389B1 (en) * 2019-04-08 2020-07-17 박성태 Speaker and conference call system with the speaker
KR20190118994A (en) 2019-10-01 2019-10-21 엘지전자 주식회사 Method and device for focusing sound source
US11010124B2 (en) 2019-10-01 2021-05-18 Lg Electronics Inc. Method and device for focusing sound source

Similar Documents

Publication Publication Date Title
CN106024009B (en) Audio processing method and device
WO2017092122A1 (en) Similarity determination method, device, and terminal
US9661133B2 (en) Electronic device and method for extracting incoming/outgoing information and managing contacts
CN107527619B (en) Method and device for positioning voice control service
CN108227950B (en) Input method and device
CN105302315A (en) Image processing method and device
JP2017530431A (en) Nuisance telephone number determination method, apparatus and system
CN110781305A (en) Text classification method and device based on classification model and model training method
CN111128183B (en) Speech recognition method, apparatus and medium
KR20160024002A (en) Method for providing visual sound image and electronic device implementing the same
CN107945806B (en) User identification method and device based on sound characteristics
CN106777016B (en) Method and device for information recommendation based on instant messaging
KR20190066537A (en) Photograph sharing method, apparatus and system based on voice recognition
CN108509412A (en) A kind of data processing method, device, electronic equipment and storage medium
CN104298694A (en) Picture message adding method and device and mobile terminal
CN109146789A (en) Picture splicing method and device
CN109862421A (en) A kind of video information recognition methods, device, electronic equipment and storage medium
CN112068711A (en) Information recommendation method and device of input method and electronic equipment
CN104268151A (en) Contact person grouping method and device
CN105704322B (en) Weather information acquisition methods and device
CN106911706A (en) call background adding method and device
KR20140086853A (en) Apparatus and Method Managing Contents Based on Speaker Using Voice Data Analysis
CN113987128A (en) Related article searching method and device, electronic equipment and storage medium
CN110928425A (en) Information monitoring method and device
CN106156299B (en) The subject content recognition methods of text information and device

Legal Events

Date Code Title Description
A201 Request for examination
E902 Notification of reason for refusal
E601 Decision to refuse application