KR20140086853A - Apparatus and Method Managing Contents Based on Speaker Using Voice Data Analysis - Google Patents

Apparatus and Method Managing Contents Based on Speaker Using Voice Data Analysis

Info

Publication number
KR20140086853A
Authority
KR
South Korea
Prior art keywords
speaker
voice
voice signal
extracted
characteristic information
Prior art date
Application number
KR1020130161071A
Other languages
Korean (ko)
Inventor
장세진
신사임
Original Assignee
전자부품연구원 (Korea Electronics Technology Institute)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 전자부품연구원 (Korea Electronics Technology Institute)
Publication of KR20140086853A

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 — Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/40 — Information retrieval of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F 16/43 — Querying
    • G06F 16/432 — Query formulation
    • G06F 16/433 — Query formulation using audio data
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06Q — INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 50/00 — Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q 50/10 — Services
    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 — Speech recognition
    • G10L 15/02 — Feature extraction for speech recognition; Selection of recognition unit

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Economics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a device and method for managing content based on a speaker using voice data analysis. The device, according to the present invention, extracts voice feature information by analyzing voice data taken from content, more specifically a video, for the purpose of automatic content classification and user-based listing, and lists the videos on the basis of similarity between pieces of the extracted voice feature information so that they can be grouped and managed. According to the present invention, content can be managed on the basis of meaning, improving user satisfaction. In particular, since content can be managed on the basis of voice data, pieces of content featuring the same speaker can be grouped together and managed, which is convenient for the user.

Description

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a speaker-based content management apparatus and method through voice data analysis, and more particularly, to an apparatus and method for managing content in a user's personal device.

Generally, a mobile communication terminal provides, in addition to its basic phone call function, a short message transmission/reception function and an address book management function, as well as wireless Internet access. Furthermore, recently manufactured and sold mobile communication terminals include a digital camera, a function for photographing a desired subject or recording it as a moving picture, and a function for playing MP3 sound source files.

As mobile communication terminals have gained various functions, various data (for example, sent and received text messages, photo images, and moving pictures) are generated on the terminal itself, and various other data (for example, MP3 sound source data, moving pictures, and contents for executing games) are required to perform those functions. Because such data is constantly generated or required, the mobile communication terminal should be able to manage the corresponding data (contents) smoothly.

Meanwhile, in the related art, the task of managing and listing contents on a personalized commercial device that utilizes multimedia, i.e., a mobile communication terminal, is based only on file names or file creation times.

In recent years, with the expansion of convergence technology, various electronic devices provide multimedia-related functions, and their use has become increasingly personalized. That is, mobile communication terminals such as mobile phones and personal digital assistants (PDAs) have become a necessity of life for modern people and complex devices capable of performing various functions.

Accordingly, the amount of multimedia content directly generated and managed by users has increased, and a new content management method has become necessary.

The present invention has been made in view of the above needs, and it is an object of the present invention to provide a speaker-based content management apparatus and method through voice data analysis that extract voice characteristic information by analyzing voice data taken from contents, particularly moving pictures, for automatic classification of contents and user-based listing, and that list and group the moving pictures based on the similarity of the extracted voice characteristic information.

According to an aspect of the present invention, there is provided a speaker-based content management apparatus through voice data analysis, comprising: a voice signal extracting unit for extracting a voice signal for performing speaker recognition from contents input to a user device; a voice characteristic information extracting unit for extracting voice characteristic information necessary for speaker analysis from the voice signal extracted by the voice signal extracting unit; a similarity measuring unit for measuring the similarity of the voice characteristic information extracted by the voice characteristic information extracting unit according to a speaker recognition algorithm, based on a stored speaker model reference pattern; and a management unit for sorting and grouping the contents of the user device based on the measured similarity to construct a speaker-based content list and group information.

According to another aspect of the present invention, there is provided a speaker-based content management method through voice data analysis, comprising: extracting a voice signal for performing speaker recognition from video content input to a user device; extracting voice characteristic information for speaker analysis from the extracted voice signal; measuring the similarity of the extracted voice characteristic information according to a speaker recognition algorithm, based on a stored speaker model reference pattern; and sorting and grouping the contents of the user device based on the measured similarity to construct a speaker-based content list and group information.
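The two aspects above describe the same four-step pipeline: signal extraction, feature extraction, similarity measurement, and grouping. A minimal sketch of that pipeline follows; all function names, the toy two-dimensional feature, the inverse-distance similarity, and the 0.9 threshold are illustrative assumptions, since the patent does not fix concrete algorithms.

```python
# Hypothetical end-to-end sketch of the four claimed steps.
import math

def extract_voice_signal(audio_samples, start=0, length=16000):
    """S200: take a fixed-length excerpt of the content's audio track."""
    return audio_samples[start:start + length]

def extract_features(signal):
    """S201: toy (mean, energy) features standing in for real speaker
    features such as MFCCs."""
    n = max(len(signal), 1)
    return (sum(signal) / n, sum(x * x for x in signal) / n)

def similarity(f1, f2):
    """S202: inverse-distance similarity in (0, 1]."""
    return 1.0 / (1.0 + math.dist(f1, f2))

def group_contents(features_by_id, threshold=0.9):
    """S203: greedily group contents whose features are similar enough,
    producing the speaker-based group information."""
    groups = []
    for content_id, feat in features_by_id.items():
        for group in groups:
            if similarity(feat, group["centroid"]) >= threshold:
                group["members"].append(content_id)
                break
        else:
            groups.append({"centroid": feat, "members": [content_id]})
    return [group["members"] for group in groups]
```

With this sketch, two clips whose features nearly coincide fall into one group while a dissimilar clip forms its own, mirroring the claimed grouping by speaker.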

According to the present invention, contents can be managed based on the meaning of the contents, thereby improving the user's satisfaction.

In particular, since contents can be managed on the basis of voice data, it is convenient to group and manage contents in which the same speaker appears.

FIG. 1 is a view for explaining a speaker-based content management apparatus through voice data analysis according to an embodiment of the present invention;
FIG. 2 is a diagram for explaining a speaker-based content management method through voice data analysis according to an embodiment of the present invention.

The advantages and features of the present invention, and the manner of achieving them, will become apparent with reference to the embodiments described in detail below with reference to the accompanying drawings. The present invention may, however, be embodied in many different forms and should not be construed as being limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete and will fully convey the scope of the invention to those skilled in the art; the invention is defined by the claims. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the invention. In this specification, the singular form includes the plural form unless otherwise specified. The terms "comprises" or "comprising," as used herein, do not exclude the presence or addition of one or more other components, steps, or operations.

The present invention can provide a content list in a manner different from conventional ones in the content-providing environment of various personalized devices (PCs, digital cameras, camcorders, mobile phones, tablets, etc.) used to create and utilize video contents. By moving away from listing and managing contents based on file order or file creation date, this personalized content management method can improve the user's satisfaction with both the device and its contents.

For example, the present invention relates to an apparatus and method for quickly extracting semantic information from moving picture contents and managing the contents accordingly, and proposes a novel paradigm for data management on personal devices. Speaker recognition through voice data analysis (an artificial intelligence technology) offers high accuracy and portability compared with other voice-data-based technologies; through it, meaningful information about contents can be extracted without a complicated metadata annotation process, and contents can be classified based on the extracted information.

Hereinafter, a speaker-based content management apparatus through voice data analysis according to an embodiment of the present invention will be described with reference to FIG. 1, which is a diagram for explaining that apparatus.

Referring to FIG. 1, a speaker-based content management apparatus 100 through voice data analysis according to the present invention includes a voice signal extracting unit 110, a voice characteristic information extracting unit 120, a similarity measuring unit 130, and a management unit 140.

When content (hereinafter referred to as a moving picture) is input to the user device, the voice signal extracting unit 110 extracts from it a voice signal for performing speaker recognition.

Here, speaker recognition is a technique that searches a database, in which voice data corresponding to speaker information is stored in advance, for a match to the input voice data, and, when matching voice data is found, identifies who the speaker is based on the corresponding speaker information.
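The database lookup just described can be illustrated as follows. The helper name, the Euclidean distance measure, and the `max_distance` rejection threshold are assumptions for illustration, not details from the patent.

```python
# Nearest-neighbor speaker identification against a pre-stored database
# mapping registered speaker names to reference feature vectors.
def identify_speaker(input_features, speaker_db, max_distance=1.0):
    best_name, best_dist = None, float("inf")
    for name, ref in speaker_db.items():
        # Euclidean distance between input and reference features.
        d = sum((a - b) ** 2 for a, b in zip(input_features, ref)) ** 0.5
        if d < best_dist:
            best_name, best_dist = name, d
    # Reject the match if even the closest reference is too far away.
    return best_name if best_dist <= max_distance else None
```

An unmatched input returns `None`, modeling the case where no stored voice data corresponds to the input.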

The voice signal extracting unit 110 may determine the length of the voice signal for performing speaker recognition within a range that ensures accuracy in speaker recognition and enables real-time interworking with the system to which the present invention is applied.

In addition, the voice signal extracting unit 110 can determine the extraction position of the voice signal within the input moving picture according to the target of the speaker recognition application, that is, the purpose and policy of the service or system to which the present invention is applied.

For example, the voice signal extracting unit 110 may extract the voice signal from the beginning of the moving picture, or may detect a predetermined highlight portion of the moving picture and extract the voice signal from that highlight portion; that is, it can extract the voice signal from various extraction positions set within the input moving picture.
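The policy-driven choice of extraction window described above might be sketched as follows; the policy names ("beginning", "highlight") and the fixed window length are assumptions, since the patent leaves the concrete policy to the application target.

```python
# Choose the [start, end) sample range of the voice signal to extract,
# according to an application-defined extraction policy.
def choose_window(num_samples, policy="beginning", window=16000,
                  highlight_start=None):
    if policy == "beginning":
        start = 0
    elif policy == "highlight" and highlight_start is not None:
        start = highlight_start
    else:
        raise ValueError("unknown policy or missing highlight position")
    # Clamp the window so it never runs past the end of the audio.
    end = min(start + window, num_samples)
    return start, end
```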

The voice characteristic information extracting unit 120 extracts voice characteristic information necessary for speaker analysis from the voice signal extracted by the voice signal extracting unit 110.

For example, the voice characteristic information extracting unit 120 extracts voice characteristic information necessary for speaker analysis according to a speaker recognition algorithm to be applied by the similarity measuring unit 130.
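As a toy stand-in for the characteristic information an actual speaker recognition algorithm would require (for example, MFCCs, which the patent does not name), frame-level log energy and zero-crossing rate can illustrate the extraction step:

```python
# Toy frame-level features standing in for real speaker features.
import math

def frame_features(frame):
    n = max(len(frame), 1)
    # Average energy of the frame, log-compressed for stability.
    energy = sum(x * x for x in frame) / n
    log_energy = math.log(energy + 1e-10)
    # Fraction of adjacent sample pairs that change sign.
    zcr = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0) / n
    return (log_energy, zcr)
```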

The similarity measuring unit 130 measures the similarity of the voice characteristic information extracted by the voice characteristic information extracting unit 120 according to the applied speaker recognition algorithm, based on the stored speaker model reference pattern.

Meanwhile, a general speaker recognition algorithm is a technique of recognizing the speaker appearing in a voice signal by analyzing that signal. The speaker recognition algorithm used here compares the similarity between voice signals extracted from moving pictures to measure the similarity between the speakers appearing in those moving pictures, so that moving pictures whose inter-speaker similarity meets or exceeds a reference value can be grouped together.
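The threshold comparison above can be sketched with cosine similarity between speaker feature vectors; the patent does not fix a particular similarity measure, so both the measure and the 0.95 reference value are assumptions.

```python
# Decide whether two feature vectors belong to the same speaker by
# comparing their cosine similarity against a reference value.
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def same_speaker(u, v, threshold=0.95):
    return cosine_similarity(u, v) >= threshold
```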

The management unit 140 arranges and groups the contents of the user device based on the measured similarity to construct a speaker-based contents list and group information.

For example, in the prior art, moving pictures photographed by a camcorder or personal smart device are listed based on the shooting date and file name, and that list is provided to the user. The present invention, however, analyzes the voice signal so that photographed moving pictures can be grouped in a way that reflects their context and classified into groups of similar moving pictures for the user.

More specifically, it is possible to group moving pictures in which the same speakers appear into a specific group based on the similarity of the speakers appearing in the audio signal, to group moving pictures recorded in similar environments (room, party, performance, etc.), or to group them by the character of the video (lecture, conversation, music, etc.), and to provide the grouped result to the user.

As described above, video management through such grouping can improve the user's satisfaction when browsing videos and searching for desired content, and enables efficient management not only of content photographed by the user but also of animations, movies, and lecture materials; for example, movies and animations can be managed by character, and personal videos can be managed by speaker or lecturer.

The speaker-based content management apparatus through voice data analysis according to an exemplary embodiment of the present invention has been described above with reference to FIG. 1. Hereinafter, a speaker-based content management method through voice data analysis according to an exemplary embodiment of the present invention will be described with reference to FIG. 2, which is a diagram for explaining that method.

As shown in FIG. 2, when a new content is input to the user device, the speaker-based content management apparatus 100 extracts a speech signal for performing speaker recognition from the input content (S200).

The length of the voice signal to be extracted can be set within a range that guarantees speaker recognition accuracy and enables real-time system interworking.

In addition, the extraction position of the voice signal within the input moving picture may be the beginning portion or a highlight portion of the moving picture, according to the purpose and policy of the application target.

The voice characteristic information for speaker analysis is extracted from the extracted voice signal (S201).

On the other hand, the voice characteristic information can be extracted from the extracted voice signal, from a user voice signal, or from pre-registered (pre-stored) contents.

Similarity measurement is then performed on the extracted voice characteristic information according to the speaker recognition algorithm applied to the speaker recognition model.

For example, the similarity of the voice characteristic information extracted from the voice signal according to the speaker recognition algorithm is measured based on the stored speaker model reference pattern (S202).

Based on the measured similarity, the contents of the user device are sorted and grouped to form a speaker-based content list and group information (S203).

As described above, according to the present invention, the file list of the user device can be managed by analyzing the voice data of moving pictures: the moving pictures of the user device can be managed based on the users appearing in them, and can be grouped and managed on that basis. Likewise, the file list of the user device can be managed by analyzing the audio data of audio files, with content managed and grouped based on the users appearing in those audio files.

According to the present invention, content can be managed on a semantic basis, thereby improving user satisfaction. In particular, since the contents of the user device can be managed based on voice data, contents in which the same speaker appears can be conveniently grouped and managed.

While the present invention has been particularly shown and described with reference to exemplary embodiments thereof, it is to be understood that the invention is not limited to the disclosed embodiments, but, on the contrary, is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims. Therefore, the scope of the present invention should not be limited by the illustrated embodiments, but should be determined by the scope of the appended claims and equivalents thereof.

110: voice signal extracting unit    120: voice characteristic information extracting unit
130: similarity measuring unit    140: management unit

Claims (4)

1. A speaker-based content management apparatus through voice data analysis, comprising:
a voice signal extracting unit for extracting a voice signal for performing speaker recognition from contents input to a user device;
a voice characteristic information extracting unit for extracting voice characteristic information necessary for speaker analysis from the voice signal extracted by the voice signal extracting unit;
a similarity measuring unit for measuring the similarity of the voice characteristic information extracted by the voice characteristic information extracting unit according to a speaker recognition algorithm, based on a stored speaker model reference pattern; and
a management unit for sorting and grouping the contents of the user device based on the measured similarity to construct a speaker-based content list and group information.

2. The apparatus according to claim 1, wherein the voice signal extracting unit sets the extraction position of the voice signal within the input contents according to the purpose and policy of the speaker recognition application target.

3. A speaker-based content management method through voice data analysis, comprising:
extracting a voice signal for performing speaker recognition from video content input to a user device;
extracting voice characteristic information for speaker analysis from the extracted voice signal;
measuring the similarity of the extracted voice characteristic information according to a speaker recognition algorithm, based on a stored speaker model reference pattern; and
sorting and grouping the contents of the user device based on the measured similarity to construct a speaker-based content list and group information.

4. The method of claim 3, wherein the voice characteristic information is extracted from at least one of the extracted voice signal, a user voice signal, and pre-registered (pre-stored) contents.
KR1020130161071A 2012-12-27 2013-12-23 Apparatus and Method Managing Contents Based on Speaker Using Voice Data Analysis KR20140086853A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR1020120155116 2012-12-27
KR20120155116 2012-12-27

Publications (1)

Publication Number Publication Date
KR20140086853A 2014-07-08

Family

ID=51735981

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020130161071A KR20140086853A (en) 2012-12-27 2013-12-23 Apparatus and Method Managing Contents Based on Speaker Using Voice Data Analysis

Country Status (1)

Country Link
KR (1) KR20140086853A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11200904B2 (en) 2018-05-25 2021-12-14 Samsung Electronics Co., Ltd. Electronic apparatus, controlling method and computer readable medium
KR102135389B1 (en) * 2019-04-08 2020-07-17 박성태 Speaker and conference call system with the speaker
KR20190118994A (en) 2019-10-01 2019-10-21 엘지전자 주식회사 Method and device for focusing sound source
US11010124B2 (en) 2019-10-01 2021-05-18 Lg Electronics Inc. Method and device for focusing sound source

Similar Documents

Publication Publication Date Title
CN106024009B (en) Audio processing method and device
WO2017092122A1 (en) Similarity determination method, device, and terminal
US9661133B2 (en) Electronic device and method for extracting incoming/outgoing information and managing contacts
CN107527619B (en) Method and device for positioning voice control service
CN108227950B (en) Input method and device
CN105302315A (en) Image processing method and device
JP2017530431A (en) Nuisance telephone number determination method, apparatus and system
CN110781305A (en) Text classification method and device based on classification model and model training method
CN111128183B (en) Speech recognition method, apparatus and medium
KR20160024002A (en) Method for providing visual sound image and electronic device implementing the same
CN107945806B (en) User identification method and device based on sound characteristics
CN106777016B (en) Method and device for information recommendation based on instant messaging
KR20190066537A (en) Photograph sharing method, apparatus and system based on voice recognition
CN108509412A (en) A kind of data processing method, device, electronic equipment and storage medium
CN104298694A (en) Picture message adding method and device and mobile terminal
CN109146789A (en) Picture splicing method and device
CN109862421A (en) A kind of video information recognition methods, device, electronic equipment and storage medium
CN112068711A (en) Information recommendation method and device of input method and electronic equipment
CN104268151A (en) Contact person grouping method and device
CN105704322B (en) Weather information acquisition methods and device
CN106911706A (en) call background adding method and device
KR20140086853A (en) Apparatus and Method Managing Contents Based on Speaker Using Voice Data Analysis
CN113987128A (en) Related article searching method and device, electronic equipment and storage medium
CN110928425A (en) Information monitoring method and device
CN106156299B (en) The subject content recognition methods of text information and device

Legal Events

Date Code Title Description
A201 Request for examination
E902 Notification of reason for refusal
E601 Decision to refuse application