KR20140086853A - Apparatus and Method Managing Contents Based on Speaker Using Voice Data Analysis - Google Patents
- Publication number
- KR20140086853A (application KR1020130161071A)
- Authority
- KR
- South Korea
- Prior art keywords
- speaker
- voice
- voice signal
- extracted
- characteristic information
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/40—Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
- G06F16/43—Querying
- G06F16/432—Query formulation
- G06F16/433—Query formulation using audio data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
Abstract
Description
The present invention relates to an apparatus and method for managing a speaker-based content through voice data analysis, and more particularly, to an apparatus and method for managing content in a user's personal device.
Generally, a mobile communication terminal provides, in addition to its basic phone call function, short message transmission/reception, address book management, and wireless Internet access. Recently manufactured and sold mobile communication terminals also include a digital camera, a function of photographing a desired subject as a still image or moving picture, and a function of playing MP3 sound source files.
As mobile communication terminals have acquired these various functions, various data are generated in the terminal itself (for example, sent and received text messages, photo images, and moving pictures), and various data are required to perform those functions (for example, MP3 sound source data, moving pictures, and contents for executing games). Because so much data is generated or required, the mobile communication terminal should be able to manage the corresponding data (contents) smoothly.
Meanwhile, in the related art, managing and listing contents in a personalized multimedia device such as a mobile communication terminal is based on file names or file creation times.
In recent years, with the expansion of convergence technology, various electronic devices provide multimedia-related functions, and the use of such devices has become increasingly personalized. That is, mobile communication terminals such as mobile phones and personal digital assistants (PDAs) have become a necessity of modern life and have evolved into complex devices capable of performing various functions.
Therefore, the amount of multimedia contents to be directly generated and managed by users has increased, and a new content management method has become necessary.
The present invention has been made in view of the above needs. It is an object of the present invention to provide a speaker-based content management apparatus and method using voice data analysis that, for automatic classification and user-based listing of contents, extract voice characteristic information by analyzing contents (particularly voice data extracted from moving pictures), and that list, group, and manage the moving pictures based on that information.
According to an aspect of the present invention, there is provided a speaker-based content management apparatus using voice data analysis, comprising: a voice signal extracting unit for extracting a voice signal for performing speaker recognition from contents input to a user device; a voice characteristic information extracting unit for extracting voice characteristic information necessary for speaker analysis from the voice signal extracted by the voice signal extracting unit; a similarity measuring unit for measuring the similarity of the extracted voice characteristic information according to a speaker recognition algorithm, based on a stored speaker model reference pattern; and a manager for arranging and grouping the contents of the user device based on the measured similarity to construct a speaker-based content list and group information.
According to another aspect of the present invention, there is provided a speaker-based content management method for analyzing speech data, the method comprising: extracting a speech signal for performing speaker recognition from video content input to a user device; Extracting voice characteristic information for speaker analysis from the extracted voice signal; Measuring the similarity of the voice characteristic information extracted from the voice signal according to the speaker recognition algorithm on the basis of the stored speaker model reference pattern; And arranging and grouping the contents of the user device based on the measured similarity to construct a speaker-based content list and group information.
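The two aspects above describe the same four-stage pipeline: extract a voice signal, extract voice characteristic information, measure similarity against stored speaker reference patterns, then sort and group. A minimal Python sketch of that flow follows; the toy feature definition (mean and energy) and the use of cosine similarity are illustrative assumptions, not details given by the patent.

```python
import math

# Hypothetical sketch of the four-stage pipeline described in the summary.
# All function names and the toy feature definition are illustrative.

def extract_voice_signal(content):
    # Stand-in for the voice signal extracting unit: here a content item
    # already carries its audio samples.
    return content["audio"]

def extract_features(signal):
    # Toy "voice characteristic information": mean and average energy.
    n = len(signal)
    return (sum(signal) / n, sum(s * s for s in signal) / n)

def cosine_similarity(f1, f2):
    dot = sum(a * b for a, b in zip(f1, f2))
    n1 = math.sqrt(sum(a * a for a in f1))
    n2 = math.sqrt(sum(b * b for b in f2))
    return dot / (n1 * n2)

def group_by_speaker(contents, reference_patterns, threshold=0.9):
    # Assign each content item to the most similar stored speaker model
    # (the "speaker model reference pattern" of the claims).
    groups = {}
    for item in contents:
        feats = extract_features(extract_voice_signal(item))
        best, best_sim = "unknown", 0.0
        for speaker, pattern in reference_patterns.items():
            sim = cosine_similarity(feats, pattern)
            if sim > best_sim:
                best, best_sim = speaker, sim
        key = best if best_sim >= threshold else "unknown"
        groups.setdefault(key, []).append(item["name"])
    return groups
```

Given one reference pattern per registered speaker, `group_by_speaker` returns a speaker-keyed grouping of content names, corresponding to the speaker-based content list and group information of the claims.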
According to the present invention, contents can be managed based on the meaning of the contents, thereby improving the user's satisfaction.
In particular, since contents can be managed on the basis of voice data, it is convenient to group and manage contents in which the same speaker appears.
FIG. 1 is a view for explaining a speaker-based content management apparatus through analysis of voice data according to an embodiment of the present invention;
FIG. 2 is a diagram for explaining a speaker-based content management method through voice data analysis according to an embodiment of the present invention.
The advantages and features of the present invention, and the manner of achieving them, will become apparent with reference to the embodiments described in detail below with reference to the accompanying drawings. The present invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete and will fully convey the scope of the invention to those skilled in the art; the invention is defined by the claims. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the invention. In this specification, the singular form includes plural forms unless otherwise specified. The terms "comprises" or "comprising," as used herein, specify the presence of stated components, steps, and operations, but do not preclude the presence or addition of one or more other components, steps, or operations.
The present invention can provide a content list in a manner different from conventional methods in the content environments of various personalized devices (PCs, digital cameras, camcorders, mobile phones, tablets, etc.) that create and utilize video contents. By moving beyond listing and managing contents by file order or file creation date, this personalized content management method can improve user satisfaction with both the device and its contents.
For example, the present invention relates to an apparatus and method for quickly extracting semantic information from moving picture contents and managing the contents accordingly, and proposes a new paradigm for data management in personal devices. Speaker recognition based on voice data analysis offers high accuracy and portability compared with other voice-data-based technologies; through this technology, meaningful information about contents can be extracted without a complicated metadata annotation process, and contents can be classified based on the extracted information.
Hereinafter, a speaker-based content management apparatus through voice data analysis according to an embodiment of the present invention will be described with reference to FIG. 1, which is a diagram for explaining such an apparatus.
As shown in FIG. 1, a speaker-based content management apparatus includes a voice signal extracting unit 110, a voice characteristic information extracting unit 120, a similarity measuring unit 130, and a manager 140.
The voice signal extracting unit 110 extracts a voice signal for performing speaker recognition from contents input to the user device.
Here, speaker recognition is a technique that searches a database in which voice data corresponding to speaker information has been stored in advance and, when voice data matching the input voice data is found in the database, identifies who the speaker is based on the matched speaker information.
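This lookup can be illustrated with a toy nearest-match search; the feature representation and the use of Euclidean distance are assumptions for the example, since the patent specifies neither.

```python
# Speaker identification as a nearest-match search over a database of
# previously stored speaker voice features. The feature vectors and the
# Euclidean distance metric are illustrative assumptions.

def identify_speaker(voice_features, database):
    """Return the registered speaker whose stored features are closest."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return min(database, key=lambda spk: dist(voice_features, database[spk]))
```

A production system would also apply a rejection threshold so that voices matching no registered speaker are reported as unknown rather than forced onto the nearest entry.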
The voice signal extracting unit 110 may set the extraction position of the voice signal in the input content according to the purpose and policy of the speaker recognition application target.
In addition, the voice signal extracting unit 110 may set the length of the voice signal to be extracted within a range that guarantees speaker recognition accuracy and enables real-time system interworking.
For example, the voice signal extracting unit 110 may extract the voice signal from the beginning of the moving picture or from a highlight portion of the moving picture, according to the purpose and policy of the applied target.
The voice characteristic information extracting unit 120 extracts voice characteristic information necessary for speaker analysis from the voice signal extracted by the voice signal extracting unit 110.
For example, the voice characteristic information extracting unit 120 may extract the voice characteristic information from the extracted voice signal, from a user voice signal, or from pre-registered (pre-stored) contents.
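As a concrete (and purely illustrative) stand-in for such voice characteristic information, two classic frame-level speech features are short-time energy and zero-crossing rate. Real speaker-analysis systems typically use richer features such as cepstral coefficients; the patent does not prescribe any particular feature set.

```python
# Two simple frame-level voice features computed from raw audio samples.
# Illustrative only; the patent does not specify the feature set.

def voice_features(samples):
    n = len(samples)
    energy = sum(s * s for s in samples) / n          # short-time energy
    zero_crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    zcr = zero_crossings / (n - 1)                    # zero-crossing rate
    return (energy, zcr)
```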
The similarity measuring unit 130 measures the similarity of the voice characteristic information extracted by the voice characteristic information extracting unit 120, according to a speaker recognition algorithm, based on a stored speaker model reference pattern.
Meanwhile, a general speaker recognition algorithm recognizes the speaker appearing in a voice signal by analyzing that signal. Here, the algorithm is applied to voice signals extracted from moving pictures: the similarity between the speakers extracted from different moving pictures is measured, so that moving pictures whose speaker similarity is equal to or greater than a reference value can be grouped together.
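The pairwise grouping described here can be sketched as simple single-link clustering over per-video speaker features; the cosine measure, the threshold value, and the union-find structure are implementation assumptions, not details from the patent.

```python
# Group videos whose extracted speaker features are at least `threshold`-
# similar (simple single-link clustering via union-find). Illustrative.

def group_videos(features, threshold=0.9):
    names = list(features)
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            n = parent[n]
        return n

    def union(a, b):
        parent[find(a)] = find(b)

    def cos(f1, f2):
        dot = sum(x * y for x, y in zip(f1, f2))
        n1 = sum(x * x for x in f1) ** 0.5
        n2 = sum(y * y for y in f2) ** 0.5
        return dot / (n1 * n2)

    # Link every pair of videos whose speaker similarity meets the reference value.
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cos(features[a], features[b]) >= threshold:
                union(a, b)

    # Collect connected components as groups.
    groups = {}
    for n in names:
        groups.setdefault(find(n), set()).add(n)
    return sorted(sorted(g) for g in groups.values())
```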
The manager 140 arranges and groups the contents of the user device based on the measured similarity to construct a speaker-based content list and group information.
For example, in the prior art, moving pictures photographed by a camcorder or a personal smart device are listed by shooting date and file name, and that list is provided to the user. The present invention instead analyzes the voice signal, groups the photographed moving pictures in a way that reflects their context, classifies similar moving pictures together, and provides the result to the user.
More specifically, moving pictures in which the same speakers appear can be grouped on the basis of the similarity of the speakers in the audio signal; moving pictures can also be grouped by similar settings (room, party, performance, etc.) or by the character of the video (lecture, conversation, music, etc.), and the grouped result can be provided to the user.
As described above, video management through grouping can make it easier for users to browse their videos and find desired content, and it enables efficient management not only of contents photographed by the user but also of animations, movies, and lecture materials: movies and animations can be managed by character, and personal videos by speaker or lecturer.
The speaker-based content management apparatus through voice data analysis according to an embodiment of the present invention has been described above with reference to FIG. 1. Hereinafter, a speaker-based content management method through voice data analysis according to an embodiment of the present invention will be described with reference to FIG. 2, which is a diagram for explaining the method.
As shown in FIG. 2, when new content is input to the user device, the speaker-based content management apparatus extracts a voice signal for performing speaker recognition from the input content.
The length of the voice signal to be extracted can be set within a range that guarantees speaker recognition accuracy and enables real-time system interworking.
In addition, the extraction position of the voice signal in the input moving image may be the beginning portion of the moving image or the highlight portion of the moving image according to the purpose and policy of the applied object.
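Such a position-and-length policy might look like the following sketch; the policy names, the default 10-second segment, and the `highlight_start_s` parameter are all hypothetical, since the patent only says the position depends on the purpose and policy of the applied target.

```python
# Choose which segment of a video's audio to extract for speaker
# recognition. Policy names and defaults are illustrative assumptions.

def extraction_window(duration_s, policy="beginning", segment_s=10.0,
                      highlight_start_s=None):
    """Return (start, end) in seconds of the voice segment to extract."""
    seg = min(segment_s, duration_s)  # never extract past the clip end
    if policy == "highlight" and highlight_start_s is not None:
        start = min(highlight_start_s, duration_s - seg)
    else:
        start = 0.0  # default policy: the beginning of the moving picture
    return (start, start + seg)
```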
The voice characteristic information for speaker analysis is extracted from the extracted voice signal (S201).
On the other hand, the voice feature information can be extracted from the extracted voice signal, extracted from the user voice signal, or extracted from pre-registered (pre-stored) contents.
The similarity measurement is then performed on the extracted characteristic information, based on the speaker recognition algorithm applied to the speaker model.
For example, the similarity of the voice characteristic information extracted from the voice signal according to the speaker recognition algorithm is measured based on the stored speaker model reference pattern (S202).
Based on the measured similarity, the contents of the user device are sorted and grouped to form a speaker-based content list and group information (S203).
As described above, according to the present invention, the file list of a user device can be managed by analyzing the voice data of moving pictures: the moving pictures can be managed based on the users appearing in them and grouped on that basis. Likewise, the file list can be managed by analyzing the audio data of audio files, with contents managed and grouped based on the users appearing in those files.
According to the present invention, content can be managed on a semantic basis, improving user satisfaction. In particular, the contents of the user device can be managed based on voice data, which makes it convenient to group and manage contents in which the same speaker appears.
While the present invention has been particularly shown and described with reference to exemplary embodiments thereof, it is to be understood that the invention is not limited to the disclosed embodiments, but, on the contrary, is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims. Therefore, the scope of the present invention should not be limited by the illustrated embodiments, but should be determined by the scope of the appended claims and equivalents thereof.
110: Voice signal extracting unit 120: Voice characteristic information extracting unit
130: Similarity measuring unit 140: Manager
Claims (4)
A speaker-based content management apparatus through voice data analysis, comprising: a voice signal extracting unit for extracting a voice signal for performing speaker recognition from contents input to a user device;
A voice characteristic information extracting unit for extracting voice characteristic information necessary for speaker analysis from the voice signal extracted by the voice signal extracting unit;
A similarity measuring unit for measuring a similarity of the voice characteristic information extracted by the voice characteristic information extracting unit based on the speaker recognition algorithm based on the stored speaker model reference pattern; And
And a manager for arranging and grouping the contents of the user device based on the measured similarity to construct a speaker-based content list and group information.
The apparatus of claim 1, wherein the voice signal extracting unit sets the extraction position of the voice signal in the input content according to the purpose and policy of the speaker recognition application target.
A speaker-based content management method through voice data analysis, comprising: extracting a voice signal for performing speaker recognition from video content input to a user device;
Extracting voice characteristic information for speaker analysis from the extracted voice signal;
Measuring the similarity of the voice characteristic information extracted from the voice signal according to the speaker recognition algorithm on the basis of the stored speaker model reference pattern; And
And arranging and grouping the contents of the user device based on the measured similarity to construct a speaker-based content list and group information.
The method of claim 3, wherein the voice characteristic information is extracted from at least one of the extracted voice signal, a user voice signal, and pre-registered (pre-stored) contents.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020120155116 | 2012-12-27 | ||
KR20120155116 | 2012-12-27 |
Publications (1)
Publication Number | Publication Date |
---|---|
KR20140086853A true KR20140086853A (en) | 2014-07-08 |
Family
ID=51735981
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
KR1020130161071A KR20140086853A (en) | 2012-12-27 | 2013-12-23 | Apparatus and Method Managing Contents Based on Speaker Using Voice Data Analysis |
Country Status (1)
Country | Link |
---|---|
KR (1) | KR20140086853A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20190118994A (en) | 2019-10-01 | 2019-10-21 | 엘지전자 주식회사 | Method and device for focusing sound source |
KR102135389B1 (en) * | 2019-04-08 | 2020-07-17 | 박성태 | Speaker and conference call system with the speaker |
US11200904B2 (en) | 2018-05-25 | 2021-12-14 | Samsung Electronics Co., Ltd. | Electronic apparatus, controlling method and computer readable medium |
-
2013
- 2013-12-23 KR KR1020130161071A patent/KR20140086853A/en not_active Application Discontinuation
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11200904B2 (en) | 2018-05-25 | 2021-12-14 | Samsung Electronics Co., Ltd. | Electronic apparatus, controlling method and computer readable medium |
KR102135389B1 (en) * | 2019-04-08 | 2020-07-17 | 박성태 | Speaker and conference call system with the speaker |
KR20190118994A (en) | 2019-10-01 | 2019-10-21 | 엘지전자 주식회사 | Method and device for focusing sound source |
US11010124B2 (en) | 2019-10-01 | 2021-05-18 | Lg Electronics Inc. | Method and device for focusing sound source |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
A201 | Request for examination | ||
E902 | Notification of reason for refusal | ||
E601 | Decision to refuse application |