KR101832050B1 - Tagging method for mutimedia contents base on sound data and system using the smae - Google Patents
- Publication number
- KR101832050B1 (Application No. KR1020160036059A)
- Authority
- KR
- South Korea
- Prior art keywords
- voice
- tag
- server
- multimedia
- keyword information
- Prior art date
Links
Images
Classifications
-
- G06F17/30038—
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
-
- G06F17/30026—
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
Abstract
Disclosed herein is a voice data-based multimedia content tagging method that generates a voice tag from the voice data of multimedia content and tags the generated voice tag to that content. The method comprises the steps of: a server extracting voice keyword information from the multimedia content; the server generating a voice tag based on the extracted voice keyword information; and the server tagging the generated voice tag to the multimedia content. Accordingly, a user of a mobile terminal can be provided with a search service for finding desired multimedia content. In addition, because a search for a specific search word looks up, among the voice tags generated from voice data, those associated with the search word, a reliable search result can be obtained.
Description
BACKGROUND OF THE INVENTION 1. Field of the Invention [0001] The present invention relates to a method of tagging multimedia content based on voice data and a system using the same, and more particularly, to a method of generating a voice tag from the voice data of multimedia content and tagging the generated voice tag to that content, and a system using the same.
Generally, multimedia content refers to the content of information services used in systems and services that create, transmit, and process various types of information such as text, voice, and images.
Because such multimedia content can convey far more information in the same amount of time than content composed only of images, sound, or text, demand for it is relatively high compared to such content.
However, conventional methods of searching for multimedia content require users either to play the actual content or to search based on descriptive images or text attached to it. They therefore have the disadvantage that finding the content a user wants takes a great deal of time.
To address this drawback, Korean Patent No. 10-1403317 includes tag information that is visible as an image in the multimedia and provides that tag information to the user. However, because the tag information is itself an image, the user must still inspect images at search time to find the desired content, and because the search covers only the images stored in the tag information among those included in the content, the search results cannot be trusted.
Accordingly, there is a need for a way to provide a service that lets users search for desired multimedia content by means of tag information, without having to check the content of the multimedia itself.
SUMMARY OF THE INVENTION The present invention has been made to solve the above-mentioned problems, and it is an object of the present invention to provide a voice data-based multimedia content tagging method and system that generate a voice tag from the voice data of multimedia content and tag the generated voice tag to that content.
Another object of the present invention is to provide a voice data-based multimedia content tagging method and system capable of searching for multimedia content associated with a specific search word based on a voice tag.
According to an aspect of the present invention, there is provided a voice data-based multimedia content tagging method comprising the steps of: a server extracting voice keyword information from multimedia content; the server generating a voice tag based on the extracted voice keyword information; and the server tagging the generated voice tag to the multimedia content.
In the extracting step, the server may separate the voice data included in the multimedia content into morpheme units, select the voice data corresponding to vocabulary morphemes from among the separated voice data, and extract the selected voice data as the voice keyword information.
In the generating step, the server may convert the extracted voice keyword information into text and match the textualized voice keyword information with the synchronization time information of the voice data synchronized with the timeline of the multimedia content, thereby generating the voice tag.
In addition, in the tagging step, the server may add the generated voice tag to the multimedia content and encode the tagged content in a predetermined format.
Alternatively, the server may generate the voice tag by matching the extracted voice keyword information with the synchronization time information of the voice data synchronized with the timeline of the multimedia content.
The server may also generate the voice tag by matching the extracted voice keyword information with both the synchronization time information of the voice data synchronized with the timeline of the multimedia content and a URL address linked to the multimedia content.
In the generating step, the server may convert the extracted voice keyword information into text, set at least one piece of the textualized voice keyword information as a keyword, set the remaining voice keyword information as stop words, filter out the voice data set as stop words, and select only the voice data set as keywords to generate the voice tag.
The method may further comprise the steps of: a mobile terminal requesting the server to perform a search based on a specific search word; and the server performing the requested search.
Here, in the performing step, the server may carry out the search by comparing the tagged voice tags with the search word to detect, among them, the voice tags associated with the search word.
In the performing step, when the server detects voice tags associated with the search word, it may provide them to the mobile terminal as the search result. A voice tag containing voice data identical to the search word may be provided in preference to a voice tag containing merely similar voice data, and among voice tags containing identical voice data, the voice tags of multimedia content with relatively many download requests and real-time playback requests from mobile terminals may be provided in preference to those with relatively few.
According to another aspect of the present invention, there is provided a voice data-based multimedia content tagging system comprising: a server that extracts voice keyword information from multimedia content, generates a voice tag based on the extracted voice keyword information, and tags the generated voice tag to the multimedia content; and a mobile terminal to which the tagged multimedia content is provided by the server.
According to still another aspect of the present invention, there is provided a voice data-based multimedia content tagging method comprising the steps of: a mobile terminal extracting voice keyword information from multimedia content; the mobile terminal generating a voice tag based on the extracted voice keyword information; and the mobile terminal tagging the generated voice tag to the multimedia content, wherein, when the voice keyword information is extracted, the mobile terminal generates the voice tag by matching the extracted voice keyword information with the path information of the storage path of the multimedia content.
Accordingly, a user of the mobile terminal can be provided with a search service for finding desired multimedia content.
In addition, because a search for a specific search word looks up, among the voice tags generated from voice data, those associated with the search word, a reliable search result can be obtained.
FIG. 1 is a diagram illustrating a voice data-based multimedia content tagging system according to an exemplary embodiment of the present invention.
2 is a diagram illustrating a configuration of a voice data-based multimedia content tagging system according to an embodiment of the present invention.
3 is a flowchart illustrating a voice data-based multimedia content tagging method according to an embodiment of the present invention.
4 is a diagram for explaining a data structure of a multimedia content tagged by a voice data-based multimedia content tagging method according to an embodiment of the present invention.
5 is a flowchart illustrating a voice data-based multimedia content tagging method according to an exemplary embodiment of the present invention.
FIG. 6 is a diagram for explaining a process of extracting voice keyword information in a voice data-based multimedia content tagging method according to an embodiment of the present invention.
FIG. 7 is a diagram illustrating a process of generating a voice tag using a voice data-based multimedia content tagging method according to an embodiment of the present invention.
FIG. 8 is a diagram illustrating a voice tag generated by a voice data-based multimedia content tagging method according to an embodiment of the present invention.
FIG. 9 is a flowchart illustrating a voice data-based multimedia content tagging method according to an exemplary embodiment of the present invention.
Hereinafter, the present invention will be described in detail with reference to the drawings. The embodiments described below are provided by way of example so that those skilled in the art will be able to fully understand the spirit of the present invention. The present invention is not limited to the embodiments described below and may be embodied in other forms.
FIG. 1 is a diagram illustrating a voice data-based multimedia content tagging system according to an embodiment of the present invention, and FIG. 2 is a block diagram illustrating the configuration of the system.
Hereinafter, a voice data-based multimedia content tagging system according to an embodiment of the present invention will be described with reference to FIGS. 1 and 2.
The voice data-based multimedia content tagging system according to the present embodiment generates a voice tag based on the voice data of multimedia content, tags the generated voice tag to that content, and is provided for performing searches of multimedia content.
To this end, the present voice data-based multimedia content tagging system includes a server 100 and a mobile terminal 200.
The
Specifically, the
In addition, when a search based on a specific search term is requested from the
To this end, the
The
The
The
The
In addition, the
The
The
The
The
The
In addition, the
The
FIG. 3 is a flowchart illustrating a voice data-based multimedia content tagging method according to an exemplary embodiment of the present invention, and FIG. 4 is a diagram for explaining the data structure of multimedia content tagged by the method.
Hereinafter, a voice data-based multimedia content tagging method according to the present embodiment will be described with reference to FIGS. 3 and 4.
First, the server 100 extracts voice keyword information based on the multimedia content.
Here, a morpheme is the minimal meaningful unit at the morphological level of a language, and a vocabulary (lexical) morpheme is a morpheme that expresses a specific object, action, or state.
On the other hand, when the voice keyword information is extracted, the
In another example, the
At this time, the
When the
When the voice tag is provided, the mobile terminal can receive the multimedia content linked to the URL address by decoding the URL address information area of the voice tag.
When the voice tag is generated, the
At this time, the
At this time, the encoded and tagged multimedia contents can be stored in the
As a result, the multimedia content tagged with the voice tag can be composed of a data area of the voice tag and a data area of the multimedia content, as shown in FIG. 4.
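The two-area layout described above can be sketched as a simple length-prefixed framing. This is an illustrative assumption only: the JSON tag payload, the 4-byte length header, and the function names are not the patent's actual predetermined encoding format.

```python
import json
import struct

def tag_multimedia(content: bytes, voice_tag: dict) -> bytes:
    """Prepend a length-prefixed voice-tag data area to the content data area."""
    tag_bytes = json.dumps(voice_tag).encode("utf-8")
    # 4-byte big-endian length header, then the tag area, then the content area
    return struct.pack(">I", len(tag_bytes)) + tag_bytes + content

def split_tagged(blob: bytes):
    """Recover the voice-tag area and the original content area from a tagged blob."""
    (tag_len,) = struct.unpack(">I", blob[:4])
    tag = json.loads(blob[4:4 + tag_len].decode("utf-8"))
    return tag, blob[4 + tag_len:]

blob = tag_multimedia(b"\x00fake-video-bytes",
                      {"keyword": "swing", "start": 12.0, "end": 13.5})
tag, content = split_tagged(blob)
print(tag["keyword"], content == b"\x00fake-video-bytes")
```

Framing the tag ahead of the content keeps the original media bytes intact, so a player that skips the tag area can still decode the content.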
When the multimedia contents are encoded, the
In another example, the
Here, the storage path means the path under which the multimedia content is stored as a file in the storage unit.
When the
Meanwhile, according to another embodiment of the present invention, a
When the voice keyword information is extracted, the
More specifically, when the voice keyword information is extracted, the
Here, if the voice tag is generated by matching the extracted voice keyword information and the route information, the
As a concrete example of tagging the generated voice tag to the multimedia content, the
Accordingly, when the
FIG. 5 is a flowchart illustrating a voice data-based multimedia content tagging method according to an embodiment of the present invention, FIG. 6 is a diagram for explaining the process of extracting voice keyword information in the method, and FIGS. 7 and 8 are diagrams illustrating the process of generating a voice tag by the method.
Hereinafter, the voice data-based multimedia content tagging method according to the present embodiment will be described in more detail with reference to FIGS. 5 to 8.
First, as described above, the
As shown in FIG. 6, for example, if the specific multimedia content is assumed to include the voice data "Mongryong fell in love at first sight with Chunhyang on a swing", the server can separate this voice data into morpheme units: nouns such as "Mongryong", "swing", and "Chunhyang", together with particles, verb stems, and endings.
The
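The separation-and-selection step above can be sketched as follows. A real system would use a Korean morphological analyzer; here a hand-written list of (morpheme, part-of-speech) pairs stands in for the analyzer's output, and the tag names and sentence are illustrative assumptions.

```python
# Vocabulary (lexical) morphemes express objects, actions, and states;
# particles and endings are grammatical and are not kept as keywords.
LEXICAL_TAGS = {"noun", "verb"}

# (morpheme, part-of-speech) pairs, as a morphological analyzer might return
# for the example sentence (hypothetical analysis, English glosses).
ANALYZED = [
    ("Mongryong", "noun"), ("swing", "noun"), ("ride", "verb"),
    ("-ing", "suffix"), ("Chunhyang", "noun"), ("at", "particle"),
    ("fall-in-love", "verb"), ("-ed", "suffix"),
]

def extract_voice_keywords(morphemes):
    """Keep only the morphemes whose tag marks a vocabulary morpheme."""
    return [m for m, tag in morphemes if tag in LEXICAL_TAGS]

print(extract_voice_keywords(ANALYZED))
# → ['Mongryong', 'swing', 'ride', 'Chunhyang', 'fall-in-love']
```

The grammatical morphemes ("-ing", "at", "-ed") are dropped, leaving only candidate voice keyword information.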
Meanwhile, when the
Here, FIG. 7 is a diagram schematically illustrating a time line of multimedia contents, and FIG. 8 is a view illustrating a voice tag generated by matching voice keyword information with synchronization time information. More specifically, the extracted voice keyword information is extracted voice data, and the synchronization time information of voice data is information including a synchronization start time and a synchronization end time of voice data synchronized with the timeline of the multimedia contents.
As shown in FIGS. 7 and 8, for example, assuming that the voice keyword information containing the voice data "swing" is synchronized during the period Ta, the server can generate the voice tag by matching the keyword "swing" with the synchronization time information corresponding to Ta.
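A voice tag built this way pairs each textualized keyword with the start and end of its synchronization period on the content timeline. The class name, field names, and the example times are illustrative assumptions, not a format defined by the patent.

```python
from dataclasses import dataclass

@dataclass
class VoiceTag:
    keyword: str     # textualized voice keyword information
    start_s: float   # synchronization start time on the content timeline (seconds)
    end_s: float     # synchronization end time (seconds)

def make_voice_tags(keywords_with_spans):
    """Match each keyword with its synchronization time span to form voice tags."""
    return [VoiceTag(k, s, e) for k, s, e in keywords_with_spans]

# "swing" synchronized during the period Ta = [12.0, 13.5] (illustrative times)
tags = make_voice_tags([("swing", 12.0, 13.5), ("love", 47.2, 48.0)])
print(tags[0].keyword, tags[0].start_s, tags[0].end_s)
# → swing 12.0 13.5
```

Keeping both start and end times lets a player jump directly to the moment in the content where the keyword was spoken.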
In addition, the
In another example, the
As another example, the
This means that the
Here, a keyword corresponds to an index term (headword), and a stop word is a word that is excluded from indexing and search.
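The keyword/stop-word selection described above amounts to a simple filter: anything not designated as a keyword is treated as a stop word and dropped. The keyword set and candidate list here are illustrative assumptions.

```python
# Voice keyword information designated as keywords (hypothetical example)
KEYWORDS = {"swing", "love"}

def filter_stop_words(candidates, keywords):
    """Keep only candidates set as keywords; the rest are stop words and are filtered out."""
    return [w for w in candidates if w in keywords]

print(filter_stop_words(["swing", "on", "love", "once"], KEYWORDS))
# → ['swing', 'love']
```

Only the voice data set as keywords survives to become part of the voice tag; "on" and "once" are excluded as stop words.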
In another example, the
When the specific multimedia contents are searched through the search service by matching the voice keyword information with the URL address in the voice tag, the
When the voice tag is generated, the
At this time, the
When the multimedia contents are encoded, the
When the
FIG. 9 is a flowchart illustrating a voice data-based multimedia content tagging method according to an exemplary embodiment of the present invention.
Hereinafter, with reference to FIG. 9, a voice data-based multimedia content tagging method according to the present embodiment will be described in detail.
First, the
On the other hand, when the voice keyword information is extracted, the
In addition, when the voice tag is generated, the
After tagging the voice tag to the multimedia content, if the
If there is a voice tag associated with the search word among the tagged voice tags (S450-Y), the
Specifically, the
In addition, if there are a plurality of voice tags including the same voice data as the search words, the
Here, the number of download requests and the number of real-time playback requests for multimedia content refer to the number of times downloads and real-time playbacks of that content have been requested by other mobile terminals.
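The ranking rule described in this embodiment (tags with voice data identical to the search word before merely similar ones, and among identical matches, more-requested content first) can be sketched as follows. The tuple representation and the popularity measure (downloads plus real-time plays) are illustrative assumptions.

```python
def rank_results(search_word, tagged_items):
    """Rank voice-tag results: exact keyword matches first, then by request counts.

    tagged_items: list of (keyword, download_requests, realtime_play_requests).
    """
    def key(item):
        keyword, downloads, plays = item
        exact = (keyword == search_word)   # identical voice data ranks first
        popularity = downloads + plays     # then more-requested content first
        return (not exact, -popularity)
    return sorted(tagged_items, key=key)

items = [("swinging", 900, 50), ("swing", 10, 5), ("swing", 300, 40)]
print(rank_results("swing", items))
# → [('swing', 300, 40), ('swing', 10, 5), ('swinging', 900, 50)]
```

Note that the similar match "swinging" is ranked last despite its high request counts, because identity with the search word dominates popularity.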
Accordingly, when a search associated with a specific search word is performed, the voice tags associated with that search word can be found among the voice tags generated from voice data, and a reliable search result can be obtained.
While the present invention has been particularly shown and described with reference to exemplary embodiments thereof, it is to be understood that these embodiments are illustrative only and do not limit the scope of the invention as defined by the appended claims. It will be understood by those skilled in the art that various changes in form and detail may be made without departing from the spirit and scope of the invention.
100: server 110:
120: control unit 130:
200: mobile terminal 210:
220: control unit 230:
240:
Claims (12)
The server generating voice tags based on the extracted voice keyword information;
The server tagging the generated voice tag to the multimedia content;
The mobile terminal requests the server to search based on a specific search word; And
And the server performing the requested search,
Wherein the extracting step comprises:
The server extracts voice data included in the multimedia contents in morpheme units, selects voice data corresponding to vocabulary morpheme among the separated voice data, extracts the selected voice data as the voice keyword information,
Wherein the generating comprises:
The server converts the extracted voice keyword information into text, matches the textualized voice keyword information with the synchronization time information of the voice data synchronized with the timeline of the multimedia content and with the URL address linked to the multimedia content, sets at least one piece of the textualized voice keyword information as a keyword, sets the remaining voice keyword information not set as the keyword as stop words, filters out the voice data set as stop words, and selects only the voice data set as keywords to generate the voice tag,
The generated voice tag is not tagged to the multimedia content but stored in the server in a file format separate from the multimedia content when the voice tag is generated by matching with the URL address,
Wherein the performing step comprises:
The server compares the voice tags stored in the file format with the search word to detect, among them, the voice tags associated with the search word, and provides the voice tags associated with the search word to the mobile terminal as the search result, wherein a voice tag containing voice data identical to the search word is provided in preference to a voice tag containing voice data similar to the search word,
And wherein, when there are a plurality of voice tags containing voice data identical to the search word, the voice tags of multimedia content having relatively many download requests and real-time playback requests are provided in preference to the voice tags of multimedia content having relatively few download requests and real-time playback requests.
The tagging step includes:
Wherein the server adds the generated voice tag to the multimedia content, and encodes and tags the voice tag in a predetermined format.
A mobile terminal to which the tagged multimedia content is provided;
The mobile terminal comprises:
A search request can be made to the server based on a specific search word,
The server comprises:
When a search is requested through the mobile terminal based on the specific search word, performing a requested search,
The server comprises:
Extracting voice data included in the multimedia content by morpheme units, selecting voice data corresponding to vocabulary morpheme among the separated voice data, extracting the selected voice data as the voice keyword information,
The server comprises:
The server converts the extracted voice keyword information into text, matches the textualized voice keyword information with the synchronization time information of the voice data synchronized with the timeline of the multimedia content and with the URL address linked to the multimedia content to generate the voice tag, wherein at least one piece of the textualized voice keyword information is set as a keyword, the remaining voice keyword information not set as the keyword is set as stop words, the voice data set as stop words is filtered out, and only the voice data set as keywords is selected to generate the voice tag,
The generated voice tag is not tagged to the multimedia content but stored in the server in a file format separate from the multimedia content when the voice tag is generated by matching with the URL address,
The server comprises:
Wherein the server compares the voice tags stored in the file format with the search word to detect, among them, the voice tags associated with the search word, and provides the voice tags associated with the search word to the mobile terminal as the search result, so that a voice tag containing voice data identical to the search word is provided in preference to a voice tag containing voice data similar to the search word,
And wherein, when there are a plurality of voice tags containing voice data identical to the search word, the voice tag of multimedia content having relatively many download requests and real-time playback requests is provided in preference to the voice tag of multimedia content having relatively few download requests and real-time playback requests.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020160036059A KR101832050B1 (en) | 2016-03-25 | 2016-03-25 | Tagging method for mutimedia contents base on sound data and system using the smae |
PCT/KR2017/001103 WO2017164510A2 (en) | 2016-03-25 | 2017-02-02 | Voice data-based multimedia content tagging method, and system using same |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020160036059A KR101832050B1 (en) | 2016-03-25 | 2016-03-25 | Tagging method for mutimedia contents base on sound data and system using the smae |
Publications (2)
Publication Number | Publication Date |
---|---|
KR20170111161A KR20170111161A (en) | 2017-10-12 |
KR101832050B1 true KR101832050B1 (en) | 2018-02-23 |
Family
ID=59900594
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
KR1020160036059A KR101832050B1 (en) | 2016-03-25 | 2016-03-25 | Tagging method for mutimedia contents base on sound data and system using the smae |
Country Status (2)
Country | Link |
---|---|
KR (1) | KR101832050B1 (en) |
WO (1) | WO2017164510A2 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2023233421A1 (en) * | 2022-05-31 | 2023-12-07 | Humanify Technologies Pvt Ltd | System and method for tagging multimedia content |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR102523135B1 (en) * | 2018-01-09 | 2023-04-21 | 삼성전자주식회사 | Electronic Device and the Method for Editing Caption by the Device |
CN109215657A (en) * | 2018-11-23 | 2019-01-15 | 四川工大创兴大数据有限公司 | A kind of grain depot monitoring voice robot and its application |
KR20220138512A (en) | 2021-04-05 | 2022-10-13 | 이피엘코딩 주식회사 | Image Recognition Method with Voice Tagging for Mobile Device |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2007156286A (en) * | 2005-12-08 | 2007-06-21 | Hitachi Ltd | Information recognition device and information recognizing program |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20090062371A (en) * | 2007-12-13 | 2009-06-17 | 주식회사 그래텍 | System and method for providing additional information |
CN103119621B (en) * | 2010-04-30 | 2016-12-07 | 当今技术(Ip)有限公司 | Content management device |
KR101356006B1 (en) * | 2012-02-06 | 2014-02-12 | 한국과학기술원 | Method and apparatus for tagging multimedia contents based upon voice enable of range setting |
KR20130141094A (en) * | 2012-06-15 | 2013-12-26 | 휴텍 주식회사 | Method for managing searches of web-contents using voice tags, and computer-readable recording medium with management program for the same |
- 2016-03-25: KR KR1020160036059A patent/KR101832050B1/en active IP Right Grant
- 2017-02-02: WO PCT/KR2017/001103 patent/WO2017164510A2/en active Application Filing
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2007156286A (en) * | 2005-12-08 | 2007-06-21 | Hitachi Ltd | Information recognition device and information recognizing program |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2023233421A1 (en) * | 2022-05-31 | 2023-12-07 | Humanify Technologies Pvt Ltd | System and method for tagging multimedia content |
Also Published As
Publication number | Publication date |
---|---|
WO2017164510A2 (en) | 2017-09-28 |
KR20170111161A (en) | 2017-10-12 |
WO2017164510A3 (en) | 2018-08-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11798528B2 (en) | Systems and methods for providing notifications within a media asset without breaking immersion | |
US11197036B2 (en) | Multimedia stream analysis and retrieval | |
KR101777981B1 (en) | Real-time natural language processing of datastreams | |
KR101832050B1 (en) | Tagging method for mutimedia contents base on sound data and system using the smae | |
US8374845B2 (en) | Retrieving apparatus, retrieving method, and computer program product | |
CN101778233B (en) | Data processing apparatus, data processing method | |
JP5894149B2 (en) | Enhancement of meaning using TOP-K processing | |
US9426411B2 (en) | Method and apparatus for generating summarized information, and server for the same | |
JP6337183B1 (en) | Text extraction device, comment posting device, comment posting support device, playback terminal, and context vector calculation device | |
KR20120029861A (en) | Method for providing media-content relation information, device, server, and storage medium thereof | |
US20150178387A1 (en) | Method and system of audio retrieval and source separation | |
CN107193922B (en) | A kind of method and device of information processing | |
JP2019008779A (en) | Text extraction apparatus, comment posting apparatus, comment posting support apparatus, reproduction terminal, and context vector calculation apparatus | |
JP5474591B2 (en) | Image selection apparatus, image selection method, and image selection program | |
KR101902784B1 (en) | Metohd and apparatus for managing audio data using tag data | |
US20220318283A1 (en) | Query correction based on reattempts learning | |
KR102435243B1 (en) | A method for providing a producing service of transformed multimedia contents using matching of video resources | |
US20210134290A1 (en) | Voice-driven navigation of dynamic audio files | |
JP2010283707A (en) | Onboard electronic device and image update system for the same | |
KR20220130861A (en) | Method of providing production service that converts audio into multimedia content based on video resource matching | |
KR20220130859A (en) | A method of providing a service that converts voice information into multimedia video contents | |
KR20220130862A (en) | A an apparatus for providing a producing service of transformed multimedia contents | |
KR20220130860A (en) | A method of providing a service that converts voice information into multimedia video contents | |
CN116483946A (en) | Data processing method, device, equipment and computer program product | |
Venkataraman et al. | A Natural Language Interface for Search and Recommendations of Digital Entertainment Media |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
A201 | Request for examination | ||
E902 | Notification of reason for refusal | ||
AMND | Amendment | ||
E90F | Notification of reason for final refusal | ||
AMND | Amendment | ||
E601 | Decision to refuse application | ||
E801 | Decision on dismissal of amendment | ||
AMND | Amendment | ||
X701 | Decision to grant (after re-examination) | ||
GRNT | Written decision to grant |