WO2017164510A2

WO2017164510A2 - Voice data-based multimedia content tagging method, and system using same

Info

Publication number: WO2017164510A2
Application number: PCT/KR2017/001103
Authority: WO
Inventors: 김준모
Original assignee: 김준모
Priority date: 2016-03-25
Filing date: 2017-02-02
Publication date: 2017-09-28
Also published as: WO2017164510A3; KR20170111161A; KR101832050B1

Abstract

Disclosed are: a voice data-based multimedia content tagging method for generating a voice tag on the basis of voice data of multimedia content and tagging the generated voice tag to the multimedia content; and a system using the same. The voice data-based multimedia content tagging method comprises the steps of: allowing a server to generate a voice tag on the basis of extracted voice keyword information; and allowing the server to tag the generated voice tag to multimedia content. Therefore, a search service enabling a user of a mobile terminal to search for desired multimedia content can be provided to the user. In addition, in a search related to a specific search word, a reliable search result can be acquired by searching for voice tags related to the specific search word from among voice tags generated on the basis of voice data.

Description

Method of tagging multimedia contents based on voice data and system using same

The present invention relates to a voice data-based multimedia content tagging method and a system using the same. More particularly, it relates to a voice data-based multimedia content tagging method that generates a voice tag based on voice data of multimedia content and tags the generated voice tag to the multimedia content, and to a system using the same.

In general, multimedia content refers to information service contents utilized in systems and services that integrate, create, transmit, and process various types of information such as text, voice, and video.

Such multimedia content can deliver a much larger amount of information more effectively than content consisting only of images, sound, or text, and demand for it is steadily growing relative to such content.

However, conventional methods of searching for multimedia content require the user either to find and play the actual multimedia content or to search based on descriptive content, composed of images or text, that describes the multimedia content. Such searches take a long time and often fail to find exactly the content the user wants.

To address this disadvantage, Korean Patent Registration No. 10-1403317 (Information providing system including video with tagging information) was devised, which includes tag information on what appears as an image in the multimedia and provides that information to the user. However, because the tag information is composed of images, the user must check the images one by one to find the desired multimedia content, and because the search uses only those images in the multimedia content that are stored in the tag information, the search results cannot be trusted.

Therefore, there is a need for a way to provide a search service with which the user can find desired multimedia content by using tag information, without checking the contents of the multimedia content one by one, and which returns reliable search results.

The present invention has been made to solve the above problems, and an object of the present invention is to provide a voice data-based multimedia content tagging method and system that generate a voice tag based on the voice data of multimedia content and tag the generated voice tag to the multimedia content.

Further, another object of the present invention is to provide a voice data-based multimedia content tagging method and system capable of searching for multimedia content associated with a specific search word based on a voice tag.

According to an embodiment of the present invention, a voice data-based multimedia content tagging method includes: extracting, by a server, voice keyword information based on multimedia content; generating, by the server, a voice tag based on the extracted voice keyword information; and tagging, by the server, the generated voice tag to the multimedia content.

In the extracting step, the server may separate the voice data included in the multimedia content into morpheme units, select voice data corresponding to lexical morphemes from the separated voice data, and extract the selected voice data as the voice keyword information.

In the generating step, the server may convert the extracted voice keyword information into text and generate the voice tag by matching the text-converted voice keyword information with synchronization time information of the voice data synchronized to a timeline of the multimedia content.

In the tagging step, the server may add the generated voice tag to the multimedia content and encode the tag in a predetermined format.

The generating may include generating the voice tag by matching the extracted voice keyword information with synchronization time information of the voice data synchronized to the timeline of the multimedia content.

The generating may include generating the voice tag by matching the extracted voice keyword information with the synchronization time information of the voice data synchronized to the timeline of the multimedia content and with the URL address to which the multimedia content is linked.

In the generating step, the server may convert the extracted voice keyword information into text and set at least one item of the text-converted voice keyword information as a keyword. When the remaining voice keyword information not set as the keyword is set as stop words, the server may generate the voice tag by filtering out the voice data set as stop words and selecting only the voice data set as the keyword.

In addition, the voice data-based multimedia content tagging method according to the present embodiment may further include: requesting, by a mobile terminal, a search based on a specific search word from the server; and performing, by the server, the requested search.

Here, in the performing of the search, the server may compare the tagged voice tag with the search word to detect a voice tag associated with the search word among the tagged voice tags, thereby performing the search.

In the performing step, the server may detect a voice tag associated with the search word and provide it to the mobile terminal as a result of the search, wherein a voice tag including voice data identical to the search word is provided in preference to a voice tag including voice data similar to the search word. When there are a plurality of voice tags including voice data identical to the search word, the server may provide the voice tag of multimedia content having relatively many download requests and real-time playback requests in preference to the voice tag of multimedia content having relatively few download requests and real-time playback requests.

Meanwhile, a voice data-based multimedia content tagging system according to an embodiment of the present invention for achieving the above object includes: a server that extracts voice keyword information based on multimedia content, generates a voice tag based on the extracted voice keyword information, and tags the generated voice tag to the multimedia content; and a mobile terminal that is provided with the tagged multimedia content from the server.

In addition, a voice data-based multimedia content tagging method according to another embodiment of the present invention for achieving the above object includes: extracting, by a mobile terminal, voice keyword information based on multimedia content; generating, by the mobile terminal, a voice tag based on the extracted voice keyword information; and tagging, by the mobile terminal, the generated voice tag to the multimedia content, wherein, in the generating step, when the voice keyword information is extracted, the mobile terminal may generate the voice tag by matching the extracted voice keyword information with path information on the storage path of the multimedia content.

As a result, a search service for searching for multimedia content desired by a user can be provided to a user of the mobile terminal.

In addition, during a search associated with a specific search word, a reliable search result may be obtained by searching for a voice tag associated with a specific search word among voice tags generated based on voice data.

FIG. 1 is a diagram illustrating a voice data-based multimedia content tagging system according to an embodiment of the present invention.

FIG. 2 is a diagram illustrating the configuration of a voice data-based multimedia content tagging system according to an embodiment of the present invention.

FIG. 3 is a flowchart illustrating a voice data-based multimedia content tagging method according to an embodiment of the present invention.

FIG. 4 is a diagram illustrating a data structure of multimedia content tagged using a voice data-based multimedia content tagging method according to an embodiment of the present invention.

FIG. 5 is a flowchart illustrating a voice data-based multimedia content tagging method according to an embodiment of the present invention in more detail.

FIG. 6 is a diagram illustrating a process of extracting voice keyword information in a voice data-based multimedia content tagging method according to an embodiment of the present invention.

FIG. 7 is a diagram illustrating a process of generating a voice tag using a voice data-based multimedia content tagging method according to an embodiment of the present invention.

FIG. 8 is a diagram illustrating a process of generating a voice tag using a voice data-based multimedia content tagging method according to an embodiment of the present invention.

FIG. 9 is a flowchart illustrating a voice data-based multimedia content tagging method according to an embodiment of the present invention in more detail.

Hereinafter, the present invention will be described in more detail with reference to the drawings. The embodiments introduced below are provided as examples to sufficiently convey the spirit of the present invention to those skilled in the art to which the present invention pertains. The invention is not limited to the embodiments described below and may be embodied in other forms.

FIG. 1 is a diagram illustrating a voice data-based multimedia content tagging system according to an embodiment of the present invention, and FIG. 2 is a diagram illustrating the configuration of that system.

Hereinafter, a voice data based multimedia content tagging system according to the present embodiment will be described with reference to FIGS. 1 and 2.

According to the present embodiment, the voice data-based multimedia content tagging system is provided to generate a voice tag based on voice data of multimedia content, tag the generated voice tag to the multimedia content, and search for multimedia content associated with a specific search word based on the tagged voice tag.

To this end, the present voice data-based multimedia content tagging system includes a server 100 and a mobile terminal 200.

The server 100 is provided to tag the voice tag to the multimedia content and to search for the multimedia content associated with the specific search word based on the tagged voice tag.

In detail, the server 100 may extract voice keyword information based on the multimedia content, generate a voice tag based on the extracted voice keyword information, and tag the generated voice tag on the multimedia content.

In addition, when a search based on a specific search word is requested from the mobile terminal 200, the server 100 may search for multimedia content associated with the specific search word based on the tagged voice tag.

To this end, the server 100 includes a communication unit 110, a control unit 120 and a storage unit 130.

The communication unit 110 of the server is provided to communicate with external devices and the mobile terminal 200 over a communication network. For example, the communication unit 110 may provide tagged multimedia content to the mobile terminal 200.

The controller 120 of the server is provided to extract voice keyword information based on the multimedia content, generate a voice tag based on the extracted voice keyword information, and tag the generated voice tag on the multimedia content.

In addition, when a search based on a specific search word is requested from the mobile terminal 200, the controller 120 may search for multimedia content associated with the specific search word based on the tagged voice tag.

The storage unit 130 of the server is provided to store multimedia content tagged with a voice tag.

In addition, the storage unit 130 may store data for a search service for multimedia content and a URL address to which the multimedia content is linked.

The mobile terminal 200 is provided to communicate with the server 100 over a communication network. It may be provided with tagged multimedia content from the server 100 and may request the server 100 to perform a search based on a specific search word.

To this end, the mobile terminal 200 includes a communication unit 210, a control unit 220, a storage unit 230, and a display unit 240.

The communication unit 210 of the mobile terminal is provided to communicate with the server 100 over a communication network. For example, the communication unit 210 may request a search from the server 100 based on a specific search word or receive multimedia content provided from the server 100.

The controller 220 of the mobile terminal is provided to control the overall operation of the mobile terminal 200. For example, when an input signal requesting a search based on a specific search word is entered through a separate input unit, the controller 220 may send a search request based on that search word to the server 100 through the communication unit 210.

The storage unit 230 of the mobile terminal is provided to store various programs necessary for driving the mobile terminal 200.

In addition, the storage unit 230 may store data of a search service for searching for multimedia content or multimedia content provided from the server 100.

The display unit 240 of the mobile terminal is provided for outputting image information to be output by the mobile terminal 200. For example, the display 240 may output multimedia content provided from the server 100.

FIG. 3 is a flowchart illustrating a voice data-based multimedia content tagging method according to an embodiment of the present invention, and FIG. 4 is a diagram illustrating the data structure of multimedia content tagged using that method.

Hereinafter, the voice data-based multimedia content tagging method according to the present embodiment will be described with reference to FIGS. 3 and 4.

First, the server 100 extracts voice keyword information based on the multimedia content (S110). For example, the server 100 may separate the voice data included in the multimedia content into morpheme units, select voice data corresponding to lexical morphemes from the separated voice data, and extract the selected voice data as voice keyword information.

Here, morphemes are the smallest units of language that carry meaning at the morphological level, and lexical morphemes are morphemes that represent specific objects, actions, and states.

Meanwhile, when the voice keyword information is extracted, the server 100 generates a voice tag based on the extracted voice keyword information (S120). For example, the server 100 may generate the voice tag by matching the extracted voice keyword information with the synchronization time information of the voice data.

In another example, the server 100 may generate the voice tag by matching the extracted voice keyword information with the synchronization time information of the voice data and the URL address information to which the multimedia content is linked.

In this case, the server 100 may store the generated voice tag separately without tagging the multimedia content. In more detail, the voice tag may be stored in the storage 130 of the server in a file format separate from the multimedia content.

When the mobile terminal 200 searches for the multimedia content, the server 100 may provide the mobile terminal 200 with a voice tag of the multimedia content corresponding to the search condition.

When the voice tag is provided, the mobile terminal may receive the multimedia content linked to the URL address by decoding the URL address information area of the voice tag.
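
The separate-storage variant described above can be sketched in Python as follows. This is only an illustrative sketch: the JSON record layout, the field names `keyword`, `start`, `end`, and `url`, the helper names, and the example.com address are all assumptions made for illustration, since the source does not specify a file format for separately stored voice tags.

```python
import json


def store_voice_tags(voice_tags):
    """Serialize voice tags to a string kept apart from the media file (JSON is an assumption)."""
    return json.dumps(voice_tags, ensure_ascii=False)


def find_voice_tags(stored, search_word):
    """Server side: return the stored voice tags whose keyword equals the search word."""
    return [tag for tag in json.loads(stored) if tag["keyword"] == search_word]


def content_url(voice_tag):
    """Terminal side: read the URL address area of a received voice tag."""
    return voice_tag["url"]


# Hypothetical voice tags for one piece of multimedia content.
stored = store_voice_tags([
    {"keyword": "swing", "start": "16:30", "end": "16:42",
     "url": "https://example.com/media/chunhyang.mp4"},
    {"keyword": "Chunhyang", "start": "17:30", "end": "18:22",
     "url": "https://example.com/media/chunhyang.mp4"},
])

hits = find_voice_tags(stored, "swing")
urls = [content_url(tag) for tag in hits]
```

In this sketch the media file itself is never modified; the terminal only needs the URL carried in the voice tag to request the content.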

In addition, when the voice tag is generated, the server 100 tags the generated voice tag to the multimedia content (S130). For example, the server 100 may add the generated voice tag to the multimedia content and encode the tag in a predetermined format.

At this time, the server 100 may preset the format of the multimedia content, thereby standardizing the format of the multimedia content encoded in various formats.

In this case, the encoded and tagged multimedia content may be stored in the server 100 as a new file.

Through this, the multimedia content tagged with the voice tag may be composed of a data area of the voice tag and a data area of the multimedia content as shown in FIG. 4.
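
One way to picture a file made of a voice tag data area followed by a multimedia content data area is the sketch below. The source only says that the tag is added and the result is encoded in a predetermined format; the `VTAG` marker, the 4-byte length prefix, and the JSON encoding of the tag area are hypothetical choices made here for illustration.

```python
import json
import struct

MAGIC = b"VTAG"  # hypothetical marker identifying a tagged file


def tag_content(content: bytes, voice_tags: list) -> bytes:
    """Prepend a voice tag data area to the multimedia content data area."""
    tag_area = json.dumps(voice_tags, ensure_ascii=False).encode("utf-8")
    header = MAGIC + struct.pack(">I", len(tag_area))  # big-endian tag-area length
    return header + tag_area + content


def read_tagged(blob: bytes):
    """Split a tagged file back into (voice_tags, content)."""
    if blob[:4] != MAGIC:
        raise ValueError("not a tagged multimedia file")
    (tag_len,) = struct.unpack(">I", blob[4:8])
    tag_area = blob[8:8 + tag_len]
    content = blob[8 + tag_len:]
    return json.loads(tag_area.decode("utf-8")), content


media = b"\x00\x01fake-media-bytes"  # stand-in for encoded audio/video data
tags = [{"keyword": "swing", "start": 990, "end": 1002}]
blob = tag_content(media, tags)
restored_tags, restored_media = read_tagged(blob)
```

Keeping the tag area at the front lets a reader inspect the tags without scanning the media payload, which is one plausible reason to separate the two areas.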

In addition, when the multimedia content is already encoded, the server 100 may decode the encoded content, add the generated voice tag, and re-encode the result in the predetermined format.

For another example, the server 100 may generate a voice tag by matching the extracted voice keyword information with path information on a storage path of the multimedia content.

Here, the storage path refers to a storage path of a file in which multimedia content is stored in a storage unit in the form of a file.

When the mobile terminal 200 requests the server 100 to search based on a specific search word (S140), the server 100 performs the requested search (S150). For example, the server 100 may perform a search by comparing a tagged voice tag with a search word and detecting a voice tag associated with the search word among the tagged voice tags.

Meanwhile, in a voice data-based multimedia content tagging method according to another embodiment of the present invention, the mobile terminal 200 may install an application for performing the voice data-based multimedia content tagging method and, by running the installed application, extract voice keyword information based on the multimedia content.

When the voice keyword information is extracted, the mobile terminal 200 may generate a voice tag based on the extracted voice keyword information.

In detail, when the voice keyword information is extracted, the mobile terminal 200 may generate the voice tag by matching the extracted voice keyword information with the path information of the multimedia content.

Here, having generated the voice tag by matching the extracted voice keyword information with the path information, the mobile terminal 200 may tag the voice tag to the multimedia content or store it separately without tagging the multimedia content.

As a specific example of the tagging of the generated voice tag to the multimedia content, the mobile terminal 200 may add the generated voice tag to the multimedia content and encode the tag in the predetermined format. In this case, the encoded and tagged multimedia content may be stored in the mobile terminal 200 as a new file.

As a result, when the mobile terminal 200 searches for tagged multimedia content, it selects, from among the multimedia content stored in the mobile terminal 200, the multimedia content whose tagged voice tag matches the search condition in its voice keyword information area. By reading the path information area of that voice tag, the multimedia content can then be retrieved and played on the mobile terminal 200.
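
The on-device lookup can be illustrated with a minimal sketch. The `LocalVoiceTag` type, the `find_paths` helper, the case-insensitive comparison, and the example file paths are assumptions introduced here; the source states only that the voice keyword information area is matched against the search condition and that the path information area is then read.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalVoiceTag:
    keyword: str  # voice keyword information area
    path: str     # path information area (storage path of the content file)


def find_paths(search_word, tags):
    """Return the storage paths of content whose voice tag matches the search word."""
    paths = []
    for tag in tags:
        # Case-insensitive equality is an assumption of this sketch.
        if tag.keyword.lower() == search_word.lower() and tag.path not in paths:
            paths.append(tag.path)
    return paths


# Hypothetical tags stored on the terminal.
local_tags = [
    LocalVoiceTag("swing", "/storage/videos/chunhyang.mp4"),
    LocalVoiceTag("love", "/storage/videos/chunhyang.mp4"),
    LocalVoiceTag("swing", "/storage/videos/playground.mp4"),
]

matches = find_paths("Swing", local_tags)
```

Each returned path could then be handed to the terminal's media player to open the file.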

FIG. 5 is a flowchart illustrating the voice data-based multimedia content tagging method according to an embodiment of the present invention in more detail, FIG. 6 is a diagram illustrating a process of extracting voice keyword information in that method, and FIGS. 7 and 8 are diagrams illustrating a process of generating a voice tag using that method.

Hereinafter, with reference to FIGS. 5 to 8, a voice data-based multimedia content tagging method according to the present embodiment will be described in more detail.

First, as described above, the server 100 may separate the voice data included in the multimedia content into morpheme units (S210), select voice data corresponding to lexical morphemes from the separated voice data (S220), and extract the selected voice data as voice keyword information (S230).

Specifically, as shown in FIG. 6, assume for example that a specific multimedia content includes the voice data "Mongryong fell in love at first sight on seeing Chunhyang riding a swing." The server 100 may separate this voice data into morpheme units (S210): words and stems such as "Mongryong", "swing", "ride", "Chunhyang", "see", "at once", "love", and "fall", together with the particles and endings attached to them.

The server 100 may then select, from the separated voice data, the voice data corresponding to lexical morphemes, namely "Mongryong", "swing", "Chunhyang", "see", "love", and "fall" (S220), and extract the selected voice data as voice keyword information (S230).
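
Steps S210 to S230 can be sketched as follows, under the assumption that a morphological analyzer has already split the speech into morphemes and labelled each one with a word class. The `Morpheme` type, the class labels, and the simplified morpheme list (English glosses of the example sentence) are illustrative assumptions; the source names neither an analyzer nor a tag set.

```python
from dataclasses import dataclass

# Word classes treated as lexical morphemes in this sketch (an assumption).
LEXICAL_CLASSES = {"noun", "verb"}


@dataclass(frozen=True)
class Morpheme:
    surface: str     # the morpheme as spoken
    word_class: str  # e.g. "noun", "verb", "particle", "ending"


def extract_voice_keywords(morphemes):
    """Select lexical morphemes (S220) and return them as keywords (S230)."""
    return [m.surface for m in morphemes if m.word_class in LEXICAL_CLASSES]


# Simplified, hypothetical analyzer output for the example sentence (S210).
separated = [
    Morpheme("Mongryong", "noun"), Morpheme("(subject particle)", "particle"),
    Morpheme("swing", "noun"), Morpheme("(object particle)", "particle"),
    Morpheme("Chunhyang", "noun"), Morpheme("(object particle)", "particle"),
    Morpheme("see", "verb"), Morpheme("(connective ending)", "ending"),
    Morpheme("love", "noun"), Morpheme("(locative particle)", "particle"),
    Morpheme("fall", "verb"), Morpheme("(final ending)", "ending"),
]

keywords = extract_voice_keywords(separated)
```

Particles and endings carry grammatical rather than lexical meaning, so the filter drops them and keeps only the morphemes that name objects, actions, or states.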

Meanwhile, when the voice keyword information is extracted, the server 100 may generate the voice tag by matching the extracted voice keyword information with the synchronization time information of the voice data (S240).

FIG. 7 is a diagram schematically illustrating a timeline of multimedia content, and FIG. 8 illustrates a voice tag generated by matching voice keyword information with synchronization time information. Specifically, the voice keyword information is the extracted voice data, and the synchronization time information of the voice data is information including a synchronization start time and a synchronization end time of the voice data synchronized to the timeline of the multimedia content.

As shown in FIGS. 7 and 8, assume for example that the voice keyword information containing the voice data "swing" is synchronized during interval Ta and the voice keyword information containing the voice data "Chunhyang" is synchronized during interval Tb. In this case, the server 100 may generate a voice tag by matching the voice keyword information containing the voice data "swing" with synchronization time information indicating that the voice data "swing" is synchronized from T1 (16:30) to T2 (16:42).

Likewise, the server 100 may generate a voice tag by matching the voice keyword information containing the voice data "Chunhyang" with synchronization time information indicating that the voice data "Chunhyang" is synchronized from T3 (17:30) to T4 (18:22).
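
The matching of a keyword with its synchronization start and end times can be sketched as below. The `VoiceTag` type and helper names are hypothetical, and reading "16:30" as minutes and seconds on the timeline is an assumption; the source gives the time values without stating their unit.

```python
from dataclasses import dataclass


def to_seconds(stamp: str) -> int:
    """Convert an 'MM:SS' timeline position to seconds (the unit is an assumption)."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


@dataclass(frozen=True)
class VoiceTag:
    keyword: str     # voice keyword information
    sync_start: int  # synchronization start time, in seconds
    sync_end: int    # synchronization end time, in seconds


def make_voice_tag(keyword: str, start: str, end: str) -> VoiceTag:
    """Match a keyword with the interval during which it is spoken (S240)."""
    start_s, end_s = to_seconds(start), to_seconds(end)
    if end_s < start_s:
        raise ValueError("synchronization end precedes start")
    return VoiceTag(keyword, start_s, end_s)


voice_tags = [
    make_voice_tag("swing", "16:30", "16:42"),      # interval Ta: T1 to T2
    make_voice_tag("Chunhyang", "17:30", "18:22"),  # interval Tb: T3 to T4
]
```

Storing the interval alongside the keyword would let a player jump directly to the point on the timeline where the keyword is spoken.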

For another example, the server 100 may convert the extracted voice keyword information into text and generate a voice tag by matching the text-converted voice keyword information with the synchronization time information of the voice data.

As another example, the server 100 may convert the extracted voice keyword information into text and set at least one item of the text-converted voice keyword information as a keyword. When the remaining voice keyword information not set as the keyword is set as stop words, the server 100 may generate the voice tag by filtering out the voice data set as stop words and selecting only the voice data set as the keyword.

That is, suppose the server 100 selects the voice data corresponding to lexical morphemes, such as "Mongryong", "swing", "Chunhyang", "see", "love", and "fall", from the separated voice data, extracts it as voice keyword information, and converts it into text. If the text-converted voice keyword information "Chunhyang" is set as the keyword, the remaining text-converted voice keyword information not set as the keyword ("Mongryong", "swing", "see", "love", and "fall") is excluded as stop words, and the voice tag may be generated by selecting only the text-converted voice data "Chunhyang" set as the keyword.

Here, a keyword means a headword, and a stop word means a word that is excluded from use.
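
The keyword and stop-word filtering can be sketched in a few lines. The function name `split_keywords` and the use of English glosses for the example words are assumptions for illustration; how the keyword is chosen is not specified in the source, so the sketch simply takes it as input.

```python
def split_keywords(text_keywords, chosen_keywords):
    """Separate text-converted voice keywords into selected keywords and stop words."""
    selected, stop_words = [], []
    for word in text_keywords:
        if word in chosen_keywords:
            selected.append(word)    # kept for the voice tag
        else:
            stop_words.append(word)  # filtered out
    return selected, stop_words


text_keywords = ["Mongryong", "swing", "Chunhyang", "see", "love", "fall"]
selected, stop_words = split_keywords(text_keywords, {"Chunhyang"})
```

Only the words in `selected` would go on to be matched with synchronization time information when the voice tag is generated.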

As another example, the server 100 may generate the voice tag by matching the extracted voice keyword information with the synchronization time information of the voice data and the URL address to which the multimedia content is linked.

By matching the URL address to the voice tag in this way, when specific multimedia content is found through the search service, the mobile terminal 200 can receive the multimedia content linked to the URL address based on the voice tag of the retrieved multimedia content.

When the voice tag is generated, the server 100 may add the generated voice tag to the multimedia content, encode the tag in a predetermined format, and tag it (S250).

When the mobile terminal 200 requests a search from the server 100 based on a specific search word (S260), the server 100 may perform the requested search (S270). Here, the search word may be any natural-language word.

Hereinafter, referring to FIG. 9, a voice data-based multimedia content tagging method according to the present embodiment will be described in more detail.

First, the server 100 may extract voice keyword information based on the multimedia content as described above (S410).

Meanwhile, when the voice keyword information is extracted, the server 100 may generate a voice tag based on the extracted voice keyword information (S420).

In addition, when the voice tag is generated, the server 100 may tag the generated voice tag to the multimedia content (S430).

After the voice tag is tagged to the multimedia content, when the mobile terminal 200 requests the server 100 to search based on a specific search word (S440), the server 100, in order to perform the requested search, compares the tagged voice tags with the search word received from the mobile terminal 200 and determines whether any of the tagged voice tags is associated with the search word (S450).

If there is a voice tag associated with the search word among the tagged voice tags (S450-Y), the server 100 may perform the search by detecting that voice tag.

In detail, the server 100 first provides voice tags including voice data identical to the search word (S460) and then provides voice tags including voice data similar to the search word (S470).

In addition, when there are a plurality of voice tags including voice data identical to the search word, the server 100 provides, among those voice tags, the voice tag of multimedia content having relatively many download requests and real-time playback requests in preference to the voice tag of multimedia content having relatively few download requests and real-time playback requests.

Here, the number of download requests and the number of real-time playback requests of the multimedia content refer to the number of times a download has been requested and the number of times real-time playback has been requested by other mobile terminals 200.
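
The ordering in S460 and S470 can be sketched as follows. Several details here are assumptions rather than part of the source: similarity is measured with Python's `difflib.SequenceMatcher` and a threshold of 0.6, the two request counts are simply summed into one popularity score, and the `TaggedContent` type and example data are hypothetical.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class TaggedContent:
    title: str
    keyword: str            # keyword carried by the voice tag
    download_requests: int  # download requests from other terminals
    play_requests: int      # real-time playback requests from other terminals


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]; the measure itself is an assumption."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def search(search_word, tagged, threshold=0.6):
    """Return identical matches first (S460), then similar matches (S470)."""
    exact, similar = [], []
    for item in tagged:
        if item.keyword.lower() == search_word.lower():
            exact.append(item)
        elif similarity(item.keyword, search_word) >= threshold:
            similar.append(item)
    # Among identical matches, more requested content comes first.
    exact.sort(key=lambda t: t.download_requests + t.play_requests, reverse=True)
    similar.sort(key=lambda t: similarity(t.keyword, search_word), reverse=True)
    return exact + similar


catalog = [
    TaggedContent("clip A", "swing", 10, 5),
    TaggedContent("clip B", "swings", 500, 500),
    TaggedContent("clip C", "swing", 100, 50),
    TaggedContent("clip D", "love", 900, 900),
]

results = [item.title for item in search("swing", catalog)]
```

Note that in this sketch popularity never lifts a merely similar match above an identical one, which mirrors the stated preference order.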

As a result, during a search associated with a specific search word, a voice tag associated with a specific search word may be searched among voice tags generated based on voice data, and a reliable search result may be obtained through the search.

While preferred embodiments of the present invention have been shown and described above, the present invention is not limited to the specific embodiments described. Various modifications can be made by those of ordinary skill in the art to which the invention belongs without departing from the spirit of the invention as claimed, and such modifications should not be understood separately from the technical spirit or scope of the present invention.

Claims

A voice data-based multimedia content tagging method comprising:

extracting, by a server, voice keyword information based on multimedia content;

generating, by the server, a voice tag based on the extracted voice keyword information; and

tagging, by the server, the generated voice tag to the multimedia content.
The method of claim 1,

wherein, in the extracting step,

the server separates the voice data included in the multimedia content into morpheme units, selects voice data corresponding to lexical morphemes from the separated voice data, and extracts the selected voice data as the voice keyword information.
The method of claim 2,

wherein, in the generating step,

the server generates the voice tag by converting the extracted voice keyword information into text and matching the text-converted voice keyword information with synchronization time information of the voice data synchronized to a timeline of the multimedia content.
The method of claim 3,

wherein, in the tagging step,

the server adds the generated voice tag to the multimedia content, encodes the result in a predetermined format, and thereby tags the multimedia content.
The method of claim 2,

wherein, in the generating step,

the server generates the voice tag by matching the extracted voice keyword information with the synchronization time information of the voice data synchronized to the timeline of the multimedia content.
The method of claim 2,

wherein, in the generating step,

the server generates the voice tag by matching the extracted voice keyword information with the synchronization time information of the voice data synchronized to the timeline of the multimedia content and with the URL address to which the multimedia content is linked.
The method of claim 2,

wherein, in the generating step,

the server converts the extracted voice keyword information into text and sets at least one item of the text-converted voice keyword information as a keyword, and

when the remaining voice keyword information not set as the keyword is set as stop words, the server generates the voice tag by filtering out the voice data set as stop words and selecting only the voice data set as the keyword.
The method of claim 1,

further comprising: requesting, by a mobile terminal, a search based on a specific search word from the server; and

performing, by the server, the requested search.
The method of claim 8,

wherein, in the performing step,

the server performs the search by comparing the tagged voice tags with the search word and detecting a voice tag associated with the search word among the tagged voice tags.
The method of claim 9,

wherein, in the performing step,

the server detects a voice tag associated with the search word and provides the voice tag associated with the search word to the mobile terminal as a result of the search,

a voice tag including voice data identical to the search word being provided in preference to a voice tag including voice data similar to the search word, and

when there are a plurality of voice tags including voice data identical to the search word, the voice tag of multimedia content having relatively many download requests and real-time playback requests from other mobile terminals, among those voice tags, being provided in preference to the voice tag of multimedia content having relatively few download requests and real-time playback requests.
A voice data-based multimedia content tagging system comprising: a server that extracts voice keyword information based on multimedia content, generates a voice tag based on the extracted voice keyword information, and tags the generated voice tag to the multimedia content; and

a mobile terminal that is provided with the tagged multimedia content from the server.
A voice data-based multimedia content tagging method comprising: extracting, by a mobile terminal, voice keyword information based on multimedia content;

generating, by the mobile terminal, a voice tag based on the extracted voice keyword information; and

tagging, by the mobile terminal, the generated voice tag to the multimedia content,

wherein, in the generating step,

when the voice keyword information is extracted, the mobile terminal generates the voice tag by matching the extracted voice keyword information with path information on the storage path of the multimedia content.