KR20120070850A

KR20120070850A - System and method for generating content tag with web mining

Info

Publication number: KR20120070850A
Application number: KR1020100132344A
Authority: KR
Inventors: 장두성
Original assignee: 주식회사 케이티
Priority date: 2010-12-22
Filing date: 2010-12-22
Publication date: 2012-07-02
Also published as: KR101285721B1

Abstract

PURPOSE: A content tag generating system and a method thereof using web mining are provided to generate a tag of video content which is supplied in an IPTV service by using information collected from web content. CONSTITUTION: A web mining executing unit(100) collects one or more web contents which are matched with video content. A material information generating unit(200) generates material information which is matched with the video content. A genre information generating unit(400) generates genre information which is matched with the video content. An emotion information generating unit(600) generates emotion information which is matched with the video content. A content tag generating unit(800) generates tag information.

Description

System and method for generating content tag using web mining {SYSTEM AND METHOD FOR GENERATING CONTENT TAG WITH WEB MINING}

The present invention relates to a system and method for generating content tags, and more particularly to a system and method that generate content tags from information about the content collected through web mining.

Tag information is attached to video content and expresses what the content is about in text form. An IPTV service can use the tag information attached to a video to provide various services such as content-based search, recommendation, and advertising.

In the prior art, tag information describing video content was generally created simply through input by an administrator, or collected and generated through the collaboration of users of portal sites and the like.

However, the content for which tag-generation information can be collected through portal sites is limited to well-known movies, documentaries, dramas, and the like. For the hundreds of thousands of content items offered on IPTV, it is therefore nearly impossible to collect information through user collaboration on portal sites, or for an IPTV service administrator to enter it directly.

Therefore, the prior art has a problem in that tag information cannot be generated for the content provided by IPTV, so that various services that rely on content tag information cannot be provided for lack of that information.

An embodiment of the present invention aims to provide a system and method that, rather than having tags created manually by portal site users or IPTV administrators, use web mining to collect information on the web related to the video content to be tagged, apply the collected information to a pre-built material ontology, genre ontology, and emotion ontology, and generate a tag containing information that matches the material, genre, and emotion of the video content.

As a technical means for achieving the above technical object, a first aspect of the present invention may provide a content tag generation system using web mining, comprising: a web mining executing unit that collects one or more items of web content matching the video content to be tagged; a material information generating unit that generates material information matching the video content based on a pre-built material ontology and the collected web content; a genre information generating unit that generates genre information matching the video content based on the material information generated by the material information generating unit, a pre-built genre ontology, production information of the video content, and the collected web content; an emotion information generating unit that generates emotion information matching the video content based on a pre-built emotion ontology and the collected web content, and generates an emotion vector containing the magnitude of the generated emotion information; and a content tag generating unit that generates tag information matching the video content, the tag information including the generated material information, genre information, emotion information, and vector information.

In the first aspect of the present invention, the material information generating unit may include: a semantic word extraction unit that extracts semantic words, such as nouns and compound nouns, from the web content based on the result of morphological analysis performed on the words and sentences contained in the collected web content; a key word selection unit that selects key words based on the term frequency of the extracted semantic words in the web content; and a material information extraction unit that obtains, from the material ontology, representative material words matching the selected key words and extracts material information matching the video content based on the obtained representative material words.

Also in the first aspect of the present invention, the genre information generating unit may include: a semantic word extraction unit that extracts semantic words from the web content based on the result of morphological analysis performed on the words and sentences contained in the collected web content; a key word selection unit that selects key words based on the term frequency of the extracted semantic words in the web content; and a genre information extraction unit that classifies the video content into one or more genres defined in the genre ontology based on the selected key words, the material information, and the actor and director information of the video content.

Also in the first aspect of the present invention, the emotion information generating unit may include: a semantic word extraction unit that extracts semantic words such as nouns and compound nouns, together with adjectives and adverbs, from the web content based on the result of morphological analysis performed on the words and sentences contained in the collected web content; a key word selection unit that selects key words based on the term frequency of the extracted semantic words in the web content; an emotion information extraction unit that extracts emotion information matching the video content by matching the selected emotion key words to one or more preset emotion categories based on the emotion ontology; and an emotion vector generating unit that generates, based on the selected emotion key words, an emotion vector containing the magnitude of the extracted emotion information.

A second aspect of the present invention may provide a method by which a content tag generation system generates content tags using web mining, the method comprising: (a) collecting, by web crawling, one or more items of web content matching the content to be tagged; (b) extracting keywords consisting of nouns and compound nouns from the collected web content, and generating material information based on the extracted keywords and a pre-built material ontology; (c) extracting keywords consisting of nouns and compound nouns from the collected web content, and generating genre information based on the material information generated in step (b), the extracted keywords, production information of the content, and a pre-built genre ontology; (d) extracting keywords consisting of nouns, compound nouns, adjectives, and adverbs from the collected web content, and generating emotion information based on the extracted keywords and a pre-built emotion ontology; and (e) generating tag information matching the content to be tagged, the tag information including the material information generated in step (b), the genre information generated in step (c), and the emotion information generated in step (d).

According to the above means for solving the problem, tags containing information in three areas (material, genre, and emotion) can be generated for the video content provided by an IPTV service using information collected from web content, so tags can be generated more easily even for a large number of content items.

In addition, according to the above means for solving the problem, tags are generated from the keywords that appear most frequently in the web content collected on the web, together with the material, emotion, and genre ontologies, so the generated tags contain information that accurately matches the video content.

In addition, according to the above means for solving the problem, the subject matter of the video content is divided into three areas (material, genre, and emotion), and the related information is extracted and included in the tag. A service can therefore pick, from the material, genre, and emotion information, whatever suits its purpose, which maximizes the effectiveness of services built on the tags, such as content-based search and recommendation and context-based advertising.

FIG. 1 is a diagram showing the configuration of a content tag generation system using web mining according to an embodiment of the present invention.
FIG. 2 is a diagram showing the configuration of a material information generating unit according to an embodiment of the present invention.
FIG. 3 is a diagram showing the configuration of a genre information generating unit according to an embodiment of the present invention.
FIG. 4 is a diagram showing the configuration of an emotion information generating unit according to an embodiment of the present invention.
FIG. 5 is a flowchart showing the flow of a method for generating a content tag using web mining according to an embodiment of the present invention.
FIG. 6 is a flowchart showing the flow of a method for generating material information according to an embodiment of the present invention.
FIG. 7 is a flowchart showing the flow of a method for generating genre information according to an embodiment of the present invention.
FIG. 8 is a flowchart showing the flow of a method for generating emotion information according to an embodiment of the present invention.
FIG. 9 is a diagram showing an emotion vector of video content generated according to an embodiment of the present invention.

Embodiments of the present invention are described in detail below with reference to the accompanying drawings so that those of ordinary skill in the art to which the invention pertains can readily practice it. The present invention may, however, be embodied in many different forms and is not limited to the embodiments described here. In the drawings, parts unrelated to the description are omitted for clarity, and like reference numerals denote like parts throughout the specification.

Throughout the specification, when a part is said to be "connected" to another part, this includes not only the case where it is "directly connected" but also the case where it is "electrically connected" with another element in between. Likewise, when a part is said to "include" a component, this means that it may further include other components, not that other components are excluded, unless specifically stated otherwise.

FIG. 1 is a diagram showing the configuration of a content tag generation system using web mining according to an embodiment of the present invention.

The content tag generation system (10) according to an embodiment of the present invention includes a web mining executing unit (100), a material information generating unit (200), a material ontology database (300), a genre information generating unit (400), a genre ontology database (500), an emotion information generating unit (600), an emotion ontology database (700), a content tag generating unit (800), and a content metadata database (900).

For the video content to be tagged, the web mining executing unit (100) uses web crawling to collect information such as the plot of the video content; production information such as the cast, director, and screenplay; reviews and critiques of the content and the comments on them; and user tags that users have set on web posts about the content. The web mining executing unit (100) stores the information collected on the web about the content in the content metadata database (900).

The content tag generation system (10) may also perform tag generation according to an embodiment of the present invention only for content for which the amount of information collected by the web mining executing unit (100) is equal to or greater than a preset threshold.
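
The threshold check above can be sketched as a simple filter. This is a minimal illustration, not the patented implementation: the `CollectedContent` record, the function names, and the choice to measure the "amount of information" as a total character count with a default of 2000 are all assumptions, since the text does not say how the amount is measured.

```python
from dataclasses import dataclass, field


@dataclass
class CollectedContent:
    """Web-mined text for one video (hypothetical record layout)."""
    content_id: str
    documents: list[str] = field(default_factory=list)


def has_enough_information(item: CollectedContent, min_chars: int = 2000) -> bool:
    """Return True when the collected text meets the preset threshold.

    Measuring the amount of information in characters is an assumption.
    """
    return sum(len(doc) for doc in item.documents) >= min_chars


def select_taggable(items: list[CollectedContent], min_chars: int = 2000) -> list[CollectedContent]:
    """Keep only the items that qualify for tag generation."""
    return [item for item in items if has_enough_information(item, min_chars)]
```

Content that falls below the threshold is simply skipped here; a real system might instead queue it for another crawl.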

The material information generating unit (200) performs morphological analysis on the information collected by the web mining executing unit (100) and stored in the content metadata database (900) to extract material-related semantic words, selects key words from the extracted semantic words based on how often they appear in the web content (such as web pages) collected on the web, and extracts material information from the selected key words using the pre-built material ontology database (300).

The material information generating unit (200) is described in detail below with reference to FIG. 2.

The material ontology database (300) contains words that users use and remember as keywords for what content is about, such as 'sports', 'soccer', and 'based on a novel'. It may be built by analyzing the keywords that users have inserted into tags for each item of content, selecting keywords from them, and then adding synonym and hypernym/hyponym information between the words by consulting the synonym and hypernym/hyponym relations in resources such as a Korean lexical network and an unabridged Korean dictionary.

The genre information generating unit (400) performs morphological analysis on the information collected by the web mining executing unit (100) and stored in the content metadata database (900) to extract genre-related semantic words, selects key words from the extracted semantic words based on how often they appear in the web content collected on the web, and extracts genre information from the selected key words using the pre-built genre ontology database (500), the material information generated by the material information generating unit (200), and the content production information (cast, director, screenplay, and so on) stored in the content metadata database (900).

The genre information generating unit (400) is described in detail below with reference to FIG. 3.

The genre ontology database (500) contains ontology data built for content genres such as 'romance', 'SF', 'fantasy', and 'thriller'. For each genre, the genre ontology database (500) may map words that users use interchangeably to one another as similar words, for example '멜로' (melodrama) and '로맨스' (romance), '공상과학' (science fiction) and 'SF', and '환타지' and '판타지' (two spellings of "fantasy").
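
The similar-word matching described for the genre ontology can be illustrated with a small lookup table. The sketch below is an assumption about one possible layout: the word pairs come from the text, but the dictionary structure, the function name, and the choice of which spelling serves as the canonical genre are hypothetical.

```python
from typing import Optional

# Hypothetical excerpt of a genre ontology: canonical genre -> surface forms.
# Which form is treated as canonical is an arbitrary choice for this sketch.
GENRE_SYNONYMS = {
    "로맨스": {"로맨스", "멜로"},
    "SF": {"SF", "공상과학"},
    "판타지": {"판타지", "환타지"},
    "스릴러": {"스릴러"},
}

# Invert once so that each lookup is a single dictionary access.
SURFACE_TO_GENRE = {
    surface: genre
    for genre, surfaces in GENRE_SYNONYMS.items()
    for surface in surfaces
}


def normalize_genre(word: str) -> Optional[str]:
    """Return the canonical genre for a word, or None if it is not a genre term."""
    return SURFACE_TO_GENRE.get(word.strip())
```

With this table, `normalize_genre("환타지")` and `normalize_genre("판타지")` both resolve to the same genre, so spelling variants in reviews count toward one label.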

The emotion information generating unit (600) performs morphological analysis on the information collected by the web mining executing unit (100) and stored in the content metadata database (900) to extract not only nouns and compound nouns but also emotion-related semantic words, in particular those expressed by adjectives or adverbs. It selects key words from the extracted semantic words based on how often they appear in the web content collected on the web, and generates emotion information from the selected key words using the pre-built emotion ontology database (700).

In addition, the emotion information generating unit (600) calculates the magnitude of each emotion based on factors such as the frequency of the semantic words, and generates an emotion vector that reflects the type and magnitude of the emotions.

The emotion information generating unit (600) is described in detail below with reference to FIG. 4.

The emotion ontology database (700) obtains from web content the emotion vocabulary that users frequently use when watching or searching for content, and stores that vocabulary matched to representative emotion categories, for example 'joy', 'anger', 'sadness', 'pleasure', 'surprise', 'loneliness', 'worth watching', and 'fear'.

In addition, for each emotion word the emotion ontology database (700) stores a magnitude for each emotion category, set in the range -1.0 to 1.0. For example, the word '깜놀' (web slang for "startled") may be assigned a magnitude of 1.0 for the emotion 'surprise', the word '추천' ("recommended") a magnitude of 0.5 for the emotion 'worth watching', and the word '강추' (web slang for "highly recommended") a magnitude of 1.0 for the emotion 'worth watching'.
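
To make the per-word magnitudes concrete, the following sketch stores the three example words with their magnitudes and turns word counts into an emotion vector with one component per category. The category '볼만함' is rendered here as "worth watching". The aggregation rule (a frequency-weighted mean, which keeps each component within -1.0 to 1.0) is an assumption; the text only says that magnitude is calculated from factors such as word frequency. The names `EMOTION_ONTOLOGY` and `emotion_vector` are hypothetical.

```python
from collections import Counter

EMOTIONS = ["joy", "anger", "sadness", "pleasure",
            "surprise", "loneliness", "worth watching", "fear"]

# word -> {emotion category: magnitude in [-1.0, 1.0]}
# These three entries are the examples given in the text.
EMOTION_ONTOLOGY = {
    "깜놀": {"surprise": 1.0},
    "추천": {"worth watching": 0.5},
    "강추": {"worth watching": 1.0},
}


def emotion_vector(word_counts: Counter) -> dict[str, float]:
    """Frequency-weighted mean magnitude per emotion category (assumed formula)."""
    totals = {e: 0.0 for e in EMOTIONS}
    hits = {e: 0 for e in EMOTIONS}
    for word, count in word_counts.items():
        # Words missing from the ontology contribute nothing.
        for emotion, magnitude in EMOTION_ONTOLOGY.get(word, {}).items():
            totals[emotion] += magnitude * count
            hits[emotion] += count
    return {e: (totals[e] / hits[e] if hits[e] else 0.0) for e in EMOTIONS}
```

For example, a review set in which '추천' appears once and '강추' three times yields a 'worth watching' component of (0.5 × 1 + 1.0 × 3) / 4 = 0.875.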

The content tag generating unit (800) generates a content tag containing the material information, genre information, and emotion information generated by the material information generating unit (200), the genre information generating unit (400), and the emotion information generating unit (600), and matches the generated tag to the corresponding content.
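
One way to picture the resulting tag is as a single record that bundles the three kinds of information with the identifier of the content it belongs to. The sketch below is illustrative only: the `ContentTag` field names, the `build_tag` helper, and the rule that an emotion is listed when its magnitude is positive are assumptions that do not come from the text.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class ContentTag:
    """Tag record for one video (field names are hypothetical)."""
    content_id: str
    materials: list[str] = field(default_factory=list)
    genres: list[str] = field(default_factory=list)
    emotions: list[str] = field(default_factory=list)
    emotion_vector: dict[str, float] = field(default_factory=dict)


def build_tag(content_id, materials, genres, emotion_vector):
    """Combine material, genre, and emotion information into one tag.

    Listing an emotion only when its magnitude is positive is an assumption.
    """
    emotions = [name for name, size in emotion_vector.items() if size > 0]
    return ContentTag(content_id, list(materials), list(genres),
                      emotions, dict(emotion_vector))
```

Calling `asdict(build_tag(...))` gives a plain dictionary that could be stored alongside the content record or handed to a search or recommendation service.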

As described above, the content tag generation system according to an embodiment of the present invention collects data from web content through web mining rather than relying only on information supplied by a content provider or content administrator, generates material information, genre information, and emotion information related to the content from the collected data, and includes them in the tag. It can thereby generate tags that describe the content more accurately and in greater variety.

FIG. 2 is a diagram showing the configuration of the material information generating unit according to an embodiment of the present invention.

The material information generating unit (200) according to an embodiment of the present invention includes a morpheme analysis unit (210), a material semantic word extraction unit (220), a key word selection unit (230), and a material information extraction unit (240).

The morpheme analysis unit (210) performs morphological analysis on the information that the web mining executing unit (not shown) has stored in the content metadata database (900) for the video content to be tagged.

That is, the morpheme analysis unit (210) performs morphological analysis on the words or sentences used in the user tags, plots, and reviews of the video content that were collected through web mining from web content such as web posts.

The material semantic word extraction unit (220) extracts, from the morphological analysis result produced by the morpheme analysis unit (210), material-related semantic words such as nouns and major compound nouns, which are the words generally used to express material.

The key word selection unit (230) selects key words based on the number of times (term frequency) that the material-related semantic words extracted by the material semantic word extraction unit (220) appear in the web content collected for the video content by the web mining executing unit (not shown).

That is, the key word selection unit (230) may judge that material-related semantic words appearing frequently in the web content are highly relevant to the video content, and select key words based on their number of occurrences in the web content.
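
Selecting key words by term frequency amounts to counting occurrences and keeping the most frequent words. A minimal sketch follows; the function name and the `top_n` and `min_count` cut-offs are illustrative assumptions, since the text does not specify how many key words are kept.

```python
from collections import Counter


def select_key_words(semantic_words, top_n=10, min_count=2):
    """Pick the most frequent semantic words as key words.

    `top_n` and `min_count` are illustrative cut-offs, not values from the text.
    Returns (word, count) pairs, most frequent first.
    """
    counts = Counter(semantic_words)
    return [(w, c) for w, c in counts.most_common(top_n) if c >= min_count]
```

The input is the flat list of semantic words extracted from all web content collected for one video, so a word that appears in many reviews naturally rises to the top.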

The material information extraction unit (240) extracts material information by applying the pre-built material ontology database (300) to the key words selected by the key word selection unit (230) and extracting the representative material word for each selected key word.

That is, the material information extraction unit (240) obtains the hypernyms, hyponyms, and similar words of the selected key words from the material ontology database (300), and normalizes the selected key words to representative material words based on the obtained hypernyms, hyponyms, and similar words.

For example, when the key words selected by the key word selection unit (230) are 'goal', 'offside', 'shooting', and the like, the material information extraction unit (240) may use the information contained in the material ontology database (300) to normalize those key words to the material word 'soccer' and extract 'soccer' as the material information.
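
The 'soccer' example can be sketched as a lookup from key words to representative material words, followed by a vote. This is a simplified stand-in: a real ontology would derive the mapping from synonym and hypernym/hyponym relations, whereas the flat `MATERIAL_ONTOLOGY` dictionary here is hand-written, and the 'home run' entry and the ranking by vote count are assumptions added for illustration.

```python
from collections import Counter

# Hypothetical ontology excerpt: key word -> representative material word.
# The first three entries follow the example in the text; the last is invented.
MATERIAL_ONTOLOGY = {
    "goal": "soccer",
    "offside": "soccer",
    "shooting": "soccer",
    "home run": "baseball",
}


def extract_material(key_words):
    """Map key words to representative material words, ranked by support."""
    votes = Counter(
        MATERIAL_ONTOLOGY[w] for w in key_words if w in MATERIAL_ONTOLOGY
    )
    return [material for material, _ in votes.most_common()]
```

Key words that the ontology does not know are ignored, so `extract_material(["goal", "offside", "shooting"])` returns `["soccer"]`.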

FIG. 3 is a diagram showing the configuration of the genre information generating unit according to an embodiment of the present invention.

The genre information generating unit (400) according to an embodiment of the present invention includes a morpheme analysis unit (410), a genre semantic word extraction unit (420), a key word selection unit (430), and a genre information extraction unit (440).

The morpheme analysis unit (410) performs morphological analysis on the information that the web mining executing unit (not shown) has stored in the content metadata database (900) for the video content to be tagged.

That is, the morpheme analysis unit (410) performs morphological analysis on the words or sentences contained in the production information (for example, information on the cast and director), user tags, plots, and reviews of the video content that were collected through web mining from web content such as web posts.

The genre semantic word extraction unit (420) extracts, from the morphological analysis result produced by the morpheme analysis unit (410), genre-related semantic words such as nouns and major compound nouns, which are the words generally used to express genre.

The key word selection unit (430) selects key words based on the number of times (term frequency) that the genre-related semantic words extracted by the genre semantic word extraction unit (420) appear in the web content collected for the video content by the web mining executing unit (not shown).

The genre information extraction unit (440) extracts the genre of the video content by applying the pre-built genre ontology database (500) to the key words selected by the key word selection unit (430), the material information generated by the material information generating unit (200), and the content production information stored in the content metadata database (900), which includes information such as the cast and director.

The genre information extraction unit (440) may classify the genre of the video content using an SVM, a CRF, a naive Bayes classifier, or the like. In particular, it may use a multi-value classifier, which can assign the same input to one or more ordered genres, to extract one to three items of genre information for the video content.
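
As an illustration of a classifier that returns one to three ordered genres, the sketch below implements a small multinomial naive Bayes ranker in plain Python. It is one of the classifier families named above, chosen for brevity, and is not claimed to be the configuration used in the invention. Pooling key words, material words, and cast or director names into a single bag of words, the class name, and the `min_ratio` cut-off that decides how many genres are kept are all assumptions.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesGenreRanker:
    """Multinomial naive Bayes with add-one smoothing over bag-of-words features."""

    def fit(self, samples):
        # samples: iterable of (feature_words, genre) pairs
        self.word_counts = defaultdict(Counter)
        self.genre_counts = Counter()
        self.vocab = set()
        for words, genre in samples:
            self.genre_counts[genre] += 1
            self.word_counts[genre].update(words)
            self.vocab.update(words)
        return self

    def rank(self, words, max_genres=3, min_ratio=0.2):
        """Return up to `max_genres` genres, best first.

        A genre is kept only if its posterior is at least `min_ratio` times
        the best posterior; this cut-off rule is an assumption.
        """
        total_docs = sum(self.genre_counts.values())
        log_scores = {}
        for genre, n_docs in self.genre_counts.items():
            total_words = sum(self.word_counts[genre].values())
            score = math.log(n_docs / total_docs)  # log prior
            for w in words:
                if w in self.vocab:  # ignore words never seen in training
                    score += math.log(
                        (self.word_counts[genre][w] + 1)
                        / (total_words + len(self.vocab))
                    )
            log_scores[genre] = score
        best = max(log_scores.values())
        ranked = sorted(log_scores, key=log_scores.get, reverse=True)
        kept = [g for g in ranked if log_scores[g] - best >= math.log(min_ratio)]
        return kept[:max_genres]
```

A usage sketch: after `ranker = NaiveBayesGenreRanker().fit(training_samples)`, calling `ranker.rank(features)` returns a list whose first element is the most likely genre, followed by any close runners-up.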

도 4는 본 발명의 일 실시예에 따른 감성 정보 생성부의 구성을 도시한 도면이다.4 is a diagram illustrating a configuration of an emotion information generating unit according to an embodiment of the present invention.

본 발명의 일 실시예에 따른 감성 정보 생성부(600)는 형태소 분석부(610), 감성 의미어 추출부(620), 주요어 선정부(630), 감성 정보 추출부(640) 및 감성 벡터 생성부(650)를 포함한다.Emotional information generating unit 600 according to an embodiment of the present invention is a morpheme analysis unit 610, emotional semantic extraction unit 620, key word selection unit 630, emotional information extraction unit 640 and emotion vector generation Part 650 is included.

형태소 분석부(610)는 태그를 생성하고자 하는 동영상 콘텐츠에 대하여 웹 마이닝 수행부(도시 생략)에 의해 콘텐츠 메타데이터 데이터베이스(900)에 저장된 정보에 대하여 형태소 분석을 수행한다.The morpheme analysis unit 610 performs morphological analysis on the information stored in the content metadata database 900 by the web mining performer (not shown) for the video content for which the tag is to be generated.

That is, the morpheme analysis unit 610 performs morphological analysis on the words or sentences contained in the plot summaries, reviews, critiques, and the like that are collected through web mining from web content such as web posts.

From the morphological analysis result produced by the morpheme analysis unit 610, the emotion semantic word extraction unit 620 extracts emotion-related semantic words such as adjectives and adverbs, which are the words generally used to express emotion.

The emotion semantic word extraction unit 620 extracts emotion-related semantic words not only from standard language but also from expressions used mainly on the web, for example slang such as '깜놀' (kkamnol, "startled") and '강추' (gangchu, "highly recommended").
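
As a non-limiting illustration, the extraction of emotion-related semantic words can be sketched as a filter over part-of-speech-tagged tokens. The tag names `ADJ` and `ADV`, the slang lexicon, and the sample tokens are assumptions made for this sketch; the description does not name a particular morphological analyzer or tag set.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical tag names; a real morphological analyzer defines its own tag set.
EMOTION_POS_TAGS: Set[str] = {"ADJ", "ADV"}
# Web slang examples taken from the description; the lexicon itself is assumed.
WEB_SLANG_LEXICON: Set[str] = {"깜놀", "강추"}


def extract_emotion_words(tagged_tokens: Iterable[Tuple[str, str]]) -> List[str]:
    """Keep adjectives, adverbs, and known web slang from (word, tag) pairs."""
    return [
        word
        for word, pos in tagged_tokens
        if pos in EMOTION_POS_TAGS or word in WEB_SLANG_LEXICON
    ]


# Hypothetical analyzer output for a short review.
tagged_review = [
    ("영화", "NOUN"),
    ("정말", "ADV"),
    ("무섭다", "ADJ"),
    ("깜놀", "NOUN"),
    ("강추", "NOUN"),
]
print(extract_emotion_words(tagged_review))
```

The slang lexicon is checked separately because an analyzer trained on standard language may tag such words as nouns or unknown tokens.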

The key word selection unit 630 selects key words based on the number of times (term frequency) that the emotion-related semantic words extracted by the emotion semantic word extraction unit 620 appear in the web content collected by the web mining execution unit (not shown) in relation to the video content.
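
As a non-limiting illustration, key word selection by term frequency can be sketched with a simple counter over the extracted semantic words. The function name `select_key_words` and the `top_n` and `min_count` parameters are assumptions for this sketch; the description states only that selection is based on the number of occurrences.

```python
from collections import Counter
from typing import Iterable, List


def select_key_words(documents: Iterable[List[str]],
                     top_n: int = 2,
                     min_count: int = 2) -> List[str]:
    """Pick the most frequent semantic words across the collected web content.

    Each document is the list of semantic words already extracted from
    one web post, review, or comment.
    """
    counts = Counter(word for doc in documents for word in doc)
    # most_common() orders by descending count.
    return [word for word, count in counts.most_common(top_n) if count >= min_count]


# Hypothetical semantic words extracted from three web posts.
collected = [["깜놀", "강추"], ["깜놀", "슬프다"], ["강추", "깜놀"]]
print(select_key_words(collected))
```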

The emotion information extraction unit 640 may extract emotion information by applying the pre-built emotion ontology database 700 to the key words selected by the key word selection unit 630, thereby matching each selected key word to a preset emotion field, for example 'joy', 'anger', 'sadness', 'pleasure', 'surprise', 'loneliness', 'worth watching', or 'fear'.

The emotion vector generation unit 650 may look up the emotion field magnitude preset for each selected key word and generate an emotion vector in which that magnitude is applied to the emotion information extracted by the emotion information extraction unit 640.

For example, suppose that the emotion information extraction unit 640 extracts the emotion information 'worth watching' and 'surprise', and that the selected key words '추천' ("recommend") and '깜놀' ("startled") are preset with a magnitude of 0.5 for 'worth watching' and a magnitude of 1.0 for 'surprise', respectively. In that case, the emotion vector generation unit 650 may generate an emotion vector in which 'worth watching' has a magnitude of 0.5 and 'surprise' has a magnitude of 1.0.

Accordingly, the emotion vector generation unit 650 may calculate a magnitude for each of the emotion information items 'joy', 'anger', 'sadness', 'pleasure', 'surprise', 'loneliness', 'worth watching', and 'fear', and may use the calculated magnitudes to generate an emotion vector for each item. As shown in FIG. 9, the emotion vector generated by the emotion vector generation unit 650 may be expressed so as to include the magnitude for each emotion field.
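
As a non-limiting illustration, the emotion vector from the example above can be sketched as a mapping from the eight emotion fields to magnitudes. The English field identifiers, the dictionary layout of the ontology, and the choice to keep the largest magnitude when several key words map to the same field are assumptions for this sketch; the description does not specify how multiple magnitudes for one field are combined.

```python
from typing import Dict, Iterable, Tuple

# The eight emotion fields named in the description (English identifiers assumed).
EMOTION_FIELDS = [
    "joy", "anger", "sadness", "pleasure",
    "surprise", "loneliness", "worth_watching", "fear",
]

# Hypothetical emotion ontology entries: key word -> (emotion field, magnitude).
# The two entries mirror the example in the description.
EMOTION_ONTOLOGY: Dict[str, Tuple[str, float]] = {
    "추천": ("worth_watching", 0.5),
    "깜놀": ("surprise", 1.0),
}


def build_emotion_vector(key_words: Iterable[str],
                         ontology: Dict[str, Tuple[str, float]]) -> Dict[str, float]:
    """Return one magnitude per emotion field; unmatched fields stay at 0.0."""
    vector = {field: 0.0 for field in EMOTION_FIELDS}
    for word in key_words:
        if word in ontology:
            field, magnitude = ontology[word]
            # Assumption: keep the largest magnitude seen for a field.
            vector[field] = max(vector[field], magnitude)
    return vector


vector = build_emotion_vector(["추천", "깜놀"], EMOTION_ONTOLOGY)
print(vector)
```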

FIG. 5 is a flowchart illustrating a method for generating a content tag using web mining according to an embodiment of the present invention.

In step S110, the content tag generation system receives information on the video content for which a tag is to be generated. The information on the video content may be received from the producer or the provider of the video content.

In step S120, the content tag generation system performs web mining based on the information on the video content received in step S110. Through web crawling, the content tag generation system may collect information such as the plot of the video content, content production information, reviews and critiques of the content and the comments on them, and user tags that users have set on web posts about the content.

In step S130, the content tag generation system generates material information, genre information, and emotion information using the information collected through web mining in step S120 together with a pre-built material ontology, genre ontology, and emotion ontology.

That is, the content tag generation system extracts keywords matching material, genre, and emotion from the information collected through web mining by means of morphological analysis and semantic word extraction, and may generate the material information, genre information, and emotion information using the extracted keywords together with the material ontology, genre ontology, and emotion ontology, respectively.

The methods of generating the material information, genre information, and emotion information are described below with reference to FIGS. 6 to 8.

In step S140, the content tag generation system generates a tag containing the material information, genre information, and emotion information generated in step S130, and matches the generated tag to the corresponding video content.
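
As a non-limiting illustration, the tag generated in step S140 can be sketched as a small record that bundles the three kinds of information and is stored against a content identifier. The class name `ContentTag`, the function `attach_tag`, the identifier `vod-001`, and all sample values are hypothetical; the description does not define a concrete tag format.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class ContentTag:
    """Hypothetical tag layout combining the outputs of step S130."""
    material: List[str]
    genres: List[str]
    emotions: List[str]
    emotion_vector: Dict[str, float] = field(default_factory=dict)


def attach_tag(store: Dict[str, dict], content_id: str, tag: ContentTag) -> None:
    """Match the generated tag to its video content (here: an in-memory dict)."""
    store[content_id] = asdict(tag)


tag_store: Dict[str, dict] = {}
attach_tag(
    tag_store,
    "vod-001",  # hypothetical content identifier
    ContentTag(
        material=["monster"],
        genres=["mystery", "thriller"],
        emotions=["surprise", "worth_watching"],
        emotion_vector={"surprise": 1.0, "worth_watching": 0.5},
    ),
)
print(tag_store["vod-001"])
```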

FIG. 6 is a flowchart illustrating a method of generating material information according to an embodiment of the present invention.

In step S210, the content tag generation system performs morphological analysis on the web-mined information about the video content. That is, the content tag generation system performs morphological analysis on the words or sentences contained in the plot, reviews, user tags, and the like collected from web content about the video content.

In step S220, the content tag generation system extracts material-related semantic words from the result of the morphological analysis performed in step S210. That is, from that result the content tag generation system extracts semantic words such as nouns and major compound nouns, which are the words mainly used to express material.

In step S230, the content tag generation system selects key words from among the semantic words extracted in step S220 based on how often, that is, how many times, each word appears in the web content collected through web mining.

As described above, the more frequently a word is used in the web content, the more likely it is to be a relatively important word, so the content tag generation system may select key words based on the number of occurrences in the web content.

In step S240, the content tag generation system uses the pre-built material ontology to map the key words selected in step S230 to representative material words.

That is, the content tag generation system uses the material ontology to analyze relations such as the hypernyms, hyponyms, and synonyms of each key word, searches for the representative material word that matches the key word, sets the key word to that representative material word, and may generate material information for the video content that includes the representative material words thus set.
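
As a non-limiting illustration, mapping a key word to a representative material word can be sketched as resolving a synonym and then climbing hypernym links until a designated representative word is reached. The ontology contents, the set of representative words, and the function names are assumptions for this sketch; the description does not disclose the structure of the material ontology.

```python
from typing import Dict, Iterable, List, Optional, Set

# Hypothetical material ontology fragments.
SYNONYMS: Dict[str, str] = {"undead": "zombie"}
HYPERNYMS: Dict[str, str] = {"zombie": "monster", "vampire": "monster"}
REPRESENTATIVES: Set[str] = {"monster", "war"}


def to_representative(word: str) -> Optional[str]:
    """Resolve a key word to a representative material word, if one exists."""
    current = SYNONYMS.get(word, word)
    seen: Set[str] = set()
    # Climb hypernym links; `seen` guards against cycles in the ontology.
    while current not in REPRESENTATIVES and current in HYPERNYMS and current not in seen:
        seen.add(current)
        current = HYPERNYMS[current]
    return current if current in REPRESENTATIVES else None


def material_info(key_words: Iterable[str]) -> List[str]:
    """Collect distinct representative words, preserving first-seen order."""
    result: List[str] = []
    for word in key_words:
        rep = to_representative(word)
        if rep is not None and rep not in result:
            result.append(rep)
    return result


print(material_info(["undead", "vampire", "war", "pizza"]))
```

Key words with no path to a representative word are dropped here; keeping them as-is would be an equally plausible policy.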

FIG. 7 is a flowchart illustrating a method of generating genre information according to an embodiment of the present invention.

In step S310, the content tag generation system performs morphological analysis on the web-mined information about the video content. That is, the content tag generation system performs morphological analysis on the words or sentences contained in the plot, reviews, user tags, content production information, and the like collected from web content about the video content.

In step S320, the content tag generation system extracts genre-related semantic words from the result of the morphological analysis performed in step S310. That is, from that result the content tag generation system extracts semantic words such as nouns and major compound nouns, which are the words mainly used to express genre.

In step S330, the content tag generation system selects key words from among the semantic words extracted in step S320 based on how often, that is, how many times, each word appears in the web content collected through web mining.

In step S340, the content tag generation system extracts the genre of the video content by applying the key words selected in step S330 and the production information of the video content, for example information such as the actors and the director of the content, to the pre-built genre ontology.

For example, if an actor in the content is Jim Carrey, the genre of the video content is likely to be classified as 'comedy', and if the director of the content is M. Night Shyamalan, the genre of the video content is likely to be classified as 'mystery'.

The content tag generation system may extract one to three pieces of genre information for the video content using a multi-value classifier.
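
As a non-limiting illustration, applying key words and production information to a genre ontology can be sketched as accumulating per-genre weights from each piece of evidence. The ontology entries, the weights, and the function name `score_genres` are assumptions for this sketch, and the additive scoring is a stand-in for whatever classifier an actual embodiment uses; only the two cast and director examples come from the description.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical genre ontology: evidence (key word, actor, director) -> genre weights.
GENRE_ONTOLOGY: Dict[str, Dict[str, float]] = {
    "Jim Carrey": {"comedy": 1.0},
    "M. Night Shyamalan": {"mystery": 1.0},
    "twist": {"mystery": 0.5, "thriller": 0.5},
    "laugh": {"comedy": 0.5},
}


def score_genres(evidence: Iterable[str],
                 ontology: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """Sum genre weights over all evidence and return genres by descending score."""
    scores: Dict[str, float] = defaultdict(float)
    for item in evidence:
        for genre, weight in ontology.get(item, {}).items():
            scores[genre] += weight
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Key words selected in step S330 plus production information for one content item.
ranked = score_genres(["twist", "M. Night Shyamalan"], GENRE_ONTOLOGY)
print(ranked[:3])  # keep at most three genres
```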

FIG. 8 is a flowchart illustrating a method of generating emotion information according to an embodiment of the present invention.

In step S410, the content tag generation system performs morphological analysis on the web-mined information about the video content. That is, the content tag generation system performs morphological analysis on the words or sentences contained in the plot, reviews, critiques, and the like collected from web content about the video content.

In step S420, the content tag generation system extracts emotion-related semantic words from the result of the morphological analysis performed in step S410. That is, from that result the content tag generation system extracts semantic words such as adjectives and adverbs, which are the words mainly used to express emotion.

In step S430, the content tag generation system selects key words from among the semantic words extracted in step S420 based on how often, that is, how many times, each word appears in the web content collected through web mining.

In step S440, the content tag generation system uses the pre-built emotion ontology to extract, from among the key words selected in step S430, the emotion vocabulary that matches the preset emotion fields of 'joy', 'anger', 'sadness', 'pleasure', 'surprise', 'loneliness', 'worth watching', and 'fear'.

The emotion ontology contains information that matches emotion vocabulary obtained from web content to the preset emotion fields, and the content tag generation system may use this emotion ontology to extract, from among the key words selected in step S430, the emotion vocabulary that matches an emotion field.

In step S450, the content tag generation system generates emotion information for the video content using the emotion vocabulary extracted in step S440, and uses the emotion field magnitudes preset for each item of emotion vocabulary to generate an emotion vector that contains the magnitude of each emotion field for the video content.

An embodiment of the present invention may also be implemented in the form of a recording medium containing computer-executable instructions, such as program modules executed by a computer. Computer-readable media may be any available media that can be accessed by a computer, and include both volatile and nonvolatile media and both removable and non-removable media. In addition, computer-readable media may include both computer storage media and communication media. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented by any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. Communication media typically include computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave, or another transport mechanism, and include any information delivery media.

Although the method and system of the present invention have been described in connection with specific embodiments, some or all of their components or operations may be implemented using a computer system having a general-purpose hardware architecture.

The foregoing description of the present invention is provided for illustration, and those of ordinary skill in the art to which the present invention pertains will understand that it can easily be modified into other specific forms without changing the technical idea or essential features of the present invention. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and likewise components described as distributed may be implemented in a combined form.

The scope of the present invention is defined by the appended claims rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

10: content tag generation system 100: web mining execution unit
200: material information generation unit 300: material ontology database
400: genre information generation unit 500: genre ontology database
600: emotion information generation unit 700: emotion ontology database
800: content tag generation unit 900: content metadata database

Claims

A content tag generation system using web mining, comprising:
a web mining execution unit that collects one or more web contents matching video content for which a tag is to be generated;
a material information generation unit that generates material information matching the video content based on a pre-built material ontology and the collected one or more web contents;
a genre information generation unit that generates genre information matching the video content based on the material information generated by the material information generation unit, a pre-built genre ontology, production information of the video content, and the collected one or more web contents;
an emotion information generation unit that generates emotion information matching the video content based on a pre-built emotion ontology and the collected one or more web contents, and generates an emotion vector including the magnitude of the generated emotion information; and
a content tag generation unit that generates tag information matching the video content, the tag information including the generated material information, genre information, emotion information, and vector information.

The system of claim 1,
wherein the material information generation unit comprises:
a material semantic word extraction unit that extracts, from the web content, material semantic words corresponding to at least one of nouns and compound nouns, based on the result of morphological analysis performed on the words and sentences included in the collected web content;
a material key word selection unit that selects key words matching the material based on the frequency with which the extracted material semantic words appear in the web content; and
a material information extraction unit that obtains, from the material ontology, a representative material word matching each selected key word and extracts material information matching the video content based on the obtained representative material word.

The system of claim 1,
wherein the genre information generation unit comprises:
a genre semantic word extraction unit that extracts, from the web content, genre semantic words matching a genre, based on the result of morphological analysis performed on the words and sentences included in the collected web content;
a genre key word selection unit that selects key words matching the genre based on the frequency (term frequency) with which the extracted genre semantic words appear in the web content; and
a genre information extraction unit that extracts one or more genres matching the video content by matching the selected genre key words, the material information, and production information including at least one of actor information and director information of the video content against the genre ontology.

The system of claim 3,
wherein the genre information extraction unit extracts a preset number of pieces of genre information by using a multi-value classifier that classifies the same input into one or more ordered genres.

The system of claim 1,
wherein the emotion information generation unit comprises:
an emotion semantic word extraction unit that extracts, from the web content, emotion semantic words corresponding to at least one of adjectives and adverbs, based on the result of morphological analysis performed on the words and sentences included in the collected web content;
an emotion key word selection unit that selects key words matching an emotion based on the frequency (term frequency) with which the extracted emotion semantic words appear in the web content;
an emotion information extraction unit that extracts the emotion information matching the video content by matching the selected emotion key words to one or more preset emotion fields based on the emotion ontology; and
an emotion vector generation unit that generates an emotion vector including the magnitude of the extracted emotion information based on the selected emotion key words.

A method for generating a content tag, performed by a content tag generation system using web mining, the method comprising:
(a) collecting, by web crawling, one or more web contents matching tag generation target content;
(b) extracting material keywords corresponding to at least one of nouns and compound nouns from the collected web content, and generating material information based on the extracted material keywords and a pre-built material ontology;
(c) extracting genre keywords corresponding to at least one of nouns and compound nouns from the collected web content, and generating genre information based on the material information generated in step (b), the extracted genre keywords, and a pre-built genre ontology;
(d) extracting emotion keywords corresponding to at least one of adjectives and adverbs from the collected web content, and generating emotion information based on the extracted emotion keywords, production information of the tag generation target content, and a pre-built emotion ontology; and
(e) generating tag information matching the tag generation target content, the tag information including the material information generated in step (b), the genre information generated in step (c), and the emotion information generated in step (d).

The method of claim 6,
wherein step (b) comprises:
(b1) extracting material semantic words, corresponding to one or more nouns or compound nouns matching the material, by performing morphological analysis on the collected web content;
(b2) selecting material key words from among the extracted material semantic words based on the frequency with which the extracted material semantic words appear in the web content; and
(b3) setting a representative material word matching each selected material key word based on the material ontology, and generating material information matching the tag generation target content based on the representative material word thus set.

The method of claim 6,
wherein step (c) comprises:
(c1) extracting genre semantic words, corresponding to one or more nouns or compound nouns matching the genre, by performing morphological analysis on the collected web content;
(c2) selecting genre key words from among the extracted genre semantic words based on the frequency with which the extracted genre semantic words appear in the web content; and
(c3) generating one or more pieces of genre information for the tag generation target content by matching the genre key words, the material information generated in step (b), and the production information of the tag generation target content against the genre ontology.

The method of claim 7,
wherein, in step (c3), the genre information includes a preset number of genres.

The method of claim 6,
wherein step (d) comprises:
(d1) extracting emotion semantic words, corresponding to one or more adjectives or adverbs matching emotions, by performing morphological analysis on the collected web content;
(d2) selecting emotion key words from among the extracted emotion semantic words based on the frequency with which the extracted emotion semantic words appear in the web content;
(d3) extracting an emotion field matching each selected emotion key word based on the emotion ontology, and generating emotion information for the tag generation target content based on the emotion field; and
(d4) generating an emotion vector including the magnitude of the extracted emotion field based on the magnitude matched to the selected emotion key word.