KR20230013140A

KR20230013140A - Apparatus, method and program for extracting keywords based on keyword extraction rules

Info

Publication number: KR20230013140A
Application number: KR1020230003962A
Authority: KR
Inventors: 김근희; 한미란
Original assignee: 주식회사 메디치소프트
Priority date: 2020-03-30
Filing date: 2023-01-11
Publication date: 2023-01-26
Also published as: KR20210121387A; KR102488914B1

Abstract

The present invention relates to a method of extracting keywords from content and recommending content using the extracted keywords. Keywords related to the content are automatically extracted using a keyword extraction rule and set as representative keywords; content having a keyword input by a user as one of its representative keywords is searched for; and the retrieved content is displayed together with the representative keywords set for each item, so that the user can select one of the displayed representative keywords to search for content again.

Description

Apparatus, method and program for extracting keywords based on keyword extraction rules

The present disclosure relates to a keyword extraction apparatus, and more particularly to an apparatus that extracts keywords based on a keyword extraction rule.

Recently, as the amount of content has grown ever larger, users have had difficulty finding the content they are looking for.

Before the variety and volume of content increased so sharply, a person would view each item of content and enter keywords manually. As content multiplies ever faster, this manual work is becoming infeasible.

Accordingly, there is a need for a method that analyzes content when it is input, automatically sets keywords that describe the content, and uses those keywords to let users search for content by keyword. However, no technology implementing such a method has been disclosed.

Korean Laid-Open Patent Publication No. 10-2019-0055963 (2019.05.24)

To solve the problems described above, the present invention aims to extract keywords related to content using a keyword extraction rule.

The present invention also aims to preprocess the extracted keywords by using a keyword filtering rule to automatically exclude keywords that are unnecessary for the search function.

The present invention also classifies the preprocessed keywords using a keyword classification rule so that a worker can carry out the segmentation task efficiently.

The present invention also aims to set a predetermined number of keywords as the representative keywords of the content when segmentation result data for the keywords is received from the worker.

The present invention also searches for content having a keyword input by a user as a representative keyword, and displays the retrieved content together with the representative keywords set for each item so that the user can select a representative keyword again.

The problems to be solved by the present invention are not limited to those mentioned above, and other problems not mentioned will be clearly understood by those skilled in the art from the description below.

A keyword extraction apparatus based on a keyword extraction rule according to an embodiment of the present invention, for solving the problems described above, includes: an extraction module that analyzes input content based on a keyword extraction rule and extracts a plurality of keywords related to the content based on the analysis result; a classification module that analyzes the extracted plurality of keywords according to a keyword classification rule and classifies the plurality of keywords; and a setting module that sets a preset number of keywords included in segmentation result data as representative keywords of the content. When the content does not include subtitles, the extraction module may extract subtitles from the content using an STT (Speech to Text) function and extract the plurality of keywords using the extracted subtitles. The extraction module may extract, as the plurality of keywords, at least one word mentioned in the content at least a preset number of times, where the preset number of times is determined based on the length of the content.

In addition, the keyword extraction apparatus may determine a keyword extraction criterion by extracting a preset number of simple keywords based on the introduction material, title, and field of the content, and may extract the plurality of keywords from the content based on the determined criterion, such that words having a preset similarity to the simple keywords, or having something in common with them in the relevant field, are extracted as the plurality of keywords.

In addition, the keyword extraction apparatus may assign a weight to each keyword of the classified similar-keyword groups based on the number of similar keywords in the group and the degree of matching with the keyword extraction criterion, where keywords within the same similar-keyword group are given the same weight.

In addition, when the content is image-based content, the keyword extraction apparatus may recognize an image in the content and extract the plurality of keywords related to the recognized image.

In addition, the keyword filtering rule may exclude keywords that are general phrases or general expressions with no distinctiveness as keywords, and may remove postpositional particles attached to keywords.

In addition, when the content is lecture material, the keyword extraction apparatus may determine the keyword extraction criterion based on at least one of the lecture introduction material, title, and lecture field of the content, and may extract related keywords, based on the determined criterion, from at least one of the instructor name, lecture course, script, and audio data of the content.

In addition, the keyword extraction apparatus may classify, among the plurality of keywords, those that are synonyms or similar words, or that satisfy a preset level of similarity, as similar keywords, and may assign a weight to each classified keyword based on the number of similar keywords it has and its degree of matching with the keyword extraction criterion.

In a keyword-based content recommendation apparatus that uses content for which representative keywords have been set by the keyword extraction apparatus, the content recommendation apparatus may search for content having a keyword input by a user as a representative keyword, and display the one or more retrieved content items together with the representative keywords set for each item. When the user selects one of the displayed representative keywords, the apparatus may search again, among the retrieved content, for content having the selected keyword as a representative keyword, and display the one or more re-retrieved content items together with the representative keywords set for each item.

In addition, a keyword extraction method based on a keyword extraction rule according to an embodiment of the present invention, for solving the problems described above, is performed by a keyword extraction apparatus and includes: analyzing input content based on a keyword extraction rule; extracting a plurality of keywords related to the content based on the analysis result; analyzing the extracted plurality of keywords according to a keyword classification rule and classifying the plurality of keywords; and setting a preset number of keywords included in segmentation result data as representative keywords of the content. When the content does not include subtitles, the keyword extraction apparatus may extract subtitles from the content using an STT (Speech to Text) function and extract the plurality of keywords using the extracted subtitles. The apparatus may extract, as the plurality of keywords, at least one word mentioned in the content at least a preset number of times, where the preset number of times is determined based on the length of the content.

In addition, other methods and other systems for implementing the present invention, and a computer-readable recording medium storing a computer program for executing the method, may further be provided.

According to the present invention as described above, keywords related to content are automatically extracted using a keyword extraction rule.

In addition, the present invention filters the extracted keywords using a keyword filtering rule and classifies them using a keyword classification rule, so that a worker can carry out the keyword segmentation task efficiently.

In addition, when segmentation result data for the keywords is received from the worker, the present invention sets a predetermined number of keywords as the representative keywords of the content so that they can be used for a keyword-based search function.

In addition, the present invention searches for content having a keyword input by a user as a representative keyword, and displays the retrieved content together with the representative keywords set for each item, so that the user can select one of the displayed representative keywords to search for content again.

The effects of the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the description below.

FIG. 1 is a flowchart of a method for extracting keywords from content according to an embodiment of the present invention.
FIG. 2 is an example of lecture content.
FIG. 3 illustrates a plurality of keywords extracted from content by the extraction module using a keyword extraction rule.
FIG. 4 illustrates the extracted keywords of FIG. 3 after preprocessing by the preprocessing module.
FIG. 5 illustrates the preprocessed keywords classified by the classification module according to a keyword classification rule.
FIG. 6 illustrates segmentation result data, refined by a worker, rendered as a word cloud.
FIG. 7 is a flowchart of a content recommendation method using keywords according to an embodiment of the present invention.
FIG. 8 illustrates searching for and retrieving content based on a keyword input by a user, and then receiving a representative keyword from the user again.
FIG. 9 is a block diagram of a keyword extraction apparatus according to an embodiment of the present invention.
FIG. 10 is a block diagram of a content recommendation apparatus according to an embodiment of the present invention.
FIG. 11 is a block diagram of a keyword extraction and content recommendation apparatus according to an embodiment of the present invention.

The advantages and features of the present invention, and the methods of achieving them, will become clear with reference to the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. These embodiments are provided only to make the disclosure of the present invention complete and to fully inform those skilled in the art to which the present invention belongs of the scope of the invention, which is defined only by the scope of the claims.

The terminology used herein is for describing the embodiments and is not intended to limit the present invention. In this specification, singular forms also include plural forms unless the context specifically states otherwise. As used herein, "comprises" and/or "comprising" does not exclude the presence or addition of one or more elements other than those recited. Like reference numerals refer to like elements throughout the specification, and "and/or" includes each of the recited elements and every combination of one or more of them. Although "first", "second", and the like are used to describe various elements, these elements are of course not limited by these terms. These terms are used only to distinguish one element from another. Accordingly, a first element mentioned below may also be a second element within the technical spirit of the present invention.

Unless otherwise defined, all terms (including technical and scientific terms) used in this specification have the meanings commonly understood by those skilled in the art to which the present invention belongs. In addition, terms defined in commonly used dictionaries are not to be interpreted in an idealized or excessive manner unless they are expressly and specifically defined.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

Before the description, the meanings of the terms used in this specification are briefly explained. Note, however, that these explanations are intended to aid understanding of the specification and are not to be used to limit the technical spirit of the present invention unless they are explicitly stated as limiting it.

Content: data containing information on a specific subject, for example digitally processed information of various kinds, videos, movies, lectures, and books.

FIGS. 1 to 6 relate to the method for extracting keywords from content, and FIGS. 7 and 8 relate to the content recommendation method using keywords.

The content recommendation method using keywords makes use of the representative keywords of content that were extracted and set by the keyword extraction method. The two methods may be performed separately or together as a single embodiment.

Below, the method for extracting keywords from content is described first, followed by the content recommendation method using keywords.

FIG. 1 is a flowchart of a method for extracting keywords from content according to an embodiment of the present invention.

FIG. 2 is an example of lecture content.

FIG. 3 illustrates a plurality of keywords extracted from content by the extraction module 110 using a keyword extraction rule.

FIG. 4 illustrates the extracted keywords of FIG. 3 after preprocessing by the preprocessing module 120.

FIG. 5 illustrates the preprocessed keywords classified by the classification module 130 according to a keyword classification rule.

FIG. 6 illustrates segmentation result data, refined by a worker, rendered as a word cloud.

The method for extracting keywords from content according to an embodiment of the present invention is described with reference to FIGS. 1 to 6.

The content keyword extraction method according to an embodiment of the present invention is performed by a computer and, more specifically, may be performed by the content keyword extraction apparatus 20 or a server.

First, the extraction module 110 analyzes the input content according to a keyword extraction rule and extracts a plurality of keywords related to the content (step S100, the keyword extraction step).

Here, the extraction module 110 may receive the content data directly from a worker, or may receive the URL at which the content is streamed.

Various methods of receiving content can be applied, so the implementer of the invention can readily choose among them.

The keyword extraction rule establishes a keyword extraction criterion based on the introduction material, title, and field of the content, and, using that criterion, extracts related keywords from at least one of the script and the audio data of the content.

The introduction material of the content may be an introduction document, an introductory text, or a synopsis of the content, and any other material that summarizes and describes the content may also be used.

For example, the extraction module 110 may use the keyword extraction rule to extract a predetermined number of simple keywords from the introduction material, title, and field of the content and establish them as the keyword extraction criterion, and may then extract a plurality of keywords related to that criterion from at least one of the script and the audio data of the content.

In this extraction, words that have a predetermined degree of similarity to the simple keywords established as the keyword extraction criterion, or that have something in common with them in the field, may be selected and extracted as keywords.
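The selection of script words by similarity to the simple keywords can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation described in the source: the function name `extract_related`, the 0.7 threshold, and the use of character-level similarity from Python's `difflib` as a stand-in for the "predetermined degree of similarity" are all hypothetical, since the source does not say how similarity is measured. The "something in common in the field" test is omitted.

```python
from difflib import SequenceMatcher


def extract_related(script_words, simple_keywords, min_similarity=0.7):
    """Keep script words that are similar enough to any simple keyword.

    Character-level similarity is an assumed stand-in for the
    unspecified similarity measure; min_similarity is hypothetical.
    """
    related = []
    for word in script_words:
        # Best similarity between this word and any simple keyword.
        best = max(
            (SequenceMatcher(None, word, s).ratio() for s in simple_keywords),
            default=0.0,
        )
        if best >= min_similarity and word not in related:
            related.append(word)
    return related
```

A semantic measure (for example, embedding similarity) would likely fit the described behavior better than string similarity; the sketch only shows where such a measure plugs in.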

In addition to the introduction material, title, and field of the content, any hashtags set for the content by its creator may also be used to establish the keyword extraction criterion.

In one embodiment, when no separate subtitles are set or stored for the input content, the extraction module 110 may use an STT (Speech to Text) function to extract subtitles and text data and use them for keyword extraction.

In another embodiment, when images or video are highly important to the content, the extraction module 110 may use image and video recognition technology to recognize the images and video in the content and extract keywords from them.

In addition, when the content is lecture material, the keyword extraction rule establishes the keyword extraction criterion based on at least one of the lecture introduction material, title, and lecture field of the content, and, using that criterion, extracts related keywords from at least one of the instructor name, lecture course, script, and audio data of the content.

FIG. 2 shows a video lecture on high-school mathematics (Calculus 1) as example content. The extraction module 110 establishes the keyword extraction criterion based on the lecture introduction material, the title (Calculus 1), and the lecture field (high-school mathematics; see also the tagged hashtags), and then analyzes the instructor name, lecture course, script, and audio data of the content against that criterion. FIG. 3 shows the plurality of keywords extracted as a result.

The number of extracted keywords is not limited. FIG. 3 shows only 20 to 30 keywords for the sake of illustration; in practice hundreds of keywords may be extracted, and more still depending on the length and size of the content.

For example, when the keyword extraction criterion is established from the content of FIG. 2, related keywords are extracted based on simple keywords such as "high-school student", "mathematics", "calculus", and "class", so keywords related to those simple keywords are extracted, as shown in FIG. 3.

In one embodiment, the keyword extraction criterion may be the number of times a word is mentioned in the script or audio data of the content.

For example, the extraction module 110 may extract a word as a keyword when it is mentioned 30 or more times in the script or audio data of the content, and this threshold may be set according to the size and length of the content.
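A length-dependent mention threshold of this kind might look like the sketch below. The scaling rule is an assumption made for illustration: the source gives only the example of 30 mentions and says the threshold depends on content size and length. The names `mention_threshold` and `frequent_words`, the rate of 0.5 mentions per minute, and the floor of 3 are hypothetical, and whitespace tokenization stands in for whatever word segmentation a real system would use.

```python
from collections import Counter


def mention_threshold(duration_minutes, per_minute=0.5, floor=3):
    """Hypothetical rule: required mention count grows with content length."""
    return max(floor, int(duration_minutes * per_minute))


def frequent_words(transcript, duration_minutes):
    """Return words mentioned at least `mention_threshold` times, sorted."""
    # Whitespace splitting is an assumption; Korean text would normally
    # need a morphological analyzer to segment words.
    counts = Counter(transcript.lower().split())
    threshold = mention_threshold(duration_minutes)
    return sorted(word for word, n in counts.items() if n >= threshold)
```

With these assumed parameters, a 60-minute lecture yields a threshold of 30 mentions, which matches the example figure in the text; short clips fall back to the floor value.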

After step S100, the preprocessing module 120 analyzes the plurality of keywords extracted in step S100 according to a keyword filtering rule and excludes keywords that are unnecessary for the content search function (step S200).

The keyword filtering rule excludes keywords that are general phrases or general expressions with no distinctiveness as keywords, and removes postpositional particles attached to keywords.

FIG. 4 shows the result of the preprocessing (filtering) module 120 applying the keyword filtering rule to the plurality of keywords (FIG. 3) that the extraction module 110 extracted from the content using the keyword extraction rule.

As illustrated, particles attached to keywords are removed: for example, the object particle "을" is removed from "수능을" (the CSAT, Korea's college entrance exam), leaving "수능", and from "적분을" (integration), leaving "적분".

Keywords that are general phrases or general expressions with no distinctiveness as keywords, such as YouTube (유튜브), study group (스터디), desk (책상), classroom (교실), studying (공부), science (과학), arithmetic (산수), and university (대학교), are also excluded.
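The two filtering operations above, particle removal and exclusion of generic terms, can be sketched as follows. The generic-term list is taken from the examples in the text; everything else is an assumption for illustration: the names `strip_particle` and `filter_keywords`, the restriction to the particles "을" and "를", and the de-duplication step are hypothetical. Suffix matching is a deliberate simplification, since a real system would likely use a Korean morphological analyzer to avoid cutting words that merely end in a particle-like syllable.

```python
# Generic terms listed as excluded examples in the text.
GENERIC_TERMS = {"유튜브", "스터디", "책상", "교실", "공부", "과학", "산수", "대학교"}

# Assumed, deliberately small particle list (object particles only).
PARTICLES = ("을", "를")


def strip_particle(keyword):
    """Remove one trailing particle, if present (naive suffix match)."""
    for particle in PARTICLES:
        if len(keyword) > len(particle) and keyword.endswith(particle):
            return keyword[: -len(particle)]
    return keyword


def filter_keywords(keywords):
    """Strip particles, drop generic terms, and drop repeats (order kept)."""
    seen = set()
    result = []
    for keyword in keywords:
        cleaned = strip_particle(keyword.strip())
        # De-duplication is an added assumption, not stated in the text.
        if not cleaned or cleaned in GENERIC_TERMS or cleaned in seen:
            continue
        seen.add(cleaned)
        result.append(cleaned)
    return result
```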

After step S200, the classification module 130 analyzes the plurality of keywords preprocessed in step S200 according to a keyword classification rule and classifies them (step S300).

The keyword classification rule groups together, as similar keywords, those keywords that are synonyms or similar words or that have at least a certain level of similarity to one another.

The keyword classification rule also assigns a weight to each classified keyword based on the number of similar keywords it has and its degree of matching with the keyword extraction criterion.

Accordingly, the classification module 130 classifies the keywords using this rule: it groups synonyms, similar words, and keywords with at least a certain level of similarity as similar keywords, assigns a weight to each keyword, and sorts the keywords.

Because keywords classified as similar have similar meanings, keywords within the same group are given the same weight.

As a result, as shown in FIG. 5, the keywords classified by the classification module 130 are sorted by weight and arranged in groups of similar keywords, which makes it easier for the worker to carry out the segmentation task on the keywords.
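The grouping and weighting described above could be sketched as below. The source states only that the weight depends on the number of similar keywords in a group and on the degree of matching with the extraction criterion, and that all keywords in a group share one weight. The concrete formula (group size plus the count of members that appear in the criterion), the greedy grouping strategy, the 0.6 threshold, and the names `group_keywords` and `weight_groups` are assumptions, and `difflib` string similarity again stands in for the unspecified similarity measure.

```python
from difflib import SequenceMatcher


def group_keywords(keywords, threshold=0.6):
    """Greedily group keywords similar to the first member of a group."""
    groups = []
    for keyword in keywords:
        for group in groups:
            if SequenceMatcher(None, keyword, group[0]).ratio() >= threshold:
                group.append(keyword)
                break
        else:
            # No existing group was similar enough: start a new one.
            groups.append([keyword])
    return groups


def weight_groups(groups, criterion_keywords):
    """Give each group one shared weight and sort groups by weight.

    Hypothetical formula: group size + number of members that exactly
    match a criterion keyword.
    """
    weighted = []
    for group in groups:
        matches = sum(1 for keyword in group if keyword in criterion_keywords)
        weighted.append((len(group) + matches, group))
    weighted.sort(key=lambda item: item[0], reverse=True)
    return weighted
```

Storing the weight per group rather than per keyword guarantees that members of one group always share the same weight, as the text requires.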

Next, the computer provides the keyword classification data to the worker terminal 300 through the communication module 150 and requests, via the worker terminal 300, a segmentation task on the classified keywords.

The computer then receives from the worker the segmentation result data produced by carrying out the segmentation task on the classified keywords.

The segmentation result data received from the worker is word cloud data from which the worker has removed unnecessary keywords among the classified keywords and in which the remaining keywords are sorted in priority order.

In such a word cloud, as shown in FIG. 6, the more important a keyword is, the larger its font and the closer to the center it is drawn, so the importance of each keyword can be seen at a glance.

After step S300, when the segmentation performance data is received from the operator, the setting module 140 sets a predetermined number of keywords included in the segmentation performance data as representative keywords of the corresponding content (step S400).

More specifically, the setting module 140 uses the segmentation performance data received from the operator to select a predetermined number of keywords with the highest priority (for example, one to five) and sets them as the representative keywords of the corresponding content.
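Step S400 amounts to taking the top entries of a priority-ordered list. The following is a minimal sketch, assuming the segmentation performance data has already been reduced to a list of keywords ordered from highest to lowest priority; the function name and the default of five keywords are illustrative choices, not taken from the description.

```python
from typing import List


def set_representative_keywords(prioritized: List[str], max_count: int = 5) -> List[str]:
    """Pick up to max_count keywords from a list ordered highest priority first."""
    if max_count < 1:
        raise ValueError("max_count must be at least 1")
    return prioritized[:max_count]


segmentation_result = ["mathematics", "calculus", "limits", "series", "vectors", "matrices"]
representative = set_representative_keywords(segmentation_result)
print(representative)
```

If the operator returns fewer keywords than the limit, all of them are used.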

FIG. 7 is a flowchart of a content recommendation method using keywords according to an embodiment of the present invention.

FIG. 8 is a diagram illustrating an example in which content is searched for and retrieved based on a keyword input by a user, and a representative keyword is then input again by the user.

A content recommendation method using keywords according to an embodiment of the present invention will now be described with reference to FIGS. 7 and 8.

The content recommendation method using keywords according to an embodiment of the present invention uses content for which a predetermined number of representative keywords have been set by the content keyword extraction method of FIGS. 1 to 6.

First, the search module 170 searches for content that has the keyword input by the user as a representative keyword (step S500).

The display unit 180 displays the one or more content items found in step S500 together with the representative keywords set for each content item (step S600).

Referring to FIG. 7, an example is shown in which the user enters "mathematics" as a keyword in the keyword search field, and the search module 170 finds and displays the content items for which "mathematics" is set as a representative keyword.

Here, the content items for which "mathematics" is set as a representative keyword are not displayed alone; the representative keywords set for each displayed content item are displayed with them.

After step S600, when the user selects one of the representative keywords displayed in step S600, the search module 170 searches again, among the content found in step S500 (which is the same as the content displayed in step S600), for content that has the selected representative keyword as a representative keyword (step S700).

Referring to FIG. 7, an example is shown in which the user selects "calculus" with a finger from among the representative keywords of the content displayed on the screen.

In this way, the user can review the content items that have the entered keyword as a representative keyword, and can also select one of the displayed representative keywords to search again, narrowing the search range until a specific content item is chosen.

Steps S700 and S800 are not limited to a single pass; they may be repeated until the user ends the keyword search function or selects a specific content item.
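The search-and-narrow flow of steps S500 to S700 can be sketched as repeated filtering over the previous result set. The `Catalog` mapping, the function names, and the sample titles below are hypothetical; the sketch assumes only that each content item carries a set of representative keywords.

```python
from typing import Dict, List, Set

Catalog = Dict[str, Set[str]]  # content title -> representative keywords


def search(catalog: Catalog, keyword: str) -> Catalog:
    """Return the entries whose representative keywords include the keyword."""
    return {title: kws for title, kws in catalog.items() if keyword in kws}


def displayed_keywords(results: Catalog) -> List[str]:
    """Collect the representative keywords shown alongside the results."""
    return sorted(set().union(*results.values()))


catalog: Catalog = {
    "Intro to Calculus": {"mathematics", "calculus", "limits"},
    "Linear Algebra Basics": {"mathematics", "vectors"},
    "World History": {"history"},
}

first = search(catalog, "mathematics")  # S500: initial search
choices = displayed_keywords(first)     # S600: keywords displayed with the results
second = search(first, "calculus")      # S700: re-search within the earlier results
print(sorted(first), choices, sorted(second))
```

Passing the previous result set back into `search` is what narrows the range on each pass, so the loop can continue until the user picks a content item or stops searching.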

FIG. 9 is a block diagram of a keyword extraction device 20 according to an embodiment of the present invention.

Referring to FIG. 9, the keyword extraction device 20 according to an embodiment of the present invention includes an extraction module 110, a preprocessing module 120, a classification module 130, a setting module 140, a communication module 150, and a database 160.

However, in some embodiments, the keyword extraction device 20 may include fewer or more components than those shown in FIG. 9.

The extraction module 110 analyzes input content according to the keyword extraction rules and extracts a plurality of keywords related to the content.
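One extraction rule recited in the claims is that words mentioned more than a preset number of times are extracted, with that number determined by the length of the content. A minimal sketch of such a rule follows; the specific scaling (one required mention per 100 words, with a floor of 2) and the function name are assumptions for illustration, since the document does not give a formula.

```python
import re
from collections import Counter
from typing import List


def extract_frequent_words(text: str, words_per_mention: int = 100) -> List[str]:
    """Return words mentioned more often than a length-dependent threshold."""
    words = re.findall(r"\w+", text.lower())
    # Assumed rule: one required mention per `words_per_mention` words, minimum 2.
    threshold = max(2, len(words) // words_per_mention)
    counts = Counter(words)
    return sorted(word for word, n in counts.items() if n > threshold)


transcript = "calculus calculus limit calculus limit intro calculus"
print(extract_frequent_words(transcript))
```

A frequency rule of this kind also picks up common function words in real text, which is one reason a separate filtering step follows extraction.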

The preprocessing module 120 analyzes the plurality of keywords extracted by the extraction module 110 according to the keyword filtering rules and excludes keywords that are unnecessary for the content search function.
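The filtering step can be pictured as a simple exclusion list. The sketch below assumes a hypothetical set of generic expressions (`GENERIC_TERMS`) and a hypothetical function name; the actual keyword filtering rules are not spelled out here beyond excluding keywords that are unnecessary for search.

```python
from typing import Iterable, List, Set

# Hypothetical list of generic expressions that lack specificity as keywords.
GENERIC_TERMS: Set[str] = {"hello", "today", "thing", "everyone"}


def filter_keywords(keywords: Iterable[str], generic: Set[str] = GENERIC_TERMS) -> List[str]:
    """Keep only keywords that are not in the generic-expression list."""
    return [kw for kw in keywords if kw.strip().lower() not in generic]


extracted = ["Hello", "calculus", "today", "limits"]
print(filter_keywords(extracted))
```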

The classification module 130 analyzes the plurality of keywords preprocessed by the preprocessing module 120 according to the keyword classification rules and classifies the plurality of keywords.

When segmentation performance data produced by the operator for the classified keywords is received, the setting module 140 sets a predetermined number of keywords included in the segmentation performance data as representative keywords of the corresponding content.

The communication module 150 receives input content or data about content on the network, and also communicates with the worker terminal 300 to exchange data.

The database 160 may store algorithms such as the keyword extraction rules, keyword filtering rules, and keyword classification rules, and may also be used to store any data that requires a storage means, such as content data, settings, and data on the classified keywords.

The keyword extraction device 20 according to the embodiment of the present invention described above differs from the keyword extraction method described with reference to FIGS. 1 to 6 only in the category of the invention and is otherwise the same in substance, so duplicate descriptions and examples are omitted.

FIG. 10 is a block diagram of a content recommendation device 30 according to an embodiment of the present invention.

Referring to FIG. 10, the content recommendation device 30 according to an embodiment of the present invention includes a communication module 150, a database 160, a search module 170, and a display unit 180.

However, in some embodiments, the content recommendation device 30 may include fewer or more components than those shown in FIG. 10.

The communication module 150 communicates with the user terminal 330 to exchange data.

The database 160 may be used to store any data that requires a storage means, such as content settings and the user's search history.

The search module 170 searches for content that has the keyword input by the user as a representative keyword.

The display unit 180 displays the one or more content items found by the search module 170 together with the representative keywords set for each content item.

When the user selects a specific representative keyword from among the representative keywords displayed by the display unit 180, the search module 170 searches again, among the content already found, for content that has the selected representative keyword as a representative keyword.

The display unit 180 displays the one or more content items found by this re-search together with the representative keywords set for each content item.

The content recommendation device 30 using keywords according to the embodiment of the present invention described above differs from the content recommendation method using keywords described with reference to FIGS. 7 and 8 only in the category of the invention and is otherwise the same in substance, so duplicate descriptions and examples are omitted.

FIG. 11 is a block diagram of a keyword extraction and content recommendation device 10 according to an embodiment of the present invention.

The keyword extraction and content recommendation device 10 shown in FIG. 11 combines the keyword extraction device 20 shown in FIG. 9 and the content recommendation device 30 shown in FIG. 10 into a single device, and does not otherwise differ from the keyword extraction device 20 and the content recommendation device 30.

The method according to an embodiment of the present invention described above may be implemented as a program (or application) and stored in a medium so as to be executed in combination with a server, which is hardware.

The program described above may include code written in a computer language such as C, C++, JAVA, or machine language that the processor (CPU) of the computer can read through the device interface of the computer, so that the computer reads the program and executes the methods implemented as the program. Such code may include functional code related to functions that define the features required to execute the methods, and may include control code related to the execution procedure required for the processor of the computer to execute those features in a predetermined order. The code may further include memory-reference code indicating at which location (address) in the internal or external memory of the computer the additional information or media required for the processor to execute the features should be referenced. In addition, when the processor of the computer needs to communicate with another remote computer or server in order to execute the features, the code may further include communication-related code specifying how to communicate with the remote computer or server using the communication module of the computer and what information or media to transmit or receive during communication.

The storage medium refers not to a medium that stores data for a short time, such as a register, cache, or memory, but to a medium that stores data semi-permanently and can be read by a device. Specific examples of the storage medium include, but are not limited to, ROM, RAM, CD-ROM, magnetic tape, floppy disk, and optical data storage devices. That is, the program may be stored on various recording media on various servers that the computer can access, or on various recording media on the user's computer. The medium may also be distributed over computer systems connected by a network, so that computer-readable code is stored in a distributed manner.

The steps of a method or algorithm described in connection with an embodiment of the present invention may be implemented directly in hardware, in a software module executed by hardware, or in a combination of the two. The software module may reside in random access memory (RAM), read only memory (ROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory, a hard disk, a removable disk, a CD-ROM, or any other form of computer-readable recording medium well known in the art to which the present invention pertains.

Although embodiments of the present invention have been described above with reference to the accompanying drawings, a person of ordinary skill in the art to which the present invention pertains will understand that the invention may be embodied in other specific forms without changing its technical idea or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive.

10: keyword extraction and content recommendation device
20: keyword extraction device
30: content recommendation device
110: extraction module
120: preprocessing module
130: classification module
140: setting module
150: communication module
160: database
170: search module
180: display unit
300: worker terminal
330: user terminal

Claims

an extraction module that analyzes input content based on keyword extraction rules and extracts a plurality of keywords related to the content based on the analysis result;
a classification module that analyzes the extracted plurality of keywords according to keyword classification rules and classifies the plurality of keywords; and
a setting module that sets a predetermined number of keywords included in segmentation performance data as representative keywords of the content,
wherein the extraction module, when the content does not include subtitles, extracts subtitles from the content using a speech-to-text (STT) function and extracts the plurality of keywords using the extracted subtitles, and
characterized in that at least one word mentioned more than a preset number of times in the content is extracted as the plurality of keywords, the preset number of times being determined based on the length of the content,
A keyword extraction device based on keyword extraction rules.

According to claim 1,
The keyword extraction device,
characterized in that a predetermined number of simple keywords are extracted based on the introductory material, title, and field of the content to determine a keyword extraction criterion, and, based on the determined keyword extraction criterion, words in the content that have a preset similarity to the criterion or share a common field with it are extracted as the plurality of keywords,
A keyword extraction device based on keyword extraction rules.

According to claim 2,
The keyword extraction device,
characterized in that a weight is assigned to the keywords of each classified similar keyword group based on the number of similar keywords in the group and the degree of matching with the keyword extraction criterion, and keywords in the same similar keyword group are assigned equal weights,
A keyword extraction device based on keyword extraction rules.

According to claim 1,
The keyword extraction device,
characterized in that, when the content is image-based content, an image in the content is recognized and the plurality of keywords related to the recognized image are extracted,
A keyword extraction device based on keyword extraction rules.

According to claim 1,
The keyword filtering rule,
characterized in that it excludes general phrases and keywords corresponding to generic expressions that lack specificity as keywords, and deletes postpositional particles included in the keywords,
A keyword extraction device based on keyword extraction rules.

According to claim 1,
The keyword extraction device,
characterized in that, when the content is lecture material, the keyword extraction criterion is determined based on at least one of the lecture introduction material, title, and lecture field of the content, and, based on the determined keyword extraction criterion, related keywords are extracted from at least one of the instructor name, lecture course, script, and voice data of the content,
A keyword extraction device based on keyword extraction rules.

According to claim 1,
The keyword extraction device,
among the plurality of keywords, classifies keywords that are synonyms or near-synonyms of one another, or that satisfy a predetermined level of similarity, as similar keywords,
characterized in that a weight is assigned to each classified keyword based on the number of similar keywords of that keyword and the degree of matching with the keyword extraction criterion,
A keyword extraction device based on keyword extraction rules.

A content recommendation device using a keyword using content for which a representative keyword is set by the keyword extracting device of claim 1,
The content recommendation device,
searches for content having the keyword input by the user as a representative keyword,
displays one or more of the found content items together with the representative keywords set for each content item,
when one of the displayed representative keywords is selected by the user, searches again, among the found content, for content having the selected representative keyword as a representative keyword,
characterized in that the one or more content items found by the re-search are displayed together with the representative keywords set for each content item,
A keyword extraction device based on keyword extraction rules.

As a method performed by the keyword extraction device,
Analyzing input content based on keyword extraction rules;
extracting a plurality of keywords related to the content based on the analysis result;
analyzing the extracted plurality of keywords according to keyword classification rules and classifying the plurality of keywords; and
Setting a predetermined number of keywords included in segmentation performance data as representative keywords of the content;
The keyword extraction device,
When subtitles are not included in the content, subtitles are extracted from the content using a speech to text (STT) function, and the plurality of keywords are extracted using the extracted subtitles;
Characterized in that at least one word mentioned more than a predetermined number of times in the content is extracted as the plurality of keywords, and the predetermined number of times is determined based on the length of the content,
Keyword extraction method based on keyword extraction rules.

A program that is combined with a computer, which is hardware, and stored in a medium to execute the method of claim 9.