KR20200109515A

KR20200109515A - Education contents generating method using big data

Info

Publication number: KR20200109515A
Application number: KR1020190028657A
Authority: KR
Inventors: 유필상
Original assignee: 주식회사 키즈브라운파트너스
Priority date: 2019-03-13
Filing date: 2019-03-13
Publication date: 2020-09-23

Abstract

The present invention provides a method for generating educational content using big data, in which a search keyword input by a user is used to search textbook content constructed as big data and the search result is provided as educational content. According to one embodiment of the present invention, the method, which is executed by an educational content device, comprises the steps of: acquiring a search keyword input by a user; determining main topic keywords from among the topic keywords tagged to each search keyword input by the user; extracting a first title keyword from the title of a first search keyword input by the user that is associated with a first main topic keyword among the determined main topic keywords; extracting a second title keyword for the textbook content from the titles of the textbook content constructed as big data among the determined main topic keywords; calculating a topic keyword similarity between the first main topic keyword and a second main topic keyword based on a similarity between the first title keyword and the second title keyword; and providing a search result for the search keyword input by the user as educational content based on the calculated topic keyword similarity.

Description

Education contents generating method using big data

The present invention relates to a method and apparatus for generating educational content using big data. More specifically, it relates to a method for generating educational content using big data that provides visualized educational content for a search keyword input by a user, based on topic keywords extracted from an educational content DB built from big data.

Current online education is not being used as actively as it could be. The root causes are as follows. First, most online education tools provide results as a list, which makes them inconvenient to use. Specifically, an educator who receives results in list form must check the listed content item by item to find the core content related to the education in question. Because this takes a great deal of time, it greatly reduces user convenience.

Meanwhile, some online education tools cluster the retrieved content to improve user convenience and provide educational content as the result. For example, as shown in FIG. 1, some education tools provide the linked material indicated by the retrieved content as a visualized network. However, most education tools that provide a visualized network as educational content determine the similarity between educational topics based on the co-occurrence frequency of topic keywords, so the accuracy of the educational content is poor.

Specifically, when the relationship between educational topics is determined using co-occurrence frequency, search terms contained in most of the content and/or general-purpose terms used routinely in the field may be misjudged as core topic keywords. In addition, a visualized network built on this basis is likely to have a hub-and-spoke structure in which the search terms and/or general-purpose terms act as strong hubs, as shown in FIG. 1. As a result, the value of the educational content falls and high-quality educational content cannot be provided to educators.

KR 1020100068532 A
KR 1020140050217 A
KR 1020170034206 A

The technical problem to be solved by the present invention is to provide a method of operating a device that provides visualized educational content for a search keyword input by a user.

Another technical problem to be solved by the present invention is to provide a method capable of accurately calculating the similarity or relationship between the topics indicated by search keywords input by a user.

The technical problems of the present invention are not limited to those mentioned above, and other technical problems not mentioned will be clearly understood by those of ordinary skill in the art from the description below.

To solve the above technical problem, a method for generating educational content using big data according to an embodiment of the present invention is performed by an educational content device and may comprise: acquiring search keywords input by a user; determining main topic keywords from among the topic keywords tagged to each of the search keywords input by the user; extracting, from the title of a first search keyword input by the user that is associated with a first main topic keyword among the determined main topic keywords, a first title keyword for the first search keyword input by the user; extracting, from the titles of textbook content constructed as big data among the determined main topic keywords, a second title keyword for the textbook content; calculating a topic keyword similarity between the first main topic keyword and a second main topic keyword based on the similarity between the first title keyword and the second title keyword; and providing a search result for the search keyword input by the user based on the calculated topic keyword similarity.

In one embodiment, determining the main topic keywords may comprise: calculating, for each of a plurality of time intervals, the per-interval frequency of each tagged topic keyword; excluding general-purpose topic keywords from the tagged topic keywords based on the calculated per-interval frequencies; and determining the main topic keywords from among the topic keywords that remain after the general-purpose topic keywords are excluded, wherein a general-purpose topic keyword may be a topic keyword whose calculated per-interval frequency stays at or above a threshold value across the plurality of time intervals.
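The exclusion rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the input layout (one list of tagged items per time interval, each item being a list of topic keywords), the sample data, and the threshold value are all hypothetical and do not come from the specification.

```python
from collections import Counter


def interval_frequencies(tagged_by_interval):
    """Return one Counter per time interval with each topic keyword's tag count."""
    return [
        Counter(kw for tags in items for kw in tags)
        for items in tagged_by_interval
    ]


def general_purpose_keywords(freqs, threshold):
    """Topic keywords whose count is at or above `threshold` in every interval.

    Assumes a positive threshold, so a keyword missing from any interval
    (count 0) can never qualify.
    """
    if not freqs:
        return set()
    return {kw for kw in freqs[0] if all(f[kw] >= threshold for f in freqs)}


def candidate_keywords(freqs, threshold):
    """All tagged topic keywords minus the general-purpose ones."""
    every_keyword = set().union(*freqs)
    return every_keyword - general_purpose_keywords(freqs, threshold)


# Hypothetical data: three time intervals, each holding the topic keywords
# tagged to the items published in that interval.
tagged_by_interval = [
    [["education", "phonics"], ["education", "grammar"]],
    [["education", "phonics"], ["education", "phonics"]],
    [["education", "phonics"], ["education", "phonics", "grammar"]],
]
freqs = interval_frequencies(tagged_by_interval)
general = general_purpose_keywords(freqs, threshold=2)
remaining = candidate_keywords(freqs, threshold=2)
```

In this sample, "education" meets the threshold in every interval and is treated as general-purpose, while "phonics" falls below it in the first interval and therefore stays among the candidates.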

In one embodiment, determining the main topic keywords may comprise: calculating, for each of a plurality of time intervals, the per-interval frequency of each tagged topic keyword; and determining as a main topic keyword, from among the tagged topic keywords, a topic keyword whose calculated per-interval frequency shows a sustained increasing trend across the plurality of time intervals.

In one embodiment, determining the main topic keywords may comprise: calculating, for each of a plurality of time intervals, the per-interval frequency of each tagged topic keyword; and determining as a main topic keyword, from among the tagged topic keywords, a topic keyword whose calculated per-interval frequency shows a sustained decreasing trend across the plurality of time intervals.
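The two trend-based embodiments above can be illustrated as follows. The specification does not define how a sustained trend is detected, so this sketch assumes the simplest reading: the per-interval frequency is strictly monotonic from one interval to the next. The function names and sample counts are hypothetical.

```python
def is_steadily_increasing(counts):
    """True if every interval's count is higher than the previous one."""
    return len(counts) >= 2 and all(b > a for a, b in zip(counts, counts[1:]))


def is_steadily_decreasing(counts):
    """True if every interval's count is lower than the previous one."""
    return len(counts) >= 2 and all(b < a for a, b in zip(counts, counts[1:]))


def select_by_trend(series, increasing=True):
    """Return the topic keywords whose per-interval counts follow the trend."""
    check = is_steadily_increasing if increasing else is_steadily_decreasing
    return sorted(kw for kw, counts in series.items() if check(counts))


# Hypothetical per-interval frequencies, oldest interval first.
series = {
    "phonics": [1, 3, 5],
    "grammar": [4, 2, 1],
    "reading": [2, 2, 3],
}
rising = select_by_trend(series, increasing=True)
falling = select_by_trend(series, increasing=False)
```

A real implementation might prefer a more tolerant test, such as the sign of a fitted slope, so that a single flat interval (as with "reading" here) does not disqualify a keyword.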

In one embodiment, calculating the topic keyword similarity between the first main topic keyword and the second main topic keyword may comprise: generating a first keyword vector for the first main topic keyword based on the frequency of the first title keywords; generating a second keyword vector for the second main topic keyword based on the frequency of the second title keywords; and calculating the topic keyword similarity using the similarity between the first keyword vector and the second keyword vector.
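As a rough illustration of the vector-based comparison above, the sketch below builds sparse frequency vectors from title keywords and compares them. The passage only says that a similarity between the two keyword vectors is used; cosine similarity is an assumption made here for illustration, and the function names and sample keywords are hypothetical.

```python
import math
from collections import Counter


def keyword_vector(title_keywords):
    """Sparse frequency vector: title keyword -> number of occurrences."""
    return Counter(title_keywords)


def cosine_similarity(v1, v2):
    """Cosine of the angle between two sparse frequency vectors."""
    dot = sum(v1[k] * v2[k] for k in v1.keys() & v2.keys())
    norm1 = math.sqrt(sum(c * c for c in v1.values()))
    norm2 = math.sqrt(sum(c * c for c in v2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm1 * norm2)


# Hypothetical title keywords gathered for two main topic keywords.
first_vector = keyword_vector(["vowel", "sound", "vowel"])
second_vector = keyword_vector(["vowel", "sound", "sound"])
topic_similarity = cosine_similarity(first_vector, second_vector)
```

With these sample vectors the dot product is 4 and each norm is the square root of 5, giving a similarity of 0.8.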

In one embodiment, providing the search result for the search keyword input by the user may comprise providing the relationships between the determined main topic keywords as a visualized network, wherein a first node of the visualized network corresponds to the first main topic keyword, a second node of the visualized network corresponds to the second main topic keyword, and the weight of the edge between the first node and the second node is determined based on the topic keyword similarity between the first main topic keyword and the second main topic keyword.
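The node and edge mapping described above can be sketched as a small weighted graph. This is an illustration only: the `build_network` function, the `min_weight` cutoff for dropping weak edges, and the similarity table are hypothetical, and the specification does not state how edge weights are derived from the similarity beyond being based on it. Here the weight is simply the similarity value itself.

```python
from itertools import combinations


def build_network(keywords, similarity, min_weight=0.0):
    """Return (nodes, edges) where edges maps a keyword pair to its weight.

    Each main topic keyword becomes a node. An edge is kept only when the
    similarity of the pair is above `min_weight` (an assumed cutoff).
    """
    nodes = sorted(keywords)
    edges = {}
    for a, b in combinations(nodes, 2):
        weight = similarity(a, b)
        if weight > min_weight:
            edges[(a, b)] = weight
    return nodes, edges


# Hypothetical precomputed topic keyword similarities, keyed by sorted pair.
similarity_table = {
    ("grammar", "phonics"): 0.2,
    ("phonics", "reading"): 0.8,
}


def lookup(a, b):
    """Similarity of a pair; pairs missing from the table count as 0.0."""
    return similarity_table.get((a, b), 0.0)


nodes, edges = build_network({"phonics", "reading", "grammar"}, lookup, min_weight=0.1)
```

The resulting `nodes` and `edges` could then be passed to any graph drawing tool, with edge thickness or length driven by the weight.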

An educational content device according to another embodiment of the present invention for solving the above technical problem comprises one or more processors, a network interface, a memory into which a computer program executed by the processor is loaded, and storage that stores a plurality of search keywords input by the user and the computer program, wherein the computer program may include: an operation of determining main topic keywords from among the topic keywords tagged to each of the plurality of search keywords input by the user; an operation of extracting, from the title of a first search keyword input by the user that is associated with a first main topic keyword among the determined main topic keywords, a first title keyword for the first search keyword input by the user; an operation of extracting, from the titles of textbook content constructed as big data among the determined main topic keywords, a second title keyword for the textbook content; an operation of calculating a topic keyword similarity between the first main topic keyword and a second main topic keyword based on the similarity between the first title keyword and the second title keyword; and an operation of providing a search result for the search keyword input by the user based on the calculated topic keyword similarity.

An educational content computer program according to yet another embodiment of the present invention for solving the above technical problem may be stored on a recording medium so as to execute, in combination with a computing device, the steps of: acquiring search keywords input by a user; determining main topic keywords from among the topic keywords tagged to each of the search keywords input by the user; extracting, from the title of first search target content associated with a first main topic keyword among the determined main topic keywords, a first title keyword for the first search keyword input by the user; extracting, from the titles of textbook content constructed as big data among the determined main topic keywords, a second title keyword for the textbook content; calculating a topic keyword similarity between the first main topic keyword and a second main topic keyword based on the similarity between the first title keyword and the second title keyword; and providing a search result for the search keyword input by the user based on the calculated topic keyword similarity.

According to the present invention, a network that visualizes the relationships between educational topics is provided as educational content for a given search keyword input by a user. This allows the user to check educational trend information intuitively, which improves how well the educational content conveys information.

In addition, general-purpose topic keywords are excluded from the topic keywords tagged to the search keywords input by the user, and topic keywords whose frequency keeps increasing may be determined as main topic keywords. As a result, the topic keywords in which educators' interest is growing are reflected in the visualized network, so high-quality educational content that reflects the educational trends of the field is provided. Furthermore, because general-purpose topic keywords are prevented from acting as strong hub nodes in the visualized network, accurate educational content is provided.

In addition, the topic keyword similarity between main topic keywords may be calculated based on the similarity of title keywords. As a result, highly similar main topic keywords cluster together and the effect of strong hub nodes is suppressed, so accurate and valuable educational content is provided.

The effects of the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those of ordinary skill in the art from the description below.

FIG. 1 is an exemplary diagram of a network generated based on conventional co-occurrence frequency.
FIG. 2 is a block diagram of an educational content system according to an embodiment of the present invention.
FIG. 3 is a diagram for explaining exemplary topic keywords that may be referred to in some embodiments of the present invention.
FIG. 4 is a block diagram showing an educational content device according to another embodiment of the present invention.
FIG. 5 is a hardware configuration diagram of an educational content device according to yet another embodiment of the present invention.
FIG. 6 is a flowchart of a method for generating educational content using big data according to yet another embodiment of the present invention.
FIG. 7 is a diagram for explaining the search target period determination step (S200) shown in FIG. 6.
FIGS. 8 to 11 are diagrams for explaining the main topic keyword determination step (S400) shown in FIG. 6.
FIGS. 12 to 15 are diagrams for explaining the topic keyword similarity calculation step (S500) shown in FIG. 6.
FIGS. 16 to 18 are diagrams for explaining the search result providing step (S600) shown in FIG. 6.
FIGS. 19 and 20 are diagrams for explaining the evaluation results of a network provided by the method for generating educational content using big data according to an embodiment of the present invention.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings.

The advantages and features of the present invention, and the methods of achieving them, will become clear from the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in a variety of different forms. These embodiments are provided only to make the disclosure of the present invention complete and to fully convey the scope of the invention to those of ordinary skill in the art to which the present invention pertains, and the present invention is defined only by the scope of the claims. Like reference numerals refer to like elements throughout the specification.

Unless otherwise defined, all terms used in this specification (including technical and scientific terms) have the meanings commonly understood by those of ordinary skill in the art to which the present invention belongs. Terms defined in commonly used dictionaries are not to be interpreted in an idealized or excessive manner unless explicitly and specifically defined. The terms used in this specification are for describing the embodiments and are not intended to limit the present invention. In this specification, the singular form also includes the plural form unless the context specifically states otherwise.

Hereinafter, some embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 2 is a block diagram of an educational content system according to an embodiment of the present invention.

Referring to FIG. 2, the educational content system may be configured to include an educational content device 100, at least one big data DB 200a to 200n, and a user terminal 300. In particular, FIG. 2 shows, as an example, a case in which educational content is acquired from a plurality of big data DBs 200a to 200n. However, the configuration shown in FIG. 2 is only a preferred embodiment for achieving the object of the present invention, and some components may of course be added or removed as necessary. Note also that the components of the educational content system shown in FIG. 2 are functionally distinguished functional elements, and that in an actual physical environment one or more of them may be implemented in integrated form. Each component of the educational content system is described below.

In the educational content system, the educational content device 100 is a computing device that performs a search on the search keywords input by the user that are acquired from the big data DBs 200a to 200n and provides educational content.

Here, the computing device may be a notebook, desktop, laptop, smartphone, or the like, but is not limited thereto and may include any type of device equipped with computing means and communication means. In addition, the search keywords input by the user may include textbooks, multimedia data, educational reports, and the like, and may be any type of content information. That is, the scope of the present invention is not limited to a particular type of content.

In some embodiments, the search keywords input by the user may be content provided as search results through a given search tool.

According to an embodiment of the present invention, the educational content device 100 excludes general-purpose topic keywords from the topic keywords tagged to each of the acquired search keywords input by the user, determines main topic keywords from among the remaining topic keywords, and provides, as educational content, a network visualized on the basis of the similarity between the main topic keywords. Here, a general-purpose topic keyword may be an everyday term that appears with high frequency in the search keywords input by the user but is of low importance as an educational topic. According to this embodiment, removing the general-purpose topic keywords before the search is performed prevents them from acting as strong hubs in the visualized network. As a result, more accurate educational content can be provided to the user. This embodiment is described in more detail below with reference to FIG. 7 and subsequent drawings.

In one embodiment, when the search keywords input by the user are language-learning content provided as search results through arbitrary educational information, the tagged topic keywords may be language-learning topic keywords as shown in FIG. 3. The language-learning topic keywords have a hierarchical structure that reflects the educational topics of the field well and consist of terms familiar to educators in the field. According to this embodiment, providing educational content on a keyword basis improves how well information is conveyed to educators in the field.

In one embodiment, when the search keyword input by the user is of the multimedia type, the tagged topic keywords may be the title of the multimedia data or keywords defined by its author. For example, in multimedia 410 as shown in FIG. 4, the tagged topic keywords may be the keywords 413 recorded by the multimedia author separately from the multimedia title 411.

According to an embodiment of the present invention, the educational content device 100 extracts title keywords from the titles of the search keywords input by the user that are associated with a main topic keyword. The educational content device 100 then calculates the topic keyword similarity between main topic keywords based on the similarity between the title keywords. That is, the topic keyword similarity is calculated not from the co-occurrence frequency of the main topic keywords but from the similarity of the title keywords associated with them. This takes advantage of the fact that the title of a piece of content condenses the core educational topic the author wants to express and contains the latest keywords used in the field. According to this embodiment, the similarity between main topic keywords can be calculated more accurately, and general-purpose topic keywords, search terms, and the like can be prevented from acting as strong hub nodes in the visualized network. As a result, high-quality, valuable educational content can be provided to the user. This embodiment is also described in detail below with reference to FIG. 11 and subsequent drawings.
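The title keyword extraction step mentioned above could look something like the following. The specification does not say how titles are tokenized, so this sketch assumes English titles, a simple regular-expression tokenizer, and a small hand-written stopword list; the function name and sample titles are hypothetical. Korean titles would need a morphological analyzer instead of this tokenizer.

```python
import re
from collections import Counter

# Assumed minimal stopword list; a real system would use a fuller one.
STOPWORDS = frozenset({"a", "an", "the", "of", "for", "and", "in", "to", "with"})


def extract_title_keywords(titles):
    """Count the non-stopword tokens that appear in a collection of titles."""
    counts = Counter()
    for title in titles:
        for token in re.findall(r"[a-z0-9]+", title.lower()):
            if token not in STOPWORDS:
                counts[token] += 1
    return counts


# Hypothetical titles of content associated with one main topic keyword.
titles = [
    "Phonics Games for Early Readers",
    "Teaching Phonics with Songs",
]
title_keyword_counts = extract_title_keywords(titles)
```

The resulting counts can serve directly as the frequency vector for that main topic keyword when two main topic keywords are compared.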

According to an embodiment of the present invention, the educational content device 100 determines main topic keywords from the search keywords input by the user that were published during the search target period, and may provide the relationships between the determined main topic keywords in the form of a visualized network. The search target period may be set by the user or set automatically by the educational content device 100, depending on the embodiment. According to this embodiment, the visualized network improves how well information is conveyed. In particular, when the search target period is set to a recent period, the visualized network reflects the latest educational trends, so more valuable educational content can be provided. The visualized network is described in more detail below with reference to FIG. 15 and subsequent drawings.

In the educational content system, the big data DBs 200a to 200n are the data sources from which the search keywords input by the user are provided. A big data DB 200a to 200n may be an externally located database or may be local storage of the educational content device 100, and it may be implemented in any manner as long as it can provide the search keywords input by the user.

In the educational content system, the user terminal 300 is the terminal that the user uses to receive educational content for the search keywords input by the user.

The components of the educational content system shown in FIG. 2 may communicate over a network. Here, the network may be implemented as any type of wired or wireless network, such as a local area network (LAN), a wide area network (WAN), a mobile radio communication network, or WiBro (Wireless Broadband Internet).

So far, an educational content system according to an embodiment of the present invention has been described with reference to FIGS. 2 and 3. Next, the configuration and operation of the educational content device 100 are described with reference to FIGS. 4 and 5.

FIG. 4 is a block diagram showing an educational content device 100 according to another embodiment of the present invention.

Referring to FIG. 4, the educational content device 100 may be configured to include a search target period determination unit 110, a topic keyword extraction unit 130, a main topic keyword determination unit 150, a similarity calculation unit 170, and a visualization unit 190. However, FIG. 4 shows only the components related to the embodiment of the present invention. Those of ordinary skill in the art to which the present invention belongs will therefore recognize that other general-purpose components may be included in addition to those shown in FIG. 4. Note also that the components of the educational content device shown in FIG. 4 are functionally distinguished functional elements, and that in an actual physical environment one or more of them may be implemented in integrated form. Each component of the educational content device 100 is described below.

The search target period determination unit 110 determines a search target period based on the number of contents published in each time interval. This is described in detail below with reference to FIG. 7.

The topic keyword extraction unit 130 extracts the topic keywords tagged to the search keywords input by the user that were issued during the search target period. The main topic keyword determination unit 150 removes general-purpose topic keywords from the extracted topic keywords based on the frequency of each topic keyword, and determines the main topic keywords from among those that remain. This is described in detail below with reference to FIGS. 8 to 11.

The similarity calculation unit 170 calculates the similarity between main topic keywords. Specifically, the similarity calculation unit 170 calculates the topic keyword similarity between a first main topic keyword and a second main topic keyword based on the similarity between a first title keyword and a second title keyword. The first title keyword is extracted from the titles of the search keywords input by a first user that are associated with the first main topic keyword among the determined main topic keywords. The second title keyword is extracted from the titles of the textbook content big data among the determined main topic keywords. This is described in detail below with reference to FIGS. 12 to 15.

The visualization unit 190 visualizes the relationships between the main topic keywords as a network based on the calculated topic keyword similarity. This is described in detail below with reference to FIGS. 15 and 16.

Although not shown in FIG. 4, the educational content device 100 may further include an interface unit (not shown). The interface unit provides the visualized network as educational content. It also receives various inputs from a user or a user terminal and causes the operations triggered by those inputs to be performed. For example, the interface unit may receive a selection input for a specific node of the network and, in response, provide the frequency of a title keyword, related content, and the like. This is also described in detail below with reference to FIGS. 16 to 18.

Each component in FIG. 4 may be software, or hardware such as a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC). The components are not limited to software or hardware, however: they may be configured to reside in an addressable storage medium or to run on one or more processors. The functions provided by the components may be implemented by more finely divided components, or several components may be combined into a single component that performs a specific function.

Next, FIG. 5 is a hardware configuration diagram of an educational content device 100 according to another embodiment of the present invention.

Referring to FIG. 5, the educational content device 100 may include one or more processors 101, a bus 105, a network interface 107, a memory 103 that loads a computer program executed by the processor 101, and a storage 109 that stores educational content software 109a. FIG. 5 shows only the components relevant to this embodiment, so a person of ordinary skill in the art will recognize that other general-purpose components may be included as well.

The processor 101 controls the overall operation of each component of the educational content device 100. The processor 101 may include a central processing unit (CPU), a microprocessor unit (MPU), a microcontroller unit (MCU), a graphics processing unit (GPU), or any other type of processor well known in the technical field of the present invention. The processor 101 may also perform the computation for at least one application or program that executes the method according to embodiments of the present invention. The educational content device 100 may include one or more processors.

The memory 103 stores various data, commands, and/or information. The memory 103 may load one or more programs 109a from the storage 109 in order to execute the educational content generation method according to embodiments of the present invention. FIG. 5 shows RAM as an example of the memory 103.

The bus 105 provides communication between the components of the educational content device 100. The bus 105 may be implemented as any of various types of bus, such as an address bus, a data bus, or a control bus.

The network interface 107 supports wired and wireless Internet communication for the educational content device 100. The network interface 107 may also support communication methods other than Internet communication. To this end, the network interface 107 may include a communication module well known in the technical field of the present invention.

The storage 109 may store, in a non-transitory manner, one or more programs 109a and the search keywords 109b input by the user. FIG. 5 shows the educational content software 109a as an example of the one or more programs 109a.

The storage 109 may include a non-volatile memory such as a read-only memory (ROM), an erasable programmable ROM (EPROM), an electrically erasable programmable ROM (EEPROM), or a flash memory; a hard disk; a removable disk; or any other type of computer-readable recording medium well known in the technical field to which the present invention belongs.

The educational content software 109a may execute the educational content generation method using big data according to an embodiment of the present invention. For example, the educational content software 109a is loaded into the memory 103 and causes the one or more processors 101 to execute: an operation of determining main topic keywords from among the topic keywords tagged to each of a plurality of search target textbooks; an operation of extracting a first title keyword for the search keyword input by a first user from the titles of the search keywords input by the first user that are associated with a first main topic keyword among the determined main topic keywords; an operation of extracting a second title keyword for the search keyword input by a second user from the titles of the textbook content big data among the determined main topic keywords; an operation of calculating a topic keyword similarity between the first main topic keyword and a second main topic keyword based on the first title keyword and the second title keyword; and an operation of providing a search result for the search keyword input by the user based on the calculated topic keyword similarity.

The configuration and operation of the educational content device 100 according to an embodiment of the present invention have been described above with reference to FIGS. 4 and 5. Next, a method of generating educational content using big data according to another embodiment of the present invention is described in detail with reference to FIGS. 6 to 18.

Each step of the method described below may be performed by a computing device, for example the educational content device 100. For convenience of description, the entity performing each step may be omitted. Each step of the method may also be implemented as an operation executed by the processor 101.

FIG. 6 is a flowchart of the method of generating educational content using big data according to an embodiment of the present invention. This is only a preferred embodiment for achieving the object of the present invention, and steps may be added or removed as needed.

Referring to FIG. 6, in step S100, a plurality of contents are acquired. For example, the contents may be obtained as search results from a predetermined search tool. As another example, the contents may be obtained by crawling a preset data source. Any method of acquiring the contents may be used.

In step S200, a search target period is determined based on the number of contents published in each time interval. The search keywords input by the user may then be determined as the contents, among the acquired contents, that were published during the search target period. Depending on the embodiment, however, the search target period may be set by the user, or it may not be limited to any specific period. The way the educational content device 100 determines the search target period is described further below with reference to FIG. 7.

FIG. 7 shows an example of determining the search target period when the unit time interval is set to "1 year".

Referring to FIG. 7, when the unit time interval is "1 year", the number of contents published per year is calculated from the publication years of the acquired textbooks. The first time interval in which that number is equal to or greater than a preset threshold 430 may then be determined as the start interval of the search target period 450. FIG. 7 shows, as an example, 2010 determined as the start interval of the search target period 450. The last interval of the search target period 450 may be the most recent time interval, or the last time interval in which the number of contents published per year is equal to or greater than the threshold 430. According to this embodiment, past time intervals in which the number of published contents falls below the threshold may be excluded from the search target period, since they are less worth searching. This reduces the number of search keywords input by the user that must be processed, and therefore the computing cost of the search.
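The threshold rule for a one-year unit interval can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `search_period`, the sample publication years, and the threshold value are all assumptions, and the end of the period is taken to be the most recent year (one of the two options described above).

```python
from collections import Counter

def search_period(pub_years, threshold):
    """Return (start_year, end_year) of the search target period, or None.

    Hypothetical helper: the start is the first year whose publication count
    reaches the threshold; the end is the most recent year in the data.
    """
    counts = Counter(pub_years)  # number of contents published per year
    qualifying = sorted(y for y, n in counts.items() if n >= threshold)
    if not qualifying:
        return None
    return qualifying[0], max(counts)

# Made-up publication years: 2008 and 2009 fall below the threshold of 3.
years = [2008, 2009, 2010, 2010, 2010, 2011, 2011, 2011, 2012, 2012, 2012, 2012]
period = search_period(years, threshold=3)
print(period)
```

Contents published before the returned start year would simply be dropped before the later steps run.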

According to an embodiment of the present invention, the average number of contents published per time interval may be calculated, and the search target period may be determined from that average. For example, when the unit time interval is set to "3 years", the number of contents published per year is first calculated as shown in FIG. 7, and the average number of published contents is then calculated for each three-year span. The first three-year interval in which the average is equal to or greater than the threshold may be determined as the start interval of the search target period. According to this embodiment, the search target period can be determined more accurately because fluctuations in the frequency are taken into account.
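A sketch of the averaged variant follows. The text does not say how the three-year spans are aligned, so this example assumes non-overlapping windows starting at the earliest year; `windowed_start` and the sample counts are hypothetical.

```python
def windowed_start(counts_by_year, window, threshold):
    """Return (first_year, last_year) of the first window whose average
    yearly count reaches the threshold, or None if no window qualifies.

    Assumption: windows do not overlap and are aligned to the earliest year.
    """
    years = sorted(counts_by_year)
    first, last = years[0], years[-1]
    for start in range(first, last + 1, window):
        span = range(start, min(start + window, last + 1))
        avg = sum(counts_by_year.get(y, 0) for y in span) / len(span)
        if avg >= threshold:
            return span[0], span[-1]
    return None

# 2008 is an isolated spike; a per-year threshold of 4 would start there,
# but the three-year average for 2007-2009 stays below the threshold.
counts = {2004: 1, 2005: 0, 2006: 2, 2007: 1, 2008: 6,
          2009: 1, 2010: 5, 2011: 6, 2012: 7}
start_window = windowed_start(counts, window=3, threshold=4)
print(start_window)
```

The spike in 2008 is smoothed away by the averaging, which is the fluctuation effect the paragraph above refers to.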

According to an embodiment of the present invention, a plurality of search target periods may also be determined. For example, the search target period 450 shown in FIG. 7 may be the overall search target period, a first time interval within it may be a first search target period, and a second time interval within it may be a second search target period. In this case, a visualized network may be provided as educational content for each search target period.

Referring again to FIG. 6, in step S300, the topic keywords tagged to each of the search keywords input by the user are extracted.

For example, when the search keyword input by the user relates to language education, the tagged topic keyword may be a keyword included in the bibliographic information of a textbook.

In step S400, the main topic keywords are determined from among the tagged topic keywords. This step (S400) is described in detail below with reference to FIGS. 8 to 11.

In step S500, the topic keyword similarity between main topic keywords is calculated. It is calculated from the similarity between the title keywords extracted from the titles of the search keywords input by the user that are associated with each main topic keyword. This step (S500) is described in detail below with reference to FIGS. 12 to 15.

In step S600, a search result for the search keyword input by the user is provided based on the topic keyword similarity. For example, the search result may be a visualized network. This step (S600) is described in detail below with reference to FIGS. 15 and 16.

The method of generating educational content using big data according to an embodiment of the present invention has been described above with reference to FIGS. 6 and 7. The method of determining main topic keywords performed in step S400 is described next with reference to FIGS. 8 to 11.

FIG. 8 is a detailed flowchart of step S400 shown in FIG. 6.

Referring to FIG. 8, in step S410, the frequency per time interval is calculated for each tagged topic keyword.

In step S430, general-purpose topic keywords are excluded from the tagged topic keywords based on the calculated frequency per time interval. Specifically, as shown in FIG. 9, a topic keyword whose frequency per time interval remains equal to or greater than a threshold 470 across a plurality of time intervals is judged to be a general-purpose topic keyword. The general-purpose topic keywords are then excluded from the tagged topic keywords.

Depending on the embodiment, the educational content device 100 may identify general-purpose topic keywords using the mean and/or the standard deviation (or variance) of the frequency per time interval. For example, a topic keyword whose mean frequency per time interval is equal to or greater than a first threshold and whose standard deviation or variance is equal to or less than a second threshold may be judged to be a general-purpose topic keyword.
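Both tests for a general-purpose topic keyword can be sketched in a few lines. The function names, the sample frequencies, and the threshold values are illustrative assumptions; the population standard deviation is used here, although the text does not specify which estimator is intended.

```python
from statistics import mean, pstdev

def always_frequent(freqs, threshold):
    """Step S430 as described: the frequency stays at or above the threshold
    in every time interval."""
    return all(f >= threshold for f in freqs)

def frequent_and_stable(freqs, min_mean, max_std):
    """Variant: high mean frequency combined with low spread."""
    return mean(freqs) >= min_mean and pstdev(freqs) <= max_std

# Made-up frequencies per time interval for three topic keywords.
freq_by_word = {
    "education": [50, 52, 49, 51],  # high and flat, so general-purpose
    "coding":    [2, 8, 20, 45],    # rising
    "phonics":   [3, 2, 3, 2],      # low
}
remaining = [w for w, f in freq_by_word.items()
             if not frequent_and_stable(f, min_mean=30, max_std=5)]
print(remaining)
```

Only the keywords left in `remaining` would be considered in step S450.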

In step S450, the main topic keywords are determined, based on the frequency per time interval, from among the topic keywords that remain after the general-purpose topic keywords are excluded.

In one embodiment, a main topic keyword may be a topic keyword whose frequency per time interval shows a sustained increasing trend across a plurality of time intervals. For example, as shown in FIG. 10, a topic keyword whose frequency per time interval is trending upward may become a main topic keyword. According to this embodiment, topics that reflect the interest of many educators in the relevant field can be chosen as main topic keywords. More valuable educational content can therefore be provided to those educators, and in particular content that reflects the latest educational trends.

In one embodiment, a main topic keyword may be a topic keyword whose frequency per time interval shows a sustained decreasing trend across a plurality of time intervals. For example, as shown in FIG. 11, a topic keyword whose frequency per time interval is trending downward may become a main topic keyword. According to this embodiment, topics in which educators in the relevant field are losing interest can be chosen as main topic keywords, so that educational content covering multiple perspectives can be provided.
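One simple way to detect such trends is to require the frequency to move in the same direction in every consecutive pair of time intervals. That strict reading of "sustained trend" is an assumption made for this sketch (a regression slope or a moving average would be equally plausible), and the function name `trend` is hypothetical.

```python
def trend(freqs):
    """Classify a frequency series as 'rising', 'falling', or None.

    Assumption: a sustained trend means strictly monotonic change across
    all consecutive time intervals.
    """
    pairs = list(zip(freqs, freqs[1:]))
    if pairs and all(a < b for a, b in pairs):
        return "rising"
    if pairs and all(a > b for a, b in pairs):
        return "falling"
    return None

print(trend([2, 8, 20, 45]))   # frequency grows in every interval
print(trend([40, 30, 12, 5]))  # frequency shrinks in every interval
print(trend([3, 2, 3, 2]))     # no sustained trend
```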

The method of determining main topic keywords according to an embodiment of the present invention has been described above with reference to FIGS. 8 to 11. With this method, general-purpose topic keywords are excluded and topics that reflect the interest of many educators are chosen as main topic keywords. Valuable educational content that reflects educational trends can therefore be provided to the user.

The method of calculating topic keyword similarity performed in step S500 is described next with reference to FIGS. 12 to 15.

FIG. 12 is a detailed flowchart of step S500 shown in FIG. 6. For ease of understanding, FIG. 12 shows how the similarity between two main topic keywords (hereinafter the "first main topic keyword" and the "second main topic keyword") is calculated.

Referring to FIG. 12, in step S510, a first title keyword for the first search target content is extracted from the titles of the search keywords input by the first user that are associated with the first main topic keyword. As shown in FIG. 13, at least one natural language processing algorithm well known in the art may be used to extract the title keywords 520a to 520n from the title 510 of a content, and any such algorithm may be used.

The search keywords input by the first user may refer to the contents, among the search keywords input by all users, that are indexed by the first main topic keyword or tagged with the first main topic keyword. As shown in FIG. 12B, a plurality of contents may be indexed by the main topic keyword 530.

In step S530, a second title keyword for the search keyword input by the second user is extracted from the titles of the textbook content big data.

In step S550, a first keyword vector for the first main topic keyword is generated based on the frequency of the first title keyword. Specifically, the first keyword vector may be generated according to Equation 1 below. In Equation 1, Mi denotes the i-th main topic keyword, Tkj denotes the j-th title keyword, V denotes a vector, and freq(Tkj|Mi) denotes the frequency with which the j-th title keyword appears in the search keywords input by the user that are associated with the i-th main topic keyword.

Equation 1
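The body of Equation 1 is not reproduced in this text. Based only on the symbol definitions given above and on the k-dimensional vector described for step S550, a plausible reconstruction (an assumption, not the original drawing) is the following, with Mi and Tkj written as M_i and Tk_j:

```latex
\[
V(M_i) = \bigl(\mathrm{freq}(Tk_1 \mid M_i),\ \mathrm{freq}(Tk_2 \mid M_i),\ \ldots,\ \mathrm{freq}(Tk_k \mid M_i)\bigr)
\]
```

Here k is the number of distinct title keywords, so each main topic keyword maps to one point in a k-dimensional space.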

This step (S550) is described further with reference to FIG. 13.

Referring to FIG. 13, the frequency with which each of the title keywords 550 appears in the titles 540a to 540n of the search keywords input by the user that are associated with the main topic keyword 530 is calculated. More specifically, the frequency 560a with which the title keyword 550a appears in the titles 540a to 540n is calculated, then the frequency 560b with which the title keyword 550b appears in the titles 540a to 540n, and so on for each of the title keywords 550.

Next, a keyword vector 570 is generated whose elements are the frequencies of the individual title keywords. That is, the vector element 570a of the keyword vector 570 is set to the frequency 560a of the title keyword 550a, and the vector element 570b is set to the frequency 560b of the title keyword 550b. In this way, a k-dimensional first keyword vector (where k is a natural number equal to or greater than 1) may be generated for the first main topic keyword in step S550.
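The construction of a keyword vector can be illustrated as follows. Lower-casing and whitespace splitting stand in for the unspecified natural language processing algorithm, and the vocabulary, the titles, and the function name `keyword_vector` are invented for the example.

```python
from collections import Counter

def keyword_vector(titles, vocabulary):
    """Build the frequency vector of one main topic keyword.

    titles: titles of the contents associated with that main topic keyword.
    vocabulary: ordered list of the k title keywords (fixes the vector layout).
    Assumption: whitespace tokenization replaces a real NLP keyword extractor.
    """
    counts = Counter(tok for title in titles for tok in title.lower().split())
    return [counts[kw] for kw in vocabulary]

vocabulary = ["phonics", "reading", "grammar"]  # k = 3 title keywords
titles = ["Phonics for early reading", "Reading games", "Fun phonics songs"]
vector = keyword_vector(titles, vocabulary)
print(vector)
```

Using the same `vocabulary` order for every main topic keyword is what makes the resulting vectors comparable.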

Referring again to FIG. 12, in step S570, a second keyword vector for the second main topic keyword is generated based on the frequency of the second title keyword. Like the first keyword vector, the second keyword vector may be generated as a k-dimensional vector.

The description so far has limited the value of each vector element to the frequency of the corresponding title keyword. According to an embodiment of the present invention, however, the value of a vector element may instead be a TF-IDF (term frequency-inverse document frequency) value, or an IDF value, depending on the embodiment.
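The TF-IDF variant can be sketched as below. The text does not define the "document" unit, so this example assumes that each main topic keyword's set of titles counts as one document and uses the plain weighting tf * log(N / df); other TF-IDF formulations exist, and `tfidf_vectors` is a hypothetical name.

```python
import math

def tfidf_vectors(freq_vectors):
    """Re-weight frequency vectors with tf * log(N / df).

    freq_vectors: {main_topic_keyword: [frequency of each title keyword]}.
    Assumption: N is the number of main topic keywords, and df counts how
    many of them have a nonzero frequency for a given title keyword.
    """
    n = len(freq_vectors)
    k = len(next(iter(freq_vectors.values())))
    df = [sum(1 for v in freq_vectors.values() if v[j] > 0) for j in range(k)]
    idf = [math.log(n / d) if d else 0.0 for d in df]
    return {m: [tf * w for tf, w in zip(v, idf)]
            for m, v in freq_vectors.items()}

weighted = tfidf_vectors({"A": [2, 2, 0], "B": [0, 3, 1]})
print(weighted)
```

A title keyword that appears under every main topic keyword (the second element here) receives weight zero, so it no longer contributes to the similarity.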

The description of FIG. 12 was limited to the first and second main topic keywords, but as noted above this is only for ease of understanding; as shown in FIG. 13, a keyword vector is generated for each main topic keyword.

Referring again to FIG. 12, in step S590, the topic keyword similarity between the first main topic keyword and the second main topic keyword is calculated using the similarity between the first keyword vector and the second keyword vector.

In one embodiment, the similarity between the first keyword vector and the second keyword vector may be calculated as the cosine similarity, as shown in Equation 2 below and in FIG. 14. Cosine similarity is well known in the art, so its description is omitted.

Equation 2
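The body of Equation 2 is not reproduced in this text. The standard definition of cosine similarity, written with the keyword-vector notation of Equation 1 (M_1 and M_2 for the first and second main topic keywords, which is an assumed notation), is:

```latex
\[
\mathrm{sim}(M_1, M_2) = \cos\theta
= \frac{V(M_1)\cdot V(M_2)}{\lVert V(M_1)\rVert\,\lVert V(M_2)\rVert}
= \frac{\sum_{j=1}^{k} \mathrm{freq}(Tk_j \mid M_1)\,\mathrm{freq}(Tk_j \mid M_2)}
       {\sqrt{\sum_{j=1}^{k} \mathrm{freq}(Tk_j \mid M_1)^2}\;\sqrt{\sum_{j=1}^{k} \mathrm{freq}(Tk_j \mid M_2)^2}}
\]
```

Because frequencies are non-negative, the value lies between 0 (no shared title keywords) and 1 (identical direction).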

In one embodiment, the similarity between the first keyword vector and the second keyword vector may be calculated as a Euclidean similarity. That is, it may be calculated from the Euclidean distance between the points indicated by the two vectors in a vector space model such as the one shown in FIG. 14.

The similarity between the first keyword vector and the second keyword vector may also be calculated in various other ways.
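The two similarity measures can be compared on small vectors. The mapping from Euclidean distance d to a similarity, 1 / (1 + d), is an assumption chosen for illustration, since the text only says that the similarity is based on the distance.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two keyword vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def euclidean_similarity(u, v):
    """Assumed mapping of Euclidean distance d to a similarity in (0, 1]."""
    return 1.0 / (1.0 + math.dist(u, v))

first = [2, 2, 0]   # keyword vector of the first main topic keyword
second = [0, 3, 1]  # keyword vector of the second main topic keyword
print(cosine_similarity(first, second))
print(euclidean_similarity(first, second))
```

Cosine similarity ignores vector length, so a main topic keyword with many contents and one with few contents can still be judged similar; the Euclidean variant is sensitive to that difference in scale.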

The method of calculating topic keyword similarity according to an embodiment of the present invention has been described above with reference to FIGS. 12 to 15. With this method, the similarity between main topic keywords is calculated from the similarity of their associated title keywords. Because the title of a content condenses the core educational topic the author wants to convey and contains the latest keywords in the field, the similarity between main topic keywords can be calculated accurately from the similarity of title keywords. In addition, general-purpose topic keywords, search terms, and the like can be prevented from acting as strong hubs in the visualized network.

Hereinafter, the method of providing a search result performed in step S600 will be described with reference to FIGS. 16 to 18.

According to an embodiment of the present invention, the educational content device 100 may provide the relationships between main subject words as a visualized network, based on the subject word similarity between the main subject words. In the visualized network, each node corresponds to a main subject word, and the weight of each edge may be determined based on the subject word similarity between the corresponding main subject words. In addition, the corresponding main subject word may be displayed adjacent to each node.
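As a rough sketch of the network described above, the following Python builds one labeled node per main subject word and one weighted edge per pair of subject words. The function name, the dictionary layout, and the `min_weight` cutoff are all hypothetical choices made for illustration; the text only states that nodes correspond to main subject words and that edge weights are based on subject word similarity.

```python
from itertools import combinations


def build_keyword_network(keywords, similarity, min_weight=0.0):
    """Return (nodes, edges) for a weighted, undirected keyword network.

    keywords   -- list of main subject words (one node each)
    similarity -- dict mapping frozenset({a, b}) to a similarity score
    min_weight -- hypothetical cutoff; pairs at or below it get no edge
    """
    # Each node carries its subject word as the label shown next to it.
    nodes = {kw: {"label": kw} for kw in keywords}
    edges = {}
    for a, b in combinations(keywords, 2):
        weight = similarity.get(frozenset((a, b)), 0.0)
        if weight > min_weight:
            edges[frozenset((a, b))] = weight
    return nodes, edges
```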

An example of a network generated according to an embodiment of the present invention is shown in FIG. 16. The network of FIG. 16 was generated from the same search keyword input by the user as the network shown in FIG. 1, yet the two networks clearly differ in form. In particular, unlike the network of FIG. 1, the network of FIG. 16 contains no node that acts as a strong hub.

In an embodiment, the visualized network may be constructed based on a Pathfinder network search. According to this embodiment, more accurate educational content can be provided than when the network is constructed based on a minimum spanning tree, because constructing the network from a minimum spanning tree may remove important edges and thereby lose important information. The Pathfinder network search will be readily understood by those of ordinary skill in the art, so a description is omitted.
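One widely used form of Pathfinder pruning can be sketched as follows. The text does not state which Pathfinder parameters are used, so this sketch assumes the common PFNET(r = infinity, q = n - 1) variant, in which an edge is kept only if no alternative path has a smaller maximum edge distance. It also assumes that similarities have already been converted to distances (for example, 1 minus the similarity); the function name and data layout are hypothetical.

```python
def pathfinder_network(nodes, dist):
    """Prune a weighted graph with PFNET(r=inf, q=n-1).

    nodes -- list of node names
    dist  -- dict mapping frozenset({a, b}) to a distance (lower = closer)
    Returns the set of edges (frozensets) that survive pruning.
    """
    inf = float("inf")
    # minimax[a][b]: smallest possible "largest edge on a path" from a to b.
    minimax = {
        a: {b: 0.0 if a == b else dist.get(frozenset((a, b)), inf) for b in nodes}
        for a in nodes
    }
    # Floyd-Warshall over the (min, max) semiring.
    for k in nodes:
        for i in nodes:
            for j in nodes:
                via_k = max(minimax[i][k], minimax[k][j])
                if via_k < minimax[i][j]:
                    minimax[i][j] = via_k
    kept = set()
    for edge, d in dist.items():
        a, b = tuple(edge)
        # Keep the direct edge only if no indirect path beats it.
        if d <= minimax[a][b]:
            kept.add(edge)
    return kept
```

Unlike a minimum spanning tree, this pruning keeps every edge that ties with the best alternative path, so equally strong relationships are not discarded arbitrarily.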

Hereinafter, methods of providing various educational content according to some embodiments of the present invention will be described. For convenience of description, the first node is assumed to be the node corresponding to the first main subject word, and the second node is assumed to be the node corresponding to the second main subject word.

In an embodiment, the color of each node constituting the network may be determined based on the category to which the corresponding main subject word belongs. For example, when the first main subject word belongs to a first category, the first node may be visualized in a first color, and when the second main subject word belongs to a second category, the second node may be visualized in a second color. The category to which each main subject word belongs may be predefined. For example, when the subject word is a topic related to language education, the subject word may be a term located in a lower layer of the educational content hierarchy, and the category may be a term located in a higher layer.

In an embodiment, at least one visual element among the length and the thickness of an edge may be determined based on the subject word similarity between the main subject words (or the weight of the edge). For example, the higher the subject word similarity between the first main subject word and the second main subject word, the shorter or thicker the edge between the first node and the second node may be drawn. According to this embodiment, the user can grasp the similarity between main subject words more intuitively, which improves information delivery.
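A simple way to realize this mapping is a linear interpolation from similarity to edge width and edge length. The sketch below is illustrative only: the function name, the pixel ranges, and the choice of a linear mapping are assumptions, and the similarity is assumed to lie in the range 0 to 1.

```python
def edge_style(similarity, min_width=1.0, max_width=6.0,
               min_length=40.0, max_length=200.0):
    """Map a similarity in [0, 1] to an edge width and a target edge length.

    Higher similarity gives a thicker and shorter edge. All ranges are
    hypothetical defaults, not values taken from the source.
    """
    s = min(1.0, max(0.0, similarity))  # clamp out-of-range scores
    width = min_width + s * (max_width - min_width)
    length = max_length - s * (max_length - min_length)
    return {"width": width, "length": length}
```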

In an embodiment, the size of each node constituting the network may be determined based on the number of neighboring nodes, the degree of the node, and/or the weights of its edges. For example, the size of the first node may be determined based on the number of first neighboring nodes adjacent to the first node and/or the number of second neighboring nodes adjacent to the first neighboring nodes. As another example, the size of the first node may be determined based on the number of first neighboring nodes adjacent to the first node and/or the weight of the edge to each neighboring node. According to this embodiment, a node that has many neighboring nodes or edges with high weights is visualized relatively large. Accordingly, core subject words among the main subject words can be emphasized, further improving information delivery.
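The node sizing described above might be sketched as a weighted sum of the first-neighbor count, the second-neighbor count, and the total edge weight. The function name, the coefficients, and the decision to combine all three factors in one formula are assumptions for illustration; the text leaves the exact combination open.

```python
def node_size(node, adjacency, base=10.0, first_coeff=2.0, second_coeff=0.5):
    """Compute a display size for `node`.

    adjacency -- dict: node -> {neighbor: edge weight}
    The coefficients are hypothetical tuning values.
    """
    first = set(adjacency.get(node, {}))
    # Second neighbors: neighbors of neighbors, excluding the node itself
    # and nodes already counted as first neighbors.
    second = set()
    for neighbor in first:
        second.update(adjacency.get(neighbor, {}))
    second -= first | {node}
    weight_sum = sum(adjacency.get(node, {}).values())
    return base + first_coeff * len(first) + second_coeff * len(second) + weight_sum
```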

In some embodiments, when a selection input for a specific node is received from the user, additional information may be provided to the user. Embodiments in which such additional information is provided are briefly described below.

In an embodiment, in response to a selection input for a first node among the nodes constituting the network, the title keywords extracted from the titles of the search keywords input by the user that are associated with the first main subject word may be provided together with frequency information for each title keyword. That is, the element values of the keyword vector corresponding to the first main subject word may be provided for each title keyword. For example, as illustrated in FIG. 17, when a selection input for the node 610 is received, the frequency information 630 of the title keywords may be provided. Although FIG. 17 illustrates the frequency information in the form of a graph, the frequency information may be provided in any form. According to this embodiment, keyword information is provided for each main subject word, which helps the user easily understand the educational content.
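The frequency information shown on node selection can be sketched as a count of title keywords over the titles associated with the selected subject word. The sketch below assumes titles are plain strings and uses naive lowercase whitespace tokenization with an optional stopword set; real keyword extraction, especially for Korean titles, would need a proper tokenizer. The function name is hypothetical.

```python
from collections import Counter


def title_keyword_frequencies(titles, stopwords=frozenset()):
    """Count how often each title keyword appears across the given titles."""
    counts = Counter()
    for title in titles:
        for token in title.lower().split():
            if token not in stopwords:
                counts[token] += 1
    return counts
```

The resulting counts are also the element values of the keyword vector for that subject word, so the same structure can feed both the similarity calculation and the display.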

In an embodiment, in response to a selection input for a first node among the nodes constituting the network, a list of the search keywords input by the user that are associated with the first main subject word may be provided. Alternatively, in response to a selection input for a title keyword provided in the preceding embodiment, a list of content containing the selected title keyword may be provided. According to this embodiment, a search function based on the user's topic or keyword of interest can be provided through a user-friendly interface. Accordingly, the user can quickly find related content, improving user convenience and satisfaction.

In an embodiment, when a specific main subject word is input by the user, the specific node corresponding to that main subject word and the nodes located within a preset distance from the specific node may be highlighted in response to the input. The same processing may be performed when a selection input for a specific node is received from the user.
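Selecting the nodes to highlight can be sketched as a breadth-first search that stops at the preset distance. This sketch assumes that "distance" means the number of edges (hops) between nodes, which the text does not state explicitly; the function name and the adjacency layout are hypothetical.

```python
from collections import deque


def nodes_within_distance(adjacency, start, max_hops):
    """Return {node: hop count} for every node within max_hops of start.

    adjacency -- dict: node -> iterable of neighboring nodes
    """
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the preset distance
        for neighbor in adjacency.get(node, ()):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return hops
```

Because the hop count is returned for each node, the selected node (hop 0) can be styled differently from its surrounding nodes, and the same call can be repeated on each network when several networks are displayed side by side.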

In the preceding embodiment, when a plurality of networks containing educational content for different search target periods are provided simultaneously, the highlight processing may be performed on each network at the same time. This is described further with reference to FIG. 18.

FIG. 18 shows an example in which a first network 650 for a first search target period and a second network 670 for a second search target period, different from the first search target period, are provided simultaneously and a specific main subject word is input by the user. FIG. 18 also shows a case in which the preset distance is set to "2". Referring to FIG. 18, when the specific main subject word is input by the user, highlight processing may be performed in the first network 650 on the node 651 corresponding to the specific main subject word and on the nodes 653 located within the preset distance from the node 651. Depending on the embodiment, the node 651 corresponding to the specific main subject word may be highlighted with a color, size, or color density different from that of the nodes 653. At the same time, highlight processing may be performed in the second network 670 on the node 671 corresponding to the specific main subject word and on the nodes 673 located within the preset distance from the node 671. According to this embodiment, when a specific main subject word is input, the user can intuitively grasp the change in trend of that main subject word from the position that the corresponding node occupies in each network, the number of surrounding nodes, the number of links, and the like.

So far, the method of providing educational content according to an embodiment of the present invention has been described with reference to FIGS. 16 to 18. Finally, the evaluation results of a network provided according to an embodiment of the present invention are briefly described with reference to FIGS. 19 and 20.

FIG. 19 shows a plurality of clusters 710 to 760 designated by an expert in the technical field related to the search keyword input by the user. Referring to FIG. 19, the network generated according to an embodiment of the present invention has no strong hub node and comprises a number of sub-networks arranged in a radial form. Each sub-network also corresponds to one of the clusters 710 to 760 designated by the expert. This can be understood to indicate that, when the subject word similarity between main subject words is calculated based on the similarity of title keywords, subject words that are actually highly related cluster together and the effect of strong hub nodes is suppressed. It can further be understood to confirm that the method of generating educational content using big data according to an embodiment of the present invention provides accurate, high-quality educational content.

So far, the method of generating educational content using big data according to an embodiment of the present invention has been described with reference to FIGS. 6 to 20. According to the above-described method, a network visualizing the relationships between the topics contained in the search keyword input by the user is provided as educational content for that search keyword. Through this, the user can intuitively check educational trend information, and the information delivery of the search information can be improved.

The concepts of the present invention described so far with reference to FIGS. 6 to 20 may be implemented as computer-readable code on a computer-readable medium. The computer-readable recording medium may be, for example, a removable recording medium (CD, DVD, Blu-ray disc, USB storage device, removable hard disk) or a fixed recording medium (ROM, RAM, hard disk built into a computer). The computer program recorded on the computer-readable recording medium may be transmitted to another computing device over a network such as the Internet and installed on that computing device, and may thereby be used on that computing device.

Although the operations are shown in a specific order in the drawings, it should not be understood that the operations must be performed in the specific order shown or in sequential order, or that all of the illustrated operations must be performed, to obtain the desired results. In certain situations, multitasking and parallel processing may be advantageous. Moreover, the separation of the various components in the embodiments described above should not be understood as being required; the described program components and systems may generally be integrated together into a single software product or packaged into multiple software products.

Although embodiments of the present invention have been described above with reference to the accompanying drawings, those of ordinary skill in the art to which the present invention pertains will understand that the present invention may be implemented in other specific forms without changing its technical spirit or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive.

Claims

In the educational content generation method using big data performed by an educational content device,
Acquiring a search keyword input by a user;
Determining a main subject word among subject words tagged to each of the search keywords input by the user;
Extracting a first title keyword for first search target content from a title of a search keyword input by a first user associated with a first main subject word among the determined main subject words;
Extracting a second title keyword for the teaching material content big data from a title of the teaching material content big data constructed as big data among the determined main subject words;
Calculating a subject word similarity between the first main subject word and a second main subject word based on a similarity between the first title keyword and the second title keyword; and
Including the step of providing a search result for the search keyword input by the user based on the calculated subject word similarity,
The step of calculating the subject word similarity between the first main subject word and the second main subject word,
Generating a first keyword vector for the first main subject word based on the frequency of the first title keyword;
Generating a second keyword vector for the second main subject word based on the frequency of the second title keyword; and
Calculating the subject word similarity by using the similarity between the first keyword vector and the second keyword vector
Characterized in that it comprises,
A method of creating educational content using big data.

The method of claim 1,
Further comprising the step of determining a search target period using the number of issued contents per time period for the search keyword input by the user,
The step of determining the main subject word,
From among the search keywords input by the user, determining the main subject word based on a search keyword input by the user issued during the search target period,
A method of creating educational content using big data.

The method of claim 2,
The step of determining the search target period,
For each of a plurality of time intervals, calculating the number of issued contents for each time interval for the search keyword input by the user; And
It characterized in that it comprises the step of determining, from among the plurality of time intervals, a time interval in which the calculated number of published contents for each time interval is equal to or greater than a threshold value as a start interval of the analysis target period,
A method of creating educational content using big data.

The method of claim 1,
The step of determining the main subject word,
For each of a plurality of time intervals, calculating a frequency for each time interval for each of the tagged subject words;
Excluding general-purpose subject words from the tagged subject words based on the calculated frequency for each time interval; and
Including the step of determining the main subject word from among the remaining subject words from which the general-purpose subject words are excluded,
The general-purpose subject word is,
A subject word whose calculated frequency for each time interval continuously exceeds a threshold value over the plurality of time intervals,
A method of creating educational content using big data.

The method of claim 1,
The step of determining the main subject word,
For each of a plurality of time intervals, calculating a frequency for each time interval for each of the tagged subject words; and
Characterized in that it comprises the step of determining, as the main subject word, a subject word among the tagged subject words whose calculated frequency for each time interval continuously increases over the plurality of time intervals,
A method of creating educational content using big data.

The method of claim 1,
The step of determining the main subject word,
For each of a plurality of time intervals, calculating a frequency for each time interval for each of the tagged subject words; and
Characterized in that it comprises the step of determining, as the main subject word, a subject word among the tagged subject words whose calculated frequency for each time interval continuously decreases over the plurality of time intervals,
A method of creating educational content using big data.

The method of claim 1,
The first title keyword includes a 1-1 title keyword and a 1-2 title keyword,
The first keyword vector includes a first vector element and a second vector element,
The first vector element,
Is determined based on the frequency with which the 1-1 title keyword appears in the title of the search keyword input by the first user,
The second vector element,
Characterized in that it is determined based on the frequency with which the 1-2 title keyword appears in the title of the search keyword input by the first user,
A method of creating educational content using big data.

The method of claim 1,
The degree of similarity between the first keyword vector and the second keyword vector is
Characterized in that it is calculated based on cosine similarity,
A method of creating educational content using big data.

The method of claim 1,
Providing a search result for the search keyword input by the user,
Including the step of providing the relationship between the determined main subject words in a visualized network,
The first node constituting the visualized network corresponds to the first main subject word,
The second node constituting the visualized network corresponds to the second main subject word,
The weight of the edge between the first node and the second node is determined based on the subject word similarity between the first main subject word and the second main subject word,
A method of creating educational content using big data.

The method of claim 9,
The visualized network,
Characterized in that visualized through a pathfinder network search,
A method of creating educational content using big data.

The method of claim 9,
The first main keyword is classified into a first category among the predefined categories,
When the second main keyword is classified as a second category among the predefined categories,
The first node is visualized with a first color, and the second node is visualized with a second color,
A method of creating educational content using big data.

The method of claim 9,
At least one visual element among the length and the thickness of the edge between the first node and the second node,
Characterized in that it is determined based on the subject word similarity between the first main subject word and the second main subject word,
A method of creating educational content using big data.

The method of claim 9,
The size of the first node is,
Characterized in that it is determined based on the number of first neighboring nodes adjacent to the first node and the number of second neighboring nodes adjacent to the first neighboring node,
A method of creating educational content using big data.

The method of claim 9,
Receiving a selection input for the first node from a user; And
Characterized in that it further comprises the step of providing, in response to the selection input for the first node, the first title keyword and information on the frequency with which the first title keyword appears in the title of the first search target content associated with the first main subject word,
A method of creating educational content using big data.

The method of claim 14,
The first title keyword includes a 1-1 title keyword and a 1-2 title keyword,
Receiving a selection input for the 1-1 title keyword from the user; and
In response to the selection input for the 1-1 title keyword, providing a list of contents including the 1-1 title keyword among the search keywords input by the first user,
A method of creating educational content using big data.

The method of claim 9,
Receiving a selection input for the first node from a user; And
In response to receiving the selection input, further comprising the step of providing a list of search keywords input by the first user associated with the first main subject word,
A method of creating educational content using big data.

The method of claim 9,
The search keyword entered by the user,
A search keyword entered by a user issued during a first search target period and a search keyword input by a user issued during a second search target period different from the first search target period,
Providing the determined relationship between the main subject words in a visualized network,
Providing, to a first network, a relationship between main subject words determined from a search keyword input by a user issued during the first search target period; And
It characterized in that it comprises the step of providing a relationship between the main subject words determined from the search keyword entered by the user issued during the second search target period to a second network,
A method of creating educational content using big data.

The method of claim 17,
Receiving a third main subject word from a user;
Highlighting, in the first network, a 3-1 node corresponding to the third main subject word and nodes located within a preset distance from the 3-1 node; and
Characterized in that it comprises the step of highlighting, in the second network, a 3-2 node corresponding to the third main subject word and nodes located within a preset distance from the 3-2 node,
A method of creating educational content using big data.

One or more processors;
Network interface;
A memory for loading a computer program executed by the processor; And
Including a storage for storing the search keyword and the computer program input by the user,
The computer program,
An operation of determining a main subject word among subject words tagged to each of the search keywords input by the user;
An operation of extracting a first title keyword for first search target content from a title of a search keyword input by a first user associated with a first main subject word among the determined main subject words;
An operation of extracting a second title keyword for a second search target document from a title of the teaching material content big data constructed as big data among the determined main subject words;
An operation of calculating a subject word similarity between the first main subject word and a second main subject word based on a similarity between the first title keyword and the second title keyword; and
Including an operation of providing a search result for the search keyword input by the user based on the calculated subject word similarity,
The operation of calculating the subject word similarity between the first main subject word and the second main subject word is,
An operation of generating a first keyword vector for the first main subject word based on the frequency of the first title keyword;
An operation of generating a second keyword vector for the second main subject word based on the frequency of the second title keyword; and
Including an operation of calculating the subject word similarity by using the similarity between the first keyword vector and the second keyword vector,
A method of creating educational content using big data.

In combination with a computing device, in order to execute the steps of:
Obtaining a search keyword input by a user;
Determining a main subject word among subject words tagged to each of the search keywords input by the user;
Extracting a first title keyword for first search target content from a title of a search keyword input by a first user associated with a first main subject word among the determined main subject words;
Extracting a second title keyword for a second search target document from a title of the teaching material content big data constructed as big data among the determined main subject words;
Calculating a subject word similarity between the first main subject word and a second main subject word based on a similarity between the first title keyword and the second title keyword; and
Providing a search result for the search keyword input by the user based on the calculated subject word similarity,
The step of calculating the subject word similarity between the first main subject word and the second main subject word,
Generating a first keyword vector for the first main subject word based on the frequency of the first title keyword;
Generating a second keyword vector for the second main subject word based on the frequency of the second title keyword; and
A computer program stored in a computer-readable recording medium, comprising the step of calculating the subject word similarity by using the similarity between the first keyword vector and the second keyword vector.