KR102540417B1 - System and method for integrated recommendation of learning activities based on keywords of interest using academic domain embedding and recording medium for performing the same

Info

Publication number: KR102540417B1
Application number: KR1020220026995A
Authority: KR
Inventors: 문기범; 이진숙; 한수연; 이수강; 권혜정; 한재호; 김규태
Original assignee: 고려대학교 산학협력단
Priority date: 2022-03-02
Filing date: 2022-03-02
Publication date: 2023-06-05

Abstract

The present invention relates to a system and method for integrated recommendation of learning activities based on keywords of interest using academic domain embedding, and to a recording medium for performing the same.
The system according to the present invention includes: an extraction unit that extracts a plurality of keywords and a plurality of academic domains from a plurality of papers and journals; a first matrix generation unit that preprocesses the extracted keywords to generate a list of keywords of interest, and uses that list together with the academic domains to generate a keyword-by-academic-domain matrix; a second matrix generation unit that uses the keyword-by-academic-domain matrix to embed courses, second majors, and extracurricular activities in the academic domain dimensions, generating an academic domain matrix for each; and a recommendation unit that, when a keyword of interest is input by a user, uses those academic domain matrices to recommend at least one of the courses, second majors, and extracurricular activities corresponding to the input keyword, in descending order of similarity.

Description

System and method for integrated recommendation of learning activities based on keywords of interest using academic domain embedding, and recording medium for performing the same

The present invention relates to a system and method for integrated recommendation of learning activities based on keywords of interest using academic domain embedding, and to a recording medium for performing the same. More particularly, it relates to such a system, method, and recording medium that recommend curricular and extracurricular activities corresponding to a keyword of interest input by a user, using matrices constructed through academic domain embedding.

Many experts now predict that artificial intelligence (AI) technology will bring revolutionary change to society. In particular, as COVID-19 has rapidly accelerated digitalization across every sector of society, large amounts of data are being accumulated, and the adoption and spread of AI technology is accelerating.

The introduction of AI is also being actively discussed in education, and as social demand for student-tailored educational services has grown, personalized recommendation based on AI technology has become increasingly important.

Universities have tried various methods of recommending courses to students. Among them, a course recommendation system based on the TF-IDF (Term Frequency-Inverse Document Frequency) algorithm can use only keywords contained in its text data, so it cannot recommend a course when the keyword entered by the user does not appear in that text data.

That is, a TF-IDF model produces recommendations only when the input keyword actually appears in a syllabus (used here to refer collectively to the course title and course content) or in an extracurricular activity description. A case in which no recommendation is produced has been regarded as a recommendation failure.

Therefore, there is a need for a system that broadly recommends the most similar courses, whatever keyword the user enters.

The background art of the present invention is disclosed in Korean Patent Publication No. 10-1754723 (published July 6, 2017).

The technical problem to be solved by the present invention is to provide a system and method for integrated recommendation of learning activities based on keywords of interest using academic domain embedding, which recommend curricular and extracurricular activities corresponding to a keyword of interest input by a user using matrices constructed through academic domain embedding, and a recording medium for performing the same.

To solve this technical problem, a system for integrated recommendation of learning activities based on keywords of interest using academic domain embedding according to an embodiment of the present invention includes: an extraction unit that extracts a plurality of keywords and a plurality of academic domains from a plurality of papers and journals; a first matrix generation unit that preprocesses the extracted keywords to generate a list of keywords of interest, and uses that list together with the academic domains to generate a keyword-by-academic-domain matrix; a second matrix generation unit that uses the keyword-by-academic-domain matrix to embed courses, second majors, and extracurricular activities in the academic domain dimensions, generating an academic domain matrix for each; and a recommendation unit that, when a keyword of interest is input by a user, uses those academic domain matrices to recommend at least one of the courses, second majors, and extracurricular activities corresponding to the input keyword, in descending order of similarity.

Here, the extraction unit may extract a plurality of keywords from a plurality of papers published during a set period and, for each paper, extract the academic domains pre-assigned to the journal to which the paper was submitted. It may tally, per academic domain, the frequency of occurrence of the keywords contained in those papers, remove keywords whose frequency is below a set count, and extract the remaining keywords.

In addition, the first matrix generation unit may generate the list of keywords of interest by applying natural language processing to the remaining extracted keywords. For each keyword of interest, it may link the academic domains assigned to the journals of the papers containing that keyword, crawling which academic domains' characteristics each keyword has, to generate the keyword-by-academic-domain matrix.

In addition, the first matrix generation unit may calculate, for each combination of keyword of interest and academic domain, a TF-IDF (Term Frequency-Inverse Document Frequency) value indicating how important a keyword is within a specific academic domain, taking into account the keyword's overall frequency of occurrence and its frequency of occurrence within that domain. It may then normalize the TF-IDF values to generate the keyword-by-academic-domain matrix.
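The keyword-by-academic-domain weighting described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the text does not give the exact TF-IDF formula or the normalization, so the smoothed IDF and the per-keyword L2 normalization used here are assumptions, and the function name `keyword_domain_matrix` and the sample counts are hypothetical.

```python
import math


def keyword_domain_matrix(counts, domains):
    """Build a normalized keyword-by-domain TF-IDF matrix.

    counts: {keyword: {domain: frequency of the keyword in that domain}}
    domains: ordered list of academic domain names (the matrix columns)
    Returns {keyword: [weight per domain]} with each row L2-normalized.
    """
    n_domains = len(domains)
    # Total keyword occurrences per domain, used as the TF denominator.
    totals = {d: sum(c.get(d, 0) for c in counts.values()) for d in domains}
    matrix = {}
    for keyword, per_domain in counts.items():
        # Number of domains in which the keyword appears at all.
        df = sum(1 for d in domains if per_domain.get(d, 0) > 0)
        # Smoothed IDF (assumption): never zero, never divides by zero.
        idf = math.log((1 + n_domains) / (1 + df)) + 1.0
        row = [
            (per_domain.get(d, 0) / totals[d] if totals[d] else 0.0) * idf
            for d in domains
        ]
        # L2 normalization per keyword (assumption).
        norm = math.sqrt(sum(v * v for v in row))
        matrix[keyword] = [v / norm if norm else 0.0 for v in row]
    return matrix


# Hypothetical counts for two academic domains.
domains = ["Computer Science", "Education"]
counts = {
    "deep learning": {"Computer Science": 8, "Education": 1},
    "curriculum": {"Education": 6},
}
matrix = keyword_domain_matrix(counts, domains)
```

With these sample counts, "curriculum" loads entirely on Education, while "deep learning" loads mostly on Computer Science with a small Education component.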

In addition, the second matrix generation unit may extract a plurality of keywords by applying morphological analysis to the syllabi, each including the course name and course overview, of the liberal arts and major courses offered within a set period. Using these keywords together with the keywords extracted by the extraction unit, it may calculate and normalize a TF-IDF value for each syllabus keyword-course pair from the number of times each keyword appears within a course and the number of courses in which it appears, and then embed the keywords with the top n TF-IDF values in the academic domain dimensions to generate a course-by-academic-domain matrix.
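One way to read the course embedding step is sketched below. The text does not say how the top n keywords are combined into a course vector, so the TF-IDF-weighted average of keyword vectors used here is an assumption, as are the function name `embed_course` and the sample data.

```python
def embed_course(keyword_weights, keyword_matrix, n_domains, top_n=3):
    """Embed one course in the academic domain dimensions.

    keyword_weights: {keyword: normalized TF-IDF weight within the course}
    keyword_matrix: {keyword: [weight per academic domain]}
    Returns the TF-IDF-weighted average of the domain vectors of the
    top_n keywords (weighted averaging is an assumption).
    """
    top = sorted(keyword_weights.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    vector = [0.0] * n_domains
    total = 0.0
    for keyword, weight in top:
        row = keyword_matrix.get(keyword)
        if row is None:
            continue  # keyword has no academic domain vector; skip it
        for i, value in enumerate(row):
            vector[i] += weight * value
        total += weight
    return [v / total for v in vector] if total else vector


# Hypothetical keyword vectors over two academic domains.
keyword_matrix = {
    "neural network": [1.0, 0.0],
    "learning": [0.5, 0.5],
    "assessment": [0.0, 1.0],
}
syllabus_weights = {"neural network": 0.6, "learning": 0.4, "assessment": 0.1}
course_vector = embed_course(syllabus_weights, keyword_matrix, n_domains=2, top_n=2)
```

Because the course vector is built from keyword vectors rather than from the syllabus text itself, a course can later be matched against keywords that never appear in its syllabus.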

In addition, the second matrix generation unit may use the course-by-academic-domain matrix to calculate the average of the academic domain vectors of all courses included in each second-major curriculum, and embed the result in the academic domain dimensions to generate a second-major-by-academic-domain matrix.
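The averaging step for second majors can be illustrated with a short sketch. The function name `embed_second_major`, the course codes, and the vectors are hypothetical; only the averaging of course vectors comes from the text.

```python
def embed_second_major(course_codes, course_matrix):
    """Average the academic domain vectors of a second major's courses.

    course_codes: courses listed in the second-major curriculum
    course_matrix: {course code: [weight per academic domain]}
    Courses without a vector are ignored; returns [] if none are found.
    """
    vectors = [course_matrix[c] for c in course_codes if c in course_matrix]
    if not vectors:
        return []
    # zip(*vectors) groups the values of each academic domain column.
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Hypothetical course vectors over two academic domains.
course_matrix = {"CS101": [1.0, 0.0], "EDU201": [0.0, 1.0]}
major_vector = embed_second_major(["CS101", "EDU201", "MISSING"], course_matrix)
```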

In addition, the second matrix generation unit may deduplicate the extracurricular activity description data updated up to a set period, apply morphological analysis to it, and combine the resulting keywords with the keywords extracted by the extraction unit to obtain a plurality of keywords. It may then calculate and normalize a TF-IDF value for each extracurricular activity-keyword pair from the frequency with which each keyword is used in each activity, and embed the keywords with the top n TF-IDF values in the academic domain dimensions to generate an academic domain matrix of extracurricular activities.

In addition, the system may further include a keyword classification unit that classifies each keyword in the generated list of keywords of interest into one of the sub-divisions of humanities, social sciences, education, natural sciences, engineering, medicine, arts and physical education, and convergence, and calculates the number of keywords of interest for each sub-division using the average of the academic domain embedding vectors of all courses assigned to that sub-division.

In addition, when a keyword of interest is input by the user, the recommendation unit may use the respective academic domain matrices to recommend, as a list, the top n items with the highest cosine similarity to the input keyword, according to the cosine similarity between the matrices.
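The cosine-similarity ranking described above can be sketched as follows. It is assumed here that the input keyword is first looked up in the keyword-by-academic-domain matrix to obtain a query vector, which is then compared against each item vector; the function names `cosine` and `recommend` and the sample vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(query_vector, item_matrix, top_n=2):
    """Return the top_n (item, similarity) pairs in descending order of similarity."""
    scored = [(name, cosine(query_vector, vec)) for name, vec in item_matrix.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical course vectors over two academic domains.
courses = {
    "Machine Learning": [0.9, 0.1],
    "Educational Psychology": [0.1, 0.9],
    "Learning Analytics": [0.6, 0.4],
}
# Hypothetical query vector for a keyword that loads fully on the first domain.
results = recommend([1.0, 0.0], courses, top_n=2)
```

The same `recommend` call could be applied to the second-major and extracurricular activity matrices, since all of them share the academic domain dimensions.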

In addition, a method for integrated recommendation of learning activities, performed by the system for integrated recommendation of learning activities based on keywords of interest using academic domain embedding according to another embodiment of the present invention, includes the steps of: extracting a plurality of keywords and a plurality of academic domains from a plurality of papers and journals; preprocessing the extracted keywords to generate a list of keywords of interest, and using that list together with the academic domains to generate a keyword-by-academic-domain matrix; using the keyword-by-academic-domain matrix to embed courses, second majors, and extracurricular activities in the academic domain dimensions, generating an academic domain matrix for each; and, when a keyword of interest is input by a user, using those academic domain matrices to recommend at least one of the courses, second majors, and extracurricular activities corresponding to the input keyword, in descending order of similarity.

Here, the extracting step may extract a plurality of keywords from a plurality of papers published during a set period and, for each paper, extract the academic domains pre-assigned to the journal to which the paper was submitted. It may tally, per academic domain, the frequency of occurrence of the keywords contained in those papers, remove keywords whose frequency is below a set count, and extract the remaining keywords.

In addition, the step of generating the keyword-by-academic-domain matrix may generate the list of keywords of interest by applying natural language processing to the remaining extracted keywords. For each keyword of interest, it may link the academic domains assigned to the journals of the papers containing that keyword, crawling which academic domains' characteristics each keyword has, to generate the keyword-by-academic-domain matrix.

In addition, the step of generating the keyword-by-academic-domain matrix may calculate, for each combination of keyword of interest and academic domain, a TF-IDF (Term Frequency-Inverse Document Frequency) value indicating how important a keyword is within a specific academic domain, taking into account the keyword's overall frequency of occurrence and its frequency of occurrence within that domain. It may then normalize the TF-IDF values to generate the keyword-by-academic-domain matrix.

In addition, the step of generating the respective academic domain matrices may extract a plurality of keywords by applying morphological analysis to the syllabi, each including the course name and course overview, of the liberal arts and major courses offered within a set period. Using these keywords together with the keywords extracted by the extraction unit, it may calculate and normalize a TF-IDF value for each syllabus keyword-course pair from the number of times each keyword appears within a course and the number of courses in which it appears, and then embed the keywords with the top n TF-IDF values in the academic domain dimensions to generate a course-by-academic-domain matrix.

In addition, the step of generating the respective academic domain matrices may use the course-by-academic-domain matrix to calculate the average of the academic domain vectors of all courses included in each second-major curriculum, and embed the result in the academic domain dimensions to generate a second-major-by-academic-domain matrix.

In addition, the step of generating the respective academic domain matrices may deduplicate the extracurricular activity description data updated up to a set period, apply morphological analysis to it, and combine the resulting keywords with the keywords extracted by the extraction unit to obtain a plurality of keywords. It may then calculate and normalize a TF-IDF value for each extracurricular activity-keyword pair from the frequency with which each keyword is used in each activity, and embed the keywords with the top n TF-IDF values in the academic domain dimensions to generate an academic domain matrix of extracurricular activities.

In addition, when a keyword of interest is input by the user, the recommending step may use the respective academic domain matrices to recommend, as a list, the top n items with the highest cosine similarity to the input keyword, according to the cosine similarity between the matrices.

In addition, a computer program for performing the method for integrated recommendation of learning activities based on keywords of interest may be recorded on a computer-readable recording medium according to another embodiment of the present invention.

As described above, according to the present invention, matrices constructed through academic domain embedding are used to recommend curricular activities (including liberal arts courses, major courses, and second majors) and extracurricular activities corresponding to the keyword of interest input by the user. Similar courses and extracurricular activities are therefore recommended even when the user enters a keyword that does not appear in the text data, which can improve user satisfaction.

In addition, according to the present invention, it is easy to use a variety of keywords or to add new ones, so that beyond recommending liberal arts courses, major courses, second majors, and extracurricular activities, the invention can be extended to various other recommendation services related to academic domains and thus has the advantage of general applicability.

The effects of the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the description below.

FIG. 1 is a system configuration diagram showing a system for integrated recommendation of learning activities based on keywords of interest using academic domain embedding according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating an example of a keyword-by-academic-domain matrix in an embodiment of the present invention.
FIG. 3 is a diagram for explaining the process of generating a course-by-academic-domain matrix in an embodiment of the present invention.
FIGS. 4 and 5 are diagrams illustrating examples of the course-by-academic-domain matrix generated by the process of FIG. 3.
FIG. 6 is a diagram for explaining the process of generating a second-major-by-academic-domain matrix in an embodiment of the present invention.
FIG. 7 is a diagram illustrating an example of the second-major-by-academic-domain matrix generated by the process of FIG. 6.
FIG. 8 is a diagram illustrating an example of an academic domain matrix of extracurricular activities in an embodiment of the present invention.
FIG. 9 is a diagram illustrating an example of a keyword-of-interest input screen in an embodiment of the present invention.
FIG. 10 is a flowchart showing the operation flow of a method for integrated recommendation of learning activities based on keywords of interest using academic domain embedding according to an embodiment of the present invention.
FIG. 11 is a diagram showing the performance of the integrated recommendation model for each area in the method for integrated recommendation of learning activities based on keywords of interest using academic domain embedding according to an embodiment of the present invention.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. The thickness of lines and the size of components shown in the drawings may be exaggerated for clarity and convenience of explanation.

In addition, the terms used below are defined in consideration of their functions in the present invention, and may vary according to the intention or custom of a user or operator. Definitions of these terms should therefore be based on the content of this specification as a whole.

Hereinafter, preferred embodiments of the present invention will be described in more detail with reference to the drawings.

First, a system for integrated recommendation of learning activities based on keywords of interest using academic domain embedding according to an embodiment of the present invention will be described with reference to FIGS. 1 to 9.

FIG. 1 is a system configuration diagram showing a system for integrated recommendation of learning activities based on keywords of interest using academic domain embedding according to an embodiment of the present invention.

Referring to FIG. 1, the system (3) for integrated recommendation of learning activities based on keywords of interest using academic domain embedding may be configured to operate while communicating, over a wired or wireless network, with a user terminal (1) carried by a student, an educational institution server (2), an academic information providing server (4), and the like. The communication method over the wired or wireless network may be any communication method by which objects can be networked, and is not limited to wired communication, wireless communication, 3G, 4G, or any other particular method.

For example, the wired or wireless network may refer to a communication network using one or more communication methods selected from the group consisting of LAN (Local Area Network), MAN (Metropolitan Area Network), GSM (Global System for Mobile Network), EDGE (Enhanced Data GSM Environment), HSDPA (High Speed Downlink Packet Access), W-CDMA (Wideband Code Division Multiple Access), CDMA (Code Division Multiple Access), TDMA (Time Division Multiple Access), Bluetooth, Zigbee, Wi-Fi, VoIP (Voice over Internet Protocol), LTE Advanced, IEEE802.16m, WirelessMAN-Advanced, HSPA+, 3GPP Long Term Evolution (LTE), Mobile WiMAX (IEEE 802.16e), UMB (formerly EV-DO Rev. C), Flash-OFDM, iBurst and MBWA (IEEE 802.20) systems, HIPERMAN, Beam-Division Multiple Access (BDMA), Wi-MAX (World Interoperability for Microwave Access), and ultrasound-based communication, but is not limited thereto.

In one embodiment of the present invention, the learning activity integrated recommendation system (3) includes an input unit (30), an extraction unit (31), a first matrix generation unit (32), a second matrix generation unit (33), a keyword classification unit (34), and a recommendation unit (35).

The systems, devices, and servers described herein may be entirely hardware, or partly hardware and partly software. For example, the systems, devices, and servers of this specification, and each unit included in them, may collectively refer to hardware, and the software associated with it, for processing data of a specific format and content or for exchanging such data by electronic communication. In this specification, terms such as "unit", "module", "device", "terminal", "server", and "system" are intended to refer to a combination of hardware and the software driven by that hardware. For example, the hardware may be a data processing device including a CPU or other processor. The software driven by the hardware may refer to a running process, an object, an executable, a thread of execution, a program, and the like.

In addition, the elements constituting the learning activity integrated recommendation system (3) according to this embodiment are not necessarily intended to refer to separate, physically distinct devices. That is, the input unit (30), extraction unit (31), first matrix generation unit (32), second matrix generation unit (33), keyword classification unit (34), and recommendation unit (35) of FIG. 1 are merely a functional division of the hardware constituting the learning activity integrated recommendation system (3) according to the operations that hardware performs, and the units need not be provided independently of one another. Depending on the embodiment, one or more of the units of the learning activity integrated recommendation system (3) may of course be implemented as separate, physically distinct devices.

The learning activity integrated recommendation system (3) may recommend courses, second majors, and extracurricular activities corresponding to one or more keywords of interest to the user who entered those keywords. So that the user can transmit keyword-of-interest information and receive the corresponding information, the learning activity integrated recommendation system (3) may act as an application service server that communicates with an application (or app) running on the user terminal (1) and thereby enables the application's functions. Alternatively, in another embodiment, the learning activity integrated recommendation system (3) may be implemented as a web server or the like that provides web pages accessible from a web browser running on the user terminal (1).

In the embodiment shown in FIG. 1, the user terminal (1) is depicted as a notebook computer. This is only an example, however, and the type of user terminal (1) is not limited to what is shown in the drawing. For example, a user who wishes to receive a major recommendation may use the functions provided by the integrated learning activity recommendation system (3) through any computing device, such as a mobile communication terminal like a smartphone, a personal computer, a notebook computer, a PDA (personal digital assistant), a tablet, or a set-top box for IPTV (Internet Protocol Television) and the like.

The input unit (30) receives the keywords of interest entered through the user terminal (1).

The extraction unit (31) extracts a plurality of keywords and a plurality of academic domains from a plurality of papers and journals. To this end, the extraction unit (31) may be configured to communicate with an academic information providing server (4) that provides research paper data.

Accordingly, the extraction unit (31) preferably extracts a plurality of keywords from a plurality of papers published during a set period and, for each paper, extracts the academic domains pre-assigned to the journal in which the paper was published. It then tallies, per academic domain, the frequency of occurrence of the keywords contained in the papers, removes keywords whose frequency is below a set count, and extracts the remaining keywords.

In more detail, the KCI paper keyword data is built from keywords extracted from a large number of papers published during the set period. Each paper contains about 5 to 7 keywords and is assigned to 1 to 3 academic domains (fields) according to the characteristics of the journal in which it was published. The frequency of occurrence of each keyword is tallied per academic domain to which its paper belongs.

The resulting KCI keyword database consists of a plurality of rows, in which a unique keywords are distributed over b academic domains. The one academic domain in which keywords were recorded fewer than two times is removed, as are keywords that appear fewer than 10 times in total across all academic domains. In an embodiment of the present invention, a total of n keywords are extracted through this process.
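The filtering step described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `filter_keyword_rows`, the `(keyword, domain, count)` row layout, and the reading of "recorded fewer than two times" as "a domain with fewer than two keyword rows" are all assumptions made for the example.

```python
from collections import Counter

def filter_keyword_rows(rows, min_domain_rows=2, min_total=10):
    """rows: list of (keyword, domain, count) tuples (hypothetical layout)."""
    # Assumption: a domain is dropped when it has fewer than `min_domain_rows` keyword rows.
    domain_rows = Counter(domain for _kw, domain, _n in rows)
    kept = [(kw, d, n) for kw, d, n in rows if domain_rows[d] >= min_domain_rows]

    # Drop keywords whose total count over all remaining domains is below `min_total`.
    totals = Counter()
    for kw, _d, n in kept:
        totals[kw] += n
    return [(kw, d, n) for kw, d, n in kept if totals[kw] >= min_total]
```

With the thresholds left at their defaults, the sketch mirrors the counts of 2 and 10 given in the text; both are exposed as parameters because the description elsewhere refers only to "a set count".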

The first matrix generation unit (32) preprocesses the keywords extracted by the extraction unit (31) to generate a list of keywords of interest, and uses that list together with the academic domains to generate a keyword-of-interest-by-academic-domain matrix.

Here, the first matrix generation unit (32) applies natural language processing techniques to the remaining keywords extracted by the extraction unit (31) to generate the list of keywords of interest. For each keyword of interest, it links the academic domains assigned to the journals of the papers containing that keyword, crawls which academic domain characteristics each keyword has, and generates the keyword-of-interest-by-academic-domain matrix.

Specifically, the first matrix generation unit (32) may calculate, for each combination of keyword of interest and academic domain, a TF-IDF (Term Frequency-Inverse Document Frequency) value that indicates how important a keyword is within a specific academic domain, taking into account both the keyword's overall frequency of occurrence and its frequency within that domain. It may then normalize the TF-IDF values to generate the keyword-of-interest-by-academic-domain matrix.

FIG. 2 is a diagram showing an example of the keyword-of-interest-by-academic-domain matrix in an embodiment of the present invention.

The first matrix generation unit (32) calculates a TF-IDF value for each combination of keyword of interest and academic domain. The TF-IDF value is a number that indicates how important a keyword is within a specific academic domain, taking into account the keyword's overall frequency of occurrence and its frequency within that domain. For example, keywords such as 'Korea' and 'education' appear with high frequency across all KCI academic domains, whereas keywords such as 'psychology' and 'depression' appear mainly in the field of Psychological Science. TF-IDF values are used to adjust the importance of such keywords within individual domains appropriately. The calculated TF-IDF values may also be normalized to lie between 0 and 1 to generate the keyword-of-interest-by-academic-domain matrix. As shown in FIG. 2, the generated matrix may be presented with the keywords sorted in descending order of TF-IDF value.
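The keyword-by-domain weighting can be sketched as below. The text does not state the exact TF-IDF variant or normalization, so this example assumes the common form tf × log(N/df), with each academic domain treated as one "document", followed by division by the largest value to reach the range 0 to 1. The name `keyword_domain_matrix` and the nested-dictionary input are illustrative only.

```python
import math

def keyword_domain_matrix(counts, domains):
    """counts: {keyword: {domain: frequency}} -> {keyword: [score per domain]} in [0, 1]."""
    n_domains = len(domains)
    raw = {}
    for kw, per_domain in counts.items():
        # df: number of domains in which the keyword occurs at all.
        df = sum(1 for d in domains if per_domain.get(d, 0) > 0)
        idf = math.log(n_domains / df) if df else 0.0
        raw[kw] = [per_domain.get(d, 0) * idf for d in domains]

    # Assumed normalization: scale by the global maximum so values lie in [0, 1].
    peak = max((v for row in raw.values() for v in row), default=0.0)
    if peak == 0:
        return raw
    return {kw: [v / peak for v in row] for kw, row in raw.items()}
```

Under this variant a keyword that occurs in every domain, like 'education' in the example from the text, receives a weight of zero everywhere, while a keyword concentrated in one domain scores highly there.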

The second matrix generation unit (33) uses the keyword-of-interest-by-academic-domain matrix generated by the first matrix generation unit (32) to embed courses, second majors, and extracurricular activities each into the academic domain dimensions, generating a separate academic domain matrix for each.

Here, the second matrix generation unit (33) may generate a course-by-academic-domain matrix, a second-major-by-academic-domain matrix, and an academic domain matrix of extracurricular activities, respectively.

First, the course-by-academic-domain matrix is generated as follows. A morphological analysis technique is applied to the syllabi, which include the course names and lecture outlines of the liberal arts and major courses offered within a set period, to extract a plurality of keywords. Using these keywords together with the keywords extracted by the extraction unit (31), a TF-IDF value is calculated for each syllabus keyword-course pair from the number of times each keyword appears within one course and the number of courses in which it appears, and the values are normalized. The keywords with the top n TF-IDF values are then embedded into the academic domain dimensions to generate the course-by-academic-domain matrix.

To describe the generation of the course-by-academic-domain matrix in detail: what will be learned in each course is described concisely in the syllabus, which includes the course name and lecture outline entered by the professor. Because a syllabus is unstructured text data, it is difficult to apply directly to the integrated learning activity recommendation system (3) according to an embodiment of the present invention. The embodiment therefore describes a method that uses the KCI research paper keyword data in order to make use of the syllabus data.

The syllabi of a large number of liberal arts and major courses offered over the past 10 years are analyzed, and the n KCI research paper keywords extracted by the extraction unit (31) are registered in a user dictionary for morpheme extraction from the syllabi. A total of m keywords are extracted from the syllabi, and keywords that appear only once are removed. Keywords that recur in 10% or more of all syllabi (such as 'basics', 'analysis', 'course', and 'research') are registered in a stopword dictionary and removed. As a result of the morpheme extraction, a plurality of unique keywords were recorded a number of times for a plurality of lectures. Table 1 below is an example showing the keyword extraction result for the syllabus of the course '심리학연구와활용' (Psychological Research and Applications).
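The two pruning rules above (drop keywords seen only once, and drop keywords present in 10% or more of the syllabi) can be sketched as follows. The sketch assumes morpheme extraction has already produced a keyword list per course; the function name `prune_syllabus_keywords` and the dictionary layout are hypothetical.

```python
from collections import Counter

def prune_syllabus_keywords(course_keywords, max_course_share=0.10):
    """course_keywords: {course: [keyword, ...]} with repeated keywords kept."""
    # Total occurrences of each keyword over all syllabi.
    total = Counter(kw for kws in course_keywords.values() for kw in kws)
    # Number of distinct courses whose syllabus contains each keyword.
    course_count = Counter(kw for kws in course_keywords.values() for kw in set(kws))
    n_courses = len(course_keywords)

    def keep(kw):
        appears_more_than_once = total[kw] > 1
        is_stopword = course_count[kw] / n_courses >= max_course_share
        return appears_more_than_once and not is_stopword

    return {c: [kw for kw in kws if keep(kw)] for c, kws in course_keywords.items()}
```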

FIG. 3 is a diagram for explaining the process of generating the course-by-academic-domain matrix in an embodiment of the present invention, and FIGS. 4 and 5 are diagrams showing examples of the course-by-academic-domain matrix generated by the process of FIG. 3.

As can be seen from FIG. 3, the TF-IDF value for each syllabus keyword-course pair is calculated by considering how often a syllabus keyword appears within one course (TF) and in how many courses it appears (DF). A higher TF-IDF value means that the syllabus keyword is more likely to be an important keyword representing that course.
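A minimal sketch of the keyword-course weighting described above, treating each course's syllabus as one document. The formula tf × log(N/df) is an assumed standard variant, since the text names TF and DF but does not spell out the expression; `course_tfidf` is an illustrative name.

```python
import math
from collections import Counter

def course_tfidf(course_keywords):
    """course_keywords: {course: [keyword, ...]} -> {course: {keyword: tf-idf}}."""
    n_courses = len(course_keywords)
    # DF: in how many courses each keyword appears.
    df = Counter(kw for kws in course_keywords.values() for kw in set(kws))
    scores = {}
    for course, kws in course_keywords.items():
        tf = Counter(kws)  # TF: how often each keyword appears within this course.
        scores[course] = {kw: tf[kw] * math.log(n_courses / df[kw]) for kw in tf}
    return scores
```

A keyword that appears in every syllabus gets a weight of zero under this variant, which matches the intent that only keywords distinctive of a course should represent it.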

For example, let the TF-IDF value of keyword i in academic field m be denoted F as in Equation 1 below, let the TF-IDF value of keyword i in course information j be denoted C as in Equation 2 below, and let the number of keywords in course information j be denoted N_j. The weighted TF-IDF value F' of course information j belonging to academic field m can then be calculated as in Equation 3 below.

That is, the keyword-by-academic-domain matrix (301) is generated from the TF-IDF values calculated by Equation 1; the keyword matrix (302) is generated from the TF-IDF values calculated by Equation 2 and converted into the course-information keyword matrix (303); and the course-information keyword matrix (303) and the keyword-by-academic-domain matrix (301) are combined to generate the course-by-academic-domain matrix (310).

Accordingly, the second matrix generation unit (33) calculates the TF-IDF value for each syllabus keyword-course pair using Equations 1 to 3, removes the n keywords with the lowest TF-IDF values, calculates the per-course TF-IDF value of each remaining syllabus keyword, and normalizes the result. Through this process, a data frame such as that of FIG. 4 can be constructed.

As shown in FIG. 5, in the course-by-academic-domain matrix each course is a row and the academic domains are the columns. Each entry of the matrix is calculated as the TF-IDF-weighted average of the academic domain characteristics of the syllabus keywords. In other words, the academic domain characteristics of the keywords that are important in a course carry more weight when the academic domain characteristics of that course are calculated.
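The weighted average described above can be sketched as follows. The example assumes that each keyword already has an academic domain vector (one value per domain) and a TF-IDF weight for the course in question; keywords without a domain vector are skipped, which is a choice made for this sketch rather than something stated in the text. The name `embed_item` is hypothetical.

```python
def embed_item(keyword_weights, keyword_domain_vectors):
    """TF-IDF-weighted average of keyword domain vectors.

    keyword_weights: {keyword: tf-idf weight within the course}
    keyword_domain_vectors: {keyword: [value per academic domain]}
    """
    usable = {kw: w for kw, w in keyword_weights.items()
              if kw in keyword_domain_vectors and w > 0}
    total = sum(usable.values())
    if total == 0:
        return None  # no weighted keyword with a known domain vector

    dim = len(next(iter(keyword_domain_vectors.values())))
    vec = [0.0] * dim
    for kw, w in usable.items():
        for k, value in enumerate(keyword_domain_vectors[kw]):
            vec[k] += w * value
    return [v / total for v in vec]
```

Stacking the resulting vectors, one per course, gives a matrix with courses as rows and academic domains as columns, as in FIG. 5.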

Next, the second-major-by-academic-domain matrix is generated by using the course-by-academic-domain matrix to calculate the average of the academic domain vectors of all courses included in each second major's curriculum, and embedding the result into the academic domain dimensions.

FIG. 6 is a diagram for explaining the process of generating the second-major-by-academic-domain matrix in an embodiment of the present invention, and FIG. 7 is a diagram showing an example of the second-major-by-academic-domain matrix generated by the process of FIG. 6.

The second matrix generation unit (33) uses the course-by-academic-domain matrix (310) to embed the second majors into the academic domain dimensions, as shown in FIG. 6. Specifically, the second-major-by-academic-domain matrix (420) is calculated by averaging the academic domain vectors of all courses included in each second major's curriculum matrix (400).
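The averaging step can be sketched as below, assuming each second major is given as a list of course identifiers and each course already has an academic domain vector. The function name `embed_major` and the course codes in the usage note are hypothetical.

```python
def embed_major(course_ids, course_vectors):
    """Average the domain vectors of all courses in one major's curriculum."""
    rows = [course_vectors[c] for c in course_ids if c in course_vectors]
    if not rows:
        return None  # no course in the curriculum has a domain vector
    # zip(*rows) iterates column by column, i.e. one academic domain at a time.
    return [sum(col) / len(rows) for col in zip(*rows)]
```

For instance, a curriculum of two courses with vectors `[1.0, 0.0]` and `[0.5, 0.5]` yields the major vector `[0.75, 0.25]`.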

FIG. 7 is an example showing the top 20 majors with the strongest Psychological Science characteristic.

Finally, the academic domain matrix of extracurricular activities is generated as follows. Duplicates are removed from the extracurricular activity description data updated up to a set period, and a morphological analysis technique is applied. The keywords extracted in this way are combined with the keywords extracted by the extraction unit (31) to obtain a plurality of keywords. A TF-IDF value is calculated for each extracurricular activity-keyword pair from the frequency with which each keyword is used in each activity, and the values are normalized. The keywords with the top n TF-IDF values are then embedded into the academic domain dimensions to generate the academic domain matrix of extracurricular activities.

FIG. 8 is a diagram showing an example of the academic domain matrix of extracurricular activities in an embodiment of the present invention.

Specifically, morphological analysis is performed on the extracurricular activity description data, and the result is combined with the KCI research paper keywords to calculate the academic domain characteristics. Unlike courses and second majors, for which information on all courses is registered once per semester, extracurricular activities are updated frequently, so it is most desirable to repeat the data collection and preprocessing procedure every day. In addition, because programs with the same content may be registered more than once, n unique extracurricular activities are extracted. From the description text data of these n extracurricular activities, m keywords are extracted. Based on the frequency of keyword use in each extracurricular activity, a TF-IDF weight is applied to each extracurricular activity-keyword pair, and the academic domain matrix of extracurricular activities is calculated as the TF-IDF-weighted average of the academic domain characteristics of the extracted keywords. In other words, the academic domain characteristics of the keywords that are important in an extracurricular activity carry more weight when the academic domain characteristics of that activity are calculated. As shown in FIG. 8, the top n extracurricular activities with the highest value for any one characteristic may be presented.
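The deduplication of repeatedly registered programs can be sketched as follows. The text says only that programs "with the same content" are collapsed; comparing descriptions after whitespace and case normalization is an assumption made for this example, and `unique_activities` is a hypothetical name.

```python
def unique_activities(activities):
    """activities: list of (name, description); keeps the first of each duplicate."""
    seen = set()
    unique = []
    for name, description in activities:
        # Assumed notion of "same content": identical text ignoring case and spacing.
        key = " ".join(description.split()).lower()
        if key not in seen:
            seen.add(key)
            unique.append((name, description))
    return unique
```

Because extracurricular listings change frequently, a routine like this would be rerun as part of each daily collection and preprocessing pass before keyword extraction.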

The keyword classification unit (34) classifies each keyword in the list of keywords of interest generated by the first matrix generation unit (32) into one of the following divisions: humanities, social sciences, education, natural sciences, engineering, medicine, arts and physical education, and convergence. Using the average of the academic domain embedding vectors of all courses assigned to each division, it calculates the number of keywords of interest per division.

FIG. 9 is a diagram showing an example of a keyword-of-interest input screen in an embodiment of the present invention.

As shown in FIG. 9, to help the user select keywords, the keyword classification unit (34) may group keywords into eight categories by division: humanities, social sciences, education, natural sciences, engineering, medicine, arts and physical education, and convergence. Every offered course is assigned to one of these divisions. The academic domain embedding vector of each division is obtained by averaging the academic domain embedding vectors of all courses included in that division, and from it the number of keywords of interest per division can be calculated, as in Table 2 below.

When a keyword of interest is entered by the user through the input unit (30), the recommendation unit (35) uses the academic domain matrices generated by the second matrix generation unit (33) to recommend at least one of the courses, second majors, and extracurricular activities corresponding to the entered keyword, in descending order of similarity.

Specifically, when a keyword of interest is entered by the user, the cosine similarity between matrices is computed using each academic domain matrix, and the top n items with the highest cosine similarity to the entered keyword may be recommended as a list.

In more detail, the user may type a keyword of interest directly through the user terminal (1) or click a keyword icon on the screen, and may enter one keyword or several. When a keyword of interest is entered, the recommendation unit (35) calculates the similarity between the entered keyword and the recommendation targets in each of the four categories using the cosine similarity formula of Equation 1 below. The distance to each recommendation target in each category is calculated separately for every entered keyword.

For example, if the keyword-of-interest information entered by the user and the keyword information it is compared against (or the per-major course-name keyword matrix) are denoted A and B, respectively, the cosine similarity between the two matrices can be calculated as in Equation 1.

The recommendation unit (35) may then recommend courses, second majors, and extracurricular activities to the user in ascending order of the average distance, derived from the cosine similarity, to the keywords of interest entered by the user.
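The ranking step can be sketched as follows, using the standard cosine similarity and taking distance as one minus similarity, which is an assumption since the text does not define how distance is derived. Each entered keyword and each recommendation target is represented by its academic domain vector; `recommend` and the item names in the usage note are illustrative.

```python
import math

def cosine_similarity(a, b):
    """Standard cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def recommend(keyword_vectors, item_vectors, top_n=5):
    """Rank items by average cosine distance to all entered keywords (shortest first).

    keyword_vectors: non-empty list of domain vectors, one per entered keyword
    item_vectors: {item name: domain vector}
    """
    ranked = []
    for item, vec in item_vectors.items():
        # Assumed distance: 1 - cosine similarity, computed per keyword.
        distances = [1.0 - cosine_similarity(kv, vec) for kv in keyword_vectors]
        ranked.append((sum(distances) / len(distances), item))
    ranked.sort()
    return [item for _distance, item in ranked[:top_n]]
```

Given a single keyword vector `[1.0, 0.0]` and items with vectors `[1.0, 0.0]`, `[1.0, 1.0]`, and `[0.0, 1.0]`, the items are returned in that order. The same function can be applied separately to the course, second major, and extracurricular activity matrices.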

Hereinafter, the method for integrated recommendation of learning activities based on keywords of interest using academic domain embedding according to an embodiment of the present invention is described with reference to FIGS. 10 and 11.

FIG. 10 is a flowchart showing the operation of the method for integrated recommendation of learning activities based on keywords of interest using academic domain embedding according to an embodiment of the present invention; the specific operation of the present invention is described with reference to it.

According to an embodiment of the present invention, the extraction unit (31) first extracts a plurality of keywords and a plurality of academic domains from a plurality of papers and journals (S10).

In step S10, the extraction unit (31) extracts a plurality of keywords from a plurality of papers published during a set period and, for each paper, extracts the academic domains pre-assigned to the journal in which the paper was published. It tallies, per academic domain, the frequency of occurrence of the keywords contained in the papers, removes keywords whose frequency is below a set count, and extracts the remaining keywords.

Next, the first matrix generation unit (32) preprocesses the keywords extracted in step S10 to generate a list of keywords of interest, and uses that list together with the academic domains to generate the keyword-of-interest-by-academic-domain matrix (S20).

In step S20, the first matrix generation unit (32) applies natural language processing techniques to the remaining keywords to generate the list of keywords of interest. For each keyword of interest, it links the academic domains assigned to the journals of the papers containing that keyword, crawls which academic domain characteristics each keyword has, and generates the keyword-of-interest-by-academic-domain matrix.

Specifically, a TF-IDF value indicating how important a keyword is within a specific academic domain is calculated for each combination of keyword of interest and academic domain, taking into account the keyword's overall frequency of occurrence and its frequency within that domain, and the TF-IDF values are normalized to generate the keyword-of-interest-by-academic-domain matrix.

Next, the second matrix generation unit (33) uses the keyword-of-interest-by-academic-domain matrix generated in step S20 to embed courses, second majors, and extracurricular activities each into the academic domain dimensions, generating a separate academic domain matrix for each (S30).

In step S30, the second matrix generation unit (33) applies a morphological analysis technique to the syllabi, which include the course names and lecture outlines of the liberal arts and major courses offered within a set period, to extract a plurality of keywords. Using these keywords together with the keywords extracted in step S10, it calculates a TF-IDF value for each syllabus keyword-course pair from the number of times each keyword appears within one course and the number of courses in which it appears, and normalizes the values. It then embeds the keywords with the top n TF-IDF values into the academic domain dimensions to generate the course-by-academic-domain matrix.

The second matrix generation unit (33) then uses the course-by-academic-domain matrix to calculate the average of the academic domain vectors of all courses included in each second major's curriculum, and embeds the result into the academic domain dimensions to generate the second-major-by-academic-domain matrix.

Finally, the second matrix generation unit (33) removes duplicates from the extracurricular activity description data updated up to a set period and applies a morphological analysis technique. It combines the keywords extracted in this way with the keywords extracted in step S10 to obtain a plurality of keywords, calculates a TF-IDF value for each extracurricular activity-keyword pair from the frequency with which each keyword is used in each activity, and normalizes the values. It then embeds the keywords with the top n TF-IDF values into the academic domain dimensions to generate the academic domain matrix of extracurricular activities.

Next, the recommendation unit (35) determines whether a keyword of interest has been entered by the user (S40). If so, it uses each academic domain matrix to recommend at least one of the courses, second majors, and extracurricular activities corresponding to the entered keyword, in descending order of similarity (S50).

In step S50, when a keyword of interest is entered by the user, the cosine similarity between matrices is computed using each academic domain matrix generated in step S30, and the top n items with the highest cosine similarity to the entered keyword are recommended as a list.

FIG. 11 is a diagram showing the performance of the integrated recommendation model by area in the method for integrated recommendation of learning activities based on keywords of interest using academic domain embedding according to an embodiment of the present invention.

To verify the performance of the integrated learning activity recommendation method according to an embodiment of the present invention, a coverage metric was used, as shown in FIG. 11. Coverage evaluates how wide a range of the full item set a recommendation model is able to recommend. For example, if 2,400 of roughly 3,000 courses in total can be recommended, the coverage is 80%. For second major recommendation, coverage was 100% in all 10 trials; because the performance metric of the baseline model using TF-IDF could not be calculated, this case was excluded from the comparison. In all of the liberal arts course, major course, and extracurricular activity recommendation areas, the integrated recommendation model using the integrated recommendation method of the present invention showed higher coverage than the baseline model.
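The coverage metric described above can be sketched as the share of catalog items that appear in at least one recommendation list. The function name `coverage` and the way recommendation lists are collected across trials are assumptions for this example.

```python
def coverage(recommendation_lists, catalog):
    """Fraction of catalog items that were recommended at least once."""
    catalog = set(catalog)
    if not catalog:
        return 0.0
    recommended = {item for items in recommendation_lists for item in items}
    # Items outside the catalog are ignored.
    return len(recommended & catalog) / len(catalog)
```

With a catalog of 3,000 courses of which 2,400 distinct courses appear across all recommendation lists, the function returns 0.8, matching the 80% example in the text.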

The method for integrated recommendation of learning activities based on keywords of interest using academic domain embedding described above may be implemented as an application, or in the form of program instructions executable by various computer components, and recorded on a computer-readable recording medium. The computer-readable recording medium may contain program instructions, data files, data structures, and the like, alone or in combination.

The program instructions recorded on the computer-readable recording medium may be specially designed and configured for the present invention, or may be known and available to those skilled in the field of computer software.

Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory.

Examples of program instructions include not only machine code, such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware device may be configured to operate as one or more software modules in order to perform the processing according to the present invention.

As described above, the integrated learning activity recommendation system and method based on keywords of interest using academic domain embedding according to an embodiment of the present invention use matrices constructed through academic domain embedding to recommend curricular activities (liberal arts subjects, major subjects, and second majors) and extracurricular activities corresponding to a keyword of interest input by a user. Because similar subjects and extracurricular activities can be recommended even when the user inputs a keyword that does not appear in the text data, user satisfaction can be improved.
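The effect described above can be illustrated with a minimal Python sketch, assuming that keywords and learning activities are both represented as vectors over the same academic domains and are ranked by cosine similarity. The function names, the three academic domains, and all values below are hypothetical and are not taken from the embodiment.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(keyword, keyword_matrix, item_matrix, n=3):
    """Return the n items most similar to the keyword, highest similarity first."""
    query = keyword_matrix[keyword]
    scored = [(name, cosine(query, vec)) for name, vec in item_matrix.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


# Hypothetical academic domains: [computer science, statistics, literature]
keyword_matrix = {"deep learning": [0.9, 0.4, 0.0]}
item_matrix = {
    "Machine Learning": [0.8, 0.6, 0.0],
    "Probability Theory": [0.1, 0.9, 0.0],
    "Modern Poetry": [0.0, 0.0, 1.0],
}

for name, score in recommend("deep learning", keyword_matrix, item_matrix, n=2):
    print(name, round(score, 3))
```

In this toy example the phrase "deep learning" appears in none of the item names, yet the items can still be ranked because the comparison takes place in the shared academic domain space rather than on the raw text.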

In addition, according to an embodiment of the present invention, it is easy to use a variety of keywords or to add new keywords. The invention can therefore be extended beyond services that recommend liberal arts subjects, major subjects, second majors, and extracurricular activities to various other recommendation services related to academic domains, which makes it broadly applicable.

Although the present invention has been described with reference to the embodiments shown in the drawings, these are merely exemplary, and those of ordinary skill in the art will understand that various modifications and other equivalent embodiments are possible. Accordingly, the true technical scope of protection of the present invention should be determined by the technical spirit of the claims below.

1: User terminal
3: Integrated learning activity recommendation system
30: Input unit
31: Extraction unit
32: First matrix generating unit
33: Second matrix generating unit
34: Recommendation unit
301: Academic domain matrix by keyword
302: Keyword matrix
303: Subject information keyword matrix
310: Academic domain matrix by subject
400: Curriculum matrix by second major
420: Academic domain matrix by second major

Claims

An integrated learning activity recommendation system based on keywords of interest using academic domain embedding, the system comprising:
an extraction unit that extracts a plurality of keywords and a plurality of academic domains from a plurality of papers and journals;
a first matrix generating unit that pre-processes the extracted keywords to generate a list of keywords of interest, and generates an academic domain matrix for each keyword of interest using the generated list of keywords of interest and the academic domains;
a second matrix generating unit that generates respective academic domain matrices by embedding subjects, second majors, and extracurricular activities in the academic domain dimension using the generated academic domain matrix for each keyword of interest; and
a recommendation unit that, when a keyword of interest is input from a user, recommends at least one of a subject, a second major, and an extracurricular activity corresponding to the input keyword of interest, in descending order of similarity, using the respective academic domain matrices,
wherein the second matrix generating unit
extracts a plurality of keywords by applying morphological analysis to syllabi, including course names and lecture outlines, of liberal arts and major subjects offered within a set period; using the extracted keywords and the keywords extracted by the extraction unit, calculates and normalizes a TF-IDF value for each keyword-subject pair in the syllabi from the number of times each keyword appears in a subject and the number of subjects in which it appears; and embeds the keywords corresponding to the top n TF-IDF values in the academic domain dimension to generate an academic domain matrix for each subject, and
calculates the average of the academic domain vectors of all subjects included in each second major curriculum using the generated academic domain matrix for each subject, and embeds the calculated result in the academic domain dimension to generate an academic domain matrix for each second major.

The system of claim 1,
wherein the extraction unit
extracts a plurality of keywords from a plurality of papers published during a set period, and extracts, for each paper, the plurality of academic domains pre-assigned to the journal to which the paper was submitted, and
aggregates the frequency of occurrence of the keywords included in the papers by academic domain, removes keywords whose frequency of occurrence is less than a set number, and extracts the remaining keywords.

The system of claim 2,
wherein the first matrix generating unit
generates the list of keywords of interest from the remaining extracted keywords using natural language processing techniques, and generates the academic domain matrix for each keyword of interest by linking, for each generated keyword of interest, the academic domains assigned to the journals of the papers containing that keyword, and crawling which academic domain each keyword is characteristic of.

The system of claim 3,
wherein the first matrix generating unit
calculates, for each combination of a keyword of interest and an academic domain, a TF-IDF (Term Frequency-Inverse Document Frequency) value indicating how important the keyword is in a specific academic domain, taking into account the frequency of occurrence of the keyword and its frequency of appearance in the specific academic domain, and normalizes the TF-IDF values to generate the academic domain matrix for each keyword of interest.

(Deleted)

The system of claim 1,
wherein the second matrix generating unit
deduplicates extracurricular activity description data updated for each set period, extracts a plurality of keywords by applying morphological analysis, combines the extracted keywords with the keywords extracted by the extraction unit, calculates and normalizes a TF-IDF value for each extracurricular activity-keyword pair using the frequency of keyword use in each extracurricular activity, and embeds the keywords corresponding to the top n TF-IDF values in the academic domain dimension to generate an academic domain matrix for extracurricular activities.

The system of claim 1,
further comprising a keyword classification unit that classifies each keyword included in the generated list of keywords of interest into one of the following divisions: humanities, social sciences, education, natural sciences, engineering, medicine, arts and sports, and convergence, and that calculates the number of keywords of interest for each sub-division using the average of the academic domain embedding vectors of all subjects allocated to that sub-division.

The system of claim 1,
wherein the recommendation unit,
when a keyword of interest is input from the user, calculates the cosine similarity between matrices using the respective academic domain matrices, and recommends, as a list, the top n items having the highest cosine similarity to the input keyword of interest.

An integrated learning activity recommendation method performed by an integrated learning activity recommendation system based on keywords of interest using academic domain embedding, the method comprising:
extracting, by an extraction unit, a plurality of keywords and a plurality of academic domains from a plurality of papers and journals;
generating, by a first matrix generating unit, a list of keywords of interest by pre-processing the extracted keywords, and generating an academic domain matrix for each keyword of interest using the generated list of keywords of interest and the academic domains;
generating, by a second matrix generating unit, respective academic domain matrices by embedding subjects, second majors, and extracurricular activities in the academic domain dimension using the generated academic domain matrix for each keyword of interest; and
recommending, by a recommendation unit, when a keyword of interest is input from a user, at least one of a subject, a second major, and an extracurricular activity corresponding to the input keyword of interest, in descending order of similarity, using the respective academic domain matrices,
wherein the generating of the respective academic domain matrices comprises:
extracting a plurality of keywords by applying morphological analysis to syllabi, including course names and lecture outlines, of liberal arts and major subjects offered within a set period; using the extracted keywords and the keywords extracted by the extraction unit, calculating and normalizing a TF-IDF value for each keyword-subject pair in the syllabi from the number of times each keyword appears in a subject and the number of subjects in which it appears; and embedding the keywords corresponding to the top n TF-IDF values in the academic domain dimension to generate an academic domain matrix for each subject; and
calculating the average of the academic domain vectors of all subjects included in each second major curriculum using the generated academic domain matrix for each subject, and embedding the calculated result in the academic domain dimension to generate an academic domain matrix for each second major.

The method of claim 10,
wherein the extracting comprises:
extracting a plurality of keywords from a plurality of papers published during a set period, and extracting, for each paper, the plurality of academic domains pre-assigned to the journal to which the paper was submitted; and
aggregating the frequency of occurrence of the keywords included in the papers by academic domain, removing keywords whose frequency of occurrence is less than a set number, and extracting the remaining keywords.

The method of claim 11,
wherein the generating of the academic domain matrix for each keyword of interest comprises
generating the list of keywords of interest from the remaining extracted keywords using natural language processing techniques, and generating the academic domain matrix for each keyword of interest by linking, for each generated keyword of interest, the academic domains assigned to the journals of the papers containing that keyword, and crawling which academic domain each keyword is characteristic of.

The method of claim 12,
wherein the generating of the academic domain matrix for each keyword of interest comprises
calculating, for each combination of a keyword of interest and an academic domain, a TF-IDF (Term Frequency-Inverse Document Frequency) value indicating how important the keyword is in a specific academic domain, taking into account the frequency of occurrence of the keyword and its frequency of appearance in the specific academic domain, and normalizing the TF-IDF values to generate the academic domain matrix for each keyword of interest.

(Deleted)

The method of claim 10,
wherein the generating of the respective academic domain matrices comprises
deduplicating extracurricular activity description data updated for each set period, extracting a plurality of keywords by applying morphological analysis, combining the extracted keywords with the keywords extracted by the extraction unit, calculating and normalizing a TF-IDF value for each extracurricular activity-keyword pair using the frequency of keyword use in each extracurricular activity, and embedding the keywords corresponding to the top n TF-IDF values in the academic domain dimension to generate an academic domain matrix for extracurricular activities.

The method of claim 10,
wherein the recommending comprises,
when a keyword of interest is input from the user, calculating the cosine similarity between matrices using the respective academic domain matrices, and recommending, as a list, the top n items having the highest cosine similarity to the input keyword of interest.

A computer-readable recording medium having recorded thereon a computer program for performing the integrated learning activity recommendation method based on keywords of interest according to claim 10.
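For readers who want a concrete picture of the matrix generation recited in claims 1 and 10, the following Python sketch shows one possible reading of it. Several details are assumptions, because the claims do not fix them: the IDF formula (natural log of total subjects over subjects containing the keyword), normalization by the per-subject maximum, and combining the top n keyword vectors by simple averaging. All function names and data are hypothetical.

```python
import math
from collections import Counter


def tfidf_by_subject(subject_keywords):
    """Map {subject: [keyword, ...]} to {subject: {keyword: normalized TF-IDF}}."""
    n_subjects = len(subject_keywords)
    doc_freq = Counter()
    for words in subject_keywords.values():
        doc_freq.update(set(words))  # number of subjects containing each keyword
    result = {}
    for subject, words in subject_keywords.items():
        counts = Counter(words)  # occurrences of each keyword in this subject
        raw = {w: c * math.log(n_subjects / doc_freq[w]) for w, c in counts.items()}
        peak = max(raw.values(), default=0.0)
        # Assumed normalization: divide by the largest value within the subject.
        result[subject] = {w: (v / peak if peak > 0 else 0.0) for w, v in raw.items()}
    return result


def embed_subject(weights, keyword_matrix, top_n):
    """Average the academic domain vectors of the top-n keywords (assumed rule)."""
    known = [w for w in weights if w in keyword_matrix]
    top = sorted(known, key=lambda w: weights[w], reverse=True)[:top_n]
    dims = len(next(iter(keyword_matrix.values())))
    if not top:
        return [0.0] * dims
    return [sum(keyword_matrix[w][d] for w in top) / len(top) for d in range(dims)]


def embed_second_major(curriculum, subject_matrix):
    """Average the academic domain vectors of all subjects in a curriculum."""
    vectors = [subject_matrix[s] for s in curriculum]
    return [sum(col) / len(vectors) for col in zip(*vectors)]


# Hypothetical academic domains: [computer science, statistics]
keyword_matrix = {
    "regression": [0.2, 0.8],
    "neural network": [0.9, 0.1],
    "sampling": [0.0, 1.0],
}
subject_keywords = {
    "Statistics I": ["regression", "regression", "sampling"],
    "Deep Learning": ["neural network", "neural network", "regression"],
}

weights = tfidf_by_subject(subject_keywords)
subject_matrix = {s: embed_subject(w, keyword_matrix, top_n=1) for s, w in weights.items()}
major_vector = embed_second_major(["Statistics I", "Deep Learning"], subject_matrix)
print(subject_matrix)
print(major_vector)
```

In this toy data, "regression" appears in both subjects and so receives an IDF of zero, which leaves "sampling" and "neural network" as the top keywords of their respective subjects. The second major vector is then the element-wise mean of the two subject vectors.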