KR102375755B1

KR102375755B1 - System and method for recommendation of courses based on course similarity and computer program for the same

Info

Publication number: KR102375755B1
Application number: KR1020210042245A
Authority: KR
Inventors: 김규태; 문기범; 한재호; 권혜정; 이수강; 이진숙; 한수연
Original assignee: 고려대학교 산학협력단
Priority date: 2021-03-31
Filing date: 2021-03-31
Publication date: 2022-03-17

Abstract

The present invention relates to a system for recommending subjects based on subject similarity, which comprises: a first data analysis unit for analyzing text data associated with one or more subjects to generate lecture content information for each of the subjects; a second data analysis unit for analyzing per-subject enrollment history data of students to generate student characteristic information for each of the subjects; a similarity determining unit for generating per-subject similarity information using the lecture content information and the student characteristic information; and a recommended subject determining unit for generating subject recommendation information for a user on the basis of subject-of-interest information received from the user and the similarity information. Because the system determines the similarity between subjects by considering the lecture content dimension and student characteristics at the same time, it can recommend subjects with higher accuracy than the prior art. It can also recommend a wide variety of subjects that match the interests of each student.

Description

SYSTEM AND METHOD FOR RECOMMENDATION OF COURSES BASED ON COURSE SIMILARITY AND COMPUTER PROGRAM FOR THE SAME

Embodiments relate to a subject recommendation system and method based on subject similarity, and to a computer program therefor. More specifically, embodiments relate to a subject recommendation model that determines recommended subjects based on a subject similarity calculated from lecture content and student characteristics together, and to a technology for providing that model as a service.

Many experts now predict that artificial intelligence (AI) technology will bring revolutionary change to society. In particular, as coronavirus disease 2019 (COVID-19) has accelerated digitization in every field of society, large amounts of data are being accumulated, and the introduction and spread of AI technology are accelerating.

The introduction of AI is also being actively discussed in education, and the government of the Republic of Korea has set personalized education using AI technology as a direction for the development of education and is promoting it. As related prior art, Korean Patent Application Laid-Open No. 10-2021-0026234 discloses an apparatus and method for providing an AI-based, user-customized mathematics education service.

However, conventional education-related AI technologies, including the one disclosed in Korean Patent Application Laid-Open No. 10-2021-0026234, do not address the AI-based analysis of unstructured data such as keywords of research papers or the syllabi and learning goals of liberal arts lectures. Nor do they disclose any method for deriving the similarity between subjects from such unstructured data or for presenting that similarity visually.

Korean Patent Application Laid-Open No. 10-2021-0026234

According to one aspect of the present invention, it is possible to provide a subject recommendation system and method based on subject similarity, and a computer program therefor, which can use unstructured data to derive similar subjects, can determine the similarity between subjects in consideration of both the lecture content dimension and student characteristics, and can visualize and present similar subjects in the form of a subject map.

The technical problems of the present invention are not limited to those mentioned above, and other technical problems not mentioned will be clearly understood by those skilled in the art from the following description.

According to one aspect of the present invention, a subject recommendation system based on subject similarity includes: a first data analysis unit configured to generate lecture content information for each of one or more subjects by analyzing text data related to the one or more subjects; a second data analysis unit configured to generate student characteristic information for each of the one or more subjects by analyzing per-subject enrollment history data of students; a similarity determining unit configured to generate per-subject similarity information using the lecture content information and the student characteristic information; and a recommended subject determining unit configured to generate subject recommendation information for a user based on subject-of-interest information received from the user and on the similarity information.

The subject recommendation system according to an embodiment further includes a preprocessor configured to extract the text data from at least one of a syllabus of the subject, a learning goal of the subject, a curriculum for each department, and keywords of published papers for each academic field.

In an embodiment, the first data analysis unit is further configured to learn the lecture content of the subject through term frequency-inverse document frequency (TF-IDF) based analysis of the text data, and to generate the lecture content information based on the result of learning the lecture content.

In an embodiment, the second data analysis unit is further configured to generate per-department enrollment ratio data based on the enrollment history data of a plurality of students, and to generate the student characteristic information using the per-department enrollment ratio data of the students of each subject.

In an embodiment, the similarity determining unit is further configured to generate the per-subject similarity information using a cosine similarity between matrices respectively corresponding to the lecture content information and the student characteristic information.

In an embodiment, the subject-of-interest information includes at least one of the user's enrollment history and the subjects the user wishes to take. In this case, the recommended subject determining unit is further configured to generate the subject recommendation information by determining the user's preferred subjects using at least one of the enrollment history and the desired subjects, and by determining recommended subjects based on their similarity to the preferred subjects.

In an embodiment, the recommended subject determining unit is further configured to provide the subject recommendation information to the user using a subject map that is defined by a plurality of dimensions corresponding to per-subject characteristics and in which subjects are located closer to each other as the similarity between them increases.

According to one aspect of the present invention, a subject recommendation method based on subject similarity includes: generating, by a subject recommendation system, lecture content information for each of one or more subjects by analyzing text data related to the one or more subjects; generating, by the subject recommendation system, student characteristic information for each of the one or more subjects by analyzing per-subject enrollment history data of students; generating, by the subject recommendation system, per-subject similarity information using the lecture content information and the student characteristic information; and generating, by the subject recommendation system, subject recommendation information for a user based on subject-of-interest information received from the user and on the similarity information.

The subject recommendation method according to an embodiment further includes extracting, by the subject recommendation system, the text data from at least one of a syllabus of the subject, a learning goal of the subject, a curriculum for each department, and keywords of published papers for each academic field.

In an embodiment, generating the lecture content information includes: learning, by the subject recommendation system, the lecture content of the subject through term frequency-inverse document frequency (TF-IDF) based analysis of the text data; and generating, by the subject recommendation system, the lecture content information based on the result of learning the lecture content.

In an embodiment, generating the student characteristic information includes: generating, by the subject recommendation system, per-department enrollment ratio data based on the enrollment history data of a plurality of students; and generating, by the subject recommendation system, the student characteristic information using the per-department enrollment ratio data of the students of each subject.

In an embodiment, generating the similarity information includes calculating, by the subject recommendation system, a cosine similarity between matrices respectively corresponding to the lecture content information and the student characteristic information.

In an embodiment, the subject-of-interest information includes at least one of the user's enrollment history and the subjects the user wishes to take. In this case, generating the subject recommendation information includes: determining, by the subject recommendation system, the user's preferred subjects using at least one of the enrollment history and the desired subjects; and determining, by the subject recommendation system, recommended subjects based on their similarity to the preferred subjects.

The subject recommendation method according to an embodiment further includes providing, by the subject recommendation system, the subject recommendation information to the user using a subject map that is defined by a plurality of dimensions corresponding to per-subject characteristics and in which subjects are located closer to each other as the similarity between them increases.

A computer program according to one aspect of the present invention is combined with hardware to execute the subject recommendation method based on subject similarity according to the above-described embodiments, and may be stored in a computer-readable recording medium.

According to the subject recommendation system and method based on subject similarity according to the embodiments, the similarity between subjects is determined by considering the lecture content dimension and student characteristics at the same time. Subjects can therefore be recommended with much higher accuracy than with the prior art, for example a model that uses only one of lecture content or student characteristics. In addition, by recommending a broad and varied set of subjects that match each student's interests, better results can be obtained than with the prior art in terms of the scope and diversity of recommendations.

The effects of the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the following description.

FIG. 1 is a schematic block diagram of a subject recommendation system based on subject similarity according to an embodiment.
FIG. 2 is a flowchart illustrating the steps of a subject recommendation method based on subject similarity according to an embodiment.
FIG. 3 is a conceptual diagram illustrating a process of generating lecture content information by the subject recommendation method based on subject similarity according to an embodiment.
FIG. 4 is a conceptual diagram illustrating a process of generating student characteristic information by the subject recommendation method based on subject similarity according to an embodiment.
FIGS. 5 to 7 are graphs comparing the effect of the subject recommendation method based on subject similarity according to an embodiment with the prior art.
FIGS. 8A and 8B are conceptual diagrams illustrating examples of a subject map provided by the subject recommendation method based on subject similarity according to an embodiment.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

FIG. 1 is a schematic block diagram of a subject recommendation system based on subject similarity according to an embodiment.

Referring to FIG. 1, the subject recommendation system 3 based on subject similarity may be configured to operate while communicating with a student's user device 1, an educational institution server 2, and/or an academic information providing server 4 through a wired and/or wireless network. The communication method over the wired and/or wireless network may be any method by which one object can network with another, and is not limited to wired communication, wireless communication, 3G, 4G, or any other particular method.

For example, the wired and/or wireless network may refer to a communication network using one or more communication methods selected from the group consisting of LAN (Local Area Network), MAN (Metropolitan Area Network), GSM (Global System for Mobile Network), EDGE (Enhanced Data GSM Environment), HSDPA (High Speed Downlink Packet Access), W-CDMA (Wideband Code Division Multiple Access), CDMA (Code Division Multiple Access), TDMA (Time Division Multiple Access), Bluetooth, Zigbee, Wi-Fi, VoIP (Voice over Internet Protocol), LTE Advanced, IEEE802.16m, WirelessMAN-Advanced, HSPA+, 3GPP Long Term Evolution (LTE), Mobile WiMAX (IEEE 802.16e), UMB (formerly EV-DO Rev. C), Flash-OFDM, iBurst and MBWA (IEEE 802.20) systems, HIPERMAN, Beam-Division Multiple Access (BDMA), Wi-MAX (World Interoperability for Microwave Access), and ultrasonic communication, but is not limited thereto.

In an embodiment, the subject recommendation system 3 includes a first data analysis unit 31, a second data analysis unit 32, a similarity determining unit 33, and a recommended subject determining unit 35. In an embodiment, the subject recommendation system 3 may further include a preprocessor 30. In an embodiment, the subject recommendation system 3 may also further include a database (DB) 34.

The systems, devices, and servers described herein may be entirely hardware, or may be partly hardware and partly software. For example, the systems, devices, and servers of the present specification, and each unit included therein, may collectively refer to hardware for processing data of a specific format and content and/or exchanging it by electronic communication, together with the related software. As used herein, terms such as "unit", "module", "device", "terminal", "server", or "system" are intended to refer to a combination of hardware and the software driven by that hardware. For example, the hardware may be a data processing device including a CPU or another processor. The software driven by the hardware may refer to a running process, an object, an executable, a thread of execution, a program, and the like.

In addition, the elements constituting the subject recommendation system 3 according to the present embodiment are not necessarily intended to refer to separate devices that are physically distinct from each other. That is, the preprocessor 30, the first data analysis unit 31, the second data analysis unit 32, the similarity determining unit 33, the DB 34, and the recommended subject determining unit 35 of FIG. 1 are merely a functional division of the hardware constituting the subject recommendation system 3 according to the operations that hardware performs, and the units need not be provided independently of each other. Depending on the embodiment, one or more of the units of the subject recommendation system 3 may of course be implemented as separate, physically distinct devices.

The subject recommendation system 3 is configured to generate lecture content information and student characteristic information by analyzing text data related to one or more subjects that are candidates for recommendation and the enrollment history data of a plurality of students, and to determine the similarity between subjects on that basis. In addition, the subject recommendation system 3 may receive, from the user device 1, subject-of-interest information of a user who wants a subject recommendation, generate subject recommendation information based on the subject-of-interest information and the above-described similarity information, and provide it to the user device 1.

So that the user can transmit subject-of-interest information and receive subject recommendation information, the subject recommendation system 3 may act as an application service server that communicates with an application (or app) running on the user device 1 and thereby enables the application to perform its functions. Alternatively, in another embodiment, the subject recommendation system 3 may be implemented as a web server or the like that provides a web page accessible from a web browser running on the user device 1.

In the embodiment shown in FIG. 1, the user device 1 is depicted as a notebook computer. However, this is only an example, and the type of the user device 1 is not limited to what is shown in the drawing. For example, a user who wants a subject recommendation may use the functions provided by the subject recommendation system 3 from any computing device, such as a mobile communication terminal like a smartphone, a personal computer, a notebook computer, a PDA (personal digital assistant), a tablet, or a set-top box for IPTV (Internet Protocol Television).

The preprocessor 30 extracts the text data that serves as the basis for calculating the similarity between subjects. The preprocessor 30 may extract text data from the syllabus and/or learning goals of a subject, the curriculum of each department, and the like. For this purpose, the preprocessor 30 may be configured to collect information from the server 2 of an educational institution, such as a university, that offers the lectures for each subject. In addition, the preprocessor 30 may extract text data from published-paper data for each academic field in order to determine the keywords of the academic field corresponding to a subject when lecture content information is generated. To this end, the preprocessor 30 may be configured to communicate with the academic information providing server 4, which provides research paper data.

The preprocessor 30 may also be configured to extract the characteristics of the students of each subject by preprocessing the per-subject enrollment history data. Furthermore, the preprocessor 30 may be further configured to extract preferred-subject information by preprocessing the subject-of-interest information that a user who wants a subject recommendation transmits from the user device 1.

The first data analysis unit 31 may generate lecture content information for each subject by analyzing text data related to one or more subjects. In an embodiment, the first data analysis unit 31 may learn the lecture content of a subject through Term Frequency-Inverse Document Frequency (TF-IDF) based analysis of the text data, and generate the lecture content information based on the result of that learning. The detailed operation of the first data analysis unit 31 is described later with reference to FIG. 3.
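As a rough illustration of the TF-IDF analysis mentioned above, the following Python sketch weights the terms of each subject's text. It is a minimal sketch under stated assumptions, not the patented implementation: the function name `tfidf_vectors`, the whitespace tokenization, the sample subjects, and the unsmoothed variant `tf * log(N / df)` are all illustrative choices that the specification does not prescribe.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return (vocabulary, one {term: weight} dict per document).

    docs: list of token lists, one per subject (hypothetical input format).
    Uses the plain variant tf * log(N / df); other variants exist.
    """
    n_docs = len(docs)
    # Document frequency: number of subjects whose text contains each term.
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(df)
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty text
        vectors.append({
            term: (counts[term] / total) * math.log(n_docs / df[term])
            for term in counts
        })
    return vocab, vectors

# Hypothetical syllabus snippets, already tokenized.
subjects = {
    "Intro to Statistics": "probability distribution sampling inference".split(),
    "Machine Learning": "probability model training inference".split(),
    "Korean Literature": "novel poetry criticism".split(),
}
vocab, vectors = tfidf_vectors(list(subjects.values()))
```

Terms shared by several subjects (here "probability") receive a lower weight than terms that appear in only one subject (here "sampling"), which is the property that makes TF-IDF useful for characterizing lecture content.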

The second data analysis unit 32 may generate student characteristic information for each subject by analyzing the per-subject enrollment history data of students. For example, the second data analysis unit 32 may generate per-department enrollment ratio data based on the enrollment history data of a plurality of students, and generate the student characteristic information using the per-department enrollment ratio data of the students of each subject. The detailed operation of the second data analysis unit 32 is described later with reference to FIG. 4.
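One possible reading of the per-department ratio data is the share of each subject's students that comes from each department. The sketch below follows that reading only; the record layout `(student_id, department, subject)`, the use of the student's home department, and the function name `department_ratios` are assumptions made for illustration and are not taken from the specification.

```python
from collections import Counter, defaultdict

def department_ratios(enrollments):
    """Map each subject to {department: share of that subject's students}.

    enrollments: iterable of (student_id, department, subject) records,
    a hypothetical flattening of the attendance (enrollment) history data.
    """
    counts = defaultdict(Counter)
    seen = set()
    for student_id, department, subject in enrollments:
        # Count each student at most once per subject.
        if (student_id, subject) in seen:
            continue
        seen.add((student_id, subject))
        counts[subject][department] += 1
    return {
        subject: {dept: n / sum(by_dept.values()) for dept, n in by_dept.items()}
        for subject, by_dept in counts.items()
    }

# Hypothetical records.
records = [
    ("s1", "Statistics", "Machine Learning"),
    ("s2", "Computer Science", "Machine Learning"),
    ("s3", "Computer Science", "Machine Learning"),
    ("s4", "Korean Literature", "Modern Poetry"),
    ("s3", "Computer Science", "Modern Poetry"),
]
ratios = department_ratios(records)
```

Each subject's ratio dictionary can then be laid out as one row of a student characteristic matrix whose columns are departments.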

The similarity determining unit 33 may generate per-subject similarity information using the lecture content information generated by the first data analysis unit 31 and the student characteristic information generated by the second data analysis unit 32. For example, the similarity determining unit 33 may determine the per-subject similarity using the cosine similarity between a lecture content matrix, in which the academic-area data of each subject is embedded, and a student characteristic matrix, in which the per-department enrollment ratios of the students who took each subject are embedded. The detailed process of determining the per-subject similarity is described later with reference to FIG. 2.
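The following sketch shows one way cosine similarity could be applied to both feature spaces. How the two similarities are combined is not stated in this passage, so the equal-weight blend controlled by the `weight` parameter is purely an illustrative assumption, as are the function names and the toy vectors.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def combined_similarity(content, students, weight=0.5):
    """Pairwise subject similarity blending content and student-profile cosine.

    content, students: dicts mapping subject name -> feature vector.
    weight: share given to the content similarity (illustrative assumption).
    """
    names = sorted(content)
    return {
        (a, b): weight * cosine(content[a], content[b])
        + (1 - weight) * cosine(students[a], students[b])
        for a in names for b in names if a != b
    }

# Toy rows of a lecture content matrix and a student characteristic matrix.
content = {"A": [1.0, 0.0], "B": [1.0, 0.0], "C": [0.0, 1.0]}
students = {"A": [0.5, 0.5], "B": [0.5, 0.5], "C": [1.0, 0.0]}
sim = combined_similarity(content, students)
```

In this toy data, subjects A and B match in both spaces and score close to 1, while A and C share no content and only partly overlap in student profile, so their score is much lower.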

In an embodiment, the lecture content information generated by the first data analysis unit 31, the student characteristic information generated by the second data analysis unit 32, and/or the similarity information generated by the similarity determining unit 33 may be stored in the DB 34 of the subject recommendation system 3.

The recommended subject determining unit 35 may generate subject recommendation information for the target user using the subject-of-interest information received from the user device 1 and the per-subject similarity information generated by the similarity determining unit 33. For example, the recommended subject determining unit 35 may determine the user's preferred subjects based on the target user's enrollment history and/or desired subjects, and determine recommended subjects based on their similarity to the preferred subjects.
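The selection step can be sketched as a simple ranking over a precomputed similarity table. This is a minimal sketch: the function name `recommend`, the rule of scoring each candidate by its best similarity to any preferred subject, the `top_k` cut-off, and the sample scores are assumptions for illustration, since the passage does not fix a particular ranking rule.

```python
def recommend(similarity, preferred, top_k=3):
    """Rank subjects outside `preferred` by their best similarity to a preferred subject.

    similarity: dict mapping (subject_a, subject_b) -> score, assumed symmetric.
    preferred: set of subjects from the user's history and/or wish list.
    """
    scores = {}
    for (a, b), score in similarity.items():
        if a in preferred and b not in preferred:
            scores[b] = max(scores.get(b, float("-inf")), score)
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [subject for subject, _ in ranked[:top_k]]

# Hypothetical similarity scores.
similarity = {
    ("Machine Learning", "Statistics"): 0.82,
    ("Statistics", "Machine Learning"): 0.82,
    ("Machine Learning", "Data Mining"): 0.91,
    ("Data Mining", "Machine Learning"): 0.91,
    ("Machine Learning", "Modern Poetry"): 0.05,
    ("Modern Poetry", "Machine Learning"): 0.05,
}
picks = recommend(similarity, {"Machine Learning"}, top_k=2)
```

Subjects the user already prefers are excluded from the result, so the list contains only new candidates.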

The recommended subject determining unit 35 may provide the generated subject recommendation information to the user device 1. The subject recommendation information may be visualized as a subject map that shows the similarity between subjects. The subject map is defined by a plurality of dimensions corresponding to per-subject characteristics, and may take the form of a planar or three-dimensional map in which subjects are located closer to each other as the similarity between them increases. The UMAP (Uniform Manifold Approximation and Projection) algorithm may be used to reduce the academic areas and department characteristics of each subject to a limited number of dimensions (for example, two). The form of the subject map is described later with reference to FIGS. 8A and 8B.

FIG. 2 is a flowchart showing each step of a subject recommendation method based on subject similarity according to an embodiment. The subject recommendation method according to the embodiments may be performed using the subject recommendation system according to the embodiments. For convenience of explanation, the method is described below with reference to FIGS. 1 and 2.

Embodiments of the present invention are implemented using a subject-based recommendation model that recommends lectures similar to those in which the target user is interested. More specifically, the embodiments compute lecture content information and student characteristic information for each subject by embedding the data of each subject into characteristic matrices along a lecture-content dimension and a student-characteristic dimension, and use this information to compute the similarity between subjects. Subject recommendation information may then be provided to the user by recommending subjects that are highly similar to the user's preferred subjects.

Specifically, the first data analysis unit 31 of the subject recommendation system 3 may generate an academic-field keyword matrix for the subjects by performing keyword analysis on research paper data (S11). The first data analysis unit 31 may also generate a subject information keyword matrix through morphological analysis of subject information data such as the syllabus of each subject (S12). Next, the first data analysis unit 31 may generate lecture content information by embedding each subject into a lecture content matrix using the academic-field keyword matrix and the subject information keyword matrix (S13).

To support these operations of the first data analysis unit 31, the preprocessing unit 30 of the subject recommendation system 3 may extract keywords for each academic area from the research paper data in the form of text data. The preprocessing unit 30 may also extract text data corresponding to keywords, through morphological analysis, from subject information such as the syllabus and learning objectives of each subject and the curriculum data of each department. In addition, TF-IDF values may be used to reflect the importance of each keyword in each matrix generated by the first data analysis unit 31.

FIG. 3 is a conceptual diagram of the process of generating lecture content information in a subject recommendation method based on subject similarity according to an embodiment.

Referring to FIG. 3, research paper data may be used to embed the academic-area characteristics of each subject that is a candidate for recommendation. For example, each paper in the research paper data typically carries about 5 to 7 keywords and is typically assigned to about 1 to 3 academic fields according to the journal in which it was published. In an embodiment, the research paper data may be obtained from the academic information providing server 4 (FIG. 1).

In this embodiment, the frequency of occurrence of each keyword was tallied by the academic field to which the paper belongs, and a document-word matrix 301 of academic-field keywords was built from the tallied data. Each row of the document-word matrix 301 corresponds to a keyword, and each column corresponds to an academic area. In an embodiment, the document-word matrix 301 may be built after excluding academic areas with fewer than a certain number of recorded keywords and/or keywords that appear fewer than a certain number of times across all academic areas combined. This is only an example; in another embodiment, the document-word matrix 301 may include all academic areas and all keywords.

Each element of the document-word matrix 301 may contain the TF-IDF value for one combination of a research paper keyword and an academic field. The TF-IDF value indicates how important a keyword is within a specific academic field, taking into account both the overall frequency of the keyword and its frequency within that field. It is used to temper the weight, within any single field, of general keywords that appear frequently across all fields. In an embodiment, the TF-IDF values may be normalized so that the value for every keyword and academic-field combination lies between 0 and 1.
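The TF-IDF weighting and normalization described above can be sketched as follows. The patent does not give the exact TF-IDF variant, so this is only an illustration under stated assumptions: term frequency is the raw count, inverse document frequency is log(number of fields / number of fields containing the keyword), and normalization divides by the largest value. The function name `tfidf_matrix` and the input layout are hypothetical.

```python
import math

def tfidf_matrix(counts):
    """counts: {keyword: {field: occurrences}}.

    Returns {keyword: {field: score}} with scores scaled into [0, 1].
    Assumed variant: tf = raw count, idf = log(n_fields / df).
    """
    fields = {f for per_field in counts.values() for f in per_field}
    n_fields = len(fields)
    scores = {}
    for kw, per_field in counts.items():
        # df: number of academic fields in which the keyword occurs
        df = sum(1 for c in per_field.values() if c > 0)
        idf = math.log(n_fields / df) if df else 0.0
        scores[kw] = {f: c * idf for f, c in per_field.items()}
    # Scale by the largest score so every value lies between 0 and 1
    top = max((v for row in scores.values() for v in row.values()), default=0.0)
    if top > 0:
        scores = {kw: {f: v / top for f, v in row.items()}
                  for kw, row in scores.items()}
    return scores
```

With this variant, a keyword that occurs in every field receives a score of 0 in each of them, which matches the stated purpose of down-weighting general keywords.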

Meanwhile, subject information may be used to embed the characteristics that correspond to the content of each candidate subject. Here, subject information describes what is taught in the lectures of each subject; for example, the more academically similar the content covered in two lectures, the more similar the two subjects may be considered. In this embodiment, the syllabus and/or learning objectives entered by the instructor, curriculum information provided by the educational institution server, and the like may be used as subject information in order to analyze the learning content of a subject. In an embodiment, the syllabus, learning objectives, curriculum information, and the like may be obtained from the educational institution server 2 (FIG. 1).

In this embodiment, subject information keywords may first be extracted through morphological analysis of the subject information data, which is unstructured data. In an embodiment, words that are not closely related to the educational content of a lecture, such as [final exam, paper, submission, ..., study, class period, necessary], may be classified as preset stopwords and excluded from the keywords. The matrix 302 of keywords extracted from the subject information data may be converted by transposition into a keyword-subject matrix 303, in which each row corresponds to a subject and each column corresponds to a keyword.
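As a minimal sketch of the stopword filtering and transposition described above, the following assumes that morphological analysis has already produced a token list per subject. The stopword set, the function names, and the nested-dictionary layout are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Illustrative stopwords only; the actual preset list is not given in full.
STOPWORDS = {"final", "paper", "submission", "study", "period", "necessary"}

def subject_keyword_counts(tokens_by_subject, stopwords=STOPWORDS):
    """Count keywords per subject after dropping stopwords.

    tokens_by_subject: {subject: [token, ...]} (output of a morphological analyzer).
    Returns {subject: Counter({keyword: count})}.
    """
    return {subject: Counter(t for t in tokens if t not in stopwords)
            for subject, tokens in tokens_by_subject.items()}

def transpose(nested):
    """Swap the row and column keys of a nested dict {row: {col: value}}."""
    out = {}
    for row, cols in nested.items():
        for col, value in cols.items():
            out.setdefault(col, {})[row] = value
    return out
```

Applying `transpose` to a keyword-by-subject layout yields the subject-by-keyword layout, and vice versa.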

In an embodiment, as with the academic-field keyword matrix 301, a TF-IDF value may be computed for each keyword and subject pair by considering how often a given keyword appears within one subject (TF) and in how many subjects it appears (DF). A higher TF-IDF value means that the keyword is more likely to be an important keyword representing that subject. In an embodiment, the per-subject TF-IDF value of each subject information keyword may also be normalized to a value between 0 and 1.

Next, the lecture content matrix 310 may be generated using the academic-field keyword matrix 301 and the subject information keyword matrix 303. Each row of the lecture content matrix 310 corresponds to a subject, and each column corresponds to an academic area. Each element of the lecture content matrix 310 may be computed as the average of the academic-area characteristics of the subject information keywords, weighted by their TF-IDF values. In other words, the academic-area characteristics of the keywords that matter most for a subject contribute more strongly to the academic-area characteristics computed for that subject.

For example, let F in Equation 1 below denote the TF-IDF value of keyword i in academic field m, let C in Equation 2 below denote the TF-IDF value of keyword i in subject information j, and let N_j denote the number of keywords in subject information j. The weighted TF-IDF value F' of subject information j for academic field m may then be calculated as in Equation 3 below.

[Equation 1]

[Equation 2]

[Equation 3]
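The weighted average defined by F, C, and N_j can be sketched as follows. This is one reading consistent with those definitions, namely F'(j, m) = (1 / N_j) * sum over keywords i of C(i, j) * F(i, m). The exact form of Equation 3 is an assumption here, and the function name and input layout are hypothetical.

```python
def lecture_content_row(subject_tfidf, field_tfidf, fields):
    """Compute one row of the lecture content matrix for a subject j.

    subject_tfidf: {keyword i: C_ij} for the subject.
    field_tfidf:   {keyword i: {field m: F_im}}.
    fields:        iterable of academic fields (the matrix columns).
    Assumed formula: F'_jm = (1 / N_j) * sum_i C_ij * F_im.
    """
    n_j = len(subject_tfidf)  # N_j: number of keywords of the subject
    row = {}
    for m in fields:
        total = sum(c * field_tfidf.get(kw, {}).get(m, 0.0)
                    for kw, c in subject_tfidf.items())
        row[m] = total / n_j if n_j else 0.0
    return row
```

Keywords with a large C value pull the subject toward the academic fields in which those keywords carry a large F value, which is the behavior the description attributes to the weighting.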

Referring again to FIGS. 1 and 2, the second data analysis unit 32 of the subject recommendation system 3 may generate student characteristic information for each subject in order to reflect the characteristics of the students who took that subject. To this end, the second data analysis unit 32 may use the course history data of a plurality of students. First, the second data analysis unit 32 may compute a course history characteristic matrix for each student by calculating, for each student in the course history data, the ratio of credits completed in a specific department to the total credits completed by the student (S14). For this purpose, the preprocessing unit 30 may extract information such as student number, subject, and offering department from the course history data in the form of text data.

Next, for each of the one or more subjects that are candidates for recommendation, the second data analysis unit 32 may compute the student characteristics of the subject using the course history characteristic matrices of the students who took that subject according to the course history data (S15). For example, the average of the course history characteristic matrices of the students who took the same subject may be used as the student characteristic information of that subject.

FIG. 4 is a conceptual diagram of the process of generating student characteristic information in a subject recommendation method based on subject similarity according to an embodiment.

Referring to FIG. 4, in this embodiment the course history data is used to compute per-subject department TF-IDF values based on the students' course histories. A subject ratio matrix 401 may be computed from the course history data 400, which includes identification information for each student (e.g., student number) and information on the subjects each student took (e.g., subject and offering department). Each row of the subject ratio matrix 401 corresponds to a student, and each column corresponds to a department that offers subjects. That is, each element of the subject ratio matrix 401 is the ratio of the total credits the student in that row earned from subjects of the department in that column to all credits earned by that student.

In addition, a per-subject student matrix 402 may be extracted from the course history data. Each row of the student matrix 402 corresponds to a subject, and each column corresponds to a student. Using the subject ratio matrix 401 and the per-subject student matrix 402 computed in this way, a student characteristic matrix 410 may be obtained in which the per-department enrollment ratios of the students who took each subject are embedded. That is, each row of the student characteristic matrix 410 represents a subject, and each column represents the average enrollment ratio, for the corresponding offering department, of the students who took that subject.
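The construction of matrices 401 and 410 described above can be sketched as follows, under the assumption that the course history is available as (student, subject, department, credits) records with positive credit values. The record layout and function names are hypothetical.

```python
from collections import defaultdict

def credit_ratios(records):
    """Per-student share of credits earned in each department (cf. matrix 401).

    records: iterable of (student_id, subject, department, credits),
    with credits assumed to be positive.
    """
    totals = defaultdict(float)
    by_dept = defaultdict(lambda: defaultdict(float))
    for student, _subject, dept, credits in records:
        totals[student] += credits
        by_dept[student][dept] += credits
    return {s: {d: c / totals[s] for d, c in depts.items()}
            for s, depts in by_dept.items()}

def student_characteristics(records):
    """Per-subject average of the credit ratios of its students (cf. matrix 410)."""
    records = list(records)
    ratios = credit_ratios(records)
    takers = defaultdict(set)  # subject -> students who took it (cf. matrix 402)
    for student, subject, _dept, _credits in records:
        takers[subject].add(student)
    result = {}
    for subject, students in takers.items():
        row = defaultdict(float)
        for s in students:
            for dept, ratio in ratios[s].items():
                row[dept] += ratio / len(students)
        result[subject] = dict(row)
    return result
```

Departments in which none of a subject's students earned credits are simply absent from that subject's row and can be treated as 0.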

Referring again to FIGS. 1 and 2, the similarity determining unit 33 of the subject recommendation system 3 may compute the similarity between subjects using the lecture content matrix 310 (FIG. 3) and the student characteristic matrix 410 (FIG. 4) generated as described above (S16). In an embodiment, the inter-subject similarity information may be computed as the cosine similarity based on each of the lecture content matrix and the student characteristic matrix. For example, if A and B denote the lecture content (or student characteristic) vectors of two different subjects, that is, the rows of the respective matrix for those subjects, the cosine similarity between them may be calculated as in Equation 4 below.

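[Equation 4]

$$\cos(A, B) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert} = \frac{\sum_{k} A_k B_k}{\sqrt{\sum_{k} A_k^{2}} \, \sqrt{\sum_{k} B_k^{2}}}$$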

Based on the cosine similarity calculated as in Equation 4, inter-subject similarity information based on the lecture content information and inter-subject similarity information based on the student characteristic information may each be generated.
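A minimal sketch of the pairwise computation follows, using the standard definition of cosine similarity over sparse row vectors stored as dictionaries. The function names and the dictionary layout are assumptions for illustration; the same function would be applied once to the lecture content rows and once to the student characteristic rows.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors given as {dimension: value}."""
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0.0) * b.get(k, 0.0) for k in keys)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)

def similarity_table(matrix):
    """matrix: {subject: {dimension: value}} -> {subject: {other: similarity}}."""
    subjects = list(matrix)
    return {s: {t: cosine_similarity(matrix[s], matrix[t])
                for t in subjects if t != s}
            for s in subjects}
```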

Next, the recommended subject determining unit 35 of the subject recommendation system 3 may receive the subject-of-interest information of the target user in order to determine recommended subjects suited to that user (S17). The subject-of-interest information may include desired courses designated directly by the user, and/or the user's existing course history.

Next, the recommended subject determining unit 35 may determine the user's preferred subjects based on the subject-of-interest information, and may determine as recommended subjects a preset number of subjects that, according to the similarity information, are highly similar to the preferred subjects (S18). When the subject-of-interest information includes the user's desired courses, the desired courses themselves are the preferred subjects; when it includes the existing course history, subjects selected from that history may be the preferred subjects.

In an embodiment, to select preferred subjects from the existing course history, the subjects remaining after excluding those the user was required to complete (major subjects, compulsory general education subjects, and the like) may be determined as the preferred subjects. The purpose is to exclude subjects the user took as a graduation requirement and to keep only subjects the user took by preference.

The recommended subject determining unit 35 may provide subject recommendation information including the recommended subjects determined as above to the user device 1 (S19). For example, the recommended subject determining unit 35 may determine as recommended subjects the n subjects with the highest similarity to the user's preferred subjects (n being any natural number). A larger n yields recommendations over a broader range and a smaller n yields recommendations over a narrower range; the value n is therefore also referred to in this specification as the recommendation range.
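Steps S17 to S19 can be sketched as follows. The function names and the nested-dictionary similarity table are hypothetical, and two choices are assumptions rather than details from the description: preferred subjects are excluded from the candidates, and a candidate reached from several preferred subjects is ordered by its best similarity.

```python
def preferred_subjects(desired, history, required):
    """Desired courses plus past subjects that were not compulsory."""
    electives = [s for s in history if s not in required and s not in desired]
    return list(desired) + electives

def recommend(similarity, preferred, n):
    """Union of the n most similar subjects for each preferred subject.

    similarity: {subject: {other_subject: similarity}}.
    n: the recommendation range.
    """
    best = {}
    for p in preferred:
        # Assumption: subjects the user already prefers are not recommended again
        candidates = [(s, v) for s, v in similarity.get(p, {}).items()
                      if s not in preferred]
        candidates.sort(key=lambda kv: kv[1], reverse=True)
        for subject, score in candidates[:n]:
            best[subject] = max(score, best.get(subject, 0.0))
    return sorted(best, key=best.get, reverse=True)
```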

In an embodiment, the recommended subject determining unit 35 may determine the final recommended subjects by applying weights that take into account (i) the selection count, (ii) the desired-course rank, and (iii) the similarity percentile. The selection count reflects, as a weight, for how many preferred subjects a candidate falls within the top n subjects by similarity; the weight may be configured to increase for subjects that are similar to many preferred subjects. The desired-course rank reflects, as a weight, the rank the user assigned to the preferred subject to which the recommended subject is similar. The similarity percentile reflects, as a weight, the similarity to the preferred subject converted to a percentile.
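The description names the three weighting factors but not how they are combined. The following sketch therefore uses a purely illustrative combination: each match contributes its similarity percentile divided by the desired-course rank of the matching preferred subject, and contributions are summed so that candidates matched by more preferred subjects score higher. The formula, the function name, and the input layout are all assumptions.

```python
def score_candidates(hits):
    """Rank candidates by an assumed combination of the three factors.

    hits: {candidate: [(wish_rank, similarity_percentile), ...]} with one
    tuple per preferred subject whose top-n list contains the candidate;
    wish_rank starts at 1 for the most desired subject.
    """
    scores = {}
    for candidate, matches in hits.items():
        # Summing over matches rewards a higher selection count;
        # dividing by rank favors matches to highly ranked desired courses.
        scores[candidate] = sum(pct / rank for rank, pct in matches)
    return sorted(scores, key=scores.get, reverse=True)
```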

In an embodiment, when the target user has not registered any desired courses and has no existing course history, the recommended subject determining unit 35 may determine recommended subjects based on the preferred subjects of other students who share the same characteristics as the user. The other students may be students with the same major as the target user, but are not limited thereto. The preferred subjects of the other students may be desired courses entered by those students, or subjects determined from their existing course histories.
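This fallback can be sketched as follows, assuming that "same characteristics" means the same major and that the most frequently preferred subjects among those peers stand in for the user's own preferred subjects. The frequency-based selection, the function name, and the input layout are hypothetical.

```python
from collections import Counter

def fallback_preferred(user_major, peers, k):
    """Pick the k subjects most often preferred by peers with the same major.

    peers: iterable of (major, [preferred_subject, ...]).
    """
    counts = Counter(subject
                     for major, preferred in peers
                     if major == user_major
                     for subject in preferred)
    return [subject for subject, _ in counts.most_common(k)]
```

The returned subjects can then be fed to the same similarity-based ranking used for users who do have preference data.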

To verify the performance of the subject recommendation model according to the embodiments described above, the present inventors compared the subjects recommended by the model with the students' actual course registrations. For this purpose, data were used from 9,275 students who had registered for three or more subjects by the first semester of 2020 and who registered for courses in the second semester of 2020. A list of recommended subjects was computed for each student from the data up to the first semester of 2020 and compared with the subjects the student selected in the second semester of 2020. For brevity, this specification refers to the subjects a student registered for, together with the subjects the student listed as desired courses, as selected subjects. The students selected 2.94 subjects on average, with a standard deviation of 1.86.

Performance was verified using precision and recall, which are widely used to evaluate recommendation models. Precision was computed as the number of successful recommendations relative to the number of recommendations, [precision = (number of successful subject recommendations) / (number of recommended subjects)], where a recommendation is counted as successful when a subject recommended by the model is one the student actually selected. Recall was computed as the share of selected subjects that were successfully recommended, [recall = (number of successful subject recommendations) / (number of selected subjects)].
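The two bracketed definitions above translate directly into code. The sketch below assumes that each list contains unique subject identifiers for a single student; the function name is hypothetical.

```python
def precision_recall(recommended, selected):
    """Precision and recall for one student.

    recommended: subjects recommended by the model (unique entries).
    selected:    subjects the student actually selected (unique entries).
    """
    hits = len(set(recommended) & set(selected))  # successful recommendations
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(selected) if selected else 0.0
    return precision, recall
```

Recommending more subjects can only keep or raise the hit count, so recall cannot fall as the list grows, while precision tends to drop because the denominator grows faster than the hits.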

FIGS. 5 to 7 are graphs comparing the effect of a subject recommendation method based on subject similarity according to an embodiment with the prior art.

FIG. 5 compares the recommendation accuracy of the subject recommendation method according to an embodiment with the prior art using precision-recall curves; panels (a) to (f) of FIG. 5 correspond to recommendation range values of 3, 5, 10, 30, 50, and 100, respectively. The hypothetical prior-art baselines against which this embodiment is compared were divided, according to the data they use, into a model using only "student characteristics" and a model using only "lecture content". The wider the recommendation range, the more the recommendation results reflected the student's overall interests; the narrower the range, the more the results were based on the student's specific subjects of interest.

As shown in FIG. 5, the subject recommendation method according to this embodiment outperformed the models that use only "student characteristics" or only "lecture content". In particular, with recommendation ranges of 3, 5, and 10, the precision and recall of the method were high, confirming that these ranges are preferable for analysis and service deployment. A trade-off between precision and recall was also observed as a function of the number of recommended subjects: the more subjects were recommended, the higher the probability that a subject selected by the student was among those recommended by the model, but the lower the probability that the student actually selected a given recommended subject. On this basis, the present invention may be implemented with a default of 21 recommended subjects in order to recommend many subjects to students. This is only an example, however, and the recommendation range and the number of recommended subjects may differ between embodiments.

FIG. 6 compares the subject recommendation method according to an embodiment with the prior art in terms of coverage and Shannon entropy; FIG. 6(a) shows the coverage of each model, and FIG. 6(b) shows the recommendation diversity of each model.

In FIG. 6, coverage was computed as the proportion of the 281 subjects offered that were recommended by the recommendation model, and the recommendation diversity of each model, which indicates how varied the subjects recommended to each student are, was computed using the Shannon entropy formula. As shown, the subject recommendation method according to the embodiment of the present invention recommended more subjects than the models using only "student characteristics" or only "lecture content"; it had the widest subject coverage and produced more diverse recommendation results than the conventional models.
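The two measures can be sketched as follows. Coverage follows the definition given above. For diversity, the description names only the Shannon entropy formula, so the sketch assumes that entropy is taken over the distribution of how often each subject is recommended across all students. Function names and the input layout are hypothetical.

```python
import math
from collections import Counter

def coverage(recommendations, n_offered):
    """Share of offered subjects that appear in at least one recommendation list.

    recommendations: {student: [recommended_subject, ...]}.
    n_offered: total number of subjects offered (281 in the evaluation above).
    """
    distinct = {s for recs in recommendations.values() for s in recs}
    return len(distinct) / n_offered

def shannon_entropy(recommendations):
    """Entropy in bits of the recommendation frequency distribution (assumed)."""
    counts = Counter(s for recs in recommendations.values() for s in recs)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

Under this reading, a model that recommends the same few subjects to everyone has low entropy, and a model that spreads recommendations evenly over many subjects has high entropy.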

FIG. 7 compares the recommendation results of the subject recommendation method according to an embodiment with the results of recommending subjects based on the popular subjects of each department; FIG. 7(a) shows the precision-recall curves, and FIG. 7(b) shows coverage and Shannon entropy.

In the results shown in FIG. 7, the subject recommendation method according to the embodiment of the present invention was applied with a recommendation range of 3. The popular subjects of each department were obtained by recommending, among the subjects offered in the second semester of 2020, those most frequently taken in that department up to the first semester of 2020. As shown, the popular-subject-based recommendation performed better in precision and recall, but in terms of recommended-subject coverage and recommendation diversity, the subject recommendation method according to the embodiment of the present invention produced broader and more diverse recommendation results than the popular-subject-based model.

FIGS. 8A and 8B are conceptual diagrams showing examples of subject maps provided by a subject recommendation method based on subject similarity according to an embodiment.

The subject maps shown in FIGS. 8A and 8B are visualizations obtained by reducing the per-subject lecture content matrix and subject information matrix to two dimensions using the UMAP algorithm disclosed in McInnes, L., Healy, J., and Melville, J., "UMAP: Uniform manifold approximation and projection for dimension reduction" (arXiv preprint arXiv:1802.03426, 2018). The closer two subjects are on the subject map, the more similar their characteristics; the farther apart they are, the more their characteristics differ.

FIG. 8A shows the subject map with nodes given different hatching according to the department offering each subject. The fact that subjects offered by the same department lie adjacent to one another indicates that the inter-subject similarity computed by the subject recommendation method according to the embodiment of the present invention is accurate. In FIG. 8A the nodes are distinguished by hatching on a black-and-white map; when the subject map is provided in color, the offering departments may instead be distinguished by node color.

FIG. 8B uses the academic-area characteristic data to show in brighter tones the subjects with strong characteristics of a particular academic area; here, subjects with strong accounting characteristics, such as Investment Economic Analysis, Technology Management and Strategy, Entrepreneurship for Engineers, and Corporate Law, are shown brightly. The same kind of visualization can be applied to other academic areas. In another embodiment, node brightness may be based on the subject information matrix rather than the academic area, so that, for example, the subject map brightly shows subjects with a high proportion of students who take classes in a particular department. Providing recommended subject information as a dimension-reduced subject map produced with the UMAP algorithm has the advantage that the global structure and the topological structure can be captured effectively even in low dimensions.

As described above, according to embodiments of the present invention, keywords are extracted by morphological analysis of each subject's syllabus, learning objectives, and the like, and a weight for each subject-keyword pair is computed from the TF-IDF value. A lecture content characteristic vector for each subject can then be generated from these subject-keyword weights and a term dictionary built from research paper keywords. In addition, a student characteristic vector can be generated from the per-department enrollment ratios of the students taking each subject. The similarity between subjects is computed as a cosine similarity based on the lecture content characteristic vector and the student characteristic vector of each subject, and subjects that are highly similar to a student's subjects of interest can be recommended. Furthermore, subject recommendation information can be presented visually using a subject map generated by applying the UMAP algorithm to the academic-area characteristic vector of each subject.
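The pipeline summarized above can be illustrated with a minimal sketch. It is not the implementation of the embodiments: the function names and the toy data layout are hypothetical, the keywords are assumed to have been extracted already (no morphological analysis or term dictionary is performed), the TF-IDF formula is one common variant, and the way the two cosine similarities are combined (a weighted average with a hypothetical weight `alpha`) is an assumption, since this passage does not state how the lecture content and student characteristic vectors are merged.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(keywords_by_subject):
    """Map each subject to a sparse {keyword: TF-IDF weight} vector."""
    n_subjects = len(keywords_by_subject)
    doc_freq = Counter()
    for tokens in keywords_by_subject.values():
        doc_freq.update(set(tokens))  # count each keyword once per subject
    vectors = {}
    for subject, tokens in keywords_by_subject.items():
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against an empty keyword list
        vectors[subject] = {
            term: (count / total) * math.log(n_subjects / doc_freq[term])
            for term, count in counts.items()
        }
    return vectors


def student_vectors(enrollments):
    """Map each subject to {department: share of its students from that department}.

    enrollments: iterable of (student_department, subject) pairs.
    """
    counts = defaultdict(Counter)
    for department, subject in enrollments:
        counts[subject][department] += 1
    return {
        subject: {dept: c / sum(by_dept.values()) for dept, c in by_dept.items()}
        for subject, by_dept in counts.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(key, 0.0) for key, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def combined_similarity(a, b, content, students, alpha=0.5):
    """Assumed combination: weighted average of content and student similarity."""
    content_sim = cosine(content.get(a, {}), content.get(b, {}))
    student_sim = cosine(students.get(a, {}), students.get(b, {}))
    return alpha * content_sim + (1.0 - alpha) * student_sim


def recommend(subject, content, students, top_k=3, alpha=0.5):
    """Return the top_k subjects most similar to the given subject of interest."""
    scores = {
        other: combined_similarity(subject, other, content, students, alpha)
        for other in content
        if other != subject
    }
    return sorted(scores, key=lambda s: (-scores[s], s))[:top_k]
```

With `alpha=1.0` the ranking depends only on lecture content, and with `alpha=0.0` only on student characteristics, which makes it easy to compare the combined model against either single-source baseline.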

According to embodiments of the present invention, recommendation results that match students' actual subject selections can be obtained, compared with recommending subjects on the basis of only one of lecture content and student characteristics, and better results can be obtained in terms of the scope and diversity of recommendations while the accuracy of the recommendation model is maintained. Accordingly, by recommending a wide variety of subjects that match each student's interests, the embodiments can implement a customized service for nurturing creative, convergence-oriented future talent, and this service can be extended to recommendation of a second major or career path, micro- or nano-degree related services, and the like.

Meanwhile, the operations of the subject recommendation method based on subject similarity according to the embodiments described above may be implemented at least partially as a computer program and recorded on a computer-readable recording medium. The computer-readable recording medium on which a program for implementing these operations is recorded includes every kind of recording device in which computer-readable data is stored. Examples of the computer-readable recording medium include ROM, RAM, CD-ROM, magnetic tape, floppy disk, and optical data storage devices. The computer-readable recording medium may also be distributed over computer systems connected through a network, so that computer-readable code is stored and executed in a distributed manner. In addition, functional programs, code, and code segments for implementing the present embodiment will be readily understood by those of ordinary skill in the art to which the present embodiment belongs.

Although the present invention has been described with reference to the embodiments shown in the drawings, these are merely exemplary, and those of ordinary skill in the art will understand that various modifications and variations of the embodiments are possible. Such modifications, however, should be regarded as falling within the technical protection scope of the present invention. Accordingly, the true technical protection scope of the present invention should be defined by the technical spirit of the appended claims.

Claims

A subject recommendation system based on subject similarity, comprising:
a first data analysis unit configured to generate lecture content information for each of one or more subjects by analyzing text data related to the one or more subjects;
a second data analysis unit configured to generate student characteristic information for each of the one or more subjects by analyzing attendance history data of students for each subject;
a similarity determining unit configured to generate similarity information for each subject using the lecture content information and the student characteristic information;
a recommended subject determination unit configured to generate subject recommendation information for a user based on subject-of-interest information received from the user and the similarity information; and
a preprocessing unit configured to extract keywords for each academic area from subject information including the syllabus of each subject, the learning objectives of each subject, the curriculum of each department, and research papers for each academic area, and to preprocess the keywords into the format of the text data,
wherein the subject-of-interest information includes at least one of the user's attendance history and desired subjects,
wherein the recommended subject determination unit is further configured to determine the user's preferred subject using at least one of the user's attendance history and desired subjects, and to generate the subject recommendation information by determining a recommended subject based on the degree of similarity to the preferred subject,
wherein the recommended subject determination unit is further configured, when the user's preferred subject is determined based only on the user's attendance history, to select the preferred subject after excluding from the attendance history any subject that the user was obligated to take,
wherein the recommended subject determination unit is further configured, when neither the user's attendance history nor desired subjects are entered in the subject-of-interest information, to determine the recommended subject based on preferred subjects of other students having the same characteristics as the user, and
wherein the preferred subjects of the other students having the same characteristics as the user include at least one of
a desired subject entered by the other students and a subject determined based on the previous attendance histories of the other students.

(deleted)

The subject recommendation system based on subject similarity of claim 1,
wherein the first data analysis unit is further configured to learn the lecture content of each subject through term frequency-inverse document frequency (TF-IDF) based analysis of the text data, and to generate the lecture content information based on the result of the learning.

The subject recommendation system based on subject similarity of claim 1,
wherein the second data analysis unit is further configured to generate per-department attendance ratio data based on the attendance history data of a plurality of students, and to generate the student characteristic information using the per-department attendance ratio data of the students of each subject.

The subject recommendation system based on subject similarity of claim 1,
further configured to generate the similarity information for each subject using the cosine similarity between matrices corresponding respectively to the lecture content information and the student characteristic information.

(deleted)

The subject recommendation system based on subject similarity of claim 1,
wherein the subject similarity determination unit is further configured to provide the subject recommendation information to the user using a subject map that is defined by a plurality of dimensions corresponding to the characteristics of each subject and in which subjects are located closer to each other as the similarity between them is higher.

A subject recommendation method based on subject similarity, comprising:
generating, by a subject recommendation system, lecture content information for each of one or more subjects by analyzing text data related to the one or more subjects;
generating, by the subject recommendation system, student characteristic information for each of the one or more subjects by analyzing attendance history data of students for each subject;
generating, by the subject recommendation system, similarity information for each subject using the lecture content information and the student characteristic information;
generating, by the subject recommendation system, subject recommendation information for a user based on subject-of-interest information received from the user and the similarity information; and
extracting, by the subject recommendation system, keywords for each academic area from subject information including the syllabus of each subject, the learning objectives of each subject, the curriculum of each department, and research papers for each academic area, and preprocessing the keywords into the format of the text data,
wherein the subject-of-interest information includes at least one of the user's attendance history and desired subjects,
wherein the generating of the subject recommendation information includes:
determining, by the subject recommendation system, the user's preferred subject using at least one of the attendance history and the desired subjects; and
determining, by the subject recommendation system, a recommended subject based on the degree of similarity to the preferred subject,
wherein the determining of the user's preferred subject includes,
when the user's preferred subject is determined based only on the user's attendance history, determining the preferred subject after excluding from the attendance history any subject that the user was obligated to take,
wherein the determining of the recommended subject includes,
when neither the user's attendance history nor desired subjects are entered in the subject-of-interest information, determining the recommended subject based on preferred subjects of other students having the same characteristics as the user, and
wherein the preferred subjects of the other students having the same characteristics as the user include at least one of
a desired subject entered by the other students and a subject determined based on the previous attendance histories of the other students.

(deleted)

The subject recommendation method based on subject similarity of claim 8,
wherein the generating of the lecture content information includes:
learning, by the subject recommendation system, the lecture content of each subject through term frequency-inverse document frequency (TF-IDF) based analysis of the text data; and
generating, by the subject recommendation system, the lecture content information based on the result of the learning.

The subject recommendation method based on subject similarity of claim 8,
wherein the generating of the student characteristic information includes:
generating, by the subject recommendation system, per-department attendance ratio data based on the attendance history data of a plurality of students; and
generating, by the subject recommendation system, the student characteristic information using the per-department attendance ratio data of the students of each subject.

The subject recommendation method based on subject similarity of claim 8,
wherein the generating of the similarity information includes calculating, by the subject recommendation system, the cosine similarity between matrices corresponding respectively to the lecture content information and the student characteristic information.

(deleted)

The subject recommendation method based on subject similarity of claim 8,
further comprising providing, by the subject recommendation system, the subject recommendation information to the user using a subject map that is defined by a plurality of dimensions corresponding to the characteristics of each subject and in which subjects are located closer to each other as the similarity between them is higher.

A computer program stored in a computer-readable recording medium to execute, in combination with hardware, the subject recommendation method based on subject similarity according to any one of claims 8, 10 to 12, and 14.
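The preferred-subject selection and the fallback to similar students recited in the claims above can be illustrated with a short sketch. It is one possible reading only, not the claimed implementation: the `Student` record, its `traits` field (standing in for "the same characteristics", which the claims do not define), the function names, and the precomputed `similarity` lookup table are all hypothetical. Excluding obligatory subjects even when desired subjects are also present, and using the other students' preferred subjects as similarity seeds rather than recommending them directly, are assumptions of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Student:
    traits: tuple                                   # hypothetical, e.g. (department, year)
    history: list = field(default_factory=list)     # subjects already taken
    desired: list = field(default_factory=list)     # subjects the student entered as desired
    compulsory: set = field(default_factory=set)    # subjects the student was obligated to take


def preferred_subjects(student):
    """Desired subjects plus non-obligatory subjects from the history, de-duplicated."""
    electives = [s for s in student.history if s not in student.compulsory]
    seen, preferred = set(), []
    for subject in list(student.desired) + electives:
        if subject not in seen:
            seen.add(subject)
            preferred.append(subject)
    return preferred


def recommend(user, others, similarity, top_k=3):
    """Rank candidate subjects by their best similarity to any preferred subject.

    similarity: {subject: {other_subject: score}}, assumed to be precomputed.
    """
    seeds = preferred_subjects(user)
    if not seeds:
        # Nothing entered by the user: borrow the preferred subjects of
        # other students whose traits match the user's.
        for other in others:
            if other.traits == user.traits:
                seeds.extend(preferred_subjects(other))
    excluded = set(user.history) | set(user.desired)
    scores = {}
    for seed in seeds:
        for candidate, score in similarity.get(seed, {}).items():
            if candidate in excluded:
                continue
            scores[candidate] = max(scores.get(candidate, 0.0), score)
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores, key=lambda s: (-scores[s], s))[:top_k]
```

Taking the maximum score per candidate favors subjects that are very close to at least one preferred subject; summing or averaging the scores instead would favor subjects that are moderately close to many of them.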