KR102557827B1

KR102557827B1 - System and method for recommending related courses based on graph data and recording medium for performing the same

Info

Publication number: KR102557827B1
Application number: KR1020220038656A
Authority: KR
Inventors: 문기범; 이진숙; 한수연; 이수강; 권혜정; 한재호; 김규태
Original assignee: 고려대학교 산학협력단
Priority date: 2022-03-29
Filing date: 2022-03-29
Publication date: 2023-07-19

Abstract

The present invention relates to a system and method for recommending related courses based on graph data, and a recording medium for performing the same.
The system for recommending related courses based on graph data according to the present invention includes: a preprocessing unit that preprocesses course registration history data for a set period to generate a course registration data set; a data construction unit that uses the generated course registration data set to construct inter-course network data in which courses are connected in a before-and-after relationship relative to the time at which a given course was taken; an embedding unit that embeds the constructed inter-course network data into an academic-domain dimension using the GraphSAGE algorithm, an inductive learning method; and a course recommendation unit that recommends courses similar to a course input by a user, using the per-course node features resulting from the embedding.

Description

System and method for recommending related courses based on graph data, and recording medium for performing the same

The present invention relates to a system and method for recommending related courses based on graph data and a recording medium for performing the same, and more particularly to a system and method that use graph data to recommend related courses to students, and a recording medium for performing the same.

Many experts now predict that artificial intelligence (AI) technology will bring revolutionary change to society. In particular, as coronavirus disease 2019 (COVID-19) has accelerated digitalization across every sector of society, large volumes of data are accumulating, and the adoption and spread of AI technology is accelerating.

The introduction of AI is also being actively discussed in education, and as social demand for student-tailored educational services has recently grown, personalized recommendation using AI technology is becoming increasingly important.

However, conventional education-related AI technology offers little insight into using AI to analyze unstructured data such as research paper keywords and the syllabi and learning objectives of liberal arts courses, and it discloses no method at all for deriving similarity between courses from such unstructured data or for presenting that similarity visually.

In general, the way students take one course and then the next follows certain patterns, owing to factors such as the department's curriculum or the persistence of a student's interests. These relationships between courses can serve as useful information when students choose courses and develop their interests academically. Until now, however, students have had limited access to this information.

There is therefore a need for a system that provides information on courses related to a course of interest and also recommends similar courses offered by other departments.

The background art of the present invention is disclosed in Republic of Korea Patent Publication No. 10-1754723 (announced July 6, 2017).

The technical problem to be solved by the present invention is to provide a system and method for recommending related courses based on graph data, which use graph data to recommend related courses to students, and a recording medium for performing the same.

To solve this technical problem, a system for recommending related courses based on graph data according to an embodiment of the present invention includes: a preprocessing unit that preprocesses course registration history data for a set period to generate a course registration data set; a data construction unit that uses the generated course registration data set to construct inter-course network data in which courses are connected in a before-and-after relationship relative to the time at which a given course was taken; an embedding unit that embeds the constructed inter-course network data into an academic-domain dimension using the GraphSAGE algorithm, an inductive learning method; and a course recommendation unit that recommends courses similar to a course input by a user, using the per-course node features resulting from the embedding.

Here, the data construction unit may construct the inter-course network data by referring to the course registration data set and calculating inter-course relationship data from the courses connected in a before-and-after relationship relative to the time at which a given course was taken.

In addition, the embedding unit may perform the node representation learning of the GraphSAGE algorithm in the following order: randomly sampling the neighbor nodes used to compute the feature vector of a target node, selecting n connections that cover the neighbor nodes directly connected to the target node and the neighbors of those neighbor nodes; aggregating the feature vectors of the selected neighbor nodes to update the features of the target node; and performing representation learning on the target node in an inductive manner to embed the features of each course node into a new dimension.

In addition, the system may further include: a sequential-enrollment conditional probability calculation unit that uses the constructed inter-course network data to increment the link between a prerequisite course and a subsequent course each time a student takes the prerequisite course and then the subsequent course, and that calculates sequential-enrollment conditional probability data as the frequency with which both the prerequisite and subsequent courses were taken divided by the frequency with which the prerequisite course was taken; a concurrent-enrollment network construction unit that uses the generated course registration data set to construct concurrent-enrollment network data in which two courses taken in the same semester are connected; and a placement table generation unit that uses the calculated sequential-enrollment conditional probability data to generate a placement table for providing lists of previous, concurrent, and subsequent courses for a specific course.

In addition, for a case in which a specific course is taken, the placement table generation unit may produce and provide a list of previous courses ordered by the probability of having been taken beforehand, a list of concurrent courses ordered by the probability of being taken at the same time, and a list of subsequent courses ordered by the probability of being taken afterward, each in descending order of probability.

In addition, a method for recommending related courses based on graph data according to another embodiment of the present invention includes: preprocessing course registration history data for a set period to generate a course registration data set; using the generated course registration data set to construct inter-course network data in which courses are connected in a before-and-after relationship relative to the time at which a given course was taken; embedding the constructed inter-course network data into an academic-domain dimension using the GraphSAGE algorithm, an inductive learning method; and recommending courses similar to a course input by a user, using the per-course node features resulting from the embedding.

Here, the step of constructing the inter-course network data may construct the inter-course network data by referring to the course registration data set and calculating inter-course relationship data from the courses connected in a before-and-after relationship relative to the time at which a given course was taken.

In addition, the embedding step may perform the node representation learning of the GraphSAGE algorithm in the following order: randomly sampling the neighbor nodes used to compute the feature vector of a target node, selecting n connections that cover the neighbor nodes directly connected to the target node and the neighbors of those neighbor nodes; aggregating the feature vectors of the selected neighbor nodes to update the features of the target node; and performing representation learning on the target node in an inductive manner to embed the features of each course node into a new dimension.

In addition, after the step of constructing the network data, the method may further include: using the constructed inter-course network data to increment the link between a prerequisite course and a subsequent course each time a student takes the prerequisite course and then the subsequent course, and calculating sequential-enrollment conditional probability data as the frequency with which both the prerequisite and subsequent courses were taken divided by the frequency with which the prerequisite course was taken; using the generated course registration data set to construct concurrent-enrollment network data in which two courses taken in the same semester are connected; and using the calculated sequential-enrollment conditional probability data to generate a placement table for providing lists of previous, concurrent, and subsequent courses for a specific course.

In addition, for a case in which a specific course is taken, the step of generating the placement table may produce and provide a list of previous courses ordered by the probability of having been taken beforehand, a list of concurrent courses ordered by the probability of being taken at the same time, and a list of subsequent courses ordered by the probability of being taken afterward, each in descending order of probability.

In addition, a computer program for performing the method for recommending related courses based on graph data may be recorded on a computer-readable recording medium according to another embodiment of the present invention.

As described above, according to the present invention, recommending related courses to students using graph data makes it possible to recommend a curriculum that reflects students' actual enrollment patterns, which has the effect of improving student satisfaction.

In addition, according to the present invention, compared with existing methods that use only text data, considering related courses together has the advantage of enabling more accurate course embedding and recommendation even when relatively little text data is available for analysis.

The effects of the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the description below.

FIG. 1 is a system configuration diagram showing a system for recommending related courses based on graph data according to an embodiment of the present invention.
FIG. 2 is a flowchart showing the operation flow of a method for recommending related courses based on graph data according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating, by way of example, a per-department curriculum recommendation process according to an embodiment of the present invention.
FIG. 4 is a flowchart showing the detailed operation flow of step S60 of FIG. 2.
FIG. 5 is a diagram for explaining the process of FIG. 4.
FIG. 6 is a diagram illustrating, by way of example, per-course node embedding results according to an embodiment of the present invention in comparison with the conventional approach.
FIG. 7 is a diagram illustrating, by way of example, related course recommendation results according to an embodiment of the present invention.
FIG. 8 is a diagram illustrating, by way of example, results of recommending similar courses from other departments according to an embodiment of the present invention.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. In the drawings, the thickness of lines and the size of components may be exaggerated for clarity and convenience of description.

In addition, the terms used below are defined in consideration of their functions in the present invention and may vary according to the intention or custom of a user or operator. These terms should therefore be defined on the basis of the content of this specification as a whole.

Hereinafter, preferred embodiments of the present invention will be described in more detail with reference to the drawings.

First, a system for recommending related courses based on graph data according to an embodiment of the present invention will be described with reference to FIG. 1.

FIG. 1 is a system configuration diagram showing a system for recommending related courses based on graph data according to an embodiment of the present invention.

Referring to FIG. 1, the graph data-based related course recommendation system (3) may be configured to operate while communicating, over a wired or wireless network, with a user terminal (1) carried by a student, an educational institution server (2), an academic information providing server (4), and the like. The method of communicating over the wired or wireless network may include any communication method by which objects can network with one another, and is not limited to wired communication, wireless communication, 3G, 4G, or any other particular method.

For example, the wired or wireless network may refer to a communication network based on one or more communication methods selected from the group consisting of LAN (Local Area Network), MAN (Metropolitan Area Network), GSM (Global System for Mobile Communications), EDGE (Enhanced Data GSM Environment), HSDPA (High Speed Downlink Packet Access), W-CDMA (Wideband Code Division Multiple Access), CDMA (Code Division Multiple Access), TDMA (Time Division Multiple Access), Bluetooth, Zigbee, Wi-Fi, VoIP (Voice over Internet Protocol), LTE Advanced, IEEE 802.16m, WirelessMAN-Advanced, HSPA+, 3GPP Long Term Evolution (LTE), Mobile WiMAX (IEEE 802.16e), UMB (formerly EV-DO Rev. C), Flash-OFDM, iBurst and MBWA (IEEE 802.20) systems, HIPERMAN, Beam-Division Multiple Access (BDMA), WiMAX (Worldwide Interoperability for Microwave Access), and ultrasound-based communication, but is not limited thereto.

In an embodiment of the present invention, the graph data-based related course recommendation system (3) includes a preprocessing unit (31), a data construction unit (32), a sequential-enrollment conditional probability calculation unit (33), a concurrent-enrollment network construction unit (34), a placement table generation unit (35), an embedding unit (36), an input unit (37), and a course recommendation unit (38).

The systems, devices, and servers described in this specification may be entirely hardware, or partly hardware and partly software. For example, the systems, devices, and servers of this specification, and each unit they include, may collectively refer to hardware, and the software associated with it, for processing data of a particular format and content or for exchanging such data by electronic communication. In this specification, terms such as "unit", "module", "device", "terminal", "server", and "system" are intended to refer to a combination of hardware and the software run by that hardware. For example, the hardware may be a data processing device that includes a CPU or another processor. The software run by the hardware may refer to a running process, an object, an executable, a thread of execution, a program, and the like.

In addition, the elements that make up the related course recommendation system (3) according to this embodiment are not necessarily intended to refer to separate devices that are physically distinct from one another. That is, the preprocessing unit (31), data construction unit (32), sequential-enrollment conditional probability calculation unit (33), concurrent-enrollment network construction unit (34), placement table generation unit (35), embedding unit (36), input unit (37), and course recommendation unit (38) of FIG. 1 are merely a functional division of the hardware constituting the related course recommendation system (3) according to the operations that hardware performs, and the units need not be provided independently of one another. Of course, depending on the embodiment, one or more of the units of the related course recommendation system (3) may be implemented as separate, physically distinct devices.

The related course recommendation system (3) may recommend, to a user who inputs a course, courses similar to the input course or similar courses offered by other departments. So that the user can input and transmit a course and receive the corresponding information, the related course recommendation system (3) may act as an application service server that communicates with an application (or app) running on the user terminal (1) and thereby enables the application's functions. Alternatively, in another embodiment, the related course recommendation system (3) may be implemented as a web server or the like that provides web pages accessible from a web browser running on the user terminal (1).

In the embodiment shown in FIG. 1, the user terminal (1) is depicted as a notebook computer. This is only an example, however, and the type of user terminal (1) is not limited to what is shown in the drawing. For example, a user seeking a major recommendation may use the functions provided by the related course recommendation system (3) with any computing device, such as a mobile communication terminal (for example, a smartphone), a personal computer, a notebook computer, a PDA (personal digital assistant), a tablet, or a set-top box for IPTV (Internet Protocol Television).

The preprocessing unit (31) preprocesses course registration history data for a set period to generate a course registration data set.

Specifically, to construct the inter-course graph data, the preprocessing unit (31) extracts a large number of course registration history records generated by many students over the past N years. From these, it excludes enrollment records for courses that are unsuitable for a course recommendation service or that are compulsory, and of the remaining courses it keeps only those worth 2 or more credits. It is also preferable to reflect only each student's first enrollment in a course and to exclude retakes from the analysis. The result is the final course registration data set used for analysis.
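The filtering rules above can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Enrollment` record layout, the `excluded_courses` parameter, and the function name `preprocess` are hypothetical and do not appear in the specification, which does not say how the unsuitable or compulsory courses are identified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Enrollment:
    # Hypothetical record layout; the specification does not define one.
    student_id: str
    semester: int   # ordinal semester index (1 = a student's first semester)
    course: str
    credits: int


def preprocess(records, excluded_courses=frozenset(), min_credits=2):
    """Filter raw enrollment records into the analysis data set."""
    seen = set()    # (student_id, course) pairs already kept
    cleaned = []
    # Visit records chronologically so that the first attempt is the one kept.
    for r in sorted(records, key=lambda r: (r.student_id, r.semester)):
        if r.course in excluded_courses:
            continue            # unsuitable or compulsory course
        if r.credits < min_credits:
            continue            # keep only courses worth min_credits or more
        key = (r.student_id, r.course)
        if key in seen:
            continue            # retake: only the first enrollment counts
        seen.add(key)
        cleaned.append(r)
    return cleaned
```

Passing the excluded courses in as an explicit set keeps the policy decision (which courses count as unsuitable) outside the filtering logic.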

The data construction unit (32) uses the course registration data set generated by the preprocessing unit (31) to construct inter-course network data in which courses are connected in a before-and-after relationship relative to the time at which a given course was taken.

Here, the data construction unit (32) constructs the inter-course network data by referring to the course registration data set and calculating inter-course relationship data from the courses connected in a before-and-after relationship relative to the time at which a given course was taken.

Specifically, the data construction unit (32) refers to the course registration data set that has been preprocessed by the preprocessing unit (31) and constructs network data in which courses are connected in a before-and-after relationship based on when they were taken.

For example, if a student takes the course "Basics of Psychology I" in the first semester of the first year and then takes "Personality Psychology" in the second semester of the first year, "Basics of Psychology I" and "Personality Psychology" are connected to each other as a prerequisite course and a subsequent course, respectively, to build the inter-course network data. No link is created between courses registered for in the same semester.
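A minimal Python sketch of this link construction is given below. It assumes that a course is linked to every course the same student took in any later semester; the specification's example covers only consecutive semesters, so that reading is an assumption. The tuple layout and the name `build_sequence_links` are also hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product


def build_sequence_links(enrollments):
    """Count directed (earlier course -> later course) links per student.

    enrollments: iterable of (student_id, semester, course) tuples.
    Returns a Counter mapping (source, target) to the link count.
    """
    by_student = defaultdict(lambda: defaultdict(set))
    for student, semester, course in enrollments:
        by_student[student][semester].add(course)

    n_link = Counter()
    for semesters in by_student.values():
        ordered = sorted(semesters)
        for i, earlier in enumerate(ordered):
            # Only strictly later semesters are paired, so courses taken
            # in the same semester are never linked to each other.
            for later in ordered[i + 1:]:
                for src, dst in product(semesters[earlier], semesters[later]):
                    if src != dst:
                        n_link[(src, dst)] += 1
    return n_link
```

With the example above, a student who takes "Basics of Psychology I" in semester 1 and "Personality Psychology" in semester 2 contributes one link from the former to the latter.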

The sequential-enrollment conditional probability calculation unit (33) uses the inter-course network data constructed by the data construction unit (32) to increment the link between a prerequisite course and a subsequent course each time a student takes the prerequisite course and then the subsequent course, and calculates the sequential-enrollment conditional probability data as the frequency with which both the prerequisite and subsequent courses were taken divided by the frequency with which the prerequisite course was taken.

Specifically, based on the inter-course network data constructed by the data construction unit (32), the sequential-enrollment conditional probability calculation unit (33) calculates the conditional probability (p_source_target) that a subsequent course (target) is taken given that a prerequisite course (source) has been taken.

That is, each time a student takes a prerequisite course and then a subsequent course, the link count (n_link) between the prerequisite course and the subsequent course is incremented by one. The sequential-enrollment conditional probability is then calculated by dividing the frequency with which both the prerequisite and subsequent courses were taken by the frequency with which the prerequisite course was taken. Course pairs whose link count is less than n are preferably removed. The resulting inter-course graph data is sorted in descending order of conditional probability to produce the sequential-enrollment conditional probability data.

For example, taking A as the prerequisite course and B as the subsequent course, the sequential-enrollment conditional probability is calculated as (number of times course B was taken after course A) / (total number of times any other course was taken after course A).
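The calculation in this example can be sketched in Python as follows. The sketch uses the denominator from the example (the total count of all links leaving course A) and assumes that this total is computed before low-count pairs are removed; the specification does not state the order. The name `conditional_probabilities` and the `min_links` parameter are hypothetical stand-ins for the threshold n.

```python
from collections import defaultdict


def conditional_probabilities(n_link, min_links=1):
    """Turn link counts into sorted (source, target, p_source_target) rows.

    n_link: mapping of (source, target) -> number of students who took
            source and later took target.
    """
    # Denominator: total number of follow-up enrollments after each source.
    out_total = defaultdict(int)
    for (src, _dst), count in n_link.items():
        out_total[src] += count

    rows = [
        (src, dst, count / out_total[src])
        for (src, dst), count in n_link.items()
        if count >= min_links       # drop rarely observed pairs
    ]
    # Highest conditional probability first.
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows
```

For instance, with counts A→B = 3 and A→C = 1, the probability for the pair (A, B) is 3 / 4 = 0.75.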

The concurrent-enrollment network construction unit (34) uses the course registration data set generated by the preprocessing unit (31) to construct concurrent-enrollment network data in which two courses taken in the same semester are connected.

즉, 동시 수강 네트워크 구축부(34)는 선후 수강 과목 네트워크 산출 방식과 유사한 방식으로 동시 수강과목 그래프 데이터를 산출한다. 이때, 동시 수강 네트워크에 포함된 두 과목은 학생들이 한 학기에 해당 과목들을 동시에 수강하는 것을 의미한다.That is, the simultaneous course enrollment network construction unit 34 calculates concurrent course graph data in a manner similar to the method of calculating the first and subsequent course networks. In this case, the two courses included in the simultaneous enrollment network means that students take the corresponding courses simultaneously in one semester.
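As a rough sketch of how such concurrent links might be counted, the snippet below groups enrollments by student and semester and counts every unordered pair of courses in each group. The record layout `(student_id, semester, course_id)` and the function name are assumptions made for illustration only.

```python
from collections import Counter, defaultdict
from itertools import combinations


def concurrent_links(enrollments):
    """Count how often two courses were taken by the same student in the same semester.

    enrollments: iterable of (student_id, semester, course_id) tuples (hypothetical layout).
    Returns a Counter keyed by alphabetically ordered course pairs.
    """
    by_student_semester = defaultdict(set)
    for student, semester, course in enrollments:
        by_student_semester[(student, semester)].add(course)

    links = Counter()
    for courses in by_student_semester.values():
        # Undirected link: store each pair once, in sorted order.
        for a, b in combinations(sorted(courses), 2):
            links[(a, b)] += 1
    return links


if __name__ == "__main__":
    records = [
        ("s1", "2021-1", "A"), ("s1", "2021-1", "B"),
        ("s2", "2021-1", "A"), ("s2", "2021-1", "B"), ("s2", "2021-1", "C"),
        ("s1", "2021-2", "C"),
    ]
    print(concurrent_links(records))
```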

The placement table generation unit 35 uses the prior/subsequent enrollment conditional probability data calculated by the prior/subsequent enrollment conditional probability calculation unit 33 to generate a placement table that provides lists of previous, concurrent, and subsequent courses for a specific course.

For a given course, the placement table generation unit 35 produces a previous-course list ordered by the probability that the course was taken beforehand, a concurrent-course list ordered by the probability of being taken in the same semester, and a subsequent-course list ordered by the probability of being taken afterward.

The embedding unit 36 embeds the inter-subject network data built by the data construction unit 32 into the academic-domain dimension using the GraphSAGE algorithm, an inductive learning method.

Here, the embedding unit 36 may perform GraphSAGE node representation learning in the following order: it randomly samples the neighbor nodes used to compute the target node's feature vector, selecting n connections among the neighbors directly connected to the target node and the neighbors of those neighbors; it aggregates the feature vectors of the selected neighbor nodes to update the target node's features; and it learns the target node's representation inductively, embedding each subject node's features into a new dimension.

That is, the related subject recommendation system 3 according to an embodiment of the present invention uses a representation learning technique that generates a vector representation of each subject (node) reflecting not only the subject's own features but also the relationship data between subjects. In particular, using the GraphSAGE algorithm allows efficient learning even when nodes carry a large amount of feature information.

The subject recommendation unit 38 uses the per-subject node features resulting from the embedding performed by the embedding unit 36 to recommend subjects similar to the subject input by the user through the input unit 37.

Here, the subject recommendation unit 38 may recommend the subject with the highest cosine similarity, computed by Equation 1 below, between the feature vector A of the subject selected by the student and the feature vector B of a candidate subject.

[Equation 1]
$$\cos(\theta) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert} = \frac{\sum_{i} A_i B_i}{\sqrt{\sum_{i} A_i^2} \, \sqrt{\sum_{i} B_i^2}}$$
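A cosine-similarity ranking of this kind could look roughly like the sketch below, which uses the standard definition (dot product divided by the product of the vector norms). The `embeddings` dictionary, the function names, and the `top_k` parameter are hypothetical; the vectors stand in for the per-subject embeddings produced by the embedding unit.

```python
import math


def cosine_similarity(a, b):
    """Standard cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Convention chosen here for zero vectors.
    return dot / (norm_a * norm_b)


def recommend_similar(selected, embeddings, top_k=5):
    """Rank all other subjects by cosine similarity to the selected subject."""
    query = embeddings[selected]
    scored = [
        (subject, cosine_similarity(query, vector))
        for subject, vector in embeddings.items()
        if subject != selected
    ]
    # Highest similarity first; ties broken by subject ID for determinism.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]


if __name__ == "__main__":
    toy = {"A": [1.0, 0.0], "B": [2.0, 0.0], "C": [0.0, 1.0], "D": [1.0, 1.0]}
    print(recommend_similar("A", toy, top_k=2))
```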

Hereinafter, a graph data-based related subject recommendation method according to an embodiment of the present invention is described with reference to FIGS. 2 to 8.

FIG. 2 is a flowchart showing the operation flow of the graph data-based related subject recommendation method according to an embodiment of the present invention, and the specific operation of the present invention is described with reference to it.

According to an embodiment of the present invention, the pre-processing unit 31 first pre-processes course registration history data for a set period to generate a course registration data set (S10).

In step S10, a large number of course registration history records generated by many students over the past N years are extracted in order to build the inter-subject graph data.

The data construction unit 32 then uses the course registration data set generated in step S10 to build inter-subject network data in which subjects are linked in a prior/subsequent relationship relative to the time at which one course is taken (S20).

In step S20, the inter-subject network data is built by referring to the course registration data set from step S10 and deriving inter-subject relationship data from the subjects linked in a prior/subsequent relationship relative to the time at which one course is taken.

The prior/subsequent enrollment conditional probability calculation unit 33 then uses the inter-subject network data built in step S20: each time a student takes a prerequisite course and then a subsequent course, the link between the two courses is incremented, and the prior/subsequent enrollment conditional probability data is calculated by dividing the frequency with which the prerequisite and subsequent courses were both taken by the frequency with which the prerequisite course was taken (S30).

The concurrent enrollment network construction unit 34 then uses the course registration data set generated in step S10 to build concurrent enrollment network data in which two courses taken in the same semester are linked (S40).

The placement table generation unit 35 then uses the prior/subsequent enrollment conditional probability data calculated in step S30 to generate a placement table that provides lists of previous, concurrent, and subsequent courses for a specific course (S50).

That is, for a given course, step S50 produces a previous-course list ordered by the probability that the course was taken beforehand, a concurrent-course list ordered by the probability of being taken in the same semester, and a subsequent-course list ordered by the probability of being taken afterward.

In addition, in step S50, the placement table generation unit 35 builds previous, concurrent, and subsequent enrollment graph data between courses using the course registration data set generated in step S10 and the prior/subsequent enrollment conditional probability data calculated in step S30. A placement table listing up to 21 courses with a high probability of being taken before, concurrently with, or after each course may be generated for every course.
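One way to assemble such a table for a single course is sketched below. The input dictionaries (`seq_prob` keyed by directed pairs, `conc_prob` keyed by alphabetically ordered undirected pairs) and all function names are hypothetical, and ranking the previous-course list by the probability stored on the incoming edge is an assumption; the default of 21 entries follows the limit mentioned above.

```python
def top_courses(scored, top_k):
    """Return up to top_k course IDs, highest score first (ties broken by ID)."""
    ranked = sorted(scored, key=lambda item: (-item[1], item[0]))
    return [course for course, _ in ranked[:top_k]]


def placement_table(course, seq_prob, conc_prob, top_k=21):
    """Build previous / concurrent / subsequent course lists for one course.

    seq_prob:  {(source, target): probability} for directed prior/subsequent links.
    conc_prob: {(a, b): probability} for undirected concurrent links.
    """
    previous = [(s, p) for (s, t), p in seq_prob.items() if t == course]
    subsequent = [(t, p) for (s, t), p in seq_prob.items() if s == course]
    concurrent = [
        (b if a == course else a, p)
        for (a, b), p in conc_prob.items()
        if course in (a, b)
    ]
    return {
        "previous": top_courses(previous, top_k),
        "concurrent": top_courses(concurrent, top_k),
        "subsequent": top_courses(subsequent, top_k),
    }


if __name__ == "__main__":
    seq = {("A", "B"): 0.6, ("C", "B"): 0.3, ("B", "D"): 0.5, ("B", "E"): 0.7}
    conc = {("A", "B"): 0.9, ("B", "F"): 0.4, ("B", "G"): 0.2, ("C", "D"): 0.8}
    print(placement_table("B", seq, conc))
```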

FIG. 3 is a diagram showing an example of the per-department curriculum recommendation process according to an embodiment of the present invention.

As shown in FIG. 3, the placement table generation unit 35 may recommend a curriculum for each department using the generated placement table. Students can use this information to look for advanced courses to take after a particular course that matches their interests, or to look for courses they should take before an advanced course they want to take.

The embedding unit 36 then embeds the inter-subject network data built in step S20 into the academic-domain dimension using the GraphSAGE algorithm, an inductive learning method (S60).

In step S60, GraphSAGE node representation learning is performed in the following order: the neighbor nodes used to compute the target node's feature vector are randomly sampled, selecting n connections among the neighbors directly connected to the target node and the neighbors of those neighbors; the feature vectors of the selected neighbor nodes are aggregated to update the target node's features; and the target node's representation is learned inductively, embedding each subject node's features into a new dimension.

Specifically, the GraphSAGE algorithm learns node representations in an inductive manner. That is, even when a new subject is added, the representations of all subjects in the overall inter-subject network do not have to be recomputed.

When a new subject is added, no information about its prior/subsequent relationships can be obtained from existing students' enrollment history, so a transductive learning method cannot generate a representation for that subject. Therefore, in an embodiment of the present invention, it is preferable to apply the inductive GraphSAGE algorithm, which can generate a representation for a newly established subject.

Because the GraphSAGE algorithm uses a sampling technique, efficient representation learning is possible even when node feature vectors are large. The GraphSAGE functions of the StellarGraph framework may be used to implement the algorithm.

Subject node representation learning is carried out as follows.

Taking advantage of GraphSAGE's ability to learn representations efficiently even when node feature vectors are large, the lecture content of each subject in the inter-subject network is used to generate the subject's representation.

To vectorize the lecture content of each subject, an academic-domain embedding method, a natural language analysis technique developed in-house, is used. Through this method, the lecture content of every subject is expressed as a 152-dimensional academic-domain feature.

Each dimension is then normalized using min-max scaling in preparation for the per-subject node representation learning that follows.
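Per-dimension min-max normalization can be illustrated with the short sketch below, which rescales each feature dimension to the range [0, 1]. The function name is hypothetical, and mapping a constant dimension to 0.0 is a convention chosen here, not something stated in the description.

```python
def min_max_normalize(vectors):
    """Scale each dimension of a list of equal-length vectors to [0, 1]."""
    if not vectors:
        return []
    columns = list(zip(*vectors))          # one tuple per feature dimension
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    normalized = []
    for row in vectors:
        normalized.append([
            # A constant dimension carries no information; map it to 0.0.
            (value - low) / (high - low) if high > low else 0.0
            for value, low, high in zip(row, lows, highs)
        ])
    return normalized


if __name__ == "__main__":
    features = [[1.0, 5.0, 3.0], [3.0, 5.0, 4.0], [2.0, 5.0, 6.0]]
    for row in min_max_normalize(features):
        print(row)
```

In the setting described above, each input vector would be the 152-dimensional academic-domain feature of one subject.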

In an embodiment of the present invention, the prior/subsequent relationship graph data has a directed network structure running from prerequisite courses to subsequent courses. To reflect this directionality in representation learning, a directed GraphSAGE algorithm is preferably used. Node representation learning on graph data is ordinarily a form of transductive learning, but the GraphSAGE algorithm according to an embodiment of the present invention uses a sampling technique so that learning proceeds in an inductive manner.

Specifically, n neighbor nodes that are actually connected to the target node whose representation is to be learned, and n nodes that are not its neighbors, are each selected at random. Learning proceeds by having the algorithm predict, from the input nodes' feature vectors, whether the target node and each randomly selected node are connected.
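The pair-selection step of this link-prediction task might be sketched as follows: up to n connected nodes are drawn as positive examples (label 1) and up to n unconnected nodes as negative examples (label 0). The adjacency layout, the function name, and the labelling scheme are illustrative assumptions; the classifier that consumes these pairs is not shown.

```python
import random


def sample_link_pairs(target, adjacency, n, rng):
    """Sample labelled node pairs for link prediction around one target node.

    adjacency: {node: set_of_neighbor_nodes}; every node must appear as a key.
    Returns a list of (target, other_node, label) with label 1 for real links.
    """
    neighbor_set = adjacency.get(target, set())
    neighbors = sorted(neighbor_set)
    non_neighbors = sorted(
        node for node in adjacency
        if node != target and node not in neighbor_set
    )
    # Sample without replacement, capped by how many candidates exist.
    positives = rng.sample(neighbors, min(n, len(neighbors)))
    negatives = rng.sample(non_neighbors, min(n, len(non_neighbors)))
    return (
        [(target, node, 1) for node in positives]
        + [(target, node, 0) for node in negatives]
    )


if __name__ == "__main__":
    graph = {"A": {"B", "C"}, "B": {"A"}, "C": {"A"}, "D": set(), "E": set()}
    print(sample_link_pairs("A", graph, 2, random.Random(0)))
```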

To use both the features of surrounding nodes and the information available from the graph structure, the features of surrounding nodes are also used in computing the target node's feature vector.

FIG. 4 is a flowchart showing the detailed operation flow of step S60 of FIG. 2, and FIG. 5 is a diagram illustrating the process of FIG. 4.

As shown in FIGS. 4 and 5, the embedding unit 36 first performs a sampling process that randomly selects the neighbor nodes to be used in computing the target node's feature vector (S61).

Next, the embedding unit 36 performs a neighbor information aggregation process: it selects n connections among the neighbors directly connected to the target node and the neighbors of those neighbors, and aggregates the feature vectors of the selected neighbor nodes to update the target node's features (S62).

Specifically, the GraphSAGE algorithm randomizes the sampling process so that computation remains efficient even for a large network. How many hops of connections to follow and how many connections to select are hyperparameters. In an embodiment of the present invention, five connections are sampled per node, but this is only an example for convenience of explanation and the number sampled is not limited to it.

Five connections are selected among the neighbor nodes directly connected to the target node and the neighbors of those neighbors. The number of nodes selected at each step can be set directly as a parameter. Ten neighbor nodes are selected at the first step and 50 at the second step; once the neighbors are selected, their feature vectors are aggregated to update the target node's features. Methods for aggregating neighbor information include Mean, Max, and Attentional. In an embodiment of the present invention, the Attentional method, which showed the best performance in prior studies, may be used.
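The two-step sampling and aggregation can be illustrated with the simplified sketch below. It is not the patented implementation and omits the learned weight matrices and nonlinearity of a real GraphSAGE layer: it uses plain Mean aggregation rather than the Attentional aggregator, and it reads "10 first-step and 50 second-step neighbors" as 10 sampled neighbors with 5 sampled neighbors each, which is an assumption. All names are hypothetical.

```python
import random


def sample_neighbors(node, adjacency, size, rng):
    """Draw `size` neighbors with replacement; empty list if the node is isolated."""
    neighbors = sorted(adjacency.get(node, ()))
    if not neighbors:
        return []
    return [rng.choice(neighbors) for _ in range(size)]


def mean_vector(vectors, dim):
    """Element-wise mean of a list of vectors; zero vector if the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def aggregate_two_hop(target, adjacency, features, sizes=(10, 5), rng=None):
    """Return the target's own features concatenated with a two-hop neighborhood mean."""
    rng = rng or random.Random()
    dim = len(features[target])
    hop1_summaries = []
    for neighbor in sample_neighbors(target, adjacency, sizes[0], rng):
        hop2 = sample_neighbors(neighbor, adjacency, sizes[1], rng)
        hop2_mean = mean_vector([features[n] for n in hop2], dim)
        # Blend each first-hop neighbor with the mean of its own sampled neighbors.
        hop1_summaries.append(
            [(own + agg) / 2 for own, agg in zip(features[neighbor], hop2_mean)]
        )
    neighborhood = mean_vector(hop1_summaries, dim)
    return list(features[target]) + neighborhood


if __name__ == "__main__":
    graph = {"A": {"B"}, "B": {"A"}}
    feats = {"A": [1.0, 0.0], "B": [0.0, 1.0]}
    print(aggregate_two_hop("A", graph, feats, rng=random.Random(0)))
```

Sampling a fixed number of neighbors per hop keeps the cost per node bounded regardless of how many links a popular course has, which is the efficiency argument made in the description.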

Finally, the embedding unit 36 performs a node representation generation process in which the target node's representation is learned inductively (S63).

Specifically, the representation of the target node is learned while training on the prediction task described above. Of all nodes, 90% are used as training data and the remaining 10% are held out, and training progress can be monitored with a binary classification accuracy metric. Edge weights are set to the conditional enrollment probability between courses, the batch size is 100, and training runs for 3 epochs. Through this process, the features of each subject node are embedded into a new 50-dimensional space.
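The data split and batching schedule described here (a 90/10 split, batches of 100, 3 epochs) can be sketched as below. The function names and the fixed shuffle seed are hypothetical, treating the 10% portion as held-out data is an assumption, and the model update that would consume each batch is intentionally left out.

```python
import random


def split_nodes(nodes, train_fraction=0.9, seed=0):
    """Shuffle nodes reproducibly and split them into training and held-out parts."""
    shuffled = sorted(nodes)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def iter_batches(items, batch_size=100, epochs=3):
    """Yield (epoch_index, batch) pairs, passing over the items once per epoch."""
    for epoch in range(epochs):
        for start in range(0, len(items), batch_size):
            yield epoch, items[start:start + batch_size]


def binary_accuracy(labels, predictions):
    """Fraction of predictions that match the 0/1 labels."""
    if not labels:
        return 0.0
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


if __name__ == "__main__":
    train, held_out = split_nodes(range(1000))
    n_batches = sum(1 for _ in iter_batches(train))
    print(len(train), len(held_out), n_batches)
```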

When a subject is input by the user through the input unit 37 (S70), the subject recommendation unit 38 uses the per-subject node features resulting from the embedding in step S60 to recommend subjects similar to the input subject (S80).

FIG. 6 is a diagram showing an example of per-subject node embedding results according to an embodiment of the present invention in comparison with the conventional approach.

FIG. 6(a) is an example of a conventional embedding result, and FIG. 6(b) is an example of a per-subject node embedding result according to an embodiment of the present invention.

That is, in an embodiment of the present invention, a low-dimensional embedding of each subject is produced using the GraphSAGE algorithm. The embedding takes into account not only the features of the subject itself but also the features of related subjects and the relationships with them.

Confirming the performance of such embeddings requires qualitative evaluation based on domain knowledge for each major. In an embodiment of the present invention, a UMAP-based dimensionality reduction and visualization technique was used to check embedding performance approximately. Cosine distance was used as the metric, n_neighbors was set to 100, and min_dist was set to 1 to prevent nodes from overlapping in the visualization.

Comparing the result obtained using only the existing subject features (a) with the dimension-reduced visualization of the embedding obtained with the GraphSAGE algorithm (b) in FIG. 6 shows that, with GraphSAGE, the subject nodes are arranged so that they can be taken in at a glance.

FIG. 7 is a diagram showing an example of related subject recommendation results according to an embodiment of the present invention.

FIG. 7 is an example of a screen from the actual service based on the inter-subject graph data. When the user selects a subject, the courses most likely to be taken before, after, and concurrently with that subject are shown in order.

FIG. 8 is a diagram showing an example of recommendation results for similar subjects in other departments according to an embodiment of the present invention.

As shown in FIG. 8, subjects in other departments that are similar to the subject input by the user may also be recommended.

The graph data-based related subject recommendation system and method described above may be implemented as an application, or in the form of program instructions executable by various computer components and recorded on a computer-readable recording medium. The computer-readable recording medium may contain program instructions, data files, data structures, and the like, alone or in combination.

The program instructions recorded on the computer-readable recording medium may be specially designed and configured for the present invention, or may be known and available to those skilled in the computer software field.

Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory.

Examples of program instructions include not only machine code such as that produced by a compiler but also high-level language code that a computer can execute using an interpreter or the like. The hardware devices may be configured to operate as one or more software modules in order to perform the processing according to the present invention.

As described above, the graph data-based related subject recommendation system and method according to an embodiment of the present invention recommend related subjects to students using graph data, so that a curriculum reflecting students' actual enrollment patterns can be recommended and student satisfaction can be improved.

In addition, according to an embodiment of the present invention, considering related subjects together, rather than using text data alone as in existing methods, allows more accurate subject embedding and recommendation even when relatively little text data is available for analysis.

The present invention has been described with reference to the embodiments shown in the drawings, but these are merely illustrative, and those of ordinary skill in the art will understand that various modifications and equivalent embodiments are possible. Therefore, the true technical scope of protection of the present invention should be determined by the technical spirit of the claims below.

1: user terminal
3: related subject recommendation system
31: pre-processing unit
32: data construction unit
33: prior/subsequent enrollment conditional probability calculation unit
34: concurrent enrollment network construction unit
35: placement table generation unit
36: embedding unit
37: input unit
38: subject recommendation unit

Claims

A graph data-based related subject recommendation system comprising:
a pre-processing unit that pre-processes course registration history data for a set period to generate a course registration data set;
a data construction unit that uses the generated course registration data set to build inter-subject network data in which subjects are linked in a prior/subsequent relationship relative to the time at which one course is taken;
an embedding unit that embeds the built inter-subject network data into an academic-domain dimension using the GraphSAGE algorithm, an inductive learning method; and
a subject recommendation unit that recommends subjects similar to a subject input by a user, using the per-subject node features resulting from the embedding,
wherein the system further comprises:
a prior/subsequent enrollment conditional probability calculation unit that, using the built inter-subject network data, increments the link between a prerequisite course and a subsequent course each time a student takes the prerequisite course and then the subsequent course, and calculates prior/subsequent enrollment conditional probability data by dividing the frequency with which the prerequisite and subsequent courses were both taken by the frequency with which the prerequisite course was taken;
a concurrent enrollment network construction unit that uses the generated course registration data set to build concurrent enrollment network data in which two courses taken in the same semester are linked; and
a placement table generation unit that uses the calculated prior/subsequent enrollment conditional probability data to generate a placement table for providing lists of previous, concurrent, and subsequent courses for a specific course.

The graph data-based related subject recommendation system according to claim 1,
wherein the data construction unit
builds the inter-subject network data by referring to the course registration data set and deriving inter-subject relationship data from the subjects linked in a prior/subsequent relationship relative to the time at which one course is taken.

The graph data-based related subject recommendation system according to claim 1,
wherein the embedding unit
performs GraphSAGE node representation learning in the order of: randomly sampling the neighbor nodes used to compute the feature vector of a target node, selecting n connections among the neighbors directly connected to the target node and the neighbors of those neighbors; aggregating the feature vectors of the selected neighbor nodes to update the features of the target node; and learning the representation of the target node inductively to embed the features of each subject node into a new dimension.

delete

The graph data-based related subject recommendation system according to claim 1,
wherein the placement table generation unit,
for a given course, produces and provides a previous-course list ordered by the probability that the course was taken beforehand, a concurrent-course list ordered by the probability of being taken in the same semester, and a subsequent-course list ordered by the probability of being taken afterward.

A related subject recommendation method performed by a graph data-based related subject recommendation system, the method comprising:
generating a course registration data set by pre-processing course registration history data for a set period;
building, using the generated course registration data set, inter-subject network data in which subjects are linked in a prior/subsequent relationship relative to the time at which one course is taken;
embedding the built inter-subject network data into an academic-domain dimension using the GraphSAGE algorithm, an inductive learning method; and
recommending subjects similar to a subject input by a user, using the per-subject node features resulting from the embedding,
wherein the method further comprises, after the step of building the network data:
using the built inter-subject network data, incrementing the link between a prerequisite course and a subsequent course each time a student takes the prerequisite course and then the subsequent course, and calculating prior/subsequent enrollment conditional probability data by dividing the frequency with which the prerequisite and subsequent courses were both taken by the frequency with which the prerequisite course was taken;
building, using the generated course registration data set, concurrent enrollment network data in which two courses taken in the same semester are linked; and
generating, using the calculated prior/subsequent enrollment conditional probability data, a placement table for providing lists of previous, concurrent, and subsequent courses for a specific course.

The related subject recommendation method according to claim 6,
wherein the step of building the inter-subject network data
builds the inter-subject network data by referring to the course registration data set and deriving inter-subject relationship data from the subjects linked in a prior/subsequent relationship relative to the time at which one course is taken.

The related subject recommendation method according to claim 6,
wherein the embedding step
performs GraphSAGE node representation learning in the order of: randomly sampling the neighbor nodes used to compute the feature vector of a target node, selecting n connections among the neighbors directly connected to the target node and the neighbors of those neighbors; aggregating the feature vectors of the selected neighbor nodes to update the features of the target node; and learning the representation of the target node inductively to embed the features of each subject node into a new dimension.

delete

The related subject recommendation method according to claim 6,
wherein the step of generating the placement table,
for a given course, produces and provides a previous-course list ordered by the probability that the course was taken beforehand, a concurrent-course list ordered by the probability of being taken in the same semester, and a subsequent-course list ordered by the probability of being taken afterward.

A computer-readable recording medium on which a computer program for performing the graph data-based related subject recommendation method according to claim 6 is recorded.