KR20230106579A

KR20230106579A - Method and Apparatus for Providing Knowledge Compass Service

Info

Publication number: KR20230106579A
Application number: KR1020230088032A
Authority: KR
Inventors: 윤종식
Original assignee: 윤종식
Priority date: 2020-07-17
Filing date: 2023-07-06
Publication date: 2023-07-13
Also published as: KR102559449B1; KR20220010109A; KR102675005B1

Abstract

The present invention relates to a method for providing a learning compass service, comprising the steps of: constructing a keyword dictionary by selecting keywords for each subject and keywords for each detailed course for an unassigned curriculum that divides a service target field by subject and by detailed course; collecting learning data on free learning contents for the unassigned curriculum by performing data crawling on free learning platforms based on the keywords for each subject and the keywords for each detailed course; extracting data for analysis from the learning data by performing data preprocessing on the learning data; generating an assigned curriculum by distributing the free learning contents to the unassigned curriculum, which is done by processing the data for analysis with an artificial intelligence classification model trained through machine learning to perform content classification; calculating a satisfaction rating for each of the free learning contents based on a rating calculation model; and providing at least a part of the assigned curriculum, together with the satisfaction rating, to a user terminal in response to a user query input regarding the service target field. Accordingly, the convenience and efficiency with which a user utilizes free learning contents can be improved.

Description

Method and Apparatus for Providing Knowledge Compass Service

The present invention relates to a method of providing a free learning curation service. More specifically, the present invention relates to a server and a method for providing a service that collects free learning contents through data crawling, classifies them with an artificial-intelligence-based classification model to complete a curriculum, calculates ratings with a satisfaction-based statistical model, and provides the results to users.

Demand for education and learning through free learning contents is increasing. Learning by watching free video lectures on free learning platforms such as YouTube, edwith, and K-MOOC is steadily spreading.

According to a survey by a market research firm, respondents watch learning contents provided on YouTube and similar services frequently and are considerably satisfied with them, and they perceive that such contents reduce learning time and learning cost. In particular, as public interest in technologies related to the Fourth Industrial Revolution has grown in recent years, demand for free learning contents is also increasing in fields such as big data, artificial intelligence, and software.

As described above, although learning through free learning contents provides high satisfaction and demand for it is increasing, such learning still causes learners considerable inconvenience.

For example, searching for contents across various platforms and checking whether the actual lecture content meets the learner's needs can take a great deal of time. In addition, because contents are scattered across multiple platforms without an established system, it can be difficult for learners to use them efficiently.

The technical problem to be solved by the present invention is to resolve the inconvenience and inefficiency that arise when learning with free learning contents, by providing a free learning curation service.

As a means for solving the above technical problem, a server providing a free learning curation service according to one aspect of the present invention includes: a memory storing instructions for providing the free learning curation service; and a processor that, by executing the instructions, constructs a keyword dictionary by selecting keywords for each subject and keywords for each detailed course for an unassigned curriculum that divides a service target field by subject and by detailed course, collects learning data of free learning contents for the unassigned curriculum by performing data crawling on free learning platforms based on the keywords for each subject and the keywords for each detailed course, extracts data for analysis from the learning data by performing data preprocessing on the learning data, generates an assigned curriculum by distributing the free learning contents to the unassigned curriculum through processing the data for analysis with an artificial intelligence classification model trained through machine learning to perform content classification, calculates a satisfaction rating for each of the free learning contents based on a rating calculation model, and provides at least a part of the assigned curriculum, together with the satisfaction rating, to a user terminal in response to a user's query input regarding the service target field.

A method of providing a free learning curation service by a server providing the free learning curation service according to another aspect of the present invention includes the steps of: constructing a keyword dictionary by selecting keywords for each subject and keywords for each detailed course for an unassigned curriculum that divides a service target field by subject and by detailed course; collecting learning data of free learning contents for the unassigned curriculum by performing data crawling on free learning platforms based on the keywords for each subject and the keywords for each detailed course; extracting data for analysis from the learning data by performing data preprocessing on the learning data; generating an assigned curriculum by distributing the free learning contents to the unassigned curriculum through processing the data for analysis with an artificial intelligence classification model trained through machine learning to perform content classification; calculating a satisfaction rating for each of the free learning contents based on a rating calculation model; and providing at least a part of the assigned curriculum, together with the satisfaction rating, to a user terminal in response to a user's query input regarding the service target field.

According to the free learning curation service of the present invention, data crawling is performed on free learning platforms, free learning contents are assigned to a curriculum by an artificial intelligence classification model, and satisfaction ratings for the free learning contents are calculated by a rating calculation model. As a result, the assigned curriculum and the satisfaction ratings can be provided to the user, which resolves the problems that arise when learning with free learning contents and improves the convenience and efficiency with which the user utilizes them.

FIG. 1 is a diagram for explaining a system for providing a free learning curation service according to some embodiments.
FIG. 2 is a block diagram illustrating elements constituting a server providing a free learning curation service according to some embodiments.
FIG. 3 is a diagram for explaining a process of providing a free learning curation service according to some embodiments.
FIG. 4 is a diagram for explaining a process of setting an unassigned curriculum and constructing a keyword dictionary according to some embodiments.
FIG. 5 is a diagram for explaining a process of collecting learning data by performing data crawling on free learning platforms according to some embodiments.
FIG. 6 is a diagram for explaining a process of collecting text data corresponding to voice data by performing voice recognition according to some embodiments.
FIG. 7 is a diagram for explaining a process of extracting data for analysis and generating an assigned curriculum according to some embodiments.
FIG. 8 is a diagram for explaining a process of generating a rating calculation model to calculate a satisfaction rating according to some embodiments.
FIG. 9 is a diagram for explaining a process of providing an assigned curriculum and satisfaction ratings to a user terminal according to some embodiments.
FIG. 10 is a flowchart illustrating steps constituting a method of providing a free learning curation service according to some embodiments.

Hereinafter, embodiments according to the present invention will be described in detail with reference to the drawings. The following description is intended only to illustrate the embodiments and is not intended to restrict or limit the scope of rights according to the present invention. Anything that a person of ordinary skill in the technical field of the present invention can easily infer from the detailed description and the embodiments should be construed as falling within the scope of rights according to the present invention.

The terms used in the present invention are general terms widely used in the technical field of the present invention, but their meaning may vary depending on the intention of those working in the field, precedents, or the emergence of new technologies. Some terms may be arbitrarily selected by the applicant, in which case their meaning will be explained in detail. The terms used in the present invention should be interpreted according to the overall context of the specification, not merely according to their dictionary meanings.

Terms such as 'consists of' or 'includes' used in the present invention should not be construed as necessarily including all of the components or steps described in the specification. Such terms should also be construed as covering cases in which some components or steps are not included and cases in which additional components or steps are further included.

Terms including ordinal numbers such as 'first' or 'second' used in the present invention may be used to describe various components or steps, but the components or steps should not be limited by the ordinal numbers. Terms including ordinal numbers should be construed only as distinguishing one component or step from other components or steps.

Hereinafter, embodiments according to the present invention will be described in detail with reference to the drawings. Detailed descriptions of matters widely known to those of ordinary skill in the technical field of the present invention are omitted.

FIG. 1 is a diagram for explaining a system for providing a free learning curation service according to some embodiments.

Referring to FIG. 1, a system 10 for providing a free learning curation service may include a server 100 providing the free learning curation service, free learning platforms 200, and a user terminal 300. However, the system is not limited thereto, and general-purpose components other than those shown in FIG. 1 may be further included in the system 10.

The system 10 may be one in which the server 100 utilizes the free learning platforms 200 to provide the free learning curation service to a user of the user terminal 300. For example, the system 10 may be implemented in a web environment so that the free learning curation service is provided to the user through a web browser, or the system 10 may be implemented as an application so that the free learning curation service is provided to the user through a mobile device.

The server 100 may be a device for providing the free learning curation service to the user terminal 300 in the system 10. For example, the server 100 may refer to hardware, software, or a combination thereof for implementing a free learning curation service that operates in a web or app environment.

The free learning platforms 200 may refer to content providers that offer learning materials, such as video lectures, free of charge through websites or applications. For example, the free learning platforms 200 may refer to a video sharing platform such as YouTube, an education platform such as edwith, and an online open course platform such as K-MOOC.

The user terminal 300 may refer to a device through which a user accesses the free learning curation service provided by the server 100. For example, the user terminal 300 may refer to various electronic devices, such as a PC, a laptop computer, a tablet PC, and a smartphone, that are capable of running a website or application implementing the system 10.

Interaction between the components of the system 10 may be performed in a wired or wireless data communication environment. Interaction between the free learning platforms 200 and the server 100, and between the server 100 and the user terminal 300, may take place through wired or wireless data communication.

In the system 10, the server 100, utilizing the free learning platforms 200, can provide the free learning curation service to the user terminal 300. The user can therefore be provided with free learning contents that match the user's preferred learning conditions, along with a curriculum for them, so that the user's learning convenience and efficiency can be improved.

FIG. 2 is a block diagram illustrating elements constituting a server providing a free learning curation service according to some embodiments.

Referring to FIG. 2, the server 100 providing the free learning curation service may include a memory 110 and a processor 120. However, the server is not limited thereto, and general-purpose elements other than those shown in FIG. 2 may be further included in the server 100.

The server 100 may be a computing device for providing the free learning curation service. The server 100 may include the memory 110 as a means for storing various data, instructions, and at least one program or piece of software, and may include the processor 120 as a means for processing various data by executing the instructions or the at least one program.

The memory 110 may store instructions for providing the free learning curation service. For example, the memory 110 may store instructions constituting software such as a computer program or a mobile application. In addition, the memory 110 may store various data necessary for executing the instructions or the at least one program.

The memory 110 may be implemented as non-volatile memory such as ROM, PROM, EPROM, EEPROM, flash memory, PRAM, MRAM, RRAM, or FRAM, or as volatile memory such as DRAM, SRAM, SDRAM, PRAM, RRAM, or FeRAM. The memory 110 may also be implemented as an HDD, SSD, SD card, Micro-SD card, or the like.

The processor 120 may provide the free learning curation service by executing the instructions or the at least one program stored in the memory 110. The processor 120 may perform a series of processes for providing the free learning curation service. The processor 120 may perform the overall functions for controlling the server 100 and may process various operations inside the server 100.

The processor 120 may be implemented as an array of multiple logic gates or as a general-purpose microprocessor. The processor 120 may consist of a single processor or a plurality of processors. The processor 120 may be configured integrally with the memory 110 that stores the at least one program, rather than as a component separate from it. The processor 120 may be at least one of a CPU, a GPU, and an AP provided in the server 100, but this is only an example, and the processor 120 may be implemented in various other forms.

The server 100 may perform the steps of a method for providing the free learning curation service. As the server 100 performs each step of the method, the free learning curation service may be provided to the user.

The processor 120 may construct a keyword dictionary by selecting keywords for each subject and keywords for each detailed course for an unassigned curriculum that divides a service target field by subject and by detailed course.

The service target field may refer to a specific field that is the subject of learning. For example, the service target field may be a Fourth Industrial Revolution technology field including at least one of big data, artificial intelligence, machine learning, and software. The service target field may also correspond to various other fields, such as English, foreign languages, MS Office, personal finance, and fitness.

The unassigned curriculum may refer to a division of the service target field by subject and by detailed course. For example, the Fourth Industrial Revolution technology field may be divided into a big data subject, an artificial intelligence subject, and so on, and each subject may in turn be divided into detailed courses of study. The unassigned curriculum, however, refers to a curriculum to which individual contents have not yet been assigned for each subject and each detailed course.

An unassigned curriculum prepared in advance by an expert in each service target field may be used. Alternatively, the unassigned curriculum may be constructed by referring to the tables of contents of textbooks widely used in each service target field.

The keyword dictionary may include the keywords for each subject and the keywords for each detailed course of the unassigned curriculum. In the process of setting up the unassigned curriculum, the keywords related to each subject and each detailed course of a specific service target field may be analyzed, and keywords may be selected for each subject and each detailed course accordingly. Keyword selection may be performed by the expert who constructs the unassigned curriculum, or by referring to textbooks.
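
The relationship between the unassigned curriculum and the keyword dictionary can be sketched as a small data structure. This is only an illustrative sketch: the class names `Subject` and `Course`, the function `build_keyword_dictionary`, and the `(subject, course)` key layout are assumptions made here and do not appear in the specification.

```python
from dataclasses import dataclass, field


@dataclass
class Course:
    """A detailed course within a subject; no contents are assigned yet."""
    name: str
    keywords: list = field(default_factory=list)


@dataclass
class Subject:
    """A subject of the unassigned curriculum (hypothetical layout)."""
    name: str
    keywords: list = field(default_factory=list)
    courses: list = field(default_factory=list)


def build_keyword_dictionary(subjects):
    """Map (subject, course) to its keywords.

    A key whose course is None holds the subject-level keywords.
    """
    dictionary = {}
    for subject in subjects:
        dictionary[(subject.name, None)] = list(subject.keywords)
        for course in subject.courses:
            dictionary[(subject.name, course.name)] = list(course.keywords)
    return dictionary


# Example curriculum; in the specification the keywords would come from
# an expert or from textbook tables of contents.
curriculum = [
    Subject(
        name="Artificial Intelligence",
        keywords=["AI"],
        courses=[Course("Neural Networks", ["CNN", "backpropagation"])],
    )
]
keyword_dictionary = build_keyword_dictionary(curriculum)
```

Keeping subject-level and course-level keywords under separate keys mirrors the distinction the text draws between keywords for each subject and keywords for each detailed course.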

The processor 120 may collect learning data of free learning contents for the unassigned curriculum by performing data crawling on the free learning platforms 200 based on the keywords for each subject and the keywords for each detailed course.

Data crawling may refer to a process of collecting the free learning contents provided through the free learning platforms 200. Data crawling, or web crawling, may refer to an operation in which a high-performance processing device, rather than a person searching manually, collects vast amounts of data from web pages, portal sites, and the like, thereby enabling big data analysis. Through data crawling, free learning contents may be collected from the free learning platforms 200.

The learning data collected by data crawling is data about the free learning contents and may include content descriptions, titles, subtitles, comments, and other items that can be collected by crawling. The learning data may also include voice data extracted from video lectures and text data converted from that voice data.

Data crawling may be performed based on keywords. The keywords for each subject and the keywords for each detailed course selected for the unassigned curriculum may serve as the search terms for data crawling. However, the search results for a keyword are not immediately distributed to the corresponding subject and detailed course of the unassigned curriculum. Instead, the free learning contents returned as search results are distributed to the unassigned curriculum only after passing through a separate classification algorithm.
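
The keyword-driven collection described above might be sketched as follows. The function `collect_learning_data`, the record fields, and the caller-supplied `search` callable are all hypothetical; no real platform API is assumed, and a real crawler would also need to respect each platform's terms of use. Note that the sketch only records which keywords matched and deliberately does not assign a subject or detailed course.

```python
def collect_learning_data(keywords, platforms, search):
    """Run every keyword against every platform and merge results by URL.

    `search(platform, keyword)` is a caller-supplied function (an
    assumption of this sketch) returning an iterable of dicts that
    contain at least a "url" key.
    """
    collected = {}
    for platform in platforms:
        for keyword in keywords:
            for item in search(platform, keyword):
                # One record per URL, even if several keywords find it.
                record = collected.setdefault(
                    item["url"],
                    {**item, "platform": platform, "matched_keywords": []},
                )
                if keyword not in record["matched_keywords"]:
                    record["matched_keywords"].append(keyword)
    return list(collected.values())


def demo_search(platform, keyword):
    """Stand-in for a real crawler; always returns the same lecture."""
    return [{"url": f"https://example.com/{platform}/intro", "title": "Intro"}]


records = collect_learning_data(["CNN", "deep learning"], ["platform_a"], demo_search)
```

Merging by URL avoids duplicate records when the same lecture is found under more than one keyword.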

The processor 120 may extract data for analysis from the learning data by performing data preprocessing on the learning data.

Because the learning data is unstructured data, such as text and voice collected by data crawling, it may be difficult to apply a classification model to it directly. Data preprocessing may therefore be performed to convert the learning data into a form to which a classification model can readily be applied.

Data preprocessing may refer to a process of converting the learning data into numerical form. For example, calculating TF-IDF values or applying the word2vec algorithm may correspond to data preprocessing. However, data preprocessing is not limited thereto; any processing that makes the learning data easier to handle, including conversion into a non-numerical form, may also correspond to data preprocessing.
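
As one illustration of the TF-IDF option mentioned above, the following sketch converts short texts into term-weight dictionaries. The whitespace tokenizer and the particular weighting variant (term frequency as count divided by document length, inverse document frequency as log(N / df)) are assumptions of this sketch; the specification does not fix a formula, and Korean text would need a proper morphological tokenizer.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    tf  = term count / number of tokens in the document
    idf = log(number of documents / number of documents containing the term)
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        vectors.append(
            {
                term: (count / total) * math.log(n_docs / doc_freq[term])
                for term, count in counts.items()
            }
        )
    return vectors


vectors = tf_idf(["python machine learning", "python web crawling"])
```

With this variant, a term that occurs in every document (here "python") receives weight zero, so only terms that distinguish one content item from another carry information into the classifier.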

The processor 120 may generate an assigned curriculum by distributing the free learning contents to the unassigned curriculum, which is done by processing the data for analysis with an artificial intelligence classification model trained through machine learning to perform content classification.

The artificial intelligence classification model may be trained by machine learning to perform content classification. The model classifies each free learning content according to which detailed course of which subject it belongs to within the unassigned curriculum, which is divided by subject and by detailed course. Such a model may be trained in various ways, including CNN, SVM, and Bayesian classification.

The artificial intelligence classification model may receive the data for analysis as input and output the subject and detailed course to which each free learning content is best suited. Because it is difficult for the model to process the unstructured learning data directly, the data for analysis extracted from the learning data by data preprocessing serves as the input to the model.

When the data for analysis is processed by the artificial intelligence classification model, the free learning contents are distributed to the unassigned curriculum and the assigned curriculum is generated. In other words, the free learning contents collected by data crawling based on the keyword dictionary of the unassigned curriculum are assigned a subject and a detailed course only through classification by the artificial intelligence classification model, and the assigned curriculum is formed by distributing them to that subject and detailed course.
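
Of the training approaches listed (CNN, SVM, Bayesian classification), the Bayesian one is simple enough to sketch in a few lines. The following is a minimal multinomial naive Bayes classifier with add-one smoothing that maps token lists to `(subject, detailed course)` labels. The class name, the label format, and the use of raw token counts rather than TF-IDF weights are assumptions of this sketch, not the model described in the specification.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesClassifier:
    """Multinomial naive Bayes over token counts with add-one smoothing."""

    def fit(self, token_lists, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        self.vocabulary = set()
        for tokens, label in zip(token_lists, labels):
            self.token_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.token_counts[label]
            denominator = sum(counts.values()) + vocab_size
            # Log prior plus log likelihood of each known token.
            score = math.log(n_docs / total_docs)
            for token in tokens:
                if token in self.vocabulary:  # skip tokens never seen in training
                    score += math.log((counts[token] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Tiny hand-labelled training set (hypothetical labels).
model = NaiveBayesClassifier().fit(
    [["neural", "network", "cnn"], ["hadoop", "spark", "cluster"]],
    [("AI", "Deep Learning"), ("Big Data", "Distributed Processing")],
)
assigned_slot = model.predict(["cnn", "network"])
```

Each prediction names the slot of the unassigned curriculum that a content item would fill; collecting these predictions for all crawled items yields the assigned curriculum.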

Meanwhile, when the artificial intelligence classification model classifies the free learning contents, it may also classify whether each content is advertising content. That is, the model may be trained to classify, at the same time, whether the free learning contents contain advertisements and to what extent.

The processor 120 may calculate a satisfaction rating for each of the free learning contents based on a rating calculation model.

In the assigned curriculum, the free learning contents are distributed by subject and by detailed course, and a satisfaction rating for each of the free learning contents may be provided along with them. To this end, a rating calculation model that calculates satisfaction ratings may be developed. The rating calculation model may refer to a model that converts satisfaction with the free learning contents into a rating using a statistical method such as multiple regression analysis.
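
A rating calculation model based on multiple regression could be sketched as below, using ordinary least squares solved through the normal equations. The two input features, the training values, and the 1-to-5 rating scale are invented for illustration; the passage above does not state which explanatory variables or scale the model uses.

```python
def fit_linear_regression(features, targets):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    Returns coefficients [intercept, b1, b2, ...].
    """
    rows = [[1.0] + [float(v) for v in row] for row in features]
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * y for r, y in zip(rows, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("features are collinear; cannot fit")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] for i in range(k)]


def predict_rating(coefficients, feature_row, low=1.0, high=5.0):
    """Apply the fitted model and clamp to an assumed 1-to-5 scale."""
    raw = coefficients[0] + sum(
        c * x for c, x in zip(coefficients[1:], feature_row)
    )
    return max(low, min(high, raw))


# Hypothetical training data: two made-up features per content item and
# a surveyed satisfaction score for each.
train_features = [[0, 0], [1, 0], [0, 2], [1, 2], [2, 1]]
train_scores = [1.0, 3.0, 2.0, 4.0, 5.5]
coefficients = fit_linear_regression(train_features, train_scores)
rating = predict_rating(coefficients, [1, 1])
```

Clamping keeps extrapolated predictions on the displayed scale; a production model would more likely rely on an established statistics library than on a hand-written solver.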

The processor 120 may provide at least a part of the assigned curriculum, together with the satisfaction ratings, to the user terminal in response to a user's query input regarding the service target field.

Once the assigned curriculum has been generated and the satisfaction ratings have been calculated, the user may transmit a field of interest or preferred lecture conditions to the server 100 through the user terminal 300 as a query input, and in response, the processor 120 may provide the user terminal 300 with a result corresponding to the query input.

For example, when the assigned curriculum relates to the Fourth Industrial Revolution technology field and the user's query input is the AI technology field, the processor 120 may filter the assigned curriculum, which is classified by subject, for only the contents classified under the AI subject and provide them to the user together with their satisfaction ratings.
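The filtering in this example can be sketched as follows. This is a minimal illustration only: the `AssignedContent` record, its field names, the sample entries, and the choice to sort results by rating are all assumptions made for the sketch and are not specified in the description.

```python
from dataclasses import dataclass


@dataclass
class AssignedContent:
    """Hypothetical record for one item in the assigned curriculum."""
    title: str
    subject: str   # subject assigned by the classification model
    course: str    # detailed course assigned by the classification model
    rating: float  # satisfaction rating from the rating calculation model


def filter_by_subject(curriculum, subject):
    """Return the contents of one subject, highest satisfaction rating first."""
    matches = [c for c in curriculum if c.subject.lower() == subject.lower()]
    return sorted(matches, key=lambda c: c.rating, reverse=True)


# Made-up sample entries, for illustration only.
curriculum = [
    AssignedContent("Intro to neural networks", "AI", "Beginner", 4.2),
    AssignedContent("Hadoop basics", "Big Data", "Beginner", 3.9),
    AssignedContent("Transformers explained", "AI", "Advanced", 4.7),
]
result = filter_by_subject(curriculum, "AI")
```

Here `result` holds only the two AI items, each still carrying its satisfaction rating so that both can be sent to the user terminal together.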

As described above, the processor 120 may classify the results of data crawling with the artificial intelligence classification model, calculate satisfaction ratings for them, and provide them to the user in the form of a curriculum. The user can therefore receive a curriculum for learning, together with satisfaction ratings, without having to search the free learning platforms 200 personally or check what each of the free learning contents covers, so the convenience and efficiency of learning can be improved.

FIG. 3 is a diagram for explaining a process of providing a free learning curation service according to some embodiments.

Referring to FIG. 3, the process of providing the free learning curation service is shown as processes 310 to 350. Processes 310 to 350 may refer to the processes performed by the processor 120 of the server 100 to provide the free learning curation service.

In process 310, an unassigned curriculum and a keyword dictionary may be created. The server 100 may build the keyword dictionary by selecting keywords for each subject and keywords for each detailed course for the unassigned curriculum, which divides the service target field by subject and by detailed course.

In process 320, data crawling may be performed. The server 100 may collect learning data of the free learning contents for the unassigned curriculum by crawling the free learning platforms 200 based on the keywords for each subject and the keywords for each detailed course.

In process 330, data preprocessing may be performed, and the free learning contents may be assigned using the artificial intelligence classification model. The server 100 may extract data for analysis from the learning data by performing data preprocessing on the learning data. The server 100 may then generate the assigned curriculum by processing the data for analysis with an artificial intelligence classification model trained through machine learning to perform content classification, thereby distributing the free learning contents to the unassigned curriculum.

In process 340, satisfaction ratings may be calculated. The server 100 may calculate a satisfaction rating for each of the free learning contents based on the rating calculation model.

In process 350, the assigned curriculum and the satisfaction ratings may be provided to the user. The server 100 may provide at least a part of the assigned curriculum, together with the satisfaction ratings, to the user terminal in response to a user's query input regarding the service target field.

As shown, the process flow proceeds from process 310 to process 350, and video classification, satisfaction rating, and service operation may be carried out by this basic process. The overall process may also include a reverse loop through which feedback flows in the opposite direction. For example, synonyms may be added to the keyword dictionary as a result of data preprocessing, and user feedback gathered during service operation may be used to generate the rating calculation model or to analyze customer attributes in the artificial intelligence classification model.

FIG. 4 is a diagram for explaining a process of setting an unassigned curriculum and building a keyword dictionary according to some embodiments.

Referring to FIG. 4, an unassigned curriculum 410 and a keyword dictionary 420 are shown, each only in part, to explain in more detail how, in process 310 of FIG. 3, the server 100 builds the keyword dictionary by selecting keywords for each subject and keywords for each detailed course for the unassigned curriculum that divides the service target field by subject and by detailed course.

The unassigned curriculum 410 may divide the service target field by subject and by detailed course. In the illustrated example, the data management and analysis subject in the big data field may be divided into beginner, intermediate, and advanced levels, and each level may be further divided into learning topics corresponding to detailed courses.

The keyword dictionary 420 may include keywords selected for each of the subjects and detailed courses included in the unassigned curriculum 410. The keyword dictionary 420 may further include additional keywords, such as synonyms, for each of the subjects and detailed courses. As described above, synonyms may be added during data preprocessing of the learning data of the free learning contents.
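One possible in-memory layout for such a keyword dictionary is sketched below. The nesting (subject, then detailed course, then keywords and synonyms), the helper names `add_synonym` and `search_terms`, and the sample keywords are assumptions for illustration; the description does not prescribe a data structure.

```python
# Hypothetical layout: subject -> detailed course -> selected keywords and synonyms.
keyword_dictionary = {
    "Data management and analysis": {
        "Beginner": {"keywords": ["data preprocessing", "SQL basics"], "synonyms": []},
        "Intermediate": {"keywords": ["regression analysis"], "synonyms": []},
    },
}


def add_synonym(dictionary, subject, course, synonym):
    """Add a synonym to one detailed course unless it is already listed."""
    entry = dictionary[subject][course]
    if synonym not in entry["keywords"] and synonym not in entry["synonyms"]:
        entry["synonyms"].append(synonym)


def search_terms(dictionary):
    """Yield (subject, course, term) triples that could drive the data crawling."""
    for subject, courses in dictionary.items():
        for course, entry in courses.items():
            for term in entry["keywords"] + entry["synonyms"]:
                yield subject, course, term


add_synonym(keyword_dictionary, "Data management and analysis", "Beginner", "data cleaning")
terms = list(search_terms(keyword_dictionary))
```

Keeping synonyms in a separate list from the originally selected keywords makes it easy to tell which terms were added later by preprocessing feedback.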

FIG. 5 is a diagram for explaining a process of collecting learning data by performing data crawling on free learning platforms according to some embodiments.

Referring to FIG. 5, a crawling process 510 is shown to explain in more detail how, in process 320 of FIG. 3, the server 100 collects learning data of the free learning contents for the unassigned curriculum by crawling the free learning platforms 200 based on the keywords for each subject and the keywords for each detailed course.

In the preparation stage of the crawling process 510, the unassigned curriculum (basic curriculum) may be drawn up using expert knowledge and basic textbooks for the relevant subjects, and the keyword dictionary may be created by selecting keywords for each subject and each detailed course.

Thereafter, data crawling may be performed on the free learning platforms 200 (video lecture data sources), and the learning data of the free learning contents collected from them may be used to build a basic lecture DB. When collecting the learning data, the processor 120 may collect, from the free learning platforms 200, at least one of the title, subject, introduction, subtitles, comments, instructor information, URL, number of views, affiliated channel, number of subscribers, number of likes, content size, and voice data of each of the free learning contents.

In addition, when collecting the learning data, if the learning data includes voice data, the processor 120 may perform speech recognition (STT, Speech To Text) on the voice data to collect text data corresponding to the voice data. That is, not only the formal data of the free learning contents but also the text data corresponding to the voice data can be collected, so the data crawling can capture what the free learning contents actually contain.
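The collected fields and the conditional speech recognition step can be sketched as a simple record type. The `LearningData` class, its field names, and the `fake_stt` stand-in are hypothetical; a real system would call an actual STT engine in place of the placeholder, and only a subset of the listed fields is shown.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class LearningData:
    """Hypothetical record for the learning data of one free learning content."""
    title: str
    url: str
    description: str = ""
    subtitles: str = ""
    comments: list = field(default_factory=list)
    views: int = 0
    subscribers: int = 0
    likes: int = 0
    audio: Optional[bytes] = None  # raw voice data, if any was collected
    transcript: str = ""           # text produced by speech recognition


def attach_transcript(item: LearningData, speech_to_text: Callable[[bytes], str]) -> LearningData:
    """Run speech recognition only when the learning data includes voice data."""
    if item.audio is not None:
        item.transcript = speech_to_text(item.audio)
    return item


def text_for_analysis(item: LearningData) -> str:
    """Join every non-empty text field into one string for preprocessing."""
    parts = [item.title, item.description, item.subtitles, *item.comments, item.transcript]
    return " ".join(p for p in parts if p)


def fake_stt(audio: bytes) -> str:
    """Placeholder for a real STT engine; returns fixed text for the demo."""
    return "vectors and data frames"


item = attach_transcript(
    LearningData(title="Intro to R", url="https://example.com/v/1", audio=b"\x00\x01"),
    fake_stt,
)
```

Passing the STT function in as a parameter keeps the record type independent of whichever speech recognition engine is used.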

FIG. 6 is a diagram for explaining a process of collecting text data corresponding to voice data by performing speech recognition on the voice data according to some embodiments.

Referring to FIG. 6, R-medus 610, which corresponds to prior research on speech recognition, is shown. R-medus 610 is a mobile application for practicing R in which coding is performed by voice; it was developed as an industry-academia-research project and can convert speech into text.

The speech recognition technology of R-medus 610 may also be applied to the free learning curation service. When the processor 120 performs speech recognition (STT) on the voice data to collect the corresponding text data, the speech recognition may be performed using R-medus 610.

FIG. 7 is a diagram for explaining a process of extracting data for analysis and generating an assigned curriculum according to some embodiments.

Referring to FIG. 7, the following are shown: a data preprocessing process 710 for the free learning contents C1, C2, and C3 collected by crawling the free learning platforms 200; an artificial intelligence classification model 720 trained through machine learning to perform content classification; and an assigned curriculum 730 generated by assigning the free learning contents C1, C2, and C3.

The free learning contents C1, C2, and C3 may be collected as a result of data crawling on the free learning platforms 200. Three contents are shown in the illustrated example, but this is only an example, and any number of contents may be collected.

Each of the free learning contents C1, C2, and C3 may have learning data. For example, as described above, the video title, subtitles, description, comments, and text corresponding to the audio of the free learning content C1 may constitute the learning data 711, and the remaining free learning contents C2 and C3 may likewise each have their own learning data.

In the data preprocessing process 710, data preprocessing may be performed on the learning data to extract data for analysis. For example, the data for analysis 712 may be extracted by preprocessing the learning data 711 of the free learning content C1. The data for analysis 712 may be, for example, a word embedding obtained by converting the learning data 711, which is unstructured data, into vector form.

The data preprocessing process 710 may refer to a process of converting the learning data 711, which is text, into the data for analysis 712, which is expressed numerically, for example as a vector. To this end, when extracting the data for analysis 712, the processor 120 may perform morphological analysis and text mining based on natural language processing (NLP) on the learning data 711 to extract word data from the learning data 711, and may perform at least one of calculating TF-IDF (Term Frequency-Inverse Document Frequency) for the word data and applying the word2vec algorithm to the word data.

When NLP-based morphological analysis and text mining are performed on the learning data 711, the morphemes or words constituting the text of the learning data 711 may be extracted as word data. For such word data, a statistical value indicating the importance of each word, such as TF-IDF, may be calculated, or the word2vec algorithm may be applied so that the word data is expressed numerically in vector form.
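As an illustration of the TF-IDF option, the sketch below computes term weights in plain Python. It makes several simplifying assumptions: whitespace splitting stands in for morphological analysis, term frequency is the relative count within a document, and inverse document frequency is the unsmoothed logarithm of the total document count divided by the number of documents containing the term. The description does not fix any of these choices, and the sample documents are invented.

```python
import math
from collections import Counter


def tokenize(text):
    """Stand-in for morphological analysis: lowercase whitespace tokens."""
    return text.lower().split()


def tf_idf(documents):
    """Return one {term: weight} dict per document."""
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


docs = [
    "python data analysis tutorial",
    "python machine learning tutorial",
    "hadoop cluster setup",
]
weights = tf_idf(docs)
```

In this toy corpus, "data" appears in only one document and so receives a higher weight in the first document than "python", which appears in two.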

Meanwhile, when extracting the data for analysis 712, the processor 120 may add synonyms to the keyword dictionary based on the similarity of the learning data 711 to the keywords for each subject and the keywords for each detailed course. Once the numerically expressed data for analysis 712 has been extracted, similarity can be determined from those numerical values.

Accordingly, for a specific keyword of the keyword dictionary that was used to collect the free learning content C1, if the data preprocessing process 710 finds other keywords determined to be similar to that specific keyword, those similar keywords may be added to the keyword dictionary as synonyms, as in the example of the keyword dictionary 420 of FIG. 4.
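One way to judge similarity from numerical values is cosine similarity between word vectors, sketched below. The use of cosine similarity, the threshold of 0.8, the function names, and the toy three-dimensional vectors are all assumptions for illustration; the description only states that similarity is determined from the numerical representation.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def find_synonyms(keyword, vectors, threshold=0.8):
    """Return words whose vectors are at least `threshold` similar to the keyword's."""
    target = vectors[keyword]
    return sorted(
        word for word, vec in vectors.items()
        if word != keyword and cosine_similarity(target, vec) >= threshold
    )


# Toy vectors standing in for word2vec output; the values are invented.
vectors = {
    "regression": [0.9, 0.1, 0.0],
    "linear-model": [0.8, 0.2, 0.1],
    "cooking": [0.0, 0.1, 0.9],
}
synonyms = find_synonyms("regression", vectors)
```

Words returned by `find_synonyms` would be the candidates to append to the keyword dictionary entry for the original keyword.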

Once the data preprocessing process 710 has been performed on the learning data 711, that is, once TF-IDF calculation or the word2vec algorithm has been applied to the learning data 711 and the data for analysis 712 has been extracted, the artificial intelligence classification model 720 may operate on the result. The artificial intelligence classification model 720 may perform content classification using the data for analysis as input.

When generating the assigned curriculum 730, the processor 120 may process the data for analysis using the artificial intelligence classification model 720 trained by at least one of a CNN (Convolutional Neural Network), an SVM (Support Vector Machine), and Bayesian classification. That is, the artificial intelligence classification model 720 may be formed by training with a machine learning method such as a CNN, an SVM, or Bayesian classification.
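Of the three options, Bayesian classification is the simplest to sketch. The example below is a small multinomial naive Bayes classifier with add-one (Laplace) smoothing that assigns a subject label to a tokenized content. The class name, the training tokens, and the subject labels are invented for illustration, and naive Bayes is only one possible reading of "Bayesian classification".

```python
import math
from collections import Counter, defaultdict


class NaiveBayesClassifier:
    """Multinomial naive Bayes with add-one smoothing over token lists."""

    def __init__(self):
        self.class_counts = Counter()            # documents seen per label
        self.word_counts = defaultdict(Counter)  # token counts per label
        self.vocabulary = set()

    def fit(self, documents, labels):
        for tokens, label in zip(documents, labels):
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented training examples: tokenized contents with their subject labels.
train_docs = [
    ["neural", "network", "training"],
    ["deep", "learning", "network"],
    ["hadoop", "cluster", "storage"],
    ["spark", "data", "pipeline"],
]
train_labels = ["AI", "AI", "Big Data", "Big Data"]

model = NaiveBayesClassifier().fit(train_docs, train_labels)
predicted = model.predict(["neural", "network", "basics"])
```

The same pattern could be trained with detailed-course labels instead of subject labels to place each content at the finer level of the curriculum.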

Meanwhile, when generating the assigned curriculum 730, the processor 120 may use the artificial intelligence classification model 720 to classify whether each of the free learning contents is advertising content. That is, while being trained to perform content classification, the artificial intelligence classification model 720 may additionally be trained to classify whether content is advertising. For example, the artificial intelligence classification model 720 may be trained to classify whether each of the free learning contents includes an advertisement, or to classify the degree of advertising numerically.

With the artificial intelligence classification model 720, the free learning contents C1, C2, and C3 may be distributed to the unassigned curriculum, which is divided by subject and by detailed course, to generate the assigned curriculum 730. That is, the artificial intelligence classification model 720 may classify which subject and which detailed course of the unassigned curriculum each of the free learning contents C1, C2, and C3 corresponds to.

FIG. 8 is a diagram for explaining a process of generating a rating calculation model to calculate satisfaction ratings according to some embodiments.

Referring to FIG. 8, rating calculation model generation 810 is shown to explain the process of generating the rating calculation model used to calculate satisfaction ratings. According to the rating calculation model generation 810, the rating calculation model may be generated through a series of steps and, once generated, may calculate a satisfaction rating for each of the free learning contents.

The generation process according to the rating calculation model generation 810 may include a step of setting up a data set (data variable creation) and a step of deriving the rating calculation model by performing multiple regression analysis (model development). To this end, when calculating the satisfaction rating, the processor 120 may set up a data set including at least one of the number of views, the number of subscribers, the update cycle, and the expert rating of the free learning contents, and may derive the rating calculation model by performing multiple regression analysis on the data set.
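A multiple regression of this kind can be sketched with ordinary least squares solved through the normal equations. Everything specific here is an assumption for illustration: the two predictors (views in thousands and expert rating), the target variable (an observed satisfaction score, whose source the description does not specify), the function names, and the synthetic numbers, which were constructed to follow an exact linear rule and are not real measurements.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(n):
            if row != col:
                factor = m[row][col] / m[col][col]
                for k in range(col, n + 1):
                    m[row][k] -= factor * m[col][k]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_multiple_regression(features, targets):
    """Ordinary least squares; returns [intercept, coef_1, ..., coef_p]."""
    x = [[1.0] + list(row) for row in features]  # prepend intercept column
    p = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(x, targets)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict_rating(coefficients, row):
    """Apply the fitted model to one content's predictor values."""
    return coefficients[0] + sum(c * v for c, v in zip(coefficients[1:], row))


# Synthetic data built from: rating = 1.0 + 0.01 * views_k + 0.5 * expert_rating.
features = [[10, 3.0], [50, 4.0], [20, 2.0], [80, 5.0], [40, 3.5]]
targets = [2.6, 3.5, 2.2, 4.3, 3.15]

coefficients = fit_multiple_regression(features, targets)
new_rating = predict_rating(coefficients, [30, 4.0])
```

Because the synthetic targets follow the generating rule exactly, the fit recovers its coefficients up to rounding error; real data would leave residuals that the later evaluation and validation steps examine.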

In addition, as shown, the generation process according to the rating calculation model generation 810 may further include: a step of identifying the characteristics of the business for model development (business understanding); steps of understanding and preprocessing the data before the data set is created (data understanding, data preprocessing); a step of dividing the data set into training, test, and validation data (data splitting); a step of performing exploratory data analysis (EDA) and statistical analysis to exclude unsuitable data (data variable selection); a step of statistically evaluating the rating calculation model derived by multiple regression analysis (model evaluation); a step of verifying the rating calculation model by applying the test data (model validation); and a step of managing and monitoring the model once evaluation and validation are complete (end of model development).
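The data splitting step can be sketched as a seeded shuffle followed by slicing. The 60/20/20 proportions, the fixed seed, and the function name `split_dataset` are assumptions; the description does not state split ratios.

```python
import random


def split_dataset(rows, train=0.6, test=0.2, seed=0):
    """Shuffle reproducibly, then slice into training, test, and validation parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_test = int(len(shuffled) * test)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_test],
        shuffled[n_train + n_test:],  # whatever remains is used for validation
    )


rows = list(range(10))  # stand-in for ten data set records
train_rows, test_rows, validation_rows = split_dataset(rows)
```

Using a private `random.Random(seed)` instance keeps the split reproducible without changing the global random state.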

Once the rating calculation model has been generated through the generation process according to the rating calculation model generation 810, satisfaction ratings for the free learning contents assigned to the assigned curriculum may be calculated through the rating calculation model. Because satisfaction with the free learning contents can be quantified statistically, the satisfaction rating can serve as a more objective reference when the user plans which free learning contents to study.

FIG. 9 is a diagram for explaining a process of providing the assigned curriculum and satisfaction ratings to a user terminal according to some embodiments.

Referring to FIG. 9, as an example of the free learning curation service provided to the user terminal 300, a first UI 910 that displays at least a part of the assigned curriculum and a second UI 920 that displays satisfaction ratings for the detailed courses included in that part of the assigned curriculum are shown.

The user may receive information such as the first UI 910 and the second UI 920 from the server 100 through the user terminal 300. For example, the user may transmit a query input such as a technical field, subject, detailed course, or difficulty level to the server 100 through the user terminal 300, and the server 100 may transmit at least a part of the assigned curriculum corresponding to the query input, together with the satisfaction ratings, to the user terminal 300 in a form such as the first UI 910 and the second UI 920.

Meanwhile, when providing the assigned curriculum and the satisfaction ratings to the user terminal 300, the processor 120 may provide a user-customized management service based on log data including the user's viewing history, course history, progress rate, and attendance rate, and on user preference data including the user's content preferences.

The log data and user preference data accumulated while the user uses the free learning curation service reveal the user's usage tendencies. Accordingly, when the artificial intelligence classification model and the rating calculation model are updated using the tendencies indicated by the log data and the user preference data, a user-customized management service that supports a more systematic service tailored to the user can be provided.

FIG. 10 is a flowchart illustrating the steps constituting a method of providing a free learning curation service according to some embodiments.

Referring to FIG. 10, the method of providing a free learning curation service may include steps 1010 to 1060. However, the method is not limited thereto, and general-purpose steps other than those shown in FIG. 10 may be further included in the method.

The method of FIG. 10 may consist of steps processed in time series by the server 100 described with reference to FIGS. 1 to 9. Therefore, even where details of the method of FIG. 10 are omitted below, the description given above for the server 100 of FIGS. 1 to 9 applies equally to the method of FIG. 10.

In step 1010, the server 100 may build a keyword dictionary by selecting keywords for each subject and keywords for each detailed course for the unassigned curriculum, which divides the service target field by subject and by detailed course.

In step 1020, the server 100 may collect learning data of the free learning contents for the unassigned curriculum by crawling the free learning platforms 200 based on the keywords for each subject and the keywords for each detailed course.

In collecting the learning data, the server 100 may collect, from the free learning platforms 200, at least one of the title, subject, introduction, subtitles, comments, instructor information, URL, number of views, affiliated channel, number of subscribers, number of likes, content size, and voice data of each of the free learning contents.

In collecting the learning data, if the learning data includes voice data, the server 100 may perform speech recognition (STT, Speech To Text) on the voice data to collect text data corresponding to the voice data.

In step 1030, the server 100 may extract data for analysis from the learning data by performing data preprocessing on the learning data.

In extracting the data for analysis, the server 100 may perform morphological analysis and text mining based on natural language processing (NLP) on the learning data to extract word data from the learning data, and may perform at least one of calculating TF-IDF (Term Frequency-Inverse Document Frequency) for the word data and applying the word2vec algorithm to the word data.

In extracting the data for analysis, the server 100 may add synonyms to the keyword dictionary based on the similarity of the learning data to the keywords for each subject and the keywords for each detailed course.

In step 1040, the server 100 may generate the assigned curriculum by processing the data for analysis with an artificial intelligence classification model trained through machine learning to perform content classification, thereby distributing the free learning contents to the unassigned curriculum.

In generating the assigned curriculum, the server 100 may use the artificial intelligence classification model to classify whether each of the free learning contents is advertising content.

In generating the assigned curriculum, the server 100 may process the data for analysis using an artificial intelligence classification model trained by at least one of a CNN (Convolutional Neural Network), an SVM (Support Vector Machine), and Bayesian classification.

In step 1050, the server 100 may calculate a satisfaction rating for each of the free learning contents based on the rating calculation model.

In calculating the satisfaction rating, the server 100 may set up a data set including at least one of the number of views, the number of subscribers, the update cycle, and the expert rating of the free learning contents, and may derive the rating calculation model by performing multiple regression analysis on the data set.

In step 1060, the server 100 may provide at least a part of the assigned curriculum, together with the satisfaction ratings, to the user terminal 300 in response to a user's query input regarding the service target field.

In providing this to the user terminal 300, the server 100 may provide a user-customized management service based on log data, which includes the user's viewing history, course history, progress rate, and attendance rate, and on user preference data, which includes the user's content preferences.

Meanwhile, the service target field may be a Fourth Industrial Revolution technology field including at least one of big data, artificial intelligence, machine learning, and software.

The method of providing the free learning curation service of FIG. 10 may be recorded on a computer-readable recording medium on which at least one program or piece of software including instructions for executing the method is recorded.

Examples of the computer-readable recording medium may include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions may include not only machine code such as that produced by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like.

Although the embodiments have been described in detail above, the scope of rights of the present invention is not limited thereto. Various modifications and improvements made by those skilled in the art using the basic concept of the present invention as defined in the following claims also fall within the scope of rights of the present invention.

Claims

A method for providing a free learning compass service to a user terminal by a server including a memory and a processor, the method comprising:
building a keyword dictionary by selecting keywords for each subject and keywords for each detailed course for an unassigned curriculum in which a service target field is classified by subject and by detailed course;
collecting learning data of free learning contents for the unassigned curriculum by performing data crawling on free learning platforms based on the keywords for each subject and the keywords for each detailed course;
extracting data for analysis from the learning data by performing data preprocessing on the learning data;
generating an assigned curriculum by distributing the free learning contents to the unassigned curriculum, by processing the data for analysis using an artificial intelligence classification model trained through machine learning to perform content classification;
calculating a satisfaction rating for each of the free learning contents based on a rating calculation model, wherein the rating calculation model is generated through:
a data preprocessing step of constructing a data set by receiving internal and external data related to ratings, recognizing missing values based on actual and average values, preprocessing the missing values, and removing outliers and erroneous input noise;
a data variable generation step of generating variables that affect lecture satisfaction, wherein the variables include an update cycle, a number of views, a number of subscribers, and a number of comments;
a data division step of dividing the data into data for construction, verification, and performance verification;
a data variable selection step of excluding inappropriate variables from the data and selecting variables to be used in the analysis, wherein trends for each variable are checked by exploratory data analysis and the variables are selected through statistical analysis;
a step of developing the scoring model by logistic regression analysis;
a step of evaluating the scoring model through KS statistics, ROC curves, and AR accuracy; and
a step of applying test data and verifying the rating calculation model through gap analysis; and
providing, in response to a user's query input regarding the service target field, at least a part of the assigned curriculum together with the satisfaction rating to the user terminal,
wherein the generating of the assigned curriculum comprises generating the assigned curriculum from the artificial intelligence classification model based on a user's feedback on at least a part of the assigned curriculum provided to the user terminal, and classifying, using the artificial intelligence classification model, whether each of the free learning contents is advertising content, and
wherein the method further comprises providing to the user terminal each of first UI information, which includes a study map filtering function for selecting a technical field, subject, detailed course, difficulty level, age group, and format, and second UI information, which displays a detailed course of the assigned curriculum and the satisfaction rating.
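
For illustration, the KS statistic named among the evaluation measures above can be computed as the largest gap between the cumulative score distributions of two classes. The sketch below assumes a binary satisfied (1) / not satisfied (0) label and model scores between 0 and 1; the function name and the sample values are hypothetical.

```python
def ks_statistic(scores, labels):
    """Maximum gap between the cumulative score distributions of the two classes."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    best_gap = 0.0
    # Evaluate both empirical CDFs at every observed score.
    for threshold in sorted(set(scores)):
        cdf_pos = sum(1 for s in positives if s <= threshold) / len(positives)
        cdf_neg = sum(1 for s in negatives if s <= threshold) / len(negatives)
        best_gap = max(best_gap, abs(cdf_pos - cdf_neg))
    return best_gap


# Hypothetical scores from a scoring model and the corresponding true labels.
SCORES = [0.1, 0.2, 0.8, 0.9]
LABELS = [0, 0, 1, 1]
print(ks_statistic(SCORES, LABELS))
```

A value near 1 indicates that the scores separate the two classes well, and a value near 0 indicates little separation.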

The method of claim 1,
wherein the extracting of the data for analysis by performing data preprocessing comprises: extracting word data from the learning data by performing natural language processing (NLP)-based morpheme analysis and text mining on the learning data; and
performing at least one of calculating a term frequency-inverse document frequency (TF-IDF) for the word data and applying a word2vec algorithm.
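
As a hedged illustration of the TF-IDF calculation, the sketch below uses one common weighting variant: term frequency as count divided by document length, and inverse document frequency as log(N / df). The function name and the sample tokens are hypothetical, and the input is assumed to be already tokenized by the morpheme analysis.

```python
import math
from collections import Counter


def compute_tf_idf(documents):
    """Return one {word: tf-idf weight} dict per tokenized document."""
    n_docs = len(documents)
    # Document frequency: in how many documents each word appears.
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))
    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        weights.append({
            word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return weights


# Hypothetical word data extracted from two lectures.
DOCUMENTS = [["python", "lecture", "python"], ["lecture", "statistics"]]
tf_idf = compute_tf_idf(DOCUMENTS)
print(tf_idf[0])
```

A word that appears in every document, such as "lecture" here, receives a weight of zero, while words specific to one lecture receive positive weights.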

The method of claim 1,
wherein the extracting of the data for analysis further comprises adding synonyms to the keyword dictionary based on similarities between the learning data and the keywords for each subject and the keywords for each detailed course.

The method of claim 1,
wherein the collecting of the learning data of the free learning contents by performing the data crawling comprises:
a preliminary step of creating the unassigned curriculum from the courses and detailed courses included in a basic textbook related to the learning subject, and selecting search keywords according to the frequency of derivation for each of the courses and detailed courses;
extracting audio data from the entire video lecture data source, which includes both video and audio, rather than from metadata or meta information included in the video, and performing speech recognition on the audio data to extract text data corresponding to the audio data;
crawling subtitle data included in the video lectures separately from the audio data; and
storing text data including both the extracted text data and the subtitle data in a basic lecture database.

A computer-readable storage medium storing instructions for performing the method according to any one of claims 1 to 4.