KR102190304B1

KR102190304B1 - Preferred content recommending apparatus based on GPU computing using deep learning and method therefor

Info

Publication number: KR102190304B1
Application number: KR1020180152508A
Authority: KR
Inventors: 강명주; 곽지훈; 정승원; 김건우; 서현; 노형민
Original assignee: 서울대학교 산학협력단
Priority date: 2018-11-30
Filing date: 2018-11-30
Publication date: 2020-12-14
Also published as: KR20200072589A

Abstract

The present invention relates to a technology that provides content users with per-user content evaluation scores by inputting a content user propensity vector, which vectorizes the propensity of a content user, and a content propensity vector, which vectorizes the propensity of content, into a content recommendation model and a loss function and performing the computation using only a GPU. The apparatus may include a vector conversion unit, a GPU computation unit, and a content evaluation score prediction unit. Compared with recommendation models that perform computation using both a CPU and a GPU, it can improve computation speed through efficient distribution of computing resources and prevention of bottlenecks. In addition, by collecting dwell time information for the content a content user has used and calculating a customized score that converts the dwell time into a relative score for each content user, it can provide more accurate individual content preferences.

Description

Preferred content recommending apparatus based on GPU computing using deep learning and method therefor

The present invention relates to a technology for providing content that matches the preferences of content users using deep learning. More specifically, an object of the invention is to provide an apparatus and method for recommending preferred content using deep learning based on GPU computation, which provide content users with per-user content evaluation scores by inputting a content user propensity vector, which vectorizes the propensity of a content user, and a content propensity vector, which vectorizes the propensity of content, into a content recommendation model and a loss function and performing the computation using only a GPU.

As wired and wireless communication speeds and computing resources have advanced rapidly, an enormous amount of content now exists online, enough to be called a flood.

Therefore, there is a great need to select and provide the content that a specific user prefers from among this enormous amount of content, and various content recommendation models exist accordingly.

Conventional content recommendation systems have performed the computation of a deep learning model using both a CPU and a GPU in order to recommend to users content that is highly similar to analyzed content.

Such conventional content recommendation models suffer from bottlenecks that arise because the CPU and the GPU each perform part of the computation, and from inefficient distribution of computing resources.

In addition, when per-user content preference is calculated from the time a content user dwells on content, the relative time spent on content differs from user to user, so calculating preference from absolute time leads to errors in determining preference.

The present invention relates to an apparatus and method for recommending preferred content using deep learning based on GPU computation. By using a content recommendation model that performs computation using only a GPU to provide content users with per-user content evaluation scores, the invention can distribute computing resources efficiently and prevent bottlenecks, compared with recommendation models that perform computation using both a CPU and a GPU.

According to an embodiment of the present invention, a content recommendation apparatus using deep learning based on GPU computation may include: a vector conversion unit that uses a deep learning model consisting of a neural network operated by a processor to calculate a content user propensity vector u, which vectorizes the propensity of a content user, and a content propensity vector v, which vectorizes the propensity of content; a GPU computation unit that performs computation by inputting the calculated u and v into a preset loss function, in order to implement a complete deep learning structure that can be computed using only a GPU; and a content evaluation score prediction unit that predicts, based on the computation result, a content evaluation score for specific content for each content user.

According to an embodiment of the present invention, in performing the computation, the GPU computation unit may input the value of at least one content user propensity vector u into a weight matrix generated using multiple dense layers.

According to an embodiment of the present invention, when the content is primarily text, a dictionary may be built and a word-to-vector model may be trained in order to vectorize the text included in the content.

According to an embodiment of the present invention, the text included in the content may be crawled and registered in the dictionary sentence by sentence; the registered sentences may be parsed so that pre-stored exception words are deleted and only pre-stored parts of speech are extracted; and a word-to-vector model may be generated from the sentences stored in this way.

According to an embodiment of the present invention, the GPU computation unit may use a CNN model for document processing when the content is primarily text and a CNN model for video or music processing when the content is primarily video or music, swapping in the CNN model that corresponds to each type of content.

According to an embodiment of the present invention, the apparatus may further include: a content preference providing unit that collects dwell time information for each content item used by each content user, converts the collected dwell time into a relative score to determine the personal preference for that content, and provides a content preference for each content user; and a content recommendation unit that predicts content evaluation scores for content having a high content preference and recommends content for which a high evaluation score is predicted.

According to an embodiment of the present invention, the content preference providing unit may further include: a dwell time information collection unit that collects dwell time information for each content item used by a content user; a normal distribution function matching unit that associates a normal distribution function with each collected item of dwell time information; a dwell time distribution estimation unit that estimates the dwell time distribution of the content user by linearly combining the normal distribution functions associated with the plurality of items of dwell time information; a customized score calculation unit that calculates a cumulative distribution function by accumulating the estimated dwell time distribution and uses it to calculate a customized score that converts a dwell time into a relative score for each content user; and a preference determination unit that generates, based on the calculated customized scores, a preference graph over a plurality of content items for each content user and determines the personal preference for the corresponding content.
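The dwell time scoring chain described above (one normal distribution per observed dwell time, a linear combination of those distributions, then a cumulative distribution function used as a relative score) can be sketched as follows. This is a minimal illustration, not the patented implementation: the equal mixture weights, the fixed standard deviation `std`, and the two example users are assumptions introduced here.

```python
import math


def normal_cdf(x, mean, std):
    """CDF of a normal distribution N(mean, std^2) evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def customized_score(dwell_time, user_dwell_times, std=10.0):
    """Relative score in [0, 1] for one dwell time of one user.

    Each observed dwell time gets its own normal distribution centred on it.
    The distributions are combined with equal weights (an assumption), and
    the CDF of that combination is evaluated at `dwell_time`.
    """
    if not user_dwell_times:
        raise ValueError("need at least one observed dwell time")
    total = sum(normal_cdf(dwell_time, t, std) for t in user_dwell_times)
    return total / len(user_dwell_times)


# Two hypothetical users: a fast reader and a slow reader (seconds per item).
fast_reader = [10.0, 15.0, 20.0, 25.0, 60.0]
slow_reader = [60.0, 90.0, 120.0, 150.0, 300.0]

# The same absolute dwell time (60 s) maps to different relative scores.
score_fast = customized_score(60.0, fast_reader)
score_slow = customized_score(60.0, slow_reader)
```

Under these assumptions a 60-second stay ranks near the top of the fast reader's own history and near the bottom of the slow reader's, which is the per-user normalization the text motivates.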

According to an embodiment of the present invention, the customized score calculation unit may calculate the customized score by applying a quantile transformation to the calculated cumulative distribution function.

According to an embodiment of the present invention, the preference determination unit may determine that the higher the calculated customized score, the more the content user prefers the content.

According to an embodiment of the present invention, in the dwell time distribution estimation unit, the estimated dwell time distribution may follow the actual distribution more closely as the number of content items used by the content user increases.

According to an embodiment of the present invention, a content recommendation method using deep learning based on GPU computation may include: calculating, using a deep learning model consisting of a neural network operated by a processor, a content user propensity vector u, which vectorizes the propensity of a content user, and a content propensity vector v, which vectorizes the propensity of content; performing computation by inputting the calculated u and v into a preset loss function, in order to implement a complete deep learning structure that can be computed using only a GPU; and predicting, based on the computation result, a content evaluation score for specific content for each content user.

According to an embodiment of the present invention, in the step of calculating the content propensity vector v, the value of at least one content user propensity vector u may be input, in performing the computation, into a weight matrix generated using multiple dense layers.

According to an embodiment of the present invention, the step of calculating the content propensity vector v may use a CNN model for document processing when the content is primarily text and a CNN model for video or music processing when the content is primarily video or music, swapping in the CNN model that corresponds to each type of content.

According to an embodiment of the present invention, the method may further include: providing a content preference for each content user by collecting dwell time information for each content item used by each content user and converting the collected dwell time into a relative score to determine the personal preference for that content; and predicting content evaluation scores for content having a high content preference and recommending content for which a high evaluation score is predicted.

According to an embodiment of the present invention, the step of providing the content preference may further include: collecting dwell time information for each content item used by a content user; associating a normal distribution function with each collected item of dwell time information; estimating the dwell time distribution of the content user by linearly combining the normal distribution functions associated with the plurality of items of dwell time information; calculating a cumulative distribution function by accumulating the estimated dwell time distribution and using it to calculate a customized score that converts a dwell time into a relative score for each content user; and generating, based on the calculated customized scores, a preference graph over a plurality of content items for each content user and determining the personal preference for the corresponding content.

According to an embodiment of the present invention, the step of calculating the customized score may calculate the customized score by applying a quantile transformation to the calculated cumulative distribution function.

According to an embodiment of the present invention, the step of determining the personal preference may determine that the higher the calculated customized score, the more the content user prefers the content.

According to an embodiment of the present invention, in the step of estimating the dwell time distribution, the estimated dwell time distribution may follow the actual distribution more closely as the number of content items used by the content user increases.

According to the present invention, computation speed can be improved through efficient distribution of computing resources and prevention of bottlenecks, compared with recommendation models that perform computation using both a CPU and a GPU. In addition, by collecting dwell time information for the content a content user has used and calculating a customized score that converts the dwell time into a relative score for each content user, more accurate individual content preferences can be provided.

FIG. 1 is a diagram showing a conventional recommendation model using deep learning that performs CPU and GPU computation together.
FIG. 2 is a block diagram of a preferred content recommendation apparatus using deep learning based on GPU computation according to an embodiment of the present invention.
FIG. 3 is a detailed block diagram of the content preference providing unit shown in FIG. 2.
FIG. 4 is a diagram showing the computation process of a GPU-computation-based preferred content recommendation apparatus according to an embodiment of the present invention.
FIG. 5 is a diagram showing the data flow of a GPU-computation-based preferred content recommendation apparatus in an embodiment that uses a model for analyzing content and recommending preferred content.
FIG. 6 is a flowchart of a method for recommending preferred content using deep learning based on GPU computation according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the invention pertains can readily practice it. However, the present invention may be implemented in various different forms and is not limited to the embodiments described herein.

In the drawings, parts unrelated to the description are omitted in order to describe the present invention clearly, and similar reference numerals are assigned to similar parts throughout the specification.

Throughout the specification, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise.

Hereinafter, an apparatus and method for recommending content using deep learning based on GPU computation according to embodiments of the present invention are described with reference to the drawings.

FIG. 1 is a diagram showing a conventional recommendation model using deep learning that performs CPU and GPU computation together.

Referring to FIG. 1, among conventional deep-learning-based recommendation models, a recommendation model using matrix factorization, which predicts a content user's evaluation score for specific content by constructing feature vectors for users and content from existing evaluation information, can be divided into a matrix factorization part that uses CPU computing resources and a CNN part that uses GPU computing resources.

The matrix factorization part can be computed as the inner product of the content user propensity vector u and the content propensity vector v, which can be expressed as Equation 1 below.

[Equation 1]

In addition, because vectorizing every content user and every content item directly is difficult in practice, the problem is usually approached probabilistically, and random initialization as shown in Equation 2 can be used.

[Equation 2]
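The bodies of Equations 1 and 2 are not reproduced in this text, so the following is only a sketch of what the surrounding description states: latent vectors are initialized at random, and a predicted evaluation score is the inner product of u and v. The zero-mean Gaussian initialization, the standard deviation, the vector dimension, and all function names are assumptions chosen for illustration.

```python
import random


def random_vector(dim, std, rng):
    """Draw a latent vector with independent zero-mean Gaussian entries
    (the Gaussian form is an assumption, not taken from Equation 2)."""
    return [rng.gauss(0.0, std) for _ in range(dim)]


def predicted_score(u, v):
    """Predicted evaluation score as the inner product of u and v."""
    if len(u) != len(v):
        raise ValueError("u and v must have the same dimension")
    return sum(a * b for a, b in zip(u, v))


rng = random.Random(0)          # fixed seed so the sketch is reproducible
u = random_vector(8, 0.1, rng)  # one content user propensity vector
v = random_vector(8, 0.1, rng)  # one content propensity vector
score = predicted_score(u, v)
```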

As shown in FIG. 1, the matrix factorization part is computed by the CPU, while the content propensity vector part may be computed on the GPU based on CNN operations.

In the CNN-based content propensity vector generation part shown in FIG. 1, X is the raw information of the content, which can be processed into the content propensity vector V through the CNN, and W denotes the weights in each layer of the CNN.

The loss function to be optimized in a recommendation model such as that of FIG. 1 can be expressed as Equation 3 below.

[Equation 3]

FIG. 2 is a block diagram of a preferred content recommendation apparatus 1000 using deep learning based on GPU computation according to an embodiment of the present invention.

Referring to FIG. 2, the preferred content recommendation apparatus 1000 using deep learning based on GPU computation according to the first embodiment of the present invention may include a vector conversion unit 100, a GPU computation unit 200, and a content evaluation score prediction unit 300.

According to an embodiment of the present invention, the preferred content recommendation apparatus 1000 using deep learning based on GPU computation may use a deep learning model consisting of a neural network operated by a processor.

The vector conversion unit 100 may calculate a content user propensity vector u, which vectorizes the propensity of a content user, and a content propensity vector v, which vectorizes the propensity of content.

The content user propensity vector u and the content propensity vector v may be calculated as described with reference to FIG. 1.

The GPU computation unit 200 may perform computation by inputting the calculated u and v into a preset loss function, in order to implement a complete deep learning structure that can be computed using only a GPU.

According to an embodiment of the present invention, the preset loss function may be expressed as Equation 4 below.

[Equation 4]

Here, the loss term in Equation 4 denotes a regression loss, for which the absolute error, the squared error, the Huber loss, or the like can be used.
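The three regression losses named above can be written as follows. This is a minimal sketch of the standard definitions only. How the loss enters Equation 4 is not shown in this text, and the function names, the `delta` threshold, and the averaging helper are assumptions introduced here.

```python
def absolute_error(pred, target):
    """Absolute error |pred - target|."""
    return abs(pred - target)


def squared_error(pred, target):
    """Squared error (pred - target)^2."""
    return (pred - target) ** 2


def huber_loss(pred, target, delta=1.0):
    """Huber loss: quadratic for small residuals, linear for large ones."""
    residual = abs(pred - target)
    if residual <= delta:
        return 0.5 * residual * residual
    return delta * (residual - 0.5 * delta)


def mean_loss(loss_fn, preds, targets):
    """Average a per-sample loss over paired predictions and targets."""
    if len(preds) != len(targets) or not preds:
        raise ValueError("preds and targets must be non-empty and equal in length")
    return sum(loss_fn(p, t) for p, t in zip(preds, targets)) / len(preds)
```

The Huber loss behaves like the squared error near zero and like the absolute error for large residuals, which makes it less sensitive to outlying evaluation scores.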

A loss function such as Equation 4 requires no separate CPU computation and can be computed by GPU computation alone, so it makes the distribution of computing resources between the CPU and the GPU more efficient and prevents problems such as the delay in total computation time caused by bottlenecks.

According to an embodiment of the present invention, in performing the computation, the value of at least one content user propensity vector u may be input into a weight matrix generated using multiple dense layers.

This is described in more detail with reference to FIG. 4.

According to an embodiment of the present invention, when the content is primarily text, a dictionary may be built and a word-to-vector model may be trained in order to vectorize the text included in the content.

According to an embodiment of the present invention, the text included in the content may be crawled and registered in the dictionary sentence by sentence; the registered sentences may be parsed so that pre-stored exception words are deleted and only pre-stored parts of speech are extracted; and a word-to-vector model may be generated from the sentences stored in this way.

According to an embodiment of the present invention, in order to generate the word-to-vector model, the text included in the content may be parsed with Mecab and the sentences stored according to preset rules.

For example, only parts of speech such as nominals (체언), predicates (용언), modifiers (수식언), and independent words (독립언) may be stored, while specific words such as '기자' (reporter), '사진' (photo), '컷' (cut), '리포트' (report), '닷컴' (dot-com), '제공' (provided by), '뉴스' (news), and '캡처' (capture), as well as sentences of three words or fewer, may be excluded.

A word-to-vector model can then be generated from the sentences stored in this way.
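The filtering rules above can be sketched in plain Python. The sketch assumes the sentences have already been parsed into (word, part-of-speech) pairs, since the parser output format is not described in this text. The tag names in `KEPT_POS` are hypothetical placeholders rather than real Mecab tags, and measuring sentence length before filtering is also an assumption.

```python
# Hypothetical coarse tag names; a real parser's tag set is finer-grained.
KEPT_POS = {"nominal", "predicate", "modifier", "independent"}

# Excluded words listed in the description.
EXCLUDED_WORDS = {"기자", "사진", "컷", "리포트", "닷컴", "제공", "뉴스", "캡처"}

MIN_WORDS = 4  # sentences of three words or fewer are dropped


def filter_sentence(tagged_sentence):
    """Keep words with an allowed part of speech that are not excluded."""
    return [
        word
        for word, pos in tagged_sentence
        if pos in KEPT_POS and word not in EXCLUDED_WORDS
    ]


def build_corpus(tagged_sentences):
    """Return the filtered sentences to be used for word-to-vector training."""
    corpus = []
    for sentence in tagged_sentences:
        if len(sentence) < MIN_WORDS:  # length checked before filtering (assumption)
            continue
        words = filter_sentence(sentence)
        if words:
            corpus.append(words)
    return corpus


sample = [
    [("정부", "nominal"), ("가", "particle"), ("새", "modifier"),
     ("정책", "nominal"), ("을", "particle"), ("발표했다", "predicate")],
    [("홍길동", "nominal"), ("기자", "nominal")],
    [("사진", "nominal"), ("제공", "nominal"), ("연합", "nominal"), ("뉴스", "nominal")],
]
corpus = build_corpus(sample)
```

The resulting list of token lists is the usual input shape for a word-to-vector trainer.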

The embodiment may be as described above but is not limited thereto. According to another embodiment of the present invention, a CNN model for document processing may be used when the content is primarily text, and a CNN model for video or music processing may be used when the content is primarily video or music, so that the CNN model is swapped according to the type of content.

The content evaluation score prediction unit 300 may predict, based on the computation result, a content evaluation score for specific content for each content user.

Here, the content evaluation score for specific content may mean a score, predicted through deep learning, of what evaluation score (R) a content user would give to specific content that the user has not yet used.

According to the second embodiment of the present invention, the preferred content recommendation apparatus 1000 using deep learning based on GPU computation may include, in addition to the vector conversion unit 100, the GPU computation unit 200, and the content evaluation score prediction unit 300, a content preference providing unit 400 and a content recommendation unit 500.

The content preference providing unit 400 may collect dwell time information for each content item used by each content user, convert the collected dwell time into a relative score to determine the personal preference for that content, and provide a content preference for each content user.

The content preference providing unit 400 is described in more detail with reference to FIG. 3.

The content recommendation unit 500 may predict content evaluation scores for the content having the highest content preference and recommend the content for which the highest evaluation score is predicted.
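One way to read the recommendation step is as a filter followed by a ranking: keep the items whose content preference is high, then return those with the highest predicted evaluation score. The sketch below follows that reading. This passage does not specify how a preference value is attached to each candidate item, so the two dictionaries, the `min_preference` threshold, and `top_k` are all hypothetical.

```python
def recommend(preferences, predicted_scores, min_preference=0.8, top_k=1):
    """Return up to `top_k` content ids, best predicted score first.

    preferences:      dict mapping content id -> content preference in [0, 1]
    predicted_scores: dict mapping content id -> predicted evaluation score
    """
    candidates = [
        cid
        for cid, pref in preferences.items()
        if pref >= min_preference and cid in predicted_scores
    ]
    ranked = sorted(candidates, key=lambda cid: predicted_scores[cid], reverse=True)
    return ranked[:top_k]


# Hypothetical values for four content items.
preferences = {"a": 0.95, "b": 0.90, "c": 0.40, "d": 0.85}
predicted_scores = {"a": 3.9, "b": 4.6, "c": 4.9, "d": 4.1}
top = recommend(preferences, predicted_scores, top_k=2)
```

Item "c" has the highest predicted score but is filtered out by its low preference, so the two items returned are "b" and "d".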

FIG. 3 is a detailed block diagram of the content preference providing unit 400 shown in FIG. 2.

Referring to FIG. 3, the content preference providing unit 400 may include a dwell time information collection unit 410, a normal distribution function matching unit 420, a dwell time distribution estimation unit 430, a customized score calculation unit 440, and a preference determination unit 450.

The dwell time information collection unit 410 may collect dwell time information for each content item used by a content user.

According to an embodiment of the present invention, the time that one content user stays on a web page to read content may be measured and defined as dwell time information.

According to this embodiment, the dwell time may be measured for each content item among the plurality of content items the content user has subscribed to, and dwell time information may be generated from the measured dwell times.

The time contained in this dwell time information is absolute time and takes no account of each content user's usage speed or usage habits, so preferences calculated from it lose accuracy because of individual differences.

Therefore, as in the present invention, relative time needs to be calculated for each individual from the measured dwell times.

The normal distribution function matching unit 420 may associate a normal distribution function with each item of the collected per-content dwell time information.

According to an embodiment of the present invention, suppose content user A has used contents 1, 2, 3, …. Using one particular content item out of a very large number of content items is a low-probability event, so each such event can carry significant meaning.

According to an embodiment of the present invention, assume that if content user A is shown another content item similar in taste to content 1, the user will spend a dwell time similar to that spent on content 1. Such similar dwell times can then be presumed to follow a normal distribution, and a normal distribution function can be associated with each item of the collected per-content dwell time information.

The dwell time distribution estimation unit 430 may estimate the content user's dwell time distribution by linearly combining the normal distribution functions associated with the plurality of items of dwell time information.

According to an embodiment of the present invention, one normal distribution function is associated with each event (content, dwell time) in which content user A used a content item, and the associated normal distribution functions are linearly combined to estimate content user A's dwell time distribution.
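
The linear combination described above can be sketched in Python as an equal-weight mixture of normal densities, one per (content, dwell time) event. This is only an illustrative sketch: the equal weights 1/n, the shared component width `sigma`, the sample dwell times, and the names `normal_pdf` and `dwell_time_pdf` are assumptions introduced here and are not taken from the specification.

```python
import math

def normal_pdf(x, mean, sigma):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def dwell_time_pdf(x, dwell_times, sigma):
    """Equal-weight linear combination of one normal density per observed event.

    Each component is centered on one observed dwell time. The shared width
    `sigma` and the equal weights are assumptions made for illustration.
    """
    n = len(dwell_times)
    return sum(normal_pdf(x, t, sigma) for t in dwell_times) / n

# Hypothetical dwell times (in seconds) recorded for one content user.
times = [30.0, 45.0, 60.0, 120.0]
density_at_50 = dwell_time_pdf(50.0, times, sigma=15.0)
```

Because the weights sum to one, the combined function is itself a probability density, which is what allows it to be accumulated into a cumulative distribution function later.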

According to the above embodiment, the mean and standard deviation of each normal distribution function are defined in terms of the dwell time for article N, the number of events of content user A, and the standard deviation of content user A's dwell times, as follows.

According to an embodiment of the present invention, the content user's dwell time distribution (pdf) may be estimated from the above variables using Equation 5 below.

[Equation 5]

According to an embodiment of the present invention, the mean of the estimated dwell time distribution may be assumed to equal the mean of the content user's recorded dwell times, and the variance of the dwell time distribution (pdf) is given by Equation 6 below.

[Equation 6]

According to an embodiment of the present invention, as the number of content items to which the content user subscribes increases, the estimated dwell time distribution can approach the distribution that matches the actual results.

That is, the more content items the content user has read, the more accurately the dwell time distribution (pdf) reflects the actual results.

The customized score calculation unit 440 may compute a cumulative distribution function by accumulating the estimated dwell time distribution and, using the computed cumulative distribution function, calculate a customized score that converts a dwell time into a relative score for each content user.

Here, the customized score is not a score on a fixed full-mark scale; it may be a value whose relative magnitude indicates the degree of preference. It is not limited to this, however, and any value capable of expressing relative magnitude may be used without restriction.

According to an embodiment of the present invention, the cumulative distribution function (cdf) may be computed using Equation 7 below.

[Equation 7]
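
Because the estimated density is a linear combination of normal densities, accumulating it yields the same linear combination of normal cumulative distribution functions. The Python sketch below illustrates this property only; the equal weights, the shared component width `sigma`, the sample data, and the function names are hypothetical choices and do not reproduce Equation 7.

```python
import math

def normal_cdf(x, mean, sigma):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0))))

def dwell_time_cdf(x, dwell_times, sigma):
    """CDF of an equal-weight mixture with one normal component per event.

    Equal weights and a shared `sigma` are assumptions for illustration.
    """
    return sum(normal_cdf(x, t, sigma) for t in dwell_times) / len(dwell_times)

# Hypothetical dwell times (in seconds) recorded for one content user.
times = [30.0, 45.0, 60.0, 120.0]
cdf_short = dwell_time_cdf(20.0, times, sigma=15.0)   # shorter than usual for this user
cdf_long = dwell_time_cdf(150.0, times, sigma=15.0)   # longer than usual for this user
```

The CDF value can be read as the fraction of this user's estimated dwell time distribution lying below a given dwell time, which makes it comparable across users with different reading speeds.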

According to an embodiment of the present invention, the customized score may be calculated by applying a quantile transformation to the computed cumulative distribution function.

According to an embodiment of the present invention, the customized score may be calculated by assuming an ideal score distribution and then applying a quantile transformation; in one embodiment, the quantile transformation of Equation 8 below may be performed to calculate the customized score.

[Equation 8]
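
One way to read the quantile transformation is as two steps: map a dwell time to its percentile under the user's own estimated CDF, then pass that percentile through the inverse CDF of the assumed ideal score distribution. The Python sketch below uses a standard normal as the ideal score distribution and an equal-weight normal mixture for the user's CDF. Both choices, the clamping constant, the sample data, and all names are assumptions for illustration and are not the content of Equation 8.

```python
from statistics import NormalDist

def dwell_time_cdf(x, dwell_times, sigma):
    """CDF of an equal-weight normal mixture, one component per observed event."""
    return sum(NormalDist(t, sigma).cdf(x) for t in dwell_times) / len(dwell_times)

def customized_score(x, dwell_times, sigma, ideal=NormalDist(0.0, 1.0)):
    """Map dwell time x to a relative score via a quantile transformation.

    The 'ideal' score distribution defaults to a standard normal, which is
    an assumption made here rather than something the specification states.
    """
    p = dwell_time_cdf(x, dwell_times, sigma)
    # Clamp away from 0 and 1, where the inverse CDF is undefined.
    p = min(max(p, 1e-9), 1.0 - 1e-9)
    return ideal.inv_cdf(p)

# Two hypothetical users with different reading habits (dwell times in seconds).
fast_reader = [20.0, 25.0, 30.0, 40.0]
slow_reader = [80.0, 100.0, 120.0, 160.0]
score_fast = customized_score(60.0, fast_reader, sigma=10.0)
score_slow = customized_score(60.0, slow_reader, sigma=10.0)
```

In this sketch the same 60-second dwell time gives a positive score for the user who usually reads quickly and a negative score for the user who usually lingers, which is the kind of per-user relative scoring the text describes.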

The preference determination unit 450 may determine the personal preference for a content item by generating, for each content user, a preference graph over a plurality of content items based on the calculated customized scores.

According to an embodiment of the present invention, the preference graph for each content user may be drawn with dwell time on the X axis and customized score on the Y axis, so that the customized score of each content item can be seen at a glance, and the individual's preference for each content item can be determined from it.

According to an embodiment of the present invention, the higher the calculated customized score of a content item, the more strongly the content user may be judged to prefer it.

FIG. 4 is a diagram illustrating the computation process of a GPU-computation-based preferred content recommendation apparatus according to an embodiment of the present invention.

According to an embodiment of the present invention, the GPU-computation-based preferred content recommendation apparatus 1000 may carry out the computation process shown in FIG. 4.

According to an embodiment of the present invention, for text-oriented content, the computation process of FIG. 4 may be performed using only the GPU, which minimizes computational bottlenecks and thus the time required for computation.

Referring to FIG. 4, when the data of a specific content item is input, an evaluation score may be predicted and provided for each content user, and the value of at least one content user propensity vector u may be input to a weight matrix generated using multiple dense layers.
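
As a rough illustration of passing a content user propensity vector u through multiple dense layers, the following Python sketch applies two fully connected layers in sequence. The layer sizes, weight values, ReLU activation, and function names are hypothetical; the specification does not fix them, and an actual implementation would run these matrix products on the GPU rather than in plain Python.

```python
def dense(x, weights, bias, activation=None):
    """Fully connected layer: activation(W x + b), with W given as a list of rows."""
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]
    return [activation(v) for v in out] if activation else out

def relu(v):
    """Rectified linear unit, an assumed choice of activation."""
    return v if v > 0.0 else 0.0

# Hypothetical user propensity vector and layer parameters.
u = [0.5, -1.0, 2.0]
W1 = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5]]
b1 = [0.0, 0.1]
W2 = [[1.0, -1.0]]
b2 = [0.2]

hidden = dense(u, W1, b1, activation=relu)   # first dense layer
score = dense(hidden, W2, b2)[0]             # second dense layer gives one score
```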

Here, the second computation process may be replaced with a different deep learning model when the content is not text-oriented.

FIG. 5 is a diagram illustrating the data flow of the GPU-computation-based preferred content recommendation apparatus for an embodiment in which a model analyzes text-oriented content and recommends preferred content.

Referring to FIG. 5, a news article, which is text-oriented content, is input to generate a word to vector model; computation is performed through a CNN-based text processing model that reflects the loss function; and a recommendation model is computed through multiple dense layers, so that the content evaluation score for a specific content item can be predicted.
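
The text-processing stage can be pictured as a one-dimensional convolution sliding over the sequence of word vectors, followed by max-over-time pooling. The Python sketch below shows a single filter only. The embedding values, filter weights, window size, and function name are made-up placeholders, and the specification does not state the CNN's actual architecture.

```python
def conv1d_max_pool(word_vectors, kernel):
    """Slide `kernel` (one weight vector per word in the window) over the
    sentence and return the maximum response (max-over-time pooling)."""
    window = len(kernel)
    responses = []
    for start in range(len(word_vectors) - window + 1):
        total = 0.0
        for offset in range(window):
            vec = word_vectors[start + offset]
            weights = kernel[offset]
            total += sum(w * x for w, x in zip(weights, vec))
        responses.append(total)
    return max(responses)

# Hypothetical 2-dimensional word vectors for a four-word sentence.
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
# Hypothetical filter spanning a window of two words.
kernel = [[1.0, 0.0], [0.0, 1.0]]
feature = conv1d_max_pool(sentence, kernel)
```

Stacking the pooled outputs of many such filters would give a fixed-length feature vector for the article regardless of its length, which could then serve as input to the dense layers.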

FIG. 6 is a flowchart of a preferred content recommendation method using deep learning based on GPU computation according to an embodiment of the present invention.

A content user propensity vector and a content propensity vector are calculated (610).

According to an embodiment of the present invention, a content user propensity vector u, which vectorizes the propensity of a content user, and a content propensity vector v, which vectorizes the propensity of a content item, may be calculated.

According to this embodiment, the content user propensity vector u and the content propensity vector v may be calculated as described with reference to FIG. 1.

The vector values are input to a preset loss function and the computation is performed (620).

According to an embodiment of the present invention, the calculated u and v may be input to a preset loss function to perform the computation, so as to implement a complete deep learning structure that can be computed using only the GPU.
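
To show only the data flow of feeding u and v into a loss function, the Python sketch below scores a user-content pair by the inner product of u and v and measures the error against observed scores with a mean squared error. This is a stand-in chosen because it is a common loss in latent-factor recommenders; it is not the preset loss function of the specification, and the function names and sample values are hypothetical.

```python
def predict(u, v):
    """Predicted evaluation score as the inner product of u and v (an assumption)."""
    return sum(ui * vi for ui, vi in zip(u, v))

def squared_error_loss(samples):
    """Mean squared error over (u, v, observed_score) triples.

    A stand-in loss for illustration, not the loss defined in the specification.
    """
    samples = list(samples)
    return sum((predict(u, v) - r) ** 2 for u, v, r in samples) / len(samples)

# Hypothetical propensity vectors and observed scores.
u = [1.0, 2.0]
v = [0.5, 0.25]
loss = squared_error_loss([(u, v, 1.0), (u, v, 2.0)])
```

Because both the prediction and the loss are built from differentiable tensor operations, a formulation of this kind can be trained end to end on the GPU without handing intermediate results back to the CPU.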

According to an embodiment of the present invention, the preset loss function may be expressed as Equation 4 above.

A content evaluation score is predicted for each content user (630).

According to an embodiment of the present invention, the content evaluation score for a specific content item may be predicted for each content user based on the computation result.

The content preference of each content user is determined (640).

According to an embodiment of the present invention, dwell time information is collected for each content item used by each content user, the collected dwell times are converted into relative scores, and the personal preference for each content item is determined, so that a content preference can be provided for each content user.

According to an embodiment of the present invention, dwell time information may be collected for each content item used by a content user.

According to an embodiment of the present invention, a normal distribution function may be associated with each item of the collected per-content dwell time information.

According to an embodiment of the present invention, the content user's dwell time distribution may be estimated by linearly combining the normal distribution functions associated with the plurality of items of dwell time information.

According to the above embodiment, the mean and standard deviation of each normal distribution function are defined in terms of the dwell time for content N, the number of events of content user A, and the standard deviation of content user A's dwell times, as follows.

According to an embodiment of the present invention, the content user's dwell time distribution (pdf) may be estimated from the above variables using Equation 5 above.

According to an embodiment of the present invention, the mean of the estimated dwell time distribution may be assumed to equal the mean of the content user's recorded dwell times, and the variance of the dwell time distribution (pdf) is given by Equation 6 above.

According to an embodiment of the present invention, a cumulative distribution function may be computed by accumulating the estimated dwell time distribution, and the computed cumulative distribution function may be used to calculate a customized score that converts a dwell time into a relative score for each content user.

According to an embodiment of the present invention, the cumulative distribution function (cdf) may be computed using Equation 7 above.

According to an embodiment of the present invention, the customized score may be calculated by assuming an ideal score distribution and then applying a quantile transformation; in one embodiment, the quantile transformation of Equation 8 above may be performed to calculate the customized score.

According to an embodiment of the present invention, the personal preference for a content item may be determined by generating, for each content user, a preference graph over a plurality of content items based on the calculated customized scores.

Content evaluation scores are predicted for the content items having the highest content preference (650).

According to an embodiment of the present invention, the content evaluation score for content items having a high content preference may be predicted by repeating the above process.

The content item for which the highest evaluation score is predicted is recommended (660).

According to an embodiment of the present invention, content for which a high evaluation score is predicted may be recommended to the content user.
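
Steps 650 and 660 can be sketched as a two-stage selection: shortlist the content items with the highest preference, then recommend the shortlisted item with the highest predicted evaluation score. In the Python sketch below, the shortlist size `top_k`, the function name `recommend`, and the sample preference and score values are all hypothetical.

```python
def recommend(candidates, predict_score, preference, top_k=3):
    """Keep the top_k items by preference, then return the one with the
    highest predicted evaluation score."""
    shortlist = sorted(candidates, key=preference, reverse=True)[:top_k]
    return max(shortlist, key=predict_score)

# Hypothetical per-content preference values and predicted evaluation scores.
preference = {"a": 0.9, "b": 0.8, "c": 0.1, "d": 0.7}
predicted = {"a": 3.5, "b": 4.2, "c": 5.0, "d": 3.9}
best = recommend(preference, predicted.get, preference.get, top_k=3)
```

Here item "c" has the highest predicted score but is excluded because its preference is low, so the shortlisted item "b" is recommended.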

The embodiments of the present invention are not implemented solely through the apparatus and/or method described above. Although embodiments of the present invention have been described in detail, the scope of the present invention is not limited to them; various modifications and improvements made by those skilled in the art using the basic concept of the present invention as defined in the following claims also fall within the scope of the present invention.

100: vector conversion unit 200: GPU operation unit
300: content evaluation score prediction unit 400: content preference providing unit
500: content recommendation unit 1000: content recommendation apparatus

Claims

A content recommendation apparatus using deep learning based on GPU computation, which uses a deep learning model consisting of a neural network operated by a processor, the apparatus comprising:
a vector conversion unit that calculates a content user propensity vector u, which vectorizes the propensity of a content user, and a content propensity vector v, which vectorizes the propensity of a content item;
a GPU operation unit that performs computation by inputting the calculated u and v to a preset loss function, so as to implement a complete deep learning structure that can be computed using only a GPU;
a content evaluation score prediction unit that predicts, based on the computation result, a content evaluation score for a specific content item for each content user; and
a content preference providing unit that collects dwell time information for each content item used by each content user, converts the collected dwell times into relative scores to determine the personal preference for each content item, and provides a content preference for each content user,
wherein the content preference providing unit comprises:
a dwell time information collection unit that collects dwell time information for each content item used by a content user;
a normal distribution function matching unit that associates a normal distribution function with each item of the collected per-content dwell time information;
a dwell time distribution estimation unit that estimates the content user's dwell time distribution by linearly combining the normal distribution functions associated with the plurality of items of dwell time information;
a customized score calculation unit that computes a cumulative distribution function by accumulating the estimated dwell time distribution and, using the computed cumulative distribution function, calculates a customized score that converts a dwell time into a relative score for each content user; and
a preference determination unit that generates, for each content user, a preference graph over a plurality of content items based on the calculated customized scores and determines the personal preference for each content item.

The apparatus of claim 1, wherein the GPU operation unit, in performing the computation, inputs the value of at least one content user propensity vector u to a weight matrix generated using multiple dense layers.

The apparatus of claim 1, wherein, when the content is text-oriented, a dictionary is built and a word to vector model is trained in order to vectorize the text included in the content.

The apparatus of claim 3, wherein the text included in the content is crawled and registered in the dictionary sentence by sentence, the sentences registered in the dictionary are parsed to delete pre-stored exception words and to extract only pre-stored parts of speech in a predetermined format, and the stored sentences are used as words to generate the word to vector model.

The apparatus of claim 1, wherein the GPU operation unit uses a CNN model corresponding to each content type, using a document processing CNN model when the content is text-oriented and a video or music processing CNN model when the content is primarily video or music.

The apparatus of claim 1, further comprising a content recommendation unit that predicts content evaluation scores for the content items having the highest content preference and recommends the content item for which the highest evaluation score is predicted.

(Deleted)

The apparatus of claim 1, wherein the customized score calculation unit calculates the customized score by applying a quantile transformation to the computed cumulative distribution function.

The apparatus of claim 1, wherein the preference determination unit judges a content item to be more preferred by the content user the higher its calculated customized score is.

The apparatus of claim 1, wherein, as the number of content items used by the content user increases, the dwell time distribution estimated by the dwell time distribution estimation unit follows a distribution corresponding to the actual results.

A content recommendation method using deep learning based on GPU computation, which uses a deep learning model consisting of a neural network operated by a processor, the method comprising:
calculating a content user propensity vector u, which vectorizes the propensity of a content user, and a content propensity vector v, which vectorizes the propensity of a content item;
performing computation by inputting the calculated u and v to a preset loss function, so as to implement a complete deep learning structure that can be computed using only a GPU;
predicting, based on the computation result, a content evaluation score for a specific content item for each content user; and
providing a content preference for each content user by collecting dwell time information for each content item used by each content user, converting the collected dwell times into relative scores, and determining the personal preference for each content item,
wherein providing the content preference comprises:
collecting dwell time information for each content item used by a content user;
associating a normal distribution function with each item of the collected per-content dwell time information;
estimating the content user's dwell time distribution by linearly combining the normal distribution functions associated with the plurality of items of dwell time information;
computing a cumulative distribution function by accumulating the estimated dwell time distribution and, using the computed cumulative distribution function, calculating a customized score that converts a dwell time into a relative score for each content user; and
generating, for each content user, a preference graph over a plurality of content items based on the calculated customized scores and determining the personal preference for each content item.

The method of claim 11, wherein, in calculating the content propensity vector v, the value of at least one content user propensity vector u is input to a weight matrix generated using multiple dense layers when the computation is performed.

The method of claim 11, wherein, when the content is text-oriented, a dictionary is built and a word to vector model is trained in order to vectorize the text included in the content.

The method of claim 13, wherein the text included in the content is crawled and registered in the dictionary sentence by sentence, the sentences registered in the dictionary are parsed to delete pre-stored exception words and to extract only pre-stored parts of speech in a predetermined format, and the stored sentences are used as words to generate the word to vector model.

The method of claim 11, wherein calculating the content propensity vector v uses a CNN model corresponding to each content type, using a document processing CNN model when the content is text-oriented and a video or music processing CNN model when the content is primarily video or music.

The method of claim 11, further comprising predicting content evaluation scores for the content items having the highest content preference and recommending the content item for which the highest evaluation score is predicted.

(Deleted)

The method of claim 11, wherein calculating the customized score comprises applying a quantile transformation to the computed cumulative distribution function.

The method of claim 11, wherein determining the personal preference comprises judging a content item to be more preferred by the content user the higher its calculated customized score is.

The method of claim 11, wherein, in estimating the dwell time distribution, the estimated dwell time distribution follows a distribution corresponding to the actual results as the number of content items used by the content user increases.