KR20190134967A

KR20190134967A - Promotional image improvement apparatus and method in online shopping mall using artificial intelligence

Info

Publication number: KR20190134967A
Application number: KR1020180152768A
Authority: KR
Inventors: 남기헌; 남기준; 남정우
Original assignee: 남기헌; 남정우; 남기준
Priority date: 2018-11-30
Filing date: 2018-11-30
Publication date: 2019-12-05
Also published as: KR102072820B1

Abstract

The present invention relates to an apparatus for modifying promotional images in an online shopping mall using artificial intelligence. According to one embodiment of the present invention, the colors of an online shopping mall's promotional image are adjusted precisely through machine learning, which increases the effect of the promotion.

Description

Promotional image improvement apparatus and method in online shopping mall using artificial intelligence

The present invention relates to an apparatus for modifying promotional images in an online shopping mall using artificial intelligence.

Promotions in online shopping malls take the form of product discounts, prize events, discount coupons, mileage points, and the like. Because such promotions always reduce the mall's net profit, the type, duration, and intensity of a promotion must be designed precisely for the target persona. Until now, however, promotion design has been decided empirically by the mall operator or a promotion agency.

In Korea, online shopping is run by many kinds of companies, including open markets, social commerce sites, general shopping malls, large-mart shopping malls, and card-point malls. Most sellers list their products on several of these malls at once, which makes it difficult to analyze sales performance on the basis of unified data.

The components of a promotion, namely its timing and conditions, are causally related to its results. Yet timing alone covers season, month, day of the week, time of day, duration, public holidays, and more, so it is very difficult for a person to calculate from the results how much each component contributed.

In particular, the promotional discount rate acts as a benefit to the consumer: the higher the rate, the greater the purchase benefit, so unit sales and revenue rise. Given the product's cost, however, the increase in unit sales must be balanced against the discount rate at an appropriate point, and it is very difficult for a person to calculate whether the expected sales profit will actually be achieved.

Method and system for recommending web advertisements based on click patterns using a collaborative filtering system having a neural network, Korean Registered Patent No. 10-0792700, Naver Corporation

Accordingly, an object of the present invention is to provide an apparatus and method for predicting and recommending promotion performance in an online shopping mall using artificial intelligence, which predict the performance of a promotion and recommend its attributes by means of artificial intelligence, in order to overcome the low precision of conventional, empirically designed online shopping mall promotions.

Specific means for achieving the object of the present invention are described below.

The object of the present invention may be achieved by providing an apparatus for predicting and recommending promotion performance in an online shopping mall using artificial intelligence, comprising: a memory module that stores the program code of an artificial neural network pre-trained on product attribute data, promotion attribute data, other performance data, and promotions; and a processing module that processes the program code of the artificial neural network to predict the performance of a specific promotion, wherein the program code is configured to be executed on a computer and includes: an input step of receiving product attribute data (100), promotion attribute data (200), and performance data (300); a performance prediction step of predicting the performance of the specific promotion based on the product attribute data, the promotion attribute data, and the performance data, thereby generating promotion performance prediction data; and an output step of outputting the predicted promotion performance prediction data.

The object may also be achieved by providing an apparatus for predicting and recommending promotion performance in an online shopping mall using artificial intelligence, comprising: a memory module that stores the program code of a promotion performance prediction artificial neural network module, pre-trained on promotion attribute data and performance data, and of a promotion recommendation module; and a processing module that processes the program code of the two modules to predict the performance of a specific promotion and recommend promotion attributes, wherein the program code of the promotion recommendation module is configured to be executed on a computer and includes: a promotion performance prediction step of receiving promotion performance prediction data, which is information on the promotion performance predicted for a specific promotion attribute by the promotion performance prediction artificial neural network module; a state information receiving step of receiving state information $s_t$ for episode $t$; an attribute probability output step of generating reward possibility information in a value network based on the state information and the promotion performance prediction data, and then outputting from a policy network, based on the reward possibility information, recommended promotion attribute probabilities for a plurality of actions; a recommended promotion attribute data output step of selecting and outputting recommended promotion attribute data $a_t$ for episode $t$ through an optimal promotion search based on the plurality of recommended promotion attribute probabilities; a state and reward information receiving step of receiving, after the promotion according to the recommended promotion attribute data has been applied, state information $s_{t+1}$ for episode $t+1$ and reward information $r_{t+1}$ for $a_t$; and a policy network and value network update step of updating the policy network and the value network based on the received state information $s_{t+1}$ and reward information $r_{t+1}$. Here the state information means performance data (quantity sold, sales amount, page views, conversion rate, average sales price, or sales profit) and environment data (date information, day-of-week information, weather information, information on products under promotion, shopping mall information, or information on the mall's other products); the recommended promotion attribute data means the promotion category, promotion period, promotion shopping mall, exposure slot, or promotional image color; and the reward information means the rise or fall of the performance data over a specific period.

Another object of the present invention may be achieved by providing a method for predicting and recommending promotion performance in an online shopping mall using artificial intelligence, configured to be executed on a computer and comprising: a promotion performance prediction step in which a promotion recommendation module receives promotion performance prediction data, which is information on the promotion performance predicted for a specific promotion attribute by a promotion performance prediction artificial neural network module; a state information receiving step in which the promotion recommendation module receives state information $s_t$ for episode $t$; an attribute probability output step in which the promotion recommendation module generates reward possibility information in a value network based on the state information and the promotion performance prediction data, and then outputs from a policy network, based on the reward possibility information, recommended promotion attribute probabilities for a plurality of actions; a recommended promotion attribute data output step in which the promotion recommendation module selects and outputs recommended promotion attribute data $a_t$ for episode $t$ through an optimal promotion search based on the plurality of recommended promotion attribute probabilities; a state and reward information receiving step in which, after the promotion according to the recommended promotion attribute data has been applied, the promotion recommendation module receives state information $s_{t+1}$ for episode $t+1$ and reward information $r_{t+1}$ for $a_t$; and a policy network and value network update step in which the promotion recommendation module updates the policy network and the value network based on the received state information $s_{t+1}$ and reward information $r_{t+1}$. Here the state information means performance data (quantity sold, sales amount, page views, conversion rate, average sales price, or sales profit) and environment data (date information, day-of-week information, weather information, information on products under promotion, shopping mall information, or information on the mall's other products); the recommended promotion attribute data means the promotion category, promotion period, promotion shopping mall, exposure slot, or promotional image color; and the reward information means the rise or fall of the performance data over a specific period.

Another object of the present invention may be achieved by providing an apparatus for modifying promotional images in an online shopping mall using artificial intelligence, comprising: a memory module that stores the program code of a promotion recommendation module, which recommends promotion attributes based on promotion attribute data and performance data for at least one online shopping mall, and of a promotional image modification module, which modifies the color tone of a promotional image; and a processing module that processes the program code of the two modules to recommend specific promotion attributes and modify the color tone of the promotional image. The program code of the promotion recommendation module is configured to be executed on a computer and includes: a state information receiving step of receiving first state information $s_t$ for episode $t$; a recommended promotion attribute data output step of generating reward possibility information based on the first state information and then selecting and outputting, based on the reward possibility information, recommended promotion attribute data $a_t$ for episode $t$; a state and reward information receiving step of receiving, after the promotion according to the recommended promotion attribute data has been applied, second state information $s_{t+1}$ for episode $t+1$ and reward information $r_{t+1}$ for the recommended promotion attribute data $a_t$; and a weight update step of updating the weights of the recommended promotion attribute data output step based on the received second state information $s_{t+1}$ and reward information $r_{t+1}$. The program code of the promotional image modification module is configured to be executed on a computer and includes: a promotional image receiving step of receiving the promotional image; a generation step of modifying the color tone of the promotional image with an image generator that includes a VAE or a GAN, thereby generating a modified promotional image; a comparison step of encoding the modified promotional image and calculating its error against the recommended promotional image color information in the recommended promotion attribute data; and an update step of updating the image generator based on the error. Here the first and second state information mean performance data (quantity sold, sales amount, page views, conversion rate, average sales price, or sales profit) and environment data (date information, day-of-week information, weather information, information on products under promotion, shopping mall information, or information on the mall's other products); the recommended promotion attribute data includes the recommended promotional image color information; and the reward information means the rise or fall of the performance data over a specific period.
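The generation, comparison, and update steps of the image modification module can be illustrated with a deliberately small stand-in. The sketch below is not a VAE or GAN: as an assumption made only for illustration, the "generator" is a single per-channel color offset, the "encoder" is the mean RGB color of the image, and the error is the squared distance to a hypothetical recommended color. All function names and values are invented for this example.

```python
def mean_color(image):
    """Toy encoder: average RGB over all pixels (channel values in [0, 1])."""
    n = len(image)
    return [sum(px[c] for px in image) / n for c in range(3)]

def apply_shift(image, shift):
    """Toy generator: add one learned offset per color channel."""
    return [[px[c] + shift[c] for c in range(3)] for px in image]

def color_error(image, target):
    """Comparison step: squared error between encoded color and target color."""
    enc = mean_color(image)
    return sum((enc[c] - target[c]) ** 2 for c in range(3))

def fit_shift(image, target, steps=20, lr=0.25):
    """Update step: gradient descent on the color error w.r.t. the offsets."""
    shift = [0.0, 0.0, 0.0]
    for _ in range(steps):
        enc = mean_color(apply_shift(image, shift))
        # d(error)/d(shift_c) = 2 * (enc_c - target_c)
        shift = [shift[c] - lr * 2 * (enc[c] - target[c]) for c in range(3)]
    return shift

image = [[0.2, 0.4, 0.6], [0.4, 0.6, 0.8]]  # two RGB pixels
target = [0.6, 0.3, 0.8]                    # hypothetical recommended color
shift = fit_shift(image, target)
modified = apply_shift(image, shift)
```

A real generator would have many more parameters and would need to keep pixel values in range; the sketch only shows how an encoded color error can drive the generator update.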

Another object of the present invention may be achieved by providing a method for modifying promotional images in an online shopping mall using artificial intelligence, comprising: a state information receiving step in which a promotion recommendation module, which recommends promotion attributes based on promotion attribute data and performance data for at least one online shopping mall, receives first state information $s_t$ for episode $t$; a recommended promotion attribute data output step in which the promotion recommendation module generates reward possibility information based on the first state information and then selects and outputs, based on the reward possibility information, recommended promotion attribute data $a_t$ for episode $t$; a state and reward information receiving step in which, after the promotion according to the recommended promotion attribute data has been applied, the promotion recommendation module receives second state information $s_{t+1}$ for episode $t+1$ and reward information $r_{t+1}$ for the recommended promotion attribute data $a_t$; and a weight update step in which the promotion recommendation module updates the weights of the recommended promotion attribute data output step based on the received second state information $s_{t+1}$ and reward information $r_{t+1}$. The method further comprises: a promotional image receiving step in which a promotional image modification module, which modifies the color tone of a promotional image, receives the promotional image; a generation step of modifying the color tone of the promotional image with an image generator that includes a VAE or a GAN, thereby generating a modified promotional image; a comparison step of encoding the modified promotional image and calculating its error against the recommended promotional image color information in the recommended promotion attribute data; and an update step of updating the image generator based on the error. Here the first and second state information mean performance data (quantity sold, sales amount, page views, conversion rate, average sales price, or sales profit) and environment data (date information, day-of-week information, weather information, information on products under promotion, shopping mall information, or information on the mall's other products); the recommended promotion attribute data includes the recommended promotional image color information; and the reward information means the rise or fall of the performance data over a specific period.

As described above, the present invention has the following effects.

First, according to an embodiment of the present invention, machine learning makes the promotions of an online shopping mall more refined and precise, which increases their effect.

Second, according to an embodiment of the present invention, the policy network, which outputs the probabilities of recommended promotion attributes in accordance with the value network (111), can be updated at every episode. In conventional reinforcement learning the model is updated only after all episodes have ended, which made it difficult to apply to a promotion recommendation model.

Third, according to an embodiment of the present invention, promotion recommendation involves both short-term and long-term goals, so the MDP assumption does not hold and a conventional DQN is difficult to apply; nevertheless, the dependent relationship between the policy network and the value network makes reinforcement learning possible for promotion recommendation as well.

Fourth, according to an embodiment of the present invention, the policy network and the value network are connected to the promotion performance prediction artificial neural network device and are trained on the promotion performance prediction data, which improves training speed.

The following drawings attached to this specification illustrate preferred embodiments of the present invention and, together with the detailed description, serve to aid understanding of its technical idea; the present invention should therefore not be construed as limited to what is shown in the drawings.
FIG. 1 shows an apparatus (1) for predicting and recommending promotion performance in an online shopping mall using artificial intelligence according to an embodiment of the present invention,
FIG. 2 is a schematic diagram of the training process of the promotion performance prediction artificial neural network device according to an embodiment of the present invention,
FIG. 3 is a schematic diagram of the inference process of the promotion performance prediction artificial neural network device according to an embodiment of the present invention,
FIG. 4 is a schematic diagram of a promotion recommendation device (11) according to an embodiment of the present invention,
FIG. 5 is a flowchart of an example operation of the promotion recommendation device (11) according to an embodiment of the present invention,
FIG. 6 is a schematic diagram of a promotional image modification device (13) according to an embodiment of the present invention,
FIG. 7 is a schematic diagram of an example ConvNet encoder according to an embodiment of the present invention, and
FIG. 8 is a flowchart of a promotion recommendation method according to an embodiment of the present invention.

Embodiments of the present invention will now be described in detail with reference to the accompanying drawings so that a person of ordinary skill in the art to which the invention pertains can readily practice it. However, in describing the operating principle of the preferred embodiments in detail, detailed descriptions of related well-known functions or configurations are omitted where they could unnecessarily obscure the gist of the invention.

The same reference numerals are used throughout the drawings for parts with similar functions and operations. Throughout the specification, when a part is said to be connected to another part, this covers not only a direct connection but also an indirect connection through an intervening element. Likewise, saying that something includes a component means that it may further include other components, not that it excludes them, unless specifically stated otherwise.

Apparatus and method for predicting and recommending promotion performance in an online shopping mall using artificial intelligence

FIG. 1 shows an apparatus (1) for predicting and recommending promotion performance in an online shopping mall using artificial intelligence according to an embodiment of the present invention. As shown in FIG. 1, the apparatus may include a promotion performance prediction artificial neural network device (10), pre-trained on product attribute data, promotion attribute data, performance data, and promotions, and a promotion recommendation device (11).

Regarding the promotion performance prediction artificial neural network device (10), FIG. 2 is a schematic diagram of its training process and FIG. 3 is a schematic diagram of its inference process, each according to an embodiment of the present invention. As shown in FIGS. 2 and 3, the method of predicting promotion performance in an online shopping mall using artificial intelligence receives product attribute data (100), promotion attribute data (200), and performance data (300) to train the device (10), and then inputs product attribute data (100) and promotion attribute data (200) into the trained device (10) to output the predicted promotion performance prediction data (400). During training, the weights of the hidden layer may be updated based on the error between the promotion performance data and the output data. In FIG. 2, each attribute of the product attribute data (100) and the promotion attribute data (200) is fed to a node of the input layer such as x1, x2, or x3, passes through the hidden layer (h1, h2, h3) via weights such as w1, and the promotion performance prediction data (400), predicted on the basis of a cost function such as softmax, is output at the output layer (y1). The weights of the device (10) may then be updated by backpropagation based on the error between the predicted performance data and the actual performance data (300), $-\sum_i y_i \log p_i$.
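The training loop described above can be sketched as a tiny network with one hidden layer, a softmax output, and the cross-entropy error $-\sum_i y_i \log p_i$. This is a minimal illustration only: the layer sizes, the tanh activation, the learning rate, and the two output classes (for example "low" and "high" performance) are assumptions and are not specified in the source.

```python
import math
import random

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]

def forward(x, W1, W2):
    # Hidden layer h = tanh(W1 x); output p = softmax(W2 h)
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    p = softmax([sum(w * hi for w, hi in zip(row, h)) for row in W2])
    return h, p

def cross_entropy(y, p):
    return -sum(yi * math.log(pi) for yi, pi in zip(y, p))

def train_step(x, y, W1, W2, lr=0.1):
    """One backpropagation step; returns the loss measured before the update."""
    h, p = forward(x, W1, W2)
    # For softmax + cross-entropy, the gradient w.r.t. the logits is p - y
    dz = [pi - yi for pi, yi in zip(p, y)]
    # Propagate to the hidden layer through W2 and the tanh derivative
    dh = [sum(dz[k] * W2[k][j] for k in range(len(W2))) * (1 - h[j] ** 2)
          for j in range(len(h))]
    for k in range(len(W2)):
        for j in range(len(h)):
            W2[k][j] -= lr * dz[k] * h[j]
    for j in range(len(W1)):
        for i in range(len(x)):
            W1[j][i] -= lr * dh[j] * x[i]
    return cross_entropy(y, p)

random.seed(0)
W1 = [[random.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(3)]
W2 = [[random.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(2)]
x = [1.0, 0.5, -0.5]   # encoded product and promotion attributes (x1, x2, x3)
y = [0.0, 1.0]         # one-hot label derived from actual performance data
losses = [train_step(x, y, W1, W2) for _ in range(50)]
```

Repeating `train_step` on the same example should drive the loss down, which is the behaviour the error-based weight update is meant to produce.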

Alternatively, a separate promotion performance prediction artificial neural network device (10) may be configured for each product; the device (10) is trained on promotion attribute data (200) and performance data (300), and promotion attribute data (200) is then input into the trained device to output the predicted promotion performance prediction data (400). Besides an artificial neural network, machine learning methods such as a multi-layer perceptron, naive Bayes classification, or random forest classification may also be used here.

The product attribute data (100) may include the product category, the product selling price, the product cost, and the like.

The promotion attribute data (200) may include the promotion category (discount promotion, free gift promotion, etc.), promotion period, promotion shopping mall, exposure slot, promotional image, and the like. According to an embodiment of the present invention, the promotional image data within the promotion attribute data may mean the result of classifying the image into a specific mood category and a specific color-tone category with a pre-trained CNN (Convolutional Neural Network).

The performance data 300 may include sales quantity, sales amount, page views, conversion rate, average sales price, sales profit, and the like. According to an embodiment of the present invention, to prevent problems caused by multicollinearity in the performance data 300, principal component analysis (PCA) may be applied to the items constituting the performance data 300 so that the collinearity is cancelled out in the form of a diagonal matrix, that is, so that the transformed components are mutually uncorrelated.
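As an illustration of how PCA removes collinearity, the sketch below handles the two-variable case in closed form: it rotates two centered, correlated performance metrics onto their principal axes, after which their covariance matrix is diagonal. The function names and the sample figures are hypothetical, and a system with more than two metrics would use a general eigendecomposition rather than this shortcut.

```python
import math


def covariance(xs, ys):
    """Sample covariance of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def pca_2d(xs, ys):
    """Project two metrics onto their principal axes (hypothetical helper).

    For a 2x2 covariance matrix [[a, b], [b, c]], rotating by
    theta = 0.5 * atan2(2b, a - c) makes the off-diagonal term vanish.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a, b, c = covariance(xs, xs), covariance(xs, ys), covariance(ys, ys)
    theta = 0.5 * math.atan2(2 * b, a - c)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    pc1 = [cos_t * (x - mx) + sin_t * (y - my) for x, y in zip(xs, ys)]
    pc2 = [-sin_t * (x - mx) + cos_t * (y - my) for x, y in zip(xs, ys)]
    return pc1, pc2


# Made-up sample: sales quantity and sales amount move together.
quantity = [10.0, 20.0, 30.0, 40.0, 50.0]
sales_amount = [105.0, 190.0, 310.0, 405.0, 490.0]
pc1, pc2 = pca_2d(quantity, sales_amount)
```

The two returned components carry the same total variance as the inputs, but their covariance is zero up to floating-point error, so a downstream model no longer sees two nearly redundant inputs.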

For promotion attribute recommendation, a promotion recommendation device 11 may be configured that uses the promotion performance prediction neural network device to output the recommended promotion attribute data for which the highest promotion performance prediction data 400 is output.

The promotion recommendation of the promotion recommendation device 11 according to an embodiment of the present invention may be performed using reinforcement learning. FIG. 4 is a schematic diagram of the promotion recommendation device 11 according to an embodiment of the present invention. As shown in FIG. 4, the promotion recommendation device 11 may include a value network 111, an artificial neural network that learns a value function outputting the value of a given state, and a policy network 110 that learns a policy function outputting the probabilities of recommended promotion attributes. The policy network 110 and the value network 111 may be connected to the promotion performance prediction neural network device 10. In addition, the policy network 110 and the value network 111 may be connected to the optimal promotion search module 112 to output the recommended promotion attribute data 500.

From the perspective of reinforcement learning, the objective of the promotion recommendation device 11 according to an embodiment of the present invention is to improve promotion performance. The state may refer to the current performance data (sales quantity, sales amount, page views, conversion rate, average sales price, sales profit, etc.) and environment data (date, day of the week, weather, products currently on promotion, the shopping mall, and information on other products in the shopping mall). The action may refer to the promotion attribute data (promotion category (discount promotion, free gift promotion, etc.), promotion period, promotion shopping mall, exposure slot, promotion image color, etc.). The reward may refer to the rise or fall of the performance data over a specific period.

The policy network 110 is an artificial neural network that determines the probabilities of specific attributes of the recommended promotion in each state of the promotion recommendation device 11; it learns a policy function and outputs the recommended promotion attribute probabilities. The cost function of the policy network may be obtained by multiplying the policy function by the cost function of the value network to compute a cross entropy and then taking the policy gradient; for example, it may be configured as in Equation 1 below. The policy network may be trained by backpropagation based on the product of the cross entropy and the temporal-difference (TD) error, which is the cost function of the value network.

In Equation 1, $\pi$ is the policy function, $\theta$ is the policy network parameter, $\pi_\theta(a_i \mid s_i)$ is the probability of taking a specific action (a promotion with specific attributes) in the current episode, $V$ is the value function, $w$ is the value network parameter, $s_i$ is the state information of the current episode $i$, $s_{i+1}$ is the state information of the next episode $i+1$, $r_{i+1}$ is the reward expected to be obtained in the next episode, $V_w(s_i)$ is the reward possibility in the current episode, $V_w(s_{i+1})$ is the reward possibility in the next episode, and $\gamma$ is the discount factor. Here, $r_{i+1}$ may be received from the promotion performance prediction neural network device 10. As a result, through the policy gradient, the policy network 110 according to an embodiment of the present invention initially outputs promotion attributes that imitate the user's promotion history.
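The body of Equation 1 is not reproduced in this text. One form that is consistent with the description above (a cross-entropy term multiplied by the temporal-difference error of the value network) and with the listed symbols is given below. It is a reconstruction offered as an assumption, not the patent's verbatim equation, and the name $J(\theta)$ is introduced here only for illustration.

```latex
\[
J(\theta) = -\log \pi_{\theta}(a_i \mid s_i)\,
\bigl( r_{i+1} + \gamma\, V_w(s_{i+1}) - V_w(s_i) \bigr)
\]
```

The bracketed factor is the temporal-difference error. When it is positive, minimizing this expression with respect to $\theta$ raises the probability of the chosen action $a_i$; when it is negative, the probability is lowered.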

Before reinforcement learning proceeds, the policy network 110 according to an embodiment of the present invention may learn the basics of the policy through supervised learning on existing promotion attribute data and the corresponding performance data, which updates the weights of the policy network. In other words, the weights of the policy network may be initialized by supervised learning on the existing promotion attribute data and performance data. As a result, the record of past promotions allows the policy network to be trained very quickly.

In addition, according to an embodiment of the present invention, the supervised learning of the policy network 110 may be configured to use a random vector together with the existing promotion attribute data and the corresponding performance data. The random vector may be drawn, for example, from a Gaussian distribution. This allows the policy network to output bold promotion policies with some random probability. If the policy network 110 is supervised only on the existing promotion attribute data and the corresponding performance data, the promotion recommendations end up being optimized within the existing policy. If, however, a random vector is included in the supervised learning of the policy network according to an embodiment of the present invention, then as reinforcement learning proceeds the policy network can learn promotions that are more effective than the existing policy.

The value network 111 is an artificial neural network that derives the possibility of achieving a reward in each state the promotion recommendation device 11 can be in, and it learns a value function. The value network 111 indicates the direction in which the promotion recommendation device 11, which is the agent, should be updated. To this end, the input variable of the value network 111 is set to the state information, which describes the state of the promotion recommendation device 11, and the output variable of the value network 111 is set to the reward possibility information, which is the possibility that the promotion recommendation device 11 achieves a reward. The reward possibility information according to an embodiment of the present invention may be calculated with a Q-function such as Equation 2 below.

In Equation 2, $Q_\pi$ is the total reward possibility information expected in the future for state $s$ and action $a$ under a specific policy $\pi$, $R$ is the reward for a specific period, and $\gamma$ is the discount factor. $S_t$ is the state at time $t$, $A_t$ is the action at time $t$, and $E$ denotes the expected value. The reward possibility information (Q value) according to an embodiment of the present invention determines the direction and magnitude of the update of the policy network 110.
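Equation 2 itself is not reproduced in this text. Assuming it is the standard action-value function, that is, the expected discounted sum of future rewards given $S_t = s$ and $A_t = a$, the sketch below computes that discounted sum for one reward sequence and averages it over sampled sequences as a Monte Carlo estimate. The function names and the sample rewards are hypothetical.

```python
def discounted_return(rewards, gamma):
    """Return rewards[0] + gamma * rewards[1] + gamma**2 * rewards[2] + ..."""
    total = 0.0
    # Accumulate from the last reward backwards so each step applies one discount.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def estimate_q(sampled_reward_sequences, gamma):
    """Monte Carlo estimate of Q(s, a): the mean discounted return over
    reward sequences that all start from the same state and action."""
    returns = [discounted_return(seq, gamma) for seq in sampled_reward_sequences]
    return sum(returns) / len(returns)


# Two made-up reward sequences observed after the same promotion choice.
q_estimate = estimate_q([[1.0, 1.0, 1.0], [0.0, 0.0, 4.0]], gamma=0.5)
```

With a discount factor below 1, a late reward counts for less than an early one, which is why the second sequence contributes less than its raw total of 4.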

Here, the cost function of the value network may be a mean squared error (MSE) function of the value function; for example, it may be configured as in Equation 3 below. The value network 111 may be trained by backpropagation based on the temporal-difference (TD) error, which is the cost function of the value network.

In Equation 3, $V$ is the value function, $w$ is the value network parameter, $s_i$ is the state information of the current episode $i$, $s_{i+1}$ is the state information of the next episode $i+1$, $r_{i+1}$ is the reward expected to be obtained in the next episode, $V_w(s_i)$ is the reward possibility in the current episode, $V_w(s_{i+1})$ is the reward possibility in the next episode, and $\gamma$ is the discount factor. Here, $r_{i+1}$ may be received from the promotion performance prediction neural network device 10.
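The quantities defined above can be sketched as follows, assuming the value network cost is the squared temporal-difference (TD) error and the policy network cost is the negative log-probability of the chosen action weighted by that TD error. These are the standard actor-critic forms, used here as an assumption because the equations are not reproduced in this text. The function names are hypothetical; in practice the value estimates would come from the value network 111 and the reward from the prediction device 10.

```python
import math


def td_error(reward_next, value_current, value_next, gamma):
    """TD error: r_{i+1} + gamma * V_w(s_{i+1}) - V_w(s_i)."""
    return reward_next + gamma * value_next - value_current


def value_loss(reward_next, value_current, value_next, gamma):
    """Squared TD error, the assumed cost minimized by the value network."""
    return td_error(reward_next, value_current, value_next, gamma) ** 2


def policy_loss(action_prob, reward_next, value_current, value_next, gamma):
    """Negative log-probability of the chosen action, weighted by the TD error.

    action_prob stands for pi_theta(a_i | s_i) and must lie in (0, 1].
    """
    delta = td_error(reward_next, value_current, value_next, gamma)
    return -math.log(action_prob) * delta
```

A positive TD error means the promotion did better than the value network expected, so lowering `policy_loss` pushes the policy toward repeating that action.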

Accordingly, whenever the state of the promotion recommendation device changes, the value network may be updated in the direction of gradient descent on the cost function of Equation 3.

According to an embodiment of the present invention, the value network is trained separately from the policy network, and the Q value of the value network starts from a supervised initialization rather than from random values, which enables fast learning. This greatly reduces the exploration burden for actions that select a combination of promotion attributes, where the combinatorial complexity is very high.

According to the promotion recommendation device 11 according to an embodiment of the present invention, when the policy network 110, having completed supervised learning, recommends promotion attributes for the current episode $i$, the value network 111 is trained to predict the reward of carrying out a promotion with the recommended attributes. Once trained, the policy network 110 and the value network 111 of the promotion recommendation device 11 are combined with simulations run by the optimal promotion search module 112 to make the final selection of promotion attributes.

In addition, the value network 111 according to an embodiment of the present invention allows the policy network, which outputs the probabilities of the recommended promotion attributes, to be updated every episode. In conventional reinforcement learning, the model is updated only after all episodes have finished, which made it difficult to apply to a promotion recommendation model.

The optimal promotion search module 112 searches for the optimal promotion attributes by running multiple simulations over various states and actions, based on a plurality of agents computed from the policy network and the value network. The optimal promotion search module according to an embodiment of the present invention may use, for example, Monte Carlo tree search. Each node of the tree represents a state, and each edge represents the value expected from taking a specific action in that state. The current state is the root node, and a leaf node is added each time a new action leads to a new state. When Monte Carlo tree search is used, the search in the optimal promotion search module may be processed in four steps: Selection, Expansion, Evaluation, and Backup.

In the Selection step, the optimal promotion search module 112 proceeds from the current state, choosing the most valuable of the selectable actions at each node until a leaf node is reached. It uses the value-function value stored on each edge together with a visit-frequency term that balances exploration and exploitation. The action selection in the Selection step is given by Equation 4 below.

In Equation 4, $a_t$ is the action (the promotion carried out) at time $t$, $Q(s_t, a)$ is the value-function value stored in the tree, and $u(s_t, a)$ is a term inversely proportional to the visit count of the state-action pair, used to balance exploration and exploitation.
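The selection rule described above can be sketched as choosing the action that maximizes $Q(s_t, a) + u(s_t, a)$. The text only states that $u$ shrinks as the visit count grows; the specific bonus `c_explore * prior / (1 + n)` used below, the `prior` field (a policy-network probability), and the edge dictionary layout are assumptions borrowed from common Monte Carlo tree search implementations, not details taken from the patent.

```python
def select_action(edges, c_explore=1.0):
    """Pick the action with the highest Q + u score.

    edges maps each action name to a dict with (hypothetical) keys:
      "q"     - value stored on the edge,
      "n"     - number of times the edge has been visited,
      "prior" - probability the policy network assigns to the action.
    """
    def score(action):
        edge = edges[action]
        # Exploration bonus: decays as the edge accumulates visits.
        bonus = c_explore * edge["prior"] / (1 + edge["n"])
        return edge["q"] + bonus

    return max(edges, key=score)


# Made-up statistics for two candidate promotions.
edges = {
    "discount": {"q": 0.6, "n": 10, "prior": 0.5},
    "gift": {"q": 0.5, "n": 0, "prior": 0.5},
}
```

With the bonus enabled, the unvisited `gift` edge is chosen even though its stored value is lower; with `c_explore=0` the rule reduces to pure exploitation and picks `discount`.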

In the Expansion step, once the simulation reaches a leaf node, the optimal promotion search module 112 takes an action according to the probabilities of the policy network trained by supervised learning and adds the resulting new node as a leaf node.

In the Evaluation step, the optimal promotion search module 112 evaluates the newly added leaf node using two quantities: the value (reward possibility) that the value network assigns to the leaf node, and the reward obtained by playing out from the leaf node with the policy network until the promotion episode ends. Equation 5 below is an example of evaluating the value of a new leaf node.

In Equation 5, $V(s_L)$ is the value of the leaf node, $\lambda$ is the mixing parameter, $v_\theta(s_L)$ is the value obtained from the value network, and $z_L$ is the reward obtained by continuing the simulation.

In the Backup step, the optimal promotion search module 112 re-evaluates the values of the nodes visited during the simulation to reflect the value of the newly added leaf node, and updates their visit frequencies. Equation 6 below is an example of the node value re-evaluation and visit frequency update.

In Equation 6, $s_L^i$ is the leaf node in the $i$-th simulation and $1(s, a, i)$ indicates whether the edge $(s, a)$ was visited in the $i$-th simulation. When the tree search is complete, the algorithm may be configured to select the edge $(s, a)$ most visited from the root node. The optimal promotion search module 112 according to an embodiment of the present invention thus makes it possible to select the optimal promotion attributes by first running multiple simulations, based on the value network, over the promotion attributes shortlisted by the policy network.
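The Evaluation and Backup steps can be sketched as below. Because Equations 5 and 6 are not reproduced in this text, the code assumes their usual forms: the leaf value is the mix $(1-\lambda)\,v_\theta(s_L) + \lambda z_L$, the visit count of an edge is the number of simulations that traversed it, and the edge value is the mean leaf value over those simulations. The function names and the `stats` layout are hypothetical.

```python
def leaf_value(network_value, rollout_reward, lam=0.5):
    """Mix the value-network estimate with the rollout reward."""
    return (1 - lam) * network_value + lam * rollout_reward


def backup(stats, path, value):
    """Propagate one simulation's leaf value to every edge on its path.

    stats maps (state, action) to {"n": visit count, "total": summed value}.
    """
    for edge in path:
        entry = stats.setdefault(edge, {"n": 0, "total": 0.0})
        entry["n"] += 1
        entry["total"] += value


def q_value(stats, edge):
    """Mean leaf value over the simulations that visited this edge."""
    entry = stats[edge]
    return entry["total"] / entry["n"]


def most_visited_action(stats, root_state):
    """After the search, choose the action most visited from the root."""
    counts = {a: e["n"] for (s, a), e in stats.items() if s == root_state}
    return max(counts, key=counts.get)
```

Selecting by visit count rather than by the raw edge value is the rule stated in the text; it favors actions whose value estimate rests on many simulations.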

According to an embodiment of the present invention, the promotion recommendation device 11 may be configured with a plurality of agents. With a plurality of agents, the promotions recommended by the promotion recommendation device for each specific state and each specific promotion attribute compete with one another, so that the best product and its promotion can be recommended within a given budget.

FIG. 5 is a flowchart showing an example of the operation of the promotion recommendation device 11 according to an embodiment of the present invention. As shown in FIG. 5, when the state $s(t)$ is input by the Internet shopping mall data hub device 12, which tracks state data across a plurality of Internet shopping malls, the value network 111 and the plurality of agents of the policy network 110 input various promotion attributes to the optimal promotion search module 112. The promotion is then carried out according to the recommended promotion attribute probabilities $a(t)$, the action output by the optimal promotion search module 112, which ends episode $t$ and starts episode $t+1$. In episode $t+1$, the state $s(t+1)$ resulting from $a(t)$ is again input by the Internet shopping mall data hub device 12, and the reward $r(t+1)$ for $a(t)$ is input immediately to update the value network 111 and the policy network 110.

In addition, according to an embodiment of the present invention, for the promotion image color information included in the recommended promotion attribute data 500, the apparatus 1 for predicting and recommending promotion performance in an online shopping mall using artificial intelligence may receive the actual promotion image, modify its colors based on the promotion image color information, and output the modified promotion image. FIG. 6 is a schematic diagram of the promotion image modification device 13 according to an embodiment of the present invention. As shown in FIG. 6, the promotion image modification device 13 may include a generator and a ConvNet encoder.

The generator may be an image generator composed of an encoder and a decoder, such as a VAE or GAN. It receives the actual promotion image 210 and outputs a generated promotion image 212 with changed colors. The generated promotion image 212 is then fed to the ConvNet encoder, which outputs an encoded promotion image 211. The cross entropy between the encoded promotion image 211 and the promotion image color information of the recommended promotion attribute data 500 is computed and output as an error. This error may be fed back to the generator so that it is updated to generate promotion images 212 that better match the promotion image color information.

FIG. 7 is a schematic diagram showing an example of the ConvNet encoder according to an embodiment of the present invention. As shown in FIG. 7, the ConvNet encoder may be built as [INPUT-CONV-RELU-POOL-FC]. For the input data, the generated promotion image 212, if the INPUT image is 32 pixels wide and 32 pixels high with RGB channels, the input size is [32x32x3]. The CONV layer (Conv. Filter, 101) is connected to local regions of the input image and computes the dot product between each connected region and its own weights. The resulting volume has a size such as [32x32x12] (for example, with 12 filters). The RELU layer applies an elementwise activation function such as max(0, x) and does not change the size of the volume ([32x32x12]); the result is Activation map 1 (102). The POOL layer (pooling, 103) downsamples along the width and height dimensions and outputs a reduced volume such as [16x16x12] (Activation map 2, 104). The FC (fully-connected) layer 105 computes the class scores and outputs a volume of size [1x1x10] (output layer, 106), where "10" corresponds to the class scores for 10 categories (the promotion image color information according to an embodiment of the present invention). The FC layer is connected to every element of the previous volume.

In this way, the ConvNet transforms the original image, made up of pixel values, layer by layer into class scores (the promotion image color information according to an embodiment of the present invention). Some layers have parameters and others do not. In particular, the CONV and FC layers are functions not only of the input volume but also of weights and biases, whereas the RELU and POOL layers are fixed functions. The parameters of the CONV and FC layers are trained by gradient descent so that the class scores for each image match that image's label.

The parameters of the CONV layer consist of a set of learnable filters. Each filter is small along the width and height dimensions but spans the full depth of the input. During the forward pass, each filter is slid (more precisely, convolved) across the width and height of the input volume to produce a two-dimensional activation map; at each position, the dot product between the filter and the input is computed. Through this process, the ConvNet learns filters that activate on specific patterns at specific positions in the input data. Stacking these activation maps along the depth dimension gives the output volume. Each element of the output volume therefore looks at only a small region of the input, and the neurons within one activation map share the same parameters because they result from applying the same filter.
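The layer sizes quoted for the ConvNet encoder can be checked with the usual output-size formula, (size + 2 * padding - kernel) // stride + 1. The sketch below walks the [INPUT-CONV-RELU-POOL-FC] stack and reports each volume. The text does not give the filter size, padding, or pooling window, so the 3x3 kernel with padding 1 and the 2x2 pooling with stride 2 are assumptions chosen to reproduce the stated sizes; the function names are hypothetical.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def convnet_shapes(height=32, width=32, channels=3, filters=12,
                   kernel=3, padding=1, pool=2, classes=10):
    """Return (layer name, volume shape) pairs for INPUT-CONV-RELU-POOL-FC."""
    shapes = [("INPUT", (height, width, channels))]
    h = conv_output_size(height, kernel, 1, padding)
    w = conv_output_size(width, kernel, 1, padding)
    shapes.append(("CONV", (h, w, filters)))   # depth equals the filter count
    shapes.append(("RELU", (h, w, filters)))   # elementwise, size unchanged
    h = conv_output_size(h, pool, pool)
    w = conv_output_size(w, pool, pool)
    shapes.append(("POOL", (h, w, filters)))   # downsampled width and height
    shapes.append(("FC", (1, 1, classes)))     # one score per color category
    return shapes


for name, shape in convnet_shapes():
    print(name, shape)
```

Changing `kernel` or `padding` shows how the CONV output would shrink without padding, for example to 28x28 for a 5x5 kernel with no padding.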

FIG. 8 is a flowchart of a promotion recommendation method according to an embodiment of the present invention. As shown in FIG. 8, the promotion recommendation method may include a promotion performance prediction step (S10), a state information receiving step (S11), a recommended promotion attribute data output step (S12), a state and reward information receiving step (S13), and a policy network and value network update step (S14).

In the promotion performance prediction step (S10), the promotion performance prediction neural network device receives existing promotion attribute data and performance data, predicts the promotion performance for specific promotion attributes, and transmits the prediction to the promotion recommendation device 11. The predicted promotion performance data may be used as initial values for the policy network and the value network.

In the state information receiving step (S11), the promotion recommendation device 11 receives the state information ($s_t$) for episode $t$ from the Internet shopping mall data hub device.

In the recommended promotion attribute data output step (S12), the promotion recommendation device 11 outputs, through the optimal promotion search module 112, the recommended promotion attribute data ($a_t$), which includes the recommended promotion attribute probabilities for episode $t$.

In the state and reward information receiving step (S13), after the promotion has been applied to the Internet shopping mall according to the recommended promotion attribute data, the promotion recommendation device 11 receives from the Internet shopping mall data hub device the state information ($s_{t+1}$) for episode $t+1$ and the reward information ($r_{t+1}$) for $a_t$.

In the policy network and value network update step (S14), the policy network and the value network are updated based on the received state information ($s_{t+1}$) and the reward information ($r_{t+1}$) for $a_t$.
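The control flow of steps S11 to S14 can be sketched as a simple episode loop. Everything below is a hypothetical stand-in meant only to show the order of the calls: `ToyHub` replaces the Internet shopping mall data hub device with a fixed reward rule, and `ToyRecommender` replaces the policy network, value network, and search module with a running average per action. Step S10 (initial performance prediction) is omitted.

```python
def run_episodes(hub, recommender, num_episodes):
    """Drive steps S11 to S14 and return the reward seen in each episode."""
    rewards = []
    state = hub.observe()                                # S11: receive s_t
    for _ in range(num_episodes):
        action = recommender.recommend(state)            # S12: output a_t
        next_state, reward = hub.apply(action)           # S13: s_{t+1}, r_{t+1}
        recommender.update(state, action, reward, next_state)  # S14: update
        rewards.append(reward)
        state = next_state
    return rewards


class ToyHub:
    """Stand-in data hub: reward is 1.0 for 'discount' and 0.0 otherwise."""

    def __init__(self):
        self.t = 0

    def observe(self):
        return self.t

    def apply(self, action):
        self.t += 1
        return self.t, (1.0 if action == "discount" else 0.0)


class ToyRecommender:
    """Stand-in agent: tries each action once, then picks the best average."""

    def __init__(self, actions):
        self.stats = {a: [0, 0.0] for a in actions}  # action -> [count, total]

    def recommend(self, state):
        untried = [a for a, (n, _) in self.stats.items() if n == 0]
        if untried:
            return untried[0]
        return max(self.stats, key=lambda a: self.stats[a][1] / self.stats[a][0])

    def update(self, state, action, reward, next_state):
        self.stats[action][0] += 1
        self.stats[action][1] += reward


rewards = run_episodes(ToyHub(), ToyRecommender(["gift", "discount"]), 5)
```

The point of the sketch is that the update in S14 happens inside the loop, once per episode, rather than after all episodes have finished.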

From the perspective of reinforcement learning, the objective of the promotion recommendation method according to an embodiment of the present invention is to improve promotion performance. The state may refer to the current performance data (sales quantity, sales amount, page views, conversion rate, average sales price, sales profit, etc.) and environment data (date, day of the week, weather, products currently on promotion, the shopping mall, and information on other products in the shopping mall). The action may refer to the promotion attribute data (promotion category (discount promotion, free gift promotion, etc.), promotion period, promotion shopping mall, exposure slot, promotion image color, etc.). The reward may refer to the rise or fall of the performance data over a specific period.

As described above, those skilled in the art to which the present invention pertains will understand that the present invention may be embodied in other specific forms without changing its technical idea or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. The scope of the present invention is defined by the claims below rather than by the detailed description, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

The features and advantages described in this specification are not all-inclusive; in particular, many additional features and advantages will become apparent to those skilled in the art in view of the drawings, specification, and claims. Moreover, it should be noted that the language used in this specification has been chosen primarily for readability and instructional purposes, and may not have been selected to delineate or limit the subject matter of the present invention.

The foregoing description of the embodiments of the present invention has been presented for purposes of illustration. It is not intended to be exhaustive or to limit the invention to the precise form disclosed. Those skilled in the art will appreciate that many modifications and variations are possible in light of the above disclosure.

Therefore, the scope of the present invention is limited not by the detailed description but by any claims of an application based on it. Accordingly, the disclosure of the embodiments of the present invention is illustrative and does not limit the scope of the invention set forth in the claims below.

1: Apparatus for predicting and recommending promotion performance in an online shopping mall using artificial intelligence
10: Promotion performance prediction artificial neural network device
11: Promotion recommendation device
12: Internet shopping mall data hub device
13: Promotion image modification device
100: Product attribute data
110: Policy network
111: Value network
112: Optimal promotion search module
200: Promotion attribute data
300: Performance data
400: Promotion performance prediction data
500: Recommended promotion attribute data

Claims

An apparatus for modifying a promotion image in an online shopping mall using a policy network and a value network, comprising:
A memory module that stores program code of a promotion performance prediction artificial neural network module, a promotion recommendation module that recommends promotion attributes, and a promotion image modification module that modifies the color of a promotion image, based on promotion attribute data and performance data of at least one online shopping mall; and
A processing module that processes the program code of the promotion performance prediction artificial neural network module, the promotion recommendation module, and the promotion image modification module to recommend specific promotion attributes and to modify the color of the promotion image,
Wherein the program code of the promotion recommendation module is configured to be executed on a computer and includes:
A promotion performance prediction step of receiving, from the promotion performance prediction artificial neural network module, promotion performance prediction data, which is information on the promotion performance predicted for a specific promotion attribute;
A state information receiving step of receiving first state information s_t for episode t;
An attribute probability output step of generating reward possibility information in the value network based on the first state information and the promotion performance prediction data, and then outputting, in the policy network, recommended promotion attribute probabilities for a plurality of actions based on the reward possibility information;
A recommended promotion attribute data output step of selecting and outputting recommended promotion attribute data a_t for episode t through an optimal promotion search based on the plurality of recommended promotion attribute probabilities;
A state and reward information receiving step of receiving, after the promotion according to the recommended promotion attribute data has been applied, second state information s_{t+1} for episode t+1 and reward information r_{t+1} for the recommended promotion attribute data a_t; and
A policy network and value network updating step of updating the policy network and the value network based on the received second state information s_{t+1} and the reward information r_{t+1},
Wherein the program code of the promotion image modification module is configured to be executed on a computer and includes:
A promotion image receiving step of receiving the promotion image;
A modified image generating step of generating a modified promotion image by modifying the color of the promotion image with an image generator that includes a VAE or a GAN;
A comparison step of encoding the modified promotion image and calculating an error against the recommended promotion image color information in the recommended promotion attribute data; and
An update step of updating the image generator based on the error,
Wherein the first state information and the second state information comprise performance data (sales quantity, sales amount, inflow (page views), conversion rate, average sales price, or sales profit) and environment data (date information, day-of-week information, weather information, information on the product under promotion, shopping mall information, or information on other products in the shopping mall),
The recommended promotion attribute data includes recommended promotion image color information,
The reward information means the rise or fall of the performance data over a specific period,
The policy network is an artificial neural network that outputs the recommended promotion attribute probabilities in each state, and the cost function of the policy network is configured as in Equation 1 below, and
The value network is an artificial neural network that generates the reward possibility information, which is the possibility of achieving a reward in each state, and the cost function of the value network is configured as in Equation 2 below:
[Equation 1]

[Equation 2]

In Equation 1, π is the policy function, θ is the policy network parameter, π_θ(a_i | s_i) is the probability of taking a specific action (a specific promotion attribute) in the current episode, V is the value function, w is the value network parameter, s_i is the state information of the current episode i, s_{i+1} is the state information of the next episode i+1, r_{i+1} is the reward expected to be obtained in the next episode, V_w(s_i) is the reward possibility in the current episode, V_w(s_{i+1}) is the reward possibility in the next episode, and γ is the discount rate.
In Equation 2, V is the value function, w is the value network parameter, s_i is the state information of the current episode i, s_{i+1} is the state information of the next episode i+1, r_{i+1} is the reward expected to be obtained in the next episode, V_w(s_i) is the reward possibility in the current episode, V_w(s_{i+1}) is the reward possibility in the next episode, and γ is the discount rate.
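The images for Equations 1 and 2 are not reproduced in this text, so the claimed cost functions cannot be quoted here. For orientation only, the sketch below shows a common one-step actor-critic formulation built from the same symbols (π_θ(a_i | s_i), V_w(s_i), V_w(s_{i+1}), r_{i+1}, γ). It is an assumption that the claimed equations resemble this form; the function names and the sample numbers are hypothetical.

```python
import math


def td_error(r_next: float, v_s: float, v_s_next: float, gamma: float) -> float:
    """One-step temporal-difference error: r_{i+1} + gamma * V_w(s_{i+1}) - V_w(s_i)."""
    return r_next + gamma * v_s_next - v_s


def value_loss(r_next: float, v_s: float, v_s_next: float, gamma: float) -> float:
    """Assumed value-network cost for one transition: squared TD error."""
    return td_error(r_next, v_s, v_s_next, gamma) ** 2


def policy_loss(prob_action: float, r_next: float, v_s: float,
                v_s_next: float, gamma: float) -> float:
    """Assumed policy-network cost for one transition.

    Negative log-probability of the chosen action, weighted by the TD error.
    prob_action stands for pi_theta(a_i | s_i) and must lie in (0, 1].
    """
    return -math.log(prob_action) * td_error(r_next, v_s, v_s_next, gamma)


# Example transition with made-up values.
delta = td_error(r_next=1.0, v_s=0.5, v_s_next=2.0, gamma=0.5)
l_value = value_loss(1.0, 0.5, 2.0, 0.5)
l_policy = policy_loss(0.5, 1.0, 0.5, 2.0, 0.5)
```

In this formulation a positive TD error means the chosen promotion attribute did better than the value network expected, so minimizing the policy cost raises its probability, while the value cost pulls V_w(s_i) toward the observed one-step return.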