KR102161294B1

KR102161294B1 - Promotion Performance Prediction and Recommendation Apparatus in Online Shopping Mall Using Artificial Intelligence

Info

Publication number: KR102161294B1
Application number: KR1020180152767A
Authority: KR
Inventors: 남기헌; 남기준; 남정우
Original assignee: 남정우; 남기헌; 남기준
Priority date: 2018-11-30
Filing date: 2018-11-30
Publication date: 2020-10-05
Also published as: KR20190134966A

Abstract

The present invention relates to an apparatus and method for predicting promotion performance and recommending promotions in an online shopping mall using artificial intelligence. According to an embodiment of the present invention, learning by the artificial intelligence makes the promotions of the online shopping mall more sophisticated and precise, which increases their effectiveness.

Description

Promotion Performance Prediction and Recommendation Apparatus in Online Shopping Mall Using Artificial Intelligence

The present invention relates to an apparatus and method for predicting promotion performance and recommending promotions in an online shopping mall using artificial intelligence.

Promotions in online shopping malls take the form of product discounts, prize events, discount coupons, mileage (reward point) grants, and the like. Because such promotions always reduce net profit from the mall's point of view, the type, period, and degree of a promotion must be designed precisely to fit the target persona. Until now, however, promotion design has been decided empirically by mall operators or promotion agencies.

Online shopping in Korea involves many kinds of companies, such as open markets, social commerce sites, general shopping malls, hypermarket shopping malls, and card point malls. Most sellers list their products on several of these malls at once, which makes it difficult to analyze sales performance on the basis of unified data.

The components of a promotion, such as its date and time and its conditions, stand in a causal relationship with the promotion's results. Yet even date and time alone, to take one component as an example, covers many items such as season, month, day of the week, time of day, duration, and public holidays, so it is very difficult for a person to calculate from the results how much each component contributed.

In particular, the promotion discount rate acts as a benefit to consumers: the larger the discount rate, the larger the purchase benefit, so the quantity sold and therefore sales revenue will increase. Considering the cost of the product, however, it is important to balance the increase in quantity sold against the discount rate at an appropriate point, and it is very difficult for a person to calculate whether the expected profit will be achieved at that point.

Korean Registered Patent No. 10-0792700, "Web advertisement recommendation method and system based on click patterns using a collaborative filtering system having a neural network", Naver Corporation

Accordingly, an object of the present invention is to provide an apparatus and method for predicting promotion performance and recommending promotions in an online shopping mall using artificial intelligence, which predict the performance of a promotion and recommend its characteristics by means of artificial intelligence, in order to overcome the low precision of online shopping mall promotions that have so far been designed empirically.

Specific means for achieving the object of the present invention are described below.

The object of the present invention can be achieved by providing an apparatus for predicting promotion performance and recommending promotions in an online shopping mall using artificial intelligence, the apparatus comprising: a memory module that stores program code of an artificial neural network pre-trained on product attribute data, promotion attribute data, other performance data, and promotions; and a processing module that processes the program code of the artificial neural network to predict the performance of a specific promotion, wherein the program code is configured to perform, on a computer: an input step of receiving product attribute data (100), promotion attribute data (200), and performance data (300); a performance prediction step of predicting the performance of the specific promotion based on the product attribute data, the promotion attribute data, and the performance data, thereby generating promotion performance prediction data; and an output step of outputting the predicted promotion performance prediction data.

The object can also be achieved by providing an apparatus for predicting promotion performance and recommending promotions in an online shopping mall using artificial intelligence, the apparatus comprising: a memory module that stores program code of a promotion performance prediction artificial neural network module, pre-trained on promotion attribute data and performance data, and of a promotion recommendation module; and a processing module that processes the program code of the promotion performance prediction artificial neural network module and the promotion recommendation module to predict the performance of a specific promotion and recommend promotion attributes, wherein the program code of the promotion recommendation module is configured to perform, on a computer:
a promotion performance prediction step of receiving promotion performance prediction data, which is information on the promotion performance predicted for specific promotion attributes by the promotion performance prediction artificial neural network module;
a state information receiving step of receiving state information (s_t) for episode t;
an attribute probability output step of generating reward possibility information in a value network based on the state information and the promotion performance prediction data, and then outputting, from a policy network and based on the reward possibility information, recommended promotion attribute probabilities for a plurality of actions;
a recommended promotion attribute data output step of selecting and outputting recommended promotion attribute data (a_t) for episode t through an optimal promotion search based on the plurality of recommended promotion attribute probabilities;
a state and reward information receiving step of receiving, after the promotion according to the recommended promotion attribute data has been applied, state information (s_t+1) for episode t+1 and reward information (r_t+1) for a_t; and
a policy network and value network update step of updating the policy network and the value network based on the received state information (s_t+1) and reward information (r_t+1).
Here, the state information means performance data (quantity sold, sales amount, page views, conversion rate, average sales price, or gross profit) and environment data (date information, day-of-week information, weather information, information on products currently under promotion, shopping mall information, or information on other products in the shopping mall); the recommended promotion attribute data means a promotion category, promotion period, promotion shopping mall, exposure slot, or promotion image color; and the reward information means a rise or fall of the performance data over a specific period.

In addition, the optimal promotion search in the recommended promotion attribute data output step may use Monte Carlo tree search. In the tree, each node represents a state and each edge represents the value expected from taking a specific action in a specific state; the current state is the root node, and a leaf node is expanded each time a new action is taken and a transition to a new state occurs. The recommended promotion attribute data output step may further include: a selection step of proceeding from the current state, choosing at each node the highest-valued action among the selectable actions, until a leaf node is reached; an expansion step of, once the simulation reaches the leaf node, acting according to the probabilities of the policy network trained by supervised learning and adding a new leaf node; an evaluation step of evaluating the value of the new leaf node by combining the reward possibility information that the value network estimates from the new leaf node with the reward obtained by playing out from the leaf node with the policy network until the promotion episode ends; and an update step of reflecting the value of the new leaf node to re-evaluate the values of the nodes visited during the simulation and update their visit counts. The recommended promotion attribute data (a_t) may then be selected based on the visit counts.
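The four search steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the action list, the per-action rewards, the episode horizon, the uniform `policy_net` stub, the zero `value_net` stub, the 50/50 blend of value estimate and rollout return, and the UCB-style exploration bonus added to the selection rule are all hypothetical choices made so that the example runs on its own.

```python
import math
import random

ACTIONS = ["10% off", "20% off", "30% off"]  # hypothetical promotion options
STEP_REWARD = [0.2, 1.0, 0.5]                # hypothetical reward per option
HORIZON = 3                                  # promotion episodes per simulation

def policy_net(state):
    # Stub for the policy network: uniform action probabilities.
    return [1.0 / len(ACTIONS)] * len(ACTIONS)

def value_net(state):
    # Stub for the value network: an untrained estimate of future reward.
    return 0.0

def step(state, action):
    # Toy transition: the state is the tuple of actions taken so far.
    return state + (action,), STEP_REWARD[action]

class Node:
    def __init__(self, state, reward=0.0):
        self.state = state
        self.reward = reward  # reward on the edge leading into this node
        self.children = {}    # action index -> Node
        self.visits = 0
        self.total = 0.0      # sum of returns backed up through this node

    def value(self):
        return self.total / self.visits if self.visits else 0.0

def rollout(state, rng):
    # Play out with the policy network until the episode horizon.
    ret = 0.0
    while len(state) < HORIZON:
        a = rng.choices(range(len(ACTIONS)), weights=policy_net(state))[0]
        state, r = step(state, a)
        ret += r
    return ret

def simulate(root, rng, c=1.0):
    node, path = root, [root]
    # Selection: descend while every action at the node already has a child.
    while len(node.state) < HORIZON and len(node.children) == len(ACTIONS):
        parent = node
        node = max(
            parent.children.values(),
            key=lambda ch: ch.value()
            + c * math.sqrt(math.log(parent.visits) / ch.visits),
        )
        path.append(node)
    # Expansion: add one new leaf, sampled from the policy probabilities.
    if len(node.state) < HORIZON:
        untried = [a for a in range(len(ACTIONS)) if a not in node.children]
        probs = policy_net(node.state)
        a = rng.choices(untried, weights=[probs[i] for i in untried])[0]
        next_state, r = step(node.state, a)
        node.children[a] = Node(next_state, reward=r)
        node = node.children[a]
        path.append(node)
    # Evaluation: blend the value estimate with the rollout return.
    ret = 0.5 * value_net(node.state) + 0.5 * rollout(node.state, rng)
    # Update: back the return up the path and count the visits.
    for n in reversed(path):
        ret += n.reward
        n.visits += 1
        n.total += ret

def recommend(n_sim=500, seed=0):
    rng = random.Random(seed)
    root = Node(state=())
    for _ in range(n_sim):
        simulate(root, rng)
    # The recommended action is the most visited child of the root.
    best = max(root.children, key=lambda a: root.children[a].visits)
    return ACTIONS[best]
```

Choosing the final action by visit count rather than by raw value, as the text describes, makes the recommendation less sensitive to a single lucky rollout.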

In addition, the policy network may be configured to be trained by supervised learning on the promotion attribute data and the performance data, with a random vector included.

Another object of the present invention can be achieved by providing a method for predicting promotion performance and recommending promotions in an online shopping mall using artificial intelligence, the method being configured to be performed on a computer and comprising:
a promotion performance prediction step in which a promotion recommendation module receives promotion performance prediction data, which is information on the promotion performance predicted for specific promotion attributes by a promotion performance prediction artificial neural network module;
a state information receiving step in which the promotion recommendation module receives state information (s_t) for episode t;
an attribute probability output step in which the promotion recommendation module generates reward possibility information in a value network based on the state information and the promotion performance prediction data, and then outputs, from a policy network and based on the reward possibility information, recommended promotion attribute probabilities for a plurality of actions;
a recommended promotion attribute data output step in which the promotion recommendation module selects and outputs recommended promotion attribute data (a_t) for episode t through an optimal promotion search based on the plurality of recommended promotion attribute probabilities;
a state and reward information receiving step in which, after the promotion according to the recommended promotion attribute data has been applied, the promotion recommendation module receives state information (s_t+1) for episode t+1 and reward information (r_t+1) for a_t; and
a policy network and value network update step in which the promotion recommendation module updates the policy network and the value network based on the received state information (s_t+1) and reward information (r_t+1).
Here, the state information means performance data (quantity sold, sales amount, page views, conversion rate, average sales price, or gross profit) and environment data (date information, day-of-week information, weather information, information on products currently under promotion, shopping mall information, or information on other products in the shopping mall); the recommended promotion attribute data means a promotion category, promotion period, promotion shopping mall, exposure slot, or promotion image color; and the reward information means a rise or fall of the performance data over a specific period.

As described above, the present invention has the following effects.

First, according to an embodiment of the present invention, learning by the artificial intelligence makes the promotions of the online shopping mall more sophisticated and precise, which increases their effectiveness.

Second, according to an embodiment of the present invention, the policy network, which outputs the probabilities of recommended promotion attributes in accordance with the value network (111), can be updated every episode. In conventional reinforcement learning the model is updated only after all episodes have ended, which made it difficult to apply to a promotion recommendation model.

Third, according to an embodiment of the present invention, promotion recommendation has both short-term and long-term goals, so the Markov decision process (MDP) assumption does not hold and a conventional deep Q-network (DQN) is difficult to apply; nevertheless, the dependent relationship between the policy network and the value network makes reinforcement learning possible for promotion recommendation as well.

Fourth, according to an embodiment of the present invention, the policy network and the value network are connected to the promotion performance prediction artificial neural network device and are trained on the promotion performance prediction data, so training speed improves.

The following drawings attached to this specification illustrate preferred embodiments of the present invention and, together with the detailed description, serve to aid understanding of its technical idea; the present invention should therefore not be construed as limited to what is shown in the drawings.
FIG. 1 shows an apparatus (1) for predicting promotion performance and recommending promotions in an online shopping mall using artificial intelligence according to an embodiment of the present invention.
FIG. 2 is a schematic diagram showing the training process of the promotion performance prediction artificial neural network device according to an embodiment of the present invention.
FIG. 3 is a schematic diagram showing the inference process of the promotion performance prediction artificial neural network device according to an embodiment of the present invention.
FIG. 4 is a schematic diagram showing a promotion recommendation device (11) according to an embodiment of the present invention.
FIG. 5 is a flowchart showing an operation example of the promotion recommendation device (11) according to an embodiment of the present invention.
FIG. 6 is a schematic diagram showing a promotion image correction device (13) according to an embodiment of the present invention.
FIG. 7 is a schematic diagram showing an example of a ConvNet encoder according to an embodiment of the present invention.
FIG. 8 is a flowchart showing a promotion recommendation method according to an embodiment of the present invention.

Embodiments that a person of ordinary skill in the art to which the present invention pertains can readily carry out are described in detail below with reference to the accompanying drawings. In describing the operating principles of the preferred embodiments, however, detailed descriptions of related known functions or configurations are omitted where they could unnecessarily obscure the gist of the present invention.

The same reference numerals are used throughout the drawings for parts that have similar functions and operations. Throughout the specification, when a part is said to be connected to another part, this includes not only a direct connection but also an indirect connection with another element in between. Likewise, saying that something includes a component means that it may further include other components, not that it excludes them, unless specifically stated otherwise.

Apparatus and method for predicting promotion performance and recommending promotions in an online shopping mall using artificial intelligence

FIG. 1 shows an apparatus (1) for predicting promotion performance and recommending promotions in an online shopping mall using artificial intelligence according to an embodiment of the present invention. As shown in FIG. 1, the apparatus may include a promotion performance prediction artificial neural network device (10), pre-trained on product attribute data, promotion attribute data, performance data, and promotions, and a promotion recommendation device (11).

Regarding the promotion performance prediction artificial neural network device (10), FIG. 2 is a schematic diagram of its training process and FIG. 3 is a schematic diagram of its inference process according to an embodiment of the present invention. As shown in FIGS. 2 and 3, the method for predicting promotion performance in an online shopping mall using artificial intelligence receives product attribute data (100), promotion attribute data (200), and performance data (300) to train the device (10), and then inputs product attribute data (100) and promotion attribute data (200) to the trained device (10) to output promotion performance prediction data (400). During training, the weights of the hidden layer may be updated based on the error between the promotion performance data and the output data. In FIG. 2, each attribute making up the product attribute data (100) and the promotion attribute data (200) is fed to a node of the input layer, such as x1, x2, and x3; the values pass through a hidden layer, such as h1, h2, and h3, using weights such as w1; and the promotion performance prediction data (400), predicted with a softmax-based cost function or the like, is output at the output layer y1. The weights of the device (10) may then be updated by backpropagation based on the error between the predicted performance data and the actual performance data (300), for example the cross-entropy error -Σ_i y_i log p_i.
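The training loop described for FIG. 2 can be illustrated with a very small network written in plain Python. The sketch below assumes that promotion performance is discretized into a few classes so that a softmax output and the cross-entropy error -Σ y_i log p_i apply; the layer sizes, the tanh hidden activation, the learning rate, and the sample values are all hypothetical.

```python
import math
import random

def softmax(z):
    # Numerically stable softmax over a list of logits.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

class TinyNet:
    """One tanh hidden layer and a softmax output over performance classes."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
                   for _ in range(n_out)]

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in self.w1]
        p = softmax([sum(w * hi for w, hi in zip(row, h)) for row in self.w2])
        return h, p

    def train_step(self, x, y, lr=0.1):
        """One backpropagation step; y is one-hot. Returns the loss before the update."""
        h, p = self.forward(x)
        loss = -sum(yi * math.log(pi) for yi, pi in zip(y, p))
        dz = [pi - yi for pi, yi in zip(p, y)]  # gradient w.r.t. the logits
        # Gradient w.r.t. hidden activations, computed before w2 changes.
        dh = [sum(dz[k] * self.w2[k][j] for k in range(len(dz)))
              for j in range(len(h))]
        for k, row in enumerate(self.w2):
            for j in range(len(row)):
                row[j] -= lr * dz[k] * h[j]
        for j, row in enumerate(self.w1):
            da = dh[j] * (1.0 - h[j] ** 2)  # derivative of tanh
            for i in range(len(row)):
                row[i] -= lr * da * x[i]
        return loss

# Hypothetical encoded attributes (x1, x2, x3) and a one-hot label
# over three performance buckets (low, medium, high).
x = [1.0, 0.2, 0.7]
y = [0.0, 1.0, 0.0]
net = TinyNet(n_in=3, n_hidden=4, n_out=3)
losses = [net.train_step(x, y) for _ in range(200)]
```

Repeating `train_step` on the same sample drives the cross-entropy down, which is the behavior the weight update by backpropagation is meant to produce; a real model would of course train on many promotion records rather than one.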

Alternatively, a separate promotion performance prediction artificial neural network device (10) may be configured for each product, trained on promotion attribute data (200) and performance data (300), and then given promotion attribute data (200) as input to output the promotion performance prediction data (400). Besides the artificial neural network, other machine learning methods such as a multi-layer perceptron, naive Bayes classification, or random forest classification may be used.

The product attribute data (100) may include the product category, the product selling price, the product cost, and the like.

The promotion attribute data (200) may include the promotion category (discount promotion, free gift promotion, etc.), the promotion period, the promotion shopping mall, the exposure slot, the promotion image, and the like. According to an embodiment of the present invention, the promotion image data among the promotion attribute data may be the result of classifying the image into a specific mood category and a specific color tone category with a pre-trained convolutional neural network (CNN).

The performance data (300) may include the quantity sold, sales amount, page views, conversion rate, average sales price, gross profit, and the like. According to an embodiment of the present invention, to prevent problems caused by multicollinearity in the performance data (300), principal component analysis (PCA) may be applied to the individual data making up the performance data (300) so that their covariance takes the form of a diagonal matrix and the collinearity is cancelled out.
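To make the decorrelation concrete, the following sketch applies PCA to just two performance metrics, where the principal axes can be found in closed form from a single rotation angle. It is an illustration only: the metric values are made up, and a real pipeline would handle all metrics at once and would normally standardize them first.

```python
import math

def covariance(u, v):
    # Sample covariance of two equally long sequences.
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (n - 1)

def pca_2d(xs, ys):
    """Center two metrics and rotate them onto their principal axes."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cx = [x - mx for x in xs]
    cy = [y - my for y in ys]
    cxx, cyy, cxy = covariance(xs, xs), covariance(ys, ys), covariance(xs, ys)
    # Rotation angle that zeroes the off-diagonal covariance term.
    theta = 0.5 * math.atan2(2.0 * cxy, cxx - cyy)
    c, s = math.cos(theta), math.sin(theta)
    pc1 = [c * a + s * b for a, b in zip(cx, cy)]
    pc2 = [-s * a + c * b for a, b in zip(cx, cy)]
    return pc1, pc2

# Hypothetical, strongly correlated metrics: quantity sold and sales amount.
quantity = [10.0, 20.0, 30.0, 40.0, 50.0]
sales = [105.0, 190.0, 310.0, 405.0, 490.0]
pc1, pc2 = pca_2d(quantity, sales)
off_diagonal = covariance(pc1, pc2)
```

After the rotation the covariance matrix of `(pc1, pc2)` is diagonal up to floating-point error, so the two transformed inputs no longer carry the same information twice.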

Regarding promotion attribute recommendation, a promotion recommendation device (11) may be configured to use the promotion performance prediction artificial neural network device to output the recommended promotion attribute data for which the highest promotion performance prediction data (400) is output.

Promotion recommendation by the promotion recommendation device (11) according to an embodiment of the present invention may be performed using reinforcement learning. FIG. 4 is a schematic diagram showing the promotion recommendation device (11) according to an embodiment of the present invention. As shown in FIG. 4, the promotion recommendation device (11) may include a value network (111), an artificial neural network that learns a value function outputting the value of a specific state, and a policy network (110), which learns a policy function outputting the probabilities of recommended promotion attributes. The policy network (110) and the value network (111) may be configured to be connected to the promotion performance prediction artificial neural network device (10). They may also be connected to an optimal promotion search module (12) to output recommended promotion attribute data (500).

In reinforcement learning terms, the objective of the promotion recommendation device (11) according to an embodiment of the present invention is to improve promotion performance. The state may be the current performance data (quantity sold, sales amount, page views, conversion rate, average sales price, gross profit, etc.) and environment data (date, day of the week, weather, products currently under promotion, shopping mall, and information on other products in the shopping mall). The action may be the promotion attribute data (promotion category (discount promotion, free gift promotion, etc.), promotion period, promotion shopping mall, exposure slot, promotion image color, etc.). The reward may be the rise or fall of the performance data over a specific period.
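One way to picture this mapping is as plain data records plus a reward function. The sketch below is a hypothetical encoding: the field names, the choice of a single metric for the reward, and the sample values are assumptions made for illustration and are not taken from the specification.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Performance:
    quantity: int
    sales_amount: float
    page_views: int
    conversion_rate: float
    average_sales_price: float
    gross_profit: float

@dataclass(frozen=True)
class Environment:
    date: str
    weekday: str
    weather: str
    mall: str

@dataclass(frozen=True)
class State:
    performance: Performance
    environment: Environment

@dataclass(frozen=True)
class PromotionAction:
    category: str        # e.g. discount or free gift promotion
    period_days: int
    mall: str
    exposure_slot: str
    image_color: str

def reward(before: Performance, after: Performance,
           metric: str = "sales_amount") -> float:
    """Rise (positive) or fall (negative) of one metric over a period."""
    return getattr(after, metric) - getattr(before, metric)

# Hypothetical performance before and after a promotion period.
before = Performance(100, 1_000_000.0, 5000, 0.020, 10_000.0, 200_000.0)
after = Performance(130, 1_200_000.0, 6000, 0.022, 9_200.0, 210_000.0)
state = State(before, Environment("2018-11-30", "Fri", "clear", "mall A"))
action = PromotionAction("discount", 7, "mall A", "main banner", "red")
r = reward(before, after)
```

Which metric, or which weighted mix of metrics, should drive the reward is a design decision the text leaves open.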

The policy network (110) is an artificial neural network that determines, in each state of the promotion recommendation device (11), the probabilities of specific attributes of the recommended promotion; it learns a policy function and outputs the recommended promotion attribute probabilities. The cost function of the policy network may be obtained by multiplying the policy function by the cost function of the value network to compute a cross-entropy term and then taking the policy gradient; for example, it may take the form of Equation 1 below. The policy network may be trained by backpropagation based on the product of the cross-entropy and the temporal-difference (TD) error, which is the cost function of the value network.
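Equation 1 itself is not reproduced in this text. As a hedged reconstruction, one form that is consistent with the description (cross-entropy multiplied by the temporal-difference error, followed by the policy gradient) and with the symbols defined in the next paragraph is the standard actor-critic gradient; the specification's own equation may differ in detail.

```latex
\nabla_{\theta} J(\theta) \approx
  \nabla_{\theta} \log \pi_{\theta}(a_i \mid s_i)\,
  \bigl( r_{i+1} + \gamma V_{w}(s_{i+1}) - V_{w}(s_i) \bigr)
```

Read as a loss to minimize, this corresponds to $-\log \pi_{\theta}(a_i \mid s_i)\,\delta_i$ with $\delta_i = r_{i+1} + \gamma V_{w}(s_{i+1}) - V_{w}(s_i)$, where $\delta_i$ is treated as a constant when differentiating with respect to $\theta$.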

In Equation 1, π is the policy function, θ is the policy network parameter, π_θ(a_i|s_i) is the probability of taking a specific action (a promotion with specific attributes) in the current episode, V is the value function, w is the value network parameter, s_i is the state information of the current episode i, s_{i+1} is the state information of the next episode i+1, r_{i+1} is the reward expected to be obtained in the next episode, V_w(s_i) is the reward possibility in the current episode, V_w(s_{i+1}) is the reward possibility in the next episode, and γ is the discount rate. Here, r_{i+1} may be configured to be received from the promotion performance prediction artificial neural network device 10. As a result, the policy network 110 according to an embodiment of the present invention initially outputs, through the policy gradient, promotion attributes that imitate the user's promotion history.

Before reinforcement learning proceeds, the policy network 110 according to an embodiment of the present invention may learn the basics of a policy through supervised learning on existing promotion attribute data and the corresponding performance data, which updates the weights of the policy network. That is, the weights of the policy network may be set by supervised learning on existing promotion attribute data and performance data. As a result, the policy network can be trained very quickly from the records of existing promotions.

In addition, according to an embodiment of the present invention, the supervised learning of the policy network 110 may be configured to use a random vector together with the existing promotion attribute data and the corresponding performance data. The random vector may, for example, be drawn from a Gaussian distribution. This allows the policy network to output, with some random probability, challenging (exploratory) promotion policies. If the policy network 110 is supervised only on existing promotion attribute data and performance data, promotion recommendations end up being optimized within the existing policy. When a random vector is included in the supervised learning, however, the policy network can learn promotions more effective than the existing policy as reinforcement learning proceeds.

The value network 111 is an artificial neural network that derives the possibility of achieving a reward in each state that the promotion recommendation device 11 may have, and it learns a value function. The value network 111 indicates the direction in which the promotion recommendation device 11, which is the agent, should be updated. To this end, the input variable of the value network 111 is set to state information, which is information on the state of the promotion recommendation device 11, and the output variable of the value network 111 is set to reward possibility information, which is the possibility that the promotion recommendation device 11 achieves the reward. The reward possibility information according to an embodiment of the present invention may be calculated by a Q-function as in the following equation.
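
Equation 2 is not reproduced in this text. Given the symbols defined for it (Q_π, R, γ, S_t, A_t, E), a standard action-value definition that fits them, stated here as an assumption, is:

```latex
% Assumed reconstruction of the Q-function (Equation 2)
Q_\pi(s, a) = \mathbb{E}_\pi\!\left[ R_{t+1} + \gamma R_{t+2} + \gamma^{2} R_{t+3} + \cdots
              \,\middle|\, S_t = s,\ A_t = a \right]
```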

In Equation 2 above, Q_π is the total reward possibility information expected in the future for state s and action a under a specific policy π, R is the reward for a specific period, and γ (gamma) is the discount rate. S_t is the state at time t, A_t is the action at time t, and E is the expected value. The reward possibility information (Q value) according to an embodiment of the present invention determines the direction and magnitude of the update of the policy network 110.

In this case, the cost function of the value network may be a mean squared error (MSE) function for the value function; for example, it may be configured as in Equation 3 below. The value network 111 may be trained by backpropagation based on the TD error, which is the cost function of the value network.
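
Equation 3 is not reproduced in this text. A squared TD error consistent with the description above and with the symbol definitions that follow, given as an assumption, is:

```latex
% Assumed reconstruction of the value network cost function (Equation 3)
J(w) = \bigl( r_{i+1} + \gamma V_w(s_{i+1}) - V_w(s_i) \bigr)^{2}
```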

In Equation 3, V is the value function, w is the value network parameter, s_i is the state information of the current episode i, s_{i+1} is the state information of the next episode i+1, r_{i+1} is the reward expected to be obtained in the next episode, V_w(s_i) is the reward possibility in the current episode, V_w(s_{i+1}) is the reward possibility in the next episode, and γ is the discount rate. Here, r_{i+1} may be configured to be received from the promotion performance prediction artificial neural network device 10.

Accordingly, when the state of the promotion recommendation device changes, the value network may be updated in the direction of gradient descent on the cost function of Equation 3.
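
The interplay between the TD error, the value network update, and the policy network update can be sketched in a few lines. This is a minimal illustration, not the patent's implementation: it assumes a linear value function V_w(s) = w·s and a linear softmax policy, and the names `actor_critic_step`, `lr`, and the toy shapes are hypothetical.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def actor_critic_step(theta, w, s, a, r_next, s_next, gamma=0.9, lr=0.1):
    """One per-episode update of a linear value network and softmax policy.

    theta: (n_actions, n_features) policy parameters (assumed linear policy)
    w:     (n_features,) value parameters (assumed linear value function)
    """
    # TD error: r_{i+1} + gamma * V_w(s_{i+1}) - V_w(s_i)
    delta = r_next + gamma * (w @ s_next) - (w @ s)
    # Value network: semi-gradient step that reduces the squared TD error
    # (the constant factor of the squared-error gradient is folded into lr).
    w_new = w + lr * delta * s
    # Policy network: gradient of log pi(a|s), weighted by the TD error.
    probs = softmax(theta @ s)
    grad_log = -np.outer(probs, s)
    grad_log[a] += s
    theta_new = theta + lr * delta * grad_log
    return theta_new, w_new, delta
```

A positive TD error raises the probability of the action that was taken, and a negative one lowers it, which is why both networks can be updated after every episode.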

According to an embodiment of the present invention, the value network is trained separately from the policy network, and its Q values are supervised rather than starting from random values, which enables fast learning. This greatly reduces the exploration burden for actions that select a combination of highly complex promotion attributes.

According to the promotion recommendation device 11 according to an embodiment of the present invention, when the policy network 110, having completed supervised learning, recommends promotion attributes for the current episode i, the value network 111 is trained to predict the reward of carrying out the recommended promotion attributes. Once trained, the policy network 110 and the value network 111 of the promotion recommendation device 11 are combined with simulation by the optimal promotion search module 112 and used to make the final selection of promotion attributes.

In addition, with the value network 111 according to an embodiment of the present invention, the policy network that outputs the recommended promotion attribute probabilities can be updated every episode. In conventional reinforcement learning, the model is updated only after all episodes have ended, which made it difficult to apply to a promotion recommendation model.

The optimal promotion search module 112 searches for optimal promotion attributes by running multiple simulations over various states and actions, based on a plurality of agents computed by the policy network and the value network. The optimal promotion search module according to an embodiment of the present invention may use, for example, Monte Carlo tree search. Each node of the tree represents a state, and each edge represents the value expected from a specific action in that state. The current state is the root node, and a leaf node is added each time a new action leads to a new state. When Monte Carlo tree search is used, the optimal promotion search may be processed in four stages: Selection, Expansion, Evaluation, and Backup.

In the Selection stage of the optimal promotion search module 112, starting from the current state, the action with the highest value among the selectable actions is chosen repeatedly until a leaf node is reached. The selection uses the value-function value stored on each edge and a visit-frequency value that balances exploration and exploitation. The equation for action selection in the Selection stage is as follows.
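
Equation 4 is not reproduced in this text. A selection rule consistent with the symbols defined for it, given as an assumption, is:

```latex
% Assumed reconstruction of the selection rule (Equation 4)
a_t = \operatorname*{arg\,max}_{a} \bigl( Q(s_t, a) + u(s_t, a) \bigr)
```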

In Equation 4 above, a_t is the action (the promotion carried out) at time t, Q(s_t, a) is the value-function value stored in the tree, and u(s_t, a) is a value inversely proportional to the visit count of the state-action pair, used to balance exploration and exploitation.
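
To make the role of the two terms concrete, the following sketch picks the action that maximizes Q(s_t, a) + u(s_t, a). The exact form of u is not given in the text; the bonus `c * prior[a] / (1 + n[a])`, the constant `c`, and the use of policy network probabilities as `prior` are assumptions chosen only because they are inversely related to the visit count.

```python
def select_action(q, n, prior, c=1.0):
    """Return the action maximizing Q(s, a) + u(s, a) at one tree node.

    q:     dict action -> stored value Q(s, a)
    n:     dict action -> visit count N(s, a)
    prior: dict action -> prior probability (assumed to come from the policy network)
    """
    def bonus(a):
        # Exploration bonus that shrinks as the edge is visited more often.
        return c * prior[a] / (1 + n[a])

    return max(q, key=lambda a: q[a] + bonus(a))
```

An edge that has rarely been visited receives a large bonus and can be selected even when its stored value is slightly lower, which is the exploration-exploitation balance described above.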

In the Expansion stage of the optimal promotion search module 112, once the simulation reaches a leaf node, an action is taken according to the probabilities of the policy network trained by supervised learning, and the resulting new node is added as a leaf node.

In the Evaluation stage of the optimal promotion search module 112, the value of the newly added leaf node is evaluated from two quantities: the value (reward possibility) that the value network estimates for the leaf node, and the reward obtained by running the policy network from the leaf node until the promotion episode ends. The following equation is an example of evaluating the value of a new leaf node.
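
Equation 5 is not reproduced in this text. A convex mix of the two quantities that matches the symbols defined for it, given as an assumption, is:

```latex
% Assumed reconstruction of the leaf evaluation (Equation 5)
V(s_L) = (1 - \lambda)\, v_\theta(s_L) + \lambda\, z_L
```

With λ = 0 the leaf value relies only on the value network, and with λ = 1 it relies only on the simulated reward.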

In Equation 5 above, V(s_L) is the value of the leaf node, λ is a mixing parameter, v_θ(s_L) is the value obtained from the value network, and z_L is the reward obtained by continuing the simulation.

In the Backup stage of the optimal promotion search module 112, the values of the nodes visited during the simulation are re-evaluated to reflect the value of the newly added leaf node, and the visit frequencies are updated. The following equation is an example of node value re-evaluation and visit frequency update.
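
Equation 6 is not reproduced in this text. A visit count and mean-value update consistent with the symbols defined for it, given as an assumption with n denoting the number of simulations, is:

```latex
% Assumed reconstruction of the backup step (Equation 6)
N(s, a) = \sum_{i=1}^{n} \mathbf{1}(s, a, i), \qquad
Q(s, a) = \frac{1}{N(s, a)} \sum_{i=1}^{n} \mathbf{1}(s, a, i)\, V\bigl(s_L^{i}\bigr)
```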

In Equation 6 above, s_L^i is the leaf node in the i-th simulation, and 1(s, a, i) indicates whether the edge (s, a) was visited in the i-th simulation. When the tree search is complete, the algorithm may be configured to select the most visited edge (s, a) from the root node. With the optimal promotion search module 112 according to an embodiment of the present invention, optimal promotion attributes can be selected by first running multiple simulations, based on the value network, over the plurality of promotion attributes screened by the policy network.
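
The bookkeeping described for the Backup stage and the final choice of the most visited edge can be sketched as follows. The class `EdgeStats`, its method names, and the use of a running mean for Q are hypothetical and only illustrate one way to keep visit counts and values per edge (s, a).

```python
from collections import defaultdict

class EdgeStats:
    """Visit counts and accumulated leaf values for tree edges (s, a)."""

    def __init__(self):
        self.visits = defaultdict(int)
        self.value_sum = defaultdict(float)

    def backup(self, path, leaf_value):
        """Propagate one simulation's leaf value to every edge on its path."""
        for edge in path:
            self.visits[edge] += 1
            self.value_sum[edge] += leaf_value

    def q(self, edge):
        """Mean leaf value over the simulations that visited this edge."""
        n = self.visits.get(edge, 0)
        return self.value_sum[edge] / n if n else 0.0

    def most_visited(self, root):
        """Action of the most visited edge leaving the root state."""
        counts = {e: n for e, n in self.visits.items() if e[0] == root}
        return max(counts, key=counts.get)[1]
```

Choosing by visit count rather than by mean value favors actions whose estimates rest on many simulations.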

According to an embodiment of the present invention, the promotion recommendation device 11 may be configured with a plurality of agents. With a plurality of agents, the promotions that the promotion recommendation device recommends for each specific state and specific promotion attribute compete with one another, so that the most suitable product and its promotion can be recommended within a given budget.

FIG. 5 is a flowchart showing an operation example of the promotion recommendation device 11 according to an embodiment of the present invention. As shown in FIG. 5, when the state s(t) is input by the internet shopping mall data hub device 12, which tracks state data across a plurality of internet shopping malls, the plurality of agents of the policy network 110, guided by the value network 111, input various promotion attributes to the optimal promotion search module 112. A promotion is then carried out according to the recommended promotion attribute probability a(t), which is the action output by the optimal promotion search module 112; episode t ends and episode t+1 begins. In episode t+1, s(t+1), the state change caused by a(t), is input by the internet shopping mall data hub device 12, and r(t+1), the reward for a(t), is input immediately to update the value network 111 and the policy network 110.

In addition, according to an embodiment of the present invention, for the promotion image color information included in the recommended promotion attribute data 500, the apparatus 1 for predicting and recommending promotion performance in an online shopping mall using artificial intelligence may receive an actual promotion image, modify its colors based on the promotion image color information, and output the modified promotion image. FIG. 6 is a schematic diagram showing a promotion image modification device 13 according to an embodiment of the present invention. As shown in FIG. 6, the promotion image modification device 13 may be configured to include a generator and a ConvNet encoder.

The generator may be an image generator composed of an encoder and a decoder, such as a VAE or GAN; it receives the actual promotion image 210 and outputs a generated promotion image 212 with changed colors. The generated promotion image 212 is then fed to the ConvNet encoder, which outputs an encoded promotion image 211. The cross entropy between the encoded promotion image 211 and the promotion image color information of the recommended promotion attribute data 500 is computed and output as an error. This error may be fed back to the generator so that it is updated to generate promotion images 212 that better match the promotion image color information.
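
The error signal in this feedback loop can be illustrated with a small cross-entropy computation. The sketch assumes that the encoder output is a vector of class scores (logits) over color categories and that the recommended color is given as a class index; the function name `color_cross_entropy` is hypothetical.

```python
import numpy as np

def color_cross_entropy(logits, target_class):
    """Cross entropy between encoder class scores and the target color class.

    logits:       1-D array of class scores from the encoder (assumed format)
    target_class: index of the recommended promotion image color (assumed format)
    """
    shifted = logits - np.max(logits)          # stabilize the exponentials
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return float(-log_probs[target_class])
```

The loss is small when the encoder assigns a high score to the recommended color, so minimizing it pushes the generator toward images in that color.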

FIG. 7 is a schematic diagram showing an example of a ConvNet encoder according to an embodiment of the present invention. As shown in FIG. 7, the ConvNet encoder may be built as [INPUT-CONV-RELU-POOL-FC]. For the input data, the generated promotion image 212: if the INPUT image is 32 wide, 32 high, and has RGB channels, the input size is [32x32x3]. The CONV layer (Conv. Filter, 101) is connected to local regions of the input image and computes the dot product between each connected region and its weights. The resulting volume has a size such as [32x32x12]. The RELU layer applies an activation function such as max(0,x) to each element. It does not change the volume size ([32x32x12]) and produces Activation map 1 (102). The POOL layer (pooling, 103) downsamples along the width and height dimensions and outputs a reduced volume such as [16x16x12] (Activation map 2, 104). The FC (fully-connected) layer 105 computes class scores and outputs a volume of size [1x1x10] (output layer, 106). The "10" corresponds to class scores for 10 categories (the promotion image color information according to an embodiment of the present invention). The FC layer is connected to every element of the previous volume.

In this way, the ConvNet transforms the original image, made up of pixel values, into class scores (the promotion image color information according to an embodiment of the present invention) as it passes through each layer. Some layers have parameters and others do not. In particular, the CONV/FC layers apply a transformation that depends not only on the input volume but also on weights and biases, whereas the RELU/POOL layers are fixed functions. The parameters of the CONV/FC layers are learned by gradient descent so that the class scores for each image match the label of that image.
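
The volume sizes quoted for the [INPUT-CONV-RELU-POOL-FC] stack can be checked with simple shape arithmetic. The text gives only the sizes, so the 3x3 kernel with padding 1 for CONV and the 2x2 window with stride 2 for POOL are assumptions chosen to reproduce them; `conv_out` and `convnet_shapes` are hypothetical helper names.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling window."""
    return (size + 2 * pad - kernel) // stride + 1

def convnet_shapes(side=32, channels=3, filters=12, classes=10):
    """Return the volume shape after each layer of INPUT-CONV-RELU-POOL-FC."""
    shapes = [(side, side, channels)]                 # INPUT
    conv = conv_out(side, kernel=3, stride=1, pad=1)  # CONV (assumed 3x3, pad 1)
    shapes.append((conv, conv, filters))
    shapes.append((conv, conv, filters))              # RELU keeps the size
    pool = conv_out(conv, kernel=2, stride=2)         # POOL (assumed 2x2, stride 2)
    shapes.append((pool, pool, filters))
    shapes.append((1, 1, classes))                    # FC class scores
    return shapes

def fc_input_count(side=32, filters=12):
    """Number of values the FC layer reads from the pooled volume."""
    pool = conv_out(conv_out(side, 3, 1, 1), 2, 2)
    return pool * pool * filters
```

Under these assumptions the FC layer reads 16 x 16 x 12 = 3072 values, each connected to all 10 class scores.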

The parameters of the CONV layer consist of a set of learnable filters. Each filter is small along the width and height dimensions but spans the full depth of the input. During the forward pass, each filter is slid (more precisely, convolved) across the width and height of the input volume, producing a two-dimensional activation map. As the filter slides over the input, a dot product is computed between the filter and the input volume. Through this process, the ConvNet learns filters that activate for a specific pattern at a specific location of the input data. Stacking these activation maps along the depth dimension gives the output volume. Each element of the output volume therefore covers only a small region of the input, and neurons in the same activation map share the same parameters because they result from applying the same filter.
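
The forward pass described above can be written as an explicit sliding dot product. This is a deliberately naive sketch with stride 1 and no padding; the array layout (height, width, channels) and the function name `conv_forward` are assumptions made for illustration.

```python
import numpy as np

def conv_forward(x, filters, bias):
    """Naive CONV forward pass (stride 1, no padding).

    x:       (H, W, C) input volume
    filters: (K, F, F, C) learnable filters, each spanning the full depth C
    bias:    (K,) one bias per filter
    Returns a (H-F+1, W-F+1, K) volume: one activation map per filter.
    """
    H, W, _ = x.shape
    K, F = filters.shape[0], filters.shape[1]
    out = np.zeros((H - F + 1, W - F + 1, K))
    for k in range(K):                      # one activation map per filter
        for i in range(H - F + 1):
            for j in range(W - F + 1):
                patch = x[i:i + F, j:j + F, :]
                # Dot product between the filter and the local input region.
                out[i, j, k] = np.sum(patch * filters[k]) + bias[k]
    return out
```

Every position in one activation map reuses the same filter weights, which is the parameter sharing the paragraph refers to.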

FIG. 8 is a flowchart illustrating a promotion recommendation method according to an embodiment of the present invention. As shown in FIG. 8, the promotion recommendation method may include a promotion performance prediction step (S10), a state information receiving step (S11), a recommended promotion attribute data output step (S12), a state and reward information receiving step (S13), and a policy network and value network updating step (S14).

In the promotion performance prediction step (S10), the promotion performance prediction artificial neural network device receives existing promotion attribute data and performance data, predicts the promotion performance for specific promotion attributes, and transmits the prediction to the promotion recommendation device 11. The predicted promotion performance data may be used as initial values for the policy network and the value network.

In the state information receiving step (S11), the promotion recommendation device 11 receives state information (s_t) for episode t from the internet shopping mall data hub device.

In the recommended promotion attribute data output step (S12), the promotion recommendation device 11 outputs, through the optimal promotion search module 112, recommended promotion attribute data (a_t) that includes the recommended promotion attribute probabilities for episode t.

In the state and reward information receiving step (S13), after a promotion has been applied to the internet shopping mall according to the recommended promotion attribute data, the promotion recommendation device 11 receives, from the internet shopping mall data hub device, state information (s_{t+1}) for episode t+1 and reward information (r_{t+1}) for a_t.

In the policy network and value network updating step (S14), the policy network and the value network are updated based on the received state information (s_{t+1}) and the reward information (r_{t+1}) for a_t.
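
Steps S11 to S14 form a loop that repeats once per episode. The skeleton below shows that control flow only; `ToyHub` and `ToyRecommender` are hypothetical stand-ins for the data hub device and the promotion recommendation device, with a made-up reward rule and a simple average-reward recommender in place of the policy network, value network, and tree search.

```python
class ToyHub:
    """Stand-in for the data hub: tracks a single sales figure as the state."""

    def __init__(self):
        self.sales = 10

    def observe(self):
        return self.sales

    def apply(self, action):
        # Made-up effect of each promotion; the reward is the change in sales.
        gain = {"discount": 2, "gift": 1}[action]
        self.sales += gain
        return self.sales, gain

class ToyRecommender:
    """Stand-in recommender: tries each action once, then picks the best average."""

    def __init__(self, actions):
        self.totals = {a: 0.0 for a in actions}
        self.counts = {a: 0 for a in actions}

    def recommend(self, state):
        untried = [a for a in self.counts if self.counts[a] == 0]
        if untried:
            return untried[0]
        return max(self.totals, key=lambda a: self.totals[a] / self.counts[a])

    def update(self, state, action, reward, next_state):
        self.totals[action] += reward
        self.counts[action] += 1

def run_episodes(hub, recommender, n_episodes):
    state = hub.observe()                                      # S11
    history = []
    for t in range(n_episodes):
        action = recommender.recommend(state)                  # S12
        next_state, reward = hub.apply(action)                 # S13
        recommender.update(state, action, reward, next_state)  # S14
        history.append((t, action, reward))
        state = next_state
    return history
```

Because the update in S14 happens inside the loop, the recommendation for episode t+1 already reflects the reward observed for episode t.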

From a reinforcement learning perspective, the objective of the promotion recommendation method according to an embodiment of the present invention is to improve promotion performance. The state may mean current performance data (quantity sold, sales amount, inflow (page views), conversion rate, average sales price, sales profit, etc.) and environmental data (date, day of the week, weather, the product currently under promotion, the shopping mall, and information on other products in the shopping mall). The action may mean promotion attribute data (promotion category (discount promotion, free gift promotion, etc.), promotion period, promotion shopping mall, exposure account, promotion image color, etc.). The reward may mean a rise or fall in the performance data over a specific period.

As described above, those skilled in the art to which the present invention pertains will understand that the present invention can be implemented in other specific forms without changing its technical spirit or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. The scope of the present invention is indicated by the claims below rather than by the detailed description, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be interpreted as falling within the scope of the present invention.

The features and advantages described in this specification are not all-inclusive; in particular, many additional features and advantages will be apparent to those skilled in the art in view of the drawings, specification, and claims. Moreover, it should be noted that the language used in this specification has been selected principally for readability and instructional purposes, and may not have been selected to delineate or limit the subject matter of the invention.

The foregoing description of the embodiments of the present invention has been presented for purposes of illustration. It is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Those skilled in the art will understand that many modifications and variations are possible in light of the above disclosure.

The scope of the present invention is therefore limited not by the detailed description but by any claims of an application based on it. Accordingly, the disclosure of the embodiments of the present invention is illustrative and does not limit the scope of the present invention set forth in the following claims.

1: Apparatus for predicting and recommending promotion performance in an online shopping mall using artificial intelligence
10: Promotion performance prediction artificial neural network device
11: Promotion recommendation device
12: Internet shopping mall data hub device
13: Promotion image modification device
100: Product attribute data
110: Policy network
111: Value network
112: Optimal promotion search module
200: Promotion attribute data
300: Performance data
400: Promotion performance prediction data
500: Recommended promotion attribute data

Claims

A memory module that stores program code of a promotion performance prediction artificial neural network module and a promotion recommendation module, trained in advance on promotion attribute data and performance data for at least one online shopping mall; and
A processing module that predicts the performance of a specific promotion and recommends promotion attributes by executing the program code of the promotion performance prediction artificial neural network module and the promotion recommendation module;
are included, wherein
The program code of the promotion recommendation module is configured to be performed on a computer and includes:
A promotion performance prediction step of receiving promotion performance prediction data, which is information on the promotion performance that the promotion performance prediction artificial neural network module predicts for a specific promotion attribute;
A state information receiving step of receiving first state information (s_t) for episode t;
An attribute probability output step of generating reward possibility information in a value network based on the first state information and the promotion performance prediction data, and then outputting, in a policy network based on the reward possibility information, a plurality of recommended promotion attribute probabilities corresponding to a plurality of actions;
A recommended promotion attribute data output step of selecting and outputting recommended promotion attribute data (a_t) for the episode t through optimal promotion search based on the plurality of recommended promotion attribute probabilities;
A state and reward information receiving step of receiving, after the promotion according to the recommended promotion attribute data is applied, second state information (s_{t+1}) for episode t+1 and reward information (r_{t+1}) for the recommended promotion attribute data (a_t); and
A policy network and value network updating step of updating the policy network and the value network based on the received second state information (s_{t+1}) and the reward information (r_{t+1}),
The first state information and the second state information refer to performance data (quantity, sales amount, page views, conversion rate, average sales price, or sales profit) and environmental data (date information, day-of-the-week information, weather information, information on the product under promotion, shopping mall information, or information on other products in the shopping mall),
The recommended promotion attribute data refers to a promotion category, promotion period, promotion shopping mall, exposure account, or promotion image color,
The reward information means an increase or decrease in the performance data over a specific period,
The policy network is an artificial neural network that outputs the recommended promotion attribute probability in each state, and the cost function of the policy network is configured as in Equation 1 below,
The value network is an artificial neural network that generates the reward possibility information, which is a possibility of achieving a reward in each state, and the cost function of the value network is configured as in Equation 2 below,
An apparatus for predicting and recommending promotion performance in an online shopping mall using a policy network and a value network:
[Equation 1]

[Equation 2]

In Equation 1, π is the policy function, θ is the policy network parameter, π_θ(a_i|s_i) is the probability of performing a specific action (a promotion with a specific attribute) in the current episode, V is the value function, w is the value network parameter, s_i is the state information of the current episode i, s_{i+1} is the state information of the next episode i+1, r_{i+1} is the reward expected to be obtained in the next episode, V_w(s_i) is the reward possibility in the current episode, V_w(s_{i+1}) is the reward possibility in the next episode, and γ is the discount factor,
In Equation 2, V is the value function, w is the value network parameter, s_i is the state information of the current episode i, s_{i+1} is the state information of the next episode i+1, r_{i+1} is the reward expected to be obtained in the next episode, V_w(s_i) is the reward possibility in the current episode, V_w(s_{i+1}) is the reward possibility in the next episode, and γ is the discount factor.
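
For illustration only, the recommendation steps of the claim can be sketched as a one-step actor-critic update. Equations 1 and 2 are not reproduced in this text, so the sketch assumes the common temporal-difference form built from the symbols defined above: δ_i = r_{i+1} + γ·V_w(s_{i+1}) − V_w(s_i), a per-step policy cost of −log π_θ(a_i|s_i)·δ_i, and a per-step value cost of δ_i². The linear policy and value functions, the `ActorCritic` class, the learning rate, the example numbers, and sampling an action directly from the policy probabilities (in place of the optimal promotion search) are all hypothetical simplifications, not the claimed implementation.

```python
import math
import random


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def td_error(reward, v_now, v_next, gamma):
    """delta_i = r_{i+1} + gamma * V_w(s_{i+1}) - V_w(s_i)."""
    return reward + gamma * v_next - v_now


def policy_cost(prob_of_action, delta):
    """Assumed per-step form of Equation 1: -log pi(a_i|s_i) * delta_i."""
    return -math.log(prob_of_action) * delta


def value_cost(delta):
    """Assumed per-step form of Equation 2: squared TD error."""
    return delta ** 2


class ActorCritic:
    """Hypothetical linear stand-ins for the policy network and value network."""

    def __init__(self, n_features, n_actions, lr=0.05, gamma=0.9, seed=0):
        self.theta = [[0.0] * n_features for _ in range(n_actions)]  # policy parameters
        self.w = [0.0] * n_features  # value parameters
        self.lr, self.gamma = lr, gamma
        self.rng = random.Random(seed)

    def value(self, state):
        return dot(self.w, state)

    def action_probs(self, state):
        return softmax([dot(row, state) for row in self.theta])

    def recommend(self, state):
        """Sample an action index a_t (simplification: no separate search step)."""
        probs = self.action_probs(state)
        return self.rng.choices(range(len(probs)), weights=probs)[0]

    def update(self, state, action, reward, next_state):
        """One gradient-descent step on both assumed costs; returns the TD error."""
        delta = td_error(reward, self.value(state), self.value(next_state), self.gamma)
        probs = self.action_probs(state)
        # Policy: d(log pi(a|s))/d(theta_k) = (1[k == a] - p_k) * state for a softmax policy.
        for k, row in enumerate(self.theta):
            grad_log = (1.0 if k == action else 0.0) - probs[k]
            for j, x in enumerate(state):
                row[j] += self.lr * delta * grad_log * x
        # Value: semi-gradient of delta^2 (constant factor 2 folded into lr).
        for j, x in enumerate(state):
            self.w[j] += self.lr * delta * x
        return delta


# Usage with made-up numbers: two state features, three candidate promotion attributes.
agent = ActorCritic(n_features=2, n_actions=3)
s_t, s_next = [1.0, 0.5], [1.0, 0.6]
a_t = agent.recommend(s_t)
delta = agent.update(s_t, a_t, reward=1.0, next_state=s_next)
```

In this sketch a positive TD error raises the probability of the promotion attribute that was just applied, and a negative one lowers it, which is the qualitative behavior the updating step describes.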