KR101926403B1

KR101926403B1 - Promotion Performance Prediction and Recommendation Apparatus in Online Shopping Mall Using Artificial Intelligence

Info

Publication number: KR101926403B1
Application number: KR1020180049454A
Authority: KR
Inventors: 남기헌; 남기준
Original assignee: 남정우; 남기헌; 남기준
Priority date: 2018-04-27
Filing date: 2018-04-27
Publication date: 2018-12-07

Abstract

The present invention relates to a promotion performance prediction and recommendation apparatus in an online shopping mall using artificial intelligence and a method thereof. According to an embodiment of the present invention, the promotion of the online shopping mall becomes elaborate and precise by the learning of the artificial intelligence, thereby raising a promotion effect. The promotion performance prediction and recommendation apparatus includes a memory module for storing a program code and a processing module for predicting the performance of a specific promotion and processing a promotion attribute.

Description

{Promotion Performance Prediction and Recommendation Apparatus in Online Shopping Mall Using Artificial Intelligence}

The present invention relates to an apparatus and a method for predicting promotion performance and recommending promotions in an online shopping mall using artificial intelligence.

Promotions in online shopping malls take the form of product discounts, prize events, discount coupons, mileage (reward point) grants, and the like. Because such promotions always reduce the net profit of the online shopping mall, the type of promotion, its duration, and its degree must be designed precisely to fit the target persona. Until now, however, promotion design has been decided empirically by online shopping mall operators or promotion agencies.

Online shopping in Korea involves many kinds of companies, such as open markets, social commerce sites, general shopping malls, large-mart shopping malls, and card point malls. Most sellers sell through several of these online shopping malls at once, which makes it difficult to analyze sales performance on the basis of unified data.

The components of a promotion concern its timing and conditions, and they stand in a causal relationship with the promotion result. However, even timing alone, which is only one of the components, involves many factors such as season, month, day of the week, time of day, duration, and public holidays, so it is very difficult for a person to calculate from the results how much each component contributed.

In particular, for the promotion discount rate, which acts as a benefit to the consumer, a larger discount rate yields a larger purchase benefit, so the quantity sold, and with it sales, will increase. Considering the cost of the product, however, it is important to strike a balance between the increase in quantity sold and the discount rate, and it is very difficult for a person to calculate whether the expected sales profit can then be achieved.

Web advertisement recommendation method and system based on click patterns using a collaborative filtering system having a neural network, Korean Registered Patent No. 10-0792700, Naver Corporation

Accordingly, an object of the present invention is to provide an apparatus and a method for predicting promotion performance and recommending promotions in an online shopping mall using artificial intelligence, which predict the performance of a promotion and recommend the characteristics of a promotion by means of artificial intelligence, in order to overcome the low precision of online shopping mall promotions that have so far been designed empirically.

Specific means for achieving the object of the present invention are described below.

The object of the present invention can be achieved by providing an apparatus for predicting promotion performance and recommending promotions in an online shopping mall using artificial intelligence, comprising: a memory module that stores program code of an artificial neural network pre-trained on product attribute data, promotion attribute data, other performance data, and promotions; and a processing module that executes the program code of the artificial neural network to predict the performance of a specific promotion; wherein the program code is configured to be executed on a computer and includes: an input step of receiving product attribute data (100), promotion attribute data (200), and performance data (300); a performance prediction step of predicting the performance of the specific promotion on the basis of the product attribute data, the promotion attribute data, and the performance data to generate promotion performance prediction data; and an output step of outputting the predicted promotion performance prediction data.

The object can also be achieved by providing an apparatus for predicting promotion performance and recommending promotions in an online shopping mall using artificial intelligence, comprising: a memory module that stores program code of a promotion performance prediction artificial neural network module and a promotion recommendation module pre-trained on promotion attribute data and performance data; and a processing module that executes the program code of the promotion performance prediction artificial neural network module and the promotion recommendation module to predict the performance of a specific promotion and recommend promotion attributes; wherein the program code of the promotion recommendation module is configured to be executed on a computer and includes: a promotion performance prediction step of receiving promotion performance prediction data, which is information on the promotion performance predicted by the promotion performance prediction artificial neural network module for a specific promotion attribute; a state information receiving step of receiving state information (s_t) for episode t; an attribute probability output step of generating reward possibility information in a value network on the basis of the state information and the promotion performance prediction data, and then outputting from a policy network, on the basis of the reward possibility information, recommended promotion attribute probabilities for a plurality of actions; a recommended promotion attribute data output step of selecting and outputting recommended promotion attribute data (a_t) for episode t through an optimal promotion search based on the plurality of recommended promotion attribute probabilities; a state and reward information receiving step of receiving, after the promotion according to the recommended promotion attribute data has been applied, state information (s_{t+1}) for episode t+1 and reward information (r_{t+1}) for a_t; and a policy network and value network update step of updating the policy network and the value network on the basis of the received state information (s_{t+1}) and reward information (r_{t+1}); wherein the state information means performance data (quantity sold, sales amount, page views, conversion rate, average sales price, or gross profit) and environment data (date information, day-of-week information, weather information, information on products currently under promotion, shopping mall information, or information on other products in the shopping mall), the recommended promotion attribute data means a promotion category, a promotion period, a promotion shopping mall, an exposure slot, or a promotion image color, and the reward information means a rise or fall in the performance data over a specific period.
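The per-episode loop described above (receive s_t, choose a_t, observe s_{t+1} and r_{t+1}, then update both networks) can be illustrated with a minimal one-step actor-critic sketch. This is not the patent's implementation: the two neural networks are replaced by small lookup tables, the optimal promotion search is replaced by plain sampling, and the states, actions, reward rule, and learning rates are all hypothetical.

```python
import math
import random

ACTIONS = ["discount", "gift", "coupon"]   # hypothetical promotion categories
STATES = ["weekday", "weekend"]            # hypothetical, heavily simplified states

theta = {s: {a: 0.0 for a in ACTIONS} for s in STATES}  # stand-in for policy network
V = {s: 0.0 for s in STATES}                            # stand-in for value network

def policy(s):
    """Softmax over action preferences: recommended-attribute probabilities."""
    z = [theta[s][a] for a in ACTIONS]
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]

def environment(s, a, rng):
    """Toy stand-in for the shopping mall: returns (s_{t+1}, r_{t+1}). Made up."""
    good = {("weekday", "coupon"), ("weekend", "discount")}
    return rng.choice(STATES), (1.0 if (s, a) in good else -1.0)

def run(episodes=3000, gamma=0.5, lr_pi=0.1, lr_v=0.1, seed=0):
    rng = random.Random(seed)
    s = "weekday"
    for _ in range(episodes):
        p = policy(s)                                # attribute probabilities
        a = rng.choices(ACTIONS, weights=p)[0]       # a_t (sampling, not tree search)
        s_next, r = environment(s, a, rng)           # s_{t+1}, r_{t+1}
        td = r + gamma * V[s_next] - V[s]            # temporal-difference error
        V[s] += lr_v * td                            # value update
        for b, pb in zip(ACTIONS, p):                # policy update, every episode
            theta[s][b] += lr_pi * td * ((1.0 if b == a else 0.0) - pb)
        s = s_next

run()
print({s: [round(x, 2) for x in policy(s)] for s in STATES})
```

Because both tables are updated after every single episode, the sketch also shows the per-episode update that the specification contrasts with updating only after all episodes have finished.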

In addition, the optimal promotion search in the recommended promotion attribute data output step may use Monte Carlo tree search, in which each node of the tree represents a state and each edge represents the value expected from taking a specific action in a specific state, the current state is placed at the root node, and a leaf node is added each time a new action leads to a new state. The recommended promotion attribute data output step may further include: a selection step of proceeding from the current state, choosing at each node the highest-value action among the selectable actions, until a leaf node is reached; an expansion step of, once the simulation has reached the leaf node, acting according to the probabilities of the policy network trained by supervised learning and adding a new leaf node; an evaluation step of evaluating the value of the new leaf node by computing the reward possibility information determined from the new leaf node using the value network and the reward obtained by proceeding from the leaf node with the policy network until the promotion episode ends; and an update step of re-evaluating the values of the nodes visited during the simulation to reflect the value of the new leaf node and updating their visit counts; and the recommended promotion attribute data (a_t) may be selected on the basis of the visit counts.
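The four steps above (selection, expansion, evaluation, update) can be sketched as follows. This is an illustrative toy only: `policy_probs`, `value_estimate`, and `reward` are hypothetical stand-ins for the policy network, the value network, and the observed promotion outcome, and the action names, horizon, and 50/50 mixing weight are assumptions. Selection is purely greedy, as in the description; practical implementations usually add an exploration bonus.

```python
import random

ACTIONS = ["discount_10", "discount_20", "gift"]  # hypothetical promotion attributes
HORIZON = 3                                       # hypothetical episode length

def policy_probs(state):
    """Stand-in for the policy network (fixed prior over actions)."""
    return {"discount_10": 0.3, "discount_20": 0.5, "gift": 0.2}

def value_estimate(state):
    """Stand-in for the value network's reward-possibility estimate."""
    return 0.1 * state.count("discount_20")

def reward(state):
    """Toy reward for a finished promotion episode."""
    return state.count("discount_20") - 0.5 * state.count("gift")

class Node:
    def __init__(self, state):
        self.state = state      # tuple of actions taken so far
        self.children = {}      # action -> Node
        self.visits = 0
        self.value = 0.0        # running mean of evaluations

def rollout(state, rng):
    """Follow the policy stand-in until the episode ends, then score it."""
    while len(state) < HORIZON:
        probs = policy_probs(state)
        state = state + (rng.choices(list(probs), weights=list(probs.values()))[0],)
    return reward(state)

def simulate(root, rng, mix=0.5):
    node, path = root, [root]
    # Selection: follow the highest-value child until a leaf is reached.
    while len(node.children) == len(ACTIONS) and len(node.state) < HORIZON:
        node = max(node.children.values(), key=lambda c: c.value)
        path.append(node)
    # Expansion: add one new leaf, chosen by the policy probabilities.
    if len(node.state) < HORIZON:
        probs = policy_probs(node.state)
        untried = [a for a in ACTIONS if a not in node.children]
        a = rng.choices(untried, weights=[probs[x] for x in untried])[0]
        node.children[a] = Node(node.state + (a,))
        node = node.children[a]
        path.append(node)
    # Evaluation: mix the value estimate with a rollout reward.
    leaf_value = (1 - mix) * value_estimate(node.state) + mix * rollout(node.state, rng)
    # Update: refresh values and visit counts along the visited path.
    for n in path:
        n.visits += 1
        n.value += (leaf_value - n.value) / n.visits

def search(n_sim=200, seed=0):
    rng = random.Random(seed)
    root = Node(())
    for _ in range(n_sim):
        simulate(root, rng)
    return root

root = search()
best = max(root.children, key=lambda a: root.children[a].visits)  # pick by visit count
print(best, {a: c.visits for a, c in root.children.items()})
```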

In addition, the policy network may be configured to undergo supervised learning on the basis of the promotion attribute data and the performance data, with a random vector included.

Another object of the present invention can be achieved by providing a method for predicting promotion performance and recommending promotions in an online shopping mall using artificial intelligence, the method being configured to be executed on a computer and including: a promotion performance prediction step in which a promotion recommendation module receives promotion performance prediction data, which is information on the promotion performance predicted by a promotion performance prediction artificial neural network module for a specific promotion attribute; a state information receiving step in which the promotion recommendation module receives state information (s_t) for episode t; an attribute probability output step in which the promotion recommendation module generates reward possibility information in a value network on the basis of the state information and the promotion performance prediction data, and then outputs from a policy network, on the basis of the reward possibility information, recommended promotion attribute probabilities for a plurality of actions; a recommended promotion attribute data output step in which the promotion recommendation module selects and outputs recommended promotion attribute data (a_t) for episode t through an optimal promotion search based on the plurality of recommended promotion attribute probabilities; a state and reward information receiving step in which, after the promotion according to the recommended promotion attribute data has been applied, the promotion recommendation module receives state information (s_{t+1}) for episode t+1 and reward information (r_{t+1}) for a_t; and a policy network and value network update step in which the promotion recommendation module updates the policy network and the value network on the basis of the received state information (s_{t+1}) and reward information (r_{t+1}); wherein the state information means performance data (quantity sold, sales amount, page views, conversion rate, average sales price, or gross profit) and environment data (date information, day-of-week information, weather information, information on products currently under promotion, shopping mall information, or information on other products in the shopping mall), the recommended promotion attribute data means a promotion category, a promotion period, a promotion shopping mall, an exposure slot, or a promotion image color, and the reward information means a rise or fall in the performance data over a specific period.

As described above, the present invention has the following effects.

First, according to an embodiment of the present invention, online shopping mall promotions become more elaborate and precise through the learning of the artificial intelligence, which increases the effect of the promotions.

Second, according to an embodiment of the present invention, the policy network, which outputs the probabilities of recommended promotion attributes according to the value network (111), can be updated at every episode. In conventional reinforcement learning the model is updated only after all episodes have ended, which made it difficult to apply to a promotion recommendation model.

Third, according to an embodiment of the present invention, although promotion recommendation has both short-term and long-term goals, so that the MDP assumption does not hold and conventional DQN is difficult to apply, the dependent relationship between the policy network and the value network makes it possible to apply reinforcement learning to promotion recommendation as well.

Fourth, according to an embodiment of the present invention, the policy network and the value network are connected to the promotion performance prediction artificial neural network device and are trained on the basis of the promotion performance prediction data, which improves training speed.

The following drawings attached to this specification illustrate preferred embodiments of the present invention and, together with the detailed description, serve to further the understanding of its technical idea; the present invention should therefore not be construed as limited to the matters shown in the drawings.
FIG. 1 shows an apparatus (1) for predicting promotion performance and recommending promotions in an online shopping mall using artificial intelligence according to an embodiment of the present invention.
FIG. 2 is a schematic diagram showing the training process of a promotion performance prediction artificial neural network device according to an embodiment of the present invention.
FIG. 3 is a schematic diagram showing the inference process of a promotion performance prediction artificial neural network device according to an embodiment of the present invention.
FIG. 4 is a schematic diagram showing a promotion recommendation device (11) according to an embodiment of the present invention.
FIG. 5 is a flowchart showing an example of the operation of the promotion recommendation device (11) according to an embodiment of the present invention.
FIG. 6 is a schematic diagram showing a promotion image modification device (13) according to an embodiment of the present invention.
FIG. 7 is a schematic diagram showing an example of a ConvNet encoder according to an embodiment of the present invention.
FIG. 8 is a flowchart showing a promotion recommendation method according to an embodiment of the present invention.

Hereinafter, embodiments that allow a person of ordinary skill in the art to which the present invention pertains to practice the invention easily are described in detail with reference to the accompanying drawings. In describing the operating principle of the preferred embodiments in detail, however, a detailed description of related known functions or configurations is omitted where it is judged that it could unnecessarily obscure the gist of the present invention.

In addition, the same reference numerals are used throughout the drawings for parts having similar functions and operations. Throughout the specification, when a part is said to be connected to another part, this includes not only a direct connection but also an indirect connection with another element in between. Furthermore, stating that a part includes a certain component means that it may further include other components, not that other components are excluded, unless specifically stated otherwise.

Apparatus and Method for Promotion Performance Prediction and Recommendation in an Online Shopping Mall Using Artificial Intelligence

FIG. 1 shows an apparatus (1) for predicting promotion performance and recommending promotions in an online shopping mall using artificial intelligence according to an embodiment of the present invention. As shown in FIG. 1, the apparatus may include a promotion performance prediction artificial neural network device (10), pre-trained on product attribute data, promotion attribute data, performance data, and promotions, and a promotion recommendation device (11).

Regarding the promotion performance prediction artificial neural network device (10), FIG. 2 is a schematic diagram showing its training process and FIG. 3 is a schematic diagram showing its inference process according to an embodiment of the present invention. As shown in FIGS. 2 and 3, the method for predicting promotion performance in an online shopping mall using artificial intelligence receives product attribute data (100), promotion attribute data (200), and performance data (300) to train the promotion performance prediction artificial neural network device (10), and then inputs product attribute data (100) and promotion attribute data (200) into the trained device (10) to output predicted promotion performance prediction data (400). In particular, training of the device (10) may proceed by updating the weights of the hidden layer on the basis of the error between the promotion performance data and the output data. In FIG. 2, each attribute making up the product attribute data (100) and the promotion attribute data (200) is input to a node of the input layer such as x1, x2, or x3, passes through a hidden layer such as h1, h2, and h3 according to weights such as w1, and the promotion performance prediction data (400), predicted with a cost function based on softmax or the like, is output at the output layer y1. The weights of the promotion performance prediction artificial neural network device (10) may be updated by backpropagation on the basis of the error (−Σ y_i log p_i) between the predicted performance data and the actual performance data (300).
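As a rough illustration of the training loop just described (input layer, hidden layer, softmax output, error −Σ y_i log p_i, weight update by backpropagation), the following minimal sketch trains a tiny network in plain Python. The layer sizes, the tanh activation, the learning rate, the feature encoding, and the four training rows are all assumptions made for the example and do not come from the patent.

```python
import math
import random

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

class TinyMLP:
    """One hidden layer (tanh), softmax output, trained by backpropagation."""
    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in self.w1]
        p = softmax([sum(w * hi for w, hi in zip(row, h)) for row in self.w2])
        return h, p

    def train_step(self, x, y, lr=0.2):
        h, p = self.forward(x)
        loss = -sum(yi * math.log(pi) for yi, pi in zip(y, p))  # -sum(y_i log p_i)
        dz = [pi - yi for pi, yi in zip(p, y)]   # gradient at the output logits
        dh = [sum(dz[k] * self.w2[k][j] for k in range(len(dz))) * (1 - h[j] ** 2)
              for j in range(len(h))]            # gradient pushed back to the hidden layer
        for k in range(len(dz)):
            for j in range(len(h)):
                self.w2[k][j] -= lr * dz[k] * h[j]
        for j in range(len(h)):
            for i in range(len(x)):
                self.w1[j][i] -= lr * dh[j] * x[i]
        return loss

# Hypothetical rows: [discount rate, is_weekend, constant 1.0 as bias] -> (low, high)
data = [([0.05, 0.0, 1.0], [1, 0]), ([0.10, 1.0, 1.0], [1, 0]),
        ([0.40, 0.0, 1.0], [0, 1]), ([0.50, 1.0, 1.0], [0, 1])]

net = TinyMLP(n_in=3, n_hidden=4, n_out=2)
history = []
for epoch in range(300):
    history.append(sum(net.train_step(x, y) for x, y in data) / len(data))
print(round(history[0], 3), round(history[-1], 3))
```

The output here is a probability over two performance buckets; a real predictor could equally well regress a numeric performance value, which would call for a different output layer and loss.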

Alternatively, a separate promotion performance prediction artificial neural network device (10) may be configured for each product; the device (10) is trained on promotion attribute data (200) and performance data (300), and promotion attribute data (200) is then input into the trained device to output predicted promotion performance prediction data (400). Besides the artificial neural network, machine learning methods such as a multi-layer perceptron, naive Bayes classification, or random forest classification may be used.

The product attribute data (100) may include the product category, the product selling price, the product cost, and the like.

The promotion attribute data (200) may include the promotion category (discount promotion, free-gift promotion, etc.), the promotion period, the promotion shopping mall, the exposure slot, the promotion image, and the like. Among the promotion attribute data according to an embodiment of the present invention, the promotion image data may mean the result of classifying the image into a specific mood category and a specific color-tone category by a pre-trained CNN (Convolutional Neural Network).

The performance data (300) may include quantity sold, sales amount, page views, conversion rate, average sales price, gross profit, and the like. According to an embodiment of the present invention, to prevent problems arising from multicollinearity in the performance data (300), PCA may be applied to the individual data items making up the performance data (300) so that the collinearity is cancelled out in the form of a diagonal matrix.
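To make the PCA remark concrete, the sketch below decorrelates two strongly collinear performance metrics. It assumes only two metrics so that the principal axes can be found in closed form, and the sample figures are invented for illustration; a real pipeline would typically standardize the columns first and use a linear algebra library for more than two metrics.

```python
import math

def covariance(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

def pca_2d(xs, ys):
    """Rotate two centered variables onto their principal axes.

    For a 2x2 covariance matrix [[a, b], [b, c]] the principal axis lies at
    angle 0.5 * atan2(2b, a - c); after rotation the covariance is diagonal.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cx = [x - mx for x in xs]
    cy = [y - my for y in ys]
    a, b, c = covariance(xs, xs), covariance(xs, ys), covariance(ys, ys)
    angle = 0.5 * math.atan2(2 * b, a - c)
    cos_t, sin_t = math.cos(angle), math.sin(angle)
    pc1 = [cos_t * x + sin_t * y for x, y in zip(cx, cy)]
    pc2 = [-sin_t * x + cos_t * y for x, y in zip(cx, cy)]
    return pc1, pc2

# Hypothetical daily figures: quantity sold and sales amount move together.
quantity = [10, 12, 15, 20, 22]
sales = [100.0, 125.0, 148.0, 205.0, 215.0]

pc1, pc2 = pca_2d(quantity, sales)
print(round(covariance(quantity, sales), 2))  # strongly correlated before PCA
print(abs(covariance(pc1, pc2)) < 1e-6)       # off-diagonal term vanishes after PCA
```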

Regarding promotion attribute recommendation, a promotion recommendation device (11) may be configured that uses the promotion performance prediction artificial neural network device to output recommended promotion attribute data for which the highest promotion performance prediction data (400) is output.

Promotion recommendation by the promotion recommendation device (11) according to an embodiment of the present invention may be performed using reinforcement learning. FIG. 4 is a schematic diagram showing the promotion recommendation device (11) according to an embodiment of the present invention. As shown in FIG. 4, the promotion recommendation device (11) may include a value network (111), an artificial neural network that learns a value function outputting the value of a given state, and a policy network (110), which learns a policy function outputting the probabilities of recommended promotion attributes. The policy network (110) and the value network (111) according to an embodiment of the present invention may be configured to be connected to the promotion performance prediction artificial neural network device (10). In addition, the policy network (110) and the value network (111) may be connected to an optimal promotion search module (12) to output recommended promotion attribute data (500).

From the standpoint of reinforcement learning, the objective of the promotion recommendation device (11) according to an embodiment of the present invention is to improve promotion performance. The state may mean the current performance data (quantity sold, sales amount, page views, conversion rate, average sales price, gross profit, etc.) and environment data (date, day of the week, weather, products currently under promotion, shopping mall, and information on other products in the shopping mall). The action may mean promotion attribute data (promotion category (discount promotion, free-gift promotion, etc.), promotion period, promotion shopping mall, exposure slot, promotion image color, etc.). The reward may mean a rise or fall in the performance data over a specific period.
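One way to picture this state, action, and reward formulation is as plain data records. The sketch below is only an illustration: the field names, the subset of fields shown, the example values, and the choice of sales amount as the default reward metric are assumptions, not part of the specification.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PerformanceData:
    quantity: int          # quantity sold
    sales_amount: float    # sales amount
    page_views: int        # page views (inflow)
    gross_profit: float    # gross profit

    @property
    def conversion_rate(self) -> float:
        return self.quantity / self.page_views if self.page_views else 0.0

    @property
    def average_sales_price(self) -> float:
        return self.sales_amount / self.quantity if self.quantity else 0.0

@dataclass(frozen=True)
class State:               # performance data plus environment data
    performance: PerformanceData
    date: str
    day_of_week: str
    weather: str
    mall: str

@dataclass(frozen=True)
class Action:              # promotion attribute data
    category: str          # e.g. "discount" or "gift"
    period_days: int
    mall: str
    exposure_slot: str
    image_color: str

def reward(before: PerformanceData, after: PerformanceData,
           metric: str = "sales_amount") -> float:
    """Rise (positive) or fall (negative) of one performance metric over a period."""
    return getattr(after, metric) - getattr(before, metric)

before = PerformanceData(quantity=40, sales_amount=1000.0, page_views=800, gross_profit=250.0)
after = PerformanceData(quantity=60, sales_amount=1350.0, page_views=1000, gross_profit=300.0)
action = Action("discount", 7, "mall_A", "main_banner", "red")

print(reward(before, after))                      # change in sales amount
print(reward(before, after, "conversion_rate"))   # change in conversion rate
```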

The policy network (110) is an artificial neural network that determines, for each state of the promotion recommendation device (11), the probabilities of specific attributes of a recommended promotion; it learns the policy function and outputs the recommended promotion attribute probabilities. The cost function of the policy network may be a function obtained by multiplying the policy function by the cost function of the value network, computing the cross entropy, and then taking the policy gradient; for example, it may be configured as in Equation 1 below. The policy network may be trained by backpropagation on the basis of the product of the cross entropy and the temporal-difference (TD) error, which is the cost function of the value network.

In Equation 1, π is the policy function, θ is the policy network parameter, π_θ(a_i|s_i) is the probability of taking a specific action (a promotion with specific attributes) in the current episode, V is the value function, w is the value network parameter, s_i is the state information of the current episode i, s_{i+1} is the state information of the next episode i+1, r_{i+1} is the reward expected to be obtained in the next episode, V_w(s_i) is the reward possibility in the current episode, V_w(s_{i+1}) is the reward possibility in the next episode, and γ is the discount factor. Here, r_{i+1} may be configured to be received from the promotion performance prediction artificial neural network device (10). As a result, through the policy gradient, the policy network (110) according to an embodiment of the present invention initially outputs promotion attributes that imitate the user's promotion history.
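Equation 1 itself is not reproduced in this text. Based only on the symbol definitions above and on the description of the policy network cost as the cross entropy multiplied by the temporal-difference error of the value network, one form consistent with those definitions is the standard one-step actor-critic policy gradient. The following is a reconstruction under that assumption, not the patent's verbatim formula; the symbols J and δ_i are introduced here for convenience.

```latex
\nabla_\theta J(\theta)
  = \mathbb{E}\Big[\, \nabla_\theta \log \pi_\theta(a_i \mid s_i)\;\delta_i \,\Big],
\qquad
\delta_i = r_{i+1} + \gamma\, V_w(s_{i+1}) - V_w(s_i)
```

Here the term \log \pi_\theta(a_i \mid s_i) plays the role of the cross entropy for the action actually taken, and \delta_i is the temporal-difference error computed by the value network.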

Before reinforcement learning begins, the policy network (110) according to an embodiment of the present invention may learn the basics of the policy through supervised learning on existing promotion attribute data and the corresponding performance data, which updates the weights of the policy network. That is, the weights of the policy network may be set by supervised learning on the existing promotion attribute data and performance data. As a result, the policy network can be trained very quickly from the records of existing promotions.

In addition, according to an embodiment of the present invention, the supervised learning of the policy network (110) may be configured to use a random vector together with the existing promotion attribute data and the corresponding performance data. The random vector may be drawn, for example, from a Gaussian distribution. This allows the policy network to output, with some random probability, more adventurous promotion policies. If the policy network (110) were trained only on the existing promotion attribute data and the corresponding performance data, promotion recommendations would be optimized only within the existing policy. When a random vector is included in the supervised learning according to an embodiment of the present invention, however, the policy network can learn promotions more effective than the existing policy as reinforcement learning progresses.

The value network (111) is an artificial neural network that derives the likelihood of achieving a reward in each state that the promotion recommendation apparatus (11) can take, and it learns a value function. The value network (111) indicates the direction in which the promotion recommendation apparatus (11), as the agent, should be updated. To this end, the input variable of the value network (111) is set to state information, which is information on the state of the promotion recommendation apparatus (11), and the output variable of the value network (111) may be set to reward likelihood information, which is the likelihood that the promotion recommendation apparatus (11) achieves the reward. The reward likelihood information according to an embodiment of the present invention may be calculated by a Q-function such as the following equation.
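
The image for Equation 2 is not reproduced in this text. Assuming it is the standard action-value function, which matches the symbols the description lists for it (Q_π, R, γ, S_t, A_t, E), it would read:

```latex
Q_\pi(s, a) = \mathbb{E}_\pi\!\left[ R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots \,\middle|\, S_t = s,\ A_t = a \right]
```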

In Equation 2 above, Q_π denotes the total reward likelihood information expected in the future for state s and action a under a specific policy π, R is the reward for a specific period, and γ (gamma) is the discount rate. S_t is the state at time t, A_t is the action at time t, and E denotes the expected value. The reward likelihood information (Q value) according to an embodiment of the present invention determines the direction and magnitude of updates to the policy network (110).

Here, the cost function of the value network may be a mean squared error (MSE) function of the value function and may be constructed, for example, as in Equation 3 below. The value network (111) may be updated by backpropagation based on the temporal-difference (TD) error, which is the cost function of the value network.
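
The image for Equation 3 is not reproduced in this text. A squared temporal-difference error consistent with the description would be the following; it is a reconstruction under that assumption, and the label J_V and the episode count N are introduced here only for readability:

```latex
J_V(w) = \frac{1}{N} \sum_{i=1}^{N}
  \bigl( r_{i+1} + \gamma V_w(s_{i+1}) - V_w(s_i) \bigr)^2
```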

In Equation 3, V is the value function, w is the value network parameter, s_i is the state information of the current episode i, s_{i+1} is the state information of the next episode i+1, r_{i+1} is the reward expected to be obtained in the next episode, V_w(s_i) is the reward likelihood in the current episode, V_w(s_{i+1}) is the reward likelihood in the next episode, and γ is the discount rate. Here, r_{i+1} may be received from the promotion performance prediction artificial neural network apparatus (10).

Accordingly, when the state of the promotion recommendation apparatus changes, the value network may be updated in the direction that performs gradient descent on the cost function of Equation 1.
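
The temporal-difference error that drives both updates can be illustrated numerically. The sketch below assumes the usual one-step form r_{i+1} + γV_w(s_{i+1}) − V_w(s_i); the function names and all numbers are hypothetical and chosen only for illustration, not taken from the patent.

```python
import math

def td_error(r_next, v_s, v_next, gamma):
    """One-step TD error: r_{i+1} + gamma * V_w(s_{i+1}) - V_w(s_i)."""
    return r_next + gamma * v_next - v_s

def value_loss(delta):
    """Squared TD error, the per-sample term of an MSE value cost."""
    return delta ** 2

def policy_loss(prob_action, delta):
    """Cross entropy of the chosen action, weighted by the TD error."""
    return -math.log(prob_action) * delta

# Hypothetical numbers for a single episode transition.
delta = td_error(r_next=1.0, v_s=0.5, v_next=0.8, gamma=0.9)
print("TD error:", delta)
print("value loss:", value_loss(delta))
print("policy loss:", policy_loss(0.25, delta))
```

A positive TD error means the promotion did better than the value network expected, so minimizing the policy loss raises the probability of that action; a negative TD error lowers it.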

According to an embodiment of the present invention, the value network is trained separately from the policy network, and because the Q value of the value network starts from supervised values rather than random ones, learning is fast. This greatly reduces the exploration burden for actions that select a combination of promotion attributes, a choice of very high complexity.

In the promotion recommendation apparatus (11) according to an embodiment of the present invention, when the policy network (110), having completed supervised learning, recommends promotion attributes for the current episode i, the value network (111) is trained to predict the reward of carrying out the recommended promotion attributes. Once trained, the policy network (110) and the value network (111) of the promotion recommendation apparatus (11) are combined with simulation using the optimal promotion search module (112) to make the final selection of promotion attributes.

In addition, the value network (111) according to an embodiment of the present invention allows the policy network, which outputs the recommended promotion attribute probabilities, to be updated at every episode. In conventional reinforcement learning, the model is updated only after all episodes have ended, which made it difficult to apply to a promotion recommendation model.

The optimal promotion search module (112) searches for optimal promotion attributes by running multiple simulations over various states and actions, based on a plurality of agents computed from the policy network and the value network. The optimal promotion search module according to an embodiment of the present invention may use, for example, Monte Carlo tree search. In this tree, each node represents a state and each edge represents the value expected from a specific action in that state; the current state is the root node, and a leaf node is expanded each time a new action is taken and a transition to a new state occurs. When Monte Carlo tree search is used, the optimal promotion search may be processed in four steps: Selection, Expansion, Evaluation, and Backup.

In the Selection step of the optimal promotion search module (112), the search proceeds from the current state by choosing the highest-valued action among those available until a leaf node is reached. This step uses the value-function value stored on each edge together with a visit-frequency term that balances exploration and exploitation. The equation for action selection in the Selection step is as follows.
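
The image for Equation 4 is not reproduced in this text. Assuming the selection rule commonly used in Monte Carlo tree search with a value term and an exploration term, which matches the symbols the description lists for it, it would read:

```latex
a_t = \arg\max_{a} \bigl( Q(s_t, a) + u(s_t, a) \bigr)
```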

In Equation 4 above, a_t is the action (carrying out a promotion) at time t, Q(s_t, a) is the value-function value stored in the tree, and u(s_t, a) is a term inversely proportional to the visit count of the state-action pair, used to balance exploration and exploitation.

In the Expansion step of the optimal promotion search module (112), once the simulation reaches a leaf node, an action is taken according to the probabilities of the policy network trained by supervised learning, and the resulting new node is added as a leaf node.

In the Evaluation step of the optimal promotion search module (112), the value of the newly added leaf node is evaluated from two quantities: the value (reward likelihood) estimated by the value network at the new leaf node, and the reward obtained by playing out the promotion episode from the leaf node to its end using the policy network. The following equation is an example of evaluating the value of a new leaf node.
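
The image for Equation 5 is not reproduced in this text. Assuming a linear mix of the two quantities controlled by the mixing parameter λ, which matches the symbols the description lists for it, it would read:

```latex
V(s_L) = (1 - \lambda)\, v_\theta(s_L) + \lambda\, z_L
```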

In Equation 5 above, V(s_L) is the value of the leaf node, λ is the mixing parameter, v_θ(s_L) is the value obtained from the value network, and z_L is the reward obtained by continuing the simulation.

In the Backup step of the optimal promotion search module (112), the values of the nodes visited during the simulation are re-evaluated to reflect the value of the newly added leaf node, and their visit frequencies are updated. The following equation is an example of node value re-evaluation and visit frequency update.
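
The image for Equation 6 is not reproduced in this text. Assuming the usual backup in which an edge's value is the mean of the leaf values of the simulations that passed through it, it would read as follows; N(s, a) (the visit count) and n (the number of simulations) are notation introduced here:

```latex
N(s, a) = \sum_{i=1}^{n} 1(s, a, i),
\qquad
Q(s, a) = \frac{1}{N(s, a)} \sum_{i=1}^{n} 1(s, a, i)\, V\!\left(s_L^{i}\right)
```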

In Equation 6 above, s_L^i denotes the leaf node in the i-th simulation, and 1(s, a, i) indicates whether edge (s, a) was visited in the i-th simulation. When the tree search is complete, the algorithm may be configured to select the most visited edge (s, a) from the root node. With the optimal promotion search module (112) according to an embodiment of the present invention, multiple simulations based on the value network are run in advance over the plurality of promotion attributes selected by the policy network, so that the optimal promotion attributes can be selected.
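
The four steps can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not the patented implementation: the policy and value networks are replaced by stub functions, the exploration term is taken as c · prior / (1 + visits), expansion attaches every child at once with its policy probability as a prior, and the toy problem (two successive choices among three options with a made-up reward) is hypothetical.

```python
import random

class Node:
    """Tree node for one state; statistics describe the edge leading into it."""
    def __init__(self, state, prior=1.0):
        self.state = state
        self.prior = prior        # policy probability of the action that leads here
        self.children = {}        # action -> Node
        self.visits = 0           # visit count of the incoming edge
        self.value_sum = 0.0      # sum of leaf values backed up through this edge

    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0

def search(root_state, actions, step, is_terminal, policy_fn, value_fn,
           reward_fn, n_sim=200, lam=0.5, c=1.0, seed=0):
    rng = random.Random(seed)
    root = Node(root_state)
    for _ in range(n_sim):
        node, path = root, [root]
        # Selection: descend by argmax of Q + u, where u shrinks with visits.
        while node.children:
            node = max(node.children.values(),
                       key=lambda ch: ch.q() + c * ch.prior / (1 + ch.visits))
            path.append(node)
        # Expansion: attach children, keeping the policy probabilities as priors.
        if not is_terminal(node.state):
            priors = policy_fn(node.state)
            for a in actions:
                node.children[a] = Node(step(node.state, a), priors[a])
        # Evaluation: mix the value estimate with a policy rollout to the end.
        s = node.state
        while not is_terminal(s):
            priors = policy_fn(s)
            a = rng.choices(actions, weights=[priors[x] for x in actions])[0]
            s = step(s, a)
        leaf_value = (1 - lam) * value_fn(node.state) + lam * reward_fn(s)
        # Backup: update visit counts and value sums along the visited path.
        for visited in path:
            visited.visits += 1
            visited.value_sum += leaf_value
    # Final choice: the most visited edge from the root.
    return max(root.children, key=lambda a: root.children[a].visits)

# Toy problem (all values made up): pick a discount level, then a duration.
ACTIONS = [0, 1, 2]
best = search(
    root_state=(),
    actions=ACTIONS,
    step=lambda s, a: s + (a,),
    is_terminal=lambda s: len(s) >= 2,
    policy_fn=lambda s: {a: 1 / len(ACTIONS) for a in ACTIONS},  # uniform stub
    value_fn=lambda s: 0.0,                                      # untrained stub
    reward_fn=lambda s: 0.3 * s[0] + 0.05 * s[1],
)
print("most visited first action:", best)
```

Choosing the final action by visit count rather than by mean value follows the description above; it is less sensitive to a single lucky rollout on a rarely visited edge.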

According to an embodiment of the present invention, the promotion recommendation apparatus (11) may be configured with a plurality of agents. With a plurality of agents, the promotions recommended by the apparatus for each specific state and each specific promotion attribute compete with one another, so that the best product and its promotion can be recommended within a given budget.

FIG. 5 is a flowchart showing an example of the operation of the promotion recommendation apparatus (11) according to an embodiment of the present invention. As shown in FIG. 5, when the state s(t) is input by the Internet shopping mall data hub apparatus (12), which tracks state data across a plurality of Internet shopping malls, various promotion attributes are input to the optimal promotion search module (112) through the value network (111) and the plurality of agents of the policy network (110). The promotion is then carried out according to the recommended promotion attribute probability a(t), which is the action output by the optimal promotion search module (112); with this, episode t ends and episode t+1 begins. In episode t+1, the state s(t+1) resulting from a(t) is input by the Internet shopping mall data hub apparatus (12), and the reward r(t+1) for a(t) is input immediately, updating the value network (111) and the policy network (110).

In addition, according to an embodiment of the present invention, for the promotion image color information included in the recommended promotion attribute data (500), the promotion performance prediction and recommendation apparatus (1) for online shopping malls using artificial intelligence may receive an actual promotion image, modify its color tone based on the promotion image color information, and output the modified promotion image. FIG. 6 is a schematic diagram of a promotion image modification apparatus (13) according to an embodiment of the present invention. As shown in FIG. 6, the promotion image modification apparatus (13) may be configured to include a generator and a ConvNet encoder.

The generator may be an image generator composed of an encoder and a decoder, such as a VAE or a GAN; it may receive the actual promotion image (210) and output a generated promotion image (212) with changed colors. The generated promotion image (212) is in turn fed to the ConvNet encoder, which outputs an encoded promotion image (211); the cross entropy between this output and the promotion image color information of the recommended promotion attribute data (500) is computed to produce an error. This error is fed back to the generator, which may be updated so that the generated promotion image (212) better matches the promotion image color information.

FIG. 7 is a schematic diagram showing an example of the ConvNet encoder according to an embodiment of the present invention. As shown in FIG. 7, the ConvNet encoder may be built as [INPUT-CONV-RELU-POOL-FC]. For the input data, the generated promotion image (212), if the INPUT image is 32 wide and 32 high with RGB channels, the input size is [32x32x3]. The CONV layer (Conv. Filter, 101) is connected to a local region of the input image and computes the dot product between that region and its own weights. The resulting volume has a size such as [32x32x12]. The RELU layer is an activation function applied element-wise, such as max(0,x). The RELU layer does not change the size of the volume ([32x32x12]) and produces Activation map 1 (102). The POOL layer (pooling, 103) downsamples along the width and height dimensions and outputs a reduced volume such as [16x16x12] (Activation map 2, 104). The FC (fully-connected) layer (105) computes the class scores and outputs a volume of size [1x1x10] (output layer, 106). The "10" corresponds to the class scores for 10 categories (the promotion image color information according to an embodiment of the present invention). The FC layer is connected to every element of the previous volume.
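
The volume sizes quoted above can be checked with simple shape arithmetic. The text gives only the sizes, so the hyperparameters below are assumptions chosen to reproduce them: twelve 3x3 filters with stride 1 and zero-padding 1 for CONV, and 2x2 pooling with stride 2 for POOL.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Output width/height of a convolution or pooling window."""
    return (size + 2 * pad - kernel) // stride + 1

# Assumed hyperparameters (not stated in the text).
H = W = 32
C_IN = 3
N_FILTERS, K, PAD = 12, 3, 1
POOL = 2
N_CLASSES = 10

conv_hw = conv_out(H, K, stride=1, pad=PAD)
pool_hw = conv_out(conv_hw, POOL, stride=POOL)

shapes = {
    "INPUT": (H, W, C_IN),
    "CONV": (conv_hw, conv_hw, N_FILTERS),
    "RELU": (conv_hw, conv_hw, N_FILTERS),   # element-wise, size unchanged
    "POOL": (pool_hw, pool_hw, N_FILTERS),
    "FC": (1, 1, N_CLASSES),
}
# Learnable parameters: weights plus one bias per filter / output unit.
conv_params = N_FILTERS * (K * K * C_IN + 1)
fc_params = pool_hw * pool_hw * N_FILTERS * N_CLASSES + N_CLASSES

for name, shape in shapes.items():
    print(name, shape)
print("CONV parameters:", conv_params)
print("FC parameters:", fc_params)
```

Under these assumptions nearly all learnable parameters sit in the FC layer, while RELU and POOL contribute none.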

In this way, the ConvNet transforms the original image, made up of pixel values, layer by layer into class scores (the promotion image color information according to an embodiment of the present invention). Some layers have parameters and others do not. In particular, the CONV and FC layers compute their activations as functions not only of the input volume but also of weights and biases, whereas the RELU and POOL layers are fixed functions. The parameters of the CONV and FC layers are trained by gradient descent so that the class scores for each image match that image's label.

The parameters of the CONV layer consist of a set of learnable filters. Each filter is small along the width and height dimensions but spans the full depth of the input. During the forward pass, each filter is slid (more precisely, convolved) across the width and height of the input volume to produce a two-dimensional activation map. As the filter slides over the input, a dot product is computed between the filter and the input volume. Through this process the ConvNet learns filters that activate on specific patterns at specific positions in the input data. Stacking these activation maps along the depth dimension gives the output volume. Each element of the output volume therefore covers only a small region of the input, and the neurons within one activation map share the same parameters because they result from applying the same filter.

FIG. 8 is a flowchart showing a promotion recommendation method according to an embodiment of the present invention. As shown in FIG. 8, the promotion recommendation method according to an embodiment of the present invention may include a promotion performance prediction step (S10), a state information receiving step (S11), a recommended promotion attribute data output step (S12), a state and reward information receiving step (S13), and a policy network and value network updating step (S14).

In the promotion performance prediction step (S10), the promotion performance prediction artificial neural network apparatus receives existing promotion attribute data and performance data, predicts the promotion performance for specific promotion attributes, and transmits the result to the promotion recommendation apparatus (11). The predicted promotion performance data may be used as initial values for the policy network and the value network.

In the state information receiving step (S11), the promotion recommendation apparatus (11) receives the state information (s_t) for episode t from the Internet shopping mall data hub apparatus.

In the recommended promotion attribute data output step (S12), the promotion recommendation apparatus (11) outputs, through the optimal promotion search module (112), recommended promotion attribute data (a_t) that includes the recommended promotion attribute probabilities for episode t.

In the state and reward information receiving step (S13), after a promotion has been applied to the Internet shopping mall according to the recommended promotion attribute data, the promotion recommendation apparatus (11) receives from the Internet shopping mall data hub apparatus the state information (s_{t+1}) for episode t+1 and the reward information (r_{t+1}) for a_t.

In the policy network and value network updating step (S14), the policy network and the value network are updated based on the received state information (s_{t+1}) and the reward information (r_{t+1}) for a_t.
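
The sequence S11 to S14 can be illustrated with a deliberately small one-step actor-critic loop. Everything in it is hypothetical: a single toy state, three promotion options with made-up mean rewards (`effect`), and tabular parameters standing in for the policy and value networks. The optimal promotion search of S12 is replaced by direct sampling from the policy, and S10 is omitted.

```python
import math
import random

def softmax(prefs):
    """Turn action preferences into probabilities."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def run(episodes=300, gamma=0.9, lr_pi=0.1, lr_v=0.1, seed=0):
    rng = random.Random(seed)
    effect = [-1.0, 2.0, 0.5]   # made-up mean reward of each promotion option
    prefs = [0.0, 0.0, 0.0]     # policy parameters (theta), one per option
    v = 0.0                     # value estimate (w) for the single toy state
    for _ in range(episodes):
        # S11: receive s_t (the toy problem has one state, so nothing to read).
        probs = softmax(prefs)
        # S12: output a_t, here sampled directly from the policy.
        a = rng.choices(range(3), weights=probs)[0]
        # S13: receive r_{t+1}; s_{t+1} is the same single state.
        r = effect[a] + rng.uniform(-0.1, 0.1)
        # S14: update the value estimate and the policy from the TD error.
        delta = r + gamma * v - v
        v += lr_v * delta
        for i in range(3):
            grad_log = (1.0 if i == a else 0.0) - probs[i]
            prefs[i] += lr_pi * delta * grad_log
    return softmax(prefs)

final_probs = run()
print([round(p, 3) for p in final_probs])
```

Because the update happens inside the loop, the policy changes after every episode rather than after all episodes have finished, which is the property the description attributes to the value network.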

From a reinforcement learning perspective, the objective of the promotion recommendation method according to an embodiment of the present invention is to improve promotion performance. The state may mean the current performance data (sales quantity, sales amount, page views, conversion rate, average sales price, sales profit, and the like) and environmental data (date, day of the week, weather, products currently under promotion, the shopping mall, and information on the shopping mall's other products). The action may mean promotion attribute data (promotion category such as discount promotion or free-gift promotion, promotion period, promotion shopping mall, exposure slot, promotion image color, and the like). The reward may mean the rise or fall of the performance data over a specific period.

As described above, a person of ordinary skill in the art to which the present invention pertains will understand that the invention may be embodied in other specific forms without changing its technical idea or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. The scope of the present invention is indicated by the claims below rather than by the detailed description, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the invention.

The features and advantages described in this specification are not all-inclusive; in particular, many additional features and advantages will be apparent to those skilled in the art in view of the drawings, the specification, and the claims. Moreover, it should be noted that the language used in this specification has been chosen principally for readability and instructional purposes, and may not have been chosen to delineate or limit the subject matter of the invention.

The foregoing description of embodiments of the present invention has been presented for purposes of illustration. It is not intended to be exhaustive or to limit the invention to the precise form disclosed. Those skilled in the art will appreciate that many modifications and variations are possible in light of the above disclosure.

The scope of the present invention is therefore not limited by this detailed description but by the claims of any application based on it. Accordingly, the disclosure of embodiments of the present invention is illustrative and does not limit the scope of the invention set forth in the following claims.

1: Promotion performance prediction and recommendation apparatus for online shopping malls using artificial intelligence
10: Promotion performance prediction artificial neural network apparatus
11: Promotion recommendation apparatus
12: Internet shopping mall data hub apparatus
13: Promotion image modification apparatus
100: Product attribute data
110: Policy network
111: Value network
112: Optimal promotion search module
200: Promotion attribute data
300: Performance data
400: Promotion performance prediction data
500: Recommended promotion attribute data

Claims

A memory module for storing program codes of the promotional performance predictive artificial neural network module and the promotion recommendation module learned based on the promotion attribute data and the performance data of at least one online shopping mall; And
A processing module for processing the program code of the promotional performance artificial neural network module and the promotion recommendation module to predict the performance of the specific promotion and recommend the promotional attribute;
/ RTI >
Wherein the program code of the promotion recommendation module includes:
A promotional performance prediction step of receiving promotional performance prediction data, which is information on a promotional performance predicted for a specific promotional attribute in the promotional performance artificial neural network module;
A status information receiving step of receiving first status information (s _t ) for episode t;
Generating a compensation possibility information in a value network on the basis of the first state information and the promotion performance prediction data, and outputting a recommendation promotion property probability as a plurality of actions in a policy network based on the compensation possibility information Probability output step;
A recommended promotional attribute data output step of selecting and outputting recommended promotional attribute data (a _t ) for the episode t through an optimum promotional search based on the plurality of recommended promotional attribute probabilities;
The second state information s _{t + 1} for episode t + 1 and the compensation information r _{t + 1} for the recommended promotion attribute data a _t are received after the promotion based on the recommended promotional attribute data is applied And receiving compensation information; And
A policy network and a value network updating step of updating the policy network and the value network based on the received second state information s _{t + 1} and the compensation information r _{t + 1} ;
The computer program product being configured to run on a computer,
The first state information and the second state information may include performance data (such as a quantity, a sales amount, a page view, a conversion rate, an average selling price, And environmental data (date information, day of the week information, weather information, product information in progress, shopping mall information, or other product information of the shopping mall)
The recommended promotion attribute data means a promotion category, a promotion period, a promotional shopping mall, an exposed account or a color of a promotional image,
The compensation information means a rise or a fall of the performance data for a specific period,
The optimal promotion search in the recommended promotion attribute data output step is performed using a Monte Carlo tree search, in which each node of the tree represents a state and each edge corresponds to a specific action, and the tree is structured so that, with the current state as the root node, a leaf node is expanded each time a new action is taken and a transition is made to a new state,
The recommended promotion attribute data output step includes:
A selection step of selecting, starting from the current state, the action having the highest value among the selectable actions until a leaf node is reached;
An expansion step of adding a new leaf node according to the probabilities of the policy network trained by supervised learning when the simulation has proceeded to the leaf node;
An evaluation step of evaluating the value of the new leaf node by calculating the compensation possibility information determined from the new leaf node using the value network, and the compensation obtained by proceeding from that leaf node to the end of the promotion episode using the policy network; and
An update step of re-evaluating the values of the nodes visited during the simulation and updating the visit frequency by reflecting the value of the new leaf node,
The recommended promotion attribute data (a_t) is selected based on the visit frequency.
Promotion Performance Prediction and Recommendation Device in Online Shopping Mall Using Artificial Intelligence.
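
The four search steps recited above (selection, expansion, evaluation, update) and the final choice by visit frequency can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration and is not taken from the claims: the action list, the episode length, the stand-in functions policy_probs and value_estimate (which replace the trained policy and value networks with toy rules), the exploration bonus in select_child, and the equal weighting of the value estimate and the rollout result. For brevity the sketch also expands all children of a leaf at once rather than a single new leaf node.

```python
import math
import random

# Hypothetical action space: promotion categories only, for brevity.
ACTIONS = ["discount", "coupon", "mileage"]
EPISODE_LENGTH = 3  # assumed number of promotion decisions per episode


def policy_probs(state):
    """Stand-in for the policy network: uniform probabilities over actions."""
    return {a: 1.0 / len(ACTIONS) for a in ACTIONS}


def value_estimate(state):
    """Stand-in for the value network: toy score that favours 'coupon'."""
    return state.count("coupon") / EPISODE_LENGTH


def step(state, action):
    """Toy transition: the state is the tuple of actions taken so far."""
    return state + (action,)


def rollout(state, rng):
    """Play to the end of the episode by sampling from the policy stand-in."""
    while len(state) < EPISODE_LENGTH:
        probs = policy_probs(state)
        action = rng.choices(list(probs), weights=list(probs.values()))[0]
        state = step(state, action)
    return value_estimate(state)  # toy compensation at the end of the episode


class Node:
    def __init__(self, state, prior=1.0):
        self.state = state
        self.prior = prior
        self.children = {}  # action -> Node
        self.visits = 0
        self.value_sum = 0.0

    def value(self):
        return self.value_sum / self.visits if self.visits else 0.0


def select_child(node, c_puct=1.0):
    """Pick the child with the highest value plus an exploration bonus."""
    def score(child):
        bonus = c_puct * child.prior * math.sqrt(node.visits) / (1 + child.visits)
        return child.value() + bonus
    return max(node.children.items(), key=lambda kv: score(kv[1]))


def search(root_state, simulations=200, mix=0.5, seed=0):
    rng = random.Random(seed)
    root = Node(root_state)
    for _ in range(simulations):
        node, path = root, [root]
        # 1. Selection: descend from the root until a leaf node is reached.
        while node.children:
            _, node = select_child(node)
            path.append(node)
        # 2. Expansion: add children carrying the policy probabilities as priors.
        if len(node.state) < EPISODE_LENGTH:
            for action, p in policy_probs(node.state).items():
                node.children[action] = Node(step(node.state, action), prior=p)
        # 3. Evaluation: mix the value estimate with a rollout to episode end.
        leaf_value = (1 - mix) * value_estimate(node.state) + mix * rollout(node.state, rng)
        # 4. Update: back up the value and visit counts along the visited path.
        for visited in path:
            visited.visits += 1
            visited.value_sum += leaf_value
    # Final choice: the root action with the highest visit frequency.
    visit_counts = {a: child.visits for a, child in root.children.items()}
    return max(visit_counts, key=visit_counts.get), visit_counts


best_action, counts = search(())
print(best_action, counts)
```

Choosing the most visited root action rather than the one with the highest average value matches the claim's reliance on visit frequency; visit counts tend to be less sensitive to a single lucky rollout than raw value averages.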

A promotion performance prediction step in which a promotion recommendation module for at least one online shopping mall receives, from a promotion performance prediction artificial neural network module, promotion performance prediction data, which is information on the promotion performance predicted for a specific promotion attribute;
A state information receiving step in which the promotion recommendation module receives first state information (s_t) for episode t;
An attribute probability output step in which the promotion recommendation module generates compensation possibility information in the value network on the basis of the first state information and the promotion performance prediction data, and then outputs, in the policy network, a plurality of recommended promotion attribute probabilities for a plurality of actions based on the compensation possibility information;
A recommended promotion attribute data output step of selecting and outputting recommended promotion attribute data (a_t) for the episode t through an optimal promotion search based on the plurality of recommended promotion attribute probabilities;
A state and compensation information receiving step in which, after the promotion according to the recommended promotion attribute data is applied, the promotion recommendation module receives second state information (s_{t+1}) for episode t+1 and compensation information (r_{t+1}) for the recommended promotion attribute data (a_t); and
A policy network and value network updating step of updating the policy network and the value network based on the second state information (s_{t+1}) and the compensation information (r_{t+1}) received by the promotion recommendation module;
The computer program product being configured to run on a computer,
The state information includes performance data (such as sales quantity, sales amount, page views, conversion rate, average selling price, or sales profit) and environment data (date information, day-of-week information, weather information, information on the product under promotion, shopping mall information, or information on other products of the shopping mall),
The recommended promotion attribute data means a promotion category, a promotion period, a promotional shopping mall, an exposed account or a color of a promotional image,
The compensation information means a rise or a fall of the performance data for a specific period,
The optimal promotion search in the recommended promotion attribute data output step is performed using a Monte Carlo tree search, in which each node of the tree represents a state and each edge corresponds to a specific action, and the tree is structured so that, with the current state as the root node, a leaf node is expanded each time a new action is taken and a transition is made to a new state,
The recommended promotion attribute data output step includes:
A selection step of selecting, starting from the current state, the action having the highest value among the selectable actions until a leaf node is reached;
An expansion step of adding a new leaf node according to the probabilities of the policy network trained by supervised learning when the simulation has proceeded to the leaf node;
An evaluation step of evaluating the value of the new leaf node by calculating the compensation possibility information determined from the new leaf node using the value network, and the compensation obtained by proceeding from that leaf node to the end of the promotion episode using the policy network; and
An update step of re-evaluating the values of the nodes visited during the simulation and updating the visit frequency by reflecting the value of the new leaf node,
The recommended promotion attribute data (a_t) is selected based on the visit frequency.
Promotion Performance Prediction and Recommendation Method in Online Shopping Mall Using Artificial Intelligence.
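
The per-episode cycle of the method (receive s_t, output attribute probabilities, apply a_t, receive s_{t+1} and r_{t+1}, update both networks) can be sketched as a small tabular actor-critic loop in Python. This is a simplified illustration under stated assumptions, not the claimed implementation: lookup tables stand in for the policy and value networks, the state is reduced to the day of the week, the action is sampled directly from the probabilities instead of being chosen by the tree search, the promotion performance prediction data is omitted, and simulate_sales is a made-up environment. The compensation function follows the claim's definition of compensation as a rise or fall of performance data over a period.

```python
import math
import random

ACTIONS = ["discount", "coupon", "mileage"]  # hypothetical promotion categories
STATES = range(7)                            # assumed state: day of the week only


def softmax(prefs):
    """Turn action preferences into probabilities that sum to 1."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def compensation(before, after):
    """Rise (+) or fall (-) of one performance figure over the period."""
    return after - before


def simulate_sales(state, action, rng):
    """Toy environment (an assumption): coupons sell best regardless of day."""
    base = {"discount": 100.0, "coupon": 120.0, "mileage": 90.0}[action]
    return base + rng.gauss(0.0, 5.0)


def train(episodes=2000, alpha=0.1, beta=0.01, gamma=0.9, seed=0):
    rng = random.Random(seed)
    value = [0.0] * len(STATES)                      # table standing in for the value network
    prefs = [[0.0] * len(ACTIONS) for _ in STATES]   # table standing in for the policy network
    state, baseline_sales = 0, 100.0
    for _ in range(episodes):
        probs = softmax(prefs[state])                            # attribute probabilities
        a = rng.choices(range(len(ACTIONS)), weights=probs)[0]   # a_t
        sales = simulate_sales(state, ACTIONS[a], rng)
        r = compensation(baseline_sales, sales) / 10.0           # r_{t+1}, rescaled
        next_state = (state + 1) % len(STATES)                   # s_{t+1}
        # Temporal-difference error drives both updates.
        td_error = r + gamma * value[next_state] - value[state]
        value[state] += alpha * td_error                         # value update
        for b in range(len(ACTIONS)):                            # policy update
            grad = (1.0 if b == a else 0.0) - probs[b]
            prefs[state][b] += beta * td_error * grad
        state = next_state
    return value, prefs


value, prefs = train()
print([ACTIONS[max(range(len(ACTIONS)), key=row.__getitem__)] for row in prefs])
```

The printed list shows, for each day of the week, the promotion category the learned table currently prefers; in a real system the tables would be replaced by the trained networks and the sampled action by the output of the tree search.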

delete