KR20230081028A

KR20230081028A - Method for generating a product demand forecasting model and for predicting product demand using the same

Info

Publication number: KR20230081028A
Application number: KR1020210168700A
Authority: KR
Inventors: 구정인; 김보현; 정홍진
Original assignee: 한국생산기술연구원
Priority date: 2021-11-30
Filing date: 2021-11-30
Publication date: 2023-06-07

Abstract

The present invention relates to a method for generating a model that can accurately predict future demand for products sold at online or offline stores, and for predicting demand using that model.

Description

Method for generating a product demand forecasting model and for predicting product demand using the same

The present invention belongs to the fields of economics and information technology. It relates to a method for generating a model that can accurately predict future demand for products sold at online or offline stores, and for predicting demand using that model.

Product demand forecasting strongly affects inventory management, product display, ordering, and other aspects of running a store. For stores such as supermarkets that sell food with a short shelf life, forecasting accuracy is closely tied to profit. In the past, demand was predicted from the experience of store employees; more recently, advances in statistical techniques and artificial intelligence have led to more sophisticated demand forecasting technology.

For online stores, demand forecasting accuracy is reported to reach 90%, whereas accuracy for offline stores is known to be much lower. This is because online stores hold most of the personal information of their buyers. Offline stores have difficulty obtaining personal information, so they offer various benefits to encourage customers to sign up as members. Even so, a very high proportion of purchases are made by non-members who provide no personal information, which makes accurate demand forecasting difficult.

One known method of predicting product demand is to apply time-series decomposition to the product's sales volume data by time. Here, sales volume data by time is data with time on the x-axis and sales volume on the y-axis, as shown on the left of FIG. 1.

As shown on the right of FIG. 1, time-series decomposition splits the actual observed values (observed) into a trend, a seasonal component (seasonal), and a residual. The decomposition method itself is well known, so a detailed description is omitted. The trend represents the average movement of sales volume. Seasonality is a pattern that repeats at a fixed time interval. What remains after removing these two components is the residual, which represents randomness. Demand is predicted mainly by applying the trend and seasonality to a future time.
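The decomposition described above can be illustrated with a minimal sketch. The patent does not say which decomposition it uses, so this assumes a classical additive decomposition (centred moving average for the trend, per-position averages for the seasonal pattern); the function name `decompose_additive`, the `period` parameter, and the toy data are hypothetical.

```python
from statistics import mean


def decompose_additive(series, period):
    """Split a series into (trend, seasonal, residual), series = trend + seasonal + residual.

    Assumed classical additive decomposition; `period` is an assumed cycle length.
    Trend and residual are None at the edges where the centred window does not fit.
    """
    n, half = len(series), period // 2
    trend = [None] * n
    for t in range(half, n - half):
        window = series[t - half : t + half + 1]
        if period % 2:      # odd period: plain centred moving average
            trend[t] = mean(window)
        else:               # even period: half weight on the two end points
            trend[t] = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / period
    # Seasonal pattern: mean detrended value at each position of the cycle,
    # shifted so that the pattern averages to zero over one period.
    detrended = [(t, series[t] - trend[t]) for t in range(n) if trend[t] is not None]
    raw = [mean(v for t, v in detrended if t % period == k) for k in range(period)]
    offset = mean(raw)
    seasonal = [raw[t % period] - offset for t in range(n)]
    residual = [None if trend[t] is None else series[t] - trend[t] - seasonal[t]
                for t in range(n)]
    return trend, seasonal, residual


# Toy series: linear trend plus a repeating 4-step pattern, no noise.
pattern = [3, -1, -2, 0]
sales = [10 + 0.5 * t + pattern[t % 4] for t in range(16)]
trend, seasonal, residual = decompose_additive(sales, period=4)
```

On this noise-free toy series the recovered seasonal pattern matches `pattern` and the residual is zero wherever the trend is defined; real sales data would leave a non-zero residual.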

With this approach, prediction accuracy rises as more data becomes available. Because products in the same product group can be assumed to share a similar trend and/or seasonality even when their individual sales volumes differ, the data of all products in a product group are summed to improve accuracy. For example, within the "bottled water" product group, "bottled water A" and "bottled water B" may sell in different quantities, but their trend and/or seasonality will be similar. Therefore, when forecasting demand for "bottled water A", summing the sales volumes by time of all bottled water products ("bottled water A", "bottled water B", and so on) and decomposing the total yields higher accuracy than decomposing the sales of "bottled water A" alone.

Analyzing the products of a product group together does raise accuracy, but not by enough. Methods that raise forecasting accuracy further by pooling even more data are therefore being studied.

In addition, the method described above goes no further than statistical processing of sales volume data by time. As the amount of data grows, statistical processing takes excessive time, and data errors and randomness may prevent accuracy from improving sufficiently.

(Patent Document 1) Chinese Patent Publication No. 108133391

(Patent Document 2) Korean Patent Registration No. 10-2264013

(Patent Document 3) Korean Patent Registration No. 10-22645526

The present invention has been devised to solve the problems described above.

The invention improves on the method of predicting demand by time-series decomposition of sales volume data by time. To forecast demand for one product, it uses not only the data of that product's own product group but also the data of products in other product groups that have a similar trend and/or seasonality. Using more data in this way is intended to raise forecasting accuracy.

In addition, rather than stopping at statistical analysis, the invention applies transfer learning to predict demand with higher accuracy.

To solve the above problems, one embodiment of the present invention provides a method comprising: (A) a step in which a weighted correlation formula creation module 210 creates a weighted correlation formula by a predetermined method; (B) a step in which a cluster classification module 220 decomposes the sales volume data by time of each of a plurality of products into trend, seasonality, and residual, computes the correlation of trend, correlation of seasonality, and correlation of residual for every pair of products among the plurality of products, and substitutes these into the weighted correlation formula to obtain a weighted correlation for every product pair; (C) a step in which the cluster classification module 220 averages all weighted correlations that involve a given product and repeats this for every product, so that a weighted correlation average is computed for each product; (D) a step in which the cluster classification module 220 classifies the plurality of products into one or more clusters by placing products whose weighted correlation averages fall within the same preset range in the same cluster; and (E) a step in which a demand forecasting model generation module 230, for every cluster obtained in step (D), sequentially performs transfer learning on the sales volume data by time of all products in that cluster to generate a demand forecasting model for the products of that cluster.

Preferably, step (A) includes: (a1) the weighted correlation formula creation module 210 obtaining sales volume data by time for each of a plurality of modeling target products; (a2) the weighted correlation formula creation module 210 decomposing the obtained data into trend, seasonality, and residual; and (a3) the weighted correlation formula creation module 210 creating the weighted correlation formula from the decomposed data of the modeling target products.

Preferably, step (a3) includes: (a31) a transfer learning module 290 sequentially performing transfer learning on the decomposed data of the modeling target products to generate a first demand forecasting model; and (a32) the transfer learning module 290 using the first demand forecasting model to compute the permutation importance of each of the decomposed trend, seasonality, and residual, and the weighted correlation formula creation module 210 using these values to create the following weighted correlation formula:

Weighted Correlation = Correlation of Trend * Permutation Importance of Trend + Correlation of Seasonality * Permutation Importance of Seasonality + Correlation of Residual * Permutation Importance of Residual
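As a minimal sketch, the formula is a sum of the three component correlations, each multiplied by the permutation importance of its component. The function name, the dictionary layout, and the numbers below are illustrative assumptions, not values from the patent.

```python
COMPONENTS = ("trend", "seasonal", "residual")


def weighted_correlation(correlation, importance):
    """Sum of each component's correlation times that component's permutation importance."""
    return sum(correlation[c] * importance[c] for c in COMPONENTS)


# Hypothetical correlations for one product pair and hypothetical importances.
pair = {"trend": 0.9, "seasonal": 0.5, "residual": 0.1}
weights = {"trend": 2.0, "seasonal": 4.0, "residual": 1.0}
score = weighted_correlation(pair, weights)  # 0.9*2.0 + 0.5*4.0 + 0.1*1.0
```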

Preferably, the modeling target products of step (a1) are the same as the plurality of products of step (B).

Preferably, step (E) includes the transfer learning module 290 performing transfer learning on the sales volume data by time of all products in each cluster classified in step (D), whereby the demand forecasting model generation module 230 generates the demand forecasting model.

Preferably, step (a31) includes the transfer learning module 290 generating the first demand forecasting model on a random forest basis, and step (a32) includes the transfer learning module 290 computing the permutation importance from the first demand forecasting model on a random forest basis.

Preferably, the method further includes, after step (E): (F) a product and a time to be predicted being input to a demand forecasting unit 120; (G) the demand forecasting unit 120 determining which cluster the input product belongs to; and (H) the demand forecasting unit 120 loading the demand forecasting model generated for that cluster and applying the time to be predicted, thereby predicting demand at that time.

Preferably, after step (H), (I) an ordering unit 130 uses the product demand predicted in step (H) to generate product ordering information for ordering the product.

Preferably, a store management unit 100 is linked with the ordering unit 130 and a sales information acquisition unit 110; the sales information acquisition unit 110 acquires sales information that includes the sales volume data by time of the plurality of products; step (a1) includes obtaining the sales volume data by time of each of the plurality of products from the sales information acquisition unit 110; and step (B) includes the cluster classification module 220 obtaining the sales volume data by time of each of the plurality of products.

With the method according to the present invention, products from different product groups, which at first glance would be expected to differ in trend and/or seasonality, can be gathered into one cluster when their trend, seasonality, and/or residual are in fact similar. Such a cluster contains far more data than the product's own product group, so the accuracy of the demand forecasting model rises.

In addition, applying the weighted correlation assigns appropriate weights to trend, seasonality, and residual. This determines with high accuracy which products' data should be used, so clusters are formed precisely and forecasting accuracy rises accordingly.

When a model was generated by transfer learning and demand was predicted with the method according to the present invention using the weighted correlation, accuracy rose substantially, as shown in FIG. 6.

FIG. 1 is a diagram illustrating the prior art, which predicts demand by time-series analysis of sales volume data by time.
FIG. 2 shows a system for carrying out the method according to the present invention.
FIG. 3 is a flowchart of the method according to the present invention.
FIG. 4 shows one result of the verification experiment for the method according to the present invention: the derived permutation importances.
FIG. 5 shows another result of the verification experiment: the weighted correlation of each product pair.
FIG. 6 shows another result of the verification experiment: FIG. 6a is the verification result for the transfer-learning demand forecasting model with the cluster concept applied according to the present invention, and FIG. 6b is the verification result for a transfer-learning demand forecasting model trained on a single product.

Hereinafter, "product" means any item sold in a store.

Hereinafter, "cluster" means a collection of products. It is a different concept from a product group of the same type: products of the same type or of different types may form one cluster. In addition, every product must belong to one of the clusters.

1. Description of the system

A system for carrying out the method according to the present invention is described with reference to FIG. 2.

The system includes a store management unit 100, a sales information acquisition unit 110, a demand forecasting unit 120, and an ordering unit 130, and further includes a control unit 200.

The store management unit 100 manages the store where products are sold and may be hardware that runs any conventional store management program. It can manage sales, place orders, and manage inventory.

The sales information acquisition unit 110 acquires sales information from the store management unit 100. The acquired sales information includes sales volume data by time for each product. Of this information, the sales volume data by time of the products requested by the control unit 200 is passed to the control unit 200.

The demand forecasting unit 120 uses the generated demand forecasting model to predict demand for a specific product during a specific time.

The ordering unit 130 automatically generates ordering information for a specific product based on the demand predicted by the demand forecasting unit 120 and sends it to the supplier. The ordering information is passed to the store management unit 100.

The control unit 200 generates the demand forecasting model; the details are given in the description of the method below. The method according to the present invention may be a program that is stored on a storage medium and run by the control unit 200 so that the method is carried out.

2. Description of the method

The method according to the present invention is described with reference to FIG. 3.

The method according to the present invention introduces the concept of a cluster to obtain more data and thereby raise accuracy. As stated above, a cluster is not limited to the product group of a specific product: all products judged to have a similar trend, seasonality, and/or residual are placed in one cluster. Because the demand forecasting model is generated from the data of every product in the cluster, accuracy is higher than when only the data of a specific product group is used.

To decide which products can form one cluster, a first demand forecasting model is generated, and its permutation importances are used to create a weighted correlation formula. For each product, a weighted correlation is computed against every other product. If N products are used as data, N-1 weighted correlations are computed per product, and their mean, the weighted correlation average, is used to classify the clusters.

A classified cluster contains all products judged to have a similar trend, seasonality, and/or residual, without being limited to similar product groups. The data of all these products is used to generate the cluster's demand forecasting model. The products in a cluster share the same demand forecasting model, and one demand forecasting model is generated per cluster.

Given a product and a time for which demand is to be predicted, the cluster to which the product belongs is identified, that cluster's demand forecasting model is loaded, and the product and time are input to compute a sales volume. The computed sales volume is the demand forecast for that product at that time.

Each step is described in detail below.

2.1 Creating the weighted correlation formula using the first demand forecasting model

First, the modeling target products are selected. They may be all products sold in the store.

The weighted correlation formula creation module 210 obtains sales volume data by time for each of the modeling target products. This data can be taken from the sales information acquired by the sales information acquisition unit 110.

Next, the weighted correlation formula creation module 210 decomposes each product's sales volume data by time into trend, seasonality, and residual. The time-series decomposition method is prior art, so a detailed description is omitted.

Next, the weighted correlation formula creation module 210 creates the weighted correlation formula using the decomposed data of the modeling target products.

Specifically, the transfer learning module 290 sequentially performs transfer learning on the decomposed data of the modeling target products to generate a first demand forecasting model. The first demand forecasting model is then used to compute the permutation importance of each of the decomposed trend, seasonality, and residual. Permutation importance is computed by a known method. This yields three scores: the Permutation Importance of Trend, the Permutation Importance of Seasonality, and the Permutation Importance of Residual. The weighted correlation formula creation module 210 then uses these scores to create the weighted correlation formula, in which the correlation of each component is multiplied by that component's permutation importance and the three products are summed. The correlation of trend, correlation of seasonality, and correlation of residual in the formula are determined for each pair of products to be clustered, as described later.

Here, the first demand forecasting model is generated by transfer learning. For example, demand forecasting model A is generated by training on product A, one of the modeling target products; demand forecasting model B is then generated by further training on the data of product B, another modeling target product. Continuing in this way over N modeling target products produces demand forecasting models A through N, and the final model, demand forecasting model N, becomes the first demand forecasting model.
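The product-by-product training order can be sketched as follows. The patent does not specify how the model state is carried from one product to the next, so this sketch swaps in a simple linear model updated by stochastic gradient descent as a stand-in; it is not the random forest approach the patent mentions, and the function name, feature layout, and data are hypothetical.

```python
def train_sequentially(datasets, epochs=1000, lr=0.02):
    """Train on product A, then continue from the same weights on product B, and so on.

    datasets: one list per product of (features, sales) rows, in training order.
    Stand-in linear model; only the sequential hand-over of state is the point here.
    """
    weights = [0.0] * len(datasets[0][0][0])
    bias = 0.0
    for rows in datasets:                      # model A -> model B -> ... -> model N
        for _ in range(epochs):
            for features, sales in rows:
                error = bias + sum(w * x for w, x in zip(weights, features)) - sales
                bias -= lr * error
                weights = [w - lr * error * x for w, x in zip(weights, features)]
    return weights, bias                       # state after the last product


# Hypothetical data: both products follow sales = 2 * feature + 1.
product_a = [((0.0,), 1.0), ((1.0,), 3.0), ((2.0,), 5.0), ((3.0,), 7.0)]
product_b = [((1.0,), 3.0), ((2.0,), 5.0), ((4.0,), 9.0)]
weights, bias = train_sequentially([product_a, product_b])
```

The state returned after the last product plays the role of "demand forecasting model N" in the text.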

This transfer learning may be performed on a random forest basis. In that case, the permutation importance is also computed on a random forest basis.
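Permutation importance in general measures how much a model's error grows when one input column is shuffled. The sketch below shows that idea for any prediction function; it assumes, hypothetically, that the trend, seasonality, and residual components are the three input columns, which the patent does not spell out, and it uses a toy predictor rather than a random forest.

```python
import random


def permutation_importance(predict, rows, targets, repeats=20, seed=0):
    """Mean increase in squared error when each feature column is shuffled in turn."""
    rng = random.Random(seed)

    def mse(data):
        return sum((predict(r) - y) ** 2 for r, y in zip(data, targets)) / len(targets)

    baseline = mse(rows)
    scores = []
    for j in range(len(rows[0])):
        increase = 0.0
        for _ in range(repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)                # break the link between column j and the target
            shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
            increase += mse(shuffled) - baseline
        scores.append(increase / repeats)
    return scores


# Hypothetical rows of (trend, seasonal, residual); the toy model ignores the residual.
rows = [(1.0, 5.0, 0.3), (2.0, 3.0, -0.1), (3.0, 8.0, 0.2),
        (4.0, 1.0, -0.4), (5.0, 9.0, 0.0), (6.0, 2.0, 0.1)]
predict = lambda r: r[0] + 2.0 * r[1]
targets = [predict(r) for r in rows]
scores = permutation_importance(predict, rows, targets)
```

A column the model never uses, such as the residual here, gets an importance of exactly zero, while columns the model relies on get positive scores.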

There is no restriction on the modeling target products used to create the weighted correlation formula. For accuracy, however, they are preferably all products sold in the store, that is, all products for which demand is to be forecast as described below.

2.2 Cluster classification

The weighted correlation formula can now be used to classify all products into clusters.

First, the cluster classification module 220 decomposes the sales volume data by time of each product sold in the store into trend, seasonality, and residual, and then computes the correlation of trend, correlation of seasonality, and correlation of residual for every pair of products. Any method may be used to compute the correlation. A correlation score lies between -1 and 1: values closer to -1 indicate a negative correlation, and values closer to 1 indicate a positive correlation. When the correlations are computed in this way for one of N products against each of the others, N-1 values are obtained for each of the correlation of trend, correlation of seasonality, and correlation of residual.
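The text leaves the correlation method open. As one possible choice, the sketch below uses the Pearson correlation coefficient and computes it per component for every unordered product pair; the function names, the dictionary layout, and the product names are assumptions for illustration.

```python
from itertools import combinations
from math import sqrt

COMPONENTS = ("trend", "seasonal", "residual")


def pearson(x, y):
    """Pearson correlation of two equal-length series (undefined for constant series)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def pairwise_correlations(decomposed):
    """decomposed: {product: {"trend": [...], "seasonal": [...], "residual": [...]}}.

    Returns {(a, b): {"trend": r, "seasonal": r, "residual": r}} for every unordered pair.
    """
    return {
        (a, b): {c: pearson(decomposed[a][c], decomposed[b][c]) for c in COMPONENTS}
        for a, b in combinations(sorted(decomposed), 2)
    }


# Hypothetical decomposed components for three products.
decomposed = {
    "water": {"trend": [1, 2, 3, 4], "seasonal": [1, -1, 1, -1], "residual": [0.1, -0.2, 0.0, 0.1]},
    "ice_bar": {"trend": [2, 4, 6, 8], "seasonal": [2, -2, 2, -2], "residual": [0.2, 0.1, -0.3, 0.0]},
    "tissue": {"trend": [4, 3, 2, 1], "seasonal": [-1, 1, -1, 1], "residual": [0.0, 0.3, -0.1, -0.2]},
}
pairs = pairwise_correlations(decomposed)
```

With N products this yields N(N-1)/2 pairs, and each product appears in N-1 of them.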

The N-1 sets of trend, seasonality, and residual correlations are substituted into the formula described above to compute N-1 weighted correlations, and their sum is divided by N-1 to obtain the weighted correlation average. A weighted correlation average is computed in this way for each of the N products.
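The averaging step can be sketched as below, assuming the weighted correlation of every unordered pair has already been computed. The function name and the pair scores are hypothetical.

```python
def weighted_correlation_averages(products, pair_scores):
    """pair_scores: {(a, b): weighted correlation}, one entry per unordered pair."""
    averages = {}
    for p in products:
        scores = [s for pair, s in pair_scores.items() if p in pair]
        averages[p] = sum(scores) / len(scores)   # len(scores) is N - 1
    return averages


# Hypothetical weighted correlations for N = 3 products.
pair_scores = {("A", "B"): 2.0, ("A", "C"): 4.0, ("B", "C"): 6.0}
averages = weighted_correlation_averages(["A", "B", "C"], pair_scores)
```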

Each product has one weighted correlation average. This score characterizes how the product's sales volume varies. Clusters can therefore be classified by dividing the scores into ranges: for example, a first cluster for scores from -0.05 to 0, a second cluster for scores from 0 to 0.5, and a third cluster for scores from 0.5 to 10. The values that define the ranges will differ from store to store, so this specification does not limit them to specific numbers.
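Range-based classification can be sketched with a sorted list of cut points. The cut points 0 and 0.5 come from the example ranges above; the function name, the product scores, and the choice to put a score that sits exactly on a cut point into the higher cluster are assumptions, since the text does not say how boundaries are handled.

```python
from bisect import bisect_right


def assign_clusters(averages, cut_points):
    """Group products by the range their weighted correlation average falls in.

    cut_points: ascending inner boundaries, e.g. [0.0, 0.5] gives clusters 1, 2, 3.
    """
    clusters = {}
    for product, score in averages.items():
        cluster_id = bisect_right(cut_points, score) + 1   # 1-based cluster number
        clusters.setdefault(cluster_id, []).append(product)
    return clusters


# Hypothetical weighted correlation averages.
averages = {"A": -0.03, "B": 0.2, "C": 0.4, "D": 7.1}
clusters = assign_clusters(averages, cut_points=[0.0, 0.5])
```

Every product lands in exactly one cluster because the ranges do not overlap.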

One or more clusters are formed in this way, and every product is classified into one of the clusters.

2.3 Generating the demand forecasting model

The demand forecasting model generation module 230 generates a demand forecasting model for each of the classified clusters. It retrieves the sales volume data by time of all products classified into the same cluster and, as described above, performs transfer learning sequentially, one product at a time, to generate the demand forecasting model. As before, this may be done on a random forest basis.

2.4 Demand forecasting

The user inputs to the demand forecasting unit 120 the product whose demand is to be predicted and the time to be predicted. The demand forecasting unit 120 first determines which cluster the input product belongs to and loads the demand forecasting model generated for that cluster.

When the product and time are applied to the loaded demand forecasting model, a sales volume is output. This sales volume is the predicted demand for that product at that time.
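The lookup-then-predict flow can be sketched as follows. The mapping `cluster_of`, the `models` dictionary, and the constant-valued toy models are hypothetical placeholders; the patent does not define the model's call interface.

```python
def predict_demand(product, time, cluster_of, models):
    """Find the product's cluster, load that cluster's model, and apply product and time."""
    cluster_id = cluster_of[product]      # which cluster the product belongs to
    model = models[cluster_id]            # demand forecasting model generated for that cluster
    return model(product, time)           # predicted sales volume


# Hypothetical cluster membership and toy per-cluster models.
cluster_of = {"water_2l": 1, "ice_bar": 1, "tissue": 2}
models = {
    1: lambda product, time: 40.0,
    2: lambda product, time: 5.0,
}
forecast = predict_demand("ice_bar", "2021-12-01 15:00", cluster_of, models)
```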

Once the demand forecasting unit 120 has predicted demand for a specific product at a specific time, the ordering unit 130 may compare the prediction with the current stock level and automatically generate ordering information for the product. The generated ordering information may be sent automatically to the supplier through the store management unit 100.
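The text only says that predicted demand is compared with current stock. One simple way to turn that comparison into an order quantity is to order the shortfall, rounded up; this rule and the function name are assumptions, not the patent's ordering logic.

```python
from math import ceil


def order_quantity(predicted_demand, current_stock):
    """Order the shortfall between predicted demand and stock, never a negative amount."""
    return max(0, ceil(predicted_demand - current_stock))


needed = order_quantity(predicted_demand=12.4, current_stock=5)
nothing = order_quantity(predicted_demand=3, current_stock=10)
```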

3. Verification experiment

A verification experiment was carried out to confirm the accuracy of the demand forecasting model generated by the method described above.

Fifty products were selected from those sold at mart B in region A, and their sales volume data by time was used. Forty products were used as training data and ten as verification data; that is, 40 modeling target products were selected. Permutation importance was then computed with the random forest technique. As shown in FIG. 4, the permutation importance was 3.85 for trend, 14.59 for seasonality, and 6.8 for residual. The weighted correlation formula was therefore determined as follows.

Weighted Correlation = Correlation of Trend * 3.85 + Correlation of Seasonality * 14.59 + Correlation of Residual * 6.8
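As a small worked illustration, the equation can be written as a function. The three weights are the values reported for the experiment; the function name and the sample correlations are hypothetical.

```python
# Weights reported for the verification experiment (permutation importances).
TREND_WEIGHT = 3.85
SEASONALITY_WEIGHT = 14.59
RESIDUAL_WEIGHT = 6.8


def weighted_correlation(corr_trend: float, corr_seasonality: float,
                         corr_residual: float) -> float:
    """Combine the three component correlations into one weighted score."""
    return (corr_trend * TREND_WEIGHT
            + corr_seasonality * SEASONALITY_WEIGHT
            + corr_residual * RESIDUAL_WEIGHT)


# Made-up correlations for one product pair:
# 0.5 * 3.85 + 0.8 * 14.59 + (-0.1) * 6.8 = 1.925 + 11.672 - 0.68 = 12.917
example = weighted_correlation(0.5, 0.8, -0.1)
```

Because each correlation lies between -1 and 1, the score with these weights is bounded by 3.85 + 14.59 + 6.8 = 25.24 in absolute value, and seasonality dominates the result.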

Using this equation, each of the 40 products was paired with each of the other products, one at a time, and the weighted correlation was computed for every pair, giving 39 weighted correlations per product. FIG. 5 shows the result. The average of the 39 weighted correlation scores was then computed for each product. The weighted correlation averages ranged from -0.05 to 10.
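The pairing and averaging step can be sketched as below. The sketch assumes that each product's hourly series has already been decomposed into trend, seasonality, and residual components, and that "correlation" means the Pearson correlation; the source does not name the correlation measure, and all function names and toy series are hypothetical.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Tuple

# (trend, seasonality, residual) series for one product.
Components = Tuple[List[float], List[float], List[float]]
WEIGHTS = (3.85, 14.59, 6.8)  # trend, seasonality, residual


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # assumption: a constant series counts as uncorrelated
    return cov / sqrt(vx * vy)


def pairwise_weighted_correlations(
    components: Dict[str, Components],
) -> Dict[Tuple[str, str], float]:
    """Weighted correlation for every unordered product pair."""
    scores = {}
    for a, b in combinations(sorted(components), 2):
        scores[(a, b)] = sum(
            w * pearson(ca, cb)
            for w, ca, cb in zip(WEIGHTS, components[a], components[b])
        )
    return scores


def average_per_product(
    scores: Dict[Tuple[str, str], float], products: List[str]
) -> Dict[str, float]:
    """Average, per product, of all pair scores that involve it (needs >= 2 products)."""
    averages = {}
    for p in products:
        own = [s for pair, s in scores.items() if p in pair]
        averages[p] = sum(own) / len(own)
    return averages


# Toy components for three products (made-up numbers).
components = {
    "A": ([1, 2, 3, 4], [1, -1, 1, -1], [0.1, -0.2, 0.3, -0.1]),
    "B": ([2, 4, 6, 8], [2, -2, 2, -2], [0.2, -0.4, 0.6, -0.2]),
    "C": ([4, 3, 2, 1], [1, -1, 1, -1], [0.1, -0.2, 0.3, -0.1]),
}
scores = pairwise_weighted_correlations(components)
averages = average_per_product(scores, list(components))
```

With N products each product takes part in N - 1 pairs, which is where the 39 scores per product in the 40-product experiment come from.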

Since a weighted correlation average is now available for each of the 40 products, the products can be divided according to a predetermined criterion to form clusters. In one example, the following products were placed in the same cluster as 가야산천년수2L(6개).

- 2400설레임밀크, 가야산천년수500ml_20, 남부)불고기맛후랑크소시지, 남부)오륙도맛바, 들꽃향기물티슈리필, 미전지 불고기용, 바)누가바골드, 바)돼지바, 바)메가톤, 바)바밤바골드, 바)비비빅, 바)서울멜론바, 바)수박바, 바)스크류바, 바)쌍쌍바, 바)아맛나, 바)옥동자, 바)죠스바, 바)쿠앤크바쵸코, 봉지새송이버섯(1입, 400g_특_봉_국산), 신)참이슬후레쉬, 신)카스후레쉬병, 제주삼다수, 청정)친환경대란60구, 카스후레쉬캔, 콘)더블비얀코, 콘)부라보바닐라, 콘)빵빠레바닐라, 콘)빵빠레쵸코, 콘)월드콘

That is, products that do not belong to the same product category were nevertheless placed in the same cluster.
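Grouping products by where their weighted correlation average falls can be sketched as follows. The source says only that a predetermined criterion is used, so the cut points, the function name `cluster_by_average`, and the sample averages are assumptions chosen for illustration.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, List


def cluster_by_average(
    averages: Dict[str, float], boundaries: List[float]
) -> Dict[int, List[str]]:
    """Group products whose averages fall into the same interval.

    `boundaries` must be sorted ascending. Interval 0 is everything below
    boundaries[0]; interval i covers [boundaries[i-1], boundaries[i]).
    """
    clusters = defaultdict(list)
    for product, avg in averages.items():
        clusters[bisect_right(boundaries, avg)].append(product)
    return dict(clusters)


# Made-up averages within the reported -0.05 to 10 range, with assumed cut points.
sample_averages = {"a": -0.02, "b": 2.5, "c": 2.9, "d": 9.7}
sample_clusters = cluster_by_average(sample_averages, [0.0, 2.0, 4.0, 6.0, 8.0])
```

Because the grouping depends only on the average score and not on the product category, unrelated items such as bottled water and ice cream bars can end up in the same cluster.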

Transfer learning was then performed again on the above cluster to complete the final demand forecasting model.

The final demand forecasting model was used to predict the validation (valid) data from the remaining 10 products, and the predictions were compared with the actual data. The result is shown in the upper panel of FIG. 6A, where blue is the actual data and red is the predicted data. Predictions were also made on test data for another, arbitrarily chosen product. The result is shown in the lower panel of FIG. 6A.

For comparison, a demand forecasting model restricted to a single product was built without forming clusters, and the same predictions were made. The upper panel of FIG. 6B shows the prediction result for the valid data, and the lower panel of FIG. 6B shows the prediction result for the test data. The statistical results are given in the table below.

Metric   Cluster-based transfer learning model   Single transfer learning model
         valid data     test data                valid data     test data
rmse     92.53          88.83                    82.62          92.08
mae      73.26          66.92                    63.88          65.18
r2       0.64           0.56                     0.71           0.53
mape     0.51           0.14                     0.49           0.15
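For reference, the four metrics in the table can be computed with their standard definitions as sketched below. The source does not state how its values were calculated; in particular, treating mape as a fraction rather than a percentage is an assumption based on the magnitude of the tabulated values, and the function name and sample data are hypothetical.

```python
from math import sqrt
from typing import Dict, List


def regression_metrics(actual: List[float], predicted: List[float]) -> Dict[str, float]:
    """Standard rmse, mae, r2, and mape for paired actual/predicted values.

    Requires non-constant `actual` (for r2) and no zero in `actual` (for mape).
    """
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    ss_res = sum(e * e for e in errors)
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "rmse": sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1 - ss_res / ss_tot,
        # Expressed as a fraction, not a percentage (assumption).
        "mape": sum(abs(e / a) for a, e in zip(actual, errors)) / n,
    }


# Made-up data for illustration.
metrics = regression_metrics([100, 200, 300, 400], [110, 190, 310, 390])
```

rmse and mae are in units of sales volume, whereas r2 and mape are unitless, which is why the first two rows of the table are on a different scale from the last two.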

Statistically, the accuracy of the transfer learning model itself was found to be high in both cases. With the single transfer learning model, however, some exceptional data points were observed at which the predicted value differed very greatly from the actual value. The model to which the method of the present invention, with its cluster concept, was applied was found to be more stable and to show a statistically smaller difference from the actual values, and hence higher accuracy.

100: store management unit
110: sales information acquisition unit
120: demand forecasting unit
130: ordering unit
200: control unit
210: weighted correlation equation creation module
220: cluster classification module
230: demand forecasting model generation module
290: transfer learning module

Claims

A method comprising:
(A) preparing, by a weighted correlation equation creation module 210, a weighted correlation equation using a preset method;
(B) decomposing, by a cluster classification module 220, the hourly sales volume data of each of a plurality of products into trend, seasonality, and residual time series, computing a correlation of trend, a correlation of seasonality, and a correlation of residual for every pair of the plurality of products, and substituting them into the weighted correlation equation to obtain a weighted correlation for every product pair;
(C) computing, by the cluster classification module 220, for each product of the plurality of products, the average of all weighted correlations that involve that product, thereby obtaining a weighted correlation average for each product;
(D) classifying, by the cluster classification module 220, the plurality of products into one or more clusters by placing products whose weighted correlation averages fall within a preset range into the same cluster; and
(E) for every cluster classified in step (D), generating, by a demand forecasting model generation module 230, a demand forecasting model for the cluster by sequentially performing transfer learning on the hourly sales volume data of all products classified in that cluster.

The method according to claim 1, wherein step (A) comprises:
(a1) obtaining, by the weighted correlation equation creation module 210, hourly sales volume data for each of a plurality of modeling target products;
(a2) decomposing, by the weighted correlation equation creation module 210, the obtained hourly sales volume data into trend, seasonality, and residual time series; and
(a3) creating, by the weighted correlation equation creation module 210, the weighted correlation equation using the decomposed data of the modeling target products.

The method according to claim 2, wherein step (a3) comprises:
(a31) generating, by a transfer learning module 290, a first demand forecasting model by sequentially performing transfer learning on the decomposed time-series data of the modeling target products; and
(a32) computing, by the transfer learning module 290, the permutation importance of each of the decomposed trend, seasonality, and residual time series using the first demand forecasting model, and creating, by the weighted correlation equation creation module 210, the weighted correlation equation from these importances as follows:
Weighted Correlation = Correlation of Trend * Permutation Importance of Trend + Correlation of Seasonality * Permutation Importance of Seasonality + Correlation of Residual * Permutation Importance of Residual

The method according to claim 2, wherein the modeling target products in step (a1) and the plurality of products in step (B) are the same.

The method according to claim 3, wherein step (E) comprises generating the demand forecasting model by the demand forecasting model generation module 230 using the transfer learning module 290 to perform transfer learning on the hourly sales volume data of all products in each cluster classified in step (D).

The method according to claim 5, wherein step (a31) comprises generating, by the transfer learning module 290, the first demand forecasting model based on a random forest, and step (a32) comprises computing, by the transfer learning module 290, the permutation importance based on the random forest of the first demand forecasting model.

The method according to claim 4, further comprising, after step (E):
(F) inputting a given product and a time to be predicted into a demand forecasting unit 120;
(G) identifying, by the demand forecasting unit 120, the cluster to which the input product belongs; and
(H) predicting, by the demand forecasting unit 120, the demand at the time to be predicted by loading the demand forecasting model generated for the identified cluster and applying the time to be predicted.

The method according to claim 6, further comprising, after step (H):
(I) generating, by an ordering unit 130, product order information for ordering the product using the demand for the product predicted in step (H).

The method according to claim 8, wherein a store management unit 100 is linked with the ordering unit 130 and a sales information acquisition unit 110, the sales information acquisition unit 110 acquires sales information including the hourly sales volume data of the plurality of products, step (a1) comprises obtaining the hourly sales volume data of each of the plurality of products from the sales information acquisition unit 110, and step (B) comprises obtaining, by the cluster classification module 220, the hourly sales volume data of each of the plurality of products.

A program stored in a storage medium and executed by the control unit 200 so as to perform the method according to any one of claims 1 to 9.