KR20190130216A

KR20190130216A - Method of generating demand forecast data by online big data analysis

Info

Publication number: KR20190130216A
Application number: KR1020180048139A
Authority: KR
Inventors: 백승훈; 이지수; 강민구
Original assignee: (주)엘렌; 한신대학교 산학협력단
Priority date: 2018-04-25
Filing date: 2018-04-25
Publication date: 2019-11-22

Abstract

The present invention relates to a technique for forecasting demand by analyzing various kinds of data. In particular, it provides a technique that locates the sources where actual data exist, collects that data daily and in real time, and applies various environment variables, thereby providing an advanced demand forecasting algorithm and its result values. The present invention further relates to a technique for adjusting web crawling weights when generating demand forecast data through big data analysis. In particular, it provides a big data analysis modeling and demand forecast processing technique that improves big data analysis performance and forecasts demand effectively. When a web crawler (a computer program that explores the World Wide Web in an organized, automated way) gathers large amounts of web data for internet marketing, the data in the big data inflow path are classified by a data classification method, and the weight of each class is adjusted by feedback.

Description

Method of generating demand forecast data by online big data analysis

The present invention generally relates to a technique for forecasting demand by analyzing various kinds of data. In particular, it provides a technique that finds where actual data exist, collects the data daily and in real time, and applies the various environment variables present, thereby delivering an advanced demand forecasting algorithm and its result values.

The present invention also relates to a technique for adjusting web crawling weights when generating demand forecast data through big data analysis. In particular, when large-scale web data are collected for internet marketing by a web crawler (a computer program that explores the World Wide Web in an organized, automated way), the data are classified in the big data inflow path by a data classification method (Data Classification), and the weight of each class is adjusted by feedback. This provides a big data analysis modeling and demand forecast processing technique that improves big data analysis performance and performs demand forecasting effectively.

Recently, with advances in smart devices and networks and the growth of various network services, the amount of published and accumulated data has increased greatly. Institutions, companies, and individuals alike are pursuing diversified research and development to analyze this vast data, provide convenience, and raise service levels.

However, the data available for use are either already processed or must come from specific SNS platforms, news outlets, or public data held by public institutions, and using such data raises many problems: gaining access to the data even when the needed data exist, the limited level of disclosure chosen by the data provider, and poor usability caused by differing data formats.

Accordingly, there is a need for a data collection and analysis technology that solves these problems of the prior art by improving how large amounts of data published on the Internet are collected, analyzed, and processed, and that thereby enables commercial use.

A general object of the present invention is to provide a technique for forecasting demand by analyzing various kinds of data. In particular, it aims to provide a technique that finds where actual data exist, collects the data daily and in real time, and applies the various environment variables present, thereby delivering an advanced demand forecasting algorithm and its result values.

Another object of the present invention is to provide a technique for adjusting web crawling weights when generating demand forecast data through big data analysis. In particular, it aims to provide a big data analysis modeling and demand forecast processing technique that, when large-scale web data are collected for internet marketing by a web crawler (a computer program that explores the World Wide Web in an organized, automated way), classifies the data in the big data inflow path by a data classification method and adjusts the weight of each class by feedback, thereby improving big data analysis performance and performing demand forecasting effectively.

To achieve these objects, the present invention provides a method of generating demand forecast data through online big data analysis.

In particular, the present invention proposes a web crawling weight adjustment technique for generating demand forecast data through online big data analysis.

The computer program according to the present invention is stored in a medium and, in combination with hardware, executes the above method of generating demand forecast data through online big data analysis.

According to the present invention, instead of analyzing the closed, pre-processed, and ineffective data used in the prior art, data acquired in real time from markets in actual operation are analyzed as they arrive to forecast market demand, which makes the method highly useful as a market analysis model.

In addition, the big data analysis module (BAFP) presented in the present invention is designed to make use of the large amounts of open, shared data of the big data era. This enables accurate analysis, and because the targets of the big data analysis can be extended, various industries and countries can be analyzed.

Furthermore, according to the present invention, analysis by generation and by country is possible based on consumption patterns, so demand can be forecast for each target group through analysis of behavioral patterns across generations and of overseas consumption patterns.

FIG. 1 is a diagram conceptually illustrating the operating structure of a web crawler.
FIG. 2 is a diagram conceptually illustrating the data processing structure associated with the big data analysis module (BAFP) in the present invention.
FIG. 3 is a diagram illustrating in more detail the data processing structure associated with the big data analysis module (BAFP) in the present invention.
FIG. 4 is a diagram conceptually illustrating how the big data analysis module (BAFP) processes big data in the present invention.
FIG. 5 is a diagram conceptually illustrating the adjustment of the trend weight in the product evaluation conversion formula in the present invention.
FIG. 6 is a diagram illustrating an example that uses the product preference analysis in the present invention.
FIG. 7 is a block diagram conceptually illustrating the functional modules of the big data analysis module (BAFP) in the present invention.
FIG. 8 is a diagram illustrating an example of what the functional modules of the big data analysis module (BAFP) refer to during internal processing in the present invention.
FIG. 9 is a diagram conceptually illustrating the reporting process based on big data analysis results in the present invention.
FIG. 10 is a diagram illustrating an example of analysis reporting in the present invention.

Hereinafter, the present invention will be described in detail with reference to the drawings.

First, data collection/inflow and backward search for big data analysis modeling and demand forecast data generation are examined.

The approach to data collection differs according to the scope of disclosure, and the content of the data to be collected differs according to the intended use of the data.

In addition, because most collection targets do not provide an API (Application Program Interface), data must be extracted through various access methods suited to the network structure and security environment of each target.

As for the collection method, data must be collected daily and in real time, and accurate analysis requires that collection follow preset patterns and schedules. Because a large number of virtualized instances is needed, the collection server requires high specifications and a fast network. For example, data collection can be performed with Python running across multiple virtualized instances.

In the present invention, web crawling and data parsing may be used for big data collection and retrieval. Web crawling is used to define the collection targets and classify them by the utility value of their data, and data parsing is used to periodically collect the data suited to the analysis purpose and turn it into structured data.
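As a minimal sketch of the data-parsing step, the parser below pulls product fields out of an HTML fragment using only the standard library. It assumes, purely for illustration, that the target page marks each field with a hypothetical `data-field` attribute; real sites differ, and each collection target would need its own extraction rules.

```python
from html.parser import HTMLParser


class ProductFieldParser(HTMLParser):
    """Collects the text of elements carrying a (hypothetical) data-field attribute."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._current = None  # name of the field whose text is currently being read

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if "data-field" in attr_map:
            self._current = attr_map["data-field"]

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the current field (no nested markup).
        self._current = None

    def handle_data(self, data):
        if self._current is not None and data.strip():
            self.fields[self._current] = data.strip()


def parse_product(html: str) -> dict:
    parser = ProductFieldParser()
    parser.feed(html)
    parser.close()
    return parser.fields


sample = (
    '<div><h1 data-field="name">Wet wipes 100 sheets</h1>'
    '<span data-field="price">3900</span>'
    '<span data-field="reviews">412</span></div>'
)
print(parse_product(sample))
```

Values are kept as strings here; converting them to numbers belongs to a later normalization step.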

FIG. 1 is a diagram conceptually illustrating the operating structure of a web crawler well suited to big data collection in the present invention.

A web crawler is a computer program that explores the World Wide Web in an organized and automated way. The work a web crawler does is generally called web crawling or spidering. Many sites, such as search engines, crawl the web to keep their data up to date. A web crawler is used to create copies of all the pages of the sites it visits, and a search engine indexes those copies so they can be searched quickly. Crawlers are also used for automated website maintenance, such as link checking and HTML code validation, and for collecting specific kinds of information from web pages, such as automatic email address collection.

A web crawler is a type of bot or software agent. It usually starts from a list of URLs called seeds, recognizes all the hyperlinks on each page it visits, and adds them to the URL list. The updated URL list is then visited recursively.
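The seed-and-frontier loop just described can be sketched as follows. This is an illustrative sketch, not the crawler of the invention: the recursive revisiting is written iteratively with a queue (breadth-first order), and `fetch` is a caller-supplied function (hypothetical here) so the example runs without network access. A production crawler would also need politeness delays, robots.txt handling, and error handling.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Gathers the href values of all <a> tags on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def crawl(seeds, fetch, max_pages=100):
    """Breadth-first crawl from the seed URLs.

    fetch(url) returns the page HTML, or None if the page cannot be retrieved.
    Returns the URLs in the order they were visited.
    """
    frontier = deque(seeds)
    seen = set(seeds)
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        extractor = LinkExtractor()
        extractor.feed(html)
        for href in extractor.links:
            # Resolve relative links and drop #fragments so each page is queued once.
            absolute, _ = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return visited


# A tiny in-memory "web" stands in for real HTTP requests.
site = {
    "https://shop.example/": '<a href="/a">A</a><a href="/b">B</a>',
    "https://shop.example/a": '<a href="/b">B</a><a href="/">home</a>',
    "https://shop.example/b": '<a href="/a#top">A</a>',
}
print(crawl(["https://shop.example/"], site.get))
```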

A search engine, a computer program able to explore the World Wide Web in this organized, automated way, operates in three stages: web crawling, indexing, and extraction (searching).

First, the web crawling stage copies all the pages of a site, giving priority to the most recent data. The indexing stage then collects, stores, and analyzes the data so that it can be searched accurately and quickly. Finally, the extraction stage extracts titles, subjects, and the like that contain the words making up the search query.
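The indexing and extraction stages can be illustrated with a toy inverted index. This is a sketch under simple assumptions: pages are plain text, tokens are lowercase word characters, a match must contain every query term, and ranking by summed term frequency is only a stand-in for the relevance ranking mentioned in the text, not how any particular search engine ranks.

```python
import re
from collections import defaultdict


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def build_index(pages):
    """pages: {url: text}. Returns an inverted index {term: {url: term frequency}}."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term][url] = index[term].get(url, 0) + 1
    return index


def search(index, query):
    """URLs containing every query term, ranked by total term frequency (ties by URL)."""
    terms = tokenize(query)
    if not terms:
        return []
    postings = [index.get(t, {}) for t in terms]
    matches = set(postings[0])
    for p in postings[1:]:
        matches &= set(p)
    return sorted(matches, key=lambda u: (-sum(p[u] for p in postings), u))


idx = build_index({"u1": "wet wipes wet", "u2": "wet tissue", "u3": "luxury bag"})
print(search(idx, "wet"))
```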

Searching is carried out through this series of stages. A search engine's search capability is determined by the relevance between the query and the data. Among the many web pages retrieved, some are inevitably more relevant to the search keywords than others, and the search engine continuously adjusts itself through feedback, using indexing analysis of priority values to rank the most relevant results first.

Next, the following are described for the big data analysis module (BAFP): data collection and inflow (log analysis), creation and search of a cloud database (Cloud DB), and continuous feedback adjustment of weight values.

FIGS. 2 and 3 are diagrams conceptually illustrating the data processing structure associated with the big data analysis module (BAFP) in the present invention. To analyze the large amount of data collected online through web crawling, indexing, extraction, and so on for the analysis purpose of the present invention (that is, generating demand forecast data), the data are fed into the big data analysis module (Bigdata Analysis Framework Platform; BAFP), which derives result values consistent with that analysis purpose.

First, a plurality of tasks is provided to handle the various environments encountered when collecting big data online. Each task collects information individually from a third-party website. Because the type of data collected and the collection environment differ from site to site, each task is configured as a module suited to its environment.
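One way to organize per-site tasks is a small class hierarchy in which each subclass adapts to one site, with all tasks run concurrently. The class names, site names, and returned records below are hypothetical placeholders; a real task would fetch and parse its site's pages inside `collect`.

```python
from concurrent.futures import ThreadPoolExecutor


class CollectionTask:
    """Base class: one task per source site, adapted to that site's environment."""

    name = "base"

    def collect(self):
        raise NotImplementedError


class ShopATask(CollectionTask):
    name = "shop_a"

    def collect(self):
        # Placeholder for fetching and parsing shop A's pages.
        return [{"item": "wet wipes", "sold": 120}]


class ShopBTask(CollectionTask):
    name = "shop_b"

    def collect(self):
        # Shop B exposes different field names and string-typed numbers.
        return [{"product_name": "wet wipes", "sales_cnt": "95"}]


def run_tasks(tasks, max_workers=4):
    """Run every task in a thread pool and return {task name: collected records}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {t.name: pool.submit(t.collect) for t in tasks}
        return {name: f.result() for name, f in futures.items()}


print(run_tasks([ShopATask(), ShopBTask()]))
```

Threads suit this I/O-bound work; the text's mention of many virtualized instances would correspond to spreading such workers across machines.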

Next, the process by which the big data analysis module (BAFP) processes the big data collected in this way is described.

First, the big data analysis module (BAFP) classifies the differing variables collected in different environments according to the analysis purpose, stores them in a log data database (Log Data DB), and performs comparative analysis. The comparatively analyzed data is then processed into a modular cloud data set (Cloud Data set) that stores the data useful for analysis.
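The sketch below illustrates the classify-then-aggregate flow under stated assumptions: each site's raw field names are mapped onto one common schema (the mappings in `FIELD_MAPS` are hypothetical), normalized records are appended to a list standing in for the Log Data DB, and a per-product summary stands in for the cloud data set. Summing counts across sites is just one simple choice of aggregation; the text does not specify the comparative analysis.

```python
# Hypothetical per-site field names mapped onto one common schema.
FIELD_MAPS = {
    "shop_a": {"item": "product", "sold": "selling_count", "hits": "view_count"},
    "shop_b": {"product_name": "product", "sales_cnt": "selling_count", "pv": "view_count"},
}
NUMERIC = {"selling_count", "view_count"}


def normalize(site, raw):
    """Rename a site's raw fields to the common schema and coerce counts to int."""
    record = {"site": site}
    for src, dst in FIELD_MAPS[site].items():
        if src in raw:
            value = raw[src]
            record[dst] = int(value) if dst in NUMERIC else value
    return record


def build_cloud_dataset(log):
    """One row per product, with numeric fields summed across sites."""
    dataset = {}
    for rec in log:
        row = dataset.setdefault(rec["product"], {f: 0 for f in NUMERIC})
        for f in NUMERIC:
            row[f] += rec.get(f, 0)
    return dataset


log_db = []  # stands in for the Log Data DB
log_db.append(normalize("shop_a", {"item": "wet wipes", "sold": 120, "hits": 900}))
log_db.append(normalize("shop_b", {"product_name": "wet wipes", "sales_cnt": "95", "pv": "700"}))
print(build_cloud_dataset(log_db))
```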

In addition, the big data analysis module (BAFP) analyzes the correlations between the elements (variables) collected from each third-party website and, by comparing them with other extraction groups, back-calculates data that could not be extracted directly to infer their values.

An application example of this big data analysis follows. After the sales volume, view count, and review count of Task 1 through Task N are analyzed, the sales volume and view count of Task-i are inferred from its reviews. The big data analysis module (BAFP) can infer accurate data values by applying weights for industry, category, price, and other elements collected from the third-party websites.
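A minimal sketch of this back-calculation, assuming sales and views scale with review count: the weighted mean of the sales-per-review and views-per-review ratios observed on fully visible sources is applied to the review count of Task-i. The similarity weights (for industry, category, price) are supplied by the caller and are an assumption of this sketch, not a formula from the text.

```python
def infer_from_reviews(reference, target_reviews, weights=None):
    """Estimate sales and views for a source that exposes only its review count.

    reference: records with known "sales", "views" and "reviews" (Task 1 ~ Task N).
    weights: optional per-record similarity weights (e.g. same category or price band).
    """
    if weights is None:
        weights = [1.0] * len(reference)
    usable = [(r, w) for r, w in zip(reference, weights) if r["reviews"] > 0 and w > 0]
    if not usable:
        raise ValueError("no usable reference records")
    total_w = sum(w for _, w in usable)
    sales_per_review = sum(w * r["sales"] / r["reviews"] for r, w in usable) / total_w
    views_per_review = sum(w * r["views"] / r["reviews"] for r, w in usable) / total_w
    return {
        "sales": sales_per_review * target_reviews,
        "views": views_per_review * target_reviews,
    }


reference = [
    {"sales": 100, "views": 1000, "reviews": 10},
    {"sales": 300, "views": 3000, "reviews": 30},
]
print(infer_from_reviews(reference, target_reviews=5))
```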

Examples of the collection elements (variables) used by the big data analysis module (BAFP) for big data analysis in the present invention are as follows.

1) Product category

2) Product name or unique key

3) Price

4) Selling count

5) View count

6) Review count

7) Q&A count

8) Update time

FIG. 4 is a diagram conceptually illustrating how the big data analysis module (BAFP) processes big data in the present invention. The big data analysis module (BAFP) defines the attributes and correlations of each piece of collected data and computes on them, then repeatedly combines the computed values with further parameters, assuming various variables, to derive empirical data values.

To analyze the correlations among the elements (variables) collected from third-party websites, the present invention performs data collection/inflow (log data analysis), cloud database creation and search, and continuous feedback processing of weight values for the big data analysis module (BAFP).

The collection elements (variables) listed above are now examined in turn.

1) Update time (core analysis factor): the product's release time is the reference point against which each variable is applied for demand analysis and forecasting.

2) Selling count (core analysis factor): a value that changes in real time, so it must be collected in real time for comparative forecasting.

3) View count (core analysis factor): combined with the purchase variable, it allows the product purchase hit rate to be analyzed, and it is the source for evaluating preference for the product.

4) Price (weight variable): the basis for comparing clusters and a value that affects demand forecast analysis; because it may change intermittently, it must be collected and compared periodically.

5) Review count (weight variable): evaluations of the product establish the direction of product improvement and the criteria for development.

6) Q&A count (weight variable): needed to analyze the product's interest index and characteristics.

7) Product category (weight variable): a fine-grained definition is needed to analyze market share and form groups.

8) Product name or unique key (reference element): identifies the product, either by defining (collecting) the product name or by distinguishing it with a unique value.
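The eight elements can be held in one record type, as sketched below. The field names are illustrative, the comments mirror the roles listed above, and the purchase hit rate is computed as selling count divided by view count, the same "sales rate" ratio the text uses for FIG. 6; the days-since-update helper is an added convenience for using update time as the reference point.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ProductRecord:
    product_category: str  # weight variable
    product_key: str       # reference element: product name or unique key
    price: float           # weight variable
    selling_count: int     # core analysis factor
    view_count: int        # core analysis factor
    review_count: int      # weight variable
    qna_count: int         # weight variable
    update_time: datetime  # core analysis factor (reference point in time)

    def purchase_hit_rate(self) -> float:
        """Sales per view; 0.0 when the product has not been viewed yet."""
        return self.selling_count / self.view_count if self.view_count else 0.0

    def days_since_update(self, now: datetime) -> float:
        return (now - self.update_time).total_seconds() / 86400


rec = ProductRecord("household", "WT-100", 3900.0, 50, 1000, 12, 3, datetime(2018, 4, 1))
print(rec.purchase_hit_rate(), rec.days_since_update(datetime(2018, 4, 11)))
```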

Next, a product preference analysis method using the big data analysis module (BAFP) is described.

In a preferred embodiment of the present invention, the effectiveness of a product (its evaluation index) is analyzed with the product evaluation conversion formula of Equation 1.

The present invention introduces the concept of a product evaluation conversion formula so that product evaluation, one of the elements (variables) collected from third-party websites, can be expressed numerically; Equation 1 is one example. The product evaluation value is an actual measurement that defines a product's marketability. So-called psychological support levels arise for each category and price range, so the reference point is measured differently for each. Products in the same category show similar or identical patterns, and weights are applied differently according to price.

In Equation 1, the trend weight is calculated by dividing the period into several sectors and examining how the resulting values flow from sector to sector; it is then applied as a weight in the product evaluation conversion. In the present invention, product evaluation shifts with trends and external variables (season, special circumstances) but generally follows a consistent pattern of change, so the coefficient of the trend weight can be extracted.
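Equation 1 itself is not reproduced in this text, so the following is only a hypothetical illustration of the sector idea, not the patent's formula. It assumes the trend weight is derived from the least-squares slope of the sector means, normalized by their overall mean: a flat trend gives 1.0, a rising trend more than 1.0, a falling trend less (clamped at 0). The weight then scales a base evaluation score.

```python
def trend_weight(values, sectors):
    """Hypothetical trend weight from the slope of per-sector means.

    Trailing values that do not fill a whole sector are ignored.
    """
    if sectors < 2 or len(values) < sectors:
        raise ValueError("need at least two sectors and one value per sector")
    size = len(values) // sectors
    means = [sum(values[i * size:(i + 1) * size]) / size for i in range(sectors)]
    x_mean = (sectors - 1) / 2
    y_mean = sum(means) / sectors
    # Ordinary least-squares slope of sector mean against sector index.
    num = sum((i - x_mean) * (m - y_mean) for i, m in enumerate(means))
    den = sum((i - x_mean) ** 2 for i in range(sectors))
    slope = num / den
    return max(0.0, 1.0 + slope / y_mean) if y_mean else 1.0


def product_score(base_score, values, sectors=4):
    return base_score * trend_weight(values, sectors)


print(trend_weight([1, 1, 2, 2, 3, 3], sectors=3))
```

A sector-sensitivity rule (for example, shorter sectors for trend-sensitive goods such as clothing) would plug in through the `sectors` argument.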

Examples of using this product evaluation are as follows.

1) Wet wipes (necessities category): the measured value is very high, and the price-dependent weight is high.

2) Luxury goods (luxury category): the measured value is very low, and the price-dependent weight is low.

FIG. 5 is a diagram conceptually illustrating the adjustment of the trend weight in the product evaluation conversion formula in the present invention, and presents a mathematical approach to weight adjustment. The 7GE shown in FIG. 5 is data analyzed for a strongly trend-driven product group (clothing). One feature of the present invention is that the period can be adjusted intelligently, for example by artificial intelligence (AI), according to the sensitivity of the trend.

FIG. 6 is a diagram illustrating an example that uses product preference analysis such as Equation 1 in the present invention. As in the example of FIG. 6, the reliability of the result can be adjusted by applying selective conditions according to the choice of survey target or the target's sensitivity. To ensure the reliability of the analysis results, options can be set such as ordering by sales, by sales rate (sales volume / view count), or by change in sales trend, and grouping by season, month, or week.
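The ordering options above map naturally onto sort keys. In this sketch the record fields (`sales`, `views`, `prev_sales`) and the option names are hypothetical, "trend change" is taken as the difference from the previous period's sales, and the season/month/week grouping is left out.

```python
def rank_products(products, option):
    """Return product names ordered (descending) by one of the report options."""
    keys = {
        "sales": lambda p: p["sales"],
        "sales_rate": lambda p: p["sales"] / p["views"] if p["views"] else 0.0,
        "trend_change": lambda p: p["sales"] - p["prev_sales"],
    }
    if option not in keys:
        raise ValueError(f"unknown option: {option}")
    return [p["name"] for p in sorted(products, key=keys[option], reverse=True)]


products = [
    {"name": "A", "sales": 100, "views": 2000, "prev_sales": 90},
    {"name": "B", "sales": 60, "views": 300, "prev_sales": 20},
    {"name": "C", "sales": 80, "views": 800, "prev_sales": 85},
]
for option in ("sales", "sales_rate", "trend_change"):
    print(option, rank_products(products, option))
```

The three options can disagree sharply: the best seller by volume need not convert views best or be growing fastest.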

Next, big data analysis and demand forecasting, the data classification method, and correlation weight adjustment for the big data analysis module (BAFP) are described, starting from the elements (variables) collected from third-party websites.

FIG. 7 is a block diagram conceptually illustrating the functional modules of the big data analysis module (BAFP) in the present invention, and FIG. 8 is a diagram illustrating an example of what those functional modules refer to during internal processing.

As shown in FIG. 2, the data collected for the big data analysis module (BAFP) are organized into tasks whose analysis result values match the analysis purpose. Through the big data analysis module (BAFP) of FIG. 7, they are stored in the respective cloud data set (Cloud Data Set) and log data set (Log Data set), and are then repeatedly re-analyzed and updated.

The big data analysis module (BAFP) of FIG. 7 defines the correlations among the M kinds of data collected in real time and builds a computation set for each. Data values are then derived through N-1 iterations, each computing a further correlation coefficient from the computed data, excluding the analysis day. The derived data values are processed into forecastable data through crossed and sequential computations with modules such as the Pattern Classification Module, the A Correlation Coefficient Module, and the Weighted ratio Module.
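The text does not define the coefficients used by the A Correlation Coefficient Module. As one plausible reading, offered only as a sketch, the code below computes Pearson correlation coefficients for every pair of the M daily variables over the history, leaving out the most recent day (the analysis day).

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series has no defined correlation; treat it as 0
    return sxy / math.sqrt(sxx * syy)


def correlation_set(daily, exclude_last=True):
    """daily: {variable: list of daily values}. Returns {(a, b): r} for every pair.

    With exclude_last=True the most recent day (the analysis day) is left out.
    """
    series = {k: (v[:-1] if exclude_last else v) for k, v in daily.items()}
    return {(a, b): pearson(series[a], series[b]) for a, b in combinations(sorted(series), 2)}


daily = {"views": [10, 20, 30, 99], "sales": [1, 2, 3, 0], "reviews": [3, 2, 1, 5]}
print(correlation_set(daily))
```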

The weight adjustment method for data collection in the present invention is described next. To generate accurate, real-time analysis data and use system resources efficiently, a more efficient collection algorithm is proposed. Scheduling is performed by assigning a weight according to each piece of update data, so that data can be moved automatically between the data store and the collection scheduling store according to the data classification method.

The database structure of the collection system in the present invention is as follows. Multiple data sets are classified by different collection periods, for example Data 1 (10-minute period), Data 2 (30-minute period), Data 3 (1-hour period), Data 4 (3-hour period), Data 5 (6-hour period), Data 6 (12-hour period), Data 7 (18-hour period), Data 8 (24-hour period), and Data Free (autonomous collection).

Data can be updated efficiently by selectively inserting items into, and moving them between, database pools (DB.Pool) according to weights updated every hour. Because each of the nine DB.Pool levels is selected sequentially, items are sorted automatically, and additional logic is built so that items are moved between DB.Pools in a dual manner.
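The pool assignment can be sketched as follows, with the assumptions stated plainly: an item's weight is taken to be its observed number of changes per hour, each item goes to the longest pool period that does not exceed the expected time between changes (roughly one change per collection), and items that never change go to Data Free. The periods come from the list above; the assignment rule is hypothetical.

```python
# Collection periods of the pools listed above, in minutes.
POOLS = [
    ("Data 1", 10), ("Data 2", 30), ("Data 3", 60), ("Data 4", 180),
    ("Data 5", 360), ("Data 6", 720), ("Data 7", 1080), ("Data 8", 1440),
]


def assign_pool(changes_per_hour):
    """Pick a pool from an item's weight, assumed here to be changes per hour."""
    if changes_per_hour <= 0:
        return "Data Free"
    expected_gap = 60 / changes_per_hour  # minutes between changes
    chosen = POOLS[0][0]  # items changing faster than every 10 minutes use the fastest pool
    for name, period in POOLS:
        if period <= expected_gap:
            chosen = name
    return chosen


def rebalance(items):
    """Hourly pass: regroup {item id: changes per hour} into {pool: [item ids]}."""
    pools = {}
    for item, weight in items.items():
        pools.setdefault(assign_pool(weight), []).append(item)
    return pools


print(rebalance({"p1": 6, "p2": 1, "p3": 0.05, "p4": 0}))
```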

Next, the efficiency analysis of collecting variable data from third-party websites for the big data analysis module (BAFP) is described. Weights can be configured for the elements (variables) collected from third-party websites, and reports can be produced from the analysis results. Changes in the degree of rise and fall can be detected at this point.
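Rise and fall detection can be as simple as labeling each period-over-period change. In this sketch the 5% flat band is an arbitrary, hypothetical threshold, and a rise from zero is labeled as a rise because no relative change can be computed.

```python
def classify_changes(values, threshold=0.05):
    """Label each period-over-period change as "rise", "fall" or "flat".

    threshold: relative change below which a move counts as flat (5% by default).
    """
    labels = []
    for prev, cur in zip(values, values[1:]):
        if prev == 0:
            labels.append("rise" if cur > 0 else "flat")
            continue
        change = (cur - prev) / prev
        if change > threshold:
            labels.append("rise")
        elif change < -threshold:
            labels.append("fall")
        else:
            labels.append("flat")
    return labels


print(classify_changes([100, 120, 118, 90]))
```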

FIG. 9 is a diagram conceptually illustrating the reporting process based on big data analysis results in the present invention, and FIG. 10 is a diagram illustrating an example of analysis reporting derived through big data analysis. In FIG. 10, the classification by numerical level can be subdivided further as more varied cases accumulate, yielding finer result values for the analysis of measurements.

As described above, the variable data classification method and the correlation weight adjustment method for collecting information from third-party websites can be implemented in various ways. For example, an automatic evaluation report on items overall can be provided through different visual analyses based on the values extracted by 1) distribution analysis by time, day of the week, and season, 2) analysis of rises and falls, 3) analysis of sporadic sudden changes, 4) sales distribution analysis, and 5) market share analysis.
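Three of the listed analyses are sketched below with the standard library: weekday distribution of sales, market share, and sporadic sudden changes. Flagging a sudden change when a value lies more than two population standard deviations from the mean is a common rule of thumb chosen for illustration, not a method taken from the text; the numbers in the usage lines are made up.

```python
from collections import Counter
from datetime import date
from statistics import mean, pstdev


def weekday_distribution(sales):
    """sales: list of (date, units). Share of units sold on each weekday (0 = Monday)."""
    totals = Counter()
    for day, units in sales:
        totals[day.weekday()] += units
    grand = sum(totals.values())
    if not grand:
        return {}
    return {wd: units / grand for wd, units in sorted(totals.items())}


def market_share(units_by_product):
    """Each product's share of total units sold."""
    total = sum(units_by_product.values())
    if not total:
        return {}
    return {p: u / total for p, u in units_by_product.items()}


def sudden_changes(values, z=2.0):
    """Indices of values more than z population standard deviations from the mean."""
    if len(values) < 2:
        return []
    m, s = mean(values), pstdev(values)
    if s == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - m) > z * s]


sales = [(date(2018, 4, 23), 30), (date(2018, 4, 24), 10), (date(2018, 4, 30), 20)]
print(weekday_distribution(sales))
print(market_share({"A": 60, "B": 30, "C": 10}))
print(sudden_changes([10, 11, 9, 10, 10, 12, 10, 50, 9, 10]))
```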

The present invention may also be embodied as computer-readable code on a computer-readable non-volatile recording medium. Such media include various types of storage devices, such as hard disks, SSDs, CD-ROMs, NAS, magnetic tapes, web disks, and cloud disks, and the code may also be stored and executed in distributed form across multiple networked storage devices. The present invention may further be implemented as a computer program stored in a medium that, in combination with hardware, executes a specific procedure.

Claims

A method of generating demand forecast data through online big data analysis.

A computer program stored in a medium and coupled with hardware to execute the method of generating demand forecast data through online big data analysis according to claim 1.