KR20010105967A

KR20010105967A - Method of web data mining using on-line web data acquisition and analysis and consulting system using the same

Info

Publication number: KR20010105967A
Application number: KR1020000027112A
Authority: KR
Inventors: 김광용
Original assignee: 이기영; (주) 이씨마이너
Priority date: 2000-05-19
Filing date: 2000-05-19
Publication date: 2001-11-29

Abstract

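A web data mining method, a method of building a web data mining solution from its results, and a consulting system that uses the solution. Web data are collected and analyzed in real time from Internet surveys, real-time web surfing logs of Internet users gathered with web pointer software, server access web logs and transaction data, and real-time web customer evaluation data. The collected data are integrated into a data warehousing system, from which the web data mining solution is developed.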

Description

Web data mining method using real-time web data collection and analysis and consulting system using the result {METHOD OF WEB DATA MINING USING ON-LINE WEB DATA ACQUISITION AND ANALYSIS AND CONSULTING SYSTEM USING THE SAME}

The present invention relates to a web data mining method and to a method of building a web data mining solution from its results. In particular, it relates to constructing a data warehousing system and developing a web data mining solution through the real-time collection and analysis of web data: Internet surveys, real-time web surfing log data of Internet users gathered with web pointer software, server access web logs and transaction data, and real-time web customer evaluation data.

Existing Internet consulting firms and Internet research firms base their consulting and research on data whose objectivity and reliability have not been demonstrated, and they develop data mining solutions from only one or two simple kinds of data. Their customers therefore receive only a narrow information service that lacks objectivity and reliability. Real-time data provided so far through web tracking also has a significant limitation: the composition of the panels does not reflect the tendencies of Korean netizens as a whole, and there is no system that can objectively reflect tendencies that change in real time. The existing web log analysis industry provides only a very narrow range of consulting services, built on a data warehouse that serves as a simple data refinement system. In addition, most domestic data mining solution providers adopt, unchanged, data mining algorithms derived from foreign data that do not fit conditions in Korea. These algorithms were not derived from Internet data, are far removed from the reality of Korean companies, and come with no demonstration of the objectivity and reliability of the data used to derive them, even though the algorithm is the core technology of data mining. The weakest point of existing data mining solution providers is that they apply a single solution to every company. Applying one solution to all industries while ignoring the characteristics of each greatly reduces the objectivity and clarity of that solution.

The object of the present invention is to remedy the shortcomings of Internet business development based on the limited prior-art data mining solutions described above by providing a web data mining method for real-time web data, a method of developing a web data mining solution from its results, and a consulting system that uses the web data mining solution. To this end, an Internet survey system, a web tracking system, a server web log analysis system, and industry and individual product trends are used to collect and process web data in real time, providing an objective and reliable web data mining solution.

FIG. 1 shows an Internet business consulting system according to an embodiment of the present invention.

FIG. 2 shows the Internet survey system in detail.

FIG. 3 shows the configuration of the server computer and client computer systems in the web tracking system.

FIG. 4 shows in detail the implementation principle of the client system in the web tracking system.

FIG. 5 shows in detail the structure of the server system in the web tracking system.

FIG. 6 shows in detail the implementation principle of the server web log analysis system.

FIG. 7 shows the structure of the real-time web customer evaluation system for identifying industry and individual product trends.

FIG. 8 shows the overall process of the web pointer program used to implement the web tracking system.

FIG. 9 shows step 1 of the web pointer program in detail.

FIG. 10 shows step 2 of the web pointer program in detail.

FIG. 11 shows step 3 of the web pointer program in detail.

FIG. 12 shows step 4 of the web pointer program in detail.

FIG. 13 shows the overall process of the business method of the consulting system according to an embodiment of the present invention.

FIG. 14 shows in detail the process of the Internet survey system, which is step 1 of the business method of FIG. 13.

FIG. 15 shows in detail the method of selecting the panels used in the business method of FIG. 13.

FIG. 16 shows the configuration of the web tracking system, which is step 2 of the business method of FIG. 13.

FIG. 17 shows step 1 of the web tracking system in detail.

FIG. 18 shows step 2 of the web tracking system in detail.

FIG. 19 shows in detail the configuration of the server web log analysis system, which is step 3 of the business method of FIG. 13.

FIG. 20 shows in detail the configuration of the real-time web customer evaluation system for identifying industry and individual product trends.

FIG. 21 shows in detail how the various data stored in databases in the preceding steps are converted and integrated into a single data warehousing system.

FIG. 22 shows in detail the process of the web data mining solution (method), which extracts from the data warehousing system the data matching the format of each web data mining algorithm.

FIG. 23 shows the process of implementing a personalized web surfing system using the web data mining solution obtained in the process of FIG. 22.

* Explanation of reference numerals for the main parts of the drawings *

100: Customer system

110: Individual user

120a: Internet survey panel

120b: Internet survey panel

120c: Internet survey panel

130a: Web tracking panel

130b: Web tracking panel

130c: Web tracking panel

140a: Web customer evaluation panel

140b: Web customer evaluation panel

140c: Web customer evaluation panel

150: Corporate customer system

200: Database system

211a: Internet survey panel profile database

211b: Web tracking panel profile database

211c: Web customer evaluation panel profile database

211d: Member profile database of each individual site

212a: Internet survey results database

212b: Internet survey results database

212c: Internet survey results database

212d: General survey response database

212e: Telephone survey response database

212f: Internet census database

213: Real-time Internet usage behavior database

214: Website traffic database

215: Panel surfing log database

216: Server log database

217: Individual site surfing database

218: Member profile database

219: Overall industry trends database

220: Transaction database

221: Purchase information database

222: Site evaluation information database

223: Data warehouse database

300: Internet survey system

310: Internet survey panel selection system

320: Internet survey response system

330: General survey response system

340: Telephone survey response system

400: Web tracking system

410: Web tracking panel selection system

420: Real-time panel Internet surfing log information system

430: Website traffic information system

500: Server web log system

510: Log information system

520: Transaction information system

600: Industry and individual product trend system

610: Site evaluation information system

620: Per-customer satisfaction measurement system

630: Purchase and service usage information system

640: Overall industry information system

700: Data conversion system

710: Internet survey data conversion system

720: Web tracking log file conversion system

730: Server web log conversion system

740: Individual industry survey data conversion system

750: Integrated DB conversion system

760: Automatic data selection system

800: Web data mining system

810: Web data mining algorithm application system

900: Automatic report extraction system

910: OLAP system

920: Data mining OLAP system

930: Automatic consulting report extraction system

The method of building a web data mining solution according to the present invention uses a web data mining method based on the real-time collection and analysis of web data, together with a method of constructing a data warehousing system. A consulting system that uses the web data mining solution developed by this method is also provided according to the present invention.

Data mining is the process of finding useful information that is not readily apparent in very large volumes of data. For example, the statement that houses with many windows attract more burglars may sound obvious, yet few would have expected it to be applicable to insurance rates. In fact, a British insurance company used this finding to apply differentiated premium rates, which let it set more effective policy and contributed greatly to its profits.

Data Mining can be defined from three broad perspectives: computer science, MIS, and statistics. The computer science perspective defines it as a series of processes that use statistical and mathematical analysis methods, as well as pattern recognition techniques, to find valuable information in large stores of data, such as new relationships, tendencies, and patterns that are useful and interesting. The MIS perspective defines Data Mining as covering not only the process of extracting useful information from large databases or data sets but also the development of decision support systems that let users apply that information without specialist knowledge. The statistics perspective defines it as data analysis and model selection in support of sound decision making.

Finding the relationship between windows and burglars is Data Mining. Data Mining seeks to discover hidden knowledge, unexpected trends, or new rules in all available source data, including a company's daily transaction data, customer data, product data, customer response data from marketing activities, and other external data, and to use them as information for actual business decisions. It is a conceptual methodology that explores and analyzes source data by various methods in order to find not only the information one expects but also information one did not anticipate.

- Features of Data Mining

The features of Data Mining can be summarized in six points. First, it deals with large volumes of data that were collected without an analysis plan, drawn from historical data accumulated in operational systems. Second, it has become practical thanks to the processing power of computers. Third, most Data Mining techniques were developed empirically rather than being mathematically proven and refined. Fourth, its main concern is the generalization of predictive models rather than statistical inference and testing. Fifth, it is used to support a company's various decision-making activities. Sixth, Data Mining originated in statistics, computer science, artificial intelligence, and engineering, although the experts who actually apply it come from management, economics, and information technology.

- The Data Mining process

In discussions of Data Mining, attention often falls only on the mining itself, so that a particular technique (for example, Neural Networks, Case Based Reasoning, or Decision Trees) is mistaken for Data Mining. Data Mining, however, is not a specific technique such as neural networks or decision trees; it is a conceptual methodology for information extraction and the series of processes associated with it. The process by which Data Mining is actually applied can be divided into the following steps.

First is the problem definition step, in which the business problem to be addressed is defined and the goals are set. The need for Data Mining must be fully recognized, the current business problem well understood, and the objective firmly established. Data Mining cannot succeed without an accurate understanding of the problem. How the resulting information will be used, that is, its link to actual work, must also be considered carefully.

Second is the database construction step. The data required by the defined business problem are selected, and the data are prepared by building a data warehouse (or data mart).

Third is the Data Mining step proper. The prepared data are sampled, explored through preliminary analysis, and transformed, after which appropriate Data Mining techniques are used to discover and evaluate patterns in the information. This begins with bringing the available data into a state suitable for Data Mining. Although it varies with the application, when the data are not properly prepared, joining the necessary tables and preparing good-quality data can consume 80% or more of the time and effort of the whole process. Once the working data are ready, preliminary analysis using simple SQL queries, OLAP, graphical methods (visualization), or statistical methods yields basic information about the data and gives an understanding of their outline. Appropriate Data Mining methods, including statistical methods, are then applied on the basis of what this exploration reveals.

Fourth is the business reporting step. The results of the Data Mining step are re-expressed to fit the business problem and purpose so that users can easily understand them. The high-level information obtained is interpreted and evaluated for its meaning and accuracy to judge whether it suits the purpose of the actual work. A user environment, including delivery of the results, is built so that the information is presented in a form that its actual users or decision makers can readily understand and apply to their work.

Fifth is the decision-making step, in which strategies and decisions based on the information from Data Mining are put to use in actual work.

Sixth is the feedback step, a return to the initial step of Data Mining in order to obtain improved information based on the results and effects observed after application in actual work. Information obtained through Data Mining is evaluated in real situations, fed back into Data Mining, and re-analyzed, which raises the reliability and accuracy of the results.

By its meaning and purpose, Data Mining should be carried out alongside a decision-making system (Business Intelligence System). A Data Mining solution is neither limited to a particular task nor simply a collection of techniques or methodologies for applying Data Mining. It must offer easy access to the many kinds of source data that exist and provide a range of techniques that can be usefully applied, because no fixed technique or rule is prescribed for a given problem; different techniques must be applicable depending on the data and the nature of the problem. It should also provide guidelines for carrying out the actual work with a concrete mining methodology, and it should be comprehensive enough to cover making the results easy for end users to understand and use. Although many Data Mining techniques have been introduced as technology has advanced, the human role in Data Mining remains very important. Human judgment is essential during mining, and only people can compare and evaluate the mined results and decide how to use them in actual work.

In this specification, web data mining means discovering meaningful associations in the large body of Internet-related data (for example, surfing data and Internet shopping mall purchase data) and embodying them in software. The resulting software is called a web data mining solution. A web data mining solution is therefore, simply put, a computer software package: web data mining software is built for each individual company to suit that company's characteristics.

The present invention is described in detail below with reference to the drawings.

Referring to FIGS. 1 and 13, the method or system of the present invention collects and uses the following kinds of real-time web data: (1) information on Internet users as a whole, gathered through general surveys and Internet surveys; (2) real-time web surfing log data and website traffic data on Internet usage trends, collected with web tracking software installed on panel members' computers; (3) real-time server access web logs and transaction data of the users of a specific individual Internet site; and (4) real-time web customer evaluation data, gathered through Internet surveys of the actual users of a specific Internet site's services, using the customer and company information available in real time on the website. Each kind of data and the process of acquiring it are described in detail below.

(1) Referring to FIGS. 2 and 14, the Internet survey system sends clients an e-mail notice about a survey so that they connect to the Internet survey server. When they connect and respond, the response information is stored in file form together with the panel database. The server program can run on the Windows NT operating system. The information sent by Internet survey panel members is a collection of data, which is stored in the database as records; unique file names are used so that the file names of the database records are not duplicated. Responses to general surveys and telephone surveys, which do not use the Internet, also go through a response coding process and are stored in the server's database.
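The unique-file-name storage described above can be illustrated with a minimal Python sketch. It is not the implementation disclosed in this specification: the function name store_response, the file-name layout, and the use of a random UUID suffix are all assumptions made for illustration.

```python
import os
import uuid
from datetime import datetime, timezone


def store_response(storage_dir: str, panel_id: str, payload: bytes) -> str:
    """Write one survey response to its own file and return the file path.

    Hypothetical helper: the specification only requires that file names
    never collide, not this particular naming scheme.
    """
    os.makedirs(storage_dir, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S")
    # A random UUID suffix keeps names unique even for same-second arrivals.
    name = f"{panel_id}_{stamp}_{uuid.uuid4().hex}.dat"
    path = os.path.join(storage_dir, name)
    # Mode "xb" raises an error instead of overwriting an existing file.
    with open(path, "xb") as fh:
        fh.write(payload)
    return path
```

Opening the file in exclusive-create mode means that a name collision, however unlikely, surfaces as an error rather than silently overwriting an earlier response.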

(2) Referring to FIGS. 3 to 5 and 16 to 18, the web tracking system is divided into a client system and a server system. The client system tracks the web addresses entered in the address bar of the user's web browser, writes them to a log, and sends the log to the server. As shown in the drawings, the server program stores the information in files when messages arrive from clients. The server program can run on the Linux operating system. The information sent by a client is in a streamable form and is saved to a file under a unique file name so that file names are not duplicated. To implement this web tracking system, the present invention uses Web Pointer Software, a client program installed on the client system and referred to in this specification and the drawings as the web pointer program or web pointer software. Referring to FIGS. 8 to 12, the Web Pointer Software collects real-time URL log files of Internet users' web surfing and transmits the log files to the server. The procedure is as follows.

The first step is for users to download the Web Pointer Software from the home page of the server system and install it on each panel member's computer. The program registers itself as a startup program so that users need not launch it each time they connect to the Internet, and users can exit the program if they wish. In the second step, when the user connects to the Internet with a web browser, the program hooks the window messages of the browser's URL field and captures URL values beginning with http://. In the third step, the captured URL values are saved to a log file on the user's computer: if an earlier log file exists, the values are appended to it; if not, a new log file is created. In the fourth step, the log files stored on each user's computer are transmitted to the server system: when the stored log file exceeds 1 KB, it is sent to the server system, and once the transfer finishes the stored log files are deleted automatically.
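Steps two to four above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the Web Pointer Software itself: the window-message hooking of step two is platform-specific and is replaced here by a plain string filter, the function names are hypothetical, and the transport to the server is abstracted as a caller-supplied send function.

```python
import os
from typing import Callable, Optional

UPLOAD_THRESHOLD_BYTES = 1024  # the 1 KB limit stated in the text


def capture_url(text: str) -> Optional[str]:
    """Step 2 (filter only): keep values that start with http://."""
    value = text.strip()
    return value if value.startswith("http://") else None


def append_to_log(log_path: str, url: str) -> None:
    """Step 3: append to the existing log, or create it if it is missing."""
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(url + "\n")


def flush_if_large(log_path: str, send: Callable[[bytes], None]) -> bool:
    """Step 4: send the log and delete it once it exceeds the threshold."""
    if not os.path.exists(log_path):
        return False
    if os.path.getsize(log_path) <= UPLOAD_THRESHOLD_BYTES:
        return False
    with open(log_path, "rb") as fh:
        send(fh.read())  # hypothetical transport supplied by the caller
    os.remove(log_path)  # delete only after send() returned without error
    return True
```

Because the log is removed only after send returns, a failed transfer leaves the accumulated URLs on disk for the next attempt.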

Referring to FIG. 15, to overcome the problem of panel representativeness in this business method, a panel of 10,000 people matching the demographic proportions of the Korean population, based on data from the Korea National Statistical Office, is first selected by telephone survey, and the demographic proportions of the Internet users (netizens) among them are determined. Next, a separate web tracking panel of 2,000 people is selected according to those netizen demographic proportions, and, with the panel members' consent, web tracking software is installed on each member's personal computer.
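The proportional selection of the 2,000-member web tracking panel can be illustrated with a short Python sketch. The specification does not say how fractional seats are rounded, so the largest-remainder rule used here is an assumption, as are the function name allocate_quota and the idea of keying strata by a single label.

```python
from typing import Dict


def allocate_quota(counts: Dict[str, int], panel_size: int) -> Dict[str, int]:
    """Split panel_size across strata in proportion to counts.

    counts maps a demographic stratum to the number of netizens found in
    the telephone-survey panel. Rounding uses the largest-remainder rule,
    which is an assumption and not taken from the specification.
    """
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must contain at least one respondent")
    exact = {k: panel_size * v / total for k, v in counts.items()}
    quota = {k: int(x) for k, x in exact.items()}
    leftover = panel_size - sum(quota.values())
    # Hand the remaining seats to the strata with the largest fractions.
    by_remainder = sorted(exact, key=lambda k: exact[k] - quota[k], reverse=True)
    for k in by_remainder[:leftover]:
        quota[k] += 1
    return quota
```

For instance, with purely hypothetical netizen counts of 900, 1,800, 1,200, and 600 across four age groups, a 2,000-member panel would be split 400, 800, 533, and 267.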

(3) Referring to FIGS. 6 and 19, in server web log analysis, when Internet users connect to a specific server, the log data on their navigation paths within the site and the transaction information database recording their transactions are integrated with the previously recorded panel database and stored in a database. This web server log analysis step identifies the real-time server access web log files and actual transaction records of a specific site's Internet users and stores that information in a database. A user who connects to a site's server and is already a panel member of the site is matched to the panel profile database created at registration; a user who is not a panel member first registers using a panel form that meets the company's membership requirements, and then connects to the server to surf and make transactions. The surfing data are stored on each server as log files, which pass through a log cleaning process before being stored in the database. The actual transaction records are stored in the transaction history DB. The server program depends on the operating system of each server system and usually runs on Linux or Windows NT.
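The log cleaning and profile matching described above can be sketched in Python. The specification names neither a log format nor cleaning rules, so this sketch assumes Common Log Format lines, treats requests for static assets and non-2xx responses as noise, and uses hypothetical function names.

```python
import re
from typing import Dict, Iterable, List, Optional

# Assumed Common Log Format; the specification does not name a log format.
LINE_RE = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)
STATIC_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js", ".ico")


def clean_log(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Keep successful page requests; drop malformed lines and static assets."""
    records = []
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # malformed line
        rec = match.groupdict()
        if not rec["status"].startswith("2"):
            continue  # failed or redirected request
        if rec["path"].lower().endswith(STATIC_SUFFIXES):
            continue  # image, stylesheet, or script request
        records.append(rec)
    return records


def attach_profiles(
    records: List[Dict[str, str]],
    profiles: Dict[str, dict],
) -> List[dict]:
    """Join each cleaned record with the matching panel profile, if any."""
    out = []
    for rec in records:
        profile: Optional[dict] = profiles.get(rec["user"])
        out.append(dict(rec, profile=profile))
    return out
```

Records whose user has no panel profile keep a profile value of None, which leaves the decision about unregistered visitors to a later stage.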

(4) Referring to FIG. 20, the industry and individual product trend system is a step of sending e-mail questionnaires to site users, measuring their satisfaction with the sites they visited and with the products they purchased, and storing that information in a database. When a user purchases a product over the Internet, a questionnaire evaluating the site where the purchase was made is sent to the customer immediately after the purchase; the survey results are received over the Internet and stored in the corresponding database. Once the ordered product reaches the customer and the customer uses it, information on the customer's satisfaction with the product and on its importance is stored in a separate survey response database, and log data on users' purchases and service usage is stored in the purchase and service usage information DB. The information on each site is then clustered by industry, and information on the trends in each industry is stored in the corresponding database. When customers use a site's products and services, they respond through an Internet survey with their evaluations of the site and of the purchased products, and these responses are delivered as e-mail and stored in a database on the service company's server. This server program runs on the Windows NT operating system. The information delivered from the Internet survey panelists arrives as a continuous stream; it is saved to files, and unique file names are used so that file names are not duplicated.
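One way to guarantee non-duplicated file names for incoming survey responses is sketched below. The naming scheme (panel ID, UTC timestamp, random UUID) and the function names are hypothetical; the specification only states that unique file names are used.

```python
import uuid
from datetime import datetime, timezone
from pathlib import Path

def unique_response_path(directory, panel_id):
    """Build a file path that will not collide with earlier survey responses."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S")
    # The random UUID keeps names distinct even for responses in the same second.
    return Path(directory) / f"{panel_id}_{stamp}_{uuid.uuid4().hex}.txt"

def save_response(directory, panel_id, body):
    """Write one survey response to its own file and return the path used."""
    path = unique_response_path(directory, panel_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Mode "x" raises an error instead of overwriting an existing file.
    with open(path, "x", encoding="utf-8") as handle:
        handle.write(body)
    return path
```

Opening the file in exclusive-creation mode is a second safeguard: even if two names did collide, the earlier response would not be silently overwritten.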

The next step performs data conversion so that all of the real-time Internet information collected through the above steps can be integrated and stored in the data warehouse database. Referring to FIG. 21, in this step a basic OLAP (online analytical processing) function using basic frequency analysis is performed.
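The basic frequency analysis mentioned for this step can be pictured as counting warehouse records per value of one or more dimensions. The following is a minimal sketch; the field names and sample rows are hypothetical and are not taken from the specification.

```python
from collections import Counter

def frequency_table(records, *dimensions):
    """Count records for each combination of values of the given dimension fields."""
    return Counter(tuple(rec[d] for d in dimensions) for rec in records)

# Hypothetical rows from the integrated data warehouse.
warehouse_rows = [
    {"gender": "F", "site_type": "shopping mall"},
    {"gender": "F", "site_type": "auction"},
    {"gender": "M", "site_type": "shopping mall"},
    {"gender": "F", "site_type": "shopping mall"},
]
by_gender = frequency_table(warehouse_rows, "gender")
by_gender_and_site = frequency_table(warehouse_rows, "gender", "site_type")
```

Counting over one field gives a simple frequency distribution, while counting over two fields gives the cross-tabulation that an OLAP front end would typically display.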

Referring to FIG. 22, in the final step, the real-time Internet information stored in the data warehouse database through data conversion passes through a data selection system that extracts only the data required by each data mining algorithm, and each web data mining algorithm is then applied to produce the web data mining solution. This step yields a personalized web surfing system, as shown in FIG. 23.
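The data selection step can be sketched as projecting each warehouse row onto the fields a given algorithm needs. The algorithm names and field lists below are hypothetical placeholders; the specification does not say which algorithms are used or which fields each one requires.

```python
# Hypothetical mapping from algorithm name to the warehouse fields it needs.
REQUIRED_FIELDS = {
    "association_rules": ("member_id", "purchased_item"),
    "clustering": ("member_age", "income_level", "education_level"),
}

def select_for(algorithm, rows):
    """Return only the fields the algorithm needs, skipping incomplete rows."""
    fields = REQUIRED_FIELDS[algorithm]
    return [
        {f: row[f] for f in fields}
        for row in rows
        if all(f in row for f in fields)
    ]

rows = [
    {"member_id": "m1", "purchased_item": "book", "member_age": 31,
     "income_level": 3, "education_level": 2},
    {"member_id": "m2", "purchased_item": "cd"},
]
basket_input = select_for("association_rules", rows)
cluster_input = select_for("clustering", rows)
```

Rows that lack a required field are dropped here for simplicity; a real system might instead impute the missing values.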

Meanwhile, the data model used in the present business method is as follows.

Database 1

(a) Name: Panel profile database (b) Stored information: panel ID, password, name, age, gender, resident registration number, address 1, address 2, telephone number, e-mail address, education level, occupation, marital status, income level, areas of interest, Internet usage time (c) Other: information is stored per individual in the database on the server computer

Database 2

(a) Name: Survey results database (b) Stored information: panel ID, panelist demographic information, survey response values (c) Other: the panelists' survey response values are stored in the server's database via e-mail

Database 3

(a) Name: Panel surfing log database (b) Stored information: panel ID, accessed host name, accessed page, access time, browser used (c) Other: the panelists' surfing URL information is stored in the log server's database

Database 4

(a) Name: Server log database (b) Stored information: site name, site type (shopping mall, auction, etc.) (c) Other: stores the site name and information on the site type

Database 5

(a) Name: Member profile database (b) Stored information: member ID, password, name, age, gender, resident registration number, address 1, address 2, telephone number, e-mail address, education level, occupation, marital status, income level, areas of interest, Internet usage time (c) Other: information is stored per individual in the database on the server computer (d) May vary depending on each site's membership registration form

Database 6

(a) Name: Transaction database (b) Stored information: member ID, member address, member contact number, member e-mail, purchased item, purchase quantity, transaction date, transaction amount, payment method (c) Other: varies depending on each site's transaction database structure

Database 7

(a) Name: Log information database (b) Stored information: member ID, in-site surfing path log file (c) Other: varies depending on each site's log information database structure

Database 8

(a) Name: Site evaluation database (b) Stored information: member ID, evaluation survey response file (c) Other: survey response database holding the responses to questionnaires sent to panelists in a first and a second round

Database 9

(a) Name: Data warehouse database (b) Stored information: member ID, member age, address, education level, occupation, marital status, income level, areas of interest, purchased items, purchase date, purchase quantity, payment method, accessed page (c) Other: a database converted into numerical form suitable for input to data mining algorithms; a data warehouse database in a form to which data mining algorithms can be applied
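The conversion of warehouse fields into numerical form can be illustrated with a simple integer coding of categorical values. The specification does not describe the conversion scheme, so the sorted-order codebook below, along with the field names and sample rows, is an assumption for illustration only.

```python
def build_codebook(values):
    """Assign an integer code to each distinct category, in sorted order."""
    return {value: code for code, value in enumerate(sorted(set(values)))}

def encode_rows(rows, categorical_fields):
    """Replace categorical values with integer codes; leave other fields unchanged."""
    codebooks = {
        field: build_codebook(row[field] for row in rows)
        for field in categorical_fields
    }
    encoded = [
        {f: (codebooks[f][v] if f in codebooks else v) for f, v in row.items()}
        for row in rows
    ]
    return encoded, codebooks

# Hypothetical warehouse rows before conversion.
rows = [
    {"member_age": 31, "occupation": "student", "payment_method": "card"},
    {"member_age": 45, "occupation": "engineer", "payment_method": "transfer"},
    {"member_age": 28, "occupation": "student", "payment_method": "card"},
]
encoded_rows, codebooks = encode_rows(rows, ["occupation", "payment_method"])
```

Keeping the codebooks alongside the encoded rows lets mining results be translated back into the original category labels.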

Although the present invention has been described above with reference to the embodiments, the present invention is not limited thereto. Those skilled in the art will appreciate that modifications and changes may be made without departing from the spirit and scope of the present invention, and that such modifications and changes also belong to the present invention.

According to the present invention, a web data mining solution using real-time web data makes it possible to develop a unique web data mining solution suited to a given company by using real-time web data that matches the characteristics of each industry. Therefore, rather than ending with a one-time solution development as in the past, additional web data mining algorithms are applied in step with the rapidly changing trends of the Internet industry, so that practical web data mining solutions suited to the characteristics of each domestic industry are developed and consulting services are provided, enabling domestic companies to build global competitiveness.

Claims

Obtaining full Internet user information through general surveys and Internet surveys;

Acquiring real-time web surfing log data and website traffic data relating to the Internet usage trends of Internet users, collected using web tracking software installed on panelists' computers;

Obtaining real-time server access web logs and transaction information data of specific individual internet site users;

Acquiring real-time web customer evaluation data through an Internet survey targeting actual Internet service users of a specific Internet site through information on customers and companies in real time on the website;

Converting the obtained data to be combined and stored in a data warehouse database;

A data selection step of extracting only data required by the data mining step among the stored data;

A method of obtaining a web data mining solution comprising data mining the data.