KR20220104098A

KR20220104098A - Big data analysis system and method of search service for marketing company thereof

Info

Publication number: KR20220104098A
Application number: KR1020210006422A
Authority: KR
Inventors: 김성규
Original assignee: 김성규
Priority date: 2021-01-17
Filing date: 2021-01-17
Publication date: 2022-07-26

Abstract

The present invention relates to a big data analysis system and a marketing company search service method using the system. More specifically, the big data analysis system searches for information from sources (websites) on a network selected based on internal customer information and keywords, collects the search results in information collection units, stores the collected information in a distributed manner in consideration of the characteristics of the plurality of information collection units, and analyzes the stored information using methods including value analysis through item association analysis to provide analysis results. The analysis results and provided results are stored again as data, and through an iterative process of analyzing their association with newly accumulated and analyzed data, the diffusion state of customer information as it changes over time can be analyzed effectively. Therefore, even if a company's own analysis information is limited, the present invention can effectively provide marketing support information or management support information. This configuration can be built by a specialized analysis company instead of individual companies, and the collected information can be used across clients to provide each client company with the marketing support information it wants through the analysis method it requires. In this case, individual client companies can obtain marketing support information or management support information through big data analysis at low cost.

Description

Analysis system using big data and marketing company search service method using the same (BIG DATA ANALYSIS SYSTEM AND METHOD OF SEARCH SERVICE FOR MARKETING COMPANY THEREOF)

The present invention relates to an analysis system using big data and a marketing company search service method using the same. More particularly, it relates to a system and method that collect user information through various media, analyze the collected information as big data, and use the result to give users information on marketing companies suited to them, thereby providing users with suitable marketing methods and information on companies that carry out that marketing on their behalf.

Recently, with the growth of SNS advertising platforms run as one-person businesses, various forms of advertising and marketing service methods have been proposed.

Conventional promotional marketing has been limited to temporary, event-driven campaigns, and has been more useful as a marketing method for individuals than for companies.

As a result, advertisers must constantly look for new targets to market a continuous stream of new products, and the time, cost, and manpower consumed increase accordingly. A strong marketing infrastructure that overcomes this problem is therefore needed.

To overcome these problems, most advertisers pay marketing agencies considerable fees to handle their marketing, but because of the cost burden it has been difficult to sustain advertising effects over time.

Meanwhile, as interest in collecting and analyzing big data grows, companies that have many members and can collect vast usage logs from them, such as portals, large distributors, and manufacturers, are considering using big data, for example by analyzing member usage logs to derive specific trend information or to segment target groups.

Big data processing combines several technologies: distributed storage and distributed processing for handling vast amounts of information efficiently, collection and association analysis of diverse structured and unstructured data, and extraction of meaningful information from data gathered without a specific purpose. Because the benefits of big data analysis have not yet been visibly proven, research and attempts to use it in marketing are concentrated among large companies that can afford systems for massive computation and analysis and can secure large volumes of data to analyze.

Of these, considerable technical progress has been made in the distributed storage and distributed processing of large-volume data to cope with the rapid growth of web content and member usage logs, and these techniques are widely used. Deriving meaningful information from big data, however, still requires much research.

In particular, because various methods for the practical use of big data analysis are still being studied, current business use of big data analysis is mostly limited to analyzing internal data provided by a company's own customers for statistics or trend analysis, and the results of such analysis are not yet fully trusted. However, with the widespread use of mobile devices and social networks and the expansion of network infrastructure, the user log information that can be tracked and collected is accumulating exponentially, and no one disputes that big data analysis using it will become essential marketing support or management support information across industries.

Consequently, small and medium-sized enterprises (SMEs) that lack sufficient structured, easily collected information about their own customers, or that cannot afford the system costs of big data analysis, are likely to be eliminated from, or become dependent in, competition with companies such as large portals, financial institutions, and large enterprises that approach the market through big data analysis.

Large portals and large corporations can, or soon will be able to, use such management information through big data analysis based on their capital or the accumulated information of their own customers, but for SMEs and venture companies, using management information obtained through big data analysis is inevitably difficult.

Meanwhile, some large portals offer free big data analysis tools built on the vast information they hold, in order to dominate the platform market, which gives companies a chance to analyze their own information. However, these tools generally offer only distributed storage of large data and static analysis using known methods. It is still difficult for SMEs or venture companies to customize such systems to their needs, or to develop and apply their own methodologies and analysis processes for effectively analyzing market conditions that change over time. As a result, they either give up on obtaining management information through big data analysis and follow traditional marketing methodologies, or, even when they perform big data analysis, use the results only as unreliable supplementary material.

Therefore, a new type of big data analysis system and method is needed that allows SMEs and venture companies to effectively obtain management or marketing support information optimized for their circumstances, and that can collect appropriate data and provide the desired analysis results even where information is scattered and hard to collect and time-series association analysis is required, as in viral marketing.

An object of embodiments of the present invention, intended to address the above problems, is to provide a big data analysis system and method for marketing that searches for information from sources (websites) on a network selected based on internal customer information and keywords, collects the results in information collection units, stores the collected information in a distributed manner in consideration of the characteristics of the plurality of information collection units, analyzes the stored information with analysis methods including value analysis through item association analysis, and provides the results. The analysis results and provided results are stored again as data, and an iterative process analyzes their association with newly accumulated and analyzed data. By effectively analyzing the diffusion state of customer information as it changes over time, the system and method can effectively provide marketing support or management support information even when a company's own analysis information is limited.

Another object of embodiments of the present invention is to provide a big data analysis system and method for marketing that actively collects needed information from the web rather than relying on limited information about a company's own customers. To analyze effectively the varied unstructured information that differs by collection target, information collection units normalize the collected information, which arrives as various unstructured items depending on the target, into a processable form, so that value analysis can be performed on diverse web information on an item basis. The process of reanalyzing analyzed information together with newly analyzed information on an item basis is repeated, enabling item-based 1:N association analysis and thereby time-series analysis and analysis of viral marketing.

Yet another object of embodiments of the present invention is to provide a big data analysis system and method for marketing that stores analyzed information in a utilization database, processes the contents of the utilization database according to a user's request or a result providing application, and provides the output to the user. Events generated during the analysis performed to provide results are themselves analyzed, and some of them are stored again in distributed storage or saved to the utilization database for reuse. In this way, information produced through the experience of the people actually handling result delivery is additionally reflected in the analysis and can be reused by other users.

To achieve the above objects, a big data analysis system for marketing according to one embodiment of the present invention includes: information collection units, provided for each type of website, that search websites of preset types for information according to set search criteria, parse the retrieved information, and convert it into a format based on predefined hierarchical common codes and items; a distributed storage unit that stores the information converted by the information collection units in a distributed manner with reference to the common code and size; an analysis unit that accesses the information stored by the distributed storage unit through distributed processing, analyzes it according to analysis processes including item-based value analysis, stores the results in a utilization database, reanalyzes on an item basis the previous value analysis results already stored in the utilization database together with the newly produced value analysis results, and further stores those results in the utilization database; and a result providing unit that reanalyzes the analysis results in the utilization database with a desired process or searches them according to a requested query, outputs the results, analyzes the output, and, depending on its data format, provides it as data to the distributed storage unit or stores it in the utilization database for reuse.
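The distributed storage unit is said to place records "with reference to the common code and size", without further detail. A minimal sketch of one way to read that, assuming (hypothetically) that the top-level segment of a dotted common code such as `SNS.FOOD.CAFE` picks a starting node through a stable hash and that payloads larger than a fixed `CHUNK_SIZE` are split across consecutive nodes. `StorageNode`, `node_for`, `store`, and `CHUNK_SIZE` are illustrative names, not part of the patent.

```python
import hashlib
from dataclasses import dataclass, field

# Hypothetical chunk size: the patent only says placement considers "size".
CHUNK_SIZE = 4  # bytes per chunk, deliberately tiny for the example


@dataclass
class StorageNode:
    """Stand-in for one node of the distributed storage unit."""
    name: str
    blocks: list = field(default_factory=list)


def node_for(common_code: str, n_nodes: int) -> int:
    """Map the top-level segment of a hierarchical common code to a node index."""
    top = common_code.split(".")[0]  # records in the same top category start on the same node
    digest = hashlib.sha256(top.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_nodes


def store(common_code: str, payload: bytes, nodes: list) -> list:
    """Split the payload by size and spread the chunks over consecutive nodes."""
    start = node_for(common_code, len(nodes))
    chunks = [payload[i:i + CHUNK_SIZE] for i in range(0, len(payload), CHUNK_SIZE)] or [b""]
    placements = []
    for offset, chunk in enumerate(chunks):
        idx = (start + offset) % len(nodes)
        nodes[idx].blocks.append((common_code, offset, chunk))
        placements.append(idx)
    return placements
```

Hashing only the top-level code segment keeps related categories together, which suits the later item-based analysis; a production system would more likely delegate block placement and replication to a distributed file system.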

The system may further include a user support unit that provides the search criteria for the information collection units, updates the analysis processes of the analysis unit, provides reanalysis processes or queries to the result providing unit, and delivers the results to the user.

The information collection unit may include preset member information as a search criterion and search for information related to the corresponding member.
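How member information becomes a search criterion is not specified. As a hedged illustration only, assume a member record holds a hypothetical `brand` name and an optional `products` list, and that each name is paired with each keyword to form a query string for the collection units.

```python
from itertools import product


def build_queries(member: dict, keywords: list) -> list:
    """Pair every registered name of a member with every keyword (hypothetical fields)."""
    names = [member["brand"]] + member.get("products", [])
    # Quote the name so portals treat it as a phrase; the keyword stays free text.
    return [f'"{name}" {kw}' for name, kw in product(names, keywords)]
```

For example, a member `{"brand": "CafeOne", "products": ["ColdBrew"]}` with keywords `["review", "price"]` yields four queries, one per name and keyword pair.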

Alternatively, the information collection unit may include keywords as search criteria. As basic information for general-purpose users, it searches multiple websites for information matching the keywords, classifies the search results under hierarchical common keywords prepared for that purpose, and parses the information into a format containing items for which relevance information is set.
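The keyword path above (classify under hierarchical common keywords, then parse into items carrying relevance information) can be sketched as follows. The keyword tree, the `CollectedRecord` fields, and the relevance measure (share of search keywords found in the text) are all assumptions made for illustration; the patent does not define them.

```python
from dataclasses import dataclass

# Hypothetical hierarchical common keyword tree: top level -> subcategory -> terms.
COMMON_KEYWORDS = {
    "FOOD": {"CAFE": ["coffee", "latte", "cafe"], "BAKERY": ["bread", "croissant"]},
    "BEAUTY": {"SKINCARE": ["serum", "toner"], "MAKEUP": ["lipstick", "mascara"]},
}


@dataclass
class CollectedRecord:
    source: str
    common_code: str   # e.g. "FOOD.CAFE"
    items: dict        # structured item name -> value
    relevance: float   # share of search keywords present in the text (assumed metric)
    raw_text: str = ""


def classify(text: str) -> str:
    """Return the common code whose term list matches the text most often."""
    words = text.lower().split()
    best_code, best_hits = "UNCLASSIFIED", 0
    for top, subs in COMMON_KEYWORDS.items():
        for sub, terms in subs.items():
            hits = sum(words.count(t) for t in terms)
            if hits > best_hits:
                best_code, best_hits = f"{top}.{sub}", hits
    return best_code


def parse(source: str, text: str, keywords: list) -> CollectedRecord:
    """Convert one search hit into the common-code-and-item format."""
    lowered = text.lower()
    found = [k for k in keywords if k.lower() in lowered]
    relevance = len(found) / len(keywords) if keywords else 0.0
    items = {"keywords_found": found, "length": len(text)}
    return CollectedRecord(source, classify(text), items, relevance, raw_text=text)
```

Word counting is the simplest possible classifier; it stands in for whatever matching rules an actual collection unit would use.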

The information collection unit may anonymize collected personal information and encrypt or delete identifying information.
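A minimal sketch of the anonymization step, under two stated assumptions: the field names in `PSEUDONYMIZE` and `DROP` are hypothetical, and reversible encryption is swapped for a keyed one-way hash (HMAC-SHA256), a common pseudonymization technique that still lets records from the same author be linked without exposing the identifier.

```python
import hashlib
import hmac

# Hypothetical field lists; the patent does not say which fields are identifying.
PSEUDONYMIZE = {"user_id", "nickname"}  # replaced by a keyed hash
DROP = {"email", "phone"}               # deleted outright


def anonymize(record: dict, secret: bytes) -> dict:
    """Return a copy with identifiers pseudonymized and direct contact fields removed."""
    out = {}
    for key, value in record.items():
        if key in DROP:
            continue
        if key in PSEUDONYMIZE:
            # One-way keyed hash: stable for the same input and key, not decryptable.
            digest = hmac.new(secret, str(value).encode("utf-8"), hashlib.sha256)
            out[key] = digest.hexdigest()[:16]
        else:
            out[key] = value
    return out
```

If identifiers must be recoverable later, a real encryption scheme would replace the HMAC call; the patent allows either encryption or deletion.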

Although the information collection unit converts the format on an item basis, the converted information may still include unstructured data.
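To illustrate item-based conversion that still carries unstructured data, here is a sketch assuming (hypothetically) that hashtags and @-mentions are the structured items extracted from an SNS post, with the leftover text kept as an unstructured field.

```python
import re

HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@(\w+)")


def to_items(post: str) -> dict:
    """Extract structured items and keep the remaining text as unstructured data."""
    hashtags = HASHTAG.findall(post)
    mentions = MENTION.findall(post)
    remainder = MENTION.sub("", HASHTAG.sub("", post))
    return {
        "hashtags": hashtags,
        "mentions": mentions,
        "unstructured": " ".join(remainder.split()),  # collapse leftover whitespace
    }
```

Because Python's `\w` matches Unicode word characters, Korean hashtags are extracted the same way.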

The information collection unit may connect to each open website, including search portal sites, social network sites, and cloud sites, and collect information according to the search criteria through search or a published open API. Here, the information collection units are preferably organized separately by website type, and each information collection unit preferably uses different criteria, depending on the website type, for classifying search results into common codes or for parsing search results into items.
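One collection unit per website type, each with its own parsing rules, maps naturally onto a strategy pattern. The sketch below assumes raw results have already been fetched (the search or open API call is not shown), and the raw field names (`snippet`, `body`, `likes`, `shares`) are hypothetical.

```python
from abc import ABC, abstractmethod


class Collector(ABC):
    """Base collection unit; each website type overrides the parsing rule."""
    site_type = "generic"

    @abstractmethod
    def parse(self, raw: dict) -> dict:
        """Map one raw result from this site type onto the common items."""

    def collect(self, raw_results: list) -> list:
        # Tag every parsed record with the site type it came from.
        return [dict(self.parse(r), site_type=self.site_type) for r in raw_results]


class PortalCollector(Collector):
    site_type = "portal"

    def parse(self, raw: dict) -> dict:
        return {"title": raw.get("title", ""), "text": raw.get("snippet", ""), "reactions": 0}


class SnsCollector(Collector):
    site_type = "sns"

    def parse(self, raw: dict) -> dict:
        return {"title": "", "text": raw.get("body", ""),
                "reactions": raw.get("likes", 0) + raw.get("shares", 0)}


# Hypothetical registry keyed by website type.
COLLECTORS = {"portal": PortalCollector(), "sns": SnsCollector()}
```

Adding a cloud-site collector would mean one more subclass; the shared `collect` method keeps the output format identical across site types.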

The analysis unit may feed analysis results back to the distributed storage unit as data.

The analysis unit preferably stores the results of item-based value analysis in the utilization database so that data for 1:N association analysis is collected repeatedly.
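The patent does not define "value analysis" or the exact 1:N association measure. A hedged sketch, swapping in association-rule confidence: for one anchor item, compute how often each of N other items co-occurs with it, store each round in a list standing in for the utilization database, and report the change from the previous round so that diffusion over time becomes visible.

```python
from collections import Counter


def one_to_n(records: list, anchor: str) -> dict:
    """Confidence of each other item given the anchor item (a 1:N association)."""
    with_anchor = [set(r) for r in records if anchor in r]
    if not with_anchor:
        return {}
    counts = Counter(item for r in with_anchor for item in r if item != anchor)
    return {item: n / len(with_anchor) for item, n in counts.items()}


utilization_db = []  # hypothetical stand-in for the utilization database


def analyze_round(period: str, records: list, anchor: str) -> dict:
    """Analyze one batch, compare it with the last stored result, then store it."""
    current = one_to_n(records, anchor)
    previous = utilization_db[-1]["result"] if utilization_db else {}
    change = {item: round(current.get(item, 0.0) - previous.get(item, 0.0), 3)
              for item in set(current) | set(previous)}
    utilization_db.append({"period": period, "result": current, "change": change})
    return change
```

For example, if "latte" appears in two of three "cafe" posts in week 1 and in every "cafe" post in week 2, its confidence rises from about 0.667 to 1.0, and the stored change for week 2 is 0.333.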

The result providing unit may be an external analysis solution or an interface for linking with an external analysis solution, or may include an analysis component that can be customized on an external user's site.

A big data analysis method for marketing according to another embodiment of the present invention includes: an information collection step in which information collection units, divided according to preset types of websites, search for information according to set search criteria and convert the retrieved content into a format based on predefined hierarchical common codes and items; a distributed storage step in which a distributed storage unit receiving the converted information from the information collection step stores it in a distributed manner based on the common code and size; an analysis step in which an analysis unit that performs the required analysis accesses the information stored in the distributed storage step through distributed processing, performs analysis according to analysis processes including item-based value analysis, stores the results in a utilization database, reanalyzes on an item basis the previous value analysis results already stored in the utilization database together with the newly produced value analysis results, and stores those results back in the utilization database; and a result providing step in which a result providing unit that provides results to the user reanalyzes the analysis results in the utilization database with a desired process or searches them according to a requested query, outputs the results, analyzes the output, and, depending on its data format, provides it as data to the distributed storage unit or stores it in the utilization database for reuse.
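The four steps of the method form a loop in which each round's output is fed back as data. The toy class below sketches that loop under heavy simplifying assumptions: lists stand in for storage nodes and the utilization database, "analysis" is a mention count per common code compared with the previous round, and the fed-back summaries are stored but not counted as new mentions. None of these names come from the patent.

```python
from collections import defaultdict


class MarketingPipeline:
    """Toy flow: collect -> store -> analyze -> provide -> feed back (all assumptions)."""

    def __init__(self, n_nodes: int = 2):
        self.nodes = [[] for _ in range(n_nodes)]  # stand-in for the distributed storage unit
        self.utilization_db = []                   # stand-in for the utilization database

    def collect(self, raw_posts: list, keyword: str) -> list:
        # Information collection step: keep posts containing the keyword, tag a common code.
        return [{"code": "SNS." + keyword.upper(), "text": p} for p in raw_posts if keyword in p]

    def store(self, records: list) -> None:
        # Distributed storage step: place each record by its common code.
        for r in records:
            self.nodes[hash(r["code"]) % len(self.nodes)].append(r)

    def analyze(self) -> dict:
        # Analysis step: count collected mentions per code and compare with the last round.
        counts = defaultdict(int)
        for node in self.nodes:
            for r in node:
                if "text" in r:  # fed-back summaries are stored but not counted as mentions
                    counts[r["code"]] += 1
        previous = self.utilization_db[-1] if self.utilization_db else {}
        result = {
            code: {"mentions": n, "delta": n - previous.get(code, {}).get("mentions", 0)}
            for code, n in counts.items()
        }
        self.utilization_db.append(result)
        return result

    def provide(self, result: dict) -> dict:
        # Result providing step: return the result and feed a summary back as data.
        self.store([{"code": code, "summary": v} for code, v in result.items()])
        return result

    def run_round(self, raw_posts: list, keyword: str) -> dict:
        self.store(self.collect(raw_posts, keyword))
        return self.provide(self.analyze())
```

Running two rounds shows the accumulation: storage keeps earlier records, so the second round's count includes the first round's mentions and the `delta` isolates what is new.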

The method may further include receiving and processing a user's requests through a user support unit that provides search criteria to the information collection units, updates the analysis processes of the analysis unit, provides reanalysis processes or queries to the result providing unit, and delivers the results to the user.

In the information collection step, the information collection unit may include preset member information as a search criterion and search for information related to the corresponding member.

In the information collection step, the information collection unit may include keywords as search criteria. As basic information for general-purpose users, it searches multiple websites for information matching the keywords, classifies the search results under hierarchical common keywords prepared for that purpose, and parses the information into a format containing items for which relevance information is set.

In the information collection step, the information collection unit may connect to each open website, including search portal sites, social network sites, and cloud sites, and collect information according to the search criteria through search or a published open API. These information collection units are organized separately by website type, and each may use different criteria, depending on the website type, for classifying search results into common codes or for parsing search results into items.

In the analysis step, the analysis unit may feed analysis results back to the distributed storage unit as data.

In the analysis step, the analysis unit may store the results of item-based value analysis in the utilization database so that data for 1:N association analysis is collected repeatedly.

The big data analysis system and method for marketing according to embodiments of the present invention search for information from sources (websites) on a network selected based on internal customer information and keywords, collect the results in information collection units, store the collected information in a distributed manner in consideration of the characteristics of the plurality of information collection units, analyze the stored information with analysis methods including value analysis through item association analysis, and provide the results. Because the analysis results and provided results are stored again as data and an iterative process analyzes their association with newly accumulated and analyzed data, the diffusion state of customer information as it changes over time can be analyzed effectively, so marketing support or management support information can be provided effectively even when a company's own analysis information is limited. Moreover, this configuration can be built by a specialized analysis company rather than by individual companies, and the collected information can be used across clients to provide each client company with the marketing support information it wants through the analysis method it requires. In that case, individual client companies can obtain marketing support or management support information from big data analysis at low cost.

The system and method also actively collect needed information from the web rather than relying on limited information about a company's own customers. To analyze effectively the varied unstructured information that differs by collection target, information collection units normalize the collected information, which arrives as various unstructured items depending on the target, into a processable form and store it in a distributed manner, so that diverse web information can be value-analyzed on an item basis. Repeatedly reanalyzing analyzed information together with new analysis information on an item basis enables item-based 1:N association analysis, which economically provides fast and reliable analysis results for time-series analysis and viral marketing.

Further, the system and method store analyzed information in a utilization database, process its contents according to a user's request or a result providing application, and provide the output to the user. Events generated during the analysis performed to provide results are themselves analyzed, and some of them are stored again in distributed storage or saved to the utilization database for reuse. Information produced through the experience of the people actually handling result delivery is thus additionally reflected in the analysis and made reusable by other users, which improves the quality of the marketing information provided and increases reliability and satisfaction with use.

FIG. 1 is a block diagram of a conventional big data analysis system.
FIG. 2 is a conceptual diagram of a big data analysis system according to an embodiment of the present invention.
FIG. 3 is a block diagram of a big data analysis system according to an embodiment of the present invention.
FIG. 4 is a configuration diagram of an information collection unit according to an embodiment of the present invention.
FIG. 5 is a configuration diagram of a distributed storage unit according to an embodiment of the present invention.
FIG. 6 is a configuration diagram of an analysis unit according to an embodiment of the present invention.
FIG. 7 is a configuration diagram of a result providing unit according to an embodiment of the present invention.

The present invention as described above will now be explained in detail with reference to the accompanying drawings and embodiments.

It should be noted that the technical terms used in the present invention serve only to describe specific embodiments and are not intended to limit the invention. Unless specifically defined otherwise herein, technical terms used in the present invention should be interpreted as generally understood by those of ordinary skill in the art to which the invention belongs, and should be interpreted neither too broadly nor too narrowly. Where a technical term used herein is incorrect and does not accurately express the spirit of the invention, it should be understood as replaced by a technical term that those skilled in the art can correctly understand. General terms used herein should be interpreted as defined in dictionaries or according to context, and should not be interpreted too narrowly.

In addition, singular expressions used herein include plural expressions unless the context clearly indicates otherwise.

Herein, terms such as "consisting of" or "comprising" should not be construed as necessarily including all of the components or steps described; they should be construed to mean that some components or steps may be omitted or that additional components or steps may be included.

Terms including ordinal numbers, such as first and second, may be used herein to describe components, but the components should not be limited by these terms. The terms are used only to distinguish one component from another. For example, a first component may be called a second component, and similarly a second component may be called a first component, without departing from the scope of the present invention.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings. Identical or similar components are given the same reference numerals regardless of the figure in which they appear, and redundant descriptions of them are omitted.

In describing the present invention, detailed descriptions of related known technologies are omitted where they could obscure the gist of the present invention. The accompanying drawings are provided only to make the spirit of the present invention easier to understand, and the spirit of the present invention should not be construed as limited by them.

FIG. 2 is a conceptual diagram of a big data analysis system according to an embodiment of the present invention. As shown, the system includes: an information search and collection system 10 that, rather than relying only on internal information already held, actively collects the large volume of user information generated on the web; a distributed storage system 20 that distributes and stores the vast amount of information collected in this way; an analysis system 30 that accesses the distributed information using distributed processing, performs analysis, and stores the reusable parts of the analysis results back in distributed storage; a result providing system 40 that receives and provides the results analyzed by the analysis system 30, has the necessary analysis re-run on the basis of those results, or searches the analysis results in response to a query and provides the output, and that, depending on its type, supplies reusable search-result information as data to the distributed storage system 20 or, where an analyzed result can be used as it is, passes it to the analysis system 30 for reuse; and a user support system 50 that provides the information search and collection system 10 with search criteria for the information to be collected, provides or updates the content to be analyzed by the analysis system 30 (processes such as scripts, analysis algorithms, and settings), provides the result providing system 40 with content for re-analysis (processes such as scripts, analysis algorithms, and settings) or queries, and delivers the results to the user. The user support system 50 may be a user interface for accessing each of the systems above, but it may also be a system that itself implements some of their functions, so its specific configuration may vary.

In short, rather than following the conventional approach of collecting and analyzing structured internal information, the illustrated configuration collects, from the diverse information generated on websites that offer services such as mobile services, portals, social network services, and cloud services, the information suited to the desired analysis target, and analyzes it to produce information for marketing or management support.

If some information about a member is already held, a wide range of related information may be gathered by collecting not only the internal information on that member but also member-related information from the various websites.

The websites from which information is collected may be, for example, portals such as Google, Naver, Daum, Bing, Nate, Yahoo, and Baidu. Information obtained through particular search results or member identifiers may be collected from them, and the web pages, news articles, blogs, documents, and so on that each portal returns as results may be collection targets. Beyond search itself, such portals also provide open APIs (Application Programming Interfaces) for searching and collecting information from the various services they offer. The websites may also be social networks such as Facebook, Twitter, YouTube, Pinterest, Cyworld, LINE, Instagram, Me2day, Tumblr, Listen Me, and Last.fm. Most of these social networks likewise provide open APIs that let external parties search or collect the network's internal information.

Information may also be collected from various other websites, such as cloud service and mobile service sites, many of which also provide open APIs.

Because information can therefore be collected by robots (automated crawlers), vast amounts of information can be collected according to the desired criteria (keywords, time, popularity, filtering conditions, and so on).
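
The criteria listed above (keywords, time, popularity, filtering conditions) can be pictured as a filter applied to each collected post. The following is a minimal Python sketch under assumed conditions: the `SearchCriteria` record, the post fields `text`, `created`, and `likes`, and the exclusion rule are all hypothetical and are not part of the described system or of any portal's API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List


@dataclass
class SearchCriteria:
    # Hypothetical record mirroring "keywords, time, popularity, filtering conditions".
    keywords: List[str]
    since: datetime
    min_popularity: int = 0
    exclude_terms: List[str] = field(default_factory=list)


def matches(post: dict, c: SearchCriteria) -> bool:
    """Return True if a collected post satisfies every criterion."""
    text = post["text"].lower()
    if not any(k.lower() in text for k in c.keywords):
        return False                      # no keyword hit
    if post["created"] < c.since:
        return False                      # too old
    if post.get("likes", 0) < c.min_popularity:
        return False                      # not popular enough
    return not any(t.lower() in text for t in c.exclude_terms)


now = datetime(2021, 1, 17)
criteria = SearchCriteria(keywords=["lipstick"], since=now - timedelta(days=7),
                          min_popularity=10, exclude_terms=["sponsored"])
posts = [
    {"id": 1, "text": "New lipstick review", "created": now - timedelta(days=1), "likes": 25},
    {"id": 2, "text": "Lipstick sponsored post", "created": now - timedelta(days=2), "likes": 50},
    {"id": 3, "text": "lipstick haul", "created": now - timedelta(days=30), "likes": 40},
    {"id": 4, "text": "Skincare tips", "created": now, "likes": 100},
    {"id": 5, "text": "lipstick swatch", "created": now - timedelta(days=1), "likes": 3},
]
selected = [p["id"] for p in posts if matches(p, criteria)]
```

Only post 1 passes: post 2 is excluded by the filter term, post 3 is too old, post 4 lacks the keyword, and post 5 is below the popularity threshold.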

However, simply collecting diverse information from the web does not make it directly usable for big data analysis, and analyzing the collected data with analysis scripts in a distributed processing framework does not by itself produce information suitable for marketing.

Accordingly, embodiments of the present invention introduce two new approaches. The first configures information collection units that correspond to website types, so that the collected information is standardized to some degree. The second supports time-series analysis and viral marketing analysis, which is growing in importance in online marketing: value analysis is performed on an item basis and its results are stored, and an iterative process then analyzes their association with newly collected and analyzed data. This enables item-based association analysis over 1:N relationships and, through it, analysis of potential value for marketing support.

In addition, result information is generated according to analysis requests from actual marketing experts (scripts, algorithms, settings, and so on for the desired analysis) or search queries and provided to the users, and the generated result information that is meaningful in practice is made reusable. As a result, the system provides analysis information that is genuinely useful for marketing or management support rather than merely processing large volumes of data. Furthermore, when the system according to this embodiment is specialized as a professional service for general users rather than as a dedicated system for one company, the information collected in a general-purpose manner can be analyzed according to each service-using company's analysis needs and the results provided to that company. Because this raises system utilization, the burden on each service-using company is reduced, and because the results analyzed by each company's marketers are collected in reusable form, the various service-using companies can obtain the results they want quickly and in diverse forms. Integrating the analysis results with newly collected information and re-analyzing them also improves the efficiency of time-series analysis and item-based association analysis.

A more specific example applying this configuration is described with reference to FIG. 3.

FIG. 3 is a block diagram of a big data analysis system according to an embodiment of the present invention. As shown, the system includes: an information collection unit 100, provided for each website type, that searches preset types of websites for information according to set search criteria, parses the retrieved information, and converts it into a format based on predefined hierarchical common codes and items; a distributed storage unit 200 that distributes and stores the information converted by the information collection unit 100 in a distributed storage 300 with reference to its common code and size; an analysis unit 500 that accesses the information distributed by the distributed storage unit 200 using distributed processing, analyzes it according to analysis processes that include item-based value analysis, stores the results in a database unit 400, re-analyzes on an item basis the previous value analysis results already stored in the database unit 400 together with the newly analyzed value analysis results, and further stores the re-analysis results in the database unit 400; a result providing unit 600 that re-analyzes the analysis results in the database unit 400 with a desired process or searches them according to a requested query, outputs the results, and analyzes the output so that, depending on its data format, it is either provided as data to the distributed storage unit 200 or stored in the database unit 400 for reuse; and a user support unit 700 that provides the search criteria for the information collection unit 100, updates the analysis processes of the analysis unit 500, provides re-analysis processes or queries to the result providing unit 600, and delivers the results to the user.

Here, the user support unit 700 denotes an integrated user interface. Beyond a simple interface, it may provide extended functions for each related functional unit (the information collection unit 100, the analysis unit 500, and the result providing unit 600), and in some cases it may be included in a functional unit as part of that unit's functions. Since the user support unit 700 may be configured in various ways and have various functions, it is not limited to the configuration of this embodiment.

The illustrated distributed storage unit 200 and distributed storage 300 may use any of various distributed storage schemes that divide data by size and store it across multiple storage locations. In this embodiment, the widely known Hadoop Distributed File System (HDFS) serves as the basis for distributed storage, in which case the analysis unit 500 may use MapReduce running on HDFS as the basis for distributed processing.
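
The map, shuffle, and reduce phases that MapReduce applies to HDFS blocks can be illustrated in plain Python. This single-process sketch only simulates the pattern; it is not Hadoop's API, and the record layout with a `code` field holding a hierarchical common code is an assumption for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    # Emit one (common_code, 1) pair per collected record.
    yield record["code"], 1


def shuffle(pairs):
    # Group emitted values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Aggregate all values for one key.
    return key, sum(values)


def map_reduce(splits):
    # Each split stands in for one HDFS block handled by a separate mapper.
    mapped = chain.from_iterable(map_phase(r) for split in splits for r in split)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())


splits = [
    [{"code": "010101"}, {"code": "010101"}],
    [{"code": "010102"}, {"code": "010101"}],
]
counts = map_reduce(splits)  # {"010101": 3, "010102": 1}
```

In a real cluster the mappers run where the blocks are stored and the framework performs the shuffle across the network; the sketch keeps only the data flow.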

The illustrated database unit 400 includes a metadata database 410 that stores information on the distributed files and the search criteria used by the information collection unit 100; a code database 430 that holds the hierarchical integrated codes used for classification; and a utilization database 420 in which the analysis unit 500 and the result providing unit 600 store analysis results, accumulate item-based value analysis results, and keep search results for reuse.

That is, the database unit 400 as substantively described for FIG. 3 above refers to the utilization database 420. The other databases 410 and 430 are described further below.

In the illustrated configuration, the information collection unit 100 includes a member information acquisition unit 110 and an information search and collection unit 120, of which the member information acquisition unit 110 is optional. The member information acquisition unit 110 collects internal member information or members' internal usage information and, if necessary, supplies information that allows the information search and collection unit 120 to collect member-related information when it gathers information from the various websites.

Alternatively, the information collection unit 100 may consist only of the information search and collection unit 120, which collects information from websites designated on the basis of keywords that the user sets through the user support unit 700.

As described above, in this embodiment an information collection unit is configured for each website type; each searches for and collects the desired information from the websites and parses it into a format with appropriate classification codes and items.

This is described in more detail with reference to FIG. 4.

FIG. 4 shows the configuration of the information search and collection unit 120 according to an embodiment of the present invention, together with the metadata database 410, in which the search criteria used by the information search and collection unit 120 are set, and the user support unit 700, which provides those search criteria to the metadata database 410.

In this embodiment, the illustrated information search and collection unit 120 collects diverse information in a way that makes big data analysis possible. This is examined in more detail below.

First, so that diverse result formats and result types (web pages, documents, bulletin boards, comments, and so on), rather than structured internal information, can be analyzed together, the collected content is standardized on the basis of items, and the classification criteria are expressed as hierarchical integrated codes so that retrieved content can be grouped by type. To this end, an information retrieval unit is configured for each website type; it parses the search results, which differ from one website type to another, and converts them into a format based on integrated codes and items. Then, if a website changes its policies or the form of its search results changes, only the corresponding information retrieval unit needs updating, and adding a new website only requires adding a matching information retrieval unit. This makes information search easy to manage and allows the collected information to be converted into data that can be analyzed together even when it varies from website to website. Even with a common format, however, while it is possible to sort search results to the point of assigning each value to one of several similar items, it is very difficult to make the data type of each item consistent as well. Data collected under a given item may therefore be structured or unstructured, and the analysis unit handles it using one of several known methods for processing structured and unstructured data together.
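
The one-parser-per-website-type design can be sketched as a registry of functions that all emit the same item keys. This is a minimal Python illustration; the site types, the raw field names (`body`, `writer`, `message`, `ts`, and so on), and the three common items are hypothetical, since the text does not list the actual items or source formats.

```python
from typing import Callable, Dict

# Registry of per-website-type parsers (hypothetical names).
PARSERS: Dict[str, Callable[[dict], dict]] = {}


def parser(site_type: str):
    """Decorator that registers one parser per website type."""
    def register(fn: Callable[[dict], dict]) -> Callable[[dict], dict]:
        PARSERS[site_type] = fn
        return fn
    return register


@parser("blog")
def parse_blog(raw: dict) -> dict:
    # Assumed raw blog fields: body, writer, posted_at
    return {"text": raw["body"], "author": raw["writer"], "date": raw["posted_at"]}


@parser("sns")
def parse_sns(raw: dict) -> dict:
    # Assumed raw SNS fields: message, user.name, ts
    return {"text": raw["message"], "author": raw["user"]["name"], "date": raw["ts"]}


def normalize(site_type: str, raw: dict) -> dict:
    """Convert a raw result into the common item-based format."""
    return PARSERS[site_type](raw)


blog = normalize("blog", {"body": "Tried a new lipstick", "writer": "kim", "posted_at": "2021-01-17"})
sns = normalize("sns", {"message": "lipstick haul", "user": {"name": "lee"}, "ts": "2021-01-16"})
```

If one site changes its output, only its registered function is replaced; supporting a new site means registering one more parser, while downstream storage and analysis see the same item keys.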

To this end, the illustrated information search and collection unit 120 includes: a collection management module 122 that directs information searches according to the search criteria set in the metadata database 410 (keywords, creation time of the information to be searched, target websites, search cycle, search depth, member information, and so on); an information search module 121 that accesses the set website 800 at the request of the collection management module 122 and searches for information; a data generation module 123 that parses the information found by the information search module 121 in light of the characteristics of the website and converts it into a format based on items with defined relationships; and a data providing module 124 that provides the information generated by the data generation module 123 to the distributed storage unit 200.

Here, while parsing the retrieved information and converting it into the standardized format, the data generation module 123 may add the category to which the information belongs as hierarchical common code information. For example, information about a recently used lipstick may receive the common code corresponding to Women's products > Cosmetics > Lipstick.
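
The Women's products > Cosmetics > Lipstick example suggests a code whose prefixes identify its parent categories. A minimal sketch, assuming a two-digit-per-level numbering scheme and simple keyword matching for classification; both the code values and the matching rule are hypothetical, since the text does not specify how codes are structured or assigned.

```python
from typing import Dict, List, Optional

# Hypothetical code table: two digits per level, so a code's prefixes are its ancestors.
CODE_TABLE: Dict[str, str] = {
    "01": "Women's products",
    "0101": "Cosmetics",
    "010101": "Lipstick",
    "010102": "Foundation",
}
KEYWORD_TO_CODE: Dict[str, str] = {"lipstick": "010101", "foundation": "010102"}


def code_path(code: str, width: int = 2) -> List[str]:
    """Expand a code into its category labels from the top level down."""
    return [CODE_TABLE[code[:i]] for i in range(width, len(code) + 1, width)]


def is_under(code: str, ancestor: str) -> bool:
    """True if `code` falls within the `ancestor` category."""
    return code.startswith(ancestor)


def classify(text: str) -> Optional[str]:
    """Return the code of the first keyword found in the text, if any."""
    lowered = text.lower()
    for keyword, code in KEYWORD_TO_CODE.items():
        if keyword in lowered:
            return code
    return None


code = classify("My new Lipstick")
```

Because ancestry is a prefix test, records can be aggregated at any level (all cosmetics, or only lipsticks) without extra lookups.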

If necessary, the data generation module 123 may also perform security processing, such as anonymizing collected personal information and encrypting or deleting identifying information.
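
One way to realize the anonymization step, offered here as an assumed technique rather than the described method, is keyed hashing: identifiers are replaced by an HMAC-SHA256 digest, so that the same person remains linkable across records without the raw identifier being stored, and direct contact fields are dropped. A keyed hash is one-way, so it is a substitute for, not an instance of, reversible encryption. The field names are hypothetical.

```python
import hashlib
import hmac


def pseudonymize(record: dict, secret_key: bytes,
                 id_fields=("user_id",), drop_fields=("email", "phone")) -> dict:
    """Return a copy with identifiers hashed and direct contact fields removed."""
    out = {k: v for k, v in record.items() if k not in drop_fields}
    for name in id_fields:
        if name in out:
            # Same input and key always give the same digest, keeping records linkable.
            digest = hmac.new(secret_key, str(out[name]).encode("utf-8"), hashlib.sha256)
            out[name] = digest.hexdigest()
    return out


raw = {"user_id": "kim01", "email": "kim@example.com", "text": "love this lipstick"}
safe = pseudonymize(raw, secret_key=b"rotate-me")
```

The secret key must be kept outside the analysis data; without it, the digests cannot be recomputed from guessed identifiers.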

Because the retrieved information comes from arbitrary websites, an item provided by one website may not be provided by another, and an item found on one web page may be absent from another. For big data analysis, therefore, a set of items with defined relationships is prepared, and the items suited to each website are selected from that set and arranged in the prepared format. For example, information for items such as a user's gender, age, and location may or may not be collected. In that case, gender, age, location, and so on are prepared as related items of the basic information to be collected, and the information obtained from a given website is assigned to the corresponding items. As another example, a user's location may be collected on one website as the user's IP address, which indicates an approximate location; on another as address information; and on yet another as GPS information. All of these relate to the location item, yet each has a different data structure. They may therefore be converted into one specific representation (an address or a GPS position) to fit the format, or the format may be built by assigning each to the appropriate sub-item among the related location items.
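
The second option in the location example, assigning each site's data to a sub-item of one common location item, can be sketched as below. The input keys `gps`, `address`, and `ip`, the `precision` sub-item, and the GPS-then-address-then-IP preference are hypothetical; resolving an IP address to an approximate place would need a separate geo-IP lookup that is not shown.

```python
def normalize_location(raw: dict) -> dict:
    """Map whatever location data a site exposes onto one 'location' item."""
    loc = {"precision": None, "lat": None, "lon": None, "address": None, "ip": None}
    if "gps" in raw:
        loc["lat"], loc["lon"] = raw["gps"]
        loc["precision"] = "gps"
    elif "address" in raw:
        loc["address"] = raw["address"]
        loc["precision"] = "address"
    elif "ip" in raw:
        loc["ip"] = raw["ip"]          # approximate location only
        loc["precision"] = "ip"
    return {"location": loc}


records = [
    normalize_location({"gps": (37.5665, 126.9780)}),
    normalize_location({"address": "Seoul, Jung-gu"}),
    normalize_location({"ip": "203.0.113.7"}),
    normalize_location({}),            # site exposed no location at all
]
```

Every record carries the same location sub-items, so later item-based analysis can filter on `precision` instead of handling three unrelated structures.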

Because this item information is used in later analyses, such as value analysis based on relationships between items and association analysis based on the 1:N information generated per item, setting the format is important. In this embodiment, the format is set through parsing that reflects the characteristics of each website, which allows a conversion optimized for the characteristics of each collection target.

FIG. 5 shows an example configuration of the distributed storage unit 200 according to an embodiment of the present invention. As shown, the distributed storage unit 200 includes a plurality of unit distributed storage units 210 corresponding to the plurality of information collection units provided separately for each website type. Each unit distributed storage unit 210 receives search information in the format that its information collection unit produces according to its own conversion rules, looks up the corresponding integrated code in the code database 430, attaches a classification identifier, and divides the information by size and distributes it across the distributed storage 300.

Each unit distributed storage unit 210 includes a collection agent module 211, which distributes and stores the collected information in the distributed storage 300 while providing the metadata database 410 with information such as the storage location, file name, group code, and input date, and a data flow control module 212, which splits the collected data according to its size for storage.
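
The cooperation between the data flow control module 212 (splitting by size) and the collection agent module 211 (reporting location, file name, group code, and input date) might look like the following sketch. Round-robin placement across hypothetical node names, the file-naming pattern, and the extra `size` field are assumptions; in an HDFS deployment the file system itself splits files into blocks and chooses their placement.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class StoredFileMeta:
    location: str     # storage node (hypothetical name)
    file_name: str
    group_code: str   # hierarchical common code of the data
    input_date: date
    size: int         # bytes in this piece (assumed extra field)


def split_and_register(payload: bytes, group_code: str, nodes: List[str],
                       chunk_size: int, today: date) -> List[StoredFileMeta]:
    """Cut the payload into chunk_size pieces and return one metadata row per piece."""
    metas = []
    for i, start in enumerate(range(0, len(payload), chunk_size)):
        piece = payload[start:start + chunk_size]  # would be written to the chosen node
        metas.append(StoredFileMeta(
            location=nodes[i % len(nodes)],        # simple round-robin placement
            file_name=f"{group_code}_{today:%Y%m%d}_{i:04d}.part",
            group_code=group_code,
            input_date=today,
            size=len(piece),
        ))
    return metas


metas = split_and_register(b"x" * 250, "010101", ["node-a", "node-b"], 100, date(2021, 1, 17))
```

The returned rows are what would be written to the metadata database 410, letting later queries find every piece of a group without scanning storage.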

FIG. 6 shows the configuration of the analysis unit 500 according to an embodiment of the present invention. As shown, the analysis unit 500 includes: a distributed processing module 510 that queries the data placed in the distributed storage 300 by the distributed storage unit 200, or processes it in a distributed manner such as MapReduce, and, if necessary, stores analysis results that share the data format back in the distributed storage 300 as data for reuse; an analysis process module 530 that manages the various analysis processes (analysis scripts, algorithms, settings, and so on), including item-based value analysis; a data analysis module 520 that automatically performs analyses according to each analysis process of the analysis process module 530 and computes the results; and an analysis control module 540 that registers, updates, and removes the individual analysis processes of the analysis process module 530, checks integrated code information in the code database 430 and provides it to the data analysis module 520 when needed, and collects the results analyzed by the data analysis module 520 and records them in the utilization database 420. The user support unit 700 may provide the analysis unit 500 with an analysis process for a desired analysis, or with information for updating or deleting an existing process.
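
The register, update, and remove role of the analysis control module 540 over the processes held by the analysis process module 530, and their automatic execution by the data analysis module 520, can be pictured as a named registry. This is a hypothetical in-memory sketch; real analysis processes would be scripts or algorithms run against distributed data, not Python callables over a list.

```python
from typing import Callable, Dict, List

Process = Callable[[List[dict]], object]


class AnalysisProcessRegistry:
    """Holds named analysis processes and runs all of them over one dataset."""

    def __init__(self) -> None:
        self._processes: Dict[str, Process] = {}

    def register(self, name: str, process: Process) -> None:
        # Registering an existing name replaces it, which doubles as "update".
        self._processes[name] = process

    def remove(self, name: str) -> None:
        self._processes.pop(name, None)

    def run_all(self, records: List[dict]) -> Dict[str, object]:
        return {name: proc(records) for name, proc in self._processes.items()}


registry = AnalysisProcessRegistry()
registry.register("count", len)
registry.register("avg_likes", lambda rs: sum(r["likes"] for r in rs) / len(rs))
first = registry.run_all([{"likes": 10}, {"likes": 30}])
registry.remove("count")
second = registry.run_all([{"likes": 10}, {"likes": 30}])
```

Keeping processes as named entries is what lets a user-supplied script be added or retired without changing the analysis machinery itself.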

For effective viral marketing analysis, the analysis control module 540 analyzes the collected data using analysis processes that include value analysis through item-based association analysis (lookup, linked analysis of structured and unstructured data, statistical analysis, text analysis, machine learning, batch analysis, identification of data relationships, pattern extraction, ranking, issue analysis, timing analysis, related-term analysis, correlation analysis, regression analysis, and so on) and stores the results in the utilization database 420. It then re-analyzes, on an item basis, the previous value analysis results already stored in the utilization database together with the newly analyzed value analysis results, and further stores the re-analysis results in the utilization database 420. Repeating this process accumulates 1:N analysis information for each item, which enables association analysis and thus makes a wide range of information about changes in the analysis results available. For example, one can see how usage information about a particular lipstick spreads over time, or in which direction (age, region, particular affiliation, website) it shifts, and analyzing this in relation to when advertisements were run reveals the effect of viral marketing.
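
The accumulate-and-compare loop can be reduced to a small data structure: each item maps to N period snapshots, and each new snapshot is compared with the previous one to show in which segments usage grew. The segment labels, the use of mention counts as the "value", and a simple difference as the comparison are assumptions for illustration; the text leaves the concrete value metric open.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Snapshot = Tuple[str, Dict[str, int]]   # (period, {segment: count})


class ItemHistory:
    """Accumulates value-analysis snapshots per item (a 1:N relationship)."""

    def __init__(self) -> None:
        self._history: Dict[str, List[Snapshot]] = defaultdict(list)

    def add(self, item: str, period: str, segment_counts: Dict[str, int]) -> Dict[str, int]:
        """Store a new snapshot and return its change against the previous one."""
        self._history[item].append((period, dict(segment_counts)))
        return self.diffusion(item)

    def diffusion(self, item: str) -> Dict[str, int]:
        snaps = self._history[item]
        if len(snaps) < 2:
            return {}
        (_, prev), (_, cur) = snaps[-2], snaps[-1]
        return {seg: cur.get(seg, 0) - prev.get(seg, 0) for seg in set(prev) | set(cur)}

    def periods(self, item: str) -> List[str]:
        return [period for period, _ in self._history[item]]


history = ItemHistory()
history.add("lipstick-X", "2021-W01", {"age20s": 120, "age30s": 40})
change = history.add("lipstick-X", "2021-W02", {"age20s": 150, "age30s": 90, "age40s": 10})
```

Here the week-2 change shows the product spreading fastest in the 30s segment and newly reaching the 40s segment; aligning such changes with advertising dates is the kind of comparison the text describes.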

The data analysis module 520 or the analysis control module 540 may also store the analysis results back in the distributed storage 300 as data.

Here, the utilization database 420 is a database that supports high-speed analysis so that Hadoop distributed processing can be performed effectively; for example, it may use HBase, a NoSQL database designed for distributed processing of big data tables whose rows number in the trillions. Configured on an in-memory basis, it can speed up big data analysis, and it stores the results analyzed by the analysis unit 500 so that they can be used quickly by various services. Besides HBase, the utilization database 420 may also take the form of a memory store and interface that provides compatibility with various other types of databases, and can therefore operate as various types of memory-based database.

FIG. 7 shows the configuration of the result providing unit 600 according to an embodiment of the present invention. As shown, the result providing unit 600 may include: an information distribution module 610 that re-analyzes the analysis results in the utilization database 420 with a desired process, or searches them according to a requested query, and outputs the results; a service application module 620 that provides the processes or queries that make use of the utilization database 420; and an event analysis module 630 that analyzes the output and, according to its data format, provides events that are logs or can be analyzed further to the distributed storage as data via the distributed storage unit, and stores result events that can be used and served immediately in the utilization database 420 for reuse.
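
The routing rule of the event analysis module 630 can be sketched as follows. The `kind` tag that labels an event as a log or as an immediately usable result is a hypothetical stand-in for the data-format analysis the text describes, and the list and dict stand in for the distributed storage and the utilization database 420.

```python
from typing import Dict, Iterable, List


def route_events(events: Iterable[dict], distributed_store: List[dict],
                 utilization_db: Dict[str, object]) -> None:
    """Send analyzable logs to distributed storage; keep servable results for reuse."""
    for event in events:
        if event["kind"] == "log":
            distributed_store.append(event)                 # re-enters batch analysis later
        elif event["kind"] == "result":
            utilization_db[event["id"]] = event["payload"]  # served directly next time
        else:
            raise ValueError(f"unknown event kind: {event['kind']}")


store: List[dict] = []
db: Dict[str, object] = {}
route_events([
    {"kind": "log", "id": "e1", "payload": "query issued"},
    {"kind": "result", "id": "q1", "payload": {"top_item": "lipstick-X"}},
], store, db)
```

Rejecting unknown kinds, rather than silently dropping them, is a design choice of this sketch so that a new output format is noticed instead of lost.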

Here, the service application module 620 is the part of the system that interacts with the user. In practice, it may be an interface linked to the user support unit 700, and the actual analysis processes or queries may be supplied through the user support unit 700.

Meanwhile, some functions of the user support unit 700 may be integrated into the result providing unit 600. In this case, the information distribution module 610 can do more than use the analysis results stored in the utilization database 420. It may also act as a kind of dispatcher: when necessary, it supplies a new analysis process to the analysis unit 500, collects the resulting analysis, and outputs it to the user. The result providing unit 600 may then be one of three things: an external analysis solution, an interface for linking with an external analysis solution, or an analysis configuration that can be customized for an external user's site. The service application module may be an analysis engine for distributed processing. One example is an analysis engine built with RStudio, a development environment for the R language, and used to analyze the distributed processing database.
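The dispatcher behavior above amounts to a read-through cache in front of the analysis unit. The sketch below assumes that an analysis request can be reduced to a hashable key. `dispatch`, `run_analysis`, and the dict used as the utilization database are hypothetical stand-ins, not interfaces defined by the patent.

```python
from typing import Callable, Dict, Hashable


def dispatch(
    request_key: Hashable,
    utilization_db: Dict[Hashable, object],
    run_analysis: Callable[[Hashable], object],
) -> object:
    """Sketch of the dispatcher role of the information distribution module (610).

    Reuse a stored result if one exists; otherwise hand a new analysis process
    to the analysis unit (run_analysis), store its result, and return it.
    """
    if request_key in utilization_db:
        return utilization_db[request_key]
    result = run_analysis(request_key)
    utilization_db[request_key] = result  # keep it for later identical requests
    return result


# Usage with a fake analysis unit that records how often it is invoked
calls = []

def fake_analysis(key):
    calls.append(key)
    return {"query": key, "score": 0.8}

db = {}
first = dispatch("coffee:weekly", db, fake_analysis)
second = dispatch("coffee:weekly", db, fake_analysis)
print(len(calls))  # 1: the second request is served from the utilization database
```

A production dispatcher would also need an invalidation rule, for example re-running the analysis when newly collected data arrives. The patent does not specify one, so none is assumed here.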

Meanwhile, a user such as a marketer, manager, or advertiser may supply search criteria, such as keywords, for the analysis they need. The big data analysis system according to the embodiment then collects information periodically according to those criteria. The collected information is stored in a distributed manner and is accessed and analyzed by distributed processing. The analysis results are themselves analyzed repeatedly so that association analysis becomes possible, and the information accumulates over time. The accumulated analysis information can be queried by any desired criteria to produce results, or a new analysis process can be proposed and its results used. Such queries and processes depend on practical, field-specific know-how, which the designer of the analysis system cannot fully anticipate. The results of these practical analysis processes or queries are stored in the utilization database. As a result, they can be reused immediately whenever an analysis of related data is requested. In addition, the analysis unit can apply machine learning to the accumulated records of how users continue to use these analyses, increasing the reliability of related analyses.
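The repeated association analysis described above can be sketched with the standard support and confidence measures from association-rule mining. The patent does not name a specific measure, so this choice is an assumption. Each collected record is treated as a set of items, for example keywords found in one post. Each batch is counted separately, and the counts are folded into the accumulated totals. Comparing a single period with the running totals shows how an association spreads over time. All function names and sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def analyze_batch(transactions):
    """Count records, single items, and unordered item pairs in one batch."""
    items, pairs, n = Counter(), Counter(), 0
    for t in transactions:
        unique = sorted(set(t))
        items.update(unique)
        pairs.update(combinations(unique, 2))  # each pair stored in sorted order
        n += 1
    return {"n": n, "items": items, "pairs": pairs}


def accumulate(previous, new):
    """Re-analysis step: fold a new batch's counts into the accumulated ones."""
    return {key: previous[key] + new[key] for key in ("n", "items", "pairs")}


def support(stats, a, b):
    """Fraction of all records that contain both a and b."""
    return stats["pairs"][tuple(sorted((a, b)))] / stats["n"] if stats["n"] else 0.0


def confidence(stats, a, b):
    """Confidence of a -> b: share of records containing a that also contain b."""
    count_a = stats["items"][a]
    return stats["pairs"][tuple(sorted((a, b)))] / count_a if count_a else 0.0


# Usage: two collection periods for a hypothetical "coffee" keyword search
jan = analyze_batch([["coffee", "latte"], ["coffee", "cake"], ["tea"]])
feb = analyze_batch([["coffee", "latte"], ["latte"]])
total = accumulate(jan, feb)
print(round(support(jan, "coffee", "latte"), 3))       # 0.333
print(support(total, "coffee", "latte"))               # 0.4
print(round(confidence(total, "coffee", "latte"), 3))  # 0.667
```

Only counts are accumulated, not ratios. This keeps the merge step exact, because support and confidence can be recomputed from the summed counts at any time.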

Preferred embodiments of the present invention have been shown and described above. However, the present invention is not limited to these embodiments. A person of ordinary skill in the art to which the invention pertains may make various modifications without departing from the gist of the invention as defined in the appended claims.

10: information search and collection system 20: distributed storage system
30: analysis system 40: result providing system
50: user support system 100: information collection unit
200: distributed storage unit 300: distributed storage
400: database unit 500: analysis unit
600: result providing unit 700: user support unit

Claims

A big data analysis system for marketing, comprising:
an information collection unit, provided for each type of website, that searches a preset type of website for information according to a set search criterion, parses the retrieved information, and converts it into a format based on hierarchical common codes and items;
a distributed storage unit that distributes and stores the information converted by the information collection unit with reference to the common code and size;
an analysis unit that accesses the information distributed and stored by the distributed storage unit by a distributed processing method, analyzes it according to analysis processes including item-based value analysis, stores the result in a utilization database, re-analyzes, on an item basis, the previous value analysis results stored in the utilization database together with the newly analyzed value analysis results, and additionally stores the result in the utilization database; and
a result providing unit that reanalyzes the analysis results in the utilization database by a desired process, or searches them according to a requested query, and outputs the result, and that analyzes the output and, depending on its data format, either provides it as data to the distributed storage or stores it in the utilization database for reuse.
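The conversion and distribution steps recited in the claim can be illustrated with a small sketch. The claim gives only the principle: parsed data is converted into hierarchical common codes and items, then distributed with reference to the common code and size. Everything concrete below is an assumption: the dot-separated code scheme, the 64 KiB size threshold, and partitioning on the top-level code. `CommonRecord`, `to_common_record`, `partition_key`, and `choose_node` are hypothetical names.

```python
import zlib
from dataclasses import dataclass
from typing import Dict


@dataclass
class CommonRecord:
    common_code: str       # hierarchical code, e.g. "SNS.BLOG.FOOD" (hypothetical scheme)
    items: Dict[str, str]  # item name -> parsed value


def to_common_record(site_type: str, category: str, subcategory: str,
                     parsed: Dict[str, str]) -> CommonRecord:
    """Convert one parsed page into the common-code/item format (illustrative)."""
    code = ".".join(part.upper() for part in (site_type, category, subcategory))
    return CommonRecord(code, dict(parsed))


def partition_key(record: CommonRecord, large_threshold: int = 64 * 1024) -> str:
    """Combine the top-level common code with a size class (both rules assumed)."""
    size = sum(len(k.encode("utf-8")) + len(v.encode("utf-8"))
               for k, v in record.items.items())
    size_class = "large" if size >= large_threshold else "small"
    top_level = record.common_code.split(".", 1)[0]
    return f"{top_level}|{size_class}"


def choose_node(record: CommonRecord, num_nodes: int) -> int:
    # zlib.crc32 is deterministic across runs, unlike Python's salted hash() for str
    return zlib.crc32(partition_key(record).encode("utf-8")) % num_nodes
```

In a Hadoop deployment, HDFS places the physical blocks itself. A key like this would therefore more plausibly select a logical partition, such as a directory or an HBase region prefix, than a physical node.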