KR20220111219A

KR20220111219A - Analytical methods of systems for setting data processing cycles based on growth rate of data in real time

Info

Publication number: KR20220111219A
Application number: KR1020220090065A
Authority: KR
Inventors: 고영률; 김용수
Original assignee: (주)모아라
Priority date: 2020-12-29
Filing date: 2022-07-21
Publication date: 2022-08-09
Also published as: KR20220094551A; KR102465391B1; KR102425595B1

Abstract

Disclosed is an analysis method of an in-memory computing-based system. The analysis method includes the steps of: dividing data acquired in real time into a plurality of content groups according to time and storing the data in a memory; acquiring a plurality of analysis contents by pre-processing each of the plurality of content groups according to at least one index keyword; when at least one search word and condition are input, performing, by a plurality of cores, a parallel search of the plurality of content groups to identify at least one content; and performing analysis using at least one analysis content matching the identified content.

Description

An analysis method of a system that sets the data processing cycle according to the data growth rate in real time

The present disclosure relates to an analysis method of a system that sets a data processing cycle according to the rate at which data grows in real time and, more particularly, to a system that improves analysis speed by building data in real time through in-memory computing.

In-memory computing has been devised for processing real-time and/or large-volume big data.

In-memory computing departs from the conventional approach, in which most data was stored on and used from a hard disk so that data input/output consumed a great deal of time. Instead, data is stored directly in a memory composed of RAM so that the cores can process the data quickly.

With in-memory computing, a plurality of cores can divide the load efficiently and process the data in the memory in parallel, which increases processing speed.

Laid-Open Patent Publication No. 10-2019-0005578 (2019.01.16)

The present disclosure provides a system and an analysis method that, based on in-memory computing, classify and pre-process the data in memory in advance to speed up search and analysis operations.

The objects of the present disclosure are not limited to those mentioned above. Other objects and advantages of the present disclosure that are not mentioned can be understood from the following description and will be understood more clearly from the embodiments of the present disclosure. It will also be readily apparent that the objects and advantages of the present disclosure can be realized by the means indicated in the claims and combinations thereof.

An analysis method of an in-memory computing-based system according to an embodiment of the present disclosure includes: dividing data acquired in real time into a plurality of content groups according to time and storing the data in a memory; pre-processing each of the plurality of content groups according to at least one index keyword to obtain a plurality of analysis contents; when at least one search word and condition are input, performing, by a plurality of cores, a search of the plurality of content groups in a parallel processing manner to identify at least one content; and performing analysis using at least one analysis content matching the identified content.

In the obtaining of the plurality of analysis contents, each of the plurality of content groups may be pre-processed based on at least one of an appearance frequency of the index keyword, an appearance order of the index keyword, classification information according to the index keyword, and pattern information based on the index keyword.

In the identifying of the at least one content, when the data size of at least one content group among the plurality of content groups is larger than a data size preset according to the search performance of each of the plurality of cores, the content group may be divided and the search may be performed in a parallel processing manner.

Meanwhile, the analysis method of the system may further include, when a plurality of contents are identified by a search performed through one core, generating a plurality of content lists by dividing the plurality of contents into units of a preset number. In this case, in the performing of the analysis, a plurality of cores may analyze the generated content lists in a parallel processing manner.

In this case, the analysis method of the system may further include merging the analysis results for each of the plurality of content lists.
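The split, parallel analysis, and merge steps described above can be sketched as follows. This is a minimal illustration rather than the disclosed implementation: the function names, the word-count analysis, and the list size are assumptions, and Python threads stand in for the cores (in CPython, true multi-core execution would typically use processes instead).

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_lists(contents, list_size):
    """Split the identified contents into lists of at most list_size items."""
    return [contents[i:i + list_size] for i in range(0, len(contents), list_size)]


def analyze_list(content_list):
    """Hypothetical analysis: count word occurrences in one content list."""
    counts = Counter()
    for text in content_list:
        counts.update(text.split())
    return counts


def analyze_and_merge(contents, list_size=2, workers=4):
    """Analyze each content list in a worker, then merge the partial results."""
    content_lists = split_into_lists(contents, list_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(analyze_list, content_lists))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds counts together
    return merged


merged_counts = analyze_and_merge(["fund news", "fund report", "environment news"])
```

Because each worker produces its own partial result, the only shared step is the final merge.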

Meanwhile, the analysis method of the system may further include: storing backup information for the plurality of content groups and the plurality of analysis contents in at least one storage; and, when the system is booted, loading the plurality of content groups and the plurality of analysis contents into the memory based on the backup information.
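The backup and boot-time reload can be pictured with a short sketch. The JSON file format, the function names, and the sample data are assumptions made for illustration; the disclosure does not specify how the backup information is encoded.

```python
import json
import os
import tempfile


def save_backup(path, content_groups, analysis_contents):
    """Write the in-memory structures to storage as one snapshot."""
    snapshot = {
        "content_groups": content_groups,
        "analysis_contents": analysis_contents,
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(snapshot, f, ensure_ascii=False)


def load_backup(path):
    """Rebuild the in-memory structures at boot from the stored snapshot."""
    with open(path, encoding="utf-8") as f:
        snapshot = json.load(f)
    return snapshot["content_groups"], snapshot["analysis_contents"]


# Hypothetical data: one content group per date, with per-keyword counts.
backup_path = os.path.join(tempfile.mkdtemp(), "backup.json")
groups = {"2022-07-21": ["fund news", "environment report"]}
analysis = {"2022-07-21": {"fund": 1, "environment": 1}}

save_backup(backup_path, groups, analysis)
restored_groups, restored_analysis = load_backup(backup_path)
```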

In the obtaining of the plurality of analysis contents, each of the plurality of content groups into which the data is divided may be pre-processed according to at least one index keyword, based on a cycle set according to the growth rate of the data acquired in real time, the number of cores included in the system, and the performance of each core.
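The text names the three inputs to the cycle (data growth rate, number of cores, per-core performance) but gives no formula here. The sketch below shows one hypothetical rule, not the disclosed one: the cycle is the time it takes for incoming data to fill the batch that all cores can pre-process in one pass, clamped to a fixed range. All parameter names and limits are assumptions.

```python
def preprocessing_period_seconds(growth_rate, num_cores, items_per_core,
                                 min_period=1.0, max_period=3600.0):
    """Return a pre-processing cycle in seconds (hypothetical rule).

    growth_rate    -- items arriving per second
    num_cores      -- cores available for pre-processing
    items_per_core -- items one core is assumed to handle in one pass
    """
    if growth_rate <= 0:
        return max_period  # no incoming data: use the longest cycle
    batch_capacity = num_cores * items_per_core
    period = batch_capacity / growth_rate
    return max(min_period, min(max_period, period))


# Faster data growth shortens the cycle; more or faster cores lengthen it.
period = preprocessing_period_seconds(growth_rate=100.0, num_cores=8, items_per_core=5000)
```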

Meanwhile, the analysis method of the system may further include: identifying at least one time interval in which a search word and a condition are input and a search is performed through the system; generating, based on the identified time interval, a pattern of the time intervals in which searches are performed; and selecting, based on the generated pattern, at least one time interval in which a search is least likely to be performed. In this case, in the obtaining of the plurality of analysis contents, each of the plurality of content groups into which the data is divided may be pre-processed according to at least one index keyword during the selected time interval.
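As a rough illustration of selecting the interval in which a search is least likely, the sketch below treats the "pattern" as a per-hour histogram of past search times and picks the hours with the fewest searches. The hourly granularity, the function name, and the sample log are assumptions.

```python
from collections import Counter
from datetime import datetime


def quietest_hours(search_times, how_many=1):
    """Return the hours of the day (0-23) with the fewest recorded searches."""
    per_hour = Counter({hour: 0 for hour in range(24)})
    per_hour.update(t.hour for t in search_times)
    # Sort by search count, breaking ties by the earlier hour.
    ranked = sorted(range(24), key=lambda hour: (per_hour[hour], hour))
    return ranked[:how_many]


# Hypothetical log: one search in every hour except 03:00.
search_log = [datetime(2022, 7, 21, h, 0) for h in range(24) if h != 3]
preprocessing_window = quietest_hours(search_log)
```

Pre-processing would then be scheduled inside the returned window so that it competes as little as possible with search requests.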

An in-memory computing-based system according to an embodiment of the present disclosure includes at least one memory and at least one processor connected to the at least one memory. The processor divides data acquired in real time into a plurality of content groups according to time and stores them in the memory, pre-processes each of the plurality of content groups according to at least one index keyword to obtain a plurality of analysis contents, performs, when at least one search word and condition are input, a search of the plurality of content groups in a parallel processing manner through a plurality of cores to identify at least one content, and performs analysis using at least one analysis content matching the identified content.

The in-memory computing-based system and analysis method according to the present disclosure classify and pre-process data acquired in real time so that it suits parallel processing and store it in the memory, which has the effect of speeding up parallel search and analysis.

FIG. 1 is a diagram schematically illustrating the operation of a system according to an embodiment of the present disclosure;
FIG. 2 is a block diagram illustrating the configuration of a system according to an embodiment of the present disclosure;
FIG. 3 is an algorithm illustrating the operation of a system according to an embodiment of the present disclosure;
FIG. 4 is a diagram illustrating an operation in which a system according to an embodiment of the present disclosure classifies and pre-processes data;
FIG. 5 is a diagram illustrating an operation in which a system according to an embodiment of the present disclosure performs a search in a parallel processing manner through a plurality of cores;
FIG. 6 is a diagram illustrating an operation in which a system according to an embodiment of the present disclosure performs analysis in a parallel processing manner through a plurality of cores; and
FIG. 7 is a block diagram illustrating the functional configuration of a system according to an embodiment of the present disclosure.

Before the present disclosure is described in detail, the manner in which this specification and the drawings are written is explained.

First, the terms used in this specification and the claims are general terms selected in consideration of their functions in the various embodiments of the present disclosure. However, these terms may vary depending on the intention of those skilled in the art, legal or technical interpretation, the emergence of new technology, and the like. In addition, some terms were arbitrarily selected by the applicant. Such terms may be interpreted as defined in this specification and, where no specific definition is given, may be interpreted on the basis of the overall content of this specification and common technical knowledge in the art.

In addition, the same reference numerals or symbols in the drawings attached to this specification denote parts or components that perform substantially the same function. For convenience of description and understanding, the same reference numerals or symbols are used across different embodiments. That is, even if components with the same reference numeral appear in a plurality of drawings, the plurality of drawings do not all represent a single embodiment.

In addition, terms including ordinal numbers such as "first" and "second" may be used in this specification and the claims to distinguish between components. These ordinal numbers are used to distinguish identical or similar components from one another, and the meaning of a term should not be construed restrictively because of them. For example, the order of use or arrangement of a component combined with such an ordinal number should not be limited by that number. Where necessary, the ordinal numbers may be used interchangeably.

In this specification, a singular expression includes the plural unless the context clearly indicates otherwise. In the present application, terms such as "comprise" or "consist of" are intended to specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to exclude in advance the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

In the embodiments of the present disclosure, terms such as "module", "unit", and "part" refer to a component that performs at least one function or operation, and such a component may be implemented in hardware, in software, or in a combination of hardware and software. In addition, a plurality of "modules", "units", "parts", and the like may be integrated into at least one module or chip and implemented by at least one processor, except where each needs to be implemented as separate dedicated hardware.

In addition, in the embodiments of the present disclosure, when a part is said to be connected to another part, this includes not only a direct connection but also an indirect connection through another medium. Likewise, saying that a part includes a component means that it may further include other components, not that other components are excluded, unless specifically stated otherwise.

FIG. 1 is a diagram schematically illustrating the operation of a system according to an embodiment of the present disclosure.

Referring to FIG. 1, the system according to the present disclosure may acquire various external data.

Specifically, the system may acquire various data such as SNS content, news content, public data, and other text through various types of wired/wireless networks.

In addition, the system may collect this data in real time to build large-volume big data that expands in real time.

The system stores the data in memory and uses the memory to perform various kinds of processing on the data, such as pre-processing, classification, and indexing. Because the data is acquired in real time, this processing may also be performed in real time (or according to a specific cycle or condition).

Here, the system may store the data on which pre-processing, classification, indexing, or similar processing has been performed in the memory.

The system may then search the data in a parallel processing manner. Specifically, the system may use each core to search one of the plurality of sub-data into which the data is divided.

In this case, the system may use various search techniques such as keyword-based search, classification-based search, and pattern-based search.

In addition, the system may perform various analyses on the content obtained as a result of the search. To this end, the system may use modules that provide various functions and services.

Here, the cores executing the respective modules may analyze the content in a parallel processing manner, and the results of the search or analysis may be shared among the modules. In this case, the analysis results may be stored in the memory of the system.

The system may then provide the search results and/or analysis results. The results may be provided through an output device connected to the system, or through at least one user terminal included in a network to which the system is connected.

The configuration and operation of the system according to the present disclosure are described in more detail below with reference to the drawings.

FIG. 2 is a block diagram illustrating the configuration of a system according to an embodiment of the present disclosure.

Referring to FIG. 2, the system 100 includes at least one memory 110 and at least one processor 120.

The system 100 may include one or more electronic devices or computers. For example, it may be implemented as a plurality of server devices that provide a search or analysis function (application, program, etc.), but is not limited thereto.

The memory 110 is a component for storing various data. In the system 100 of the present disclosure, which uses in-memory computing, the memory 110 may be composed of one or more RAMs (Random Access Memory).

Specifically, the memory 110 may be implemented as a volatile memory such as DRAM (Dynamic RAM) or SRAM (Static RAM). However, the memory 110 may also include MRAM (Magnetoresistive RAM), PRAM (Phase-change RAM), RRAM (Resistive RAM), FRAM (Ferroelectric RAM), other flash memory, and the like.

The processor 120 is a component for controlling the overall operation of the system 100 and may be connected to the memory 110 to perform various functions.

Specifically, the processor 120 may perform the search and analysis method described below by executing at least one instruction stored in the memory 110.

The processor 120 may include a general-purpose processor such as a CPU, an AP, or a DSP (Digital Signal Processor), a graphics-dedicated processor such as a GPU or a VPU (Vision Processing Unit), or an artificial-intelligence-dedicated processor such as an NPU. The processor 120 may also include SRAM.

The processor 120 may include a plurality of cores. The cores may perform the same function in parallel or may perform different functions.

Specifically, the processor 120 may use the plurality of cores to classify and pre-process the various types of data stored in the memory 110, and may use the classified and pre-processed data to perform various functions (e.g., search and analysis).

FIG. 3 is an algorithm illustrating the operation of a system according to an embodiment of the present disclosure.

Referring to FIG. 3, the system 100 may divide data acquired in real time into a plurality of content groups according to time and store them in the memory 110 (S310).

Specifically, the system 100 may define the plurality of content groups by classifying the data stored in the memory 110 (which includes various contents) by items such as date and Media ID.

In particular, when various (content) data is received from outside in real time, the system 100 may store in the memory 110 a plurality of content groups into which the received data is divided by date. The content data divided by date may be stored in the memory 110 in the form of a queue, but various other forms such as a stack, linked list, array, or tree are also possible.
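A minimal sketch of step S310, assuming each incoming item carries a receipt date, a Media ID, and a text body. The class name, the (date, Media ID) grouping key, and the sample values are illustrative; the queue form mentioned above is modeled with `collections.deque`.

```python
from collections import defaultdict, deque
from datetime import date


class ContentStore:
    """In-memory store that keeps one queue of contents per (date, media_id)."""

    def __init__(self):
        # Each content group is a queue; a new group is created on first use.
        self.groups = defaultdict(deque)

    def add(self, received_on, media_id, text):
        """Append an incoming content item to its content group."""
        self.groups[(received_on, media_id)].append(text)


store = ContentStore()
store.add(date(2022, 7, 21), "news", "fund market report")
store.add(date(2022, 7, 21), "news", "environment policy update")
store.add(date(2022, 7, 22), "sns", "corona case count")
```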

As a result, the content data may be built in the memory 110 as a structure of a plurality of content groups, a form that facilitates later searches.

The system 100 may then pre-process each of the plurality of content groups according to at least one index keyword to obtain a plurality of analysis contents (S320).

The index keyword may serve as a criterion for indexing each content group.

As a specific example, the system 100 may pre-process each of the plurality of content groups based on at least one of the appearance frequency of an index keyword, the appearance order of one or more index keywords, classification information according to the index keyword, and pattern information based on the index keyword. The system 100 may also pre-process each content group based on an ID or a target matching the index keyword.

As a result, a plurality of analysis contents included in each content group may be defined and stored in the memory 110.

The analysis content may be defined in various ways, such as content that includes a specific index keyword, content that includes specific index keywords in a specific order, content classified according to index keywords grouped by a specific criterion, content that includes specific index keywords in a preset pattern, or content generated by an ID or user matching a specific index keyword.

The analysis content may also include statistical-analysis data (figures, indicators, classifications, etc.) calculated for a specific content group with respect to a specific index keyword.

In this way, the content data may be built in the memory 110 in the form of analysis contents that facilitate various analyses such as statistics.

FIG. 4 is a diagram illustrating an operation in which a system according to an embodiment of the present disclosure classifies and pre-processes data.

Referring to FIG. 4, the system 100 may classify data based on date and Media ID to identify a plurality of content groups (410, 420, 430, …) and store information about the identified content groups in the memory 110.

In addition, the system 100 may pre-process each content group (410, 420, 430, …).

Specifically, referring to FIG. 4, the system 100 may obtain a plurality of analysis contents (411, 412, 413, 414, …) by pre-processing and classifying the content group 410 based on various index keywords (environment, corona, fund, Industrial Bank of Korea, etc.).

For example, the analysis content 411 for the keyword "environment" may include, without limitation, information on how often "environment" appears in the content group 410, information on the contents of the content group 410 ordered by how often "environment" appears in them, information on the appearance pattern of the keyword "environment" in the content group 410, and information on the IDs/users who use the word "environment" with high frequency in the content group 410.
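To make the "environment" example concrete, the sketch below builds a small analysis-content record for one keyword over one content group. The record layout, the field names, and the `user`/`text` item shape are assumptions, and keyword occurrences are counted by simple substring matching.

```python
from collections import Counter


def build_analysis_content(content_group, keyword):
    """Pre-compute keyword statistics for one content group.

    content_group -- list of dicts with hypothetical keys "user" and "text"
    """
    per_content = [(item["text"].count(keyword), item) for item in content_group]

    # Total appearance frequency of the keyword in the group.
    frequency = sum(count for count, _ in per_content)

    # Contents that contain the keyword, most frequent first.
    ranked = sorted(per_content, key=lambda pair: -pair[0])
    ranked_contents = [item["text"] for count, item in ranked if count > 0]

    # Users who use the keyword most often.
    users = Counter()
    for count, item in per_content:
        if count:
            users[item["user"]] += count

    return {
        "keyword": keyword,
        "frequency": frequency,
        "ranked_contents": ranked_contents,
        "top_users": users.most_common(3),
    }


group_410 = [
    {"user": "a", "text": "environment policy and environment law"},
    {"user": "b", "text": "fund news"},
    {"user": "c", "text": "environment report"},
]
analysis_411 = build_analysis_content(group_410, "environment")
```

Storing such a record alongside the content group lets a later analysis read the statistics directly instead of rescanning the contents.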

With the data in the memory 110 built to facilitate search and analysis according to the embodiments described above, the system 100 may perform search and analysis based on the built data.

Specifically, referring to FIG. 3, when at least one search word and condition are input, the plurality of cores in the system 100 may search the plurality of content groups (e.g., 410, 420, 430, …) in a parallel processing manner to identify at least one content (S330).

Here, the condition input together with the search word may be any of various conditions, such as a condition that at least one keyword be included or not included, a target search period, stop words, or, when the search word is a compound word, a condition that the words making up the compound appear within a certain number of words of each other.
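The conditions listed above can be sketched as a single matching function. The parameter names, the whitespace tokenization, and the definition of word distance (difference between word positions) are assumptions made for illustration.

```python
from datetime import date


def within_distance(words, first, second, max_gap):
    """True if `first` and `second` occur within max_gap word positions."""
    pos_first = [i for i, w in enumerate(words) if w == first]
    pos_second = [i for i, w in enumerate(words) if w == second]
    return any(abs(i - j) <= max_gap for i in pos_first for j in pos_second)


def matches(content, query, must_not=(), period=None, max_gap=None):
    """Check one content item against a search word and its conditions."""
    words = content["text"].split()
    terms = query.split()
    if not all(term in words for term in terms):
        return False  # every search term must appear
    if any(word in words for word in must_not):
        return False  # excluded keywords must not appear
    if period and not (period[0] <= content["date"] <= period[1]):
        return False  # outside the target search period
    if max_gap is not None and len(terms) == 2:
        return within_distance(words, terms[0], terms[1], max_gap)
    return True


item = {"date": date(2022, 7, 21), "text": "corona relief fund for small business"}
near = matches(item, "corona fund", max_gap=2)      # positions 0 and 2
far = matches(item, "corona business", max_gap=2)   # positions 0 and 5
```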

In one embodiment, each of the plurality of cores in the system 100 may search a respective one of the plurality of content groups (parallel processing).

If the system 100 includes a plurality of processors and/or a plurality of servers, the cores included in those processors/servers may search the respective content groups in a parallel processing manner.

Searching the plurality of content groups in parallel in this way can increase search speed and core utilization. Moreover, whereas data that is lumped together or duplicated cannot be accessed simultaneously, searching each of the separated content groups minimizes synchronization sections and thereby ensures the efficiency of parallel processing.

However, when the data size of at least one content group among the plurality of content groups is larger than a data size preset according to the search performance of each of the plurality of cores, the system 100 may divide that content group and perform the search in a parallel processing manner. In this case, the parallel processing performance of the cores can be maintained without degradation.
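Splitting oversized content groups might look like the following sketch. Here the "data size" is measured as a number of items and the per-core limit is a single constant; both are simplifying assumptions, as are the function names.

```python
def split_group(contents, max_items):
    """Split one content group into parts of at most max_items items."""
    if len(contents) <= max_items:
        return [contents]
    return [contents[i:i + max_items] for i in range(0, len(contents), max_items)]


def build_shards(content_groups, max_items):
    """Turn content groups into search units no larger than the preset size."""
    shards = []
    for key, contents in content_groups.items():
        for part in split_group(contents, max_items):
            shards.append((key, part))
    return shards


# Hypothetical groups: the first exceeds the limit of 2 items and is split.
groups = {"2022-07-21": ["c1", "c2", "c3", "c4", "c5"], "2022-07-22": ["c6", "c7"]}
shards = build_shards(groups, max_items=2)
```

Keeping every search unit under the limit means no single core is left working on one very large group while the others sit idle.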

FIG. 5 is a diagram for explaining an operation in which a system according to an embodiment of the present disclosure performs a search in parallel through a plurality of cores.

Referring to FIG. 5, each core (thread) may search one content group (shard contents group) at a time.

Here, each content group may be one of the previously divided content groups (e.g., 410, 420, 430, ...), or, when a content group is large, a partial content group obtained by dividing one of the plurality of content groups.

Referring to FIG. 5, when a core finishes searching one content group (shard contents group), it searches another content group (shard contents group), and this process may be repeated until all content groups (shard contents groups) have been searched.
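The worker behavior described for FIG. 5 can be sketched as follows. This is only an illustration: worker threads stand in for cores, the search is a plain substring match, and the names search_shard and parallel_search are hypothetical. In CPython, threads do not execute CPU-bound work truly in parallel, so a process pool would be needed to occupy several physical cores.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def search_shard(shard: Sequence[str], term: str) -> List[str]:
    """Search one shard and drop duplicate hits while keeping their order."""
    hits = [doc for doc in shard if term in doc]
    return list(dict.fromkeys(hits))


def parallel_search(shards: Sequence[Sequence[str]], term: str, workers: int = 4) -> List[str]:
    """Each worker takes one shard, and picks up the next one when it finishes."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() hands shards to idle workers until every shard has been searched.
        per_shard = pool.map(lambda shard: search_shard(shard, term), shards)
        return [hit for hits in per_shard for hit in hits]
```

Because no two workers touch the same shard, the only coordination needed in this sketch is the hand-out of shards by the pool.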

Each core may identify at least one content item by performing the search. Here, each core may remove duplicate results.

When a plurality of content items are identified by the search performed through one core, the core may divide them into units of a preset number to generate a plurality of content lists.

As a specific example, each core may generate its search result in the form of List<Contents[]>. In this case, the size of each generated search result may be limited to a certain number of content items or fewer.

Fetching a Contents[] from the List<Contents[]> during analysis requires synchronization, and the content is loaded in array form to minimize this synchronization section (fetching items one at a time would incur a synchronization section on every fetch, which greatly affects speed).

Because the search results are generated as a List rather than a Queue, the plurality of cores can analyze the plurality of content lists in parallel in the subsequent analysis process (S340).
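The effect of handing out results in arrays rather than one item at a time can be sketched as below. The sketch assumes a single lock guards the shared list; the names to_batches and BatchSource are hypothetical, and the disclosure itself only names the List<Contents[]> shape.

```python
import threading
from typing import List, Optional, Sequence


def to_batches(hits: Sequence[int], batch_size: int) -> List[List[int]]:
    """Group search hits into arrays of at most batch_size items."""
    return [list(hits[i:i + batch_size]) for i in range(0, len(hits), batch_size)]


class BatchSource:
    """Hands out whole batches, so the lock is taken once per batch, not per item."""

    def __init__(self, batches: Sequence[List[int]]) -> None:
        self._batches = list(batches)
        self._next = 0
        self._lock = threading.Lock()

    def take(self) -> Optional[List[int]]:
        # The synchronized section is only the index bump; the batch contents
        # are processed by the caller outside the lock.
        with self._lock:
            if self._next >= len(self._batches):
                return None
            batch = self._batches[self._next]
            self._next += 1
            return batch
```

With a batch size of 4, ten hits cost three synchronized fetches instead of ten, which is the saving the passage describes.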

When the search has been performed as in the embodiments described above, the system 100 may perform analysis using at least one piece of analysis content that matches the content identified by the search (S340).

In one embodiment, each core in the system 100 may analyze a respective one of the plurality of content lists that constitute the search result.

Specifically, using the above-described analysis content (i.e., preprocessed data) that matches each content list, the cores may analyze the plurality of content lists in parallel.

Because the data structures preprocessed in advance in the memory 110 (the analysis content) are handled solely by operations between the processor 120 and the memory 110, steps such as memory data exchange and data conversion (e.g., JSON -> Memory Object) are unnecessary, and the analysis can therefore be processed faster than in the prior art.

FIG. 6 is a diagram for explaining an operation in which a system according to an embodiment of the present disclosure performs analysis in parallel through a plurality of cores.

Referring to FIG. 6, each core (thread) can use the preprocessed content (preprocessing contents) to perform analysis (e.g., statistical analysis) quickly and in parallel.

Here, the preprocessed content may be at least part of the analysis content that matches the content lists identified by the search.
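A rough sketch of analysis over preprocessed content follows. It assumes, purely for illustration, that the analysis content is a dictionary mapping a content identifier to precomputed keyword counts and that the statistical analysis is a sum of those counts; analyze_batch and analyze_parallel are hypothetical names.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Sequence


def analyze_batch(batch: Sequence[str], preprocessed: Dict[str, Counter]) -> Counter:
    """Sum the precomputed keyword counts for the content ids in one list."""
    total: Counter = Counter()
    for content_id in batch:
        # Only an in-memory lookup is needed; nothing is parsed or converted here.
        total.update(preprocessed[content_id])
    return total


def analyze_parallel(batches: Sequence[Sequence[str]],
                     preprocessed: Dict[str, Counter],
                     workers: int = 4) -> List[Counter]:
    """Analyze each content list on its own worker; one partial result per list."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda b: analyze_batch(b, preprocessed), batches))
```

The workers only read the shared dictionary, so no locking is required during the analysis step in this sketch.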

Because the data is classified and preprocessed in advance, built in the memory 110, and then maintained or updated, an optimal environment is ensured in which the plurality of cores in the system 100 can quickly search and analyze the data in parallel.

Meanwhile, the analyses performed by the cores may be based on the same function/module or on different functions/modules.

In one embodiment, when the plurality of cores perform analysis according to the same module/function, the system 100 may merge the cores' analysis results for the respective content lists.

As an example, assume that first to fourth statistical data are calculated by the analyses performed through first to fourth cores, respectively.

In this case, the processor 120 may merge the first and second statistical data to obtain first merged statistical data, and merge the third and fourth statistical data to obtain second merged statistical data.

The processor 120 may then merge the first and second merged statistical data to calculate final statistical data.
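The two-stage merge in this example generalizes to a pairwise reduction. The sketch below assumes the statistical data are keyword counts with positive values held in Counter objects; merge_pairwise is a hypothetical name, and the disclosure does not state the data type of the statistics.

```python
from collections import Counter
from typing import List, Sequence


def merge_pairwise(partials: Sequence[Counter]) -> Counter:
    """Merge per-core statistics two at a time until one result remains."""
    level: List[Counter] = list(partials)
    if not level:
        return Counter()
    while len(level) > 1:
        # Merge neighbours: (1st, 2nd), (3rd, 4th), ...
        merged = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            merged.append(level[-1])  # an unpaired result moves up unchanged
        level = merged
    return level[0]
```

For four partial results this performs exactly the merges described above: two first-level merges followed by one final merge. The two first-level merges are independent of each other and could themselves be run in parallel.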

Meanwhile, according to an embodiment, the system 100 may perform the above-described steps S310 to S320 at a certain period.

That is, as external data is added in real time, the system 100 may periodically classify and preprocess that data to build the data environment in the memory 110.

As a specific example, based on a period set according to the rate of increase of the data acquired in real time, the number of cores included in the system 100, and the performance of each core, the system 100 may preprocess each of the plurality of content groups into which the data is divided according to at least one index keyword, thereby obtaining a plurality of pieces of analysis content.

For example, the speed at which the system 100 classifies/preprocesses the data added in real time into a form suitable for search and analysis (S310 to S320) must be faster than the rate at which that data grows. In this case, the value obtained by multiplying the data processing speed of each core by the number of cores and dividing the product by the period may exceed the rate of increase of the data acquired in real time.
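The stated condition can be written as a simple check. The sketch below transcribes the inequality literally: (per-core processing speed x number of cores) / period > growth rate. The units of each quantity are not given in the text, so the parameter names and the sample numbers are assumptions chosen only to exercise the comparison.

```python
def period_keeps_up(per_core_speed: float, core_count: int,
                    period: float, growth_rate: float) -> bool:
    """Return True if (per_core_speed * core_count) / period exceeds growth_rate.

    This mirrors the inequality as worded in the description; units are
    left to the caller because the source does not specify them.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    return (per_core_speed * core_count) / period > growth_rate


def period_upper_bound(per_core_speed: float, core_count: int,
                       growth_rate: float) -> float:
    """Exclusive upper bound on the period implied by the same inequality."""
    if growth_rate <= 0:
        raise ValueError("growth_rate must be positive")
    return (per_core_speed * core_count) / growth_rate
```

Under this reading, any candidate period strictly below period_upper_bound satisfies the condition, so a scheduler could pick the longest convenient period under that bound.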

In addition, in one embodiment, the system 100 may perform the above-described classification and preprocessing (S310 to S320) during a time interval in which the search (S330) and analysis (S340) are predicted not to be performed.

As a specific example, the system 100 may identify at least one time interval in which a search word and condition were input and a search was performed through the system 100. Based on the identified time intervals, the system 100 may generate a pattern (e.g., the average start/end time of the intervals) for the time intervals in which searches are performed.

In this case, based on the generated pattern, the system 100 may select at least one time interval in which a search is least likely to be performed.

The system 100 may then perform steps S310 to S320 on the added/updated data during the selected time interval. Specifically, the system 100 may preprocess each of the plurality of content groups into which the data is divided according to at least one index keyword to obtain a plurality of pieces of analysis content.

For example, when searches and analyses requested by a server administrator or user are performed mainly between 8 a.m. and 7 p.m., the system 100 may perform the above-described steps S310 to S320 from 11 p.m. until 2 a.m. the next day.
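One way to pick such an interval is sketched below. Note that it substitutes a simpler technique for the pattern described in the text: instead of averaging start and end times, it counts past searches per hour of day and chooses the contiguous window with the fewest searches. The name quietest_window and the hour-level granularity are assumptions.

```python
from collections import Counter
from typing import Iterable, Tuple


def quietest_window(search_hours: Iterable[int], window_len: int) -> Tuple[int, int]:
    """Return (start_hour, end_hour) of the least-searched window of window_len hours.

    search_hours holds the hour of day (0-23) of each past search. The window
    may wrap past midnight; ties are resolved in favour of the earliest start.
    """
    if not 1 <= window_len <= 24:
        raise ValueError("window_len must be between 1 and 24")
    counts = Counter(hour % 24 for hour in search_hours)
    best_start, best_load = 0, None
    for start in range(24):
        load = sum(counts[(start + k) % 24] for k in range(window_len))
        if best_load is None or load < best_load:
            best_start, best_load = start, load
    return best_start, (best_start + window_len) % 24
```

Several windows can be equally quiet, as in the 8 a.m. to 7 p.m. example above, where every overnight window has no recorded searches; a real scheduler would need an additional tie-breaking policy, which the text does not specify.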

In this case, the external data (e.g., content groups) additionally acquired/collected each day can be preprocessed and built in the memory 110, and as a result, the speed of the searches and analyses performed each day can be ensured.

Meanwhile, according to an embodiment, the system 100 may store, in at least one storage, backup information for the plurality of content groups and the plurality of pieces of analysis content obtained through steps S310 to S320.

The storage may include various components such as a hard disk drive (HDD), a solid state drive (SSD), and flash memory, but is not limited thereto.

When the system 100 boots (e.g., reboots after shutdown), the system 100 may load the plurality of content groups and the plurality of pieces of analysis content into the memory 110 based on the backup information. At this time, the system 100 may load them in parallel through the plurality of cores.
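The backup and parallel reload can be illustrated with a small sketch. It assumes one backup file per content group, serialized with Python's pickle module into a directory; the file layout and the names save_backup, load_backup, and round_trip are hypothetical, and the disclosure does not describe a backup format.

```python
import pickle
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Dict, List, Tuple


def save_backup(groups: Dict[str, List[str]], directory: str) -> None:
    """Write each content group to its own file so files can be reloaded independently."""
    target = Path(directory)
    target.mkdir(parents=True, exist_ok=True)
    for name, payload in groups.items():
        with open(target / f"{name}.pkl", "wb") as handle:
            pickle.dump(payload, handle)


def _load_one(path: Path) -> Tuple[str, List[str]]:
    with open(path, "rb") as handle:
        return path.stem, pickle.load(handle)


def load_backup(directory: str, workers: int = 4) -> Dict[str, List[str]]:
    """Reload all backup files, reading several files concurrently."""
    paths = sorted(Path(directory).glob("*.pkl"))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(_load_one, paths))


def round_trip(groups: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Save to a temporary directory and load back, as a self-check."""
    with tempfile.TemporaryDirectory() as directory:
        save_backup(groups, directory)
        return load_backup(directory)
```

Keeping one file per group is what lets the reload be split across workers in this sketch; a single monolithic backup file would have to be read sequentially.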

Meanwhile, the system 100 may also store the backup information in at least one non-volatile memory included in the memory 110. In this case, the rebooted system 100 may use the backup information stored in the non-volatile memory to load the plurality of content groups and the plurality of pieces of analysis content onto the volatile memory of the memory 110.

Meanwhile, in one embodiment, the system 100 may use data on the search words used in the search of step S330 (e.g., the frequency of each search word, or a list of search words ordered by search frequency) in the memory building process (S320) performed later.

As a specific example, the system 100 may set the search words as index keywords in order of input frequency, and perform the preprocessing of step S320 using each index keyword so set to obtain a plurality of pieces of analysis content.
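Ordering index keywords by how often they were searched can be sketched as follows. The sketch assumes the search history is available as a flat list of search words and that preprocessing means counting, per content group, the documents that contain each keyword; both function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence


def index_keywords_from_history(search_terms: Sequence[str],
                                limit: Optional[int] = None) -> List[str]:
    """Return search words ordered from most to least frequently entered."""
    return [term for term, _ in Counter(search_terms).most_common(limit)]


def preprocess_group(group: Sequence[str], keywords: Sequence[str]) -> Dict[str, int]:
    """Count, for each index keyword, how many documents in the group contain it."""
    return {kw: sum(1 for doc in group if kw in doc) for kw in keywords}
```

For example, a history of "ai", "cloud", "ai", "edge", "cloud", "ai" yields the keyword order "ai", "cloud", "edge", and each content group is then preprocessed against the keywords in that order.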

In addition, in one embodiment, the system 100 may use at least one condition that was input for the search of step S330 in a later preprocessing process (S320) based on the index keywords.

Meanwhile, FIG. 7 is a block diagram for explaining the functional configuration of a system according to an embodiment of the present disclosure.

Referring to FIG. 7, the system 100 may include various modules (Analysis Modules) for performing analyses of various types/functions. A search result, once obtained, can be shared among multiple modules.

Specifically, the system 100 may include Classify TF (Task Force) for performing various classification-related processing, and Word TF and Word TFIDF (Term Frequency-Inverse Document Frequency) for performing word-related analysis. The system 100 may also include Media TF, related to the creation/processing of media. The system 100 may also include a Topic Modeling module (e.g., LDA: Latent Dirichlet Allocation) that performs language processing for discovering topics across various documents, such as word detection and phrase pattern detection. The system 100 may further include an SNA (Social Network Analysis) module for acquiring various external data.

Each of the above-described modules may be implemented in software and/or hardware and may be executed by at least one core included in the processor 120.

Also, referring to FIG. 7, the system 100 may include an NLP (Natural Language Processing) module for natural language processing and other modules for performing various kinds of real-time data processing.

Specifically, the system 100 may include at least one module for collecting media content from outside in real time, a module that uses the collected content to change/add/delete the data structure in the memory 110 in real time, a module that performs indexing (preprocessing) in real time using added/changed index keywords, and the like.

As an example, the system 100 may add/change index keywords in real time based on stop-word removal, various forms of word management, learning of newly coined words, and the like.

The system 100 may then preprocess each of the plurality of content groups using the added/changed index keywords. As a result, the plurality of pieces of analysis content built in the memory 110 can be updated in real time in step with changes in the times and in language.

Meanwhile, Table 1 below compares search and analysis times to illustrate the effect of an embodiment according to the present disclosure.

[Table 1] Time taken (seconds) for search and analysis, by search result ratio

Search result ratio | In-memory not applied | Elastic Search + Spark | Spark RDD + Spark SQL | Search and analysis according to the present disclosure
5% | 72.2 | 4.76 | 2.5+α | 0.65
11% | 258 | 6.5 | 3.5+α | 1.2
15% | 314 | 8.1 | 5.5+α | 1.45
20% | 404 | 9.3 | 6.5+α | 1.8

Table 1 shows the search and analysis times measured for each configuration on the same 1.7 million data items. Note that the Spark RDD + Spark SQL figures do not include the SQL query time, so the actual time required is longer than the values shown.

Referring to Table 1, up to the point at which all the data needed for analysis is found and passed to the analysis, an in-memory search engine and a general search engine differ in performance by a factor of 10 to 100 or more.

Furthermore, the search and analysis according to an embodiment of the present disclosure shows a performance difference of 4 times or more compared with Spark receiving its data through a search engine (e.g., Elastic Search).
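The ratios behind these statements can be recomputed directly from Table 1. The sketch below copies the table values into a dictionary with hypothetical key names and divides each baseline time by the time of the disclosed method; the Spark RDD + Spark SQL column is left out because its values contain the unquantified term α.

```python
from typing import Dict

# Times in seconds, copied from Table 1 (key names are illustrative only).
TABLE_1: Dict[str, Dict[str, float]] = {
    "5%": {"no_in_memory": 72.2, "es_spark": 4.76, "disclosed": 0.65},
    "11%": {"no_in_memory": 258.0, "es_spark": 6.5, "disclosed": 1.2},
    "15%": {"no_in_memory": 314.0, "es_spark": 8.1, "disclosed": 1.45},
    "20%": {"no_in_memory": 404.0, "es_spark": 9.3, "disclosed": 1.8},
}


def speedups(table: Dict[str, Dict[str, float]], baseline: str) -> Dict[str, float]:
    """Ratio of the baseline time to the disclosed method's time, per row."""
    return {ratio: row[baseline] / row["disclosed"] for ratio, row in table.items()}


if __name__ == "__main__":
    for name in ("no_in_memory", "es_spark"):
        print(name, {k: round(v, 1) for k, v in speedups(TABLE_1, name).items()})
```

For every row of the table, the Elastic Search + Spark time divided by the disclosed method's time comes out above 5, which is consistent with the "4 times or more" statement.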

Meanwhile, the various embodiments described above may be implemented in a recording medium readable by a computer or a similar device, using software, hardware, or a combination thereof.

In a hardware implementation, the embodiments described in the present disclosure may be implemented using at least one of ASICs (Application Specific Integrated Circuits), DSPs (digital signal processors), DSPDs (digital signal processing devices), PLDs (programmable logic devices), FPGAs (field programmable gate arrays), processors, controllers, micro-controllers, microprocessors, and other electrical units for performing functions.

In some cases, the embodiments described in this specification may be implemented by the processor itself. In a software implementation, embodiments such as the procedures and functions described in this specification may be implemented as separate software modules. Each of these software modules may perform one or more of the functions and operations described in this specification.

Meanwhile, computer instructions or a computer program for performing the processing operations of at least one electronic device in the system 100 according to the various embodiments of the present disclosure described above may be stored in a non-transitory computer-readable medium. When executed by a processor of a specific device, the computer instructions or computer program stored in such a non-transitory computer-readable medium cause that device to perform the processing operations of the system 100 according to the various embodiments described above.

A non-transitory computer-readable medium is a medium that stores data semi-permanently and can be read by a device, as opposed to a medium that stores data only briefly, such as a register, cache, or memory. Specific examples of non-transitory computer-readable media include CDs, DVDs, hard disks, Blu-ray discs, USB, memory cards, and ROM.

Although preferred embodiments of the present disclosure have been illustrated and described above, the present disclosure is not limited to the specific embodiments described. Various modifications can of course be made by those of ordinary skill in the technical field to which the present disclosure pertains without departing from the gist of the present disclosure as claimed in the claims, and such modifications should not be understood separately from the technical idea or outlook of the present disclosure.

100: system 110: memory
120: processor

Claims

An analysis method of a system, the method comprising:
storing data acquired in real time in a memory as a plurality of content groups classified according to acquisition date;
obtaining a plurality of pieces of analysis content by preprocessing each of the plurality of content groups according to at least one index keyword;
identifying, when at least one search word and condition are input, at least one content item by a plurality of cores searching the plurality of content groups in a parallel processing manner; and
performing statistical analysis on the identified content using at least one piece of analysis content matching the identified content,
wherein the obtaining of the plurality of pieces of analysis content comprises:
preprocessing each of the plurality of content groups based on at least one of the frequency of appearance of the index keyword, the order of appearance of the index keyword, classification information according to the index keyword, and pattern information based on the index keyword; and
preprocessing, based on a period set according to the rate of increase of the data acquired in real time, the number of cores included in the system, and the performance of each core, each of the plurality of content groups into which the data is divided according to the at least one index keyword to obtain the plurality of pieces of analysis content, and
wherein, in the analysis method,
when the system performs at least one of classification and preprocessing of the data added in real time, the result of dividing the product of the data processing speed of each core and the number of cores included in the system by the period set according to the performance of each core exceeds the rate of increase of the data acquired in real time.