KR20210081161A

KR20210081161A - Data processing method and apparatus

Info

Publication number: KR20210081161A
Application number: KR1020190173423A
Authority: KR
Inventors: 진달래; 손혜령; 정해주
Original assignee: 주식회사 엘지씨엔에스
Priority date: 2019-12-23
Filing date: 2019-12-23
Publication date: 2021-07-01
Also published as: KR102397882B1

Abstract

An information processing method according to one embodiment includes: receiving first information expressed in a one-dimensional format and including a categorical variable; determining, based on a characteristic of the categorical variable included in the first information, one or more criteria to be used for rearranging the first information; converting the first information into a two-dimensional format based on the one or more criteria to obtain second information; and determining a characteristic of the first information by applying a convolutional neural network (CNN) to the second information converted into the two-dimensional format. The method can thereby improve the decision accuracy of a model.

Description

Information processing method and apparatus {DATA PROCESSING METHOD AND APPARATUS}

The present invention relates to an information processing method and apparatus. More particularly, it relates to a method and apparatus that convert first information in a one-dimensional format including a categorical variable into second information in a two-dimensional format and apply a CNN to it, so that local correlations can be analyzed even for random categorical variables whose adjacent bits are unrelated, enabling meaningful data analysis on a large number of categorical variables to be performed efficiently.

Conventional techniques for analyzing and using customer behavior data either extract variables and feed them to a machine learning model, or convert the data into product preferences and fit a linear model. The former requires considerable work in the variable extraction process and is biased by the intuition of field experts. The latter can only model the relationship between customers and products linearly.

These conventional techniques mainly use structured data, which is generally regarded as easy to use for generating and analyzing statistical variables. However, when the number of variables is large and the analysis task is complex, model performance depends heavily on how feature engineering is performed on the data. Accordingly, various techniques are being developed for machine learning models that analyze customer behavior data by extracting features from structured data.

In general, the number of ways to extract features from customer behavior data can range from thousands to tens of thousands. Because most of the variables are categorical rather than continuous or ordinal, they are difficult to quantify or express simply, and the number of variables explodes during processing for analysis.

From the standpoint of marketing that extracts customers matching given conditions every day and offers them promotions, it is practically difficult for the conventional techniques to generate all of these variables each time, and storing customer states that change daily is virtually impossible. To address this, the conventional techniques select only some of the exploding number of variables and extract features from them. Because this approach inevitably reflects the analyst's intuition, common sense, and bias, it ends up using only limited information, which lowers the accuracy of the results.

Accordingly, demand is steadily increasing for technology that can obtain good learning results by reflecting in the analysis the large number of variables generated while preprocessing customer behavior data to analyze customers' lifestyle and purchase patterns.

Korean Patent Application Publication No. 10-2017-0096298 relates to a deep learning system using image patterning based on a convolutional neural network and an image learning method using the same. The system includes: an image input unit that receives an input image; a patterning module that generates a plurality of patterned images from the input image received from the image input unit; a CNN learning unit, based on a convolutional neural network (CNN), that learns the input image received from the image input unit and the pattern images received from the patterning module; a CNN execution unit that receives learning information from the CNN learning unit and the input image from the image input unit; and a final classification unit that receives image information from the CNN execution unit and classifies objects in the image information by type.

An embodiment of the present invention provides an information processing method and apparatus that can solve the above problems of the prior art. More particularly, by converting first information in a one-dimensional format including a categorical variable into second information in a two-dimensional format and applying a CNN to it, local correlations can be analyzed even for random categorical variables whose adjacent bits are unrelated, so that meaningful data analysis on a large number of categorical variables can be performed efficiently, and storage capacity is used efficiently even as the number of variables increases.

The objects of the present invention are not limited to those mentioned above, and other objects not mentioned will be clearly understood from the description below.

An information processing method according to a first aspect of the present disclosure may include: receiving first information expressed in a one-dimensional format and including a categorical variable; determining, based on a characteristic of the categorical variable included in the first information, one or more criteria used for rearranging the first information; converting the first information into a two-dimensional format based on the one or more criteria to obtain second information; and determining a characteristic of the first information by applying a convolutional neural network (CNN) to the second information converted into the two-dimensional format.
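As a non-limiting illustration, the conversion from a one-dimensional format into a two-dimensional format could be sketched as follows. The record layout `(category, day offset, count)`, the function name `to_matrix`, and the choice of category and day as the two rearrangement criteria are assumptions made for this example only and are not taken from the disclosure.

```python
def to_matrix(records, row_keys, n_cols):
    """Rearrange flat (row_key, col_index, value) records into a 2D grid."""
    row_index = {key: i for i, key in enumerate(row_keys)}
    grid = [[0] * n_cols for _ in row_keys]
    for key, col, value in records:
        # Records outside the chosen criteria are ignored in this sketch.
        if key in row_index and 0 <= col < n_cols:
            grid[row_index[key]][col] += value
    return grid

# Hypothetical one-dimensional purchase log: (category, day offset, count).
records = [
    ("fashion", 0, 1),
    ("food", 0, 2),
    ("fashion", 2, 1),
    ("food", 2, 1),
    ("electronics", 1, 1),
]
categories = ["fashion", "food", "electronics"]  # assumed row criterion
matrix = to_matrix(records, categories, n_cols=3)  # assumed column criterion: day
```

Each row of `matrix` then corresponds to one category and each column to one day, so a CNN applied to it sees values that are adjacent according to the chosen criteria rather than according to arrival order.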

In addition, the determining of the characteristic of the first information by applying the CNN may determine an attribute of the first information based on local characteristics between positions of the information included in the second information.

In addition, the categorical variable may include a nominal variable, and the first information may include a bitstream in which information on the variables represented by the nominal variable is provided sequentially.

In addition, the categorical variable may further include an ordinal variable, and the one or more criteria may include the order on which the ordinal variable is based.

In addition, the first information may include purchase information, the categorical variable may include category information about a product for which a purchase occurs, and the one or more criteria may include the category.

In addition, the first information may include purchase information, the categorical variable may include information on a commercial district or industry in which a purchase occurs, and the one or more criteria may include the commercial district or the industry.

In addition, the second information may include the first information, and the determining of the characteristic of the first information by applying the CNN may include: filtering the second information to obtain main information; and determining the characteristic of the first information based on the main information.
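As a non-limiting illustration, the filtering step could be read as the convolution stage of a CNN. The sketch below assumes this reading; the kernel values, the absence of padding, the stride of 1, and the ReLU activation are all choices made for the example and are not specified in the disclosure.

```python
def conv2d_valid(grid, kernel):
    """Slide the kernel over the grid (no padding, stride 1), summing products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(grid) - kh + 1
    out_w = len(grid[0]) - kw + 1
    return [
        [
            sum(grid[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def relu(feature_map):
    """Keep positive responses only."""
    return [[max(0, v) for v in row] for row in feature_map]

# Hypothetical second information (2D) and a hand-picked diagonal kernel.
grid = [
    [1, 0, 1],
    [2, 0, 1],
    [0, 1, 0],
]
kernel = [
    [1, 0],
    [0, 1],
]
feature_map = relu(conv2d_valid(grid, kernel))
```

In a trained CNN the kernel values would be learned rather than fixed by hand; the fixed kernel here only shows how a local neighborhood of the two-dimensional information is condensed into one response.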

In addition, the information processing method may further include: classifying the first information into a first matrix, a second matrix, and a third matrix according to the one or more criteria; obtaining an image in which the first matrix, the second matrix, and the third matrix are represented by R, G, and B, respectively; and determining the characteristic of the first information by applying the CNN to the image.
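As a non-limiting illustration, three matrices could be combined into one RGB image as sketched below. The meaning assigned to each matrix (purchase counts, click flags, purchase amounts), the per-matrix rescaling to 0..255, and the assumption of non-negative values are hypothetical choices for this example.

```python
def scale_to_byte(matrix):
    """Linearly rescale a non-negative matrix to integers in 0..255."""
    peak = max(max(row) for row in matrix)
    if peak == 0:
        return [[0 for _ in row] for row in matrix]
    return [[round(255 * v / peak) for v in row] for row in matrix]

def stack_rgb(first, second, third):
    """Combine three equally shaped matrices into rows of (R, G, B) pixels."""
    r, g, b = scale_to_byte(first), scale_to_byte(second), scale_to_byte(third)
    return [
        [(r[i][j], g[i][j], b[i][j]) for j in range(len(r[0]))]
        for i in range(len(r))
    ]

# Hypothetical matrices sharing the same row and column criteria.
purchases = [[0, 1], [5, 0]]    # assumed: purchase counts -> R
clicks = [[1, 0], [0, 1]]       # assumed: click flags     -> G
amounts = [[0, 20], [100, 0]]   # assumed: purchase totals -> B
image = stack_rgb(purchases, clicks, amounts)
```

Because all three channels share the same row and column criteria, a single pixel carries three different variables about the same cell, which is one way of compressing many variables into one image.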

In addition, the first information may further include a numeric variable, and the information processing method may further include: obtaining, from the first information, information 1-1 consisting of the categorical variable and information 1-2 consisting of the numeric variable; converting information 1-1 into a two-dimensional format based on a characteristic of the categorical variable to obtain information 2-1; converting information 1-2 into a two-dimensional format based on a characteristic of the numeric variable to obtain information 2-2; applying the CNN to each of information 2-1 and information 2-2 based on local characteristics between positions of the information included in information 2-1 and local characteristics between positions of the information included in information 2-2; and determining the characteristic of the first information using the results of applying the CNN to information 2-1 and information 2-2.

The information processing method may further include: obtaining, from information 1-1, information 1-3 consisting of the nominal variable and information 1-4 consisting of the ordinal variable; converting information 1-3 into a two-dimensional format based on a characteristic of the nominal variable to obtain information 2-3; converting information 1-4 into a two-dimensional format based on a characteristic of the ordinal variable to obtain information 2-4; applying the CNN to each of information 2-3 and information 2-4 based on local characteristics between positions of the information included in information 2-3 and local characteristics between positions of the information included in information 2-4; and determining the characteristic of the first information using the results of applying the CNN to information 2-3 and information 2-4.

An information processing apparatus according to a second aspect of the present disclosure may include: a communication unit that receives first information expressed in a one-dimensional format and including a categorical variable; and a processor that determines, based on a characteristic of the categorical variable included in the first information, one or more criteria used for rearranging the first information, converts the first information into a two-dimensional format based on the one or more criteria to obtain second information, and determines a characteristic of the first information by applying a convolutional neural network (CNN) to the second information converted into the two-dimensional format.

In addition, the processor may determine an attribute of the first information based on local characteristics between positions of the information included in the second information.

In addition, the second information may include the first information, and the processor may filter the second information to obtain main information and determine the characteristic of the first information based on the main information.

In addition, the processor may classify the first information into a first matrix, a second matrix, and a third matrix according to the one or more criteria, obtain an image in which the first matrix, the second matrix, and the third matrix are represented by R, G, and B, respectively, and determine the characteristic of the first information by applying the CNN to the image.

In addition, the first information may further include a numeric variable, and the processor may: obtain, from the first information, information 1-1 consisting of the categorical variable and information 1-2 consisting of the numeric variable; convert information 1-1 into a two-dimensional format based on a characteristic of the categorical variable to obtain information 2-1; convert information 1-2 into a two-dimensional format based on a characteristic of the numeric variable to obtain information 2-2; apply the CNN to each of information 2-1 and information 2-2 based on local characteristics between positions of the information included in information 2-1 and local characteristics between positions of the information included in information 2-2; and determine the characteristic of the first information using the results of applying the CNN to information 2-1 and information 2-2.

In addition, the processor may: obtain, from information 1-1, information 1-3 consisting of the nominal variable and information 1-4 consisting of the ordinal variable; convert information 1-3 into a two-dimensional format based on a characteristic of the nominal variable to obtain information 2-3; convert information 1-4 into a two-dimensional format based on a characteristic of the ordinal variable to obtain information 2-4; apply the CNN to each of information 2-3 and information 2-4 based on local characteristics between positions of the information included in information 2-3 and local characteristics between positions of the information included in information 2-4; and determine the characteristic of the first information using the results of applying the CNN to information 2-3 and information 2-4.

A third aspect of the present disclosure may provide a computer-readable recording medium on which a program for executing the method according to the first aspect on a computer is recorded. Alternatively, a fourth aspect of the present disclosure may provide a computer program stored in a recording medium to implement the method according to the first aspect.

According to an embodiment of the present invention, a large number of variables are compressed into and reflected in a single image, which can improve the learning efficiency of a model that determines a customer's purchase attributes and improve the decision accuracy of the model.

In addition, through image-based CNN learning, local correlations can be read even from random categorical variables that have no up, down, left, or right relationship and whose adjacent bits are therefore unrelated, so that meaningful data analysis can be performed on a large number of categorical variables.

In addition, a large number of key variables can be applied to data analysis without the user selecting and including only the variables that look good according to his or her own bias.

In addition, instead of generating a large number of variables, the information can be compressed and stored as a single matrix, so storage capacity is used efficiently even as the number of variables increases.

It should be understood that the effects of the present invention are not limited to those described above, and include all effects that can be inferred from the configuration of the invention described in the detailed description or the claims.

FIG. 1 is a block diagram illustrating an example of the configuration of an information processing apparatus according to an embodiment.
FIG. 2 is a flowchart illustrating an example of a method by which the information processing apparatus of FIG. 1 performs information processing.
FIG. 3 is a diagram illustrating an example of an operation in which the information processing apparatus of FIG. 1 converts first information expressed in a one-dimensional format into second information expressed in a two-dimensional format and determines a characteristic of the first information by applying a CNN to the second information.
FIG. 4 is a diagram illustrating an example of an operation in which the information processing apparatus of FIG. 1 evaluates the results of determining a characteristic of the first information by applying a plurality of learning models.
FIG. 5 is a diagram illustrating an example of an operation in which the information processing apparatus of FIG. 1 determines one or more criteria used for rearranging the first information.
FIG. 6 is a diagram illustrating various embodiments of an operation in which the information processing apparatus of FIG. 1 obtains second information by converting the first information into a two-dimensional format based on one or more criteria.

Hereinafter, the present invention will be described with reference to the accompanying drawings. The present invention may, however, be embodied in many different forms and is not limited to the embodiments described herein. To describe the present invention clearly, parts unrelated to the description are omitted from the drawings, and similar parts are given similar reference numerals throughout the specification.

Throughout the specification, when a part is said to be "connected" to another part, this includes not only the case where it is "directly connected" but also the case where it is "indirectly connected" with another member interposed therebetween. In addition, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise.

Hereinafter, embodiments will be described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram illustrating an example of the configuration of an information processing apparatus 100 according to an embodiment.

Referring to FIG. 1, the information processing apparatus 100 according to an embodiment may include a communication unit 110, a processor 120, and a storage unit 130.

The communication unit 110 according to an embodiment may be connected to other devices (e.g., terminals, servers) through a network and may include, for example, any type of wired or wireless communication device capable of communicating with other devices. Here, the network may be built on various wired and wireless communication networks, for example, a local area network (LAN), a metropolitan area network (MAN), or a wide area network (WAN).

The communication unit 110 according to an embodiment may receive first information expressed in a one-dimensional format. For example, the communication unit 110 may collect, from one or more terminals (e.g., merchant terminals) or servers (e.g., a payment server), first information corresponding to structured data expressed as a bitstream.

In an embodiment, the first information may include at least one of the customer's personal information (e.g., gender, age, annual income), purchase information (e.g., whether a purchase was made, purchase amount, purchased product/service, purchased product/service category, merchant, industry, product type, purchase location, type of payment method), behavior information (e.g., whether a site was accessed, whether product information was viewed, whether an inquiry call was made), and location information (e.g., customer movement history, purchase location history).

In an embodiment, the customer's behavior information may include at least one of: whether the customer clicked on a specific category (or product) (e.g., whether product/service information was provided in response to the customer's view request); the number of clicks (e.g., the number of times product/service information was provided in response to the customer's view requests); and the click order (e.g., the chronological order of the categories for which product/service information was provided in response to the customer's view requests).

Here, a category denotes a classification category for a product or service. In an embodiment, the type and level of detail of a category may be determined according to a plurality of preset levels. For example, in a category system with levels 1 to 6, defined so that a larger level value denotes a relatively lower-level category, the category corresponding to level 2 may be included in the customer's first information described above or used as one of the one or more criteria described below.

In an embodiment, the first information may include a categorical variable. Here, a categorical variable is a variable that can be classified into a number of classes or ranges sharing the same properties. For example, when the first information includes the customer's purchase information, the categorical variable included in the first information may include information on the category of the purchased product, the commercial district, or the industry.

In an embodiment, the categorical variable included in the first information may include a nominal variable, and the first information may include a bitstream in which information on the variables represented by the nominal variable is provided sequentially. Here, a nominal variable is a variable whose values cannot be ordered: the order or scale among items is meaningless, so the order of the information carries no sequential correlation. For example, when the first information includes the customer's purchase information, the nominal variable included in the first information may include information on the category of the purchased product (e.g., women's fashion, air purifiers), the commercial district (e.g., Myeongdong Street, Hongdae, City Hall), and the industry (e.g., restaurants, finance, retail).

In an embodiment, the categorical variable included in the first information may further include an ordinal variable. Here, an ordinal variable is a variable whose values have a rank or order: a finite variable for which the order or scale among items is meaningful, so the order of the information carries a sequential correlation. For example, when the first information includes the customer's personal information and purchase information, the ordinal variable included in the first information may include information on the customer's purchase grade (e.g., grades A to D) or age group (e.g., 20s, 30s).

In an embodiment, the first information may further include a numeric variable. Here, a numeric variable is a variable that is countable or continuous, so the sequential correlation in the order of the information is self-evident. For example, when the first information includes the customer's purchase information or behavior information, the numeric variable included in the first information may include information on the customer's number of purchases (e.g., 10) or number of clicks (e.g., 5) for a specific product (or category).
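As a non-limiting illustration, the three kinds of variable described above could be encoded differently, as sketched below. One-hot encoding for the nominal variable and rank encoding for the ordinal variable are assumptions made for this example; the disclosure states only that nominal information is provided as a bitstream. The vocabularies and field names are likewise hypothetical.

```python
def one_hot(value, vocabulary):
    """Nominal variable: one bit per possible value, no order implied."""
    return [1 if value == item else 0 for item in vocabulary]

def ordinal_rank(value, ordered_levels):
    """Ordinal variable: position of the value within its defined order."""
    return ordered_levels.index(value)

INDUSTRIES = ["restaurant", "finance", "retail"]  # nominal, order is arbitrary
GRADES = ["D", "C", "B", "A"]                     # ordinal, lowest to highest

# Hypothetical customer record mixing the three variable types.
record = {"industry": "finance", "grade": "B", "purchase_count": 10}
encoded = {
    "nominal": one_hot(record["industry"], INDUSTRIES),
    "ordinal": ordinal_rank(record["grade"], GRADES),
    "numeric": record["purchase_count"],  # already a count, used as-is
}
```

Reordering `INDUSTRIES` changes which bit is set but carries no meaning, whereas reordering `GRADES` would change the meaning of the rank. This difference is why a rearrangement criterion matters more for nominal variables.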

The processor 120 according to an embodiment may store and manage the received first information in the storage unit 130, and may, through statistical analysis of the first information, generate statistical information on the first information for a preset period (e.g., the most recent 14 days) and include it in the first information.

The processor 120 according to an embodiment may determine one or more criteria used for rearranging the first information based on a characteristic of the categorical variable included in the first information. In an embodiment, the one or more criteria may include information on one or more of the category of the purchased product, the commercial district, and the business type, but are not limited thereto. The criteria may further include the date on which a product (or category) was purchased or viewed, the purchase amount, the number of purchases, the purchase order, whether a purchase occurred, the number of clicks, whether a click occurred, and the like.

For example, when the first information includes purchase information and the categorical variable included in the first information includes category information on the purchased product, the processor 120 may check whether the characteristic of that categorical variable (e.g., category information on the purchased product) satisfies a preset condition (e.g., that the information includes a customer ID, category information, and the customer's number of purchases for the category). If the condition is satisfied, the processor 120 may decide to use the plurality of criteria corresponding to that condition (e.g., customer ID, category of the purchased product, and the customer's number of purchases for the category) for rearranging the first information.

The processor 120 according to an embodiment may obtain the second information by converting the first information into a two-dimensional format based on the determined one or more criteria. For example, the processor 120 may apply each of the determined criteria to an axis (e.g., X-axis, Y-axis, Z-axis) or to the values (e.g., the value of each matrix element) of a matrix of a preset size and arrange the first information according to each criterion, thereby generating, as the second information, two-dimensional matrix information indicating the number of purchases per customer per category.
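The rearrangement described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the first information arrives as a flat list of `(customer_id, category)` purchase records, and the function name `to_count_matrix` and the sample values are hypothetical.

```python
from collections import Counter


def to_count_matrix(records, customers, categories):
    """Arrange 1-D purchase records into a customers x categories count matrix.

    Rows follow `customers` (one axis criterion), columns follow `categories`
    (the other axis criterion), and each cell holds the purchase count
    (the value criterion).
    """
    counts = Counter(records)  # (customer_id, category) -> number of purchases
    return [[counts[(cust, cat)] for cat in categories] for cust in customers]


# Hypothetical sample data: one record per purchase, in arrival order.
records = [
    ("c1", "fashion"),
    ("c2", "air_purifier"),
    ("c1", "fashion"),
    ("c1", "air_purifier"),
]
customers = ["c1", "c2"]
categories = ["fashion", "air_purifier"]

matrix = to_count_matrix(records, customers, categories)
print(matrix)
```

Fixing the row and column orders up front keeps the matrix at a preset size, so every input converted this way has the same shape.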

In an embodiment, the second information may include the first information and may be matrix information in a two-dimensional format or an image in a two-dimensional format. For example, the processor 120 may rearrange the first information based on the one or more criteria described above to generate two-dimensional matrix information, and may apply the obtained matrix information to a pre-stored image algorithm to perform image conversion by assigning, to the image pixel corresponding to each matrix position, an image output value (e.g., an integer from 0 to 255) corresponding to the matrix value. An image expressed through at least one of the R, G, and B channels may thereby be generated as the second information. The second information generated in this way may contain the first information in a format (e.g., an image format) different from the format of the first information (e.g., a structured data format).
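One way to picture the matrix-to-pixel assignment is a linear rescaling into the 0 to 255 range. The source does not specify the image algorithm, so the scaling rule, the function name `to_pixels`, and the sample matrix below are assumptions for illustration only.

```python
def to_pixels(matrix):
    """Map non-negative matrix values to integer pixel intensities in 0..255.

    Assumption: values are scaled linearly so that the largest value in the
    matrix becomes 255. An all-zero matrix maps to an all-zero image.
    """
    peak = max((v for row in matrix for v in row), default=0)
    if peak <= 0:
        return [[0 for _ in row] for row in matrix]
    return [[min(255, round(255 * v / peak)) for v in row] for row in matrix]


# Hypothetical purchase-count matrix (rows: customers, columns: categories).
counts = [[0, 1], [2, 5]]
pixels = to_pixels(counts)
print(pixels)
```

Each pixel keeps the position of its matrix cell, so the image carries the same information as the matrix in a different format.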

In an embodiment, when the first information includes a categorical variable, the information processing apparatus 100 may convert the first information into second information in a two-dimensional format using one or more criteria that include one or more of the category of the purchased product, the commercial district, and the business type. For example, when the first information includes each customer's personal information (e.g., customer ID), category information on products, and purchase information (e.g., whether a purchase occurred), the information processing apparatus 100 may apply the customer personal information criterion to the first axis of the matrix (e.g., X-axis), the category information criterion to the second axis (e.g., Y-axis), and the purchase-or-not criterion to each matrix value. Matrix information may thus be obtained by sequentially arranging in the matrix whether each customer purchased in each category, and the second information may be generated by converting the matrix into an image expressed through one of the RGB channels. The second information generated in this way may be used in an information characteristic determination model that analyzes which categories of products each customer mainly purchases in order to predict each customer's likelihood of purchase by category.

In another embodiment, when the categorical variable included in the first information further includes an ordinal variable, the information processing apparatus 100 may convert the first information into second information in a two-dimensional format using one or more criteria that include the order on which the ordinal variable is based. For example, when the first information includes, as categorical variables, each customer's personal information (e.g., customer ID) and category information on products, and includes, as an ordinal variable, the time at which the customer clicked on a product, the information processing apparatus 100 may apply the product category criterion to the first axis of the matrix (e.g., X-axis), the criterion of the order underlying the click times to the second axis (e.g., Y-axis), and the customer personal information (e.g., customer ID) criterion to each matrix value. Matrix information may thus be obtained by sequentially arranging in the matrix the customer personal information (e.g., customer ID) according to click order per category, and the second information may be generated by converting the matrix into an image expressed through one of the RGB channels. The second information generated in this way may be used in an information characteristic determination model that analyzes which categories of products each customer clicks first, or which customers mainly click in each category, in order to predict each customer's likelihood of viewing by category.
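The click-order arrangement can be sketched as below. This is only an illustration under stated assumptions: click events are given as `(timestamp, customer_id, category)` tuples with integer customer IDs, empty cells are padded with 0, and the names `click_order_matrix` and `depth` are hypothetical.

```python
def click_order_matrix(clicks, categories, depth):
    """Build a click-rank x category matrix whose cells hold customer IDs.

    Columns follow `categories` (first axis), rows follow click order within
    each category (second axis), and each cell holds the ID of the customer
    who made that click. Cells with no click are padded with 0 (assumption).
    """
    columns = {cat: [] for cat in categories}
    for _, customer_id, category in sorted(clicks, key=lambda c: c[0]):
        if category in columns and len(columns[category]) < depth:
            columns[category].append(customer_id)
    return [
        [columns[cat][rank] if rank < len(columns[cat]) else 0 for cat in categories]
        for rank in range(depth)
    ]


# Hypothetical click log: (timestamp, customer_id, category).
clicks = [(3, 7, "fashion"), (1, 9, "fashion"), (2, 7, "air_purifier")]
order_matrix = click_order_matrix(clicks, ["fashion", "air_purifier"], depth=2)
print(order_matrix)
```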

In yet another embodiment, when the first information includes purchase information and the categorical variable included in the first information includes category information on the purchased product, the information processing apparatus 100 may convert the first information into second information in a two-dimensional format using one or more criteria that include the category. For example, when the first information includes each customer's number of purchases for a category and includes, as a categorical variable, the category information associated with those purchases, the information processing apparatus 100 may apply the category information criterion to the first axis of the matrix (e.g., X-axis), the customer personal information (e.g., customer ID) criterion to the second axis (e.g., Y-axis), and the purchase count criterion to each matrix value. Matrix information may thus be obtained by sequentially arranging in the matrix the number of purchases per category per customer, and the second information may be generated by converting the matrix into an image expressed through one of the RGB channels. The second information generated in this way may be used in an information characteristic determination model that analyzes which categories of products a customer purchases more often in order to predict each customer's likelihood of purchase by category.

In yet another embodiment, when the first information includes purchase information and the categorical variable included in the first information includes information on the commercial district or business type in which the purchase occurs, the information processing apparatus 100 may convert the first information into second information in a two-dimensional format using one or more criteria that include the commercial district or business type. For example, when the first information includes each customer's number of purchases for a category and includes, as categorical variables, information on the commercial district and business type associated with those purchases, the information processing apparatus 100 may apply the commercial district criterion to the first axis of the matrix (e.g., X-axis), the business type criterion to the second axis (e.g., Y-axis), and the purchase count criterion to each matrix value. Matrix information may thus be obtained by sequentially arranging in the matrix the customer's number of purchases per commercial district per business type, and the second information may be generated by converting the matrix into an image expressed through one of the RGB channels. The second information generated in this way may be used in an information characteristic determination model that analyzes in which commercial districts a customer mainly purchases products of which business types in order to predict the likelihood of purchase by commercial district and business type.

Various embodiments of the operation by which the information processing apparatus 100 according to an embodiment converts the first information into second information in a two-dimensional format based on one or more criteria are described further below with reference to FIG. 6.

The processor 120 according to an embodiment may determine a characteristic of the first information by applying a convolutional neural network (CNN) to the second information converted into the two-dimensional format. In an embodiment, the characteristic of the first information may include a purchase attribute of a customer, and the purchase attribute may include the customer's probability of purchasing a specific product over time. For example, it may include a purchase probability that quantifies whether the customer will actually purchase a specific product in a specific period or time slot.

Here, a CNN is a type of deep neural network that extracts features of variables using convolutional layers and then produces a desired output. In an embodiment, in contrast to a deep neural network (DNN) that does not use convolutional layers, the CNN can learn local characteristics based on combinations of adjacent variables and, depending on the design of the convolutional layers, can learn information in time order. Compared with recurrent neural networks (RNNs), it also uses less memory and trains and infers faster.
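The way a convolutional layer combines adjacent values can be illustrated with a single-filter sliding-window operation in plain Python. This sketch is not the patent's network: the filter weights and input are hypothetical, there is no padding, stride, bias, or activation, and the operation shown is the cross-correlation that most deep learning frameworks implement under the name "convolution".

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` and sum the elementwise products.

    Uses 'valid' mode (no padding), so the output shrinks by kernel size - 1
    along each axis. Each output cell depends only on a small neighborhood of
    adjacent input cells, which is what makes the extracted feature local.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Hypothetical 3x3 input (e.g., a small purchase-count matrix) and 2x2 filter.
image = [[1, 0, 2], [0, 1, 0], [3, 0, 1]]
kernel = [[1, 0], [0, 1]]  # responds to values lying on a diagonal
feature_map = conv2d_valid(image, kernel)
print(feature_map)
```

In a trained CNN the filter weights are learned rather than fixed, and many such filters are stacked across layers.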

For example, when an image expressed in RGB in a two-dimensional format is obtained by converting the first information, the processor 120 may apply the obtained image to a pre-stored CNN algorithm to generate an information characteristic determination model, and may input specific category information into the generated model to determine the customer's probability over time of purchasing a specific product or category, which is included in the characteristic of the first information. In addition, the processor 120 may improve the performance of the information characteristic determination model through CNN-based deep learning by using a plurality of images converted from the collected information as a training dataset.

The processor 120 according to an embodiment may determine an attribute of the first information based on local characteristics between positions of the information included in the second information. For example, the processor 120 may perform learning by applying the CNN to the second information converted into a two-dimensional image, so that the CNN analyzes combinations of adjacent variables reflected in the image and reads out local correlations.

Here, the information on the categorical variables included in the second information may be random, where "random" may mean that adjacent bits are unrelated because there is no meaningful up, down, left, or right relationship among them. That is, numerous variables may be compressed into and reflected in the second information, and even though the numerous categorical variables included in the second information are random, the processor 120 can make meaningful use of them by analyzing combinations of pixels adjacent above, below, left, and right in the image representing the second information, thereby analyzing the local characteristics between positions.

Accordingly, the processor 120 can effectively extract, through the convolutional layers of the CNN, the characteristics of the information on the categorical variables included in the image, and by performing CNN-based deep learning on the image it can reflect numerous variables efficiently and greatly improve the performance of the model that determines the customer's purchase attribute.

The processor 120 according to an embodiment may obtain key information by filtering the second information and may determine the characteristic of the first information based on the obtained key information. For example, the processor 120 may extract, through the CNN, information on a plurality of variables corresponding to nominal, ordinal, or numeric variables from the second information; may filter the extracted information to obtain key information that includes information on one or more preset key variables (e.g., the category of the product the customer purchases, customer ID, purchase amount, purchase time); and may input the information on the filtered key variables into the information characteristic determination model to determine the customer's purchase attribute.

The processor 120 according to an embodiment may classify the first information into a first matrix, a second matrix, and a third matrix according to the determined one or more criteria, may obtain an image in which the first, second, and third matrices are expressed as R, G, and B, respectively, and may determine the characteristic of the first information by applying the CNN to the obtained image.

For example, following the example above, the processor 120 may generate matrices in which information on whether each customer purchased in each category is sequentially arranged, aggregating the categories purchased by the customer over equal periods (e.g., weekly). It may thereby generate a first matrix reflecting the categories purchased in the most recent first week, a second matrix reflecting the categories purchased in the most recent second week, and a third matrix reflecting the categories purchased in the most recent third week. The processor 120 may then convert the generated first, second, and third matrices through the R channel, G channel, and B channel, respectively, to generate an image expressed in RGB.
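The weekly stacking can be sketched as follows. This is an illustrative assumption rather than the patent's procedure: each weekly matrix holds 0/1 purchase flags of identical shape, a flag of 1 is mapped to full channel intensity 255, and the function name `stack_weeks_as_rgb` is hypothetical.

```python
def stack_weeks_as_rgb(week1, week2, week3):
    """Combine three same-shaped 0/1 matrices into one image of (R, G, B) pixels.

    week1 -> R channel, week2 -> G channel, week3 -> B channel.
    Assumption: a purchase flag of 1 becomes intensity 255, and 0 stays 0.
    """
    if not (len(week1) == len(week2) == len(week3)):
        raise ValueError("weekly matrices must have the same number of rows")
    image = []
    for r_row, g_row, b_row in zip(week1, week2, week3):
        if not (len(r_row) == len(g_row) == len(b_row)):
            raise ValueError("weekly matrices must have the same number of columns")
        image.append([(255 * r, 255 * g, 255 * b) for r, g, b in zip(r_row, g_row, b_row)])
    return image


# Hypothetical flags: rows are customers, columns are categories.
week1 = [[1, 0], [0, 1]]  # most recent week
week2 = [[0, 0], [1, 1]]  # second most recent week
week3 = [[1, 1], [0, 0]]  # third most recent week

rgb_image = stack_weeks_as_rgb(week1, week2, week3)
print(rgb_image)
```

Placing the weeks in separate channels keeps the customer and category axes aligned across periods, so a single pixel summarizes one customer-category pair over three weeks.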

When the first information further includes a numeric variable, the processor 120 according to an embodiment may obtain, from the first information, 1-1 information composed of categorical variables and 1-2 information composed of numeric variables. The processor 120 may convert the 1-1 information into a two-dimensional format based on the characteristic of the categorical variables to obtain 2-1 information, and may convert the 1-2 information into a two-dimensional format based on the characteristic of the numeric variables to obtain 2-2 information. The processor 120 may then apply the CNN to each of the 2-1 information and the 2-2 information based on the local characteristics between positions of the information included in each, and may determine the characteristic of the first information using the results of applying the CNN to the 2-1 information and the 2-2 information.

For example, the processor 120 may classify the first information into 1-1 information composed of the customer's ID, category information on products, and whether the customer purchased, and 1-2 information composed of the customer's number of purchases for a category and the purchase time. It may generate a 2-1 image indicating whether each customer purchased in each category and a 2-2 image indicating the number of purchases per category over time, and may apply the 2-1 and 2-2 images to the CNN to analyze the local characteristics of each image and determine the customer's purchase probability, which indicates whether the customer will purchase a product of a specific category in a specific time slot.
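The split into categorical and numeric parts might look like the sketch below. The field names, the `NUMERIC_FIELDS` set, and the function `split_first_information` are hypothetical; the source names only the kinds of information involved, not a record layout.

```python
# Hypothetical field names standing in for the numeric variables of the example
# (number of purchases and purchase time).
NUMERIC_FIELDS = {"purchase_count", "purchase_hour"}


def split_first_information(record):
    """Split one record into its categorical part (1-1) and numeric part (1-2)."""
    categorical = {k: v for k, v in record.items() if k not in NUMERIC_FIELDS}
    numeric = {k: v for k, v in record.items() if k in NUMERIC_FIELDS}
    return categorical, numeric


# Hypothetical record combining both kinds of variables.
record = {
    "customer_id": "c1",
    "category": "fashion",
    "purchased": True,
    "purchase_count": 3,
    "purchase_hour": 14,
}
info_1_1, info_1_2 = split_first_information(record)
print(info_1_1)
print(info_1_2)
```

Each part would then be arranged into its own two-dimensional matrix or image before a CNN is applied to it.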

The processor 120 according to an embodiment may obtain, from the 1-1 information, 1-3 information composed of nominal variables and 1-4 information composed of ordinal variables. It may convert the 1-3 information into a two-dimensional format based on the characteristic of the nominal variables to obtain 2-3 information, and may convert the 1-4 information into a two-dimensional format based on the characteristic of the ordinal variables to obtain 2-4 information. It may then apply the CNN to each of the 2-3 information and the 2-4 information based on the local characteristics between positions of the information included in each, and may determine the characteristic of the first information using the results of applying the CNN to the 2-3 information and the 2-4 information.

For example, the processor 120 may classify the 1-1 information into 1-3 information composed of category information on products and whether the customer purchased, and 1-4 information composed of the customer's purchase grade (e.g., grades 1 to 6) and time slot (e.g., morning, afternoon, evening, night). It may generate a 2-3 image indicating whether each customer purchased in each category and a 2-4 image indicating whether a purchase occurred per time slot per purchase grade, and may apply the 2-3 and 2-4 images to the CNN to analyze the local characteristics of each image and determine the customer's purchase probability, which indicates whether a customer of a specific purchase grade will purchase a product of a specific category in a specific time slot.

The processor 120 according to an embodiment may obtain a first local characteristic indicating the relationship between positions of the information included in the 2-2 information (e.g., numeric variables), a second local characteristic indicating the relationship between positions of the information included in the 2-3 information (e.g., nominal variables), and a third local characteristic indicating the relationship between positions of the information included in the 2-4 information (e.g., ordinal variables), and may determine the characteristic of the first information by applying mutually different weights to the first, second, and third local characteristics.

In an embodiment, the processor 120 may assign values in descending order to the first weight, which is applied to the first local characteristic; the third weight, which is applied to the third local characteristic; and the second weight, which is applied to the second local characteristic (e.g., first weight > third weight > second weight).

Accordingly, by assigning the highest weight to the numeric variables and the lowest weight to the nominal variables, the processor 120 may determine the characteristic of the first information in a way that reflects more of the correlation between positions in the numeric variables. In general, the local characteristics of numeric variables are more meaningful than those of ordinal variables, and the local characteristics of ordinal variables are more meaningful than those of nominal variables. The processor 120 can therefore use the weights to set the proportion in which meaningful information is reflected, and can predict the customer's purchase probability with higher accuracy.

In an embodiment, the second weight is determined according to the number of categories included in the categorical variable, and when the number of categories is less than or equal to 2, values may be assigned in descending order to the first weight, the second weight, and the third weight (e.g., first weight > second weight > third weight). For example, if a nominal variable has only one or two categories, the relationship among values within the same category may be more meaningful than the relationship in an ordinal variable, so the processor 120 may adjust the proportion in which meaningful information is reflected based on the number of categories.
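The two weight orderings can be sketched as a simple selection rule followed by a weighted sum. The numeric weight values (0.5, 0.3, 0.2), the weighted-sum combination, and the names `choose_weights` and `combine` are assumptions; the source states only the orderings and the category-count threshold of 2.

```python
def choose_weights(num_nominal_categories, high=0.5, mid=0.3, low=0.2):
    """Pick weights for the numeric, ordinal, and nominal local characteristics.

    Default ordering: numeric (first) > ordinal (third) > nominal (second).
    With two or fewer nominal categories: numeric > nominal > ordinal.
    The concrete values are hypothetical; only the ordering follows the text.
    """
    if num_nominal_categories <= 2:
        return {"numeric": high, "nominal": mid, "ordinal": low}
    return {"numeric": high, "ordinal": mid, "nominal": low}


def combine(scores, weights):
    """Weighted sum of per-branch scores (an assumed way to apply the weights)."""
    return sum(weights[name] * scores[name] for name in weights)


many = choose_weights(num_nominal_categories=10)
few = choose_weights(num_nominal_categories=2)

# Hypothetical per-branch scores, e.g., outputs of the CNN applied to each image.
scores = {"numeric": 1.0, "ordinal": 0.0, "nominal": 0.0}
print(many, few, combine(scores, many))
```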

The processor 120 according to an embodiment may be implemented as a central processing unit (CPU) that controls the overall operation of the information processing apparatus 100, and may be electrically connected to the communication unit 110 and the storage unit 130 to control the flow of data between them.

The storage unit 130 according to an embodiment may store the first information and the second information, as well as other data needed for the overall operation of the information processing apparatus 100. In an embodiment, the storage unit 130 may include an auxiliary storage device implemented with non-volatile memory such as a solid state disk (SSD) or hard disk drive (HDD), or a main memory implemented with volatile memory such as random access memory (RAM). In an embodiment, the storage unit 130 may also be implemented as a database, or as a cloud or separate storage server, to provide the data and storage space needed by the information processing apparatus 100 over a wired or wireless communication network.

Those of ordinary skill in the art will also understand that general-purpose components other than those shown in FIG. 1 may be further included in the information processing apparatus 100. For example, the information processing apparatus 100 may include an algorithm for implementing a CNN-based learning model, and may further include an input/output interface for receiving user input or providing output, a display for visualizing output information, and the like. According to another embodiment, some of the components shown in FIG. 1 may be omitted. The information processing apparatus 100 according to an embodiment may also be implemented as one or more servers.

FIG. 2 is a flowchart illustrating an example of a method by which the information processing apparatus 100 of FIG. 1 performs information processing.

In step S210, the information processing apparatus 100 according to an embodiment may receive first information that is expressed in a one-dimensional format and includes a categorical variable. In an embodiment, the first information may include at least one of a customer's personal information, purchase information, behavior information, and location information, and may include at least one of a nominal variable, an ordinal variable, and a numeric variable.

In step S220, the information processing apparatus 100 according to an embodiment may determine one or more criteria used for rearranging the first information, based on a characteristic of the categorical variable included in the first information. In an embodiment, the one or more criteria may include at least one of category information on the product purchased and information on the commercial district or industry.

In step S230, the information processing apparatus 100 according to an embodiment may obtain second information by converting the first information into a two-dimensional format based on the one or more criteria. In an embodiment, the categorical variable may include a nominal variable, and the first information may include a bitstream in which information on the variables represented by the nominal variable is provided sequentially. The categorical variable may further include an ordinal variable, in which case the one or more criteria may include the order on which the ordinal variable is based. In addition, the first information may include purchase information, the categorical variable may include category information on the product purchased, and the one or more criteria may include the category. Likewise, the first information may include purchase information, the categorical variable may include information on the commercial district or industry in which the purchase occurs, and the one or more criteria may include the commercial district or industry.

In step S240, the information processing apparatus 100 according to an embodiment may determine a characteristic of the first information by applying a CNN to the second information converted into the two-dimensional format. In an embodiment, the information processing apparatus 100 may determine an attribute of the first information based on local characteristics between positions of the information included in the second information. The information processing apparatus 100 may also filter the second information to obtain key information and determine the characteristic of the first information based on the key information.
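The local characteristics mentioned in step S240 are what a convolutional layer extracts when it slides a small kernel over the two-dimensional input. The following is a minimal sketch of that single operation in plain Python, not the model used in the embodiment; the function name `conv2d_valid`, the sample matrix, and the kernel values are illustrative assumptions.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the
    sum of element-wise products at each position, as a CNN layer does."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Hypothetical 3x3 "second information" matrix and a 2x2 diagonal kernel.
second_info = [
    [1, 0, 2],
    [0, 1, 0],
    [3, 0, 1],
]
diagonal_kernel = [
    [1, 0],
    [0, 1],
]
feature_map = conv2d_valid(second_info, diagonal_kernel)
```

Each output cell depends only on a 2x2 neighborhood of the input, which is why the placement of variables in the matrix determines which relationships the network can pick up locally.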

According to an embodiment of the present invention, even if a user does not hand-pick favorable variables according to his or her own bias, numerous key variables can be compressed into an image and reflected in CNN-based data analysis, so the approach remains efficient in terms of storage capacity even as the number of variables increases.

According to an embodiment of the present invention, categorical variables expressed in a one-dimensional format are converted into a two-dimensional image and the image is learned using a CNN, so that numerous variables are compressed into a single image and learning efficiency can be improved.

FIG. 3 is a diagram illustrating an example of an operation in which the information processing apparatus 100 of FIG. 1 converts first information expressed in a one-dimensional format into second information expressed in a two-dimensional format and applies a CNN to the second information to determine a characteristic of the first information.

Referring to FIG. 3, as in the example shown at reference numerals 310 to 320, the information processing apparatus 100 according to an embodiment may receive first information (e.g., see reference numerals 310 to 320) that is expressed in a one-dimensional format and includes a categorical variable from a server (e.g., a customer information management server) that manages customers' personal information and purchase information (e.g., see reference numeral 310).

In addition, as in the example shown at reference numeral 330, the information processing apparatus 100 may determine one or more criteria used for rearranging the first information based on a characteristic of the categorical variable included in the first information, and may obtain the second information by converting the first information into matrix information in a two-dimensional format (e.g., see reference numeral 330) based on the determined criteria.

In addition, as in the example shown at reference numeral 340, the information processing apparatus 100 may apply a CNN to the second information converted into matrix information in a two-dimensional format (e.g., see reference numeral 340), and may generate an information characteristic determination model through CNN-based deep learning and improve it through training.

In addition, as in the example shown at reference numeral 340, the information processing apparatus 100 may determine an attribute of the first information based on local characteristics between positions of the information included in the second information; for example, it may determine a customer's purchase probability for a specific product (e.g., see reference numeral 350).

FIG. 4 is a diagram illustrating an example of an operation in which the information processing apparatus 100 of FIG. 1 applies a plurality of learning models and evaluates the results of determining a characteristic of the first information.

Referring to FIG. 4, the information processing apparatus 100 may apply customers' purchase information to a plurality of learning models, determine a predicted value of the customer's purchase probability for each category using each learning model whose performance has been improved through training, and calculate a performance indicator for each learning model by comparing the predicted values with the actual values.

In an embodiment, the plurality of learning models may include at least one of RF, LightGBM, MLP, DNN, a one-dimensional CNN, and a two-dimensional CNN. For example, when a learning model is CNN-based, the information processing apparatus 100 may convert the first information expressed in a one-dimensional format into second information expressed as a two-dimensional matrix or image according to the method described above, and apply the CNN to the converted second information to determine the predicted value of the customer's purchase probability for each category. When a learning model is based on a learning algorithm other than a CNN, the information processing apparatus 100 may apply that learning algorithm to the first information expressed in the one-dimensional format to determine the predicted value of the customer's purchase probability for each category.

In an embodiment, the information processing apparatus 100 may calculate a first performance indicator based on the MSE (Mean Squared Error). Here, the MSE is the mean of the squared differences between the actual values and the predicted values; a smaller value indicates that the actual values were predicted more accurately.
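The MSE definition above can be sketched directly in plain Python. The function name and the sample values below are illustrative assumptions and are not taken from the test results of FIG. 4.

```python
def mean_squared_error(actual, predicted):
    """Mean of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical purchase outcomes (1 = purchased) and predicted probabilities.
actual = [1.0, 0.0, 1.0, 0.0]
predicted = [0.5, 0.5, 1.0, 0.0]
mse = mean_squared_error(actual, predicted)  # (0.25 + 0.25 + 0 + 0) / 4
```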

In an embodiment, the information processing apparatus 100 may calculate a lift, which serves as a second performance indicator, according to Equation 1 below. The lift is the ratio indicating how much more often the top customers ranked by each model's predicted purchase score actually purchased products in the corresponding category than randomly selected customers did; a larger value indicates that the actual values were predicted more accurately. By evaluating whether customers with a high predicted number of products actually made purchases, the information processing apparatus 100 can calculate a more precise performance indicator.

[Equation 1]

(Here, N_T denotes the total number of customers, N_C denotes the number of customers who purchased a product in the corresponding category, N_X denotes the number of customers in the top X% of purchase scores predicted by the model, and N_XC denotes the number of those top customers who actually purchased a product in the corresponding category; X may be set by the user.)
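The formula of Equation 1 is not reproduced in this text. Given the symbol definitions above and the description of lift as a ratio relative to randomly selected customers, one consistent reading is the standard lift, (N_XC / N_X) / (N_C / N_T). The sketch below assumes that form; the function name, the argument layout, and the rounding used to size the top X% group are illustrative assumptions.

```python
def lift(scores, purchased, top_percent):
    """Lift of the top `top_percent`% customers by predicted score.

    Assumed form: (N_XC / N_X) / (N_C / N_T).
    scores    -- predicted purchase scores, one per customer
    purchased -- 1 if the customer actually bought in the category, else 0
    """
    n_t = len(scores)                               # N_T: all customers
    n_c = sum(purchased)                            # N_C: actual buyers
    if n_t == 0 or n_c == 0:
        raise ValueError("need at least one customer and one buyer")
    n_x = max(1, round(n_t * top_percent / 100))    # N_X: size of top X% group
    ranked = sorted(range(n_t), key=lambda i: scores[i], reverse=True)
    n_xc = sum(purchased[i] for i in ranked[:n_x])  # N_XC: buyers in top group
    return (n_xc / n_x) / (n_c / n_t)


# Hypothetical data: 10 customers, 3 actual buyers, top 20% = 2 customers.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
purchased = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
example_lift = lift(scores, purchased, top_percent=20)
```

Under this reading, a lift of 1 means the model's top group buys at the same rate as the overall population, and larger values mean the ranking concentrates actual buyers at the top.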

As in the example test results shown in FIG. 4, when first information in a one-dimensional format including a categorical variable is converted into second information in a two-dimensional format and applied to a CNN according to an embodiment of the present invention, a high lift value and a low MSE value are obtained, confirming high performance.

That is, according to an embodiment of the present invention, numerous variables including categorical variables are efficiently compressed into a two-dimensional format and reflected in a CNN, which is efficient in terms of storage capacity and can also substantially improve the performance of the learning model. Moreover, local correlations can be read even from random categorical variables whose adjacent bits are unrelated because they have no spatial (up, down, left, right) relationship, so meaningful data analysis can be performed efficiently on numerous categorical variables.

FIG. 5 is a diagram illustrating an example of an operation in which the information processing apparatus 100 of FIG. 1 determines one or more criteria used for rearranging the first information.

Referring to FIG. 5, when the types of information included in the first information satisfy any one of a plurality of pre-stored conditions, the information processing apparatus 100 according to an embodiment may decide to rearrange the first information using the conversion method corresponding to that condition. Here, the plurality of conditions refers to the set of cases arising from combinations of the types of information included in the first information.

In an embodiment, each conversion method includes one or more criteria for rearranging the first information and, as shown in FIG. 5 for example, may include information on at least one of whether time information is presented, how the customer's characteristic information (demographics data) is presented, how the overall information is presented, the neural network, and the priority.

In an embodiment, the time-information setting indicates how time information is presented when the first information is rearranged into a two-dimensional format (e.g., by view order, by date, high density, low density); the customer characteristic information presentation method indicates how the first information is presented (e.g., reflected separately, mixed in, or placed in a separate channel among the RGB channels); the overall information presentation method indicates how the overall information is arranged in the matrix (e.g., ordered presentation or random presentation); the neural network indicates the CNN structure to be used for deep learning (e.g., one-dimensional or two-dimensional); and the priority indicates the weight assigned to a conversion method that is to be selected preferentially.

For example, when the first information includes the customer's personal information and, among the customer's behavior information, the customer's number of clicks per category, the processor 120 may follow a first conversion method (e.g., BCL, see FIG. 3a) that sets time-information presentation to "not presented", the customer characteristic information presentation method to "mixed in", the overall information presentation method to "ordered presentation", and the neural network to "two-dimensional". Under this method, the customer ID and the customer's number of clicks per category over the last 14 days are arranged sequentially in a two-dimensional matrix. That is, without using time as a criterion, the criteria are chosen so that the customer ID and the per-category click counts are mixed within a single matrix while each is presented in sequence, and the first information is rearranged into a two-dimensional format accordingly.

When the types of information included in the first information satisfy two or more of the plurality of pre-stored conditions, the information processing apparatus 100 according to an embodiment may preferentially select the conversion method with the higher preset priority.
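The condition matching and priority rule described above can be sketched as a small lookup. The table contents, the information-type labels, and the convention that a larger number means a higher priority are all hypothetical; the actual conditions are those of FIG. 5, which this text does not reproduce.

```python
# Hypothetical condition table: each conversion method lists the information
# types it requires and a preset priority (assumed: larger = preferred).
CONVERSION_METHODS = [
    {"name": "BCL", "requires": {"personal", "clicks_per_category"}, "priority": 1},
    {"name": "ACL", "requires": {"personal", "click_order"}, "priority": 2},
]


def select_method(info_types, methods=CONVERSION_METHODS):
    """Return the name of the highest-priority method whose required
    information types are all present, or None if no condition is met."""
    available = set(info_types)
    matches = [m for m in methods if m["requires"] <= available]
    if not matches:
        return None
    return max(matches, key=lambda m: m["priority"])["name"]


only_counts = select_method({"personal", "clicks_per_category"})
both_match = select_method({"personal", "clicks_per_category", "click_order"})
no_match = select_method({"location"})
```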

In an embodiment, the names of the conversion methods shown in FIG. 2 (e.g., ACL, BCR) may be managed by concatenating letters: A if time information is presented and B if it is not; C if one of the RGB channels is used and D if three are used; and L if the order in which information is presented carries correlation information and R if it does not. This allows an administrator to readily identify the characteristics of each conversion method.
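The three-letter naming rule can be written as a short helper. This is a sketch of the rule as stated in the paragraph above; the function and parameter names are illustrative assumptions.

```python
def method_name(time_presented, channel_count, order_correlated):
    """Build a conversion-method name from the three properties in the text:
    A/B for time information, C/D for 1 or 3 RGB channels, L/R for order."""
    if channel_count not in (1, 3):
        raise ValueError("the rule only defines names for 1 or 3 channels")
    time_letter = "A" if time_presented else "B"
    channel_letter = "C" if channel_count == 1 else "D"
    order_letter = "L" if order_correlated else "R"
    return time_letter + channel_letter + order_letter


first_method = method_name(False, 1, True)    # no time, 1 channel, ordered
second_method = method_name(False, 1, False)  # no time, 1 channel, random
third_method = method_name(True, 1, True)     # time, 1 channel, ordered
```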

FIG. 6 is a diagram illustrating various embodiments of an operation in which the information processing apparatus 100 of FIG. 1 obtains second information by converting the first information into a two-dimensional format based on one or more criteria.

Referring to FIG. 6A, the information processing apparatus 100 according to a first embodiment may convert the first information into a two-dimensional format based on the customer's personal information and the customer's number of clicks on each category during a preset period.

In the first embodiment, the first information includes the customer's personal information (e.g., customer ID) and the customer's number of clicks on each category during a preset period (e.g., the last 14 days), and the information processing apparatus 100 may obtain, from the first information, matrix information in which the customer's personal information and the per-category click counts for the preset period are placed in distinct regions.

For example, the information processing apparatus 100 may perform the conversion according to a first conversion method (e.g., BCL), which enters each customer's personal information, usage information, and per-category preference information included in the first information as points at specific positions in a preset order, producing a scatter-plot-like representation. Specifically, each customer's personal information (e.g., customer ID) and site usage information are dummy-encoded, and the elements of a vector obtained by summing the second-level category information of recently viewed products over the 14 days immediately preceding the last purchase are placed in order in a matrix of a preset size, yielding matrix information in a two-dimensional format (see reference numeral 601).

For example, the information processing apparatus 100 may fill the matrix in row-column order, first arranging a preset number (e.g., the number of customers) of per-customer personal information entries (e.g., customer ID) and then arranging a preset number (e.g., the number of second-level categories) of per-category click counts.
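The ordered layout of the first embodiment amounts to concatenating the dummy-encoded customer entries and the click counts, then filling a fixed-size matrix row by row. The sketch below assumes zero padding for unused cells; the function name, the matrix size, and the sample values are illustrative.

```python
def to_matrix(values, n_rows, n_cols, pad=0):
    """Fill an n_rows x n_cols matrix row by row with `values`,
    padding any remaining cells with `pad` (an assumption of this sketch)."""
    size = n_rows * n_cols
    if len(values) > size:
        raise ValueError("too many values for the requested matrix size")
    padded = list(values) + [pad] * (size - len(values))
    return [padded[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]


# Hypothetical input: a one-hot customer ID over 3 customers, followed by
# click counts for 4 second-level categories.
customer_dummies = [0, 1, 0]
clicks_per_category = [4, 0, 2, 7]
bcl_matrix = to_matrix(customer_dummies + clicks_per_category, 3, 3)
```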

Referring to FIG. 6B, the information processing apparatus 100 according to a second embodiment may convert the first information into a two-dimensional format based on the customer's personal information and the customer's number of clicks on each category during a preset period.

In the second embodiment, the first information includes the customer's personal information (e.g., customer ID) and the customer's number of clicks on each category during a preset period (e.g., the last 14 days), and the information processing apparatus 100 may obtain, from the first information, matrix information in which the customer's personal information and the per-category click counts for the preset period are placed at random positions.

For example, the information processing apparatus 100 may perform the conversion according to a second conversion method (e.g., BCR), which enters each customer's personal information, usage information, and per-category preference information included in the first information as points at random positions, with no set order, producing a scatter-plot-like representation. Specifically, the elements of the matrix generated according to the first embodiment are placed randomly in a matrix of a preset size, yielding matrix information in a two-dimensional format (see reference numeral 602).

For example, the information processing apparatus 100 may randomly arrange, in the matrix, a preset number (e.g., the number of customers) of per-customer personal information entries (e.g., customer ID) and a preset number (e.g., the number of second-level categories) of per-category click counts.

Accordingly, by placing the matrix elements at random positions, the information processing apparatus 100 can use, as a training dataset, matrix information from which the correlation between adjacent pixels that would otherwise be introduced by variable order and the like has been removed, and can thereby further improve the accuracy of determining the customer's purchase attribute.
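Random placement can be sketched as applying a permutation to the flattened values before filling the matrix. The text does not say whether the same permutation is reused for every customer; this sketch assumes a fixed seed so that each variable lands in the same cell across samples. The function name, seed handling, and zero padding are illustrative assumptions.

```python
import random


def shuffled_matrix(values, n_rows, n_cols, seed=0, pad=0):
    """Place `values` at pseudo-random cells of an n_rows x n_cols matrix.
    The permutation depends only on `seed` and the matrix size, so it can be
    reused for every sample (an assumption of this sketch)."""
    size = n_rows * n_cols
    if len(values) > size:
        raise ValueError("too many values for the requested matrix size")
    padded = list(values) + [pad] * (size - len(values))
    order = list(range(size))
    random.Random(seed).shuffle(order)   # reproducible permutation
    flat = [padded[i] for i in order]
    return [flat[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]


ordered_values = [0, 1, 0, 4, 0, 2, 7]   # hypothetical dummies + click counts
bcr_matrix = shuffled_matrix(ordered_values, 3, 3, seed=42)
bcr_again = shuffled_matrix(ordered_values, 3, 3, seed=42)
```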

Referring to FIG. 6C, the information processing apparatus 100 according to a third embodiment may convert the first information into a two-dimensional format based on click order and product category.

In the third embodiment, the one or more criteria described above include the click order and the product category, and the matrix information may include information indicating the order in which the customer clicked on each category.

For example, the information processing apparatus 100 may perform the conversion according to a third conversion method (e.g., ACL), which records each customer's personal information, usage information, and per-category preference information included in the first information at each point in time when the customer views a product (or uses the site), thereby representing their order. Specifically, for each customer, the categories of the recently viewed products are arranged in order from the last view date to the most recent view date, the customer's personal information and site usage information are entered into every row in which a view occurred, and the total image length may be set to 54, the maximum number of recently viewed products. Category information is thus placed along the time axis according to the customer's click order for each category, yielding matrix information in a two-dimensional format (see reference numeral 603).

For example, the information processing apparatus 100 may apply time, representing the click order, to a first axis (e.g., the X axis) of the matrix, apply category information to a second axis (e.g., the Y axis), and use each matrix value to indicate whether the customer clicked, so that information on whether the customer clicked on each category at each point in time is arranged sequentially in the matrix. In an embodiment, the types of categories and their arrangement order may be determined randomly at the outset and then used in a fixed manner.
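A minimal sketch of this click-order layout follows: columns are click positions (X axis), rows are categories (Y axis), and a cell is 1 when the customer's click at that position fell in that category. The function name, the category labels, and the choice to drop events beyond the fixed width are assumptions of the sketch.

```python
def click_order_matrix(clicked_categories, categories, max_len):
    """Rows follow the fixed `categories` order; columns are click positions.
    Events beyond `max_len` are dropped (an assumption of this sketch)."""
    row_of = {name: i for i, name in enumerate(categories)}
    matrix = [[0] * max_len for _ in categories]
    for step, name in enumerate(clicked_categories[:max_len]):
        matrix[row_of[name]][step] = 1
    return matrix


# Hypothetical fixed category order and one customer's click sequence.
categories = ["shoes", "bags", "hats"]
clicks_in_order = ["bags", "shoes", "bags"]
acl_matrix = click_order_matrix(clicks_in_order, categories, max_len=4)
```

With the width of 54 mentioned for the third embodiment, `max_len` would be 54 and each customer's matrix would have one row per category.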

Referring to FIG. 6D, the information processing apparatus 100 according to a fourth embodiment may convert the first information into a two-dimensional format based on date and product category. In the fourth embodiment, the one or more criteria described above include the date and the product category, and the matrix information may include information on the customer's number of clicks on each category for each date.

For example, the information processing apparatus 100 may perform the conversion according to a fourth conversion method (e.g., A2CL), which sums each customer's personal information, usage information, and per-category preference information included in the first information by day, recording and representing the changes or occurrences in the customer's information for each date. Specifically, the category information of the matrix generated according to the third embodiment is summed over the products viewed on the same date and arranged into a preset number (e.g., 14) of columns, and the personal information is entered into every column, yielding matrix information in a two-dimensional format (see reference numeral 604).

For example, the information processing apparatus 100 may apply the date to a first axis (e.g., the X axis) of the matrix, apply category information to a second axis (e.g., the Y axis), and use each matrix value to hold the customer's click count, so that information on the customer's number of clicks per date and per category is arranged sequentially in the matrix.
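The date-by-category layout can be sketched as a simple count aggregation, with dates on the columns (X axis) and categories on the rows (Y axis). The event format `(day_index, category)`, the function name, and the handling of out-of-range days are illustrative assumptions.

```python
def daily_click_matrix(events, categories, n_days):
    """Count clicks per (category, day). `events` holds (day_index, category)
    pairs; days outside 0..n_days-1 are ignored in this sketch."""
    row_of = {name: i for i, name in enumerate(categories)}
    matrix = [[0] * n_days for _ in categories]
    for day, name in events:
        if 0 <= day < n_days:
            matrix[row_of[name]][day] += 1
    return matrix


# Hypothetical click log for one customer over a 3-day window.
events = [(0, "bags"), (0, "bags"), (1, "shoes"), (2, "bags")]
a2cl_matrix = daily_click_matrix(events, ["shoes", "bags"], n_days=3)
```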

Referring to FIG. 6E, the information processing apparatus 100 according to a fifth embodiment may convert the first information into a two-dimensional format based on time and the industry associated with the purchase history. In the fifth embodiment, the one or more criteria described above include time and the industry associated with the purchase history, and the matrix information may include information on the customer's purchase amount in each industry for each time slot.

For example, the information processing apparatus 100 may obtain matrix information in a two-dimensional format according to a fifth conversion method, which sums each customer's monthly per-industry merchant spending included in the first information, broken down by day of the week and time slot, and arranges each customer's recent industry usage and per-industry spending in a single matrix (see reference numeral 605).

For example, the information processing apparatus 100 may apply the industry to a first axis (e.g., the X axis) of the matrix, apply the day of the week (or time slot) to a second axis (e.g., the Y axis), and use each matrix value to hold the customer's purchase amount, so that information on the customer's purchase amount per industry and per day of the week (or time slot) is arranged sequentially in the matrix.

Referring to FIG. 6F, the information processing device 100 according to the sixth embodiment may convert the first information into a two-dimensional format based on time and the industry recorded in the purchase history. In the sixth embodiment, the one or more criteria described above include time and the industry recorded in the purchase history, and the matrix information may include the customer's number of purchases for each industry in each time slot.

For example, the information processing device 100 may take each customer's monthly number of visits to affiliated stores per industry, included in the first information, and sum it by day of the week and time slot. It may then obtain matrix information in a two-dimensional format according to a sixth conversion method, which arranges each customer's recent use of each industry and the frequency of use per industry in a single matrix (see reference numeral 606).

For example, the information processing device 100 may apply the industry criterion to the first axis (e.g., the X-axis) of the matrix, apply the day-of-week (or time-slot) criterion to the second axis (e.g., the Y-axis), and use the customer's number of purchases as each matrix value, so that the customer's purchase counts by industry and by day of the week (or time slot) are arranged sequentially in the matrix.

The matrix information obtained according to the fifth and sixth embodiments may be used in an information characteristic determination model for predicting how likely a customer is to use each industry in each time slot. Several pieces of matrix information may be overlaid in a single image by using one or more of the RGB image channels, and the number of time slots and days of the week described above may be adjusted according to the criterion used to aggregate the information.

Referring to FIG. 6G, the information processing device 100 according to the seventh embodiment may convert the first information into a two-dimensional format based on time and the industry recorded in the purchase history. In the seventh embodiment, the one or more criteria described above include time and the industry recorded in the purchase history, and the matrix information may include the customer's purchase amount and number of purchases for each industry in each time slot.

For example, the information processing device 100 may aggregate each customer's purchase information included in the first information in units of a preset period (e.g., one month or one week) and combine the results as two or more channels (e.g., along a third dimension). In this way it may obtain matrix information in a two-dimensional format according to a seventh conversion method, which represents how the customer's purchase pattern changes from month to month or week to week (see reference numeral 607).

For example, the information processing device 100 may apply the industry criterion to the first axis (e.g., the X-axis) of the matrix, apply the day-of-week (or time-slot) criterion to the second axis (e.g., the Y-axis), apply the period criterion (e.g., one-week units) to the third axis (e.g., the Z-axis), and use the customer's number of purchases (or purchase amount) as each matrix value. Each matrix then holds the customer's purchase counts (or purchase amounts) by industry and by day of the week (or time slot), and the plurality of matrices are arranged along the third axis so that a weekly (or monthly) purchase pattern appears in the matrix information.
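As a rough illustration of stacking per-period matrices along a third axis, the sketch below builds one day-by-industry count matrix per week and keeps them in a list indexed by week. The function name `weekly_tensor`, the tuple layout `(week, day, industry)`, and the default of four weeks are assumptions made for this example only.

```python
def weekly_tensor(purchases, industries, n_days=7, n_weeks=4):
    """Return tensor[week][day][industry] holding purchase counts.

    week     -> third axis (Z-axis, one matrix per week)
    day      -> second axis (Y-axis)
    industry -> first axis (X-axis)
    """
    col = {name: i for i, name in enumerate(industries)}
    tensor = [
        [[0] * len(industries) for _ in range(n_days)]
        for _ in range(n_weeks)
    ]
    for week, day, industry in purchases:
        tensor[week][day][col[industry]] += 1
    return tensor


# Hypothetical input: (week index, day-of-week index, industry).
purchases = [(0, 0, "cafe"), (0, 0, "cafe"), (1, 5, "grocery"), (3, 6, "cafe")]
t = weekly_tensor(purchases, ["cafe", "grocery"])
```

Comparing `t[0]`, `t[1]`, and so on cell by cell shows how the same day and industry cell changes from week to week, which is the pattern the stacked layout is meant to expose.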

The matrix information obtained according to the seventh embodiment may be used in an information characteristic determination model for predicting how likely a customer is to make a purchase in each day-of-week and time slot of the following week, and the number of time slots and days of the week described above may be adjusted according to the criterion used to aggregate the information.

Referring to FIG. 6H, the information processing device 100 according to the eighth embodiment may convert the first information into a two-dimensional format based on time and the commercial district recorded in the purchase history. In the eighth embodiment, the one or more criteria described above include time and the commercial district recorded in the purchase history, and the matrix information may include whether the customer made a purchase in each commercial district in each week.

For example, for each preset period (e.g., each week) in the first information, the information processing device 100 may encode whether the customer made a purchase in each commercial district, where the district is that of the affiliated store at which the purchase occurred, and thereby analyze one year of location information. It may then obtain matrix information in a two-dimensional format according to an eighth conversion method, which arranges in a matrix whether the customer purchased in each commercial district in each week (see reference numeral 608).

For example, the information processing device 100 may apply the commercial-district criterion to the first axis (e.g., the X-axis) of the matrix and the week criterion to the second axis (e.g., the Y-axis), so that whether the customer made a purchase, by commercial district and by week, is arranged sequentially in the matrix.

Referring to FIG. 6I, the information processing device 100 according to the ninth embodiment may convert the first information into a two-dimensional format based on time and the commercial district recorded in the purchase history. In the ninth embodiment, the one or more criteria described above include time and the commercial district recorded in the purchase history, and the matrix information may include the customer's number of purchases in each commercial district in each week.

For example, for each preset period (e.g., each week) in the first information, the information processing device 100 may encode the customer's number of purchases in each commercial district, where the district is that of the affiliated store at which the purchase occurred, and thereby analyze one year of location information. It may then obtain matrix information in a two-dimensional format according to a ninth conversion method, which arranges in a matrix the customer's number of purchases in each commercial district in each week (see reference numeral 609).

For example, the information processing device 100 may apply the commercial-district criterion to the first axis (e.g., the X-axis) of the matrix and the week criterion to the second axis (e.g., the Y-axis), so that the customer's purchase counts by commercial district and by week are arranged sequentially in the matrix.
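The week-by-district layouts of the eighth embodiment (purchase yes/no) and the ninth embodiment (purchase count) differ only in what is written into each cell. The following sketch shows both with a single hypothetical helper; the name `district_week_matrix`, the `binary` flag, the placeholder district names, and the 52-week default (one year) are assumptions for illustration.

```python
def district_week_matrix(visits, districts, n_weeks=52, binary=False):
    """Return a matrix with rows = weeks (Y-axis) and columns = districts (X-axis).

    binary=True  -> each cell is 1 if any purchase occurred, else 0
    binary=False -> each cell is the number of purchases
    """
    col = {name: i for i, name in enumerate(districts)}
    matrix = [[0] * len(districts) for _ in range(n_weeks)]
    for week, district in visits:
        if binary:
            matrix[week][col[district]] = 1
        else:
            matrix[week][col[district]] += 1
    return matrix


# Hypothetical input: (week index within the year, commercial district).
districts = ["district_a", "district_b"]
visits = [(0, "district_a"), (0, "district_a"), (10, "district_b")]
counts = district_week_matrix(visits, districts)
flags = district_week_matrix(visits, districts, binary=True)
```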

The matrix information obtained according to the eighth and ninth embodiments may be used in an information characteristic determination model for predicting where a customer will be in each day-of-week and time slot, and the model may be improved by reflecting this in learning the customer's commercial-district preference history. The matrix information may also be used in a model for predicting purchase likelihood in view of the customer's expected location and past location history, a model for predicting the group of customers who will visit each commercial district, and a model for predicting the likelihood of a visit to each commercial district based on each customer's location history. Targeting accuracy may be improved by reflecting the expected location for each customer TPO.

Referring to FIG. 6J, the information processing device 100 according to the tenth embodiment may convert the first information into a two-dimensional format based on the day of the week and the time slot. In the tenth embodiment, the one or more criteria described above include the day of the week and the time slot, and the matrix information may include the customer's number of purchases for each day of the week and each time slot.

For example, for each customer included in the first information and for each preset period (e.g., monthly or weekly), the information processing device 100 may sum the number of purchases that occurred within the period, laid out in time series per industry, for all purchases, or for major industries, and arrange the sums in a matrix. In this way it may obtain matrix information in a two-dimensional format according to a tenth conversion method (see reference numeral 610).

For example, the information processing device 100 may apply the time-slot criterion to the first axis (e.g., the X-axis) of the matrix and the day-of-week criterion to the second axis (e.g., the Y-axis), so that the sums of the customer's purchase counts by time slot and by day of the week are arranged sequentially in the matrix.
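When purchases are stored as timestamps, the day of the week and the time slot have to be derived before the matrix can be filled. A minimal sketch follows, assuming equal-width time slots of `slot_hours` hours; that slot width, the function name `day_slot_matrix`, and the sample timestamps are assumptions of this example, and the text only says that the number of slots may be adjusted.

```python
from datetime import datetime


def day_slot_matrix(timestamps, slot_hours=4):
    """Return a matrix with rows = day of week (Mon=0 .. Sun=6), columns = time slots."""
    if 24 % slot_hours != 0:
        raise ValueError("slot_hours must divide 24")
    n_slots = 24 // slot_hours
    matrix = [[0] * n_slots for _ in range(7)]
    for ts in timestamps:
        # datetime.weekday() returns 0 for Monday through 6 for Sunday.
        matrix[ts.weekday()][ts.hour // slot_hours] += 1
    return matrix


timestamps = [
    datetime(2019, 12, 23, 9, 30),  # Monday, slot 2 (08:00 to 11:59)
    datetime(2019, 12, 23, 10, 5),  # Monday, slot 2
    datetime(2019, 12, 28, 20, 0),  # Saturday, slot 5 (20:00 to 23:59)
]
m = day_slot_matrix(timestamps)
```

Changing `slot_hours` changes the number of columns, which corresponds to adjusting the number of time slots to suit the aggregation criterion.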

The matrix information obtained according to the tenth embodiment may be used in an information characteristic determination model for predicting a customer's purchase occurrence pattern and time and day-of-week usage pattern. In addition, the information processing device 100 may separate several industries selected by a user (e.g., an administrator), such as frequently used industries like convenience stores, online stores, and restaurants, into the R, G, and B channels and arrange them in respective matrices.

Referring to FIG. 6K, the information processing device 100 according to the eleventh embodiment may convert the first information into a two-dimensional format based on the commercial district and the industry recorded in the purchase history. In the eleventh embodiment, the one or more criteria described above include the commercial district and the industry recorded in the purchase history, and the matrix information may include the customer's number of purchases for each commercial district and each industry.

For example, for each customer included in the first information, the information processing device 100 may sum and encode the customer's number of purchases per industry and per commercial district over the purchase history of the last three months. It may then obtain matrix information in a two-dimensional format according to an eleventh conversion method, which represents the customer's number of purchases per industry for each major commercial district (see reference numeral 612).

For example, the information processing device 100 may apply the commercial-district criterion to the first axis (e.g., the X-axis) of the matrix and the industry criterion to the second axis (e.g., the Y-axis), so that the sums of the customer's purchase counts by commercial district and by industry are arranged sequentially in the matrix. In one embodiment, the commercial district may include an administrative district.

Accordingly, the information processing device 100 may derive matrix information showing which industries the customer mainly spends on in which commercial districts, and may separate several industries selected by a user (e.g., an administrator), such as frequently used industries like convenience stores, online stores, and restaurants, into the R, G, and B channels and arrange them in respective matrices.

The matrix information obtained according to the eleventh embodiment may be used in an information characteristic determination model for analyzing which industries the customer mainly spends on in which commercial districts and for predicting the customer's purchase likelihood per commercial district and per industry.

Referring to FIG. 6L, the information processing device 100 according to the twelfth embodiment may convert the first information into a two-dimensional format based on time and on the commercial district and industry recorded in the purchase history.

In the twelfth embodiment, the one or more criteria described above include time and the commercial district and industry recorded in the purchase history. The matrix information may include first matrix information containing the customer's purchase amount or number of purchases for each industry in each time slot, second matrix information containing whether the customer purchased, or the customer's number of purchases, in each commercial district in each week, and third matrix information containing the customer's number of purchases for each commercial district and each industry.

For example, for each period segment defined by a preset period (e.g., monthly, weekly, or yearly) in the first information, the information processing device 100 may group the industry, day of the week, time, and commercial district of the customer's purchase history in that period into pairs, and sum and encode the number of purchases that occurred in each resulting cell. It may then obtain matrix information in a two-dimensional format according to a twelfth conversion method, in which the three generated matrices represent the customer's purchase counts along the time, commercial-district, and industry axes (see reference numeral 612).

For example, the information processing device 100 may apply the day-of-week (or time) criterion to the first axis (e.g., the X-axis), the commercial-district criterion to the second axis (e.g., the Y-axis), and the industry criterion to the third axis (e.g., the Z-axis), arrange the sums of the customer's purchase counts along the corresponding axes sequentially in each matrix, and align the plurality of matrices along the plurality of axes.
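One way to read the pairing step of the twelfth conversion method is that every pair of the three attributes yields its own count table. The sketch below follows that reading under simplifying assumptions: time is reduced to a single `day` field, the tables are kept as sparse counters rather than dense matrices, and the field names and `pairwise_counts` helper are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical attribute names standing in for time, commercial district, and industry.
FIELDS = ("day", "district", "industry")


def pairwise_counts(records):
    """Return {(field_a, field_b): Counter{(value_a, value_b): purchase count}}."""
    tables = {pair: Counter() for pair in combinations(FIELDS, 2)}
    for rec in records:
        for a, b in tables:
            # Each record contributes one purchase to every pairwise table.
            tables[(a, b)][(rec[a], rec[b])] += 1
    return tables


records = [
    {"day": "Mon", "district": "district_a", "industry": "cafe"},
    {"day": "Mon", "district": "district_a", "industry": "grocery"},
    {"day": "Sat", "district": "district_b", "industry": "cafe"},
]
tables = pairwise_counts(records)
```

Each counter can be expanded into a dense two-dimensional matrix once the ordering of values along each axis is fixed.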

The matrix information obtained according to the twelfth embodiment may be used in an information characteristic determination model for analyzing which industries the customer mainly spends on, in which commercial districts, and in which time slots, and for predicting the customer's purchase likelihood per commercial district, industry, and time slot.

Referring to FIG. 6M, the information processing device 100 according to the thirteenth embodiment may convert the first information based on time and on the commercial district and industry recorded in the purchase history.

For example, the information processing device 100 may obtain first matrix information, second matrix information, and third matrix information in a two-dimensional format according to the fifth, ninth, and eleventh conversion methods described above, respectively. It may then apply the day-of-week (or time) criterion to the first axis (e.g., the X-axis), the commercial-district criterion to the second axis (e.g., the Y-axis), and the industry criterion to the third axis (e.g., the Z-axis), and align the first, second, and third matrix information along the respective axes.

Referring to FIG. 6N, the information processing device 100 according to the fourteenth embodiment may convert the first information based on time and on the commercial district and industry recorded in the purchase history.

For example, the information processing device 100 may obtain three pieces of matrix information in a two-dimensional format, one for each of three periods of equal length, according to any one of the first to fourth conversion methods described above, and arrange them as the R, G, and B channels.

For example, the purchase attribute determination device 100 may aggregate, for each customer, the category information of the customer's purchases over equal periods (e.g., daily, weekly, or monthly), and arrange the category information purchased in week 1 in the R channel, the category information purchased in week 2 in the G channel, and the category information purchased in week 3 in the B channel.

Accordingly, the purchase attribute determination device 100 can reduce the information lost to aggregation: values that would otherwise be collapsed into a single value regardless of when they occurred are kept separate by time point. That is, although the amount of stored information increases, the storage capacity may stay the same or decrease, so the image can be generated more efficiently.
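The week-to-channel mapping above can be illustrated with plain nested lists. This is a sketch under stated assumptions: the function name `to_rgb_image` is hypothetical, and the scaling of raw values to the 0 to 255 range by a shared maximum is a choice made for this example, since the text does not say how cell values are mapped to pixel intensities.

```python
def to_rgb_image(week1, week2, week3):
    """Combine three equally sized matrices into rows of (R, G, B) pixels."""
    shape = (len(week1), len(week1[0]))
    for m in (week2, week3):
        if (len(m), len(m[0])) != shape:
            raise ValueError("all channel matrices must have the same shape")

    # Assumption: scale by one shared maximum so that magnitudes remain
    # comparable across the three weeks. Fall back to 1 if all cells are zero.
    peak = max(max(max(row) for row in m) for m in (week1, week2, week3)) or 1

    def scale(value):
        return round(255 * value / peak)

    return [
        [(scale(r), scale(g), scale(b)) for r, g, b in zip(r_row, g_row, b_row)]
        for r_row, g_row, b_row in zip(week1, week2, week3)
    ]


# Hypothetical per-week category matrices (rows and columns as in the chosen layout).
week1 = [[0, 2], [1, 0]]  # R channel
week2 = [[5, 0], [0, 0]]  # G channel
week3 = [[0, 0], [0, 1]]  # B channel
image = to_rgb_image(week1, week2, week3)
```

Because each week occupies its own channel, a cell that was active in week 1 and week 3 but not in week 2 stays distinguishable from one that was active only in week 2, whereas a single summed matrix would merge them.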

The matrix information obtained according to the fourteenth embodiment may be used in an information characteristic determination model for predicting the category information a customer will purchase in the following week, based on the category information the customer purchased during a specific period (e.g., the last three weeks).

Meanwhile, the method described above can be written as a program executable on a computer and can be implemented on a general-purpose digital computer that runs the program from a computer-readable recording medium. In addition, the structure of the data used in the method described above may be recorded on a computer-readable recording medium by various means. The computer-readable recording medium includes storage media such as magnetic storage media (e.g., ROM, RAM, USB, floppy disk, hard disk) and optically readable media (e.g., CD-ROM, DVD).

The foregoing description of the present invention is illustrative, and those of ordinary skill in the art to which the present invention pertains will understand that it can readily be modified into other specific forms without changing its technical spirit or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and likewise components described as distributed may be implemented in a combined form.

The scope of the present invention is indicated by the claims that follow, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

100: information processing device
110: communication unit 120: processor
130: storage unit

Claims

An information processing method comprising:
receiving first information expressed in a one-dimensional format and including a categorical variable;
determining one or more criteria used for rearranging the first information based on a characteristic of the categorical variable included in the first information;
converting the first information into a two-dimensional format based on the one or more criteria to obtain second information; and
determining a characteristic of the first information by applying a convolutional neural network (CNN) to the second information converted into the two-dimensional format.

The method of claim 1,
wherein the determining of the characteristic of the first information by applying the CNN comprises
determining an attribute of the first information based on a local characteristic between locations of the information included in the second information.

The method of claim 2,
wherein the categorical variable includes a nominal variable, and
the first information includes a bitstream in which information on variables indicated by the nominal variable is sequentially provided.

The method of claim 3,
wherein the categorical variable further includes an ordinal variable, and
the one or more criteria include an order on which the ordinal variable is based.

The method of claim 1,
wherein the first information includes purchase information, the categorical variable includes category information about a product for which a purchase occurs, and
the one or more criteria include the category.

The method of claim 1,
wherein the first information includes purchase information, the categorical variable includes information on a commercial district or industry in which a purchase occurs, and
the one or more criteria include the commercial district or the industry.

The method of claim 1,
wherein the second information includes the first information, and
the determining of the characteristic of the first information by applying the CNN comprises:
filtering the second information to obtain main information; and
determining a characteristic of the first information based on the main information.

The method of claim 1,
classifying the first information into a first matrix, a second matrix and a third matrix according to the one or more criteria;
obtaining an image in which the first matrix, the second matrix, and the third matrix are represented by R, G, and B, respectively; and
determining the characteristic of the first information by applying the CNN to the image.

The method of claim 4,
wherein the first information further includes a numeric variable, and the method further comprises:
obtaining, from the first information, 1-1 information consisting of the categorical variable and 1-2 information consisting of the numeric variable;
obtaining 2-1 information by converting the 1-1 information into a two-dimensional format based on the characteristics of the categorical variable;
obtaining 2-2 information by converting the 1-2 information into a two-dimensional format based on the characteristics of the numeric variable;
applying the CNN to each of the 2-1 information and the 2-2 information, based on the inter-location local characteristics of the information included in the 2-1 information and the inter-location local characteristics of the information included in the 2-2 information; and
determining the characteristic of the first information using a result of applying the CNN to the 2-1 information and the 2-2 information.

The method of claim 9, further comprising:
obtaining, from the 1-1 information, 1-3 information consisting of the nominal variable and 1-4 information consisting of the ordinal variable;
obtaining 2-3 information by converting the 1-3 information into a two-dimensional format based on the characteristics of the nominal variable;
obtaining 2-4 information by converting the 1-4 information into a two-dimensional format based on the characteristics of the ordinal variable;
applying the CNN to each of the 2-3 information and the 2-4 information, based on the inter-location local characteristics of the information included in the 2-3 information and the inter-location local characteristics of the information included in the 2-4 information; and
determining the characteristic of the first information using a result of applying the CNN to the 2-3 information and the 2-4 information.

An information processing device comprising:
a communication unit for receiving first information expressed in a one-dimensional format and including a categorical variable; and
a processor configured to determine one or more criteria used for rearranging the first information based on a characteristic of the categorical variable included in the first information, convert the first information into a two-dimensional format based on the one or more criteria to obtain second information, and determine a characteristic of the first information by applying a convolutional neural network (CNN) to the second information converted into the two-dimensional format.

The device of claim 11,
wherein the processor
determines an attribute of the first information based on a local characteristic between locations of the information included in the second information.

The device of claim 12,
wherein the categorical variable includes a nominal variable, and
the first information includes a bitstream in which information on variables indicated by the nominal variable is sequentially provided.

The device of claim 13,
wherein the categorical variable further includes an ordinal variable, and
the one or more criteria include an order on which the ordinal variable is based.

The device of claim 11,
wherein the first information includes purchase information, the categorical variable includes category information about a product for which a purchase occurs, and
the one or more criteria include the category.

The device of claim 11,
wherein the first information includes purchase information, the categorical variable includes information on a commercial district or industry in which a purchase occurs, and
the one or more criteria include the commercial district or the industry.

The device of claim 11,
wherein the second information includes the first information, and
the processor
filters the second information to obtain main information, and determines a characteristic of the first information based on the main information.

18. The apparatus of claim 11,
wherein the processor
classifies the first information into a first matrix, a second matrix, and a third matrix according to the one or more criteria,
obtains an image in which the first matrix, the second matrix, and the third matrix are represented by R, G, and B, respectively, and
applies the CNN to the image to determine the characteristic of the first information.
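As a rough illustration of representing three matrices as the R, G, and B channels of one image, the sketch below scales each matrix to the 0 to 255 range and stacks them pixel by pixel. The helper names and sample values are hypothetical, the matrices are assumed to be non-negative and of equal shape, and the per-matrix scaling is an assumption that the claim does not specify.

```python
def scale_to_byte(matrix):
    """Scale a non-negative matrix to integers in 0..255 (assumed scaling)."""
    peak = max(max(row) for row in matrix)
    if peak == 0:
        return [[0] * len(row) for row in matrix]
    return [[255 * value // peak for value in row] for row in matrix]


def stack_rgb(first, second, third):
    """Build an image whose pixels are (R, G, B) tuples from three matrices."""
    shapes = {(len(m), len(m[0])) for m in (first, second, third)}
    if len(shapes) != 1:
        raise ValueError("all three matrices must have the same shape")
    r, g, b = (scale_to_byte(m) for m in (first, second, third))
    return [
        [(r[i][j], g[i][j], b[i][j]) for j in range(len(r[0]))]
        for i in range(len(r))
    ]


# Hypothetical matrices obtained by classifying the first information
# according to three assumed criteria.
first_matrix = [[0, 2], [4, 1]]
second_matrix = [[1, 1], [0, 0]]
third_matrix = [[0, 0], [0, 5]]

image = stack_rgb(first_matrix, second_matrix, third_matrix)
```

The resulting three-channel image has the layout that standard image CNNs expect, so all three criteria can be analyzed at the same pixel locations.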

19. The apparatus of claim 14,
wherein the first information further includes a numeric variable, and
the processor
obtains, from the first information, 1-1 information consisting of the categorical variable and 1-2 information consisting of the numeric variable,
converts the 1-1 information into a two-dimensional format based on the characteristic of the categorical variable to obtain 2-1 information,
converts the 1-2 information into a two-dimensional format based on the characteristic of the numeric variable to obtain 2-2 information,
applies the CNN to each of the 2-1 information and the 2-2 information based on the local characteristics between locations of the information included in the 2-1 information and the local characteristics between locations of the information included in the 2-2 information, and
determines the characteristic of the first information by using a result of applying the CNN to the 2-1 information and the 2-2 information.
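The two-branch arrangement described above, in which categorical and numeric information are converted and convolved separately and the results are then combined, might be sketched as follows. Everything here is an assumption made for illustration: the grids, kernels, weights, and bias are made-up values, each branch is reduced to one convolution with ReLU and global max pooling, and the final linear score is only one possible way to use the branch results.

```python
def cross_correlate(grid, kernel):
    """Unpadded 2D cross-correlation, the core operation of a CNN layer."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(grid) - kh + 1
    out_w = len(grid[0]) - kw + 1
    return [
        [
            sum(grid[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def branch_features(grid, kernels):
    """One feature per kernel: convolution, ReLU, then global max pooling."""
    features = []
    for kernel in kernels:
        feature_map = cross_correlate(grid, kernel)
        features.append(max(max(0.0, v) for row in feature_map for v in row))
    return features


def combine(cat_grid, num_grid, cat_kernels, num_kernels, weights, bias):
    """Run each branch separately, concatenate features, apply a linear score."""
    features = (branch_features(cat_grid, cat_kernels)
                + branch_features(num_grid, num_kernels))
    score = bias + sum(w * f for w, f in zip(weights, features))
    return features, score


# Hypothetical 2-1 information (categorical branch) and 2-2 information
# (numeric branch), already converted into two-dimensional form.
info_2_1 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
info_2_2 = [[0.5, 1.0], [1.5, 2.0]]

features, score = combine(
    info_2_1, info_2_2,
    cat_kernels=[[[1, 0], [0, 1]]],    # assumed diagonal-pattern detector
    num_kernels=[[[-1, 1], [0, 0]]],   # assumed horizontal-difference detector
    weights=[0.5, 2.0], bias=-1.0,     # assumed combination parameters
)
```

Keeping the branches separate lets each kernel set match the structure of its own grid before the features are merged for the final determination.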

20. The apparatus of claim 19,
wherein the processor
obtains, from the 1-1 information, 1-3 information including the nominal variable and 1-4 information including the ordinal variable,
converts the 1-3 information into a two-dimensional format based on the characteristic of the nominal variable to obtain 2-3 information,
converts the 1-4 information into a two-dimensional format based on the characteristic of the ordinal variable to obtain 2-4 information,
applies the CNN to each of the 2-3 information and the 2-4 information based on the local characteristics between locations of the information included in the 2-3 information and the local characteristics between locations of the information included in the 2-4 information, and
determines the characteristic of the first information by using a result of applying the CNN to the 2-3 information and the 2-4 information.

21. A computer-readable recording medium on which a program for executing, on a computer, the method of any one of claims 1 to 10 is recorded.