KR102657791B1

KR102657791B1 - Method for providing persona code using deep learning module and server performing the same

Info

Publication number: KR102657791B1
Application number: KR1020240007334A
Authority: KR
Inventors: 최성필; 박수창
Original assignee: 주식회사 데이타몬드
Priority date: 2023-08-31
Filing date: 2024-01-17
Publication date: 2024-04-17

Abstract

The present invention relates to a method for generating a persona code and a server that performs the method. The persona code generation method includes: receiving, from a user terminal, answer data for survey data pre-stored in a database linked to the server, together with unstructured data collected through the user terminal; generating preprocessed data by performing a preprocessing operation on the answer data and the unstructured data; generating tagged data by assigning tags corresponding to the preprocessed data; generating a first persona code using the tags included in the tagged data; generating a second persona code by clustering the preprocessed data based on a category pre-assigned to each survey question constituting the survey data; and generating a final persona code by combining the first persona code and the second persona code.

Description

Method for providing persona code using deep learning module and server performing the same

The present invention relates to a method of providing a persona code using a deep learning module and a server that performs the method. Specifically, the present invention relates to a method that generates and provides a persona code reflecting a fine-grained breakdown of a user's tendencies across various categories, and that predicts and provides a persona code when user data is lacking.

The content described in this section merely provides background information on the present embodiment and does not constitute prior art.

Recently, a growing number of services offer tendency analysis based on the MBTI (Myers-Briggs Type Indicator), classifying human tendencies into a fixed number of types. For such services, collecting data to identify a user's tendencies is becoming increasingly important.

Accordingly, there has been a need to collect answers in various forms and generate an accurate persona code that represents a user's unique tendencies. There has also been a need to generate a highly accurate persona code from pre-stored big data even when the user's answers are insufficient.

An object of the present invention is to provide a persona code derived by identifying a user's characteristics from the user's received answers and determining the user's tendency for each characteristic.

Another object of the present invention is, when unstructured data including non-text content such as images is received, to structure the received unstructured data and to generate and provide a persona code that reflects the structured data.

Another object of the present invention is to derive a plurality of temporary persona codes from the data available when the user's answers are insufficient, receive the user's responses to the derived persona codes, and select and provide the most accurate persona code.

Another object of the present invention is to train a deep learning module for predicting missing data using training data that includes users' answers and final persona codes, and to derive and provide a persona code using the pre-trained deep learning module.

The objects of the present invention are not limited to those mentioned above. Other objects and advantages of the present invention that are not mentioned can be understood from the following description and will be understood more clearly through the embodiments of the present invention. It will also be readily apparent that the objects and advantages of the present invention can be realized by the means indicated in the claims and combinations thereof.

A persona code generation method according to some embodiments of the present invention is a method of providing a persona code performed by a server linked to a user terminal, and includes: receiving, from the user terminal, answer data for survey data pre-stored in a database linked to the server, together with unstructured data collected through the user terminal; generating preprocessed data by performing a preprocessing operation on the answer data and the unstructured data; generating tagged data by assigning a tag to each keyword included in the preprocessed data; generating a first persona code using the tags included in the tagged data; generating a second persona code by clustering the preprocessed data based on a category pre-assigned to each survey question constituting the survey data; and generating a final persona code by combining the first persona code and the second persona code.

In addition, the unstructured data may include at least one of an image, video, voice, and location information collected while the user terminal receives the user's responses to the survey data.

In addition, the generating of the preprocessed data may include: generating extracted data consisting of preset main keywords included in the answer data through text mining of the answer data; generating structured data for the unstructured data by extracting keywords related to objects included in the unstructured data; calculating weights for the extracted data and the structured data based on the reliability of each response to the survey data included in the answer data and the importance of the keywords included in the structured data; and generating preprocessed data that includes the extracted data, the structured data, and their weights.
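The weighting step above can be sketched as follows. This is only an illustration under stated assumptions: the source does not give a weighting formula, so the sketch simply takes each answer keyword's response reliability and each structured keyword's importance (both assumed to lie in [0, 1]) and normalizes them to sum to 1. The names `WeightedKeyword` and `build_preprocessed` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WeightedKeyword:
    keyword: str
    source: str   # "answer" (text-mined) or "unstructured" (structured from images etc.)
    weight: float


def build_preprocessed(answer_keywords, structured_keywords):
    """Merge both keyword sources into one weighted list.

    answer_keywords:     list of (keyword, response_reliability)
    structured_keywords: list of (keyword, importance)
    Assumption: the raw scores are normalized so all weights sum to 1.
    """
    raw = [(kw, "answer", rel) for kw, rel in answer_keywords]
    raw += [(kw, "unstructured", imp) for kw, imp in structured_keywords]
    total = sum(w for _, _, w in raw)
    if total == 0:
        return []
    return [WeightedKeyword(kw, src, w / total) for kw, src, w in raw]


preprocessed = build_preprocessed(
    answer_keywords=[("hiking", 0.9), ("budget", 0.6)],
    structured_keywords=[("mountain", 0.5)],
)
```

Keeping the source label on each keyword lets a later step treat survey answers and image-derived keywords differently if needed.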

In addition, the assigning of the first persona code may include: generating first tagged data by assigning, to the preprocessed data, first tags related to a first characteristic included in a first category of the final persona code; calculating a first score for the first characteristic in consideration of the weights in the preprocessed data for the keywords to which the first tags are assigned; generating second tagged data by assigning, to the preprocessed data, second tags related to a second characteristic that is included in the first category and differs from the first characteristic; calculating a second score for the second characteristic in consideration of the weights in the preprocessed data for the keywords to which the second tags are assigned; assigning a first sub-persona code corresponding to the first score; assigning a second sub-persona code corresponding to the second score; and generating the first persona code by combining the first sub-persona code and the second sub-persona code.
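A minimal sketch of this tag-and-score flow, under loud assumptions: the tag dictionaries, the score formula (sum of the weights of tagged keywords), the threshold, and the code letters are all hypothetical, since the source does not define them.

```python
# Hypothetical tag dictionaries: keyword -> tag, one dictionary per characteristic.
FIRST_TAGS = {"hiking": "outdoor", "camping": "outdoor"}
SECOND_TAGS = {"budget": "thrifty", "discount": "thrifty"}


def characteristic_score(weighted_keywords, tag_dict):
    """Sum the weights of the keywords that received a tag for this characteristic."""
    return sum(w for kw, w in weighted_keywords if kw in tag_dict)


def sub_code(score, prefix, threshold=0.4):
    """Map a score to a sub-persona code; threshold and letters are assumptions."""
    return prefix + ("H" if score >= threshold else "L")


def first_persona_code(weighted_keywords):
    s1 = characteristic_score(weighted_keywords, FIRST_TAGS)
    s2 = characteristic_score(weighted_keywords, SECOND_TAGS)
    # "Combining" is modeled here as plain concatenation of the two sub-codes.
    return sub_code(s1, "A") + sub_code(s2, "B")


keywords = [("hiking", 0.45), ("budget", 0.30), ("mountain", 0.25)]
code = first_persona_code(keywords)
```

In this example "hiking" contributes 0.45 to the first characteristic and "budget" contributes 0.30 to the second, while "mountain" carries no tag and is ignored.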

In addition, the assigning of the first persona code may further include correcting the first score and the second score using user information entered in advance through the user terminal.

In addition, the assigning of the first persona code may include: assigning preset topic tags to the keywords included in the preprocessed data; classifying the topic tags assigned to the preprocessed data into a first characteristic or a second characteristic; calculating a first score for the first characteristic in consideration of the weights in the preprocessed data for the keywords tagged for the first characteristic; calculating a second score for the second characteristic in consideration of the weights in the preprocessed data for the keywords tagged for the second characteristic; assigning a first sub-persona code corresponding to the first score; assigning a second sub-persona code corresponding to the second score; and generating the first persona code by combining the first sub-persona code and the second sub-persona code.

In addition, the generating of the second persona code may include: classifying the preprocessed data according to each predetermined characteristic included in a second category of the final persona code; clustering highly similar data among the classified data using a different clustering model for each characteristic; deriving a sub-persona code for each cluster formed for each characteristic; and generating the second persona code by merging the plurality of sub-persona codes derived for the respective characteristics.

In addition, the performing of the clustering may include: (a) calculating a silhouette coefficient for the result of clustering the data input to the clustering model for a first characteristic; (b) selecting a clustering coefficient for which the silhouette coefficient is greater than or equal to a preset reference value; (c) finalizing the clustering of the data for the first characteristic based on the selected clustering coefficient; and repeating steps (a) to (c) in the same manner for a second characteristic and a third characteristic that differ from the first characteristic.
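Steps (a) to (c) can be sketched for a single characteristic as below. Several things here are assumptions rather than the source's method: the "clustering coefficient" is read as the number of clusters k, a tiny 1-D k-means stands in for the unspecified clustering model, and when several k values pass the reference value the one with the highest silhouette coefficient is kept. All function names are hypothetical.

```python
def kmeans_1d(values, k, iters=20):
    """Tiny deterministic 1-D k-means; a stand-in for the per-characteristic model."""
    pts = sorted(values)
    # Spread the initial centers evenly across the sorted data.
    centers = [pts[(2 * i + 1) * len(pts) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels


def silhouette(values, labels):
    """Mean silhouette coefficient; points in singleton clusters contribute 0."""
    clusters = {}
    for v, lab in zip(values, labels):
        clusters.setdefault(lab, []).append(v)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for v, lab in zip(values, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue
        a = sum(abs(v - u) for u in own) / (len(own) - 1)      # mean intra-cluster distance
        b = min(sum(abs(v - u) for u in other) / len(other)    # nearest other cluster
                for m, other in clusters.items() if m != lab)
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(values)


def choose_k(values, candidates, threshold):
    """(a) score each k, (b) keep those at or above the threshold, (c) pick the best."""
    scored = [(silhouette(values, kmeans_1d(values, k)), k) for k in candidates]
    passing = [(s, k) for s, k in scored if s >= threshold]
    return max(passing)[1] if passing else None


values = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.1, 8.9]
best_k = choose_k(values, candidates=[2, 3, 4], threshold=0.7)
```

The same `choose_k` call would be repeated with the data and clustering model of each further characteristic, mirroring the final repetition step.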

In addition, the method may further include: determining whether missing data exists in the data received from the user terminal; when missing data exists, inputting preprocessed data for the received data into a pre-trained deep learning module and receiving at least one temporary persona code from the deep learning module as its output; deriving new survey data related to the received at least one temporary persona code; providing the new survey data to the user terminal and receiving response data for that survey data from the user terminal; and deriving a reliability of each temporary persona code based on the response data and determining the final persona code based on the reliability.
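The control flow of this missing-data branch might look like the sketch below. Everything concrete in it is an assumption made for illustration: `stub_predictor` stands in for the pre-trained deep learning module, `FOLLOW_UPS` is a hypothetical table of follow-up questions per temporary code, and reliability is modeled as the fraction of a code's follow-up questions the user agrees with.

```python
def has_missing(answers):
    """Treat None or empty-string answers as missing data."""
    return any(a is None or a == "" for a in answers.values())


def stub_predictor(preprocessed):
    """Stand-in for the pre-trained deep learning module: returns temporary codes."""
    return ["AHBL", "ALBL"]


# Hypothetical follow-up questions tied to each temporary persona code.
FOLLOW_UPS = {
    "AHBL": ["Do you spend most weekends outdoors?", "Do you plan trips months ahead?"],
    "ALBL": ["Do you prefer staying home on weekends?", "Do you avoid long trips?"],
}


def reliability(code, responses):
    """Fraction of a code's follow-up questions the user agreed with."""
    questions = FOLLOW_UPS[code]
    return sum(bool(responses.get(q, False)) for q in questions) / len(questions)


def resolve_final_code(answers, preprocessed, ask_user, predictor=stub_predictor):
    if not has_missing(answers):
        return None  # no missing data: the regular tag/cluster pipeline applies
    candidates = predictor(preprocessed)
    questions = [q for c in candidates for q in FOLLOW_UPS[c]]
    responses = ask_user(questions)  # round trip to the user terminal
    return max(candidates, key=lambda c: reliability(c, responses))


def ask_user(questions):
    """Simulated user who agrees only with the outdoor/planning questions."""
    return {q: ("outdoors" in q or "ahead" in q) for q in questions}


answers = {"q1": "I like hiking", "q2": None}
final = resolve_final_code(answers, preprocessed=[], ask_user=ask_user)
```

Passing `ask_user` as a callable keeps the server-side logic separate from however the questions actually reach the user terminal.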

In addition, the method may further include: storing, in the database, the preprocessed data for received data that does not include missing data, together with the final persona code derived from that preprocessed data; and training the deep learning module by supervised learning, by applying the preprocessed data stored in the database to the input nodes of the deep learning module and applying the persona code for that preprocessed data to the output nodes of the deep learning module.
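The supervised setup described here, with preprocessed data as input and the final persona code as the target, can be illustrated with a deliberately small model. This sketch is not the patent's network: a single-layer softmax classifier in plain Python stands in for the deep learning module, and the feature vectors, persona codes, and hyperparameters are made up for the example.

```python
import math


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def train(samples, labels, n_classes, epochs=200, lr=0.5):
    """samples: feature vectors built from stored preprocessed data.
    labels: class index of the final persona code stored with each sample."""
    n_features = len(samples[0])
    W = [[0.0] * n_features for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            logits = [sum(w * xi for w, xi in zip(W[c], x)) + b[c] for c in range(n_classes)]
            p = softmax(logits)
            for c in range(n_classes):
                g = p[c] - (1.0 if c == y else 0.0)  # cross-entropy gradient w.r.t. logit c
                for j in range(n_features):
                    W[c][j] -= lr * g * x[j]
                b[c] -= lr * g
    return W, b


def predict(W, b, x):
    logits = [sum(w * xi for w, xi in zip(Wc, x)) + bc for Wc, bc in zip(W, b)]
    return max(range(len(logits)), key=logits.__getitem__)


CODES = ["AHBL", "ALBH"]  # hypothetical final persona codes used as class labels
samples = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
labels = [0, 0, 1, 1]
W, b = train(samples, labels, n_classes=len(CODES))
predicted = CODES[predict(W, b, [0.85, 0.15])]
```

Only samples without missing data are used for training, which matches the storage step above; the trained model is then what the missing-data branch would query.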

Meanwhile, a server according to some embodiments of the present invention includes a processor, a memory that loads a computer program executed by the processor, and an interface that exchanges data generated during execution of the computer program with a user terminal, wherein the computer program includes: receiving, from the user terminal, answer data for survey data pre-stored in a database linked to the server, together with unstructured data collected through the user terminal; generating preprocessed data by performing a preprocessing operation on the answer data and the unstructured data; generating tagged data by assigning a tag to each keyword included in the preprocessed data; generating a first persona code using the tags included in the tagged data; generating a second persona code by clustering the preprocessed data based on a category pre-assigned to each survey question constituting the survey data; and generating a final persona code by combining the first persona code and the second persona code.

A method of providing a persona code according to some embodiments of the present invention generates preprocessed data from the received answer data for a user's survey, derives from the preprocessed data the user's persona code for each of a plurality of characteristics grouped into predetermined categories, and merges those codes, thereby providing a persona code that reflects the user's hyper-personalized data.

In addition, the present invention provides a method of preprocessing unstructured data, which differs in form from answer data, and generating a persona code using the resulting preprocessed data, so that the user's tendencies can be reflected in the persona code more precisely.

In addition, when the user's answer data is insufficient (that is, when the received user data contains missing data), the present invention generates a plurality of temporary persona codes using a pre-trained deep learning model, verifies the generated temporary persona codes using additional response data from the user, and provides the resulting persona code, thereby improving the prediction accuracy of the persona code.

In addition, the present invention trains a deep learning model by supervised learning on highly accurate training data consisting of persona codes derived from users' survey answer data and unstructured data, and can use the trained deep learning model to derive a highly accurate persona code for missing data.

In addition to the above, specific effects of the present invention are described below together with the specific details for carrying out the invention.

FIG. 1 is a conceptual diagram illustrating a system that performs a method for providing a persona code according to some embodiments of the present invention.
FIG. 2 is a block diagram illustrating an example configuration of the server of FIG. 1.
FIG. 3 is a flowchart illustrating a method for providing a persona code according to some embodiments of the present invention.
FIG. 4 is a block diagram illustrating a method for providing a persona code according to some embodiments of the present invention.
FIG. 5 is a diagram for explaining the first and second persona codes that make up the final persona code of FIG. 3.
FIG. 6 is a diagram for explaining the categories and contents indicated by the first and second personas of the final persona code of FIG. 5.
FIG. 7 is a diagram showing an example of survey data included in the categories of FIG. 6 and answer data corresponding to the survey data.
FIG. 8 is a flowchart showing the process of generating the preprocessed data of FIG. 3.
FIG. 9 is a block diagram showing the specific configuration of the preprocessing module that generates the preprocessed data of FIG. 8.
FIG. 10 is a diagram showing an example of the unstructured data of FIG. 8.
FIG. 11 is a diagram showing an example of structured data obtained by performing a preprocessing operation on the unstructured data of FIG. 10.
FIG. 12 is a flowchart showing one embodiment that details the process of generating the first persona code of FIG. 3.
FIG. 13 is a block diagram detailing the configuration of the tag module and the first persona code generation module of FIG. 12.
FIG. 14 is a flowchart showing another embodiment that details the process of generating the first persona code of FIG. 3.
FIG. 15 is a block diagram detailing the configuration of the tag module and the first persona code generation module of FIG. 14.
FIG. 16 is a diagram showing an example of assigning tags in FIG. 15.
FIG. 17 is a flowchart detailing the process of generating the second persona code of FIG. 3.
FIG. 18 is a block diagram showing the configuration of the second persona code generation module of FIG. 17.
FIG. 19 is a flowchart detailing the process of performing the clustering of FIG. 17.
FIG. 20 is a diagram for explaining the clustering process of FIG. 19.
FIG. 21 is a flowchart showing a method for generating a persona code when missing data is included, according to some embodiments of the present invention.
FIG. 22 is a block diagram illustrating a method for generating a persona code when missing data is included, according to some embodiments of the present invention.
FIG. 23 is a diagram for explaining the structure of the persona code generation deep learning module of FIG. 22.
FIG. 24 is a flowchart for explaining the training method of the persona code deep learning module of FIG. 22.
FIG. 25 is a block diagram for explaining the training method of the persona code deep learning module of FIG. 22.
FIG. 26 is a diagram for explaining the hardware configuration of a server that performs a persona code generation method according to some embodiments of the present invention.

Terms or words used in this specification and the claims should not be construed as limited to their ordinary or dictionary meanings. Following the principle that an inventor may define terms and concepts in order to describe his or her invention in the best way, they should be interpreted with meanings and concepts consistent with the technical idea of the present invention. In addition, the embodiments described in this specification and the configurations shown in the drawings are merely one embodiment in which the present invention is realized and do not represent the entire technical idea of the present invention, so it should be understood that various equivalents, modifications, and applicable examples that could replace them may exist at the time of filing of the present application.

Terms such as first, second, A, and B used in this specification and the claims may be used to describe various components, but the components should not be limited by these terms. The terms are used only to distinguish one component from another. For example, without departing from the scope of the present invention, a first component may be named a second component, and similarly a second component may be named a first component. The term 'and/or' includes a combination of a plurality of related listed items or any one of the plurality of related listed items.

The terms used in this specification and the claims are used only to describe specific embodiments and are not intended to limit the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In the present application, terms such as "include" or "have" should be understood as not precluding in advance the presence or possible addition of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification.

Unless otherwise defined, all terms used herein, including technical or scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the technical field to which the present invention pertains.

Terms such as those defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art, and are not to be interpreted in an idealized or excessively formal sense unless explicitly so defined in the present application.

In addition, each configuration, procedure, process, or method included in each embodiment of the present invention may be shared among the embodiments to the extent that they do not technically contradict one another.

Hereinafter, a method for providing a persona code and a system for performing the method according to some embodiments of the present invention will be described in detail with reference to FIGS. 1 to 26.

FIG. 1 is a conceptual diagram illustrating a system that performs a method for providing a persona code according to some embodiments of the present invention.

Referring to FIG. 1, a system according to some embodiments of the present invention includes a platform server 100 (hereinafter, the server) and a user terminal 200. Here, the server 100 may operate in conjunction with the user terminal 200 through a communication network 300.

The server 100 is the entity that performs the method for providing a persona code according to some embodiments of the present invention. However, the present invention is not limited thereto, and in some embodiments the user terminal 200 may of course perform part of the method for providing a persona code.

The user terminal 200 refers to a terminal that receives, from the user, the answer data for the survey and the unstructured data used in the method for providing a persona code offered by the server 100. The user terminal 200 may include an imaging device (e.g., a camera) that captures the user's appearance or surroundings. The user terminal 200 may also include a location tracking device (e.g., a GPS device) that determines the user's location information.

Specifically, in one embodiment of the present invention, the user terminal 200 may receive answer data and unstructured data from the user through an input interface. However, the foregoing describes only a few examples of how the user terminal 200 obtains and transmits answer data and unstructured data, and the present invention is not limited thereto.

The user terminal 200 may transmit to the server 100 the user's answer data for the pre-stored survey data and the unstructured data. Depending on the embodiment, the user terminal 200 may also transmit only one of the answer data and the unstructured data to the server 100.

Here, the user terminal 200 may be any of various electronic devices such as a personal computer (PC), a laptop, a tablet, a mobile phone, a smartphone, or a wearable device (for example, a watch-type terminal). The user terminal 200 may also include an input unit (not shown) that receives the user's input, a display unit (not shown) that displays visual information, a communication unit (not shown) that transmits and receives signals to and from the outside, and a control unit (not shown) that processes data, controls each unit inside the user terminal 200, and controls data transmission and reception between the units.

The server 100 may provide a persona code generation service through an online platform to users of the user terminal 200. This online service provided by the server 100 may be delivered through an application pre-installed on the user terminal 200.

Hereinafter, the configuration of the server 100 that performs the method for providing a persona code is described in detail.

FIG. 2 is a block diagram illustrating an example configuration of the server of FIG. 1.

Referring to FIG. 2, the server 100 of the present invention includes an interface 110, a database 120, a processor 130, and a memory 140. A preprocessing module (Preprocessing Module; hereinafter, PM), a tag module (Tagging Module; hereinafter, TM), a code generation module (Code Generating Module; hereinafter, CGM), and a verification module (Verifying Module; hereinafter, VM) may be loaded into the memory 140 and driven or executed by the processor 130. Each module may be stored, in the form of a computer program, in the database 120 or in storage (not shown) included in the server 100.

The code generation module (CGM) may include a first persona code generation module (Persona Code Generating Module 1; hereinafter, PCGM1), a second persona code generation module (Persona Code Generating Module 2; hereinafter, PCGM2), and a persona code generation deep learning module (Persona Code Deep Learning Module; hereinafter, PCDM).

In addition, depending on the embodiment, a training module (Training Module; hereinafter, TRM) may also be loaded into the memory 140 and driven or executed by the processor 130. Each embodiment is described in detail below.

First, the interface 110 exchanges data with the user terminal 200. The interface 110 may provide pre-stored survey data to the user terminal 200 and receive the user's answer data for the provided survey data, along with unstructured data. The interface 110 may then pass the received answer data and unstructured data to other components within the server 100. The interface 110 may include various communication modules and may exchange data with the user terminal 200 through the communication network 300.

The database 120 stores and manages the data received through the interface 110. The database 120 may also store, in program form, a plurality of modules that can be loaded into the memory 140 and used.

The database 120 may also store and manage not only the training data used to train the persona code generation deep learning module (PCDM) but also the weights of a backbone network composed of a plurality of layers. However, this is only one example, and the present invention is not limited thereto.

The processor 130 may execute software to control at least one other component (for example, a hardware or software component) of the server 100 and may perform various data processing and computation. For example, the processor 130 may load information, instructions, or data received from another component (for example, the interface 110) into the memory 140, perform operations using the loaded information, instructions, or data, and store the resulting data in the memory 140 or in storage (not shown).

The processor 130 may also derive keywords through web crawling, and may store in the database 120 a table that matches the derived keywords to user personality traits.

The memory 140 may store and load various data used by at least one component (for example, the processor 130) of the server 100. The data may include, for example, software and the input data or output data of instructions related to that software.

Accordingly, the processor 130 may load into the memory 140, and use, modules or instructions related to the various operations of the persona code providing method according to some embodiments of the present invention.

Specifically, the processor 130 may use the preprocessing module (PM) loaded in the memory 140 to preprocess the answer data or unstructured data received through the interface 110, and pass the result to the tag module (TM).

The tag module (TM) may then generate tagged data by assigning tags corresponding to the preprocessed data.

FIG. 3 is a flowchart illustrating a persona code providing method according to some embodiments of the present invention. FIG. 4 is a block diagram illustrating a persona code providing method according to some embodiments of the present invention. In the following description, the method is assumed, by way of example, to be performed by the server 100 or by the processor 130 included in the server 100.

First, referring to FIGS. 3 and 4, in the persona code providing method according to some embodiments of the present invention, the server 100 may receive from the user terminal 200 the user's answer data for pre-stored survey data, together with unstructured data (S100).

The answer data transmitted by the user terminal 200 may include multiple-choice, short-answer, and descriptive answers. The unstructured data may be collected while the user terminal 200 receives the user's responses to the survey data. Specifically, the unstructured data may include at least one of images, video, audio, and location information.

Specifically, the processor 130 of the server 100 may retrieve the pre-stored survey data from the database 120 and then provide the survey data to the user terminal 200. The survey data may include questions designed to identify the user's tendencies.

The processor 130 may receive the user's answer data for the survey data and the unstructured data. Both the answer data and the unstructured data may be received. Specifically, the answer data may be multiple-choice or short-answer answers, and the unstructured data may be data in a form other than text. The survey data, answer data, and unstructured data are described in detail later.

Next, the server 100 generates preprocessed data by performing a preprocessing operation on the received answer data and unstructured data (S200). Specifically, the preprocessing module (PM) of the server 100 may generate the preprocessed data using at least one of the answer data and the unstructured data. The preprocessed data may be data obtained by extracting keywords from the answer data and assigning weights to them. The preprocessed data may also be data obtained by converting the unstructured data into structured form and assigning weights to it.

Next, the server 100 may assign tags corresponding to the preprocessed data (S300). Specifically, the tag module (TM) of the server 100 may automatically assign tags to preprocessed data that represents the user's tendencies. When the preprocessed data is text, the tag module (TM) may use document analysis techniques to identify the preprocessed data that contains key text. For example, the tag module (TM) may use TF-IDF (Term Frequency-Inverse Document Frequency) to analyze how frequently a piece of text occurs and how rare it is within the document set, and thereby determine whether it is key text. The tag module (TM) may then generate tagged data by assigning tags to the key text.
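The TF-IDF step mentioned for S300 can be illustrated with a minimal sketch. The specification does not give the exact formula or a cutoff, so the unsmoothed `tf * log(N / df)` weighting, the function names, and the `threshold` parameter below are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_scores(documents):
    """Return one {term: tf-idf} dict per tokenized document.

    Assumption: plain tf * log(N / df), with no smoothing.
    """
    n_docs = len(documents)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    scores = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


def tag_main_terms(documents, threshold):
    """Keep terms whose TF-IDF exceeds a hypothetical threshold."""
    return [
        sorted(term for term, score in doc_scores.items() if score > threshold)
        for doc_scores in tfidf_scores(documents)
    ]


# Hypothetical tokenized answers from three respondents.
docs = [
    ["camping", "weekend", "friends"],
    ["coffee", "weekend", "reading"],
    ["coffee", "weekend", "work"],
]
main_terms = tag_main_terms(docs, threshold=0.3)
```

In this toy corpus, "weekend" appears in every document and therefore scores zero, while terms unique to a single answer score highest and are the ones that would receive tags.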

Next, the server 100 may generate a first persona code using the assigned tags (S400). The first persona code may be generated using the tags included in the tagged data.

Next, the server 100 may generate a second persona code by clustering the preprocessed data on the basis of the categories assigned to the survey data (S500). The category of each survey question may be designated in advance in the database 120. The first persona code and the second persona code may be generated in parallel. That is, the code generation module (CGM) of the server 100 may generate either the first persona code or the second persona code first, or may generate both at the same time.
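As a rough illustration of clustering on pre-designated question categories in S500, the sketch below simply groups preprocessed items by the category of the question they came from. The question identifiers, the `QUESTION_CATEGORY` table, and the tuple layout are hypothetical; the specification does not state which clustering procedure is used or how the second persona code is derived from the groups.

```python
from collections import defaultdict

# Hypothetical mapping from survey question ID to its pre-designated category.
QUESTION_CATEGORY = {
    "q1": "environment",
    "q2": "moment",
    "q3": "environment",
}


def cluster_by_category(items, question_category):
    """Group (question_id, keyword, weight) items by question category."""
    clusters = defaultdict(list)
    for question_id, keyword, weight in items:
        clusters[question_category[question_id]].append((keyword, weight))
    return dict(clusters)


# Hypothetical preprocessed items: keyword plus weight per question.
items = [
    ("q1", "married", 0.9),
    ("q2", "joy", 0.4),
    ("q3", "engineer", 0.7),
]
clusters = cluster_by_category(items, QUESTION_CATEGORY)
```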

Next, the server 100 may generate a final persona code by combining the first persona code and the second persona code (S600). The final persona code may include letters or digits, each of which may represent an aspect of the user's tendencies.
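One simple way to picture the combining step S600 is string concatenation. The specification only says that the two codes are combined and that the result may contain letters or digits, so the separator, the validation rule, and the example code values below are assumptions.

```python
def combine_persona_codes(first_code, second_code, separator="-"):
    """Join two persona codes into one final code.

    Assumption: each code is a non-empty string of letters and digits.
    """
    for code in (first_code, second_code):
        if not code.isalnum():
            raise ValueError(f"persona code must be alphanumeric: {code!r}")
    return f"{first_code}{separator}{second_code}"


# Hypothetical first and second persona codes.
final_code = combine_persona_codes("0110", "A3B1")
```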

FIG. 5 is a diagram for explaining the first and second persona codes that make up the final persona code of FIG. 3. FIG. 6 is a diagram for explaining the categories and contents represented by the first and second persona codes of the final persona code of FIG. 5.

Referring to FIG. 5, the final persona code may include at least one persona code, and the at least one persona code may include the first and second persona codes. Specifically, the first persona code may represent the user's tendencies related to one category. The categories may include a nature factor (Original Nature Factor), an environmental factor (Environmental Factor), and a moment factor (Moment Factor), and a single category may include a plurality of characteristics.

Specifically, referring to FIG. 6, the nature factor may refer to a factor that can characterize a person's personality on the basis of integrative psychology theory. That is, the nature factor may represent the aspects of the user's personality that can be analyzed with disposition theories. The nature factor may represent, for example, the MBTI (Myers-Briggs Type Indicator), DISC, or the Platinum Rule.

Specifically, the nature factor may use a plurality of behavioral characteristics in accordance with integrative psychology theory. That is, the nature factor may refer to long-term personality, reflecting a person's nature that does not change easily.

First, the nature factor may be divided into extroversion versus introversion, according to how a person interacts with others and where the person draws energy from. Next, the nature factor may be divided into sensing versus intuition, according to how a person gathers and understands information, and into thinking versus feeling, according to how a person uses logic and emotion when making decisions. The nature factor may further be divided into judging versus perceiving, according to a person's attitude toward the outside world and way of life. However, this embodiment is not limited thereto.

The first persona code and the second persona code may each include a plurality of sub-persona codes. Specifically, the first persona code may include first to fourth sub-persona codes. The first sub-persona code may indicate whether the user's personality is extroverted or introverted. For example, a first sub-persona code of 0 may mean that the user is extroverted, and a value of 1 may mean that the user is introverted.

The second sub-persona code may indicate whether the user relies on sensing or on intuition when gathering and understanding information. For example, a second sub-persona code of 0 may mean that the user relies on sensing, and a value of 1 may mean that the user relies on intuition.

The third sub-persona code may indicate whether the user relies on logic or on emotion when making decisions. For example, a third sub-persona code of 0 may mean that the user follows reason, and a value of 1 may mean that the user follows emotion. The fourth sub-persona code may indicate whether the user's attitude toward the outside world relies on judging or on perceiving. For example, a fourth sub-persona code of 0 may mean that the user's way of life relies on judging, and a value of 1 may mean that it relies on perceiving.
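The 0/1 convention described for the four sub-persona codes can be sketched as a small encoder and decoder. The 0/1 assignments follow the text; the label strings, the function names, and the choice to concatenate the four digits into a single string are illustrative assumptions.

```python
# (label for 0, label for 1) for the first to fourth sub-persona codes.
AXES = (
    ("extroverted", "introverted"),
    ("sensing", "intuition"),
    ("reason", "emotion"),
    ("judging", "perceiving"),
)


def encode_first_persona_code(traits):
    """Map four trait labels to a string of four 0/1 digits."""
    if len(traits) != len(AXES):
        raise ValueError("exactly four traits are required")
    digits = []
    for (zero_label, one_label), trait in zip(AXES, traits):
        if trait == zero_label:
            digits.append("0")
        elif trait == one_label:
            digits.append("1")
        else:
            raise ValueError(f"unknown trait: {trait!r}")
    return "".join(digits)


def decode_first_persona_code(code):
    """Map a string of four 0/1 digits back to trait labels."""
    if len(code) != len(AXES) or set(code) - {"0", "1"}:
        raise ValueError(f"invalid code: {code!r}")
    return [AXES[i][int(digit)] for i, digit in enumerate(code)]


traits = ["introverted", "sensing", "emotion", "judging"]
code = encode_first_persona_code(traits)
```

Under these assumptions, an introverted, sensing, emotion-led, judging user is encoded as `1010`, and decoding that string returns the same four labels.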

Referring again to FIG. 5, the second persona code may include the environmental factor and the moment factor. The environmental factor may characterize psychological attributes, such as values and desires, that change as the user's environment changes. The moment factor may characterize momentary emotions, tastes, and psychological tendencies that depend on a particular situation or trend. That is, the environmental factor may refer to aspects of the user's personality collected from a medium-term perspective, and the moment factor to aspects collected from a short-term perspective.

The second persona code may also include a plurality of sub-persona codes, each of which may correspond to one characteristic.

Specifically, referring to FIG. 6, the environmental factor may include a plurality of characteristics within its category. The characteristics included in the environmental factor may include types of values and desires relating to home (Home), work (Office), and social life (Social). For example, the environmental factor may include marital status and whether the user has children in relation to home, and the name of the workplace and the occupation in relation to work. In relation to social life, the environmental factor may include whether the user has a boyfriend or girlfriend.

The moment factor may likewise include a plurality of characteristics within its category. The characteristics included in the moment factor may include types of tendencies relating to emotion, situation, and taste. For example, the moment factor may include joy, sadness, and anger in relation to emotion, and weather and anniversaries in relation to situation. In relation to taste, the moment factor may include food preferences and the latest trends.

FIG. 7 is a diagram illustrating an example of survey data included in the categories of FIG. 6 and the answer data corresponding to that survey data.

Referring to FIGS. 2 and 7, the database 120 of the server 100 may store the characteristics corresponding to each category and the survey data corresponding to each characteristic. Specifically, when the categories are divided into the nature, environmental, and moment factors, the characteristics within the environmental factor may include home, children, work, and standard of living. For the home characteristic, the survey data may include questions about marital status, the time at which the user hopes to marry, and the reasons for wishing to marry. The answer data for marital status may be a multiple-choice answer selecting single or married. As another example, the answer data for the hoped-for time of marriage may be a short answer stating an age or a year. As yet another example, the answer data for the reasons for wishing to marry may be a descriptive answer setting out the user's views on marriage in detail.
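The category, characteristic, and question hierarchy described for FIG. 7 can be pictured as a small data structure. The class names, field names, and English question wording below are hypothetical; only the three answer types and the marriage-related examples come from the text.

```python
from dataclasses import dataclass
from enum import Enum


class AnswerType(Enum):
    MULTIPLE_CHOICE = "multiple_choice"
    SHORT_ANSWER = "short_answer"
    DESCRIPTIVE = "descriptive"


@dataclass(frozen=True)
class SurveyQuestion:
    category: str          # e.g. "environment"
    characteristic: str    # e.g. "home"
    text: str
    answer_type: AnswerType
    choices: tuple = ()    # only used for multiple-choice questions


# Hypothetical rows as they might be stored in the database 120.
SURVEY = [
    SurveyQuestion("environment", "home", "Are you married?",
                   AnswerType.MULTIPLE_CHOICE, ("single", "married")),
    SurveyQuestion("environment", "home", "When do you hope to marry?",
                   AnswerType.SHORT_ANSWER),
    SurveyQuestion("environment", "home", "Why do you want to marry?",
                   AnswerType.DESCRIPTIVE),
]


def questions_for(category, characteristic):
    """Return the questions stored for one characteristic of one category."""
    return [
        q for q in SURVEY
        if q.category == category and q.characteristic == characteristic
    ]


home_questions = questions_for("environment", "home")
```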

FIG. 8 is a flowchart illustrating the process of generating the preprocessed data of FIG. 3. FIG. 9 is a block diagram illustrating the detailed configuration of the preprocessing module that generates the preprocessed data of FIG. 8. FIG. 10 is a diagram illustrating an example of the unstructured data of FIG. 8. FIG. 11 is a diagram illustrating an example of structured data obtained by performing the preprocessing operation on the unstructured data of FIG. 10.

Referring to FIGS. 8 and 9, the preprocessing module (PM) used in some embodiments of the present invention may include a text mining model (TMM), a normalization model (NM), and a weight calculation model (WCM). The text mining model (TMM) of the preprocessing module (PM) may perform text mining on the answer data to generate extracted data consisting only of the main keywords contained in the answer data (S210).

The answer data may include multiple-choice or short-answer text, and may also include sentence-form text. The database 120 of the server 100 may store preset main keywords, including main keywords that indicate the user's tendencies. The text mining model (TMM) may crawl the answer data and perform text mining to derive the main keywords. Specifically, the text mining model (TMM) may use pandas, a Python library, to recognize the words contained in the answer data. The text mining model (TMM) may then compare the crawled words with a pre-stored table to recognize the main keywords contained in the answer data. The text mining may, for example, use a morphological analysis algorithm to compare the crawled words with the table pre-stored in the database 120. That is, the text mining model (TMM) may generate extracted data consisting only of the main keywords that appear in both the pre-stored table and the preprocessed data.
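The comparison of crawled words against a pre-stored keyword table in S210 can be sketched as follows. The specification mentions pandas and morphological analysis; this sketch deliberately substitutes a regular-expression tokenizer from the standard library so that it stays self-contained, and the table contents and function name are hypothetical.

```python
import re


def extract_keywords(answer_text, keyword_table):
    """Return the table keywords found in the answer, in order of first use.

    Assumption: lowercase word tokens stand in for morphological analysis.
    """
    tokens = re.findall(r"\w+", answer_text.lower())
    seen = set()
    extracted = []
    for token in tokens:
        if token in keyword_table and token not in seen:
            seen.add(token)
            extracted.append(token)
    return extracted


# Hypothetical pre-stored table of main keywords.
KEYWORD_TABLE = {"family", "stability", "travel", "career"}

extracted = extract_keywords(
    "I want to marry for stability and family; family matters.",
    KEYWORD_TABLE,
)
```

For Korean answers, a morphological analyzer would be needed in place of the regular expression, because word boundaries alone do not separate stems from particles.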

The pre-stored table may include neologisms and words with ambiguous meanings. That is, the database 120 may classify and store neologisms and ambiguous words separately.

Next, the normalization model (NM) included in the preprocessing module (PM) may generate structured data from the unstructured data (S220). Specifically, the server 100 may extract keywords related to the objects contained in the unstructured data and thereby generate structured data for the unstructured data.

Referring to FIG. 9, the unstructured data may include image data, video data, audio data, and location information data. The unstructured data may include data about the environment surrounding the user. However, these are only some examples of unstructured data, and the present invention is not limited thereto.

Specifically, referring to FIG. 10, the image data may include images that serve as a basis for determining the user's tendencies. The image data may include, for example, images of food or drinks that the user consumes. The image data may also include the clothing, shoes, and similar items worn by the user, as well as images of spaces that reflect the user's tastes. However, this embodiment is not limited thereto.

Referring again to FIG. 9, the audio data may include music that the user prefers or ambient noise. Likewise, the video data may include the user's surroundings.

The unstructured data may also include data representing the user's reactions. Specifically, the image data or video data may capture emotions such as joy, sadness, surprise, anger, fear, and embarrassment in the user's reactions. That is, the image data or video data may include non-verbal expressions (for example, posture, facial expression, and gaze).

The audio data may also include the user's speaking habits or the level of language the user uses. Specifically, the audio data may include paralinguistic expressions (for example, pronunciation, articulation, pitch, tone, and speed). The location data may include places that the user prefers or the places where the user mainly spends money. In other words, the unstructured data may include data that relates to the user's tendencies and has a form other than text. However, this embodiment is not limited thereto.

The normalization model (NM) may generate structured data from unstructured data. Specifically, the normalization model (NM) may express unstructured data as text; that is, structured data is data that expresses the content of unstructured data as text. Referring to FIG. 11, when the unstructured data is image data of the user, the normalization model (NM) may derive measurements of each part of the face from the image data as structured data. The structured data may also include an emotion derived from a combination of the measurements of each part of the face.

Referring again to FIG. 8, the weight calculation model (WCM) included in the preprocessing module (PM) may calculate weights for the extracted data and the structured data (S230). In other words, the weights may be calculated on the basis of the reliability of each response to the survey data contained in the answer data and the importance of the keywords contained in the structured data.

Specifically, the reliability of each response contained in the answer data may be determined according to the value of the data. That is, reliability may be determined through continuous data collection and verification. For example, when there are two or more survey questions about the same characteristic in the same category, the reliability may be set low if the user gives contradictory answers to those questions, and set high if the user gives consistent answers.

As another example, the reliability of each response may be determined by how many times and how recently the answer data has been updated. If the answer data is updated frequently and was updated recently, the reliability may be set high. Conversely, if the answer data is rarely updated and was last updated long ago, the reliability may be set low.
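The two reliability criteria described for the answer data (consistency across related questions, and update count and recency) can be combined in many ways. The sketch below is one hypothetical scoring scheme: the equal weighting, the 180-day half-life, and the saturation at ten updates are assumptions and do not come from the specification.

```python
from datetime import date, timedelta


def consistency_score(answers):
    """Fraction of answers that agree with the most common answer."""
    if not answers:
        raise ValueError("at least one answer is required")
    most_common = max(set(answers), key=answers.count)
    return answers.count(most_common) / len(answers)


def recency_score(last_update, today, half_life_days=180):
    """Halve the score every `half_life_days` since the last update."""
    age_days = (today - last_update).days
    return 0.5 ** (age_days / half_life_days)


def frequency_score(update_count, saturation=10):
    """Grow linearly with the update count, capped at 1.0."""
    return min(update_count, saturation) / saturation


def reliability(answers, update_count, last_update, today):
    """Average the three hypothetical component scores."""
    return (
        consistency_score(answers)
        + frequency_score(update_count)
        + recency_score(last_update, today)
    ) / 3


today = date(2024, 1, 17)
consistent = reliability(["married", "married"], 5, today, today)
contradictory = reliability(["married", "single"], 5, today, today)
```

With identical update histories, the consistent respondent scores higher than the contradictory one, which matches the ordering the text describes.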

Importance, on the other hand, may be derived from the keywords extracted from the unstructured data. The importance of a word contained in the structured data may be determined according to the type of unstructured data. Specifically, when the unstructured data is image data, importance may be determined by the proportion of the image occupied by the object corresponding to the keyword derived from the image data. For example, if the object corresponding to the keyword occupies most of the image data, its importance may be set high. Conversely, if the object occupies only a small part of the image data, its importance may be set low.
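For image data, the idea that importance follows the share of the image occupied by the keyword's object can be sketched with bounding boxes. The box format `(x0, y0, x1, y1)`, the function name, and the use of a plain area fraction are assumptions; the specification does not say how the occupied proportion is measured.

```python
def keyword_importance(image_size, boxes):
    """Return {keyword: fraction of the image area covered by its box}.

    `boxes` maps each keyword to a hypothetical (x0, y0, x1, y1) box
    in pixel coordinates.
    """
    width, height = image_size
    image_area = width * height
    importance = {}
    for keyword, (x0, y0, x1, y1) in boxes.items():
        box_area = max(0, x1 - x0) * max(0, y1 - y0)
        importance[keyword] = box_area / image_area
    return importance


# Hypothetical detections in a 100 x 100 pixel image.
importance = keyword_importance(
    (100, 100),
    {"coffee": (0, 0, 80, 100), "book": (80, 0, 100, 50)},
)
```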

As another example, importance may be determined by how many times and how recently the unstructured data has been updated. Specifically, if the unstructured data is updated frequently and was updated recently, the importance may be set high. Conversely, if the unstructured data is rarely updated and was last updated long ago, the importance may be set low.

Next, the preprocessing module (PM) may generate preprocessed data that includes the calculated weights (S240). In other words, it may generate preprocessed data that includes the extracted data, the structured data, and their weights. Specifically, for answer data, the preprocessed data may include the extracted data and a weight based on the reliability of each response. For unstructured data, the preprocessed data may include the structured data and a weight based on the importance of the keywords.

In summary, the preprocessed data may include the extracted data derived from the answer data together with a first weight for it, and the structured data derived from the unstructured data together with a second weight for it. The first weight and the second weight may, of course, be adjusted based on a predetermined ratio between the answer data and the unstructured data. However, this is only one example of the present invention, and the weights included in the preprocessed data may be calculated and used in various ways.
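One possible reading of the ratio-based adjustment is sketched below. The record type `WeightedItem`, the function `apply_source_ratio`, and the rule of scaling first weights by the ratio and second weights by its complement are all assumptions made for illustration; the description only states that the weights may be adjusted by a predetermined ratio.

```python
from dataclasses import dataclass


@dataclass
class WeightedItem:
    keyword: str
    source: str    # "answer" (first weight) or "unstructured" (second weight)
    weight: float


def apply_source_ratio(items, answer_ratio=0.5):
    """Scale answer-data weights by answer_ratio and the rest by 1 - answer_ratio."""
    if not 0.0 <= answer_ratio <= 1.0:
        raise ValueError("answer_ratio must lie in [0, 1]")
    scaled = []
    for item in items:
        factor = answer_ratio if item.source == "answer" else 1.0 - answer_ratio
        scaled.append(WeightedItem(item.keyword, item.source, item.weight * factor))
    return scaled
```

With `answer_ratio=0.75`, for instance, survey answers would count three times as much as unstructured data of the same raw weight.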

FIG. 12 is a flowchart illustrating one embodiment that details the process of generating the first persona code of FIG. 3. FIG. 13 is a block diagram detailing the configuration of the tag module and the first persona code generation module of FIG. 12.

Referring to FIGS. 12 and 13, the first persona code generation module (PCGM1) may include first to fourth score calculation models (SCM1 to SCM4) and a score correction model (SRM). The tag module (TM) may generate first tagged data by assigning, to the preprocessed data, first tags related to a first characteristic included in a first category (S410). The categories of user tendencies may include first to fourth categories. One category may include at least one characteristic. That is, the first category may include first to fourth characteristics.

The tag module (TM) may assign tags according to pre-stored categories and characteristics. Tags may be assigned automatically to the main keywords included in the preprocessed data. Specifically, the tag module (TM) may generate first tagged data by assigning first tags related to the first characteristic to the preprocessed data. The first tagged data may then be transmitted to the first persona code generation module (PCGM1).

Here, the first persona code generation module (PCGM1) may include the first to fourth score calculation models (SCM1 to SCM4) and the score correction model (SRM). Among these, the first score calculation model (SCM1) may receive the first tagged data.

Subsequently, the first score calculation model (SCM1) may calculate a first score for the first characteristic in consideration of the weights of the tagged preprocessed data (S420). The first score may be the value obtained by multiplying the first tagged data by the weight. Alternatively, when the weight is an integer, each item of first tagged data may be regarded as occurring as many times as its weight, and the first score may be expressed as the resulting number of items of first tagged data. However, embodiments of the present invention are not limited thereto.

Subsequently, the tag module (TM) may generate second tagged data by assigning, to the preprocessed data, second tags related to a second characteristic that is included in the first category and differs from the first characteristic (S430). The second tagged data may then be transmitted to the first persona code generation module (PCGM1).

That is, the second score calculation model (SCM2) may receive the second tagged data. Likewise, the third score calculation model (SCM3) may receive third tagged data, and the fourth score calculation model (SCM4) may receive fourth tagged data.

Subsequently, a second score for the second characteristic may be calculated in consideration of the weights of the preprocessed data for the keywords to which the second tags are assigned (S440).

The second score may be the value obtained by multiplying the second tagged data by the weight. Alternatively, when the weight is an integer, each item of second tagged data may be regarded as occurring as many times as its weight, and the second score may be expressed as the resulting number of items. Likewise, the third and fourth scores may be calculated based on the third and fourth tagged data, respectively.
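The two descriptions of the score (a weighted value, or a count in which each item occurs as many times as its integer weight) agree when the score is taken as the sum of the weights of the tagged items. The sketch below assumes that reading; the `(tag, weight)` pair representation and the function names are hypothetical.

```python
def characteristic_score(tagged_items, tag):
    """Sum the weights of every item carrying `tag`."""
    return sum(weight for item_tag, weight in tagged_items if item_tag == tag)


def characteristic_score_by_count(tagged_items, tag):
    """Integer-weight view: repeat each item `weight` times, then count the items."""
    expanded = [item_tag for item_tag, weight in tagged_items for _ in range(weight)]
    return expanded.count(tag)
```

For integer weights both functions return the same number, which is why the description can treat the two formulations interchangeably.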

Subsequently, the score correction model (SRM) may correct the calculated first score and second score based on user information entered in advance by the user (S450). Specifically, the user information may refer to personal information that the user entered when registering as a member. For example, the user information may include the user's gender, name, age, or address. However, in some embodiments of the present invention, step S450 may of course be omitted.

Subsequently, a first sub-persona code and a second sub-persona code corresponding to the first score and the second score, respectively, may be assigned (S460). Specifically, the first sub-persona code may be a single digit or a single letter corresponding to the first score. Likewise, the second sub-persona code may be a single digit or a single letter corresponding to the second score.

Meanwhile, the first persona code generation module (PCGM1) may generate a third sub-persona code and a fourth sub-persona code corresponding to the third score and the fourth score, respectively. However, embodiments of the present invention are not limited thereto.

Subsequently, the first persona code may be generated by combining the first and second sub-persona codes (S470). Additionally, the first persona code may be generated by further combining the third and fourth sub-persona codes with the first and second sub-persona codes. In this case, the first persona code may include the first to fourth sub-persona codes.
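Steps S460 and S470 can be illustrated with a short sketch. The description does not say how a score maps to a single character, so the threshold table used here is purely an assumption, as are the function names.

```python
def sub_persona_code(score, thresholds=(1.0, 3.0, 5.0)):
    """Map a score to one digit: the number of thresholds the score reaches.

    The threshold values are hypothetical; any score-to-symbol table would do.
    """
    return str(sum(score >= t for t in thresholds))


def first_persona_code(scores):
    """Concatenate the sub-persona codes of the given scores in order (S470)."""
    return "".join(sub_persona_code(s) for s in scores)
```

Passing two scores gives a two-character code, and passing the first to fourth scores gives the four-character variant mentioned above.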

FIG. 14 is a flowchart illustrating another embodiment that details the process of generating the first persona code of FIG. 3. FIG. 15 is a block diagram detailing the configuration of the tag module and the first persona code generation module of FIG. 14. FIG. 16 is a diagram showing an example of assigning tags in FIG. 15. In the following, content that overlaps with the description above is omitted and the explanation focuses on the differences.

Referring to FIGS. 14 to 16, topic-specific tags may be assigned to the preprocessed data (S415). The server 100 may assign preset topic-specific tags to the keywords included in the preprocessed data. Here, a preset topic may be a classification criterion by which keywords can be grouped. In other words, the preset topics may include the first and second characteristics. In addition, the database 120 of the server 100 may store in advance the topics that can be associated with keywords.

For example, if the preprocessed data includes 'lively', 'sociable', 'friendly', and 'emotional' as keywords, the topic may be personality. As another example, if the preprocessed data includes 'night owl', 'many gatherings', 'lives with family', 'Pilates', and 'enjoys domestic travel' as keywords, the topic may be lifestyle.

Subsequently, the topic-specific tags assigned to the preprocessed data may be classified into the first characteristic or the second characteristic (S425). The tag classification and scoring model (TDSM) may classify the tagged data by characteristic. Specifically, for each topic, the tag classification and scoring model (TDSM) may classify the content of the preprocessed data corresponding to the tagged data by characteristic. Here, the first characteristic and the second characteristic may represent different characteristics. For example, the first characteristic may be personality and the second characteristic may be lifestyle.

In some embodiments, a plurality of topics may represent a single characteristic. For example, when the topics are personality and lifestyle, the two topics together may represent one characteristic, namely whether the user's tendency is extroverted or introverted.

Subsequently, a first score for the first characteristic may be calculated in consideration of the weights of the preprocessed data tagged for the first characteristic (S435). The tag classification and scoring model (TDSM) may calculate the first score in consideration of the weights of the preprocessed data. Here, the first score may be the value obtained by multiplying the number of tags in the majority class by the weight.

Specifically, when the first characteristic is personality, there may be keywords indicating an extroverted tendency and keywords indicating an introverted tendency. For example, 'lively', 'sociable', and 'friendly' may be classified as keywords indicating an extroverted tendency, while 'emotional' and 'strongly opinionated' may be classified as keywords indicating an introverted tendency. In this case, there are three tags corresponding to extroverted keywords and two tags corresponding to introverted keywords, so the extroverted tags form the majority. The first score may then be 3 × (weight). However, embodiments of the present invention are not limited thereto.
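The majority-based score of step S435 can be reproduced with the personality example above. The dictionary layout and the function name `majority_score` are assumptions for illustration, and a single common weight is assumed for all keywords.

```python
from collections import Counter


def majority_score(keyword_tendencies, weight):
    """Return (majority tendency, number of majority tags * weight)."""
    counts = Counter(keyword_tendencies.values())
    tendency, count = counts.most_common(1)[0]
    return tendency, count * weight


personality = {
    "lively": "extroverted",
    "sociable": "extroverted",
    "friendly": "extroverted",
    "emotional": "introverted",
    "strongly opinionated": "introverted",
}
```

Calling `majority_score(personality, 2)` selects the extroverted class with three tags and multiplies that count by the weight, matching the 3 × (weight) form given in the text.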

Subsequently, a second score for the second characteristic may be calculated in consideration of the weights of the preprocessed data tagged for the second characteristic (S445). Likewise, the tag classification and scoring model (TDSM) may calculate the second score in consideration of the weights of the preprocessed data. Here, the second score may be the value obtained by multiplying the number of tags in the majority class by the weight.

Specifically, when the second characteristic is lifestyle, there may be keywords indicating a social tendency and keywords indicating an independent tendency. For example, 'many gatherings', 'lives with family', and 'enjoys domestic travel' may be classified as keywords indicating a social tendency. In contrast, 'night owl' and 'Pilates' may be classified as keywords indicating an independent tendency. However, embodiments of the present invention are not limited thereto.

In this case, there are three tags corresponding to keywords indicating a social tendency and two tags corresponding to keywords indicating an independent tendency, so the tags indicating a social tendency form the majority. The second score may then be 3 × (weight).

Subsequently, the server 100 may perform steps S450 to S470 described above.

Hereinafter, a method of generating the second persona code in some embodiments of the present invention is described in detail.

FIG. 17 is a flowchart detailing the process of generating the second persona code of FIG. 3. FIG. 18 is a block diagram showing the configuration of the second persona code generation module of FIG. 17.

Referring to FIGS. 17 and 18, the second persona code generation module (PCGM2) may include a classification model (DM) and at least one sub-persona code generation module. Specifically, the second persona code generation module (PCGM2) may include first to eighth sub-persona code generation modules (SPCM1 to SPCM8).

The server 100 may classify the preprocessed data according to each predetermined characteristic included in the second category (S510). That is, the server 100 may classify the preprocessed data according to each predetermined characteristic included in the second category of the final persona code.

Here, the database 120 of the server 100 may hold at least one characteristic for the second category. The second category may include, for example, fifth to twelfth characteristics. The fifth to twelfth characteristics may be characteristics relating to the environmental factors and momentary factors described above with reference to FIG. 6, but the present invention is not limited thereto.

Subsequently, the classification model (DM) may classify the received preprocessed data so that it corresponds to each of the fifth to twelfth characteristics.

Subsequently, the server 100 may cluster highly similar data among the classified data, using a different clustering model for each characteristic (S520).

Subsequently, the server 100 may derive a sub-persona code for the clusters formed for each characteristic (S530). Specifically, each sub-persona code generation module may derive the sub-persona code for its characteristic from the cluster containing the most data. For example, the first sub-persona code generation module (SPCM1) may derive the fifth sub-persona code, corresponding to the fifth characteristic, from the cluster containing the most data. In the same manner, the sixth to twelfth sub-persona codes may be derived by the second to eighth sub-persona code generation modules (SPCM2 to SPCM8), respectively.

Subsequently, the server 100 may generate the second persona code by merging the sub-persona codes derived for each characteristic (S540). That is, the second persona code generation module (PCGM2) may generate the second persona code by merging the fifth to twelfth sub-persona codes corresponding to the fifth to twelfth characteristics. Here, the second persona code generation module (PCGM2) may merge the fifth to twelfth sub-persona codes in sequence. However, embodiments of the present invention are not limited thereto.
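Steps S530 and S540 can be sketched as follows: for each characteristic, take the cluster with the most members, look up its code, and concatenate the codes in characteristic order. The lookup table `code_of_cluster` and both function names are hypothetical, since the description does not state how a cluster is translated into a code symbol.

```python
from collections import Counter


def largest_cluster(labels):
    """Return the label of the cluster that holds the most data points."""
    return Counter(labels).most_common(1)[0][0]


def second_persona_code(labels_by_characteristic, code_of_cluster):
    """Merge one sub-persona code per characteristic, in ascending characteristic order."""
    parts = []
    for characteristic in sorted(labels_by_characteristic):
        cluster = largest_cluster(labels_by_characteristic[characteristic])
        parts.append(code_of_cluster[characteristic][cluster])
    return "".join(parts)
```

With characteristics numbered 5 to 12 as dictionary keys, the result is an eight-character string ordered from the fifth to the twelfth sub-persona code.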

FIG. 19 is a flowchart detailing the process of performing the clustering of FIG. 17. FIG. 20 is a diagram for explaining the clustering process of FIG. 19.

Referring to FIGS. 18 to 20, each sub-persona code generation module may include a clustering coefficient calculation model and a clustering model. The following description takes the first sub-persona code generation module (SPCM1) as an example.

In some embodiments, the first sub-persona code generation module (SPCM1) may include a first clustering coefficient calculation model (CVM1) and a first clustering model (CM1). The first clustering coefficient calculation model (CVM1) may receive the preprocessed data for the fifth characteristic.

Subsequently, the server 100 may calculate a silhouette coefficient for the result of clustering the data input to the clustering model for a specific characteristic (S521). In other words, the silhouette coefficient may be calculated for the result of clustering the data input to the clustering model for the fifth characteristic (step (a)). Specifically, the first clustering coefficient calculation model (CVM1) may calculate the silhouette coefficient corresponding to the clustering model for the fifth characteristic. The silhouette coefficient is an index for evaluating a clustering result and takes a value between -1 and 1. The closer the silhouette coefficient is to 1, the farther apart the clusters are, meaning that the clustering is good. The closer it is to 0, the closer the clusters are to one another. A negative silhouette coefficient means that the clustering was done incorrectly.
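For reference, the silhouette coefficient can be computed with the standard definition: for each point, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to the members of any other cluster, and the point's value is (b - a) / max(a, b). The sketch below uses Euclidean distance and plain Python; the function name and the tuple representation of data points are assumptions, and the description does not state which distance measure is used.

```python
from math import dist
from statistics import mean


def silhouette_coefficient(points, labels):
    """Mean silhouette value over all points (between -1 and 1)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")

    values = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            values.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the point's distance to itself is 0, so divide by len(own) - 1)
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            mean(dist(point, q) for q in members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        values.append((b - a) / denom if denom > 0 else 0.0)
    return mean(values)
```

Two well-separated groups labelled correctly give a value close to 1, while labels that mix the groups give a negative value, in line with the interpretation above.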

Subsequently, the server 100 may select a clustering coefficient (k) for which the silhouette coefficient is at or above a preset reference value (S523, step (b)). Here, the clustering coefficient (k) may mean the number of clusters. The server 100 may select the clustering coefficient (k) using a graph. Specifically, the server 100 may use an algorithm to take, as the clustering coefficient (k), the number of clusters in the interval where the sum of distances between clusters decreases sharply. For example, a clustering coefficient (k) for which the silhouette coefficient is 0.7 or more may be selected. The first clustering coefficient calculation model (CVM1) may transmit the selected clustering coefficient (k1) to the first clustering model (CM1). However, embodiments of the present invention are not limited thereto.
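The threshold-based selection in step (b) might look like the sketch below. It assumes the silhouette coefficient has already been computed for each candidate k, and it breaks ties among eligible candidates by taking the highest coefficient; that tie-breaking rule and the function name `select_k` are assumptions, since the description only requires the coefficient to reach the reference value.

```python
def select_k(silhouette_by_k, threshold=0.7):
    """Return the k with the highest silhouette among those meeting the threshold.

    Returns None when no candidate reaches the threshold.
    """
    eligible = {k: s for k, s in silhouette_by_k.items() if s >= threshold}
    if not eligible:
        return None
    return max(eligible, key=eligible.get)
```

The `None` result leaves room for a fallback, such as the graph-based choice of k mentioned in the same step.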

Subsequently, the server 100 may finalize the clustering of the data for the specific characteristic based on the selected clustering coefficient (S525). Specifically, the clustering of the data for the fifth characteristic may be finalized based on the selected clustering coefficient (step (c)). For example, the first clustering model (CM1) may generate a clustering graph for the fifth characteristic according to the clustering coefficient (k1).

Subsequently, the server 100 may repeat steps (a) to (c) in the same manner for characteristics other than the fifth characteristic. That is, the server 100 may perform steps (a) to (c) for the sixth to twelfth characteristics to generate the respective sub-persona codes.

In this way, the server 100 may generate the second persona code by merging the sub-persona codes derived for each characteristic.

Through the method described above, the persona code providing method according to some embodiments of the present invention generates preprocessed data from the received answer data to the user's survey, derives from the preprocessed data the user's persona codes corresponding to a plurality of characteristics divided into predetermined categories, and merges them, thereby providing a persona code that reflects the user's hyper-personalized data.

FIG. 21 is a flowchart showing a method of generating a persona code when missing data is present, according to some embodiments of the present invention. FIG. 22 is a block diagram for explaining the method of generating a persona code when missing data is present, according to some embodiments of the present invention.

Referring to FIGS. 21 and 22, the preprocessing module (PM) of the server 100 may determine whether missing data exists in the data received from the user terminal 200 (S710). Specifically, the received data may include answer data and unstructured data. Missing data may mean survey items provided to the user to which the user did not respond. For example, when the first to twelfth sub-persona codes need to be derived, there may be no data from which the third sub-persona code can be derived. In that case, the missing data may be the answer data or unstructured data needed to derive the third sub-persona code.
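The check in step S710 can be illustrated with a minimal sketch that reports which characteristics have no usable data. The mapping from characteristic number to a list of data items and the function name `missing_characteristics` are assumptions made for illustration.

```python
def missing_characteristics(data_by_characteristic, required=range(1, 13)):
    """Return the characteristic numbers for which no data was received."""
    return [c for c in required if not data_by_characteristic.get(c)]
```

An empty result means every sub-persona code can be derived directly; a result such as `[3]` corresponds to the example in which the third sub-persona code cannot be derived.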

When missing data exists, the server 100 may input the preprocessed data for the received data into a deep learning module trained in advance using training data that contains no missing data (S720). The training data may include answer data and unstructured data. Specifically, the training data may include answer data and unstructured data from which the sub-persona codes for every characteristic can be derived. For example, the training data may be data from which sub-persona codes for all of the first to twelfth characteristics can be derived.

Subsequently, the server 100 may receive at least one temporary persona code from the deep learning module (S730). Specifically, at least one temporary persona code may be received from the deep learning module as the output for the preprocessed data. Here, a temporary persona code may be a persona code generated for received data that contains missing data.

There may also be a plurality of temporary persona codes, and these may differ from one another. For example, when data for the third characteristic is missing, the sub-persona code corresponding to the third characteristic may be generated arbitrarily. That is, the plurality of temporary persona codes may differ in the sub-persona code corresponding to the third characteristic.

Meanwhile, the sub-persona code for the missing data may be generated using previously provided training data. That is, the sub-persona code for the missing data may be generated using the pre-trained deep learning module. The trained deep learning module is described in detail later.

Subsequently, the server 100 may derive survey data related to the received at least one temporary persona code (S740). According to some embodiments, the survey data may be selected based on the plurality of temporary persona codes received from the deep learning module. That is, the survey data may be derived based on the answer data and unstructured data of users who have the same tendencies as a temporary persona code.

Subsequently, the server 100 may provide the survey to the user terminal 200 and receive response data for the provided survey data from the user terminal 200 (S750). Specifically, the response data may be in the form of text, images, video, or audio responding to the provided survey data.

Subsequently, the server 100 may derive the reliability of the temporary persona code based on the received response data and determine the final persona code based on that reliability (S760). Specifically, the persona code generation deep learning module (PCDM) may transmit the temporary persona code to the verification module (VM). The verification module (VM) may also receive the user's response data.

Meanwhile, step S760 may be a step of checking the accuracy of the temporary persona code. That is, in step S760 it may be checked whether the user's answer predicted from the temporary persona code matches the response data received from the user. The verification module (VM) may then generate the final persona code if the user's response data and the temporary persona code do not contradict each other.

Conversely, if the user's response data and the temporary persona code contradict each other, the verification module (VM) may modify the temporary persona code to agree with the user's response data. The verification module (VM) may then generate the final persona code using the modified temporary persona code.
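The verification logic of step S760 can be sketched as below. The sketch assumes that each response has already been converted into the sub-persona code it implies for a given position, and it measures reliability as the share of responses that agree with the temporary code; both the representation and the reliability formula are assumptions, as the description does not define them.

```python
def verify_temporary_code(temp_code, response_codes):
    """Return (final persona code, reliability of the temporary code).

    response_codes maps a position in the code to the sub-persona code
    implied by the user's response (hypothetical representation).
    """
    final = list(temp_code)
    contradictions = 0
    for position, implied in response_codes.items():
        if final[position] != implied:
            final[position] = implied   # modify the code to agree with the response
            contradictions += 1
    total = len(response_codes)
    reliability = 1.0 - contradictions / total if total else 1.0
    return "".join(final), reliability
```

When no response contradicts the temporary code, it is returned unchanged as the final code; otherwise only the contradicted positions are overwritten.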

FIG. 23 is a diagram for explaining the structure of the persona code generation deep learning module of FIG. 22. FIG. 24 is a flowchart for explaining the training method of the persona code generation deep learning module of FIG. 22. FIG. 25 is a block diagram for explaining the training method of the persona code generation deep learning module of FIG. 22.

Referring to FIGS. 23 to 25, the persona code generation deep learning module (PCDM) used in some embodiments of the present invention may output a final persona code for the input data using an artificial neural network trained on big data.

The persona code generation deep learning module (PCDM) is implemented as a deep learning module and may train the artificial neural network using mapping data for separate parameters derived from the input data. At this time, the deep learning module may perform machine learning on the parameters input as learning factors.

To explain in more detail, deep learning, which is a type of machine learning, learns from data by descending through multiple stages to a deep level.

Deep learning refers to a set of machine learning algorithms that extract key data from a plurality of data while moving up through successive levels.

The persona code generation deep learning module (PCDM) may use any of various known deep learning structures. For example, the persona code generation deep learning module (PCDM) may use a structure such as a CNN (Convolutional Neural Network), an RNN (Recurrent Neural Network), a DBN (Deep Belief Network), or a GNN (Graph Neural Network).

Meanwhile, the artificial neural network of the persona code generation deep learning module (PCDM) may be trained by adjusting the weights of the connections between nodes (and, if necessary, the bias values) so that a desired output is produced for a given input. In addition, the artificial neural network may continuously update the weight values through training. A method such as back propagation may be used to train the artificial neural network.
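For illustration only, the weight and bias adjustment described above can be shown in its simplest form, a single sigmoid neuron whose gradient is obtained with the chain rule (the one-layer case of back propagation). The function name `train_neuron`, the squared-error loss, the learning rate, and the epoch count are assumptions for this sketch and are not taken from the specification.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(samples, lr=0.5, epochs=2000):
    """Fit y = sigmoid(w * x + b) by gradient descent on a squared-error loss."""
    w, b = 0.0, 0.0  # initial values before any training
    for _ in range(epochs):
        for x, target in samples:
            y = sigmoid(w * x + b)
            # Chain rule for L = 0.5 * (y - target)^2:
            # dL/dz = (y - target) * y * (1 - y)
            delta = (y - target) * y * (1.0 - y)
            w -= lr * delta * x  # weight update
            b -= lr * delta      # bias update
    return w, b
```

Each pass nudges the weight and the bias in the direction that reduces the gap between the actual and the desired output, which is the behavior the paragraph above attributes to training.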

At this time, an artificial neural network trained in advance through machine learning may be installed in the memory 140 of the server 100. That is, the memory 140 may store the data used for machine learning, the result data, and the like.

In some embodiments of the present invention, the persona code generation deep learning module (PCDM) may include a deep learning model based on a transformer or an autoencoder that includes an encoder (EN) and a decoder (DE). However, these are only some examples of the present invention, and the present invention is not limited thereto.

Referring to FIG. 23, the persona code generation deep learning module (PCDM) includes an input layer (Input) having the preprocessed data as input nodes, an output layer (Output) having the persona code (or the temporary persona code) as output nodes, and M hidden layers disposed between the input layer and the output layer.

Here, a weight may be set on each edge connecting the nodes of the layers. These weights, and the edges themselves, may be added, removed, or updated during the training process. Accordingly, through the training process, the weights of the nodes and edges disposed between the k input nodes and the i output nodes may be updated.

Before the persona code generation deep learning module (PCDM) performs training, all nodes and edges may be set to initial values. As information is input cumulatively, however, the weights of the nodes and edges change, and in this process the parameters input as learning factors (i.e., the preprocessed data) may be matched with the values assigned to the output nodes (i.e., the final persona code).
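The layer structure described with reference to FIG. 23 may be pictured, purely as a sketch, as a fully connected network with k input nodes, M hidden layers, and i output nodes. The function names, the random initialization range, and the tanh and softmax activations below are assumptions chosen for the example; the specification does not fix any of them.

```python
import math
import random


def init_network(k: int, hidden_sizes: list, i: int, seed: int = 0):
    """Create initial edge weights and biases for k inputs, M hidden layers, i outputs."""
    rng = random.Random(seed)
    sizes = [k, *hidden_sizes, i]
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, x):
    """Propagate x through the network: tanh on hidden layers, softmax on the output."""
    for idx, (weights, biases) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        if idx == len(layers) - 1:
            m = max(z)
            exps = [math.exp(v - m) for v in z]
            total = sum(exps)
            x = [e / total for e in exps]  # one probability per candidate persona code
        else:
            x = [math.tanh(v) for v in z]
    return x
```

For example, `init_network(4, [8, 8, 8], 3)` would correspond to k = 4, M = 3, and i = 3, and `forward` then returns one score per output node for a given preprocessed input vector.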

Additionally, when a cloud server (not shown) is used, the persona code generation deep learning module (PCDM) may receive and process a large number of parameters. Accordingly, the persona code generation deep learning module (PCDM) may perform training based on a vast amount of data.

The weights of the nodes and edges between the input nodes and the output nodes constituting the persona code generation deep learning module (PCDM) may be updated through the training process of the persona code generation deep learning module (PCDM). In addition, the parameters output from the persona code generation deep learning module (PCDM) may of course be extended to various other data in addition to the summary data.

Both semi-supervised learning and supervised learning may be used as the machine learning method of the persona code generation deep learning module (PCDM). In addition, depending on its settings, the persona code generation deep learning module (PCDM) may be controlled to automatically update the artificial neural network structure after training so as to output more accurate summary data.

Additionally, although not clearly shown in the drawings, in some other embodiments of the present invention the operation of the persona code generation deep learning module (PCDM) may be performed in conjunction with the server 100 or a separate cloud server (not shown).

Referring to FIGS. 24 and 25, the server 100 may store, in a database, the preprocessed data for received data that does not include missing data, together with the final persona code derived from that preprocessed data (S810).

Here, the preprocessed data and the final persona code may be derived through some embodiments of the present invention described above with reference to FIGS. 3 to 20, and may be used as training data for training the persona code generation deep learning module (PCDM). However, this is only one example of the training data of the persona code generation deep learning module (PCDM), and the present invention is not limited thereto.

Next, the server 100 may load the training data, which consists of the preprocessed data stored in the database and the corresponding final persona codes (S820).

Next, the persona code generation deep learning module (PCDM) may be trained by supervised learning by applying the loaded training data to its input and output terminals (S830). Specifically, the server 100 may apply the preprocessed data to the input terminal of the persona code generation deep learning module (PCDM) and apply the final persona code to its output terminal, thereby training the persona code generation deep learning module (PCDM) by supervised learning. However, this is only one example, and the data received from the user terminal 200 (i.e., the answer data or the unstructured data) may of course be used as-is in place of the preprocessed data applied to the persona code generation deep learning module (PCDM).
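Steps S810 to S830 can be sketched, under stated assumptions, as supervised training on stored pairs of preprocessed data and final persona codes. The example below swaps the deep learning module for a small softmax classifier so that it stays self-contained; the feature vectors, the persona code labels, and the names `train_supervised` and `predict` are hypothetical and are not part of the specification.

```python
import math


def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]


def train_supervised(dataset, codes, n_features, lr=0.5, epochs=200):
    """dataset: (preprocessed feature vector, final persona code) pairs loaded in S820."""
    index = {code: j for j, code in enumerate(codes)}
    W = [[0.0] * n_features for _ in codes]
    b = [0.0] * len(codes)
    for _ in range(epochs):
        for x, code in dataset:  # input side: preprocessed data; output side: final code
            logits = [sum(w * v for w, v in zip(row, x)) + bj for row, bj in zip(W, b)]
            p = softmax(logits)
            for j in range(len(codes)):
                # Cross-entropy gradient with respect to logit j: p_j - 1[j is the label]
                g = p[j] - (1.0 if j == index[code] else 0.0)
                for f in range(n_features):
                    W[j][f] -= lr * g * x[f]
                b[j] -= lr * g
    return W, b


def predict(W, b, codes, x):
    """Return the persona code with the highest score for feature vector x."""
    scores = [sum(w * v for w, v in zip(row, x)) + bj for row, bj in zip(W, b)]
    return codes[scores.index(max(scores))]
```

Once trained on complete records, such a model could be queried with the preprocessed data of a record that contains missing data in order to propose a temporary persona code, which is then subject to the verification described for step S760.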

Through this, when the user's answer data is insufficient (i.e., when the received user data contains missing data), the present invention generates a plurality of temporary persona codes using a pre-trained deep learning model, verifies the generated temporary persona codes using additional response data from the user, and provides the persona code derived from that verification, thereby improving the prediction accuracy of the persona code.

FIG. 26 is a diagram for explaining the hardware configuration of a server that performs a persona code generation method according to some embodiments of the present invention.

Referring to FIG. 26, a system or server 100 that performs a persona code generation method according to some embodiments of the present invention may be implemented as an electronic device 1000. The electronic device 1000 may include a processor 1010, an input/output device (I/O) 1020, a memory 1030, an interface 1040, a storage 1050, and a bus 1060. The processor 1010, the input/output device 1020, the memory 1030, the interface 1040, and/or the storage 1050 may be coupled to one another through the bus 1060. The bus 1060 corresponds to a path through which data moves.

Specifically, the processor 1010 may include at least one of a CPU (Central Processing Unit), an MPU (Micro Processor Unit), an MCU (Micro Controller Unit), a GPU (Graphics Processing Unit), a microprocessor, a digital signal processor, a microcontroller, a financial application processor (AP), and logic elements capable of performing similar functions.

The input/output device 1020 may include at least one of a keypad, a keyboard, a touch screen, and a display device.

The memory 1030 may load data and/or programs. Here, the memory 1030 is a working memory for improving the operation of the processor 1010 and may include high-speed DRAM and/or SRAM. The memory 1030 may include one or more volatile memory devices such as DDR SDRAM (Double Data Rate Synchronous DRAM) and SDR SDRAM (Single Data Rate SDRAM), and/or one or more non-volatile memory devices such as EEPROM (Electrically Erasable Programmable ROM) and flash memory.

The interface 1040 may transmit data to a communication network or receive data from the communication network. The interface 1040 may be wired or wireless. For example, the interface 1040 may include an antenna or a wired or wireless transceiver.

The storage 1050 may store and retain data and/or programs. The storage 1050 may include one or more non-volatile memory devices such as a solid state drive (SSD), a hard drive, or flash memory. In the present invention, the storage 1050 may store a computer program composed of instructions for performing the persona code providing method described above.

The user terminal 200 may be applied to a personal digital assistant (PDA), a portable computer, a web tablet, a wireless phone, a mobile phone, a digital music player, a memory card, or any electronic product capable of transmitting and/or receiving information in a wireless environment.

In addition, the server 100 and the user terminal 200 according to embodiments of the present invention may each be a system formed by a plurality of electronic devices 1000 connected to one another through a network. In this case, each module or combination of modules may be implemented as an electronic device 1000. However, the present embodiment is not limited thereto.

Additionally, the server 100 may be implemented as at least one of a workstation, a data center, an internet data center (IDC), a DAS (direct attached storage) system, a SAN (storage area network) system, a NAS (network attached storage) system, and a RAID (redundant array of inexpensive disks, or redundant array of independent disks) system, but the present embodiment is not limited thereto.

In addition, the server 100 may transmit data over a network using the user terminal 200. The network may include networks based on wired Internet technology, wireless Internet technology, and short-range communication technology. The wired Internet technology may include, for example, at least one of a local area network (LAN) and a wide area network (WAN).

The wireless Internet technology may include, for example, at least one of WLAN (Wireless LAN), DLNA (Digital Living Network Alliance), WiBro (Wireless Broadband), WiMAX (Worldwide Interoperability for Microwave Access), HSDPA (High Speed Downlink Packet Access), HSUPA (High Speed Uplink Packet Access), IEEE 802.16, LTE (Long Term Evolution), LTE-A (Long Term Evolution-Advanced), WMBS (Wireless Mobile Broadband Service), and 5G NR (New Radio) technology. However, the present embodiment is not limited thereto.

The short-range communication technology may include, for example, at least one of Bluetooth, RFID (Radio Frequency Identification), infrared communication (Infrared Data Association: IrDA), UWB (Ultra-Wideband), ZigBee, NFC (Near Field Communication), USC (Ultra Sound Communication), VLC (Visible Light Communication), Wi-Fi, Wi-Fi Direct, and 5G NR (New Radio). However, the present embodiment is not limited thereto.

The server 100, which communicates through the network, may comply with technical standards and standard communication methods for mobile communication. For example, the standard communication methods may include at least one of GSM (Global System for Mobile communication), CDMA (Code Division Multiple Access), CDMA2000 (Code Division Multiple Access 2000), EV-DO (Evolution-Data Optimized, or Evolution-Data Only), WCDMA (Wideband CDMA), HSDPA (High Speed Downlink Packet Access), HSUPA (High Speed Uplink Packet Access), LTE (Long Term Evolution), LTE-A (Long Term Evolution-Advanced), and 5G NR (New Radio). However, the present embodiment is not limited thereto.

The above description merely illustrates the technical idea of the present embodiment, and a person of ordinary skill in the technical field to which the present embodiment belongs will be able to make various modifications and variations without departing from its essential characteristics. Accordingly, the present embodiments are intended to explain, not to limit, the technical idea of the present embodiment, and the scope of that technical idea is not limited by these embodiments. The scope of protection of the present embodiment should be interpreted according to the claims below, and all technical ideas within a scope equivalent thereto should be interpreted as falling within the scope of rights of the present embodiment.

Claims

A method of providing a persona code, performed by a server linked to a user terminal, the method comprising:
receiving, from the user terminal, answer data to survey data pre-stored in a database linked to the server and unstructured data collected through the user terminal;
performing a preprocessing operation on the answer data and the unstructured data to generate preprocessed data including weights for the answer data and the unstructured data;
Generating tagged data by assigning tags to each keyword included in the preprocessed data;
generating a first persona code using the tag included in the tag assignment data;
generating a second persona code through clustering of the pre-processed data based on a category pre-specified for each survey question constituting the survey data; and
Combining the first persona code and the second persona code to generate a final persona code,
wherein the step of generating the preprocessed data comprises:
generating extracted data consisting of preset main keywords included in the answer data through text mining on the answer data;
determining the reliability of the answer data based on the number of updates of the answer data, the update time, and whether or not there is a contradiction in the answers of the survey data included in the answer data;
generating first preprocessing data including the extracted data and the reliability;
Extracting keywords related to objects included in the unstructured data and generating structured data for the unstructured data;
determining the importance of the structured data based on the number of updates of the unstructured data, the update time, and the proportion of the unstructured data of the part corresponding to the keyword derived from the unstructured data;
generating second preprocessing data including the structured data and the importance;
Deriving a first weight for the extracted data based on the first preprocessing data and deriving a second weight for the unstructured data based on the second preprocessing data;
generating the preprocessed data including the extracted data derived based on the answer data and the first weight for the extracted data, and the structured data derived based on the unstructured data and the second weight for the extracted data,
wherein the step of generating the second persona code comprises:
Classifying the preprocessed data according to each predetermined characteristic included in the second category of the final persona code;
performing clustering on data with high similarity among the classified data using different clustering models for each characteristic;
Deriving sub-persona codes for each cluster clustered according to the above characteristics;
Generating the second persona code by merging the plurality of sub-persona codes derived for each characteristic,
wherein the step of performing the clustering comprises:
(a) calculating a silhouette coefficient for the clustering result of the data input to the clustering model for a first characteristic;
(b) selecting a clustering coefficient whose silhouette coefficient is greater than or equal to a preset reference value;
(c) determining the clustering of the data for the first characteristic based on the selected clustering coefficient; and
repeating steps (a) to (c) in the same manner for second and third characteristics different from the first characteristic.
A method of providing a persona code.

According to claim 1,
The unstructured data includes at least one of image, video, voice, and location information collected in the process of receiving a user response to the survey data at the user terminal.
A method of providing a persona code.

(Deleted)

According to claim 1,
wherein the step of assigning the first persona code comprises:
Generating first tag assigned data by assigning first tags related to a first characteristic included in the first category of the final persona code to the preprocessed data;
calculating a first score for the first characteristic by considering the weight of the preprocessed data for the keyword to which the first tag is assigned;
Generating second tagged data by assigning second tags related to second characteristics different from the first characteristics included in the first category to the preprocessed data;
calculating a second score for the second characteristic by considering a weight for the keyword to which the second tag is assigned in the preprocessed data;
assigning a first sub-persona code corresponding to the first score;
assigning a second sub-persona code corresponding to the second score;
Combining the first sub-persona code and the second sub-persona code to generate the first persona code.
A method of providing a persona code.

According to claim 4,
wherein the step of assigning the first persona code
further comprises correcting the first score and the second score using user information pre-entered through the user terminal.
A method of providing a persona code.

According to claim 1,
wherein the step of assigning the first persona code comprises:
assigning preset tags for each topic to keywords included in the preprocessed data;
Classifying the subject tags assigned to the preprocessed data into first characteristics or second characteristics;
calculating a first score for the first characteristic by considering a weight of the preprocessed data for keywords tagged with the first characteristic;
calculating a second score for the second characteristic by considering the weight of the preprocessed data for keywords tagged with the second characteristic;
assigning a first sub-persona code corresponding to the first score;
assigning a second sub-persona code corresponding to the second score;
Combining the first sub-persona code and the second sub-persona code to generate the first persona code.
A method of providing a persona code.

(Deleted)

According to claim 1,
determining whether missing data exists in the data received from the user terminal;
If the missing data exists, inputting preprocessing data for the received data into a pre-trained deep learning module and receiving at least one temporary persona code as an output from the deep learning module;
deriving new survey data associated with the received at least one temporary persona code;
providing the survey data to the user terminal and receiving response data to the survey data from the user terminal;
Further comprising deriving reliability of the temporary persona code based on the response data and determining a final persona code based on the reliability.
A method of providing a persona code.

According to claim 9,
storing, in the database, the preprocessed data for the received data that does not include the missing data and the final persona code derived based on the preprocessed data; and
training the deep learning module by supervised learning by applying the preprocessed data stored in the database to the input node of the deep learning module and applying the persona code for the preprocessed data to the output node of the deep learning module.
A method of providing a persona code.

(Deleted)