KR102426236B1

KR102426236B1 - Method and System for hybrid user profile extraction

Info

Publication number: KR102426236B1
Application number: KR1020220018411A
Authority: KR
Inventors: 심충섭; 하영광
Original assignee: 주식회사 딥하이
Priority date: 2020-05-18
Filing date: 2022-02-11
Publication date: 2022-08-02
Also published as: KR20220110679A; KR20220026551A; KR20210143120A; KR102629876B1; WO2021235815A1

Abstract

A method and system for extracting a user profile from unstructured content are disclosed.
The user profile extraction method includes: a step in which a user profile extraction system specifies a document corresponding to content written by a given user; an attribute determination step in which the user profile extraction system determines, for each processing-unit document that is all or part of the document, the attribute of each of at least one predefined detailed profile included in a user profile; and a step in which the user profile extraction system specifies the user profile based on the result of the attribute determination step. The attribute determination step is performed using at least one of a classification engine or a machine reading comprehension engine trained on the basis of a context-aware natural language processing model.

Description

Hybrid user profile extraction method and system {Method and System for hybrid user profile extraction}

The present invention relates to a method and system for extracting, from unstructured content (e.g., news, comments, user-written content), various attributes of the user who wrote that content (e.g., gender, marital status, existence of children, circumstances of writing, age group, region), that is, a user profile.

Content generated by large numbers of users on the wired and wireless Internet can be useful information in its own right.

Such information can be the subject of analysis in various fields, such as user reaction analysis, and is provided to those who need it in the form of services known as social analytics or social listening.

User-written content (e.g., text, images, and/or videos) carries information about how people think about and react to various issues, such as social issues, specific products or services, specific persons, and specific companies.

Many attempts are being made to derive insights by analyzing this information to understand how various people react to social phenomena or to specific products or services.

However, these conventional attempts have focused on so-called sentiment analysis, which determines whether people reacted positively or negatively to a particular issue, and on the main keywords from which the reason for such a reaction can be inferred.

By comparison, there have been few attempts to analyze what kind of users wrote such user content. For unstructured content written by unspecified users on various channels (e.g., SNS, cafes, blogs, news, or comments written on these), there has been no attempt to identify what kind of user wrote the content, except by the channel platform concerned, which can obtain user information through login.

Recently, however, the field of natural language processing (NLP) has advanced rapidly.

Accordingly, if the kind of user who wrote a piece of user content could be determined from the content itself, more valuable insights could be derived from large volumes of unstructured content, and a technical idea enabling this is required.

Korean Laid-Open Patent Publication No. 1020070017415, "User preference estimating device, user profile estimating device and robot"

Accordingly, the technical object of the present invention is to provide a method and system that can effectively specify the user profile of a user from user content written by that user.

A hybrid user profile extraction method according to an embodiment of the present invention includes: a step in which a user profile extraction system specifies a document corresponding to content written by a given user; an attribute determination step in which the user profile extraction system determines, for each processing-unit document that is all or part of the document, the attribute of each of at least one predefined detailed profile included in a user profile; and a step in which the user profile extraction system specifies the user profile based on the result of the attribute determination step. The attribute determination step includes: a step in which the user profile extraction system inputs at least one pre-stored question for each detailed profile into a pre-trained machine reading comprehension (MRC) engine to obtain an answer for each detailed profile with respect to the document; a step of specifying the attributes of at least some of the at least one detailed profile based on the obtained answers; and a step of determining, for the remaining detailed profiles whose attributes were not specified based on the answers, the attributes of those remaining detailed profiles by inputting the processing-unit document into each detailed profile classification engine that corresponds to a remaining detailed profile, from among at least one detailed profile classification engine corresponding to each of the at least one detailed profile. Each of the at least one detailed profile classification engine is implemented as a deep learning-based classification engine trained to classify which of the attributes predetermined for the corresponding detailed profile the processing-unit document has.
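The two-stage order described above (MRC first, classification engines only for the detailed profiles left unspecified) can be sketched as follows. This is a minimal illustration, not the patented implementation: the callable interfaces, the toy stand-in engines, the question wording, and the profile names are all assumptions made for the example, and the step of mapping an MRC answer to an attribute is folded into the stand-in MRC function.

```python
from typing import Callable, Dict, Optional

# Hypothetical interfaces (assumptions, not defined in the source):
# an MRC engine maps (question, document) to an attribute or None,
# a classification engine maps a processing-unit document to an attribute or None.
MrcEngine = Callable[[str, str], Optional[str]]
Classifier = Callable[[str], Optional[str]]


def determine_attributes(
    unit_doc: str,
    questions: Dict[str, str],
    mrc: MrcEngine,
    classifiers: Dict[str, Classifier],
) -> Dict[str, Optional[str]]:
    """Ask the MRC engine first, then fall back to per-profile classifiers."""
    result: Dict[str, Optional[str]] = {}
    # Stage 1: one stored question per detailed profile goes to the MRC engine.
    for profile, question in questions.items():
        result[profile] = mrc(question, unit_doc)
    # Stage 2: only profiles still unspecified are sent to their classifier.
    for profile in questions:
        if result[profile] is None and profile in classifiers:
            result[profile] = classifiers[profile](unit_doc)
    return result


def toy_mrc(question: str, document: str) -> Optional[str]:
    # Toy stand-in for a trained MRC engine: it only answers the gender question.
    if "gender" in question and "my husband" in document.lower():
        return "female"
    return None


def toy_marital_classifier(document: str) -> Optional[str]:
    # Toy stand-in for a trained deep learning classifier.
    return "married" if "husband" in document.lower() else None


QUESTIONS = {
    "gender": "What is the writer's gender?",
    "marital_status": "Is the writer married?",
}
CLASSIFIERS = {"gender": lambda d: None, "marital_status": toy_marital_classifier}

profile = determine_attributes(
    "My husband bought me an OOO car and I love it.",
    QUESTIONS, toy_mrc, CLASSIFIERS,
)
print(profile)
```

In this toy run the MRC stand-in resolves gender, so the gender classifier is never called; marital status is resolved only by the fallback classifier.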

Where the document contains a first processing-unit document and a second processing-unit document, the attribute determination step may further include: a step in which the user profile extraction system specifies, as a first attribute, the attribute of a first detailed profile included in the at least one detailed profile from the first processing-unit document, on which attribute determination is performed first; and a step in which the user profile extraction system specifies, as a second attribute, the attribute of the first detailed profile from the second processing-unit document. In this case, in the step of specifying the user profile of the user based on the result of the attribute determination step, the user profile extraction system leaves the attribute of the first detailed profile in the user profile unspecified.
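One way to read the rule above is as a merge over per-unit results in which conflicting attributes for the same detailed profile cancel out. The sketch below assumes that "first attribute" and "second attribute" denote different values, that agreeing values are kept, and that an unspecified unit result (`None`) never causes a conflict; the source states only the outcome for the conflicting case, so the rest is an assumption.

```python
from typing import Dict, List, Optional


def merge_unit_results(
    unit_results: List[Dict[str, Optional[str]]],
) -> Dict[str, Optional[str]]:
    """Merge per-processing-unit attributes; conflicting values become unspecified."""
    merged: Dict[str, Optional[str]] = {}
    conflicted = set()
    for result in unit_results:
        for profile, attribute in result.items():
            merged.setdefault(profile, None)
            if attribute is None or profile in conflicted:
                continue  # nothing new to learn, or already ruled out
            if merged[profile] is None:
                merged[profile] = attribute  # first specified value wins so far
            elif merged[profile] != attribute:
                merged[profile] = None  # two units disagree: leave unspecified
                conflicted.add(profile)
    return merged


units = [
    {"gender": "female", "marital_status": "married"},
    {"gender": "male", "marital_status": None},
    {"gender": "male"},
]
merged_profile = merge_unit_results(units)
print(merged_profile)
```

The `conflicted` set keeps a profile unspecified even if a later unit repeats one of the conflicting values.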

The above method may be implemented by a computer program stored on a computer-readable medium.

A hybrid user profile extraction system according to another aspect includes a processor and a storage device in which a program is stored. By running the program, the processor specifies a document corresponding to content written by a given user; determines, for each processing-unit document that is all or part of the document, the attribute of each of at least one predefined detailed profile included in a user profile; and specifies the user profile based on the result of the attribute determination. In doing so, the processor inputs at least one pre-stored question for each detailed profile into a pre-trained machine reading comprehension (MRC) engine to obtain an answer for each detailed profile with respect to the document; specifies the attributes of at least some of the at least one detailed profile based on the obtained answers; and, for the remaining detailed profiles whose attributes were not specified based on the answers, determines their attributes by inputting the processing-unit document into each detailed profile classification engine that corresponds to a remaining detailed profile, from among at least one detailed profile classification engine corresponding to each of the at least one detailed profile. Each of the at least one detailed profile classification engine may be implemented as a deep learning-based classification engine trained to classify which of the attributes predetermined for the corresponding detailed profile the processing-unit document has.

According to the technical idea of the present invention, a user's profile can be extracted by understanding the content the user has written, which makes it possible to provide various services that make use of the user's profile.

In addition, by using an MRC (Machine Reading Comprehension) engine and a classification engine in a mutually complementary manner, a user profile can be extracted effectively both when the user's content contains expressions that directly reveal the user profile and when it does not.

FIG. 1 is a diagram for explaining a system configuration for implementing a user profile extraction method according to an embodiment of the present invention.
FIG. 2 is a diagram for explaining the physical configuration of a user profile extraction system according to an embodiment of the present invention.
FIG. 3 is a diagram for explaining the logical configuration of a user profile extraction system according to an embodiment of the present invention.
FIG. 4 is a schematic flowchart for explaining a user profile extraction method according to an embodiment of the present invention.
FIG. 5 is a diagram for explaining the concept of extracting a user profile through an MRC engine according to an embodiment of the present invention.
FIG. 6 is a diagram for explaining the concept of extracting a user profile using an MRC engine and a classification engine according to an embodiment of the present invention.
FIG. 7 is a diagram showing an example of a user profile specified according to an embodiment of the present invention.

The present invention can be modified in various ways and can have various embodiments; specific embodiments are illustrated in the drawings and described in detail in the detailed description. This is not intended to limit the present invention to specific embodiments, however, and should be understood to include all modifications, equivalents, and substitutes falling within the spirit and technical scope of the present invention. In describing the present invention, a detailed description of related known technology is omitted where it is judged that it could obscure the gist of the present invention.

Terms such as first and second may be used to describe various components, but the components should not be limited by these terms. The terms are used only to distinguish one component from another.

The terms used in this application are used only to describe specific embodiments and are not intended to limit the present invention. A singular expression includes the plural unless the context clearly indicates otherwise.

In this specification, terms such as "include" or "have" are intended to indicate that the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification exist, and should be understood not to preclude the existence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Also, in this specification, when one component 'transmits' data to another component, this means that the component may transmit the data to the other component directly or through at least one further component. Conversely, when one component 'directly transmits' data to another component, this means that the data is transmitted from the component to the other component without passing through any further component.

FIG. 1 is a diagram for explaining a system configuration for implementing a user profile extraction method according to an embodiment of the present invention.

Referring to FIG. 1, the user profile extraction system 100 according to the technical idea of the present invention can extract user profiles from a plurality of pieces of content collected from given social channels (e.g., SNS, blogs, news, cafes).

The content may include unstructured content generated by users. Unstructured content may be content published on various social channels, such as SNS (Social Network Service), blogs, cafes, and news channels, on which users write and upload content of their choosing to make it public to others. Unstructured content may also be content such as comments, in which a user adds his or her own opinion to another person's content.

Of course, the technical idea of the present invention can be applied not only to content freely written by unspecified users but also to various structured or unstructured content written by media outlets, specific companies, and the like. In this specification, however, the description focuses on unstructured content written by unspecified users in order to make the gist of the invention clear; the scope of the present invention is not limited thereto.

For unstructured content written by unspecified users, the channel on which the content was uploaded (the platform service provider) can usually use login information to learn, to a limited extent, what kind of user wrote the content, that is, a rough user profile (e.g., gender, age, area of residence, occupation).

However, a user's personal information obtained through login is available only to the service on which the unstructured content was written. Moreover, where unstructured content may be written without logging in, the user cannot be identified, so the user profile of the content's author cannot be known. In addition, the personal information provided to a service platform at sign-up can be very limited, and apart from that limited personal information, the content a user writes often contains valuable information that reveals the user's profile. At present, this valuable information is not being put to use.

Accordingly, the user profile extraction system 100 according to the technical idea of the present invention can specify the user profile of a content author by understanding and analyzing the content written by the user, in particular unstructured content.

When the user profile for each of a large number of pieces of unstructured content is specified and put to use, information can be derived that is more valuable than the results offered by conventional social analytics or social listening services, which report how many user reactions there were to a particular issue in total and roughly what percentage of users reacted positively and what percentage reacted negatively.

For example, for a particular issue (e.g., a news item, person, policy, product, or service), conventional social analytics collects a large amount of unstructured content related to the issue and produces the result that a certain percentage of the collected content was positive, a certain percentage was negative, and the remainder was neutral.

In contrast, when the technical idea of the present invention is applied, sentiment analysis results for the issue can be derived per user profile. For example, if the user profile includes gender, the unstructured content can be analyzed independently by gender. The analysis can thus show what percentage of men were positive, negative, and neutral, and what percentage of women were positive, negative, and neutral.
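The per-profile breakdown described above amounts to grouping sentiment results by a detailed profile attribute before computing shares. A minimal sketch follows; the record layout (`profile`, `sentiment` keys) and the sample records are assumptions made for illustration, and the sentiment labels are taken as already computed by some separate analysis.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def sentiment_share_by_attribute(records: List[dict], profile: str) -> Dict[str, Dict[str, float]]:
    """Return, for each attribute of one detailed profile, the share of each sentiment."""
    counts = defaultdict(Counter)
    for rec in records:
        # Content whose attribute could not be specified is grouped separately.
        attribute = rec["profile"].get(profile) or "unspecified"
        counts[attribute][rec["sentiment"]] += 1
    shares = {}
    for attribute, counter in counts.items():
        total = sum(counter.values())
        shares[attribute] = {s: n / total for s, n in counter.items()}
    return shares


# Hypothetical sample records: one per piece of unstructured content.
RECORDS = [
    {"profile": {"gender": "male"}, "sentiment": "positive"},
    {"profile": {"gender": "male"}, "sentiment": "negative"},
    {"profile": {"gender": "female"}, "sentiment": "positive"},
    {"profile": {"gender": None}, "sentiment": "neutral"},
]

shares = sentiment_share_by_attribute(RECORDS, "gender")
print(shares)
```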

The user profile is not limited to gender; various profiles may be applied. For example, FIG. 7, described later, illustrates a case in which gender, marital status, existence of children, and whether a specific product was purchased are each included in the user profile as detailed profiles.

Of course, depending on need or on the issue to be analyzed, various other profiles such as area of residence, educational background, and age group may be included in the user profile.

According to the technical idea of the present invention, the user profile extraction system 100 can determine the user profile from the unstructured content itself. In this specification, determining the user profile may mean determining each detailed profile included in the user profile.

For example, there may be cases in which a given detailed profile can be specified from the unstructured content, that is, from the text alone, and cases in which it cannot.

For example, when the detailed profile is gender, a given piece of unstructured content may contain Sentence 1 below.

Sentence 1: My husband bought me an OOO car and I love it.

In this case, Sentence 1 alone allows the gender detailed profile to be specified as female. If the detailed profiles also include marital status, it can be specified as married.

Meanwhile, the unstructured content may contain Sentence 2 below.

Sentence 2: I saved up my salary and finally bought an OOO car today; first of all, I like the design.

In this case, Sentence 2 alone may not allow gender or marital status to be specified. However, if employment status is a detailed profile, the attribute of being employed can be specified.

Thus, among the predefined detailed profiles, there may be some whose attributes can be specified from the unstructured content alone and some whose attributes cannot.

According to the technical idea of the present invention, the user profile extraction system 100 can determine the predetermined user profiles in this way, based on the content of the unstructured content.

As a result of the determination, the attribute of each detailed profile may be specified or may remain unspecified.

For example, as shown in FIG. 7, the user profile may include gender, marital status, existence of children, and purchase status as detailed profiles. The attributes of the gender detailed profile may be male and female; the attributes of the marital status detailed profile may be married and unmarried; the attributes of the existence-of-children detailed profile may be present and absent; and the attributes of the purchase status detailed profile may be purchased and not purchased.
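The detailed profiles and attributes listed above can be written down as a small schema, with `None` standing for an unspecified attribute. The identifier names below and the choice of `None` as the unspecified marker are assumptions made for this sketch; only the four detailed profiles and their attribute pairs come from the description.

```python
from typing import Dict, Optional

# Detailed profiles and their allowed attributes, following the FIG. 7 example.
PROFILE_SCHEMA = {
    "gender": ("male", "female"),
    "marital_status": ("married", "unmarried"),
    "has_children": ("present", "absent"),
    "purchased": ("purchased", "not_purchased"),
}


def validate_profile(profile: Dict[str, Optional[str]]) -> Dict[str, Optional[str]]:
    """Check attribute values and fill every missing detailed profile with None."""
    for name, attribute in profile.items():
        if name not in PROFILE_SCHEMA:
            raise KeyError(f"unknown detailed profile: {name}")
        if attribute is not None and attribute not in PROFILE_SCHEMA[name]:
            raise ValueError(f"invalid attribute {attribute!r} for {name}")
    # None marks a detailed profile whose attribute is unspecified.
    return {name: profile.get(name) for name in PROFILE_SCHEMA}


user_profile = validate_profile({"gender": "female", "marital_status": "married"})
print(user_profile)
```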

In this specification, determining the user profile may mean determining, from the unstructured content, each detailed profile included in the user profile. There may be only one such detailed profile or several, as needed.

Determining a detailed profile means determining whether the attribute of that detailed profile is specified or unspecified by the unstructured content and, if it is specified, determining which attribute it is.

The user profile extraction system 100 according to the technical idea of the present invention can determine a user profile from unstructured content using a deep learning-based natural language processing model.

According to one example, the user profile extraction system 100 may include a text classification engine and may determine the user profile through this classification engine. That is, determining the user profile for unstructured content can be treated as a text classification problem.

According to another example, the user profile extraction system 100 may include a machine reading comprehension (MRC) engine and may determine the user profile through the MRC engine. That is, the user profile for unstructured content can be determined from the answers to questions.

According to one example, the text classification engine and/or the MRC engine may be an engine trained to perform a text classification task or a machine reading comprehension task using a natural language model pre-trained as a context-sensitive natural language processing model.

Well-known examples of such natural language processing models include BERT (Bidirectional Encoder Representations from Transformers), ALBERT (A Lite BERT for Self-supervised Learning of Language Representations), and ELECTRA (Efficiently Learning an Encoder that Classifies Token Replacements Accurately). These models are widely known as natural language models that can recognize context by using the relationships between tokens in both directions for vector embedding. A person of ordinary skill in the art will readily appreciate that the text classification engine and/or MRC engine for implementing the technical idea of the present invention can also be built on various other context-aware natural language models.

The text classification engine may be trained, through the classification task commonly used in deep learning, to classify which attribute of a detailed profile the unit of text currently being processed, that is, the processing document (e.g., all or part of a piece of unstructured content), has. The classification engine may perform classification per paragraph, per whole document, or per sentence. Accordingly, a single piece of unstructured content can be defined as a document, and the unit on which classification is performed can be defined as a processing document. For convenience of explanation, this specification describes by way of example the case in which the processing document is a sentence-level document.
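The three granularities mentioned above (whole document, paragraph, sentence) can be illustrated with a small splitter. This is only a sketch: the function name and the `unit` parameter are hypothetical, and the regular expressions are naive rules for English punctuation, so real Korean or noisy social text would need a proper sentence segmenter.

```python
import re
from typing import List


def split_processing_units(document: str, unit: str = "sentence") -> List[str]:
    """Split one document into processing documents at the chosen granularity."""
    if unit == "document":
        units = [document]
    elif unit == "paragraph":
        units = re.split(r"\n\s*\n", document)  # blank line separates paragraphs
    elif unit == "sentence":
        units = re.split(r"(?<=[.!?])\s+", document)  # split after ., ! or ?
    else:
        raise ValueError(f"unknown unit: {unit}")
    return [u.strip() for u in units if u.strip()]


DOC = "I bought a car. I love it!\n\nMy husband agrees."
sentences = split_processing_units(DOC, "sentence")
paragraphs = split_processing_units(DOC, "paragraph")
print(sentences)
print(paragraphs)
```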

Such a text classification engine may, of course, be built separately for each detailed profile. Depending on the embodiment, a single classification engine may instead be trained to classify the attributes of multiple detailed profiles for one processing document. However, building one classification engine per detailed profile, with each engine performing only the task of classifying the attributes of its own detailed profile, may yield higher accuracy.

For example, a detailed profile classification engine that classifies gender may be built.

The gender detailed profile classification engine may be built by having the pre-trained natural language model described above learn training data prepared by labeling (or annotating) each of a plurality of sentences with a class. Here, the labeled classes may be the attributes of the respective detailed profile. For example, for the gender detailed profile, a plurality of sentences and a label for each sentence (e.g., male, female, unspecified) may be prepared as training data.
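The labeled training data described above might be organized as in the following minimal sketch. The sentences, the `GENDER_LABELS` tuple, and the `validate` helper are hypothetical illustrations, not part of the disclosed system.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical label set for the gender detailed profile.
GENDER_LABELS = ("male", "female", "unspecified")

# Hypothetical labeled sentences; real training data would be far larger.
TRAINING_DATA: List[Tuple[str, str]] = [
    ("My husband bought me an OOO car and I love it.", "female"),
    ("My wife and I went hiking last weekend.", "male"),
    ("The delivery arrived two days late.", "unspecified"),
]


def validate(examples: List[Tuple[str, str]]) -> Counter:
    """Check that every label is a known class and count examples per class."""
    counts: Counter = Counter()
    for sentence, label in examples:
        if label not in GENDER_LABELS:
            raise ValueError(f"unknown label {label!r} for {sentence!r}")
        counts[label] += 1
    return counts


label_counts = validate(TRAINING_DATA)
```

Counting examples per class is one simple way to notice a heavily imbalanced label distribution before training.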

Then, when a new sentence is input, the natural language model trained on the training data may output whether the user who wrote that sentence is male, female, or cannot be specified from the sentence.

In this way, a text classification engine may be implemented for each detailed profile so that the attribute of that detailed profile can be classified for each processing document.

Then, upon receiving given unstructured content, the user profile extraction system 100 may determine the user profile by inputting the unstructured content, one processing document at a time, into the detailed profile classification engines and obtaining the results.
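As a rough illustration of running one engine per detailed profile over each processing document, consider the sketch below. The two classifier functions are hypothetical keyword-based stand-ins so that the example runs without a model; an actual engine would be a fine-tuned context-aware language model.

```python
from typing import Callable, Dict, List

UNSPECIFIED = "unspecified"


# Hypothetical stand-ins for trained per-profile classification engines.
def classify_gender(sentence: str) -> str:
    lowered = sentence.lower()
    if "my husband" in lowered:
        return "female"
    if "my wife" in lowered:
        return "male"
    return UNSPECIFIED


def classify_marital_status(sentence: str) -> str:
    lowered = sentence.lower()
    if "my husband" in lowered or "my wife" in lowered:
        return "married"
    return UNSPECIFIED


# One engine per detailed profile, keyed by a hypothetical profile name.
ENGINES: Dict[str, Callable[[str], str]] = {
    "gender": classify_gender,
    "marital_status": classify_marital_status,
}


def classify_units(units: List[str]) -> List[Dict[str, str]]:
    """Run every detailed profile engine on every processing document."""
    return [
        {name: engine(unit) for name, engine in ENGINES.items()}
        for unit in units
    ]


results = classify_units(["My husband bought me a car.", "The weather is nice."])
```

Each element of `results` holds the attributes judged for one processing document; how those per-unit results are merged into a final profile is a separate step.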

Meanwhile, the user profile extraction system 100 may also include a machine reading comprehension engine.

As is widely known, a machine reading comprehension engine may be a model trained, on top of a natural language model pre-trained on a large corpus to enable context recognition as described above, to output an answer to a given question from a target document when the question is input.

Such a machine reading comprehension engine may be built by preparing a passage (i.e., a document), a question, and an answer to the question that can be derived from the passage as one item of training data, and by training on a large number of such items.
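One item of such training data could be represented as in the sketch below. The `MrcExample` class is hypothetical, and the requirement that the answer appear verbatim in the passage is an assumption borrowed from common extractive question-answering datasets; the text above only requires that the answer be derivable from the passage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MrcExample:
    """One hypothetical training item: passage, question, and answer."""

    passage: str
    question: str
    answer: str

    def answer_start(self) -> int:
        """Return the character offset of the answer inside the passage."""
        start = self.passage.find(self.answer)
        if start < 0:
            # Assumption: answers are extractive spans of the passage.
            raise ValueError("answer must appear verbatim in the passage")
        return start


example = MrcExample(
    passage="I am a 34-year-old woman living in Busan.",
    question="What is your gender?",
    answer="woman",
)
offset = example.answer_start()
```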

In a machine reading comprehension engine trained to implement the technical idea of the present invention, each piece of unstructured content serves as the passage (i.e., the document) from which answers are derived, and the engine may output, from each piece of unstructured content, an answer to the question for each detailed profile.

Of course, to this end, it may be advantageous for accuracy to train the machine reading comprehension engine with training data that includes questions asking about the detailed profiles and the corresponding answers.

The user profile extraction system 100 may store a question for each detailed profile in advance. For example, the question for the detailed profile of gender may be "What is your gender?" or a question with substantially the same meaning.

Likewise, the question for the detailed profile of marital status may be "Are you married?" or a question with substantially the same meaning.

When given unstructured content is input, the user profile extraction system 100 may specify the input unstructured content as the document from which answers are to be derived, and may determine the user profile for that content by obtaining, from the document, an answer to the question for each detailed profile.
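The question-per-profile lookup described above can be sketched as follows. The `mrc_answer` function is a hypothetical regex-based stand-in that mimics an extractive machine reading comprehension engine (it returns a span from the passage or nothing); the profile names and patterns are assumptions made only for illustration.

```python
import re
from typing import Dict, Optional

# Questions stored in advance, one per detailed profile.
PROFILE_QUESTIONS: Dict[str, str] = {
    "gender": "What is your gender?",
    "marital_status": "Are you married?",
}

# Hypothetical patterns standing in for a trained MRC engine.
_PATTERNS: Dict[str, str] = {
    "What is your gender?": r"\bI am a (man|woman)\b",
    "Are you married?": r"\bI am (married|single)\b",
}


def mrc_answer(passage: str, question: str) -> Optional[str]:
    """Return an answer span found in the passage, or None if there is none."""
    match = re.search(_PATTERNS[question], passage)
    return match.group(1) if match else None


def profile_from_mrc(passage: str) -> Dict[str, Optional[str]]:
    """Ask every stored question against the same passage."""
    return {
        name: mrc_answer(passage, question)
        for name, question in PROFILE_QUESTIONS.items()
    }


explicit = profile_from_mrc("I am a woman and I am married. I love my new car.")
indirect = profile_from_mrc("My husband bought me an OOO car and I love it.")
```

In this stand-in, the passage with explicit statements yields answers for both questions, while the passage that only implies the attributes yields none.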

In short, according to the technical idea of the present invention, the user profile extraction system 100 may determine the user profile of the user who created given unstructured content by using a text classification task and/or a machine reading comprehension task.

In general, a single machine reading comprehension engine can answer many different questions, so there is no need to build as many engines as there are detailed profiles, which can make system construction more efficient. However, a machine reading comprehension task can typically produce an answer only when the answer is stated explicitly in the passage; when the answer can only be inferred indirectly from the passage, the engine may fail to answer.

For example, even if the example sentence mentioned above, "My husband bought me an OOO car and I love it," is included in the unstructured content, a machine reading comprehension engine at the current level of technology may be unable to answer the question "What is your gender?" from that sentence alone. In other words, the engine is likely to answer "What is your gender?" correctly only when the passage directly states that the user is male or female.

In contrast, a text classification engine is likely to classify correctly with relatively high probability even when the attribute of a detailed profile can only be inferred indirectly, provided that cases in which the attribute is specified through such indirect expressions are included in the training data.

Therefore, in cases where the unstructured content contains material from which the attribute of a detailed profile can be specified but the machine reading comprehension engine fails to answer and thus cannot specify the attribute, the detailed profile classification engine is still relatively likely to be able to specify it.

Therefore, using the machine reading comprehension engine and the detailed profile classification engines in a mutually complementary manner can be effective for determining the user profile.
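One possible way to combine the two kinds of results is sketched below. The text does not fix a merging policy, so the rule used here (prefer the classifier's attribute and fall back to the machine reading comprehension answer when the classifier is unspecified) is purely an assumption, as are the function and profile names.

```python
from typing import Dict, Optional

UNSPECIFIED = "unspecified"


def combine(
    classified: Dict[str, str],
    read: Dict[str, Optional[str]],
) -> Dict[str, str]:
    """Merge classifier output with MRC output, profile by profile.

    Assumed policy: a specified classifier attribute wins; otherwise the
    MRC answer is used; otherwise the profile stays unspecified.
    """
    combined: Dict[str, str] = {}
    for name in sorted(set(classified) | set(read)):
        from_classifier = classified.get(name, UNSPECIFIED)
        from_mrc = read.get(name) or UNSPECIFIED
        if from_classifier != UNSPECIFIED:
            combined[name] = from_classifier
        else:
            combined[name] = from_mrc
    return combined


merged = combine(
    {"gender": "female", "region": UNSPECIFIED},
    {"gender": None, "region": "Seoul"},
)
```

Here the classifier supplies the indirectly expressed attribute and the reading comprehension result supplies the explicitly stated one, which is the complementary behavior the paragraph above describes.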

Meanwhile, the user profile extraction system 100 may communicate with an administrator terminal 200. The administrator terminal 200 may configure the user profile extraction system 100. For example, as described above, it may input training data for training the detailed profile classification engines and/or the machine reading comprehension engine included in the user profile extraction system 100. It may also input the question for each detailed profile into the user profile extraction system 100 for user profile determination through the machine reading comprehension engine.

Alternatively, the administrator terminal 200 may input a specific issue to be collected from social channels. The user profile extraction system 100 may then collect unstructured content related to the input issue from various social channels.

The user profile extraction system 100 may perform a function of collecting a large amount of unstructured content from social channels, and for this purpose may include a crawler that crawls each channel. Such crawling may be performed periodically. Depending on the embodiment, the crawler may not be included in the user profile extraction system 100, in which case the system may receive unstructured content from a separate crawler and store it. Various embodiments are possible for collecting unstructured content.

The configuration of the user profile extraction system 100 for implementing this technical idea may be as shown in FIGS. 2 and 3.

FIG. 2 is a diagram for explaining the physical configuration of a user profile extraction system according to an embodiment of the present invention. FIG. 3 is a diagram for explaining the logical configuration of a user profile extraction system according to an embodiment of the present invention.

First, referring to FIG. 2, the user profile extraction system 100 may be implemented as a data processing device.

As shown in FIG. 2, the user profile extraction system 100 includes a processor 110 and a storage device 120 for implementing the functions defined in this specification. The processor 110 may refer to a computing device capable of executing a given program (software code). Depending on the implementation of the data processing device or on the vendor, it may be referred to by various names such as mobile processor, microprocessor, CPU, single processor, multiprocessor, or GPU, and it may be implemented as one or more processors.

A person of ordinary skill in the art will readily appreciate that the processor 110 may run the program to perform the data processing required by the technical idea of the present invention.

The storage device 120 may refer to a device in which a program for implementing the technical idea of the present invention is stored and installed. Depending on the implementation, the storage device 120 may be divided into a plurality of different physical devices, and part of the storage device 120 may reside inside the processor 110. Depending on the implementation, the storage device 120 may be implemented as a hard disk, GPU, SSD (Solid State Disk), optical disk, RAM (Random Access Memory), and/or various other types of storage media, and may, if necessary, be implemented so as to be detachable from the user profile extraction system 100.

The user profile extraction system 100 may be implemented as a server for collecting and analyzing unstructured content from social channels, but is not limited thereto; it may be implemented as any data processing device (e.g., a computer or a mobile terminal) with the data processing capability to execute the program.

In addition, a person of ordinary skill in the art will readily appreciate that the user profile extraction system 100 may include the processor 110, the storage device 120, various peripheral devices provided in the user profile extraction system 100 (e.g., input/output devices, communication devices, display devices, audio devices; 140, 141), and a communication interface (e.g., a communication bus; 130) for connecting these devices.

Meanwhile, the technical idea of the present invention may be implemented by the program stored in the storage device 120 and the processor 110 working in combination, and the functional/logical units executed by the user profile extraction system 100 may be as shown in FIG. 3.

That is, the user profile extraction system 100 may include a control module 110-1, a data specifying module 120-1, a text classification engine 140-1, and/or a DB 150-1. Depending on the embodiment, it may further include a machine reading comprehension engine (MRC engine, 130-1).

In this specification, a module may mean a functional and structural combination of hardware for carrying out the technical idea of the present invention (e.g., the processor 110 and/or the storage device 120) and software for driving that hardware (e.g., the program for implementing the technical idea of the present invention). For example, each of the components may mean a logical unit of given code and the hardware resources on which that code runs. A person of ordinary skill in the art will readily appreciate that this does not necessarily mean physically connected code, or one type or a specific number of hardware. Accordingly, each of the components means a combination of hardware and software that performs a function defined in this specification, and does not imply a specific physical configuration. A person of ordinary skill in the art will also readily appreciate that, when a component shown in FIG. 3 is said to perform a function, this means that the processor 110 runs the program stored in the storage device 120 to perform that function.

The control module 110-1 may control the functions and/or resources of the other components included in the user profile extraction system 100 (e.g., the data specifying module 120-1, the machine reading comprehension engine (MRC engine, 130-1), the text classification engine 140-1, and/or the DB 150-1).

The control module 110-1 may also determine a user profile for each piece of unstructured content. To this end, it may use the machine reading comprehension engine 130-1 and/or the text classification engine 140-1.

The text classification engine 140-1 may include at least one detailed profile classification engine, one per detailed profile, for example a first detailed profile classification engine 141-1 through an N-th detailed profile classification engine 142-1.

As described above, the machine reading comprehension engine 130-1 may be an engine trained to take unstructured content written by a specific user as the passage and, when a given question is input, to output an answer to the question from the passage.

The control module 110-1 may specify particular unstructured content stored in advance as the content for which a user profile is to be determined, and may specify a document from that unstructured content. When the unstructured content is text, the document may be the unstructured content itself. When the unstructured content is audio, an image, or video, the document may be specified as follows: for audio, the text generated by converting the speech to text (e.g., STT (speech-to-text)); for an image, the text obtained by applying OCR to the characters shown in the image; and for video, the text obtained by OCR of on-screen characters and/or by converting the speaker's speech to text. To this end, the user profile extraction system 100 may further include an OCR engine, an STT engine, or the like.

In any case, the text to be analyzed may be specified from the unstructured content as the document.
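The way a document is specified from different kinds of unstructured content can be sketched as a simple dispatch on content type. The `Content` class and the `run_stt` and `run_ocr` functions are hypothetical placeholders; they return marker strings instead of calling a real speech-to-text or OCR engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Content:
    kind: str     # "text", "audio", "image", or "video"
    payload: str  # raw text, or a hypothetical reference to the media


def run_stt(media_ref: str) -> str:
    """Hypothetical placeholder for a speech-to-text engine."""
    return f"<speech transcript of {media_ref}>"


def run_ocr(media_ref: str) -> str:
    """Hypothetical placeholder for an OCR engine."""
    return f"<on-screen text of {media_ref}>"


def specify_document(content: Content) -> str:
    """Return the text document to analyze for one piece of content."""
    if content.kind == "text":
        return content.payload
    if content.kind == "audio":
        return run_stt(content.payload)
    if content.kind == "image":
        return run_ocr(content.payload)
    if content.kind == "video":
        # Video may contribute both on-screen characters and speech.
        return run_ocr(content.payload) + "\n" + run_stt(content.payload)
    raise ValueError(f"unsupported content kind: {content.kind!r}")


document = specify_document(Content(kind="video", payload="clip-001"))
```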

The unstructured content for which a user profile is to be determined may be specified by the data specifying module 120-1. The data specifying module 120-1 may collect a large amount of unspecified unstructured content from social channels and store it in the DB 150-1. Depending on the embodiment, the data specifying module 120-1 may receive, from the administrator terminal 200, information that can identify an issue, such as a keyword or a sentence, and may selectively collect unstructured content related to that information from social channels.

In another embodiment, a large amount of unstructured content is already stored in the DB 150-1, and the data specifying module 120-1 may select the content for which a user profile is to be determined based on information input from the administrator terminal 200.

The control module 110-1 may then use the text classification engine 140-1 to determine a user profile for the document corresponding to the selected unstructured content.

Each of the detailed profile classification engines 141-1 and 142-1 included in the text classification engine 140-1 may determine the attribute of its detailed profile for each processing document, which is the unit of processing.

The control module 110-1 may input a processing document into each of the detailed profile classification engines 141-1 and 142-1 and obtain the classification results, that is, the attribute determination results. By repeating this over the entire document, it may determine the user profile for the whole document on the basis of the text classification task.

Meanwhile, the control module 110-1 may also determine the user profile using the machine reading comprehension engine 130-1.

In this case, the control module 110-1 may specify the document corresponding to the unstructured content, set the document as the passage, and input the question for each detailed profile into the machine reading comprehension engine 130-1 one after another. It may then determine the user profile by obtaining the answer to each of these questions.

The DB 150-1 may, of course, store various information according to the technical idea of the present invention (e.g., the collected unstructured content, programs, the question for each detailed profile, and user profile analysis results).

FIG. 4 is a schematic flowchart for explaining a user profile extraction method according to an embodiment of the present invention.

Referring to FIG. 4, the user profile extraction system 100 may specify a document corresponding to unstructured content created by a given user (S100). The document may be a document composed of text.

This document may be the document that is subject to user profile determination.

When the unstructured content is written as text, the document may be the unstructured content itself. If the unstructured content includes audio, images, video, or the like in addition to text, the user profile extraction system 100 may, as described above, extract text from those non-text formats through OCR, STT, or the like, and may include the extracted text in the document.

Once the document corresponding to the unstructured content subject to user profile determination is specified, the user profile extraction system 100 may specify a processing document, which is the unit of processing (S110). As described above, the processing document may be a sentence included in the document, but is not limited thereto; the document itself may be the processing document, or a paragraph may be the processing document. The unit or size of the processing document may depend on the unit in which the text classification engine 140-1 or the machine reading comprehension engine 130-1 was trained.
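Splitting a document into processing documents at the three granularities mentioned above could look like the sketch below. The `split_units` function and its regular expressions are assumptions chosen for brevity; real sentence segmentation, especially for Korean text, would normally need a dedicated tokenizer.

```python
import re
from typing import List


def split_units(document: str, unit: str = "sentence") -> List[str]:
    """Split a document into processing documents of the requested unit."""
    if unit == "document":
        parts = [document]
    elif unit == "paragraph":
        # Assumption: paragraphs are separated by a blank line.
        parts = re.split(r"\n\s*\n", document)
    elif unit == "sentence":
        # Assumption: a sentence ends with '.', '!' or '?' plus whitespace.
        parts = re.split(r"(?<=[.!?])\s+", document)
    else:
        raise ValueError(f"unknown unit: {unit!r}")
    return [part.strip() for part in parts if part.strip()]


text = "I love it. Really!\n\nSecond paragraph here."
sentences = split_units(text, "sentence")
paragraphs = split_units(text, "paragraph")
whole = split_units(text, "document")
```

The unit passed to `split_units` would be chosen to match the unit on which the engines were trained.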

For example, classifying attributes over the entire document is advantageous when several sentences must be understood together before an attribute can be classified. Classifying attributes sentence by sentence requires the engine to understand only a shorter piece of information, so accuracy may actually be higher when the sentence clearly contains the context or content that reveals the attribute. Various embodiments are possible.

The user profile extraction system 100 may then perform an attribute determination process on the specified processing document (S120). This process may include a per-detailed-profile attribute determination process in which the processing document is passed to at least one detailed profile classification engine (e.g., 141-1, 142-1) and the output of each engine is obtained (S121, S122).

Each output may either be one of the attributes defined for the detailed profile or be unspecified.

After performing the text-classification-based attribute determination process on one processing document, that is, the process using at least one detailed profile classification engine 141-1, 142-1, the user profile extraction system 100 may check whether any part of the document remains on which the attribute determination process has not yet been performed (S130).

If a part remains, the next processing document may be specified from the remaining part (S110), and the text-classification-based attribute determination process described above may be performed on it (S120).

Through this procedure, the text-classification-based attribute determination process may be completed for the entire document.

The user profile extraction system 100 may then finally specify the user profile based on the attribute determination results (S140).

Meanwhile, the document corresponding to the unstructured content subject to user profile determination may contain a plurality of processing documents.

In this case, as described above with reference to FIG. 4, the user profile extraction system 100 may perform the process of determining the attribute of each of the at least one detailed profile for every processing document.

However, if the attribute of a particular detailed profile is specified from a processing document handled earlier, attribute determination for that detailed profile may be omitted for subsequent processing documents, so that the final user profile can be specified more quickly.

For example, as illustrated in FIG. 7, four detailed profiles may be defined in the user profile: a first, a second, a third, and a fourth detailed profile. Each of the four detailed profiles may take a first attribute or a second attribute, and may be unspecified by default.

In addition, the document specified by the user profile extraction system 100 may contain three processing documents: a first, a second, and a third processing document. The user profile extraction system 100 may perform the attribute determination process on the first, second, and third processing documents in that order.

Here, the user profile extraction system 100 may determine from the first processing document that the first detailed profile has the first attribute, and that the second to fourth detailed profiles are unspecified.

The user profile extraction system 100 may then perform attribute determination on the second processing document only for the second to fourth detailed profiles, omitting the attribute determination process for the first detailed profile, which has already been specified. This makes it possible to specify the user profile quickly.
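The skipping behavior described above can be sketched as a loop that only calls the engines of detailed profiles that are still unspecified. The `keyword_engine` helper, the profile names, and the example sentences are hypothetical stand-ins for trained classification engines and real content.

```python
from typing import Callable, Dict, List, Tuple

UNSPECIFIED = "unspecified"
Engine = Callable[[str], str]


def keyword_engine(keyword: str, attribute: str) -> Engine:
    """Hypothetical stand-in for a trained detailed profile engine."""
    return lambda unit: attribute if keyword in unit else UNSPECIFIED


def extract_with_skipping(
    units: List[str], engines: Dict[str, Engine]
) -> Tuple[Dict[str, str], int]:
    """Return the profile and the number of engine calls that were made."""
    profile = {name: UNSPECIFIED for name in engines}
    calls = 0
    for unit in units:  # next processing document (S110, S130)
        pending = [n for n, value in profile.items() if value == UNSPECIFIED]
        if not pending:
            break  # every detailed profile is already specified
        for name in pending:  # attribute determination (S120)
            profile[name] = engines[name](unit)
            calls += 1
    return profile, calls  # final user profile (S140)


engines = {
    "gender": keyword_engine("husband", "female"),
    "has_children": keyword_engine("my son", "yes"),
}
units = [
    "My husband bought me a car.",
    "I drove my son to school.",
    "It rained all day.",
]
profile, calls = extract_with_skipping(units, engines)
```

In this example three engine calls are made instead of the six that would be needed if both engines ran on all three processing documents.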

Meanwhile, in another example, even if an attribute is specified for a particular detailed profile from a processing document handled earlier, attribute determination for all detailed profiles may still be performed on the remaining processing documents, as described above.

In this case, the attribute specified from the first processing document, on which attribute determination is performed earlier, may differ from the attribute specified from the second processing document, on which it is performed later. For example, the attribute of the first detailed profile (e.g., gender) may be specified as the first attribute (e.g., male) from the first processing document, but as the second attribute (e.g., female) from the second processing document.

Such a case may stem from the accuracy of the text classification engine 140-1 and/or the machine reading comprehension engine 130-1, or from the reliability of the unstructured content itself. Either way, for a detailed profile whose specified attributes contradict each other, treating the attribute as unspecified may yield a more reliable result.

Accordingly, when the user profile extraction system 100 specifies the final user profile based on the results of attribute determination for each detailed profile, it may determine that the attribute of the contradictory detailed profile is unspecified.
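One way to picture this contradiction rule is the sketch below. It assumes each processed document yields a dictionary from detailed profile name to attribute, with `None` meaning unspecified. The function name `reconcile` and this data layout are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Set


def reconcile(
    per_unit_results: List[Dict[str, Optional[str]]],
) -> Dict[str, Optional[str]]:
    """Merge per-document judgments; contradictory profiles become None."""
    observed: Dict[str, Set[str]] = {}
    for unit_result in per_unit_results:
        for name, attribute in unit_result.items():
            values = observed.setdefault(name, set())
            if attribute is not None:
                values.add(attribute)
    # Exactly one distinct attribute: keep it. Zero or several: unspecified.
    return {
        name: next(iter(values)) if len(values) == 1 else None
        for name, values in observed.items()
    }
```

For instance, if one processed document yields male and another yields female for the gender profile, the merged result leaves gender unspecified while keeping any profile on which the documents agree.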

Meanwhile, as described above, the user profile extraction system 100 may perform machine reading comprehension-based attribute determination using the machine reading comprehension engine 130-1.

The user profile extraction system 100 may use only one of text classification-based attribute determination and machine reading comprehension-based attribute determination, or may use both together.

The concept of performing machine reading comprehension-based attribute determination may be as shown in FIG. 5.

FIG. 5 is a diagram for explaining the concept of extracting a user profile through an MRC engine according to an embodiment of the present invention.

Referring to FIG. 5, as described above, the user profile extraction system 100 may specify a document corresponding to the unstructured content.

The specified document then serves as the passage from which answers are derived, and the user profile extraction system 100 may sequentially input pre-stored questions for each detailed profile (e.g., a first detailed profile question and an Nth detailed profile question) into the machine reading comprehension engine 130-1.

A response (answer) for each detailed profile (e.g., a first detailed profile response and an Nth detailed profile response) may then be obtained from the machine reading comprehension engine 130-1. The answer may be an attribute predetermined for each detailed profile, or a sentence or region of the document from which the attribute can be identified. Of course, when no answer can be given, the attribute of the corresponding detailed profile may be determined to be unspecified.

In any case, the attributes of all or some of the at least one detailed profile may be specified through the answers.
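The question-and-answer flow above can be sketched as follows. The engine interface, the example questions, the cue-word mapping from answer spans to attributes, and the stub engine are all hypothetical; a real system would place a trained machine reading comprehension model behind `MrcEngine`.

```python
from typing import Callable, Dict, Optional

# Hypothetical engine interface: (passage, question) -> answer span,
# or None when the question cannot be answered from the passage.
MrcEngine = Callable[[str, str], Optional[str]]

# Hypothetical pre-stored questions, one per detailed profile.
QUESTIONS: Dict[str, str] = {
    "gender": "What is the gender of the author?",
    "purchased": "Did the author buy the product?",
}

# Hypothetical cue words that map an answer span to a predefined attribute.
ANSWER_CUES: Dict[str, Dict[str, str]] = {
    "gender": {"father": "male", "mother": "female"},
    "purchased": {"bought": "yes"},
}


def to_attribute(profile_name: str, answer: Optional[str]) -> Optional[str]:
    """Map an answer span to an attribute; None means unspecified."""
    if answer is None:
        return None
    lowered = answer.lower()
    for cue, attribute in ANSWER_CUES.get(profile_name, {}).items():
        if cue in lowered:
            return attribute
    return None


def judge_by_mrc(passage: str, engine: MrcEngine) -> Dict[str, Optional[str]]:
    """Ask each pre-stored question in turn and collect per-profile attributes."""
    return {
        name: to_attribute(name, engine(passage, question))
        for name, question in QUESTIONS.items()
    }


def stub_engine(passage: str, question: str) -> Optional[str]:
    """Stand-in for a trained MRC model: returns a fixed span or None."""
    if "gender" in question and "as a father" in passage.lower():
        return "as a father"
    return None
```

The sketch covers the case where the answer is a span of the document that reveals the attribute. If the engine returned a predefined attribute directly, the mapping step would reduce to a lookup.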

Meanwhile, as described above, when the machine reading comprehension-based attribute determination process and the text classification-based attribute determination process are used together, the two processes may each be performed independently of the other, and the user profile extraction system 100 may finally specify the user profile based on the results of each.

According to an embodiment, however, the user profile extraction system 100 may first perform one attribute determination process (e.g., the machine reading comprehension-based attribute determination process) and then selectively perform the other attribute determination process (e.g., the text classification-based attribute determination process) only for the detailed profiles that remain unspecified.

Various embodiments are possible; one example in which the machine reading comprehension-based attribute determination process and the text classification-based attribute determination process are used together is shown in FIG. 6.

FIG. 6 is a diagram for explaining the concept of extracting a user profile using an MRC engine and a classification engine according to an embodiment of the present invention.

Referring to FIG. 6, the user profile extraction system 100 may specify a document corresponding to the unstructured content as described above (S100).

The user profile extraction system 100 may then perform the machine reading comprehension-based attribute determination process (S100-1), and may perform the text classification-based attribute determination process (S120).

Of course, the text classification-based attribute determination process may instead be performed first.

In that case, if there are detailed profiles whose attributes are specified by the text classification-based attribute determination process, the machine reading comprehension-based attribute determination process may be performed only on the remaining detailed profiles.

Typically, however, there may be more documents in which an attribute that is not specified by the machine reading comprehension-based attribute determination process can still be specified by the text classification-based attribute determination process.

Accordingly, in an embodiment, the user profile extraction system 100 may first perform the machine reading comprehension-based attribute determination process. For the detailed profiles whose attributes that process specifies (e.g., the first and second detailed profiles), the specified attributes are confirmed, and the text classification-based attribute determination process is selectively performed only for the detailed profiles whose attributes remain unspecified (e.g., the third and fourth detailed profiles).
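The MRC-first ordering described above can be sketched as follows, under the assumption that there is one classifier per detailed profile (as the claims describe) and that `None` marks an unspecified attribute. The type aliases and the function name `hybrid_extract` are hypothetical.

```python
from typing import Callable, Dict, Optional

# Hypothetical interfaces standing in for the trained engines.
MrcJudge = Callable[[str], Dict[str, Optional[str]]]  # document -> attribute per profile
Classifier = Callable[[str], Optional[str]]           # document -> attribute for one profile


def hybrid_extract(
    document: str,
    mrc_judge: MrcJudge,
    classifiers: Dict[str, Classifier],
) -> Dict[str, Optional[str]]:
    """Run MRC first, then classify only the profiles MRC left unspecified."""
    result = dict(mrc_judge(document))
    for name, classify in classifiers.items():
        if result.get(name) is None:
            # Only the classifier for a still-unspecified profile is invoked.
            result[name] = classify(document)
    return result
```

In this arrangement an attribute confirmed by the machine reading comprehension step is never revisited, so the classifier for that detailed profile is not run at all.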

Of course, as described above, the machine reading comprehension-based attribute determination process and the text classification-based attribute determination process may each be performed independently on all detailed profiles, and a detailed profile for which the two results specify different attributes, that is, a contradictory detailed profile, may be treated as unspecified.

FIG. 7 is a diagram illustrating an example of a user profile specified according to an embodiment of the present invention.

FIG. 7 illustrates a case in which unstructured content corresponding to consumer reactions to a specific product (e.g., a specific car model) is collected and the user profile of each piece of unstructured content is specified from it.

The user profile is illustrated as including gender, marital status, presence of children, and purchase status as its detailed profiles. These detailed profiles may be set in various ways depending on what kind of unstructured content is analyzed and for what purpose.

For example, consumers may write unstructured content about a specific car model and upload it to various social channels.

Such unstructured content may include information that reveals the author's gender, whether the author is married, whether the author has children, and whether the author is a consumer who purchased the specific car model or one who merely evaluated it.

The user profiles specified by analyzing that content in the manner described above may be as shown in FIG. 7.

For example, the user profile of the first user, who wrote document 1 (unstructured content 1), may be specified as male, married, having children, and not yet having purchased the specific car model.

Likewise, the user profile of the second user, who wrote document 2 (unstructured content 2), may be specified as male and married, with it being unknown whether the user has children and unknown whether the user has purchased the specific car model.

In the same way, the user profile of each user who wrote documents 3 to 9 may be specified as shown in FIG. 7.

Once a user profile has been specified for each of a number of pieces of unstructured content in this way, sentiment analysis and the various other data analyses used in social analytics can be performed for each detailed profile included in the user profile. This in turn makes it possible to plan products (services), respond to consumers, and derive a variety of other meaningful insights for each user profile.
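As a rough illustration of analysis per detailed profile, the sketch below stores the two profiles that the text describes for documents 1 and 2 and counts attribute values for a chosen detailed profile. The field names, the `"yes"`/`"no"` encoding, and the use of `None` for unspecified are assumptions; documents 3 to 9 are omitted because their values are not given in the text.

```python
from collections import Counter
from typing import Dict, List, Optional

Profile = Dict[str, Optional[str]]

# Only the two profiles spelled out in the description; None marks "unspecified".
PROFILES: List[Profile] = [
    {"gender": "male", "married": "yes", "has_children": "yes", "purchased": "no"},
    {"gender": "male", "married": "yes", "has_children": None, "purchased": None},
]


def count_by_attribute(profiles: List[Profile], detailed_profile: str) -> Counter:
    """Count how many profiles hold each attribute value for one detailed profile."""
    return Counter(p.get(detailed_profile) or "unspecified" for p in profiles)
```

A downstream step such as sentiment analysis could be grouped by the same keys. That step is not shown here because it would require data the source does not provide.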

Meanwhile, depending on the implementation, the user profile extraction system 100 may include a processor and a memory that stores a program executed by the processor. The processor may include a single-core CPU or a multi-core CPU. The memory may include high-speed random access memory, and may also include non-volatile memory such as one or more magnetic disk storage devices, flash memory devices, or other non-volatile solid-state memory devices. Access to the memory by the processor and other components may be controlled by a memory controller. Here, when executed by the processor, the program may cause the system for converting non-normalized language into normalized language according to the present embodiment to perform the method described above.

Meanwhile, the method according to an embodiment of the present invention may be implemented in the form of computer-readable program instructions and stored in a computer-readable recording medium, and the control program and the target program according to an embodiment of the present invention may likewise be stored in a computer-readable recording medium. The computer-readable recording medium includes every type of recording device that stores data readable by a computer system.

The program instructions recorded on the recording medium may be specially designed and configured for the present invention, or may be known and available to those skilled in the software field.

Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. The computer-readable recording medium may also be distributed across network-connected computer systems so that computer-readable code is stored and executed in a distributed manner.

Examples of program instructions include not only machine code, such as that produced by a compiler, but also high-level language code that can be executed by a device that processes information electronically, for example a computer, using an interpreter or the like.

The hardware devices described above may be configured to operate as one or more software modules in order to perform the operations of the present invention, and vice versa.

The foregoing description of the present invention is provided for illustration, and those of ordinary skill in the art to which the present invention pertains will understand that it can readily be modified into other specific forms without changing its technical spirit or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and components described as distributed may likewise be implemented in a combined form.

The scope of the present invention is indicated by the claims that follow rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

Claims

A hybrid user profile extraction method comprising:
specifying, by a user profile extraction system, a document corresponding to content created by a predetermined user;
an attribute determination step in which the user profile extraction system determines, for each processing unit document that is all or part of the document, an attribute of each of at least one predefined detailed profile included in a user profile, each of the at least one detailed profile being determined from the content described in the document; and
specifying, by the user profile extraction system, the user profile based on a result of the attribute determination step,
wherein the attribute determination step comprises:
obtaining, by the user profile extraction system, an answer for each detailed profile from the document by inputting at least one pre-stored question for each detailed profile into a pre-trained machine reading comprehension engine;
specifying an attribute of at least some of the at least one detailed profile based on the obtained answer for each detailed profile; and
for the remaining detailed profiles, among the at least one detailed profile, whose attributes are not specified based on the answer for each detailed profile, determining the attributes of the remaining detailed profiles by inputting the processing unit document into each detailed profile classification engine that corresponds to a remaining detailed profile, among at least one detailed profile classification engine respectively corresponding to the at least one detailed profile,
wherein each of the at least one detailed profile classification engine is implemented as a deep learning-based classification engine trained to classify which attribute, among the attributes predetermined for the corresponding detailed profile, the processing unit document has.

The hybrid user profile extraction method according to claim 1, wherein the document includes a first processing unit document and a second processing unit document, and the attribute determination step comprises:
specifying, by the user profile extraction system, the attribute of a first detailed profile included in the at least one detailed profile as a first attribute from the first processing unit document, on which attribute determination is performed first; and
specifying, by the user profile extraction system, the attribute of the first detailed profile as a second attribute from the second processing unit document,
wherein, in the step of specifying the user profile of the user based on the result of the attribute determination step, the user profile extraction system does not specify an attribute for the first detailed profile in the user profile.

A computer program stored in a computer-readable recording medium and installed in a data processing apparatus to perform the method according to claim 1 or claim 2.

A hybrid user profile extraction system comprising:
a processor; and
a storage device in which a program is stored,
wherein the processor, by running the program:
specifies a document corresponding to content created by a predetermined user, determines, for each processing unit document that is all or part of the document, an attribute of each of at least one predefined detailed profile included in a user profile, each of the at least one detailed profile being determined from the content described in the document, and specifies the user profile based on a result of the attribute determination; and
inputs at least one pre-stored question for each detailed profile into a pre-trained machine reading comprehension engine to obtain an answer for each detailed profile from the document, specifies an attribute of at least some of the at least one detailed profile based on the obtained answer for each detailed profile, and, for the remaining detailed profiles whose attributes are not specified based on the answer for each detailed profile, determines the attributes of the remaining detailed profiles by inputting the processing unit document into each detailed profile classification engine that corresponds to a remaining detailed profile, among at least one detailed profile classification engine respectively corresponding to the at least one detailed profile,
wherein each of the at least one detailed profile classification engine is implemented as a deep learning-based classification engine trained to classify which attribute, among the attributes predetermined for the corresponding detailed profile, the processing unit document has.