KR20200087312A

KR20200087312A - Device and method for evaluating the communication qualities of authors based on machine learning models

Info

Publication number: KR20200087312A
Application number: KR1020180172144A
Authority: KR
Inventors: 한경식
Original assignee: 아주대학교산학협력단
Priority date: 2018-12-28
Filing date: 2018-12-28
Publication date: 2020-07-21
Also published as: KR102213358B1

Abstract

According to one embodiment of the present invention, a device for evaluating the communication qualities of authors based on a machine learning model may include: a collection unit for collecting survey data for evaluating the communication qualities of online text authors; a communication quality information generation unit for generating communication quality information for each online text author based on the survey data; a feature extraction unit for grouping a plurality of online text authors based on the communication quality information for each online text author, and extracting features from the online text of and author information on the grouped online text authors; a model building unit for constructing a model capable of evaluating the communication quality information on the online text authors by using a preset machine learning method based on the extracted features; and an evaluation unit for receiving new online text and evaluating the communication quality information on authors of the new online text through the built model.

Description

DEVICE AND METHOD FOR EVALUATING THE COMMUNICATION QUALITIES OF AUTHORS BASED ON MACHINE LEARNING MODELS

The present application relates to a device and method for evaluating the communication qualities of authors based on a machine learning model.

Social network services give individuals access to a wide range of information and provide an environment for interacting with various users. They are closely tied to the daily lives of modern people, and their use continues to grow as diverse information is generated and provided.

However, because anyone can access social network services, the reliability and accuracy of the information they carry is not clearly guaranteed, and misleading or false information can be distributed easily and quickly. Since it is difficult to verify the reliability of all information provided to readers through social network services, readers are left to judge the reliability of information by their own intellectual ability, experience, and knowledge. In addition, conventional reliability evaluation techniques have only evaluated the reliability of the online post itself by simple methods; there has been no attempt to evaluate the reliability of an online post based on the reliability of the author who wrote it.

The background technology of the present application is disclosed in Korean Patent Application Publication No. 10-2014-0076667.

The present application is intended to solve the above-described problems of the prior art, and an object thereof is to provide a device and method for evaluating the communication qualities of authors based on a machine learning model, capable of evaluating the communication qualities of online text authors through a machine learning model built on readers' evaluations of online text.

The present application is intended to solve the above-described problems of the prior art, and a further object thereof is to provide a device and method for evaluating the communication qualities of authors based on a machine learning model, capable of evaluating the communication qualities of authors of new online text.

However, the technical problems to be solved by the embodiments of the present application are not limited to those described above, and other technical problems may exist.

As a technical means for achieving the above technical object, a method for evaluating the communication qualities of authors based on a machine learning model according to an embodiment of the present application may include: (a) collecting survey data for evaluating the communication qualities of online text authors; (b) generating communication quality information for each online text author based on the survey data; (c) grouping a plurality of online text authors based on the communication quality information for each online text author; (d) extracting features from the online text of and author information on the grouped online text authors; (e) constructing a model capable of evaluating the communication quality information on authors of online text by using a preset machine learning method based on the extracted features; and (f) receiving new online text and evaluating the communication quality information on the author of the new online text through the constructed model.

According to an embodiment of the present application, in step (a), the survey data may include responses to quality parameters for evaluating communication qualities, and the quality parameters may include source credibility, interpersonal attraction, communication competence, and intent to interact.

According to an embodiment of the present application, in step (b), the communication quality information may be generated by further considering characteristic data of the survey respondents. The communication quality information may be generated based on a statistical analysis using the survey data and the characteristic data of the survey respondents, and the correlation between the characteristic data and the communication quality information may be calculated through multivariate linear regression analysis. The characteristic data of a survey respondent may include at least one of the respondent's gender, age, period of SNS use, frequency of SNS use, and personality.

According to an embodiment of the present application, in step (c), candidate online text authors may be selected from among all online text authors based on a resampling technique, and each candidate online text author may be classified into an upper group or a lower group based on preset reference values for the communication quality information.
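The upper/lower grouping can be pictured with the following minimal sketch. The function name, the two threshold values, and the choice to leave authors between the thresholds ungrouped are all assumptions made for illustration; the application only states that preset reference values are used and does not specify the resampling technique, which is therefore not shown.

```python
def group_authors(scores, upper_threshold, lower_threshold):
    """Split authors into upper and lower groups by communication quality score.

    scores: dict mapping a (hypothetical) author id to a single score.
    Authors strictly between the two thresholds are left out of both groups;
    that handling is an assumption, not something stated in the source.
    """
    upper = [a for a, s in scores.items() if s >= upper_threshold]
    lower = [a for a, s in scores.items() if s <= lower_threshold]
    return upper, lower


# Hypothetical scores on the 1-5 scale used by the survey.
upper, lower = group_authors({"a": 4.5, "b": 3.0, "c": 1.5}, 4.0, 2.0)
```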

According to an embodiment of the present application, in step (d), the features may be extracted based on the author information, syntax information, similarity information, emotion information, and linguistic techniques, and in step (e), the model may be constructed based on a classification model that takes the features as input.

According to an embodiment of the present application, the linguistic techniques may include linguistic inquiry and word count analysis and cosine similarity. The linguistic inquiry and word count analysis may extract at least one of emotion, style, survey, and word from the online text, and the cosine similarity may extract at least one of hashtags, punctuation marks, emoticons, links, and retweets from the online text.
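As a rough illustration of the non-verbal elements named above, the sketch below counts hashtags, punctuation marks, emoticons, links, and a retweet marker in one text and compares two such count vectors by cosine similarity. The regular expressions, the `RT @` retweet convention, the small ASCII emoticon set, and how the counts feed the similarity are assumptions for illustration; the application does not describe the extraction at this level of detail.

```python
import math
import re

URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")
EMOTICON_RE = re.compile(r"[:;]-?[)(DP]")   # small ASCII subset, an assumption
PUNCT_RE = re.compile(r"[.,!?]")


def nonverbal_counts(text):
    """Return [hashtags, punctuation, emoticons, links, is_retweet] for one text."""
    links = URL_RE.findall(text)
    rest = URL_RE.sub(" ", text)             # drop links so their dots are not counted
    emoticons = EMOTICON_RE.findall(rest)
    rest_no_emoticons = EMOTICON_RE.sub(" ", rest)
    return [
        len(HASHTAG_RE.findall(rest)),
        len(PUNCT_RE.findall(rest_no_emoticons)),
        len(emoticons),
        len(links),
        int(rest.lstrip().startswith("RT @")),
    ]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

The count vector of one text could then be compared with that of another text by the same author, for example to gauge how consistent the author's style is.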

A method of constructing a model for evaluating communication qualities according to an embodiment of the present application may include: (b) collecting survey data, provided by survey respondents, for evaluating the communication qualities of the online text author of the online text; (c) generating communication quality information based on the survey data; (d) extracting features from the author information of the online text and from the online text; and (e) constructing a model capable of evaluating the communication quality information on authors of online text based on a classification learning model that takes the features as input.

According to an embodiment of the present application, in step (d), the features may be extracted based on the author information, syntax information, similarity information, emotion information, and linguistic techniques. The linguistic techniques may include linguistic inquiry and word count analysis and cosine similarity. The linguistic inquiry and word count analysis may extract at least one of emotion, style, survey, and word from the online text, and the cosine similarity may extract at least one of hashtags, punctuation marks, emoticons, links, and retweets from the online text.

A device for evaluating the communication qualities of authors based on a machine learning model according to an embodiment of the present application may include: a collection unit for collecting survey data for evaluating the communication qualities of online text authors; a communication quality information generation unit for generating communication quality information for each online text author based on the survey data; a feature extraction unit for grouping a plurality of online text authors based on the communication quality information for each online text author, and extracting features from the online text of and author information on the grouped online text authors; a model building unit for constructing a model capable of evaluating the communication quality information on authors of online text by using a preset machine learning method based on the extracted features; and an evaluation unit for receiving new online text and evaluating the communication quality information on the author of the new online text through the constructed model.

According to an embodiment of the present application, the survey data may include responses to quality parameters for evaluating communication qualities, and the quality parameters may include source credibility, interpersonal attraction, communication competence, and intent to interact.

According to an embodiment of the present application, the features may be extracted based on the author information, syntax information, similarity information, emotion information, and linguistic techniques. The linguistic techniques may include linguistic inquiry and word count analysis and cosine similarity. The linguistic inquiry and word count analysis may extract at least one of emotion, style, survey, and word from the online text, and the cosine similarity may extract at least one of hashtags, punctuation marks, emoticons, links, and retweets from the online text.

The above-described means for solving the problems are merely exemplary and should not be construed as limiting the present application. In addition to the exemplary embodiments described above, additional embodiments may exist in the drawings and the detailed description of the invention.

According to the above-described means for solving the problems, it is possible to provide a device and method for evaluating the communication qualities of authors based on a machine learning model, capable of evaluating the communication qualities of online text authors through a machine learning model built on readers' evaluations of online text.

According to the above-described means for solving the problems, it is possible to provide a device and method for evaluating the communication qualities of authors based on a machine learning model, capable of evaluating the communication qualities of authors of new online text.

FIG. 1 is a diagram showing the configuration of a system for evaluating the communication of authors based on a machine learning model according to an embodiment of the present application.
FIG. 2 is a diagram showing the configuration of a device for evaluating the communication of authors based on a machine learning model according to an embodiment of the present application.
FIG. 3 is a diagram showing examples of questions on the quality parameters used for evaluating communication qualities by the device for evaluating the communication qualities of authors based on a machine learning model according to an embodiment of the present application.
FIGS. 4 and 5 are diagrams showing examples of online text used by the device for evaluating the communication qualities of authors based on a machine learning model according to an embodiment of the present application.
FIG. 6 is a diagram showing the results of a multivariate regression analysis of the quality parameters by the device for evaluating the communication qualities of authors based on a machine learning model according to an embodiment of the present application.
FIG. 7 is a diagram showing the flow of a method for evaluating the communication qualities of authors based on a machine learning model according to an embodiment of the present application.
FIG. 8 is a diagram showing the flow of a method of constructing a model for evaluating communication qualities according to an embodiment of the present application.

Hereinafter, embodiments of the present application are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present application pertains can readily practice them. The present application may, however, be implemented in various different forms and is not limited to the embodiments described herein. In the drawings, parts irrelevant to the description are omitted in order to describe the present application clearly, and similar parts are given similar reference numerals throughout the specification.

Throughout this specification, when a part is said to be "connected" to another part, this includes not only the case where it is "directly connected" but also the case where it is "electrically connected" with another element in between.

Throughout this specification, when a member is said to be located "on", "over", "at the top of", "under", "below", or "at the bottom of" another member, this includes not only the case where the member is in contact with the other member but also the case where a further member exists between the two members.

Throughout this specification, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated to the contrary.

FIG. 1 is a diagram showing the configuration of a system for evaluating the communication of authors based on a machine learning model according to an embodiment of the present application, and FIG. 2 is a diagram showing the configuration of a device for evaluating the communication of authors based on a machine learning model according to an embodiment of the present application.

Referring to FIG. 1, the communication quality evaluation system 10 may include a communication quality evaluation device 100 and a social network server 200. The social network server 200 may include a database that stores the online text written by all online text authors. The communication quality evaluation device 100 and the social network server 200 may be connected via a network. The network refers to a connection structure in which nodes such as terminals and servers can exchange information with one another. Examples of such a network include, but are not limited to, a 3GPP (3rd Generation Partnership Project) network, an LTE (Long Term Evolution) network, a 5G network, a WiMAX (Worldwide Interoperability for Microwave Access) network, the Internet, a LAN (Local Area Network), a wireless LAN (Wireless Local Area Network), a WAN (Wide Area Network), a PAN (Personal Area Network), a Wi-Fi network, a Bluetooth network, a satellite broadcasting network, an analog broadcasting network, and a DMB (Digital Multimedia Broadcasting) network.

Referring to FIG. 2, the communication quality evaluation device 100 may include a collection unit 110, a communication quality information generation unit 120, a feature extraction unit 130, a model building unit 140, and an evaluation unit 150.

The collection unit 110 may collect online text from the social network server 200 through the network. The collection unit 110 may also collect survey data for evaluating the communication qualities of online text authors. Before turning to the collection of the survey data, the survey itself is described. Approximately 12,000,000 online texts were collected through the Twitter Search API (https://dev.twitter.com/rest/public), consisting of Olympics-related tweets from June to August 2016 and Zika, Ebola, and chikungunya virus-related tweets from January to April 2017, and online text authors who had written fewer than 10 of those tweets were filtered out. After filtering, 15,260 authors remained, from whom 1,000 authors and their tweets were randomly selected for use in the survey.
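The filtering and random selection described above can be sketched as follows. The function name, the `(author_id, text)` input shape, and the fixed random seed are assumptions for illustration; the thresholds of 10 tweets and 1,000 authors come from the description.

```python
import random
from collections import defaultdict


def select_survey_authors(tweets, min_tweets=10, sample_size=1000, seed=0):
    """Drop authors with fewer than `min_tweets` tweets, then sample authors.

    tweets: iterable of (author_id, text) pairs (a hypothetical input shape).
    Returns a dict mapping each sampled author id to that author's tweets.
    """
    by_author = defaultdict(list)
    for author_id, text in tweets:
        by_author[author_id].append(text)

    eligible = sorted(a for a, texts in by_author.items() if len(texts) >= min_tweets)
    rng = random.Random(seed)  # fixed seed only so the sketch is reproducible
    chosen = rng.sample(eligible, min(sample_size, len(eligible)))
    return {a: by_author[a] for a in chosen}
```

Sorting the eligible authors before sampling keeps the result independent of the order in which the tweets arrive.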

The survey was conducted through SurveyMonkey, and the collection unit 110 may collect the survey data from SurveyMonkey. Before the survey, respondents were informed of its goals and procedure and asked for their consent, and only those who consented took the survey. The survey is divided into two main sections. The first section asks respondents about their gender, age, period of SNS use, frequency of SNS use, and personality, and each question is answered on a Likert scale of 1 to 5 points. For age, for example, the scale may be 1 point for respondents in their teens, 2 points for their twenties, 3 points for their thirties, 4 points for their forties, and 5 points for their fifties, and respondents select the age range that applies to them. The personality questions use the Big Five personality traits: extraversion (level of sociability), agreeableness (organized, thorough), conscientiousness (organized, thorough), neuroticism (level of emotional stability), and openness (level of creativity and desire for knowledge and new experiences).

In the second section of the survey, each respondent is shown online texts written by the same online text author and is asked questions on the quality parameters for evaluating communication qualities. As described above, each question is answered on a scale of 1 to 5 points. The collection unit 110 may collect the characteristic data of the survey respondents from the first section of the survey and the survey data from the second section. That is, the characteristic data of a survey respondent may include at least one of the respondent's gender, age, period of SNS use, frequency of SNS use, and personality, and the survey data may include the responses to the quality parameters for evaluating communication qualities.

FIG. 3 is a diagram showing examples of questions on the quality parameters used for evaluating communication qualities by the device for evaluating the communication qualities of authors based on a machine learning model according to an embodiment of the present application.

The quality parameters may include, by way of example, source credibility (SC), interpersonal attraction (IA), communication competence (CC), and intent to interact (INT). Referring to FIG. 3, each quality parameter has two or more questions, and each question is answered on a Likert scale by selecting one of 1 point (strongly disagree), 2 points (disagree), 3 points (neutral), 4 points (agree), and 5 points (strongly agree). The Cronbach's alpha (Cronbach's α) results in FIG. 3 also show that the questions within each quality parameter are highly related to one another. For example, within interpersonal attraction, questions Ia1 and Ia2 relate to social attraction, while Ia3 and Ia4 relate to task attraction.
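For readers unfamiliar with Cronbach's α, the following minimal sketch computes it for the items of one quality parameter using the standard formula α = k/(k−1) · (1 − Σ item variances / variance of the summed score). The function name and the input layout are assumptions; the values reported in FIG. 3 are not reproduced here.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for one quality parameter.

    responses: list of rows, one per respondent, each row holding that
    respondent's scores on the k questions of the parameter.
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("need at least two items")
    items = list(zip(*responses))                      # column-wise item scores
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_var_sum / total_var)
```

Values close to 1 indicate that the questions of a parameter move together, which is the sense in which the description calls them highly related.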

The communication quality information generation unit 120 may generate communication quality information for each online text author based on the survey data. The communication quality information obtained from the survey data may be a variable that depends on the four quality parameters described above. As the Cronbach's alpha results in FIG. 3 show, the questions of each quality parameter are highly reliable and closely correlated with one another, so calculating communication quality information from the responses to the quality parameters can be regarded as a valid approach.

The characteristic data of the survey respondents may also affect the communication quality information. For example, older respondents had a negative perception of the communication qualities of online text authors, and female respondents showed a more positive perception than male respondents on all questions. The communication quality information generation unit 120 may therefore generate the communication quality information by further considering the characteristic data of the survey respondents.

The communication quality information generation unit 120 may generate the communication quality information based on a statistical analysis using the survey data and the characteristic data of the survey respondents. The communication quality information generation unit 120 may also calculate the correlation between the characteristic data and the communication quality information through multivariate linear regression analysis. When multivariate linear regression is chosen from among the various techniques for analyzing this correlation, additional measurements and computations may be required to assess the validity of the regression results: the Durbin-Watson statistical test, the variance inflation factor (VIF) test, and SPSS statistical analysis. The Durbin-Watson test measures the autocorrelation of the residuals in a regression analysis. The Durbin-Watson statistic takes a value from 0 to 4. A value of 2 means that there is no autocorrelation in the sample, a value closer to 0 indicates positive autocorrelation, and a value closer to 4 indicates negative autocorrelation.
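The Durbin-Watson statistic is the sum of squared differences between consecutive residuals divided by the sum of squared residuals. The sketch below applies that standard definition to a list of residuals; the function name is an assumption, and the residuals are taken as already computed by whatever regression was fitted.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for an ordered sequence of regression residuals."""
    numerator = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Residuals that flip sign at every step are negatively autocorrelated,
# so the statistic lies above 2.
d = durbin_watson([1.0, -1.0, 1.0, -1.0])
```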

The variance inflation factor (VIF) test measures the severity of multicollinearity in an ordinary least squares regression analysis. The VIF provides an index of how much the variance of an estimated regression coefficient is increased by collinearity. Typically, a VIF exceeding 4 calls for further investigation, and a VIF exceeding 10 can be interpreted as indicating substantial multicollinearity.
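One way to obtain the VIF of every predictor at once is to read the diagonal of the inverse of the predictors' correlation matrix, which equals 1/(1 − R²) for each predictor regressed on the others. The sketch below does this with the standard library only; the function names are assumptions, and constant or perfectly collinear predictor columns are not handled beyond raising an error.

```python
from statistics import mean, pstdev


def _correlation_matrix(columns):
    """Pearson correlation matrix of equal-length predictor columns."""
    n = len(columns[0])
    z = []
    for col in columns:
        m, s = mean(col), pstdev(col)
        z.append([(x - m) / s for x in col])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]


def _invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[pivot][c]) < 1e-12:
            raise ValueError("perfectly collinear predictors")
        aug[c], aug[pivot] = aug[pivot], aug[c]
        p = aug[c][c]
        aug[c] = [v / p for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def vif(columns):
    """VIF of each predictor: diagonal of the inverse correlation matrix."""
    inverse = _invert(_correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]
```

Uncorrelated predictors give a VIF of 1, and the value grows as a predictor becomes more predictable from the others.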

SPSS statistical analysis using stepwise linear regression can regress on multiple independent variables while removing the variables that are not significant. Because a new regression is run each time the least correlated variable is removed, the procedure can involve several rounds of regression. As a result, the SPSS statistical analysis leaves the independent variables that best describe the distribution.

As described above, in generating the communication quality information, the communication quality information generation unit 120 may take into account the correlation between the characteristic data and the communication quality information obtained through multivariate linear regression analysis. This correlation confirms that the characteristic data of the survey respondents can affect the communication quality information, so it is important to consider the respondents' characteristic data when generating the communication quality information. The communication quality information generation unit 120 may generate communication quality information expressed as a single score by averaging the responses to the quality parameters. Because each respondent's survey data reflects scores given according to that respondent's personal standards, those standards can affect the overall communication quality information. The consensus among respondents on each online text used in the survey therefore needs to be considered. According to one embodiment of the present application, the communication quality information generation unit 120 may consider the standard deviation σ of the average survey scores. The communication quality information generation unit 120 may also treat non-verbal elements of the online text (for example, hashtags, punctuation marks, emoticons, links (URLs), and retweets) as independent variables.
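The single score and its standard deviation σ can be sketched as follows. This is only an illustration under stated assumptions: the parameter keys `SC`, `IA`, `CC`, and `INT` follow the abbreviations used elsewhere in this description, the responses are invented, and the population standard deviation is assumed because the text does not say which form of σ is used.

```python
from statistics import mean, pstdev

# Abbreviations of the four quality parameters named in this description
PARAMS = ("SC", "IA", "CC", "INT")


def communication_quality_score(responses):
    """Return (score, sigma) for one online text.

    responses: one dict per survey respondent, mapping each quality
    parameter to that respondent's Likert rating.
    """
    per_respondent = [mean(r[p] for p in PARAMS) for r in responses]
    # Assumption: sigma is the population standard deviation across respondents
    return mean(per_respondent), pstdev(per_respondent)


# Hypothetical ratings from three respondents for the same tweet
responses = [
    {"SC": 2, "IA": 2, "CC": 2, "INT": 2},
    {"SC": 4, "IA": 4, "CC": 4, "INT": 4},
    {"SC": 3, "IA": 3, "CC": 3, "INT": 3},
]
score, sigma = communication_quality_score(responses)
print(score, round(sigma, 3))
```

A larger σ indicates weaker consensus among respondents about the same text.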

FIGS. 4 and 5 are diagrams showing examples of online text used by the machine-learning-model-based device for evaluating the communication qualities of authors according to an embodiment of the present application.

FIG. 4 shows three online texts with different communication quality information, and FIG. 5 shows an online text with relatively high communication quality information. Referring to FIG. 4, FIG. 4(a) shows an online text composed of many hashtags. A tweet written with hashtags only and no mentions may receive low communication quality information. For FIG. 4(a), the communication quality information, that is, the score, was 2.59 with a standard deviation σ of 1.0. This means that most survey respondents gave the text a low score while a few gave it a high score. FIG. 4(b) shows a tweet containing both repeated mentions and hashtags; like FIG. 4(a), it may receive low communication quality information (score 2.98, standard deviation σ 1.3). FIG. 4(c) shows similar tweets on the same topic. Survey respondents may perceive such tweets as redundant and meaningless, resulting in a low score (score 2.98, standard deviation σ 1.2).

In contrast, FIG. 5 shows a tweet that received relatively high communication quality information (score 3.75, standard deviation σ 0.55). Referring to FIG. 5, survey respondents gave high scores to tweets that contain neither hashtags nor repeated mentions. As these examples show, non-verbal elements affect the communication quality information, so it is important to take them into account when calculating it.

The feature extraction unit 130 may group a plurality of online text authors based on the communication quality information for each online text author. For example, the feature extraction unit 130 may divide the authors into an upper group with high scores and a lower group with low scores. Building the model for evaluating communication qualities with an explicit criterion for separating high scores from low scores allows a more accurate model to be constructed. The construction of the model is described later.

The feature extraction unit 130 may select candidate online text authors from among all online text authors based on a resampling technique (SMOTE). The feature extraction unit 130 may also classify each candidate online text author into the upper group or the lower group based on a preset reference value for the communication quality information. In the survey described above, each item is answered on a Likert scale. It can be assumed that a score of 2.5 or lower indicates a negative response from the survey respondents and a score of 3.5 or higher indicates a positive response. Accordingly, the feature extraction unit 130 may classify online text authors whose communication quality information is 3.5 or higher into the upper group and those whose communication quality information is 2.5 or lower into the lower group. When all online text authors were classified in this way, the lower group accounted for 20% and the upper group for 80%. To select the same number of candidate online text authors from each group, the feature extraction unit 130 may select the candidates based on the Synthetic Minority Over-sampling Technique (SMOTE), one of the resampling techniques. The feature extraction unit 130 may oversample the lower group so that the upper and lower groups are represented in equal proportion. SMOTE is a well-known technique that applies a k-nearest-neighbors (KNN) approach, so a detailed description is omitted.
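The grouping thresholds and the oversampling idea can be sketched as below. This is a simplified, SMOTE-like illustration written with the standard library only, not the reference SMOTE implementation and not the patent's own code; the function names, the author scores, and the feature vectors are hypothetical.

```python
import random


def split_groups(scores, upper=3.5, lower=2.5):
    """Split authors into upper (>= 3.5) and lower (<= 2.5) groups.

    Authors scoring strictly between the two thresholds are left out.
    """
    upper_ids = [author for author, s in scores.items() if s >= upper]
    lower_ids = [author for author, s in scores.items() if s <= lower]
    return upper_ids, lower_ids


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic


# Hypothetical per-author scores
scores = {"a": 3.8, "b": 2.4, "c": 3.0, "d": 3.6, "e": 4.1, "f": 3.9}
upper_ids, lower_ids = split_groups(scores)

# Hypothetical feature vectors of lower-group (minority) authors
minority_features = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
synthetic = smote_like(minority_features, n_new=3)
print(upper_ids, lower_ids, len(synthetic))
```

Each synthetic point lies on a line segment between two existing minority samples, which is the core idea behind oversampling the lower group until it matches the upper group in size.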

In addition, the feature extraction unit 130 may extract features from the online text of, and the author information on, the grouped online text authors. The extracted features can be used to build the model for evaluating communication qualities. The feature extraction unit 130 may extract features based on five criteria: author information, syntax information, similarity information, sentiment information, and linguistic techniques. The author information may include at least one of the author's name, author profile mention, age, number of accounts followed, number of followers, followers, posts, and posts the author has liked (for example, posts marked "Like" on Twitter). The syntax information may include at least one of the retweet count of the online text, favorites, hashtags, punctuation marks in the online text (question marks, exclamation marks, and so on), the number of words, and word length. The similarity information is the topical similarity obtained by converting the online texts written by the same author into TF-IDF vectors and computing their cosine similarity. The sentiment information is a sentiment score calculated through natural language processing using TextBlob. The sentiment score ranges from -1 to 1, where -1 denotes a negative sentiment and 1 a positive sentiment.
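The similarity information can be illustrated with a small, self-contained sketch of TF-IDF vectors and cosine similarity. Several details are assumptions made only for this example: whitespace tokenization, the smoothed IDF formula, and the three tweets themselves. The patent does not specify which TF-IDF variant is used.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) for a list of texts.

    Assumption: whitespace tokenization and a smoothed IDF,
    log((1 + N) / (1 + df)) + 1.
    """
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in term_freq.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical tweets by the same author
tweets = [
    "new phone release today",
    "new phone release tomorrow",
    "rainy weather in seoul",
]
vecs = tfidf_vectors(tweets)
sim_near = cosine(vecs[0], vecs[1])  # tweets on the same topic
sim_far = cosine(vecs[0], vecs[2])   # tweets with no shared terms
print(round(sim_near, 3), sim_far)
```

An author whose tweets are highly similar to one another, as in FIG. 4(c), would show a high average pairwise similarity under this kind of measure.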

The linguistic techniques may extract features through Linguistic Inquiry and Word Count (LIWC) analysis and cosine similarity. LIWC analysis may extract at least one of emotion, style, particles, and words from the online text. LIWC can extract features from the textual content of the online text but cannot be applied to non-verbal elements, so features are extracted from the non-verbal elements through cosine similarity. The cosine similarity step may extract at least one of hashtags, punctuation marks, emoticons, links (URLs), and retweets from the online text, and the cosine similarity itself may be computed after converting the online texts written by the same author into TF-IDF vectors.
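Counting the non-verbal elements listed above can be sketched with regular expressions. The patterns, the `RT @` retweet convention, the function name, and the sample tweet are all assumptions made for illustration; the patent does not describe how these elements are detected.

```python
import re


def nonverbal_counts(text):
    """Count simple non-verbal elements in a tweet-like text."""
    return {
        "hashtags": len(re.findall(r"#\w+", text)),
        "mentions": len(re.findall(r"@\w+", text)),
        "urls": len(re.findall(r"https?://\S+", text)),
        "question_marks": text.count("?"),
        "exclamation_marks": text.count("!"),
        # Assumption: retweets are marked by a leading "RT @"
        "is_retweet": text.startswith("RT @"),
    }


# Hypothetical tweet
tweet = "RT @alice: Big news! #ai #ml https://example.com"
counts = nonverbal_counts(tweet)
print(counts)
```

Counts of this kind could serve as the independent variables for non-verbal elements mentioned earlier in this description.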

FIG. 6 is a diagram showing the results of a multivariate regression analysis of the quality parameters in the machine-learning-model-based device for evaluating the communication qualities of authors according to an embodiment of the present application.

Referring to FIG. 6, the feature extraction unit 130 may use multivariate regression analysis to calculate the correlation between the communication quality information and the features extracted through the linguistic techniques and cosine similarity. For example, the more words of six or more letters (a feature) a text contains, the higher the scores the quality parameters (source credibility, interpersonal attraction, communication competence, and intent to interact) may receive. The use of words of six or more letters in tweets is a factor associated with education and social class, and using such words can have a positive effect on the survey responses that evaluate communication qualities. In addition, the more frequently definite articles are used, the higher the resulting communication quality information may be. Specifically, tweets that show interest in concrete nouns, objects, and things may yield relatively higher communication quality information.
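The two lexical features discussed above, the share of words with six or more letters and the frequency of the definite article, can be approximated with a short sketch. This is not LIWC itself, which is a proprietary dictionary-based tool; the tokenization rule, the function name, and the sample sentence are assumptions for illustration.

```python
import re


def lexical_features(text):
    """Ratio of long words (six or more letters) and of the definite article."""
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return {"long_word_ratio": 0.0, "definite_article_ratio": 0.0}
    long_words = sum(1 for w in words if len(w) >= 6)
    articles = sum(1 for w in words if w.lower() == "the")
    return {
        "long_word_ratio": long_words / len(words),
        "definite_article_ratio": articles / len(words),
    }


# Hypothetical tweet text
features = lexical_features("The committee published the quarterly report")
print(features)
```

In this example four of the six words have six or more letters and two are definite articles.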

The model building unit 140 may build the model based on a classification model that takes the extracted features as input. Specifically, the model building unit 140 may build, using a preset machine learning method based on the extracted features, a model capable of evaluating the communication quality information of the authors of online text. For example, the model building unit 140 may build the model through machine learning based on the Random Forest algorithm, one of the classification models. Besides Random Forest, the model building unit 140 may build the model based on algorithms such as Logistic Regression (LR), Support Vector Machine (SVM), and Adaptive boosted Decision Trees (ADT). However, when the accuracy, precision, and detection rate of the models based on each algorithm were calculated, the Random Forest algorithm showed the best performance, and the model building unit 140 may therefore build the model through machine learning based on the Random Forest algorithm. The model built by the model building unit 140 differs from conventional Random Forest-based models in its dependent variable, and its labeling process may also be carried out in an entirely new way.
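The three measures used to compare the candidate algorithms can be sketched as follows, assuming that "detection rate" corresponds to recall and that the upper group is labeled 1 and the lower group 0. The labels and predictions are invented for illustration and are not results from the patent.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, and recall (detection rate) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical labels: 1 = upper group, 0 = lower group
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Computing the same three values for each candidate classifier on held-out data is one way to carry out the kind of comparison the text describes.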

The evaluation unit 150 may receive new online text and evaluate the communication quality information of its author through the built model. The evaluation unit 150 may extract features from the new online text and from the author information of the new online text, and may calculate a score for the author's communication quality information by feeding the extracted features into the model.

FIG. 7 is a flowchart of a method for evaluating the communication qualities of authors based on a machine learning model according to an embodiment of the present application, and FIG. 8 is a flowchart of a method for building a model for evaluating communication qualities according to an embodiment of the present application.

The method for evaluating the communication qualities of authors based on a machine learning model shown in FIG. 7 and the method for building a model for evaluating communication qualities shown in FIG. 8 may be performed by the communication quality evaluation device 100 described above with reference to FIGS. 1 to 6. Therefore, even where omitted below, the description of the communication quality evaluation device 100 given with reference to FIGS. 1 to 6 applies equally to FIGS. 7 and 8.

Referring to FIG. 7, in step S710 the collection unit 110 may collect online text from the social network server 200 through the network. The collection unit 110 may also collect survey data for evaluating the communication qualities of online text authors. For example, the survey may be conducted through SurveyMonkey, and the collection unit 110 may collect the survey data from SurveyMonkey. The survey data may include responses to quality parameters for evaluating communication qualities. The quality parameters may include, for example, source credibility (SC), interpersonal attraction (IA), communication competence (CC), and intent to interact (INT).

In step S720, the communication quality information generation unit 120 may generate communication quality information for each online text author based on the survey data. The communication quality information generation unit 120 may further take the characteristic data of the survey respondents into account when generating the communication quality information. The characteristic data may include at least one of the respondent's gender, age, length of SNS use, frequency of SNS use, and personality. The communication quality information generation unit 120 may generate the communication quality information based on a statistical analysis of the survey data and the characteristic data, and may calculate the correlation between the characteristic data and the communication quality information through multivariate linear regression analysis.

In step S730, the feature extraction unit 130 may group a plurality of online text authors based on the communication quality information for each online text author. For example, the feature extraction unit 130 may divide the authors into an upper group with high scores and a lower group with low scores. The feature extraction unit 130 may select candidate online text authors from among all online text authors based on a resampling technique (SMOTE). The feature extraction unit 130 may also classify each candidate online text author into the upper group or the lower group based on a preset reference value for the communication quality information.

In step S740, the feature extraction unit 130 may extract features from the online text of, and the author information on, the grouped online text authors. The feature extraction unit 130 may extract features based on five criteria: author information, syntax information, similarity information, sentiment information, and linguistic techniques. The linguistic techniques may extract features through Linguistic Inquiry and Word Count (LIWC) analysis and cosine similarity. LIWC analysis may extract at least one of emotion, style, particles, and words from the online text. LIWC can extract features from the textual content of the online text but cannot be applied to non-verbal elements, so features are extracted from the non-verbal elements through cosine similarity. The cosine similarity step may extract at least one of hashtags, punctuation marks, emoticons, links (URLs), and retweets from the online text, and the cosine similarity itself may be computed after converting the online texts written by the same author into TF-IDF vectors.

In step S750, the model building unit 140 may build the model based on a classification model that takes the extracted features as input. Specifically, the model building unit 140 may build, using a preset machine learning method based on the extracted features, a model capable of evaluating the communication quality information of the authors of online text. For example, the model building unit 140 may build the model through machine learning based on the Random Forest algorithm, one of the classification models. The model built by the model building unit 140 differs from conventional Random Forest-based models in its dependent variable, and its labeling process may also be carried out in an entirely new way.

In step S760, the evaluation unit 150 may receive new online text and evaluate the communication quality information of its author through the built model. The evaluation unit 150 may extract features from the new online text and from the author information of the new online text, and may calculate a score for the author's communication quality information by feeding the extracted features into the model.

Referring to FIG. 8, in step S810 the collection unit 110 may collect online text from the social network server 200 through the network.

In step S820, the collection unit 110 may collect survey data for evaluating the communication qualities of online text authors. For example, the survey may be conducted through SurveyMonkey, and the collection unit 110 may collect the survey data from SurveyMonkey. The survey data may include responses to quality parameters for evaluating communication qualities. The quality parameters may include, for example, source credibility (SC), interpersonal attraction (IA), communication competence (CC), and intent to interact (INT).

In step S830, the communication quality information generation unit 120 may generate the communication quality information of the online text authors based on the survey data. The communication quality information generation unit 120 may further take the characteristic data of the survey respondents into account when generating the communication quality information. The characteristic data may include at least one of the respondent's gender, age, length of SNS use, frequency of SNS use, and personality. The communication quality information generation unit 120 may generate the communication quality information based on a statistical analysis of the survey data and the characteristic data, and may calculate the correlation between the characteristic data and the communication quality information through multivariate linear regression analysis.

In step S840, the feature extraction unit 130 may extract features from the online text and from the author information of the online text. Specifically, the feature extraction unit 130 may extract features based on author information, syntax information, similarity information, sentiment information, and linguistic techniques. The linguistic techniques may extract features through Linguistic Inquiry and Word Count (LIWC) analysis and cosine similarity. LIWC analysis may extract at least one of emotion, style, particles, and words from the online text. LIWC can extract features from the textual content of the online text but cannot be applied to non-verbal elements, so features are extracted from the non-verbal elements through cosine similarity. The cosine similarity step may extract at least one of hashtags, punctuation marks, emoticons, links (URLs), and retweets from the online text, and the cosine similarity itself may be computed after converting the online texts written by the same author into TF-IDF vectors.

In step S850, the model building unit 140 may build the model based on a classification model that takes the extracted features as input. Specifically, the model building unit 140 may build, using a preset machine learning method based on the extracted features, a model capable of evaluating the communication quality information of the authors of online text. For example, the model building unit 140 may build the model through machine learning based on the Random Forest algorithm, one of the classification models. The model built by the model building unit 140 differs from conventional Random Forest-based models in its dependent variable, and its labeling process may also be carried out in an entirely new way.

According to an embodiment of the present application, the method for evaluating the communication qualities of authors based on a machine learning model and the method for building a model for evaluating communication qualities may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the present invention or may be known to and usable by those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code produced by a compiler but also high-level language code that a computer can execute using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the present invention, and vice versa.

The foregoing description of the present application is for illustrative purposes, and a person of ordinary skill in the technical field to which the present application belongs will understand that it can easily be modified into other specific forms without changing the technical spirit or essential features of the present application. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and likewise components described as distributed may be implemented in a combined form.

The scope of the present application is defined by the claims set out below rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present application.

Claims

A method of evaluating the communication qualities of authors based on a machine learning model, the method comprising:
(a) collecting questionnaire data for evaluating communication qualities of online text authors;
(b) generating communication quality information for each online text author based on the questionnaire data;
(c) grouping a plurality of online text authors based on the communication quality information for each online text author;
(d) extracting features from the online text and author information of the grouped online text authors;
(e) constructing a model capable of evaluating the communication quality information of the author of an online text using a preset machine learning method based on the extracted features; and
(f) receiving a new online text and evaluating the communication quality information of the author of the new online text through the constructed model.

The method of claim 1,
wherein, in step (a),
the questionnaire data includes responses to quality parameters for evaluating communication qualities, and
the quality parameters include source reliability, interpersonal attractiveness, conversational ability, and interaction intention.

The method of claim 1,
wherein, in step (b),
the communication quality information is generated by further considering characteristic data of survey respondents,
the communication quality information is generated based on statistical analysis using the questionnaire data and the characteristic data of the survey respondents,
a correlation between the characteristic data and the communication quality information is calculated through multivariate linear regression analysis, and
the characteristic data of the survey respondents
includes at least one of gender, age, SNS usage period, SNS usage frequency, and personality of the survey respondent.

The method of claim 1,
wherein, in step (c),
candidate online text authors are selected from among the online text authors based on a resampling technique, and
each of the candidate online text authors is classified into a high group or a low group based on a preset reference value for the communication qualities.

The method of claim 1,
wherein, in step (d),
the features are extracted based on the author information, syntax information, similarity information, emotion information, and a linguistic technique, and
in step (e),
the model is constructed based on a classification model that takes the features as input.

The method of claim 5,
wherein the linguistic technique
includes Linguistic Inquiry and Word Count (LIWC) analysis and cosine similarity,
the Linguistic Inquiry and Word Count analysis extracts at least one of emotion, style, survey, and word from the online text, and
the cosine similarity is used to extract at least one of hashtags, punctuation marks, emoticons, links, and retweets from the online text.

A method of building a model for evaluating communication qualities, the method comprising:
(a) collecting online text;
(b) collecting questionnaire data in which survey respondents evaluate the communication qualities of the author of the online text;
(c) generating communication quality information based on the questionnaire data;
(d) extracting features from author information of the online text and from the online text; and
(e) constructing a model capable of evaluating the communication quality information of the author of the online text based on a classification learning model that takes the features as input.

The method of claim 7,
wherein, in step (a),
the questionnaire data includes responses to quality parameters for evaluating communication qualities, and
the quality parameters include source reliability, interpersonal attractiveness, conversational ability, and interaction intention.

The method of claim 7,
wherein, in step (d),
the features are extracted based on the author information, syntax information, similarity information, emotion information, and a linguistic technique,
the linguistic technique
includes Linguistic Inquiry and Word Count (LIWC) analysis and cosine similarity,
the Linguistic Inquiry and Word Count analysis extracts at least one of emotion, style, survey, and word from the online text, and
the cosine similarity is used to extract hashtags, punctuation marks, emoticons, links, and retweets from the online text.

A device for evaluating the communication qualities of authors based on a machine learning model, the device comprising:
a collection unit that collects questionnaire data for evaluating communication qualities of online text authors;
a communication quality information generation unit that generates communication quality information for each online text author based on the questionnaire data;
a feature extraction unit that groups a plurality of online text authors based on the communication quality information for each online text author and extracts features from the online text and author information of the grouped online text authors;
a model building unit that constructs a model capable of evaluating the communication quality information of the author of an online text using a preset machine learning method based on the extracted features; and
an evaluation unit that receives a new online text and evaluates the communication quality information of the author of the new online text through the constructed model.

The device of claim 10,
wherein the questionnaire data includes responses to quality parameters for evaluating communication qualities, and
the quality parameters include source reliability, interpersonal attractiveness, conversational ability, and interaction intention.

The device of claim 10,
wherein the feature extraction unit
extracts the features based on the author information, syntax information, similarity information, emotion information, and a linguistic technique,
the linguistic technique
includes Linguistic Inquiry and Word Count (LIWC) analysis and cosine similarity,
the Linguistic Inquiry and Word Count analysis extracts at least one of emotion, style, survey, and word from the online text, and
the cosine similarity is used to extract at least one of hashtags, punctuation marks, emoticons, links, and retweets from the online text.