KR102117534B1

KR102117534B1 - Apparatus and method for predicting credibility of online data

Info

Publication number: KR102117534B1
Application number: KR1020180172778A
Authority: KR
Inventors: 한경식; 조용걸
Original assignee: 아주대학교산학협력단
Priority date: 2018-12-28
Filing date: 2018-12-28
Publication date: 2020-06-01

Abstract

The present invention relates to a method for predicting the credibility of online data. The method comprises the steps of: selecting a first element and a second element that affect the credibility assessment of online data, based on the cognitive process of a user group; storing the selected first and second elements in a feature vector space; training, with a preset machine learning method, a prediction model capable of assessing the credibility of online data based on the features stored in the vector space; and receiving online data to be predicted and predicting its credibility using the trained prediction model.

Description

Apparatus and method for predicting credibility of online data

The present application relates to an apparatus and method for predicting the credibility of online data.

A social network service (SNS) is a general term for online platforms on which users can form social relationships online through free communication and information sharing. Online data shared through social network services takes various forms, such as text, images, audio, and video; examples include blogs, Facebook, Twitter, instant message boards, podcasts, wikis, and user-created content (UCC).

Posting information to share and searching for information through these social network services is steadily increasing. As reliance on social network services for gathering information grows, judging whether the information contained in online data on such services can be trusted is becoming more important.

Accordingly, prior studies on analyzing the credibility of online data have proposed assessment methods from two broad perspectives: a computer-based approach, which builds up large volumes of data subject to credibility assessment through computational analysis or classification modeling, and a human-centered approach, which relies on theoretical knowledge from cognitive science and on empirical analysis using surveys, interviews, and the like.

However, with regard to the computer-based approach, these prior studies have the drawback that data for the labeling process of credibility analysis is collected through human annotation, so the criteria may differ from person to person and mistakes may occur.

With regard to the human-centered approach, the case studies have the limitation that, although the detailed factors influencing a respondent's assessment of the credibility of online data are not equally important, the results were analyzed without weighting those factors differently.

The background art of the present application is disclosed in Korean Patent Publication No. 10-1516635.

The present application is intended to solve the problems of the prior art described above. Taking the cognitive process of a user group into account, it ranks the factors that the user group considers important when assessing the credibility of online data, based on a two-stage case analysis consisting of an instinctive credibility assessment stage, in which the user group does not know the credibility information of the online data, and a reflective credibility assessment stage, in which the user group has been given that information. An object of the present application is to provide an apparatus and method that can, on this basis, predict the credibility of online data to be predicted with high accuracy.

However, the technical problems to be solved by the embodiments of the present application are not limited to those described above, and other technical problems may exist.

As a technical means for achieving the above technical object, a method for predicting the credibility of online data according to an embodiment of the present application may include: selecting a first element and a second element that affect the credibility assessment of online data, based on the cognitive process of a user group; storing the selected first and second elements in a feature vector space; training, with a preset machine learning method, a prediction model capable of assessing the credibility of online data based on the features stored in the vector space; and receiving online data to be predicted and predicting its credibility using the trained prediction model.

In addition, the step of selecting the first and second elements may include: an instinctive credibility assessment step of collecting the factors the user group considered in assessing credibility before learning the credibility of the online data; a reflective credibility assessment step of collecting the factors the user group considered in assessing credibility after learning the credibility of the online data; and a step of selecting, as the second element, the factor among those collected in the instinctive credibility assessment step whose rank rose the most in the reflective credibility assessment step.
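The rank comparison between the two assessment steps can be illustrated with a minimal Python sketch. It assumes that each participant's response is a list of factor names and that a factor's rank is determined by how many participants reported it; the text does not specify the ranking rule, and the function names and factor names below are hypothetical.

```python
from collections import Counter


def rank_factors(selections):
    """Rank factors by how many participants reported them (1 = most reported)."""
    counts = Counter(factor for chosen in selections for factor in chosen)
    # Sort by descending count, then by name so that ties are deterministic.
    ordered = sorted(counts, key=lambda f: (-counts[f], f))
    return {factor: rank for rank, factor in enumerate(ordered, start=1)}


def largest_rank_gain(instinctive, reflective):
    """Return the factor whose rank improved the most between the two steps."""
    before = rank_factors(instinctive)
    after = rank_factors(reflective)
    # Only factors collected in the instinctive step are candidates.
    gains = {f: before[f] - after[f] for f in before if f in after}
    # Highest gain first; ties are broken alphabetically.
    return min(gains, key=lambda f: (-gains[f], f))


# Hypothetical responses: one list of factor names per participant.
instinctive = [["tone", "images"], ["tone", "length"], ["tone", "images"], ["structure"]]
reflective = [["structure", "tone"], ["structure", "images"],
              ["structure", "length"], ["structure", "tone"]]
second_element = largest_rank_gain(instinctive, reflective)
```

In this toy data the factor named `structure` moves from rank 4 to rank 1, so it is the one selected.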

In addition, the factors whose rank rose the most may include content-related factors and author-related factors, and the step of selecting the second element may exclude the author-related factors and select the content-related factors as the second element.

In addition, the second element may include structure, text length ratio, image count ratio, and alignment.

In addition, the preset machine learning method may be any one of logistic regression, random forest, linear support vector machine (SVM), and multi-layer perceptron (MLP).
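Of the methods listed, logistic regression is simple enough to sketch without external libraries. The following Python example is an illustration only, not the implementation described in this application: the training routine (batch gradient descent), the hyperparameters, and the two-dimensional toy feature vectors are all assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the logistic loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict(model, x):
    """Return 1 (credible) or 0 (non-credible) for one feature vector."""
    w, b = model
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


# Hypothetical feature vectors; label 1 = credible, 0 = non-credible.
X = [[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]]
y = [1, 1, 0, 0]
model = train_logistic_regression(X, y)
```

Calling `predict(model, [0.85, 0.9])` then classifies a new feature vector with the trained model.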

In addition, in the step of storing in the feature vector space, the selected first and second elements may be mapped to a vector of a specific size by a TF-IDF weighting algorithm.
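As an illustration of TF-IDF weighting, the Python sketch below maps each document to a fixed-length vector whose size equals the vocabulary size. It assumes whitespace tokenization, term frequency normalized by document length, and the unsmoothed form idf = log(N / df); the application does not state which TF-IDF variant is used, and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Map each document to a vector with one TF-IDF weight per vocabulary term."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({t for tokens in tokenized for t in tokens})
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(t for tokens in tokenized for t in set(tokens))
    idf = {t: math.log(n_docs / df[t]) for t in vocabulary}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append(
            [(counts[t] / total) * idf[t] if total else 0.0 for t in vocabulary]
        )
    return vocabulary, vectors


# Hypothetical documents.
docs = ["great honest review", "great sponsored ad"]
vocabulary, vectors = tfidf_vectors(docs)
```

A term that appears in every document, such as `great` here, receives weight zero under this variant, while terms unique to one document receive positive weights.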

Meanwhile, as a technical means for achieving the above technical object, a method for generating a credibility prediction model for online data according to an embodiment of the present application may include: collecting credible online data and non-credible online data; randomly selecting items from the credible online data and the non-credible online data and transmitting them to a user group; receiving from the user group the credibility assessment results for the randomly selected credible and non-credible online data; selecting, on the basis of the received assessment results, a first element and a second element that affect the credibility assessment of online data, based on the cognitive process of the user group; storing the selected first and second elements in a feature vector space; and training, with a preset machine learning method, a prediction model capable of assessing the credibility of online data based on the features stored in the vector space.

In addition, the second element may include structure, text length ratio, image count ratio, and alignment.

Meanwhile, as a technical means for achieving the above technical object, an apparatus for predicting the credibility of online data according to an embodiment of the present application may include: a data receiving unit that receives online data of known credibility and online data to be predicted; a data feature extraction unit that selects, from the online data of known credibility, a first element and a second element that affect credibility assessment, based on the cognitive process of a user group; a vector storage unit that stores the selected first and second elements in a feature vector space; and an online data credibility prediction unit that derives a predicted value for the credibility of online data through a prediction model trained with a preset machine learning method based on the features stored in the vector space.

In addition, the data feature extraction unit may perform an instinctive credibility assessment step of collecting the factors the user group considered in assessing credibility before learning the credibility of the online data, perform a reflective credibility assessment step of collecting the factors the user group considered in assessing credibility after learning the credibility of the online data, and select, as the second element, the factor among those collected in the instinctive credibility assessment step whose rank rose the most in the reflective credibility assessment step.

The means for solving the problem described above are merely exemplary and should not be construed as limiting the present application. In addition to the exemplary embodiments described above, additional embodiments may exist in the drawings and the detailed description of the invention.

According to the means described above, for online data on social network services, it is possible to generate an online data credibility prediction model that takes both the computer-based approach and the human-centered approach into account, links the two, and predicts the credibility of the online data with high accuracy based on the user's cognitive process.

According to the means described above, it is possible to provide an online data credibility prediction apparatus that predicts the credibility of online data to be predicted using a highly accurate prediction model capable of judging the credibility of online data.

According to the means described above, the online data credibility prediction method and apparatus provide readers who access any online data with credibility information about that data, which can prevent readers from accepting untrustworthy content as true or being misled by false advertising.

FIG. 1 is a schematic diagram of an online data credibility prediction system according to an embodiment of the present application.
FIG. 2 is a schematic block diagram of an online data credibility prediction apparatus according to an embodiment of the present application.
FIG. 3 is a flowchart of a method for generating an online data credibility prediction model according to an embodiment of the present application.
FIG. 4 is a flowchart of a method for selecting a first element and a second element according to an embodiment of the present application.
FIG. 5 is a flowchart of an online data credibility prediction method according to an embodiment of the present application.
FIG. 6 is a diagram showing the per-category collection results for credible data and non-credible data according to an embodiment of the present application.
FIG. 7 is a diagram for explaining the process by which the credibility assessment of online data is performed according to an embodiment of the present application.
FIG. 8 is a table from an experimental example associated with the online data credibility prediction method according to an embodiment of the present application, ranking the important factors considered in the instinctive credibility assessment and those considered in the reflective credibility assessment.
FIG. 9 is a table from an experimental example associated with the online data credibility prediction method according to an embodiment of the present application, illustrating the result of selecting the first element and the second element based on the cognitive process of a user group.
FIG. 10 is a diagram from an experimental example associated with the online data credibility prediction method according to an embodiment of the present application, showing the prediction accuracy and F1 score for each machine learning method.

Hereinafter, embodiments of the present application are described in detail with reference to the accompanying drawings so that a person of ordinary skill in the art to which the present application pertains can readily practice them. The present application may, however, be implemented in many different forms and is not limited to the embodiments described here. In the drawings, parts unrelated to the description are omitted for clarity, and similar parts are given similar reference numerals throughout the specification.

Throughout this specification, when a part is said to be "connected" to another part, this includes not only the case where it is "directly connected" but also the case where it is "electrically connected" with another element in between.

Throughout this specification, when a member is said to be located "on", "above", "at the top of", "under", "below", or "at the bottom of" another member, this includes not only the case where the member is in contact with the other member but also the case where a further member exists between the two.

Throughout this specification, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise.

The present application relates to a method and apparatus for predicting the credibility of online data.

FIG. 1 is a schematic diagram of an online data credibility prediction system according to an embodiment of the present application.

Referring to FIG. 1, the online data credibility prediction apparatus 100 according to an embodiment of the present application may receive online data 21, which consists of individual items with differing levels of credibility, and transmit it to a user group 40. Over a network 10, the apparatus may then receive the responses from an instinctive credibility assessment, which the user group 40 performs without knowing the credibility information of the online data 21, and the responses from a reflective credibility assessment, which the user group 40 performs after being informed of that information. From these responses the apparatus identifies the factors that played an important role when the user group 40 assessed the credibility of the online data 21 in each assessment. Rather than treating these factors as equal, it calculates the importance of each factor in credibility assessment so that more important factors receive higher weights. It can thereby generate a machine-learning-based model that learns how the identified factors differ between online data of differing credibility levels and that can classify the credibility of online data.

In addition, when the online data credibility prediction apparatus 100 according to an embodiment of the present application newly receives online data that was created and shared through a separate user terminal 30 and whose credibility is unknown (hereinafter, "online data to be predicted 22"), it may predict the credibility of the online data to be predicted 22 based on the model and derive a result value.

The online data credibility prediction apparatus 100 according to an embodiment of the present application may be a server or device that receives the online data to be predicted 22 created by the user terminal 30, predicts the credibility corresponding to the received data, and provides the prediction result to the user terminal 30.

For example, the network 10 refers to a wired or wireless connection structure that allows information to be exchanged between nodes such as terminals and servers. Examples of such a network include, but are not limited to, a 3GPP (3rd Generation Partnership Project) network, an LTE (Long Term Evolution) network, a 5G network, a WiMAX (Worldwide Interoperability for Microwave Access) network, the Internet, a LAN (Local Area Network), a wireless LAN (Wireless Local Area Network), a WAN (Wide Area Network), a PAN (Personal Area Network), a Bluetooth network, a satellite broadcasting network, an analog broadcasting network, and a DMB (Digital Multimedia Broadcasting) network.

For example, the user terminal 30 may be a PCS (Personal Communication System), GSM (Global System for Mobile communication), PDC (Personal Digital Cellular), PHS (Personal Handyphone System), PDA (Personal Digital Assistant), IMT (International Mobile Telecommunication)-2000, CDMA (Code Division Multiple Access)-2000, W-SmartPad, tablet PC, desktop PC, laptop, or wearable device. It is not limited to these and may include any kind of wired or wireless communication device.

For example, the online data 21 and the online data to be predicted 22 may include various kinds of Internet posts and SNS data, such as blogs and Instagram.

Referring to FIG. 1, the online data credibility prediction apparatus 100 may collect credible online data and non-credible online data.

For example, the online data credibility prediction apparatus 100 may receive the credible online data and the non-credible online data from separate web servers that upload and manage online data.

According to an embodiment of the present application, credibility may be defined in terms of trustworthiness and expertise. Trustworthiness may mean the belief that the online data was written on the basis of honest, undistorted opinion, and expertise may mean the belief that the online data was written on the basis of knowledge and experience in the relevant field.

According to an embodiment of the present application, credible online data may be defined as "information containing the genuine opinion of an author with ample knowledge of the field", and non-credible online data may be defined as "information written by an author without knowledge of the field, based on something the author has not directly experienced".

According to an embodiment of the present application, an example of the credible data is blogs written by Power Bloggers active on the NAVER portal. Although it is difficult to assert that blog posts are credible, NAVER reviews blogs in 30 fields every month in terms of experience, trust, commerciality, copyright, and activity, selects authors who have provided excellent and reliable information as Power Bloggers for each field, and offers a service introducing their posts. It can therefore be reasonable to regard these posts as credible data.

Conversely, according to an embodiment of the present application, an example of the non-credible data is posts published on the Dbdbdeep (http://dbdbdeep.co.kr) website. Unlike the blogs posted by the Power Bloggers selected by NAVER, the Dbdbdeep website contains many reviews of restaurants, hair salons, cosmetics, lodging establishments, and the like for which it cannot be confirmed whether the author wrote from actual experience, and most are advertisements written by companies to look like genuine reviews. It can therefore be reasonable to regard these posts as non-credible data.

According to an embodiment of the present application, the credible data and the non-credible data may be collected by category, each category covering the same subject.

FIG. 6 is a diagram showing the per-category collection results for credible data and non-credible data according to an embodiment of the present application.

Referring to FIG. 6, the categories may be fashion/beauty, health/medicine, education/learning, famous restaurants, tea/coffee/dessert, IT/computers, drama/entertainment, performances/exhibitions, vehicles, foreign languages, currency/economy, travel, wine/liquor, sports, parenting/marriage, leisure, photography, comics/animation, and the like.

According to an embodiment of the present application, the crawling code for collecting the credible data and the non-credible data may be written in the Python language.
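The crawling code itself is not given in this application. As a rough sketch of one step such code might perform, the Python example below parses the HTML of a single post that has already been downloaded and extracts its visible text and its image count, using only the standard-library `html.parser` module. The class name, function name, and sample HTML are hypothetical, and no network access is shown.

```python
from html.parser import HTMLParser


class PostParser(HTMLParser):
    """Collect the text fragments and count the <img> tags of one post."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.image_count = 0

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing tags such as <img ... />.
        if tag == "img":
            self.image_count += 1

    def handle_data(self, data):
        stripped = data.strip()
        if stripped:
            self.text_parts.append(stripped)


def parse_post(html):
    """Return the text and image count of a post given its HTML source."""
    parser = PostParser()
    parser.feed(html)
    parser.close()
    return {"text": " ".join(parser.text_parts), "image_count": parser.image_count}


# Hypothetical post HTML, held in memory.
sample_html = (
    '<div><p>Nice cafe.</p><img src="a.jpg">'
    '<p>Good coffee.</p><img src="b.jpg"/></div>'
)
post = parse_post(sample_html)
```

The returned dictionary could then feed whatever feature computation follows; how the text length ratio and image count ratio are actually defined is not stated in this section.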

In addition, the online data credibility prediction apparatus 100 may randomly select items from the collected credible online data and non-credible online data and transmit them to the user group.

According to an embodiment of the present application, the online data credibility prediction apparatus 100 may transmit to the user group a set consisting of three blog posts randomly selected from the credible data and three blog posts randomly selected from the non-credible data. The number of blog posts given here is only an example, and the set need not be limited to it.
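Building such a set can be sketched in Python with the standard `random` module. The function name, the post identifiers, and the `seed` parameter are hypothetical, and shuffling the final set so that item order does not reveal the label is an assumption that the text does not state.

```python
import random


def build_evaluation_set(credible_posts, non_credible_posts, per_class=3, seed=None):
    """Draw per_class posts from each pool and return (post, is_credible) pairs."""
    rng = random.Random(seed)
    chosen = [(post, True) for post in rng.sample(credible_posts, per_class)]
    chosen += [(post, False) for post in rng.sample(non_credible_posts, per_class)]
    # Assumption: shuffle so the presentation order does not hint at the label.
    rng.shuffle(chosen)
    return chosen


# Hypothetical post identifiers.
credible = [f"credible-{i}" for i in range(10)]
non_credible = [f"non-credible-{i}" for i in range(10)]
evaluation_set = build_evaluation_set(credible, non_credible, seed=0)
```

The labels are kept alongside the posts so that they can be revealed to the user group later, before the reflective assessment.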

In addition, the online data credibility prediction apparatus 100 may receive from the user group the credibility assessment results for the randomly selected credible online data and non-credible online data.

In addition, on the basis of the received assessment results, the online data credibility prediction apparatus 100 may select a first element and a second element that affect the credibility assessment of online data, based on the cognitive process of the user group. Assessing the credibility of online data based on the cognitive process may mean considering the user's assessment process and assessment factors both before and after the user learns whether a particular item of online data has been classified as credible, extracting the assessment factors the user treats as important once the user has recognized the item as credible, and taking those factors into account to assess the credibility of online data more accurately.

FIG. 7 is a diagram from an experimental example associated with the online data credibility prediction method according to an embodiment of the present application, for explaining the process by which the credibility assessment of online data is performed.

Referring to FIG. 7, response information is first collected for questions about the personal details of the user group and about their blog usage and search patterns.

Next, the instinctive credibility assessment (Visceral & Behavioral Assessment) may be performed.

According to an embodiment of the present application, the instinctive reliability evaluation is a step of collecting the factors that the user group considered in evaluating reliability before knowing the reliability of the online data. For each post constituting the online data set, a user reads the post, makes an immediate judgment as to whether the post can be trusted, and submits, together with that judgment, the factors the user considered in making it.

According to an embodiment of the present application, the considered factors may be collected as follows: a list of factors that can generally be considered in evaluating the reliability of online data is drawn up from an analysis of prior studies and presented to the user group, and each user marks only the items on the list that the user actually considered and submits the result.

According to an embodiment of the present application, the list of factors that can be considered may be composed of sub-items relating to content (Content), sentiment (Sentiment), or style (Style).

Next, the online data reliability prediction apparatus 100 may transmit to the user group the reliability information for each post of the online data set that was previously transmitted to the user group.

According to an embodiment of the present application, the reliability information may include information on whether each post in the online data set is a post written by a NAVER power blogger, which is treated as reliable data, or a post collected from the Dbdbdepp website, which is treated as unreliable data.

Next, a reflective reliability evaluation (Reflective Assessment) may be performed.

According to an embodiment of the present application, the reflective reliability evaluation is a step of collecting the factors that the user group considered in evaluating reliability after learning the reliability of the online data. After being given the reliability information for each post constituting the online data set, a user carefully rereads each post in light of that information and submits the factors that the user regards as important in the post being classified as reliable or unreliable data.

According to an embodiment of the present application, the factors regarded as important may be collected by having each user mark and submit only the items the user considered, on the same list that was used in the instinctive reliability evaluation.

In addition, the online data reliability prediction apparatus 100 may rank by importance the factors considered in the instinctive reliability evaluation, as collected through the process described above.

In addition, the online data reliability prediction apparatus 100 may rank by importance the factors considered in the reflective reliability evaluation, as collected through the process described above.

In addition, the online data reliability prediction apparatus 100 may select as the second element the factor, among those collected in the instinctive reliability evaluation step, whose rank rose the most in the reflective reliability evaluation step.

According to an embodiment of the present application, the second elements, whose rank rose in the reflective reliability evaluation relative to the instinctive reliability evaluation, are factors that users weighed more heavily than in the earlier instinctive reliability evaluation once they knew the reliability information of the posts they had read. Distinguishing the first element from the second element makes it possible to evaluate the reliability of online data in consideration of both the instinctive, immediate human cognitive process and the analytic, deliberate one.
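The rank comparison between the two evaluation stages can be illustrated with a minimal Python sketch. It assumes that a factor's importance is measured by how many raters ticked it on the checklist, which the description does not state explicitly; the function names and the sample responses are hypothetical.

```python
from collections import Counter

def rank_factors(selections):
    """Rank factors by how often raters ticked them (rank 1 = most frequent)."""
    counts = Counter(f for ticked in selections for f in ticked)
    # Sort by descending count, then by name so that ties are deterministic.
    ordered = sorted(counts, key=lambda f: (-counts[f], f))
    return {factor: pos + 1 for pos, factor in enumerate(ordered)}

def rank_rise(visceral, reflective):
    """For factors seen in both stages, return (factor, rise), largest rise first."""
    v, r = rank_factors(visceral), rank_factors(reflective)
    rises = {f: v[f] - r[f] for f in v if f in r}
    return sorted(rises.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical checklist responses, one set of ticked factors per rater.
visceral = [{"images", "font"}, {"images", "coherency"}, {"images", "font"}]
reflective = [{"coherency", "images"}, {"coherency"}, {"coherency", "font"}]

# The first entry is the candidate second element under this assumption.
print(rank_rise(visceral, reflective))
```

In this toy data "coherency" moves from rank 3 to rank 1, so it would be the candidate second element, while "images" drops from rank 1 to rank 3.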

FIG. 8 is a table ranking the important factors considered in the instinctive reliability evaluation and those considered in the reflective reliability evaluation, as an experimental example associated with the online data reliability prediction method according to an embodiment of the present application.

The blue cells in FIG. 8 mark factors whose importance rank is higher in the reflective reliability evaluation than in the instinctive reliability evaluation, and the green cells mark factors considered only in the instinctive reliability evaluation or only in the reflective reliability evaluation.

According to an embodiment of the present application, visual factors ranked high in the instinctive reliability evaluation, whereas in the reflective reliability evaluation the high ranks were mainly taken by factors that measure whether the online data was written coherently, rather than by visual factors.

According to an embodiment of the present application, the factors whose rank rose the most may include content-related factors and author-related factors. As FIG. 8 shows, they may include content-related factors, typified by the coherency of the post, and author-related factors, typified by author expertise.

According to an embodiment of the present application, the author-related factors may be excluded and the content-related factors may be selected as the second element.

The reasons are as follows. Power bloggers, whose posts serve as the reliable data in an embodiment of the present application, are far more likely to be exposed to the public than ordinary online data authors (blog users). Author-related information is also difficult to extract given the screen layout in which online data is shown to readers. Moreover, Internet users searching for information generally tend to gather it from online data found through keyword searches rather than by bookmarking particular authors. For these reasons, including author-related factors in the second element might fail to reflect accurately the reliability of each individual piece of online data itself.

According to an embodiment of the present application, the second element may include structure, text length ratio, image count ratio, and alignment.

According to an embodiment of the present application, the structure (Structure) may represent how the text parts and image parts of the online data are arranged. The overall layout of the online data may be divided into 5-gram sequences of text and image blocks (for example, Text-Text-Text-Image-Text or Text-Image-Text-Text-Text) and vectorized using TF-IDF (Term Frequency-Inverse Document Frequency).
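As a rough illustration of the structure feature, the sketch below turns a post's block sequence into 5-grams and counts them. It assumes a sliding window with a stride of one block, which the description does not specify; the names `structure_ngrams` and `post` and the block labels are hypothetical. The resulting counts are the term frequencies that a TF-IDF weighting would then rescale.

```python
from collections import Counter

def structure_ngrams(blocks, n=5):
    """Slide a window of n blocks over a post's sequence of block types."""
    return ["-".join(blocks[i:i + n]) for i in range(len(blocks) - n + 1)]

# Hypothetical post layout: the order in which text and image blocks appear.
post = ["Text", "Text", "Text", "Image", "Text", "Text", "Image"]

grams = structure_ngrams(post)
term_frequency = Counter(grams)  # raw counts of each 5-gram in this post

for gram, count in term_frequency.items():
    print(gram, count)
```

A post shorter than five blocks yields no 5-grams under this assumption, so a real pipeline would need a rule for short posts (for example, padding).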

According to an embodiment of the present application, the text length ratio (Effort text ratio) may be the text length of the target online data divided by the average text length of the field to which the online data belongs.

According to an embodiment of the present application, the image count ratio (Effort image ratio) may be the number of images in the target online data divided by the average number of images per post in the field to which the online data belongs.
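Both ratios divide a post's own value by the average for its field, so one helper covers them. The sketch below is a minimal illustration; the function name and the sample numbers are hypothetical, and returning 0.0 for an all-zero field average is an assumption made only to avoid division by zero.

```python
from statistics import mean

def effort_ratio(value, field_values):
    """Divide a post's value by the average of the same value over its field."""
    average = mean(field_values)
    return value / average if average else 0.0

# Hypothetical field statistics for one blog category.
field_text_lengths = [800, 1200, 1000]   # characters per post
field_image_counts = [4, 6, 5]           # images per post

text_length_ratio = effort_ratio(1500, field_text_lengths)
image_count_ratio = effort_ratio(10, field_image_counts)
print(text_length_ratio, image_count_ratio)
```

A ratio above 1 means the post is longer (or more image-heavy) than is typical for its field.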

According to an embodiment of the present application, the alignment (Alignment) may be a factor indicating whether the images and text paragraphs in the online data are left-aligned, center-aligned, right-aligned, or aligned in a mixed manner.
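One simple way to derive such a factor is to collapse the per-block alignments of a post into a single category. The sketch below assumes that the alignment of every block is already known (for example, from the post's markup); the function name and the "unknown" fallback for an empty post are hypothetical.

```python
def alignment_feature(block_alignments):
    """Collapse per-block alignments into left, center, right, or mixed."""
    kinds = set(block_alignments)
    if not kinds:
        return "unknown"      # assumption: a post with no blocks has no alignment
    if len(kinds) == 1:
        return kinds.pop()    # every block shares the same alignment
    return "mixed"

print(alignment_feature(["center", "center", "center"]))
print(alignment_feature(["left", "center", "left"]))
```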

In addition, the online data reliability prediction apparatus 100 may determine as the first element the factors that remain after the second element is excluded from all the factors collected through the instinctive reliability evaluation and the reflective reliability evaluation.

According to an embodiment of the present application, the first element may be composed of detailed items relating to content (Content), sentiment (Sentiment), or style (Style).

FIG. 9 is a table illustrating the result of selecting the first element and the second element based on the cognitive process of the user group, as an experimental example associated with the online data reliability prediction method according to an embodiment of the present application.

Referring to FIG. 9, the factors marked in blue may be examples of the second element, and the factors not marked in blue may be examples of the first element.

Referring to FIG. 9, structure, text length ratio, image count ratio, and alignment may be selected as the second element. As the first element, at least one of the following may be selected: title length, presence of a URL, number of question marks, first-person word ratio, second-person word ratio, number of tags, number of stickers, number of media items, number of grammatical errors, and presence of a map (the foregoing being content-related factors); positive word ratio, negative word ratio, subjectivity, polarity, and emotional difference (the foregoing being sentiment-related factors); and whether the font is changed, whether the font size is changed, bold usage ratio, and font color change ratio (the foregoing being style-related factors).

In addition, the online data reliability prediction apparatus 100 may store the selected first element and second element in a feature vector space.

According to an embodiment of the present application, storing in the feature vector space may mean that the selected first element and second element are mapped by a TF-IDF weighting algorithm onto a vector of a specific size.

According to an embodiment of the present application, the online data reliability prediction apparatus 100 may store the reliability information of the online data in the feature vector space in association with the information on the selected first element and second element.

The TF-IDF (Term Frequency-Inverse Document Frequency) weight according to an embodiment of the present application is a weight used, for example, to extract the key terms of a document or to measure the similarity between documents. Given a collection of documents, it is a statistical value indicating how important a word is within a particular document.
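The description does not say which TF-IDF variant is used. The sketch below assumes one common textbook form, in which the term frequency is the share of a document's tokens taken up by the term and the inverse document frequency is log(N / df); the function name and the toy documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs is a list of token lists; return one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights

# Hypothetical tokenized posts.
docs = [["blog", "review", "review"], ["blog", "recipe"], ["blog", "travel"]]
w = tf_idf(docs)
print(w[0])
```

Here "blog" appears in every document and therefore receives weight 0, while "review", which is frequent in the first post and absent elsewhere, receives the largest weight in that post.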

In addition, the online data reliability prediction apparatus 100 may train, by a preset machine learning method and based on the features stored in the vector space, a prediction model capable of evaluating the reliability of online data.

According to an embodiment of the present application, the preset machine learning method may be any one of logistic regression (Logistic Regression), random forest (Random Forest), linear support vector machine (linear Support Vector Machine), and multi-layer perceptron (Multi-Layer Perceptron).

However, logistic regression, random forest, linear support vector machine, and multi-layer perceptron are given only as examples to aid understanding, and they should not be construed as restricting or limiting the application of other machine learning methods to the present idea.

In addition, the online data reliability prediction apparatus 100 may receive prediction target online data.

According to an embodiment of the present application, the prediction target online data is online data for which no reliability information is given, and it may be the online data whose reliability is to be predicted by the trained prediction model.

According to an embodiment of the present application, the online data reliability prediction apparatus 100 may extract, from the received prediction target online data, features of that data relating to at least one of the first elements and the second elements.

According to an embodiment of the present application, the online data reliability prediction apparatus 100 may apply (input) the extracted features of the prediction target online data to the trained prediction model.

In addition, the online data reliability prediction apparatus 100 may predict the reliability of the prediction target online data using the trained prediction model.

According to an embodiment of the present application, the reliability prediction result for the prediction target online data may be given in binary form, as reliable or unreliable, or the reliability may be quantified and output as a percentage or a decimal value.
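The training and prediction steps can be sketched with logistic regression, one of the methods named in the description. The code below is a minimal from-scratch illustration, not the patent's implementation: the gradient-descent settings, the two-feature vectors, and the 0.5 threshold are all assumptions, and a practical system would more likely rely on an existing machine learning library.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=5000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    w, b = [0.0] * len(X[0]), 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(model, x, threshold=0.5):
    """Return both the binary label and the probability of being reliable."""
    w, b = model
    p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return ("reliable" if p >= threshold else "unreliable"), p

# Hypothetical training vectors: [text length ratio, image count ratio].
X = [[1.6, 1.4], [1.3, 1.8], [1.5, 1.2], [0.4, 0.5], [0.6, 0.3], [0.5, 0.7]]
y = [1, 1, 1, 0, 0, 0]   # 1 = reliable, 0 = unreliable

model = train_logistic(X, y)
label, prob = predict(model, [1.4, 1.5])
print(label, f"{prob:.1%}")
```

Returning the probability alongside the label covers both output forms mentioned above: the label is the binary result, and the probability can be printed as a percentage or a decimal value.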

According to an embodiment of the present application, the classification performance of the online data reliability prediction apparatus 100 may be measured by accuracy (Accuracy) or by the F1 score (F1 Score: F1).

Accuracy, one of the indicators for evaluating the performance of a machine learning model, is the proportion of all cases for which the model's prediction matches the correct answer. It is calculated by [Equation 1] below.

[Equation 1]

$$\text{Accuracy} = \frac{\text{True Positives} + \text{True Negatives}}{\text{True Positives} + \text{True Negatives} + \text{False Positives} + \text{False Negatives}}$$

In [Equation 1], True Positives is the number of cases in which the correct answer is true and the model predicted true; True Negatives is the number of cases in which the correct answer is false and the model predicted false; False Positives is the number of cases in which the correct answer is false but the model wrongly predicted true; and False Negatives is the number of cases in which the correct answer is true but the model wrongly predicted false.

The performance of a machine learning model is also measured using recall (Recall) and precision (Precision), which are calculated by [Equation 2] and [Equation 3] below, respectively.

[Equation 2]

$$\text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}$$

[Equation 3]

$$\text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}$$

Depending on the situation in which a machine learning model is applied, a model with high recall may be preferred, or a model with high precision may be preferred. The F1 score (F1 Score: F1) is therefore generally used to see how well a machine learning model performs overall. The F1 score is the harmonic mean of recall and precision, as given in [Equation 4] below.

[Equation 4]

$$F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
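Accuracy, recall, precision, and the F1 score can all be computed from the four confusion counts. The sketch below uses the standard definitions; the function name and the sample labels are hypothetical, and returning 0.0 when a denominator is zero is an assumption made to keep the example total.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels (1 = reliable)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical ground truth and model predictions for eight posts.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In this toy data there are three true positives, three true negatives, one false positive, and one false negative, so all four measures come out at 0.75.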

FIG. 10 is a diagram showing the prediction accuracy and F1 score for each machine learning method, as an experimental example associated with the online data reliability prediction method according to an embodiment of the present application.

Referring to FIG. 10, the accuracy and F1 score of the predictions of the online data reliability prediction apparatus 100 can be compared across the machine learning methods according to an embodiment of the present application (logistic regression, LR; random forest, RF; linear support vector machine, SVM; multi-layer perceptron, MLP) and across the factors considered (the first element only, or the first and second elements together). The yellow bar on the left is the case in which only the first element is stored in the feature vector space and serves as the baseline for comparison. The green bar on the right is the result for a model trained after both the first element and the second element are stored in the feature vector space.

Referring to FIG. 10, in every case the model that also considers the second element achieves higher accuracy and a higher F1 score than the model that considers only the first element, and among the machine learning methods the multi-layer perceptron (MLP) model shows the best predictive performance.

FIG. 2 is a schematic block diagram of an online data reliability prediction apparatus according to an embodiment of the present application.

Referring to FIG. 2, the online data reliability prediction apparatus 100 according to an embodiment of the present application may include a data receiving unit 110, a data characteristic extraction unit 120, a vector storage unit 130, and a data reliability prediction unit 140.

Referring to FIG. 2, the data receiving unit 110 may receive online data whose reliability is known and prediction target online data.

According to an embodiment of the present application, the reliability information of online data may be evaluated in binary terms. The data receiving unit 110 may receive from a separate web server, as reliable online data, posts written by power bloggers selected by NAVER and, as unreliable online data, posts published on the Dbdbdepp website.

In addition, the data characteristic extraction unit 120 may select, from the online data whose reliability is known, a first element and a second element that affect the reliability evaluation, based on the cognitive process of the user group.

According to an embodiment of the present application, the data characteristic extraction unit 120 may perform an instinctive reliability evaluation step of collecting the factors the user group considered in evaluating reliability before knowing the reliability of the online data, perform a reflective reliability evaluation step of collecting the factors the user group considered in evaluating reliability after learning the reliability of the online data, and select as the second element the factor, among those collected in the instinctive reliability evaluation step, whose rank rose the most in the reflective reliability evaluation step.

According to an embodiment of the present application, the analysis of the reliability evaluations by the data characteristic extraction unit 120 may show that the factors whose rank rose the most include content-related factors and author-related factors. However, excluding the author-related factors and selecting only the content-related factors as the second element may be more consistent with how online data is actually written and shared.

According to an embodiment of the present application, the second element may include structure, text length ratio, image count ratio, and alignment.

In addition, the vector storage unit 130 may store the selected first element and second element in a feature vector space.

According to an embodiment of the present application, the vector storage unit 130 may map the first element and the second element selected by the data characteristic extraction unit 120 onto a vector of a specific size by a TF-IDF weighting algorithm.

According to an embodiment of the present application, the vector storage unit 130 may store the reliability information of the online data in the feature vector space in association with the information on the selected first element and second element.

In addition, the data reliability prediction unit 140 may derive a predicted value for the reliability of online data through a prediction model trained by a preset machine learning method based on the features stored in the vector space.

According to an embodiment of the present application, the data reliability prediction unit 140 may extract, from the received prediction target online data, features of that data relating to at least one of the first elements and the second elements.

According to an embodiment of the present application, the data reliability prediction unit 140 may apply (input) the extracted features of the prediction target online data to the trained prediction model.

According to an embodiment of the present application, the data reliability prediction unit 140 may use, as the preset machine learning method, any one of logistic regression (Logistic Regression), random forest (Random Forest), linear support vector machine (linear Support Vector Machine), and multi-layer perceptron (Multi-Layer Perceptron).

According to an embodiment of the present application, the data reliability prediction unit 140 may give the reliability prediction result for the prediction target online data received by the data receiving unit 110 in binary form, as reliable or unreliable, or may quantify the reliability and output it as a percentage or a decimal value.

FIG. 3 is an operation flowchart of a method for generating an online data reliability prediction model according to an embodiment of the present application.

The method for generating an online data reliability prediction model shown in FIG. 3 may be performed by the online data reliability prediction apparatus 100 described above. Accordingly, even where omitted below, the description given for the online data reliability prediction apparatus 100 applies equally to FIG. 3.

Referring to FIG. 3, in step S311 the data receiving unit 110 may collect and receive reliable online data. For example, the data receiving unit 110 may collect and receive online data from a data server, or from individual user terminals, storing online data that has already been evaluated as relatively reliable.

In addition, in step S312 the data receiving unit 110 may collect and receive unreliable online data. For example, the data receiving unit 110 may collect and receive online data from a data server, or from individual user terminals, storing online data that has already been evaluated as relatively unreliable.

According to an embodiment of the present application, the reliable online data and the unreliable online data may include various kinds of Internet posts and SNS data, such as blog and Instagram posts.

Next, in step S320 the online data reliability prediction apparatus 100 may randomly select items from the reliable online data and the unreliable online data and transmit them to the user group. For example, the online data reliability prediction apparatus 100 may mix equal numbers of reliable and unreliable online data items and transmit them to the user terminals of the user group.
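The random, balanced selection in step S320 can be sketched as follows. This is only an illustration under assumed names: `build_survey_set`, the post identifiers, and the `seed` parameter are hypothetical, and the true label is kept alongside each post for later bookkeeping rather than shown to the raters.

```python
import random

def build_survey_set(reliable, unreliable, per_class, seed=None):
    """Randomly pick the same number of posts from each pool and shuffle them."""
    rng = random.Random(seed)
    chosen = [(post, True) for post in rng.sample(reliable, per_class)]
    chosen += [(post, False) for post in rng.sample(unreliable, per_class)]
    rng.shuffle(chosen)   # mix the two classes so that order reveals nothing
    return chosen

# Hypothetical post identifiers for the two pools.
reliable_posts = [f"power-blog-{i}" for i in range(10)]
unreliable_posts = [f"other-site-{i}" for i in range(10)]

survey = build_survey_set(reliable_posts, unreliable_posts, per_class=3, seed=0)
posts_to_send = [post for post, _ in survey]   # labels are withheld from raters
print(posts_to_send)
```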

Next, in step S330 the data receiving unit 110 may receive, from the user terminals of the user group, the reliability evaluation results for the randomly selected reliable online data and unreliable online data.

In addition, in step S330 the data characteristic extraction unit 120 may select, based on the reliability evaluation results received by the data receiving unit 110 and in accordance with the cognitive process of the user group, a first element and a second element that affect the reliability evaluation of online data.

According to an embodiment of the present application, the reliability evaluation is performed in two stages, the instinctive reliability evaluation followed by the reflective reliability evaluation, and a factor whose importance rank is higher in the later reflective reliability evaluation than in the preceding instinctive reliability evaluation may be selected as the second element.

다음으로, 단계 S340에서, 벡터 저장부(130)는 상기 선정된 제1요소 및 제2요소를 특성 벡터 공간에 저장할 수 있다.Next, in step S340, the vector storage unit 130 may store the selected first and second elements in a characteristic vector space.

According to an embodiment of the present application, in step S340, the vector storage unit 130 may store the reliability information of the online data in the characteristic vector space in association with the selected first-element and second-element information.

Next, in step S350, the data reliability prediction unit 140 may train, by a preset machine learning method, a prediction model capable of evaluating the reliability of online data, based on the characteristics stored in the vector space.

According to an embodiment of the present application, in step S350, the data reliability prediction unit 140 may use, as the preset machine learning method, any one of logistic regression, random forest, a linear support vector machine (linear SVM), or a multi-layer perceptron.
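As a non-limiting illustration of step S350, the following minimal sketch trains logistic regression, one of the listed methods, by batch gradient descent using only the standard library. It is not the implementation of the present application. The two-dimensional feature layout, the toy values, the learning rate, and the epoch count are all assumptions chosen for this sketch.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(model, x):
    """Return the estimated probability that feature vector x is reliable."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical characteristic vectors [text_length_ratio, image_count_ratio];
# label 1 = reliable, 0 = unreliable. The values are invented for illustration.
X = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.2], [0.2, 0.9], [0.1, 0.8], [0.3, 0.7]]
y = [1, 1, 1, 0, 0, 0]
model = train_logistic_regression(X, y)
```

In practice a library implementation of any of the four listed methods could be substituted. The sketch only shows how labeled characteristic vectors feed a trainable classifier.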

In the above description, steps S310 to S350 may be further divided into additional steps or combined into fewer steps, depending on the implementation of the present application. In addition, some steps may be omitted as needed, and the order of the steps may be changed.

FIG. 4 is an operational flowchart of a method for selecting the first element and the second element according to an embodiment of the present application.

The method of selecting the first element and the second element illustrated in FIG. 4 may be performed by the online data reliability prediction apparatus 100 described above. Therefore, even where omitted below, the description given for the online data reliability prediction apparatus 100 applies equally to FIG. 4.

Referring to FIG. 4, in step S410, the data receiving unit 110 may receive the instinctive reliability evaluation results and collect the elements that the user group considered in evaluating reliability before knowing the reliability of the online data.

Next, in step S420, the online data reliability prediction apparatus 100 may transmit the reliability information of the evaluated online data to the user group that has completed the instinctive reliability evaluation.

Next, in step S430, the data receiving unit 110 may receive the reflective reliability evaluation results and collect the elements that the user group considered in evaluating reliability after learning the reliability of the online data.

Next, in step S440, the data characteristic extraction unit 120 may receive, from the data receiving unit 110, information on the elements considered in the instinctive reliability evaluation and calculate an importance ranking for each element.

Next, in step S450, the data characteristic extraction unit 120 may receive, from the data receiving unit 110, information on the elements considered in the reflective reliability evaluation and calculate an importance ranking for each element.

Next, in step S460, the data characteristic extraction unit 120 may compare the per-element importance rankings and select, from among the elements collected in the instinctive reliability evaluation step, the element whose ranking rose the most in the reflective reliability evaluation step as the second element.

Next, in step S470, the data characteristic extraction unit 120 may determine, as the first element, the elements other than the selected second element from among all elements collected across the instinctive and reflective reliability evaluations.
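As a non-limiting illustration, steps S440 to S470 can be sketched as follows. The description does not state how the importance ranking is calculated, so ranking by how often evaluators cited each element is an assumption of this sketch, as are the function names and the example element names.

```python
from collections import Counter


def importance_ranking(mentions):
    """Rank elements by how often evaluators cited them, most cited first (S440, S450)."""
    return [element for element, _ in Counter(mentions).most_common()]


def select_elements(instinctive_ranking, reflective_ranking):
    """Return (first_elements, second_element) from the two rankings."""
    before = {e: pos for pos, e in enumerate(instinctive_ranking, start=1)}
    after = {e: pos for pos, e in enumerate(reflective_ranking, start=1)}
    # S460: only elements collected in the instinctive step are candidates;
    # a positive rise means the element moved up in the reflective step.
    rises = {e: before[e] - after[e] for e in before if e in after}
    if not rises:
        raise ValueError("no element appears in both rankings")
    second = max(rises, key=rises.get)
    # S470: every other element collected in either step is a first element.
    collected = list(dict.fromkeys(instinctive_ranking + reflective_ranking))
    first = [e for e in collected if e != second]
    return first, second


# Hypothetical evaluator responses, for illustration only.
instinctive = importance_ranking(["author", "author", "author", "images", "images", "structure"])
reflective = importance_ranking(["structure", "structure", "structure", "author", "author", "images"])
first, second = select_elements(instinctive, reflective)
```

With these invented responses, "structure" moves from third place to first place and is therefore selected as the second element, while "author" and "images" remain first elements.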

In the above description, steps S410 to S470 may be further divided into additional steps or combined into fewer steps, depending on the implementation of the present application. In addition, some steps may be omitted as needed, and the order of the steps may be changed.

FIG. 5 is an operational flowchart of an online data reliability prediction method according to an embodiment of the present application.

The online data reliability prediction method illustrated in FIG. 5 may be performed by the online data reliability prediction apparatus 100 described above. Therefore, even where omitted below, the description given for the online data reliability prediction apparatus 100 applies equally to FIG. 5.

Referring to FIG. 5, in step S510, the data receiving unit 110 may receive prediction target online data.

According to an embodiment of the present application, the prediction target online data may include various kinds of Internet posts and SNS data, such as blogs and Instagram posts, for which no reliability information is given.

Next, in step S520, the data reliability prediction unit 140 may extract, from the received prediction target online data, characteristics of the prediction target online data related to at least one of the first elements and the second elements.

Also in step S520, the data reliability prediction unit 140 may apply (input) the extracted characteristics of the prediction target online data to the trained prediction model.

Next, in step S530, the data reliability prediction unit 140 may predict the reliability of the prediction target online data using the trained prediction model.

According to an embodiment of the present application, in step S530, the data reliability prediction unit 140 may provide the reliability prediction result for the prediction target online data in binary form, as either reliable or unreliable, or may quantify the reliability and output it as a percentage or a decimal value.
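As a non-limiting illustration, the three output forms described for step S530 can be sketched as follows. The function name `format_prediction`, the `mode` values, and the 0.5 decision threshold are assumptions of this sketch; the description does not specify a threshold.

```python
def format_prediction(probability, mode="binary", threshold=0.5):
    """Render a predicted reliability in one of the output forms of step S530.

    `probability` is the model's estimate, in [0, 1], that the data is reliable.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if mode == "binary":
        return "reliable" if probability >= threshold else "unreliable"
    if mode == "percentage":
        return f"{probability * 100:.1f}%"
    if mode == "decimal":
        return round(probability, 3)
    raise ValueError(f"unknown mode: {mode}")
```

For example, a predicted value of 0.82 would be reported as "reliable" in binary form or as "82.0%" in percentage form.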

In the above description, steps S510 to S530 may be further divided into additional steps or combined into fewer steps, depending on the implementation of the present application. In addition, some steps may be omitted as needed, and the order of the steps may be changed.

The method for generating an online data reliability prediction model and the method for predicting online data reliability according to an embodiment of the present application may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the present invention, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules in order to perform the operations of the present invention, and vice versa.

In addition, the above-described method for generating an online data reliability prediction model and method for predicting online data reliability may also be implemented in the form of a computer program or application that is stored on a recording medium and executed by a computer.

The foregoing description of the present application is provided for illustration, and those of ordinary skill in the art to which the present application pertains will understand that it can readily be modified into other specific forms without changing its technical idea or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and likewise components described as distributed may be implemented in a combined form.

The scope of the present application is defined by the claims below rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present application.

100: online data reliability prediction apparatus
110: data receiving unit
120: data characteristic extraction unit
130: vector storage unit
140: data reliability prediction unit
10: network
21: online data
22: prediction target online data
30: user terminal
40: user group

Claims

A method for predicting the reliability of online data, performed by an online data reliability prediction apparatus, the method comprising:
selecting a first element and a second element that affect the reliability evaluation of online data, based on a cognitive process of a user group;
storing the selected first and second elements in a characteristic vector space;
training a prediction model capable of evaluating the reliability of online data by a preset machine learning method, based on the characteristics stored in the vector space; and
receiving prediction target online data and predicting the reliability of the prediction target online data using the trained prediction model,
wherein the selecting of the first element and the second element comprises:
an instinctive reliability evaluation step of collecting elements considered in the reliability evaluation before the user group knows the reliability of the online data;
a reflective reliability evaluation step of collecting elements considered in the reliability evaluation after the user group knows the reliability of the online data; and
selecting, as the second element, the element whose ranking rose the most in the reflective reliability evaluation step from among the elements collected in the instinctive reliability evaluation step.

delete

The method of claim 1,
wherein the elements whose ranking rose the most include a content-related element and an author-related element, and
the selecting of the second element excludes the author-related element and selects the content-related element as the second element.

The method of claim 1,
wherein the second element includes structure, text length ratio, image count ratio, and alignment.

The method of claim 1,
wherein the preset machine learning method is any one of logistic regression, random forest, a linear support vector machine, or a multi-layer perceptron.

The method of claim 1,
wherein the storing in the characteristic vector space maps the selected first and second elements to a vector of a specific size by a TF-IDF weighting algorithm.

A method for generating a reliability prediction model for online data, performed by an online data reliability prediction apparatus, the method comprising:
collecting reliable online data and unreliable online data;
randomly selecting items from the reliable online data and the unreliable online data and transmitting them to a user group;
receiving, from the user group, reliability evaluation results for the selected reliable and unreliable online data;
selecting, based on the received reliability evaluation results, a first element and a second element that affect the reliability evaluation of online data, based on a cognitive process of the user group;
storing the selected first and second elements in a characteristic vector space; and
training a prediction model capable of evaluating the reliability of online data by a preset machine learning method, based on the characteristics stored in the vector space,
wherein the selecting of the first element and the second element comprises:
an instinctive reliability evaluation step of collecting elements considered in the reliability evaluation before the user group knows the reliability of the online data;
a reflective reliability evaluation step of collecting elements considered in the reliability evaluation after the user group knows the reliability of the online data; and
selecting, as the second element, the element whose ranking rose the most in the reflective reliability evaluation step from among the elements collected in the instinctive reliability evaluation step.

delete

The method of claim 7,
wherein the preset machine learning method is any one of logistic regression, random forest, a linear support vector machine, or a multi-layer perceptron.

The method of claim 7,
wherein the second element includes structure, text length ratio, image count ratio, and alignment.

An apparatus for predicting the reliability of online data, comprising:
a data receiving unit that receives online data whose reliability is known and prediction target online data;
a data characteristic extraction unit that selects, from the online data whose reliability is known, a first element and a second element that affect reliability evaluation, based on a cognitive process of a user group;
a vector storage unit that stores the selected first and second elements in a characteristic vector space; and
an online data reliability prediction unit that derives a predicted value of the reliability of online data through a prediction model trained by a preset machine learning method, based on the characteristics stored in the vector space,
wherein the data characteristic extraction unit performs an instinctive reliability evaluation step of collecting elements considered in the reliability evaluation before the user group knows the reliability of the online data, performs a reflective reliability evaluation step of collecting elements considered in the reliability evaluation after the user group knows the reliability of the online data, and selects, as the second element, the element whose ranking rose the most in the reflective reliability evaluation step from among the elements collected in the instinctive reliability evaluation step.

delete

The apparatus of claim 11,
wherein the second element includes structure, text length ratio, image count ratio, and alignment.