KR102242317B1

KR102242317B1 - Qualitative system for determining fake news, qualitative method for determining fake news, and computer-readable medium having a program recorded therein for executing the same

Info

Publication number: KR102242317B1
Application number: KR1020190020905A
Authority: KR
Inventors: 강장묵; 박기남
Original assignee: 글로벌사이버대학교 산학협력단
Priority date: 2019-02-22
Filing date: 2019-02-22
Publication date: 2021-04-21
Also published as: KR20200106231A

Abstract

The qualitative fake news determination system of the present invention includes a user terminal and a server capable of communicating with the user terminal. The server includes: an information collection unit that uses data scraping technology to collect news information including real news, fake news, and source information for each news item; an estimation unit that groups the news items based on the similarity between them and estimates a sharing pattern by analyzing the source information for each group; a probability value calculation unit that predicts, for each group, the proportion of fake news based on the sharing pattern and the source information; a dataset generation unit that generates a training dataset by classifying each news item into one or more classes based on that proportion; a database that stores the training dataset; an artificial intelligence processing device that generates an artificial intelligence model by performing machine learning on the training dataset; and a request input unit that receives a fake news determination request from the user terminal. The artificial intelligence model is used to calculate the probability that news displayed on the user terminal is fake. The invention thereby makes it possible to understand the impact of fake news on the Internet ecosystem across its production, sharing, and dissemination, provides insight into how to block or reduce fake news, and makes it possible to judge whether news is fake.

Description

Qualitative system for determining fake news, qualitative method for determining fake news, and computer-readable medium having a program recorded therein for executing the same

The present invention relates to a qualitative fake news determination system, a determination method, and a computer-readable recording medium on which a program for executing the method is recorded. More specifically, it relates to a system, a determination method, and a recording medium that use natural language processing algorithms, one of the component technologies of artificial intelligence, to qualitatively identify fake news based on the facts, values, and context of the news.

The police define fake news as "information disseminated in a way that raises its credibility by processing it to look like an actual press report," and the Korea Press Foundation defines it as "false information deliberately disseminated in the form of a press report for political or economic gain."

Anyone can create fake news, and websites that help people create it abound. Recently, one-person broadcasters and YouTubers have come to function like broadcasters, further increasing the frequency of fake news. DailyPadak offers a service that turns any content into something that looks like a news article once a headline, a reporter's name, and a photo are supplied. According to the Election Commission, helping to create fake news or creating prank-level fake news is not illegal. The Election Commission has approached fake news from the standpoint of freedom of expression, and as this shows, there is considerable confusion about the concept, definition, and scope of fake news.

According to the Korea Press Foundation report "Fake News: Current Status and Problems" (2017), 76.3% of respondents had received fake news through Internet services such as portals, Facebook, and KakaoTalk. In other words, the overwhelming majority of fake news distribution channels are Internet services. Only a small share had encountered fake news through mass media such as newspapers and TV (9.1%) or through private gatherings of friends, seniors, and juniors (7.7%).

In particular, the largest share of users, 39.7%, had received fake news through mobile messengers such as KakaoTalk and Line. Notably, the share was especially high among people in their 50s, at 45.6%. As election season approaches, for example, fake news about a particular candidate is circulated via KakaoTalk to older people, who are an information-vulnerable group. A rumor message about a "rate discount" is relatively harmless, but the matter becomes serious when it crosses into spreading falsehoods for purposes such as defamation.

Social platforms such as Facebook and Twitter also accounted for a considerable 27.7%. Internet cafes/communities and blogs were also on the high side at 24.3%. These were followed by YouTube, AfreecaTV, and similar services at 4.6%, and fake news sites at 3.7%.

The foundation's analysis stated: "Online users exchange posts, photos, and the like about particular facts or falsehoods in group chat rooms, and these sometimes spread rapidly outward and become fake news," and "Fake news is distributed and propagated through content distribution platforms such as mobile messengers and social platforms."

The destructive power of fake news lies in this distribution structure. Rumor sheets (jjirasi) spread covertly, mainly through group KakaoTalk chat rooms, whereas fake news is distributed and spread openly. Fake news is typically posted in bulk on Internet communities such as Ilgan Best (Ilbe), Today's Humor (Oyu), and Ppomppu. If it gains traction on those sites, it spreads a second time to social network services (SNS) such as Facebook, KakaoTalk, Twitter, and Naver Band.

The problem is that there is no adequate way to stop such fake news once it spreads.

With acquaintance-based services such as KakaoTalk and Facebook, recipients tend to trust fake news more than they otherwise would, because it was passed on by someone they know. Moreover, because information spreads so quickly on Internet services, in most cases the damage has already snowballed by the time the victim reports it to the police and the spreader is investigated.

Accordingly, there is a need for a tool that can provide insight into the reliability of online content by identifying trustworthy news sources.

As prior art, there is Korean Registered Patent No. 10-1864439 (Patent Document 1). Patent Document 1 provides a fake news discrimination system having a post graphical user interface window that enables fake news to be identified.

According to Patent Document 1, comments marked True and comments marked Fake are separated, creating conditions in which users can debate logically rather than fight emotionally, and questionable articles or rumors can be compared and judged by drawing on the collective intelligence of netizens through links or posts.

However, because the determination ultimately rests on the opinion of the majority of netizens, this can hardly be regarded as a system that clearly identifies fake news, and it has the limitation that the lower the participation, the weaker its discriminating power. There are also expert-based systems in which reporters, media experts, and journalism scholars judge whether a news item is true or fake. But now that fake news is growing in volume, it is impossible for a limited number of experts to separate true from fake by hand.

Therefore, what is now needed is a fake news discrimination system that, independently of netizens' opinions, uses a natural language processing algorithm trained on real news with artificial intelligence technology to judge whether the value, content, and context of a news item are true or false.

KR 10-1864439 B1

To solve the conventional problems described above, an object of the present invention is to provide a system and method that can qualitatively identify fake news by understanding the motives behind its production and sharing, and thereby block or reduce it.

To achieve the above object, the qualitative fake news determination system of the present invention includes a user terminal and a server capable of communicating with the user terminal, wherein the server includes: an information collection unit that uses data scraping technology to collect news information including real news, fake news, and source information for each news item; an estimation unit that groups the news items based on the similarity between them and estimates a sharing pattern by analyzing the source information for each group; a probability value calculation unit that predicts, for each group, the proportion of fake news based on the sharing pattern and the source information; a dataset generation unit that generates a training dataset by classifying each news item into one or more classes based on the proportion; a database that stores the training dataset; an artificial intelligence processing device that generates an artificial intelligence model by performing machine learning on the training dataset; and a request input unit that receives a fake news determination request from the user terminal; and wherein the artificial intelligence model is used to calculate the probability that news displayed on the user terminal is fake.

Preferably, the similarity is calculated by comparing at least one of the title, key sentences, field, and production time of each news item.

Preferably, the sharing pattern is estimated using at least one of the number of news items in the group, the types of reporting media, and the density of production times.

Preferably, the sharing pattern is estimated by ordering the news items in the group chronologically based on their production times and calculating the diffusion path and speed of the news.

Preferably, the key sentences are extracted by a processor performing, on the news information, at least one of a morphological analysis step, a named entity recognition step, a syntactic parsing step, an anaphora resolution step, a lexical semantic analysis step, a semantic role labeling step, and a coreference resolution step.

Preferably, when the probability value calculation unit predicts the proportion, the proportion varies depending on whether the content of the title and that of the key sentences of each news item are inconsistent.
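The following is a minimal sketch of how a title/key-sentence inconsistency could feed into the predicted proportion. The specification does not state how inconsistency is measured or by how much it changes the proportion, so the word-overlap measure, the `threshold` of 0.2, and the `penalty` of 0.1 are all illustrative assumptions.

```python
def token_overlap(title: str, key_sentence: str) -> float:
    """Jaccard overlap between the word sets of a title and a key sentence."""
    a = set(title.lower().split())
    b = set(key_sentence.lower().split())
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def adjust_ratio(base_ratio: float, title: str, key_sentence: str,
                 threshold: float = 0.2, penalty: float = 0.1) -> float:
    """Raise the predicted fake-news proportion when title and key sentence disagree.

    threshold and penalty are hypothetical tuning values, not taken from the patent.
    """
    mismatch = token_overlap(title, key_sentence) < threshold
    adjusted = base_ratio + penalty if mismatch else base_ratio
    return min(1.0, adjusted)  # keep the result a valid proportion
```

In this sketch a title that shares no words with the key sentence raises the proportion, while a title that largely repeats the key sentence leaves it unchanged.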

Preferably, when the probability value calculation unit predicts the proportion, the proportion varies depending on whether the context of each news item is inconsistent.

Preferably, the artificial intelligence processing device stores an algorithm used for artificial intelligence, and the algorithm uses artificial neural networks or fuzzy neural networks.

To achieve the above object, the qualitative fake news determination method of the present invention includes: a step in which an information collection unit included in a server uses data scraping technology to collect news information including real news, fake news, and source information for each news item; a step in which an estimation unit included in the server groups the news items based on the similarity between them and analyzes the source information for each group to estimate a sharing pattern; a step in which a probability value calculation unit included in the server predicts, for each group, the proportion of fake news based on the sharing pattern and the source information; a step in which a dataset generation unit included in the server classifies each news item into one or more classes based on the proportion to generate a training dataset; a step in which an artificial intelligence processing device included in the server performs machine learning on the training dataset to generate an artificial intelligence model; and a step in which the artificial intelligence model is used to calculate the probability that news displayed on a user terminal is fake.
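The dataset generation step can be sketched as follows. The specification says only that each news item is classified into one or more classes based on the group's fake news proportion; the three class names and the 0.3 / 0.7 cut-offs used here are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def ratio_to_class(fake_ratio: float) -> str:
    """Map a group's predicted fake-news proportion to a class label.

    The three classes and the 0.3 / 0.7 cut-offs are hypothetical.
    """
    if fake_ratio < 0.3:
        return "likely_real"
    if fake_ratio < 0.7:
        return "uncertain"
    return "likely_fake"


def build_training_dataset(groups: Dict[str, List[str]],
                           group_ratios: Dict[str, float]) -> List[Tuple[str, str]]:
    """Label every news item with the class of the group it belongs to."""
    dataset = []
    for group_id, articles in groups.items():
        label = ratio_to_class(group_ratios[group_id])
        dataset.extend((text, label) for text in articles)
    return dataset


# Toy input: two groups of news texts and their predicted proportions.
groups = {"g1": ["news 1a", "news 1b"], "g2": ["news 2a"]}
ratios = {"g1": 0.8, "g2": 0.1}
dataset = build_training_dataset(groups, ratios)
```

The resulting list of (text, label) pairs is the kind of structure a machine learning step could consume as a training dataset.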

Preferably, the step of calculating the probability that the news is fake includes: a step in which the news displayed on the user terminal is input to the server; a step in which the information collection unit uses data scraping technology to collect news information similar to the news; a step in which a sharing pattern is estimated based on the collected similar news information; a step in which the artificial intelligence model is used to calculate the probability based on the sharing pattern and to determine whether the news is fake; and a step in which either the probability or the fake/not-fake determination for the news is output as a result.

Furthermore, among other news items similar to the news, those whose probability of being fake is greater than or equal to a preset probability may be listed and output as a result.
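The listing of similar news at or above a preset probability amounts to a simple filter, sketched below. The default `preset_probability` of 0.7 and the descending sort order are assumptions; the specification fixes neither.

```python
from typing import List, Tuple


def list_suspect_news(similar_news: List[Tuple[str, float]],
                      preset_probability: float = 0.7) -> List[Tuple[str, float]]:
    """Return (title, probability) pairs at or above the preset probability.

    Results are sorted with the highest probability first (an illustrative choice).
    """
    suspects = [(title, p) for title, p in similar_news if p >= preset_probability]
    return sorted(suspects, key=lambda item: item[1], reverse=True)


scored = [("news A", 0.91), ("news B", 0.40), ("news C", 0.70)]
suspects = list_suspect_news(scored)
```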

To achieve the above object, the present invention further provides a computer-readable recording medium on which a program for executing the qualitative fake news determination method is recorded.

Specific details of other embodiments are included in the detailed description and the accompanying drawings.

The advantages and/or features of the present invention, and the methods of achieving them, will become clear with reference to the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms; the embodiments are provided only to make the disclosure of the present invention complete and to fully inform a person of ordinary skill in the art to which the invention belongs of the scope of the invention.

As described above, the present invention makes it possible to determine whether news is fake during the production, sharing, and dissemination of fake news, and thus provides a way to block or reduce fake news.

FIG. 1 is a conceptual diagram illustrating the configuration of the qualitative fake news determination system of the present invention.
FIG. 2 is a conceptual diagram illustrating operations in the server of the qualitative fake news determination system of the present invention.
FIG. 3 is a diagram showing an example of a result output on the user terminal of the qualitative fake news determination system of the present invention.
FIG. 4 is a flowchart illustrating the qualitative fake news determination method of the present invention.
FIG. 5 is a flowchart further illustrating the qualitative fake news determination method of the present invention.

Before the present invention is described in detail, it should be understood that the terms and words used in this specification are not to be construed as unconditionally limited to their ordinary or dictionary meanings; the inventors may appropriately define and use the concepts of various terms in order to describe their invention in the best way, and these terms and words should be interpreted with meanings and concepts consistent with the technical idea of the present invention.

That is, the terms used in this specification are used only to describe preferred embodiments of the present invention and are not intended to specifically limit its content; they are terms defined in consideration of the various possibilities of the present invention.

In this specification, a singular expression may include the plural unless the context clearly indicates otherwise, and similarly an expression in the plural may include the singular.

Throughout this specification, when a component is described as "including" another component, this means that, unless stated otherwise, it may further include any other component rather than excluding it.

Furthermore, when a component is described as "existing inside, or installed in connection with" another component, the component may be directly connected to or installed in contact with the other component, or may be installed at a certain distance from it. In the latter case, a third component or means for fixing or connecting the component to the other component may exist, and the description of this third component or means may be omitted.

By contrast, when a component is described as being "directly connected" or "directly coupled" to another component, it should be understood that no third component or means exists.

Likewise, other expressions describing relationships between components, such as "between" and "directly between," or "adjacent to" and "directly adjacent to," should be interpreted in the same manner.

In this specification, terms such as "one surface," "other surface," "one side," "other side," "first," and "second," if used, serve to clearly distinguish one component from another, and the meaning of the component is not limited by such terms.

Terms relating to position, such as "upper," "lower," "left," and "right," if used, should be understood as indicating the relative position of the component in the relevant drawing; unless an absolute position is specified, these terms should not be understood as referring to absolute positions.

Moreover, in this specification, terms such as "unit," "-er," "module," and "device," if used, denote a unit capable of processing one or more functions or operations, which may be implemented in hardware, software, or a combination of the two.

In assigning reference numerals to the components in each drawing, the same component is given the same reference numeral even when it appears in different drawings; that is, the same reference numeral denotes the same component throughout the specification.

In the accompanying drawings, the size, position, coupling relationships, and so on of the components of the present invention may be partly exaggerated, reduced, or omitted in order to convey the idea of the invention clearly or for convenience of description, and their proportions and scale may therefore not be exact.

In the following description, detailed descriptions of configurations judged likely to obscure the gist of the present invention unnecessarily, for example known technologies including the prior art, may be omitted.

The following describes a qualitative fake news determination system, a determination method, and a computer-readable recording medium on which a program for executing the method is recorded, according to an embodiment of the present invention.

First, a qualitative fake news determination system according to an embodiment of the present invention is described with reference to FIGS. 1 to 3.

FIG. 1 is a conceptual diagram illustrating the configuration of the qualitative fake news determination system of the present invention.

As shown in FIG. 1, the qualitative fake news determination system of the present invention includes a server 100, a communication network 200, and a user terminal 300.

The server 100 preferably includes an information collection unit 110, an estimation unit 120, a probability value calculation unit 130, a dataset generation unit 140, a database 150, an artificial intelligence processing device 160, a request input unit 170, and an output unit 180. Each of these units of the server 100 is preferably provided with one or more processors.

The information collection unit 110 is preferably the unit of the server 100 that uses data scraping technology to collect news information including real news, fake news, and source information for each news item.

Data scraping technology makes it possible to analyze, in real time, the various structured, unstructured, and semi-structured data exposed in web browsers or on SNS. Data scraping automatically connects to a system, renders the data on screen, and then extracts and retrieves the required material; web scraping in particular extracts information from a specific desired part of a website. In this way, information from a website or program can be pulled out and stored in another program or database, where it can be queried and used whenever needed, and the stored data can serve as material for comparative analysis.
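As a minimal illustration of extracting one specific part of a web page, the sketch below pulls headline text out of an HTML string using Python's standard `html.parser` module. The assumption that headlines sit in `<h1>` elements, the `HeadlineParser` class name, and the sample page are all hypothetical; fetching the page over the network is omitted.

```python
from html.parser import HTMLParser


class HeadlineParser(HTMLParser):
    """Collect the text of every <h1> element in an HTML document."""

    def __init__(self):
        super().__init__()
        self._in_h1 = False
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._in_h1 = True
            self.headlines.append("")  # start a new headline buffer

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        if self._in_h1:
            self.headlines[-1] += data


def extract_headlines(html: str) -> list:
    """Return the stripped text of all <h1> elements in the given HTML."""
    parser = HeadlineParser()
    parser.feed(html)
    parser.close()
    return [h.strip() for h in parser.headlines]


# Hypothetical page content standing in for a scraped news article.
page = "<html><body><h1> Sample headline </h1><p>Body text.</p></body></html>"
print(extract_headlines(page))  # prints ['Sample headline']
```

The extracted values could then be stored in a database for later querying and comparison, as the paragraph above describes.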

Here, the source information may include the website where the news item was first produced, the author, the production date and time, the type of reporting medium, the distribution channels, the consumers, and so on.
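One way to hold this source information is a small record type, sketched below. The class names, field names, and field types are assumptions chosen to mirror the items listed above; the specification does not define a data layout.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class SourceInfo:
    """Source information for one news item (field names are hypothetical)."""
    origin_website: str            # website where the item was first produced
    author: str
    produced_at: datetime          # production date and time
    media_type: str                # e.g. "blog", "news outlet"
    distribution_channels: List[str] = field(default_factory=list)
    consumers: List[str] = field(default_factory=list)


@dataclass
class NewsItem:
    """A collected news item together with its source information."""
    title: str
    body: str
    source: SourceInfo


item = NewsItem(
    title="Sample headline",
    body="Body text.",
    source=SourceInfo(
        origin_website="https://example.com",
        author="unknown",
        produced_at=datetime(2019, 2, 22, 9, 0),
        media_type="blog",
        distribution_channels=["KakaoTalk"],
    ),
)
```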

The estimation unit 120 is preferably a unit that is connected to the information collection unit 110, groups the news items based on the similarity between them, and estimates a sharing pattern by analyzing the source information for each group.

An example of grouping, sharing patterns, and probability value calculation is described with further reference to FIG. 2.

FIG. 2 is a conceptual diagram illustrating operations in the server of the qualitative fake news determination system of the present invention.

As shown in FIG. 2, mutually similar news items such as news 1a, news 1b, and news 1c may be grouped into group 1.

Here, the similarity is preferably calculated by comparing at least one of the title, key sentences, field, and production time of each news item. For example, among economic news, items on similar topics written at around the same time may be grouped into one group. To judge whether the topics are similar, the title, key sentences, and field of the news may be analyzed.
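The grouping described above can be sketched with a simple word-overlap similarity on titles. The specification does not name a similarity measure or a grouping procedure, so the Jaccard measure, the greedy first-match grouping, and the 0.5 threshold are assumptions; key sentences, field, and production time could be compared in the same way.

```python
from typing import List


def jaccard(a: str, b: str) -> float:
    """Word-set Jaccard similarity between two titles."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def group_by_title(titles: List[str], threshold: float = 0.5) -> List[List[str]]:
    """Greedy grouping: add each title to the first group whose first member
    is at least `threshold` similar; otherwise start a new group.

    The threshold is a hypothetical value, not taken from the patent.
    """
    groups: List[List[str]] = []
    for title in titles:
        for group in groups:
            if jaccard(title, group[0]) >= threshold:
                group.append(title)
                break
        else:  # no existing group was similar enough
            groups.append([title])
    return groups


titles = [
    "bank raises interest rates",
    "central bank raises interest rates",
    "team wins championship final",
]
grouped = group_by_title(titles)
```

With these sample titles, the two interest-rate headlines share four of five distinct words and fall into one group, while the sports headline forms its own group.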

Here, the main sentences are preferably extracted by a processor performing, on the news information, at least one of a morphological analysis step, a named entity recognition step, a syntactic parsing step, an anaphora resolution step, a lexical semantic analysis step, a semantic role labeling step, and a coreference resolution step.

The sharing pattern is preferably estimated using at least one of the number of news items included in the group, the types of news media, and the density of production times.

For example, analysis may show that one group contains 25 highly similar news items, that the media carrying them include Facebook, foreign press outlets, blogs, and KakaoTalk, and that all of them were produced and distributed within two days. Based on this, the estimating unit can calculate the path and speed at which the news spread. In this way, the sharing pattern is preferably estimated by ordering the news items in the group chronologically by production time and calculating the spread path and speed of the news.
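As a rough illustration of this estimation, the following Python sketch summarizes one group into a count, a media mix, a chronological path, and a speed. The `GroupItem` record, the medium labels, and the definition of speed as items per hour are assumptions made for this example only.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass
class GroupItem:
    medium: str            # illustrative labels such as "blog", "press", "sns"
    produced_at: datetime


def sharing_pattern(items: list[GroupItem]) -> dict:
    """Summarize one group: item count, media mix, spread path, and spread speed."""
    if not items:
        raise ValueError("group must contain at least one news item")
    # Order the items chronologically by production time.
    ordered = sorted(items, key=lambda i: i.produced_at)
    span_hours = (ordered[-1].produced_at - ordered[0].produced_at).total_seconds() / 3600.0
    return {
        "count": len(ordered),
        "media_counts": dict(Counter(i.medium for i in ordered)),
        "path": [i.medium for i in ordered],
        # Items per hour; a zero-length span falls back to the raw count.
        "speed_per_hour": len(ordered) / span_hours if span_hours > 0 else float(len(ordered)),
    }
```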

As shown in FIG. 2, the sources of news 1a and the other news items grouped into group 1 may be identified as source 1, source 2, and so on. Each source may be identified as a type of medium, such as a blog or a press outlet.

As shown in FIG. 2, a probability value may be calculated according to the number of appearances of each medium and the production date and time of each news item.

As shown in FIG. 1, the probability value calculation unit 130 is connected to the estimating unit 120 and preferably predicts the proportion of fake news in each group, that is, a probability value, based on the sharing pattern and the source information. The accuracy of this probability value may gradually increase through repeated learning and correction by artificial intelligence.

When the probability value calculation unit 130 predicts the proportion of fake news from the sharing pattern and source information alone, the probability that the news is fake increases if the news spread mainly through news media of low reliability, or if it additionally spread through SNS and the spread speed exceeds a preset level (for example, when bloggers who are normally inactive suddenly share related news in succession, or when there are signs that many accounts are spreading specific news in an organized manner).
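The rule described above might be sketched as a simple score. Everything numeric here is hypothetical: the set of low-reliability media, the base value of 0.2, the 0.3 increments, the 0.5 share cutoff, and the speed limit are placeholders, since the text gives no concrete values.

```python
def fake_proportion_estimate(
    media_counts: dict[str, int],
    speed_per_hour: float,
    low_trust_media: frozenset[str] = frozenset({"blog", "sns"}),  # assumed labels
    speed_limit: float = 5.0,                                      # assumed preset level
    base: float = 0.2,                                             # assumed starting value
) -> float:
    """Raise the estimate when low-reliability media dominate or spread is unusually fast."""
    total = sum(media_counts.values())
    if total == 0:
        return base
    low_share = sum(n for m, n in media_counts.items() if m in low_trust_media) / total
    score = base
    if low_share > 0.5:
        score += 0.3
    if speed_per_hour > speed_limit:
        score += 0.3
    return min(score, 1.0)
```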

In this case, the title and the content of the main sentences of the news may additionally be analyzed, and if the title and the content are inconsistent, the probability that the news is fake may be higher. In another embodiment, when the probability value calculation unit 130 predicts the proportion of fake news, the proportion may also be determined according to whether each news item contains a contextual inconsistency. For example, if the news content includes a sentence unrelated to its context, the probability that the news is fake may be higher. One of several ways to find such a sentence is to generate a summary of the news, compare the summary with the body text, and select the sentences that are far from the summary as unrelated to the context.
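The summary-comparison approach can be sketched as follows. The summary is assumed to be supplied by some other component, and the token-overlap (Jaccard) distance and the 0.9 cutoff are stand-ins chosen for illustration; the text does not specify a distance measure.

```python
def off_context_sentences(
    summary: str, sentences: list[str], max_distance: float = 0.9
) -> list[str]:
    """Return the body sentences whose Jaccard distance from the summary exceeds the cutoff."""
    summary_tokens = set(summary.lower().split())
    flagged = []
    for sentence in sentences:
        tokens = set(sentence.lower().split())
        union = summary_tokens | tokens
        overlap = len(summary_tokens & tokens) / len(union) if union else 1.0
        if 1.0 - overlap > max_distance:
            flagged.append(sentence)
    return flagged
```

A sentence sharing no words with the summary has distance 1.0 and is flagged; embedding-based distances would be a natural replacement for the token overlap used here.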

Alternatively, a more refined value can be derived by processing the inconsistency between the title and the content, and the inconsistency of specific sentences or paragraphs within the content, either in parallel or sequentially in varying order.

The dataset generation unit 140 is connected to the probability value calculation unit 130 and is preferably a unit that generates a training dataset by classifying each news item into one or more classes based on the proportion.
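One way to picture this classification is to bucket each news item by its group's proportion. The three class labels and the 0.33 and 0.66 boundaries below are assumptions for illustration; the text only states that one or more classes are used.

```python
from bisect import bisect_right

# Hypothetical class labels, ordered from low to high proportion of fake news.
LABELS = ("likely_real", "uncertain", "likely_fake")


def assign_class(proportion: float, boundaries: tuple[float, float] = (0.33, 0.66)) -> str:
    """Map a proportion in [0, 1] to one of the three labels."""
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must be between 0 and 1")
    return LABELS[bisect_right(boundaries, proportion)]


def build_training_set(scored: list[tuple[str, float]]) -> list[tuple[str, str]]:
    """Turn (news text, proportion) pairs into (news text, class label) samples."""
    return [(text, assign_class(p)) for text, p in scored]
```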

The database 150 stores the training dataset generated by the dataset generation unit 140. In addition to storing the training dataset, the database 150 may be connected to the estimating unit 120, the probability value calculation unit 130, and the artificial intelligence processing device 160, and may store data from each of them.

That is, the database 150 may store the sharing pattern from the estimating unit 120, the proportion from the probability value calculation unit 130, the artificial intelligence model 161 generated by the artificial intelligence processing device 160, and the like.

The artificial intelligence processing device 160 is preferably a unit that generates the artificial intelligence model 161 by performing machine learning based on the training dataset.

The artificial intelligence processing device 160 stores algorithms used for artificial intelligence, and these algorithms preferably use artificial neural networks or fuzzy neural networks. In particular, the artificial intelligence processing device 160 preferably stores an algorithm for natural language processing.

Artificial neural networks are classified into several types according to their structure and function. The most common type is the multilayer perceptron, which has multiple hidden layers between an input layer and an output layer. An artificial neural network can be implemented through the artificial intelligence processing device 160. It consists of multiple neurons, the basic computing units, connected by weighted links whose weights can be adjusted to adapt to a given environment.
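To make the structure of a multilayer perceptron concrete, here is a minimal forward pass in plain Python. It is a generic textbook sketch rather than the network used by the invention; the sigmoid activation and the layer representation as `(weights, biases)` pairs are assumptions.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs: list[float], weights: list[list[float]], biases: list[float]) -> list[float]:
    """One layer: each neuron applies sigmoid to a weighted sum of the inputs plus a bias."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def mlp_forward(
    inputs: list[float], layers: list[tuple[list[list[float]], list[float]]]
) -> list[float]:
    """Pass the inputs through each (weights, biases) layer in order."""
    activations = inputs
    for weights, biases in layers:
        activations = dense(activations, weights, biases)
    return activations
```

Training would adjust the weights and biases, for example by backpropagation, which this sketch does not cover.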

Artificial neural networks include, but are not limited to, various models such as the self-organizing map (SOM), the recurrent neural network (RNN), and the convolutional neural network (CNN).

A fuzzy neural network is a system that uses the learning ability of a neural network to express rules linguistically or to add new rules to a knowledge base. When a fuzzy neural network is used in the present invention, rules may be defined and used to calculate the sharing pattern, the similarity, and/or the probability that news is fake.

The request input unit 170 is preferably a unit that receives a fake news determination request from the user terminal. Here, the fake news determination request is a request received from the user terminal 300 and preferably includes information such as the title, body, source information, reporter information, sharing frequency, and per-minute spread information of the news to be evaluated.
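The fields listed above could be carried in a small request record such as the one below. The class name, field names, types, and validation rules are hypothetical; the text names the kinds of information but not a concrete format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeNewsRequest:
    """Hypothetical payload for a fake news determination request."""
    title: str
    body: str
    source: str
    reporter: str
    share_count: int            # sharing frequency
    shares_per_minute: float    # per-minute spread information

    def __post_init__(self) -> None:
        if not self.title.strip() or not self.body.strip():
            raise ValueError("title and body must be non-empty")
        if self.share_count < 0 or self.shares_per_minute < 0:
            raise ValueError("sharing figures must be non-negative")
```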

The user terminal 300 is preferably a user terminal such as a smartphone, a tablet, a smart watch (a type of wearable device), or a computer equipped with a touch screen.

When a request is input to the request input unit 170 in this way, the artificial intelligence model 161, which was generated by the artificial intelligence processing device 160 and stored in the database 150, may be used to calculate the probability that the news displayed on the user terminal 300 is fake.

The probability value calculated in this way may be output through the output unit 180 and transmitted from the server 100 to the user terminal 300 through the communication network 200.

As shown in FIG. 3, the probability value may be displayed on the user terminal 300. FIG. 3 is a diagram illustrating an example in which a result is output on the user terminal of the qualitative fake news determination system of the present invention.

The process leading up to the display of the probability value on the user terminal 300, as shown in FIG. 3, is described in more detail below, starting from the operation at the user terminal 300.

First, news A (or news information including its source information and the like) is sent from the user terminal 300 through the communication network 200, received by the server 100, and input to the request input unit 170.

The information collection unit 110 collects information similar to news A, for example news A1, news A2, and news A3, through data scraping, and the estimating unit 120 estimates sharing pattern A based on the collected information.

Once the estimating unit 120 has estimated the sharing pattern for the news, the artificial intelligence model 161 stored in the database 150 is used, so that a probability value can be calculated immediately from the sharing pattern alone. That is, the probability value of another sharing pattern similar to sharing pattern A is applied, output through the output unit 180, and displayed on the user terminal 300.
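Applying the probability value of a similar, previously seen sharing pattern resembles a nearest-neighbor lookup, sketched below. Representing a sharing pattern as a numeric feature tuple (here, item count and items per hour) and using Euclidean distance are assumptions; the text attributes this step to the artificial intelligence model 161 without describing its internals.

```python
import math


def nearest_pattern_probability(
    query: tuple[float, ...],
    known: list[tuple[tuple[float, ...], float]],
) -> float:
    """Return the probability stored with the known pattern closest to the query."""
    if not known:
        raise ValueError("no known sharing patterns to compare against")
    _, probability = min(known, key=lambda entry: math.dist(entry[0], query))
    return probability
```

In practice the features would need scaling so that no single dimension dominates the distance.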

In another embodiment, in addition to the sharing pattern, a processor may analyze and determine whether the title of news A is inconsistent with the content of its main sentences, or whether the body of news A contains a contextual inconsistency. Accordingly, the probability value calculation unit 130 may calculate the final probability value by combining the probability value from the artificial intelligence model 161 and the probability value from the other processor through a preset equation.
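The preset equation is not given in the text. As one possible form, the sketch below combines the two probability values with a weighted average; the function name and the default weight of 0.7 are purely illustrative.

```python
def combine_probabilities(
    model_probability: float,
    content_probability: float,
    model_weight: float = 0.7,  # hypothetical weight, not taken from the text
) -> float:
    """Weighted average of the model's value and the content-analysis value."""
    for value in (model_probability, content_probability, model_weight):
        if not 0.0 <= value <= 1.0:
            raise ValueError("all arguments must be between 0 and 1")
    return model_weight * model_probability + (1.0 - model_weight) * content_probability
```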

The final probability value calculated in this way may be transmitted to the user terminal 300 through the output unit 180 and displayed on the display of the user terminal 300.

Next, a qualitative fake news determination method according to a preferred embodiment of the present invention is described with further reference to FIG. 4.

FIG. 4 is a flowchart illustrating the qualitative fake news determination method of the present invention.

As shown in FIG. 4, the qualitative fake news determination method according to a preferred embodiment of the present invention includes an information collection step (S100), a news grouping step (S200), a sharing pattern estimation step (S300), a probability value calculation step (S400), a dataset generation step (S500), and an artificial intelligence model generation step (S600).

In the information collection step (S100), the information collection unit 110 included in the server 100 uses a data scraping technology to collect news information including real news, fake news, and, for each news item, source information, news producer information, news author (reporter) information, news production time information, real-time issue information, and the like.

In the news grouping step (S200), the estimating unit 120 included in the server 100 groups the news based on the similarity between the news items.

In the sharing pattern estimation step (S300), the estimating unit 120 preferably estimates the sharing pattern by analyzing, for each group, the source information, reporter information, press outlet information, and news publishing time information.

In the probability value calculation step (S400), the probability value calculation unit 130 included in the server 100 predicts the proportion of fake news in each group based on the sharing pattern and the source information.

In the dataset generation step (S500), the dataset generation unit 140 included in the server 100 classifies each news item into one or more classes based on the proportion to generate a training dataset. The dataset is divided into validation, test, and training subsets for learning.
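The division into training, validation, and test subsets might look like the following sketch. The 70/15/15 split, the fixed random seed, and the function name are assumptions; the text does not state the proportions.

```python
import random


def split_dataset(
    samples: list, train_pct: int = 70, validation_pct: int = 15, seed: int = 0
) -> tuple[list, list, list]:
    """Shuffle a copy of the samples and cut it into training, validation, and test subsets."""
    if train_pct < 0 or validation_pct < 0 or train_pct + validation_pct > 100:
        raise ValueError("percentages must be non-negative and sum to at most 100")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * train_pct // 100
    n_validation = len(shuffled) * validation_pct // 100
    training = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_validation]
    test = shuffled[n_train + n_validation:]   # whatever remains
    return training, validation, test
```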

In the artificial intelligence model generation step (S600), the artificial intelligence processing device 160 included in the server 100 performs machine learning based on the training dataset to generate the artificial intelligence model 161.

With further reference to FIG. 5, the detailed steps by which the artificial intelligence model 161 is used to calculate the probability that the news displayed on the user terminal 300 is fake, in the qualitative fake news determination method according to a preferred embodiment of the present invention, are described below.

FIG. 5 is a flowchart further illustrating the qualitative fake news determination method of the present invention.

The step of calculating the probability that the news is fake includes a news input step (S710), a similar information collection step (S720), a sharing pattern estimation step (S730), a determination step using the artificial intelligence model (S740), and a result output step (S750).

In the news input step (S710), the news displayed on the user terminal 300 may be input to the server 100.

In the similar information collection step (S720), the information collection unit 110 may use a data scraping technology to collect news information similar to the news.

In the sharing pattern estimation step (S730), the estimating unit 120 may estimate a sharing pattern based on the collected similar news information.

In the determination step using the artificial intelligence model 161 (S740), the artificial intelligence model 161 may be used to calculate the probability based on the sharing pattern and thereby determine whether the news is fake.

In the result output step (S750), either the probability or the fake-or-not determination for the news may be output as the result.

Furthermore, in the result output step (S750), other news items similar to the news whose probability of being fake is equal to or greater than a preset probability may be listed and output as a result.
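This listing step amounts to a threshold filter over the similar news items, as in the short sketch below. The 0.7 threshold, the `(title, probability)` representation, and the descending sort are assumptions made for illustration.

```python
def list_suspicious(
    similar_news: list[tuple[str, float]], threshold: float = 0.7
) -> list[tuple[str, float]]:
    """Keep (title, probability) pairs at or above the threshold, highest probability first."""
    hits = [(title, p) for title, p in similar_news if p >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```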

A computer-readable recording medium according to a preferred embodiment of the present invention is a computer-readable recording medium on which a program for executing the qualitative fake news determination method is recorded.

Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.

100: server
110: information collection unit
120: estimating unit
130: probability value calculation unit
140: dataset generation unit
150: database
160: artificial intelligence processing device
161: artificial intelligence model
170: request input unit
180: output unit
200: communication network
300: user terminal

Claims

A qualitative fake news determination system comprising a user terminal and a server capable of communicating with the user terminal,
wherein the server comprises:
an information collection unit that collects news information including real news, fake news, and source information of each news item by using a data scraping technology;
an estimating unit that groups the news based on the similarity between the news items and estimates a sharing pattern by analyzing the source information for each group;
a probability value calculating unit that predicts the proportion of fake news in each group based on the sharing pattern and the source information;
a data set generator that generates a training data set by classifying each news item into one or more classes based on the proportion;
a database that stores the training data set;
an artificial intelligence processing device that generates an artificial intelligence model by performing machine learning based on the training data set; and
a request input unit that receives a fake news determination request from the user terminal,
wherein the artificial intelligence model is used to calculate the probability that news displayed on the user terminal is fake,
wherein the similarity is calculated by comparing at least one of the title, main sentences, field, and production time of each news item,
wherein the sharing pattern is estimated using at least one of the number of news items included in the group, the types of news media, and the density of production times,
wherein the sharing pattern is estimated by ordering the news items in the group chronologically based on their production times and calculating the spread path and speed of the news, and
wherein, when news information is received by the server from the user terminal and input to the request input unit, a probability value for another sharing pattern similar to the sharing pattern is applied, and the probability value, which is the proportion of fake news, is output through an output unit and displayed on the user terminal.

delete

The system of claim 1,
wherein the main sentences are extracted by a processor performing, on the news information, at least one of a morphological analysis step, a named entity recognition step, a syntactic parsing step, an anaphora resolution step, a lexical semantic analysis step, a semantic role labeling step, and a coreference resolution step.

The system of claim 5,
wherein, when the probability value calculating unit predicts the proportion,
the proportion is changed according to whether the title and the content of the main sentences of each news item are inconsistent.

The system of claim 5,
wherein, when the probability value calculating unit predicts the proportion,
the proportion varies according to whether the context of each news item is inconsistent.

The system of claim 1,
wherein algorithms used for artificial intelligence are stored in the artificial intelligence processing device, and
the algorithms use artificial neural networks or fuzzy neural networks.

A qualitative fake news determination method comprising:
collecting, by an information collection unit included in a server, news information including real news, fake news, and source information of each news item by using a data scraping technology;
grouping, by an estimating unit included in the server, the news based on the similarity between the news items and estimating a sharing pattern by analyzing the source information for each group;
predicting, by a probability value calculating unit included in the server, the proportion of fake news in each group based on the sharing pattern and the source information;
generating, by a data set generator included in the server, a training data set by classifying each news item into one or more classes based on the proportion;
generating, by an artificial intelligence processing device included in the server, an artificial intelligence model by performing machine learning based on the training data set; and
calculating the probability that news displayed on a user terminal is fake by using the artificial intelligence model,
wherein the similarity is calculated by comparing at least one of the title, main sentences, field, and production time of each news item,
wherein the sharing pattern is estimated using at least one of the number of news items included in the group, the types of news media, and the density of production times,
wherein the sharing pattern is estimated by ordering the news items in the group chronologically based on their production times and calculating the spread path and speed of the news, and
wherein, when news information is received by the server from the user terminal and input to a request input unit, a probability value for another sharing pattern similar to the sharing pattern is applied, and the probability value, which is the proportion of fake news, is output through an output unit and displayed on the user terminal.

The method of claim 9,
wherein calculating the probability that the news is fake comprises:
inputting the news displayed on the user terminal to the server;
collecting, by the information collection unit, news information similar to the news by using a data scraping technology;
estimating a sharing pattern based on the collected similar news information;
determining whether the news is fake by using the artificial intelligence model to calculate the probability based on the sharing pattern; and
outputting, as a result, either the probability or the determination of whether the news is fake.

delete

The method of claim 9,
wherein the main sentences are extracted by a processor performing, on the news information, at least one of a morphological analysis step, a named entity recognition step, a syntactic parsing step, an anaphora resolution step, a lexical semantic analysis step, a semantic role labeling step, and a coreference resolution step.

The method of claim 14,
wherein, when the probability value calculating unit predicts the proportion,
the proportion is changed according to whether the title and the content of the main sentences of each news item are inconsistent, or whether the context of each news item is inconsistent.

The method of claim 9,
wherein algorithms used for artificial intelligence are stored in the artificial intelligence processing device, and
the algorithms use artificial neural networks or fuzzy neural networks.

A computer-readable recording medium on which a program for executing the qualitative fake news determination method according to any one of claims 9, 10, and 14 to 16 is recorded.