KR20200064943A

KR20200064943A - Fake news detection server and method based on korean grammar transformation

Info

Publication number: KR20200064943A
Application number: KR1020190156574A
Authority: KR
Inventors: 정창성; 김남원
Original assignee: 고려대학교 산학협력단
Priority date: 2018-11-29
Filing date: 2019-11-29
Publication date: 2020-06-08
Also published as: KR102426599B1

Abstract

According to an embodiment of the present invention, a fake news detection method based on Korean grammar transformation, performed by a fake news detection server, is provided. The method includes: receiving a query sentence and a target news article; extracting, from the target news article, an article sentence related to the query sentence as a key sentence; checking whether the words of the query sentence correspond to the words of the key sentence; checking, depending on the word matching result, whether the query sentence and the key sentence correspond semantically; checking, depending on the semantic correspondence result, whether the query sentence and the key sentence correspond grammatically; and determining whether the key sentence is true or false based on the semantic correspondence result and the grammatical correspondence result.

Description

FAKE NEWS DETECTION SERVER AND METHOD BASED ON KOREAN GRAMMAR TRANSFORMATION

The present invention relates to a fake news detection server and method based on Korean grammar transformation, which identify fake news by matching query sentences about a news article against the sentences of that article on the basis of Korean grammar.

In general, the large volume of content that people share shapes public opinion in many ways. Fake news created with improper commercial or political intent can sometimes distort that opinion. With the development of diverse media and communication technologies, fake news detection has become an essential and challenging problem in judging the truthfulness of news.

Sentence matching, meanwhile, is a core natural language processing technique that determines whether two sentences are semantically similar. Recent advances in hardware such as GPUs have stimulated deep learning research, and deep-learning-based natural language processing models have been developed for sentence matching in various ways. Some of these models use a recurrent neural network (RNN) to understand the meaning of contexts of varying length. Because an RNN can process large amounts of data sequentially, it is well suited to analyzing the meaning of multiple sentences, but it suffers from information loss (vanishing and exploding gradients). This problem was mitigated by long short-term memory (LSTM), which adds a forget gate to the RNN.

Prior literature related to the present invention includes Bilateral Multi-Perspective Matching for Natural Language Sentences (BiMPM, Zhiguo Wang, 2017). The BiMPM model achieved state-of-the-art performance in tests on English datasets. Despite this result, applying the BiMPM model to a Korean news dataset has several limitations.

First, because English and Korean differ morphologically, sentence matching techniques built for English are of limited use for Korean. Second, it is difficult to capture the important information in long texts made up of many sentences, such as news articles.

The problem to be solved by the present invention is to detect fake news by finding, in a news article, the sentence related to a query sentence; providing, based on a word-level analysis, a semantic matching result determined with a deep learning model; providing a sentence matching result from an analysis that takes Korean grammar into account; and aggregating the semantic and grammatical sentence matching results.

An embodiment of the present invention provides a fake news detection method based on Korean grammar transformation, performed by a fake news detection server, the method comprising: receiving a query sentence and a target news article; extracting, from the target news article, an article sentence related to the query sentence as a key sentence; checking whether the words of the query sentence correspond to the words of the key sentence; checking, depending on the word matching result, whether the query sentence and the key sentence correspond semantically; checking, depending on the semantic correspondence result, whether the query sentence and the key sentence correspond grammatically; and determining whether the key sentence is true or false based on the semantic correspondence result and the grammatical correspondence result.

In this embodiment, extracting the key sentence may include: decomposing the query sentence and the article sentences into word constituent units including roots and suffixes; vectorizing the word constituent units of the query sentence and of the article sentences and comparing them to calculate a cosine similarity; and extracting, as the key sentence, the article sentence containing the word constituent units with the highest cosine similarity to those of the query sentence.

In this embodiment, checking whether the words of the query sentence correspond to the words of the key sentence may include: decomposing the query sentence into words and storing them in the order in which they appear to generate a word matching set; decomposing the key sentence into words and storing them in the order in which they appear to generate a word matching set; comparing the word matching set of the query sentence with the word matching set of the key sentence; and classifying the comparison result as complete word matching or partial word matching and outputting it.

In this embodiment, when the comparison result is output as a complete word match, the query sentence and the key sentence may be judged to correspond semantically, and the method may proceed to check whether they correspond grammatically.

In this embodiment, checking whether the query sentence and the key sentence correspond semantically may include: extracting word vectors of the query sentence and word vectors of the key sentence; matching the extracted word vectors with word embedding vectors trained in advance on Korean news articles; applying deep learning to the matched word vectors of the query sentence and of the key sentence to extract a contextual embedding vector for each; applying deep learning to the contextual embedding vectors of the query sentence and of the key sentence to extract a query sentence matching vector and a key sentence matching vector, respectively; applying deep learning to the query sentence matching vector and the key sentence matching vector to extract a Contextual Aggregated Question Matching Vector and a Contextual Aggregated Key Sentence Set Matching Vector; matching a Last Time Step Matching Vector to the Contextual Aggregated Question Matching Vector and the Contextual Aggregated Key Sentence Set Matching Vector to extract an Aggregated Question Matching Vector and an Aggregated Key Sentence Set Matching Vector; and calculating the semantic similarity between the query sentence and the key sentence from the Aggregated Question Matching Vector and the Aggregated Key Sentence Set Matching Vector using an artificial neural network and normalization.

In this embodiment, checking whether the query sentence and the key sentence correspond grammatically may include: setting the order of the words of the query sentence and the order of the words of the key sentence; morphologically analyzing the ordered words of the query sentence and of the key sentence and classifying them by morpheme; grouping the classified morphemes of the query sentence and of the key sentence by eojeol (a space-delimited unit of a Korean sentence); generating a pattern of the query sentence and a pattern of the key sentence from the eojeol-based grouping; and comparing the two patterns to check whether the query sentence and the key sentence can be transformed into each other.
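As an illustration only, the pattern generation and comparison steps might be sketched as follows in Python. The sketch assumes the morphological analysis has already been performed and that each sentence arrives as a list of eojeols, each a list of (morpheme, tag) pairs. The tag names (NNP, JKS, and so on) follow the Sejong tag set purely as an example, all function names are hypothetical, and treating two sentences as mutually transformable when their eojeol patterns are equal up to reordering is an assumption of this sketch, not a rule stated in this description.

```python
from typing import List, Tuple

Morpheme = Tuple[str, str]  # (surface form, part-of-speech tag)
Eojeol = List[Morpheme]     # one space-delimited unit of a Korean sentence


def sentence_pattern(eojeols: List[Eojeol]) -> List[Tuple[str, ...]]:
    """Build one tuple of tags per eojeol, preserving eojeol order."""
    return [tuple(tag for _, tag in eojeol) for eojeol in eojeols]


def same_pattern_up_to_reordering(a: List[Eojeol], b: List[Eojeol]) -> bool:
    """Assumed criterion: both sentences contain the same eojeol patterns,
    possibly in a different order."""
    return sorted(sentence_pattern(a)) == sorted(sentence_pattern(b))


# Hypothetical pre-analyzed input for two sentences with reordered eojeols.
query = [
    [("철수", "NNP"), ("가", "JKS")],
    [("밥", "NNG"), ("을", "JKO")],
    [("먹", "VV"), ("었", "EP"), ("다", "EF")],
]
key = [
    [("밥", "NNG"), ("을", "JKO")],
    [("철수", "NNP"), ("가", "JKS")],
    [("먹", "VV"), ("었", "EP"), ("다", "EF")],
]
# A sentence with no object eojeol, used as a non-matching example.
other = [
    [("철수", "NNP"), ("가", "JKS")],
    [("자", "VV"), ("았", "EP"), ("다", "EF")],
]
```

Because the comparison looks only at tags, it checks grammatical shape rather than content; in the method described here, content is handled by the word matching and semantic steps.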

An embodiment of the present invention also provides a fake news detection server based on Korean grammar transformation, comprising: a memory in which a program for the fake news detection method based on Korean grammar transformation is recorded; and a processor that executes the program recorded in the memory, wherein, on executing the program, the processor receives a query sentence and a target news article, extracts from the target news article an article sentence related to the query sentence as a key sentence, checks whether the words of the query sentence correspond to the words of the key sentence, checks, depending on the word matching result, whether the query sentence and the key sentence correspond semantically, checks, depending on the semantic correspondence result, whether the query sentence and the key sentence correspond grammatically, and determines whether the key sentence is true or false based on the semantic correspondence result and the grammatical correspondence result.

The effect of the present invention is that fake news can be detected by finding, in a news article, the sentence related to a query sentence; providing, based on a word-level analysis, a semantic matching result determined with a deep learning model; providing a sentence matching result from an analysis that takes Korean grammar into account; and aggregating the semantic and grammatical sentence matching results.

According to the present invention, the authenticity of news articles can also be determined for news data generated in mass media such as Internet newspapers and social network services. In addition, misinformation conveyed by mass media can be identified by taking the authenticity of news articles into account. Furthermore, the formation of public opinion by fake news can be prevented.

FIG. 1 is a schematic diagram illustrating a fake news detection method according to the present invention.
FIG. 2 is a block diagram illustrating a fake news detection method according to the present invention.
FIG. 3 is a flowchart illustrating a fake news detection method according to the present invention.
FIG. 4 is a flowchart showing a key sentence extraction method according to the present invention.
FIG. 5 is a schematic diagram showing a key sentence extraction model according to the present invention.
FIG. 6 is a flowchart showing a method of checking word correspondence according to the present invention.
FIG. 7 is a schematic diagram showing a word matching module according to the present invention.
FIG. 8 is a flowchart showing a method of checking semantic correspondence according to the present invention.
FIG. 9 is a schematic diagram showing a semantic correspondence checking module according to the present invention.
FIG. 10 is a schematic diagram showing part of a grammatical correspondence checking module according to the present invention.
FIG. 11 is a diagram showing the sentence structuring used to check grammatical correspondence according to the present invention.
FIG. 12 is a diagram showing an example algorithm for the method of checking semantic correspondence according to the present invention.
FIG. 13 is a diagram showing an example algorithm for the method of checking grammatical correspondence according to the present invention.
FIG. 14 is a diagram showing input sentences and morphological analysis results for the method of checking grammatical correspondence according to the present invention.
FIG. 15 is a diagram showing eojeol-level sentence analysis for the method of checking grammatical correspondence according to the present invention.
FIG. 16 is a diagram showing sentence pattern generation results according to the present invention.

Embodiments of the present invention are described in detail below with reference to the accompanying drawings so that a person of ordinary skill in the art to which the invention pertains can readily practice it. The present invention may, however, be implemented in many different forms and is not limited to the embodiments described here. To describe the invention clearly, parts of the drawings unrelated to the description are omitted, and similar parts are given similar reference numerals throughout the specification.

Throughout the specification, when a part is said to be "connected" to another part, this includes not only being "directly connected" but also being "electrically connected" with another element in between. Likewise, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise.

A fake news detection server based on Korean grammar transformation according to an embodiment of the present invention is described below.

FIG. 1 is a schematic diagram illustrating a fake news detection method according to the present invention.

Referring to FIG. 1, the fake news detection server based on Korean grammar transformation according to an embodiment of the present invention may receive a query sentence and a news article from a user terminal, calculate the similarity between the query sentence and the true sentences in the news article, determine the authenticity of the query sentence on that basis, and provide the result to the user terminal.

The fake news detection server may include a communication module, a memory, and a processor.

The communication module provides the fake news detection server with a communication interface by interworking with a communication network, and may transmit data to and receive data from the user terminal. The communication module may be a device that includes the hardware and software needed to transmit and receive signals, such as control signals or data signals, over a wired or wireless connection with other network devices.

The memory may have the fake news detection program recorded in it. The memory may also temporarily or permanently store data processed by the processor. The memory may include volatile storage media or non-volatile storage media, but the scope of the present invention is not limited thereto.

The processor may control the entire process that the fake news detection program performs on the fake news detection server. Each step of the process performed by the processor is described later with reference to FIGS. 2 to 10.

Here, the processor may include any kind of device capable of processing data. A "processor" may refer, for example, to a data processing device embedded in hardware that has physically structured circuitry for performing functions expressed as code or instructions in a program. Examples of such a data processing device embedded in hardware include a microprocessor, a central processing unit (CPU), a processor core, a multiprocessor, an application-specific integrated circuit (ASIC), and a field programmable gate array (FPGA), but the scope of the present invention is not limited thereto.

A fake news detection method based on Korean grammar transformation according to an embodiment of the present invention is described below.

FIG. 2 is a block diagram illustrating a fake news detection method according to the present invention. FIG. 3 is a flowchart illustrating a fake news detection method according to the present invention.

Referring to FIGS. 2 and 3, the fake news detection program according to an embodiment of the present invention is a fake news detection program based on Korean grammar transformation that integrates four models: a key sentence extraction model that finds the article sentence related to the query sentence in a news article, a model that checks whether the words of the query sentence correspond to the words of the key sentence, a model that checks whether the query sentence and the key sentence correspond semantically, and a model that checks whether the query sentence and the key sentence correspond grammatically.

In outline, the program divides the query sentence about a news article into word constituent units and uses those units to extract the most relevant sentence from the news article as the key sentence. It then checks whether the words of the query sentence match the words of the related article sentence, matches the query sentence against the key sentence to check semantic correspondence, checks grammatical correspondence, and finally uses Bidirectional Long Short-Term Memory (BiLSTM) to aggregate the semantic and grammatical sentence matching results and identify fake news. Here, the query sentence is a sentence whose authenticity needs to be verified, and the target news article may be a news article consisting of true information with no false information.

In the fake news detection method based on Korean grammar transformation performed by the fake news detection server according to an embodiment of the present invention, a step (S310) of receiving a query sentence and a target news article may be performed first.

Next, a step (S320) of extracting, from the target news article, an article sentence related to the query sentence as a key sentence may be performed.

FIG. 4 is a flowchart showing a key sentence extraction method according to the present invention. FIG. 5 is a schematic diagram showing a key sentence extraction model according to the present invention.

Referring to FIGS. 4 and 5, in the key sentence extraction step, a step (S410) of decomposing the query sentence and the article sentences into word constituent units including roots and suffixes may be performed first.

Next, a step (S420) of vectorizing the word constituent units of the query sentence and of the article sentences and comparing them to calculate a cosine similarity may be performed. The purpose is to use the decomposed word constituent units to retrieve, from the news article, the sentence most similar to the query sentence in terms of word frequency.

Next, a step (S430) of extracting, as the key sentence, the article sentence containing the word constituent units with the highest cosine similarity to those of the query sentence may be performed. The purpose is to extract the key sentence by checking how often the word constituent units of the query sentence appear in each sentence of the news article.
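Steps S410 to S430 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the sentences are assumed to be already decomposed into word constituent units (the decomposition itself needs a Korean morphological analyzer and is not shown), the vectors are plain frequency counts, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two frequency vectors stored as Counters."""
    dot = sum(a[unit] * b[unit] for unit in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def extract_key_sentence(
    query_units: List[str], article_units: List[List[str]]
) -> Tuple[int, float]:
    """Return (index, score) of the article sentence most similar to the query."""
    if not article_units:
        raise ValueError("the article has no sentences")
    query_vec = Counter(query_units)
    scores = [cosine_similarity(query_vec, Counter(units)) for units in article_units]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Hypothetical, already decomposed input (roots and suffixes).
query_units = ["정부", "가", "예산", "을", "늘리", "었다"]
article_units = [
    ["날씨", "가", "맑", "다"],
    ["정부", "는", "예산", "을", "늘리", "었다"],
    ["예산", "이", "부족", "하", "다"],
]
best_index, best_score = extract_key_sentence(query_units, article_units)
```

With this toy input the second article sentence shares five of the six query units and is selected as the key sentence.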

FIG. 6 is a flowchart showing a method of checking word correspondence according to the present invention. FIG. 7 is a schematic diagram showing a word correspondence checking module according to the present invention.

Next, referring to FIGS. 6 and 7, a step of checking whether the words of the query sentence correspond to the words of the key sentence may be performed.

In this step, a step (S610) of decomposing the query sentence into words and storing them in the order in which they appear to generate a word matching set may be performed first.

Next, a step (S620) of decomposing the key sentence into words and storing them in the order in which they appear to generate a word matching set (WMS: Word Matching Sentence Set) may be performed. A word matching set is obtained by splitting the input sentence and storing the resulting words in a list in the order in which they were split.

Next, a step (S630) of comparing the word matching set of the query sentence with the word matching set of the key sentence may be performed.

Next, a step of classifying the comparison result as complete word matching or partial word matching and outputting it may be performed. Here, complete word matching means that both the words and their order match between the query sentence and the key sentence; partial word matching means that the words and/or their order match only in part; and non-word matching means that neither the words nor their order match at all.

When the comparison result is output as a complete word match, the query sentence and the key sentence are judged to correspond semantically, and whether they correspond grammatically can then be checked. The method of checking grammatical correspondence is described later.
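The word matching steps and the shortcut taken on a complete match can be sketched in Python as follows. This is a minimal sketch, assuming that words are split on whitespace and that any shared word counts as a partial match; the function names and the category labels are hypothetical.

```python
from typing import List


def build_word_matching_set(sentence: str) -> List[str]:
    """Split a sentence into words and keep them in their original order."""
    return sentence.split()


def classify_word_match(query_wms: List[str], key_wms: List[str]) -> str:
    """Classify the comparison as 'complete', 'partial', or 'none'."""
    if query_wms == key_wms:
        return "complete"  # same words in the same order
    if set(query_wms) & set(key_wms):
        return "partial"   # some words shared, or same words in another order
    return "none"          # no words shared


def semantic_match_assumed(category: str) -> bool:
    """A complete word match is treated as a semantic match, so the method
    can proceed directly to the grammatical check."""
    return category == "complete"


query_wms = build_word_matching_set("정부가 예산을 늘렸다")
same = classify_word_match(query_wms, build_word_matching_set("정부가 예산을 늘렸다"))
reordered = classify_word_match(query_wms, build_word_matching_set("예산을 정부가 늘렸다"))
unrelated = classify_word_match(query_wms, build_word_matching_set("오늘 날씨는 맑다"))
```

Storing the words as an ordered list rather than a set is what lets the comparison distinguish a complete match from a reordered sentence.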

FIG. 8 is a flowchart showing a method of checking semantic correspondence according to the present invention. FIG. 9 is a schematic diagram showing a semantic correspondence checking module according to the present invention.

Next, referring to FIGS. 8 and 9, a step of checking, depending on the word matching result, whether the query sentence and the key sentence correspond semantically may be performed.

In this step, a step (S810) of extracting the word vectors of the query sentence and the word vectors of the key sentence may be performed first.

Specifically, the words of the query sentence and of the key sentence may be output through the Word Representation Layer as question word vectors and key sentence set word vectors.

Next, a step (S820) of matching the extracted word vectors of the query sentence and of the key sentence with word embedding vectors trained in advance on Korean news articles may be performed.

Specifically, the query sentence word vectors and the key sentence word vectors may be matched with the words of a word embedding trained on Korean newspaper articles using the word2vec algorithm.
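The matching in step S820 amounts to looking up each word in a pretrained embedding table. The Python sketch below illustrates that lookup under stated assumptions: the table is a small hand-written dictionary with made-up values standing in for a real word2vec model, the zero-vector fallback for unknown words is an assumption of this sketch, and the function name is hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def lookup_embeddings(
    words: List[str], embedding: Dict[str, Vector], dim: int
) -> List[Vector]:
    """Map each word to its pretrained vector.

    Words missing from the table get a zero vector (an assumption of this
    sketch; the description does not say how unknown words are handled).
    """
    vectors = []
    for word in words:
        vector = embedding.get(word)
        vectors.append(list(vector) if vector is not None else [0.0] * dim)
    return vectors


# Toy table with made-up values, standing in for embeddings trained on
# Korean news articles.
toy_embedding = {
    "정부": [0.2, -0.1, 0.7],
    "예산": [0.5, 0.3, -0.2],
}
query_vectors = lookup_embeddings(["정부", "예산", "증액"], toy_embedding, dim=3)
```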

Next, a step (S830) of applying deep learning to the matched word vectors of the query sentence and of the key sentence to extract a context-related embedding vector for each may be performed.

Specifically, the output word vectors may be output, through a BiLSTM in a context representation layer, as a context-related query sentence embedding vector (question contextual embedding vector) and a context-related key sentence embedding vector (key sentence set contextual embedding vector).

Here, word embedding means the process of converting the characters of a word into a vector. To convert the word vectors into context-related embedding vectors, the query sentence word vectors and the key sentence word vectors are first each input to the BiLSTM. The BiLSTM encodes each input word vector and integrates the information according to word order and the distribution of surrounding words. The encoded information is then output as the context-related embedding vector of the query sentence and the context-related embedding vector of the key sentence. As a result, the embedding vectors can represent information over a wider range than the word vectors do.
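The idea of integrating information from both directions can be illustrated with a much simplified encoder. The sketch below is not a BiLSTM: it substitutes a plain tanh recurrent cell for the LSTM cell and shares one set of toy weights between the two directions, purely to keep the example short. The names `rnn_pass`, `bidirectional_encode`, `W_IN`, and `W_REC` and all numeric values are hypothetical.

```python
import math


def rnn_pass(vectors, w_in, w_rec):
    """Run a plain tanh recurrent cell over the sequence and return every hidden state."""
    hidden = [0.0] * len(w_rec)
    states = []
    for x in vectors:
        hidden = [
            math.tanh(
                sum(w_in[i][j] * x[j] for j in range(len(x)))
                + sum(w_rec[i][k] * hidden[k] for k in range(len(hidden)))
            )
            for i in range(len(w_rec))
        ]
        states.append(hidden)
    return states


def bidirectional_encode(vectors, w_in, w_rec):
    """Concatenate forward and backward states so each position sees both contexts."""
    forward = rnn_pass(vectors, w_in, w_rec)
    # Encode the reversed sequence, then flip the states back into reading order.
    backward = rnn_pass(vectors[::-1], w_in, w_rec)[::-1]
    return [f + b for f, b in zip(forward, backward)]


# Toy weights (hypothetical): 2 hidden units, 3-dimensional word vectors.
W_IN = [[0.5, -0.2, 0.1], [0.3, 0.4, -0.1]]
W_REC = [[0.1, 0.0], [0.0, 0.1]]

word_vectors = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1], [0.1, 0.3, 0.7]]
contextual = bidirectional_encode(word_vectors, W_IN, W_REC)
```

Each output vector has twice the hidden size: the first half summarizes the words up to that position, the second half summarizes the words from that position to the end of the sentence.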

Next, a step (S840) of applying deep learning to the extracted context-related embedding vectors of the query sentence and of the key sentence to extract a query sentence matching vector and a key sentence matching vector, respectively, may be performed.

Specifically, the context-related embedding vectors of the query sentence and of the key sentence may be output as matching vectors (question matching vector, key sentence set matching vector) by applying the full matching and attentive matching deep learning techniques in a matching representation layer.

To generate the matching vectors, the context-related embedding vector of the query sentence and that of the key sentence are first crossed and input to the layer that performs the matching operation. This layer performs the full matching and attentive matching operations, which are the computational tools used to generate the matching vectors. In full matching, all the vector information expressed in the context-related embedding vector of the query sentence and in that of the key sentence is matched sequentially to generate vectors.

Taking the sequentially matched vectors as an example, the matching vector is generated by computing a weight through a cosine similarity operation between the current vector and each of the vectors that follow it. The vector that is computed repeatedly here is the context-related embedding vector of the key sentence. The current vector is compared with each following vector, and the weights are updated until the last vector is reached, after which the vectors are output.

To summarize, the context-related vectors are input crosswise, and the two sides do not perform the same operation: one side takes the query sentence as the reference and operates sequentially on the context vectors of the key sentence, while the other side takes the key sentence as the reference and operates sequentially on the context vectors of the query sentence.
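The crosswise full matching described above can be sketched as follows. The description leaves the exact operation open, so the sketch assumes one common reading: every contextual vector of the reference sentence is compared, by cosine similarity, with the final contextual vector of the other sentence. The function names `cosine` and `full_matching` and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def full_matching(base, other):
    """Compare every vector of `base` against the final vector of `other` (assumed reading)."""
    last = other[-1]
    return [cosine(h, last) for h in base]


# Toy contextual embeddings (hypothetical values).
query_ctx = [[1.0, 0.0], [0.6, 0.8]]
key_ctx = [[0.0, 1.0], [1.0, 1.0], [1.0, 0.0]]

# The two directions use different reference sentences, so they are not the same operation.
query_match = full_matching(query_ctx, key_ctx)  # query steps vs. last key step
key_match = full_matching(key_ctx, query_ctx)    # key steps vs. last query step
```

The two calls return lists of different lengths (one score per time step of the reference sentence), which reflects the asymmetry between the two sides.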

Attentive matching is similar to full matching, but instead of continuously updating the weights produced during the weight computation, it uses a weighted sum: each computed result is multiplied by its weight, and the products are then added together.
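A minimal sketch of the weighted-sum idea follows. It assumes that the weights are the cosine similarities between one vector of the reference sentence and every vector of the other sentence, and that the final score is the cosine similarity between the reference vector and the resulting weighted sum; both choices, like the names `cosine` and `attentive_matching`, are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def attentive_matching(base, other):
    """For each base vector, build a cosine-weighted sum of `other` and compare against it."""
    dim = len(other[0])
    scores = []
    for h in base:
        weights = [cosine(h, o) for o in other]
        # Weighted sum: multiply each vector by its weight, then add the products.
        # No normalisation is applied because cosine similarity ignores positive scaling.
        attended = [sum(w * o[d] for w, o in zip(weights, other)) for d in range(dim)]
        scores.append(cosine(h, attended))
    return scores


# Toy contextual embeddings (hypothetical values).
query_ctx = [[0.0, 1.0], [1.0, 0.0]]
key_ctx = [[1.0, 0.0], [1.0, 1.0]]

scores = attentive_matching(query_ctx, key_ctx)
```

With these toy values the second query vector scores higher than the first, because it aligns with both key vectors rather than only one.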

Finally, the results of full matching and attentive matching are combined and output as the matching vectors. By comparing the information in the query sentence and the key sentence in this way, similar information is made to stand out as more similar, and dissimilar parts are made to stand out as more dissimilar.

Next, a step (S850) of applying deep learning to the extracted query sentence matching vector and key sentence matching vector to extract a context-aggregated query sentence matching vector (Contextual Aggregated Question Matching Vector) and a context-aggregated key sentence matching vector (Contextual Aggregated Key Sentence Set Matching Vector) may be performed.

Specifically, each output matching vector may be output through another BiLSTM as the context-aggregated query sentence matching vector and the context-aggregated key sentence matching vector.

Here, the vector information can be condensed by encoding the output matching vectors with another BiLSTM. Condensing the complex matching vectors into aggregated vectors reduces the computational complexity of calculating the semantic similarity.

Next, a step (S860) of matching a last time step matching vector (Last Time Step Matching Vector) to the context-aggregated query sentence matching vector and the context-aggregated key sentence matching vector to extract an aggregated query sentence matching vector (Aggregated Question Matching Vector) and an aggregated key sentence matching vector (Aggregated Key Sentence Set Matching Vector) may be performed.

Next, a step (S870) of calculating the semantic similarity between the query sentence and the key sentence, using an artificial neural network and normalization on the basis of the extracted aggregated query sentence matching vector and aggregated key sentence matching vector, may be performed.

Specifically, in a decision layer, the semantic similarity between the query sentence and the key sentence may be calculated from the output aggregated query sentence matching vector and aggregated key sentence matching vector using a feed-forward neural network composed of two layers and a softmax activation function.
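The decision layer can be illustrated with a small two-layer feed-forward network followed by softmax. The sketch assumes that the two aggregated vectors are concatenated, that the hidden layer uses ReLU, and that the output has two classes (dissimilar, similar); none of these details is stated in the description, and the weights below are untrained toy values.

```python
import math


def dense(x, weights, bias):
    """One fully connected layer: `weights` has one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    """Normalise logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decide(query_vec, key_vec, w1, b1, w2, b2):
    """Two-layer feed-forward network followed by softmax over (dissimilar, similar)."""
    x = query_vec + key_vec  # concatenate the two aggregated matching vectors (assumption)
    hidden = [max(0.0, h) for h in dense(x, w1, b1)]  # ReLU is an assumption
    return softmax(dense(hidden, w2, b2))


# Toy parameters (hypothetical, untrained).
W1 = [[0.5, 0.5, 0.5, 0.5], [-0.5, 0.5, -0.5, 0.5]]
B1 = [0.0, 0.0]
W2 = [[-1.0, 0.0], [1.0, 0.0]]
B2 = [0.0, 0.0]

probs = decide([0.9, 0.8], [0.7, 0.9], W1, B1, W2, B2)
semantic_similarity = probs[1]  # probability assigned to the "similar" class
```

Reading the probability of the "similar" class as the semantic similarity gives a value between 0 and 1 that can later be compared with a threshold.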

FIG. 10 is a schematic diagram showing part of a grammatical correspondence checking module according to the present invention. FIG. 11 is a diagram showing sentence structuring for checking grammatical correspondence according to the present invention.

Next, referring to FIGS. 10 and 11, a step of checking whether the query sentence and the key sentence correspond grammatically may be performed according to whether they correspond semantically. The grammatical correspondence checking step checks for grammatical transformation by matching the grammatical features of the query sentence against those of the key sentence extracted from the news article.

In the step of checking grammatical correspondence, a step (S1010) of setting the order of the words in the query sentence and the order of the words in the key sentence may be performed first.

Next, a step (S1020) of morphologically analyzing the ordered words of the query sentence and of the key sentence and classifying them by morpheme may be performed. To make the grammatical features of the query sentence and the key sentence concrete, each sentence may be structured as a parsing tree such as the one in FIG. 11. Here, a terminal is the smallest unit of a word expressed in the sentence. A non-terminal is a unit that can be decomposed into terminals and other non-terminals.
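The terminal and non-terminal distinction can be illustrated with a small tree structure. In the sketch below a terminal is a plain string and a non-terminal is a pair of a label and a list of children; the labels and the shape of the tree are hypothetical and are not taken from FIG. 11.

```python
# A non-terminal is (label, [children]); a terminal is a plain string.
# Labels and tree shape are hypothetical, not taken from FIG. 11.
tree = (
    "S",
    [
        ("Sbj", ["포항시민볼링장", "은"]),
        ("Adverb", [("NP", ["24", "개"]), ("NP", ["레인", "으로"])]),
        ("Vp", ["운영", "되", "고", "있", "다"]),
    ],
)


def terminals(node):
    """Collect terminals left to right by recursively expanding non-terminals."""
    if isinstance(node, str):
        return [node]
    _label, children = node
    result = []
    for child in children:
        result.extend(terminals(child))
    return result


leaves = terminals(tree)
```

Expanding every non-terminal until only strings remain recovers the morphemes of the sentence in their original order.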

Next, a step (S1030) of classifying the morphemes of the query sentence words and the morphemes of the key sentence words on the basis of word phrases (eojeol) may be performed.

Next, a step (S1040) of generating a pattern of the query sentence and a pattern of the key sentence, each classified by word phrase, may be performed. Each sentence classified by word phrase is divided into subject, object, complement, verb, adverb, and independent component elements, from which its pattern is generated.

Next, a step (S1040) of comparing the generated pattern of the query sentence with the pattern of the key sentence to check whether the query sentence and the key sentence can be transformed into each other may be performed. If the two patterns are confirmed to be mutually transformable, the query sentence can yield a true value in the authenticity determination.

Finally, a step (S1050) of determining whether the key sentence is true or false on the basis of the semantic correspondence result and the grammatical correspondence result, and outputting the resulting value, may be performed. For example, if the semantic similarity calculated as the semantic correspondence result is at or above a certain value, and the grammatical correspondence result confirms that the query sentence and the key sentence are mutually transformable, the query sentence can yield a true value in the authenticity determination.
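The final determination combines the two results with a logical AND. A minimal sketch follows; the function name `judge` and the default threshold of 0.5 are assumptions, since the description only speaks of "a certain value".

```python
def judge(semantic_similarity, mutually_transformable, threshold=0.5):
    """Return True only when both the semantic and the grammatical checks pass."""
    return semantic_similarity >= threshold and mutually_transformable


verdict = judge(0.96, True)  # high similarity and transformable patterns
```

A high semantic similarity alone is not sufficient in this scheme: if the sentence patterns cannot be transformed into each other, the result is false.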

FIG. 12 is a diagram showing an example algorithm of the semantic correspondence checking method according to the present invention. FIG. 13 is a diagram showing the algorithm of the grammatical correspondence checking method according to the present invention. FIG. 14 is a diagram showing the input sentences and morphological analysis results of the grammatical correspondence checking method according to the present invention. FIG. 15 is a diagram showing the word-phrase-level sentence analysis of the grammatical correspondence checking method according to the present invention. FIG. 16 is a diagram showing a sentence pattern generation result according to the present invention.

Referring to FIGS. 12 to 16, the algorithm and results of the computer-performed semantic correspondence checking method of the present invention, and the algorithm and results of the grammatical correspondence checking method, can be seen.

To explain the grammatical correspondence check, assume that the following sentences, shown in FIG. 14, are given as input.

Query sentence: 1일 평균 300여명이 이용하는 포항시민볼링장은 24개 레인으로 운영되고 있다. (The Pohang Citizens' Bowling Alley, used by about 300 people a day on average, operates 24 lanes.)

Key sentence: 포항시민볼링장은 24개 레인으로 오전 10시부터 밤 12시까지 운영하고 있으며 1일 평균 300여명이 이용하고 있다. (The Pohang Citizens' Bowling Alley operates 24 lanes from 10 a.m. to midnight and is used by about 300 people a day on average.)

First, patterns of the input sentences are generated from the following six main and subsidiary components.

These may be Sbj (subject component), Obj (object component), Cmp (complement component), Vp (verb component), Adverb (adverb component), and Independent components (independent component).

Because there may be conditions under which no pattern can be generated, a pattern for exception handling is also generated; what is generated in that case may be Etc (other component).

The part-of-speech tags follow [Table 1] below. (Tags may differ between morphological analyzers; here the tag set of the mecab morphological analyzer is followed.)

[Table 1]
Tag     Description
NNG     Common noun
NNP     Proper noun
NNB     Dependent noun
NNBC    Noun denoting a unit
NR      Numeral
NP      Pronoun
VV      Verb
VA      Adjective
VX      Auxiliary predicate
VCP     Positive copula
VCN     Negative copula
MM      Determiner
MAG     General adverb
MAJ     Conjunctive adverb
IC      Interjection
JKS     Subject case particle
JKC     Complement case particle
JKG     Adnominal case particle
JKO     Object case particle
JKB     Adverbial case particle
JKV     Vocative case particle
JKQ     Quotative case particle
JX      Auxiliary particle
JC      Conjunctive particle
EP      Pre-final ending
EF      Sentence-final ending
EC      Connective ending
ETN     Nominalizing ending
ETM     Adnominal ending
XPN     Noun prefix
XSN     Noun-deriving suffix
XSV     Verb-deriving suffix
XSA     Adjective-deriving suffix
XR      Root
SF      Period, question mark, exclamation mark
SE      Ellipsis
SSO     Opening bracket ( [
SSC     Closing bracket ) ]
SC      Separator , · / :
SY      Other symbols
SL      Foreign language
SH      Chinese characters
SN      Number

Here, the pattern of a sentence may be formed from the morphological analysis output (see FIG. 14) by using the sentence's auxiliary elements, such as case particles (the JK~ tags) and auxiliary particles, as reference points. As an example, consider the classification of subject and complement component patterns, whose rules are somewhat more complicated than those of the other components. Once the analysis reaches "1일 평균 300여명이", the "이" is a subject case particle (JKS), so the phrase can be a subject component. More generally, a phrase ending in "은", "는", "이", or "가", which commonly mark a subject, can be a subject component. There is an exception, however. If a subject were identified simply by the presence of a subject case particle (JKS), no complement component could ever be found. A complement rule is therefore applied in addition: even after a subject has been found, if another subject candidate such as "는" + ETM (adnominal ending) appears in the next word phrase, that word phrase is judged to be a complement and is separated from the word phrase that was the first subject candidate. For example, the analysis may proceed as "1일 평균 300여명이" -> subject component, "이용하는" -> complement component (supplementing the subject of the sentence). Components found in this way from case particles or auxiliary particles are merged with the preceding words marked not found (that is, words whose sentence component cannot yet be determined) and expressed as a single sentence pattern. The output can therefore take the form of the sentence pattern generation result shown in FIG. 16.
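The subject and complement rules and the merging of "not found" words can be sketched as follows. This is an illustrative reading only: the morpheme tags in `QUERY` were written by hand and are not actual mecab output, the mapping in `TAG_TO_COMPONENT` for components other than subject and complement is an assumption, and the function names are hypothetical. The actual format of FIG. 16 may differ.

```python
# Hand-tagged input: one list of (surface, tag) pairs per word phrase.
# Tags follow Table 1 but were written by hand; they are not real mecab output.
QUERY = [
    [("1", "SN"), ("일", "NNBC")],
    [("평균", "NNG")],
    [("300", "SN"), ("여", "XSN"), ("명", "NNBC"), ("이", "JKS")],
    [("이용", "NNG"), ("하", "XSV"), ("는", "ETM")],
    [("포항시민볼링장", "NNP"), ("은", "JX")],
    [("24", "SN"), ("개", "NNBC")],
    [("레인", "NNG"), ("으로", "JKB")],
    [("운영", "NNG"), ("되", "XSV"), ("고", "EC")],
    [("있", "VX"), ("다", "EF"), (".", "SF")],
]

SUBJECT_SURFACES = {"은", "는", "이", "가"}
# Assumed mapping for the remaining components; the description only details Sbj and Cmp.
TAG_TO_COMPONENT = {"JKO": "Obj", "JKC": "Cmp", "JKB": "Adverb", "EF": "Vp"}


def last_morpheme(morphemes):
    """Return the last (surface, tag) pair, skipping symbol tags such as SF and SN."""
    kept = [(s, t) for s, t in morphemes if not t.startswith("S")]
    return kept[-1] if kept else ("", "")


def classify(phrases):
    """Label each word phrase by its final particle or ending."""
    labels = []
    for morphemes in phrases:
        surface, tag = last_morpheme(morphemes)
        if tag in ("JKS", "JX") and surface in SUBJECT_SURFACES:
            label = "Sbj"
        elif tag == "ETM" and labels and labels[-1] == "Sbj":
            label = "Cmp"  # complement rule: adnominal ending right after a subject
        else:
            label = TAG_TO_COMPONENT.get(tag, "not found")
        labels.append(label)
    return labels


def build_pattern(phrases):
    """Merge 'not found' phrases into the next labelled one; leftovers become 'Etc'."""
    pattern, pending = [], []
    for morphemes, label in zip(phrases, classify(phrases)):
        pending.append("".join(surface for surface, _tag in morphemes))
        if label != "not found":
            pattern.append((label, " ".join(pending)))
            pending = []
    if pending:
        pattern.append(("Etc", " ".join(pending)))
    return pattern


pattern = build_pattern(QUERY)
```

Under these assumptions, "1일" and "평균" are first marked not found and then merged into the subject component "1일 평균 300여명이", while "이용하는" becomes a separate complement component.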

The effect of the present invention described above is that fake news can be detected by aggregating a semantic sentence-matching result and a grammatical sentence-matching result: sentences related to the query sentence are found in the news article and analyzed at the word level, a deep learning model then determines whether they match semantically, and an analysis that takes Korean grammar into account provides the grammatical sentence-matching result.

Meanwhile, the fake news detection method based on Korean grammar conversion according to an embodiment of the present invention may also be implemented in the form of a recording medium containing computer-executable instructions, such as a program module executed by a computer. A computer-readable medium may be any available medium that can be accessed by a computer, and includes volatile and non-volatile media and removable and non-removable media. A computer-readable medium may also include a computer storage medium. Computer storage media include volatile and non-volatile, removable and non-removable media implemented by any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. Although the method and system of the present invention have been described in connection with specific embodiments, some or all of their components or operations may be implemented using a computer system having a general-purpose hardware architecture.

The above description merely illustrates the technical idea of the present invention, and a person of ordinary skill in the art to which the present invention pertains will be able to make various modifications and variations without departing from its essential characteristics. Accordingly, the embodiments disclosed herein are intended to explain, not to limit, the technical idea of the present invention, and the scope of that technical idea is not limited by these embodiments. The scope of protection of the present invention should be interpreted according to the claims below, and all technical ideas within an equivalent scope should be interpreted as falling within the scope of the rights of the present invention.

Claims

A fake news detection method based on Korean grammar conversion, performed by a fake news detection server, the method comprising:
receiving a query sentence and a news article subject to detection;
extracting, from the news article subject to detection, an article sentence related to the query sentence as a key sentence;
checking whether words in the query sentence correspond to words in the key sentence;
checking whether the query sentence and the key sentence correspond semantically, according to whether the words match;
checking whether the query sentence and the key sentence correspond grammatically, according to whether they correspond semantically; and
determining whether the key sentence is true or false on the basis of the semantic correspondence result and the grammatical correspondence result.

The method of claim 1,
wherein extracting the key sentence comprises:
decomposing the query sentence and the article sentences into word constituent units including roots and suffixes;
vectorizing the word constituent units of the query sentence and the word constituent units of each article sentence and comparing them to calculate a cosine similarity; and
extracting, as the key sentence, the article sentence containing the word constituent units with the highest cosine similarity to the word constituent units of the query sentence.

The method of claim 1,
wherein checking whether words in the query sentence correspond to words in the key sentence comprises:
decomposing the query sentence into words and storing the words in their order of arrangement to generate a word matching set;
decomposing the key sentence into words and storing the words in their order of arrangement to generate a word matching set;
comparing the word matching set of the query sentence with the word matching set of the key sentence; and
classifying the comparison result as a complete word match or a partial word match and outputting it.

The method of claim 3,
wherein, when the comparison result is output as a complete word match,
the query sentence and the key sentence are determined to correspond semantically, and whether the query sentence and the key sentence correspond grammatically is checked.

The method of claim 1,
wherein checking whether the query sentence and the key sentence correspond semantically comprises:
extracting word vectors of the query sentence and word vectors of the key sentence;
matching the extracted word vectors of the query sentence and of the key sentence with word embedding vectors trained in advance on Korean news articles;
applying deep learning to the matched word vectors of the query sentence and of the key sentence to extract a context-related embedding vector for each;
applying deep learning to the extracted context-related embedding vectors of the query sentence and of the key sentence to extract a query sentence matching vector and a key sentence matching vector, respectively;
applying deep learning to the extracted query sentence matching vector and key sentence matching vector to extract a context-aggregated query sentence matching vector (Contextual Aggregated Question Matching Vector) and a context-aggregated key sentence matching vector (Contextual Aggregated Key Sentence Set Matching Vector);
matching a last time step matching vector (Last Time Step Matching Vector) to the context-aggregated query sentence matching vector and the context-aggregated key sentence matching vector to extract an aggregated query sentence matching vector (Aggregated Question Matching Vector) and an aggregated key sentence matching vector (Aggregated Key Sentence Set Matching Vector); and
calculating a semantic similarity between the query sentence and the key sentence using an artificial neural network and normalization on the basis of the extracted aggregated query sentence matching vector and aggregated key sentence matching vector.

The method of claim 1,
wherein checking whether the query sentence and the key sentence correspond grammatically comprises:
setting the order of the words in the query sentence and the order of the words in the key sentence;
morphologically analyzing the ordered words of the query sentence and of the key sentence and classifying them by morpheme;
classifying the morphemes of the query sentence words and the morphemes of the key sentence words on the basis of word phrases;
generating a pattern of the query sentence and a pattern of the key sentence classified on the basis of word phrases; and
comparing the generated pattern of the query sentence with the pattern of the key sentence to check whether the query sentence and the key sentence are mutually transformable.

A fake news detection server comprising:
a memory storing a program for detecting fake news based on Korean grammar conversion; and
a processor that executes the program stored in the memory,
wherein the processor, by executing the program,
receives a query sentence and a news article subject to detection, extracts an article sentence related to the query sentence from the news article subject to detection as a key sentence, checks whether words in the query sentence correspond to words in the key sentence, checks whether the query sentence and the key sentence correspond semantically according to whether the words match, checks whether the query sentence and the key sentence correspond grammatically according to whether they correspond semantically, and determines whether the key sentence is true or false on the basis of the semantic correspondence result and the grammatical correspondence result.

The server of claim 7,
wherein extracting the key sentence comprises
decomposing the query sentence and the article sentences into word constituent units including roots and suffixes, vectorizing the word constituent units of the query sentence and the word constituent units of each article sentence and comparing them to calculate a cosine similarity, and extracting, as the key sentence, the article sentence containing the word constituent units with the highest cosine similarity to the word constituent units of the query sentence.

The server of claim 7,
wherein checking whether words in the query sentence correspond to words in the key sentence comprises
decomposing the query sentence into words and storing the words in their order of arrangement to generate a word matching set, decomposing the key sentence into words and storing the words in their order of arrangement to generate a word matching set, comparing the word matching set of the query sentence with the word matching set of the key sentence, and classifying the comparison result as a complete word match or a partial word match and outputting it.

The server of claim 9,
wherein, when the comparison result is output as a complete word match,
the query sentence and the key sentence are determined to correspond semantically, and whether the query sentence and the key sentence correspond grammatically is checked.

The server of claim 7,
wherein checking whether the query sentence and the key sentence correspond semantically comprises
extracting word vectors of the query sentence and word vectors of the key sentence; matching the extracted word vectors of the query sentence and of the key sentence with word embedding vectors trained in advance on Korean news articles; applying deep learning to the matched word vectors of the query sentence and of the key sentence to extract a context-related embedding vector for each; applying deep learning to the extracted context-related embedding vectors of the query sentence and of the key sentence to extract a query sentence matching vector and a key sentence matching vector, respectively; applying deep learning to the extracted query sentence matching vector and key sentence matching vector to extract a context-aggregated query sentence matching vector (Contextual Aggregated Question Matching Vector) and a context-aggregated key sentence matching vector (Contextual Aggregated Key Sentence Set Matching Vector); matching a last time step matching vector (Last Time Step Matching Vector) to the context-aggregated query sentence matching vector and the context-aggregated key sentence matching vector to extract an aggregated query sentence matching vector (Aggregated Question Matching Vector) and an aggregated key sentence matching vector (Aggregated Key Sentence Set Matching Vector); and calculating a semantic similarity between the query sentence and the key sentence using an artificial neural network and normalization on the basis of the extracted aggregated query sentence matching vector and aggregated key sentence matching vector.

The server of claim 7,
wherein checking whether the query sentence and the key sentence correspond grammatically comprises
setting the order of the words in the query sentence and the order of the words in the key sentence, morphologically analyzing the ordered words of the query sentence and of the key sentence and classifying them by morpheme, classifying the morphemes of the query sentence words and the morphemes of the key sentence words on the basis of word phrases, generating a pattern of the query sentence and a pattern of the key sentence classified on the basis of word phrases, and comparing the generated pattern of the query sentence with the pattern of the key sentence to check whether the query sentence and the key sentence are mutually transformable.

A computer-readable recording medium on which a program for implementing the method according to claims 1 to 6 is recorded.