KR100931785B1

KR100931785B1 - Apparatus and method for discriminating negative content

Info

Publication number: KR100931785B1
Application number: KR1020070117712A
Authority: KR
Inventors: 윤세웅; 공기중; 이재호; 박순선
Original assignee: 주식회사 오피엠에스
Priority date: 2007-11-19
Filing date: 2007-11-19
Publication date: 2009-12-14
Also published as: KR20090051362A; WO2009066898A1

Abstract

An apparatus and method for discriminating negative content are provided. The content discriminating apparatus of the present invention includes a dictionary module that stores, for each word, a probability value that the word appears in content with negative subject matter; a calculation module that extracts predetermined keywords from the content to be analyzed and calculates a score for the extracted keywords using the dictionary module; and an analysis module that checks whether the calculated score exceeds a predetermined score and, if it does, determines the content to be negative content. According to the present invention, content containing negative subject matter can be identified so that advertisements can be placed selectively on a web site.

Content, advertisement, topic, subject, detection, discrimination, negative, positive.

Description

Apparatus and method for discriminating negative contents

The present invention relates to an apparatus and method for discriminating negative content, and more particularly, to an apparatus and method that identify content containing negative subject matter among the content provided on a web site.

As the Internet environment develops rapidly, advertising through web sites is increasing sharply. Advertisements are placed on the Internet either in banner form on a web site or in text form.

Recently, many advertising methods have been developed that use the keywords users enter on search sites such as Google and Naver. For example, in keyword search advertising, an advertisement is prepared in advance and displayed when a query containing a related keyword is submitted.

Going beyond keyword search advertising, there is a method of placing advertisements using topic detection, a technique that identifies the subject of the content published on a web site. That is, the topic of the content is identified through topic detection, and advertisements related to that topic are placed around the content in banner or text form. For example, if topic detection determines that the topic of the content is "bag", advertisements from bag-related advertisers are placed with that content. Likewise, if the topic is determined to be "health", advertisements from health-related advertisers are placed with that content. The resulting web page screen is shown in FIG. 1. In FIG. 1, the article content has the topic of health, and a health advertisement 20 related to the health topic is displayed in banner form.

However, because this topic-detection method places an advertisement on any content whose topic is related to the advertisement, in some cases it fails to achieve the expected advertising effect and instead produces an adverse one. For example, when a bag advertisement is placed on content with the topic "bag" but that content is about a crime, the placement may harm the advertiser. A process is therefore needed that not only extracts the topic of the content but also judges whether the content is emotionally appropriate. That is, a process is needed that determines whether the content on which an advertisement is to be placed is positive content suited to the advertisement or content with negative subject matter, and that places advertisements selectively on positive content only.

The present invention has been made to solve the above problems, and its object is to provide an apparatus and method for discriminating negative content that can determine whether content has negative subject matter.

To achieve this object, the content discriminating apparatus of the present invention includes a dictionary module that stores, for each word, a probability value that the word appears in content with negative subject matter (hereinafter referred to as "negative content"); a calculation module that extracts predetermined keywords from the content to be analyzed and calculates a score for the extracted keywords using the dictionary module; and an analysis module that checks whether the calculated score exceeds a predetermined score and, if it does, determines the content to be negative content.

In one embodiment of the present invention, the apparatus further includes a separation module that separates the content to be analyzed into a title and a body, and the calculation module may calculate the score by applying different weights to the title and the body.

The dictionary module may obtain the probability value through a chi-square test, using a value that quantifies how frequently a given word appears in negative content. Here, when the number of positive content items in which the given word appears is A, the number of negative content items in which the given word appears is B, the number of positive content items in which the given word does not appear is C, the number of negative content items in which the given word does not appear is D, and the total number of content items is N, the chi-square statistic X may satisfy the following formula.

[Equation not reproduced in this text]
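As a point of reference only, and assuming the omitted formula is the standard chi-square statistic for a 2x2 contingency table, it can be written with the symbols A, B, C, D, and N defined above as follows. That the patent uses exactly this form is an assumption; the original equation is not available here.

```latex
X = \frac{N\,(AD - BC)^{2}}{(A + B)\,(C + D)\,(A + C)\,(B + D)},
\qquad N = A + B + C + D
```

The statistic is zero when the word is distributed across positive and negative content in the same proportion (AD = BC) and grows as the word becomes more strongly associated with one class.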

The dictionary module may store weights according to the position at which a given word appears in the content.

The dictionary module may store a high weight for the case where a given word is located at the top or bottom of the content.

The calculation module may calculate the score by applying a weight according to the frequency with which the extracted keyword appears in the content.

The calculation module may calculate the score by applying a weight according to the position at which the extracted keyword appears in the content.

When the analysis module determines that the content is negative content, it may analyze the keywords that caused the determination and set a negative topic for the content. In this case, when there is an advertiser who has purchased the negative topic, the analysis module may allow advertisements to be placed on content corresponding to that negative topic.

A negative content discriminating method according to an embodiment of the present invention includes a first step of extracting keywords from the content when the content to be analyzed is received; a second step of calculating a score for the keywords using a dictionary module that stores, for each word, a probability value that the word appears in negative content; and a third step of checking whether the calculated score exceeds a predetermined score and, if it does, determining the content to be negative content.

The second step may include separating the content to be analyzed into a title and a body, and calculating the score by applying different weights to the title and the body.

The dictionary module may obtain the probability value through a chi-square test, using a value that quantifies how frequently a given word appears in negative content. Here, when the number of positive content items in which the given word appears is A, the number of negative content items in which the given word appears is B, the number of positive content items in which the given word does not appear is C, the number of negative content items in which the given word does not appear is D, and the total number of content items is N, the chi-square statistic X may satisfy the following formula.

[Equation not reproduced in this text]

In one embodiment of the present invention, the method may further include a fourth step of analyzing, when the content is determined to be negative content, the keywords that caused the determination and setting a negative topic for the content. In this case, the method may further include a fifth step in which, when there is an advertiser who has purchased the negative topic, the analysis module allows advertisements to be placed on content corresponding to that negative topic.

According to the present invention, content containing negative subject matter can be identified so that advertisements can be placed selectively on a web site.

In addition, advertisement placement can be allowed even on content with negative subject matter when the advertiser so chooses, which can increase the advertising effect.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings. In assigning reference numerals to the components of each drawing, note that the same components are given the same numerals wherever possible, even when they appear in different drawings. In describing the present invention, detailed descriptions of related known functions or configurations are omitted when they would unnecessarily obscure the subject matter of the invention.

FIG. 2 is a block diagram showing the internal configuration of a negative content discriminating apparatus according to an embodiment of the present invention. The negative content discriminating apparatus includes a dictionary module 100, a separation module 200, a calculation module 300, and an analysis module 400.

The dictionary module 100 stores, for each word, a probability value that the word appears in content with negative subject matter (hereinafter referred to as "negative content").

The separation module 200 separates the content to be analyzed into a title and a body. In one embodiment of the present invention, the separation module 200 may be omitted. The separation module 200 separates the content into a title and a body so that different weights can be applied to each.

The calculation module 300 extracts predetermined keywords included in the content to be analyzed and calculates a score for the extracted keywords using the dictionary module 100. The calculation module 300 also calculates the score by applying different weights to the title and the body separated by the separation module 200.

The analysis module 400 checks whether the score calculated by the calculation module 300 exceeds a predetermined score and, if it does, determines the content to be negative content.
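The interplay of the four modules can be sketched as follows. This is a minimal illustration, not the patent's implementation: the dictionary entries, the title and body weights, the threshold, the first-line-is-title rule, and whitespace tokenization are all assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical dictionary module: word -> probability of appearing in negative content.
NEGATIVE_PROB = {"crime": 0.9, "accident": 0.8, "bag": 0.1}

# Assumed weights; the source only states that title and body are weighted differently.
TITLE_WEIGHT = 2.0
BODY_WEIGHT = 1.0


def split_title_body(content: str) -> Tuple[str, str]:
    """Separation module: treat the first line as the title (an assumption)."""
    title, _, body = content.partition("\n")
    return title, body


def extract_keywords(text: str) -> List[str]:
    """Keep only whitespace-separated tokens that have a dictionary entry."""
    return [tok for tok in text.lower().split() if tok in NEGATIVE_PROB]


def score(content: str) -> float:
    """Calculation module: sum dictionary values, weighted by title or body."""
    title, body = split_title_body(content)
    title_score = sum(NEGATIVE_PROB[w] for w in extract_keywords(title))
    body_score = sum(NEGATIVE_PROB[w] for w in extract_keywords(body))
    return TITLE_WEIGHT * title_score + BODY_WEIGHT * body_score


def is_negative(content: str, threshold: float = 2.0) -> bool:
    """Analysis module: negative only if the score exceeds the threshold."""
    return score(content) > threshold
```

With these assumed values, an article titled "Bag crime" whose body mentions an accident scores above the threshold, while an article that only mentions "bag" does not.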

In one embodiment of the present invention, the dictionary module 100 is built in advance through a predetermined process. The dictionary module 100 may obtain the probability value through a chi-square test, using a value that quantifies how frequently a given word appears in negative content. For example, when the number of positive content items in which the given word appears is A, the number of negative content items in which the given word appears is B, the number of positive content items in which the given word does not appear is C, the number of negative content items in which the given word does not appear is D, and the total number of content items is N, the chi-square statistic X satisfies the following formula.

[Equation not reproduced in this text]
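A minimal sketch of how such a statistic could be computed from the four counts, assuming the standard 2x2 chi-square form N(AD - BC)^2 / ((A+B)(C+D)(A+C)(B+D)), since the source's own formula is not reproduced in this text. The function names are hypothetical, and converting the statistic to a p-value with one degree of freedom is an illustrative assumption rather than the patent's stated procedure for deriving the stored probability value.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Chi-square statistic for a 2x2 table.

    a: positive items containing the word, b: negative items containing it,
    c: positive items without the word,   d: negative items without it.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A row or column total is zero, so no association can be measured.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def p_value_one_dof(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))
```

For example, a word found in 10 positive and 40 negative items, and absent from 40 positive and 10 negative items, gives a statistic of 36.0, while a word spread evenly over all four cells gives 0.0.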

In one embodiment of the present invention, the dictionary module 100 may store weights according to the position at which a given word appears in the content. For example, the dictionary module 100 may store a high weight for the case where a given word is located at the top or bottom of the content, because important material is usually located at the top or bottom of a document.

In one embodiment of the present invention, the calculation module 300 may calculate the score by applying a weight according to the frequency with which the extracted keyword appears in the content. For example, it checks how many times the keyword "bad" appears in the content and applies a weight according to that frequency when calculating the score.

In another embodiment of the present invention, the calculation module 300 may calculate the score by applying a weight according to the position at which the extracted keyword appears in the content. For example, a high weight is applied when the keyword "bad" is located at the top or bottom of the content.
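The frequency and position weighting described above can be sketched as follows. The source gives no numeric values, so the 20% edge region, the 1.5 edge weight, and the linear frequency bonus of 0.1 per repeated occurrence are all assumptions chosen for illustration, as are the function names.

```python
from collections import Counter
from typing import Dict, List


def position_weight(index: int, total: int,
                    edge_fraction: float = 0.2,
                    edge_weight: float = 1.5) -> float:
    """Return a higher weight for tokens in the first or last edge_fraction."""
    if total <= 0:
        return 1.0
    rel = index / total
    if rel < edge_fraction or rel >= 1.0 - edge_fraction:
        return edge_weight
    return 1.0


def weighted_score(tokens: List[str], negative_prob: Dict[str, float]) -> float:
    """Sum dictionary values, scaled by position and by keyword frequency."""
    total = len(tokens)
    counts = Counter(tok for tok in tokens if tok in negative_prob)
    result = 0.0
    for i, tok in enumerate(tokens):
        if tok in negative_prob:
            # Assumed frequency weight: +10% for each repeated occurrence.
            freq_weight = 1.0 + 0.1 * (counts[tok] - 1)
            result += negative_prob[tok] * position_weight(i, total) * freq_weight
    return result
```

Under these assumptions, a keyword that appears once in the middle of a document contributes only its dictionary value, whereas the same keyword appearing at both the top and the bottom is boosted by both the position weight and the frequency weight.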

The present invention discriminates negative content in order to identify content with negative subject matter and avoid placing advertisements on it. For example, even if the topic of the content is "bag", bag-related advertisements are not allowed on content that has been determined to be negative content because it is about a crime.

However, in some cases placing an advertisement can be effective even on content with negative subject matter. For example, placing insurance-related advertisements on negative content about accidents, crimes, disasters, and the like can increase the advertising effect. The present invention therefore also proposes a method of placing advertisements according to the topic of the negative content.

In one embodiment of the present invention, when the analysis module 400 determines that the content is negative content, it may analyze the keywords that caused the determination and set a negative topic for the content. For example, after analyzing the keywords that caused the content to be judged negative, it sets "disaster" as the negative topic for that content.
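The source does not say how the contributing keywords are analyzed to arrive at a negative topic. One simple possibility, shown here purely as an assumption, is to map each keyword to a topic and pick the topic with the largest total contribution. The keyword-to-topic table and the function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Optional

# Hypothetical mapping from negative keywords to negative topics.
KEYWORD_TOPIC = {
    "collision": "disaster",
    "injured": "disaster",
    "theft": "crime",
}


def select_negative_topic(contributions: Dict[str, float]) -> Optional[str]:
    """Pick the topic whose keywords contributed most to the negative score.

    contributions maps each keyword to the score it added.
    Returns None when no keyword maps to a known topic.
    """
    totals: Dict[str, float] = defaultdict(float)
    for keyword, value in contributions.items():
        topic = KEYWORD_TOPIC.get(keyword)
        if topic is not None:
            totals[topic] += value
    if not totals:
        return None
    return max(totals, key=lambda t: totals[t])
```

For a traffic accident article in which "collision" and "injured" dominate the score, this sketch would set the negative topic to "disaster".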

In this case, when there is an advertiser who has purchased the negative topic, the analysis module 400 allows advertisements to be placed on content corresponding to that negative topic. For example, in FIG. 4 the content has been determined to be negative content because it is about a traffic accident, and its negative topic is set to "disaster". If there is an advertiser who has purchased the negative topic "disaster", that advertiser's advertisement 40 is placed on the negative content. The advertisement 40 shown in FIG. 4 is an insurance advertisement for children's medical coverage, and it can be expected to be suitably effective alongside the accident article.

FIG. 3 is a flowchart showing a negative content discriminating method according to an embodiment of the present invention.

When the content to be analyzed is received, keywords are extracted from the content (S301).

A score for the keywords is calculated using the dictionary module 100, which stores, for each word, a probability value that the word appears in negative content (S303).

It is checked whether the calculated score exceeds a predetermined score (S305).

If the predetermined score is not exceeded, the content is determined to be positive content (S313). Advertisement placement is allowed on content determined to be positive content (S315).

If the predetermined score is exceeded, the content is determined to be negative content (S307).

When the content is determined to be negative content, the keywords that caused the determination are analyzed and a negative topic is set for the content (S309).

If there is an advertiser who has purchased the negative topic (S311), advertisement placement is allowed on the content corresponding to the negative topic (S315).
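The branch structure of steps S305 to S315 can be summarized in a short sketch. The function and parameter names are hypothetical, and the score and negative topic are assumed to have been computed beforehand (S301, S303, S309).

```python
from typing import Optional, Set, Tuple


def decide(score: float,
           threshold: float,
           negative_topic: Optional[str],
           purchased_topics: Set[str]) -> Tuple[str, bool]:
    """Return (label, ad_allowed) following steps S305 to S315."""
    # S305: does the score exceed the predetermined score?
    if score <= threshold:
        # S313: positive content; S315: advertisement allowed.
        return "positive", True
    # S307: negative content. S309 is assumed to have produced negative_topic.
    # S311: has any advertiser purchased this negative topic?
    if negative_topic is not None and negative_topic in purchased_topics:
        # S315: advertisement allowed on the negative content.
        return "negative", True
    return "negative", False
```

Note that a score exactly equal to the threshold is treated as positive, because the source requires the score to exceed the predetermined value.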

In one embodiment of the present invention, step S303 may include separating the content to be analyzed into a title and a body, and calculating the score by applying different weights to the title and the body.


In the present invention, step S303 may calculate the score by applying a weight according to the frequency with which the extracted keyword appears in the content. For example, it checks how many times the keyword "bad" appears in the content and applies a weight according to that frequency when calculating the score.

In addition, step S303 may calculate the score by applying a weight according to the position at which the extracted keyword appears in the content. For example, a high weight is applied when the keyword "bad" is located at the top or bottom of the content.

The present invention has been described above using several preferred embodiments, but these embodiments are illustrative and not restrictive. Those of ordinary skill in the art to which the present invention pertains will understand that various changes and modifications can be made without departing from the spirit of the invention and the scope of rights set forth in the appended claims.

FIG. 1 is a diagram showing a general web page screen.

FIG. 2 is a block diagram showing the internal configuration of a negative content discriminating apparatus according to an embodiment of the present invention.

FIG. 4 is a diagram showing a web page screen according to an embodiment of the present invention.

* Description of reference numerals for the main parts of the drawings *

100: dictionary module

200: separation module

300: calculation module

400: analysis module

Claims

A negative content discriminating apparatus comprising:

a dictionary module that stores a probability value that each word appears in content with negative subject matter (hereinafter referred to as "negative content") and a weight according to the position at which a given word appears in content;

a calculation module that extracts predetermined keywords included in the content to be analyzed and calculates a score for the extracted keywords by applying the weight of the dictionary module according to the position at which each extracted keyword appears in the content; and

an analysis module that checks whether the calculated score exceeds a predetermined score and determines the content to be negative content when the calculated score exceeds the predetermined score,

wherein the analysis module, when it determines that the content is negative content, analyzes the keywords that caused the determination and sets a negative topic for the content, and, when there is an advertiser who has purchased the negative topic, allows advertisements to be placed on content corresponding to the negative topic, and

wherein the dictionary module stores a high weight for the case where a given word is located at the top or bottom of the content.

The apparatus of claim 1,

further comprising a separation module that separates the content to be analyzed into a title and a body,

wherein the calculation module calculates the score by applying different weights to the title and the body.

delete

The apparatus according to claim 1 or 2,

wherein, for the dictionary module, when the number of positive content items in which a given word appears is A, the number of negative content items in which the given word appears is B, the number of positive content items in which the given word does not appear is C, the number of negative content items in which the given word does not appear is D, and the total number of content items is N, the chi-square statistic X

satisfies the formula [equation not reproduced in this text].

delete

The apparatus according to claim 1 or 2,

wherein the calculation module calculates the score by applying a weight according to the frequency with which the extracted keyword appears in the content.

delete

A negative content discriminating method comprising:

a first step of extracting keywords from content when the content to be analyzed is received;

a second step of calculating a score for the keywords using a dictionary module that stores a probability value that each word appears in negative content and a weight according to the position at which a given word appears in content;

a third step of checking whether the calculated score exceeds a predetermined score and, if the calculated score exceeds the predetermined score, determining the content to be negative content;

a fourth step of analyzing, when the content is determined to be negative content, the keywords that caused the determination and setting a negative topic for the content; and

a fifth step of allowing advertisements to be placed on content corresponding to the negative topic when there is an advertiser who has purchased the negative topic.

The method of claim 10, wherein the second step

comprises separating the content to be analyzed into a title and a body, and calculating the score by applying different weights to the title and the body.

delete

The method according to claim 10 or 11, wherein

the formula is satisfied [the formula and its symbol definitions are not reproduced in this text].

delete

The method according to claim 10 or 11, wherein

the dictionary module stores a high weight for the case where a given word is located at the top or bottom of the content.

The method according to claim 10 or 11, wherein

the second step calculates the score by applying a weight according to the frequency with which the extracted keyword appears in the content.

delete