KR100770163B1

KR100770163B1 - Method and system for computing spam index

Info

Publication number: KR100770163B1
Application number: KR1020060019863A
Authority: KR
Inventors: 박소연; 최유미
Original assignee: 엔에이치엔(주) (NHN Corp.)
Priority date: 2006-03-02
Filing date: 2006-03-02
Publication date: 2007-10-26
Also published as: KR20070090312A

Abstract

The present invention relates to a method and system for calculating a spam index of a question-answer information set and processing the set appropriately according to that index. A question-answer information set includes question information and the answer information associated with it. The spam index is calculated from abusing elements associated with the set's storage time information, keywords, and IP information.

The spam index calculation method performed by the spam index calculation system according to the present invention includes the following steps:
- checking storage time information indicating when the question-answer information set is stored in a predetermined database;
- identifying a keyword included in the question-answer information set;
- identifying IP information of the question-answer information set;
- calculating a spam index of the question-answer information set based on abusing elements associated with the storage time information, the keyword, and the IP information;
- processing the question-answer information set according to the calculated spam index.

Question-answer information set, storage time information, keyword, IP, spam index table

Description

METHOD AND SYSTEM FOR COMPUTING SPAM INDEX

FIG. 1 is a diagram illustrating spammy posts on a web page according to the prior art.

FIG. 2 is a flowchart illustrating a spam index calculation method according to an embodiment of the present invention.

FIG. 3 is a diagram illustrating the structure of a spam index table according to an embodiment of the present invention.

FIG. 4 is a flowchart illustrating a method of calculating a spam index associated with IP information according to an embodiment of the present invention.

FIG. 5 is a flowchart illustrating a method of calculating a spam index associated with storage time information according to an embodiment of the present invention.

FIG. 6 is a flowchart illustrating a method of calculating a spam index associated with keywords according to an embodiment of the present invention.

FIG. 7 is a diagram illustrating the structure of a keyword spam index table according to an embodiment of the present invention.

FIG. 8 is a flowchart illustrating a method of calculating a spam index associated with keywords according to another embodiment of the present invention.

FIG. 9 is a block diagram showing the configuration of a spam index calculation system according to an embodiment of the present invention.

<Explanation of symbols for main parts of the drawings>

900: spam index calculation system
910: database
920: spam index table
930: keyword spam index table
940: time information checking unit
950: keyword checking unit
960: IP information checking unit
970: spam index calculation unit
980: post processing unit

The present invention relates to a method and system for calculating a spam index of a question-answer information set and processing the set appropriately according to that index. The set includes question information and the answer information associated with it. The spam index is based on abusing elements associated with the set's storage time information, keywords, and IP information.

Internet use has recently become widespread among people of all ages, but the harm caused by Internet use is growing as well. For example, some users of information-sharing services on the Internet register harmful content or adult advertisements instead of useful information. These spammy posts increasingly inconvenience many netizens.

"Netizen A" registers a question listing several points, seeking help from other netizens on matters he or she has not experienced or is unfamiliar with. The web page operator posts the question directly on the web page without reviewing it.

As shown in FIG. 1, "Netizen B" registers useful information about the question as an answer to "Netizen A"'s question. The web page operator again posts the answer directly on the web page without reviewing it. However, the answer contains an advertising keyword, such as the one at reference numeral 102. That makes it an advertising post, and it should be deleted from the web page before it harms other netizens.

Conventionally, however, the web page operator could delete such an answer only after it was reported. "Netizen C" first had to recognize that "Netizen B"'s answer was an advertisement and press the Report button (101). Other approaches were also used:
- Directories where spammy posts were frequently uploaded were inspected, and all posts in them were screened and deleted.
- Posts were reviewed one by one to decide whether they were advertisements to be deleted, with inspection intensity varied by period and timing according to the severity of the offending words.
- A list of IDs that registered many spammy posts was built from the poster ID or login IP. The bad IDs were identified, and every post made under them was deleted.

These methods have several problems:
- If the web page operator receives no report about a post, even an advertising post stays exposed on the web page for a long time, adding to netizens' complaints.
- Several thousand posts are registered every day. Reviewing each one to decide whether it is spam responds to spam slowly and requires substantial manpower and cost.
- When spammy posts stay exposed for a long time, the credibility of search results declines.

There is therefore a need for a method that does not post newly registered posts to a web page immediately. Instead, it should determine before posting whether each post is spam and delete spam posts in advance.

The present invention has been devised to solve the problems of the prior art described above. An object of the present invention is to provide a method and system implementing an intelligent system that effectively detects malicious spam and handles it in advance, by:
- checking the questioner/answerer IP information of a question-answer information set and calculating a spam index according to abusing elements associated with the IP information;
- checking the storage time information of the question information and answer information and calculating a spam index according to abusing elements associated with the storage time information;
- extracting keywords from the question-answer information set, matching them against the keywords stored in a keyword spam index table, and calculating a spam index according to abusing elements associated with the keywords;
- when the sum of the spam indexes calculated for the individual abusing elements is higher than a spam detection index, detecting the set as a spammy post, excluding it from the search list for a certain period, and reporting it to the system operator.

Another object of the present invention is to provide a method and system that improves search credibility and search quality. Before a question-answer information set registered in real time is posted to the web page, a spam index is calculated for it according to each of its abusing elements. The set is posted only when its spam index is low enough that it is not suspected to be spam.

A further object of the present invention is to provide a method and system that keeps per-keyword spam indexes, assigned differentially to each keyword, in a keyword spam index table. Keywords are extracted from a question-answer information set and matched against this table, and a spam index is calculated from the keyword spam indexes of the matched keywords. A set containing keywords with a high keyword spam index is thus easily detected as a spammy post. It is excluded from the search list so that it is not exposed on the web page.

To achieve the above objects and solve the problems of the prior art, a spam index calculation method according to an embodiment of the present invention is performed by a spam index calculation system. The method includes:
- checking storage time information indicating when a question-answer information set, which includes question information and the answer information associated with it, is stored in a predetermined database;
- identifying a keyword included in the question-answer information set;
- identifying IP information of the question-answer information set;
- calculating a spam index of the question-answer information set based on abusing elements associated with the storage time information, the keyword, and the IP information;
- processing the question-answer information set according to the calculated spam index.

A spam index calculation system according to another embodiment of the present invention includes the following units:
- a time information checking unit that checks storage time information indicating when a question-answer information set, which includes question information and the answer information associated with it, is stored in a predetermined database;
- a keyword checking unit that identifies a keyword included in the question-answer information set;
- an IP information checking unit that identifies IP information of the question-answer information set;
- a spam index calculation unit that calculates a spam index of the question-answer information set based on abusing elements associated with the storage time information, the keyword, and the IP information;
- a post processing unit that processes the question-answer information set according to the calculated spam index.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 2 is a flowchart illustrating a spam index calculation method according to an embodiment of the present invention.

In step 201, the spam index calculation system may store a question-answer information set in a predetermined database. The set may include question information registered by a questioner and answer information registered by an answerer in response to that question. One piece of question information may have one or more associated pieces of answer information. For example, suppose answerers A, B, and C each register answer information for the question "이준기" (Lee Jun-ki). The question-answer information set then contains one piece of question information and three pieces of answer information.

According to an embodiment of the present invention, the database may store the following for each question-answer information set:
- the storage time of the question information;
- the storage time of the answer information;
- the time at which an answer was adopted;
- the questioner IP information;
- the answerer IP information.
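A minimal Python sketch of one stored question-answer information set may help fix the terms used below. The class and field names are hypothetical and are not taken from the patent. They simply hold the storage times, adoption time, and questioner/answerer IP information listed above.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Answer:
    """One piece of answer information (field names are hypothetical)."""
    answerer_ip: str
    stored_at: datetime  # storage time of this answer


@dataclass
class QASet:
    """A question and all answers registered for it, as kept in the database."""
    questioner_ip: str
    question_stored_at: datetime  # storage time of the question
    answers: List[Answer] = field(default_factory=list)
    best_answer_index: Optional[int] = None  # index of the adopted answer, if any
    adopted_at: Optional[datetime] = None    # time the best answer was adopted

    def best_answer(self) -> Optional[Answer]:
        """Return the adopted answer, or None if nothing was adopted yet."""
        if self.best_answer_index is None:
            return None
        return self.answers[self.best_answer_index]

    def general_answers(self) -> List[Answer]:
        """Answers that were registered but not adopted."""
        return [a for i, a in enumerate(self.answers) if i != self.best_answer_index]
```

Under this sketch, the "이준기" example is one `QASet` whose `answers` list holds three `Answer` entries.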

In step 202, the spam index calculation system may keep in a spam index table the abusing elements associated with the storage time information, the keywords, and the IP information.

The spam index table is described with reference to FIG. 3, which illustrates its structure according to an embodiment of the present invention.

As shown, the spam index table may hold the abusing elements associated with IP information (i1 to i4), storage time information (t1, t2), and keywords (k1 to k5). It also holds the spam index associated with each abusing element.

The IP abusing element i1 may be the case where the questioner IP information is identical to the answerer IP information of the best answer information. As described above, one piece of question information may have one or more pieces of answer information. The answerer IP information of the best answer information is that of the answer the questioner adopted from among them.

For example, questioner A might register question information, then register answer information for that question, and adopt that answer as the best answer. The questioner and the answerer then have the same IP information. In this case, the spam index calculation system may refer to the spam index table and assign the question-answer information set a spam index of "150", associated with the IP abusing element i1.

The IP abusing element i2 may be the case where the questioner IP information and the answerer IP information of the best answer information share the same C-class. An IP address corresponds to a host's network address on the Internet. Identifying the IP address therefore identifies the host, and so who registered a given post on a web page. IP addresses are divided into classes, and all hosts in the same network share the same prefix.

Of these, C-class IP addresses are used for small networks. Users belonging to a given group share the same C-class portion of their IP addresses, that is, the same first three octets. For example, if 20 computers at workplace A access the Internet, the IP addresses assigned to them all share the same C-class.

Therefore, when users in a group deliberately post spam, their IP addresses share the same C-class. By checking the C-class, the spam index calculation system can readily determine whether a registered post is spam. Even if the questioner IP information and the answerer IP information are not completely identical, they can be treated as the same IP when their C-class is the same.

In such a case, the spam index calculation system may refer to the spam index table and assign a spam index of "100", associated with the IP abusing element i2. This applies to the question-answer information set containing the question information and the best answer information.

The IP abusing element i3 may be the case where the questioner IP information is identical to the answerer IP information of general answer information. General answer information is answer information that an answerer registered for the question but that the questioner did not adopt. In this case, the IP spam index may be calculated as "90".

The IP abusing element i4 may be the case where the questioner IP information and the answerer IP information of general answer information share the same C-class. In this case, the IP spam index may be calculated as "80".
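The four IP abusing elements can be sketched as a small classifier. This sketch assumes IPv4 addresses and reads "same C-class" as "same first three octets (/24)". The function names are hypothetical, and the scores are the FIG. 3 values given above.

```python
import ipaddress

# Spam indexes for the IP abusing elements, as listed in the spam index table (FIG. 3).
IP_SCORES = {"i1": 150, "i2": 100, "i3": 90, "i4": 80}


def c_class(ip: str):
    """C-class network of an IPv4 address, i.e. its first three octets (/24)."""
    return ipaddress.ip_network(f"{ip}/24", strict=False)


def ip_abusing_element(questioner_ip: str, answerer_ip: str, is_best: bool):
    """Map one questioner/answerer IP pair to i1..i4, or None if unrelated."""
    if questioner_ip == answerer_ip:          # identical address
        return "i1" if is_best else "i3"
    if c_class(questioner_ip) == c_class(answerer_ip):  # same C-class only
        return "i2" if is_best else "i4"
    return None


def ip_spam_index(questioner_ip: str, answerer_ip: str, is_best: bool) -> int:
    """IP spam index for one answer; 0 when no IP abusing element applies."""
    element = ip_abusing_element(questioner_ip, answerer_ip, is_best)
    return IP_SCORES[element] if element else 0
```

The full-address check comes before the C-class comparison because an identical address also shares its C-class. That order lets i1/i3 take precedence over i2/i4.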

The spam index table may also hold abusing elements and spam indexes associated with storage time information. As shown, the storage time abusing element t1 is the case where the questioner adopts an answer as the best answer within 1 minute. It may be assigned a spam index of "60". The element t2 is the case where the adoption takes place within 3 minutes, and it may also be assigned a spam index of "60". The operator of the spam index calculation system may set these adoption time limits.
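The storage-time rule can be sketched as a short list of bounds. The text does not say where the adoption delay starts, so this sketch assumes it is measured from the answer's storage time to its adoption time. It also assumes only the tightest matching rule applies. The bounds sit in a list so that an operator could change them, as the text allows. All names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

# (element, upper bound on adoption delay, spam index), checked tightest first.
# Both indexes are 60 as stated in the text; the bounds are operator-configurable.
TIME_RULES = [
    ("t1", timedelta(minutes=1), 60),
    ("t2", timedelta(minutes=3), 60),
]


def time_spam_index(answer_stored_at: datetime, adopted_at: Optional[datetime]) -> int:
    """Spam index for how quickly an answer was adopted as the best answer.

    Assumption: the delay runs from the answer's storage time to its adoption
    time, and only the first (tightest) matching rule is applied.
    """
    if adopted_at is None:
        return 0  # no best answer adopted yet
    delay = adopted_at - answer_stored_at
    for _element, bound, score in TIME_RULES:
        if delay <= bound:
            return score
    return 0
```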

The spam index table may also hold abusing elements and spam indexes associated with keywords. As shown, URL (k1) is given the highest spam index among the keyword elements. Advertising keywords (k2), adult text (k3), specific syllable frequency (k4), and duplicate keywords (k5) may each be given different spam indexes. The operator of the spam index calculation system may set the keyword spam indexes according to the harm each keyword type has caused in practice. For example, a URL leads directly to a harmful or adult site that curious minors could visit, so URL may be given the highest spam index among the keyword elements.
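The keyword abusing elements might be checked roughly as follows. Everything beyond the element names is an assumption. The text gives no numeric keyword spam indexes, only that URL (k1) is highest, so the weights and word lists below are hypothetical placeholders. Matched elements are summed. k4 (specific syllable frequency) is left out because its test is not defined in this section.

```python
import re
from collections import Counter
from typing import Set

# Hypothetical weights; the text only says URL (k1) gets the highest index.
KEYWORD_SCORES = {"k1": 100, "k2": 60, "k3": 50, "k5": 30}
AD_WORDS = {"loan", "discount"}  # hypothetical advertising keywords (k2)
ADULT_WORDS = {"adult"}          # hypothetical adult-text keywords (k3)
URL_RE = re.compile(r"(https?://|www\.)\S+", re.IGNORECASE)


def keyword_elements(text: str, dup_threshold: int = 3) -> Set[str]:
    """Return the keyword abusing elements (k1, k2, k3, k5) found in the text."""
    found = set()
    if URL_RE.search(text):
        found.add("k1")
    tokens = text.split()
    if any(t in AD_WORDS for t in tokens):
        found.add("k2")
    if any(t in ADULT_WORDS for t in tokens):
        found.add("k3")
    counts = Counter(tokens)
    if counts and max(counts.values()) >= dup_threshold:  # same token repeated
        found.add("k5")
    return found


def keyword_spam_index(text: str) -> int:
    """Sum of the (hypothetical) weights of every matched keyword element."""
    return sum(KEYWORD_SCORES[e] for e in keyword_elements(text))
```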

Referring again to FIG. 2, in steps 203a to 203c, the spam index calculation system may check the IP information, the storage time information, and the keywords to determine whether the question-answer information set is spam. Using the spam index table, the system identifies the applicable abusing elements:
- among i1 to i4 for the IP information;
- among t1 and t2 for the storage time information;
- among k1 to k5 for the keywords.

To check the keywords, the system may extract keywords from the question-answer information set and match them against a predetermined keyword spam index table. Keyword checking is described in detail later with reference to FIGS. 6 to 8.

In step 204, the spam index calculation system may calculate the spam index of the question-answer information set. The total spam index is the sum of three spam indexes: the one associated with the IP information, the one associated with the storage time information, and the one associated with the keywords. For example, the IP spam index might be "150", the storage time spam index "90", and the keyword spam index "160". The system then calculates their sum, "400", as the total spam index of the question-answer information set.

In step 205, if the total spam index is greater than a spam detection index, the spam index calculation system may detect the question-answer information set as a spammy post. The spam detection index may be set to the minimum total spam index at which a post is detected as spam, or it may be set by the operator of the spam index calculation system. For example, suppose the spam detection index is set to "500". If the total spam index of the set is less than 500, the set may not be detected as a spammy post even though a spam index was calculated. If the total spam index is 600, it exceeds the spam detection index, and the set may be detected as a spammy post.
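Steps 204 and 205 reduce to a sum and a strict comparison. The sketch below uses the 500 detection index from the example, and the function names are hypothetical. A total exactly equal to the detection index is treated as not spam, following the "greater than" wording of step 205.

```python
SPAM_DETECTION_INDEX = 500  # example value from the text; operator-configurable


def total_spam_index(ip_index: int, time_index: int, keyword_index: int) -> int:
    """Step 204: sum the per-factor spam indexes."""
    return ip_index + time_index + keyword_index


def is_spam(total: int, threshold: int = SPAM_DETECTION_INDEX) -> bool:
    """Step 205: spam only if the total strictly exceeds the detection index."""
    return total > threshold
```

With the step 204 figures, `total_spam_index(150, 90, 160)` gives 400, which `is_spam` rejects. A total of 600 would be flagged.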

In step 206, the spam index calculation system may exclude a question-answer information set detected as a spammy post from the search list and report it to the operator. Steps 201 to 206 are performed before the set is posted to a web page. If the set is not a spammy post, it is posted to the web page. If it is detected as a spammy post, it may be removed from the search list so that it is not returned even when other netizens search for related terms.

According to an embodiment of the present invention, the spam index calculation system may remove the set from the search list for a predetermined period, for example about two weeks. The system may also identify the questioner, answerer, and other users associated with the spammy post and deal with them. For example, it may delete all of their posts in bulk or forcibly withdraw their accounts.
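Step 206 and the two-week exclusion can be sketched with an in-memory stand-in for the search list. `SearchIndex`, `process`, and the report list are hypothetical. A real system would persist exclusions and notify the operator through its own channel.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List

EXCLUSION_PERIOD = timedelta(weeks=2)  # "about two weeks" in the example embodiment


@dataclass
class SearchIndex:
    """Hypothetical search list: QA set ids mapped to when they may reappear."""
    excluded_until: Dict[str, datetime] = field(default_factory=dict)

    def exclude(self, qa_id: str, now: datetime) -> None:
        self.excluded_until[qa_id] = now + EXCLUSION_PERIOD

    def is_searchable(self, qa_id: str, now: datetime) -> bool:
        until = self.excluded_until.get(qa_id)
        return until is None or now >= until


def process(qa_id: str, spam: bool, index: SearchIndex,
            operator_reports: List[str], now: datetime) -> str:
    """Step 206: post non-spam sets; exclude and report spam sets."""
    if not spam:
        return "post"
    index.exclude(qa_id, now)
    operator_reports.append(qa_id)  # stand-in for reporting to the operator
    return "excluded"
```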

In this way, the present invention can provide an intelligent system that effectively detects malicious spam and handles it in advance.

FIG. 4 shows the procedure for checking the IP information in step 203a in detail.

In steps 401 and 402, the spam index calculation system may identify the questioner IP information and the answerer IP information of the question-answer information set. According to an embodiment of the present invention, the system may identify the full IP addresses of the questioner and the answerer, or only their C-class addresses.

In step 403a, the spam index calculation system may check the database for other question-answer information sets with the same pair of questioner IP information and answerer IP information. For example, to register spam, user A might repeatedly register question information and user B register answer information to it, creating many question-answer information sets. Many sets registered by user A and user B would then be found among the sets stored in the database.

In step 404, if such sets with the same questioner IP information and answerer IP information exist, the spam index calculation system may calculate a spam index with reference to the spam index table. According to an embodiment of the present invention, the calculated spam index may be the IP spam index.

Alternatively, in step 403b, the spam index calculation system may determine whether there is a question-answer information set in which the questioner IP information and the answerer IP information are identical to each other. For example, if user A registers both the question and the answer, the question and the answer will have the same IP address. If the question and its answer are registered over a small network, they will have the same C-class IP address.

In step 404, if the questioner IP information and the answerer IP information are identical, the spam index calculation system may calculate a spam index by referring to the spam index table. For example, the system may assign the question-answer information set a spam index of 150 or 90 when the full IP addresses are identical, and 100 or 80 when only the C-Class IP addresses are identical.
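The comparison in steps 403b and 404 can be sketched with Python's standard `ipaddress` module. This is an illustration under stated assumptions, not the patented implementation: "C-Class" is read in its conventional sense of a shared first three octets (a /24 network), the addresses are assumed to be IPv4, and because the text offers two alternative scores per case without saying which applies when, the first value of each pair is used here as a hypothetical default.

```python
import ipaddress

# Example scores from step 404. The text gives alternatives (150 or 90,
# 100 or 80) without saying which applies when, so these are assumptions.
FULL_IP_MATCH_SCORE = 150
C_CLASS_MATCH_SCORE = 100


def same_c_class(ip_a: str, ip_b: str) -> bool:
    """True if two IPv4 addresses share their first three octets (a /24 network)."""
    network = ipaddress.ip_network(f"{ip_a}/24", strict=False)
    return ipaddress.ip_address(ip_b) in network


def ip_spam_index(questioner_ip: str, answerer_ip: str) -> int:
    """Spam index for one question-answer set based on its two IP addresses."""
    if ipaddress.ip_address(questioner_ip) == ipaddress.ip_address(answerer_ip):
        return FULL_IP_MATCH_SCORE  # same user or same machine
    if same_c_class(questioner_ip, answerer_ip):
        return C_CLASS_MATCH_SCORE  # e.g. the same small local network
    return 0


print(ip_spam_index("203.0.113.7", "203.0.113.7"))   # 150
print(ip_spam_index("203.0.113.7", "203.0.113.42"))  # 100
print(ip_spam_index("203.0.113.7", "198.51.100.7"))  # 0
```

Checking the full-address match first matters: identical addresses are also in the same /24, so the order ensures the stronger signal wins.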

FIG. 5 describes in detail the procedure for checking the storage time information in step 203b.

In step 501, the spam index calculation system may identify the time at which the question information was stored in the database. When the system receives question information from a questioner, it does not post the information to a web page immediately; it stores the information in the database and posts it to the web page only if the information is determined not to be a spam post.

In step 502, the spam index calculation system may identify the storage time information of the answer information associated with the question information. The system may identify the storage time of every piece of answer information associated with the question.

In step 503, the spam index calculation system may identify the adoption time at which answer information was adopted as the answer. The questioner may adopt one of the pieces of answer information associated with the question information as the accepted answer.

In step 504a, the spam index calculation system may determine whether the difference between the storage time of the question information and the storage time of the answer information is smaller than predetermined reference registration time information. The reference registration time may be set to the average time needed to read the question information and enter answer information, for example 10 minutes or 1 hour. In one embodiment, when the answerer registers answer information immediately after the questioner registers the question information, this difference may be very short. In that case, the system may suspect the question-answer information set of being a spam post and calculate a spam index for it.

Alternatively, in step 504b, the spam index calculation system may determine whether the difference between the storage time of the answer information and the adoption time is smaller than predetermined response reflection time information. The response reflection time may be set to the average time a questioner needs to read answer information and adopt it as the answer, for example 1 minute, 3 minutes, or 1 hour. In one embodiment, when a questioner and an answerer collude so that the answerer registers answer information immediately after the questioner registers the question and the questioner then adopts that answer, the gap between answer storage and adoption may be very short. In that case, the system may suspect the question-answer information set of being a spam post and calculate a spam index for it.

In step 504, the spam index calculation system may calculate a spam index for the question-answer information set according to the corresponding abusing element by referring to the spam index table. According to an embodiment of the present invention, the calculated spam index may be the spam index for the storage time information.
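The two timing checks of FIG. 5 (steps 504a and 504b) can be sketched as follows. The thresholds use the example values from the text (10 minutes and 3 minutes), but real values would be operator-configured; the function and the abusing-element names are hypothetical, and the scores themselves would come from the spam index table of FIG. 3, which is not reproduced here.

```python
from datetime import datetime, timedelta
from typing import Optional, Set

# Example thresholds from the text; actual values are set by the operator.
REFERENCE_REGISTRATION_TIME = timedelta(minutes=10)
RESPONSE_REFLECTION_TIME = timedelta(minutes=3)


def timing_abuse_elements(question_stored: datetime,
                          answer_stored: datetime,
                          adopted: Optional[datetime] = None) -> Set[str]:
    """Return the (hypothetically named) timing-related abusing elements triggered."""
    elements: Set[str] = set()
    # Step 504a: answer stored suspiciously soon after the question.
    if answer_stored - question_stored < REFERENCE_REGISTRATION_TIME:
        elements.add("answer_registered_too_fast")
    # Step 504b: answer adopted suspiciously soon after it was stored.
    if adopted is not None and adopted - answer_stored < RESPONSE_REFLECTION_TIME:
        elements.add("answer_adopted_too_fast")
    return elements


q = datetime(2006, 3, 2, 12, 0)
print(sorted(timing_abuse_elements(q, q + timedelta(minutes=2), q + timedelta(minutes=3))))
# ['answer_adopted_too_fast', 'answer_registered_too_fast']
```

Each returned element would then be looked up in the spam index table to obtain the storage-time spam index.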

FIG. 6 is a flowchart illustrating a method of calculating a spam index associated with keywords, according to an embodiment of the present invention.

FIG. 6 describes in detail the procedure for checking keywords in step 203c.

In step 601, the spam index calculation system may maintain a keyword spam index for each keyword in a keyword spam index table.

The following description refers to FIG. 7. FIG. 7 is a diagram illustrating the structure of a keyword spam index table according to an embodiment of the present invention.

As shown, the keyword spam index table may hold keywords associated with URLs (k1), advertising keywords (k2), adult keywords (k3), and specific syllable frequencies (k4), together with a keyword spam index for each keyword. The spam index calculation system may assign keyword spam indexes differentially according to keyword attributes, giving higher indexes to keywords that cause users more annoyance.

In step 602, the spam index calculation system may extract keywords from the question-answer information set. The system may extract keywords from every question-answer information set stored in the database, or it may first check IP information and storage time information and extract keywords only from sets that are likely to be spam posts.

In step 603, the spam index calculation system may match the extracted keywords against the keywords stored in the keyword spam index table, that is, against keywords in the URL, advertising keyword, adult keyword, and specific syllable frequency categories.

In step 604, the spam index calculation system may refer to the keyword spam index table to identify the keyword spam index matching each extracted keyword. According to an embodiment of the present invention, multiple occurrences of the same keyword are treated as a single keyword when identifying the keyword spam index.

In step 605, the spam index calculation system may calculate the spam index of the question-answer information set according to the identified keyword spam indexes. According to an embodiment of the present invention, the calculated spam index may be the spam index for keywords.
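Steps 601 to 605 amount to a table lookup over the distinct keywords of a set. Below is a minimal sketch under stated assumptions: the table contents (keywords and scores) are invented for illustration, only the URL, advertising, and adult categories (k1 to k3) are modeled because the text does not say how the syllable-frequency category (k4) is matched, and a naive regular-expression tokenizer stands in for the unspecified keyword extractor.

```python
import re
from typing import Dict, Set

# Hypothetical contents for the keyword spam index table of FIG. 7: the
# categories mirror k1-k3 in the text, but keywords and scores are made up.
KEYWORD_SPAM_TABLE: Dict[str, Dict[str, int]] = {
    "url": {"spam-site.example": 50},
    "advertising": {"loan": 30, "discount": 20},
    "adult": {"xxx": 60},
}


def extract_keywords(text: str) -> Set[str]:
    """Naive stand-in for the unspecified keyword extractor (step 602).

    Returning a set means a keyword that occurs several times is counted
    once, as described for step 604.
    """
    return set(re.findall(r"[\w.\-]+", text.lower()))


def keyword_spam_index(text: str) -> int:
    """Sum the table scores of every distinct extracted keyword (steps 603-605)."""
    total = 0
    for keyword in extract_keywords(text):
        for category in KEYWORD_SPAM_TABLE.values():
            total += category.get(keyword, 0)
    return total


print(keyword_spam_index("Cheap loan! Loan discount at spam-site.example"))  # 100
```

Because "loan" appears twice but is scored once, the result is 30 + 20 + 50 = 100.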

In step 801, the spam index calculation system may extract keywords from the question-answer information set.

In step 802, the spam index calculation system may identify whether any of the extracted keywords are repeated. For example, the system may identify question-answer information sets in which harmful advertising keywords such as "English" or "computer" are detected repeatedly, or in which popular keywords such as "Park Ji-sung" or "Lee Young-pyo" are detected repeatedly.

In step 803, if a repeated keyword exists in the question-answer information set, the spam index calculation system may calculate the spam index of the set according to the corresponding abusing element by referring to the spam index table. Referring to FIG. 3, the spam index for a duplicate keyword is 80. According to an embodiment of the present invention, the system may determine, for each keyword extracted from the question-answer information set, whether it is repeated, and sum the resulting spam indexes to calculate the total keyword-related spam index.
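Steps 801 to 803 can be sketched with `collections.Counter`, adding the duplicate-keyword score of 80 (FIG. 3) once for every distinct keyword that repeats. The function name is hypothetical, and so is `min_repeats`: the text does not define how many occurrences make a keyword "repeated", so two is assumed.

```python
from collections import Counter
from typing import Iterable

DUPLICATE_KEYWORD_SCORE = 80  # spam index for a duplicate keyword, per FIG. 3


def repeated_keyword_spam_index(keywords: Iterable[str], min_repeats: int = 2) -> int:
    """Add DUPLICATE_KEYWORD_SCORE once for every distinct keyword that repeats.

    `min_repeats` is a hypothetical threshold for "repeated".
    """
    counts = Counter(keywords)
    repeated = [kw for kw, n in counts.items() if n >= min_repeats]
    return DUPLICATE_KEYWORD_SCORE * len(repeated)


words = ["english", "english", "computer", "computer", "computer", "weather"]
print(repeated_keyword_spam_index(words))  # 160
```

Here "english" and "computer" each repeat, so the keyword-related total is 2 × 80 = 160.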

The spam index calculation system 900 may include a database 910, a spam index table 920, a keyword spam index table 930, a time information checking unit 940, a keyword checking unit 950, an IP information checking unit 960, a spam index calculation unit 970, and a post processing unit 980.

For each question-answer information set, the database 910 may hold the storage time of the question information, the storage time of the answer information, the adoption time of the answer information, the questioner IP information, and the answerer IP information.
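To make the stored fields concrete, here is a minimal sketch of one database row as a Python dataclass. The class, field, and method names are hypothetical; the text lists only which values database 910 holds, not how they are stored. The two helper methods compute the gaps used in steps 504a and 504b.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class QARecord:
    """One question-answer information set as kept in database 910 (names hypothetical)."""
    question_stored_at: datetime
    answer_stored_at: datetime
    questioner_ip: str
    answerer_ip: str
    adopted_at: Optional[datetime] = None  # None until the questioner adopts the answer

    def answer_delay(self) -> timedelta:
        """Gap between question and answer storage times (compared in step 504a)."""
        return self.answer_stored_at - self.question_stored_at

    def adoption_delay(self) -> Optional[timedelta]:
        """Gap between answer storage and adoption (compared in step 504b), if adopted."""
        if self.adopted_at is None:
            return None
        return self.adopted_at - self.answer_stored_at


row = QARecord(datetime(2006, 3, 2, 12, 0), datetime(2006, 3, 2, 12, 5),
               "203.0.113.7", "203.0.113.7")
print(row.answer_delay())  # 0:05:00
```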

The spam index table 920 may hold abusing elements associated with IP information, storage time information, and keywords, together with the spam index corresponding to each abusing element (see FIG. 3).

The keyword spam index table 930 may hold keywords associated with URLs, advertising keywords, adult keywords, and harmful syllables, together with keyword spam indexes assigned differentially to the URL, advertising keyword, adult keyword, and harmful syllable categories (see FIG. 7).

The time information checking unit 940 may check the storage time at which a question-answer information set, including question information and the answer information associated with it, is stored in a predetermined database. Specifically, the time information checking unit 940 may identify the storage time of the question information and the storage time of the answer information, or identify the adoption time at which the questioner who entered the question information adopted the answer information as the answer to the question.

If the difference between the storage time of the question information and the storage time of the answer information is smaller than predetermined reference registration time information, the spam index calculation unit 970 may calculate a spam index for the question-answer information set according to the corresponding abusing element by referring to the spam index table. The reference registration time may be set to the average time needed to read the question information and enter answer information, for example 10 minutes or 1 hour.

Alternatively, if the difference between the storage time of the answer information and the adoption time is smaller than predetermined response reflection time information, the spam index calculation unit 970 may calculate a spam index for the question-answer information set according to the corresponding abusing element by referring to the spam index table. The response reflection time may be set to the average time a questioner needs to read answer information and adopt it as the answer, for example one hour or one day.

The keyword checking unit 950 may check the keywords included in the question-answer information set. The keyword checking unit 950 may extract keywords from the question-answer information set and match them against the keywords stored in the keyword spam index table. The keyword checking unit 950 may also extract keywords from the set and identify whether any of the extracted keywords are repeated.

The spam index calculation unit 970 may refer to the keyword spam index table to identify the keyword spam index matching each extracted keyword, and calculate the spam index of the question-answer information set according to the identified keyword spam indexes. Because keyword spam indexes differ from keyword to keyword, a question-answer information set containing a keyword with a high keyword spam index also receives a high total spam index and is therefore more likely to be detected as a spam post.

Alternatively, if a repeated keyword exists in the question-answer information set, the spam index calculation unit 970 may calculate the spam index of the set according to the corresponding abusing element by referring to the spam index table.

The IP information checking unit 960 may identify the questioner IP information of the user who entered the question information and the answerer IP information of the user who entered the answer information, and refer to the database to check whether other question-answer information sets exist with the same combination of questioner and answerer IP information. If such sets exist, the spam index calculation unit 970 may calculate the spam index of the question-answer information set according to the corresponding abusing element by referring to the spam index table.

The IP information checking unit 960 may also check whether the questioner IP information and the answerer IP information of a question-answer information set are identical to each other. In doing so, the IP information checking unit 960 may check whether the two full IP addresses are identical or whether only their C-Class IP addresses are identical. The spam index calculation unit 970 may then calculate the spam index of the question-answer information set by referring to the spam index table, according to whether the full IP addresses or the C-Class IP addresses match.

The spam index calculation unit 970 may refer to the spam index table to calculate a first spam index associated with the storage time information, a second spam index associated with the keywords, and a third spam index associated with the IP information, and compute a total spam index as the sum of the first, second, and third spam indexes. If the total spam index is greater than a spam detection index, the spam index calculation unit 970 may detect the question-answer information set as a spam post. The spam detection index may be set to the minimum total spam index at which a set is detected as a spam post, and may be set by the operator of the spam index calculation system.
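The aggregation and detection rule can be written down directly. This sketch follows the text in summing the three component indexes and flagging a set only when the total is strictly greater than the spam detection index; the function names and the threshold of 200 in the usage lines are hypothetical, since the text leaves the threshold to the operator.

```python
def total_spam_index(time_index: int, keyword_index: int, ip_index: int) -> int:
    """Sum of the first (storage time), second (keyword) and third (IP) spam indexes."""
    return time_index + keyword_index + ip_index


def is_spam_post(time_index: int, keyword_index: int, ip_index: int,
                 spam_detection_index: int) -> bool:
    """Flag a set only when its total strictly exceeds the operator's threshold."""
    return total_spam_index(time_index, keyword_index, ip_index) > spam_detection_index


print(is_spam_post(80, 50, 150, spam_detection_index=200))  # True  (280 > 200)
print(is_spam_post(0, 0, 100, spam_detection_index=100))    # False (100 is not > 100)
```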

If the question-answer information set is detected as a spam post according to the calculated spam index, the post processing unit 980 may remove the set from the search list for a predetermined period and report it to the operator of the spam index calculation system 900. The predetermined period may be one day, one week, or one month, as determined by the operator.
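The behavior of the post processing unit can be sketched as a time-limited exclusion plus a report queue. Everything here is hypothetical: the class and method names, the string set identifiers, and the in-memory list that stands in for whatever channel actually notifies the operator. The one-week default is one of the example periods from the text.

```python
from datetime import datetime, timedelta
from typing import Dict, List


class PostProcessor:
    """Hypothetical sketch of post processing unit 980."""

    def __init__(self, exclusion_period: timedelta = timedelta(weeks=1)) -> None:
        self.exclusion_period = exclusion_period       # operator-chosen period
        self.excluded_until: Dict[str, datetime] = {}  # set id -> end of exclusion
        self.operator_reports: List[str] = []          # stand-in for a real report channel

    def handle_spam(self, qa_id: str, now: datetime) -> None:
        """Hide a detected spam set from search and queue it for the operator."""
        self.excluded_until[qa_id] = now + self.exclusion_period
        self.operator_reports.append(qa_id)

    def is_searchable(self, qa_id: str, now: datetime) -> bool:
        """A set is searchable unless it is still inside its exclusion window."""
        until = self.excluded_until.get(qa_id)
        return until is None or now >= until


processor = PostProcessor(timedelta(days=1))
t0 = datetime(2006, 3, 2)
processor.handle_spam("qa-1", t0)
print(processor.is_searchable("qa-1", t0 + timedelta(hours=12)))  # False
print(processor.is_searchable("qa-1", t0 + timedelta(days=2)))    # True
```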

The spam index calculation method according to the present invention may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the present invention, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. The medium may also be a transmission medium, such as an optical or metal wire or a waveguide, including a carrier wave that transmits signals specifying program instructions, data structures, and the like. Examples of program instructions include not only machine code such as that produced by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the present invention, and vice versa.

As described above, although the present invention has been described with reference to limited embodiments and drawings, the present invention is not limited to these embodiments, and those of ordinary skill in the art to which the present invention pertains can make various modifications and variations from this description.

Therefore, the scope of the present invention should not be limited to the described embodiments but should be defined by the claims below and their equivalents.

According to the present invention, the questioner and answerer IP information included in a question-answer information set is examined and a spam index is calculated according to the abusing elements associated with the IP information; the storage times of the question information and answer information are checked and a spam index is calculated according to the abusing elements associated with the storage times; and keywords are extracted from the question-answer information set, matched against the keywords stored in the keyword spam index table, and used to calculate a spam index according to the abusing elements associated with the keywords. If the sum of the spam indexes calculated for each abusing element is higher than the spam detection index, the question-answer information set is detected as a spam post, excluded from the search list for a certain period, and reported to the system operator. This provides an intelligent system that effectively finds malicious spam and handles it proactively.

In addition, according to the present invention, for question-answer information sets registered in real time, a spam index is calculated according to each abusing element of the set before it is posted to a web page, and the set is posted only if its spam index is low enough that it is not suspected of being spam. This improves the reliability and quality of search results.

In addition, according to the present invention, keyword spam indexes assigned differentially to each keyword are maintained in the keyword spam index table; keywords are extracted from a question-answer information set and matched against the table; and a spam index is calculated from the keyword spam indexes of the matched keywords. As a result, question-answer information sets containing keywords with high keyword spam indexes can easily be detected as spam posts and excluded from the search list, so that they are not exposed on web pages.

Claims

A spam index calculation method comprising:

a first step of checking the storage time information at which a question-answer information set, including question information and answer information associated with the question information, is stored in a database, and identifying an abusing element for the checked storage time information;

a second step of identifying a keyword included in the question-answer information set and identifying an abusing element for the identified keyword;

a third step of identifying IP information of the question-answer information set and identifying an abusing element for the identified IP information;

if an abusing element is identified in at least one of the first, second, and third steps, extracting the spam index corresponding to the identified abusing element from a spam index table; and

calculating a total spam index by summing the extracted spam indexes.

The method of claim 1, wherein the third step comprises:

identifying the questioner IP information of the questioner who entered the question information and the answerer IP information of the answerer who entered the answer information; and

if the identified questioner IP information and answerer IP information are the same, identifying, in the spam index table, the abusing element associated with identical questioner and answerer IPs as the abusing element.

The method of claim 1, wherein the third step comprises:

identifying the questioner IP information of the questioner who entered the question information and the answerer IP information of the answerer who entered the answer information that the questioner adopted from among the answer information; and

The method of claim 1, wherein the third step comprises:

identifying the questioner IP information of the questioner who entered the question information and the answerer IP information of the answerer who entered the answer information;

comparing the C-Class IP information of the identified questioner IP information with that of the answerer IP information; and

if the C-Class IP information is the same as a result of the comparison, identifying, in the spam index table, the abusing element related to the questioner and answerer sharing the same C-Class as the abusing element.

The method of claim 1, wherein the third step comprises:

The method of claim 1, wherein the first step comprises:

identifying the storage time information of the question information and the storage time information of the answer information stored in the database; and

if the difference between the identified storage time information of the question information and the storage time information of the answer information is smaller than reference registration time information, identifying, in the spam index table, the abusing element related to adopting answer information as the abusing element.

The method of claim 1, wherein the first step comprises:

identifying the storage time information of the question information stored in the database and the adoption time information of the answer information that the questioner who entered the question information adopted from among the answer information; and

if the difference between the identified storage time information of the question information and the adoption time information is smaller than response reflection time information, identifying, in the spam index table, the abusing element related to adopting the answer information as the abusing element.

The method of claim 1, wherein the first step comprises:

identifying the storage time information of the answer information stored in the database and the adoption time information of the answer information that the questioner who entered the question information adopted from among the answer information; and

if the difference between the identified storage time information of the answer information and the adoption time information is smaller than response reflection time information, identifying, in the spam index table, the abusing element related to adopting the answer information as the abusing element.

The method of claim 1, wherein the second step comprises:

extracting URL information from the question-answer information set; and

if the extracted URL information has been registered in advance as associated with spam, identifying, in the spam index table, the abusing element related to detection of a registered URL as the abusing element.

The method of claim 1, wherein the second step comprises:

identifying whether an advertising keyword is included in the question-answer information set; and

if an advertising keyword is identified, identifying, in the spam index table, the abusing element related to advertising keyword detection as the abusing element.

The method of claim 1, wherein the second step comprises:

identifying whether adult text is included in the question-answer information set; and

if adult text is identified, identifying, in the spam index table, the abusing element related to adult text detection as the abusing element.

The method of claim 1, wherein the second step comprises:

identifying whether any syllable contained in the question-answer information set occurs at or above a predetermined frequency; and

if such a syllable is identified, identifying, in the spam index table, an abusing element associated with a high frequency of a specific syllable as the abusing element.
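One way to read the syllable-frequency step, assuming that a "syllable" is one precomposed Hangul character (U+AC00 to U+D7A3) and that the "selected frequency" is an absolute occurrence count. Both readings, and the threshold value, are assumptions.

```python
from collections import Counter

# Hypothetical threshold standing in for the patent's "selected frequency".
SYLLABLE_FREQUENCY_THRESHOLD = 10


def is_hangul_syllable(ch: str) -> bool:
    """Precomposed Hangul syllables occupy U+AC00..U+D7A3."""
    return "\uac00" <= ch <= "\ud7a3"


def frequent_syllables(text: str, threshold: int = SYLLABLE_FREQUENCY_THRESHOLD) -> dict[str, int]:
    """Return Hangul syllables whose occurrence count reaches the threshold."""
    counts = Counter(ch for ch in text if is_hangul_syllable(ch))
    return {syl: n for syl, n in counts.items() if n >= threshold}


spammy = "대박" * 12 + " 질문입니다"
print(frequent_syllables(spammy))  # {'대': 12, '박': 12}
```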

The method of claim 1, wherein the second step comprises:

identifying whether any keyword contained in the question-answer information set occurs at or above a predetermined frequency; and

if such a keyword is identified, identifying, in the spam index table, an abusing element associated with duplicate keywords as the abusing element.
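The duplicate-keyword step can be sketched with a frequency count over tokens. Treating whitespace-separated tokens as keywords is a simplifying assumption; a Korean service would more likely extract nouns with a morphological analyzer. The threshold is hypothetical.

```python
from collections import Counter

# Hypothetical threshold standing in for the "selected frequency".
KEYWORD_FREQUENCY_THRESHOLD = 3


def duplicate_keywords(text: str, threshold: int = KEYWORD_FREQUENCY_THRESHOLD) -> list[str]:
    """Return keywords repeated at least `threshold` times.

    Assumption: keywords are whitespace-separated tokens with trailing
    punctuation stripped and case folded.
    """
    tokens = [tok.strip(".,!?").lower() for tok in text.split()]
    counts = Counter(tok for tok in tokens if tok)
    return [kw for kw, n in counts.items() if n >= threshold]


text = "Cheap loans! cheap loans, cheap LOANS. Ask me."
print(duplicate_keywords(text))  # ['cheap', 'loans']
```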

The method of claim 1, wherein the second step comprises:

maintaining a keyword spam index for each keyword in a keyword spam index table;

extracting keywords from the question-answer information set; and

identifying the keyword spam index matching each extracted keyword by referring to the keyword spam index table.
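The keyword spam index table behaves like a dictionary from keyword to index. The sketch below assumes keywords are already extracted; the table entries and values are hypothetical.

```python
# Hypothetical keyword spam index table: keyword -> keyword spam index.
KEYWORD_SPAM_INDEX_TABLE = {
    "casino": 40,
    "loan": 25,
    "diet": 10,
}


def keyword_spam_indices(keywords: list[str],
                         table: dict[str, int] = KEYWORD_SPAM_INDEX_TABLE) -> dict[str, int]:
    """Look up the keyword spam index of each extracted keyword present in the table."""
    return {kw: table[kw] for kw in keywords if kw in table}


extracted = ["casino", "weather", "loan"]
print(keyword_spam_indices(extracted))  # {'casino': 40, 'loan': 25}
```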

The method of claim 12, wherein the step of maintaining the keyword spam index in the keyword spam index table comprises:

assigning differentiated keyword spam indices to URLs, advertising keywords, adult keywords, and harmful syllables, respectively; and

storing each differentiated keyword spam index in the keyword spam index table in association with the corresponding URL, advertising keyword, adult keyword, or harmful syllable.
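Differentiated assignment can be modeled as one index per category, copied onto every term registered in that category. The category weights and the sample terms below are hypothetical; the claim states only that the four categories receive different indices.

```python
from enum import Enum


class Category(Enum):
    URL = "url"
    AD_KEYWORD = "ad_keyword"
    ADULT_KEYWORD = "adult_keyword"
    HARMFUL_SYLLABLE = "harmful_syllable"


# Hypothetical per-category indices; the patent only says they are differentiated.
CATEGORY_INDEX = {
    Category.URL: 50,
    Category.ADULT_KEYWORD: 40,
    Category.AD_KEYWORD: 20,
    Category.HARMFUL_SYLLABLE: 10,
}


def build_keyword_table(entries: dict[Category, list[str]]) -> dict[str, tuple[Category, int]]:
    """Associate each registered term with its category and that category's index."""
    table: dict[str, tuple[Category, int]] = {}
    for category, terms in entries.items():
        for term in terms:
            table[term] = (category, CATEGORY_INDEX[category])
    return table


table = build_keyword_table({
    Category.URL: ["spam.example.com"],
    Category.AD_KEYWORD: ["discount"],
    Category.ADULT_KEYWORD: ["xxx"],
    Category.HARMFUL_SYLLABLE: ["zz"],  # placeholder syllable
})
print(table["discount"][1])  # 20
```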

The method of claim 12, wherein the step of calculating the total spam index comprises:

calculating the total spam index taking the identified keyword spam index into account.
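A minimal reading of "taking the keyword spam index into account" is to add the matched keyword indices to the indices of the identified abusing elements. This additive combination is an assumption; a weighted sum would also fit the claim wording.

```python
def total_spam_index(element_indices: list[int], keyword_indices: list[int]) -> int:
    """Sum the indices of identified abusing elements and matched keywords.

    Assumption: "taking into account" is read as simple addition.
    """
    return sum(element_indices) + sum(keyword_indices)


# e.g. two abusing elements (30, 20) plus keywords 'casino' (40) and 'loan' (25)
print(total_spam_index([30, 20], [40, 25]))  # 115
```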

The method of claim 1, further comprising:

if the calculated total spam index is greater than a spam detection index, detecting the question-answer information set as a spam posting.
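The detection rule is a strict comparison against a threshold. The value of the spam detection index below is hypothetical.

```python
# Hypothetical value standing in for the "spam detection index".
SPAM_DETECTION_INDEX = 100


def is_spam_posting(total_index: int, threshold: int = SPAM_DETECTION_INDEX) -> bool:
    """Detect spam only when the total index strictly exceeds the threshold."""
    return total_index > threshold


print(is_spam_posting(115))  # True
print(is_spam_posting(100))  # False (equal is not "greater than")
```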

The method of claim 17, further comprising:

deleting the question-answer information set detected as a spam posting from the search list for a predetermined time; or

reporting the question-answer information set detected as a spam posting to a system operator.
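The two handling options could be sketched as below, modeling "deleting from the search list for a predetermined time" as hiding an entry until an expiry time, and reporting as appending to an operator queue. The class, the seven-day period, and the `report` flag are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Hypothetical hiding period standing in for the "predetermined time".
HIDE_PERIOD = timedelta(days=7)


@dataclass
class SearchList:
    """Tracks question-answer sets temporarily excluded from search results."""
    hidden_until: dict[int, datetime] = field(default_factory=dict)

    def hide(self, qa_id: int, now: datetime, period: timedelta = HIDE_PERIOD) -> None:
        self.hidden_until[qa_id] = now + period

    def is_visible(self, qa_id: int, now: datetime) -> bool:
        until = self.hidden_until.get(qa_id)
        return until is None or now >= until


def handle_spam(qa_id: int, search: SearchList, report_queue: list[int],
                now: datetime, report: bool) -> None:
    """Either report the detected spam set to an operator or hide it temporarily."""
    if report:
        report_queue.append(qa_id)
    else:
        search.hide(qa_id, now)


now = datetime(2006, 3, 2, 12, 0)
search, reports = SearchList(), []
handle_spam(42, search, reports, now, report=False)
print(search.is_visible(42, now + timedelta(days=1)))  # False
print(search.is_visible(42, now + timedelta(days=8)))  # True
```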

A computer-readable recording medium having recorded thereon a program for executing the method of any one of claims 1 to 18.

A spam index calculation system comprising:

a time information confirmation unit that confirms the storage time information at which a question-answer information set, including question information and answer information associated with the question information, is stored in a database;

a keyword verification unit that checks the keywords included in the question-answer information set;

an IP information verification unit that confirms the IP information of the question-answer information set; and

a spam index calculation unit that identifies an abusing element for each of the confirmed storage time information, keywords, and IP information and, when an abusing element is identified, extracts the spam index corresponding to that abusing element from a spam index table and sums the extracted spam indices to calculate a total spam index.
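The unit structure of the system can be illustrated with one function per unit feeding a calculation step that looks up and sums indices. For brevity each unit here both gathers its information and flags an element, whereas the claim assigns flagging to the calculation unit. The record type, rules, element names, and index values are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical spam index table mapping abusing elements to spam indices.
SPAM_INDEX_TABLE = {"quick_answer": 20, "ad_keyword": 15, "same_ip": 30}


@dataclass
class QASet:
    """Hypothetical record for one question-answer information set."""
    question_stored_at: datetime
    answer_stored_at: datetime
    text: str
    questioner_ip: str
    answerer_ip: str


def time_unit(qa: QASet) -> set[str]:
    """Time information: flag answers stored within 60 s of the question (assumed rule)."""
    if qa.answer_stored_at - qa.question_stored_at < timedelta(seconds=60):
        return {"quick_answer"}
    return set()


def keyword_unit(qa: QASet) -> set[str]:
    """Keywords: flag a single placeholder advertising keyword."""
    return {"ad_keyword"} if "discount" in qa.text.lower() else set()


def ip_unit(qa: QASet) -> set[str]:
    """IP information: flag identical questioner and answerer IPs."""
    return {"same_ip"} if qa.questioner_ip == qa.answerer_ip else set()


def spam_index_unit(qa: QASet, table: dict[str, int] = SPAM_INDEX_TABLE) -> int:
    """Look up each identified abusing element in the table and sum the indices."""
    elements = time_unit(qa) | keyword_unit(qa) | ip_unit(qa)
    return sum(table[e] for e in elements)


qa = QASet(
    question_stored_at=datetime(2006, 3, 2, 10, 0, 0),
    answer_stored_at=datetime(2006, 3, 2, 10, 0, 30),
    text="Big DISCOUNT here",
    questioner_ip="10.0.0.5",
    answerer_ip="10.0.0.5",
)
print(spam_index_unit(qa))  # 65
```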

The system of claim 20, wherein:

the IP information verification unit identifies the questioner IP information of the questioner who entered the question information and the answerer IP information of the answerer who entered the answer information; and

the spam index calculation unit, if the identified questioner IP information and answerer IP information are the same, identifies, in the spam index table, an abusing element associated with the questioner and the answerer having the same IP as the abusing element.
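The same-IP comparison is simple, but comparing parsed addresses rather than raw strings avoids missing matches caused by different textual forms of one address. Using Python's standard `ipaddress` module here is an illustrative choice, not something the patent specifies.

```python
import ipaddress


def same_ip(questioner_ip: str, answerer_ip: str) -> bool:
    """Compare parsed addresses so textual variants (e.g. IPv6 zero compression) still match."""
    return ipaddress.ip_address(questioner_ip) == ipaddress.ip_address(answerer_ip)


print(same_ip("2001:db8::1", "2001:0db8:0:0:0:0:0:1"))  # True
print(same_ip("192.0.2.10", "192.0.2.11"))              # False
```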

The system of claim 20,

The IP information verification unit,

Identifies the questioner IP information of the questioner who entered the question information and the answerer IP information of the answerer whose answer the questioner adopted from among the answer information,

The spam index calculation unit,

The system of claim 20,

The IP information verification unit,

The spam index calculation unit,

Compares the C-class portions of the identified questioner IP information and answerer IP information and, if they are the same, identifies, in the spam index table, an abusing element associated with the questioner and the answerer sharing the same C-class network as the abusing element.
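The C-class comparison can be sketched with the standard `ipaddress` module, assuming that "C-class IP information" means the first three octets of an IPv4 address (a /24 network), which is how the term is commonly used. Because one /24 block typically belongs to the same provider or site, a shared C-class suggests the questioner and answerer may be the same party.

```python
import ipaddress


def same_c_class(ip_a: str, ip_b: str) -> bool:
    """Return True if two IPv4 addresses share the same /24 ("C-class") network.

    Assumption: "C-class IP information" means the first three octets.
    """
    net_a = ipaddress.ip_network(f"{ip_a}/24", strict=False)
    net_b = ipaddress.ip_network(f"{ip_b}/24", strict=False)
    return net_a == net_b


print(same_c_class("203.0.113.7", "203.0.113.200"))  # True
print(same_c_class("203.0.113.7", "203.0.114.7"))    # False
```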

The system of claim 20,

The IP information verification unit,

The spam index calculation unit,

The system of claim 20, wherein:

the time information confirmation unit identifies the storage time information of the question information and the storage time information of the answer information stored in the database; and

the spam index calculation unit, when the difference between the identified storage time information of the question information and that of the answer information is smaller than reference registration time information, identifies, in the spam index table, an abusing element related to answer adoption as the abusing element.

The system of claim 20, wherein:

the time information confirmation unit identifies the storage time information of the question information stored in the database and the adoption time information of the answer information that the questioner who entered the question information adopted from among the answer information; and

the spam index calculation unit, when the difference between the identified storage time information of the question information and the adoption time information is smaller than the response reflecting time information, identifies, in the spam index table, an abusing element related to answer adoption as the abusing element.
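The two question-based timing rules in the preceding claims can be evaluated together. In this sketch both reference times are fixed `timedelta` thresholds, and the returned labels are hypothetical names for the abusing elements; the threshold values are assumptions.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical thresholds standing in for the patent's reference times.
REFERENCE_REGISTRATION_TIME = timedelta(minutes=1)
RESPONSE_REFLECTING_TIME = timedelta(minutes=5)


def time_abusing_elements(question_stored_at: datetime,
                          answer_stored_at: datetime,
                          adopted_at: Optional[datetime] = None) -> list[str]:
    """Return hypothetical labels for each time-gap rule that fires."""
    elements = []
    # Answer stored too soon after the question.
    if answer_stored_at - question_stored_at < REFERENCE_REGISTRATION_TIME:
        elements.append("answer_registered_quickly")
    # Answer adopted too soon after the question was stored.
    if adopted_at is not None and adopted_at - question_stored_at < RESPONSE_REFLECTING_TIME:
        elements.append("adopted_quickly_after_question")
    return elements


q = datetime(2006, 3, 2, 10, 0, 0)
print(time_abusing_elements(q, q + timedelta(seconds=20), q + timedelta(minutes=2)))
# ['answer_registered_quickly', 'adopted_quickly_after_question']
print(time_abusing_elements(q, q + timedelta(minutes=30)))  # []
```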

The system of claim 20, wherein:

the time information confirmation unit identifies the storage time information of the answer information stored in the database and the adoption time information of the answer information that the questioner who entered the question information adopted from among the answer information; and

the spam index calculation unit, when the difference between the identified storage time information of the answer information and the adoption time information is smaller than the response reflecting time information, identifies, in the spam index table, an abusing element related to answer adoption as the abusing element.

The system of claim 20, wherein:

the keyword verification unit extracts URL information from the question-answer information set; and

the spam index calculation unit, when the extracted URL information has been registered in advance as associated with spam, identifies, in the spam index table, an abusing element related to registered-URL detection as the abusing element.

The system of claim 20, wherein:

the keyword verification unit identifies whether the question-answer information set contains an advertising keyword; and

the spam index calculation unit, if an advertising keyword is identified, identifies, in the spam index table, an abusing element related to advertising keyword detection as the abusing element.

The system of claim 20, wherein:

the keyword verification unit identifies whether the question-answer information set contains adult text; and

the spam index calculation unit, if adult text is identified, identifies, in the spam index table, an abusing element related to adult text detection as the abusing element.

The system of claim 20, wherein:

the keyword verification unit identifies whether any syllable or keyword contained in the question-answer information set occurs at or above a predetermined frequency; and

the spam index calculation unit, if such a syllable or keyword is identified, identifies, in the spam index table, an abusing element associated with a high frequency of a specific syllable or with duplicate keywords as the abusing element.