KR101089338B1

KR101089338B1 - Method and apparatus for evaluation of original text in bulletin board

Info

Publication number: KR101089338B1
Application number: KR1020090104295A
Authority: KR
Inventors: 이용규
Original assignee: 동국대학교 산학협력단
Priority date: 2009-10-30
Filing date: 2009-10-30
Publication date: 2011-12-02
Also published as: KR20110047601A

Abstract

A method is provided for evaluating the original text of a post on a bulletin board, such as one on the Internet. Keywords are extracted from the text of the original post and vectorized. Keywords are extracted from the comments on the original post and vectorized for each comment. The similarity between the original-text keyword vector and each comment vector is compared. Comments in a predetermined low-similarity range are excluded, and the remaining comments are selected. Approval keywords and opposition keywords are extracted from the selected comments to obtain an approval score and an opposition score for each comment. A score is assigned to the original post based on the approval and opposition scores.

Description

Method and apparatus for evaluating the original text of the post {Method and apparatus for evaluation of original text in bulletin board}

Embodiments of the present invention relate to a method and apparatus for evaluating the original text of a post on the Internet or the like by analyzing the comments on that post.

In the era before the Internet was widely used, it was difficult even to obtain material on a particular subject. In the early days of the Internet, there were not many Internet users, so it was relatively easy for people with similar interests to gather in virtual spaces on the Internet. Although the amount of material was small, finding material on a particular field on the Internet could be considered relatively easy.

Today, by contrast, the attitudes and usage patterns of industries and individuals regarding the Internet have changed, and an incomparably vast amount of material is publicly available on the Internet.

Having so much material available is convenient, but it has also become harder to determine which posts among it are useful and which are not. In the past the scarcity of material was the problem; nowadays the key issue is separating the wheat from the chaff in a vast body of material. In other words, with the rapid development of the Internet and the growth in the volume of information, evaluating material has become important.

An embodiment of the present invention aims to provide a method and apparatus for evaluating the original text of a post published on the Internet from the comments on it.

According to an embodiment of the present invention, there is provided an original text evaluation method comprising: an original-text keyword vectorization step of extracting, by original-text keyword vectorization means, keywords from the text of the original post for a post on a telecommunication network and vectorizing them; a comment keyword vectorization step of extracting, by comment keyword vectorization means, keywords from n comments (n being a natural number of 1 or more) on the original post and vectorizing them for each comment; a vector similarity comparison step of comparing, by vector similarity comparison means, the similarity between the original-text keyword vector and the n comment keyword vectors; a comment selection step of selecting, by comment selection means, only the remaining comments after excluding comments in a predetermined low-similarity range according to the similarity ranking or similarity value; an approval and opposition score acquisition step of extracting, by approval and opposition score acquisition means, approval keywords and opposition keywords from the selected comments to obtain a score for each comment; and an original-text scoring step of assigning, by original-text scoring means, a score to the original post based on the approval and opposition scores.

According to an embodiment of the present invention, there is provided an original text evaluation apparatus comprising: original-text keyword vectorization means for extracting keywords from the text of the original post for a post on a telecommunication network and vectorizing them; comment keyword vectorization means for extracting keywords from n comments (n being a natural number of 1 or more) on the original post and vectorizing them for each comment; vector similarity comparison means for comparing the similarity between the original-text keyword vector and the n comment keyword vectors; comment selection means for selecting only the remaining comments after excluding comments in a predetermined low-similarity range according to the similarity ranking or similarity value; approval and opposition score acquisition means for extracting approval keywords and opposition keywords from the selected comments to obtain a score for each comment; and original-text scoring means for assigning a score to the original post based on the approval and opposition scores.

An embodiment of the present invention evaluates the comments on a post published on the Internet and aggregates the evaluations of the plurality of comments so that the original text is evaluated.

According to an embodiment of the present invention, the original text can be evaluated from ordinary comments alone, without adding a separate voting function to the bulletin board.

Furthermore, because a fine-grained evaluation can be made according to the degree of approval (favorable) or opposition (unfavorable) expressed regarding the original text, the evaluation can be more detailed than a voting function that records only one vote for or one vote against.

Hereinafter, specific embodiments of the present invention are described with reference to the drawings. These are only examples, and the present invention is not limited to them.

In describing the present invention, detailed descriptions of known technology related to the invention are omitted where they would unnecessarily obscure the gist of the invention. The terms used below are defined in consideration of their functions in the present invention and may vary according to the intention or custom of users or operators. Their definitions should therefore be based on the content of this specification as a whole.

The technical idea of the present invention is determined by the claims, and the following embodiments are merely a means of efficiently explaining that idea to those of ordinary skill in the art to which the invention pertains.

FIG. 1 is a flowchart of an original text evaluation method according to an embodiment of the present invention.

Referring to FIG. 1, in step 102, keywords and their frequencies are extracted from the text of a post written on an Internet bulletin board or the like. Text here means any information in character form and may refer to the title and body text of the original post; variations such as automatically recognizing characters in image files are also possible.

Keyword extraction can use a keyword dictionary and morphological analysis. For example, morphological analysis can remove the influence of unnecessary particles (Korean postpositions such as 은, 는, 이, 가). Not all text becomes keywords; morphological analysis can be used to find keyword candidates. A morpheme is the smallest unit of language that carries meaning, and morphological analysis is used for index-term extraction by identifying the content morphemes that carry substantive meaning. The parts of speech used as index terms are mainly nouns (or substantives), so morphological analysis identifies them and, where there is irregular conjugation, elision, or contraction, processes it to recover the base form. Programs that perform morphological analysis and keyword extraction automatically are in use.

In morphological analysis of Korean, for example, text of the form 'noun + particle' can be split into the noun and the particle, and keyword candidates are sought among the nouns with the particles removed. A keyword candidate is a word that has not yet been selected as a keyword and will undergo verification to become one.

Although the Internet was mentioned above, it is only an example and should be understood to mean any telecommunication network (electronic information and communication network) involving computers. It is therefore evident that the basic idea of the embodiments can be applied even to an intranet not connected to the outside, provided it is of sufficient scale. The present invention should accordingly be regarded as applicable to electronic information and communication networks in general, whether the communication is wired or wireless.

The keywords extracted from the post (original text), together with information on their frequencies where needed, are called keyword set A. The extracted keywords and/or frequencies are vectorized, and the result is called the original-text vector VA.

Next, in step 104, the comments on the original text of step 102 are analyzed. That is, keywords are extracted from the comments on the original post.

A comment, as described above, is a user's reaction after reading the original poster's text (the original text). There may of course be multiple comments, written by multiple users.

Morphological analysis can be used in the keyword extraction process, and a keyword dictionary may also be used. If there are n comments, the keyword sets of the comments can be called B1, B2, B3, ..., Bn. From each of these, the comment vectors VB1, VB2, VB3, ..., VBn are obtained.

When extracting keywords in step 102 and/or step 104, it is preferable to exclude banned words and stop words. Banned words are terms such as profanity and advertising phrases, which are to be excluded. A comment containing a banned word may be excluded from the evaluation or given the lowest score, and the IDs of users who habitually post profanity or advertisements may be managed separately so that their comments are excluded from the evaluation. Stop words are words with no value as index terms; they are defined in advance and excluded. Banned words and stop words may be removed after keyword extraction is complete or during extraction, but the latter is preferable for efficiency (efficient use of computing resources).

For example, terms used in profanity or advertising phrases can be maintained in a separate banned-word dictionary so that they are not selected as keyword candidates, and stop words with no value as index terms can likewise be maintained in a separate dictionary so that they are not selected as keyword candidates.
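
The extraction and filtering described for steps 102 and 104 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: whitespace tokenization stands in for a real morphological analyzer, and `STOP_WORDS`, `BANNED_WORDS`, and `keyword_vector` are hypothetical names with made-up contents.

```python
from collections import Counter

# Hypothetical dictionaries; a real system would maintain much larger lists.
STOP_WORDS = {"the", "a", "is", "of", "and", "to"}
BANNED_WORDS = {"spamlink"}

def keyword_vector(text):
    """Return a keyword -> frequency mapping, or None if a banned word appears."""
    # Assumption: simple whitespace splitting replaces morphological analysis.
    tokens = [t.strip(".,!?\"'()").lower() for t in text.split()]
    tokens = [t for t in tokens if t]
    if any(t in BANNED_WORDS for t in tokens):
        return None  # one option named in the text: drop the comment entirely
    # Stop words are filtered during extraction rather than afterwards.
    return Counter(t for t in tokens if t not in STOP_WORDS)

va = keyword_vector("The battery of the phone is great, and the battery lasts.")
```

Here `va` plays the role of the original-text vector VA; the same function applied to each comment would give VB1 through VBn.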

Next, in step 106, the similarity between the original-text vector and each comment vector is compared.

The original-text vector is VA obtained in step 102, and each comment vector is VBk (k being a natural number from 1 to n) obtained in step 104. That is, VA is compared with VB1, with VB2, with VB3, ..., and with VBn, and the similarity is calculated in each case.

Various techniques can be used to compare similarity. Representative examples are (1) keyword comparison under the Boolean model, (2) similarity calculation using the Euclidean distance formula, and (3) similarity calculation using the cosine formula. These are only examples; any method capable of comparing similarity may be used.

The keyword sets are converted into vectors and the search is performed on the vectors; however, conversion into vectors is not strictly required, and the following description should be regarded as one example.

Some of these techniques are described below.

Similarity measurement by Euclidean distance: the Euclidean distance formula gives the distance between two points in a multidimensional space. When the two points (vectors) are written (p1, p2, p3, p4, ...) and (q1, q2, q3, q4, ...), the Euclidean distance formula is as follows.
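
Assuming the standard definition is the one intended, the Euclidean distance between two m-dimensional vectors p and q can be written as:

```latex
d(p, q) = \sqrt{\sum_{i=1}^{m} (p_i - q_i)^2}
```

A smaller distance means the two vectors are closer. To turn it into a score where larger means more similar, one common convention (an assumption here, not stated in the source) is 1 / (1 + d(p, q)).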

Similarity measurement by the cosine formula is as follows.

The cosine formula computes the cosine of the angle between the vectors and uses it as the similarity. A larger cosine value indicates higher similarity.
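
As a sketch of the cosine measure, the similarity of VA and VB is (VA · VB) / (|VA| |VB|). The code below assumes vectors are stored as keyword-to-frequency mappings; the function name and sample vectors are illustrative only.

```python
import math

def cosine_similarity(va, vb):
    """Cosine of the angle between two keyword -> frequency mappings."""
    dot = sum(freq * vb.get(word, 0) for word, freq in va.items())
    norm_a = math.sqrt(sum(f * f for f in va.values()))
    norm_b = math.sqrt(sum(f * f for f in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an empty vector is treated as unrelated
    return dot / (norm_a * norm_b)

va = {"battery": 2, "phone": 1}    # original-text vector
vb1 = {"battery": 1, "price": 1}   # comment sharing one keyword
vb2 = {"weather": 3}               # comment sharing no keywords
```

With non-negative frequencies the result lies between 0 and 1, which matches the 0-to-1 similarity range used in the threshold example for step 110.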

Next, in step 108, the comments are sorted by their similarity to the original text, highest first (for example, in descending order), and each comment is given a rank.

That is, the similarities between VA and VB1, VA and VB2, VA and VB3, ..., and VA and VBn are ranked from the largest value to the smallest.

Next, in step 110, only comments whose similarity is at or above a preset value are selected. For example, suppose the similarities between VA and VB1, VA and VB2, VA and VB3, ..., VA and VBn are expressed as values between 0 and 1 (with values closer to 1 taken to indicate higher similarity) and are 0.4, 0.3, 0.1, 0.4, 0.7, 0.5, ..., 0.9. If the preset value (the similarity threshold) is 0.35, only comments with a similarity of 0.35 or more are ranked. Comments with low similarity have little to do with the content of the original text and can therefore be excluded from the subsequent evaluation. Hereafter, the term 'comment', when used in connection with evaluating comments and the original post, means a similar comment selected in this process.

Regarding steps 108 and 110: step 108 ranks comments by similarity and step 110 classifies them by a similarity threshold, but only one of these criteria may be used if desired. That is, embodiments are possible that compute the ranking and select only comments at or above a certain rank, or compute the similarity values and select only comments at or above a certain value.
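
The two selection criteria can be sketched as follows. The function names are hypothetical, and the sample list uses only the similarity values spelled out in the example (the elided middle values are simply left out).

```python
def select_by_threshold(similarities, threshold=0.35):
    """Keep 1-based comment numbers whose similarity is at or above the threshold."""
    return [i for i, s in enumerate(similarities, start=1) if s >= threshold]

def select_top_k(similarities, k):
    """Ranking-only variant: keep the k comment numbers with the highest similarity."""
    ranked = sorted(range(1, len(similarities) + 1),
                    key=lambda i: similarities[i - 1], reverse=True)
    return ranked[:k]

sims = [0.4, 0.3, 0.1, 0.4, 0.7, 0.5, 0.9]
kept = select_by_threshold(sims)   # comments at or above 0.35
top3 = select_top_k(sims, 3)       # three best-ranked comments
```

Either function alone is enough to implement the selection; combining them corresponds to running steps 108 and 110 together.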

Next, in step 112, the authors of the selected comments may be given an incentive score based on the similarity value. The incentive value may be the similarity value itself or a processed value (for example, the similarity multiplied by a predetermined weight). These scores may be accumulated per period or over all time and used to compute a commenter ranking, which then reflects how consistently a user writes substantive comments relevant to the original text.

For example, suppose that among the n comments, comments 1, 4, and 5 have similarities above the threshold, with values 0.4, 0.45, and 0.7. The similarity value s itself may be given as the incentive score, or the value multiplied by a predetermined weight w1, that is, s × w1. The weight w1 may be, for example, 0.1.

This evaluation may end with the evaluation of the comments on a single original post or, as described above, may be accumulated per user (distinguished by ID) and used as statistics. The accumulation can take various forms: a total for the post in question, a total over all posts on the bulletin board, a total over all posts on the board within a given period, and so on.
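
A minimal sketch of accumulating the incentive s × w1 per commenter, using the example values 0.4, 0.45, and 0.7 with w1 = 0.1. The author IDs and the function name are made up for illustration.

```python
from collections import defaultdict

def add_incentives(totals, comments, w1=0.1):
    """Add s * w1 to each author's running total; comments are (author_id, s) pairs."""
    for author_id, s in comments:
        totals[author_id] += s * w1
    return totals

totals = defaultdict(float)
add_incentives(totals, [("alice", 0.4), ("bob", 0.45), ("alice", 0.7)])

# Commenter ranking: highest accumulated incentive first.
ranking = sorted(totals, key=totals.get, reverse=True)
```

Keeping one `totals` mapping per period, per board, or over all time would give the different kinds of accumulation mentioned above.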

Next, in step 114, approval keywords and opposition keywords are extracted from the selected comments. The morphological analysis and keyword dictionary described above can be used. It is preferable to determine the frequency of the approval and opposition keywords as well, and to use an emphasis-word table and an antonym dictionary (both described later). The terms 'approval' and 'opposition' are not limited to their narrow senses. Approval is a generic term, in the broad sense, for expressions of sympathy with the original text, such as approval, agreement, support, endorsement, permission, consent, acceptance, liking, friendliness, and 'good'; opposition is likewise a generic term for the various expressions that run counter to the original text. The present invention can therefore be applied to various cases depending on how the meanings of approval and opposition are defined.

If a keyword's frequency exceeds a suitable limit (for example, 3), only the preset limit is counted. For example, a comment might contain the phrase 'agree agree agree agree agree agree'. If the score were multiplied by six because 'agree' appears six times, users would be tempted to repeat such words as many times as possible (100 or 1,000 times, say), which is undesirable. With the limit set to 3, the word 'agree' counts only up to three times even if it appears six times. This limits attempts to inflate approval or opposition scores by overusing keywords.
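
The frequency cap amounts to taking the smaller of the observed count and the limit. A short sketch, with a hypothetical function name and the limit of 3 from the example:

```python
def capped_count(tokens, keyword, cap=3):
    """Count occurrences of keyword, but never report more than cap."""
    return min(tokens.count(keyword), cap)

tokens = "agree agree agree agree agree agree".split()
count = capped_count(tokens, "agree")
```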

If necessary, emphasis modifiers can also be identified among the modifiers of approval and opposition keywords by morphological analysis. For example, in 'very much in favor', 'strongly agree', and 'actively support', the words 'very much', 'strongly', and 'actively' are emphasis modifiers. This makes it possible to treat 'in favor' and 'very much in favor' differently.

For this purpose an approval-word dictionary and an opposition-word dictionary can be used. The approval-word dictionary contains words used to express approval (for example, approve, agree, support, concur), and the opposition-word dictionary contains words used to express opposition (for example, oppose, reject). Emphasis words can be handled with an emphasis-word table having the fields (emphasis word, weight).

It is also preferable to use an antonym dictionary, which may consist of (word, antonym) fields. It is needed to interpret the phrase 'do not agree' as an expression contrary to agreement (that is, opposition); without it, morphological analysis alone might simply treat 'do not agree' as 'agree'. Likewise, the phrase 'do not oppose' can be interpreted as contrary to opposition (that is, agreement).

A synonym dictionary may be used instead of the antonym dictionary. The synonym dictionary may consist of (word, synonym) fields; for example, 'do not agree' can be registered as a synonym of 'oppose'. Entries in the antonym and synonym dictionaries need not be single words and may be compound words, phrases, or clauses made up of multiple words.

Through this process, an approval score and an opposition score are obtained for each comment.
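
One possible reading of the scoring in step 114 is sketched below. It is an illustration under assumptions, not the method of FIGS. 2 to 4: the word lists are invented, English tokens replace Korean morphemes, a negator immediately before a keyword stands in for the antonym dictionary, and an emphasis word immediately before a keyword supplies the weight.

```python
# Hypothetical word lists; the source only names the kinds of dictionaries.
APPROVAL = {"agree", "support", "approve"}
OPPOSITION = {"disagree", "oppose", "reject"}
EMPHASIS = {"strongly": 2.0, "very": 1.5}  # (emphasis word, weight) table
NEGATORS = {"not", "never"}                # stand-in for the antonym dictionary

def score_comment(text):
    """Return (approval_score, opposition_score) for one comment."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    approval = opposition = 0.0
    for i, tok in enumerate(tokens):
        if tok not in APPROVAL and tok not in OPPOSITION:
            continue
        prev = tokens[i - 1] if i > 0 else ""
        weight = EMPHASIS.get(prev, 1.0)  # 'strongly agree' counts more
        is_approval = tok in APPROVAL
        if prev in NEGATORS:              # 'not agree' flips to opposition
            is_approval = not is_approval
        if is_approval:
            approval += weight
        else:
            opposition += weight
    return approval, opposition
```

For instance, `score_comment("I do not agree")` yields an opposition score rather than an approval score, which is the behavior the antonym dictionary is meant to guarantee.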

The overall framework for performing step 114 (the use of the approval-word dictionary, opposition-word dictionary, emphasis-word table, and antonym dictionary) has been described; the concrete method can take many forms. Specific ways of performing step 114 are described in more detail later with reference to FIGS. 2, 3, and 4.

Next, in step 116, various measures for the original post can be obtained from the approval and opposition scores of the comments obtained in the previous step. These measures may be any values derived from the figures computed in the preceding steps; examples follow. They are only examples: items not defined below may be defined and used, and the definitions below may be modified. In the following measures, 'comment' means a similar comment selected for evaluation as described above.

* Number of approving comments: the number of comments whose approval score exceeds their opposition score by at least a set value (e1 ≥ 0)

* Number of opposing comments: the number of comments whose opposition score exceeds their approval score by at least a set value (e1 ≥ 0)

* Approving comment ranking (by score)

* Opposing comment ranking (by score)

* Approval rate: the proportion of approving comments among the comments

* Opposition rate: the proportion of opposing comments among the comments

* Abstention rate: 1 - approval rate - opposition rate

* Total approval score: the sum of the approval scores of the comments

* Total opposition score: the sum of the opposition scores of the comments

* Average approval score of the approving comments: Aavg

* Average opposition score of the opposing comments: Davg

* Sum over all comments of (approval score - opposition score): Ptotal

→ If Ptotal is at or above a set value (e2 ≥ 0), the post is judged to have more approval than opposition.

* Average over all comments of (approval score - opposition score): Pavg

→ If Pavg is at or above a set value (e3 ≥ 0), the post is judged to have more approval than opposition.

* Pavg weighted by the number of approving comments: Pavg × number of approving comments × c1

→ Here c1 is an adjustment coefficient, and different weights may be assigned according to the number of approving comments. For reference, if c1 is 1, this equals Ptotal.

* Ptotal × c2

→ Here c2 is an adjustment coefficient, and different weights may be assigned according to the Ptotal score band. For reference, if c2 is 1, this equals Ptotal.
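
Several of the measures listed above can be computed together, as in the following sketch. The function and key names are hypothetical. One assumption goes beyond the source: a comment whose approval and opposition scores are equal is counted as neither approving nor opposing, so that the abstention rate stays non-negative even when e1 is 0.

```python
def post_metrics(scores, e1=0.0):
    """scores: list of (approval, opposition) pairs for the selected comments."""
    n = len(scores)
    if n == 0:
        return None
    # Assumption: ties (a == d) count as abstentions, even when e1 == 0.
    pros = [a for a, d in scores if a > d and a - d >= e1]
    cons = [d for a, d in scores if d > a and d - a >= e1]
    p_total = sum(a - d for a, d in scores)
    return {
        "n_pro": len(pros),
        "n_con": len(cons),
        "approval_rate": len(pros) / n,
        "opposition_rate": len(cons) / n,
        "abstention_rate": 1 - len(pros) / n - len(cons) / n,
        "approval_total": sum(a for a, _ in scores),
        "opposition_total": sum(d for _, d in scores),
        "a_avg": sum(pros) / len(pros) if pros else 0.0,
        "d_avg": sum(cons) / len(cons) if cons else 0.0,
        "p_total": p_total,
        "p_avg": p_total / n,
    }

m = post_metrics([(2.0, 0.0), (0.0, 1.0), (1.0, 1.0), (3.0, 1.0)])
```

In this sample, two comments approve, one opposes, and one abstains, so Ptotal is 3.0 and Pavg is 0.75.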

Next, in step 118, suitable measures (for example, Pavg and Ptotal) can be selected from those of step 116 and assigned as the score of the original post.

Although suitable measures are described as being selected from those of step 116, this should be understood to include computing only the necessary measures in step 116 and then using them in step 118.

Next, in step 120, posts can be ranked by score. That is, the posts are sorted in descending order of the chosen measure and given ranks. For example, if Pavg is used, posts are sorted in descending order of the average of each comment's approval score minus its opposition score.
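
Ranking posts by a chosen measure such as Pavg is a descending sort. A brief sketch with made-up post IDs and scores:

```python
def rank_posts(post_scores):
    """Return (rank, post_id, score) tuples, highest score first."""
    ordered = sorted(post_scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(rank, pid, score) for rank, (pid, score) in enumerate(ordered, start=1)]

ranking = rank_posts({"post-a": 0.75, "post-b": 1.2, "post-c": -0.4})
```

Posts with equal scores keep their original relative order here because Python's sort is stable; a real system might prefer an explicit tie-breaking rule.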

Next, in step 122, an incentive can be given to the poster of the original text according to the score assigned to it. The incentive score may be, for example, the assigned score (Pavg) or a processed value (Pavg × w2, where w2 is an adjustment coefficient). These scores may be accumulated per period or over all time to compute a poster ranking, which reflects how much agreement the poster has received from comments.

In practicing the present invention, the rankings of steps 108 and 120 may be unnecessary. If the comment similarity ranking and/or the post ranking is not needed and only the comment similarity values and/or post score values are required, the ranking in steps 108 and 120 may be omitted and the invention practiced with the values alone.

FIG. 2 illustrates one example of an implementation of step 114 of FIG. 1 described above.

As described above, step 114 is the process of obtaining a score for each comment, and this process may be implemented in various ways. One embodiment among them is described below.

To summarize step 114 of FIG. 1: using morphological analysis together with an approval-word dictionary and an opposition-word dictionary, approval keywords and opposition keywords are extracted from the selected comments (and, if necessary, emphasis modifiers among the modifiers of those keywords are additionally extracted), and a score is obtained for each comment accordingly.

As a specific implementation, in step 114-2, each entry of the approval-word dictionary consists of the field (approval word), and each entry of the opposition-word dictionary consists of the field (opposition word). This is the simplest form of the approval-word and opposition-word dictionaries.

In step 114-2, the approval-word dictionary and the opposition-word dictionary must be prepared in advance. An antonym (or synonym) dictionary must also be prepared in advance. As described above, the antonym dictionary serves to correctly recognize an expression such as "do not agree" as meaning "oppose". Whereas the antonym dictionary has the fields (word, antonym), the synonym dictionary has the fields (word, synonym), so the same effect can be obtained using either one. For convenience, the description below refers to the antonym dictionary, but a synonym dictionary may of course be used instead.

Next, in step 114-4, the number of approval words and the number of opposition words in each comment are determined by morphological analysis. In step 114-6, if the number of approval words in a given comment exceeds the number of opposition words, the comment is regarded as approving; in the opposite case, the comment is regarded as opposing.

In this way, a comment that uses more approval keywords than opposition keywords is regarded as approving, and a comment that uses more opposition keywords than approval keywords is regarded as opposing. A comment that uses no approval or opposition keywords at all may be treated as an abstention.

In other words, this method has the effect of each comment casting one vote. From this, the number of approvals, the number of oppositions, the approval rate (the proportion of approving comments among the similar comments), the opposition rate (the proportion of opposing comments among the similar comments), and the abstention rate can be calculated. This method can be likened to carrying out a conventional vote automatically. It is not exactly the same, however, because a conventional vote requires voting functionality to be implemented in the bulletin board.
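The one-vote-per-comment scheme of FIG. 2 might be sketched as follows, assuming the keyword counts per comment have already been obtained by morphological analysis. The function names are illustrative only. The specification does not say how a comment with equal, non-zero counts is treated; this sketch assumes it abstains.

```python
def classify(n_approval, n_opposition):
    """Step 114-6: one vote per comment from its keyword counts."""
    if n_approval > n_opposition:
        return "approve"
    if n_opposition > n_approval:
        return "oppose"
    # No keywords at all is an abstention per the text; treating a
    # non-zero tie as an abstention too is an assumption of this sketch.
    return "abstain"


def tally(keyword_counts):
    """Return vote counts and rates over (n_approval, n_opposition) pairs."""
    votes = [classify(p, c) for p, c in keyword_counts]
    n = len(votes)
    counts = {v: votes.count(v) for v in ("approve", "oppose", "abstain")}
    rates = {v: (counts[v] / n if n else 0.0) for v in counts}
    return counts, rates


# Hypothetical counts for four selected comments
counts, rates = tally([(2, 0), (0, 1), (0, 0), (1, 1)])
print(counts, rates)
```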

FIG. 3 illustrates another example of an implementation of step 114 of FIG. 1 described above.

As described above, step 114 is the process of obtaining a score for each comment, and this process may be implemented in various ways. An embodiment different from the one described with reference to FIG. 2 is described below.

To summarize step 114 of FIG. 1: using morphological analysis together with an approval-word dictionary and an opposition-word dictionary, approval keywords and opposition keywords are extracted from the selected comments (and, if necessary, emphasis modifiers among the modifiers of those keywords are additionally extracted), and a score is obtained for each comment accordingly. An antonym dictionary is also used here.

As a specific implementation, first, in step 114'-2, an approval-word dictionary consisting of (approval word, approval score) fields and an opposition-word dictionary consisting of (opposition word, opposition score) fields are used. Each score takes a value between 0 and 1. For example, 'fine' may be given 0.6 points, 'approve' 0.8 points, and 'support' 1.0 point. These values may be set according to the strength of each word's dictionary meaning.

That is, step 114'-2 uses an approval-word dictionary and an opposition-word dictionary; for example, the approval-word dictionary may consist of (approval word, approval score) fields and the opposition-word dictionary of (opposition word, opposition score) fields. Since populating these fields (that is, compiling the dictionaries) is part of the basic setup, it is preferably done in advance.

It is also preferable to prepare in advance, and use together, an emphasis-word table consisting of (emphasis word, weight) fields. The antonym dictionary is likewise prepared in advance and used in step 114'-2. As described above, the antonym dictionary serves to correctly recognize "do not agree" as meaning "oppose"; that is, the antonym dictionary can be used to convert 'do not agree' into the keyword 'oppose'. Whereas the antonym dictionary has the fields (word, antonym), the synonym dictionary has the fields (word, synonym), so the same effect can be obtained using either one.

Next, in step 114'-4, the keywords and emphasis words of each comment are extracted together. The keywords include approval words and opposition words, so what is extracted may be approval words, opposition words, other keywords, and emphasis words. During or after keyword extraction, keywords are converted using the antonym dictionary or the synonym dictionary. For example, 'do not agree' is converted into the keyword 'oppose'. Since emphasis modifiers may be present here as well, 'strongly do not agree' can be converted into 'strongly oppose'.
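One possible way to picture the keyword conversion is the sketch below. It assumes, purely for illustration, that morphological analysis yields a token list in which a negation marker (here the token "not") directly precedes the negated keyword; the actual token layout for Korean text would differ. The function name and the sample dictionary are hypothetical.

```python
def apply_antonyms(tokens, antonyms, negations=frozenset({"not"})):
    """Replace 'negation + word' with the word's antonym, keeping other tokens."""
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok in negations and nxt in antonyms:
            out.append(antonyms[nxt])  # e.g. "not agree" -> "oppose"
            i += 2
        else:
            out.append(tok)
            i += 1
    return out


# Hypothetical antonym dictionary with (word, antonym) entries
ANTONYMS = {"agree": "oppose", "oppose": "agree"}
print(apply_antonyms(["strongly", "not", "agree"], ANTONYMS))
```

The emphasis word is left in place, so it can still be paired with the converted keyword when weights are applied.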

Of these, approval words and emphasis words are discussed here. For approval words, if there is no emphasis word, the approval word is extracted alone; if there is one, the emphasis word plus the approval word is extracted. An approval word alone is given its predetermined approval-word score, while an approval word with an emphasis word (that is, when both are extracted) may be given a weighted score. Summing all of these gives the total approval-word score. The same procedure can of course be applied to opposition words.

To describe weighting by emphasis words in more detail, the weighted score is calculated using the emphasis-word table consisting of (emphasis word, weight) fields. Its purpose is to weight scores according to emphasis words. For example, 'absolutely' may have a weight of 1.0, 'very' a weight of 0.6, and 'slightly' a weight of 0.3.

The specific calculation may use the following formula:

[word score + {word score × (weight - w3)}] / (maximum word score)

w3 is an adjustment coefficient with a value between 0 and 1, for example 0.5. The reason for using 'weight - w3' (for example, with w3 = 0.5) is so that emphasis words that weaken the meaning, such as 'slightly' or 'a little', reduce the score. The 'maximum word score' in the denominator of Equation 3 is the maximum value that the numerator of the formula can take, so the result of the formula is normalized to a value between 0 and 1. Variations of the formula are also possible. For example, emphasis words that weaken the meaning may be given a weight that is a negative integer. In that case the formula may be modified to "[word score + {word score × weight}] / (maximum word score)".
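The formula can be tried out with the short sketch below. It assumes that word scores and weights both have a maximum of 1.0, so that the largest possible numerator is 1.0 + 1.0 × (1.0 - w3); the specification only states that the denominator is the maximum of the numerator. The function name and default values are illustrative.

```python
def emphasized_score(word_score, weight, w3=0.5,
                     max_word_score=1.0, max_weight=1.0):
    """[score + score * (weight - w3)] / (largest possible numerator)."""
    # Assumption: the maximum numerator is reached at the maximum word
    # score combined with the maximum weight.
    max_numerator = max_word_score + max_word_score * (max_weight - w3)
    numerator = word_score + word_score * (weight - w3)
    return numerator / max_numerator


# 'approve' (0.8) with 'very' (0.6) and with 'slightly' (0.3), w3 = 0.5
print(emphasized_score(0.8, 0.6))
print(emphasized_score(0.8, 0.3))
```

With w3 = 0.5, a weight above 0.5 raises the numerator above the plain word score and a weight below 0.5 lowers it, which is the effect the text describes for weakening words.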

When weights are applied, the total approval-word score (Ta) of each comment is the sum of all approval-word scores in that comment (including weights), and the total opposition-word score (Td) is the sum of all opposition-word scores in that comment (including weights). That is, over all occurrences of approval words in the comment, the total approval-word score is the sum of the 'approval-word score' of each approval word and the 'approval-word score weighted by the emphasis word' of each approval word combined with an emphasis word. Likewise, over all occurrences of opposition words in the comment, the total opposition-word score is the sum of the 'opposition-word score' of each opposition word and the 'opposition-word score weighted by the emphasis word' of each opposition word combined with an emphasis word.

Next, in step 114'-6, the total approval-word score and the total opposition-word score of each comment are normalized. That is, each may be converted to a value between 0 (low) and 1 (high).

For example, this can be done with the following formulas, where NTa denotes the normalized total approval-word score and NTd the normalized total opposition-word score.

NTa = Ta / (maximum total approval-word score)

NTd = Td / (maximum total opposition-word score)

Here, the maximum total approval-word score is the largest total approval-word score that can be assigned to a comment, and the maximum total opposition-word score is the largest total opposition-word score that can be assigned to a comment.
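A minimal sketch of the totals and their normalization follows. How the maximum totals are determined is not specified, so this sketch simply takes them as parameters supplied by the caller; the function name is hypothetical.

```python
def normalized_totals(approval_scores, opposition_scores,
                      max_approval_total, max_opposition_total):
    """Return (NTa, NTd) = (Ta / max approval total, Td / max opposition total)."""
    if max_approval_total <= 0 or max_opposition_total <= 0:
        raise ValueError("maximum totals must be positive")
    ta = sum(approval_scores)    # Ta: all approval-word scores in the comment
    td = sum(opposition_scores)  # Td: all opposition-word scores in the comment
    return ta / max_approval_total, td / max_opposition_total


# Hypothetical per-occurrence scores for one comment
nta, ntd = normalized_totals([0.5, 0.25], [0.25], 2.0, 1.0)
print(nta, ntd)
```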

FIG. 4 illustrates yet another example of an implementation of step 114 of FIG. 1 described above.

As described above, step 114 is the process of obtaining a score for each comment, and this process may be implemented in various ways. An embodiment different from those described with reference to FIGS. 2 and 3 is described below.

To summarize step 114 of FIG. 1: using morphological analysis together with an approval-word dictionary and an opposition-word dictionary, approval keywords and opposition keywords are extracted from the selected comments (and, if necessary, emphasis modifiers among the modifiers of those keywords are additionally extracted), and a score is obtained for each comment accordingly. In this process, keywords may also be converted using an antonym dictionary or a synonym dictionary.

As a specific implementation, in step 114"-2, an approval-word dictionary consisting of (approval word, emphasis word, approval score) fields and an opposition-word dictionary consisting of (opposition word, emphasis word, opposition score) fields are used. This differs from the embodiment of FIG. 3 in that the emphasis word is already included in the fields of the approval-word and opposition-word dictionaries. An antonym dictionary consisting of (word, antonym) fields or a synonym dictionary consisting of (word, synonym) fields is also used. The basic setup of the approval-word and opposition-word dictionaries need not be performed at the time of step 114"-2; rather, since populating the fields (that is, compiling the dictionaries) is part of the basic setup, it is preferably done in advance. The antonym dictionary (or synonym dictionary) must also be prepared in advance.

The approval score and the opposition score stored in these dictionaries are the scores assigned when the word is used together with the emphasis word. When no emphasis word is used, the word's original score is assigned. Each score takes a value between 0 and 1.
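The dictionary layout of FIG. 4 could be pictured as a lookup keyed by (word, emphasis word), as in the sketch below. The entries and scores are invented for illustration, and falling back to the plain word score when a particular word and emphasis-word pair is not listed is an assumption of this sketch.

```python
# Hypothetical approval-word dictionary: (approval word, emphasis word) -> score.
# An emphasis word of None holds the word's original score.
APPROVAL_DICT = {
    ("approve", None): 0.8,
    ("approve", "very"): 0.9,
    ("approve", "slightly"): 0.6,
}


def lookup(dictionary, word, emphasis=None):
    """Score for a word, using the combined entry when one exists."""
    if (word, emphasis) in dictionary:
        return dictionary[(word, emphasis)]
    # Assumption: unlisted combinations fall back to the original word
    # score; words absent from the dictionary score 0.
    return dictionary.get((word, None), 0.0)


print(lookup(APPROVAL_DICT, "approve", "very"))
print(lookup(APPROVAL_DICT, "approve"))
```

Compared with the separate emphasis-word table of FIG. 3, this trades a larger dictionary for a scoring step that needs no weight formula at lookup time.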

Next, step 114"-4 is performed, which may be carried out in a manner similar or identical to step 114'-6 described above with reference to FIG. 3.

In short, the embodiments of FIG. 4 and FIG. 3 differ in whether the emphasis word is included as a field of the approval-word and opposition-word dictionaries or a separate emphasis-word table is created.

FIG. 5 briefly illustrates the method described above using an example of an actual post.

Referring to FIG. 5, the original text of the post appears in the box, and ten user comments are attached to it. In the example of FIG. 5, each comment shows the ID of the person who wrote it, so the authors can be distinguished. For example, it can be seen that comment 4 and comment 6 were written by the same person.

In the example of FIG. 5, each comment consists of one or two short sentences, but a full-length reply to the original text (for example, 10 to 20 sentences in a single comment) may also be written. FIG. 5 uses only a short original text and short comments to keep the explanation simple.

Let us briefly analyze the example of FIG. 5 according to the steps of FIG. 1.

In step 102, the vector of the original text may be chosen as (organism, animal, plant, seed plant, gymnosperm, angiosperm, monocotyledon, dicotyledon) or the like.

Next, in step 104, the vectors of comments 1 to 10 (VB1 to VB10, respectively) may be determined.

The vectors obtained in steps 102 and 104 may be the result of having already applied morphological analysis and banned-word/stop-word filtering.

In step 106, the original-text vector (VA) is compared with each of the comment vectors (VB1 to VB10).

In step 108, the comments are ranked by similarity. For example, suppose the ranking is comment 5, comment 2, comment 8, comment 9, comment 1, comment 10, comment 6, comment 3, comment 4, in that order, and comment 7 is excluded from the ranking.

For example, comments 3 and 4 express approval or opposition but contain no words related to the keywords of the original text, which explains their low ranking. In operating the system, however, expressions of approval or opposition could also be registered as related words. Comment 7 can be regarded as excluded from the ranking because it contains many banned words.

In step 110, comments 3 and 4, which have low similarity rankings, are excluded according to a preset criterion. Comment 7 was taken to have been excluded before similarity was assessed because it contains many banned words; but even if it had not been excluded for that reason, it would most likely have been excluded in the end for low similarity.
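Steps 106 to 110 could be sketched as follows. This passage does not name a similarity measure, so cosine similarity over keyword-count vectors is assumed here for illustration, as is the use of a numeric threshold as the preset criterion. All names and sample keywords are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two keyword-count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_comments(original_keywords, comments, min_similarity):
    """Rank comments by similarity to the original text and drop low ones."""
    va = Counter(original_keywords)
    scored = [(cosine(va, Counter(kw)), cid) for cid, kw in comments.items()]
    scored.sort(reverse=True)  # step 108: highest similarity first
    return [cid for sim, cid in scored if sim >= min_similarity]  # step 110


original = ["plant", "seed", "animal"]
comments = {"c1": ["plant", "seed"], "c2": ["agree"], "c3": ["plant"]}
print(select_comments(original, comments, 0.1))
```

Here "c2" plays the role of comments 3 and 4 in the example: it states an opinion but shares no keyword with the original text, so it is dropped.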

In step 112, incentive scores are given to the authors of the comments. These scores may be accumulated and recorded per user.

In step 114, approval-related and opposition-related scores are calculated appropriately for the highly ranked comments 5, 2, 8, 9, 1, 10, and 6. Step 114 may be performed in various ways, for example in the ways shown in FIGS. 2 to 4.

For example, 'agree' appears four times in comment 2, but it may be scored as if it appeared only once or twice. The same applies to 'oppose' appearing five times in comment 10. Recognizing 'do not agree' in comment 6 as meaning opposition rather than agreement is made possible by the antonym dictionary described above. In comment 8, 'very, very' may be treated as an emphasis word and given a weight. Depending on the embodiment, emphasis words may of course be ignored.
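Limiting the effect of a repeated keyword, as suggested for comments 2 and 10, might look like the sketch below. The cap of 2 and the function name are assumptions made for illustration.

```python
from collections import Counter


def capped_counts(tokens, keywords, cap=2):
    """Count keyword occurrences, counting each keyword at most `cap` times."""
    counts = Counter(t for t in tokens if t in keywords)
    return {word: min(n, cap) for word, n in counts.items()}


# 'agree' written four times still counts only twice
tokens = ["agree", "agree", "agree", "agree", "plant"]
print(capped_counts(tokens, {"agree", "oppose"}))
```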

In steps 116 and 118, measures for the original text are obtained from the approval and opposition scores of the comments, and a score is assigned to the original text accordingly. In step 120, the post (original text) is ranked. FIG. 5 shows only one original text (the text in the box), but suppose there are other posts. For example, if there are five other original texts, their scores can be compared, and the original text with the highest score receives the highest ranking.

In step 122, an incentive score may be given to the author of the original text of FIG. 5.

Although the example of FIG. 5 uses simple sentences, longer sentences and a larger number of samples will allow finer classification and more accurate evaluation.

FIG. 6 shows an apparatus according to an embodiment of the present invention.

In FIG. 6, the apparatus 600 of the present invention is connected to a telecommunication network (electronic information and communication network), typified by the Internet. The telecommunication network may be the Internet or an intranet, and may be wired or wireless and of any scale.

The apparatus 600 of the present invention includes original-text keyword vectorizing means 602, comment keyword vectorizing means 604, vector similarity comparing means 606, comment selecting means 608, approval/opposition score obtaining means 610, original-text scoring means 612, and the like.

The original-text keyword vectorizing means 602 extracts keywords from the text of the original text of a post in the telecommunication network and vectorizes them.

The comment keyword vectorizing means 604 extracts keywords from n comments (n being a natural number of 1 or more) on the original text of the post and vectorizes them for each comment.

The vector similarity comparing means 606 compares the similarity between the original-text keyword vector and each of the n comment keyword vectors.

The comment selecting means 608 excludes comments within a predetermined range of low similarity, according to the ranking or value of the compared similarity, and selects only the remaining comments.

The approval/opposition score obtaining means 610 extracts approval keywords and opposition keywords from the selected comments and obtains a score for each comment.

The original-text scoring means 612 assigns a score to the original text of the post based on the approval score and the opposition score.

The above description has been given by way of example. For instance, although the Internet was mentioned, this is only an example and should be understood to mean any telecommunication network involving computers. Accordingly, it is evident that the basic idea of the embodiments of the present invention can be applied even to an intranet with no external connection, provided it is of sufficient scale. The present invention should therefore be regarded as applicable to electronic information and communication networks in general, by electronic means. It applies, of course, whether the communication is wired or wireless.

Also, it was stated that, for example, an antonym dictionary is used so that an expression of opposition is not mistaken for an expression of agreement, and an expression of agreement is not mistaken for an expression of opposition; it is evident that the same effect can be obtained with a synonym dictionary. That is, even though only the antonym dictionary is mentioned in the claims, a synonym dictionary clearly embodies the same technical idea.

Meanwhile, embodiments of the present invention may include a computer-readable recording medium containing a program for performing the methods described in this specification on a computer. The computer-readable recording medium may include program instructions, local data files, local data structures, and the like, alone or in combination. The medium may be specially designed and constructed for the present invention, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like.

Although the present invention has been described in detail above through representative embodiments, those of ordinary skill in the art to which the present invention pertains will understand that various modifications can be made to the embodiments described above without departing from the scope of the present invention.

Therefore, the scope of rights of the present invention should not be limited to the described embodiments, but should be determined by the claims set forth below and their equivalents.

Claims

An original-text keyword vectorizing step of extracting, by original-text keyword vectorizing means, keywords from the text of the original text of a post in a telecommunication network and vectorizing them;

A comment keyword vectorizing step of extracting, by comment keyword vectorizing means, keywords from n comments (n being a natural number of 1 or more) on the original text and vectorizing them for each comment;

A vector similarity comparison step of comparing, by vector similarity comparing means, the similarity between the original-text keyword vector and each of the n comment keyword vectors;

A comment selecting step of selecting, by comment selecting means, comments within a predetermined range of high similarity according to the ranking or value of the compared similarity;

An approval/opposition score obtaining step of extracting, by approval/opposition score obtaining means, approval keywords and opposition keywords from the selected comments and calculating a score for each comment according to the numbers of approval keywords and opposition keywords, thereby obtaining an approval score and an opposition score; and

An original-text scoring step of assigning, by original-text scoring means, a score to the original text of the post based on the approval score and the opposition score.

The method of claim 1,

wherein the comment selecting step

ranks the comments according to their similarity and excludes comments with a low similarity ranking.

The method of claim 1,

wherein the comment selecting step

excludes, according to the similarity values, comments with a low similarity value.

The method according to claim 2 or 3,

wherein the comment selecting step additionally

gives a predetermined incentive score to the author of a comment based on the similarity value or ranking.

The method of claim 1,

further comprising, between the approval/opposition score obtaining step and the original-text scoring step,

a step of obtaining measures for the original text from the approval score and opposition score of each comment,

wherein the measures are one or more of:

* The number of approving comments (i.e., the number of comments whose approval score exceeds the opposition score by a predetermined value (e1) or more)

* The number of opposing comments (i.e., the number of comments whose opposition score exceeds the approval score by a predetermined value (e1) or more)

* The ranking of approving comments (in order of score)

* The ranking of opposing comments (in order of score)

* The approval rate (i.e., the proportion of approving comments among the comments)

* The opposition rate (i.e., the proportion of opposing comments among the comments)

* The abstention rate (i.e., 1 - approval rate - opposition rate)

* The total approval score (i.e., the sum of the approval scores of the comments)

* The total opposition score (i.e., the sum of the opposition scores of the comments)

* The average approval score of the approving comments

* The average opposition score of the opposing comments

* The total sum of (approval score - opposition score) over the comments

* The overall average of (approval score - opposition score) over the comments

* A value obtained by weighting the overall average (Pavg) of each comment's (approval score - opposition score) by the numbers of approvals and oppositions

and wherein, in the original-text scoring step, the score is assigned to the original text using the measures.

The method according to any one of claims 1 to 3,

further comprising, after the original-text scoring step:

ranking the posts by the scores of their original texts; and

giving a predetermined incentive score to the author of the original text according to the score of the original text of the post.

The method according to any one of claims 1 to 3,

After the text scoring step,

Post text evaluation method that includes more.

The method of claim 1,

wherein the approval score and opposition score obtaining step comprises:

identifying, by morpheme analysis, the number of approval words and the number of opposition words in each comment, using an approval-word dictionary containing an (approval word) field, an opposition-word dictionary containing an (opposition word) field, and an antonym dictionary containing (word, antonym) fields, wherein the antonym dictionary prevents an expression of opposition from being regarded as approval, or an expression of approval from being regarded as opposition; and

regarding a comment as approving if the number of approval words in the comment is greater than the number of opposition words, and as opposing in the opposite case.
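The counting step of this claim can be sketched as below, under several loudly stated assumptions: whitespace splitting stands in for morpheme analysis, the three dictionaries are tiny hypothetical English examples, and the antonym dictionary is assumed to be applied when a negation marker immediately precedes a word. The claim says only that the antonym dictionary keeps opposition from being counted as approval and vice versa; it does not describe this mechanism.

```python
# Hypothetical dictionaries; a real system would use much larger ones.
APPROVAL_WORDS = {"agree", "good", "support"}
OPPOSITION_WORDS = {"disagree", "bad", "oppose"}
ANTONYMS = {
    "agree": "disagree", "disagree": "agree",
    "good": "bad", "bad": "good",
    "support": "oppose", "oppose": "support",
}
NEGATIONS = {"not", "never"}  # assumed negation markers


def count_stance_words(comment):
    """Return (number of approval words, number of opposition words)."""
    approvals = oppositions = 0
    negate = False
    for token in comment.lower().split():
        word = token.strip(".,!?")
        if word in NEGATIONS:
            negate = True
            continue
        # A negated word is replaced by its antonym before counting.
        if negate and word in ANTONYMS:
            word = ANTONYMS[word]
        negate = False  # negation only affects the next word
        if word in APPROVAL_WORDS:
            approvals += 1
        elif word in OPPOSITION_WORDS:
            oppositions += 1
    return approvals, oppositions


def classify(comment):
    """Approving if approval words outnumber opposition words, and vice versa."""
    approvals, oppositions = count_stance_words(comment)
    if approvals > oppositions:
        return "approve"
    if oppositions > approvals:
        return "oppose"
    return "undecided"  # assumption: the claim does not cover ties
```

With these dictionaries, "I do not agree, this is bad" is counted as two opposition words rather than one approval word and one opposition word, which is the kind of misreading the antonym dictionary is meant to prevent.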

The method of claim 1,

wherein, in the approval score and opposition score obtaining step,

using an approval-word dictionary containing (approval word, approval score) fields, an opposition-word dictionary containing (opposition word, opposition score) fields, a weight table containing (emphasis word, weight) fields, and an antonym dictionary containing (word, antonym) fields, the total approval-word score is obtained as the sum of the 'approval score' of each word and the 'approval score reflecting the weight of an emphasis word', and the total opposition-word score is obtained as the sum of the 'opposition score' of each word and the 'opposition score reflecting the weight of an emphasis word', wherein the antonym dictionary prevents an expression of opposition from being regarded as approval, or an expression of approval from being regarded as opposition.

The method of claim 1,

wherein, in the approval score and opposition score obtaining step,

the total approval-word score and the total opposition-word score are computed using an approval-word dictionary containing (approval word, emphasis word, approval score) fields, an opposition-word dictionary containing (opposition word, emphasis word, opposition score) fields, and an antonym dictionary containing (word, antonym) fields, wherein the antonym dictionary prevents an expression of opposition from being regarded as approval, or an expression of approval from being regarded as opposition.

The method according to claim 9 or 10,

wherein the weight of an emphasis word is reflected by

[word score + {word score × (weight - w3)}] / (maximum word score)

where w3 is an adjustment factor having a value between 0 and 1, and the maximum word score is the maximum value that the numerator of the above expression can take.
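As a worked illustration of the weighting expression in this claim, the sketch below evaluates it for one word. The parameters `max_word_score` and `max_weight` are hypothetical: the claim defines the denominator as the largest value the numerator can take, so the sketch assumes that the dictionaries have a known largest word score and a known largest emphasis weight.

```python
def weighted_word_score(word_score, weight, w3, max_word_score, max_weight):
    """Evaluate [s + s * (weight - w3)] / (largest possible numerator)."""
    if not 0 <= w3 <= 1:
        raise ValueError("w3 must lie between 0 and 1")
    numerator = word_score + word_score * (weight - w3)
    # Assumption: the numerator is largest for the largest score and weight.
    max_numerator = max_word_score + max_word_score * (max_weight - w3)
    return numerator / max_numerator


# Word score 3 with emphasis weight 1.5 and w3 = 0.5, when the largest
# word score is 5 and the largest weight is 2.0:
# numerator = 3 + 3 * 1.0 = 6, denominator = 5 + 5 * 1.5 = 12.5
score = weighted_word_score(3, 1.5, 0.5, 5, 2.0)
```

Dividing by the largest possible numerator keeps every weighted word score at or below 1, so scores from different words remain comparable.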

The method according to claim 9 or 10,

wherein the total approval-word score (Ta) and the total opposition-word score (Td) of each comment are normalized as follows:

NTa = Ta / (maximum total approval-word score)

NTd = Td / (maximum total opposition-word score)

where NTa is the normalized total approval-word score, NTd is the normalized total opposition-word score, the maximum total approval-word score is the largest total approval-word score that can be given to one comment, and the maximum total opposition-word score is the largest total opposition-word score that can be given to one comment.
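The normalization in this claim is a plain division by the largest attainable total. A minimal sketch follows, with the hypothetical function name `normalize_totals` and maximum totals supplied by the caller, since the section does not say how those maxima are determined.

```python
def normalize_totals(ta, td, max_ta, max_td):
    """Return (NTa, NTd) = (Ta / max Ta, Td / max Td)."""
    if max_ta <= 0 or max_td <= 0:
        raise ValueError("maximum totals must be positive")
    return ta / max_ta, td / max_td


# Ta = 6 out of a possible 10, Td = 2 out of a possible 8
nta, ntd = normalize_totals(6, 2, 10, 8)
```

After normalization both totals lie between 0 and 1, so a comment's approval and opposition totals can be compared even when the two dictionaries use different score ranges.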

A computer-readable recording medium having recorded thereon a program for performing the method according to any one of claims 1 to 3, 5 and 8 to 10 on a computer.

Original text keyword vectorizing means for extracting keywords from the original text of a post on a telecommunication network and vectorizing them;

Comment keyword vectorizing means for extracting keywords from n comments (n is a natural number of 1 or more) for the original text of the post and vectorizing each comment;

Vector similarity comparison means for comparing the similarity between the original keyword vector and the n comment keyword vectors;

Comment selecting means for selecting a predetermined range of comments having a high similarity according to the ranking of similarity or the similarity value;

Approval and opposition score obtaining means for extracting approval keywords and opposition keywords from the selected comments, and calculating an approval score and an opposition score for each comment according to the numbers of extracted approval and opposition keywords; and

Original text scoring means for assigning a score to the original text based on the approval score and the opposition score.
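The first four means of this apparatus claim (vectorizing the original text, vectorizing each comment, comparing similarity, and selecting comments) can be sketched as follows. The claim does not name a similarity measure, so cosine similarity over word-count vectors is an assumption; the stopword list, the threshold `min_similarity`, and all function names are likewise hypothetical.

```python
import math
from collections import Counter

# Hypothetical stopword list; keyword extraction is not specified here.
STOPWORDS = frozenset({"the", "a", "is", "of", "and", "to", "this"})


def keyword_vector(text):
    """Map a text to a keyword -> count vector (whitespace tokenization)."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return Counter(w for w in words if w and w not in STOPWORDS)


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_comments(original, comments, min_similarity=0.1):
    """Keep the comments whose similarity to the original text is high enough."""
    post_vec = keyword_vector(original)
    return [
        c for c in comments
        if cosine_similarity(post_vec, keyword_vector(c)) >= min_similarity
    ]


kept = select_comments(
    "The new library opening hours are great",
    ["I agree the library hours are great", "Buy cheap watches now"],
)
```

The selected comments would then be passed to the approval and opposition score obtaining means. Selecting by a similarity threshold corresponds to the "similarity value" option in the claim; selecting the top-ranked comments would correspond to the "ranking of similarity" option.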

The apparatus of claim 14,

wherein the approval and opposition score obtaining means comprises:

means for identifying, by morpheme analysis, the number of approval words and the number of opposition words in each comment, using an approval-word dictionary containing an (approval word) field, an opposition-word dictionary containing an (opposition word) field, and an antonym dictionary containing (word, antonym) fields, wherein the antonym dictionary prevents an expression of opposition from being regarded as approval, or an expression of approval from being regarded as opposition; and

means for regarding a comment as approving if the number of approval words in the comment is greater than the number of opposition words, and as opposing in the opposite case.

The apparatus of claim 14,

wherein the approval and opposition score obtaining means,

using an approval-word dictionary containing (approval word, approval score) fields, an opposition-word dictionary containing (opposition word, opposition score) fields, a weight table containing (emphasis word, weight) fields, and an antonym dictionary containing (word, antonym) fields, obtains the total approval-word score as the sum of the 'approval score' of each word and the 'approval score reflecting the weight of an emphasis word', and obtains the total opposition-word score as the sum of the 'opposition score' of each word and the 'opposition score reflecting the weight of an emphasis word', wherein the antonym dictionary prevents an expression of opposition from being regarded as approval, or an expression of approval from being regarded as opposition.

The apparatus of claim 14,

wherein the approval and opposition score obtaining means

computes the total approval-word score and the total opposition-word score using an approval-word dictionary containing (approval word, emphasis word, approval score) fields, an opposition-word dictionary containing (opposition word, emphasis word, opposition score) fields, and an antonym dictionary containing (word, antonym) fields, wherein the antonym dictionary prevents an expression of opposition from being regarded as approval, or an expression of approval from being regarded as opposition.