KR101122335B1

KR101122335B1 - Apparatus and method for analyzing information diffusion of topic based in the blog world

Info

Publication number: KR101122335B1
Application number: KR1020100021220A
Authority: KR
Inventors: 김상욱; 강규황
Original assignee: 한양대학교 산학협력단
Priority date: 2010-03-10
Filing date: 2010-03-10
Publication date: 2012-03-23
Also published as: KR20110101882A

Abstract

Disclosed are an apparatus and a method for analyzing topic-based information spread in the blog world. The apparatus includes a classification unit that classifies posts registered on blogs by topic; a preprocessing unit that, based on spread history information for the posts, associates the blogs on which the posts were registered and constructs a network; and a processing unit that uses the network to analyze the information spread of the posts classified by topic.

Description

APPARATUS AND METHOD FOR ANALYZING INFORMATION DIFFUSION OF TOPIC BASED IN THE BLOG WORLD

Embodiments of the present invention relate to an apparatus and a method for analyzing topic-based information spread in the blog world, which analyze the information spread of posts registered on blogs by topic and can predict the spread probability between blogs more accurately.

A blog is a representative service through which users can express their thoughts online in the form of posts. The online social network formed by the relationships between blogs is called the blog world. Through the service functions it provides (for example, scrap, trackback, and comment), a blog can form various relationships with other blogs. Because spread analysis information for posts in the blog world is useful as basic data for revitalizing the blog world and for marketing to blog users, methods for analyzing spread in the blog world are being actively studied.

Existing spread analysis methods for the blog world construct a network based on the spread history information of all posts, assign spread probabilities between blogs, and then apply an independent propagation model to derive spread analysis information for posts in the blog world.

However, because such methods consider neither the user nor the topic of a post, they do not lend themselves to topic-by-topic spread analysis. In addition, because they do not consider blog users to whom a post may yet spread, they have difficulty predicting spread in the blog world accurately.

The present invention has been made to solve the above problems. Its object is to analyze the information spread of posts by topic, based on spread history information for posts registered on blogs, and to predict the spread probability between blogs more accurately by additionally considering candidate blogs to which a post may spread.

To achieve the above object, an apparatus for analyzing topic-based information spread in the blog world may include a classification unit that classifies posts registered on blogs by topic; a preprocessing unit that, based on spread history information for the posts, associates the blogs on which the posts were registered and constructs a network; and a processing unit that uses the network to analyze the information spread of the posts classified by topic.

In addition, as a technical method for achieving the above object, a method for analyzing topic-based information spread in the blog world includes: classifying posts registered on blogs by topic; associating, based on spread history information for the posts, the blogs on which the posts were registered and constructing a network; and analyzing, using the network, the information spread of the posts classified by topic.

According to an embodiment of the present invention, the information spread of posts is analyzed by topic based on spread history information for posts registered on blogs, and candidate blogs to which a post may spread are additionally considered, so that the spread probability between blogs can be predicted more accurately.

FIG. 1 is a diagram showing the configuration of an apparatus for analyzing topic-based information spread in the blog world according to an embodiment of the present invention.
FIG. 2 is a flowchart showing a method for analyzing topic-based information spread in the blog world according to an embodiment of the present invention.
FIG. 3 is a diagram showing the result of predicting the spread of posts about travel in the topic-based information spread analysis apparatus.
FIG. 4 is a diagram showing the result of predicting the spread of posts about English in the topic-based information spread analysis apparatus.
FIG. 5 is a diagram showing, for the topic-based information spread analysis apparatus, the accuracy for network connections that had no previous spread history but on which spread newly occurred in the prediction interval.
FIG. 6 is a diagram showing the results of the current-state analysis for five topics in the topic-based information spread analysis apparatus.

Hereinafter, an apparatus and a method for analyzing topic-based information spread in the blog world according to an embodiment of the present invention are described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram showing the configuration of an apparatus for analyzing topic-based information spread in the blog world according to an embodiment of the present invention.

Referring to FIG. 1, the apparatus 101 for analyzing topic-based information spread in the blog world according to an embodiment of the present invention includes a classification unit 103, a preprocessing unit 105, a processing unit 111, and a database 113.

The classification unit 103 classifies posts registered on blogs by topic. For example, the classification unit 103 may classify posts registered on blogs according to topics such as 'travel', 'English', and 'cooking'.

The preprocessing unit 105 associates the blogs on which posts were registered and constructs a network, based on the spread history information for the posts classified by topic. Here, the preprocessing unit 105 may construct the network using the spread history information maintained in the database 113.

When constructing the network, the preprocessing unit 105 may, based on the analyzed information spread, form a network connection for a candidate blog on which a first post is judged likely to be registered. In doing so, the preprocessing unit 105 may represent each blog as a node and construct a network that includes the edges connecting the nodes. The preprocessing unit 105 may then assign the calculated spread probability of a post to the edge connected to the corresponding node.

By constructing the network with consideration not only of blogs to which posts have already spread but also of candidate blogs to which they may spread, the preprocessing unit 105 prepares an environment in which the spread probability between blogs can later be predicted more accurately.

Specifically, the preprocessing unit 105 searches for a first blog on which the first post and a second post are registered, and for a second blog that is connected in the network in relation to the second post. If the first post is not registered on the second blog thus found, the preprocessing unit 105 may determine the second blog to be a candidate blog and form, between the first and second blogs, a network connection related to the first post. This way of identifying candidate blogs rests on the observation that when posts on one topic spread between two blogs, posts on other topics may spread between them as well.

For example, the preprocessing unit 105 searches for a first blog on which a post about 'travel' and a post about 'English' are registered, and for a second blog that is connected in the network in relation to the post about 'English'. If no post about 'travel' is registered on the second blog, the preprocessing unit 105 may determine the second blog to be a candidate blog and form, between the first and second blogs, a network connection related to posts about 'travel'. That is, because posts about 'travel' may spread between the first and second blogs once posts about 'English' have spread between them, the preprocessing unit 105 may form a 'travel' connection between the two blogs.
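The candidate-blog rule in the example above can be sketched in a few lines. This is only an illustration under stated assumptions: the record layout `(source_blog, target_blog, topic)`, the function name `candidate_edges`, and the reading of "not registered" as "no observed spread on that topic between the two blogs" are all hypothetical and are not specified in the source.

```python
from collections import defaultdict


def candidate_edges(spread_history, topics_by_blog):
    """Return {topic: set of (source, target)} candidate connections.

    spread_history: iterable of (source_blog, target_blog, topic) records
                    (hypothetical layout for the spread history information).
    topics_by_blog: dict mapping a blog to the set of topics it posts on.
    """
    # Connections actually observed in the spread history, grouped by topic.
    observed = defaultdict(set)
    for src, dst, topic in spread_history:
        observed[topic].add((src, dst))
    observed = dict(observed)  # freeze: no keys are added while iterating

    candidates = defaultdict(set)
    for seen_topic, edges in observed.items():
        for src, dst in edges:
            # The source blog also posts on other topics; those posts
            # might spread along the same pair of blogs.
            for topic in topics_by_blog.get(src, set()):
                if topic == seen_topic:
                    continue
                if (src, dst) in observed.get(topic, set()):
                    continue  # already a real connection, not a candidate
                candidates[topic].add((src, dst))
    return dict(candidates)
```

With a first blog "A" that posts on both topics and an observed 'english' spread from "A" to "B", the sketch yields a 'travel' candidate connection from "A" to "B", mirroring the example in the text.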

In addition, the preprocessing unit 105 may search for a first blog on which the first post and the second post are registered and for a third blog on which a third post on the same topic as the first post is registered, determine the third blog (or the first blog) to be a candidate blog, and form, between the first and third blogs, a network connection related to the first post (or the third post).

The preprocessing unit 105 may calculate the spread probability of a post for the constructed network using factor values. Here, the preprocessing unit 105 may define as the factor value any one of the following: a value for the ratio of first posts that spread to a blog connected in the network (the first factor value); a value for the ratio of second posts, classified differently from the first post, that spread to a blog connected in the network (the second factor value); a value for the ratio of first posts that spread to blogs not connected in the network (the third factor value); or a value for the number of first posts registered on a blog connected in the network (the fourth factor value).

The first factor value represents the proportion of blog A's posts on topic K that spread to blog B. It indicates the influence blog A has had on blog B with respect to topic K.

The preprocessing unit 105 may calculate the first factor value using an existing spread probability assignment technique, namely the probability assignment technique using effective scores. Specifically, the preprocessing unit 105 calculates the spread effectiveness of every post in the set of posts held by blog A, normalizes these values, and sums them. This total can be regarded as the overall extent to which blog A intended to spread topic K. The preprocessing unit 105 then normalizes and sums the spread effectiveness values of the posts in the subset that spread to blog B and takes this sum as a share of the total, which gives the proportion actually spread to blog B, as expressed in [Equation 1].

Here, norm_score(Di) denotes the normalized spread effectiveness value of post Di, and the subset is the set of those posts, among the posts held by blog A, that spread to blog B.
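The image of [Equation 1] is not reproduced in this text. A reconstruction from the description alone might look as follows. The symbols are assumptions introduced for illustration: $f_1$ for the first factor value, $D_A^K$ for the set of blog A's posts on topic K, and $D_{A \to B}^K$ for the subset of those posts that spread to blog B. Only norm_score(Di) appears in the source.

```latex
f_1(A, B, K) \;=\;
\frac{\displaystyle\sum_{D_i \in D_{A \to B}^{K}} \mathrm{norm\_score}(D_i)}
     {\displaystyle\sum_{D_i \in D_{A}^{K}} \mathrm{norm\_score}(D_i)}
```

Under this reading the value lies between 0 and 1, and equals 1 only when every topic-K post of blog A with a nonzero score spread to blog B.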

The second factor value represents the proportion of blog A's posts on topics other than K that spread to blog B. It is the influence blog A has had on blog B for topics other than K, and can be regarded as blog A's overall influence on blog B across all other topics.

As with the first factor value, the preprocessing unit 105 may calculate the second factor value using the probability assignment technique using effective scores, as expressed in [Equation 2].

The third factor value represents the influence blog A has, with respect to topic K, on blogs other than B. It indicates the influence blog A has had on its topic-K neighbors other than B.

The preprocessing unit 105 may calculate the third factor value as expressed in [Equation 3].

The fourth factor value represents blog B's degree of interest in topic K, and is the number of posts held by blog B that belong to topic K. The greater blog B's interest in topic K, the more likely it is that posts on topic K will subsequently spread to it.
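As a rough illustration of the four factor values described above, the sketch below computes them from a toy list of posts. Everything about the data layout is hypothetical: each post is a dict with keys `blog`, `topic`, `score` (standing in for the normalized spread effectiveness value) and `spread_to` (the set of blogs the post spread to). The formula used for the third value is an assumption, since [Equation 3] is not reproduced in the text.

```python
def weighted_share(posts, matches):
    """Share of the total score carried by posts for which matches(post) is true."""
    total = sum(p["score"] for p in posts)
    if total == 0:
        return 0.0
    return sum(p["score"] for p in posts if matches(p)) / total


def factor_values(posts, a, b, k):
    """Return (first, second, third, fourth) factor values for blogs a -> b on topic k."""
    a_on_k = [p for p in posts if p["blog"] == a and p["topic"] == k]
    a_off_k = [p for p in posts if p["blog"] == a and p["topic"] != k]

    # First: share of a's topic-k posts that spread to b.
    first = weighted_share(a_on_k, lambda p: b in p["spread_to"])
    # Second: share of a's posts on other topics that spread to b.
    second = weighted_share(a_off_k, lambda p: b in p["spread_to"])
    # Third (assumed form): share of a's topic-k posts that spread to
    # at least one blog other than b.
    third = weighted_share(a_on_k, lambda p: bool(p["spread_to"] - {b}))
    # Fourth: number of topic-k posts held by b.
    fourth = sum(1 for p in posts if p["blog"] == b and p["topic"] == k)
    return first, second, third, fourth
```

Passing the score in as data keeps the sketch independent of how the effective score itself is computed, which the text does not spell out.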

As components for calculating and predicting the spread probability, the preprocessing unit 105 includes an analysis unit 107 and a prediction unit 109.

The analysis unit 107 divides the network into a pre-network (one part of the network) and a post-network (another part), identifies the pre-factor values corresponding to the pre-network and the post-factor values corresponding to the post-network, and may derive a regression equation whose independent variables are the identified pre-factor values and whose dependent variable is the spread probability in the post-network calculated using the post-factor values. The analysis unit 107 may separate the pre-network from the post-network based on, for example, a set spread time.

The prediction unit 109 may substitute the post-factor values into the regression equation to predict the spread probability of the network connection formed for a candidate blog on which the post is judged likely to be registered.
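The fit-then-substitute step can be sketched with ordinary least squares on a single factor. This is a simplification, not the procedure of the source: the text describes a regression over up to four factor values, and the function names, the single-predictor form, and the clamping of predictions to the range 0 to 1 are all assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # raises ZeroDivisionError if all xs are identical
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_probability(intercept, slope, factor_value):
    """Evaluate the fitted line and clamp to a valid probability (assumption)."""
    p = intercept + slope * factor_value
    return min(1.0, max(0.0, p))


# Fit on pre-network factor values against post-network spread probabilities,
# then substitute a post-network factor value to predict a later probability.
pre_factors = [0.0, 0.2, 0.4]
post_probabilities = [0.05, 0.15, 0.25]
intercept, slope = fit_line(pre_factors, post_probabilities)
predicted = predict_probability(intercept, slope, 0.6)
```

The clamp is there only because a linear fit can leave the unit interval; the source does not say how such values are handled.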

The processing unit 111 uses the network to analyze the information spread of the posts classified by topic. That is, the processing unit 111 may apply a propagation model (for example, an independent propagation model) to the network to which spread probabilities have been assigned, and thereby analyze the information spread of the posts classified by topic.
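A minimal simulation of such a propagation model is sketched below, assuming that the "independent propagation model" behaves like the widely used independent cascade model: each newly activated blog gets one chance to activate each inactive neighbor, with the probability on the connecting edge. The function name, the edge dictionary layout, and the optional `rng` parameter are hypothetical.

```python
import random


def independent_cascade(edge_probabilities, seeds, rng=None):
    """Simulate one cascade and return the set of activated blogs.

    edge_probabilities: dict mapping (source, target) to a spread probability.
    seeds: blogs that hold the post initially.
    rng: optional random.Random instance, for reproducible runs.
    """
    rng = rng or random.Random()

    # Adjacency list: source -> [(target, probability), ...]
    outgoing = {}
    for (src, dst), p in edge_probabilities.items():
        outgoing.setdefault(src, []).append((dst, p))

    active = set(seeds)
    frontier = list(active)
    while frontier:
        newly_active = []
        for src in frontier:
            # Each blog is in the frontier exactly once, so every edge
            # is tried at most once.
            for dst, p in outgoing.get(src, []):
                if dst not in active and rng.random() < p:
                    active.add(dst)
                    newly_active.append(dst)
        frontier = newly_active
    return active
```

Because a single run is random, an analysis would normally repeat the simulation many times and aggregate the results; how many runs the source used is not stated.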

The database 113 stores the spread history information for posts. The database 113 may store the spread history information classified by topic, and may also store it divided by spread time.

FIG. 2 is a flowchart showing a method for analyzing topic-based information spread in the blog world according to an embodiment of the present invention.

Referring to FIG. 2, the information spread analysis apparatus classifies posts registered on blogs by topic (201) and, based on the spread history information for the posts classified by topic, associates the blogs on which the posts were registered and constructs a network (203).

In doing so, the information spread analysis apparatus may, based on the analyzed information spread, form a network connection for a candidate blog on which a first post is judged likely to be registered. By constructing the network with consideration not only of blogs to which posts have already spread but also of candidate blogs to which they may spread, the apparatus can later predict the spread probability between blogs more accurately.

Specifically, the information spread analysis apparatus searches for a first blog on which the first post and a second post are registered, and for a second blog that is connected in the network in relation to the second post. If the first post is not registered on the second blog thus found, the apparatus may determine the second blog to be a candidate blog and form, between the first and second blogs, a network connection related to the first post.

In addition, the information spread analysis apparatus may search for a first blog on which the first post and the second post are registered and for a third blog on which a third post on the same topic as the first post is registered, determine the third blog (or the first blog) to be a candidate blog, and form, between the first and third blogs, a network connection related to the first post (or the third post).

The information spread analysis apparatus calculates the spread probability of a post for the constructed network using factor values (205).

In doing so, the information spread analysis apparatus may calculate the spread probability of the post using, as the factor value, any one of the factor values, such as the value for the ratio of first posts that spread to a blog connected in the network.

Specifically, the information spread analysis apparatus divides the network into a pre-network (one part of the network) and a post-network (another part), identifies the pre-factor values corresponding to the pre-network and the post-factor values corresponding to the post-network, and may derive a regression equation whose independent variables are the identified pre-factor values and whose dependent variable is the spread probability in the post-network calculated using the post-factor values. The apparatus may then substitute the post-factor values into the regression equation to predict the spread probability of the network connection formed for a candidate blog on which the post is judged likely to be registered.

The information spread analysis apparatus uses the network to analyze the information spread of the posts classified by topic (207). That is, the apparatus may apply a propagation model (for example, an independent propagation model) to the network to which spread probabilities have been assigned, and thereby analyze the information spread of the posts classified by topic.

The following describes two sets of results for the method for analyzing topic-based information spread in the blog world: a predictive analysis, which predicts information spread that will occur later and was performed to verify the usefulness of the proposed prediction model, and a current-state analysis, performed to determine how accurately the model analyzes information spread that has occurred up to the present.

<Experimental Environment>

The performance of the apparatus for analyzing topic-based information spread in the blog world was analyzed on blog networks for a total of five topics on which many users (blogs) in the blog world spread information: 'travel', 'English', 'cooking', 'cars', and 'soccer'. The number of blog users and the number of posts related to each topic are shown in [Table 1]. It was assumed that each post belongs to exactly one topic and that a user may be interested in multiple topics.

The methods compared in the performance analysis are: Naive, which applies the existing probability assignment method using effective scores to the topic-specific blog network; Topic, the topic-based information spread analysis technique according to the present invention; ES (Effective Score), the probability assignment technique using effective scores; CST1 (Constant 1%), which assigns the same 1% probability to every edge; and CST5 (Constant 5%), which assigns the same 5% probability to every edge.

To measure the performance of these methods, the records of information actually spread in the blog world were compared with the spread records generated by running the independent propagation model on the blog networks constructed by each method. The comparison measures were recall and precision, which are widely used in information retrieval, and the F-measure computed from them. Recall and precision are in a trade-off relationship: when recall is very high, precision tends to be low, and when precision is very high, recall tends to be low. The F-measure is the harmonic mean of recall and precision.
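The three measures can be computed directly from two sets of spread records. In the sketch below, a spread record is represented as a `(source, target)` pair; that representation and the function name are assumptions made for illustration, while the definitions of precision, recall, and their harmonic mean are the standard ones.

```python
def precision_recall_f(predicted, actual):
    """Compare simulated spread records with actual ones.

    predicted, actual: iterables of hashable spread records,
    for example (source_blog, target_blog) pairs.
    Returns (precision, recall, f_measure).
    """
    predicted, actual = set(predicted), set(actual)
    hits = len(predicted & actual)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(actual) if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


simulated = {("A", "B"), ("A", "C")}
observed = {("A", "B"), ("B", "C"), ("B", "D"), ("C", "D")}
precision, recall, f_measure = precision_recall_f(simulated, observed)
```

In this toy case half of the simulated records are correct but only a quarter of the observed records are recovered, so the harmonic mean lands closer to the lower of the two values.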

<Predictive Analysis>

To measure the prediction accuracy of the topic-based information spread analysis technique, the data used in the analysis was divided into three intervals of equal size. A formula for the topic-based spread probability was then derived from the factor values obtained in interval 1 and the spread probabilities obtained in interval 2, and the factor values obtained in interval 2 were substituted into this formula to obtain the topic-based spread probabilities. The result of running the independent propagation model on the blog network built with these probabilities was compared with the actual spread history of interval 3. In the following, for convenience, the four factor values used as independent variables in the regression analysis are denoted ag1 (the first factor value), ag2 (the second factor value), ag3 (the third factor value), and ag4 (the fourth factor value).
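The three-interval split can be sketched as follows. The sketch assumes that "equal size" means an equal number of records after ordering by time; the source does not say whether the intervals are equal in record count or in duration, and the `time` key and function name are hypothetical.

```python
def split_three(records):
    """Split records into three consecutive intervals of near-equal size.

    records: iterable of dicts with a sortable 'time' key (hypothetical layout).
    Returns (interval_1, interval_2, interval_3) as lists ordered by time.
    """
    ordered = sorted(records, key=lambda r: r["time"])
    n = len(ordered)
    first_cut, second_cut = n // 3, 2 * n // 3
    return ordered[:first_cut], ordered[first_cut:second_cut], ordered[second_cut:]


# Usage: factor values come from interval 1 and 2, spread probabilities from
# interval 2, and interval 3 is held out as the ground truth for comparison.
records = [{"time": t} for t in (5, 3, 1, 4, 0, 2)]
interval_1, interval_2, interval_3 = split_three(records)
```

When the number of records is not divisible by three, the intervals differ in size by at most one record.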

FIG. 3 is a diagram showing the result of predicting the spread of posts about travel in the topic-based information spread analysis apparatus.

In the regression analysis for the topic 'travel', stepwise selection chose only ag2 among the four factor values, and the coefficient of determination (R squared) was 0.446. This means that 44.6% of the variation in the dependent variable is explained by the independent variable. The analysis of variance used to test whether the regression model is significant gave a significance probability (p-value) of 0.000, which shows that the derived regression model is statistically significant. The constant and regression coefficient of the derived model, together with the results of testing the statistical significance of the regression coefficient, are shown in [Table 2] as the regression model for the topic 'travel'.

The significance probabilities of the constant and the regression coefficient were both 0.000, so both are statistically significant. The regression equation F_travel for the topic 'travel', derived through the regression analysis, is given in [Equation 4].

FIG. 3 shows the result of obtaining the topic-based spread probabilities using F_travel and running the independent propagation model with these probabilities.

Referring to FIG. 3, the x-axis shows the compared techniques and the y-axis shows accuracy. In recall, technique Topic scored 0.31, about 2.4 times the 0.13 of technique Naive; in precision, Topic scored 0.68, about 91% of Naive's 0.75. In F-measure, Topic scored 0.42, about 1.8 times the 0.24 of technique ES and about 1.9 times the 0.22 of Naive, and about 42 times and about 11 times the scores of techniques CST1 and CST5, respectively. As for the characteristics of each technique, ES showed little difference between recall and precision. Naive showed low recall but very high precision. This indicates that the network built by Naive has no edge where no earlier diffusion history exists, so it yields few predicted diffusions, but those it does yield are more accurate than those of the other techniques. The technique Topic proposed in the present invention showed evenly high recall and precision and therefore the highest F-measure, which is the harmonic mean of recall and precision. CST1 and CST5 showed far lower accuracy than the other techniques, which shows that assigning the same probability to every edge makes it difficult to predict information diffusion.
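As a quick check on the figures above, the F-measure can be recomputed from the reported precision and recall. The sketch below assumes the balanced F-measure (the plain harmonic mean); the function name `f_measure` is illustrative and does not come from the patent. Because the inputs are already rounded to two decimals, the recomputed values agree with the reported ones only to within rounding.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when nothing was predicted correctly
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported for the topic 'travel' (FIG. 3).
naive_f = f_measure(precision=0.75, recall=0.13)
topic_f = f_measure(precision=0.68, recall=0.31)

print(f"Naive F-measure: {naive_f:.3f}")
print(f"Topic F-measure: {topic_f:.3f}")
```

The harmonic mean is pulled toward the smaller of its two inputs, which is why Naive's very high precision cannot compensate for its low recall.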

FIG. 4 shows the result of predicting the diffusion of posts about English in the topic-specific information diffusion analysis apparatus.

In the regression analysis for the topic 'English', stepwise selection retained only ag2 among the four factors, and the coefficient of determination (R-squared) was 0.403. This means that 40.3% of the variance of the dependent variable is explained by the independent variable. The analysis of variance testing whether the regression model is significant gave a significance probability (p-value) of 0.000, so the derived regression model is statistically significant. The constant and regression coefficient of the derived model, together with the test of their statistical significance, constitute the regression model for the topic 'English' and are shown in [Table 3].

The significance probabilities of the constant and the regression coefficient were both 0.000, so both are statistically significant. The regression equation F_English for the topic 'English', derived from the regression analysis, is given in [Equation 5].

The topic-specific diffusion probability was computed with F_English, and the independent cascade model was then run with this probability; FIG. 4 shows the result.
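The independent cascade (independent propagation) model used here can be sketched as follows. This is a minimal illustration of the standard model, not the patent's implementation: the function name, the edge dictionary, and the probabilities are all hypothetical, and in the described method each edge probability would come from the topic's regression equation.

```python
import random
from collections import deque


def independent_cascade(edges, seeds, rng):
    """Run one simulation of the independent cascade model.

    edges: dict mapping (source_blog, target_blog) -> diffusion probability.
    seeds: blogs in which the post is initially registered.
    rng:   a random.Random instance, passed in so runs are reproducible.
    Returns the set of blogs the post reached in this run.
    """
    # Build an adjacency list from the edge dictionary.
    outgoing = {}
    for (src, dst), prob in edges.items():
        outgoing.setdefault(src, []).append((dst, prob))

    active = set(seeds)
    frontier = deque(seeds)
    while frontier:
        src = frontier.popleft()
        for dst, prob in outgoing.get(src, []):
            # A newly active blog gets exactly one chance per outgoing edge.
            if dst not in active and rng.random() < prob:
                active.add(dst)
                frontier.append(dst)
    return active


# Hypothetical edge probabilities, for illustration only.
example_edges = {("A", "B"): 0.6, ("A", "C"): 0.1, ("B", "D"): 0.4}
reached = independent_cascade(example_edges, ["A"], random.Random(42))
print(sorted(reached))
```

A single run is random, so in practice the simulation would be repeated many times and the reached sets compared against the diffusions actually observed.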

Referring to FIG. 4, the x-axis shows the compared techniques and the y-axis shows accuracy. In recall, technique Topic scored 0.30, about 1.5 times the 0.20 of technique Naive; in precision, Topic scored 0.66, about 75% of Naive's 0.88. In F-measure, Topic scored 0.41, about 1.6 times the 0.26 of technique ES and about 1.2 times the 0.33 of Naive, and about 40 times and about 10 times the scores of techniques CST1 and CST5, respectively. As for the characteristics of each technique, ES again showed little difference between recall and precision, as with the topic 'travel'. Naive showed higher recall than it did for 'travel', close to that of ES, and therefore a higher F-measure than ES. This is related to the number of blog users associated with each topic. More users are associated with 'English' than with 'travel', and the more users are associated with a topic, the larger the blog network that Naive builds. The larger the blog network over which diffusion is simulated, the more results are obtained and the higher the recall. The proposed technique Topic showed evenly high recall and precision and therefore the highest F-measure, the harmonic mean of recall and precision.

Although no drawings are provided for the topics 'cooking', 'cars', and 'soccer', the prediction results for those topics likewise showed that the proposed technique Topic achieved higher recall and a higher F-measure than the other techniques.

FIG. 5 shows, for the topic-specific information diffusion analysis apparatus, the accuracy on network links that had no earlier diffusion history but on which diffusion newly occurred in the prediction interval.

Referring to FIG. 5, averaged over the five topics, the proposed technique Topic scored an F-measure of 0.37, about 1.5 times the 0.25 of technique ES, and about 37 times and about 9.3 times the scores of techniques CST1 and CST5, respectively. For every topic, ES and Topic behaved much as in Experiment 1, whereas Naive, which sets no edge in the network where no earlier diffusion history exists, could not predict any of the newly occurring diffusions.
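Why a network built only from same-topic history misses new diffusions can be illustrated with a small sketch. It is a simplified reading of the candidate-edge idea, under the assumption that an edge becomes available once any earlier diffusion, on any topic, has occurred between two blogs. The `Diffusion` record, the function names, and the sample data are hypothetical, and the sketch measures only which edges are available, not the patent's probability model.

```python
from typing import NamedTuple


class Diffusion(NamedTuple):
    source: str  # blog the post spread from
    target: str  # blog the post spread to
    topic: str   # topic assigned to the post


def candidate_edges(history, topic=None):
    """Edges seen in past diffusions, optionally restricted to one topic."""
    return {
        (d.source, d.target)
        for d in history
        if topic is None or d.topic == topic
    }


def edge_coverage(available, actual):
    """Fraction of actual diffusions whose edge exists in the network."""
    return len(available & actual) / len(actual) if actual else 0.0


# Hypothetical history: A -> C has happened before, but only for 'cooking'.
history = [
    Diffusion("A", "B", "travel"),
    Diffusion("A", "C", "cooking"),
]
new_travel_diffusions = {("A", "C")}  # first 'travel' diffusion from A to C

same_topic = candidate_edges(history, topic="travel")
cross_topic = candidate_edges(history)

print(edge_coverage(same_topic, new_travel_diffusions))
print(edge_coverage(cross_topic, new_travel_diffusions))
```

With no edge from A to C in the same-topic network, no simulation over that network can ever reach C, which matches the behavior reported for technique Naive.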

FIG. 6 shows the results of the current-state analysis for the five topics in the topic-specific information diffusion analysis apparatus.

<Current-State Analysis>

To analyze information diffusion that has occurred up to the present time, a current-state analysis was performed on the compared techniques. In the current-state analysis, only a single interval was used to derive the topic-specific diffusion probability formula of the proposed technique.

For every topic, the regression analysis gave a coefficient of determination (R-squared) close to 0.999, which means that nearly all of the variance of the dependent variable is explained by the independent variables.
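For readers unfamiliar with the coefficient of determination, the following sketch fits a least-squares line and computes R-squared. It is deliberately reduced to a single predictor, whereas the regression described here may use several factors. The data values are invented for illustration and are not the patent's measurements or coefficients.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def r_squared(xs, ys, a, b):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical factor values and observed diffusion probabilities.
factor = [0.0, 0.1, 0.2, 0.4, 0.5]
observed = [0.02, 0.05, 0.11, 0.18, 0.26]

a, b = fit_line(factor, observed)
print(f"constant={a:.3f}, coefficient={b:.3f}, "
      f"R^2={r_squared(factor, observed, a, b):.3f}")
```

An R-squared near 1 means the fitted equation reproduces the observed probabilities almost exactly, while a value such as 0.403 leaves most of the variance unexplained.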

Likewise, for every topic the analysis of variance testing whether the regression model is significant gave a significance probability of 0.000, showing that the derived regression models are statistically significant. The constants and regression coefficients of the derived models also all had significance probabilities of 0.000 and are statistically significant. The regression equation derived for each topic is given in [Table 4].

In the regression equations for all topics, the regression coefficient of ag2 is very large. This means that, in the current-state analysis, the influence that user A has had on user B across all topics other than the topic in question is very large.

Referring to FIG. 6, averaged over the five topics, the proposed technique Topic scored an F-measure of 0.40, about 98% of the 0.41 of technique ES and about 1.5 times the 0.26 of technique Naive, and about 40 times and about 10 times the scores of techniques CST1 and CST5, respectively. Both ES and Naive were more accurate here than in the predictive analysis. Because Naive by its nature builds the network only from users associated with the topic, its accuracy was higher for topics with more associated users. Because the proposed technique Topic is a model designed for predictive analysis, its accuracy in the current-state analysis was similar to or slightly lower than that of ES.

Consequently, in the predictive analysis the proposed technique Topic showed higher recall, precision, and F-measure than the other techniques, and it proved useful for predicting new diffusions that technique Naive cannot predict.

The method for analyzing topic-specific information diffusion in the blog world according to an embodiment of the present invention may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the present invention, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code produced by a compiler but also high-level language code that a computer can execute using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the present invention, and vice versa.

Although the present invention has been described above with reference to limited embodiments and drawings, it is not limited to those embodiments, and a person of ordinary skill in the art to which the invention pertains can make various modifications and variations from this description.

Therefore, the scope of the present invention should not be limited to the described embodiments, but should be defined by the claims below and by their equivalents.

101: Topic-specific information diffusion analysis apparatus in the blog world
103: Classification unit
105: Preprocessing unit
111: Processing unit
113: Database

Claims

An apparatus for analyzing topic-specific information diffusion in the blog world, comprising:
a preprocessing unit that searches for second blogs forming a network in connection with a second post registered in a first blog, determines as a candidate blog a blog, among the found second blogs, in which a first post registered in the first blog has not been registered, and configures a network associated with the first post between the first blog and the candidate blog; and
a prediction unit that predicts a diffusion probability of the network configured for the candidate blog.

The apparatus of claim 1, further comprising:
a classification unit that classifies posts registered in blogs by topic; and
a processing unit that analyzes, using the network, information diffusion for the posts classified by topic,
wherein the preprocessing unit configures the network for the candidate blog further in consideration of the analyzed information diffusion.

delete

The apparatus of claim 1,
wherein the preprocessing unit calculates the diffusion probability of a post over the configured network based on any one of: a value relating to the ratio of first posts diffused to a blog configured in the network; a value relating to the ratio of second posts, classified differently from the first post, diffused to a blog configured in the network; a value relating to the ratio of first posts diffused to a blog not configured in the network; or a value relating to the number of first posts registered in a blog configured in the network.

The apparatus of claim 1,
wherein the preprocessing unit includes an analysis unit that divides the network into an earlier network and a later network with respect to a set diffusion time point, and derives a regression equation having a value associated with the earlier network as an independent variable and the diffusion probability of the later network as a dependent variable.

delete

A method for analyzing topic-specific information diffusion in the blog world, comprising:
searching for second blogs forming a network in connection with a second post registered in a first blog;
determining, as a candidate blog, a blog among the found second blogs in which a first post registered in the first blog has not been registered;
configuring a network associated with the first post between the first blog and the candidate blog; and
predicting a diffusion probability of the network configured for the candidate blog.

The method of claim 8, further comprising:
classifying posts registered in blogs by topic; and
analyzing, using the network, information diffusion for the posts classified by topic,
wherein the configuring of the network comprises configuring the network for the candidate blog further in consideration of the analyzed information diffusion.

delete

The method of claim 8,
wherein the predicting of the diffusion probability of the network comprises calculating the diffusion probability of a post over the configured network based on any one of: a value relating to the ratio of first posts diffused to a blog configured in the network; a value relating to the ratio of second posts, classified differently from the first post, diffused to a blog configured in the network; a value relating to the ratio of first posts diffused to a blog not configured in the network; or a value relating to the number of first posts registered in a blog configured in the network.

The method of claim 8, further comprising:
dividing the network into an earlier network and a later network with respect to a set diffusion time point, and deriving a regression equation having a value associated with the earlier network as an independent variable and the diffusion probability of the later network as a dependent variable.

delete