KR20110089473A

KR20110089473A - System and method for matching between contents and advertizing medium

Info

Publication number: KR20110089473A
Application number: KR1020100008894A
Authority: KR
Inventors: 석윤찬; 김홍찬; 윤성로
Original assignee: 석윤찬
Priority date: 2010-02-01
Filing date: 2010-02-01
Publication date: 2011-08-09

Abstract

PURPOSE: A system and method for matching content and advertisements are provided to supply suitable advertisements to users of content by analyzing the content. CONSTITUTION: A keyword database (102) stores interest information for each keyword. An advertisement database (104) stores target audience information for each advertisement. A keyword extractor (106) extracts keywords and the frequency of each keyword from a web page. A keyword processing unit (108) uses the interest information in the keyword database.

Description

System and method for matching between contents and advertizing medium

The present invention relates to a system and method for matching content with the advertisements most relevant to it.

As the Internet has become widespread, people increasingly search for and acquire information through the Internet rather than through other media. In line with this trend, the advertising market has also moved away from its reliance on conventional TV and newspapers, and Internet advertising has grown exponentially.

In general, advertisements on the Internet are implemented as short texts, banners, or widgets and are displayed in a web browser together with newspaper articles, blog posts, and the like. However, because such advertisements are served independently of the articles or posts they accompany, advertisements entirely unrelated to the content often appear on the same page. As a result, advertisements in the fields the user is interested in are not delivered where they are needed, and the advertising effect is often halved.

An object of the present invention is to provide appropriate advertisements to users of content by analyzing content, such as content on the Internet, and selecting the advertisements most similar to that content.

To achieve the above object, a system for matching content and advertisements according to an embodiment of the present invention includes: a keyword database storing a plurality of keywords and, for each keyword, interest information for each demographic segment; an advertisement database storing a plurality of advertisements and target audience information for each advertisement; a keyword extraction unit that extracts keywords and the frequency of each keyword from a web page; a keyword processing unit that calculates an interest profile of the web page using the extracted keywords and frequencies together with the per-segment interest information stored in the keyword database; and an advertisement selection unit that selects advertisements corresponding to the web page by comparing the interest profile calculated by the keyword processing unit with the target audience information stored for each advertisement in the advertisement database.

A method for matching content and advertisements according to an embodiment of the present invention includes the following steps, each performed in the matching system: extracting keywords and the frequency of each keyword from a web page; calculating an interest profile of the web page using the extracted keywords and frequencies together with the per-segment interest information of each keyword; and selecting advertisements corresponding to the web page by comparing the calculated interest profile with the target audience information of each advertisement.

The present invention analyzes content, such as content on the Internet, selects the advertisements most similar to that content, and allows them to be displayed together with the content. This increases the attention and click-through rate the advertisements receive and, in turn, can raise the purchase rate of the advertised products.

FIG. 1 is a diagram showing the configuration of a system (100) for matching content and advertisements according to an embodiment of the present invention.
FIG. 2 is a flowchart (200) showing the process of calculating the interest profile of a web page from the keywords and frequencies extracted from it, according to an embodiment of the present invention.
FIG. 3 is a flowchart showing a method (300) for matching content and advertisements according to an embodiment of the present invention.

Hereinafter, specific embodiments of the present invention are described with reference to the drawings. These embodiments are only examples, and the present invention is not limited to them.

In describing the present invention, detailed descriptions of known technology are omitted where they would unnecessarily obscure the subject matter of the invention. The terms used below are defined in view of their functions in the present invention and may vary according to the intention or custom of users and operators. Their definitions should therefore be based on the contents of this specification as a whole.

The technical spirit of the present invention is determined by the claims, and the following embodiments are merely a means of explaining that technical spirit efficiently to those of ordinary skill in the art to which the invention pertains.

FIG. 1 is a diagram showing the configuration of a system (100) for matching content and advertisements according to an embodiment of the present invention.

As shown, the matching system (100) according to an embodiment of the present invention includes a keyword database (102), an advertisement database (104), a keyword extraction unit (106), a keyword processing unit (108), and an advertisement selection unit (110).

The keyword database (102) stores a plurality of keywords and, for each keyword, interest information for each demographic segment. The keywords are mainly Korean nouns and may be words frequently used in everyday life. For example, words such as "bag", "furniture", and "car" may be selected as keywords stored in the keyword database (102).

The per-segment interest information indicates, for each keyword stored in the keyword database (102), how interest in that keyword differs across a set of preset demographic segments. For example, it may describe how interest in the keyword differs by gender and age. To this end, the gender and age range may be divided into ten segments (males in their teens, 20s, 30s, 40s, and 50s, and females in their teens, 20s, 30s, 40s, and 50s), and the relative interest of each segment in the keyword may be measured as a percentage (%). Table 1 below shows example interest information by gender and age segment for the keywords "furniture" and "bag".

Table 1

| Keyword | Male 10-19 | Male 20-29 | Male 30-39 | Male 40-49 | Male 50-59 | Female 10-19 | Female 20-29 | Female 30-39 | Female 40-49 | Female 50-59 |
|---|---|---|---|---|---|---|---|---|---|---|
| Furniture | 7.2 | 9.6 | 17.2 | 8.2 | 4.4 | 8.2 | 11.0 | 19.7 | 9.3 | 5.2 |
| Bag | 9.6 | 11.4 | 7.8 | 4.3 | 2.0 | 17.6 | 21.1 | 14.3 | 8.2 | 3.7 |

The keywords and their per-segment interest information may be generated, for example, from the search terms entered into a search engine such as a portal, together with the demographic information of the users who entered them. That is, for each keyword, the more often users in a particular segment (for example, males in their 20s) enter that keyword, the higher that segment's interest percentage becomes. The keywords and per-segment interest values therefore reveal how interest in a particular keyword is distributed across segments. The demographic information may be entered directly by the user, or the information the user provided at sign-up may be used.
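As a minimal sketch of how such percentages could be derived from search logs, the Python snippet below counts searches per segment and converts the counts to percentages. The segment labels, the function name `interest_by_segment`, and the sample log are illustrative assumptions, not part of the specification.

```python
from collections import Counter, defaultdict

# Hypothetical segment labels, in the same order as Table 1 (M = male, F = female).
SEGMENTS = ["M10s", "M20s", "M30s", "M40s", "M50s",
            "F10s", "F20s", "F30s", "F40s", "F50s"]


def interest_by_segment(search_log):
    """Convert (keyword, segment) search events into per-keyword percentages."""
    counts = defaultdict(Counter)
    for keyword, segment in search_log:
        counts[keyword][segment] += 1
    table = {}
    for keyword, seg_counts in counts.items():
        total = sum(seg_counts.values())
        # One percentage per segment; segments with no searches get 0.
        table[keyword] = [100.0 * seg_counts[s] / total for s in SEGMENTS]
    return table


# Made-up log: four searches for "bag", three of them by women in their 20s.
log = [("bag", "F20s"), ("bag", "F20s"), ("bag", "F20s"), ("bag", "M30s")]
bag_interest = interest_by_segment(log)["bag"]
```

Each keyword's row sums to 100, which matches the way the rows of Table 1 are expressed.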

Although this embodiment mainly uses gender and age as the example segmentation, various other criteria may be applied in the present invention. For example, a user's salary level, occupation, region, race, or marital status may be defined as the demographic segments, and the interest in each keyword may be stored for each segment so defined.

The advertisement database (104) stores a plurality of advertisements and target audience information for each advertisement. An advertisement may be, for example, information about a website that an advertiser wishes to promote, in which case the advertisement includes the URL of that website. The target audience information of an advertisement is preference information, by demographic segment, for the expected visitors to the website. For example, if the advertised website sells clothing for women in their 20s and 30s, the target audience will mainly be females in their 20s and 30s; if it is an information site for luxury foreign cars, the target audience will mainly be males in their 30s to 50s. The advertisement database (104) may also store keywords associated with the advertised website. For example, keywords associated with a women's clothing site may be "chima" and "skirt" (the native Korean word and the loanword for a skirt), and keywords associated with a foreign car information site may be Mercedes-Benz, BMW, and the like.

Table 2 shows examples of the target audience stored in the advertisement database (104). The expected target audience of each advertisement is given as a percentage (%) for the same gender and age segments used in the keyword database (102).

Table 2

| Advertisement | Male 10-19 | Male 20-29 | Male 30-39 | Male 40-49 | Male 50-59 | Female 10-19 | Female 20-29 | Female 30-39 | Female 40-49 | Female 50-59 |
|---|---|---|---|---|---|---|---|---|---|---|
| Women's clothing shopping mall | 0 | 0 | 0 | 0 | 0 | 10 | 40 | 40 | 10 | 0 |
| Foreign car information site | 0 | 10 | 25 | 25 | 15 | 0 | 0 | 10 | 10 | 5 |

The keyword extraction unit (106) extracts keywords and the frequency of each keyword from a web page. The web page is a page containing the content with which advertisements are to be combined, for example a newspaper article, a blog post, or another website post.

To extract keywords and their frequencies, the keyword extraction unit (106) must first extract only the meaningful text from the web page, excluding the HTML tags. The tags are removed using an HTML parser. The technical details of HTML parsers are well known in the art and are not described here. Unnecessary parts of the page, such as advertisements it already contains, may also be removed during text extraction.

Once the HTML tags have been removed and the text extracted, the text is morphologically analyzed to obtain the keywords and the frequency of each keyword within the text. During this analysis, unnecessary symbols, English letters, and stopwords are removed. Grammatical morphemes are stripped using a syllable information dictionary, a particle dictionary, an ending dictionary, and the like, leaving only the stems, which are then checked against a basic dictionary and an extended dictionary to extract the keywords.
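The extraction step can be sketched roughly as follows, using Python's standard `html.parser` module to strip tags. The specification describes dictionary-based morphological analysis of Korean text; the sketch substitutes a much cruder stand-in (lowercasing, splitting into alphabetic tokens, and dropping a trailing "s") purely to show the flow. The class `TextExtractor`, the function `extract_keyword_counts`, and the sample page are hypothetical.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text and ignores <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def extract_keyword_counts(html, dictionary):
    """Return a Counter of dictionary keywords found in the page text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    counts = Counter()
    for token in re.findall(r"[a-z]+", text):
        # Toy stand-in for stemming: try the token, then the token minus a final "s".
        stripped = token[:-1] if token.endswith("s") else token
        for candidate in (token, stripped):
            if candidate in dictionary:
                counts[candidate] += 1
                break
    return counts


page = ("<html><head><style>p { color: red; }</style></head>"
        "<body><p>A new bag and new furniture.</p><p>Bags, bags!</p></body></html>")
counts = extract_keyword_counts(page, {"bag", "furniture"})
```

Words that are not in the dictionary, such as "new", are simply dropped here; a real implementation for Korean would rely on a morphological analyzer rather than the suffix rule used above.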

The keyword processing unit (108) calculates the interest profile of the web page using the keywords and frequencies extracted by the keyword extraction unit (106) together with the per-segment interest information of each keyword stored in the keyword database (102). The interest profile of a web page is an indicator of which demographic segments are interested in that page. For example, it may be expressed as the percentage of relative interest in the page for each of the same segments used in the keyword database (102).

FIG. 2 is a flowchart (200) showing the process of calculating the interest profile of a web page from the keywords and frequencies extracted from it, according to an embodiment of the present invention.

First, for each extracted keyword, the per-segment interest information of that keyword is retrieved from the keyword database (102) (202). Next, each segment's value in the retrieved interest information is multiplied by the frequency of the keyword (204). Finally, the frequency-weighted interest values are summed over all keywords of the web page, and the result is normalized so that the values over all segments sum to 100% (206). This yields the interest profile.

The interest profile generated in this way can be expressed as a vector whose elements are the per-segment values. For example, when the interest information is divided into ten gender and age segments, from males in their teens to females in their 50s, as described above, the interest profile of the web page can be expressed as a 10-dimensional vector with ten elements.

The process of deriving the interest profile of a web page is illustrated by the following example. Suppose that two keywords, "furniture" and "bag", are extracted from a given web page, with frequencies 2 and 3 respectively. The interest information of these keywords by gender and age segment is assumed to be as given in Table 1.

First, multiplying the per-segment interest values for "furniture" in Table 1 by 2, the frequency of "furniture" in the web page, gives the following. (The values are in the same order as in Table 1.)

(14.4, 19.2, 34.4, 16.4, 8.8, 16.4, 22.0, 39.4, 18.6, 10.4)

Next, multiplying the interest values for "bag" by 3, the frequency of "bag", in the same way gives:

(28.8, 34.2, 23.4, 12.9, 6.0, 52.8, 63.3, 42.9, 24.6, 11.1)

Next, summing the frequency-weighted interest values over all keywords of the web page (furniture and bag) gives:

(43.2, 53.4, 57.8, 29.3, 14.8, 69.2, 85.3, 82.3, 43.2, 21.5)

Dividing this by the total frequency, 5, so that the elements sum to 100 gives the normalized result:

(8.64, 10.68, 11.56, 5.86, 2.96, 13.84, 17.06, 16.46, 8.64, 4.3)

As the example shows, the normalized interest information can be expressed as a 10-dimensional vector. The values calculated in this way represent the interest in the web page by gender and age group: the higher a value, the greater the relative interest of that segment in the web page.
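The calculation in this example can be reproduced with a short Python sketch. It uses the Table 1 values and the frequencies stated for the example (furniture 2, bag 3). The names `KEYWORD_DB` and `page_interest` are illustrative, and normalizing by the sum of the elements is an assumption that coincides with dividing by the total frequency whenever each keyword's row sums to 100.

```python
# Per-segment interest values from Table 1 (male 10s..50s, then female 10s..50s).
KEYWORD_DB = {
    "furniture": [7.2, 9.6, 17.2, 8.2, 4.4, 8.2, 11.0, 19.7, 9.3, 5.2],
    "bag":       [9.6, 11.4, 7.8, 4.3, 2.0, 17.6, 21.1, 14.3, 8.2, 3.7],
}


def page_interest(keyword_counts, keyword_db):
    """Frequency-weighted sum of keyword interest vectors, scaled to sum to 100."""
    dims = len(next(iter(keyword_db.values())))
    summed = [0.0] * dims
    for keyword, freq in keyword_counts.items():
        interest = keyword_db.get(keyword)
        if interest is None:
            continue  # keywords missing from the database are ignored in this sketch
        for i, value in enumerate(interest):
            summed[i] += freq * value
    total = sum(summed)
    if total == 0:
        return summed
    return [100.0 * v / total for v in summed]


profile = page_interest({"furniture": 2, "bag": 3}, KEYWORD_DB)
print([round(v, 2) for v in profile])
```

Up to floating-point rounding, the printed vector agrees with the normalized vector of the example, (8.64, 10.68, 11.56, 5.86, 2.96, 13.84, 17.06, 16.46, 8.64, 4.3).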

Finally, the advertisement selection unit (110) selects the advertisements corresponding to the web page by comparing the interest profile calculated by the keyword processing unit (108) with the target audience information stored for each advertisement in the advertisement database (104). For each advertisement stored in the advertisement database (104), the advertisement selection unit (110) generates an advertisement profile vector whose elements are the expected preferences for each demographic segment. It then computes the inner product of each advertisement profile vector with the interest profile vector of the web page and compares the resulting values to select the advertisements corresponding to the web page.

For example, the advertisement profile vectors generated from the target audience information of the women's clothing shopping mall and the foreign car information site in Table 2 are as follows.

Women's clothing shopping mall: (0, 0, 0, 0, 0, 10, 40, 40, 10, 0)

Foreign car information site: (0, 10, 25, 25, 15, 0, 0, 10, 10, 5)

Next, each advertisement profile vector is normalized so that its elements sum to 1, and its inner product with the interest profile vector generated by the keyword processing unit (108) is computed, giving the following values.

Women's clothing shopping mall: 78.28

Foreign car information site: 42.96

These figures correspond to the summed vector (43.2, 53.4, 57.8, 29.3, 14.8, 69.2, 85.3, 82.3, 43.2, 21.5) before it is divided by 5. With the normalized interest profile vector the values are 15.656 and 8.592, and the ordering is unchanged.

The inner product represents the similarity between the segments interested in the web page and the target audience of the advertisement: the larger the inner product, the more similar the two are. The advertisement selection unit (110) may compare the inner products and select, as the advertisements matching the web page, either the single advertisement with the largest value or a predetermined number of advertisements in descending order of inner product. In the example above, the women's clothing shopping mall has a higher inner product than the foreign car information site, so it may be selected as the advertisement matching the web page.
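The selection step can be sketched in Python as follows, under the assumption that the target audience vectors of Table 2 are normalized to sum to 1 before the inner product is taken. The names `AD_DB`, `rank_ads`, and `top_n` are illustrative and do not come from the specification.

```python
# Target audience vectors from Table 2 (male 10s..50s, then female 10s..50s).
AD_DB = {
    "Women's clothing shopping mall": [0, 0, 0, 0, 0, 10, 40, 40, 10, 0],
    "Foreign car information site":   [0, 10, 25, 25, 15, 0, 0, 10, 10, 5],
}

# Normalized interest profile of the example web page.
PAGE_PROFILE = [8.64, 10.68, 11.56, 5.86, 2.96, 13.84, 17.06, 16.46, 8.64, 4.3]


def normalize(vec):
    """Scale a vector so that its elements sum to 1 (all-zero vectors are left as is)."""
    total = sum(vec)
    return [v / total for v in vec] if total else list(vec)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def rank_ads(page_vector, ad_db, top_n=1):
    """Return the top_n (score, name) pairs in descending order of inner product."""
    scored = [(dot(page_vector, normalize(target)), name)
              for name, target in ad_db.items()]
    scored.sort(reverse=True)
    return scored[:top_n]


best_score, best_ad = rank_ads(PAGE_PROFILE, AD_DB)[0]
```

With the normalized profile, the scores are about 15.656 for the shopping mall and 8.592 for the car site. Multiplying the page vector by a constant scales every score by the same factor, so using the pre-normalization sum vector instead gives 78.28 and 42.96 and the same ranking.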

The matching system (100) according to an embodiment of the present invention may further include a keyword expansion unit (not shown) and an extended keyword database (not shown).

When a keyword extracted by the keyword extraction unit (106) is not stored in the keyword database (102), the keyword expansion unit selects that keyword as an extended keyword and calculates its per-segment interest information and its frequency in the web page. In this embodiment, the keyword database (102) mainly stores keywords that users use frequently or that readily reveal users' preferences. When a keyword not stored in the keyword database (102) is extracted from a web page, the keyword expansion unit stores it in the extended keyword database.

The keyword expansion unit may calculate the interest values of an extended keyword in the same way that the per-segment interest values of the keywords in the keyword database (102) are calculated. That is, using the keywords entered by users and the personal information of the users who entered them, it can calculate how often each segment entered the extended keyword.

The extended keyword database stores the per-segment interest information of the extended keywords calculated by the keyword expansion unit.

When extended keywords are stored in this way and a keyword extracted by the keyword extraction unit (106) is not in the keyword database (102), the keyword processing unit (108) determines whether the keyword is stored in the extended keyword database. If it is, the keyword processing unit retrieves the keyword's per-segment interest information from the extended keyword database. However, keywords stored in the extended keyword database should not carry the same importance as keywords stored in the keyword database (102), so the retrieved interest information is multiplied by a separate weight. The weight is determined by the frequency of the extended keyword in the web page, specifically as follows.

First, when the frequency is at or below a preset lower bound, the weight is set to 0. An extended keyword whose frequency does not exceed this value is unlikely to be related to the main subject of the web page, so its interest values are excluded from the calculation.

Next, when the frequency reaches or exceeds a preset upper bound, a preset maximum weight is applied. For example, if the preset maximum weight is A, each element of the interest vector of such an extended keyword is multiplied by A.

Finally, when the frequency lies between the lower and upper bounds, the weight may be the maximum weight multiplied by the ratio of the keyword's frequency to the upper-bound frequency. For example, when the upper-bound frequency is 10, the frequency of a particular extended keyword is 6, and the maximum weight is 0.8, the weight of that extended keyword may be set to 0.8 * 6 / 10 = 0.48.
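The three cases above amount to a piecewise function of the frequency, sketched below in Python. The function name `extended_keyword_weight`, the lower bound of 2 used in the call, and the treatment of the boundary values as inclusive are assumptions made for illustration; the specification fixes only the upper bound of 10 and the maximum weight of 0.8 in its example.

```python
def extended_keyword_weight(freq, lower_bound, upper_bound, max_weight):
    """Weight applied to an extended keyword's interest vector, by page frequency."""
    if freq <= lower_bound:
        return 0.0              # too rare to reflect the page's main subject
    if freq >= upper_bound:
        return max_weight       # capped at the preset maximum weight
    # In between: the maximum weight scaled by freq / upper_bound.
    return max_weight * freq / upper_bound


# Example from the text: upper bound 10, frequency 6, maximum weight 0.8.
# The lower bound of 2 is a made-up value for this sketch.
w = extended_keyword_weight(freq=6, lower_bound=2, upper_bound=10, max_weight=0.8)
print(round(w, 2))
```

The call reproduces the 0.8 * 6 / 10 = 0.48 case from the text, up to floating-point rounding.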

FIG. 3 is a flowchart showing a method (300) for matching content and advertisements according to an embodiment of the present invention.

First, keywords and the frequency of each keyword are extracted from a web page (302). The extraction of keywords and their frequencies is as described above.

Next, the interest profile of the web page is calculated using the extracted keywords and frequencies together with the per-segment interest information of each keyword retrieved from the keyword database (102) (304). As described above, the interest profile of the web page can be represented as a multidimensional vector whose elements are the interest values of the individual segments.

Next, the advertisements corresponding to the web page are selected by comparing the calculated interest profile with the target audience information stored for each advertisement in the advertisement database (104). As described above, the selection may be made by taking the inner product of the interest profile and the per-segment target audience information, both expressed as vectors, and comparing the resulting values.

한편, 본 발명의 실시예는 본 명세서에서 기술한 방법들을 컴퓨터상에서 수행하기 위한 프로그램을 포함하는 컴퓨터 판독 가능 기록매체를 포함할 수 있다. 상기 컴퓨터 판독 가능 기록매체는 프로그램 명령, 로컬 데이터 파일, 로컬 데이터 구조 등을 단독으로 또는 조합하여 포함할 수 있다. 상기 매체는 본 발명을 위하여 특별히 설계되고 구성된 것들이거나 컴퓨터 소프트웨어 분야에서 통상의 지식을 가진 자에게 공지되어 사용 가능한 것일 수도 있다. 컴퓨터 판독 가능 기록매체의 예에는 하드 디스크, 플로피 디스크 및 자기 테이프와 같은 자기 매체, CD-ROM, DVD와 같은 광 기록 매체, 플로피 디스크와 같은 자기-광 매체, 및 롬, 램, 플래시 메모리 등과 같은 프로그램 명령을 저장하고 수행하도록 특별히 구성된 하드웨어 장치가 포함된다. 프로그램 명령의 예에는 컴파일러에 의해 만들어지는 것과 같은 기계어 코드뿐만 아니라 인터프리터 등을 사용해서 컴퓨터에 의해서 실행될 수 있는 고급 언어 코드를 포함할 수 있다.Meanwhile, an embodiment of the present invention may include a computer readable recording medium including a program for performing the methods described herein on a computer. The computer-readable recording medium may include program instructions, local data files, local data structures, etc. alone or in combination. The media may be those specially designed and constructed for the purposes of the present invention, or they may be of the kind well-known and available to those skilled in the computer software arts. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape, optical recording media such as CD-ROMs, DVDs, magnetic-optical media such as floppy disks, and ROM, RAM, flash memory, and the like. Hardware devices specifically configured to store and execute program instructions are included. Examples of program instructions may include high-level language code that can be executed by a computer using an interpreter as well as machine code such as produced by a compiler.

Although the present invention has been described in detail above with reference to representative embodiments, those of ordinary skill in the art to which the present invention pertains will understand that various modifications can be made to the above-described embodiments without departing from the scope of the present invention.

Therefore, the scope of the present invention should not be limited to the described embodiments, but should be defined by the claims below and their equivalents.

100: system for matching content and advertisements
102: keyword database
104: advertisement database
106: keyword extraction unit
108: keyword processing unit
110: advertisement selection unit

Claims

A system for matching content and advertisements, comprising:
a keyword database storing a plurality of keywords and interest information for each layer section of each keyword;
an advertisement database storing a plurality of advertisements and target advertisement layer information of each advertisement;
a keyword extraction unit that extracts keywords and the frequency of each keyword from a web page;
a keyword processing unit that calculates an interest propensity of the web page using the extracted keywords and frequencies and the interest information for each layer section of each keyword stored in the keyword database; and
an advertisement selection unit that selects an advertisement corresponding to the web page by comparing the interest propensity of the web page calculated by the keyword processing unit with the target advertisement layer information of each advertisement stored in the advertisement database.

The system of claim 1,
wherein the target advertisement layer information of an advertisement is the expected preference for the advertisement in each layer section.

The system of claim 2,
wherein the keyword extraction unit generates text by removing the HTML code from the web page, and extracts the keywords and their frequencies by morphologically analyzing the text.
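
By way of illustration only, the extraction described in this claim can be sketched with the Python standard library. The sketch makes two assumptions that are not in the source: script and style contents are discarded together with the tags, and the morphological analysis is replaced by plain lowercase word tokenization, which is a much simpler stand-in. The class and function names are hypothetical.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text and skips the contents of script/style tags."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def extract_keyword_frequencies(html: str) -> Counter:
    """Strip HTML, then count word tokens (stand-in for morphological analysis)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.parts)
    tokens = re.findall(r"\w+", text.lower())
    return Counter(tokens)


sample = ("<html><body><h1>Camera review</h1>"
          "<p>This camera has a fast lens.</p>"
          "<script>var x = 1;</script></body></html>")
freq = extract_keyword_frequencies(sample)
print(freq["camera"], freq["lens"])
```

A real implementation would presumably keep only tokens that a morphological analyzer identifies as candidate keywords; this sketch counts every word token.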

The system of claim 2,
wherein the keyword processing unit multiplies the interest information for each layer section corresponding to each keyword extracted by the keyword extraction unit by the frequency of that keyword, sums the frequency-multiplied interest information over all keywords of the web page for each layer section,
and generates an interest propensity vector whose elements are the summed interest values of the respective layer sections.
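
As a non-authoritative sketch of this calculation, the interest propensity vector can be computed as a frequency-weighted sum of per-keyword interest vectors. The dictionary layout of the keyword database, the function name, and the sample values are assumptions made for illustration. Keywords absent from the database are simply skipped here.

```python
from typing import Dict, List


def interest_propensity(keyword_freq: Dict[str, int],
                        keyword_db: Dict[str, List[float]],
                        n_sections: int) -> List[float]:
    """Sum frequency * interest over all known keywords, per layer section."""
    vector = [0.0] * n_sections
    for keyword, freq in keyword_freq.items():
        interest = keyword_db.get(keyword)
        if interest is None:
            continue  # keyword not in the database: ignored in this sketch
        for i in range(n_sections):
            vector[i] += freq * interest[i]
    return vector


# Hypothetical interest information for three layer sections.
keyword_db = {"camera": [0.1, 0.6, 0.3], "lens": [0.0, 0.5, 0.5]}
page_keywords = {"camera": 2, "lens": 1, "review": 1}
vec = interest_propensity(page_keywords, keyword_db, 3)
print(vec)
```

Each element of the result is the total interest of one layer section in the page, so a keyword that occurs more often contributes proportionally more.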

The system of claim 4,
wherein the advertisement selection unit
generates, for each advertisement stored in the advertisement database, an advertisement propensity vector whose elements are the expected preferences for the respective layer sections, computes the inner product of the generated advertisement propensity vector and the interest propensity vector of the web page, and selects the advertisement corresponding to the web page by comparing the magnitudes of the inner products.

The system of claim 1, further comprising:
a keyword expansion unit that, when a keyword extracted by the keyword extraction unit is not stored in the keyword database, selects that keyword as an extended keyword and calculates the interest information for each layer section of the extended keyword and its frequency in the web page; and
an extended keyword database storing the extended keywords and the interest information for each layer section of each extended keyword.

The system of claim 6,
wherein the keyword processing unit,
when a keyword extracted by the keyword extraction unit is not stored in the keyword database, determines whether the keyword is stored in the extended keyword database and, if so, extracts the interest information for each layer section of the keyword from the extended keyword database.

The system of claim 7,
wherein, when the keyword processing unit extracts the interest information for each layer section of the keyword from the extended keyword database, the keyword processing unit applies a weight to that interest information according to the frequency of the extended keyword in the web page.
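
The lookup order described for the keyword database and the extended keyword database can be sketched as follows, purely as an illustration. The source states only that a weight depending on the frequency of the extended keyword is applied; the formula `freq / (freq + 1)` below is a hypothetical choice, as are the function names and sample data.

```python
from typing import Dict, List, Optional


def extended_weight(freq: int) -> float:
    """Hypothetical weight: grows with frequency and stays below 1."""
    return freq / (freq + 1.0)


def lookup_interest(keyword: str,
                    freq: int,
                    keyword_db: Dict[str, List[float]],
                    extended_db: Dict[str, List[float]]) -> Optional[List[float]]:
    """Look in the keyword database first, then in the extended database."""
    if keyword in keyword_db:
        return list(keyword_db[keyword])
    if keyword in extended_db:
        w = extended_weight(freq)  # weight applied only to extended keywords
        return [w * v for v in extended_db[keyword]]
    return None  # unknown keyword: a candidate for keyword expansion


keyword_db = {"camera": [0.1, 0.6, 0.3]}
extended_db = {"mirrorless": [0.2, 0.4, 0.4]}
known = lookup_interest("camera", 5, keyword_db, extended_db)
extended = lookup_interest("mirrorless", 1, keyword_db, extended_db)
missing = lookup_interest("tripod", 1, keyword_db, extended_db)
print(known, extended, missing)
```

With this choice of weight, an extended keyword that appears only once contributes half of its stored interest, reflecting lower confidence in interest information derived by expansion.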

A method for matching content and advertisements, comprising:
extracting, by a system for matching content and advertisements, keywords and the frequency of each keyword from a web page;
calculating, by the system, an interest propensity of the web page using the extracted keywords and frequencies and interest information for each layer section of each keyword; and
selecting, by the system, an advertisement corresponding to the web page by comparing the calculated interest propensity of the web page with target advertisement layer information of each advertisement.

The method of claim 9,
wherein the target advertisement layer information of an advertisement is the expected preference for the advertisement in each layer section.

The method of claim 10,
wherein the extracting of the keywords and frequencies comprises:
generating text by removing the HTML code from the web page; and
morphologically analyzing the text to extract the keywords and their frequencies.

The method of claim 10,
wherein the calculating of the interest propensity of the web page comprises:
multiplying the interest information for each layer section corresponding to each keyword extracted in the extracting step by the frequency of that keyword; and
generating an interest propensity vector whose elements are, for each layer section, the sum of the frequency-multiplied interest information over all keywords of the web page.

The method of claim 12,
wherein the selecting of the advertisement comprises
generating, for each advertisement, an advertisement propensity vector whose elements are the expected preferences for the respective layer sections, and selecting the advertisement corresponding to the web page by comparing the magnitudes of the inner products of the generated advertisement propensity vectors and the interest propensity vector of the web page.

A computer-readable recording medium storing a program for performing the method of claim 9 on a computer.