KR20090112020A

KR20090112020A - System and method for extracting caption candidate and system and method for extracting image caption using text information and structural information of document

Info

Publication number: KR20090112020A
Application number: KR1020080037684A
Authority: KR
Inventors: 임해창; 이재범; 김지승; 이상호
Original assignee: 엔에이치엔(주)
Priority date: 2008-04-23
Filing date: 2008-04-23
Publication date: 2009-10-28
Also published as: KR100955758B1

Abstract

PURPOSE: A caption candidate extraction system and method, and an image caption extraction system and method, using text information and structural information of a document are provided to improve image caption extraction performance by calculating the probability that an image-caption candidate pair is a correct pair. CONSTITUTION: An image caption extraction system comprises an image-caption candidate pair generation unit, a feature extraction unit, and an image caption selection unit. The image-caption candidate pair generation unit generates an image-caption candidate pair for each image included in a document using text information and structural information of the document (S801). The feature extraction unit extracts features for each generated image-caption candidate pair (S802). The image caption selection unit selects an image caption from the generated image-caption candidate pairs using probabilities based on the extracted features (S803).

Description

SYSTEM AND METHOD FOR EXTRACTING CAPTION CANDIDATE AND SYSTEM AND METHOD FOR EXTRACTING IMAGE CAPTION USING TEXT INFORMATION AND STRUCTURAL INFORMATION OF DOCUMENT

The present invention relates to a caption candidate extraction system and method, and to an image caption extraction system and method, that use text information and structural information of a document. More specifically, it relates to a caption candidate extraction system and method and an image caption extraction system and method that extract caption candidates for each image in a document to generate image-caption candidates, and then extract an image caption by calculating, for each generated image-caption candidate, a probability obtained through prior training.

Recently, interest has been growing in techniques for searching for images included in documents. However, a web document typically contains many different images, not all of which are the image a searcher is actually looking for.

In this case, image search performance may improve if, rather than relying on attributes of the image itself, an image caption (words that describe the image) is extracted and the search is performed against that caption. Consequently, the performance of image caption extraction can affect the performance of the image search system.

However, because an image caption is only one of many texts included in a document, a criterion is needed for selecting the image caption from among those texts. It is also necessary to select in advance the images that can be targets of image caption extraction. Furthermore, because several texts may describe the same image, a criterion is needed for selecting the best image caption among them.

The present invention provides a caption candidate extraction system and method and an image caption extraction system and method that extract image captions efficiently by generating image-caption candidate pairs for a document using the document's text information and structural information.

The present invention provides a caption candidate extraction system and method and an image caption extraction system and method that can improve caption candidate extraction performance by using rules based on image attributes and text attributes.

The present invention provides a caption candidate extraction system and method and an image caption extraction system and method that efficiently extract caption candidates for an image by structurally parsing a document and extracting the actual position values of the document's components as rendered in a browser.

The present invention provides an image caption extraction system and method that improve image caption extraction performance by using a pre-trained probability-based classification model to calculate the probability that each generated image-caption candidate pair is a correct pair, and extracting the image caption accordingly.

A caption candidate extraction system according to an embodiment of the present invention may include a document parsing unit that parses a document to extract text information and structural information of the document, an image filtering unit that performs image filtering to determine which of the images included in the document are targets of image caption extraction, and a caption candidate determination unit that determines, from the text included in the document, caption candidates for each image that is a target of image caption extraction.

An image caption extraction system according to an embodiment of the present invention may include an image-caption candidate pair generation unit that generates image-caption candidate pairs for each of the images included in a document using text information and structural information of the document, a feature extraction unit that extracts features for each generated image-caption candidate pair, and an image caption selection unit that selects an image caption from the generated image-caption candidate pairs using probabilities based on the extracted features.

A caption candidate extraction method according to an embodiment of the present invention may include parsing a document to extract text information and structural information of the document, performing image filtering to determine which of the images included in the document are targets of image caption extraction, and determining, from the text included in the document, caption candidates for each image that is a target of image caption extraction.

An image caption extraction method according to an embodiment of the present invention may include generating image-caption candidate pairs for each of the images included in a document using text information and structural information of the document, extracting features for each generated image-caption candidate pair, and selecting an image caption from the generated image-caption candidate pairs using probabilities based on the extracted features.

According to the present invention, there are provided a caption candidate extraction system and method and an image caption extraction system and method that extract image captions efficiently by generating image-caption candidate pairs for a document using the document's text information and structural information.

According to the present invention, there are provided a caption candidate extraction system and method and an image caption extraction system and method that can improve caption candidate extraction performance by using rules based on image attributes and text attributes.

According to the present invention, there are provided a caption candidate extraction system and method and an image caption extraction system and method that efficiently extract caption candidates for an image by structurally parsing a document and extracting the actual position values of the document's components as rendered in a browser.

According to the present invention, there are provided an image caption extraction system and method that improve image caption extraction performance by using a pre-trained probability-based classification model to calculate the probability that each generated image-caption candidate pair is a correct pair, and extracting the image caption accordingly.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings. However, the present invention is not restricted or limited by these embodiments. Like reference numerals in the drawings denote like elements. The caption candidate extraction method according to an embodiment of the present invention may be performed by the caption candidate extraction system, and the image caption extraction method according to an embodiment of the present invention may be performed by the image caption extraction system.

FIG. 1 is a diagram illustrating a process of extracting an image caption list using an image caption extraction system according to an embodiment of the present invention.

Referring to FIG. 1, the image caption extraction system 103 of the present invention may use the text information and structural information surrounding each image to extract an image caption for each image included in each of the HTML web documents 102-1 to 102-n, and thereby generate an image caption list 104. Although FIG. 1 shows the web 100, the invention may also be applied to online environments other than the web.

For example, an image included in the HTML web documents 102-1 to 102-n may or may not have an image caption, and at most one image caption is extracted per image. Here, an image caption means text that is provided separately from the body of the HTML web documents 102-1 to 102-n in order to describe an image.

The image caption extraction system 103 may determine caption candidates for each image included in the HTML web documents 102-1 to 102-n. Here, a caption candidate means a unit of text, among the text included in the HTML web documents 102-1 to 102-n, that could be the image caption for the image.

According to an embodiment of the present invention, the image caption extraction system 103 may structurally parse the HTML web documents 102-1 to 102-n to extract the actual position values (structural information) of images and caption candidates as rendered in the web browsers 101-1 to 101-n. That is, when an HTML web document is presented through a web browser, the image caption extraction system 103 may extract the actual positions of the images and caption candidates in that browser. The type of web browser 101-1 to 101-n is not limited. In addition, the HTML web documents 102-1 to 102-n are merely an example and may be replaced by documents written in another language (for example, XML) in which tags define the structure of the document.

As a result, by using the structural information extracted by parsing the HTML web documents 102-1 to 102-n, the present invention can reflect the actual distances perceived visually through the web browsers 101-1 to 101-n and extract a more accurate image caption for each image. In addition, by using structural information, the present invention can extract image captions that are larger units of text drawn from a wider area.

FIG. 2 is a block diagram showing the overall configuration of a caption candidate extraction system according to an embodiment of the present invention.

Referring to FIG. 2, the caption candidate extraction system 200 may include a document parsing unit 201, an image filtering unit 202, and a caption candidate determination unit 203. The caption candidate extraction system 200 may be a component of the image caption extraction system 103.

The document parsing unit 201 may parse a document to extract text information and structural information of the document. Here, text information may mean the form of the text included in the document, such as its length, width, and font, and structural information may mean the positions of the document's components.

For example, the document parsing unit 201 may parse the document to extract the actual position values of document components as rendered in a browser. By extracting the position values of each image and each text included in the document, the document parsing unit 201 allows the actual distance between them in the web browser to be determined.

Here, the document may include an HTML web document. For example, the document parsing unit 201 may parse an HTML web document using a parser that conforms to the HTML interface.

The image filtering unit 202 may perform image filtering to determine which of the images included in the document are targets of image caption extraction. To do so, the image filtering unit 202 may use filters based on attributes of the images.

For example, the filters based on image attributes may include a filter based on image size, a filter based on image aspect ratio, a filter based on image file name, and a filter based on duplicate placement of an image within the document.

For example, using the size filter, the image filtering unit 202 may filter out images whose width multiplied by height is smaller than a preset value. Using the aspect-ratio filter, the image filtering unit 202 may filter out an image whose aspect ratio is greater than or equal to a preset value.

In addition, using the file-name filter, the image filtering unit 202 may filter out an image whose file name contains a word that refers to a menu, logo, banner, or similar element of the document, such as banner, icon, menu, logo, top, bottom, left, or right. Using the duplicate-placement filter, the image filtering unit 202 may filter out images that appear more than once in the document. However, images laid out as an array may be exempt from this filter, because there may be an image caption that describes the array of images as a whole.

That is, when an image included in a document is likely to be an incidental component of the document, a caption for that image may be meaningless. The filters based on image attributes are therefore filters that remove images corresponding to incidental components of the document.
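The four attribute filters described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Image` record, the `in_array` flag, the threshold values, and the choice to measure aspect ratio as long side over short side are all assumptions, since the text only refers to "preset" values.

```python
from collections import Counter
from dataclasses import dataclass

# Illustrative thresholds (assumptions); the description only says "preset value".
MIN_AREA = 100 * 100
MAX_ASPECT_RATIO = 3.0
BLOCKED_WORDS = ("banner", "icon", "menu", "logo", "top", "bottom", "left", "right")


@dataclass(frozen=True)
class Image:
    src: str
    width: int
    height: int
    in_array: bool = False  # hypothetical flag: image belongs to an array-style layout


def is_caption_target(img: Image, src_counts: Counter) -> bool:
    """Return True if the image passes all four attribute filters."""
    # Size filter: drop images whose width x height is below the preset value.
    if img.width * img.height < MIN_AREA:
        return False
    # Aspect-ratio filter: drop images at or above the preset ratio.
    long_side = max(img.width, img.height)
    short_side = min(img.width, img.height)
    if long_side / short_side >= MAX_ASPECT_RATIO:
        return False
    # File-name filter: crude substring match, so "laptop.jpg" would also be dropped.
    file_name = img.src.rsplit("/", 1)[-1].lower()
    if any(word in file_name for word in BLOCKED_WORDS):
        return False
    # Duplicate filter, with the exception for array-style images.
    if src_counts[img.src] > 1 and not img.in_array:
        return False
    return True


def filter_images(images: list[Image]) -> list[Image]:
    """Keep only the images that remain targets of caption extraction."""
    counts = Counter(img.src for img in images)
    return [img for img in images if is_caption_target(img, counts)]
```

Because the size filter runs first with a positive `MIN_AREA`, the aspect-ratio division never sees a zero-length side.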

The caption candidate determination unit 203 may determine, from the text included in the document, caption candidates for each image that is a target of image caption extraction. Here, caption candidates may mean the unit texts, among the text included in the document, that are adjacent to an image targeted for caption extraction.

For example, for an image that the image filtering unit 202 has determined to be a target of image caption extraction, the caption candidate determination unit 203 may determine caption candidates from the unit texts adjacent to the image in each direction. The caption candidate determination unit 203 may extract unit texts using the tags of the document and determine caption candidates among them using the length of the text and its distance from the image.

For example, the caption candidate determination unit 203 may extract unit texts using the paragraph or table tags in the document. However, paragraph or table tags that are child nodes of such a tag may be excluded.
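One possible reading of this rule is sketched below using Python's standard `html.parser`. The assumptions are that `<p>` and `<td>` are the paragraph and table tags of interest, that the markup is well formed (every such tag is closed), and that "excluding" a nested tag means its text is folded into the enclosing unit rather than emitted as a unit of its own. The class name `UnitTextExtractor` is hypothetical.

```python
from html.parser import HTMLParser

UNIT_TAGS = {"p", "td"}  # assumed paragraph/table tags that delimit a unit text


class UnitTextExtractor(HTMLParser):
    """Collect one unit text per outermost <p>/<td>; nested ones do not start new units."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # how many unit tags are currently open
        self.buffer = []  # text fragments of the unit being built
        self.units = []   # finished unit texts, in document order

    def handle_starttag(self, tag, attrs):
        if tag in UNIT_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in UNIT_TAGS and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                # The outermost unit tag closed: normalize whitespace and emit.
                text = " ".join("".join(self.buffer).split())
                if text:
                    self.units.append(text)
                self.buffer = []

    def handle_data(self, data):
        if self.depth > 0:
            self.buffer.append(data)
```

Feeding the extractor `<td>Outer <p>inner text</p></td>` yields a single unit, `Outer inner text`, instead of two separate units.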

The caption candidate determination unit 203 may then determine caption candidates by considering whether an extracted unit text falls within a preset text length and whether it lies within a preset distance from the image (for example, no more than twice the average of the image's width and height).

For example, the caption candidate determination unit 203 may determine caption candidates from the unit texts adjacent to the image in each direction. When there is more than one unit text in a given direction, the unit text closest to the image may be determined as the caption candidate. If more than one unit text is at that closest distance, all of them may be determined as caption candidates.

Here, if none of the text included in the document satisfies the criteria for determining a caption candidate, no caption candidate may be determined for a particular image that is a target of image caption extraction. That is, each image targeted for caption extraction may have one or more caption candidates, or none at all.

In the present invention, the method of determining caption candidates from the unit texts adjacent to an image is not limited to the above example, and various methods may be applied.
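The candidate rule described above (length check, distance check, nearest unit text per direction, ties all kept) might look like the following sketch. The rendered positions are assumed to be available as axis-aligned boxes; the `Box` and `UnitText` types, the `MAX_TEXT_LENGTH` value, and the way a direction is assigned (the side with the largest edge gap) are illustrative assumptions rather than details given in the text.

```python
from dataclasses import dataclass

MAX_TEXT_LENGTH = 200  # assumed preset text-length limit


@dataclass(frozen=True)
class Box:
    x: float       # left edge as rendered
    y: float       # top edge as rendered
    width: float
    height: float


@dataclass(frozen=True)
class UnitText:
    text: str
    box: Box


def direction_and_distance(image: Box, text: Box):
    """Return (direction, edge gap) of the text relative to the image, or None if they overlap."""
    gaps = {
        "above": image.y - (text.y + text.height),
        "below": text.y - (image.y + image.height),
        "left": image.x - (text.x + text.width),
        "right": text.x - (image.x + image.width),
    }
    direction = max(gaps, key=gaps.get)
    if gaps[direction] < 0:
        return None
    return direction, gaps[direction]


def caption_candidates(image: Box, units: list[UnitText]) -> list[UnitText]:
    """Keep, for each direction, the unit text(s) nearest to the image."""
    # Example distance limit from the text: twice the mean of image width and height.
    max_distance = 2 * (image.width + image.height) / 2
    nearest = {}  # direction -> (distance, [unit texts at that distance])
    for unit in units:
        if not 0 < len(unit.text) <= MAX_TEXT_LENGTH:
            continue
        placed = direction_and_distance(image, unit.box)
        if placed is None:
            continue
        direction, distance = placed
        if distance > max_distance:
            continue
        best = nearest.get(direction)
        if best is None or distance < best[0]:
            nearest[direction] = (distance, [unit])
        elif distance == best[0]:
            best[1].append(unit)  # tie: every equally close unit text is a candidate
    return [unit for _, tied in nearest.values() for unit in tied]
```

An empty result corresponds to the case where no caption candidate is determined for the image.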

FIG. 3 is a block diagram showing the overall configuration of an image caption extraction system according to an embodiment of the present invention.

Referring to FIG. 3, the image caption extraction system 300 may include an image-caption candidate pair generation unit 301, a feature extraction unit 302, and an image caption selection unit 303. Here, the image-caption candidate pair generation unit 301 may be replaced by the caption candidate extraction system 200 shown in FIG. 2.

The image-caption candidate pair generation unit 301 may generate image-caption candidate pairs for each of the images included in a document using text information and structural information of the document. Once caption candidates have been determined for an image according to preset rules, image-caption candidate pairs are generated for that image.

Here, the image-caption candidate pair generation unit 301 may include a document parsing unit 304, an image filtering unit 305, and a caption candidate determination unit 306. That is, the image-caption candidate pair generation unit 301 may determine caption candidates for the images of the document and generate image-caption candidate pairs for each image according to the determined caption candidates.

The document parsing unit 304 may parse a document to extract text information and structural information of the document. In doing so, the document parsing unit 304 may extract the actual position values of document components as rendered in a browser. For example, when the document is an HTML web document, the document parsing unit 304 may parse it using a parser that conforms to the HyperText Markup Language (HTML) interface.

The image filtering unit 305 may perform image filtering to determine which of the images included in the document are targets of image caption extraction. For example, the image filtering unit 305 may do so using filters based on attributes of the images. For these filters, refer to the description of FIG. 2.

The caption candidate determination unit 306 may determine, from the text included in the document, caption candidates for each image that is a target of image caption extraction. The caption candidate determination unit 306 may determine caption candidates from the unit texts adjacent, in each direction, to the image targeted for caption extraction.

When caption candidates are being determined for an image that is a target of image caption extraction, if none of the text included in the document satisfies the preset criteria for a caption candidate, no caption candidate may be determined for that image. For an image with no caption candidate, the steps of extracting features from image-caption candidate pairs and selecting an image caption are not performed.

For example, the caption candidate determination unit 306 may extract unit texts using the tags of the document and determine caption candidates among them using the length of the text and its distance from the image. For a specific example of determining caption candidates, refer to the description of FIG. 2.

The feature extraction unit 302 may extract features for each generated image-caption candidate pair. For example, the feature extraction unit 302 may extract the features of a generated image-caption candidate pair using text information and structural information. Here, a feature may be an attribute of the image, an attribute of the caption candidate, or an attribute of the relationship between the image and the caption candidate.

For each generated image-caption candidate pair, the feature extraction unit 302 may extract features including image size, image aspect ratio, image format, caption candidate length, whether the caption candidate is repeated, whether a font tag is used, the distance between the image and the caption candidate, the direction of the caption candidate, the width of the caption candidate relative to the image width, whether an anchor tag is used, whether a caption keyword is present, and whether sentence-ending punctuation is present.

The features mentioned above are merely examples and may vary with the configuration of the system. Each of them is described in detail with reference to FIG. 6.
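As a rough illustration of turning one image-caption candidate pair into the listed features, consider the sketch below. The precise feature definitions are given with FIG. 6, outside this passage, so every definition here is an assumption: the `ImageInfo` and `CandidateInfo` records, the `CAPTION_KEYWORDS` list, and the punctuation set are hypothetical.

```python
from dataclasses import dataclass

CAPTION_KEYWORDS = ("photo", "figure", "image")  # hypothetical keyword list
TERMINAL_MARKS = (".", "!", "?")                 # assumed sentence-ending punctuation


@dataclass(frozen=True)
class ImageInfo:
    src: str
    width: int   # assumed positive (tiny images were filtered out earlier)
    height: int


@dataclass(frozen=True)
class CandidateInfo:
    text: str
    width: int            # rendered width of the text block
    direction: str        # "above", "below", "left" or "right" of the image
    distance: float       # rendered distance from the image
    occurrences: int = 1  # times the same text appears in the document
    has_font_tag: bool = False
    has_anchor_tag: bool = False


def extract_features(image: ImageInfo, cand: CandidateInfo) -> dict:
    """Build a feature dictionary for one image-caption candidate pair."""
    text = cand.text.strip()
    file_name = image.src.rsplit("/", 1)[-1]
    return {
        "image_size": image.width * image.height,
        "image_aspect_ratio": image.width / image.height,
        "image_format": file_name.rsplit(".", 1)[-1].lower() if "." in file_name else "",
        "caption_length": len(text),
        "caption_repeated": cand.occurrences > 1,
        "font_tag": cand.has_font_tag,
        "distance": cand.distance,
        "direction": cand.direction,
        "width_ratio": cand.width / image.width,
        "anchor_tag": cand.has_anchor_tag,
        "caption_keyword": any(k in text.lower() for k in CAPTION_KEYWORDS),
        "terminal_punctuation": text.endswith(TERMINAL_MARKS),
    }
```

Numeric features such as `distance` would typically be bucketed before being passed to a model that expects categorical inputs; that step is omitted here.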

The image caption selection unit 303 may select an image caption from the generated image-caption candidate pairs using probabilities based on the extracted features. For example, the image caption selection unit 303 may use the extracted features with a pre-trained probability-based classification model to calculate the probability that an image-caption candidate pair is a correct pair.

Here, a pre-trained probability-based classification model means a model that has learned, from given training data, the process of selecting an image caption from image-caption candidate pairs. The probability-based classification model determines the probability that a caption candidate is the image caption from the probabilities associated with each feature of the image-caption candidate pair, and it may be trained in advance using the training data.
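The text does not name a specific model. As one example of a classifier that combines per-feature probabilities, the sketch below uses a categorical naive Bayes model with add-one smoothing. This is a swapped-in technique chosen for illustration, not necessarily the model used in the invention; the class name `NaiveBayes` and its methods are hypothetical.

```python
import math
from collections import defaultdict


class NaiveBayes:
    """Two-class categorical naive Bayes (label True = correct image-caption pair)."""

    def fit(self, samples, labels):
        self.class_counts = {True: 0, False: 0}
        # counts[label][feature_name][value] -> number of training pairs
        self.counts = {
            True: defaultdict(lambda: defaultdict(int)),
            False: defaultdict(lambda: defaultdict(int)),
        }
        self.values = defaultdict(set)  # feature_name -> values seen in training
        for features, label in zip(samples, labels):
            self.class_counts[label] += 1
            for name, value in features.items():
                self.counts[label][name][value] += 1
                self.values[name].add(value)
        return self

    def predict_proba(self, features):
        """Return the estimated probability that the pair is a correct pair."""
        total = sum(self.class_counts.values())
        log_scores = {}
        for label in (True, False):
            # Smoothed class prior.
            score = math.log((self.class_counts[label] + 1) / (total + 2))
            for name, value in features.items():
                seen = self.counts[label].get(name, {}).get(value, 0)
                # +1 reserves probability mass for values unseen in training.
                n_values = len(self.values.get(name, ())) + 1
                score += math.log((seen + 1) / (self.class_counts[label] + n_values))
            log_scores[label] = score
        # Normalize in log space to avoid underflow.
        top = max(log_scores.values())
        weights = {label: math.exp(s - top) for label, s in log_scores.items()}
        return weights[True] / (weights[True] + weights[False])
```

Training takes a list of feature dictionaries and a parallel list of booleans marking which pairs were labeled correct in the training data.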

The image caption selection unit 303 may then select the image caption by considering only those generated image-caption candidate pairs whose calculated probability is greater than or equal to a preset threshold and choosing, among them, the pair with the highest probability. Accordingly, if none of the probabilities calculated for an image's image-caption candidate pairs reaches the preset threshold, no image caption may be selected for that image.
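The selection rule amounts to a thresholded argmax, which might be written as in the sketch below. The function name `select_caption` and the default threshold of 0.5 are assumptions; the text only says the threshold is preset.

```python
from typing import Optional

THRESHOLD = 0.5  # assumed preset threshold


def select_caption(
    scored: list[tuple[str, float]], threshold: float = THRESHOLD
) -> Optional[str]:
    """Pick the caption candidate with the highest probability at or above the threshold.

    `scored` holds (caption candidate text, probability) for a single image.
    Returns None when no candidate reaches the threshold.
    """
    eligible = [(text, p) for text, p in scored if p >= threshold]
    if not eligible:
        return None
    return max(eligible, key=lambda pair: pair[1])[0]
```

Returning `None` models the case in which an image ends up with no image caption.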

FIG. 4 is a diagram illustrating a process of parsing a document according to an embodiment of the present invention.

Referring to FIG. 4, the document 401 may include an image 403 and text 404. As mentioned above, the document 401 may be a web document written in HTML. The present invention may extract an image caption describing the image from the text 404 adjacent to the image 403. Whether text included in the document 401 is adjacent to the image 403 may be determined using the structural information of the document 401.

The document parsing unit 304 may parse the document 401 to extract its text information and structural information. In doing so, the document parsing unit 304 may extract the actual position values of the document components (images and text) as rendered in a browser. The structural information may therefore include the actual position values of the document components in the web browser.

For example, when the document is an HTML web document, the document parsing unit 304 may parse the document 401 using a parser that conforms to a HyperText Markup Language (HTML) interface.

The tag data 402 represents the structure of the document 401 through the tags of the document 401. The document parsing unit 304 may parse the document 401 using this tag data 402.

For example, the image 403 may be parsed through the img tag (123.jpg in FIG. 4), and the text 404 may be parsed through the paragraph tag <p> or the table tag <td> (XYZ in FIG. 4). In this way, the text information and structural information of the document 401 may be extracted by the document parsing unit 304.
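The tag-based parsing described above can be sketched with the HTML parser in the Python standard library. This is a minimal illustration under stated assumptions: the class name DocumentParser and the function parse_document are hypothetical, and the sketch collects only tag-level information. It does not compute the rendered position values in a browser that the specification also mentions.

```python
from html.parser import HTMLParser


class DocumentParser(HTMLParser):
    """Collects <img> sources and the text of <p>/<td> elements."""

    TEXT_TAGS = {"p", "td"}

    def __init__(self):
        super().__init__()
        self.images = []   # src of each <img>, in document order
        self.texts = []    # text of each <p>/<td>, in document order
        self._open = []    # stack of currently open <p>/<td> tags
        self._buffer = []  # text fragments of the element being read

    def _flush(self):
        # Join buffered fragments and collapse runs of whitespace.
        text = " ".join("".join(self._buffer).split())
        if text:
            self.texts.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.images.append(src)
        elif tag in self.TEXT_TAGS:
            self._flush()
            self._open.append(tag)

    def handle_endtag(self, tag):
        if tag in self._open:
            self._flush()
            # Pop up to and including the matching open tag.
            while self._open.pop() != tag:
                pass

    def handle_data(self, data):
        # Only text inside a <p> or <td> is kept.
        if self._open:
            self._buffer.append(data)


def parse_document(html):
    """Return (image sources, unit texts) found in an HTML string."""
    parser = DocumentParser()
    parser.feed(html)
    parser.close()
    return parser.images, parser.texts
```

Applied to markup shaped like the tag data 402 of FIG. 4, with an img tag for 123.jpg in one table cell and the text XYZ in another, this sketch would return the image source and the text as separate lists.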

The image filtering unit 305 may then apply filters based on image attributes to the parsed image 403, performing image filtering to determine which of the images included in the document are image caption extraction targets. The caption candidate determiner 306 may determine whether the parsed text 404 is unit text that can serve as a caption candidate. Both the image filtering unit 305 and the caption candidate determiner 306 may use the text information and structural information extracted by parsing the document 401.

FIG. 5 is a diagram for describing a process of determining caption candidates and generating image-caption candidate pairs according to an embodiment of the present invention.

The image-caption candidate pair generator 301 may generate an image-caption candidate pair for each of the images included in a document, using the text information and structural information of the document.

In FIG. 5, the image 501 included in the document 500 is an image that has passed the image filtering performed by the image filtering unit 305. For example, the image 501 of FIG. 5 has passed a filter based on image size, a filter based on image aspect ratio, a filter based on image file name, and a filter based on duplicate placement of images within the document, and is therefore an image caption extraction target. In other words, image filtering is the process of determining which of the images included in a document are image caption extraction targets.
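The four attribute filters can be sketched as a single pass over the parsed images. This passage does not state concrete thresholds, so every constant below (minimum area, maximum aspect ratio, excluded file-name fragments) is an illustrative assumption, as are the Image type and the filter_images name. The duplicate-placement filter is interpreted here as rejecting an image source that appears more than once in the document, which is also an assumption.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Image:
    src: str
    width: int
    height: int


# Illustrative thresholds only; the specification gives no values here.
MIN_AREA = 100 * 100
MAX_ASPECT_RATIO = 3.0
EXCLUDED_NAME_PARTS = ("icon", "banner", "button", "logo")


def filter_images(images):
    """Return the images that remain image caption extraction targets."""
    occurrences = Counter(image.src for image in images)
    kept = []
    for image in images:
        if image.width <= 0 or image.height <= 0:
            continue
        # Filter based on image size.
        if image.width * image.height < MIN_AREA:
            continue
        # Filter based on aspect ratio (long side over short side).
        ratio = max(image.width, image.height) / min(image.width, image.height)
        if ratio > MAX_ASPECT_RATIO:
            continue
        # Filter based on file name.
        if any(part in image.src.lower() for part in EXCLUDED_NAME_PARTS):
            continue
        # Filter based on duplicate placement within the document.
        if occurrences[image.src] > 1:
            continue
        kept.append(image)
    return kept
```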

The texts 502, 503, 504, and 505 included in the document 500 are unit texts adjacent to the image 501. The caption candidate determiner 306 may use the tags of the document to extract, for the image 501 that is the image caption extraction target, the unit texts 502, 503, 504, and 505 adjacent to it in each direction. Although the directions shown in FIG. 5 are above, below, left, and right of the image, the present invention is not limited to these directions. As mentioned above, the caption candidate determiner 306 may extract the unit texts 502, 503, 504, and 505 using document tags including the paragraph tag <p> and the table tag <td>.

The caption candidate determiner 306 may then determine caption candidates from the extracted unit texts 502, 503, 504, and 505. If none of the unit texts 502, 503, 504, and 505 satisfies the criteria for a caption candidate, no caption candidate may be determined for the image 501. For an image with no caption candidate, neither the extraction of features from image-caption candidate pairs nor the selection of an image caption is performed.

The caption candidate determiner 306 may determine caption candidates among the extracted unit texts using the length of the text and its distance from the image. That is, the criteria for a caption candidate may be set in terms of the length of the unit text and its distance from the image. Because a caption candidate is text that could be the image caption describing the image 501, the permitted length is a length plausible for a descriptive phrase, and the permitted distance from the image is likewise a distance plausible for a descriptive phrase.

For example, suppose the permitted text length is 2 bytes to 500 bytes. If text 2 (503) is 600 bytes long, text 2 (503) is not determined to be a caption candidate. Likewise, suppose the permitted distance from the image is at most twice the average of the image's width and height. If the distance between text 3 (504) and the image 501 is three times the average of the width and height of the image 501, text 3 (504) is not determined to be a caption candidate.
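The two criteria in this example can be written as a short check. The sketch below uses the example limits from the text (2 to 500 bytes, and at most twice the average of the image's width and height). The function name, the use of UTF-8 to count bytes, and the way the distance is supplied as a precomputed number are assumptions, since the text does not specify the encoding or how the distance is measured.

```python
def is_caption_candidate(text, distance, image_width, image_height,
                         min_bytes=2, max_bytes=500, distance_factor=2.0):
    """Check the length and distance criteria for one unit text.

    The byte length is measured in UTF-8, which is an assumption.
    """
    length = len(text.encode("utf-8"))
    if not (min_bytes <= length <= max_bytes):
        return False
    # Permitted distance: a multiple of the average of width and height.
    max_distance = distance_factor * (image_width + image_height) / 2
    return distance <= max_distance
```

For a 200 by 100 image the average of width and height is 150, so unit text up to 300 units away would pass the distance criterion in this sketch, while text at three times the average (450) would not.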

Once the caption candidates 502 and 505 have been determined by the caption candidate determiner 306, image-caption candidate pairs may be generated for the image 501. At least one image-caption candidate pair may be generated.

The feature extractor 302 may then extract features for each of the generated image-caption candidate pairs. For example, the features extracted for an image-caption candidate pair may include the image size, the image aspect ratio, the image format, the length of the caption candidate, whether the caption candidate is repeated, whether a font tag is used, the distance between the image and the caption candidate, the direction of the caption candidate, the width of the caption candidate relative to the image width, whether an anchor tag is used, whether a caption keyword is present, or whether a terminating punctuation mark is present.

The image size may be determined as the product of the image's width and height. The image aspect ratio may be determined as the ratio between the image's width and height. The image format may refer to the file format of the image (jpg, gif, tiff, etc.). The length of the caption candidate may be determined as the length of the caption candidate, that is, of the text adjacent to the image.

Whether the caption candidate is repeated may be determined by whether the caption candidate is a phrase that recurs alongside images (such as a title, content, or creation date label) and therefore appears frequently in the document.

Whether a font tag is used indicates whether effects such as typeface, size, or weight have been applied to the caption candidate text. The direction of the caption candidate indicates on which side of the image the caption candidate is located. The width of the caption candidate relative to the image width may be determined as the ratio between the width of the image and the width of the caption candidate.

Whether an anchor tag is used is determined by whether a hyperlink to a specific link has been applied to the caption candidate. Whether a caption keyword is present is determined by whether the caption candidate contains a keyword whose meaning describes an image (for example, "appearance" (모습), "scene" (장면), or "photo" (사진)). Whether a terminating punctuation mark is present is determined by whether a terminating mark such as a period appears at the end of the caption candidate.
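The twelve features listed above can be gathered into one record per image-caption candidate pair. The following is a sketch under stated assumptions: the dictionary keys, the input layout (an image dict and a candidate dict already carrying the parsed width, distance, direction, and tag flags), and the set of terminating marks beyond the period are hypothetical. The caption length is counted in characters here, which is also an assumption. The keywords are the three examples given in the text.

```python
import os

CAPTION_KEYWORDS = ("모습", "장면", "사진")  # example keywords from the text
TERMINATORS = (".", "!", "?")  # assumed; the text names only the period


def extract_features(image, candidate, document_texts):
    """Build the feature record for one image-caption candidate pair."""
    text = candidate["text"].strip()
    extension = os.path.splitext(image["src"])[1]
    return {
        "image_size": image["width"] * image["height"],
        "image_aspect_ratio": image["width"] / image["height"],
        "image_format": extension.lstrip(".").lower(),
        "caption_length": len(text),
        # Repeated if the same text occurs more than once in the document.
        "caption_repeated": document_texts.count(text) > 1,
        "font_tag": candidate["font_tag"],
        "distance": candidate["distance"],
        "direction": candidate["direction"],
        # Caption candidate width relative to the image width.
        "width_ratio": candidate["width"] / image["width"],
        "anchor_tag": candidate["anchor_tag"],
        "caption_keyword": any(k in text for k in CAPTION_KEYWORDS),
        "terminator": text.endswith(TERMINATORS),
    }
```

A record like this would then be passed to the classification model to obtain the probability that the pair is a correct pair.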

FIG. 6 is a diagram for describing a process of selecting an image caption from image-caption candidate pairs according to an embodiment of the present invention. Referring to FIG. 6, two examples (601, 602) of selecting an image caption are shown.

For example, the probability-based classification model may be trained on training data to select an image caption from image-caption candidate pairs. The image caption selector 303 may then calculate the probability that an image-caption candidate pair is a correct pair according to Equation 1.

The image caption selector 303 may select the image caption by considering only those image-caption candidate pairs whose calculated probability value is greater than or equal to a preset threshold, and choosing among them the pair with the largest probability value. For example, the image caption selector 303 may select the image caption according to Equation 2.

Here, the symbols of Equation 2 denote, in order: the image caption for an image; an image-caption candidate pair; a single caption candidate; the set of caption candidates; and the probability that an image-caption candidate pair is a correct pair. The symbols themselves are not reproduced in this text.
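Because the equations are not reproduced here, the following is only one plausible form of the selection rule of Equation 2, written from the quantities defined above and the threshold used in the examples of FIG. 6. All notation is assumed rather than taken from the specification: $i$ is an image, $c$ a caption candidate, $C_i$ the set of caption candidates for $i$, $\hat{c}_i$ the selected image caption, and $\theta$ the preset threshold.

```latex
\hat{c}_i =
\begin{cases}
\displaystyle \arg\max_{c \in C_i} P\bigl(\text{correct} \mid (i, c)\bigr)
  & \text{if } \displaystyle \max_{c \in C_i} P\bigl(\text{correct} \mid (i, c)\bigr) \ge \theta, \\[1ex]
\text{none} & \text{otherwise.}
\end{cases}
```

Under this reading, the first case selects the candidate whose pair has the largest probability, and the second case covers an image for which no pair reaches the threshold.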

Referring to FIG. 6, assume that the threshold is 80%. In example 601, image 1 has two caption candidates, caption candidate 1 and caption candidate 2. The probability that the pair (image 1, caption candidate 1) is a correct pair is 90%, and the probability that the pair (image 1, caption candidate 2) is a correct pair is 95%. Applying Equation 2, caption candidate 2 is selected as the image caption.

In example 602, image 2 has two caption candidates, caption candidate 3 and caption candidate 4. The probability that the pair (image 2, caption candidate 3) is a correct pair is 40%, and the probability that the pair (image 2, caption candidate 4) is a correct pair is 75%. Although the pair (image 2, caption candidate 4) has the largest probability value, no probability value reaches the threshold, so no image caption is selected in example 602.
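The two examples of FIG. 6 can be reproduced with a few lines of code. This is a minimal sketch: the function name select_caption and the representation of the pairs as a mapping from caption candidate to probability are assumptions made for illustration.

```python
def select_caption(pair_probabilities, threshold=0.80):
    """Select the caption candidate with the largest probability.

    pair_probabilities maps each caption candidate of one image to the
    probability that its image-caption candidate pair is correct.
    Returns None when no probability reaches the threshold.
    """
    eligible = {
        candidate: probability
        for candidate, probability in pair_probabilities.items()
        if probability >= threshold
    }
    if not eligible:
        return None
    return max(eligible, key=eligible.get)


# Example 601: both pairs reach the 80% threshold; the larger one wins.
example_601 = {"caption candidate 1": 0.90, "caption candidate 2": 0.95}
# Example 602: neither pair reaches the threshold, so nothing is selected.
example_602 = {"caption candidate 3": 0.40, "caption candidate 4": 0.75}
```

With these inputs, select_caption(example_601) yields caption candidate 2 and select_caption(example_602) yields None, matching the outcomes described for examples 601 and 602.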

However, the present invention is not limited to the examples above, and various methods may be applied.

FIG. 7 is a flowchart showing the overall configuration of a caption candidate extraction method according to an embodiment of the present invention.

The caption candidate extraction method according to an embodiment of the present invention may parse a document to extract the text information and structural information of the document (S701). Here, the text information may refer to the form of the text included in the document, such as its length, width, and font, and the structural information may refer to the positions of the document's components.

The step of parsing the document to extract its text information and structural information (S701) may extract the actual position values of the document components as rendered in a browser. For example, this step (S701) may parse a document composed of HTML using a parser that conforms to a HyperText Markup Language (HTML) interface.

The caption candidate extraction method according to an embodiment of the present invention may perform image filtering to determine which of the images included in the document are image caption extraction targets (S702).

The image filtering step (S702) may determine the images that are image caption extraction targets using filters based on the attributes of the images. These filters may include a filter based on image size, a filter based on image aspect ratio, a filter based on image file name, and a filter based on duplicate placement of images within the document.

The caption candidate extraction method according to an embodiment of the present invention may determine, from the text included in the document, caption candidates for an image that is an image caption extraction target (S703).

Here, the caption candidates may be unit texts, among the text included in the document, that are adjacent to the image and could be its image caption. For example, the step of determining caption candidates (S703) may determine caption candidates from the unit text adjacent, in each direction, to the image that is the image caption extraction target.

The step of determining caption candidates (S703) may extract the unit text using the tags of the document, and may determine caption candidates among the unit texts using the length of the text and its distance from the image.

For example, the step of determining caption candidates (S703) may extract unit text using the paragraph or table tags in the document. However, paragraph or table tags that are child nodes of such a tag may be excluded.

The step of determining caption candidates (S703) may determine caption candidates by considering whether the extracted unit text falls within a preset text length and whether it lies within a preset distance from the image (for example, at most twice the average of the image's width and height).

When caption candidates are determined for an image that is an image caption extraction target, if no text in the document satisfies the preset caption candidate criteria (the length of the unit text and its distance from the image), no caption candidate may be determined for that image.

As a result, image-caption candidate pairs may be generated by determining, through the caption candidate determination method, caption candidates for an image from the text included in the document.

FIG. 8 is a flowchart showing the overall configuration of an image caption extraction method according to an embodiment of the present invention.

The image caption extraction method according to an embodiment of the present invention may generate an image-caption candidate pair for each of the images included in a document, using the text information and structural information of the document (S801). That is, the step of generating image-caption candidate pairs (S801) may determine caption candidates for each image of the document and generate image-caption candidate pairs for each image according to the determined caption candidates.

For example, the step of generating image-caption candidate pairs (S801) may include parsing the document to extract the text information and structural information of the document. This parsing step may extract the actual position values of the document components as rendered in a browser.

For example, the step of generating image-caption candidate pairs (S801) may include performing image filtering to determine which of the images included in the document are image caption extraction targets. This image filtering may use filters based on the attributes of the images. For the filters based on image attributes, refer to the description of FIG. 2.

For example, the step of generating image-caption candidate pairs (S801) may include determining, from the text included in the document, caption candidates for an image that is an image caption extraction target. This step may determine caption candidates from the unit text adjacent, in each direction, to the image that is the image caption extraction target. For a specific example of determining caption candidates, refer to FIG. 2.

The image caption extraction method according to an embodiment of the present invention may extract features for each of the generated image-caption candidate pairs (S802). Here, a feature may be an attribute of the image, an attribute of the caption candidate, or an attribute of the relationship between the image and the caption candidate.

For example, the feature extraction step (S802) may extract, for an image-caption candidate pair, features including the image size, the image aspect ratio, the image format, the length of the caption candidate, whether the caption candidate is repeated, whether a font tag is used, the distance between the image and the caption candidate, the direction of the caption candidate, the width of the caption candidate relative to the image width, whether an anchor tag is used, whether a caption keyword is present, or whether a terminating punctuation mark is present. These features are only examples and may be changed according to the configuration of the system. For each of these features, refer to FIG. 6.

The image caption extraction method according to an embodiment of the present invention may select an image caption from the generated image-caption candidate pairs using a probability according to the extracted features (S803).

The step of selecting an image caption (S803) may use the extracted features to calculate, according to a pre-trained probability-based classification model, the probability that an image-caption candidate pair is a correct pair.

Here, the pre-trained probability-based classification model is a model that has learned, from given training data, how to select an image caption from image-caption candidate pairs. The model determines the probability that a caption candidate is the image caption from the probabilities associated with each feature of the image-caption candidate pair, and it can be trained in advance on the training data.

The step of selecting an image caption (S803) may select the image caption by considering only those image-caption candidate pairs whose calculated probability value is greater than or equal to a preset threshold, and choosing among them the pair with the largest probability value. Consequently, if none of the probability values calculated for an image's image-caption candidate pairs reaches the preset threshold, no image caption may be selected for that image.
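Steps S801 to S803 can be tied together as one loop over the images of a document. The sketch below is illustrative only: extract_image_captions and its three callable parameters (candidates_for, features_for, probability_of) are hypothetical names standing in for the candidate determination, feature extraction, and classification model described above, and the default threshold of 0.8 is borrowed from the example of FIG. 6.

```python
def extract_image_captions(images, candidates_for, features_for,
                           probability_of, threshold=0.8):
    """Run S801 to S803 and return a mapping of image to selected caption.

    Images with no caption candidate, or with no pair reaching the
    threshold, are absent from the result.
    """
    captions = {}
    for image in images:
        # S801: generate the image-caption candidate pairs for this image.
        pairs = [(image, candidate) for candidate in candidates_for(image)]
        # S802: extract features; S803: score each pair with the model.
        scored = [
            (probability_of(features_for(img, candidate)), candidate)
            for img, candidate in pairs
        ]
        eligible = [(p, candidate) for p, candidate in scored if p >= threshold]
        if eligible:
            # Keep the candidate whose pair has the largest probability.
            captions[image] = max(eligible, key=lambda item: item[0])[1]
    return captions
```

Passing the three stages in as functions keeps the sketch independent of any particular parser or classifier, so each stage could be replaced without changing the selection logic.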

For matters not described with reference to FIGS. 7 and 8, refer to the description given for FIGS. 1 to 6.

In addition, the caption candidate extraction method and the image caption extraction method according to an embodiment of the present invention may be embodied as a computer-readable medium containing program instructions for performing various computer-implemented operations. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The medium and the program instructions may be specially designed and constructed for the present invention, or may be of a kind well known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code, such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.

Although the present invention has been described above with reference to limited embodiments and drawings, it is not limited to those embodiments, and those of ordinary skill in the art to which the invention pertains can make various modifications and variations from this description. Accordingly, the spirit of the present invention should be determined only by the claims set forth below, and all equal or equivalent modifications thereof fall within the scope of the spirit of the invention.

<Explanation of reference numerals for the main parts of the drawings>

100: Web

101-1 to 101-n: web browsers

102-1 to 102-n: HTML web documents

103: image caption extraction system

104: image caption list

Claims

An image caption extraction system comprising:

an image-caption candidate pair generator that generates an image-caption candidate pair for each of the images included in a document, using text information and structural information of the document;

a feature extractor that extracts features for each of the generated image-caption candidate pairs; and

an image caption selector that selects an image caption from the generated image-caption candidate pairs using a probability according to the extracted features.

The system of claim 1, wherein the image-caption candidate pair generator comprises:

a document parsing unit that parses the document to extract the text information and structural information of the document;

an image filtering unit that performs image filtering to determine, among the images included in the document, an image that is an image caption extraction target; and

a caption candidate determiner that determines, from the text included in the document, a caption candidate for the image that is the image caption extraction target.

The system of claim 2, wherein the document parsing unit parses the document to extract actual position values of document components in a browser.

The system of claim 2, wherein the document parsing unit parses the document using a parser.

The system of claim 2, wherein the image filtering unit performs image filtering that determines the image that is the image caption extraction target using a filter based on attributes of the images.

The system of claim 5, wherein the filter based on attributes of the images comprises a filter based on image size, a filter based on image aspect ratio, a filter based on image file name, and a filter based on duplicate placement of images within the document.

The system of claim 2, wherein the caption candidate determiner determines caption candidates from the unit text adjacent, in each direction, to the image that is the image caption extraction target.

The system of claim 2, wherein the caption candidate is unit text, among the text included in the document, that is adjacent to the image that is the image caption extraction target.

The system of claim 7, wherein the caption candidate determiner extracts the unit text using tags of the document and determines caption candidates among the unit texts using the length of the text and its distance from the image.

The system of claim 1, wherein the feature extractor extracts, for the image-caption candidate pair, features including image size, image aspect ratio, image format, length of the caption candidate, whether the caption candidate is repeated, whether a font tag is used, distance between the image and the caption candidate, direction of the caption candidate, width of the caption candidate relative to the image width, whether an anchor tag is used, whether a caption keyword is present, or whether a terminating punctuation mark is present.

The system of claim 1,

wherein the image caption selecting unit

calculates, using the extracted features and according to a pre-learned probability-based classification model, a probability value that the image-caption candidate pair is a correct answer pair.
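
To make the probability calculation concrete, the sketch below scores a feature record with a logistic model. Logistic regression is used here only as a stand-in: the claims specify a pre-learned probability-based classification model without naming one, and the weights, bias, and feature names below are hypothetical values, not learned parameters.

```python
import math

# Hypothetical pre-learned parameters; real values would come from training data.
WEIGHTS = {"distance": -0.02, "has_caption_keyword": 1.5, "below_image": 0.8}
BIAS = -0.5


def pair_probability(features):
    """Estimate the probability that an image-caption candidate pair is correct."""
    score = BIAS + sum(
        weight * float(features.get(name, 0.0)) for name, weight in WEIGHTS.items()
    )
    return 1.0 / (1.0 + math.exp(-score))
```

With these illustrative weights, a candidate closer to the image or containing a caption keyword receives a higher probability than one without.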

The system of claim 11,

wherein the image caption selecting unit

selects, as the image caption, the image-caption candidate pair having the largest probability value among the generated image-caption candidate pairs whose calculated probability value is equal to or greater than a preset threshold.
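
The selection rule above (keep pairs at or above a threshold, then take the most probable one) can be sketched in a few lines. The function name `select_caption`, the pair representation, and the default threshold of 0.5 are assumptions for illustration.

```python
def select_caption(scored_pairs, threshold=0.5):
    """Pick the caption with the highest probability at or above the threshold.

    scored_pairs is a list of (caption_text, probability) tuples for one image.
    Returns None when no candidate reaches the threshold.
    """
    eligible = [pair for pair in scored_pairs if pair[1] >= threshold]
    if not eligible:
        return None
    return max(eligible, key=lambda pair: pair[1])[0]
```

Returning `None` when nothing clears the threshold reflects that an image may simply have no caption in the document.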

A caption candidate extraction method comprising:

parsing a document to extract text information and structural information of the document;

performing image filtering to determine, among images included in the document, an image that is an image caption extraction target; and

determining, from text included in the document, a caption candidate for the image that is the image caption extraction target.

The method of claim 13,

wherein the caption candidate

is a unit text, among the text included in the document, that is adjacent to the image that is the image caption extraction target.

The method of claim 13,

wherein parsing the document to extract text information and structural information of the document

comprises parsing the document to extract the actual position values of document components on a browser.

The method of claim 13,

wherein the document is parsed using a parser.

The method of claim 13,

wherein performing image filtering to determine the image that is the image caption extraction target

comprises performing the image filtering using filters based on attributes of the images.

The method of claim 17,

wherein the filters based on attributes of the images

include a filter based on image size, a filter based on image aspect ratio, a filter based on image file name, and a filter based on overlapping arrangement of images in the document.

The method of claim 13,

wherein determining a caption candidate for the image that is the image caption extraction target

comprises determining caption candidates from the unit texts adjacent, in each direction, to the image that is the image caption extraction target.

The method of claim 19,

An image caption extraction method comprising:

generating an image-caption candidate pair for each of the images included in a document using text information and structural information of the document;

extracting features for each of the generated image-caption candidate pairs; and

selecting an image caption from the generated image-caption candidate pairs using a probability according to the extracted features.

The method of claim 21,

Generating the image-caption candidate pair,

Image caption extraction method comprising a.

The method of claim 22,

wherein the document is parsed to extract the actual position values of document components on a browser.

The method of claim 22,

wherein caption candidates are determined from the unit texts adjacent, in each direction, to the image that is the image caption extraction target.

The method of claim 21,

Extracting features for each of the generated image-caption candidate pairs may include:

The method of claim 21,

wherein selecting an image caption from the generated image-caption candidate pairs

comprises calculating, using the extracted features and according to a pre-learned probability-based classification model, a probability value that each generated image-caption candidate pair is a correct answer pair.

The method of claim 26,

wherein selecting the image caption

comprises selecting, as the image caption, the image-caption candidate pair having the largest probability value among the generated image-caption candidate pairs whose calculated probability value is equal to or greater than a preset threshold.

A computer-readable recording medium in which a program for executing the method of any one of claims 13 to 27 is recorded.