KR100955181B1 - Method and System for Searching Image

Info

Publication number: KR100955181B1
Application number: KR1020080034550A
Authority: KR
Inventors: 박근한; 김준석; 신현민; 김지승; 이상호
Original assignee: 엔에이치엔(주)
Priority date: 2008-04-15
Filing date: 2008-04-15
Publication date: 2010-04-29
Also published as: KR20090109215A

Abstract

An image search method according to an embodiment of the present invention calculates a quality index for each image so that search results can be provided with the quality of each image taken into account. The method includes: calculating a quality index (Quality Value) of each image; calculating a text similarity (Similarity) between a query (Query) and each image; and calculating a relevance (Relevance) of each image to the query using the quality index and the text similarity of that image.

Image, quality, index, feature information, search, popularity, recency, relevance, similarity

Description

Image Search Method and Search System {Method and System for Searching Image}

The present invention relates to a search method and, more particularly, to an image search method.

With the development and spread of the Internet, a variety of services are now offered over it, and search services are a representative example. In a search service, the user enters a word or combination of words as a query, and the search engine returns result documents that correspond to that query (for example, websites or articles containing the query entered by the user).

Recently, to meet users' diverse search needs, search service providers have begun to return content such as images in addition to text results. When returning images, existing image search systems take the images that contain the user's query, for example images whose file name or associated text contains the query, and present them after modeling them according to the importance of the query.

However, as described above, an image is returned whenever its file name or an associated text file contains the user's query. As a result, the returned images may be unrelated to the query or of poor quality, and returning such images can lower the reliability of the image search service.

In addition, existing search systems return images that have high similarity to the user's query. Images that users rarely select, that is, images with low user preference, may therefore appear in the results or even be placed at the top of them, which can lower the accuracy of the image search service.

The present invention is intended to solve the problems described above. Its technical object is to provide an image search method and system that calculate a quality index for each image so that search results can be provided with the quality index of each image taken into account.

Another technical object of the present invention is to provide an image search method and system that calculate the user preference for each image so that search results can be provided with that preference taken into account.

To achieve the above objects, an image search method according to one aspect of the present invention includes: calculating a quality index (Quality Value) of each image; calculating a text similarity (Similarity) between a query (Query) and each image; and calculating a relevance (Relevance) of each image to the query using the quality index and the text similarity of that image.

In one embodiment, the image search method may further include: generating an image search result for the query using the relevance of the images; and providing the image search result to the user.

In the text similarity calculation step, the text similarity of an image is calculated using the importance that the words in the query have in at least one of the body of the text associated with the image, the title of that text, and the link address of the image. The importance is calculated using a modified Poisson (2-Poisson) model.

The image quality index may be generated from image feature information that includes at least one of basic image information, color distribution information, gray (Gray) distribution information, and texture (Texture) information extracted from each image.

The image search method may further include calculating a popularity (Popularity) of each image, in which case the relevance calculation step also takes the popularity of each image into account.

In one embodiment, the popularity information of each image includes at least one of first popularity information, which indicates the degree to which the image itself is duplicated, and second popularity information, which indicates the degree to which the source of the image is duplicated. The first popularity information is calculated from the number of duplicates of the image itself, and the second popularity information is calculated from the number of sources that contain the image.

The relevance calculation step includes determining whether at least one of the text similarity, the quality index, the number of duplicates of the image itself, and the number of sources containing the image satisfies a predetermined condition. If the condition is satisfied, the relevance is calculated with at least one of the first and second popularity values also taken into account.

The image search method may further include calculating a recency (Recency) of the image, in which case the relevance calculation step also takes the recency of the image into account.

In one embodiment, the recency of an image is calculated using a threshold date and the difference between the creation date of the image and the current date.

The image search method may further include calculating weights for the text similarity and the quality index of the image. In that case, the relevance is calculated from the weighted quality index and the weighted text similarity of the image.

To achieve the above objects, an image search system according to another aspect of the present invention includes: a quality index calculation unit that calculates a quality index (Quality Value) of each image; a text similarity calculation unit that calculates a text similarity (Similarity) between a query and each image; and a relevance calculation unit that calculates a relevance (Relevance) of each image to the query using the quality index and the text similarity of that image.

As described above, according to the present invention, the quality index of each image is taken into account when generating image search results, so that high-quality images are placed at the top of the results. Users can therefore view high-quality images first, which improves satisfaction with the search service.

The present invention also takes the user preference for each image into account when generating image search results, so that images with high user preference are placed at the top of the results. Users can therefore first view the images that many other users view, which improves search accuracy.

Embodiments of the present invention are described in detail below with reference to the accompanying drawings.

FIG. 1 is a schematic block diagram of an image search system according to an embodiment of the present invention. The image search system 100 is connected to a user terminal 104 through the Internet 102, receives a query (Query) entered at the user terminal 104, and provides the image search result corresponding to that query to the user terminal 104. For convenience of description, the terms "user terminal 104" and "user" are used interchangeably below.

As shown, the image search system 100 may include an image database 110, a quality index calculation unit 120, a popularity calculation unit 130, a recency calculation unit 140, an interface unit 150, a text similarity calculation unit 160, a relevance calculation unit 170, and a search result generation unit 180.

The image database 110 stores the images used for search. In this embodiment, the term "image" covers not only documents that consist solely of an image but also documents that contain an image together with text. For convenience of description, all such documents are referred to below as images.

Each image in the image database 110 is stored together with its image quality index calculated by the quality index calculation unit 120, its image popularity calculated by the popularity calculation unit 130, and its image recency calculated by the recency calculation unit 140, all of which are described later. The importance of the words contained in the image, which is used to calculate text similarity, is also stored in association with each image.

The quality index calculation unit 120 calculates the quality index (Quality Value) of each image. Its detailed configuration is described with reference to FIG. 2. As shown in FIG. 2, the quality index calculation unit 120 according to an embodiment of the present invention includes a feature information extraction unit 210, an abnormal image determination unit 220, a weight application unit 230, and a quality index computation unit 240.

The feature information extraction unit 210 extracts feature information from each image. It decodes images compressed in various formats using an image decoding library (Image Decoding Library) to produce raw data (Raw Data) in RGB (Red, Green, Blue) form, and then applies image processing techniques to the raw RGB data to extract the feature information.

In one embodiment, the feature information extraction unit 210 may extract basic image information, color distribution (Color Distribution) information, gray distribution (Gray Distribution) information, and texture (Texture) information as the feature information of an image. For this purpose, as shown in FIG. 2, it includes a basic image information extraction unit 212, a color distribution information extraction unit 214, a gray distribution information extraction unit 216, and a texture information extraction unit 218.

The basic image information extraction unit 212 extracts basic image information from each image. In one embodiment, it may extract at least one of the size value of the image, the ratio (Ratio) of the image's width (Width) to its height (Height), and the compression information of the image.

The image size value represents the size of the image. The basic image information extraction unit 212 determines it by comparing the product of the image's width and height with a threshold. Specifically, if the product exceeds the threshold, the size value is 1; if the product is less than or equal to the threshold, the size value is the ratio of the logarithm of the product to the logarithm of the threshold. In one embodiment, the threshold may be set to the maximum size (Maximum Size) of an image, for example a width of 300 and a height of 400. This calculation is expressed as Equation 1:

$$\text{SizeValue} = \begin{cases} 1, & W \times H > T \\ \dfrac{\log(W \times H)}{\log T}, & W \times H \le T \end{cases} \qquad (1)$$

where W and H are the width and height of the image and T is the threshold (300 × 400 in the example above).
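The size value rule above can be sketched in a few lines of Python. This is a minimal illustration rather than the patented implementation: the function name `size_value` and the constant names are hypothetical, and the default threshold simply multiplies the example width (300) and height (400) given in the text.

```python
import math

# Example threshold dimensions taken from the text (width 300, height 400).
MAX_WIDTH = 300
MAX_HEIGHT = 400


def size_value(width: int, height: int,
               threshold: int = MAX_WIDTH * MAX_HEIGHT) -> float:
    """Return 1.0 above the threshold area, otherwise log(area) / log(threshold)."""
    area = width * height
    if area <= 0:
        raise ValueError("width and height must be positive")
    if area > threshold:
        return 1.0
    return math.log(area) / math.log(threshold)
```

Because the logarithm compresses the scale, a 100 × 100 image still scores roughly 0.79, so small images are penalized gently rather than linearly.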

The ratio of the image's width to its height is determined within a range of 1 or less; that is, an image whose height and width are equal receives the maximum value of 1.
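One way to keep the ratio at or below 1, with a square image scoring exactly 1, is to divide the shorter side by the longer side. The text does not state which side is the numerator, so that choice, like the function name `aspect_ratio_value`, is an assumption made for this sketch.

```python
def aspect_ratio_value(width: int, height: int) -> float:
    """Ratio of the shorter side to the longer side, so the result is in (0, 1]."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # Assumption: the shorter side is the numerator, which makes the value
    # independent of orientation (portrait vs. landscape).
    return min(width, height) / max(width, height)
```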

The image compression information is based on the observation that a lightly compressed image tends to have better picture quality than a heavily compressed one. The basic image information extraction unit 212 obtains the compression information by calculating the number of bits per pixel (Bit Per Pixel) of the image. This value is obtained by dividing the byte size (Bytesize) of the image by the product of its width and height, as expressed in Equation 2:

$$\text{BitPerPixel} = \frac{\text{Bytesize}}{\text{Width} \times \text{Height}} \qquad (2)$$
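As a small sketch of the compression measure, the function below follows the description literally and divides the file's byte size by the pixel count. The name `compression_value` is hypothetical. Strictly speaking this quotient is bytes per pixel; multiplying by 8 would give bits, but the text does not mention such a factor, so none is applied here.

```python
def compression_value(byte_size: int, width: int, height: int) -> float:
    """Byte size divided by pixel count, as described in the text.

    A larger value means less compression, which the text associates
    with better picture quality.
    """
    pixels = width * height
    if pixels <= 0:
        raise ValueError("width and height must be positive")
    return byte_size / pixels
```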

Next, the color distribution information extraction unit 214 extracts color distribution information from each image. In one embodiment, it may extract color histogram (Color Histogram) information, color channel distribution (Color Channel Distribution) information, and geometric color distribution (Geometry Color Distribution) information.

A color histogram is obtained by using the HSV (Hue, Saturation, Value) values of each pixel of the image to determine the color bin (Color Bin) to which the pixel belongs, and then plotting the occurrence probability of each color bin as a graph. Here, H is the hue of the image, S is its saturation, and V is its value (brightness).

As color histogram information, the color distribution information extraction unit 214 extracts at least one of the following: the number of color bins whose occurrence probability is greater than or equal to a threshold, out of all color bins in the image; the maximum occurrence probability among all color bins in the image; the ratio of the maximum occurrence probability to the second-largest occurrence probability; the difference between the occurrence probability of each color bin of the image and that of the corresponding bin of a reference color histogram with uniform (Uniform) occurrence probabilities; and a color/black-and-white flag.

For example, as shown in FIG. 3a, when the threshold for the occurrence probability of a color bin is 0.02, 7 of the 512 color bins have an occurrence probability of 0.02 or more, the maximum occurrence probability is 0.241, and the ratio of the maximum (0.241) to the second-largest value (0.232) is 1.04. A large ratio between the maximum and the second-largest occurrence probability may indicate that the image is of poor quality.
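The three histogram statistics in this example (bins at or above the threshold, the maximum probability, and the ratio of the top two probabilities) can be sketched as follows. The quantization into 8 levels per HSV channel is an assumption chosen only because 8 × 8 × 8 yields the 512 bins mentioned in the text; the function names are hypothetical, and pixels are assumed to be 8-bit RGB tuples.

```python
import colorsys
from collections import Counter


def color_bin(r: int, g: int, b: int, levels: int = 8) -> int:
    """Map an 8-bit RGB pixel to one of levels**3 HSV bins (assumed layout)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)

    def quantize(x: float) -> int:
        # Clamp so that x == 1.0 falls into the last level.
        return min(int(x * levels), levels - 1)

    return (quantize(h) * levels + quantize(s)) * levels + quantize(v)


def color_histogram_features(pixels, threshold: float = 0.02, levels: int = 8):
    """Return (bins >= threshold, max probability, max / second-largest)."""
    pixels = list(pixels)
    if not pixels:
        raise ValueError("at least one pixel is required")
    counts = Counter(color_bin(r, g, b, levels) for r, g, b in pixels)
    probs = sorted((c / len(pixels) for c in counts.values()), reverse=True)
    bins_above = sum(1 for p in probs if p >= threshold)
    # With a single occupied bin there is no second value; report infinity.
    ratio = probs[0] / probs[1] if len(probs) > 1 else float("inf")
    return bins_above, probs[0], ratio
```

For the figures quoted above, the ratio is 0.241 / 0.232 ≈ 1.04, meaning the two dominant bins are almost equally frequent.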

The difference between the occurrence probability of each color bin of the image and that of the corresponding bin of the reference color histogram can be calculated using a reference color histogram in which every color bin has the same occurrence probability, 0.1 in the example of FIG. 3b. The difference (D) between the bin occurrence probabilities of the reference color histogram and those of the image's color histogram can be calculated using Equation 3.

Here, I denotes the image corresponding to the reference color histogram, I' denotes the image under evaluation, f_j denotes the occurrence probability of the j-th color bin of the reference color histogram, and f_j' denotes the occurrence probability of the j-th color bin of the image's color histogram.
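The exact form of Equation 3 is not reproduced in this text, so the sketch below assumes one plausible reading: D is the sum over bins of the absolute difference between the uniform reference probability f_j and the image's probability f_j'. Both the formula and the name `uniform_difference` are assumptions for illustration.

```python
def uniform_difference(probs) -> float:
    """Sum of |f_j - f_j'| against a uniform reference histogram (assumed form)."""
    probs = list(probs)
    if not probs:
        raise ValueError("at least one bin is required")
    # Uniform reference: every bin has probability 1 / number_of_bins,
    # e.g. 0.1 for a 10-bin histogram as in the FIG. 3b example.
    reference = 1.0 / len(probs)
    return sum(abs(reference - p) for p in probs)
```

Under this reading, D is 0 for a perfectly flat histogram and grows as the image's colors concentrate in fewer bins.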

The color/black-and-white flag indicates whether the image is black-and-white or color. This can be determined from the degree to which the pixel values of the image are concentrated on particular colors. For example, if the pixel values are clustered only along the black-white axis of the HSV color space (Color Space), the image is determined to be a black-and-white image.
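In HSV, pixels on the black-white axis have saturation close to zero, so one simple way to approximate this test is to check what fraction of pixels has low saturation. The two cut-offs (`saturation_limit`, `min_fraction`) and the function name are hypothetical values picked for this sketch; the text does not specify them.

```python
import colorsys


def is_black_and_white(pixels, saturation_limit: float = 0.1,
                       min_fraction: float = 0.95) -> bool:
    """True if nearly all 8-bit RGB pixels lie near the HSV black-white axis."""
    pixels = list(pixels)
    if not pixels:
        raise ValueError("at least one pixel is required")
    low_saturation = sum(
        1 for r, g, b in pixels
        if colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)[1] <= saturation_limit
    )
    return low_saturation / len(pixels) >= min_fraction
```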

The color channel distribution information includes at least one of the average HSV values of the whole image and the average HSV values of its central region. That is, as shown in FIG. 3c, the color distribution information extraction unit 214 calculates, as the color channel distribution information of the image, at least one of (a) the averages of the H, S, and V values of all pixels in the image and (b) the averages of the H, S, and V values of all pixels in the central region of the image.

The geometric color distribution information is the difference between each average HSV value of the whole image and the corresponding average HSV value of its central region. That is, as shown in FIG. 3c, the color distribution information extraction unit 214 calculates the difference between each average HSV value of the whole image and the corresponding average HSV value of the central region as the geometric color distribution information.
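The whole-image and central-region averages, and their per-channel difference, can be sketched as below. The image is assumed to be a list of rows of (h, s, v) tuples with each component in [0, 1]. The central region is taken as the middle half of the image in each dimension, which is an assumption, since the text only refers to FIG. 3c. Hue is averaged arithmetically for simplicity even though it is a circular quantity. All function names are hypothetical.

```python
def mean_hsv(rows):
    """Per-channel mean of a 2-D grid of (h, s, v) tuples."""
    pixels = [p for row in rows for p in row]
    return tuple(sum(p[c] for p in pixels) / len(pixels) for c in range(3))


def center_region(rows):
    """Middle half of the grid in each dimension (assumed definition)."""
    height, width = len(rows), len(rows[0])
    top, left = height // 4, width // 4
    return [row[left:width - left] for row in rows[top:height - top]]


def geometric_color_difference(rows):
    """Absolute difference between whole-image and center mean HSV values."""
    whole = mean_hsv(rows)
    center = mean_hsv(center_region(rows))
    return tuple(abs(a - b) for a, b in zip(whole, center))
```

A bright subject on a dark background produces a large V difference, whereas a flat image produces differences near zero.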

A small difference, that is, a small gap between the average HSV values of the central region, which is likely to contain the distinctive part of the image, and those of the whole image, may indicate that the image is of poor quality.

Next, the gray distribution information extraction unit 216 extracts gray distribution information from the image. In one embodiment, it may extract gray histogram (Gray Histogram) information as the gray distribution information.

A gray histogram plots the occurrence probability of every gray bin (Gray Bin) of the image as a graph. As gray histogram information, and as shown in FIG. 3d, the gray distribution information extraction unit 216 may extract at least one of: the number of gray bins whose occurrence probability is greater than or equal to a threshold, out of all gray bins in the image; the maximum occurrence probability among all gray bins in the image; and the ratio of the maximum occurrence probability to the second-largest occurrence probability.

For example, in FIG. 3d, when the threshold for the occurrence probability of a gray bin is 0.02, 5 of the 128 gray bins have an occurrence probability of 0.02 or more, the maximum occurrence probability is 0.241, and the ratio of the maximum (0.241) to the second-largest value (0.232) is 1.04.

As with color histogram information, a large ratio between the maximum and the second-largest occurrence probability in the gray histogram may indicate that the image is of poor quality.

Next, the texture information extraction unit 218 extracts texture information from the image. Texture information is defined here as statistics on the spatial distribution of the gray values that make up the image. In one embodiment, the texture information extraction unit 218 may calculate the texture information using the GLCM (Gray-Level Co-occurrence Matrix) technique.

The GLCM technique expresses the spatial relationship between two pixels. As shown in FIG. 3e, it uses the distance and angle between neighboring pixels to compute basic statistics, such as mean, contrast, and correlation, of the relationship between their brightness values, assigns the computed value to the center pixel of the kernel as a new brightness value, and thereby represents the local texture characteristics of the image.

In one embodiment, the texture information extraction unit 218 may use the GLCM to calculate an energy (Energy) value, an entropy (Entropy) value, a correlation (Correlation) value, an inertia (Inertia) value, and a homogeneity (Homogeneity) value as the texture information of the image.

The energy value represents the uniformity (Uniformity) of the intensity values within the image. If the intensity values are constant or appear in a repeating pattern, the image has a high energy value; otherwise it has a low energy value.

The entropy value represents the degree of disorder of the image, that is, the randomness of its intensity distribution. If the intensity differences between pixels in the image are varied, the image has a large entropy value; if they are not, the image has a small entropy value. The entropy value therefore tends to vary inversely with the energy value.

The correlation value represents the linear dependence of intensity between two neighboring pixels in the image. If the distribution of intensity values follows a particular pattern, the pixels are judged to be correlated. For example, a correlation value of 0 indicates no correlation, and a correlation value of 1 indicates perfect correlation. In other words, if similar intensity patterns appear at a given distance, the image has a high correlation value.

The inertia value represents the degree of inertia of intensity within the image. It can be calculated from the intensity difference between two adjacent pixels: the more pixel pairs with a large intensity difference, the larger the inertia value of the image.

The homogeneity value represents the homogeneity of the pixels within the image. The more similar intensity values there are within a given distance, the larger the homogeneity value; the more differing intensity values there are within a given distance, the smaller the homogeneity value.
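The five texture values can be illustrated with the commonly used Haralick-style definitions over a normalized co-occurrence matrix `p`, where `p[i][j]` is the probability of gray level i occurring next to gray level j. The text does not give the formulas, so the definitions below are the standard textbook ones and are an assumption here, as is the function name `texture_features`. Correlation is reported as 0.0 when a variance is zero, which is a convention chosen for this sketch.

```python
import math


def texture_features(p):
    """Energy, entropy, correlation, inertia, homogeneity of a normalized GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    energy = sum(v * v for _, _, v in cells)
    entropy = -sum(v * math.log(v) for _, _, v in cells if v > 0)
    inertia = sum((i - j) ** 2 * v for i, j, v in cells)
    homogeneity = sum(v / (1 + (i - j) ** 2) for i, j, v in cells)
    mean_i = sum(i * v for i, _, v in cells)
    mean_j = sum(j * v for _, j, v in cells)
    var_i = sum((i - mean_i) ** 2 * v for i, _, v in cells)
    var_j = sum((j - mean_j) ** 2 * v for _, j, v in cells)
    if var_i > 0 and var_j > 0:
        covariance = sum((i - mean_i) * (j - mean_j) * v for i, j, v in cells)
        correlation = covariance / math.sqrt(var_i * var_j)
    else:
        correlation = 0.0  # undefined for a constant image; 0.0 by convention
    return {"energy": energy, "entropy": entropy, "correlation": correlation,
            "inertia": inertia, "homogeneity": homogeneity}
```

With these definitions a constant image yields energy 1 and entropy 0, consistent with the inverse tendency between the two described above.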

In this case, the texture information extractor 218 may calculate the GLCM used to obtain the energy value, entropy value, correlation value, inertia value, and homogeneity value by using Equation 4 below.

Here, P(i,j;d,Θ) denotes a joint probability density function, where i and j are the gray values at a first coordinate (k,l) and a second coordinate (m,n) of the image, respectively; d is the distance between the first and second coordinates, defined as in Equation 5; and Θ is the angle between the first and second coordinates, defined as in Equation 6.
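Equations 4 to 6 are not reproduced in this text, so the following is only a minimal sketch of how a co-occurrence matrix and the five texture statistics could be computed. It assumes the commonly used GLCM definition with a fixed pixel offset `(dx, dy)` standing in for the distance d and angle Θ, and it uses textbook Haralick-style formulas; the function names `glcm` and `texture_features` are hypothetical, and the patent's own equations may differ.

```python
import math
from collections import Counter


def glcm(gray, dx=1, dy=0):
    """Co-occurrence probabilities P(i, j) for pixel pairs offset by (dx, dy).

    `gray` is a list of rows of integer gray values. The fixed offset is an
    assumed stand-in for the (d, theta) parameters described in the text.
    """
    height, width = len(gray), len(gray[0])
    counts = Counter()
    for y in range(height):
        for x in range(width):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < height and 0 <= x2 < width:
                counts[(gray[y][x], gray[y2][x2])] += 1
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def texture_features(p):
    """Textbook texture statistics of a normalised GLCM (assumed formulas)."""
    mu_i = sum(i * v for (i, _), v in p.items())
    mu_j = sum(j * v for (_, j), v in p.items())
    var_i = sum((i - mu_i) ** 2 * v for (i, _), v in p.items())
    var_j = sum((j - mu_j) ** 2 * v for (_, j), v in p.items())
    cov = sum((i - mu_i) * (j - mu_j) * v for (i, j), v in p.items())
    denom = math.sqrt(var_i * var_j)
    return {
        "energy": sum(v * v for v in p.values()),
        "entropy": sum(v * math.log2(1.0 / v) for v in p.values()),
        # Correlation is undefined for a constant image; 0.0 is used here.
        "correlation": cov / denom if denom > 0 else 0.0,
        "inertia": sum((i - j) ** 2 * v for (i, j), v in p.items()),
        "homogeneity": sum(v / (1 + (i - j) ** 2) for (i, j), v in p.items()),
    }


flat = [[5, 5, 5], [5, 5, 5]]   # uniform image
checker = [[0, 1], [1, 0]]      # alternating image
print(texture_features(glcm(flat)))
print(texture_features(glcm(checker)))
```

With these inputs the uniform image yields energy 1 and entropy 0, while the alternating image yields energy 0.5 and entropy 1, which matches the inverse tendency between energy and entropy noted in the description.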

In summary, the feature information extractor 210 uses the image basic information extractor 212, the color distribution information extractor 214, the gray distribution information extractor 216, and the texture information extractor 218 to extract image basic information, color distribution information, gray distribution information, and texture information from the image.

Next, the abnormal image determiner 220 determines whether the image is an abnormal image. An abnormal image is, for example, an image that consists only of a graph or a table, an image that consists only of black or white, or an image whose width-to-height ratio is below a threshold.

In one embodiment, the abnormal image determiner 220 may determine that the image is an abnormal image when some of the feature information extracted by the feature information extractor 210 is at or below a predetermined value.

For example, the image may be determined to be an abnormal image when feature information such as the following, extracted by the feature information extractor 210, is below a threshold: the compression information of the image; the number of color bins, among all color bins in the image, whose occurrence probability is at or above a threshold; the maximum occurrence probability among all gray bins in the image; the ratio between the largest and second-largest occurrence probabilities among all color bins in the image; the color/monochrome flag; the number of gray bins, among all gray bins in the image, whose occurrence probability is at or above a threshold; the maximum occurrence probability among all gray bins in the image; and the ratio between the largest and second-largest occurrence probabilities among all gray bins in the image.

When the abnormal image determiner 220 determines that the image is an abnormal image, the quality index of the image is set to 0 without any separate quality index calculation.
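The abnormal-image gate described above can be sketched as follows. This is an illustration only: the feature names, the threshold values, and the choice to flag an image as soon as any single monitored feature falls at or below its threshold are assumptions, since the description does not say how many features must fail.

```python
def is_abnormal(features, thresholds):
    """True when any monitored feature is at or below its threshold.

    Treating a single failing feature as sufficient is an assumption.
    """
    return any(
        features[name] <= limit
        for name, limit in thresholds.items()
        if name in features
    )


def quality_or_zero(features, thresholds, compute_quality):
    """Abnormal images get quality index 0 and skip the calculation."""
    if is_abnormal(features, thresholds):
        return 0.0
    return compute_quality(features)


# Hypothetical feature names and threshold values.
thresholds = {"dominant_color_bins": 3, "aspect_ratio": 0.2}
table_scan = {"dominant_color_bins": 2, "aspect_ratio": 0.8}
photo = {"dominant_color_bins": 40, "aspect_ratio": 0.75}

print(quality_or_zero(table_scan, thresholds, lambda f: 0.9))
print(quality_or_zero(photo, thresholds, lambda f: 0.9))
```

Passing the quality calculation in as a callable keeps the gate separate from whichever scoring rule is used afterwards.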

The weight reflector 230 calculates a weight for each item of feature information extracted by the feature information extractor 210 and applies each calculated weight to the corresponding feature information. In one embodiment, the weight reflector 230 may build a training set from predetermined images and then calculate the per-feature weights by applying a regression technique such as logistic regression to the training set.
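As a rough illustration of deriving per-feature weights by logistic regression, the sketch below fits a model with plain batch gradient descent. The training labels (1 for a good image, 0 for a poor one), the learning rate, the epoch count, and the toy data are all assumptions; the description states only that a regression technique is applied to a training set.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(samples, labels, lr=0.5, epochs=2000):
    """Fit per-feature weights and a bias by batch gradient descent."""
    n_features = len(samples[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(samples, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            error = sigmoid(z) - y
            for k, xi in enumerate(x):
                grad_w[k] += error * xi
            grad_b += error
        # Average the gradients over the training set before stepping.
        weights = [w - lr * g / len(samples) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(samples)
    return weights, bias


# Toy training set: each row is [feature_a, feature_b] scaled to 0..1.
samples = [[0.9, 0.5], [0.8, 0.4], [0.2, 0.5], [0.1, 0.6]]
labels = [1, 1, 0, 0]  # assumed labels: 1 = good image, 0 = poor image
weights, bias = fit_logistic(samples, labels)
print(weights, bias)
```

In this toy data only the first feature separates the two classes, so it ends up with a positive weight; the learned weights would then be multiplied into the corresponding feature values.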

Although the embodiment above describes the weight reflector 230 as an essential component of the quality index calculator 120, applying weights to the feature information is optional, so the weight reflector 230 may be included optionally. In addition, although the embodiment above describes the weight reflector 230 as calculating a separate weight for each item of feature information and applying it to that item, a modified embodiment may calculate a single weight for all of the feature information and apply it to each item.

The quality index computation unit 240 calculates the quality index of the image using the feature information extracted by the feature information extractor 210. Here, each item of extracted feature information may already reflect the per-feature weight calculated by the weight reflector 230.

Specifically, the quality index computation unit 240 may calculate the quality index of the image by summing two or more items of the extracted feature information. As described above, each item of feature information may reflect the weight calculated for it.

In one embodiment, because the quality of an image may be very poor regardless of its other feature information when particular feature information falls within a predetermined range, the quality index computation unit 240 may apply a predetermined penalty to the calculated quality index when some of the extracted feature information falls within such a range.

For example, a penalty may be applied to the image when the size value of the image is at or below a threshold; when the width-to-height ratio of the image is within a threshold range; when the compression information of the image is at or below a threshold; when the number of color bins, among all color bins of the image, whose occurrence probability is at or above a predetermined value is at or below a threshold; when the maximum occurrence probability among all gray bins in the image is at or above a threshold; when the ratio between the largest and second-largest occurrence probabilities among all gray bins in the image is at or above a threshold; or when the average S value of the central region of the image is at or below a threshold.

Meanwhile, when applying the penalty makes the quality index of the image negative or smaller than a reference value, the quality index computation unit 240 may compensate the penalized quality index to obtain the final quality index.
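The summation, penalty, and compensation steps described above might be combined as in the sketch below. The description does not specify the size of the penalty or how the compensation is performed, so subtracting a fixed `penalty` and raising the result to a `floor` value are assumptions, as are the feature names and numbers.

```python
def quality_index(features, weights, penalty_ranges, penalty=0.3, floor=0.05):
    """Weighted feature sum with an optional penalty and compensation.

    `penalty_ranges` maps a feature name to a (low, high) range; a feature
    inside its range triggers the penalty. Subtracting a constant and
    clamping to `floor` are assumed choices, not taken from the source.
    """
    score = sum(weights[name] * features[name] for name in weights)
    penalised = any(
        low <= features[name] <= high
        for name, (low, high) in penalty_ranges.items()
    )
    if penalised:
        score -= penalty
        if score < floor:      # negative or below the reference value
            score = floor      # compensation step (assumed: clamp)
    return score


weights = {"size": 0.5, "sharpness": 0.5}
penalty_ranges = {"size": (0.0, 0.2)}  # hypothetical "too small" range

print(quality_index({"size": 0.6, "sharpness": 0.8}, weights, penalty_ranges))
print(quality_index({"size": 0.1, "sharpness": 0.8}, weights, penalty_ranges))
print(quality_index({"size": 0.1, "sharpness": 0.2}, weights, penalty_ranges))
```

The three calls show an unpenalized image, a penalized image whose score stays above the floor, and a penalized image whose score is raised back to the floor.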

The quality index computation unit 240 may store the calculated quality index in the image database 110 in association with the corresponding image.

Referring again to FIG. 1, the popularity calculator 130 calculates the popularity of each image. Here, popularity indicates the degree to which an image has been cited by many users. In one embodiment, the popularity may include first popularity information indicating how often the image itself is duplicated and second popularity information indicating how many distinct sources contain the image.

The first popularity information may be calculated from the number of duplicates of the image itself (sig_dup_cnt); the more duplicates of the image there are, the higher the first popularity information.

The second popularity information may be calculated from the number of collections that contain the image (col_dup_cnt), where a collection is a category such as blog, bulletin board, news, web, or dictionary. The second popularity information likewise increases with the number of collections that contain the image.
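The description requires only that each popularity value grow with its count. A minimal sketch, assuming a logarithmic mapping so that very large counts do not dominate (the use of `log1p` and the function name are assumptions), could look like this:

```python
import math


def popularity(sig_dup_cnt, col_dup_cnt):
    """Return (first, second) popularity values.

    Both increase monotonically with their counts; the logarithmic shape
    is an assumed choice, not specified in the source.
    """
    first = math.log1p(sig_dup_cnt)    # duplicates of the image itself
    second = math.log1p(col_dup_cnt)   # collections containing the image
    return first, second


print(popularity(0, 0))
print(popularity(12, 3))
```

Any other increasing function of the two counts would satisfy the description equally well.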

The popularity calculator 130 may store the popularity calculated as described above in the image database 110 in association with each image.

Next, the recency calculator 140 calculates the recency of each image. In one embodiment, the recency calculator 140 may calculate the recency of each image using a monotonically decreasing function whose variables are the difference between the date on which the image was created and the current date, and a threshold date. The recency calculator 140 may store the calculated recency in the image database 110 in association with each image.
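One possible monotonically decreasing function of image age is sketched below. The specific form `threshold / (threshold + age)` is an assumption chosen because it equals 1 for a brand-new image and 0.5 when the age equals the threshold; the source does not give the actual function, and interpreting the threshold date as a number of days is also an assumption.

```python
from datetime import date


def recency(created, today, threshold_days):
    """Recency in (0, 1], decreasing as the image gets older.

    `threshold_days` plays the role of the threshold date; the functional
    form is an assumed example of a monotonically decreasing function.
    """
    age_days = max((today - created).days, 0)
    return threshold_days / (threshold_days + age_days)


today = date(2010, 4, 29)
print(recency(date(2010, 4, 29), today, 30))
print(recency(date(2010, 3, 30), today, 30))
print(recency(date(2008, 4, 15), today, 30))
```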

The interface unit 150 receives a query entered from the user terminal 104 and provides the search results generated by the search result generator 180, described later, to the user through the user terminal 104.

The text similarity calculator 160 calculates the text similarity between the query received through the interface unit 150 and each image. The text similarity calculator 160 may calculate the text similarity of each image using the weights of the words contained in at least one of the title of the text associated with the image, the body of that text, and the link address of the image (hereinafter, the "image document"). As described above, these word weights may be stored in association with the corresponding image.

In one embodiment, the weights of the words contained in each image document may be calculated using a modified Poisson (2-Poisson) model. Specifically, the weight of each word may be calculated using the number of times the word occurs in the image document, the number of image documents in which the word occurs, the number of words contained in each image document, and the like.

For example, when an image document contains the text "This is a movie starring the pretty Jun Ji-hyun, and in this movie Jun Ji-hyun behaves bizarrely," the words contained in the image document are extracted through morphological analysis or the like. The weight of each word (for example, Jun Ji-hyun: 100, movie: 50, bizarre: 30) is then calculated using the number of times the word occurs in the image document, the number of image documents in which the word occurs, the number of words contained in each image document, and the like, and is stored together with the image.

The text similarity calculator 160 may calculate the text similarity between the query and each image using the word weights described above. In one embodiment, the text similarity calculator 160 may calculate the text similarity using Equation 7 below.

Equation 7 calculates the text similarity between a query q and an image document d, where qtf_i is the number of occurrences of word i in the query q, and weight(term_i, d) is the weight of word i in the image document d.

For example, suppose the query entered by the user is "pretty pretty Jun Ji-hyun" and the text similarity between the image document and the word "pretty" in the query is to be calculated. If the weight of the word "pretty" in that image document is 50, the text similarity for the word "pretty" is 2 × 50 = 100, because the word "pretty" occurs twice in the query.
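Equation 7 itself is not reproduced in this text. Based on the definitions of qtf_i and weight(term_i, d) and on the worked example, the sketch below assumes the similarity is the sum over query words of qtf_i × weight(term_i, d). Splitting the query into a ready-made token list is also an assumption; the description mentions morphological analysis for extracting words.

```python
from collections import Counter


def text_similarity(query_terms, term_weights):
    """Sum of (occurrences in the query) x (weight in the image document).

    Words absent from the image document contribute nothing.
    """
    qtf = Counter(query_terms)
    return sum(count * term_weights.get(term, 0) for term, count in qtf.items())


# Stored word weights for one image document (values from the example).
doc_weights = {"pretty": 50, "Jun Ji-hyun": 100}

print(text_similarity(["pretty", "pretty"], doc_weights))
print(text_similarity(["pretty", "pretty", "Jun Ji-hyun"], doc_weights))
```

The first call reproduces the 2 × 50 = 100 contribution of the word "pretty"; the second adds the contribution of the remaining query word under the same assumed formula.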

Next, the relevance calculator 170 calculates the relevance of each image to the input query using the quality index calculated by the quality index calculator 120, the popularity calculated by the popularity calculator 130, the recency calculated by the recency calculator 140, and the query-to-image text similarity calculated by the text similarity calculator 160.

In one embodiment, the relevance calculator 170 may calculate the relevance of each image by summing the quality index, popularity, recency, and text similarity of the image. In a modified embodiment, rather than summing all of the quality index, popularity, and recency with the text similarity, the relevance of each image may be calculated by summing at least one of the quality index, popularity, and recency with the text similarity.

Meanwhile, when calculating the relevance, the relevance calculator 170 may take the popularity into account only when a predetermined condition is satisfied. In one embodiment, the first popularity is taken into account only when the quality index of the image, the text similarity, and the number of duplicates of the image itself are each at or above a threshold, and the second popularity is taken into account only when the quality index of the image and the number of collections containing the image are each at or above a threshold.

Although the embodiment above describes the relevance calculator 170 as summing the quality index, popularity, recency, and text similarity directly, a modified embodiment may apply a weight to each of these values and then sum the weighted values. To this end, the image search system 100 may further include a weight calculator 175 that calculates a weight for each of the quality index, popularity, recency, and text similarity.

That is, the weight calculator 175 calculates a weight for each of the quality index, popularity, recency, and text similarity and provides the result to the relevance calculator 170, which applies the weights and then sums the resulting values to obtain the relevance of the image. In calculating the weight for the popularity, the weight calculator 175 may calculate separate weights for the first popularity information and the second popularity information.

Although the embodiment above describes the weight calculator 175 as calculating weights for all of the quality index, popularity, and recency, when not all of them are used to calculate the relevance, the weight calculator 175 may calculate weights only for those that are used.
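The weighted combination and the conditional use of popularity described above can be sketched as follows. The dictionary keys, weight values, and thresholds are hypothetical; only the structure (a weighted sum in which each popularity term is added when its conditions are met) follows the description.

```python
def relevance(img, weights, thresholds):
    """Weighted sum of signals; popularity terms are added conditionally."""
    score = weights["similarity"] * img["similarity"]
    score += weights["quality"] * img["quality"]
    score += weights["recency"] * img["recency"]

    # First popularity: quality, similarity and duplicate count must all
    # reach their thresholds.
    if (img["quality"] >= thresholds["quality"]
            and img["similarity"] >= thresholds["similarity"]
            and img["sig_dup_cnt"] >= thresholds["sig_dup_cnt"]):
        score += weights["popularity1"] * img["popularity1"]

    # Second popularity: quality and collection count must reach theirs.
    if (img["quality"] >= thresholds["quality"]
            and img["col_dup_cnt"] >= thresholds["col_dup_cnt"]):
        score += weights["popularity2"] * img["popularity2"]
    return score


weights = {"similarity": 1.0, "quality": 1.0, "recency": 1.0,
           "popularity1": 1.0, "popularity2": 1.0}
thresholds = {"quality": 0.3, "similarity": 50, "sig_dup_cnt": 3, "col_dup_cnt": 2}
img = {"similarity": 100, "quality": 0.5, "recency": 0.5,
       "popularity1": 2.0, "popularity2": 1.0,
       "sig_dup_cnt": 5, "col_dup_cnt": 1}

print(relevance(img, weights, thresholds))
```

In this example the first popularity term is included and the second is skipped, because the image appears in only one collection. Setting a weight to 0 corresponds to the modified embodiment in which that signal is not used.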

Next, the search result generator 180 generates image search results for the query entered by the user using the image relevance calculated by the relevance calculator 170, and provides the generated image search results to the user terminal 104 through the interface unit 150.

In one embodiment, the search result generator 180 may generate the image search results for the input query by sorting the images in order of the relevance calculated by the relevance calculator 170. In doing so, images whose relevance is at or below a reference value may be excluded from the search results.
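The result-generation step above amounts to a filter followed by a sort. A minimal sketch, with a hypothetical function name and reference value:

```python
def build_results(relevance_by_image, min_relevance):
    """Drop images at or below the reference value, then sort descending."""
    kept = [
        (image_id, score)
        for image_id, score in relevance_by_image.items()
        if score > min_relevance
    ]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [image_id for image_id, _ in kept]


scores = {"a": 0.2, "b": 0.9, "c": 0.5}
print(build_results(scores, min_relevance=0.2))
```

Filtering before sorting keeps the sort limited to images that will actually be shown.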

Hereinafter, an image search method according to an embodiment of the present invention is described in detail with reference to FIG. 4.

First, when a query is received through a user terminal (S400), the text similarity between the received query and each image is calculated (S410). In one embodiment, the text similarity of each image may be calculated, as in Equation 7 above, using the weights of the words contained in at least one of the title of the text associated with the image, the body of that text, and the link address of the image. The weights of the words associated with each image may be calculated in advance and stored in association with the corresponding image.

Here, the weights of the words contained in each image document may be calculated using a modified Poisson (2-Poisson) model. Specifically, the weight of each word may be calculated using the number of times the word occurs in the image document, the number of image documents in which the word occurs, the number of words contained in each image document, and the like.

Next, the relevance of each image to the query is calculated by summing at least one of the quality index, popularity, and recency of the image with its text similarity (S420). That is, the relevance of each image to the query is calculated either by summing all of the quality index, popularity, and recency with the text similarity, or by summing at least one of them with the text similarity.

In one embodiment, the popularity may be taken into account in calculating the relevance only when a predetermined condition is satisfied. For example, the first popularity is taken into account only when the quality index of the image, the text similarity, and the number of duplicates of the image itself are each at or above a threshold, and the second popularity is taken into account only when the quality index of the image and the number of collections containing the image are each at or above a threshold.

Although the embodiment above describes the quality index, popularity, recency, and text similarity as being summed directly, a modified embodiment may calculate a weight for each of the quality index, popularity, recency, and text similarity and then calculate the relevance by summing at least one of the weighted quality index, popularity, and recency with the weighted text similarity.

Meanwhile, the quality index, popularity, and recency of each image may be calculated in advance and stored for each image. The methods for calculating the image quality index, popularity, and recency are described in detail below.

First, the quality index of an image indicates the quality of the image and may be calculated using feature information extracted from the image. Describing this quality index calculation method in detail with reference to FIG. 5, feature information is first extracted from each image (S500). In one embodiment, at least one of image basic information, color distribution information, gray distribution information, and texture information may be extracted as the feature information of the image.

Here, the image basic information includes at least one of the size value of the image, the width-to-height ratio of the image, and the compression information of the image; the color distribution information includes at least one of color histogram information, color channel information, and geometric color distribution information; and the gray distribution information includes the gray histogram information of the image. The color histogram information, color channel information, geometric color distribution information, and gray histogram information may each include further items, which were described in detail above in connection with the image quality index calculation system and are therefore not repeated here.

Meanwhile, the texture information is a set of statistics on the spatial distribution of the gray values that make up the image, and in one embodiment may be calculated using a GLCM. Specifically, the present invention may calculate an energy value, an entropy value, a correlation value, an inertia value, and a homogeneity value from the GLCM as the texture information.

Next, the extracted feature information is used to determine whether the image is an abnormal image (S510). Here, an abnormal image is, for example, an image that consists only of a graph or a table, an image that consists only of black or white, or an image whose width-to-height ratio is below a threshold. The types of feature information used to determine whether the image is abnormal were described in detail above in connection with the abnormal image determiner and are therefore not repeated here.

If the image is determined to be an abnormal image, the final quality index of the image is set to 0 (S580). If the image is determined not to be an abnormal image, the weight calculated for each item of feature information is applied to that item (S520). In one embodiment, the per-feature weights may be calculated by building a training set from predetermined images and then applying logistic regression to the training set.

Although this embodiment describes the weight application step as essential, the step may be included optionally.

Next, the quality index of the image is calculated by summing two or more items of the weighted feature information (S530).

Then, it is determined whether particular items of the feature information extracted in step S500 fall within a predetermined range (S540); if they do, a predetermined penalty is applied to the quality index calculated in step S530. The penalty is applied because, when particular feature information falls within a predetermined range, the quality of the image may be very poor regardless of its other feature information, and the penalty reflects this in the image quality index. This penalty step may be included optionally. The types of feature information used to decide whether to apply the penalty were described in detail above in connection with the quality index computation unit and are therefore not repeated here.

If the determination in step S540 is that the particular feature information does not fall within the predetermined range, the quality index calculated in step S530 is taken as the final quality index of the image (S580).

Then, it is determined whether the penalized quality index is negative or at or below a reference value (S560). If so, the penalized quality index is compensated (S570), and the compensated quality index is taken as the final quality index of the image (S580). Because the penalty step described above is optional, the quality index compensation step is likewise optional.

If the determination in step S560 is that the penalized quality index is neither negative nor at or below the reference value, the penalized quality index is taken as the final quality index of the image (S580).

Second, the popularity of an image indicates the degree to which the image has been cited by users. In one embodiment, the popularity may include first popularity information indicating how often the image itself is duplicated and second popularity information indicating how many distinct sources contain the image.

Here, the first preference information may be calculated from the number of duplicates of the image itself (sig_dup_cnt); the more duplicates there are, the higher its value. The second preference information may be calculated from the number of collections that contain the image (col_dup_cnt), where a collection means a category such as blog, bulletin board, news, web, or dictionary. The second preference information likewise takes a higher value as the number of collections containing the image increases.
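As a rough illustration, the two preference components could be computed as below. The description only states that each component grows with its count; the logarithmic damping, the equal weights, and the function name `preference_score` are assumptions made for this sketch, while `sig_dup_cnt` and `col_dup_cnt` are the counts named in the text.

```python
import math


def preference_score(sig_dup_cnt, col_dup_cnt, w_image=0.5, w_source=0.5):
    """Combine the two preference components into one score.

    sig_dup_cnt : number of duplicates of the image itself
    col_dup_cnt : number of collections (blog, news, web, ...) containing it
    log1p is an assumed choice: it increases with the count, as the text
    requires, while damping very large counts.
    """
    if sig_dup_cnt < 0 or col_dup_cnt < 0:
        raise ValueError("counts must be non-negative")

    first_preference = math.log1p(sig_dup_cnt)    # duplication of the image
    second_preference = math.log1p(col_dup_cnt)   # duplication of its sources
    return w_image * first_preference + w_source * second_preference
```

Any other increasing function of the counts would satisfy the description equally well.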

Third, the freshness (recency) of an image indicates how recent the image is. The freshness of each image can be calculated using a monotonically decreasing function whose variables are the difference between the date on which the image was created and the current date, and a threshold date.
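One possible monotonically decreasing function is an exponential decay, sketched below. The exponential form, the interpretation of the threshold as a number of days, and the names `freshness` and `threshold_days` are assumptions; the description fixes only the inputs and the fact that the function decreases.

```python
import math
from datetime import date


def freshness(created, today, threshold_days=365):
    """Freshness in (0, 1]: 1.0 for an image created today, lower when older.

    created, today : datetime.date objects
    threshold_days : assumed reading of the 'threshold date' as a time scale;
                     an image this many days old scores exp(-1).
    """
    # Difference between creation date and current date, clamped at zero
    # so that a creation date in the future does not score above 1.0.
    age_days = max((today - created).days, 0)
    return math.exp(-age_days / threshold_days)
```

With this choice, a larger threshold makes freshness decay more slowly, so older images are penalized less.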

Referring again to FIG. 4, an image search result for the received query is generated using the relevance calculated in step S420 (S430). In one embodiment, the image search result may be generated by sorting the images in order of relevance, and images whose relevance is at or below a reference value may be excluded from the search result.
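Step S430 amounts to a filter followed by a sort, which might look like the following. The function name, the dictionary input, and the default reference value are hypothetical; the relevance values are assumed to have been computed already in step S420.

```python
from typing import Dict, List


def build_search_result(relevance_by_image: Dict[str, float],
                        min_relevance: float = 0.0) -> List[str]:
    """Return image ids sorted by descending relevance (step S430).

    Images whose relevance is at or below min_relevance are left out,
    mirroring the optional exclusion described in the text.
    """
    kept = [(image_id, relevance)
            for image_id, relevance in relevance_by_image.items()
            if relevance > min_relevance]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [image_id for image_id, _ in kept]
```

Calling `build_search_result({"a": 0.2, "b": 0.9, "c": 0.05}, min_relevance=0.1)` keeps `b` and `a`, in that order, and drops `c`.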

Finally, the generated image search result is provided to the user through the user terminal (S440).

The image search method described above may also be implemented as a program executable by various computer means. In that case, the program for performing the image search method is stored on a computer-readable recording medium such as a hard disk, CD-ROM, DVD, ROM, RAM, or flash memory.

Those skilled in the art to which the present invention pertains will understand that the present invention may be embodied in other specific forms without changing its technical spirit or essential features.

Therefore, the embodiments described above should be understood as illustrative in all respects and not restrictive. The scope of the present invention is defined by the claims below rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

FIG. 1 is a schematic block diagram of an image search system according to an embodiment of the present invention.

FIG. 2 is a block diagram showing an example of the quality index calculation unit shown in FIG. 1.

FIGS. 3A to 3E are views showing how the quality index calculation unit extracts feature information from an image.

FIG. 4 is a flowchart showing an image search method according to an embodiment of the present invention.

FIG. 5 is a flowchart showing a quality index calculation method according to an embodiment of the present invention.

<Description of reference numerals for the main parts of the drawings>

100: image search system 110: image database

120: quality index calculation unit 130: preference calculation unit

140: freshness calculation unit 150: interface unit

160: text similarity calculation unit 170: relevance calculation unit

175: weight calculation unit 180: search result generation unit

Claims

An image search method comprising:

calculating a quality index (quality value) of each image;

calculating a text similarity between a query and each image;

calculating a relevance of each image to the query using the quality index of each image and the text similarity of each image;

generating an image search result for the query using the relevance of the images; and

providing the image search result to a user,

wherein the method further comprises calculating a preference of each image,

wherein, in the relevance calculation step, the relevance is calculated by also considering the preference of each image, and

wherein the preference of each image comprises at least one of first preference information indicating a degree of duplication of the image itself and second preference information indicating a degree of duplication of the sources of the image.

(Deleted)

The method of claim 1, wherein, in calculating the text similarity of the image,

the text similarity of the image is calculated using the importance of the words included in the query within at least one of the body and the title associated with the image and the link address of the image.

The method of claim 3,

wherein the importance is calculated using a modified Poisson (2-Poisson) model.

The method of claim 1,

wherein the quality index of the image is generated using image feature information including at least one of basic image information, color distribution information, gray distribution information, and texture information extracted from each image.

(Deleted)

The method of claim 1,

wherein the first preference information is calculated using the number of duplicates of the image itself, and the second preference information is calculated using the number of sources containing the image.

The method of claim 8, wherein calculating the relevance comprises:

determining whether at least one of the text similarity, the quality index, the number of duplicates of the image itself, and the number of sources containing the image satisfies a predetermined condition; and

if the condition is satisfied, calculating the relevance by considering the first and second preference information.

The method of claim 1,

further comprising calculating a freshness (recency) of the image,

wherein, in the relevance calculation step, the relevance is calculated by also considering the freshness of the image.

The method of claim 10,

wherein the freshness of the image is calculated using the difference between the creation date of the image and the current date, and a threshold date.

The method of claim 1,

further comprising calculating weights for the text similarity of the image and the quality index of the image,

wherein, in the relevance calculation step, the relevance is calculated using the weighted quality index of the image and the weighted text similarity of the image.

A computer-readable recording medium having recorded thereon a program for performing the method according to any one of claims 1, 3 to 5, and 8 to 12.

An image search system comprising:

a quality index calculation unit for calculating a quality index of each image;

a text similarity calculation unit for calculating a text similarity between a query and each image;

a relevance calculation unit for calculating a relevance of each image to the query using the quality index of each image and the text similarity of each image;

an interface unit for receiving the query and providing an image search result corresponding to the query; and

a search result generation unit for generating the image search result for the query using the relevance of the images,

wherein the system further comprises a preference calculation unit for calculating a preference (popularity) of each image,

wherein the relevance calculation unit calculates the relevance by also considering the preference of each image, and

wherein the preference calculation unit calculates, as the preference information of the image, at least one of first preference information indicating a degree of duplication of the image itself and second preference information indicating a degree of duplication of the sources of the image.

(Deleted)

The system of claim 14,

wherein the text similarity calculation unit calculates the text similarity using the importance of the words included in the query within at least one of the body and the title associated with the image and the link address of the image.

The system of claim 14,

wherein the quality index calculation unit comprises: a feature information extraction unit for extracting basic image information, color distribution information, gray distribution information, and texture information from each image as image feature information; and

a quality index computation unit for generating the quality index of the image using the extracted image feature information.

(Deleted)

The system of claim 14,

wherein the preference calculation unit calculates the first preference information using the number of duplicates of the image itself, and calculates the second preference information using the number of sources containing the image.

The system of claim 20,

wherein the relevance calculation unit determines whether at least one of the text similarity, the quality index, the number of duplicates of the image itself, and the number of sources containing the image satisfies a predetermined condition and, if the condition is satisfied, calculates the relevance by considering the first and second preference information.

The system of claim 14,

further comprising a freshness calculation unit for calculating a freshness (recency) of each image,

wherein the relevance calculation unit calculates the relevance by also considering the freshness of each image.

The system of claim 22,

wherein the freshness calculation unit calculates the freshness of each image using the difference between the creation date of the image and the current date, and a threshold date.

The system of claim 14,

further comprising a weight calculation unit for calculating weights for the text similarity of each image and the quality index of the image,

wherein the relevance calculation unit calculates the relevance using the weighted quality index of the image and the weighted text similarity of the image.