KR102048638B1

KR102048638B1 - Method and system for recognizing content

Info

Publication number: KR102048638B1
Application number: KR1020180103990A
Authority: KR
Inventors: 박용식; 김상연; 김철연
Original assignee: 망고슬래브 주식회사
Priority date: 2018-08-31
Filing date: 2018-08-31
Publication date: 2019-11-25
Also published as: WO2020045714A1

Abstract

The present invention relates to a method for recognizing content, which is capable of increasing a content recognition rate. According to the present invention, the method for recognizing content comprises the following steps: receiving a first image including at least a part of first content; extracting first text information from the first image; extracting one or more partial images of a first set from the first image; calculating a text similarity score between the first and second images based on the first text information and second text information associated with a second image; calculating an image similarity score based on the one or more partial images of the first set and one or more partial images of a second set associated with the second image; calculating a total similarity score based on the text and image similarity scores; and determining that at least a part of the first content is included in the second image when the total similarity score is equal to or greater than a preset first threshold value.

Description

Content Recognition Method and System {METHOD AND SYSTEM FOR RECOGNIZING CONTENT}

The present disclosure relates to a method for recognizing content and a system therefor. More particularly, it relates to a content recognition method and system that can improve the content recognition rate by using text search and image search together.

Recently, with advances in information and communication technologies such as big data and AI, a wide variety of content materials have become easily accessible. Accordingly, there is a growing need for a search system that recognizes content and searches for the same or similar content.

Conventional image retrieval methods include a text-based retrieval method, which recognizes text included in content material and searches based on the extracted text information, and an image feature-based retrieval method, which uses image features included in the content material. The text-based retrieval method is the simplest and easiest way to recognize and search content material, but its recognition rate may degrade due to various distortions, and accurate search is difficult when the content includes many elements other than text.

The image feature-based retrieval method searches for similar images based on image features such as color and shape. For picture images, a relatively accurate search is possible, but for images that include text, search accuracy is low.
[Prior art documents]
1. Korean Registered Patent Publication No. 10-0670003 (2007.01.10)
2. Korean Laid-Open Patent Publication No. 10-2016-0057125 (2016.05.23)
3. Korean Laid-Open Patent Publication No. 10-2018-0039596 (2018.04.18)
4. US Patent No. 8,041,139 (2011.10.18)

Embodiments disclosed herein relate to a method, and a system therefor, that can increase the content recognition rate by recognizing content received through a user terminal using text search and image search together.

A method for recognizing content according to an embodiment of the present disclosure may include: receiving a first image including at least a portion of first content; extracting first text information from the first image; extracting a first set of one or more partial images from the first image; calculating a text similarity score between the first image and a second image based on the first text information and second text information associated with the second image; calculating an image similarity score based on the first set of one or more partial images and a second set of one or more partial images associated with the second image; calculating a composite similarity score based on the text similarity score and the image similarity score; and determining that at least a portion of the first content is included in the second image when the composite similarity score is greater than or equal to a preset first threshold.

In a non-transitory computer-readable storage medium storing instructions for recognizing content according to an embodiment of the present disclosure, the instructions, when executed by a processor, may cause the processor to: receive a first image including at least a portion of first content; extract first text information from the first image; extract a first set of one or more partial images from the first image; calculate a text similarity score between the first image and a second image based on the first text information and second text information associated with the second image; calculate an image similarity score based on the first set of one or more partial images and a second set of one or more partial images associated with the second image; calculate a composite similarity score between the first image and the second image based on the text similarity score and the image similarity score; and determine that at least a portion of the first content is included in the second image when the composite similarity score is greater than or equal to a preset first threshold.

A content recognition system according to an embodiment of the present disclosure may include: a communication unit configured to receive a first image including at least a portion of first content; an OCR system configured to extract first text information from the first image; an object recognition system configured to extract a first set of one or more partial images from the first image; a text search system configured to calculate a text similarity score between the first image and a second image based on the first text information and second text information associated with the second image; an image search system configured to calculate an image similarity score based on the first set of one or more partial images and a second set of one or more partial images associated with the second image; and a search system configured to calculate a composite similarity score based on the text similarity score and the image similarity score. The search system may determine that at least a portion of the first content is included in the second image when the composite similarity score is greater than or equal to a preset first threshold.

A method for recognizing content according to an embodiment of the present disclosure may include: receiving first text information associated with a first image and data on a first set of one or more partial images, wherein the first image includes at least a portion of first content; calculating a text similarity score between the first image and a second image based on the first text information and second text information associated with the second image; calculating an image similarity score based on the first set of one or more partial images and a second set of one or more partial images associated with the second image; calculating a composite similarity score based on the text similarity score and the image similarity score; and determining that at least a portion of the first content is included in the second image when the composite similarity score is greater than or equal to a preset first threshold. The image similarity score includes one or more partial image similarity scores, the composite similarity score is calculated by applying weights to the text similarity score and the one or more partial image similarity scores, and the weights may be applied differently depending on the type of object in each partial image.

According to various embodiments of the present disclosure, both text search and image search are performed on an image received through a user terminal, so that the two searches complement each other. Various distortions can therefore be overcome during content recognition, and the recognition rate can be improved. In addition, even if the positions of text and objects in the image are changed by re-editing, the image can still be recognized as the same content.

The effects of the present disclosure are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the description of the claims.

Embodiments of the present disclosure will be described with reference to the accompanying drawings described below, in which like reference numerals denote like elements, but are not limited thereto.
FIG. 1 is a diagram illustrating an environment in which a content recognition system recognizes content at the request of a user terminal according to an embodiment of the present disclosure.
FIG. 2 is a block diagram illustrating a detailed configuration of a content recognition system according to an embodiment of the present disclosure.
FIG. 3 is a diagram illustrating an example of a first image captured by a user terminal according to an embodiment of the present disclosure.
FIG. 4 is a diagram illustrating an example of recognizing text in a first image according to an embodiment of the present disclosure.
FIG. 5 is a diagram illustrating a table of scores calculated for candidate words for the word "게수" among the text recognized in FIG. 4.
FIG. 6 is a diagram illustrating an example of correcting text recognized in a first image based on a dictionary DB and a language model.
FIG. 7 is a diagram illustrating an example of calculating a text similarity score by comparing first text information extracted from a first image with second text information associated with a second image stored in a content DB.
FIG. 8 is a diagram illustrating an example of a first image from which text line regions have been removed.
FIG. 9 is a diagram illustrating an example of calculating an image similarity score according to an embodiment of the present disclosure.
FIGS. 10 to 12 are diagrams illustrating examples of calculating an image similarity score according to another embodiment of the present disclosure.
FIG. 13 is a flowchart illustrating a content recognition method according to an embodiment of the present disclosure.

Hereinafter, specific details for carrying out the present disclosure will be described in detail with reference to the accompanying drawings. In the following description, however, detailed descriptions of well-known functions or configurations will be omitted where they might unnecessarily obscure the subject matter of the present disclosure.

In the accompanying drawings, identical or corresponding components are given the same reference numerals. In the following description of the embodiments, repeated descriptions of identical or corresponding components may be omitted. However, omitting the description of a component is not intended to mean that the component is excluded from any embodiment.

In the present disclosure, "first image" may refer to an image captured or input by a user through a user terminal. In the present disclosure, "second image" may refer to any image stored in a database. In the present disclosure, "content image" may refer to an image including at least a portion of content.

FIG. 1 is a diagram illustrating an environment in which a content recognition system 130 recognizes content at the request of user terminals 110_1 to 110_n according to an embodiment of the present disclosure. According to an embodiment, one or more user terminals 110_1 to 110_n may communicate with the content recognition system 130 through a communication network 120. Depending on the installation environment, the communication network 120 may be configured as a wired network such as Ethernet, a wired home network, power line communication, a telephone line network, or RS-serial communication, or as a wireless network such as WLAN (Wireless LAN), Bluetooth, or ZigBee.

The user terminals 110_1 to 110_n may photograph content using a camera module (not shown) and may transmit data on the captured image to the content recognition system 130 through the communication network 120. Alternatively, the user terminals 110_1 to 110_n may transmit data on an internally stored image to the content recognition system 130. In an embodiment, the user terminals 110_1 to 110_n may include a smartphone, a tablet PC, a notebook computer, a PDA (Personal Digital Assistant), a mobile communication terminal, and the like, but are not limited thereto, and may be any device equipped with a camera module and/or a communication module.

The content recognition system 130 is a server device for recognizing content in an image transmitted by the user terminals 110_1 to 110_n, and may include a communication unit 140, a processor 150, and a database 160. The communication unit 140 may receive image data from the user terminals 110_1 to 110_n through the communication network 120. The processor 150 may receive the image data from the communication unit 140 and process it.

In an embodiment, the processor 150 may include at least one of a CPU (central processing unit), a GPU (graphic processing unit), and a DSP (digital signal processor) to perform computation. The processor 150 may extract text information and partial images from the image data and perform text search, object recognition, image search, and the like, to search the content images stored in the database 160 for content that is the same as or similar to the content included in the received image data. The process by which the content recognition system 130 recognizes content in the database 160 that is the same as or similar to the received content image is described in detail with reference to FIG. 2.

FIG. 2 is a block diagram illustrating a detailed configuration of a content recognition system 200 according to an embodiment of the present disclosure. As shown in FIG. 2, the content recognition system 200 may include the communication unit 140, the database 160, and the processor 150. Among the functions or components of the content recognition system 200, detailed descriptions of those having the same reference numerals or names as described with reference to FIG. 1 may be omitted to avoid repetition, and only changed or additional parts may be described.

The processor 150 may include an OCR system 210, an object recognition system 220, and a search system 230. The search system 230 may include a text search system 232 and an image search system 234. The database 160 may include a content DB 162, a dictionary DB 164, and a language model 166.

The communication unit 140 of the content recognition system 200 may communicate with an external device such as a user terminal through the communication network 120. According to an embodiment, the content recognition system 200 may be configured such that the communication unit 140 receives an image and the processor 150 compares the received image with the content images stored in the database 160 to search for an image containing the same or similar content. The retrieved image containing the same or similar content may be transmitted to the user terminal through the communication network 120. In an embodiment, the content included in the image may be learning content such as a math problem, but is not limited thereto, and may be any content that includes text and non-text objects (pictures, figures, graphs, formulas, etc.).

Specifically, the communication unit 140 may receive a first image from a user terminal through the communication network 120. Here, the first image may be an image captured by the user with the user terminal or an image stored in the user terminal, and the first image may include at least a portion of first content. The communication unit 140 may provide the first image received from the user terminal to the processor 150.

The OCR system 210 may be configured to extract text information from an image. In an embodiment, the OCR system 210 may extract first text information from the first image transmitted by the user terminal. To this end, the OCR system 210 may detect one or more text line regions in the first image. To increase the text line region detection rate and the text recognition rate, the OCR system 210 may perform image preprocessing such as binarization and contrast processing to remove the background and make the text stand out. The image similarity score and the partial image similarity scores calculated later may also be based on the preprocessed image.

The OCR system 210 may then recognize the text in the detected text line regions using optical character recognition (OCR) technology. To increase the text recognition rate, each text line region image may be resized to an optimal size. For example, the text line region image may be resized so that the text height corresponds to a font size of 20.
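The resizing step can be illustrated with simple arithmetic. The sketch below assumes that the height of the detected text is known in pixels and reads "font size 20" as a target text height of 20 pixels; the function name and that interpretation are assumptions for illustration, not details given in the disclosure.

```python
def resized_dimensions(width, height, text_height, target_text_height=20):
    """Return (new_width, new_height) for a text line region image, scaled so
    that text of `text_height` pixels becomes `target_text_height` pixels tall.

    The 20-pixel default is an assumed reading of "font size 20".
    """
    if text_height <= 0:
        raise ValueError("text_height must be positive")
    scale = target_text_height / text_height
    # Round to whole pixels and never collapse a dimension to zero.
    return max(1, round(width * scale)), max(1, round(height * scale))


# A 600x48 line region whose text is 40 px tall is scaled by 0.5.
print(resized_dimensions(600, 48, 40))
```

The resulting dimensions would then be passed to whatever image library performs the actual resampling.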

The OCR system 210 may correct the recognized text based on the dictionary DB 164 and the language model 166. The dictionary DB 164 and the language model 166 may be data prepared to suit the language and content domain of the text to be recognized. For example, when the content to be recognized is Korean-language mathematics learning content, the dictionary DB 164 may include words that frequently appear in such content, and the language model 166 may be a language model trained on such content.

According to an embodiment, the OCR system 210 may correct the recognized text word by word. In this case, the OCR system 210 may determine one or more candidate words for each word based on the dictionary DB 164. For example, the OCR system 210 may use an edit distance algorithm to calculate the edit distance between a recognized word and each word stored in the dictionary DB 164, and determine the words whose edit distance is within a preset threshold to be candidate words. Here, the edit distance is the minimum number of operations required to make the recognized word identical to a word stored in the dictionary DB 164.
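As a minimal sketch of the candidate selection described above, the following code computes the Levenshtein edit distance (insertions, deletions, substitutions) and keeps dictionary words within a threshold. The disclosure does not name a specific edit distance variant, so the use of Levenshtein distance, the function names, the sample dictionary, and the threshold value are all assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn `a` into `b`."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def candidate_words(recognized, dictionary, max_distance):
    """Dictionary words whose edit distance to `recognized` is within the threshold."""
    return [w for w in dictionary if edit_distance(recognized, w) <= max_distance]


# Hypothetical dictionary and a misrecognized word ("i" read as "l").
dictionary = ["function", "fraction", "junction", "equation"]
print(candidate_words("functlon", dictionary, max_distance=2))
```

A larger threshold admits more candidates and leaves more of the disambiguation to the language model.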

After selecting the candidate words, the OCR system 210 may calculate a score for each candidate word based on the language model 166. When calculating the score, the OCR system 210 may refer to the words that precede the word in question and the words that follow it. For example, the language model 166 may be an RNN (Recurrent Neural Network) language model, and may refer to the 5 to 10 words before the word and the 2 or 3 words after it. The language model 166 is not limited to an RNN language model and may be implemented in various ways without departing from the scope of the present disclosure.

When the score of the highest-scoring candidate word is greater than or equal to a preset threshold, the OCR system 210 may select that candidate word as the final word. When the score of the highest-scoring candidate word is below the preset threshold, the OCR system 210 may delete the word and exclude the region containing it from the text line region. The text information in the first image may be extracted by repeating this process for each word. Through this process, misrecognized text is corrected, and portions that were wrongly extracted as text line regions despite containing no text can be excluded from the text line regions.
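The selection rule above can be sketched as follows. The scoring function stands in for the trained language model, which is not reproduced here: `toy_score` is a hypothetical lookup table, and the context words and threshold values are made up for illustration.

```python
def select_final_word(candidates, score_fn, left_context, right_context, threshold):
    """Return the highest-scoring candidate, or None when even the best
    candidate scores below `threshold` (the caller would then delete the word
    and exclude its region from the text line region)."""
    if not candidates:
        return None
    scored = [(score_fn(left_context, c, right_context), c) for c in candidates]
    best_score, best_word = max(scored, key=lambda pair: pair[0])
    return best_word if best_score >= threshold else None


def toy_score(left_context, word, right_context):
    # Stand-in for a language model: fixed, made-up scores that ignore context.
    table = {"function": 0.92, "junction": 0.03}
    return table.get(word, 0.0)


left, right = ["the", "quadratic"], ["is"]
print(select_final_word(["function", "junction"], toy_score, left, right, threshold=0.5))
print(select_final_word(["function", "junction"], toy_score, left, right, threshold=0.95))
```

Passing the scorer as a parameter keeps the thresholding logic independent of how the language model is implemented.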

The object recognition system 220 may extract a first set of one or more partial images from the first image received from the user terminal. Specifically, the object recognition system 220 may remove the text line regions from the first image and use a clustering technique to extract the first set of one or more partial images from the first image. For example, a technique such as DBSCAN (Density-Based Spatial Clustering of Applications with Noise) may be used to extract one or more partial images, separated by blank margins, from the first image with the text lines removed. In addition, the object recognition system 220 may extract the whole first image and include it, as a partial image, in the first set of one or more partial images.
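To illustrate how density-based clustering can separate objects by the blank space between them, the sketch below runs a small brute-force DBSCAN over the coordinates of foreground pixels and returns one bounding box per cluster. It assumes the text lines have already been removed and the image is binarized; the parameter values (`eps`, `min_pts`), the function names, and the sample pixel layout are illustrative assumptions rather than values from the disclosure.

```python
def dbscan(points, eps, min_pts):
    """Minimal DBSCAN over 2D points.
    Returns one label per point: -1 for noise, otherwise a cluster id from 0."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if (xi - xj) ** 2 + (yi - yj) ** 2 <= eps ** 2]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE            # may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # border point: joins, does not expand
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # core point: keep expanding
    return labels


def bounding_boxes(points, labels):
    """One (x_min, y_min, x_max, y_max) box per cluster, ordered by cluster id."""
    boxes = {}
    for (x, y), label in zip(points, labels):
        if label == -1:
            continue
        x0, y0, x1, y1 = boxes.get(label, (x, y, x, y))
        boxes[label] = (min(x0, x), min(y0, y), max(x1, x), max(y1, y))
    return [boxes[k] for k in sorted(boxes)]


# Two 3x3 blobs of foreground pixels separated by blank space, plus a stray pixel.
blob_a = [(x, y) for x in range(3) for y in range(3)]
blob_b = [(x, y) for x in range(10, 13) for y in range(10, 13)]
pixels = blob_a + blob_b + [(30, 30)]
labels = dbscan(pixels, eps=1.5, min_pts=3)
print(bounding_boxes(pixels, labels))
```

Each bounding box would be cropped from the image as one partial image; the stray pixel is labeled as noise and ignored. The brute-force neighbor search is quadratic in the number of points, so a real image would call for a spatial index or downsampling.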

The object recognition system 220 may then recognize the objects in the first set of one or more partial images extracted from the first image. For example, the object recognition system 220 may recognize an object in a partial image using a deep learning-based classification model. Here, the recognized object may include at least one of a formula, a graph, a figure, and a picture. The whole image extracted as a partial image may be tagged as a whole image without separately performing object recognition.

The search system 230 may be configured to calculate a composite similarity between the first image transmitted from the user terminal and a second image stored in the content DB 162, based on the text similarity score calculated by the text search system 232 and the image similarity score calculated by the image search system 234. In an embodiment, the content DB 162 may include a plurality of content images and may also store information associated with each content image (text information, information on partial images, information on objects included in the partial images, etc.).

The text search system 232 may calculate a text similarity score between the first image and the second image based on the first text information extracted from the first image and the second text information associated with the second image stored in the content DB 162. In one embodiment, the text search system 232 may calculate the text similarity score by comparing the full text of the first text information and the second text information. In another embodiment, the text search system 232 may calculate the text similarity score by comparing keywords in the first text information and the second text information.
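The disclosure does not specify how the keyword comparison is scored. As one possible illustration, the sketch below treats each whitespace-separated token as a keyword and uses the Jaccard index (shared keywords divided by all distinct keywords) as the text similarity score. Both the tokenization and the choice of the Jaccard index are assumptions.

```python
def keyword_similarity(first_text, second_text):
    """Jaccard index over lower-cased, whitespace-separated tokens.
    Returns a value in [0, 1]; 0.0 when both texts are empty."""
    a = set(first_text.lower().split())
    b = set(second_text.lower().split())
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


query_text = "Find the slope of the line"
stored_text = "Find the slope of the curve"
# 4 shared keywords out of 6 distinct keywords.
print(round(keyword_similarity(query_text, stored_text), 3))
```

Because the score depends only on the set of keywords, it is unaffected by the order or position of the text in the image.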

The image search system 234 may calculate an image similarity score based on the one or more partial images of the first set extracted from the first image and the one or more partial images of the second set associated with the second image stored in the content DB 162. Here, the image similarity score may include one or more partial image similarity scores, and the image similarity score may be calculated as a weighted sum of the partial image similarity scores. In one embodiment, the image search system 234 may calculate partial image similarity scores between partial images having the same object type. In another embodiment, the image search system 234 may calculate partial image similarity scores between the partial images with the highest similarity. In addition, an entire image extracted as a partial image may be compared with another entire image to calculate a partial image similarity score.

A partial image similarity score may be calculated based on image features extracted from the two partial images. For example, the image features may be extracted using techniques such as SIFT (Scale-Invariant Feature Transform), SURF (Speeded-Up Robust Features), BRIEF (Binary Robust Independent Elementary Features), and ORB (Oriented FAST and Rotated BRIEF).
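
BRIEF and ORB produce binary descriptors, which are commonly compared by Hamming distance. The sketch below assumes the descriptors have already been extracted and are given as integers. The scoring rule (the share of first-image descriptors that have a close match in the second image) and the names `partial_image_similarity` and `max_distance` are illustrative assumptions, not a rule stated in the description.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two binary descriptors."""
    return bin(a ^ b).count("1")


def partial_image_similarity(first_desc: list[int], second_desc: list[int],
                             max_distance: int = 2) -> float:
    """Score 0-100: share of first-image descriptors with a close match.

    Assumption: a descriptor counts as matched when its nearest neighbour
    in the second image lies within `max_distance` bits.
    """
    if not first_desc or not second_desc:
        return 0.0
    matched = 0
    for descriptor in first_desc:
        nearest = min(hamming_distance(descriptor, other) for other in second_desc)
        if nearest <= max_distance:
            matched += 1
    return 100.0 * matched / len(first_desc)
```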

In some embodiments, the search system 230 may calculate the total similarity score by applying weights to the text similarity score and the one or more partial image similarity scores. For example, the total similarity score may be calculated by the following equation.

$$S = w_t S_t + \sum_{i=1}^{n} w_i S_i$$

Here, $S_t$ denotes the text similarity score, $S_i$ denotes the $i$-th partial image similarity score, $S$ denotes the total similarity score, $w_t$ denotes the weight applied to the text similarity score, $w_i$ denotes the weight applied to the $i$-th partial image similarity score, and $n$ denotes the number of partial image similarity scores. In one embodiment, the weight $w_t$ applied to the text similarity score may be greater than the weight $w_i$ applied to a partial image similarity score.

In addition, the weight $w_i$ applied to a partial image similarity score may differ according to the object type in the partial image (for example, equation, graph, figure, picture, or entire image). For example, a higher weight may be applied when the object in the partial image is a picture than when it is a figure, and a higher weight may be applied when the object is a figure than when it is a graph. Likewise, a higher weight may be applied when the object is a graph than when it is an equation, and a higher weight may be applied when the object is an equation than when the partial image is an entire image.

The search system 230 may determine the relationship between the first image and the second image (same content, similar content, dissimilar content, etc.) based on the calculated total similarity score. For example, when the total similarity score is equal to or greater than a preset first threshold, the first image and the second image may be determined to include the same content. When the total similarity score is equal to or greater than a preset second threshold and less than the first threshold, the first image and the second image may be determined to include similar content. On the other hand, when the total similarity score is less than the preset second threshold, the first image and the second image may be determined to include dissimilar content.
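
A minimal sketch of the weighted scoring and the two-threshold decision described above. The function names `total_similarity` and `classify_relationship`, the label strings, and the concrete threshold values used when calling them are assumptions for illustration; the description only requires that the second threshold be lower than the first.

```python
def total_similarity(text_score: float, text_weight: float,
                     partial_scores: list[float],
                     partial_weights: list[float]) -> float:
    """Weighted sum of the text score and the partial image scores."""
    if len(partial_scores) != len(partial_weights):
        raise ValueError("each partial image score needs exactly one weight")
    image_part = sum(w * s for w, s in zip(partial_weights, partial_scores))
    return text_weight * text_score + image_part


def classify_relationship(score: float, first_threshold: float,
                          second_threshold: float) -> str:
    """Map a total similarity score to same / similar / dissimilar."""
    if second_threshold > first_threshold:
        raise ValueError("second threshold must not exceed the first")
    if score >= first_threshold:
        return "same"
    if score >= second_threshold:
        return "similar"
    return "dissimilar"
```

For example, with hypothetical thresholds of 80 and 60, a score of 70 would be classified as similar content.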

According to the process described above, the search system 230 may compare the first image transmitted from the user terminal with all images stored in the content DB 162 to search for identical or similar content. In one embodiment, when content identical or similar to the first image is found, the content recognition system 200 may transmit an image including that content to the user terminal. In addition, one or more of the associated text information, the information about the partial images, and the information about objects included in the partial images may be provided together with the image. Additionally or alternatively, the content recognition system 200 may provide the user terminal with information associated with the identical or similar content found (grade information, difficulty information, subject information, etc.). In another embodiment, when no content identical to the first image is found, the content recognition system 200 may store the first image in the content DB 162 together with the associated text information, the information about the partial images, the information about objects included in the partial images, and the like.
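
The search flow above can be sketched as a loop over the stored records. The record layout (plain dictionaries), the `score_fn` callback that returns a total similarity score, and the function name `search_content_db` are hypothetical; the sketch only illustrates comparing against every stored image and storing the query when no identical content is found.

```python
from typing import Callable


def search_content_db(query: dict, content_db: list[dict],
                      score_fn: Callable[[dict, dict], float],
                      first_threshold: float,
                      second_threshold: float) -> list[tuple[dict, float]]:
    """Return same-or-similar records, best first; store the query if new."""
    hits: list[tuple[dict, float]] = []
    found_same = False
    for record in content_db:
        score = score_fn(query, record)
        if score >= second_threshold:   # similar or same content
            hits.append((record, score))
        if score >= first_threshold:    # same content
            found_same = True
    if not found_same:
        # No identical content was found, so the query becomes a new record.
        content_db.append(query)
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits
```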

Although FIG. 2 illustrates the content recognition system 200 as including the OCR system 210 and the object recognition system 220, the configuration is not limited thereto, and the user terminal may instead include the OCR system 210 and the object recognition system 220. In this case, the user terminal may transmit to the content recognition system 200 the first text information extracted from the first image, the one or more partial images of the first set extracted from the first image, and information about the object included in each partial image.

As described above, the content recognition system 200 is configured to search the content DB 162 in consideration of the text similarity score, the partial image similarity scores, and the object information in the partial images, so that it can recognize content as the same content even when the position of text or objects within the content has been changed by re-editing. In addition, because the text search system 232 and the image search system 234 complement each other, it is possible to overcome the various distortions contained in an image captured by the user terminal (differences in editing, differences in shooting conditions, etc.) and to improve the content recognition rate. An example of the operation of the content recognition system 200 is described in detail with reference to FIGS. 3 to 9.

FIG. 3 is a diagram illustrating an example of a first image 310 captured by a user terminal according to an embodiment of the present disclosure. As shown in FIG. 3, a user may photograph content using a user terminal (for example, a smartphone), store the first image 310, and transmit it to the content recognition system through a communication network. Here, the first image 310 is an image of a math problem.

FIG. 4 is a diagram illustrating an example of recognizing text in the first image 310 according to an embodiment of the present disclosure. The first image 310 includes math problem content. The OCR system may detect text line regions 410 to 460 in the first image 310. In one embodiment, to increase the text line region detection rate and the text recognition rate, the OCR system may perform image preprocessing such as binarization and contrast processing before detecting the text line regions, thereby removing the background and making the text stand out.

Thereafter, the OCR system may recognize the text in the text line regions 410 to 460. As shown in FIG. 4, for the equation portion, the text line region 460 is not properly detected, so the text may be misrecognized as "g")..― ln(.r+l) '". As this shows, for equations it is difficult to detect the text line region properly because of integrals, derivatives, matrices, and the like, so the OCR system often outputs incorrect text recognition results. In addition, even in the properly detected text line regions 410 and 420, some text may be misrecognized, as with "게수" and "그간".

FIG. 5 is a diagram illustrating a table 500 of scores calculated for candidate words for the word "게수" in the text recognized in FIG. 4. The OCR system may determine candidate words for "게수" based on a dictionary DB. In one embodiment, the OCR system may determine the candidate words by searching the dictionary DB for words whose edit distance from "게수" is within a preset threshold.

For example, as illustrated, the words "게수", "계수", "개수", and "계사" may be found as words within an edit distance of 2 and determined to be candidate words. For "게수", the edit distance is 0. For "계수", "ㅔ" is replaced with "ㅖ", so the edit distance is 1 by one replacement operation. Likewise, "개수" has an edit distance of 1 by one replacement operation, and "계사" has an edit distance of 2. Here, an insertion operation is an operation that adds a new character to a string, and a replacement operation is an operation that replaces a character in a string with a new character. A deletion operation is an operation that deletes a character from a string, and a transposition operation is an operation that swaps the order of two adjacent characters in a string.
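
The edit distances above are counted per jamo (for example, "ㅔ" to "ㅖ") rather than per syllable. A minimal sketch, assuming Unicode NFD normalization is an acceptable way to split Hangul syllables into jamo and using the optimal-string-alignment variant of the edit distance so that all four operations named above are covered; the function names `to_jamo` and `edit_distance` are illustrative.

```python
import unicodedata


def to_jamo(word: str) -> str:
    """Decompose Hangul syllables into conjoining jamo (Unicode NFD)."""
    return unicodedata.normalize("NFD", word)


def edit_distance(source: str, target: str) -> int:
    """Edit distance over jamo with insert, delete, replace and transpose."""
    a, b = to_jamo(source), to_jamo(target)
    rows, cols = len(a) + 1, len(b) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete every character of the prefix
    for j in range(cols):
        dist[0][j] = j  # insert every character of the prefix
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # replacement (or match)
            )
            if (i > 1 and j > 1 and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                # transposition of two adjacent characters
                dist[i][j] = min(dist[i][j], dist[i - 2][j - 2] + 1)
    return dist[-1][-1]


candidates = ["게수", "계수", "개수", "계사"]
distances = {word: edit_distance("게수", word) for word in candidates}
```

Under these assumptions, `distances` holds 0, 1, 1, and 2 for the four candidates, matching the values given in the text.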

After selecting one or more candidate words, the OCR system may calculate a score for each candidate word based on a language model. When calculating the score, the OCR system may refer to the words that precede the word in question and the words that follow it. For example, the language model may be an RNN (Recurrent Neural Network) language model and may refer to the 5 to 10 words before the word and the 2 to 3 words after it.

Referring to FIG. 4, the only word preceding "게수" in the first text information is "이차항" (quadratic term), and the following words may be "이차함수" (quadratic function), "f(x)", "함수" (function), and so on. Among the candidate words, "계수" (coefficient) is strongly associated with the preceding word "이차항" and the following words "이차함수", "f(x)", and "함수", so the OCR system may calculate a score of 95 points for it based on the language model, as illustrated. Conversely, the remaining candidate words are not strongly associated with the preceding and following words, so they may be given relatively low scores of 45, 55, and 35 points.

When the score of the highest-scoring candidate word is equal to or greater than a preset threshold, that candidate word may be selected as the final word. On the other hand, when the score of the highest-scoring candidate word is less than the preset threshold, the word may be deleted from the recognized text, and the region containing the word may be excluded from the text line regions. For example, when the preset threshold is 80 points, "계수", which has the highest score among the candidate words for "게수", scores 95 points, which is above the threshold, so "계수" may be selected as the final word.
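
The selection rule above can be sketched as follows. The language-model scores are taken as given; the score of 95 for "계수" and the threshold of 80 come from the text, while the assignment of 45, 55, and 35 to the individual remaining candidates is assumed here for illustration. The function name `select_final_word` is hypothetical, and `None` stands for "delete the word and exclude its region".

```python
from typing import Optional


def select_final_word(candidate_scores: dict[str, float],
                      threshold: float) -> Optional[str]:
    """Return the best candidate if it reaches the threshold, else None."""
    if not candidate_scores:
        return None
    best_word = max(candidate_scores, key=candidate_scores.get)
    if candidate_scores[best_word] >= threshold:
        return best_word
    return None  # caller deletes the word and drops its text line region


# Assumed mapping of the low scores to candidates (illustrative only).
scores = {"게수": 45, "계수": 95, "개수": 55, "계사": 35}
final_word = select_final_word(scores, threshold=80)
```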

FIG. 6 is a diagram illustrating an example of correcting the text recognized in the first image 310 based on the dictionary DB and the language model. As described above, the recognized text may be corrected by repeating the correction process of FIG. 5 for every word in the text recognized in the first image 310. For example, as shown in FIG. 6, "게수" and "그간" may be corrected to "계수" (coefficient) and "구간" (interval) based on the dictionary DB and the language model. In addition, for "g")..― ln(.r+l) '", the highest-scoring candidate word is determined to score below the preset threshold, so the text is deleted from the recognized text and the corresponding region 460 may be excluded from the text line regions. Through this process, the first text information may be extracted from the first image 310.

FIG. 7 is a diagram illustrating an example of calculating a text similarity score by comparing the first text information extracted from the first image 310 with the second text information associated with a second image 710 stored in the content DB. The second text information may be stored in the content DB in association with the second image 710. As illustrated, the first image 310 and the second image 710 contain the same math problem content, but because of a difference in editing the second image 710 is wider, so the overall layout differs. Although the first image 310 and the second image 710 differ in overall layout, they contain the same text, so the text search engine may calculate a high text similarity score for the first image 310 and the second image 710. For example, the text similarity score of the first image 310 and the second image 710 may be 95 points.

FIG. 8 is a diagram illustrating an example of a first image 810 from which the text line regions 410 to 450 have been removed. The object recognition system may remove the text line regions 410 to 450 from the first image 310. As described above, the region 460 shown in FIG. 4 has been excluded from the text line regions and is therefore not removed from the first image 310.

The object recognition system may extract the one or more partial images of the first set from the first image 810 from which the text line regions 410 to 450 have been removed. As illustrated, a partial image 820 containing an equation may be extracted from the first image 810 from which the text line regions 410 to 450 have been removed. In addition, the first image 310, from which the text line regions have not been removed, may also be extracted as a partial image. Accordingly, for the first image 310, the first set of partial images may include the partial image 820 containing the equation and the first image 310 itself.

Then, the object recognition system may recognize the object in the partial image 820. For example, the object in the partial image 820 may be recognized as an equation. The first image 310 may be recognized by the object recognition system as an entire image.

FIG. 9 is a diagram illustrating an example of calculating an image similarity score according to an embodiment of the present disclosure. The image search system may calculate the image similarity score based on the first set of partial images 310 and 820 extracted from the first image 310 and the second set of partial images 710 and 910 associated with the second image 710. The second set of partial images 710 and 910 may be stored in the content DB in association with the second image 710.

In one embodiment, the image search system may calculate partial image similarity scores by comparing partial images that contain objects of the same type (for example, equation, graph, figure, picture, or entire image). For example, the image search system may calculate a partial image similarity score between the first image 310 and the second image 710, which are entire images, and a partial image similarity score between the partial image 820 and the partial image 910, which contain equation objects. In another embodiment, the image search system may calculate partial image similarity scores by comparing the partial images that are most similar to each other. For example, the image search system may compare the first image 310 with both the second image 710 and the partial image 910 and calculate the partial image similarity score with the second image 710, which is the more similar of the two.
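
The two pairing strategies above can be sketched as follows. Partial images are represented here as `(object type, image id)` tuples, and the `similarity` callback is a hypothetical stand-in for the feature-based partial image similarity; the function names are illustrative assumptions.

```python
from typing import Callable

PartialImage = tuple[str, str]  # (object type, image id)


def pair_by_object_type(first_set: list[PartialImage],
                        second_set: list[PartialImage]
                        ) -> list[tuple[PartialImage, PartialImage]]:
    """Pair partial images whose recognized object types are equal."""
    return [(a, b) for a in first_set for b in second_set if a[0] == b[0]]


def pair_by_best_match(first_set: list[PartialImage],
                       second_set: list[PartialImage],
                       similarity: Callable[[PartialImage, PartialImage], float]
                       ) -> list[tuple[PartialImage, PartialImage]]:
    """Pair each first-set image with its most similar second-set image."""
    if not second_set:
        return []
    pairs = []
    for a in first_set:
        best = max(second_set, key=lambda b: similarity(a, b))
        pairs.append((a, best))
    return pairs
```

With the FIG. 9 images, both strategies would be expected to pair image 310 with image 710 and image 820 with image 910.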

As illustrated, the first partial image similarity score between the first image 310 and the second image 710 may be calculated to be low, because the two images contain the same math problem content but differ in layout. For example, the first partial image similarity score may be 50 points. Because the partial image 820 and the partial image 910 contain the same equation, the second partial image similarity score between the partial image 820 and the partial image 910 may be calculated to be high. For example, the second partial image similarity score may be 90 points.

In one embodiment, the image similarity score may be calculated as a weighted sum of the calculated partial image similarity scores. The weight applied to a partial image similarity score may differ according to the object type in the partial image (equation, graph, figure, picture, or entire image). For example, a higher weight may be applied when the object type is an equation than when the object type is an entire image. Accordingly, the image similarity score may be calculated to be high even though the content layout differs between the first image 310 and the second image 710.

Thereafter, the search system may calculate the total similarity score of the first image 310 and the second image 710 based on the text similarity score and the image similarity score. In one embodiment, the weight applied to the text similarity score may be greater than the weight applied to a partial image similarity score. For example, the total similarity score of the first image 310 and the second image 710 may be calculated by the following equation.

$$S = w_t S_t + w_1 S_1 + w_2 S_2$$

Here, $S_t$ denotes the text similarity score (95 points), $S_1$ denotes the first partial image similarity score (50 points), $S_2$ denotes the second partial image similarity score (90 points), $S$ denotes the total similarity score, $w_t$ denotes the weight applied to the text similarity score (0.6), $w_1$ denotes the weight applied to the first partial image similarity score (0.1), and $w_2$ denotes the weight applied to the second partial image similarity score (0.3). In this case, the total similarity score $S$ may be calculated as 89 points.

In one embodiment, the search system may determine the relationship between the first image 310 and the second image 710 (same content, similar content, dissimilar content, etc.) based on the calculated total similarity score. For example, when the total similarity score is equal to or greater than a preset first threshold (80 points), the first image and the second image may be determined to include the same content. Because the total similarity score of the first image 310 and the second image 710 is calculated as 89 points, the search system may determine that the first image and the second image include the same content.
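
The stated total of 89 points can be checked by hand. The sketch below assumes that the weight 0.1 applies to the whole-image score (50 points) and the weight 0.3 to the equation score (90 points), which is the assignment that agrees with an equation being weighted above an entire image and that reproduces 89:

$$S = 0.6 \times 95 + 0.1 \times 50 + 0.3 \times 90 = 57 + 5 + 27 = 89 \geq 80$$

The text similarity term alone contributes 57 of the 89 points, so the low whole-image score caused by the layout difference has little effect on the outcome.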

FIGS. 10 to 12 are diagrams illustrating an example of calculating an image similarity score according to another embodiment of the present disclosure. FIG. 10 is a diagram illustrating a first image 1010 and a second image 1020 according to another embodiment of the present disclosure. As illustrated, the first image 1010 and the second image 1020 contain the same math problem content, but the arrangement of text and objects differs because of a difference in editing. In the same manner as described above, first text information is extracted from the first image 1010, and a text similarity score may be calculated by comparing the extracted first text information with the second text information associated with the second image 1020 stored in the content DB.

In addition, a first set of partial images may be extracted from the first image 1010 in the same manner as described above. As illustrated, the first image 1010, a first partial image 1112 containing an equation, and a second partial image 1210 containing a figure may be extracted. The image search system may calculate partial image similarity scores and an image similarity score by comparing the first set of partial images 1010, 1112, and 1210 with the second set of partial images 1020, 1122, and 1220 stored in the content DB in association with the second image 1020. For example, a first partial image similarity score may be calculated by comparing the first image 1010 and the second image 1020, which are entire images; a second partial image similarity score may be calculated by comparing the first partial image 1112 containing an equation with a third partial image 1122 containing an equation; and a third partial image similarity score may be calculated by comparing the second partial image 1210 containing a figure with a fourth partial image 1220 containing a figure.

The image similarity score may be calculated as a weighted sum of the first to third partial image similarity scores. The weight applied to a partial image similarity score may differ according to the object type in the partial image (equation, graph, figure, picture, or entire image). For example, a higher weight may be applied when the object type is a figure than when it is an equation, and a higher weight may be applied when the object type is an equation than when it is an entire image. Thereafter, in the same manner as described above, a total similarity score may be calculated based on the text similarity score and the image similarity score, and the first image 1010 and the second image 1020 may be determined to include the same content.

FIG. 13 is a flowchart illustrating a content recognition method 1300 according to an embodiment of the present disclosure. The method 1300 for recognizing content may begin with step 1310 of receiving a first image that includes at least a part of first content. In one embodiment, a communication unit of the content recognition system may receive the first image transmitted from the user terminal and provide the received first image to a processor.

Thereafter, in step 1320, first text information may be extracted from the first image. For example, the OCR system may detect one or more text line regions in the first image and recognize the text using optical character recognition (OCR) technology. The recognized text may be corrected based on a dictionary DB and a language model.

After the first text information is extracted, step 1330 of extracting a first set of one or more partial images from the first image may be performed. In one embodiment, the object recognition system may remove the text line regions from the first image and then extract the first set of one or more partial images from the first image using a clustering technique. The object recognition system may then recognize the object contained in each extracted partial image, for example based on a deep-learning-based classification model. Here, the recognized object types may include a formula, a graph, a figure, a picture, and the whole image.
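As a rough illustration of this step, the sketch below clears the text line regions from a small binary image and groups the remaining foreground pixels into regions. The disclosure does not say which clustering technique is used, so 4-connected component grouping is substituted here purely as an assumption; the function names and the box format `(top, left, bottom, right)` are hypothetical.

```python
from collections import deque


def remove_text_lines(image, text_boxes):
    """Return a copy of a binary image with each text-line box cleared to 0.

    Boxes are (top, left, bottom, right) with inclusive bounds.
    """
    cleaned = [row[:] for row in image]
    for top, left, bottom, right in text_boxes:
        for r in range(top, bottom + 1):
            for c in range(left, right + 1):
                cleaned[r][c] = 0
    return cleaned


def extract_partial_regions(image):
    """Group foreground pixels into 4-connected components and return their bounding boxes."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] == 0 or seen[r][c]:
                continue
            # Breadth-first search over one connected component.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and image[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes
```

Each returned box would then be cropped and passed to a classifier to obtain its object type; that classification step is outside this sketch.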

Then, step 1340 of calculating a text similarity score between the first image and a second image may be performed, based on the first text information extracted from the first image and second text information associated with the second image stored in the content DB. In one embodiment, the text search system may calculate the text similarity score by comparing the full text of the first text information with the full text of the second text information. In another embodiment, the text search system may calculate the text similarity score by comparing keywords in the first text information with keywords in the second text information.
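The two comparison modes can be sketched as follows. The disclosure does not name a similarity measure, so this example assumes a character-level sequence ratio for the full-text comparison and a Jaccard index over non-stopword tokens for the keyword comparison; both choices, the stopword list, and the function names are illustrative assumptions.

```python
import difflib

# Hypothetical stopword list used only to pick out "keywords".
STOPWORDS = frozenset({"the", "a", "an", "of", "is", "in"})


def full_text_similarity(text_a, text_b):
    """Compare the full texts character by character; returns a value in [0, 1]."""
    return difflib.SequenceMatcher(None, text_a, text_b).ratio()


def keyword_similarity(text_a, text_b):
    """Compare keyword sets with the Jaccard index; returns a value in [0, 1]."""
    keys_a = {w for w in text_a.lower().split() if w not in STOPWORDS}
    keys_b = {w for w in text_b.lower().split() if w not in STOPWORDS}
    if not keys_a and not keys_b:
        return 0.0
    return len(keys_a & keys_b) / len(keys_a | keys_b)
```

The keyword variant tolerates reordering and small OCR omissions better than the full-text variant, at the cost of ignoring word order.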

Thereafter, step 1350 of calculating an image similarity score may be performed, based on the first set of one or more partial images extracted from the first image and a second set of one or more partial images stored in the content DB in association with the second image. Here, the image similarity score may include one or more partial image similarity scores, and may be calculated as a weighted sum of those partial image similarity scores. In one embodiment, the image search system may calculate a partial image similarity score between partial images that have the same object type.
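A minimal sketch of same-type matching is given below. It assumes each partial image has already been reduced to an `(object_type, feature_vector)` pair and uses cosine similarity between feature vectors; the feature representation, the cosine measure, and the best-match rule are assumptions made for illustration, not details from the disclosure.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def partial_image_scores(first_set, second_set):
    """Score each partial image of the first set against same-type images of the second set.

    Returns a list of (object_type, score); partial images with no same-type
    counterpart in the second set are skipped.
    """
    scores = []
    for obj_type, features in first_set:
        same_type = [f for t, f in second_set if t == obj_type]
        if same_type:
            scores.append((obj_type, max(cosine(features, f) for f in same_type)))
    return scores
```

Restricting comparisons to the same object type keeps, for example, a graph from being scored against a formula, which is the point of the embodiment described above.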

After the text similarity score and the image similarity score are calculated, step 1360 of calculating a comprehensive similarity score of the first image and the second image based on the text similarity score and the image similarity score may be performed. In some embodiments, the search system may calculate the comprehensive similarity score by applying weights to the text similarity score and to the one or more partial image similarity scores. For example, the comprehensive similarity score may be calculated by the following equation.

$$S = w_T S_T + \sum_{i=1}^{n} w_i S_i$$

Here, $S_T$ denotes the text similarity score, $S_i$ denotes the $i$-th partial image similarity score, $S$ denotes the comprehensive similarity score, $w_T$ denotes the weight applied to the text similarity score, $w_i$ denotes the weight applied to the $i$-th partial image similarity score, and $n$ denotes the number of partial image similarity scores. The weight $w_T$ applied to the text similarity score may be greater than the weight $w_i$ applied to a partial image similarity score. In addition, the weight $w_i$ applied to a partial image similarity score may differ according to the type of object in the partial image (formula, graph, figure, picture, or whole image).
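The weighted combination described above can be sketched in Python as follows. The sketch reads the combination as a plain weighted sum, which is an assumption, and the numeric weights are hypothetical values chosen only so that the text weight exceeds every partial image weight and the partial image weights are ordered picture > figure > graph > formula > whole image.

```python
# Hypothetical weights; only their ordering reflects the description.
TEXT_WEIGHT = 0.5
TYPE_WEIGHTS = {
    "picture": 0.20,
    "figure": 0.15,
    "graph": 0.10,
    "formula": 0.08,
    "whole_image": 0.05,
}


def comprehensive_score(text_score, partial_scores):
    """Combine a text score with (object_type, score) pairs as a weighted sum."""
    total = TEXT_WEIGHT * text_score
    for obj_type, score in partial_scores:
        total += TYPE_WEIGHTS[obj_type] * score
    return total
```

With a text score of 0.8, a graph score of 1.0, and a whole-image score of 0.6, this gives 0.5 × 0.8 + 0.10 × 1.0 + 0.05 × 0.6 = 0.53. Whether the sum should be normalized by the total weight is not stated in the description and is left out here.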

Finally, step 1370 of determining that at least a portion of the first content is included in the second image when the comprehensive similarity score is equal to or greater than a preset first threshold may be performed. In some embodiments, the search system may determine the relationship between the first image and the second image (same content, similar content, dissimilar content, and so on) based on the calculated comprehensive similarity score. For example, when the comprehensive similarity score is equal to or greater than the preset first threshold, the first image and the second image may be determined to include the same content.

When the comprehensive similarity score is equal to or greater than a preset second threshold and less than the first threshold, the first image and the second image may be determined to include similar content. When the comprehensive similarity score is less than the preset second threshold, the first image and the second image may be determined to include dissimilar content. The search system may search for the same or similar content by comparing the first image transmitted from the user terminal with all images stored in the content DB. In addition, the first image transmitted from the user terminal may be stored in the content DB together with the first text information and the first set of partial images extracted from it.
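The two-threshold decision and the search over stored images can be sketched as follows. The threshold values 0.8 and 0.5 and the function names are hypothetical; the disclosure only requires that the second threshold be lower than the first.

```python
def classify(score, first_threshold=0.8, second_threshold=0.5):
    """Map a comprehensive similarity score to a relationship label."""
    if score >= first_threshold:
        return "same"
    if score >= second_threshold:
        return "similar"
    return "dissimilar"


def search(scores_by_image_id):
    """Keep stored images whose content is the same as or similar to the query image.

    `scores_by_image_id` maps a stored image id to its comprehensive
    similarity score against the query image.
    """
    labels = {image_id: classify(score) for image_id, score in scores_by_image_id.items()}
    return {image_id: label for image_id, label in labels.items() if label != "dissimilar"}
```

For example, `search({"a": 0.9, "b": 0.6, "c": 0.1})` keeps image `a` as the same content and image `b` as similar content, and discards image `c`.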

In FIG. 13, step 1320 of extracting the first text information and step 1340 of calculating the text similarity score are shown as preceding step 1330 of extracting the partial images and step 1350 of calculating the image similarity score. However, the order is not limited thereto, and steps 1330 and 1350 may be performed first. Alternatively, steps 1320 and 1340 may be performed in parallel with steps 1330 and 1350.
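The parallel variant can be sketched with a thread pool. The two branch functions below are placeholders that simply read precomputed values from a dictionary; in a real system they would run steps 1320 and 1340, and steps 1330 and 1350, respectively. All names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def text_branch(image):
    """Placeholder for steps 1320 and 1340 (text extraction and text scoring)."""
    return image["text_score"]


def image_branch(image):
    """Placeholder for steps 1330 and 1350 (partial image extraction and image scoring)."""
    return image["image_score"]


def score_both(image):
    """Run the text branch and the image branch concurrently and return both scores."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        text_future = pool.submit(text_branch, image)
        image_future = pool.submit(image_branch, image)
        return text_future.result(), image_future.result()
```

Because neither branch depends on the other's output, the two futures can complete in any order before the scores are combined in step 1360.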

The method for recognizing content described above may also be implemented as computer-readable code on a computer-readable recording medium. A computer-readable recording medium includes any kind of recording device that stores data readable by a computer system. Examples of computer-readable recording media include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices. The computer-readable recording medium may also be distributed over network-coupled computer systems so that the computer-readable code is stored and executed in a distributed fashion. In addition, functional programs, code, and code segments for implementing the above embodiments can be readily inferred by programmers skilled in the art to which the present invention pertains.

The techniques described herein may be implemented by various means. For example, these techniques may be implemented in hardware, firmware, software, or a combination thereof. Those skilled in the art will further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends on the particular application and on the design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.

In a hardware implementation, the processing units used to perform the techniques may be implemented within one or more ASICs, DSPs, digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), processors, controllers, microcontrollers, microprocessors, electronic devices, other electronic units designed to perform the functions described herein, a computer, or a combination thereof.

Accordingly, the various illustrative logical blocks, modules, and circuits described in connection with the disclosure herein may be implemented or performed with a general-purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, for example, a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.

In a firmware and/or software implementation, the techniques may be implemented as instructions stored on a computer-readable medium such as random access memory (RAM), read-only memory (ROM), non-volatile random access memory (NVRAM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable PROM (EEPROM), flash memory, a compact disc (CD), or a magnetic or optical data storage device. The instructions may be executable by one or more processors and may cause the processor(s) to perform certain aspects of the functionality described herein.

If implemented in software, the functions may be stored on, or transmitted over, a computer-readable medium as one or more instructions or code. Computer-readable media include both computer storage media and communication media, including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a computer. By way of non-limiting example, such computer-readable media may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium.

For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, include CD, laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium may be coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.

The previous description of the disclosure is provided to enable any person skilled in the art to make or use the disclosure. Various modifications to the disclosure will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other variations without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the examples described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Although exemplary implementations may refer to utilizing aspects of the presently disclosed subject matter in the context of one or more stand-alone computer systems, the subject matter is not so limited, but rather may be implemented in connection with any computing environment, such as a network or distributed computing environment. Furthermore, aspects of the presently disclosed subject matter may be implemented in or across a plurality of processing chips or devices, and storage may similarly be affected across a plurality of devices. Such devices may include PCs, network servers, and handheld devices.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Although the method described in this specification has been explained through specific embodiments, it may also be implemented as computer-readable code on a computer-readable recording medium. A computer-readable recording medium includes any kind of recording device that stores data readable by a computer system. Examples of computer-readable recording media include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices. The computer-readable recording medium may also be distributed over network-coupled computer systems so that the computer-readable code is stored and executed in a distributed fashion. In addition, functional programs, code, and code segments for implementing the embodiments can be readily inferred by programmers skilled in the art to which the present invention pertains.

Although the present disclosure has been described herein in connection with some embodiments, various modifications and changes may be made without departing from the scope of the present disclosure as understood by those of ordinary skill in the art to which the present invention pertains. Such modifications and changes should also be considered to fall within the scope of the claims appended hereto.

110_1 to 110_n: user terminal
120: communication network
130: content recognition system
140: communication unit
150: processor
160: database
162: content DB
164: dictionary DB
166: language model
210: OCR system
220: object recognition system
230: search system
232: text search system
234: image search system

Claims

A method performed by a processor to recognize content, the method comprising:
receiving a first image that includes at least a portion of first content;
extracting first text information from the first image;
extracting a first set of one or more partial images from the first image, wherein the one or more partial images of the first set include a first partial image corresponding to a partial region of the first image that includes a formula, graph, figure, or picture;
calculating a text similarity score between the first image and a second image based on the first text information and second text information associated with the second image;
calculating an image similarity score based on the one or more partial images of the first set and one or more partial images of a second set associated with the second image, wherein the one or more partial images of the second set include a second partial image corresponding to a partial region of the second image that includes a formula, graph, figure, or picture, and wherein the image similarity score is calculated based on a similarity between the first partial image and the second partial image;
calculating a comprehensive similarity score based on the text similarity score and the image similarity score; and
determining that at least a portion of the first content is included in the second image when the comprehensive similarity score is equal to or greater than a preset first threshold.

The method of claim 1,
wherein extracting the first text information comprises:
detecting one or more text line regions in the first image;
recognizing text in the one or more text line regions; and
correcting the recognized text based on a dictionary database and a language model.

The method of claim 2,
wherein correcting the recognized text based on the dictionary database and the language model comprises, for each word included in the recognized text:
determining one or more candidate words based on the dictionary database;
calculating a score for each of the one or more candidate words based on the language model; and
selecting the candidate word having the highest score as the final word when the score of that candidate word is equal to or greater than a preset threshold, and excluding the region containing the word from the text line region when the score of that candidate word is less than the preset threshold.

The method of claim 3,
wherein extracting the first set of one or more partial images from the first image comprises:
removing the text line region from the first image; and
extracting the one or more partial images of the first set based on a clustering technique,
and wherein the one or more partial images of the first set further include a third partial image corresponding to the entire area of the first image.

The method of claim 4,
further comprising recognizing an object in the one or more partial images of the first set,
wherein the object comprises at least one of a formula, a graph, a figure, and a picture.

The method of claim 5,
wherein the image similarity score is calculated by comparing partial images that include objects of the same type and by comparing the third partial image with a fourth partial image corresponding to the entire area of the second image.

The method of claim 5,
wherein the image similarity score comprises one or more partial image similarity scores,
the comprehensive similarity score is calculated by applying weights to each of the text similarity score and the one or more partial image similarity scores, and
the weights are applied differently to each of the one or more partial image similarity scores according to the type of object in the partial image.

The method of claim 7,
wherein the comprehensive similarity score is calculated by the following formula,

$$S = w_T S_T + \sum_{i=1}^{n} w_i S_i$$

where $S_T$ denotes the text similarity score, $S_i$ denotes the $i$-th partial image similarity score, $S$ denotes the comprehensive similarity score, $w_T$ denotes the weight applied to the text similarity score, $w_i$ denotes the weight applied to the $i$-th partial image similarity score, and $n$ denotes the number of partial image similarity scores,
the weight applied to the text similarity score is greater than the weight applied to the partial image similarity score,
a greater weight is applied when the object in the partial image is a picture than when the object in the partial image is a figure,
a greater weight is applied when the object in the partial image is a figure than when the object in the partial image is a graph,
a greater weight is applied when the object in the partial image is a graph than when the object in the partial image is a formula, and
a greater weight is applied when the object in the partial image is a formula than when the partial image is the whole image.

The method of claim 1,
further comprising determining that the first content is similar to second content included in the second image when the comprehensive similarity score is equal to or greater than a preset second threshold and less than the first threshold.

A non-transitory computer-readable storage medium having stored thereon instructions for recognizing content, wherein the instructions, when executed by a processor, cause the processor to:
receive a first image that includes at least a portion of first content;
extract first text information from the first image;
extract a first set of one or more partial images from the first image, wherein the one or more partial images of the first set include a first partial image corresponding to a partial region of the first image that includes a formula, graph, figure, or picture;
calculate a text similarity score between the first image and a second image based on the first text information and second text information associated with the second image;
calculate an image similarity score based on the one or more partial images of the first set and one or more partial images of a second set associated with the second image, wherein the one or more partial images of the second set include a second partial image corresponding to a partial region of the second image that includes a formula, graph, figure, or picture, and wherein the image similarity score is calculated based on a similarity between the first partial image and the second partial image;
calculate a comprehensive similarity score between the first image and the second image based on the text similarity score and the image similarity score; and
determine that at least a portion of the first content is included in the second image when the comprehensive similarity score is equal to or greater than a preset first threshold.

A content recognition system comprising:
a communication unit configured to receive a first image that includes at least a portion of first content;
an OCR system configured to extract first text information from the first image;
an object recognition system configured to extract a first set of one or more partial images from the first image, wherein the one or more partial images of the first set include a first partial image corresponding to a partial region of the first image that includes a formula, graph, figure, or picture;
a text search system configured to calculate a text similarity score between the first image and a second image based on the first text information and second text information associated with the second image;
an image search system configured to calculate an image similarity score based on the one or more partial images of the first set and one or more partial images of a second set associated with the second image, wherein the one or more partial images of the second set include a second partial image corresponding to a partial region of the second image that includes a formula, graph, figure, or picture, and wherein the image similarity score is calculated based on a similarity between the first partial image and the second partial image; and
a search system configured to calculate a comprehensive similarity score based on the text similarity score and the image similarity score,
wherein the search system determines that at least a portion of the first content is included in the second image when the comprehensive similarity score is equal to or greater than a preset first threshold.

A method performed by a processor to recognize content, the method comprising:
receiving first text information associated with a first image and data for a first set of one or more partial images, wherein the first image includes at least a portion of first content, and the one or more partial images of the first set include a first partial image corresponding to a partial region of the first image that includes a formula, graph, figure, or picture;
calculating a text similarity score between the first image and a second image based on the first text information and second text information associated with the second image;
calculating an image similarity score based on the one or more partial images of the first set and one or more partial images of a second set associated with the second image, wherein the one or more partial images of the second set include a second partial image corresponding to a partial region of the second image that includes a formula, graph, figure, or picture, and wherein the image similarity score is calculated based on a similarity between the first partial image and the second partial image;
calculating a comprehensive similarity score based on the text similarity score and the image similarity score; and
determining that at least a portion of the first content is included in the second image when the comprehensive similarity score is equal to or greater than a preset first threshold,
wherein the image similarity score includes one or more partial image similarity scores, the one or more partial image similarity scores being calculated by comparing partial images that include objects of the same type,
the comprehensive similarity score is calculated by applying weights to each of the text similarity score and the one or more partial image similarity scores, and
the weights are applied differently to each of the one or more partial image similarity scores according to the type of object in the partial image.