KR102509343B1

KR102509343B1 - Method and system for analyzing layout of image

Info

Publication number: KR102509343B1
Application number: KR1020200154026A
Authority: KR
Inventors: 구형일; 김용균
Original assignee: 아주대학교산학협력단
Priority date: 2020-11-17
Filing date: 2020-11-17
Publication date: 2023-03-13
Also published as: KR20220067387A

Abstract

According to one aspect of the technical idea of the present disclosure, an image layout analysis system includes an image acquisition module that acquires an image including at least one class, a mask data acquisition module that acquires mask data designating a region of the acquired image to be analyzed, and a layout analysis module that predicts the class of the region designated by the mask data and the layout of that class. The layout analysis module includes a layout analysis network trained to receive the image and the mask data as input and to output a prediction result for the class and the layout based on that input.

Description

Method and system for analyzing layout of image

The technical idea of the present disclosure relates to a method and system for analyzing the layout of an image.

When analyzing or processing an image (e.g., a document image) in which information of various classes such as text, tables, figures, and photographs is mixed, the analysis or processing method may differ for each class in the image, so accurately distinguishing each class can be important. Such analysis or processing may include, for example, index generation or search within the image, object detection or recognition within the image, optical character recognition (OCR), and data compression.

To distinguish each class in an image, the region (or boundary, i.e., layout) corresponding to each class must be detected accurately, and each class may also need to be identified accurately.

Conventionally, the mainstream approach has been to detect layouts or identify classes based on rules (algorithms, etc.) defined to distinguish text, figures, and the like. However, most of these rules are generalized rules, so detection and identification accuracy may be low for classes with new patterns, and improving or changing the rules may become difficult as the system grows more complex.

In addition, most conventional methods represent the layout of each class with a restricted shape such as a rectangle. As a result, the layout may fail to contain the entire region of the class, or may contain regions of other classes, which makes it difficult to improve layout detection accuracy.

One problem to be solved by the present invention is to provide a method that can more accurately detect the layout (boundary) and class of a given region in an image (e.g., a document image) containing various classes such as text, tables, figures, and photographs.

To achieve the above object, an image layout analysis system according to one aspect of the technical idea of the present disclosure includes an image acquisition module that acquires an image including at least one class, a mask data acquisition module that acquires mask data designating a region of the acquired image to be analyzed, and a layout analysis module that predicts the class of the region designated by the mask data and the layout of that class. The layout analysis module includes a layout analysis network trained to receive the image and the mask data as input and to output a prediction result for the class and the layout based on that input.

According to an embodiment, the layout of the class may correspond to the boundary of the area occupied by the class present in the region of the image designated by the mask data.

According to an embodiment, the layout analysis network may output a plurality of coordinates corresponding to a plurality of points representing the layout of the class, and the layout analysis module may further include a layout generator that generates layout information for the class based on the output coordinates.

According to an embodiment, the layout generator may generate the layout information in the form of a polygon connecting the plurality of points, and at least some of the points may correspond to vertices of the polygon.

According to an embodiment, the layout analysis system may further include a network learning module that trains the layout analysis network based on the prediction result. The network learning module may update the layout analysis network based on the layout information and ground-truth layout information, and on the predicted class information and ground-truth class information.

According to an embodiment, the network learning module may update the weights between nodes of the neural network included in the layout analysis network based on at least one of the area of the set difference or the area of the intersection between the layout region corresponding to the layout information and the ground-truth layout region corresponding to the ground-truth layout information.

According to an embodiment, the at least one class may indicate the form of information included in the image, and the mask data may include a mask image in which a point, line, or region indicating at least one location corresponding to the region to be analyzed is drawn.

An image layout analysis method according to one aspect of the technical idea of the present disclosure includes: acquiring an image including at least one class; acquiring mask data designating a region of the acquired image to be analyzed; inputting the image and the mask data to a layout analysis network trained to predict the class of the region designated by the mask data and the layout of that class; and obtaining, through the layout analysis network, a prediction result for the class of the region designated by the mask data in the input image and the layout of that class.

The layout analysis method according to an embodiment of the present disclosure performs class identification and layout detection for a given region in an image using a deep-learning-based layout analysis network. It can thereby obtain accurate analysis results while greatly reducing dependence on domain knowledge in system design.

In addition, according to an embodiment of the present disclosure, the detected layout can be expressed as polygons of various shapes rather than a restricted shape such as a rectangle, which can maximize layout detection accuracy.

The effects of the technical idea of the present disclosure are not limited to those mentioned above, and other effects not mentioned will be clearly understood from the description below by those of ordinary skill in the art to which the present invention belongs.

A brief description of each drawing is provided for a fuller understanding of the drawings cited in the present disclosure.
FIG. 1 is a schematic block diagram of a layout analysis system in which an image layout analysis method according to an exemplary embodiment of the present disclosure is implemented.
FIG. 2 is a block diagram illustrating an embodiment of the components included in the layout analysis module shown in FIG. 1.
FIG. 3 is a flowchart illustrating an image layout analysis method according to an exemplary embodiment of the present disclosure.
FIG. 4 shows a document image as an example of an image acquired in the image acquisition step of FIG. 3.
FIG. 5 shows examples of mask data acquired in the mask data acquisition step of FIG. 3.
FIG. 6 shows an operation in which the layout analysis network shown in FIG. 2 outputs a plurality of layout-related points and class information from an image and mask data.
FIG. 7 shows an operation in which the layout generator shown in FIG. 2 generates layout information based on the plurality of points output from the layout analysis network.
FIG. 8 is an exemplary diagram for explaining the learning operation of the layout analysis network included in the layout analysis module.
FIG. 9 is a schematic block diagram of a device that performs an image layout analysis method according to an exemplary embodiment of the present disclosure.

The exemplary embodiments according to the technical idea of the present disclosure are provided to explain that idea more completely to those of ordinary skill in the art. The embodiments below may be modified in many different forms, and the scope of the technical idea of the present disclosure is not limited to them. Rather, these embodiments are provided to make the present disclosure more thorough and complete and to fully convey the technical idea of the invention to those skilled in the art.

Although terms such as first and second are used in the present disclosure to describe various members, regions, layers, portions, and/or components, these members, parts, regions, layers, portions, and/or components should not be limited by these terms. The terms do not imply any particular order, position, or superiority, and are used only to distinguish one member, region, portion, or component from another. Accordingly, a first member, region, portion, or component described below may refer to a second member, region, portion, or component without departing from the teachings of the technical idea of the present disclosure. For example, a first component may be termed a second component, and similarly a second component may be termed a first component, without departing from the scope of the present disclosure.

Unless defined otherwise, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by those of ordinary skill in the art to which the concepts of the present disclosure belong. Commonly used terms, such as those defined in dictionaries, should be interpreted as having a meaning consistent with their meaning in the context of the relevant technology, and should not be interpreted in an overly formal sense unless explicitly so defined herein.

Where an embodiment can be implemented differently, a specific process order may differ from the order described. For example, two processes described in succession may be performed substantially simultaneously, or in the reverse of the order described.

In the accompanying drawings, variations from the illustrated shapes may be expected depending on, for example, manufacturing techniques and/or tolerances. Therefore, embodiments according to the technical idea of the present disclosure should not be construed as limited to the specific shapes of the regions illustrated, and should include, for example, changes in shape resulting from the manufacturing process. The same reference numerals are used for the same components in the drawings, and duplicate descriptions of them are omitted.

The term 'and/or' as used herein includes each of the recited elements and every combination of one or more of them.

Hereinafter, embodiments according to the technical idea of the present disclosure are described in detail with reference to the accompanying drawings.

FIG. 1 is a schematic block diagram of a layout analysis system in which an image layout analysis method according to an exemplary embodiment of the present disclosure is implemented. FIG. 2 is a block diagram illustrating an embodiment of the components included in the layout analysis module shown in FIG. 1.

Referring to FIG. 1, a system 100 in which the image layout analysis method according to an embodiment of the present disclosure is implemented (hereinafter the 'layout analysis system') may perform an analysis operation that, from an image (e.g., a document image) in which various classes such as text, tables, figures, and photographs are mixed, detects the layout of the class corresponding to a given location and identifies that class. Here, a class may mean an attribute or form of data (or information).

The layout analysis system 100 may include at least one computing device. For example, each of the at least one computing device may be a hardware-based device including a processor, memory, a communication interface, an input unit, and/or an output unit. In this case, the components of the layout analysis system 100 may be implemented in hardware, software, or a combination thereof, and may be implemented either integrated in or distributed across the at least one computing device.

The layout analysis system 100 according to an embodiment of the present disclosure may include an image acquisition module 110, a mask data acquisition module 120, a layout analysis module 130, and a network learning module 140.

The image acquisition module 110 may acquire an analysis target image on which layout detection and class identification are to be performed, or a training image for the learning operation of the layout analysis module 130 described later. The analysis target image or training image may be an image (e.g., a document image) in which various classes such as text, tables, figures, and photographs are mixed.

The image acquisition module 110 may acquire the analysis target image or training image from a user, an administrator, or the like through an input means or interface included in the layout analysis system 100. According to an embodiment, the image acquisition module 110 may also acquire the analysis target image or training image from a device, server, database, or the like connected through a network.

The mask data acquisition module 120 may acquire mask data designating a region of the acquired image to be analyzed. The mask data may include at least one location contained in the region to be analyzed. For example, as described later with reference to FIG. 5, the mask data may be provided as a mask image in which a point, line, or region indicating the at least one location is drawn, or may include the coordinate values of the at least one location. The mask data may be input by a user or the like through an input means included in the layout analysis system 100, but is not limited thereto.
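As an illustration of the mask image form described above, the following minimal sketch draws point-shaped and line-shaped masks onto a binary grid. The function name `make_mask_image`, the 0/1 encoding, and the nested-list representation are assumptions made for this example; the disclosure does not specify how the mask image is encoded.

```python
def make_mask_image(width, height, points=(), segments=()):
    """Build a binary mask image (hypothetical helper, not from the patent).

    points   -- iterable of (x, y) pixel coordinates marked by the user
    segments -- iterable of ((x0, y0), (x1, y1)) line strokes
    Returns a height x width nested list where 1 marks a designated pixel.
    """
    mask = [[0] * width for _ in range(height)]

    def mark(x, y):
        # Ignore locations that fall outside the image.
        if 0 <= x < width and 0 <= y < height:
            mask[y][x] = 1

    for x, y in points:
        mark(x, y)

    for (x0, y0), (x1, y1) in segments:
        # Sample the stroke densely enough to touch every pixel along it.
        steps = max(abs(x1 - x0), abs(y1 - y0), 1)
        for i in range(steps + 1):
            mark(round(x0 + (x1 - x0) * i / steps),
                 round(y0 + (y1 - y0) * i / steps))
    return mask


point_mask = make_mask_image(5, 4, points=[(2, 1)])
line_mask = make_mask_image(5, 4, segments=[((0, 0), (3, 0))])
```

A region-shaped mask could be represented the same way by marking every pixel inside the user's stroke; that variant is left out to keep the sketch short.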

The layout analysis module 130 may analyze the acquired image and mask data and output a prediction result (analysis result) that includes class information identifying the class of the region of the image designated by the mask data, and layout information about the area in which that class is present.

The layout analysis method according to an embodiment of the present disclosure may use a layout analysis network trained with deep learning to predict the class of the region of the image designated by the mask data and/or the layout of that class. The method can also minimize restrictions on the shape used to represent the layout of the area in which the predicted class is present. Specifically, the layout analysis method of the present disclosure is implemented so that a layout can be represented not only by a restricted shape such as a rectangle but also by an irregular polygon (see the embodiments of FIGS. 6 to 8), which can maximize layout detection accuracy.

In this regard, referring to FIG. 2, the layout analysis module 130 may include a layout analysis network 132.

When the image acquired by the image acquisition module 110 and the mask data acquired by the mask data acquisition module 120 are input, the layout analysis network 132 may output a layout prediction result and a class prediction result for the region of the image designated by the mask data. The region of the image designated by the mask data may mean the area occupied by the class (text, table, figure, photograph, etc.) located at the at least one location (coordinate value, etc.) included in the mask data.

The layout analysis network 132 may be implemented through deep-learning-based training and may include a neural network structure. For example, the layout analysis network 132 may include a convolutional neural network (CNN), but is not limited thereto. The CNN may include, for example, at least one convolutional layer and pooling layer that extract features from the input image to form a feature map, and a fully connected layer and a softmax layer that apply the extracted features to a neural network and output a plurality of points representing the layout of the class present in the region designated by the mask data, together with class information for the designated region.
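The two outputs described above (layout points and class information) can be sketched as a decoding step applied to the network's final activations. This is only an illustration under stated assumptions: the flat output layout (2K coordinates followed by class logits), the function name `decode_output`, and the class list `CLASS_NAMES` are hypothetical and are not specified in the disclosure.

```python
import math

# Hypothetical class list; the disclosure names text, table, figure, and
# photograph as examples but does not fix a label set or order.
CLASS_NAMES = ["text", "table", "figure", "photo"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_output(raw, num_points):
    """Split a flat output vector into layout points and a class prediction.

    Assumes raw = [x1, y1, ..., xK, yK, logit_1, ..., logit_C].
    """
    coords, logits = raw[:2 * num_points], raw[2 * num_points:]
    if len(logits) != len(CLASS_NAMES):
        raise ValueError("unexpected number of class logits")
    points = [(coords[2 * i], coords[2 * i + 1]) for i in range(num_points)]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return points, CLASS_NAMES[best], probs


points, label, probs = decode_output(
    [10, 20, 30, 20, 30, 40, 0.1, 2.0, 0.3, -1.0], num_points=3)
```

In a real network the coordinate head and the class head would typically be separate layers; the single flat vector is used here only to keep the example self-contained.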

The plurality of points may be included among a plurality of coordinates representing the layout of the class present in the region of the image designated by the mask data. For example, at least some of the points may correspond to the vertices of the polygon-shaped layout. The layout may mean the boundary between the class present in the designated region and other classes adjacent to it. In other words, the layout may mean the boundary of the area of the image occupied by the class present in the designated region.

According to an embodiment, the layout analysis module 130 may further include a layout generator 134. The layout generator 134 may generate layout information from the plurality of points output by the layout analysis network 132. For example, the layout generator 134 may connect the points based on a technique such as orthographic projection to generate layout information having a polygonal shape. The polygon may be generated so that it contains the entire area in which the class is present.

The layout information may include, among the coordinate values of the image (pixel coordinates, etc.), the coordinate values contained in the layout (polygon), or may include, for each coordinate value, whether it is contained in the predicted layout (or the interior of the layout).
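The per-coordinate membership form of the layout information can be sketched as rasterizing a polygon into a boolean grid. The example below uses the standard even-odd ray-casting test evaluated at pixel centers; this is a swapped-in, generic technique chosen for illustration, not the orthographic-projection-based procedure mentioned in the disclosure, and the function names are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x0, y0 = polygon[i]
        x1, y1 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y0 > y) != (y1 > y):
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(polygon, width, height):
    """Return a height x width grid: True where the pixel center is inside."""
    return [[point_in_polygon(x + 0.5, y + 0.5, polygon)
             for x in range(width)]
            for y in range(height)]


# An L-shaped (non-rectangular) layout with six vertices.
l_shape = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
grid = rasterize(l_shape, 4, 4)
```

Counting the `True` cells of `grid` gives a pixel-level estimate of the polygon's area, which is the kind of quantity an area-based training objective would compare against the ground truth.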

The class information may indicate the class (e.g., text, table, figure, photograph) of the region of the image corresponding to the mask data.

The network learning module 140 may train the layout analysis network 132 based on the layout information and class information predicted by the layout analysis module 130 and on ground-truth layout information and ground-truth class information provided in advance.

For example, the network learning module 140 may include an objective function for training the layout analysis network 132. The objective function may be implemented to train the layout analysis network 132 by reflecting both the difference between the predicted layout region and the layout region given by the ground-truth layout information, and the difference between the predicted class information and the ground-truth class information. For example, the objective function may be based on the area of the set difference between the predicted and ground-truth layout regions, the square of that area, or the areas of their intersection and union (for example, 1 - (area of intersection)/(area of union)). The objective function may also be implemented to train the layout analysis network 132 in the direction that minimizes a cross-entropy loss computed from the predicted and ground-truth class information. However, the objective function is not limited to these examples and may be implemented in any of various known ways suitable for training the layout analysis network 132.
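The quantities named above can be made concrete with a small sketch that evaluates 1 - (intersection area)/(union area) on pixel grids and a cross-entropy term on class probabilities. Several things here are assumptions for illustration: regions are represented as 0/1 grids, the set-difference area is computed as a symmetric difference, the two terms are combined with a hypothetical weight `class_weight`, and pixel counting as written is not differentiable, so this shows what is measured and is not a training-ready loss.

```python
import math


def region_areas(pred, truth):
    """Return (intersection, union) pixel counts of two 0/1 grids."""
    inter = union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            inter += int(bool(p) and bool(t))
            union += int(bool(p) or bool(t))
    return inter, union


def iou_loss(pred, truth):
    """1 - (area of intersection) / (area of union)."""
    inter, union = region_areas(pred, truth)
    return 0.0 if union == 0 else 1.0 - inter / union


def difference_area(pred, truth):
    """Pixels in exactly one region (symmetric difference, an assumption)."""
    inter, union = region_areas(pred, truth)
    return union - inter


def cross_entropy(probs, true_index, eps=1e-12):
    """Negative log-probability assigned to the ground-truth class."""
    return -math.log(max(probs[true_index], eps))


def total_loss(pred, truth, probs, true_index, class_weight=1.0):
    # class_weight is a hypothetical balancing factor, not from the patent.
    return iou_loss(pred, truth) + class_weight * cross_entropy(probs, true_index)


pred = [[1, 1, 0], [1, 1, 0]]
truth = [[0, 1, 1], [0, 1, 1]]
loss = total_loss(pred, truth, probs=[0.25, 0.5, 0.25], true_index=1)
```

With these inputs the regions share 2 of 6 covered pixels, so the layout term is 1 - 2/6, and the class term is -ln(0.5).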

In this specification, 'learning' (학습) is used interchangeably with training. Performing learning covers both the network itself learning and the network being trained.

FIG. 3 is a flowchart illustrating an image layout analysis method according to an exemplary embodiment of the present disclosure. FIG. 4 shows a document image as an example of an image acquired in the image acquisition step of FIG. 3. FIG. 5 shows examples of mask data acquired in the mask data acquisition step of FIG. 3. FIG. 6 shows an operation in which the layout analysis network shown in FIG. 2 outputs a plurality of layout-related points and class information from an image and mask data. FIG. 7 shows an operation in which the layout generator shown in FIG. 2 generates layout information based on the plurality of points output from the layout analysis network. FIG. 8 is an exemplary diagram for explaining the learning operation of the layout analysis network included in the layout analysis module.

Referring to FIG. 3, the layout analysis method may include acquiring an image (S300) and acquiring mask data corresponding to a region of the acquired image to be analyzed (S310).

As described above, the layout analysis system 100 may acquire an analysis target image on which layout detection and class identification are to be performed, or a training image. The image may be acquired from a user or the like through various input means or interfaces, or from a device, server, or the like connected through a network.

The image may be one in which various classes such as text, tables, figures, and photographs are mixed. Taking the document image 400 of FIG. 4 as an example, the document image 400 may be divided into a first text region 401, a second text region 402, a third text region 403, a fourth text region 404, and a figure region 405. The regions 401, 402, 403, 404, and 405 may be distinguished from one another by their boundaries (layouts). Depending on the embodiment, some of the regions may overlap each other.

Meanwhile, the layout analysis system 100 may acquire mask data designating a region of the acquired image to be analyzed. As described above, a mask included in the mask data may include at least one point (coordinates, etc.) for designating the region to be analyzed, and the mask may be input by a user or the like.

Referring to FIG. 5 for examples of the mask data, a user or the like may use an input means (e.g., a touch panel, a touch pen, or a mouse) to input a mask that designates the region to be analyzed. For example, the user may input a point-shaped mask 502 as shown in (a) of FIG. 5, and the layout analysis system 100 may generate mask data 500 including the input mask 502. Depending on the embodiment, the layout analysis system 100 may generate mask data containing masks input in various other forms, such as mask data 510 including a line-shaped mask 512 as shown in (b) of FIG. 5.
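As a rough illustration of the point-shaped and line-shaped masks of FIG. 5, the following sketch rasterizes user input into a binary grid. The function names `make_mask` and `line_points`, the (row, column) convention, and the 0/1 grid encoding are assumptions made for this example only; the disclosure does not prescribe a particular mask encoding.

```python
def make_mask(height, width, points):
    """Return a height x width grid of 0/1 with the given (row, col) points set to 1."""
    mask = [[0] * width for _ in range(height)]
    for r, c in points:
        # Ignore input that falls outside the image bounds.
        if 0 <= r < height and 0 <= c < width:
            mask[r][c] = 1
    return mask


def line_points(start, end):
    """Integer grid points on the segment from start to end (linear interpolation)."""
    (r0, c0), (r1, c1) = start, end
    steps = max(abs(r1 - r0), abs(c1 - c0))
    if steps == 0:
        return [start]
    return [
        (round(r0 + (r1 - r0) * i / steps), round(c0 + (c1 - c0) * i / steps))
        for i in range(steps + 1)
    ]


# Point-shaped mask, in the spirit of FIG. 5 (a).
point_mask = make_mask(4, 6, [(1, 2)])
# Line-shaped mask, in the spirit of FIG. 5 (b).
line_mask = make_mask(4, 6, line_points((2, 1), (2, 4)))
```

Either grid has the same size as the image, so it can be supplied alongside the image as the mask data.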

The description now returns to FIG. 3.

The layout analysis method may include predicting the layout and class of the region of the acquired image corresponding to the mask data, that is, the region designated by the mask data (S320).

Referring also to FIG. 6, the layout analysis system 100 may input the acquired image (e.g., the document image 400) and the mask data 500 to the layout analysis network 132. Depending on the embodiment, the layout analysis system 100 may instead obtain information (coordinate values, etc.) about at least one point based on the mask included in the mask data 500 and input the obtained information to the layout analysis network 132.
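The two input options described above can be sketched as follows. This is only an illustration under stated assumptions: the helper names `mask_to_coordinates` and `stack_image_and_mask` are hypothetical, and pairing each pixel with its mask value (a two-channel input) is one plausible way to feed both inputs to a CNN, not something the disclosure specifies.

```python
def mask_to_coordinates(mask):
    """List the (row, col) positions that the mask marks with a non-zero value."""
    return [
        (r, c)
        for r, row in enumerate(mask)
        for c, value in enumerate(row)
        if value
    ]


def stack_image_and_mask(image, mask):
    """Pair each pixel intensity with its mask value, giving a 2-channel grid."""
    same_shape = len(image) == len(mask) and all(
        len(img_row) == len(mask_row) for img_row, mask_row in zip(image, mask)
    )
    if not same_shape:
        raise ValueError("image and mask must have the same height and width")
    return [
        [(pixel, flag) for pixel, flag in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


# Toy grayscale image and a point mask of the same size (assumed data).
image = [[10, 20, 30], [40, 50, 60]]
mask = [[0, 0, 0], [0, 1, 0]]
coordinates = mask_to_coordinates(mask)
stacked = stack_image_and_mask(image, mask)
```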

The layout analysis network 132 may be trained to analyze the input image 400 based on the region designated by the mask data 500 and to output prediction results for the layout and the class. As described above, the layout analysis network 132 may be implemented as a CNN or the like. Although this specification describes the layout analysis network 132 as a single network that outputs prediction results for both the layout and the class, depending on the embodiment the layout analysis network 132 may be divided into a first network that outputs the prediction result for the layout and a second network that outputs the prediction result for the class.

The layout analysis network 132 may analyze the image 400 based on the region of the image 400 designated by the mask data 500. Based on the analysis result, the layout analysis network 132 may output, as the prediction result, class information 620 and a plurality of points 610 representing the layout of the class present in the designated region. As described above, the plurality of points 610 may be given as a plurality of coordinates representing the layout of the class. At least some of the plurality of points 610 may correspond to vertices of the polygon-shaped layout. The layout analysis network 132 may output a preset number of points, and this number may be set to five or more. Accordingly, the layout analysis system 100 can represent the layout of the class not only as a restricted shape such as a rectangle but as any of a variety of irregular polygons.
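To make the shape of the prediction result concrete, the sketch below decodes a flat output vector into a preset number of points and a class label. The vector layout (2N coordinates followed by one score per class), the value `NUM_POINTS = 8`, the class list, and the function name `decode_prediction` are all assumptions chosen for illustration; the disclosure only states that a preset number of points, possibly five or more, and class information are output.

```python
NUM_POINTS = 8  # assumed preset number of points (five or more)
CLASS_NAMES = ["text", "table", "picture", "photo"]  # assumed class list


def decode_prediction(output, num_points=NUM_POINTS, class_names=CLASS_NAMES):
    """Split a flat output vector into (x, y) points and the highest-scoring class."""
    expected = 2 * num_points + len(class_names)
    if len(output) != expected:
        raise ValueError(f"expected {expected} values, got {len(output)}")
    coords = output[: 2 * num_points]
    points = [(coords[i], coords[i + 1]) for i in range(0, 2 * num_points, 2)]
    scores = output[2 * num_points:]
    best = max(range(len(scores)), key=scores.__getitem__)
    return points, class_names[best]


# Toy output: 8 points followed by 4 class scores (made-up numbers).
raw_output = [float(v) for v in range(16)] + [0.1, 0.2, 2.0, 0.3]
points, predicted_class = decode_prediction(raw_output)
```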

Referring to FIG. 7, the layout analysis system 100 (e.g., the layout generator 134) may generate layout information 710 based on the plurality of points 610 output from the layout analysis network 132. As described above, the layout analysis system 100 may connect the plurality of points 610 using a technique such as orthogonal projection to generate polygon-shaped layout information 710. The layout information 710 may include the coordinate values of the image 400 that fall within the detected layout, or may indicate, for each coordinate value of the image 400, whether it falls within the layout.
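The following sketch shows one way to turn predicted points into polygon-shaped layout information and to decide which image coordinates fall inside it. Note the substitution: instead of the orthogonal-projection technique mentioned above, whose details are not given here, the points are simply ordered by angle around their centroid, which yields a valid polygon only for star-shaped point sets. Membership uses the standard ray-casting test. All function names are hypothetical.

```python
import math


def order_points(points):
    """Order points counter-clockwise by angle around their centroid (assumed method)."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return sorted(points, key=lambda p: math.atan2(p[1] - cy, p[0] - cx))


def contains(polygon, x, y):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def layout_mask(polygon, height, width):
    """Per-pixel membership grid, testing each pixel at its center."""
    return [
        [contains(polygon, c + 0.5, r + 0.5) for c in range(width)]
        for r in range(height)
    ]


# Five unordered points forming a house-shaped (non-rectangular) layout.
predicted = [(0, 0), (4, 4), (4, 0), (0, 4), (2, 5)]
polygon = order_points(predicted)
grid = layout_mask(polygon, 6, 6)
```

The grid corresponds to the second form of layout information described above (whether each coordinate is inside the layout); collecting the coordinates whose entry is true gives the first form.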

Depending on the embodiment, the layout analysis method may include outputting layout information and class information based on the prediction result (S330).

The layout analysis system 100 may output the generated layout information 710 and the class information 620 through an output means (a display, etc.). Depending on the embodiment, the layout analysis system 100 may also transmit the generated layout information 710 and the class information 620 to another device connected through a communication interface.

Depending on the embodiment, the layout analysis method may include training the layout analysis network based on the prediction result (S340).

Referring also to FIG. 8, the layout analysis system 100 may analyze a training image 800 and the mask data 510 and output layout and class prediction results. The layout analysis system 100 may train the layout analysis module 130 or the layout analysis network 132 based on the output prediction results, the ground-truth layout, and the ground-truth class.

The network learning module 140 may train the layout analysis network 132 based on the predicted layout information 810 or 820 and the predicted class information 825, together with the ground-truth layout information 830 or 840 and the ground-truth class information 850. The predicted layout information 810 or 820 may be information about the layout (polygon region) generated by the layout generator 134, but depending on the embodiment it may instead include information about the plurality of points output from the layout analysis network 132.

As described above with reference to FIG. 1, the network learning module 140 may include an objective function for training the layout analysis network 132. The objective function may be implemented so that training reflects both the difference between the predicted layout region and the layout region given by the ground-truth layout information and the difference between the predicted class information and the ground-truth class information. For example, the objective function may be based on the area of the set difference between the predicted layout region and the ground-truth layout region, the square of that area, or the areas of their intersection and union (for example, 1 - (area of intersection)/(area of union)). In addition, the objective function may be implemented to train the layout analysis network 132 in the direction that minimizes a cross-entropy loss computed from the predicted class information and the ground-truth class information.
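The terms of such an objective function can be illustrated on regions represented as sets of pixel coordinates. This is a sketch under several assumptions: the function names are hypothetical, the "difference" is taken to be the symmetric difference, the layout and class terms are combined by a weighted sum with an assumed `class_weight`, and the set-based computation is not differentiable, so an actual training setup would need a differentiable counterpart.

```python
import math


def iou_loss(pred_region, true_region):
    """1 - (area of intersection) / (area of union) for two pixel sets."""
    union = pred_region | true_region
    if not union:
        return 0.0
    return 1.0 - len(pred_region & true_region) / len(union)


def difference_area(pred_region, true_region):
    """Area of the symmetric difference (an assumed reading of 'difference')."""
    return len(pred_region ^ true_region)


def cross_entropy(probabilities, true_index):
    """Negative log-probability assigned to the ground-truth class."""
    return -math.log(max(probabilities[true_index], 1e-12))


def objective(pred_region, true_region, probabilities, true_index, class_weight=1.0):
    """Weighted sum of the layout term and the class term (assumed combination)."""
    layout_term = iou_loss(pred_region, true_region)
    class_term = cross_entropy(probabilities, true_index)
    return layout_term + class_weight * class_term


# Two overlapping 2x2 regions and a three-class probability vector (toy data).
pred = {(0, 0), (0, 1), (1, 0), (1, 1)}
true = {(0, 1), (1, 1), (0, 2), (1, 2)}
loss = objective(pred, true, [0.25, 0.5, 0.25], true_index=1)
```

Here the regions share 2 of 6 pixels, so the layout term is 1 - 2/6, and the class term is -ln(0.5).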

Based on the objective function, the layout analysis system 100 may train the layout analysis network 132 by changing (updating) the weights between the nodes that make up its neural network.

FIG. 9 is a schematic block diagram of a device that performs the image layout analysis method according to an exemplary embodiment of the present disclosure.

Referring to FIG. 9, a device 900 according to an embodiment of the present disclosure may correspond to any one of the at least one computing device constituting the layout analysis system 100 described above with reference to FIG. 1. In this case, the device 900 may be a device that performs the layout analysis operation and/or the training operation for an image (a document image, etc.) described above with reference to FIGS. 3 to 8.

The device 900 may include a processor 910 and a memory 920. However, the components of the device 900 are not limited to this example. For example, the device 900 may include more or fewer components than those described above. There may be one or more processors 910 and one or more memories 920. In addition, two or more of the processors 910 and memories 920 may be combined into a single chip.

According to an embodiment, the processor 910 may correspond to at least one of the image acquisition module 110, the mask data acquisition module 120, the layout analysis module 130, and the network learning module 140 described above, or may execute or control at least one of these modules.

The processor 910 may include hardware such as a CPU, an application processor (AP), an integrated circuit, a microcomputer, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), and/or a neural processing unit (NPU).

According to an embodiment of the present disclosure, the memory 920 may store programs and data necessary for the operation of the device 900.

The memory 920 may also store at least some of the data generated or acquired by the processor 910. Depending on the embodiment, the memory 920 may store data, instructions, algorithms, and the like related to the image acquisition module 110, the mask data acquisition module 120, the layout analysis module 130, and/or the network learning module 140.

The memory 920 may consist of a storage medium such as ROM, RAM, flash memory, an SSD, or an HDD, or a combination of such storage media.

The above descriptions of the embodiments are merely examples given with reference to the drawings to provide a more thorough understanding of the present disclosure, and should not be construed as limiting its technical idea.

It will also be apparent to those of ordinary skill in the art to which the present disclosure pertains that various changes and modifications are possible without departing from the basic principles of the present disclosure.

Claims

An image layout analysis system comprising:
an image acquisition module that acquires an image including at least one class;
a mask data acquisition module that acquires mask data designating a region of the acquired image to be analyzed; and
a layout analysis module that predicts a class of the region designated by the mask data in the acquired image and a layout of the class,
wherein the layout analysis module comprises
a layout analysis network trained to receive the image and the mask data and to output a prediction result of the class and the layout based on the received image and mask data,
wherein the layout analysis network
outputs a plurality of coordinates corresponding to a plurality of points representing the layout of the class,
wherein the layout analysis module
further comprises a layout generator that generates layout information of the class based on the output plurality of coordinates, and
wherein the layout generator
generates the layout information expressed as a polygon connecting the plurality of points.

The layout analysis system of claim 1,
wherein the layout of the class
corresponds to the boundary of the region of the image occupied by the class present in the region designated by the mask data.

(Deleted)

The layout analysis system of claim 1,
wherein at least some of the plurality of points correspond to vertices of the polygon.

The layout analysis system of claim 1,
further comprising a network learning module that trains the layout analysis network based on the prediction result,
wherein the network learning module
updates the layout analysis network based on the layout information and ground-truth layout information, and on the predicted class information and ground-truth class information.

The layout analysis system of claim 5,
wherein the network learning module
updates a weight between nodes of a neural network included in the layout analysis network based on at least one of the area of the set difference or the area of the intersection between the layout region corresponding to the layout information and the ground-truth layout region corresponding to the ground-truth layout information.

The layout analysis system of claim 1,
wherein the at least one class represents a type of information included in the image, and
wherein the mask data
includes a mask image in which a point, line, or region representing at least one location corresponding to the region to be analyzed is marked on the image.

A method for analyzing the layout of an image, the method comprising:
acquiring an image including at least one class;
acquiring mask data designating a region of the acquired image to be analyzed;
inputting the image and the mask data to a layout analysis network trained to predict a class of the region designated by the mask data in the acquired image and a layout of the class; and
acquiring, through the layout analysis network, a prediction result of the class of the region designated by the mask data in the input image and the layout of the class,
wherein acquiring the prediction result comprises:
acquiring, from the layout analysis network, a plurality of coordinates corresponding to a plurality of points representing the layout of the class; and
acquiring a prediction result of the layout expressed as a polygon connecting the acquired plurality of coordinates.

The method of claim 8,
wherein the layout of the class
corresponds to the boundary of the region occupied by the class, among the at least one class included in the image, that is present in the region designated by the mask data.

(Deleted)

The method of claim 8,
wherein at least some of the plurality of points correspond to vertices of the polygon.

The method of claim 8,
further comprising training the layout analysis network based on the prediction result,
wherein the training comprises
updating the layout analysis network based on the predicted class and the predicted layout, and on ground-truth class information and ground-truth layout information.

The method of claim 12,
wherein updating the layout analysis network comprises
updating a weight between nodes of a neural network included in the layout analysis network based on the area of the difference between the layout region corresponding to the predicted layout and the ground-truth layout region corresponding to the ground-truth layout information.

The method of claim 8,
wherein the at least one class includes at least one of text, a picture, a photo, a figure, and a table, and
wherein the mask data
includes a mask image in which a point, line, or region representing at least one location corresponding to the region to be analyzed is marked.