KR20220001844A

KR20220001844A - Method and apparatus for providing data indicating whether electronic documents in different formats match

Info

Publication number: KR20220001844A
Application number: KR1020200080282A
Authority: KR
Inventors: 구용구
Original assignee: 주식회사 폴라리스오피스
Priority date: 2020-06-30
Filing date: 2020-06-30
Publication date: 2022-01-06
Also published as: KR102414935B1

Abstract

According to an embodiment of the present invention, there is provided a method and apparatus for providing data indicating whether electronic documents in different formats match. The method according to the embodiment includes: generating sentence attribute data by analyzing each of the electronic documents in different formats; determining sentence similarity with respect to the electronic documents based on the generated sentence attribute data; generating word attribute data for the electronic documents according to the determined sentence similarity; determining word similarity with respect to the electronic documents based on the generated word attribute data; determining whether the electronic documents match according to the determined word similarity; and providing result data indicating whether the electronic documents match. The present invention allows the user to check not only differences in words but also differences in attributes.

Description

Method and apparatus for providing data indicating whether electronic documents of different formats match each other

The present invention relates to a method and apparatus for providing data indicating whether electronic documents of different formats match.

Recently, as computers, smartphones, and tablet PCs have become widespread, various electronic document programs have been released that allow users to view, create, and edit electronic documents on such terminal devices.

Such programs include word processors, which support basic document creation and editing; spreadsheets, which assist with data entry, arithmetic operations, and data management; and presentation programs, which assist a presenter in giving a presentation.

In general, a user creates an electronic document using such a program, and may generate a new copy each time the document is updated.

Once the final version of an electronic document has been created, the user may generate an electronic document in a format different from that of the created document. In this case, the format change may corrupt words or alter the attributes of sentences or words, so that the resulting electronic document differs from the final version.

In addition, when the user needs to compare the original with the copy to check for this, it is difficult to compare electronic documents of different formats within an electronic document program, so the user must go to the trouble of converting one of the documents into the same format as the other before comparing them.

Accordingly, there is a need for a method and apparatus for providing data indicating whether electronic documents of different formats match each other.

An object of the present invention is to provide a method and apparatus for providing data indicating whether electronic documents of different formats match each other.

Specifically, an object of the present invention is to provide a method and apparatus for providing data indicating whether electronic documents of different formats match, so that when a copy of an electronic document created by a user is generated, it can easily be checked whether the original and the copy match.

The objects of the present invention are not limited to those mentioned above, and other objects not mentioned will be clearly understood by those skilled in the art from the following description.

In order to solve the above problems, a method and apparatus for providing data indicating whether electronic documents of different formats match each other are provided.

A method for providing data indicating whether electronic documents of different formats match according to an embodiment of the present invention, performed by a control unit of an apparatus for providing such data, includes: generating sentence attribute data by analyzing each of the electronic documents of different formats; determining sentence similarity with respect to the electronic documents based on the generated sentence attribute data; generating word attribute data for the electronic documents according to the determined sentence similarity; determining word similarity with respect to the electronic documents based on the generated word attribute data; determining whether the electronic documents match each other according to the determined word similarity; and providing result data indicating whether the electronic documents match.

An apparatus for providing data indicating whether electronic documents of different formats match according to an embodiment of the present invention includes: a storage unit that stores electronic documents of different formats; and a control unit configured to be connected to the storage unit. The control unit is configured to generate sentence attribute data by analyzing each of the electronic documents of different formats, determine sentence similarity with respect to the electronic documents based on the generated sentence attribute data, generate word attribute data for the electronic documents according to the determined sentence similarity, determine word similarity with respect to the electronic documents based on the generated word attribute data, determine whether the electronic documents match according to the determined word similarity, and provide result data indicating whether the electronic documents match.

Details of other embodiments are included in the detailed description and drawings.

According to the present invention, it is possible to quickly and easily check whether two or more electronic documents match, regardless of document format.

In addition, the present invention can minimize the inconvenience of converting electronic documents of different formats into the same format in order to determine whether they match.

In addition, according to the present invention, when a user makes a copy of a specific electronic document in a different format, it is possible to easily check whether the original and the copy match.

In addition, the present invention compares an original and a copy having different formats and provides data indicating whether they match, so that the user can check not only differences in words but also differences in attributes.

The effects of the present invention are not limited to those exemplified above, and further effects are included in the present specification.

FIG. 1 is a schematic diagram for explaining an apparatus for providing data indicating whether electronic documents of different formats match, according to an embodiment of the present invention.
FIG. 2 is a schematic block diagram of an electronic device according to an embodiment of the present invention.
FIG. 3 is an exemplary diagram illustrating sentence attribute data for two electronic documents according to an embodiment of the present invention, and FIG. 4 is an exemplary diagram illustrating sentence comparison index data for two electronic documents according to an embodiment of the present invention.
FIG. 5 is an exemplary diagram for explaining a method of generating sentence attribute data according to an embodiment of the present invention.
FIG. 6 is an exemplary diagram illustrating word attribute data for two electronic documents according to an embodiment of the present invention.
FIGS. 7A and 7B are exemplary diagrams illustrating word comparison index data for two electronic documents according to an embodiment of the present invention.
FIG. 8 is an exemplary diagram illustrating decision index data used to determine whether two electronic documents match, according to an embodiment of the present invention.
FIG. 9 is a schematic flowchart for explaining a method by which an electronic device provides data indicating whether electronic documents of different formats match, according to an embodiment of the present invention.

Advantages and features of the present invention, and methods of achieving them, will become apparent with reference to the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. These embodiments are provided only so that the disclosure of the present invention is complete and so that those of ordinary skill in the art to which the present invention pertains are fully informed of the scope of the invention, which is defined only by the scope of the claims. In the description of the drawings, like reference numerals may be used for like components.

In this document, expressions such as "have," "may have," "include," or "may include" indicate the presence of a corresponding feature (e.g., a numerical value, function, operation, or component such as a part) and do not exclude the presence of additional features.

In this document, expressions such as "A or B," "at least one of A or/and B," or "one or more of A or/and B" may include all possible combinations of the items listed together. For example, "A or B," "at least one of A and B," or "at least one of A or B" may refer to any of the cases of (1) including at least one A, (2) including at least one B, or (3) including both at least one A and at least one B.

Expressions such as "first" and "second" used in this document may modify various components regardless of order and/or importance, and are used only to distinguish one component from another without limiting the components. For example, a first user device and a second user device may represent different user devices regardless of order or importance. For example, without departing from the scope of rights described in this document, a first component may be named a second component, and similarly, a second component may be renamed a first component.

When a component (e.g., a first component) is referred to as being "(operatively or communicatively) coupled with/to" or "connected to" another component (e.g., a second component), it should be understood that the component may be connected to the other component directly or through yet another component (e.g., a third component). On the other hand, when a component (e.g., a first component) is referred to as being "directly coupled" or "directly connected" to another component (e.g., a second component), it may be understood that no other component (e.g., a third component) exists between the two.

The expression "configured to" used in this document may be used interchangeably with, for example, "suitable for," "having the capacity to," "designed to," "adapted to," "made to," or "capable of," depending on the situation. The term "configured to" does not necessarily mean only "specifically designed to" in hardware. Instead, in some situations, the expression "a device configured to" may mean that the device is "capable of" operating together with other devices or parts. For example, the phrase "a processor configured to perform A, B, and C" may mean a dedicated processor (e.g., an embedded processor) for performing the corresponding operations, or a generic-purpose processor (e.g., a CPU or an application processor) capable of performing the corresponding operations by executing one or more software programs stored in a memory device.

Terms used in this document are used only to describe specific embodiments and may not be intended to limit the scope of other embodiments. A singular expression may include the plural unless the context clearly indicates otherwise. Terms used herein, including technical and scientific terms, may have the same meanings as commonly understood by one of ordinary skill in the art described in this document. Among the terms used in this document, terms defined in a general dictionary may be interpreted as having the same or similar meaning as in the context of the related art and, unless explicitly defined in this document, are not to be interpreted in an idealized or excessively formal sense. In some cases, even terms defined in this document cannot be construed to exclude embodiments of this document.

The features of the various embodiments of the present invention may be partially or wholly coupled or combined with each other and, as those skilled in the art will fully understand, may be technically interlocked and operated in various ways. The embodiments may be implemented independently of each other or together in association.

Hereinafter, various embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a schematic diagram for explaining an apparatus for providing data indicating whether electronic documents of different formats match (hereinafter referred to as an 'electronic device') according to an embodiment of the present invention.

Referring to FIG. 1, the electronic device 100 compares electronic documents of different formats and provides data indicating whether the electronic documents match. The electronic device 100 may include at least one of a smartphone, a tablet PC (personal computer), a notebook computer, and/or a PC on which an application, program, widget, web browser, or the like for this purpose is installed.

The electronic device 100 may select electronic documents of different formats through the application, program, widget, web browser, or the like, determine whether the selected electronic documents match, and provide data indicating the result of that determination.

In response to a request to determine whether at least two of the electronic documents 110, 112, 114, and 116 match, where these documents are titled '특허란' and have different formats such as .doc, .png, .mht, and .pdf, the electronic device 100 may analyze each of the at least two electronic documents to generate sentence attribute data, and determine the sentence similarity of the at least two electronic documents based on the generated sentence attribute data.

Specifically, the electronic device 100 may classify the content of each of the at least two electronic documents into text and non-text objects. Here, non-text objects may include images, tables, graphs, figures, and/or annotations. To distinguish text from non-text objects, the electronic device 100 may use any of various character recognition methods.

The electronic device 100 may analyze the text to identify at least one sentence. To identify sentences, the electronic device 100 may use delimiting objects such as word spacing, blank space, and/or periods as separation indicators. For example, when a space is recognized and then a period is recognized, the electronic device 100 may determine the characters between the recognized space and the recognized period to be one sentence.
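The sentence separation described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `split_sentences` is hypothetical, and it assumes the text has already been extracted as a plain string and that a period is the only sentence terminator.

```python
def split_sentences(text: str) -> list[str]:
    """Split extracted text into sentences, using the period as the delimiter.

    Assumption: only '.' ends a sentence; surrounding blank space is trimmed.
    """
    sentences = []
    current = []
    for ch in text:
        current.append(ch)
        if ch == ".":
            sentence = "".join(current).strip()
            if sentence:
                sentences.append(sentence)
            current = []
    # Keep any trailing text that has no closing period.
    tail = "".join(current).strip()
    if tail:
        sentences.append(tail)
    return sentences
```

A real document would need more care, since a period inside a number or abbreviation would also end a sentence under this rule.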

The electronic device 100 may generate sentence attribute data for the at least one identified sentence. Here, the sentence attribute data may indicate at least one of the font, size, color, and effect (bold, italic, underline, etc.) of each sentence, but is not limited thereto.

The electronic device 100 may determine the sentence similarity of the at least two electronic documents based on the generated sentence attribute data. Specifically, the electronic device 100 may generate, based on the sentence attribute data, sentence comparison index data used to compare the sentences of the at least two electronic documents, and determine the sentence similarity using the generated sentence comparison index data. Here, the sentence comparison index data may include at least one of a font type, font size, font color, and effect, but is not limited thereto, and various other data used for sentence comparison may be included as comparison indices. The font type is an arbitrary letter used to distinguish the font of each sentence.

For example, to determine whether a first electronic document matches a second electronic document obtained by converting the first electronic document to a different format, the electronic device 100 may, for the first electronic document, represent the font type of the first sentence with the letter 'A' if its font is 'Malgun Gothic' (맑은 고딕), and represent the font type of the second sentence with the letter 'B' if its font is 'Gulim' (굴림). For the second electronic document, the electronic device 100 may represent the font type of the first sentence with the letter 'A' if its font is 'Dotum' (돋음), and represent the font type of the second sentence with the letter 'B' if its font is 'Gungsuh' (궁서). When the first electronic document is converted to a second electronic document of a different format, the font of each sentence in the second document may differ from that in the first, but the corresponding sentences will have the same content, so they are represented by the same arbitrary letter.
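One way to obtain such format-independent font type letters is sketched below. The source does not say how the letters are assigned; this sketch assumes they are handed out in order of first appearance within each document, and the function name `font_type_labels` is hypothetical.

```python
from string import ascii_uppercase


def font_type_labels(sentence_fonts: list[str]) -> list[str]:
    """Map each sentence's font name to an arbitrary letter (A, B, C, ...).

    Assumption: letters are assigned in order of first appearance, so two
    documents with different fonts but the same font pattern get the same
    labels. Supports at most 26 distinct fonts.
    """
    labels: dict[str, str] = {}
    result = []
    for font in sentence_fonts:
        if font not in labels:
            labels[font] = ascii_uppercase[len(labels)]
        result.append(labels[font])
    return result
```

With this rule, a first document whose sentences use Malgun Gothic and Gulim and a second document whose sentences use Dotum and Gungsuh both yield the labels A and B.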

For example, assume there is a request to determine whether a first electronic document 110 saved in Word (doc) format, such as '특허란.doc', matches a second electronic document 116 saved in PDF format, such as '특허란.pdf'. In this case, the electronic device 100 may analyze each of the first electronic document 110 and the second electronic document 116 to classify text and non-text objects, and analyze the text to identify at least one sentence in each of the first electronic document 110 and the second electronic document 116. If the first electronic document 110 and the second electronic document 116 each contain 10 sentences, the electronic device 100 may generate first sentence attribute data for the 10 sentences of the first electronic document 110 and second sentence attribute data for the 10 sentences of the second electronic document 116.

The electronic device 100 may use the first sentence attribute data for the 10 sentences of the first electronic document 110 to generate first sentence comparison index data for comparison with the second electronic document 116, and use the second sentence attribute data for the 10 sentences of the second electronic document 116 to generate second sentence comparison index data for comparison with the first electronic document 110. For example, if the electronic device 100 analyzes a sentence consisting of the seven characters '특', '허', '제', '도', '의', '목', and '적' and recognizes more than one font, it may determine the most frequent font to be the reference font of that sentence. If the font of '특', '허', '제', '도', '의', and '목' is 'Gulim' and the font of '적' is 'Dotum', the electronic device 100 may determine the reference font of the sentence to be 'Gulim'. Here, a 'character' constituting a sentence or word may mean the smallest unit of a sentence or word to which a font can be applied.
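The reference font selection can be illustrated with a short sketch. The function name `reference_font` is hypothetical, and the tie-breaking rule (the font seen first wins) is an assumption, since the source does not address ties.

```python
from collections import Counter


def reference_font(char_fonts: list[str]) -> str:
    """Return the most frequent font among a sentence's characters.

    Assumption: on a tie, the font that appears first is chosen
    (Counter.most_common keeps first-seen order for equal counts).
    """
    if not char_fonts:
        raise ValueError("sentence has no characters")
    return Counter(char_fonts).most_common(1)[0][0]
```

For the seven-character example, six characters in Gulim and one in Dotum give Gulim as the reference font.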

The electronic device 100 may compare the first sentence comparison index data with the second sentence comparison index data to determine the sentence similarity of the first electronic document 110 and the second electronic document 116. Here, the sentence similarity may be a numerical value representing the number of sentences whose sentence comparison index data match between the at least two electronic documents, but is not limited thereto. In various embodiments, the sentence similarity may instead be expressed as the ratio of the number of matching sentences to the total number of sentences.

In other words, the electronic device 100 may compare the sentence comparison index data for each of the 10 sentences of the first electronic document 110 with the sentence comparison index data for each of the 10 sentences of the second electronic document 116 and count the sentences whose sentence comparison index data match. The electronic device 100 may determine the sentence similarity as the number of matching sentences, or as the ratio of the number of matching sentences to the total number of sentences.
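Both forms of the sentence similarity (count and ratio) can be sketched as below. The function name `sentence_similarity` and the tuple layout of a sentence's comparison index are hypothetical; the sketch also assumes sentences are compared position by position and that the larger sentence count is used as the total when the documents differ in length.

```python
def sentence_similarity(first: list[tuple], second: list[tuple]) -> tuple[int, float]:
    """Compare per-sentence comparison indices of two documents.

    Each index is assumed to be a tuple such as
    (font_type, font_size, font_color, effect).
    Returns (number of matching sentences, matching ratio).
    """
    matches = sum(1 for a, b in zip(first, second) if a == b)
    total = max(len(first), len(second))
    ratio = matches / total if total else 0.0
    return matches, ratio
```

For two 10-sentence documents in which nine sentences have identical indices, this returns a count of 9 and a ratio of 0.9.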

Next, the electronic device 100 may generate word attribute data for each of the at least two electronic documents according to the determined sentence similarity, and determine the word similarity of the at least two electronic documents based on the generated word attribute data.

Specifically, the electronic device 100 may divide each of the 10 sentences of the first electronic document 110 into at least one word, and divide each of the 10 sentences of the second electronic document 116 into at least one word.

To identify words, the electronic device 100 may use delimiting objects such as blank space, word spacing, and/or periods, together with character attributes such as font, size, color, and/or effect. For example, the electronic device 100 may analyze each of the 10 sentences and, when a space is recognized and then the next space is recognized, check the attributes of each character between the two recognized spaces. The electronic device 100 may determine one or more characters between the spaces that share the same attributes to be a word. In various embodiments, if among the characters between the spaces there is a character whose attributes do not match the others, the electronic device 100 may determine that character to be a separate word. In other words, one or more characters whose attributes do not match are themselves determined to be a word.
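The word separation rule can be sketched as follows, under stated assumptions: each character arrives as a `(character, attributes)` pair, blank space is the only delimiter considered (periods are not treated specially), and a change of attributes between two spaces starts a new word. The function name `split_words` is hypothetical.

```python
from itertools import groupby


def split_words(chars: list[tuple[str, object]]) -> list[tuple[str, object]]:
    """Split (character, attributes) pairs into (word, attributes) pairs.

    Characters between two spaces form one word only while their
    attributes are identical; a run with different attributes
    becomes a separate word.
    """
    words = []
    token = []

    def flush() -> None:
        # Group consecutive characters that share the same attributes.
        for attrs, run in groupby(token, key=lambda pair: pair[1]):
            words.append(("".join(ch for ch, _ in run), attrs))
        token.clear()

    for ch, attrs in chars:
        if ch.isspace():
            flush()
        else:
            token.append((ch, attrs))
    flush()
    return words
```

Here the attributes can be any comparable value, for example a font name or a tuple of font, size, color, and effect.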

The electronic device 100 may generate first word attribute data for the at least one word separated from the first electronic document 110, and second word attribute data for the at least one word separated from the second electronic document 116. Here, the word attribute data may include, but is not limited to, at least one of a sentence number, font, size, color, and effect for each of the at least one word. The sentence number is the number of the sentence to which the word belongs.

The electronic device 100 may use the first word attribute data corresponding to the first electronic document to generate first word comparison index data, which is used for comparison with the words of the second electronic document, and may use the second word attribute data corresponding to the second electronic document to generate second word comparison index data, which is used for comparison with the words of the first electronic document. Here, the word comparison index data may include at least one of a font shape, a frequency per font shape, a font size ratio, a frequency per font size, a font color ratio, and a frequency per font color, and various other data used for word comparison may also be included as comparison indices. The font size ratio is the percentage (%) that a font size occupies relative to the total size in the document, taken as 100%, and the font color ratio is the percentage (%) that a font color occupies relative to the total color in the document, taken as 100% for each of R, G, and B. The frequency is a number of words.
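As a rough illustration of the frequency part of the word comparison index data, the following Python sketch labels each font with a letter in order of first appearance and counts words per font shape, font size, and font color. The dictionary layout and the function name `word_index` are assumptions, and the size and color ratios are deliberately left out because their exact formula is not spelled out here.

```python
from collections import Counter


def word_index(words):
    """Build frequency-style comparison indices from word attribute data.

    `words` is a list of dicts with 'font', 'size' and 'color' keys
    (a hypothetical layout chosen for this sketch).
    """
    labels = {}
    for w in words:
        if w["font"] not in labels:
            # 'A' for the first font seen, 'B' for the second, and so on
            labels[w["font"]] = chr(ord("A") + len(labels))
    return {
        "font_shape_freq": Counter(labels[w["font"]] for w in words),
        "font_size_freq": Counter(w["size"] for w in words),
        "font_color_freq": Counter(w["color"] for w in words),
    }


words = [
    {"font": "Malgun Gothic", "size": 12, "color": "light blue"},
    {"font": "Gulim", "size": 10, "color": "blue"},
    {"font": "Gulim", "size": 10, "color": "blue"},
    {"font": "Dotum", "size": 9, "color": "black"},
    {"font": "Dotum", "size": 9, "color": "black"},
]
print(word_index(words)["font_shape_freq"])
```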

The electronic device 100 may compare the first word comparison index data with the second word comparison index data to determine the word similarity of the first electronic document 110 and the second electronic document 116. Here, the word similarity may be, but is not limited to, a numerical value of the number of words whose word comparison index data match between the at least two electronic documents. In various embodiments, the word similarity may instead be expressed as the ratio of the number of matching words to the total number of words.

In other words, the electronic device 100 may compare the word comparison index data of each word in each of the ten sentences corresponding to the first electronic document 110 with the word comparison index data of each word in each of the ten sentences corresponding to the second electronic document 116, and may thereby identify the number of words whose word comparison index data match. The electronic device 100 may determine the word similarity by expressing as a numerical value either the number of matching words or the ratio of the number of matching words to the total number of words.
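The two ways of expressing word similarity can be sketched as follows. This is a sketch under stated assumptions: words are paired by position, each word's index data is represented as a tuple, the "total number of words" is taken to be the length of the longer document, and two empty documents are treated as fully similar.

```python
def word_similarity(first, second, as_ratio=False):
    """Count position-wise matches between two lists of per-word index tuples.

    Returns the raw match count, or matches / total words when as_ratio=True.
    """
    matches = sum(1 for a, b in zip(first, second) if a == b)
    if not as_ratio:
        return matches
    total = max(len(first), len(second))
    # Assumption: two empty documents count as fully similar
    return matches / total if total else 1.0


doc1 = [("A", 12, "light blue"), ("B", 10, "blue"), ("C", 9, "black")]
doc2 = [("A", 12, "light blue"), ("B", 10, "blue"), ("D", 9, "black")]
print(word_similarity(doc1, doc2), word_similarity(doc1, doc2, as_ratio=True))
```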

The electronic device 100 may determine whether the at least two electronic documents match according to the determined word similarity, and may provide result data indicating the determination.

Specifically, the electronic device 100 may determine whether the determined word similarity is greater than or equal to a preset threshold similarity and, if it is, may perform a detailed comparison of the at least two electronic documents.

For the detailed comparison, the electronic device 100 may generate detailed comparison index data for the at least two electronic documents. For example, the electronic device 100 may generate first detailed comparison index data that includes at least one of the sentence numbers of the first electronic document 110, the at least one word for each sentence number, the font shape of each word, the font size of each word, and the font color of each word, and may generate second detailed comparison index data for the second electronic document 116.

The electronic device 100 may compare the generated detailed comparison index data to determine whether the at least two electronic documents match. For example, when comparing the first detailed comparison index data of the first electronic document 110 with the second detailed comparison index data of the second electronic document 116, the electronic device 100 may compare the at least one word for each sentence number and, if the words match, may then compare the font shape, font size, and font color of each word.

If the font shape, font size, and font color of each word all match, the electronic device 100 may determine that the first electronic document 110 and the second electronic document 116 match each other, and may provide result data indicating this.

If at least one word for any sentence number does not match, or if at least one of the font shape, font size, and font color of a word does not match, the electronic device 100 may also provide result data indicating this. For example, the electronic device 100 may provide result data indicating at least one of the mismatched words, mismatched font shapes, mismatched font sizes, and mismatched font colors between the first electronic document 110 and the second electronic document 116.
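The detailed comparison and its result data can be sketched in Python as below. The `Word` record, the position-wise pairing of words, and the tuple format of the mismatch report are all assumptions made for this illustration; the text only states which items are compared and in which order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """Hypothetical detailed comparison record for one word."""
    sentence: int
    text: str
    font: str
    size: int
    color: str


def detailed_compare(first, second):
    """Return a list of (sentence, word, mismatched_item) tuples.

    Words are compared first; attributes are compared only when the
    words themselves match. An empty list means the documents match.
    """
    mismatches = []
    for a, b in zip(first, second):
        if a.text != b.text:
            mismatches.append((a.sentence, a.text, "word"))
            continue
        for field in ("font", "size", "color"):
            if getattr(a, field) != getattr(b, field):
                mismatches.append((a.sentence, a.text, field))
    if len(first) != len(second):
        # Assumption: a differing word count is itself reported as a mismatch
        mismatches.append((None, None, "word_count"))
    return mismatches


doc_a = [Word(1, "x", "Gulim", 10, "blue"), Word(2, "y", "Dotum", 9, "black")]
doc_b = [Word(1, "x", "Gulim", 10, "blue"), Word(2, "y", "Batang", 9, "black")]
print(detailed_compare(doc_a, doc_b))
```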

In various embodiments, when the electronic document contains non-text objects such as images, tables, graphs, figures, and/or annotations, the electronic device 100 may analyze each object to generate object attribute data including the object size, and may use the generated object attribute data to generate object comparison index data used to compare the objects of the at least two electronic documents. For an image, the electronic device 100 analyzes the image size using various image analysis methods; for a table, graph, figure, and/or annotation that is not an image, it may convert the object into an image and then perform the image analysis. Here, the object comparison index data may be a numerical value of the ratio of the converted image size to the overall document size.

The electronic device 100 may compare the object comparison index data of each of the at least two electronic documents to determine the object similarity of the at least two electronic documents. Here, the object similarity may be, but is not limited to, a numerical value of the number of objects whose object comparison index data match between the at least two electronic documents; the object similarity may also be expressed as the ratio of the number of matching objects to the total number of objects.
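A minimal sketch of the object comparison index and the ratio form of object similarity follows. It assumes that "size" means area in the same units for the object and the document, that objects are paired by position, and that two ratios match when they agree within a small tolerance; none of these details is fixed by the description.

```python
import math


def object_index(obj_width, obj_height, doc_width, doc_height):
    """Ratio of the (converted) image area to the overall document area."""
    return (obj_width * obj_height) / (doc_width * doc_height)


def object_similarity(first, second, tolerance=1e-6):
    """Ratio of position-wise matching object indices to the total object count."""
    total = max(len(first), len(second))
    if total == 0:
        return 1.0  # assumption: no objects on either side counts as a match
    matches = sum(
        1 for a, b in zip(first, second) if math.isclose(a, b, abs_tol=tolerance)
    )
    return matches / total


print(object_index(100, 50, 200, 100))
print(object_similarity([0.25, 0.1], [0.25, 0.2]))
```

Comparing with a tolerance rather than exact equality is a design choice for this sketch, since rendering the same object from two formats may give slightly different pixel sizes.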

The electronic device 100 may determine whether the electronic documents match by considering the object similarity together with the word similarity described above. Specifically, if the determined word similarity is greater than or equal to the threshold similarity and the determined object similarity is greater than or equal to the threshold similarity, the electronic device 100 may perform a detailed comparison of the at least two electronic documents. For the detailed comparison, the electronic device 100 may generate detailed comparison index data for the at least two electronic documents. For example, in addition to the sentence number, the at least one word for each sentence number, the font shape of each word, the font size of each word, and the font color of each word described above, the detailed comparison index data may further include the ratio of the image size to the overall document size.

The electronic device 100 may compare the generated detailed comparison index data to determine whether the at least two electronic documents match. For example, the electronic device 100 may compare the at least one word for each sentence number and, if the words match, may then compare the font shape of each word, the font size of each word, the font color of each word, and the ratio of the image size to the overall document size.

If the font shape of each word, the font size of each word, the font color of each word, and the ratio of the image size to the overall document size all match, the electronic device 100 may determine that the first electronic document 110 and the second electronic document 116 match each other, and may provide result data indicating this.

If at least one word for any sentence number does not match, or if at least one of the font shape of a word, the font size of a word, the font color of a word, and the ratio of the image size to the overall document size does not match, the electronic device 100 may also provide result data indicating this.

As described above, the present invention makes it possible to check quickly and easily whether two or more electronic documents match, regardless of document format.

Hereinafter, an electronic device that compares electronic documents of different formats to determine their similarity will be described in detail with reference to FIG. 2.

FIG. 2 is a schematic block diagram of an electronic device according to an embodiment of the present invention.

Referring to FIG. 2, the electronic device 200 includes a communication unit 210, a display unit 220, a storage unit 230, and a control unit 240. In the illustrated embodiment, the electronic device 200 may correspond to the electronic device 100 of FIG. 1.

The communication unit 210 connects the electronic device 200 so that it can communicate with external devices. The communication unit 210 may be connected to an external device by wired or wireless communication to transmit and receive various data.

The display unit 220 may display various content to the user, such as text, images, video, icons, banners, or symbols. For example, the display unit 220 may display various interface screens showing the electronic documents of different formats and/or the results of comparing the electronic documents.

In various embodiments, the display unit 220 may include a touch screen and may receive, for example, a touch, gesture, proximity, drag, swipe, or hovering input made with an electronic pen or a part of the user's body.

The storage unit 230 may store the electronic documents of different formats and various data used to compare the electronic documents.

In various embodiments, the storage unit 230 may include at least one type of storage medium among a flash memory type, a hard disk type, a multimedia card micro type, a card-type memory (for example, SD or XD memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, and an optical disk. The electronic device 200 may also operate in association with web storage that performs the storage function of the storage unit 230 over the Internet.

The control unit 240 is operatively connected to the communication unit 210, the display unit 220, and the storage unit 230, and may execute various instructions for comparing electronic documents of different formats and providing data indicating whether those electronic documents match.

To this end, the control unit 240 may analyze the electronic documents of different formats to generate sentence attribute data for each electronic document, determine the sentence similarity of the electronic documents based on the generated sentence attribute data, generate word attribute data for each electronic document according to the determined sentence similarity, determine the word similarity of the electronic documents based on the generated word attribute data, determine whether the electronic documents match according to the determined word similarity, and provide result data indicating the determination.

Specifically, the control unit 240 may analyze two different electronic documents to classify the content of each into text and non-text objects, analyze the text to separate at least one sentence, and then generate sentence attribute data for the at least one separated sentence.

The control unit 240 may generate sentence comparison index data based on the sentence attribute data of each electronic document, and may use the generated sentence comparison index data to determine the sentence similarity between the two electronic documents.

A method for determining the sentence similarity will be described in detail with reference to FIGS. 3 and 4.

FIG. 3 is an exemplary diagram showing sentence attribute data for two electronic documents according to an embodiment of the present invention, and FIG. 4 is an exemplary diagram showing sentence comparison index data for two electronic documents according to an embodiment of the present invention.

Referring to FIG. 3, it is assumed that the first electronic document 300 and the second electronic document 310 have different formats and that there is a request to determine whether the two electronic documents match.

The control unit 240 may analyze the first electronic document 300 to classify its content into text and non-text objects, and may analyze the text to divide it into 13 sentences. Likewise, the control unit 240 may analyze the second electronic document 310 to classify its content into text and non-text objects, and may analyze the text to divide it into 13 sentences.

The control unit 240 may generate first sentence attribute data 305 for the 13 sentences of the first electronic document 300 as shown in (a) of FIG. 3, and second sentence attribute data 315 for the 13 sentences of the second electronic document 310 as shown in (b) of FIG. 3.

Referring to (a) of FIG. 3, the first sentence attribute data 305 may include a sentence number (e.g., 1, 2, …, 13), a font (e.g., Malgun Gothic, Gulim, Dotum, …, Dotum), a font size (e.g., size 12, size 10, size 9, …, size 9), and a font color (e.g., light blue, blue, black, …, black).

Referring to (b) of FIG. 3, the second sentence attribute data 315 may include a sentence number (e.g., 1, 2, …, 13), a font (e.g., Malgun Gothic, Gulim, Batang, …, Dotum), a font size (e.g., size 12, size 10, size 9, …, size 9), and a font color (e.g., light blue, blue, black, …, black).

Referring to FIG. 4, the control unit 240 may generate first sentence comparison index data based on the first sentence attribute data 305 as shown in (a) of FIG. 4, and second sentence comparison index data based on the second sentence attribute data 315 as shown in (b) of FIG. 4.

Specifically, for the first electronic document 300, the control unit 240 may, based on the sentence attribute data of sentence 1 (e.g., 1. Malgun Gothic, size 12, color light blue), assign an arbitrary letter 'A' representing the font shape to the font of sentence 1, calculate the ratio of the font size of sentence 1 to the overall document size of the first electronic document 300, and calculate the ratio of the font color of sentence 1 to the overall RGB color of the first electronic document 300, thereby generating sentence attribute data for sentence 1 that includes a font shape (e.g., A), a font size ratio (e.g., 39), and a font color ratio (e.g., 0:69:94). By generating sentence attribute data for sentences 2 to 13 in the same way, the control unit 240 may generate the first sentence comparison index data corresponding to the first electronic document 300 as shown in (a) of FIG. 4.

Next, for the second electronic document 310, the control unit 240 may, based on the sentence attribute data of sentence 1 (e.g., 1. Malgun Gothic, size 12, color light blue), assign the letter 'A' corresponding to an arbitrary font shape to the font of sentence 1, calculate the ratio of the font size of sentence 1 to the overall document size of the first electronic document 300, and calculate the ratio of the font color of sentence 1 to the overall RGB color of the second electronic document 310, thereby generating sentence attribute data for sentence 1 that includes a font shape (e.g., A), a font size ratio (e.g., 39), and a font color ratio (e.g., 0:69:94). By generating sentence attribute data for sentences 2 to 13 in the same way, the control unit 240 may generate the second sentence comparison index data corresponding to the second electronic document 310 as shown in (b) of FIG. 4.

In various embodiments, although the font of sentence 3 in the first electronic document 300 differs from the font of sentence 3 in the second electronic document 310, both correspond to the same sentence number, so the control unit 240 may assign the same arbitrary letter 'C' to the font of sentence 3 of the first electronic document and the font of sentence 3 of the second electronic document.

The control unit 240 may compare the first sentence comparison index data and the second sentence comparison index data generated in this way to calculate the number of matching sentences, and may calculate the sentence similarity by expressing as a numerical value either the calculated number of sentences or the ratio of the number of matching sentences to the total number of sentences.
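The sentence similarity calculation can be sketched as below, assuming each document's sentence comparison index data is held in a dictionary keyed by sentence number. The tuple layout (font shape, font size ratio, font color ratio) mirrors the example values given for sentence 1; the values for sentence 2 are made up for this illustration.

```python
def sentence_similarity(first, second):
    """Return (matching sentence count, ratio of matches to all sentence numbers).

    `first` and `second` map a sentence number to its comparison index tuple.
    """
    numbers = first.keys() | second.keys()
    matches = sum(
        1 for n in numbers
        if n in first and n in second and first[n] == second[n]
    )
    # Assumption: two documents without sentences count as fully similar
    ratio = matches / len(numbers) if numbers else 1.0
    return matches, ratio


first_index = {1: ("A", 39, "0:69:94"), 2: ("B", 32, "0:0:100")}   # sentence 2: hypothetical
second_index = {1: ("A", 39, "0:69:94"), 2: ("B", 30, "0:0:100")}  # sentence 2: hypothetical
print(sentence_similarity(first_index, second_index))
```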

In various embodiments, a method of generating sentence attribute data when a first electronic document in 'doc' format has been converted into a second electronic document in 'pdf' format and the font of some words has been changed to a font different from that of the first electronic document will be described with reference to FIG. 5.

FIG. 5 is an exemplary diagram for explaining a method of generating sentence attribute data according to an embodiment of the present invention.

Referring to FIG. 5, the second electronic document 500 is an electronic document that has been converted into a format different from that of the first electronic document.

When the control unit 240 recognizes that the attributes of some words 505 and 510 (e.g., 발명을 ('invention'), 수단 ('means')) among the words of sentence 3 of the second electronic document 500 differ from the attributes of the remaining words, it may recognize the attributes of each word of sentence 3, count the number of words having each distinct attribute, and determine the attribute held by the larger number of words to be the attribute of sentence 3. For example, if the font of the words 505 and 510 in sentence 3 is Gulim and the font of the remaining words is Batang, the control unit 240 may count the words in the Gulim font and the words in the Batang font, compare the two counts, and determine the font with the larger count to be the font of sentence 3. In this case, because more words are in the Batang font than in the Gulim font, the font of sentence 3 may be determined to be Batang. Accordingly, in the sentence attribute data 515 of the second electronic document 500, the sentence attribute data 515 of sentence 3 may include '3. Batang, size 9, color black'.
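The majority rule for a sentence whose words carry mixed attributes can be sketched with `collections.Counter`. The dictionary layout of each word and the function name are assumptions; the sketch also assumes a non-empty sentence, and it resolves ties in favor of the attribute value seen first, which is a choice the description does not address.

```python
from collections import Counter


def majority_attributes(words):
    """Pick, for each attribute, the value held by the largest number of words.

    `words` is a non-empty list of dicts with 'font', 'size' and 'color' keys.
    """
    fields = ("font", "size", "color")
    return {
        field: Counter(w[field] for w in words).most_common(1)[0][0]
        for field in fields
    }


# Sentence 3 of the converted document: two words in Gulim, three in Batang
sentence3 = [
    {"font": "Batang", "size": 9, "color": "black"},
    {"font": "Gulim", "size": 9, "color": "black"},
    {"font": "Batang", "size": 9, "color": "black"},
    {"font": "Gulim", "size": 9, "color": "black"},
    {"font": "Batang", "size": 9, "color": "black"},
]
print(majority_attributes(sentence3))
```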

Referring again to FIG. 2, the control unit 240 may, according to the determined sentence similarity, analyze each of the at least one separated sentence of each of the two electronic documents to separate at least one word, and may generate word attribute data for the at least one separated word. For example, the control unit 240 may perform the operation of generating word attribute data if the sentence similarity is greater than or equal to a preset threshold sentence similarity, but is not limited thereto; it may also perform the operation if the sentence comparison index data of the first electronic document 300 and the second electronic document 310 of FIG. 3 match.

The control unit 240 may generate word comparison index data based on the word attribute data of each electronic document, and may use the generated word comparison index data to determine the word similarity between the two electronic documents.

A method for determining the word similarity will be described in detail with reference to FIGS. 6, 7A, and 7B.

FIG. 6 is an exemplary diagram showing word attribute data for two electronic documents according to an embodiment of the present invention, and FIGS. 7A and 7B are exemplary diagrams showing word comparison index data for two electronic documents according to an embodiment of the present invention.

As described above with reference to FIG. 3, it is assumed that the first electronic document 300 is divided into 13 sentences and the second electronic document 310 is divided into 13 sentences.

The control unit 240 may generate first word attribute data for the 13 sentences of the first electronic document 300 as shown in (a) of FIG. 6, and second word attribute data for the 13 sentences of the second electronic document 310 as shown in (b) of FIG. 6.

Referring to (a) of FIG. 6, the first word attribute data may include a sentence number (e.g., 1, 2, …, 13), the words for each sentence number (e.g., 특허란 ('what is a patent'), 특허제도의 ('of the patent system'), 목적 ('purpose'), …), the font of each word (e.g., Malgun Gothic, Gulim, Dotum, …), the font size of each word (e.g., 12, 10, 9, …), and the font color (e.g., light blue, blue, black, …).

Referring to (b) of FIG. 6, the second word attribute data may include a sentence number (e.g., 1, 2, …, 13), the words for each sentence number (e.g., 특허란 ('what is a patent'), 특허제도의 ('of the patent system'), 목적 ('purpose'), …), the font of each word (e.g., Malgun Gothic, Gulim, Batang, …), the font size of each word (e.g., 12, 10, 9, …), and the font color (e.g., light blue, blue, black, …).

Referring to FIGS. 7A and 7B, the control unit 240 may generate first word comparison index data based on the first word attribute data as shown in FIG. 7A, and second word comparison index data based on the second word attribute data as shown in FIG. 7B.

Specifically, the control unit 240 may generate the first word comparison index data (e.g., font shape, frequency per font shape, font size ratio, frequency per font size ratio, font color ratio, and frequency per font color ratio) based on the word attribute data (e.g., font, size, color) of each word of the first electronic document 300, and may generate the second word comparison index data based on the attribute data of each word of the second electronic document 310.

For example, for the first electronic document 300, the control unit 240 may start from the word attribute data of the word '특허란' ("a patent is") in sentence 1 (e.g., sentence 1, Malgun Gothic, 12, light blue), the word '특허제도의' ("of the patent system") in sentence 2 (e.g., sentence 2, Gulim, 10, blue), the word '목적' ("purpose") in sentence 2 (e.g., sentence 2, Gulim, 10, blue), the word '특허제도는' ("the patent system") in sentence 3 (e.g., sentence 3, Dotum, 9, black), the word '발명을' ("an invention") in sentence 3 (e.g., sentence 3, Dotum, 9, black), and so on. Based on this data, as shown in (a) of FIG. 7A, the control unit 240 may assign an arbitrary letter 'A' representing a first font type to the Malgun Gothic font, an arbitrary letter 'B' representing a second font type to the Gulim font, and an arbitrary letter 'C' representing a third font type to the Dotum font, and may calculate the font type frequency, which is the number of words having each font type. In addition, based on the first word attribute data, the control unit 240 may calculate the ratio of the font size of each word to the overall document size and the frequency per font size ratio, which is the number of words having each font size ratio, as shown in (b) of FIG. 7A. It may also calculate the ratio of the font color of each word to the full RGB color and the frequency per font color ratio, which is the number of words having each font color ratio, as shown in (c) of FIG. 7A.

In this way, the control unit 240 may generate the first word comparison index data corresponding to the first electronic document 300, as shown in (a), (b), and (c) of FIG. 7A.

Next, for the second electronic document 310, the control unit 240 may start from the word attribute data of the word '특허란' ("a patent is") in sentence 1 (e.g., sentence 1, Malgun Gothic, 12, light blue), the word '특허제도의' ("of the patent system") in sentence 2 (e.g., sentence 2, Gulim, 10, blue), the word '목적' ("purpose") in sentence 2 (e.g., sentence 2, Gulim, 10, blue), the word '특허제도는' ("the patent system") in sentence 3 (e.g., sentence 3, Dotum, 9, black), the word '발명을' ("an invention") in sentence 3 (e.g., sentence 3, Batang, 9, black), and so on. Based on this data, as shown in (a) of FIG. 7B, the control unit 240 may assign an arbitrary letter 'A' representing a first font type to the Malgun Gothic font, an arbitrary letter 'B' representing a second font type to the Gulim font, and an arbitrary letter 'C' representing a third font type to the Batang font, and may calculate the font type frequency, which is the number of words having each font type.

In various embodiments, the words of sentence 3 in the second electronic document 310 differ in font from the words of sentences 4 to 13 (e.g., Batang versus Dotum), and the font of the words of sentence 3 in the first electronic document 300 (e.g., Dotum) differs from the font of the words of sentence 3 in the second electronic document 310 (e.g., Batang). However, because the words of sentence 3 in the first electronic document 300 and the words of sentence 3 in the second electronic document 310 correspond to the same sentence number, the control unit 240 may assign to the font type of the words of sentence 3 in the first electronic document 300 the same arbitrary letter 'C' that is assigned to the font type of the words of sentence 3 in the second electronic document 310.
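The font type labeling described above can be illustrated with a minimal sketch. It is not the patented implementation: the function name `font_type_index`, the tuple layout, and the rule "assign letters in order of first appearance within each document" are assumptions chosen so that positionally equivalent fonts (Dotum in one document, Batang in the other) receive the same letter. The sample data assumes that both words of sentence 3 in the second document use Batang.

```python
from collections import Counter
from string import ascii_uppercase


def font_type_index(words):
    """Label fonts 'A', 'B', 'C', ... in order of first appearance and count words per label.

    `words` is a list of (sentence_no, word, font) tuples (hypothetical layout).
    """
    labels = {}      # font name -> arbitrary letter
    labelled = []    # (sentence_no, word, letter) per word
    for sentence_no, word, font in words:
        if font not in labels:
            labels[font] = ascii_uppercase[len(labels)]  # supports up to 26 fonts
        labelled.append((sentence_no, word, labels[font]))
    frequency = Counter(label for _, _, label in labelled)
    return labelled, dict(frequency)


doc1 = [(1, "특허란", "Malgun Gothic"), (2, "특허제도의", "Gulim"), (2, "목적", "Gulim"),
        (3, "특허제도는", "Dotum"), (3, "발명을", "Dotum")]
doc2 = [(1, "특허란", "Malgun Gothic"), (2, "특허제도의", "Gulim"), (2, "목적", "Gulim"),
        (3, "특허제도는", "Batang"), (3, "발명을", "Batang")]

labelled1, freq1 = font_type_index(doc1)
labelled2, freq2 = font_type_index(doc2)
print(freq1)                   # {'A': 1, 'B': 2, 'C': 2}
print(labelled1 == labelled2)  # True: the font names differ, the font type labels do not
```

Labeling by order of appearance makes the index independent of the concrete font names, which may legitimately change when a document is saved in another format.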

In addition, based on the second word attribute data, the control unit 240 may calculate the ratio of the font size of each word to the overall document size and the frequency per font size ratio, which is the number of words having each font size ratio, as shown in (b) of FIG. 7B. It may also calculate the ratio of the font color of each word to the full RGB color and the frequency per font color ratio, which is the number of words having each font color ratio, as shown in (c) of FIG. 7B.

In this way, the control unit 240 may generate the second word comparison index data corresponding to the second electronic document 310, as shown in (a), (b), and (c) of FIG. 7B.
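The font size ratio and its frequency can be sketched as follows. The text does not define the "overall document size" precisely, so this example assumes it is the largest font size used in the document. The function name `font_size_ratio_index` and the rounding to two decimals are likewise illustrative assumptions.

```python
from collections import Counter


def font_size_ratio_index(sizes):
    """Return {font_size_ratio: word_count} for a list of per-word font sizes.

    Assumption: the reference ("overall document size") is the largest
    font size in the document; the source does not specify it.
    """
    reference = max(sizes)
    ratios = [round(size / reference, 2) for size in sizes]
    return dict(Counter(ratios))


original = [12, 10, 10, 9, 9]   # point sizes of five words
scaled = [24, 20, 20, 18, 18]   # same layout rendered at twice the size

print(font_size_ratio_index(original))  # {1.0: 1, 0.83: 2, 0.75: 2}
print(font_size_ratio_index(original) == font_size_ratio_index(scaled))  # True
```

Working with ratios rather than absolute sizes means that two formats using different absolute units can still yield identical index data.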

The control unit 240 may compare the first word comparison index data and the second word comparison index data generated in this way to count the matching words, and may calculate the word similarity either as that count or as the ratio of the number of matching words to the total number of words.
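A minimal sketch of this calculation is given below. It assumes that each document's index data is a list with one entry per word, aligned by position, and that two words "match" when their entries are equal. Both the layout and the name `word_similarity` are hypothetical.

```python
def word_similarity(index1, index2):
    """Return (matching_word_count, ratio) for two per-word index lists.

    Entries are compared position by position; unpaired trailing words
    count as non-matching because the longer list sets the total.
    """
    total = max(len(index1), len(index2))
    matches = sum(1 for a, b in zip(index1, index2) if a == b)
    ratio = matches / total if total else 1.0
    return matches, ratio


# Each entry: (sentence_no, word, font_type, font_size_ratio, font_color)
first = [(1, "특허란", "A", 1.0, "light blue"), (2, "특허제도의", "B", 0.83, "blue"),
         (2, "목적", "B", 0.83, "blue"), (3, "특허제도는", "C", 0.75, "black"),
         (3, "발명을", "C", 0.75, "black")]
second = list(first)
second[4] = (3, "발명을", "C", 0.75, "blue")  # one word differs in color

print(word_similarity(first, second))  # (4, 0.8)
```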

Referring again to FIG. 2, the control unit 240 may determine whether the two electronic documents match according to the determined word similarity and provide result data indicating the outcome. For example, the control unit 240 may perform the operation for determining whether the electronic documents match when the word similarity is greater than or equal to a preset threshold word similarity. However, the present invention is not limited thereto, and the control unit 240 may also perform this operation when the word comparison index data of the first electronic document 300 and the second electronic document 310 of FIG. 3 match.

In various embodiments, when an electronic document contains a non-text object, the control unit 240 may perform the same operation as that described above with reference to FIG. 1 for the electronic device 100 when a non-text object is included.

As described above, the present invention can minimize the inconvenience of converting electronic documents of different formats into the same format in order to determine whether they match. When a user has made a copy of a particular electronic document in a different format, the user can easily check whether the original and the copy match.

The provision of result data indicating whether the electronic documents match is described in detail below with reference to FIG. 8.

FIG. 8 is an exemplary diagram illustrating decision index data used to determine whether two electronic documents match, according to an embodiment of the present invention.

Referring to FIG. 8, the decision index data may include a sentence number, the words of each sentence number, the font type of each word, the font size ratio of each word, and the font color ratio of each word.

Specifically, the control unit 240 may determine whether the first electronic document 300 and the second electronic document 310 match by using first decision index data for the first electronic document 300 of FIG. 3, as shown in (a) of FIG. 8, and second decision index data for the second electronic document 310, as shown in (b) of FIG. 8.

For example, the control unit 240 may compare the words of each sentence number and, where the words match, compare the font type, font size ratio, and font color ratio of each word to determine whether they match.

If at least one word does not match, the control unit 240 may provide result data indicating the non-matching word. In the presented embodiment, providing the result data may mean displaying the result data through the display unit 220.

If the words match but at least one of the font type, font size ratio, and font color ratio does not match, the control unit 240 may provide result data indicating this.
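The two-stage comparison described above (words first, then attributes) might be sketched as follows. The dictionary layout, the result tuples, and the name `compare_documents` are assumptions for illustration only; the source does not specify data structures.

```python
def compare_documents(doc1, doc2):
    """Compare decision index data sentence by sentence.

    Each doc maps sentence_no -> list of
    (word, font_type, font_size_ratio, font_color_ratio) (hypothetical layout).
    Returns a list of mismatch records; an empty list means no difference was found.
    """
    attribute_names = ("font_type", "font_size_ratio", "font_color_ratio")
    results = []
    for sentence_no in sorted(set(doc1) | set(doc2)):
        words1 = doc1.get(sentence_no, [])
        words2 = doc2.get(sentence_no, [])
        for position in range(max(len(words1), len(words2))):
            w1 = words1[position] if position < len(words1) else None
            w2 = words2[position] if position < len(words2) else None
            if w1 is None or w2 is None or w1[0] != w2[0]:
                # Stage 1: the words themselves differ (or one side is missing).
                results.append((sentence_no, "word_mismatch",
                                w1[0] if w1 else None, w2[0] if w2 else None))
                continue
            # Stage 2: the words match, so compare their attributes.
            differing = [name for name, a, b in zip(attribute_names, w1[1:], w2[1:])
                         if a != b]
            if differing:
                results.append((sentence_no, "attribute_mismatch", w1[0], differing))
    return results


first = {1: [("특허란", "A", 1.0, (0.5, 0.8, 1.0))],
         2: [("목적", "B", 0.83, (0.0, 0.0, 1.0))]}
second = {1: [("특허요건", "A", 1.0, (0.5, 0.8, 1.0))],
          2: [("목적", "B", 0.83, (0.0, 0.0, 0.0))]}

for record in compare_documents(first, second):
    print(record)
# (1, 'word_mismatch', '특허란', '특허요건')
# (2, 'attribute_mismatch', '목적', ['font_color_ratio'])
```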

In various embodiments, when comparing the words of each sentence number reveals at least one non-matching word, the control unit 240 may compare the words of other sentence numbers to determine whether a match exists. For example, if the word “특허란” ("a patent is") in sentence 1 of the first electronic document and the word “특허요건” ("patent requirements") in sentence 1 of the second electronic document do not match, the control unit 240 may determine whether the second electronic document contains a sentence with a word matching “특허란”, the word of sentence 1 of the first electronic document, or whether the first electronic document contains a sentence with a word matching “특허요건”, the word of sentence 1 of the second electronic document.

If sentence 5 of the first electronic document contains a word matching “특허요건” ("patent requirements"), the control unit 240 may provide result data indicating this. In this case, the result data may be data notifying the user that “특허요건”, which corresponds to sentence 1 of the second electronic document, is included in sentence 5 of the first electronic document.

In various embodiments, if sentence 10 of the second electronic document contains a word matching “특허란” ("a patent is"), the control unit 240 may also provide result data indicating this. In this case, the result data may be data notifying the user that “특허란”, which corresponds to sentence 1 of the first electronic document, is included in sentence 10 of the second electronic document.
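The cross-sentence search for a non-matching word can be sketched as below. The helper names and the `sentence_no -> list of words` layout are hypothetical; the example reproduces the sentence 5 and sentence 10 scenario from the text.

```python
def find_sentences_containing(word, document):
    """Return the sorted sentence numbers of `document` that contain `word`.

    `document` maps sentence_no -> list of words (hypothetical layout).
    """
    return sorted(n for n, words in document.items() if word in words)


def locate_unmatched(word1, word2, doc1, doc2):
    """For a non-matching pair, report where each word occurs in the other document."""
    return {
        "doc1_word_found_in_doc2": find_sentences_containing(word1, doc2),
        "doc2_word_found_in_doc1": find_sentences_containing(word2, doc1),
    }


doc1 = {1: ["특허란"], 5: ["특허요건"]}
doc2 = {1: ["특허요건"], 10: ["특허란"]}

print(locate_unmatched("특허란", "특허요건", doc1, doc2))
# {'doc1_word_found_in_doc2': [10], 'doc2_word_found_in_doc1': [5]}
```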

Hereinafter, a method by which the electronic device 100 provides data indicating whether electronic documents of different formats match is described with reference to FIG. 9.

FIG. 9 is a schematic flowchart illustrating a method of providing data indicating whether electronic documents of different formats match in an electronic device according to an embodiment of the present invention. The operations described below may be performed by the control unit 240 of the electronic device 200.

Referring to FIG. 9, the electronic device 100 analyzes each of the electronic documents of different formats to generate sentence attribute data (S900), and determines the sentence similarity of the electronic documents based on the generated sentence attribute data (S910).

Specifically, the electronic device 100 may analyze each electronic document to separate text from non-text objects, and may analyze the text to identify at least one sentence. The electronic device 100 may generate sentence attribute data for the at least one sentence of each electronic document and, based on the generated sentence attribute data, generate sentence comparison index data for comparing the sentences of the electronic documents. Here, the sentence attribute data may include, but is not limited to, the font, font size, font color, and/or effect of a sentence. The sentence comparison index data may include a sentence number and, for each sentence number, a font type, a font size ratio, and a font color ratio.

The electronic device 100 may determine the sentence similarity of the electronic documents by using the generated sentence comparison index data. For example, the electronic device 100 may count the sentences whose sentence comparison index data match across the electronic documents and determine the sentence similarity based on that count.
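As a rough illustration, the sentence similarity could be computed as follows. The sketch assumes the sentence comparison index data is a mapping from sentence number to a (font type, font size ratio, font color ratio) tuple and reports both the raw count and the ratio to the total number of sentences. The name `sentence_similarity` and the layout are assumptions.

```python
def sentence_similarity(index1, index2):
    """Return (matching_sentence_count, ratio) for two sentence comparison indexes.

    Each index maps sentence_no -> (font_type, font_size_ratio, font_color_ratio).
    A sentence matches when it exists in both indexes with equal values.
    """
    sentence_numbers = set(index1) | set(index2)
    matches = sum(
        1 for n in sentence_numbers
        if n in index1 and n in index2 and index1[n] == index2[n]
    )
    ratio = matches / len(sentence_numbers) if sentence_numbers else 1.0
    return matches, ratio


first = {1: ("A", 1.0, "light blue"), 2: ("B", 0.83, "blue"), 3: ("C", 0.75, "black")}
second = {1: ("A", 1.0, "light blue"), 2: ("B", 0.83, "blue"), 3: ("C", 0.75, "blue")}

matches, ratio = sentence_similarity(first, second)
print(matches)  # 2
```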

The electronic device 100 generates word attribute data for the electronic documents according to the determined sentence similarity (S920), and determines the word similarity of the electronic documents based on the generated word attribute data (S930).

Specifically, the electronic device 100 may analyze the at least one identified sentence of each electronic document to split each sentence into at least one word, and may generate word attribute data for the at least one word. Here, the word attribute data may include, but is not limited to, the sentence number, font, font size, font color, and/or effect of a word.
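One way to picture the word attribute data is as one record per word. The sketch below assumes a hypothetical upstream parser that has already produced, for each sentence, a list of formatted text runs; the `WordAttribute` class, the run layout, and whitespace-based word splitting are illustrative choices, not details taken from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordAttribute:
    sentence_no: int
    word: str
    font: str
    size: float
    color: str


def build_word_attributes(sentences):
    """Split each sentence's runs into words and attach the run's attributes.

    `sentences` is a list (one item per sentence) of lists of
    (text, font, size, color) runs; this is assumed parser output.
    """
    records = []
    for sentence_no, runs in enumerate(sentences, start=1):
        for text, font, size, color in runs:
            for word in text.split():  # whitespace-delimited words
                records.append(WordAttribute(sentence_no, word, font, size, color))
    return records


sentences = [
    [("특허란", "Malgun Gothic", 12, "light blue")],
    [("특허제도의 목적", "Gulim", 10, "blue")],
]
records = build_word_attributes(sentences)
print(len(records))   # 3
print(records[2])
```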

The electronic device 100 may generate word comparison index data for comparing the words of the electronic documents based on the generated word attribute data, and may determine the word similarity of the electronic documents by using the generated word comparison index data. Here, the word comparison index data may include, but is not limited to, a font type, a frequency per font type, a font size ratio, a frequency per font size ratio, a font color ratio, and a frequency per font color ratio. For example, the electronic device 100 may count the words whose word comparison index data match across the electronic documents and determine the word similarity based on that count.

The electronic device 100 determines whether the electronic documents match according to the determined word similarity (S940), and provides result data indicating whether the electronic documents match (S950).

Specifically, the electronic device 100 may compare the words of each sentence number across the electronic documents to determine whether the words match and, if the words of each sentence number match, may compare at least one of the font type, font size ratio, and font color ratio of each word to determine whether they match.

If at least one of the font type, font size ratio, and font color ratio of each word matches, the electronic device 100 may determine that the electronic documents of different formats match each other and provide result data indicating this.

If the words of a sentence number do not match, the electronic device 100 may provide result data indicating the non-matching words and the attribute data of those words.

If at least one of the font type, font size ratio, and font color ratio of each word does not match, the electronic device 100 may provide result data indicating this.

As described above, the present invention makes it possible to check quickly and easily whether two or more electronic documents match, regardless of document format.

The apparatus and method according to embodiments of the present invention may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination.

The program instructions recorded on the computer-readable medium may be specially designed and configured for the present invention, or may be known and available to those skilled in the field of computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.

The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the present invention, and vice versa.

Although embodiments of the present invention have been described in detail above with reference to the accompanying drawings, the present invention is not necessarily limited to these embodiments and may be modified in various ways without departing from its technical spirit. Accordingly, the embodiments disclosed herein are intended to describe, not to limit, the technical spirit of the present invention, and the scope of that technical spirit is not limited by these embodiments. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. The scope of protection of the present invention should be construed according to the following claims, and all technical ideas within a scope equivalent thereto should be construed as falling within the scope of rights of the present invention.

100, 200: electronic device
210: communication unit
220: display unit
230: storage unit
240: control unit

Claims

A method of providing data indicating whether electronic documents of different formats match, the method being performed by a control unit of a data providing device and comprising:
generating sentence attribute data by analyzing each of the electronic documents of different formats;
determining a sentence similarity of the electronic documents based on the generated sentence attribute data;
generating word attribute data for the electronic documents according to the determined sentence similarity;
determining a word similarity of the electronic documents based on the generated word attribute data;
determining whether the electronic documents match each other according to the determined word similarity; and
providing result data indicating whether the electronic documents match each other.

The method of claim 1,
wherein the sentence attribute data includes at least one of a font, a size, a color, and an effect for each of at least one sentence, and
wherein the word attribute data includes at least one of a sentence number, a font, a size, a color, and an effect for each of at least one word.

The method of claim 1, wherein the generating of the sentence attribute data comprises:
separating text from non-text objects in each of the electronic documents;
identifying at least one sentence by analyzing the text; and
generating sentence attribute data for the at least one identified sentence.

The method of claim 3, wherein the non-text object
includes at least one of an image, a table, a graph, a figure, and an annotation.

The method of claim 1, wherein the determining of the sentence similarity comprises:
generating sentence comparison index data used to compare sentences of the electronic documents based on the sentence attribute data; and
determining the sentence similarity by using the generated sentence comparison index data.

The method of claim 5, wherein the sentence comparison index data
includes at least one of a font type, a font size, a font color, and an effect.

The method of claim 5, wherein the sentence similarity
is a value quantifying either the number of sentences whose sentence comparison index data match between the electronic documents or the ratio of that number to the total number of sentences.

The method of claim 3, wherein the generating of the word attribute data comprises:
dividing each of the at least one identified sentence into at least one word for each of the electronic documents; and
generating word attribute data for the at least one word.

The method of claim 1, wherein the determining of the word similarity comprises:
generating word comparison index data used to compare words of the electronic documents by using the word attribute data; and
determining the word similarity by using the generated word comparison index data.

10. The method of claim 9, wherein the word comparison index data
includes at least one of a font type, a frequency for each font type, a font size ratio, a frequency for each font size, a font color ratio, and a frequency for each font color.

The method of claim 10, wherein the word similarity
is a value quantifying either the number of words whose word comparison index data match between the electronic documents or the ratio of that number to the total number of words.

A data providing device for indicating whether electronic documents of different formats match, the device comprising:
a storage unit configured to store electronic documents of different formats; and
a control unit configured to be connected to the storage unit,
wherein the control unit is configured to:
generate sentence attribute data by analyzing each of the electronic documents of different formats;
determine a sentence similarity of the electronic documents based on the generated sentence attribute data;
generate word attribute data for the electronic documents according to the determined sentence similarity;
determine a word similarity of the electronic documents based on the generated word attribute data;
determine whether the electronic documents match each other according to the determined word similarity; and
provide result data indicating whether the electronic documents match each other.