KR20220031097A

KR20220031097A - Document processing method, apparatus, device, and computer-readable storage medium

Info

Publication number: KR20220031097A
Application number: KR1020227004409A
Authority: KR
Inventors: Mingjie Zhan; Yan Xu; Ding Liang; Xuebo Liu
Original assignee: Beijing SenseTime Technology Development Co., Ltd.
Priority date: 2020-06-29
Filing date: 2021-06-11
Publication date: 2022-03-11
Also published as: JP2022543052A; WO2022001637A1; CN111782808A

Abstract

The present invention provides a document processing method, apparatus, device, and computer-readable storage medium. The method includes: obtaining a semantic feature and a visual feature of a document to be processed; determining a common feature of the document to be processed based on the semantic feature and the visual feature; and determining a category of the document to be processed based on its common feature.

Description

Document processing method, apparatus, device, and computer-readable storage medium

[Cross-Reference to Related Applications]

This application claims priority to Chinese Patent Application No. 202010610080.8, filed on June 29, 2020 and entitled "Document Processing Method, Apparatus, Device, and Computer-Readable Storage Medium", the entire contents of which are incorporated herein by reference.

[Technical Field]

The present invention relates to computer vision technology, and in particular to a document processing method, apparatus, device, and computer-readable storage medium.

At present, documents are generally recognized using OCR (Optical Character Recognition) technology. Recognition with this technology requires accurately determining the category of a document and applying the corresponding template, but the document classification results produced by related technologies are not accurate.

Therefore, how to classify documents accurately is an urgent problem to be solved.

Embodiments of the present invention provide a document classification scheme.

According to one aspect of the present invention, a document processing method is provided, the method including: obtaining a semantic feature and a visual feature of a document to be processed; determining a common feature of the document to be processed based on the semantic feature and the visual feature; and determining a category of the document to be processed based on its common feature.
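The three steps can be pictured as a simple pipeline. The following Python sketch is only illustrative: `process_document` and its four callable parameters are hypothetical names, and in the described method the feature extractors and the classifier are parts of a neural network rather than plain functions.

```python
from typing import Callable

Vector = list[float]


def process_document(
    doc: object,
    visual_fn: Callable[[object], Vector],
    semantic_fn: Callable[[object], Vector],
    fuse_fn: Callable[[Vector, Vector], Vector],
    classify_fn: Callable[[Vector], str],
) -> str:
    """Hypothetical skeleton of steps S101 to S103."""
    # S101: obtain both features; the order of these two calls does not matter.
    visual = visual_fn(doc)
    semantic = semantic_fn(doc)
    # S102: fuse them into the common feature.
    common = fuse_fn(visual, semantic)
    # S103: determine the category from the common feature.
    return classify_fn(common)


# Toy usage with stand-in functions.
category = process_document(
    "IDENTITY CARD Name: ZHANG SAN",
    visual_fn=lambda d: [1.0, 0.0],
    semantic_fn=lambda d: [0.0, 1.0] if "IDENTITY" in d else [1.0, 0.0],
    fuse_fn=lambda v, s: [(a + b) / 2 for a, b in zip(v, s)],
    classify_fn=lambda f: "id_card" if f[1] > 0.25 else "other",
)
print(category)  # id_card
```

Passing the stages in as callables keeps the skeleton independent of how each stage is actually implemented.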

In combination with any embodiment provided by the present invention, obtaining the semantic feature of the document to be processed includes: obtaining a text recognition result of the document to be processed; and obtaining the semantic feature of the document to be processed based on the text recognition result.

In combination with any embodiment provided by the present invention, obtaining the text recognition result of the document to be processed includes: determining target text boxes in the document to be processed and the text content contained in each target text box; obtaining a word segmentation result for the text content in each target text box; and obtaining a feature vector corresponding to the word segmentation result.
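A minimal sketch of this step, assuming the OCR stage already returns the target text boxes as `(bbox, text)` pairs. Whitespace splitting stands in for a real word segmenter (Chinese text, for example, needs a dedicated segmenter), and a bag-of-words count vector over a hypothetical vocabulary stands in for the feature vector; the source specifies neither choice. `segment`, `text_boxes_to_vectors`, and `vocab` are illustrative names.

```python
from collections import Counter


def segment(text: str) -> list[str]:
    # Stand-in word segmenter: lowercase and split on whitespace.
    return text.lower().split()


def text_boxes_to_vectors(boxes, vocab):
    """boxes: list of (bbox, text) pairs; vocab: word -> index (hypothetical).

    Returns one bag-of-words count vector per target text box.
    Words missing from the vocabulary are ignored.
    """
    vectors = []
    for _bbox, text in boxes:
        counts = Counter(segment(text))
        vec = [0] * len(vocab)
        for word, n in counts.items():
            idx = vocab.get(word)
            if idx is not None:
                vec[idx] += n
        vectors.append(vec)
    return vectors


vocab = {"name": 0, "bank": 1, "card": 2, "total": 3}
boxes = [((10, 10, 200, 40), "BANK CARD"), ((10, 50, 200, 80), "Name Name Zhang")]
vectors = text_boxes_to_vectors(boxes, vocab)
print(vectors)  # [[0, 1, 1, 0], [2, 0, 0, 0]]
```

In a neural pipeline the count vector would typically be replaced by learned word embeddings; the per-box structure stays the same.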

In combination with any embodiment provided by the present invention, determining the common feature of the document to be processed based on the visual feature and the semantic feature includes: performing regularization processing on the visual feature and the semantic feature respectively; and performing a weighted summation of the regularized visual feature and the regularized semantic feature to obtain the common feature of the document to be processed.
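The following sketch assumes that the regularization here is L2 normalization of each feature vector, and uses a fixed, hypothetical weight `w_visual = 0.5`. The source specifies neither the exact regularization nor the weights (which could also be learned).

```python
import math


def l2_normalize(v):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def fuse(visual, semantic, w_visual=0.5):
    """Common feature = w * norm(visual) + (1 - w) * norm(semantic)."""
    if len(visual) != len(semantic):
        raise ValueError("visual and semantic features must have the same dimension")
    v = l2_normalize(visual)
    s = l2_normalize(semantic)
    return [w_visual * a + (1.0 - w_visual) * b for a, b in zip(v, s)]


common = fuse([3.0, 4.0], [0.0, 2.0])
print(common)  # approximately [0.3, 0.9]
```

Normalizing first keeps one feature from dominating the sum merely because its raw values are larger.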

In combination with any embodiment provided by the present invention, the document processing method is performed using a neural network. The neural network includes a feature extraction subnetwork for extracting the common feature of the document to be processed, and a first classification subnetwork for determining the category of the document to be processed based on the common feature. Specifically, the first classification subnetwork compares the common feature of the document to be processed with the standard feature of documents of each of at least one preset category to determine the similarity between the common feature and each standard feature, and determines the category of the document to be processed based on the at least one similarity obtained.
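The comparison with standard features can be sketched with cosine similarity. This is an assumption: the source does not name the similarity measure, and `similarities` and the example category names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def similarities(common, standard_features):
    """standard_features: category -> standard feature vector."""
    return {cat: cosine_similarity(common, feat) for cat, feat in standard_features.items()}


standard = {"bank_card": [1.0, 0.0], "id_card": [0.0, 1.0], "receipt": [1.0, 1.0]}
sims = similarities([0.0, 2.0], standard)
print(sims)  # id_card: 1.0, bank_card: 0.0, receipt: about 0.707
```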

In combination with any embodiment provided by the present invention, determining the category of the document to be processed based on the at least one similarity obtained includes: obtaining the highest similarity among the at least one similarity; and in response to the highest similarity being greater than or equal to a preset similarity threshold, determining the category of documents to which the standard feature corresponding to the highest similarity belongs as the category of the document to be processed.
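A sketch of the highest-similarity-plus-threshold rule, assuming the similarities have already been computed per category. `decide_category` and the threshold value 0.8 are hypothetical; returning `None` marks the below-threshold case, which is handled separately.

```python
def decide_category(sims, threshold):
    """Return the category with the highest similarity if it reaches the
    threshold, otherwise None (no existing category matches)."""
    if not sims:
        return None
    best = max(sims, key=sims.get)
    return best if sims[best] >= threshold else None


sims = {"bank_card": 0.42, "id_card": 0.91, "receipt": 0.30}
print(decide_category(sims, threshold=0.8))   # id_card
print(decide_category(sims, threshold=0.95))  # None
```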

In combination with any embodiment provided by the present invention, the method further includes training the feature extraction subnetwork of the neural network, specifically: inputting a sample document labeled with a category into the feature extraction subnetwork to obtain a common feature of the sample document; inputting the common feature into a second classification subnetwork to obtain a predicted category of the sample document; and adjusting network parameters of the feature extraction subnetwork based on the difference between the predicted category and the labeled category of the sample document.
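To make the training signal concrete, the sketch below replaces both subnetworks with tiny linear layers: `W` plays the role of the feature extraction subnetwork and `C` that of the second classification subnetwork. Cross-entropy loss, plain SGD, and updating `C` jointly with `W` are assumptions (the source only speaks of the "difference" between the predicted and labeled categories). The gradients are written out by hand so the code needs only the standard library.

```python
import math


def matvec(m, v):
    """Matrix (list of rows) times vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(z):
    zmax = max(z)  # subtract the max for numerical stability
    e = [math.exp(x - zmax) for x in z]
    total = sum(e)
    return [x / total for x in e]


def train_step(W, C, x, label, lr=0.1):
    """One SGD step on a toy two-layer linear model; updates W and C in place.

    W (d_feat x d_in) stands in for the feature extraction subnetwork,
    C (n_classes x d_feat) for the second classification subnetwork.
    Returns the cross-entropy loss computed before the update.
    """
    f = matvec(W, x)               # common feature of the sample
    p = softmax(matvec(C, f))      # predicted category probabilities
    loss = -math.log(p[label])
    # d(loss)/d(logits) = p - one_hot(label): the prediction/label difference
    g_logits = [pk - (1.0 if k == label else 0.0) for k, pk in enumerate(p)]
    # Backpropagate to the feature before C is modified: g_f = C^T g_logits
    g_f = [sum(C[k][j] * g_logits[k] for k in range(len(C))) for j in range(len(f))]
    for k in range(len(C)):
        for j in range(len(f)):
            C[k][j] -= lr * g_logits[k] * f[j]
    for j in range(len(W)):
        for i in range(len(x)):
            W[j][i] -= lr * g_f[j] * x[i]
    return loss


# Two labeled toy "documents", already encoded as input vectors.
W = [[0.1, 0.0, 0.0], [0.0, 0.1, 0.0]]
C = [[0.1, -0.1], [-0.1, 0.1]]
samples = [([1.0, 0.0, 1.0], 0), ([0.0, 1.0, 1.0], 1)]
losses = []
for _ in range(50):
    losses.append(sum(train_step(W, C, x, y) for x, y in samples))
print(losses[0], losses[-1])  # the total loss decreases over training
```

After training, only the feature extractor is needed at inference time, since classification there is done by comparing common features with standard features.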

In combination with any embodiment provided by the present invention, the standard feature of documents of the at least one category is obtained by performing feature extraction on documents of the at least one category using the trained feature extraction subnetwork.

In combination with any embodiment provided by the present invention, the method further includes: in response to the highest similarity being less than the preset similarity threshold, adding the document to be processed as a standard template, and determining the common feature of the document to be processed as the standard feature of the category corresponding to the newly added standard template.
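The fallback of registering an unmatched document as a new standard template can be sketched as a small in-memory library. `TemplateLibrary`, the cosine measure, the 0.9 threshold, and the auto-generated `new_category_N` names are all hypothetical placeholders.

```python
import math


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


class TemplateLibrary:
    """Hypothetical in-memory store: category name -> standard feature."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.standard = {}
        self._next_id = 0

    def add_template(self, feature, category=None):
        """Register a standard feature; auto-name the category if none is given."""
        if category is None:
            category = f"new_category_{self._next_id}"
            self._next_id += 1
        self.standard[category] = list(feature)
        return category

    def classify_or_register(self, common):
        """Return (category, is_new_template)."""
        if self.standard:
            best = max(self.standard, key=lambda c: cosine(common, self.standard[c]))
            if cosine(common, self.standard[best]) >= self.threshold:
                return best, False
        # Highest similarity below the threshold (or no templates yet):
        # the document's common feature becomes a new standard feature.
        return self.add_template(common), True


lib = TemplateLibrary(threshold=0.9)
lib.add_template([1.0, 0.0], "bank_card")
r1 = lib.classify_or_register([0.95, 0.05])  # ('bank_card', False)
r2 = lib.classify_or_register([0.0, 1.0])    # ('new_category_0', True)
r3 = lib.classify_or_register([0.1, 0.9])    # ('new_category_0', False)
```

The third call shows the point of the fallback: a later document of the same new type now matches the template that the second call created.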

In combination with any embodiment provided by the present invention, the method further includes: in response to a selection instruction, selecting at least one category from preset document categories as a target category. In this case, comparing the common feature of the document to be processed with the standard features of documents of the at least one preset category to determine the similarities includes: comparing the common feature of the document to be processed with the standard feature of documents of each of the at least one target category to determine the similarity between the common feature and each such standard feature.
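Restricting the comparison to user-selected target categories amounts to filtering the standard features before computing similarities. In this sketch, cosine similarity and the names `classify_among_targets` and `targets` are assumptions.

```python
import math


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def classify_among_targets(common, standard_features, targets):
    """Compare the common feature only with the selected target categories.
    A KeyError is raised if a selected category has no standard feature."""
    candidates = {c: standard_features[c] for c in targets}
    sims = {c: cosine(common, f) for c, f in candidates.items()}
    best = max(sims, key=sims.get)
    return best, sims


standard = {"bank_card": [1.0, 0.0], "id_card": [0.0, 1.0], "receipt": [0.6, 0.8]}
best, sims = classify_among_targets([0.5, 0.9], standard, targets=["bank_card", "receipt"])
print(best, sorted(sims))  # receipt ['bank_card', 'receipt']
```

Besides matching user intent, filtering also reduces the number of comparisons when the template library is large.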

In combination with any embodiment provided by the present invention, the method further includes: obtaining a corresponding preset standard template based on the category of the document to be processed; and performing layout recognition on the document to be processed based on the standard template to obtain a layout recognition result of the document.
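The source does not detail how layout recognition uses the standard template. One simple interpretation, shown here purely as a hypothetical sketch, is that the template defines named field regions and each OCR text box is assigned to the field whose region contains the box center. The sketch assumes the document image is already aligned to the template's coordinate frame; `recognize_layout` and the field names are illustrative.

```python
def center(box):
    x0, y0, x1, y1 = box
    return (x0 + x1) / 2, (y0 + y1) / 2


def recognize_layout(template_fields, text_boxes):
    """template_fields: field name -> (x0, y0, x1, y1) region of the template.
    text_boxes: list of ((x0, y0, x1, y1), text) pairs from OCR, in the
    template's coordinate frame.
    Returns field name -> text of the boxes whose center lies in that region."""
    result = {name: [] for name in template_fields}
    for box, text in text_boxes:
        cx, cy = center(box)
        for name, (x0, y0, x1, y1) in template_fields.items():
            if x0 <= cx <= x1 and y0 <= cy <= y1:
                result[name].append(text)
                break  # assign each box to at most one field
    return {name: " ".join(parts) for name, parts in result.items()}


template = {"name": (0, 0, 300, 50), "card_number": (0, 60, 300, 110)}
boxes = [((10, 10, 120, 40), "ZHANG"), ((130, 10, 200, 40), "SAN"),
         ((10, 70, 280, 100), "6222 0000 1111 2222")]
layout = recognize_layout(template, boxes)
print(layout)  # {'name': 'ZHANG SAN', 'card_number': '6222 0000 1111 2222'}
```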

According to one aspect of the present invention, a document processing apparatus is provided, the apparatus including: an acquisition module configured to obtain a semantic feature and a visual feature of a document to be processed; a common feature module configured to determine a common feature of the document to be processed based on the semantic feature and the visual feature; and a classification module configured to determine a category of the document to be processed based on its common feature.

According to one aspect of the present invention, a document processing device is provided, including a non-volatile storage medium and a processor. The storage medium stores computer instructions executable on the processor, and the processor is configured to perform the method according to any embodiment of the present invention.

According to one aspect of the present invention, a computer-readable storage medium storing a computer program is provided; when the computer program is executed by a processor, the method according to any embodiment of the present invention is implemented.

According to one aspect of the present invention, a computer program is provided; when the computer program is executed by a processor, the method according to any embodiment of the present invention is implemented.

With the document processing method, apparatus, device, computer-readable storage medium, and computer program of one or more embodiments of the present invention, a common feature of a document is determined based on the obtained visual and semantic features of the document, and the category of the document is determined based on the common feature. The document processing method of the present invention can accurately classify any document. Because the common feature combines semantic and visual features, classification accuracy improves for documents of different categories that are visually similar, and the robustness of document classification also improves.

The foregoing general description and the following detailed description are exemplary and explanatory only and are not intended to limit the present invention.

BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present invention and, together with the specification, serve to explain the principles of the present invention.
FIG. 1 is a flowchart of a document processing method according to an embodiment of the present invention.
FIG. 2 exemplarily shows part of the network structure of a neural network for extracting visual features according to an embodiment of the present invention.
FIG. 3 exemplarily shows part of the network structure of a neural network for extracting semantic features according to an embodiment of the present invention.
FIG. 4 is a schematic diagram of the text recognition process for a form according to an embodiment of the present invention.
FIG. 5 is a schematic diagram of a user selection interface according to an embodiment of the present invention.
FIG. 6 is a schematic diagram of a document processing apparatus according to an embodiment of the present invention.
FIG. 7 is a schematic structural diagram of a document processing device according to an embodiment of the present invention.

Exemplary embodiments will be described in detail here, examples of which are illustrated in the drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatuses and methods consistent with some aspects of the present disclosure as detailed in the appended claims.

The terminology used in the present invention is for the purpose of describing particular embodiments only and is not intended to limit the present invention. As used in the present invention and the appended claims, the singular forms "a", "the", and "said" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any or all possible combinations of one or more of the associated listed items.

It should be understood that although the terms first, second, third, and so on may be used in the present invention to describe various information, the information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of the present invention, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".

At present, documents are generally recognized using OCR (Optical Character Recognition) technology. Recognition with this technology requires accurately determining the category of the document and applying the corresponding template, but the document classification results obtained with related technologies are not accurate.

On this basis, at least one embodiment of the present invention provides a document processing method. FIG. 1 shows the flow of the document processing method, which includes steps S101 to S103.

Here, the document may include one or more of a book, a paper document, a form, a ticket, an identification document, a radio frequency (RF) card, and the like. Specific examples include general text, identity cards, bank cards, vehicle licenses, driver's licenses, passports, forms, receipts, business licenses, and handwritten documents. The document processing method can automatically recognize the category of a document: for example, it can recognize a bank card as belonging to the bank card category, an identity card as belonging to the identity card category, or a receipt as belonging to the receipt category. Note that in implementation there may be more than one document to be processed; that is, the user can choose batch processing or single processing of documents according to their own needs. In batch processing, each document is processed in the same way as a single document, so the description of single-document processing applies. For convenience of description, the present invention uses a single document to be processed as an example, but this does not limit the technical solution of the present invention.

In step S101, a semantic feature and a visual feature of the document to be processed are obtained.

This step does not limit the order in which the semantic feature and the visual feature are obtained: the semantic feature may be obtained first and then the visual feature, the visual feature may be obtained first and then the semantic feature, or both may be obtained simultaneously.

In this step, a neural network may be used to extract the visual feature of the document to be processed. Specifically, an initial feature of the document is first extracted with a convolution kernel (for example, a 3*3 convolution kernel). The initial feature then passes through a plurality of (for example, 7) inverted residual blocks in sequence to extract intermediate features, and the intermediate feature output by the last inverted residual block is convolved once more with a convolution kernel (for example, a 1*1 convolution kernel) to output a feature of a specified dimension, which can be used as the visual feature of the document. Each inverted residual block includes: an up-channel module composed of a 1*1 convolution kernel and an activation function (for example, ReLU6), which increases the number of channels of the input feature; an extraction module composed of a depthwise separable convolution layer and an activation function, which extracts the feature of each channel and connects the features of the channels; and a down-channel module composed of a 1*1 convolution kernel, which restores the number of channels of the input feature. Each inverted residual block adds its input to the output of its down-channel module and uses the sum as the block's output. The output of every inverted residual block except the last is used as the input of the next inverted residual block.
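The inverse (inverted) residual block described above, with a 1*1 up-channel convolution plus ReLU6, a 3*3 depthwise convolution plus ReLU6, a 1*1 down-channel convolution without activation, and a residual addition, corresponds to the inverted residual block of MobileNetV2. The pure-Python sketch below implements one such block on a `[channels][height][width]` nested list, using stride 1 and zero padding so that the residual addition is shape-compatible. The weights and function names are illustrative only; a real model would use a deep learning framework.

```python
def relu6(x):
    return min(max(x, 0.0), 6.0)


def conv1x1(fmap, weights, act=None):
    """Pointwise convolution. fmap: [C_in][H][W]; weights: [C_out][C_in]."""
    h, w = len(fmap[0]), len(fmap[0][0])
    out = []
    for w_row in weights:
        ch = [[sum(w_row[c] * fmap[c][i][j] for c in range(len(fmap)))
               for j in range(w)] for i in range(h)]
        if act:
            ch = [[act(v) for v in row] for row in ch]
        out.append(ch)
    return out


def depthwise3x3(fmap, kernels, act=None):
    """Depthwise 3x3 convolution, stride 1, zero padding 1; kernels: [C][3][3]."""
    h, w = len(fmap[0]), len(fmap[0][0])
    out = []
    for ch, k in zip(fmap, kernels):  # one kernel per channel
        res = []
        for i in range(h):
            row = []
            for j in range(w):
                s = 0.0
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        ii, jj = i + di, j + dj
                        if 0 <= ii < h and 0 <= jj < w:
                            s += k[di + 1][dj + 1] * ch[ii][jj]
                row.append(act(s) if act else s)
            res.append(row)
        out.append(res)
    return out


def inverted_residual(x, expand_w, dw_kernels, project_w):
    """Up-channel (1x1 + ReLU6), depthwise 3x3 + ReLU6, down-channel (1x1,
    linear), then add the block input to the result."""
    h1 = conv1x1(x, expand_w, relu6)          # C -> expanded channels
    h2 = depthwise3x3(h1, dw_kernels, relu6)  # per-channel spatial filtering
    h3 = conv1x1(h2, project_w)               # back to C channels
    return [[[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(c1, c2)]
            for c1, c2 in zip(x, h3)]


x = [[[1.0, 2.0], [3.0, 4.0]]]                  # 1 channel, 2x2
expand_w = [[1.0], [2.0]]                       # 1 -> 2 channels
identity = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
project_w = [[0.5, 0.25]]                       # 2 -> 1 channel
out = inverted_residual(x, expand_w, [identity, identity], project_w)
print(out)  # [[[2.0, 4.0], [6.0, 7.5]]]
```

In the toy run, ReLU6 clips the expanded value 8.0 to 6.0, which is why the bottom-right output is 7.5 rather than 8.0.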

In one example, FIG. 2 exemplarily shows part of a network structure for extracting the visual feature of the document to be processed. The partial network structure shown in FIG. 2 includes two inverted residual blocks: a first inverted residual block 201 and a second inverted residual block 202. The first inverted residual block 201 includes a first up-channel module 2011, a first extraction module 2012, and a first down-channel module 2013 connected in sequence. The first up-channel module 2011 may consist of, for example, a 1*1 convolution kernel (Conv1*1) and an activation function (for example, ReLU6); the first extraction module 2012 may consist of, for example, a 3*3 depthwise separable convolution layer (Dwise3*3) and an activation function (for example, ReLU6); and the first down-channel module 2013 may consist of, for example, a 1*1 convolution kernel (Conv1*1). The first input of the first inverted residual block 201 is the initial feature of the document to be processed, which may be obtained, for example, by extraction with a 3*3 convolution kernel. The first output of the first inverted residual block 201 is the sum of the first input and the output of the first down-channel module 2013, and this first output is the second input of the second inverted residual block 202. The second inverted residual block 202 includes a second up-channel module 2021, a second extraction module 2022, and a second down-channel module 2023 connected in sequence. The second up-channel module 2021 may consist of, for example, a 1*1 convolution kernel (Conv1*1) and an activation function (for example, ReLU6); the second extraction module 2022 may consist of, for example, a depthwise separable convolution layer (Dwise3*3) and an activation function (for example, ReLU6); and the second down-channel module 2023 may consist of, for example, a 1*1 convolution kernel (Conv1*1). The second output of the second inverted residual block 202 is the sum of the second input and the output of the second down-channel module 2023.

In this step, the semantic feature of the document to be processed may be obtained as follows. First, a text recognition result of the document to be processed is obtained. Then, the semantic feature of the document to be processed is obtained based on the text recognition result.

Here, the text recognition result may be the result of extracting the text content of the document to be processed and representing it in a specific form. In one example, OCR technology may be used to obtain the text recognition result of the document to be processed.

Here, a neural network may be used to extract the semantic feature from the text recognition result. Specifically, features of different levels are first extracted from the text recognition result; the features of different levels are then concatenated and processed further to finally obtain the semantic feature of the text recognition result.

Referring to FIG. 3, in one example, at least one third extraction module 301 is first used to extract an intermediate feature of the text recognition result, where the third extraction modules 301 may be convolution kernels with different receptive fields. For example, a convolution kernel with a receptive field of 1, a convolution kernel with a receptive field of 3, and a convolution kernel with a receptive field of 5 may be used to extract features of three different levels from the text recognition result (for example, through operations such as convolution and/or pooling), and the three features are then concatenated to obtain the intermediate feature. Next, a fourth extraction module 302 (for example, a 1*1 convolution kernel) performs further feature extraction on the intermediate feature (for example, through operations such as convolution and/or pooling) to obtain the semantic feature of the text recognition result.
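A minimal sketch of this multi-receptive-field extractor, assuming the text recognition result has already been turned into a sequence of token embeddings. Each branch is a zero-padded 1D convolution with an odd kernel width (1, 3, or 5 here); each filter response is max-pooled over the sequence; the pooled values are concatenated into the intermediate feature; and a final linear layer plays the role of the 1*1 convolution. Max-pooling, one filter per branch, and all weights are assumptions made for illustration.

```python
def conv1d_same(seq, kernel):
    """seq: [L][D] token embeddings; kernel: [k][D] one filter with odd k.
    Zero-padded so the output has length L."""
    k = len(kernel)
    half = k // 2
    out = []
    for t in range(len(seq)):
        s = 0.0
        for o in range(k):
            idx = t + o - half
            if 0 <= idx < len(seq):
                s += sum(w * x for w, x in zip(kernel[o], seq[idx]))
        out.append(s)
    return out


def semantic_feature(seq, branches, proj):
    """branches: one list of filters per receptive field (e.g. widths 1, 3, 5).
    proj: [d_out][n_filters] final linear layer (acts like a 1x1 convolution)."""
    intermediate = []
    for filters in branches:
        for kernel in filters:
            intermediate.append(max(conv1d_same(seq, kernel)))  # max over time
    return [sum(w * v for w, v in zip(row, intermediate)) for row in proj]


seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]      # 3 tokens, embedding size 2
branches = [
    [[[1.0, 0.0]]],                               # width 1
    [[[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]],       # width 3
    [[[1.0, 1.0]] * 5],                           # width 5
]
proj = [[1.0, 0.0, 0.0], [0.0, 0.5, 0.25]]
feat = semantic_feature(seq, branches, proj)
print(feat)  # [1.0, 2.0]
```

Wider kernels see more neighboring tokens at once, which is what gives the branches their different "levels" of context.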

The feature extraction process corresponding to FIG. 3 is only one example of extracting semantic features and does not specifically limit how the semantic feature of the text recognition result is extracted. More or fewer convolution kernels, and other combinations of receptive fields, may be used to extract features of different levels.

Here, the semantic feature of the document to be processed can be used to distinguish documents that are visually similar but differ in text content. Such documents are one of the cases that related technologies cannot classify accurately, and this embodiment solves this problem by adding the semantic feature.

In step S102, the common feature of the document to be processed is determined based on the semantic feature and the visual feature.

Here, when the visual feature and the semantic feature are extracted in step S101, a visual feature and a semantic feature of the same dimension may be output, which facilitates fusing the two features. Of course, this embodiment does not limit the dimensional relationship between the visual feature and the semantic feature extracted in step S101.

Alternatively, step S101 may output a visual feature and a semantic feature of different dimensions. In this case, the dimensions of the two features may be compared, and dimensionality reduction may be performed on the higher-dimensional feature so that both features have the same dimension before they are fused. The dimensionality reduction may be, for example, linear or nonlinear.
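A sketch of the linear option: the longer feature is multiplied by a projection matrix that maps it to the shorter feature's dimension. The projection matrix here is hand-written for illustration; in practice it could be, for example, a learned fully connected layer or a PCA basis, and a nonlinear variant would add an activation. `linear_reduce` and `align_dimensions` are hypothetical names.

```python
def linear_reduce(v, projection):
    """Linear dimensionality reduction: projection has shape [d_out][len(v)]."""
    if any(len(row) != len(v) for row in projection):
        raise ValueError("projection rows must match the input dimension")
    return [sum(w * x for w, x in zip(row, v)) for row in projection]


def align_dimensions(visual, semantic, projection):
    """Reduce whichever feature has the higher dimension so that both match."""
    if len(visual) > len(semantic):
        visual = linear_reduce(visual, projection)
    elif len(semantic) > len(visual):
        semantic = linear_reduce(semantic, projection)
    if len(visual) != len(semantic):
        raise ValueError("projection must map to the lower dimension")
    return visual, semantic


# 4-dim visual feature, 2-dim semantic feature; average neighboring pairs.
projection = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
v, s = align_dimensions([1.0, 2.0, 3.0, 4.0], [0.5, 0.5], projection)
print(v, s)  # [1.5, 3.5] [0.5, 0.5]
```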

In one example, normalization is first performed on the visual feature and the semantic feature separately. A weighted sum of the normalized visual feature and the normalized semantic feature is then computed to obtain the common feature of the document to be processed.
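A minimal sketch of this normalize-then-weighted-sum fusion follows. The document specifies neither the kind of normalization nor the weights, so L2 normalization and equal weights of 0.5 are assumptions, and `l2_normalize` and `fuse_weighted` are hypothetical names.

```python
import math

def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit L2 norm (eps guards against a zero vector)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / (norm + eps) for x in vec]

def fuse_weighted(visual, semantic, w_visual=0.5, w_semantic=0.5):
    """Normalize each feature, then take their weighted sum as the common feature."""
    if len(visual) != len(semantic):
        raise ValueError("features must share a dimension before fusion")
    v = l2_normalize(visual)
    s = l2_normalize(semantic)
    return [w_visual * a + w_semantic * b for a, b in zip(v, s)]

common = fuse_weighted([3.0, 4.0], [0.0, 2.0])
print([round(x, 3) for x in common])  # [0.3, 0.9]
```

Normalizing first keeps one feature from dominating the sum merely because its raw values are larger; the weights then control each feature's contribution explicitly.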

Other methods may also be used to obtain the common feature of the document to be processed. For example, the visual feature and the semantic feature may be normalized or standardized and then summed with weights, or the semantic feature and the visual feature may be fused by element-wise addition or by vector concatenation.
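The alternative fusion options can be sketched as follows, assuming z-score standardization as the "standardization" step. The function names are hypothetical. Element-wise addition requires equal dimensions, while concatenation does not and yields a vector whose length is the sum of the two.

```python
import math

def standardize(vec, eps=1e-12):
    """Z-score standardization: zero mean, unit (population) standard deviation."""
    mean = sum(vec) / len(vec)
    std = math.sqrt(sum((x - mean) ** 2 for x in vec) / len(vec))
    return [(x - mean) / (std + eps) for x in vec]

def fuse_add(visual, semantic):
    """Element-wise addition; requires equal dimensions."""
    if len(visual) != len(semantic):
        raise ValueError("element-wise addition needs equal dimensions")
    return [a + b for a, b in zip(visual, semantic)]

def fuse_concat(visual, semantic):
    """Vector concatenation; dimensions may differ."""
    return list(visual) + list(semantic)

v = standardize([1.0, 2.0, 3.0])
s = standardize([10.0, 20.0, 30.0])
print(fuse_add(v, s))
print(len(fuse_concat(v, [0.5, 0.5])))  # 5
```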

In this embodiment of the present invention, the common feature of the document to be processed is obtained by fusing its semantic feature and visual feature. The common feature can be used for document classification in step S103, and can also be used for document comparison, for example to match document images.

In step S103, the category of the document to be processed is determined based on its common feature.

In this embodiment, the common feature of a document is determined from its visual feature and semantic feature, and the category of the document is determined from the common feature. The document processing method can therefore classify arbitrary documents accurately. Because the common feature combines semantic and visual features, classification is more accurate for documents of different categories that look visually similar, and document classification is also more robust.

In some embodiments, the text recognition result of the document to be processed may be obtained as follows.

First, the target text boxes in the document to be processed and the text content contained in each target text box are determined.

Next, the word segmentation result of the text content in each target text box is obtained.

Finally, the feature vector corresponding to each word segmentation result is obtained.

Referring to FIG. 4, which shows the text recognition process for one document to be processed (here, a form). Text recognition determines the target text boxes in the document, namely the 15 text boxes 401 to 415, and the text content contained in each. For example, text box 401 contains "Office Supplies Purchase Request Form", text box 402 contains "Form-filling time: year, month, day", and text box 415 contains "General manager's opinion". Word segmentation is performed on the text content of each text box, yielding a plurality of word segmentation results, for example the 11 results 416 to 426, which are some of the results obtained by segmenting the text content of the 15 text boxes. A word segmentation result may be a single character or a word. For example, results 416 (office), 417 (supplies), 418 (purchase request), and 419 (form) are the four results obtained by segmenting the text content of text box 401; results 420 (form filling), 421 (time), 422 (year), 423 (month), and 424 (day) are the five results obtained from text box 402; and results 425 (general manager) and 426 (opinion) are the two results obtained from text box 415. Items 427 to 438 are 12 feature vectors, each obtained by representing one word segmentation result as a feature vector.
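The three steps (text boxes, word segmentation, feature vectors) can be sketched with the FIG. 4 example. This is only an illustration under strong assumptions. The OCR output is given as hypothetical `(box_id, text)` pairs using the English glosses above. Whitespace splitting stands in for a real word segmenter. A SHA-256 digest stands in for a learned word embedding. All function names are hypothetical.

```python
import hashlib

def segment(text):
    """Stand-in word segmentation: split on whitespace.
    A real system would use a language-specific segmenter."""
    return text.split()

def token_vector(token, dim=8):
    """Deterministic pseudo-embedding in [-1, 1] derived from a SHA-256 digest."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    if dim > len(digest):
        raise ValueError("dim cannot exceed the 32-byte digest length")
    return [(digest[i] / 255.0) * 2.0 - 1.0 for i in range(dim)]

def text_recognition_result(text_boxes, dim=8):
    """text_boxes: list of (box_id, text) pairs, e.g. from a text detector plus OCR.
    Returns one (box_id, token, vector) triple per segmentation result."""
    result = []
    for box_id, text in text_boxes:
        for token in segment(text):
            result.append((box_id, token, token_vector(token, dim)))
    return result

boxes = [
    (401, "office supplies purchase-request form"),
    (402, "form-filling time year month day"),
    (415, "general-manager opinion"),
]
for box_id, token, vec in text_recognition_result(boxes):
    print(box_id, token, [round(x, 2) for x in vec[:3]])
```

Keeping the box identifier next to each token preserves which text box a word came from, which a downstream semantic encoder could use.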

In this embodiment, the target text boxes in a document and the text content within them are determined, and word segmentation and feature-vector representation are applied to the text content to obtain the text recognition result. Beyond extracting the text content of the document (for example, part or all of it), splitting the text into text boxes and segmenting words yields the smallest character/word units of the text, so the semantic feature is determined more accurately, which in turn improves classification accuracy. In addition, because the text recognition result is represented as feature vectors, semantic features are convenient to extract, which further improves classification efficiency.

In some embodiments, the document processing method may be performed by a neural network. The neural network may include a feature extraction sub-network for extracting the common feature of the document to be processed, and a first classification sub-network for determining the category of the document to be processed based on the common feature. Specifically, the first classification sub-network may compare the common feature of the document to be processed with the standard features of documents of at least one preset category to determine at least one similarity between the common feature and those standard features, and then determine the category of the document to be processed based on the at least one similarity.

Here, the common feature of the document to be processed and the standard features may have the same dimension, which makes them easy to compare. The similarity between the common feature and a standard feature may be obtained by computing the Euclidean distance between them, or by a trained neural network that outputs the similarity of its two inputs.
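A minimal sketch of the Euclidean-distance option follows. The document does not say how a distance becomes a similarity. The mapping `1 / (1 + d)` used here, which gives 1.0 for identical features and approaches 0 as they diverge, is an assumption, as are the function and category names.

```python
import math

def euclidean_distance(a, b):
    if len(a) != len(b):
        raise ValueError("features must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def similarity(common_feature, standard_feature):
    """Map a distance to (0, 1]: identical features give 1.0."""
    return 1.0 / (1.0 + euclidean_distance(common_feature, standard_feature))

def similarities(common_feature, standard_features):
    """standard_features: dict mapping category name -> standard feature vector."""
    return {cat: similarity(common_feature, feat) for cat, feat in standard_features.items()}

standards = {"invoice": [1.0, 0.0], "id_card": [0.0, 1.0]}
sims = similarities([1.0, 0.0], standards)
print(sims)  # invoice -> 1.0, id_card -> about 0.414
```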

In this embodiment, the standard features of documents of each category are preset in the neural network, and the category of the document to be processed is determined from the similarities between its common feature and the different standard features. The similarity expresses the relationship between the document to be processed and the standard document of each category, that is, whether they are similar and to what degree. This improves the accuracy of the classification result, keeps the computation simple, and further improves classification efficiency.

In some embodiments, the category of the document to be processed is determined from the at least one similarity as follows.

First, the highest of the at least one similarity is obtained.

Then, in response to the highest similarity being greater than or equal to a preset similarity threshold, the category of the document to which the standard feature corresponding to the highest similarity belongs is determined as the category of the document to be processed.
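These two steps can be sketched as follows, assuming the similarities are already available as a dictionary from category name to score. The threshold value 0.8 and the name `classify` are hypothetical.

```python
def classify(sims, threshold):
    """sims: dict category -> similarity. Returns the best category, or None
    when even the highest similarity falls below the threshold."""
    if not sims:
        return None
    best_cat = max(sims, key=sims.get)   # step 1: highest similarity
    if sims[best_cat] >= threshold:      # step 2: compare with the threshold
        return best_cat
    return None

print(classify({"invoice": 0.92, "id_card": 0.40}, threshold=0.8))  # invoice
print(classify({"invoice": 0.55, "id_card": 0.40}, threshold=0.8))  # None
```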

Here, the similarities are compared with one another to determine the highest one. If two or more similarities tie for the highest value, the method may return to the similarity computation step, recompute the similarities with higher precision, and compare the results again to obtain a single highest similarity. If a tie remains after one or more recomputations, recomputation continues until only one highest similarity remains.
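The tie-breaking loop can be sketched by modelling "precision" as the number of decimal digits the similarities are rounded to. That modelling, the callback `compute_sims(digits)`, and the `max_digits` cap are all assumptions. The cap is added only so that a genuine exact tie cannot loop forever, in which case all tied categories are returned.

```python
def unique_best(compute_sims, start_digits=2, max_digits=12):
    """compute_sims(digits) -> dict category -> similarity at that precision.
    Recompute with more digits while the top score is shared."""
    digits = start_digits
    while True:
        sims = compute_sims(digits)
        top = max(sims.values())
        leaders = [cat for cat, s in sims.items() if s == top]
        if len(leaders) == 1 or digits >= max_digits:
            return leaders, sims
        digits += 2

raw = {"form_a": 0.87342, "form_b": 0.87318, "id_card": 0.41}
leaders, sims = unique_best(lambda d: {k: round(v, d) for k, v in raw.items()})
print(leaders)  # ['form_a']
```

At two digits both forms score 0.87 and tie; at four digits they separate into 0.8734 and 0.8732.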

Note that, in an implementation, the similarities may instead first be compared with the preset similarity threshold to select the one or more similarities that are greater than or equal to the threshold, and the highest of the selected similarities may then be obtained. Ways of determining a unique highest similarity therefore include, but are not limited to, the two approaches above; other implementations that achieve the same or a similar effect may also be used and are not described one by one here.
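The filter-first ordering can be sketched as follows, again assuming a dictionary of similarities. The name `best_by_threshold_then_max` and the 0.8 threshold are hypothetical.

```python
def best_by_threshold_then_max(sims, threshold):
    """Filter first: keep similarities >= threshold, then take the highest of those."""
    passing = {cat: s for cat, s in sims.items() if s >= threshold}
    if not passing:
        return None
    return max(passing, key=passing.get)

print(best_by_threshold_then_max({"invoice": 0.92, "form": 0.85, "id_card": 0.40}, 0.8))  # invoice
print(best_by_threshold_then_max({"invoice": 0.55, "id_card": 0.40}, 0.8))  # None
```

Both orderings pick the same category. If the overall maximum passes the threshold, it is also the maximum of the passing set. If it fails, no similarity can pass.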

In this embodiment, only a similarity greater than or equal to the similarity threshold is treated as valid. In other words, the document to be processed and a standard document are considered similar only when the similarity between the common feature of the document and the standard feature is greater than or equal to the threshold, and the further the similarity exceeds the threshold, the more similar the two documents are considered to be. If the similarity is below the threshold, the document to be processed and the standard document are considered dissimilar.

In this embodiment, the similarity threshold is preset in the neural network. The highest similarity is compared with the threshold, and the document to be processed is classified into the category corresponding to a standard document only when the highest similarity is greater than or equal to the threshold. This prevents a classification error when the common feature of the document to be processed has low similarity to every standard feature, that is, when the document does not belong to the category of any standard document. Classification accuracy is thereby improved, and documents outside the preset categories are not misclassified.

In some embodiments, the feature extraction sub-network of the neural network is trained as follows.

First, a sample document labeled with a category is input into the feature extraction sub-network to obtain the common feature of the sample document.

Next, the common feature is input into a second classification sub-network to obtain the predicted category of the sample document.

Finally, the network parameters of the feature extraction sub-network are adjusted based on the difference between the predicted category and the labeled category of the sample document.

Here, the network structure of the feature extraction sub-network allows it to extract the common feature of an input document, and training the sub-network improves the accuracy of the extracted features.

Here, the second classification sub-network is a classifier; for example, it may consist of at least one fully connected layer and a normalization layer. The number of categories it distinguishes is fixed and equals the number of categories of the sample documents, for example 5, 8, or 10. In other words, its output is the probability of each preset category, and the category with the highest probability is the classification result. For example, suppose the sample documents fall into 10 categories, A, B, C, D, E, F, G, H, I, and J, so the output dimension of the second classification sub-network is 10, one per category. After the common feature of a sample document, extracted by the feature extraction sub-network, is input into the second classification sub-network, it outputs 10 probabilities: 83%, 2%, 1%, 3%, 0.5%, 0.2%, 0.3%, 5%, 4%, and 1%. These are the probabilities that the sample document belongs to categories A through J respectively, so the predicted category output by the second classification sub-network for this sample is A.
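The classifier's output stage can be sketched as a softmax over per-category scores followed by an argmax. Treating the "normalization layer" as a softmax is an assumption, since the outputs are described as probabilities. The names `softmax` and `predict_category` are hypothetical. The probabilities are those of the 10-category example above.

```python
import math

CATEGORIES = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"]

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_category(probs, categories=CATEGORIES):
    """The category with the highest probability is the prediction."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return categories[best]

# Probabilities from the example above (they sum to 100%).
probs = [0.83, 0.02, 0.01, 0.03, 0.005, 0.002, 0.003, 0.05, 0.04, 0.01]
print(predict_category(probs))  # A
```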

Here, adjustment of the network parameters of the feature extraction sub-network may be stopped when the network loss falls below a preset loss threshold, and/or when the number of adjustments exceeds a preset count threshold.
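The training procedure can be sketched as a toy model in pure Python: a linear map `W` stands in for the feature extraction sub-network, and a linear layer plus softmax `V` stands in for the second classification sub-network. Parameters are adjusted from the cross-entropy difference between the predicted and labeled categories, and training stops on a loss threshold or a cap. Several points are assumptions. The cap here counts passes over the samples (epochs) rather than individual adjustments. The classifier `V` is updated together with `W`. All names and hyperparameters are hypothetical. A real feature extraction sub-network would be a deep network trained with a framework such as PyTorch.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def train_feature_extractor(samples, labels, d_feat, n_classes,
                            lr=0.2, loss_threshold=0.05, max_epochs=300, seed=0):
    """Per-sample gradient descent on cross-entropy; returns W, V and the loss history."""
    rng = random.Random(seed)
    d_in = len(samples[0])
    W = [[rng.gauss(0.0, 0.3) for _ in range(d_in)] for _ in range(d_feat)]      # feature extractor
    V = [[rng.gauss(0.0, 0.3) for _ in range(d_feat)] for _ in range(n_classes)]  # second classifier
    history = []
    for _ in range(max_epochs):                  # stop rule 1: cap on training epochs
        total = 0.0
        for x, y in zip(samples, labels):
            h = matvec(W, x)                     # common feature of the sample
            p = softmax(matvec(V, h))            # predicted category probabilities
            total += -math.log(p[y] + 1e-12)     # cross-entropy vs. the labeled category
            g = [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]               # dLoss/dlogits
            gh = [sum(V[k][j] * g[k] for k in range(n_classes)) for j in range(d_feat)]  # dLoss/dh
            for k in range(n_classes):
                for j in range(d_feat):
                    V[k][j] -= lr * g[k] * h[j]
            for j in range(d_feat):
                for i in range(d_in):
                    W[j][i] -= lr * gh[j] * x[i]
        history.append(total / len(samples))
        if history[-1] < loss_threshold:         # stop rule 2: loss below threshold
            break
    return W, V, history

samples = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]  # toy inputs
labels = [0, 0, 1, 1]
W, V, history = train_feature_extractor(samples, labels, d_feat=3, n_classes=2)
print(len(history), round(history[0], 3), round(history[-1], 3))
```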

Here, a set of sample documents may be prepared in advance. First, a plurality of sample documents are obtained. Then, the category of each sample document is labeled. Finally, the sample document set is determined from the labeled sample documents. In addition, one sample document of each category may be selected as the standard template for documents of that category, so that its standard feature can later be stored and used.
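Preparing the sample set and picking one standard template per category can be sketched as below. The document does not say which sample becomes the template, so taking the first sample seen for each category is an arbitrary assumption. The file names and function names are hypothetical.

```python
def build_sample_set(documents, labels):
    """Pair each sample document with its labeled category."""
    if len(documents) != len(labels):
        raise ValueError("every sample document needs exactly one category label")
    return list(zip(documents, labels))

def pick_standard_templates(sample_set):
    """Choose one document per category as that category's standard template."""
    templates = {}
    for doc, label in sample_set:
        templates.setdefault(label, doc)  # keep the first document seen per category
    return templates

docs = ["inv_01.png", "inv_02.png", "id_01.png", "form_01.png"]  # hypothetical file names
labels = ["invoice", "invoice", "id_card", "general_form"]
sample_set = build_sample_set(docs, labels)
templates = pick_standard_templates(sample_set)
print(templates)  # {'invoice': 'inv_01.png', 'id_card': 'id_01.png', 'general_form': 'form_01.png'}
```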

In this embodiment, the extraction capability of the feature extraction sub-network determines the accuracy of the extracted common features, which in turn determines the accuracy of the classification result. The accuracy of the predicted categories output by the second classification sub-network therefore indicates how strong the extraction capability of the feature extraction sub-network is. Using this signal from the second classification sub-network, the network parameters of the feature extraction sub-network are adjusted through feedback and continuously optimized. This improves the sub-network's extraction capability and the accuracy of the extracted common features, and ultimately the accuracy of document classification.

In some embodiments, the standard features of documents of the at least one category are obtained by processing the standard templates of those documents with the trained feature extraction sub-network.

Here, once trained, the feature extraction sub-network can accurately extract the common feature of an input document. A standard template may first be determined for each document category; a standard template has a clear layout, clear boundaries between text boxes and/or text blocks, and complete text content. The common feature of each category's standard template is extracted and stored as the standard feature of that category. The standard template may also be annotated, that is, attributes such as positions, text boxes, and/or text blocks in the template may be labeled, so that the template can be used for document layout recognition (document recognition).
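Building the standard-feature store from the templates can be sketched as follows. The trained feature extraction sub-network is passed in as a callable. `toy_extractor` is a purely hypothetical stand-in so the sketch runs, and the `StandardTemplate` record with an optional `annotations` field for layout labels is likewise an assumed structure.

```python
from dataclasses import dataclass, field

@dataclass
class StandardTemplate:
    category: str
    document: object                                   # e.g. an image or a path (assumed)
    feature: list = field(default_factory=list)        # the category's standard feature
    annotations: dict = field(default_factory=dict)    # optional layout labels, e.g. box id -> field

def register_templates(templates, extract_features):
    """templates: dict category -> standard template document.
    extract_features: the trained feature extraction sub-network, as a callable."""
    return {category: StandardTemplate(category, doc, feature=extract_features(doc))
            for category, doc in templates.items()}

def toy_extractor(doc):
    # Hypothetical stand-in for the trained feature extraction sub-network.
    return [float(len(doc)), float(doc.count("_"))]

store = register_templates({"invoice": "invoice_template", "id_card": "id_template"}, toy_extractor)
print(store["invoice"].feature)  # [16.0, 1.0]
```

Because the same extractor later produces the common feature of each document to be processed, standard and common features are directly comparable.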

In this embodiment, the standard features of the standard templates and the common feature of the document to be processed are both extracted by the feature extraction sub-network. The common feature and the standard features therefore have the same source and follow the same rules and criteria, so the similarity determined between them is more accurate, which further improves the accuracy of document classification.

The standard features stored in this way are limited and cannot cover every document category. Moreover, as described in the embodiments above, the document to be processed is classified into the document category corresponding to the highest similarity only when the highest similarity is greater than or equal to the similarity threshold. For these two reasons, classification cannot be completed when the category of a document is not covered by the preset standard templates.

Therefore, in some embodiments, standard features are added as follows.

In response to the highest similarity being less than the preset similarity threshold, the document to be processed is added as a standard template, and its common feature is determined as the standard feature of the category corresponding to the newly added standard template.

Here, a highest similarity below the similarity threshold indicates that the document to be processed does not belong to any preset document category, that is, it represents a new document category. When classification fails, the unclassified document is stored in the neural network as a new category: the document is stored as a standard template, and its extracted common feature is stored as the standard feature of the new category. In addition, after the new category is stored, notification information may be generated to prompt the user to annotate the standard template of that category so that it can be used for layout recognition.
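The classify-or-register behaviour can be sketched as below. Cosine similarity is used here only for illustration, since the document also mentions Euclidean distance or a learned similarity network. The caller-supplied `new_category` name, the printed notification, and the function names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)

def classify_or_register(common_feature, standard_features, threshold, new_category):
    """Return (category, is_new). If no stored standard feature reaches the
    threshold, store this document's common feature as a new category."""
    sims = {cat: cosine_similarity(common_feature, feat)
            for cat, feat in standard_features.items()}
    if sims:
        best = max(sims, key=sims.get)
        if sims[best] >= threshold:
            return best, False
    standard_features[new_category] = list(common_feature)
    print(f"New category '{new_category}' added; please annotate its standard template.")
    return new_category, True

standards = {"invoice": [1.0, 0.0]}
print(classify_or_register([0.0, 1.0], standards, 0.8, "new_category_1"))  # ('new_category_1', True)
print(classify_or_register([0.0, 1.0], standards, 0.8, "new_category_2"))  # ('new_category_1', False)
```

The second call shows the expansion at work: the document that failed the first time now matches its own newly stored standard feature.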

In this embodiment, because the feature extraction sub-network can accurately extract the common feature of the document to be processed, the first classification sub-network can automatically expand the number of categories into which it classifies.

In this embodiment, a document that fails classification is stored and set up as a new category, so the number of preset document categories is expanded automatically and the classification capability keeps improving.

In some embodiments, the method further includes: in response to a selection instruction, selecting at least one of the preset document categories as a target category. The selection instruction may be triggered by a user's selection operation, or triggered automatically when a preset trigger condition is satisfied.

In this case, the similarity between the common feature of the document to be processed and the standard features of documents of the at least one category is determined as follows: the common feature of the document to be processed is compared with the standard features of documents of the at least one target category to determine the similarity between them.
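Restricting the comparison to the selected target categories can be sketched as follows. A plain dot product is used as the similarity for brevity, which is an assumption. The feature vectors are toy values and the function name is hypothetical.

```python
def similarities_for_targets(common_feature, standard_features, targets):
    """Compare only against the selected target categories (dot-product similarity)."""
    unknown = [t for t in targets if t not in standard_features]
    if unknown:
        raise KeyError(f"no standard feature for: {unknown}")
    return {t: sum(a * b for a, b in zip(common_feature, standard_features[t]))
            for t in targets}

standards = {
    "ID card": [1.0, 0.0, 0.0],
    "bank card": [0.0, 1.0, 0.0],
    "passport": [0.0, 0.0, 1.0],
}
targets = ["ID card", "bank card"]   # e.g. chosen in a selection interface
sims = similarities_for_targets([0.2, 0.1, 0.9], standards, targets)
print(sims)  # {'ID card': 0.2, 'bank card': 0.1}
```

"passport" would score highest here but is never compared because it was not selected, which is the intended effect of choosing target categories.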

In one example, referring to FIG. 5, which shows part of a user selection interface, the preset document categories include general text, ID card, bank card, vehicle license, driver's license, passport, general form, VAT receipt, business registration certificate, and handwritten text. Through the interface, the user selects ID card, bank card, general form, VAT receipt, and handwritten text as target categories. Subsequent documents to be recognized are then processed with reference to the categories selected by the user.

Note that the content shown in FIG. 5 is only one possible implementation. In practice, the user may also actively create a template to build a new target category and use that category as a reference when processing documents to be recognized. Furthermore, the target categories may include at least some of the categories shown in FIG. 5, and there may be more or fewer categories than shown; this is not limited here.

The present invention further provides a document processing apparatus, whose structure is shown in FIG. 6. The apparatus includes: an acquisition module 601 configured to acquire a semantic feature and a visual feature of a document to be processed; a common module 602 configured to determine a common feature of the document to be processed based on the semantic feature and the visual feature; and a classification module 603 configured to determine the category of the document to be processed based on its common feature.

In some embodiments, the acquisition module is specifically configured to: obtain a text recognition result of the document to be processed; and obtain the semantic feature of the document to be processed based on the text recognition result.

In some embodiments, obtaining the text recognition result of the document to be processed includes: determining target text boxes in the document to be processed and the text content contained in the target text boxes; obtaining a word segmentation result of the text content in each target text box; and obtaining feature vectors corresponding to the word segmentation results.

In some embodiments, the common module is specifically configured to: perform normalization on the visual feature and the semantic feature separately; and compute a weighted sum of the normalized visual feature and the normalized semantic feature to obtain the common feature of the document to be processed.

In some embodiments, the document processing apparatus includes a neural network comprising a feature extraction sub-network for extracting the common feature of the document to be processed and a first classification sub-network for determining the category of the document to be processed based on the common feature. The first classification sub-network is specifically configured to: compare the common feature of the document to be processed with the standard features of documents of at least one preset category to determine the similarity between the common feature and those standard features; and determine the category of the document to be processed based on the obtained at least one similarity.

In some embodiments, when determining the category of the document to be processed based on the obtained at least one similarity, the first classification sub-network is specifically configured to: obtain the highest of the at least one similarity; and, in response to the highest similarity being greater than or equal to a preset similarity threshold, determine the category of the document to which the standard feature corresponding to the highest similarity belongs as the category of the document to be processed.

In some embodiments, the apparatus further includes a training module for training the feature extraction sub-network in the neural network. The training module is configured to: input a sample document, which is labeled with a category, into the feature extraction sub-network to obtain the common feature of the sample document; input the common feature into a second classification sub-network to obtain a predicted category of the sample document; and adjust the network parameters of the feature extraction sub-network according to the difference between the predicted category of the sample document and its labeled category.
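A minimal pure-Python sketch of one such training step is shown below. Every architectural choice is an assumption, since the source does not specify any of them: the feature extraction sub-network is a single linear map `W`, the second classification sub-network is a linear softmax head `V`, the "difference" between predicted and labeled categories is cross-entropy, and the optimizer is plain SGD. The text only requires adjusting the feature extractor's parameters; updating the auxiliary head `V` as well is an additional assumption.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> List[float]:
    """Matrix-vector product for a row-major list-of-lists matrix."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(z: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def forward(W: Matrix, V: Matrix, x: Sequence[float]) -> Tuple[List[float], List[float]]:
    """W: hypothetical linear feature extractor; V: hypothetical classification head."""
    f = matvec(W, x)                 # common feature of the sample document
    return f, softmax(matvec(V, f))  # predicted category distribution


def train_step(W: Matrix, V: Matrix, x: Sequence[float], label: int,
               lr: float = 0.1) -> float:
    """One SGD step on cross-entropy; updates W and V in place, returns the pre-update loss."""
    f, p = forward(W, V, x)
    loss = -math.log(max(p[label], 1e-12))
    # Gradient of cross-entropy w.r.t. the logits: predicted minus one-hot label.
    g = [pk - (1.0 if k == label else 0.0) for k, pk in enumerate(p)]
    # Back-propagate to the common feature before V is modified.
    df = [sum(V[k][j] * g[k] for k in range(len(V))) for j in range(len(f))]
    for k in range(len(V)):
        for j in range(len(f)):
            V[k][j] -= lr * g[k] * f[j]
    for j in range(len(W)):
        for i in range(len(x)):
            W[j][i] -= lr * df[j] * x[i]
    return loss


def init_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Small uniform random initialization."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
```

In practice a framework such as PyTorch would compute these gradients automatically; the manual version only makes the "predict, compare with label, adjust parameters" loop explicit. Once training is done, `W` alone serves as the feature extractor and the head `V` can be discarded.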

In some embodiments, the standard feature of the documents of the at least one category is obtained by performing feature extraction on the documents of the at least one category using the trained feature extraction sub-network.
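Building the library of standard features then amounts to running the trained extractor over each category's template documents. In this sketch `extract` stands for any callable wrapping the trained feature extraction sub-network (a hypothetical interface), and averaging the features when a category has several template documents is an assumption not stated in the source.

```python
from typing import Callable, Dict, List, Sequence


def build_standard_features(templates: Dict[str, List[object]],
                            extract: Callable[[object], Sequence[float]]) -> Dict[str, List[float]]:
    """Map each category to the (mean) feature of its template documents.

    Averaging over multiple templates is an illustrative assumption.
    """
    gallery: Dict[str, List[float]] = {}
    for category, docs in templates.items():
        if not docs:
            raise ValueError(f"category {category!r} has no template documents")
        feats = [list(extract(doc)) for doc in docs]
        dim = len(feats[0])
        gallery[category] = [sum(f[i] for f in feats) / len(feats) for i in range(dim)]
    return gallery
```

Because the standard features are computed once, offline, classifying a new document only requires one forward pass plus the similarity comparisons.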

In some embodiments, the apparatus further includes an extension module configured to, in response to the highest similarity being less than the preset similarity threshold, add the document to be processed as a standard template and determine the common feature of the document to be processed as the standard feature of the category corresponding to the newly added standard template.
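The extension behavior can be sketched as a small registry update. The dictionaries `gallery` (category to standard feature) and `templates` (category to template document), and the caller-supplied `new_category` name, are hypothetical structures chosen for illustration; the source does not say how new categories are named.

```python
from typing import Dict, List, Optional, Sequence


def add_new_template(gallery: Dict[str, List[float]],
                     templates: Dict[str, object],
                     document: object,
                     common_feature: Sequence[float],
                     best_similarity: float,
                     threshold: float,
                     new_category: str) -> Optional[str]:
    """Register an unmatched document as the standard template of a new category.

    Returns the new category name, or None when the document matched an
    existing category (best_similarity >= threshold) and nothing was added.
    """
    if best_similarity >= threshold:
        return None
    if new_category in gallery:
        raise ValueError(f"category {new_category!r} already exists")
    templates[new_category] = document            # the document itself becomes the template
    gallery[new_category] = list(common_feature)  # its common feature becomes the standard feature
    return new_category
```

Since the new standard feature is just the document's existing common feature, the category set grows without retraining the feature extraction sub-network.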

In some embodiments, the apparatus further includes a target module configured to select, in response to a selection instruction, at least one category from the preset document categories as a target category. In this case, when comparing the common feature of the document to be processed with the standard feature of documents of at least one preset category to determine their similarity, the first classification sub-network is specifically configured to compare the common feature of the document to be processed with the standard feature of documents of the at least one preset target category, to determine the similarity between the common feature of the document to be processed and the standard feature of the documents of each of the at least one target category.
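Restricting the comparison to the selected target categories can be sketched as a filter applied before any similarity is computed, so non-target categories cost nothing. Cosine similarity and the dictionary-based gallery are assumptions carried over for illustration; raising on an unknown target is a design choice, not something the source specifies.

```python
import math
from typing import Dict, Iterable, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarities_to_targets(common_feature: Sequence[float],
                            gallery: Dict[str, Sequence[float]],
                            target_categories: Iterable[str]) -> Dict[str, float]:
    """Compute similarities only against the selected target categories."""
    targets = set(target_categories)
    unknown = targets.difference(gallery)
    if unknown:
        raise KeyError(f"unknown target categories: {sorted(unknown)}")
    return {category: cosine_similarity(common_feature, gallery[category])
            for category in gallery if category in targets}
```

With, say, three categories in the gallery but only two selected as targets, only two similarity computations are performed, which is where the reduced computational load comes from.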

In some embodiments, the apparatus further includes a recognition module configured to: obtain a corresponding preset standard template according to the category of the document to be processed; and perform layout recognition on the document to be processed based on the standard template to obtain a layout recognition result of the document.
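The source does not describe how the standard template drives layout recognition, so the following is only a hypothetical sketch: it assumes a template is a mapping from field names to normalized rectangular regions, and assigns each recognized text box to the field whose region contains the box's center. `TextBox`, the field names, and the center-containment rule are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), normalized to [0, 1]


@dataclass
class TextBox:
    box: Box
    text: str


def recognize_layout(category: str,
                     templates: Dict[str, Dict[str, Box]],
                     text_boxes: List[TextBox]) -> Dict[str, str]:
    """Fill the template fields of the given category from recognized text boxes."""
    template = templates[category]  # KeyError if no preset template exists for this category
    result: Dict[str, str] = {field: "" for field in template}
    for tb in text_boxes:
        cx = (tb.box[0] + tb.box[2]) / 2
        cy = (tb.box[1] + tb.box[3]) / 2
        for field, (x0, y0, x1, y1) in template.items():
            if x0 <= cx <= x1 and y0 <= cy <= y1:
                result[field] = (result[field] + " " + tb.text).strip()
                break  # each text box is assigned to at most one field
    return result
```

Text boxes that fall outside every template region are simply ignored, so the output always has exactly the fields defined by the template for the predicted category.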

The present invention further provides a document processing device. FIG. 7 shows the structure of the document processing device, which includes a non-volatile storage medium 701 and a processor 702. The storage medium 701 stores computer instructions executable by the processor 702, and the processor 702, when executing the computer instructions, implements the method according to any one of the embodiments of the present invention.

The present invention further provides a computer-readable storage medium having a computer program stored thereon; when the computer program is executed by a processor, the method according to any one of the embodiments of the present invention is implemented.

In the embodiments of the present invention, when a document to be processed is classified with the classification method of this embodiment against documents of multiple known categories, at least one target category among these categories is selected as the reference. This reduces the computational load of both determining and comparing similarities, and improves classification efficiency.

In some embodiments, the method further includes: obtaining a corresponding preset standard template according to the category of the document to be processed; and performing layout recognition on the document to be processed based on the standard template to obtain a layout recognition result of the document.

Here, the classification result is used to retrieve the corresponding standard template automatically and accurately for layout recognition, which improves both the accuracy and the efficiency of layout recognition.

Those of ordinary skill in the art should understand that one or more embodiments of this specification may be provided as a method, a system, or a computer program product. Accordingly, one or more embodiments of this specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, one or more embodiments of this specification may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk memory, CD-ROM, and optical memory) containing computer-usable program code.

The embodiments in this specification are described in a progressive manner: identical or similar parts of the embodiments may be cross-referenced, and each embodiment focuses on its differences from the others. In particular, the apparatus embodiments are substantially similar to the method embodiments and are therefore described relatively briefly; for relevant details, refer to the corresponding parts of the method embodiments.

Specific embodiments of this specification have been described above. Other embodiments fall within the scope of the appended claims. In some cases, the acts or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In some implementations, multitasking and parallel processing are also possible and may be advantageous.

Examples of the subject matter and functional operations described in this specification may be implemented in digital electronic circuitry, in tangibly embodied computer software or firmware, in computer hardware including the structures disclosed in this specification and their structural equivalents, or in a combination of one or more of them. Examples of the subject matter described in this specification may be implemented as one or more computer programs, that is, one or more modules of computer program instructions encoded on a tangible non-transitory program carrier for execution by, or to control the operation of, a data processing apparatus. Alternatively or additionally, the program instructions may be encoded on an artificially generated propagated signal, such as a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to a suitable receiver apparatus for execution by a data processing apparatus. The computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.

The processes and logic flows described in this specification may be performed by one or more programmable computers executing one or more computer programs to perform corresponding functions by operating on input data and generating output. The processes and logic flows may also be performed by special-purpose logic circuitry, such as an FPGA (field-programmable gate array) or an ASIC (application-specific integrated circuit), and the apparatus may also be implemented as special-purpose logic circuitry.

Computers suitable for executing a computer program include, for example, general-purpose and/or special-purpose microprocessors, or any other type of central processing unit. Generally, a central processing unit receives instructions and data from a read-only memory and/or a random access memory. The basic components of a computer include a central processing unit for performing or executing instructions and one or more memory devices for storing instructions and data. Generally, a computer also includes one or more mass storage devices for storing data, such as magnetic disks, magneto-optical disks, or optical disks, or is operatively coupled to such mass storage devices to receive data from them, transfer data to them, or both. However, a computer need not have such devices. Moreover, a computer may be embedded in another device, such as a mobile phone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a global positioning system (GPS) receiver, or a universal serial bus (USB) flash drive, to name just a few.

Computer-readable media suitable for storing computer program instructions and data include all forms of non-transitory memory, media, and memory devices, for example semiconductor memory devices (e.g., EPROMs, EEPROMs, and flash memory devices), magnetic disks (e.g., internal hard disks or removable disks), magneto-optical disks, and CD-ROM and DVD-ROM disks. The processor and the memory may be supplemented by, or incorporated in, special-purpose logic circuitry.

Although this specification contains many specific implementation details, these should not be construed as limiting the scope of any disclosure or of what may be claimed; they serve mainly to describe features of the particular examples disclosed. Certain features described in this specification in the context of multiple examples may also be implemented in combination in a single example. Conversely, various features described in the context of a single example may also be implemented in multiple examples separately or in any suitable sub-combination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination may in some cases be excised from the combination, and the claimed combination may be directed to a sub-combination or a variation of a sub-combination.

Similarly, although operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve the desired results. In some cases, multitasking and parallel processing may be advantageous. Moreover, the separation of various system modules and components in the examples should not be understood as requiring such separation in all examples, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.

Thus, particular examples of the subject matter have been described. Other examples are within the scope of the appended claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In some implementations, multitasking and parallel processing may be advantageous.

Claims

1. A document processing method, comprising:
obtaining a semantic feature and a visual feature of a document to be processed;
determining a common feature of the document to be processed based on the semantic feature and the visual feature; and
determining a category of the document to be processed based on the common feature of the document to be processed.
Document processing method, characterized in that.

2. The method according to claim 1,
wherein obtaining the semantic feature of the document to be processed comprises:
obtaining a text recognition result of the document to be processed; and
obtaining the semantic feature of the document to be processed based on the text recognition result.
Document processing method, characterized in that.

3. The method according to claim 2,
wherein obtaining the text recognition result of the document to be processed comprises:
determining target text boxes in the document to be processed and the text content contained in the target text boxes;
obtaining a word segmentation result of the text content in each of the target text boxes; and
obtaining a feature vector corresponding to the word segmentation result.
Document processing method, characterized in that.

4. The method according to claim 1,
wherein determining the common feature of the document to be processed based on the semantic feature and the visual feature comprises:
performing normalization on the visual feature and on the semantic feature, respectively; and
performing a weighted sum of the normalized visual feature and the normalized semantic feature to obtain the common feature of the document to be processed.
Document processing method, characterized in that.

5. The method according to any one of claims 1 to 4,
wherein the document processing method is performed using a neural network, the neural network comprising a feature extraction sub-network for extracting the common feature of the document to be processed and a first classification sub-network for determining the category of the document to be processed based on the common feature, and wherein the first classification sub-network performs:
comparing the common feature of the document to be processed with a standard feature of documents of at least one category to determine a similarity between the common feature of the document to be processed and the standard feature of the documents of the at least one category; and
determining the category of the document to be processed based on the obtained at least one similarity.
Document processing method, characterized in that.

6. The method according to claim 5,
wherein determining the category of the document to be processed based on the obtained at least one similarity comprises:
obtaining the highest similarity among the at least one similarity; and
in response to the highest similarity being greater than or equal to a preset similarity threshold, determining the category of the document to which the standard feature corresponding to the highest similarity belongs as the category of the document to be processed.
Document processing method, characterized in that.

7. The method according to claim 5 or 6,
further comprising training the feature extraction sub-network in the neural network, wherein the training comprises:
inputting a sample document into the feature extraction sub-network to obtain a common feature of the sample document, wherein the sample document is labeled with a category;
inputting the common feature into a second classification sub-network to obtain a predicted category of the sample document; and
adjusting network parameters of the feature extraction sub-network based on a difference between the predicted category of the sample document and the labeled category of the sample document.
Document processing method, characterized in that.

8. The method according to claim 7,
wherein the standard feature of the documents of the at least one category is obtained by performing feature extraction on the documents of the at least one category using the trained feature extraction sub-network.
Document processing method, characterized in that.

9. The method according to any one of claims 6 to 8,
further comprising: in response to the highest similarity being less than the preset similarity threshold, adding the document to be processed as a standard template, and determining the common feature of the document to be processed as the standard feature of a category corresponding to the newly added standard template.
Document processing method, characterized in that.

10. The method according to any one of claims 5 to 9,
further comprising: in response to a selection instruction, selecting at least one category from preset document categories as a target category;
wherein comparing the common feature of the document to be processed with the standard feature of documents of at least one category to determine the similarity between the common feature of the document to be processed and the standard feature of the documents of the at least one category comprises:
comparing the common feature of the document to be processed with a standard feature of documents of at least one preset target category to determine a similarity between the common feature of the document to be processed and the standard feature of the documents of the at least one target category.
Document processing method, characterized in that.

11. The method according to any one of claims 1 to 10, further comprising:
obtaining a corresponding preset standard template according to the category of the document to be processed; and
performing, based on the standard template, layout recognition on the document to be processed to obtain a layout recognition result of the document.
Document processing method, characterized in that.

12. A document processing apparatus, comprising:
an acquisition module configured to acquire a semantic feature and a visual feature of a document to be processed;
a common-feature module configured to determine a common feature of the document to be processed based on the semantic feature and the visual feature; and
a classification module configured to determine a category of the document to be processed based on the common feature of the document to be processed.
Document processing apparatus, characterized in that.

13. The apparatus according to claim 12,
wherein the acquisition module is configured to:
obtain a text recognition result of the document to be processed; and
obtain the semantic feature of the document to be processed based on the text recognition result.
Document processing apparatus, characterized in that.

14. The apparatus according to claim 13,
wherein obtaining the text recognition result of the document to be processed comprises:
determining target text boxes in the document to be processed and the text content contained in the target text boxes;
obtaining a word segmentation result of the text content in each of the target text boxes; and
obtaining a feature vector corresponding to the word segmentation result.
Document processing apparatus, characterized in that.

15. The apparatus according to claim 12,
wherein the common-feature module is configured to:
perform normalization on the visual feature and on the semantic feature, respectively; and
perform a weighted sum of the normalized visual feature and the normalized semantic feature to obtain the common feature of the document to be processed.
Document processing apparatus, characterized in that.

16. The apparatus according to any one of claims 12 to 15,
wherein the document processing apparatus comprises a neural network, the neural network comprising a feature extraction sub-network for extracting the common feature of the document to be processed and a first classification sub-network for determining the category of the document to be processed based on the common feature, and wherein the first classification sub-network is configured for:
comparing the common feature of the document to be processed with a standard feature of documents of at least one category to determine a similarity between the common feature of the document to be processed and the standard feature of the documents of the at least one category; and
determining the category of the document to be processed based on the obtained at least one similarity.
Document processing apparatus, characterized in that.

17. The apparatus according to claim 16,
wherein, when determining the category of the document to be processed based on the obtained at least one similarity, the first classification sub-network is configured to:
obtain the highest similarity among the at least one similarity; and
in response to the highest similarity being greater than or equal to a preset similarity threshold, determine the category of the document to which the standard feature corresponding to the highest similarity belongs as the category of the document to be processed; or
in response to the highest similarity being less than the preset similarity threshold, add the document to be processed as a standard template, and determine the common feature of the document to be processed as the standard feature of a category corresponding to the newly added standard template.
Document processing apparatus, characterized in that.

18. The apparatus according to claim 16 or 17, further comprising:
a target module configured to select, in response to a selection instruction, at least one category from preset document categories as a target category;
wherein, when comparing the common feature of the document to be processed with a standard feature of documents of at least one preset category to determine the similarity between the common feature of the document to be processed and the standard feature of the documents of the at least one category, the first classification sub-network is configured to:
determine a similarity between the common feature of the document to be processed and the standard feature of documents of the at least one target category by comparing the common feature of the document to be processed with a standard feature of documents of at least one preset target category.
Document processing apparatus, characterized in that.

19. A document processing device, comprising:
a non-volatile storage medium and a processor, wherein the storage medium stores computer instructions executable by the processor, and the processor, when executing the computer instructions, implements a method according to
Document processing device, characterized in that.

20. A computer-readable storage medium having a computer program stored thereon,
wherein, when the computer program is executed by a processor, the method according to any one of claims 1 to 11 is implemented.
A computer-readable storage medium, characterized in that.

21. A computer program,
wherein, when the computer program is executed by a processor, the method according to any one of claims 1 to 11 is implemented.
A computer program characterized in that.