KR102628553B1

KR102628553B1 - Equipment data recognition apparatus and method

Info

Publication number: KR102628553B1
Application number: KR1020210067156A
Authority: KR
Inventors: 서덕성; 김희겸; 조인식
Original assignee: 한국전력공사 (Korea Electric Power Corporation)
Priority date: 2021-05-25
Filing date: 2021-05-25
Publication date: 2024-01-25
Also published as: KR20220159154A

Abstract

An apparatus and method for recognizing equipment specifications are disclosed. The apparatus includes a preprocessing unit that preprocesses an equipment specifications image; a detection unit that detects, in the image preprocessed by the preprocessing unit, an area containing text corresponding to the equipment specifications; a recognition unit that converts the area detected by the detection unit into text; and an information extraction unit that extracts target information from the text recognized by the recognition unit.

Description

Equipment specifications recognition device and method {EQUIPMENT DATA RECOGNITION APPARATUS AND METHOD}

The present invention relates to an apparatus and method for recognizing equipment specifications, and more specifically, to an apparatus and method that recognize the equipment specifications shown in photographs of equipment.

A power distribution system generally contains a wide variety of equipment, such as low-voltage meters and their accessories, high-voltage meters and their accessories, overhead transformers, and lightning arresters.

Managers extract the specifications of each piece of equipment and manage them by storing them in a system such as the business information system.

To enter the specifications, a manager photographs them, reads the specifications directly from the captured image, and types the required information into the system.

However, this approach is time-consuming because of the complicated photo transfer procedure and manual data entry. Entering construction-completion photos and specifications by hand places an excessive workload on staff, the multi-step photo transfer procedure takes a long time, and photo quality degrades along the way.

Manual entry of equipment specifications is also prone to human error: the simple, repetitive work lowers concentration, leading to omitted fields or incorrectly entered values.

The background art of the present invention is disclosed in Korean Patent Publication No. 10-2001-0098089 (2001.11.08), titled 'Method for preparing and searching specifications for equipment and materials.'

The present invention was devised to address the problems described above. An object of one aspect of the present invention is to provide an apparatus and method for recognizing the equipment specifications shown in equipment photographs.

An equipment specifications recognition apparatus according to one aspect of the present invention includes: a preprocessing unit that preprocesses an equipment specifications image; a detection unit that detects, in the equipment specifications image preprocessed by the preprocessing unit, an area containing text corresponding to the equipment specifications; a recognition unit that converts the area detected by the detection unit into text; and an information extraction unit that extracts target information from the text recognized by the recognition unit.

The detection unit detects the position of text in the preprocessed equipment specifications image by learning at least one of an affinity score and a region score.

The detection unit learns the region score so that, for each pixel in the equipment specifications image, the value approaches 1 the closer the pixel is to the center of a character and approaches 0 the farther it is.

To determine whether characters belong to the same word or sentence, the detection unit learns, as the affinity score, the probability that a point is the center between adjacent characters, based on the center points of the characters in the equipment specifications image.

The detection unit performs additional training using a transfer learning technique.

The detection unit performs hyperparameter optimization (HPO) according to the text in the equipment specifications image.

The recognition unit converts the text area detected by the detection unit into text using an ASN (Attention Based Semantic Reasoning Network).

The recognition unit is trained to classify the text in the equipment specifications image: it computes an embedding loss, a reasoning loss, and a fusion loss, and is trained with their sum as the final loss.

The information extraction unit tokenizes the text recognized by the recognition unit into consonant and vowel units, computes the edit distance to the target information, and thereby extracts the location of the target information.

An equipment specifications recognition method according to one aspect of the present invention includes: preprocessing, by a preprocessing unit, an equipment specifications image; detecting, by a detection unit, an area containing text corresponding to the equipment specifications in the image preprocessed by the preprocessing unit; converting, by a recognition unit, the area detected by the detection unit into text; and extracting, by an information extraction unit, target information from the text recognized by the recognition unit.

In the step of detecting the area containing text corresponding to the equipment specifications, the position of the text in the preprocessed equipment specifications image is detected by learning at least one of an affinity score and a region score.

In the detecting step, the region score is learned so that, for each pixel in the equipment specifications image, the value approaches 1 the closer the pixel is to the center of a character and approaches 0 the farther it is.

In the detecting step, to determine whether characters belong to the same word or sentence, the probability that a point is the center between adjacent characters is learned as the affinity score, based on the center points of the characters in the equipment specifications image.

In the detecting step, additional training is performed using a transfer learning technique.

In the detecting step, hyperparameter optimization (HPO) is performed according to the text in the equipment specifications image.

In the step of converting the area into text, the text area detected by the detection unit is converted into text using an ASN (Attention Based Semantic Reasoning Network).

In the converting step, the model is trained to classify the text in the equipment specifications image: an embedding loss, a reasoning loss, and a fusion loss are computed, and their sum is used as the final loss.

In the step of extracting target information from the text, the text recognized by the recognition unit is tokenized into consonant and vowel units, the edit distance to the target information is computed, and the location of the target information is extracted.

The equipment specifications recognition apparatus and method according to one aspect of the present invention recognize the equipment specifications in equipment photographs, thereby minimizing human errors such as omitted specifications or incorrectly entered values.

Figure 1 is a block diagram of an equipment specifications recognition apparatus according to an embodiment of the present invention.
Figure 2 is a flowchart of a method for recognizing equipment specifications according to an embodiment of the present invention.
Figure 3 is an example of an equipment specifications image according to an embodiment of the present invention.
Figure 4 is an example of equipment specifications recognized in an equipment specifications image according to an embodiment of the present invention.
Figure 5 shows an example of Radon transform-based rotation angle correction according to an embodiment of the present invention.
Figure 6 shows a dataset composed of collected real data and synthetic data according to an embodiment of the present invention.
Figure 7 shows an example of the AI Hub "Text in the Wild" real training data according to an embodiment of the present invention.
Figure 8 shows examples of AI Hub handwritten and printed training data according to an embodiment of the present invention.
Figure 9 shows an example of synthetic data generation according to an embodiment of the present invention.
Figure 10 is an example of a region score and an affinity score according to an embodiment of the present invention.
Figure 11 shows the rule for generating ground-truth training data for the detection unit according to an embodiment of the present invention.
Figure 12 shows the training data generation rule for the region score according to an embodiment of the present invention.
Figure 13 shows the training data generation rule for the affinity score according to an embodiment of the present invention.
Figure 14 shows an example of generating training data for transfer learning according to an embodiment of the present invention.
Figure 15 shows the model training configuration according to an embodiment of the present invention.
Figure 16 shows how performance on a meter image changes with HPO according to an embodiment of the present invention.
Figure 17 shows an example of character recognition by the ASN according to an embodiment of the present invention.
Figure 18 is an example of the multi-head attention unit of a Transformer according to an embodiment of the present invention.
Figure 19 shows the structure of the ASN according to an embodiment of the present invention.
Figure 20 shows the loss trend of the ASN during training according to an embodiment of the present invention.
Figure 21 shows tokenization into consonant and vowel units according to an embodiment of the present invention.
Figure 22 shows an example of extracting text from equipment through the recognition unit according to an embodiment of the present invention.
Figure 23 illustrates an example of edit distance according to an embodiment of the present invention.
Figure 24 shows information extraction results for test data according to an embodiment of the present invention.

Hereinafter, an apparatus and method for recognizing equipment specifications according to an embodiment of the present invention will be described in detail with reference to the attached drawings. The thickness of lines and the sizes of components shown in the drawings may be exaggerated for clarity and convenience of explanation. The terms used below are defined in consideration of their functions in the present invention and may vary according to the intention or practice of users or operators; they should therefore be interpreted based on the content of this entire specification.

Referring to FIG. 1, the equipment specifications recognition apparatus according to an embodiment of the present invention includes a preprocessing unit 10 that preprocesses an equipment specifications image, a detection unit 20 that detects an area containing text corresponding to the equipment specifications in the image preprocessed by the preprocessing unit 10, a recognition unit 30 that converts the area detected by the detection unit 20 into text, and an information extraction unit 40 that extracts target information from the text recognized by the recognition unit 30.
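The four-unit structure of FIG. 1 can be pictured as a simple chain of stages. The following is a minimal Python sketch, assuming each unit is supplied as a callable; the class name `SpecRecognizer`, the image and box types, and the stub stages in the usage lines are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical types: a grayscale image as nested lists, a box as (x0, y0, x1, y1).
Image = List[List[int]]
Box = Tuple[int, int, int, int]


@dataclass
class SpecRecognizer:
    """Chains the four units of FIG. 1; each unit is injected as a callable."""
    preprocess: Callable[[Image], Image]            # preprocessing unit 10
    detect: Callable[[Image], List[Box]]            # detection unit 20
    recognize: Callable[[Image, Box], str]          # recognition unit 30
    extract: Callable[[List[str]], Dict[str, str]]  # information extraction unit 40

    def run(self, image: Image) -> Dict[str, str]:
        clean = self.preprocess(image)                         # S20
        boxes = self.detect(clean)                             # S30
        lines = [self.recognize(clean, box) for box in boxes]  # S40
        return self.extract(lines)                             # S50


# Usage with stub stages (placeholders standing in for the real models).
recognizer = SpecRecognizer(
    preprocess=lambda img: img,
    detect=lambda img: [(0, 0, 10, 5), (0, 6, 10, 11)],
    recognize=lambda img, box: "제조번호" if box[1] == 0 else "12345",
    extract=lambda lines: {lines[0]: lines[1]},
)
print(recognizer.run([[0]]))  # {'제조번호': '12345'}
```

Injecting the stages keeps each unit independently replaceable, for example swapping the detector after transfer learning without touching the other units.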

The equipment specifications recognition apparatus according to an embodiment of the present invention detects text corresponding to the equipment specifications in an equipment specifications image, that is, a photograph of the equipment specifications.

The apparatus can be mounted on a mobile phone. As shown in FIGS. 3 and 4, it recognizes the equipment specifications in an image captured by the user with the phone's camera or received from an external device. The apparatus can also be installed on various terminals other than mobile phones.

Hereinafter, a method for recognizing equipment specifications according to an embodiment of the present invention will be described in detail.

Referring to FIG. 2, the preprocessing unit 10 first receives an equipment specifications image and preprocesses it (S10, S20).

The preprocessing unit 10 performs functions such as rotating the equipment specifications image, improving its quality, and resizing it.

In particular, as shown in FIG. 5, the preprocessing unit 10 can correct the rotation angle of the equipment specifications image using the Radon transform.
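One common way to use the Radon transform for deskewing is to project the text pixels along many candidate angles and keep the angle whose projection profile is most sharply peaked, because text lines collapse into narrow peaks when the projection direction matches their slope. The sketch below assumes a binarized image given as nested lists and computes one discrete Radon projection per candidate angle in pure Python; the function names, angle range, and step are illustrative assumptions, and the patent does not specify how its correction is computed.

```python
import math
from collections import Counter


def projection_score(points, angle_deg):
    """Peakiness (sum of squared bin counts) of the projection of `points` onto the
    normal of lines tilted by `angle_deg`: one column of a discrete Radon transform."""
    t = math.radians(angle_deg)
    # A point on the line y = x*tan(t) + c maps to r = c*cos(t), so a whole text
    # line falls into (almost) one bin when t matches its slope.
    bins = Counter(round(y * math.cos(t) - x * math.sin(t)) for x, y in points)
    return sum(c * c for c in bins.values())


def estimate_skew(binary, min_deg=-15.0, max_deg=15.0, step=0.5):
    """Return the candidate angle (degrees) whose projection profile is most peaked."""
    points = [(x, y) for y, row in enumerate(binary) for x, v in enumerate(row) if v]
    n = int(round((max_deg - min_deg) / step))
    angles = [min_deg + i * step for i in range(n + 1)]
    return max(angles, key=lambda a: projection_score(points, a))
```

The estimated angle is expressed in image coordinates (y pointing down); the image would then be rotated by the opposite angle, with the exact sign depending on the rotation routine used.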

To recognize equipment specifications in equipment specifications images, training data must first be secured.

Referring to FIG. 6, the training data includes real data and synthetic data.

Real data consists of photographs of real objects, used to learn to recognize the characters on equipment that exists in the real world.

Real data is costly and time-consuming to collect, and images of the target equipment specifications cannot be gathered in large numbers in a short period, so they can be difficult to use for training.

Synthetic data compensates for text that is scarce in the real data.

Real data may include signs, signposts, trademarks, signboards, and the like. As shown in FIG. 7, a large number of "Text in the Wild" images, for example 150,000, may be secured and used to build the training dataset.

The real data may also include handwritten and printed text, as shown in FIG. 8.

The synthetic data may include Korean words and syllables, Latin letters, digits, and special symbols that are scarce in the real data.

As shown in FIG. 9, the synthetic data covers all Korean syllables, the English alphabet, digits, and special symbols. It is generated in many variations, including characters in various Korean fonts and sizes, five levels of blur, and seven text colors.

Because synthetic samples can be built from random combinations of characters, the data can be designed to yield a character recognition algorithm that works well for all characters, rather than one that is accurate only on familiar words.
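The synthetic-data recipe above (all Hangul syllables, Latin letters, digits and symbols, several fonts and sizes, five blur levels, seven text colors, random character combinations) can be captured as a sampler of rendering specifications. This is a hedged sketch: the font names, color palette, symbol set, size range and length range are hypothetical placeholders, and actually rendering each specification to an image would need an imaging library that is not shown.

```python
import random
import string

# All 11,172 precomposed modern Hangul syllables (U+AC00 to U+D7A3).
HANGUL_SYLLABLES = [chr(c) for c in range(0xAC00, 0xD7A4)]
SYMBOLS = list("-/.:()%")  # hypothetical special-symbol set
CHARSET = HANGUL_SYLLABLES + list(string.ascii_letters) + list(string.digits) + SYMBOLS
BLUR_LEVELS = 5  # five blur levels, as described above
TEXT_COLORS = ["black", "white", "red", "blue", "green", "gray", "yellow"]  # 7 colors, hypothetical palette
FONTS = ["NanumGothic", "NanumMyeongjo", "NanumBarunGothic"]  # hypothetical Korean fonts


def make_sample_spec(rng, min_len=2, max_len=10):
    """Random character string plus rendering parameters for one synthetic sample."""
    length = rng.randint(min_len, max_len)
    text = "".join(rng.choice(CHARSET) for _ in range(length))  # random, not dictionary words
    return {
        "text": text,
        "font": rng.choice(FONTS),
        "font_size": rng.randint(16, 48),
        "blur_level": rng.randrange(BLUR_LEVELS),  # 0 = sharp ... 4 = strongest blur
        "color": rng.choice(TEXT_COLORS),
    }


rng = random.Random(0)
specs = [make_sample_spec(rng) for _ in range(1000)]
```

Sampling characters uniformly, rather than from a word list, is what keeps the recognizer from favoring familiar words.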

The detection unit 20 detects, in the equipment specifications image preprocessed by the preprocessing unit 10, the area containing text corresponding to the equipment specifications, that is, a region-of-interest (ROI) bounding box (S30). The detection unit 20 can detect this area with a neural network algorithm based on region and affinity scores.

To locate the text corresponding to the equipment specifications, the detection unit 20 learns the affinity score and region score shown in FIG. 10, and uses them to detect the area where text is expected to be, that is, the text area.

The region score is the probability that a pixel is the center of a character of the text.

The affinity score is the probability that a pixel is the center point between two adjacent characters.

Referring to FIGS. 11 to 13, the detection unit 20 learns the region score so that, for each pixel in the equipment specifications image, the value approaches 1 the closer the pixel is to the center of a character and approaches 0 the farther it is. To determine whether characters belong to the same word or sentence, the detection unit 20 learns, as the affinity score, the probability that a point is the center between adjacent characters, based on the center points of the characters in the image.
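The ground-truth rules of FIGS. 11 to 13 can be approximated by placing a Gaussian bump at each character center for the region score and at the midpoint between consecutive characters for the affinity score. The sketch below assumes axis-aligned character boxes and an isotropic Gaussian whose standard deviation is a quarter of the box size; these are simplifying assumptions (the figures may warp the Gaussian to each character's actual shape), not the patent's exact rule.

```python
import math
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) of one character


def gaussian_at(px, py, cx, cy, w, h):
    """1.0 at the center (cx, cy), falling towards 0 with distance from it."""
    sx, sy = w / 4.0, h / 4.0  # assumption: half the box size spans two standard deviations
    return math.exp(-(((px - cx) / sx) ** 2 + ((py - cy) / sy) ** 2) / 2.0)


def score_maps(char_boxes: Sequence[Box], width: int, height: int):
    """Ground-truth region and affinity maps for one word, boxes given in reading order."""
    centers = [((x0 + x1) / 2, (y0 + y1) / 2, x1 - x0, y1 - y0) for x0, y0, x1, y1 in char_boxes]
    # Affinity targets sit midway between consecutive characters of the same word,
    # with a size averaged from the two neighbors.
    links = [tuple((a + b) / 2 for a, b in zip(c1, c2)) for c1, c2 in zip(centers, centers[1:])]
    region = [[0.0] * width for _ in range(height)]
    affinity = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            px, py = x + 0.5, y + 0.5  # pixel center
            region[y][x] = max((gaussian_at(px, py, *c) for c in centers), default=0.0)
            affinity[y][x] = max((gaussian_at(px, py, *l) for l in links), default=0.0)
    return region, affinity


region, affinity = score_maps([(0, 0, 10, 10), (10, 0, 20, 10)], 20, 10)
```

In this toy two-character word, the region map peaks inside each character while the affinity map peaks at the shared edge around x = 10.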

In addition, the detection unit 20 can perform additional training by applying transfer learning to an existing model. For this additional training, the detection unit 20 may label equipment specifications images with ground truth, as shown in FIG. 14. Transfer learning strengthens Hangul (Korean script) detection performance and specializes the model for equipment specifications images.

FIG. 15 shows a training model that performs transfer learning on equipment specifications image data.

In FIG. 15, $p$ denotes a pixel, $S_r(p)$ is the ground-truth region score, $S_r^*(p)$ is the predicted region score, $S_a(p)$ is the ground-truth affinity score, and $S_a^*(p)$ is the predicted affinity score.
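FIG. 15 pairs ground-truth and predicted scores at every pixel. Assuming a pixel-wise squared-error objective of the kind commonly used for region/affinity text detectors, $L = \sum_p \left( (S_r(p) - S_r^*(p))^2 + (S_a(p) - S_a^*(p))^2 \right)$ (the text above does not state the loss formula), the training loss could be computed as follows.

```python
from typing import List

ScoreMap = List[List[float]]


def detection_loss(gt_region: ScoreMap, pred_region: ScoreMap,
                   gt_affinity: ScoreMap, pred_affinity: ScoreMap) -> float:
    """Assumed objective: sum over pixels p of
    (S_r(p) - S*_r(p))^2 + (S_a(p) - S*_a(p))^2."""
    loss = 0.0
    for gr, pr, ga, pa in zip(gt_region, pred_region, gt_affinity, pred_affinity):
        for sr, sr_pred, sa, sa_pred in zip(gr, pr, ga, pa):
            loss += (sr - sr_pred) ** 2 + (sa - sa_pred) ** 2
    return loss


print(detection_loss([[1.0, 0.0]], [[0.5, 0.0]], [[0.0, 1.0]], [[0.0, 0.0]]))  # 1.25
```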

The detection unit 20 also performs hyperparameter optimization (HPO) according to the text in the equipment specifications image.

The hyperparameters include the text threshold, the link threshold, and the low text value.

The text threshold is the limit that determines down to what region score a pixel is included in the text area.

The link threshold is the limit that determines down to what affinity score characters are joined into one word.

The low text value is the minimum predicted value in the region score map at which a center value is judged to be a character.
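The three hyperparameters interact when score maps are turned into word boxes. Below is a minimal sketch under the definitions given above: pixels that reach the text threshold (region score) or the link threshold (affinity score) join a candidate mask, 4-connected components of that mask become words, and a component is kept only if its peak region score reaches the low text value. The function name and the box format are assumptions.

```python
from collections import deque


def group_words(region, affinity, text_threshold, link_threshold, low_text):
    """Return word boxes (x0, y0, x1, y1) from region/affinity score maps."""
    h, w = len(region), len(region[0])
    mask = [[region[y][x] >= text_threshold or affinity[y][x] >= link_threshold
             for x in range(w)] for y in range(h)]
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for sy in range(h):
        for sx in range(w):
            if not mask[sy][sx] or seen[sy][sx]:
                continue
            # Breadth-first search over one 4-connected component of the mask.
            queue, pixels = deque([(sx, sy)]), []
            seen[sy][sx] = True
            while queue:
                x, y = queue.popleft()
                pixels.append((x, y))
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if 0 <= nx < w and 0 <= ny < h and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            # Keep the component only if some pixel is confidently a character center.
            if max(region[y][x] for x, y in pixels) >= low_text:
                xs = [x for x, _ in pixels]
                ys = [y for _, y in pixels]
                boxes.append((min(xs), min(ys), max(xs) + 1, max(ys) + 1))
    return boxes


region = [[0.9, 0.2, 0.9, 0.0, 0.9]]
affinity = [[0.0, 0.8, 0.0, 0.0, 0.0]]
print(group_words(region, affinity, 0.5, 0.5, 0.7))  # [(0, 0, 3, 1), (4, 0, 5, 1)]
```

Raising `link_threshold` cuts weak affinity links, which is one way text that was merged across a small gap can be split apart again.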

FIG. 16 shows how performance changes with hyperparameter optimization: the left image is before optimization and the right image is after.

When the parameters do not suit the image size and text size, two small lines of text are detected as a single line; hyperparameter optimization resolves this.
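The text does not say how the HPO search is carried out. One simple possibility, shown here as a hypothetical sketch, is an exhaustive grid search over the three thresholds, scoring each combination with a user-supplied evaluation function (for example, detection F1 on a validation set of meter images); `evaluate` and the candidate grids are assumptions.

```python
import itertools


def grid_search(evaluate, text_thresholds, link_thresholds, low_texts):
    """Return (best_score, params) maximizing evaluate(params) over the grid.

    `evaluate` is a hypothetical callback that would run detection on validation
    images with the given params and return a quality score (higher is better).
    """
    best = None
    for t, l, low in itertools.product(text_thresholds, link_thresholds, low_texts):
        params = {"text_threshold": t, "link_threshold": l, "low_text": low}
        score = evaluate(params)
        if best is None or score > best[0]:  # ties keep the first combination found
            best = (score, params)
    return best


# Toy stand-in for a real validation metric, peaking at text 0.7 / link 0.6.
toy_metric = lambda p: -abs(p["link_threshold"] - 0.6) - abs(p["text_threshold"] - 0.7)
score, params = grid_search(toy_metric, [0.5, 0.7, 0.9], [0.4, 0.6, 0.8], [0.3, 0.4])
```

A grid is easy to reason about for three parameters; a random or Bayesian search would be the usual alternative if the grid grew large.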

The recognition unit 30 converts the area detected by the detection unit 20 into text (S40).

The recognition unit 30 is trained to convert the text areas detected by the detection unit 20 into text and, for this purpose, uses an ASN (Attention Based Semantic Reasoning Network). That is, the recognition unit 30 uses the ASN to convert the text areas detected by the detection unit 20 into text.

The ASN is a deep learning model built on a CNN. As shown in FIG. 17, it performs inference that considers not only visual features but also semantic information.

In addition, as shown in FIG. 18, the ASN maximizes recognition performance by using the Transformer unit (also used in BERT), which currently plays the most central role in natural language processing (NLP).
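The core of the Transformer unit in FIG. 18 is scaled dot-product attention, $\mathrm{softmax}(QK^\top/\sqrt{d_k})V$, applied in parallel over several heads. The pure-Python sketch below shows only these mechanics: the learned query/key/value and output projections of a real multi-head attention unit are omitted (treated as identity), which is a simplification for illustration.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(k[0])
    out = []
    for qi in q:
        weights = softmax([sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_k) for kj in k])
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out


def multi_head_self_attention(x: Matrix, num_heads: int) -> Matrix:
    """Split the feature dimension into heads, attend within each head, concatenate.

    Simplification: Q = K = V = the head's slice of x (no learned projections).
    """
    d = len(x[0])
    assert d % num_heads == 0, "feature size must divide evenly into heads"
    hd = d // num_heads
    heads = []
    for h in range(num_heads):
        part = [row[h * hd:(h + 1) * hd] for row in x]
        heads.append(attention(part, part, part))
    return [sum((head[i] for head in heads), []) for i in range(len(x))]
```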

Referring to FIG. 19, the overall ASN pipeline works as follows: a backbone network extracts visual features V from the input image; 2-D attention is applied to generate aligned visual features A; semantic embedding and multi-head attention are applied to A to generate semantic features S; and a fusion module that mixes G and S computes the final characters.
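The text does not describe how the fusion module mixes the two feature streams. A common choice in semantic-reasoning text recognizers is a learned element-wise gate, $z_t = \sigma(W[g_t; s_t])$ and $f_t = z_t \odot g_t + (1 - z_t) \odot s_t$; the sketch below assumes that form, with hypothetical weights supplied by the caller instead of learned ones.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(g: List[List[float]], s: List[List[float]],
                 weights: List[List[float]]) -> List[List[float]]:
    """Fuse per-step visual features g_t with semantic features s_t.

    z_t = sigmoid(W [g_t; s_t]) element-wise, f_t = z_t * g_t + (1 - z_t) * s_t.
    `weights` has one row of length 2*d per feature dimension (hypothetical values).
    """
    fused = []
    for gt, st in zip(g, s):
        concat = gt + st  # [g_t; s_t]
        z = [sigmoid(sum(w * c for w, c in zip(row, concat))) for row in weights]
        fused.append([zi * a + (1.0 - zi) * b for zi, a, b in zip(z, gt, st)])
    return fused


zero_w = [[0.0] * 4, [0.0] * 4]  # zero weights give z = 0.5, i.e. a plain average
print(gated_fusion([[1.0, 2.0]], [[3.0, 4.0]], zero_w))  # [[2.0, 3.0]]
```

The gate lets the model lean on visual evidence for clear characters and on semantic context for blurred or occluded ones.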

The ASN is trained to classify the characters in each input image containing text. It computes an embedding loss, a reasoning loss, and a fusion loss, and is trained with their sum as the final loss.
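Assuming each of the three losses is a per-character cross-entropy over the character classes (the text states only that the final loss is their sum), the training objective can be sketched as follows; the three logit inputs stand for the predictions from which the embedding, reasoning, and fusion losses are computed, and the function names are ours.

```python
import math
from typing import List, Sequence


def cross_entropy(logits_per_step: Sequence[List[float]], targets: Sequence[int]) -> float:
    """Mean negative log-likelihood of the target character class at each step."""
    total = 0.0
    for logits, t in zip(logits_per_step, targets):
        m = max(logits)  # log-sum-exp with max subtraction for stability
        log_z = m + math.log(sum(math.exp(v - m) for v in logits))
        total += log_z - logits[t]
    return total / len(targets)


def final_loss(embedding_logits, reasoning_logits, fusion_logits, targets) -> float:
    """Final loss = embedding loss + reasoning loss + fusion loss (unweighted sum)."""
    return (cross_entropy(embedding_logits, targets)
            + cross_entropy(reasoning_logits, targets)
            + cross_entropy(fusion_logits, targets))
```

With uniform logits over two classes, each term equals $\ln 2$, so the final loss is $3\ln 2$.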

The real and synthetic training data described above can be used to train the ASN-based character recognition algorithm.

FIG. 20 shows the output checked while training on the dataset built to train the ASN for Hangul recognition.

Referring to FIG. 20, progress is checked every 100 iterations. The top shows the loss and elapsed time; the left column shows the input real/synthetic text; the right column shows the predicted text including padding ($); the middle column shows the predicted text without padding; and the True/False flag at the far right indicates whether each prediction is correct. The figure also shows that both the frequently used real data and the randomly composed synthetic data are being learned.

The information extraction unit 40 extracts target information from the text recognized by the recognition unit 30 and outputs the extracted information (S50, S60).

For example, when the equipment specifications image shows a low-voltage watt-hour meter, the goal is to extract the manufacturer, type number, serial number, and date of manufacture.

In this case, as shown in FIGS. 21 and 22, the information extraction unit 40 tokenizes the text recognized by the recognition unit 30 into units of consonants and vowels. It then locates the target information by calculating the edit distance between the tokenized text and the target information.
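Tokenizing into consonants and vowels means splitting each precomposed Hangul syllable into its jamo. A sketch using the standard Unicode decomposition formula is shown below. Non-Hangul characters are assumed to pass through unchanged, and `tokenize_jamo` is a hypothetical name. Working at the jamo level means an OCR error in a single vowel (for example 재 read instead of 제) costs one edit rather than a whole syllable.

```python
from typing import List

HANGUL_BASE, HANGUL_COUNT = 0xAC00, 11172  # precomposed syllables U+AC00..U+D7A3
CHOSEONG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"                 # 19 initial consonants
JUNGSEONG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"           # 21 vowels
JONGSEONG = ["", *"ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ"]  # 27 finals + "none"


def tokenize_jamo(text: str) -> List[str]:
    """Split Hangul syllables into consonant/vowel tokens; keep other characters as-is."""
    tokens: List[str] = []
    for ch in text:
        code = ord(ch) - HANGUL_BASE
        if 0 <= code < HANGUL_COUNT:
            initial, rest = divmod(code, len(JUNGSEONG) * len(JONGSEONG))
            medial, final = divmod(rest, len(JONGSEONG))
            tokens.append(CHOSEONG[initial])
            tokens.append(JUNGSEONG[medial])
            if final:  # index 0 means the syllable has no final consonant
                tokens.append(JONGSEONG[final])
        else:
            tokens.append(ch)
    return tokens
```

For example, `tokenize_jamo("번호")` yields `["ㅂ", "ㅓ", "ㄴ", "ㅎ", "ㅗ"]`.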

The edit distance is a method of calculating how many edits are needed to transform string sequence 1 into string sequence 2 through insertions, substitutions, and deletions. As shown in Figure 23, for example, "relevant" can be turned into "elephant" in three steps: deleting R, inserting P, and replacing V with H.
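The edit distance described above is the standard Levenshtein distance, which can be computed by dynamic programming. The sketch below accepts any sequences, so it works on plain strings or on lists of jamo tokens. It also adds a hypothetical helper, `locate_target`, which picks the recognized text line closest to a target keyword. That matching rule is an assumption for illustration, not the method stated in the text.

```python
from typing import Sequence


def edit_distance(a: Sequence, b: Sequence) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x from a
                curr[j - 1] + 1,         # insert y into a
                prev[j - 1] + (x != y),  # substitute x with y (free if equal)
            ))
        prev = curr
    return prev[-1]


def locate_target(lines: Sequence[Sequence], target: Sequence) -> int:
    """Index of the recognized line with the smallest edit distance to the target."""
    return min(range(len(lines)), key=lambda i: edit_distance(lines[i], target))
```

With this definition, `edit_distance("relevant", "elephant")` is 3, matching the three-step example in Figure 23.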

As shown in FIG. 24, the information extraction unit 40 first finds the location of the target information. Because the numeric information below that location is the target value, the unit then applies a regular expression to that part to extract the number.
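Extracting the number below a located item can be sketched with Python's `re` module. This sketch rests on three assumptions. The recognized lines are ordered top to bottom. The value sits on the line immediately below the located item. The value is a run of digits optionally joined by `-`, `.`, or `/`. The pattern and the name `number_below` are hypothetical. A real layout would more likely use the detected box coordinates.

```python
import re
from typing import List, Optional

# Hypothetical pattern: digits, optionally joined by '-', '.', or '/' (e.g. 21-123456, 2021.05)
NUMBER_PATTERN = re.compile(r"\d+(?:[-./]\d+)*")


def number_below(lines: List[str], label_index: int) -> Optional[str]:
    """Return the first number on the line directly below the located item, if any."""
    if label_index + 1 >= len(lines):
        return None
    match = NUMBER_PATTERN.search(lines[label_index + 1])
    return match.group(0) if match else None
```

For example, with `lines = ["제조번호", "No. 21-123456", "제조년월", "2021.05"]`, `number_below(lines, 0)` returns `"21-123456"` and `number_below(lines, 2)` returns `"2021.05"`.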

Meanwhile, the artificial intelligence character recognition technology used in the equipment specifications recognition device and method can be applied not only to equipment but also to various other fields.

For example, the equipment specifications recognition device and method can be applied to auxiliary equipment of low-voltage watt-hour meters, high-voltage watt-hour meters, MOFs (Metering Out Fit), overhead transformers, and the like. They can also be applied to recognizing equipment number plates at construction sites, extracting modem specifications, recognizing receipts, extracting the contents of contracts and certificates, reading gas meters, and so on.

That is, the equipment specifications recognition device and method according to an embodiment of the present invention can be applied not only to the electric power field but also to the various industries, devices, and everyday situations that require image-based character recognition. The scope of application is not particularly limited.

The equipment specifications recognition device and method according to the embodiment of the present invention described above can minimize human errors such as omitted equipment specifications entries or incorrectly entered values.

Implementations described herein may be implemented, for example, as a method or process, an apparatus, a software program, a data stream, or a signal. Even if discussed only in the context of a single form of implementation (e.g., only as a method), the features discussed may also be implemented in other forms (e.g., an apparatus or a program). An apparatus may be implemented in suitable hardware, software, firmware, and the like. A method may be implemented in an apparatus such as a processor, which refers generally to processing devices including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processors also include communication devices that facilitate communication of information between end users, such as computers, cell phones, portable/personal digital assistants ("PDAs"), and other devices.

Although the present invention has been described with reference to the embodiments shown in the drawings, these are merely illustrative. Those of ordinary skill in the art will understand that various modifications and other equivalent embodiments are possible. Therefore, the true technical scope of protection of the present invention should be determined by the appended claims.

10: Preprocessing unit
20: Detection unit
30: Recognition unit
40: Information extraction unit

Claims

An equipment specifications recognition device comprising: a preprocessing unit that preprocesses an equipment specifications image;
a detection unit that detects an area containing text corresponding to equipment specifications in the equipment specifications image pre-processed by the pre-processing unit;
a recognition unit that converts the area detected by the detection unit into text; and
An information extraction unit that extracts target information from the text recognized by the recognition unit,
The recognition unit converts the area containing the text detected by the detection unit into text using ASN (Attention Based Semantic Reasoning Networks),
The recognition unit is trained to classify text in the equipment specifications image, calculates an embedding loss, a reasoning loss, and a fusion loss, and is trained using the sum of the embedding loss, the reasoning loss, and the fusion loss as the final loss,
Learning data for conversion to text using the ASN includes real data and synthetic data,
The real data is an actual image for recognizing characters of equipment that exists in reality,
The synthetic data includes at least one of Korean words, syllables, alphabets, numbers, and special symbols to complement real data,
The synthetic data is rendered in a plurality of fonts and sizes, blurred, or rendered in a plurality of colors, or is composed of random combinations of characters.

The device of claim 1, wherein the detection unit
detects the position of text in the equipment specifications image preprocessed by the preprocessing unit by learning at least one of an affinity score and a region score.

The device of claim 2, wherein the detection unit
learns the region score for each pixel in the equipment specifications image such that the value approaches 1 the closer the pixel is to the center of a character and approaches 0 the farther away it is.

The device of claim 2, wherein the detection unit
learns, using the affinity score, the probability of being the center between characters with respect to the center points of the characters in the equipment specifications image, in order to determine whether the characters form a word or a sentence.

The device of claim 1, wherein the detection unit
performs training by applying a transfer learning technique.

The device of claim 1, wherein the detection unit
performs hyperparameter optimization (HPO) according to the text in the equipment specifications image.

delete

The device of claim 1, wherein the information extraction unit
tokenizes the text recognized by the recognition unit into units of consonants and vowels, and extracts the location of the target information by calculating an edit distance from the target information.

An equipment specifications recognition method comprising: preprocessing, by a preprocessing unit, an equipment specifications image;
detecting, by a detection unit, an area containing text corresponding to equipment specifications in the equipment specifications image preprocessed by the preprocessing unit;
converting, by a recognition unit, the area detected by the detection unit into text; and
extracting, by an information extraction unit, target information from the text recognized by the recognition unit, wherein
in the step of converting the area into text, the area containing the text detected by the detection unit is converted into text using ASN (Attention Based Semantic Reasoning Networks),
in the step of converting the area into text, the recognition is trained to classify the text in the equipment specifications image, an embedding loss, a reasoning loss, and a fusion loss are calculated, and the sum of the embedding loss, the reasoning loss, and the fusion loss is learned as the final loss,
Learning data for conversion to text using the ASN includes real data and synthetic data,
The real data is an actual image for recognizing characters of equipment that exists in reality,
The synthetic data includes at least one of Korean words, syllables, alphabets, numbers, and special symbols to complement real data,
The synthetic data is rendered in a plurality of fonts and sizes, blurred, or rendered in a plurality of colors, or is composed of random combinations of characters.

The method of claim 10, wherein the step of detecting an area containing text corresponding to the equipment specifications includes:
detecting the position of text in the equipment specifications image preprocessed by the preprocessing unit by learning at least one of an affinity score and a region score.

The method of claim 11, wherein the step of detecting an area containing text corresponding to the equipment specifications includes:
learning the region score for each pixel in the equipment specifications image such that the value approaches 1 the closer the pixel is to the center of a character and approaches 0 the farther away it is.

The method of claim 11, wherein the step of detecting an area containing text corresponding to the equipment specifications includes:
learning, using the affinity score, the probability of being the center between characters with respect to the center points of the characters in the equipment specifications image, in order to determine whether the characters form a word or a sentence.

The method of claim 10, wherein the step of detecting an area containing text corresponding to the equipment specifications includes:
performing training by applying a transfer learning technique.

The method of claim 10, wherein the step of detecting an area containing text corresponding to the equipment specifications includes:
performing hyperparameter optimization (HPO) according to the text in the equipment specifications image.

delete

The method of claim 10, wherein extracting target information from the text comprises:
tokenizing the text recognized by the recognition unit into units of consonants and vowels, and extracting the location of the target information by calculating an edit distance from the target information.