KR20220151777A

KR20220151777A - Method and device for classifying building defect using multi task channel attention

Info

Publication number: KR20220151777A
Application number: KR1020210058971A
Authority: KR
Inventors: 김하영; 양동욱; 안용한; 이상효; 신현규
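Original assignee: Yonsei University Industry-Academic Cooperation Foundation (연세대학교 산학협력단); Hanyang University ERICA Industry-Academic Cooperation Foundation (한양대학교 에리카산학협력단)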
Priority date: 2021-05-07
Filing date: 2021-05-07
Publication date: 2022-11-15
Also published as: KR102501730B1

Abstract

The present invention relates to a method and device for classifying building defects using multi-task channel attention. According to an embodiment of the present invention, the method and device can classify a building defect into categories using only a short text. The method includes: receiving, by an analysis device, text data about a building defect; extracting a plurality of shared features, by a plurality of first encoders that have filters of different sizes and receive the text data; extracting a plurality of task features, by a plurality of second encoders that receive a value obtained by concatenating the features output by the first encoders; and outputting, by a plurality of classifiers, classification values for different items from the values output by the respective task-specific feature encoders.

Description

Building defect classification method and apparatus using multi-task channel attention {METHOD AND DEVICE FOR CLASSIFYING BUILDING DEFECT USING MULTI TASK CHANNEL ATTENTION}

The disclosed technology relates to a method and apparatus for classifying building defects into categories using multi-task channel attention.

Various defects, such as cracks or leaks, may occur on the exterior or in the interior of a building for a variety of reasons. When such a defect occurs, it must be repaired in a manner suited to its characteristics and type in order to keep the building safe and preserve its service life.

Conventionally, when a defect occurs in a building, a building manager or resident roughly identifies the symptoms and conveys them to a repair worker orally or in handwriting, or the worker goes to the site to inspect the defect in person. The worker therefore either has to prepare repair work, which may turn out to be unnecessary, on the basis of ambiguous secondhand information, or has to visit the site in person to prepare; both make the work inefficient.

Meanwhile, many techniques identify the condition of a building defect by feeding images of the defect location into a deep learning model, but no technique identifies building defects from natural-language text. A technique that identifies defects from text data describing them is therefore needed.

Korean Patent Registration No. 10-2221317

The disclosed technology provides a method and apparatus for classifying building defects into categories using multi-task channel attention.

To achieve the above object, a first aspect of the disclosed technology provides an automatic building defect classification method including: receiving, by an analysis device, text data about a building defect; receiving the text data at a plurality of first encoders having filters of different sizes, each of which extracts shared features; receiving, at a plurality of second encoders, a value obtained by concatenating the features output by the first encoders, and extracting a plurality of task features; and receiving, at a plurality of classifiers, the values output by the respective task-specific feature encoders, each classifier outputting a classification value for a different item.

To achieve the above object, a second aspect of the disclosed technology provides a building defect classification device including: an input device that receives text data about a building defect; a storage device that stores a deep learning model including a plurality of first encoders having filters of different sizes, a plurality of second encoders, and a plurality of classifiers; and an arithmetic device that inputs the text data to the first encoders to extract a plurality of different shared features, inputs a value obtained by concatenating the shared features to the second encoders to extract a plurality of task features, and inputs the task features to the respective classifiers to output classification values for different items.

Embodiments of the disclosed technology may have effects including the following advantages. However, this does not mean that every embodiment must provide all of them, so the scope of the disclosed technology should not be understood as limited thereby.

According to an embodiment of the disclosed technology, the building defect classification method and apparatus using multi-task channel attention can classify building defects by category using only a short text.

In addition, high-level features can be extracted without increasing the depth of the convolutional layers.

In addition, even if one task is noisy, features shared from the other tasks help preserve the classification accuracy for that task.

FIG. 1 is a diagram illustrating a process of classifying building defects using multi-task channel attention according to an embodiment of the disclosed technology.
FIG. 2 is a flowchart of a building defect classification method according to an embodiment of the disclosed technology.
FIG. 3 is a block diagram of a building defect classification device according to an embodiment of the disclosed technology.
FIG. 4 is a diagram showing the structure of a deep learning model using multi-task channel attention according to an embodiment of the disclosed technology.
FIG. 5 is a diagram illustrating a process of processing embedded vectors in parallel according to an embodiment of the disclosed technology.
FIG. 6 is a diagram showing the detailed structure of an encoder.

Since the present invention may be modified in various ways and may have various embodiments, specific embodiments are illustrated in the drawings and described in detail below. However, this is not intended to limit the present invention to those specific embodiments, and the invention should be understood to include all modifications, equivalents, and substitutes within its spirit and scope.

Terms such as first, second, A, and B may be used to describe various components, but the components are not limited by these terms; the terms are used only to distinguish one component from another. For example, a first component may be termed a second component, and similarly a second component may be termed a first component, without departing from the scope of the present invention. The term "and/or" includes any combination of a plurality of related listed items or any one of them.

As used herein, singular expressions include plural expressions unless the context clearly indicates otherwise. Terms such as "includes" specify the presence of the stated features, numbers, steps, operations, components, parts, or combinations thereof, but do not preclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Before the drawings are described in detail, note that the components in this specification are divided only by the main function each is responsible for. That is, two or more components described below may be combined into one component, or one component may be divided into two or more components according to more finely divided functions.

In addition, each component described below may perform some or all of the functions of other components in addition to its own main function, and some of the main functions of a component may instead be performed exclusively by another component. Accordingly, whether each component described in this specification exists should be interpreted functionally.

FIG. 1 is a diagram illustrating a process of classifying building defects using multi-task channel attention according to an embodiment of the disclosed technology. Referring to FIG. 1, the analysis device receives text data about a building defect and analyzes it with a deep learning model to classify the defect into categories.

The text data on a building defect received by the analysis device is text handwritten or typed by a user, for example a short text in which the user describes a defect in a building. FIG. 1 assumes that the short text "욕실 바닥 방수불량으로 오염" ("contamination due to poor waterproofing of the bathroom floor") is input as text data. The analysis device may include a device such as a keyboard or touch pad through which the user enters text directly, or it may receive text data transmitted from the user's terminal.

Meanwhile, the analysis device may embed the input text data into a form that can be fed to the deep learning model. The deep learning model includes two different embedding models, for example Word2vec and FastText. Unlike the two-dimensional convolutional neural network (2D CNN) conventionally used for image or video processing, which convolves a matrix according to the kernel size, the deep learning model of the disclosed technology may use a one-dimensional convolutional neural network (1D CNN) for natural language processing. That is, the kernel moves only in the height direction (along the sequence of tokens) instead of across the matrix in both directions. The analysis device may tokenize the sentences in the text data as part of embedding.
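The height-only movement of the kernel can be sketched as follows. This is a minimal illustration, assuming the embedded text is a matrix whose rows are tokens and whose columns are embedding dimensions, that each kernel spans the full embedding width, and that convolution uses stride 1 with no padding; the function name `conv1d_text` and the toy vectors are hypothetical and are not Word2vec or FastText output.

```python
from typing import List

Matrix = List[List[float]]  # rows = tokens, columns = embedding dimensions


def conv1d_text(embedded: Matrix, kernels: List[Matrix]) -> Matrix:
    """Slide each kernel only along the token (height) axis.

    Each kernel has shape (k, d): it covers k consecutive tokens and the full
    embedding width d, so it can move in one direction only. Stride 1, no
    padding. Returns an (n - k + 1) x len(kernels) feature map.
    """
    n, d = len(embedded), len(embedded[0])
    k = len(kernels[0])
    out = []
    for start in range(n - k + 1):          # positions along the token axis
        row = []
        for kern in kernels:                # one output channel per kernel
            s = 0.0
            for i in range(k):
                for j in range(d):
                    s += embedded[start + i][j] * kern[i][j]
            row.append(s)
        out.append(row)
    return out


toy = [[1, 0], [0, 1], [1, 1], [2, 0]]  # 4 tokens, 2-dim toy embeddings
print(conv1d_text(toy, [[[1, 0], [0, 1]], [[1, 1], [1, 1]]]))  # [[2.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
```

Because the kernel already covers every embedding dimension, the only free direction is down the token sequence, which is what makes the operation one-dimensional.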

Text data embedded into a form suitable for the deep learning model is first input to the model's shared feature encoder, which consists of a plurality of encoders arranged in parallel. Hereinafter, the shared feature encoder is referred to as the first encoder. The first encoders receive the embedded text data and each extract shared features. These shared features are then used to train the task-specific feature encoders described below.

The first encoder includes, in parallel, a plurality of convolutional layers, each connected to a filter of a different size. That is, the first encoder contains parallel convolutional layers, and a different shared feature is extracted through each layer's differently sized filter. Each layer contains a convolution block and a TSE (Text Squeeze and Excitation) block. The convolution block may be a sequence of a fixed number of convolutional layers, and the TSE block, which performs channel attention to improve encoder performance, may be an attention layer connected to the convolutional layers. Each first encoder may use its TSE block to compute an attention score for each shared feature.
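What "different filter sizes extract different shared features" means at the token level can be shown by listing the windows each parallel branch's filter covers. This is a minimal sketch, assuming stride 1 and no padding; the English tokenization of the FIG. 1 example and the kernel sizes 2, 3, and 4 are hypothetical choices, since the source fixes neither.

```python
from typing import Dict, List, Sequence, Tuple


def branch_windows(tokens: Sequence[str],
                   kernel_sizes: Sequence[int]) -> Dict[int, List[Tuple[str, ...]]]:
    """For each parallel branch (one kernel size k), list the k-token windows
    its filter slides over, assuming stride 1 and no padding."""
    return {k: [tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)]
            for k in kernel_sizes}


# Hypothetical tokenization of the FIG. 1 example text.
tokens = ["bathroom", "floor", "waterproofing-defect", "contamination"]
for k, windows in branch_windows(tokens, (2, 3, 4)).items():
    print(k, windows)
```

The k = 2 branch sees word pairs such as ("floor", "waterproofing-defect"), while the k = 4 branch sees the whole phrase at once, so the branches respond to different n-gram patterns of the same input.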

Each first encoder includes one or more repetitions of a convolution block followed by a TSE block. For example, a convolution block and a TSE block may form one set, and that set may be repeated. Preferably, two sets are used so that the encoder does not become longer than necessary. The embedded text data is input to the first set, and its output is fed to the next set to extract features. In one embodiment, each first encoder feeds the output of its first convolution block and first TSE block into its second convolution block and second TSE block, so that each encoder extracts a different shared feature. Using the output of the first set as the input of the second set in this way allows high-level features of the text data to be extracted.
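The two-set layout of one branch can be sketched as a loop that feeds each (convolution block, TSE block) output into the next set. This is a minimal sketch with random, untrained weights, assuming each convolution block is a single valid 1D convolution followed by ReLU and that the TSE block rescales channels by its attention scores; the names `conv_block`, `tse_block`, `make_set`, and `branch_encoder` are hypothetical.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]  # rows = positions, columns = channels
BlockSet = Tuple[List[Matrix], List[float], Matrix]  # (conv kernels, conv bias, TSE FC weights)


def sigmoid(v: float) -> float:
    # Numerically stable logistic function.
    return 1.0 / (1.0 + math.exp(-v)) if v >= 0 else math.exp(v) / (1.0 + math.exp(v))


def conv_block(x: Matrix, kernels: List[Matrix], bias: List[float]) -> Matrix:
    """Valid 1D convolution along positions (stride 1, no padding) followed by ReLU."""
    k = len(kernels[0])
    return [[max(0.0, b + sum(a * w for rx, rk in zip(x[t:t + k], kern) for a, w in zip(rx, rk)))
             for kern, b in zip(kernels, bias)]
            for t in range(len(x) - k + 1)]


def tse_block(u: Matrix, fc: Matrix) -> Matrix:
    """Squeeze by global average pooling, excite with one FC layer + sigmoid, rescale channels."""
    z = [sum(row[c] for row in u) / len(u) for c in range(len(u[0]))]
    s = [sigmoid(sum(w * zc for w, zc in zip(fc_row, z))) for fc_row in fc]
    return [[v * s[c] for c, v in enumerate(row)] for row in u]


def make_set(in_ch: int, out_ch: int, k: int, rng: random.Random) -> BlockSet:
    """Randomly initialised parameters for one (convolution block, TSE block) set."""
    kernels = [[[rng.gauss(0.0, 0.3) for _ in range(in_ch)] for _ in range(k)] for _ in range(out_ch)]
    fc = [[rng.gauss(0.0, 0.3) for _ in range(out_ch)] for _ in range(out_ch)]
    return kernels, [0.0] * out_ch, fc


def branch_encoder(x: Matrix, sets: List[BlockSet]) -> Matrix:
    """Feed the output of each (conv, TSE) set into the next set."""
    for kernels, bias, fc in sets:
        x = tse_block(conv_block(x, kernels, bias), fc)
    return x
```

With 12 tokens of 8-dimensional embeddings and two sets of kernel size 3 and 16 channels, each valid convolution shortens the sequence by 2, so the branch returns a 8 x 16 feature map.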

As described above, the TSE block corresponds to an attention layer, which includes a global average pooling (GAP) layer and a fully connected layer. The attention layer performs a squeeze operation through the global average pooling layer, performs an excitation operation by multiplying the squeezed result once by the parameters of the fully connected layer, and applies a sigmoid function to the excitation result to compute the attention score. As a result, each extracted feature is assigned an attention score between 0 and 1.
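The three TSE steps can be written out directly: squeeze z_c as the mean of channel c over all positions, excitation e = W z + b with a single fully connected layer, and score s_c = sigmoid(e_c). This is a minimal sketch; the bias b and the final channel rescaling in `rescale` are assumptions borrowed from the standard squeeze-and-excitation design, since the source only states that each feature receives a score between 0 and 1.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = positions, columns = channels


def sigmoid(v: float) -> float:
    # Numerically stable logistic function; the output lies strictly between 0 and 1.
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def tse_attention(u: Matrix, w: Matrix, b: List[float]) -> List[float]:
    """Per-channel attention scores for a feature map u.

    squeeze:    z_c = mean over positions of u[t][c]   (global average pooling)
    excitation: e = W z + b                            (one fully connected layer)
    score:      s_c = sigmoid(e_c)
    """
    n_pos, n_ch = len(u), len(u[0])
    z = [sum(u[t][c] for t in range(n_pos)) / n_pos for c in range(n_ch)]
    e = [sum(w[i][c] * z[c] for c in range(n_ch)) + b[i] for i in range(len(w))]
    return [sigmoid(v) for v in e]


def rescale(u: Matrix, s: List[float]) -> Matrix:
    """Weight every channel of u by its attention score (assumed, as in standard SE blocks)."""
    return [[v * s[c] for c, v in enumerate(row)] for row in u]
```

For u = [[1, 2], [3, -2]] with an identity W and zero bias, the squeeze gives z = [2, 0], so the scores are sigmoid(2) (about 0.88) and sigmoid(0) = 0.5, and rescaling halves the second channel.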

Conventionally, high-level features were extracted by repeating a block structure beyond a certain depth to improve feature extraction. In contrast, the disclosed technology extracts features through parallel layers with filters (kernels) of different sizes, so high-level features can be extracted without making the convolutional layers deeper than necessary. Because this has an effect similar to increasing the number of samples, the local features contained in the text data can be extracted sufficiently.
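The depth trade-off can be made concrete with the standard receptive-field count for stacked one-dimensional convolutions. Assuming stride 1, no dilation, and a single convolutional layer per block (details the source does not fix), a stack of m layers with kernel size k covers r consecutive tokens. With the two-set layout (m = 2) and hypothetical branch kernel sizes k = 2, 3, 4, the parallel branches cover 3, 5, and 7 tokens, whereas a single branch with k = 2 would need six stacked layers to reach 7:

```latex
\begin{aligned}
r(m, k) &= m\,(k - 1) + 1 \\
r(2, 2) &= 3, \quad r(2, 3) = 5, \quad r(2, 4) = 7 \\
r(6, 2) &= 7
\end{aligned}
```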

The extracted shared features are used as inputs to a plurality of task-specific feature encoders. Hereinafter, the task-specific feature encoder is referred to as the second encoder. The second encoders receive the concatenation of the shared features and each extract task features. Like the shared feature encoder, each second encoder includes a plurality of convolution blocks connected in parallel and a plurality of TSE blocks, each connected to a convolution block; that is, the two encoder types have the same structure. The difference is that the first encoders use filters of different sizes, whereas the second encoders use filters of the same size.
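The hand-off from the first to the second encoders can be sketched as a channel-wise concatenation followed by one task encoder per task, all with the same kernel size. This is a minimal sketch under several assumptions not stated in the source: every convolution uses zero "same" padding so that branch outputs have equal length and can be concatenated along the channel axis, kernel sizes are odd (3, 5, 7 for the shared branches and 3 for every task encoder), weights are random and untrained, and TSE blocks are omitted for brevity. All names are hypothetical.

```python
import random
from typing import Dict, List

Matrix = List[List[float]]  # rows = positions, columns = channels


def conv1d_same(x: Matrix, kernels: List[Matrix]) -> Matrix:
    """1D convolution along positions with zero 'same' padding; odd kernel sizes only."""
    k = len(kernels[0])
    assert k % 2 == 1, "this sketch pads symmetrically, so k must be odd"
    zero = [0.0] * len(x[0])
    padded = [zero] * (k // 2) + x + [zero] * (k // 2)
    return [[sum(a * w for rx, rk in zip(padded[t:t + k], kern) for a, w in zip(rx, rk))
             for kern in kernels]
            for t in range(len(x))]


def random_kernels(num: int, k: int, d: int, rng: random.Random) -> List[Matrix]:
    """num untrained kernels of shape (k, d)."""
    return [[[rng.gauss(0.0, 0.3) for _ in range(d)] for _ in range(k)] for _ in range(num)]


def concat_channels(features: List[Matrix]) -> Matrix:
    """Join the shared features position by position along the channel axis."""
    return [[v for f in features for v in f[t]] for t in range(len(features[0]))]


def task_encoders(shared: Matrix, tasks: List[str], k: int, ch: int,
                  rng: random.Random) -> Dict[str, Matrix]:
    """One task-specific encoder per task; every encoder uses the same kernel size k."""
    d = len(shared[0])
    return {task: conv1d_same(shared, random_kernels(ch, k, d, rng)) for task in tasks}
```

Three shared branches with 4 channels each yield a 12-channel concatenation, and every task encoder sees all 12 channels, which is how each task can draw on features learned for the others.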

Each second encoder may use its TSE blocks to compute attention scores for its task features. Each second encoder feeds the output of its first convolution block and first TSE block into its second convolution block and second TSE block to extract task features. Because the input is the concatenation of the shared features, the expressiveness of the feature maps relevant to each task can be improved. That is, since each task can learn from features shared with the other tasks, task features can be extracted successfully even when the input contains noise, and the accuracy of the extracted task features can be improved.

The analysis device outputs classification values for different items through a plurality of classifiers. The classification tasks may include the type of work required for the defect, the location of the defect, the type of the defect, and the attribute of the defect. In the example of FIG. 1, if the model's output is "bathroom floor waterproofing work", the work type is classified as "waterproofing work", the defect location as "bathroom", the defect type as "water leak", and the defect attribute as "floor".
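Decoding the four classifier outputs into the FIG. 1 labels can be sketched as an independent softmax and argmax per head. This is a minimal sketch: the label vocabularies beyond the FIG. 1 labels, the task keys, and the logits in the usage example are made up for illustration; the source names only the four items.

```python
import math
from typing import Dict, List

# Hypothetical label vocabularies; only the first label in each list comes from FIG. 1.
LABELS: Dict[str, List[str]] = {
    "work_type": ["waterproofing work", "tiling work", "painting work"],
    "location": ["bathroom", "kitchen", "living room"],
    "defect_type": ["water leak", "crack", "contamination"],
    "attribute": ["floor", "wall", "ceiling"],
}


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits_per_task: Dict[str, List[float]]) -> Dict[str, str]:
    """Decode each classifier head independently by picking its most probable label."""
    result = {}
    for task, logits in logits_per_task.items():
        probs = softmax(logits)
        result[task] = LABELS[task][probs.index(max(probs))]
    return result


# Made-up logits, one list per classifier head.
print(decode({
    "work_type": [2.0, 0.1, -1.0],
    "location": [3.0, 0.0, 0.5],
    "defect_type": [1.5, 0.2, 0.9],
    "attribute": [0.7, 0.1, 0.0],
}))
```

Because each head has its own vocabulary and softmax, the items are predicted side by side rather than as a single combined label.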

The number of items determines how many second encoders and classifiers are arranged in parallel. For example, with four tasks there may be four second-encoder layers connected in parallel and four classifiers connected in parallel, whereas the number of first encoders need not be four. In this way, the different items describing a defect in the building-defect text data can be classified simultaneously.

FIG. 2 is a flowchart of a building defect classification method according to an embodiment of the disclosed technology. Referring to FIG. 2, the building defect classification method 200 may be performed sequentially by a deep learning model running on an analysis device, and includes receiving text data (210), extracting a plurality of different shared features (220), extracting a plurality of task features (230), and outputting a classification value for each item (240).

In step 210, the analysis device receives text data about a building defect. The text data consists of a short text describing a defect in the building and may be written by a building manager or resident for delivery to a repair worker. The analysis device may receive the text data from a transmitting device or accept it as direct input.

In step 220, the analysis device passes the text data to each of a plurality of shared feature encoders, which are the first encoders of the deep learning model. Each shared feature encoder extracts shared features from the input text data. The shared feature encoders are convolutional layers with filters of different sizes connected in parallel, and each layer consists of a convolution block and a TSE block. Because the filter sizes differ, each layer can extract different shared features.

In step 230, a plurality of task-specific feature encoders, which are the second encoders of the deep learning model, receive the concatenation of the shared features and extract a plurality of task features. Because the shared feature encoders output, in step 220, results that include an attention score for each shared feature, the task-specific feature encoders can refer to those scores to extract the features suited to each task.

In step 240, the classifiers of the deep learning model each classify an item from the task features. To classify each task into its own item simultaneously, the classifiers may pool only the most important feature per channel from the task features. Importance is determined by the attention scores computed by the TSE blocks in the task-specific feature encoders; for example, the feature with the highest attention score may be pooled. Because the classifiers connected in parallel each perform this process, item classification is multitasked.
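One reading of "pool only the most important feature per channel" is global max pooling over positions applied to the attention-rescaled task feature map, followed by a fully connected layer and softmax. This is a minimal sketch under that assumption; the source does not spell out the pooling operator or the layer after it, and `classifier_head` and `classify_all` are hypothetical names.

```python
import math
from typing import Dict, List, Tuple

Matrix = List[List[float]]  # rows = positions, columns = channels (after TSE rescaling)


def global_max_pool(feature_map: Matrix) -> List[float]:
    """Keep only the strongest value of each channel across all positions."""
    return [max(row[c] for row in feature_map) for c in range(len(feature_map[0]))]


def classifier_head(feature_map: Matrix, weights: Matrix, bias: List[float]) -> List[float]:
    """Max-pool per channel, apply a fully connected layer, return softmax probabilities."""
    pooled = global_max_pool(feature_map)
    logits = [sum(w * p for w, p in zip(w_row, pooled)) + b for w_row, b in zip(weights, bias)]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_all(task_maps: Dict[str, Matrix],
                 heads: Dict[str, Tuple[Matrix, List[float]]]) -> Dict[str, List[float]]:
    """Run every task's head independently; the loop stands in for the parallel classifiers."""
    return {task: classifier_head(fm, *heads[task]) for task, fm in task_maps.items()}
```

For the feature map [[0.2, 0.9], [0.7, 0.1], [0.4, 0.3]], pooling keeps [0.7, 0.9], one value per channel, regardless of how many positions the text had.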

FIG. 3 is a block diagram of a building defect classification device according to an embodiment of the disclosed technology. Referring to FIG. 3, the building defect classification device 300 includes an input device 310, a storage device 320, and an arithmetic device 330.

The input device 310 receives text data about a building defect. It may include a communication module and an input interface for receiving text data: for example, it may receive text data transmitted from another device through the communication module, or text entered by a user through the input interface. The communication module may support short-range wireless communication such as Bluetooth, or mobile communication such as 4G and 5G, and the input interface may be a device such as a keyboard or touch pad.

The storage device 320 stores the deep learning model. The stored model is trained to process tokenized text and includes a plurality of shared feature encoders, a plurality of task-specific feature encoders, and a plurality of classifiers. The shared feature encoders have filters of different sizes, and the task-specific feature encoders have filters of the same size. The storage device 320 may be implemented as memory with enough capacity to store the model.

The arithmetic device 330 may input text data to the deep learning model to extract features and classify the item for each feature. It may be implemented as an application processor (AP) or another processor capable of performing the classification computations. The arithmetic device 330 first embeds the building-defect text data into a form suitable for the model. It then inputs the embedded text data to each of the shared feature encoders, which have filters of different sizes, to extract a plurality of different shared features. Next, it inputs the concatenation of the shared features to the task-specific feature encoders to extract a plurality of task features, and inputs each task feature to its classifier to output classification values for the different items.
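The arithmetic device's sequence of operations (embed, shared encoders, concatenate, task encoders, classifiers) can be traced end to end in a compact sketch. All weights here are random and untrained, so the predicted labels are meaningless; the sketch only shows the data flow and shapes. Assumptions not taken from the source: a single hypothetical random embedding table stands in for the two embedding models (Word2vec and FastText), whose combination the source does not specify; convolutions use zero "same" padding with ReLU; shared kernel sizes are 3, 5, 7 and every task encoder uses kernel size 3; each encoder has two (convolution, TSE) sets; and classifiers use per-channel max pooling.

```python
import math
import random
from typing import Dict, List

Matrix = List[List[float]]  # rows = positions, columns = channels


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v)) if v >= 0 else math.exp(v) / (1.0 + math.exp(v))


def randm(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 0.3) for _ in range(cols)] for _ in range(rows)]


def conv_same(x: Matrix, kernels: List[Matrix]) -> Matrix:
    """Zero-padded ('same') 1D convolution along positions + ReLU; odd kernel sizes only."""
    k = len(kernels[0])
    zero = [0.0] * len(x[0])
    xp = [zero] * (k // 2) + x + [zero] * (k // 2)
    return [[max(0.0, sum(a * w for rx, rk in zip(xp[t:t + k], kern) for a, w in zip(rx, rk)))
             for kern in kernels] for t in range(len(x))]


def tse(u: Matrix, fc: Matrix) -> Matrix:
    """Squeeze (GAP), excite (one FC layer + sigmoid), rescale each channel."""
    z = [sum(r[c] for r in u) / len(u) for c in range(len(u[0]))]
    s = [sigmoid(sum(w * zc for w, zc in zip(row, z))) for row in fc]
    return [[v * s[c] for c, v in enumerate(r)] for r in u]


def encoder(x: Matrix, k: int, ch: int, rng: random.Random, sets: int = 2) -> Matrix:
    """`sets` repetitions of (convolution block, TSE block) with kernel size k."""
    for _ in range(sets):
        x = tse(conv_same(x, [randm(k, len(x[0]), rng) for _ in range(ch)]), randm(ch, ch, rng))
    return x


def classify(tokens: List[str], labels: Dict[str, List[str]], seed: int = 0) -> Dict[str, str]:
    rng = random.Random(seed)
    table: Dict[str, List[float]] = {}  # hypothetical stand-in for trained embeddings
    x = [table.setdefault(t, [rng.gauss(0.0, 1.0) for _ in range(8)]) for t in tokens]
    shared = [encoder(x, k, 4, rng) for k in (3, 5, 7)]              # first encoders
    cat = [[v for f in shared for v in f[t]] for t in range(len(x))]  # concatenation
    out = {}
    for task, names in labels.items():
        feat = encoder(cat, 3, 6, rng)                                # second encoder, same k
        pooled = [max(r[c] for r in feat) for c in range(len(feat[0]))]
        w = randm(len(names), len(pooled), rng)                       # classifier weights
        logits = [sum(a * b for a, b in zip(row, pooled)) for row in w]
        out[task] = names[logits.index(max(logits))]
    return out
```

In a real system the embeddings and all encoder and classifier weights would be loaded from the trained model in the storage device 320 rather than drawn at random.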

The building defect classification device 300 described above may also be implemented as a program (or application) containing an algorithm executable on a computer. The program may be stored and provided on a transitory or non-transitory computer-readable medium.

A non-transitory readable medium is not a medium that stores data briefly, such as a register, cache, or memory, but one that stores data semi-permanently and can be read by a device. Specifically, the various applications or programs described above may be stored and provided on a non-transitory readable medium such as a CD, DVD, hard disk, Blu-ray disc, USB drive, memory card, ROM (read-only memory), PROM (programmable read-only memory), EPROM (erasable PROM), EEPROM (electrically erasable PROM), or flash memory.

A transitory readable medium refers to various types of RAM, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synclink DRAM (SLDRAM), and direct Rambus RAM (DRRAM).

FIG. 4 is a diagram showing the structure of a deep learning model using multi-task channel attention according to an embodiment of the disclosed technology. Conventional convolutional neural networks for natural language processing improve performance by increasing the depth of the convolutional layers to build a spatial hierarchy and extracting high-level features through it. The advantage of this architecture is that low-level convolutional layers extract local features and learn general patterns, while high-level convolutional layers learn comprehensive patterns across the entire input. However, when the input text is short, it carries too little information, so increasing the depth of the model does not yield sufficient performance.

Therefore, a model is required that can efficiently extract features using layers of only limited depth. The deep learning model used in the disclosed technology, named the AutoDefect model, is designed to solve this problem. The AutoDefect model has a structure in which a plurality of layers are connected in parallel. By extracting features with two stages of parallel encoders, it can ensure sufficient performance even when short text data is input.

In addition, high performance can be achieved with a single model whose branches are connected in parallel, rather than designing an independent model for each task. For example, performing the feature extraction for all tasks simultaneously in parallel has the effect of increasing the number of samples, which helps prevent overfitting. Furthermore, the TSE block reflects the relative importance (attention score) of each channel in the feature map of each layer, so the model learns meaningful information and its feature extraction capability is maximized.

Meanwhile, as shown in FIG. 4, the TSE block is a Squeeze-and-Excitation Network (SE Net) redesigned for 1D CNN-based text modeling. SE Net is a network that extracts only the important information of each channel and recalibrates the channels according to their interdependencies. Similarly, the TSE block extracts features from the input text data for each channel and calculates an attention score, which is a weight for each channel. The attention score is defined by Equation 1 below.

[Equation 1]

$$\mathbf{s} = \sigma(\mathbf{W}\mathbf{z})$$

where $\mathbf{z} \in \mathbb{R}^{C}$ is the channel descriptor obtained by global average pooling (Equation 2), $\mathbf{W} \in \mathbb{R}^{C \times C}$ is the weight matrix of the fully connected layer, $\sigma$ is the sigmoid function, and $C$ is the number of channels.

In the TSE block, global average pooling (GAP) is the operation that summarizes the spatial information of each channel of the feature map $\mathbf{U} = [\mathbf{u}_1, \dots, \mathbf{u}_C] \in \mathbb{R}^{L \times C}$ output by the 1D CNN. It is a modification of the GAP in SE Net and is defined by Equation 2 below.

[Equation 2]

$$z_c = \frac{1}{L}\sum_{i=1}^{L} u_c(i), \quad c = 1, \dots, C$$

where $u_c(i)$ is the value of channel $c$ at position $i$ and $L$ is the length of the feature map.
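The one-dimensional GAP (the squeeze step) simply averages each channel over the sequence positions. A minimal Python sketch, assuming the feature map is stored as a list of L rows with C channel values each; the function name is illustrative, not from the source:

```python
def global_average_pool(feature_map):
    """Squeeze step: average each channel over the sequence length.

    feature_map is a list of L rows (positions), each holding C channel
    values, i.e. U in R^{L x C}. Returns z with one mean per channel.
    """
    length = len(feature_map)
    channels = len(feature_map[0])
    return [sum(row[c] for row in feature_map) / length for c in range(channels)]


U = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(global_average_pool(U))  # [3.0, 4.0]
```

Unlike the image case in SE Net, which averages over height and width, the text version averages over a single positional axis.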

The operation of the TSE block corresponding to the excitation of SE Net is changed so that the relationship between channels is calculated directly, without a bottleneck. Removing the bottleneck means omitting the step of projecting the features into a lower-dimensional space and then mapping them back. In the conventional SE Net, an additional low-dimensional mapping is performed to reduce the cost of processing images; the TSE block used in the disclosed technology is basically similar to SE Net, but removes this bottleneck so that it is suited to extracting features from short texts. Because of this change, the TSE block does not lose the connections to its input values, and its attention scores can represent clear channel-wise relationship information. The attention score of the TSE block is calculated by multiplying the output of the GAP once by the weight matrix of the fully connected layer and applying the sigmoid activation function. Multiplying the feature map of each channel of the input by its attention score is performed in the same way as in SE Net. This process, which recalibrates channel-level information instead of spatial information, helps to efficiently extract features from short texts such as text data about building defects.
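The squeeze, bottleneck-free excitation, and channel rescaling described above can be sketched in plain Python. This is a minimal illustration under stated assumptions: the fully connected layer has no bias, the weight matrix is square (C x C), and all names are hypothetical. A helper also counts the weights of SE Net's C -> C/r -> C bottleneck against the single C x C layer, with the reduction ratio r as a free parameter.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def tse_block(feature_map, weight):
    """Text squeeze-and-excitation without a bottleneck (sketch).

    feature_map: L rows x C channels (the 1D CNN output).
    weight: C x C matrix of a single fully connected layer (bias omitted).
    Returns (rescaled feature map, per-channel attention scores).
    """
    length, channels = len(feature_map), len(feature_map[0])
    # Squeeze: global average pooling over the L positions -> z (length C).
    z = [sum(row[c] for row in feature_map) / length for c in range(channels)]
    # Excitation: one C x C multiplication, then sigmoid -> s = sigmoid(W z).
    scores = [sigmoid(sum(weight[i][j] * z[j] for j in range(channels)))
              for i in range(channels)]
    # Scale: multiply every position of channel c by its score s_c.
    rescaled = [[row[c] * scores[c] for c in range(channels)]
                for row in feature_map]
    return rescaled, scores


def excitation_weight_counts(channels, reduction):
    """Weights (no biases) in SE's C -> C/r -> C bottleneck vs. TSE's C -> C."""
    hidden = channels // reduction
    se_bottleneck = channels * hidden + hidden * channels
    tse_direct = channels * channels
    return se_bottleneck, tse_direct


out, s = tse_block([[2.0, 4.0], [6.0, 8.0]], [[0.0, 0.0], [0.0, 0.0]])
print(s)    # [0.5, 0.5]
print(out)  # [[1.0, 2.0], [3.0, 4.0]]
```

With C = 64 and a hypothetical r = 16, `excitation_weight_counts(64, 16)` gives (512, 4096): the direct layer uses more weights but maps every channel to every other channel without squeezing through a smaller hidden layer, which is the trade-off the text describes.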

Meanwhile, the size of the input dimension of the AutoDefect model is $L \times d$, where $L$ is the length of the sentence and $d$ is the dimension of the word embedding vector. To construct the data to be input to the model, a pre-trained embedding matrix can be looked up to convert the integer-encoded word tokens of the text into real-valued embedding vectors. That is, instead of a high-dimensional matrix over the unique word tokens, low-dimensional embedding vectors of dimension $d$ are used. In addition, a plurality of different embedding models may be used instead of a single embedding model. Each such model may take the form of an embedding matrix, and two different models may be used together to embed the text. For example, text can be embedded using both Word2vec and FastText. With Word2vec and FastText, the input dimension of the AutoDefect model becomes 600, the sum of the two 300-dimensional embedding vectors. Such large word embeddings can be useful because short texts provide insufficient input information, and embedding information for a word that has no embedding vector in one matrix can be obtained from the other matrix.
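The two-table lookup can be sketched as a per-token concatenation. This is a minimal illustration, not the authors' code: the two tables are hypothetical Python dicts standing in for the pretrained Word2vec and FastText matrices, and filling a missing half with zeros is an assumption about how a word absent from one matrix is handled.

```python
def embed_tokens(tokens, word2vec, fasttext, dim):
    """Build the L x (2*dim) input matrix from two embedding tables (sketch).

    word2vec, fasttext: hypothetical dicts mapping a token to `dim` floats,
    standing in for the two pretrained embedding matrices.
    A token missing from one table gets zeros for that half (an assumption),
    so the other table can still supply information about it.
    """
    zeros = [0.0] * dim
    rows = []
    for token in tokens:
        first = word2vec.get(token, zeros)
        second = fasttext.get(token, zeros)
        rows.append(list(first) + list(second))  # concatenate: 2 * dim values
    return rows


w2v = {"crack": [0.1, 0.2]}
ft = {"crack": [0.3, 0.4], "leak": [0.5, 0.6]}
print(embed_tokens(["crack", "leak"], w2v, ft, dim=2))
# [[0.1, 0.2, 0.3, 0.4], [0.0, 0.0, 0.5, 0.6]]
```

With two 300-dimensional tables, each row has 600 values, matching the input dimension stated above.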

FIG. 5 is a diagram illustrating a process of processing embedded vectors in parallel according to an embodiment of the disclosed technology. Referring to FIG. 5, the shared feature encoder of the AutoDefect model may consist of three convolutional layers connected in parallel. Each convolutional layer has filters of a different size. By learning local correlations over different spatial extents through filters of different sizes, diverse shared features can be extracted. As shown in FIG. 5, the extracted features correspond to 3-grams, 4-grams, and 5-grams of different lengths and represent relational text information compressed across word tokens. These first-level features can be input to the next convolutional layer to extract higher-level features.
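The three parallel branches can be sketched as 1D convolutions with kernel sizes 3, 4, and 5 applied to the same embedded input, so each output position summarizes a 3-, 4-, or 5-token window. This plain-Python sketch assumes zero "same" padding (extra zero on the right for even kernels) so all branches keep the input length, and omits biases; the function names and the padding choice are assumptions, not details from the source.

```python
def conv1d_same(x, kernels):
    """1D convolution with zero "same" padding (sketch, no bias).

    x: L rows x D input channels.
    kernels: F filters, each a K x D list of weights.
    Returns L rows x F output channels.
    """
    length, depth = len(x), len(x[0])
    k = len(kernels[0])
    left = (k - 1) // 2  # zeros before the sequence; the rest go after it
    padded = [[0.0] * depth] * left + x + [[0.0] * depth] * (k - 1 - left)
    out = []
    for i in range(length):
        window = padded[i:i + k]  # k consecutive tokens: one k-gram
        out.append([sum(w[t][d] * window[t][d]
                        for t in range(k) for d in range(depth))
                    for w in kernels])
    return out


def shared_features(x, kernel_bank):
    """Run one convolution per kernel size on the same input, in parallel."""
    return {size: conv1d_same(x, kernels) for size, kernels in kernel_bank.items()}


# Toy input: 5 tokens, 1 channel; one all-ones filter per kernel size.
x = [[1.0], [2.0], [3.0], [4.0], [5.0]]
bank = {k: [[[1.0]] * k] for k in (3, 4, 5)}
feats = shared_features(x, bank)
print(feats[3])  # [[3.0], [6.0], [9.0], [12.0], [9.0]]
```

Keeping the length fixed across branches is what later allows the three outputs to be combined position by position along the channel axis.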

FIG. 6 is a diagram showing the detailed structure of an encoder. As shown in FIG. 6, the encoder has a structure in which convolution blocks are repeated, and each convolution block consists of a convolution layer, a batch normalization layer, a PReLU layer, and a TSE block.

As shown in FIG. 6, a convolution block using a filter of size 5, one of the several filter sizes, may be used, with batch normalization and a learnable activation function (PReLU) to improve training accuracy. In addition, to prevent the gradient from being lost, the TSE block may multiply the channel-wise feature map output by the convolution layer by the attention score. The TSE block performs the excitation operation by multiplying the result of the squeeze operation, performed through the global pooling layer, once by the parameters of the fully connected layer, and calculates the attention score by applying a sigmoid function to the result of the excitation operation.
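One convolution block (convolution, batch normalization, PReLU, TSE) and an encoder that repeats it can be sketched in plain Python as follows. Assumptions, none confirmed by the source: "same" zero padding, no convolution bias, batch normalization computed per channel over the positions of a single sequence (a real layer uses mini-batch statistics in training and running averages at inference), and a bias-free C x C TSE weight. All names are hypothetical.

```python
import math


def conv1d_same(x, kernels):
    # Zero "same" padding so the output keeps the input length (no bias).
    k, depth = len(kernels[0]), len(x[0])
    left = (k - 1) // 2
    padded = [[0.0] * depth] * left + x + [[0.0] * depth] * (k - 1 - left)
    return [[sum(w[t][d] * padded[i + t][d] for t in range(k) for d in range(depth))
             for w in kernels] for i in range(len(x))]


def batch_norm(x, gamma, beta, eps=1e-5):
    # Simplified: per-channel statistics over the positions of one sequence.
    length, channels = len(x), len(x[0])
    out = [row[:] for row in x]
    for c in range(channels):
        mean = sum(row[c] for row in x) / length
        var = sum((row[c] - mean) ** 2 for row in x) / length
        for i in range(length):
            out[i][c] = gamma[c] * (x[i][c] - mean) / math.sqrt(var + eps) + beta[c]
    return out


def prelu(x, alpha):
    # Learnable slope alpha[c] for negative inputs, one per channel.
    return [[v if v > 0 else alpha[c] * v for c, v in enumerate(row)] for row in x]


def tse(x, weight):
    # Squeeze (GAP), excitation without bottleneck (sigmoid(W z)), rescale.
    length, channels = len(x), len(x[0])
    z = [sum(row[c] for row in x) / length for c in range(channels)]
    s = [1.0 / (1.0 + math.exp(-sum(weight[i][j] * z[j] for j in range(channels))))
         for i in range(channels)]
    return [[row[c] * s[c] for c in range(channels)] for row in x]


def conv_block(x, p):
    """Convolution -> batch normalization -> PReLU -> TSE (sketch)."""
    h = conv1d_same(x, p["kernels"])
    h = batch_norm(h, p["gamma"], p["beta"])
    h = prelu(h, p["alpha"])
    return tse(h, p["tse_weight"])


def encoder(x, blocks):
    """An encoder repeats the convolution block; `blocks` holds per-block params."""
    for p in blocks:
        x = conv_block(x, p)
    return x


# Toy run: 6 tokens x 2 channels, one block with 3 filters of size 5.
x = [[float(i), float(i % 2)] for i in range(6)]
params = {
    "kernels": [[[0.1 * (f + 1), -0.05] for _ in range(5)] for f in range(3)],
    "gamma": [1.0] * 3, "beta": [0.0] * 3, "alpha": [0.25] * 3,
    "tse_weight": [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)],
}
out = encoder(x, [params])
print(len(out), len(out[0]))  # 6 3
```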

Meanwhile, through the TSE block's calculation, features with high attention scores can be reflected in feature maps that have learned important common patterns affecting all four tasks. As a result, the expressive power of the channel-wise feature maps is strengthened, enabling the extraction of features suited to each task.

Meanwhile, the input of the task-specific feature encoders is the channel-wise concatenation of the features extracted by the shared feature encoders; that is, the features extracted through the different filters of the shared feature encoders are combined along the channel axis. This input is fed to the four separate parallel convolution blocks of the task-specific feature encoder and used to extract features suited to each of the four tasks.
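The combination step and the hand-off to the four task branches can be sketched as follows. This minimal Python illustration assumes every shared feature map has the same length (for example through "same" padding); the task encoders in the usage line are hypothetical placeholders standing in for the parallel convolution blocks.

```python
def concat_channels(feature_maps):
    """Concatenate L x C_k feature maps along the channel axis -> L x sum(C_k)."""
    length = len(feature_maps[0])
    if any(len(m) != length for m in feature_maps):
        # Requires equal lengths, e.g. "same" padding in every shared encoder.
        raise ValueError("feature maps must have the same sequence length")
    return [[v for m in feature_maps for v in m[i]] for i in range(length)]


def task_specific_features(combined, task_encoders):
    """Give the same combined input to each task's own encoder."""
    return {task: encode(combined) for task, encode in task_encoders.items()}


# Three 2 x 1 shared maps (e.g. from the 3-, 4-, and 5-gram branches).
combined = concat_channels([[[1.0], [2.0]], [[3.0], [4.0]], [[5.0], [6.0]]])
print(combined)  # [[1.0, 3.0, 5.0], [2.0, 4.0, 6.0]]
```

Every task branch sees the full combined representation; specialization comes only from each branch's own weights.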

The four task features extracted in this way are input to the classifiers, and a multi-task process that classifies the category for each of the four tasks may be performed. To output the category for each task, only the most important feature per channel among the features extracted for that task can be selected through global max pooling (GMP). The result is connected to a dense layer, and the classification probabilities for each task can be calculated through softmax activation. In this way, the AutoDefect model classifies building defects through multi-tasking, outputting the results of the four tasks simultaneously.
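Each classifier head described above (GMP, then a dense layer, then softmax) can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the weight and bias values in the usage line are hypothetical, and the softmax subtracts the maximum logit for numerical stability, which does not change the resulting probabilities.

```python
import math


def global_max_pool(feature_map):
    # Keep the largest value of each channel over all positions.
    return [max(row[c] for row in feature_map) for c in range(len(feature_map[0]))]


def dense(v, weight, bias):
    # weight: n_classes x C, bias: n_classes -> one logit per class.
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weight, bias)]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_all(task_features, heads):
    """Return {task: class probabilities} for every task in one pass."""
    return {task: softmax(dense(global_max_pool(fm), *heads[task]))
            for task, fm in task_features.items()}


fm = [[1.0, 0.0], [0.0, 2.0]]                  # L = 2 positions, C = 2 channels
heads = {"location": ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])}
probs = classify_all({"location": fm}, heads)["location"]
print(probs.index(max(probs)))  # 1
```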

The method and apparatus for classifying building defects using multi-task channel attention according to an embodiment of the disclosed technology have been described with reference to the embodiments shown in the drawings to aid understanding, but these are merely exemplary, and those of ordinary skill in the art will understand that various modifications and other equivalent embodiments are possible. Therefore, the true technical scope of protection of the disclosed technology should be defined by the appended claims.

Claims

A method of automatically classifying building defects, comprising:
receiving, by an analysis device, text data about building defects;
extracting a plurality of shared features by a plurality of first encoders that receive the text data and have filters of different sizes;
extracting a plurality of task features by a plurality of second encoders that receive a value obtained by combining the features output by the first encoders; and
outputting, by a plurality of classifiers, classification values for different items, each classifier receiving the value output by a respective one of the plurality of second encoders (task-specific feature encoders).

According to claim 1,
The building defect classification method, further comprising embedding, by the analysis device, the text data using two different embedding models.

According to claim 1,
The plurality of first encoders each include one or more structures in which a convolution layer and an attention layer are repeated, and the convolution layers have different filter sizes.

According to claim 1,
The plurality of second encoders each include one or more structures in which a convolution layer and an attention layer are repeated, and the convolution layers have the same filter size.

According to claim 3,
The attention layer includes a Global Average Pooling (GAP) layer and a Fully Connected layer,
wherein an excitation operation is performed by multiplying the result of a squeeze operation performed through the global average pooling layer once by a parameter of the fully connected layer, and
an attention score is calculated by applying a sigmoid function to the result of the excitation operation.

According to claim 1,
Each of the plurality of first encoders calculates an attention score of the shared feature using a Text Squeeze and Excitation (TSE) block,
and each of the plurality of second encoders calculates an attention score of the task feature using a TSE block.

According to claim 1,
The plurality of classifiers respectively output the type of work according to the defect, the location of the defect, the type of defect, and the property of the defect as the classification value.

A building defect classification apparatus comprising:
an input device for receiving text data about building defects;
a storage device for storing a deep learning model including a plurality of first encoders having filters of different sizes, a plurality of second encoders, and a plurality of classifiers; and
an arithmetic unit configured to input the text data to the plurality of first encoders to extract a plurality of different shared features, input a value obtained by combining the plurality of shared features to the plurality of second encoders to extract a plurality of task features, and input the plurality of task features to the plurality of classifiers to output classification values for different items, respectively.

According to claim 8,
The building defect classification apparatus, wherein the deep learning model further includes two different embedding models, and the text data is embedded using the two embedding models.

According to claim 8,
The plurality of first encoders each include one or more structures in which a convolution layer and an attention layer are repeated, and the convolution layers have different filter sizes.

According to claim 8,
The plurality of second encoders each include one or more structures in which a convolution layer and an attention layer are repeated, and the convolution layers have the same filter size.

According to claim 10,
The attention layer includes a Global Average Pooling (GAP) layer and a Fully Connected layer,
wherein an excitation operation is performed by multiplying the result of a squeeze operation performed through the global average pooling layer once by a parameter of the fully connected layer, and
an attention score is calculated by applying a sigmoid function to the result of the excitation operation.

According to claim 8,
Each of the plurality of first encoders calculates an attention score of the shared feature using a Text Squeeze and Excitation (TSE) block,
The plurality of second encoders each calculate an attention score of the task feature using a TSE block.

According to claim 8,
The plurality of classifiers respectively output the type of work according to the defect, the location of the defect, the type of defect, and the property of the defect as the classification value.