KR102193952B1

KR102193952B1 - Method of learning and inferring based on convolution neural network for minimizing clutter in passive sonar signal

Info

Publication number: KR102193952B1
Application number: KR1020190053214A
Authority: KR
Inventors: 박지훈; 이상호; 정대진; 김인철
Original assignee: 국방과학연구소
Priority date: 2019-05-07
Filing date: 2019-05-07
Publication date: 2020-12-22
Also published as: KR20200130777A

Abstract

According to the present invention, a convolutional neural network-based inference method for minimizing sonargram clutter is provided. The method is performed by a control unit and includes: a convolutional neural network configuration step of configuring a convolutional neural network for removing clutter from a sonargram; a training data generation step of generating training data using sonargram data; and a sonargram inference step of inferring the sonargram by removing sonargram clutter and detecting tonal components using the trained convolutional neural network structure and trained weights. By removing clutter automatically, the method can improve the accuracy of tonal component detection and provide an automated sonargram detection and inference method.

Description

{Method of learning and inferring based on convolution neural network for minimizing clutter in passive sonar signal}

The present invention relates to a method for detecting and inferring a sonargram by removing clutter from the sonargram. More specifically, it relates to a convolutional neural network-based learning and inference method for minimizing sonargram clutter.

In a marine environment, underwater and surface vessels can be detected and identified using sonar, which exploits the propagation characteristics of acoustic signals.

In this regard, acoustic signals in the ocean are recorded with a hydrophone and, after several signal processing stages, a sonargram that allows the acoustic signal to be inspected visually is generated using a time-frequency analysis/representation method such as LOFAR (LOw Frequency Analysis and Recording) or DEMON (Detection of Envelope Modulation On Noise). The sonargram is then analyzed by eye to detect tonal components, and identification proceeds by comparing their frequency positions with those of known underwater and surface vessels.

However, this manual analysis/identification process has the problem that identification is very difficult because of the clutter mixed into the sonargram.

Therefore, there is a need for a sonargram detection and inference method that removes clutter automatically, thereby improving the accuracy of tonal component detection and automating the process.

An object of the present invention is to provide a sonargram detection and inference method that removes clutter automatically, thereby improving the accuracy of tonal component detection and automating the process.

Another object of the present invention is to provide a convolutional neural network-based learning and inference method for minimizing sonargram clutter, and a convolutional neural network-based learning and inference apparatus using the same.

To solve the above problems, a convolutional neural network-based inference method for minimizing sonargram clutter according to the present invention is provided. The method is performed by a control unit and includes: a convolutional neural network configuration step of configuring a convolutional neural network for removing clutter from a sonargram; a training data generation step of generating training data using sonargram data; and a sonargram inference step of inferring the sonargram by removing sonargram clutter and detecting tonal components using the trained convolutional neural network structure and trained weights. By removing clutter automatically, the method can improve the accuracy of tonal component detection and provide an automated sonargram detection and inference method.

In one embodiment, the convolutional neural network structure for sonargram identification includes a Convolutional layer, an Activation layer, and a Batch Normalization layer, and excludes a Fully connected layer.

In one embodiment, when the Convolutional layer has a kernel size of (rowsize, colsize), the convolutional neural network structure is characterized by a kernel size that satisfies rowsize≥colsize so that the time information of the sonargram is well reflected.

In one embodiment, the training data includes an original sonargram and a labeling image corresponding to the original sonargram. In the training data generation step, the size of the labeling image and the value of each pixel are set. The size of the labeling image may be set equal to the size of the output produced when the original sonargram is used as input to the convolutional neural network.

In one embodiment, in the training data generation step, a portion of the original sonargram determined to be a tonal component may be marked, at the corresponding position of a first labeling image, with a value representing the tonal component. In addition, a portion of the original sonargram determined not to be a tonal component may be marked, at the corresponding position of a second labeling image, with a value representing clutter.

In one embodiment, in the training data generation step, the training data may be generated using the sonargram data in which the size of the labeling image and the value of each pixel are reflected. The training data may be augmented in consideration of the time and frequency characteristics of the sonargram.

In one embodiment, in the sonargram inference step, the sonargram may be inferred, based on a trained model given by the trained convolutional neural network structure and trained weights, such that the value for each pixel of the input sonargram is a real number between 0 and 1. Each pixel may then be binary-classified by comparison with a specific threshold, and, in the output of the convolutional neural network, a tonal component that appears over two or more consecutive pixels at a specific time position may be reduced to a width of one pixel. In addition, the output values are compared with the pixel values at the same positions in the original sonargram, and the portion showing the largest difference may be replaced with a preset value representing the tonal component. Among the at least two pixels, the output values of the pixels other than the pixel with the largest difference may be set to a preset value representing clutter.

According to another aspect of the present invention, a convolutional neural network-based inference apparatus for minimizing sonargram clutter is provided. The apparatus includes: an interface unit configured to receive image information for a sonargram; and a control unit configured to configure a convolutional neural network for removing clutter from the sonargram, generate training data using sonargram data, and infer the sonargram by removing sonargram clutter and detecting tonal components using the trained convolutional neural network structure and trained weights.

In one embodiment, the convolutional neural network structure for sonargram identification may include a Convolutional layer, an Activation layer, and a Batch Normalization layer, and exclude a Fully connected layer. The control unit may configure the kernel size so that this convolutional neural network structure can be applied. When the Convolutional layer has a kernel size of (rowsize, colsize), the convolutional neural network structure is characterized by a kernel size that satisfies rowsize≥colsize so that the time information of the sonargram is well reflected.

In one embodiment, the training data includes an original sonargram and a labeling image corresponding to the original sonargram. The control unit may set the size of the labeling image and the value of each pixel. The size of the labeling image may be set equal to the size of the output produced when the original sonargram is used as input to the convolutional neural network.

In one embodiment, the control unit may mark a portion of the original sonargram determined to be a tonal component, at the corresponding position of a first labeling image, with a value representing the tonal component. In addition, a portion of the original sonargram determined not to be a tonal component may be marked, at the corresponding position of a second labeling image, with a value representing clutter.

In one embodiment, the control unit may generate the training data using the sonargram data in which the size of the labeling image and the value of each pixel are reflected. In addition, the training data may be augmented in consideration of the time and frequency characteristics of the sonargram.

In one embodiment, the control unit may infer the sonargram, based on a trained model given by the trained convolutional neural network structure and trained weights, such that the value for each pixel of the input sonargram is a real number between 0 and 1. Each pixel may then be binary-classified by comparison with a specific threshold, and, in the output of the convolutional neural network, a tonal component that appears over two or more consecutive pixels at a specific time position may be reduced to a width of one pixel. In addition, the output values are compared with the pixel values at the same positions in the original sonargram, and the portion showing the largest difference may be replaced with a preset value representing the tonal component. Among the at least two pixels, the output values of the pixels other than the pixel with the largest difference may be set to a preset value representing clutter.

The present invention has the advantage that, by minimizing clutter in the sonargram and thereby emphasizing tonal frequency information, it can support automatic or semi-automatic identification of tonal frequencies.

In addition, the present invention can ultimately be expected to improve the accuracy of identification and detection using sonargrams.

FIG. 1 is a block diagram of a convolutional neural network-based inference apparatus for minimizing sonargram clutter according to the present invention.
FIG. 2 shows the configuration of a procedure for detecting tonal components for sonargram inference according to the present invention.
FIG. 3 shows an example of a convolutional neural network structure according to the present invention.
FIG. 4 shows a method of generating training data according to the present invention.
FIG. 5 shows an example of a data augmentation technique that takes sonargram characteristics into account according to the present invention.
FIG. 6 shows an example of a convolutional neural network-based inference method according to the present invention.
FIG. 7 is a flowchart of a convolutional neural network-based learning and inference method for minimizing sonargram clutter according to the present invention.

The features and effects of the present invention described above will become more apparent from the following detailed description taken in conjunction with the accompanying drawings, so that a person of ordinary skill in the art to which the present invention pertains will be able to readily practice the technical idea of the present invention.

The present invention may be modified in various ways and may have various embodiments; specific embodiments are illustrated in the drawings and described in detail in the detailed description. However, this is not intended to limit the present invention to a specific embodiment, and the invention should be understood to include all modifications, equivalents, and substitutes falling within its spirit and technical scope.

In describing the drawings, similar reference numerals are used for similar elements.

Terms such as first and second may be used to describe various elements, but the elements should not be limited by these terms. The terms are used only to distinguish one element from another.

For example, without departing from the scope of the present invention, a first element may be referred to as a second element, and similarly, a second element may be referred to as a first element. The term "and/or" includes a combination of a plurality of related listed items or any one of a plurality of related listed items.

Unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention pertains.

Terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the related art, and should not be interpreted in an idealized or excessively formal sense unless explicitly so defined in this application.

The suffixes "module", "block", and "unit" used for elements in the following description are assigned or used interchangeably only for ease of drafting the specification, and do not by themselves have distinct meanings or roles.

Hereinafter, preferred embodiments of the present invention will be described with reference to the accompanying drawings so that a person of ordinary skill in the art can readily practice them. In describing the embodiments, detailed descriptions of related known functions or configurations are omitted where they could unnecessarily obscure the gist of the present invention.

Hereinafter, a convolutional neural network-based learning and inference method for minimizing sonargram clutter according to the present invention, and a convolutional neural network-based learning and inference apparatus using the same, will be described.

First, regarding its configuration, the present invention is a convolutional neural network-based learning and inference method for minimizing sonargram clutter, and includes the following components and techniques.

- Convolutional neural network structure for clutter removal: a convolutional neural network structure reflecting the characteristics of tonal components

- Method of generating training data using sonargram data:

1) Training data generation method

2) Data augmentation technique considering sonargram characteristics

- Convolutional neural network-based inference method: result analysis method

FIG. 1 is a block diagram of a convolutional neural network-based inference apparatus for minimizing sonargram clutter according to the present invention. Referring to FIG. 1, the convolutional neural network-based inference apparatus 100 includes an interface unit 110, a control unit 120, and a memory 130. Here, "sonargram" is used with the same or a similar meaning as a passive sonar signal. More specifically, a sonargram may be an image signal obtained by mapping a passive sonar signal onto a time axis and a frequency axis.

The interface unit 110 is configured to receive image information for a sonargram. The control unit 120 is configured to configure a convolutional neural network for removing clutter from the sonargram and to generate training data using sonargram data. The memory 130 is configured to store information on sonargrams, information on training data, and/or information on inferred sonargrams.

The control unit 120 may also be configured to infer the sonargram by removing sonargram clutter and detecting tonal components using the trained convolutional neural network structure and trained weights.

FIG. 2 shows the configuration of a procedure for detecting tonal components for sonargram inference according to the present invention. The tonal component detection procedure of FIG. 2 may be performed by the control unit 120. Alternatively, part of the procedure, for example generating the time-series sonar waveform, may be performed by a separate external device. In that case, the interface unit 110 receives the sonar waveform, and the control unit 120 then performs convolutional neural network configuration, training data generation, and sonargram inference (analysis).

Regarding the sonargram data, the control unit 120 may generate a time-series sonar waveform through a simulator based on simulation parameters. The control unit 120 may also generate a Ground Truth Table associated with training according to the simulation parameters. In addition, the control unit 120 may generate training data using the sonargram data generated from the time-series sonar waveform and the Ground Truth Table. The sonargram data may be referred to as LOFAR (Low Frequency Analysis and Recording), but is not limited thereto and may be varied according to the application.

The process of obtaining sonargram data such as a LOFARgram, that is, a Spectrogram, from the time-series sonar waveform is as follows. A Short-Time Fourier Transform may be performed to transform the time-series sonar waveform into the frequency domain. The Fourier-transformed sonar signal may then be arranged into sonargram data such as a LOFARgram, that is, a Spectrogram. In addition to a LOFARgram, the sonargram data may include a DEMONgram associated with DEMON (DEModulation Of Noise), but is not limited thereto and may be changed according to the application.
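As a minimal sketch of the Short-Time Fourier Transform step described above, the following Python code builds a time-by-frequency magnitude matrix from a sampled waveform. The function name `stft_magnitude`, the Hann window, the frame length, the hop size, and the test tone are all illustrative assumptions and are not taken from the specification; a practical implementation would use an FFT rather than the naive DFT shown here.

```python
import cmath
import math

def stft_magnitude(signal, frame_len, hop):
    """Return a magnitude matrix with rows = time frames, columns = frequency bins."""
    # Hann window to reduce spectral leakage (the window choice is an assumption).
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1  # keep non-negative frequencies only
    rows = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(n_bins):
            # Naive DFT of one windowed frame at frequency bin k.
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            row.append(abs(acc))
        rows.append(row)
    return rows

# Hypothetical test signal: a pure tone with 8 cycles per 64-sample frame.
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
gram = stft_magnitude(tone, frame_len=64, hop=32)

# Strongest frequency bin in each time frame; a steady tone traces a vertical line.
peak_bins = [max(range(len(row)), key=row.__getitem__) for row in gram]
```

Each row of `gram` corresponds to one time frame and each column to one frequency bin, which matches the convention used in this description that the vertical axis of the sonargram is time and the horizontal axis is frequency.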

Tonal line detection is then performed using this sonargram data. Through tonal line detection, source detection and classification can be performed so that sources are distinguishable from clutter in the sonargram data.

The convolutional neural network structure for sonargram identification may include a Convolutional layer, an Activation layer, and a Batch Normalization layer, and exclude a Fully connected layer. The control unit 120 may configure the kernel size so that this convolutional neural network structure can be applied. When the Convolutional layer has a kernel size of (rowsize, colsize), the convolutional neural network structure may be characterized by a kernel size that satisfies rowsize≥colsize so that the time information of the sonargram is well reflected.

The training data may include an original sonargram and a labeling image corresponding to the original sonargram. The control unit 120 may set the size of the labeling image and the value of each pixel. The control unit 120 may set the size of the labeling image equal to the size of the output produced when the original sonargram is used as input to the convolutional neural network.

The control unit 120 may mark a portion of the original sonargram determined to be a tonal component, at the corresponding position of a first labeling image, with a value representing the tonal component. In addition, the control unit 120 may mark a portion of the original sonargram determined not to be a tonal component, at the corresponding position of a second labeling image, with a value representing clutter.
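The labeling step can be illustrated with a short Python sketch. It assumes, purely for illustration, that tonal positions are available as (time row, frequency column) pairs, that the tonal and clutter values are 1.0 and 0.0, and that both values are written into a single image of the same size as the network output; the specification states neither the concrete values nor the ground-truth format, and the name `make_label_image` is hypothetical.

```python
TONAL_VALUE = 1.0    # assumed value representing a tonal component
CLUTTER_VALUE = 0.0  # assumed value representing clutter (non-tonal pixels)

def make_label_image(n_time, n_freq, tonal_pixels):
    """Build a label image of size n_time x n_freq from tonal pixel positions."""
    # Start with every pixel marked as clutter.
    label = [[CLUTTER_VALUE] * n_freq for _ in range(n_time)]
    for t, f in tonal_pixels:
        # Ignore ground-truth entries that fall outside the image.
        if 0 <= t < n_time and 0 <= f < n_freq:
            label[t][f] = TONAL_VALUE
    return label

# Hypothetical ground truth: one steady tonal line at frequency column 3.
truth = [(t, 3) for t in range(4)]
label = make_label_image(n_time=4, n_freq=6, tonal_pixels=truth)
```

Because the label has the same dimensions as the network output, a per-pixel loss can compare the two directly.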

The control unit 120 may generate the training data using the sonargram data in which the size of the labeling image and the value of each pixel are reflected. The training data may be augmented (augmentation) in consideration of the time and frequency characteristics of the sonargram.

Accordingly, the control unit 120 may perform data augmentation on the training data generated using the sonargram data and the Ground Truth Table. The control unit 120 may then use the augmented training data to perform training and evaluation for detecting tonal components in the sonargram. Through this series of procedures, tonal line detection in the sonargram becomes possible.

The control unit 120 may infer the sonargram, based on a trained model given by the trained convolutional neural network structure and trained weights, such that the value for each pixel of the input sonargram is a real number between 0 and 1. Each pixel may then be binary-classified by comparison with a specific threshold, and, in the output of the convolutional neural network, a tonal component that appears over two or more consecutive pixels at a specific time position may be reduced to a width of one pixel. In addition, the output values are compared with the pixel values at the same positions in the original sonargram, and the portion showing the largest difference may be replaced with a preset value representing the tonal component. Among the at least two pixels, the output values of the pixels other than the pixel with the largest difference may be set to a preset value representing clutter.
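One possible reading of this post-processing can be sketched in Python as follows. The threshold of 0.5, the preset values 1 and 0, the use of the absolute difference, and the assumption that the original sonargram is normalized to the same 0 to 1 range as the network output are all illustrative choices that the specification does not fix; the name `thin_tonal_runs` is hypothetical.

```python
def thin_tonal_runs(prob, original, threshold=0.5, tonal=1, clutter=0):
    """Binarize network output and thin each horizontal tonal run to one pixel."""
    result = []
    for p_row, o_row in zip(prob, original):  # one row per time position
        # Binary classification of every pixel against the threshold.
        binary = [tonal if p >= threshold else clutter for p in p_row]
        row = [clutter] * len(p_row)
        col = 0
        while col < len(binary):
            if binary[col] != tonal:
                col += 1
                continue
            # Find the end of this run of consecutive tonal pixels.
            end = col
            while end + 1 < len(binary) and binary[end + 1] == tonal:
                end += 1
            # Keep only the pixel whose output differs most from the original;
            # the other pixels in the run stay at the clutter value.
            best = max(range(col, end + 1),
                       key=lambda c: abs(p_row[c] - o_row[c]))
            row[best] = tonal
            col = end + 1
        result.append(row)
    return result

# Hypothetical single time row: a two-pixel run and an isolated tonal pixel.
prob = [[0.1, 0.9, 0.8, 0.2, 0.7]]
original = [[0.0, 0.2, 0.5, 0.1, 0.6]]
thinned = thin_tonal_runs(prob, original)
```

In this example the run at columns 1 and 2 is reduced to column 1, where the difference between output and original is largest, and the isolated pixel at column 4 is kept unchanged.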

한편, 전술한 본 발명에 따른 음탐그램 클러터를 최소화를 위한 콘볼루션 뉴럴 네트워크 기반 학습 및 추론 장치의 구체적인 동작 및 그 원리에 대해 살펴보면 다음과 같다.On the other hand, a detailed operation of the convolutional neural network-based learning and reasoning apparatus for minimizing the sonic gram clutter according to the present invention and its principle will be described as follows.

In this regard, FIG. 3 shows an example of a convolutional neural network structure according to the present invention. FIG. 3 indicates the kernel size, the number of channels (# Channel), and the activation function for each layer. The details are as follows.

- The neural network structure of FIG. 3 is an example of a convolutional neural network for removing sonargram clutter.

- The vertical axis of the sonargram is the time axis, and the horizontal axis is the frequency axis.

- When the kernel size of a convolutional layer is defined as (rowsize, colsize), it may be defined to satisfy rowsize ≥ colsize so that the time-axis information of the sonargram is well reflected.

- Convolutional layers with ReLU activation may be stacked in multiple layers; only for the last convolutional layer, the activation may be set so that the presence or absence of clutter/tonal components is expressed as a value between 0 and 1.
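The constraints listed above can be sketched as a small configuration check. This is only an illustrative sketch: the layer list `LAYERS`, the helper names, the use of a sigmoid for the last layer, and the stride-1 "same" padding are all assumptions, since the description only states that the last activation yields values between 0 and 1 and does not reproduce the values of FIG. 3.

```python
import math

# Hypothetical layer list (not the values of FIG. 3):
# (rowsize, colsize, channels, activation).
# Rows run along the time axis, columns along the frequency axis.
LAYERS = [
    (5, 3, 16, "relu"),
    (5, 3, 16, "relu"),
    (3, 3, 1, "sigmoid"),  # assumed choice for an output in (0, 1)
]


def validate_layers(layers):
    """Check rowsize >= colsize and ReLU everywhere except the last layer."""
    last = len(layers) - 1
    for i, (rowsize, colsize, _channels, activation) in enumerate(layers):
        if rowsize < colsize:
            raise ValueError(f"layer {i}: rowsize must be >= colsize")
        expected = "sigmoid" if i == last else "relu"
        if activation != expected:
            raise ValueError(f"layer {i}: expected {expected} activation")
    return True


def output_shape(input_shape, layers):
    """Output (rows, cols) for stride 1 and padding (k - 1) // 2 per side.

    With odd kernel sizes this keeps the output the same size as the
    input, so a labeling image can match the sonargram pixel for pixel.
    """
    rows, cols = input_shape
    for rowsize, colsize, _channels, _activation in layers:
        rows = rows + 2 * ((rowsize - 1) // 2) - rowsize + 1
        cols = cols + 2 * ((colsize - 1) // 2) - colsize + 1
    return rows, cols


def sigmoid(x):
    """Numerically stable logistic function mapping a real number into (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)
```

For example, `output_shape((100, 256), LAYERS)` returns `(100, 256)` under these assumptions, because every kernel size in the hypothetical list is odd.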

Meanwhile, FIG. 4 shows a method of generating training data according to the present invention. Referring to FIG. 4, the technical features of the method of generating training data, including a labeling image, from the original sonargram data are as follows.

1) Training data may be defined as the original sonargram together with its corresponding labeling image.

2) The size of the labeling image may be set equal to the size of the result output when the original sonargram is used as input to the convolutional neural network for identification.

3) A portion of the original sonargram judged to be a tonal component may be marked at the corresponding position in the labeling image with a value representing the tonal component (e.g., 1).

4) A portion of the original sonargram judged not to be a tonal component may be marked at the corresponding position in the labeling image with a value representing clutter (e.g., 0).
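A minimal sketch of the labeling steps above follows. The function name `make_label_image` and the representation of the tonal annotations as a list of (time index, frequency index) pairs are assumptions made for illustration; the description does not specify the format of the ground truth table.

```python
def make_label_image(n_time, n_freq, tonal_pixels, tonal_value=1, clutter_value=0):
    """Build a labeling image with the same size as the network output.

    tonal_pixels is a hypothetical list of (time_index, freq_index) pairs
    judged to be tonal components; every other pixel is labeled as clutter.
    """
    label = [[clutter_value] * n_freq for _ in range(n_time)]
    for t, f in tonal_pixels:
        if not (0 <= t < n_time and 0 <= f < n_freq):
            raise ValueError(f"pixel ({t}, {f}) lies outside the sonargram")
        label[t][f] = tonal_value
    return label


# A 3 x 4 sonargram (time x frequency) with a tonal line drifting upward.
label = make_label_image(3, 4, [(0, 1), (1, 1), (2, 2)])
```

Rows index time and columns index frequency, matching the axis convention stated for the sonargram.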

Meanwhile, FIG. 5 shows an example of a data augmentation technique that takes sonargram characteristics into account. Referring to FIG. 5, the technical features of this data augmentation technique are as follows.

1) A function f(x) that is continuous in the two-dimensional plane and single-valued (e.g., a Gaussian probability density function) may be selected.

2) Using f(x), a function g(x) may be defined as g(x) = d * m * f(x) + s, where d ∈ {-1, 1}, m ∈ {real numbers}, and s ∈ {real numbers}.

3) Values of g(x) may be extracted successively, as many as the number of time-axis bins of the original sonargram.

4) At each time position of the original sonargram, the frequency axis may be shifted by the extracted value of g(x).
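The four augmentation steps above might be sketched as follows. Several details are assumptions not stated in the description: sampling x uniformly over [-3, 3], rounding g(x) to a whole number of frequency bins, and filling vacated bins with a constant instead of wrapping around. In practice the same shifts would presumably be applied to the labeling image so that it stays aligned with the shifted sonargram.

```python
import math


def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """f(x): Gaussian probability density, the example given for step 1."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def shift_profile(n_time, d=1, m=10.0, s=0.0, x_min=-3.0, x_max=3.0):
    """Sample g(x) = d * m * f(x) + s once per time bin (steps 2 and 3).

    The sampling range and the rounding to integer bins are assumptions.
    """
    if d not in (-1, 1):
        raise ValueError("d must be -1 or 1")
    if n_time == 1:
        xs = [0.5 * (x_min + x_max)]
    else:
        step = (x_max - x_min) / (n_time - 1)
        xs = [x_min + i * step for i in range(n_time)]
    return [int(round(d * m * gaussian_pdf(x) + s)) for x in xs]


def shift_rows(image, shifts, fill=0.0):
    """Shift each time row along the frequency axis by its own offset (step 4)."""
    if len(image) != len(shifts):
        raise ValueError("need exactly one shift per time row")
    out = []
    for row, k in zip(image, shifts):
        n = len(row)
        new_row = [fill] * n
        for j, value in enumerate(row):
            if 0 <= j + k < n:
                new_row[j + k] = value
        out.append(new_row)
    return out
```

For instance, `shift_rows(sonargram, shift_profile(len(sonargram)))` bends a straight tonal line into a bell-shaped frequency drift; choosing d = -1 bends it in the opposite direction.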

Meanwhile, FIG. 6 shows an example of the convolutional neural network-based inference method according to the present invention. Referring to FIG. 6, the technical features of the inference method are as follows.

1) For an input sonargram, the inference result of the model trained by the above method has a real value between 0 and 1 at each pixel.

2) For binary classification, a threshold is applied so that each value becomes 0 or 1 relative to a specific threshold value α.

3) In the output of the convolutional neural network, the portion appearing as a tonal component signal at a specific time position may span two or more consecutive pixels. When this must be reduced to a width of one pixel, the output value is compared with the pixel value at the same position in the original sonargram; the pixel showing the largest difference is replaced with a preset value representing the tonal component, and the other pixels are marked with a preset value representing clutter.
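The post-processing in steps 2) and 3) can be sketched as below. The function name `thin_tonal_runs`, the use of `>=` when comparing against α, and the use of the absolute difference are assumptions; the description only says that the pixel with the largest difference is kept. The sketch also assumes the original sonargram has been scaled to a range comparable with the network output.

```python
def thin_tonal_runs(prob, original, alpha=0.5, tonal_value=1, clutter_value=0):
    """Threshold the network output and thin each tonal run to one pixel.

    prob and original are lists of rows (time) of equal-length lists
    (frequency). Within each run of consecutive above-threshold pixels in
    a row, only the pixel whose output differs most from the original
    sonargram value keeps the tonal value.
    """
    result = []
    for p_row, o_row in zip(prob, original):
        mask = [p >= alpha for p in p_row]  # step 2: binary classification
        out = [clutter_value] * len(p_row)
        j = 0
        while j < len(mask):
            if not mask[j]:
                j += 1
                continue
            end = j
            while end + 1 < len(mask) and mask[end + 1]:
                end += 1  # extend the run of consecutive tonal pixels
            # step 3: keep the pixel with the largest |output - original|
            best = max(range(j, end + 1), key=lambda k: abs(p_row[k] - o_row[k]))
            out[best] = tonal_value
            j = end + 1
        result.append(out)
    return result


prob = [[0.1, 0.9, 0.8, 0.2, 0.7]]
original = [[0.0, 0.2, 0.6, 0.1, 0.3]]
thinned = thin_tonal_runs(prob, original, alpha=0.5)
```

In this example the run at frequency bins 1 and 2 collapses to bin 1, whose difference (0.7) exceeds that of bin 2 (0.2), while the isolated detection at bin 4 is kept as is.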

Based on the foregoing, the convolutional neural network-based learning and inference method for minimizing sonargram clutter according to the present invention is described below. In this regard, FIG. 7 shows a flowchart of the method. The method may be performed by the control unit 120 of the convolutional neural network-based inference apparatus 100 of FIG. 1.

Referring to FIG. 7, the convolutional neural network-based learning and inference method includes a convolutional neural network configuration step (S110), a training data generation step (S120), and a sonargram inference step (S130).

In the convolutional neural network configuration step (S110), a convolutional neural network for removing clutter from the sonargram is configured. In the training data generation step (S120), training data is generated using sonargram data.

In the sonargram inference step (S130), the sonargram may be inferred (that is, its pixels distinguished or classified) by removing sonargram clutter and detecting tonal components through the learned convolutional neural network structure and learned weights.

Here, the convolutional neural network structure for sonargram identification may include convolutional layers, activation layers, and batch normalization layers, and excludes fully connected layers.

When a convolutional layer has a kernel size of (rowsize, colsize), the convolutional neural network structure may be characterized by a kernel size satisfying rowsize ≥ colsize so that the temporal information of the sonargram is well reflected.

The training data may include an original sonargram and a labeling image corresponding to the original sonargram. In this regard, the size of the labeling image and the value of each pixel may be set in the training data generation step (S120). The size of the labeling image may be set equal to the size of the result output when the original sonargram is used as input to the convolutional neural network.

In the training data generation step (S120), a portion of the original sonargram determined to be a tonal component may be marked, at the corresponding position in a first labeling image, with a value representing the tonal component. A portion of the original sonargram determined not to be a tonal component may be marked, at the corresponding position in a second labeling image, with a value representing clutter.

In the training data generation step (S120), the training data may be generated using the sonargram data in which the size of the labeling image and the value of each pixel are reflected. The training data may also be augmented in consideration of the time and frequency characteristics of the sonargram.

In the sonargram inference step (S130), based on the learned model defined by the trained convolutional neural network structure and its learned weights, the sonargram may be inferred so that, for an input sonargram, the value at each pixel is a real number between 0 and 1.

Specifically, each pixel may be binary-classified by comparison with a specific threshold, and a tonal component that appears in two or more consecutive pixels at a specific time position in the network output may be reduced to a width of one pixel. To do so, the output value is compared with the pixel value at the same position in the original sonargram, and the pixel showing the largest difference is replaced with a preset value representing the tonal component. The output values of the remaining pixels among those at least two pixels may be set to a preset value representing clutter.

With respect to the steps described above, the order of some steps may be changed, and some steps may be repeated. For example, the convolutional neural network configuration step (S110) may be performed again during the training data generation step (S120) to partially modify the neural network structure. The training data generation step (S120) and the sonargram inference step (S130) may then be performed based on the partially modified convolutional neural network.

The convolutional neural network-based learning and inference method for minimizing sonargram clutter according to the present invention, and the learning and inference apparatus using it, have been described above. Their technical effects are as follows.

According to at least one embodiment of the present invention, clutter in the sonargram is minimized so that tonal frequency information is emphasized, which supports automatic or semi-automatic identification of tonal frequencies.

In addition, according to at least one embodiment of the present invention, an improvement in identification/detection accuracy using the sonargram can ultimately be expected.

In a software implementation, not only the procedures and functions described in this specification but also the design and parameter optimization of each component may be implemented as separate software modules. The software code may be implemented as a software application written in an appropriate programming language. The software code may be stored in a memory and executed by a controller or a processor.

Claims

A convolutional neural network-based inference method for minimizing sonargram clutter, the method being performed by a control unit and comprising:
a convolutional neural network configuration step of configuring a convolutional neural network for removing clutter from the sonargram;
a training data generation step of generating, using sonargram data, training data including an original sonargram and a labeling image corresponding to the original sonargram; and
a sonargram inference step of inferring the sonargram by removing sonargram clutter and detecting tonal components through the learned convolutional neural network structure and learned weights generated based on the training data,
wherein the training data generation step includes a process of setting the size of the labeling image and the value of each pixel, and
wherein the sonargram inference step includes:
a process of inferring the sonargram, based on a learned model according to the learned convolutional neural network structure and learned weights, so that the value at each pixel for the input sonargram is a real number between 0 and 1, and binary-classifying each pixel by comparison with a specific threshold; and
a process of, when a tonal component signal appears in at least two consecutive pixels at a specific time position in the output of the convolutional neural network, replacing the output value of the pixel having the largest difference between the output value and the pixel value at the same position in the original sonargram with a preset value representing the tonal component, and marking the output value of each pixel other than the pixel having the largest difference among the at least two pixels with a preset value representing clutter.

The method of claim 1,
wherein the convolutional neural network structure for identifying the sonargram includes a convolutional layer, an activation layer, and a batch normalization layer, and excludes a fully connected layer.

The method of claim 2,
wherein, when the convolutional layer has a kernel size of (rowsize, colsize),
the convolutional neural network structure has a kernel size satisfying rowsize ≥ colsize so that the temporal information of the sonargram is well reflected.

The method of claim 1,
wherein the size of the labeling image is set equal to the size of the result output when the original sonargram is used as input to the convolutional neural network.

The method of claim 4,
wherein, in the training data generation step,
a portion of the original sonargram determined to be a tonal component is marked, at the corresponding position in a first labeling image, with a value representing the tonal component, and
a portion of the original sonargram determined not to be a tonal component is marked, at the corresponding position in a second labeling image, with a value representing clutter.

The method of claim 5,
wherein, in the training data generation step,
the training data is generated using the sonargram data in which the size of the labeling image and the value of each pixel are reflected, and
the training data is augmented in consideration of the time and frequency characteristics of the sonargram.

A convolutional neural network-based inference device for minimizing sonargram clutter, the device comprising:
an interface unit configured to receive image information for the sonargram; and
a control unit configured to construct a convolutional neural network for removing clutter from the sonargram,
to generate, using sonargram data, training data including an original sonargram and a labeling image corresponding to the original sonargram, and
to infer the sonargram by removing sonargram clutter and detecting tonal components through the learned convolutional neural network structure and learned weights generated based on the training data,
wherein the control unit sets the size of the labeling image and the value of each pixel,
infers the sonargram, based on a learned model according to the learned convolutional neural network structure and learned weights, so that the value at each pixel for the input sonargram is a real number between 0 and 1, and binary-classifies each pixel by comparison with a specific threshold, and,
when a tonal component signal appears in at least two consecutive pixels at a specific time position in the output of the convolutional neural network, replaces the output value of the pixel having the largest difference between the output value and the pixel value at the same position in the original sonargram with a preset value representing the tonal component, and marks the output value of each pixel other than the pixel having the largest difference among the at least two pixels with a preset value representing clutter.

The device of claim 7,
wherein the convolutional neural network structure for identifying the sonargram includes a convolutional layer, an activation layer, and a batch normalization layer, and excludes a fully connected layer.

The device of claim 8,
wherein, when the convolutional layer has a kernel size of (rowsize, colsize),
the convolutional neural network structure has a kernel size satisfying rowsize ≥ colsize so that the temporal information of the sonargram is well reflected.

The device of claim 7,
wherein the size of the labeling image is set equal to the size of the result output when the original sonargram is used as input to the convolutional neural network.

The device of claim 10,
wherein the control unit
marks a portion of the original sonargram determined to be a tonal component, at the corresponding position in a first labeling image, with a value representing the tonal component, and
marks a portion of the original sonargram determined not to be a tonal component, at the corresponding position in a second labeling image, with a value representing clutter.

The device of claim 10,
wherein the control unit
generates the training data using the sonargram data in which the size of the labeling image and the value of each pixel are reflected, and
augments the training data in consideration of the time and frequency characteristics of the sonargram.

delete